Add support for reading Sage PSM files; various minor fixes #31

Merged
merged 24 commits into from
May 2, 2023


@ArthurDeclercq (Contributor) commented Apr 18, 2023

Added

  • Add reader for Sage PSM files.
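
A minimal, stdlib-only sketch of what reading a Sage results file involves. It assumes a tab-separated file with columns named `peptide`, `proteins` (`;`-separated), `charge`, `scannr`, `filename`, `rank`, `label` (1 for targets, -1 for decoys), `hyperscore` and a spectrum-level q-value column. These column names and the hypothetical `read_sage_tsv` helper are assumptions for illustration, not the psm_utils implementation. The q-value lookup accepts either `spectrum_q` or `spectrum_fdr`, because the Sage documentation and the actual output files turned out to disagree on that name.

```python
import csv
import io
from typing import Iterator, Optional

# Assumed names for the spectrum-level q-value column; the docs and the real
# output have disagreed on this, so accept either one.
Q_VALUE_COLUMNS = ("spectrum_q", "spectrum_fdr")


def _q_value(row: dict) -> Optional[float]:
    """Return the q-value from whichever known column is present and non-empty."""
    for column in Q_VALUE_COLUMNS:
        value = row.get(column)
        if value:
            return float(value)
    return None


def read_sage_tsv(handle) -> Iterator[dict]:
    """Yield one PSM-like dict per row of a tab-separated Sage results file.

    Hypothetical helper; all column names below are assumptions.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "peptidoform": f"{row['peptide']}/{row['charge']}",
            "spectrum_id": row["scannr"],
            "run": row["filename"],
            "is_decoy": row["label"] == "-1",
            "score": float(row["hyperscore"]),
            "qvalue": _q_value(row),
            "protein_list": row["proteins"].split(";") if row["proteins"] else None,
            "rank": int(row["rank"]),
        }


# Usage with a tiny in-memory example file (made-up values).
example = (
    "peptide\tproteins\tcharge\tscannr\tfilename\trank\tlabel\thyperscore\tspectrum_fdr\n"
    "PEPTIDEK\tsp|P12345|X\t2\tscan=42\trun1.mzML\t1\t1\t35.2\t0.001\n"
)
psms = list(read_sage_tsv(io.StringIO(example)))
print(psms[0]["peptidoform"], psms[0]["qvalue"])
```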

Changed

  • psm: The default values of PSM.provenance_data, PSM.metadata and PSM.rescoring_features are now dict() instead of None.
  • io.mzid.MzidReader: Attempt to parse retention time or scan start time cvParams at both the SpectrumIdentificationResult and the SpectrumIdentificationItem level. Note that according to the mzIdentML specification document (v1.1.1), neither cvParam is expected to be present at either level.
  • io.mzid.MzidReader: Prefer the spectrum title cvParam over the spectrumID attribute for PSM.spectrum_id, as these titles always match the peak list files. In this case, spectrumID is saved in metadata["mzid_spectrum_id"]. Fall back to spectrumID if the spectrum title is absent.
  • io.mzid.MzidWriter: PSM.retention_time is now written as the cvParam retention time instead of scan start time, and at the SpectrumIdentificationItem level instead of the SpectrumIdentificationResult level, since in psm_utils multiple PSMs for the same spectrum can, in theory, have different values for retention_time.
  • io.mzid.MzidWriter: Write PSM score as cvParam search engine specific score instead of userParam score.
  • io.percolator.PercolatorTabWriter: For PIN-style files, use SpecId instead of PSMId, and write the PSMScore and ChargeN columns by default.
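
The new `dict()` defaults and the two MzidReader changes can be sketched together in plain Python. This is a hypothetical, simplified illustration rather than psm_utils code: `SimplePSM` stands in for `PSM`, and `psm_from_mzid_result` assumes the cvParams have already been parsed into name-to-value dicts per level. The order in which the two levels and the two retention time terms are checked is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimplePSM:
    """Hypothetical stand-in for psm_utils.PSM with only the fields used here."""

    spectrum_id: str
    retention_time: Optional[float] = None
    # default_factory gives every instance its own empty dict, instead of None
    # or a single dict shared between instances.
    metadata: dict = field(default_factory=dict)


def _find_retention_time(*param_levels: dict) -> Optional[float]:
    """Return the first retention time found, checking levels in the given order."""
    for params in param_levels:
        for name in ("retention time", "scan start time"):
            if name in params:
                return float(params[name])
    return None  # missing values stay None rather than float("nan")


def psm_from_mzid_result(
    result_attrs: dict, result_params: dict, item_params: dict
) -> SimplePSM:
    """Build a PSM from pre-parsed SpectrumIdentificationResult/Item data.

    result_attrs: XML attributes of the Result (e.g. "spectrumID").
    result_params, item_params: cvParam name -> value at each level.
    """
    spectrum_id_attr = result_attrs["spectrumID"]
    title = result_params.get("spectrum title")
    if title is not None:
        # Prefer the title, but keep the original spectrumID in metadata.
        psm = SimplePSM(spectrum_id=title)
        psm.metadata["mzid_spectrum_id"] = spectrum_id_attr
    else:
        psm = SimplePSM(spectrum_id=spectrum_id_attr)

    # Assumption: Item-level values are checked before Result-level ones.
    psm.retention_time = _find_retention_time(item_params, result_params)
    return psm


psm = psm_from_mzid_result(
    {"spectrumID": "index=5"},
    {"spectrum title": "run1.5.5.2", "scan start time": "12.5"},
    {},
)
print(psm)
```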

Fixed

  • peptidoform: ProForma mass modifications are now correctly parsed within the rename_modifications function.
  • io.maxquant.MSMSReader: Correctly parse an empty Proteins column to None.
  • io.mzid.MzidReader: Set PSM.retention_time to None instead of float('nan') if missing from the PSM file.
  • io.percolator.PercolatorTabReader: Correctly parse Percolator peptidoform notation if no leading or trailing amino acids are present (e.g. .ACDK. instead of K.ACDK.E).
  • io.percolator.PercolatorTabWriter: ScanNr is now correctly written as an integer counting from the first PSM in the file.
  • io.percolator.PercolatorTabWriter: If no protein information is present, write the peptidoform preceded by PEP_ to the Proteins column.
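
Two of the Percolator fixes can be illustrated with a small sketch; the function names are hypothetical and this is not the psm_utils code. It strips optional flanking residues from notation such as `K.ACDK.E` or `.ACDK.`, and falls back to `PEP_` plus the peptidoform when a PSM has no protein information. It assumes flanking residues are a single uppercase letter or `-`, and that multiple proteins in a PIN file are written as tab-separated trailing values.

```python
import re
from typing import List, Optional

# Percolator-style flanked notation: optional flanking residue, a dot, the
# peptide, a dot, optional flanking residue. Assumption: flanks are a single
# uppercase letter or "-". The greedy group keeps dots inside the peptide,
# such as in a mass shift like [+57.021].
_FLANKED = re.compile(r"^[A-Z\-]?\.(.+)\.[A-Z\-]?$")


def strip_flanking_residues(peptide: str) -> str:
    """Return the peptide without flanks, e.g. "K.ACDK.E" or ".ACDK." -> "ACDK"."""
    match = _FLANKED.match(peptide)
    return match.group(1) if match else peptide


def proteins_column(protein_list: Optional[List[str]], peptidoform: str) -> str:
    """Value for the trailing Proteins column(s) of a PIN row.

    Falls back to "PEP_<peptidoform>" when no protein information is present.
    Assumption: multiple proteins are written as tab-separated values.
    """
    if protein_list:
        return "\t".join(protein_list)
    return f"PEP_{peptidoform}"


print(strip_flanking_residues(".ACDK."), strip_flanking_residues("K.ACDK.E"))
print(proteins_column(None, "ACDK"))
```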

@codecov-commenter commented Apr 18, 2023

Codecov Report

Patch coverage: 37.68%; project coverage change: -0.05% ⚠️

Comparison of base (638d19b, 41.52%) with head (4dacc86, 41.48%).

@@            Coverage Diff             @@
##             main      #31      +/-   ##
==========================================
- Coverage   41.52%   41.48%   -0.05%     
==========================================
  Files          18       19       +1     
  Lines        1416     1468      +52     
==========================================
+ Hits          588      609      +21     
- Misses        828      859      +31     
| Impacted Files | Coverage Δ |
|---|---|
| psm_utils/io/maxquant.py | 84.61% <ø> (ø) |
| psm_utils/io/mzid.py | 29.71% <0.00%> (-0.29%) ⬇️ |
| psm_utils/io/percolator.py | 28.82% <20.00%> (-0.35%) ⬇️ |
| psm_utils/io/sage.py | 41.02% <41.02%> (ø) |
| psm_utils/peptidoform.py | 31.47% <60.00%> (+0.78%) ⬆️ |
| psm_utils/io/\_\_init\_\_.py | 37.03% <100.00%> (+0.78%) ⬆️ |
| psm_utils/psm.py | 85.71% <100.00%> (ø) |


@RalfG changed the title from "Pin fix" to "Add support for reading Sage PSM files; various minor fixes" on Apr 24, 2023

- Write PSM score as cvParam `search engine specific score` instead of userParam `score`.
- Write retention time to the SpectrumIdentificationItem as cvParam `retention time` instead of at the SpectrumIdentificationResult level as `scan start time`.
- Update documentation notes
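
The two writer changes above can be sketched with the standard library's `xml.etree.ElementTree`. This is a deliberately simplified illustration with a hypothetical helper, not the psm_utils writer: it omits required SpectrumIdentificationItem attributes (such as `chargeState`, `rank` and `passThreshold`) and the unit attributes a real `retention time` cvParam should carry. The accessions are the PSI-MS terms MS:1001153 (search engine specific score) and MS:1000894 (retention time).

```python
import xml.etree.ElementTree as ET
from typing import Optional


def add_cv_param(parent: ET.Element, accession: str, name: str, value) -> ET.Element:
    """Append a PSI-MS cvParam child element to parent."""
    return ET.SubElement(
        parent, "cvParam", cvRef="PSI-MS", accession=accession, name=name, value=str(value)
    )


def spectrum_identification_item(
    item_id: str, score: float, retention_time: Optional[float]
) -> ET.Element:
    """Build a simplified SpectrumIdentificationItem with score and retention time.

    Hypothetical helper: required attributes and the unit attributes of the
    retention time cvParam are left out to keep the example short.
    """
    sii = ET.Element("SpectrumIdentificationItem", id=item_id)
    # Score as a cvParam rather than a userParam named "score".
    add_cv_param(sii, "MS:1001153", "search engine specific score", score)
    if retention_time is not None:
        # Written per item, because PSMs for one spectrum may differ in RT.
        add_cv_param(sii, "MS:1000894", "retention time", retention_time)
    return sii


sii = spectrum_identification_item("SII_1", 42.0, 1234.5)
print(ET.tostring(sii, encoding="unicode"))
```
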

@lazear (Contributor) commented May 1, 2023

I just opened a fork to do this; should've checked here first! Thanks for implementing this.

@RalfG (Member) commented May 2, 2023

@lazear, great to hear that you wanted to add Sage support! Feel free to take a look at the implementation and give feedback where needed.
One thing I noticed was that the Sage documentation lists the spectrum_q column, while the actual output file seems to contain a spectrum_fdr column instead.

@RalfG added this to the 0.4.0 milestone on May 2, 2023
@RalfG added the bug (Something isn't working), documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on May 2, 2023
@RalfG self-assigned this on May 2, 2023
@RalfG merged commit d204fe6 into main on May 2, 2023
@RalfG deleted the pin_fix branch on May 2, 2023 15:35

@lazear (Contributor) commented May 2, 2023

> @lazear, great to hear that you wanted to add Sage support! Feel free to take a look at the implementation and give feedback where needed.
>
> One thing I noticed was that the Sage documentation lists the spectrum_q column, while the actual output file seems to contain a spectrum_fdr column instead.

Thanks for the catch - this is what I get for using ChatGPT to write docs :)


4 participants