Change PSM ID with identifier in peak file #105
Codecov Report
@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   79.31%   78.60%   -0.71%
==========================================
  Files          10       10
  Lines         788      790       +2
==========================================
- Hits          625      621       -4
- Misses        163      169       +6
These changes look good to me - we just need to figure out why tests failed on GitHub.
The first line in the Pytest error seems to indicate that there are no PSMs written in the output file:
|
Woops.
I added a test to verify that spectrum identifiers from mzML files are correct as well. Unlike MGF files, mzML reports scan numbers rather than indexes. I don't know of a library that can simulate mzXML files, so we don't explicitly test that behavior. But mzXML is quite similar to mzML, so I think our tests are sufficient.
The added behavior works as intended, and unit and integration tests are all successful locally. The test fails because the generated mzTab file indeed doesn't contain any PSMs. This is not related to the changes in this PR, but rather to issues with multiprocessing (again). Running with a GPU works, but running CPU-only (as the tests do on GitHub Actions) seems problematic. This is not a new issue; our previous tests just didn't capture it.
Changing the strategy to None and the number of devices to 1 actually makes the unit tests run correctly here. But that removes all multiprocessing as well of course. I suggest that we merge this PR, rather than keep it dangling for an indefinite amount of time, and address the CPU-only multiprocessing in a follow-up PR.
What do you think @wfondrie?
Although my approval might have been a bit hasty. We still need to add an extra test: predicting from multiple files. In that case, the different files should be reported as |
I agree with @bittremieux!
Ok, I fixed the mapping from PSMs to input files. Can you do a new review before merging @wfondrie?
I reviewed Wout's changes and tests pass locally on a GPU machine - good to merge!
Fixes #70.
To address this issue, we change the PSM ID in the output mzTab files, which previously only denoted the order in the output file, to the PSI standard identifier used in the peak file.
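
For illustration, here is a minimal sketch of how an identifier could be derived from the peak file type, following the discussion above (MGF spectra are referenced by index, mzML and mzXML spectra by scan number). The function name `spectrum_identifier` and the choice to dispatch on the file extension are assumptions made for this example; this is not the code in the PR.

```python
from pathlib import Path


def spectrum_identifier(peak_file: str, number: int) -> str:
    """Build a PSI-style spectrum identifier (hypothetical helper).

    Assumption: MGF files carry no scan numbers, so spectra are referenced
    by their position in the file ("index=N"), while mzML and mzXML spectra
    are referenced by scan number ("scan=N").
    """
    suffix = Path(peak_file).suffix.lower()
    if suffix == ".mgf":
        return f"index={number}"
    if suffix in {".mzml", ".mzxml"}:
        return f"scan={number}"
    raise ValueError(f"Unsupported peak file format: {peak_file}")


# Example: the same helper yields different identifier styles per format.
mgf_id = spectrum_identifier("run1.mgf", 0)
mzml_id = spectrum_identifier("run1.mzML", 17)
```

With such a helper, a PSM can be traced back to its spectrum in the peak file instead of only to its position in the output file.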