Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change PSM ID with identifier in peak file #105

Merged
merged 28 commits into from
Feb 3, 2023
Merged

Change PSM ID with identifier in peak file #105

merged 28 commits into from
Feb 3, 2023

Conversation

melihyilmaz
Copy link
Collaborator

@melihyilmaz melihyilmaz commented Dec 2, 2022

Fixes #70.

Addressing issue, we change PSM ID in the output mztab files, which only denotes the order in output file, to PSI standard identifier used in the peak file.

@codecov
Copy link

codecov bot commented Dec 3, 2022

Codecov Report

Merging #105 (58658a4) into main (b8815f7) will decrease coverage by 0.71%.
The diff coverage is 100.00%.

❗ Current head 58658a4 differs from pull request most recent head 8b3972e. Consider uploading reports for the commit 8b3972e to get more accurate results

@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   79.31%   78.60%   -0.71%     
==========================================
  Files          10       10              
  Lines         788      790       +2     
==========================================
- Hits          625      621       -4     
- Misses        163      169       +6     
Impacted Files Coverage Δ
casanovo/data/datasets.py 85.24% <100.00%> (+2.48%) ⬆️
casanovo/data/ms_io.py 96.29% <100.00%> (+0.14%) ⬆️
casanovo/denovo/model.py 79.13% <100.00%> (-0.58%) ⬇️
casanovo/utils.py 70.58% <0.00%> (-29.42%) ⬇️
casanovo/denovo/model_runner.py 48.59% <0.00%> (-0.94%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@wfondrie wfondrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me - we just need to figure out why tests failed on GitHub.

tests/test_integration.py Outdated Show resolved Hide resolved
@wfondrie
Copy link
Collaborator

wfondrie commented Dec 6, 2022

The first line in the Pytest error seems to indicate that there are no PSMs written in the output file:

self = Series([], Name: PSM_ID, dtype: object), key = 0

    def __getitem__(self, key):
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
    
        if key is Ellipsis:
            return self
    
        key_is_scalar = is_scalar(key)
        if isinstance(key, (list, tuple)):
            key = unpack_1tuple(key)
    
        if is_integer(key) and self.index._should_fallback_to_positional:
>           return self._values[key]
E           IndexError: index 0 is out of bounds for axis 0 with size 0

/opt/hostedtoolcache/Python/3.10.8/x64/lib/python3.10/site-packages/pandas/core/series.py:978: IndexError

Copy link
Collaborator

@bittremieux bittremieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test to verify that spectrum identifiers from mzML files are correct as well. Unlike MGF files, mzML reports scan numbers rather than indexes. I don't know a library that can simulate mzXML files, so we don't explicitly test that behavior. But mzXML is quite similar to mzML, so I think our tests are sufficient.

The added behavior works as intended, and unit and integration tests are all successful locally. The failing test is because the generated mzTab indeed doesn't contain PSMs. This is not related to changes in this PR, but rather because of issues with multiprocessing (again). Running with a GPU works, but CPU-only (as the tests are run using GitHub Actions) seems problematic. This is not a new issue, our previous tests just didn't capture it.

Changing the strategy to None and the number of devices to 1 actually makes the unit tests run correctly here. But that removes all multiprocessing as well of course. I suggest that we merge this PR, rather than keep it dangling for an indefinite amount of time, and address the CPU-only multiprocessing in a follow-up PR.

What do you think @wfondrie?

casanovo/data/datasets.py Show resolved Hide resolved
@bittremieux
Copy link
Collaborator

Although my approval might have been a bit hasty. We still need to add an extra test: predicting from multiple files. In that case, the different files should be reported as ms_run[x]-location in the mzTab header, and the PSMs should refer to the correct ms_run index. in the spectra_ref column This behavior is not correct yet in the current code.

@wfondrie
Copy link
Collaborator

I agree with @bittremieux!

@bittremieux
Copy link
Collaborator

Ok, I fixed the mapping from PSMs to input files. Can you do a new review before merging @wfondrie?

@melihyilmaz
Copy link
Collaborator Author

I reviewed Wout's changes and tests pass locally on a GPU machine - good to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track input spectra origin
3 participants