Change PSM ID with identifier in peak file #105

melihyilmaz · 2022-12-02T23:55:36Z

Fixes #70.

To address the issue, we change the PSM ID in the output mzTab files, which previously only denoted the order in the output file, to the PSI-standard identifier used in the peak file.
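As an illustration, the identifier a PSM carries could be derived from the peak file type. The sketch below assumes that MGF spectra are referenced by their (assumed zero-based) position in the file as `index=N` and that mzML/mzXML spectra are referenced by scan number as `scan=N`. The helper name `spectrum_identifier` is hypothetical and not taken from the Casanovo code base.

```python
from pathlib import Path
from typing import Optional


def spectrum_identifier(peak_file: str, index: int, scan: Optional[int] = None) -> str:
    """Return a PSI-style spectrum identifier (hypothetical helper).

    Assumption: MGF spectra use their position in the file ("index=N"),
    while mzML/mzXML spectra use their scan number ("scan=N").
    """
    suffix = Path(peak_file).suffix.lower()
    if suffix == ".mgf":
        # MGF has no native scan IDs, so fall back to the spectrum position.
        return f"index={index}"
    if suffix in (".mzml", ".mzxml"):
        if scan is None:
            raise ValueError("mzML/mzXML spectra need a scan number")
        return f"scan={scan}"
    raise ValueError(f"Unsupported peak file format: {suffix}")
```

For example, the first spectrum of `sample.mgf` would get `index=0`, whereas a spectrum with scan number 17 in `run.mzML` would get `scan=17`, regardless of its position in the file.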

codecov · 2022-12-03T00:10:16Z

Codecov Report

Merging #105 (58658a4) into main (b8815f7) will decrease coverage by 0.71%.
The diff coverage is 100.00%.

❗ Current head 58658a4 differs from pull request most recent head 8b3972e. Consider uploading reports for the commit 8b3972e to get more accurate results

@@            Coverage Diff             @@
##             main     #105      +/-   ##
==========================================
- Coverage   79.31%   78.60%   -0.71%     
==========================================
  Files          10       10              
  Lines         788      790       +2     
==========================================
- Hits          625      621       -4     
- Misses        163      169       +6

Impacted Files	Coverage Δ
casanovo/data/datasets.py	`85.24% <100.00%> (+2.48%)`	⬆️
casanovo/data/ms_io.py	`96.29% <100.00%> (+0.14%)`	⬆️
casanovo/denovo/model.py	`79.13% <100.00%> (-0.58%)`	⬇️
casanovo/utils.py	`70.58% <0.00%> (-29.42%)`	⬇️
casanovo/denovo/model_runner.py	`48.59% <0.00%> (-0.94%)`	⬇️


wfondrie

These changes look good to me - we just need to figure out why tests failed on GitHub.

tests/test_integration.py

wfondrie · 2022-12-06T19:22:08Z

The first line of the pytest error seems to indicate that no PSMs were written to the output file:

self = Series([], Name: PSM_ID, dtype: object), key = 0

    def __getitem__(self, key):
        check_deprecated_indexers(key)
        key = com.apply_if_callable(key, self)
    
        if key is Ellipsis:
            return self
    
        key_is_scalar = is_scalar(key)
        if isinstance(key, (list, tuple)):
            key = unpack_1tuple(key)
    
        if is_integer(key) and self.index._should_fallback_to_positional:
>           return self._values[key]
E           IndexError: index 0 is out of bounds for axis 0 with size 0

/opt/hostedtoolcache/Python/3.10.8/x64/lib/python3.10/site-packages/pandas/core/series.py:978: IndexError

Woops.

bittremieux

I added a test to verify that spectrum identifiers from mzML files are correct as well. Unlike MGF files, mzML files report scan numbers rather than indexes. I don't know of a library that can simulate mzXML files, so we don't explicitly test that behavior. But mzXML is quite similar to mzML, so I think our tests are sufficient.
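To make the scan-number versus index distinction concrete, here is a minimal sketch of pulling the scan number out of an mzML spectrum native ID. It assumes a Thermo-style native ID such as `controllerType=0 controllerNumber=1 scan=17`; the function name `scan_number` is hypothetical.

```python
import re
from typing import Optional

# Matches a standalone "scan=<number>" key-value pair inside a native ID
# (the leading anchor avoids matching keys such as "subscan=").
_SCAN_RE = re.compile(r"(?:^|\s)scan=(\d+)(?:\s|$)")


def scan_number(native_id: str) -> Optional[int]:
    """Extract the scan number from an mzML native ID, if present."""
    match = _SCAN_RE.search(native_id)
    return int(match.group(1)) if match else None
```

An index-based ID such as `index=3` contains no scan number, so the function returns `None` for it.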

The added behavior works as intended, and the unit and integration tests all pass locally. The failing test is because the generated mzTab indeed doesn't contain any PSMs. This is not related to the changes in this PR, but rather to issues with multiprocessing (again). Running with a GPU works, but running CPU-only (as the tests do on GitHub Actions) seems problematic. This is not a new issue; our previous tests just didn't capture it.

Changing the strategy to None and the number of devices to 1 actually makes the unit tests run correctly here. But that, of course, also removes all multiprocessing. I suggest that we merge this PR, rather than keep it dangling for an indefinite amount of time, and address CPU-only multiprocessing in a follow-up PR.

What do you think, @wfondrie?
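The workaround described above amounts to picking different trainer settings for CPU-only runs. The sketch below is a hypothetical helper, `trainer_kwargs`, that returns keyword arguments in the style of a PyTorch Lightning 1.x `Trainer` (where `strategy=None` was accepted). The `"ddp"` strategy for GPU runs is an assumption for illustration, not Casanovo's actual configuration.

```python
from typing import Any, Dict


def trainer_kwargs(cpu_only: bool, n_gpus: int = 1) -> Dict[str, Any]:
    """Hypothetical helper: choose Trainer arguments for the hardware."""
    if cpu_only:
        # Workaround from the discussion: no distributed strategy and a
        # single device, which also disables multiprocessing.
        return {"accelerator": "cpu", "devices": 1, "strategy": None}
    # Assumption: GPU runs keep a multi-process data-parallel strategy.
    return {"accelerator": "gpu", "devices": n_gpus, "strategy": "ddp"}
```

The resulting dictionary would be unpacked into the trainer, e.g. `Trainer(**trainer_kwargs(cpu_only=True))`. This keeps the CPU-only fallback in one place so it can be revisited in the follow-up PR.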

casanovo/data/datasets.py

bittremieux · 2023-01-13T18:04:33Z

My approval might have been a bit hasty, though. We still need to add an extra test: predicting from multiple files. In that case, the different files should be reported as ms_run[x]-location in the mzTab header, and the PSMs should refer to the correct ms_run index in the spectra_ref column. This behavior is not yet correct in the current code.
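A minimal sketch of the required mapping, assuming that each input file becomes one 1-based `ms_run[x]` entry whose location is written as a `file://` URI, and that `spectra_ref` combines that run with the spectrum identifier. The function names `ms_run_metadata` and `spectra_ref` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def ms_run_metadata(peak_files: List[str]) -> Tuple[List[str], Dict[str, int]]:
    """Build mzTab ms_run location lines and a file -> run index map.

    mzTab numbers ms_run entries starting at 1.
    """
    lines = []
    run_index = {}
    for i, filename in enumerate(peak_files, start=1):
        # as_uri() requires an absolute path, hence absolute() first.
        uri = Path(filename).absolute().as_uri()
        lines.append(f"MTD\tms_run[{i}]-location\t{uri}")
        run_index[filename] = i
    return lines, run_index


def spectra_ref(run_index: Dict[str, int], filename: str, spectrum_id: str) -> str:
    """Format the spectra_ref value for a PSM, e.g. 'ms_run[2]:scan=17'."""
    return f"ms_run[{run_index[filename]}]:{spectrum_id}"
```

With input files `a.mgf` and `b.mzML`, a PSM for scan 17 of the second file would be reported as `ms_run[2]:scan=17` rather than being attributed to the first run.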

wfondrie · 2023-01-13T20:53:28Z

I agree with @bittremieux!

bittremieux · 2023-01-16T14:42:10Z

OK, I fixed the mapping from PSMs to input files. Can you do a new review before merging, @wfondrie?

melihyilmaz · 2023-02-03T23:09:02Z

I reviewed Wout's changes, and the tests pass locally on a GPU machine. Good to merge!

Change PSM id with identifier in peak file

e2dbbc3

melihyilmaz requested a review from wfondrie December 2, 2022 23:55

Add tensorboard to dependency requirements

4424edf

melihyilmaz added 2 commits December 2, 2022 16:22

Add unit test for psm id in output

029b9e1

Add test for PSM id in output mztab

9978f74

wfondrie reviewed Dec 6, 2022


tests/test_integration.py (outdated, resolved)

melihyilmaz and others added 15 commits December 6, 2022 11:31

Add newline

06a26b8

Minor refactoring

586319e

Whitespace fix

902ed40

Woops.

Debug file content

afad2d6

Why no PSMs? ☹

a6f16f3

Test indexed mzML by scan number

73ae6a5

Do we have spectra in the index?

37ee42a

Do we predict?

8538a68

Does prediction end?

a317705

Does it work if we only use a single worker?

ad73463

Revert debugging

58658a4

Remove debug logging

6c40713

Create full mzML file

7dcd7a3

Don't show NumPy RuntimeWarning

1cc8975

Add precursor activation

10ef0cb

bittremieux approved these changes Jan 13, 2023


casanovo/data/datasets.py (resolved)

Update changelog

cf4b794

bittremieux added 3 commits January 16, 2023 14:36

Write full peak filenames to mzTab

3bb39f7

Correctly match spectra to runs

30acb9a

Fix linting issues

4c34017

bittremieux added 2 commits January 16, 2023 15:37

Fix unit tests

da8aa3f

Update changelog

1325ce2

bittremieux requested a review from wfondrie January 16, 2023 14:42

melihyilmaz mentioned this pull request Jan 18, 2023

Casanovo predicts for invalid spectra #56

Open

melihyilmaz and others added 3 commits February 3, 2023 14:21

Update CHANGELOG.md

f032451

Merge branch 'main' into scan_id

eedff84

Fix lint

8b3972e

melihyilmaz merged commit 745c08b into main Feb 3, 2023

melihyilmaz deleted the scan_id branch February 3, 2023 23:09

melihyilmaz mentioned this pull request Feb 3, 2023

Questions about comparing de novo methods #133

Closed

melihyilmaz commented Dec 2, 2022 (edited by bittremieux)
