oddities in output #6

Open
kevinkovalchik opened this issue Apr 22, 2020 · 5 comments

Comments

@kevinkovalchik

kevinkovalchik commented Apr 22, 2020

Hello,

Thanks for making this tool! I am finding it useful and am planning to use it in a large-scale reanalysis of published data to avoid difficulties with missing/incomplete information on acquisition parameters.

I noticed something that seems odd about the output and am wondering if you can help clarify it. Here are the details of an analysis of some data from a SCIEX TripleTOF:

Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 189
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 189
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 189
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 189

All these numbers make sense to me except "... and also with sufficient in-common fragments", which is exactly the same for each charge state. Is this expected?

Also, when I run the same file and specify --charges 2, this is the output:

Details:
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 170

The numbers match charge 2 from above except now sufficient in-common fragments is different. Is this expected?

Also, I'm aware that I'm seeing these detail reports because there are not enough paired spectra to do the analysis. But I would still like to understand the output here.

Best,
Kevin

@dhmay
Owner

dhmay commented Apr 22, 2020

Looks like you found a bug in errorcalc.py, on line 254. As you noted, it appears to be giving you the same number of spectra that it's able to use for every charge. What it's actually reporting is the total number of usable spectra across all charges.

I believe I could fix the bug very easily by changing line 254 to report len(percharge_calculator.paired_fragment_peaks) instead of len(precursor_distances_ppm).

However, it's been a long time since I looked at this code, and I'm a little nervous about screwing it up. So, two options for you:

  1. As you noticed, if you restrict to a single charge, you'll get a different number than if you run all charges. That number is, in fact, correct for that charge. So, if you want those numbers, you can run them separately for each charge and sum them up.
  2. You could try implementing the fix I suggested above. If you do, please make a pull request!

I'll try to get around to fixing it, but verifying the fix would take me far longer than making it. If I made the fix on a branch, would you be willing to check out the branch and verify it for me? If so, I'll update this issue when it's done on a branch.
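
For readers following along, here is a minimal sketch of the kind of reporting bug described above. It is not the actual errorcalc.py code: the `PerChargeCalculator` class, the `usable_pairs` attribute, and the `report` function are hypothetical names invented for illustration. It only shows why reporting the length of a list pooled across all charges inside a per-charge loop prints the same number for every charge.

```python
# Hypothetical stand-in for the per-charge bookkeeping; NOT the real errorcalc.py.
from dataclasses import dataclass, field


@dataclass
class PerChargeCalculator:
    charge: int
    # Pairs kept for this charge only (hypothetical attribute name).
    usable_pairs: list = field(default_factory=list)


def report(calculators, use_pooled_count):
    """Return {charge: count} as it would appear on the last detail line."""
    # List pooled across all charges, analogous to the total the buggy line reported.
    pooled = [pair for calc in calculators for pair in calc.usable_pairs]
    reported = {}
    for calc in calculators:
        if use_pooled_count:
            # Bug: the same pooled total is reported for every charge.
            reported[calc.charge] = len(pooled)
        else:
            # Fix: report the count that belongs to this charge only.
            reported[calc.charge] = len(calc.usable_pairs)
    return reported


calculators = [
    PerChargeCalculator(2, ["a", "b", "c"]),
    PerChargeCalculator(3, ["d"]),
]
buggy = report(calculators, use_pooled_count=True)
fixed = report(calculators, use_pooled_count=False)
print(buggy)
print(fixed)
```

With the pooled count every charge shows the same total, while the per-charge counts differ and sum to that total, which matches the symptom reported in this issue.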

@kevinkovalchik
Author

Thanks for the quick response.
Hm... that might be the fix. I changed that line and here is the output:

2020-04-22 14:42:10,086 INFO: Need >= 200 peak pairs to fit mixed distribution. Got only 189.
Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 20
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 850
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 85
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 10

That looks more reasonable. But the largest reported number there is 850, which is not the number of peak pairs (189). Is that because 850 represents the total number of paired spectra, not the number of peak pairs?

@dhmay
Owner

dhmay commented Apr 22, 2020

Ha, that's what I get for trying to barge back into code I haven't looked at in years. I gave you the wrong variable to plug in there. Try it with len(percharge_calculator.paired_precursor_mzs).

@kevinkovalchik
Author

kevinkovalchik commented Apr 23, 2020 via email

@kevinkovalchik
Author

Okay, this looks good now! Here is the output this time:

2020-04-23 09:04:26,381 INFO: Need >= 200 peak pairs to fit mixed distribution. Got only 189.
Details:
  Charge 0
Spectra in same averagine bin as another: 1768
    ... and also within m/z tolerance: 1267
    ... and also within scan range: 557
    ... and also with sufficient in-common fragments: 4
  Charge 2
Spectra in same averagine bin as another: 19037
    ... and also within m/z tolerance: 13767
    ... and also within scan range: 11484
    ... and also with sufficient in-common fragments: 170
  Charge 3
Spectra in same averagine bin as another: 1912
    ... and also within m/z tolerance: 1528
    ... and also within scan range: 1232
    ... and also with sufficient in-common fragments: 17
  Charge 4
Spectra in same averagine bin as another: 489
    ... and also within m/z tolerance: 414
    ... and also within scan range: 350
    ... and also with sufficient in-common fragments: 2

The numbers for charges 2, 3, and 4 add up to the reported number of peak pairs (189). Charge 0 doesn't seem to be contributing to the number of peak pairs. Are unknown charges not used?
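
The arithmetic behind that observation can be checked with a short sketch. The counts are copied from the two patched outputs posted in this thread; the variable names are made up for illustration, and it is an assumption that the first patched run used len(percharge_calculator.paired_fragment_peaks) as originally suggested.

```python
# Counts copied from the two patched outputs posted in this thread.
# Assumption: the first patched run reported len(paired_fragment_peaks).
fragment_peak_counts = {0: 20, 2: 850, 3: 85, 4: 10}
# Second patched run: len(paired_precursor_mzs).
precursor_counts = {0: 4, 2: 170, 3: 17, 4: 2}
# From the log line "Need >= 200 peak pairs ... Got only 189."
reported_peak_pairs = 189

# Sum over known charges only (charge 0 = unknown charge).
known_charge_total = sum(n for z, n in precursor_counts.items() if z != 0)
# Sum over every charge, including charge 0.
all_charge_total = sum(precursor_counts.values())
# Ratio between the two candidate counts, per charge.
ratios = {z: fragment_peak_counts[z] / precursor_counts[z] for z in precursor_counts}

print(known_charge_total == reported_peak_pairs)
print(all_charge_total)
print(ratios)
```

Charges 2, 3, and 4 sum to 189, while including charge 0 gives 193, so the logged total is consistent with charge 0 being left out. The first patched run's counts are exactly five times the second's for every charge; the thread does not explain why, so any interpretation of that factor is a guess.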
