Read ion mobility from mzML and write to mzIdentML #32

Closed
chambm opened this issue May 23, 2018 · 13 comments

@chambm
Contributor

commented May 23, 2018

The PSI-MS CV has been updated so that ion mobility terms can be placed in the mzIdentML at the SpectrumIdentificationResult level, the same way scan start time already could be:
https://sourceforge.net/p/psidev/mailman/message/36317835/

How hard would it be to get MS-GF+ to carry this attribute through to the output mzIdentML?

@FarmGeek4Life

Collaborator

commented May 23, 2018

I'm assuming this only matters for mzML input? (I can't think of how it would be encoded in the other supported spectrum input formats.)
Do you need all three CV terms that have the "is_a: MS:1002892 ! ion mobility attribute" relationship ('MS:1001581 FAIMS compensation voltage', 'MS:1002476 ion mobility drift time', and 'MS:1002815 inverse reduced ion mobility'), or just the ion mobility drift time? I ask because adding just one would be a bit easier, and because I don't know whether the library MS-GF+ uses to read mzML has the relationship mappings that would let it read every cvParam that can be a PSM-level attribute; it currently reads data by looking up specific accession numbers.

The important classes here are:

  • edu.ucsd.msjava.msutil.Spectrum for storing the additional information
  • edu.ucsd.msjava.mzml.SpectrumConverter for reading the information from the mzML and putting it into an edu.ucsd.msjava.msutil.Spectrum object
  • edu.ucsd.msjava.mzid.MZIdentMLGen for adding the new information (when available) to the mzid output.

I think coding this could be done in less than an hour; testing the functionality is a different story.
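As a rough illustration of the "specific accession numbers" approach mentioned above, here is a minimal sketch of a lookup for the three ion mobility terms. The class `IonMobilityTerms` and its methods are hypothetical names made up for this example; they are not part of MS-GF+, jmzml, or jmzidml. Only the accessions and term names come from the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not an MS-GF+ class. */
final class IonMobilityTerms {
    // The three PSI-MS terms listed in this thread as children of
    // MS:1002892 (ion mobility attribute), keyed by accession.
    private static final Map<String, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put("MS:1001581", "FAIMS compensation voltage");
        NAMES.put("MS:1002476", "ion mobility drift time");
        NAMES.put("MS:1002815", "inverse reduced ion mobility");
    }

    /** True if the accession is one of the three supported ion mobility terms. */
    static boolean isIonMobilityAccession(String accession) {
        return NAMES.containsKey(accession);
    }

    /** Term name for a supported accession, or empty if it is not supported. */
    static Optional<String> nameFor(String accession) {
        return Optional.ofNullable(NAMES.get(accession));
    }

    public static void main(String[] args) {
        System.out.println(nameFor("MS:1002476").orElse("not an ion mobility term"));
        // MS:1000016 is scan start time, so it is not matched here.
        System.out.println(isIonMobilityAccession("MS:1000016"));
    }
}
```

A fixed table like this sidesteps the need for the CV's is_a relationships, at the cost of needing an update whenever a new ion mobility term is added to the CV.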

@chambm

Contributor Author

commented May 24, 2018

Mostly mzML for now, but parsing it from the MGF title is a possibility as well (although getting the specific CV term and units would be tricky).

I think all 3 IMS types should be supported, yes. An amusing implementation would be to take the whole cvParam element (with one of the 3 supported accessions) as a string and plop it into the mzIdentML rather than trying to parse it into value, type, and units.

@FarmGeek4Life

Collaborator

commented May 24, 2018

Well, I don't think the library used for mzML parsing and mzid writing will let me copy the whole string from one to the other, although I could possibly store the cvParam object(s) and transfer them that way. Parsing the value, type, and units isn't hard with that library.

@chambm

Contributor Author

commented May 24, 2018

Indeed. And the jmzml and jmzidml models use distinct cvParam classes, so you can't plop one into the other. So it seems 2 values will have to be carried through: the accession and the value (the unit is implied by the accession, i.e. the mobility value type).

Let me know if you want me to test it. Thanks!
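The two-value approach described above (carry the accession and the value, then rebuild the parameter on the output side) could look roughly like the following. This is only a sketch: `InputCvParam` and `OutputCvParam` are hypothetical stand-ins for the two libraries' unrelated cvParam types, not the real jmzml or jmzidml classes, and `IonMobility` is an invented holder, not the field MS-GF+ actually added.

```java
import java.util.Objects;

/** Hypothetical stand-in for the cvParam type of the mzML reader. */
final class InputCvParam {
    final String accession;
    final String value;

    InputCvParam(String accession, String value) {
        this.accession = accession;
        this.value = value;
    }
}

/** Hypothetical stand-in for the cvParam type of the mzIdentML writer. */
final class OutputCvParam {
    final String accession;
    final String value;

    OutputCvParam(String accession, String value) {
        this.accession = accession;
        this.value = value;
    }
}

/** The pair of values stored on the spectrum between reading and writing. */
final class IonMobility {
    final String accession; // which ion mobility term was present in the input
    final double value;     // its numeric value; the unit is implied by the term

    IonMobility(String accession, double value) {
        this.accession = Objects.requireNonNull(accession);
        this.value = value;
    }

    /** Extract the two values from the input-side parameter. */
    static IonMobility fromInput(InputCvParam p) {
        return new IonMobility(p.accession, Double.parseDouble(p.value));
    }

    /** Rebuild an output-side parameter from the two stored values. */
    OutputCvParam toOutput() {
        return new OutputCvParam(accession, Double.toString(value));
    }

    public static void main(String[] args) {
        InputCvParam in = new InputCvParam("MS:1002815", "0.85");
        OutputCvParam out = IonMobility.fromInput(in).toOutput();
        System.out.println(out.accession + " = " + out.value);
    }
}
```

Because the two parameter types share no common supertype in this sketch, the small holder is the only thing that crosses from the reading code to the writing code.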

@FarmGeek4Life

Collaborator

commented Jul 9, 2018

I just made a commit that should provide this functionality. Do you want me to provide a binary, or are you okay with checking out and compiling the current master branch?

@chambm

Contributor Author

commented Jul 9, 2018

I don't think I ever set up the build environment for this, so a binary would be nice (or I can wait for the next release).

@chambm

Contributor Author

commented Jul 12, 2018

Didn't seem to work for a Waters HDDDA. Here's the input and output and the command I used.
HDDDA.zip

I just used a random FASTA I had around. Just needed to see one SpectrumIdentificationResult and that's all I got. Might need to be a pretty large FASTA to get a random hit. I'm not sure what species this sample is, or if it's even peptides. :)

The first test I did was a legit search with TIMS PASEF data, but that failed at the end (see my PR to fix that).

@FarmGeek4Life

Collaborator

commented Jul 12, 2018

https://github.com/MSGFPlus/msgfplus/releases/tag/IMS_CV_Preview2

That HDDDA mzML file says that all spectra are profile mode, which MS-GF+ skips. However, MS-GF+ also runs its own check (it looks for a median difference of >=50 PPM between the m/z values of consecutive peaks in a spectrum), and that check might be deciding the spectra are centroided. That would be the case if you didn't see an error saying that it "skipped spectrum x since it is not centroided".
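For reference, the median-spacing check described above can be sketched as follows. This is an illustration written from the one-sentence description in this comment, not the actual MS-GF+ code; the class and method names are hypothetical, and the handling of spectra with fewer than two peaks is an assumption.

```java
import java.util.Arrays;

/** Hypothetical sketch of a median-spacing centroid check; not MS-GF+ code. */
final class CentroidHeuristic {
    // Threshold taken from the description: median spacing of at least 50 ppm.
    static final double MIN_MEDIAN_PPM = 50.0;

    /** Returns true if the median gap between consecutive peaks is >= 50 ppm. */
    static boolean looksCentroided(double[] mzValues) {
        if (mzValues.length < 2) {
            return true; // assumption: nothing to measure, so do not reject
        }
        double[] mz = mzValues.clone();
        Arrays.sort(mz);

        // Gap between each pair of neighbouring peaks, in parts per million.
        double[] ppm = new double[mz.length - 1];
        for (int i = 0; i < ppm.length; i++) {
            ppm[i] = (mz[i + 1] - mz[i]) / mz[i] * 1e6;
        }
        Arrays.sort(ppm);

        int mid = ppm.length / 2;
        double median = (ppm.length % 2 == 1)
                ? ppm[mid]
                : (ppm[mid - 1] + ppm[mid]) / 2.0;
        return median >= MIN_MEDIAN_PPM;
    }

    public static void main(String[] args) {
        double[] denselySampled = {500.000, 500.005, 500.010, 500.015};
        double[] wellSeparated = {300.1, 450.2, 600.3};
        System.out.println(looksCentroided(denselySampled));
        System.out.println(looksCentroided(wellSeparated));
    }
}
```

A check of this shape would explain the behaviour discussed here: a very sparse profile spectrum has few adjacent sample points, so its median gap can exceed the threshold even though the file labels it as profile.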

Overall, if you're able to get meaningful information out of an MS-GF+ search on IMS/TIMS data, that would be great, since MS-GF+ was never designed to work on such data (and if it does work reasonably well, we might need to introduce some new scoring models to properly accommodate it).

@chambm

Contributor Author

commented Jul 13, 2018

I'm still not seeing the CV term carried through. It says the build is from 6-28. Are you giving me the right binary?

I know the data is ridiculous:

> I'm not sure what species this sample is, or if it's even peptides.

All I wanted was a single SIR (SpectrumIdentificationResult) to test whether the cvParam is getting carried through. It doesn't need to be a legit result. Ironically, when I ran the default CWT peak picker on this data so that the spectra really were centroided, I got NO results. It only gets better when I turn the SNR threshold down to 0. The ion mobility spectra are very sparse even in profile mode, so that's probably why.

@FarmGeek4Life

Collaborator

commented Jul 13, 2018

Well, the build date that MS-GF+ outputs is only updated manually, and I haven't updated it yet. I will have to run that file against some FASTA file here, with the debugger attached, to figure out exactly what's going on.

@FarmGeek4Life

Collaborator

commented Jul 17, 2018

Well, I found the main bug, which also affects other searches. MS-GF+ originally only looked in the scanList of mzML spectra for the "[Thermo Trailer Extra]Monoisotopic M/Z:" userParam, so the code only entered the relevant if statement when there was at least one userParam in scanList:scan[0]. Scans without any userParam were therefore never examined, which also meant that the scan start time would not be output for a search on data from, say, an Agilent QTOF.

https://github.com/MSGFPlus/msgfplus/releases/tag/v2018.07.17 fixes it; I did see the desired cvParam in the single search result I got (searching against a human RefSeq FASTA file I had on hand).
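To make the shape of that bug concrete, here is a simplified sketch of a userParam guard hiding scan-level cvParams. It is an illustration only and not the actual MS-GF+ code or its fix: the `Scan` class is a hypothetical stand-in for scanList:scan[0], and both method names are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the guard described above; not MS-GF+ code. */
final class ScanParamsDemo {
    /** Simplified stand-in for the first scan of an mzML scanList. */
    static final class Scan {
        final Map<String, String> cvParams = new HashMap<>();   // accession -> value
        final Map<String, String> userParams = new HashMap<>(); // name -> value
    }

    static final String SCAN_START_TIME = "MS:1000016";

    /** Buggy shape: cvParams are only consulted when some userParam exists. */
    static String startTimeGuarded(Scan scan) {
        if (!scan.userParams.isEmpty()) {
            return scan.cvParams.get(SCAN_START_TIME);
        }
        return null; // the scan is never examined, so the value is lost
    }

    /** Fixed shape: cvParams are read regardless of userParams. */
    static String startTimeUnguarded(Scan scan) {
        return scan.cvParams.get(SCAN_START_TIME);
    }

    public static void main(String[] args) {
        Scan noUserParams = new Scan(); // e.g. a file with no vendor userParams
        noUserParams.cvParams.put(SCAN_START_TIME, "12.5");
        System.out.println(startTimeGuarded(noUserParams));
        System.out.println(startTimeUnguarded(noUserParams));
    }
}
```

The same reasoning applies to the ion mobility cvParams: if they are read inside the guarded block, they are dropped for any file whose first scan carries no userParam.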

@chambm

Contributor Author

commented Jul 17, 2018

Excellent. Bruker TIMS results now have both ion mobility and scan time. I hadn't realized they were missing scan time previously. Thanks!

@chambm chambm closed this Jul 17, 2018
