Native scan identification and ion mobility spectra #1
Hi Matt, thanks for these thoughtful comments.
Scan numbers should absolutely not be used to identify WIFF spectra. There's simply no reliable way to get back from a scan number to the <sample, period, cycle, experiment> tuple that is necessary to actually pinpoint a spectrum in a WIFF file. Limiting the id to a single number makes the "universal" modifier rather inaccurate. :) The same goes for Waters spectra, where function and scan are orthogonal and both are needed to pinpoint a spectrum in the .raw data.
Index is also unsuitable for maintaining a link back to the native spectrum, especially for multi-dimension formats (WIFF and Waters .raw), because the enumeration order of the dimensions is not guaranteed, nor is it clear that the indexes used for any format are based on a completely unfiltered enumeration of the data. In other words, someone generating USIs from a DDA mzML that has been filtered to only MS1s will get different indices than someone looking them up in an unfiltered file. It's simply not worth the potential for confusion!
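To make the index problem concrete, here is a minimal sketch in Python. The spectrum list is made up for illustration and `Spectrum` and `index_of` are hypothetical names, not part of any mzML library; the point is only that the same nativeID lands at different positions once the file is filtered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    native_id: str  # identifier taken from the vendor file
    ms_level: int


# Hypothetical DDA run: MS1 scans interleaved with MS2 scans.
unfiltered = [
    Spectrum("controllerType=0 controllerNumber=1 scan=1", 1),
    Spectrum("controllerType=0 controllerNumber=1 scan=2", 2),
    Spectrum("controllerType=0 controllerNumber=1 scan=3", 2),
    Spectrum("controllerType=0 controllerNumber=1 scan=4", 1),
    Spectrum("controllerType=0 controllerNumber=1 scan=5", 2),
]

# The same run after an "MS1 only" filter.
ms1_only = [s for s in unfiltered if s.ms_level == 1]


def index_of(spectra, native_id):
    """Return the 0-based position of the spectrum with this nativeID."""
    return next(i for i, s in enumerate(spectra) if s.native_id == native_id)


target = "controllerType=0 controllerNumber=1 scan=4"
index_unfiltered = index_of(unfiltered, target)
index_filtered = index_of(ms1_only, target)

# The nativeID is identical in both files, but the index is not.
print(index_unfiltered, index_filtered)
```

A USI built from the index in one file would point at a different spectrum (or none) in the other, while the nativeID stays valid in both.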
We already solved this problem a decade ago with mzML and nativeIDs. Since they can be a bit verbose in a USI which is already quite long, I suggest we use an abbreviated format. Instead of "controllerType=0 controllerNumber=1 scan=123" we can put "MS:1000768:0.1.123" which is the combination of the Thermo nativeID accession and the abbreviated nativeID. Likewise:
MS:1000770:1.1.123.2
MS:1000769:1.0.123
MS:1000772:123
MS:1000773:_x0031_00_x0020_fmol_x0020_BSA_x002f_0_B1_x002f_1_x002f_1SRef_x002f_fid (this is an encoded version of "100 fmol BSA/0_B1/1/1SRef/fid" because IDREF is the datatype)
MS:1000774:123
MS:1000776:123
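As a rough sketch of how the abbreviated form could be produced and reversed, the Python below joins the values of a nativeID with dots and prefixes the format accession. The function names (`abbreviate`, `expand`, `decode_idref`) are hypothetical, and the field order per accession is taken from the nativeID format definitions in the PSI-MS controlled vocabulary; only a few formats are listed. The decoder assumes the `_xHHHH_` escapes shown in the Bruker FID example are the only encoding in play.

```python
import re

# Field names and their order for a few nativeID formats (PSI-MS CV).
NATIVE_ID_FIELDS = {
    "MS:1000768": ("controllerType", "controllerNumber", "scan"),  # Thermo
    "MS:1000769": ("function", "process", "scan"),                 # Waters
    "MS:1000770": ("sample", "period", "cycle", "experiment"),     # WIFF
    "MS:1000772": ("scan",),                                       # Bruker BAF
}


def abbreviate(accession, native_id):
    """Turn 'key=value key=value ...' into 'ACCESSION:v1.v2...'."""
    pairs = dict(token.split("=", 1) for token in native_id.split())
    values = [pairs[name] for name in NATIVE_ID_FIELDS[accession]]
    return f"{accession}:{'.'.join(values)}"


def expand(abbreviated):
    """Rebuild the full nativeID from the abbreviated form.

    Assumes every value is an integer, so splitting on '.' is safe.
    """
    prefix, number, values = abbreviated.split(":", 2)
    names = NATIVE_ID_FIELDS[f"{prefix}:{number}"]
    parts = values.split(".")
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(parts)}")
    return " ".join(f"{name}={value}" for name, value in zip(names, parts))


def decode_idref(encoded):
    """Replace each _xHHHH_ escape with the character it stands for."""
    return re.sub(
        r"_x([0-9A-Fa-f]{4})_",
        lambda match: chr(int(match.group(1), 16)),
        encoded,
    )


print(abbreviate("MS:1000768", "controllerType=0 controllerNumber=1 scan=123"))
print(expand("MS:1000770:1.1.123.2"))
print(decode_idref("_x0031_00_x0020_fmol_x0020_BSA_x002f_0_B1_x002f_1_x002f_1SRef_x002f_fid"))
```

Because the accession fixes the field order, the abbreviated form loses no information for the integer-valued formats; string-valued formats such as the Bruker FID one would need separate handling.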
The WIFF nativeID also solves another problem described here: the sample index in the WIFF file which can contain multiple samples which are NOT necessarily named uniquely. For a WIFF file, the "run name" part of the USI should refer ONLY to the WIFF filename, not the sample name.
However, there is an unresolved discussion about nativeIDs in the soon-to-be-recommended 3-array representation for ion mobility spectra in mzML. That discussion should apply to USIs as well, probably even more urgently because USIs may be paired with a spectrum interpretation. A single 3-array diaPASEF (or Agilent/Waters full IM frame) spectrum may correspond with multiple peptides. When the peptides are separated in the IM dimension, then creating a combined spectrum actually combines evidence that could otherwise be kept separate and combined for each peptide individually (using a unique range of mobility scans).
For example, let's say there is a Waters IM frame, which has 200 mobility scans (they all have the same retention time but cover a range of drift times). One peptide at drift time 5 ms is supported by scans 50-60, and another peptide at drift time 10 ms is supported by scans 120-130. If the combined spectrum were the entire frame of 200 scans (as @edeutsch suggested in email), then that evidence would all be combined in the same spectrum, and USIs to the spectrum would be ambiguous (kind of like a chimeric spectrum). When reading/converting the raw data, there's no interpretation of course, so a reader/converter can't know that the spectra should be separated by drift time. I was going to suggest that the raw spectra be given the full range of drift scans explicitly, like
frame=123 scanStart=1 scanEnd=200
and the interpreting software can make a USI with a subset of the start/end range to refer to a specific subset of mobility scans. But I feel that's too complex if accessing the full combined spectrum in mzML. I think it makes more sense to make sure the USIs for ion mobility identifications include the IM window so reader code can do its own filtering (similar to using the peptide sequence to infer the precursor and product m/zs). The same logic would apply for diaPASEF, but not ddaPASEF. The latter can be easily separated into combined spectra with just the subset of the mobility range relevant to a specific precursor (e.g. "frame=123 scanStart=456 scanEnd=567" for precursor 678.9). It's worth noting that ddaPASEF spectra are usually further merged (between frames) for searching purposes, and I think representing that is outside the scope of nativeIDs. So those spectra, if searched, could only be tracked back to the mzML or MGF file (a "merged=123" spectrum).
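The reader-side filtering described above could look roughly like the following Python sketch. It assumes a 3-array frame flattened into (m/z, intensity, mobility scan) triples and that the IM window arrives as an inclusive scan range; `extract_mobility_window` is a hypothetical helper, the peak values are invented, and peaks are summed only when their m/z matches exactly (a real reader would bin with a tolerance).

```python
from collections import defaultdict


def extract_mobility_window(peaks, scan_start, scan_end):
    """Sum intensity per m/z over mobility scans in [scan_start, scan_end]."""
    summed = defaultdict(float)
    for mz, intensity, scan in peaks:
        if scan_start <= scan <= scan_end:
            summed[mz] += intensity
    return sorted(summed.items())


# Hypothetical frame with 200 mobility scans: (m/z, intensity, mobility scan).
frame = [
    (500.25, 10.0, 50),
    (500.25, 20.0, 55),
    (612.5, 5.0, 60),
    (700.5, 7.0, 120),
    (700.5, 3.0, 130),
    (500.25, 1.0, 125),
]

# Evidence for each peptide, kept separate by its own mobility window.
peptide_a = extract_mobility_window(frame, 50, 60)
peptide_b = extract_mobility_window(frame, 120, 130)

# Combining the whole frame mixes both peptides into one spectrum.
whole_frame = extract_mobility_window(frame, 1, 200)

print(peptide_a)
print(peptide_b)
print(whole_frame)
```

With the window carried in the USI, the nativeID can keep referring to the whole frame while each identification still resolves to its own subset of mobility scans.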