Parsing comet pep.xml with sparse MGF spectrum file #11

Closed
jj-umn opened this issue Apr 14, 2016 · 1 comment

jj-umn commented Apr 14, 2016

I am trying to help a user with a PeptideShakerCLI run whose MGF inputs are sparse, i.e. the scan number in the TITLE does not represent the ordinal location in the file.

For example, here is a spectrum_query element from the Comet pep.xml:

<spectrum_query spectrum="Mascot formatted MGF of data 10.74772.74772.3" spectrumNativeID="_x0032_0160111_ERLIC_MCF7_ingel_digest_band110kb_replicate1_017.74772.74772.3" start_scan="74772" end_scan="74772" precursor_neutral_mass="2077.794029" assumed_charge="3" index="66920" retention_time_sec="0.0">

The corresponding MGF contains only 74771 spectra, and scan "74772" should have zero-based index 68034:

grep '^TITLE' "searchgui_input/data/Mascot formatted MGF of data 10.mgf" | grep -n '^TITLE' | grep 74772
68035:TITLE=_x0032_0160111_ERLIC_MCF7_ingel_digest_band110kb_replicate1_017.74772.74772.3
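
The same lookup as the grep pipeline can be sketched in Java. This is a minimal illustration, not code from compomics-utilities: the class name MgfTitleIndex and the method indexTitles are hypothetical, and the sketch assumes every spectrum in the MGF has exactly one TITLE= line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of compomics-utilities.
final class MgfTitleIndex {

    /** Maps each TITLE value to its zero-based ordinal position among the spectra. */
    static Map<String, Integer> indexTitles(Iterable<String> mgfLines) {
        Map<String, Integer> titleToIndex = new LinkedHashMap<>();
        int index = 0;
        for (String line : mgfLines) {
            if (line.startsWith("TITLE=")) {
                String title = line.substring("TITLE=".length()).trim();
                // putIfAbsent keeps the first position if a title is repeated
                titleToIndex.putIfAbsent(title, index);
                index++;
            }
        }
        return titleToIndex;
    }
}
```

Fed with the lines of the MGF (for example from Files.readAllLines), the title ending in .74772.74772.3 would map to 68034, which is the grep line number 68035 minus one, while its scan number stays 74772.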

java.lang.IndexOutOfBoundsException: Index: 74771, Size: 74771
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at com.compomics.util.experiment.io.massspectrometry.MgfIndex.getSpectrumTitle(MgfIndex.java:239)
at com.compomics.util.experiment.massspectrometry.SpectrumFactory.getSpectrumTitle(SpectrumFactory.java:992)
at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importSpectrum(FileImporter.java:999)
at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importPsms(FileImporter.java:727)
at eu.isas.peptideshaker.fileimport.FileImporter$IdProcessorFromFile.importFiles(FileImporter.java:482)
at eu.isas.peptideshaker.fileimport.FileImporter.importFiles(FileImporter.java:158)
at eu.isas.peptideshaker.PeptideShaker.importFiles(PeptideShaker.java:232)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.createProject(PeptideShakerCLI.java:696)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.call(PeptideShakerCLI.java:205)
at eu.isas.peptideshaker.cmd.PeptideShakerCLI.main(PeptideShakerCLI.java:908)

Comet puts the actual TITLE from the MGF into an attribute named "spectrumNativeID", so I tried to use that to correct the scanNumber with the code below. It failed because spectrumFactory.fileLoaded(inputFileName) is false at that point.

Any suggestions for handling this better?

diff --git a/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java b/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
index aa49e30..862f284 100644
--- a/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
+++ b/src/main/java/com/compomics/util/experiment/io/identifications/idfilereaders/PepxmlIdfileReader.java
@@ -475,6 +475,7 @@ public class PepxmlIdfileReader implements IdfileReader {
 
     Integer scanNumber = null;
     String spectrumId = null;
+    String spectrumNativeId = null;
 
     for (int i = 0; i < parser.getAttributeCount(); i++) {
         String name = parser.getAttributeName(i);
 
@@ -487,15 +488,23 @@ public class PepxmlIdfileReader implements IdfileReader {
             } catch (Exception e) {
                 throw new IllegalArgumentException("An error occurred while parsing start_scan " + value + ". Integer expected.");
             }
+        } else if (name.equals("spectrumNativeID")) {
+            spectrumNativeId = parser.getAttributeValue(i);
         }
     }
 
+
     if (scanNumber == null) {
         throw new IllegalArgumentException("No start scan found for spectrum " + spectrumId + ".");
     }
 
     String spectrumTitle = scanNumber + "";
     if (spectrumFactory.fileLoaded(inputFileName)) {
+        if (spectrumNativeId != null) {
+            Integer spectrumNum = spectrumFactory.getSpectrumIndex(spectrumNativeId, inputFileName);
+            if (spectrumNum != null && spectrumNum >= 0) {
+                scanNumber = spectrumFactory.getSpectrumIndex(spectrumTitle, inputFileName);
+            }
+        }
         spectrumTitle = spectrumFactory.getSpectrumTitle(inputFileName, scanNumber);
     }

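
One possible shape for a more robust lookup, sketched independently of the real SpectrumFactory API: prefer the title carried in spectrumNativeID whenever the spectrum file's titles are known, and fall back to the scan number only when it is a valid ordinal. The class SpectrumResolver and its method are hypothetical, and the mapping of scan number n to zero-based index n - 1 is an assumption suggested by the stack trace (scan 74772 led to index 74771).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; names are illustrative, not compomics-utilities API.
final class SpectrumResolver {

    /**
     * Returns the spectrum title for a PSM, or empty if it cannot be resolved.
     *
     * @param titles     TITLE values of the spectrum file, in file order
     * @param nativeId   value of the spectrumNativeID attribute, may be null
     * @param scanNumber value of start_scan, assumed to be a one-based ordinal
     */
    static Optional<String> resolveTitle(List<String> titles, String nativeId, int scanNumber) {
        // Sparse files: the native ID is the title itself, so match it directly.
        // (A HashSet of titles would avoid the linear scan on large files.)
        if (nativeId != null && titles.contains(nativeId)) {
            return Optional.of(nativeId);
        }
        // Dense files: treat the scan number as an ordinal, but check the bounds
        // instead of letting the list throw IndexOutOfBoundsException.
        int index = scanNumber - 1;
        if (index >= 0 && index < titles.size()) {
            return Optional.of(titles.get(index));
        }
        return Optional.empty();
    }
}
```

Returning an empty Optional leaves it to the caller to decide whether an unresolvable spectrum should be skipped or reported, rather than failing the whole import.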
    
@hbarsnes

Thanks for telling us about this. For follow-up of this issue please see compomics/peptide-shaker#157.

hbarsnes self-assigned this Apr 19, 2016