java.lang.NullPointerException when running MS-GF+ #13
Comments
The information you provided is useful, but it's not enough for us to solve the problem. It may be related to the protein names or protein sequences in the FASTA file, but without the actual files we won't be able to diagnose it. Please send SearchGUI-3.2.18/resources/MS-GF+/params/Mods.txt, along with a portion of the .mgf file (e.g. a sample of 25 spectra from the middle of the scan range), to proteomics@pnnl.gov |
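A sample like the one requested above could be cut from the .mgf file with a short script. This is a minimal Python sketch, not a PNNL or SearchGUI tool. It assumes the usual MGF layout, in which each spectrum sits between BEGIN IONS and END IONS lines, and assumes the spectra are listed in scan order so that the middle of the file corresponds to the middle of the scan range. Global parameter lines before the first BEGIN IONS are dropped. The function names `read_mgf_blocks` and `middle_sample` are hypothetical.

```python
from typing import List


def read_mgf_blocks(path: str) -> List[str]:
    """Return each BEGIN IONS ... END IONS block of an MGF file as one string."""
    blocks: List[str] = []
    current: List[str] = []
    inside = False
    with open(path, "r") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped == "BEGIN IONS":
                # Start collecting a new spectrum block.
                inside, current = True, [line]
            elif inside:
                current.append(line)
                if stripped == "END IONS":
                    blocks.append("".join(current))
                    inside = False
    return blocks


def middle_sample(blocks: List[str], n: int = 25) -> List[str]:
    """Take n consecutive spectra centred on the middle of the file.

    Assumes file order roughly follows scan order; returns all blocks if there are fewer than n.
    """
    start = max(0, (len(blocks) - n) // 2)
    return blocks[start:start + n]
```

For instance, passing `middle_sample(read_mgf_blocks("RJ_FC2_DCE.mgf"))` to `writelines` on a new file would give a 25-spectrum excerpt. The whole file is held in memory here for simplicity; a very large .mgf might call for a streaming approach instead.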
Also, please tell us where you obtained the TREMBL nr_fungal FASTA file. It would also help if you sent us a portion of your FASTA file, including both the normal proteins and the decoy proteins that you added, so we can see the format you're using for protein names, descriptions, and sequences. I'm going to guess you're using uniprot_trembl_fungi.dat.gz from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/ but please confirm. |
If you're using the full-size TREMBL nr_fungal FASTA file, I'm frankly surprised that MSGF+ is not running out of memory. We have found that once FASTA files grow beyond ~800 MB, we run into memory problems: the system needs 16 GB of memory or more, and the requirement scales with FASTA file size. In those cases we split the FASTA file into multiple parts, run MSGF+ once on each part, and then merge the results. The May 2017 release of uniprot_trembl_fungi.dat has 6.6 million proteins, giving a 4 GB FASTA file; the decoy version of that is 8 GB. I see you're allocating 50 GB via |
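The split step of the workaround described above could look roughly like this. It is a minimal Python sketch, not the PNNL tooling: it streams a FASTA file and writes whole records into numbered part files, starting a new part whenever the next record would push the current part past a byte budget. The names `split_fasta`, `read_fasta_records`, `out_prefix` and `max_bytes` are hypothetical, and the 800 MB default is only loosely based on the figure quoted above.

```python
from typing import Iterator, List, Tuple


def read_fasta_records(path: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (header_line, sequence_lines) for each record, keeping lines verbatim."""
    header = None
    seq_lines: List[str] = []
    with open(path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                if header is not None:
                    yield header, seq_lines
                header, seq_lines = line, []
            elif header is not None:
                seq_lines.append(line)
    if header is not None:
        yield header, seq_lines


def split_fasta(path: str, out_prefix: str, max_bytes: int = 800 * 1024 * 1024) -> List[str]:
    """Split a FASTA file into parts of at most max_bytes each.

    Records are never broken across parts, so a single record larger than
    max_bytes ends up alone in its own part. Returns the part file names.
    """
    part_paths: List[str] = []
    out = None
    written = 0
    for header, seq_lines in read_fasta_records(path):
        record = header + "".join(seq_lines)
        size = len(record.encode("utf-8"))
        # Open a new part if none is open yet, or if this record would overflow the budget.
        if out is None or (written > 0 and written + size > max_bytes):
            if out is not None:
                out.close()
            part_path = f"{out_prefix}_part{len(part_paths) + 1}.fasta"
            part_paths.append(part_path)
            out = open(part_path, "w")
            written = 0
        out.write(record)
        written += size
    if out is not None:
        out.close()
    return part_paths
```

Each resulting part would then be searched separately with MSGF+ and the results merged. Splitting a file that already contains decoys can separate a target protein from its decoy, so splitting the target-only file is probably the safer choice.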
I got a similar error when searching an mzML file whose MS2 spectra had been peak-picked with the OpenMS tool PeakPickerHiRes. When I instead use the vendor peak picking provided by MSConvert, MSGF+ runs without any error. You can find the database, the original file and the vendor-peak-picked file here. I uploaded the PeakPickerHiRes output to Dropbox. The command I ran: The error I got:
|
@Stortebecker maybe this is related to OpenMS/OpenMS#3082 @alchemistmatt is it possible that MSGF+ relies on optional elements in the mzML file? |
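The question of whether a particular mzML file carries the precursor charge annotation can be checked directly. This is a minimal Python sketch, not part of MS-GF+ or OpenMS. It walks the file with the standard-library `xml.etree.ElementTree.iterparse` and counts MS2 spectra that have no `cvParam` with the PSI-MS accession MS:1000041 ("charge state"), using MS:1000511 ("ms level") to identify MS2 spectra. The function name `count_missing_charges` is hypothetical, and the sketch assumes both terms are written directly inside each `spectrum` element; values supplied through a `referenceableParamGroupRef` would be missed.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

CHARGE_STATE = "MS:1000041"  # PSI-MS CV term "charge state"
MS_LEVEL = "MS:1000511"      # PSI-MS CV term "ms level"


def _local(tag: str) -> str:
    """Strip the XML namespace, e.g. '{http://psi.hupo.org/ms/mzml}spectrum' -> 'spectrum'."""
    return tag.rsplit("}", 1)[-1]


def count_missing_charges(path: str) -> Tuple[int, int]:
    """Return (ms2_spectra, ms2_spectra_without_charge_state) for an mzML file."""
    total = missing = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        ms_level = None
        has_charge = False
        # Look at every cvParam inside this spectrum, including the precursor's selectedIon.
        for child in elem.iter():
            if _local(child.tag) != "cvParam":
                continue
            accession = child.get("accession")
            if accession == MS_LEVEL:
                ms_level = child.get("value")
            elif accession == CHARGE_STATE:
                has_charge = True
        if ms_level == "2":
            total += 1
            if not has_charge:
                missing += 1
        elem.clear()  # drop the parsed spectrum's children to reduce memory use on large files
    return total, missing
```

A result where the second number equals the first would mean that no MS2 precursor in the file has a charge state annotation.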
@Stortebecker That file has no charge state information for the precursors, which is what MS-GF+ is trying to read when it crashes. PeakPickerHiRes does not report charge states, but as of 2014 work was in progress to add charge state determination/deconvolution algorithms as options in OpenMS, according to OpenMS issue #877.
@RiegardtJohnson: This is a problem with how the search is implemented in MS-GF+, combined with a limitation of Java. Java uses a 32-bit integer as an array index, which limits an array to ~2.147 billion (2^31 - 1) entries, and MS-GF+ accesses the sequences in the FASTA file in a way that requires each residue to be one entry in an array. Your database file, at 4 GB, is big enough to hit this limit for a target-only or decoy-only search; creating a concatenated target/decoy file for a combined search doubles the number of residues, which only makes the problem worse. |
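The arithmetic behind this limit can be checked before starting a search. This is a minimal Python sketch that assumes, as described above, one array entry per residue. The real index may need additional entries (for example to mark protein boundaries; that is an assumption, not something confirmed here), so treat the result as a lower bound. The helper names `count_residues`, `fits_in_java_array` and `min_parts` are hypothetical.

```python
from typing import Iterable

# Integer.MAX_VALUE; real JVMs may cap array length slightly lower.
JAVA_MAX_ARRAY_LENGTH = 2**31 - 1


def count_residues(lines: Iterable[str]) -> int:
    """Count sequence characters (residues) in FASTA lines, ignoring headers and whitespace."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def fits_in_java_array(residues: int, add_decoys: bool) -> bool:
    """Check a residue count against the Java array limit, doubling it for target+decoy."""
    total = residues * 2 if add_decoys else residues
    return total <= JAVA_MAX_ARRAY_LENGTH


def min_parts(residues: int, add_decoys: bool, limit: int = JAVA_MAX_ARRAY_LENGTH) -> int:
    """Smallest number of equal-sized database parts that keeps each part under the limit."""
    total = residues * 2 if add_decoys else residues
    return max(1, -(-total // limit))  # ceiling division
```

For example, a database with 3 billion residues searched as target plus decoy gives `min_parts(3_000_000_000, True) == 3` under this estimate. The residue count itself could come from `count_residues(open("db.fasta"))`.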
Dear Developers, I recently had the same problem. I was using a 14 GB FASTA file, and after reading the replies in this thread I realised that I needed to split the database for searching. Because a single MS/MS spectrum can be matched to different peptides in different searches, it seems to me that the results of these searches cannot simply be concatenated. Is there an official tool for merging the results of these split searches? The command I use is: Any replies will be appreciated! |
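The concern above, that one spectrum can be matched to different peptides in different database slices, means the merge has to choose one match per spectrum rather than concatenate. This is a minimal Python sketch of that idea only; it does not describe how MzidMerger works. It assumes each slice's results have already been reduced to hypothetical `Psm(spectrum_id, peptide, spec_evalue)` records and keeps the match with the lowest SpecEValue for each spectrum. SpecEValue is used on the assumption that MS-GF+'s EValue scales with the size of the searched database and so is not comparable across slices; FDR estimates would presumably need to be recomputed on the merged set.

```python
from typing import Dict, Iterable, NamedTuple


class Psm(NamedTuple):
    """Hypothetical, simplified PSM record extracted from one slice's results."""
    spectrum_id: str
    peptide: str
    spec_evalue: float


def merge_best_per_spectrum(slices: Iterable[Iterable[Psm]]) -> Dict[str, Psm]:
    """Keep the lowest-SpecEValue PSM for each spectrum across all database slices."""
    best: Dict[str, Psm] = {}
    for psms in slices:
        for psm in psms:
            current = best.get(psm.spectrum_id)
            # On ties, the match seen first is kept.
            if current is None or psm.spec_evalue < current.spec_evalue:
                best[psm.spectrum_id] = psm
    return best
```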
Use the MzidMerger to combine |
Thanks a lot, I will try it! |
Dear developers, I've run into another problem.
When I use the command: dotnet /data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.exe -inDir a -out b, I receive the following error:
Error:
An assembly specified in the application dependencies manifest (MzidMerger.deps.json) has already been found but with a different file extension:
package: 'MzidMerger', version: '1.3.1'
path: 'MzidMerger.dll'
previously found assembly: '/data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.exe'
I'm using Ubuntu 20.04 with dotnet version 5.0.408.
Any replies will be appreciated! |
I think you need to use "dotnet /data/liuqingxiu/wkf/MSGFMerge/net5.0/MzidMerger.dll -inDir a -outDir b". I can't say with certainty, but I know the .exe is designed to be run standalone, so it's probably not the correct file to specify there, and all of the online examples show the use of a .dll. |
I ran an MS-GF+ (v2017.01.13) search using SearchGUI and received the following error while the output files were being generated:
Writing results...
java.lang.NullPointerException
at edu.ucsd.msjava.mzid.MZIdentMLGen.getDBSequence(MZIdentMLGen.java:661)
at edu.ucsd.msjava.mzid.MZIdentMLGen.getPeptideEvidenceList(MZIdentMLGen.java:619)
at edu.ucsd.msjava.mzid.MZIdentMLGen.addSpectrumIdentificationResults(MZIdentMLGen.java:347)
at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:397)
at edu.ucsd.msjava.ui.MSGFPlus.runMSGFPlus(MSGFPlus.java:106)
at edu.ucsd.msjava.ui.MSGFPlus.main(MSGFPlus.java:57)
The search itself finishes without errors; however, no output .mzid files are generated. The command used to run the search was as follows:
ms-gf+ command:
/home/user/anaconda2/jre/bin/java -Xmx50g -jar /run/media/user/Data/rmj_proteomics/SearchGUI-3.2.18/resources/MS-GF+/MSGFPlus.jar -s /run/media/user/Data/rmj_proteomics/proteomics/RECONVERTED/RJ_FC2_DCE.mgf -d /run/media/user/Data/rmj_proteomics/TREMBL_database/nr_fungal/nr_fungal_concatenated_target_decoy.fasta -o /run/media/user/Data/rmj_proteomics/proteomics/nr_fungal_lin/.SearchGUI_temp/RJ_FC2_DCE.msgf.mzid -t 10.0ppm -tda 0 -mod /run/media/user/Data/rmj_proteomics/SearchGUI-3.2.18/resources/MS-GF+/params/Mods.txt -minCharge 2 -maxCharge 6 -inst 3 -thread 23 -m 3 -e 1 -ntt 2 -protocol 0 -minLength 8 -maxLength 45 -n 10 -addFeatures 0 -ti 0,4
Can you advise on how to resolve this error?
Kind regards,
Riegardt Johnson