error running with mzML file #316

Closed
JianAtSeer opened this issue Sep 3, 2021 · 9 comments

Comments

@JianAtSeer

Describe the bug
I tried to run the pipeline on two mzML files, but I got the error:
An exception occured running AlphaPept version 0.3.28:
File extension .mzML not understood.

I read in the documentation that the pipeline relies on pyteomics to read and parse mzML files.
So I tried loading the mzML file with pyteomics separately, and it seems to be fine.
....................................................

File extension .mzML not understood.
....................................................
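For readers who want to repeat the standalone pyteomics check described above, here is a minimal sketch. It assumes pyteomics is installed and that spectra come back as dictionaries with the usual `ms level` and `intensity array` keys; the helper names `summarize_spectra` and `summarize_mzml` are hypothetical and are not part of AlphaPept or pyteomics.

```python
from collections import Counter


def summarize_spectra(spectra):
    """Count spectra per MS level and total peaks from pyteomics-style dicts."""
    levels = Counter()
    n_peaks = 0
    for spectrum in spectra:
        # 'ms level' and 'intensity array' are the keys pyteomics uses for mzML spectra.
        levels[spectrum.get("ms level")] += 1
        n_peaks += len(spectrum.get("intensity array", ()))
    return dict(levels), n_peaks


def summarize_mzml(path):
    """Read an mzML file with pyteomics and summarize it (hypothetical helper)."""
    from pyteomics import mzml  # imported lazily so the summary logic has no hard dependency

    with mzml.read(path) as reader:
        return summarize_spectra(reader)
```

If `summarize_mzml("sample.mzML")` returns sensible counts, the file itself is probably readable, which points to the problem being elsewhere in the pipeline.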
To Reproduce
Steps to reproduce the behavior:

  1. alphapept workflow default_settings.yaml

Expected behavior
The pipeline finishes all the steps.

Screenshots
image

Version (please complete the following information):

  • OS: ubuntu 16
  • Version [e.g. 0.6.8]
  • Installation Type: pip install after cloning the repo from git

Additional context
Add any other context about the problem here. Attached log files or upload data files if possible.

@straussmaximilian
Member

Hi,
It looks as if the mzML import is not integrated correctly. For mzML files, the spectrum title is not defined. Could you upload and share one of these files so that we can ensure compatibility?

@JianAtSeer
Author

JianAtSeer commented Sep 8, 2021

Hi,
Thanks for the quick reply. Actually, after playing around with it a little, I think the issue could be somewhere else. I was able to run the pipeline to completion with a single file, but it fails with multiple files, specifically during the feature-finding step.

In interface.py, the following code section seems to raise an error when the file is not .raw or .d. I am not sure if this is intended. Is feature finding not supported for mzML files? The pipeline seems to run fine when I input just a single mzML file, though.

    # Limit number of processes for Bruker FF
    if step.name == 'find_features':
        base, ext = os.path.splitext(files[0])
        if ext.lower() == '.d':
            memory_available = psutil.virtual_memory().available/1024**3
            n_processes = max((int(memory_available // 25), 1))
            logging.info(f'Using Bruker Feature Finder. Setting Process limit to {n_processes}.')
        elif ext.lower() == '.raw':
            memory_available = psutil.virtual_memory().available/1024**3
            n_processes = max((int(memory_available // 8), 1))
            logging.info(f'Setting Process limit to {n_processes}')
        else:
            raise NotImplementedError('File extension {} not understood.'.format(ext))
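
One way the branch above could avoid rejecting other extensions is to fall back to a default memory budget instead of raising. The sketch below is only an illustration and not necessarily what the maintainers changed: `process_limit`, the lookup table, and the 8 GB default for unlisted extensions such as .mzML are assumptions. The 25 GB and 8 GB values for .d and .raw come from the snippet above.

```python
import os

# GB of memory budgeted per process; .d and .raw values are taken from the quoted snippet.
MEMORY_PER_PROCESS_GB = {".d": 25, ".raw": 8}
# Assumption: treat any other extension (e.g. .mzML) like .raw rather than raising.
DEFAULT_MEMORY_PER_PROCESS_GB = 8


def process_limit(file_path, memory_available_gb):
    """Return how many parallel processes fit in the available memory (at least 1)."""
    _, ext = os.path.splitext(file_path)
    per_process = MEMORY_PER_PROCESS_GB.get(ext.lower(), DEFAULT_MEMORY_PER_PROCESS_GB)
    return max(int(memory_available_gb // per_process), 1)
```

In the real pipeline, `memory_available_gb` would come from `psutil.virtual_memory().available / 1024**3`; it is passed in here so the logic can be checked without psutil.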

straussmaximilian added a commit that referenced this issue Sep 8, 2021
@straussmaximilian
Member

Good catch. I made some fixes in the https://github.com/MannLabs/alphapept/tree/qc_fixes branch. Do you mind testing if this solves the problem? This will then be included in the next release.

@JianAtSeer
Author

Thanks. I just tested the fixes and they work! On a side note, I was wondering if you could help me with a different issue. While testing the branch code, I noticed that feature finding takes a really long time (more than 1 day) on some of my mzML files, while my other files finish in under 20 minutes. They are all from similar samples. Could you help me see why this set of files is special and takes so long? Here is the file: https://seer.box.com/s/yofs6w3vy3twodsbiigf2d1yj8z1kime
Thanks a lot.

@straussmaximilian
Member

Hi,
this sounds like a bug. Thanks for sharing the file, I will investigate.

@JianAtSeer
Author

Hi,
Just checking to see if you have any clue about the reason for the long-running feature detection. Also, do you want me to open this as a separate issue? It is not exactly related to the original issue I opened this ticket about. Thanks.

@straussmaximilian
Member

Hi,
yes, I could reproduce the bug. It is probably some runtime condition, and I will probably have time to investigate / fix this tomorrow.
Good idea to open another issue; then we can reference this properly.

straussmaximilian added a commit that referenced this issue Sep 22, 2021
@straussmaximilian
Member

Bug was related to having zero intensities in the mzML, should now be fixed. Feel free to check out the develop branch, otherwise it will be included in the next release.
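
Until a release with the fix is available, one possible workaround is to drop zero-intensity peaks from each spectrum before feature finding. The sketch below only illustrates that idea and is not the change made in the referenced commit; `drop_zero_intensities` is a hypothetical helper that works on plain sequences of m/z and intensity values.

```python
def drop_zero_intensities(mz_values, intensities):
    """Return (mz, intensity) lists containing only peaks with intensity > 0."""
    if len(mz_values) != len(intensities):
        raise ValueError("mz_values and intensities must have the same length")
    # Keep each m/z value paired with its intensity while filtering.
    kept = [(mz, inten) for mz, inten in zip(mz_values, intensities) if inten > 0]
    mz_kept = [mz for mz, _ in kept]
    intensity_kept = [inten for _, inten in kept]
    return mz_kept, intensity_kept
```

Counting how many peaks this removes per file may also help explain why some files ran far longer than others from similar samples.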

@JianAtSeer
Author

Great! Thanks. I just confirmed that the fix works.
