Updating Qemistree so it runs from SIRIUS workspace and generic input files #145

Open
lfnothias opened this issue Jun 5, 2021 · 6 comments


@lfnothias

Hi @anupriyatripathi

This is to initiate a discussion to resolve the current issue users are having running Qemistree.

Context:

A sustainable solution would be to modify the Qemistree workflow by externalizing the SIRIUS computation part.
The user would provide the SIRIUS workspace as input to run qiime qemistree make-hierarchy. The user would of course be instructed to have computed a minimal set of steps beforehand (SIRIUS/CSIFINGERID), with ZODIAC/CANOPUS as optional.

For even greater flexibility in the long run, and to offer wider support for other similarity functions, like the basic cosine score and those that are being developed (like MS2DeepScore, https://www.biorxiv.org/content/10.1101/2021.04.18.440324v1), the best would be to have the possibility to build the hierarchy from generic input files. These would be:

  • a similarity matrix.
  • feature annotation metadata table(s).
  • the feature quantification file.
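
To make the generic-input idea concrete, here is a minimal sketch of going from a similarity matrix to a hierarchy. It is plain Python with no dependencies, and it assumes average-linkage (UPGMA) clustering on distances defined as 1 - similarity. That choice, the function name `upgma_newick`, and the toy matrix are all assumptions for illustration, not what Qemistree currently does.

```python
from itertools import combinations


def upgma_newick(ids, similarity):
    """Average-linkage (UPGMA) clustering of a symmetric similarity matrix.

    Similarities are assumed to lie in [0, 1]; distance = 1 - similarity.
    Returns the hierarchy as a Newick string with branch lengths.
    """
    # Active clusters: key -> (newick text, node height, number of leaves)
    clusters = {i: (ids[i], 0.0, 1) for i in range(len(ids))}
    # Distances between active clusters, keyed by (smaller key, larger key)
    dist = {
        (i, j): 1.0 - similarity[i][j]
        for i, j in combinations(range(len(ids)), 2)
    }
    next_key = len(ids)
    while len(clusters) > 1:
        # Merge the closest pair of clusters
        (a, b), d = min(dist.items(), key=lambda item: item[1])
        text_a, h_a, n_a = clusters.pop(a)
        text_b, h_b, n_b = clusters.pop(b)
        height = d / 2.0
        merged = "(%s:%.3f,%s:%.3f)" % (
            text_a, height - h_a, text_b, height - h_b)
        # Size-weighted average distance from the merged cluster to the rest
        for c in clusters:
            d_ac = dist.pop((min(a, c), max(a, c)))
            d_bc = dist.pop((min(b, c), max(b, c)))
            dist[(c, next_key)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        del dist[(a, b)]
        clusters[next_key] = (merged, height, n_a + n_b)
        next_key += 1
    return next(iter(clusters.values()))[0] + ";"


# Toy input: three features, f1 and f2 are the most similar pair
feature_ids = ["f1", "f2", "f3"]
toy_similarity = [
    [1.0, 0.9, 0.2],
    [0.9, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
print(upgma_newick(feature_ids, toy_similarity))
# (f3:0.350,(f1:0.050,f2:0.050):0.300);
```

Any similarity function (cosine, Tanimoto, MS2DeepScore, ...) could feed the same step, as long as it yields a symmetric matrix indexed by the same feature identifiers as the quantification file.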

Actually @ElDeveloper, with my support, wrote a Python script for the Earth Microbiome Project that generates a tree/hierarchy from a new SIRIUS workspace. The script only uses the scipy, scikit, and qiime2 libraries. Maybe we should release that very soon to help the users who are struggling?
Is anyone interested in testing that solution?

@lfnothias changed the title from "Running Qemistree on the SIRIUS workspace from the latest version." to "Updating Qemistree so it runs from SIRIUS workspace and generic input files" on Jun 5, 2021
@anupriyatripathi
Collaborator

Hi @lfnothias, thanks for raising these important issues regarding:

a) support for making hierarchies with other similarity/dissimilarity matrices such as cosine scores, Tanimoto scores, etc
b) the incompatibility of q2-qemistree with the latest Sirius version
It is good to know that @ElDeveloper has a prototype to generate a chemical hierarchy from a new Sirius workspace, which means that this can be done.

Would you or @ElDeveloper or someone else be interested in working on adding some of these functionalities to q2-qemistree? I would be able to support the development process by discussing how to best implement this and providing code reviews.

@ElDeveloper
Member

Yes, this is absolutely a great idea. I think probably the best way is to create a new directory format (SiriusWorkspacev440 or something like that). Then we can write two transformers, one to extract the fingerprints and one to extract the feature metadata. The commands you would run are something along the lines of:

To get the fingerprints (in a matrix form)

qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-fingerprints.qza \
--input-format SiriusWorkspacev440 \
--type 'FeatureTable[Frequency]'

To get the feature metadata (for use with other plugins)

qiime tools import \
--input-path emp-sirius-workspace \
--output-path emp-feature-metadata.qza \
--input-format SiriusWorkspacev440 \
--type 'FeatureData[Molecules]'

After doing this, the user would need to use the fingerprints to build the tree (we can add a new action for that).
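
As a rough illustration of the first half of that new action, here is a sketch of turning fingerprints into a pairwise similarity matrix, which a clustering step could then turn into a tree. It assumes the fingerprints have already been binarized (0/1 per molecular property) and uses the Tanimoto (Jaccard) coefficient. The function names and the toy data are hypothetical, and this is not the existing q2-qemistree implementation.

```python
def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumption: two all-zero fingerprints are treated as identical
    return both / either if either else 1.0


def similarity_matrix(fingerprints):
    """Return (sorted feature ids, square similarity matrix as nested lists)."""
    ids = sorted(fingerprints)
    matrix = [
        [tanimoto_similarity(fingerprints[row], fingerprints[col])
         for col in ids]
        for row in ids
    ]
    return ids, matrix


# Toy fingerprints: feature id -> presence/absence of four properties
toy_fingerprints = {
    "f1": [1, 1, 0, 1],
    "f2": [1, 1, 0, 0],
    "f3": [0, 0, 1, 1],
}
ids, matrix = similarity_matrix(toy_fingerprints)
for feature_id, row in zip(ids, matrix):
    print(feature_id, ["%.2f" % value for value in row])
```

Keeping the matrix as the intermediate product means a tree-building action would not need to know whether the similarities came from Sirius fingerprints or from another tool.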

The biggest change from this is that we would leave running Sirius up to the end users, and we would mostly be handling the tree construction, QC, etc. by parsing the Sirius workspaces. I kinda like this idea because when Sirius changes its outputs in the future, we'll only need to write a new directory format, for example SiriusWorkspacev666. The artifact outputs would remain the same (a feature table and the corresponding feature metadata).

@lfnothias
Author

Nice. That seems like a very practical way to deal with SIRIUS in the long run!
If we could also support a similarity matrix as input, that would give maximum flexibility for incorporating other tools/similarity functions.

@ElDeveloper
Member

Great, thanks @lfnothias. Any thoughts @anupriyatripathi?

@amcaraballor

Hey dear all @lfnothias @ElDeveloper @anupriyatripathi, do you guys have an updated way to run this Qemistree workflow?

@anupriyatripathi
Collaborator

Hi @amcaraballor, we have worked on updating Qemistree with @helenamrusso. Her GitHub branch has the latest version, which is compatible with the latest version of Sirius. It will be merged into the main workflow soon, but you can use the branch if needed. @helenamrusso has been using it and helping other users as well.
