Orthosnap output #9

Closed
evelinepinseel opened this issue Dec 14, 2023 · 11 comments

Comments

@evelinepinseel

Hello,

OrthoSNAP is a great tool - thanks for developing it!

I have a general question. Is there a way for orthoSNAP to output the removed in-paralogs and their corresponding orthologs? For my downstream ortholog-level analyses, I am looking for a tool that can identify orthologs but also tells me which in-paralogs correspond to these orthologs.

If this is not possible, can orthoSNAP output singleton orthologs (i.e., a 'lonely' sequence in the gene tree)? I noticed in orthoSNAP's test example that these singletons are not assigned to their own ortholog. If this were possible, I could come up with a workaround for my first issue.

Thanks in advance!

Best wishes,
Eveline

@JLSteenwyk
Owner

Hi Eveline,

Firstly, thank you for choosing to use OrthoSNAP! Your usage and feedback are greatly appreciated!

Regarding both of your questions, I believe I understand them correctly; however, I want to make sure I do. Would it be possible to provide a figure, or to draw on top of the one from the test example, to clarify your question?

I apologize for any inconvenience this may cause you, but I value your question and want to make sure I appropriately address it.

Thanks again for using OrthoSNAP!

Best,

Jacob

@evelinepinseel
Author

Hi Jacob,

Many thanks for the quick reply and interest in my question; absolutely no inconvenience on my part!

Let me explain it using the example tree from the orthoSNAP paper.

Question 1: orthoSNAP outputs one sequence per species in each ortholog. This means that in the test example, only one of each species' sequences labeled 'copy' is retained. In my case, I would need to figure out which of these removed 'copy' sequences correspond to which ortholog. Given that I am dealing with 9000+ gene trees, I would like to do this automatically. Is it possible for orthoSNAP to output which in-paralogs were removed? For example, say that the test example retains species2|gene2-copy0 and species4|gene2-copy1 in ortholog1; would it be possible for orthoSNAP to inform me that species2|gene2-copy1 and species4|gene2-copy0 were pruned from ortholog1?

Question 2 (if there is a solution to question 1, this question becomes obsolete; it's more of a workaround): I would like to run orthoSNAP with an occupancy threshold of 1 taxon instead of the default 50%. I tried doing this on the test example and noticed that orthoSNAP outputs all orthologs except for the singleton (species0|gene3 in the test example). Is there a way for orthoSNAP to also output this singleton as its own ortholog?
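One possible workaround, sketched here under assumptions, is to compare the sequence IDs in the input FASTA file against the IDs that appear in the SNAP-OG FASTA files. This Python sketch assumes the SNAP-OG files keep the original sequence headers; the helper names fasta_ids and unassigned_ids are hypothetical. An unassigned ID is either a pruned in-paralog or a sequence not placed in any SNAP-OG (such as the singleton species0|gene3), so this comparison alone cannot tell the two cases apart.

```python
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Collect sequence IDs: header text after '>' up to the first whitespace."""
    ids: List[str] = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:  # ignore empty headers
                ids.append(header.split()[0])
    return ids


def unassigned_ids(input_ids: Iterable[str], snap_og_ids: Iterable[str]) -> List[str]:
    """Hypothetical helper: IDs from the input FASTA that occur in no SNAP-OG, in input order.

    ASSUMPTION: SNAP-OG FASTA files reuse the original sequence headers.
    """
    assigned = set(snap_og_ids)
    return [seq_id for seq_id in input_ids if seq_id not in assigned]


if __name__ == "__main__":
    # Toy in-memory FASTA lines; with real files, pass an open file handle instead.
    input_fasta = [">species0|gene3", "MKT", ">species1|gene1", "MKV",
                   ">species2|gene2-copy0", "MKA", ">species2|gene2-copy1", "MKA"]
    snap_og_fasta = [">species1|gene1", "MKV", ">species2|gene2-copy0", "MKA"]
    print(unassigned_ids(fasta_ids(input_fasta), fasta_ids(snap_og_fasta)))
    # prints ['species0|gene3', 'species2|gene2-copy1']
```

When processing thousands of gene trees, the IDs from all SNAP-OG files derived from one input FASTA would be pooled before calling unassigned_ids.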

[Image: orthosnap test-example tree]

Thanks again!

@JLSteenwyk
Owner

Hi Eveline,

Just wanted to let you know that I am working on this and will get back to you.

@evelinepinseel
Author

evelinepinseel commented Dec 15, 2023 via email

@JLSteenwyk
Owner

Hi Eveline,

I hope you are doing well.

Please see the new -rih argument, which stands for "report inparalog handling". This argument addresses your request by writing a three-column file: the first column is the relevant SNAP-OG file, the second column is the inparalog that was kept, and the third column lists the inparalog(s) that were removed. See the docs here: https://jlsteenwyk.com/orthosnap/usage/index.html#report-inparalog-handling
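The three-column -rih report can be loaded with a short script to map each removed inparalog back to the SNAP-OG it was pruned from. This is a minimal Python sketch, not part of OrthoSNAP. The delimiters are assumptions (tab between columns, semicolon between multiple removed inparalogs), so check the linked documentation for the actual format and adjust col_sep and removed_sep. The function names and the file name example.orthosnap.0.fa are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Report = Dict[str, List[Tuple[str, List[str]]]]


def parse_inparalog_report(
    lines: Iterable[str],
    col_sep: str = "\t",     # ASSUMPTION: column delimiter of the -rih file
    removed_sep: str = ";",  # ASSUMPTION: separator between several removed inparalogs
) -> Report:
    """Group (kept, [removed, ...]) records by SNAP-OG file name."""
    report: Report = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(col_sep, 2)
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        snap_og, kept, removed = (f.strip() for f in fields)
        removed_ids = [r.strip() for r in removed.split(removed_sep) if r.strip()]
        report[snap_og].append((kept, removed_ids))
    return dict(report)


def removed_to_snap_og(report: Report) -> Dict[str, str]:
    """Invert the report: removed inparalog ID -> SNAP-OG file it was pruned from."""
    lookup: Dict[str, str] = {}
    for snap_og, records in report.items():
        for _kept, removed_ids in records:
            for seq_id in removed_ids:
                lookup[seq_id] = snap_og
    return lookup


if __name__ == "__main__":
    # Hypothetical report lines built from the example earlier in this thread.
    example = [
        "example.orthosnap.0.fa\tspecies2|gene2-copy0\tspecies2|gene2-copy1",
        "example.orthosnap.0.fa\tspecies4|gene2-copy1\tspecies4|gene2-copy0",
    ]
    print(removed_to_snap_og(parse_inparalog_report(example)))
```

With a real report, pass an open file handle (for example, the result of open() on the -rih output) as lines.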

Thank you for choosing to use OrthoSNAP. You may find some of the other software I have developed (see here: https://jlsteenwyk.com/software.html) helpful for your studies.

Happy coding and happy holidays!

Best,

Jacob

P.S., Congrats on your contributions to the field of diatom evolution. I thought Pinseel (2022) ISME was particularly cool and underscores the importance of considering strain heterogeneity in 'omic studies.

@evelinepinseel
Author

evelinepinseel commented Dec 19, 2023 via email

JLSteenwyk reopened this Dec 19, 2023
@JLSteenwyk
Owner

Hi Eveline,

Sorry about that.

Would you be willing to share your test files and expected output?

best,

Jacob

@evelinepinseel
Author

evelinepinseel commented Dec 19, 2023 via email

@JLSteenwyk
Owner

Hi Eveline,

Sorry about that error. It should be fixed as of 1.3.0. Regarding monophyletic pruning versus paraphyletic pruning, I don't think OrthoSNAP should support paraphyletic pruning. It would be too difficult to determine which paralog to keep.
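The monophyletic versus paraphyletic distinction can be illustrated with a toy Python sketch. This is not OrthoSNAP's implementation, which reads real tree files and also applies a branch-support threshold. The sketch assumes tip names follow the species|gene convention used in this thread and represents a tree as nested tuples; the function names are hypothetical. It returns the maximal clades whose tips all come from a single species (monophyletic in-paralogs). When two copies from one species are separated by another species' sequence, no such clade exists, so there is no single clade to collapse and no obvious choice of which copy to keep.

```python
from typing import List, Tuple, Union

# A tree is either a tip name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def species_of(tip: str) -> str:
    # ASSUMPTION: tip names look like "species|gene"
    return tip.split("|", 1)[0]


def tips(tree: Tree) -> List[str]:
    """All tip names below a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(tips(child))
    return out


def monophyletic_inparalog_groups(tree: Tree) -> List[List[str]]:
    """Maximal clades with >= 2 tips that all come from one species."""
    leaves = tips(tree)
    if len(leaves) >= 2 and len({species_of(t) for t in leaves}) == 1:
        return [leaves]
    if isinstance(tree, str):
        return []
    groups: List[List[str]] = []
    for child in tree:
        groups.extend(monophyletic_inparalog_groups(child))
    return groups


if __name__ == "__main__":
    mono = (("species2|gene2-copy0", "species2|gene2-copy1"), "species1|gene2")
    para = (("species2|gene2-copy0", "species1|gene2"), "species2|gene2-copy1")
    print(monophyletic_inparalog_groups(mono))  # one collapsible clade
    print(monophyletic_inparalog_groups(para))  # nothing to collapse
```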

One thing I was pleasantly surprised to see when developing OrthoSNAP is that SC-OGs and SNAP-OGs have the same phylogenetic information content (see Fig. 3 from the manuscript). I think this, in part, stems from our strict definition of inparalog pruning. More importantly, this observation suggests SNAP-OGs are useful for downstream phylogenomic analyses, such as genome-wide scans of selection, species tree inference, and calculations of gene-gene coevolution.

Thank you again for choosing OrthoSNAP! Please let me know if there are any other features you may want, including for other software like ClipKIT and PhyKIT!

best,

Jacob

@evelinepinseel
Author

evelinepinseel commented Dec 20, 2023 via email

@JLSteenwyk
Owner

Yay - so glad I could help your research program!

I look forward to seeing what you do with OrthoSNAP!

Wishing you happy holidays and a strong start to 2024!

best,

Jacob
