New protein used for HGT #5

Closed
JianjiaDTU opened this issue Apr 16, 2023 · 1 comment

Comments

@JianjiaDTU

The README notes that the protein ID should come from the GenBank protein database. However, I have de novo assembled a new genome and want to detect HGT genes in it by searching against the NCBI database. How should I handle this case? In short, if I only want to calculate the AI value for a new protein, is that possible?

@le-yuan
Collaborator

le-yuan commented Apr 24, 2023

You may be the person who contacted us by email earlier; I am also posting my response here for reference.

Thanks for your interest. For your new proteins, it is also possible to detect HGT events by using HGTphyloDetect. There are two possible solutions we can offer you, depending on your programming background:

(1) If you have basic Python programming skills, you can modify the code in HGT_workflow.py to define the kingdom and subphylum for your input protein. In the README, we state that the protein ID should be from the GenBank protein database so that the program can determine the kingdom and subphylum automatically. For your new proteins, you will need to supply this information manually.

(2) If you have no programming background, you can use the GenBank protein ID of a protein that is phylogenetically very close to your query protein. Keep your original sequence and change only the protein ID, so that the program can look it up. The input protein ID in HGTphyloDetect is primarily used to determine the clade of the query protein, so that the BLASTP hits can be divided into ingroup and outgroup. Therefore, using the ID of a similar protein should work.
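To illustrate option (1), here is a minimal standalone sketch of supplying the kingdom and subphylum by hand, splitting BLASTP hits into ingroup and outgroup, and computing an AI value. It does not reproduce the actual code in HGT_workflow.py. The `Hit` class, the function names, and the taxon names in the example are hypothetical. The sketch also assumes the commonly used Alien Index form, AI = ln(bbhG + 1e-200) - ln(bbhO + 1e-200), with ingroup meaning "same kingdom, different subphylum" and outgroup meaning "different kingdom"; please check these assumptions against the README before relying on them.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTP hit with its taxonomy already resolved (hypothetical structure)."""
    accession: str
    evalue: float
    kingdom: str
    subphylum: str


def split_hits(hits, query_kingdom, query_subphylum):
    """Split hits using a manually supplied query lineage.

    Assumption: ingroup = same kingdom but different subphylum,
    outgroup = different kingdom. Hits from the query's own
    subphylum are ignored.
    """
    ingroup, outgroup = [], []
    for hit in hits:
        if hit.kingdom != query_kingdom:
            outgroup.append(hit)
        elif hit.subphylum != query_subphylum:
            ingroup.append(hit)
    return ingroup, outgroup


def alien_index(ingroup, outgroup, floor=1e-200):
    """AI = ln(best ingroup E-value + floor) - ln(best outgroup E-value + floor).

    Assumption: if a group has no hits, its best E-value is taken as 1.0.
    """
    best_in = min((h.evalue for h in ingroup), default=1.0)
    best_out = min((h.evalue for h in outgroup), default=1.0)
    return math.log(best_in + floor) - math.log(best_out + floor)


# Manually supplied lineage for a newly assembled protein (placeholder values).
QUERY_KINGDOM = "Fungi"
QUERY_SUBPHYLUM = "Saccharomycotina"

example_hits = [
    Hit("hit_A", 1e-50, "Fungi", "Pezizomycotina"),
    Hit("hit_B", 1e-120, "Bacteria", "NA"),
    Hit("hit_C", 1e-80, "Fungi", "Saccharomycotina"),
]

ingroup, outgroup = split_hits(example_hits, QUERY_KINGDOM, QUERY_SUBPHYLUM)
ai = alien_index(ingroup, outgroup)
print(f"ingroup={len(ingroup)} outgroup={len(outgroup)} AI={ai:.2f}")
```

A large positive AI means the best outgroup hit is much stronger than the best ingroup hit. Any threshold applied on top of this value should be taken from the tool's documentation rather than from this sketch.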
