Does a prediction with no coordinates indicate that the whole sequence is a viral sequence? #24

Closed
shenwei356 opened this issue Jun 27, 2023 · 2 comments

Comments

@shenwei356

I just found the answer:

coordinates: 1-indexed coordinates of the provirus region within host sequences. Will be NA for viruses that were not predicted to be integrated.
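A minimal sketch of how the `coordinates` field could be interpreted downstream, assuming it is either the literal string `NA` or a `start-end` pair as in the tables in this issue. The helper name `parse_coordinates` is hypothetical and not part of geNomad.

```python
from typing import Optional, Tuple


def parse_coordinates(field: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for a provirus region, or None when the field is NA.

    Assumption: NA means the virus was not predicted to be integrated,
    so the prediction applies to the input sequence as a whole.
    """
    field = field.strip()
    if field == "NA":
        return None
    start, end = field.split("-")
    return int(start), int(end)


def region_length(coords: Tuple[int, int]) -> int:
    # Coordinates are 1-indexed and inclusive, hence the +1.
    start, end = coords
    return end - start + 1


provirus = parse_coordinates("774617-826650")
whole_sequence = parse_coordinates("NA")
print(provirus, region_length(provirus), whole_sequence)
```

For the provirus row in this issue, the computed region length agrees with the reported `length` column (52034), which is consistent with the coordinates being inclusive on both ends.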

Normal prediction result.

seq_name                            length   topology   coordinates     n_genes   genetic_code   virus_score   fdr   n_hallmarks   marker_enrichment   taxonomy
---------------------------------   ------   --------   -------------   -------   ------------   -----------   ---   -----------   -----------------   ---------------------------------------------------------------
CP000388.1|provirus_774617_826650   52034    Provirus   774617-826650   58        11             0.9401        NA    11            45.1836             Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes

Prediction without coordinates (genome is GCA_000166735.2, a MAG).

seq_name         length   topology              coordinates   n_genes   genetic_code   virus_score   fdr   n_hallmarks   marker_enrichment   taxonomy
--------------   ------   -------------------   -----------   -------   ------------   -----------   ---   -----------   -----------------   ---------------------------------------------------------------
AEMJ01000831.1   698      No terminal repeats   NA            1         11             0.9638        NA    0             1.7183              Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
AEMJ01000737.1   1746     No terminal repeats   NA            2         11             0.9244        NA    0             1.7183              Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
AEMJ01000706.1   826      No terminal repeats   NA            1         11             0.8908        NA    0             1.4495              Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
AEMJ01000847.1   3369     No terminal repeats   NA            2         11             0.8785        NA    0             1.7183              Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
AEMJ01000526.1   288      No terminal repeats   NA            2         11             0.8497        NA    0             0.0000              Unclassified
AEMJ01000792.1   2672     No terminal repeats   NA            3         11             0.8414        NA    0             1.7183              Unclassified
AEMJ01000320.1   1885     No terminal repeats   NA            2         11             0.8297        NA    0             1.7183              Unclassified
AEMJ01000546.1   283      No terminal repeats   NA            2         11             0.8262        NA    0             0.0000              Unclassified
AEMJ01000712.1   349      No terminal repeats   NA            2         11             0.8238        NA    0             0.0000              Unclassified

$ seqkit stats GCA_000166735.2.fna.gz
file                    format  type  num_seqs    sum_len  min_len  avg_len  max_len
GCA_000166735.2.fna.gz  FASTA   DNA        893  2,298,088      101  2,573.4   82,336

$ seqkit seq -n GCA_000166735.2.fna.gz | head -n 3
AEMJ01000893.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00909, whole genome shotgun sequence
AEMJ01000892.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00908, whole genome shotgun sequence
AEMJ01000891.1 UNVERIFIED_ORG: Leuconostoc inhae KCTC 3774 contig00907, whole genome shotgun sequence
@apcamargo
Owner

Good that you already found the answer :)

As a note, be careful with those very short sequences, particularly the ones without markers. I usually require a hallmark for the short ones.
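One way this rule of thumb could be applied to a tab-separated summary with the columns shown above is sketched below. The 5000 bp cutoff and the helper name `keep_prediction` are illustrative assumptions, not values or functions recommended by geNomad; the sample rows are copied from the tables in this issue.

```python
import csv
import io

# Hypothetical cutoff: sequences shorter than this must carry a hallmark gene.
SHORT_SEQ_LENGTH = 5000


def keep_prediction(row: dict, short_len: int = SHORT_SEQ_LENGTH) -> bool:
    """Keep long sequences; keep short ones only if they have >= 1 hallmark."""
    length = int(row["length"])
    n_hallmarks = int(row["n_hallmarks"])
    return length >= short_len or n_hallmarks >= 1


# A few rows from this issue, reduced to the columns the filter needs.
SAMPLE_TSV = (
    "seq_name\tlength\tn_hallmarks\n"
    "CP000388.1|provirus_774617_826650\t52034\t11\n"
    "AEMJ01000831.1\t698\t0\n"
    "AEMJ01000847.1\t3369\t0\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = [row["seq_name"] for row in rows if keep_prediction(row)]
print(kept)
```

With this cutoff only the provirus row survives, since both short contigs have zero hallmarks. The cutoff is worth tuning per dataset.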

@shenwei356
Author

I forgot to say that I used the --relaxed flag.

Thanks for the message. :)
