Inconsistent start score computed for some genes #19

Open
althonos opened this issue Oct 21, 2022 · 1 comment
Labels
bug Something isn't working

althonos commented Oct 21, 2022

While adding some tests for the GFF output (in order to fix #18), I noticed that the start score (sscore) of some genes deviates from the Prodigal reference results. This had not been verified before, since GFF is the only output format that contains these statistics. The change in start score marginally affects the overall score and the confidence of each affected gene.

Genes scored with Prodigal:

NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;rscore=-5.22;uscore=-1.07;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.77;score=26.33;cscore=27.03;sscore=-0.70;rscore=-6.04;uscore=0.68;tscore=3.41;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.26;cscore=200.52;sscore=17.74;rscore=14.49;uscore=-0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653	Prodigal_v2.6.3	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94;

Genes scored with Pyrodigal v0.6.4:

NODE_23_length_79939_cov_26.984653_1	pyrodigal_v0.6.4	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;rscore=-5.22;uscore=-1.07;tscore=3.94;
NODE_23_length_79939_cov_26.984653_2	pyrodigal_v0.6.4	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.77;score=26.33;cscore=27.03;sscore=-0.70;rscore=-6.04;uscore=0.68;tscore=3.41;
NODE_23_length_79939_cov_26.984653_3	pyrodigal_v0.6.4	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653_4	pyrodigal_v0.6.4	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.26;cscore=200.52;sscore=17.74;rscore=14.49;uscore=-0.04;tscore=3.94;
NODE_23_length_79939_cov_26.984653_5	pyrodigal_v0.6.4	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94;

Genes scored with Pyrodigal v1.1.2:

NODE_23_length_79939_cov_26.984653_1	pyrodigal_v1.1.2	CDS	1	177	8.4	-	0	ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;conf=87.32;score=8.39;cscore=10.74;sscore=-2.35;rscore=-5.22;uscore=-1.07;tscore=3.94
NODE_23_length_79939_cov_26.984653_2	pyrodigal_v1.1.2	CDS	168	386	25.1	-	0	ID=1_2;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.251;conf=99.69;score=25.07;cscore=27.03;sscore=-1.96;rscore=-6.04;uscore=0.68;tscore=3.41
NODE_23_length_79939_cov_26.984653_3	pyrodigal_v1.1.2	CDS	389	1483	186.7	-	0	ID=1_3;partial=00;start_type=ATG;rbs_motif=GGAGG;rbs_spacer=5-10bp;gc_cont=0.254;conf=99.99;score=186.70;cscore=168.23;sscore=18.47;rscore=14.49;uscore=0.04;tscore=3.94
NODE_23_length_79939_cov_26.984653_4	pyrodigal_v1.1.2	CDS	1632	2981	218.9	-	0	ID=1_4;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=3-4bp;gc_cont=0.296;conf=99.99;score=218.91;cscore=200.52;sscore=18.39;rscore=14.49;uscore=-0.04;tscore=3.94
NODE_23_length_79939_cov_26.984653_5	pyrodigal_v1.1.2	CDS	3569	3925	25.5	+	0	ID=1_5;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.266;conf=99.72;score=25.49;cscore=21.09;sscore=4.41;rscore=1.46;uscore=-1.00;tscore=3.94

After bisecting, I found that the bug was introduced between v0.6.4 and v0.7.0.
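
The comparison above can be automated. The following is a minimal sketch, not part of Pyrodigal: `parse_attributes` and `diff_scores` are hypothetical helper names, and the 0.005 tolerance is an assumption chosen because the values are printed with two decimals. It parses the attributes column of each GFF line and lists the fields that differ between a reference line and a candidate line, using gene 1_1 from the listings above as input.

```python
def parse_attributes(gff_line):
    """Return the ninth (attributes) column of a GFF line as a dict."""
    column = gff_line.rstrip("\n").split("\t")[8]
    # Prodigal ends the column with ';', so empty items are skipped.
    return dict(item.split("=", 1) for item in column.split(";") if item)


def diff_scores(reference, candidate, keys=("conf", "score", "sscore"), tol=0.005):
    """List (gene ID, key, reference value, candidate value) mismatches."""
    mismatches = []
    for ref_line, cand_line in zip(reference, candidate):
        ref, cand = parse_attributes(ref_line), parse_attributes(cand_line)
        for key in keys:
            ref_value, cand_value = float(ref[key]), float(cand[key])
            if abs(ref_value - cand_value) > tol:
                mismatches.append((ref["ID"], key, ref_value, cand_value))
    return mismatches


# Gene 1_1 as reported above by Prodigal v2.6.3 and by Pyrodigal v1.1.2.
prodigal_line = (
    "NODE_23_length_79939_cov_26.984653\tProdigal_v2.6.3\tCDS\t1\t177\t8.4\t-\t0\t"
    "ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;"
    "conf=90.13;score=9.62;cscore=10.74;sscore=-1.12;rscore=-5.22;uscore=-1.07;tscore=3.94;"
)
pyrodigal_line = (
    "NODE_23_length_79939_cov_26.984653_1\tpyrodigal_v1.1.2\tCDS\t1\t177\t8.4\t-\t0\t"
    "ID=1_1;partial=10;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.237;"
    "conf=87.32;score=8.39;cscore=10.74;sscore=-2.35;rscore=-5.22;uscore=-1.07;tscore=3.94"
)

for gene_id, key, ref_value, cand_value in diff_scores([prodigal_line], [pyrodigal_line]):
    print(f"{gene_id}: {key} differs ({ref_value} vs {cand_value})")
```

For this gene the sketch reports `conf`, `score` and `sscore` as mismatching, while `cscore`, `rscore`, `uscore` and `tscore` are identical in both listings.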

@althonos althonos added the bug Something isn't working label Oct 21, 2022

althonos commented Oct 21, 2022

It looks like the bug may come from an odd behaviour of Prodigal itself, and it only occurs in metagenomic mode.

In the original Prodigal code, the gene data string is created as soon as the best genes are identified, but the nodes may still be modified after that, so the gene data string can disagree with the actual attributes of the start node. This only occurs for genes that have been corrected with eliminate_bad_genes.
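
The mechanism can be illustrated with a toy model. This is only a sketch of the general pattern (a string formatted early goes stale once the object it describes is modified), not Prodigal's or Pyrodigal's actual code: the `Node` class, the `format_gene_data` helper and the score values are all hypothetical and chosen arbitrarily.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a start node; only the start score is modelled."""
    sscore: float


def format_gene_data(node):
    """Format the node's start score the way a GFF attribute would show it."""
    return f"sscore={node.sscore:.2f}"


node = Node(sscore=1.00)             # arbitrary value, for illustration only

# The gene data string is built as soon as the gene is selected...
snapshot = format_gene_data(node)

# ...but the node is modified afterwards (arbitrary new value).
node.sscore = 2.50

# Formatting from the live node now disagrees with the stored string.
fresh = format_gene_data(node)
print(snapshot, fresh, snapshot == fresh)
```

An implementation that writes the stored string and one that formats the output from the node attributes at write time would therefore report different start scores for the same gene.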
