--minFraction too stringent for fragmented genomes? #70

Closed
wwood opened this issue Jul 17, 2020 · 9 comments

@wwood

wwood commented Jul 17, 2020

Hi, thanks to a report by @apcamargo at wwood/galah#7, I came across an issue with --minFraction on these fragmented genomes. They seem to align well:

$ fastANI -q 1.fna -r 2.fna -o /dev/stdout --minFraction 0.2 2>/dev/null
1.fna	2.fna	97.4762	228	629
$ fastANI -r 1.fna -q 2.fna -o /dev/stdout 2>/dev/null
2.fna	1.fna	98.351	232	255

But when --minFraction 0.5 is used, the hit goes away, even though 232/255 > 0.5:

$ fastANI -q 1.fna -r 2.fna -o /dev/stdout --minFraction 0.5 2>/dev/null
$ fastANI -r 1.fna -q 2.fna -o /dev/stdout --minFraction 0.5 2>/dev/null
$ fastANI -q 1.fna -r 2.fna -o /dev/stdout  --minFraction 0.5 --fragLen 1000 2>/dev/null
1.fna	2.fna	98.2643	1113	2276

(galah-dev) ben@u2:~/git/galah$ fastANI --version
version 1.31
(galah-dev) ben@u2:~/git/galah/antonio$ seqstat 1.fna
seqstat - show some simple statistics on a sequence file
SQUID 1.9g (January 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Format:              FASTA
Type (of 1st seq):   DNA
Number of sequences: 369
Total # residues:    2456354
Smallest:            2028
Largest:             32450
Average length:      6656.8
(galah-dev) ben@u2:~/git/galah/antonio$ seqstat 2.fna
seqstat - show some simple statistics on a sequence file
SQUID 1.9g (January 2003)
Copyright (C) 1992-2003 HHMI/Washington University School of Medicine
Freely distributed under the GNU General Public License (GPL)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Format:              FASTA
Type (of 1st seq):   DNA
Number of sequences: 409
Total # residues:    1468919
Smallest:            2005
Largest:             16940
Average length:      3591.5

Filtering out this alignment by the minFraction seems incorrect to me. I wonder what the definition of the minFraction actually is. Is it the fraction of the total genome length or the fraction of the genome that is long enough to be included as a fragment, or something along those lines?
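To make the fraction in question concrete, here is a minimal Python sketch that parses the two output rows above and computes matched fragments over total query fragments. It assumes the last two columns are the count of mapped fragments and the total number of query fragments; the helper names are hypothetical, and this is only one candidate reading of minFraction, not necessarily what fastANI computes.

```python
def parse_fastani_row(row: str) -> dict:
    """Split one tab-delimited fastANI output row into named fields."""
    query, ref, ani, matched, total = row.rstrip("\n").split("\t")
    return {
        "query": query,
        "reference": ref,
        "ani": float(ani),
        "matched_fragments": int(matched),  # assumed meaning of column 4
        "query_fragments": int(total),      # assumed meaning of column 5
    }


def fragment_fraction(row: str) -> float:
    """Matched fragments divided by total query fragments (one candidate reading)."""
    fields = parse_fastani_row(row)
    return fields["matched_fragments"] / fields["query_fragments"]


# The two rows reported above, with 1.fna and 2.fna each used as the query.
rows = [
    "1.fna\t2.fna\t97.4762\t228\t629",
    "2.fna\t1.fna\t98.351\t232\t255",
]
fractions = [fragment_fraction(r) for r in rows]
for row, frac in zip(rows, fractions):
    print(row.split("\t")[0], "as query:", round(frac, 3))
```

Under this reading the result depends on which genome is the query: with 2.fna (the smaller genome) as the query the fraction is above 0.5, and with 1.fna as the query it is below 0.5.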

Thanks, ben

@cjain7
Member

cjain7 commented Jul 26, 2020

Check this line in the code, it should be clear from there.

@cjain7
Member

cjain7 commented Jul 26, 2020

Essentially it checks the ratio of shared genome length vs. the length of the smaller of two genomes being compared. In older versions, FastANI had an absolute cutoff on shared genome length before trusting the ANI value, which was not good as the cutoff value should ideally depend on genome lengths.
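As a rough illustration of that description, the sketch below computes shared length over the length of the smaller genome, using the seqstat totals from the first comment. It rests on assumptions that are not confirmed in this thread: that shared length can be approximated as matched fragments times the fragment length, and that the fragment length is 3000 when --fragLen is not given. The function name is hypothetical, and the actual check in the code may differ.

```python
FRAG_LEN = 3000  # assumption: fragment length used when --fragLen is not passed


def shared_fraction(matched_fragments: int, len_a: int, len_b: int,
                    frag_len: int = FRAG_LEN) -> float:
    """Approximate shared length divided by the length of the smaller genome."""
    shared_length = matched_fragments * frag_len  # assumed approximation
    return shared_length / min(len_a, len_b)


# Total residues reported by seqstat for 1.fna and 2.fna.
len_1 = 2456354
len_2 = 1468919

frac_228 = shared_fraction(228, len_1, len_2)                   # 1.fna as query
frac_232 = shared_fraction(232, len_1, len_2)                   # 2.fna as query
frac_1000 = shared_fraction(1113, len_1, len_2, frag_len=1000)  # --fragLen 1000 run

for label, frac in [("228 x 3000", frac_228),
                    ("232 x 3000", frac_232),
                    ("1113 x 1000", frac_1000)]:
    print(label, "->", round(frac, 3), "passes 0.5" if frac >= 0.5 else "below 0.5")
```

Under these assumptions both default-length comparisons fall below 0.5 while the --fragLen 1000 run is above it, which would be consistent with the behaviour reported above. With contigs averaging about 3.6 kb, much of each contig may not fit into a whole fragment, so a ratio against total genome length could be far lower than 232/255.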

@wwood
Author

wwood commented Jul 26, 2020 via email

@cjain7
Member

cjain7 commented Jul 26, 2020

I think I understand your point. Are you able to attach the fna files here? I'll take a look.

@wwood
Author

wwood commented Jul 26, 2020

I've sent the MAGs to you over email just now.

wwood added a commit to wwood/galah that referenced this issue Aug 13, 2020
The 'test_fraglen' is now commented out as the test is now invalid.

See also:
ParBLiSS/FastANI#70

Reported by: @apcamargo
cjain7 added a commit that referenced this issue Aug 20, 2020
@cjain7
Member

cjain7 commented Aug 20, 2020

Apologies for the delay in my response. I've revised the master branch to fix this. It is now working on this example.

$EXE -q MAG52.fna -r MAG189.fna -o /dev/stdout --minFraction 0.5
MAG52.fna	MAG189.fna	97.4762	228	629

$EXE -r MAG52.fna -q MAG189.fna -o /dev/stdout --minFraction 0.5
MAG189.fna	MAG52.fna	98.351	232	255

@wwood
Author

wwood commented Aug 20, 2020 via email

@cjain7 cjain7 closed this as completed Aug 20, 2020
@simonrharris

Hi, would it be possible to add this fix to a release?

@cjain7
Member

cjain7 commented Sep 30, 2020

Done.
