Question about max-target-seqs option #29

Closed
wmnwmn opened this issue Dec 24, 2015 · 15 comments

@wmnwmn wmnwmn commented Dec 24, 2015

Question, does max-target-seqs=N mean that the overall best N matches are shown, or only the first N that pass the e-value threshold? Hopefully the former, which would be a real improvement over the similar option of blastx.

Owner

@bbuchfink bbuchfink commented Dec 27, 2015

It means the first N matches that pass the e-value threshold. But if you want to ignore the e-value, you can just set the threshold to something high like 10.

Author

@wmnwmn wmnwmn commented Dec 27, 2015

If I set evalue to 10 would I not then get the first N matches, regardless of quality? But what I want is to find the best N matches overall.

Owner

@bbuchfink bbuchfink commented Dec 27, 2015

The matches are still sorted by e-value so you would get the best N.

Author

@wmnwmn wmnwmn commented Dec 27, 2015

Great, thanks!

Author

@wmnwmn wmnwmn commented Dec 28, 2015

Actually, now I'm confused. If -e 10 -k 5 gives the top 5 overall hits for each query, shouldn't -e .00001 -k 5 give the top 5 hits under 1E-5? But your original reply seems to imply that the second set of parameters would only give the first hits that were found under 1E-5, which might not be any of the top 5 best hits.

Owner

@bbuchfink bbuchfink commented Dec 29, 2015

By first N I meant the best N matches, not the first N that are found.
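
The behaviour described in this reply can be modelled in a few lines. This is only a sketch of the semantics as stated in the thread (filter by the e-value threshold, then keep the N lowest e-values); it is not DIAMOND's actual implementation. The `Hit` type, the `best_n_hits` function, the sample hits, and the inclusive comparison against the threshold are all assumptions made for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float


def best_n_hits(hits, max_evalue, n):
    """Keep hits at or below max_evalue, then return the n lowest e-values."""
    # Assumption: the threshold is inclusive; the thread does not say.
    passing = [h for h in hits if h.evalue <= max_evalue]
    return sorted(passing, key=lambda h: h.evalue)[:n]


# Hypothetical hits, listed in the order a search might happen to find them.
hits = [
    Hit("s1", 1e-3),
    Hit("s2", 1e-50),
    Hit("s3", 2.0),
    Hit("s4", 1e-20),
    Hit("s5", 1e-7),
]

# Like "-e 10 -k 2": nothing is filtered out, the two best hits are kept.
print([h.subject for h in best_n_hits(hits, 10, 2)])    # ['s2', 's4']

# Like "-e .00001 -k 3": only hits under 1e-5 compete, the best three are kept.
print([h.subject for h in best_n_hits(hits, 1e-5, 3)])  # ['s2', 's4', 's5']
```

Under this model the discovery order does not matter: in both calls the result is ranked by e-value, which is the distinction the reply draws between "best N" and "first N found".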

Author

@wmnwmn wmnwmn commented Jan 12, 2016

Ok that seems to resolve it, thanks!

@wmnwmn wmnwmn closed this Jan 12, 2016
@Myskmadra Myskmadra commented Nov 21, 2016

Hey,

I am currently using DIAMOND 0.8.24 and running the following command to find only the best hit per query with an e-value of less than 0.001:

diamond blastp -q Query.fasta -d Prot.db -o Diamond.out -f 6 --e 0.001 -k 1

I receive a file with several hits per query sequence plus e-values higher than 0.001. What am I doing wrong? Thank you in advance.

Owner

@bbuchfink bbuchfink commented Nov 21, 2016

Hi,
can you show me an example of that in the output?

@Myskmadra Myskmadra commented Nov 21, 2016

Sure, here is the same query 17 times with different alignments:
TR1|c1_g1_i1 AT5G03540.1 37.4 476 252 3 925 2349 200 630 9.7e-89 325.9
TR1|c1_g1_i1 AT5G03540.2 37.4 476 252 3 925 2349 85 515 9.7e-89 325.9
TR1|c1_g1_i1 AT5G03540.3 37.4 476 252 3 925 2349 226 656 9.7e-89 325.9
TR1|c1_g1_i1 AT5G52340.1 36.2 469 246 4 946 2349 207 623 2.2e-88 324.7
TR1|c1_g1_i1 AT1G72470.1 36.9 471 255 5 964 2361 192 625 4.7e-83 307.0
TR1|c1_g1_i1 AT3G14090.1 34.3 472 269 5 943 2349 180 613 5.7e-81 300.1
TR1|c1_g1_i1 AT5G58430.1 35.2 471 248 4 946 2352 204 619 2.6e-78 291.2
TR1|c1_g1_i1 AT1G54090.1 35.1 470 269 5 943 2349 177 611 1.0e-77 289.3
TR1|c1_g1_i1 AT5G50380.1 31.6 472 282 5 943 2355 242 673 3.7e-72 270.8
TR1|c1_g1_i1 AT5G13150.1 34.5 487 258 9 931 2352 207 645 8.2e-72 269.6
TR1|c1_g1_i1 AT5G13990.1 32.4 476 271 6 931 2352 256 682 2.1e-67 255.0
TR1|c1_g1_i1 AT5G52350.1 29.6 469 277 5 946 2349 161 577 2.1e-59 228.4
TR1|c1_g1_i1 AT5G59730.2 28.4 479 299 9 940 2364 165 603 1.4e-55 215.7
TR1|c1_g1_i1 AT5G59730.1 28.4 479 299 9 940 2364 165 603 1.4e-55 215.7
TR1|c1_g1_i1 AT2G39380.1 24.5 470 313 4 946 2352 183 611 5.9e-46 183.7
TR1|c1_g1_i1 AT1G07000.1 36.5 230 143 1 934 1614 182 411 7.5e-41 166.8
TR1|c1_g1_i1 AT2G28640.1 30.7 231 159 1 934 1623 153 383 1.5e-33 142.5

And here are the high e-values I mentioned:
TR383|c0_g1_i1 AT5G42260.1 35.1 37 24 0 998 1108 262 298 2.4e+00 31.6
TR25921|c0_g1_i1 AT1G18080.1 42.1 38 21 1 961 1071 10 47 3.5e+01 29.3

That's bizarre, right?
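
Rows like these can be pulled out of a tabular output file systematically rather than by eye. The following is a minimal sketch, assuming the 12-column layout shown above (query ID in column 1, subject ID in column 2, e-value in column 11) and whitespace-separated fields. The function name `find_violations` and its parameters are hypothetical; the sample rows are copied from the output above.

```python
from collections import defaultdict


def find_violations(lines, max_evalue, max_targets):
    """Report queries with too many distinct subjects and rows over the cutoff."""
    subjects = defaultdict(set)   # query ID -> set of subject IDs seen
    over_cutoff = []              # rows whose e-value exceeds max_evalue
    for line in lines:
        fields = line.split()
        if len(fields) < 12:      # skip blank or malformed rows
            continue
        subjects[fields[0]].add(fields[1])
        if float(fields[10]) > max_evalue:
            over_cutoff.append(line.strip())
    too_many = {q: len(s) for q, s in subjects.items() if len(s) > max_targets}
    return too_many, over_cutoff


rows = [
    "TR1|c1_g1_i1 AT5G03540.1 37.4 476 252 3 925 2349 200 630 9.7e-89 325.9",
    "TR1|c1_g1_i1 AT5G03540.2 37.4 476 252 3 925 2349 85 515 9.7e-89 325.9",
    "TR1|c1_g1_i1 AT5G52340.1 36.2 469 246 4 946 2349 207 623 2.2e-88 324.7",
    "TR383|c0_g1_i1 AT5G42260.1 35.1 37 24 0 998 1108 262 298 2.4e+00 31.6",
]

too_many, over_cutoff = find_violations(rows, max_evalue=0.001, max_targets=1)
print(too_many)     # queries with more than one distinct subject
print(over_cutoff)  # rows with an e-value above 0.001
```

To run it on a real file, the list can be replaced with an open file handle, for example `open("Diamond.out")`.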

Owner

@bbuchfink bbuchfink commented Nov 21, 2016

It's possible to get several alignments with the same query/subject pair. As these are HSPs contained in the same subject sequence, they always get reported regardless of their e-value or the -k option. To disable that, you can use the option --single-domain.

However, you shouldn't get more than one alignment with different subjects if you use the -k 1 option. I'll see if I can reproduce this problem somehow.
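
Independently of how the search was run, the tabular output can be reduced afterwards to one row per query under a cutoff. This is a post-processing sketch only: it does not explain the behaviour reported above. It assumes the same 12-column, whitespace-separated layout with the e-value in column 11. The function name `best_hit_per_query` is hypothetical, and the sample rows are taken from the output posted earlier in the thread.

```python
def best_hit_per_query(lines, max_evalue):
    """Return {query ID: fields} with the lowest-e-value row per query."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 12:          # skip blank or malformed rows
            continue
        query, evalue = fields[0], float(fields[10])
        if evalue > max_evalue:       # drop rows above the cutoff
            continue
        # On ties, the row seen first is kept.
        if query not in best or evalue < float(best[query][10]):
            best[query] = fields
    return best


rows = [
    "TR1|c1_g1_i1 AT5G03540.1 37.4 476 252 3 925 2349 200 630 9.7e-89 325.9",
    "TR1|c1_g1_i1 AT5G52340.1 36.2 469 246 4 946 2349 207 623 2.2e-88 324.7",
    "TR383|c0_g1_i1 AT5G42260.1 35.1 37 24 0 998 1108 262 298 2.4e+00 31.6",
    "TR25921|c0_g1_i1 AT1G18080.1 42.1 38 21 1 961 1071 10 47 3.5e+01 29.3",
]

for query, fields in best_hit_per_query(rows, max_evalue=0.001).items():
    print(query, fields[1], fields[10])
```

Because only one row per query survives, this also collapses multiple HSPs against the same subject, which may or may not be what is wanted.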

Owner

@bbuchfink bbuchfink commented Nov 21, 2016

I can't reproduce this problem. Can you tell me more about your system specs, your OS, and the kind of query data and reference db?

@Myskmadra Myskmadra commented Nov 22, 2016

I am using an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz in x86-64 with Linux Mint 17.1 Rebecca. Anything else you need to know?
The queries are contigs that I got from an assembly with Trinity. The references are peptide sequences from Arabidopsis, which I retrieved from Phytozome.

Owner

@bbuchfink bbuchfink commented Nov 24, 2016

I tried to reproduce this under Linux Mint 17.1 now but still no luck. If you want you can send me your data or a small part of it that exhibits the problem and I will try that.

@Myskmadra Myskmadra commented Dec 12, 2016

Sorry for the late reply. I have now tried a couple of times, but I can't reproduce it either. If I encounter the problem again I'll let you know.
