Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scores are often incorrect #10

Closed
tolot27 opened this issue Aug 8, 2014 · 11 comments
Closed

scores are often incorrect #10

tolot27 opened this issue Aug 8, 2014 · 11 comments

Comments

@tolot27
Copy link
Contributor

tolot27 commented Aug 8, 2014

By comparison with swipe I observed that the scores computed by swimd are often incorrect. Since I did a comparison between swipe and ssearch sometimes ago, I trust the scores of swipe.

Compare for instance the first sequence of ./test_data/db/uniprot_sprot15.fasta with the fourth sequence of the same file with blosum50 and gap open of 13 and gap extend of 2. Swipe returns a raw score of 41, whereas swimd returns 43.

./swimd_aligner -m Blosum50 -o 13 -e 2 ../test_data/db/uniprot_sprot1.fasta ../test_data/db/uniprot_sprot1_4.fasta

./swipe -M BLOSUM50 -G 13 -E 2 -i ../swimd/test_data/db/uniprot_sprot1.fasta -d ../swimd/test_data/db/uniprot_sprot1_4.fasta -m 7 -v 15 -b 15 -e 10000 | grep score

BTW: I've compared the scoring matrices and they are identical.

@Martinsos
Copy link
Owner

@tolot27 Thank you, I will give it a look!

@tolot27
Copy link
Contributor Author

tolot27 commented Aug 14, 2014

I made some further comparison with ssearch, swipe, swimd and sswlib and the same parameters and matrices as above. Query is again the first sequence of ./test_data/db/uniprot_sprot15.fasta (sp|Q6GZX4|001R_FRG3G), database is again ./test_data/db/uniprot_sprot15.fasta. Differences are marked in boldface.

subject_id ssearch swipe swimd sswlib
sp|Q6GZX4|001R_FRG3G 1734 1734 1734 1734
sp|Q6GZX3|002L_FRG3G 45 45 45 45
sp|Q197F8|002R_IIV3 41 41 41 41
sp|Q197F7|003L_IIV3 41 41 43 43
sp|Q6GZX2|003R_FRG3G 38 38 38 38
sp|Q6GZX1|004R_FRG3G 24 24 24 24
sp|Q197F5|005L_IIV3 36 36 42 42
sp|Q6GZX0|005R_FRG3G 37 37 41 41
sp|Q91G88|006L_IIV6 51 51 56 56
sp|Q6GZW9|006R_FRG3G 33 33 35 35
sp|Q6GZW8|007R_FRG3G 37 37 37 37
sp|Q197F3|007R_IIV3 50 50 52 52
sp|Q197F2|008L_IIV3 37 37 37 37
sp|Q6GZW6|009L_FRG3G 40 40 44 44
sp|Q91G85|009R_IIV6 30 30 30 30

Interestingly, swimd and sswlib gave always the same score beside the fact that they use inter- vs. intra-sequence comparison.

I'll dig further.

@Martinsos
Copy link
Owner

Hm, very interesting, sounds like maybe it is not a bug. It is interesting that they are always giving smaller scores. Can we get alignment, or at least starting position, and see if they are all possible?

@Martinsos
Copy link
Owner

Oh yes, one more thing: did you check how gap opening and gap extension penalty are defined? Sometimes gap opening already includes first extension, and sometimes not. For example, for gap of length 3, gap opening penalty of 10 and gap extension of 1, it can be different result depending on previous definition: it can be 10 + 1 + 1 + 1 = 13 or 10 + 1 + 1 = 12.

@tolot27
Copy link
Contributor Author

tolot27 commented Aug 14, 2014

Okay, found it. First of all, I accidentially used a hard-masked database with swipe. After using unmasked database, swipe returns almost the same scores as ssearch.

Then I checked swimd an fixed it as follows:

diff --git a/src/Swimd.cpp b/src/Swimd.cpp
index fcb68ac..54254e3 100644
--- a/src/Swimd.cpp
+++ b/src/Swimd.cpp
@@ -232,10 +232,10 @@ static int searchDatabaseSW_(unsigned char query[], int queryLength,
         // ----------------------- CORE LOOP (ONE COLUMN) ----------------------- //
         for (int r = 0; r < queryLength; r++) { // For each cell in column
             // Calculate E = max(lH-Q, lE-R)
-            __mxxxi E = SIMD::max(SIMD::sub(prevHs[r], Q), SIMD::sub(prevEs[r], R));
+            __mxxxi E = SIMD::sub(SIMD::max(SIMD::sub(prevHs[r], Q), prevEs[r]), R);

             // Calculate F = max(uH-Q, uF-R)
-            __mxxxi F = SIMD::max(SIMD::sub(uH, Q), SIMD::sub(uF, R));
+            __mxxxi F = SIMD::sub(SIMD::max(SIMD::sub(uH, Q), uF), R);

             // Calculate H
             __mxxxi H = SIMD::max(F, E);

Now, swimd returns the same scores than swipe and ssearch, at least for this 15 hits. Now, I'll check sswlib.

@tolot27
Copy link
Contributor Author

tolot27 commented Aug 14, 2014

Fixed the sswlib too: mengyao/Complete-Striped-Smith-Waterman-Library#14

@mengyao
Copy link

mengyao commented Aug 15, 2014

Hi Mathias,

Thank you for your revision for sswlib. I was wondering would you mind to help to have a check of the following:

The gap opening penalty's definition of sswlib and ssearch are different. For example, if you set the gap opening parameter as 10, extension penalty as 2, when open a gap, sswlib use 10, while ssearch use 12. Please have a check of the parameter settings when you compare sswlib with ssearch.

If you do find they give different results, when they are using equivalent parameters, please let me know.

Many thanks,

Mengyao

@tolot27
Copy link
Contributor Author

tolot27 commented Aug 15, 2014

I've checked the gap penalties again and you are right. The reason why the scores of sswlib and swimd differ from the scores of ssearch and swipe is the different use of the gap open penalty as you wrote.
I even checked again the NCBI Blast C++ Toolkit. There, the opening penalty is also computed by adding the gap extension cost to the gap open cost - returning the same results as swipe and ssearch.

What was your and/or Martinsos decision to implement the gap penalties differently than Pearson, Altschul or Rognes - beside the fact that IMHO you implemented it as the natural understanding is. ;-)

BLAST and FASTA are de facto standard. Wouldn't it be better to keep compatibility to this standard - even if you have to give up the natural understanding of gap opening penalty? In this case, we have to rename the issues and mention compatibility instead of 'incorrect'. ;-)

BTW: Did you find a documentation of the penalty implementations in Blast or ssearch? It is neither explained in the programs command line help nor at their websites - as far as I found it.

@mengyao
Copy link

mengyao commented Aug 18, 2014

Dear Martinsos,

Thank you for bringing up this issue.

The two different definitions of gap open exist all the time, and there is no standard.

I think I would like to keep sswlib’s definition as its current status. I just added a note of this gap open definition in the README file.

It is good to make it clear to the users.

Best wishes,

Mengyao

On Jan 15, 2014, at 1:20 PM, Mathias Walter notifications@github.com wrote:

I've checked the gap penalties again and you are right. The reason why the scores of sswlib and swimd differ from the scores of ssearch and swipe is the different use of the gap open penalty as you wrote.
I even checked again the NCBI Blast C++ Toolkit. There, the opening penalty is also computed by adding the gap extension cost to the gap open cost - returning the same results as swipe and ssearch.

What was your and/or Martinsos decision to implement the gap penalties differently than Pearson, Altschul or Rognes - beside the fact that IMHO you implemented it as the natural understanding is. ;-)

BLAST and FASTA are de facto standard. Wouldn't it be better to keep compatibility to this standard - even if you have to give up the natural understanding of gap opening penalty? In this case, we have to rename the issues and mention compatibility instead of 'incorrect'. ;-)

BTW: Did you find a documentation of the penalty implementations in Blast or ssearch? It is neither explained in the programs command line help nor at their websites - as far as I found it.


Reply to this email directly or view it on GitHub.

@Martinsos
Copy link
Owner

Dear @mengyao and @tolot27,
I am glad we solved this. In that case I will also keep the current definition since another one is not any better and some people are already using it as it is, so a change could introduce confusion!
I will add additional comments to make this more obvious.

@Martinsos
Copy link
Owner

Added explanations in commit 91caf5a! Thanks @tolot27 again for pointing this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants