-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scores are often incorrect #10
Comments
@tolot27 Thank you, I will give it a look! |
I made some further comparison with ssearch, swipe, swimd and sswlib and the same parameters and matrices as above. Query is again the first sequence of
Interestingly, swimd and sswlib gave always the same score beside the fact that they use inter- vs. intra-sequence comparison. I'll dig further. |
Hm, very interesting, sounds like maybe it is not a bug. It is interesting that they are always giving smaller scores. Can we get alignment, or at least starting position, and see if they are all possible? |
Oh yes, one more thing: did you check how gap opening and gap extension penalty are defined? Sometimes gap opening already includes first extension, and sometimes not. For example, for gap of length 3, gap opening penalty of 10 and gap extension of 1, it can be different result depending on previous definition: it can be 10 + 1 + 1 + 1 = 13 or 10 + 1 + 1 = 12. |
Okay, found it. First of all, I accidentially used a hard-masked database with swipe. After using unmasked database, swipe returns almost the same scores as ssearch. Then I checked swimd an fixed it as follows: diff --git a/src/Swimd.cpp b/src/Swimd.cpp
index fcb68ac..54254e3 100644
--- a/src/Swimd.cpp
+++ b/src/Swimd.cpp
@@ -232,10 +232,10 @@ static int searchDatabaseSW_(unsigned char query[], int queryLength,
// ----------------------- CORE LOOP (ONE COLUMN) ----------------------- //
for (int r = 0; r < queryLength; r++) { // For each cell in column
// Calculate E = max(lH-Q, lE-R)
- __mxxxi E = SIMD::max(SIMD::sub(prevHs[r], Q), SIMD::sub(prevEs[r], R));
+ __mxxxi E = SIMD::sub(SIMD::max(SIMD::sub(prevHs[r], Q), prevEs[r]), R);
// Calculate F = max(uH-Q, uF-R)
- __mxxxi F = SIMD::max(SIMD::sub(uH, Q), SIMD::sub(uF, R));
+ __mxxxi F = SIMD::sub(SIMD::max(SIMD::sub(uH, Q), uF), R);
// Calculate H
__mxxxi H = SIMD::max(F, E); Now, swimd returns the same scores than swipe and ssearch, at least for this 15 hits. Now, I'll check sswlib. |
Fixed the sswlib too: mengyao/Complete-Striped-Smith-Waterman-Library#14 |
Hi Mathias, Thank you for your revision for sswlib. I was wondering would you mind to help to have a check of the following: The gap opening penalty's definition of sswlib and ssearch are different. For example, if you set the gap opening parameter as 10, extension penalty as 2, when open a gap, sswlib use 10, while ssearch use 12. Please have a check of the parameter settings when you compare sswlib with ssearch. If you do find they give different results, when they are using equivalent parameters, please let me know. Many thanks, Mengyao |
I've checked the gap penalties again and you are right. The reason why the scores of sswlib and swimd differ from the scores of ssearch and swipe is the different use of the gap open penalty as you wrote. What was your and/or Martinsos decision to implement the gap penalties differently than Pearson, Altschul or Rognes - beside the fact that IMHO you implemented it as the natural understanding is. ;-) BLAST and FASTA are de facto standard. Wouldn't it be better to keep compatibility to this standard - even if you have to give up the natural understanding of gap opening penalty? In this case, we have to rename the issues and mention compatibility instead of 'incorrect'. ;-) BTW: Did you find a documentation of the penalty implementations in Blast or ssearch? It is neither explained in the programs command line help nor at their websites - as far as I found it. |
Dear Martinsos, Thank you for bringing up this issue. The two different definitions of gap open exist all the time, and there is no standard. I think I would like to keep sswlib’s definition as its current status. I just added a note of this gap open definition in the README file. It is good to make it clear to the users. Best wishes, Mengyao On Jan 15, 2014, at 1:20 PM, Mathias Walter notifications@github.com wrote:
|
By comparison with swipe I observed that the scores computed by swimd are often incorrect. Since I did a comparison between swipe and ssearch sometimes ago, I trust the scores of swipe.
Compare for instance the first sequence of ./test_data/db/uniprot_sprot15.fasta with the fourth sequence of the same file with blosum50 and gap open of 13 and gap extend of 2. Swipe returns a raw score of 41, whereas swimd returns 43.
BTW: I've compared the scoring matrices and they are identical.
The text was updated successfully, but these errors were encountered: