-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indels aren't calculated correctly #178
Comments
Hi, thanks |
Hi, In line 160/161 you say this: in line 163 you say Consequently in line 165 "mut_is_indel " will be always an empty list If you replace in line 161 sub_seq_j = seq_i.np_sequence[:length] with sub_seq_j = seq_j.np_sequence[:length] i think it will be work. |
Ah, didn't notice the j vs. i typo :)
good catch. thanks!
Will correct and issue a bug fix.
In general, i think it should not make a noticeable difference in the
results since there are not a lot of indels in illumina sequencing, and we
used a very high upper bound on the error probability just in case.
what happens with the current bug is that like you wrote, it will not
identify any indels, and therefore will count them as mismatches. when
using the default error profile and parameters, we will therefore get for
the correction value (i.e. upper bound on expected error reads fraction)
instead of 0.1:
0.06 for 1 indel
0.02 for 2/3 indels
and we will not have a cutoff for >3 indels (and instead use the cutoff for
number of mismatches)
since the 0.1 for indel is somewhat arbitrary and high, it should not have
an effect in most cases.
Let me know if you encounter a case where it made a difference (how did you
discover it?)
Thanks a lot
Amnon
…On Sat, Aug 4, 2018 at 1:53 PM MichaelRade ***@***.***> wrote:
Hi,
for example:
seq_i.sequence = ---GGAGGGT-----
seq_j.sequence = ---AGG-GCGG----
seq_i.np_sequence = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]
seq_j.np_sequence = [4 4 4 0 2 2 4 2 1 2 2 4 4 4 4]
In line 160/161 you say this:
sub_seq_i = seq_i.np_sequence[:length]
sub_seq_j = seq_i.np_sequence[:length]
"sub_seq_i" and "sub_seq_j" assigned the same sequence. So "sub_seq_i ==
sub_seq_j"
Therefore:
sub_seq_i = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]
sub_seq_j = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]
in line 163 you say
mask = (sub_seq_i != sub_seq_j)
Because "sub_seq_i " and "sub_seq_j " are always equal all element in
"mask" have the boolean false.
Consequently in line 165 "mut_is_indel " will be always an empty list
mut_is_indel = np.logical_or(sub_seq_i[mask] == 4, sub_seq_j[mask] == 4)
If you replace in line 161 *sub_seq_j = seq_i.np_sequence[:length]* with *sub_seq_j
= seq_j.np_sequence[:length]* i think it will be work.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#178 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFkA8oAUvcDjZ7CV_EIA4qVmQ8sNM6z3ks5uNX1FgaJpZM4Vubpq>
.
|
Hi, |
Thank you @MichaelRade for reporting, and for a detailed example of the bug. I'm reopening the issue until a fix is in place. |
fix thru #179 |
Hi,
Indels aren't calculated correctly because
sub_seq_i
andsub_seq_j
are identical.deblurring.py
160
sub_seq_i = seq_i.np_sequence[:length]
161
sub_seq_j = seq_i.np_sequence[:length]
The text was updated successfully, but these errors were encountered: