Indels aren't calculated correctly #178

MichaelRade · 2018-08-03T18:30:42Z

Hi,
Indels aren't calculated correctly because sub_seq_i and sub_seq_j are identical.

deblurring.py
160 sub_seq_i = seq_i.np_sequence[:length]
161 sub_seq_j = seq_i.np_sequence[:length]


amnona · 2018-08-04T08:39:09Z

Hi,
Thanks for the feedback.
I didn't fully understand what you mean regarding the indel calculation problem.
Can you please elaborate on what exactly you think the problem is?
Can you give an example where you think the indel calculation isn't correct (starting from the non-aligned sequences)?

thanks
Amnon

MichaelRade · 2018-08-04T10:53:56Z

Hi,
for example:
seq_i.sequence = ---GGAGGGT-----
seq_j.sequence = ---AGG-GCGG----
seq_i.np_sequence = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]
seq_j.np_sequence = [4 4 4 0 2 2 4 2 1 2 2 4 4 4 4]

In line 160/161 you say this:
sub_seq_i = seq_i.np_sequence[:length]
sub_seq_j = seq_i.np_sequence[:length]
"sub_seq_i" and "sub_seq_j" are both assigned the same sequence (seq_i.np_sequence), so "sub_seq_i == sub_seq_j" holds at every position.
Therefore:
sub_seq_i = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]
sub_seq_j = [4 4 4 2 2 0 2 2 2 3 4 4 4 4 4]

In line 163 you say:
mask = (sub_seq_i != sub_seq_j)
Because "sub_seq_i" and "sub_seq_j" are always equal, all elements in "mask" are False.

Consequently, in line 165 "mut_is_indel" will always be an empty array:
mut_is_indel = np.logical_or(sub_seq_i[mask] == 4, sub_seq_j[mask] == 4)

If you replace sub_seq_j = seq_i.np_sequence[:length] in line 161 with sub_seq_j = seq_j.np_sequence[:length], I think it will work.
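
The example above can be reproduced outside of deblur with a minimal sketch. It assumes that 4 encodes a gap, as in the arrays shown, and that length covers the whole aligned sequence; the helper name count_diffs is hypothetical and is not part of deblurring.py.

```python
import numpy as np

# Aligned sequences from the example above (4 encodes a gap, '-').
seq_i = np.array([4, 4, 4, 2, 2, 0, 2, 2, 2, 3, 4, 4, 4, 4, 4])
seq_j = np.array([4, 4, 4, 0, 2, 2, 4, 2, 1, 2, 2, 4, 4, 4, 4])
length = min(len(seq_i), len(seq_j))  # assumption: compare the full alignment


def count_diffs(sub_i, sub_j):
    """Hypothetical helper: return (substitutions, indels) between two arrays."""
    mask = sub_i != sub_j
    # A differing position is an indel if either side holds a gap there.
    mut_is_indel = np.logical_or(sub_i[mask] == 4, sub_j[mask] == 4)
    num_indels = int(mut_is_indel.sum())
    num_subs = int(mask.sum()) - num_indels
    return num_subs, num_indels


# Line 161 as written: both slices are taken from seq_i, so nothing differs.
buggy = count_diffs(seq_i[:length], seq_i[:length])

# Proposed fix: take the second slice from seq_j.
fixed = count_diffs(seq_i[:length], seq_j[:length])

print("buggy:", buggy)
print("fixed:", fixed)
```

With both slices taken from seq_i the mask is all False, so no difference of any kind is seen; with the second slice taken from seq_j the two gap positions are recognised as indels.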

amnona · 2018-08-04T11:18:09Z

Ah, I didn't notice the j vs. i typo :) Good catch, thanks! I will correct it and issue a bug fix.

In general, I think it should not make a noticeable difference in the results, since there are not a lot of indels in Illumina sequencing and we used a very high upper bound on the error probability just in case. What happens with the current bug is that, like you wrote, it will not identify any indels and will therefore count them as mismatches. With the default error profile and parameters, the correction value (i.e. the upper bound on the expected fraction of error reads) will therefore be, instead of 0.1:
0.06 for 1 indel
0.02 for 2/3 indels
and we will not have a cutoff for >3 indels (the cutoff for the number of mismatches is used instead).

Since the 0.1 for indels is somewhat arbitrary and high, it should not have an effect in most cases. Let me know if you encounter a case where it made a difference (how did you discover it?).

Thanks a lot,
Amnon
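
The numbers in this comment can be laid out as a small lookup. This is only a sketch of the arithmetic described, assuming a pair of sequences that differs by indels alone; the constants are the values quoted above, and the names INDEL_BOUND, MISMATCH_BOUND, bound_intended and bound_with_bug are hypothetical rather than deblur's own.

```python
# Values quoted in the comment above; all names here are hypothetical.
INDEL_BOUND = 0.1                              # intended bound when indels are present
MISMATCH_BOUND = {1: 0.06, 2: 0.02, 3: 0.02}   # bound applied per mismatch count


def bound_intended(num_indels):
    """Bound when 1 to 3 indels are recognised as indels."""
    if 1 <= num_indels <= 3:
        return INDEL_BOUND
    return None  # other cases are not quantified in the comment


def bound_with_bug(num_indels):
    """Bound when each indel is miscounted as one mismatch."""
    return MISMATCH_BOUND.get(num_indels)


for n in (1, 2, 3):
    print(n, "indel(s): intended", bound_intended(n), "with bug", bound_with_bug(n))
```

In each of the three cases the bug yields a lower bound than the intended 0.1.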


MichaelRade · 2018-08-04T11:55:14Z

Hi,
thanks. I'm going to check this out. Regarding your question: I've decided to read the scripts if I want to use them for my analyses. Let's see how long I can keep that up :)

wasade · 2018-08-04T18:29:41Z

Thank you @MichaelRade for reporting, and for a detailed example of the bug. I'm reopening the issue until a fix is in place.

RNAer · 2018-09-12T17:32:43Z

Fixed through #179.

MichaelRade closed this as completed Aug 4, 2018

wasade reopened this Aug 4, 2018

wasade mentioned this issue Aug 9, 2018

fix seq_i, seq_j type #179

Merged

RNAer closed this as completed Sep 12, 2018
