Sequence Complement does not return the same results on Python3.3.3 as Python3.4 #518

pkMyt1 · 2015-04-16T16:21:25Z

When complementing DNA during a search for mutations I found that when run on Python 3.3.3 I got about 3% fewer hits than on Python 3.4 using Seq(str(ref_seq+alt_seq)).complement(). When I wrote my own simple code to generate the complement this difference went away. The error is rare and not random. I have not been able to capture the actual sequence that is doing this. The difference in the two are significant.

peterjc · 2015-04-16T16:39:55Z

This sounds very serious, but likely very difficult to identify without more information. Have you compared with Python 2.6 or 2.7? That might help work out if this is related to the string/unicode changes.

You said it is not random? Otherwise I might have guessed it was faulty RAM to blame.

pkMyt1 · 2015-04-16T16:56:12Z

Not random in that multiple runs always yield the same result. My code will not run on Python 2.x so I cannot test that without some gymnastics. I have been chasing this for about a week now. When I run the code block below with a ref_seq of say 10,000 nucleotides and an alt_seq of say 10,000 also and filtering them so I know ref != alt; Python 3.3.3 I might get a count of 1,000 and Python 3.4 it will be 1,300. If I make the complement myself like in the last code block the results are the same. Both Python 3.3 and 3.4 should treat strings/unicode the same.

list = ['CT', 'CA', 'TA', 'TC']
count = 0
comp = Seq(str(ref_seq+alt_seq)).complement()
for i in list:
   if comp == i:
       count +=1

def complement(seq):
    comp_seq = ''
    for i in range(len(seq)):
        if seq[i] is "G":
            nuc = "C"
        elif seq[i] is "A":
            nuc = "T"
        elif seq[i] is "T":
            nuc = "A"
        elif seq[i] is "C":
            nuc = "G"
        else:
            nuc = "N"
        comp_seq += nuc
    return comp_seq

pkMyt1 · 2015-04-16T18:52:06Z

I did some checking to try and find the root of the problem. I copied your code (shown below) to do the complementing and tested it outside of Biopython with both versions of Python. Worked like a charm. I also checked and the version test in your code and it is working fine. I am stumped at what else to check at this point.

    ambiguous_dna_complement = {
    "A": "T",
    "C": "G",
    "G": "C",
    "T": "A",
    "M": "K",
    "R": "Y",
    "W": "W",
    "S": "S",
    "Y": "R",
    "K": "M",
    "V": "B",
    "H": "D",
    "D": "H",
    "B": "V",
    "X": "X",
    "N": "N",
    }
    complement_mapping = ambiguous_dna_complement
    before = ''.join(complement_mapping.keys())
    after = ''.join(complement_mapping.values())
    before = before + before.lower()
    after = after + after.lower()
    ttable = str.maketrans(before, after)
    comp_seq = seq.translate(ttable)

    return comp_seq

peterjc · 2015-04-16T21:51:58Z

I was suggesting comparing Python 2 and Python 3 (just in case this is Python 3 specific which would be a useful clue).

etal · 2015-04-16T23:23:20Z

Can you verify that your Python 3.3.3 and Python 3.4 installations are both using the same version of Biopython? The Seq equality operation in Biopython recently changed semantics to compare string values instead of object identity.

Assuming that's all fine, then if is is being used for string comparison in some places (e.g. if seq[i] is "G"), then maybe Pythons 3.4 changed the way strings are interned. Using == or direct dictionary lookups should protect against that.

pkMyt1 · 2015-04-17T00:48:58Z

The Biopython versions are the same (1.65). I was using "is", did not know
about the possibility you brought up so I changed all "is" to "==". It did
not help for this issue but probably makes my code better [?]. If there is
anything else I can do to help on this let me know. I keep looking at it
on my end as well.

On Thu, Apr 16, 2015 at 7:23 PM, Eric Talevich notifications@github.com
wrote:

Can you verify that your Python 3.3.3 and Python 3.4 installations are
both using the same version of Biopython? The Seq equality operation in
Biopython recently changed semantics to compare string values instead of
object identity.

Assuming that's all fine, then if is is being used for string comparison
in some places (e.g. if seq[i] is "G"), then maybe Pythons 3.4 changed
the way strings are interned. Using == or direct dictionary lookups
should protect against that.

—
Reply to this email directly or view it on GitHub
#518 (comment).

peterjc · 2015-04-17T13:03:06Z

Good idea @etal on checking == versus is.

@pkMyt1 Would you be able to share the entire example (privately if need be)? Otherwise trying to cut this down to a small self contained example would be a sensible debugging step...

pkMyt1 · 2015-04-17T14:42:21Z

The program is woven together in a way that cutting it down would be a big
effort. I can send you a zipped file of the program with notes pointing to
the block that is calling Biopython. I would prefer to send it privately.
Unfortunately my smallest BAM is 1.8 Gb. This will not work on a generic
BAM. I have modified the sequence headers in the file. Do you have access
to the hg19.fa reference?

On Fri, Apr 17, 2015 at 9:03 AM, Peter Cock notifications@github.com
wrote:

Good idea @etal https://github.com/etal on checking == versus is.

@pkMyt1 https://github.com/pkMyt1 Would you be able to share the entire
example (privately if need be)? Otherwise trying to cut this down to a
small self contained example would be a sensible debugging step...

—
Reply to this email directly or view it on GitHub
#518 (comment).

peterjc · 2018-07-04T15:10:16Z

I'm going to close this issue now - I hope the problem went away, but without more information I don't see what else we can do.

peterjc closed this as completed Jul 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequence Complement does not return the same results on Python3.3.3 as Python3.4 #518

Sequence Complement does not return the same results on Python3.3.3 as Python3.4 #518

pkMyt1 commented Apr 16, 2015

peterjc commented Apr 16, 2015

pkMyt1 commented Apr 16, 2015

pkMyt1 commented Apr 16, 2015

peterjc commented Apr 16, 2015

etal commented Apr 16, 2015

pkMyt1 commented Apr 17, 2015

peterjc commented Apr 17, 2015

pkMyt1 commented Apr 17, 2015

peterjc commented Jul 4, 2018

Sequence Complement does not return the same results on Python3.3.3 as Python3.4 #518

Sequence Complement does not return the same results on Python3.3.3 as Python3.4 #518

Comments

pkMyt1 commented Apr 16, 2015

peterjc commented Apr 16, 2015

pkMyt1 commented Apr 16, 2015

pkMyt1 commented Apr 16, 2015

peterjc commented Apr 16, 2015

etal commented Apr 16, 2015

pkMyt1 commented Apr 17, 2015

peterjc commented Apr 17, 2015

pkMyt1 commented Apr 17, 2015

peterjc commented Jul 4, 2018