# Reverse Complement DNA Sequence in Python

To [reverse complement](https://www.bioinformatics.org/sms/rev_comp.html) a DNA sequence, the sequence (i.e., the `string`) needs to be:
1. reversed (i.e., 3' ➡ 5')
2. complemented, i.e., 
 + A ➡ T
 + C ➡ G
 + G ➡ C
 + T ➡ A

---
## Method 1: Using the `string` `replace` method

### Step 1: Create sequence
Assign the DNA sequence to a `string` variable:

In [1]:
dna = 'ACGGGAGGACGGGaaaattACTACGGCATTAGC'
print(dna)

ACGGGAGGACGGGaaaattACTACGGCATTAGC


Convert the DNA sequence to upper case so that every nucleotide is in the same case. Notice the lowercase `a`'s and `t`'s in the middle of the sequence.

In [2]:
dna_upper = dna.upper()
print(dna_upper)

ACGGGAGGACGGGAAAATTACTACGGCATTAGC


### Step 2: Reverse sequence

In [3]:
rev = dna_upper[::-1] # The -1 is the step and it is negative to read the string from the opposite direction
print(rev)

CGATTACGGCATCATTAAAAGGGCAGGAGGGCA
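
As a small illustration of the slicing used above, the sketch below reverses a short made-up string in two ways: with the `[::-1]` slice and with the built-in `reversed` function. The variable names are chosen only for this example.

```python
s = "ACGT"

# A slice is written [start:stop:step]. Leaving start and stop empty with a
# step of -1 walks the whole string from the last character to the first.
by_slice = s[::-1]

# reversed() yields the characters back to front; join() glues them into a string.
by_reversed = "".join(reversed(s))

print(by_slice)     # TGCA
print(by_reversed)  # TGCA
```

Both forms give the same result; the slice is simply the shorter way to write it.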


Next, we need to replace each of the four nucleotides with its complement, along the lines of the following code:

```
somestring = somestring.replace("A", "T")
somestring = somestring.replace("C", "G")
somestring = somestring.replace("G", "C")
somestring = somestring.replace("T", "A")
```
However, the 3rd and the 4th `replace` statements will **undo** the replacements made by the 2nd and the 1st statements, respectively. The trick is therefore to replace each nucleotide with an entirely new character, so that later replacements cannot overwrite (undo) earlier ones. Those entirely new characters will be the lower case versions of the complements, i.e.,

```
somestring = somestring.replace("A", "t")
somestring = somestring.replace("C", "g")
somestring = somestring.replace("G", "c")
somestring = somestring.replace("T", "a")
```
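To make the "undo" problem concrete, here is a minimal sketch on a made-up four-letter sequence. It compares the naive chain of replacements with the lowercase-placeholder trick. It only complements (it does not reverse), and the closing `upper()` call is an optional addition for this example, used to restore upper case.

```python
seq = "ACGT"

# Naive chain: each call returns a new string, and later calls
# re-replace characters that earlier calls have just produced.
naive = seq.replace("A", "T").replace("C", "G").replace("G", "C").replace("T", "A")

# Lowercase placeholders keep already-complemented bases out of reach
# of the later replacements; upper() restores the case at the end.
safe = (
    seq.replace("A", "t")
       .replace("C", "g")
       .replace("G", "c")
       .replace("T", "a")
       .upper()
)

print(naive)  # ACCA (wrong: the later replacements hit already-replaced bases)
print(safe)   # TGCA (the expected complement)
```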

### Step 3: Complement the "reversed" sequence

In [4]:
revcomp1 = rev.replace("A", "t")
print(revcomp1)

CGtTTtCGGCtTCtTTttttGGGCtGGtGGGCt


In [5]:
revcomp1 = revcomp1.replace("C", "g")
print(revcomp1)

gGtTTtgGGgtTgtTTttttGGGgtGGtGGGgt


In [6]:
revcomp1 = revcomp1.replace("G", "c")
print(revcomp1)

gctTTtgccgtTgtTTttttcccgtcctcccgt


In [7]:
revcomp1 = revcomp1.replace("T", "a")
print(revcomp1)

gctaatgccgtagtaattttcccgtcctcccgt


In [8]:
print ("Original DNA:\t\t\t" + dna)
print ("Upper Case DNA:\t\t\t" + dna_upper)
print ("Reversed DNA:\t\t\t" + rev)
print ("Reversed Complemented DNA:\t" + revcomp1)

Original DNA:			ACGGGAGGACGGGaaaattACTACGGCATTAGC
Upper Case DNA:			ACGGGAGGACGGGAAAATTACTACGGCATTAGC
Reversed DNA:			CGATTACGGCATCATTAAAAGGGCAGGAGGGCA
Reversed Complemented DNA:	gctaatgccgtagtaattttcccgtcctcccgt


---
## Method 2: Using the `string` `translate` method

Using the convenient [`translate`](https://docs.python.org/3/library/stdtypes.html#str.translate) method, we can define a *translation table* with [`maketrans`](https://docs.python.org/3/library/stdtypes.html#str.maketrans), in which each nucleotide is mapped to its complement, for example, `str.maketrans("ACTG", "TGAC")`.

Using the reversed sequence `rev` from above, the reverse complement is simply the result of translating the reversed sequence ... voilà 🤗

In [9]:
revcomp2 = rev.translate(str.maketrans("ACTG", "TGAC"))
print(revcomp2)

GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT


In [10]:
print ("Original DNA:\t\t\t" + dna)
print ("Upper Case DNA:\t\t\t" + dna_upper)
print ("Reversed DNA:\t\t\t" + rev)
print ("Reversed Complemented DNA:\t" + revcomp2)

Original DNA:			ACGGGAGGACGGGaaaattACTACGGCATTAGC
Upper Case DNA:			ACGGGAGGACGGGAAAATTACTACGGCATTAGC
Reversed DNA:			CGATTACGGCATCATTAAAAGGGCAGGAGGGCA
Reversed Complemented DNA:	GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT
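
The translation table can also be extended so that the upper-casing step is not needed. The following is a minimal sketch, assuming we want to preserve the case of the input and treat `N` (the usual code for "any base") as its own complement. The names `COMPLEMENT` and `reverse_complement` are illustrative choices for this example.

```python
# Translation table covering both cases; the two strings passed to
# maketrans must have the same length (position i maps to position i).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of seq, preserving letter case."""
    # Characters that are not in the table are left unchanged by translate().
    return seq.translate(COMPLEMENT)[::-1]

dna = "ACGGGAGGACGGGaaaattACTACGGCATTAGC"
print(reverse_complement(dna))  # GCTAATGCCGTAGTaattttCCCGTCCTCCCGT
```

Because complementing and reversing are independent operations, translating first and reversing afterwards gives the same result as reversing first.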


---
## Method 3: Using looping and list

This is overkill for what needs to be accomplished, but the goal here is to demonstrate working with loops and lists in Python.

First, we convert our DNA sequence `string` into a `list`:

In [11]:
seq = list(dna)
print(seq)

['A', 'C', 'G', 'G', 'G', 'A', 'G', 'G', 'A', 'C', 'G', 'G', 'G', 'a', 'a', 'a', 'a', 't', 't', 'A', 'C', 'T', 'A', 'C', 'G', 'G', 'C', 'A', 'T', 'T', 'A', 'G', 'C']


Notice how the printed sequence `list` above differs from the earlier printed sequence `string`: each nucleotide is now a separate element.

Then, we define an **empty** `list`, where we will be storing the reverse complement in a loop, one nucleotide at a time.

In [12]:
revcomp3 = [] # An empty list

Now, we loop over `seq` (the `list`) from the end back to the beginning (i.e., in the reverse direction). For each nucleotide, we compute the complement and then append it to the end of the growing list `revcomp3`.

In [13]:
for i in range(len(seq) - 1, -1, -1): # Indices from len(seq) - 1 down to 0
  nucleotide = seq[i]
  if (nucleotide == "A" or nucleotide == "a"):
    complement = "T"
  elif (nucleotide == "T" or nucleotide == "t"):
    complement = "A"
  elif (nucleotide == "G" or nucleotide == "g"):
    complement = "C"
  elif (nucleotide == "C" or nucleotide == "c"):
    complement = "G"
  else:
    complement = "N" # Just in case there is something other than A, C, T, or G.
  
  revcomp3.append(complement)

In [14]:
print (revcomp3)

['G', 'C', 'T', 'A', 'A', 'T', 'G', 'C', 'C', 'G', 'T', 'A', 'G', 'T', 'A', 'A', 'T', 'T', 'T', 'T', 'C', 'C', 'C', 'G', 'T', 'C', 'C', 'T', 'C', 'C', 'C', 'G', 'T']


Finally, we convert the `list` into a `string` using `join`:

In [15]:
"".join(revcomp3)

'GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT'

In [16]:
print ("Original DNA:\t\t\t" + dna)
print ("Upper Case DNA:\t\t\t" + dna_upper)
print ("Reversed DNA:\t\t\t" + rev)
print ("Reversed Complemented DNA:\t" + "".join(revcomp3))

Original DNA:			ACGGGAGGACGGGaaaattACTACGGCATTAGC
Upper Case DNA:			ACGGGAGGACGGGAAAATTACTACGGCATTAGC
Reversed DNA:			CGATTACGGCATCATTAAAAGGGCAGGAGGGCA
Reversed Complemented DNA:	GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT
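
For comparison, the same loop logic can be written more compactly. This is a sketch only, assuming a dictionary lookup in place of the `if`/`elif` chain and the built-in `reversed` in place of the index-based `range`. The names `COMPLEMENT` and `reverse_complement` are made up for this example.

```python
# Lookup table playing the role of the if/elif chain.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse-complement seq; anything other than A/C/G/T (any case) becomes N."""
    # upper() folds lowercase bases onto the table keys, reversed() walks the
    # sequence back to front, and dict.get() supplies "N" for unknown symbols.
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))

print(reverse_complement("ACGGGAGGACGGGaaaattACTACGGCATTAGC"))
# GCTAATGCCGTAGTAATTTTCCCGTCCTCCCGT
```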


---
## Method 4: Using Biopython


[Biopython](https://biopython.org/) is a Python package with rich bioinformatics functionality. This is probably the neatest approach to reverse complementing a DNA sequence (or, more generally, to handling molecular sequences in Python).

We first need to install the [Biopython](https://biopython.org/) module (if it is not already installed). In the command line window, the installation can be done using the following command:

`pip install biopython`


Without having to switch to the shell (terminal), the same installation can be performed elegantly from within the `Notebook` environment using the following command:

In [17]:
!pip install biopython

Collecting biopython
  Downloading biopython-1.77-cp37-cp37m-manylinux1_x86_64.whl (2.3 MB)
     |████████████████████████████████| 2.3 MB 2.1 MB/s eta 0:00:01
Installing collected packages: biopython
Successfully installed biopython-1.77
You should consider upgrading via the '/opt/venv/bin/python -m pip install --upgrade pip' command.


Then all we need to do is:
1. Import the `Seq` class, which is basically a `string` with biological methods
2. Create a `Seq` object based on the original DNA `string`
3. Perform the reverse complement using the `reverse_complement` method

In [18]:
from Bio.Seq import Seq
seq = Seq(dna)
revcomp4 = seq.reverse_complement()

print ("DNA Original:\t\t\t" + seq)
print ("Reversed Complemented DNA:\t" + revcomp4)


DNA Original:			ACGGGAGGACGGGaaaattACTACGGCATTAGC
Reversed Complemented DNA:	GCTAATGCCGTAGTaattttCCCGTCCTCCCGT

Note that `reverse_complement` preserves the case of the input, so the lowercase stretch of the original sequence stays lowercase in the output.


---