# 1. Source

Click on the link to go to the source web page of **Rosalind**: [Complementing a Strand of DNA](https://rosalind.info/problems/revc/)

**Problem**

![Complementing a Strand of DNA](revc_problem.png 'Complementing a Strand of DNA')

**Sample Dataset**

AAAACCCGGT

**Sample Output**

ACCGGGTTTT

# 2. Workspace

In [1]:
# read the given dna sequence from the file

with open('revc_test.txt', 'r') as file:
    dnaSeq = file.read().strip().upper()
    
# print what we have just read

print(dnaSeq)

AAAACCCGGT


In [2]:
# we can use the built-in str.replace() method to replace each base with its complement
# start by replacing every A with a lowercase t
# if we used an uppercase T here, the later T -> A replacement
# would also convert these new Ts and corrupt the result

revcSeq = dnaSeq.replace('A', 't')

print(revcSeq)

ttttCCCGGT


In [3]:
# with the same strategy,
# replace other bases

revcSeq = revcSeq.replace('T', 'a')
revcSeq = revcSeq.replace('C', 'g').replace('G', 'c')

print(revcSeq)

ttttgggcca


In [4]:
# a reverse complement also requires reversing the sequence
# after that we can convert all bases back to uppercase

revcSeq = revcSeq[::-1].upper()

print(revcSeq)

ACCGGGTTTT


In [5]:
# copy paste the sample output given by rosalind
# and perform a simple equality check to see if the result is correct

sample_output = 'ACCGGGTTTT'

revcSeq == sample_output

True
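
The chained `.replace()` calls and the lowercase placeholders can also be collapsed into a single pass with `str.maketrans` and `str.translate`, both from Python's standard library. This is a minimal sketch and not part of the original workflow; the names `COMPLEMENT_TABLE` and `reverse_complement` are made up for illustration.

```python
# Translation table mapping each base to its complement.
# translate() substitutes every character in one pass,
# so no lowercase placeholders are needed.
COMPLEMENT_TABLE = str.maketrans('ACGTN', 'TGCAN')


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT_TABLE)[::-1]


print(reverse_complement('AAAACCCGGT'))  # ACCGGGTTTT
```

Characters missing from the table (for example lowercase bases) pass through `translate()` unchanged, so the input should be uppercased first, as the notebook already does when reading the file.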

In [6]:
# we do not need to write the same replace command again and again
# with a dictionary for base conversion the code gets shorter,
# and we no longer have to juggle uppercase and lowercase letters

# create the dictionary that maps each base to its complement
transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N' # to be more realistic, we also map the unknown base N to itself
}

# start with an empty reverse complement sequence
revcSeq = ''

# loop over the reversed dna sequence and build the reverse complement
for base in dnaSeq[::-1]: # reversing the dna sequence up front gives the right order
    revcSeq += transcriptionDict[base]
    
print(revcSeq)

ACCGGGTTTT
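
A plain dictionary lookup raises `KeyError` for any character that is missing from the dictionary, such as a lowercase base or a stray newline. The sketch below shows one possible way to report such characters explicitly; the function name `reverse_complement_checked` and the error message are hypothetical and not part of the notebook.

```python
COMPLEMENT = {'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N'}


def reverse_complement_checked(dna):
    """Reverse-complement dna, rejecting characters outside COMPLEMENT."""
    # collect every character that has no entry in the dictionary
    unexpected = sorted(set(dna) - set(COMPLEMENT))
    if unexpected:
        raise ValueError(f'unexpected characters in sequence: {unexpected}')
    return ''.join(COMPLEMENT[base] for base in dna[::-1])


print(reverse_complement_checked('AAAACCCGGT'))  # ACCGGGTTTT
```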


In [7]:
# we can avoid reversing the dna string at the beginning (or the result at the end)
# by looping over the indices of the dna string in reverse order
# for longer dna strings this may or may not run faster than the previous version

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N' 
}

revcSeq = ''

for i in range(len(dnaSeq) - 1, -1, -1):
    revcSeq += transcriptionDict[dnaSeq[i]]
    
print(revcSeq)

ACCGGGTTTT


In [8]:
# the same reverse-index loop written as a list comprehension

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N' 
}

revcSeq = ''.join([transcriptionDict[dnaSeq[i]] for i in range(len(dnaSeq) - 1, -1, -1)])

    
print(revcSeq)

ACCGGGTTTT
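
The index arithmetic in `range(len(dnaSeq) - 1, -1, -1)` can be avoided with the built-in `reversed()`, which iterates over a string from its last character to its first. A small sketch, using a short stand-in sequence in place of the file contents:

```python
transcriptionDict = {'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N'}

dnaSeq = 'AAAACCCGGT'  # stand-in for the sequence read from the file

# reversed() walks the string backwards without computing index values
revcSeq = ''.join(transcriptionDict[base] for base in reversed(dnaSeq))

print(revcSeq)  # ACCGGGTTTT
```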


In [9]:
# alternatively, we can use the biopython library

from Bio.Seq import Seq

dnaSeq = Seq(dnaSeq)

print(dnaSeq)

AAAACCCGGT


In [10]:
# look into the Seq object without print

dnaSeq

Seq('AAAACCCGGT')

In [11]:
# we can easily create a reverse complement using biopython's .reverse_complement() method

revcSeq = dnaSeq.reverse_complement()

print(revcSeq)

ACCGGGTTTT


In [12]:
revcSeq

Seq('ACCGGGTTTT')

In [13]:
# perform a simple speed test to see which option is better

### --A Simple Speed Test

In [14]:
# increase the length of the dna sequence to make any differences easier to see

dnaSeq = 'AAAACCCGGT'
print('initial dnaSeq length:', len(dnaSeq))
dnaSeq *= 100000
print('final dnaSeq length:', len(dnaSeq))

initial dnaSeq length: 10
final dnaSeq length: 1000000


In [15]:
# option 1

In [16]:
%%timeit -n 500

revcSeq = dnaSeq.replace('A', 't').replace('T', 'a').replace('G', 'c').replace('C', 'g')
revcSeq = revcSeq[::-1].upper()

3 ms ± 68.5 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [17]:
# option 2

In [18]:
%%timeit -n 500

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A'
}

revcSeq = ''

for base in dnaSeq[::-1]:
    revcSeq += transcriptionDict[base]

86.3 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [19]:
# option 3

In [20]:
%%timeit -n 500

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A'
}

revcSeq = ''

for i in range(len(dnaSeq) - 1, -1, -1):
    revcSeq += transcriptionDict[dnaSeq[i]]

113 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [21]:
# option 4

In [22]:
%%timeit -n 500

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A'
}

revcSeq = ''.join([transcriptionDict[dnaSeq[i]] for i in range(len(dnaSeq) - 1, -1, -1)])

64.5 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [23]:
# option 5

In [24]:
%%timeit -n 500

transcriptionDict = {
    'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A'
}

# note: this option iterates forward, so it builds the complement only (no reversal)
revcSeq = ''.join([transcriptionDict[base] for base in dnaSeq])

42.9 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [25]:
# option 6

In [26]:
%%timeit -n 500

from Bio.Seq import Seq

seqDna = Seq(dnaSeq)
revcSeq = seqDna.reverse_complement()

1.31 ms ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [27]:
# explicit python loops over the input dna sequence become slow as its length increases
# looping over the indices in reverse order (option 3) was slower than looping over the reversed string (option 2)
# the list comprehension (option 4) was faster than both plain loops
# option 5 is not directly comparable, since it does not reverse the sequence
# still, every loop-based option was at least an order of magnitude slower than options 1 and 6
# the biopython module gave the best result (1.31 ms)
# python's built-in .replace() method came closest to it (3 ms)
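
`%%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent using the standard-library `timeit` module might look like the sketch below. It uses a shorter input than the notebook to keep the run brief, and it also times `str.translate`, which was not among the options measured above; the function names are illustrative, and no particular timings are claimed.

```python
import timeit

dnaSeq = 'AAAACCCGGT' * 1000  # shorter than the notebook's input, to keep the run short

table = str.maketrans('ACGT', 'TGCA')


def with_replace(dna):
    # option 1 from the notebook: chained replace with lowercase placeholders
    swapped = dna.replace('A', 't').replace('T', 'a').replace('G', 'c').replace('C', 'g')
    return swapped[::-1].upper()


def with_translate(dna):
    # single-pass alternative (not measured in the notebook)
    return dna.translate(table)[::-1]


# both functions must agree before their timings are worth comparing
assert with_replace(dnaSeq) == with_translate(dnaSeq)

for func in (with_replace, with_translate):
    # repeat() returns one total time per repetition; take the smallest
    best = min(timeit.repeat(lambda: func(dnaSeq), number=100, repeat=5))
    print(f'{func.__name__}: {best / 100 * 1e3:.4f} ms per call')
```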

In [28]:
# here, let's implement the solution using the biopython module

# 3. Implementation

In [29]:
def revc(filename):
    
    '''
    input
        path to a file that contains a dna string
    process
        generates the reverse complement of the given dna sequence
    output
        writes the reverse complement sequence to an answer file
        prints the answer to the console
    '''
    
    # load the necessary biopython module
    from Bio.Seq import Seq
    
    # open and read the file
    with open(filename, 'r') as file:
        dnaString = file.read().strip().upper()
        
    # convert dna sequence into a Seq object
    dnaSeq = Seq(dnaString)
    
    # create reverse complement
    revcSeq = dnaSeq.reverse_complement()
    
    # print answer to console
    print('\n\x1B[1mANSWER\x1B[0m\n______\n')
    print(f'{revcSeq}')
    
    # open the answer file and write the answer
    with open(f'{filename.split(".")[0]}_answer.txt', 'w') as file:
        file.write(f'{revcSeq}')
    print('\n\n#! The answer has been written into the file:',
          f'\x1B[1m./{filename.split(".")[0]}_answer.txt\x1B[0m\n')
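
The answer file name is built with `filename.split(".")[0]`, which works for bare names such as `revc_test.txt` but returns an empty string for a path like `./data/revc_test.txt`, because the split happens at the first dot. One possible alternative is `pathlib`, sketched below; `answer_path` is a hypothetical helper, not part of the function above.

```python
from pathlib import Path


def answer_path(filename):
    """Build '<stem>_answer.txt' next to the input file (hypothetical helper)."""
    path = Path(filename)
    # stem drops only the final suffix; with_name keeps the parent directory
    return path.with_name(f'{path.stem}_answer.txt')


print(answer_path('revc_test.txt'))         # revc_test_answer.txt
print(answer_path('./data/revc_test.txt'))  # data/revc_test_answer.txt on POSIX
```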

# 4. Execution

In [30]:
revc('revc_test.txt')


ANSWER
______

ACCGGGTTTT


#! The answer has been written into the file: ./revc_test_answer.txt



In [31]:
revc('rosalind_revc.txt')


ANSWER
______

TGTATAGTCTACTTGATGGTCGAGGTTTCATTTGGCATTTCATTGACTATAACCTAAGGAGAGGTAGGCGAGATTTGTCCCGCTAACTTTATTCCGTAACATTTTCTACTGGGCTCCATCTCAGCGGTTTATAAACAACGCGTTCCCTAGTTAGAGTACCCAGCACCTTGATATGTGCCGGGACGGGTTGGGCCATCCACCAAACCGTATGATACCGAGTTGGATTCCAACACGTAGCAACATTGAGGGTATGTCTCCAGGTAGTCGCCTCAGGAGGTGTGAATATAATGGTGTGTGTTCATCCTTTCGAAAGGCTAACGCCTGAACTGAAGCTCTCCAGCACCGGAAGGGTTGGAGTACGATTTCCATGATGTAAATTATTCCTTCGGGTCTTCCTAAAATAGCGGCGGGCTAGTGACATCGTTGCCCATAGCGCTCCTAAATTATACTTAATCAACTAAAGCTCGAGGTTAAATAAAAGAGGAACTCGATCCATGATGGACTCCGGAGTTACGGCTGGGATGCAGAAAAAGGTCGTTCGCCCGGTAAGACTCATGATGCAGCAGGGGCCACTACAATCCTAAGGTTAGGGGTGGAGGCATAAACGTGTCGTGGTGGCCCCCTATCCCACTTAGGCTGGTGATAATATAGAGTTATCCTTGTCAGGATCCAAAACGCGATAAACTCCTGGAAAGGTGGTGACCATTCCGGCTAAGCAGTATCTACACACCATCTTCGGCACGAGACAGGATCCACGCCATACCTTCGCCCGTGGCACGTTGTTTAACTCATGTACTAGATTT


#! The answer has been written into the file: ./rosalind_revc_answer.txt



<p style='text-align: right;'>
    <!--<b><font size = '5'>Contact</font></b><br>-->
    <b>Orcun Tasar</b><br>
    <i>Bioinformatician / Data Scientist</i><br>
    orcuntasar |at@| ogr.iu.edu.tr<br>
    tasar.orcun |at@| gmail.com<br>
    <a href = 'https://www.linkedin.com/in/orçun-taşar-7b5992a1/'>Linkedin</a> | <a href = 'https://www.instagram.com/shatranuchor/'>Instagram</a>
</p>