# 1. Source

This problem comes from **Rosalind**: [Transcribing DNA into RNA](https://rosalind.info/problems/rna/)

**Problem**

![Transcribing DNA into RNA!](rna_problem.png 'Transcribing DNA into RNA')

**Sample Dataset**

GATGGAACTTGACTACGTAAATT

**Sample Output**

GAUGGAACUUGACUACGUAAAUU

# 2. Sample Dataset Workspace

In [1]:
# read the given dataset from the file

with open('rna_test.txt', 'r') as file:
    dnaSeq = file.read().strip().upper()
    
# print what we have just read

print(dnaSeq)

GATGGAACTTGACTACGTAAATT


In [2]:
# we can simply replace every T with U using python's str.replace() method

rnaSeq = dnaSeq.replace('T', 'U')

# check rnaSeq

print(rnaSeq)

GAUGGAACUUGACUACGUAAAUU


In [3]:
# since there are many bases, it is better not to trust a manual eye check
# copy and paste the sample output given by rosalind
# and perform a simple equality check

sample_output = 'GAUGGAACUUGACUACGUAAAUU'

rnaSeq == sample_output

True

In [4]:
# alternatively, we can handle the problem with a for-loop
# which becomes less efficient as the dnaSeq length increases

rnaSeq = ''

for base in dnaSeq:
    if base == 'T':
        rnaSeq += 'U'
    else:
        rnaSeq += base
        
print(rnaSeq)

GAUGGAACUUGACUACGUAAAUU
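
One common way to avoid repeated string concatenation inside a loop is to collect the converted bases and join them once at the end. The sketch below redefines the sample `dnaSeq` so it runs on its own; the name `rnaBases` is introduced here for illustration and is not part of the original notebook. Whether this is actually faster than `+=` depends on the interpreter and is not measured here.

```python
# sample sequence from above, redefined so the block runs on its own
dnaSeq = 'GATGGAACTTGACTACGTAAATT'

# convert each base lazily: T becomes U, everything else is kept
rnaBases = ('U' if base == 'T' else base for base in dnaSeq)

# join all converted bases into one string in a single step
rnaSeq = ''.join(rnaBases)

print(rnaSeq)
```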


In [5]:
# or, just as a brainstorm,
# we can create a dictionary
# and use it to avoid the if-else block

dna_to_rna_dict = {
    'A': 'A', 'T': 'U', 'C': 'C', 'G': 'G'
}

# we can use this dictionary to populate the new rnaSeq

rnaSeq = ''

for base in dnaSeq:
    rnaSeq += dna_to_rna_dict[base]
    
# print

print(rnaSeq)

GAUGGAACUUGACUACGUAAAUU
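
The same dictionary can also drive Python's `str.translate()` method, which applies the mapping without an explicit loop. This is a variant added for illustration, not one of the options in the original notebook; the name `translation_table` is an assumption. Note one difference in behavior: `translate()` leaves characters that are missing from the table unchanged, whereas the dictionary lookup above raises a `KeyError` for them.

```python
# sample sequence and mapping from above, redefined so the block runs on its own
dnaSeq = 'GATGGAACTTGACTACGTAAATT'
dna_to_rna_dict = {'A': 'A', 'T': 'U', 'C': 'C', 'G': 'G'}

# str.maketrans accepts a dict whose keys are single characters
translation_table = str.maketrans(dna_to_rna_dict)

# apply the whole mapping in one call
rnaSeq = dnaSeq.translate(translation_table)

print(rnaSeq)
```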


In [6]:
# however, python's built-in string method .replace() works well
# the other 2 options are slower
# we can perform a simple speed test to show that

### A Simple Speed Test

In [7]:
# increase the size of the dna sequence to make any differences easier to see

print('initial dnaSeq length:', len(dnaSeq))
dnaSeq *= 10000
print('final dnaSeq length:', len(dnaSeq))

initial dnaSeq length: 23
final dnaSeq length: 230000


In [8]:
# option 1

In [9]:
%%timeit -n 500

rnaSeq = dnaSeq.replace('T', 'U')

102 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [10]:
# option 2

In [11]:
%%timeit -n 500

rnaSeq = ''

for base in dnaSeq:
    if base == 'T':
        rnaSeq += 'U'
    else:
        rnaSeq += base

23.2 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [12]:
# option 3

In [13]:
%%timeit -n 500

dna_to_rna_dict = { 'A': 'A', 'T': 'U', 'C': 'C', 'G': 'G' }

rnaSeq = ''

for base in dnaSeq:
    rnaSeq += dna_to_rna_dict[base]

18 ms ± 61.2 µs per loop (mean ± std. dev. of 7 runs, 500 loops each)


In [14]:
# as we see, the first option (about 0.1 ms) is roughly 175 to 230 times faster
# than the other two (18 ms and 23.2 ms)
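
The `%%timeit` magic only works inside IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard-library `timeit` module. The function names `with_replace`, `with_loop`, and `with_dict` are made up here for illustration, and the repeat counts are arbitrary. Timings differ from machine to machine, so no expected numbers are shown.

```python
import timeit

# same enlarged sequence as in the speed test above
dnaSeq = 'GATGGAACTTGACTACGTAAATT' * 10000


def with_replace(seq):
    # option 1: built-in string method
    return seq.replace('T', 'U')


def with_loop(seq):
    # option 2: for-loop with if-else
    rnaSeq = ''
    for base in seq:
        if base == 'T':
            rnaSeq += 'U'
        else:
            rnaSeq += base
    return rnaSeq


def with_dict(seq):
    # option 3: for-loop with dictionary lookup
    dna_to_rna_dict = {'A': 'A', 'T': 'U', 'C': 'C', 'G': 'G'}
    rnaSeq = ''
    for base in seq:
        rnaSeq += dna_to_rna_dict[base]
    return rnaSeq


calls = 10  # calls per timing run (arbitrary choice)
results = {}

for func in (with_replace, with_loop, with_dict):
    # take the best of 3 runs and convert it to seconds per call
    best = min(timeit.repeat(lambda: func(dnaSeq), number=calls, repeat=3))
    results[func.__name__] = best / calls
    print(f'{func.__name__}: {results[func.__name__] * 1e3:.3f} ms per call')
```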

# 3. Implementation

In [15]:
import os

def rna(filename):
    
    '''
    input
        the name of a file that contains a dna string
    process
        replaces each T with U, keeps the rest the same
    output
        writes the new rna sequence to an answer file
        prints the answer to the console
    '''
    
    # open and read the file
    with open(filename, 'r') as file:
        dnaString = file.read()
        
    # generate the rna string
    rnaString = dnaString.replace('T', 'U')
    
    # print answer to console
    print('\n\x1B[1mANSWER\x1B[0m\n______\n')
    print(rnaString)
    
    # build the answer file name from the input name without its extension
    # (splitext cuts only the last extension, so dots elsewhere in the path are safe)
    answerFile = f'{os.path.splitext(filename)[0]}_answer.txt'
    
    # open file and write answer; "with" closes the file even if writing fails
    with open(answerFile, 'w') as file:
        file.write(rnaString)
        
    print('\n\n#! The answer has been written into the file:',
          f'\x1B[1m./{answerFile}\x1B[0m\n')
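
The read, replace, and write steps can be checked end to end without touching real data by using a temporary directory. The sketch below is an assumption-laden illustration, not the notebook's own function: `transcribe_file` is a hypothetical helper that mirrors the same three steps with `pathlib` and returns the path of the answer file instead of printing it.

```python
import tempfile
from pathlib import Path


def transcribe_file(filename):
    # hypothetical helper: read the dna string, replace T with U, write the answer
    path = Path(filename)
    rnaString = path.read_text().replace('T', 'U')

    # answer file sits next to the input: <name>_answer.txt
    answer_path = path.with_name(f'{path.stem}_answer.txt')
    answer_path.write_text(rnaString)
    return answer_path


# run the helper on the sample dataset inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp_dir:
    input_path = Path(tmp_dir) / 'rna_test.txt'
    input_path.write_text('GATGGAACTTGACTACGTAAATT')

    answer_path = transcribe_file(input_path)
    answer_name = answer_path.name
    answer_text = answer_path.read_text()

print(answer_name)
print(answer_text)
```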

# 4. Execution

In [16]:
rna('rna_test.txt')


ANSWER
______

GAUGGAACUUGACUACGUAAAUU


#! The answer has been written into the file: ./rna_test_answer.txt



In [17]:
rna('rosalind_rna.txt')


ANSWER
______

UGCGGAGGGAGACGCGGAGAAUGUAGGAGGCAGGCGUAUGAAGUUGCAUGGACCGGUGAACAGCGAACGUCGUCAAUUAUUGAAACGAGAUCUCUAAAGGCCUCAUUUAAGCUACAUGGAUUCACGACUCAAGCGGUUCGAUCUGCACCGACUGGGUACGGUCGGUACUUUGUAGCAUCCAGGUCACGAAAGCGUUCCGACUGCUAUGUAUGAAAAUGAGGGGUUUACUUUAGUAGUAAAGUCGGACGGUAGGUUGUCCCUCUUCACUCAUGUAUCUGGGAUUUACCGGGUAACUCGCCAACAUACGAAUCUGCGCACGACUGCGCGCUCGGCAAGCAUUAUCCUCGAGACUAGAGGCGAACAGCUCUCAACUGAACCAUUGGGGACGAUUACCGUACGGCAUGCUGGUGCUAUCUCAGUAUAUAGACCCGGAGAUCCCUUGGGGCAAUGGUUGGCCUUCGAAGGCCGGUAUUGAGCAGGAUCCCGAAUCCUAACCAGUAUUCCAUCUUUUCUCAGUUAAACAAGUCCAGGCGAGAACCCCUUGAGGCAUAUAGCUGCUGUAAAUCCUGUUACCUACAAUCCAUGCAAAGCUAGCCGGUCGAUGAUAUGCAGAACUCCUUCGAAAUGACAUGUCUGUAGAGCUAUCCCCUCACGUGCUAUCUGUGACGUGAGCUAAAGUUUAGUACGCUUAUCGGGGCUAGGGCGCCAGCGAGGGGAUUGCCUGCUAACUUGAAGUCCCGUUAAUAGACUCUAGUUUAGUCAUAGCGAGGACAAUCAAACGGUCCAGUUGUACUGCCUCUUCUGACUGGUACCGCCAUGGAAUACUGUUACAUAUCGAGUACCAGGAGUGUCGUAUAAUCUCGGAGGCGGGAUUGUAUCACUUUUGUUAAUUAUACACAUGAUCGCAUGGCAACGUAGCGGGUUCUGAUGCCAUCUUCCCAAU



#! The answer has been written into the file: ./rosalind_rna_answer.txt

<p style='text-align: right;'>
    <!--<b><font size = '5'>Contact</font></b><br>-->
    <b>Orcun Tasar</b><br>
    <i>Bioinformatician / Data Scientist</i><br>
    orcuntasar |at@| ogr.iu.edu.tr<br>
    tasar.orcun |at@| gmail.com<br>
    <a href = 'https://www.linkedin.com/in/orçun-taşar-7b5992a1/'>Linkedin</a> | <a href = 'https://www.instagram.com/shatranuchor/'>Instagram</a>
</p>