# RNA splicing
https://rosalind.info/problems/splc/

### Practice
After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.
- Given: a DNA string `s` and a collection of substrings acting as introns. All strings and substrings are given in FASTA format.
- Return: a protein string resulting from transcribing and translating the exons of `s`. 

In [2]:
import util
from util import codon_dict

In [2]:
file_path = 'rosalind_splc_test.txt'
fasta = util.read_fasta(file_path)
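
The `util` module is not shown in this notebook. As a rough sketch of what `read_fasta` might do, assuming it returns a dictionary that maps each FASTA header to its sequence in file order, something like the following would work. The name `read_fasta_sketch` and the choice to parse text rather than a file path are hypothetical, made so the example runs without a file.

```python
def read_fasta_sketch(text):
    """Parse FASTA-formatted text into {header: sequence}.

    Hypothetical stand-in for util.read_fasta, which is not shown here.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # a new record starts; drop the leading '>'
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # sequences may be wrapped over several lines; join them
            records[header] += line
    return records


example = ">Rosalind_10\nATGGTC\nTACATA\n>Rosalind_12\nATCGGTCGAA\n"
records = read_fasta_sketch(example)
print(list(records.values()))
```

Because Python dictionaries preserve insertion order, `list(records.values())[0]` is the first record in the file, which is what the next cell relies on.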

#### First, extract the DNA string and the intron substrings

In [3]:
fasta_list = list(fasta.values())
s = fasta_list[0]
introns = [fasta_list[1], fasta_list[2]]  # the two intron sequences to remove from s

In [4]:
print(s)
print(introns)

ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG
['ATCGGTCGAA', 'ATCGGTCGAGCGTGT']


#### Next, build a list of intron lengths to loop through

In [5]:
lengths = []
for item in introns:
    lengths.append(len(item))
print(lengths)

[10, 15]


#### Putting it all together

In [6]:
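exons = s
for i in range(len(exons)):
    curr_s = exons
    for length in lengths:
        for intron in introns:
            # a slice of `length` bases can only equal an intron of that same length
            if curr_s[i:i+length] == intron:
                # cut the intron out and keep scanning the shortened string
                exons = curr_s[:i] + curr_s[i+length:]
                curr_s = exons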

In [7]:
rna = exons.replace('T', 'U')

In [8]:
aa = ""
# walk the RNA one codon (three bases) at a time; a trailing partial codon is ignored
for start in range(0, len(rna) - 2, 3):
    triplet = rna[start:start+3]
    # stop codons are stored as the string 'None'; translation ends there
    if codon_dict[triplet] == 'None':
        break
    aa = aa + codon_dict[triplet]
print(aa)

MVYIADKQHVASREAYGHMFKVCA
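
`codon_dict` comes from the `util` module, which is not shown. The cell above suggests it maps RNA codons to one-letter amino acids and marks stop codons with the string `'None'`. Under that assumption, a table and a translation helper could be sketched as follows. The names `build_codon_dict` and `translate` are hypothetical, and the table is the standard genetic code.

```python
from itertools import product


def build_codon_dict():
    """Build an RNA codon table; stop codons map to the string 'None'.

    Hypothetical stand-in for util.codon_dict.
    """
    bases = "UCAG"
    # amino acids in UCAG order for the first, second and third base; '*' marks a stop
    amino_acids = (
        "FFLLSSSSYY**CC*W"
        "LLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRR"
        "VVVVAAAADDEEGGGG"
    )
    table = {}
    for (b1, b2, b3), aa in zip(product(bases, repeat=3), amino_acids):
        table[b1 + b2 + b3] = "None" if aa == "*" else aa
    return table


def translate(rna, table):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(rna) - 2, 3):
        residue = table[rna[start:start + 3]]
        if residue == "None":
            break
        protein.append(residue)
    return "".join(protein)


table = build_codon_dict()
print(translate("AUGGUCUACAUAUAG", table))  # MVYI
```

Stepping through the string with `range(0, len(rna) - 2, 3)` visits only complete codons, so no separate length check is needed.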


#### Creating a function

In [7]:
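def exons_splicing(string):
    """Splice the introns out of `string` and translate the result.

    Relies on the notebook-level `introns`, `lengths` and `codon_dict`.
    """
    # use the argument rather than the global `s`
    exons = string
    for i in range(len(exons)):
        curr_s = exons
        for length in lengths:
            for intron in introns:
                if curr_s[i:i+length] == intron:
                    exons = curr_s[:i] + curr_s[i+length:]
                    curr_s = exons

    rna = exons.replace('T', 'U')
    aa = ""
    # walk the RNA one codon (three bases) at a time
    for start in range(0, len(rna) - 2, 3):
        triplet = rna[start:start+3]
        # stop codons are stored as the string 'None'; translation ends there
        if codon_dict[triplet] == 'None':
            break
        aa = aa + codon_dict[triplet]
    return aa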

In [10]:
exons_splicing(s)

'MVYIADKQHVASREAYGHMFKVCA'
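
`exons_splicing` depends on the notebook-level `introns` and `lengths`. One possible variant, sketched here under the hypothetical name `remove_introns`, takes the introns as an argument so it can be reused on another dataset without redefining those variables. It is not the function used above, only an illustration of the same left-to-right scan.

```python
def remove_introns(dna, introns):
    """Scan dna left to right and cut out any intron that starts at the current position."""
    introns = [intron for intron in introns if intron]  # ignore empty strings
    i = 0
    while i < len(dna):
        for intron in introns:
            if dna.startswith(intron, i):
                # drop the intron and re-check the same position
                dna = dna[:i] + dna[i + len(intron):]
                break
        else:
            # no intron starts here; move on to the next base
            i += 1
    return dna


# the test string and introns printed earlier in this notebook
s = "ATGGTCTACATAGCTGACAAACAGCACGTAGCAATCGGTCGAATCTCGAGAGGCATATGGTCACATGATCGGTCGAGCGTGTTTCAAAGTTTGCGCCTAG"
introns = ["ATCGGTCGAA", "ATCGGTCGAGCGTGT"]
spliced = remove_introns(s, introns)
print(len(s), len(spliced))
```

`str.startswith` accepts a start offset, so no list of intron lengths is needed.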

### Alternative 
The previous approach is a little convoluted. After looking more closely at how the `str.replace()` method works, I also tried removing the introns directly from the string.

In [9]:
exons = s
for intron in introns: 
    exons = exons.replace(intron, "")

rna = exons.replace('T', 'U')

aa = ""
# walk the RNA one codon (three bases) at a time; a trailing partial codon is ignored
for start in range(0, len(rna) - 2, 3):
    triplet = rna[start:start+3]
    # stop codons are stored as the string 'None'; translation ends there
    if codon_dict[triplet] == 'None':
        break
    aa = aa + codon_dict[triplet]
print(aa)

MVYIADKQHVASREAYGHMFKVCA
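
One detail of `str.replace(old, new)` is worth keeping in mind: it returns a new string with every non-overlapping occurrence of `old` replaced, so an intron that appears more than once is removed each time. Removing one intron can also join its flanks into a sequence that matches another. A small illustration with made-up strings (not taken from the dataset):

```python
dna = "AAGTCCGTAA"

# every occurrence of "GT" is removed
print(dna.replace("GT", ""))     # AACCAA

# the optional third argument limits the number of replacements
print(dna.replace("GT", "", 1))  # AACCGTAA

# removing "GG" creates a "CT" that was not in the original string
joined = "ACGGTT".replace("GG", "")
print(joined)                    # ACTT
print(joined.replace("CT", ""))  # AT
```

If the two approaches ever disagree on a dataset, repeated or newly created matches like these are a reasonable first thing to check.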


## Rosalind's problem 

In [12]:
file_path = 'rosalind_splc.txt'
fasta = util.read_fasta(file_path)

In [27]:
fasta_list = list(fasta.values())
s = fasta_list[0]
introns = fasta_list[1:]  # changed from the test run: take every remaining record, since the real dataset has more than two introns

In [31]:
lengths = []
for item in introns:
    lengths.append(len(item))
print(lengths)

[30, 13, 22, 27, 28, 28, 50, 34, 46, 33, 10, 18, 22, 17, 18]


In [32]:
exons_splicing(s)

'MAVRGVTCRTDLSKSLPCRDRAQNTYSGYELELPYLTRNYETKYTLIPSILTFFGRRRGAGRLNTAIVNASSVMTTWLSGVTDTTRARLHAVTLTCASTTGIAVQRSSSLTQHDTRIENPIYPQVENEDSTGGFQQEGSPIRNIFYRREAFVRPAHVSTWYRAYGWLGNSVLTVWGQKLRDDPHNFGIAIAGH'