# Problem

After identifying the exons and introns of an RNA string, we only need to delete the introns and concatenate the exons to form a new string ready for translation.

**Given**: A DNA string s (of length at most 1 kbp) and a collection of substrings of s acting as introns. All strings are given in FASTA format.

**Return**: A protein string resulting from transcribing and translating the exons of s. (Note: Only one solution will exist for the dataset provided.)
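A minimal sketch of the whole idea, independent of file I/O: delete each intron from the DNA string, then read codons in frame until the first stop codon. The function name `splice_and_translate` and the toy sequence are hypothetical, chosen so the result can be checked by hand. The compact string is the standard genetic code with codons enumerated in TCAG order, which is an alternative to typing all 64 entries.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def splice_and_translate(dna, introns):
    """Hypothetical helper: remove every intron occurrence, then translate
    the remaining exons until the first stop codon."""
    for intron in introns:
        dna = dna.replace(intron, "")  # str.replace drops ALL occurrences
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy input: exons ATGAAA and TTTTAA flank the intron GGG.
print(splice_and_translate("ATGAAAGGGTTTTAA", ["GGG"]))  # MKF
```

Because the table is written in the DNA alphabet (T rather than U), the transcription step is implicit.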

```python
# Codon table written in the DNA alphabet (T, not U), so the transcription
# step is implicit; '-' marks a stop codon.
DNA_codon_table = {
    'TTT': 'F',     'CTT': 'L',     'ATT': 'I',     'GTT': 'V',
    'TTC': 'F',     'CTC': 'L',     'ATC': 'I',     'GTC': 'V',
    'TTA': 'L',     'CTA': 'L',     'ATA': 'I',     'GTA': 'V',
    'TTG': 'L',     'CTG': 'L',     'ATG': 'M',     'GTG': 'V',
    'TCT': 'S',     'CCT': 'P',     'ACT': 'T',     'GCT': 'A',
    'TCC': 'S',     'CCC': 'P',     'ACC': 'T',     'GCC': 'A',
    'TCA': 'S',     'CCA': 'P',     'ACA': 'T',     'GCA': 'A',
    'TCG': 'S',     'CCG': 'P',     'ACG': 'T',     'GCG': 'A',
    'TAT': 'Y',     'CAT': 'H',     'AAT': 'N',     'GAT': 'D',
    'TAC': 'Y',     'CAC': 'H',     'AAC': 'N',     'GAC': 'D',
    'TAA': '-',     'CAA': 'Q',     'AAA': 'K',     'GAA': 'E',
    'TAG': '-',     'CAG': 'Q',     'AAG': 'K',     'GAG': 'E',
    'TGT': 'C',     'CGT': 'R',     'AGT': 'S',     'GGT': 'G',
    'TGC': 'C',     'CGC': 'R',     'AGC': 'S',     'GGC': 'G',
    'TGA': '-',     'CGA': 'R',     'AGA': 'R',     'GGA': 'G',
    'TGG': 'W',     'CGG': 'R',     'AGG': 'R',     'GGG': 'G'
}

def readTab(infile):
    """Read a text file; return each stripped line as a list of tab-separated fields."""
    output = []
    with open(infile, 'r') as input_file:
        for input_line in input_file:
            input_line = input_line.strip()
            output.append(input_line.split('\t'))
    return output

def extract_fasta(fasta):
    """Group readTab() output into FASTA records.

    Returns a dict mapping each header line to its sequence, plus the headers
    in file order (the first record is the DNA string, the rest are introns).
    """
    sequences = {}
    headers = []
    flag = ""
    for i in fasta:
        line = i[0]
        if not line:                 # skip blank lines
            continue
        if line.startswith(">"):
            headers.append(line)
            flag = line
            sequences[flag] = ""
        else:
            sequences[flag] += line  # sequences may wrap over several lines
    return sequences, headers

def splice_introns(sequences, headers):
    """Delete every occurrence of each intron from the first record."""
    DNA_string = sequences[headers[0]]
    for i in headers[1:]:
        DNA_string = DNA_string.replace(sequences[i], "")
    return DNA_string

def translateDNA_protein(sequence):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = DNA_codon_table[sequence[i:i+3]]
        if amino_acid == "-":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

```python
RNA_test, headers = extract_fasta(readTab("RNA_splicing_sample.fasta"))
```
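`readTab` splits each line on tabs, which FASTA files do not use, so only the first field of each line is ever read. A minimal sketch of a more direct parser, assuming the whole file fits in memory: it turns FASTA text into an ordered list of `(header, sequence)` pairs and joins wrapped sequence lines. The name `parse_fasta` and the inline sample are hypothetical.

```python
def parse_fasta(text):
    """Hypothetical parser: return [(header, sequence), ...] in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:], []))  # header without the leading '>'
        elif records:
            # Continuation of the current sequence; text before the first
            # header is silently dropped.
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]

sample = """>toy_dna
ATGAAAGGG
TTTTAA
>toy_intron
GGG
"""
print(parse_fasta(sample))  # [('toy_dna', 'ATGAAAGGGTTTTAA'), ('toy_intron', 'GGG')]
```

As in the notebook, the first record would be the full DNA string and the remaining records the introns, e.g. `dna = records[0][1]` and `introns = [seq for _, seq in records[1:]]`.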

```python
print(translateDNA_protein(splice_introns(RNA_test, headers)))
```

```
MVYIADKQHVASREAYGHMFKVCA
```


```python
RNA_final, headers = extract_fasta(readTab("rosalind_splc.txt"))
print(translateDNA_protein(splice_introns(RNA_final, headers)))
```

```
MYGLRRGDAPNPAGAKLPSYYSVYDCTAFLGSRLHKRLLPAPSLTVQLVLARVALSRGPHFVAKLHSRDLVPGKIITKSLYSGTSALVSKLDDSLTISAVLKVSYDSLLGSRVSILDIQMRLHDGSSGLRTKWILERDMKRLPISDVRTCVNMQLSWACRRLRLVPKRSDSRWPVMSSYIGPPSVCR
```
