# Problem 5: Computing GC Content

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
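
The definition above can be checked directly on the example string. The following is a minimal sketch; the helper names `gc_content` and `reverse_complement` are illustrative and are not part of the original notebook.

```python
def gc_content(dna: str) -> float:
    """Return the percentage of symbols in `dna` that are 'C' or 'G'."""
    if not dna:
        raise ValueError("GC-content is undefined for an empty string")
    return (dna.count("C") + dna.count("G")) / len(dna) * 100


def reverse_complement(dna: str) -> str:
    """Swap A<->T and C<->G, then reverse the string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


example = "AGCTATAG"
print(gc_content(example))                      # 37.5
print(gc_content(reverse_complement(example)))  # 37.5
```

Complementing swaps every 'C' with a 'G' and vice versa, so the combined count of the two symbols is unchanged, and reversing does not alter counts at all. That is why both calls print the same value.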

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, each string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself, which may be wrapped across several lines; the next line that begins with '>' marks the label of the following string.
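
As a rough illustration of the format, the sketch below collects FASTA records into a dictionary and joins wrapped sequence lines. The function name `parse_fasta` and the two short records are made up for the example; they do not come from the Rosalind dataset.

```python
def parse_fasta(text: str) -> dict:
    """Map each FASTA label (without the leading '>') to its full sequence."""
    records = {}
    label = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines such as a trailing newline
        if line.startswith(">"):
            label = line[1:]          # a new record starts here
            records[label] = ""
        elif label is not None:
            records[label] += line    # append wrapped sequence lines


    return records


# Hypothetical input with one sequence wrapped over two lines.
wrapped = """>Rosalind_0001
ACGT
GGCC
>Rosalind_0002
TTAA
"""
print(parse_fasta(wrapped))
# {'Rosalind_0001': 'ACGTGGCC', 'Rosalind_0002': 'TTAA'}
```

Keying on the '>' marker rather than on line position means the parser does not care how many lines each sequence occupies.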

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
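
If it is useful to validate labels before processing, the ID pattern described above can be expressed as a regular expression. This is only a sketch; `is_rosalind_id` is a hypothetical helper, not something Rosalind provides.

```python
import re

# "Rosalind_" followed by exactly four ASCII digits (0000 to 9999).
ROSALIND_ID = re.compile(r"Rosalind_[0-9]{4}")


def is_rosalind_id(label: str) -> bool:
    """Return True if `label` has the form Rosalind_xxxx."""
    return ROSALIND_ID.fullmatch(label) is not None


print(is_rosalind_id("Rosalind_0808"))  # True
print(is_rosalind_id("Rosalind_808"))   # False
```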

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default absolute error of 0.001 in all decimal answers unless otherwise stated.
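
The output format and the error tolerance can be illustrated with the values from this problem's sample. The sketch below assumes the answer is submitted as two lines (ID, then percentage); `format_answer` is an illustrative name, and six decimal places are chosen only to match the sample output.

```python
import math


def format_answer(record_id: str, gc_percent: float) -> str:
    """Format the ID and GC-content on two lines, six decimal places."""
    return f"{record_id}\n{gc_percent:.6f}"


computed = 60.91954022988506   # unrounded value for the sample dataset
expected = 60.919540           # value shown in the sample output

print(format_answer("Rosalind_0808", computed))
# Rosalind_0808
# 60.919540

# An answer is accepted when it is within 0.001 of the expected value.
print(math.isclose(computed, expected, abs_tol=0.001))  # True
```

Because the tolerance is 0.001, submitting the unrounded float would also be accepted; rounding only makes the output easier to compare by eye.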

Sample Dataset
>\>Rosalind_6404 \
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG \
>\>Rosalind_5959 \
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC \
>\>Rosalind_0808 \
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT

Sample Output
>Rosalind_0808
60.919540

In [41]:
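def processFASTA_GC(fastaString):
    """Print the ID and GC-content (in percent) of the record with the highest GC-content."""
    records = {}  # record ID -> full sequence, joined across wrapped lines
    currentName = None
    for line in fastaString.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        if line.startswith(">"):  # header line: start a new record
            currentName = line[1:]
            records[currentName] = ""
        elif currentName is not None:  # sequence line: append to the current record
            records[currentName] += line

    currentHighestName = ""
    currentHighestScore = 0
    for temp_name, sequence in records.items():
        if not sequence:
            continue  # an empty record would cause a division by zero
        # count 'C' and 'G' and convert to a percentage
        temp_score = (sequence.count("C") + sequence.count("G")) / len(sequence) * 100
        if temp_score > currentHighestScore:  # check against current highest percentage
            currentHighestName = temp_name
            currentHighestScore = temp_score
    print(currentHighestName, currentHighestScore)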

In [42]:
testFastaString = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT"""

In [43]:
processFASTA_GC(testFastaString)

Rosalind_0808 60.91954022988506


In [44]:
actualFastaString = """>Rosalind_0878
TTCATGGCCAGGTATTGGCGCGCTTGGTAGGAAATCCGGGACACAATGTTGCAGTCAGTTGGCTAGAAGAAGACCCACTGGTGAAGAGAAAACTTATTATCCTTCTTACCGGTTTCCAGATCATTACAACATACTTACATAGGCGCCTCACACCAATGCCAGATTCCAAAAATTTTATGGATCTCTTCAATGAGGGCTTCACGAGGAGCCATCGCGGTAATAACCAGCCGGTAAGTGCACGGTGAGGGCGCGAGGTTGAGTCTTCCAATAGCCCCTCCGTGATATACACAAGCGCGGAGTGATGCGGTATGAACCGACCTAGTGACGGACACCAACCGCCCTCACCTAGGGCGTAGTTACATAATCCTTATGAAGGGAAGTGGGCGGCTATATGCCCTATCCTTCCTCCAATGATAGCCCCTGATCTGGCGTGGGAGTGGTTTGCAGGTTACACAGTGTTATACCCGCTCCTGGGATGTCACTCGCGAGGGTACCAACGACTGAACGTAATCCTAGGCTTGTCCGCCAATCGTGACCCGGCGCGGTCAGCTATCCAGACGAATGACCTGGTAAATCAGGCAGTGATACCAGCGCGGTCAATCGTCGTGCCGACTGATTTAGTCATTCCTTCATAATGGCAGCATCATCGTCTTGACAGAACCTGAAAACAGAATCGAACCGGACTATCCATAGAGGCCATTAACTGAATAATTAACCGACCCGACCCTACCCAAATAGTTGAGTGTATCGGCGCCGCGTGACTTCTACATCACATTGCGTTTCCCTAGTGCGAATTCGTGTCGTAAGCGTCTCGCGCCACCGCTGAGAGGGCTGTAGTACAGCGGGGTTCGTTATACGTTGGCCTCGTTTTCCAAGGCATTACTCTATGATGAA
>Rosalind_9865
AGCTTTCACTTCAAGAGTCCTGTCGGGATGCAGGTCTGGCGGTTAGTTTGTGAACCCCAGAGCATTATAGCTCTTTAAATCGGTAGTTCGTTCAAGACTACGCTAAAGTCAAGAACTGGTGTGATGATCCGAATGCGCGGGCCATTGGGCACTCGTAGAATCTCGTGGACCCCATCGAACCGACCTGATGGCAAGCTCCTCGCAACGTCCAAGTTCGTACTGGCCGACAGACGTAGCAAATCAACTCAACACGGTGTATCCCACCTGCTTCGAACCTCTATAAATCTGGCGGGCCAATTTAAACCTGCGGACACTGGGAAGCTGGCTTTGTTACAATAAACTGTACTGATGTCTGCGGAAGATGTACGTTCTCCCTAATGTTACCTAAAAGCCGTGGCGTCTTAGCTGGCGGTTACGGCTTTGAGCGCTTCTCGACCCCGTGCTAACTGATAGAGGACATGCTATTTAGCAAAATATACTCGCTGCCGGCATTCTTAACAAAGCCCAGATGACAAGACGTATCATCTTAGTTGAATAACTGAACTCATCCCATTAATTGAGTTGCGGTGTCATTGATTGTGTGACTAACAAAATCTTGAAACCGAGTGTATTAGAGAGGGGGCCCCGCCTGACACAAGTACCCCATCGGTCTGCCGGGGGCCCTACTCGGGATAGCGTAACTCGGGGGGTTACCCATCGGTTGGCACGCACATTCAATTGATAAGCCCCGTGTGTCTACACTTTGCCTAGTTACGATCGTGTGGGGGCCATCTGTGCGAACGCATAAAAACAGAGGGACT
>Rosalind_7760
GAAGTACATGGTTGACACTTGGGGGACTAATGGATCTAGACGTCTATCTCCTATACATCCTAGATCTAAGTTAGTATTTAGCAGCAATGTGGTATAACGGAAGAGGTGTGGCACATAACACTACCACCTGAAATCACACAAAACTCTAAGGGCCAATCCACCTTCATATTAAGAGCAATCGGCCGTCATGAGAATTTACCAGCCCTATGGCGGATCACCTGCCGGATTGTCTCGGGGCCCATAGGTACGAAACAAAAGGATCAACGGGGTTCATCGCTAATCCTTTCTGCTTCCATAGCTATGGTTCGTGTCATCAGATAGACGGCACTGATGCGTCCTAACGCTAGCTTATTGGCCTGCAAGGGCTTACAGCTTCGGCCGCTTGGGGTGAGGAATACCTCGCACCGATATATGAGGCTCGATCGTTCCGTATGTGAGCCGAACCGTTTTTGCGTAGACGAGTGGCGCACTCAGCATAACCTTTTACGACTTGCACTTGAGGAGTGGGTCGTTCTGAAGTCGAGAAGAGGACTCAGCCGGCTGGATGAATCACTTGAATTCGACGAAAAAATCTCAAGTCCAGCAGACTTCATTCCTAGTATTCGAAGATAATGCTCGGAGACTTACGAGCATCCCTGCGAACACCGTTGCAAATGGAATAAGCTTCATCCTACGTTCTAACTCTGATGAAACTAGTGTCGAGCATAACTTTCGGATCTAGGGTTCTACAAAACAAATCCCTTAAGCCGCGTACTACTACATAGACGTCACTTCGGGAGGTTGCCGCGAGCCATCCATTTGGGAAGAACAGGAATTTGCAAAGTGTACCTAGATGGTGACGCAAGATTTCCATGGCGGTGCTGTTATCGCTATAATGAGTGGGTGACTTTTCCAAT
>Rosalind_0952
GCAATCTTGCCTATAACTCATGGGAAAATCCGCTGGCTTCAAGATTTAGCTTTCCGAAGTACCAGTGATGAAGATGCATAAATCTTTCGTTCGGTCCTCCATAACCAACCGTCCCAATTAATGTTACGCGCGATTGTTTCGGAGACAATAAACGAAGCGAAGAACTTGATCTGATGGTAGAGTACTCATCTGCCACTCCCGTTAATCTCTGCTCTGGCCGTCCGTCTTATTAATCCCGCGTGAGGATAGGTGACTCCTTAGTGCCGACCGTTTCCGATGGGCACGTCATGGGACATGCAGAGGATGCGAACAATCACGCGACGTCCGTAGGAGATGTCCTCTGTTGAAGCATCGTAGCTTCATAGAAGGGATGTGGATCAGTAGATTGAACAAGACTCCGCGAACCCTGTATTACCCCTGTTGGTGCAATCGCCTCGCGTGGAATAAAGGACGCGACCTCAAGGCCTTCCAAGCTCGAGAGTCGAGAACCTCATTTATTTTGCAAGGGTAGTCGACCTCCCAACGCTGTTATGTTCGCGTAGCAGGGACCTACCGTTGTCTTTCATATGGATTTTGTTTGATTGTGGAGCGAGCTAATAACGGAACTTGCCGCGAAACTTGCAACCCGCACTCTCAAAACGGGCAATCTCCAGAATAACGATGGAGGACGAGCTCTCTTCTTAGAATGCGAATAACGAGAACAGAGTGCCTCTCTATGGTTAACGAACCCTTTCATTTATGAAACGCACCCAGTACCAGGAAGTATCGGGGGTACAATCACAGGGAGTGAAGTAGTACAACCAAGTGCA
>Rosalind_3931
ATACTAAACCTGGACCCGAAGTCTAGATATACAGGTTTCCAAGCAACTTACGCGAAAGCCGCCATCATCCGCCTTATACAGTATGCTTGCCGAATGTAATGACGGAGCAGTAGAAGAGGTGCGTGCAGAGAGACGGGAGCACCCGCTATCAGGGCACATCGTGGTAGTGAAGCGTATACCGGAAATGGTGTCCGTGCTAAAGCCTGCGGGGGCTGGATGCTGAGGCGAGATGCCTTTGCCCAACGGATATTAAGCTCTGTGCAAGCCTCACTTTCTAAATCCCCGAACCGAAGTATAAATACGATGACACCGTATTTCACTCTTTCTTTAAGCTGTAACTGACGTTGGTTCCCGCCGCTAGGAACGGCACCTCCTGGATGTGATCTGACTGTGCTTTCTCTGAGGTTTGTTCTGGCACACACATTTTTAATGCGGGATCGAAATTGCCAGCCTTATCTGCCCAGAACGGAAGGGTAGAGAATTTAAGCTCGAACCGGGGGGGAGGGACTTGATACACTGGGAACGCAGAAAAACATTATCCGAGCTCCGTTTTGCCGTAATGGGCGTCTGAGGTTAAATCGAATAAGATTGTAGTCGCAATCCTGCAAACTATCTGTGGCGGTCAGAATCTTTCCGCAGATTATGACCTTGTGCCCAGGCCCAGTGTTACCTCTCCCATGCGAGCGAGTGGCACAAGTGATAAAACCAGGACCAGTCAAAGTCAATAGAGCATCCTTTCTATGTTAGGTTCTTATGTCTTACCTCTATTGGTCGTTGCCACCAGAAATTTGGCACCCATTACGGCGCGGCTCCCAATAATAGAGTTGCTAGAAAAGACTCTGACAAGCCGTGCAACTGCACAGTCTTACAAGACTACCCATCCGCGGAGGTGTCCGGCTCTCTGGGCCTGAGGCACTCCAAATGCCGCCGCGCCTGGGCCGTGCAC
>Rosalind_2577
CCCTTGGCAAACGACCCGCCCCATCATTATTGATTGCGACCTGGCCGTGAGGTGGCCCGTCCATCTTCATACCCAACAGCCACCAGAAGGGGCACGACTATATCATAGGATGAAGTTGGAGTATGCAGAGAACTTGCTGCCGGTCGCTTGTTTACAAGTGTTCGTAGGGAGACTTAGACGAAGACGCGATGACCTTCTGGCTGCTGCGATGCAATTCCACCATCCAGCGGTCGAGTATGTGCTAGTCCCTTGAAGGTGAGAGGGAGCGTGGCCTATCGCTCTTTACGTAATTTGCGCGTTCTCTACTATCAACTAAATCGGACCATCATTACATATATTAGTAACATTGTAGTACTCTTTGGCTTAAAATCAGTGTACAGGGACCACAAGCTATGATTAAGGGGCCCACGGTGAAATGGCAAATGCCGCTACGTGGTAACATGATCTGCGAACTGTTGAATTGCCTCTTTAACGACGAAGGTGATAGTCATACCAGTCTTATCATTATAGTCCTCATTTCCTACCGAATATCCATTTCCTCCCGTCGTGGCAGCACATTAAGTCAATTAGCTGTGAGCTCTACCCCGAATAAATCCTAAATTATGGGTTATTGGGGTACGGTTGGTGACTCCCATTAATTGCTTAGCCCCATTGCCTACATCGGGTACATGGATAGAGACGTGTGGCGGTGCTTTACATTATCCAGGAGTTAAGTCCAGAAATGAGGTCTTCGAACTACCTACGAACGAGGCGTCATTGGCTTGTCCATCCCACCTTTGGATATCGCGGGGCGCGACCAGTCGCACTGCACGGCGCGTTAGGCGAACGTATCTATACCATTGGTACCGCTCCATGACAATAACCGTGAGGCTTCAGCTAGTAGACTGCCCCATAGACCTGACCGAGAGCACATGAGCGGGCCACCGCCATACATTCCCAGTATTTTAACGATGTGAGCAAAGTAACTGCATCGTGGTGTCGCAAACGTTTC
>Rosalind_9760
GTGCCTCACGTTGCCCCGCGTACGTTCGATGCAGTATCCATACTGCAGAGTACAGAGGGTCCCCTGTGCTTGCTGTTGGCTCAGGCAATCAACCAATGATCCATTAAGCATGTAATCTTCTGCGACAAGGGACATTTTCATATGGCGGAAATTGAGGAGGACCAAATAAAGACAGAGGCCATTGCAGTGGACAACCCAAGGATCACTCGTAAAGTCTTGGCTAATTTAGGTTTCTCTTACTTCGCGATCCGGGAGACGAGTCCTCCTTGACTAGTTCGTCGTGTAAATGTCTGTACACTAAAAACTCGATGGCGACTGCGAGTCTTTCGCCTCGATTTGCGTGGAAGTCTGTCGCGGATGTCATCATTAGCATCTGTGGAGACGTTCTCCCCTTTTAGAGCCTACGAGACGGTCATTCCAAGTCCCCGAAAATGATCTCAAACATTTCGACGACACTCATGCCCAAGTCATTCGGGAGTGGGCAATTCTTTAGTCTGGAACACGGGGGAGAGAGTTTGGGGGCTAGGATCGCCGGTCGTCGAAAGGAAGCCTGACTATTATGGGCGCGTCTGAAAACAAACCTCATCCGTATACGGCCCTCGACACTCAGCGGTTTCCAAGTTCAAAGGGCGAGCAGTCTATAAGCCACTCAGAAGCTCCACCCTGGAAATAAGCCGCCGGTTTAATGCGACTTTTTATAATAGGTGTGCCGCGAAAATCTGCGCACACATGCGGCAGCGTTGGTATGTCACGGGATATAAGTTGCGGGTCCGCGCCACTACATTGGGACTCGGAAGCTTAGTCGCGACCGCAAGCCCGCGCTGCCAGTCAGTACGTGGCAATCTAATAGATCCCTAGGGCAATGGCGACTCTCGCCGGAACTCGCCTAAGCGCACGCACGGCTATTCG"""

In [45]:
processFASTA_GC(actualFastaString)

Rosalind_9760 52.035203520352034
