# [Identifying Unknown DNA Quickly](http://rosalind.info/problems/gc/)

**Problem**
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
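The definition and the reverse-complement remark can be checked on the example string. The following is a minimal sketch; the helper names `gc_percent` and `reverse_complement` are illustrative and are not part of the problem statement.

```python
def gc_percent(dna):
    """Percentage of symbols in dna that are 'G' or 'C'."""
    return 100 * (dna.count("G") + dna.count("C")) / len(dna)


def reverse_complement(dna):
    """Reverse the string and swap A<->T and C<->G."""
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


example = "AGCTATAG"
print(gc_percent(example))                      # 37.5
print(reverse_complement(example))              # CTATAGCT
print(gc_percent(reverse_complement(example)))  # 37.5
```

Complementing turns every 'G' into a 'C' and every 'C' into a 'G', so the combined count of the two symbols is unchanged, and reversing does not change any counts at all.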

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
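If labels need to be validated, the ID convention can be expressed as a regular expression. This is only a sketch: the pattern and the helper `is_rosalind_id` are assumptions derived from the sentence above, not something Rosalind provides.

```python
import re

# Assumed pattern: the literal prefix "Rosalind_" followed by exactly four digits.
ROSALIND_ID = re.compile(r"Rosalind_[0-9]{4}")


def is_rosalind_id(label):
    """Return True if label has the form 'Rosalind_xxxx' with a four-digit code."""
    return ROSALIND_ID.fullmatch(label) is not None


print(is_rosalind_id("Rosalind_0808"))   # True
print(is_rosalind_id("Rosalind_808"))    # False, only three digits
print(is_rosalind_id(">Rosalind_0808"))  # False, the FASTA '>' marker is not part of the ID
```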

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated. The error is absolute: an answer is accepted if it differs from the correct value by at most 0.001.
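The tolerance can be read as an absolute-error check. A minimal sketch, assuming a hypothetical helper `within_tolerance` (Rosalind's actual checker is not shown in this document):

```python
import math


def within_tolerance(answer, expected, tol=0.001):
    """True if answer differs from expected by at most tol (absolute error)."""
    # rel_tol=0.0 disables the relative check so that only abs_tol applies.
    return math.isclose(answer, expected, rel_tol=0.0, abs_tol=tol)


print(within_tolerance(37.5004, 37.5))  # True, off by 0.0004
print(within_tolerance(37.502, 37.5))   # False, off by 0.002
```

In practice this means printing the GC-content with four or more decimal places is enough.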

**Sample Dataset**
```
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
```

**Sample Output**
```
Rosalind_0808
60.919540
```
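The sample output can be reproduced directly from the sample dataset. The sketch below embeds the three records as a string and uses illustrative names (`records`, `gc_percent`, `best_id`); it is a compact check, not necessarily the approach taken in the solution cells.

```python
sample = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""

# Collect each record, joining sequence lines that belong to the same ID.
records = {}
current = None
for line in sample.splitlines():
    if line.startswith(">"):
        current = line[1:]          # drop the '>' marker, keep the ID
        records[current] = ""
    elif line:
        records[current] += line


def gc_percent(dna):
    """Percentage of symbols in dna that are 'G' or 'C'."""
    return 100 * (dna.count("G") + dna.count("C")) / len(dna)


best_id = max(records, key=lambda rid: gc_percent(records[rid]))
print(best_id)                               # Rosalind_0808
print(f"{gc_percent(records[best_id]):.6f}")  # 60.919540
```

Rosalind_0808 has 53 'G' or 'C' symbols out of 87, and 53 / 87 × 100 ≈ 60.919540.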

In [None]:
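def parse(data):
    """Map each FASTA header line (including its leading '>') to its full sequence."""
    dna_strings = {}
    name = ''

    for line in data.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines; indexing line[0] on them would raise IndexError
        if line.startswith('>'):
            name = line
            dna_strings[name] = ''
        else:
            # Sequence lines of one record are concatenated in order.
            dna_strings[name] += line

    return dna_strings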

In [None]:
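def get_gc_content(dna_string):
    """Return the GC-content of dna_string as a percentage."""
    if not dna_string:
        return 0.0  # avoid dividing by zero on an empty sequence

    gc_count = dna_string.count('G') + dna_string.count('C')
    return gc_count * 100 / len(dna_string)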

In [None]:
# Read the dataset; the with-block closes the file automatically.
with open('./dataset.txt', 'r') as file:
    data = file.read().strip()

dna_strings = parse(data)
highest_gc_content_name = ''
highest_gc_content_value = 0

for name, dna_string in dna_strings.items():
    gc_content = get_gc_content(dna_string)

    if highest_gc_content_value < gc_content:
        highest_gc_content_value = gc_content
        highest_gc_content_name = name

# Drop the leading '>' from the header so that only the ID is printed,
# and format the value with six decimals as in the sample output.
print(highest_gc_content_name.lstrip('>'))
print(f'{highest_gc_content_value:.6f}')