# Computing GC content
## Problem

The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
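
As a quick check of the definition, the sketch below recomputes the example above and compares a string with its reverse complement. The helper names `gc_percent` and `reverse_complement` are illustrative, and the code assumes the input contains only the uppercase symbols A, C, G, and T.

```python
# Illustrative helpers; assumes uppercase A/C/G/T input only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_percent(dna):
    """Percentage of symbols in dna that are 'G' or 'C'."""
    return 100 * (dna.count("G") + dna.count("C")) / len(dna)

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

example = "AGCTATAG"
print(gc_percent(example))                      # 37.5
print(reverse_complement(example))              # CTATAGCT
print(gc_percent(reverse_complement(example)))  # 37.5
```

Complementing swaps every G with a C and vice versa, so the combined count of the two is unchanged, and reversing a string never changes its symbol counts.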

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called **FASTA** format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the first line to begin with '>' indicates the label of the next string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
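
To make the format concrete, here is a small sketch that pulls the label lines out of FASTA text and checks them against the "Rosalind_xxxx" shape. The two records and the helper name `rosalind_ids` are made up for illustration; only the '>' convention and the four-digit ID pattern come from the description above.

```python
import re

# Hypothetical two-record FASTA text; the sequences are made up.
SAMPLE = """>Rosalind_0001
ACGTAC
GGTA
>Rosalind_0002
TTGCAA
"""

# "Rosalind_" followed by exactly four digits.
ID_PATTERN = re.compile(r"Rosalind_\d{4}")

def rosalind_ids(text):
    """Return the labels of all '>' lines that match the Rosalind ID shape."""
    labels = [line[1:].strip() for line in text.splitlines() if line.startswith(">")]
    return [label for label in labels if ID_PATTERN.fullmatch(label)]

print(rosalind_ids(SAMPLE))  # ['Rosalind_0001', 'Rosalind_0002']
```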

> **Given:** At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

> **Return:** The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default absolute error of 0.001 in all decimal answers unless otherwise stated.
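
One way to read the 0.001 tolerance is as an absolute error: a submitted value passes when it differs from the reference value by at most 0.001. The check below is a sketch of that reading, not Rosalind's actual grader; `within_tolerance` is a hypothetical helper and the numbers are made up.

```python
def within_tolerance(answer, expected, tol=0.001):
    """True when answer is within tol of expected (absolute error)."""
    return abs(answer - expected) <= tol

print(within_tolerance(37.5004, 37.5))  # True
print(within_tolerance(37.51, 37.5))    # False
```

Under this reading, printing the GC-content with a handful of decimal places is enough; there is no need to round to a specific width.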

In [1]:
# Define the GC-content function first.
def gc_content(fasta):
    """Return the fraction (0 to 1) of symbols in a DNA string that are 'G' or 'C'."""
    return (fasta.count("G") + fasta.count("C")) / len(fasta)

In [None]:
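# Open and read the FASTA file into a {header: sequence} mapping.
records = {}
with open("rosalind_gc.txt", "r") as fasta:
    header = None
    for line in fasta:
        line = line.strip()
        if line.startswith(">"):
            # A '>' line starts a new record, so begin a fresh sequence.
            header = line[1:]
            records[header] = ""
        elif header is not None and line:
            # Sequence lines are appended in file order.
            records[header] += line

In [None]:
# Run the code: report only the record with the highest GC-content, as a percentage.
best_id = max(records, key=lambda key: gc_content(records[key]))
print(best_id + "\n" + str(gc_content(records[best_id]) * 100))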