# Review of last week's homework assignment

The homework last week instructed you to take some example sequences, put them in a file called `sequences.txt`, read in the file, calculate the GC content of each sequence, and then print only the highest GC content value.

Here are the example sequences that should be placed into `sequences.txt`:
```
AGCTCGATCGATACG
GGCTCTCAAG
CTAGCTAGACGA
```

First, we need to open the file and assign it to a variable:

In [1]:
seq_file = open("sequences.txt")

Next we will start a new list to hold the GC content values:

In [2]:
gc_cont_list = []

Next we will loop through the file to grab each sequence. There are two important things to note here. 

The first is that each sequence line will have a special line-ending character at the end of the line. On Mac/Unix, this will be `\n`. On Windows machines, lines end with `\r\n` instead (Python's `open()` normally converts this to `\n` when reading a text file). We will need to either ignore those characters or remove them from each sequence before we calculate the GC content. We will do this with a string method called `.strip()`. It takes a string of characters as an argument and removes any of those characters from both ends of the string. Called with no argument, it removes all leading and trailing whitespace, including `\n` and `\r`.
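
To make the behavior of `.strip()` concrete, here is a small sketch. The example strings are made up for illustration and are not read from `sequences.txt`:

```python
# Example lines as they might look with Mac/Unix and Windows line endings
unix_line = "GGCTCTCAAG\n"
windows_line = "GGCTCTCAAG\r\n"

# With no argument, .strip() removes all leading and trailing whitespace,
# including "\n" and "\r"
stripped_unix = unix_line.strip()
stripped_windows = windows_line.strip()

# With an argument, only the listed characters are removed (from both ends),
# so the "\r" is left behind here
only_newline_removed = windows_line.strip("\n")

# .rstrip() only touches the right-hand end of the string
right_only = "  GGCTCTCAAG\n".rstrip()

print(repr(stripped_unix))
print(repr(stripped_windows))
print(repr(only_newline_removed))
print(repr(right_only))
```

Using `repr()` when printing makes invisible characters such as `\r` show up in the output.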

The second is that, for this example, we will have a loop inside of a loop. The outer loop goes through the lines in the file, while the inner loop goes through the characters in each line. Be careful when using nested loops like this: they can greatly increase the complexity of your program and the time needed to run it.

In [None]:
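for line in seq_file.readlines():
    # Remove the trailing newline so it isn't counted in the length
    new_line = line.strip("\n")
    line_length = len(new_line)
    gc = 0
    # Count every G or C in the sequence
    for i in new_line:
        if i == "G" or i == "C":
            gc += 1
    # GC content is the fraction of bases that are G or C
    gc_cont = gc/line_length
    gc_cont_list.append(gc_cont)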
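
If you would rather not write the inner loop yourself, one possible alternative is the built-in string method `.count()`. This is only a sketch: the list `sequences` is a hypothetical stand-in for the lines of the file, so that the example runs on its own.

```python
# Hypothetical stand-in for the lines read from sequences.txt
sequences = ["AGCTCGATCGATACG\n", "GGCTCTCAAG\n", "CTAGCTAGACGA\n"]

gc_cont_list = []
for line in sequences:
    new_line = line.strip("\n")
    # .count() scans the string for us, so no hand-written inner loop is needed
    gc = new_line.count("G") + new_line.count("C")
    gc_cont_list.append(gc / len(new_line))

print(gc_cont_list)
```

The string still has to be scanned, but that work is now done by Python's built-in code, and the program is shorter and easier to read.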

Now let's take a look at the `gc_cont_list` to see if its contents make sense:

In [None]:
print(gc_cont_list)
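
As a cross-check, the same values can be worked out by hand. The sketch below records hand-tallied G/C counts and lengths for the three example sequences (the name `hand_counts` is only for illustration) and turns them into fractions:

```python
# (number of G or C, sequence length), tallied by hand for each example sequence
hand_counts = {
    "AGCTCGATCGATACG": (8, 15),
    "GGCTCTCAAG": (6, 10),
    "CTAGCTAGACGA": (6, 12),
}

# Divide each GC count by the sequence length
expected = [gc / length for gc, length in hand_counts.values()]
print(expected)
```

These come out to roughly 0.533, 0.6, and 0.5, so the list printed above should contain the same three values in the same order.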

That looks pretty good. It's often good to use an example like this that you can calculate by hand before applying it to a larger series of sequences. Now let's take the maximum value and print it:

In [None]:
max_gc = max(gc_cont_list)
print("Your maximum GC content is: " + str(max_gc) + "!")

And, finally, we want to make sure that we close the file.

In [None]:
seq_file.close()
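
A common alternative to calling `.close()` yourself is to open the file in a `with` block, which closes it automatically. The sketch below writes the example sequences to a temporary file first (an assumption made only so that it runs on its own), then reads them back:

```python
import os
import tempfile

# Write the example sequences to a temporary file so the sketch is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sequences.txt")
with open(path, "w") as out_file:
    out_file.write("AGCTCGATCGATACG\nGGCTCTCAAG\nCTAGCTAGACGA\n")

# The file is closed automatically when the with block ends,
# even if an error occurs inside it
with open(path) as seq_file:
    lines = seq_file.readlines()

print(seq_file.closed)
print(len(lines))
```

Because the file is closed as soon as the block ends, there is no `.close()` call to forget.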

There are some things that we could have done differently that might make this code more robust. For example, instead of using the `.strip()` method, we could have counted the number of `C`s and `G`s and divided that by the total number of `A`s, `C`s, `T`s, and `G`s. We could also have converted the whole sequence to uppercase letters so that we didn't miss any lowercase or soft-masked nucleotides. While we are doing all of that, we could also check for characters that aren't `ACTG` and flag them when we encounter them (keeping in mind that we wouldn't want to flag a `\n`).

Here is how that might look:

In [None]:
seq_file = open("sequences.txt")
gc_cont_list = []
for line in seq_file.readlines():
    new_line = line.upper()
    gc = 0
    at = 0
    for i in new_line:
        if i == "C" or i == "G":
            gc += 1
        elif i == "A" or i == "T":
            at += 1
        elif i == "\n":
            pass
        else:
            print("Warning: Uh oh, it looks like you have a character that is not A, C, T, or G." +
                  " This program only estimates GC content based on those characters. We" +
                  " will move on, but check your file to make sure your results are consistent.")
    gc_cont = (gc/(at + gc))
    gc_cont_list.append(gc_cont)
max_gc = max(gc_cont_list)
print("Your maximum GC content is: " + str(max_gc) + "!")
seq_file.close()

This might be a bit better than our example above, but we could still make it more robust to problems with the data. For example, it's probably better to raise an error when there is a character that doesn't match your expectations. This *forces* the user to fix the problem before moving on, which is usually the best call when the problem could affect the results. You can raise a `ValueError` to do that. We will go over an example where we do this in our next section on functions.
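
As a small preview, here is a minimal sketch of what raising a `ValueError` could look like without functions. The list `sequences` is hypothetical input containing an unexpected `N`, and the `try`/`except` is there only so that the sketch runs to completion and shows the message:

```python
# Hypothetical input containing an unexpected character ("N")
sequences = ["AGCTCGATCGATACG\n", "GGCTNTCAAG\n"]

gc_cont_list = []
error_message = None
try:
    for line in sequences:
        new_line = line.strip().upper()
        gc = 0
        for i in new_line:
            if i == "G" or i == "C":
                gc += 1
            elif i != "A" and i != "T":
                # Stop immediately instead of printing a warning and moving on
                raise ValueError("Unexpected character " + repr(i) + " in sequence " + new_line)
        gc_cont_list.append(gc / len(new_line))
except ValueError as err:
    # Caught here only so the sketch finishes; in a real script you would
    # usually let the error stop the program
    error_message = str(err)

print(error_message)
print(gc_cont_list)
```

Only the first sequence makes it into `gc_cont_list`, because the loop stops as soon as the `N` is found.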