# Introduction to Python for Biology
# Day 3

# Code Along

## Special Characters

If we want to start a new line in the middle of something we print out, we have to use the special character `\n` (newline).

There are some other special characters that we use when working with strings in Python. Most of them are a backslash followed by a letter. For example, you may also use the `\t` (tab) character. Note that we don't have to put spaces around a special character inside the string.
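Here is a small sketch of both special characters in action. The strings are made-up examples, not real data.

```python
# "\n" starts a new line, "\t" inserts a tab.
# Notice there are no spaces around the special characters.
two_lines = "ATGC\nGGCC"
print(two_lines)

columns = "seq1\tATGC"
print(columns)

# Each special character counts as ONE character, even though we type two.
print(len("\n"))  # 1
```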

## String Manipulation

If we want to glue together two strings, we can **concatenate** them using the `+` (plus) symbol.

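We often concatenate inside a call to `print()`. (You can also separate the items with commas, in which case `print()` puts a space between them and the items don't all have to be strings.)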
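A quick sketch of concatenation, using the scientific name of the Red Imported Fire Ant as example text. The variable names are just for illustration.

```python
genus = "Solenopsis"
species = "invicta"

# Glue the pieces together with +. We have to add the space ourselves.
full_name = genus + " " + species
print("Species: " + full_name)

# With commas, print() adds the space between items for us.
print("Species:", full_name)

# + only works between strings, so convert numbers with str() first.
count = 3
print("Count: " + str(count))
```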

Like we saw with lists yesterday, strings have a lot of built-in methods (functions that go along with them). Remember that string methods come after the name of the string variable, separated by a dot, and are followed by parentheses.


For example, there is a method `.lower()` that will change a string to all lowercase letters. This doesn't change the original variable. It returns a lowercase copy of the string (that you can save to a new variable).
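As a minimal sketch (the sequence is made up), we can check that the original variable really is left alone:

```python
dna = "ATGCCGTA"

# .lower() gives back a new string, so we save it to a new variable.
lower_dna = dna.lower()

print(lower_dna)  # atgccgta
print(dna)        # ATGCCGTA (the original is unchanged)
```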

Another useful string manipulation method is `.replace()`. It takes two arguments (both strings): the substring to look for and the string to put in its place. It returns a changed copy and leaves the original variable alone (so save the result if you want to use it again).
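For instance, here is a sketch that turns a made-up DNA string into its RNA equivalent by swapping every T for a U:

```python
dna = "ATGCCGTA"

# First argument: what to look for. Second argument: what to put there instead.
rna = dna.replace("T", "U")

print(rna)  # AUGCCGUA
print(dna)  # ATGCCGTA (still the original)
```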

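Remember how we pulled out items from a list yesterday using their indices? We can do the same with a string to extract a substring, by giving a start position and a stop position in square brackets, separated by a colon. Remember that Python starts counting at 0.

If we leave out the second number (the one after the colon), we'll get all of the letters from the start position up until the end of the string.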
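A short sketch of extracting substrings from a made-up sequence. Note that the stop position itself is not included in the result.

```python
dna = "ATGCCGTA"

# Positions 0, 1 and 2 (the stop position, 3, is left out).
first_codon = dna[0:3]
print(first_codon)  # ATG

# No second number: everything from position 3 to the end.
rest = dna[3:]
print(rest)  # CCGTA

# A single index gives a single character.
third_base = dna[2]
print(third_base)  # G
```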

In biology, we often need to count the number of times some pattern occurs in a string. `.count()` can help us count how many times a substring occurs in a string. It takes the substring as an argument and returns a number.
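As one possible use, here is a sketch that counts bases in a made-up sequence and works out its GC content:

```python
dna = "ATGCATGCCG"

a_count = dna.count("A")
print(a_count)  # 2

# The substring can be longer than one letter.
atg_count = dna.count("ATG")
print(atg_count)  # 2

# Fraction of the sequence that is G or C.
gc_content = (dna.count("G") + dna.count("C")) / len(dna)
print(gc_content)  # 0.6
```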

If we want to find the location of a substring, we can use `.find()`. It takes a single string argument and returns a number that is the position where that substring first appears in the string. If the substring isn't there at all, it returns -1.
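A small sketch with the same kind of made-up sequence. Positions are counted from 0, just like indices.

```python
dna = "ATGCATGCCG"

# Position of the FIRST place "GC" shows up.
first_gc = dna.find("GC")
print(first_gc)  # 2

ccg_position = dna.find("CCG")
print(ccg_position)  # 7

# A substring that never occurs gives -1.
missing = dna.find("TTT")
print(missing)  # -1
```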

Both `.count()` and `.find()` can only find exact matches. This doesn't work great for variable site pattern searches, but we'll learn regular expressions later (which will help us with that).

Another thing that we might want to do with strings is split them up into pieces. We can split a string into a list of items using `.split()` and then iterate over that list. `.split()` takes a single argument, which is the character we want to split on (we call this the **delimiter**).

```python
words = "red,green,blue,yellow"
```
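Here is a sketch of splitting that string on commas and looping over the result. The `words` string is repeated so the block runs on its own, and `colors` is just a name picked for illustration.

```python
words = "red,green,blue,yellow"

# The comma is the delimiter. It is removed, and the pieces go into a list.
colors = words.split(",")
print(colors)  # ['red', 'green', 'blue', 'yellow']

# Now we can iterate over the pieces like any other list.
for color in colors:
    print(color)
```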

## Reading Text from a File

As biologists, we often need to read in text from a file as part of a pipeline. Let's learn how to use Python to interact with files we have.

What kinds of text files do you use in your work? How is the data formatted? 

Before we can read a file, we have to open it. This creates a file object that we can give a variable name. 

Once we've opened the file, we can read it and then treat its contents sort of like a string. The file contents are different from the file object and from the name of the file. Confusing these three is a common cause of errors.
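To make the three things concrete, here is a sketch that opens and reads a small practice file. The file name `dna_example.txt` and its contents are made up, and the first few lines only create the file (in the system's temporary folder) so the example can run anywhere. In class you would simply open a file that already exists in your working folder.

```python
import os
import tempfile

# Setup only: make a small practice file so this example runs on its own.
# The file name is made up, and writing files is covered later today.
file_name = os.path.join(tempfile.gettempdir(), "dna_example.txt")
setup_file = open(file_name, "w")
setup_file.write("ATGCCGTA\n")
setup_file.close()

dna_file = open(file_name)   # the file object
contents = dna_file.read()   # the file contents (a string)
dna_file.close()

print(file_name)   # the name of the file (a string holding a path)
print(contents)    # the text that was inside the file
```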

Text that we read in from a file usually has a newline at the end. We can strip it off using the `.rstrip()` method, which takes the character you'd like to remove from the right-hand end of the string as its argument.

Commonly, we'll do this all in one line. You can chain Python methods together, one after another.
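The sketch below shows the step-by-step version next to the one-line version. As before, the practice file and its name are made up, and the setup lines exist only so the example runs on its own.

```python
import os
import tempfile

# Setup only: create a small practice file (made-up name and contents).
file_name = os.path.join(tempfile.gettempdir(), "dna_example.txt")
setup_file = open(file_name, "w")
setup_file.write("ATGCCGTA\n")
setup_file.close()

# Step by step: open, read, strip the newline, close.
dna_file = open(file_name)
contents = dna_file.read()
dna = contents.rstrip("\n")
dna_file.close()
print(len(contents), len(dna))  # 9 8

# All in one line: each method works on the result of the one before it.
# The file object is never given a name here, so we cannot close it ourselves.
dna_one_line = open(file_name).read().rstrip("\n")
print(dna_one_line)  # ATGCCGTA
```

Both versions give the same string, so pick whichever one you find easier to read.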

What is the best way to write it? The easiest way for you to understand and read it. If it makes more sense for you to write it out line by line for readability, go ahead. I often write my code line by line to start with and shorten it up in future passes. 

## Iterating Over Lines of Text in a File

Remember loops? We can treat file objects like lists and loop over them, with every line as an individual element. This is super useful if we need to process a file line by line.

Make sure you loop over the file object, not the contents of the file (the string you got from `.read()`). You'll know you've mixed these up if each pass through the loop gives you a single character instead of a whole line. It is helpful to ask yourself if you want to read your file in as one big chunk (in which case you use `.read()`) or if you want to read your file in line by line (in which case you should loop over the file object).
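Here is a sketch of the right way and the common mistake side by side. The three-line practice file is made up and is created by the setup lines so the example runs on its own.

```python
import os
import tempfile

# Setup only: a practice file with three made-up sequences, one per line.
file_name = os.path.join(tempfile.gettempdir(), "seqs_example.txt")
setup_file = open(file_name, "w")
setup_file.write("ATGC\nGGCC\nTTAA\n")
setup_file.close()

# Right: loop over the file object, one line at a time.
seq_file = open(file_name)
lines = []
for line in seq_file:
    lines.append(line.rstrip("\n"))
seq_file.close()
print(lines)  # ['ATGC', 'GGCC', 'TTAA']

# Mistake: loop over the contents (a string), one CHARACTER at a time.
seq_file = open(file_name)
contents = seq_file.read()
seq_file.close()
pieces = []
for piece in contents:
    pieces.append(piece)
print(pieces[:3])  # ['A', 'T', 'G']
```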

Another thing to watch out for is looping over the same file object twice. You may have run into this already if you tried to rerun code from above: file objects are exhaustible. Once you've looped over a file, Python remembers that it is at the end of the file, so a second loop finds no more lines and does nothing (without any error message). You can close and reopen the file if you want to loop over it again or (better idea) you can read the contents into a list and iterate over the list multiple times without a problem.
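A sketch of what exhaustion looks like in practice, counting how many lines each loop sees. The practice file is made up and created by the setup lines.

```python
import os
import tempfile

# Setup only: a practice file with three made-up lines.
file_name = os.path.join(tempfile.gettempdir(), "seqs_example.txt")
setup_file = open(file_name, "w")
setup_file.write("ATGC\nGGCC\nTTAA\n")
setup_file.close()

seq_file = open(file_name)

first_pass = 0
for line in seq_file:
    first_pass = first_pass + 1

# Same file object again: we are already at the end, so the loop body never runs.
second_pass = 0
for line in seq_file:
    second_pass = second_pass + 1

seq_file.close()
print(first_pass, second_pass)  # 3 0
```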

The `.readlines()` method, which is used on file objects, will read the lines of a file into a list.

First we will store a list of lines in the file.

Then we can do stuff with the list by looping over it.
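Those two steps might look like the sketch below. The practice file is made up and created by the setup lines, and `all_lines` is just a name chosen for illustration.

```python
import os
import tempfile

# Setup only: a practice file with three made-up lines.
file_name = os.path.join(tempfile.gettempdir(), "seqs_example.txt")
setup_file = open(file_name, "w")
setup_file.write("ATGC\nGGCC\nTTAA\n")
setup_file.close()

# Store the lines in a list. Each item still ends with its newline.
seq_file = open(file_name)
all_lines = seq_file.readlines()
seq_file.close()
print(all_lines)  # ['ATGC\n', 'GGCC\n', 'TTAA\n']

# A list is not exhaustible, so we can loop over it as often as we like.
for line in all_lines:
    print(line.rstrip("\n"))

total_bases = 0
for line in all_lines:
    total_bases = total_bases + len(line.rstrip("\n"))
print(total_bases)  # 12
```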

## Writing to a File

Let's take a moment and look at the Python documentation (either by Googling or by using cmd/ctrl + tab) to try to figure out how to use the `open()` function to write to a file. 

We see that the `open()` function takes an optional second argument, and that we can pass "w" to open a file for writing.

This second argument can be "r" for reading (it is this by default if we leave it off), "w" for writing, or "a" for appending. "w" will overwrite an existing file, while "a" will add new data to the end of the file without removing content. (If the file doesn't exist yet, both "w" and "a" create it.)

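Now that we've opened a file for writing, we can use the `.write()` method to write some text to it. This method is a lot like `print()` and takes a single string as its argument. (That string can also be the result of a method or function call that returns a string.) Unlike `print()`, `.write()` does not add a newline at the end, so include `\n` yourself when you want one.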
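Here is a sketch of writing and then appending. The file name `output_example.txt` is made up, and the file is placed in the system's temporary folder so the example can run anywhere. With a plain file name instead, the file would land in the folder you are working in. The `.close()` calls make sure the text is actually saved before we reopen the file.

```python
import os
import tempfile

# Made-up file name, placed in the temporary folder for this example.
out_name = os.path.join(tempfile.gettempdir(), "output_example.txt")

# "w": start a fresh file (anything already in it is overwritten).
out_file = open(out_name, "w")
out_file.write("ATGC\n")
out_file.write("ggcc".upper() + "\n")  # the result of a method call is a string too
out_file.close()

# "a": add to the end without removing what is there.
out_file = open(out_name, "a")
out_file.write("TTAA\n")
out_file.close()

# Read it back to check what ended up in the file.
check_file = open(out_name)
written = check_file.read()
check_file.close()
print(written)
```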

If we check the folder we are currently working in, we can see we now have a new file with the name we gave it. Let's open it and check the file contents. 

## Closing Files

We'll also need to call the `.close()` method on the file when we are done reading from it or writing to it. (Note that `.close()` is a method and `open()` is a function.) This is a good habit to have and will prevent errors that are hard to track down.
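One way to see the difference is the `.closed` attribute that every file object has. A small sketch, with a made-up file name in the temporary folder:

```python
import os
import tempfile

# Made-up file name, placed in the temporary folder for this example.
out_name = os.path.join(tempfile.gettempdir(), "close_example.txt")

out_file = open(out_name, "w")   # open() is a function
out_file.write("ATGC\n")

closed_before = out_file.closed
print(closed_before)  # False

out_file.close()                 # .close() is a method on the file object

closed_after = out_file.closed
print(closed_after)   # True
```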

## Taking User Input

To interact with the program user and get their input, we can use the `input()` function. This function takes a string as its argument. 

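Whatever the user types gets stored as a string (without the newline from pressing Enter). So if you want a number, you have to change the data type, for example with `int()` or `float()`. And remember that `.rstrip()` can remove leftover characters from the end of the string if you don't want them.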

When we take input from the user, we open ourselves up to new and fun errors if they don't know they are supposed to input something or input it in the wrong format. So, it's a good idea to do input validation to make sure their input makes sense. 
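A simple validation sketch is shown below. So that it can run without anyone typing, a stand-in string takes the place of the `input()` call (the prompt text and the stand-in value are made up). `.strip()` removes whitespace from both ends, and `.isdigit()` is a quick check that only digits are left.

```python
# In a real run we would ask the user:
#     user_text = input("How many sequences do you have? ")
# A stand-in value is used here so the example runs without typing anything.
user_text = " 12 "

cleaned = user_text.strip()   # remove whitespace from both ends

if cleaned.isdigit():         # True only if every character is a digit
    number = int(cleaned)     # convert the string to an integer
    print("Doubling that gives", number * 2)
else:
    number = None
    print("Please enter a whole number, like 12")
```

Try changing the stand-in value to something like "twelve" to see the other branch run.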

This sort of "defensive programming" and testing is really important to creating quality code, and we'll come back to it! We'll learn exceptions later, and that will be a better way to check our user input.

# Independent Practice

### Creating a FASTA file
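FASTA is a file format that is used to store DNA and protein sequence data. Each sequence has a header row that starts with a greater-than symbol (`>`) followed by the accession name, and the sequence itself goes on the line(s) below the header. There may be multiple sequences in one file.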

\>sequence_one
<br>
GTTTCAAAGAT
<br>
\>sequence_two
<br>
ATCAGATCGGA
<br>
\>sequence_three
<br>
ACTGCATCGTACT


Write a Python program that will make a FASTA file containing the following sequences. Make sure all of the sequences are in uppercase letters.

SEQ1: atcggccatctagccgg
<br>
SEQ2: ACTGTACATGTGCGCTAG
<br>
SEQ3: ccatctagcTGTAC

### Creating Multiple FASTA files
Use the sequences from the previous exercise, but instead create three new files in the FASTA format (one sequence per file). The names of the files will be the same as the sequence names and end in ".fasta".

### Bonus Problem: Fire Ant DNA Sequences
In the data folder there is a file called "test_ant.txt" that contains genomic data from the Red Imported Fire Ant. Write out each DNA sequence into its own separate file. 