# Working with text files

Now that we can process text, all we need is... more text. And odds are, that text is going to come in the form of a file, so it's high time we started working with files.

## Opening Filehandles

A **filehandle** is an object that controls the stream of information between your program and a file stored somewhere on the computer. Filehandles are not filenames, and they are not the files themselves. Much as a variable refers to a value, a filehandle refers to a file on the hard drive or other storage media. But unlike ordinary variables, filehandles also keep track of your current read position in the file. Imagine your file is like a book in a library. The filehandle tells Python where that book is, and keeps a bookmark in the book for where you currently are. Because filehandles are not the files themselves, deleting a filehandle in your script using the **del** statement does nothing to the file to which the filehandle refers.
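
The bookmark idea can be watched directly. This is a minimal sketch, assuming a throwaway file (`demo.txt` in a temporary directory, a made-up name that is not part of this lesson's data); `tell()` reports the current read position and `del` removes only the name `fh`. `print` is written with parentheses so the snippet runs under both Python 2 and Python 3.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# demo_path is a hypothetical name, not one of the lesson's files.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as out:
    out.write('first line\nsecond line\n')

fh = open(demo_path)
start = fh.tell()            # the bookmark starts at the beginning
first = fh.readline()        # reading moves the bookmark forward
after_one_line = fh.tell()
fh.close()

del fh                       # removes the name fh, not the file on disk
still_there = os.path.exists(demo_path)

print(start)
print(after_one_line)
print(still_there)
```

The file survives the `del`, which is the point of the library analogy: throwing away the bookmark does not burn the book.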

We create filehandles in the simplest sense with the **open()** function:

```python
fh = open('some_file')
```

where `some_file` is the path to a file (i.e. the filename) on your filesystem. In general, it is good practice to use absolute paths (e.g. /Users/aaron/some_file or /home/aaron/some_file), but you can be lazy and give just the filename if you know the file is going to be in the directory your program is run from (the current working directory).
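
To see what a lazy filename actually points at, the standard `os` module can expand it. A small sketch, assuming nothing about whether `hello.txt` exists; `os.path.abspath()` only builds the path, it does not open anything.

```python
import os

relative_name = 'hello.txt'

# abspath() resolves a relative name against the current working directory.
absolute_name = os.path.abspath(relative_name)

print(os.getcwd())                     # where Python will look for bare filenames
print(absolute_name)                   # the full path that 'hello.txt' expands to
print(os.path.isabs(absolute_name))
```

If `open()` cannot find a file you are sure exists, printing `os.getcwd()` is usually the quickest way to find out why.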

In [None]:
fh = open('hello.txt')
contents = fh.read()
print contents
fh.close()

As you can see, the **read()** method of the filehandle just sucks in the whole file in a single string, newlines and all! This is quick and easy, for sure, but it's not necessarily the most orderly way to deal with the contents of a file.
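
If you do want the whole file in one go but still want to work line by line, the single string can be split afterwards. A minimal sketch, assuming a made-up two-line file written to a temporary directory rather than the lesson's `hello.txt`:

```python
import os
import tempfile

# Hypothetical stand-in for hello.txt so the example runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as out:
    out.write('Hello\nWorld\n')

with open(demo_path) as fh:
    contents = fh.read()          # one string, newlines included

lines = contents.splitlines()     # a list of lines, newlines removed

print(repr(contents))
print(lines)
```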

In [None]:
with open('hello.txt') as fp:
    for cur_line in fp:
        print cur_line,

The first line of this program has a lot going on. Let's start by looking at the **open()** function. To do any work with a file, even just printing its contents, you first need to open the file to access it. The **open()** function needs one argument: the name of the file you want to open. Given a bare filename like this one, Python looks for the file in the current working directory, which is usually the directory the program or notebook was started from. The `as fp` part gives the resulting filehandle the name **fp** for the duration of the indented block.

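The next line, `for cur_line in fp:`, loops over the file's lines via the **fp** filehandle variable: it executes everything in the code block below it once per line, putting a new line of text in **cur_line** on each loop. The trailing comma after `print cur_line` stops print from adding a second newline, since each line read from the file already ends with one.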

(Another nicety of this **with** construct is that when program execution leaves the code block below it, the filehandle is closed for you. Without **with**, you have to close the filehandle yourself, and this is important to do: closing flushes any waiting output to disk and releases the filehandle for reuse, and there are only a finite number of them.)
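
The automatic closing can be checked with the filehandle's `closed` attribute. The sketch below assumes a throwaway file with a made-up name, and also shows one way to get the same guarantee by hand with `try`/`finally`:

```python
import os
import tempfile

# Hypothetical demo file so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demo_path, 'w') as out:
    out.write('one\ntwo\n')

# Form 1: with closes the filehandle when the block ends.
with open(demo_path) as fp:
    open_inside_block = not fp.closed
closed_after_block = fp.closed

# Form 2: the manual equivalent, closing even if an error occurs.
fh = open(demo_path)
try:
    line_count = sum(1 for _ in fh)
finally:
    fh.close()

print(open_inside_block)
print(closed_after_block)
print(line_count)
```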

We can do all our string tricks on each line. Let's use a different form, and a different kind of input file...

## Reading our first sequences!

In [None]:
# Printing all the lines...
fastq_file = open("happy_face.fastq")
for cur_line in fastq_file:
    print cur_line.strip()
fastq_file.close()

In [None]:
# Let's start by printing only lines that start with "@"
fastq_file = open("happy_face.fastq")
for cur_line in fastq_file:
    cur_line = cur_line.rstrip()
    if cur_line.startswith('@'):
        print "Sequence name and description: ", cur_line
fastq_file.close()

In [None]:
# First version of the function - splits on "@" and prints the list
def print_sequence_name(cur_line):
    as_list = cur_line.split('@')
    print as_list
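
Splitting on "@" leaves an empty string at the front of the list, because there is nothing before the first "@". A small sketch of what comes back, using a made-up header line (not taken from `happy_face.fastq`), plus one possible way to separate the name from the description:

```python
# A hypothetical FASTQ header line, invented for illustration.
header = '@read1 example description'

as_list = header.split('@')
print(as_list)                 # the first element is the empty string

# One alternative: drop the leading '@', then cut at the first space.
# partition() also copes with headers that have no description at all.
name, _, description = header[1:].partition(' ')
print(name)
print(description)
```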

In [None]:
def simplefunction():
    i = 3
    m = i * 45
    d = m // 15
    return d
print simplefunction()

In [None]:
print simplefunction()

In [None]:
%who

In [None]:
# Now, let's send each line that starts with @ to a function...
fastq_file = open("happy_face.fastq")
for cur_line in fastq_file:
    cur_line = cur_line.strip()
    if cur_line.startswith('@'):
        print "Sequence name and description: ", cur_line
        print_sequence_name(cur_line)
fastq_file.close()
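
One caveat with testing for a leading "@": in FASTQ, "@" is also a legal quality character, so a quality line can start with it too. A minimal sketch of a position-based alternative, assuming the common layout of exactly four lines per record (header, sequence, "+", quality); the function name `read_fastq_names` and the demo file are made up for illustration.

```python
import os
import tempfile

def read_fastq_names(path):
    """Return each record's header text without the leading '@'.

    Assumes exactly four lines per record: header, sequence, '+', quality.
    """
    names = []
    with open(path) as handle:
        for line_number, cur_line in enumerate(handle):
            if line_number % 4 == 0:              # headers are lines 0, 4, 8, ...
                names.append(cur_line.rstrip()[1:])
    return names

# Tiny invented FASTQ file; the second record's quality line starts with '@'.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.fastq')
with open(demo_path, 'w') as out:
    out.write('@read1 first\nACGT\n+\nIIII\n')
    out.write('@read2 second\nTTGA\n+\n@III\n')

# Counting by leading '@' over-counts: it also matches the quality line.
with open(demo_path) as handle:
    naive_hits = sum(1 for line in handle if line.startswith('@'))

print(naive_hits)
print(read_fastq_names(demo_path))
```

Files that wrap sequences over several lines break the four-line assumption, so treat this as a sketch rather than a general parser.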

## Writing files

In [None]:
%%bash
ls

In [None]:
output_file = open("output_file", "w")
output_file.write("one line file! We win!\n")
output_file.close()  # flush the text to disk so the next cells can see it

In [None]:
%%bash
ls

In [None]:
%%bash
cat output_file
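
The mode string passed to **open()** decides what happens to existing contents: "w" starts the file from scratch, while "a" appends to the end. A short sketch, assuming a throwaway file in a temporary directory (a made-up path, so it will not touch `output_file`), using **with** so each filehandle is closed and flushed automatically:

```python
import os
import tempfile

# Hypothetical output location, chosen so nothing in the lesson is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_output.txt')

with open(demo_path, 'w') as out:      # 'w' creates or truncates the file
    out.write('first line\n')

with open(demo_path, 'a') as out:      # 'a' adds to the end instead
    out.write('second line\n')

with open(demo_path) as fh:            # default mode is 'r' for reading
    written = fh.read()

print(written)
```

Note that `write()` does not add a newline for you, which is why each string ends in `\n`.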

# Stupid letter tricks

ASCII is fun: every character has a corresponding numerical value. The built-in function **chr()** converts a number to its character. The lowercase alphabet happens to start at ASCII 97.

In [None]:
print chr(97)

In [None]:
for i in range(97,97+26):
    print chr(i),
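
The conversion also works in the other direction with the built-in **ord()**, which turns a single character into its number. A small sketch; the string `'acgt'` is just an illustrative input.

```python
code = ord('a')                 # 97
letter = chr(code)              # back to 'a'

# Build the lowercase alphabet without hard-coding 97.
lowercase = ''.join(chr(i) for i in range(ord('a'), ord('z') + 1))

# Uppercase letters sit a fixed distance away from lowercase ones in ASCII.
offset = ord('A') - ord('a')    # -32
shouted = ''.join(chr(ord(c) + offset) for c in 'acgt')

print(code)
print(letter)
print(lowercase)
print(shouted)
```

In real code `'acgt'.upper()` is the sensible way to change case; the arithmetic is only here to show how the codes are laid out.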

In [None]:
print "table of char codes can be found at https://unicode-table.com"
print "notice strings are prefixed with a u in the Python code"
print
print u"using unicode: \u00DCmlaut \u00FC \u00C6 \u0996"
print u"using unicode: \u2713 \u262F \u221E \u2766 \u2705"
print u"so many cats:  \U0001F408  \U0001F431  \U0001F638  \U0001F639  \U0001F63A  \U0001F63B",
print u" \U0001F63C  \U0001F63D  \U0001F63E  \U0001F63F  \U0001F640"
print u" \U00011A31" 
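
Non-ASCII characters need an encoding once they are written to a file. The sketch below assumes UTF-8 and a throwaway file with a made-up name; it uses `io.open()`, which accepts an `encoding` argument in both Python 2 and Python 3, and compares the number of characters with the number of bytes on disk.

```python
import io
import os
import tempfile

text = u'\u00DCmlaut \u2713'      # 8 characters

# Hypothetical file location for the demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), 'unicode_demo.txt')

with io.open(demo_path, 'w', encoding='utf-8') as out:
    out.write(text)

with io.open(demo_path, encoding='utf-8') as fh:
    read_back = fh.read()         # decoded back into characters

with io.open(demo_path, 'rb') as fh:
    raw = fh.read()               # the bytes actually stored on disk

print(len(text))
print(len(raw))
print(read_back == text)
```

The byte count is larger than the character count because UTF-8 stores some characters in more than one byte.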

<img src="the_history_of_unicode_2x.png">