<a id='section2'></a>
## The File Object

A file on your computer might be a document, a data file, Python source code, and so on. We are going to focus on ASCII text files, which you can think of as a sequence of characters stored somewhere on your computer.

It is pretty common to have programs that read and write files: read in some data, do some calculation, write out the results. This is pretty much the core of any data science task.

In Python, files are represented by objects, and we will look at some basic methods of the file object here.

## Working with Files

Working with files is a lot like working with a physical notebook. 

- A file has to be opened. 
- When you are done, it has to be closed. 
- While the file is open, it can either be read from or written to. 
- Like a bookmark, the file keeps track of where you are reading from or writing to. 
- You can read the whole file in its natural order or you can skip around. 
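
The bookmark idea can be sketched with the file object's `tell` and `seek` methods, which report and move the current position. This is only a minimal illustration: the file name `bookmark_demo.txt` and the variable names are made up for the example.

```python
# "bookmark_demo.txt" is a hypothetical file name used only for this sketch.
demo = open("bookmark_demo.txt", "w")
demo.write("abcdefghij")
demo.close()

demo = open("bookmark_demo.txt", "r")
start = demo.tell()          # where the bookmark is right after opening
first_three = demo.read(3)   # read 3 characters; the bookmark moves forward
demo.seek(0)                 # move the bookmark back to the beginning
again = demo.read(3)         # the same 3 characters are read a second time
demo.close()

print(start, first_three, again)
```

Reading in the natural order never needs `seek`; it only matters when you want to skip around.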

### Opening and Closing a File

Python has a built-in function, `open`, that takes the filename and the mode of access ("w" = write, "r" = read, "a" = append).

In [None]:
myfile = open("test.txt", "w")
#type(myfile)
#dir(myfile)

This command will open `test.txt` in the folder where the program is being executed. If `test.txt` does not exist it will be created. If it does exist, it will be **over-written!!!**
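
As a small sketch of the difference between the "w" and "a" modes, the cell below writes to the same file three times. The file name `modes_demo.txt` is hypothetical and chosen so that `test.txt` is left alone.

```python
# "modes_demo.txt" is a hypothetical file name used only for this sketch.
f = open("modes_demo.txt", "w")   # the file does not exist yet, so it is created
f.write("first\n")
f.close()

f = open("modes_demo.txt", "w")   # "w" again: the old contents are discarded
f.write("second\n")
f.close()

f = open("modes_demo.txt", "a")   # "a": new text goes after the existing contents
f.write("third\n")
f.close()

f = open("modes_demo.txt", "r")
result = f.read()
f.close()

print(result)   # "first" is gone; "second" and "third" remain
```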

`myfile` is an object that keeps track of information about the file (e.g., where you are in it). If you want to write to (or read from) the file, you need to do so via the file object.

In [None]:
myfile.write("CATS!")
myfile.close()

The `write` method writes a string to `myfile`. It is like `print` but does not add a newline. So:

In [None]:
myfile = open("test.txt", "w")

myfile.write("CATS!")
myfile.write("\n")
myfile.write("I <3 APS106\n")  #need to add \n newline character, unlike print()

myfile.close()

In [None]:
myfile = open('test.txt','w')  # try changing the mode between 'a' and 'w': what happens to the file?
myfile.write('hola\n')
myfile.close()

Each `write` statement writes its string wherever the previous one left off. When we are done, the file needs to be closed. This tells the file object that we are done and it should clean things up.

In [None]:
myfile = open('grades.txt','w')  # try changing the mode between 'a' and 'w': what happens to the file?

students = 'Kendrick:A+\nDre:C-\nSnoop:B\n'

myfile.write(students)
myfile.close()

Now we can go to the folder where the Jupyter notebook is and observe that there is a file there called `test.txt` containing the lines that we wrote out.

<a id='section3'></a>
## Reading Files

Now that the file exists on our disk, we can open it again, this time for reading, and read its contents. This time, the mode argument is "r" for reading:

In [None]:
myfile = open("test.txt", "r")
print(myfile.read())
myfile.close()  # close the file once we are done reading

There are four common ways to read a file. 

In [None]:
#Execute this cell to create a flanders.txt file in your working directory

flanders_file = open('flanders.txt','w')
flanders_file.write('''
In Flanders Fields

In Flanders fields the poppies blow 
Between the crosses, row on row,
That mark our place; and in the sky
The larks, still bravely singing, fly
Scarce heard amid the guns below.
We are the Dead. Short days ago
We lived, felt dawn, saw sunset glow, 
Loved and were loved, and now we lie
In Flanders fields.
Take up our quarrel with the foe:
To you from failing hands we throw
The torch; be yours to hold it high.
If ye break faith with us who die
We shall not sleep, though poppies grow 
In Flanders fields.''')
flanders_file.close()

### The read approach

Read the whole file into a string. **Beware: If the file is huge, this can create problems!**

In [None]:
flanders_file = open("flanders.txt", 'r')
flanders_poem = flanders_file.read()
flanders_file.close()

print(type(flanders_file))
print(type(flanders_poem))
print(flanders_poem)

Q: If `flanders_poem` is a string, why does it print out across multiple lines?
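
Coming back to the warning about huge files: one possible way around it, sketched here as an illustration rather than the recommended approach, is to pass a size to `read` so that only a limited number of characters is in memory at a time. The file name `chunk_demo.txt` and the chunk size of 10 are arbitrary choices for this example.

```python
# "chunk_demo.txt" is a hypothetical file name used only for this sketch.
demo = open("chunk_demo.txt", "w")
demo.write("In Flanders fields the poppies blow\n")
demo.close()

demo = open("chunk_demo.txt", "r")
num_chunks = 0
total_chars = 0

chunk = demo.read(10)         # read at most 10 characters
while chunk != '':            # read returns an empty string at the end of the file
    num_chunks += 1
    total_chars += len(chunk)
    chunk = demo.read(10)     # continue from where the last read stopped
demo.close()

print(num_chunks, total_chars)
```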

### The readline approach

Read the file line-by-line into a string. This is a safer thing to do as the whole file never gets put in memory at once. Note that the file must be kept open if you still want to read the next line - unlike above where you can close the file immediately after `read()`.

In [None]:
# Approach: readline
# When to use it: When you want to process the file line-by-line
# Example code
myfile = open("flanders.txt", 'r')

line = myfile.readline()
#print(line)
contents = ''

while line != '': #while line is not an empty string
#while line: #while line is not an empty string (empty strings evaluate to False)
    contents += line  #concatenate the current line with the previous lines
    line = myfile.readline() # each time through the loop, line contains one line of the file
myfile.close()

print(contents)


# by the end of this loop, contents contains the entire contents of the file

In [None]:
# If the previous cell was run, myfile is already closed, so this raises a ValueError
print(myfile.readline())

### The for line in file approach

For the next two ways to read a file, we need a preview of material coming next week. There is another form of loop in Python called a **for-loop** that looks like the following:

```
for item in iterable:
    body
```
Similar to `if` and `while` statements, there are two things to note here:
- There must be a colon (:) at the end of the `for` statement.
- The body must be indented.

An "iterable" can be anything that can be 'iterated' over. 'Iterate' means to do something repeatedly. In this case, an iterable is a collection of items and we can loop over them.



Like the `readline` approach, this approach also reads in the file line-by-line. It uses the `in` operator in a for-loop. 

In [None]:
# Approach: for line in file
# When to use it: When you want to process the file line-by-line
# Example code
myfile = open("flanders.txt", 'r')
contents = ''
for line in myfile: # each time through the loop line contains one line of the file
    contents += line
    #print(line)  #why is there a gap between rows when printing like this?
myfile.close()

print(contents)
# by the end of this loop, contents contains the entire contents of the file

In the example above, the variable `line` is assigned the next line of the file each time through the loop. Each line is a string that normally ends in '\n'; the last line of a file may not.
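
Because each line usually carries its own '\n', printing it with a plain `print` produces an extra blank row. A small sketch of one way to see and remove that newline is below; the file name `newline_demo.txt` is hypothetical.

```python
# "newline_demo.txt" is a hypothetical file name used only for this sketch.
demo = open("newline_demo.txt", "w")
demo.write("first line\nsecond line\n")
demo.close()

demo = open("newline_demo.txt", "r")
for line in demo:
    print(repr(line))          # repr makes the trailing '\n' visible
    print(line.rstrip("\n"))   # strip it so print adds the only newline
demo.close()
```

Another option is `print(line, end="")`, which keeps the line as it is and stops `print` from adding a second newline.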

### The readlines approach

The `readlines` approach reads the whole file in (like `read`) but rather than putting the file in one big string, it creates a list where each line of the file is an entry in the list.

**We haven't actually got to lists yet in this course. For now just remember that there is a way to read lines of a file into a list and that list is an iterable.**

In [None]:
# Approach: readlines
# When to use it: When you want to process the file line-by-line with an index
# Example code
myfile = open("flanders.txt", 'r')
lines = myfile.readlines() # lines is a list of strings. Each entry in lines is a line of the file

print(type(lines))
print(len(lines))
print(type(lines[0]))

print(lines)

myfile.close()
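
As a rough sketch of what "with an index" can look like, the cell below picks out one line by position and then walks through the list with a `while` loop. The file name `index_demo.txt` and its contents are made up for the example.

```python
# "index_demo.txt" is a hypothetical file name used only for this sketch.
demo = open("index_demo.txt", "w")
demo.write("alpha\nbeta\ngamma\n")
demo.close()

demo = open("index_demo.txt", "r")
lines = demo.readlines()   # the whole file, one list entry per line
demo.close()               # the list is still usable after the file is closed

second = lines[1]          # indexing starts at 0, so this is the second line
print(second, end="")

i = 0
while i < len(lines):
    print(i, lines[i], end="")   # each entry still ends in '\n'
    i += 1
```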

## The with Statement

Notice that whenever we open a file, we need to be careful to close it again. Python provides a nice way to open and then automatically close a file using a `with` block.

```
with open(«filename», «mode») as «variable»:
    «body»
```

The file is opened at the beginning and **automatically closed** at the end of the body. 
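
One way to check this for yourself is the `closed` attribute of the file object. The sketch below uses a hypothetical file name, `with_demo.txt`.

```python
# "with_demo.txt" is a hypothetical file name used only for this sketch.
with open("with_demo.txt", "w") as demo:
    demo.write("hello\n")
    inside = demo.closed    # still inside the body: the file is open

outside = demo.closed       # after the body: the file has been closed for us

print(inside, outside)      # prints: False True
```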


In [None]:
with open('test.txt', 'r') as file:
    print(file.read())


    
print("The next line")

In [None]:
with open("flanders.txt", 'r') as flanders_file:
    for line in flanders_file:
        print(line, end="")

The use of `with` is a nice pattern in Python: all it really does is make sure the file is correctly closed when the `with` statement ends.
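
The file is closed even if something goes wrong inside the body. The sketch below deliberately causes an error to show this; it uses `try` and `except`, which have not been covered here, and a hypothetical file name, `error_demo.txt`.

```python
# "error_demo.txt" is a hypothetical file name used only for this sketch.
try:
    with open("error_demo.txt", "w") as demo:
        demo.write("hello\n")
        result = 1 / 0              # deliberately cause an error inside the body
except ZeroDivisionError:
    closed_after_error = demo.closed
    print("File closed after the error?", closed_after_error)
```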