# File Reading and Writing

## Motivation

So far, the data in our programs has either been hardcoded into the program itself or typed in by the user at the keyboard. This is limiting, so we will want programs that can read data from files.

Files can be formatted in a number of ways, some of which are easier to read than others. Common file types include text files (`.txt`), comma-separated value files (`.csv`), tab-separated value files (`.tsv`), binary files (`.bin`), and Excel spreadsheets (`.xlsx`). There are also many software libraries and packages that help programmers work with these different file types in their code.

In this lesson, we'll focus on text files, and we will not be using any special software libraries or packages so that we can focus on the basic principles.

## Opening a File

To open a file, we can use the built-in function `open()`, which requires you to specify the name of the file as a string:

In [None]:
filename = 'story.txt'
file = open(filename, 'r')
print(file)
file.close()

This opens the file named `story.txt` from the current directory in your file system. If you are using Google Colab, the working directory can be seen by clicking on the `Files` tab on the left side of the screen. You can add files by simply dragging and dropping them into the window. You can specify longer filepaths to access files elsewhere (e.g., your desktop, a subdirectory), but we will keep things simple for now.
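If you are unsure which directory Python treats as the working directory, the standard `os` module can report it. The following is a minimal sketch; the `data` subdirectory is a hypothetical name used only to illustrate a longer filepath.

```python
import os

# The directory that relative filenames such as 'story.txt' are resolved against
working_dir = os.getcwd()
print('Working directory:', working_dir)

# Check whether a file is present before trying to open it
print('story.txt found:', os.path.exists('story.txt'))

# Build a longer path to a file inside a (hypothetical) subdirectory named 'data'
longer_path = os.path.join('data', 'story.txt')
print('Longer path:', longer_path)
```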

The `open()` function does not automatically show you the contents of the file, but instead, it creates an `io.TextIOWrapper` object. The important conceptual idea here is that this object not only knows the contents of the file, but it knows our program's current position in the file. Once our program starts reading, it advances this pointer so that it knows what to give us next when we need it.
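To make the idea of a current position concrete, here is a small sketch. It creates its own throwaway file (the name `position_demo.txt` is just an assumption for the demo) and uses the `.seek()` method, which is not otherwise covered in this lesson, to move the position back to the start.

```python
import os
import tempfile

# Create a small throwaway file so the sketch does not depend on story.txt
path = os.path.join(tempfile.mkdtemp(), 'position_demo.txt')
f = open(path, 'w')
f.write('first line\nsecond line\n')
f.close()

f = open(path, 'r')
first = f.readline()    # reads up to and including the first '\n'
second = f.readline()   # continues from where the previous read stopped
f.seek(0)               # move the position back to the start of the file
again = f.readline()    # the first line is returned a second time
f.close()

print(repr(first))
print(repr(second))
print(repr(again))
```

Each read picks up where the previous one stopped, which is why `first` and `second` differ while `again` matches `first`.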

Also notice that we actually pass two arguments to the `open()` function. The second argument, which is usually one or two letters long, specifies what you want to do with the file. Here are the primary modes you will encounter:
* `r`: reading (this is the default if you do not specify anything)
* `w`: writing
* `a`: appending

Lastly, it is good practice to close your files when you are done with them by calling the `close()` method. This matters because some changes you make to a file might not be reflected until you close it (think of it like saving and then closing a Word document).
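Python also offers the `with` statement, which closes the file for you when the indented block ends. The rest of this lesson keeps calling `close()` explicitly, so treat the following as an optional sketch; the file name `with_demo.txt` is an assumption made for the demo.

```python
import os
import tempfile

# Throwaway file location used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

# The file is closed automatically when the indented block ends
with open(path, 'w') as f:
    f.write('hello\n')

# The .closed attribute confirms that no explicit close() call was needed
print(f.closed)

with open(path) as f:
    contents = f.read()
print(repr(contents))
```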

## Reading from a File

There are several ways to read from a file. In the following examples, the contents of `story.txt` are:
    
    Mary had a little lamb

    His fleece was white as snow
    And everywhere that Mary went
    The lamb was sure to go

You can read a line by using the `.readline()` method as follows:

In [None]:
myfile = open('story.txt', 'r')
s = myfile.readline()
print('Current line:', s)

Notice that `print()` displays the newline character `\n` as a line break rather than as visible characters, but you can see it when you inspect the variable itself.

In [None]:
s

The next time you call this method, the `TextIOWrapper` advances its internal pointer to the next part of the file:

In [None]:
s = myfile.readline()
print('Current line:', s)
s = myfile.readline()
print('Current line:', s)
s = myfile.readline()
print('Current line:', s)
myfile.close()

Notice how the output is double-spaced since there is a `\n` at the end of each line and `print()` automatically appends a `\n` to the string.

To fix this, we can remove the `\n` from the string itself by using the `.strip()` method:

In [None]:
myfile = open('story.txt')
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
myfile.close()
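The argument passed to `.strip()` controls which characters are removed. The sketch below compares a few variants on a hypothetical string that has extra spaces added at both ends so that the difference is visible.

```python
# A made-up line with leading and trailing spaces plus a newline
line = '  Mary had a little lamb  \n'

only_newline = line.strip('\n')   # removes only '\n' characters from both ends
right_only = line.rstrip()        # removes all whitespace from the right end
both_sides = line.strip()         # removes all whitespace from both ends

print(repr(only_newline))
print(repr(right_only))
print(repr(both_sides))
```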

Rather than reading line-by-line, you can read a specific number of characters:

In [None]:
myfile = open('story.txt')
s = myfile.read(10)
print(s)
s = myfile.read(10)
print(s)
myfile.close()

If we know we want to read line-by-line through the entire file, the most popular way of reading a file is by using a `for` loop:

In [None]:
myfile = open('story.txt')
for line in myfile:
    print('Current line:', line.strip('\n'))
myfile.close()

You can also read the entire file at once by using either the `.read()` or the `.readlines()` method. `.read()` puts everything into a single string, while `.readlines()` puts everything into a list of strings (one per line).

In [None]:
# As a single string
filename = "story.txt"
myfile = open(filename)
s = myfile.read()
print(type(s))
print(s)
myfile.close()

In [None]:
# As a list of strings
myfile = open('story.txt')
contents = myfile.readlines() 
print(type(contents))
print(contents)
myfile.close()

Since your programs will normally process text files piece-by-piece, it usually makes sense to read them piece-by-piece as well, so that the whole file never has to sit in memory at once. Therefore, you should only use `.read()` and `.readlines()` in special circumstances (e.g., very small files).
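As a rough sketch of piece-by-piece processing, the loop below counts lines and words while holding only one line in memory at a time. It writes its own two-line throwaway file (the name `count_demo.txt` is an assumption) so that it runs on its own.

```python
import os
import tempfile

# Create a small throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), 'count_demo.txt')
f = open(path, 'w')
f.write('Mary had a little lamb\nHis fleece was white as snow\n')
f.close()

line_count = 0
word_count = 0
f = open(path)
for line in f:
    # Only the current line is held in memory
    line_count += 1
    word_count += len(line.split())
f.close()

print('Lines:', line_count)
print('Words:', word_count)
```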

## Reaching the End of a File

One reason the `for` loop approach is preferred is that it automatically stops when the end of the file is reached.

If you are at the end of the file when you call `.read()` or `.readline()`, you simply get an empty string. You can use this to your advantage by creating a `while` loop that stops once it sees an empty string, so your program knows there is nothing left to read:

In [None]:
myfile = open('story.txt')
next_line = myfile.readline()
# An empty string means the end of the file has been reached
while next_line != "":
    print(next_line.strip('\n'))
    next_line = myfile.readline()
myfile.close()
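One subtlety is worth checking: a blank line in the middle of a file is read as `'\n'`, not as an empty string, so only the true end of the file produces `''`. The sketch below demonstrates this with a throwaway file created for the demo (the name `blank_demo.txt` is an assumption).

```python
import os
import tempfile

# A throwaway file whose second line is blank
path = os.path.join(tempfile.mkdtemp(), 'blank_demo.txt')
f = open(path, 'w')
f.write('first\n\nthird\n')
f.close()

lines_seen = []
f = open(path)
next_line = f.readline()
while next_line != '':
    # The blank line arrives as '\n', so the loop does not stop early
    lines_seen.append(next_line)
    next_line = f.readline()
f.close()

print(lines_seen)
```

This is why the comparison against `""` should be made on the raw line, before any newline characters are stripped.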

## Practice Exercise: Reading a File

The file `january06.txt` contains data from the UTM weather station for January 2006. Download it from the C4M website and put it in your working directory in Google Colab (or your Jupyter environment).

1. Open it up to see what it looks like.
2. Write a Python program to open the file and read only the first line (this is the first part of the header).
3. Read the second line (this is the second part of the header)
4. Read the third line into a variable `line`.
5. What is the type of the value that `line` refers to? 
6. Call the method `.split()` on variable `line` and save the return value. What is the type that is returned by this method call? 
7. Look up the method `.split()` in the Python 3 documentation.

In [None]:
# Write your code here

## Practice Exercise: Getting Data from a File 

Write a program that only prints out the day and the temperature data from the file `january06.txt`. Here are some steps you might want to follow:
  1. Open the file `january06.txt`
  2. Read and ignore the first two lines since they are part of the header
  3. Use a loop to read the rest of the lines one-by-one
  4. Print out only the day and the temperature from each line

In [None]:
# Write your code here

Now extend that program to print the day and time of the coldest reading in the file.

**Hint:** You must convert the values to integers before you compare them. When you compare values as strings, `'11' < '2'`, but when you compare them as numbers, `11 > 2`. 
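The difference between the two kinds of comparison can be seen in a short sketch. The values in `readings` are made up for illustration and are not taken from `january06.txt`.

```python
# Hypothetical readings stored as strings, the way they come out of a file
readings = ['11', '2', '30', '7']

# Strings are compared character by character, so '11' sorts before '2'
print('11' < '2')
print(int('11') < int('2'))

sorted_as_text = sorted(readings)              # compares the strings themselves
sorted_as_numbers = sorted(readings, key=int)  # compares their integer values

print(sorted_as_text)
print(sorted_as_numbers)
```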

In [None]:
# Write your code here (can copy code from previous part)

## Writing to a File

To write to a file, we open it using the writing mode `w`.

In [None]:
new_file = open('example.txt', 'w')

If the file does not exist, Python automatically creates a blank one in your working directory. If you are using Google Colab, you may need to refresh your directory in the `Files` tab by clicking on the `Refresh` icon along the top of the window.

Next, we use the `.write()` method to add new contents to the file:

In [None]:
new_file.write('This is the first line.\n')
new_file.write('And the second\nand third.')
new_file.close()

We can then read and print the file contents using the same reading methods we used earlier:

In [None]:
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()

Now, let's modify the file using the appending mode `a`:

In [None]:
# Append new text
new_file = open('example.txt', 'a')
new_file.write('\nAdding another line!')
new_file.close()

# Read and print the file contents again
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()

So when should you use `w` versus `a`? If you open a file using the writing mode `w` and it already exists, its contents will be deleted. This is different from the appending mode `a`, which keeps the existing content and writes any new lines to the end of the file.

Let's open `'example.txt'` again using the writing mode `w` to see how the file changes:

In [None]:
# The file is opened and its contents are cleared
new_file = open('example.txt', 'w')       

# This will be the one and only line in the file
new_file.write('Adding some new content') 
new_file.close()

# Read and print the file contents again
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()
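Unlike `print()`, the `.write()` method does not add a newline for you. The following sketch writes one item per line by appending `'\n'` explicitly; the list `names` and the file name `write_demo.txt` are hypothetical and exist only for the demo.

```python
import os
import tempfile

# Throwaway file location used only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')

names = ['Ada', 'Grace', 'Alan']  # hypothetical data

out_file = open(path, 'w')
for name in names:
    # write() adds nothing on its own, so the newline is included by hand
    out_file.write(name + '\n')
out_file.close()

# Read the file back to confirm that each name is on its own line
check_file = open(path)
lines = check_file.readlines()
check_file.close()
print(lines)
```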

## Practice Exercise: Writing to a File

Write your name and address to a file named `contact.txt`. Once you have executed your program, open `contact.txt` to verify that its contents are what you expect.

In [None]:
# Write your code here

Now, write a program to add your phone number to that file. Open the file and check its contents.

In [None]:
# Write your code here