# Working with Files

## Motivation

So far, the data in our programs has either been hardcoded into the program itself or typed in by the user at the keyboard. This is limiting, so we will want programs that can read data from files.

Files can be formatted in a number of ways, some of which are easier to read than others. Common file types you might encounter include text files (`.txt`), comma-separated value files (`.csv`), tab-separated value files (`.tsv`), binary files (`.bin`), and Excel spreadsheets (`.xlsx`). There are also many software libraries and packages that help programmers work with these different file types in their code.

In this lesson, we'll focus on text files, and we will not be using any special software libraries or packages so that we can focus on the basic principles.

## Opening a File

This notebook will involve two files: `story.txt` and `january06.txt`. To work with these files, we will need to put them in the same directory from which we are running our programs.

If you are using Google Colab, the working directory can be seen by clicking on the `Files` tab on the left side of the screen. You can add files by simply dragging and dropping them into the window. You can specify longer filepaths to access files elsewhere (e.g., your desktop, a subdirectory), but we will keep things simple for now.

To open a file, we can simply use the built-in function `open()`, which requires you to specify the name of the file as a string:

In [None]:
myfile = open('story.txt', 'r')
print(myfile)
myfile.close()

The `open()` function does not automatically show you the contents of the file. Instead, it returns an `io.TextIOWrapper` object. This special Python object gives us access to the contents of the file and also keeps track of our program's current position in the file. Once our program starts reading, the object advances this position so that it knows what to give us next when we ask for it.

Also notice that we actually pass two arguments to the `open()` function. The second argument, which is usually one or two letters long, specifies what you want to do with the file. Here are the primary modes you will encounter:
* `r`: reading (this is the default if you do not specify anything)
* `w`: writing
* `a`: appending

Lastly, it is good practice to close a file with the `.close()` method once you are done with it. One reason this matters is that some changes you make to a file might not be written to disk until you close the file (think of it like saving and then closing a Word document).
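
A common way to make sure a file always gets closed is Python's `with` statement, which closes the file automatically when the indented block ends. The following is only a minimal sketch: the file name `with_demo.txt` and its contents are made up for illustration, and the file is created in a temporary folder so it does not clutter your working directory.

```python
import os
import tempfile

# Throwaway file for the demonstration (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

# The file is closed automatically when the indented block ends
with open(path, 'w') as out_file:
    out_file.write('hello\n')

with open(path, 'r') as in_file:
    text = in_file.read()

print(in_file.closed)  # the file object reports that it is already closed
print(repr(text))
```

The rest of this lesson keeps calling `.close()` explicitly so that each step stays visible.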

## Reading from a File

Once we have opened a file, there are several ways that we can inspect its contents.

You can read a line by using the `.readline()` method as follows:

In [None]:
myfile = open('story.txt', 'r')
s = myfile.readline()
print('Current line:', s)

The next time you call this method, the `TextIOWrapper` advances its internal pointer to the next part of the file:

In [None]:
s = myfile.readline()
print('Current line:', s)
s = myfile.readline()
print('Current line:', s)
s = myfile.readline()
print('Current line:', s)
myfile.close()

Notice that the `print()` function does not display the newline character `\n` literally; it shows it as a line break. You can see the `\n` when you inspect the variable itself.

In [None]:
s

This is why the output appears to be double-spaced: each line read from the file already ends in `\n`, and `print()` adds another newline of its own. To fix this, we can remove the `\n` from the string itself by using the `.strip()` method:

In [None]:
myfile = open('story.txt')
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
s = myfile.readline().strip('\n')
print('Current line:', s)
myfile.close()
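
Passing `'\n'` to `.strip()` removes only newline characters, and it does so at both ends of the string. The short sketch below compares it with two related calls, using a made-up line rather than the real contents of `story.txt`.

```python
# Made-up line with surrounding spaces and a trailing newline
line = '  Once upon a time  \n'

newline_both_ends = line.strip('\n')    # removes newlines at both ends only
newline_right_end = line.rstrip('\n')   # removes newlines at the right end only
all_whitespace = line.strip()           # removes all leading and trailing whitespace

print(repr(newline_both_ends))
print(repr(newline_right_end))
print(repr(all_whitespace))
```

If leading spaces in a line are meaningful (for example, indentation), `.strip('\n')` or `.rstrip('\n')` is the safer choice because the spaces are preserved.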

If we know we want to read line-by-line through the entire file, we can actually use a `for` loop on the `TextIOWrapper` object:

In [None]:
myfile = open('story.txt')
for line in myfile:
    print('Current line:', line.strip('\n'))
myfile.close()
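
The `for` loop above stops on its own when it reaches the end of the file. With `.readline()`, the end of the file is signalled by an empty string `''`, so a `while` loop can do the same job. This is a sketch under assumptions: the file `eof_demo.txt` and its three lines are invented for the example.

```python
import os
import tempfile

# Create a throwaway three-line file (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), 'eof_demo.txt')
demo = open(path, 'w')
demo.write('one\ntwo\nthree\n')
demo.close()

lines_seen = []
demo = open(path)
line = demo.readline()
while line != '':            # '' means there is nothing left to read
    lines_seen.append(line.strip('\n'))
    line = demo.readline()
demo.close()

print(lines_seen)
```

A blank line inside a file is read as `'\n'`, not `''`, so it does not end the loop early.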

Reading line-by-line is the most common way of working with files, since it does not require loading the entire file into memory at once and lines are a logical unit. However, here are some alternative methods if you need them:

In [None]:
# By characters rather than by lines
myfile = open('story.txt')
s = myfile.read(10)
print(s)
s = myfile.read(10)
print(s)
myfile.close()

In [None]:
# As a single string
myfile = open('story.txt')
s = myfile.read()
print(type(s))
print(s)
myfile.close()

In [None]:
# As a list of strings
myfile = open('story.txt')
contents = myfile.readlines()
print(type(contents))
print(contents)
myfile.close()
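
All of these methods share the file object's current position, which is why each example above reopens the file before reading. The sketch below illustrates this with a made-up file (`position_demo.txt` is a hypothetical name); it also uses the `.seek()` method, which has not been introduced in this lesson, to move the position back to the start.

```python
import os
import tempfile

# Create a throwaway file to experiment with (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), 'position_demo.txt')
demo = open(path, 'w')
demo.write('first line\nsecond line\n')
demo.close()

demo = open(path, 'r')
everything = demo.read()   # reads from the current position to the end of the file
leftover = demo.read()     # the position is now at the end, so nothing is left
demo.seek(0)               # move the position back to the start of the file
again = demo.read()        # reading works again from the beginning
demo.close()

print(repr(everything))
print(repr(leftover))
print(again == everything)
```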

## Practice Exercise: Reading a File

The file `january06.txt` contains data from the UTM weather station for January 2006. Download it from the C4M website and put it in your working directory in Google Colab (or your Jupyter environment).

1. Open it up to see what it looks like.
2. Write a Python program to open the file and read only the first line (this is the first part of the header)
3. Read the second line (this is the second part of the header)
4. Read the third line into a variable `line`.
5. What is the type of the value that `line` refers to?
6. Look up the method `.split()` in the Python 3 documentation.
7. Call the method `.split()` on `line`. What is returned and what is its data type?

In [None]:
# Write your code here

## Practice Exercise: Getting Data from a File

Write a program that only prints out the day and the temperature data from the file `january06.txt`. Here are some steps you might want to follow:
  1. Open the file `january06.txt`
  2. Read and ignore the first two lines since they are part of the header
  3. Use a loop to read the rest of the lines one-by-one
  4. Print out only the day and the temperature from each line

In [None]:
# Write your code here

Now extend that program to print the day and time of the coldest reading in the file.

**Hint:** You must convert the values to integers before you compare them. When you compare values as strings, `'11' < '2'`, but when you compare them as numbers, `11 > 2`.
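
To see the difference described in the hint, here is a small sketch that uses made-up values rather than the contents of `january06.txt`. It assumes whole-number readings; values with a decimal point would need `float()` instead of `int()`.

```python
# Made-up readings, stored as text the way they would arrive from a file
as_text = ['11', '2', '7']

# Convert each string to an integer before comparing
as_numbers = []
for value in as_text:
    as_numbers.append(int(value))

text_comparison = '11' < '2'               # strings compare character by character
number_comparison = int('11') < int('2')   # numbers compare by value

print(text_comparison, number_comparison)
print(min(as_text))      # smallest according to string ordering
print(min(as_numbers))   # smallest according to numeric value
```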

In [None]:
# Write your code here (can copy code from previous part)

## Writing to a File

To write to a file, we open it using the writing mode `w`.

In [None]:
new_file = open('example.txt', 'w')

If the file does not exist, Python automatically creates a blank one in your working directory. If you are using Google Colab, you may need to refresh your directory in the `Files` tab by clicking on the `Refresh` icon along the top of the window.

Next, we use the `.write()` method to add new contents to the file:

In [None]:
new_file.write('This is the first line.\n')
new_file.write('And the second\nand third.')
new_file.close()

We can then read and print the file contents using the same reading methods we used earlier:

In [None]:
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()
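
Note that `.write()` does not add a newline for you, which is why the strings above include `\n` wherever a line should end. The sketch below shows this with a hypothetical file named `write_demo.txt`; it also shows that `.write()` returns the number of characters it wrote.

```python
import os
import tempfile

# Throwaway file for the demonstration (hypothetical name)
path = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')

out_file = open(path, 'w')
count = out_file.write('abc')   # no '\n', so the next write continues the same line
out_file.write('def\n')
out_file.write('ghi\n')
out_file.close()

in_file = open(path, 'r')
contents = in_file.read()
in_file.close()

print(count)
print(repr(contents))
```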

Now, let's modify the file using the appending mode `a`:

In [None]:
# Append new text
new_file = open('example.txt', 'a')
new_file.write('\nAdding another line!')
new_file.close()

# Read and print the file contents again
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()

So when should you use `w` versus `a`? If you open a file using the writing mode `w` and it already exists, its contents will be deleted. This is different from the appending mode `a`, which keeps the existing content and writes any new lines to the end of the file.

Let's open `'example.txt'` again using the writing mode `w` to see how the file changes:

In [None]:
# The file is opened and its contents are cleared
new_file = open('example.txt', 'w')

# This will be the one and only line in the file
new_file.write('Adding some new content')
new_file.close()

# Read and print the file contents again
new_file = open('example.txt', 'r')
print(new_file.read())
new_file.close()

## Practice Exercise: Writing to a File

Write your name and address to a file named `contact.txt`. Once you have executed your program, open `contact.txt` to verify that its contents are what you expect.

In [None]:
# Write your code here

Now, write a program to add your phone number to that file. Open the file and check its contents.

In [None]:
# Write your code here

## Filepaths

A **filepath** is a string that specifies the location of a file or directory on a computer's filesystem. Certain characters may have special meaning in filepaths:
* Path separator: `\` on Windows, `/` on macOS and Linux
* Current directory: `.`
* Parent directory: `..`
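
These special characters are also available as constants in Python's `os` module. The sketch below prints them and uses `os.path.normpath()` to simplify a path that contains `.` and `..`; the folder and file names are invented for the example and do not need to exist.

```python
import os

print(os.curdir)      # the symbol for the current directory
print(os.pardir)      # the symbol for the parent directory
print(repr(os.sep))   # the path separator for the operating system running the code

# Hypothetical path: go into 'raw', then back up to 'data'
messy = os.path.join('data', '.', 'raw', '..', 'file.txt')
simplified = os.path.normpath(messy)   # resolves '.' and '..' without touching the disk

print(messy)
print(simplified)
```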

There are two different kinds of filepaths:

1. **Relative filepath:** This specifies the location of a file or directory relative to the current working directory where your program is being executed. Here are some examples:

    Windows: `file.txt`, `..\Documents\file.txt`

    Unix/Linux/macOS: `file.txt`, `../Documents/file.txt`

2. **Absolute filepath:** This specifies the complete location of a file or directory starting from the root (i.e., the starting point for navigating the entire file system hierarchy). Here are some examples:

    Windows: `C:\Users\Username\Documents\file.txt`

    Unix/Linux/macOS: `/home/username/Documents/file.txt`
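
Python can tell the two kinds of filepaths apart and convert a relative one into an absolute one. The following sketch assumes a hypothetical path `Documents/file.txt`; the file does not need to exist for these functions to work.

```python
import os

# Hypothetical relative path (built with the right separator for this system)
relative = os.path.join('Documents', 'file.txt')
relative_is_absolute = os.path.isabs(relative)

# abspath() interprets the path relative to the current working directory
absolute = os.path.abspath(relative)
absolute_is_absolute = os.path.isabs(absolute)

print(relative, relative_is_absolute)
print(absolute, absolute_is_absolute)
```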

When working in a hosted notebook environment such as Google Colab, it is best to stick with relative filepaths, because your code usually runs from a predetermined location and you will rarely need to dive deep into the filesystem that Colab provides. On your local machine, however, absolute filepaths can be helpful because they always point to the same location regardless of where you run your code from.

## Navigating Filepaths in Python

The `os` module is part of Python's standard library and provides functions for basic operating system tasks, including navigating the filesystem.

In [None]:
import os

In [None]:
# Get the current working directory
# (i.e., the default folder from which Google Colab runs your code)
os.getcwd()

In [None]:
# Get a list of the files in a folder
# (note: this is a default folder that Google Colab always provides)
os.listdir('sample_data')

In [None]:
# Create a filepath that uses the proper path separator for your operating system
os.path.join('sample_data', 'california_housing_train.csv')
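
A filepath built this way can be passed straight to `open()`. The sketch below checks that the file exists first, since the `sample_data` folder is only guaranteed to be present in Google Colab; on other systems the code simply reports that the file was not found.

```python
import os

path = os.path.join('sample_data', 'california_housing_train.csv')

if os.path.exists(path):
    # Read only the first line of the file, which is assumed to be a header
    data_file = open(path, 'r')
    header = data_file.readline().strip('\n')
    data_file.close()
    print('First line:', header)
else:
    print('File not found:', path)
```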