# Files

**See also Examples 15, 16, and 17 from Learn Python the Hard Way.**

You'll often read data from a file or write the output of your Python scripts back to one. Python makes this easy: open the file in the appropriate mode with the `open` function, then read from it or write to it.

The `open` function takes the name of the file and, optionally, the mode. The mode is a short string that specifies whether you're going to be _reading_ from a file, _writing_ to a file, or _appending_ to the end of an existing file; if you omit it, the file is opened for reading. The function returns a file object that you then use to read or write: `a_file = open(filename, mode)`. The modes are:

Mode | Description
---|---
`'r'`| open a file for reading.
`'w'`| open a file for writing. <br> _Caution: this will overwrite any previously existing file._
`'a'`| append / write to the end of a file. 

When reading, you typically want to iterate through the lines in a file using a for loop. Some other common methods for dealing with files are: 

+ `file.read()`: read the entire contents of a file into a string
+ `file.write(some_string)`: write a string to the file.
    - _Note that this doesn't automatically add a newline._
    - _Note that writes are sometimes buffered: Python may wait until several writes are pending and then perform them all at once._
    
+ `file.flush()`: write out any buffered writes
+ `file.close()`: close the open file. This will free up some computer resources occupied by keeping a file open.
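As a minimal sketch of the two notes on `write`, the snippet below writes two strings, only the second of which ends in a newline, and then calls `flush` so that a second, independent file object can see the data. The file name `buffer_demo.txt` is hypothetical and chosen only for this illustration.

```python
# buffer_demo.txt is a hypothetical file name used only for this sketch.
writer = open("buffer_demo.txt", "w")

# write() adds nothing on its own: these two calls end up on ONE line.
writer.write("first")
writer.write("second\n")

# Push any buffered text out to the file so that other readers can see it.
writer.flush()

# A second file object opened on the same file now sees the flushed text.
reader = open("buffer_demo.txt", "r")
flushed_content = reader.read()
reader.close()

writer.close()

print(repr(flushed_content))  # → 'firstsecond\n'
```

Closing a file also flushes it, so an explicit `flush` mostly matters when the file stays open for a long time.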

Here is an example using files:

### Writing and reading files

#### Writing a file to disk

In [1]:
# Create the file temp.txt, and write some lines

f = open("temp.txt", "w")

f.write("This is my first file! The end!\n")
f.write("Oh wait, I wanted to say something else.")

f.close()

In [2]:
# Create a file numbers.txt and write the numbers from 0 to 24 there

f = open("numbers.txt", "w")

for num in range(25):
    f.write(str(num)+'\n')
    
f.close()

#### Reading a file from disk

Now, let's check that we did everything as expected.

In [None]:
# Check the "temp" file
f = open("temp.txt", "r") # We now open the file for reading
temp_content = f.read()   # And we read the full content of the file in memory, as a big string
f.close()                 # Close the file

print(temp_content)

In [3]:
# Check the "numbers" file
f = open("numbers.txt", "r")
numbers_content = f.read()
f.close()

print(numbers_content)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24



#### Working with files line-by-line

Let's inspect what we've done a little more closely. Consider `numbers.txt`. The raw string returned by `read` looks like this:

In [4]:
# The whole file has been glued together as one big string.
numbers_content

'0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n22\n23\n24\n'

Let's process that big string a little bit:

In [None]:
# Split the content of the file using the newline character \n
lines = numbers_content.split("\n")
lines

In [None]:
# here we convert the strings into integers, using a list comprehension
# we have the conditional to avoid trying to parse the string '' that 
# is at the end of the list
numbers = [int(line) for line in lines if len(line)>0]
print(numbers)
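The same list can also be built by looping over the file object directly, which yields one line at a time instead of loading everything into a single string. This is only a sketch: it writes its own small file, `loop_demo.txt` (a hypothetical name), so that it can run on its own.

```python
# loop_demo.txt is a hypothetical file created just for this sketch.
f = open("loop_demo.txt", "w")
for num in range(5):
    f.write(str(num) + "\n")
f.close()

loop_numbers = []
f = open("loop_demo.txt", "r")
for line in f:               # each line still ends with "\n"
    line = line.strip()      # drop the newline and any surrounding spaces
    if line:                 # skip blank lines
        loop_numbers.append(int(line))
f.close()

print(loop_numbers)  # → [0, 1, 2, 3, 4]
```

For large files this approach is usually preferable, since only one line needs to be held in memory at a time.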

#### With statements

A convenient shorthand that you'll often see is a `with` statement. This is used to open a file, perform some operations on it, and then close the file again. For example:

In [None]:
# Let's add two lines to our file.
# Instead of f = open('temp.txt', 'w'), we write:

with open('temp.txt', 'w') as f:          
    f.write("here's a fun line to add\n")
    f.write("this file is getting long\n")

Notice that we don't need to call `close` ourselves: the file is closed automatically when the `with` block ends. If we try to use `f` afterwards, it is already closed.

In [None]:
# This won't work because the file is closed. Uncomment to run.
# f.write("this won't work")
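To see this without stopping the notebook on an error, one option is to check the file object's `closed` attribute and catch the exception that a late write raises. A minimal sketch, using a hypothetical file name `with_demo.txt`:

```python
# with_demo.txt is a hypothetical file name used only for this sketch.
with open("with_demo.txt", "w") as f:
    f.write("written inside the with block\n")
    inside = f.closed      # False: the file is still open here

after = f.closed           # True: leaving the block closed the file

try:
    f.write("too late")
except ValueError as err:  # writing to a closed file raises ValueError
    write_error = err
    print("Write failed:", err)

print(inside, after)  # → False True
```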

Let's see what we got:

In [None]:
with open('temp.txt', 'r') as f:
    print(f.read())

Oops! We overwrote our old file. Remember that we need to use the `'a'` mode to append:

In [None]:
# Let's append, rather than overwriting 
with open('temp.txt', 'a') as f:
    f.write("let's add a line that won't overwrite what we already have")

# And, let's check that it worked
with open('temp.txt', 'r') as f:
    print(f.read())

#### Exercise 1

* Write a function that reads a file and returns its first `n` lines as a list of strings (one string per line). 
* Test your function by returning the first five lines of the file below (`fname`). 

In [6]:
#def read_file(fname, n):
    # Your function here
def read_file(fname, n):
    with open(fname, 'r') as f:
        contents = f.read()
        contents = contents.split("\n")
        return contents[:n]
# Example:
read_file("restaurant-names.txt", 5)
        


In [None]:
fname = '../../data/dealing_with_data/restaurant-names.txt'
read_file(fname, 5)

**Answer:** <span style="color:white">
def read_file(fname, n):
    with open(fname, 'r') as f:
        contents = f.read()
        contents = contents.split("\n")
        return contents[:n]
\# Example:
read_file("restaurant-names.txt", 5)
</span>

#### Exercise 2

* Write a function that reads the `k`-th column of a CSV file and returns its contents. (You can reuse parts of the function that you wrote above.) 
* Read the file `data/baseball.csv` and return the content of the 3rd column (`year`).

In [7]:
#def read_kth_col(fname, k):
    # Your function here
def read_kth_col(fname, k):
    with open(fname, 'r') as f:
        contents = f.read()
        contents = contents.split("\n")
        contents_k = [l.split(",")[k-1] for l in contents if len(l)>0]
        return contents_k
read_kth_col("baseball.csv", 3)

['year',
 '2006',
 '2006',
 '2006',
 '2006',
 '2006',
 '2006',
 '2006',
 '2006',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007',
 '2007']

In [None]:
fname = 'baseball.csv'
read_kth_col(fname, 3)

**Answer:** <span style="color:white">
def read_kth_col(fname, k):
    with open(fname, 'r') as f:
        contents = f.read()
        contents = contents.split("\n")
        contents_k = [l.split(",")[k-1] for l in contents if len(l)>0]
        return contents_k
read_kth_col("baseball.csv", 3)
</span>
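Splitting on `","` works for simple files but breaks when a field itself contains a comma inside quotes. As an alternative sketch, the standard-library `csv` module handles that quoting. The function name `read_kth_col_csv` and the file `csv_demo.csv` with its contents are made up for this illustration; they are not the course's baseball data.

```python
import csv


def read_kth_col_csv(fname, k):
    """Return the k-th column (1-based) of a CSV file as a list of strings."""
    with open(fname, "r", newline="") as f:
        # csv.reader yields one list of fields per row; skip empty rows.
        return [row[k - 1] for row in csv.reader(f) if row]


# Made-up data: note the quoted fields that contain a comma.
with open("csv_demo.csv", "w", newline="") as f:
    f.write('name,team,year\n')
    f.write('"Doe, Jane",NYA,2006\n')
    f.write('"Roe, Rick",BOS,2007\n')

print(read_kth_col_csv("csv_demo.csv", 3))  # → ['year', '2006', '2007']
```

With a plain `split(",")`, the quoted names would be cut in two and the third field would no longer be the year.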

#### IPython magic

An interactive and simple way to write files is through the use of an [IPython magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html). 
These commands are denoted by a prefix of `%` or `%%`; we'll see these again later in the course when we move to plotting data. 

The `%%file` cell magic below (an alias for `%%writefile`) writes the rest of the cell to a file called `phonetest.txt`.

In [None]:
%%file phonetest.txt
679-397-5255
2126660921
212-998-0902
888-888-2222
800-555-1211
800 555 1212
800.555.1213
(800) 555-1214
1-800-555-1215
1(800)555-1216
800-555-1212-1234
800-555-1212x1234
800-555-1212 ext. 1234
work 1-(800) 555.1212
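The `%%file` magic only works inside IPython or Jupyter. In a plain Python script, roughly the same effect can be had with `open` and `write`. The sketch below uses a hypothetical file name, `phonetest_plain.txt`, and only the first three numbers, so that it does not overwrite the file created above.

```python
# Plain-Python stand-in for the cell magic; the file name is hypothetical.
sample_numbers = """679-397-5255
2126660921
212-998-0902
"""

with open("phonetest_plain.txt", "w") as f:
    f.write(sample_numbers)

# Read the file back to confirm what was written.
with open("phonetest_plain.txt", "r") as f:
    written_back = f.read()

print(written_back)
```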

#### Exercise 3

* Write a function that:

    1. Takes a string as input 
    2. Removes any non-digit characters
    3. Returns a "clean" string, without any non-digit characters

In [None]:
#def remove_non_digits(num):
    # Your function here

In [None]:
num = '800-555-1212 ext. 1234'
remove_non_digits(num)

* Now, read the file `phonetest.txt`. 
    1. Apply your function to each line in the file.
    2. Print the clean file.

In [None]:
# Clean phonetest.txt here

**Answer**: <span style="color:white">
\# Using loops
def remove_non_digits(num):
    digit_list = [str(x) for x in range(10)]    
    clean_num = ''
    for digit in num:
        if digit in digit_list:
            clean_num += digit
    return clean_num            
\# Using a list comprehension
def remove_non_digits(num):
    digit_list = [str(x) for x in range(10)]    
    clean_num = ''.join([d if d in digit_list else '' for d in num])
    return clean_num            
\# Read in the file and clean it 
with open('phonetest.txt', 'r') as f:
    phone_numbers = f.read().split("\n")
print('\n'.join([remove_non_digits(n) for n in phone_numbers]))
</span>
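A shorter variant, offered as a sketch rather than the reference answer, relies on the built-in `str.isdigit` method and loops over the file directly. The name `remove_non_digits_short` and the file `phone_demo.txt` are hypothetical. Note that `str.isdigit` also accepts some non-ASCII digit characters, so it is not an exact drop-in replacement when the input may contain them.

```python
def remove_non_digits_short(num):
    """Keep only the characters of num that str.isdigit() accepts."""
    return "".join(ch for ch in num if ch.isdigit())


# phone_demo.txt is a hypothetical file holding three of the sample numbers.
with open("phone_demo.txt", "w") as f:
    f.write("679-397-5255\n(800) 555-1214\n800-555-1212 ext. 1234\n")

# Looping over the file yields one line at a time; the trailing "\n"
# is not a digit, so the cleaning function removes it as well.
with open("phone_demo.txt", "r") as f:
    cleaned = [remove_non_digits_short(line) for line in f]

print("\n".join(cleaned))
```

Because the loop never produces an empty trailing entry, this version does not print a blank line at the end.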