# File Input and Output

In this notebook we look at examples of using Python to write and read text files.

## Writing Text Files

In Python, files are accessed through special *file objects*. We can open a file for reading, writing or appending using the built-in *open()* function, which returns a file object. The first parameter is the location (path) of the file. The second parameter is the *mode*, which specifies the action we want to take. If no mode is given, "r" is used:
- "w" = write
- "r" = read
- "a" = append.

So to open a new file for writing, we call the *open()* function, supply a path for the file and specify the "w" mode. Note: if the file already exists, its contents will be completely overwritten!

In [2]:
fout = open("data.txt", "w")

Since we specified only "data.txt" rather than a complete path, the file will be created in the current working directory. For a Jupyter notebook, this is normally the directory that contains the notebook.
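To check where a relative path like "data.txt" will end up, we can ask the standard *os* module for the current working directory. This is a small sketch. The directory it prints depends on how the notebook server was started.

```python
import os

# The directory that relative paths are resolved against
cwd = os.getcwd()
# The absolute path that "data.txt" refers to from this directory
full_path = os.path.abspath("data.txt")

print(cwd)
print(full_path)
```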

After opening a file for writing, we write data to it by calling the file object's *write()* method, which takes a single string. Here we build each string using % string formatting. Each call adds its text after whatever was written before. Note: newline characters are not added automatically.

In [4]:
for i in range(0, 5):
    fout.write("Current value of i is %d\n" % i)  # note, we add a newline with \n at the end of each line
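The *write()* method also returns the number of characters it wrote. A notebook displays this number when a *write()* call is the last line of a cell, which is why the loop above prints nothing.

The sketch below uses a hypothetical file name, write_demo.txt. It also shows that an f-string can build the same text as % formatting, because *write()* only requires a string:

```python
# Open a separate, hypothetical file so that data.txt is left alone
demo = open("write_demo.txt", "w")

# write() returns the number of characters written
n1 = demo.write("Current value of i is %d\n" % 3)
# An f-string produces a string in the same format
n2 = demo.write(f"Current value of i is {4}\n")
demo.close()

print(n1, n2)  # 24 24
```

Passing a number directly, as in `demo.write(3)`, raises a *TypeError*. Values must be converted to text first, for example with formatting or *str()*.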

When we are finished, we need to close the file.

In [6]:
fout.close()
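File objects have a *closed* attribute that reports whether *close()* has already been called. Here is a minimal sketch that uses a hypothetical file name:

```python
demo = open("closed_demo.txt", "w")
was_closed_before = demo.closed   # False: the file is still open
demo.close()
is_closed_after = demo.closed     # True: close() has been called
demo.close()                      # closing an already-closed file has no effect

print(was_closed_before, is_closed_after)  # False True
```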

Once a file is closed, we cannot write any more data to it. Trying to do so raises a *ValueError*:

In [8]:
fout.write("More data!")

ValueError: I/O operation on closed file.
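Writing to the same file several times shows the difference between the "w" and "a" modes from the list above. This sketch uses a hypothetical file name, modes_demo.txt:

```python
path = "modes_demo.txt"

# "w" creates the file, or empties it if it already exists
f = open(path, "w")
f.write("first\n")
f.close()

# "w" again: the previous contents are discarded
f = open(path, "w")
f.write("second\n")
f.close()

# "a" keeps the existing contents and writes at the end of the file
f = open(path, "a")
f.write("third\n")
f.close()

f = open(path, "r")
content = f.read()
f.close()
print(repr(content))  # 'second\nthird\n'
```

Like "w", the "a" mode creates the file if it does not exist yet.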

## Reading Text Files

To open an existing file for reading, we use the *open()* function again, this time with the "r" mode. Note: if the file does not exist, Python raises a *FileNotFoundError*.

In [10]:
fin = open("data.txt", "r")  # action "r" means open file to read

After opening a file for reading, we can use several methods to access the data:

- *read()* returns the full contents of the file as a single string.
- *readline()* returns the next line of text, including its newline character.
- *readlines()* returns a list of all the lines in the file, with one string per line.

In [12]:
lines = fin.readlines()
for line in lines:
    print(line.strip())  # each line still ends with "\n", so we usually need to strip it off

Current value of i is 0
Current value of i is 1
Current value of i is 2
Current value of i is 3
Current value of i is 4


Again, we close the file when we are finished. After this, no more read methods can be called on it.

In [14]:
fin.close()
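Each of these methods continues from the current position in the file, so they can be mixed. The sketch below writes a hypothetical three-line file and then reads it back in different ways. Note that *readline()* returns an empty string once it reaches the end of the file.

```python
# Create a small hypothetical file with three lines
out = open("three_lines.txt", "w")
out.write("alpha\nbeta\ngamma\n")
out.close()

src = open("three_lines.txt", "r")
first = src.readline()    # 'alpha\n': one line, newline included
rest = src.readlines()    # ['beta\n', 'gamma\n']: all remaining lines
at_end = src.readline()   # '': an empty string means end of file
src.close()

# Reopen the file to start from the beginning again
src = open("three_lines.txt", "r")
everything = src.read()   # 'alpha\nbeta\ngamma\n': the whole file as one string
src.close()
```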

An alternative way to read all the lines from a text file and remove the line endings is to read the data into a single string with *read()*. We then call the string's **splitlines()** method:

In [16]:
fin = open("data.txt","r")
lines = fin.read().splitlines()
fin.close()
print("Read %d lines of text" % len(lines))

Read 5 lines of text
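The two approaches are not quite identical:

- *strip()* removes all leading and trailing whitespace, including spaces and tabs.
- *splitlines()* removes only the line endings.

When spaces at the start or end of a line matter, *rstrip("\n")* is a narrower alternative to *strip()*. Here is a sketch with a hypothetical file whose lines have extra spaces:

```python
out = open("spaces.txt", "w")
out.write("  indented\ntrailing  \n")
out.close()

src = open("spaces.txt", "r")
raw = src.readlines()                         # ['  indented\n', 'trailing  \n']
src.close()

stripped = [s.strip() for s in raw]           # ['indented', 'trailing']
newline_only = [s.rstrip("\n") for s in raw]  # ['  indented', 'trailing  ']

src = open("spaces.txt", "r")
split = src.read().splitlines()               # ['  indented', 'trailing  ']
src.close()
```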


We often open files using a **with** statement, which simplifies the management of resources like files. With **with**, we do not need to call *close()* ourselves. The file is closed automatically when the indented block ends, even if an error occurs inside it.

In [18]:
with open("data.txt", "r") as fin:
    lines = fin.read().splitlines()
    print("Read %d lines of text" % len(lines))

Read 5 lines of text
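A **with** statement works the same way for writing. The file is also closed when the block ends because of an error. The sketch below uses a hypothetical file name. The *try...except* part is explained in the Handling Exceptions section below.

```python
with open("with_demo.txt", "w") as out:
    out.write("written inside a with block\n")
print(out.closed)  # True: closed automatically when the block ended

try:
    with open("with_demo.txt", "r") as src:
        first = src.readline()
        # Simulate an error part-way through processing
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(src.closed)  # True: closed even though the block exited with an error
```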


## Comma-Separated Files

Frequently, simple datasets are stored as *comma-separated value* (CSV) files. In a CSV file, tabular data is stored as plain text. Each line of the file is a record, and each record consists of one or more fields, separated by commas.

We can create a CSV file manually using the *open()* and *write()* functions:

In [20]:
fout = open("simple.csv", "w")
# create the records
for row in range(5):
    # start the record with an identifier
    fout.write("record_%d" % (row+1))
    # create the fields for each record
    for col in range(4):
        value = (row+1)*(col+1)     # just create some dummy values
        fout.write(",%d" % value)   # notice the comma separator
    # move on to a new line in the file
    fout.write("\n")
# finished, so close the file
fout.close()    
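An alternative sketch builds each record as a list of strings and combines them with the string method *join()*, which places the separator only between fields. It writes to a hypothetical file, simple_join.csv, and should produce the same content as simple.csv:

```python
with open("simple_join.csv", "w") as out:
    for row in range(5):
        # start the record with its identifier
        fields = ["record_%d" % (row + 1)]
        # add the dummy values, converted to strings
        for col in range(4):
            fields.append(str((row + 1) * (col + 1)))
        # join() inserts commas between fields only
        out.write(",".join(fields) + "\n")
```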

We could just read back the entire file:

In [22]:
with open("simple.csv", "r") as fin:
    print(fin.read())

record_1,1,2,3,4
record_2,2,4,6,8
record_3,3,6,9,12
record_4,4,8,12,16
record_5,5,10,15,20



But more often, we will want to parse the data into numeric values, line by line:

In [24]:
with open("simple.csv", "r") as fin:
    # process the file line by line
    for line in fin:
        # remove the newline character (and any surrounding whitespace)
        line = line.strip()
        # split the line based on the comma separator
        parts = line.split(",")
        # extract the identifier as the first value in the list
        record_id = parts[0]
        # convert the rest to integers from strings
        values = []
        for s in parts[1:]:
            values.append(int(s))
        # display the record
        print(record_id, values)

record_1 [1, 2, 3, 4]
record_2 [2, 4, 6, 8]
record_3 [3, 6, 9, 12]
record_4 [4, 8, 12, 16]
record_5 [5, 10, 15, 20]
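The parsing loop can be written more compactly with a list comprehension, storing the results in a dictionary keyed by record identifier. This sketch first recreates the same data under a hypothetical name, simple_parse.csv, so it can run on its own. It also skips any blank lines:

```python
# Recreate the example data so this block is self-contained
with open("simple_parse.csv", "w") as out:
    for row in range(1, 6):
        values = [str(row * col) for col in range(1, 5)]
        out.write("record_%d," % row + ",".join(values) + "\n")

records = {}
with open("simple_parse.csv", "r") as src:
    for line in src:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. at the end of the file
        parts = line.split(",")
        # convert every field after the identifier to an integer
        records[parts[0]] = [int(s) for s in parts[1:]]

print(records["record_3"])  # [3, 6, 9, 12]
```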


Later in the module we will look at more convenient ways for working with CSV data.

## Handling Exceptions

If Python encounters an error while running your code, it raises an exception. By default, an unhandled exception terminates a script, or stops the current cell in a notebook.
We can handle errors in a structured way by "catching" exceptions. In other words, we can plan in advance for errors that might occur in our code.

Below is an example where we try to open a file that may not exist, without dealing with that case. If the file does not exist, we will get an exception of type *FileNotFoundError* when we call the function *open()*. As a result, the subsequent lines of code will never get executed.

In [26]:
f = open("missing.txt","r")
content = f.read()
f.close()
print("Finished")

FileNotFoundError: [Errno 2] No such file or directory: 'missing.txt'

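To deal with this case, we wrap the same code in a **try** block followed by an **except** clause. If a *FileNotFoundError* occurs anywhere in the **try** block, Python skips the rest of that block and runs the **except** clause instead, which prints a warning message. The program then continues with the rest of the code.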

In [28]:
try:
    f = open("missing.txt","r")
    content = f.read()
    f.close()
except FileNotFoundError:
    print("Warning: Input file not found")
print("Finished")

Finished
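The *except* clause can also give the exception a name using *as*, which lets us report details such as the file name. Combining *try* with a **with** statement also ensures the file is closed if *read()* fails, which the explicit *f.close()* above does not. The following sketch is wrapped in a hypothetical helper function, read_text():

```python
def read_text(path):
    """Hypothetical helper: return the file's contents, or None if it does not exist."""
    try:
        with open(path, "r") as f:
            content = f.read()
    except FileNotFoundError as err:
        # err.filename holds the path that could not be opened
        print("Warning: input file not found:", err.filename)
        return None
    else:
        # runs only if the try block raised no exception
        return content

result = read_text("missing.txt")
print("Finished")
```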
