<table class="table table-bordered">
    <tr>
        <th style="text-align:center; width:35%"><img src='https://drive.google.com/uc?export=view&id=1zIB3Nw_z8N2SJSSdd2yWQIsDS0MGPYKm' style="width: 300px; height: 90px; "></th>
        <th style="text-align:center;"><h3>IS111 - Notebook 10</h3><h2>File Handling</h2></th>
    </tr>
</table>

## Learning Outcomes

After going through this notebook, you should be able to:

- Use `with open(...) as ...` to <b>open a text file</b> either for <b>reading</b> or for <b>writing</b>. 
- <b>Read</b> the content from a text file line by line.
- Use `rstrip('\n')` to <b>get rid of the newline character</b> at the end of a line from a text file.
- Use `split()` appropriately to <b>process a text file with multiple columns</b>.
- <b>Write</b> to a text file line by line.

## I. Introduction

While a program is running, its <b>data</b> is <b>stored</b> in <b>RAM</b> (Random Access Memory). RAM is fast and relatively inexpensive, but it is <b>volatile</b>: when the program ends or the computer shuts down, the data stored in RAM <b>disappears</b>!

To avoid that, you have to <b>store data in non-volatile storage</b> such as hard disks, USB drives, etc. One way to store data in non-volatile storage is to store data in <b>files</b>.

In this notebook, we explain the basic operations you can perform with files using Python. The most basic tasks involved in file manipulation are <b>reading data</b> from files and <b>writing data</b> to files. 


## II. Opening a File for Reading

Before we can read from a file, we first need to <b>open</b> it, and when we are done reading, we need to <b>close</b> it. Assume that we have a file called ```presidents.txt``` in the same directory as this notebook. One way to open this file for reading and later close it is as follows:

In [None]:
my_file = open('presidents.txt', 'r')

# Some code here to read the content of this file

my_file.close()

If we open a file as shown in Line 1 above, we will need to remember to close it as shown in Line 5 above.

A second, __preferred__ way of opening a file for reading is as follows:

In [None]:
with open('presidents.txt', 'r') as my_file:
    # Some code here to read the content of this file
    pass

This second way, using the ```with open(...) as ...``` syntax, saves us the trouble of closing the file ourselves: Python closes the file automatically as soon as the indented block under ```with``` finishes.
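
The automatic closing can be checked with a small experiment. The sketch below is only an illustration: the file name `with_demo.txt` is a hypothetical sample file created by the example itself, and `closed` is an attribute of file objects that reports whether the file has been closed.

```python
# Create a small sample file first so that this example runs on its own.
# 'with_demo.txt' is a hypothetical file name used only for illustration.
with open('with_demo.txt', 'w') as demo_file:
    demo_file.write("first line\n")

with open('with_demo.txt', 'r') as my_file:
    # Inside the block the file is still open.
    closed_inside = my_file.closed

# After the block ends, the file has been closed for us.
closed_after = my_file.closed

print("Closed inside the with block?", closed_inside)
print("Closed after the with block?", closed_after)
```

The first value should be `False` and the second `True`, even though the code never calls `close()`.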

Now let us take a closer look at the ```open()``` function, which is used to open a file. Typically this function takes two arguments: the name of the file to be opened, and the mode in which to open it. Two commonly used modes are ```'r'``` for reading and ```'w'``` for writing. For example, the code above opens the file ```presidents.txt``` for reading, so we can subsequently read from this file but cannot write to it.

We can also see in the code above that a variable called ```my_file``` is introduced in Line 1. This variable ```my_file``` will be used later to access the file.

### Opening a file that does not exist for reading

If we try to open a file that does not exist for reading, we will get a ```FileNotFoundError```. Run the following code and observe the outcome:

In [None]:
with open('a-non-existing-file.txt', 'r') as my_file:
    pass
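
If we would rather show a friendly message than let the program stop with an error, one possible approach is to catch the error. The sketch below assumes that exception handling with `try` and `except` is acceptable at this point, and that no file called `a-non-existing-file.txt` exists in the current folder; the variable `status` is introduced only for illustration.

```python
# Assumption: 'a-non-existing-file.txt' does not exist in the current folder.
file_name = 'a-non-existing-file.txt'

try:
    with open(file_name, 'r') as my_file:
        status = "opened"
except FileNotFoundError:
    # This branch runs only when the file cannot be found.
    status = "missing"
    print("Cannot find the file: " + file_name)
```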

## II. Reading from an Opened File

Once a text file is open for reading, we can use the following syntax to read its content line by line:

In [None]:
with open('presidents.txt', 'r') as my_file:
    for line in my_file:
        print(line)

We can see that the code in Line 2 above is a `for`-loop that goes through all the lines of the file ```my_file```, one at a time. Here we use the variable ```line``` to store a single line of content from ```my_file```.

To verify that the file ```presidents.txt``` indeed contains the names of several U.S. presidents and their terms of office, we can open ```presidents.txt``` directly in a text editor to inspect its content.

### Removing the newline character

Let us now take a closer look at the code that reads the content of a text file line by line. Run the following code:

In [None]:
with open('presidents.txt', 'r') as my_file:
    for line in my_file:
        print("Content: [" + line + "]")

We can see that in the output of the code above, all the closing square brackets (```]```) are at the beginning of a new line. Why does this happen?

This is because the string ```line``` contains a <b>newline character</b> at the end. The newline character is a special character that indicates a line break. It is represented by ```'\n'```. Run the following code to see what a newline character does:

In [None]:
print("This is one line.\nThis is another line.\nThis is a third line.\n")

When we read from a file line by line, each line contains a newline character at the very end. Usually we want to get rid of this newline character. To do so, we can use a method called ```rstrip()``` from the string class and pass in ```'\n'``` as an argument to this method, as shown below:

In [None]:
with open('presidents.txt', 'r') as my_file:
    for line in my_file:
        line = line.rstrip('\n')
        print("Content: [" + line + "]")

Take a look at Line 3 of the code above. The ```rstrip()``` method removes occurrences of the specified character at the end of a string. If that character occurs multiple times at the end of the string, all occurrences will be removed.

For example, run the following code to observe what ```rstrip()``` does when ```'\n'``` is passed in as its argument:

In [None]:
my_str = 'This is line 1.\nThis is line 2.\n\n'
my_new_str = my_str.rstrip('\n')

print('Before rstrip(): [' + my_str + ']')
print('After rstrip(): [' + my_new_str + ']')

You can see that the two newline characters at the end of ```my_str``` are both removed after ```rstrip()``` is called, but the newline character in the middle of ```my_str``` is untouched.

It is important to note that strings are immutable. Calling the ```rstrip()``` method __returns__ a new string, but the original string is not modified.
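
The point about immutability can be seen in a short experiment. This is a minimal sketch with made-up strings; it uses the built-in `repr()` function so that newline characters are displayed as `\n` instead of as actual line breaks.

```python
original = 'Hello\n\n'

# Calling rstrip() without storing the result changes nothing.
original.rstrip('\n')
print(repr(original))   # → 'Hello\n\n'

# To keep the stripped text, assign the returned string to a variable.
stripped = original.rstrip('\n')
print(repr(stripped))   # → 'Hello'
print(repr(original))   # → 'Hello\n\n'
```

This is why the earlier code writes `line = line.rstrip('\n')` rather than just `line.rstrip('\n')`.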

<img align="left" src='https://drive.google.com/uc?export=view&id=0B08uY8vosNfobDBuOXVXQWVxMFE' style="width: 60px; height: 60px;"><br />Let's do an exercise!

You are given a file called ```movies.txt```. Open the file, read it line by line, and display only those movies that were produced in 2000 (i.e., only those lines that contain ```'(2000)'``` as a substring).

Your code should produce the following output:

```
Requiem for a Dream (2000)
Memento (2000)
Amores Perros (2000)
Snatch (2000)
```

In [14]:
# Write your code here:
with open("movies.txt", "r") as f:
    for line in f:
        line = line.rstrip("\n")
        # Keep only the lines that contain '(2000)' as a substring.
        if "(2000)" in line:
            print(line)



Requiem for a Dream (2000)
Memento (2000)
Amores Perros (2000)
Snatch (2000)


## III. Reading a File with Several Columns

Oftentimes when we use a file to store data, each line of the file represents a single record (e.g., a student, a book, a movie). A single record may contain multiple pieces of information, and we may want to store them in different columns.

For example, the file ```presidents.txt``` stores the information of several U.S. presidents. For each president, we store the president's name and his term of office, separated by a tab character (```'\t'```), which can be seen as two columns.

When we read the information from a file containing multiple columns, we usually need to separate these columns so that we can easily retrieve the different pieces of information. The easiest way is to use the ```split()``` method to split a line into multiple columns.
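
Before applying ```split()``` to a file, it may help to try it on a single string. The sketch below uses a made-up record (not taken from ```presidents.txt```) and contrasts splitting on a tab with calling ```split()``` without an argument, which splits on any whitespace.

```python
# A hypothetical record with two columns separated by a tab character.
record = 'Ada Lovelace\t1815 - 1852'

# Splitting on the tab keeps each column in one piece.
by_tab = record.split('\t')
print(by_tab)          # → ['Ada Lovelace', '1815 - 1852']

# Without an argument, split() breaks the text at every run of whitespace,
# so the name and the years are broken up as well.
by_whitespace = record.split()
print(by_whitespace)   # → ['Ada', 'Lovelace', '1815', '-', '1852']
```

Passing the separator that the file actually uses is therefore important when a column may itself contain spaces.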

For example, the code below retrieves the name and term of office of each president inside the file ```presidents.txt```.

In [7]:
with open('presidents.txt', 'r') as my_file:
    for line in my_file:
        line = line.rstrip('\n')
        columns = line.split('\t')
        name = columns[0]
        term = columns[1]
        print("Name: " + name + ", Term of Office: " + term)

Name: Joe Biden, Term of Office: 2021 - 
Name: Donald Trump, Term of Office: 2017 - 2021
Name: Barack Obama, Term of Office: 2009 - 2017
Name: George W. Bush, Term of Office: 2001 - 2009
Name: Bill Clinton, Term of Office: 1993 - 2001


In the next example, we have a CSV (comma-separated values) file called ```patients.csv```. Each line of the file contains the name, weight and height of a patient.

The code below opens the file, retrieves the information and prints out only the name of each patient:

In [15]:
with open('patients.csv', 'r') as my_file:
    for line in my_file:
        line = line.rstrip('\n')
        columns = line.split(',')
        name = columns[0]
        print("Patient Name: " + name)

Patient Name: Michael Lim
Patient Name: Nicholas Wong
Patient Name: Jerry Liu


<img align="left" src='https://drive.google.com/uc?export=view&id=0B08uY8vosNfobDBuOXVXQWVxMFE' style="width: 60px; height: 60px;"><br />Let's do an exercise!

Can you modify the code above to open ```patients.csv``` and calculate each patient's BMI value? Your code should generate the following output:

```
Patient Name: Michael Lim, BMI: 23.251459068069035
Patient Name: Nicholas Wong, BMI: 24.72555658151226
Patient Name: Jerry Liu, BMI: 23.570415879017013
```

Hint: (1) You can open the file ```patients.csv``` using Notepad++ to see what the data inside looks like. (2) After extracting the height and weight from the file, you need to convert these strings into numbers before you can calculate the BMI.
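
To illustrate the second hint, the sketch below shows why the conversion matters. The value `'70.5'` is made up for illustration and is not taken from ```patients.csv```.

```python
# Everything read from a text file is a string, even if it looks like a number.
weight_text = '70.5'   # hypothetical value

# Multiplying a string repeats the text instead of doing arithmetic.
doubled_text = weight_text * 2
print(doubled_text)     # → 70.570.5

# float() converts the string into a number that supports arithmetic.
weight = float(weight_text)
doubled_number = weight * 2
print(doubled_number)   # → 141.0
```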

In [None]:
# Write your code here:
with open('patients.csv', 'r') as my_file:
    for line in my_file:
        line = line.rstrip('\n')
        columns = line.split(',')
        name = columns[0]
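        # Convert the text columns to numbers before doing arithmetic.
        # Assumption: weight is stored in kilograms and height in metres.
        weight = float(columns[1])
        height = float(columns[2])
        bmi = weight / (height ** 2)
        print("Patient Name: " + name + ", BMI: " + str(bmi))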



## IV. Writing to a File

Besides reading from a file, we often also want to write data to a file. We can use the mode ```'w'``` to open a file for writing. 

Run the code below and check your current folder to see if a file ```hello_world.txt``` has been created:

In [None]:
with open('hello_world.txt', 'w') as output_file:
    output_file.write("Hello World!")

We can see that to write to a file that has been opened, we can use the ```write()``` method. This method takes in a string as its argument, and that string will be written to the corresponding file.

Note that if the file opened already has some content inside, then opening the file in ```'w'``` mode and writing to it will <b>overwrite</b> the old content.

Run the code below and check the content of ```hello_world.txt``` again:

In [None]:
with open('hello_world.txt', 'w') as output_file:
    output_file.write("This line will replace the old text.")
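
The overwriting behaviour can also be confirmed from code by reading the file back. This is a minimal sketch that uses a hypothetical file name, `overwrite_demo.txt`, so that it does not interfere with ```hello_world.txt```.

```python
# 'overwrite_demo.txt' is a hypothetical file name used only for illustration.
with open('overwrite_demo.txt', 'w') as output_file:
    output_file.write("Old text.")

# Opening the same file in 'w' mode again discards the old content.
with open('overwrite_demo.txt', 'w') as output_file:
    output_file.write("New text.")

# Read the file back to see what it contains now.
lines = []
with open('overwrite_demo.txt', 'r') as input_file:
    for line in input_file:
        lines.append(line)

print(lines)   # → ['New text.']
```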

### Writing multiple lines to a file

We can also write multiple lines to a file. However, we need to remember to end each line with a newline character ```'\n'```.

Compare the two files generated by the code below:

In [None]:
with open('output-1.txt', 'w') as output_file:
    for num in range(1, 11):
        output_file.write("Line " + str(num))

with open('output-2.txt', 'w') as output_file:
    for num in range(1, 11):
        output_file.write("Line " + str(num) + '\n')

<img align="left" src='https://drive.google.com/uc?export=view&id=0B08uY8vosNfobDBuOXVXQWVxMFE' style="width: 60px; height: 60px;"><br />Let's do an exercise!

Can you modify your previous code that calculates the BMI values of the patients in ```patients.csv``` such that the computed results are written to a text file called ```patients_bmi.txt```?

The generated file ```patients_bmi.txt``` should contain the following data:

```
Michael Lim, 23.251459068069035
Nicholas Wong, 24.72555658151226
Jerry Liu, 23.570415879017013
```

Hint: You can nest two ```with open(...)``` statements to open an input file and an output file at the same time. Alternatively, you can first store everyone's BMI value in a list while processing the input file. Then, after the input file is closed, you can open the output file, retrieve each person's BMI from the list one by one, and write the information to the output file line by line.
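
The two approaches in the hint can be sketched on a simpler task: copying names from one file to another in upper case. All file names and data below are hypothetical and are created by the example itself; the sketch is not a solution to the BMI exercise.

```python
# Create a small sample input file so that this example runs on its own.
with open('names_demo.txt', 'w') as sample_file:
    sample_file.write("alice\nbob\n")

# Approach 1: nested with-statements keep both files open at the same time.
with open('names_demo.txt', 'r') as input_file:
    with open('names_upper_1.txt', 'w') as output_file:
        for line in input_file:
            name = line.rstrip('\n')
            output_file.write(name.upper() + '\n')

# Approach 2: collect the results in a list first, then write them out.
results = []
with open('names_demo.txt', 'r') as input_file:
    for line in input_file:
        name = line.rstrip('\n')
        results.append(name.upper())

with open('names_upper_2.txt', 'w') as output_file:
    for item in results:
        output_file.write(item + '\n')

print(results)   # → ['ALICE', 'BOB']
```

Both output files should end up with the same two lines. The first approach never holds more than one line in memory, while the second keeps the reading and writing steps separate.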

In [None]:
# Write your code here:


