## File Handling

Reading and writing files are among the most common tasks in programming. A program can **read** input data from files and **write** computed results back to files, which makes it possible to process large amounts of data automatically instead of by hand.

In this session we will work with *text files*: files that consist only of lines of text, without any formatting information. Word documents or Pages files, for example, are not text files; the formatting information they contain makes them more complicated to handle in a program.


### Reading Files

To work with a file in Python, you first need to open it. The built-in `open()` function does what its name implies: it opens a file and returns a file object. Save the returned file object in a variable so that you can work with the opened file later.

After the file is opened, you can use the `.read()` method to read content from it.

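Remember to always close the file with the `.close()` method when you are done. Closing files promptly makes sure that any data you wrote is flushed from Python's buffers to the file, which protects you from data loss. It also frees the file for other programs: on some operating systems (notably Windows), an open file cannot be modified or deleted by other processes until it is closed or your program terminates.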



```python
my_file = open("example.txt")  # step 1: open
content = my_file.read()       # step 2: read
print(content)
my_file.close()                # step 3: close
```

```
Hello there!
This example file contains three lines of text.
This is the last line.
```



Because it is easy to forget to call `.close()`, it is recommended to open files with the `with` statement.

The `with` line opens the file, and the file stays open only while the indented block below it runs. As soon as the block ends, the file is closed automatically (even if an error occurs inside the block), and you get to write less code. The following code does the same as the code above, but uses a `with` statement:



```python
with open("example.txt") as new_file:
    content = new_file.read()
    print(content)
```

```
Hello there!
This example file contains three lines of text.
This is the last line.
```
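To see that the `with` block really closes the file, you can check the file object's `closed` attribute before and after the block. A minimal sketch; it first writes a hypothetical `demo.txt` so that it runs on its own:

```python
# Create a hypothetical demo file so the example does not depend on example.txt
with open("demo.txt", "w") as f:
    f.write("first line\nsecond line\n")

with open("demo.txt") as demo_file:
    print(demo_file.closed)  # False: the file is open inside the block

print(demo_file.closed)      # True: the file was closed when the block ended
```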


The `read()` method returns the entire contents of the file as a single string. This is useful when you need the whole file at once, but more often you will want to go through the file line by line.

The `readline()` method returns one line from the file, including the line break `\n` at its end. Each subsequent call returns the next line. Because `print()` adds its own line break, printing a line returned by `readline()` produces an extra empty line.

```python
f = open("example.txt")
print(f.readline())  # first line
print(f.readline())  # second line
f.close()
```

```
Hello there!

This example file contains three lines of text.
```
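Printing `repr()` of each line makes the hidden `\n` visible. The sketch below (again using a hypothetical `demo.txt`) also shows that `readline()` returns an empty string `''` once the end of the file is reached:

```python
with open("demo.txt", "w") as f:
    f.write("first line\nsecond line\n")

with open("demo.txt") as f:
    first = f.readline()
    second = f.readline()
    after_end = f.readline()

print(repr(first))      # 'first line\n'  (the line break is kept)
print(repr(second))     # 'second line\n'
print(repr(after_end))  # ''  (empty string: end of file reached)
```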



This means we can also read the file with a `for` loop that calls `readline()` repeatedly. Note that `range(3)` only works here because we know the file has exactly three lines:

```python
f = open("example.txt")
for i in range(3):
    print(f.readline())
f.close()
```

```
Hello there!

This example file contains three lines of text.

This is the last line.
```
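If you don't know in advance how many lines the file has, you can keep calling `readline()` until it returns the empty string. A minimal sketch with a hypothetical `demo.txt`:

```python
with open("demo.txt", "w") as f:
    f.write("one\ntwo\nthree\n")

lines = []
with open("demo.txt") as f:
    line = f.readline()
    while line != "":                    # "" is only returned at the end of the file
        lines.append(line.rstrip("\n"))  # drop the trailing line break
        line = f.readline()

print(lines)  # ['one', 'two', 'three']
```

An empty line in the middle of a file is read as `'\n'`, not `''`, so the loop does not stop early.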


A more idiomatic way to do the same thing is to loop over the file object directly with `for`. Each `line` still ends with `\n`, so we remove the line break before printing:

```python
with open("example.txt") as new_file:
    for line in new_file:
        line = line.replace("\n", "")
        print(line)
```

```
Hello there!
This example file contains three lines of text.
This is the last line.
```
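The `for` loop combines well with the built-in `enumerate()`, for example to number the lines. A small sketch with a hypothetical `demo.txt`; here `rstrip("\n")` is used as an alternative to `replace("\n", "")` that only removes the line break at the end of each line:

```python
with open("demo.txt", "w") as f:
    f.write("one\ntwo\nthree\n")

numbered = []
with open("demo.txt") as f:
    for number, line in enumerate(f, start=1):  # number the lines starting at 1
        text = line.rstrip("\n")                 # remove only the trailing line break
        numbered.append(f"{number}: {text}")
        print(f"{number}: {text}")
```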


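If the file is in a different directory, you have to specify its path. A relative path is resolved from the current working directory (for a notebook, usually the folder that contains it); an absolute path starts from the root of the file system: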


```python
# The path can be relative (as here) or absolute
with open("../test.txt") as f:
    print(f.read())
```

```
this file exists outside of the week7 folder
```
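Path separators differ between operating systems (`/` on macOS and Linux, `\` on Windows). As a sketch, `os.path.join()` builds a path with the right separator, and `os.getcwd()` and `os.path.abspath()` show where a relative path actually points. The file name `test.txt` is taken from the example above; the code does not open it, so it runs even if the file does not exist:

```python
import os

relative_path = os.path.join("..", "test.txt")  # "../test.txt" on macOS/Linux, "..\test.txt" on Windows
print(os.getcwd())                     # the folder that relative paths start from
print(os.path.abspath(relative_path))  # the full path Python would open
print(os.path.exists(relative_path))   # True only if the file is really there
```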



#### Task 1:
Create a text file from the command line somewhere on your system, then read it by modifying the code above.

---
### Writing Files

The `open()` function takes an optional *mode* argument. The default mode is read mode, which is why we could read files without specifying a mode.

```python
f1 = open("example.txt")
# Equivalent to the line above: "r" means read, "t" means text mode (both are defaults).
# Another mode is "b" for binary files such as images, e.g. "rb".
f2 = open("example.txt", "rt")
f1.close()
f2.close()
```
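As a quick sketch of the difference between text and binary mode: reading in `"rb"` mode returns raw `bytes` instead of a `str`. The file `demo.txt` is hypothetical and created first:

```python
with open("demo.txt", "w") as f:
    f.write("Hi\n")

with open("demo.txt") as f:        # text mode: returns a str
    text = f.read()
with open("demo.txt", "rb") as f:  # binary mode: returns bytes, no decoding
    data = f.read()

print(type(text), repr(text))
print(type(data), data)  # on Windows the bytes may end with \r\n instead of \n
```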

To write to an existing file, pass a different mode as the second argument of `open()`.

Mode `"a"` (append) adds the new content to the end of the file:

```python
# Each time you run this cell, the line is appended again, so the file keeps growing
with open("example.txt", "a") as f:
    f.write("\nNow the file has more content!")

# Open and read the file after appending:
with open("example.txt") as f:
    print(f.read())
```

```
Hello there!
This example file contains three lines of text.
This is the last line.
Now the file has more content!
Now the file has more content!
Now the file has more content!
```


To overwrite the existing content of the file instead, use mode `"w"` (write):

```python
with open("example.txt", "w") as f:
    f.write("This text will overwrite the current text in the file. ")
    f.write("It doesn't add line breaks automatically like print().")
    f.write("\nSo you need to add line breaks manually by using \\n")

with open("example.txt") as f:
    print(f.read())
```

```
This text will overwrite the current text in the file. It doesn't add line breaks automatically like print().
So you need to add line breaks manually by using \n
```


To create a new file in Python, use `open()` with one of the following modes:
- `x`: Create - creates the file; raises a `FileExistsError` if the file already exists
- `a`: Append - creates the file if it does not exist, otherwise appends to it
- `w`: Write - creates the file if it does not exist, otherwise overwrites (empties) it
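A minimal sketch of mode `"x"`: the first `open()` creates the file, the second one fails with `FileExistsError` because the file now exists. The file name `created_once.txt` is hypothetical; the sketch removes any leftover copy first so it can be re-run:

```python
import os

filename = "created_once.txt"  # hypothetical file name for this demo
if os.path.exists(filename):
    os.remove(filename)        # start from a clean state

with open(filename, "x") as f:  # succeeds: the file does not exist yet
    f.write("created with mode x\n")

try:
    with open(filename, "x") as f:  # fails: the file exists now
        f.write("this is never written\n")
except FileExistsError:
    print(f"{filename} already exists, nothing was overwritten")
```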

```python
with open("new_file.txt", "w") as f:
    f.write("This creates a new file if it doesn't exist yet, and doesn't raise an error if it already exists.")

with open("new_file.txt") as f:
    print(f.read())
```

```
This creates a new file if it doesn't exist yet, and doesn't raise an error if it already exists.
```


### Clearing and Deleting Files

Sometimes you need to clear the contents of an existing file. You can do this by opening the file in write mode and closing it immediately:

```python
open("example.txt", "w").close()

# Another way to achieve the same thing
with open("example.txt", "w") as f:
    pass  # nothing to write: opening in "w" mode has already emptied the file
```
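To check that the file really is empty afterwards, you can ask for its size in bytes with `os.path.getsize()`. A sketch with a hypothetical `demo.txt`:

```python
import os

with open("demo.txt", "w") as f:
    f.write("some content\n")
print(os.path.getsize("demo.txt"))  # greater than 0: the file has content

open("demo.txt", "w").close()       # opening in "w" mode empties the file
print(os.path.getsize("demo.txt"))  # 0
```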

To delete a file, import the `os` module and call its `os.remove()` function:

```python
import os
os.remove("new_file.txt")
```

To avoid getting an error, you might want to check if the file exists before you try to delete it:

```python
import os

if os.path.exists("new_file.txt"):
    os.remove("new_file.txt")
else:
    print("The file does not exist")
```

```
The file does not exist
```
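An alternative to checking first is to simply attempt the deletion and handle the error if the file is missing. This sketch assumes, like the cell above, that `new_file.txt` may or may not exist:

```python
import os

try:
    os.remove("new_file.txt")
    print("File deleted")
except FileNotFoundError:
    print("The file does not exist")
```

This style also covers the case where the file disappears between an `exists()` check and the `remove()` call.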


#### Task 2:
Write a program that asks the user for two inputs:
 1. The name of the user (string)
 2. The number of weeks they have been learning Python (int)

It should create a file `message.txt` that greets the user by name and shows the GitHub repo structure for that number of weeks.
The contents of your text file should look something like this:
```
Hi user_name, we hope you enjoy learning Python here!
Your github repo should look like this:
    assignments/
        week1/
        week2/
        ...
        week7/
```


---
### Dealing with CSVs


A CSV file (short for *comma-separated values*) is a text file that contains data separated by a predetermined character, the *delimiter*. The most common delimiters are the comma `,` and the semicolon `;`, but in principle any character can be used.

CSV files are commonly used to store records of different kinds. Many database and spreadsheet programs, such as Excel, can import and export data in CSV format, which makes data exchange between different systems easy.

We have already seen that we can loop over the lines of a file with a `for` loop, but how can we separate the different fields within a single line?

```python
with open("fruits.csv") as f:
    for line in f:
        line = line.replace("\n", "")
        print(line)
        # how to access each field?
```

```
Fruit,Price per kg,Origin
Apple,2.99,Germany
Banana,1.39,Ecuador
Orange,2.99,Spain
Grapes,1.99,Italy
Strawberry,3.99,Germany
Kiwi,2.50,New Zealand
Pineapple,2.80,Costa Rica
Blueberry,5.00,Spain
Pear,2.20,Netherlands
Watermelon,1.39,Spain
```
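One way to answer the question in the comment above is the string method `split()`, which cuts a string at every occurrence of a separator and returns a list. A sketch on a single line taken from the output above:

```python
line = "Apple,2.99,Germany\n"

fields = line.strip().split(",")  # remove the line break, then split at each comma
print(fields)                     # ['Apple', '2.99', 'Germany']

name, price, origin = fields      # unpack the three fields
price = float(price)              # fields are strings; convert the price to a number
print(name, price, origin)        # Apple 2.99 Germany
```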


#### Task 3: From CSV to Dictionary

The file `fruits.csv` contains names of fruits, their prices and origins.

Please write a function named `read_fruits` which reads the file and returns a dictionary based on the contents. In the dictionary, the name of the fruit should be the key, and the value should be its price. Prices should be of type float.

```python
def read_fruits():
    fruits = {}  # avoid naming a variable `dict`: that would hide the built-in dict type
    return fruits

read_fruits()
```

```
{}
```


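Python's standard library also includes a `csv` module, so we don't need to split each row manually. It also handles cases a simple split cannot, such as quoted fields that contain the delimiter itself.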

```python
import csv

# initializing the rows list
rows = []

# reading the csv file (the csv module documentation recommends newline="")
with open("fruits.csv", newline="") as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)

    # extract the field names from the first row; next() also consumes that row,
    # so the loop below starts at the first data row
    fields = next(csvreader)

    # printing the field names
    print('Field names are:', fields)

    # extract each data row one by one; notice that the items are already split for you
    for row in csvreader:
        rows.append(row)
        print(row)

    # line_num counts all lines read from the file, including the header row
    print("Total no. of rows:", csvreader.line_num)
```

```
Field names are: ['Fruit', 'Price per kg', 'Origin']
['Apple', '2.99', 'Germany']
['Banana', '1.39', 'Ecuador']
['Orange', '2.99', 'Spain']
['Grapes', '1.99', 'Italy']
['Strawberry', '3.99', 'Germany']
['Kiwi', '2.50', 'New Zealand']
['Pineapple', '2.80', 'Costa Rica']
['Blueberry', '5.00', 'Spain']
['Pear', '2.20', 'Netherlands']
['Watermelon', '1.39', 'Spain']
Total no. of rows: 11
```
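The sketch below reads from an in-memory string via `io.StringIO` instead of a file, so it runs on its own; the data is made up. It shows two things a plain `split(",")` does not handle: a quoted field that contains a comma, and a file that uses `;` as its delimiter (via the `delimiter` parameter of `csv.reader`):

```python
import csv
import io

# Made-up data: the second field is quoted because it contains a comma
comma_text = 'Fruit,Origin\nPineapple,"San José, Costa Rica"\n'
comma_rows = list(csv.reader(io.StringIO(comma_text)))
print(comma_rows)  # [['Fruit', 'Origin'], ['Pineapple', 'San José, Costa Rica']]

# A naive split breaks the quoted field apart
print('Pineapple,"San José, Costa Rica"'.split(","))

# Made-up semicolon-separated data
semicolon_text = "Fruit;Price per kg\nApple;2.99\n"
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=";"))
print(semicolon_rows)  # [['Fruit', 'Price per kg'], ['Apple', '2.99']]
```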


There are many ways to print the rows in a nicely formatted way. Here we show one of them, using f-strings.

In this example, note that the replacement field `{col:10s}` contains a format specifier after the colon `:`. The `10s` part tells Python to format the value as a string (`s`) that is at least 10 characters wide, padding shorter values with spaces.

```python
# printing the first 3 rows
print('\nFirst 3 rows are:\n')
for row in rows[:3]:
    # parsing each column of a row
    for col in row:
        print(f"{col:10s}", end=" ")  # the format specification starts after the colon :
    print('\n')  # end the row; printing '\n' also adds a blank line
```

```

First 3 rows are:

Apple      2.99       Germany    

Banana     1.39       Ecuador    

Orange     2.99       Spain      
```
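The width in a format specifier is only a minimum. A short sketch comparing `10s` with `.10s`, where the precision `.10` truncates the string, and with `>10s`, which right-aligns it:

```python
word = "Strawberry jam"

padded = f"[{word:10s}]"      # width 10 is a minimum: longer text is not cut
truncated = f"[{word:.10s}]"  # precision .10 keeps at most 10 characters
right = f"[{'Kiwi':>10s}]"    # > right-aligns within a width of 10

print(padded)     # [Strawberry jam]
print(truncated)  # [Strawberry]
print(right)      # [      Kiwi]
```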



`DictReader` creates an object that works like a regular reader but maps the values in each row to a dictionary. The keys are given by the optional `fieldnames` parameter; if it is omitted, as below, the values in the first row of the file are used as the keys.

```python
import csv

rows = []

with open('fruits.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)
        rows.append(row)
```

```
{'Fruit': 'Apple', 'Price per kg': '2.99', 'Origin': 'Germany'}
{'Fruit': 'Banana', 'Price per kg': '1.39', 'Origin': 'Ecuador'}
{'Fruit': 'Orange', 'Price per kg': '2.99', 'Origin': 'Spain'}
{'Fruit': 'Grapes', 'Price per kg': '1.99', 'Origin': 'Italy'}
{'Fruit': 'Strawberry', 'Price per kg': '3.99', 'Origin': 'Germany'}
{'Fruit': 'Kiwi', 'Price per kg': '2.50', 'Origin': 'New Zealand'}
{'Fruit': 'Pineapple', 'Price per kg': '2.80', 'Origin': 'Costa Rica'}
{'Fruit': 'Blueberry', 'Price per kg': '5.00', 'Origin': 'Spain'}
{'Fruit': 'Pear', 'Price per kg': '2.20', 'Origin': 'Netherlands'}
{'Fruit': 'Watermelon', 'Price per kg': '1.39', 'Origin': 'Spain'}
```
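If a CSV file has no header row, you can pass the keys yourself through `fieldnames`. A sketch using made-up, headerless data in an `io.StringIO` object instead of a file (in Python 3.8 and later each row is a plain `dict`):

```python
import csv
import io

headerless_text = "Apple,2.99,Germany\nKiwi,2.50,New Zealand\n"  # made-up data, no header
reader = csv.DictReader(io.StringIO(headerless_text),
                        fieldnames=["fruit", "price", "origin"])
headerless_rows = list(reader)

print(headerless_rows[0])            # {'fruit': 'Apple', 'price': '2.99', 'origin': 'Germany'}
print(headerless_rows[1]["origin"])  # New Zealand
```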


In dictionary form, we can access each cell by its column name instead of by its position in the row. This makes the code easier to read and is very helpful for fine-grained processing of the data.

```python
for row in rows:
    name = row["Fruit"]
    price = row["Price per kg"]
    origin = row["Origin"]
    print(f"{name:11s}", end=" ")
    # 5.2f specifies a floating-point number with two decimal places padded to a total width of 5 characters.
    print(f"{float(price):5.2f} ", end=" ")
    print(f"{origin}")
    print()
```

```
Apple        2.99  Germany

Banana       1.39  Ecuador

Orange       2.99  Spain

Grapes       1.99  Italy

Strawberry   3.99  Germany

Kiwi         2.50  New Zealand

Pineapple    2.80  Costa Rica

Blueberry    5.00  Spain

Pear         2.20  Netherlands

Watermelon   1.39  Spain
```



But data are not always clean and tidy. What if your data contains extra spaces or line breaks? In that case, Python's string method `strip()` is very useful: it removes all whitespace (spaces, tabs, and line breaks) from the beginning and end of a string.

```python
print(" tryout ".strip())
print("\ntryout   ".strip())
```

```
tryout
tryout
```
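A typical use is cleaning every cell of a row at once with a list comprehension; `lstrip()` and `rstrip()` work the same way but only strip the left or right side. A sketch with a made-up messy row:

```python
raw_row = [" Apple ", "2.99\n", "\tGermany  "]  # made-up row with stray whitespace

clean_row = [cell.strip() for cell in raw_row]
print(clean_row)                 # ['Apple', '2.99', 'Germany']

print(repr(" Apple ".lstrip()))  # 'Apple '
print(repr(" Apple ".rstrip()))  # ' Apple'
```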


There are multiple ways to write to a CSV file. One is to write to it as a normal text file:

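```python
with open('fruits.csv', 'a') as f:
    # end the row with "\n" so that the next appended row starts on a new line
    # (this assumes the file already ends with a line break)
    f.write("Cherry,5.99,Spain\n")
```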

Another way is to use the `.writerow()` method of a CSV writer. Note that it takes all the cells of a row as a single list; non-string values such as numbers are converted to strings automatically:

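```python
import csv

# newline="" lets the csv module control line endings (avoids blank lines on Windows)
with open("fruits.csv", "a", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Mango", "1.39", "Spain"])
```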

The third way is to use `DictWriter`. Both the normal writer and `DictWriter` also offer `writerows()` to write multiple rows at once.

```python
import csv

with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)  # specify fieldnames as a parameter
    writer.writeheader()  # write the header row first, then the data rows
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Beans'})
    writer.writerows([{'first_name': 'Annoying', 'last_name': 'Spams'},
                      {'first_name': 'Lovely', 'last_name': 'Spams'}])
```
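To see what a CSV writer actually produces, you can write into an in-memory `io.StringIO` buffer instead of a file. In this sketch with made-up rows, `writerows()` writes several rows at once, the field containing a comma is quoted automatically, and the number `1.39` is converted to text. The writer ends each row with `\r\n` by default; opening real files with `newline=''` prevents Windows from translating the `\n` in it once more:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows([
    ["Fruit", "Origin"],
    ["Pineapple", "San José, Costa Rica"],  # contains the delimiter, so it gets quoted
    ["Mango", 1.39],                        # non-string values are converted to text
])

csv_text = buffer.getvalue()
print(repr(csv_text))
```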

#### Task 4:
Add a new column "Available" to `fruits.csv` and populate it with random `True`/`False` values.
- Hint: You need to first add the column header "Available" to the first row, then append the corresponding value to each row below.
- You can use any of the three ways introduced above to do this task, but which way suits which kind of scenario better?

---
### Exception Handling

Some common errors when working with files are **FileNotFoundError** (when trying to access a file which doesn't exist), **io.UnsupportedOperation** (when trying to perform an operation on a file which is not supported by the mode in which the file is opened) or **PermissionError** (the program lacks necessary permissions to access the file).

```python
try:
    with open("example.txt") as my_file:
        for line in my_file:
            print(line)
except FileNotFoundError:
    print("The file example.txt was not found")
except PermissionError:
    print("No permission to access the file example.txt")
```

```
Hello there!

This example file contains three lines of text.

This is the last line.
```
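The sketch below triggers `io.UnsupportedOperation` on purpose by writing to a file opened in the default read mode. `demo.txt` is a hypothetical file created first:

```python
import io

with open("demo.txt", "w") as f:
    f.write("read-only demo\n")

caught = False
try:
    with open("demo.txt") as f:  # default mode "r": reading only
        f.write("more text")     # writing is not supported in this mode
except io.UnsupportedOperation as error:
    caught = True
    print("Unsupported operation:", error)
```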


You can also raise exceptions yourself with the `raise` statement, for example to reject invalid input:

```python
def age(x):
    if type(x) is not int:
        raise TypeError("Only integers are allowed")
    if x < 0:
        raise ValueError("Sorry, age cannot be negative")
    return x

age(-1)
```

```
ValueError: Sorry, age cannot be negative
```
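Exceptions you raise yourself are handled with `try`/`except` just like built-in ones. A sketch with a hypothetical `check_age()` function that mirrors `age()` above:

```python
def check_age(x):
    """Hypothetical validator, mirroring age() above."""
    if type(x) is not int:
        raise TypeError("Only integers are allowed")
    if x < 0:
        raise ValueError("Sorry, age cannot be negative")
    return x

messages = []
for value in [25, -1, "ten"]:
    try:
        messages.append(f"valid: {check_age(value)}")
    except (TypeError, ValueError) as error:  # catch both kinds of invalid input
        messages.append(f"invalid: {error}")

print(messages)
```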