# Chapter 6: Files and Errors

So far we have only worked with data provided by the exercise itself or entered by the user with the `input()` function.

With Python you can read data from files as well as save data into them. Python can process many types of files, such as text, CSV, XML and JSON files. In this course we will focus on processing text and CSV files.

Files make it easier to store information. Data created in a program can be permanently stored inside files. It is much easier to read data such as contact information from a file instead of requesting the user to input the information every time the program is executed.

Before a file can be read, the programmer needs to specify which file is being processed and open it with the `open()` function. This function returns a file object, stored here in a variable, through which the file is processed.

```Python
file = open(filename, mode)
```

The function takes as arguments the name of the file (`filename`) and the mode of processing (`mode`). The mode of processing defines whether the program is reading data from the file or writing data into the file. If we wish to read data from the file the mode of processing should be `'r'` (read). If instead we wish to write data into the file the mode of processing should be `'w'` (write). If the mode of processing argument is omitted (not given to the function) the default mode will be `'r'`.

```Python
file = open('testfile.txt', 'r')

file = open('testfile.txt', 'w')
```

**Note!** Both the file name and the mode of processing must be strings, so when they are written directly into the call they must be inside quotation marks. They can also be given as variables, and the file name in particular is often stored in a variable. Here are two ways of opening the same file:

```Python
filename = 'testfile.txt'
file = open(filename, 'r')
# or, equivalently:
file = open('testfile.txt', 'r')
```

## Reading a file line by line

A file can be read either in its entirety or line by line. One way to read a file line by line is the `readline()` method of the file object:

```Python
file = open('testfile.txt', 'r')
row = file.readline()
```

Here the method is called with `file.readline()`. The first call returns the first row of the file. The file object keeps track of how far the file has been read, so the next call returns the second row, and so on. Each returned row ends with the line break character `\n` (except possibly the last row, if the file does not end with a line break). Once the whole file has been read, the method returns an empty string `""`. An empty row in the middle of the file is returned as `"\n"`, not as an empty string, so an empty string always means that the end of the file has been reached.
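To make the return values of `readline()` visible, here is a small sketch. It assumes it may create a file named `readline_demo.txt` (a hypothetical name used only for this demo) in the current directory; writing files is covered later in this chapter. `repr()` is used so that the line break `\n` shows up in the output instead of being printed as a new line.

```python
# Create a small demo file (readline_demo.txt is a hypothetical name for this example)
demo = open('readline_demo.txt', 'w')
demo.write('first row\nsecond row\n\nfourth row\n')
demo.close()

file = open('readline_demo.txt', 'r')
first = file.readline()   # the first row, including '\n'
second = file.readline()  # the second row, including '\n'
third = file.readline()   # the empty row is returned as '\n'
fourth = file.readline()  # the fourth row, including '\n'
end = file.readline()     # '' means the end of the file has been reached
file.close()

for value in [first, second, third, fourth, end]:
    print(repr(value))
```

Printed with `repr()`, the five values appear as `'first row\n'`, `'second row\n'`, `'\n'`, `'fourth row\n'` and `''`.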

## Closing a file

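When a program no longer needs a file, the file should be closed. Closing releases the file and, when writing, makes sure that everything written is actually saved. Forgetting to close a file is an easy mistake, and it can cause problems such as written data not showing up in the file, so if something behaves unexpectedly, check whether a `close()` call is missing.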

A file is closed using the method `close()`. The method is called in the same manner as the previous `readline()`, for example like `file.close()`. Take note that the name of the file variable can be anything, not just `file`.

Let's look at an example file, `testfile.txt`, with four rows, one of which is empty:

```
first row
second row

fourth row
```

In [None]:
file = open('testfile.txt', 'r')
row = file.readline() # read the first line
while row != "": # readline ends with ""
    print(row)
    row = file.readline() # read the next row 
file.close()

As the output of the example above shows, there is an extra empty line after every row. This happens because `readline()` returns each row including its line break character `\n`, and `print()` adds a line break of its own.

This can be fixed with the string method `rstrip()`. It removes the line break character from the end of a string, along with any other trailing whitespace characters such as spaces and tabs.

In [None]:
name = "testfile.txt"
file = open(name, 'r')
row = file.readline()
while row != "": # readline ends with ""
    row = row.rstrip()
    print(row)
    row = file.readline()
file.close()

Now the print leaves out the extra line breaks.
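`rstrip()` only touches the right-hand end of a string. The sketch below compares it with two related calls on a made-up row: `strip()`, which removes whitespace from both ends, and `rstrip('\n')`, which removes only line break characters.

```python
row = '  Alice \t\n'   # a made-up row: spaces at the start, space, tab and line break at the end

print(repr(row.rstrip()))      # '  Alice'     trailing whitespace removed, leading spaces kept
print(repr(row.strip()))       # 'Alice'       whitespace removed from both ends
print(repr(row.rstrip('\n')))  # '  Alice \t'  only the line break removed
```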

#### Exercise 1

Read the file `alphabet.txt` line by line and add each row to a single string variable named `alphabets`. Then print this string. Remember to remove any extra spaces or line breaks from the ends of the rows.

In [None]:
# your code here

## Reading a file using a `for` loop

The previous example can also be written using a `for` loop. The loop goes through the file one line at a time, so the `readline()` method is not needed.

**Note!** If you use a `for` loop to read through a file, do not call `readline()` on the same file inside the loop. The `for` loop already goes through the file line by line, and the file object keeps track of the progress.

In [None]:
name = "testfile.txt"
file = open(name,'r')
for row in file: # every row in file
    row = row.rstrip()
    print(row)
file.close()

**Example 1:** Name list

The program below searches the file `namelist.txt` for a name given by the user. The `rstrip()` method is especially important here: every line read from the file ends with a line break (and may contain other trailing whitespace), so without `rstrip()` a line would usually not be equal to the name typed by the user.

In [None]:
filename = 'namelist.txt'
file = open(filename, 'r') # Open the file 
name = input('Tell me the name you are looking for: \n') # The name we are looking for
result = False # Assume the name is not on the list until a matching row is found
for row in file: # Go through the rows in the file
    row = row.rstrip() # Remove excess spaces and line breaks from the end of the line
    if row == name: # If the name matches the row in the file
        result = True 
        break # We do not have to go through the file anymore, so we break from the loop
file.close()
if result: # Shorter form for if result == True
    print('Name', name, 'is on the list.')
else: # So result == False
    print('Name', name, 'was not found.')

The program goes through the file line by line. If a row matches the name being searched for, the `result` variable is set to `True` and the loop is exited with `break`. If no row matches, `result` is `False` after the loop. Finally, the program checks the value of `result` and prints a message accordingly.

Lines read from a text file are always strings, which means that numbers must be converted to integers (`int`) or floating point numbers (`float`) before any calculations can be done with them.
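A short sketch with made-up values shows why the conversion matters: with strings, `+` joins the text together, while with numbers it adds them.

```python
row = '8\n'             # a made-up line, as it would be read from a file
text = row.rstrip()     # '8' is still a string
print(text + text)      # prints 88: + joins strings together

grade = int(text)       # convert the string to an integer
print(grade + grade)    # prints 16: + now adds numbers

price = float('12.5')   # numbers with decimals need float()
print(price * 2)        # prints 25.0
```

`int()` and `float()` also ignore leading and trailing whitespace, so `int('8\n')` would work too; calling `rstrip()` first is still a good habit, since it keeps the data clean for other uses such as comparisons.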

**Example 2:** The average of grades

The program below reads course grades from a file and calculates their average. Notice how each row is converted from a string to an `int`.

In [None]:
sum_of_grades = 0
count = 0
file = open('grades.txt', 'r')
for row in file:
    row = row.rstrip()
    grade = int(row) # Saving grades as integers, because information is read as str
    sum_of_grades += grade
    count += 1
file.close()
if count == 0:
    print('There are no grades to calculate the mean from.')
else:
    mean = sum_of_grades / count 
    print('The mean of the grades:', mean)

## Reading the lines of a file into a list

A file can also be read with the method `readlines()`, which reads **all rows** of the file into a list (each row becomes one element of the list). After this, the rows can be processed with a `for` loop.

In [None]:
file = open('namelist.txt', 'r')
row_list = file.readlines()
file.close()
for row in row_list:
    row = row.strip()
    print(row)

As you may have noticed, the methods `readline()` and `readlines()` as well as the `for` loop all give the lines as strings (`str`). As a result, all string methods, such as `strip()` and `split()` (introduced later in this chapter), can be used on the lines read this way. When reading numbers, you need to convert them to `int` or `float` first if you want to perform numeric operations on them.

## Reading a file using `with`

A file can also be read using the keyword `with`:

```Python
with open(filename) as file:
    ...  # process the file inside this indented block; it is closed automatically afterwards
```

`with` begins a new program block, and the file stays open while the block is being executed. The `with` keyword is convenient for the programmer because it closes the file automatically when the program leaves the block, even if an error occurs inside it. Inside a `with` block you can read the whole file at once or go through it line by line with a `for` loop.
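File objects have a `closed` attribute that tells whether the file has been closed. The sketch below uses it to show that `with` really closes the file when the block ends. It first creates a hypothetical file, `with_demo.txt`, so that it does not depend on any existing file.

```python
# Create a hypothetical file so the example is self-contained
with open('with_demo.txt', 'w') as file:
    file.write('Alice\nBob\n')

names = []
with open('with_demo.txt') as file:
    print('Inside the block, closed =', file.closed)   # False: the file is open here
    for line in file:
        names.append(line.rstrip())

print('After the block, closed =', file.closed)        # True: with closed the file
print(names)                                           # ['Alice', 'Bob']
```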

In [None]:
with open('namelist.txt') as file1:
    content = file1.readlines()

for row in content:
    row = row.rstrip()
    print(row)

Reading a file line by line with `for` loop can be done as follows:

In [None]:
with open('namelist.txt') as file2:
    for line in file2:
        print(line.rstrip())

You can use any of the methods shown here for processing files. If you don't use the `with` keyword, always remember to close the file once the program no longer needs it.

#### Exercise 2

Read the file 'namelist.txt' using the `readlines()` method and print the name list in alphabetical order. Remove all unnecessary spaces and line breaks from the lines read using the `rstrip()` method. *Hint: If necessary, refresh your memory on how to access individual elements in a list using their index from Chapter 5.*

In [None]:
# your code here

## `split()`

A single line of a file may contain several pieces of data. In that case the line must be split into parts to access the individual entries. This can be done with the `split()` method, which returns the parts as a list of strings. The delimiter (the character that separates the entries) is given as a string argument inside the parentheses. Example 3 shows how to split each row into the date and the average price of a company stock on that day.

```Python
row.split(',')
row.split('/')
row.split(' ')
```
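A few calls on made-up strings show what `split()` returns. The last two lines also show a detail worth knowing: `split(' ')` splits at every single space, so two spaces in a row produce an empty string, whereas `split()` with no argument splits at any run of whitespace.

```python
row = '2024-01-15,10.5'     # a made-up CSV row: date and price
print(row.split(','))       # ['2024-01-15', '10.5'] (both parts are strings)

date = '15/01/2024'
print(date.split('/'))      # ['15', '01', '2024']

words = 'one  two'          # note: two spaces between the words
print(words.split(' '))     # ['one', '', 'two'] (an empty string between the spaces)
print(words.split())        # ['one', 'two'] (no argument splits at any whitespace)
```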

## CSV files

Python can also open and process CSV (Comma Separated Values) files. CSV files are used, for example, for importing and exporting data in table format. In a CSV file the data is stored as text so that the values on each row (the columns of the table) are separated by commas. Many programs, such as Excel, let the user choose the delimiter when importing a CSV file.


**Example of a CSV file** 

The file below (`stock_prices.csv`) looks like this when printed out:

In [None]:
stock_file = open('stock_prices.csv', 'r')
for row in stock_file:
    print(row)
stock_file.close()

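As we can see, each row of a CSV file is read as a single string in which the values are separated by commas. (The empty lines in the output come from the line breaks at the ends of the rows, as explained earlier.)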

Below there is an example of how one opens and processes CSV files using Python. The parts are separated using the `split()` method where the delimiter provided is a comma. This way we can access the elements separated with commas.

**Example 3:** The average price of a stock

In [None]:
stock_file = open('stock_prices.csv', 'r')
for row in stock_file:
    row = row.strip()
    parts = row.split(',')
    print(parts)
stock_file.close()

**Example 4:** Processing a CSV file

The CSV file contains, on each row, a date and the average price of a company stock on that day. The program separates the values and calculates the average price over all days. The date does not matter here; we only need the daily prices to calculate the average.

In [None]:
sum_of_stock_prices = 0
count = 0
stock_file = open('stock_prices.csv', 'r')
for row in stock_file:
    row = row.strip()
    parts = row.split(',')  # Splits the string from the sign ','
    if len(parts) != 2:  # Check that the row has exactly two parts
        print('Error row') # The row did not split into exactly 2 elements, so it is skipped
    else:
        stock = float(parts[1]) # We have to change the type of the number from string to float
        sum_of_stock_prices += stock
        count += 1
stock_file.close()
if count == 0:
    print('There are no stock prices')
else:
    mean = sum_of_stock_prices / count
    print('The overall mean of the stock prices is', mean)

#### Exercise 3

You have been provided with a file named `prices.csv`, which contains prices for different products. The first column contains the name of the product and the second column contains the price, separated by a comma. Create a program which separates the parts of each row, saves them into two lists named `products` and `prices` respectively, and then prints out these lists. Again, remember to remove the line breaks from the ends of the lines.

In [None]:
# your code here

## Writing data into a file

In addition to reading, data can also be written into files. A file is opened for writing in the same way as for reading, except that the mode of processing is either `'w'` (write) or `'a'` (append) instead of `'r'` (read). The mode `'w'` overwrites the existing file, removing all of its contents, while `'a'` keeps the original contents and writes to the end of the file. If the file given to `open()` does not exist, Python creates a new file with that name.

```Python 
file = open('testfile.txt', 'w')

file = open('testfile.txt', 'a')
```
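The difference between `'w'` and `'a'` is easiest to see by writing to the same file several times. This sketch uses a hypothetical file name, `modes_demo.txt`, and the `with` keyword from earlier in this chapter. It reads the file back with `read()`, a file method that returns the whole content as a single string.

```python
# 'w' creates the file, or empties it first if it already exists
with open('modes_demo.txt', 'w') as file:
    file.write('first\n')

# 'a' keeps the existing content and writes to the end of the file
with open('modes_demo.txt', 'a') as file:
    file.write('second\n')

with open('modes_demo.txt', 'r') as file:
    after_append = file.read()
print(repr(after_append))     # 'first\nsecond\n'

# Opening with 'w' again removes everything written so far
with open('modes_demo.txt', 'w') as file:
    file.write('third\n')

with open('modes_demo.txt', 'r') as file:
    after_overwrite = file.read()
print(repr(after_overwrite))  # 'third\n'
```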

To write text into a file, use the method `write()`. **This method does not add a line break character to the end of the line. You need to add it yourself if you want one.** The method only accepts strings, which means that numbers must be converted to strings before they can be written into a file.

```Python
file.write('Random text\n')
```
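Because `write()` accepts only strings and adds no line breaks, writing numbers usually means converting each one with `str()` and appending `'\n'` yourself. A minimal sketch, with a hypothetical file name and made-up numbers:

```python
numbers = [1, 4, 9]   # made-up numbers for this example

with open('numbers_demo.txt', 'w') as file:
    for number in numbers:
        # Convert the number to a string and add the line break manually
        file.write(str(number) + '\n')

with open('numbers_demo.txt', 'r') as file:
    content = file.read()   # read the whole file as one string

print(repr(content))   # '1\n4\n9\n'
```

Calling `file.write(number)` with an `int` instead would raise a `TypeError`.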

**Example 5:** Writing into a file

In [None]:
file = open('incomplete_file.txt', 'w')
print("Give me rows and I'll add them to the file.\nStop with an empty row.")
row = input() # input() can also be called without a prompt text
while row != "":
    file.write(row + "\n")
    row = input()
file.close()

# The following part prints the file so that you can see the result

file1 = open('incomplete_file.txt', 'r')
print('This is what the file looks like:\n')
for row in file1:
    row = row.rstrip()
    print(row)
file1.close()

#### Exercise 4

Create a file and save into it the multiplication table (1-10) of a number given by the user. For example, if the user enters the number 3, write into the file the multiplication table of 3 from one times three to ten times three (that is, 3, 6, 9, 12, etc.). Remember to add a line break character after every number.

After this, open the file in Python and print out the contents of the file to see if your program works correctly.

In [None]:
# your code here


## Understanding Filepaths

When working with files, it's important to understand filepaths. A filepath is a string that represents the location of a file or directory on your computer. There are two main types of filepaths:

1. **Absolute Filepaths**: An absolute filepath gives the complete path to a file or directory, starting from the root directory. It is independent of the current working directory.
2. **Relative Filepaths**: A relative filepath is relative to the current working directory. It does not start from the root directory.

### Current Working Directory

The current working directory (CWD) is the directory that Python uses as the starting point for relative filepaths; by default it is the directory from which Python was started. You can find out the current working directory with the `os` module, which we will cover a little further down. When using notebooks, the CWD is by default the directory where the notebook file is located.

### Absolute Filepaths

An absolute filepath always starts from the root directory. For example, on a Unix-based system, an absolute path might look like `/home/user/me/documents/file.txt`, whereas on a Windows system, it might look like `C:\Users\User\Documents\file.txt`.

### Relative Filepaths

A relative filepath is based on the current working directory. If your current working directory is `/home/user/me`, a relative path to the same file mentioned above (`/home/user/me/documents/file.txt`) would be `documents/file.txt`. So far, our examples have used relative filepaths, where the file is in the current working directory, i.e. `open("testfile.txt")`. 

With relative filepaths, it is also possible to move up in the file system. This is done with the `..` notation, which means "go up one directory". If our CWD is `/home/user/me/`, then `open("../../library/code.py")` tries to open the file at the path `/home/library/code.py`. The first `..` takes us one level up (to the directory `user`), and the second one takes us one more level up (to `home`).
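Python can show how a path containing `..` resolves. The sketch below assumes the example CWD `/home/user/me` from the text (the path does not need to exist): `os.path.join()` combines the two parts and `os.path.normpath()` works out the `..` steps as plain text. `os.path.abspath()` does the same starting from the real current working directory. The printed paths use `/` on Unix-like systems and `\` on Windows.

```python
import os

cwd = '/home/user/me'                  # the example CWD from the text
relative = '../../library/code.py'     # the relative path from the text

# normpath() resolves the '..' parts purely as text, without touching the disk
resolved = os.path.normpath(os.path.join(cwd, relative))
print(resolved)                        # /home/library/code.py on Unix-like systems

# abspath() turns a relative path into an absolute one using the real CWD
absolute = os.path.abspath('testfile.txt')
print(absolute)
```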

#### Nice-To-Know

You might also see relative filepaths written with a leading `./`, as in `open("./documents/file.txt")`. The `.` refers to the current directory, so this is an explicit way of writing a relative filepath, and it refers to the same file as `open("documents/file.txt")`.

---

As we can see from the example below, the current working directory, when using a notebook, is the directory the notebook is located at:

In [None]:
import os

# Get the current working directory
cwd = os.getcwd()
print(f"Current Working Directory: {cwd}")

## Using the `os` Library

The `os` module in Python provides a way of using operating system dependent functionality like reading or writing to the file system. It is especially useful for manipulating files and directories.

### Common Operations with `os`

```
os.getcwd() - Returns the current working directory
os.listdir(path) - Lists the files and directories in the specified path
os.mkdir(path) - Creates a new directory at the specified path
os.remove(path) - Deletes the specified file
os.rename(path1, path2) - Renames or moves the file or directory at path1 to path2
os.rmdir(path) - Removes the specified empty directory
os.path.join(path1, path2) - Joins multiple path components into a single path
os.path.split(path) - Splits a path into a (head, tail) pair, where tail is the last component
os.path.exists(path) - Checks if the specified path exists
os.path.isfile(path) - Checks if the specified path is a file
os.path.isdir(path) - Checks if the specified path is a directory
```

Let's explore some of these operations. More about the `os` library can be read [here](https://www.geeksforgeeks.org/os-module-python-examples/).


In [None]:
import os

cwd = os.getcwd()
path2 = "exercise_test"
joined_path = os.path.join(cwd, path2) # Join the two paths together
print("Joined path is:", joined_path)

os.mkdir(joined_path)
print("Files and directories after creating a new directory:\n", os.listdir(cwd))

print("Does the new directory exist:", os.path.exists(joined_path))
print("Is this path a file:", os.path.isfile(joined_path))
print("Is this path a directory?", os.path.isdir(joined_path))


os.rmdir(joined_path) # Remove the directory 
print("Files and directories after deleting the new directory:\n", os.listdir(cwd))
print("Does the directory exist anymore:", os.path.exists(joined_path))
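The example above does not use `os.path.split()`, `os.rename()` or `os.remove()`, so here is a small sketch of them. To avoid touching the files in your working directory, it works inside a temporary directory created with the standard library's `tempfile` module (an extra module not otherwise used in this chapter); the file names `draft.txt` and `final.txt` are hypothetical.

```python
import os
import tempfile

# A temporary directory that is deleted automatically at the end of the with block
with tempfile.TemporaryDirectory() as folder:
    old_path = os.path.join(folder, 'draft.txt')   # hypothetical file names
    new_path = os.path.join(folder, 'final.txt')

    with open(old_path, 'w') as file:
        file.write('Hello\n')

    head, tail = os.path.split(old_path)   # split into directory part and file name
    print('Directory part:', head)
    print('File name part:', tail)         # draft.txt

    os.rename(old_path, new_path)          # rename draft.txt to final.txt
    after_rename = os.listdir(folder)
    print(after_rename)                    # ['final.txt']

    os.remove(new_path)                    # delete the file
    after_remove = os.listdir(folder)
    print(after_remove)                    # []
```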

#### Exercise 5

In this exercise we will get more familiar with filepath operations. 

- Create two directories, called `upper` and `lower`. The directory `upper` should be created in the parent folder of the current working directory, and the directory `lower` in the current working directory. Create the folders using `os.mkdir()`.
- In both folders, create a text file called `text.txt` containing the string `Hello World`.
- Finally, use the `os.path.exists()` function to check that both text files exist. Print the results of the function calls.


In [None]:
# Your code here

#### Exercise 6

In this exercise, you will work with the `os` library to perform various file and directory operations.

You should: 
- List all files in the current directory
- Create a new directory named `test_directory`
- Move `sample_file.txt` into `test_directory`
- Rename the file inside `test_directory` to `renamed_file.txt`
- Delete `renamed_file.txt` and then delete the `test_directory`

Hints: 
- Use `os.listdir()` to list files
- Use `os.mkdir()` to create directories
- Use `os.path.join` to combine filepaths
- Use `os.rename()` to rename files and directories
- Use `os.remove()` to delete files
- Use `os.rmdir()` to delete directories


In [None]:
# Your code here


## Errors in Python

Errors are an integral part of the programming process. As a developer, you'll encounter situations where your program behaves unexpectedly or fails to execute. These occurrences, while potentially frustrating, are normal and provide valuable learning opportunities. In Python, these errors are generally referred to as exceptions.

In Python, we primarily deal with two types of errors:

**Syntax Errors** - these errors occur when the Python interpreter encounters code that violates the language's syntax rules. Essentially, it's when you've written something that Python cannot understand or interpret. For example:

`print "Hello World!"`

would cause a `SyntaxError` exception. Syntax errors are detected when Python parses the code, before any of it is executed.

**Runtime Errors** - these errors occur during the execution of the program. The code may be syntactically correct, but it attempts to do something that is not allowed or possible. For example: 

`x = 10 / 0`

would cause a `ZeroDivisionError` exception. Runtime errors only become apparent when the specific line of code is executed.

--- 

In Python, the term "exception" is used to describe an error that occurs during the execution of a program. When Python encounters a situation that it cannot handle, it raises an exception. This is the language's way of signaling that something unexpected or problematic has occurred.

Python has a wide variety of built-in exceptions for different situations, for example:

- `SyntaxError`: For syntax issues
- `KeyError`: When trying to access a non-existent dictionary key
- `IndexError`: When trying to access a non-existent list index
- `NameError`: When using a variable that hasn't been defined
- `FileNotFoundError`: When trying to open a file that doesn't exist


A comprehensive list of Python's built-in exceptions can be found in the official [Python documentation](https://docs.python.org/3/library/exceptions.html).

The purpose of these specific exceptions is to help programmers identify and diagnose issues in their code more effectively. Each type of exception provides information about what went wrong, making solving the issue easier.


## Error Handling

By default, when an unhandled exception occurs, Python will stop the execution of the program and display an error message. This behavior is often desirable during development as it immediately brings attention to issues. However, in some scenarios, especially in production environments, we might want our program to continue running despite encountering an error.

Exception handling is a programming concept that allows us to manage errors gracefully without causing the entire program to crash. In Python, this is primarily done using the `try-except` block.

```python
try:
    ...  # code that might raise an error
except:
    ...  # what the program should do if an error is raised
```
The `try` block contains the code that might raise an error, and the `except` block handles the error. If an exception is raised inside the `try` block, the rest of the `try` block is skipped and the program moves directly to the `except` block.
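The sketch below records which lines actually run when an error is raised in the middle of a `try` block. It catches `ZeroDivisionError` specifically (naming the exception is explained just below); the list `steps` exists only for this illustration.

```python
steps = []   # records which lines were executed, in order

try:
    steps.append('before the division')
    result = 10 / 0                       # raises ZeroDivisionError
    steps.append('after the division')    # skipped: never reached
except ZeroDivisionError:
    steps.append('inside except')

steps.append('after try-except')
print(steps)   # ['before the division', 'inside except', 'after try-except']
```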

In [None]:
try:
    a = 10 / 0
except:
    print("I went here because an exception was raised in the try block!")

If `except` is used on its own, it will catch every exception raised in the `try` block. However, we can also specify which exception we want to catch. This lets the program act differently depending on what went wrong.

In [None]:
try:
    a = 10 / 0
except ZeroDivisionError:
    print("I went here because an exception was raised in the try block!")

You can have multiple `except` blocks to handle different types of exceptions.

In [None]:
try: 
    num = int(input("Give a number: ")) # This will result in a ValueError if the user enters something other than a whole number!
    l = [num]
    print(l[2]) # This will always result in an IndexError, which makes the program move to the second 'except' block
except ValueError:
    print("User did not give a number!")
except IndexError:
    print("Tried to access a list element that does not exist!")

The try-except statement also supports two additional blocks: `else` and `finally`.

- The `else` block is executed only if **no exceptions** were raised in the `try` block
- The `finally` block is **always** executed, whether an exception was raised or not

In [None]:
try: 
    num = int(input("Give a number: ")) # This will result in a ValueError if the user enters something other than a whole number!
except ValueError:
    print("User did not give a number!")
else:
    print("User gave the number:", num)
finally:
    print("This line is always printed")
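The order in which the blocks run can also be recorded without asking the user for input. In this sketch, two hypothetical strings stand in for what a user might type: one that converts to an integer and one that does not.

```python
all_orders = []

for text in ['42', 'abc']:       # hypothetical inputs used instead of input()
    order = []                   # records which blocks run, in order
    try:
        int(text)                # raises ValueError for 'abc'
        order.append('try')
    except ValueError:
        order.append('except')
    else:
        order.append('else')     # only runs when the try block succeeded
    finally:
        order.append('finally')  # runs in both cases
    all_orders.append(order)
    print(text, '->', order)
```

For `'42'` the recorded order is `try`, `else`, `finally`; for `'abc'` it is `except`, `finally`.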

Error handling can improve your program in several ways: 

- Your program can keep running even when errors occur, which makes it more robust
- You can provide meaningful error messages to users that improve user experience
- By catching and logging exceptions, you can gather valuable information about issues in your program. This often helps when you are trying to debug your code!

Some good practices to remember when using exceptions:

- Be specific with exceptions - Catch specific exceptions whenever possible rather than using a bare `except`
- Provide meaningful error messages - Give users clear messages that explain what went wrong
- Use the `finally` block to handle any necessary clean ups, e.g. closing files or connections
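The advice to be specific has a practical reason: a bare `except` also catches errors caused by bugs in your own code. In this sketch, a deliberate typo (`bb` instead of `b`) raises a `NameError`, which the bare `except` silently turns into `None`.

```python
a = 10
b = 2

try:
    result = a / bb   # deliberate typo: bb instead of b, which raises NameError
except:               # a bare except catches the NameError as well
    result = None

print('Result:', result)   # prints Result: None, and the typo goes unnoticed
```

If the `except` line were `except ZeroDivisionError:` instead, the `NameError` would not be caught, and Python would stop and point at the line with the typo, which is usually what you want while developing.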



#### Exercise 7

Let's practice handling exceptions in Python. Your task is to catch and manage errors that can occur during division operations.

Write a Python program that divides two numbers provided by the user. Division is a straightforward operation, but several potential issues can arise:

- The user might accidentally provide non-numeric input, causing a `ValueError`
- The user could enter zero as the divisor, which would cause a `ZeroDivisionError`
- Unexpected input or errors might occur that aren’t initially anticipated

Your goal is to ensure that your program can gracefully handle these errors and provide helpful feedback to the user.

You should:

- Prompt the user to enter two numbers and perform the division operation with the two numbers
    - Use an except block to catch any `ValueError` that occurs if the user inputs something that isn’t a number
    - Use an except block to catch a `ZeroDivisionError` if the user attempts to divide by zero
- Include a general except block to catch any other unforeseen errors
- Ensure that your program prints a relevant message to the user for each type of error
- Use a finally block to print a message indicating that the program has finished, regardless of whether an error occurred

In [None]:
# Your code here

#### Exercise 8

In this exercise, you are going to open two separate files whose names are given by the user as input. Each file contains a single number. Your task is to open the files and calculate the sum of the two numbers. The files are generated by the code snippet given below.

Your program should take into account the following situations:

- User gives the wrong filename as input, which raises the `FileNotFoundError`
- The content of the file is not a number, which would raise a `ValueError`
- An unexpected error happens

In [None]:
# Code to generate the files
import random
import os 

for i in range(2):
    filename = os.urandom(2).hex() # Generate a random filename
    with open(filename+".txt", "w") as f:
        if random.random() < 0.2: # 20% chance that we write a string into the file instead of a number
            f.write("dummy text")
        else:
            f.write(str(random.randint(1, 101)))
        print("Created file", filename+".txt")

In [None]:
# Your code here

## Recap

Data can be read from files using Python. Files store data permanently, so the data does not need to be created again every time the program runs.

A file is opened with the `open()` function, which takes as arguments the name of the file and the mode of processing: `'r'` (reading a file), `'w'` (writing into a file) or `'a'` (appending to the end of a file).

```Python
file = open('testfile.txt','r')
file = open('testfile.txt','w')
```

Files can be read line by line with the method `readline()`. If one wants to read the entire file in a single read, one can use the `readlines()` method, which saves the contents of the file into a list.

A file is closed in the end using the method `close()`.

In [None]:
file = open('testfile.txt', 'r')
row = file.readline()
while row != "": #readline ends with ""
    print(row)
    row = file.readline()
file.close()

The `rstrip()` method removes the spaces and line break characters (all whitespace characters to be precise) **from the end of a string**.

Files can also be read using a `for` loop or the keyword `with`.

In [None]:
with open('namelist.txt', 'r') as file:
    for row in file:
        row = row.strip()
        print(row)

The `split()` method can be used to split the lines in a file to smaller parts. This is used especially in the processing of CSV files. CSV files are tables where columns are separated with a comma.

```Python
row.split(',')
```

Data can be written into a file using the method `write()`. If a file with the given name does not exist, the `open()` function will create one with that name when the mode of processing is `'w'` or `'a'`.

```Python
testfile = open('new_file.txt', 'w')
testfile.write('Random text')
testfile.close()
```

Filepaths can be divided into two categories: relative and absolute. Relative filepaths are relative to your current working directory, whereas absolute filepaths start from the root of the filesystem.

```Python
file = open("relative/filepath/would/be/something/like/this.txt")

file = open("/absolute/filepath/starts/with/a/slash.txt")
```

The `os` library provides a lot of tools for creating, removing, renaming files as well as manipulating filepaths.

Errors are a natural part of programming, and in Python they are called *exceptions*.

Exceptions are raised whenever Python encounters something that is unexpected or problematic. They can be divided into two categories:

- Syntax Errors: when something is broken before the code is even run
- Runtime Errors: when something problematic is encountered when the code is being executed

Exception handling is a concept where a program is not instantly terminated if it encounters an error. In Python it can be achieved with the `try-except` structure:

```python
try:
    ...  # code that might cause an error
except:
    ...  # what we do if an exception is raised
```

We can also specify which exception we want to handle:

```python
try:
    a = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero!")
```

The try-except structure also supports additional blocks: `else` and `finally`:

```python
try:
    num = int(input("Give a number: "))
    div = 10 / num
except ValueError:
    print("User did not give a number!")
except ZeroDivisionError:
    print("Cannot divide by zero!")
else:
    print("Everything was OK!")
finally:
    print("Finishing the program")
```