# Files and Exceptions

We often need to read data from files. With Python we can read many different formats, for example Word documents, PDF documents, or tabular data in Excel or CSV format.
Some of these file formats require third-party libraries.
Here, we will look at reading from and writing to plain text files.

## Opening Files
We can open a file containing the introduction to an ECHR case.

In [None]:
filename = 'LO-NTF-v-Norway.txt'

Python has the function `open()` for opening files.
We should specify the text encoding, which is often UTF-8. Otherwise, Python may fall back to a platform-dependent default.

We could use this function directly and assign the result to a variable.

In [None]:
file = open(filename, encoding='UTF-8')

However, open files consume system resources. Therefore, we must always remember to close files when we are finished with them.
A large program or web application that keeps opening files without closing them can eventually hit the operating system's limit on open files and crash.

Python can automatically close files for us if we use the `with` statement.

In [None]:
with open(filename, encoding='UTF-8') as file:
    print(file)

```{caution}
*Always* use `with` when opening files.
```

```{note}
Notice that the `print()` call above prints a description of the file object, not the file content.
We must use methods of this object to get the content.
```
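To confirm that `with` closes the file for us, we can check the `closed` attribute of the file object. The following is a minimal sketch that creates its own sample file in a temporary folder. The name `sample.txt` and its content are assumptions made for this example, and writing to files is covered in the section Writing Files.

```python
import os
import tempfile

# Create a small sample file in a temporary folder (made-up example data).
folder = tempfile.mkdtemp()
sample = os.path.join(folder, 'sample.txt')
with open(sample, 'w', encoding='UTF-8') as file:
    print('first line', file=file)

with open(sample, encoding='UTF-8') as file:
    open_inside = file.closed   # False: the file is still open here

closed_after = file.closed      # True: `with` has closed the file
print(open_inside, closed_after)
```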

## Reading the File Content

We can use the method `readline()` to read single lines from the file.

In [None]:
with open(filename, encoding='UTF-8') as file:
    print(file.readline())
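Each call to `readline()` continues where the previous call stopped. The sketch below uses a made-up two-line file (the file name and content are assumptions for this example) to show that each line keeps its trailing newline, and that `readline()` returns an empty string when nothing is left to read.

```python
import os
import tempfile

# Create a sample file with two lines (made-up example data).
folder = tempfile.mkdtemp()
sample = os.path.join(folder, 'sample.txt')
with open(sample, 'w', encoding='UTF-8') as file:
    print('first line', file=file)
    print('second line', file=file)

with open(sample, encoding='UTF-8') as file:
    first = file.readline()    # includes the trailing newline
    second = file.readline()   # continues with the next line
    end = file.readline()      # empty string: end of file

# repr() makes the newline characters visible.
print(repr(first), repr(second), repr(end))
```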

We can also use a `for` loop to process the file line by line.

In [None]:
with open(filename, encoding='UTF-8') as file:
    for line in file:
        print(line)

## Removing Whitespace

When we print the file content above, we get a blank line between each line.
This is because each line we read ends with a newline, `\n`, and `print()` also inserts a newline.
To avoid this, we should remove *leading* and *trailing* whitespace with the string method `strip()`.

In [None]:
with open(filename, encoding='UTF-8') as file:
    for line in file:
        line = line.strip()
        print(line)
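Besides `strip()`, strings also have the methods `lstrip()` and `rstrip()`, which remove whitespace from only one side. The short sketch below compares them on a made-up string. Passing `'\n'` to `rstrip()` removes only trailing newlines and keeps other whitespace.

```python
line = '  Grand Chamber \n'   # made-up example line

both = line.strip()           # removes whitespace on both sides
right = line.rstrip('\n')     # removes only the trailing newline
left = line.lstrip()          # removes only the leading whitespace

# repr() makes spaces and newlines visible.
print(repr(both))    # 'Grand Chamber'
print(repr(right))   # '  Grand Chamber '
print(repr(left))    # 'Grand Chamber \n'
```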

## Splitting Strings

Sometimes we need to process text word by word.
To do this, we can use the string method `split()`, which splits a string on whitespace by default.
We can also specify some other character to split on.

In [None]:
with open(filename, encoding='UTF-8') as file:
    line = file.readline()
    line = line.strip()
    words = line.split()
    print(words)
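To illustrate splitting on another separator, here is a small sketch with made-up strings. Note that `split()` without arguments treats any run of whitespace as one separator, while an explicit separator such as `' '` is matched exactly.

```python
text = 'Oslo, Bergen, Trondheim'   # made-up example text

by_whitespace = text.split()       # the commas stay attached to the words
by_comma = text.split(', ')        # split on comma followed by a space

spaced = 'a  b'                    # two spaces between the letters
default_split = spaced.split()     # a run of whitespace counts as one separator
exact_split = spaced.split(' ')    # each single space is a separator

print(by_whitespace)   # ['Oslo,', 'Bergen,', 'Trondheim']
print(by_comma)        # ['Oslo', 'Bergen', 'Trondheim']
print(default_split)   # ['a', 'b']
print(exact_split)     # ['a', '', 'b']
```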

## Joining Strings
When we have processed the information in the list, we can `join()` the items into a new string.
We could use a new separator for joining the items.
For example, in filenames we might use underscores instead of spaces.

In [None]:
line = '_'.join(words)
print(line)
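As a self-contained sketch, the example below joins a made-up list of words and then splits the result again. It also shows that `join()` only accepts strings, so other values such as numbers must be converted with `str()` first.

```python
words = ['CASE', 'OF', 'EXAMPLE', 'v.', 'NORWAY']   # made-up example words

joined = '_'.join(words)
restored = joined.split('_')   # splitting on the same separator reverses the join

# join() requires strings, so we convert the numbers first.
numbers = [1, 2, 3]
numbers_text = ', '.join(str(n) for n in numbers)

print(joined)         # CASE_OF_EXAMPLE_v._NORWAY
print(restored)
print(numbers_text)   # 1, 2, 3
```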

## Extracting Information

We want to extract the list of judges from the case into a Python list.
The list of judges starts with the President, and ends with the Registrar.
We can use these cues to extract the list.

In [None]:
found_start = False
judges = []

with open(filename, encoding='UTF-8') as file:
    for line in file:
        line = line.strip()
        if not found_start:
            if 'president' in line.lower():
                found_start = True
                judges.append(line)
        else:
            judges.append(line)
            if 'registrar' in line.lower():
                break

print(judges)

Here, we use the statement `break` to stop the loop as soon as we find the registrar.
This code still has room for improvement. For example, the extracted names contain commas.
This is left as an exercise.

```{note}
We could easily extract this information by hand from a single document.
But with Python code, we can extract the information from *thousands* of documents in a short time.
```
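One possible way to process many documents is to wrap the extraction loop in a function and call it for every text file in a folder. The sketch below is only an illustration: the function name `extract_judges`, the temporary folder, the file names, and the document text are all made up for this example.

```python
import tempfile
from pathlib import Path


def extract_judges(path):
    """Return the lines from the President to the Registrar, inclusive."""
    found_start = False
    judges = []
    with open(path, encoding='UTF-8') as file:
        for line in file:
            line = line.strip()
            if not found_start:
                if 'president' in line.lower():
                    found_start = True
                    judges.append(line)
            else:
                judges.append(line)
                if 'registrar' in line.lower():
                    break
    return judges


# Build a temporary folder with two made-up documents.
folder = Path(tempfile.mkdtemp())
text = 'Intro\nAnna Ek, President,\nBo Dal,\nCy Fox, Registrar,\nMore text\n'
for name in ['case-1.txt', 'case-2.txt']:
    (folder / name).write_text(text, encoding='UTF-8')

# Extract the judges from every text file in the folder.
judges_by_file = {}
for path in sorted(folder.glob('*.txt')):
    judges_by_file[path.name] = extract_judges(path)

print(judges_by_file)
```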

## Exceptions

When something goes wrong in a program, an *exception* is *raised*.
An exception is a "signal" that an error has occurred and must be handled.
For example, exceptions can occur when user input doesn't match what the program expects.
We should handle exceptions that might occur.

For example, trying to open a file that doesn't exist raises an exception:

In [None]:
filename = 'non-existing-file.txt' # often from user input
with open(filename, encoding='UTF-8') as file:
    print(file)

In this case, we get a `FileNotFoundError` exception.
Unhandled exceptions make the program stop or crash.

## Handling Exceptions
We can handle exceptions with `try` and `except` statements.

In [None]:
try:
    with open(filename, encoding='UTF-8') as file:
        print(file)
except FileNotFoundError:
    print('no such file:', filename)

Now, instead of crashing, the program will keep running.
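The same pattern works for user input that doesn't match what we expect. As a minimal sketch, the hypothetical function `parse_year()` below tries to convert a text to an integer. The built-in `int()` raises a `ValueError` when the text is not a number, and we handle that by returning `None`.

```python
def parse_year(text):
    """Convert text to an integer year, or return None if it is not a number."""
    try:
        return int(text)
    except ValueError:
        print('not a valid year:', text)
        return None


year = parse_year('2019')           # conversion succeeds
missing = parse_year('last year')   # prints a message and returns None

print(year, missing)
```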

## Handling Multiple Exceptions

We can also handle multiple exceptions. We can handle different exceptions differently.

In [None]:
try:
    with open(filename, encoding='UTF-8') as file:
        print(file)
except FileNotFoundError:
    print('no such file:', filename)
except OSError as e:  # IOError is an alias of OSError in Python 3
    print('Error opening file:', e)

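We handle more specific exceptions first, then more general exceptions. Python checks the `except` clauses from top to bottom and runs only the first one that matches. The most general exception we normally catch is just called `Exception`.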

In [None]:
try:
    with open(filename, encoding='UTF-8') as file:
        print(file)
except FileNotFoundError:
    print('no such file:', filename)
except OSError as e:  # IOError is an alias of OSError in Python 3
    print('Error opening file:', e)
except Exception as e:
    print('Exception:', e)
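The order matters because the exception classes form a hierarchy: `FileNotFoundError` is a subclass of `OSError`, which in turn is a subclass of `Exception`. The sketch below checks this with `issubclass()` and uses a hypothetical helper function, `classify()`, to show which `except` clause handles which error.

```python
def classify(error):
    """Raise the given exception and report which clause handles it."""
    try:
        raise error
    except FileNotFoundError:
        return 'file not found'
    except OSError:
        return 'other OS error'
    except Exception:
        return 'something else'


# The class hierarchy explains why the specific clause must come first.
print(issubclass(FileNotFoundError, OSError))   # True
print(issubclass(OSError, Exception))           # True
print(IOError is OSError)                       # True: IOError is an alias

not_found = classify(FileNotFoundError())
permission = classify(PermissionError())
other = classify(ValueError())
print(not_found, permission, other, sep=' | ')
```

If the `except OSError` clause came first, it would also catch `FileNotFoundError`, and the more specific clause would never run.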

## Writing Files

We can also write data to files. Let's store the list of judges in a text file.

In [None]:
output_file_name = 'judges.txt'

When we want to open a file for writing, we need to specify *writing mode*, with the mode parameter `'w'`.
This mode creates the file if it doesn't exist, and overwrites the existing content if it does.

The mode has the default value `'r'` for reading, but for consistency we can specify this parameter even when reading.

In [None]:
with open(output_file_name, 'w', encoding='UTF-8') as outfile:
    pass

```{note}
We use the `pass` statement to do nothing in the code block.
The `with` statement and all other statements expecting an indented code block must contain at least one statement to be valid.
```
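Opening a file in mode `'w'` replaces whatever the file contained before, even if we write nothing. The sketch below demonstrates this with a made-up file in a temporary folder (the name `notes.txt` is an assumption for this example). It uses the file method `read()`, which returns the whole content as one string.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'notes.txt')

# First, write some content to the file.
with open(path, 'w', encoding='UTF-8') as outfile:
    print('old content', file=outfile)

# Opening the file again in mode 'w' empties it, even if we write nothing.
with open(path, 'w', encoding='UTF-8') as outfile:
    pass

with open(path, 'r', encoding='UTF-8') as infile:
    content = infile.read()

print(repr(content))   # ''
```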

Once the file has been opened, we can write to it with the `print()` function.
We must give `print()` a `file` parameter to send the text to a file instead of the console.

In [None]:
with open(output_file_name, 'w', encoding='UTF-8') as outfile:
    print(judges, file=outfile)
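The code above writes the whole list on a single line, formatted as a Python list. If we would rather have one judge per line, we could loop over the list, as in the following sketch. The names in `judges` and the temporary folder are made up so that the example runs on its own. Reading the file back shows that we get the same items again.

```python
import os
import tempfile

judges = ['Anna Ek, President,', 'Bo Dal,', 'Cy Fox, Registrar,']   # made-up names
folder = tempfile.mkdtemp()
output_file_name = os.path.join(folder, 'judges.txt')

# Write one judge per line.
with open(output_file_name, 'w', encoding='UTF-8') as outfile:
    for judge in judges:
        print(judge, file=outfile)

# Read the lines back and remove the newlines.
lines = []
with open(output_file_name, 'r', encoding='UTF-8') as infile:
    for line in infile:
        lines.append(line.strip())

print(lines == judges)   # True
```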