# Handling files and exceptions

## Handling files with Pathlib

Reading and writing files is one of the most common tasks in Python, and there are several built-in tools to help you do this efficiently. The classic way to handle files in Python is with the `os` module, often in combination with the `glob` module for matching filepath patterns. The problem with these modules, however, is that they represent filepaths as strings, which causes difficulties when code has to run on operating systems that use different conventions for filepaths and directories. For example, Windows separates directories with the backslash `\`, whereas Linux uses the forward slash `/`.

A more recent addition to the Python standard library is the **Pathlib** module (`pathlib`), which makes handling filepaths easier and provides better interoperability between operating systems. Pathlib should be the preferred choice for handling paths, and in this lesson we will demonstrate why.

First, let's consider how we would use the traditional `os` built-in library in Python:

In [1]:
import datetime as dt
import os

# `os` module used to be the way to do it
current_dir = os.getcwd()
print(f"Current directory: {current_dir}")

new_dir = os.path.join(os.getcwd(), 'my_new_dir')
print(f"Joining a path: {new_dir}")

# note that paths are strings
print(type(new_dir))

Current directory: c:\Users\decval\python-improvers-2\lessons
Joining a path: c:\Users\decval\python-improvers-2\lessons\my_new_dir
<class 'str'>


As we can see, the `new_dir` variable is a *string* (the output shows `<class 'str'>`).

When we use Pathlib, however, note that we get a different output:

In [2]:
from pathlib import Path

# Pathlib uses Path objects
current_dir = Path.cwd()

print(current_dir)
print(type(current_dir))
print(f"is_file(): {current_dir.is_file()}")
print(f"exists(): {current_dir.exists()}")
print(f"parent: {current_dir.parent}")

c:\Users\decval\python-improvers-2\lessons
<class 'pathlib._local.WindowsPath'>
is_file(): False
exists(): True
parent: c:\Users\decval\python-improvers-2


Our `current_dir` variable is a `Path` object, rather than just a string. Also, note that because we are working in Windows in this example, Pathlib has correctly identified this and further categorised our `current_dir` variable as a `WindowsPath` specifically. Pathlib automatically uses the correct directory separators: if a colleague were to run your script on a Linux server, for example, the same code would build Linux-style paths instead. The code is therefore more easily reproducible on different platforms without having to make any changes.
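
One way to see both conventions on a single machine is with Pathlib's `PureWindowsPath` and `PurePosixPath` classes, which model each style without touching the file system. The sketch below is illustrative only, and the directory and file names are made up:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical directory and file names, used only for illustration
parts = ["projects", "lessons", "data.csv"]

# Hard-coding a separator ties the string to one operating system
hard_coded = "\\".join(parts)
print(hard_coded)    # projects\lessons\data.csv

# The same parts rendered under each convention
windows_path = PureWindowsPath(*parts)
posix_path = PurePosixPath(*parts)
print(windows_path)  # projects\lessons\data.csv
print(posix_path)    # projects/lessons/data.csv
```

In everyday code you would normally just use `Path`, which picks the right flavour for the machine the code is running on.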

In a script, you can use the `__file__` variable to get the directory that contains the script,
but this doesn't work in notebooks, where `__file__` is not defined.

```python
SCRIPT_DIR = Path(__file__).parent
print(f"script_dir: {SCRIPT_DIR}")
```
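
If the same code needs to run both as a script and in a notebook, one possible approach is to fall back to the current working directory when `__file__` is missing. This is only a sketch: it uses a `try`/`except` block (covered later in this lesson), and `base_dir` is a name chosen here for illustration.

```python
from pathlib import Path

try:
    # __file__ is defined when the code runs as a script
    base_dir = Path(__file__).resolve().parent
except NameError:
    # In a notebook or interactive session __file__ does not exist
    base_dir = Path.cwd()

print(f"base_dir: {base_dir}")
```

Note that in a notebook the working directory is not guaranteed to be the folder containing the notebook, so treat the fallback as a convenience rather than a guarantee.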

Pathlib has further built-in methods to help you create directories and files. Note that paths are joined with the `/` operator, which is automatically translated to the correct separator for the operating system the script runs on.

Pathlib also provides methods for reading and writing text: `write_text` writes a string to a file and `read_text` reads it back. Pathlib's `iterdir` method lets you loop over the contents of a directory. `iterdir` returns an iterator rather than a list, so convert it with `list()` if you want a printable version of the contents.

In [4]:
# Pathlib can create directories and files.
working_dir = current_dir.parent / "work"
working_dir.mkdir(exist_ok=True)

hello_file = working_dir / "hello.txt"
print(f"file exists: {hello_file.exists()}")

# write_text writes text.  Compare this to other file writing methods.
hello_file.write_text("Hello World!")

print(f"file contents: {hello_file.read_text()}")
print(f"file exists: {hello_file.exists()}")
print(f"directory contents: {list(working_dir.iterdir())}")

file exists: False
file contents: Hello World!
file exists: True
directory contents: [WindowsPath('c:/Users/decval/python-improvers-2/work/hello.txt')]
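
Two options are often useful alongside the calls above: `mkdir(parents=True)` creates any missing intermediate directories, and the `encoding` argument of `write_text` and `read_text` makes the text encoding explicit. The sketch below works inside a temporary directory (the `tempfile` module is covered later in this lesson) so that it leaves nothing behind. The folder and file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested folder names, used only for illustration
    nested_dir = Path(tmp) / "results" / "2024" / "run_01"

    # parents=True creates "results" and "2024" as well as "run_01"
    nested_dir.mkdir(parents=True, exist_ok=True)

    notes_file = nested_dir / "notes.txt"
    # Stating the encoding avoids relying on the platform default
    notes_file.write_text("Temperature: 21 °C\n", encoding="utf-8")
    contents = notes_file.read_text(encoding="utf-8")
    created = nested_dir.is_dir()

print(f"directory created: {created}")
print(f"file contents: {contents.strip()}")
```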


### More Pathlib helper functions

Pathlib contains helpful additional methods that give us metadata about the filepath or directory path. For example, we often might want to get the name of the file without the suffix, the file suffix itself, the full absolute path, or the parent directory of a file. Pathlib provides functionality for readily accessing this metadata:

In [None]:
# Pathlib can retrieve information about files
print(f"name: {hello_file.name}")
print(f"suffix: {hello_file.suffix}")
print(f"stem: {hello_file.stem}")
print(f"absolute path: {hello_file.absolute()}")
print(f"parent: {hello_file.parent}")

# We can also get the size and modification metadata
print(type(hello_file.stat()))
print(f"modified: {dt.datetime.fromtimestamp(hello_file.stat().st_mtime)}")
print(f"size: {hello_file.stat().st_size} bytes")

name: hello.txt
suffix: .txt
stem: hello
absolute path: c:\Users\decval\python-improvers-2\work\hello.txt
parent: c:\Users\decval\python-improvers-2\work
<class 'os.stat_result'>
modified: 2026-01-26 12:32:55.780765
size: 12 bytes
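
File names with more than one extension can be a little surprising, so here is a small sketch of how these attributes behave in that case. It uses `PurePosixPath` so that the output is the same on every operating system, and the path itself is hypothetical (no such file needs to exist).

```python
from pathlib import PurePosixPath

# Hypothetical path, used only for illustration
archive = PurePosixPath("/data/raw/archive.tar.gz")

print(archive.name)         # archive.tar.gz
print(archive.suffix)       # .gz  (only the last extension)
print(archive.suffixes)     # ['.tar', '.gz']
print(archive.stem)         # archive.tar
print(archive.parts)        # ('/', 'data', 'raw', 'archive.tar.gz')
print(archive.parent.name)  # raw
```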


### Iterating over files and directories

We can use the `glob` method of a Path object to iterate over directory contents and match wildcard patterns, similarly to how we might use the `glob` module. For example, to find all the Jupyter notebooks in the current directory we could do:

In [6]:
# Use glob (or rglob for a recursive search) to list files
for file_ in current_dir.glob("*.ipynb"):
    print(file_.name)

00_environments_and_editors.ipynb
01_linters_and_code_style.ipynb
02_functions_modules_tests.ipynb
03_handling_files_and_exceptions.ipynb
04_stdlib_tools.ipynb
05_classes_represent_objects.ipynb
pathlib.ipynb
text_parse.ipynb
time_series_data.ipynb
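
The `rglob` method works like `glob` but also searches subdirectories. The sketch below builds a small, made-up directory tree in a temporary folder to show the difference. All file and folder names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Build a tiny directory tree to search
    (root / "sub").mkdir()
    (root / "top.txt").write_text("top level")
    (root / "sub" / "nested.txt").write_text("one level down")
    (root / "sub" / "image.png").write_bytes(b"")

    # glob only looks in the directory itself
    top_level = sorted(p.name for p in root.glob("*.txt"))
    # rglob also descends into subdirectories
    recursive = sorted(p.name for p in root.rglob("*.txt"))

print(top_level)  # ['top.txt']
print(recursive)  # ['nested.txt', 'top.txt']
```

The results are sorted here because the order in which `glob` and `rglob` yield files is not guaranteed.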


### Exercise

+ Write a script that will print the size of the largest `.txt` file in the data directory.

### Renaming files

We can rename files that are defined by a Path object easily:

In [8]:
# Use rename to rename. Note that the original Path object still points to the old location.
new_hello_file = hello_file.parent / 'hello_again.txt'
hello_file.rename(new_hello_file)
print(f"name: {hello_file.absolute()}")
print(f"exists: {hello_file.exists()}")
print(f"new name: {new_hello_file.absolute()}")
print(f"exists: {new_hello_file.exists()}")

name: c:\Users\decval\python-improvers-2\work\hello.txt
exists: False
new name: c:\Users\decval\python-improvers-2\work\hello_again.txt
exists: True
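
Two related details can make renaming a little tidier: `with_name` builds a sibling path with a different file name, and `rename` returns a new Path pointing at the renamed file (in Python 3.8 and later). A minimal sketch, using hypothetical file names inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    draft = Path(tmp) / "draft.txt"
    draft.write_text("first version")

    # with_name only builds a new Path; nothing changes on disk yet
    target = draft.with_name("final.txt")

    # rename moves the file and returns a Path for the new location
    renamed = draft.rename(target)

    old_exists = draft.exists()
    new_exists = renamed.exists()
    renamed_name = renamed.name

print(old_exists, new_exists, renamed_name)  # False True final.txt
```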


### Deleting files

Deleting files is done using the `unlink` method. *Unlink* comes from Unix terminology, where a file is deleted by removing the links (names) that refer to it, but the effect is the same as "deleting" it.

In [9]:
# Use unlink to delete a file (use rmdir to remove an empty directory)
new_hello_file.unlink(missing_ok=True)
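
Note that `unlink` removes files, not directories. An empty directory is removed with `rmdir`. The sketch below shows both, along with the effect of `missing_ok=True`, using hypothetical names inside a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scratch_dir = Path(tmp) / "scratch"
    scratch_dir.mkdir()
    scratch_file = scratch_dir / "scratch.txt"
    scratch_file.write_text("temporary")

    scratch_file.unlink()                 # deletes the file
    scratch_file.unlink(missing_ok=True)  # already gone, but no error is raised

    # Without missing_ok=True, a missing file raises FileNotFoundError
    try:
        scratch_file.unlink()
        raised = False
    except FileNotFoundError:
        raised = True

    scratch_dir.rmdir()                   # only works on an empty directory
    dir_exists = scratch_dir.exists()

print(f"FileNotFoundError raised: {raised}")  # True
print(f"directory exists: {dir_exists}")      # False
```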

### Temporary files

Temporary files are useful when we want to generate intermediate output in a Python program, or temporarily store data for processing before discarding it later on. Python provides the `tempfile` module to help with this, as well as the `shutil` module, which provides shell-like utilities such as file copying.

In [10]:
import tempfile
import shutil

# tempfile can create temporary directories (it is normally used as a context manager).
# Note: the delete argument requires Python 3.12 or later.
temp_dir = tempfile.TemporaryDirectory(delete=False)
temp_dir = Path(temp_dir.name)
print(f"temp dir name: {temp_dir}")
print(f"exists: {temp_dir.exists()}")

# Use shutil for copying files (copy2 includes metadata), it returns the new name.
hello_file.write_text("Here is a new file.")
hello_file_copy = Path(shutil.copy2(hello_file, temp_dir))
print(f"new name: {hello_file_copy}")

# Directories can also be copied
current_dir_copy = Path(shutil.copytree(current_dir, temp_dir, dirs_exist_ok=True))
print(f"new files: {list(current_dir_copy.iterdir())}")


temp dir name: C:\Users\decval\AppData\Local\Temp\tmpl724whni
exists: True
new name: C:\Users\decval\AppData\Local\Temp\tmpl724whni\hello.txt
new files: [WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/00_environments_and_editors.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/01_linters_and_code_style.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/02_functions_modules_tests.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/03_handling_files_and_exceptions.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/04_stdlib_tools.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/05_classes_represent_objects.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/hello.txt'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/pathlib.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/Temp/tmpl724whni/text_parse.ipynb'), WindowsPath('C:/Users/decval/AppData/Local/

Bulk deletion of a directory and everything inside it can be done with `shutil.rmtree`:

In [11]:
# Use shutil rmtree for bulk delete
shutil.rmtree(temp_dir)
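
As the comment in the earlier cell mentions, `TemporaryDirectory` is normally used as a context manager. In that form the directory and its contents are removed automatically when the `with` block ends, so no explicit `rmtree` call is needed. A minimal sketch, with a made-up file name:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_name:
    # The context manager yields the directory name as a string
    tmp_dir = Path(tmp_name)
    (tmp_dir / "intermediate.csv").write_text("a,b\n1,2\n")

    inside = tmp_dir.exists()
    files = [p.name for p in tmp_dir.iterdir()]

# The directory is deleted as soon as the with block exits
after = tmp_dir.exists()

print(inside, files, after)  # True ['intermediate.csv'] False
```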

### Exercise

Write a script that will:

+ create a folder called "data_backup"
+ copy all the .txt files from the data directory across, renaming them to `.txt.backup`

## Error handling with Exceptions

Most programs we write will eventually run into a situation that raises an error in Python. Errors are a normal part of programming and can come from unexpected input data, a missing file, a lost connection to a database, or a variety of other causes. Rather than allowing the error to crash our program, it's good practice to handle these errors when they occur. We call this ***error handling***, and it is part of defensive programming, i.e. writing programs that can deal with the unexpected.

The errors that we are going to catch are called **Exceptions** in Python. Python has many built-in exceptions that you might already have seen, for example:

 - `TypeError`: trying to perform an operation on the wrong type of data, such as trying to divide two strings.
 - `NameError`: trying to use a variable name that is not defined (or sometimes just misspelled).
 - `ZeroDivisionError`: trying to divide a number by zero.
 - `FileNotFoundError`: when a filepath is incorrect or the file does not exist.
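
To see these for yourself, the sketch below deliberately triggers four common built-in exceptions and prints the class name Python reports for each. The `error_name` helper is hypothetical and written only for this illustration. It uses a `try`/`except` block (explained in the rest of this section) with a broad `except Exception` purely so that it can report the name. The file name is made up and assumed not to exist.

```python
def error_name(func):
    """Call func and return the name of the exception it raises, if any."""
    try:
        func()
    except Exception as exc:  # broad on purpose: we only report the name
        return type(exc).__name__
    return None


print(error_name(lambda: "10" / "2"))      # TypeError
print(error_name(lambda: undefined_name))  # NameError
print(error_name(lambda: 1 / 0))           # ZeroDivisionError
# Hypothetical file name, assumed not to exist
print(error_name(lambda: open("no_such_file_for_this_demo.txt")))  # FileNotFoundError
```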

We can write our program to choose what to do when we encounter an exception. It could be to take another course of action, attempt to fix an input variable, print a more helpful error message to the user, write the error to a logfile, or simply ignore the error.

Let's consider a simple example. Suppose we had read in some input data but we didn't know ahead of time that the input data contained zeros sometimes. The next part of our program tries to divide a value by the list of input data:

In [4]:
input_data = [5, 4, 3, 2, 1, 0, 2, 7, 0, 6]
output_data = []

for num in input_data:
    ans = 42 / num   # Error if num is zero!
    output_data.append(ans)

ZeroDivisionError: division by zero

There are two things we could do. We could add `if` statements to check for zeros in the input data, or we could take the more Pythonic approach of trying the operation and catching the exception:

In [None]:
input_data = [5, 4, 3, 2, 1, 0, 2, 7, 0, 6]
output_data = []

for num in input_data:
    try:
        ans = 42 / num   # Error if num is zero!
        output_data.append(ans)
    except ZeroDivisionError:
        print("Input data contains zero value....skipping")
        output_data.append("NaN")

print(output_data)

Input data contains zero value....skipping
Input data contains zero value....skipping
[8.4, 10.5, 14.0, 21.0, 42.0, 'NaN', 21.0, 6.0, 'NaN', 7.0]


Here, we have wrapped our potential zero division operation in a `try` block. We catch the `ZeroDivisionError` in our `except` block and print a message to the user (this could also be a logger message rather than a print statement). When the exception is caught, we have decided to append the string "NaN" (not a number) to our output data, but the right course of action depends on your own program logic and purpose.
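
For comparison, here is a sketch of the alternative mentioned earlier, which checks for zero with an `if` statement before dividing. It produces the same output for this data:

```python
input_data = [5, 4, 3, 2, 1, 0, 2, 7, 0, 6]
output_data = []

for num in input_data:
    if num == 0:
        # Check the condition up front instead of catching the error
        print("Input data contains zero value....skipping")
        output_data.append("NaN")
    else:
        output_data.append(42 / num)

print(output_data)
# [8.4, 10.5, 14.0, 21.0, 42.0, 'NaN', 21.0, 6.0, 'NaN', 7.0]
```

Both versions work here. The `try`/`except` style tends to scale better when there are many ways an operation can fail, because you do not have to anticipate and test for each condition beforehand.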

### Catch specific exceptions

In our example, we have specified that we want to catch `ZeroDivisionError`. It is possible in Python to catch *any* kind of error by writing the `except` statement without a specific exception type, e.g.

```python
    try:
        ans = 42 / num   # Error if num is zero!
        output_data.append(ans)
    except:
        print("Some sort of error occurred!!!")
```

However, we should avoid writing exception handling like this, as it doesn't tell us which error occurred. It *could* be a `ZeroDivisionError`, or it could be something else (like a string in the data, which would cause a different type of error). It is better practice to make your exception handling as specific as possible.
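
A bare `except` can also hide genuine bugs in your own code. In the sketch below, both functions contain the same deliberate typo (`valeu` instead of `value`). The function names are hypothetical and chosen only for this illustration. The bare `except` silently swallows the resulting `NameError`, whereas the specific `except ZeroDivisionError` lets it surface so that it can be fixed.

```python
def divide_all_bare(values):
    results = []
    for value in values:
        try:
            results.append(42 / valeu)  # deliberate typo: should be `value`
        except:  # bare except also catches the NameError from the typo
            results.append("NaN")
    return results


def divide_all_specific(values):
    results = []
    for value in values:
        try:
            results.append(42 / valeu)  # same deliberate typo
        except ZeroDivisionError:
            results.append("NaN")
    return results


# The typo goes unnoticed: every value is reported as "NaN"
print(divide_all_bare([1, 2, 0]))  # ['NaN', 'NaN', 'NaN']

# The typo is exposed as a NameError, which we catch here only to display it
try:
    divide_all_specific([1, 2, 0])
except NameError as exc:
    print(f"Bug surfaced: {exc}")
```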

### Exercise

Can you extend the example above so that it also handles input data containing numbers entered as strings? e.g.:

```python
[1, 2, 3, 0, 5, 0, "6", "9", 52]
```

Hint: You can have multiple `except` blocks in your code