# 9. Files

Here is the table of contents for this notebook:

- 9.1 Opening files
- 9.2 Text files and lines
- 9.3 Reading files
- 9.4 Writing files
- 9.5 Paths
- 9.6 The `pathlib` module
- 9.7 Exercises


## 9.1 Opening files

When we want to read or write a file, we first must _open_ the file. Opening the file communicates with your operating system, which knows where the data for each file is stored. When you open a file, you are asking the operating system to find the file by name and make sure the file exists. In this example, we open the file `data.csv`, which should be stored in the same folder as this Jupyter Notebook.

In [None]:
fhand = open('data.csv')

In [None]:
print(fhand)

If the `open` is successful, the operating system returns us a _file handle_. The file handle is not the actual data contained in the file, but instead it is a “handle” that we can use to read the data. You are given a handle if the requested file exists and you have the proper permissions to read the file.
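A file handle is an ordinary Python object with a few attributes that describe the open file. The sketch below inspects them. It first creates a throwaway file named `demo.txt` (a hypothetical name, not part of this notebook's data) in a temporary folder so that it runs on any computer. Writing files is covered in 9.4.

```python
import os
import tempfile

# Create a throwaway file in a temporary folder so the example runs anywhere
# ('demo.txt' is a hypothetical name used only for this illustration)
demo_folder = tempfile.mkdtemp()
demo_name = os.path.join(demo_folder, 'demo.txt')
writer = open(demo_name, 'w')
writer.write('first line\nsecond line\n')
writer.close()

demo_handle = open(demo_name)
print(demo_handle.name)    # the path that was passed to open()
print(demo_handle.mode)    # 'r': reading is the default mode
print(demo_handle.closed)  # False while the handle is open
demo_handle.close()
print(demo_handle.closed)  # True after close()
```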

If the file does not exist, `open` will fail with a traceback and you will not get a handle to access the contents of the file:

In [None]:
fhand = open('stuff.txt')
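If your program should keep running when a file is missing, you can catch the `FileNotFoundError` that `open` raises. Here is a minimal sketch; the helper name `open_or_none` is hypothetical and only used for illustration.

```python
def open_or_none(file_name):
    """Return a file handle, or None if the file cannot be found."""
    try:
        return open(file_name)
    except FileNotFoundError:
        print('File cannot be opened:', file_name)
        return None

missing_handle = open_or_none('stuff.txt')
print(missing_handle)   # None when stuff.txt does not exist
```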

**Exercise 9.1**

Open a text file (.txt or .csv) from your computer.

In [None]:
# YOUR CODE HERE

## 9.2 Text files and lines

A text file can be thought of as a sequence of lines, much like a Python string can be thought of as a sequence of characters.

To break the file into lines, there is a special character that represents the “end of the line” called the _newline_ character.

In Python, we represent the _newline_ character as a backslash-n (`\n`) in string constants. Even though this looks like two characters, it is actually a single character. When we evaluate the variable on its own in the interpreter (for example, by typing `some_string` in a cell), Python shows us the `\n` in the string, but when we use `print` to show the string, we see the string broken into two lines by the newline character.

In [None]:
some_string = 'X\nY'
print(some_string)

You can also see that the length of the string `X\nY` is three characters because the newline character is a single character.

In [None]:
len(some_string)

So when we look at the lines in a file, we need to _imagine_ that there is a special invisible character, the newline, at the end of each line. It is this newline character that separates the characters in the file into lines.
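Once a file's contents are in a string, the newlines are what let us cut that string into lines. The sketch below uses a small made-up string (not the contents of `data.csv`) to compare two built-in string methods.

```python
text = 'first line\nsecond line\nthird line\n'

# split('\n') cuts at every newline; the trailing newline leaves an empty string at the end
print(text.split('\n'))   # ['first line', 'second line', 'third line', '']

# splitlines() treats each newline as a line ending, so no empty string is produced
print(text.splitlines())  # ['first line', 'second line', 'third line']

# Counting the newline characters counts the lines that end with '\n'
print(text.count('\n'))   # 3
```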

## 9.3 Reading files

While the _file handle_ does not contain the data for the file, it is quite easy to construct a `for` loop to read through and `print` each line:

In [None]:
fhand = open('data.csv')
for line in fhand:
    print(line)

The reason that the `open` function does not read the entire file is that the file might be quite large, with many gigabytes of data. Calling `open` takes roughly the same amount of time regardless of the size of the file; it is the `for` loop that actually causes the data to be read from the file.

When the file is read using a `for` loop in this manner, Python takes care of splitting the data in the file into separate lines using the newline character.

**Exercise 9.2**

Count the number of lines in the file. `data.csv` has 91 lines.

In [None]:
# YOUR CODE HERE

**Exercise 9.3**

Consider this code:

```
fhand = open('data.csv')
for line in fhand:
    print(line)
```

Does `line` contain `\n` at the end?

Test it for the first line and the last line.

In [None]:
# YOUR CODE HERE

Since the `for` loop reads the data one line at a time, it can efficiently read and count the lines in very large files without running out of main memory. A program that counts lines this way works on a file of any size using very little memory, since each line is read, counted, and then discarded.

If you know the file is relatively small compared to the size of your main memory, you can read the whole file into one string using the `read` method on the file handle.

In [None]:
fhand = open('data.csv')
file = fhand.read()

In [None]:
len(file)

In this example, the entire content of the file is read directly into the variable `file`. Let's use string slicing to show the first 60 characters of the string data stored in `file`. Note that this time, you can see the `\n` character.

In [None]:
file[0:60]

When the file is read in this manner, all the characters, including all of the lines and newline characters, are one big string in the variable `file`. It is a good idea to store the result of `read` in a variable: the first call moves the file handle's position to the end of the file, so a second call on the same handle returns an empty string.

In [None]:
fhand = open('data.csv')
print(len(fhand.read()))
print(len(fhand.read()))
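If you really need to read the same handle twice, you can move its position back to the start with `seek(0)`. A minimal sketch, using a hypothetical `sample.txt` created in a temporary folder:

```python
import os
import tempfile

# Hypothetical sample file holding the string 'X\nY\n'
sample_folder = tempfile.mkdtemp()
sample_name = os.path.join(sample_folder, 'sample.txt')
writer = open(sample_name, 'w')
writer.write('X\nY\n')
writer.close()

sample_handle = open(sample_name)
first = sample_handle.read()    # reads everything; the position is now at the end
second = sample_handle.read()   # nothing left to read, so this is ''
sample_handle.seek(0)           # move the position back to the start of the file
third = sample_handle.read()    # reads the whole file again
sample_handle.close()

print(repr(first), repr(second), repr(third))   # 'X\nY\n' '' 'X\nY\n'
```

In most programs, storing the result of the first `read` is simpler than rewinding.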

Remember that reading the whole file with `read` should only be done if the file data will fit comfortably in the main memory of your computer. If the file is too large to fit in main memory, you should write your program to read the file in pieces using a `for` or `while` loop.
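Besides the line-by-line `for` loop, `read` accepts an optional size argument, which lets a `while` loop process a file in fixed-size chunks. This sketch creates a hypothetical 50-character file and uses a deliberately tiny chunk size so that the loop runs several times.

```python
import os
import tempfile

# Hypothetical 50-character file, created here so the example is self-contained
chunk_folder = tempfile.mkdtemp()
big_name = os.path.join(chunk_folder, 'big.txt')
writer = open(big_name, 'w')
writer.write('abcdefghij' * 5)
writer.close()

CHUNK_SIZE = 16   # characters per read; tiny on purpose, real code would use a much larger value
total = 0
chunks = 0
reader = open(big_name)
while True:
    chunk = reader.read(CHUNK_SIZE)   # returns at most CHUNK_SIZE characters
    if chunk == '':                   # an empty string means the end of the file
        break
    chunks += 1
    total += len(chunk)
reader.close()

print(chunks, total)   # 4 50
```

Only one chunk is held in memory at a time, so memory use depends on `CHUNK_SIZE` rather than on the size of the file.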

**Exercise 9.4**

What is `\ufeff` at the beginning of `file[0:60]`?

## 9.4 Writing files

To write a file, you have to open it with mode “w” as a second parameter:

In [None]:
f_out = open('output.txt', 'w')
print(f_out)

If the file already exists, opening it in write mode clears out the old data and starts fresh, so be careful! If the file doesn’t exist, a new one is created.
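If you want to keep the existing data and add to it instead, open the file in append mode, `'a'`. The sketch below, using a hypothetical `log.txt` in a temporary folder, shows the difference between the two modes.

```python
import os
import tempfile

log_folder = tempfile.mkdtemp()
log_name = os.path.join(log_folder, 'log.txt')   # hypothetical file name

writer = open(log_name, 'w')
writer.write('old data\n')
writer.close()

writer = open(log_name, 'w')   # 'w' empties the existing file first
writer.write('new data\n')
writer.close()

writer = open(log_name, 'a')   # 'a' keeps the contents and writes at the end
writer.write('more data\n')
writer.close()

reader = open(log_name)
log_content = reader.read()
reader.close()
print(repr(log_content))   # 'new data\nmore data\n'
```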

The `write` method of the file handle object puts data into the file and returns the number of characters written. By default, files are opened in text mode, so `write` expects strings (and reading returns strings).

In [None]:
line1 = "This here's the wattle,\n"
f_out.write(line1)

Again, the file object keeps track of where it is, so if you call `write` again, it adds the new data to the end.

We must make sure to manage the ends of lines as we write to the file by explicitly inserting the newline character when we want to end a line. The `write` method does not add the newline automatically.

In [None]:
line2 = 'the emblem of our land.\n'
f_out.write(line2)
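When the lines to write are stored in a list, a common pattern is to loop over them and add the newline yourself. A minimal sketch, with hypothetical data and a file in a temporary folder:

```python
import os
import tempfile

rows = ['first line', 'second line', 'third line']   # hypothetical data

rows_folder = tempfile.mkdtemp()
rows_name = os.path.join(rows_folder, 'lines.txt')
writer = open(rows_name, 'w')
for row in rows:
    writer.write(row + '\n')   # write() never adds the newline for us
writer.close()

reader = open(rows_name)
rows_content = reader.read()
reader.close()
print(rows_content.splitlines())   # ['first line', 'second line', 'third line']
```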

When you are done writing, you have to close the file to make sure that the last bit of data is physically written to the disk so it will not be lost if the power goes off.

In [None]:
f_out.close()

It is good practice to also close files that we open only for reading, although Python closes any files that are still open when the program ends. When we are writing files, we want to close them explicitly so as to leave nothing to chance.
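A common way to make sure a file is always closed is the `with` statement: the file is closed automatically when the indented block ends, even if an error occurs inside it. A minimal sketch, writing to a hypothetical `poem.txt` in a temporary folder:

```python
import os
import tempfile

poem_folder = tempfile.mkdtemp()
poem_name = os.path.join(poem_folder, 'poem.txt')   # hypothetical file name

# The file is closed automatically when the indented block ends,
# even if an error is raised inside it
with open(poem_name, 'w') as poem_writer:
    poem_writer.write("This here's the wattle,\n")
    poem_writer.write('the emblem of our land.\n')

print(poem_writer.closed)   # True

with open(poem_name) as poem_reader:
    poem = poem_reader.read()

print(poem_reader.closed)   # True
print(poem)
```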

## 9.5 Paths

So far we have used the file name to read/write files (e.g. `data.csv`). This was possible because this notebook and the file are located in the same folder. You will encounter many situations where you would like to access files in different folders. This will require working with a _path_.

Path from Wikipedia:

>A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory. The delimiting character is most commonly the slash ("/"), the backslash character ("\"), or colon (":"), though some operating systems may use a different delimiter. Paths are used extensively in computer science to represent the directory/file relationships common in modern operating systems and are essential in the construction of Uniform Resource Locators (URLs). Resources can be represented by either absolute or relative paths.

Absolute and relative paths from Wikipedia:

>An absolute or full path points to the same location in a file system, regardless of the current working directory. To do that, it must include the root directory.

>By contrast, a relative path starts from some given working directory, avoiding the need to provide the full absolute path. A filename can be considered as a relative path based at the current working directory. If the working directory is not the file's parent directory, a file not found error will result if the file is addressed by its name.

For example, I used a relative path to access the `data.csv` file. If I had used an absolute path, it would still work on my computer but not on yours. This is why using absolute paths is generally bad practice: they make your code less portable.

In [None]:
# Absolute path to data.csv on my computer
# This will work for me because my name is Alican
# Unless your name is also Alican and you have the exact same folder structure, it won't work for you
# This also means: do not submit your deliverables with an absolute path
'/Users/alican/Documents/BUas/courses/2025-2026/Y1A1/github_repo/2025_26_y1a1_python/W4/data.csv'
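To see how a relative path turns into an absolute one, you can ask Python for the current working directory. This sketch uses the `os.path` functions from the standard library; `abspath` only joins and cleans up strings, so `data.csv` does not need to exist for it to run.

```python
import os

relative = 'data.csv'
absolute = os.path.abspath(relative)   # current working directory + relative path

print(os.getcwd())               # the folder that relative paths start from
print(absolute)                  # different on every computer
print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
```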

Let's assume that we have a folder called `example_folder` in the same directory as your Jupyter notebook, and it contains two csv files `data1.csv` and `data2.csv`. We can access these simply by:

In [None]:
fhand = open('example_folder/data1.csv')

Finally, in file paths, `..` is a special directory reference that means “go up one level” from the current location. It is part of relative path navigation, allowing you to move to a parent directory without needing the full absolute path. For example, the `W5` folder also contains a copy of `data.csv`; to access it, you can use:

In [None]:
fhand = open('../W5/data.csv')
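Going into a folder and then `..` cancel each other out. The sketch below assumes the layout implied above (this notebook in `W4`, with `W5` next to it) and uses `os.path.normpath` to simplify such a path. Note that `normpath` works on the string only and does not check the file system.

```python
import os

# Assumed layout: the notebook sits in W4, and W5 is a sibling folder
sibling_path = os.path.join('W4', '..', 'W5', 'data.csv')
print(sibling_path)                     # W4/../W5/data.csv on macOS/Linux
print(os.path.normpath(sibling_path))   # W5/data.csv: 'W4' followed by '..' cancels out
```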

However, directly using plain strings to describe file paths is not the best practice, because they can be error-prone and less portable across operating systems.



## 9.6 The `pathlib` module

The `pathlib` module in Python provides a convenient and intuitive way to handle file and directory paths. It also handles the differences between operating systems in how paths are written (e.g. `\` vs `/`).

The main class in the `pathlib` module is `Path`. You can create a `Path` object by instantiating it with a string representing a file or directory path.

Here's an example:

In [None]:
from pathlib import Path

path = Path('data.csv')
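Instead of gluing strings together, `Path` objects are joined with the `/` operator, and `pathlib` uses the right separator for your operating system. A minimal sketch using the folder names from 9.5 (none of these files need to exist):

```python
from pathlib import Path

# The / operator joins path components
file_path = Path('example_folder') / 'data1.csv'
print(file_path)    # example_folder/data1.csv on macOS/Linux, example_folder\data1.csv on Windows

# The relative path from 9.5 that goes up one level, built from parts
other = Path('..') / 'W5' / 'data.csv'
print(other)
print(other.parts)  # ('..', 'W5', 'data.csv')

# A Path can be passed to open() wherever a string path is accepted, e.g. open(file_path)
```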

The `Path` object provides various methods and attributes to manipulate and inspect paths. Some of the commonly used ones include:

In [None]:
path.exists() # Returns `True` if the path exists on the file system.

In [None]:
path.is_file() # Returns `True` if the path points to a regular file.

In [None]:
path.is_dir() # Returns `True` if the path points to a directory.

In [None]:
path.suffix # Returns the file extension.
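A few more attributes split a path into its pieces. They only look at the path itself, so the sketch below works even if the file does not exist.

```python
from pathlib import Path

example_path = Path('example_folder') / 'data1.csv'
print(example_path.name)     # data1.csv       (final component)
print(example_path.stem)     # data1           (name without the suffix)
print(example_path.suffix)   # .csv            (extension, including the dot)
print(example_path.parent)   # example_folder  (everything before the final component)
print(example_path.with_suffix('.txt'))   # same path with a different extension
```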

Going back to our `example_folder`: if you run the following code, it will return `True`.

In [None]:
path = Path('example_folder')
path.is_dir()

We can iterate over the contents of the directory with the `iterdir` method:

In [None]:
for file_path in path.iterdir():
    print(file_path)
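To list only some files, `glob` yields the paths whose names match a pattern such as `*.csv`. The sketch below builds a throwaway folder with hypothetical files so it runs anywhere. The order in which `iterdir` and `glob` return entries is not guaranteed, so the results are sorted.

```python
import tempfile
from pathlib import Path

# Build a throwaway folder with a few hypothetical files
listing_folder = Path(tempfile.mkdtemp())
for name in ['data1.csv', 'data2.csv', 'notes.txt']:
    (listing_folder / name).write_text('x\n')

# glob() yields only the paths whose names match the pattern
csv_files = sorted(listing_folder.glob('*.csv'))
for csv_path in csv_files:
    print(csv_path.name)

# iterdir() lists everything; is_file() skips any subfolders
all_files = sorted(p.name for p in listing_folder.iterdir() if p.is_file())
print(all_files)
```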

**Exercise 9.5**

Select any folder on your computer with some files in it and list its contents.

In [None]:
# YOUR CODE HERE

The `pathlib` module also supports operations such as creating, renaming or moving, and deleting files and directories, using methods of the `Path` object. For copying files, the `shutil` module is the usual tool.
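Here is a minimal sketch of creating, renaming, and deleting with `Path` methods, done inside a temporary folder so nothing on your computer is touched. The names `results` and `report.txt` are hypothetical.

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())          # throwaway working folder

results_dir = base / 'results'           # hypothetical folder name
results_dir.mkdir()                      # create a directory (FileExistsError if it already exists)

report = results_dir / 'report.txt'      # hypothetical file name
report.write_text('hello\n')             # create a file and write text into it

renamed = report.rename(results_dir / 'final_report.txt')   # rename/move; returns the new Path (Python 3.8+)
moved_ok = renamed.exists() and not report.exists()
print(moved_ok)                          # True

renamed.unlink()                         # delete the file
results_dir.rmdir()                      # delete the now-empty directory
print(results_dir.exists())              # False
```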

Overall, the `pathlib` module provides an elegant and cross-platform way to handle file paths in Python, making it easier to write and maintain file system operations.

Read the documentation if you would like to learn more:

https://docs.python.org/3/library/pathlib.html

### 🐍 Advanced 🐍

As mentioned previously, the Python Standard Library contains many different modules. `os`, `shutil`, and `glob` are used widely for file and directory access. Take a look at their documentation if you want to learn more.

https://docs.python.org/3/library/filesys.html

## 9.7 Exercises

**Exercise 9.6**

Find a small .txt or .csv file and print all of its lines. Use the `pathlib` module.

In [None]:
# YOUR CODE HERE