# Chapter 13: Files

While a program is running, its data is stored in **random access memory (RAM)**, which is fast, inexpensive, and **volatile**, meaning that the data disappears when the program ends or the computer is shut down. To make your data available for the next session, you need to store it in a **non-volatile** medium.

Non-volatile media store data in objects called **files**, which you can **read** (access data already contained in an existing file) and **write** (create a new file or change the contents of an existing one). Files are organized by a **file system**: a set of rules for arranging files and **directories** (containers for files and other directories).

## Paths and directories

A **path** is the sequence of directory names that leads to a file, ending with the file's own name. There are two types of paths:

* **Relative path:** Path that starts from the directory in which you're currently working (the current working directory).
* **Absolute path:** Path that starts from the **root** (top-level) directory of the file system.

In Unix-based operating systems, paths are written by delimiting directory and file names with a forward slash (`/`):

```
root/MyFiles/SubDir_1/SubDir_2/script.py
```

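In Windows, paths are written with backslashes (`\`). Because the backslash is also the escape character in Python strings, it is usually doubled (`\\`) inside a string literal:

```
root\MyFiles\SubDir_1\SubDir_2\script.py
```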

A neat trick with Python is that the `os` module writes the correct delimiter for the operating system you are running on:

```
os.path.join(path, path_1, path_2, ...)
```

In [1]:
import os

os.path.join('root', 'MyFiles', 'SubDir_1', 'SubDir_2', 'script.py')

'root/MyFiles/SubDir_1/SubDir_2/script.py'
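The difference between relative and absolute paths can also be checked from Python. The following is a minimal sketch that uses `os.path.isabs`, `os.path.abspath` and `os.getcwd` from the standard library; the path `MyFiles/script.py` is a made-up example and does not need to exist.

```python
import os

# Hypothetical relative path (the file does not need to exist).
relative = os.path.join('MyFiles', 'script.py')

# abspath() resolves a relative path against the current working directory.
absolute = os.path.abspath(relative)

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True
print(os.getcwd())               # The directory the relative path starts from
print(absolute)                  # The same file, described from the root
```

The exact text printed by the last two lines depends on where the code is run and on the operating system.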

## Handles for modifying files

The first step for writing a file is to create a **handle** for the file. This is the object that will be called to make changes to the actual file.

In Python, handles are created with the `open` function, which receives two parameters:

1. file: String with the file to be accessed, including the path (either absolute or relative) and extension.
2. mode: String describing what you want to do with the file. For example, `"r"` opens an existing file for reading, `"w"` opens a file for writing (creating it, or erasing its contents if it already exists), `"a"` appends to the end of the file, and `"x"` creates a new file and fails if the file already exists. If the file is binary (zip, executables, images, etc.) you need to add a `b` at the end of the mode, as in `"rb"`.
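To see how the modes behave, here is a small sketch that works inside a temporary directory so it does not touch your own files. The file name `modes.txt` is an arbitrary choice, and the append mode `"a"` and exclusive-creation mode `"x"` are the standard ones accepted by the built-in `open`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'modes.txt')  # Hypothetical file name

    with open(path, 'w') as f:    # "w": create the file (or overwrite it)
        f.write('first\n')

    with open(path, 'a') as f:    # "a": add to the end, keeping old content
        f.write('second\n')

    with open(path, 'r') as f:    # "r": read the text back
        text = f.read()

    try:
        with open(path, 'x') as f:    # "x": refuses to touch an existing file
            created = True
    except FileExistsError:
        created = False

    with open(path, 'rb') as f:   # "rb": read raw bytes instead of text
        raw = f.read()

print(repr(text))   # 'first\nsecond\n'
print(created)      # False
print(type(raw))    # <class 'bytes'>
```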

The basic syntax is:

```
<name> = open("<file_path + file_name + . + file_extension>", "<mode>")
```

However, the following form is preferred:

```
with open("<file_path + file_name + . + file_extension>", "<mode>") as <name>:
    <do something>
```

The `with` statement closes the file automatically when its block ends, even if an error is raised inside it, so you don't need to call `close()` explicitly.
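The two forms can be compared side by side. This sketch, which assumes a throwaway file called `demo.txt` in a temporary directory, writes once with an explicit `close()` and once with `with`, and then inspects the handle's `closed` attribute. The `ValueError` is raised on purpose to simulate something going wrong in the middle of the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # Hypothetical file name

    # Explicit form: you are responsible for closing the file yourself.
    handle = open(path, 'w')
    try:
        handle.write('explicit close\n')
    finally:
        handle.close()
    closed_explicit = handle.closed

    # "with" form: the file is closed even though an error interrupts the block.
    try:
        with open(path, 'a') as handle2:
            handle2.write('with block\n')
            raise ValueError('simulated failure')
    except ValueError:
        pass
    closed_with = handle2.closed

    with open(path, 'r') as f:
        contents = f.read()

print(closed_explicit)   # True
print(closed_with)       # True
print(repr(contents))    # 'explicit close\nwith block\n'
```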

### Handle methods

Here are some of the most common handle methods you'll use in Python:

* `write(string)`: Write the string to the file, as long as it was opened in a mode that allows writing.
* `readline()`: Read the next line in the file, as long as it was opened in read mode (the handle remembers which line it's currently at). It returns an empty string when there are no more lines.
* `readlines()`: Read all remaining lines into a list.
* `read()`: Read the entire file at once into a single string, provided it was opened in read mode.
* `close()`: Save any pending changes and release the file. The handle can't be used after this.

Examples:

In [2]:
# Writing a file
print(f'Does the file "file_tests/test.txt" exist? {os.path.isfile("file_tests/test.txt")}')

with open('file_tests/test.txt', 'w') as file:  # Create handle
    # Write some stuff.
    file.write('Hello world!\n')
    file.write('This is a file.\n')
    # These lines will be relevant later
    file.write('\n')
    file.write('#This line will be ignored in filter.\n')
    file.write('This line contains the word "snake".\n')
    file.write('And the final line.')
    
print(f'Does the file "file_tests/test.txt" exist? {os.path.isfile("file_tests/test.txt")}')

Does the file "file_tests/test.txt" exist? False
Does the file "file_tests/test.txt" exist? True


In [3]:
# Reading a file

with open("file_tests/test.txt", "r") as file:
    while True:                            # Keep reading forever
        theline = file.readline()   # Try to read next line
        if len(theline) == 0:              # If there are no more lines
            break                          #     leave the loop

        # Now process the line we've just read
        print(theline, end="")

Hello world!
This is a file.

#This line will be ignored in filter.
This line contains the word "snake".
And the final line.
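The `while True` loop with `readline()` is not the only way to go through a file. As a rough comparison, the sketch below reads the same small file three ways: by looping directly over the handle (which yields one line at a time), with `readlines()`, and with `read()`. The file `sample.txt` and its contents are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')  # Hypothetical file name
    with open(path, 'w') as file:
        file.write('one\ntwo\nthree')

    # 1. Loop over the handle: one line per iteration, no explicit break needed.
    looped = []
    with open(path, 'r') as file:
        for line in file:
            looped.append(line)

    # 2. readlines(): all lines at once, as a list of strings.
    with open(path, 'r') as file:
        as_list = file.readlines()

    # 3. read(): the whole file as one string.
    with open(path, 'r') as file:
        whole = file.read()

print(looped)        # ['one\n', 'two\n', 'three']
print(as_list)       # ['one\n', 'two\n', 'three']
print(repr(whole))   # 'one\ntwo\nthree'
```

Looping over the handle is usually the most convenient choice for large files, since it never holds more than one line in memory at a time.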

## The `continue` statement

Another important statement in flow control is `continue`. When Python reaches this statement inside a loop, it skips the rest of the loop body and jumps straight to the next iteration. For example, the following function copies a file to a new file, leaving out every line that starts with `#`.

In [4]:
def filter_file(oldfile, newfile):
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        while True:
            text = infile.readline()
            if len(text) == 0:
                break
            if text[0] == "#":
                print(f'Skipping: {text}')
                continue
            # Put any more processing logic here
            print(f'Printing: {text}')
            outfile.write(text)
    return None

filter_file('file_tests/test.txt', 'file_tests/filtered_test.txt')

Printing: Hello world!

Printing: This is a file.

Printing: 

Skipping: #This line will be ignored in filter.

Printing: This line contains the word "snake".

Printing: And the final line.
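The same idea works in any loop, not only when reading files. As a smaller illustration, this sketch uses `continue` in a `for` loop over a hand-made list of lines (the list contents are invented) to skip both comments and blank lines.

```python
lines = ['# a comment\n', 'data 1\n', '\n', 'data 2\n']

kept = []
for text in lines:
    if text.startswith('#'):   # Comment line: skip the rest of the body
        continue
    if text.strip() == '':     # Blank line: skip it too
        continue
    kept.append(text)          # Only reached for lines that were not skipped

print(kept)   # ['data 1\n', 'data 2\n']
```

Unlike `break`, which leaves the loop entirely, `continue` only abandons the current iteration.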


## Exercises

### 1
Write a program that reads a file and writes out a new file with the lines in reversed order (i.e. the first line in the old file becomes the last one in the new file.)

In [5]:
def reverse_file(oldfile, newfile):
    """
    Reverse the order of lines in oldfile and save it in newfile.
    """
    # Set handles up
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        # Read lines into list.
        lines = infile.readlines()
        # Reverse the order of the lines.
        lines.reverse()
        # Write lines to new file
        for line in lines:
            outfile.write(line)
    return None

# Run the function.
reverse_file('file_tests/test.txt', 'file_tests/reversed_test.txt')

In [6]:
# Test
with open('file_tests/reversed_test.txt', 'r') as test:
    while True:
        theline = test.readline()   # Try to read next line
        if len(theline) == 0:              # If there are no more lines
            break                          #     leave the loop

        # Show the line
        print(theline, end="")

And the final line.This line contains the word "snake".
#This line will be ignored in filter.

This is a file.
Hello world!
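Notice that the first two lines of the output run together, because the last line of the original file has no trailing newline. One possible way around this, sketched here under the assumption that every line in the output should end with a newline, is to add the missing newline before reversing. The name `reverse_file_lines` is made up for this variant.

```python
import os
import tempfile

def reverse_file_lines(oldfile, newfile):
    """
    Variant of reverse_file that makes sure the last line ends with a newline.
    """
    with open(oldfile, 'r') as infile:
        lines = infile.readlines()
    # Add the newline that the final line may be missing.
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'
    with open(newfile, 'w') as outfile:
        outfile.writelines(reversed(lines))

# Try it on a small throwaway file.
with tempfile.TemporaryDirectory() as tmpdir:
    source = os.path.join(tmpdir, 'source.txt')
    target = os.path.join(tmpdir, 'reversed.txt')
    with open(source, 'w') as f:
        f.write('a\nb\nc')          # Note: no newline after the last line
    reverse_file_lines(source, target)
    with open(target, 'r') as f:
        result = f.read()

print(repr(result))   # 'c\nb\na\n'
```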


### 2

Write a program that reads a file and prints only those lines that contain the substring `snake`.

In [7]:
with open('file_tests/test.txt', 'r') as test:
    while True:
        theline = test.readline()   # Try to read next line
        if len(theline) == 0:              # If there are no more lines
            break                          #     leave the loop
        if 'snake' in theline:
            print(theline, end="")

This line contains the word "snake".


### 3

Write a program that reads a text file and produces an output file which is a copy of the file, except the first five columns of each line contain a four digit line number, followed by a space. Start numbering the first line in the output file at 1. Ensure that every line number is formatted to the same width in the output file. Use one of your Python programs as test data for this exercise: your output should be a printed and numbered listing of the Python program.

In [8]:
def add_line_numbers(oldfile, newfile, n=4):
    """
    Add line numbers of length n to oldfile and save it in newfile.
    """
    # Set handles up
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        # Read lines into list.
        lines = infile.readlines()
        # Write lines to new file
        for number, line in enumerate(lines, start=1):
            outfile.write(str(number).zfill(n) + ' ' + line)
    return None

# Run the function.
add_line_numbers('file_tests/test.txt', 'file_tests/lines_test.txt')

In [9]:
# Test
with open('file_tests/lines_test.txt', 'r') as test:
    while True:
        theline = test.readline()   # Try to read next line
        if len(theline) == 0:              # If there are no more lines
            break                          #     leave the loop

        # Show the line
        print(theline)

0001 Hello world!

0002 This is a file.

0003 

0004 #This line will be ignored in filter.

0005 This line contains the word "snake".

0006 And the final line.


### 4

Write a program that undoes the numbering of the previous exercise: it should read a file with numbered lines and produce another file without line numbers.

In [10]:
def remove_line_numbers(oldfile, newfile, n=4):
    """
    Remove line numbers of length n from oldfile and save the result in newfile.
    """
    # Set handles up
    with open(oldfile, "r") as infile, open(newfile, "w") as outfile:
        # Read lines into list.
        lines = infile.readlines()
        # Drop the n digits and the space that follows them from each line.
        for line in lines:
            outfile.write(line[n+1:])
    return None

# Run the function.
remove_line_numbers('file_tests/lines_test.txt', 'file_tests/un_lined_test.txt')

In [11]:
# Test
with open('file_tests/un_lined_test.txt', 'r') as test:
    while True:
        theline = test.readline()   # Try to read next line
        if len(theline) == 0:              # If there are no more lines
            break                          #     leave the loop

        # Show the line
        print(theline)

Hello world!

This is a file.



#This line will be ignored in filter.

This line contains the word "snake".

And the final line.
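A quick way to check that exercises 3 and 4 undo each other is a round trip: number some lines, strip the numbers again, and compare with the original. The sketch below does this on plain lists of strings rather than on files, and the helper names `number_lines` and `strip_numbers` are invented for the example.

```python
def number_lines(lines, width=4):
    """Prefix each line with a zero-padded line number and a space."""
    return [f"{number:0{width}d} {line}"
            for number, line in enumerate(lines, start=1)]

def strip_numbers(numbered, width=4):
    """Remove the number and the space that number_lines added."""
    return [line[width + 1:] for line in numbered]

original = ['Hello world!\n', '\n', 'And the final line.']
numbered = number_lines(original)
restored = strip_numbers(numbered)

print(numbered[0], end='')    # 0001 Hello world!
print(restored == original)   # True
```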
