# Files

*Material for the VU Amsterdam course “Introduction to Python Programming” for BSc Artificial Intelligence students. These notebooks are created using the following sources:*
1. [Learning Python by Doing][learning python]: This book, developed by teachers of TU/e Eindhoven and VU Amsterdam, is the main source for the course materials. Code snippets or text explanations from the book may be used in the notebooks, sometimes with slight adjustments.
2. [Think Python][think python]
3. [GeeksforGeeks][geekforgeeks]
4. [Python for Text Analysis][textanalysis]: For this particular notebook on working with files, we’ve drawn inspiration from the VU Master’s course Python for Text Analysis offered by the Humanities department. 

[learning python]: https://programming-pybook.github.io/introProgramming/intro.html
[think python]: https://greenteapress.com/thinkpython2/html/
[geekforgeeks]: https://www.geeksforgeeks.org
[textanalysis]: https://github.com/cltl/python-for-text-analysis/blob/master/Chapters/Chapter%2014%20-%20Reading%20and%20writing%20text%20files.ipynb

**In this notebook, we cover the following subjects:**
- Opening Files;
- Reading Files;
- A Context Manager;
- Writing Files.
___________________________________________________________________________________________________________________________

In [None]:
# To enable type hints for lists, dicts, tuples, and sets we need to import the following:
from typing import List, Dict, Tuple, Set

<h2 style="color:#4169E1">Opening Files</h2>

So far, we have worked with data that was stored and created directly in our notebooks. For example, we created a dictionary of word frequencies from a string of text. However, most of the data we use in real-world applications is stored in files, often due to its sheer size or because it needs to be kept for future use. Therefore, it’s essential to learn how to retrieve this data so we can perform operations on it.

First things first. How do we find a file?

<h4 style="color:#B22222">File Names and Paths</h4>

Files are organized into directories (also called “folders”). Every running program has a current directory, which is the default directory for most operations. For example, when you open a file for reading, Python looks for it in the current directory.

A string like `"/Users/mvdbrand/Documents/GitLab/course-material-jbi010/202122/Lectures/week3"` that identifies a file or directory is called a **path**. Often we assign this path to a variable, for example:

In [None]:
file_path: str = "assets/halloween.txt"

Here, `assets` is a folder located in the same directory as this notebook. We want to access the file `halloween.txt` from this folder.
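
To see where Python will actually look for a relative path such as `assets/halloween.txt`, you can inspect the current directory. The sketch below uses the standard `pathlib` module, which this notebook does not otherwise cover; whether the file exists depends on where you run the code.

```python
from pathlib import Path

# The current directory of the running program
current_directory: Path = Path.cwd()
print(f"Current directory: {current_directory}")

# A relative path is resolved against the current directory
file_path: Path = Path("assets") / "halloween.txt"
absolute_path: Path = current_directory / file_path
print(f"Python will look for the file at: {absolute_path}")

# exists() reports whether the file is really there
print(f"Does the file exist? {file_path.exists()}")
```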

<h4 style="color:#B22222">The <code>open()</code> Function</h4>

Now that we know how to locate our file, the next step is to open it. As the title of this section suggests, we’ll use the `open()` function for this purpose. Your first instinct might be to write something like this:

In [None]:
file = open(file_path)

print(f'Type of the open file object is: {type(file)}')
print('\nContent of our file is:\n')
print(file)

# Does it work as expected?

This isn’t quite what we expected, is it? Instead of the file’s content, `open()` returns an `_io.TextIOWrapper` object. When we open a file, we’re really asking the **operating system (OS)** to locate the file by its name and check that it exists; Python’s built-in `open()` function makes that request for us. If the request succeeds, the operating system hands back a **file handle**, which Python wraps in a file object, in this case the `_io.TextIOWrapper`. The file handle isn’t the actual data, but rather an intermediary that allows us to read from or write to the file.

So, while we might have expected the content of the data to be displayed when we printed the file, the first step was successful; the file was located, and a file handle was returned. Now, let’s move on to the next step: how do we actually read a file?

<div class="alert" style="background-color: #ffecb3; color: #856404;">
    <b>Note</b><br>
Programs that store their data are <b>persistent</b>.</div>
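
The `open()` call above succeeded because the file exists. If the operating system cannot find the file, `open()` does not return a file handle but raises a `FileNotFoundError`. A minimal sketch, assuming a hypothetical file name that is not present on your machine, and using `try`/`except`, which may not have been covered yet:

```python
# Hypothetical file name, assumed not to exist
missing_path: str = "assets/this_file_does_not_exist.txt"

try:
    file = open(missing_path)
except FileNotFoundError as error:
    # No file handle was created, so there is nothing to close
    file_was_opened: bool = False
    print(f"Could not open the file: {error}")
else:
    # Only reached when the file was found
    file_was_opened = True
    file.close()
```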

<h2 style="color:#4169E1">Reading Files</h2>

<h4 style="color:#B22222">The <code>.read()</code> Method</h4>

So, in the previous section, we figured out how to locate a file successfully. Now, we want to see what’s inside. If you know the file is small compared to your main memory, you can use the `read()` method on the file handle. This method pulls the entire content of the file into a single string, including all line breaks.

It’s smart to store the result of the `read()` method in a variable, because reading **exhausts** the file handle: once the whole content has been read, calling `read()` again returns an empty string.

In [None]:
file = open("assets/halloween.txt")  # You can also pass the string immediately in the function call
content = file.read()

print('\nContent of our file is:\n')
print(content)

file.close()

You might be wondering, what’s the deal with this `.close()` at the end? Even though we’re only reading the file in this section and not writing to it, it’s still important to close it. When writing to a file, closing is crucial because, until you do, the data might not actually be saved or stored properly. By getting into the habit of closing files, even when just reading, you ensure consistent behavior across different Python environments, making your code more reliable.
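
Two of the points above can be checked directly: a second call to `read()` finds nothing left to read, and the `closed` attribute of the file handle tells you whether the file has been closed. The sketch below creates its own throwaway file in a temporary folder (the file name and its content are assumptions for illustration, not part of the course material).

```python
import shutil
import tempfile
from pathlib import Path

# Create a small demo file in a temporary folder
folder: str = tempfile.mkdtemp()
demo_path: Path = Path(folder) / "demo.txt"
demo_path.write_text("Trick or treat!\nBoo!\n")

file = open(demo_path)
first_read: str = file.read()   # Reads everything up to the end of the file
second_read: str = file.read()  # Nothing is left, so this is an empty string

print(repr(first_read))
print(repr(second_read))
print(f"Closed before close(): {file.closed}")

file.close()
print(f"Closed after close(): {file.closed}")

shutil.rmtree(folder)  # Remove the temporary folder again
```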

<h4 style="color:#B22222">The <code>.readlines()</code> Method</h4>

Sometimes you want to process a file line by line rather than as one big string. The `.readlines()` method helps here: it still reads the whole file, but returns the content as a list of strings, one per line. Each string keeps the newline character (`\n`) that ended its line.

In [None]:
file = open("assets/halloween.txt")  # You can also pass the string immediately in the function call
content = file.readlines()

print('\nContent of our file is:\n')
print(content)

file.close()

To access the content line by line, you can simply use a for loop:

In [None]:
for line in content:
    print(line)
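
Because every string returned by `.readlines()` still ends with its newline character, and `print()` adds one of its own, the loop above prints an empty line after each line of text. One possible way around this is to strip the trailing newline first. The sketch uses a throwaway demo file whose name and content are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Create a small demo file in a temporary folder
folder: str = tempfile.mkdtemp()
demo_path: Path = Path(folder) / "demo.txt"
demo_path.write_text("Ghosts\nPumpkins\nCandy\n")

file = open(demo_path)
lines = file.readlines()
file.close()

print(lines)  # Every element still ends with "\n"

stripped_lines = []
for line in lines:
    clean_line: str = line.rstrip("\n")  # Remove only the trailing newline
    stripped_lines.append(clean_line)
    print(clean_line)

shutil.rmtree(folder)  # Remove the temporary folder again
```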

<h4 style="color:#B22222">The <code>.readline()</code> Method</h4>


The `.readline()` method might feel a bit unintuitive. It reads the content one line at a time, returning the next line each time you call it. It’s easier to understand when you see it in action.

In [None]:
file = open("assets/halloween.txt")  # You can also pass the string immediately in the function call
line = file.readline()

print(line)

And what happens when we call the method again?

In [None]:
line = file.readline()
print(line)

And again?

In [None]:
line = file.readline()
print(line)

file.close()
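
Once the end of the file is reached, `.readline()` returns an empty string. An empty line inside the file is different: it is returned as `"\n"`. That makes it possible to read a whole file with a `while` loop, as in this sketch (the demo file and its content are assumptions for illustration):

```python
import shutil
import tempfile
from pathlib import Path

# Create a demo file that contains an empty line in the middle
folder: str = tempfile.mkdtemp()
demo_path: Path = Path(folder) / "demo.txt"
demo_path.write_text("Boo!\n\nBye!\n")

file = open(demo_path)
line_count: int = 0

line = file.readline()
while line != "":  # An empty string means the end of the file
    line_count += 1
    print(repr(line))
    line = file.readline()

file.close()
print(f"Number of lines read: {line_count}")

shutil.rmtree(folder)  # Remove the temporary folder again
```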

<h2 style="color:#4169E1">A Context Manager</h2>

Manually opening and closing files is error-prone: it is easy to forget `close()`, and when writing files that can mean data is lost. To prevent this, before we start writing files, we’ll introduce a **context manager**, a tool that automatically handles closing for you. We use it through a `with` statement: the file is automatically closed once you exit the scope of this statement.

In [None]:
file_path: str = "assets/halloween.txt"

with open(file_path, "r") as file:  # The "r" argument indicates that we open the file for reading
    # The file is only open in this scope
    content = file.read()
    
# Once you're out, the file is closed again
print(content)

<div class="alert" style="background-color: #ffecb3; color: #856404;">
    <b>Note</b><br>
Using a <b>context manager</b> is best practice because it reduces errors, so it’s expected that you use it in future programs. </div>
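
The `with` statement also closes the file when an error occurs inside its block, which is where a forgotten `close()` usually hurts most. The sketch below provokes an error on purpose; the demo file is an assumption for illustration, and `try`/`except` may not have been covered yet.

```python
import tempfile
from pathlib import Path

# TemporaryDirectory is itself a context manager: the folder is removed afterwards
with tempfile.TemporaryDirectory() as folder:
    demo_path: Path = Path(folder) / "demo.txt"
    demo_path.write_text("Trick or treat!\n")

    try:
        with open(demo_path, "r") as file:
            closed_inside: bool = file.closed  # False: the file is open here
            content: str = file.read()
            number: int = int(content)  # Raises ValueError: the text is not a number
    except ValueError:
        print("Something went wrong while the file was open.")

    # The with statement closed the file despite the error
    closed_after_error: bool = file.closed
    print(f"Closed inside the block: {closed_inside}")
    print(f"Closed after the error: {closed_after_error}")
```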

<h2 style="color:#4169E1">Writing Files</h2>

<h4 style="color:#B22222">The <code>.write()</code> Method</h4>
To write to a file, open it with mode <code>w</code> (short for write) as the second argument. If the file already exists, opening it in write mode removes its current content, so be <b>careful</b>! If the file does not exist, a new one is created.

In [None]:
file_path: str = "assets/output.txt"

a_string: str = "This is my first line written to a file!"
    
with open(file_path, "w") as outfile:    
    outfile.write(a_string)

Open the file in a text editor. Did we succeed?
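
Two details of write mode are easy to miss: opening an existing file with `"w"` empties it first, and `.write()` does not add a newline by itself. A small sketch that shows both, using a temporary demo file (its name and content are assumptions for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    demo_path: Path = Path(folder) / "output_demo.txt"

    with open(demo_path, "w") as outfile:
        outfile.write("First version\n")

    # Opening in "w" mode again empties the file before writing
    with open(demo_path, "w") as outfile:
        outfile.write("Second ")    # No newline is added automatically
        outfile.write("version\n")  # So this continues on the same line

    with open(demo_path, "r") as infile:
        content: str = infile.read()

print(repr(content))
```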

<h4 style="color:#B22222">Appending with Mode <code>a</code></h4>
If we want to add to an already existing file, we open it with mode <code>a</code> (short for append) and again use the <code>.write()</code> method; file objects have no <code>.append()</code> method. In append mode, we can only add to the end of the file.

In [None]:
file_path: str = "assets/output.txt"

# Note how we add a newline character to make sure the text ends up on a new line in the file
a_second_string: str = "\nThis is my second line written to a file!" 
    
with open(file_path, "a") as outfile:    
    outfile.write(a_second_string)
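
Appending uses the same `.write()` method as before; only the mode passed to `open()` changes. Each time the file is opened with `"a"`, new text is added after the existing content. A minimal sketch with a temporary demo file (its name and content are assumptions for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    demo_path: Path = Path(folder) / "append_demo.txt"

    with open(demo_path, "w") as outfile:
        outfile.write("Line one")

    # Each append adds text after whatever is already in the file
    with open(demo_path, "a") as outfile:
        outfile.write("\nLine two")

    with open(demo_path, "a") as outfile:
        outfile.write("\nLine three")

    with open(demo_path, "r") as infile:
        lines = infile.read().split("\n")

print(lines)
```

If the file does not exist yet, opening it with mode `"a"` creates it, just like mode `"w"` does.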

<h2 style="color:#3CB371">Exercises</h2>

Let's practice! Note that each exercise is designed with multiple levels to help you progressively build your skills. <span style="color:darkorange;"><strong>Level 1</strong></span> is the foundational level, designed to be straightforward so that everyone can successfully complete it. In <span style="color:darkorange;"><strong>Level 2</strong></span>, we step it up a notch, expecting you to use more complex concepts or combine them in new ways. Finally, in <span style="color:darkorange;"><strong>Level 3</strong></span>, we get closest to exam-level questions, and we may use some concepts that are not covered in this notebook. In programming, you often encounter situations where you’re unsure how to proceed. Fortunately, you can often solve these problems by starting to work on them and figuring things out as you go. Practicing this skill is extremely helpful, so we highly recommend completing these exercises.

For each of the exercises, make sure to add a `docstring` and `type hints`, and **do not** import any libraries unless specified otherwise.
<br>

### Exercise 1

<span style="color:darkorange;"><strong>Level 1</strong>:</span> Let's practice what we've just learned. For this exercise, you are tasked with writing a function called `add_to_shopping_list()` that takes a list of strings as input, where each string represents a shopping item. You need to add these items, one by one, onto a new line in a text file called `shopping_list.txt`, which should be saved in the `assets` folder. Each time a new list of items is added to the file, the function should also **print** the entire content of the file to the terminal.

**Example input**: you pass this argument to the parameter in the function call.

```python
items_to_add: List[str] = ["Apples", "Bananas", "Bread"]
```
**Example output**: in the terminal as well as in the text file.
```
Apples
Bananas
Bread
```

In [None]:
# TODO.