# Files

*Material for the VU Amsterdam course “Introduction to Python Programming” for BSc Artificial Intelligence students. These notebooks are created using the following sources:*
1. [Learning Python by Doing][learning python]: This book, developed by teachers of TU/e Eindhoven and VU Amsterdam, is the main source for the course materials. Code snippets or text explanations from the book may be used in the notebooks, sometimes with slight adjustments.
2. [Think Python][think python]
3. [GeeksforGeeks][geekforgeeks]
4. [Python for Text Analysis][textanalysis]: For this particular notebook on working with files, we’ve drawn inspiration from the VU Master’s course Python for Text Analysis offered by the Humanities department. 

[learning python]: https://programming-pybook.github.io/introProgramming/intro.html
[think python]: https://greenteapress.com/thinkpython2/html/
[geekforgeeks]: https://www.geeksforgeeks.org
[textanalysis]: https://github.com/cltl/python-for-text-analysis/blob/master/Chapters/Chapter%2014%20-%20Reading%20and%20writing%20text%20files.ipynb

**In this notebook, we cover the following subjects:**
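- Opening files: file names, paths, and the `open()` function.
- Reading files with `.read()`, `.readlines()`, and `.readline()`.
- Closing files with `close()` and with a context manager.
- Searching in a file.
- Writing files.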
___________________________________________________________________________________________________________________________

In [50]:
# To enable type hints for lists, dicts, tuples, and sets we need to import the following:
from typing import List, Dict, Tuple, Set

<h2 style="color:#4169E1">Opening Files</h2>

So far, we have worked with data that was stored and created directly in our notebooks. For example, we created a dictionary of word frequencies from a string of text. However, most of the data we use in real-world applications is stored in files, often due to its sheer size or because it needs to be kept for future use. Therefore, it’s essential to learn how to retrieve this data so we can perform operations on it.

First things first. How do we open a file? Your intuition might lead you to something like this:

<h4 style="color:#B22222">File Names and Paths</h4>
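A name such as `assets/halloween.txt` is a relative path: Python interprets it starting from the current working directory. The sketch below, which assumes only the standard `os` module, shows how the same name can be turned into an absolute path. The file itself does not need to exist for this to work.

```python
import os

# A relative path is interpreted from the current working directory.
relative_path: str = os.path.join("assets", "halloween.txt")

# os.path.abspath() builds the absolute path (the file need not exist).
absolute_path: str = os.path.abspath(relative_path)

print(f"Current working directory: {os.getcwd()}")
print(f"Relative path: {relative_path}")
print(f"Absolute path: {absolute_path}")
```

The printed paths depend on where the notebook is run, so your output will differ from someone else's.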


<h4 style="color:#B22222">The <code>open()</code> Function</h4>


In [23]:
file = open("assets/halloween.txt")

print(f'Type of the open file object is: {type(file)}')
print('\nContent of our file is:\n')
print(file)

# Does it work as expected?

Type of the open file object is: <class '_io.TextIOWrapper'>

Content of our file is:

<_io.TextIOWrapper name='assets/halloween.txt' mode='r' encoding='UTF-8'>


This isn’t quite what we expected, is it? Instead of the file’s content, we get something called a `_io.TextIOWrapper` object. When we open a file with Python’s built-in `open()` function, we are really asking the **operating system (OS)** to locate the file by its name and check that it exists. If this succeeds, `open()` returns a **file handle**, which in this case is the `_io.TextIOWrapper` object. The file handle isn’t the actual data, but rather an intermediary that allows us to read from or write to the file.
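When the file cannot be located, `open()` does not return a file handle but raises a `FileNotFoundError`. Here is a minimal sketch of that case. The file name is hypothetical and assumed not to exist in the working directory.

```python
# Hypothetical file name, assumed not to exist in the working directory.
missing_name: str = "this_file_does_not_exist.txt"
opened: bool = False

try:
    handle = open(missing_name)
    opened = True
    handle.close()
except FileNotFoundError as error:
    # open() failed, so no file handle was created.
    print(f"Could not open the file: {error}")

print(f"File opened: {opened}")
```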

So, while we might have expected the content of the data to be displayed when we printed the file, the first step was successful; the file was located, and a file handle was returned. Now, let’s move on to the next step: how do we actually read a file?

<div class="alert" style="background-color: #ffecb3; color: #856404;">
    <b>Note</b><br>
Programs that keep at least some of their data in permanent storage, such as files, are called <b>persistent</b>.</div>

<h2 style="color:#4169E1">Reading Files</h2>

<h4 style="color:#B22222">The <code>.read()</code> Method</h4>
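The `.read()` method returns the content of a file as a single string. The sketch below first creates a small sample file in a temporary folder so that it runs on its own. The file name `sample.txt` and its content are made up for illustration.

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Trick or treat!\nBoo!\n", encoding="utf-8")

file = open(path, encoding="utf-8")
content: str = file.read()  # the whole file as one string
file.close()

print(type(content))
print(content)
```

Because the whole file ends up in memory at once, this approach is best suited to files that are known to be small.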


<h4 style="color:#B22222">The <code>.readlines()</code> Method</h4>
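The `.readlines()` method returns a list with one string per line, and each string keeps its trailing newline character. A minimal sketch, again using a made-up sample file in a temporary folder:

```python
import tempfile
from pathlib import Path
from typing import List

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Trick or treat!\nBoo!\n", encoding="utf-8")

file = open(path, encoding="utf-8")
lines: List[str] = file.readlines()  # one list element per line
file.close()

print(lines)
print(f"Number of lines: {len(lines)}")
```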


<h4 style="color:#B22222">The <code>.readline()</code> Method</h4>
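The `.readline()` method reads one line per call and returns an empty string once the end of the file has been reached. A minimal sketch with a made-up two-line sample file:

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Trick or treat!\nBoo!\n", encoding="utf-8")

file = open(path, encoding="utf-8")
first_line: str = file.readline()   # reads up to and including the first newline
second_line: str = file.readline()  # continues where the previous call stopped
at_end: str = file.readline()       # nothing left to read: an empty string
file.close()

print(repr(first_line))
print(repr(second_line))
print(repr(at_end))
```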


It’s important to note that the `open()` function doesn’t read the content of the file. This matters because the file might be too large to fit into main memory. As a result, `open()` takes roughly the same amount of time to execute, no matter the size of the file.

As mentioned above, the `open()` function returns a file handle, which can be used in a `for` loop to read the file line by line. Python handles splitting the content into **separate lines** for us. Using a `for` loop allows us to efficiently read a file of any size, as each line is read, processed, and then discarded.

The following code creates a file handle and counts the number of lines in the file.

In [41]:
file = open("assets/halloween.txt")
count: int = 0

# Iterating over the file handle yields one line at a time.
for line in file:
    count += 1

print(count)

10


<h2 style="color:#4169E1">Closing Files</h2>

<h4 style="color:#B22222">The <code>close()</code> Method</h4>
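Once we are done with a file, calling `close()` on the file handle releases it. A file handle has a `closed` attribute that tells us whether this has happened. A minimal sketch with a made-up sample file in a temporary folder:

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Trick or treat!\n", encoding="utf-8")

file = open(path, encoding="utf-8")
closed_before: bool = file.closed  # False: the file is still open

file.close()
closed_after: bool = file.closed   # True: the file handle has been released

print(f"Closed before close(): {closed_before}")
print(f"Closed after close(): {closed_after}")
```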


<h4 style="color:#B22222">A Context Manager</h4>
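A `with` statement opens the file and closes it automatically when the indented block ends, even if an error occurs inside the block. A minimal sketch with a made-up sample file in a temporary folder:

```python
import tempfile
from pathlib import Path

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Trick or treat!\n", encoding="utf-8")

with open(path, encoding="utf-8") as file:
    content: str = file.read()
    closed_inside: bool = file.closed  # False: still open inside the block

# No explicit close() is needed: the file was closed when the block ended.
print(f"Closed inside the block: {closed_inside}")
print(f"Closed after the block: {file.closed}")
```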


<h2 style="color:#4169E1">Searching in a File</h2>
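One way to search in a file is to loop over its lines and keep only those that meet a condition. The sketch below collects every line that contains a search word, ignoring upper and lower case. The sample file and the search word `ghost` are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List

# Create a small sample file so the example runs on its own (made-up content).
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text(
    "The ghost said boo.\nPumpkins glow at night.\nA Ghost floats by.\n",
    encoding="utf-8",
)

search_word: str = "ghost"
matches: List[str] = []

with open(path, encoding="utf-8") as file:
    for line in file:
        # Compare in lower case so that "Ghost" and "ghost" both match.
        if search_word in line.lower():
            matches.append(line.rstrip("\n"))

print(matches)
```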

<h2 style="color:#4169E1">Writing Files</h2>

<h4 style="color:#B22222">The <code>.write()</code> Method</h4>

This method is used to write a string to a file that was opened in write (`"w"`) or append (`"a"`) mode.
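To write text to a file, we can open it in mode `"w"`, which creates the file or overwrites an existing one, or in mode `"a"`, which appends to the end. The sketch below writes to a file in a temporary folder and reads it back to check the result. The file name `notes.txt` is made up for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical output file in a temporary folder.
path = Path(tempfile.mkdtemp()) / "notes.txt"

# Mode "w" creates the file (or overwrites an existing one).
with open(path, "w", encoding="utf-8") as file:
    file.write("First line\n")
    file.write("Second line\n")

# Mode "a" appends to the end instead of overwriting.
with open(path, "a", encoding="utf-8") as file:
    file.write("Third line\n")

# Read the file back to check what was written.
with open(path, encoding="utf-8") as file:
    content: str = file.read()

print(content)
```

Note that writing to a file does not add newline characters automatically, so each `"\n"` has to be included explicitly.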

<h2 style="color:#3CB371">Exercises</h2>

Let's practice! Note that each exercise is designed with multiple levels to help you progressively build your skills. <span style="color:darkorange;"><strong>Level 1</strong></span> is the foundational level, designed to be straightforward so that everyone can successfully complete it. In <span style="color:darkorange;"><strong>Level 2</strong></span>, we step it up a notch, expecting you to use more complex concepts or combine them in new ways. Finally, in <span style="color:darkorange;"><strong>Level 3</strong></span>, we get closest to exam-level questions, but we may use some concepts that are not covered in this notebook. However, in programming, you often encounter situations where you’re unsure how to proceed. Fortunately, you can often solve these problems by starting to work on them and figuring things out as you go. Practicing this skill is extremely helpful, so we highly recommend completing these exercises.

For each of the exercises, make sure to add a `docstring` and `type hints`, and **do not** import any libraries unless specified otherwise.
<br>

### Exercise 1

<span style="color:darkorange;"><strong>Level 1</strong>:</span> In this exercise, you’ll be working with a data structure that holds the names of students who passed different high school subjects like history, math, physics, economics, etc. Your goal is to create a function called `students_who_passed_all_subjects()` that identifies the students who passed every subject. To do this, you’ll use **set methods** to pinpoint the names that appear in all the subjects. The function should work as follows:

- It should take a **dictionary** as input, with each key representing a subject and the value being a set of student names who passed that subject.
- It should **return** a set containing the names of students who passed every subject.

**Print** the set **outside** the function in a clear and readable format.

**Example input**: you pass this argument to the parameter in the function call.
```Python
subjects: Dict[str, Set[str]] = {
    'history': {'Einstein', 'Curie', 'Tesla', 'Hopper', 'Goodall'},
    'math': {'Einstein', 'Curie', 'Turing', 'Hopper'},
    'physics': {'Einstein', 'Curie', 'Hopper', 'Goodall'},
    'economics': {'Einstein', 'Curie', 'Tesla', 'Hopper'},
    'biology': {'Curie', 'Goodall', 'Turing', 'Einstein'},
    'literature': {'Hopper', 'Turing', 'Einstein', 'Curie'}
}
```

**Example output**:
```Python
"Students who passed all subjects: Einstein, Curie"
```

In [None]:
# TODO.