# Week 6. File I/O
List · open · with · sorted · CSV · dict · csv · PIL
 
Files let you hang on to information long term. In the context of programming, file I/O is all about writing code that can read from files (that is, load information from them) or write to files (that is, save information to them).

### 6.1. List

```python
# names.py: a familiar program that stores a single name in a variable
name = input("What's your name? ")
print(f"hello, {name}")
```

```text
hello, Miau
```


Suppose, though, that we wanted to support not just one name but multiple names, say three for the sake of discussion, so that the program begins to accumulate some amount of information. At that point, it becomes a real downside to throw all of that information away every time the program exits. Let's go back into names.py, up at the top.

Let me proactively give myself a variable, this time called **names**, plural, and set it equal to an **empty list**.
- **Square bracket notation**, especially with nothing inside it, just means "give me an empty list that we can add things to over time."
- **_**: I could use a variable here, like **i**, which is conventional. But if I'm not actually using **i** on any subsequent line, I might as well use an **underscore**, which is the Pythonic convention for a variable whose value we don't need. And if I want to clean this up a little bit, notice that my **name** variable doesn't really need to exist, because I'm assigning it a value and then immediately appending it to the list.

```python
# names.py
names = []

for _ in range(3):
    name = input("What's your name? ")
    names.append(name)

for name in sorted(names):
    print(f"hello, {name}")
```

```text
hello, Harry
hello, Hermione
hello, Ron
```


```python
# names.py: an alternative that appends input() directly (valid, though arguably less readable)
names = []

for _ in range(3):
    names.append(input("What's your name? "))

for name in sorted(names):
    print(f"hello, {name}")
```

```text
hello, Harry
hello, Hermione
hello, Ron
```


If this were a bigger program, it might be pretty painful to have to re-enter the same information again and again. Wouldn't it be nice, like almost any program today on a phone, laptop, desktop, or cloud server, to be able to save this information somehow? That's where file I/O comes in. Files are a way of storing information persistently on your own phone, Mac, PC, or some cloud server's disk, so that it's still there when you come back and run the program again.

### 6.2. Open
In Python, there's this function called open whose purpose in life is to do just that, to open a file, but to open it up programmatically so that you, the programmer, can actually read information from it or write information to it. 

[docs.python3-open](https://docs.python.org/3/library/functions.html#open)

- **"w"** is for write. It tells open to open the file in a way that allows me to change its content, and if the file doesn't exist yet, open creates it for me. Be careful, though: if the file already exists, "w" erases its previous content first.
- **"a"** is for append, which means to add new content to the bottom of the file, again and again, without erasing what's already there.

```python
# names.py: save one name, replacing whatever names.txt contained before
name = input("What's your name? ")

file = open("names.txt", "w")
file.write(name)
file.close()
```

- **file.write(f"{name}\n")** writes the name and a newline character (\n) all at once, so that each name ends up on its own line.

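```python
# names.py: add one name per line to the end of names.txt
name = input("What's your name? ")

file = open("names.txt", "a")
file.write(f"{name}\n")  # \n ends the line, so the next name starts on a new line
file.close()
```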
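To see the difference between the "w" and "a" modes in practice, here is a minimal sketch. It uses only the standard library; the helpers write_names and read_all are hypothetical names for illustration, and the files live in a temporary folder rather than being names.txt itself.

```python
import os
import tempfile


def write_names(path, names, mode):
    """Hypothetical helper: reopen the file for every name, as names.py does on each run."""
    for name in names:
        file = open(path, mode)
        file.write(f"{name}\n")  # one name per line
        file.close()


def read_all(path):
    """Hypothetical helper: return the whole file as one string."""
    file = open(path)
    content = file.read()
    file.close()
    return content


folder = tempfile.mkdtemp()
w_path = os.path.join(folder, "names_w.txt")
a_path = os.path.join(folder, "names_a.txt")

write_names(w_path, ["Harry", "Hermione", "Ron"], "w")
write_names(a_path, ["Harry", "Hermione", "Ron"], "a")

print(repr(read_all(w_path)))  # 'Ron\n' (each "w" erased the previous name)
print(repr(read_all(a_path)))  # 'Harry\nHermione\nRon\n'
```

With "w", only the last name survives; with "a", every run adds to the end, which is what we want for a growing list of names.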

### 6.3. with
A more Pythonic way to manipulate files is to use another keyword, **with**, which lets you say that, in this context, you want Python to open some file and then close it automatically.

**with** means you can't forget to close the file: Python closes it for you as soon as the indented block ends.

```python
# names.py: append a name; with closes the file automatically
name = input("What's your name? ")

with open("names.txt", "a") as file:
    file.write(f"{name}\n")
```
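A quick way to check that with really closes the file is the closed attribute that every file object has. This sketch writes to a sample file in a temporary folder (not the real names.txt) and shows that the file is closed once the block ends, even if an error interrupts the block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")

with open(path, "a") as file:
    file.write("Harry\n")
    print(file.closed)  # False: we're still inside the with block

print(file.closed)  # True: Python closed the file when the block ended

# The file is closed even when an error interrupts the block
try:
    with open(path, "a") as other:
        other.write("Hermione\n")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass

print(other.closed)  # True

with open(path) as file:
    print(file.read())  # both names were saved, because closing flushes the data
```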

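- **"r"** opens the file for reading. It is also the default mode, so open("names.txt") means the same as open("names.txt", "r").
- **lines = file.readlines()**: open files come with a special method, readlines, whose purpose in life is to read all the lines from the file and return them to me as a list.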

```python
# names.py: read every line into a list, then greet each name
with open("names.txt", "r") as file:
    lines = file.readlines()

for line in lines:
    print("hello,", line.rstrip())  # rstrip() removes the trailing newline
```


To make the code clearer, we can loop over the file object itself, since iterating over an open file yields one line at a time:
- **for** line **in** ***file***:

```python
# names.py: iterate over the file line by line
with open("names.txt", "r") as file:
    for line in file:
        print("hello,", line.rstrip())
```
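To compare the two approaches, this sketch writes a small names.txt-like file (the three names are just sample data) to a temporary folder and then reads it both ways. Note that each line still ends with "\n", which is why the code above calls rstrip.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as file:
    file.write("Hermione\nHarry\nRon\n")  # sample data

# readlines() returns all the lines at once, as a list
with open(path) as file:
    lines = file.readlines()
print(lines)  # ['Hermione\n', 'Harry\n', 'Ron\n']

# Looping over the file yields the same lines, one at a time
greetings = []
with open(path) as file:
    for line in file:
        greetings.append(f"hello, {line.rstrip()}")
print(greetings)  # ['hello, Hermione', 'hello, Harry', 'hello, Ron']
```

Looping over the file avoids building the whole list first, which matters little here but helps with large files.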


### 6.4. sorted
**sorted** is a built-in function that returns a new, sorted list of the items it's given.

I should note that if we just want to sort the file's lines, we can do this even more simply in Python, without bothering with the names list or the second for loop. Instead, we can pass the file itself to that same sorted function.

Then, inside that for loop, we can print "hello," right away, followed by the line itself, still stripping any whitespace off its end. If we run this program with python names.py, we get the same result, but the code is a lot more compact.

```python
# names.py: collect the names, then print them in sorted order
names = []

with open("names.txt") as file:  # "r" is the default mode
    for line in file:
        names.append(line.rstrip())

for name in sorted(names):
    print(f"hello, {name}")
```


```python
# names.py: more compact, sorting the file's lines directly
with open("names.txt") as file:
    for line in sorted(file):
        print("hello,", line.rstrip())
```

[docs.python3-sorted](https://docs.python.org/3/library/functions.html#sorted)

The sorted function takes as its first argument an iterable, that is, something you can iterate over.

In other words, you can loop over it one item at a time. The rest of the signature says that you can also specify a key, which controls how the items are sorted (more on that later), and a last named parameter, reverse.

Per the documentation, reverse is False by default, so the result is not reversed. If we pass reverse=True instead, we get the items in descending order:
- sorted(iterable, /, *, key=None, reverse=False)
- sorted(names, reverse=True)
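Here is a small sketch of sorted on its own, using sample names. It shows that sorted returns a new list (the original is left unchanged), that reverse=True flips the order, and that any iterable works, including a tuple or a string. Note that reverse must be passed by name, because it comes after the * in the signature.

```python
names = ["Hermione", "Ron", "Draco", "Harry"]

ascending = sorted(names)
descending = sorted(names, reverse=True)

print(ascending)   # ['Draco', 'Harry', 'Hermione', 'Ron']
print(descending)  # ['Ron', 'Hermione', 'Harry', 'Draco']
print(names)       # ['Hermione', 'Ron', 'Draco', 'Harry'] (unchanged)

# Any iterable can be sorted; the result is always a list
letters = sorted(("b", "c", "a"))
chars = sorted("Ron")
print(letters)  # ['a', 'b', 'c']
print(chars)    # ['R', 'n', 'o'] (uppercase letters sort before lowercase ones)
```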

```python
# names.py
names = []

with open("names.txt") as file:
    for line in file:
        names.append(line.rstrip())

for name in sorted(names, reverse=True):
    print(f"hello, {name}")
```

```text
hello, Ron
hello, Hermione
hello, Harry
hello, Draco
```


### 6.5. Comma-Separated Values (CSV)
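Now let's move from a plain text file to a CSV file. Instead of names.txt, we'll work with a file called students.csv, in which each line holds a student's name and house.

CSV stands for Comma-Separated Values: each line of the file is a record, and commas separate the values within it.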

- **.split(",")**: the split method returns a ***list*** of all the individual parts to the left and to the right of the commas.
- **[0]** goes to the first element of that list, which should hopefully be the student's name.
- **[1]** goes to the second element of the list, which should hopefully be the student's house.

```python
# students.py
with open("students.csv") as file:
    for line in file:
        row = line.rstrip().split(",")
        print(f"{row[0]} is in {row[1]}")
```

```text
Harry is in Gryffindor
Ron is in Gryffindor
Hermione is in Gryffindor
Draco is in Slytherin
```


- **rstrip** removes whitespace, including the newline character, from the end of each line in our CSV file.
- **split** breaks each line at the commas, telling Python where each value in our CSV file ends.
- **row[0]** is the first element in each line of our CSV file.
- **row[1]** is the second element in each line of our CSV file.
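The two steps can be tried on a single line. This sketch assumes a line like Harry's, as it would be read from students.csv (still ending in a newline), and it also shows why rstrip comes first.

```python
line = "Harry,Gryffindor\n"  # one line as read from students.csv

row = line.rstrip().split(",")
print(row)  # ['Harry', 'Gryffindor']
print(f"{row[0]} is in {row[1]}")  # Harry is in Gryffindor

# Without rstrip, the newline would stay attached to the last value
unstripped = line.split(",")
print(unstripped)  # ['Harry', 'Gryffindor\n']
```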



The above code is effective at dividing each line, or “record,” of our CSV file. However, it’s a bit cryptic to look at if you are unfamiliar with this type of syntax. Python can assign the parts of a list to several variables at once, which simplifies this code. Modify your code as follows:

```python
# students.py
with open("students.csv") as file:
    for line in file:
        name, house = line.rstrip().split(",")
        print(f"{name} is in {house}")
```

```text
Harry is in Gryffindor
Ron is in Gryffindor
Hermione is in Gryffindor
Draco is in Slytherin
```


- Notice that, for each of these lines, **split** returns a list of exactly two values: the one before the comma and the one after it. Accordingly, we can unpack that list into two variables at once instead of one!
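Here is a minimal sketch of this unpacking with a sample line. It also shows what happens when the number of parts doesn't match the number of variables, a problem that will come back later in this section.

```python
name, house = "Draco,Slytherin".split(",")
print(name)   # Draco
print(house)  # Slytherin

# Three parts but only two variables: Python raises a ValueError
parts = "Harry,Number Four, Privet Drive".split(",")
print(len(parts))  # 3

caught = None
try:
    name, house = parts
except ValueError as error:
    caught = error  # keep a reference, since "error" is deleted after the except block
    print("ValueError:", error)
```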

Now, to print this output in sorted order:

```python
# students.py
students = []  # an empty list called students

with open("students.csv") as file:
    for line in file:
        name, house = line.rstrip().split(",")
        students.append(f"{name} is in {house}")  # append each sentence to the list

for student in sorted(students):  # sort the list alphabetically
    print(student)
```

```text
Draco is in Slytherin
Harry is in Gryffindor
Hermione is in Gryffindor
Ron is in Gryffindor
```


- Notice that we create a **list** called students. We **append** each string to this list. Then, we output a **sorted** version of our list.

Recall that Python allows for **dictionaries**, in which a key is associated with a value. This code could be further improved by storing each student as a dictionary:

```python
# students.py
students = []  # an empty list called students

with open("students.csv") as file:
    for line in file:
        name, house = line.rstrip().split(",")
        student = {}  # an empty dictionary that represents one student
        student["name"] = name  # first key: the student's name
        student["house"] = house  # second key: the student's house
        students.append(student)  # add this student's dictionary to the list

for student in students:
    # Single quotes inside the f-string avoid clashing with its outer double quotes
    print(f"{student['name']} is in {student['house']}")
```

```text
Harry is in Gryffindor
Ron is in Gryffindor
Hermione is in Gryffindor
Draco is in Slytherin
```


- Notice that this produces the desired outcome, minus the sorting of students.

Unfortunately, we cannot sort the students as we did before, because **each student is now a dictionary inside of a list**, and Python doesn't know how to compare one dictionary with another (sorted raises a TypeError). It would be helpful if we could tell Python to sort this list of dictionaries by each student’s name.
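To see the problem concretely, here is a short sketch with two sample student dictionaries. Calling sorted on them fails, because Python has no built-in way to decide whether one dictionary is "less than" another.

```python
students = [
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]

problem = None
try:
    sorted(students)
except TypeError as error:
    problem = error  # keep a reference for later inspection
    print("TypeError:", error)
```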

To implement this in our code, make the following changes:

### 6.6. Sort keys

```python
# students.py
students = []

with open("students.csv") as file:
    for line in file:
        name, house = line.rstrip().split(",")
        # Create and append the dictionary in one line instead of the previous three
        students.append({"name": name, "house": house})


def get_name(student):
    """Return the student's name, to be used as the sort key."""
    return student["name"]


# key tells sorted how to sort this list of dictionaries.
# We pass the function itself (get_name, without parentheses):
# Python allows functions to be passed as arguments to other functions.
for student in sorted(students, key=get_name):
    print(f"{student['name']} is in {student['house']}")
```

```text
Draco is in Slytherin
Harry is in Gryffindor
Hermione is in Gryffindor
Ron is in Gryffindor
```


- Notice that sorted needs to know how to get the **key** of each student. **Python's sorted function accepts a parameter called key, with which we define on what “key” the list of students will be sorted**. Here, the **get_name** function simply returns the value of student["name"]. Running this program, you will see that the list is now sorted by name.
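Because key accepts any function of one argument, the same idea sorts by a different field. In this sketch, get_house is a hypothetical helper analogous to get_name, and the data mirrors students.csv. Python's sort is stable, so students in the same house keep their original order, even with reverse=True.

```python
students = [
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]


def get_house(student):
    """Hypothetical helper, analogous to get_name: return the student's house."""
    return student["house"]


by_house = sorted(students, key=get_house, reverse=True)

for student in by_house:
    print(f"{student['name']} is in {student['house']}")
# Draco is in Slytherin
# Harry is in Gryffindor
# Ron is in Gryffindor
# Hermione is in Gryffindor
```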

### 6.7. Lambda Functions

Still, our code can be further improved upon. It just so happens that if you are only going to use a function like get_name once, you can simplify your code in the manner presented below. Modify your code as follows:

```python
# students.py
students = []

with open("students.csv") as file:
    for line in file:
        name, house = line.rstrip().split(",")
        students.append({"name": name, "house": house})

# A lambda is an anonymous function; this one has the same effect as get_name above
for student in sorted(students, key=lambda student: student["name"]):
    print(f"{student['name']} is in {student['house']}")
```

```text
Draco is in Slytherin
Harry is in Gryffindor
Hermione is in Gryffindor
Ron is in Gryffindor
```


- Notice how we use a **lambda** function, an anonymous function, that says “Hey Python, here is a function that has no name: given a *student*, access their *name* and return that as the **key**.”
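To confirm that the lambda does the same job as get_name, this sketch (with sample data) sorts the same list both ways and compares the results.

```python
students = [
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
    {"name": "Harry", "house": "Gryffindor"},
]


def get_name(student):
    return student["name"]


with_def = sorted(students, key=get_name)
with_lambda = sorted(students, key=lambda student: student["name"])

print(with_def == with_lambda)  # True
for student in with_lambda:
    print(student["name"])  # Draco, then Harry, then Ron
```

A named function is easier to reuse and test; a lambda keeps a one-off key right where it's used.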

### 6.8. csv Library

Unfortunately, our code is a bit fragile. Suppose that we changed our CSV file such that we indicated where each student grew up. What would be the impact of this upon our program? First, modify your students.csv file as follows:

```csv
Harry,"Number Four, Privet Drive"
Ron,The Burrow
Draco,Malfoy Manor
```
- [csv library documentation](https://docs.python.org/3/library/csv.html)

### 6.9. csv.reader
- Notice that if we run our previous program on this file, it no longer works properly. Can you guess why?

> The *ValueError: too many values to unpack* error occurs because we wrote this program expecting every line of the CSV file to split at a single , (comma) into exactly two values. Harry's line now contains an extra comma inside “Number Four, Privet Drive”, so split returns three values. We could spend more time addressing this, but indeed someone else has already developed a way to “parse” (that is, to read) CSV files!

Python’s built-in **csv library** comes with an object called a **reader**.
- As the name suggests, we can use a **reader** to read our CSV file despite the extra comma in “Number Four, Privet Drive”, because the reader treats a value in double quotes as a single value. A reader works in a for loop: on each iteration, the reader gives us another **row** from our CSV file.
- This **row** is itself a **list**, where each value in the list corresponds to an element in that row.
- **row[0]**, for example, is the first element of the given row, while **row[1]** is the second element.
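The difference between split and csv.reader is easy to see on the new data. This sketch uses io.StringIO as a stand-in for the opened students.csv file, an assumption made so that the example runs on its own; csv.reader accepts any iterable of lines.

```python
import csv
import io

# Same content as the students.csv shown above
text = 'Harry,"Number Four, Privet Drive"\nRon,The Burrow\nDraco,Malfoy Manor\n'

# Splitting naively on commas breaks the quoted value in two
naive = text.splitlines()[0].split(",")
print(naive)  # ['Harry', '"Number Four', ' Privet Drive"']

# csv.reader understands the quotes and yields one list per row
rows = list(csv.reader(io.StringIO(text)))
print(rows[0])  # ['Harry', 'Number Four, Privet Drive']
print(rows[1])  # ['Ron', 'The Burrow']
```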


```python
# students.py
import csv

students = []

with open("students.csv") as file:
    reader = csv.reader(file)  # handles commas inside quoted values
    for row in reader:
        students.append({"name": row[0], "home": row[1]})

for student in sorted(students, key=lambda student: student["name"]):
    print(f"{student['name']} is from {student['home']}")
```

```text
Draco is from Malfoy Manor
Harry is from Number Four, Privet Drive
Ron is from The Burrow
```


- Notice that our program now works as expected.

Up until this point, we have been relying upon our program to specifically decide what parts of our CSV file are the names and what parts are the homes. It’s better design, though, to bake this directly into our CSV file by editing it as follows:

```csv
name,home
Harry,"Number Four, Privet Drive"
Ron,The Burrow
Draco,Malfoy Manor
```

- Notice how we are explicitly saying in our CSV file that anything reading it should expect there to be a name value and a home value in each line.

### 6.10. csv.DictReader
We can modify our code to use a part of the csv library called a **DictReader** to treat our CSV file with even more flexibility:

```python
# students.py
import csv

students = []

with open("students.csv") as file:
    reader = csv.DictReader(file)
    for row in reader:
        students.append({"name": row["name"], "home": row["home"]})

for student in sorted(students, key=lambda student: student["name"]):
    print(f"{student['name']} is from {student['home']}")
```

```text
Draco is from Malfoy Manor
Harry is from Number Four, Privet Drive
Ron is from The Burrow
```


- Notice that we have replaced **reader** with **DictReader**, which returns one *dictionary* at a time, using the values on the header line as keys.
- Also, notice that **our code accesses each row dictionary by column name**, getting the name and home of each student. This is an example of coding defensively: as long as the person designing the CSV file has put the correct header information on the first line, our program can access that information, even if the columns appear in a different order.
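To see why this is defensive, here is a sketch in which the columns of the CSV data appear in the opposite order (home first). io.StringIO again stands in for the real file. Because DictReader looks values up by the header names, the same row["name"] and row["home"] lookups still work.

```python
import csv
import io

# The header line says which column is which, so the column order doesn't matter
text = 'home,name\n"Number Four, Privet Drive",Harry\nThe Burrow,Ron\n'

sentences = []
for row in csv.DictReader(io.StringIO(text)):
    sentences.append(f"{row['name']} is from {row['home']}")

print(sentences)
# ['Harry is from Number Four, Privet Drive', 'Ron is from The Burrow']
```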


### 6.11. csv.DictWriter
Up until this point, we have been reading CSV files. **What if we want to write to a CSV file?**
To begin, let’s clean up our files a bit. First, delete the students.csv file by typing rm students.csv in the terminal window. This command will only work if you’re in the same folder as your students.csv file.

Then, in students.py, modify your code as follows:

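```python
# students.py
import csv

name = input("What's your name? ")
home = input("Where's your home? ")

# The csv documentation recommends newline="" for files used with the csv module
with open("students.csv", "a", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["name", "home"])
    writer.writerow({"name": name, "home": home})  # write one row from a dictionary
```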

- Notice how we are leveraging the built-in functionality of **DictWriter**, which takes two parameters: the file being written to and the fieldnames to write.
- Further, notice how the writerow method takes a dictionary as its parameter. Quite literally, we are telling Python to write a row with two fields called name and home. Note that writerow writes only the data row, not a name,home header line; DictWriter has a separate writeheader method for that.
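Because we deleted students.csv, the file we now create has no name,home header line, which a later DictReader would need. One way to handle this is sketched below: DictWriter's writeheader method writes the header, and a hypothetical add_student helper calls it only when the file is new or empty. The file lives in a temporary folder, and the names are sample data.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "students.csv")


def add_student(path, name, home):
    """Hypothetical helper: append one student, writing the header first if needed."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["name", "home"])
        if needs_header:
            writer.writeheader()  # writes the line name,home
        writer.writerow({"name": name, "home": home})


add_student(path, "Harry", "Number Four, Privet Drive")
add_student(path, "Ron", "The Burrow")

# Read the file back with DictReader to check the result
with open(path, newline="") as file:
    students = list(csv.DictReader(file))

for student in students:
    print(f"{student['name']} is from {student['home']}")
```

DictWriter quotes "Number Four, Privet Drive" automatically because it contains a comma, so the file can be read back safely.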

Note that there are many types of files that you can read from and write to.
You can learn more in Python’s documentation of CSV.

### 6.12. Binary Files and PIL

- One more type of file that we will discuss today is a **binary file**. A binary file is simply a collection of ones and zeros. This type of file can store anything, including music and image data.
- There is a popular **Python library called PIL** that works well with image files. Today it is maintained as Pillow, which is installed with pip install pillow and still imported as PIL.
- Animated GIFs are a popular type of image file that contains several images, or frames, played in sequence over and over again, creating a simple animation or video effect.
- Imagine that we have a series of costumes, such as costume1.gif and another one called costume2.gif, in which the leg positions are slightly different.
- Before proceeding, please make sure that you have downloaded the source code files from the course website. It will not be possible for you to code the following without having costume1.gif and costume2.gif stored in your IDE.

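```python
# costumes.py
import sys

from PIL import Image

images = []

# Open every image named on the command line (sys.argv[0] is the script's own name)
for arg in sys.argv[1:]:
    image = Image.open(arg)
    images.append(image)

# Save the first image as an animated GIF with the second appended as another frame:
# duration is in milliseconds per frame, and loop=0 repeats the animation forever
images[0].save(
    "costumes.gif", save_all=True, append_images=[images[1]], duration=200, loop=0
)
```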


- Notice that we import the Image functionality from PIL.
- Notice that the first for loop simply loops through the images provided as command-line arguments and stores them in the list called images. The [1:] slice starts at argv's second element, skipping the name of the program itself. The last lines of code save the first image and append the second image to it as well, creating an animated GIF. Type python costumes.py costume1.gif costume2.gif into the terminal to run the program; run it as a script rather than inside a notebook, where sys.argv holds the kernel's own arguments instead of image file names. Then, type code costumes.gif into the terminal window, and you can see the animated GIF.
You can learn more in Pillow’s documentation of PIL.
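As a small standard-library illustration of binary files (this sketch doesn't use PIL), the code below writes three raw bytes with mode "wb" and reads them back with "rb", which returns a bytes object rather than a str. The three sample bytes happen to be the ASCII codes for "GIF", the letters every GIF file starts with; the file itself is a hypothetical one in a temporary folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.bin")

# "wb" writes raw bytes rather than text
with open(path, "wb") as file:
    file.write(bytes([71, 73, 70]))

# "rb" reads them back as a bytes object
with open(path, "rb") as file:
    data = file.read()

print(data)  # b'GIF'
for byte in data:
    print(byte, format(byte, "08b"))  # each byte as eight ones and zeros
# 71 01000111
# 73 01001001
# 70 01000110
```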