<a id='Top'></a>
## 10. Files
<div class="alert alert-block alert-danger" style="margin-top: 10px">
<font color=black>
    
- 10.1. [Introduction: Working with Data Files](#10.1) 
  - 10.1.1. [Learning Goals](#10.1.1)
  - 10.1.2. [Objectives](#10.1.2)
- 10.2. [Reading a File](#10.2)
- 10.3. [Alternative File Reading Methods](#10.3)
- 10.4. [Iterating over lines in a file](#10.4)
- 10.5. [Finding a File in your Filesystem](#10.5)
- 10.6. [Using with for Files](#10.6)
- 10.7. [Recipe for Reading and Processing a File](#10.7)
- 10.8. [Writing Text Files](#10.8)
- 10.9. [CSV Format](#10.9)
- 10.10. [Reading in data from a CSV File](#10.10)
- 10.11. [Writing data to a CSV File](#10.11)
- 10.12. 👩‍💻 [Tips on Handling Files](#10.12)
- 10.13. [Glossary](#10.13)
- 10.14. [Exercises](#10.14)
- 10.15. [Chapter Assessment](#10.15)</div>  

<a id='10.1'></a>
## 10.1. Introduction: Working with Data Files

In [1]:
# Run this cell to see the video

from IPython.display import Video
Video("_videos/AC101 Files.mp4")  

So far, all of the data we have used in this book has either been coded right into the program or been entered by the user. In real life, data resides in files. For example, the images we worked with in the image processing unit ultimately live in files on your hard drive. Web pages, word processing documents, and music are other examples of data that live in files. In this short chapter we will introduce the Python concepts necessary to use data from files in our programs.

For our purposes, we will assume that our data files are text files–that is, files filled with characters. The Python programs that you write are stored as text files. We can create these files in any of a number of ways. For example, we could use a text editor to type in and save the data. We could also download the data from a website and then save it in a file. Regardless of how the file is created, Python will allow us to manipulate the contents.

In Python, we must open files before we can use them and close them when we are done with them. As you might expect, once a file is opened it becomes a Python object just like all other data. Table 1 shows the functions and methods that can be used to open and close files.

|Method Name|Use|Explanation|
|:--------:|:----------:|:-------------------------------------------------------:|
|<font color=red>open</font>|<font color=red>open(filename,'r')</font>|Open a file called filename and use it for reading. This will return a reference to a file object.|
|<font color=red>open</font>|<font color=red>open(filename,'w')</font>|Open a file called filename and use it for writing. This will also return a reference to a file object.|
|<font color=red>close</font>|<font color=red>filevariable.close()</font>|File use is complete.|

<a id='10.1.1'></a>
### 10.1.1. Learning Goals
[Back to top](#Top)
    
- To understand the structure of file systems  
- To understand opening files with different modes  
- To introduce files as another kind of sequence that one can iterate over  
- To introduce the read/transform/write pattern  
- To introduce parallel assignment to two or three variables  

<a id='10.1.2'></a>
### 10.1.2. Objectives
[Back to top](#Top)
    
- Demonstrate that you can read a single value from each line in a file  
- Convert the line to the appropriate value  
- Read a line and convert it into multiple values using split and assignment to multiple variables
    
<a id='10.2'></a>
## 10.2. Reading a File
[Back to top](#Top)
    
As an example, suppose we have a text file called <font color=red>olympics.txt</font> that contains data about Olympians across different years. The contents of the file are shown at the end of this section.

To open this file, we would call the <font color=red>open</font> function. The variable, <font color=red>fileref</font>, now holds a reference to the file object returned by <font color=red>open</font>. When we are finished with the file, we can close it by using the <font color=red>close</font> method. After the file is closed any further attempts to use <font color=red>fileref</font> will result in an error.

In [None]:
fileref = open("olympics.txt", "r")
## other code here that refers to variable fileref
fileref.close()
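
The claim that a closed file can no longer be used can be checked directly. The sketch below is only an illustration: it first writes a small stand-in file (the name `demo_olympics.txt` and its two lines are made up for this example) into a temporary folder, and it uses `try`/`except`, which this chapter does not cover, to capture the error instead of stopping the program.

```python
import os
import tempfile

# Create a small stand-in file in a temporary folder (hypothetical name and contents).
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo_olympics.txt")
setup = open(path, "w")
setup.write("Name,Sex,Age\nA Dijiang,M,24\n")
setup.close()

fileref = open(path, "r")
first_line = fileref.readline()   # works: the file is still open
fileref.close()

try:
    fileref.readline()            # fails: the file has been closed
    error_message = None
except ValueError as err:
    error_message = str(err)

print(repr(first_line))
print(error_message)
```

Reading from the closed file raises a `ValueError`, so `error_message` ends up holding Python's explanation rather than `None`.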

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <font color=black><b>Note</b><br>
A common mistake is to get confused about whether you are providing a variable name or a string literal as an input to the open function. In the code above, “olympics.txt” is a string literal that should correspond to the name of a file on your computer. If you put something without quotes, like <font color=red>open(x, "r")</font>, it will be treated as a variable name. In this example, x should be a variable that’s already been bound to a string value like “olympics.txt”.</div>

Data file: <font color=red>_olympics.txt_</font>

In [None]:
Name,Sex,Age,Team,Event,Medal
A Dijiang,M,24,China,Basketball,NA
A Lamusi,M,23,China,Judo,NA
Gunnar Nielsen Aaby,M,24,Denmark,Football,NA
Edgar Lindenau Aabye,M,34,Denmark/Sweden,Tug-Of-War,Gold
Christine Jacoba Aaftink,F,21,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,21,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,25,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,25,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,27,Netherlands,Speed Skating,NA
Christine Jacoba Aaftink,F,27,Netherlands,Speed Skating,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,31,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
Per Knut Aaland,M,33,United States,Cross Country Skiing,NA
John Aalberg,M,31,United States,Cross Country Skiing,NA
John Aalberg,M,31,United States,Cross Country Skiing,NA
John Aalberg,M,31,United States,Cross Country Skiing,NA
John Aalberg,M,31,United States,Cross Country Skiing,NA
John Aalberg,M,33,United States,Cross Country Skiing,NA
John Aalberg,M,33,United States,Cross Country Skiing,NA
John Aalberg,M,33,United States,Cross Country Skiing,NA
John Aalberg,M,33,United States,Cross Country Skiing,NA
"Cornelia ""Cor"" Aalten (-Strannood)",F,18,Netherlands,Athletics,NA
"Cornelia ""Cor"" Aalten (-Strannood)",F,18,Netherlands,Athletics,NA
Antti Sami Aalto,M,26,Finland,Ice Hockey,NA
"Einar Ferdinand ""Einari"" Aalto",M,26,Finland,Swimming,NA
Jorma Ilmari Aalto,M,22,Finland,Cross Country Skiing,NA
Jyri Tapani Aalto,M,31,Finland,Badminton,NA
Minna Maarit Aalto,F,30,Finland,Sailing,NA
Minna Maarit Aalto,F,34,Finland,Sailing,NA
Pirjo Hannele Aalto (Mattila-),F,32,Finland,Biathlon,NA
Arvo Ossian Aaltonen,M,22,Finland,Swimming,NA
Arvo Ossian Aaltonen,M,22,Finland,Swimming,NA
Arvo Ossian Aaltonen,M,30,Finland,Swimming,Bronze
Arvo Ossian Aaltonen,M,30,Finland,Swimming,Bronze
Arvo Ossian Aaltonen,M,34,Finland,Swimming,NA
Juhamatti Tapio Aaltonen,M,28,Finland,Ice Hockey,Bronze
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,Bronze
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,Gold
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,Gold
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,28,Finland,Gymnastics,Gold
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,Bronze
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Paavo Johannes Aaltonen,M,32,Finland,Gymnastics,NA
Timo Antero Aaltonen,M,31,Finland,Athletics,NA
Win Valdemar Aaltonen,M,54,Finland,Art Competitions,NA

<a id='10.3'></a>
## 10.3. Alternative File Reading Methods
[Back to top](#Top)

Once you have a file “object”, the thing returned by the open function, Python provides three methods to read data from that object. The <font color=red>read()</font> method returns the entire contents of the file as a single string (or just some characters if you provide a number as an input parameter). The <font color=red>readlines</font> method returns the contents of the entire file as a list of strings, where each item in the list is one line of the file. The <font color=red>readline</font> method reads one line from the file and returns it as a string. The strings returned by <font color=red>readlines</font> or <font color=red>readline</font> will contain the newline character at the end. Table 2 summarizes these methods.

|Method Name|Use|Explanation|
|:-:|:-:|:-:|
|<font color=red>write</font>|<font color=red>filevar.write(astring)</font>|Add a string to the end of the file. <font color=red>filevar</font> must refer to a file that has been opened for writing.|
|<font color=red>read(n)</font>|<font color=red>filevar.read()</font>|Read and return a string of <font color=red>n</font> characters, or the entire file as a single string if <font color=red>n</font> is not provided.|
|<font color=red>readline(n)</font>|<font color=red>filevar.readline()</font>|Read and return the next line of the file with all text up to and including the newline character. If <font color=red>n</font> is provided as a parameter, then only <font color=red>n</font> characters will be returned if the line is longer than <font color=red>n</font>. __Note:__ The parameter <font color=red>n</font> is not supported in the browser version of Python and is rarely used in practice, so you can safely ignore it.|
|<font color=red>readlines(n)</font>|<font color=red>filevar.readlines()</font>|Returns a list of strings, each representing a single line of the file. If <font color=red>n</font> is not provided then all lines of the file are returned. If <font color=red>n</font> is provided then <font color=red>n</font> characters are read but <font color=red>n</font> is rounded up so that an entire line is returned. __Note:__ Like <font color=red>readline</font>, <font color=red>readlines</font> ignores the parameter <font color=red>n</font> in the browser.|

In this course, we will generally either iterate through the lines returned by <font color=red>readlines()</font> with a for loop, or use <font color=red>read()</font> to get all of the contents as a single string.
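
The three reading methods can be compared side by side on a tiny file. This is a minimal sketch, assuming a made-up three-line file called `sample.txt` that the code creates in a temporary folder; the file is reopened before each experiment so that every read starts from the beginning.

```python
import os
import tempfile

# Build a three-line sample file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
setup = open(path, "w")
setup.write("first line\nsecond line\nthird line\n")
setup.close()

fileref = open(path, "r")
whole = fileref.read()          # the entire file as one string
fileref.close()

fileref = open(path, "r")
one = fileref.readline()        # just the first line, newline included
rest = fileref.readlines()      # the remaining lines, as a list of strings
fileref.close()

fileref = open(path, "r")
start = fileref.read(5)         # only the first 5 characters
fileref.close()

print(len(whole))
print(repr(one))
print(rest)
print(repr(start))
```

Notice that `readlines()` called after `readline()` returns only the lines that have not been read yet.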

Other programming languages that lack a convenient for loop for going through the lines of a file one by one use a different pattern, which requires a different kind of loop, the <font color=red>while</font> loop. Fortunately, you don’t need to learn this other pattern, and we will put off consideration of <font color=red>while</font> loops until later in this course. We don’t need them for handling data from files.

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <font color=black><b>Note</b><br>
A common error that novice programmers make is not realizing that all these ways of reading the file contents use up the file. After you call readlines(), if you call it again you’ll get an empty list.</div>
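
The "used up" behavior is easy to see in a short experiment. The sketch below assumes a made-up two-line file, created in a temporary folder just for this demonstration.

```python
import os
import tempfile

# Create a two-line file to experiment with (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "two_lines.txt")
setup = open(path, "w")
setup.write("a\nb\n")
setup.close()

fileref = open(path, "r")
first_call = fileref.readlines()    # reads everything
second_call = fileref.readlines()   # nothing is left to read
fileref.close()

# Opening the file again starts from the beginning.
fileref = open(path, "r")
after_reopen = fileref.readlines()
fileref.close()

print(first_call)
print(second_call)
print(after_reopen)
```

If you need the contents more than once, the simplest approach is to save the result of the first read in a variable and reuse that variable.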
        
#### Check your understanding
<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
1. Using the file <font color=red>school_prompt2.txt</font>, find the number of characters in the file and assign that value to the variable <font color=red>num_char</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
file_objt = open("school_prompt2.txt", "r")
num_char = len(file_objt.read())
file_objt.close()
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
2. Find the number of lines in the file, <font color=red>travel_plans2.txt</font>, and assign it to the variable <font color=red>num_lines</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
file = open('travel_plans2.txt')
num_lines = len(file.readlines())
file.close()
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
3. Create a string called <font color=red>first_forty</font> that is comprised of the first 40 characters of <font color=red>emotion_words2.txt</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open('emotion_words2.txt') as textFile:
    first_forty = textFile.read(40)
```

</details>

<a id='10.4'></a>
## 10.4. Iterating over lines in a file
[Back to top](#Top)

We will now use this file as input in a program that will do some data processing. In the program, we will examine each line of the file and print it with some additional text. Because <font color=red>readlines()</font> returns a list of lines of text, we can use the for loop to iterate through each line of the file.

A __line__ of a file is defined to be a sequence of characters up to and including a special character called the __newline__ character. If you evaluate a string that contains a newline character you will see the character represented as <font color=red>\n</font>. If you print a string that contains a newline you will not see the <font color=red>\n</font>; you will just see its effect (a line break).
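
The difference between evaluating and printing a string with a newline can be seen with one line copied from the olympics data. In this small sketch, `repr` is used to show the string the way Python would display it when evaluated.

```python
line = "A Lamusi,M,23,China,Judo,NA\n"

print(repr(line))   # shows the \n escape explicitly
print(line)         # shows the text, followed by the line break itself
print(len(line))    # the newline counts as one character
```

The second `print` produces a blank line in the output, because the newline stored in the string is followed by the newline that `print` adds on its own.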

As the for loop iterates through each line of the file the loop variable will contain the current line of the file as a string of characters. The general pattern for processing each line of a text file is as follows:

In [None]:
for line in myFile.readlines():
    statement1
    statement2
    ...

To process all of our olympics data, we will use a for loop to iterate over the lines of the file. Using the <font color=red>split</font> method, we can break each line into a list containing all the fields of interest about the athlete. We can then take the values corresponding to name, team and event to construct a simple sentence.

In [None]:
olympicsfile = open("olympics.txt", "r")

for aline in olympicsfile.readlines():
    values = aline.split(",")
    print(values[0], "is from", values[3], "and is on the roster for", values[4])

olympicsfile.close()

To make the code a little simpler, and to allow for more efficient processing, Python provides a built-in way to iterate through the contents of a file one line at a time, without first reading them all into a list. Some students find this confusing initially, so we don’t recommend doing it this way until you get a little more comfortable with Python. But this idiom is preferred by Python programmers, so you should be prepared to read it. And when you start dealing with big files, you may notice the efficiency gains of using it.

In [None]:
olympicsfile = open("olympics.txt", "r")

for aline in olympicsfile:
    values = aline.split(",")
    print(values[0], "is from", values[3], "and is on the roster for", values[4])

olympicsfile.close()

#### Check your understanding
<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
1. Write code to find out how many lines are in the file <font color=red>emotion_words.txt</font>. Save this value to the variable <font color=red>num_lines</font>. Do not use the <font color=red>len</font> function.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open('emotion_words.txt') as textFile:
    num_lines = 0
    for _ in textFile.readlines():
        num_lines += 1
```

</details>

<a id='10.5'></a>
## 10.5. Finding a File in your Filesystem
[Back to top](#Top)

In the examples we have provided, and in the simulated file system that we’ve built for this online textbook, all files sit in a single directory, and it’s the same directory where the Python program is stored. Thus, we can just write <font color=red>open('myfile.txt', 'r')</font>.

If you have installed Python on your local computer and you are trying to get file reading and writing operations to work, there’s a little more that you may need to understand. Computer operating systems (like Windows and the Mac OS) organize files into a hierarchy of folders, with some folders containing other folders.

![ExampleFileHierarchy.png](attachment:ExampleFileHierarchy.png)

If your file and your Python program are in the same directory you can simply use the filename. For example, with the file hierarchy in the diagram, the file myPythonProgram.py could contain the code <font color=red>open('data1.txt', 'r')</font>.

If your file and your Python program are in different directories, however, then you need to specify a path. You can think of the filename as the short name for a file, and the path as the full name. Typically, you will specify a relative file path, which says where to find the file to open, relative to the directory where the code is running from. For example, the file myPythonProgram.py could contain the code <font color=red>open('../myData/data2.txt', 'r')</font>. The <font color=red>../</font> means to go up one level in the directory structure, to the containing folder (allProjects); <font color=red>myData/</font> says to descend into the myData subfolder.

There is also an option to use an absolute file path. For example, suppose the file structure in the figure is stored on a computer in the user’s home directory, /Users/joebob01/myFiles. Then code in any Python program running from any file folder could open data2.txt via <font color=red>open('/Users/joebob01/myFiles/allProjects/myData/data2.txt', 'r')</font>. You can recognize an absolute file path because it begins with a /. If you will ever move your programs and data to another computer (e.g., to share them with someone else), it will be much more convenient if you use relative file paths rather than absolute ones. That way, if you preserve the folder structure when moving everything, you won’t need to change your code. If you use absolute paths, then the person you are sharing with will probably not have the same home directory name, /Users/joebob01/. Note that Python pathnames follow the UNIX conventions (Mac OS is a UNIX variant), rather than the Windows file pathnames that use : and \. The Python interpreter will translate to Windows pathnames when running on a Windows machine; you should be able to share your Python program between a Windows machine and a Mac without having to rewrite the file open commands.
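
The relationship between relative and absolute paths can be sketched without touching any real files. The example below uses `posixpath`, the UNIX-style flavor of Python's `os.path` module, so that it behaves the same on every operating system. The folder name `myProject` is an assumption made for illustration; the folder that holds myPythonProgram.py in the figure may be named differently.

```python
import posixpath

# Hypothetical folder containing the program; the name myProject is assumed.
program_folder = "/Users/joebob01/myFiles/allProjects/myProject"
relative = "../myData/data2.txt"

print(posixpath.isabs(relative))         # False: it does not start with /
print(posixpath.isabs(program_folder))   # True: it starts with /

# Combine the two and simplify the ../ step to get the absolute path.
absolute = posixpath.normpath(posixpath.join(program_folder, relative))
print(absolute)
```

The combined path goes up from the assumed program folder to allProjects and then down into myData, which matches the absolute path given in the text.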

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <font color=black><b>Note</b><br>
For security reasons, our code running in your browser doesn’t read or write files to your computer’s file system. Later, when you run Python natively on your own computer, you will be able to truly read files, using path names as suggested above. To get you started, we have faked it by providing a few files that you can read as if they were on your hard disk. In this chapter, we simulate the existence of one textfile; you can’t open any other files from your local computer from textbook code running in your browser.</div>

#### Check your understanding
<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
1. Say you are in a directory called Project. In it, you have a file with your Python code. You would like to read in data from a file called “YearlyProjections.csv” which is in a folder called CompanyData, which is inside of Project. What is the best way to open the file in your Python program?
    
  A. open("YearlyProjections.csv", "r")  
  B. open("../CompanyData/YearlyProjections.csv", "r")  
  C. open("CompanyData/YearlyProjections.csv", "r")  
  D. open("Project/CompanyData/YearlyProjections.csv", "r")  
  E. open("../YearlyProjections.csv", "r")

<details><summary>Click here for the solution</summary>

<font color=red>► </font>C. open("CompanyData/YearlyProjections.csv", "r")  

<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>✔️ Yes, this is how you can access the file!

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
2. Which of the following paths are relative file paths?

  A. "Stacy/Applications/README.txt"  
  B. "/Users/Raquel/Documents/graduation_plans.doc"  
  C. "/private/tmp/swtag.txt"  
  D. "ScienceData/ProjectFive/experiment_data.csv"

<details><summary>Click here for the solution</summary>

<font color=red>► </font>A. "Stacy/Applications/README.txt"    
<font color=red>► </font>D. "ScienceData/ProjectFive/experiment_data.csv"  
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>✔️ Correct.<br>
A. Yes, this is a relative file path. You can tell by the lack of "/" at the beginning of the path.<br>
D. Yes, this is a relative file path. You can tell by the lack of "/" at the beginning of the path.

</details>

<a id='10.6'></a>
## 10.6. Using <font color=red>with</font> for Files
[Back to top](#Top)

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <font color=black><b>Note</b><br>
This section is a bit of an advanced topic and can be easily skipped. But with statements are becoming very common and it doesn’t hurt to know about them in case you run into one in the wild.</div>

Now that you have seen and practiced a bit with opening and closing files, there is another mechanism that Python provides for us that cleans up the often forgotten close. Forgetting to close a file does not necessarily cause a runtime error in the kinds of programs you typically write in an introductory programming course. However, if you are writing a program that may run for days or weeks at a time and does a lot of file reading and writing, you may run into trouble.

Python has the notion of a context manager that automates the process of doing common operations at the start of some task, as well as automating certain operations at the end of some task. For reading and writing a file, the normal operation is to open the file and assign it to a variable. At the end of working with a file the common operation is to make sure that file is closed.

The Python with statement makes using context managers easy. The general form of a with statement is:

In [None]:
with <create some object that understands context> as <some name>:
    do some stuff with the object
    ...

When the program exits the with block, the context manager handles the common stuff that normally happens at the end, in our case closing a file. A simple example will clear up all of this abstract discussion of contexts. Here are the contents of a file called “mydata.txt”.

Data file: <font color=red>_mydata.txt_</font>

1 2 3  
4 5 6

In [None]:
with open('mydata.txt', 'r') as md:
    for line in md:
        print(line)
# continue on with other code

The first line of the with statement opens the file and assigns it to the variable <font color=red>md</font>. Then we can iterate over the file in any of the usual ways. When we are done we simply stop indenting and let Python take care of closing the file and cleaning up.

This is equivalent to code that specifically closes the file at the end, but neatly marks the set of code that can make use of the open file as an indented block, and ensures that the programmer won’t forget to include the .close() invocation.

In [None]:
md = open('mydata.txt', 'r')
for line in md:
    print(line)
md.close()
# continue with other code
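
One way to confirm that the with statement really closes the file is to look at the file object's `closed` attribute inside and outside the block. This is a minimal sketch that first creates its own copy of mydata.txt in a temporary folder, so it does not depend on any file already being present.

```python
import os
import tempfile

# Create a copy of mydata.txt in a temporary folder for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "mydata.txt")
with open(path, "w") as setup:
    setup.write("1 2 3\n4 5 6\n")

with open(path, "r") as md:
    inside = md.closed      # False: the file is still open inside the block
    lines = md.readlines()
outside = md.closed         # True: closed automatically on leaving the block

print(inside, outside)
print(lines)
```

The variable `md` still exists after the block ends, but the file it refers to is closed, so any data you need later should be saved in another variable (here, `lines`) before leaving the block.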

<a id='10.7'></a>
## 10.7. Recipe for Reading and Processing a File
[Back to top](#Top)

Here’s a foolproof recipe for processing the contents of a text file. If you’ve fully digested the previous sections, you’ll understand that there are other options as well. Some of those options are preferable for some situations, and some are preferred by Python programmers for efficiency reasons. In this course, though, you can always succeed by following this recipe.

#1. Open the file using <font color=red>with</font> and <font color=red>open</font>.

#2. Use <font color=red>.readlines()</font> to get a list of the lines of text in the file.

#3. Use a for loop to iterate through the strings in the list, each being one line from the file. On each iteration, process that line of text.

#4. When you are done extracting data from the file, continue writing your code outside of the indentation. Using <font color=red>with</font> will automatically close the file once the program exits the with block.

In [None]:
fname = "yourfile.txt"
with open(fname, 'r') as fileref:         # step 1
    lines = fileref.readlines()           # step 2
    for lin in lines:                     # step 3
        pass  # replace with code that references the variable lin
#some other code not relying on fileref   # step 4

However, this recipe is a poor fit for large data files. Imagine a data file with millions of rows: <font color=red>.readlines()</font> has to read every line into a list in memory before the loop can even start, which costs both time and memory. This is a case where programmers prefer another option for efficiency reasons.

This option iterates over the file object itself, which hands the loop one line at a time:

In [None]:
fname = "yourfile.txt"
with open(fname, 'r') as fileref:         # step 1
    for lin in fileref:                   # step 2
        pass  # replace with code that references the variable lin
#some other code not relying on fileref   # step 3
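
As a worked instance of the recipe, the sketch below processes the two-line mydata.txt file from the previous section. It creates its own copy of the file in a temporary folder, splits each line into three values using assignment to three variables, and converts each value from a string to an integer. The variable names `a`, `b`, `c`, and `row_totals` are made up for this example.

```python
import os
import tempfile

# Create a copy of mydata.txt in a temporary folder for this demonstration.
fname = os.path.join(tempfile.mkdtemp(), "mydata.txt")
with open(fname, "w") as setup:
    setup.write("1 2 3\n4 5 6\n")

row_totals = []
with open(fname, 'r') as fileref:         # step 1
    lines = fileref.readlines()           # step 2
    for lin in lines:                     # step 3
        a, b, c = lin.split()             # three strings, such as "1", "2", "3"
        row_totals.append(int(a) + int(b) + int(c))
print(row_totals)                         # step 4: the file is already closed here
```

Calling `split()` with no argument splits on any whitespace, which also discards the newline at the end of each line.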

<a id='10.8'></a>
## 10.8. Writing Text Files
[Back to top](#Top)

One of the most commonly performed data processing tasks is to read data from a file, manipulate it in some way, and then write the resulting data out to a new data file to be used for other purposes later. To accomplish this, the <font color=red>open</font> function discussed above can also be used to create a new file prepared for writing. Note in <font color=blue>Table 1</font> that the only difference between opening a file for writing and opening a file for reading is the use of the <font color=red>'w'</font> flag instead of the <font color=red>'r'</font> flag as the second parameter. When we open a file for writing, a new, empty file with that name is created and made ready to accept our data. If an existing file has the same name, its contents are overwritten. As before, the function returns a reference to the new file object.

<font color=blue>Table 2</font> shows one additional method on file objects that we have not used thus far. The <font color=red>write</font> method allows us to add data to a text file. Recall that text files contain sequences of characters. We usually think of these character sequences as being the lines of the file where each line ends with the newline <font color=red>\n</font> character. Be very careful to notice that the <font color=red>write</font> method takes one parameter, a string. When invoked, the characters of the string will be added to the end of the file. This means that it is the programmer’s job to include the newline characters as part of the string if desired.

Assume that we have been asked to provide a file consisting of all the squared numbers from 1 to 12.

First, we will need to open the file. Afterwards, we will iterate through the numbers 1 through 12, and square each one of them. This new number will need to be converted to a string, and then it can be written into the file.

The program below solves part of the problem. We first want to make sure that we’ve written the correct code to calculate the square of each number.

In [None]:
for number in range(1, 13):
    square = number * number
    print(square)

When we run this program, we see the lines of output on the screen. Once we are satisfied that it is creating the appropriate output, the next step is to add the necessary pieces to produce an output file and write the data lines to it. To start, we need to open a new output file by calling the <font color=red>open</font> function, <font color=red>outfile = open("squared_numbers.txt",'w')</font>, using the <font color=red>'w'</font> flag. We can choose any file name we like. If the file does not exist, it will be created. However, if the file does exist, it will be reinitialized as empty and you will lose any previous contents.

Once the file has been created, we just need to call the <font color=red>write</font> method passing the string that we wish to add to the file. In this case, the string is already being printed so we will just change the <font color=red>print</font> into a call to the <font color=red>write</font> method. However, there is an additional step to take, since the write method can only accept a string as input. We’ll need to convert the number to a string. Then, we just need to add one extra character to the string. The newline character needs to be concatenated to the end of the line. The entire line now becomes <font color=red>outfile.write(str(square) + '\n')</font>. The print function automatically outputs a newline character after whatever text it outputs, but the write method does not do that automatically. We also need to close the file when we are done.

The complete program is shown below.

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <font color=black><b>Note</b><br>
As with file reading, for security reasons the Runestone interactive textbook environment does not write files to the file system on your local computer. In an activecode window, we simulate writing to a file. The contents of the written file are shown and you can do a subsequent read of the contents of that filename. If you try to overwrite a file that’s built in to the page, it may not let you; don’t try to get too fancy with our file system simulator!</div>
        
Below, we have printed the first 10 characters to the output window.

In [None]:
filename = "squared_numbers.txt"
outfile = open(filename, "w")

for number in range(1, 13):
    square = number * number
    outfile.write(str(square) + "\n")

outfile.close()

infile = open(filename, "r")
print(infile.read()[:10])
infile.close()
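
The same program can also be written with the with statement from section 10.6, so that neither file needs an explicit close. This is a sketch of that variation, not a replacement for the version above; it writes into a temporary folder so that it does not overwrite any existing squared_numbers.txt.

```python
import os
import tempfile

# Write into a temporary folder rather than the current directory.
filename = os.path.join(tempfile.mkdtemp(), "squared_numbers.txt")

with open(filename, "w") as outfile:
    for number in range(1, 13):
        square = number * number
        outfile.write(str(square) + "\n")

with open(filename, "r") as infile:
    contents = infile.read()

print(contents[:10])
```

Each with block closes its file on exit, so the data written in the first block is safely on disk before the second block reads it back.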

<a id='10.9'></a>
## 10.9. CSV Format
[Back to top](#Top)

CSV stands for Comma Separated Values. If you print out tabular data in CSV format, it can be easily imported into other programs like Excel, Google Sheets, or a statistics package (R, Stata, SPSS, etc.).

For example, we can make a file with the following contents. If you save it as a file named grades.csv, then you could import it into one of those programs. The first line gives the column names and the later lines each give the data for one row.

In [None]:
Name,score,grade
Jamal,98,A+
Eloise,87,B+
Madeline,99,A+
Wei,94,A

<a id='10.10'></a>
## 10.10. Reading in data from a CSV File
[Back to top](#Top)

We are able to read in CSV files the same way we have with other text files. Because of the standardized structure of the data, there is a common pattern for processing it. To practice this, we will be using data about olympic events.

Typically, CSV files will have a header as the first line, which contains column names. Then, each following row in the file will contain data that corresponds to the appropriate columns.

All of the file-reading approaches that we have mentioned (<font color=red>read</font>, <font color=red>readline</font>, <font color=red>readlines</font>, and simply iterating over the file object itself) will work on CSV files. In our examples, we will iterate over the lines. Because the values on each line are separated with commas, we can use the <font color=red>.split(',')</font> method to parse each line into a list of separate values.

In [None]:
fileconnection = open("olympics.txt", "r")
lines = fileconnection.readlines()
header = lines[0]
field_names = header.strip().split(',')
print(field_names)
for row in lines[1:]:
    vals = row.strip().split(',')
    if vals[5] != "NA":
        print("{}: {}; {}".format(
                vals[0],
                vals[4],
                vals[5]))
fileconnection.close()

In the above code, we open the file, olympics.txt, which contains data on some olympians. The contents are similar to our previous olympics file, but include an extra column with information about medals they won.

We split the first row to get the field names. We split the other rows to get values. Note that we split on commas by passing ',' as the argument. Also note that we first pass the row through the .strip() method to get rid of the trailing \n.

Once we have parsed the lines into their separate values, we can use those values in the program. For example, in the code above, we select only those rows where the olympian won a medal, and we print out only three of the fields, in a different format.

Note that the trick of splitting the text for each row based on the presence of commas only works because commas are not used in any of the field values. Suppose that some of our events were more specific, and used commas. For example, “Swimming, 100M Freestyle”. How will a program processing a .csv file know when a comma is separating columns, and when it is just part of the text string giving a value within a column?
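To see the problem concretely, here is a minimal sketch. The row below is made up for illustration (it is not a line from olympics.txt); its event value contains a comma.

```python
# Hypothetical row: the event value "Swimming, 100M Freestyle" contains a comma.
row = "Jane Doe,F,21,Netherlands,Swimming, 100M Freestyle,NA\n"

vals = row.strip().split(',')

# The header has six fields, but the naive split produces seven values,
# because the event name was cut in two at its internal comma.
print(len(vals))
print(vals[4])
print(vals[5])
```

The medal value is no longer at index 5, so code that relies on fixed column positions would silently read the wrong field.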

The CSV format is actually a little more general than we have described and has a couple of solutions for that problem. One alternative format uses a different column separator, such as | or a tab (\t). When a tab is used, the format is sometimes called TSV, for tab-separated values. If you get a file using a different separator, you can just call <font color=red>.split('|')</font> or <font color=red>.split('\\t')</font> instead.
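As a small sketch of splitting on other separators, the two lines below are adapted from the sample Olympics data, rewritten by hand with a pipe and a tab as the separator. The variable names are only illustrative.

```python
# The same record written with two alternative separators (hand-made examples).
pipe_line = "A Dijiang|M|24|China|Basketball|NA\n"
tab_line = "A Dijiang\tM\t24\tChina\tBasketball\tNA\n"

# rstrip('\n') removes only the newline; a bare strip() would also remove
# tabs at the ends of a line, which matters if the last field is empty.
pipe_vals = pipe_line.rstrip('\n').split('|')
tab_vals = tab_line.rstrip('\n').split('\t')

print(pipe_vals)
print(tab_vals)
```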

The other advanced CSV format uses commas to separate but encloses all values in double quotes.

For example, the data file might look like:

"Name","Sex","Age","Team","Event","Medal"  
"A Dijiang","M","24","China","Basketball","NA"  
"Edgar Lindenau Aabye","M","34","Denmark/Sweden","Tug-Of-War","Gold"  
"Christine Jacoba Aaftink","F","21","Netherlands","Speed Skating, 1500M","NA"

If you are reading a .csv file that has enclosed all values in double quotes, it’s actually a pretty tricky programming problem to split the text for one row into a list of values. You won’t want to try to do it directly. Instead, you should use Python’s built-in csv module. However, there’s a bit of a learning curve for that, and we find that students gain a better understanding of reading CSV format by first learning to read the simple, unquoted format and split lines on commas.
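For readers who want a preview, here is a minimal sketch of reading the quoted sample data with the csv module. To keep it self-contained, the data is held in a string and wrapped in io.StringIO, which is an assumption of this sketch; with a real file you would pass the open file object to csv.reader.

```python
import csv
import io

# The quoted sample data, stored in a string for this sketch.
data = '''"Name","Sex","Age","Team","Event","Medal"
"A Dijiang","M","24","China","Basketball","NA"
"Edgar Lindenau Aabye","M","34","Denmark/Sweden","Tug-Of-War","Gold"
"Christine Jacoba Aaftink","F","21","Netherlands","Speed Skating, 1500M","NA"
'''

# csv.reader yields each row as a list of strings with the quotes removed.
rows = list(csv.reader(io.StringIO(data)))
field_names = rows[0]
print(field_names)

for vals in rows[1:]:
    # The comma inside "Speed Skating, 1500M" stays within a single value.
    print("{}: {}; {}".format(vals[0], vals[4], vals[5]))
```

When reading from a real file, the csv module documentation recommends opening it with `newline=''` so that the module handles line endings itself.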

<a id='10.11'></a>
## 10.11. Writing data to a CSV File
[Back to top](#Top)

The typical pattern for writing data to a CSV file will be to write a header row and loop through the items in a list, outputting one row for each. Here we have a list of tuples, each representing one Olympian, a subset of the rows and columns from the file we have been reading from.

In [None]:
olympians = [("John Aalberg", 31, "Cross Country Skiing"),
             ("Minna Maarit Aalto", 30, "Sailing"),
             ("Win Valdemar Aaltonen", 54, "Art Competitions"),
             ("Wakako Abe", 18, "Cycling")]

outfile = open("reduced_olympics.csv", "w")
# output the header row
outfile.write('Name,Age,Sport')
outfile.write('\n')
# output each of the rows:
for olympian in olympians:
    row_string = '{},{},{}'.format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write('\n')
outfile.close()

There are a few things worth noting in the code above.

First, using .format() makes it really clear what we’re doing when we create the variable row_string. We are making a comma separated set of values; the {} curly braces indicate where to substitute in the actual values. The equivalent string concatenation would be very hard to read. An alternative, also clear way to do it would be with the .join method, which takes a single list of strings, so the age must first be converted with str(): <font color=red>row_string = ','.join([olympian[0], str(olympian[1]), olympian[2]])</font>.

Second, remember that, unlike the print function, the .write() method on a file object does not automatically insert a newline. Instead, we have to explicitly add the character \n at the end of each line.

Third, we have to explicitly refer to each of the elements of olympian when building the string to write. Note that just putting <font color=red>.format(olympian)</font> wouldn’t work because the interpreter would see only one value (a tuple) when it was expecting three values to try to substitute into the string template. Later in the book we will see that Python provides an advanced technique for automatically unpacking the three values from the tuple, with <font color=red>.format(*olympian)</font>.
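As a brief preview, the sketch below builds the same row string three ways: with explicit indexing, with tuple unpacking, and with .join. The variable names explicit, unpacked, and joined are chosen here only for illustration.

```python
olympian = ("John Aalberg", 31, "Cross Country Skiing")

# Explicit indexing, as in the code above.
explicit = '{},{},{}'.format(olympian[0], olympian[1], olympian[2])

# Unpacking: the * spreads the tuple into three separate arguments.
unpacked = '{},{},{}'.format(*olympian)

# join takes one iterable of strings, so each value is converted with str().
joined = ','.join(str(value) for value in olympian)

print(explicit)
print(unpacked)
print(joined)
```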

As described previously, if one or more columns contain text, and that text could contain commas, we need to do something to distinguish a comma in the text from a comma that is separating different values (cells in the table). If we want to enclose each value in double quotes, it can start to get a little tricky, because we will need to have the double quote character inside the string output. But it is doable. Indeed, one reason Python allows strings to be delimited with either single quotes or double quotes is so that one can be used to delimit the string and the other can be a character in the string. If you get to the point where you need to quote all of the values, we recommend learning to use Python’s csv module.

In [None]:
olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Minna Maarit Aalto", 30, "Sailing"),
             ("Win Valdemar Aaltonen", 54, "Art Competitions"),
             ("Wakako Abe", 18, "Cycling")]

outfile = open("reduced_olympics2.csv", "w")
# output the header row
outfile.write('"Name","Age","Sport"')
outfile.write('\n')
# output each of the rows:
for olympian in olympians:
    row_string = '"{}","{}","{}"'.format(olympian[0], olympian[1], olympian[2])
    outfile.write(row_string)
    outfile.write('\n')
outfile.close()
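For comparison, here is a minimal sketch of the same task using the csv module recommended above. The output file name and the use of the system temporary directory are assumptions made so the sketch can run anywhere; csv.QUOTE_ALL asks the writer to wrap every value in double quotes.

```python
import csv
import os
import tempfile

olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Minna Maarit Aalto", 30, "Sailing"),
             ("Win Valdemar Aaltonen", 54, "Art Competitions"),
             ("Wakako Abe", 18, "Cycling")]

# Hypothetical output location, chosen only for this sketch.
path = os.path.join(tempfile.gettempdir(), "reduced_olympics3.csv")

# newline='' lets the csv module control line endings itself.
with open(path, "w", newline='') as outfile:
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    writer.writerow(["Name", "Age", "Sport"])   # header row
    for olympian in olympians:
        writer.writerow(olympian)               # one row per tuple

# Read the file back to check that the comma in the sport survived.
with open(path, "r", newline='') as infile:
    rows = list(csv.reader(infile))
print(rows[1])
```

Note that values read back from a CSV file are always strings, so the age comes back as '31' rather than 31.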

<a id='10.12'></a>
## 10.12. 👩‍💻 Tips on Handling Files
[Back to top](#Top)

When working with files, there are a few things to keep in mind. When naming files, it’s best to not include spaces. While most operating systems can handle files with spaces in their names, not all can.

Additionally, suffixes in file names, for example the .txt in <font color=red>FileNameExample.txt</font>, are not magic. Instead, these suffixes are a convention. For some operating systems the suffixes have no special significance, and only have meaning when used in a program. Other operating systems infer information from the suffixes - for example, <font color=red>.EXE</font> is a suffix that means a file is executable.

It’s a good idea to follow the conventions. If a file contains CSV formatted data, name it with the extension <font color=red>.csv</font>, not <font color=red>.txt</font>. A Python program will be able to read it either way, but if you follow the convention you will help other people guess what’s in the file. And you will also help the computer’s operating system to guess what application program it should open when you double-click on the file.

<a id='10.13'></a>
## 10.13. Glossary
[Back to top](#Top)

__open__  
You must open a file before you can read its contents.

__close__  
When you are done with a file, you should close it.

__read__  
Will read the entire contents of a file as a string. This is often used in an assignment statement so that a variable can reference the contents of the file.

__readline__  
Will read a single line from the file, up to and including the first instance of the newline character.

__readlines__  
Will read the entire contents of a file into a list where each line of the file is a string and is an element in the list.

__write__  
Will add characters to the end of a file that has been opened for writing.
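The glossary terms can be seen together in one short sketch. The scratch file name and its location in the system temporary directory are assumptions made for this demonstration only.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.gettempdir(), "glossary_demo.txt")

outfile = open(path, "w")        # open for writing
outfile.write("first line\n")    # write does not add a newline by itself
outfile.write("second line\n")
outfile.close()                  # close when done

infile = open(path, "r")
one_line = infile.readline()     # a single line, newline included
infile.close()

infile = open(path, "r")
all_lines = infile.readlines()   # a list with one string per line
infile.close()

infile = open(path, "r")
contents = infile.read()         # the whole file as one string
infile.close()

print(repr(one_line))
print(all_lines)
print(repr(contents))
```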

<a id='10.14'></a>
## 10.14. Exercises
[Back to top](#Top)

Below are the data files that you have been using so far, and will continue to use for the rest of the chapter.

The file below is <font color=red>_travel_plans.txt_</font>

In [None]:
This summer I will be travelling.
I will go to...
Italy: Rome
Greece: Athens
England: London, Manchester
France: Paris, Nice, Lyon
Spain: Madrid, Barcelona, Granada
Austria: Vienna
I will probably not even want to come back!
However, I wonder how I will get by with all the different languages.
I only know English!

The file below is <font color=red>_school_prompt.txt_</font>

In [None]:
Writing essays for school can be difficult but
many students find that by researching their topic that they
have more to say and are better informed. Here are the university
we require many undergraduate students to take a first year writing requirement
so that they can
have a solid foundation for their writing skills. This comes
in handy for many students.
Different schools have different requirements, but everyone uses
writing at some point in their academic career, be it essays, research papers,
technical write ups, or scripts.

The file below is <font color=red>_emotion_words.txt_</font>

In [None]:
Sad upset blue down melancholy somber bitter troubled
Angry mad enraged irate irritable wrathful outraged infuriated
Happy cheerful content elated joyous delighted lively glad
Confused disoriented puzzled perplexed dazed befuddled
Excited eager thrilled delighted
Scared afraid fearful panicked terrified petrified startled
Nervous anxious jittery jumpy tense uneasy apprehensive

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
1. The following sample file called <font color=red>studentdata.txt</font> contains one line for each student in an imaginary class. The student's name is the first thing on each line, followed by some exam scores. The number of scores might be different for each student.

In [None]:
joe 10 15 20 30 40
bill 23 16 19 22
sue 8 22 17 14 32 17 24 21 2 9 11 17
grace 12 28 21 45 26 10
john 14 32 25 16 89

Using the text file from above, <font color=red>studentdata.txt</font>, write a program that prints out the names of students that have more than six quiz scores.

In [None]:
# Hint: first see if you can write a program that just prints out the number of scores on each line
# Then, make it print the number only if the number is more than six
# Then, switch it to printing the name instead of the number

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open("studentdata.txt", "r") as file_obj:
    for aline in file_obj:
        student = aline.split()
        if len(student) > 7:
            print(student[0])
```

</details>

<details><summary>Click here for the book's answer</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
f = open("studentdata.txt", "r")
for aline in f:
    items = aline.split()
    if len(items[1:]) > 6:
        print(items[0])
f.close()
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
2. Create a list called <font color=red>destination</font> using the data stored in <font color=red>travel_plans.txt</font>. Each element of the list should contain a line from the file that lists a country and cities inside that country. Hint: each line that has this information also has a colon <font color=red>:</font> in it.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
destination = []
with open("travel_plans.txt", "r") as file_obj:
    for aline in file_obj:
        if ":" in aline:
            destination.append(aline)
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
3. Create a list called <font color=red>j_emotions</font> that contains every word in <font color=red>emotion_words.txt</font> that begins with the letter “j”.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
j_emotions = []
with open("emotion_words.txt", "r") as file_obj:
    for aline in file_obj:
        words = aline.split()
        for word in words:
            if word[0].lower() == "j":
                j_emotions.append(word)
```

</details>

<a id='10.15'></a>
## 10.15. Chapter Assessment
[Back to top](#Top)

Data file: <font color=red>_travel_plans.txt_</font>  
Data file: <font color=red>_school_prompt.txt_</font>  
Data file: <font color=red>_emotion_words.txt_</font>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
1. The text file, <font color=red>travel_plans.txt</font>, contains the summer travel plans for someone with some commentary. Find the total number of characters in the file and save it to the variable <font color=red>num</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open("travel_plans.txt", "r") as file:
    num = len(file.read())
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
2. We have provided a file called <font color=red>emotion_words.txt</font> that contains lines of words that describe emotions. Find the total number of words in the file and assign this value to the variable <font color=red>num_words</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
num_words = 0
with open('emotion_words.txt') as emotion_file:
    for line in emotion_file:
        emotion_words = line.split()
        num_words += len(emotion_words)
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
3. Assign to the variable <font color=red>num_lines</font> the number of lines in the file <font color=red>school_prompt.txt</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open('school_prompt.txt') as file:
    num_lines = len(file.readlines())
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
4. Assign the first 30 characters of <font color=red>school_prompt.txt</font> as a string to the variable <font color=red>beginning_chars</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open('school_prompt.txt') as file:
    beginning_chars = file.read()[:30]
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
5. __Challenge:__ Using the file <font color=red>school_prompt.txt</font>, assign the third word of every line to a list called <font color=red>three</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open('school_prompt.txt') as textFile:
    three = []
    for line in textFile.readlines():
        words = line.split()
        three.append(words[2])
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
6. __Challenge:__ Create a list called <font color=red>emotions</font> that contains the first word of every line in <font color=red>emotion_words.txt</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
emotions = []
with open("emotion_words.txt", "r") as file:
    for line in file:
        words = line.split()
        emotions.append(words[0])
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
7. Assign the first 33 characters from the text file <font color=red>travel_plans.txt</font> to the variable <font color=red>first_chars</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
with open("travel_plans.txt", "r") as file:
    first_chars = file.read(33)
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
8. __Challenge:__ Using the file <font color=red>school_prompt.txt</font>, if the character ‘p’ is in a word, then add the word to a list called <font color=red>p_words</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
p_words = []
with open("school_prompt.txt", "r") as words_file:
    for line in words_file:
        words = line.split()
        for word in words:
            if "p" in word:
                p_words.append(word)
```

</details>

<div class="alert alert-block alert-warning" style="margin-top: 20px">
<font color=black>
    
9. Read in the contents of the file <font color=red>SP500.txt</font>, which has monthly data for 2016 and 2017 about the S&P 500 closing prices as well as some other financial indicators, including the “Long Term Interest Rate”, which is the interest rate paid on 10-year U.S. government bonds.

  Write a program that computes the average closing price (the second column, labeled SP500) and the highest long-term interest rate. Both should be computed only for the period from June 2016 through May 2017. Save the results in the variables <font color=red>mean_SP</font> and <font color=red>max_interest</font>.

<details><summary>Click here for a solution</summary>
<div class="alert alert-block alert-success" style="margin-top: 20px">
<font color=black>
    
```python
monthly_data = open("SP500.txt", 'r')
lines = monthly_data.readlines()
monthly_data.close()
entries = 0
max_interest = 0
closing_prices_sum = 0
range_selection = False
for row in lines[1:]:
    vals = row.strip().split(',')
    # start counting at June 2016
    if vals[0] == "6/1/2016":
        range_selection = True
    if range_selection:
        entries += 1
        closing_prices_sum += float(vals[1])
        if float(vals[5]) > max_interest:
            max_interest = float(vals[5])
    # stop counting after May 2017 has been included
    if vals[0] == "5/1/2017":
        range_selection = False
mean_SP = closing_prices_sum / entries
```

</details>

Data file: <font color=red>_SP500.txt_</font> 

In [None]:
Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10
1/1/2016,1918.6,43.55,86.5,236.92,2.09,2023.23,45.93,91.22,24.21
2/1/2016,1904.42,43.72,86.47,237.11,1.78,2006.62,46.06,91.11,24
3/1/2016,2021.95,43.88,86.44,238.13,1.89,2121.32,46.04,90.69,25.37
4/1/2016,2075.54,44.07,86.6,239.26,1.81,2167.27,46.02,90.43,25.92
5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69
6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84
7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69
8/1/2016,2170.95,44.84,88.37,240.85,1.56,2251.95,46.51,91.66,26.95
9/1/2016,2157.69,45.03,89.09,241.43,1.63,2232.83,46.6,92.19,26.73
10/1/2016,2143.02,45.25,90.91,241.73,1.76,2214.89,46.77,93.96,26.53
11/1/2016,2164.99,45.48,92.73,241.35,2.14,2241.08,47.07,95.99,26.85
12/1/2016,2246.63,45.7,94.55,241.43,2.49,2324.83,47.29,97.84,27.87
1/1/2017,2275.12,45.93,96.46,242.84,2.43,2340.67,47.25,99.24,28.06
2/1/2017,2329.91,46.15,98.38,243.6,2.42,2389.52,47.33,100.89,28.66
3/1/2017,2366.82,46.38,100.29,243.8,2.48,2425.4,47.53,102.77,29.09
4/1/2017,2359.31,46.66,101.53,244.52,2.3,2410.56,47.67,103.74,28.9
5/1/2017,2395.35,46.94,102.78,244.73,2.3,2445.29,47.92,104.92,29.31
6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75
7/1/2017,2454.1,47.54,105.04,244.79,2.32,2504.72,48.52,107.21,30
8/1/2017,2456.22,47.85,106.06,245.52,2.21,2499.4,48.69,107.92,29.91
9/1/2017,2492.84,48.17,107.08,246.82,2.2,2523.31,48.76,108.39,30.17
10/1/2017,2557,48.42,108.01,246.66,2.36,2589.89,49.05,109.4,30.92
11/1/2017,2593.61,48.68,108.95,246.67,2.35,2626.9,49.3,110.35,31.3
12/1/2017,2664.34,48.93,109.88,246.52,2.4,2700.13,49.59,111.36,32.09