# Lesson 14: Reading and Writing CSV Files

- **CSV Files**
- **Format**
- **Expectations**
- **Reading a CSV File**
- **Writing a CSV File**

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#00A0B2">
CSV Files</h1>

A comma-separated values (CSV) file is a plain text file used to store tabular data. Each line of the file is one row of the data. A row consists of fields that are typically separated by commas.

For example, consider the following table of numbers:

<table border="border">
    <tr><td>9</td><td>68</td><td>25</td></tr>
    <tr><td>18</td><td>44</td><td>2</td></tr>
    <tr><td>25</td><td>3</td><td>7</td></tr>
</table>

This could be stored in a CSV file with the following contents:

```python
9,68,25
18,44,2
25,3,7
```

Different programs can then read and write this data while storing it internally in some more convenient format. CSV files are often used to share data among database and spreadsheet programs that do not share a common internal format.

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#00A0B2">
Format</h1>

- Unfortunately, **the format of a CSV file is not well specified.** This is because the CSV format had been around for a while before any attempt was made at standardization.
- Problems arise because you might want to have commas in the data. 
- One way to solve some of these problems is to put quotes around any string that should be treated as a single field. The delimiter is then not treated as a separator when it appears inside a quoted string. However, the same problem comes back when you need the quote character itself inside a field.
- Even though they are called comma-separated values, delimiters other than commas (such as a space, semicolon, or tab) can be used to separate fields. There is no universally followed standard.
- CSV files can therefore use just about any character as the field separator (or delimiter) and as the quote character. The file itself does not specify which characters are being used, so the user must either know ahead of time or look at the first few lines of the file and try to figure it out.
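
As a rough illustration of the last point, the sketch below parses a small in-memory sample that uses a semicolon as the delimiter and a single quote as the quote character. The sample text is made up for this example, and it relies on the standard `csv` module, which is introduced later in this lesson.

```python
import csv
import io

# Hypothetical sample: semicolon-delimited fields quoted with single quotes.
text = "name;city\n'Parker; Peter';'New York'\n"

# We have to tell the reader which characters the file uses;
# the file itself does not say.
rows = list(csv.reader(io.StringIO(text), delimiter=';', quotechar="'"))

for row in rows:
    print(row)
```

Because the first field of the second row is quoted, the semicolon inside it is kept as data rather than treated as a separator.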

As CSV files are used to store tabular data, you should expect that each line will have the same number of fields. Sometimes, though, some fields may be empty. You would usually see this as follows:

```python
9,68,25
18,,
25,,7
```
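
A minimal sketch of what empty fields look like once a line is split: each missing value becomes an empty string, so the row still has three fields. Converting empty strings to `None` is just one possible choice made for this example.

```python
# The three lines from the sample above, held in a list for simplicity.
lines = ["9,68,25", "18,,", "25,,7"]

# Splitting on the comma keeps empty fields as empty strings.
table = [line.split(",") for line in lines]
print(table)

# One possible way to handle them: turn empty strings into None.
numbers = [[int(field) if field else None for field in row] for row in table]
print(numbers)
```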

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#00A0B2">
Expectations</h1>

Most well-formed CSV files adhere to the following *rules*:
- Fields are separated by a single delimiter character, which is often a comma (`,`).
- Rows are separated by a newline character.
- Fields are interpreted as plain text.
- Fields can be quoted by a quote character, which is often a double quote (`"`).
- Quoted fields can contain the delimiter character and/or newlines within them.
- Each row contains the same number of fields in the same order.

If a CSV file follows these *rules*, it is easier to use and interpret (as long as you know which characters are being used as the delimiter and quote characters). Note that CSV files are not required to follow these conventions. While most files will, there are some programs that do not adhere to these rules and follow their own conventions.

If a CSV file does not follow these rules, you often need to fix the files manually by editing them yourself to conform to some set of rules that the program you are using can handle.
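
Before fixing a file by hand, it can help to find the rows that break the rules. The sketch below checks only one rule (every row has the same number of fields as the first row) on a made-up in-memory sample; it uses the standard `csv` module, which is covered later in this lesson.

```python
import csv
import io

# Hypothetical sample in which the second row is missing a field.
text = "9,68,25\n18,44\n25,3,7\n"

rows = list(csv.reader(io.StringIO(text)))
expected = len(rows[0])   # assume the first row has the right field count

# Collect (line number, row) pairs for rows with a different field count.
bad_rows = [(number, row)
            for number, row in enumerate(rows, start=1)
            if len(row) != expected]

for number, row in bad_rows:
    print("Row {} has {} fields, expected {}: {}".format(
        number, len(row), expected, row))
```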

Sometimes the first row of a CSV file will be a header row that contains the names of the fields instead of data:
```python
"First name","Last name","Phone"
"Peter","Parker","02-727-3051"
"Tony","Stark","02-727-3081"
```

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#00A0B2">
Reading a CSV File</h1>

**We are going to take a look at how we can parse a CSV file using our knowledge of Python.** In our previous lesson, we learned how to read data from files and how to work with strings, so we have everything we need to convert a CSV file into a Python tabular data structure. Let's take a look at some Python code that we could use to parse three CSV files: `phones.csv`, `phones2.csv`, and `books_read.csv`.

###### Example 1: Reading a CSV file (phones.csv)

In [None]:
%load my_files/phones.csv

In [None]:
table = []
with open('my_files/phones.csv', 'r') as f:
    for line in f:
        line = line.rstrip()
        row = line.split(',')
        table.append(row)

print(line)   # print last line
print('-' * 92)
print(row)    # print last row
print('-' * 92)
print(table)
print('-' * 92)

for row in table:
    print("{:<14} {:<10}".format(row[0], row[1]))

###### Example 2: Reading a CSV file (phones2.csv)

In [None]:
%load my_files/phones2.csv

In [None]:
table = []
with open('my_files/phones2.csv', 'r') as f:
    for line in f:
        line = line.rstrip()
        row = line.split(',')
        row[0] = row[0].strip('"')
        table.append(row)

print(line)   # print last line
print('-' * 92)
print(row)    # print last row
print('-' * 92)
print(table)
print('-' * 92)

for row in table:
    print("{:<14} {:<10}".format(row[0], row[1]))

###### Example 3: Reading a CSV file (books_read.csv) 

In [None]:
%load my_files/books_read.csv

In [None]:
table = []
with open('my_files/books_read.csv', 'r') as f:
    for line in f:
        line = line.rstrip()
        row = line.split(',')
        table.append(row)

print(line)
print('-' * 92)
print(row)
print('-' * 92)
print(table)
print('-' * 92)

for row in table:
    print("{:<32} {:<46} {:<10}".format(row[0], row[1], row[2]))
        

### `csv.reader` object
As the examples above show, it is not too difficult to parse a cleanly formatted CSV file. However, once the file gets more complicated, especially when quotes are used for their intended purpose of allowing separators inside field values, things get trickier.

**Luckily, Python has a `csv` module designed to help us read and write CSV files.**
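
To make the difference concrete, here is a small sketch that parses one quoted line in both ways. The line is modelled on the expected output shown in the exercises for `books_read.csv`; the exact quoting used in that file is an assumption.

```python
import csv

# Assumed form of one line of books_read.csv: the author field is quoted
# because it contains a comma.
line = '"Beckert, Sven",Empire of Cotton,history'

# Naive approach: split on every comma, even those inside quotes.
naive = line.split(',')
print(naive)

# csv.reader accepts any iterable of lines, so a one-item list works here.
parsed = next(csv.reader([line]))
print(parsed)
```

The naive split produces four pieces and leaves the quote characters in place, while `csv.reader` returns the three intended fields.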

To read a CSV file using the `csv` module, we need to:
1. open the file using the built-in `open()` function with newline set to an empty string (`newline = ''`).
2. pass the file object to the `csv.reader()` function. 
3. read the file row by row. 
    - **Each row is a list of strings.**

>Note that:
> - The online Python documentation (https://docs.python.org/3/library/csv.html) recommends that CSV files should be opened with `newline=''`.   
> - In Python 3, opening CSV files with `newline = ''` allows the csv module to determine line breaks for itself. If you do not specify `newline = ''` then newlines within quoted fields will be interpreted incorrectly.
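
The following sketch shows a field containing a line break surviving a write and read round trip when both files are opened with `newline=''`. The temporary directory and the file name `multiline.csv` are hypothetical and used only for this illustration.

```python
import csv
import os
import tempfile

# A row whose second field contains a newline.
row = ['note', 'line one\nline two']

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'multiline.csv')

with open(path, 'w', newline='') as f:
    csv.writer(f).writerow(row)      # the second field is quoted on output

with open(path, 'r', newline='') as f:
    rows = list(csv.reader(f))       # the quoted newline stays inside the field

print(rows)
```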

In [None]:
%load my_files/books_read.csv

In [None]:
import csv

table = []
with open('my_files/books_read.csv', 'r', newline='') as f:
    reader = csv.reader(f)

    for row in reader:
        table.append(row)

print(row)
print('-' * 92)
print(table)       
print('-' * 92)

for row in table:
    print("{:<32} {:<46} {:<10}".format(row[0], row[1], row[2]))

### `csv.DictReader` Object

When retrieving rows using the `reader` object, it is possible to manipulate them item by item, but you have to know the positions of the different fields. For a CSV with a lot of columns, that can be pretty difficult. You will likely find it easier to use a `DictReader` object, which gives you access to the fields by key.

The `DictReader` object has a `fieldnames` attribute that by default contains a **list of keys taken from the first row of the file** (the header row).
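
As a quick sketch of what this looks like, the example below reads a tiny in-memory table. The column names `City` and `Jul` follow the `hightemp.csv` examples in this lesson, but the `Jan` column and all the values are made up for illustration.

```python
import csv
import io

# Hypothetical data: a header row followed by two data rows.
text = "City,Jan,Jul\nPhoenix,67,106\nSeattle,47,76\n"

reader = csv.DictReader(io.StringIO(text))
header = reader.fieldnames            # taken from the first row
print(header)

rows = list(reader)                   # each row maps field name -> string
for row in rows:
    print(row['City'], row['Jul'])
```

Note that the values are still strings, so they need to be converted with `int()` before any numeric comparison.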

### `csv.reader` vs. `csv.DictReader` 

In [None]:
%load my_files/hightemp.csv

#### `csv.reader` version

In [None]:
import csv

with open('my_files/hightemp.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    print(header)
    
    # Find cities that are hot in July (temperature above 90).
    for row in reader:
        if int(row[7]) > 90:
            print('{} is very hot in July. The temperature is {}.'.format(
                   row[0], row[7]))
            

#### `csv.DictReader` version

In [None]:
import csv

table = []
with open('my_files/hightemp.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    header = reader.fieldnames
    print(header)

    # Find cities that are hot in July (temperature above 90).
    for row in reader:
        if int(row['Jul']) > 90:
            print('{} is very hot in July. The temperature is {}.'.format(
                   row['City'], row['Jul']))

#### If the CSV doesn't have a header row, you can pass fieldnames in when creating the `csv.DictReader` object.

In [None]:
%load my_files/phones.csv

In [None]:
import csv

table = []
with open('my_files/phones.csv', 'r', newline='') as f:
    fieldnames = ['name', 'phone']
    reader = csv.DictReader(f, fieldnames=fieldnames)
    print(reader.fieldnames)   # ['name', 'phone']
    
    for row in reader:
        print('{:<10} {:<10}'.format(row['name'], row['phone']))

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#00A0B2">
Writing a CSV File</h1>

To write data to a CSV file, we need to:
1. open the file using the built-in `open()` function in writing mode with newline set to an empty string (`newline = ''`).
2. pass the file object to the `csv.writer()` function  or to `csv.DictWriter()`.
3. write the file row by row with the `writerow(sequence)` method or all at once with the `writerows(sequence_of_sequences)` method.
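
One detail worth seeing before the full examples: with its default settings, `csv.writer` adds quotes on its own when a field contains the delimiter or the quote character. The sketch below writes to an in-memory buffer instead of a file so that the result can be inspected directly; the rows are made up for illustration.

```python
import csv
import io

buffer = io.StringIO()               # stands in for an open file
writer = csv.writer(buffer)

# The first field of each row contains a comma, so it gets quoted.
writer.writerow(['Beckert, Sven', 'Empire of Cotton', 'history'])
# A quote character inside a field is doubled and the field is quoted.
writer.writerow(['Parker, Peter', 'said "hi"', 'note'])

output = buffer.getvalue()
print(output)
```

Fields that need no quoting, such as `history`, are written unchanged.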

#### `csv.writer`  version

In [None]:
import csv
   
table = [['2016/1/18', 'Martin Luther King Day', 'Federal Holiday'],
         ['2016/2/2','Groundhog Day', 'Observance'], 
         ['2016/2/8','Chinese New Year', 'Observance'], 
         ['2016/2/14','Valentine\'s Day', 'Observance'], 
         ['2016/5/8','Mother\'s Day', 'Observance'], 
         ['2016/8/19','Statehood Day', 'Hawaii Holiday'], 
         ['2016/10/28','Nevada Day', 'Nevada Holiday']]
    
with open('my_files/csv_write.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Date', 'Name', 'Notes'])  # write one row - header
    writer.writerows(table)   # write all rows at once - tabular data

In [None]:
%load my_files/csv_write.csv

#### `csv.DictWriter` version

In [None]:
import csv
   
table = [{'Date': '2016/1/18', 'Name': 'Martin Luther King Day', 'Notes': 'Federal Holiday'},
         {'Date': '2016/2/2', 'Name': 'Groundhog Day', 'Notes': 'Observance'}, 
         {'Date': '2016/2/8', 'Name': 'Chinese New Year', 'Notes': 'Observance'}, 
         {'Date': '2016/2/14', 'Name': 'Valentine\'s Day', 'Notes': 'Observance'}, 
         {'Date': '2016/5/8', 'Name': 'Mother\'s Day', 'Notes': 'Observance'}, 
         {'Date': '2016/8/19', 'Name': 'Statehood Day', 'Notes': 'Hawaii Holiday'}, 
         {'Date': '2016/10/28', 'Name': 'Nevada Day', 'Notes': 'Nevada Holiday'}]
    
with open('my_files/csv_write2.csv', 'w', newline='') as f:
    fieldnames = table[0].keys() # fieldnames = ['Date', 'Name', 'Notes']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()         # write header
    writer.writerows(table)      # write all rows at once - tabular data

In [None]:
%load my_files/csv_write2.csv

<h1 style="font-size:1.5em; font-family: verdana, Geneva, sans-serif; color:#B24C00">
Exercise</h1>

1) Write a program to read `books_read.csv` and print each row as a list, including the brackets.

&nbsp; &nbsp; *Expected Output:*
```python
['Beckert, Sven', 'Empire of Cotton', 'history']
['Buckley, Carla', 'The Deepest Secret', 'mystery']
['Carcaterra, Lorenzo', 'Chasers', 'mystery']
...
['Woods, Stewart', 'Mounting Fears', 'novel']
```

In [None]:
# your code


2) Rewrite the above code so that each row is printed without the list brackets. Print each item in the row separately instead of printing the whole row.

&nbsp; &nbsp; *Expected Output:*
```python
Beckert, Sven Empire of Cotton history
Buckley, Carla The Deepest Secret mystery
Carcaterra, Lorenzo Chasers mystery
...
Woods, Stewart Mounting Fears novel
```

In [None]:
# your code


3) Write a program that reads a friend's name and phone number as keyboard input and then saves them as a record in a CSV file. (Each record has a field for the friend's name and a field for the phone number.)

- To exit, enter a blank name (i.e., just press return or enter).
   
- Do not print anything until you have the phone number, and do not ask for the
   phone number until you know the user is not quitting.

&nbsp; &nbsp; *Sample Run:*

&nbsp; &nbsp; `Enter a friend's name, press return to end: Peter Parker`  
&nbsp; &nbsp; `Enter your friend's phone: 123-456-789`  
&nbsp; &nbsp; `Peter Parker`  
&nbsp; &nbsp; `123-456-789`  
&nbsp; &nbsp; `Enter a friend's name, press return to end: Mary Jane Watson`  
&nbsp; &nbsp; `Enter your friend's phone: 987-654-321`  
&nbsp; &nbsp; `Mary Jane Watson`  
&nbsp; &nbsp; `987-654-321`  
&nbsp; &nbsp; `Enter a friend's name, press return to end:`

In [None]:
# your code


4) Read the `weights.csv` file, which contains a person's daily weights, compute the 
average weight, and write the result to another CSV file.
In other words, copy the old file to a new one and add an additional line that 
contains the average of all the weight values. 

In [None]:
# your code
