# CSV reader/writer

## CSV files

Run the appropriate code cell for your installation to create a CSV file from a multi-line text string:

In [None]:
# Code for local installation of Jupyter notebooks
import os
print(os.getcwd())

some_text = '''given_name,family_name,username,student_id
Jimmy,Zhang,rastaman27,37258
Ji,Kim,kimji8,44947
Veronica,Fuentes,shakira<3,19846'''
with open('students.csv', 'wt', encoding='utf-8') as file_object:
    file_object.write(some_text)

In [None]:
# Code for Colab installation
# You will need to mount your Google Drive before running the code, for example with:
#   from google.colab import drive
#   drive.mount('/content/drive')
# The file will be saved in the root of your Google Drive.

google_drive_root = '/content/drive/My Drive/'

some_text = '''given_name,family_name,username,student_id
Jimmy,Zhang,rastaman27,37258
Ji,Kim,kimji8,44947
Veronica,Fuentes,shakira<3,19846'''
with open(google_drive_root + 'students.csv', 'wt', encoding='utf-8') as file_object:
    file_object.write(some_text)

For a local installation, the script prints the path of the directory where the file is saved. Open the file in your text editor and examine the form of the text. Notice that each row of the CSV is on a separate line (rows are separated by newline characters) and that the values within each row are separated by commas.

Now open the file with a spreadsheet program. LibreOffice Calc is recommended, but if you don't have it, you can use Excel. Notice that the file is rendered as an editable table rather than as raw text.

## Reading CSV files as a list of lists using a csv.reader() object

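The `csv` module provides the `csv.reader()` function, which takes an iterable file object as input and returns a reader object. The reader object is itself iterable: each item it yields is a list containing the values from one row of the CSV file, as strings.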

In [None]:
import csv

with open('students.csv', 'r', newline='', encoding='utf-8') as file_object:
    # The reader() object is instantiated and assigned to a variable.
    reader_object = csv.reader(file_object)
    print(type(reader_object))
    print()
    
    # The reader object is iterable
    for row in reader_object:
        # each iterated item is a Python list
        print(type(row))
        print(row)
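
Every value the reader yields is a string, including numbers such as `student_id`. The following sketch reads the same student data from an in-memory string via `io.StringIO` (an assumption made only so that it runs without `students.csv`), skips the header row with `next()`, and converts the ID column to integers explicitly.

```python
import csv
import io

# The same student data used above, held in memory instead of in students.csv
some_text = '''given_name,family_name,username,student_id
Jimmy,Zhang,rastaman27,37258
Ji,Kim,kimji8,44947
Veronica,Fuentes,shakira<3,19846'''

reader_object = csv.reader(io.StringIO(some_text))
header = next(reader_object)  # the first row holds the column names
student_ids = []
for row in reader_object:
    # csv.reader returns every field as a str, so convert the ID explicitly
    student_ids.append(int(row[3]))

print(header)       # ['given_name', 'family_name', 'username', 'student_id']
print(student_ids)  # [37258, 44947, 19846]
```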

In the following example, the code to read from the CSV file to a list of lists is placed in a function. The file path is passed in as the only argument of the function. The function returns a single object, a list of lists containing the data from the CSV file.

In [None]:
import csv

path = '' # keep this line for a local installation (the file is in the notebook's directory)
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

# The function takes the file path as an argument and returns a list of lists
def read_csv_to_list_of_lists(file_path):
    with open(file_path, 'r', newline='', encoding='utf-8') as file_object:
        reader_object = csv.reader(file_object)
        list_of_lists = []
        for row_list in reader_object:
            list_of_lists.append(row_list)
    return list_of_lists

# Main script
student_info = read_csv_to_list_of_lists(path + 'students.csv')
print(student_info)
print()

print(student_info[1][2])
print()

In [None]:
# Build a string that contains tabs between each table item and has newlines at the end of each line.
output_string = ''
for row in range(len(student_info)):
    for column in range(len(student_info[row])):
        output_string += student_info[row][column] + '\t'
    output_string += '\n'
print(output_string)
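
The nested loop above leaves a tab at the end of every line. A more compact sketch uses `str.join()`, which places the tab only between items; the two sample rows here are a stand-in for `student_info`.

```python
# Sample rows standing in for student_info
student_info = [
    ['given_name', 'family_name', 'username', 'student_id'],
    ['Jimmy', 'Zhang', 'rastaman27', '37258'],
]

output_string = ''
for row in student_info:
    # '\t'.join() puts a tab between items, so no trailing tab is added
    output_string += '\t'.join(row) + '\n'
print(output_string)
```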

## Writing to CSV files

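The following example writes a list of lists to a file named `cartoons.csv` with a `csv.writer()` object: each inner list becomes one row of the file. As with reading, the file is opened with `newline=''`, which the `csv` module documentation recommends so that extra blank lines do not appear between rows on Windows.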

In [None]:
import csv

path = '' # keep this line for a local installation (the file is in the notebook's directory)
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

data = [ ['name', 'company', 'nemesis'], ['Mickey Mouse', 'Disney', 'Donald Duck'], ['Road Runner', 'Warner Brothers', 'Wile Ethelbert Coyote'] ]
with open(path + 'cartoons.csv', 'w', newline='', encoding='utf-8') as file_object:
    writer_object = csv.writer(file_object)
    for row in data:
        print(row)
        writer_object.writerow(row)
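
A writer object can also write all rows in one call with `writerows()`, and it automatically puts quotation marks around any value that contains a comma, so the comma is not mistaken for a separator. This sketch writes to an `io.StringIO` buffer instead of a file so the result can be inspected directly; the `catchphrase` column is made-up sample data.

```python
import csv
import io

data = [['name', 'catchphrase'], ['Road Runner', 'Beep, beep']]

# io.StringIO stands in for a file opened with newline=''
buffer = io.StringIO()
writer_object = csv.writer(buffer)
writer_object.writerows(data)  # writes every row in one call

csv_text = buffer.getvalue()
# The value containing a comma is quoted; rows end with '\r\n' by default
print(repr(csv_text))
```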

## Reading CSV files as a list of dictionaries using a csv.DictReader() object

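The `csv` module also provides `csv.DictReader()`, which turns an iterable file object into an iterable object whose items are dictionaries. By default, the values in the first row of the file (the header row) are used as the dictionary keys.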

In [None]:
import csv

path = '' # keep this line for a local installation (the file is in the notebook's directory)
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

with open(path + 'cartoons.csv', 'r', newline='', encoding='utf-8') as file_object:
    # The DictReader() object is instantiated and assigned to a variable.
    reader_object = csv.DictReader(file_object)
    # Each item yielded by a DictReader is a dictionary whose keys come from the header row.
    # (In Python 3.6 and 3.7 the items are OrderedDict objects; since Python 3.8 they are regular dicts.)
    print(type(reader_object))
    print()

    # If we want to reuse the row dictionaries, we can add them to a list.
    cartoon_table = []
    for row_dict in reader_object:
        print(type(row_dict))
        print(row_dict)
        cartoon_table.append(row_dict)

print()
# We refer to items in the row dictionaries by their keys, just as we do for normal dictionaries.
# Because each row has its own dictionary, we must specify the row index in the first square brackets.
print(cartoon_table[1]['name'] + ' works for ' + cartoon_table[1]['company'] + '. Its enemy is ' + cartoon_table[1]['nemesis'])

In [None]:
for character in cartoon_table:
    print('Character name:', character['name'], ' company:', character['company'], ' nemesis:', character['nemesis'])
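
If a CSV file has no header row, `csv.DictReader()` can be given the keys through its `fieldnames` parameter; the first row is then treated as data rather than as a header. The headerless text below is hypothetical, and `io.StringIO` stands in for an open file.

```python
import csv
import io

# Hypothetical CSV text with no header row
headerless_text = ('Mickey Mouse,Disney,Donald Duck\n'
                   'Road Runner,Warner Brothers,Wile Ethelbert Coyote\n')

# fieldnames supplies the dictionary keys, so no row is consumed as a header
reader_object = csv.DictReader(io.StringIO(headerless_text),
                               fieldnames=['name', 'company', 'nemesis'])
cartoon_table = list(reader_object)

print(cartoon_table[0]['name'])     # Mickey Mouse
print(cartoon_table[1]['nemesis'])  # Wile Ethelbert Coyote
```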

## Template code for CSV-reading function (list of dictionaries)
In the following example, the code to read from the CSV file to a list of dictionaries is placed in a function. The file path is passed in as the only argument of the function. The function returns a single object, a list of dictionaries containing the data from the CSV file. The keys for the dictionaries are taken from the header row of the CSV.

The main script is a modification of the earlier script that looks up cartoon characters. Because the data come from a file rather than being hard-coded, it is easier to include much more information and to change it by editing the CSV file in a spreadsheet program.

You can download a CSV file with around 4000 cartoon characters from [here](https://github.com/HeardLibrary/digital-scholarship/blob/master/code/pylesson/challenge4/cartoons.csv). Right-click the `Raw` button and select `Save Link As...` (the exact wording depends on your browser). Save the file as `cartoons.csv` in the same directory as your Jupyter notebook if you are using a local installation, or in the root of your Google Drive if using Colab; it will replace the small `cartoons.csv` file written earlier. **Note:** if your browser changes the file extension to `.txt`, you may need to change the format from `text` to `All Files`, then manually change the extension in the dialog from `.txt` to `.csv`.

Many of the characters in the file do not have nemeses. You can add them if you know who they are.

In [None]:
import csv

path = '' # keep this line for a local installation (the file is in the notebook's directory)
#path = '/content/drive/My Drive/' # uncomment this line if using Colab

# The function takes the file path as an argument and returns a list of dictionaries
def read_csv_to_list_of_dicts(file_path):
    with open(file_path, 'r', newline='', encoding='utf-8') as file_object:
        reader_object = csv.DictReader(file_object)
        list_of_dicts = []
        for row_dict in reader_object:
            list_of_dicts.append(row_dict)
    return list_of_dicts

# Main script
cartoons = read_csv_to_list_of_dicts(path + 'cartoons.csv')
name = input("What's the character? ")
found = False
for character in cartoons:
    if name.lower() in character['name'].lower():
        # 'not' is True both for an empty string and for None (a missing value)
        if not character['nemesis']:
            print("I don't know the nemesis of " + character['name'])
        else:
            print(character['name'] + " doesn't like " + character['nemesis'])
        found = True
if not found:
    print("Sorry, I don't know that character.")
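
An empty nemesis cell is read as an empty string, but if a row has fewer fields than the header, `csv.DictReader()` fills the missing keys with its `restval` parameter, which defaults to `None`. A test such as `character['nemesis'] == ''` would not catch `None`, while `if not character['nemesis']:` treats both cases the same way. This sketch uses a hypothetical one-row file held in `io.StringIO`.

```python
import csv
import io

# Hypothetical data: the only data row has two of the three fields
text = 'name,company,nemesis\nTweety,Warner Brothers\n'

default_rows = list(csv.DictReader(io.StringIO(text)))
print(default_rows[0]['nemesis'])  # None, the default restval

# restval='' makes missing fields look like empty cells instead
filled_rows = list(csv.DictReader(io.StringIO(text), restval=''))
print(repr(filled_rows[0]['nemesis']))  # ''
```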

## Template code for CSV-writing functions (from list of dictionaries)

Note that the functions do not return anything since they output to a file. 

The file path will need to be adjusted if you want to save the file somewhere other than in the directory in which the notebook is running.

The first function requires you to provide the field names explicitly. Use it when some dictionaries do not contain every field; `csv.DictWriter` writes an empty value for any field that is missing from a dictionary.

In [None]:
import csv

def write_dicts_to_csv_fieldnames(list_of_dicts, file_path, field_names):
    with open(file_path, 'w', newline='', encoding='utf-8') as csv_file_object:
        writer = csv.DictWriter(csv_file_object, fieldnames=field_names)
        writer.writeheader()
        for row_dict in list_of_dicts:
            writer.writerow(row_dict)

field_names = ['name', 'company', 'nemesis']
data = [ {'name': 'Mickey Mouse', 'company': 'Disney', 'nemesis': 'Donald Duck'}, {'name': 'Road Runner', 'company': 'Warner Brothers', 'nemesis': 'Wile Ethelbert Coyote'} ]
path = 'mini-cartoon-table.csv'
write_dicts_to_csv_fieldnames(data, path, field_names)
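
To see how missing fields are handled, this sketch writes to an `io.StringIO` buffer instead of a file. The second dictionary (a made-up example) has no `nemesis` key, so that column is left empty in its row.

```python
import csv
import io

field_names = ['name', 'company', 'nemesis']
# The second dictionary has no 'nemesis' key
data = [{'name': 'Mickey Mouse', 'company': 'Disney', 'nemesis': 'Donald Duck'},
        {'name': 'Goofy', 'company': 'Disney'}]

buffer = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buffer, fieldnames=field_names)
writer.writeheader()
writer.writerows(data)  # writes every dictionary in one call
print(buffer.getvalue())
```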

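The second function takes the field names from the keys of the first dictionary in the list. Use it only if all dictionaries have the same keys: if a later dictionary has a key that the first one lacks, `csv.DictWriter` raises a `ValueError`.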

In [None]:
import csv

def write_dicts_to_csv(list_of_dicts, file_path):
    field_names = list_of_dicts[0].keys()
    with open(file_path, 'w', newline='', encoding='utf-8') as csv_file_object:
        writer = csv.DictWriter(csv_file_object, fieldnames=field_names)
        writer.writeheader()
        for row_dict in list_of_dicts:
            writer.writerow(row_dict)

data = [ {'name': 'Mickey Mouse', 'company': 'Disney', 'nemesis': 'Donald Duck'}, {'name': 'Road Runner', 'company': 'Warner Brothers', 'nemesis': 'Wile Ethelbert Coyote'} ]
path = 'another-cartoon-table.csv'
write_dicts_to_csv(data, path)
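
When the dictionaries have different keys, one option is to collect every key that appears in any of them and use that combined list as the field names. The function below is a hypothetical sketch, not part of the templates above; it takes an open file object rather than a path so it can be demonstrated with `io.StringIO`. To write a real file, pass it a file object opened with `open(file_path, 'w', newline='', encoding='utf-8')`.

```python
import csv
import io

def write_dicts_to_csv_all_keys(list_of_dicts, file_object):
    # Collect every key, in order of first appearance, across all dictionaries
    field_names = []
    for row_dict in list_of_dicts:
        for key in row_dict:
            if key not in field_names:
                field_names.append(key)
    writer = csv.DictWriter(file_object, fieldnames=field_names)
    writer.writeheader()
    writer.writerows(list_of_dicts)  # missing keys are written as empty values

data = [{'name': 'Mickey Mouse', 'company': 'Disney'},
        {'name': 'Road Runner', 'nemesis': 'Wile Ethelbert Coyote'}]
buffer = io.StringIO()
write_dicts_to_csv_all_keys(data, buffer)
print(buffer.getvalue())
```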

## Reading CSV files from the Internet

The Nashville schools data in this exercise comes from [here](https://github.com/HeardLibrary/digital-scholarship/blob/master/data/gis/wg/Metro_Nashville_Schools.csv).

Reading a CSV file from a URL into a list of lists

In [None]:
import requests
import csv

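def url_csv_to_list_of_lists(url):
    r = requests.get(url)
    r.raise_for_status()  # stop with an error if the download failed (e.g. 404)
    file_text = r.text.splitlines()  # split the downloaded text into a list of lines
    file_rows = csv.reader(file_text)  # csv.reader accepts any iterable of strings
    list_of_lists = []
    for row in file_rows:
        list_of_lists.append(row)
    return list_of_lists
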
# Main script
url = 'https://raw.githubusercontent.com/HeardLibrary/digital-scholarship/master/data/gis/wg/Metro_Nashville_Schools.csv'
schools_data = url_csv_to_list_of_lists(url)

# print the IDs and names of all of the schools
# (the first row is the header, so it prints the column names)
for school in schools_data:
    print(school[2] + '\t' + school[3])
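
`splitlines()` removes the newline characters before the `csv` module sees the text, so a line break inside a quoted field is not preserved. One alternative, sketched below under the assumption that the downloaded text is passed in as a string (for example `r.text` from `requests`), is to wrap the text in `io.StringIO`, which keeps the newlines. The helper name `csv_text_to_list_of_lists` and the sample text are hypothetical.

```python
import csv
import io

def csv_text_to_list_of_lists(file_text):
    # io.StringIO keeps the newline characters, so csv.reader can tell a
    # line break inside a quoted field from the end of a row
    return list(csv.reader(io.StringIO(file_text)))

# Hypothetical text in which one quoted field contains a line break
sample_text = 'id,note\n1,"line one\nline two"\n2,plain\n'
rows = csv_text_to_list_of_lists(sample_text)
print(rows)
```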

Reading a CSV file from a URL into a list of dictionaries

In [None]:
import requests
import csv

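def url_csv_to_list_of_dicts(url):
    r = requests.get(url)
    r.raise_for_status()  # stop with an error if the download failed (e.g. 404)
    file_text = r.text.splitlines()  # split the downloaded text into a list of lines
    file_rows = csv.DictReader(file_text)  # keys come from the header row
    list_of_dicts = []
    for row in file_rows:
        list_of_dicts.append(row)
    return list_of_dicts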

# Main script
url = 'https://raw.githubusercontent.com/HeardLibrary/digital-scholarship/master/data/gis/wg/Metro_Nashville_Schools.csv'
schools_data = url_csv_to_list_of_dicts(url)

# use the dictionary to look up a school ID
school_name = input("What's the name of the school? ")
found = False
for school in schools_data:
    if school_name.lower() in school['School Name'].lower():
        print('The ID number for', school['School Name'], 'is: ' + school['School ID'])
        found = True
if not found:
    print("I couldn't find that school.")
