# Comma-Separated Values (CSV)

CSV files are used to store tabular data (rows and columns) as plain text.

In [1]:
# Say we have this data

track_times = [
    [13.10, 13.59, 13.44],
    [13.93, 13.85, 13.47],
    [14.12, 14.41, 13.89],
    [14.42, 13.55, 13.43]
]
track_times

[[13.1, 13.59, 13.44],
 [13.93, 13.85, 13.47],
 [14.12, 14.41, 13.89],
 [14.42, 13.55, 13.43]]

Say we want to store this on disk.

We will first need to serialize the data.

(*Serialization means converting a Python object into a structured format, such as a string, that can be stored or shared*)

To serialize this list of lists as CSV, we:
- convert the whole list into a single string
- separate the rows (the nested lists) with newline characters ("\n")
- separate the elements within each row with commas

In [2]:
# Start with an empty string
track_times_csv = ""

# Loop over the rows (one row per athlete)
for index, athlete_times in enumerate(track_times):
    # Join this athlete's times with commas
    athlete_times_string = ','.join([str(time) for time in athlete_times])
    # Append the row to the CSV string
    track_times_csv += athlete_times_string
    # Append a newline unless this is the last row
    if index < len(track_times) - 1:
        track_times_csv += "\n"

print(track_times_csv)

13.1,13.59,13.44
13.93,13.85,13.47
14.12,14.41,13.89
14.42,13.55,13.43
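The same serialization can be written more compactly with nested *str.join* calls. This is a sketch of an alternative, not the approach used above; it builds the same string, and because "\n".join only puts separators *between* rows, there is no trailing newline to special-case.

```python
track_times = [
    [13.10, 13.59, 13.44],
    [13.93, 13.85, 13.47],
    [14.12, 14.41, 13.89],
    [14.42, 13.55, 13.43]
]

# Inner join: commas between the times in one row.
# Outer join: newlines between rows (none after the last row).
track_times_csv = "\n".join(
    ",".join(str(time) for time in athlete_times)
    for athlete_times in track_times
)

print(track_times_csv)
```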


In [4]:
# Write the string to a file on disk
with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\track_times_csv", "w") as f:
    f.write(track_times_csv)

In [6]:
# Later on, we read the file back in.
# This gives us a single string, which still has to be
# deserialized into a list of lists before we can use it as data.

with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\track_times_csv") as f:
    track_times_csv = f.read()
print(track_times_csv)

13.1,13.59,13.44
13.93,13.85,13.47
14.12,14.41,13.89
14.42,13.55,13.43
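The string above still has to be deserialized. A minimal sketch of doing it by hand, assuming every field is a number: split the text into rows, split each row on commas, and convert each field back to a float. *splitlines()* is used so that a trailing newline, if the file had one, would not produce an empty row.

```python
track_times_csv = "13.1,13.59,13.44\n13.93,13.85,13.47\n14.12,14.41,13.89\n14.42,13.55,13.43"

# Rows are separated by newlines, fields by commas;
# float() turns each text field back into a number.
track_times = [
    [float(value) for value in row.split(",")]
    for row in track_times_csv.splitlines()
]

print(track_times)
```

This is exactly the bookkeeping the csv module takes off our hands.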


### CSV Module

Forget all of that manual work above: Python's built-in **csv module** gives us a much simpler way to serialize and deserialize CSV data.

In [23]:
# Importing the module
import csv

*csv.reader* reads the contents of a CSV file row by row, returning each row as a list.

In [24]:
with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\track_times_csv", newline="") as f:
    # csv.QUOTE_NONNUMERIC tells the reader to convert every
    # unquoted value to a float
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    # Convert the reader into a list while the file is still open
    new_track_times = list(reader)

new_track_times

[[13.1, 13.59, 13.44],
 [13.93, 13.85, 13.47],
 [14.12, 14.41, 13.89],
 [14.42, 13.55, 13.43]]
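To see what *csv.QUOTE_NONNUMERIC* changes, here is a small comparison. It uses *io.StringIO* as an in-memory stand-in for the file (an assumption made only so the snippet runs on its own): without the option, every value comes back as a string.

```python
import csv
import io

text = "13.1,13.59,13.44\n13.93,13.85,13.47\n"

# Default quoting: every field is returned as a string
as_strings = list(csv.reader(io.StringIO(text)))
# QUOTE_NONNUMERIC: unquoted fields are converted to float
as_numbers = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))

print(as_strings[0])  # ['13.1', '13.59', '13.44']
print(as_numbers[0])  # [13.1, 13.59, 13.44]
```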

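Converting the reader into a list inside the *with* block is important. The reader is a lazy iterator that pulls rows from the open file only when asked, so once the file is closed it can no longer be read; the list keeps the data in memory so we can still use it afterwards.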
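A small sketch of what goes wrong otherwise. Again *io.StringIO* stands in for the file on disk (an assumption for this demo); the reader is created while the "file" is open, but it is only iterated after the file is closed.

```python
import csv
import io

# In-memory stand-in for the track_times file
f = io.StringIO("13.1,13.59,13.44\n13.93,13.85,13.47\n")
reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
f.close()  # close the "file" before any rows have been read

error_message = None
try:
    rows = list(reader)
except ValueError as err:
    # The reader reads lazily, so it fails once the file is closed
    error_message = str(err)

print(error_message)
```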

*csv.reader* is useful for CSVs without column headings

*csv.writer* can take a list of lists and write it to a csv file.
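A minimal *csv.writer* sketch for the track times. It writes to an *io.StringIO* buffer so the result can be inspected; with a real file you would open it with mode "w" and newline="" as the csv documentation recommends. The lineterminator="\n" argument is a choice made here to match the manual version (the default is "\r\n").

```python
import csv
import io

track_times = [
    [13.10, 13.59, 13.44],
    [13.93, 13.85, 13.47],
    [14.12, 14.41, 13.89],
    [14.42, 13.55, 13.43]
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# writerows writes one CSV row per inner list
writer.writerows(track_times)

csv_text = buffer.getvalue()
print(csv_text)
```

Unlike the hand-written loop, *csv.writer* ends every row with the line terminator, including the last one.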

*csv.DictReader* is better suited to CSV files with column headings: it uses the first row as the keys of a dictionary for every following row.

In [38]:
# For a file with headings, csv.reader returns the heading row as an ordinary row
with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\olympic_medals.csv", encoding='utf-8') as f:
    reader = csv.reader(f)
    for _ in range(6):
        print(next(reader))

['Gender', 'Event', 'Location', 'Year', 'Medal', 'Name', 'Nationality', 'Result']
['M', '10000M Men', 'Rio', '2016', 'G', 'Mohamed FARAH', 'GBR', '25:05.17']
['M', '10000M Men', 'Rio', '2016', 'S', 'Paul Kipngetich TANUI', 'KEN', '27:05.64']
['M', '10000M Men', 'Rio', '2016', 'B', 'Tamirat TOLA', 'ETH', '27:06.26']
['M', '10000M Men', 'Beijing', '2008', 'G', 'Kenenisa BEKELE', 'ETH', '27:01.17']
['M', '10000M Men', 'Beijing', '2008', 'S', 'Sileshi SIHINE', 'ETH', '27:02.77']
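If we stick with *csv.reader* on a file that has headings, a common pattern is to take the first row off with *next()* before reading the data. The sample text below is copied from the output above, and *io.StringIO* stands in for the real file (an assumption for the demo).

```python
import csv
import io

# First three lines of olympic_medals.csv, as shown above
sample = (
    "Gender,Event,Location,Year,Medal,Name,Nationality,Result\n"
    "M,10000M Men,Rio,2016,G,Mohamed FARAH,GBR,25:05.17\n"
    "M,10000M Men,Rio,2016,S,Paul Kipngetich TANUI,KEN,27:05.64\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # consume the heading row first
rows = list(reader)    # everything left is data

print(header)
print(rows[0])
```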


**NOTE:**

csv.reader yields each row as a list, while csv.DictReader yields each row as a dictionary keyed by the column headings. Wrapping them in list() therefore gives a list of lists or a list of dictionaries, respectively.

In [40]:
with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\olympic_medals.csv", encoding='utf-8') as f:
    reader = csv.DictReader(f)
    olympics_data = list(reader)

print(len(olympics_data))
olympics_data[:5]

2394


[{'Gender': 'M',
  'Event': '10000M Men',
  'Location': 'Rio',
  'Year': '2016',
  'Medal': 'G',
  'Name': 'Mohamed FARAH',
  'Nationality': 'GBR',
  'Result': '25:05.17'},
 {'Gender': 'M',
  'Event': '10000M Men',
  'Location': 'Rio',
  'Year': '2016',
  'Medal': 'S',
  'Name': 'Paul Kipngetich TANUI',
  'Nationality': 'KEN',
  'Result': '27:05.64'},
 {'Gender': 'M',
  'Event': '10000M Men',
  'Location': 'Rio',
  'Year': '2016',
  'Medal': 'B',
  'Name': 'Tamirat TOLA',
  'Nationality': 'ETH',
  'Result': '27:06.26'},
 {'Gender': 'M',
  'Event': '10000M Men',
  'Location': 'Beijing',
  'Year': '2008',
  'Medal': 'G',
  'Name': 'Kenenisa BEKELE',
  'Nationality': 'ETH',
  'Result': '27:01.17'},
 {'Gender': 'M',
  'Event': '10000M Men',
  'Location': 'Beijing',
  'Year': '2008',
  'Medal': 'S',
  'Name': 'Sileshi SIHINE',
  'Nationality': 'ETH',
  'Result': '27:02.77'}]

Now we can analyse the data.

To get only gold medals:

In [43]:
gold_medals = []
for record in olympics_data:
    if record["Medal"] == 'G':
        gold_medals.append(record)

print(f"Out of a total of {len(olympics_data)} medals, {len(gold_medals)} were gold medals")

Out of a total of 2394 medals, 799 were gold medals
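The same filter can also be written as a list comprehension. To keep the snippet self-contained, *olympics_data* here is a small hypothetical stand-in built from the first records shown earlier, not the full 2394-row list.

```python
# Hypothetical stand-in for olympics_data (the real list comes from csv.DictReader)
olympics_data = [
    {"Name": "Mohamed FARAH", "Medal": "G", "Year": "2016"},
    {"Name": "Paul Kipngetich TANUI", "Medal": "S", "Year": "2016"},
    {"Name": "Tamirat TOLA", "Medal": "B", "Year": "2016"},
    {"Name": "Kenenisa BEKELE", "Medal": "G", "Year": "2008"},
]

# Keep only the records whose Medal column is "G"
gold_medals = [record for record in olympics_data if record["Medal"] == "G"]

print(f"Out of a total of {len(olympics_data)} medals, {len(gold_medals)} were gold medals")
```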


How many USA gold medallists were there in 2016?
List the event and the name of each medallist.

In [96]:
usa_2016_gold_medals = []

for record in olympics_data:
    if record["Year"] == "2016" and record["Medal"] == "G" and record["Nationality"] == "USA":
        usa_2016_gold_medals.append({"Event": record["Event"], "Name": record["Name"]})

print(type(usa_2016_gold_medals[0]))
usa_2016_gold_medals

<class 'dict'>


[{'Event': '1500M Men', 'Name': 'Matthew CENTROWITZ'},
 {'Event': '400M Hurdles Men', 'Name': 'Kerron CLEMENT'},
 {'Event': '4X400M Relay Men', 'Name': 'null'},
 {'Event': 'Decathlon Men', 'Name': 'Ashton EATON'},
 {'Event': 'Long Jump Men', 'Name': 'Jeff HENDERSON'},
 {'Event': 'Shot Put Men', 'Name': 'Ryan CROUSER'},
 {'Event': 'Triple Jump Men', 'Name': 'Christian TAYLOR'},
 {'Event': '100M Hurdles Women', 'Name': 'Brianna ROLLINS'},
 {'Event': '400M Hurdles Women', 'Name': 'Dalilah MUHAMMAD'},
 {'Event': '4X100M Relay Women', 'Name': 'null'},
 {'Event': '4X400M Relay Women', 'Name': 'null'},
 {'Event': 'Long Jump Women', 'Name': 'Tianna BARTOLETTA'},
 {'Event': 'Shot Put Women', 'Name': 'Michelle CARTER'}]

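The list has 13 entries, so USA athletes won 13 gold medals in 2016 (the three relay entries have 'null' as the Name).

We can then write this result to a CSV file using *csv.DictWriter*.

On Windows, open the file with newline='', as the csv module documentation recommends. Otherwise every row is followed by a blank line: the writer ends each row with '\r\n', and text mode then translates the '\n' into '\r\n' again, giving '\r\r\n'. Passing dialect='unix' also avoids the blank lines, because that dialect ends rows with '\n', but note that it also wraps every field in quotes.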

In [95]:
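# newline='' stops Python from translating line endings itself;
# dialect='unix' ends each row with '\n' and quotes every field
with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-csv\usa_2016_gold_medals.csv", mode="w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Event", "Name"], dialect="unix")
    writer.writeheader()
    # Write one row per dictionary
    writer.writerows(usa_2016_gold_medals)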
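To see what the dialect actually changes, this sketch writes the first two records to an in-memory buffer with the default ("excel") dialect and with dialect="unix". The helper *to_csv_text* is hypothetical, introduced only for this comparison.

```python
import csv
import io

# First two records from the result above
usa_2016_gold_medals = [
    {"Event": "1500M Men", "Name": "Matthew CENTROWITZ"},
    {"Event": "400M Hurdles Men", "Name": "Kerron CLEMENT"},
]

def to_csv_text(records, **writer_options):
    """Hypothetical helper: write dict records to a buffer and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Event", "Name"], **writer_options)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

default_text = to_csv_text(usa_2016_gold_medals)
unix_text = to_csv_text(usa_2016_gold_medals, dialect="unix")

# repr() makes the line endings and quotes visible
print(repr(default_text))
print(repr(unix_text))
```

The default dialect ends rows with "\r\n" and only quotes fields when it has to; the "unix" dialect ends rows with "\n" and quotes every field.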