### Reading a .csv file by hand

In [2]:
fileconnection = open("data/olympics.csv", 'r')
lines = fileconnection.readlines()
fileconnection.close()  # all lines are in memory now, so release the file

headers = lines[0]
field_names = headers.strip().split(',')
print(field_names)

for row in lines[1:]:
    row_vals = row.strip().split(',')
    if row_vals[5] != "NA":
        print(f"{row_vals[0]}: {row_vals[4]}; {row_vals[5]}")

['Name', 'Sex', 'Age', 'Team', 'Event', 'Medal']
Edgar Lindenau Aabye: Tug-Of-War; Gold
Arvo Ossian Aaltonen: Swimming; Bronze
Arvo Ossian Aaltonen: Swimming; Bronze
Juhamatti Tapio Aaltonen: Ice Hockey; Bronze
Paavo Johannes Aaltonen: Gymnastics; Bronze
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Bronze


### Reading a .csv file with the csv module

In [3]:
import csv

In [5]:
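# newline='' is the setting the csv docs recommend when opening files for the csv module
fileconnection = open("data/olympics.csv", 'r', newline='')
reader = csv.reader(fileconnection)
rows = list(reader)
fileconnection.close()  # rows are in memory now, so release the file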
headers = rows[0]
# headers is already a list; csv.reader handled parsing of comma separated values
print(headers)

for row_vals in rows[1:]:
    # each row is already a list, not a string
    if row_vals[5] != "NA":
        print(f"{row_vals[0]}: {row_vals[4]}; {row_vals[5]}")
        

['Name', 'Sex', 'Age', 'Team', 'Event', 'Medal']
Edgar Lindenau Aabye: Tug-Of-War; Gold
Arvo Ossian Aaltonen: Swimming; Bronze
Arvo Ossian Aaltonen: Swimming; Bronze
Juhamatti Tapio Aaltonen: Ice Hockey; Bronze
Paavo Johannes Aaltonen: Gymnastics; Bronze
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Gold
Paavo Johannes Aaltonen: Gymnastics; Bronze


### More elegant version

In [None]:
with open("data/olympics.csv", 'r', newline='') as f:
    reader = csv.reader(f)
    headers = next(reader)  # use next to consume the first (header) row
    print(headers)
    for row in reader:  # iterate through remaining rows
        if row[5] != "NA":
            print(f"{row[0]}: {row[4]}; {row[5]}")    
    # close is handled automatically when we use the with block
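
The `next(reader)` idiom can be tried without the data file. The following is a minimal sketch that reads from an in-memory string through `io.StringIO`; the two data rows and the names `sample_text` and `medalists` are made up for illustration and only mimic the column layout of olympics.csv.

```python
import csv
import io

# Hypothetical sample in the same column layout as olympics.csv (not real data).
sample_text = (
    "Name,Sex,Age,Team,Event,Medal\n"
    "Athlete A,F,25,Team X,Event 1,Gold\n"
    "Athlete B,M,31,Team Y,Event 2,NA\n"
)

with io.StringIO(sample_text) as f:
    reader = csv.reader(f)
    headers = next(reader)  # consumes the header row
    # the reader now yields only the data rows
    medalists = [row for row in reader if row[5] != "NA"]

print(headers)
print(medalists)
```

If the `next(reader)` line is left out, the header row is treated as data, and because its last field is "Medal" rather than "NA" it passes the filter too.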


# Writing .csv Files

### By Hand

In [None]:
olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
            ("Minna Maarit Aalto", 30, "Sailing"),
            ("Win Valdemar Aaltonen", 54, "Art Competitions"),
            ("Wakako Abe", 18, "Cycling")]

outfile = open("reduced_olympics.csv","w")

outfile.write('Name,Age,Sport')
outfile.write('\n')

for olympian in olympians:
    row_string = f'{olympian[0]}, {olympian[1]}, {olympian[2]}'
    outfile.write(row_string)
    outfile.write('\n')
outfile.close()

### With csv module

In [6]:
olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
            ("Minna Maarit Aalto", 30, "Sailing"),
            ("Win Valdemar Aaltonen", 54, "Art Competitions"),
            ("Wakako Abe", 18, "Cycling")]

# newline='' lets the csv module control line endings (avoids blank lines on Windows)
outfile = open("reduced_olympics_2.csv", "w", newline='')

writer = csv.writer(outfile)
# .writerow expects a sequence of field values (a list or tuple), not a comma-separated string; and no need to write '\n'
writer.writerow(['Name', 'Age', 'Sport'])

for olympian in olympians:
    # especially easy to write a row if we already have it as a tuple or list of values
    writer.writerow(olympian)
outfile.close()
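
To inspect exactly what the writer produces without opening a file, one option is to write into an in-memory buffer. This is only a sketch: `buffer` and `csv_text` are names chosen for the example, `writerows` is used in place of the loop, and `lineterminator="\n"` is set so the text prints cleanly (the writer's default line ending is `"\r\n"`).

```python
import csv
import io

olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Wakako Abe", 18, "Cycling")]

buffer = io.StringIO()  # in-memory stand-in for an output file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Name", "Age", "Sport"])
writer.writerows(olympians)  # writes one row per tuple

csv_text = buffer.getvalue()
print(csv_text)
```

The numbers are written as plain text, and the one field that contains a comma comes out wrapped in double quotes.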

### Complication: commas inside field values
Notice that "Cross Country Skiing, 15KM" has a comma in the text.

Written by hand, the file reduced_olympics.csv gets the following contents.

```
Name,Age,Sport
John Aalberg, 31, Cross Country Skiing, 15KM
Minna Maarit Aalto, 30, Sailing
Win Valdemar Aaltonen, 54, Art Competitions
Wakako Abe, 18, Cycling
```

This can't be parsed reliably, either by hand or by a csv reader object, because the meaning of a comma is ambiguous: sometimes it separates fields and sometimes it is part of a field.
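
The ambiguity can be checked directly on the problem line. In this small sketch (the variable names are chosen for the example), both a plain `split` and `csv.reader` are applied to the same unquoted text; `csv.reader` accepts any iterable of strings, so a one-element list is enough.

```python
import csv

ambiguous_line = "John Aalberg, 31, Cross Country Skiing, 15KM"

by_hand = ambiguous_line.split(",")
by_module = next(csv.reader([ambiguous_line]))

print(len(by_hand), by_hand)
print(len(by_module), by_module)
```

Both approaches report four fields, and the leading spaces written after each comma end up inside the field values as well.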

The csv module handles it better, quoting any field that contains a comma when it writes to reduced_olympics_2.csv:
```
Name,Age,Sport
John Aalberg,31,"Cross Country Skiing, 15KM"
Minna Maarit Aalto,30,Sailing
Win Valdemar Aaltonen,54,Art Competitions
Wakako Abe,18,Cycling
```

The latter can be read and parsed by a csv.reader() object.

In [None]:
# notice that it thinks there are four fields for the second row
with open("reduced_olympics.csv", 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

In [None]:
# correctly parses the second row
with open("reduced_olympics_2.csv", 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
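
As a final check, writing and reading can be combined into a round trip. The sketch below assumes an in-memory buffer instead of a file, and the names `rows_back` and `restored` are illustrative. One thing it makes visible is that `csv.reader` returns every field as a string, so the ages have to be converted back to `int` explicitly.

```python
import csv
import io

olympians = [("John Aalberg", 31, "Cross Country Skiing, 15KM"),
             ("Minna Maarit Aalto", 30, "Sailing")]

buffer = io.StringIO()  # in-memory stand-in for a .csv file
csv.writer(buffer).writerows(olympians)

buffer.seek(0)  # rewind so the reader starts at the beginning
rows_back = list(csv.reader(buffer))

# every field comes back as a string; rebuild the original types by hand
restored = [(name, int(age), sport) for name, age, sport in rows_back]

print(rows_back)
print(restored == olympians)
```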