## Accessing CSV files in Python

There are many ways to read a CSV file in Python. A simple one to start practicing with is `csv.DictReader` from the standard-library `csv` module.



In [44]:
import csv

# newline="" lets the csv module handle line endings itself
reader = csv.DictReader(open("drinks.csv", newline=""))

In [45]:
reader

<csv.DictReader at 0x108b6e940>

To work with the data, let's first read it into a list.

In [46]:
data = []  # will hold one dictionary per row
for row in reader:
    data.append(row)

Looping over the reader yields one row at a time. Each row of the table is stored as a `dictionary` whose keys are the column names, so we can look up any value in that row via the appropriate key. For instance, suppose we want to store all countries in a list. How would you modify the loop above to do that?
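To see what a single row looks like, here is a minimal sketch. It assumes a small in-memory stand-in for `drinks.csv` (the names `sample`, `rows` and `first` are made up for illustration) built from two rows of the dataset. Note that `csv.DictReader` returns every value as a string, so numbers need converting before you do arithmetic with them.

```python
import csv
import io

# Hypothetical in-memory stand-in for drinks.csv, using two rows of the dataset
sample = io.StringIO(
    "country,beer_servings,spirit_servings,wine_servings\n"
    "Albania,89,132,54\n"
    "Andorra,245,138,312\n"
)

rows = list(csv.DictReader(sample))  # same effect as the append loop above
first = rows[0]

print(first["country"])                 # Albania
print(first["beer_servings"])           # 89, but stored as the string "89"
print(int(first["beer_servings"]) + 1)  # convert before doing arithmetic: 90
```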

Recall: a `dictionary` is a Python object that stores key-value pairs. It is like a telephone book. Here is an example of how to create a dictionary:



In [54]:
mydict = {} # create an empty dictionary

mydict["Barbara"] = "12345" # add first key (Barbara) and value (12345)

print(mydict)

{'Barbara': '12345'}


In [57]:
# add more
mydict["Rob"] = "12333"
mydict["Malvina"] = "12444"

print(mydict)

{'Barbara': '12345', 'Rob': '12333', 'Malvina': '12444'}


In [58]:
# retrieve Malvina's number
print(mydict["Malvina"])

12444


Going back to our data file: which columns does it have? (Hint: try `keys()` and `values()` on one of the rows.)
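Before trying the hint on the data, it may help to see what these methods return for the telephone book. This is only a sketch on a copy of the earlier dictionary (the name `phone_book` is an assumption for illustration).

```python
# Same entries as the telephone-book example above
phone_book = {"Barbara": "12345", "Rob": "12333", "Malvina": "12444"}

print(list(phone_book.keys()))    # ['Barbara', 'Rob', 'Malvina']
print(list(phone_book.values()))  # ['12345', '12333', '12444']

# items() yields (key, value) pairs, which is handy in a loop
for name, number in phone_book.items():
    print(name, number)
```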

### Exercise:

Answer the following questions with Python:

- How many glasses of wine per person per year do Italians drink?

- Which countries do not consume spirits?

- Show the top 10 countries with the highest beer consumption.
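As a hint rather than a solution, here is a sketch of the two building blocks these questions need: filtering a list of dictionaries and sorting it by a numeric column. The toy rows (`city`, `visitors`) are entirely hypothetical; the values are strings on purpose, because that is how `csv.DictReader` delivers them.

```python
# Hypothetical toy rows; values are strings, as csv.DictReader would return them
rows = [
    {"city": "A", "visitors": "120"},
    {"city": "B", "visitors": "0"},
    {"city": "C", "visitors": "45"},
    {"city": "D", "visitors": "300"},
]

# Filter: keep the rows whose numeric value is zero
no_visitors = [r["city"] for r in rows if int(r["visitors"]) == 0]
print(no_visitors)  # ['B']

# Sort by the numeric value, largest first, and keep the top 2
top2 = sorted(rows, key=lambda r: int(r["visitors"]), reverse=True)[:2]
print([r["city"] for r in top2])  # ['D', 'A']
```

Sorting on `int(...)` matters: sorting the raw strings would compare them character by character, so `"45"` would rank above `"300"`.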

### Optional: Advanced

An alternative for reading CSV files is the `pandas` library. It has a steep learning curve but offers advanced data analysis options (aggregation, selection, plotting).

In [62]:
import pandas as pd
data = pd.read_csv("drinks.csv")

In [63]:
data.head()

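| Unnamed: 0 | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 |
| 1 | Albania | 89 | 132 | 54 | 4.9 |
| 2 | Algeria | 25 | 0 | 14 | 0.7 |
| 3 | Andorra | 245 | 138 | 312 | 12.4 |
| 4 | Angola | 217 | 57 | 45 | 5.9 |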


In [64]:
data['wine_servings'].describe()

count    193.000000
mean      49.450777
std       79.697598
min        0.000000
25%        1.000000
50%        8.000000
75%       59.000000
max      370.000000
Name: wine_servings, dtype: float64
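As a rough sketch of the selection and aggregation options mentioned above, the following builds a small DataFrame from the five rows shown in the `head()` output. The in-memory `sample_csv`, the variable names and the threshold of 50 servings are assumptions for illustration; with the real file you would keep calling `pd.read_csv("drinks.csv")`.

```python
import io

import pandas as pd

# Hypothetical in-memory sample: the five rows shown in the head() output
sample_csv = io.StringIO(
    "country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol\n"
    "Afghanistan,0,0,0,0.0\n"
    "Albania,89,132,54,4.9\n"
    "Algeria,25,0,14,0.7\n"
    "Andorra,245,138,312,12.4\n"
    "Angola,217,57,45,5.9\n"
)
df = pd.read_csv(sample_csv)

# Selection: a boolean mask keeps only the rows where the condition holds
wine_drinkers = df[df["wine_servings"] > 50]
print(wine_drinkers["country"].tolist())  # ['Albania', 'Andorra']

# Aggregation: mean of one column across the sample
print(df["beer_servings"].mean())  # 115.2
```

Unlike `csv.DictReader`, `pandas` infers numeric column types while reading, so no manual `int()` conversion is needed here.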