# Dictionaries & Pandas
Learn about the dictionary, an alternative to the Python list, and the pandas DataFrame, the de facto standard for working with tabular data in Python. You will get hands-on practice creating and manipulating datasets, and you’ll learn how to access the information you need from these data structures.

# Dictionaries, Part 1

#### Motivation for dictionaries
To see why dictionaries are useful, have a look at the two lists defined in the script. `countries` contains the names of some European countries. `capitals` lists the names of the corresponding capitals, in the same order.

In [None]:
# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# Get index of 'germany': ind_ger
ind_ger = countries.index('germany')
print(ind_ger)

# Use ind_ger to print out capital of germany
print(capitals[ind_ger])

#### Create dictionary
The `countries` and `capitals` lists are again available in the script. It's your job to convert this data to a dictionary where the country names are the keys and the capitals are the corresponding values. As a refresher, here is a recipe for creating a dictionary: `my_dict = {"key1": "value1", "key2": "value2"}`

In [None]:
# Definition of countries and capital
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# From the strings in countries and capitals, create dictionary europe
europe = { 'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }

# Print europe
europe
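
When the keys and values already sit in two parallel lists, typing the dictionary out by hand is not the only option. The following is a minimal sketch, assuming both lists have the same length and the same order, that pairs them with the built-in `zip()`. The name `europe_from_lists` is made up for this illustration.

```python
# Parallel lists: position i in countries matches position i in capitals
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

# zip() pairs the elements up, dict() turns each pair into a key:value entry
europe_from_lists = dict(zip(countries, capitals))

print(europe_from_lists)
print(europe_from_lists['germany'])
```

This avoids the `index()` lookup from the motivation example: the country name itself becomes the key.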

#### Access dictionary
If the keys of a dictionary are chosen wisely, accessing the values in a dictionary is easy and intuitive. For example, to get the capital for France from `europe` you can use:

In [None]:
europe['france']

In [None]:
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo' }

# Print out the keys in europe
print(europe.keys())

# Print out a value that belongs to one of the keys
europe['norway']
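
Square brackets only work for keys that exist. As a small sketch of what happens otherwise, the example below asks for a key that is not in `europe` and then uses the dictionary method `get()`, which returns a default instead of raising an error. The variable names `capital` and `fallback` are chosen for this example only.

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

# Square brackets raise a KeyError when the key does not exist
try:
    europe['iceland']
except KeyError as err:
    print('missing key:', err)

# get() returns None, or a default you supply, instead of raising
capital = europe.get('iceland')
fallback = europe.get('iceland', 'unknown')
print(capital, fallback)
```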

# Dictionaries, Part 2

#### Dictionary Manipulation 1
If you know how to access a dictionary, you can also assign a new value to it. To add a new key-value pair to `europe` you can use something like this:

In [None]:
europe['iceland'] = 'reykjavik'

In [None]:
# Add italy to europe
europe['italy'] = 'rome'

# Print out italy in europe
print('italy' in europe)

# Add poland to europe
europe['poland'] = 'warsaw'

# Print europe
europe

#### Dictionary Manipulation 2
Somebody thought it would be funny to mess with your accurately generated dictionary. An adapted version of the `europe` dictionary is available in the script.

Can you clean up? Do not do this by adapting the definition of `europe`, but by adding Python commands to the script to update and remove `key:value` pairs.

In [None]:
# Definition of dictionary
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo', 'italy': 'rome',
          'poland': 'warsaw', 'australia': 'vienna'}

# Remove australia
del europe['australia']

# Print europe
europe
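
The task mentions updating as well as removing pairs. Here is a hedged sketch of both operations on a small, hypothetical dictionary. The wrong capital `'bonn'` is an assumption made for this example and does not appear in the script above.

```python
# Hypothetical messed-up dictionary: the wrong capital for germany is an
# assumption made for this example
europe = {'spain': 'madrid', 'germany': 'bonn', 'australia': 'vienna'}

# Assigning to an existing key overwrites its value
europe['germany'] = 'berlin'

# pop() removes a key and hands back the value it held
removed = europe.pop('australia')

print(europe)
print(removed)
```

`pop()` is an alternative to `del` that is handy when you still need the removed value.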

#### Dictionariception
Remember lists? They could contain anything, even other lists. Well, for dictionaries the same holds. Dictionaries can contain key:value pairs where the values are again dictionaries.

As an example, have a look at the script where another version of `europe` - the dictionary you've been working with all along - is coded. The keys are still the country names, but the values are dictionaries that contain more information than just the capital.

It's perfectly possible to chain square brackets to select elements. To fetch the population for Spain from `europe`, for example, you need: `europe['spain']['population']`

In [None]:
# Dictionary of dictionaries
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# Print out the capital of France
print(europe['france']['capital'])

# Create sub-dictionary data
data = { 'capital':'rome',
        'population':59.83 }

# Add data to europe under key 'italy'
europe['italy'] = data

# Print europe
europe
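
Chained brackets combine well with looping over the outer dictionary. The sketch below, using a shortened copy of `europe`, pulls one field out of every inner dictionary. The names `spain_population` and `capitals_only` are illustrative.

```python
europe = {'spain': {'capital': 'madrid', 'population': 46.77},
          'france': {'capital': 'paris', 'population': 66.03}}

# Chained brackets: first pick the country, then the field inside it
spain_population = europe['spain']['population']

# items() yields each country together with its inner dictionary
capitals_only = {country: info['capital'] for country, info in europe.items()}

print(spain_population)
print(capitals_only)
```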

# Pandas, Part 1

In [None]:
import pandas as pd

#### Dictionary to DataFrame 1
Pandas is an open source library, providing high-performance, easy-to-use data structures and data analysis tools for Python. Sounds promising!

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary.

In the exercises that follow you will be working with vehicle data from different countries. Each observation corresponds to a country and the columns give information about the number of motor vehicles per 1000 people, whether people drive left or right, and so on.

Three lists are defined in the script:

- `names`, containing the country names for which data is available.
- `dr`, a list of booleans that tells whether people drive on the right (`True`) or on the left (`False`) in the corresponding country.
- `cpc`, the number of motor vehicles per 1000 people in the corresponding country.

Each dictionary key is a column label and each value is a list which contains the column elements.

In [None]:
# Pre-defined lists
names = ['United States', 'Australia', 'Japan', 'India', 'Russia', 'Morocco', 'Egypt']
dr =  [True, False, False, False, True, True, True]
cpc = [809, 731, 588, 18, 200, 70, 45]

# Create dictionary my_dict with three key:value pairs: my_dict
my_dict = {'country': names,
           'drives_right': dr,
           'cars_per_cap': cpc}

# Build a DataFrame cars from my_dict: cars
cars = pd.DataFrame(my_dict)

# Print cars
cars
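
To check how the dictionary was translated into a table, you can inspect a few attributes of the result. This is a small sketch on a three-row subset of the data. The attributes `shape`, `columns` and `index` are standard pandas DataFrame attributes.

```python
import pandas as pd

# Each key becomes a column label, each list becomes that column's values
my_dict = {'country': ['United States', 'Australia', 'Japan'],
           'drives_right': [True, False, False],
           'cars_per_cap': [809, 731, 588]}
cars = pd.DataFrame(my_dict)

print(cars.shape)          # (number of rows, number of columns)
print(list(cars.columns))  # column labels, in the order of the dictionary keys
print(list(cars.index))    # default row labels: 0, 1, 2
```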

#### Dictionary to DataFrame 2
Have you noticed that the row labels (i.e. the labels for the different observations) were automatically set to integers from 0 up to 6?

To replace them with more meaningful labels, create a list `row_labels` and use it to set the row labels of the `cars` DataFrame. You do this by assigning the list to the `index` attribute of `cars`, which you can access as `cars.index`.

In [None]:
# Definition of row_labels
row_labels = ['US', 'AUS', 'JPN', 'IN', 'RU', 'MOR', 'EG']

# Specify row labels of cars
cars.index = row_labels

# Print cars again
cars
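
As an alternative sketch, the row labels can also be supplied when the DataFrame is built, through the `index` argument of `pd.DataFrame()`. The example uses a shortened version of the data.

```python
import pandas as pd

my_dict = {'country': ['United States', 'Australia', 'Japan'],
           'cars_per_cap': [809, 731, 588]}
row_labels = ['US', 'AUS', 'JPN']

# Passing index= sets the row labels at construction time
cars = pd.DataFrame(my_dict, index=row_labels)

print(cars)
```

Both routes end with the same labels. Assigning to `cars.index` afterwards is useful when the DataFrame already exists.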

#### CSV to DataFrame 1
Putting data in a dictionary and then building a DataFrame works, but it's not very efficient. What if you're dealing with millions of observations? In those cases, the data is typically available as files with a regular structure. One of those file types is the CSV file, which is short for "comma-separated values".

To import CSV data into Python as a Pandas DataFrame you can use `read_csv()`.

Let's explore this function with the same `cars` data from the previous exercises. This time, however, the data is available in a CSV file, named `cars.csv`. It is available in the data directory.

In [None]:
# Import the cars.csv data: cars
cars = pd.read_csv('../../data/cars.csv')

# Print out cars
cars

#### CSV to DataFrame 2
Your `read_csv()` call to import the CSV data didn't generate an error, but the output is not entirely what we wanted. The row labels were imported as another column without a name.

Remember `index_col`, an argument of `read_csv()`, that you can use to specify which column in the CSV file should be used as a row label? Well, that's exactly what you need here!

Can you make the appropriate changes to fix the data import?

In [None]:
# Fix import by including index_col
cars = pd.read_csv('../../data/cars.csv', index_col=0)

# Print out cars
cars
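
The effect of `index_col` can be tried without a file on disk. The sketch below reads CSV text from an in-memory `io.StringIO` buffer. The two-row content is made up for this example and only mimics the layout described above, with an empty first header cell.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the content is made up for this example
csv_text = """,cars_per_cap,country,drives_right
US,809,United States,True
AUS,731,Australia,False
"""

# Without index_col the row labels end up in an extra column,
# which pandas labels 'Unnamed: 0'
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0 the first column is used as the row labels
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index.columns))
print(list(with_index.index))
```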

# Pandas, Part 2

#### Square Brackets 1
You can index and select Pandas DataFrames in many ways. The simplest, but not the most powerful way, is to use square brackets.


In [None]:
# Print out country column as Pandas Series
cars['country']

In [None]:
# Print out country column as Pandas DataFrame
cars[['country']]

In [None]:
# Print out DataFrame with country and drives_right columns
cars[['country', 'drives_right']]
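
The difference between single and double square brackets is easiest to see by checking the type of the result. This is a minimal sketch on a two-row DataFrame built for the purpose. The names `as_series` and `as_frame` are illustrative.

```python
import pandas as pd

cars = pd.DataFrame({'country': ['United States', 'Japan'],
                     'drives_right': [True, False]},
                    index=['US', 'JAP'])

as_series = cars['country']    # single brackets: one column as a Series
as_frame = cars[['country']]   # double brackets: a DataFrame with one column

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```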

#### Square Brackets 2
Square brackets can do more than just selecting columns. You can also use them to get rows, or observations, from a DataFrame.

In [None]:
# Print out first 3 observations
cars[:3]

In [None]:
# Print out fourth, fifth and sixth observation
cars[3:6]
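
Row slices follow the same rule as list slices: the start position is included and the end position is not. A short sketch on a four-row DataFrame created for this example:

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731, 588, 18]},
                    index=['US', 'AUS', 'JAP', 'IN'])

first_two = cars[:2]   # positions 0 and 1
middle = cars[1:3]     # positions 1 and 2; position 3 is excluded

print(list(first_two.index))
print(list(middle.index))
```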

#### `loc` and `iloc` 1
With `loc` and `iloc` you can do practically any data selection operation on DataFrames you can think of. `loc` is label-based, which means that you specify rows and columns by their row and column labels. `iloc` is integer-position based, so you specify rows and columns by their integer position, as you did for rows in the previous exercise.

In [None]:
# Print out observation for Japan
print(cars.iloc[2])
print(cars.loc['JAP'])

In [None]:
# Print out observations for Australia and Egypt
print(cars.loc[['AUS', 'EG']])
print(cars.iloc[[1, 6]])
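
The following sketch, on a small DataFrame built for this example, shows that `loc` and `iloc` return the same row when the label and the position refer to the same observation, and that a list of labels gives a DataFrame rather than a Series.

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731, 588],
                     'country': ['United States', 'Australia', 'Japan']},
                    index=['US', 'AUS', 'JAP'])

by_label = cars.loc['JAP']     # row selected by its label
by_position = cars.iloc[2]     # same row selected by its position

# A single label gives a Series; a list of labels gives a DataFrame
one_row_frame = cars.loc[['JAP']]

print(by_label.equals(by_position))
print(type(by_label).__name__, type(one_row_frame).__name__)
```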

#### `loc` and `iloc` 2
`loc` and `iloc` also allow you to select both rows and columns from a DataFrame. To experiment, try out the commands below.

In [None]:
# Single row, single column: cars_per_cap value of India
print(cars.loc['IN', 'cars_per_cap'])
print(cars.iloc[3, 0])

In [None]:
# Two rows, single column: cars_per_cap values of India and Russia
print(cars.loc[['IN', 'RU'], 'cars_per_cap'])
print(cars.iloc[[3, 4], 0])

In [None]:
# Two rows, two columns: cars_per_cap and country of India and Russia
print(cars.loc[['IN', 'RU'], ['cars_per_cap', 'country']])
print(cars.iloc[[3, 4], [0, 1]])

In [None]:
# Print out drives_right value of Morocco
print(cars.loc['MOR', 'drives_right'])
print(cars.iloc[5, 2])

In [None]:
# Print sub-DataFrame
print(cars.loc[['RU', 'MOR'], ['country', 'drives_right']])
print(cars.iloc[[4, 5], [1, 2]])
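
The integer positions used with `iloc` depend on the order of the rows and columns, so it can help to look them up instead of counting by hand. The sketch below uses `get_loc()` on the row and column indexes of a two-row DataFrame built for this example. The names `row_pos` and `col_pos` are illustrative.

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [200, 70],
                     'country': ['Russia', 'Morocco'],
                     'drives_right': [True, True]},
                    index=['RU', 'MOR'])

# Translate labels into integer positions
row_pos = cars.index.get_loc('MOR')
col_pos = cars.columns.get_loc('country')

by_label = cars.loc['MOR', 'country']
by_position = cars.iloc[row_pos, col_pos]

print(row_pos, col_pos)
print(by_label, by_position)
```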

#### `loc` and `iloc` 3
It's also possible to select only columns with `loc` and `iloc`. In both cases, you simply put a slice going from beginning to end in front of the comma:

In [None]:
# All rows, country column as Series
print(cars.loc[:, 'country'])
print(cars.iloc[:, 1])

In [None]:
# All rows, country and drives_right columns as DataFrame
print(cars.loc[:, ['country', 'drives_right']])
print(cars.iloc[:, [1, 2]])

In [None]:
# Print out drives_right column as Series
print(cars.loc[:, 'drives_right'])
print(cars.iloc[:, 2])

In [None]:
# Print out drives_right column as DataFrame
print(cars.loc[:, ['drives_right']])
print(cars.iloc[:, [2]])

In [None]:
# Print out cars_per_cap and drives_right as DataFrame
print(cars.loc[:, ['cars_per_cap', 'drives_right']])
print(cars.iloc[:, [0, 2]])
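
To confirm that the label-based and position-based column selections agree, you can compare the results with the DataFrame method `equals()`. This is a sketch on a two-row DataFrame built for this example, with the columns in the order cars_per_cap, country, drives_right.

```python
import pandas as pd

cars = pd.DataFrame({'cars_per_cap': [809, 731],
                     'country': ['United States', 'Australia'],
                     'drives_right': [True, False]},
                    index=['US', 'AUS'])

# The bare colon before the comma means "all rows"
by_label = cars.loc[:, ['cars_per_cap', 'drives_right']]
by_position = cars.iloc[:, [0, 2]]

# Plain square brackets with a list of labels select the same columns
by_brackets = cars[['cars_per_cap', 'drives_right']]

print(by_label.equals(by_position))
print(by_label.equals(by_brackets))
```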