## Slicing and Indexing DataFrames

#### Setting multi-level indexes

Indexes can also be made out of multiple columns, forming a *multi-level index* (sometimes called a *hierarchical index*). There is a trade-off to using these. The benefit is that multi-level indexes make it more natural to reason about nested categorical variables. For example, a clinical trial might have control and treatment groups. Each test subject belongs to one group or the other, so we can say that a test subject is nested inside its group. Similarly, in the temperature dataset, each city is located in a country, so we can say a city is nested inside its country. The main downside is that the code for manipulating indexes is different from the code for manipulating columns, so you have to learn two syntaxes and keep track of how your data is represented.

`pandas` is loaded as `pd`. `temperatures` is available.
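To make the "two syntaxes" point concrete, here is a minimal sketch that selects the same row twice, once through columns and once through a multi-level index. The `toy` DataFrame and its values are hypothetical stand-ins, not rows from the real `temperatures` dataset.

```python
import pandas as pd

# Hypothetical toy data (made-up values, not from temperatures.csv)
toy = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Pakistan", "Pakistan"],
    "city": ["Rio De Janeiro", "Sao Paulo", "Lahore", "Karachi"],
    "avg_temp_c": [25.0, 20.0, 30.0, 28.0],
})

# Column syntax: combine boolean masks on the two columns
by_column = toy[(toy["country"] == "Brazil") & (toy["city"] == "Sao Paulo")]

# Index syntax: move the columns into a multi-level index,
# then pass a list of (country, city) tuples to .loc[]
toy_ind = toy.set_index(["country", "city"])
by_index = toy_ind.loc[[("Brazil", "Sao Paulo")]]

print(by_column)
print(by_index)
```

Both selections return the same measurement. In `by_index`, `country` and `city` are part of the index rather than ordinary columns.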

In [2]:
# importing pandas
import pandas as pd

# importing temperatures dataset
temperatures = pd.read_csv("../datasets/temperatures.csv")
temperatures.head()

,Unnamed: 0,date,city,country,avg_temp_c
0,0,2000-01-01,Abidjan,Côte D'Ivoire,27.293
1,1,2000-02-01,Abidjan,Côte D'Ivoire,27.685
2,2,2000-03-01,Abidjan,Côte D'Ivoire,29.061
3,3,2000-04-01,Abidjan,Côte D'Ivoire,28.162
4,4,2000-05-01,Abidjan,Côte D'Ivoire,27.547


### Instructions

* Set the index of `temperatures` to the `"country"` and `"city"` columns, and assign this to `temperatures_ind`.
* Specify two country/city pairs to keep: `"Brazil"`/`"Rio De Janeiro"` and `"Pakistan"`/`"Lahore"`, assigning to `rows_to_keep`.
* Subset `temperatures_ind` for `rows_to_keep` using `.loc[]`, and print the result.

In [3]:
# Index temperatures by country & city
temperatures_ind = temperatures.set_index(["country", "city"])
temperatures_ind

country,city,Unnamed: 0,date,avg_temp_c
Côte D'Ivoire,Abidjan,0,2000-01-01,27.293
Côte D'Ivoire,Abidjan,1,2000-02-01,27.685
Côte D'Ivoire,Abidjan,2,2000-03-01,29.061
Côte D'Ivoire,Abidjan,3,2000-04-01,28.162
Côte D'Ivoire,Abidjan,4,2000-05-01,27.547
...,...,...,...,...
China,Xian,16495,2013-05-01,18.979
China,Xian,16496,2013-06-01,23.522
China,Xian,16497,2013-07-01,25.251
China,Xian,16498,2013-08-01,24.528
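Because index levels no longer appear among the columns, it can help to check how a DataFrame is currently represented. The sketch below uses a hypothetical two-row `toy` DataFrame (made-up values) to inspect the index and then reverse `set_index()` with `reset_index()`.

```python
import pandas as pd

# Hypothetical toy data (made-up values, not from temperatures.csv)
toy = pd.DataFrame({
    "country": ["Brazil", "Pakistan"],
    "city": ["Rio De Janeiro", "Lahore"],
    "avg_temp_c": [25.0, 30.0],
})
toy_ind = toy.set_index(["country", "city"])

# The index now has two named levels; only avg_temp_c remains a column
print(list(toy_ind.index.names))
print(toy_ind.index.nlevels)
print(list(toy_ind.columns))

# reset_index() turns the index levels back into ordinary columns
restored = toy_ind.reset_index()
print(list(restored.columns))
```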


In [7]:
# List of tuples: Brazil, Rio De Janeiro & Pakistan, Lahore
rows_to_keep = [("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]
rows_to_keep

[('Brazil', 'Rio De Janeiro'), ('Pakistan', 'Lahore')]

In [8]:
# Subset for rows to keep
temperatures_ind.loc[rows_to_keep]

country,city,Unnamed: 0,date,avg_temp_c
Brazil,Rio De Janeiro,12540,2000-01-01,25.974
Brazil,Rio De Janeiro,12541,2000-02-01,26.699
Brazil,Rio De Janeiro,12542,2000-03-01,26.270
Brazil,Rio De Janeiro,12543,2000-04-01,25.750
Brazil,Rio De Janeiro,12544,2000-05-01,24.356
...,...,...,...,...
Pakistan,Lahore,8575,2013-05-01,33.457
Pakistan,Lahore,8576,2013-06-01,34.456
Pakistan,Lahore,8577,2013-07-01,33.279
Pakistan,Lahore,8578,2013-08-01,31.511
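The list of tuples matters here: each tuple names one value per index level. As a hedged illustration on hypothetical toy data (made-up values), the sketch below contrasts that with passing a plain list of outer-level values, which keeps every city in the listed countries.

```python
import pandas as pd

# Hypothetical toy data (made-up values, not from temperatures.csv)
toy_ind = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Pakistan", "Pakistan"],
    "city": ["Rio De Janeiro", "Sao Paulo", "Lahore", "Karachi"],
    "avg_temp_c": [25.0, 20.0, 30.0, 28.0],
}).set_index(["country", "city"]).sort_index()

# A plain list matches the outer level only: all cities in each country
outer_only = toy_ind.loc[["Brazil", "Pakistan"]]

# A list of tuples matches specific (country, city) pairs
pairs = toy_ind.loc[[("Brazil", "Rio De Janeiro"), ("Pakistan", "Lahore")]]

print(outer_only)
print(pairs)
```

Here `outer_only` keeps all four rows, while `pairs` keeps only the two requested country/city combinations.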
