## Activity 05: Indexing, Slicing, and Iterating

To gain useful insights from our dataset, we need to be able to index, slice, and iterate over the data explicitly, for example to compare several countries in terms of population density growth.

After looking at each of these operations, we want to display the population density of Germany, Singapore, United States, and India for the years 1970, 1990, and 2010.

#### Loading the dataset

In [2]:
# importing the necessary dependencies
import pandas as pd

In [4]:
# loading the dataset, using the first column (Country Name) as the index
dataset = pd.read_csv('./data/world_population.csv', index_col=0)

dataset.head()

| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aruba | ABW | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 307.972222 | 312.366667 | 314.983333 | 316.827778 | 318.666667 | 320.622222 | ... | 562.322222 | 563.011111 | 563.422222 | 564.427778 | 566.311111 | 568.85 | 571.783333 | 574.672222 | 577.161111 | NaN |
| Andorra | AND | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 30.587234 | 32.714894 | 34.914894 | 37.170213 | 39.470213 | 41.8 | ... | 180.591489 | 182.161702 | 181.859574 | 179.614894 | 175.161702 | 168.757447 | 161.493617 | 154.86383 | 149.942553 | NaN |
| Afghanistan | AFG | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 14.038148 | 14.312061 | 14.599692 | 14.901579 | 15.218206 | 15.545203 | ... | 39.637202 | 40.634655 | 41.674005 | 42.830327 | 44.127634 | 45.533197 | 46.997059 | 48.444546 | 49.821649 | NaN |
| Angola | AGO | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 4.305195 | 4.384299 | 4.464433 | 4.544558 | 4.624228 | 4.703271 | ... | 15.387749 | 15.915819 | 16.459536 | 17.020898 | 17.600302 | 18.196544 | 18.808215 | 19.433323 | 20.070565 | NaN |
| Albania | ALB | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 60.576642 | 62.456898 | 64.329234 | 66.209307 | 68.058066 | 69.874927 | ... | 108.394781 | 107.566204 | 106.843759 | 106.314635 | 106.013869 | 105.848431 | 105.717226 | 105.60781 | 105.444051 | NaN |


In [1]:
# looking at the first two rows of the dataset
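
One way to fill in this cell is `DataFrame.head(n)`, which returns the first `n` rows. The sketch below runs on a small stand-in frame (`sample`, a hypothetical name) built from a few values in the `head()` output above, so it does not need the CSV file; in the notebook the equivalent call would be `dataset.head(2)`.

```python
import pandas as pd

# Stand-in for `dataset`: three rows and two columns copied from the
# head() output above (assumption: the real frame has the same layout).
sample = pd.DataFrame(
    {
        "Country Code": ["ABW", "AND", "AFG"],
        "2010": [564.427778, 179.614894, 42.830327],
    },
    index=pd.Index(["Aruba", "Andorra", "Afghanistan"], name="Country Name"),
)

# head(2) returns the first two rows as a DataFrame
first_two = sample.head(2)
print(first_two)
```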


---

#### Indexing

Since the task needs specific rows and columns of our dataset, we use indexing to select them.   
We need:
- the row of the USA
- the second-to-last row
- the column of the year 2000 as a Series
- the population density for India in 2000

In [5]:
# indexing the USA row
dataset.loc[["United States"]]

| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| United States | USA | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 20.05588 | 20.366723 | 20.661953 | 20.950959 | 21.214527 | 21.460952 | ... | 32.878611 | 33.243687 | 33.536399 | 33.817936 | 34.077243 | 34.337838 | 34.591983 | 34.863098 | 35.137648 | NaN |


In [7]:
# indexing the second-to-last row by position
dataset.iloc[[-2]]

| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | ... | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zambia | ZMB | Population density (people per sq. km of land ... | EN.POP.DNST | NaN | 4.227724 | 4.359305 | 4.496824 | 4.639914 | 4.788452 | 4.942343 | ... | 17.135926 | 17.641587 | 18.170609 | 18.721585 | 19.294752 | 19.890745 | 20.508866 | 21.148177 | 21.80789 | NaN |


In [8]:
# indexing the column of 2000 as a Series
dataset.loc[:,"2000"]

Country Name
Aruba               504.766667
Andorra             139.146809
Afghanistan          30.177894
Angola               12.078798
Albania             112.738212
                       ...    
Yemen, Rep.          33.704981
South Africa         36.271010
Congo, Dem. Rep.     21.194356
Zambia               14.239121
Zimbabwe             32.312217
Name: 2000, Length: 264, dtype: float64

In [9]:
# indexing the population density of India in 2000 (boolean mask; returns a one-element Series)

dataset.loc[(dataset.index == "India"), "2000"]

Country Name
India    354.326858
Name: 2000, dtype: float64

**Note:**   
Indexing a column with single brackets, as in `dataset["2000"]`, returns a pandas Series, and selecting a single label from that Series returns a scalar.   
Indexing with double brackets, that is, with a list of labels, returns a DataFrame. This way we can also select several rows or columns with one query.

Comparing the outputs of the two queries below shows the difference between the two kinds of result.

In [12]:
# indexing the population density of India in 2000 (single labels; returns a scalar)
dataset["2000"].loc["India"]  # select the "2000" Series, then one value from it


np.float64(354.326858357522)

In [13]:
dataset[["2000"]].loc[["India"]]  # lists of labels return a DataFrame

| Country Name | 2000 |
|---|---|
| India | 354.326858 |
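
To make the difference concrete, the result types can be checked directly. This is a minimal sketch on a stand-in frame (`sample`, a hypothetical name) whose three values are copied from the year-2000 Series output above; the same expressions should behave the same way on `dataset`.

```python
import pandas as pd

# Stand-in for `dataset`; the 2000 values are copied from the Series output above.
sample = pd.DataFrame(
    {"2000": [354.326858, 14.239121, 32.312217]},
    index=pd.Index(["India", "Zambia", "Zimbabwe"], name="Country Name"),
)

column = sample["2000"]                  # single brackets: a Series
value = sample["2000"].loc["India"]      # single labels: a scalar
frame = sample[["2000"]].loc[["India"]]  # lists of labels: a DataFrame

print(type(column).__name__)  # Series
print(type(frame).__name__)   # DataFrame
print(frame.shape)            # (1, 1)
```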


---

#### Slicing

Besides single rows and columns, we also need some subsets of the dataset.   
Here we want the following slices:
- the countries in rows 2 to 5
- the countries Germany, Singapore, United States, and India
- Germany, Singapore, United States, and India with their population density for the years 1970, 1990, and 2010

In [7]:
# slicing countries of rows 2 to 5


In [8]:
# slicing rows Germany, Singapore, United States, and India 


In [9]:
# slicing a subset of Germany, Singapore, United States, and India 
# for years 1970, 1990, 2010
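
One possible way to fill in the three slicing cells is sketched below. It assumes that "rows 2 to 5" counts from 1 and includes both ends (positions 1 through 4), which is an interpretation rather than something the activity states. The frame `sample` is a hypothetical stand-in with the same kind of labels as `dataset`; its numbers are placeholders, not real population densities.

```python
import numpy as np
import pandas as pd

countries = ["Aruba", "Andorra", "Afghanistan", "Angola", "Albania",
             "Germany", "Singapore", "United States", "India"]
years = ["1970", "1990", "2000", "2010"]

# Stand-in for `dataset`: same kind of index and column labels, but the
# numbers are placeholders (0.0, 1.0, 2.0, ...), not real densities.
sample = pd.DataFrame(
    np.arange(len(countries) * len(years), dtype=float).reshape(len(countries), len(years)),
    index=pd.Index(countries, name="Country Name"),
    columns=years,
)

# rows 2 to 5, counted from 1: positions 1..4 (the end of an iloc slice is exclusive)
rows_2_to_5 = sample.iloc[1:5]

# selected countries by label; a list of labels returns the rows in list order
selected = ["Germany", "Singapore", "United States", "India"]
country_rows = sample.loc[selected]

# selected countries and selected years in a single .loc call
subset = sample.loc[selected, ["1970", "1990", "2010"]]

print(rows_2_to_5)
print(country_rows)
print(subset)
```

In the notebook, the same three expressions would be applied to `dataset` instead of `sample`, one per cell.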


---

#### Iterating

As the last task of this activity, we want to iterate over the first three countries of our dataset and print for each one:   
- the name
- the country code
- the population density for the years 1970, 1990, and 2010

In [10]:
# iterating over the first three countries (row by row)


**Note:**   
`iterrows()` returns each row as a Series, so the values of a row share a single dtype and the dtypes of the individual columns are not preserved.   
If you need to preserve the dtypes of the columns, use the `itertuples()` method.
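
The difference can be seen in a small sketch, again on a hypothetical stand-in frame `sample` with values copied from the `head()` output above. It passes `name=None` to `itertuples()` so that plain tuples are returned; with the default namedtuples, column labels that are not valid Python identifiers (such as `2010` or `Country Code`) would be renamed to positional field names.

```python
import pandas as pd

# Stand-in for `dataset`; values copied from the head() output above.
sample = pd.DataFrame(
    {
        "Country Code": ["ABW", "AND", "AFG"],
        "2010": [564.427778, 179.614894, 42.830327],
    },
    index=pd.Index(["Aruba", "Andorra", "Afghanistan"], name="Country Name"),
)

# iterrows(): a row mixing text and numbers ends up in one object-dtype Series
_, first_row = next(sample.iterrows())
print(first_row.dtype)

# itertuples(name=None): plain tuples of (index label, column values...),
# each value taken from its own column
for name, code, density_2010 in sample.itertuples(name=None):
    print(name, code, density_2010)
```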