## Activity 05: Indexing, Slicing, and Iterating

To gain meaningful insights from our dataset, we need to be able to index, slice, and iterate over the data explicitly, for example to compare population density growth across several countries.

After working through each of these operations, we want to display the population density of Germany, Singapore, United States, and India for the years 1970, 1990, and 2010.

#### Loading the dataset

In [1]:
# importing the necessary dependencies
import pandas as pd
import numpy as np


In [2]:
# loading the Dataset
dataset = pd.read_csv('world_population.csv', index_col=0)
dataset.info()

<class 'pandas.core.frame.DataFrame'>
Index: 264 entries, Aruba to Zimbabwe
Data columns (total 60 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Code    264 non-null    object 
 1   Indicator Name  264 non-null    object 
 2   Indicator Code  264 non-null    object 
 3   1960            0 non-null      float64
 4   1961            253 non-null    float64
 5   1962            253 non-null    float64
 6   1963            253 non-null    float64
 7   1964            253 non-null    float64
 8   1965            253 non-null    float64
 9   1966            253 non-null    float64
 10  1967            253 non-null    float64
 11  1968            253 non-null    float64
 12  1969            253 non-null    float64
 13  1970            253 non-null    float64
 14  1971            253 non-null    float64
 15  1972            253 non-null    float64
 16  1973            253 non-null    float64
 17  1974            253 non-null   

In [3]:
# looking at the first two rows of the dataset
dataset.head(2)

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Aruba,ABW,Population density (people per sq. km of land ...,EN.POP.DNST,,307.972222,312.366667,314.983333,316.827778,318.666667,320.622222,...,562.322222,563.011111,563.422222,564.427778,566.311111,568.85,571.783333,574.672222,577.161111,
Andorra,AND,Population density (people per sq. km of land ...,EN.POP.DNST,,30.587234,32.714894,34.914894,37.170213,39.470213,41.8,...,180.591489,182.161702,181.859574,179.614894,175.161702,168.757447,161.493617,154.86383,149.942553,

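Because `index_col=0` turns the first CSV column into the row index, `Country Name` is no longer a regular column. The following is a minimal sketch of that effect, assuming the file has the country name as its first column. The in-memory CSV, the name `sample`, and the reduced column set are stand-ins for the real file; the values are copied from the output above.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for world_population.csv (assumption: same layout,
# with the country name in the first column).
csv_text = """Country Name,Country Code,2010,2015
Aruba,ABW,564.427778,577.161111
Andorra,AND,179.614894,149.942553
"""

sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.index.name)      # name of the column that became the index
print(list(sample.columns))   # 'Country Name' is not among the columns
print(sample.loc['Aruba', 'Country Code'])
```

Rows are therefore addressed by country name through `.loc`, while the country name itself is read from the index rather than from a column.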

---

#### Indexing

To complete the task we need specific rows and columns of the dataset, which we select by indexing.   
We need:
- the row of the USA
- the second to last row
- the column for the year 2000 as a Series
- the population density for India in 2000

In [4]:
# indexing the USA row
dataset.loc[dataset['Country Code'] == 'USA']

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
United States,USA,Population density (people per sq. km of land ...,EN.POP.DNST,,20.05588,20.366723,20.661953,20.950959,21.214527,21.460952,...,32.878611,33.243687,33.536399,33.817936,34.077243,34.337838,34.591983,34.863098,35.137648,


In [3]:
# indexing the second to last row by index

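One possible way to fill this cell is positional indexing with `iloc`, where negative positions count from the end. The sketch below assumes a four-row toy frame (the hypothetical name `toy`) standing in for `dataset`; its values are taken from the slicing output later in this notebook.

```python
import pandas as pd

# Toy stand-in for `dataset`, indexed by country name.
toy = pd.DataFrame(
    {
        'Country Code': ['DEU', 'SGP', 'USA', 'IND'],
        '2010': [234.606908, 7231.811966, 33.817936, 414.0282],
    },
    index=pd.Index(['Germany', 'Singapore', 'United States', 'India'],
                   name='Country Name'),
)

# iloc[-2] selects the second to last row by position and returns a Series.
second_to_last = toy.iloc[-2]
print(second_to_last.name)             # row label of the selected row
print(second_to_last['Country Code'])
```

On the full dataset the equivalent call would be `dataset.iloc[-2]`.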

In [4]:
# indexing the column of 2000 as a Series

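A possible way to fill this cell is single-bracket indexing with one column label. The sketch below uses a two-row toy frame (the hypothetical name `toy`) built from values shown elsewhere in this notebook. Those outputs do not include the year 2000, so the `'2010'` column stands in for `'2000'` here.

```python
import pandas as pd

# Toy stand-in for `dataset`; '2010' plays the role of '2000'.
toy = pd.DataFrame(
    {'1990': [227.517054, 292.817404], '2010': [234.606908, 414.0282]},
    index=pd.Index(['Germany', 'India'], name='Country Name'),
)

# Single brackets with one column label return a Series indexed by country.
col_2010 = toy['2010']
print(type(col_2010).__name__)
print(col_2010['India'])
```

On the full dataset the same pattern would read `dataset['2000']`.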

In [5]:
# indexing the population density of India in 2000 (DataFrame)


**Note:**   
Using single brackets to index a column (as with NumPy) returns a pandas Series.   
Using double brackets returns a DataFrame. This way we can also select several columns with one query.

Comparing the output of the DataFrame query with that of the Series query shows the difference between a Series and a DataFrame.
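
The bracket rule carries over to `.loc`: list selectors keep an axis, scalar labels drop it. The following sketch shows one way the two India queries could look, again on a hypothetical toy frame in which `'2010'` stands in for `'2000'` (the year 2000 values are not shown in this notebook).

```python
import pandas as pd

# Toy stand-in for `dataset`; '2010' plays the role of '2000'.
toy = pd.DataFrame(
    {'1990': [227.517054, 292.817404], '2010': [234.606908, 414.0282]},
    index=pd.Index(['Germany', 'India'], name='Country Name'),
)

# Lists on both axes keep the result two-dimensional (DataFrame).
as_frame = toy.loc[['India'], ['2010']]
# A list of row labels with a single column label gives a Series.
as_series = toy.loc[['India'], '2010']
# Two scalar labels return the bare value.
as_scalar = toy.loc['India', '2010']

print(as_frame.shape)
print(as_series.shape)
print(as_scalar)
```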

In [6]:
# indexing the population density of India in 2000 (Series)


---

#### Slicing

Besides single rows and columns, we also need some subsets of the dataset.   
Here we want the following slices:
- the countries in rows 2 to 5
- the countries Germany, Singapore, United States, and India
- Germany, Singapore, United States, and India with their population density for the years 1970, 1990, and 2010

In [17]:
# slicing countries of rows 2 to 5
dataset.iloc[1:5]

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Andorra,AND,Population density (people per sq. km of land ...,EN.POP.DNST,,30.587234,32.714894,34.914894,37.170213,39.470213,41.8,...,180.591489,182.161702,181.859574,179.614894,175.161702,168.757447,161.493617,154.86383,149.942553,
Afghanistan,AFG,Population density (people per sq. km of land ...,EN.POP.DNST,,14.038148,14.312061,14.599692,14.901579,15.218206,15.545203,...,39.637202,40.634655,41.674005,42.830327,44.127634,45.533197,46.997059,48.444546,49.821649,
Angola,AGO,Population density (people per sq. km of land ...,EN.POP.DNST,,4.305195,4.384299,4.464433,4.544558,4.624228,4.703271,...,15.387749,15.915819,16.459536,17.020898,17.600302,18.196544,18.808215,19.433323,20.070565,
Albania,ALB,Population density (people per sq. km of land ...,EN.POP.DNST,,60.576642,62.456898,64.329234,66.209307,68.058066,69.874927,...,108.394781,107.566204,106.843759,106.314635,106.013869,105.848431,105.717226,105.60781,105.444051,

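The slice `iloc[1:5]` returns four rows because positional slices exclude the stop position, whereas label slices with `loc` include the stop label. A minimal sketch of the difference follows, assuming a toy frame (hypothetical name `toy`) that holds the first five countries in the order shown in the outputs above.

```python
import pandas as pd

# Toy stand-in for the first five rows of `dataset`.
toy = pd.DataFrame(
    {'Country Code': ['ABW', 'AND', 'AFG', 'AGO', 'ALB']},
    index=pd.Index(['Aruba', 'Andorra', 'Afghanistan', 'Angola', 'Albania'],
                   name='Country Name'),
)

# iloc slices by position and excludes the stop position:
# positions 1, 2, 3, 4 are rows 2 to 5 when counting from one.
by_position = toy.iloc[1:5]

# loc slices by label and includes the stop label.
by_label = toy.loc['Andorra':'Albania']

print(list(by_position.index))
print(by_position.equals(by_label))
```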

In [19]:
# slicing rows Germany, Singapore, United States, and India
# for years 1970, 1990, 2010 (rows and columns in a single .loc call)
dataset.loc[['Germany', 'Singapore', 'United States', 'India'], ['1970', '1990', '2010']]

Country Name,1970,1990,2010
Germany,223.897371,227.517054,234.606908
Singapore,3096.268657,4547.958209,7231.811966
United States,22.388131,27.254514,33.817936
India,186.312757,292.817404,414.0282


In [13]:
# slicing a subset of Germany, Singapore, United States, and India 
dataset.loc[['Germany', 'Singapore', 'United States', 'India']]

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Germany,DEU,Population density (people per sq. km of land ...,EN.POP.DNST,,210.172807,212.029284,214.001527,215.731495,217.57997,219.403406,...,235.943362,235.522178,234.939637,234.606908,234.67315,230.750625,235.647997,232.347794,233.583362,
Singapore,SGP,Population density (people per sq. km of land ...,EN.POP.DNST,,2540.895522,2612.238806,2679.104478,2748.656716,2816.268657,2887.164179,...,6602.300719,6913.422857,7125.104286,7231.811966,7363.193182,7524.6983,7636.721358,7736.526167,7828.857143,
United States,USA,Population density (people per sq. km of land ...,EN.POP.DNST,,20.05588,20.366723,20.661953,20.950959,21.214527,21.460952,...,32.878611,33.243687,33.536399,33.817936,34.077243,34.337838,34.591983,34.863098,35.137648,
India,IND,Population density (people per sq. km of land ...,EN.POP.DNST,,154.275864,157.424902,160.679256,164.029246,167.470047,170.995768,...,396.774384,402.621463,408.376922,414.0282,419.564848,424.994581,430.345479,435.657171,440.957533,


---

#### Iterating

As the last task of this activity, we want to iterate over the first three countries of our dataset and print:   
- name
- country code 
- years 1970, 1990, 2010 

In [28]:
# iterating over the first three countries (row by row)
# 'Country Name' is the index (index_col=0), so it arrives as `index`, not in `row`
for index, row in dataset.head(3).iterrows():
    country_name = index
    country_code = row['Country Code']
    year_1970 = row['1970']
    year_1990 = row['1990']
    year_2010 = row['2010']
    print(country_name, country_code, year_1970, year_1990, year_2010)

**Note:**   
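`iterrows()` yields an `(index, Series)` pair for each row. A Series has a single dtype, so a row that mixes strings and floats, as in this dataset, is returned with dtype `object` and the column dtypes are not preserved.   
If you need to preserve the dtypes of the columns, use the `itertuples()` method, which returns each row as a namedtuple.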
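
The difference can be sketched on a small frame. The two-row `toy` frame below is a hypothetical stand-in for `dataset`, using values from the slicing output above.

```python
import pandas as pd

# Toy stand-in for `dataset`, indexed by country name.
toy = pd.DataFrame(
    {
        'Country Code': ['DEU', 'SGP'],
        '1970': [223.897371, 3096.268657],
        '2010': [234.606908, 7231.811966],
    },
    index=pd.Index(['Germany', 'Singapore'], name='Country Name'),
)

# iterrows(): each row is a Series; mixing strings and floats gives dtype object.
for name, row in toy.iterrows():
    print(name, row.dtype)

# itertuples(): each row is a namedtuple whose fields keep their column's type.
# Labels such as 'Country Code' or '1970' are not valid Python identifiers, so
# pandas renames those fields positionally (_1, _2, ...); unpacking avoids that.
for record in toy.itertuples():
    country, code, y1970, y2010 = record
    print(country, code, type(y1970).__name__)
```

Unpacking the tuple by position is a design choice for this dataset, since the year columns cannot be accessed as attributes.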