## Activity 05: Indexing, Slicing, and Iterating

To gain useful insights from our dataset, we need to be able to index, slice, and iterate over the data explicitly, for example to compare several countries in terms of population density growth.   

After looking at each of these operations, we want to display the countries Germany, Singapore, United States, and India with their population density for the years 1970, 1990, and 2010.

#### Loading the dataset

In [2]:
# importing the necessary dependencies
import pandas as pd

In [62]:
# loading the Dataset
dataset = pd.read_csv('./data/world_population.csv', index_col=0)
df = dataset
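
To illustrate what `index_col=0` does in the call above, here is a minimal sketch. It assumes a hypothetical two-row excerpt held in memory (`csv_text`) instead of the real `world_population.csv`, with values copied from the outputs shown in this activity.

```python
import io
import pandas as pd

# Hypothetical excerpt for illustration; the real file has many more rows and columns.
csv_text = """Country Name,Country Code,2000
Aruba,ABW,504.766667
Andorra,AND,139.146809
"""

# Without index_col, pandas creates a default integer index.
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column (Country Name) becomes the row index.
named_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.index.tolist())
print(named_index.index.tolist())
print(named_index.columns.tolist())  # the year header is read as the string '2000'
```

Using the country name as the index is what makes label-based lookups such as `df.loc['United States']` possible later on.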

In [63]:
# looking at the first 2 rows of the dataset
# (both forms are equivalent; the notebook displays only the last expression)

df[:2]

# or
df.head(2)

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Aruba,ABW,Population density (people per sq. km of land ...,EN.POP.DNST,,307.972222,312.366667,314.983333,316.827778,318.666667,320.622222,...,562.322222,563.011111,563.422222,564.427778,566.311111,568.85,571.783333,574.672222,577.161111,
Andorra,AND,Population density (people per sq. km of land ...,EN.POP.DNST,,30.587234,32.714894,34.914894,37.170213,39.470213,41.8,...,180.591489,182.161702,181.859574,179.614894,175.161702,168.757447,161.493617,154.86383,149.942553,


---

#### Indexing

Since the task needs several specific rows and columns of our dataset, we use indexing to select them.   
We need:
- the row of the USA
- the second-to-last row
- the column of year 2000 as a Series
- the population density for India in 2000

In [64]:
# indexing the USA row

usa = df.loc['United States'] # using label index
print(usa.tolist())

['USA', 'Population density (people per sq. km of land area)', 'EN.POP.DNST', nan, 20.0558797068663, 20.3667228593639, 20.6619528854804, 20.9509594975849, 21.2145265401312, 21.4609518984688, 21.6959130731, 21.9136233808205, 22.1288224863958, 22.3881314035655, 22.6729890729952, 22.9170124118896, 23.1367971909474, 23.3491575462716, 23.5805156917379, 23.8056504231922, 24.0462890983256, 24.3024317171382, 24.5721129909946, 24.8090394542612, 25.0537178893674, 25.293701468289, 25.526042258073, 25.7480106911702, 25.9771851825972, 26.2183697712404, 26.4537676766794, 26.6950614480247, 26.9483653165862, 27.2545136128993, 27.6211491261017, 28.0068916121481, 28.3786587123429, 28.7288076375484, 29.0729515141457, 29.4131648134723, 29.7694279699879, 30.1184850681737, 30.4663411566379, 30.7973013298523, 31.1036283879362, 31.3935499327652, 31.6645346171981, 31.9589450682826, 32.2548765979183, 32.5673998463204, 32.8786113609374, 33.2436868537795, 33.5363992251367, 33.8179358770014, 34.0772433101355, 34.3

In [65]:
# indexing the second-to-last row by position

df.iloc[[-2]]

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Zambia,ZMB,Population density (people per sq. km of land ...,EN.POP.DNST,,4.227724,4.359305,4.496824,4.639914,4.788452,4.942343,...,17.135926,17.641587,18.170609,18.721585,19.294752,19.890745,20.508866,21.148177,21.80789,
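
Negative positions with `.iloc` return different objects depending on how they are written. The sketch below uses a small hypothetical frame (`toy`) with three 2010 values copied from the outputs in this activity; it is an illustration, not the full dataset.

```python
import pandas as pd

# Hypothetical three-row frame for illustration.
toy = pd.DataFrame(
    {"2010": [179.614894, 18.721585, 36.122262]},
    index=pd.Index(["Andorra", "Zambia", "Zimbabwe"], name="Country Name"),
)

second_to_last_series = toy.iloc[-2]   # single position -> Series for that row
second_to_last_frame = toy.iloc[[-2]]  # list of positions -> one-row DataFrame
last_two = toy.iloc[-2:]               # slice -> DataFrame with the last two rows

print(type(second_to_last_series).__name__, second_to_last_series.name)
print(type(second_to_last_frame).__name__, second_to_last_frame.shape)
print(last_two.index.tolist())
```

The list form is convenient when the result should keep the tabular layout of the original DataFrame.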


In [66]:
# indexing the column of 2000 as a Series

series2000 = df['2000']
print(series2000)
print(type(series2000))

Country Name
Aruba               504.766667
Andorra             139.146809
Afghanistan          30.177894
Angola               12.078798
Albania             112.738212
                       ...    
Yemen, Rep.          33.704981
South Africa         36.271010
Congo, Dem. Rep.     21.194356
Zambia               14.239121
Zimbabwe             32.312217
Name: 2000, Length: 264, dtype: float64
<class 'pandas.core.series.Series'>
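
The year columns are addressed with the string `'2000'`, not the integer `2000`, because the CSV header is read as text. A minimal sketch of the difference, assuming a hypothetical two-row frame (`toy`) with values copied from the output above:

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
toy = pd.DataFrame(
    {"Country Code": ["ABW", "AND"], "2000": [504.766667, 139.146809]},
    index=pd.Index(["Aruba", "Andorra"], name="Country Name"),
)

by_string = toy["2000"]  # works: the column label is the string '2000'

try:
    toy[2000]            # the integer 2000 is not a column label here
    int_lookup_worked = True
except KeyError:
    int_lookup_worked = False

print(by_string.name, type(by_string.name).__name__)
print("integer lookup worked:", int_lookup_worked)
```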


In [68]:
# indexing the population density of India in 2000 (DataFrame)

pop_den_ind_2000 = df.loc[['India']][['2000']]
print(pop_den_ind_2000)
print(type(pop_den_ind_2000))

                    2000
Country Name            
India         354.326858
<class 'pandas.core.frame.DataFrame'>


**Note:**   
Indexing columns with single brackets (as with NumPy) returns a pandas Series object.   
Indexing with double brackets, that is, passing a list of labels, returns a DataFrame. This also lets us select several rows or columns in one query.

Comparing the output of the DataFrame query above with the Series query below shows the difference between a Series and a DataFrame.

In [69]:
# indexing the population density of India in 2000 (Series)

pop_den_ind_2000 = df.loc[['India']]['2000']
print(type(pop_den_ind_2000))

<class 'pandas.core.series.Series'>
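
The same selections can also be written as a single `.loc[rows, columns]` call, which avoids chaining two indexing steps. The sketch below assumes a hypothetical two-row frame (`toy`) with values copied from the outputs in this activity, and shows how the bracket style determines the return type.

```python
import pandas as pd

# Hypothetical two-row frame for illustration.
toy = pd.DataFrame(
    {"2000": [504.766667, 354.326858], "2010": [564.427778, 414.0282]},
    index=pd.Index(["Aruba", "India"], name="Country Name"),
)

as_frame = toy.loc[["India"], ["2000"]]  # lists on both axes -> 1x1 DataFrame
as_series = toy.loc[["India"], "2000"]   # list of rows, single column label -> Series
as_scalar = toy.loc["India", "2000"]     # two plain labels -> single value

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, len(as_series))
print(as_scalar)
```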


---

#### Slicing

Besides single rows and columns, we also need some subsets of the dataset.   
Here we want the following slices:
- the countries in rows 2 to 5
- the countries Germany, Singapore, United States, and India
- Germany, Singapore, United States, and India with their population density for the years 1970, 1990, and 2010

In [58]:
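# slicing rows by position: df[2:4] returns positions 2 and 3 (the end position is excluded)
# extend the end position, e.g. df[2:6], to include positions 2 through 5

slice2_5 = df[2:4]
slice2_5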

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Afghanistan,AFG,Population density (people per sq. km of land ...,EN.POP.DNST,,14.038148,14.312061,14.599692,14.901579,15.218206,15.545203,...,39.637202,40.634655,41.674005,42.830327,44.127634,45.533197,46.997059,48.444546,49.821649,
Angola,AGO,Population density (people per sq. km of land ...,EN.POP.DNST,,4.305195,4.384299,4.464433,4.544558,4.624228,4.703271,...,15.387749,15.915819,16.459536,17.020898,17.600302,18.196544,18.808215,19.433323,20.070565,
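
Positional slices exclude their end position, while label slices with `.loc` include the end label. A minimal sketch of both behaviours, assuming a hypothetical five-row frame (`toy`) that follows the country order shown in the outputs of this activity:

```python
import pandas as pd

# Hypothetical five-row frame for illustration.
toy = pd.DataFrame(
    {"Country Code": ["ABW", "AND", "AFG", "AGO", "ALB"]},
    index=pd.Index(
        ["Aruba", "Andorra", "Afghanistan", "Angola", "Albania"],
        name="Country Name",
    ),
)

positions_2_3 = toy[2:4]                       # positions 2 and 3, end excluded
rows_2_to_5 = toy.iloc[1:5]                    # 2nd to 5th row when counting from 1
by_label = toy.loc["Afghanistan":"Albania"]    # label slice, end label included

print(positions_2_3.index.tolist())
print(rows_2_to_5.index.tolist())
print(by_label.index.tolist())
```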


In [70]:
# slicing rows Germany, Singapore, United States, and India 

slice_spec = df.loc[['Germany','Singapore','United States','India']]
slice_spec

Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Germany,DEU,Population density (people per sq. km of land ...,EN.POP.DNST,,210.172807,212.029284,214.001527,215.731495,217.57997,219.403406,...,235.943362,235.522178,234.939637,234.606908,234.67315,230.750625,235.647997,232.347794,233.583362,
Singapore,SGP,Population density (people per sq. km of land ...,EN.POP.DNST,,2540.895522,2612.238806,2679.104478,2748.656716,2816.268657,2887.164179,...,6602.300719,6913.422857,7125.104286,7231.811966,7363.193182,7524.6983,7636.721358,7736.526167,7828.857143,
United States,USA,Population density (people per sq. km of land ...,EN.POP.DNST,,20.05588,20.366723,20.661953,20.950959,21.214527,21.460952,...,32.878611,33.243687,33.536399,33.817936,34.077243,34.337838,34.591983,34.863098,35.137648,
India,IND,Population density (people per sq. km of land ...,EN.POP.DNST,,154.275864,157.424902,160.679256,164.029246,167.470047,170.995768,...,396.774384,402.621463,408.376922,414.0282,419.564848,424.994581,430.345479,435.657171,440.957533,


In [71]:
# slicing a subset of Germany, Singapore, United States, and India 
# for years 1970, 1990, 2010

slice_spec = df.loc[['Germany','Singapore','United States','India']]
sliced = slice_spec[['1970','1990','2010']]
sliced

Country Name,1970,1990,2010
Germany,223.897371,227.517054,234.606908
Singapore,3096.268657,4547.958209,7231.811966
United States,22.388131,27.254514,33.817936
India,186.312757,292.817404,414.0282
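
Rows and columns can also be selected together in one `.loc` call, and the resulting subset can be used to compare density growth between countries. The sketch below rebuilds the table above as a hypothetical frame (`toy`); the growth calculation is an illustrative addition, not part of the original activity.

```python
import pandas as pd

countries = ["Germany", "Singapore", "United States", "India"]
years = ["1970", "1990", "2010"]

# Hypothetical frame holding the values from the table above.
toy = pd.DataFrame(
    {
        "Country Code": ["DEU", "SGP", "USA", "IND"],
        "1970": [223.897371, 3096.268657, 22.388131, 186.312757],
        "1990": [227.517054, 4547.958209, 27.254514, 292.817404],
        "2010": [234.606908, 7231.811966, 33.817936, 414.0282],
    },
    index=pd.Index(countries, name="Country Name"),
)

# Select rows and columns in a single step.
subset = toy.loc[countries, years]

# Relative growth in population density from 1970 to 2010, in percent.
growth_pct = (subset["2010"] / subset["1970"] - 1) * 100
fastest = growth_pct.idxmax()

print(growth_pct.round(1))
print("Largest relative growth:", fastest)
```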


---

#### Iterating

As the last task of this activity, we want to iterate over the first three countries of our dataset and print:   
- name
- country code 
- years 1970, 1990, 2010 

In [79]:
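# iterating over the first three countries (row by row)

first3 = df[:3][['Country Code','1970','1990','2010']]

# access values by column label; integer keys such as row[0] rely on a positional fallback that recent pandas versions deprecate
for index, row in first3.iterrows():
    print('Name:', index, '\tCountry code:', row['Country Code'], '\tYears- 1970:', row['1970'], '1990:', row['1990'], '2010:', row['2010'])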

Name: Aruba 	Country code: ABW 	Years- 1970: 328.138888888889 1990: 345.266666666667 2010: 564.427777777778
Name: Andorra 	Country code: AND 	Years- 1970: 51.6574468085106 1990: 115.98085106383 2010: 179.614893617021
Name: Afghanistan 	Country code: AFG 	Years- 1970: 17.034428514536 1990: 18.4841619949147 2010: 42.8303265631223


**Note:**   
`iterrows()` yields each row as a Series. Because a Series has a single dtype, the column dtypes are not preserved across the row.   
If you need to preserve the dtypes of the columns, use the `itertuples()` method.
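
As a minimal sketch of the `itertuples()` alternative, the example below assumes a hypothetical three-row frame (`first3`) with values copied from the output above. Column labels such as `Country Code` and `1970` are not valid Python identifiers, so named tuples would rename them to positional names; passing `name=None` yields plain tuples that can be unpacked directly.

```python
import pandas as pd

# Hypothetical frame for illustration, using values from the output above.
first3 = pd.DataFrame(
    {
        "Country Code": ["ABW", "AND", "AFG"],
        "1970": [328.138888888889, 51.6574468085106, 17.034428514536],
        "2010": [564.427777777778, 179.614893617021, 42.8303265631223],
    },
    index=pd.Index(["Aruba", "Andorra", "Afghanistan"], name="Country Name"),
)

# iterrows(): a row mixing text and numbers becomes an object-dtype Series.
_, row = next(first3.iterrows())
row_dtype = row.dtype

# itertuples(name=None): plain tuples of (index, column values...).
lines = []
for name, code, d1970, d2010 in first3.itertuples(name=None):
    lines.append(f"Name: {name}\tCountry code: {code}\t1970: {d1970:.2f}\t2010: {d2010:.2f}")

print(row_dtype)
print("\n".join(lines))
```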