## Subsetting with the `.iloc` indexer

The `.iloc` indexer lets you do positional indexing, whereas `.loc` requires that you index by label (name). We could pass integers to `.loc` earlier only because the row index labels of the `gapminder` DataFrame happen to be integers.
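
To see the distinction when labels and positions disagree, here is a minimal sketch on a small made-up DataFrame. The name `df`, its columns `a` and `b`, and the index labels 10, 20, 30 are hypothetical and not part of gapminder. Because its integer labels do not start at 0, `.loc` and `.iloc` pick different rows for the same number.

```python
import pandas as pd

# Hypothetical toy DataFrame whose integer index labels (10, 20, 30)
# differ from its row positions (0, 1, 2)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}, index=[10, 20, 30])

# .loc looks rows up by label: label 20 is the second row
by_label = df.loc[20, "a"]

# .iloc looks rows and columns up by position: row position 2 is the
# third row, and column position 0 is column "a"
by_position = df.iloc[2, 0]

# No row carries the label 2, so .loc raises a KeyError here
try:
    df.loc[2]
    label_2_exists = True
except KeyError:
    label_2_exists = False

print(by_label)        # 2
print(by_position)     # 3
print(label_2_exists)  # False
```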

Let's take a quick look at the gapminder dataset (notice that the row index consists of integers):

In [1]:
# import pandas
import pandas as pd
# load the gapminder dataset
gapminder = pd.read_csv('data/gapminder.csv')
# take a look at the head of gapminder
gapminder.head()

|   | country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| 1 | Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.85303 |
| 2 | Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.10071 |
| 3 | Afghanistan | Asia | 1967 | 34.02 | 11537966 | 836.197138 |
| 4 | Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |


As a reminder, the following code uses the `.loc` indexer to select the rows with index labels 1, 4, and 5 and the columns labeled `'year'`, `'lifeExp'`, and `'pop'`:

In [2]:
# use .loc[rows, columns] to subset the rows labeled 1, 4, 5 and the columns 'year', 'lifeExp', 'pop' from gapminder
gapminder.loc[[1, 4, 5],['year', 'lifeExp', 'pop']]

|   | year | lifeExp | pop |
|---|---|---|---|
| 1 | 1957 | 30.332 | 9240934 |
| 4 | 1972 | 36.088 | 13079460 |
| 5 | 1977 | 38.438 | 14880372 |


To emphasize that the integers used above are row index labels rather than positions, let's define the `gapminder_country` DataFrame, which uses the `country` column as its row index:

In [3]:
# create gapminder_country dataframe with country as index
gapminder_country = gapminder.set_index('country')
# look at the head of gapminder_country
gapminder_country.head(10)

| country | continent | year | lifeExp | pop | gdpPercap |
|---|---|---|---|---|---|
| Afghanistan | Asia | 1952 | 28.801 | 8425333 | 779.445314 |
| Afghanistan | Asia | 1957 | 30.332 | 9240934 | 820.85303 |
| Afghanistan | Asia | 1962 | 31.997 | 10267083 | 853.10071 |
| Afghanistan | Asia | 1967 | 34.02 | 11537966 | 836.197138 |
| Afghanistan | Asia | 1972 | 36.088 | 13079460 | 739.981106 |
| Afghanistan | Asia | 1977 | 38.438 | 14880372 | 786.11336 |
| Afghanistan | Asia | 1982 | 39.854 | 12881816 | 978.011439 |
| Afghanistan | Asia | 1987 | 40.822 | 13867957 | 852.395945 |
| Afghanistan | Asia | 1992 | 41.674 | 16317921 | 649.341395 |
| Afghanistan | Asia | 1997 | 41.763 | 22227415 | 635.341351 |


### Positional indexing with `.iloc`

Now let's try to extract rows 1, 4, and 5 and the columns `'year'`, `'lifeExp'`, and `'pop'` from this country-indexed version of gapminder:

In [4]:
# Try to use .loc[rows, columns] to subset the rows labeled 1, 4, 5 and the columns 'year', 'lifeExp', 'pop' from gapminder_country
gapminder_country.loc[[1, 4, 5],['year', 'lifeExp', 'pop']]

KeyError: "None of [Index([1, 4, 5], dtype='object', name='country')] are in the [index]"

We get a `KeyError` because there are no longer any rows whose index labels are 1, 4, or 5: the row labels are now country names.


If you want to do positional indexing, this is where the `.iloc` indexer comes in.
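
A minimal sketch of the contrast, assuming a hypothetical four-row stand-in for `gapminder_country` named `mini`, built from the Afghanistan rows shown earlier. With country names as row labels, integer row selectors fail in `.loc` but work as positions in `.iloc`; on the column axis, `.iloc` also takes positions rather than names.

```python
import pandas as pd

# Hypothetical miniature stand-in for gapminder_country, using values from
# the first four gapminder rows above; country becomes the row index
mini = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 4,
        "year": [1952, 1957, 1962, 1967],
        "lifeExp": [28.801, 30.332, 31.997, 34.02],
    }
).set_index("country")

# .loc treats 1 and 3 as row labels, and no row is labeled 1 or 3
try:
    mini.loc[[1, 3], ["year", "lifeExp"]]
    loc_failed = False
except KeyError:
    loc_failed = True

# .iloc treats 1 and 3 as row positions; the columns are positions too
# (year is at position 0 and lifeExp at position 1 once country is the index)
subset = mini.iloc[[1, 3], [0, 1]]

print(loc_failed)  # True
print(subset)
```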

The following code subsets the country-indexed gapminder dataset to the row at position 2 and the column at position 2 (remember that Python is 0-indexed, so these are actually the third row and the third column). Recall also that `country` is now the index rather than a column, so column position 0 is `continent` and column position 2 is `lifeExp`.

In [6]:
# Use .iloc to extract the 3rd row and 3rd column (i.e., index position of 2) of gapminder_country using positional indexing
gapminder_country.iloc[2, 2]

31.997
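
When you need only a single value, pandas also provides `.iat`, a positional accessor limited to one row/column pair. The sketch below assumes a hypothetical three-row frame named `frame`, built from the first three gapminder rows with `country` as the index, so it mirrors the lookup above.

```python
import pandas as pd

# Hypothetical frame built from the first three gapminder rows shown above,
# with country as the index as in gapminder_country
frame = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Asia"],
        "year": [1952, 1957, 1962],
        "lifeExp": [28.801, 30.332, 31.997],
    },
    index=pd.Index(["Afghanistan"] * 3, name="country"),
)

# Row position 2, column position 2 -> the 1962 lifeExp value
value_iloc = frame.iloc[2, 2]
# .iat performs the same single-cell lookup and accepts only integer positions
value_iat = frame.iat[2, 2]

print(value_iloc, value_iat)  # 31.997 31.997
```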

You can subset to multiple rows and columns by providing a list of row/column positions to the corresponding entry of the `.iloc` indexer. The code below subsets to the rows at positions 2, 5, and 7 and the columns at positions 1 and 3 (remember that Python is 0-indexed, so these are the third, sixth, and eighth rows and the second and fourth columns).

In [7]:
# Use .iloc to extract rows in position 2, 5, 7, and the columns in position 1 and 3
gapminder_country.iloc[[2, 5, 7], [1, 3]]

| country | year | pop |
|---|---|---|
| Afghanistan | 1962 | 10267083 |
| Afghanistan | 1977 | 14880372 |
| Afghanistan | 1987 | 13867957 |
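
The shape of the result depends on whether you pass integers or lists to each axis. A minimal sketch, assuming a hypothetical frame named `frame` built from a few `year` and `pop` values in the tables above:

```python
import pandas as pd

# Hypothetical frame built from the first three gapminder rows (year, pop)
frame = pd.DataFrame(
    {"year": [1952, 1957, 1962], "pop": [8425333, 9240934, 10267083]},
    index=pd.Index(["Afghanistan"] * 3, name="country"),
)

# Integers on both axes -> a single scalar value
scalar = frame.iloc[1, 1]
# An integer row and a list of columns -> a Series holding one row
one_row = frame.iloc[1, [0, 1]]
# Lists on both axes -> a DataFrame, even when each list has one element
one_cell_frame = frame.iloc[[1], [1]]

print(scalar)                         # 9240934
print(type(one_row).__name__)         # Series
print(type(one_cell_frame).__name__)  # DataFrame
print(one_cell_frame.shape)           # (1, 1)
```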


### Using `:` to mean all rows/columns

We can also use `:` as a placeholder for "all rows" or "all columns". The code below selects the columns at positions 0, 3, and 5 of the original `gapminder` DataFrame and *all* rows:

In [8]:
# Use .iloc to extract all rows for columns 0, 3, 5
gapminder.iloc[:, [0, 3, 5]]

|   | country | lifeExp | gdpPercap |
|---|---|---|---|
| 0 | Afghanistan | 28.801 | 779.445314 |
| 1 | Afghanistan | 30.332 | 820.853030 |
| 2 | Afghanistan | 31.997 | 853.100710 |
| 3 | Afghanistan | 34.020 | 836.197138 |
| 4 | Afghanistan | 36.088 | 739.981106 |
| ... | ... | ... | ... |
| 1699 | Zimbabwe | 62.351 | 706.157306 |
| 1700 | Zimbabwe | 60.377 | 693.420786 |
| 1701 | Zimbabwe | 46.809 | 792.449960 |
| 1702 | Zimbabwe | 39.989 | 672.038623 |
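
A `:` works on either axis, and leaving out the column part entirely means the same as writing `:` there. A minimal sketch, assuming a hypothetical three-row frame named `frame` with the same leading column order as gapminder:

```python
import pandas as pd

# Hypothetical three-row frame with gapminder's first four columns in order
frame = pd.DataFrame(
    {
        "country": ["Afghanistan"] * 3,
        "continent": ["Asia"] * 3,
        "year": [1952, 1957, 1962],
        "lifeExp": [28.801, 30.332, 31.997],
    }
)

# All rows, columns at positions 0 and 3
cols_only = frame.iloc[:, [0, 3]]
# Rows at positions 0 and 2, all columns, spelled with an explicit ':'
rows_colon = frame.iloc[[0, 2], :]
# The same selection with the column part omitted
rows_only = frame.iloc[[0, 2]]

print(list(cols_only.columns))       # ['country', 'lifeExp']
print(rows_colon.equals(rows_only))  # True
```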


### More general sequences with `start:stop:step`

Finally, you can extract more general sequences of rows/columns using `start:stop` slicing syntax. The slice `0:20` corresponds to the integers from 0 up to, but not including, 20 (so it actually stops at 19).

`start:stop:step`, e.g., `0:20:2`, similarly corresponds to the integers from 0 up to 20 (exclusive) in steps of 2, so 0, 2, 4, 6, ..., 18.
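
These are the same slicing rules as plain Python sequences, so you can check which positions a slice covers without pandas at all. A minimal sketch using a list of 20 positions and `range`:

```python
# Plain Python: a slice start:stop:step selects the same positions as
# range(start, stop, step), and stop itself is never included
positions = list(range(20))

first_twenty = positions[0:20]
every_second = positions[0:20:2]

print(first_twenty[-1])                       # 19
print(every_second)                           # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(every_second == list(range(0, 20, 2)))  # True
```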

So the following code extracts every second row from position 0 through position 18, and the columns at positions 0 through 3 (that is, the first four columns):

In [9]:
# Use .iloc to extract rows 0, 2, 4, 6, ..., 18 (inclusive) for columns 0 to 3 (inclusive)
# Hint: start:stop:step
gapminder.iloc[0:20:2, 0:4]

|   | country | continent | year | lifeExp |
|---|---|---|---|---|
| 0 | Afghanistan | Asia | 1952 | 28.801 |
| 2 | Afghanistan | Asia | 1962 | 31.997 |
| 4 | Afghanistan | Asia | 1972 | 36.088 |
| 6 | Afghanistan | Asia | 1982 | 39.854 |
| 8 | Afghanistan | Asia | 1992 | 41.674 |
| 10 | Afghanistan | Asia | 2002 | 42.129 |
| 12 | Albania | Europe | 1952 | 55.23 |
| 14 | Albania | Europe | 1962 | 64.82 |
| 16 | Albania | Europe | 1972 | 67.69 |
| 18 | Albania | Europe | 1982 | 70.42 |


### Exercise

Use `.iloc` to extract every third row, starting at position 2 and stopping before position 100, along with the first two columns.

In [10]:
# start:stop:step
gapminder.iloc[2:100:3, 0:2]

|   | country | continent |
|---|---|---|
| 2 | Afghanistan | Asia |
| 5 | Afghanistan | Asia |
| 8 | Afghanistan | Asia |
| 11 | Afghanistan | Asia |
| 14 | Albania | Europe |
| 17 | Albania | Europe |
| 20 | Albania | Europe |
| 23 | Albania | Europe |
| 26 | Algeria | Africa |
| 29 | Algeria | Africa |
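
One way to check an answer like this is to compare the selected positions with `range(2, 100, 3)`, which yields 2, 5, 8, ..., 98 (33 positions, since 100 itself is excluded). A minimal sketch, assuming a hypothetical 120-row frame named `frame` whose column `a` simply stores each row's position:

```python
import pandas as pd

# Hypothetical stand-in: 120 rows, 3 columns; column "a" holds each row's position
frame = pd.DataFrame({"a": range(120), "b": range(120), "c": range(120)})

subset = frame.iloc[2:100:3, 0:2]

# Positions 2, 5, 8, ..., 98: the stop value 100 is excluded
expected_positions = list(range(2, 100, 3))

print(len(expected_positions))                     # 33
print(subset.shape)                                # (33, 2)
print(subset["a"].tolist() == expected_positions)  # True
```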
