
In [None]:
# Run this cell only if you are using Google Colab: it removes any old copy of the data and clones a fresh one
!rm -r Pandas
!git clone https://github.com/Wuebbelt/Pandas.git

rm: cannot remove 'Pandas': No such file or directory
Cloning into 'Pandas'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (66/66), done.
remote: Total 77 (delta 12), reused 75 (delta 10), pack-reused 0
Unpacking objects: 100% (77/77), done.


# Selecting Subsets of Data from DataFrames with `iloc`

The `iloc` indexer works much like the `loc` indexer, but it makes its subset selections using only **integer location**. The name `iloc` stands for integer location, which should help you remember what it does.

## Simultaneous row and column subset selection

The `iloc` indexer is capable of making simultaneous row and column selections just like `loc`. Selection with `iloc` takes on the following form with a comma separating the row and column selections.

```python
df.iloc[rows, cols]
```

Let's read in some sample data and then begin making selections with integer location using `iloc`.

In [None]:
import pandas as pd
df = pd.read_csv('Pandas/sample_data.csv', index_col=0)
df

name,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


### Use a list for both rows and columns

Let's select rows with integer location 2 and 4 along with the first and last columns. It is possible to use negative integers in the same manner as Python lists. The integer location -1 refers to the last column below.

In [None]:
rows = [2, 4]
cols = [0, -1]
df.iloc[rows, cols]

name,state,score
Aaron,FL,9.0
Dean,AK,1.8
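
As a quick check of the negative-integer behavior described above, the sketch below rebuilds the sample data inline (an assumption, so that the snippet runs without the CSV file) and compares `-1` with the equivalent non-negative position. The names `by_negative` and `by_positive` are illustrative only.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

last = df.shape[1] - 1  # integer location of the last column

by_negative = df.iloc[[2, 4], [0, -1]]    # -1 counts from the end
by_positive = df.iloc[[2, 4], [0, last]]  # same column, counted from the start

# Both spellings select the same rows and columns
print(by_negative.equals(by_positive))
```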


### The possible types of selections for `iloc`

In the above example, we used a list of integers for both the row and column selection. You are not limited to just lists. All of the following are valid objects available for both row and column selections with `iloc`. Unlike `loc`, the `iloc` indexer does not accept a boolean Series, although it does accept a plain boolean array or list; this chapter covers only the integer-based forms.

* A single integer
* A list of integers
* A slice with integers

### Slice the rows and use a list for the columns

Let's use slice notation to select rows with integer location 2 and 3 and a list to select columns with integer location 4 and 2. Notice that the stop integer location is **excluded** with `iloc`, which is exactly how slicing works with Python lists, tuples, and strings. Slicing with `loc` is **inclusive** of the stop label.

In [None]:
cols = [4, 2]
df.iloc[2:4, cols]

name,height,food
Aaron,120,Mango
Penelope,80,Apple
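
To make the difference in stop behavior concrete, the sketch below slices the same rows once by integer location and once by label. It uses an inline copy of the sample data (an assumption for self-containment), and the names `by_position` and `by_label` are illustrative.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

# iloc: the stop location 4 (Dean) is excluded
by_position = df.iloc[2:4]

# loc: the stop label 'Dean' is included
by_label = df.loc["Aaron":"Dean"]

print(list(by_position.index))
print(list(by_label.index))
```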


### Use a list for the rows and a slice for the columns

In this example, we use a list for the row selection and slice notation for the columns.

In [None]:
rows = [5, 2, 4]
df.iloc[rows, 3:]

name,age,height,score
Christina,33,172,9.5
Aaron,12,120,9.0
Dean,32,180,1.8
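
Note that the rows above come back in the order given in the list, not in their original order. The sketch below checks this on an inline copy of the sample data (an assumption for self-containment) and also shows that a location may be repeated; `picked` and `repeated` are hypothetical names.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

# Rows are returned in list order; the slice 3: keeps columns from location 3 to the end
picked = df.iloc[[5, 2, 4], 3:]

# The same location can appear more than once in the list
repeated = df.iloc[[2, 2]]

print(list(picked.index), list(picked.columns))
print(list(repeated.index))
```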


### Select all of the rows and some of the columns

You can use an empty slice (just the colon) to select all of the rows or columns. In the example below, we select all of the rows and some of the columns with a list.

In [None]:
cols = [2, 4]
df.iloc[:, cols]

name,food,height
Jane,Steak,165
Niko,Lamb,70
Aaron,Mango,120
Penelope,Apple,80
Dean,Cheese,180
Christina,Melon,172
Cornelia,Beans,150


### Cannot do this with *just the brackets*
*Just the brackets* does select columns, but it only understands **labels** and not **integer location**. The following produces an error as pandas is looking for column names that are the integers 2 and 4.

In [None]:
df[cols]

KeyError: ignored
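
If you do want to use *just the brackets* with integer locations, one possible workaround is to translate the locations into labels first through `df.columns`. The sketch below, on an inline copy of the sample data (an assumption for self-containment), reproduces the error and then shows that the translated selection matches `iloc`. The names `names`, `via_labels`, and `via_iloc` are illustrative.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

cols = [2, 4]

# The brackets look for columns literally named 2 and 4, which do not exist
try:
    df[cols]
    raised = False
except KeyError:
    raised = True

# Translate integer locations into column labels, then select by label
names = list(df.columns[cols])
via_labels = df[names]
via_iloc = df.iloc[:, cols]

print(raised, names, via_labels.equals(via_iloc))
```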

### Select some of the rows and all of the columns

We can again use an empty slice, but do so for the columns to select all of them. Below we use a list to select some of the rows.

In [None]:
rows = [-3, -1, -2]
df.iloc[rows, :]

name,state,color,food,age,height,score
Dean,AK,gray,Cheese,32,180,1.8
Cornelia,TX,red,Beans,69,150,2.2
Christina,TX,black,Melon,33,172,9.5


It is possible to rewrite the above without the column selection. pandas defaults to selecting all of the columns if a selection for them is not explicitly present.

In [None]:
df.iloc[rows]

name,state,color,food,age,height,score
Dean,AK,gray,Cheese,32,180,1.8
Cornelia,TX,red,Beans,69,150,2.2
Christina,TX,black,Melon,33,172,9.5
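
The equivalence of the two spellings can be verified directly. This is a small sketch on an inline copy of the sample data (an assumption for self-containment); `explicit` and `implicit` are hypothetical names.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

rows = [-3, -1, -2]

explicit = df.iloc[rows, :]  # empty slice selects every column
implicit = df.iloc[rows]     # column selection omitted entirely

print(explicit.equals(implicit))
```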


### Select a single row and a single column

We can select a single value in our DataFrame using `iloc` by providing a single integer for both the row and column selection. This returns the actual value by itself completely outside of a DataFrame or Series.

In [None]:
df.iloc[3, 2]

'Apple'

### Select a single row and a single column as a DataFrame

It is possible to select the above value as a DataFrame by using one-item lists for the row and column selections. The output looks a little bizarre, but it's just a DataFrame with one row and one column.

In [None]:
rows = [3]
cols = [2]
df.iloc[rows, cols]

name,food
Penelope,Apple
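
The difference between the scalar and the one-by-one DataFrame is easiest to see by inspecting the returned objects. The sketch below does so on an inline copy of the sample data (an assumption for self-containment); `value` and `frame` are illustrative names.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

value = df.iloc[3, 2]      # two single integers: the bare value
frame = df.iloc[[3], [2]]  # two one-item lists: a 1 x 1 DataFrame

print(repr(value))
print(type(frame).__name__, frame.shape)
```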


### Select some rows and a single column

In this example, a list of integers is used for the rows and a single integer for the columns. pandas returns a Series when a single integer is used for exactly one of the row or column selections; when both are single integers, it returns the bare value, as shown earlier.

In [None]:
rows = [2, 3, 5]
cols = 4
df.iloc[rows, cols]

name
Aaron        120
Penelope      80
Christina    172
Name: height, dtype: int64

### Select a single row or column as a DataFrame and NOT a Series
You can select a single row (or column) and return a DataFrame and not a Series if you use a list to make the selection. Let's replicate the selection from the previous example, but use a one-item list for the column selection.

In [None]:
rows = [2, 3, 5]
cols = [4]
df.iloc[rows, cols]

name,height
Aaron,120
Penelope,80
Christina,172
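
The two previous selections hold the same data but differ in type and dimensionality. The following sketch compares them on an inline copy of the sample data (an assumption for self-containment); `as_series` and `as_frame` are hypothetical names.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

rows = [2, 3, 5]

as_series = df.iloc[rows, 4]   # single integer for the column: Series
as_frame = df.iloc[rows, [4]]  # one-item list for the column: DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```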


### Select a single row as a Series

We can select a single row by providing a single integer as the row selection for `iloc`. We use an empty slice to select all of the columns. Because we are selecting a single row, a Series is returned. Just as with `loc`, the returned output can be confusing as the original horizontal row is now displayed vertically.

In [None]:
df.iloc[2, :]

state        FL
color       red
food      Mango
age          12
height      120
score         9
Name: Aaron, dtype: object

To maintain the original orientation, we can select the row using a one-item list which returns a DataFrame.

In [None]:
df.iloc[[2], :]

name,state,color,food,age,height,score
Aaron,FL,red,Mango,12,120,9.0
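
Besides orientation, there is a second difference worth knowing: a row pulled out as a Series has to hold strings and numbers together, so it gets the `object` dtype, while the one-row DataFrame keeps each column's original dtype. The sketch below checks this on an inline copy of the sample data (an assumption for self-containment); `row_series` and `row_frame` are illustrative names.

```python
import pandas as pd

# Inline copy of the sample data so the snippet does not depend on the CSV file
df = pd.DataFrame(
    [["NY", "blue", "Steak", 30, 165, 4.6],
     ["TX", "green", "Lamb", 2, 70, 8.3],
     ["FL", "red", "Mango", 12, 120, 9.0],
     ["AL", "white", "Apple", 4, 80, 3.3],
     ["AK", "gray", "Cheese", 32, 180, 1.8],
     ["TX", "black", "Melon", 33, 172, 9.5],
     ["TX", "red", "Beans", 69, 150, 2.2]],
    columns=["state", "color", "food", "age", "height", "score"],
    index=pd.Index(["Jane", "Niko", "Aaron", "Penelope", "Dean",
                    "Christina", "Cornelia"], name="name"),
)

row_series = df.iloc[2, :]   # single integer: Series indexed by column name
row_frame = df.iloc[[2], :]  # one-item list: DataFrame with one row

print(row_series.dtype)                     # one dtype for the mixed values
print(row_frame.dtypes.equals(df.dtypes))   # per-column dtypes are preserved
```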


## Summary of `iloc`

The `iloc` indexer is analogous to `loc` but only uses **integer location** for selection. The official pandas documentation refers to it as selection by **position**.

* Uses only integer location
* Selects rows and columns simultaneously with `df.iloc[rows, cols]`
* Selection can be
    * a single integer
    * a list of integers
    * a slice of integers
* A comma separates row and column selections

## Exercises

Read in the movie dataset by executing the cell below and use it for the following exercises.

In [None]:
pd.set_option('display.max_columns', 50)
movie = pd.read_csv('Pandas/movie.csv', index_col='title')
movie.head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8


### Exercise 1

<span  style="color:green; font-size:16px">Select the columns with integer location 10, 5, and 1.</span>

### Exercise 2

<span  style="color:green; font-size:16px">Select the rows with integer location 10, 5, and 1.</span>

### Exercise 3

<span  style="color:green; font-size:16px">Select rows with integer location 100 to 104 along with the column integer location 5.</span>

### Exercise 4

<span  style="color:green; font-size:16px">Select the value at row integer location 100 and column integer location 4.</span>

### Exercise 5

<span  style="color:green; font-size:16px">Return the result of exercise 4 as a DataFrame.</span>

### Exercise 6

<span  style="color:green; font-size:16px">Select the last 5 rows of the last 5 columns.</span>

### Exercise 7

<span  style="color:green; font-size:16px">Select every 25th row between rows with integer location 100 and 200 and every fifth column.</span>

### Exercise 8

<span  style="color:green; font-size:16px">Select the column with integer location 7 as a Series.</span>

### Exercise 9

<span  style="color:green; font-size:16px">Select the rows with integer location 999, 99, and 9 and the columns with integer location 9 and 19.</span>