## Focusing only on [], .loc, and .iloc
#### There are many ways to select subsets of data, but this article covers only the square brackets ([]), .loc and .iloc. Collectively, they are called the indexers, and they are by far the most common ways to select data. A different part of this series will discuss a few methods that can also be used to make subset selections.

#### If you have a DataFrame, df, your subset selection will look something like the following:

### df[ ]
### df.loc[ ]
### df.iloc[ ]
#### A real subset selection will have something inside of the square brackets. All selections in this article will take place inside of those square brackets.

#### Notice that the square brackets also follow .loc and .iloc. All indexing in Python happens inside of these square brackets.

In [12]:
import pandas as pd
df = pd.read_csv('movie_sample_data.csv', index_col=0)
df

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2
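
If you don't have the CSV file at hand, the same table can be rebuilt inline. This is a minimal sketch that copies the values from the output above; the variable names `data` and `names` are my own, not part of the original notebook.

```python
import pandas as pd

# Rebuild the sample data inline; the values are copied from the table above.
data = {
    "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
    "color": ["blue", "green", "red", "white", "gray", "black", "red"],
    "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
    "age": [30, 2, 12, 4, 32, 33, 69],
    "height": [165, 70, 120, 80, 180, 172, 150],
    "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
}
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]

# The names become the row labels (the index), mirroring index_col=0.
df = pd.DataFrame(data, index=names)
print(df.shape)  # (7, 6)
```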


In [14]:
index = df.index
columns = df.columns
values = df.values

In [15]:
index

Index(['Jane', 'Niko', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'], dtype='object')

In [16]:
columns

Index(['state', 'color', 'food', 'age', 'height', 'score'], dtype='object')

In [17]:
values

array([['NY', 'blue', 'Steak', 30, 165, 4.6],
       ['TX', 'green', 'Lamb', 2, 70, 8.3],
       ['FL', 'red', 'Mango', 12, 120, 9.0],
       ['AL', 'white', 'Apple', 4, 80, 3.3],
       ['AK', 'gray', 'Cheese', 32, 180, 1.8],
       ['TX', 'black', 'Melon', 33, 172, 9.5],
       ['TX', 'red', 'Beans', 69, 150, 2.2]], dtype=object)

In [20]:
# Selecting multiple columns with just the indexing operator
df[['state','age','height','score']]

Unnamed: 0,state,age,height,score
Jane,NY,30,165,4.6
Niko,TX,2,70,8.3
Aaron,FL,12,120,9.0
Penelope,AL,4,80,3.3
Dean,AK,32,180,1.8
Christina,TX,33,172,9.5
Cornelia,TX,69,150,2.2


## Select a single row as a Series with .loc

In [23]:
# The .loc indexer returns a single row as a Series when given a single row label.
# Let's select the row for Niko.
df.loc['Niko']

state        TX
color     green
food       Lamb
age           2
height       70
score       8.3
Name: Niko, dtype: object

## Select multiple rows as a DataFrame with .loc

To select multiple rows, put all the row labels you want to select in a list and pass that to .loc. 
Let's select Niko and Penelope.



In [24]:
df.loc[['Niko','Penelope']]

Unnamed: 0,state,color,food,age,height,score
Niko,TX,green,Lamb,2,70,8.3
Penelope,AL,white,Apple,4,80,3.3


## Use slice notation to select a range of rows with .loc
It is possible to 'slice' the rows of a DataFrame with .loc by using slice notation. Slice notation uses a colon to separate start, stop and step values. 
For instance we can select all the rows from Niko through Dean like this:

In [30]:
df.loc['Niko':'Dean']
# .loc includes the last value with slice notation.
# Notice that the row labeled Dean was kept.
# In other data containers, such as Python lists, the last value is excluded.

Unnamed: 0,state,color,food,age,height,score
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
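
The difference between the inclusive .loc slice and an ordinary list slice is easy to check side by side. A small sketch, assuming a cut-down copy of the sample data that I call `demo` here (a made-up name, with only the age column):

```python
import pandas as pd

names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
demo = pd.DataFrame({"age": [30, 2, 12, 4, 32, 33, 69]}, index=names)

# Label slice with .loc: both endpoints are included.
by_label = demo.loc["Niko":"Dean"]
print(list(by_label.index))  # ['Niko', 'Aaron', 'Penelope', 'Dean']

# The equivalent slice on a plain Python list stops before the end position.
start, stop = names.index("Niko"), names.index("Dean")
by_position = names[start:stop]
print(by_position)  # ['Niko', 'Aaron', 'Penelope']
```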


## Other slices
You can use slice notation similarly to how you use it with lists. 
Let's slice from the beginning through Aaron:

In [31]:
df.loc[:'Aaron']

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0


In [32]:
# Slice from Niko to Christina stepping by 2:
df.loc['Niko':'Christina':2]

Unnamed: 0,state,color,food,age,height,score
Niko,TX,green,Lamb,2,70,8.3
Penelope,AL,white,Apple,4,80,3.3
Christina,TX,black,Melon,33,172,9.5


In [33]:
df.loc['Dean':]

Unnamed: 0,state,color,food,age,height,score
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


## Selecting rows and columns simultaneously with .loc
Unlike just the indexing operator, it is possible to select rows and columns simultaneously with .loc. You do it by separating your row and column selections by a comma. It will look something like this:

df.loc[row_selection, column_selection]

## Select two rows and three columns
For instance, if we wanted to select the rows Jane and Dean along with the columns age, height and score, we would do this:

In [35]:
df.loc[['Jane','Dean'],['age','height','score']]

Unnamed: 0,age,height,score
Jane,30,165,4.6
Dean,32,180,1.8


## Use any combination of selections for either row or columns for .loc
Row or column selections can be any of the following as we have already seen:

- A single label
- A list of labels
- A slice with labels

We can use any of these three for either row or column selections with .loc. Let's see some examples.

Let's select two rows and a single column:

In [37]:
df.loc[['Jane','Aaron'],'food']

Jane     Steak
Aaron    Mango
Name: food, dtype: object

In [38]:
# Select a slice of rows and a list of columns:
df.loc['Jane':'Dean',['food','age']]

Unnamed: 0,food,age
Jane,Steak,30
Niko,Lamb,2
Aaron,Mango,12
Penelope,Apple,4
Dean,Cheese,32


In [40]:
# Select a single row and a single column. This returns a scalar value.
df.loc['Jane','age']

30

In [41]:
# Select a slice of rows and columns
df.loc[:'Dean','age':]

Unnamed: 0,age,height,score
Jane,30,165,4.6
Niko,2,70,8.3
Aaron,12,120,9.0
Penelope,4,80,3.3
Dean,32,180,1.8


## Selecting all of the rows and some columns
It is possible to select all of the rows by using a single colon. 
You can then select columns as normal:

In [45]:
df.loc[:,['food','color']]
# Just the indexing operator gives the same result: df[['food','color']]

Unnamed: 0,food,color
Jane,Steak,blue
Niko,Lamb,green
Aaron,Mango,red
Penelope,Apple,white
Dean,Cheese,gray
Christina,Melon,black
Cornelia,Beans,red


In [47]:
# You can also use this notation to select all of the columns:
df.loc['Jane':'Penelope',:]

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3


In [48]:
df.loc[['Jane','Aaron'],:]

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Aaron,FL,red,Mango,12,120,9.0


In [49]:
# But the trailing colon isn't necessary, as we have seen, so you can leave it out:
df.loc[['Jane','Aaron']]

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Aaron,FL,red,Mango,12,120,9.0


## Assign row and column selections to variables
It might be easier to assign row and column selections to variables before you use .loc. 
This is useful if you are selecting many rows or columns:

In [50]:
rows = ['Jane','Niko','Dean','Cornelia']
cols = ['color','food','age','height']

In [52]:
df.loc[rows,cols]

Unnamed: 0,color,food,age,height
Jane,blue,Steak,30,165
Niko,green,Lamb,2,70
Dean,gray,Cheese,32,180
Cornelia,red,Beans,69,150


## Summary of .loc
- Only uses labels
- Can select rows and columns simultaneously
- Selection can be a single label, a list of labels or a slice of labels
- Put a comma between row and column selections
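
The kind of object you get back from .loc depends on which of these selections you combine. A short sketch of that, assuming a three-row copy of the sample data named `demo` (my own name for illustration):

```python
import pandas as pd

demo = pd.DataFrame(
    {"food": ["Steak", "Lamb", "Mango"], "age": [30, 2, 12]},
    index=["Jane", "Niko", "Aaron"],
)

scalar = demo.loc["Jane", "age"]              # single label, single label
row = demo.loc["Niko"]                        # single row label -> Series
column = demo.loc[:, "food"]                  # all rows, one column -> Series
table = demo.loc[["Jane", "Aaron"], ["age"]]  # two lists -> DataFrame

print(scalar)                 # 30
print(type(row).__name__)     # Series
print(type(column).__name__)  # Series
print(type(table).__name__)   # DataFrame
```

Note that a list with one label still returns a DataFrame (or a Series, when selecting from a Series), while a bare label reduces the dimension by one.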

# Getting started with .iloc
The .iloc indexer is very similar to .loc but only uses integer locations to make its selections. 
The name .iloc itself stands for integer location, which should help with remembering what it does.

## Selecting a single row with .iloc
By passing a single integer to .iloc, it will select one row as a Series:

In [54]:
df.iloc[0]  # returns the first row

state        NY
color      blue
food      Steak
age          30
height      165
score       4.6
Name: Jane, dtype: object

## Selecting multiple rows with .iloc
Use a list of integers to select multiple rows:

In [55]:
df.iloc[[5,2,4]]    # note the inner list; df.iloc[5, 2, 4] raises an error

Unnamed: 0,state,color,food,age,height,score
Christina,TX,black,Melon,33,172,9.5
Aaron,FL,red,Mango,12,120,9.0
Dean,AK,gray,Cheese,32,180,1.8


## Use slice notation to select a range of rows with .iloc
Slice notation works just like it does for a list in this instance and is exclusive of the last element.

In [56]:
df.iloc[3:5]

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8


## Selecting rows and columns simultaneously with .iloc
Just like with .loc, any combination of a single integer, a list of integers or a slice can be used
to select rows and columns simultaneously. Just remember to separate the selections with a comma.

Select two rows and two columns:

In [57]:
df.iloc[[2,4],[2,4]]

Unnamed: 0,food,height
Aaron,Mango,120
Dean,Cheese,180


In [58]:
# Select a slice of the rows and a slice of the columns:
df.iloc[2:5,1:4]

Unnamed: 0,color,food,age
Aaron,red,Mango,12
Penelope,white,Apple,4
Dean,gray,Cheese,32


In [60]:
# Select a single row and column
df.iloc[2,2]

'Mango'

In [61]:
# Select all the rows and a single column
df.iloc[:,2]

Jane          Steak
Niko           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object
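
Since every row and column has both a label and an integer location, it can help to translate between the two. A minimal sketch, assuming a cut-down copy of the sample data named `demo` (a name I made up for illustration); `Index.get_loc` is the pandas method that returns the integer position of a label:

```python
import pandas as pd

names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
demo = pd.DataFrame(
    {
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "height": [165, 70, 120, 80, 180, 172, 150],
    },
    index=names,
)

row_pos = demo.index.get_loc("Aaron")   # integer position of the row label
col_pos = demo.columns.get_loc("food")  # integer position of the column label
print(row_pos, col_pos)  # 2 0

# Same cell, addressed by label and by integer location.
by_label = demo.loc["Aaron", "food"]
by_position = demo.iloc[row_pos, col_pos]
print(by_label == by_position)  # True
```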

# Deprecation of .ix
Early in the development of pandas, there existed another indexer, .ix. This indexer was capable of selecting both by label and by integer location. While it was versatile, it caused lots of confusion because it was not explicit. Sometimes integers can also be labels for rows or columns, so there were instances where a selection was ambiguous.

.ix was deprecated in pandas 0.20 and removed entirely in pandas 1.0, so it no longer works in current versions. Use .loc or .iloc instead.
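
The ambiguity shows up whenever the row labels are themselves integers. Here is a small sketch with hypothetical data (the frame `demo` and its shuffled integer index are made up for illustration) showing how the two explicit indexers disagree about what `2` means:

```python
import pandas as pd

# Hypothetical data: the row labels are integers that do not match positions.
demo = pd.DataFrame({"food": ["Steak", "Lamb", "Mango"]}, index=[2, 0, 1])

by_label = demo.loc[2, "food"]  # the row whose label is 2
by_position = demo.iloc[2, 0]   # the third row by position

print(by_label)     # Steak
print(by_position)  # Mango
```

With .loc and .iloc the reader never has to guess which of the two meanings was intended.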

# Selecting subsets of Series
Typically, you will create a Series by selecting a single column from a DataFrame. Let's select the food column:

In [67]:
food = df['food']
print(type(food))
food

<class 'pandas.core.series.Series'>


Jane          Steak
Niko           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

# Series selection with .loc
Series selection with .loc is quite simple, since we are only dealing with a single dimension. You can again use a single row label, a list of row labels or a slice of row labels to make your selection. Let's see several examples.

Let's select a single value:

In [74]:
food.loc['Aaron']

'Mango'

In [76]:
# Select three different values. This returns a Series:
food.loc[['Dean','Niko','Christina']]

Dean         Cheese
Niko           Lamb
Christina     Melon
Name: food, dtype: object

In [70]:
# Slice from Niko to Christina - is inclusive of last index
food.loc['Niko':'Christina']

Niko           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Name: food, dtype: object

In [71]:
# Slice from Penelope to the end:
food.loc['Penelope':]

Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

In [72]:
# Select a single value in a list which returns a Series
food.loc[['Aaron']]

Aaron    Mango
Name: food, dtype: object

# Series selection with .iloc
Series subset selection with .iloc happens similarly to .loc except it uses integer location. You can use a single integer, a list of integers or a slice of integers. Let's see some examples.

Select a single value:

In [78]:
food.iloc[0]

'Steak'

In [81]:
# Use a list of integers to select multiple values:
food.iloc[[4,1,3]]

Dean        Cheese
Niko          Lamb
Penelope     Apple
Name: food, dtype: object

In [83]:
# Use a slice - is exclusive of last integer
food.iloc[4:6]

Dean         Cheese
Christina     Melon
Name: food, dtype: object

## Using just the indexing operator to select rows from a DataFrame - Confusing!
Above, I used just the indexing operator to select a column or columns from a DataFrame. But, it can also be used to select rows using a slice. This behavior is very confusing in my opinion. The entire operation changes completely when a slice is passed.

Let's use an integer slice as our first example:



In [84]:
df[3:6]

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5


In [85]:
df['Aaron':'Christina']

Unnamed: 0,state,color,food,age,height,score
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5


# I recommend not doing this!
This feature is not deprecated, and it is completely up to you whether you wish to use it. But I strongly prefer not to select rows in this manner, as it can be ambiguous, especially if you have integers in your index.

Using .iloc and .loc is explicit and clearly tells the person reading the code what is going to happen. 
Let's rewrite the above using .iloc and .loc.

In [86]:
df.iloc[3:6]

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5


In [87]:
df.loc['Aaron':'Christina']

Unnamed: 0,state,color,food,age,height,score
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
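
To see why I call the bare-bracket row slice ambiguous, consider an index made of integers. This is a sketch with hypothetical data (the frame `demo` is not part of the sample file), and it reflects the behavior of the pandas versions I am aware of, where an integer slice inside [] is treated as positional:

```python
import pandas as pd

# Hypothetical data: integer row labels that start at 1 rather than 0.
demo = pd.DataFrame({"age": [30, 2, 12, 4]}, index=[1, 2, 3, 4])

bare = demo[1:3]["age"].tolist()
positional = demo.iloc[1:3]["age"].tolist()
labeled = demo.loc[1:3]["age"].tolist()

print(bare)        # [2, 12]      positions 1 and 2
print(positional)  # [2, 12]      same rows, stated explicitly
print(labeled)     # [30, 2, 12]  labels 1 through 3, inclusive
```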


## Cannot simultaneously select rows and columns with []
An exception will be raised if you try and select rows and columns simultaneously with just the indexing operator. You must use .loc or .iloc to do so.

In [93]:
# df[3:6, 'Aaron':'Christina']  raises an error
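
A runnable way to see this for yourself is to wrap the attempt in try/except. This is a sketch under the assumption of a one-column copy of the sample data named `demo` (a made-up name); I catch the generic `Exception` because the exact exception class has varied between pandas versions:

```python
import pandas as pd

names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
demo = pd.DataFrame({"age": [30, 2, 12, 4, 32, 33, 69]}, index=names)

try:
    demo[3:6, "age"]  # rows and columns together are not supported by []
    raised = False
except Exception as exc:  # the exact class depends on the pandas version
    raised = True
    print(type(exc).__name__)

# The supported spelling uses .loc (or .iloc) with a comma.
subset = demo.loc["Penelope":"Christina", "age"]
print(subset.tolist())  # [4, 32, 33]
```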

# Using just the indexing operator to select rows from a Series!
You can also use just the indexing operator with a Series. Again, this is confusing because it can accept integers or labels. Let's see some examples.

### Again, I recommend against doing this and always using .iloc or .loc

In [94]:
food

Jane          Steak
Niko           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

In [95]:
food[2:4]

Aaron       Mango
Penelope    Apple
Name: food, dtype: object

In [96]:
food['Niko':'Dean']

Niko          Lamb
Aaron        Mango
Penelope     Apple
Dean        Cheese
Name: food, dtype: object

In [97]:
food['Dean']

'Cheese'

In [98]:
food[['Dean','Aaron','Niko','Penelope']]

Dean        Cheese
Aaron        Mango
Niko          Lamb
Penelope     Apple
Name: food, dtype: object
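
The confusion is easiest to see when the labels of a Series are integers. A small sketch with a hypothetical Series `s` (not taken from the sample data) whose labels start at 1:

```python
import pandas as pd

# Hypothetical Series whose labels are integers starting at 1.
s = pd.Series(["Steak", "Lamb", "Mango"], index=[1, 2, 3])

bare = s[1]             # [] treats 1 as a label here
by_label = s.loc[1]     # explicitly by label
by_position = s.iloc[1] # explicitly by position

print(bare)         # Steak
print(by_label)     # Steak
print(by_position)  # Lamb
```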

## Summary of Part 1
### We covered an incredible amount of ground. Let's summarize all the main points:

Before learning pandas, ensure you have the fundamentals of Python

Always refer to the documentation when learning new pandas operations

The DataFrame and the Series are the containers of data

A DataFrame is two-dimensional, tabular data

A Series is a single dimension of data

The three components of a DataFrame are the index, the columns and the data (or values)

Each row and column of the DataFrame is referenced by both a label and an integer location

There are three primary ways to select subsets from a DataFrame - [], .loc and .iloc

I use the term just the indexing operator to refer to [] immediately following a DataFrame/Series

Just the indexing operator's primary purpose is to select a column or columns from a DataFrame

Passing a single column name to just the indexing operator returns a single column of data as a Series

Passing multiple columns in a list to just the indexing operator returns a DataFrame

A Series has two components, the index and the data (values). It has no columns

.loc makes selections only by label

.loc can simultaneously select rows and columns

.loc can make selections with either a single label, a list of labels, or a slice of labels

.loc makes row selections first followed by column selections: df.loc[row_selection, col_selection]

.iloc is analogous to .loc but uses only integer location to refer to rows or columns.

.ix is deprecated (and removed as of pandas 1.0) and should never be used

.loc and .iloc work the same for Series except they only select based on the index, as there are no columns

Pandas combines the power of Python lists (selection via integer location) and dictionaries (selection by label)

You can use just the indexing operator to select rows from a DataFrame, but I recommend against this and suggest sticking with the explicit .loc and .iloc

Normally data is imported without setting an index. Use the set_index method to use a column as an index.

You can select a single column as a Series from a DataFrame with dot notation
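
The last two points can be sketched in a few lines. The frame `raw` and its column names are hypothetical stand-ins for data imported without an index; `set_index` is the pandas method named above:

```python
import pandas as pd

# Hypothetical frame imported without an index: pandas assigns 0, 1, 2, ...
raw = pd.DataFrame(
    {"name": ["Jane", "Niko", "Aaron"], "food": ["Steak", "Lamb", "Mango"]}
)

# set_index returns a new DataFrame that uses the 'name' column as row labels.
indexed = raw.set_index("name")
niko_food = indexed.loc["Niko", "food"]
print(niko_food)  # Lamb

# Dot notation selects a single column as a Series, like indexed['food'].
same = indexed.food.equals(indexed["food"])
print(same)  # True
```

Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.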

In [None]:
# Check exercise in the next part