# Pandas Essentials:  Selecting Subsets of Data, Part II

This pandas notebook explores additional options for selecting subsets of data. Concepts are illustrated with the [New York City pizza restaurant inspection data](https://github.com/ecerami/pydata-essentials/blob/master/pandas/data/NYC_Pizza_2017.csv).

Topics include:

* Selecting rows and columns with `iloc`, `loc` and `ix`.

Along the way, we take two detours that make `iloc`, `loc` and `ix` easier to understand:

* Detour 1:  Selecting Slices of Arrays
* Detour 2:  Resetting the Data Frame Index

In [2]:
# To get started, we load the NYC Pizza Restaurant Inspection Data Set
import pandas as pd
pizza_df = pd.read_csv("data/NYC_Pizza_2017.csv")

# Set max display columns and rows (for more compact view)
pd.options.display.max_columns = 4
pd.options.display.max_rows = 6

## Detour 1:  Selecting Slices of Arrays

`iloc`, `loc` and `ix` use the NumPy `start:stop:step` notation for selecting slices of data.  Here, we illustrate the basics.

In [4]:
# Create a NumPy array of 10 random integers in the range 0..99.
import numpy as np
np.random.seed(100)
x = np.random.randint(0, 100, 10)
print(x)

[ 8 24 67 87 79 48 10 94 52 98]


In [5]:
# Get the zeroth (first) element
x[0]

8

In [6]:
# Get the Last element
x[-1]

98

In [7]:
# Select a Slice --> selects indices 0..5 (the stop index, 6, is excluded)
x[0:6]

array([ 8, 24, 67, 87, 79, 48])

In [8]:
# Select a Slice with a Step --> selects indices 0, 2, 4
x[0:6:2]

array([ 8, 67, 79])

In [9]:
# Omitting Index Example #1: no start --> selects indices 0..5
x[:6]

array([ 8, 24, 67, 87, 79, 48])

In [10]:
# Omitting Index Example #2: no stop --> selects indices 6..9
x[6:]

array([10, 94, 52, 98])

In [11]:
# Omitting Index Example #3: no start or stop --> selects the entire array
x[:]

array([ 8, 24, 67, 87, 79, 48, 10, 94, 52, 98])
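
The slicing rules above extend to negative indices and negative steps. The following is a minimal sketch, assuming the same ten values printed earlier (written out literally so the example does not depend on the random seed); the variable names are illustrative only.

```python
import numpy as np

# Same values as the array x above, written out literally.
x = np.array([8, 24, 67, 87, 79, 48, 10, 94, 52, 98])

# start:stop is half-open: stop is excluded, so the length is stop - start.
first_six = x[0:6]

# Negative indices count from the end; this selects the last three values.
last_three = x[-3:]

# A negative step walks backwards; this returns the values in reverse order.
reversed_x = x[::-1]

print(first_six)
print(last_three)
print(reversed_x)
```

Because the stop index is excluded, `x[:6]` and `x[6:]` split the array into two pieces with no overlap.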

## Detour 2:  Resetting the Data Frame Index

To better illustrate the differences between `iloc`, `loc` and `ix`, we convert the `CAMIS` identifier from an integer to a string and then set it as the data frame index. With string labels, a row label such as `"40363644"` cannot be mistaken for an integer position.

In [12]:
pizza_df["CAMIS"] = pizza_df["CAMIS"].astype(str)
pizza_df.set_index("CAMIS", drop=True, inplace=True)
pizza_df.head()

| CAMIS | DBA | BORO | ... | GRADE | GRADE DATE |
|---|---|---|---|---|---|
| 40363644 | DOMINO'S | MANHATTAN | ... | A | 2017-03-30 |
| 40363945 | DOMINO'S | MANHATTAN | ... | A | 2017-03-02 |
| 40364920 | RIZZO'S FINE PIZZA | QUEENS | ... | A | 2016-11-03 |
| 40365280 | COMO PIZZA | MANHATTAN | ... | A | 2016-08-29 |
| 40365632 | J&V FAMOUS PIZZA | BROOKLYN | ... | A | 2017-04-05 |
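
The index change can be tried without the CSV file. This is a small sketch, assuming a three-row stand-in frame (`toy_df`, a hypothetical name) built from a few values shown above; it also shows `reset_index()`, which moves the index back into a regular column.

```python
import pandas as pd

# Hypothetical three-row stand-in for the pizza data.
toy_df = pd.DataFrame({
    "CAMIS": [40363644, 40363945, 40364920],
    "DBA": ["DOMINO'S", "DOMINO'S", "RIZZO'S FINE PIZZA"],
    "GRADE": ["A", "A", "A"],
})

# Cast the identifier to a string, then make it the row index.
toy_df["CAMIS"] = toy_df["CAMIS"].astype(str)
indexed_df = toy_df.set_index("CAMIS")

# Row labels are now strings, so label lookups need quotes.
print(indexed_df.loc["40363945", "DBA"])

# reset_index() turns CAMIS back into a column and restores the
# default integer index 0, 1, 2.
restored_df = indexed_df.reset_index()
print(list(restored_df.columns))
```

Here `set_index` is called without `inplace=True` and returns a new frame, which keeps the original `toy_df` available for comparison.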


## Selecting Rows and Columns with `iloc`

* `iloc` enables you to select rows or columns by zero-based integer position, regardless of the index or column labels. As with NumPy slices, the stop position is excluded.

In [13]:
# Select a Single Field at the Specified Coordinates
pizza_df.iloc[0,0]

"DOMINO'S"

In [14]:
# Select a Slice of Rows
pizza_df.iloc[0:2]

| CAMIS | DBA | BORO | ... | GRADE | GRADE DATE |
|---|---|---|---|---|---|
| 40363644 | DOMINO'S | MANHATTAN | ... | A | 2017-03-30 |
| 40363945 | DOMINO'S | MANHATTAN | ... | A | 2017-03-02 |


In [15]:
# Select a Slice of Rows and Columns
pizza_df.iloc[0:2,7:9]

| CAMIS | GRADE | GRADE DATE |
|---|---|---|
| 40363644 | A | 2017-03-30 |
| 40363945 | A | 2017-03-02 |
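
Besides slices, `iloc` also accepts negative positions and lists of positions. The sketch below assumes a four-row stand-in frame (`toy_df`, a hypothetical name) built from values shown above, so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical four-row stand-in for the pizza data, indexed by CAMIS strings.
toy_df = pd.DataFrame(
    {
        "DBA": ["DOMINO'S", "DOMINO'S", "RIZZO'S FINE PIZZA", "COMO PIZZA"],
        "BORO": ["MANHATTAN", "MANHATTAN", "QUEENS", "MANHATTAN"],
        "GRADE": ["A", "A", "A", "A"],
    },
    index=pd.Index(["40363644", "40363945", "40364920", "40365280"], name="CAMIS"),
)

# A negative position counts from the end; this returns the last row as a Series.
last_row = toy_df.iloc[-1]

# Lists of positions pick non-adjacent rows and columns.
picked = toy_df.iloc[[0, 2], [0, 2]]

# A slice excludes its stop position, so 0:2 returns two rows.
first_two = toy_df.iloc[0:2]

print(last_row["DBA"])
print(picked)
print(len(first_two))
```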


## Selecting Rows and Columns with `loc`

* `loc` enables you to select rows or columns by label: index labels for rows and column names for columns.

In [16]:
# Select a Single Row by Index Value
pizza_df.loc["40363644"]

DBA             DOMINO'S
BORO           MANHATTAN
BUILDING             464
                 ...    
SCORE                  4
GRADE                  A
GRADE DATE    2017-03-30
Name: 40363644, dtype: object

In [17]:
# Select a Single Column by Column Name
pizza_df.loc[:,"GRADE"]

CAMIS
40363644    A
40363945    A
40364920    A
           ..
50060439    A
50060695    Z
50062741    A
Name: GRADE, dtype: object

In [18]:
# Select Specific Rows and Columns by Passing Lists of Labels
pizza_df.loc[["40363644","40365280"],["GRADE", "SCORE"]]

| CAMIS | GRADE | SCORE |
|---|---|---|
| 40363644 | A | 4.0 |
| 40365280 | A | 10.0 |
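
One difference from `iloc` is worth seeing in code: a label slice with `loc` includes both endpoints, while a position slice with `iloc` excludes the stop. The sketch below assumes a four-row stand-in frame (`toy_df`, a hypothetical name) built from values shown above, and also shows a boolean mask used as the row selector.

```python
import pandas as pd

# Hypothetical four-row stand-in for the pizza data, indexed by CAMIS strings.
toy_df = pd.DataFrame(
    {
        "DBA": ["DOMINO'S", "DOMINO'S", "RIZZO'S FINE PIZZA", "COMO PIZZA"],
        "BORO": ["MANHATTAN", "MANHATTAN", "QUEENS", "MANHATTAN"],
        "GRADE": ["A", "A", "A", "A"],
    },
    index=pd.Index(["40363644", "40363945", "40364920", "40365280"], name="CAMIS"),
)

# Label slices include BOTH endpoints: three rows and two columns here.
by_label = toy_df.loc["40363644":"40364920", "DBA":"BORO"]

# Position slices exclude the stop: two rows and one column here.
by_position = toy_df.iloc[0:2, 0:1]

# loc also accepts a boolean mask for the rows.
manhattan = toy_df.loc[toy_df["BORO"] == "MANHATTAN", ["DBA"]]

print(by_label.shape)
print(by_position.shape)
print(manhattan)
```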


## Selecting Rows and Columns with `ix`

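*  `ix` is a hybrid between `loc` and `iloc`, and you can use it to select rows or columns based on integer *or* label values. Note that `ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so the cell below only runs on older versions.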

In [19]:
# Select rows 0:5, two columns only
pizza_df.ix[0:5,["GRADE", "SCORE"]]

| CAMIS | GRADE | SCORE |
|---|---|---|
| 40363644 | A | 4.0 |
| 40363945 | A | 12.0 |
| 40364920 | A | 12.0 |
| 40365280 | A | 10.0 |
| 40365632 | A | 2.0 |
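
On pandas versions where `ix` is no longer available, the same mix of row positions and column labels can be expressed with `iloc` and `loc`. The following is a sketch of three possible equivalents, not the notebook's original method; it assumes a four-row stand-in frame (`toy_df`, a hypothetical name) built from values shown above.

```python
import pandas as pd

# Hypothetical four-row stand-in for the pizza data, indexed by CAMIS strings.
toy_df = pd.DataFrame(
    {
        "DBA": ["DOMINO'S", "DOMINO'S", "RIZZO'S FINE PIZZA", "COMO PIZZA"],
        "SCORE": [4.0, 12.0, 12.0, 10.0],
        "GRADE": ["A", "A", "A", "A"],
    },
    index=pd.Index(["40363644", "40363945", "40364920", "40365280"], name="CAMIS"),
)

wanted = ["GRADE", "SCORE"]

# Option 1: select rows by position, then columns by label.
subset_a = toy_df.iloc[0:2][wanted]

# Option 2: translate column labels into positions and use iloc for both axes.
col_positions = toy_df.columns.get_indexer(wanted)
subset_b = toy_df.iloc[0:2, col_positions]

# Option 3: translate row positions into labels and use loc for both axes.
subset_c = toy_df.loc[toy_df.index[0:2], wanted]

print(subset_a)
```

All three options return the same two rows and two columns; which one reads best is a matter of preference.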
