## Selectors

### Selecting with .loc and .iloc

Aside from using chained brackets `[][]` to access values, DataFrame provides the `.loc[]` and `.iloc[]` indexers to select values by row label (index) or by position, respectively.

Here are some examples of using `.loc[]`:

In [1]:
import pandas as pd

# read data from csv
flights = pd.read_csv('../data/flights.csv', header=0)

# select single row by index
flights.loc[0]

# select multiple rows with a list of labels or a slice
flights.loc[[0, 5, 7, 10]]
flights.loc[0:3]

# select multiple rows and columns by index
flights.loc[0:3,['airline', 'src', 'dest']]

Unnamed: 0,airline,src,dest
0,9E,CHA,DTW
1,9E,JAX,RDU
2,9E,RDU,LGA
3,9E,DTW,ATW


:::info `.loc[[rows],[columns]]`

With `.loc`, the first argument inside the brackets selects rows and the second selects columns. This is the reverse of the order used with chained brackets, where the column comes first.

:::
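
To make the argument order concrete, here is a minimal sketch on a small toy DataFrame. The values are made up for illustration and are not read from `flights.csv`. It looks up the same cell once with chained brackets (column first) and once with `.loc` (row first):

```python
import pandas as pd

# toy data, made up for illustration (not from flights.csv)
df = pd.DataFrame({
    'airline': ['9E', 'DL', 'AS'],
    'src': ['CHA', 'LAX', 'PDX'],
    'dest': ['DTW', 'JFK', 'SEA'],
})

# chained brackets: column first, then row label
chained = df['src'][1]

# .loc: row label first, then column label
via_loc = df.loc[1, 'src']

print(chained, via_loc)
```

Both lookups return the same value; only the order of row and column differs.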

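`.iloc[]` works the same way, but instead of labels (index) you select by row and column position numbers. In this case, since our flight records have a RangeIndex, the row positions are the **same** as the labels. One difference to watch for: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position, so `flights.loc[0:3]` returns four rows and `flights.iloc[0:3]` returns three: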

In [2]:
# select first row
flights.iloc[0]

# select multiple rows with a list of positions or a slice
flights.iloc[[0, 5, 7, 10]]
flights.iloc[0:3]

# select multiple rows and columns by position
flights.iloc[0:3,[0, 2, 4]]

Unnamed: 0,flight_date,tailnumber,src
0,2019-11-28,N8974C,CHA
1,2019-11-28,N901XJ,JAX
2,2019-11-28,N901XJ,RDU
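
Labels and positions only coincide when the index is a default RangeIndex. The following sketch, which uses a hypothetical toy DataFrame with a non-default index, shows how `.loc` and `.iloc` diverge and how their slices treat the end point:

```python
import pandas as pd

# toy data with a non-default integer index (made up for illustration)
df = pd.DataFrame(
    {'src': ['CHA', 'JAX', 'RDU', 'DTW']},
    index=[10, 20, 30, 40],
)

# .loc uses labels, and a label slice includes its end point
by_label = df.loc[10:30]

# .iloc uses positions, and a position slice excludes its end point
by_position = df.iloc[0:2]

print(list(by_label.index))     # labels 10, 20 and 30
print(list(by_position.index))  # labels at positions 0 and 1
```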


:::info Mixing `.loc` and `.iloc`

You can chain `.loc` and `.iloc` together:

:::


In [3]:
# mixing loc and iloc
# select the rows at positions 5 through 9 and a few columns
flights.iloc[5:10].loc[:, ['flight_number', 'src', 'dest']]

Unnamed: 0,flight_number,src,dest
5,3285,LGA,PWM
6,3286,CLE,DTW
7,3288,DTW,LAN
8,3288,LAN,DTW
9,3289,JFK,ROC
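
As an alternative to chaining, one possible approach is to translate the column labels into positions with `Index.get_indexer` and make a single `.iloc` call. The sketch below uses a toy DataFrame with made-up values and checks that both forms give the same result:

```python
import pandas as pd

# toy data, made up for illustration (not from flights.csv)
df = pd.DataFrame({
    'flight_number': [3280, 3281, 3282, 3283],
    'src': ['CHA', 'JAX', 'RDU', 'DTW'],
    'dest': ['DTW', 'RDU', 'LGA', 'ATW'],
})

# chained: rows by position, then columns by label
chained = df.iloc[1:3].loc[:, ['src', 'dest']]

# single call: convert column labels to positions first
col_positions = df.columns.get_indexer(['src', 'dest'])
single = df.iloc[1:3, col_positions]

print(chained.equals(single))
```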


### Conditional Selections

You can specify criteria for selecting values within the DataFrame:

In [4]:
# select delta airline flights
flights.loc[flights.airline == 'DL']
# same as above
flights.loc[flights['airline'] == 'DL']

# flights where distance is not null
flights.loc[flights.distance.notna()]
# or where distance is null
flights.loc[flights.distance.isna()]

# select flights out of PDX over 500 miles
flights.loc[(flights.src == 'PDX') & (flights.distance > 500.0)]

# apply multiple conditions:
# select delta or alaska flights
flights.loc[(flights.airline == 'DL') | (flights.airline == 'AS')]
# select delta airlines flights from LAX-JFK
flights.loc[(flights.airline == 'DL') & (flights.src == 'LAX') & (flights.dest == 'JFK')]

# select delta or alaska flights from LAX-JFK
a = flights.loc[(flights.airline.isin(['DL', 'AS'])) &
            (flights.src == 'LAX') & (flights.dest == 'JFK')]

with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', 500, 'display.width', 500):
    print(a)

     flight_date airline tailnumber  flight_number  src dest  departure_time  arrival_time  flight_time  distance
2006  2019-11-28      AS     N238AK            410  LAX  JFK             700          1530        330.0    2475.0
2024  2019-11-28      AS     N282AK            452  LAX  JFK            1040          1909        329.0    2475.0
2028  2019-11-28      AS     N461AS            460  LAX  JFK            2325           748        323.0    2475.0
2035  2019-11-28      AS     N266AK            470  LAX  JFK            2035           500        325.0    2475.0
3395  2019-11-28      DL     N177DN           1436  LAX  JFK             605          1425        320.0    2475.0
3815  2019-11-28      DL     N179DN           2164  LAX  JFK             915          1742        327.0    2475.0
4174  2019-11-28      DL     N195DN           2815  LAX  JFK            2100           516        316.0    2475.0
4521  2019-11-28      DL     N183DN            816  LAX  JFK            1115          19

:::tip Handy selection methods

Pandas has a dedicated selection method for almost everything, such as `.isin()`, `.isna()`, and `.notna()`. Learn them and use them rigorously. See the examples above.

:::
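
Each of these methods returns a boolean Series, which is why they combine with `&` and `|` inside `.loc[]`. A minimal sketch on a toy DataFrame (values made up for illustration, including the missing distances):

```python
import numpy as np
import pandas as pd

# toy data, made up for illustration (not from flights.csv)
df = pd.DataFrame({
    'airline': ['DL', 'AS', 'WN', 'DL'],
    'distance': [2475.0, np.nan, 569.0, np.nan],
})

# each method returns a boolean Series aligned with the rows
is_dl_or_as = df.airline.isin(['DL', 'AS'])
has_distance = df.distance.notna()

# combine the masks, then select
selected = df.loc[is_dl_or_as & has_distance]
missing = df.loc[df.distance.isna()]

print(selected)
print(missing)
```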

### Using the query() method

If you are more familiar with SQL syntax, you can use the pandas `.query()` method:


In [5]:
# select flights from PDX over 500 miles
flights.query("(src == 'PDX') & (distance > 500.0)")

Unnamed: 0,flight_date,airline,tailnumber,flight_number,src,dest,departure_time,arrival_time,flight_time,distance
524,2019-11-28,AA,N832NN,1402,PDX,PHX,820,1153,153.0,1009.0
1012,2019-11-28,AA,N939AN,2298,PDX,ORD,627,1229,242.0,1739.0
1155,2019-11-28,AA,N992AU,2577,PDX,DFW,600,1141,221.0,1616.0
1201,2019-11-28,AA,N971UY,2658,PDX,PHX,500,834,154.0,1009.0
1692,2019-11-28,AS,N628VA,1042,PDX,BUR,720,930,130.0,817.0
...,...,...,...,...,...,...,...,...,...,...
11430,2019-11-28,WN,N252WN,5558,PDX,SJC,1330,1515,105.0,569.0
11585,2019-11-28,WN,N7829B,5731,PDX,OAK,535,720,105.0,543.0
11718,2019-11-28,WN,N915WN,610,PDX,SJC,1110,1255,105.0,569.0
11759,2019-11-28,WN,N7740A,668,PDX,ONT,1050,1300,130.0,838.0
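
Inside a query string you can also write `and`/`or` instead of `&`/`|`, and refer to a Python variable by prefixing it with `@`. The sketch below uses a toy DataFrame with made-up values; `min_distance` is a hypothetical variable introduced for illustration:

```python
import pandas as pd

# toy data, made up for illustration (not from flights.csv)
df = pd.DataFrame({
    'src': ['PDX', 'PDX', 'LAX'],
    'dest': ['PHX', 'SEA', 'JFK'],
    'distance': [1009.0, 129.0, 2475.0],
})

# hypothetical threshold, referenced in the query string with @
min_distance = 500.0

with_query = df.query("src == 'PDX' and distance > @min_distance")
with_loc = df.loc[(df.src == 'PDX') & (df.distance > min_distance)]

print(with_query.equals(with_loc))
```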


### Subselections

You can save a selection by assigning it to a variable and then subselect further within that result:

In [6]:
# select flights from PDX
pdx_flights = flights.loc[flights.src == 'PDX']
# find long distance flights
pdx_long_distance = pdx_flights.query("distance > 500.0")
pdx_long_distance

Unnamed: 0,flight_date,airline,tailnumber,flight_number,src,dest,departure_time,arrival_time,flight_time,distance
524,2019-11-28,AA,N832NN,1402,PDX,PHX,820,1153,153.0,1009.0
1012,2019-11-28,AA,N939AN,2298,PDX,ORD,627,1229,242.0,1739.0
1155,2019-11-28,AA,N992AU,2577,PDX,DFW,600,1141,221.0,1616.0
1201,2019-11-28,AA,N971UY,2658,PDX,PHX,500,834,154.0,1009.0
1692,2019-11-28,AS,N628VA,1042,PDX,BUR,720,930,130.0,817.0
...,...,...,...,...,...,...,...,...,...,...
11430,2019-11-28,WN,N252WN,5558,PDX,SJC,1330,1515,105.0,569.0
11585,2019-11-28,WN,N7829B,5731,PDX,OAK,535,720,105.0,543.0
11718,2019-11-28,WN,N915WN,610,PDX,SJC,1110,1255,105.0,569.0
11759,2019-11-28,WN,N7740A,668,PDX,ONT,1050,1300,130.0,838.0
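
Note that a subselection keeps the row labels of the original DataFrame, which is why the output above starts at 524 rather than 0. A minimal sketch on a toy DataFrame (made-up values) shows this, along with `reset_index(drop=True)` as one way to renumber the rows if that is wanted:

```python
import pandas as pd

# toy data, made up for illustration (not from flights.csv)
df = pd.DataFrame({
    'src': ['LAX', 'PDX', 'PDX', 'PDX'],
    'distance': [2475.0, 129.0, 1009.0, 1739.0],
})

# first selection, then a subselection within it
pdx = df.loc[df.src == 'PDX']
pdx_long = pdx.query("distance > 500.0")

# the original row labels are preserved
print(list(pdx_long.index))

# optionally renumber the rows from 0
renumbered = pdx_long.reset_index(drop=True)
print(list(renumbered.index))
```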
