# Ch 1, Ep. 6: Pandas
## Selectors

### Selecting with .loc and .iloc

Aside from using double brackets `[][]` to access values, a DataFrame provides the `.loc[]` and `.iloc[]` indexers, which select values by row label (index) or by position, respectively. Both are used with square brackets rather than called like methods.

Here are some examples of using `.loc[]`:

In [1]:
import pandas as pd

# set data directory and input csv
data_dir = '../../../data/input/ch1/'
airports_file = data_dir + 'airports.csv'

# read data from csv
airports = pd.read_csv(airports_file, header=0)

# select single row by index
airports.loc[0]

# select multiple rows with a list of labels
airports.loc[[0, 5, 7, 10]]
# or with a label slice (the end label 3 is included)
airports.loc[0:3]

# select multiple rows and columns by label
airports.loc[0:3, ['airport', 'city', 'state']]

Unnamed: 0,airport,city,state
0,Thigpen,Bay Springs,MS
1,Livingston Municipal,Livingston,TX
2,Meadow Lake,Colorado Springs,CO
3,Perry-Warsaw,Perry,NY


:::info `.loc[[rows],[columns]]`

With `.loc`, the first entry inside the brackets selects rows and the second selects columns. This is the reverse of double brackets, where the column comes first and the row second.

:::
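
As a minimal sketch of that ordering, the snippet below reads the same cell both ways. The `sample` DataFrame is a hand-built assumption for illustration (three rows copied from the output above), not the full `airports.csv`.

```python
import pandas as pd

# Hypothetical three-row sample; values copied from the output shown above
sample = pd.DataFrame({
    "airport": ["Thigpen", "Livingston Municipal", "Meadow Lake"],
    "city": ["Bay Springs", "Livingston", "Colorado Springs"],
    "state": ["MS", "TX", "CO"],
})

# double brackets: column first, then row label
city_chained = sample["city"][1]

# .loc: row label first, then column label
city_loc = sample.loc[1, "city"]

print(city_chained == city_loc)  # True
```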

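`.iloc[]` works the same way, but instead of labels (index) you select by row and column position numbers. In this case, since our airport records have a RangeIndex, the row positions are the **same** as the labels. One difference to keep in mind: a `.loc` slice includes its end label, while an `.iloc` slice excludes its end position. That is why `.loc[0:3]` returned four rows above and `.iloc[0:3]` returns three: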

In [3]:
# select first row
airports.iloc[0]

# select multiple rows with a list of positions
airports.iloc[[0, 5, 7, 10]]
# or with a position slice (the end position 3 is excluded)
airports.iloc[0:3]

# select multiple rows and columns by position
airports.iloc[0:3,[0, 2, 1]]

Unnamed: 0,iata,city,airport
0,00M,Bay Springs,Thigpen
1,00R,Livingston,Livingston Municipal
2,00V,Colorado Springs,Meadow Lake
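
Labels and positions only coincide because of the default RangeIndex. The sketch below shows how the two indexers diverge once the index holds something else. It assumes a hypothetical `sample` DataFrame indexed by IATA code, built by hand from the rows shown above.

```python
import pandas as pd

# Hypothetical sample indexed by IATA code instead of a RangeIndex;
# values copied from the output above
sample = pd.DataFrame(
    {
        "airport": ["Thigpen", "Livingston Municipal", "Meadow Lake"],
        "city": ["Bay Springs", "Livingston", "Colorado Springs"],
    },
    index=["00M", "00R", "00V"],
)

# .loc slices by label and includes the end label
by_label = sample.loc["00M":"00R"]

# .iloc slices by position and excludes the end position
by_position = sample.iloc[0:1]

print(list(by_label.index))     # ['00M', '00R']
print(list(by_position.index))  # ['00M']
```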


:::info Mixing `.loc` and `.iloc`

You can chain `.loc` and `.iloc` in a single expression:

:::


In [4]:
# mixing loc and iloc
# select rows at positions 5-9 and a few columns
airports.iloc[5:10].loc[:, ['airport', 'city', 'state']]

Unnamed: 0,airport,city,state
5,Tishomingo County,Belmont,MS
6,Gragg-Wade,Clanton,AL
7,Capitol,Brookfield,WI
8,Columbiana County,East Liverpool,OH
9,Memphis Memorial,Memphis,MO


### Conditional Selections

You can specify criteria for selecting values within the DataFrame:

In [9]:
# select airports in New York state
airports.loc[airports.state == 'NY']
# same as above
airports.loc[airports['state'] == 'NY']

# airports where city is not null
airports.loc[airports.city.notna()]
# or where city is null
airports.loc[airports.city.isna()]

# apply multiple conditions with & (and) or | (or),
# wrapping each condition in parentheses
# select airports in New York City, NY
airports.loc[(airports.state == 'NY') & (airports.city == 'New York')]

# select New York City or Alaska airports
airports.loc[(airports.city == 'New York') | (airports.state == 'AK')]


Unnamed: 0,iata,airport,city,state,country,lat,lon
37,0AK,Pilot Station,Pilot Station,AK,USA,61.933964,-162.892936
115,15Z,McCarthy 2,McCarthy,AK,USA,61.437061,-142.903737
116,16A,Nunapitchuk,Nunapitchuk,AK,USA,60.905828,-162.439116
125,17Z,Manokotak,Manokotak,AK,USA,58.988966,-159.049974
131,19P,Port Protection SPB,Port Protection,AK,USA,56.328804,-133.610084
...,...,...,...,...,...,...,...
3365,Z40,Goose Bay,Goose Bay,AK,USA,61.394451,-149.845556
3366,Z55,Lake Louise,Lake Louise,AK,USA,62.293689,-146.579422
3367,Z73,Nelson Lagoon,Nelson Lagoon,AK,USA,56.007536,-161.160367
3368,Z84,Clear,Clear A.F.B.,AK,USA,64.301204,-149.120144


:::tip Handy selection methods

Pandas has special selection methods for almost everything, such as `.isin()`, `.isna()`, and `.notna()`. Remember them and use them rigorously. The examples above use `.isna()` and `.notna()`.

:::
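
`.isin()` is the one method from the tip that the cell above does not exercise, so here is a minimal sketch of it. The `sample` DataFrame is a hand-built assumption using a few rows copied from the outputs in this section.

```python
import pandas as pd

# Hypothetical sample; rows copied from outputs in this section
sample = pd.DataFrame({
    "iata": ["00M", "0AK", "JFK", "LGA", "Z40"],
    "city": ["Bay Springs", "Pilot Station", "New York", "New York", "Goose Bay"],
    "state": ["MS", "AK", "NY", "NY", "AK"],
})

# keep rows whose state matches any of the listed values
ny_or_ak = sample.loc[sample.state.isin(["NY", "AK"])]

# ~ negates the boolean mask: every row outside those states
others = sample.loc[~sample.state.isin(["NY", "AK"])]

print(list(ny_or_ak.iata))  # ['0AK', 'JFK', 'LGA', 'Z40']
print(list(others.iata))    # ['00M']
```

Compared with chaining several `==` checks with `|`, a single `.isin()` call stays readable as the list of values grows.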

### Using the query() method

If you are more familiar with SQL syntax, you can use the pandas `.query()` method:


In [10]:
# select airports in New York City, NY
airports.query("(city == 'New York') & (state == 'NY')")

Unnamed: 0,iata,airport,city,state,country,lat,lon
589,6N5,E 34th St Heliport,New York,NY,USA,40.742602,-73.972083
590,6N7,New York Skyports Inc. SPB,New York,NY,USA,40.733991,-73.972916
1915,JFK,John F Kennedy Intl,New York,NY,USA,40.639751,-73.778926
1929,JRA,Port Authority-W 30th St Midtown Heliport,New York,NY,USA,40.754546,-74.007084
1930,JRB,Downtown Manhattan/Wall St. Heliport,New York,NY,USA,40.701214,-74.009028
2061,LGA,LaGuardia,New York,NY,USA,40.777243,-73.872609
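
Query strings also accept the keywords `and` / `or` in place of `&` / `|`, and can refer to Python variables with an `@` prefix. The sketch below assumes a hypothetical hand-built `sample` DataFrame and a hypothetical `target_state` variable; neither appears in the original notebook.

```python
import pandas as pd

# Hypothetical sample; rows copied from outputs in this section
sample = pd.DataFrame({
    "iata": ["00M", "0AK", "JFK", "LGA", "Z40"],
    "city": ["Bay Springs", "Pilot Station", "New York", "New York", "Goose Bay"],
    "state": ["MS", "AK", "NY", "NY", "AK"],
})

# hypothetical variable, referenced inside the query string with @
target_state = "NY"

# 'and' is equivalent to '&' inside a query string
nyc = sample.query("city == 'New York' and state == @target_state")

print(list(nyc.iata))  # ['JFK', 'LGA']
```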


### Subselections

You can save a selection by assigning it to a variable, then subselect further within that set:

In [12]:
# select airports in New York state
ny_airports = airports.loc[airports.state == 'NY']
# narrow the saved selection down to New York City
nyc_airports = ny_airports.query("city == 'New York'")
nyc_airports

Unnamed: 0,iata,airport,city,state,country,lat,lon
589,6N5,E 34th St Heliport,New York,NY,USA,40.742602,-73.972083
590,6N7,New York Skyports Inc. SPB,New York,NY,USA,40.733991,-73.972916
1915,JFK,John F Kennedy Intl,New York,NY,USA,40.639751,-73.778926
1929,JRA,Port Authority-W 30th St Midtown Heliport,New York,NY,USA,40.754546,-74.007084
1930,JRB,Downtown Manhattan/Wall St. Heliport,New York,NY,USA,40.701214,-74.009028
2061,LGA,LaGuardia,New York,NY,USA,40.777243,-73.872609
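
The two-step subselection gives the same rows as one combined condition, which is why this output matches the `.query()` result earlier. A minimal sketch of that equivalence, assuming a hypothetical hand-built `sample` DataFrame with rows copied from this section's outputs:

```python
import pandas as pd

# Hypothetical sample; rows copied from outputs in this section
sample = pd.DataFrame({
    "airport": ["Perry-Warsaw", "John F Kennedy Intl", "LaGuardia", "Goose Bay"],
    "city": ["Perry", "New York", "New York", "Goose Bay"],
    "state": ["NY", "NY", "NY", "AK"],
})

# step 1: save the broader selection
ny = sample.loc[sample.state == "NY"]
# step 2: subselect within the saved set
nyc = ny.query("city == 'New York'")

# the same filter written as a single combined condition
combined = sample.loc[(sample.state == "NY") & (sample.city == "New York")]

print(nyc.equals(combined))  # True
```

Saving the intermediate step is mostly a readability choice: it pays off when the broader selection (here `ny`) is reused for several different subselections.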


### Further Reading
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html