# Clean airports

Goals:

Narrow the airports data set down to international airports in the USA (including territories)
- I assume that the airports data set contains all such airports

Extract the state of each airport
- for easy joining

In [1]:
import pandas as pd

In [6]:
airports = pd.read_csv('../../airport-codes_csv.csv')

In [16]:
airports

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
0,00A,heliport,Total Rf Heliport,11.0,,US,US-PA,Bensalem,00A,,00A,"-74.93360137939453, 40.07080078125"
1,00AA,small_airport,Aero B Ranch Airport,3435.0,,US,US-KS,Leoti,00AA,,00AA,"-101.473911, 38.704022"
2,00AK,small_airport,Lowell Field,450.0,,US,US-AK,Anchor Point,00AK,,00AK,"-151.695999146, 59.94919968"
3,00AL,small_airport,Epps Airpark,820.0,,US,US-AL,Harvest,00AL,,00AL,"-86.77030181884766, 34.86479949951172"
4,00AR,closed,Newport Hospital & Clinic Heliport,237.0,,US,US-AR,Newport,,,,"-91.254898, 35.6087"
...,...,...,...,...,...,...,...,...,...,...,...,...
55070,ZYYK,medium_airport,Yingkou Lanqi Airport,0.0,AS,CN,CN-21,Yingkou,ZYYK,YKH,,"122.3586, 40.542524"
55071,ZYYY,medium_airport,Shenyang Dongta Airport,,AS,CN,CN-21,Shenyang,ZYYY,,,"123.49600219726562, 41.784400939941406"
55072,ZZ-0001,heliport,Sealand Helipad,40.0,EU,GB,GB-ENG,Sealand,,,,"1.4825, 51.894444"
55073,ZZ-0002,small_airport,Glorioso Islands Airstrip,11.0,AF,TF,TF-U-A,Grande Glorieuse,,,,"47.296388888900005, -11.584277777799999"
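
One thing worth noticing in the output above: continent is empty for the US rows. A likely cause (an assumption, I have not checked the raw file) is that North America is encoded as NA, which pd.read_csv treats as a missing value by default. A minimal sketch on a made-up two-row CSV shows the effect and one way around it:

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same shape as the airports file.
sample_csv = "ident,continent,iso_country\n00A,NA,US\nPGUM,OC,GU\n"

# Default parsing: the string "NA" is read as a missing value.
default = pd.read_csv(io.StringIO(sample_csv))

# Treat only empty fields as missing, so "NA" survives as a string.
explicit = pd.read_csv(
    io.StringIO(sample_csv),
    keep_default_na=False,
    na_values=[''],
)

print(default['continent'].isna().tolist())
print(explicit['continent'].tolist())
```

The rest of this notebook does not use continent, so the default parsing is harmless here.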


## Filter US airports (including territories)

US territories have their own iso_country codes, so they can be selected with that column. For example, Guam:

In [8]:
airports[airports['iso_country'] == 'GU']

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
10134,9OG1,heliport,Barrigada Readiness Center Heliport,311.0,OC,GU,GU-U-A,Guam,9OG1,,9OG1,"144.812142, 13.475863"
22965,HI47,heliport,Big Eye Heliport,80.0,OC,GU,GU-U-A,Agana Guam,,,,"144.80499267578125, 13.498100280761719"
38567,PGUA,medium_airport,Andersen Air Force Base,627.0,OC,GU,GU-U-A,"Yigo, Mariana Island",PGUA,UAM,UAM,"144.929998, 13.584"
38568,PGUM,large_airport,Antonio B. Won Pat International Airport,298.0,OC,GU,GU-U-A,"Hagåtña, Guam International Airport",PGUM,GUM,GUM,"144.796005249, 13.4834003448"


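From Google (note that FM, MH and PW are sovereign states in free association with the USA rather than territories; they are kept in the list below):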

```
US Territory and Possession Address Abbreviations
AS – American Samoa.
FM – Federated States of Micronesia.
GU – Guam.
MH – Marshall Islands.
MP – Northern Mariana Islands.
PR – Puerto Rico.
PW – Palau.
VI – U.S. Virgin Islands.
```

In [12]:
iso_countries = pd.DataFrame(
    ['US', 'AS', 'FM', 'GU', 'MH', 'MP', 'PR', 'PW', 'VI'],
    columns=['iso_country']
)

In [13]:
iso_countries

Unnamed: 0,iso_country
0,US
1,AS
2,FM
3,GU
4,MH
5,MP
6,PR
7,PW
8,VI


Filter down to relevant iso_countries:

In [14]:
def merge_airport_iso_countries(df):
    return (
        df
        .merge(iso_countries, on='iso_country', how='inner')
    )

In [15]:
(
    airports
    .pipe(merge_airport_iso_countries)
)

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
0,00A,heliport,Total Rf Heliport,11.0,,US,US-PA,Bensalem,00A,,00A,"-74.93360137939453, 40.07080078125"
1,00AA,small_airport,Aero B Ranch Airport,3435.0,,US,US-KS,Leoti,00AA,,00AA,"-101.473911, 38.704022"
2,00AK,small_airport,Lowell Field,450.0,,US,US-AK,Anchor Point,00AK,,00AK,"-151.695999146, 59.94919968"
3,00AL,small_airport,Epps Airpark,820.0,,US,US-AL,Harvest,00AL,,00AL,"-86.77030181884766, 34.86479949951172"
4,00AR,closed,Newport Hospital & Clinic Heliport,237.0,,US,US-AR,Newport,,,,"-91.254898, 35.6087"
...,...,...,...,...,...,...,...,...,...,...,...,...
22887,VI02,heliport,St. Thomas Waterfront Heliport,4.0,,VI,VI-U-A,Charlotte Amalie,VI02,,VI02,"-64.93930053710938, 18.338600158691406"
22888,VI03,heliport,Frenchman's Reef Heliport,20.0,,VI,VI-U-A,St Thomas,VI03,,VI03,"-64.9220962524414, 18.31999969482422"
22889,VI04,heliport,Stouffer Grand Beach Resort Heliport,125.0,,VI,VI-U-A,Charlotte Amalie,VI04,,VI04,"-64.90399932861328, 18.345800399780273"
22890,VI22,seaplane_base,Charlotte Amalie Harbor Seaplane Base,,,VI,VI-U-A,Charlotte Amalie St Thomas,VI22,SPB,VI22,"-64.9406967163086, 18.338600158691406"
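
The inner merge does the job, but a merge on columns also resets the row labels (they run from 0 to 22890 above). A boolean mask built with Series.isin is a possible alternative that keeps the original index. This is only a sketch on a made-up frame, and filter_us_airports is a hypothetical name that is not used elsewhere in this notebook:

```python
import pandas as pd

# Same codes as in iso_countries above.
US_ISO_COUNTRIES = ['US', 'AS', 'FM', 'GU', 'MH', 'MP', 'PR', 'PW', 'VI']

def filter_us_airports(df, codes=US_ISO_COUNTRIES):
    """Keep rows whose iso_country is one of codes (hypothetical helper)."""
    return df[df['iso_country'].isin(codes)]

# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    'ident': ['00A', 'ZYYK', 'PGUM'],
    'iso_country': ['US', 'CN', 'GU'],
})

filtered = filter_us_airports(sample)
print(filtered)
```

With the mask, the surviving rows keep their labels from the input frame, which can help when tracing a row back to the raw data.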


## Filter international airports

After some exploration, I treat an airport as international if its name contains "International" (case-insensitive). This is a heuristic: it also matches names such as "International Falls Seaplane Base".

In [17]:
airports['name'].str.lower().str.contains('international').value_counts()

False    54058
True      1017
Name: name, dtype: int64

In [18]:
def filter_international_airports(df):
    # na=False treats a missing name as "no match" instead of raising on NaN
    return df[df['name'].str.contains('international', case=False, na=False)]

In [20]:
us_international_airports = (
    airports
    .pipe(merge_airport_iso_countries)
    .pipe(filter_international_airports)
)

In [21]:
us_international_airports

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates
194,03AZ,small_airport,Thompson International Aviation Airport,4275.0,,US,US-AZ,Hereford,03AZ,,03AZ,"-110.08399963378906, 31.433399200439453"
550,09I,seaplane_base,International Falls Seaplane Base,1110.0,,US,US-MN,International Falls,09I,,09I,"-93.37079620361328, 48.60580062866211"
1101,0TS2,small_airport,Ultralight International Ultralightport,820.0,,US,US-TX,Haslet,0TS2,,0TS2,"-97.32890319824219, 32.948699951171875"
1154,0WI5,small_airport,Crash In International Airport,755.0,,US,US-WI,Husher,0WI5,,0WI5,"-87.8906021118164, 42.793399810791016"
2122,1NC9,small_airport,Northbrook International Ultraport Ultralightport,1030.0,,US,US-NC,Cherryville,1NC9,,1NC9,"-81.42639923095703, 35.44969940185547"
...,...,...,...,...,...,...,...,...,...,...,...,...
22876,NSTU,medium_airport,Pago Pago International Airport,32.0,OC,AS,AS-U-A,Pago Pago,NSTU,PPG,PPG,"-170.710006714, -14.3310003281"
22879,PTKK,medium_airport,Chuuk International Airport,11.0,OC,FM,FM-TRK,Weno Island,PTKK,TKK,TKK,"151.84300231933594, 7.461870193481445"
22880,PTPN,medium_airport,Pohnpei International Airport,10.0,OC,FM,FM-PNI,Pohnpei Island,PTPN,PNI,PNI,"158.20899963378906, 6.985099792480469"
22881,PTSA,medium_airport,Kosrae International Airport,11.0,OC,FM,FM-KSA,Okat,PTSA,KSA,TTK,"162.957993, 5.35698"
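
The sample above includes rows such as "International Falls Seaplane Base" that match on the name but are unlikely to be international airports. One optional refinement is to also require type to be large_airport or medium_airport. This is an assumption on my part and not what the rest of this notebook does; filter_international_by_name_and_type is a hypothetical name:

```python
import pandas as pd

def filter_international_by_name_and_type(df):
    """Hypothetical stricter variant of filter_international_airports."""
    name_match = df['name'].str.contains('international', case=False, na=False)
    type_match = df['type'].isin(['large_airport', 'medium_airport'])
    return df[name_match & type_match]

# Three rows copied from the outputs above (name and type only).
sample = pd.DataFrame({
    'name': [
        'International Falls Seaplane Base',
        'Pago Pago International Airport',
        'Lowell Field',
    ],
    'type': ['seaplane_base', 'medium_airport', 'small_airport'],
})

strict = filter_international_by_name_and_type(sample)
print(strict['name'].tolist())
```

The stricter filter would change the counts further down, so the notebook continues with the name-only version.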


## Project state

In iso_region, if the prefix is US, then the state is a two-letter suffix; otherwise, it is a two-letter prefix:

In [37]:
def extract_state(iso_region):
    """Return the two-letter state or territory code, e.g. 'US-AZ' -> 'AZ', 'GU-U-A' -> 'GU'."""
    if iso_region[:2] == 'US':
        return iso_region[-2:]
    else:
        return iso_region[:2]

In [39]:
def project_state(df):
    return df.assign(state=df['iso_region'].apply(extract_state))

In [49]:
us_international_airports = (
    airports
    .pipe(merge_airport_iso_countries)
    .pipe(filter_international_airports)
    .pipe(project_state)
)

In [50]:
us_international_airports

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates,state
194,03AZ,small_airport,Thompson International Aviation Airport,4275.0,,US,US-AZ,Hereford,03AZ,,03AZ,"-110.08399963378906, 31.433399200439453",AZ
550,09I,seaplane_base,International Falls Seaplane Base,1110.0,,US,US-MN,International Falls,09I,,09I,"-93.37079620361328, 48.60580062866211",MN
1101,0TS2,small_airport,Ultralight International Ultralightport,820.0,,US,US-TX,Haslet,0TS2,,0TS2,"-97.32890319824219, 32.948699951171875",TX
1154,0WI5,small_airport,Crash In International Airport,755.0,,US,US-WI,Husher,0WI5,,0WI5,"-87.8906021118164, 42.793399810791016",WI
2122,1NC9,small_airport,Northbrook International Ultraport Ultralightport,1030.0,,US,US-NC,Cherryville,1NC9,,1NC9,"-81.42639923095703, 35.44969940185547",NC
...,...,...,...,...,...,...,...,...,...,...,...,...,...
22876,NSTU,medium_airport,Pago Pago International Airport,32.0,OC,AS,AS-U-A,Pago Pago,NSTU,PPG,PPG,"-170.710006714, -14.3310003281",AS
22879,PTKK,medium_airport,Chuuk International Airport,11.0,OC,FM,FM-TRK,Weno Island,PTKK,TKK,TKK,"151.84300231933594, 7.461870193481445",FM
22880,PTPN,medium_airport,Pohnpei International Airport,10.0,OC,FM,FM-PNI,Pohnpei Island,PTPN,PNI,PNI,"158.20899963378906, 6.985099792480469",FM
22881,PTSA,medium_airport,Kosrae International Airport,11.0,OC,FM,FM-KSA,Okat,PTSA,KSA,TTK,"162.957993, 5.35698",FM
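
To make the rule concrete, here is extract_state applied to a few iso_region values taken from the output above, next to a vectorized equivalent. The vectorized version is only a sketch (extract_state_vectorized is a hypothetical name) and, like the original, assumes iso_region has no missing values:

```python
import pandas as pd

def extract_state(iso_region):
    # Same rule as in the notebook: suffix for US regions, prefix otherwise.
    if iso_region[:2] == 'US':
        return iso_region[-2:]
    return iso_region[:2]

def extract_state_vectorized(iso_region):
    """Hypothetical Series-level version of extract_state."""
    is_us = iso_region.str[:2] == 'US'
    # Keep the suffix where the region is a US state, else take the prefix.
    return iso_region.str[-2:].where(is_us, iso_region.str[:2])

regions = pd.Series(['US-AZ', 'AS-U-A', 'FM-TRK', 'GU-U-A'])

states = regions.apply(extract_state)
states_vectorized = extract_state_vectorized(regions)
print(states.tolist())
print(states_vectorized.tolist())
```

Both calls should give AZ, AS, FM and GU for these four inputs.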


As a sanity check, let's see how many international airports there are per state:

In [51]:
(
    us_international_airports
    [['state', 'name']]
    .groupby('state')
    .count()
    .rename(columns={'name': 'count'})
    .sort_values('count', ascending=False)
)

state,count
TX,30
FL,24
NY,14
CA,13
MN,12
MT,9
IL,9
MI,9
WA,9
AZ,8
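
The same tally can also be written with value_counts, which already sorts by count in descending order. A small sketch on made-up rows (the names are placeholders):

```python
import pandas as pd

# Made-up rows, for illustration only.
sample = pd.DataFrame({
    'state': ['TX', 'FL', 'TX', 'GU', 'TX', 'FL'],
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# Counts rows per state, most frequent first.
counts = sample['state'].value_counts()
print(counts)
```

One difference to keep in mind: the groupby version counts non-missing name values per state, while value_counts counts rows with a non-missing state.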


## Materialize

In [52]:
us_international_airports.to_csv('../curated/us_international_airports.csv', index=False)

In [53]:
pd.read_csv('../curated/us_international_airports.csv')

Unnamed: 0,ident,type,name,elevation_ft,continent,iso_country,iso_region,municipality,gps_code,iata_code,local_code,coordinates,state
0,03AZ,small_airport,Thompson International Aviation Airport,4275.0,,US,US-AZ,Hereford,03AZ,,03AZ,"-110.08399963378906, 31.433399200439453",AZ
1,09I,seaplane_base,International Falls Seaplane Base,1110.0,,US,US-MN,International Falls,09I,,09I,"-93.37079620361328, 48.60580062866211",MN
2,0TS2,small_airport,Ultralight International Ultralightport,820.0,,US,US-TX,Haslet,0TS2,,0TS2,"-97.32890319824219, 32.948699951171875",TX
3,0WI5,small_airport,Crash In International Airport,755.0,,US,US-WI,Husher,0WI5,,0WI5,"-87.8906021118164, 42.793399810791016",WI
4,1NC9,small_airport,Northbrook International Ultraport Ultralightport,1030.0,,US,US-NC,Cherryville,1NC9,,1NC9,"-81.42639923095703, 35.44969940185547",NC
...,...,...,...,...,...,...,...,...,...,...,...,...,...
253,NSTU,medium_airport,Pago Pago International Airport,32.0,OC,AS,AS-U-A,Pago Pago,NSTU,PPG,PPG,"-170.710006714, -14.3310003281",AS
254,PTKK,medium_airport,Chuuk International Airport,11.0,OC,FM,FM-TRK,Weno Island,PTKK,TKK,TKK,"151.84300231933594, 7.461870193481445",FM
255,PTPN,medium_airport,Pohnpei International Airport,10.0,OC,FM,FM-PNI,Pohnpei Island,PTPN,PNI,PNI,"158.20899963378906, 6.985099792480469",FM
256,PTSA,medium_airport,Kosrae International Airport,11.0,OC,FM,FM-KSA,Okat,PTSA,KSA,TTK,"162.957993, 5.35698",FM
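
The round trip can also be checked programmatically instead of by eye. The sketch below writes a made-up frame to a temporary file and compares it with what is read back; the file name and the sample values are placeholders, not the curated file itself:

```python
import os
import tempfile

import pandas as pd

# Made-up sample with a few of the curated columns.
sample = pd.DataFrame({
    'ident': ['03AZ', 'NSTU'],
    'elevation_ft': [4275.0, 32.0],
    'state': ['AZ', 'AS'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.csv')
    sample.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

# Raises an AssertionError if values, dtypes or column order differ.
pd.testing.assert_frame_equal(sample, round_trip)
```

A check like this would flag columns whose values change on reading, for example a code that pandas parses as a missing value.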
