# SFO Air Traffic Passenger Statistics

San Francisco International Airport (SFO) report on monthly passenger traffic statistics by airline. Airport data is seasonal, so comparative analyses should be done on a period-over-period basis (e.g., January 2010 vs. January 2009) rather than period-to-period (e.g., January 2010 vs. February 2010). Note also that fact and attribute field relationships are not always 1-to-1. For example, Passenger Counts belonging to United Airlines appear under multiple attribute fields and are additive, which lets the user derive categorical Passenger Counts as desired.
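
As a rough illustration of the period-over-period comparison described above, the sketch below compares each month with the same month one year earlier. The toy frame and its values are made up for the example; only the `activity_period` (YYYYMM) and `passenger_count` column names come from the dataset.

```python
import pandas as pd

# Hypothetical monthly totals; the numbers are invented for illustration.
monthly = pd.DataFrame(
    {
        "activity_period": ["200901", "200902", "201001", "201002"],
        "passenger_count": [100, 80, 110, 84],
    }
)

# Split the YYYYMM string into year and month so the same month can be lined up across years.
dates = pd.to_datetime(monthly["activity_period"], format="%Y%m")
monthly["year"] = dates.dt.year
monthly["month"] = dates.dt.month

# One row per calendar month, one column per year.
by_month = monthly.pivot(index="month", columns="year", values="passenger_count")

# Year-over-year relative change: January vs. January, February vs. February.
yoy_change = by_month[2010] / by_month[2009] - 1
print(yoy_change)
```

Comparing January 2010 with February 2010 instead would mix the seasonal pattern into the result, which is what the description warns against.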

- __Dataset Identifier:__ rkru-6vcg
- __Total Rows:__ 20745
- __Source Domain:__ data.sfgov.org
- __Created:__ 4/19/2016, 4:51:23 PM
- __Last Updated:__ 8/12/2019, 11:16:26 AM
- __Category:__ Transportation
- __License:__ Open Data Commons Public Domain Dedication and License
- __Owner:__ OpenData
- __Endpoint Version:__ 2.1

In [1]:
from pathlib import Path

import pandas as pd

from sodapy import Socrata

from sklearn.model_selection import train_test_split

## SFO Fields

https://data.sfgov.org/Transportation/Air-Traffic-Passenger-Statistics/rkru-6vcg

https://dev.socrata.com/foundry/data.sfgov.org/rkru-6vcg

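Each row is a monthly passenger count for one combination of airline and attribute fields (region, activity type, price category, terminal, boarding area), not an individual passenger.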

![SFO Field Info](./Data/sfo-data-info.png)

In [2]:
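total_rows = 20_745  # row count taken from the dataset metadata above
client = Socrata("data.sfgov.org", None)  # None = no app token, so requests are unauthenticated and throttled
results = client.get("rkru-6vcg", limit=total_rows)  # list of dicts, one per row
sfo_raw = pd.DataFrame.from_records(results)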
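
The cell above hard-codes the row count as the request limit. If the dataset grows, a paging loop avoids that dependency. The following is a minimal sketch, not the notebook's method: `fetch_all`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and `fetch_page` is assumed to wrap a call such as `client.get("rkru-6vcg", limit=limit, offset=offset)`. A fake page function stands in for the API so the example runs offline.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fetch_all(fetch_page: Callable[[int, int], List[Record]], page_size: int = 1000) -> List[Record]:
    """Collect every record by requesting pages until a short (or empty) page comes back."""
    records: List[Record] = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:
            # A page shorter than requested means there is nothing left to fetch.
            return records
        offset += page_size


# Stand-in data source (assumption): 2,500 fake rows served in slices.
fake_rows = [{"passenger_count": str(i)} for i in range(2500)]


def fake_fetch_page(limit: int, offset: int) -> List[Record]:
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_fetch_page, page_size=1000)
print(len(all_rows))
```

With a real endpoint it is probably advisable to also request a stable sort order so that pages do not overlap between requests.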



In [4]:
sfo_train_raw, sfo_test_raw = train_test_split(sfo_raw[::-1], test_size=0.2, shuffle=False)  # reverse to oldest-first; no shuffle keeps the newest 20% of rows as the test set
sfo_train_raw.shape, sfo_test_raw.shape

((16596, 12), (4149, 12))
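
The API output appears to be ordered newest first (the reversed frame starts at 200507), so reversing before an unshuffled split puts the oldest 80% of rows in the training set. A toy sketch of that behavior, with made-up periods standing in for `sfo_raw`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for sfo_raw: newest period first (assumed ordering).
toy_raw = pd.DataFrame(
    {"activity_period": ["201003", "201002", "201001", "200912", "200911"]}
)

# Reverse to oldest-first, then split without shuffling so the test set is the most recent slice.
toy_train, toy_test = train_test_split(toy_raw[::-1], test_size=0.2, shuffle=False)

# YYYYMM strings sort correctly as text, so max/min give the latest/earliest period.
latest_train = toy_train["activity_period"].max()
earliest_test = toy_test["activity_period"].min()
print(latest_train, earliest_test)
```

Because the split is by row rather than by month, the boundary month may appear in both sets in the real data.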

In [5]:
sfo_train_raw.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 16596 entries, 20744 to 4149
Data columns (total 12 columns):
activity_period                16596 non-null object
operating_airline              16596 non-null object
operating_airline_iata_code    16538 non-null object
published_airline              16596 non-null object
published_airline_iata_code    16538 non-null object
geo_summary                    16596 non-null object
geo_region                     16596 non-null object
activity_type_code             16596 non-null object
price_category_code            16596 non-null object
terminal                       16596 non-null object
boarding_area                  16596 non-null object
passenger_count                16596 non-null object
dtypes: object(12)
memory usage: 1.6+ MB
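
The `info()` output shows every column stored as `object`, including `passenger_count`, and 58 missing values in each IATA code column. A minimal cleaning sketch follows, assuming `activity_period` is always a YYYYMM string and using a small made-up frame (`toy`) in place of `sfo_train_raw`:

```python
import pandas as pd

# Made-up rows shaped like the raw API output: everything arrives as strings.
toy = pd.DataFrame(
    {
        "activity_period": ["200507", "200508"],
        "operating_airline_iata_code": ["DH", None],
        "passenger_count": ["10967", "9195"],
    }
)

# Parse the period into a timestamp (first day of the month) and the count into a number.
typed = toy.assign(
    activity_period=pd.to_datetime(toy["activity_period"], format="%Y%m"),
    passenger_count=pd.to_numeric(toy["passenger_count"]),
)

# Count rows with no IATA code so they can be inspected before modeling.
missing_codes = int(typed["operating_airline_iata_code"].isna().sum())
print(typed.dtypes)
print(missing_codes)
```

Until `passenger_count` is numeric, summaries such as `describe()` treat it as a categorical string column.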


In [6]:
sfo_train_raw.describe()

| | activity_period | operating_airline | operating_airline_iata_code | published_airline | published_airline_iata_code | geo_summary | geo_region | activity_type_code | price_category_code | terminal | boarding_area | passenger_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 16596 | 16596 | 16538 | 16596 | 16538 | 16596 | 16596 | 16596 | 16596 | 16596 | 16596 | 16596 |
| unique | 141 | 81 | 76 | 71 | 67 | 2 | 9 | 3 | 2 | 5 | 8 | 12660 |
| top | 201608 | United Airlines - Pre 07/01/2013 | UA | United Airlines - Pre 07/01/2013 | UA | International | US | Deplaned | Other | International | A | 1 |
| freq | 142 | 2154 | 3351 | 2645 | 4118 | 10328 | 6268 | 7828 | 14480 | 10344 | 5837 | 14 |


In [7]:
sfo_train_raw.head()

| | activity_period | operating_airline | operating_airline_iata_code | published_airline | published_airline_iata_code | geo_summary | geo_region | activity_type_code | price_category_code | terminal | boarding_area | passenger_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20744 | 200507 | Independence Air | DH | Independence Air | DH | Domestic | US | Enplaned | Low Fare | International | A | 10967 |
| 20743 | 200507 | Japan Airlines | JL | Japan Airlines | JL | International | Asia | Deplaned | Other | International | A | 9195 |
| 20742 | 200507 | Japan Airlines | JL | Japan Airlines | JL | International | Asia | Enplaned | Other | International | A | 9086 |
| 20741 | 200507 | KLM Royal Dutch Airlines | KL | KLM Royal Dutch Airlines | KL | International | Europe | Deplaned | Other | International | A | 9978 |
| 20740 | 200507 | KLM Royal Dutch Airlines | KL | KLM Royal Dutch Airlines | KL | International | Europe | Enplaned | Other | International | A | 9587 |


In [8]:
sfo_train_raw.tail()

| | activity_period | operating_airline | operating_airline_iata_code | published_airline | published_airline_iata_code | geo_summary | geo_region | activity_type_code | price_category_code | terminal | boarding_area | passenger_count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4153 | 201703 | United Airlines | UA | United Airlines | UA | Domestic | US | Enplaned | Other | Terminal 3 | E | 293538 |
| 4152 | 201703 | United Airlines | UA | United Airlines | UA | Domestic | US | Thru / Transit | Other | Terminal 3 | E | 203 |
| 4151 | 201703 | United Airlines | UA | United Airlines | UA | Domestic | US | Deplaned | Other | Terminal 3 | F | 386706 |
| 4150 | 201703 | United Airlines | UA | United Airlines | UA | Domestic | US | Enplaned | Other | Terminal 3 | F | 371608 |
| 4149 | 201703 | United Airlines | UA | United Airlines | UA | Domestic | US | Thru / Transit | Other | Terminal 3 | F | 71 |


In [9]:
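cwd = Path.cwd()
data_dir = cwd / 'Data'
data_dir.mkdir(exist_ok=True)  # make sure the output directory exists before writing
sfo_raw.to_pickle(data_dir / 'sfo_raw.pkl')  # full download, original (newest-first) order
sfo_train_raw.to_pickle(data_dir / 'sfo_train_raw.pkl')
sfo_test_raw.to_pickle(data_dir / 'sfo_test_raw.pkl')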
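
The saved frames can be read back with `pd.read_pickle`. The round-trip sketch below uses a temporary directory in place of `Data` and a made-up one-row frame, so the file name `toy.pkl` is an assumption for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up frame standing in for one of the saved splits.
toy = pd.DataFrame({"activity_period": ["200507"], "passenger_count": ["10967"]})

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "Data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Write, then read back from the same path.
    toy.to_pickle(data_dir / "toy.pkl")
    restored = pd.read_pickle(data_dir / "toy.pkl")

# equals() checks values, dtypes, and index together.
round_trip_ok = restored.equals(toy)
print(round_trip_ok)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.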