# City Pairs: Domestic Traffic

AndrewJ, 2020-04-05

## Description

A visualisation sandbox for exploring assorted data sets with Python 3. This notebook uses Australian [monthly airport domestic traffic data](https://data.gov.au/dataset/domestic-airlines-top-routes-and-totals) published on data.gov.au.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import datetime as dt
import altair as alt

## Read and process the data

In [2]:
def read_traffic():
    return pd.read_csv("data/audomcitypairs-202003.csv")

In [28]:
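def transform_traffic(df):
    # 'Month' holds Excel-style serial day numbers, so count days from the
    # 1899-12-30 epoch to recover the calendar date (30682 -> 1984-01-01).
    df1 = df.assign(
        Journey = df.City1 + "-" + df.City2,
        YearMonth = dt.datetime(1899, 12, 30) + df['Month'].map(dt.timedelta))
    return df1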
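
The date handling in `transform_traffic` can be checked in isolation. The sketch below assumes the `Month` column uses the usual Excel serial-date convention (days counted from 1899-12-30); the helper name `serial_to_date` is hypothetical and not part of the notebook.

```python
import datetime as dt

# Epoch commonly used for Excel serial dates (assumed to match the source file).
EXCEL_EPOCH = dt.datetime(1899, 12, 30)

def serial_to_date(serial):
    """Convert an Excel-style serial day number to a datetime (hypothetical helper)."""
    return EXCEL_EPOCH + dt.timedelta(days=serial)

# The smallest and largest Month values that appear in the data set.
print(serial_to_date(30682).date())  # 1984-01-01
print(serial_to_date(43891).date())  # 2020-03-01
```

pandas can likely do the same conversion in one vectorised call, `pd.to_datetime(df['Month'], unit='D', origin='1899-12-30')`, which may be worth considering for larger files.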

## Run

In [30]:
dom = transform_traffic(read_traffic())

In [31]:
dom.head()

,City1,City2,Month,Passenger_Trips,Aircraft_Trips,Passenger_Load_Factor,Distance_GC_(km),RPKs,ASKs,Seats,Year,Month_num,Journey,YearMonth
0,ADELAIDE,ALICE SPRINGS,30682,15743,143,81.8,1316,20717788,25327369,19246,1984,1,ADELAIDE-ALICE SPRINGS,1984-01-01
1,ADELAIDE,BRISBANE,30682,3781,32,89.8,1622,6132782,6829379,4210,1984,1,ADELAIDE-BRISBANE,1984-01-01
2,ADELAIDE,CANBERRA,30682,1339,12,94.7,972,1301508,1374348,1414,1984,1,ADELAIDE-CANBERRA,1984-01-01
3,ADELAIDE,DARWIN,30682,3050,33,66.8,2619,7987950,11958009,4566,1984,1,ADELAIDE-DARWIN,1984-01-01
4,ADELAIDE,GOLD COAST,30682,1596,16,88.5,1607,2564772,2898047,1803,1984,1,ADELAIDE-GOLD COAST,1984-01-01
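
As a quick sanity check on how the columns relate, the sketch below recomputes two derived figures from the first row shown above. It assumes `RPKs` means revenue passenger kilometres and `ASKs` means available seat kilometres, which is the usual airline usage; the relationships hold for the rows displayed here but have not been verified across the whole file.

```python
# Values copied from the first row of dom.head() (ADELAIDE-ALICE SPRINGS).
passenger_trips = 15743
distance_km = 1316
rpks = 20717788   # revenue passenger kilometres (assumed meaning)
asks = 25327369   # available seat kilometres (assumed meaning)

# In this row, RPKs equal passengers carried multiplied by great-circle distance.
print(passenger_trips * distance_km == rpks)  # True

# Load factor: the share of available seat kilometres actually used, in percent.
load_factor = round(100 * rpks / asks, 1)
print(load_factor)  # 81.8
```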


In [32]:
dom.dtypes

City1                            object
City2                            object
Month                             int64
Passenger_Trips                   int64
Aircraft_Trips                    int64
Passenger_Load_Factor           float64
Distance_GC_(km)                  int64
RPKs                              int64
ASKs                              int64
Seats                             int64
Year                              int64
Month_num                         int64
Journey                          object
YearMonth                datetime64[ns]
dtype: object

In [33]:
dom.describe()

,Month,Passenger_Trips,Aircraft_Trips,Passenger_Load_Factor,Distance_GC_(km),RPKs,ASKs,Seats,Year,Month_num
count,23808.0,23808.0,23808.0,23808.0,23808.0,23808.0,23808.0,23808.0,23808.0,23808.0
mean,37905.737021,48441.726016,474.670993,69.882172,1215.53398,54588880.0,69909650.0,63125.457325,2003.321783,6.479503
std,3799.628799,85029.217378,605.520693,18.40076,845.654524,90191620.0,111610100.0,106274.582297,10.404476,3.460656
min,30682.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1984.0,1.0
25%,34790.0,9067.5,128.0,65.2,538.0,5329457.0,7603080.0,13312.0,1995.0,3.0
50%,38169.0,17847.0,316.0,73.8,956.0,18431460.0,25850310.0,25402.0,2004.0,6.0
75%,41244.0,52831.0,556.0,80.6,1622.0,56821920.0,73805850.0,70417.0,2012.0,9.0
max,43891.0,834347.0,5397.0,109.7,3615.0,591587000.0,705952100.0,959123.0,2020.0,12.0


## Visualise

Top sectors by mean monthly passengers.

In [19]:
trips = (dom['Passenger_Trips'] 
    .groupby(dom['Journey']) 
    .mean() 
    .sort_values(ascending = False) 
    .reset_index())

trips.head(5)

,Journey,Passenger_Trips
0,MELBOURNE-SYDNEY,469193.717241
1,BRISBANE-SYDNEY,278447.174713
2,BRISBANE-MELBOURNE,167578.427586
3,GOLD COAST-SYDNEY,133147.765517
4,ADELAIDE-MELBOURNE,130774.048276
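
The ranking above comes from `.mean()`, so it orders routes by average monthly passengers rather than by cumulative total. The two orderings can differ when routes report for different numbers of months. The sketch below illustrates this with hypothetical toy data (the route names and counts are invented).

```python
import pandas as pd

# Hypothetical toy data: route A-B reports for three months, route C-D for one.
toy = pd.DataFrame({
    "Journey": ["A-B", "A-B", "A-B", "C-D"],
    "Passenger_Trips": [100, 100, 100, 250],
})

grouped = toy.groupby("Journey")["Passenger_Trips"]
by_mean = grouped.mean().sort_values(ascending=False)
by_total = grouped.sum().sort_values(ascending=False)

print(by_mean.index[0])   # C-D: highest monthly mean (250 vs 100)
print(by_total.index[0])  # A-B: highest cumulative total (300 vs 250)
```

Swapping `.mean()` for `.sum()` in the cell above would give a ranking by total passengers over the whole period, if that is the quantity of interest.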


In [34]:
bars = (alt.Chart(trips.head(15))
    .mark_bar(size = 15)
    .encode(
        x = alt.X(
            'Passenger_Trips:Q',
            scale = alt.Scale(domain = [0, 550000])),
        y = alt.Y(
            'Journey:O', 
            sort = '-x')))

labels = (bars
    .mark_text(dx = 25) 
    .encode(
        text = alt.Text(
            'Passenger_Trips:Q', 
            format = ".0d")))

(bars+labels).properties(
        width = 500, 
        height = 350)

Time series of total monthly passenger numbers, summed across all listed city pairs.

In [36]:
trips_months = dom.groupby(['YearMonth'])[['Passenger_Trips']].sum().reset_index()

In [43]:
(alt.Chart(trips_months)
    .mark_line(color = 'green')
    .encode(
        x = alt.X(
            'YearMonth:T',
            title = "Date"),
        y = alt.Y(
            'Passenger_Trips:Q', 
            title = "Passenger trips"))
    .properties(
        width = 400,
        height = 250))

Plot the moving average centred over a 12-month window, which smooths out the seasonal pattern within each year.

In [38]:
df = (trips_months
    .set_index('YearMonth')
    .rolling(window = 12, center = True)
    .mean()
    .reset_index())
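
One consequence of a centred window is that values near both ends of the series are undefined, because a full 12-month window is not available there. The sketch below shows this on a hypothetical series of 24 values; it relies on pandas' default of requiring a full window (`min_periods` equal to the window size).

```python
import pandas as pd

# Hypothetical series of 24 monthly values: 1.0, 2.0, ..., 24.0.
monthly = pd.Series([float(v) for v in range(1, 25)])

smoothed = monthly.rolling(window=12, center=True).mean()

# A full 12-value window is unavailable near the ends, so 11 entries are NaN.
print(smoothed.isna().sum())      # 11
# The first defined value is the mean of the first twelve inputs.
print(smoothed.dropna().iloc[0])  # 6.5
```

The plotted line for the real data therefore starts and ends a few months inside the range of the raw series.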

In [42]:
(alt.Chart(df)
    .mark_line(color = 'green')
    .encode(
        x = alt.X(
            'YearMonth:T',
            title = "Date"),
        y = alt.Y(
            'Passenger_Trips:Q',
            title = "Passenger trips"))
    .properties(
        width = 400,
        height = 250))