# City Pairs: Domestic Traffic

AndrewJ, 2020-04-05

## Description

A visualisation sandbox for assorted data sets, using Python 3. This notebook uses Australian [monthly airport domestic traffic data](https://data.gov.au/dataset/domestic-airlines-top-routes-and-totals) from data.gov.au.

In [74]:
%matplotlib inline

import numpy as np
import pandas as pd
import datetime as dt
import altair as alt

## Read and process the data

In [2]:
def read_traffic():
    return pd.read_csv("data/audomcitypairs-201912.csv")

In [3]:
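def transform_traffic(df):
    # Build a route label, and convert Month from a day count
    # (measured from 1899-12-30) into a proper datetime.
    df1 = df.assign(
        Journey = df.City1 + "-" + df.City2,
        Month = dt.datetime(1899, 12, 30) + df['Month'].map(dt.timedelta))
    return df1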
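
The arithmetic in `transform_traffic` suggests that the raw `Month` column holds day counts from 1899-12-30, the epoch commonly used for spreadsheet serial dates. As a minimal sketch under that assumption, the same conversion can be written with `pd.to_datetime` and its `unit` and `origin` parameters. The serial numbers below are hypothetical stand-ins for the raw column, not values read from the CSV file.

```python
import pandas as pd

# Hypothetical serial day numbers, standing in for the raw `Month` column.
raw = pd.DataFrame({"Month": [30682, 30713, 30742]})

# Interpret each value as a number of days since 1899-12-30.
months = pd.to_datetime(raw["Month"], unit="D", origin="1899-12-30")

print(months.dt.strftime("%Y-%m-%d").tolist())
# → ['1984-01-01', '1984-02-01', '1984-03-01']
```

This version avoids mapping a Python function over every row, which may matter on larger files.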

## Run

In [4]:
dom = transform_traffic(read_traffic())

In [40]:
dom.head()

| | City1 | City2 | Month | Passenger_Trips | Aircraft_Trips | Passenger_Load_Factor | Distance_GC_(km) | RPKs | ASKs | Seats | Year | Month_num | Journey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ADELAIDE | ALICE SPRINGS | 1984-01-01 | 15743 | 143 | 81.8 | 1316 | 20717788 | 25327369 | 19246 | 1984 | 1 | ADELAIDE-ALICE SPRINGS |
| 1 | ADELAIDE | BRISBANE | 1984-01-01 | 3781 | 32 | 89.8 | 1622 | 6132782 | 6829379 | 4210 | 1984 | 1 | ADELAIDE-BRISBANE |
| 2 | ADELAIDE | CANBERRA | 1984-01-01 | 1339 | 12 | 94.7 | 972 | 1301508 | 1374348 | 1414 | 1984 | 1 | ADELAIDE-CANBERRA |
| 3 | ADELAIDE | DARWIN | 1984-01-01 | 3050 | 33 | 66.8 | 2619 | 7987950 | 11958009 | 4566 | 1984 | 1 | ADELAIDE-DARWIN |
| 4 | ADELAIDE | GOLD COAST | 1984-01-01 | 1596 | 16 | 88.5 | 1607 | 2564772 | 2898047 | 1803 | 1984 | 1 | ADELAIDE-GOLD COAST |


In [6]:
dom.dtypes

City1                            object
City2                            object
Month                    datetime64[ns]
Passenger_Trips                   int64
Aircraft_Trips                    int64
Passenger_Load_Factor           float64
Distance_GC_(km)                  int64
RPKs                              int64
ASKs                              int64
Seats                             int64
Year                              int64
Month_num                         int64
Journey                          object
dtype: object

## Visualise

Top sectors by mean monthly passengers.

In [42]:
trips = dom['Passenger_Trips'] \
    .groupby(dom['Journey']) \
    .mean() \
    .sort_values(ascending = False) \
    .reset_index()

trips.head(5)

| | Journey | Passenger_Trips |
|---|---|---|
| 0 | MELBOURNE-SYDNEY | 468167.1875 |
| 1 | BRISBANE-SYDNEY | 278187.972222 |
| 2 | BRISBANE-MELBOURNE | 167081.564815 |
| 3 | GOLD COAST-SYDNEY | 132675.391204 |
| 4 | ADELAIDE-MELBOURNE | 130473.773148 |
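
The ranking above is built with `.mean()`, so it orders routes by average monthly passengers rather than by an all-time total. The two orderings can differ when routes are reported for different numbers of months. The sketch below illustrates this on a hypothetical toy frame; the route names and figures are invented for illustration and are not taken from the data set.

```python
import pandas as pd

# Hypothetical toy data: route A-B is reported for 3 months, route A-C for only 1.
toy = pd.DataFrame({
    "Journey": ["A-B", "A-B", "A-B", "A-C"],
    "Passenger_Trips": [100, 100, 100, 250],
})

# Compute both aggregates per route to compare the rankings they imply.
by_route = toy.groupby("Journey")["Passenger_Trips"].agg(["mean", "sum"])
print(by_route)

print(by_route["mean"].idxmax())  # → A-C (highest monthly average)
print(by_route["sum"].idxmax())   # → A-B (highest all-time total)
```

The mean is arguably the fairer comparison when some routes start or stop partway through the series, while the sum answers the question of total traffic carried.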


In [70]:
bars = alt.Chart(trips.head(15)) \
    .mark_bar(size = 15) \
    .encode(
        x = alt.X(
            'Passenger_Trips:Q',
            scale = alt.Scale(domain = [0, 550000])),
        y = alt.Y(
            'Journey:O', 
            sort = '-x'))

labels = bars \
    .mark_text(dx = 25) \
    .encode(
        text = alt.Text(
            'Passenger_Trips:Q', 
            format = ".0d"))

(bars+labels).properties(
        width = 500, 
        height = 350)

Time series of monthly passenger numbers, summed across all city pairs.

In [25]:
trips_months = dom.groupby(['Month'])[['Passenger_Trips']].sum().reset_index()

In [73]:
alt.Chart(trips_months) \
    .mark_line(color = 'green') \
    .encode(
        x = 'Month:T',
        y = 'Passenger_Trips:Q') \
    .properties(
        width = 400,
        height = 250)

Plot the moving average centered over a 12-month window.

In [46]:
df = trips_months \
    .set_index('Month').rolling(window = 12, center = True) \
    .mean() \
    .reset_index()
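
A centered rolling mean with the default `min_periods` only produces a value where the full window fits, so the smoothed series is missing at both ends. The sketch below shows this on a hypothetical 24-month series (the values are invented, not drawn from the traffic data), using the same `rolling(window = 12, center = True)` call as above.

```python
import pandas as pd

# Hypothetical monthly totals: 24 months of steadily increasing traffic.
toy = pd.DataFrame({
    "Month": pd.date_range("1984-01-01", periods=24, freq="MS"),
    "Passenger_Trips": range(24),
})

window = 12
smoothed = toy.set_index("Month").rolling(window=window, center=True).mean()

# Only n - window + 1 positions have a complete window; the rest are NaN.
n_missing = int(smoothed["Passenger_Trips"].isna().sum())
print(n_missing)  # → 11
```

This is why the smoothed line in the chart that follows is shorter than the raw series. Passing `min_periods` to `rolling` is one way to fill the edges from partial windows, at the cost of noisier values there.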

In [71]:
alt.Chart(df) \
    .mark_line(color = 'green') \
    .encode(
        x = 'Month:T',
        y = 'Passenger_Trips:Q') \
    .properties(
        width = 400,
        height = 250)