# Train Station Data Cleaner

Data comparable with buses and trams: weekday entries by time period ('AM Peak', 'Interpeak', 'PM Peak').

## Data Source
https://www.data.vic.gov.au/data/dataset/train-station-entries-2008-09-to-2011-12-new

Data Temporal Coverage: 01/07/2008 to 30/06/2012

In [21]:
rawtrain = './raw/Train Station Entries 2008-09 to 2011-12 - data.XLS'

## Step 1: Download raw train station entry data, save a local copy in ./raw directory
Download the train station entries xls file manually. The web page has an 'I consent to terms and conditions / I am not a robot' button that prevents automated downloading (or at least makes it harder than I expected). Save the file to the './raw' directory.
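Because the download is manual, it can help to confirm the file is in place before loading it. The following is a minimal sketch only; the helper name `raw_file_ready` is an assumption made for illustration and is not part of the original notebook.

```python
from pathlib import Path

# Path used in this notebook; the file must be downloaded by hand first.
RAW_TRAIN = Path('./raw/Train Station Entries 2008-09 to 2011-12 - data.XLS')


def raw_file_ready(path):
    """Hypothetical helper: True if the file exists and is not empty."""
    path = Path(path)
    return path.is_file() and path.stat().st_size > 0


if not raw_file_ready(RAW_TRAIN):
    print('Raw file not found: download it manually and save it as', RAW_TRAIN)
```

The check only looks at the file's presence and size; it does not validate that the contents are a readable spreadsheet.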

In [22]:
import pandas as pd

In [25]:
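# Read the 'Data' sheet, skipping the first row of the sheet and the last five rows.
# Older pandas versions spelled these arguments `sheetname` and `skip_footer`.
df = pd.read_excel(rawtrain, sheet_name='Data', header=0, skiprows=1, skipfooter=5)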
df

Unnamed: 0,Station,Notes,Line Group,Network Segment,Line Segment,2008-09,2009-10,2010-11,2011-12,2008-09.1,...,2011-12.1,Normal Weekday,Saturday,Sunday,Weekly,Pre AM Peak,AM Peak,Interpeak,PM Peak,PM Late
0,Aircraft,,Northern,Newport Corridor,Seaholme-Werribee,0.313692,0.305483,0.324742,0.315160,1088.699986,...,1061.929966,1061.929966,525.683292,397.080300,6232.413425,237.897730,456.425690,225.740549,130.721038,11.144959
1,Alamein,,Burnley,Camberwell Corridor,Riversdale-Alamein,0.176067,0.169994,0.175858,0.153094,628.730293,...,492.705744,492.705744,292.156843,201.637095,2957.322660,48.946191,267.239895,114.143413,44.118682,18.257564
2,Albion,,Northern,Sunbury Line,Albion-Sunbury,0.785147,0.808922,0.802370,0.679577,2859.820566,...,2529.307073,2529.307073,928.927817,632.559319,14208.022498,319.162044,1310.765719,485.731707,389.188179,24.459423
3,Alphington,,Clifton Hill,Hurstbridge Line,Westgarth-Hurstbridge,0.316047,0.323038,0.321010,0.287551,1160.002749,...,1011.836954,1011.836954,476.803206,348.284321,5884.272295,42.208523,537.959378,241.489810,153.019594,37.159648
4,Altona,,Northern,Newport Corridor,Seaholme-Werribee,0.409006,0.391029,0.385586,0.281536,1381.436326,...,962.443514,962.443514,486.406093,298.350946,5596.974608,120.828059,400.638040,231.197189,170.906116,38.874109
5,Anstey,,Northern,Upfield Line,Macauly-Upfield,0.353803,0.379469,0.376610,0.361545,1181.831716,...,1143.501139,1143.501139,797.290704,612.842409,7127.638809,95.506592,453.776044,306.652866,216.409306,71.156331
6,Armadale,,Caulfield,Hawksburn-Caulfield,Hawksburn-Caulfield,0.568183,0.581113,0.623642,0.563734,2040.527713,...,1844.564584,1844.564584,1356.511188,870.567424,11449.901533,80.678582,866.786739,381.513971,437.373267,78.212026
7,Ascot Vale,,Northern,Craigieburn Line,Kensington-Craigieburn,0.570532,0.587112,0.548968,0.544282,2026.388357,...,1890.138774,1890.138774,970.636798,678.919272,11100.249942,126.057396,987.481205,443.258259,280.724595,52.617319
8,Ashburton,,Burnley,Camberwell Corridor,Riversdale-Alamein,0.295785,0.303425,0.307057,0.286560,1091.933975,...,1023.164969,1023.164969,370.467082,244.203530,5730.495455,93.941294,487.340828,236.751601,189.405391,15.725855
9,Aspendale,,Caulfield,Frankston Line,Glenhuntly-Frankston,0.388967,0.380146,0.355486,0.305565,1340.146017,...,959.853901,959.853901,606.947772,455.699106,5861.916380,89.744631,496.647928,221.980937,127.979531,23.500874


## Comparison with bus and tram reports
### Station entries vs boardings and alightings
The train station entry data contains no boarding or alighting information, so it cannot show how many people got on or off a train. Station entries can only measure the level of activity at a particular station over the course of a day.

## Step 2: Subset out the weekday 7am to 7pm station entry data
The Train station entry report covers the entire operating day. The bus and tram reports cover only the 7am to 7pm period. 

Train station entries are broken into five time periods. These are not specified on the data.vic.gov.au website, but they are specified on the PTV research and statistics page: https://www.ptv.vic.gov.au/about-ptv/ptv-data-and-reports/research-and-statistics/

 - Pre AM Peak - first service to 6:59am
 - AM Peak - 7:00am to 9:29am
 - Interpeak - 9:30am to 2:59pm
 - PM Peak - 3:00pm to 6:59pm
 - PM Late - 7:00pm to last service

To compare with bus and tram boardings, sum the 'AM Peak', 'Interpeak' and 'PM Peak' columns from the 2011-12 weekday dataset to create a 'wk7am7pm' value.

In [26]:
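# Keep the station name and the three periods that cover 7am to 7pm.
# .copy() makes 'trains' an independent table, so adding a column does not touch df.
trains = df.loc[:, ['Station', 'AM Peak', 'Interpeak', 'PM Peak']].copy()
# Weekday 7am to 7pm entries = AM Peak + Interpeak + PM Peak
trains['wk7am7pm'] = trains['AM Peak'] + trains['Interpeak'] + trains['PM Peak']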
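As a sanity check on the sum, the arithmetic can be reproduced by hand for two of the sample rows shown in this notebook. The sketch below uses plain Python with values copied from the output for Aircraft and Alamein; the names `sample`, `period_sum`, `ALL_PERIODS` and `DAYTIME_PERIODS` are assumptions for illustration. For these two rows the five periods also appear to add up to the 'Normal Weekday' column, which is consistent with the period columns describing a normal weekday, though this has not been checked for every station.

```python
import math

# Values copied from the sample output in this notebook (first two stations).
sample = {
    'Aircraft': {
        'Normal Weekday': 1061.929966,
        'Pre AM Peak': 237.897730, 'AM Peak': 456.425690,
        'Interpeak': 225.740549, 'PM Peak': 130.721038, 'PM Late': 11.144959,
    },
    'Alamein': {
        'Normal Weekday': 492.705744,
        'Pre AM Peak': 48.946191, 'AM Peak': 267.239895,
        'Interpeak': 114.143413, 'PM Peak': 44.118682, 'PM Late': 18.257564,
    },
}

ALL_PERIODS = ['Pre AM Peak', 'AM Peak', 'Interpeak', 'PM Peak', 'PM Late']
DAYTIME_PERIODS = ['AM Peak', 'Interpeak', 'PM Peak']  # 7:00am to 6:59pm


def period_sum(row, periods):
    """Add up the entries for the given time periods of one station."""
    return sum(row[p] for p in periods)


for station, row in sample.items():
    wk7am7pm = period_sum(row, DAYTIME_PERIODS)
    whole_day = period_sum(row, ALL_PERIODS)
    # Small tolerance because the displayed values are rounded to six decimals.
    matches_weekday = math.isclose(whole_day, row['Normal Weekday'], abs_tol=1e-4)
    print(station, round(wk7am7pm, 6), matches_weekday)
```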

## Step 3: Create a .csv file with weekday 7am to 7pm station entries for each station
Write the 'trains' table, which holds the 'AM Peak', 'Interpeak', 'PM Peak' and 'wk7am7pm' values for each station, to a .csv file. No grouping is applied at this step: the table is saved as-is.

Results are saved as

'./clean/TrainStationEntries.csv'



In [24]:
trains.to_csv('./clean/TrainStationEntries.csv')

trains

Unnamed: 0,Station,AM Peak,Interpeak,PM Peak,wk7am7pm
0,Aircraft,456.425690,225.740549,130.721038,812.887277
1,Alamein,267.239895,114.143413,44.118682,425.501990
2,Albion,1310.765719,485.731707,389.188179,2185.685605
3,Alphington,537.959378,241.489810,153.019594,932.468782
4,Altona,400.638040,231.197189,170.906116,802.741346
5,Anstey,453.776044,306.652866,216.409306,976.838216
6,Armadale,866.786739,381.513971,437.373267,1685.673976
7,Ascot Vale,987.481205,443.258259,280.724595,1711.464059
8,Ashburton,487.340828,236.751601,189.405391,913.497819
9,Aspendale,496.647928,221.980937,127.979531,846.608396
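The save step assumes the './clean' directory already exists. A minimal sketch of creating it on demand is shown below; `prepare_output_path` is a hypothetical helper name, not something defined in the original notebook.

```python
from pathlib import Path


def prepare_output_path(path):
    """Hypothetical helper: create the parent directory of `path` if needed.

    Returns the path as a Path object; the file itself is not created.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out
```

With this helper, the save step could be written as `trains.to_csv(prepare_output_path('./clean/TrainStationEntries.csv'))`. By default `to_csv` also writes the row index as an unnamed first column; passing `index=False` would omit it if that column is not wanted.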


## Step 4: Map train-specific results
Use QGIS to join TrainStationEntries.csv to 'layer ptv_train_station' using the common column 'Station'.

Use Display properties to colour-code train stations by 'wk7am7pm' to find the busiest station.

Note: 'layer ptv_train_station' includes a 'Metlink_StopID' column, providing a common index for all public transport stops.
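A join on 'Station' only links rows whose names match in both tables, so it may be worth listing any names that appear on one side only before mapping. The sketch below is one possible check under stated assumptions: `unmatched_keys` is a hypothetical helper, and the layer names in the example are made up for illustration (the real ones would come from the layer's attribute table).

```python
def unmatched_keys(csv_names, layer_names):
    """Hypothetical helper: return (names only in the CSV, names only in the layer).

    The comparison is exact, so differences in case or spacing are reported.
    """
    csv_keys = set(csv_names)
    layer_keys = set(layer_names)
    return sorted(csv_keys - layer_keys), sorted(layer_keys - csv_keys)


# Station names from the CSV sample, compared with made-up layer names.
csv_names = ['Aircraft', 'Alamein', 'Ascot Vale']
layer_names = ['Aircraft', 'Alamein', 'ascot vale']  # illustrative only

only_csv, only_layer = unmatched_keys(csv_names, layer_names)
print('Only in CSV:', only_csv)
print('Only in layer:', only_layer)
```

Any station listed here would be left without a 'wk7am7pm' value after the join and would need its name reconciled first.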


