<p style="font-family: helvetica,arial,sans-serif; font-size:2.0em;color:white; background-color: black">&emsp;<b>Event Impacts and Effects on Pedestrian Traffic</b></p>
    
<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #DDDDDD; text-align:justify">&emsp;<b>Authored by: </b>Alex Voung, Mark Brooksby, Brendan Richards</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black; text-align:right"><b>Duration:</b> 30 to 120 mins&emsp;</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #DDDDDD; text-align:justify">&emsp;<b>Level: </b>Intermediate to Advanced&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;<b>Pre-requisite Skills: </b>Python</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>Scenario</b></p>

#### "My business relies on customer walk-ins. I want to understand the events or factors that influence pedestrian numbers near my business."

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>What this Use Case will teach you</b></p>

At the end of this use case you will understand how to:
- Access and do exploratory analysis on the City of Melbourne's pedestrian sensor data.
- Gain insights through integrating other relevant datasets including:
    - Accessing more open data available through the City of Melbourne.
    - Finding other open data on the internet and importing it for analysis.
    - Connecting to real-time public transport data.
- Create engaging visualisations of your analysis.
- Create intuitive, interactive interfaces.
- Perform predictive modelling.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>Enriching and Analysing the City of Melbourne's Pedestrian Sensor Network Data</b></p>

The City of Melbourne publishes an attractive interface that shows the location of pedestrian sensors around the CBD overlaid on a three-dimensional map. The interface can be animated at the click of a button, showing how pedestrian traffic ebbs and flows as each hour passes. It also compares each sensor's count with the number of pedestrians that passed the same sensor 4 weeks or 52 weeks earlier.

However, this interface doesn't explore the question of 'why' those numbers change. So this use case will attempt to do just that, by showcasing different approaches to investigating and analysing factors that can influence pedestrian traffic.

Approach 1 will begin by comparing pedestrian traffic data with events in or near the CBD, such as AFL matches or the Melbourne Grand Prix. It will focus on statistical analysis and visualisation to show what impacts these events have.

Approach 2 will show how to access and integrate real-time data from the VicRoads open data portal. It will present a visualisation of train positions, movement and capacity - providing an intuitive way to understand the impact of public transport on pedestrian traffic.

Approach 3 will integrate data from the City of Melbourne's microclimate sensor network, as well as other promising datasets, to build regression models of pedestrian traffic. It will show how to create an interactive visualisation to allow users to experiment with the model.

Each approach introduces new datasets and new ideas, and requires the installation of new modules. The approaches are presented so that each one builds on the one before it.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black"><b>&emsp;What data, packages and access will I need?</b></p>

The star of the show is - naturally - the open data provided by the City of Melbourne.
In this notebook we will be using three of their datasets.


<a href='https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Monthly-counts-per-hour/b2ak-trbp'>Pedestrian Counting System - Monthly (counts per hour)</a><br>
<a href='https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Sensor-Locations/h57g-5234'>Pedestrian Counting System - Sensor Locations</a><br>
<a href='https://data.melbourne.vic.gov.au/Environment/Microclimate-Sensor-Readings/u4vh-84j8'>Microclimate Sensor Readings</a><br>

We also access the open data provided by VicRoads.

<a href='https://data-exchange.vicroads.vic.gov.au/docs/services/vehicle-position-trip-update-opendata/operations/metro-train-service-alerts?'>Metro Train Service Alerts</a><br>
<a href='https://data-exchange.vicroads.vic.gov.au/docs/services/vehicle-position-trip-update-opendata/operations/get-metrotraintripupdates?'>Metro Trains Trip Updates</a><br>
<a href='https://data-exchange.vicroads.vic.gov.au/docs/services/vehicle-position-trip-update-opendata/operations/get-metrotrainvehiclepositionupdates?'>Metro Trains Vehicle Positions</a><br>

Getting the best access to these datasets may require an API key. Getting a key is free, and the more data you have at your fingertips for analysis, the better. Check the Melbourne Open Data and the VicRoads Data Exchange Platform pages for details.<br>

The other datasets that we use (such as `afl_data.csv` and `gp_dates.csv`) were compiled from information freely available on the internet and saved as CSV files for easy consumption.

As we go through each of the three analytical approaches below, we will show what packages need to be installed and why they are necessary. However, if you are using Conda, we have also provided a 'yml' file which you can use to set up an environment with everything you need.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>Approach 1: Statistical Analysis of Local Major Events</b></p>

In [None]:
import pandas as pd
from datetime import datetime
!pip -q install sodapy
!pip -q install folium
!pip -q install plotly
!pip -q install seaborn
!pip -q install altair
from sodapy import Socrata
import numpy as np
import folium
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
from functools import reduce
from folium.plugins import HeatMapWithTime
from folium.plugins import HeatMap
import folium.plugins as plugins  
from folium import Map
from folium import features
import altair as alt
import json

import warnings
warnings.filterwarnings('ignore')

Most of the packages that we have included here will be familiar to you. If you haven't used `sodapy` before, it is a Python client for the Socrata Open Data API, which is how we connect to the City of Melbourne's Open Data Platform.<br>
If you haven't had the pleasure of playing with Folium before, then you will probably enjoy learning about it. Folium is an easy way to produce great-looking maps with only a few lines of code.
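The pedestrian dataset is requested later in this notebook with a very large `limit` in a single call. If that proves slow, one alternative is to page through the results in smaller chunks. The sketch below shows the idea only: `fetch_all` and `get_page` are hypothetical names introduced here, and the fake table stands in for the API so the example runs offline. With `sodapy`, `get_page` could wrap `client.get(dataset_id, limit=..., offset=...)`.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def fetch_all(get_page: Callable[[int, int], List[Row]], page_size: int = 50000) -> List[Row]:
    """Collect every row by requesting successive pages until one comes back short."""
    rows: List[Row] = []
    offset = 0
    while True:
        page = get_page(page_size, offset)  # hypothetical callable: (limit, offset) -> rows
        rows.extend(page)
        if len(page) < page_size:  # a short (or empty) page means we have reached the end
            return rows
        offset += page_size

# Stand-in for an API call so the sketch runs offline (made-up data).
fake_table = [{"sensor_id": str(n)} for n in range(12)]

def fake_get_page(limit: int, offset: int) -> List[Row]:
    return fake_table[offset:offset + limit]

all_rows = fetch_all(fake_get_page, page_size=5)
```

Paging is only reliable when the rows come back in a stable order, so when adapting this to a real dataset it is worth passing an explicit `order` argument as well.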

In [None]:
!pip -q install folium --upgrade

In [None]:
#Read and view AFL Data
afl_df = pd.read_csv('afl_data.csv')
afl_df.head()

In [None]:
#Standardise the datetime data
afl_df['startTime'] = pd.to_datetime(afl_df['startTime'], format = '%H:%M')

afl_df.date = pd.to_datetime(afl_df.date, format = '%d-%b-%y')
afl_df['day_of_week'] = afl_df['date'].dt.day_name()

#startTime is already a datetime column, so the hour and minute can be read directly
afl_df['StartHour'] = afl_df['startTime'].dt.hour
afl_df['StartMinute'] = afl_df['startTime'].dt.minute

In [None]:
#Generate start time categories
afl_df['timeslot'] = pd.to_datetime(afl_df['startTime'])
afl_df['timeslot'] = afl_df['timeslot'].dt.strftime("%H:%M:%S")
afl_df['timeslot'] = afl_df['timeslot'].apply(lambda x: 'Afternoon' if x <= '16:00:00' else 'Evening')
afl_df['start'] = afl_df['startTime'].dt.strftime("%H-%M-%S")

## Extract game date data

In [None]:
game_dates_list = pd.unique((afl_df.date).dt.strftime('%Y-%m-%d')).tolist()

afternoons_dates = afl_df[(afl_df.timeslot == 'Afternoon')]
afternoons_dates_list = pd.unique((afternoons_dates.date).dt.strftime('%Y-%m-%d')).tolist()

evening_dates = afl_df[(afl_df.timeslot == 'Evening')]
evening_dates_list = pd.unique((evening_dates.date).dt.strftime('%Y-%m-%d')).tolist()


MCG_dates = afl_df[(afl_df.venue == 'M.C.G.')]
MCG_dates_list = pd.unique((MCG_dates.date).dt.strftime('%Y-%m-%d')).tolist()

#Filter on the venue column (assumes the ground is labelled 'Docklands' in afl_data.csv)
Docklands_dates = afl_df[(afl_df.venue == 'Docklands')]
Docklands_dates_list = pd.unique((Docklands_dates.date).dt.strftime('%Y-%m-%d')).tolist()

In [None]:
#Pedestrian data
client = Socrata('data.melbourne.vic.gov.au', 'jkWLqYGpmFN5bK6j45TU4peYP', None)
results = client.get("b2ak-trbp", limit=7000000)

pedestrian_df = pd.DataFrame.from_records(results)
#pedestrian_df['sensor_id'] = pedestrian_df['sensor_id'].astype(int)

#View pedestrian data
pedestrian_df.head(5).T

In [None]:
#Location data
client = Socrata('data.melbourne.vic.gov.au', 'nlPM0PQJSjzCsbVqntjPvjB1f', None)
ped_data_location = "h57g-5234"
results = client.get(ped_data_location)
sensor_location = pd.DataFrame.from_records(results)
sensor_location[['latitude', 'longitude']] = sensor_location[['latitude', 'longitude']].astype(float)
#sensor_location['sensor_id'] = sensor_location['sensor_id'].astype(str)

#View sensor data
sensor_location.head(5).T

## Visualise all sensor locations

In [None]:
#Plot location of sensors
map = folium.Map(location=[sensor_location.latitude.mean(), 
                           sensor_location.longitude.mean()], 
                          zoom_start=13.5, min_zoom=13, max_zoom = 16,max_bounds=True)

for i in range(0,len(sensor_location)):
        label = 'Sensor ID: ' + sensor_location.iloc[i]['sensor_id']
        folium.Marker(location = [sensor_location.iloc[i]['latitude'], 
                            sensor_location.iloc[i]['longitude']], 
                            popup = label).add_to(map)
map

In [None]:
merged_df = pd.merge(pedestrian_df, sensor_location, on = 'sensor_id')

merged_df = merged_df[['date_time', 'year', 'month', 'mdate', 'day', 'time', 'sensor_id',
       'sensor_name_x', 'hourly_counts','latitude',
       'longitude']]

merged_df['date_time'] = pd.to_datetime(merged_df['date_time'])
merged_df['date'] = merged_df['date_time'].dt.strftime("%Y-%m-%d")

merged_df = merged_df.sort_values(['date_time'], ascending = True)
merged_df['day'] = merged_df['day'].astype(str)

#Convert the numeric columns from strings to integers
for col in ['year', 'mdate', 'time', 'hourly_counts', 'sensor_id']:
    merged_df[col] = merged_df[col].astype(int)
all_sensors=list(pd.unique(merged_df['sensor_id']))

In [None]:
#adjust dataframe to AFL data
merged_df.reset_index(inplace =True)
afl_merged_df = merged_df[merged_df.date > '2012-03-29']

merged_df2 = merged_df.copy()

In [None]:
#Extract game and non-game data
#Build each mask from afl_merged_df so that it lines up with the frame being filtered
afl_df = afl_merged_df[afl_merged_df['date'].isin(game_dates_list)]
non_afl_df = afl_merged_df[~afl_merged_df['date'].isin(game_dates_list)]
afl_df.head()

In [None]:
non_afl_df.head()

In [None]:
#Generate average pedestrian movements per hour
afl = pd.DataFrame(afl_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
afl.reset_index(inplace = True)

non_afl = pd.DataFrame(non_afl_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
non_afl.reset_index(inplace = True)

#Combine AFL and Non-AFL Data
afl_merged_df= non_afl.merge(afl, on=["sensor_id","year", "day", "time"])
afl_merged_df.drop(['latitude_x', 'longitude_x'], axis=1, inplace=True)
afl_merged_df = afl_merged_df.rename(columns = {'latitude_y':'latitude','longitude_y':'longitude' })
afl_merged_df = afl_merged_df.rename(columns = {'hourly_counts_x':'No_AFL','hourly_counts_y':'AFL' })

#Difference between the AFL and non-AFL averages, as a fraction of the AFL average
merged_data_diff = afl_merged_df.copy()
merged_data_diff['diff'] = (afl_merged_df.AFL - afl_merged_df.No_AFL)/afl_merged_df.AFL
merged_data_diff

In [None]:
afl_merged_df = afl_merged_df.melt(["sensor_id", "year", "day", "time", "latitude", "longitude"],var_name="AFL",value_name="PedestrianCount")

#Check that no timestamp appears in both the game and non-game data
#(`x in series` tests the index labels rather than the values, so compare values with isin)
if non_afl_df.date_time.isin(afl_df.date_time).any():
    print("Duplicates found.")
else:
    print("No duplicates found.")

afl_merged_df.head()

## Generate Plot comparing days with and without an AFL Game 
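The comparison plotted here rests on one idea: average the hourly counts separately for game days and non-game days. Below is a minimal, pandas-free sketch of that calculation. The readings, the game date and the `hourly_means` helper are all made up for illustration and are not taken from the sensor data.

```python
from collections import defaultdict
from statistics import mean

# Made-up hourly readings as (date, hour, count). Not real sensor data.
readings = [
    ("2019-04-06", 13, 900), ("2019-04-06", 14, 1500),
    ("2019-04-13", 13, 600), ("2019-04-13", 14, 700),
    ("2019-04-20", 13, 500), ("2019-04-20", 14, 900),
]
game_dates = {"2019-04-06"}  # hypothetical game day

def hourly_means(rows, dates, on_game_day):
    """Average count per hour over rows whose date is (or is not) a game date."""
    buckets = defaultdict(list)
    for date, hour, count in rows:
        if (date in dates) == on_game_day:
            buckets[hour].append(count)
    return {hour: mean(counts) for hour, counts in sorted(buckets.items())}

afl_means = hourly_means(readings, game_dates, on_game_day=True)
non_afl_means = hourly_means(readings, game_dates, on_game_day=False)
```

The notebook does the same thing with `groupby(...).mean()`, additionally grouping by sensor, year and day of the week.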

In [None]:
#Plot AFL v non-AFL movements: enter sensor_id, year and day of the week
def dayPlot(sensor, year, day):
    plt.figure(figsize=(15,8))
    sns.lineplot(x='time', y='PedestrianCount', hue='AFL', 
                 data = afl_merged_df[(afl_merged_df.sensor_id == sensor) & 
                                    (afl_merged_df.year == year) & 
                                    (afl_merged_df.day == day)]).set_title('Pedestrian variation - AFL Game v No AFL Game')

    plt.show()

In [None]:
#dayPlot - sensor id, year and day of the week
#The function draws the figure itself and returns nothing, so it is called directly
dayPlot(7, 2019, 'Saturday')

## Visualise days with and without an AFL Game - Click on Sensor for Graph

In [None]:
def PedestrianVar(year, day, *sensor_ids):
  m = folium.Map(location=[sensor_location.latitude.mean(), 
                            sensor_location.longitude.mean()], 
                            zoom_start=14, control_scale=True, min_zoom=14, max_zoom = 16)

  for i in sensor_ids:
    data = afl_merged_df[(afl_merged_df.sensor_id == i) & 
                                (afl_merged_df.year == year) & 
                                (afl_merged_df.day == day)]

    pedestrian_chart = alt.Chart(data).mark_line().encode(
      x='time',
      y='PedestrianCount',
      color='AFL')
    
    chart = json.loads(pedestrian_chart.to_json())

    popup = folium.Popup(max_width=350)
    folium.features.VegaLite(chart, height=200, width=350).add_to(popup)
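    #Skip sensors with no data for this year and day
    if data.empty:
      continue
    #Every row for this sensor shares the same coordinates, so take them from the first row
    folium.Marker([data.iloc[0]['latitude'], 
                              data.iloc[0]['longitude']], tooltip = str(i), popup=popup).add_to(m)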

  return(m)

In [None]:
pedestrian_variance = PedestrianVar(2019, 'Saturday', 9,10,2,6,18,8,11,1,5,12,3,14,31,7,2, 33)
pedestrian_variance

**Differentiate between start times, before 4pm is Afternoon, after 4pm is considered evening.**
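The 4pm rule can also be expressed with `datetime.time` objects rather than formatted strings, which avoids relying on zero-padded string ordering. This is a small sketch only: `CUTOFF` and `timeslot` are names introduced here. Note that a game starting at exactly 16:00 is labelled 'Afternoon', which matches the `<=` comparison used in the notebook code.

```python
from datetime import time

CUTOFF = time(16, 0)  # 4pm

def timeslot(start: time) -> str:
    """Label a game by its start time; a 16:00 start counts as 'Afternoon'."""
    return "Afternoon" if start <= CUTOFF else "Evening"

# Three illustrative start times: early afternoon, exactly 4pm, and night.
labels = [timeslot(time(13, 45)), timeslot(time(16, 0)), timeslot(time(19, 25))]
```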

In [None]:
#Extract afternoon and evening data
afl_afternoon_df = afl_df[afl_df['date'].isin(afternoons_dates_list)]
afl_evening_df = afl_df[~afl_df['date'].isin(afternoons_dates_list)]

afl_afternoon_df2 = pd.DataFrame(afl_afternoon_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
afl_afternoon_df2.reset_index(inplace = True)

afl_evening_df2 = pd.DataFrame(afl_evening_df.groupby(['sensor_id', 'year', 'day','time','latitude', 'longitude'])['hourly_counts'].mean())
afl_evening_df2.reset_index(inplace = True)
#non_afl.drop(['latitude', 'longitude'], axis=1, inplace=True)
dfs = [afl_evening_df2,afl_afternoon_df2,non_afl]

In [None]:
final_df = reduce(lambda  left,right: pd.merge(left,right,on=["sensor_id", "year", "day", "time"],
                                            how='outer'), dfs)
final_df = final_df.rename(columns = {'hourly_counts_x':'Evening','hourly_counts_y':'Afternoon', 'hourly_counts': 'No AFL'})
final_df.drop(['latitude_x', 'longitude_x', 'latitude_y', 'longitude_y'], axis=1, inplace=True)
time_merged_data = final_df.melt(["sensor_id", "year", "day", "time", "latitude", "longitude"],var_name="AFL",value_name="hourly_counts")
time_merged_data.head()

## Generate Plot comparing days with different start times and No AFL Game

In [None]:
#Plot non-afl v afternoon v evening movements, enter sensor_id, year and day of the week

def TimePlot(year, day, sensor):
    plt.figure(figsize=(15,8))
    sns.lineplot(x='time', y='hourly_counts', hue='AFL', 
                 data = time_merged_data[(time_merged_data.sensor_id == sensor) & 
                                    (time_merged_data.year == year) & 
                                    (time_merged_data.day == day)]).set_title('Pedestrian variation')

    plt.show()

In [None]:
time = TimePlot(2016, 'Sunday', 24)

## Visualise variation between games start times - Click on Sensor for Graph

In [None]:
def TimeVar(year, day, *sensor_ids):

  m = folium.Map(location=[sensor_location.latitude.mean(), 
                            sensor_location.longitude.mean()], 
                            zoom_start=15, control_scale=False, min_zoom=14, max_zoom = 16)

  for i in sensor_ids:
    data = time_merged_data[(time_merged_data.sensor_id == i) & 
                                (time_merged_data.year == year) & 
                                (time_merged_data.day == day)]

    pedestrian_chart = alt.Chart(data).mark_line().encode(
      x='time',
      y='hourly_counts',
      color='AFL')
    
    chart = json.loads(pedestrian_chart.to_json())

    popup = folium.Popup(max_width=350)
    folium.features.VegaLite(chart, height=200, width=350).add_to(popup)
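    #Skip sensors with no data for this year and day
    if data.empty:
      continue
    #Every row for this sensor shares the same coordinates, so take them from the first row
    folium.Marker([data.iloc[0]['latitude'], 
                              data.iloc[0]['longitude']], tooltip = str(i), popup=popup).add_to(m)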

  return(m)

In [None]:
TimeVariance = TimeVar(2020, 'Sunday',7,9,10,2,6,18,8,11,1,5,12,3,14,31)
TimeVariance

**Differentiate between grounds, MCG or Docklands.**

In [None]:
#Extract ground data
afl_mcg_df = afl_df[afl_df['date'].isin(MCG_dates_list)]
afl_docklands_df = afl_df[~afl_df['date'].isin(MCG_dates_list)]

afl_mcg_df2 = pd.DataFrame(afl_mcg_df.groupby(['sensor_id', 'year', 'day','time','latitude', 'longitude'])['hourly_counts'].mean())
afl_mcg_df2.reset_index(inplace = True)

afl_docklands_df2 = pd.DataFrame(afl_docklands_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
afl_docklands_df2.reset_index(inplace = True)

dfs = [afl_mcg_df2,afl_docklands_df2,non_afl]

In [None]:
final_df = reduce(lambda  left,right: pd.merge(left,right,on=["sensor_id", "year", "day", "time"],
                                            how='outer'), dfs)
final_df = final_df.rename(columns = {'hourly_counts_x':'MCG','hourly_counts_y':'Docklands', 'hourly_counts': 'No AFL'})
final_df.drop(['latitude_x', 'longitude_x', 'latitude_y', 'longitude_y'], axis=1, inplace=True)
time_merged_data = final_df.melt(["sensor_id", "year", "day", "time", "latitude", "longitude"],var_name="AFL",value_name="hourly_counts")


In [None]:
#Plot non-AFL v MCG v Docklands movements: enter sensor_id, year and day of the week

def GroundPlot(sensor, year, day):
    plt.figure(figsize=(15,8))
    sns.lineplot(x='time', y='hourly_counts', hue='AFL', 
                 data = time_merged_data[(time_merged_data.sensor_id == sensor) & 
                                    (time_merged_data.year == year) & 
                                    (time_merged_data.day == day)]).set_title('Pedestrian variation')
    plt.show()

## Generate Plot comparing days with games at different grounds

In [None]:
GroundVar = GroundPlot(23, 2017, 'Sunday')

## Visualisation of games at different ground and no AFL games - Click on Sensor

In [None]:
def GroundVar(year, day, *sensor_ids):
  m = folium.Map(location=[sensor_location.latitude.mean(), 
                            sensor_location.longitude.mean()], 
                            zoom_start=14, control_scale=True, min_zoom=14, max_zoom = 16)

  for i in sensor_ids:
    data = time_merged_data[(time_merged_data.sensor_id == i) & 
                                (time_merged_data.year == year) & 
                                (time_merged_data.day == day)]

    pedestrian_chart = alt.Chart(data).mark_line().encode(
      x='time',
      y='hourly_counts',
      color='AFL')
    
    chart = json.loads(pedestrian_chart.to_json())

    popup = folium.Popup(max_width=350)
    folium.features.VegaLite(chart, height=200, width=350).add_to(popup)
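    #Skip sensors with no data for this year and day
    if data.empty:
      continue
    #Every row for this sensor shares the same coordinates, so take them from the first row
    folium.Marker([data.iloc[0]['latitude'], 
                              data.iloc[0]['longitude']], tooltip = str(i), popup=popup).add_to(m)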

  return(m)

In [None]:
GroundVariance = GroundVar(2016, 'Sunday',7,9,10,2,6,18,8,11,1,5,12,3,14,31,13)
GroundVariance

## Visualise HeatMap - Indicating how far above the average for that particular time and day
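The `diff` value plotted in this heat map is the gap between the AFL and non-AFL averages, divided by the AFL average. A small worked sketch follows. `relative_diff` is a hypothetical helper and the numbers are made up. The zero check is an added assumption, since the notebook code divides without guarding against an AFL average of zero.

```python
from typing import Optional

def relative_diff(event_mean: float, baseline_mean: float) -> Optional[float]:
    """(event - baseline) / event, as used for the 'diff' column; None when undefined."""
    if event_mean == 0:
        return None  # avoid dividing by zero when no pedestrians were counted on event days
    return (event_mean - baseline_mean) / event_mean

# Made-up averages: 1500 pedestrians per hour on game days against 900 on other days.
example = relative_diff(1500, 900)  # (1500 - 900) / 1500
```

Because the denominator is the event-day average, the value can never exceed 1. Dividing by the baseline instead would give the more familiar "percentage increase over a normal day".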

In [None]:
def VarHeatMap(year, day):
    merged_data_diff2 = merged_data_diff[(merged_data_diff.year == year) & (merged_data_diff.day == day)]
    data = []
    for _, d in merged_data_diff2.groupby('time'):
        data.append([[row['latitude'], row['longitude'], row['diff']] for _, row in d.iterrows()])

    time_index = [k[0] for k in merged_data_diff2.groupby('time')]




    map= folium.Map(location=[sensor_location.latitude.mean(), 
                               sensor_location.longitude.mean()],
                                zoom_start=14)


    hm = HeatMapWithTime(data, index = time_index, name = 'Heat Map',auto_play=True, 
                    min_opacity=.5, 
                    gradient = {0.05: 'blue', 
                                0.1: 'green', 
                                0.3: 'orange', 
                                0.45: 'red'}).add_to(map)
    return(map)

In [None]:
VarHeatMap(2016, 'Sunday')

## Introduce new Data - Melbourne Grand Prix

In [None]:
gp_dates = pd.read_csv('gp_dates.csv' )
gp_dates['date'] = pd.to_datetime(gp_dates['date'])

gp_dates = gp_dates[gp_dates.date> '1999-01-01']
gp_dates_list = pd.unique((gp_dates.date).dt.strftime('%Y-%m-%d')).tolist()

gp_df = merged_df[merged_df['date'].isin(gp_dates_list)]
non_gp_df = merged_df[~merged_df['date'].isin(gp_dates_list)]

In [None]:
#Generate average pedestrian movements per hour
gp = pd.DataFrame(gp_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
gp.reset_index(inplace = True)

non_gp = pd.DataFrame(non_gp_df.groupby(['sensor_id', 'year', 'day','time', 'latitude', 'longitude'])['hourly_counts'].mean())
non_gp.reset_index(inplace = True)

In [None]:
#Combine Grand Prix and Non-Grand Prix Data
gp_merged_df= non_gp.merge(gp, on=["sensor_id","year", "day", "time"])
gp_merged_df.drop(['latitude_x', 'longitude_x'], axis=1, inplace=True)
gp_merged_df = gp_merged_df.rename(columns = {'latitude_y':'latitude','longitude_y':'longitude' })
gp_merged_df = gp_merged_df.rename(columns = {'hourly_counts_x':'No_GP','hourly_counts_y':'GP' })

In [None]:
#Difference between the Grand Prix and non-Grand Prix averages, as a fraction of the Grand Prix average
gp_merged_data_diff = gp_merged_df.copy()
gp_merged_data_diff['diff'] = (gp_merged_data_diff.GP - gp_merged_data_diff.No_GP)/gp_merged_data_diff.GP

In [None]:
gp_merged_df = gp_merged_df.melt(["sensor_id", "year", "day", "time", "latitude", "longitude"],var_name="GP",value_name="PedestrianCount")
gp_merged_df.head()

## Generate Plot comparing days with and without the Grand Prix (must be Friday, Saturday or Sunday)

In [None]:
#Plot Grand Prix v non-Grand Prix movements: enter sensor_id, year and day of the week
def GP_Plot(sensor, year, day):
    plt.figure(figsize=(15,8))
    sns.lineplot(x='time', y='PedestrianCount', hue='GP', 
                 data = gp_merged_df[(gp_merged_df.sensor_id == sensor) & 
                                    (gp_merged_df.year == year) & 
                                    (gp_merged_df.day == day)]).set_title('Pedestrian variation')
    plt.show()

In [None]:
GPVariance = GP_Plot(31, 2019, 'Saturday')

## Visualisation of days with and without Grand Prix (Friday, Saturday or Sunday)- Click on Sensor

In [None]:
def GPPedestrianVar(year, day, *sensor_ids):
  m = folium.Map(location=[sensor_location.latitude.mean(), 
                            sensor_location.longitude.mean()], 
                            zoom_start=14, control_scale=True, min_zoom=14, max_zoom = 16)

  for i in sensor_ids:
    data = gp_merged_df[(gp_merged_df.sensor_id == i) & 
                                (gp_merged_df.year == year) & 
                                (gp_merged_df.day == day)]

    pedestrian_chart = alt.Chart(data).mark_line().encode(
      x='time',
      y='PedestrianCount',
      color='GP')
    
    chart = json.loads(pedestrian_chart.to_json())

    popup = folium.Popup(max_width=350)
    folium.features.VegaLite(chart, height=200, width=350).add_to(popup)
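    #Skip sensors with no data for this year and day
    if data.empty:
      continue
    #Every row for this sensor shares the same coordinates, so take them from the first row
    folium.Marker([data.iloc[0]['latitude'], 
                              data.iloc[0]['longitude']], tooltip = str(i), popup=popup).add_to(m)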

  return(m)

In [None]:
pedestrian_variance = GPPedestrianVar(2018, 'Sunday',7,9,10,2,6,18,8,11,1,5,12,3,14,31,44)
pedestrian_variance

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>Approach 2: Visual Analysis of Real-Time Train Data</b></p>

In [None]:
import http.client, urllib.request, urllib.parse, urllib.error, base64
from google.transit import gtfs_realtime_pb2
import pandas as pd
import folium
from folium.plugins import BeautifyIcon
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import datetime

To connect to the real-time data provided by the VicRoads open data platform, you will need to use `gtfs_realtime_pb2`. This module, which is part of the `gtfs-realtime-bindings` package, parses feeds in the GTFS Realtime format (GTFS stands for General Transit Feed Specification).
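Rather than pasting a subscription key into the notebook, you may prefer to read it from an environment variable. The sketch below builds (but does not send) a request for one of the feeds. The host and path are the ones used in this notebook, while `build_request` and the variable name `VICROADS_API_KEY` are assumptions made for illustration.

```python
import os
import urllib.request

BASE_URL = "https://data-exchange-api.vicroads.vic.gov.au/opendata/v1/gtfsr/"

def build_request(feed_name: str, env_var: str = "VICROADS_API_KEY") -> urllib.request.Request:
    """Build (but do not send) a GET request for one GTFS Realtime feed."""
    key = os.environ.get(env_var)  # hypothetical variable name; set it before running
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your subscription key")
    return urllib.request.Request(
        BASE_URL + feed_name,
        headers={"Ocp-Apim-Subscription-Key": key},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen(...)` and calling `.read()` on the response should give the bytes that `feed.ParseFromString` expects.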

In [None]:
headers = {'Ocp-Apim-Subscription-Key': '46f44fa970e44e04be413233229d3c09',}

params = urllib.parse.urlencode({
})

feed = gtfs_realtime_pb2.FeedMessage()

## VicRoads planned closure API
### Services Alerts
Cancellation of metro train trips

In [None]:
try:
    conn = http.client.HTTPSConnection('data-exchange-api.vicroads.vic.gov.au')
    #A GET request carries no body, so only the headers are passed
    conn.request("GET", "/opendata/v1/gtfsr/metrotrain-servicealerts?%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    feed.ParseFromString(data)
    print(feed)
    conn.close()
except Exception as e:
    #Not every exception has errno/strerror, so print the exception itself
    print("Request failed: {0}".format(e))

### Trip Updates
The trip update feed provides real-time arrival and departure information of a trip where data is available.
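Arrival and departure times in the feed are POSIX timestamps (seconds since 1970-01-01 UTC). Protobuf integer fields that have not been set read back as 0, so a stop with no arrival time would otherwise be shown as 1 January 1970. Assuming the feed does leave some of these fields unset, a hypothetical helper such as `epoch_to_utc` below could be used to treat 0 as missing:

```python
from datetime import datetime, timezone
from typing import Optional

def epoch_to_utc(seconds: int) -> Optional[datetime]:
    """Convert POSIX seconds to a timezone-aware UTC datetime; treat 0 as 'not provided'."""
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

arrival = epoch_to_utc(1_600_000_000)  # an arbitrary example timestamp
missing = epoch_to_utc(0)              # an unset field
```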

In [None]:
try:
    conn = http.client.HTTPSConnection('data-exchange-api.vicroads.vic.gov.au')
    #A GET request carries no body, so only the headers are passed
    conn.request("GET", "/opendata/v1/gtfsr/metrotrain-tripupdates?%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    feed.ParseFromString(data)
    print(feed)
    conn.close()
except Exception as e:
    #Not every exception has errno/strerror, so print the exception itself
    print("Request failed: {0}".format(e))

In [None]:
trip_list = []
stop_time_list = []

for entity in feed.entity:
  if entity.HasField('trip_update'):
      trip_update = entity.trip_update.trip
      trip_list.append(
          {'trip_id': trip_update.trip_id,
           'start_time': trip_update.start_time,
           'start_date': trip_update.start_date
           })

      stop_times = entity.trip_update.stop_time_update
      for st in stop_times:
        stop_time_list.append(
          {'trip_id': trip_update.trip_id,
           'stop_sequence': st.stop_sequence,
           'arrival_time': st.arrival.time,
           'depart_time': st.departure.time,
           })


df_trip_update = pd.DataFrame(trip_list, columns = ['trip_id','start_time','start_date'])
df_stop_time_update = pd.DataFrame(stop_time_list, columns = ['trip_id','stop_sequence','arrival_time','depart_time'])

# Clean data
df_trip_update['start_date'] = pd.to_datetime(df_trip_update['start_date'], format='%Y%m%d')
df_stop_time_update['arrival_time'] = pd.to_datetime(df_stop_time_update['arrival_time'], unit='s')
df_stop_time_update['depart_time'] = pd.to_datetime(df_stop_time_update['depart_time'], unit='s')

display(df_trip_update)
display(df_stop_time_update.head(20))

### Vehicle Positions
The vehicle position feed contains live location and occupancy of the service.

In [None]:
try:
    conn = http.client.HTTPSConnection('data-exchange-api.vicroads.vic.gov.au')
    #A GET request carries no body, so only the headers are passed
    conn.request("GET", "/opendata/v1/gtfsr/metrotrain-vehicleposition-updates?%s" % params, headers=headers)
    response = conn.getresponse()
    data = response.read()
    feed.ParseFromString(data)
    print(feed)
    conn.close()
except Exception as e:
    #Not every exception has errno/strerror, so print the exception itself
    print("Request failed: {0}".format(e))

In [None]:
location_list = []

for entity in feed.entity:
  if entity.HasField('vehicle'):
      trip_update = entity.vehicle.trip
      position = entity.vehicle.position
      location_list.append(
          {'trip_id': trip_update.trip_id,
           'start_time': trip_update.start_time,
           'start_date': trip_update.start_date,
           'lat': position.latitude,
           'lon': position.longitude,
           'bearing': position.bearing,
           'timestamp': entity.vehicle.timestamp,
           'vehicle_id': entity.vehicle.vehicle.id,
           'occupancy_stat': entity.vehicle.occupancy_status
           })

df_vehicle_location = pd.DataFrame(location_list, columns = ['trip_id', 'start_time', 'start_date', 'lat', 'lon', 'bearing', 'timestamp', 'vehicle_id', 'occupancy_stat'])

# Clean data
df_vehicle_location['start_date'] = pd.to_datetime(df_vehicle_location['start_date'], format='%Y%m%d')
df_vehicle_location['timestamp'] = pd.to_datetime(df_vehicle_location['timestamp'], unit='s')
df_vehicle_location['timestamp'] = df_vehicle_location['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Australia/Melbourne')

display(df_vehicle_location)

**Note on occupancy_stat**

- 0 : EMPTY
- 1 : MANY_SEATS_AVAILABLE
- 2 : FEW_SEATS_AVAILABLE
- 3 : STANDING_ROOM_ONLY
- ...

Ref: https://developers.google.com/transit/gtfs-realtime/reference#enum-occupancystatus
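To show readable labels instead of numbers in the map popups, the codes can be looked up in a small dictionary. This sketch covers only the four codes listed in the note above and reports anything else by number; `OCCUPANCY_LABELS` and `occupancy_label` are names introduced here for illustration. See the reference link for the full set of values.

```python
# Only the four codes listed in the note; any other code is reported by its number.
OCCUPANCY_LABELS = {
    0: "EMPTY",
    1: "MANY_SEATS_AVAILABLE",
    2: "FEW_SEATS_AVAILABLE",
    3: "STANDING_ROOM_ONLY",
}

def occupancy_label(code: int) -> str:
    """Return the label for a known code, or a numbered placeholder otherwise."""
    return OCCUPANCY_LABELS.get(code, f"OTHER ({code})")

labels = [occupancy_label(c) for c in (0, 3, 5)]
```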

#### Visualise Data

In [None]:
# Prepare colour dictionary for occupancy level
keys = list(df_vehicle_location['occupancy_stat'].unique())
color_range = list(np.linspace(0, 1, len(keys), endpoint=False))
colors = [matplotlib.colors.to_hex(plt.cm.Reds(x)) for x in color_range]
color_dict_industry = dict(zip(keys, colors))

In [None]:
# Vehicle Occupancy Status
train_occp = folium.FeatureGroup(name="Train Occupancy Status",
                                show=True,)


for i in range(0,len(df_vehicle_location)):

  # styles = {
  #   'fill': True,
  #   'color': color_dict_industry[df_vehicle_location.iloc[i]['occupancy_stat']],
  #   'weight': 1.5,
  #   # 'fillOpacity': 1
  # }

  icon = BeautifyIcon(
    icon='arrow-up',
    background_color=color_dict_industry[df_vehicle_location.iloc[i]['occupancy_stat']],
    # icon_shape='marker',
    inner_icon_style=f'transform: rotate({df_vehicle_location.iloc[i]["bearing"]}deg);'
  )

  html=f"""
      <h5>{df_vehicle_location.iloc[i]['trip_id']}</h5>
      <p>Occupancy Status: {df_vehicle_location.iloc[i]['occupancy_stat']}</p>
      <p>Start Time: {df_vehicle_location.iloc[i]['start_time']}</p>
      """
  iframe = folium.IFrame(html=html, width=200, height=200)
  popup = folium.Popup(iframe, max_width=2650)

  folium.Marker(
    location=[df_vehicle_location.iloc[i]['lat'], df_vehicle_location.iloc[i]['lon']],
    popup=popup,
    # radius=float(df_vehicle_location.iloc[i]['occupancy_stat'])*100,
    icon=icon,
    # **styles
  ).add_to(train_occp)

In [None]:
map = folium.Map(location=[-37.813, 144.945], tiles="CartoDB dark_matter", zoom_start=13)

train_occp.add_to(map)
folium.LayerControl(collapsed=False).add_to(map)

# Show the map
map

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black">&emsp;<b>Approach 3: Predictive Linear Models and Interactive Visualisation</b></p>

In [None]:
from sodapy import Socrata
import pandas as pd
import time
from datetime import date
import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

from IPython.display import display, HTML

import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go

# from jupyter_dash import JupyterDash
import dash
# Dash 2.0 and later bundle the core and HTML components in the main package
from dash import dcc, html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output

Many of these packages will be familiar from the two earlier approaches in this notebook.
The regression modelling uses several imports from the popular Scikit-Learn package.<br>
The interactive visualisation needs Dash and several of its related components.<br>
*Note: Dash runs in a browser. You can test it in a local browser before deploying it. If you would rather run the interface within the notebook, you can install JupyterDash instead.*

## Approach 3, Part 1: Investigating the Pedestrian Sensor Data.
### Pedestrian sensor data - basic analysis.

This next dataset is pulled from the Melbourne Open Playground. The code block in the next cell needs to be uncommented to extract the data and copy it into a csv file that gets stored on your local drive.

This section of Approach 3 will see whether the data in this dataset can be used to make predictions of the pedestrian counts.

Further sections will introduce new datasets and see if the extra information can help to make better, more accurate predictions.
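
Because the download can be slow, one option is to wrap the "download once, then read from disk" pattern in a small helper. The sketch below is only an illustration: `load_or_download` and `fake_download` are hypothetical names, and the stand-in downloader returns a tiny hand-made frame instead of calling the open data API.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_or_download(csv_path, download_fn):
    """Return the cached CSV if it exists; otherwise download, cache and return."""
    csv_path = Path(csv_path)
    if csv_path.exists():
        return pd.read_csv(csv_path)
    df = download_fn()
    df.to_csv(csv_path, index=False)
    return df


def fake_download():
    # Hypothetical stand-in for a real download function such as sensor_count().
    return pd.DataFrame({"sensor_id": [1, 2], "hourly_counts": [120, 45]})


# Use a temporary folder so the demo does not touch any real files.
cache_file = Path(tempfile.mkdtemp()) / "sensor_history_demo.csv"
first = load_or_download(cache_file, fake_download)   # downloads and writes the file
second = load_or_download(cache_file, fake_download)  # reads the cached file
print(second)
```

With a helper like this, the download cell would not need to be commented and uncommented by hand.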

In [None]:
#Uncomment the below to open data source, download sensor data, and store it as a csv locally.

##Function to get Sensor count history data
# def sensor_count():
#     client = Socrata('data.melbourne.vic.gov.au', 'nlPM0PQJSjzCsbVqntjPvjB1f', None)
#     sensor_data_id = "b2ak-trbp"
#     results = client.get(sensor_data_id, limit=5000000)
#     df = pd.DataFrame.from_records(results)
#     df = df[['year', 'month', 'mdate', 'day', 'time', 'sensor_id', 'sensor_name', 'hourly_counts']]
#     return df

# sensor_history = sensor_count()

# sensor_history.to_csv('sensor_history.csv', index=False)

In [None]:
sensor_history = pd.read_csv('sensor_history.csv')

#This function grabs the location (longitude and latitude) of the pedestrian sensors
def get_sensor_location():
    client = Socrata('data.melbourne.vic.gov.au', 'nlPM0PQJSjzCsbVqntjPvjB1f', None)
    sensor_location_data_id = "h57g-5234"
    results = client.get(sensor_location_data_id)
    df = pd.DataFrame.from_records(results)
    #Take a copy so the column edits below do not act on a slice of df
    locations = df[["sensor_id", "latitude", "longitude"]].copy()
    locations.columns = ["sensor_id", "lat", "lon"]
    locations["sensor_id"] = locations["sensor_id"].astype(int)
    locations["lat"] = locations["lat"].astype(float)
    locations["lon"] = locations["lon"].astype(float)
    return locations

#Attach each sensor's coordinates to its count history
sensor_location = get_sensor_location()
sensor_history = sensor_history.merge(sensor_location, on='sensor_id', how='inner')

In [None]:
#Have a bit of a look at the dataset.
print(sensor_history.head())
print("")
print(sensor_history.info())
print("")
#Correlations are only defined for numeric columns, so select those first
print(sensor_history.select_dtypes('number').corr())

In [None]:
#Let's do a quick linear regression to see how well we can model the relationships
#contained within the pedestrian sensor network data.
x = sensor_history.drop(columns='hourly_counts')
y = sensor_history.hourly_counts

x_day = pd.get_dummies(x.day)
x_month = pd.get_dummies(x.month)
x_sensor = pd.get_dummies(x.sensor_name)

x_drop = x.drop(['month', 'day', 'sensor_name', 'sensor_id'], axis=1)
X = pd.concat([x_drop, x_day, x_month, x_sensor],axis=1)
X = X.fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

LR = LinearRegression()
LR.fit(X_train, y_train)
print("The R-squared score is: ", LR.score(X_test, y_test))

## Approach 3, Part 2: Adding new datasets:
### Add a new dataset: climate microsensors.

We didn't get a great result from the previous model. The score it outputs is the 'R-squared' score. A score of 1 is perfect, a score of 0 means the model does no better than always predicting the average count, and on test data the score can even fall below 0 if the model does worse than that.
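
To make the score concrete, R-squared can be computed by hand as 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below uses made-up numbers, not the sensor data, and checks the hand calculation against Scikit-Learn's `r2_score`. The helper name `r_squared` is only illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([100.0, 200.0, 300.0, 400.0])


def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)    # squared error of the model
    ss_tot = np.sum((y - y.mean()) ** 2)  # squared error of predicting the mean
    return 1 - ss_res / ss_tot


predictions = {
    "perfect": y_true.copy(),
    "mean only": np.full_like(y_true, y_true.mean()),
    "reversed": y_true[::-1].copy(),  # deliberately bad predictions
}

for name, y_pred in predictions.items():
    print(name, r_squared(y_true, y_pred), r2_score(y_true, y_pred))
```

The three cases give scores of 1, 0 and -3 respectively, which shows that a sufficiently poor model scores below zero.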

Let's see if we can improve on this score by adding other datasets.

This next dataset is also pulled from the Melbourne Open Playground. The code block in the next cell needs to be uncommented to extract the data and copy it into a csv file that gets stored on your local drive.

The dataset is based on climate microsensors in Melbourne's CBD. For this analysis, we only want a general picture of the climate across the city, not the detail at each sensor location, so we only keep the readings from two sites.

In [None]:
##Function to get climate microsensor history data
# def micro_count():
#     client = Socrata('data.melbourne.vic.gov.au', 'nlPM0PQJSjzCsbVqntjPvjB1f', None)
#     micro_data_id = "u4vh-84j8"
#     results = client.get(micro_data_id, limit=4000000)
#     if results:
#         df = pd.DataFrame.from_records(results)
#     return df

# micro_history = micro_count()

# micro_history.to_csv('micro_history.csv', index=False)

In [None]:
micro_history = pd.read_csv('micro_history.csv')

#Keep only the measurements of interest (they are given column names below)
micro_history = micro_history[micro_history.sensor_id.isin(['5a', '5b', '5c', '0a', '0b', '6'])]

#Keep only two sites to represent the city as a whole
micro_history = micro_history[micro_history.site_id.isin([1003, 1009])]

micro_history = micro_history.drop(['id', 'gateway_hub_id', 'type', 'units'], axis=1)

micro_history.loc[micro_history.sensor_id == '5a', 'temp'] = micro_history.value
micro_history.loc[micro_history.sensor_id == '5b', 'humidity'] = micro_history.value
micro_history.loc[micro_history.sensor_id == '5c', 'pressure'] = micro_history.value
micro_history.loc[micro_history.sensor_id == '0a', 'part_2p5'] = micro_history.value
micro_history.loc[micro_history.sensor_id == '0b', 'part_10'] = micro_history.value
micro_history.loc[micro_history.sensor_id == '6', 'wind'] = micro_history.value

#Let pandas infer the timestamp format; the hour is needed below, so a date-only format would not fit
micro_history.local_time = pd.to_datetime(micro_history.local_time)
micro_history['year'] = micro_history.local_time.dt.year
micro_history['month'] = micro_history.local_time.dt.month_name()
micro_history['mdate'] = micro_history.local_time.dt.day
micro_history['time'] = micro_history.local_time.dt.hour

micro_history = micro_history.drop(['site_id', 'sensor_id', 'value', 'local_time'], axis=1)
micro_history = micro_history.groupby(by=['year', 'month', 'mdate', 'time']).max()

ped_climate = sensor_history.merge(micro_history, on=('year', 'month', 'mdate', 'time'), how='inner')

x = ped_climate.drop(columns='hourly_counts')
y = ped_climate.hourly_counts

x_day = pd.get_dummies(x.day)
x_month = pd.get_dummies(x.month)
x_sensor = pd.get_dummies(x.sensor_name)

x_drop = x.drop(['month', 'day', 'sensor_name', 'sensor_id'], axis=1)
X = pd.concat([x_drop, x_day, x_month, x_sensor],axis=1)
X = X.fillna(0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

LR = LinearRegression()
LR.fit(X_train, y_train)
print("The R-squared score is: ", LR.score(X_test, y_test))
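
The reshaping in the cell above (one row per reading, turned into one column per measurement and one row per hour) can also be written with `pivot_table`. The sketch below runs on a few made-up readings. The mapping from `sensor_id` codes to column names follows the cell above, while `readings`, `names` and `wide` are illustrative names.

```python
import pandas as pd

# A few made-up readings in the same long layout as the microsensor data.
readings = pd.DataFrame({
    "local_time": pd.to_datetime([
        "2021-06-21 09:05", "2021-06-21 09:05",
        "2021-06-21 09:35", "2021-06-21 10:05",
    ]),
    "sensor_id": ["5a", "5b", "5a", "5a"],
    "value": [12.0, 80.0, 13.0, 15.0],
})

names = {"5a": "temp", "5b": "humidity", "5c": "pressure",
         "0a": "part_2p5", "0b": "part_10", "6": "wind"}
readings["measure"] = readings["sensor_id"].map(names)

# Same date parts that the pedestrian data uses as merge keys.
readings["year"] = readings["local_time"].dt.year
readings["month"] = readings["local_time"].dt.month_name()
readings["mdate"] = readings["local_time"].dt.day
readings["time"] = readings["local_time"].dt.hour

# One row per hour, one column per measurement, keeping the hourly maximum.
wide = readings.pivot_table(index=["year", "month", "mdate", "time"],
                            columns="measure", values="value", aggfunc="max")
print(wide)
```

Hours with no reading for a measurement come out as missing values, which is why the modelling cell fills gaps with 0 before fitting.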

### Add a different dataset: school and public holidays:

Adding in the climate data resulted in an improvement in the R-squared score. Let's see if we can find other datasets to add to the model and get that score even higher. The next dataset was created manually, by looking up the details online and entering them into a csv.
You will need to have this csv downloaded into your local directory for this to work.

This dataset has details of which dates are public holidays and which are school holidays. This could have an impact on how many people are walking around, right?
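
As a rough guide to the layout the merge expects, the sketch below assumes the csv has one row per holiday date, keyed by `year`, `month` (as a month name) and `mdate`, with `school_hol` and `pub_hol` flags. Those column names are taken from the code elsewhere in this notebook; every value shown is made up for illustration.

```python
import pandas as pd

# Made-up pedestrian counts for three consecutive days.
counts = pd.DataFrame({
    "year": [2021, 2021, 2021],
    "month": ["June", "June", "June"],
    "mdate": [13, 14, 15],
    "hourly_counts": [310, 150, 420],
})

# Assumed layout of the holiday csv: only holiday dates are listed.
holidays = pd.DataFrame({
    "year": [2021], "month": ["June"], "mdate": [14],
    "school_hol": [0], "pub_hol": [1],
})

# A left merge keeps every count row; dates missing from the csv get 0 flags.
joined = counts.merge(holidays, on=["year", "month", "mdate"], how="left").fillna(0)
print(joined)
```

The left join matters here: an inner join would silently drop every day that is not a holiday.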

In [None]:
vic_holidays = pd.read_csv('vic_holidays.csv')

ped_holidays = sensor_history.merge(vic_holidays, on=('year', 'month', 'mdate'), how='left')
ped_holidays =  ped_holidays.fillna(0)

x = ped_holidays.drop(columns='hourly_counts')
y = ped_holidays.hourly_counts

x_day = pd.get_dummies(x.day)
x_month = pd.get_dummies(x.month)
x_sensor = pd.get_dummies(x.sensor_name)

x_drop = x.drop(['month', 'day', 'sensor_name', 'sensor_id'], axis=1)
X = pd.concat([x_drop, x_day, x_month, x_sensor],axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

LR.fit(X_train, y_train)
print("The R-squared score is: ", LR.score(X_test, y_test))

### What about Covid-19?:

The holiday dataset only had a small, but consistent, positive effect on the scoring. Let's keep looking. The next dataset was found on the internet. You will need to have this csv downloaded into your local directory for this to work.

It contains statistics about Covid-19, including historical data. These numbers could be expected to have a reasonably large impact on the number of pedestrians.

**Source:** https://govtstats.covid19nearme.com.au/data/all.csv

In [None]:
covid_data = pd.read_csv('covid_data.csv')

covid_data = covid_data[['DATE', 'VIC_CASES_LOCAL_LAST_24H', 'VIC_CASES_ACTIVE', 
                    'VIC_CASES_LOCAL_LAST_7D', 'VIC_CASES_OVERSEAS_ACQUIRED_LAST_24H', 'VIC_CASES_OVERSEAS_ACQUIRED_LAST_7D',
                    'VIC_CASES_UNDER_INVESTIGATION_LAST_24H', 'VIC_CASES_UNDER_INVESTIGATION_LAST_7D',
                    'VIC_TESTS_LAST_7D', 'VIC_TESTS_PER_100K_LAST_7D']]

#fillna returns a new DataFrame, so the result has to be assigned back
covid_data = covid_data.fillna(0)

covid_data.DATE = pd.to_datetime(covid_data.DATE, format='%Y-%m-%d')

covid_data['year'] = covid_data.DATE.dt.year
covid_data['month'] = covid_data.DATE.dt.month_name()
covid_data['mdate'] = covid_data.DATE.dt.day
covid_data['mon'] = covid_data.DATE.dt.month

sensor_covid = sensor_history.merge(covid_data, on=('year', 'month', 'mdate'), how='inner')

x = sensor_covid.drop(columns='hourly_counts')
y = sensor_covid.hourly_counts

x_day = pd.get_dummies(x.day)
x_month = pd.get_dummies(x.month)
x_sensor = pd.get_dummies(x.sensor_name)

x_drop = x.drop(['month', 'day', 'sensor_name', 'sensor_id', 'mon', 'DATE'], axis=1)
X = pd.concat([x_drop, x_day, x_month, x_sensor],axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#Let's have a look at the newly created dataset:
print(sensor_covid.head())
print("")
print(sensor_covid.select_dtypes('number').corr())

#And then do the regression.
LR = LinearRegression()
LR.fit(X_train, y_train)
print("The R-squared score is: ", LR.score(X_test, y_test))

## Approach 3, Part 3: Making the final dataset:

The Covid data also seems to have a small, positive impact on the scoring. What if we add all of these datasets together? Will the sum of the parts be greater, or will the different datasets just confuse the model?
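
One thing to watch when combining frames: if two frames share a column that is not listed in the `on` keys, pandas keeps both copies and renames them with `_x` and `_y` suffixes. The sketch below shows the effect on two tiny made-up frames. The names `left`, `right`, `suffixed` and `clean` are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"sensor_id": [1, 2], "lat": [-37.81, -37.82], "temp": [12.0, 14.0]})
right = pd.DataFrame({"sensor_id": [1, 2], "lat": [-37.81, -37.82], "cases": [5, 9]})

# 'lat' is in both frames but not in the keys, so it is duplicated.
suffixed = left.merge(right, on="sensor_id")
print(list(suffixed.columns))

# Adding the shared column to the keys keeps a single copy.
clean = left.merge(right, on=["sensor_id", "lat"])
print(list(clean.columns))
```

Duplicated columns are harmless for a quick experiment, but they add redundant features to the model, so it is worth checking the merged column list.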

In [None]:
#Include lat and lon in the keys: every frame carries them, and leaving them out
#would produce duplicate lat_x/lat_y and lon_x/lon_y columns.
merge_keys = ['year', 'month', 'mdate', 'day', 'time', 'sensor_id', 'sensor_name',
              'hourly_counts', 'lat', 'lon']

merge_1 = ped_climate.merge(sensor_covid, on=merge_keys, how='inner')

merged = merge_1.merge(ped_holidays, on=merge_keys, how='inner')

merge_days = pd.get_dummies(merged.day)
merge_months = pd.get_dummies(merged.month)
merge_sensor = pd.get_dummies(merged.sensor_name)
merge_drop = merged.drop(['month', 'day', 'sensor_name', 'sensor_id', 'DATE'], axis=1)

merged_final = pd.concat([merge_drop, merge_days, merge_months, merge_sensor],axis=1)

X = merged_final.drop(columns='hourly_counts')
y = merged_final.hourly_counts

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

#And then do the regression.
LR = LinearRegression()
LR.fit(X_train, y_train)
print("The R-squared score is: ", LR.score(X_test, y_test))

## Approach 3, Part 4: Creating the predictive models:

Ok, so now we have some datasets and we have had a look at their individual impacts, before finding that they are stronger when combined. We have a better understanding of how different events affect the number of pedestrians going past different sensors.

But are we using the best model? There are other alternatives such as Decision Tree regressors, Random Forest regressors, Support Vector Machine regressors and even Deep Learning regression models.

However, Support Vector Machine regressors can take a long time to run when the data has many dimensions, and Deep Learning models are also resource intensive. Below we will limit ourselves to adding in a Decision Tree regressor and a Random Forest regressor. Even these can take a long time to run, but the results will hopefully be worth it!
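
To see why a tree-based model might beat a linear one on this kind of data, the sketch below uses synthetic counts (not the sensor data) with a morning and an evening peak. A straight line through hour-of-day cannot follow two peaks, while a tree can split the day into segments. All names and numbers here are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic hourly counts with peaks around 8am and 5pm, plus some noise.
hour = rng.integers(0, 24, size=2000)
counts = (1000 * np.exp(-((hour - 8) ** 2) / 8)
          + 1000 * np.exp(-((hour - 17) ** 2) / 8)
          + rng.normal(0, 50, size=2000))

X = hour.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(X, counts, test_size=0.2, random_state=0)

scores = {}
for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)  # R-squared on the held-out data

print(scores)
```

Real pedestrian data is messier than this, so the gap between the models may be smaller or larger in practice.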

In [None]:
#Linear Regression
LR = LinearRegression(fit_intercept=False)
LR.fit(X_train, y_train)
print("The basic Linear Regression R-squared score: ", LR.score(X_test, y_test))

#Decision Tree Regressor
DT = DecisionTreeRegressor(max_depth = 75)
DT.fit(X_train, y_train)
print("The Decision Tree regressor's R-squared score: ", DT.score(X_test, y_test))

#Random Forest Regressor
RFR = RandomForestRegressor(n_estimators=150, max_depth=100, n_jobs= -1, max_features=100)
RFR.fit(X_train, y_train)
print("The Random Forest regressor's R-squared score: ", RFR.score(X_test, y_test))

## Approach 3, Part 5: Interacting with our predictive models:

The R-squared scores we have managed to create now are much better, with the Decision Tree being a huge jump over the basic Linear Regression, and the Random Forest being even better again.

So now we have these cool models, we need a really easy, intuitive way to investigate them. For that, we build an interactive interface using Plotly Dash.

In [None]:
app = dash.Dash(external_stylesheets=[dbc.themes.SOLAR])

fig = px.scatter_mapbox(merged_final, lat=merged_final.lat, lon=merged_final.lon
                        , zoom = 12.5
                       , size = merged_final.hourly_counts)
fig.update_layout(mapbox_style="carto-positron", mapbox_center_lon=144.96
                      , mapbox_center_lat = -37.81)
fig.update_layout(margin={"r":5,"t":5,"l":5,"b":5})

app.layout = html.Div(id='parent', children=[ #main Div
    html.Div(id='header', children=[ #header Div
        
        html.Div([ #calendar selector Div
            html.P('Choose a date for analysis:'),
            dcc.DatePickerSingle(
                id = 'selector_date',
                month_format = 'MMMM Y',
                calendar_orientation = 'horizontal',
                placeholder = 'Select a date',
                date = date(2021, 6, 21),
                display_format = 'DD/MM/YYYY')
        ],
        style={'width': '15%', 'display': 'inline-block', 'verticalAlign': 'top', 'padding': '20px 20px 20px 20px'}),

        html.Div([ #hour slider Div
            html.P('Select the hour of the day:'),
            dcc.Slider(
                id='selector_hour',
                min=0,
                max=23,
                step=1,
                value=0,
                marks={0: 'midnight', 3: '3am', 6: '6am', 9: '9am', 12: 'midday',
                       15: '3pm', 18: '6pm', 21: '9pm'}
        )], style={'width': '85%', 'display': 'inline-block', 'textAlign':'left', 'verticalAlign': 'top', 'padding': '20px 20px 20px 20px'}),
    ]), #end of 'header' Div

    html.Div([ #map and various selectors Div
        
            html.Div([ #various selectors Div

                html.Hr(),
                html.P('Temperature:'),
                dcc.Slider(
                    id='selector_temp',
                    min=0,
                    max=50,
                    value=25,
                    marks = {0: '0C', 10: '10C', 20: '20C', 30: '30C', 40: '40C', 50: '50C'}
                    ),
                html.P('Humidity:'),
                 dcc.Slider(
                    id='selector_humid',
                    min=0,
                    max=100,
                    value=0,
                    marks = {0: '0%', 25: '25%', 50: '50%', 75: '75%', 100: '100%'}
                    ),
                html.P('Wind speed:'),
                 dcc.Slider(
                    id='selector_wind',
                    min=0,
                    max=100,
                    value=0,
                    marks = {0: 'Calm', 20: '20km/h', 40: '40km/h', 60: '60km/h', 80: '80km/h', 100: '100km/h'}
                    ),
                html.P('Air pressure:'),
                 dcc.Slider(
                    id='selector_pressure',
                    min=975,
                    max=1050,
                    value=975,
                    marks = {975: '975hPa', 1000: '1000hPa', 1025: '1025hPa', 1050: '1050hPa'}
                    ),
                html.P('Particulate concentration 2.5 microns:'),
                 dcc.Slider(
                    id='selector_part2p5',
                    min=0,
                    max=500,
                    value=0,
                    marks = {0: '0', 100: '100', 200: '200', 300: '300', 400: '400', 500: '500'}
                    ),
                html.P('Particulate concentration 10 microns:'),
                 dcc.Slider(
                    id='selector_part10',
                    min=0,
                    max=1000,
                    value=0,
                    marks = {0: '0', 250: '250', 500: '500', 750: '750', 1000: '1000'}
                    ),
                html.Hr(),
                html.P('Holiday type:'),
                dcc.RadioItems(id='selector_holiday', 
                   options=[
                       {'label': 'School holiday ', 'value': 'SCH'},
                       {'label': 'Public holiday ', 'value': 'PUB'},
                       {'label': 'Both ', 'value': 'BOTH'},
                       {'label': 'Neither ', 'value': 'NONE'}
                   ],
                   value='NONE'
                ),
                html.Hr(),

                html.P('Covid cases under investigation in the previous 7 days:'),
                dcc.Input(id='selector_covid', type='number', min=0, max=10000, step=100, value=0),

            ],
            style={'height': '49%', 'width': '49%', 'display': 'inline-block', 'padding': '20px 20px 20px 20px'}), #various selectors Div

            html.Div(id = 'right_panel', children=[ #format and place the right side panel

            html.Div(id='map', children=[ #map div
                    dcc.Graph(id = 'world_map', figure = fig)
            ],
                    style={'textAlign':'center', 'verticalAlign': 'top', 'display': 'inline-block', 'padding': '0px 20px 20px 50px'}), #map selector Div
            html.Br(),
            html.Hr(),
            html.P('Select which predictor model to use:'),
            html.Div(id='predictor', children=[ #prediction model div
                    dcc.Dropdown(id = 'selector_model', #pick the predictive model to use
                        options = [
                            {'label':'Linear Regression', 'value':'LR' },
                            {'label': 'Decision Tree Regression', 'value':'DT'},
                            {'label': 'Random Forest Regression', 'value':'RFR'},
                        ],
                        value = 'LR'),                        
            ],
                    style={'textAlign':'center', 'width': '35%', 'display': 'inline-block'}), 

        ], style={'textAlign':'center', 'width': '49%', 'display': 'inline-block'})
    ]) #end of 'map and various selectors' Div
]) #end of 'parent' Div

@app.callback(Output(component_id='world_map', component_property='figure'),
            Input(component_id='selector_temp', component_property='value'),
            Input(component_id='selector_humid', component_property='value'),
            Input(component_id='selector_wind', component_property='value'),
            Input(component_id='selector_part2p5', component_property='value'),
            Input(component_id='selector_part10', component_property='value'),
            Input(component_id='selector_pressure', component_property='value'),
            Input(component_id='selector_holiday', component_property='value'),
            Input(component_id='selector_covid', component_property='value'),
            Input(component_id='selector_date', component_property='date'),
            Input(component_id='selector_hour', component_property='value'),
            Input(component_id='selector_model', component_property='value'))

def selectors(temp, humid, wind, part_2p5, part_10, pressure, holiday, covid, indate, time, model):

    #Leave the map unchanged if the date picker has been cleared
    if indate is None:
        return dash.no_update

    date_object = date.fromisoformat(indate)
    year = date_object.year
    mon = date_object.month
    mdate = date_object.day

    #Take a copy so the overrides below do not touch merged_final
    scenario = merged_final[(merged_final.year == year) & (merged_final.mdate == mdate)
                    & (merged_final.mon == mon) & (merged_final.time == time)].copy()

    #No historical rows for this date and hour, so there is nothing to predict
    if scenario.empty:
        return dash.no_update

    #Overwrite the historical values with the values chosen in the interface
    scenario['school_hol'] = 1 if holiday in ('SCH', 'BOTH') else 0
    scenario['pub_hol'] = 1 if holiday in ('PUB', 'BOTH') else 0

    scenario['temp'] = temp
    scenario['humidity'] = humid
    scenario['wind'] = wind
    scenario['time'] = time
    scenario['pressure'] = pressure
    scenario['part_2p5'] = part_2p5
    scenario['part_10'] = part_10
    scenario['VIC_CASES_UNDER_INVESTIGATION_LAST_7D'] = covid

    #Pick the fitted model chosen in the dropdown
    models = {'LR': LR, 'DT': DT, 'RFR': RFR}
    guess = models[model].predict(scenario.drop(columns='hourly_counts'))

    #Marker sizes cannot be negative, so clip any negative predictions to zero
    scenario['guess'] = guess.clip(min=0).astype(int)

    fig = px.scatter_mapbox(scenario, lat='lat', lon='lon', size='guess')
    fig.update_layout(mapbox_style="carto-positron", uirevision='scenario')
    fig.update_layout(margin={"r":5,"t":5,"l":5,"b":5})

    return fig
    
if __name__ == '__main__':
    app.run_server()
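
Stripped of the interface, the callback above boils down to a "what-if" prediction: take the historical rows for a chosen hour, overwrite some features with user-supplied values, and ask a fitted model for new predictions. The sketch below shows that idea on a tiny made-up dataset without Dash. The `what_if` helper and all of the data are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny made-up history: hour of day, temperature and observed counts.
history = pd.DataFrame({
    "time": [8, 8, 17, 17, 12, 12],
    "temp": [10.0, 20.0, 10.0, 20.0, 10.0, 20.0],
    "hourly_counts": [500, 600, 700, 800, 600, 700],
})
feature_cols = ["time", "temp"]
model = LinearRegression().fit(history[feature_cols], history["hourly_counts"])


def what_if(rows, model, feature_cols, **overrides):
    """Copy the rows, overwrite the given features, and add the model's guess."""
    scenario = rows.copy()
    for column, value in overrides.items():
        scenario[column] = value
    guess = model.predict(scenario[feature_cols])
    scenario["guess"] = guess.clip(min=0).astype(int)  # counts cannot be negative
    return scenario


baseline = history[history["time"] == 8]
result = what_if(baseline, model, feature_cols, temp=30.0)
print(result)
```

Keeping the prediction logic in a plain function like this also makes it possible to test it without starting the Dash server.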

## Approach 3, Conclusion:
So there we have it. An awesome, fun way to interact with our predictive models!

Of course, these models and their predictions can always be improved. You can find new datasets to add, or you can look to make the models better through feature selection, feature reduction or feature engineering. You can also find different ways for users to interact with the data. The possibilities are endless.
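
As one possible example of feature engineering, the hour of the day could be encoded as a point on a circle, so that 11pm and midnight end up close together instead of 23 units apart. This is only a sketch of the idea and has not been tried on the models above. The column names `time_sin` and `time_cos` are made up.

```python
import numpy as np
import pandas as pd

hours = pd.DataFrame({"time": np.arange(24)})

# Map each hour onto the unit circle.
angle = 2 * np.pi * hours["time"] / 24
hours["time_sin"] = np.sin(angle)
hours["time_cos"] = np.cos(angle)


def distance(a, b):
    """Straight-line distance between two hours in the encoded space."""
    pa = hours.loc[a, ["time_sin", "time_cos"]].to_numpy()
    pb = hours.loc[b, ["time_sin", "time_cos"]].to_numpy()
    return float(np.linalg.norm(pa - pb))


# 11pm to midnight is now as close as 11am to midday; midnight to midday is furthest.
print(distance(23, 0), distance(11, 12), distance(0, 12))
```

Whether this helps depends on the model: linear models tend to benefit most, while tree-based models can already split on the raw hour.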

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #EEEEEE">&emsp;<b>Congratulations. You now know everything about the pedestrian data!</b>

Well, maybe not everything... of course, these explorations and approaches are just scratching the surface.

If you would like to extend this analysis further, please visit the __[City of Melbourne Open Data Site](https://data.melbourne.vic.gov.au/)__ and explore some of the other valuable datasets.

Trying to model human behaviour (such as pedestrian activity) is a difficult task. There are many variables at play. While we have presented some ideas here that we think are both interesting and useful, more work would need to be done to dig even deeper into this data, and to make more accurate models.

Hopefully we have provided you with a good head start!