<div class="usecase-title">Entertainment Location Projections</div>

<div class="usecase-authors"><b>Authored by: </b>Barkha Javed, Jack Pham, Te' Claire (editor/ API repointing) 2023</div>

<div class="usecase-duration"><b>Duration:</b> 75 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

 <div class="usecase-section-header">Scenario</div>

**As a City of Melbourne council worker, I want to visualise and provide statistics on upcoming activities and planned works in entertainment and leisure, so that I can understand impact for my local area.**

I also want to know which entertainment locations are projected as growth areas.

<div class="usecase-section-header">What this Use Case will teach you</div>

At the end of this use case you will understand what entertainment and leisure venues are in a small area, and if the locations is projected as a growth area.

This means learning how to:

* Load and examine data on seating capacity of cafes, restaurants and pubs
* Load and examine data on cafe, bistro, restaurant seats
* Load and examine data for city activities and planned works
* Load and examine pedestrian traffic to see current volumes for entertainment locations
* Visualise information from the datasets
* Review growth projections about entertainment locations



<div class="usecase-section-header">A brief introduction to the datasets used</div>

#### Dataset 1. Census of Land Use and Employment (CLUE)
The City of Melbourne  (COM) conducts a census of all local businesses every two years. The last published survey was in 2020, the next survey results are expected soon.

#####Reference information: https://data.melbourne.vic.gov.au/pages/clue/
* CLUE Blocks spatial layer
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/blocks-for-census-of-land-use-and-employment-clue/information/?location=12,-37.81306,144.94413&basemap=jawg.light
##### API: /api/explore/v2.1/catalog/datasets/blocks-for-census-of-land-use-and-employment-clue/records?limit=20
##### **Dataset Identifier:** blocks-for-census-of-land-use-and-employment-clue
<br>

* Bar, tavern, pub patron capacity
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/bars-and-pubs-with-patron-capacity/table/?refine.census_year=2022
##### API: /api/explore/v2.1/catalog/datasets/bars-and-pubs-with-patron-capacity/records?limit=20&refine=census_year%3A%222022%22
##### **Dataset Identifier:** bars-and-pubs-with-patron-capacity
<br>

* Cafe, restaurant, bistro seats
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/cafes-and-restaurants-with-seating-capacity/table/?refine.census_year=2022
##### API: /api/explore/v2.1/catalog/datasets/cafes-and-restaurants-with-seating-capacity/records?limit=20&refine=census_year%3A%222022%22
##### **Dataset Identifier:** cafes-and-restaurants-with-seating-capacity
<br>


#### Dataset 2. City Activities and Planned Works
* Geospatial events data, includes types such as traffic management, sport and recreation, reserved parking, public and  private events
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/city-activities-and-planned-works/information/?disjunctive.classification&disjunctive.small_area
##### API: /api/explore/v2.1/catalog/datasets/city-activities-and-planned-works/records?limit=20
##### **Dataset Identifier:** city-activities-and-planned-works
<br>

#### *Optional* Other datasets of interest
* Hourly pedestrian counts from sensors located across the city
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/
##### API:
/api/explore/v2.1/catalog/datasets/pedestrian-counting-system-monthly-counts-per-hour/records?limit=20
##### **Dataset Identifier:** pedestrian-counting-system-monthly-counts-per-hour
<br>

* COM population forecasts by small area for 2020-2040
##### Link: https://data.melbourne.vic.gov.au/explore/dataset/city-of-melbourne-population-forecasts-by-small-area-2020-2040/information/
##### API: /api/explore/v2.1/catalog/datasets/city-of-melbourne-population-forecasts-by-small-area-2020-2040/records?limit=20
##### **Dataset Identifier:** city-of-melbourne-population-forecasts-by-small-area-2020-2040
<br>

#####*Edited 10/12/23 - TeClaire*




<div class="usecase-section-header">Setup</div>

In [1]:
#Libraries to be installed
##!pip -q is to give less output
!pip -q install seaborn
!pip -q install pandas
!pip -q install matplotlib
!pip -q install numpy
!pip -q install nbconvert
!pip -q install keyboard
!pip -q install geopandas
!pip -q install requests
!pip -q install folium
!pip -q install statsmodels
!pip -q install tqdm
!pip -q install scikit-learn
!pip -q install pendulum


[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m58.1/58.1 kB[0m [31m1.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m81.2/81.2 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m490.0/490.0 kB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Building wheel for pendulum (pyproject.toml) ... [?25l[?25hdone


In [2]:
#load libraries
import os
import io
import time
import keyboard
import warnings
warnings.filterwarnings('ignore')
from datetime import datetime
import requests
import zipfile

import numpy as np
import pandas as pd

from urllib.request import urlopen
import json
from pandas import json_normalize


import statsmodels.api as sm
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor


import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
from IPython.core.display import display, HTML
import geopandas as gpd


import plotly.graph_objects as go
import plotly.express as px
from shapely.geometry import Polygon, shape, Point, box
from shapely.ops import unary_union
from shapely.wkt import loads

from tqdm import tqdm
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import style
style.use('ggplot')

from pylab import rcParams
rcParams['figure.figsize'] = 8,4


In [4]:
# Added 10/12/23 - TeClaire
import warnings
warnings.filterwarnings("ignore")

In [3]:
#set default values
this_decade = (pd.Timestamp.today().year)-10
this_year = pd.Timestamp.today().year
y3 = (pd.Timestamp.today().year)-3
y2 = (pd.Timestamp.today().year)-2
y1 = (pd.Timestamp.today().year)-1


#Replacement for socrata
domain = 'https://data.melbourne.vic.gov.au/explore/dataset/'
baseurl = '/download/?format=json&timezone=Australia/Sydney&lang=en'
basegeourl='/download/?format=geojson&timezone=Australia/Sydney&lang=en'

In [5]:
# Added 10/12/23 - TeClaire
# Define the company colors format for matplotlib
dark_theme_colors = ['#08af64', '#14a38e', '#0f9295', '#056b8a', '#121212'] #Dark theme
light_theme_colors = ['#2af598', '#22e4ac', '#1bd7bb', '#14c9cb', '#0fbed8', '#08b3e5'] #Light theme

In [31]:
# Added 10/12/23 - TeClaire
def fetch_data(base_url, dataset, api_key, num_records=99, offset=0):
    all_records = []
    max_offset = 9900

    while True:
        if offset > max_offset:
            break

        filters = f'{dataset}/records?limit={num_records}&offset={offset}'
        url = f'{base_url}{filters}&api_key={api_key}'

        try:
            result = requests.get(url, timeout = 10)
            result.raise_for_status()
            records = result.json().get('results')
        except requests.exceptions.RequestException as e:
            raise Exception(f'API request failed: {e}')
        if records is None:
            break
        all_records.extend(records)
        if len(records) < num_records:
            break

        offset += num_records

    df = pd.DataFrame(all_records)
    return df

BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# Remember to remove API keys before Publishing!
# API_KEY = 'Insert API Key here or upload to secret file'

<div class="usecase-section-header">Load and Transform Data</div>

## Load small area CLUE blocks

In [None]:
# Old code not in use as of 10/12/23 - TeClaire
# #spatial layer used to map CLUE datasets to CLUE blocks

# dsurl = 'blocks-for-census-of-land-use-and-employment-clue'
# GeoJSONURL = domain + dsurl + basegeourl
# #print(GeoJSONURL)

# clueblocks = requests.get(GeoJSONURL).json()
# #clueblocks["features"][0]

In [10]:
# Edited 10/12/23 - TeClaire
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

dsurl = 'blocks-for-census-of-land-use-and-employment-clue'
clueblocks = fetch_data(BASE_URL, dsurl, API_KEY)
clueblocks.head()

Unnamed: 0,geo_point_2d,geo_shape,block_id,clue_area
0,"{'lon': 144.98671534083016, 'lat': -37.8168041...","{'type': 'Feature', 'geometry': {'coordinates'...",662,East Melbourne
1,"{'lon': 144.97026050566197, 'lat': -37.8177220...","{'type': 'Feature', 'geometry': {'coordinates'...",6,Melbourne (CBD)
2,"{'lon': 144.94124539321137, 'lat': -37.7875065...","{'type': 'Feature', 'geometry': {'coordinates'...",926,Parkville
3,"{'lon': 144.94029862679008, 'lat': -37.7785734...","{'type': 'Feature', 'geometry': {'coordinates'...",928,Parkville
4,"{'lon': 144.9448205623792, 'lat': -37.80785784...","{'type': 'Feature', 'geometry': {'coordinates'...",415,West Melbourne (Residential)


## Load Bar, tavern, pub patron capacity

In [None]:
# Old code not in use as of 10/12/23 - TeClaire
# #Load Bar, tavern, pub patron capacity dataset
# dsurl = 'bars-and-pubs-with-patron-capacity'
# url = domain + dsurl + baseurl
# #print(url)

# data_json = requests.get(url).json()
# data_json_df = pd.DataFrame.from_dict(data_json)

# #this flattens the features
# df_btp_capacity=json_normalize(data_json_df['fields'])

# df_btp_capacity.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4402 entries, 0 to 4401
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   longitude          4382 non-null   float64
 1   census_year        4402 non-null   object 
 2   building_address   4402 non-null   object 
 3   trading_name       4402 non-null   object 
 4   location           4382 non-null   object 
 5   property_id        4402 non-null   int64  
 6   business_address   4402 non-null   object 
 7   latitude           4382 non-null   float64
 8   clue_small_area    4402 non-null   object 
 9   block_id           4402 non-null   int64  
 10  number_of_patrons  4402 non-null   int64  
 11  base_property_id   4402 non-null   int64  
dtypes: float64(2), int64(4), object(6)
memory usage: 412.8+ KB


In [23]:
# # Edited 10/12/23 - TeClaire - API ISSUE
# BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
# API_KEY = 'api/explore/v2.1/catalog/datasets/bars-and-pubs-with-patron-capacity/records?limit=20'
# dsurl = 'bars-and-pubs-with-patron-capacity'
# data_json = fetch_data(BASE_URL, dsurl, API_KEY)
# data_json.head()
# print('―' * 10)

# # Directly create DataFrame from the fetched data
# df_btp_capacity = pd.DataFrame(data_json.get('results'))
# df_btp_capacity.info()

――――――――――
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame


### API 2.1 is loading an empty dataframe
####Loading CSV or JSON link as temporary fix 10/12/23 - TeClaire

In [None]:
# Edited 10/12/23 - TeClaire
# Links to current datasets
# CSV = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/bars-and-pubs-with-patron-capacity/exports/csv?lang=en&timezone=Australia%2FSydney&use_labels=true&delimiter=%2C'
# JSON = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/bars-and-pubs-with-patron-capacity/exports/json?lang=en&timezone=Australia%2FSydney'

In [29]:
# Edited 10/12/23 - TeClaire
import pandas as pd
def fetch_data_json(url, data_format='json'):
    try:
        response = requests.get(url)
        response.raise_for_status()

        if data_format == 'json':
            data = response.json()
            if isinstance(data, list):
                df = pd.DataFrame(data)  # Convert to DataFrame
            else:
                raise ValueError("Unsupported JSON structure. Expected a list.")
        elif data_format == 'csv':
            df = pd.read_csv(pd.compat.StringIO(response.text))
        else:
            raise ValueError("Unsupported data format. Use 'json' or 'csv'.")

        return df
    except requests.exceptions.RequestException as e:
        raise Exception(f'Failed to fetch data: {e}')

# JSON link
json_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/bars-and-pubs-with-patron-capacity/exports/json?lang=en&timezone=Australia%2FSydney'
df_btp_capacity = fetch_data_json(json_url, data_format='json')
df_btp_capacity.info()
print('―' * 10)
df_btp_capacity.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4696 entries, 0 to 4695
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   census_year        4696 non-null   object 
 1   block_id           4696 non-null   int64  
 2   property_id        4696 non-null   object 
 3   base_property_id   4696 non-null   object 
 4   building_address   4696 non-null   object 
 5   clue_small_area    4696 non-null   object 
 6   trading_name       4696 non-null   object 
 7   business_address   4696 non-null   object 
 8   number_of_patrons  4696 non-null   int64  
 9   longitude          4676 non-null   float64
 10  latitude           4676 non-null   float64
 11  location           4676 non-null   object 
dtypes: float64(2), int64(2), object(8)
memory usage: 440.4+ KB
――――――――――


Unnamed: 0,census_year,block_id,property_id,base_property_id,building_address,clue_small_area,trading_name,business_address,number_of_patrons,longitude,latitude,location
0,2002,11,108972,108972,10-22 Spencer Street MELBOURNE 3000,Melbourne (CBD),Explorers Inn,10-22 Spencer Street MELBOURNE 3000,50,144.955254,-37.820511,"{'lon': 144.95525416628004, 'lat': -37.8205106..."
1,2002,14,103172,103172,31-39 Elizabeth Street MELBOURNE 3000,Melbourne (CBD),Connells Tavern,35 Elizabeth Street MELBOURNE 3000,350,144.964322,-37.817426,"{'lon': 144.964321660097, 'lat': -37.817426106..."
2,2002,15,103944,103944,277-279 Flinders Lane MELBOURNE 3000,Melbourne (CBD),De Biers,"Unit 1, Basement , 277 Flinders Lane MELBOURNE...",400,144.965307,-37.817242,"{'lon': 144.96530699086, 'lat': -37.8172419402..."
3,2002,16,103938,103938,187 Flinders Lane MELBOURNE 3000,Melbourne (CBD),Adelphi Hotel,187 Flinders Lane MELBOURNE 3000,80,144.968385,-37.81636,"{'lon': 144.9683846004515, 'lat': -37.81635974..."
4,2002,17,103925,103925,121-123 Flinders Lane MELBOURNE 3000,Melbourne (CBD),Velour,"Unit 1, Gnd & Bmt , 121 Flinders Lane MELBOURN...",350,144.970523,-37.815674,"{'lon': 144.97052296371248, 'lat': -37.8156736..."


In [27]:
#transform
integer_columns = ['census_year', 'block_id', 'property_id', 'base_property_id', 'number_of_patrons']
str_columns = ['building_address', 'clue_small_area', 'trading_name']
float_columns = ['longitude', 'latitude']
df_btp_capacity[integer_columns] = df_btp_capacity[integer_columns].astype(int)
df_btp_capacity[float_columns] = df_btp_capacity[float_columns].astype(float)
df_btp_capacity[str_columns] = df_btp_capacity[str_columns].astype(str)

#Add column with description Pubs, Taverns and Bars for grouping
df_btp_capacity['category'] = 'Pubs, Taverns and Bars'

#limit data to past decade
df_btp_capacity=df_btp_capacity.query("census_year >= @this_decade")
df_btp_capacity.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2690 entries, 629 to 4695
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   census_year        2690 non-null   int64  
 1   block_id           2690 non-null   int64  
 2   property_id        2690 non-null   int64  
 3   base_property_id   2690 non-null   int64  
 4   building_address   2690 non-null   object 
 5   clue_small_area    2690 non-null   object 
 6   trading_name       2690 non-null   object 
 7   business_address   2690 non-null   object 
 8   number_of_patrons  2690 non-null   int64  
 9   longitude          2676 non-null   float64
 10  latitude           2676 non-null   float64
 11  location           2676 non-null   object 
 12  category           2690 non-null   object 
dtypes: float64(2), int64(5), object(6)
memory usage: 294.2+ KB


## Load Cafe, restaurant, bistro seats

In [None]:
# # Old code not in use as of 10/12/23 - TeClaire
# #Load Cafe, restaurant, bistro seats dataset
# dsurl = 'cafes-and-restaurants-with-seating-capacity'
# url = domain + dsurl + baseurl
# #print(url)

# data_json = requests.get(url).json()
# data_json_df = pd.DataFrame.from_dict(data_json)

# #this flattens the features
# df_crb=json_normalize(data_json_df['fields'])

# df_crb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56987 entries, 0 to 56986
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   location                      56460 non-null  object 
 1   seating_type                  56987 non-null  object 
 2   census_year                   56987 non-null  object 
 3   property_id                   56987 non-null  int64  
 4   base_property_id              56987 non-null  int64  
 5   trading_name                  56987 non-null  object 
 6   block_id                      56987 non-null  int64  
 7   industry_anzsic4_description  56987 non-null  object 
 8   number_of_seats               56987 non-null  int64  
 9   building_address              56987 non-null  object 
 10  clue_small_area               56987 non-null  object 
 11  business_address              56987 non-null  object 
 12  industry_anzsic4_code         56987 non-null  int64  
 13  l

In [32]:
# # Edited 10/12/23 - TeClaire - API ISSUE
# dsurl = 'cafes-and-restaurants-with-seating-capacity'
# data_json = fetch_data(BASE_URL, dsurl, API_KEY)
# data_json.head()
# print('―' * 10)

# # Directly create DataFrame from the fetched data
# df_crb = pd.DataFrame(data_json.get('results'))
# df_crb.info()

――――――――――
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrame


### API 2.1 is loading an empty dataframe
####Loading CSV or JSON link as temporary fix 10/12/23 - TeClaire

In [33]:
# Edited 10/12/23 - TeClaire - JSON link
json_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/cafes-and-restaurants-with-seating-capacity/exports/json?lang=en&timezone=Australia%2FSydney'
df_crb = fetch_data_json(json_url, data_format='json')
df_crb.info()
print('―' * 10)
df_crb.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 60055 entries, 0 to 60054
Data columns (total 15 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   census_year                   60055 non-null  object 
 1   block_id                      60055 non-null  int64  
 2   property_id                   60055 non-null  object 
 3   base_property_id              60055 non-null  object 
 4   building_address              60055 non-null  object 
 5   clue_small_area               60055 non-null  object 
 6   trading_name                  60055 non-null  object 
 7   business_address              60055 non-null  object 
 8   industry_anzsic4_code         60055 non-null  object 
 9   industry_anzsic4_description  60055 non-null  object 
 10  seating_type                  60055 non-null  object 
 11  number_of_seats               60055 non-null  int64  
 12  longitude                     59528 non-null  float64
 13  l

Unnamed: 0,census_year,block_id,property_id,base_property_id,building_address,clue_small_area,trading_name,business_address,industry_anzsic4_code,industry_anzsic4_description,seating_type,number_of_seats,longitude,latitude,location
0,2017,6,578324,573333,2 Swanston Street MELBOURNE 3000,Melbourne (CBD),Transport Hotel,"Tenancy 29, Ground , 2 Swanston Street MELBOUR...",4520,"Pubs, Taverns and Bars",Seats - Indoor,230,144.969942,-37.817778,"{'lon': 144.96994164279243, 'lat': -37.8177778..."
1,2017,6,578324,573333,2 Swanston Street MELBOURNE 3000,Melbourne (CBD),Transport Hotel,"Tenancy 29, Ground , 2 Swanston Street MELBOUR...",4520,"Pubs, Taverns and Bars",Seats - Outdoor,120,144.969942,-37.817778,"{'lon': 144.96994164279243, 'lat': -37.8177778..."
2,2017,11,103957,103957,517-537 Flinders Lane MELBOURNE 3000,Melbourne (CBD),Altius Coffee Brewers,"Shop , Ground , 517 Flinders Lane MELBOURNE 3000",4512,Takeaway Food Services,Seats - Outdoor,4,144.956486,-37.819875,"{'lon': 144.95648638781466, 'lat': -37.8198754..."
3,2017,11,103957,103957,517-537 Flinders Lane MELBOURNE 3000,Melbourne (CBD),Five & Dime Bagel,16 Flinders Lane MELBOURNE 3000,1174,Bakery Product Manufacturing (Non-factory based),Seats - Indoor,14,144.956486,-37.819875,"{'lon': 144.95648638781466, 'lat': -37.8198754..."
4,2017,11,103985,103985,562-564 Flinders Street MELBOURNE 3000,Melbourne (CBD),YHA Melbourne Central,562-564 Flinders Street MELBOURNE 3000,4400,Accommodation,Seats - Indoor,43,144.955635,-37.820595,"{'lon': 144.9556348088, 'lat': -37.82059511593..."


In [34]:
#transform
integer_columns = ['census_year', 'block_id', 'property_id', 'base_property_id', 'number_of_seats']
str_columns = ['clue_small_area', 'trading_name','industry_anzsic4_description','seating_type']
df_crb[integer_columns] = df_crb[integer_columns].astype(int)
df_crb[str_columns] = df_crb[str_columns].astype(str)

#Add column with description for grouping
df_crb['category'] = 'Café, Restaurant, Bistro'

#drop NaN values
df_crb.dropna(subset=['business_address'])
df_crb.dropna(subset=['longitude'])
df_crb.dropna(subset=['latitude'])

#latest decade
df_crb = df_crb.query("census_year >= 2012")

print(df_crb.shape)
df_crb.head(5).T

#limit data to past decade
df_crb=df_crb.query("census_year >= @this_decade")
df_crb.info()

(36713, 16)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33654 entries, 0 to 60054
Data columns (total 16 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   census_year                   33654 non-null  int64  
 1   block_id                      33654 non-null  int64  
 2   property_id                   33654 non-null  int64  
 3   base_property_id              33654 non-null  int64  
 4   building_address              33654 non-null  object 
 5   clue_small_area               33654 non-null  object 
 6   trading_name                  33654 non-null  object 
 7   business_address              33654 non-null  object 
 8   industry_anzsic4_code         33654 non-null  object 
 9   industry_anzsic4_description  33654 non-null  object 
 10  seating_type                  33654 non-null  object 
 11  number_of_seats               33654 non-null  int64  
 12  longitude                     33204 non-null  fl

### Merge CLUE venue seats, capacity and activities datasets

In [35]:
#Merge CLUE block data
clue_venues = df_crb.append(df_btp_capacity)

#combine data from venue datasets to use seats or patrons values as a capacity measure
#similar combination for category created
clue_venues['capacity'] = clue_venues[['number_of_seats', 'number_of_patrons']].bfill(axis=1).iloc[:, 0]
clue_venues['venue_description'] = clue_venues[['category', 'industry_anzsic4_description']].bfill(axis=1).iloc[:, 0]

#rename columns
clue_venues.rename(columns={
      "latitude":"lat"
    , "longitude":"lon"
},inplace = True)


#fill remaining nulls
clue_venues.fillna(0, inplace=True)

#clue_venues.head(3).T

## Load City Activities and Planned Works



In [None]:
# # Old Code as of 10/12/23 - TeClaire
# # spatial layer used to map city activity planned works
# dsurl = 'city-activities-and-planned-works'
# GeoJSONURL = domain + dsurl + basegeourl
# #print(GeoJSONURL)

# capw = requests.get(GeoJSONURL).json()
# capw["features"][0]


{'type': 'Feature',
 'geometry': {'coordinates': [[[144.9685797842, -37.8134248213],
    [144.9688373541, -37.8133508046],
    [144.9687076377, -37.8130671474],
    [144.9684498427, -37.813142364],
    [144.9685797842, -37.8134248213]]],
  'type': 'Polygon'},
 'properties': {'location': '124-134 Russell Street\rMELBOURNE VIC 3000',
  'status': 'CONFIRMED',
  'notes': 'Hoarding',
  'end_date': '2022-08-08',
  'geo_point_2d': [-37.81324619965848, 144.9686437019243],
  'activity_id': 'SS-1094323-0-108589-EHD-Permit Issued-231120210700-080820221900',
  'classification': 'Structures',
  'geometry': 'MULTIPOLYGON (((144.96857978421863 -37.813424821287875,144.9688373540564 -37.81335080458208,144.9687076377078 -37.81306714743958,144.9684498427469 -37.813142363962264,144.96857978421863 -37.813424821287875)))',
  'start_date': '2021-11-23',
  'source_id': 'EHD-2021-93',
  'small_area': 'Melbourne (CBD)'}}

In [38]:
# Edited 10/12/23 - TeClaire
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

dsurl = 'city-activities-and-planned-works'
df_capw = fetch_data(BASE_URL, dsurl, API_KEY)
print('―' * 10)
df_capw.info()

――――――――――
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 605 entries, 0 to 604
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   activity_id     605 non-null    object
 1   classification  605 non-null    object
 2   end_date        605 non-null    object
 3   location        603 non-null    object
 4   notes           500 non-null    object
 5   source_id       605 non-null    object
 6   start_date      605 non-null    object
 7   status          605 non-null    object
 8   geometry        605 non-null    object
 9   small_area      605 non-null    object
 10  json_geometry   0 non-null      object
 11  geo_point_2d    605 non-null    object
dtypes: object(12)
memory usage: 56.8+ KB


In [None]:
# # Old Code as of 10/12/23 - TeClaire
# # spatial layer used to map city activity planned works
# dsurl = 'city-activities-and-planned-works'
# url = domain + dsurl + baseurl
# #print(url)

# data_json = requests.get(url).json()
# data_json_df = pd.DataFrame.from_dict(data_json)

# #this flattens the features
# df_capw=json_normalize(data_json_df['fields'])
# df_capw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 605 entries, 0 to 604
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   location                   603 non-null    object
 1   status                     605 non-null    object
 2   notes                      500 non-null    object
 3   end_date                   605 non-null    object
 4   geo_point_2d               605 non-null    object
 5   activity_id                605 non-null    object
 6   classification             605 non-null    object
 7   geometry                   605 non-null    object
 8   start_date                 605 non-null    object
 9   source_id                  605 non-null    object
 10  small_area                 605 non-null    object
 11  json_geometry.coordinates  605 non-null    object
 12  json_geometry.type         605 non-null    object
dtypes: object(13)
memory usage: 61.6+ KB


In [40]:
#look at events that are for entertainment
print(df_capw.classification.unique())
df_capw.dropna(subset=['geometry'])

#Convert to date, add columns
df_capw['start_dt'] = pd.to_datetime(df_capw.start_date).dt.date
df_capw['start_year'] = pd.to_datetime(df_capw.start_dt).dt.year
df_capw['start_month'] = pd.to_datetime(df_capw.start_dt).dt.month
#Add column with description for grouping
df_crb['seating_type'] = 'Not Provided'

#drop columns - no column exists 2023
# df_capw = df_capw.drop(['json_geometry.type'], axis=1)

#filter found there are records with value 2921-11-19 00:00:00, exclude these
df_capw = df_capw.loc[(df_capw['end_date'] < '2065-01-01')]
df_capw['end_dt'] = pd.to_datetime(df_capw.end_date).dt.date
df_capw['end_year'] = pd.to_datetime(df_capw.end_dt).dt.year
df_capw['end_month'] = pd.to_datetime(df_capw.end_dt).dt.month
df_capw_all = df_capw.copy()
df_capw = df_capw[(df_capw.classification.isin(['Event','Public Event','Private Event']))]

['Structures' 'Traffic Management' 'Event' 'Reserved Parking'
 'Public Event' 'Private Event']


In [41]:
#Range of years
print(df_capw.start_year.unique())
print(df_capw.classification.unique())

[2018 2022 2023 2019 2020 2021]
['Event' 'Public Event' 'Private Event']


In [42]:
#Merge CLUE block data
clue_venues_capw=clue_venues.append(df_capw)

#combine values across datasets for year, small area and description
clue_venues_capw['year'] = clue_venues_capw[['census_year', 'start_year']].bfill(axis=1).iloc[:, 0]
clue_venues_capw['year'] = clue_venues_capw['year'].astype(int)
clue_venues_capw['small_area_tag'] = clue_venues_capw[['clue_small_area', 'small_area']].bfill(axis=1).iloc[:, 0]
clue_venues_capw['description_tag'] = clue_venues_capw[['industry_anzsic4_description', 'classification']].bfill(axis=1).iloc[:, 0]
clue_venues_capw['category_tag'] = clue_venues_capw[['category', 'classification']].bfill(axis=1).iloc[:, 0]

#fill remaining nulls
clue_venues_capw.fillna(0, inplace=True)

In [43]:
clue_venues_capw.category_tag.unique()

array(['Café, Restaurant, Bistro', 0, 'Event', 'Public Event',
       'Private Event'], dtype=object)

In [44]:
#create data frames per year for some visuals
#the latest data is for the past year
clue_venues_y3=clue_venues_capw.query("year == @y3")
clue_venues_y2=clue_venues_capw.query("year == @y2")
clue_venues_y1=clue_venues_capw.query("year >= @y1") #latest year, see setup for detail
#clue_venues_capw.tail(3).T

## Other datasets of interest

The population forecast for the city for the next five years indicates demand due to the increase in population. The pedestrian traffic numbers will be used as an indicator of people potentially using the entertainment venues.

The pedestrian traffic will show us what areas people are visiting and during what part of the day. This can be used to evaluate if the entertainment venue capacity is low, sufficient, or high.


### Load small area population forecasts

In [72]:
dsurl = 'city-of-melbourne-population-forecasts-by-small-area-2020-2040'
url = domain + dsurl + baseurl
#print(url)

data_json = requests.get(url).json()
data_json_df = pd.DataFrame.from_dict(data_json)

#this flattens the features
ds=json_normalize(data_json_df['fields'])

ds['year']=ds['year'].astype(int)

#limit to next 5 years
fy = pd.to_numeric((this_year + 5))
ds = ds.query("year <= @fy")

#Look at total value for city
ds_smapop_tot = ds.query("age =='Total population' & geography=='City of Melbourne'")
ds_smapop_tot.drop(columns=['gender','age'], inplace=True)
ds_pop = ds_smapop_tot.reset_index(drop=True).sort_values(by=['year'], ascending=True)
ds.info()

#geography
ds_smapop_all = ds.query("age =='Total population' & geography!='City of Melbourne'")
ds_smapop_all.drop(columns=['gender','age'], inplace=True)
ds_pop_all = ds_smapop_all.reset_index(drop=True).sort_values(by=['year'], ascending=True)



<class 'pandas.core.frame.DataFrame'>
Int64Index: 6496 entries, 0 to 17046
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   geography  6496 non-null   object 
 1   year       6496 non-null   int64  
 2   age        6496 non-null   object 
 3   value      6472 non-null   float64
 4   gender     6496 non-null   object 
dtypes: float64(1), int64(1), object(3)
memory usage: 304.5+ KB


In [46]:
#plot population forecast, next 5 years
fig = px.line(ds_pop_all, x="year", y="value", title='Population Forecast - City of Melbourne', color='geography')
fig.show()


### Load pedestrian sensor locations

In [73]:
# # Old Code as of 10/12/23
# #Pedestrian sensor location data
# dsurl = 'pedestrian-counting-system-sensor-locations'
# url = domain + dsurl + baseurl
# #print(url)

# data_json = requests.get(url).json()
# data_json_df = pd.DataFrame.from_dict(data_json)

# #this flattens the features
# sensor_data=json_normalize(data_json_df['fields'])

In [74]:
sensor_data[['lat', 'lon']] = sensor_data[['latitude', 'longitude']].astype(float)
#sensor_data.head(5).T
sensor_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 138 entries, 0 to 137
Data columns (total 14 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   direction_2         96 non-null     object 
 1   installation_date   135 non-null    object 
 2   direction_1         96 non-null     object 
 3   location            138 non-null    object 
 4   location_id         138 non-null    int64  
 5   sensor_name         138 non-null    object 
 6   longitude           138 non-null    float64
 7   status              138 non-null    object 
 8   sensor_description  138 non-null    object 
 9   latitude            138 non-null    float64
 10  location_type       138 non-null    object 
 11  note                31 non-null     object 
 12  lat                 138 non-null    float64
 13  lon                 138 non-null    float64
dtypes: float64(4), int64(1), object(9)
memory usage: 15.2+ KB


### Load pedestrian traffic hourly counts data

In [51]:
# # Old Code as of 10/12/23 - TeClaire
# #URL / API method will need to be updated
# #Pedestrian foot count data zip file
# ds_url = "https://data.melbourne.vic.gov.au/api/datasets/1.0/pedestrian-counting-system-monthly-counts-per-hour/attachments/pedestrian_counting_system_monthly_counts_per_hour_may_2009_to_14_dec_2022_csv_zip/"
# filename = 'pedestrian_counting_system_monthly_counts_per_hour_may_2009_to_14_dec_2022.csv'

# r = requests.get(ds_url)
# z = zipfile.ZipFile(io.BytesIO(r.content))
# z.extractall()

# sensor_traffic = pd.read_csv(filename, sep=',')

In [84]:
# Edited 10/12/23 - TeClaire
BASE_URL = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'

dsurl = 'pedestrian-counting-system-monthly-counts-per-hour'
sensor_traffic = fetch_data(BASE_URL, dsurl, API_KEY)
print('―' * 10)
sensor_traffic.info()

――――――――――
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   sensor_name          9999 non-null   object
 1   timestamp            9999 non-null   object
 2   locationid           9999 non-null   object
 3   direction_1          9999 non-null   int64 
 4   direction_2          9999 non-null   int64 
 5   total_of_directions  9999 non-null   int64 
 6   location             9999 non-null   object
dtypes: int64(3), object(4)
memory usage: 546.9+ KB


In [85]:
# # Old Code as of 10/12/23 - TeClaire
# #sensor_traffic.info()

# #rename columns
# sensor_traffic.rename(columns={"Date_Time": "date_time","Year":"year"
#                         ,"Month":"month"
#                         ,"Mdate":"mdate"
#                         ,"Day":"day"
#                         ,"Time": "time"
#                         ,"Sensor_ID":"sensor_id"
#                         ,"Sensor_Name":"sensor_name"
#                         ,"Hourly_Counts":"hourly_counts"
#                         }
#                ,inplace = True)
# #sensor_traffic.head(5).T

## Added datetime due to attribute error
#### Reformated timestamp for old attributes
#### sensor_id no longer exists in dataset and has been removed from code
#####Edited 10/12/23 - TeClaire

In [86]:
# Edited 10/12/23 - TeClaire
# Add date column
sensor_traffic['date'] = pd.to_datetime(sensor_traffic['timestamp']).dt.date
sensor_traffic['month_num'] = pd.to_datetime(sensor_traffic['timestamp']).dt.month

# Add day of week column
# sensor_traffic['dow'] = pd.to_datetime(sensor_traffic['timestamp']).dt.day_of_week

# Add date column
sensor_traffic['date'] = pd.to_datetime(sensor_traffic['timestamp']).dt.date
sensor_traffic['month_num'] = pd.to_datetime(sensor_traffic['timestamp']).dt.month

# Add day of week column
# sensor_traffic['dow'] = pd.to_datetime(sensor_traffic['timestamp']).dt.day_of_week

In [87]:
# Edited 10/12/23 - TeClaire
# Column mdate not in dataset
# Check DataFrame
print(sensor_traffic.columns)

Index(['sensor_name', 'timestamp', 'locationid', 'direction_1', 'direction_2',
       'total_of_directions', 'location', 'date', 'month_num'],
      dtype='object')


In [88]:
# Edited 10/12/23 - TeClaire
print(sensor_traffic.head())

  sensor_name                  timestamp locationid  direction_1  direction_2  \
0     SwaCs_T  2023-03-31T08:00:00+00:00         65           20           22   
1     SwaCs_T  2023-03-31T09:00:00+00:00         65           41           95   
2     SwaCs_T  2023-03-31T10:00:00+00:00         65           78          197   
3     SwaCs_T  2023-03-31T12:00:00+00:00         65          245          555   
4     SwaCs_T  2023-03-31T13:00:00+00:00         65          367          590   

   total_of_directions                                   location        date  \
0                   42  {'lon': 144.9668064, 'lat': -37.81569416}  2023-03-31   
1                  136  {'lon': 144.9668064, 'lat': -37.81569416}  2023-03-31   
2                  275  {'lon': 144.9668064, 'lat': -37.81569416}  2023-03-31   
3                  800  {'lon': 144.9668064, 'lat': -37.81569416}  2023-03-31   
4                  957  {'lon': 144.9668064, 'lat': -37.81569416}  2023-03-31   

   month_num  
0          

In [89]:
# Edited 10/12/23 - TeClaire
# Change timestamp for correct format
# sensor_id no longer exists in dataset and has been removed from code

# Extract date and time timestamp
sensor_traffic['date'] = pd.to_datetime(sensor_traffic['timestamp']).dt.date
sensor_traffic['month_num'] = pd.to_datetime(sensor_traffic['timestamp']).dt.month
sensor_traffic['time'] = pd.to_datetime(sensor_traffic['timestamp']).dt.hour

# Add day column
sensor_traffic['dow'] = pd.to_datetime(sensor_traffic['timestamp']).dt.dayofweek

# Convert integer
sensor_traffic['mdate'] = pd.to_datetime(sensor_traffic['date']).dt.strftime('%Y%m%d').astype(int)
sensor_traffic['year'] = pd.to_datetime(sensor_traffic['date']).dt.year
sensor_traffic['hourly_counts'] = sensor_traffic['total_of_directions'].astype(int)  # Assuming 'total_of_directions' is the relevant column
sensor_traffic['sensor_id'] = sensor_traffic['locationid'].astype(int)  # Assuming 'locationid' is the relevant column

##################################################################################################
'''
# Editing 10/12/23 - Te' Claire
To Continue editing:
-sensor_id no longer exists in dataset and has been removed from code, however this is used as ‘common attribute’ for merging code for sensor_traffic and sensor_data.
-more time is needed to consider datasets and how to merge correctly.
'''
##################################################################################################

# # Combine pedestrian sensor location, foot traffic datasets
# sensor_traffic = pd.merge(sensor_traffic, sensor_data, on='sensor_id', how='inner')

##################################################################################################

# Filter
this_decade = 2023  # Replace with the desired decade
sensor_traffic = sensor_traffic.query("year >= @this_decade")

# Add column for day (5 am to 5 pm) or night (6 pm to 4 am) traffic
sensor_traffic['day_counts'] = np.where(((sensor_traffic['time'] > 4) & (sensor_traffic['time'] < 18)),
                                       sensor_traffic['hourly_counts'], 0).astype(int)
sensor_traffic['night_counts'] = np.where(sensor_traffic['day_counts'] == 0, sensor_traffic['hourly_counts'], 0).astype(int)
sensor_traffic['when'] = np.where((sensor_traffic['day_counts'] > 0), 'day', 'night')

# Drop columns
sensor_traffic.drop(['timestamp', 'locationid', 'direction_1', 'direction_2', 'total_of_directions'], axis=1, inplace=True)


In [90]:
# # Old Code as of 10/12/23 - TeClaire
# #Add date column
# sensor_traffic['date'] = pd.to_datetime(sensor_traffic.date_time).dt.date
# sensor_traffic['month_num'] = pd.to_datetime(sensor_traffic.date_time).dt.month

# #Add day of week column
# sensor_traffic['dow'] = pd.to_datetime(sensor_traffic.date_time).dt.day_of_week

# #convert fields to integer
# sensor_traffic['mdate']=sensor_traffic['mdate'].astype(int)
# sensor_traffic['time']=sensor_traffic['time'].astype(int)
# sensor_traffic['year']=sensor_traffic['year'].astype(int)
# sensor_traffic['hourly_counts']=sensor_traffic['hourly_counts'].astype(int)
# sensor_traffic['sensor_id']=sensor_traffic['sensor_id'].astype(int)

# # Mesh pedestrian sensor location and foot traffic datasets
# sensor_traffic = pd.merge(sensor_traffic, sensor_data, on='sensor_id', how='inner')

# #filter to this decade
# sensor_traffic=sensor_traffic.query("year >= @this_decade")

# #Add column for day (5am to 5pm) or night (6pm to 4am) traffic
# sensor_traffic['day_counts']   = np.where(((sensor_traffic['time']>4) & (sensor_traffic['time']<18)),
#                                           sensor_traffic['hourly_counts'] , 0).astype(int)
# sensor_traffic['night_counts'] = np.where(sensor_traffic['day_counts']==0,sensor_traffic['hourly_counts']
#                                           , 0).astype(int)
# sensor_traffic['when'] = np.where((sensor_traffic['day_counts']>0),'day', 'night')

In [91]:
sensor_traffic.rename(columns={"sensor_name_x": "sensor_name"}
               ,inplace = True)
sensor_traffic.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9999 entries, 0 to 9998
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   sensor_name    9999 non-null   object
 1   location       9999 non-null   object
 2   date           9999 non-null   object
 3   month_num      9999 non-null   int64 
 4   time           9999 non-null   int64 
 5   dow            9999 non-null   int64 
 6   mdate          9999 non-null   int64 
 7   year           9999 non-null   int64 
 8   hourly_counts  9999 non-null   int64 
 9   sensor_id      9999 non-null   int64 
 10  day_counts     9999 non-null   int64 
 11  night_counts   9999 non-null   int64 
 12  when           9999 non-null   object
dtypes: int64(9), object(4)
memory usage: 1.1+ MB


In [92]:
sensor_traffic.year.unique()

array([2023])

In [93]:
#group by traffic for past decade 2012 to 2022
#average day_counts, night_counts, hourly counts per month, year,all areas

this_year = (pd.Timestamp.today().year)
sensor_ds=sensor_traffic.query("year >= @this_year")

#will use this to show traffic in entertainment locations this year
sensor_ds_yearll = sensor_traffic.groupby(['year','sensor_name','lat','lon','when'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

sensor_ds_year = sensor_traffic.groupby(['year','when'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

sensor_ds_ym = sensor_ds.groupby(['year','month_num','when'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

sensor_ds_ymd = sensor_ds.groupby(['year','month_num', 'dow','when'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

sensor_ds_hod = sensor_traffic.groupby(['time','when'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})


KeyError: ignored

In [None]:
#Covid years by date
sensor_ds_cy = sensor_traffic.query("year >= 2020")
sensor_ds_dt = sensor_ds_cy.groupby(['date'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})


In [None]:
#Traffic by date for past six months
sensor_ds_cy['date'] = pd.to_datetime(sensor_ds_cy['date'], format='%Y-%m-%d')

sensor_ds_cy = sensor_ds_cy.query("date >= '2022-06-01'")
sensor_ds_cy['year_month'] = pd.to_datetime(sensor_ds_cy['date']).dt.to_period('M')
sensor_ds_cy.head(5)

In [None]:
sensor_ds_dt_now = sensor_ds_cy.groupby(['date'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

sensor_ds_dt_month = sensor_ds_cy.query("date >= '2022-06-01'")
sensor_ds_dt_month.info()

### Combine population, capacity, planned events

In [None]:
#Need population total for City of Melbourne, without the decription
ds_pop = ds_pop.drop(['geography'], axis=1)

In [None]:
#Clue venues and events
ds_cve = clue_venues_capw.groupby(['year'],as_index=False).agg({'capacity': 'sum'})

#keep dataset with covid years for use
ds_cve_wc = ds_cve.copy()

print(ds_cve.info())
ds_cve.tail(10).T

In [None]:
ds_pop.rename(columns={"value": "population"}
               ,inplace = True)
print(ds_pop.year.unique())
ds_cve_pop = pd.merge(ds_pop, ds_cve,  on=['year'], how='outer').sort_values(by='year')

print(ds_cve_pop.tail(10).T)

In [None]:
ds_pt = sensor_traffic.groupby(['year'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

#keep dataset with covid years for use
ds_pt_wc = ds_pt.copy()

#merge datasets
ds_cve_pop_pt = pd.merge(ds_cve_pop, ds_pt, on=['year'], how='outer')

#reset 2020-2022 values for capacity and traffic
ds_cve_pop_pt.at[8,'capacity']=np.NaN
ds_cve_pop_pt.at[8,'hourly_counts']=np.NaN
ds_cve_pop_pt.at[8,'day_counts']=np.NaN
ds_cve_pop_pt.at[8,'night_counts']=np.NaN

ds_cve_pop_pt.at[9,'capacity']=np.NaN
ds_cve_pop_pt.at[9,'hourly_counts']=np.NaN
ds_cve_pop_pt.at[9,'day_counts']=np.NaN
ds_cve_pop_pt.at[9,'night_counts']=np.NaN

ds_cve_pop_pt.at[10,'capacity']=np.NaN
ds_cve_pop_pt.at[10,'hourly_counts']=np.NaN
ds_cve_pop_pt.at[10,'day_counts']=np.NaN
ds_cve_pop_pt.at[10,'night_counts']=np.NaN
ds_cve_pop_pt.at[11,'capacity']=np.NaN

#Sort data for use in projections, missing values will be imputed later
rs_cve_pop_pt  = ds_cve_pop_pt.sort_values(by=['year'], ascending=True)

#The combined dataset will be used for projections
print(ds_cve_pop_pt.head(20))


In [None]:
#keep copy with Covid years
rs_cve_pop_pt_wc = rs_cve_pop_pt.copy()

rs_wc = rs_cve_pop_pt_wc.query('year>=2020 and year<=2022')

<div class="usecase-section-header">Analysis and Statistics</div>

## Entertainment location venue seating and patron capacity

Map the number of seats or number of patrons from the CLUE survey responses from venues including bars, pubs and taverns and cafes, bistros and restaurants. The capacity measure is a total of the types of venues.

In [None]:
#group data for latest survey response
clue_venues_ds = clue_venues_y1.groupby(['census_year', 'clue_small_area','block_id'
                                         ,'lon','lat','category_tag','description_tag'],as_index=False).agg(
    {'number_of_patrons': 'sum','number_of_seats':'sum','capacity':'sum'})

clue_venues_ds = clue_venues_ds.sort_values(by=['capacity'], ascending=False)
clue_venues_ds.head(5).T

In [None]:
# Display the choropleth map
fig = px.choropleth_mapbox(

    clue_venues_ds, #dataset
    geojson=clueblocks, #CLUE Block spatial data

    locations='block_id',
    color='capacity',
    color_continuous_scale='sunset', #colour scale ylgn / sunset / geyser
    range_color=(0, df_btp_capacity['number_of_patrons'].max()), #range for the colour scale

    featureidkey="properties.block_id",
    mapbox_style="carto-positron", #map style
    zoom=11.75, #zoom level

    center = {"lat": -37.81216592937499, "lon": 144.961812290625}, # set the map centre coordinates on Melbourne
    opacity=0.7,

    hover_name='clue_small_area', #title of the pop up box
    hover_data={'census_year':True, 'block_id':True, 'number_of_patrons':True,
                'number_of_seats':True, 'capacity':True, 'description_tag': True,
                'lon':False, 'lat':False, 'category_tag':True
               }, #values to display in the popup box
    labels={'number_of_patrons':'Number of Patrons','block_id':'Block Id',
            'number_of_seats':'Number of Seats',
            'capacity':'Capacity','census_year':'Census Year',
            'category_tag':'Category','description_tag':'Description'
           },
    title='Venue Capacity and Planned Activity and Works', #Title for plot
    width=950, height=800 #dimensions of plot in pixels

 )

#show year 3
fig3 = px.scatter_mapbox(

    clue_venues_y3, lat="lat", lon="lon",
    opacity=0.8,
    hover_name='clue_small_area', # the title of the hover pop up box
    hover_data={'census_year':True,'block_id':True,'number_of_patrons':True,
                'number_of_seats':True, 'capacity' : True, 'description_tag': True,
                'lat':False,'lon':False}, #values to display in the popup box
    color_discrete_sequence=['blue'],
    labels={'capacity':'Capacity','block_id':'Block Id','description_tag':'Description',
            'census_year':'Census Year', 'number_of_patrons': 'Number of Patrons',
            'number_of_seats':'Number of Seats'
           }, #labels

)

#show year 2
fig2 = px.scatter_mapbox(

    clue_venues_y2, lat="lat", lon="lon",
    opacity=0.7,
    hover_name='clue_small_area', # the title of the hover pop up box
    hover_data={'census_year':True,'block_id':True,'number_of_patrons':True,
                'number_of_seats':True, 'capacity' : True, 'description_tag': True,
                'lat':False,'lon':False}, #values to display in the popup box
    color_discrete_sequence=['cyan'],
    labels={'capacity':'Capacity','block_id':'Block Id','description_tag':'Description',
            'census_year':'Census Year', 'number_of_patrons': 'Number of Patrons',
            'number_of_seats':'Number of Seats'
           }, #labels

)

#show year 1
fig1 = px.scatter_mapbox(

    clue_venues_y1, lat="lat", lon="lon",
    opacity=0.75,
    hover_name='clue_small_area', #title of the pop up box
    hover_data={'census_year':True,'block_id':True,'number_of_patrons':True,
                'number_of_seats':True, 'capacity' : True, 'description_tag': True,
                'lat':False,'lon':False}, #values to display in the popup box
    color_discrete_sequence=['purple'],
    size_max=20, zoom=10,
    labels={'capacity':'Capacity','block_id':'Block Id','description_tag':'Description',
            'census_year':'Census Year', 'number_of_patrons': 'Number of Patrons',
            'number_of_seats':'Number of Seats'
           }, #labels

)

#capw
fig4 = px.choropleth_mapbox(

    clue_venues_capw, #dataset
    geojson=capw,

    locations='activity_id',

    featureidkey="properties.activity_id", #polygon identifier from the GeoJSON data

    hover_name='small_area', # the title of the hover pop up box
    hover_data={'activity_id':False, 'classification':True,
                'start_year':True,'end_year':True, 'source_id': True}, #values to display in the popup box

    #defines labels
    labels={'source_id':'Source_Id', 'classification':'Classification',
            'start_year':'Start Year',
            'end_year':'End Year'}

 )



#differentiate recent years for interest
fig.add_trace(fig3.data[0])
fig.add_trace(fig2.data[0])
fig.add_trace(fig1.data[0])
fig.add_trace(fig4.data[0])
fig.update_geos(fitbounds="locations", visible=True)

fig.show()


### What areas have the maximum number of capacity?

Assess venue capacity to accommodate both number of seats and number of patrons, based on CLUE survey responses for 2021.

From the map above and the chart below, we can see the entertainment venues with maximum capacity are in Melbourne (Remainder), Kensington, East Melbourne, Southbank, Docklands, and Melbourne (CBD). This is venues that can hold large number of people.

The largest venues classified as pub, tavern or bar, are in Docklands.

In [None]:
#group
df = clue_venues_y1.groupby(['small_area_tag','category_tag'],as_index=False).agg(
    {'capacity':'max'})

#filter values
df = df.query("capacity > 0")

#sort
df= df.sort_values(by=['capacity'], ascending=True)

#plot
fig = px.bar(df, y='small_area_tag', x='capacity', orientation='h',
             color='category_tag', title="Capacity - Large Venue by area")
fig.show()

### What areas have the highest total number of seats or patrons?

From the chart below, we can see the entertainment venues with highest total capacity are in Melbourne (CBD), Southbank, Docklands, and Carlton. This is total capacity across all venues per area. Please note, that the cafe restaurants and bistro datasets contain supermarkets, florists and variety sub-types of venues.

In [None]:
#group
df = clue_venues_y1.groupby(['small_area_tag','category_tag'],as_index=False).agg(
    {'capacity':'sum', 'trading_name':'count'})

#filter values
df = df.query("capacity > 0")

#sort
df= df.sort_values(by=['capacity','trading_name'], ascending=True)

#plot
fig = px.bar(df, y='small_area_tag', x='capacity',
             color='category_tag', title="Total Capacity - Across Venues per Area")
fig.show()

### What types of seating do these venues have?

The most number of total Indoor and Outdoor seats are in Melbourne (CBD), folowed by Docklands and Southbank.

In [None]:
#group
df = clue_venues_y1.groupby(['small_area_tag','seating_type','category_tag'],as_index=False).agg(
    {'capacity':'sum','trading_name':'count'})

#filter values
df = df.query("capacity > 0")

#sort
df= df.sort_values(by=['capacity'], ascending=True)

#plot
fig = px.bar(df, y='small_area_tag', x='capacity',
             color='seating_type', title="Total Capacity by Seating Type - Across Venues per Area")
fig.show()

### What type of Events are planned?

Many planned events were provided for 2022, this will continue in 2023. The most events are for Melbourne (CBD) and Carlton.

In [None]:
#group
df = df_capw.groupby(['start_year','end_year','small_area','classification','status'],as_index=False).agg(
    {'source_id': 'count'})

df.rename(columns={"source_id":"num_events"},inplace = True)
df

#sort
df= df.sort_values(by=['end_year'], ascending=True)

#plot
fig = px.bar(df, x=['small_area','classification'], y='num_events',
             color='end_year', title="Planned Activity - End Year per Area")
fig.show()

#### The charts below shows for each district, how seats and patrons are related to planned activity and works areas

In [None]:
def summarize_within(input_gdf, input_summary_gdf, in_fields, out_fields = None, aggfunc='mean'):
    '''
    Overlays a polygon layer with another layer to calculate attribute field statistics about those features (input_summary_gdf) within the polygons (input_gdf).

    Parameters:
        input_gdf: Geodataframe of the polygons in which features will be summarized by.
        input_summary_gdf: Geodataframe of features that will be summarized
        in_fields: name of the fields (in input_summary_gdf) that will be summarized
        out_fields: name that the fields will have after they're summarized
        aggfunc: function that will be used to summarize

    Returns:
        A geodataframe with 'input_gdf' polygons and the attributes of 'input_summary_gdf' summarized by each polygon.

    '''
    input_gdf = input_gdf.copy()
    input_summary_gdf = input_summary_gdf.copy()
    print(input_summary_gdf.columns)
    if out_fields == None:
        out_fields = in_fields
    #Merges the dwelling points with the input_polygons. A new column "index right" is created. It indicates in what cell the property is within.
    merged = gpd.sjoin(input_summary_gdf, input_gdf, how='left')
    #Now lets count how many properties are within each cell
    dissolve = merged.dissolve(by="index_right", aggfunc=aggfunc) #Dissolve (looks like groupby) by the cell index
    for in_field, out_field in zip(in_fields, out_fields):
        input_gdf.loc[dissolve.index, out_field] = dissolve[in_field].values #Putting number of properties in input_polygons gdf

    return input_gdf.round(2)


In [None]:
gdf_capw = gpd.GeoDataFrame(df_capw_all, geometry = df_capw_all['geometry'].apply(lambda wkt: loads(wkt)))

df_crb_y3 = df_crb[df_crb['census_year']==2020]
df_crb_y3_gdf = gpd.GeoDataFrame(df_crb_y3, geometry = df_crb_y3[['longitude','latitude']].apply(lambda coord : Point(coord[0], coord[1]), axis=1))
gdf_capw['Number of seats'] = summarize_within(gdf_capw, df_crb_y3_gdf, ['number_of_seats'], aggfunc='sum').dropna(subset=['number_of_seats'])['number_of_seats']
gdf_capw['Number of seats'] = gdf_capw['Number of seats'].fillna(0)

df_btp_capacity_y3 = df_btp_capacity[df_btp_capacity['census_year']==2020]
df_btp_capacity_y3_gdf = gpd.GeoDataFrame(df_btp_capacity_y3, geometry = df_btp_capacity_y3[['longitude','latitude']].apply(lambda coord : Point(coord[0], coord[1]), axis=1))

gdf_capw['Number of patrons'] = summarize_within(gdf_capw, df_btp_capacity_y3_gdf, ['number_of_patrons'], aggfunc='sum').dropna(subset=['number_of_patrons'])['number_of_patrons']
gdf_capw['Number of patrons'] = gdf_capw['Number of patrons'].fillna(0)


In [None]:
def plot_map(gdf, col1, col2, col3, title):
    fig = go.Figure(go.Choroplethmapbox(geojson=gdf.__geo_interface__, locations=gdf.index, z=gdf[col1],
                                        colorscale="sunset", zmin=gdf[col1].min(), zmax=gdf[col1].max(),
                                        marker_opacity=1, marker_line_width=0, ))



    x,y = box(*gdf.total_bounds).centroid.xy
    #print(gdf[col1].max())


    fig.update_layout(mapbox_style="stamen-terrain", mapbox_center = {"lat": y[0], "lon": x[0]}, mapbox_zoom=11.5)

    matter_r= [[0.0, '#2f0f3d'], #cmocean colorscale
            [0.1, '#4f1552'],
            [0.2, '#72195f'],
            [0.3, '#931f63'],
            [0.4, '#b32e5e'],
            [0.5, '#cf4456'],
            [0.6, '#e26152'],
            [0.7, '#ee845d'],
            [0.8, '#f5a672'],
            [0.9, '#faca8f'],
            [1.0, '#fdedb0']]



    button1 = dict(method= 'update',
                label=col1,
                args=[
                        {"z": [gdf[col1]],
                        "zmax":[gdf[col1].max()],
                        "zmin":[gdf[col1].min()]

                        }, #dict for fig.data[0] updates
                        {"coloraxis.colorscale":"Viridis" } #dict for  layout attribute update
                    ])

    button2 = dict(method= 'update',
                label=col2,
                args=[
                    {"z": [gdf[col2]],
                    "zmax":[gdf[col2].max()],
                    "zmin":[gdf[col2].min()]


                    },
                    {"coloraxis.colorscale": matter_r} #update layout attribute
            ])

    button3 = dict(method= 'update',
                label=col3,
                args=[
                    {"z": [gdf[col3]],
                    "zmax":[gdf[col3].max()],
                    "zmin":[gdf[col3].min()]


                    },
                    {"coloraxis.colorscale": matter_r} #update layout attribute
            ])


    fig.update_layout(updatemenus=[dict(active=0,
                                        buttons= [button1, button2, button3])]
                                        )
    fig.update_layout(title_text = title, title_x=0.5)
    return fig

#gdf_capw = gpd.GeoDataFrame(df_capw, geometry='geometry')
plot_map(gdf_capw, 'start_year', 'end_year', 'Number of seats', 'Planned activity and works')


In [None]:
results_seats = []
for small_area in tqdm(df_crb_y3_gdf['clue_small_area'].unique()):
    intersection_mask = df_crb_y3_gdf[df_crb_y3_gdf['clue_small_area']==small_area].intersects(unary_union(gdf_capw['geometry']))
    seats_within_apw = df_crb_y3_gdf[df_crb_y3_gdf['clue_small_area']==small_area][intersection_mask]['number_of_seats'].sum()
    total_seats = df_crb_y3_gdf[df_crb_y3_gdf['clue_small_area']==small_area]['number_of_seats'].sum()
    results_seats.append({"clue_small_area":small_area, 'number_of_seats':seats_within_apw, 'percentage_of_seats':(seats_within_apw/total_seats)*100})
results_seats = pd.DataFrame(results_seats)

In [None]:
results_seats = results_seats.sort_values(by=['number_of_seats'], ascending=False)


fig = go.Figure()
fig.add_trace(
    go.Bar(x =results_seats['clue_small_area'], y=results_seats['number_of_seats'])
)

fig.update_layout(title_text = 'Number of seats of busineses located within planned activity and works area', title_x=0.5)
fig.show()

In [None]:
results_seats = results_seats.sort_values(by=['percentage_of_seats'], ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x =results_seats['clue_small_area'], y=results_seats['percentage_of_seats'])
)

fig.update_layout(title_text = 'Percentage of seats located within planned activity and works area', title_x=0.5)
fig.show()

In [None]:
results_patrons = []
for small_area in tqdm(df_btp_capacity_y3_gdf['clue_small_area'].unique()):
    intersection_mask = df_btp_capacity_y3_gdf[df_btp_capacity_y3_gdf['clue_small_area']==small_area].intersects(unary_union(gdf_capw['geometry']))
    patrons_within_apw = df_btp_capacity_y3_gdf[df_btp_capacity_y3_gdf['clue_small_area']==small_area][intersection_mask]['number_of_patrons'].sum()
    total_patrons = df_btp_capacity_y3_gdf[df_btp_capacity_y3_gdf['clue_small_area']==small_area]['number_of_patrons'].sum()
    results_patrons.append({"clue_small_area":small_area, 'number_of_patrons':patrons_within_apw, 'percentage_of_patrons':(patrons_within_apw/total_patrons)*100})

results_patrons = pd.DataFrame(results_patrons)

In [None]:
results_patrons = results_patrons.sort_values(by=['number_of_patrons'], ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x =results_patrons['clue_small_area'], y=results_patrons['number_of_patrons'])
)

fig.update_layout(title_text = 'Patrons capacity of busineses located within planned activity and works area', title_x=0.5)
fig.show()

In [None]:
results_patrons = results_patrons.sort_values(by=['percentage_of_patrons'], ascending=False)

fig = go.Figure()
fig.add_trace(
    go.Bar(x =results_patrons['clue_small_area'], y=results_patrons['percentage_of_patrons'])
)

fig.update_layout(title_text = 'Percentage of patrons capacity located within planned activity and works area', title_x=0.5)
fig.show()

<div class="usecase-section-header">Projections</div>

In this section we are trying to project the return of pedestrian traffic to the city. The reasoning is that if pedestrian traffic increases it is an indicator of people returning to the city. The time of the day, and where that pedestrian activity occurs is also an indicator of whether the interaction is associated with work or for leisure activities.

Specifically we can associate evening or night time activity for entertainment, either to venues such as bars, pubs, taverns, and restaurants, or for events. Apart from night time traffic, we will use the small area population forecast, and venue capacity, to project growth areas.

In [None]:
#group
df = sensor_ds_year.groupby(['year','when'],as_index=False).agg(
    {'hourly_counts':'mean','day_counts':'mean','night_counts':'mean'})

#sort
df= df.sort_values(by=['year'], ascending=False)

#plot
fig = px.line(df, x='year', y='hourly_counts', hover_data=["day_counts","night_counts"],
             color='when', title="Average Pedestrian Traffic By Year, By Day and Night")
fig.show()

In the chart above, as expectd we can see a marked difference in the pedestrian traffic in 2020. The day and night traffic have increased in 2022.

In [None]:
#group
df = sensor_ds_hod.groupby(['time','when'],as_index=False).agg(
    {'hourly_counts':'mean','day_counts':'mean','night_counts':'mean'})

#sort
df= df.sort_values(by=['hourly_counts'], ascending=True)

#plot
fig = px.bar(df, x='time', y='hourly_counts', hover_data=["day_counts","night_counts"],
             color='when', title="Average Pedestrian Traffic By Year, By Time")
fig.show()

Will examine the pedestrian traffic before Covid, as we can see in the charts above, there is a sharp drop in traffic in 2020. The years 2020-2022 are removed in the dataset used for the next step.

### Data Preparation

Confirm the data to be used has a linear trend, and remove outliers. Estimate missing values for years based on history.

Pedestrian traffic data trends upward once the 2020-2022 year totals are removed.

In [None]:
df =  rs_cve_pop_pt
fig = px.scatter(df,  x="year", y=["hourly_counts","day_counts","night_counts"]
                 ,  trendline="ols", title = "Pedestrian Traffic By Year"
                )
fig.show()

In [None]:
#Impute missing values for pedestrian traffic
df =  rs_cve_pop_pt.query('year < 2020')

#hourly_counts
y = df['hourly_counts']
x = np.array(df['year']).reshape((-1, 1))
regressor = LinearRegression()
regressor.fit(x, y)

#missing year values
mcp = ds_cve_pop_pt
mv = np.array(mcp['year']).reshape((-1, 1))
y_pred = regressor.predict(mv)

#Add estimated value for years with missing
mcp['est_hourly_counts'] = y_pred.tolist()


#day_counts
y = df['day_counts']
x = np.array(df['year']).reshape((-1, 1))
regressor = LinearRegression()
regressor.fit(x, y)

#missing year values
mcp = ds_cve_pop_pt
mv = np.array(mcp['year']).reshape((-1, 1))
y_pred = regressor.predict(mv)

#Add estimated value for years with missing
mcp['est_day_counts'] = y_pred.tolist()


#night_counts
y = df['night_counts']
x = np.array(df['year']).reshape((-1, 1))
regressor = LinearRegression()
regressor.fit(x, y)

#missing year values
mcp = ds_cve_pop_pt
mv = np.array(mcp['year']).reshape((-1, 1))
y_pred = regressor.predict(mv)

#Add estimated value for years with missing
mcp['est_night_counts'] = y_pred.tolist()



Population data has no outliers, but we need to impute values for past years based on the trend.

In [None]:
df =  ds_pop
fig = px.scatter(df,  x="year", y=["population"]
                 ,  trendline="ols", title = "Population By Year"
                )
fig.show()

In [None]:
#Impute missing values for population

df =  ds_pop
y = df['population']
x = np.array(df['year']).reshape((-1, 1))
regressor = LinearRegression()
regressor.fit(x, y)

#missing population year values
mcp = ds_cve_pop_pt
mv = np.array(mcp['year']).reshape((-1, 1))
y_pred = regressor.predict(mv)

#Add estimated value for years with missing
mcp['est_population'] = y_pred.tolist()


In [None]:
#view imputed values for population and traffic
ds_cve_pop_pt

For capacity data the downwards trend for 2021 - 2023 is removed.

In [None]:
#Removed Covid impacted years from here

#identify outliers 2020 and 2021 - peak Covid impact
out_values = [2020, 2021, 2022, 2023]

df =  ds_cve.copy()

#drop any rows for outlier values in the year column
df = df[df.year.isin(out_values) == False]

fig = px.scatter(df,  x="year", y=["capacity"]
                 ,  trendline="ols", title = "Capacity By Year"
                )
fig.show()

In [None]:
#Impute missing values for capacity
df =  ds_cve.copy()

y = df['capacity']
x = np.array(df['year']).reshape((-1, 1))
regressor = LinearRegression()
regressor.fit(x, y)

#missing capacity year values
mcp = ds_cve_pop_pt
mv = np.array(mcp['year']).reshape((-1, 1))
y_pred = regressor.predict(mv)

#Add estimated value for years with missing
mcp['est_capacity'] = y_pred.tolist()

In [None]:
mcp

In [None]:
#Combine data set across columns, replace missings with estimates
ds_cve_pop_pt['capacity'] = ds_cve_pop_pt[['capacity', 'est_capacity']].bfill(axis=1).iloc[:, 0]
ds_cve_pop_pt['population'] = ds_cve_pop_pt[['population', 'est_population']].bfill(axis=1).iloc[:, 0]
ds_cve_pop_pt['hourly_counts'] = ds_cve_pop_pt[['hourly_counts', 'est_hourly_counts']].bfill(axis=1).iloc[:, 0]
ds_cve_pop_pt['night_counts'] = ds_cve_pop_pt[['night_counts', 'est_night_counts']].bfill(axis=1).iloc[:, 0]
ds_cve_pop_pt['day_counts'] = ds_cve_pop_pt[['day_counts', 'est_day_counts']].bfill(axis=1).iloc[:, 0]

We now have estimations and predictions based on non-Covid values. These can be plotted as below.

In [None]:
df = ds_cve_pop_pt.sort_values(by=['year'], ascending=True)
df.tail(3).T

In [None]:
df =  ds_cve_pop_pt.copy()

#drop any rows for outlier values in the year column
df = df[df.year.isin(out_values) == False]

df =  df.sort_values(by=['year'], ascending=True)

fig = px.line(df,  x="year", y=["capacity","population","hourly_counts"]
                 , title = "Projection By Year (without Covid)"
                )
fig.show()

We will add the Covid component back in now to focus on Covid and recovery, and predict recovery for 2023.

In [None]:
#sort
df= sensor_ds_dt.sort_values(by=['date'], ascending=True)
print(df.head(10))


In [None]:
fig = px.line(df,  x="date", y=["day_counts","night_counts","hourly_counts"]
                 , title = "Projection By Year (with Covid traffic values)"
                )
fig.show()


Will focus on the past six months, as the growth seems to have flattened out looking at the chart above. There seems to be steady fluctuation looking at date alone.

In [None]:
df = sensor_ds_dt_month.groupby(['month_num'],as_index=False).agg(
    {'hourly_counts': 'mean','day_counts':'mean','night_counts':'mean'})

fig = px.scatter(df,  x="month_num", y=["day_counts","night_counts","hourly_counts"]
                 ,  trendline="ols", title = "Pedestrian Traffic By Date - Jun 2022 onwards"
                )
fig.show()

<div class="usecase-section-header">Congratulations!</div>

You have successfully used Melbourne Open Data to analyse the CLUE survey response data about the entertainment venues in and around the City of Melbourne!


For next steps please explore the City of Melbourne Open Data playground, such as stepping through other the use cases on pedestrain traffic and CLUE datasets.


<div class="usecase-section-header">References</div>

City of Melbourne Open Data Team, 2016 - 2022,'Bar, tavern, pub patron capacity 2020', City of Melbourne, date retrieved 26 Nov 2022, <https://data.melbourne.vic.gov.au/explore/dataset/bars-and-pubs-with-patron-capacity/information/>

City of Melbourne Open Data Team, 2015 - 2022,'Cafe, restaurant, bistro seats 2020', City of Melbourne, date retrieved 26 Nov 2022, <https://data.melbourne.vic.gov.au/explore/dataset/cafes-and-restaurants-with-seating-capacity/information/>

City of Melbourne Open Data Team, 2021 - 2022,'City Activities and Planned Works', City of Melbourne, date retrieved 26 Nov 2022, <https://data.melbourne.vic.gov.au/explore/dataset/city-activities-and-planned-works/information/?disjunctive.classification&disjunctive.small_area>

City of Melbourne Open Data Team, 2014 - 2021,'Pedestrian Counting System - Monthly (counts per hour)', City of Melbourne, date retrieved 03 Dec 2022, <https://melbournetestbed.opendatasoft.com/explore/dataset/pedestrian-counting-system-monthly-counts-per-hour/information/>

City of Melbourne Open Data Team, 2018 - 2021,'Pedestrian Counting System - Sensor Locations', City of Melbourne, date retrieved 03 Dec 2022, <https://data.melbourne.vic.gov.au/explore/dataset/pedestrian-counting-system-sensor-locations/information/>

City of Melbourne Open Data Team, 2021 - 2022,'City of Melbourne Population Forecasts by Small Area 2021-2041', City of Melbourne, date retrieved 15 Dec 2022, <https://data.melbourne.vic.gov.au/explore/dataset/city-of-melbourne-population-forecasts-by-small-area-2020-2040/information/>

*Edited Te' Claire 2023*