<p style="font-family: helvetica,arial,sans-serif; font-size:2.0em;color:white; background-color: black;
          padding: 16px">&emsp;<b>Small Area Population Growth & Transportation Needs Analysis</b></p>
    
<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #DDDDDD; 
          text-align:justify; padding: 10px">&emsp;<b>Authored by: </b> Mick Wiedermann and Angie Hollingworth</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black; 
          text-align:right; padding: 10px"><b>Duration:</b> 90 mins&emsp;</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #DDDDDD; 
          text-align:justify; padding: 10px">&emsp;<b>Level: </b>Intermediate&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;<b>Pre-requisite Skills:</b> Python</p>

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Scenario</b>

<p>
    As a city planner, City of Melbourne investor, or business owner, knowing where potential growth hotspots are projected to develop helps to better plan for meeting future needs and identify opportunities.
    <ul>  
        <li>As a city planner, I want to identify which routes I should prioritise for public transport upgrades and which for active transport upgrades.</li>  
        <li>As a business owner, I need to know which areas could have a greater demand for my goods or services when planning to establish a new business location.</li>   
        <li>As an investor, I'd like to know which suburbs are set to grow more rapidly than the average so I can minimise my risk.</li>  
    </ul>
</p>


<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>What this Use Case will teach you</b>

At the end of this use case you will:
- {point1}
- {point2}
- {point3}

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>{Introduction/Background or History relating to problem}</b>

{Keep it concise. We're not after "War and Peace" but enough background information to inform the reader on the rationale for solving this problem or background non-technical information that helps explain the approach.}

### {Sub-Section Title}

{Sub-Section blurb}
{Use Links for references and be sure to acknowledge your sources and any attributions.}

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px"><b>&emsp;Which Melbourne Open Data should I use?</b>

To begin we shall first import the necessary libraries to support our [exploratory data analysis|visualisation|predictive analytics|reporting] using Melbourne Open data.

The following are core packages required for this exercise:

- {List each non-standard package and briefly explain why you're using it. No need to list commonly used packages like numpy, math, os, time, pandas}


In [3]:
# Importing the data in various forms
import os                   
from sodapy import Socrata
import zipfile as zf
import requests
from io import BytesIO

# Working with the data
import time
from datetime import datetime
import numpy as np
import pandas as pd
import geopandas as gpd

# Creating Visualisations
import plotly.graph_objs as go
import plotly.express as px

<h3>Importing the data</h3> 
<p>
    To connect to the <b>Melbourne Open Data Portal</b> we use the sodapy library, specifying the domain where the data is hosted and an application access token, which can be requested from the City of Melbourne Open Data portal by registering 
<a href="https://data.melbourne.vic.gov.au/signup">here</a>.
</p>
<p>
    For this exercise we will access the domain without an application token. Each dataset in the Melbourne Open Data Portal has a unique identifier which can be used to retrieve the dataset using the sodapy library.
</p>
<p>
    The <b>City of Melbourne Population Forecasts by Small Area 2020-2040</b> dataset unique identifier is <b>sp4r-xphj</b>.
    We will pass this identifier into the sodapy command below to retrieve this data placing it into a Pandas dataframe.
</p>

In [4]:
apptoken = os.environ.get("SODAPY_APPTOKEN") # Anonymous App Token
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken)           # Open Dataset Connection
pop_data_unique_identifier = 'sp4r-xphj'   

population_data = pd.DataFrame.from_dict(client.get_all(pop_data_unique_identifier))



<p>
    The next dataset we need to import is the <b>Victorian Suburbs/Locality Boundaries</b> from data.gov.au, which is freely available for download via the URL below. As the data includes geometric data, we will import it into a GeoPandas dataframe. 
</p>

In [5]:
suburb_geo_data_url = ('https://data.gov.au/geoserver/vic-suburb-locality-boundaries-psma-administrative-'
    + 'boundaries/wfs?request=GetFeature&typeName=ckan_af33dd8c_0534_4e18_9245_fc64440f742e&outputFormat=json')
vic_suburbs = gpd.read_file(suburb_geo_data_url)

<p>Now, we will look at each specific dataset to better understand its structure and how we can use it.</p>
<p>
    Our data requirements from this use case include the following:
    <ul>
        <li>Number of residents per suburb</li>
        <li>Number of residents per year</li>
        <li>Suburb geometry and location</li>
    </ul>
</p>
<p>
    For this exercise, we shall start by examining the first five rows of the <b>City of Melbourne Population Forecasts by Small Area 2020-2040</b> dataset.
</p>

In [6]:
print(f'Number of (rows, columns): {population_data.shape}')
population_data.head()       

Number of (rows, columns): (16989, 5)


Unnamed: 0,geography,year,gender,age,value
0,City of Melbourne,2020,Female,Age 0-4,2683
1,City of Melbourne,2021,Female,Age 0-4,2945
2,City of Melbourne,2022,Female,Age 0-4,3212
3,City of Melbourne,2023,Female,Age 0-4,3515
4,City of Melbourne,2024,Female,Age 0-4,3833


<p>
    Now moving on to the <b>Victorian Suburbs/Locality Boundaries</b> dataset, we only require two columns: <b>vic_loca_2</b>, the suburb name, and <b>geometry</b>, which holds the geographical location and boundaries of each suburb. Let's extract those columns and examine the first few rows of the dataset. 
</p>

In [7]:
vic_suburbs_reduced = vic_suburbs[['vic_loca_2', 'geometry']] # Selecting our required columns
vic_suburbs_reduced.columns = ['suburb', 'geometry']          # Renaming for clarity
print(f'Number of (rows, columns): {vic_suburbs_reduced.shape}')  # Report the suburb data, not the population data
vic_suburbs_reduced.head()


Unnamed: 0,suburb,geometry
0,UNDERBOOL,"MULTIPOLYGON (((141.74552 -35.07229, 141.74552..."
1,NURRAN,"MULTIPOLYGON (((148.66877 -37.39571, 148.66876..."
2,WOORNDOO,"MULTIPOLYGON (((142.92288 -37.97886, 142.90449..."
3,DEPTFORD,"MULTIPOLYGON (((147.82336 -37.66001, 147.82313..."
4,YANAC,"MULTIPOLYGON (((141.27978 -35.99859, 141.27989..."


<p>
    Good.
</p>

<h3>Filtering Our Datasets</h3>

<p>
    As we can see in the data preview above, the population data has many suburbs that are outside of our target area. By passing the dataset to the following function while specifying our year of interest, the function will remove any unnecessary suburbs, clean the data, and return a summarised version containing our suburbs of interest for the year specified.  
</p>

In [18]:
def pop_data_by_year(dataset, year):
    """
    Filters and cleans the Population dataset returning a new pandas dataframe focused on the year passed to the function.
    
    Note that the year must be between 2020 and 2040 inclusive. 
    """
    # Extract the columns of interest into "summary" (a copy, so the source dataset is left untouched).
    summary = dataset[['geography', 'year', 'value']].copy()
    # Convert datatypes before filtering, as the portal may return every field as a string.
    summary['year'] = summary['year'].astype('int')
    summary['value'] = summary['value'].astype('float')
    summary['suburb'] = summary['geography'].astype('str')
    # Extract the data matching the year passed from the summary.
    data = summary[summary['year'] == int(year)]

    # Removing unrequired data: the industrial area and the city-wide total.
    subs_to_delete = ['West Melbourne (Industrial)', 'City of Melbourne']
    data = data[~data['suburb'].isin(subs_to_delete)].copy()

    # Cleaning the suburb names so the two Melbourne small areas are combined.
    data['suburb'] = data['suburb'].replace({
        'Melbourne (CBD)': 'Melbourne',
        'Melbourne (Remainder)': 'Melbourne',
        'West Melbourne (Residential)': 'West Melbourne',
    })

    # Grouping the data by suburb while summing the population values.
    # The summed column is named "population_year" where year represents the year passed.
    data = (data.groupby('suburb', as_index=False)['value'].sum()
                .rename(columns={'value': f'population_{year}'}))

    # Sort data and reset indexes
    data = data.sort_values('suburb').reset_index(drop=True)

    return data

<p>Utilising the function above</p>

In [19]:
pop_data_2030 = pop_data_by_year(population_data, 2030)
pop_data_2030

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: red">&emsp;<b>Example code from this point on from the New Business Location Use Case...</b>

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Retrieve the dataset
data_rm92_h5tq = pd.DataFrame.from_dict(client.get_all('rm92-h5tq'))

print(f'The shape of dataset is {data_rm92_h5tq.shape}.')
print('Below are the first few rows of this dataset:')

# Transpose the DataFrame for easier visual comparison. 
data_rm92_h5tq.head(3).T

The shape of dataset is (10402, 10).
Below are the first few rows of this dataset:


Unnamed: 0,0,1,2
census_year,2020,2020,2020
block_id,1,1,11
pbs_property_id,611394,611395,103957
bps_base_id,611394,611395,103957
street_name,545-557 Flinders Street MELBOURNE VIC 3000,561-581 Flinders Street MELBOURNE VIC 3000,517-537 Flinders Lane MELBOURNE VIC 3000
clue_small_area,Melbourne (CBD),Melbourne (CBD),Melbourne (CBD)
dwelling_type,Residential Apartments,Residential Apartments,Residential Apartments
dwelling_number,196,189,26
x_coordinate,144.9565145,144.9559094,144.9566569
y_coordinate,-37.82097941,-37.82108687,-37.81987147


We can see that there are 10,402 records and 10 fields describing each record.

Each record shows us the number of dwellings for each individual property and the type of dwelling, e.g. House/Townhouse, Residential Apartments, etc.

The location of each property is given using:
- Latitude and Longitude
- CLUE Small Area and Block ID
- Property Id

The Census year that the data was collected is also shown.

For this dataset and others we will restrict our analysis to the 2020 CLUE Census and summarise the data to CLUE Block level.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Summarising Residential Dwelling data</b>

We want to plot the density of both residential dwellings and employment at city block level rather than a specific property or address. We can use a __[choropleth map](https://en.wikipedia.org/wiki/Choropleth_map)__ to do this.

Let's start by summarising the data at CLUE small area and Block level.

*Note: We include CLUE Small Area as one of our group by fields so we can display the CLUE Small area name in the popup window when you hover over the area on the map.*

We want to summarise the data by summing the number of dwellings across all rows in the same CLUE Block.

The following cell creates a dataframe containing this summary of residential dwellings.

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Cast datatypes to correct type so we can summarise
data_rm92_h5tq[['census_year', 'dwelling_number']] = data_rm92_h5tq[['census_year', 'dwelling_number']].astype(int)
data_rm92_h5tq[['x_coordinate', 'y_coordinate']] = data_rm92_h5tq[['x_coordinate', 'y_coordinate']].astype(float)
data_rm92_h5tq = data_rm92_h5tq.convert_dtypes() # convert remaining to string
data_rm92_h5tq.dtypes

# create the aggregate dataset
groupbyfields = ['block_id','clue_small_area']
aggregatebyfields = {'dwelling_number': ["sum"]}

dwellingsByBlock = pd.DataFrame(data_rm92_h5tq.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))

# DataFrame groupby with agg creates two levels of headings
# so we flatten the headings to make it easier to extract data for plotting
dwellingsByBlock.columns = dwellingsByBlock.columns.map(''.join) # flatten column header
dwellingsByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
dwellingsByBlock.rename(columns={'dwelling_numbersum': 'dwelling_count'}, inplace=True)
dwellingsByBlock.head(5)

Unnamed: 0,block_id,clue_area,dwelling_count
0,1,Melbourne (CBD),385
1,101,West Melbourne (Residential),863
2,103,Melbourne (CBD),638
3,104,Melbourne (CBD),1093
4,105,Melbourne (CBD),1729


<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Visualising Residential Dwelling on a Choropleth Map</b>

We use the __[Plotly Python Open Source Graphing Library](https://plotly.com/python/)__ to generate maps from __[mapbox](https://www.mapbox.com/)__.

Creating a choropleth map requires us to know the geometry(shape) of each CLUE Block area as a collection of latitude and longitude points defining a polygon. This data can be downloaded from the Melbourne Open Data Portal in __[GeoJSON](https://en.wikipedia.org/wiki/GeoJSON)__ format.

We also need to supply the data to be used to highlight the CLUE Blocks and that data must include the same unique identifier for each Block contained in the GeoJSON data set.

Below we extract the Melbourne CLUE Block polygons into a JSON datatype.

**The final line in the cell displays the unique key for each polygon which must also exist in the Residential Dwelling dataset.**

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
from urllib.request import urlopen
import json

geoJSON_Id = 'aia8-ryiq' # Melbourne CLUE Block polygons in GeoJSON format

GeoJSONURL = 'https://'+domain+'/api/geospatial/'+geoJSON_Id+'?method=export&format=GeoJSON'
with urlopen(GeoJSONURL) as response:
    block = json.load(response)
    
block["features"][0]['properties'].keys()
#block
#dwellingsByBlock

dict_keys(['block_id', 'clue_area'])
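
Before plotting, it can help to confirm that the identifiers in the two sources really do line up. The following is a minimal sketch, not part of the original use case: `unmatched_ids`, `toy_block` and `toy_ids` are hypothetical names, and the tiny dictionary only mimics the shape of the **block** GeoJSON. Identifiers are compared as strings on the assumption that GeoJSON properties may hold text while the dataframe holds integers.

```python
def unmatched_ids(geojson, ids, key="block_id"):
    """Return the ids that have no polygon with a matching properties[key]."""
    polygon_ids = {str(f["properties"][key]) for f in geojson["features"]}
    return sorted(str(i) for i in ids if str(i) not in polygon_ids)

# Hypothetical miniature stand-in for the CLUE Block GeoJSON (geometry omitted)
toy_block = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "geometry": None,
     "properties": {"block_id": "1", "clue_area": "Melbourne (CBD)"}},
    {"type": "Feature", "geometry": None,
     "properties": {"block_id": "101", "clue_area": "West Melbourne (Residential)"}},
]}

toy_ids = [1, 101, 999]  # made-up ids; 999 has no polygon in toy_block
missing = unmatched_ids(toy_block, toy_ids)
print(missing)
```

With the real data the same check could be run as `unmatched_ids(block, dwellingsByBlock['block_id'])`; any ids it returns would not be drawn on the map.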

Now, using a single call to the 'choropleth_mapbox' function, we can display an interactive map using the **block** GeoJSON data to define the regions and the **dwellingsByBlock** dataframe to define the summarised data by block.

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Display the choropleth map
fig = px.choropleth_mapbox(dwellingsByBlock, # pass in the summarised dwellings per block
                           geojson=block, # pass in the GeoJSON data defining the CLUE Block polygons
                           locations='block_id', # define the unique identifier for the Blocks from the dataframe
                           color='dwelling_count', # change the colour of the block region according to the dwelling count
                           color_continuous_scale=["#FFFF88", "yellow", "orange", "orange",
                                                   "orange", "darkorange", "red", "darkred"], # define custom colour scale
                           range_color=(0, dwellingsByBlock['dwelling_count'].max()), # set the numeric range for the colour scale
                           featureidkey="properties.block_id", # define the Unique polygon identifier from the GeoJSON data
                           mapbox_style="stamen-toner", # set the visual style of the map
                           zoom=12.15, # set the zoom level
                           center = {"lat": -37.813, "lon": 144.945}, # set the map centre coordinates
                           opacity=0.5, # opacity of the choropleth polygons
                           hover_name='clue_area', # the title of the hover pop up box
                           hover_data={'block_id':True,'dwelling_count':True}, # defines which dataframe fields to display
                                                                               # in the hover popup box
                           labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'}, # defines labels for
                                                                               # the hover popup box
                           title='Residential Dwellings by CLUE Block Id for 2020', # Title for plot
                           width=950, height=800 # dimensions of plot in pixels
                          )
fig.show()

You've successfully used Melbourne CLUE Open Data and Plotly to visualise residential density in the City of Melbourne!<br>
Now zoom in and out on the map above to explore the city and areas of high and low residential density.<br><br>
This is your first step to selecting a suitable location for your new business!

__[You can explore the Residential Density data here](../dataanalysis/eda-clue-residentialdwellings.ipynb)__.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Visualising Residential Density and Cafe or Restaurant Seating</b>

To build our view of cafe venue seating and how it relates to residential density we need to visualise both datasets on the same interactive map view.

We can do this by adding a new layer (or "trace" as it is called in Plotly) to our previous map of residential density.

Let's extract the Melbourne CLUE cafe, restaurant, bistro seats dataset and summarise it so its ready to plot.

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Pull dataset from Melbourne Open Data Portal
data_dyqx_cfn5 = pd.DataFrame.from_dict(client.get_all('dyqx-cfn5')) # Melbourne CLUE Cafe, restaurant, bistro seats

# Cast columns to correct data type
integer_columns = ['census_year', 'block_id', 'property_id', 'base_property_id', 'industry_anzsic4_code', 'number_of_seats']
fp_columns = ['x_coordinate', 'y_coordinate']
data_dyqx_cfn5[integer_columns] = data_dyqx_cfn5[integer_columns].astype(int)
data_dyqx_cfn5[fp_columns] = data_dyqx_cfn5[fp_columns].astype(float)
data_dyqx_cfn5 = data_dyqx_cfn5.convert_dtypes() # convert remaining to string

# Summarise venue seating by location
groupbyfields = ['clue_small_area','block_id','y_coordinate','x_coordinate']
aggregatebyfields = {'number_of_seats': ["sum"]}

seatsByLocn = pd.DataFrame(data_dyqx_cfn5.groupby(groupbyfields, as_index=False).agg(aggregatebyfields))
seatsByLocn.columns = seatsByLocn.columns.map(''.join) # flatten column header
seatsByLocn.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) #rename to match GeoJSON extract
seatsByLocn.rename(columns={'number_of_seatssum': 'number_of_seats'}, inplace=True) #rename to match GeoJSON extract
seatsByLocn['number_of_seats'] = seatsByLocn['number_of_seats'].astype(int)

# Calculate scale for drawing each bubble on scatter map plot
all_data_diffq = (seatsByLocn["number_of_seats"].max() - seatsByLocn["number_of_seats"].min()) / 16
seatsByLocn['scale'] = (seatsByLocn["number_of_seats"] - seatsByLocn["number_of_seats"].min()) / all_data_diffq + 1
seatsByLocn['scale'] = seatsByLocn['scale'].astype(int)+2
seatsByLocn.head(10)

Unnamed: 0,clue_area,block_id,y_coordinate,x_coordinate,number_of_seats,scale
0,Carlton,203,-37.796707,144.965534,51,3
1,Carlton,203,-37.79668,144.9649,42,3
2,Carlton,204,-37.797833,144.965174,50,3
3,Carlton,204,-37.797255,144.965754,120,3
4,Carlton,205,-37.79947,144.964893,96,3
5,Carlton,205,-37.799001,144.964765,80,3
6,Carlton,205,-37.798721,144.965257,41,3
7,Carlton,206,-37.800457,144.966558,51,3
8,Carlton,206,-37.800191,144.966716,140,3
9,Carlton,206,-37.800046,144.966741,115,3


Above we can see our summary dataframe has calculated the total number of seats (indoor and outdoor) at each unique location (latitude and longitude).

Since there is such a wide variance in venue seating across the city we need to scale the size of the bubbles drawn on the map to just a few (16) distinct sizes.

We set the lowest scale to 3 to ensure even the smallest venue's bubble is large enough when one zooms in at block level.
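
The scaling arithmetic can be seen in isolation with plain Python. This is a sketch under stated assumptions: `bubble_scale` is a hypothetical helper that mirrors the two `scale` lines in the cell above, and the seat counts are made up for illustration.

```python
def bubble_scale(values, buckets=16, smallest=3):
    """Map raw seat counts onto a small set of integer bubble sizes."""
    low, high = min(values), max(values)
    # Width of one bucket; fall back to 1 when every value is identical
    step = (high - low) / buckets or 1
    # Same arithmetic as the cell above: int(offset / step + 1) + 2 when smallest is 3
    return [int((v - low) / step + 1) + (smallest - 1) for v in values]

seats = [1, 50, 400, 1600]  # made-up seat counts
sizes = bubble_scale(seats)
print(sizes)
```

The smallest venue is drawn at size 3, and the single largest venue lands one step above the sixteenth bucket because it sits exactly on the upper boundary.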

The next step is to display both the Choropleth and Scatter maps.
We first draw the choropleth map showing residential density.
We then draw the scatter plot assigning it as a trace (aka "layer") to the existing figure then show both.

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Plot residential density and venue seating
fig = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='block_id', color='dwelling_count',
                           color_continuous_scale=["#FFFF88", "yellow", "orange", "orange",
                                                   "orange", "darkorange", "red", "darkred"],
                           range_color=(0, dwellingsByBlock['dwelling_count'].max()),
                           featureidkey="properties.block_id",
                           mapbox_style="stamen-toner", #"carto-positron",
                           zoom=12.15,
                           center = {"lat": -37.813, "lon": 144.945},
                           opacity=0.5,
                           hover_name='clue_area',
                           hover_data={'block_id':True,'dwelling_count':True},
                           labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'},
                           title='Residential Dwellings Density & Venue Seating (2020)',
                           width=950, height=800
                          )

# Plot of venue seating
fig2 = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="scale",
                        mapbox_style="stamen-toner",
                        zoom=12.15,
                        center = {"lat": -37.813, "lon": 144.945},
                        opacity=0.70,
                        hover_name="clue_area",
                        hover_data={"block_id":True,"scale":False,"number_of_seats":True,"x_coordinate":False,"y_coordinate":False},
                        color_discrete_sequence=['purple'],
                        labels={'number_of_seats':'Number of Seats', 'block_id':'CLUE Block Id'},
                        width=950, height=800)
fig.add_trace(fig2.data[0])
fig.update_geos(fitbounds="locations", visible=False)

fig.show()

You've successfully used Melbourne CLUE Open Data and Plotly to visualise residential density and venue seating in the City of Melbourne in one map!<br>
Now zoom in and out on the map above to explore the city and areas of high residential density but low venue seating.<br><br>
This could be a possible location for your new business!

__[You can explore the Venue Seating data in more detail here](../dataanalysis/eda-clue-venueseats.ipynb)__.

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Building an Interactive Visualisation for New Business Location</b>

In the previous step we saw how we can add a new layer, also called a trace, to an existing mapbox plot in order to visualise both residential density and cafe or restaurant venue seating on the one map.

We now wish to add Employment Density to this visualisation.
Since employment density and residential density both require a choropleth map to visualise data at CLUE Block level, we cannot overlay these two layers at the same time.

We therefore need a way to select the base choropleth map to show either residential density or employment density and then optionally turn on or off the venue seating as an additional scatter map box layer.

To achieve this interactivity we can make use of Plotly express functions to build a drop down menu and button to be overlaid on the map. 

We will require three datasets and associated layers (traces) for this visualisation.

Let's start by extracting our third dataset titled __["Employment per industry for blocks 2020"](https://data.melbourne.vic.gov.au/Business/Employment-per-industry-for-blocks-2020/qnju-it8g)__ and performing some data preparation prior to plotting.

*Note: The ***"Employment per industry for blocks 2020"*** dataset is a summary of employment at CLUE Block level and so we do not need to perform a groupby aggregation on the dataset.*

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Pull dataset from Melbourne Open Data Portal
data_qnju_it8g = pd.DataFrame.from_dict(client.get_all('qnju-it8g')) # Employment per industry for blocks 2020

# Filter out unwanted columns
columnsToKeep = ['clue_small_area','block_id','total_employment_in_block']
employmentByBlock = data_qnju_it8g.filter(columnsToKeep)

# Rename to match GeoJSON extract
employmentByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True)

# Replace all NaNs with zero
employmentByBlock.fillna(value=0,inplace=True)

# Cast columns to correct datatype
employmentByBlock[['block_id','total_employment_in_block']] = employmentByBlock[['block_id','total_employment_in_block']].astype(int)
employmentByBlock = employmentByBlock.convert_dtypes() # convert remaining to string

# Exclude summary total for all of City of Melbourne
employmentByBlock = employmentByBlock[employmentByBlock['block_id'] > 0] 

# Display sample data
employmentByBlock.head(5)

Unnamed: 0,clue_area,block_id,total_employment_in_block
0,Melbourne (CBD),1,764
1,Melbourne (CBD),2,195
2,Melbourne (CBD),4,653
3,Melbourne (CBD),5,5
4,Melbourne (CBD),6,843


Now that we have a dataset showing the total number of employees by CLUE Block, let's visualise it as a choropleth map and overlay venue seating.

In this map visualisation we will use a different map style called "open-street-map" which lets us identify the names of venues close to where the venue seating measures have been reported. **Note that not all venues may have been marked on Open Street Maps.**

Mapbox styles which do not require a Mapbox API token are 'open-street-map', 'white-bg', 'carto-positron', 'carto-darkmatter', 'stamen-terrain', 'stamen-toner', 'stamen-watercolor'. Mapbox styles which do require a Mapbox API token are 'basic', 'streets', 'outdoors', 'light', 'dark', 'satellite', 'satellite-streets'.

**Source:** __[plotly.express.line_mapbox documentation](https://plotly.com/python-api-reference/generated/plotly.express.line_mapbox.html)__

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Plot employment density
fig = px.choropleth_mapbox(employmentByBlock, geojson=block, locations='block_id', color='total_employment_in_block',
                           color_continuous_scale="Blues",
                           range_color=(0, employmentByBlock['total_employment_in_block'].max()),
                           featureidkey="properties.block_id",
                           mapbox_style="open-street-map",
                           zoom=12.15,
                           center = {"lat": -37.813, "lon": 144.945},
                           opacity=0.5,
                           hover_name='clue_area',
                           hover_data={'block_id':True,'total_employment_in_block':True},
                           labels={'total_employment_in_block':'Number of Employees','block_id':'CLUE Block Id'},
                           title='Employment Density & Venue Seating (2020)',
                           width=950, height=800
                          )

# Plot of venue seating
fig2 = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="scale",
                        mapbox_style="stamen-toner",
                        zoom=12.15,
                        center = {"lat": -37.813, "lon": 144.945},
                        opacity=0.70,
                        hover_name="clue_area",
                        hover_data={"block_id":True,"scale":False,"number_of_seats":True,"x_coordinate":False,"y_coordinate":False},
                        color_discrete_sequence=['purple'],
                        labels={'number_of_seats':'Number of Seats', 'block_id':'CLUE Block Id'},
                        width=950, height=800)
fig.add_trace(fig2.data[0])
fig.update_geos(fitbounds="locations", visible=False)

fig.show()

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:white; background-color: black;
          padding: 10px">&emsp;<b>Combining all map layers into one interactive map box visualisation</b>

Let's now build a single map box visualisation using our three datasets.

Our first step is to create a base plotly figure to which we can add each individual map plot as a new layer.

The title of the visualisation and any common parameters can be set using the fig.update_layout() method.

In the cell below we also have defined two custom colorscales, one continuous for the choropleth map and the other discrete for the scatter map plot.

We then create a figure for each dataset and add it as a layer to the base figure using the fig.add_trace() method.

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Define custom colour scale for choropleth (continuous) and scatter (discrete)
custom_continuous_colorscale = [(0, "lightblue"), (0.25, "blue"), (1, "darkblue")]
custom_discrete_colorscale = ['red']

# Create the base figure to which layers(traces) will be added.
fig = go.Figure()

# Set the default style for the map
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(hovermode='closest')
fig.update_layout(mapbox_center_lat=-37.813, mapbox_center_lon=144.945, mapbox_zoom=12.15)
fig.update_layout(width=950, height=800)
fig.update_layout(title='Residential & Employment Density plus Venue Seating (2020)')
fig.update_layout(coloraxis_colorscale=custom_continuous_colorscale)
fig.update_layout(coloraxis_colorbar={'title':'Density'})

# Create the definition for the Residential Dwellings Layer
fig1 = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='block_id', color='dwelling_count',
                           range_color=(0, dwellingsByBlock['dwelling_count'].max()),
                           featureidkey="properties.block_id",
                           hover_name='clue_area',
                           hover_data={'block_id':True,'dwelling_count':True},
                           labels={'dwelling_count':'Number of Dwellings','block_id':'CLUE Block Id'},
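                           opacity=0.5
                          )
# Add this layer to the base figure. Only the trace is copied: layout settings held by fig1,
# such as the range_color limits, are not carried over.
fig.add_trace(fig1.data[0])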

# Create the definition for the Employment Layer
fig2 = px.choropleth_mapbox(employmentByBlock, geojson=block, locations='block_id', color='total_employment_in_block',
                           range_color=(0, employmentByBlock['total_employment_in_block'].max()),
                           featureidkey="properties.block_id",
                           hover_name='clue_area',
                           hover_data={'block_id':True,'total_employment_in_block':True},
                           labels={'total_employment_in_block':'Number of Employees','block_id':'CLUE Block Id'},
                           opacity=0.5
                          )
fig.add_trace(fig2.data[0]) # add this layer to the base figure

# Create the definition for the Venue Seating Layer
fig3 = px.scatter_mapbox(seatsByLocn, lat="y_coordinate", lon="x_coordinate", size="scale",
                        hover_name="clue_area",
                        hover_data={"block_id":True,"scale":False,"number_of_seats":True,"x_coordinate":False,"y_coordinate":False},
                        labels={'number_of_seats':'Number of Seats', 'block_id':'CLUE Block Id'},
                        opacity=0.70, color_discrete_sequence=custom_discrete_colorscale
                        )
fig.add_trace(fig3.data[0]) # add this layer to the base figure
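
To make the continuous colour scale defined above easier to read, the sketch below reports which two colour stops a given value falls between. It is an illustration only: the helper name `bracketing_stops` and the maximum of 400 dwellings are assumptions, not values from the datasets, and the actual colour blending is done by Plotly.

```python
# Colour stops as (position, colour) pairs; positions run from 0 to 1.
custom_continuous_colorscale = [(0, "lightblue"), (0.25, "blue"), (1, "darkblue")]


def bracketing_stops(value, vmin, vmax, colorscale):
    """Return (position, lower colour, upper colour) for a value.

    Hypothetical helper for illustration; it is not part of Plotly.
    """
    span = vmax - vmin
    # Normalise the value onto the 0-1 range of the colour scale, clamping outliers.
    position = 0.0 if span == 0 else min(max((value - vmin) / span, 0.0), 1.0)
    # Walk consecutive pairs of stops until the position falls inside one.
    for lower, upper in zip(colorscale, colorscale[1:]):
        if lower[0] <= position <= upper[0]:
            return position, lower[1], upper[1]
    raise ValueError("colour scale positions must start at 0 and end at 1")


# Assumed maximum, for illustration only.
hypothetical_max = 400
for count in (40, 300):
    print(count, bracketing_stops(count, 0, hypothetical_max, custom_continuous_colorscale))
```

Because the second stop sits at 0.25, values in the lowest quarter of the range use the whole light blue to blue transition, which leaves more colour contrast for low-density blocks.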

Finally, we define buttons and text to appear along the top of the map.

Each button turns on a combination of layers when it is clicked. The layers it turns on are set by the `visible` list in the button's `args`: a list of boolean values, one per map layer, in the order the layers were added to the figure.

For example, clicking the 'Residential Density & Seating' button applies `'visible': [True, False, True]`, which turns on the 1st and 3rd layers. The 1st layer is the Residential Dwelling density choropleth map and the 3rd layer is the Venue Seating scatter map.
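
Hand-written boolean lists are easy to get out of step with the layer order. One possible way to avoid this is to derive each `visible` list from layer names, as in the sketch below. The helper `make_view_button` and the names in `layer_order` are hypothetical and introduced only for illustration; the resulting dictionaries use the same `method`, `label` and `args` keys as the buttons in this use case.

```python
# Layer names in the order their traces were added to the figure (assumed names).
layer_order = ["residential", "employment", "seating"]


def make_view_button(label, shown_layers, layer_order):
    """Build an updatemenus button dict that shows only the named layers."""
    unknown = set(shown_layers) - set(layer_order)
    if unknown:
        raise ValueError(f"Unknown layer(s): {sorted(unknown)}")
    # One boolean per layer, in the same order the traces were added.
    visible = [name in shown_layers for name in layer_order]
    return dict(method="update", label=label, args=[{"visible": visible}])


buttons = [
    make_view_button("Venue Seating only", {"seating"}, layer_order),
    make_view_button("Residential Density & Seating", {"residential", "seating"}, layer_order),
    make_view_button("Employment Density & Seating", {"employment", "seating"}, layer_order),
]

for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

If a fourth layer is added later, only `layer_order` and the sets of shown layers need to change.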

In [None]:
########################################################
# EXAMPLE CODE BELOW FROM NEW BUSINESS LOCATION USE CASE
########################################################
# Turn off all choropleth layers so the initial view matches the first (active) button: venue seating only
fig.update_traces(visible=False, selector=dict(type='choroplethmapbox'))

# Add buttons for selection on plot
buttons = [dict(method='update',
                label='Venue Seating only',  visible=True,
                args=[{'label': 'Venue Seating', 'visible':[False, False, True]}]),
           dict(method='update',
                label='Residential Density & Seating', visible=True,
                args=[{'label': 'Residential Dwelling Density','visible':[True, False, True]}]),
           dict(method='update',
                label='Employment Density & Seating', visible=True,
                args=[{'label': 'Employment Density','visible':[False, True, True]}])
          ]
                   
um_buttons = [{'active':0, 'showactive':True, 'buttons':buttons,
               'direction': 'down', 'xanchor': 'left','yanchor': 'bottom', 'x': 0.71, 'y': 1.01}]
map_annotations = [{'text':'Please select a map view to display', 'x': 1, 'y': 1.1,
                    'showarrow': False, 'font':{'family':'Arial','size':14}}]

fig.update_layout(updatemenus=um_buttons, annotations=map_annotations)

# Display the map
fig.show()

<p style="font-family: helvetica,arial,sans-serif; font-size:1.6em;color:black; background-color: #EEEEEE;
          padding: 10px">&emsp;<b>Congratulations. Our interactive map is now complete!</b>

Now you can use the controls on the map above to explore the City of Melbourne and observe the residential density and employment density of each city block in relation to venue seating capacity.<br><br>

If you would like to extend this interactive map further, please visit the __[City of Melbourne Open Data Site](https://data.melbourne.vic.gov.au/)__ and explore some of the other valuable datasets including:
- __[Off Street Parking](https://data.melbourne.vic.gov.au/Transport/Off-street-car-parking-2020/g9am-cna5)__
- __[Pedestrian Counting System](https://data.melbourne.vic.gov.au/Transport/Pedestrian-Counting-System-Monthly-counts-per-hour/b2ak-trbp)__
- __[Microclimate sensor readings](https://data.melbourne.vic.gov.au/Environment/Microclimate-Sensor-Readings/u4vh-84j8?src=featured_banner)__
