
<div class="usecase-title">Small Area Population Growth & Active Transport Needs Analysis</div>

<div class="usecase-authors"><b>Authored by: </b>Angie Hollingworth and Mick Wiedermann</div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<div class="usecase-section-header">Scenario</div>

- As a future resident of Melbourne, I want to live close to active and/or public transport routes because I prefer not to use my car in and around the city. Where should I live?
- As a city council, we wish to increase the sustainability of our city and reduce the number of motor vehicles coming and going in order to lower emissions. What infrastructure investment will help achieve this goal?
- As a city council, we wish to identify our areas of highest active transport use so we can see where services for our residents could be increased.


<div class="usecase-section-header">Exploratory Data Analysis Objectives</div>

The goals for this analysis (Part A) are: 
- Analyse population growth at the suburb level to quantify the speed of growth of each suburb relative to one another.   
- Analyse the existing active transportation routes’ current demand and access relative to the forecast growth of the population.  
- Identify key areas where active transportation routes could experience higher demand and may therefore require additional infrastructure.

Population Growth & Public Transport Needs Analysis (Part B) will extend this analysis to include public transport: trams, buses, and trains.


<div class="usecase-section-header">Strategic Benefits for the City of Melbourne</div>

This use case and analysis, in conjunction with Part B, can help Melbourne City meet strategic and sustainability goals in the following ways:
- Support discussions with infrastructure-related partners about the location of new or upgraded public and active transportation routes. Reducing the use of motorised vehicles in turn reduces emissions, helping to meet the climate and biodiversity emergency objective.
- Encourage additional purpose-designed bike paths in heavy-use areas, which can remove bicycles from the road and reduce the number of bike-related injuries, helping to meet the safety and well-being objective.
- Identify areas of higher active transport traffic (foot, bicycle, etc.) in comparison with predicted population growth to establish a case for more resources that encourage greater use of active transport paths and/or bike lanes.


<div class="usecase-section-header">Why Inner-City Transport Routes Matter </div>

Melbourne City is the first city in Australia to make a [Voluntary Local Review (VLR) Declaration](https://www.melbourne.vic.gov.au/about-council/vision-goals/Pages/united-nations-sustainable-development-goals.aspx), a United Nations initiative for local and regional governments worldwide to formally commit to, and report their local progress toward, the seventeen Sustainable Development Goals.

By examining the active and public transport routes and usage within Melbourne City, in conjunction with the population growth forecasts, we hope to identify areas with existing and projected increased demand for additional active and public transport routes. 

By ensuring that appropriate sustainable transport options are available and easily accessible, we hope to discourage the use of motorised vehicles within Melbourne City, reducing emissions while creating a more sustainable city.

This will help Melbourne City to achieve two of the UN sustainability goals, namely sustainable cities and communities, and climate action, along with a key strategic objective, the [climate and biodiversity emergency](https://www.melbourne.vic.gov.au/about-council/vision-goals/Pages/council-plan.aspx) objective, which prioritises the reduction of emissions.

<div class="usecase-section-header">Data Requirements</div>

## Melbourne Open Data Datasets
### Population Growth Forecast Data
Our first and arguably the most important dataset for this analysis is the [*City of Melbourne Population Forecasts by Small Area 2021-2041*](https://data.melbourne.vic.gov.au/People/City-of-Melbourne-Population-Forecasts-by-Small-Ar/sp4r-xphj) from Melbourne Open Data which provides population forecasts by single year for 2021 to 2041. Prepared by SGS Economics and Planning (Jan-Jun 2021), forecasts are available for the municipality and small areas, as well as by gender and 5-year age groups.

### Super Tuesday Bike Count data
To understand the current volume of cyclists on bike paths, we can use the [*Annual Bike Counts (Super Tuesday)*](https://data.melbourne.vic.gov.au/Transport/Annual-Bike-Counts-Super-Tuesday-/uyp8-7ii8) dataset. In summary, the dataset contains observed bike counts from sites across the city and is part of Australia’s biggest annual commuter bike path count dataset. Later Super Tuesday datasets include more information about the types of bike path user (walker, bike rider, gender, etc.).

### Bike Path Geospatial data
For this analysis we are looking solely at the active transport routes. For a visual representation of the bike paths/routes, we require geospatial data. The following dataset, [*Bicycle routes, including informal, on-road and off-road routes*](https://data.melbourne.vic.gov.au/Transport/Bicycle-routes-including-informal-on-road-and-off-/24aw-nd3i), contains information about each of the paths along with the geospatial data.

## Other Datasets
### Victorian Suburbs Geospatial Data
In order to visualise our population forecasts as a map overlay, we need the geographical coordinates of the suburbs we'll be examining. For this we'll use the *VIC Suburb/Locality Boundaries - PSMA Administrative Boundaries GeoJSON* dataset from the Australian Government site [data.gov.au](https://data.gov.au/dataset/ds-dga-af33dd8c-0534-4e18-9245-fc64440f742e/distribution/dist-dga-d467c550-fdf0-480f-85ca-79a6a30b700b/details?q=). 


<div class="usecase-section-header">Importing the data</div>

Before importing our datasets, we shall first import the necessary libraries to support our exploratory data analysis and visualisation.

The following are the core packages required for this analysis:

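- sodapy: A Python client for the Socrata Open Data API, used here to retrieve datasets from the Melbourne Open Data Portal.
- GeoPandas: Allows us to plot spatial data and overlay that data on maps.
- Shapely: Provides the geometry objects (such as `Point`) that GeoPandas stores in its geometry column.
- Folium: Renders interactive Leaflet maps that our geospatial layers can be displayed on.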

In [2]:
# For importing the data and using API
from sodapy import Socrata
from urllib.request import urlopen
import os
import zipfile as zf
import requests
from io import BytesIO 

# Working with the data
from shapely.geometry import Polygon, Point
import numpy as np
import pandas as pd
import geopandas as gpd
import json

# Visualisation
from IPython.display import IFrame, display, HTML
import matplotlib.pylab as plt
import seaborn as sns
import warnings
import folium
from folium import plugins
from folium.plugins import HeatMap

# Turn off warnings for report purposes (enable for debugging)
warnings.filterwarnings('ignore')

To connect to the *Melbourne Open Data Portal* we must establish a connection using the sodapy library by specifying a domain (the website domain where the data is hosted) and an application access token, which can be requested from the City of Melbourne Open Data portal by registering [here](https://data.melbourne.vic.gov.au/signup).

For this exercise, we will access the domain without an application token. Each dataset in the Melbourne Open Data Portal has a unique identifier that can be used to retrieve the dataset using the sodapy library.

The *City of Melbourne Population Forecasts by Small Area 2021-2041* dataset unique identifier is *sp4r-xphj*. We will pass this identifier into the sodapy command below to retrieve this data, placing it into a Pandas dataframe.


In [5]:
apptoken = os.environ.get("SODAPY_APPTOKEN") # Anonymous App Token
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken)           # Open Dataset Connection
pop_data_unique_identifier = 'sp4r-xphj'   

# Population Forecast Data
population_data = pd.DataFrame.from_dict(client.get_all(pop_data_unique_identifier))
population_data.tail()




Next, we’ll use the same app token, domain, and client to import the remaining datasets described above from the Melbourne Open Data Portal. 

In [None]:
# Bike Path Geographical Data
bike_geo_data_unique_identifier = 'hmuz-nz6m'
bike_path_data_url = 'https://'+ domain +'/api/geospatial/'+ bike_geo_data_unique_identifier +'?method=export&format=GeoJSON'
with urlopen(bike_path_data_url) as result:
    BIKE_PATHS = json.load(result) 

# Bike Count Data
bike_count_data_unique_identifier = 'uyp8-7ii8'
BIKE_USAGE_COUNT = pd.DataFrame.from_dict(client.get_all(bike_count_data_unique_identifier))

The last dataset we need to import is the *Victorian Suburbs/Locality Boundaries* from the Australian Government site, data.gov.au, which is freely available for download via the URL below. As it includes geometric data, we will import it into a GeoPandas dataframe.

In [None]:
# Suburb Geographical Data
suburb_geo_data_url = ('https://data.gov.au/geoserver/vic-suburb-locality-boundaries-psma-administrative-'
    + 'boundaries/wfs?request=GetFeature&typeName=ckan_af33dd8c_0534_4e18_9245_fc64440f742e&outputFormat=json')

vic_suburb_data = gpd.read_file(suburb_geo_data_url)
VIC_SUBURBS = vic_suburb_data[['vic_loca_2', 'geometry']]
VIC_SUBURBS = VIC_SUBURBS.rename(columns={'vic_loca_2':'suburb'})

VIC_SUBURBS.head()

<div class="usecase-section-header">Cleaning and Preparing Our Data</div>

As we know from reading the data descriptions and dictionaries from each of the datasets, most have data that we don’t require. In this section, we will extract the data we need and join various sets ready for analysis. 

### Population Forecast and Suburb Geospatial Data
Starting with the population and suburb geospatial data, we need to remove suburbs from the geospatial data that are not included in our population forecast data. We also need to match the City of Melbourne suburbs by name, which requires merging certain fields and renaming others.

By passing the population dataset to the following two functions, we can clean the dataset to return only the columns (information) that we require whilst also formatting the column names and data types. The second function will return a new data frame with either the population totals by suburb for the year specified or all years if the boolean parameter is set to true.   

In [None]:
def prepare_pop_data(dataset):
    """
    Filters and cleans the Population dataset returning a new pandas dataframe.
    
        dataset: The City of Melbourne Population Forecasts by Small Area 2021-2041. 
    """
    # Keep only the rows for all genders combined ('Total') and drop the 'Average age' rows
    dataset = dataset.loc[dataset['gender'] == 'Total']
    dataset = dataset.loc[dataset['age'] != 'Average age']
    
    # Extract the columns of interest into "summary" and rename geography.
    summary = dataset[['geography', 'year', 'value', 'age']]
    summary = summary.rename(columns={'geography':'suburb'})
    
    # Convert datatypes
    summary = summary.astype({'year':int, 'value':float, 'suburb':'string'})

    # Consolidating and updating suburb names to match the Geospatial data.
    summary['suburb'] = summary['suburb'].replace(['Melbourne (CBD)', 'Melbourne (Remainder)'], ['Melbourne', 'Melbourne'])
    summary['suburb'] = summary['suburb'].replace(['West Melbourne (Residential)'], ['West Melbourne'])
    
    # Removing unrequired data.  
    summary.drop(summary.index[summary['suburb'] == 'West Melbourne (Industrial)'], inplace=True)
    summary.drop(summary.index[summary['suburb'] == 'City of Melbourne'], inplace=True)
    summary.drop(summary.index[summary['suburb'] == 'Port Melbourne'], inplace=True)
    
    # Sorting the data and resetting the indexes.
    summary.sort_values(['suburb'], inplace = True)
    summary = summary.reset_index(drop=True)
    # N.B. The Port Melbourne population count is very small and doesn't seem to be accurate.
    # It jumps in subsequent years, skewing the data, so we have chosen to exclude
    # Port Melbourne from the analysis.
    # Melbourne (Remainder) and Melbourne (CBD) were combined to match the geospatial data.
    summary = gpd.GeoDataFrame(summary)

    return summary

In [None]:
# Clean the initial dataset to contain only the information we require
POPULATION_DATA = prepare_pop_data(population_data)

POPULATION_DATA.head()
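To see why the renaming step matters, here is a minimal sketch using a tiny made-up table (the numbers are illustrative and not taken from the forecast dataset). Once both Melbourne small areas share one name, a later group-and-sum treats them as a single suburb.

```python
import pandas as pd

# Illustrative values only; not taken from the forecast dataset.
toy = pd.DataFrame({
    "suburb": ["Melbourne (CBD)", "Melbourne (Remainder)", "Carlton"],
    "year": [2021, 2021, 2021],
    "value": [100.0, 50.0, 80.0],
})

# Map both Melbourne small areas onto the single name used by the suburb boundaries.
toy["suburb"] = toy["suburb"].replace(
    {"Melbourne (CBD)": "Melbourne", "Melbourne (Remainder)": "Melbourne"}
)

# Summing by suburb and year now combines the two Melbourne rows into one.
totals = toy.groupby(["suburb", "year"])["value"].sum().reset_index()
print(totals)
```

The same idea applies to the real data: the consolidated names only produce combined totals once the rows are grouped and summed.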

In [None]:
def population_by_year(grouped_data, year, all_years=False):
    """
    Returns a GeoDataFrame of the population by suburb for the year specified, or for all years if all_years is True.
    
        grouped_data: The prepared "POPULATION_DATA" dataset. 
        year:         The desired year for the population totals.
        all_years (bool, optional): If True, will return a summary of all years.
    """
    grouped_data = grouped_data[['suburb', 'year', 'value']]
    
    if all_years:
        grouped_data = grouped_data.groupby(['suburb', 'year'])['value'].sum()
        grouped_data = grouped_data.reset_index()
    else:
        grouped_data = grouped_data.loc[grouped_data['year'] == year]
        grouped_data = grouped_data.groupby(['suburb', 'year'])['value'].sum()
        grouped_data = grouped_data.reset_index()
        grouped_data = grouped_data.rename(columns={'value': str(year)})
        grouped_data = grouped_data.drop(columns=['year'])
        grouped_data = grouped_data.astype({'suburb':'string'})
    
    grouped_data = gpd.GeoDataFrame(grouped_data)
    
    return grouped_data


In [None]:
# Reducing the dataset size with the population_by_year function above.
population_2022 = population_by_year(POPULATION_DATA, 2022)

population_2022

Next, we’ll utilise our reduced dataset above to filter and extract the suburbs of interest from the geospatial data. 

In [None]:
# Extract the suburbs of interest that match the population data into "target_subs".
target_subs = population_2022['suburb'].str.upper()

# Locate the index of the target suburbs and store as a list in "subs"
subs = [VIC_SUBURBS.index[VIC_SUBURBS['suburb']==sub].tolist()[0] for sub in target_subs]

# Remove unwanted rows and keep data in geo dataframe format
CITY_SUBURBS = VIC_SUBURBS.take(list(subs))
CITY_SUBURBS.reset_index(drop=True, inplace = True)
CITY_SUBURBS['suburb'] = CITY_SUBURBS['suburb'].str.title()
CITY_SUBURBS = CITY_SUBURBS.astype({'suburb':'string'}) 

CITY_SUBURBS

This DataFrame now only includes suburbs found in our population dataset and their geospatial data (suburb outline). These two datasets can now be combined and used together in the analysis.  
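The index lookup above takes the first match with `[0]`, which raises an `IndexError` if a forecast suburb has no counterpart in the boundary data. A small check beforehand can make such mismatches easier to spot. The sketch below is one way to do it; `missing_suburbs` is a hypothetical helper and the names are illustrative.

```python
def missing_suburbs(wanted, available):
    """Return the names in `wanted` with no case-insensitive match in `available`."""
    available_upper = {name.upper() for name in available}
    return sorted(name for name in wanted if name.upper() not in available_upper)


# Illustrative names only.
forecast_names = ["Carlton", "Docklands", "West Melbourne (Residential)"]
boundary_names = ["CARLTON", "DOCKLANDS", "WEST MELBOURNE"]

unmatched = missing_suburbs(forecast_names, boundary_names)
print(unmatched)
```

An empty list would mean every forecast suburb can be found in the boundary data; anything else points to a name that still needs renaming or removing.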

### Bike Count and Path Geospatial Data  

A quick preview of our bike data shows that we have far more attributes (columns) than required. Let's extract the columns of interest for the most recent year and remove any NaN fields. Further, we need to update the data types for the extracted columns and combine the latitude and longitude into a single column containing a Point object which we’ll call geometry.  

In [None]:
# View the output from the bike_count dataset that we imported earlier

BIKE_USAGE_COUNT.head(1)

In [None]:
# Reduce the dataset to only include columns needed for the analysis & mapping.
bike_usage_data =  gpd.GeoDataFrame(BIKE_USAGE_COUNT[['latitude','longitude','total','year', 'description']])

# Drop missing data.
bike_usage_data.dropna(inplace=True)   

# Get a list of years that the bike counts were completed. Convert values to an integer data type. 
years = [int(x) for x in bike_usage_data['year'].unique()]

# Extracting the most recent year of data.
bike_usage_data = bike_usage_data.loc[bike_usage_data['year'] == str(max(years))]

# Updating the data types.
degs = ['latitude','longitude']
for col in degs:
    bike_usage_data[col] = bike_usage_data[col].astype(float)
bike_usage_data['total'] = bike_usage_data['total'].astype(int)
bike_usage_data.reset_index(drop=True, inplace = True)

# Combine the longitude & latitude values into a Point object (x = longitude, y = latitude) in a new column for mapping.
df_geometry = [Point(xy) for xy in zip(bike_usage_data['longitude'], bike_usage_data['latitude'])]

# Create a GeoPandas GeoDataFrame with the above cleaned dataset.
BIKE_USAGE_DATA = gpd.GeoDataFrame(bike_usage_data, crs = 4326, geometry = df_geometry)

BIKE_USAGE_DATA.head()

Finally, the bike path data is contained within a dictionary rather than a Pandas DataFrame. In this instance the dataset only contains relevant fields, so we have nothing to remove.  

In [None]:
# Viewing the attributes of the bike path data we imported earlier.  

print(BIKE_PATHS.keys())
BIKE_PATHS['features'][0]['properties'].keys()

<div class="usecase-section-header">Analysing Our Datasets</div>

## Population Growth Analysis
To begin our analysis, we'll first take a look at the population forecast data in five-year intervals. We will utilise our population by year function to build a new data frame and then plot the data as a clustered bar chart.

First, let's confirm the first and last year contained within the population forecast dataset. 

In [None]:
# Select the earliest and latest years for the population forecast in our dataset
start_year = min(POPULATION_DATA['year'])
final_year = max(POPULATION_DATA['year'])

# In this case we expect it to be 2021 and 2041. Let's print to confirm
print(f'Earliest year: {start_year},  Latest year: {final_year}' )

Next, we'll extract a summary for the first year and every fifth year thereafter to create a new dataframe for our clustered bar chart. 

In [None]:
# Extracting each year of interest.
population_start = population_by_year(POPULATION_DATA, start_year)
population_2026 = population_by_year(POPULATION_DATA, 2026)
population_2031 = population_by_year(POPULATION_DATA, 2031)
population_2036 = population_by_year(POPULATION_DATA, 2036)
population_final = population_by_year(POPULATION_DATA, final_year)

# Combining the years into a single dataframe.
population_5y_intervals = population_start.merge(population_2026, left_on='suburb', right_on='suburb')
population_5y_intervals = population_5y_intervals.merge(population_2031, left_on='suburb', right_on='suburb')
population_5y_intervals = population_5y_intervals.merge(population_2036, left_on='suburb', right_on='suburb')
population_5y_intervals = population_5y_intervals.merge(population_final, left_on='suburb', right_on='suburb')

# Converting the dataframe to a GeoPandas dataframe. 
population_5y_intervals = gpd.GeoDataFrame(population_5y_intervals)
population_5y_intervals.head()
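The repeated merges above could also be expressed as a single pivot from long to wide format. This is an alternative sketch on made-up data rather than the method used in this notebook; the column names mirror those of the prepared population data.

```python
import pandas as pd

# Illustrative long-format data: one row per suburb and year.
long_df = pd.DataFrame({
    "suburb": ["Carlton", "Carlton", "Docklands", "Docklands"],
    "year": [2021, 2026, 2021, 2026],
    "value": [10.0, 12.0, 20.0, 26.0],
})

years_of_interest = [2021, 2026]

# Keep only the chosen years, then spread them into one column per year.
wide_df = (
    long_df[long_df["year"].isin(years_of_interest)]
    .pivot_table(index="suburb", columns="year", values="value", aggfunc="sum")
    .reset_index()
)

# Use string labels for the year columns, matching the merged dataframe above.
wide_df.columns = [str(col) for col in wide_df.columns]
print(wide_df)
```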

Now we can generate our clustered bar chart and take our first visual representation of the data.

In [None]:
# Apply the grid style and a colourblind-friendly palette before plotting so that they take effect.
sns.set_style('darkgrid')
sns.set_palette('colorblind')
ax = population_5y_intervals.plot(x='suburb', kind='bar', stacked=False, figsize=(15, 8))
ax.set_xticklabels(population_5y_intervals.suburb, rotation=45)
plt.title('Population Counts by 5 Year Intervals\n', size=14)
plt.ylabel('Population Counts\n', size=12)
plt.xlabel('City of Melbourne Suburbs', size=12)
plt.show()

Examining the bar chart, we can see that Melbourne City's population is concentrated in the central suburb of Melbourne itself, which also includes the CBD. Most suburbs show quite a steep growth forecast apart from East Melbourne and South Yarra, which seem quite flat in comparison.

Let's take a closer look at each of the suburbs' forecast population change from the first to the last year included in the dataset so we can better see the change relative to one another.

In [None]:
# Extracting a summary of our population data using the "population_by_year" function. 
population_all = population_by_year(POPULATION_DATA, start_year, True)

# Limiting the extraction to the first and last years as defined above.
population_21_41 = population_all.loc[population_all['year'].isin([start_year, final_year])]

# Finally, plotting the distribution of the two years as a comparison. 
pop_dist = sns.displot(population_21_41, x='value', bins=10, hue='suburb', multiple='stack', col='year')
pop_dist.set_axis_labels('Population', 'Count', size=13);


Here we can clearly see the populations of South Yarra and East Melbourne showing minimal growth, both remaining under 10,000 residents, while suburbs such as Melbourne, Southbank, North Melbourne, Docklands and Carlton leap ahead with what looks like a doubling of their populations.

Let’s see how this change looks overlaid on a map of Melbourne City. We can use a choropleth map to visualise the change between each suburb to determine if the forecast growth is taking place in a cluster or sporadically across Melbourne City.

The following function will return a GeoPandas DataFrame with our population and suburb geospatial datasets joined with the population for the two years passed and the difference between the two years as both a number and percentage. We can then plot the change between any two given years as a number or percentage. 

In [None]:
def population_change(population_df, suburbs_df, year_1, year_2):
    """
    Returns Geo DataFrame of the population by suburb of the years specified along
    with the growth as a number and a percentage between the two years.
    
        population_df:   The prepared "POPULATION_DATA" dataframe. 
        suburbs_df:      The prepared "CITY_SUBURBS" dataframe.
        year_1 & year_2: The years of interest.
    """
    # Utilising the "population_by_year" function to prepare a dataframe for each year of interest.
    start_year = population_by_year(population_df, year_1)
    end_year = population_by_year(population_df, year_2)

    # Join our above two dataframes based on "suburb"
    combined =  start_year.merge(end_year, left_on='suburb', right_on='suburb')    
    
    # Calculate the change between the two years for each suburb as a number and as a percentage
    col_1, col_2 = str(year_1), str(year_2)
    combined['growth #'] = combined[col_2] - combined[col_1]
    combined['growth %'] = round((combined['growth #'] / combined[col_1]) * 100, 2)
    
    # Add the geometry (for mapping) from our CITY_SUBURBS dataframe
    combined['geometry'] = suburbs_df['geometry']
    
    # Convert our standard pandas dataframe to a GeoPandas data frame for map rendering
    combined = gpd.GeoDataFrame(combined)

    return combined

In [None]:
# Looking at the total growth between the start_year and final_year.
pop_diff_2021_2041 = population_change(POPULATION_DATA, CITY_SUBURBS, start_year, final_year)

# Display the new DataFrame with the changes in population
pop_diff_2021_2041
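To make the two growth columns concrete, here is a worked sketch with made-up populations (the suburbs and numbers are illustrative only). The percentage is the absolute change divided by the starting population, so a value of 145% means the final population is 2.45 times the starting one.

```python
import pandas as pd

# Illustrative populations only.
df = pd.DataFrame({
    "suburb": ["A", "B"],
    "2021": [1000.0, 2000.0],
    "2041": [2450.0, 2100.0],
})

# Absolute change, then the change relative to the starting year.
df["growth #"] = df["2041"] - df["2021"]
df["growth %"] = (df["growth #"] / df["2021"] * 100).round(2)
print(df)
```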

Now we have a dataframe that captures the total growth, let’s look at the change as a choropleth map.

In [None]:
pop_change_20y = pop_diff_2021_2041.explore(column ='growth %', 
    tiles='CartoDB positron', zoom_start=13, cmap='winter')

# Display our map
pop_change_20y

Examining the map above, we can clearly see North Melbourne will outpace all others over the period captured by the data. The next question that comes to mind is whether this forecast growth is consistent across all years. Let's look at the change at each 5-year interval to see if the growth rate matches the above choropleth map.

To do this we'll write two small functions, one to return the map as above, and a second to allow us to examine the maps side by side. 

### Visualising Multiple Years of Change

Mapping the change with a folium map layer.

In [None]:
def growth_map(population_data, title = "Melbourne City", zoom = 12): 
    """
    Returns a choropleth folium map layer.
        population_data:   A prepared dataframe returned from "population_change". 
        title:  Optional - The title of the growth map. Default = Melbourne City.
        zoom:   Optional - The preferred starting zoom. Default = 12.
    """
    geo_layer = population_data.explore(column ='growth %', tiles='CartoDB positron', 
                                        cmap='winter', name = title, zoom_start=zoom)
    folium.LayerControl().add_to(geo_layer)
    
    return geo_layer 

Generating an HTML frame to contain two maps in one visualisation for ease of comparison. 

In [None]:
def create_html_comparison_maps(map_1, map_2):
    """
    Returns an HTML object that displays both maps passed side by side in iframes for ease of comparison.
        map_1 & map_2: Map layers prepared with the "growth_map" function. 
    """
    htmlmap = HTML('<iframe srcdoc="{}" style="float:left; width: {}px; height: {}px; display:inline-block; width: 49%; margin: 0 auto; border: 2px solid #0f9295"></iframe>'
       '<iframe srcdoc="{}" style="float:right; width: {}px; height: {}px; display:inline-block; width: 49%; margin: 0 auto; border: 2px solid #0f9295"></iframe>'
       .format(map_1.get_root().render().replace('"', '&quot;'),400,400,
               map_2.get_root().render().replace('"', '&quot;'),400,400))

    return htmlmap
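The key step in the function above is escaping each rendered map so it can sit inside an iframe's `srcdoc` attribute. The sketch below shows that step on plain HTML strings using the standard library's `html.escape`, which is a swapped-in alternative to the manual `replace` call; `side_by_side` is a hypothetical helper, not part of Folium or IPython.

```python
import html


def side_by_side(doc_left, doc_right, height_px=400):
    """Build an HTML snippet embedding two HTML documents in adjacent iframes."""
    frame = (
        '<iframe srcdoc="{doc}" style="float:{side}; width:49%; '
        'height:{h}px; border:2px solid #0f9295"></iframe>'
    )
    return (
        frame.format(doc=html.escape(doc_left, quote=True), side="left", h=height_px)
        + frame.format(doc=html.escape(doc_right, quote=True), side="right", h=height_px)
    )


# Two tiny stand-in documents; in the notebook these would be rendered maps.
snippet = side_by_side('<p class="a">one</p>', "<p>two</p>")
print(snippet)
```

Escaping the quotes is what prevents the embedded document from closing the `srcdoc` attribute early.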

Let's create some 5-year buckets to compare the rate of population growth.

In [None]:
# Preparing our dataframe with the "population_change" function. 
pop_change_1 = population_change(POPULATION_DATA, CITY_SUBURBS, 2021, 2026)

# Create our map layer using our new function above.
map_2021_2026 = growth_map(pop_change_1, title = 'Population growth 2021-2026')

# Repeating the above to capture the entire dataset.
pop_change_2 = population_change(POPULATION_DATA, CITY_SUBURBS, 2026, 2031) 
map_2026_2031 = growth_map(pop_change_2, title = 'Population growth 2026-2031')

pop_change_3 = population_change(POPULATION_DATA, CITY_SUBURBS, 2031, 2036) 
map_2031_2036 = growth_map(pop_change_3, title = 'Population growth 2031-2036')

pop_change_4 = population_change(POPULATION_DATA, CITY_SUBURBS, 2036, 2041) 
map_2036_2041 = growth_map(pop_change_4, title = 'Population growth 2036-2041')
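The four year pairs above follow a regular pattern, so they could also be generated rather than typed out. This is a minimal sketch; `year_buckets` is a hypothetical helper introduced for illustration.

```python
def year_buckets(first_year, last_year, step=5):
    """Return consecutive (start, end) year pairs covering first_year to last_year."""
    edges = list(range(first_year, last_year + 1, step))
    return list(zip(edges[:-1], edges[1:]))


buckets = year_buckets(2021, 2041)
print(buckets)
```

Each pair could then be passed to `population_change` and `growth_map` in a loop instead of repeating the calls by hand.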

In [None]:
# Create our two map comparison HTML elements
growth_5y_10y = create_html_comparison_maps(map_2021_2026, map_2026_2031)
growth_15y_20y = create_html_comparison_maps(map_2031_2036, map_2036_2041)

In [None]:
### Population Growth 5-year buckets, 2021-2041

In [None]:
# Let's view our maps side by side
display(growth_5y_10y)

In the above maps we can see that the highest percentage growth is initially within the inner suburbs: Melbourne CBD, Southbank, Docklands and East Melbourne all show around a 25% increase in population over a 5-year period.

However, by the end of the 10-year period we can see that the increase shifts northwest to West Melbourne and North Melbourne, with around a 30% increase in population.

We can also see that the population growth in South Yarra is beginning to slow from 24% to 18%. 

In [None]:
display(growth_15y_20y)

Looking ahead at the next two 5-year buckets, we can see that South Yarra's population growth not only comes almost to a halt for the 2031-2036 period but is predicted to decline by 2041 (in comparison to 2036).

North Melbourne remains in a steady state of predicted increase at around 25% throughout the above 10-year period.

Carlton drops from 27% growth to 25%, then down to less than 8% predicted growth.

There is enough change in the predicted growth rates above for us to have one more look at our 20-year period of growth.

In [None]:
# Create the same maps as our above datasets
pop_change_5 = population_change(POPULATION_DATA, CITY_SUBURBS, 2021, 2041) 
map_2021_2041 = growth_map(pop_change_5, title = 'Population Difference (2021 - 2041)', zoom=13)

In [None]:
display(map_2021_2041)

As the four 5-year buckets suggested, North Melbourne is predicted to have the largest population increase, at around 145% relative to its 2021 population.

The inner-city suburbs that led the initial growth are next in line, each with around a 100% increase (a doubling) in residents.
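
As a rough sanity check on how the 5-year figures relate to the 20-year figure, it helps to remember that growth rates compound rather than add. The sketch below is illustrative only: `pct_change` and `compound_growth` are hypothetical helper names, the numbers are round examples rather than values from `POPULATION_DATA`, and it assumes growth is measured as percentage change relative to the starting year.

```python
def pct_change(start, end):
    """Percentage change from start to end, relative to start."""
    return (end - start) / start * 100


def compound_growth(rates_pct):
    """Combine consecutive period growth rates (in %) into one overall rate (in %)."""
    factor = 1.0
    for rate in rates_pct:
        factor *= 1 + rate / 100
    return (factor - 1) * 100


# Illustrative only: four consecutive 5-year periods of 25% growth each
total = compound_growth([25, 25, 25, 25])
print(round(total, 1))  # 144.1

# A 100% increase means the population has doubled
print(pct_change(1000, 2000))  # 100.0
```

Under these assumptions, four periods of roughly 25% growth compound to about 144%, which is in the same range as the 20-year figure seen for North Melbourne.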

## Active Transport Analysis

In [None]:
# Revisiting our bike data, let's have a look at what we have.
print('Bike path usage:')
print(BIKE_USAGE_DATA.keys(), type(BIKE_USAGE_DATA))

print('\nBike paths:')
print(BIKE_PATHS.keys(), type(BIKE_PATHS))

In [None]:
# Having another look at bike usage, we can see that this has a lat, long pair for mapping.
BIKE_USAGE_DATA.head()

In [None]:
# Our bike path data is in JSON (GeoJSON) format, so it doesn't print in a table. 
# We can see that we have a collection of coordinate pairs which make a path (features, geometry, coordinates).
# Note that the GeoJSON standard orders each pair as longitude, latitude.
print(BIKE_PATHS['features'][0].keys())
print(BIKE_PATHS['features'][0]['type'])
print(BIKE_PATHS['features'][0]['properties'])
print(BIKE_PATHS['features'][0]['geometry'].keys())
print(BIKE_PATHS['features'][0]['geometry']['type'], BIKE_PATHS['features'][0]['geometry']['coordinates'][0][:2])


To visualise this data, we need to combine the two datasets so that we can see where along the bike routes the usage counts took place.

In [None]:
def get_bike_path_types(bike_paths):
    #Create an empty list to store our bike path types
    bike_paths_types = []

    # Iterate through the bike path data and look at features
    for feature in bike_paths['features']:

        # Set the route feature type to a variable
        r_type = feature['properties']['type']

        # Check if the type is already in the list of unique features (bike_paths_types)
        if r_type not in bike_paths_types:

            # Add unique path type to list
            bike_paths_types.append(r_type)
    
    return bike_paths_types

In [None]:
bike_path_type_list = get_bike_path_types(BIKE_PATHS)

#let's look at the output
print(bike_path_type_list)


The function below takes the bike path JSON data and returns a map layer for every type of bike path in the dataset (using the above function to get the list of types).

In [None]:
def create_map_layer_with_path_features(bike_routes):    
    '''Create Folium map layers with the different bike path types'''
    
    # We can use the above function to get a list of route types
    bike_paths_types = get_bike_path_types(bike_routes)
    
    # Create a list of hex colours to match our theme that has enough values for all of our path types
    colours = ['#056b8a', '#14a38e', '#2af598', '#08b3e5', '#08af64'] 
    
    # Create an empty list to hold our map layers for each path type
    map_routes = []
    
    # Loop through the bike path types (from above)
    for i, route_type in enumerate(bike_paths_types): # We use enumerate to get the value and index of the loop
        
        # create a copy of the bike path json dataset
        route_json = bike_routes.copy()
        
        # remove any features (bike routes) that do not match the route type for the loop
        route_json['features'] = [path for path in route_json['features'] if path['properties']['type'] == route_type]
        
        # Add colour property to dataset
        # For each feature in the new JSON data, add a __colour property with this route type's colour
        # (the modulo reuses colours if there are ever more path types than colours)
        for data in route_json['features']:
            data['properties']['__colour'] = colours[i % len(colours)]
        
        # create the folium map layer for this route type
        g = folium.GeoJson(
                    route_json,# Our copy of our bike route data for this one path type
                    name=f'{route_type}',     
            
                    # This is a lambda function for Folium that applies the colour label to each feature
                    style_function=lambda x:{ 
                        "color": x["properties"]["__colour"],
                        "fillColor": x["properties"]["__colour"],}, 
            
                    # Show these feature properties in a tooltip when you hover over a path on the map
                    tooltip=folium.features.GeoJsonTooltip(fields=['name','direction','type','notes'])
                    )
        
        # Add the map layer to the list of layers
        map_routes.append(g)
        
    return map_routes

Let's have a look at what the above function does to our JSON data before we use it in the next function.
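
One detail worth keeping in mind here is that `dict.copy()` makes a shallow copy, so the nested feature dictionaries are shared with the original data. As a result, the `__colour` property is written onto the original features as well. This appears harmless in this notebook, since each feature belongs to a single path type, but it is easy to overlook. The minimal sketch below uses a hypothetical GeoJSON-like dictionary (not the real `BIKE_PATHS` data, and the path type names are made up) to show the difference from `copy.deepcopy()`.

```python
import copy

# Hypothetical, minimal GeoJSON-like structure (not the real BIKE_PATHS data)
paths = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"type": "On-Road Bike Lane"}},
        {"properties": {"type": "Off-Road Bike Route"}},
    ],
}

# dict.copy() is shallow: the nested feature dictionaries are shared
shallow = paths.copy()
shallow["features"] = [f for f in shallow["features"]]  # new list, same feature dicts
shallow["features"][0]["properties"]["__colour"] = "#056b8a"
print("__colour" in paths["features"][0]["properties"])  # True

# copy.deepcopy() duplicates the nested dictionaries as well
deep = copy.deepcopy(paths)
deep["features"][1]["properties"]["__colour"] = "#14a38e"
print("__colour" in paths["features"][1]["properties"])  # False
```

If the original data needs to stay untouched, a deep copy (or building new feature dictionaries) is one option, at the cost of copying every coordinate list once per path type.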

In [None]:
bike_map_route_layers = create_map_layer_with_path_features(BIKE_PATHS)

In [None]:
# View the output from the above function, we can see we get a list of folium features (layers) for our map
bike_map_route_layers

Now that we have our bike path map layers, we need to create a map that has both the bike routes and the usage counts, so that we can visualise where people use the bike paths the most.
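
Folium's `HeatMap` plugin accepts its points as a list of `[lat, lng]` or `[lat, lng, weight]` entries, so tabular columns need to be reshaped row by row first. The sketch below shows that reshaping idiom on plain Python lists; the station coordinates and totals are hypothetical illustrative values, not taken from `BIKE_USAGE_DATA`.

```python
# Hypothetical count stations (illustrative values, not from BIKE_USAGE_DATA)
latitudes = [-37.80, -37.81]
longitudes = [144.94, 144.95]
totals = [120, 340]

# zip pairs up the columns row by row; map(list, ...) turns each tuple into a list
heat_data = list(map(list, zip(latitudes, longitudes, totals)))
print(heat_data)  # [[-37.8, 144.94, 120], [-37.81, 144.95, 340]]
```

The third value in each row acts as the weight, so locations with higher counts contribute more to the heat map.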

In [None]:
def draw_heatMap(data, bikepath, city_map, colour = 'blue'):
    '''Function for creating a Folium map with layers for the two bike datasets'''
    
    # Create a base map layer with our city map
    m =  city_map.explore(name = 'Melbourne City Suburbs',style_kwds = dict(color= '#22e4ac', fillOpacity = 0.4, opacity = 0.4),
                         zoom_start=13)
    
    #call the above function to create the coloured bike path layers  
    bike_routes = create_map_layer_with_path_features(bikepath) 
    # Add each bike path type map layer to our base map
    for r in bike_routes:
        r.add_to(m)
        
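    # Create Bike count heat map layer - this will allow us to see hot spots for usage
    # Keep the descriptions as a plain list so they can be looked up by position below,
    # regardless of how the dataframe is indexed
    labels = data['description'].tolist()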
    
    #Convert data to list for heat map rendering
    data = list(map(list, zip(data['latitude'], data['longitude'],data['total'])))        
    # Add our formatted data as a heat map layer to our base map
    HeatMap(data, name='Bike Counts').add_to(m)
    
    # Create markers with the bike count data so we can see what the "heat" relates to
    for i, location in enumerate(data):
        # From our table view above, we know that [0] and [1] are our latitude and longitude values
        folium.Marker(location=[location[0], location[1]],
                      popup=f'<strong>{labels[i]}</strong>',
                      tooltip=f'Bike Count: {location[2]}',
                      # create an empty icon so that our map isn't cluttered
                      icon = folium.DivIcon(html =f"""<div style="color: {colour};">  </div>""")
                     ).add_to(m)
        
    # Add layer control to switch on and off map features
    folium.LayerControl().add_to(m)
    
    return m # Map with layers

Use the above function to create our layered map with both sets of bike data.

In [None]:
# Create a map object
bike_map = draw_heatMap(BIKE_USAGE_DATA,BIKE_PATHS, CITY_SUBURBS, colour = 'green')

In [None]:
# Visualise our data in a map
bike_map

Let's bring back our final map covering all the years so that we can compare it with the active transport routes.

In [None]:
bike_population_maps = create_html_comparison_maps(bike_map, map_2021_2041)

In [None]:
bike_population_maps

Looking at our two maps and concentrating on North Melbourne (the area with the highest projected population growth), we can see quite a few bike path counts, with the highest values where the on-road and informal bike paths intersect. Usage at these points is often triple the usage on the roads.
From the metadata (or data description) of our bike count data, we know that these values were counted within a 2-hour window. 

This implies that, at times, over 1,000 people are riding bikes through or around the North Melbourne area. Not every bike user will have passed by every count station, so that number is most likely significantly higher.