_**DELETE BEFORE PUBLISHING**_

_This is a template that also contains the style guide for use cases. When uploaded to the website, the notebook is styled with the use-case CSS, which will not be visible on your local machine._

_Change any text marked with {} and delete any cells marked DELETE._

***

In [1]:
# DELETE BEFORE PUBLISHING
# This is just here so you can preview the styling on your local machine

from IPython.display import HTML
HTML("""
<style>
.usecase-title, .usecase-duration, .usecase-section-header {
    padding-left: 15px;
    padding-bottom: 8px;
    padding-top: 8px;
    padding-right: 15px;
    background-color: #0f9295;
    color: #fff;
}

.usecase-title {
    font-size: 1.7em;
    font-weight: bold;
}

.usecase-authors, .usecase-level, .usecase-skill {
    padding-left: 15px;
    padding-bottom: 6px;
    padding-top: 6px;
    background-color: #baeaeb;
    font-size: 1.4em;
    color: #121212;
}

.usecase-level-skill  {
    display: flex;
}

.usecase-level, .usecase-skill {
    width: 50%;
}

.usecase-duration, .usecase-skill {
    text-align: right;
    padding-right: 15px;
    padding-bottom: 6px;
    font-size: 1.4em;
}

.usecase-section-header {
    font-weight: bold;
    font-size: 1.5em;
}

.usecase-subsection-header, .usecase-subsection-blurb {
    font-weight: bold;
    font-size: 1.2em;
    color: #121212;
}

.usecase-subsection-blurb {
    font-size: 1em;
    font-style: italic;
}
</style>
""")

***

_**DELETE BEFORE PUBLISHING**_

## Style guide for use cases

### Headers

There are two options for styling headers within your markdown cells.

1) You can use HTML classes specific to the use case styling:

```<p class="usecase-subsection-header">This is a subsection header.</p>```

<p style="font-weight: bold; font-size: 1.2em;">This is a subsection header.</p>

```<p class="usecase-subsection-blurb">This is a blurb header.</p>```

<p style="font-weight: bold; font-size: 1em; font-style:italic;">This is a blurb header.</p>


2) Alternatively, you can use the markdown header styles:

```# for h1```

```## for h2```

```### for h3```

```#### for h4```

```##### for h5```

### Plot colour schemes

General advice:
1. Use the same colour or colour palette throughout your notebook, unless variety is necessary
2. Select a palette based on the type of data being represented
3. Consider accessibility (colour blindness, low vision)

#### 1) If all of your plots use only one or two colours, use one of the company style colours:

| Light theme | Dark theme |
|-----|-----|
|<p style="color:#2af598;">#2af598</p>|<p style="color:#08af64;">#08af64</p>|
|<p style="color:#22e4ac;">#22e4ac</p>|<p style="color:#14a38e;">#14a38e</p>|
|<p style="color:#1bd7bb;">#1bd7bb</p>|<p style="color:#0f9295;">#0f9295</p>|
|<p style="color:#14c9cb;">#14c9cb</p>|<p style="color:#056b8a;">#056b8a</p>|
|<p style="color:#0fbed8;">#0fbed8</p>|<p style="color:#121212;">#121212</p>|
|<p style="color:#08b3e5;">#08b3e5</p>||


#### 2) If your plot needs multiple colours, choose an appropriate palette using either of the following tutorials:
- https://seaborn.pydata.org/tutorial/color_palettes.html
- https://matplotlib.org/stable/tutorials/colors/colormaps.html

#### 3) Consider accessibility as well.

For qualitative plots, Seaborn's 'colorblind' palette is recommended. For maps with sequential or diverging data, it is recommended to use one of the ColorBrewer schemes, which can be previewed at https://colorbrewer2.org/.

If you want to design your own colour scheme, it should follow the same principles as Cynthia Brewer's research, varying not only in hue but also in saturation or luminance.
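
One way to check the luminance principle for a custom palette is to compute each colour's relative luminance. The sketch below uses the WCAG 2 relative luminance and contrast ratio formulas (our choice of measure; this style guide does not prescribe one) and applies them to the dark theme colours from the table above. A palette meant for sequential data should change steadily in luminance, not only in hue.

```python
def hex_to_rgb(hex_colour):
    """Converts a hex string such as '#0f9295' to a tuple of 0-255 integers."""
    hex_colour = hex_colour.lstrip('#')
    return tuple(int(hex_colour[i:i + 2], 16) for i in (0, 2, 4))


def relative_luminance(hex_colour):
    """Returns the WCAG 2 relative luminance (0 = black, 1 = white) of a hex colour."""
    def linearise(channel):
        # Undo the sRGB gamma curve for a single 0-255 channel.
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearise(ch) for ch in hex_to_rgb(hex_colour))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(colour_1, colour_2):
    """Returns the WCAG contrast ratio between two colours, from 1 (none) up to 21."""
    lighter, darker = sorted((relative_luminance(colour_1), relative_luminance(colour_2)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Dark theme colours from the company style table.
dark_theme = ['#08af64', '#14a38e', '#0f9295', '#056b8a', '#121212']
for colour in dark_theme:
    print(colour, round(relative_luminance(colour), 3))
```

The dark theme colours decrease steadily in luminance, so they can be read in order even by viewers who cannot distinguish the hues. `contrast_ratio` can be used in the same way to check text or line colours against a plot background.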

### References

Be sure to acknowledge your sources and any attributions using links or a reference list.

If you have quite a few references, you might wish to have a dedicated section for references at the end of your document, linked using footnote style numbers.

You can connect your in-text reference by adding the number with an HTML link: ```<a href="#fn-1">[1]</a>```

and add a matching ID in the reference list using the ```<fn>``` tag: ```<fn id="fn-1">[1] Author (Year) _Title_, Publisher, Publication location.</fn>```

------

<div class="usecase-title">Small Area Population Growth & Transportation Needs Analysis</div>

<div class="usecase-authors"><b>Authored by: </b>Angie Hollingworth and Mick Wiedermann</div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<div class="usecase-section-header">Scenario</div>

- As a future resident of Melbourne, I want to live close to active and/or public transport routes. I prefer not to use my car in and around the city. Where should I live?
- As a city council, we wish to increase the sustainability of our city and lower emissions by reducing the number of motor vehicles entering and leaving it. What infrastructure investment will help achieve this goal?
- As a city council, we wish to identify the areas with the highest non-car traffic so that we can see where to increase services for our residents.


<div class="usecase-section-header">Exploratory Data Analysis Objectives</div>

The goals for this analysis are:
- Analyse population growth at the suburb level to quantify how quickly each suburb is growing relative to the others.
- Analyse current demand for, and access to, the existing public and active transport routes relative to the forecast population growth.
- Identify key areas where public and active transportation routes could be upgraded or installed. 


<div class="usecase-section-header">Strategic Benefits for the City of Melbourne</div>

This use case and analysis can help Melbourne City meet strategic and sustainability goals in the following ways: 
- Support discussions with infrastructure partners about the location of new or upgraded public and active transport routes. Reducing the use of motorised vehicles in turn reduces emissions, helping to meet the climate and biodiversity emergency objective.
- Encourage additional purpose-designed bike paths in heavy-use areas, which can remove bicycles from the road and reduce the number of bike-related injuries, helping to meet the safety and well-being objective.
- Identify hot spots of higher non-car traffic (foot, bicycle, tram, etc.) where services for residents, such as drinking fountains or seating, could be added.


<div class="usecase-section-header">Why Inner-City Transport Routes Matter </div>

Melbourne City is the first in Australia to make a [Voluntary Local Review (VLR) Declaration](https://www.melbourne.vic.gov.au/about-council/vision-goals/Pages/united-nations-sustainable-development-goals.aspx), a United Nations initiative for local and regional governments worldwide to formally commit to, and report their local progress towards, the seventeen Sustainable Development Goals.

By examining the active and public transport routes and usage within Melbourne City, in conjunction with the population growth forecasts, we hope to identify areas with existing and projected increased demand for additional active and public transport routes. 

By ensuring that appropriate sustainable transport options are available and easily accessible, we hope to discourage the use of motorised vehicles within Melbourne City, reducing emissions and creating a more sustainable city.

This will help Melbourne City achieve two of the UN Sustainable Development Goals, namely Sustainable Cities and Communities (Goal 11) and Climate Action (Goal 13), along with a key strategic objective, the [climate and biodiversity emergency](https://www.melbourne.vic.gov.au/about-council/vision-goals/Pages/council-plan.aspx) objective, which prioritises the reduction of emissions.

<div class="usecase-section-header">Data Requirements</div>

## Melbourne Open Data Datasets
### Population Growth Forecast Data
Our first and arguably most important dataset for this analysis is the [*City of Melbourne Population Forecasts by Small Area 2020-2040*](https://data.melbourne.vic.gov.au/People/City-of-Melbourne-Population-Forecasts-by-Small-Ar/sp4r-xphj) from Melbourne Open Data, which provides population forecasts for each single year from 2020 to 2040. Prepared by SGS Economics and Planning (Jan-Jun 2021), the forecasts are available for the municipality and small areas, as well as by gender and 5-year age groups.

### Tram Tracks, Bike Paths, and Bus Stops Geospatial Data
For a visual representation of the current public and active transport routes, we require geospatial data for trams, bike paths, and bus stops. We can utilise the following to meet these requirements:
- The [*Tram Tracks*](https://data.melbourne.vic.gov.au/Transport/Tram-tracks/wqka-kyhz) dataset contains the line number and geospatial data.
- The [*Bicycle routes, including informal, on-road and off-road routes*](https://data.melbourne.vic.gov.au/Transport/Bicycle-routes-including-informal-on-road-and-off-/24aw-nd3i) dataset contains information about each of the paths along with the geospatial data.
- Lastly, the [*Bus stops*](https://data.melbourne.vic.gov.au/Transport/Bus-stops/ss79-v558) dataset includes a range of information along with the latitude and longitude of Melbourne City bus stops. 

### Super Tuesday Bike Count Data
To understand the current volume of cyclists on bike paths, we can use the [*Annual Bike Counts (Super Tuesday)*](https://data.melbourne.vic.gov.au/Transport/Annual-Bike-Counts-Super-Tuesday-/uyp8-7ii8) dataset. It contains observed bike counts from sites across the city and is part of Australia’s biggest annual commuter bike count dataset.

### Tram and Bus Usage Data - Not Currently Available
This analysis would be more complete with tram and bus patronage data, giving daily numbers for each mode of transport as well as where and when patrons access the services. This data is not currently available, so we’ll infer patronage by applying known population usage ratios to the population growth forecast.
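
As a rough sketch of that inference, assuming a fixed share of residents uses each mode, the forecast population can be multiplied by one usage ratio per mode. The ratios below are hypothetical placeholders, not published figures, and must be replaced with known ratios before drawing any conclusions. The population values are the West Melbourne (Residential) forecasts shown in the population data preview further down.

```python
import pandas as pd

# West Melbourne (Residential) total population forecasts from the population dataset.
forecast = pd.DataFrame({
    'year': [2037, 2038, 2039, 2040, 2041],
    'population': [14854, 14815, 14794, 14824, 14814],
})

# HYPOTHETICAL usage ratios: the share of residents assumed to use each mode on a typical day.
usage_ratios = {'tram': 0.15, 'bus': 0.05}

# Apply each ratio to the forecast population to estimate daily patronage per mode.
for mode, ratio in usage_ratios.items():
    forecast[f'est_{mode}_patrons'] = (forecast['population'] * ratio).round().astype(int)

print(forecast)
```

Because the estimate scales linearly with population, it carries over any forecast growth directly; changes in travel behaviour would need year-specific ratios.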

## Other Datasets
### Victorian Suburbs Geospatial Data
In order to visualise our population forecasts as a map overlay, we need the geographical coordinates of the suburbs we'll be examining. For this we'll use the *VIC Suburb/Locality Boundaries - PSMA Administrative Boundaries GeoJSON* dataset from the Australian Government site [data.gov.au](https://data.gov.au/dataset/ds-dga-af33dd8c-0534-4e18-9245-fc64440f742e/distribution/dist-dga-d467c550-fdf0-480f-85ca-79a6a30b700b/details?q=). 


<div class="usecase-section-header">Importing the data</div>

Before importing our datasets, we shall first import the necessary libraries to support our exploratory data analysis and visualisation.

The following are the core packages required for this analysis:

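- {List each non-standard package and briefly why you're using it. No need to list commonly used packages like numpy, math, os, time, pandas}
- sodapy: Connects to the Socrata API of the Melbourne Open Data Portal to retrieve the datasets.
- GeoPandas: Allows us to work with spatial data and overlay that data on maps.
- Shapely: Provides the geometry objects, such as `Point`, used to represent locations.
- Folium: Creates interactive Leaflet maps, which we use to display and compare our choropleth maps.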

In [2]:
# For importing the data and using API
from sodapy import Socrata
from urllib.request import urlopen
import os
import zipfile as zf
import requests
from io import BytesIO 

# Working with the data
from shapely.geometry import Polygon, Point
import numpy as np
import pandas as pd
import geopandas as gpd
import json

# Visualisation
from IPython.display import IFrame, display, HTML
import matplotlib.pyplot as plt
import warnings
import folium

To connect to the *Melbourne Open Data Portal*, we establish a connection using the sodapy library by specifying a domain (the website domain where the data is hosted) and an application access token, which can be requested from the City of Melbourne Open Data portal by registering [here](https://data.melbourne.vic.gov.au/signup).

For this exercise, we access the domain without an application token unless one is set in the `SODAPY_APPTOKEN` environment variable. Each dataset in the Melbourne Open Data Portal has a unique identifier that can be used to retrieve the dataset using the sodapy library.

The *City of Melbourne Population Forecasts by Small Area 2020-2040* dataset unique identifier is *sp4r-xphj*. We will pass this identifier into the sodapy command below to retrieve this data placing it into a Pandas dataframe.
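
Under the hood, sodapy requests rows from the portal's Socrata Open Data API (SODA) resource endpoint, `/resource/<identifier>.json`. As a hedged sketch, assuming the portal still serves that standard SODA endpoint (the portal may have changed platform since this notebook was written), the equivalent request URL can be built with the standard library. `$limit` and `$where` are standard SoQL query parameters; the `build_soda_url` helper is hypothetical.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, limit=None, where=None):
    """Hypothetical helper: builds a SODA resource URL for a dataset identifier."""
    params = {}
    if limit is not None:
        params['$limit'] = limit   # maximum number of rows to return
    if where is not None:
        params['$where'] = where   # SoQL filter expression
    url = f'https://{domain}/resource/{dataset_id}.json'
    # Keep '$' unescaped so the SoQL parameter names stay readable.
    return f'{url}?{urlencode(params, safe="$")}' if params else url


print(build_soda_url('data.melbourne.vic.gov.au', 'sp4r-xphj', limit=5, where='year = 2020'))
```

Requesting such a URL (for example with `requests.get(url).json()`) should return a list of row dictionaries, the same kind of record that `client.get_all` yields one at a time. When an app token is provided, sodapy sends it in the `X-App-Token` request header.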


In [3]:
apptoken = os.environ.get("SODAPY_APPTOKEN") # None (anonymous access) unless SODAPY_APPTOKEN is set
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken)           # Open Dataset Connection
pop_data_unique_identifier = 'sp4r-xphj'   

# Population Forecast Data
population_data = pd.DataFrame.from_dict(client.get_all(pop_data_unique_identifier))
population_data.tail()



|  | geography | year | gender | age | value |
|---|---|---|---|---|---|
| 17047 | West Melbourne (Residential) | 2037 | Not applicable | Total population | 14854 |
| 17048 | West Melbourne (Residential) | 2038 | Not applicable | Total population | 14815 |
| 17049 | West Melbourne (Residential) | 2039 | Not applicable | Total population | 14794 |
| 17050 | West Melbourne (Residential) | 2040 | Not applicable | Total population | 14824 |
| 17051 | West Melbourne (Residential) | 2041 | Not applicable | Total population | 14814 |


Next, we’ll use the same app token, domain, and client to import the remaining datasets described above from the Melbourne Open Data Portal. 

In [4]:
# Tram Track Geographical Data
tram_geo_data_unique_identifier = 'au2t-98pn' 
tram_track_data = gpd.GeoDataFrame.from_dict(client.get_all(tram_geo_data_unique_identifier))

# Bike Path Geographical Data
bike_geo_data_unique_identifier = 'hmuz-nz6m'
bike_path_data_url = 'https://'+ domain +'/api/geospatial/'+ bike_geo_data_unique_identifier +'?method=export&format=GeoJSON'
with urlopen(bike_path_data_url) as result:
    bike_path_data = json.load(result) 

# Bus Stop Geographical Data
bus_geo_data_unique_identifier = 'vzsu-xnf6'
bus_stop_data = gpd.GeoDataFrame.from_dict(client.get_all(bus_geo_data_unique_identifier))

# Bike Count Data
bike_count_data_unique_identifier = 'uyp8-7ii8'
bike_count_data = pd.DataFrame.from_dict(client.get_all(bike_count_data_unique_identifier))

The last dataset we need to import is the *Victorian Suburbs/Locality Boundaries* from the Australian Government site, data.gov.au, which is freely available for download via the URL below. As the data includes geometries, we will import it into a GeoPandas GeoDataFrame.

In [5]:
# Suburb Geographical Data
suburb_geo_data_url = ('https://data.gov.au/geoserver/vic-suburb-locality-boundaries-psma-administrative-'
    + 'boundaries/wfs?request=GetFeature&typeName=ckan_af33dd8c_0534_4e18_9245_fc64440f742e&outputFormat=json')
vic_suburb_data = gpd.read_file(suburb_geo_data_url)

<div class="usecase-section-header">Cleaning and Preparing Our Data</div>

From the data descriptions and dictionaries for each dataset, we know that most contain data we don’t require. In this section, we will extract the data we need and join the various datasets ready for analysis.

#### Population Forecast and Suburb Geospatial Data
Starting with the population and suburb geospatial data, we need to remove suburbs from the geospatial data that are not included in our population forecast data. We also need to match the City of Melbourne suburbs by name, which requires merging some areas (Melbourne (CBD) and Melbourne (Remainder)) and renaming others (West Melbourne (Residential)).
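
Before joining on names, it helps to list the names that fail to match. A minimal sketch using pandas' `merge(..., indicator=True)`: both name columns are normalised to upper case, then any row flagged `left_only` or `right_only` is a name that still needs merging or renaming. The small frames below are illustrative toy data, not the real datasets.

```python
import pandas as pd

# Toy data: population areas (mixed case) and boundary names (upper case, as in the PSMA data).
population = pd.DataFrame({'suburb': ['Carlton', 'Melbourne (CBD)', 'West Melbourne (Residential)']})
boundaries = pd.DataFrame({'suburb': ['CARLTON', 'MELBOURNE', 'WEST MELBOURNE']})

# Normalise case so that only genuine naming differences remain.
population['key'] = population['suburb'].str.upper()
boundaries['key'] = boundaries['suburb']

# An outer merge keeps every name; the '_merge' column records which side each came from.
check = population.merge(boundaries, on='key', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['key', '_merge']]
print(unmatched)
```

Here `CARLTON` matches on both sides, while the CBD and residential variants appear as `left_only` and their target names as `right_only`, which is exactly the list of merges and renames the cleaning function has to handle.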

The following function takes the population dataset and a year of interest, removes any unnecessary suburbs, cleans the data, and returns a summarised pandas DataFrame containing our suburbs of interest for the year specified.

In [6]:
def pop_data_by_year(dataset, year):
    """
    Filters and cleans the Population dataset, returning a new pandas DataFrame with one
    row per suburb for the year passed to the function.

    Note that the year must be between 2020 and 2040 inclusive.
    """
    # Keep the total population rows and extract the columns of interest into "summary".
    summary = dataset.loc[dataset['age'] == 'Total population', ['geography', 'year', 'value']]
    # Convert data types and rename geography to suburb.
    summary = summary.astype({'year': int, 'value': float, 'geography': str})
    summary = summary.rename(columns={'geography': 'suburb'})
    # Extract the data matching the year passed.
    data = summary[summary['year'] == year].copy()

    # Merge Melbourne (CBD) and Melbourne (Remainder) into a single "Melbourne" suburb.
    data['suburb'] = data['suburb'].replace(['Melbourne (CBD)', 'Melbourne (Remainder)'], 'Melbourne')

    # Group the data by suburb, summing the population values into a column named
    # "pop_<year>", where <year> is the year passed.
    data = data.groupby('suburb', as_index=False)['value'].sum()
    data = data.rename(columns={'value': f'pop_{year}'})

    # Remove unrequired areas. isin() skips names that are absent from the data instead of
    # raising an IndexError.
    # n.b. Port Melbourne's population count in this dataset is very small (2), although the
    # actual population should be more than 17,000, so it is excluded.
    subs_to_delete = ['West Melbourne (Industrial)', 'City of Melbourne', 'Port Melbourne']
    data = data[~data['suburb'].isin(subs_to_delete)].copy()
    data['suburb'] = data['suburb'].replace('West Melbourne (Residential)', 'West Melbourne')

    # Sort the data by suburb and reset the index.
    return data.sort_values('suburb').reset_index(drop=True)

In [7]:
# Reducing the dataset size with the pop_data_by_year function above.
population_2020 = pop_data_by_year(population_data, 2020)
population_2020


Next, we’ll utilise our reduced dataset above to filter and extract the suburbs of interest from the geospatial data. 

In [None]:

# Extract the suburbs of interest that match the population data into "target_suburbs".
target_suburbs = population_2020['suburb'].str.upper()

# Extract and rename the columns of interest from the geospatial suburb data.
vic_suburb_data_reduced = vic_suburb_data[['vic_loca_2', 'geometry']]
vic_suburb_data_reduced = vic_suburb_data_reduced.rename(columns={'vic_loca_2': 'suburb'})

# Locate the index label of the first boundary matching each target suburb and store them in "subs".
subs = [vic_suburb_data_reduced.index[vic_suburb_data_reduced['suburb'] == sub].tolist()[0] for sub in target_suburbs]

# Create a new GeoDataFrame for the Melbourne suburbs, "mel_suburb_data", and reformat it.
mel_suburb_data = vic_suburb_data_reduced.loc[subs].reset_index(drop=True)
mel_suburb_data['suburb'] = mel_suburb_data['suburb'].str.title()
mel_suburb_data

We now have the population forecast and suburb geospatial data ready for joining, analysis, and visualisation. Next, we move on to the active transport datasets: bike paths and bike counts.

#### Bike Count and Path Geospatial Data  

In [8]:
bike_count_data.head(2)

|  | state | electorate | site_id | latitude | longitude | legs | description | layout_1 | layout_1_enter | layout_2 | ... | female | male | _7_00_am | _7_15_am | _7_30_am | _7_45_am | _8_00_am | _8_15_am | _8_30_am | _8_45_am |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | VIC | Melbourne | 4399 | -37.787979 | 144.959 | 2 | Royal Pde/shared path [N], Royal Pde/shared pa... | 5 | 185 | 186 | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | VIC | Melbourne | 4405 | -37.793993 | 144.941956 | 3 | Melrose St [N], Melrose St [S], Mark St [W] | 8 | 188 | 188 | ... |  |  |  |  |  |  |  |  |  |  |


A quick preview of our bike data shows that we have far more attributes than required. Let us extract the columns of interest for the most recent year and remove any N/A fields. We also need to update the data types of the extracted columns and combine the latitude and longitude into a single column, which we’ll call geometry, containing a Point object.
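
Shapely and GeoPandas expect points in (x, y) order, which for EPSG:4326 means (longitude, latitude); passing (latitude, longitude) silently places every point in the wrong location. A quick sanity check, sketched below with the standard library, compares a coordinate pair against a rough bounding box for Melbourne. The bounds and the `in_melbourne` helper are illustrative assumptions, not an official boundary.

```python
# Approximate bounding box around Melbourne (illustrative values, not an official boundary).
MEL_LON_RANGE = (144.5, 145.5)
MEL_LAT_RANGE = (-38.5, -37.5)


def in_melbourne(x, y):
    """Returns True if (x, y) = (longitude, latitude) falls inside the rough Melbourne box."""
    return MEL_LON_RANGE[0] <= x <= MEL_LON_RANGE[1] and MEL_LAT_RANGE[0] <= y <= MEL_LAT_RANGE[1]


# A bike count site from the Super Tuesday data preview.
latitude, longitude = -37.787979, 144.959

print(in_melbourne(longitude, latitude))  # True: correct (x, y) order
print(in_melbourne(latitude, longitude))  # False: swapped order
```

For a GeoDataFrame of points, the same check can be applied to `gdf.geometry.x` and `gdf.geometry.y` after the geometry column is built.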

In [9]:
# Extracting the columns of interest, removing rows with N/A fields and converting the year to an int.
bike_count_data = (bike_count_data[['latitude', 'longitude', 'total', 'year', 'description']]
                   .dropna()
                   .astype({'year': int}))

# Extracting the most recent year of data.
bike_count_data = bike_count_data[bike_count_data['year'] == bike_count_data['year'].max()]

# Updating the data types of the remaining numeric columns.
bike_count_data = bike_count_data.astype({'latitude': float, 'longitude': float, 'total': int})
bike_count_data = bike_count_data.reset_index(drop=True)

# Combining the coordinates into Point objects. Shapely expects (x, y), i.e. (longitude, latitude).
df_geometry = [Point(xy) for xy in zip(bike_count_data['longitude'], bike_count_data['latitude'])]
bike_count_data = gpd.GeoDataFrame(bike_count_data, crs=4326, geometry=df_geometry)

bike_count_data.head()

|  | latitude | longitude | total | year | description | geometry |
|---|---|---|---|---|---|---|
| 0 | -37.825963 | 144.960053 | 249 | 2017 | Queens Bridge St [N], City Rd [NE], Morray St ... | POINT (144.96005 -37.82596) |
| 1 | -37.787979 | 144.959 | 1425 | 2017 | Royal Pde/shared path [N], Royal Pde/shared pa... | POINT (144.95900 -37.78798) |
| 2 | -37.793993 | 144.941956 | 42 | 2017 | Melrose St [N], Melrose St [S], Mark St [W] | POINT (144.94196 -37.79399) |
| 3 | -37.794117 | 144.927689 | 114 | 2017 | McCracken St [N], Macaulay Rd [E], Kensington ... | POINT (144.92769 -37.79412) |
| 4 | -37.79454 | 144.930987 | 247 | 2017 | Macaulay Rd - east end [E], Eastwood St [S], M... | POINT (144.93099 -37.79454) |


The bike path data only contains relevant fields as we can see below.  

In [13]:
bike_path_data["features"][0]['properties'].keys()

dict_keys(['name', 'direction', 'info', 'status', 'notes', 'type'])
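
Because the bike path data was loaded as raw GeoJSON, each path's attributes sit in the `properties` dictionary of a feature. A minimal standard-library sketch for tallying paths by one of those fields, here `type`, is shown below. The two features are made-up examples that share the real data's property keys; their values are invented for illustration.

```python
from collections import Counter

# Made-up GeoJSON FeatureCollection with the same property keys as the bike path data.
sample_paths = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'properties': {'name': 'Path A', 'direction': None, 'info': None,
                        'status': 'existing', 'notes': None, 'type': 'off-road'},
         'geometry': {'type': 'LineString', 'coordinates': [[144.96, -37.81], [144.97, -37.80]]}},
        {'type': 'Feature',
         'properties': {'name': 'Path B', 'direction': None, 'info': None,
                        'status': 'existing', 'notes': None, 'type': 'on-road'},
         'geometry': {'type': 'LineString', 'coordinates': [[144.95, -37.82], [144.95, -37.81]]}},
    ],
}


def count_by_property(feature_collection, key):
    """Counts features by the value of one property, e.g. 'type' or 'status'."""
    return Counter(f['properties'].get(key) for f in feature_collection['features'])


print(count_by_property(sample_paths, 'type'))
```

With GeoPandas, `gpd.GeoDataFrame.from_features(bike_path_data['features'])` converts the same structure into a GeoDataFrame with one column per property, which is convenient once the paths need to be plotted.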

## ADD in Cleaning and Previewing Remaining Datasets HERE ^^^

------

## Ignore below

To get our first look at the data, we need to join our two datasets together. The following function returns a GeoPandas GeoDataFrame joining the population forecast and suburb boundary data, along with the population growth between any two years of interest. We can then choose to plot a single year as a map of population density, or the change between any two given years as a number or percentage.
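
The two change columns can be written as follows, where $P_{y}$ is a suburb's forecast population in year $y$, $\Delta P$ corresponds to `change #` and $\Delta P_{\%}$ to `change %`:

$$\Delta P = P_{y_2} - P_{y_1}, \qquad \Delta P_{\%} = \frac{P_{y_2} - P_{y_1}}{P_{y_1}} \times 100$$

For example, using the West Melbourne (Residential) forecasts from the population data preview, $P_{2037} = 14854$ and $P_{2041} = 14814$ give $\Delta P = -40$ and $\Delta P_{\%} = -40 / 14854 \times 100 \approx -0.27\%$.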

In [10]:
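def population_diff(population_data, year_1, year_2):
    """
    Returns a GeoDataFrame joining the suburb boundaries with the population for two years
    and the change between them, as a number ('change #') and a percentage ('change %').
    """
    start_col, end_col = f'pop_{year_1}', f'pop_{year_2}'
    start_year = pop_data_by_year(population_data, year_1)
    end_year = pop_data_by_year(population_data, year_2)

    combined = start_year.merge(end_year, on='suburb')
    combined['change #'] = combined[end_col] - combined[start_col]
    combined['change %'] = combined['change #'] / combined[start_col] * 100

    # Join the suburb boundaries by name rather than relying on both tables sharing a row order.
    combined = combined.merge(mel_suburb_data[['suburb', 'geometry']], on='suburb')
    return gpd.GeoDataFrame(combined, geometry='geometry')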

In [11]:
pop_diff_2020_2040 = population_diff(population_data, 2020, 2040)
pop_diff_2020_2040.head(11)


Let's take a look at the total population growth from 2020 to 2040.

In [None]:
growth_20y = pop_diff_2020_2040.explore(column='change %', tiles='CartoDB positron', cmap='YlOrRd')
growth_20y 


Examining the map above, it seems that the north-west suburbs of Melbourne will outpace the others. Let's take a look at the change over each 5-year interval (2020 to 2025, 2025 to 2030, 2030 to 2035, and finally 2035 to 2040) to see whether the growth rate matches the choropleth map above.

To do this, we'll write two small functions: one to return the map as above, and a second to allow us to examine the maps side by side.
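
Embedding a rendered map through an iframe's `srcdoc` attribute turns the whole HTML document into an attribute value, so characters such as `"` and `&` must be escaped; otherwise the browser cuts the document short or decodes entities inside it. A minimal sketch using the standard library's `html.escape`; the `to_srcdoc_iframe` helper and the sample markup are hypothetical.

```python
import html


def to_srcdoc_iframe(document, width='49%', height=400):
    """Hypothetical helper: wraps an HTML document in an iframe via the srcdoc attribute."""
    # quote=True (the default) escapes & < > " and ' so the document is a valid attribute value.
    escaped = html.escape(document, quote=True)
    return f'<iframe srcdoc="{escaped}" style="width:{width}; height:{height}px;"></iframe>'


page = '<p class="note">Fish &amp; chips</p>'
print(to_srcdoc_iframe(page))
```

For a folium map, the document to embed is the string returned by `map.get_root().render()`.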

In [None]:
import html

def change_map(population_data, colour):
    """Returns an interactive choropleth map of the 'change %' column using the given colour map."""
    geo_layer = population_data.explore(column='change %', tiles='CartoDB positron', cmap=colour)
    folium.LayerControl().add_to(geo_layer)
    return geo_layer

def map_compare(map_1, map_2, height=400):
    """Displays two folium maps side by side, each rendered inside its own iframe."""
    iframe = ('<iframe srcdoc="{}" style="float:{}; width:49%; height:{}px; display:inline-block; '
              'margin:0 auto; border:2px solid #0f9295"></iframe>')
    # html.escape() escapes &, <, > and quotes so each rendered map is a valid srcdoc value.
    frames = (iframe.format(html.escape(map_1.get_root().render()), 'left', height)
              + iframe.format(html.escape(map_2.get_root().render()), 'right', height))
    # IPython warns when HTML() starts with an <iframe>; silence that warning only here.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')
        htmlmap = HTML(frames)
    display(htmlmap)

In [None]:
# Build a map of the population change for each 5-year interval.
intervals = [(2020, 2025), (2025, 2030), (2030, 2035), (2035, 2040)]
maps = [change_map(population_diff(population_data, start, end), 'YlOrRd') for start, end in intervals]

# Display the intervals side by side: 2020-2025 with 2025-2030, then 2030-2035 with 2035-2040.
map_compare(maps[0], maps[1])
map_compare(maps[2], maps[3])
