_**DELETE BEFORE PUBLISHING**_

_This is a template also containing the style guide for use cases. The styling uses the use-case css when uploaded to the website, which will not be visible on your local machine._

_Change any text marked with {} and delete any cells marked DELETE_

***

In [10]:
# DELETE BEFORE PUBLISHING
# This is just here so you can preview the styling on your local machine

from IPython.display import HTML
HTML("""
<style>
.usecase-title, .usecase-duration, .usecase-section-header {
    padding: 12px;
    background-color: #0f9295;
    color: #fff;
}

.usecase-title {
    font-size: 1.6em;
    font-weight: bold;
}

.usecase-authors, .usecase-level, .usecase-skill {
    padding: 12px;
    background-color: #baeaeb;
    font-size: 1.4em;
    color: #121212;
}

.usecase-level-skill  {
    display: flex;
}

.usecase-level, .usecase-skill {
    width: 50%;
}

.usecase-duration, .usecase-skill {
    text-align: right;
    padding: 12px;
    font-size: 1.4em;
}

.usecase-section-header {
    font-weight: bold;
    font-size: 1.2em;
    padding: 8px;
}

.usecase-subsection-header, .usecase-subsection-blurb {
    font-weight: bold;
    font-size: 1.2em;
    color: #121212;
}

.usecase-subsection-blurb {
    font-size: 1em;
    font-style: italic;
}

p {
    font-family: Sans-serif;
    font-size: 15px; 
}

ul {
    font-family: Sans-serif;
    font-size: 15px;
}

</style>
""")

***

_**DELETE BEFORE PUBLISHING**_

## Style guide for use cases

### Headers

For styling within your markdown cells, there are two choices you can use for headers.

1) You can use HTML classes specific to the use case styling:

```<p class="usecase-subsection-header">This is a subsection header.</p>```

<p style="font-weight: bold; font-size: 1.2em;">This is a subsection header.</p>

```<p class="usecase-subsection-blurb">This is a blurb header.</p>```

<p style="font-weight: bold; font-size: 1em; font-style:italic;">This is a blurb header.</p>


2) Or if you like you can use the markdown header styles:

```# for h1```

```## for h2```

```### for h3```

```#### for h4```

```##### for h5```

## Plot colour schemes

General advice:
1. Use the same colour or colour palette throughout your notebook, unless variety is necessary
2. Select a palette based on the type of data being represented
3. Consider accessibility (colourblindness, low vision)

#### 1) If all of your plots use only 1-2 colours, use one of the company style colours:

| Light theme | Dark Theme |
|-----|-----|
|<p style="color:#2af598;">#2af598</p>|<p style="color:#08af64;">#08af64</p>|
|<p style="color:#22e4ac;">#22e4ac</p>|<p style="color:#14a38e;">#14a38e</p>|
|<p style="color:#1bd7bb;">#1bd7bb</p>|<p style="color:#0f9295;">#0f9295</p>|
|<p style="color:#14c9cb;">#14c9cb</p>|<p style="color:#056b8a;">#056b8a</p>|
|<p style="color:#0fbed8;">#0fbed8</p>|<p style="color:#121212;">#121212</p>|
|<p style="color:#08b3e5;">#08b3e5</p>||


#### 2) If your plot needs multiple colours, choose an appropriate palette using either of the following tutorials:
- https://seaborn.pydata.org/tutorial/color_palettes.html
- https://matplotlib.org/stable/tutorials/colors/colormaps.html

#### 3) Consider accessibility as well.

For qualitative plotting, Seaborn's 'colorblind' palette is recommended. For maps with sequential or diverging data, it is recommended to use one of the Color Brewer schemes, which can be previewed at https://colorbrewer2.org/.

If you want to design your own colour scheme, it should follow the same principles as Cynthia Brewer's research (with variation not only in hue but also in saturation or luminance).
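
As a rough check on a candidate palette, the sketch below uses Python's standard `colorsys` module to list the hue, lightness and saturation of the light-theme colours from the table above. The helper name `hex_to_hls` is made up for this example, and HLS lightness is only an approximate stand-in for perceived luminance, so treat the numbers as a first look rather than an accessibility test.

```python
import colorsys

# Light-theme colours copied from the table above.
LIGHT_THEME = ["#2af598", "#22e4ac", "#1bd7bb", "#14c9cb", "#0fbed8", "#08b3e5"]


def hex_to_hls(hex_colour):
    """Convert '#rrggbb' to (hue in degrees, lightness, saturation).

    Lightness and saturation are in the range 0-1.
    hex_to_hls is a hypothetical helper written for this sketch.
    """
    value = hex_colour.lstrip("#")
    # Split into red, green, blue and scale each channel to 0-1.
    r, g, b = (int(value[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return round(h * 360, 1), round(l, 3), round(s, 3)


palette_hls = {colour: hex_to_hls(colour) for colour in LIGHT_THEME}
for colour, (hue, lightness, saturation) in palette_hls.items():
    print(f"{colour}: hue={hue}, lightness={lightness}, saturation={saturation}")
```

If the lightness values are nearly identical across a palette, the colours differ mainly in hue, which may be harder to tell apart for some readers.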

### References

Be sure to acknowledge your sources and any attributions using links or a reference list.

If you have quite a few references, you might wish to have a dedicated section for references at the end of your document, linked using footnote style numbers.

You can connect your in-text reference by adding the number with an HTML link: ```<a href="#fn-1">[1]</a>```

and add a matching ID in the reference list using the ```<fn>``` tag: ```<fn id="fn-1">[1] Author (Year) _Title_, Publisher, Publication location.</fn>```

------

------

<div class="usecase-title">Small Area Population Growth & Transportation Needs Analysis</div>

<div class="usecase-authors"><b>Authored by: </b>Angie Hollingworth and Mick Wiedermann</div>

<div class="usecase-duration"><b>Duration: </b>90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<div class="usecase-section-header">Scenario</div>

<p>
    As a city planner, City of Melbourne investor, or business owner, knowing where potential growth hotspots are projected to develop helps to better plan for meeting future needs and identify opportunities.
    <ul>  
        <li>As a city planner, I want to identify which routes I should prioritise for public transport upgrades and which for active transport upgrades.</li>  
        <li>As a business owner, I need to know which areas could have a greater demand for my goods or services when planning to establish a new business location.</li>   
        <li>As an investor, I'd like to know which suburbs are set to grow more rapidly than the average so I can minimise my risk.</li>  
    </ul>
</p>

<div class="usecase-section-header">What this use case will teach you</div>

<p>
    At the end of this use case you will:
    <ul>
        <li>be able to work with GeoPandas and visualise spatial data on interactive maps</li>
        <li>{list the skills demonstrated in your use case}</li>
    </ul>
</p>

<div class="usecase-section-header">{Heading for introduction or background relating to problem}</div>

{Write your introduction here. Keep it concise. We're not after "War and Peace" but enough background information to inform the reader on the rationale for solving this problem or background non-technical information that helps explain the approach. You may also wish to give information on the datasets, particularly how to source those not being imported from the client's open data portal.}



<div class="usecase-section-header">Which Melbourne Open Data should I use?</div>

### Population Growth Forecast Data
Our first, and arguably most important, dataset for this analysis is the *City of Melbourne Population Forecasts by Small Area 2020-2040* from Melbourne Open Data, which you can read more about [here](https://data.melbourne.vic.gov.au/People/City-of-Melbourne-Population-Forecasts-by-Small-Ar/sp4r-xphj). In summary, the dataset provides population forecasts by single year for 2020 to 2040. Prepared by SGS Economics and Planning (Jan-Jun 2021), the forecasts are available for the municipality and small areas, as well as by gender and 5-year age groups.

### Victorian Suburbs Geospatial Data
To visualise our population forecasts as a map overlay, we need the geographical coordinates of the suburbs we'll be examining. For this we'll use the *VIC Suburb/Locality Boundaries - PSMA Administrative Boundaries GeoJSON* dataset from the Australian Government site data.gov.au, which you can read more about [here](https://data.gov.au/dataset/ds-dga-af33dd8c-0534-4e18-9245-fc64440f742e/distribution/dist-dga-d467c550-fdf0-480f-85ca-79a6a30b700b/details?q=).


Before importing our datasets, we shall first import the necessary libraries to support our exploratory data analysis and visualisation.

The following are core packages required for this exercise:

- {List each non-standard package and briefly explain why you're using it. No need to list commonly used packages like numpy, math, os, time, pandas}
- GeoPandas: Allows us to plot spatial data and overlay that data on maps.
- Folium: Allows us to build interactive maps (it wraps the Leaflet.js mapping library).

In [2]:
# For importing the data and using API
from sodapy import Socrata
import os
import zipfile as zf
import requests
from io import BytesIO 

# Working with the data
import numpy as np
import pandas as pd
import geopandas as gpd

# Visualisation
import matplotlib.pyplot as plt
import folium

<div class="usecase-section-header">Importing the data</div>
<p>
    To connect to the <b>Melbourne Open Data Portal</b>, we create a client with the sodapy library by specifying two things: the domain where the data is hosted, and an application access token, which can be requested from the City of Melbourne Open Data portal by registering
<a href="https://data.melbourne.vic.gov.au/signup">here</a>.
</p>
<p>
    For this exercise we will access the domain without an application token. Each dataset in the Melbourne Open Data Portal has a unique identifier which can be used to retrieve the dataset using the sodapy library.
</p>
<p>
    The <b>City of Melbourne Population Forecasts by Small Area 2020-2040</b> dataset unique identifier is <b>sp4r-xphj</b>.
    We will pass this identifier into the sodapy command below to retrieve the data and place it into a Pandas dataframe.
</p>

In [3]:
apptoken = os.environ.get("SODAPY_APPTOKEN") # App token from the environment; None means anonymous access
domain = "data.melbourne.vic.gov.au"
client = Socrata(domain, apptoken)           # Open Dataset Connection
pop_data_unique_identifier = 'sp4r-xphj'   

population_data = pd.DataFrame.from_dict(client.get_all(pop_data_unique_identifier))



<p>
    The next dataset we need to import is the <b>Victorian Suburbs/Locality Boundaries</b> from the Australian Government site, data.gov.au, which is freely available for download via the URL below. As the data includes geometric data, we will import it into a GeoPandas dataframe.
</p>

In [4]:
suburb_geo_data_url = ('https://data.gov.au/geoserver/vic-suburb-locality-boundaries-psma-administrative-'
    + 'boundaries/wfs?request=GetFeature&typeName=ckan_af33dd8c_0534_4e18_9245_fc64440f742e&outputFormat=json')
vic_suburb_data = gpd.read_file(suburb_geo_data_url)

<p>Now, we will look at each specific dataset to better understand its structure and how we can use it.</p>
<p>
    Our data requirements for this use case include the following:
    <ul>
        <li>Number of residents per suburb</li>
        <li>Number of residents per year</li>
        <li>Suburb geometry and location</li>
        <li>There will be more.....</li>
    </ul>
</p>
<p>
    We shall start by examining the first five rows of the <b>City of Melbourne Population Forecasts by Small Area 2020-2040</b> dataset to confirm it has been imported correctly.
</p>
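
Alongside `head()`, it can be worth checking the column types. Values fetched through an API may arrive as text even when they look numeric; whether that applies here is an assumption to verify with `dtypes`. The sketch below uses two hand-typed records that mirror the first rows of the forecast data, not a live request, and the names `sample_records`, `sample` and `typed` are made up for the example.

```python
import pandas as pd

# Hand-typed stand-ins for API records. The numbers are deliberately stored as
# strings to mimic an untyped response (an assumption, not a captured payload).
sample_records = [
    {"geography": "City of Melbourne", "year": "2020", "gender": "Female", "age": "Age 0-4", "value": "2683"},
    {"geography": "City of Melbourne", "year": "2021", "gender": "Female", "age": "Age 0-4", "value": "2945"},
]
sample = pd.DataFrame.from_records(sample_records)

# Inspect the column types before doing any arithmetic.
print(sample.dtypes)

# Convert only the columns we need to calculate with.
typed = sample.astype({"year": int, "value": float})
print(typed.dtypes)
print(typed["value"].sum())
```

Summing a text column would concatenate the strings rather than add the numbers, which is why the conversion comes first.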

In [5]:
print(f'Number of (rows, columns): {population_data.shape}')
population_data.head()       

Number of (rows, columns): (16989, 5)


Unnamed: 0,geography,year,gender,age,value
0,City of Melbourne,2020,Female,Age 0-4,2683
1,City of Melbourne,2021,Female,Age 0-4,2945
2,City of Melbourne,2022,Female,Age 0-4,3212
3,City of Melbourne,2023,Female,Age 0-4,3515
4,City of Melbourne,2024,Female,Age 0-4,3833


<p>
    Moving on to the <b>Victorian Suburbs/Locality Boundaries</b> dataset: this is quite a large dataset, and we only require two columns, <b>vic_loca_2</b> (the suburb name) and <b>geometry</b> (the geographical location and boundaries of each suburb). We will extract only those columns and examine the first few rows.
</p>

In [6]:
vic_suburb_data_reduced = vic_suburb_data[['vic_loca_2', 'geometry']] # Selecting our required columns
vic_suburb_data_reduced.columns = ['suburb', 'geometry']              # Renaming for clarity
print(f'Number of (rows, columns): {vic_suburb_data_reduced.shape}')
vic_suburb_data_reduced.head()

Number of (rows, columns): (2973, 2)


Unnamed: 0,suburb,geometry
0,UNDERBOOL,"MULTIPOLYGON (((141.74552 -35.07229, 141.74552..."
1,NURRAN,"MULTIPOLYGON (((148.66877 -37.39571, 148.66876..."
2,WOORNDOO,"MULTIPOLYGON (((142.92288 -37.97886, 142.90449..."
3,DEPTFORD,"MULTIPOLYGON (((147.82336 -37.66001, 147.82313..."
4,YANAC,"MULTIPOLYGON (((141.27978 -35.99859, 141.27989..."


<div class="usecase-section-header">Filtering Our Data</div>
<p>
    As we can see in the data preview above, the population data includes many areas that are outside our target area. By passing the dataset to the following function and specifying our year of interest, we remove the unnecessary entries, clean the data, and get back a summarised version containing only our suburbs of interest for the year specified.
</p>
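
At its core, this kind of summary is a filter followed by a group-and-sum. The minimal sketch below shows the idea on invented numbers before it is applied to the real data; the `toy` table and its values are made up for illustration only.

```python
import pandas as pd

# Invented rows: two suburbs over two years, with several rows per suburb
# (as happens when the data is split by gender and age group).
toy = pd.DataFrame({
    "suburb": ["Carlton", "Carlton", "Docklands", "Docklands", "Carlton", "Docklands"],
    "year":   [2030, 2030, 2030, 2030, 2031, 2031],
    "value":  [10.0, 12.0, 7.0, 8.0, 11.0, 9.0],
})

# Keep a single year, then add up every row that belongs to the same suburb.
one_year = toy[toy["year"] == 2030]
totals = one_year.groupby("suburb", as_index=False)["value"].sum()

# Name the total after the year it describes.
totals = totals.rename(columns={"value": "population_2030"})
print(totals)
```

Each suburb ends up with one row holding the sum of all its rows for that year.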

In [7]:
def pop_data_by_year(dataset, year):
    """
    Filters and cleans the Population dataset returning a new pandas dataframe focused on the year passed to the function.
    
    Note that the year must be between 2020 and 2040 inclusive. 
    """
    # Extract the columns of interest into "summary".
    summary = dataset[['geography', 'year', 'value']]
    # Convert datatypes and rename geography to suburb
    summary = summary.astype({'year':int, 'value':float, 'geography':str})
    summary.rename(columns={'geography':'suburb'}, inplace=True)
    # Extract the data matching the year passed from the summary.
    data = summary[summary['year'] == year]
   
    # Grouping the data by suburb while summing the population values. 
    data = pd.DataFrame(data.groupby('suburb')['value'].sum())
    data = data.reset_index()
    # Renaming the column "value" to "population_year" where year represents the year passed.
    data.rename(columns={'value':f'population_{year}'}, inplace=True)
    
    # Cleaning the data and reset indexes
    data['suburb'] = data['suburb'].replace(['Melbourne (CBD)', 'Melbourne (Remainder)'], ['Melbourne', 'Melbourne'])
    data = pd.DataFrame(data.groupby('suburb')[f'population_{year}'].sum())
    data = data.reset_index()
    
    # Removing unrequired data.
    subs_to_delete = ['West Melbourne (Industrial)', 'City of Melbourne']
    subs = [data.index[data['suburb']==sub].tolist()[0] for sub in subs_to_delete]
    data.drop(subs, inplace = True)

    data = data.reset_index(drop=True)
    data['suburb'] = data['suburb'].replace(['West Melbourne (Residential)'], ['West Melbourne'])
    
    # Sorting the data
    data.sort_values('suburb', inplace = True)
    data = data.reset_index(drop=True)
    data['suburb'] = data['suburb'].astype(str)
    
    return data

We can now use the function above to examine a single year:

In [8]:
pop_data_2030 = pop_data_by_year(population_data, 2030)
pop_data_2030 

Unnamed: 0,suburb,population_2030
0,Carlton,86749.87
1,Docklands,69207.61
2,East Melbourne,23736.45
3,Kensington,49074.21
4,Melbourne,249852.84
5,North Melbourne,83938.89
6,Parkville,39775.58
7,Port Melbourne,4842.04
8,South Yarra,17033.03
9,Southbank,122042.52


Next, we want to ensure our geographical data includes only the suburbs of interest, that is, those contained within our population dataset.
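
One compact way to express this kind of filter is pandas' `isin`. The sketch below uses small made-up tables: `boundaries`, `population` and the placeholder geometry strings are assumptions for illustration, not the real datasets.

```python
import pandas as pd

# Made-up stand-ins: the boundary table uses upper-case names,
# while the population table uses title case.
boundaries = pd.DataFrame({
    "suburb": ["UNDERBOOL", "CARLTON", "YANAC", "DOCKLANDS"],
    "geometry": ["geom-a", "geom-b", "geom-c", "geom-d"],  # placeholders, not real shapes
})
population = pd.DataFrame({"suburb": ["Carlton", "Docklands"]})

# Match the casing of the boundary table before comparing names.
target_suburbs = population["suburb"].str.upper()

# Keep only the rows whose suburb appears in the target list, then tidy up.
selected = boundaries[boundaries["suburb"].isin(target_suburbs)].copy()
selected["suburb"] = selected["suburb"].str.title()
selected = selected.sort_values("suburb").reset_index(drop=True)
print(selected)
```

Note that `isin` keeps every row whose name matches, so if a suburb name appeared more than once in the boundary table, all of those rows would be retained.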

In [9]:
# Extract the suburbs of interest that match the population_data into "target_suburbs".
target_suburbs = pop_data_2030['suburb'].str.upper()

# Locate the index label of the first row matching each target suburb and store as a list in "subs".
subs = [vic_suburb_data_reduced.index[vic_suburb_data_reduced['suburb']==sub].tolist()[0] for sub in target_suburbs]

# Create a new dataframe for the Melbourne suburbs, "mel_suburb_data", and reformat.
mel_suburb_data = pd.DataFrame(vic_suburb_data_reduced.loc[subs])
mel_suburb_data = mel_suburb_data.reset_index(drop=True)
mel_suburb_data['suburb'] = mel_suburb_data['suburb'].str.title()
mel_suburb_data 

Unnamed: 0,suburb,geometry
0,Carlton,"MULTIPOLYGON (((144.97401 -37.80311, 144.97320..."
1,Docklands,"MULTIPOLYGON (((144.95376 -37.82363, 144.95336..."
2,East Melbourne,"MULTIPOLYGON (((144.97136 -37.80773, 144.97308..."
3,Kensington,"MULTIPOLYGON (((144.92282 -37.79913, 144.91977..."
4,Melbourne,"MULTIPOLYGON (((144.97797 -37.83867, 144.97803..."
5,North Melbourne,"MULTIPOLYGON (((144.95599 -37.80588, 144.95360..."
6,Parkville,"MULTIPOLYGON (((144.96521 -37.79315, 144.96460..."
7,Port Melbourne,"MULTIPOLYGON (((144.90749 -37.84326, 144.90652..."
8,South Yarra,"MULTIPOLYGON (((145.00455 -37.84131, 145.00453..."
9,Southbank,"MULTIPOLYGON (((144.97041 -37.83016, 144.97030..."

