<div class="usecase-title"><h1>Effect of Business Type/Residential Properties on Energy Consumption</h1></div>

<div class="usecase-authors"><b>Authored by: Mackenzie Gong</b></div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<header>
<h2>Scenario</h2>
<div> </div>




As a city planner or policy maker, I need to understand how different types of business and residential properties influence energy consumption patterns in Melbourne. This understanding will enable me to make informed decisions about energy efficiency initiatives, retrofitting projects, and sustainable urban development. By analyzing energy consumption data in conjunction with business types, residential properties, and employment statistics, I can identify high-consumption areas and develop targeted strategies to reduce energy use and promote sustainability.


<h2>Introduction</h2>

Energy consumption is a critical aspect of urban sustainability and economic efficiency. Cities around the world are striving to reduce their carbon footprints, improve energy efficiency, and promote sustainable living. Melbourne, being a vibrant and diverse city, has a wide range of business establishments, residential properties, and employment hubs, each contributing differently to the overall energy consumption.

This use case aims to analyze the effect of business types and residential properties on energy consumption in Melbourne using five key datasets:



**1. Block level energy consumption (modelled on building attributes) - 2021 projection - retrofit scenario**

https://data.melbourne.vic.gov.au/explore/dataset/block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r/information/?location=13,-37.81338,144.96846&basemap=mbs-7a7333

Analysis:

- Analyze the overall energy consumption patterns across different blocks.
- Compare energy consumption between blocks with and without retrofitting.

Tasks:

- Load and preprocess the dataset to handle missing values and ensure consistency.
- Perform statistical analysis to identify blocks with the highest and lowest energy consumption.
- Visualize energy consumption patterns using heatmaps and bar charts.

**2. Floor space per space use for blocks**

https://data.melbourne.vic.gov.au/explore/dataset/floor-space-by-use-by-block/information/


Analysis:

- Investigate how the floor space dedicated to different uses (e.g., residential, commercial) affects energy consumption.
- Correlate floor space data with energy usage to identify high-consumption uses.

Tasks:

- Integrate the floor space data with the energy consumption data using block identifiers.
- Perform regression analysis to explore the relationship between floor space and energy consumption.
- Create visualizations to compare energy consumption across different space uses.
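
As a minimal sketch of the regression task above, assuming the floor space and energy tables have already been joined on block ID. The column names `office_floor_space` and `energy` and all values are made up for illustration and are not taken from the datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical merged table, one row per block (illustrative values only)
merged = pd.DataFrame({
    "block_id": [1, 2, 3, 4, 5],
    "office_floor_space": [1000.0, 2000.0, 3000.0, 4000.0, 5000.0],  # square meters
    "energy": [110.0, 205.0, 310.0, 395.0, 505.0],                    # modelled consumption
})

# Ordinary least squares fit of energy = slope * floor_space + intercept
slope, intercept = np.polyfit(merged["office_floor_space"], merged["energy"], deg=1)

# Coefficient of determination (R^2) of the fitted line
predicted = slope * merged["office_floor_space"] + intercept
ss_res = ((merged["energy"] - predicted) ** 2).sum()
ss_tot = ((merged["energy"] - merged["energy"].mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A single-variable fit like this only describes association. With many space-use columns, a multiple regression would probably be the more informative next step.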

**3. Business establishments per ANZSIC for blocks**

https://data.melbourne.vic.gov.au/explore/dataset/business-establishments-per-block-by-anzsic/information/

Analysis:

- Explore the impact of different types of businesses on energy consumption.
- Identify which business types are associated with higher energy usage.

Tasks:

- Merge the business establishments data with the energy consumption dataset.
- Conduct a categorical analysis to assess energy consumption across different ANZSIC codes.
- Use clustering techniques to group blocks with similar business types and analyze their energy usage patterns.
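
The categorical comparison in the tasks above could start from something like the following sketch. It labels each block with its dominant industry and compares mean energy use per label. Clustering is not shown here. The industry column names and all numbers are placeholders, not values from the ANZSIC dataset.

```python
import pandas as pd

# Hypothetical block-level table: establishment counts per industry plus energy
blocks = pd.DataFrame({
    "block_id": [11, 12, 13, 14],
    "Retail Trade": [12, 1, 8, 0],
    "Accommodation": [2, 9, 1, 7],
    "energy": [300.0, 520.0, 260.0, 480.0],
}).set_index("block_id")

industry_cols = ["Retail Trade", "Accommodation"]

# Dominant industry = the column with the largest establishment count in each row
blocks["dominant_industry"] = blocks[industry_cols].idxmax(axis=1)

# Mean energy consumption of blocks, grouped by their dominant industry
mean_energy = blocks.groupby("dominant_industry")["energy"].mean()
print(mean_energy)
```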

**4. Jobs per CLUE industry for blocks**

https://data.melbourne.vic.gov.au/explore/dataset/employment-by-block-by-clue-industry/information/

Analysis:

- Examine the relationship between employment density and energy consumption.
- Identify industries that contribute significantly to energy usage.

Tasks:

- Integrate the employment data with the energy consumption and business establishment datasets.
- Perform correlation analysis to study the impact of job density on energy consumption.
- Visualize the distribution of jobs and energy consumption across different blocks and industries.
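
A rough sketch of the correlation step, assuming one row per block with its area, job count, and energy total. The column names and numbers are invented for illustration. Both quantities are normalised by block area so that large blocks do not dominate the result.

```python
import pandas as pd

# Hypothetical per-block table (illustrative values only)
blocks = pd.DataFrame({
    "block_area_m2": [10000.0, 20000.0, 5000.0, 8000.0],
    "total_jobs": [500, 400, 600, 80],
    "energy": [900.0, 700.0, 1000.0, 150.0],
})

# Normalise by area: jobs per square meter and energy per square meter
blocks["job_density"] = blocks["total_jobs"] / blocks["block_area_m2"]
blocks["energy_intensity"] = blocks["energy"] / blocks["block_area_m2"]

# Pearson correlation coefficient between the two normalised series
pearson_r = blocks["job_density"].corr(blocks["energy_intensity"])
print(f"Pearson r = {pearson_r:.3f}")
```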


**5. Blocks for Census of Land Use and Employment (CLUE)**

https://data.melbourne.vic.gov.au/explore/dataset/blocks-for-census-of-land-use-and-employment-clue/table/







By the end of this use case, you will have gained experience in the following areas:

- Data Integration and Preparation
- Exploratory Data Analysis
- Statistical Analysis and Modelling
- Spatial Analysis
- Energy Efficiency and Sustainability Insights


# 1. Understand datasets

In [2]:
import os
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load each CSV file into a dataframe
# Print its shape, column names and first 2 rows

folder_path = '/content/drive/My Drive/Colab Notebooks/'
csv_names = ['block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv',
            'floor-space-by-use-by-block.csv',
            'business-establishments-per-block-by-anzsic.csv',
            'employment-by-block-by-clue-industry.csv',
            'blocks-for-census-of-land-use-and-employment-clue.csv']

dataframes = []
for csv in csv_names:
    csv_path = folder_path + csv
    df = pd.read_csv(csv_path)
    dataframes.append(df)
    print("=========================")
    print(f"Dataset: {csv}")
    print(f'shape: {df.shape}')
    print(f'column names: {df.columns}')
    print()
    print('First 2 rows:')
    print(df.head(2))
    print()

Dataset: block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv
shape: (640, 3)
column names: Index(['Geo Point', 'Geo Shape', 'total'], dtype='object')

First 2 rows:
                                 Geo Point  \
0  -37.843406112352376, 144.98490735020218   
1   -37.84218330812017, 144.98515315955768   

                                           Geo Shape       total  
0  {"coordinates": [[[[144.98541694518966, -37.84...   68.776427  
1  {"coordinates": [[[[144.9855461804882, -37.842...  162.968659  

Dataset: floor-space-by-use-by-block.csv
shape: (12394, 41)
column names: Index(['Census year', 'Block ID', 'CLUE small area',
       'Commercial Accommodation', 'Common Area', 'Community Use',
       'Educational/Research', 'Entertainment/Recreation - Indoor',
       'Equipment Installation', 'Hospital/Clinic', 'House/Townhouse',
       'Institutional Accommodation', 'Manufacturing', 'Office',
       'Park/Reserve', 'Parking - Commercial Covered',
       '

<h2>Tasks and Outcomes</h2>

**dataset1: block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv**

- Geo Point: Use df5 to map to the block ID and CLUE small area.
- Geo Shape: Retain to calculate the square meter area of the block.
- Total (Energy Consumption): Retain as the key value for analysis.

**dataset2: floor-space-by-use-by-block.csv**

- Block ID: A block can appear in several census years; keep only its most recent record.
- CLUE small area: Retain.
- Land Use Type: Combine with df3 and df4; unify and merge the columns related to business type, industry, and land use.


**dataset3: business-establishments-per-block-by-anzsic.csv**

- Block ID: A block can appear in several census years; keep only its most recent record.
- CLUE small area: Retain.
- Business Type: Combine with df2 and df4; unify and merge the columns related to business type, industry, and land use.

**dataset4: employment-by-block-by-clue-industry.csv**

- Block ID: A block can appear in several census years; keep only its most recent record.
- CLUE small area: Retain.
- Industry: Combine with df2 and df3; unify and merge the columns related to business type, industry, and land use.

**dataset5: blocks-for-census-of-land-use-and-employment-clue.csv**

- Use the geopoints in df1 to map to block ID and CLUE name.



<h2>Outcome for preprocessing data</h2>


**dataframe_1 (key: block_id)**

columns:
block id, CLUE small area, size (square meters), energy consumption, jobs provided, commercial area (sq m), commercial establishments, residential area, {commercial type, commercial area (sq m), commercial establishments, jobs provided}

**dataframe_2 (key: land use/industry/business type)**

columns:
land use/industry, size, jobs, establishments, {block id: {size, establishments, jobs}}, {CLUE area: {size, establishments, jobs}}


Potential insights from the datasets:
- The block/CLUE area with the highest and lowest energy consumption per square meter.
- The block/CLUE area with the most jobs provided per unit of energy consumption, and their associated business types.
- The average energy consumption per square meter and per establishment in each block/CLUE area.
- The dominant type(s) of business in each block/CLUE area.
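
The first three insights reduce to a few derived columns. The sketch below assumes a per-block table shaped like the planned `dataframe_1`. The column names and every value are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical per-block table (illustrative values only)
df = pd.DataFrame({
    "block_id": [101, 102, 103, 104],
    "CLUE": ["Carlton", "Carlton", "Docklands", "Docklands"],
    "block_area": [12000.0, 8000.0, 20000.0, 10000.0],   # square meters
    "total_energy": [600.0, 800.0, 500.0, 1500.0],
    "total_jobs": [300, 200, 1000, 750],
})

# Energy per square meter and jobs per unit of energy for each block
df["energy_per_m2"] = df["total_energy"] / df["block_area"]
df["jobs_per_energy"] = df["total_jobs"] / df["total_energy"]

# Blocks with the highest and lowest energy consumption per square meter
highest = df.loc[df["energy_per_m2"].idxmax(), "block_id"]
lowest = df.loc[df["energy_per_m2"].idxmin(), "block_id"]

# Area-level figure: sum energy and area first, then divide,
# so that large blocks carry proportionally more weight
by_area = df.groupby("CLUE")[["block_area", "total_energy"]].sum()
by_area["energy_per_m2"] = by_area["total_energy"] / by_area["block_area"]

print(highest, lowest)
print(by_area)
```

Summing before dividing gives an area-weighted figure for each CLUE area. Averaging the per-block ratios instead would give every block equal weight regardless of size.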


# 2. Preprocessing

In [4]:
from shapely.geometry import Point, MultiPolygon, Polygon
from shapely.ops import transform
from geopy.distance import geodesic

import ast
import pyproj


# Function to calculate the approximate area of a MultiPolygon, given a list of coordinates
def calculate_multipolygon_area(coords_list):
    '''
    Return the approximate area in square meters.

    The coordinates should be in the format [[[longitude, latitude], ...], ...].
    '''

    # Create a list of Polygon objects from the coordinate list
    polygons = [Polygon(coords) for coords in coords_list]

    # Create a MultiPolygon object from the list of Polygons
    multipolygon = MultiPolygon(polygons)

    # Use the pyproj library to handle the projection
    wgs84 = pyproj.CRS('EPSG:4326')
    # WGS 84 / UTM zone 55S, which covers Melbourne. World Mercator (EPSG:3395)
    # is not area-preserving and overstates areas at this latitude.
    utm = pyproj.CRS('EPSG:32755')

    # Define a function to project (longitude, latitude) coordinates to the UTM system
    project = pyproj.Transformer.from_crs(wgs84, utm, always_xy=True).transform

    # Project the MultiPolygon to the UTM system
    multipolygon_proj = transform(project, multipolygon)

    # Calculate the area in square meters
    area_m2 = multipolygon_proj.area

    return area_m2


# Function to verify whether a geo point is within the range of a multipolygon area
def verify_point_in_multipolygon(point_coords, multipolygon_coords):
    """
    Check if a point is within a MultiPolygon area.
    """
    lat_point, lon_point = map(float, point_coords.split(", "))
    point = Point(lon_point, lat_point)  # the order: (longitude, latitude)

    polygons = [Polygon([(lon, lat) for lon, lat in polygon]) for polygon in multipolygon_coords]
    multipolygon = MultiPolygon(polygons)

    # Check if the point is within the multipolygon
    is_within = point.within(multipolygon)

    return is_within


# Convert a dictionary-like string to dictionary
def convert_to_dict(dict_string):
    try:
        # Use ast.literal_eval to safely evaluate the string
        result_dict = ast.literal_eval(dict_string)
        return result_dict
    except Exception as e:
        print(f"Error: {e}")
        return None


def update_cencus_year(df):
    '''
    Keep one row per Block ID: the record from the most recent census year up to 2021.
    '''
    # Ignore census years after 2021, the year of the energy projection
    eligible = df[df['Census year'].astype(int) < 2022]

    # For each block, pick the row with the latest census year
    # (idxmax keeps the first such row when several share that year)
    latest_index = eligible.groupby('Block ID')['Census year'].idxmax()

    # Restore a 0..n-1 index so rows can be addressed by position with .loc
    df_output = eligible.loc[latest_index].reset_index(drop=True)

    return df_output
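
Block areas computed by `calculate_multipolygon_area` are only as reliable as the projection they are measured in. As a rough sanity check, and not part of the original notebook, a Mercator-type projection stretches both axes by about 1/cos(latitude), so areas are inflated by about 1/cos²(latitude). The helper name `mercator_area_inflation` is hypothetical, and the formula is a spherical approximation.

```python
import math

def mercator_area_inflation(latitude_deg):
    """Approximate factor by which a spherical Mercator projection inflates areas."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2

melbourne_lat = -37.81  # approximate latitude of central Melbourne (assumption)
factor = mercator_area_inflation(melbourne_lat)
print(f"Mercator area inflation near Melbourne: about {factor:.2f}x")
```

At Melbourne's latitude the factor is roughly 1.6, which is why a local UTM zone or an equal-area projection is usually the safer choice when areas feed into per-square-meter metrics.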

In [13]:
# Process information in df1 and df5

df_1 = dataframes[0]
df_5 = dataframes[4]

block_id = []
clue = []
area = []
total = []
geo_point = []

for index_in_1, row_in_1 in df_1.iterrows():
    point = row_in_1['Geo Point']
    area_1 = calculate_multipolygon_area(convert_to_dict(row_in_1['Geo Shape'])['coordinates'][0])
    found = 0
    for index_in_5, row_in_5 in df_5.iterrows():
        poly_coords = convert_to_dict(row_in_5['Geo Shape'])['coordinates']
        #poly_coords = [row_in_5['Geo Shape'].split("type")[0][:-3].split(": ")[1]]
        result = verify_point_in_multipolygon(point, poly_coords)
        if result:
            block_id.append(row_in_5['block_id'])
            geo_point.append(point)
            clue.append(row_in_5['clue_area'])
            total.append(row_in_1['total'])
            area.append(area_1)
            found += 1
            break

    if found == 0:
        print(f'Not found! Row:  {index_in_1+1}')


Not found! Row:  16


In [6]:
# Merge rows that share the same block_id by summing their area and energy

seen_id = []

block_id_cleaned = []
clue_cleaned = []
area_cleaned = []
total_cleaned = []

dict_block = {}

for n,id in enumerate(block_id):
    if id not in seen_id:
        seen_id.append(id)
        block_id_cleaned.append(id)
        clue_cleaned.append(clue[n])
        area_cleaned.append(area[n])
        total_cleaned.append(total[n])
    else:
        first_index = seen_id.index(id)
        area_cleaned[first_index] += area[n]
        total_cleaned[first_index] += total[n]
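
The same aggregation can be expressed with a pandas `groupby`. This is a sketch on toy stand-ins for the `block_id`, `clue`, `area` and `total` lists, not a drop-in replacement. The output column names are assumptions.

```python
import pandas as pd

# Toy stand-ins for the parallel lists built in the matching step
block_id = [101, 102, 101, 103]
clue = ["Carlton", "Carlton", "Carlton", "Docklands"]
area = [1000.0, 2500.0, 500.0, 4000.0]
total = [50.0, 80.0, 20.0, 120.0]

matched = pd.DataFrame({"block_id": block_id, "CLUE": clue,
                        "block_area": area, "total_energy": total})

# One row per block: keep the first CLUE name, sum the areas and energy totals.
# sort=False keeps blocks in order of first appearance, like the loop version.
merged_blocks = (
    matched.groupby("block_id", sort=False)
    .agg(CLUE=("CLUE", "first"),
         block_area=("block_area", "sum"),
         total_energy=("total_energy", "sum"))
    .reset_index()
)
print(merged_blocks)
```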

In [7]:
# Process datasets 2, 3 and 4
# For each block, keep only the most recent census record up to and including 2021

df_2 = dataframes[1]
df_3 = dataframes[2]
df_4 = dataframes[3]

df_2_recent = update_cencus_year(df_2)
df_3_recent = update_cencus_year(df_3)
df_4_recent = update_cencus_year(df_4)

print(len(df_2_recent.columns)-3)
print(len(df_3_recent.columns)-3)
print(len(df_4_recent.columns)-3)

print(df_2_recent.columns[3:])
print(df_3_recent.columns[3:])
print(df_4_recent.columns[3:])



38
20
21
Index(['Commercial Accommodation', 'Common Area', 'Community Use',
       'Educational/Research', 'Entertainment/Recreation - Indoor',
       'Equipment Installation', 'Hospital/Clinic', 'House/Townhouse',
       'Institutional Accommodation', 'Manufacturing', 'Office',
       'Park/Reserve', 'Parking - Commercial Covered',
       'Parking - Commercial Uncovered', 'Parking - Private Covered',
       'Parking - Private Uncovered', 'Performances, Conferences, Ceremonies',
       'Private Outdoor Space', 'Public Display Area', 'Residential Apartment',
       'Retail - Cars', 'Retail - Shop', 'Retail - Showroom', 'Retail - Stall',
       'Sports and Recreation - Outdoor', 'Square/Promenade', 'Storage',
       'Student Accommodation', 'Transport', 'Transport/Storage - Uncovered',
       'Unoccupied - Under Construction',
       'Unoccupied - Under Demolition/Condemned',
       'Unoccupied - Under Renovation', 'Unoccupied - Undeveloped Site',
       'Unoccupied - Unused', 'Wholesale

In [None]:
# First, look up the block-level totals (regardless of business type) for every matched block.
# Blocks without a record in a dataset get 0, so the three lists stay aligned with block_id.

def total_by_block(df, column):
    # Map each Block ID to its integer total in `column`, treating missing values as 0
    totals = pd.to_numeric(df[column], errors='coerce').fillna(0).astype(int)
    return dict(zip(df['Block ID'], totals))


space_by_block = total_by_block(df_2_recent, 'Total floor space in block')
establishments_by_block = total_by_block(df_3_recent, 'Total establishments in block')
jobs_by_block = total_by_block(df_4_recent, 'Total jobs in block')

commercial_space = [space_by_block.get(block, 0) for block in block_id]
establishments = [establishments_by_block.get(block, 0) for block in block_id]
jobs = [jobs_by_block.get(block, 0) for block in block_id]



In [14]:
# Create the cleaned dataframe and save

dict_key_blockid = {'block_id':block_id,
                    'geo_point_coords':geo_point,
                    'CLUE':clue,
                    'block_area':area,
                    'commercial_area':commercial_space,
                    'total_energy_consupmtion':total,
                    'total_jobs':jobs,
                    'business_establishments':establishments}

df_blockid = pd.DataFrame(dict_key_blockid)
# Save the DataFrame to a CSV file
df_blockid.to_csv('/content/drive/My Drive/Colab Notebooks/blockid.csv', index=True)

In [15]:
df_blockid = pd.read_csv('/content/drive/My Drive/Colab Notebooks/blockid.csv')

In [None]:
# Merge datasets 2, 3 and 4 so that they share the same column names

df_2_recent
df_3_recent
df_4_recent
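
One possible way to give the three tables a common schema is to reshape each from wide to long form, so that every row holds a block, a category, and a value. The sketch below is an assumption about how that could look, not the notebook's method. It assumes all three tables share the identifier columns `Census year`, `Block ID` and `CLUE small area`, and it uses a toy table. The helper `to_long` and the output column names are hypothetical.

```python
import pandas as pd

def to_long(df, value_name):
    """Reshape a wide per-block table into one row per (block, category)."""
    id_cols = ["Census year", "Block ID", "CLUE small area"]
    long_df = df.melt(id_vars=id_cols, var_name="category", value_name=value_name)
    # Drop categories that have no recorded value for a block
    return long_df.dropna(subset=[value_name]).reset_index(drop=True)

# Toy stand-in for the floor space table (illustrative values only)
floor_space = pd.DataFrame({
    "Census year": [2021, 2021],
    "Block ID": [1, 2],
    "CLUE small area": ["Carlton", "Docklands"],
    "Office": [1200.0, None],
    "Retail - Shop": [300.0, 450.0],
})

long_space = to_long(floor_space, "floor_space_m2")
print(long_space)
```

Once each table is in this shape, the category labels can be mapped to a shared vocabulary and the tables joined on `Block ID` and the unified category.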

