<div class="usecase-title"><h1>Effect of Business Type/Residential Properties on Energy Consumption</h1></div>

<div class="usecase-authors"><b>Authored by: Mackenzie Gong</b></div>

<div class="usecase-duration"><b>Duration:</b> 90 mins</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python</div>
</div>

<header>
<h2>Scenario</h2>
<div> </div>




As a city planner or policy maker, I need to understand how different types of business and residential properties influence energy consumption patterns in Melbourne. This understanding will enable me to make informed decisions about energy efficiency initiatives, retrofitting projects, and sustainable urban development. By analyzing energy consumption data in conjunction with business types, residential properties, and employment statistics, I can identify high-consumption areas and develop targeted strategies to reduce energy use and promote sustainability.


<h2>Introduction</h2>

Energy consumption is a critical aspect of urban sustainability and economic efficiency. Cities around the world are striving to reduce their carbon footprints, improve energy efficiency, and promote sustainable living. Melbourne, being a vibrant and diverse city, has a wide range of business establishments, residential properties, and employment hubs, each contributing differently to the overall energy consumption.

This use case analyzes the effect of business types and residential properties on energy consumption in Melbourne using five datasets:



**1. Block level energy consumption (modelled on building attributes) - 2021 projection - retrofit scenario**
https://data.melbourne.vic.gov.au/explore/dataset/block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r/information/?location=13,-37.81338,144.96846&basemap=mbs-7a7333

Analysis:

- Analyze the overall energy consumption patterns across different blocks.
- Compare energy consumption between blocks with and without retrofitting.

Tasks:

- Load and preprocess the dataset to handle missing values and ensure consistency.
- Perform statistical analysis to identify blocks with the highest and lowest energy consumption.
- Visualize energy consumption patterns using heatmaps and bar charts.

**2. Floor space per space use for blocks**

https://data.melbourne.vic.gov.au/explore/dataset/floor-space-by-use-by-block/information/


Analysis:

- Investigate how the floor space dedicated to different uses (e.g., residential, commercial) affects energy consumption.
- Correlate floor space data with energy usage to identify high-consumption uses.

Tasks:

- Integrate the floor space data with the energy consumption data using block identifiers.
- Perform regression analysis to explore the relationship between floor space and energy consumption.
- Create visualizations to compare energy consumption across different space uses.
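
As a minimal sketch of the regression task above, an ordinary least-squares line can be fitted in plain Python. The floor-space and energy values below are made-up toy numbers, and `fit_line` is a hypothetical helper, not part of the notebook.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (illustrative only): floor space in sqm and energy use per block
floor_space = [1000.0, 2000.0, 3000.0, 4000.0]
energy = [150.0, 250.0, 350.0, 450.0]

slope, intercept = fit_line(floor_space, energy)
print(f"energy = {slope:.3f} * floor_space + {intercept:.1f}")
```

With real data, the slope would estimate the additional energy use associated with each extra square meter of floor space, under the assumption that the relationship is roughly linear.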

**3. Business establishments per ANZSIC for blocks**

https://data.melbourne.vic.gov.au/explore/dataset/business-establishments-per-block-by-anzsic/information/

Analysis:
- Explore the impact of different types of businesses on energy consumption.
- Identify which business types are associated with higher energy usage.

Tasks:

- Merge the business establishments data with the energy consumption dataset.
- Conduct a categorical analysis to assess energy consumption across different ANZSIC codes.
- Use clustering techniques to group blocks with similar business types and analyze their energy usage patterns.

**4. Jobs per CLUE industry for blocks**
https://data.melbourne.vic.gov.au/explore/dataset/employment-by-block-by-clue-industry/information/

Analysis:
- Examine the relationship between employment density and energy consumption.
- Identify industries that contribute significantly to energy usage.

Tasks:

- Integrate the employment data with the energy consumption and business establishment datasets.
- Perform correlation analysis to study the impact of job density on energy consumption.
- Visualize the distribution of jobs and energy consumption across different blocks and industries.


**5. Blocks for Census of Land Use and Employment (CLUE)**
https://data.melbourne.vic.gov.au/explore/dataset/blocks-for-census-of-land-use-and-employment-clue/table/







By the end of this use case, you will have gained experience in the following areas:

- Data Integration and Preparation
- Exploratory Data Analysis
- Statistical Analysis and Modelling
- Spatial Analysis
- Energy Efficiency and Sustainability Insights



# 1. Understand datasets

In [1]:
import os
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [111]:
# Load each CSV file into a dataframe
# Print its shape, column names and first 2 rows

folder_path = '/content/drive/My Drive/Colab Notebooks/'
csv_names = ['block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv',
            'floor-space-by-use-by-block.csv',
            'business-establishments-per-block-by-anzsic.csv',
            'employment-by-block-by-clue-industry.csv',
            'blocks-for-census-of-land-use-and-employment-clue.csv']

dataframes = []
for csv in csv_names:
    csv_path = folder_path + csv
    df = pd.read_csv(csv_path)
    dataframes.append(df)
    print("=========================")
    print(f"Dataset: {csv}")
    print(f'shape: {df.shape}')
    print(f'column names: {df.columns}')
    print()
    print('First 2 rows:')
    print(df.head(2))
    print()

Dataset: block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv
shape: (640, 3)
column names: Index(['Geo Point', 'Geo Shape', 'total'], dtype='object')

First 2 rows:
                                 Geo Point  \
0  -37.843406112352376, 144.98490735020218   
1   -37.84218330812017, 144.98515315955768   

                                           Geo Shape       total  
0  {"coordinates": [[[[144.98541694518966, -37.84...   68.776427  
1  {"coordinates": [[[[144.9855461804882, -37.842...  162.968659  

Dataset: floor-space-by-use-by-block.csv
shape: (12394, 41)
column names: Index(['Census year', 'Block ID', 'CLUE small area',
       'Commercial Accommodation', 'Common Area', 'Community Use',
       'Educational/Research', 'Entertainment/Recreation - Indoor',
       'Equipment Installation', 'Hospital/Clinic', 'House/Townhouse',
       'Institutional Accommodation', 'Manufacturing', 'Office',
       'Park/Reserve', 'Parking - Commercial Covered',
       '

<h2>Tasks and Outcomes</h2>

**dataset1: block-level-energy-consumption-modelled-on-building-attributes-2021-projection-r.csv**

- Geo Point: Use df5 to map to the block ID and CLUE small area.
- Geo Shape: Retain to calculate the square meter area of the block.
- Total (Energy Consumption): Retain as the key value for analysis.

**dataset2: floor-space-by-use-by-block.csv**

- Block ID: A block can appear in several census years; keep only its most recent record (no later than 2021).
- CLUE small area: Retain.
- Land Use Type: Group the land use columns into the same categories used for df3 and df4, so that all three datasets share column names.


**dataset3: business-establishments-per-block-by-anzsic.csv**

- Block ID: A block can appear in several census years; keep only its most recent record (no later than 2021).
- CLUE small area: Retain.
- Business Type: Group the business type columns into the same categories used for df2 and df4, so that all three datasets share column names.

**dataset4: employment-by-block-by-clue-industry.csv**

- Block ID: A block can appear in several census years; keep only its most recent record (no later than 2021).
- CLUE small area: Retain.
- Industry: Group the industry columns into the same categories used for df2 and df3, so that all three datasets share column names.

**dataset5: blocks-for-census-of-land-use-and-employment-clue.csv**

- Use the geopoints in df1 to map to block ID and CLUE name.
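
The mapping described above amounts to a point-in-polygon test. The notebook uses shapely for this; the sketch below shows the same idea with plain-Python ray casting, for illustration only. The helper names, block IDs, and square outlines are hypothetical, and holes in polygons are ignored.

```python
def point_in_ring(lon, lat, ring):
    """Return True if (lon, lat) lies inside a ring of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_block(point_text, block_rings):
    """Map a 'lat, lon' string (the Geo Point format) to the first matching block ID."""
    lat, lon = map(float, point_text.split(", "))
    for block_id, ring in block_rings.items():
        if point_in_ring(lon, lat, ring):
            return block_id
    return None


# Two adjacent toy squares standing in for block outlines (illustrative only)
block_rings = {
    101: [(144.0, -38.0), (144.1, -38.0), (144.1, -37.9), (144.0, -37.9)],
    102: [(144.1, -38.0), (144.2, -38.0), (144.2, -37.9), (144.1, -37.9)],
}

match_a = find_block("-37.95, 144.05", block_rings)
match_b = find_block("-37.95, 144.15", block_rings)
match_none = find_block("-37.50, 144.05", block_rings)
```

A point that falls in no outline returns `None`, which corresponds to the "Not found" case that the preprocessing step reports.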



<h2>Outcome for preprocessing data</h2>


**dataframe_1 (key: block_id)**

columns:
block id, CLUE small area, size (square meters), energy consumption, jobs provided, commercial area (sq m), commercial establishments, residential area, {commercial type, commercial area (sq m), commercial establishments, jobs provided}

**dataframe_2 (key: land use/industry/business type)**

columns:
land use/industry, size, job, establishment, {block id: {size, establishment, job}}, {CLUE area: {size, establishment, job}}


Potential insights from the datasets:
- The block/CLUE area with the highest and lowest energy consumption per square meter.
- The block/CLUE area with the most jobs provided per unit of energy consumption, and their associated business types.
- The average energy consumption per square meter and per establishment in each block/CLUE area.
- The dominant type(s) of business in each block/CLUE area.
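
As a rough sketch of how the per-area and per-job indicators above could be derived once a block-level table exists, the example below uses a small toy table. The column name `floor_area_sqm` and all values are assumptions made for illustration, not taken from the datasets.

```python
import pandas as pd

# Toy values for illustration only
blocks = pd.DataFrame({
    "block_id": [1, 2, 3],
    "total_energy_consumption": [100.0, 300.0, 50.0],
    "floor_area_sqm": [1000.0, 1500.0, 0.0],
    "total_jobs": [20, 90, 0],
})

# Treat zero floor area as missing so the division yields NaN instead of inf
floor_area = blocks["floor_area_sqm"].where(blocks["floor_area_sqm"] > 0)
blocks["energy_per_sqm"] = blocks["total_energy_consumption"] / floor_area
blocks["jobs_per_energy_unit"] = blocks["total_jobs"] / blocks["total_energy_consumption"]

# idxmax / idxmin skip NaN, so blocks without floor area are ignored here
highest = blocks.loc[blocks["energy_per_sqm"].idxmax(), "block_id"]
lowest = blocks.loc[blocks["energy_per_sqm"].idxmin(), "block_id"]
print(highest, lowest)
```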


# 2. Preprocessing

In [112]:
from shapely.geometry import Point, MultiPolygon, Polygon
from shapely.ops import transform
from geopy.distance import geodesic

import pyproj,ast


# Function to calculate the approximate area of a MultiPolygon, given a list of coordinates
def calculate_multipolygon_area(coords_list):
    '''
    The coordinates should be in the format [[[longitude, latitude], ...], ...].

    Note: the area is measured in EPSG:3395 (World Mercator), which is not an
    equal-area projection, so the result overstates the true ground area
    away from the equator.
    '''

    # Create a list of Polygon objects from the coordinate list
    polygons = [Polygon(coords) for coords in coords_list]

    # Create a MultiPolygon object from the list of Polygons
    multipolygon = MultiPolygon(polygons)

    # Use the pyproj library to handle the projection
    wgs84 = pyproj.CRS('EPSG:4326')
    mercator = pyproj.CRS('EPSG:3395')  # WGS 84 / World Mercator (not a UTM zone)

    # Define a function to project coordinates to World Mercator
    project = pyproj.Transformer.from_crs(wgs84, mercator, always_xy=True).transform

    # Project the MultiPolygon to World Mercator
    multipolygon_proj = transform(project, multipolygon)

    # Calculate the area in projected square meters
    area_m2 = multipolygon_proj.area

    return area_m2


# Function to verify whether a geo point is within the range of a multipolygon area
def verify_point_in_multipolygon(point_coords, multipolygon_coords):
    """
    Check if a point is within a MultiPolygon area.
    """
    lat_point, lon_point = map(float, point_coords.split(", "))
    point = Point(lon_point, lat_point)  # the order: (longitude, latitude)

    polygons = [Polygon([(lon, lat) for lon, lat in polygon]) for polygon in multipolygon_coords]
    multipolygon = MultiPolygon(polygons)

    # Check if the point is within the multipolygon
    is_within = point.within(multipolygon)

    return is_within


# Convert a dictionary-like string to dictionary
def convert_to_dict(dict_string):
    try:
        # Use ast.literal_eval to safely evaluate the string
        result_dict = ast.literal_eval(dict_string)
        return result_dict
    except Exception as e:
        print(f"Error: {e}")
        return None


def update_cencus_year(df):
    # For each Block ID, keep only the row with the latest census year before 2022
    df_output = pd.DataFrame(columns=df.columns)

    seen_dic = {}
    seen_block = []
    new_df_index = 0

    for n,row in df.iterrows():
        if int(row['Census year']) < 2022:

            if row['Block ID'] not in seen_block:
                df_output.loc[new_df_index] = row

                seen_dic[row['Block ID']] = {'year':row['Census year'],'index':new_df_index}
                seen_block.append(row['Block ID'])
                new_df_index += 1

            else:
                if seen_dic[row['Block ID']]['year'] < row['Census year']:
                    index_to_replace = seen_dic[row['Block ID']]['index']
                    df_output.loc[index_to_replace] = row
                    seen_dic[row['Block ID']]['year'] = row['Census year']

    return df_output
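
One caveat about `calculate_multipolygon_area`: EPSG:3395 is a Mercator projection, which stretches areas more the further a location is from the equator. The sketch below estimates that stretch near Melbourne's latitude using a spherical approximation. The radius constant and helper names are assumptions for illustration, and the exact factor for the ellipsoidal EPSG:3395 projection differs slightly.

```python
import math

R = 6371008.8  # mean Earth radius in meters (spherical approximation)


def mercator_y(lat_deg):
    """Northing of a latitude on a spherical Mercator projection."""
    lat = math.radians(lat_deg)
    return R * math.log(math.tan(math.pi / 4 + lat / 2))


def area_inflation(lat_south, lat_north, lon_west, lon_east):
    """Ratio of Mercator-projected area to true spherical area for a lon/lat box."""
    dlon = math.radians(lon_east - lon_west)
    projected = (R * dlon) * (mercator_y(lat_north) - mercator_y(lat_south))
    true_area = R ** 2 * dlon * (
        math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    )
    return projected / true_area


# A small box around central Melbourne
ratio = area_inflation(-37.82, -37.81, 144.96, 144.97)
print(f"Mercator area / true area: {ratio:.2f}")
```

Under these assumptions the projected area comes out roughly 1.6 times the true area, so block areas computed this way are overstated and the plot ratios derived from them are understated by about the same factor. An equal-area projection, a local UTM zone, or a geodesic area calculation would avoid this.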

In [113]:
# Match each energy record in df_1 to the CLUE block in df_5 whose polygon contains its geo point

df_1 = dataframes[0]
df_5 = dataframes[4]

block_id = []
clue = []
area = []
total = []
geo_point = []

for index_in_1, row_in_1 in df_1.iterrows():
    point = row_in_1['Geo Point']
    area_1 = calculate_multipolygon_area(convert_to_dict(row_in_1['Geo Shape'])['coordinates'][0])
    found = 0
    for index_in_5, row_in_5 in df_5.iterrows():
        poly_coords = convert_to_dict(row_in_5['Geo Shape'])['coordinates']
        #poly_coords = [row_in_5['Geo Shape'].split("type")[0][:-3].split(": ")[1]]
        result = verify_point_in_multipolygon(point, poly_coords)
        if result == True:
            block_id.append(row_in_5['block_id'])
            geo_point.append(point)
            clue.append(row_in_5['clue_area'])
            total.append(row_in_1['total'])
            area.append(area_1)
            found += 1
            break

    if found == 0:
        print(f'Not found! Row:  {index_in_1+1}')


Not found! Row:  16


In [119]:
# Merge records that fall in the same block: sum their areas and energy totals

seen_id = []

geo_point_cleaned = []
block_id_cleaned = []
clue_cleaned = []
area_cleaned = []
total_cleaned = []

for n,id in enumerate(block_id):
    if id not in seen_id:
        seen_id.append(id)
        geo_point_cleaned.append(geo_point[n])
        block_id_cleaned.append(id)
        clue_cleaned.append(clue[n])
        area_cleaned.append(area[n])
        total_cleaned.append(total[n])
    else:
        first_index = seen_id.index(id)
        area_cleaned[first_index] += area[n]
        total_cleaned[first_index] += total[n]
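
The same merge can be sketched with a pandas `groupby`, which keeps the first geo point and CLUE name per block and sums the area and energy. The toy frame below stands in for the parallel lists built above; its values are illustrative only.

```python
import pandas as pd

# Toy stand-in for the matched records (illustrative values)
matched = pd.DataFrame({
    "block_id": [10, 11, 10],
    "geo_point": ["p0", "p1", "p2"],
    "clue": ["A", "B", "A"],
    "area": [100.0, 50.0, 25.0],
    "total": [5.0, 7.0, 1.5],
})

merged = (
    matched.groupby("block_id", sort=False, as_index=False)
    .agg(
        geo_point=("geo_point", "first"),  # keep the first point seen for the block
        clue=("clue", "first"),
        area=("area", "sum"),              # add up the areas of all matched shapes
        total=("total", "sum"),            # add up their energy consumption
    )
)
print(merged)
```

Using `sort=False` preserves the order in which blocks first appear, which matches the behaviour of the list-based loop.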

In [115]:
# Process dataset 2,3,4
# Keep only the census data for 2021 (or, failing that, the closest earlier year)

df_2 = dataframes[1]
df_3 = dataframes[2]
df_4 = dataframes[3]

df_2_recent = update_cencus_year(df_2)
df_3_recent = update_cencus_year(df_3)
df_4_recent = update_cencus_year(df_4)

print(len(df_2_recent.columns)-3)
print(len(df_3_recent.columns)-3)
print(len(df_4_recent.columns)-3)

print(df_2_recent.columns[3:])
print(df_3_recent.columns[3:])
print(df_4_recent.columns[3:])



38
20
21
Index(['Commercial Accommodation', 'Common Area', 'Community Use',
       'Educational/Research', 'Entertainment/Recreation - Indoor',
       'Equipment Installation', 'Hospital/Clinic', 'House/Townhouse',
       'Institutional Accommodation', 'Manufacturing', 'Office',
       'Park/Reserve', 'Parking - Commercial Covered',
       'Parking - Commercial Uncovered', 'Parking - Private Covered',
       'Parking - Private Uncovered', 'Performances, Conferences, Ceremonies',
       'Private Outdoor Space', 'Public Display Area', 'Residential Apartment',
       'Retail - Cars', 'Retail - Shop', 'Retail - Showroom', 'Retail - Stall',
       'Sports and Recreation - Outdoor', 'Square/Promenade', 'Storage',
       'Student Accommodation', 'Transport', 'Transport/Storage - Uncovered',
       'Unoccupied - Under Construction',
       'Unoccupied - Under Demolition/Condemned',
       'Unoccupied - Under Renovation', 'Unoccupied - Undeveloped Site',
       'Unoccupied - Unused', 'Wholesale
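
The row-by-row `update_cencus_year` helper can also be expressed with pandas sorting and de-duplication, which tends to be faster on tables of this size. This is a sketch, not a drop-in replacement: `latest_census_rows` is a hypothetical name, it assumes each block has at most one row per census year, and the resulting row order and index may differ from the loop version.

```python
import pandas as pd


def latest_census_rows(df, max_year=2021):
    """Keep, for each Block ID, the row with the latest census year <= max_year."""
    eligible = df[df["Census year"].astype(int) <= max_year]
    return (
        eligible.sort_values("Census year", kind="stable")
        .drop_duplicates(subset="Block ID", keep="last")
        .sort_index()
    )


# Toy data (illustrative only)
toy = pd.DataFrame({
    "Census year": [2019, 2021, 2022, 2020],
    "Block ID": [1, 1, 1, 2],
    "Office": [10, 20, 30, 40],
})

recent = latest_census_rows(toy)
print(recent)
```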

In [120]:
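# First, collect the per-block totals (floor space, establishments, jobs), regardless of business type.
# Note: commercial_space holds the block's total floor space, not only its commercial floor space.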

jobs = []
establishments = []
commercial_space = []

for id in block_id_cleaned:

    for i in range(df_2_recent.shape[0]):
        if df_2_recent.loc[i, 'Block ID'] == id:
            if np.isnan(df_2_recent.loc[i, 'Total floor space in block']):
                add = 0
            else:
                add = int(df_2_recent.loc[i, 'Total floor space in block'])
            commercial_space.append(add)

    for i in range(df_3_recent.shape[0]):
        if df_3_recent.loc[i,'Block ID'] == id:
            if np.isnan(df_3_recent.loc[i,'Total establishments in block']):
                add = 0
            else:
                add = int(df_3_recent.loc[i,'Total establishments in block'])
            establishments.append(add)

    for i in range(df_4_recent.shape[0]):
        if df_4_recent.loc[i, 'Block ID'] == id:
            if np.isnan(df_4_recent.loc[i, 'Total jobs in block']):
                add = 0
            else:
                add = int(df_4_recent.loc[i, 'Total jobs in block'])
            jobs.append(add)



In [121]:
# Create the cleaned dataframe and save

dict_key_blockid = {'block_id':block_id_cleaned,
                    'geo_point_coords':geo_point_cleaned,
                    'CLUE':clue_cleaned,
                    'block_area':area_cleaned,
                    'total_energy_consumption':total_cleaned,
                    'total_jobs':jobs,
                    'business_establishments':establishments}

df_blockid = pd.DataFrame(dict_key_blockid)
# Save the DataFrame to a CSV file
df_blockid.to_csv('/content/drive/My Drive/Colab Notebooks/blockid.csv', index=True)

In [171]:
df_blockid = pd.read_csv('/content/drive/My Drive/Colab Notebooks/blockid.csv')

In [122]:
# Regroup the columns of datasets 2, 3 and 4 into a shared set of categories

'''
Merge all land use types/industries into 9 types:
 - retail: retail and wholesale shops
 - business: the tertiary sector such as information technology services
 - leisure: dining, accommodations, and indoor recreational activities
 - education: schools and research institutions
 - healthcare: private and public healthcare service
 - manufacture: construction and manufacturing
 - public_service: community service and other services usually operated by the government or other state-owned companies
 - farming: agriculture, forestry, animal husbandry, fishing, and mining
 - residence: private properties for residential purpose (including outdoor areas and parking)
'''

def merge_columns(df, original_list):
    output_keys = ['retail', 'business', 'leisure', 'education',
                   'healthcare','manufacture','public_service','farming','residence',
                   'non_residential']
    output_dict = {'block_id':[],
                   'retail': [], 'business': [], 'leisure':[], 'education': [],
                   'healthcare':[],'manufacture':[],'public_service':[],'farming':[],
                   'residence':[],'non_residential':[]}

    for index, row in df.iterrows():
        output_dict['block_id'].append(row['Block ID'])
        total_non_residence = 0

        for i in range(9):
            new_value = 0
            if len(original_list[i])>0:
                for industry in original_list[i]:
                    if row[industry] > 0:
                        new_value += row[industry]
            output_dict[output_keys[i]].append(new_value)

            if i < 8:
                total_non_residence += new_value

        output_dict['non_residential'].append(total_non_residence)

    output_df = pd.DataFrame(output_dict)

    return output_df
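
The grouping performed by `merge_columns` can also be sketched with vectorized pandas operations. The helper name `sum_positive`, the dictionary-based group definition, and the toy values are assumptions for illustration; the notebook itself uses positional lists.

```python
import pandas as pd


def sum_positive(df, groups):
    """Sum the positive values of each group's columns into one column per group."""
    out = pd.DataFrame({"block_id": df["Block ID"]})
    for name, columns in groups.items():
        if columns:
            values = df[columns]
            # Negative and missing entries contribute nothing, as in the loop version
            out[name] = values.where(values > 0, 0).sum(axis=1)
        else:
            # A category with no source columns is always zero
            out[name] = 0
    return out


# Toy data (illustrative only)
toy = pd.DataFrame({
    "Block ID": [1, 2],
    "Retail - Shop": [100.0, None],
    "Wholesale": [50.0, 20.0],
    "Office": [0.0, 300.0],
})
groups = {
    "retail": ["Retail - Shop", "Wholesale"],
    "business": ["Office"],
    "farming": [],
}

result = sum_positive(toy, groups)
print(result)
```

Keying the groups by name rather than by list position makes it harder to misalign a category with its source columns when the three datasets are regrouped.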


In [123]:
df_2_original_list = [['Retail - Cars','Retail - Shop','Retail - Showroom','Retail - Stall','Wholesale'],
                      ['Office', 'Performances, Conferences, Ceremonies','Institutional Accommodation', 'Workshop/Studio', 'Storage'],
                      ['Commercial Accommodation', 'Entertainment/Recreation - Indoor'],
                      ['Educational/Research',  'Student Accommodation'],
                      ['Hospital/Clinic'],
                      ['Manufacturing'],
                      ['Common Area', 'Community Use', 'Equipment Installation', 'Park/Reserve',
                       'Parking - Commercial Covered', 'Parking - Commercial Uncovered', 'Public Display Area',
                       'Sports and Recreation - Outdoor','Square/Promenade', 'Transport',  'Transport/Storage - Uncovered'],
                      [],
                      ['House/Townhouse', 'Private Outdoor Space',  'Residential Apartment', 'Parking - Private Covered', 'Parking - Private Uncovered'],
                      ]

df_3_original_list = [['Retail Trade', 'Wholesale Trade'],
                      ['Financial and Insurance Services', 'Information Media and Telecommunications',
                       'Professional, Scientific and Technical Services', 'Rental, Hiring and Real Estate Services'],
                      ['Accommodation and Food Services',  'Arts and Recreation Services'],
                      ['Education and Training'],
                      ['Health Care and Social Assistance'],
                      ['Construction', 'Manufacturing'],
                      ['Administrative and Support Services',  'Electricity, Gas, Water and Waste Services',
                       'Other Services',  'Public Administration and Safety','Transport, Postal and Warehousing'],
                      ['Mining', 'Agriculture, Forestry and Fishing'],
                      []
                      ]

df_4_original_list = [['Retail Trade', 'Wholesale Trade'],
                      ['Business Services', 'Finance and Insurance', 'Information Media and Telecommunications',
                       'Real Estate Services', 'Rental and Hiring Services'],
                      ['Accommodation','Arts and Recreation Services', 'Food and Beverage Services'],
                      ['Education and Training'],
                      ['Health Care and Social Assistance'],
                      ['Construction','Manufacturing'],
                      ['Admin and Support Services', 'Electricity, Gas, Water and Waste Services', 'Other Services',
                       'Public Administration and Safety', 'Transport, Postal and Storage'],
                      ['Agriculture and Mining'],
                      []
                      ]


df_floor_space_per_business = merge_columns(df_2_recent,df_2_original_list)
df_business_establishment_per_business = merge_columns(df_3_recent,df_3_original_list)
df_jobs_provided_per_business = merge_columns(df_4_recent,df_4_original_list)

In [124]:
df_floor_space_per_business.to_csv('/content/drive/My Drive/Colab Notebooks/df_floor_space_per_business.csv', index=True)
df_business_establishment_per_business.to_csv('/content/drive/My Drive/Colab Notebooks/df_business_establishment_per_business.csv', index=True)
df_jobs_provided_per_business.to_csv('/content/drive/My Drive/Colab Notebooks/df_jobs_provided_per_business.csv', index=True)

In [20]:
df_2_merged = pd.read_csv('/content/drive/My Drive/Colab Notebooks/df_floor_space_per_business.csv')
df_3_merged = pd.read_csv('/content/drive/My Drive/Colab Notebooks/df_business_establishment_per_business.csv')
df_4_merged = pd.read_csv('/content/drive/My Drive/Colab Notebooks/df_jobs_provided_per_business.csv')

In [172]:
# Update the df_blockid dataframe with information related to business type in every block

# a. Add a column with commercial_area info to the df_blockid
for index, row in df_blockid.iterrows():
    for i, r in df_2_merged.iterrows():
        if r['block_id'] == row['block_id']:
            df_blockid.at[index, 'commercial_area'] = int(r['non_residential'])
            break
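
The nested loop above is effectively a left join on `block_id`. A minimal sketch of the same lookup with `DataFrame.merge` follows; the two toy frames are illustrative stand-ins for `df_blockid` and `df_2_merged`.

```python
import pandas as pd

# Toy stand-ins (illustrative values)
blocks = pd.DataFrame({
    "block_id": [1, 2, 3],
    "total_energy_consumption": [10.0, 20.0, 30.0],
})
floor_space = pd.DataFrame({
    "block_id": [1, 3, 4],
    "non_residential": [500, 800, 50],
})

joined = blocks.merge(
    floor_space[["block_id", "non_residential"]].rename(
        columns={"non_residential": "commercial_area"}
    ),
    on="block_id",
    how="left",               # keep every block, even without floor-space data
    validate="many_to_one",   # fail loudly if a block_id repeats on the right side
)
print(joined)
```

Blocks with no floor-space record end up with a missing `commercial_area`, so they are easy to spot and handle before any later integer conversion.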

In [174]:
# b. Compute the plot ratio, the energy consumption per sqm, and the two categories with the largest floor area for each block

plot_ratios = [] # plot ratio for the block
residential_area = []
business_type_largest_floor_area = []
business_type_2largest_floor_area = []
detailed_business_info = []
energy_consumption_per_sqm = []

for i, r in df_blockid.iterrows():

    for index, row in df_2_merged.iterrows():
        if row['block_id'] == r['block_id']:
            plot_ratio = None
            unit_energy = None

            if (int(r['commercial_area']) + int(row['residence'])) > 0:
                plot_ratio = (int(r['commercial_area']) + int(row['residence']))/int(r['block_area'])
                unit_energy = int(r['total_energy_consumption'])/(int(r['commercial_area']) + int(row['residence']))

            plot_ratios.append(plot_ratio)
            energy_consumption_per_sqm.append(unit_energy)

            residential_area.append(row['residence'])

            subset = row.iloc[2:-1]
            numeric_subset = pd.to_numeric(subset, errors='coerce')
            if (numeric_subset == 0).all():
                max_col = None
                second_max_col = None
            else:
                # Sort the numeric values in descending order
                sorted_subset = numeric_subset.sort_values(ascending=False)
                # Get the column names for the max and second max value
                max_col = sorted_subset.index[0]
                second_max_col = sorted_subset.index[1] if len(sorted_subset) > 1 else None
            business_type_largest_floor_area.append(max_col)
            business_type_2largest_floor_area.append(second_max_col)

            # Save detailed business info in a dictionary
            # Filtering out zeros
            business_details = {col: val for col, val in numeric_subset.items() if val != 0}
            # Sorting by value in descending order
            sorted_business_details = dict(sorted(business_details.items(), key=lambda item: item[1], reverse=True))
            sorted_string = ', '.join([f"{col}: {int(val)}" for col, val in sorted_business_details.items()])
            detailed_business_info.append(sorted_string)
            break

df_blockid['energy_consumption_per_sqm'] = energy_consumption_per_sqm
df_blockid['plot_ratio'] = plot_ratios
df_blockid['residential_area'] = residential_area
df_blockid['business/residence_with_largest_floor_area'] = business_type_largest_floor_area
df_blockid['business/residence_with_second_largest_floor_area'] = business_type_2largest_floor_area
df_blockid['detailed_business_info'] = detailed_business_info

In [175]:
# Save the updated DataFrame to a new CSV file
df_blockid.to_csv('/content/drive/My Drive/Colab Notebooks/blockid_final.csv', index=True)

In [None]:
df_blockid = pd.read_csv('/content/drive/My Drive/Colab Notebooks/blockid_final.csv')

In [128]:
# Collect the totals per business category.
# Rows with block_id == 0 are treated as the aggregate rows of each dataset.

categories = ['retail', 'business', 'leisure', 'education',
              'healthcare','manufacture','public_service','farming']

business_dict = {'business':categories,
                 'area':[],
                 'establishments':[],
                 'jobs':[]}

# Each output column is filled from the matching regrouped dataset
sources = {'area':df_2_merged,
           'establishments':df_3_merged,
           'jobs':df_4_merged}

for key, df_merged in sources.items():
    for index, row in df_merged.iterrows():
        if int(row['block_id']) == 0:
            for category in categories:
                business_dict[key].append(int(row[category]))

df_business = pd.DataFrame(business_dict)

In [129]:
# Save the updated DataFrame to a new CSV file
df_business.to_csv('/content/drive/My Drive/Colab Notebooks/df_business.csv', index=True)

# 3. Conclusions

- **Top 10 blocks by energy consumption**
  
  Reported with total business area, total residential area, plot ratio and dominant business types.

- **Ranking of CLUE areas by energy consumption**
  
  Reported with total business area, total residential area, plot ratio and dominant business types.
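
A CLUE-area ranking could be derived from the block-level table along these lines. This is a sketch on toy rows: the column names follow `df_blockid`, but the values are made up for illustration.

```python
import pandas as pd

# Toy rows (illustrative values only)
toy_blocks = pd.DataFrame({
    "CLUE": ["Docklands", "Parkville", "Docklands", "Southbank"],
    "total_energy_consumption": [120.0, 100.0, 80.0, 70.0],
    "commercial_area": [600.0, 700.0, 300.0, 180.0],
    "residential_area": [60.0, 0.0, 120.0, 90.0],
})

# Sum the block-level figures per CLUE area, then rank by total energy consumption
clue_rank = (
    toy_blocks.groupby("CLUE")[
        ["total_energy_consumption", "commercial_area", "residential_area"]
    ]
    .sum()
    .sort_values("total_energy_consumption", ascending=False)
)
print(clue_rank)
```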

In [178]:
# Top 10 blocks having the most energy consumption

# Sort df_blockid by 'total_energy_consumption' in descending order
df_blockid_sorted = df_blockid.sort_values(by='total_energy_consumption', ascending=False)
top_10_rows = df_blockid_sorted.head(10) # Select the top 10 rows

# Select the columns to display, by position
df_consumption_block = top_10_rows.iloc[:, [1,3,5,9,8,11,10,12,13,6]]

df_consumption_block = df_consumption_block.reset_index(drop=True)

df_consumption_block

| | block_id | CLUE | total_energy_consumption | energy_consumption_per_sqm | commercial_area | residential_area | plot_ratio | business/residence_with_largest_floor_area | business/residence_with_second_largest_floor_area | total_jobs |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0 | 1105 | Docklands | 129243.855334 | 0.187456 | 628607.0 | 60849.0 | 6.53197 | business | public_service | 25531 |
| 1 | 1108 | Docklands | 109414.035901 | 0.105651 | 556708.0 | 478910.0 | 1.233949 | residence | public_service | 16780 |
| 2 | 85 | Melbourne (CBD) | 103252.7379 | 0.504922 | 188742.0 | 15749.0 | 1.604896 | business | public_service | 7477 |
| 3 | 931 | Parkville | 102859.181812 | 0.13165 | 780444.0 | 861.0 | 0.243087 | public_service | healthcare | 6690 |
| 4 | 1103 | Docklands | 77748.02451 | 0.17168 | 324046.0 | 128821.0 | 2.349626 | public_service | business | 10560 |
| 5 | 920 | Parkville | 77303.39954 | 0.38411 | 201252.0 | 0.0 | 1.497857 | healthcare | education | 11514 |
| 6 | 804 | Southbank | 73821.02877 | 0.265282 | 187531.0 | 90743.0 | 0.584423 | residence | public_service | 4351 |
| 7 | 870 | Melbourne (Remainder) | 67382.07941 | 1.062556 | 63415.0 | 0.0 | 0.32189 | education | business | 9001 |
| 8 | 37 | Melbourne (CBD) | 62466.11651 | 0.284287 | 214021.0 | 5708.0 | 7.152171 | business | public_service | 6092 |
| 9 | 78 | Melbourne (CBD) | 60893.48273 | 0.207849 | 256002.0 | 36965.0 | 9.526762 | business | public_service | 15379 |


**Conclusions**

Land used for **business** (commercial and office properties in the tertiary sector, such as information technology services), **public services** (community services and other functions typically managed by government or state-owned entities), **health care**, and **educational facilities** appears to be the major contributor to energy consumption in the City of Melbourne.
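
To make the claim about dominant land uses easier to check, the following minimal sketch counts how often each category appears as the largest or second-largest floor-area use among the ten blocks. The category values are copied from the table above; the names `top_blocks` and `appearances` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Largest and second-largest floor-area categories for the ten blocks,
# copied from the top-10 table (same row order).
top_blocks = pd.DataFrame({
    "block_id": [1105, 1108, 85, 931, 1103, 920, 804, 870, 37, 78],
    "largest": ["business", "residence", "business", "public_service",
                "public_service", "healthcare", "residence", "education",
                "business", "business"],
    "second_largest": ["public_service", "public_service", "public_service",
                       "healthcare", "business", "education", "public_service",
                       "business", "public_service", "public_service"],
})

# Count appearances of each category in either of the two leading positions.
appearances = pd.concat(
    [top_blocks["largest"], top_blocks["second_largest"]]
).value_counts()

print(appearances)
```

A count like this only reflects the ten highest-consuming blocks, so it indicates which uses dominate the heaviest consumers rather than the city as a whole.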

**Block Analysis**

- **Docklands (block ID 1105)**

  This block stands out with the highest total energy consumption at 129,243.86 MWh.
  
  This high energy usage reflects its large commercial area of 628,607 sqm and its large number of jobs (25,531), indicating a strong business presence. However, energy consumption per square meter (sqm) is relatively low at 0.1875 MWh/sqm, suggesting that the block uses energy fairly efficiently per unit area despite its high total.

- **Docklands (block ID 1108)**

  This block also shows substantial total energy consumption at 109,414.04 MWh, but with a much lower energy consumption per sqm of 0.1057 MWh/sqm.
  
  It has a more balanced split between commercial (556,708 sqm) and residential (478,910 sqm) areas, which likely contributes to the lower energy consumption per sqm.

- **Melbourne CBD (block ID 85)**
  
  This block presents a different scenario: total energy consumption is 103,252.74 MWh, but energy consumption per sqm is quite high at 0.5049 MWh/sqm. With a relatively small commercial area of 188,742 sqm, the block has high energy demand per unit area, possibly due to dense business activity concentrated in a smaller area.

- **Parkville (block ID 931)**

  This block shows a high total energy consumption of 102,859.18 MWh, dominated by the public service and healthcare sectors. Its energy consumption per sqm is relatively low at 0.1317 MWh/sqm, and it has the largest commercial area (780,444 sqm) among the listed blocks. This suggests that the activities in Parkville may be more spread out or less energy-intensive per unit area.

- **Southbank (block ID 804)** and **Melbourne (Remainder, block ID 870)**

  These two blocks have interesting profiles, with moderate total energy consumption but high energy consumption per sqm. This is most pronounced in Melbourne (Remainder), where energy consumption per sqm is notably high at 1.0626 MWh/sqm. That block, with a relatively small commercial area of 63,415 sqm, is characterized by healthcare (Alfred Hospital) and educational (Wesley College) sectors, which likely drive the higher energy intensity.
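
The per-sqm figures discussed above appear to be total energy consumption divided by the sum of commercial and residential floor area, which is the same definition the area-level code uses. The sketch below recomputes that ratio for four of the blocks from the table values and ranks them by intensity. It is an illustrative check rather than the notebook's own code, and `blocks`, `floor_area` and `rank_by_intensity` are names introduced here.

```python
import pandas as pd

# Four rows copied from the top-10 block table.
blocks = pd.DataFrame({
    "block_id": [1105, 1108, 85, 870],
    "total_energy_consumption": [129243.855334, 109414.035901,
                                 103252.7379, 67382.07941],
    "commercial_area": [628607.0, 556708.0, 188742.0, 63415.0],
    "residential_area": [60849.0, 478910.0, 15749.0, 0.0],
})

# Assumed definition: energy intensity = total energy / total floor area.
floor_area = blocks["commercial_area"] + blocks["residential_area"]
blocks["energy_per_sqm"] = blocks["total_energy_consumption"] / floor_area

# Rank 1 = most energy-intensive block per unit of floor area.
blocks["rank_by_intensity"] = (
    blocks["energy_per_sqm"].rank(ascending=False).astype(int)
)

print(blocks[["block_id", "energy_per_sqm", "rank_by_intensity"]])
```

The recomputed values agree with the table to within rounding, and the ranking shows how a block can lead on total consumption (block 1105) while sitting well below smaller blocks on intensity (block 870).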

In [167]:
# CLUE areas ranked by total energy consumption

CLUE_area = list(set(df_blockid['CLUE'].tolist()))

total_energy = [0] * len(CLUE_area)
commercial_total = [0] * len(CLUE_area)
resident_total = [0] * len(CLUE_area)
area_total = [0] * len(CLUE_area)
plot_ratio_total = [0] * len(CLUE_area)
total_jobs = [0] * len(CLUE_area)
energy_per_sqm = [0] * len(CLUE_area)

business_name = ['retail', 'business', 'leisure', 'education',
                'healthcare','manufacture','public_service','farming','residence']
area_per_business = [[0] * len(business_name) for _ in range(len(CLUE_area))]

largest_floor_area_business_type = []
second_largest_floor_area_business_type = []

for index, row in df_blockid.iterrows():
    i_clue = CLUE_area.index(row['CLUE'])

    total_energy[i_clue] += int(row['total_energy_consumption'])
    commercial_total[i_clue] += int(row['commercial_area'])
    resident_total[i_clue] += int(row['residential_area'])
    area_total[i_clue] += int(row['block_area'])
    total_jobs[i_clue] += row['total_jobs']

    for i,r in df_2_merged.iterrows():
        business_list = []
        if (row['block_id'] == r['block_id']):
            subset = r.iloc[2:-1]
            numeric_subset = pd.to_numeric(subset, errors='coerce')
            if sum(numeric_subset) > 0:
                business_list = numeric_subset.tolist()
                for k in range(len(business_name)):
                    area_per_business[i_clue][k] += business_list[k]

            break

for n in range(len(plot_ratio_total)):
    plot_ratio_total[n] = (commercial_total[n] + resident_total[n]) / area_total[n]
    energy_per_sqm[n] = total_energy[n]/(commercial_total[n] + resident_total[n])

for li in area_per_business:
    max_index = li.index(max(li))
    li_copy = li[:]
    li_copy[max_index] = float('-inf')
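    # Look up the runner-up in the masked copy so ties with the maximum do not return max_index again
    second_max_index = li_copy.index(max(li_copy))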
    largest_floor_area_business_type.append(business_name[max_index])
    second_largest_floor_area_business_type.append(business_name[second_max_index])

CLUE_dict = {'CLUE': CLUE_area,
             'total_energy_consumption': total_energy,
             'energy_consumption_per_sqm':energy_per_sqm,
             'commercial_area':commercial_total,
             'residential_area': resident_total,
             'plot_ratio': plot_ratio_total,
             'business/residence_with_largest_floor_area':largest_floor_area_business_type,
             'business/residence_with_second_largest_floor_area':second_largest_floor_area_business_type,
             'total_jobs':total_jobs}


In [179]:
df_CLUE = pd.DataFrame(CLUE_dict)
df_CLUE.to_csv('/content/drive/My Drive/Colab Notebooks/df_CLUE.csv', index=True)

df_CLUE_sorted = df_CLUE.sort_values(by='total_energy_consumption', ascending=False)
df_CLUE_sorted = df_CLUE_sorted.reset_index(drop=True)

df_CLUE_sorted

| | CLUE | total_energy_consumption | energy_consumption_per_sqm | commercial_area | residential_area | plot_ratio | business/residence_with_largest_floor_area | business/residence_with_second_largest_floor_area | total_jobs |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Melbourne (CBD) | 1840094 | 0.17812 | 7430202 | 2900440 | 3.996858 | business | public_service | 215597 |
| 1 | Docklands | 449178 | 0.107447 | 2511276 | 1669191 | 2.203639 | residence | public_service | 72917 |
| 2 | Southbank | 410564 | 0.135836 | 1357201 | 1665288 | 1.770434 | residence | public_service | 42283 |
| 3 | Parkville | 286033 | 0.151039 | 1433955 | 459812 | 0.351433 | public_service | residence | 27300 |
| 4 | Carlton | 214601 | 0.099739 | 1044045 | 1107590 | 0.66406 | residence | public_service | 14801 |
| 5 | East Melbourne | 174633 | 0.129506 | 552314 | 796144 | 0.606885 | residence | public_service | 20309 |
| 6 | North Melbourne | 170453 | 0.1098 | 428526 | 1123876 | 0.571867 | residence | public_service | 7460 |
| 7 | Melbourne (Remainder) | 139236 | 0.093851 | 1263063 | 220527 | 0.478219 | public_service | residence | 16490 |
| 8 | Port Melbourne | 135552 | 0.129927 | 404737 | 638553 | 0.151826 | residence | business | 8491 |
| 9 | West Melbourne (Industrial) | 82361 | 0.038042 | 1640692 | 524321 | 0.335936 | public_service | residence | 3016 |


**Conclusions**

Overall, the **Melbourne CBD** has the highest total energy consumption, at roughly four times that of second-placed Docklands (1,840,094 MWh versus 449,178 MWh). This is expected given its extensive commercial area and large number of jobs, which make it the economic hub of the city. Energy consumption per square meter varies across the areas: among the ten listed, West Melbourne (Industrial) shows the lowest rate (0.038 MWh/sqm) and the CBD the highest (0.178 MWh/sqm).
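
The area-level figures are sums over the blocks in each CLUE area. As a rough illustration, that aggregation can be written with a pandas `groupby` instead of a row-by-row loop. The sketch below uses only six rows copied from the top-10 block table, so its totals are partial and will not match the full area-level table; `blocks` and `by_clue` are names introduced here. Unlike the notebook's loop, it does not truncate each block's energy value to an integer before summing.

```python
import pandas as pd

# Six rows copied from the top-10 block table. This is only a subset of
# df_blockid, so the sums below are partial totals for each area.
blocks = pd.DataFrame({
    "block_id": [1105, 1108, 1103, 85, 37, 78],
    "CLUE": ["Docklands"] * 3 + ["Melbourne (CBD)"] * 3,
    "total_energy_consumption": [129243.855334, 109414.035901, 77748.02451,
                                 103252.7379, 62466.11651, 60893.48273],
    "commercial_area": [628607.0, 556708.0, 324046.0,
                        188742.0, 214021.0, 256002.0],
    "residential_area": [60849.0, 478910.0, 128821.0,
                         15749.0, 5708.0, 36965.0],
    "total_jobs": [25531, 16780, 10560, 7477, 6092, 15379],
})

# Sum the additive columns per CLUE area in one vectorized step.
by_clue = blocks.groupby("CLUE")[
    ["total_energy_consumption", "commercial_area",
     "residential_area", "total_jobs"]
].sum()

# Derive intensity from the summed totals, not by averaging block ratios.
by_clue["energy_consumption_per_sqm"] = by_clue["total_energy_consumption"] / (
    by_clue["commercial_area"] + by_clue["residential_area"]
)

by_clue = by_clue.sort_values("total_energy_consumption", ascending=False)
print(by_clue)
```

Computing the per-sqm rate from the summed totals keeps large blocks weighted by their floor area, which matches how the notebook derives the area-level rate.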

- Residential areas

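  Residential floor area is the largest land use in six of the ten areas. The exceptions are the CBD, where business use prevails, and Parkville, Melbourne (Remainder) and West Melbourne (Industrial), where public service is largest. Public service facilities also play a crucial role in energy consumption patterns: they rank as the largest or second-largest use in every area except Port Melbourne.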

- Effect of high plot ratio

  The top three CLUE areas by energy consumption are Melbourne (CBD), Docklands, and Southbank. These areas not only lead in total energy consumption but also have the three highest plot ratios (4.00, 2.20 and 1.77 respectively), indicating dense urban development and a high concentration of floor area relative to land area.
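
The link between plot ratio and total energy consumption can be checked directly against the area-level table. The sketch below compares the top three areas on each measure and computes a rank correlation as the Pearson correlation of the ranks, which avoids needing SciPy. The values are copied from the table; `clue`, `top_energy`, `top_density` and `rank_corr` are names introduced here. With only ten areas, the result is indicative rather than conclusive.

```python
import pandas as pd

# Values copied from the CLUE-area table.
clue = pd.DataFrame({
    "CLUE": ["Melbourne (CBD)", "Docklands", "Southbank", "Parkville",
             "Carlton", "East Melbourne", "North Melbourne",
             "Melbourne (Remainder)", "Port Melbourne",
             "West Melbourne (Industrial)"],
    "total_energy_consumption": [1840094, 449178, 410564, 286033, 214601,
                                 174633, 170453, 139236, 135552, 82361],
    "plot_ratio": [3.996858, 2.203639, 1.770434, 0.351433, 0.66406,
                   0.606885, 0.571867, 0.478219, 0.151826, 0.335936],
})

# Do the three highest-consuming areas also have the three highest plot ratios?
top_energy = set(clue.nlargest(3, "total_energy_consumption")["CLUE"])
top_density = set(clue.nlargest(3, "plot_ratio")["CLUE"])
print(top_energy == top_density)

# Rank correlation: Pearson correlation of the two rank series.
rank_corr = clue["total_energy_consumption"].rank().corr(
    clue["plot_ratio"].rank()
)
print(round(rank_corr, 2))
```

A rank-based measure is used because the CBD's total is several times larger than any other area's and would dominate an ordinary correlation.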