## Investigating Heat-Activity Hotspots for Strategic Cooling in Melbourne
**Authored by:** Ratanakmoni Slot

**Duration:** TBD  
**Level:** Intermediate  
**Pre-requisite Skills:** Python, Data Analysis, Geospatial Analysis, Temporal Analysis


### Scenario

As Melbourne experiences rising temperatures, extreme heat poses significant challenges to urban living. Public health, comfort, and safety are particularly affected during crowded, high-temperature periods. To mitigate these challenges, this project investigates pedestrian and vehicle movement patterns during key heat-prone times.

The goal is to identify "heat-activity hotspots"—areas with high density of activity during extreme heat—to inform the strategic placement of cooling resources such as shaded seating, hydration stations, misting systems, and tree planting initiatives.


### What this Use Case Will Teach You

By the end of this use case, you will be able to:
- Clean and preprocess pedestrian, vehicle, and weather datasets.
- Perform exploratory data analysis (EDA) to extract insights about movement patterns.
- Apply clustering techniques to identify activity hotspots during peak heat periods.
- Conduct geospatial and temporal analysis to visualize density trends over time.
- Develop targeted cooling recommendations based on data-driven insights.
- Create an interactive framework to inform future heat mitigation strategies.


### Background and Introduction

Melbourne's increasing summer temperatures present a unique urban challenge. Extreme heat impacts pedestrian and vehicle behavior, with some areas remaining busy despite harsh conditions. Identifying and addressing these heat-activity hotspots is critical for public health and comfort.

By combining weather data with movement patterns, this use case explores spatial and temporal correlations to identify areas requiring intervention. The findings will guide strategic placement of cooling resources to maximize impact during peak heat hours.


### Datasets Used

1. **Pedestrian and Vehicle Movement Data:** Details activity patterns across Melbourne, segmented by time of day.
2. **Weather Data:** Includes temperature, heatwave occurrences, and humidity levels.
3. **Geospatial Data:** Provides location-based mapping for pedestrian pathways, vehicle routes, and urban landmarks.

These datasets can be sourced from local government open data portals or relevant APIs.


### Implementation

**Step 1: Data Cleaning and Preprocessing**
- Load and inspect the datasets.
- Handle missing or inconsistent values and standardize metrics.

**Step 2: Exploratory Data Analysis (EDA)**
- Generate summary statistics for traffic and weather patterns.
- Conduct spatial analysis to map high-density areas during heatwaves.
- Visualize movement trends over time.
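
As a rough sketch of the summary step, the snippet below compares mean pedestrian counts per hour on hot and normal readings. The table is synthetic, and the `temperature` column name, the `heat_class` labels, and the 30 °C threshold are illustrative assumptions rather than values taken from the datasets.

```python
import pandas as pd

# Synthetic hourly records; all values are made up for illustration only.
records = pd.DataFrame({
    "hour": [9, 9, 13, 13, 17, 17],
    "temperature": [24.0, 33.0, 28.0, 36.0, 26.0, 34.0],
    "pedestrian_count": [400, 350, 900, 600, 1100, 950],
})

HEAT_THRESHOLD_C = 30.0  # assumed cut-off for a "hot" reading

# Label each reading as "hot" or "normal" relative to the threshold
records["heat_class"] = (
    records["temperature"].ge(HEAT_THRESHOLD_C).map({True: "hot", False: "normal"})
)

# Overall summary statistics for the numeric columns
summary = records[["temperature", "pedestrian_count"]].describe()

# Mean count per hour, split into hot and normal readings
hourly_by_heat = records.pivot_table(
    index="hour", columns="heat_class", values="pedestrian_count", aggfunc="mean"
)

print(summary)
print(hourly_by_heat)
```

With real data, the same pivot shows at a glance which hours stay busy when temperatures climb.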

**Step 3: Clustering and Hotspot Identification**
- Apply K-means clustering to identify high-traffic areas during heat events.
- Segment clusters by time of day (e.g., morning, midday, evening) and activity type (pedestrian or vehicle).
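
The clustering step could look roughly like the sketch below. It uses scikit-learn's `KMeans` on a synthetic sensor summary; the coordinates, the `mean_count_hot_hours` column, and the choice of `k=2` are assumptions made for this toy example, not properties of the Melbourne data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-sensor summary: two loose groups of locations (made-up values).
sensors = pd.DataFrame({
    "latitude":  [-37.8136, -37.8140, -37.8132, -37.8000, -37.8004, -37.7996],
    "longitude": [144.9631, 144.9635, 144.9627, 144.9500, 144.9504, 144.9496],
    "mean_count_hot_hours": [1200, 1100, 1300, 200, 250, 150],
})

# Standardize so coordinates and counts contribute on a comparable scale
features = StandardScaler().fit_transform(sensors)

# k=2 is an assumption for this toy data; on real data, pick k with an
# elbow plot or silhouette score instead.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
sensors["cluster"] = kmeans.fit_predict(features)

# The cluster with the highest average activity is the candidate hotspot
cluster_activity = sensors.groupby("cluster")["mean_count_hot_hours"].mean()
hotspot_cluster = cluster_activity.idxmax()

print(sensors)
print("Hotspot cluster:", hotspot_cluster)
```

To segment by time of day or activity type, one option is to filter the data to each segment and repeat the same fit per segment.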

**Step 4: Temporal Analysis**
- Perform temporal analysis to understand the persistence of activity in heat-prone areas.
- Correlate temperature changes with movement density.
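
One way to approach both bullets is sketched below on synthetic observations for two hypothetical locations. The "persistence" ratio (mean count in hot hours divided by mean count in baseline hours) and the 30 °C threshold are assumptions introduced here for illustration; the use case does not prescribe a specific metric.

```python
import pandas as pd

# Synthetic hourly observations for two hypothetical locations (made-up values).
obs = pd.DataFrame({
    "location": ["A"] * 4 + ["B"] * 4,
    "temperature": [22.0, 28.0, 33.0, 37.0] * 2,
    "pedestrian_count": [1000, 980, 950, 940, 800, 600, 350, 200],
})

HEAT_THRESHOLD_C = 30.0  # assumed cut-off for a "hot" hour

# Pearson correlation between temperature and count, computed per location
correlation = pd.Series({
    loc: group["temperature"].corr(group["pedestrian_count"])
    for loc, group in obs.groupby("location")
})

# Persistence: how much activity remains in hot hours relative to baseline hours
obs["period"] = (
    obs["temperature"].ge(HEAT_THRESHOLD_C).map({True: "hot", False: "baseline"})
)
mean_counts = (
    obs.groupby(["location", "period"])["pedestrian_count"].mean().unstack()
)
persistence = mean_counts["hot"] / mean_counts["baseline"]

print(correlation)
print(persistence)
```

A location whose ratio stays close to 1 remains busy despite the heat, which is the pattern a heat-activity hotspot would show.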

**Step 5: Strategic Cooling Recommendations**
- Based on analysis, propose cooling interventions such as hydration stations, shaded seating, or tree planting.
- Prioritize locations for implementation using data-driven criteria.
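
The "data-driven criteria" could, for instance, be combined into a weighted score. The sketch below is only one possible scheme: the site names, the three criteria columns, and the weights are all hypothetical and would need to be agreed with stakeholders.

```python
import pandas as pd

# Hypothetical candidate sites with made-up metrics.
candidates = pd.DataFrame({
    "site": ["Site A", "Site B", "Site C"],
    "mean_count_hot_hours": [1200, 400, 900],
    "mean_temp_hot_hours": [34.0, 36.0, 35.0],
    "persistence": [0.95, 0.40, 0.80],
}).set_index("site")

# Assumed weights; they sum to 1 so the score stays in the 0-1 range.
WEIGHTS = {
    "mean_count_hot_hours": 0.5,
    "mean_temp_hot_hours": 0.2,
    "persistence": 0.3,
}


def min_max(series):
    """Rescale a series to 0-1 (assumes the series is not constant)."""
    return (series - series.min()) / (series.max() - series.min())


# Normalize each criterion, then take the weighted sum
normalised = candidates.apply(min_max)
candidates["priority_score"] = sum(
    normalised[column] * weight for column, weight in WEIGHTS.items()
)

ranking = candidates.sort_values("priority_score", ascending=False)
print(ranking)
```

Changing the weights shifts the ranking, so it is worth testing how sensitive the top sites are to those choices.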

**Step 6: Implementation Framework**
- Create a phased plan with milestones and metrics to measure cooling impact.
- Develop an interactive map for stakeholders to visualize findings.


### Conclusion

This use case equips you with the skills to analyze heat-activity correlations and develop data-driven cooling strategies. By leveraging geospatial and temporal data, you can design interventions that enhance urban livability during extreme heat.


In [1]:
# Install necessary libraries
!pip install pandas matplotlib seaborn folium geopandas scikit-learn

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import folium
from folium.plugins import MarkerCluster
import geopandas as gpd
from shapely.geometry import Point
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import requests
from io import StringIO





## 2. Load Datasets

In this section, we load the datasets required for our analysis: pedestrian sensor locations, hourly pedestrian counts, and microclimate sensor readings (including air temperature and relative humidity). After loading each dataset, we inspect the first few rows to understand the data structure and key variables.


In [2]:
def API_Unlimited(datasetname):
    """Download a full dataset from the City of Melbourne open data portal.

    `datasetname` is the dataset identifier on the portal.
    Returns a DataFrame, or None if the request fails.
    """
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # renamed to avoid shadowing the built-in format()

    url = f'{base_url}{datasetname}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC'
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code != 200:
        print(f'Request failed with status code {response.status_code}')
        return None

    # Wrap the decoded CSV text in StringIO so pandas can read it like a file
    url_content = response.content.decode('utf-8')
    dataset = pd.read_csv(StringIO(url_content), delimiter=';')
    # Quick sanity check: show up to 10 random rows
    print(dataset.sample(min(10, len(dataset)), random_state=999))
    return dataset

In [3]:
# Dataset Identifiers
pedestrian_sensor_locations_id = "pedestrian-counting-system-sensor-locations"
pedestrian_counts_hourly_id = "pedestrian-counting-system-monthly-counts-per-hour"
weather_data_id = "microclimate-sensors-data"

# Load Datasets
sensor_locations_data = API_Unlimited(pedestrian_sensor_locations_id)
pedestrian_counts_data = API_Unlimited(pedestrian_counts_hourly_id)
weather_data = API_Unlimited(weather_data_id)

# Inspect the datasets
if sensor_locations_data is not None:
    print("Sensor Locations Data:\n", sensor_locations_data.head())

if pedestrian_counts_data is not None:
    print("Pedestrian Counts Data:\n", pedestrian_counts_data.head())

if weather_data is not None:
    print("Weather Data:\n", weather_data.head())

     location_id                                 sensor_description  \
61            65                          Swanston St - City Square   
93            17                              Collins Place (South)   
29            87                                    Errol St (West)   
126          137     COM Pole 2353 - Towards the city, NAB Building   
0              2                         Bourke Street Mall (South)   
24            72                                  Flinders St- ACMI   
18            43                       Monash Rd-Swanston St (West)   
125          131  I-Hub Corner of King Street and Flinders Stree...   
53            41                     Flinders La-Swanston St (West)   
140          166                                 484 Spencer Street   

      sensor_name installation_date  \
61        SwaCs_T        2020-03-12   
93        Col15_T        2009-03-30   
29      Errol23_T        2022-05-20   
126  BouHbr2353_T        2023-11-03   
0        Bou283_T      

In [4]:
# Print column names for all three datasets
print("Columns in Microclimate Data:")
for col in weather_data.columns:
    print(col)

print("\nColumns in Pedestrian Counts Data:")
for col in pedestrian_counts_data.columns:
    print(col)

print("\nColumns in Sensor Locations Data:")
for col in sensor_locations_data.columns:
    print(col)


Columns in Microclimate Data:
device_id
received_at
sensorlocation
latlong
minimumwinddirection
averagewinddirection
maximumwinddirection
minimumwindspeed
averagewindspeed
gustwindspeed
airtemperature
relativehumidity
atmosphericpressure
pm25
pm10
noise

Columns in Pedestrian Counts Data:
id
location_id
sensing_date
hourday
direction_1
direction_2
pedestriancount
sensor_name
location

Columns in Sensor Locations Data:
location_id
sensor_description
sensor_name
installation_date
note
location_type
status
direction_1
direction_2
latitude
longitude
location


In [None]:
# Step 1: Clean Weather Data
weather_data['received_at'] = pd.to_datetime(weather_data['received_at'])
# Split the comma-separated 'latlong' string into numeric latitude/longitude columns
weather_data[['latitude', 'longitude']] = weather_data['latlong'].str.split(',', expand=True).astype(float)
# .copy() returns an independent DataFrame, so adding columns later
# does not trigger pandas' SettingWithCopyWarning
weather_data_cleaned = weather_data.dropna(subset=['latitude', 'longitude', 'sensorlocation']).copy()

# Step 2: Clean Pedestrian Counts Data
pedestrian_counts_data['sensing_date'] = pd.to_datetime(pedestrian_counts_data['sensing_date'])
# Combine the date and the hour of day into a single timestamp
pedestrian_counts_data['datetime'] = pedestrian_counts_data['sensing_date'] + pd.to_timedelta(pedestrian_counts_data['hourday'], unit='h')
pedestrian_counts_data_cleaned = pedestrian_counts_data.dropna(subset=['pedestriancount']).copy()

# Step 3: Clean Sensor Locations Data
sensor_locations_data_cleaned = sensor_locations_data.dropna(subset=['latitude', 'longitude']).copy()

# Step 4: Merge Pedestrian Counts with Sensor Locations
# Columns present in both tables (e.g. sensor_name, location) receive _x/_y suffixes
pedestrian_sensor_merged = pd.merge(
    pedestrian_counts_data_cleaned,
    sensor_locations_data_cleaned,
    on='location_id',
    how='inner'
)

# Step 5: Merge with Weather Data
# Add a 'date' column to both tables for merging
weather_data_cleaned['date'] = weather_data_cleaned['received_at'].dt.date
pedestrian_sensor_merged['date'] = pedestrian_sensor_merged['datetime'].dt.date

# Merge datasets on date
# Note: this pairs each pedestrian row with every weather reading from the same date,
# so the result can be much larger than either input
final_merged_data = pd.merge(
    pedestrian_sensor_merged,
    weather_data_cleaned,
    on='date',
    how='inner'
)

# Step 6: Inspect the Merged Dataset
print("Final Merged Dataset:\n", final_merged_data.head())

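Because a date-only merge pairs each pedestrian row with every weather reading from that day, one possible alternative is to average the weather readings per hour first and then join on the hourly timestamp. The sketch below uses the column names printed earlier with synthetic values. It assumes both tables use the same timezone-naive clock and that averaging across all microclimate sensors is acceptable; neither has been checked against the real data.

```python
import pandas as pd

# Synthetic stand-ins using the column names printed above (values are made up).
weather = pd.DataFrame({
    "received_at": pd.to_datetime([
        "2024-01-10 13:05", "2024-01-10 13:35", "2024-01-10 14:10",
    ]),
    "airtemperature": [34.0, 36.0, 38.0],
    "relativehumidity": [30.0, 28.0, 25.0],
})
pedestrians = pd.DataFrame({
    "location_id": [2, 2, 65],
    "sensing_date": pd.to_datetime(["2024-01-10"] * 3),
    "hourday": [13, 14, 13],
    "pedestriancount": [900, 850, 400],
})

# Reduce weather to one row per hour: floor timestamps, then average
weather["datetime"] = weather["received_at"].dt.floor("h")
weather_hourly = weather.groupby("datetime", as_index=False)[
    ["airtemperature", "relativehumidity"]
].mean()

# Build the matching hourly timestamp on the pedestrian side
pedestrians["datetime"] = pedestrians["sensing_date"] + pd.to_timedelta(
    pedestrians["hourday"], unit="h"
)

# validate="many_to_one" raises if weather_hourly has duplicate hours,
# which guards against an accidental row explosion
merged = pedestrians.merge(
    weather_hourly, on="datetime", how="inner", validate="many_to_one"
)
print(merged)
```

Here the merged table keeps one row per pedestrian record, with the mean temperature and humidity for that hour attached.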