<div class="usecase-title"><strong>Development and Tree Canopy Changes</strong></div>

<div class="usecase-authors"><b>Authored by:</b> Thomas Warren</div>

<div class="usecase-level-skill">
    <div class="usecase-level"><b>Level: </b>Intermediate</div>
    <div class="usecase-skill"><b>Pre-requisite Skills: </b>Python, Folium, GeoPandas, Clustering Analysis</div>
</div>

<div class="usecase-section-header"><b>Scenario</b></div>

Developers, designers, construction managers, and planners all need to understand the impact of urban development on tree canopy coverage to ensure that progress in the city's infrastructure does not come at the expense of environmental sustainability.

<div class="usecase-section-header">This analysis will show whether there are actionable insights related to development applications near existing trees, and how tree canopy cover has changed near development sites.</div>

To achieve this I will:
- Collate and clean datasets related to development activity and tree canopies;
- Map out developments during the relevant time period (2011 to 2021);
- Quantify changes to canopy cover near development sites;
- Map canopy cover changes; and
- Complete clustering analysis on development sites to determine if there are correlations.

<div class="usecase-section-header"><b>Project Background</b></div>

As the climate changes, the urban heat island effect is increasingly relevant, and tree canopy is one of the main ways a city can moderate it. I used three key datasets (Tree Canopy 2011, Tree Canopy 2021, and Development Activity) to look for significant correlations between the extent of development activity and changes in tree canopy cover over a decade.

# Discussion of Results

Despite employing a logical framework to find interactions between urban development activities and their nearby environment, the results do not reveal any strong correlations or significant impacts. One plausible explanation for the absence of discernible patterns is the highly urbanised nature of the City of Melbourne (CoM) area itself. Opportunities for extensive green space are inherently limited in dense municipalities, with much of the land already occupied by built environments or allocated for future developments. This pre-existing condition likely acts as a buffer, limiting the potential for significant tree canopy loss directly attributable to new development projects. Moreover, Melbourne's urban planning and environmental policies might have played a crucial role in preserving the existing tree canopy.

The hypothesis that urbanisation's impact on tree canopy cover is minimised because green space is already reduced suggests a saturation effect; that is, once an urban area reaches a certain level of development, the relative impact of additional construction on green spaces, including tree canopies, becomes increasingly marginal.

These findings, or the lack thereof, show the challenges of maintaining and enhancing urban green spaces in cities.

# Tree Change Methodology

The following code was used to achieve the aims above.

In [2]:
# Import modules
import requests
import os
import numpy as np
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from io import StringIO
import json
import geopandas as gpd
from shapely.geometry import Point
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

__Get and Clean Data from Development Activity and Tree Canopy Datasets__

In [None]:
def API_Unlimited(datasetname, apikey):  # pass in dataset name and API key
    """Download a full dataset export from the open data portal as a DataFrame."""
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # renamed from 'format' to avoid shadowing the built-in

    url = f'{base_url}{datasetname}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1,  # all records
        'lang': 'en',
        'timezone': 'UTC',
        'api_key': apikey
    }

    # GET request
    response = requests.get(url, params=params)

    if response.status_code == 200:
        # StringIO to read the CSV data (the export is semicolon-delimited)
        url_content = response.content.decode('utf-8')
        df = pd.read_csv(StringIO(url_content), delimiter=';')
        print(df.sample(10, random_state=999))  # Test
        return df
    else:
        print(f'Request failed with status code {response.status_code}')
        return None

# Ask for the API key once and reuse it for every request
api_key = input("Please enter your API key: ")

dataset_id_1 = 'development-activity-monitor'
dataset_id_2 = 'tree-canopies-2011-urban-forest'
dataset_id_3 = 'tree-canopies-2021-urban-forest'
raw_dev_df = API_Unlimited(dataset_id_1, api_key)
tree_11_df = API_Unlimited(dataset_id_2, api_key)
tree_21_df = API_Unlimited(dataset_id_3, api_key)

In [4]:
tree_11_df.shape

(94699, 51)

In [5]:
    
# Clean up the data
dev_df = raw_dev_df.copy()
# Convert 'year_completed' to numeric
dev_df['year_completed'] = pd.to_numeric(raw_dev_df['year_completed'])

# Filter developments that were completed on or after 2011 or are still in construction (i.e. is NaN).
dev_df = dev_df[(dev_df['year_completed'] >= 2011) | (dev_df['year_completed'].isna())]

# Find relevant columns
dev_df = dev_df[['development_key', 'latitude','longitude']]

In [None]:
# Clean up the data (copy the slice so new columns can be added without a SettingWithCopyWarning)
tree_11_df = tree_11_df[['geo_point_2d', 'geo_shape']].copy()

# Split the 'geo_point_2d' column ("lat, lon") into 'latitude' and 'longitude'
tree_11_df[['latitude', 'longitude']] = tree_11_df['geo_point_2d'].str.split(',', expand=True).astype(float)
tree_21_df[['latitude', 'longitude']] = tree_21_df['geo_point_2d'].str.split(',', expand=True).astype(float)


__Convert to GeoDataFrame__

The tabular data needs to be converted to a format that allows spatial analysis operations, which is most easily achieved using GeoPandas' GeoDataFrames. As certain spatial operations require working in planar units, the GeoDataFrames were projected to the Universal Transverse Mercator (UTM) zone covering Melbourne (zone 55 south, EPSG:32755). This projection converts latitude and longitude coordinates into a flat, two-dimensional plane measured in metres, allowing distances and other spatial characteristics to be captured accurately.
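To illustrate why a planar projection is needed, the sketch below compares the ground distance covered by 0.001 degrees of latitude and of longitude at Melbourne's latitude. It is not part of the analysis itself, which relies on GeoPandas' `to_crs`; it uses only the standard library and assumes a spherical Earth with a mean radius of 6371 km. The helper name `haversine_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

lat, lon = -37.8136, 144.9631  # central Melbourne, the map centre used in this notebook
step = 0.001  # degrees

north_south_m = haversine_m(lat, lon, lat + step, lon)
east_west_m = haversine_m(lat, lon, lat, lon + step)

print(f"0.001 deg of latitude  ~ {north_south_m:.1f} m")
print(f"0.001 deg of longitude ~ {east_west_m:.1f} m")
```

Under these assumptions the same step in degrees covers roughly 111 m north to south but only about 88 m east to west, so a 50 m radius cannot be expressed as a single number of degrees.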

It was assumed that each development site has an area of effect of 50 metres, which defines the 'buffer' zone used to evaluate tree locations. Each canopy record was represented by its `geo_point_2d` point. A spatial join then matched the 2011 and 2021 tree points against the buffered development sites, identifying which trees lay within 50 m of each development. From these counts, the gain or loss of trees over the period was calculated for each development; a tree that falls inside several overlapping buffers is counted for each of those developments.
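The buffer-and-count logic described above can be sketched without GeoPandas. This is a simplified stand-in rather than the notebook's implementation: it assumes coordinates are already in metres (as they are after the UTM projection), uses made-up sample points, and replaces the buffer polygon with a plain distance test. The function name `count_within` is hypothetical.

```python
import math

def count_within(site, points, radius_m=50.0):
    """Count points whose planar distance to `site` is at most `radius_m`."""
    sx, sy = site
    return sum(1 for (px, py) in points if math.hypot(px - sx, py - sy) <= radius_m)

# Hypothetical planar coordinates in metres (illustrative only, not real data)
sites = {"A": (0.0, 0.0), "B": (200.0, 0.0)}
trees_2011 = [(10.0, 10.0), (30.0, -20.0), (190.0, 5.0), (500.0, 500.0)]
trees_2021 = [(10.0, 10.0), (195.0, 5.0), (205.0, -10.0), (210.0, 30.0)]

change = {}
for key, xy in sites.items():
    before = count_within(xy, trees_2011)
    after = count_within(xy, trees_2021)
    change[key] = after - before  # positive = gain, negative = loss

print(change)  # {'A': -1, 'B': 2}
```

Site A drops from two nearby trees to one, while site B rises from one to three. A site with no trees in one of the years simply gets a count of zero, which is the role `fillna(0)` plays in the notebook code.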

In [10]:
# Convert into a GeoDataFrame and set the initial CRS
gdf_dev_sites = gpd.GeoDataFrame(dev_df, geometry=gpd.points_from_xy(dev_df.longitude, dev_df.latitude), crs="EPSG:4326")
gdf_trees_2011 = gpd.GeoDataFrame(tree_11_df, geometry=gpd.points_from_xy(tree_11_df.longitude, tree_11_df.latitude), crs="EPSG:4326")
gdf_trees_2021 = gpd.GeoDataFrame(tree_21_df, geometry=gpd.points_from_xy(tree_21_df.longitude, tree_21_df.latitude), crs="EPSG:4326")

# Project to a suitable CRS for meter-based analysis
utm_crs = "EPSG:32755"  # Melbourne's zone
gdf_dev_sites = gdf_dev_sites.to_crs(utm_crs)
gdf_trees_2011 = gdf_trees_2011.to_crs(utm_crs)
gdf_trees_2021 = gdf_trees_2021.to_crs(utm_crs)

# Create a 50 Meter Buffer Around Each Development Site
gdf_dev_sites['buffered'] = gdf_dev_sites.geometry.buffer(50)
gdf_dev_sites = gdf_dev_sites.set_geometry('buffered')

# Perform spatial joins for 2011 and 2021 trees within the buffers
# ('predicate' replaces the deprecated 'op' keyword in recent GeoPandas versions)
joined_trees_2011 = gpd.sjoin(gdf_trees_2011, gdf_dev_sites, how='inner', predicate='within')
joined_trees_2021 = gpd.sjoin(gdf_trees_2021, gdf_dev_sites, how='inner', predicate='within')

# Calculate tree counts near each site for 2011 and 2021
trees_near_site_2011 = joined_trees_2011.groupby(joined_trees_2011.index_right).size()
trees_near_site_2021 = joined_trees_2021.groupby(joined_trees_2021.index_right).size()

# Map 'development_key' from 'gdf_dev_sites' using 'index_right'.
# 'index_right' holds the original index labels of gdf_dev_sites, so map against
# the un-reset index (reset_index() would misalign labels after the year filter).
joined_trees_2011['development_key'] = joined_trees_2011.index_right.map(gdf_dev_sites['development_key'])
joined_trees_2021['development_key'] = joined_trees_2021.index_right.map(gdf_dev_sites['development_key'])

# Fixing the DataFrame to compare tree counts
tree_loss_comparison = pd.DataFrame({
    'Trees Near Site 2011': trees_near_site_2011,
    'Trees Near Site 2021': trees_near_site_2021
}).fillna(0)  # Fill missing values with 0

# Attach 'Development Key' by looking up each site's original index label
tree_loss_comparison['Development Key'] = tree_loss_comparison.index.map(gdf_dev_sites['development_key'])

# Calculate tree loss/gain
tree_loss_comparison['Tree Loss/Gain'] = tree_loss_comparison['Trees Near Site 2021'] - tree_loss_comparison['Trees Near Site 2011']

# Drop the site index now that 'Development Key' identifies each row
tree_loss_comparison = tree_loss_comparison.reset_index(drop=True)

# Ensure the 'Development Key' in tree_loss_comparison is the same type for accurate merging
tree_loss_comparison['Development Key'] = tree_loss_comparison['Development Key'].astype(str)
gdf_dev_sites['development_key'] = gdf_dev_sites['development_key'].astype(str)

# Merge the tree loss data back into the geodataframe of development sites
gdf_dev_sites = gdf_dev_sites.merge(tree_loss_comparison, left_on='development_key', right_on='Development Key')



# Results

## Map #1 - Tree Gain/Loss by Development Site
Each development site in the dataset is represented as a marker on this map. The colour of each marker is determined by the net change in the number of nearby trees: sites with a net gain are shown in green, while sites with a net loss or no change are shown in red.

In [20]:
# Initialize the map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=16)
marker_cluster = MarkerCluster().add_to(melbourne_map)  # Initialize marker clustering

# Iterate through each development site to add it to the map
for index, row in gdf_dev_sites.iterrows():
    # Define color based on tree loss/gain
    color = 'green' if row['Tree Loss/Gain'] > 0 else 'red'

    # Add marker to the cluster instead of directly to the map
    folium.Marker(
        location=[row['latitude'], row['longitude']],  # Use the latitude and longitude columns
        popup=f"Development Key: {row['development_key']}, Tree Loss/Gain: {row['Tree Loss/Gain']}",
        icon=folium.Icon(color=color, icon="info-sign"),
    ).add_to(marker_cluster)

# Display the map
melbourne_map

In [12]:
# Merge the tree loss/gain information with the original development dataframe
tree_loss_comparison.rename(columns={'Development Key': 'development_key'}, inplace=True)
dev_impact_df = pd.merge(raw_dev_df, tree_loss_comparison, on='development_key', how='left')

# Remove development sites with no tree loss/gain
dev_impact_df.dropna(subset=['Tree Loss/Gain'], inplace=True)

## Map #2 - Clustering Analysis

For further analysis, K-means clustering was performed to attempt to identify patterns. To facilitate this, three features were selected: 'floors_above', indicating the scale of the development; 'Tree Loss/Gain', which reflects the change in nearby tree numbers; and 'status', to test whether a site being under construction or completed was relevant.
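As a small illustration of why these features are standardised before K-means, the sketch below applies z-score scaling by hand to made-up values. The numbers and the helper name `zscore` are hypothetical. It uses the population standard deviation, which is what scikit-learn's `StandardScaler` computes.

```python
from statistics import fmean, pstdev

def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - mu) / sigma for v in values]

# Hypothetical feature columns on very different scales (not real data)
floors_above = [2, 10, 40, 60]
tree_change = [-3, 0, 1, 2]

floors_scaled = zscore(floors_above)
change_scaled = zscore(tree_change)

print([round(v, 2) for v in floors_scaled])
print([round(v, 2) for v in change_scaled])
```

After scaling, both columns have mean 0 and standard deviation 1, so the Euclidean distances used by K-means are not dominated by whichever column has the larger raw range.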

In [13]:
# Select relevant features for clustering
features = dev_impact_df[['floors_above', 'Tree Loss/Gain']]
status_dummies = pd.get_dummies(dev_impact_df['status'], drop_first=True)  # Convert 'status' to dummy variables
features = pd.concat([features, status_dummies], axis=1)

# Standardize the features (important for K-Means)
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Perform K-Means clustering (n_init set explicitly to avoid the FutureWarning about its changing default)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
clusters = kmeans.fit_predict(features_scaled)

# Add cluster information back to the original DataFrame
dev_impact_df['Cluster'] = clusters



In [18]:
# Initialize the map centered around Melbourne
melbourne_map_2 = folium.Map(location=[-37.8136, 144.9631], zoom_start=17)
marker_cluster = MarkerCluster().add_to(melbourne_map_2)  # Initialize marker clustering

# Define a color map for clusters
cluster_colors = {0: 'blue', 1: 'green', 2: 'red', 3: 'purple'}

# Iterate through each development site to add it to the map with the cluster color
for index, row in dev_impact_df.iterrows():
    # Get the cluster color
    cluster_color = cluster_colors[row['Cluster']]
    
    # Add marker to the cluster with the cluster color
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=f"Development Key: {row['development_key']}<br>Tree Loss/Gain: {row['Tree Loss/Gain']}<br>Cluster: {row['Cluster']}",  # popups render HTML, so use <br> for line breaks
        icon=folium.Icon(color=cluster_color, icon="info-sign"),
    ).add_to(marker_cluster)

# Display the map
melbourne_map_2
