### Prepare the environment and data for visualization (assumes the notebook runs in an AWS SageMaker Studio JupyterLab environment)

In [1]:
!pip install geopandas ipympl boto3 s3fs shapely jupyterlab_widgets scikit-learn numpy seaborn scipy
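To confirm the installs took effect before running the analysis cells, each module can be looked up with `importlib.util.find_spec` without importing it. This is a minimal sketch: `missing_modules` is a hypothetical helper, some import names differ from their pip names (scikit-learn is imported as `sklearn`), and `ipywidgets` and `pyogrio` are included because later cells import or use them even though they are not in the install line.

```python
import importlib.util

# Map pip package names to the module names they are imported as.
# Only a few differ from their pip name (e.g. scikit-learn -> sklearn).
REQUIRED_MODULES = {
    "geopandas": "geopandas",
    "ipympl": "ipympl",
    "boto3": "boto3",
    "s3fs": "s3fs",
    "shapely": "shapely",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "ipywidgets": "ipywidgets",  # imported in the next cell
    "pyogrio": "pyogrio",        # engine passed to gpd.read_file
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_modules()
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If anything is reported missing, rerun the install cell and restart the kernel before continuing.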



In [2]:
import os
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.lines import Line2D
from matplotlib.ticker import MaxNLocator, ScalarFormatter
import matplotlib.colors as mcolors
import ipywidgets as widgets
from shapely.geometry import Point
from IPython.display import display, clear_output
from sklearn.cluster import DBSCAN
import seaborn as sns
from scipy.stats import gaussian_kde
import numpy as np
import boto3
import s3fs
import json

pd.set_option('display.width', 1000)  # Increase overall display width
pd.set_option('display.max_columns', None)  # Ensure all columns are displayed
pd.set_option('display.max_colwidth', 100)  # Allow longer column contents

# Data file locations (CSVs are read from S3; the boundary shapefile is a local path)
csv_s3_path = "s3://cmpt732-project-raw-data/analysis_output/vancouver_land_value_and_investment.csv"
csv_s3_postal_path = "s3://cmpt732-project-raw-data/analysis_output/vancouver_land_value_and_investment_by_postal.csv"
csv_s3_buildings_data_path = "s3://cmpt732-project-raw-data/analysis_output/vancouver_building_permits_clean.csv"
csv_s3_taxes_data_path = "s3://cmpt732-project-raw-data/analysis_output/vancouver_property_tax_clean.csv"
shapefile_s3_path = "./local-area-boundary/local-area-boundary.shp"

# Load the CSV data from S3
df = pd.read_csv(csv_s3_path)
posdf = pd.read_csv(csv_s3_postal_path, delimiter=';')
bdf = pd.read_csv(csv_s3_buildings_data_path, delimiter=';')
pdf = pd.read_csv(csv_s3_taxes_data_path, delimiter=';')

# Load the Vancouver local-area boundary shapefile from the local path using GeoPandas (pyogrio engine)
gdf = gpd.read_file(shapefile_s3_path, engine='pyogrio')

# Reproject to NAD83 / UTM zone 10N (EPSG:26910), a projected CRS in meters, so centroids and bar sizes are in meters
gdf = gdf.to_crs(epsg=26910)
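The map cells join the boundaries to the CSV with `gdf.merge(filtered_data, left_on='name', right_on='local_area')`, which is an inner join, so any local area whose name is spelled differently in the two sources silently disappears from the map. A minimal sketch of a pre-check using an outer merge with `indicator=True`; `find_unmatched_areas` is a hypothetical helper and the toy frames are illustrative stand-ins for `gdf` and `df`, not real data.

```python
import pandas as pd


def find_unmatched_areas(boundaries, values, left_on="name", right_on="local_area"):
    """Return names that appear on only one side of the boundary/value join.

    An inner merge silently drops these rows, so a neighborhood missing
    from the map can be traced back to a name mismatch.
    """
    left = pd.DataFrame({"key": boundaries[left_on].drop_duplicates()})
    right = pd.DataFrame({"key": values[right_on].drop_duplicates()})
    joined = left.merge(right, on="key", how="outer", indicator=True)
    only_shapefile = sorted(joined.loc[joined["_merge"] == "left_only", "key"])
    only_csv = sorted(joined.loc[joined["_merge"] == "right_only", "key"])
    return only_shapefile, only_csv


# Toy stand-ins for gdf and df (illustrative names with a deliberate mismatch).
toy_boundaries = pd.DataFrame({"name": ["Kerrisdale", "Strathcona", "Downtown"]})
toy_values = pd.DataFrame({"local_area": ["Kerrisdale", "Strathcona", "Down Town"]})

only_shapefile, only_csv = find_unmatched_areas(toy_boundaries, toy_values)
print("Only in shapefile:", only_shapefile)
print("Only in CSV:", only_csv)
```

On the real data this would be called as `find_unmatched_areas(gdf, df)`; two empty lists mean every area survives the join.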

# ===== Heatmaps of Vancouver Land and Project Data =====

### Click the link to view the report:
### https://cloud.dekart.xyz/reports/3853c282-ffa2-46aa-9d12-5653b9328b3c/source

# ===== Overview of the distribution of land and new project values across areas of the City of Vancouver =====
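In the value map, bar `i` is drawn at `x + i * bar_width`, and in the zoning and development maps the two bars start at `x - bar_width/2` and `x + bar_width/2`. Because `ax.bar3d` treats its x, y, z arguments as the anchor (minimum) corner of each bar, neither group ends up centered on the area centroid. Assuming centering is the intent, a small sketch of the left-edge offsets that center `n` bars of equal width (`bar_corner_offsets` is a hypothetical helper):

```python
def bar_corner_offsets(n_metrics, bar_width):
    """Offsets from the centroid to each bar's left edge so the group is centered.

    ax.bar3d(x, y, z, dx, dy, dz) draws a bar from x to x + dx, so a group
    of n bars of width w centered on the centroid must start at x - n*w/2.
    """
    return [(i - n_metrics / 2) * bar_width for i in range(n_metrics)]


# Three metrics and two metrics, with the 200 m bar width used in the map cells.
print(bar_corner_offsets(3, 200))  # [-300.0, -100.0, 100.0]
print(bar_corner_offsets(2, 200))  # [-200.0, 0.0]
```

Inside a plotting loop this would be used as `ax.bar3d(x + offset, y - bar_depth / 2, z, dx=bar_width, dy=bar_depth, dz=height, ...)`, which also centers the bars in the y direction.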

In [3]:
# Function to update the plot based on selected year
def update_value_plot(year):
    # Filter data by selected year
    filtered_data = df[df['year'] == year]

    # Merge with the shapefile data based on local_area
    merged_data = gdf.merge(filtered_data, left_on='name', right_on='local_area')

    # Print a basic summary of the data
    # print(f"\nData for the year {year}:\n")
    # print(merged_data[['name', 'average_land_value', 'average_project_value', 'average_improvement_value']])
    
    # Extract centroids for 3D bar placement
    merged_data['centroid_x'] = merged_data.geometry.centroid.x
    merged_data['centroid_y'] = merged_data.geometry.centroid.y

    # Set up 3D plot
    fig = plt.figure(figsize=(15, 12))  
    ax = fig.add_subplot(111, projection='3d')

    # Adjust the viewing angle for z-axis on the left
    ax.view_init(elev=20, azim=60)

    # Add base map outline
    for _, row in merged_data.iterrows():
        x, y = row.geometry.exterior.xy
        ax.plot(x, y, 0, color='black', alpha=0.5)  # Base map outline

    # Plot 3D bars for the three chosen metrics
    metrics = ['average_land_value', 'average_project_value', 'average_improvement_value']
    colors = ['red', 'green', 'blue']  # Colors for each metric

    # Define smaller bar width and depth for better visibility
    bar_width = 200  # Reduced size for visibility
    bar_depth = 200  # Reduced size for visibility

    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Plot each metric with a different color
        for i, metric in enumerate(metrics):
            height = row[metric]
            ax.bar3d(x + i * bar_width, y, z, dx=bar_width, dy=bar_depth, dz=height, color=colors[i], alpha=0.5)

    # Add local area labels above map regions with a small offset from the 3D bars
    label_offset = 200  # Adjust this value for a larger or smaller offset
    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Offset the label in x and y so it does not overlap the bars (z stays at the base)
        ax.text(x + label_offset, y + label_offset, z, row['name'], color='black', fontsize=10, ha='center', va='center', zorder=5)

    # Hide the grid but keep the axes visible
    ax.grid(False)
    ax.axis('on')

    # Make the z-axis half as tall as the x and y axes
    ax.set_box_aspect([1, 1, 0.5])

    # Set the title with extra padding above the plot
    ax.set_title(f"Vancouver Land Value and Property Investments ({year})", fontsize=16, pad=20)

    # Add axis titles (after reprojecting to EPSG:26910, x/y are UTM easting/northing in meters)
    ax.set_xlabel("Easting (m)", fontsize=12)
    ax.set_ylabel("Northing (m)", fontsize=12)
    ax.set_zlabel("Value (M CAD)", fontsize=12)  # The z-axis label is changed to reflect the monetary values

    # Customize Z-axis tick formatting
    ax.zaxis.set_major_locator(MaxNLocator(nbins=10))  # Increase tick density

    # Create a custom legend
    legend_elements = [
        Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markersize=10, label='Average Land Value'),
        Line2D([0], [0], marker='o', color='w', markerfacecolor='green', markersize=10, label='Average New Project Value'),
        Line2D([0], [0], marker='o', color='w', markerfacecolor='blue', markersize=10, label='Average Improvement Value')
    ]
    ax.legend(handles=legend_elements, loc='upper left', fontsize=10, title="Metrics")

    # Let matplotlib autoscale the view after plotting
    ax.autoscale_view()

    # Use tight layout to reduce whitespace
    plt.tight_layout()

    # Show the plot
    plt.show()


# Create a slider for year selection
value_year_slider = widgets.IntSlider(
    value=2023,
    min=df['year'].min(),
    max=df['year'].max(),
    step=1,
    description='Year:',
    continuous_update=True,
    layout=widgets.Layout(width='40%')  # Adjusted widget width to fit within notebook cell
)

# Display the slider and bind it to the update_value_plot function
interactive_value_plot = widgets.interactive(update_value_plot, year=value_year_slider)
display(interactive_value_plot)



## Trends in land value, property investment, and property improvement across areas of the City of Vancouver

In [4]:
# Function to plot the data based on selected local area
def plot_value_trends(local_area):
    # Ensure the 'year' column is of integer type
    df['year'] = df['year'].astype(int)
    
    # Filter data for the selected local area
    df_local = df[df['local_area'] == local_area]
    
    # Sort the data by 'year' to avoid backward lines
    df_local = df_local.sort_values(by='year')

    # Create the plot
    plt.figure(figsize=(10, 6))
    
    # Plot each trend line
    plt.plot(df_local['year'], df_local['average_land_value'], label='Average Land Value', color='blue', marker='o')
    plt.plot(df_local['year'], df_local['average_project_value'], label='Average Project Value', color='green', marker='s')
    plt.plot(df_local['year'], df_local['average_improvement_value'], label='Average Improvement Value', color='red', marker='^')
    
    # Add labels and title
    plt.xlabel('Year')
    plt.ylabel('Value ($)')
    plt.title(f"Trends in Average Values for {local_area}")
    plt.legend()
    
    # Show the plot
    plt.grid(True)
    plt.show()

# Create a dropdown widget for selecting the local area
local_area_dropdown = widgets.Dropdown(
    options=df['local_area'].unique(),
    value=df['local_area'].unique()[0],  # Default value
    description='Local Area:',
)

# Create an interactive plot with the dropdown
interactive_value_trend_plot = widgets.interactive(plot_value_trends, local_area=local_area_dropdown)

# Display the widget and plot
display(interactive_value_trend_plot)


## **Observations**
1. **High-Value Areas**:
   - **Dunbar-Southlands**, **West Point Grey**, and **Kerrisdale** consistently lead in land and improvement values, underscoring their desirability and high market demand.
   - These areas also show strong growth in project value, signaling significant development activity.

2. **Affordable or Emerging Areas**:
   - **Renfrew-Collingwood** and **Strathcona** maintain relatively low land values, but their improvement and project values show moderate growth, indicating increasing interest or redevelopment potential.

3. **Rapid Growth**:
   - **Hastings-Sunrise** and **Victoria-Fraserview** exhibit significant increases in land and project values, suggesting rising popularity or gentrification trends.

## **Summary**
1. **Land Value**:
   - Most neighborhoods exhibit a steady increase in average land value, reflecting growing real estate demand or appreciation over time.
   - Higher-end neighborhoods like **Kerrisdale**, **Oakridge**, and **Dunbar-Southlands** have consistently high land values, with Kerrisdale surpassing $2 million by 2023.

2. **Project Value**:
   - Project values (potential development values) vary significantly. Some areas, like **Kerrisdale** and **West Point Grey**, have seen substantial growth, with Kerrisdale rising from 449k CAD in 2020 to 865k CAD in 2024.
   - Others, like **Downtown** and **Strathcona**, show fluctuations, potentially due to localized market dynamics or completed development projects.

3. **Improvement Value**:
   - Improvement values (value of existing structures) generally increase over time, with minor fluctuations in some regions.
   - Neighborhoods like **Oakridge** and **Dunbar-Southlands** see notable growth, indicating significant renovations or high-value developments.
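The growth figures quoted above can be checked with two formulas: total growth = last / first - 1, and compound annual growth rate (CAGR) = (last / first)^(1 / years) - 1. For Kerrisdale's project value (449k CAD in 2020 to 865k CAD in 2024) that works out to about 92.7% in total, or roughly 17.8% per year. A sketch that computes both for every area in a frame shaped like `df` (columns `local_area`, `year`, and a value column); `first_to_last_growth` is a hypothetical helper name.

```python
import pandas as pd


def first_to_last_growth(data, value_col, area_col="local_area", year_col="year"):
    """Total and compound annual growth of value_col between each area's first and last year."""
    grouped = data.sort_values(year_col).groupby(area_col)
    summary = pd.DataFrame({
        "first_year": grouped[year_col].first(),
        "last_year": grouped[year_col].last(),
        "first_value": grouped[value_col].first(),
        "last_value": grouped[value_col].last(),
    })
    years = summary["last_year"] - summary["first_year"]
    ratio = summary["last_value"] / summary["first_value"]
    summary["total_growth_pct"] = (ratio - 1) * 100
    # CAGR = (last / first) ** (1 / years) - 1; left as NaN when only one year is present.
    summary["cagr_pct"] = (ratio ** (1 / years.where(years > 0)) - 1) * 100
    return summary


# Worked example using the two Kerrisdale project values quoted above (CAD).
example = pd.DataFrame({
    "local_area": ["Kerrisdale", "Kerrisdale"],
    "year": [2020, 2024],
    "average_project_value": [449_000, 865_000],
})
print(first_to_last_growth(example, "average_project_value").round(1))
```

On the full dataset, `first_to_last_growth(df, "average_land_value")` would rank every local area by land value growth over the available years.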

# ===== Overview of the distribution of zoning classifications across areas of the City of Vancouver =====

In [5]:
def update_classification_plot(year):
    # Filter data by selected year
    filtered_data = df[df['year'] == year]
    
    # Merge with the shapefile data based on local_area
    merged_data = gdf.merge(filtered_data, left_on='name', right_on='local_area')

    # Print basic data summary before plotting
    # print(f"\nData for the year {year}:\n")
    # print(merged_data[['name', 'residential_zoning_count', 'non_residential_zoning_count']])
    
    # Extract centroids for 3D bar placement
    merged_data['centroid_x'] = merged_data.geometry.centroid.x
    merged_data['centroid_y'] = merged_data.geometry.centroid.y
    
    # Set up 3D plot
    fig = plt.figure(figsize=(15, 12))  
    ax = fig.add_subplot(111, projection='3d')

    # Adjust the viewing angle for z-axis on the left
    ax.view_init(elev=20, azim=60)

    # Add base map outline
    for _, row in merged_data.iterrows():
        x, y = row.geometry.exterior.xy
        ax.plot(x, y, 0, color='black', alpha=0.5)  # Base map outline

    # Plot 3D bars for the two chosen metrics (residential zoning count and non-residential zoning count)
    metrics = ['residential_zoning_count', 'non_residential_zoning_count']  # Updated metrics
    colors = ['blue', 'red']  # Colors for each metric

    # Define smaller bar width and depth for better visibility
    bar_width = 200  # Reduced size for visibility
    bar_depth = 200  # Reduced size for visibility

    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Get residential and non-residential zoning counts
        residential_zoning_count = row['residential_zoning_count']
        non_residential_zoning_count = row['non_residential_zoning_count']

        # Plot each metric with a different color
        # First bar for residential zoning count
        ax.bar3d(x - bar_width/2, y, z, dx=bar_width, dy=bar_depth, dz=residential_zoning_count, color=colors[0], alpha=0.5)

        # Second bar for non-residential zoning count
        ax.bar3d(x + bar_width/2, y, z, dx=bar_width, dy=bar_depth, dz=non_residential_zoning_count, color=colors[1], alpha=0.5)

    # Add local area labels above map regions with a small offset from the 3D bars
    label_offset = 200  # Adjust this value for a larger or smaller offset
    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Offset the label in y so it does not overlap the bars (z stays at the base)
        ax.text(x, y + label_offset, z, row['name'], color='black', fontsize=10, ha='center', va='center', zorder=5)

    # Hide the grid but keep the axes visible
    ax.grid(False)
    ax.axis('on')

    # Make the z-axis half as tall as the x and y axes
    ax.set_box_aspect([1, 1, 0.5])

    # Set the title with extra padding above the plot
    ax.set_title(f"Vancouver Zoning by Classification ({year})", fontsize=16, pad=20)

    # Add axis titles (after reprojecting to EPSG:26910, x/y are UTM easting/northing in meters)
    ax.set_xlabel("Easting (m)", fontsize=12)
    ax.set_ylabel("Northing (m)", fontsize=12)
    ax.set_zlabel("Count (M)", fontsize=12)  # The z-axis label is changed to reflect the counts

    # Customize Z-axis tick formatting
    ax.zaxis.set_major_locator(MaxNLocator(nbins=10))  # Increase tick density

    # Create a custom legend
    legend_elements = [
        Line2D([0], [0], marker='o', color='w', markerfacecolor='blue', markersize=10, label='Residential Zoning'),
        Line2D([0], [0], marker='o', color='w', markerfacecolor='red', markersize=10, label='Non-Residential Zoning')
    ]
    ax.legend(handles=legend_elements, loc='upper left', fontsize=10, title="Zoning Type")

    # Let matplotlib autoscale the view after plotting
    ax.autoscale_view()

    # Use tight layout to reduce whitespace
    plt.tight_layout()

    # Show the plot
    plt.show()


# Create a slider for year selection
classification_year_slider = widgets.IntSlider(
    value=2023,
    min=df['year'].min(),
    max=df['year'].max(),
    step=1,
    description='Year:',
    continuous_update=True,
    layout=widgets.Layout(width='40%')  # Adjusted widget width to fit within notebook cell
)

# Display the slider and bind it to the update_classification_plot function
interactive_classification_plot = widgets.interactive(update_classification_plot, year=classification_year_slider)
display(interactive_classification_plot)


## Pie Chart of the distribution of zoning classifications for the City of Vancouver by local area

In [6]:
# Function to plot the "3D" pie chart of zoning classifications based on selected local area and year
def plot_zoning_3d_pie(local_area, year):
    # Ensure the 'year' column is of integer type
    df['year'] = df['year'].astype(int)
    
    # Filter data for the selected local area and year
    df_local = df[(df['local_area'] == local_area) & (df['year'] == year)]
    
    # If no data for the selected combination, return
    if df_local.empty:
        print(f"No data available for {local_area} in year {year}")
        return
    
    # Data for the pie chart
    residential_count = df_local['residential_zoning_count'].values[0]
    non_residential_count = df_local['non_residential_zoning_count'].values[0]
    
    # Create a 2D pie chart with a 3D-like effect
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Zoning labels and values
    labels = ['Residential Zoning', 'Non-Residential Zoning']
    values = [residential_count, non_residential_count]
    
    # Plot the pie chart with a shadow and explode effect for visual "3D" look
    wedges, texts, autotexts = ax.pie(values, labels=labels, autopct='%1.1f%%', startangle=90, 
                                      explode=(0.1, 0), shadow=True)
    
    # Set the title
    ax.set_title(f"Zoning Classifications for {local_area} in {year}")
    ax.axis('equal')  # Equal aspect ratio ensures the pie is drawn as a circle.
    
    # Show the plot
    plt.show()

# Create a dropdown widget for selecting the local area
local_area_dropdown = widgets.Dropdown(
    options=df['local_area'].unique(),
    value=df['local_area'].unique()[0],  # Default value
    description='Local Area:',
)

# Create a slider widget for selecting the year
year_slider = widgets.IntSlider(
    min=df['year'].min(),
    max=df['year'].max(),
    value=df['year'].min(),  # Default value
    description='Year:',
    continuous_update=False
)

# Create an interactive plot with the dropdown and slider
interactive_pie_chart = widgets.interactive(plot_zoning_3d_pie, local_area=local_area_dropdown, year=year_slider)

# Display the widgets and the plot
display(interactive_pie_chart)


## **Observations:**
- **Shaughnessy**: Residential zoning count remains very low (under 75,000), while non-residential zoning remains extremely high (over 2 million).
- **Grandview-Woodland**: Residential zoning count shows a steady increase, while non-residential zoning remains quite low.
- **Sunset**: Residential zoning count has minimal fluctuation over the years, and non-residential zoning remains consistently low at around **177,000**.

## **Summary:**
- **Residential zoning** is generally stable, with some neighborhoods seeing moderate growth, particularly in areas like Marpole, Mount Pleasant, and Grandview-Woodland.
- **Non-residential zoning** shows more significant growth, especially in Downtown, Hastings-Sunrise, and Marpole, likely reflecting increased commercial or industrial development.
- **Certain neighborhoods**, such as Shaughnessy and the West End, appear to have a more balanced mix of residential and non-residential zoning, while others are dominated by one or the other, indicating different urban planning priorities.
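The distinction between "balanced" and "high residential" areas can be made concrete by tabulating, for every area in one year, the residential share that the pie chart shows for a single area (the pie's `autopct` label is each wedge's value divided by the total). A sketch with made-up counts; `residential_share` is a hypothetical helper, the column names match those used in the pie chart cell, and what share counts as "balanced" (say, near 50%) is a judgment call.

```python
import pandas as pd


def residential_share(data, year, res_col="residential_zoning_count",
                      nonres_col="non_residential_zoning_count",
                      area_col="local_area", year_col="year"):
    """Residential share (%) of the two zoning counts for each area in one year."""
    rows = data[data[year_col] == year].set_index(area_col)
    total = rows[res_col] + rows[nonres_col]
    share = 100 * rows[res_col] / total  # NaN if both counts are zero
    return share.sort_values(ascending=False).rename("residential_share_pct")


# Toy counts, made up for illustration only.
toy_zoning = pd.DataFrame({
    "local_area": ["Area A", "Area B", "Area C"],
    "year": [2023, 2023, 2023],
    "residential_zoning_count": [300, 50, 100],
    "non_residential_zoning_count": [100, 150, 100],
})
print(residential_share(toy_zoning, 2023))
```

Running `residential_share(df, 2023)` on the real data would list every local area from most to least residential in a single table.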

# ===== Graph of the distribution of property development across areas of the City of Vancouver =====

In [7]:
def update_development_plot(year):
    # Filter data by selected year
    filtered_data = df[df['year'] == year]

    # Merge with the shapefile data based on local_area
    merged_data = gdf.merge(filtered_data, left_on='name', right_on='local_area')

    # Print basic data summary before plotting
    # print(f"\nData for the year {year}:\n")
    # print(merged_data[['name', 'residential_property_use_count', 'non_residential_property_use_count']])
    
    # Extract centroids for 3D bar placement
    merged_data['centroid_x'] = merged_data.geometry.centroid.x
    merged_data['centroid_y'] = merged_data.geometry.centroid.y

    # Set up 3D plot
    fig = plt.figure(figsize=(15, 12))  
    ax = fig.add_subplot(111, projection='3d')

    # Adjust the viewing angle for z-axis on the left
    ax.view_init(elev=20, azim=60)

    # Add base map outline
    for _, row in merged_data.iterrows():
        x, y = row.geometry.exterior.xy
        ax.plot(x, y, 0, color='black', alpha=0.5)  # Base map outline

    # Plot 3D bars for the two chosen metrics (residential property development and non-residential property development)
    metrics = ['residential_property_use_count', 'non_residential_property_use_count']  # Updated metrics
    colors = ['green', 'orange']  # Colors for each metric

    # Define smaller bar width and depth for better visibility
    bar_width = 200  # Reduced size for visibility
    bar_depth = 200  # Reduced size for visibility

    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Get residential and non-residential property use counts
        residential_property_use_count = row['residential_property_use_count']
        non_residential_property_use_count = row['non_residential_property_use_count']

        # Plot each metric with a different color
        # First bar for residential property use count
        ax.bar3d(x - bar_width/2, y, z, dx=bar_width, dy=bar_depth, dz=residential_property_use_count, color=colors[0], alpha=0.5)

        # Second bar for non-residential property use count
        ax.bar3d(x + bar_width/2, y, z, dx=bar_width, dy=bar_depth, dz=non_residential_property_use_count, color=colors[1], alpha=0.5)

    # Add local area labels above map regions with a small offset from the 3D bars
    label_offset = 200  # Adjust this value for a larger or smaller offset
    for _, row in merged_data.iterrows():
        x = row['centroid_x']
        y = row['centroid_y']
        z = 0  # Base of the bars

        # Offset the label in y so it does not overlap the bars (z stays at the base)
        ax.text(x, y + label_offset, z, row['name'], color='black', fontsize=10, ha='center', va='center', zorder=5)

    # Hide the grid but keep the axes visible
    ax.grid(False)
    ax.axis('on')

    # Make the z-axis half as tall as the x and y axes
    ax.set_box_aspect([1, 1, 0.5])

    # Set the title with extra padding above the plot
    ax.set_title(f"Vancouver Property Development by Usage ({year})", fontsize=16, pad=20)

    # Add axis titles (after reprojecting to EPSG:26910, x/y are UTM easting/northing in meters)
    ax.set_xlabel("Easting (m)", fontsize=12)
    ax.set_ylabel("Northing (m)", fontsize=12)
    ax.set_zlabel("Count", fontsize=12)  # Number of new projects by property use

    # Customize Z-axis tick formatting
    ax.zaxis.set_major_locator(MaxNLocator(nbins=10))  # Increase tick density

    # Create a custom legend
    legend_elements = [
        Line2D([0], [0], marker='o', color='w', markerfacecolor='green', markersize=10, label='Residential Property Development'),
        Line2D([0], [0], marker='o', color='w', markerfacecolor='orange', markersize=10, label='Non-Residential Property Development')
    ]
    ax.legend(handles=legend_elements, loc='upper left', fontsize=10, title="Property Development Type")

    # Let matplotlib autoscale the view after plotting
    ax.autoscale_view()

    # Use tight layout to reduce whitespace
    plt.tight_layout()

    # Show the plot
    plt.show()


# Create a slider for year selection
development_year_slider = widgets.IntSlider(
    value=2023,
    min=df['year'].min(),
    max=df['year'].max(),
    step=1,
    description='Year:',
    continuous_update=True,
    layout=widgets.Layout(width='40%')  # Adjusted widget width to fit within notebook cell
)


# Display the slider and bind it to the update_development_plot function
interactive_development_plot = widgets.interactive(update_development_plot, year=development_year_slider)
display(interactive_development_plot)


## Pie Chart of the distribution of property usage types for the City of Vancouver by local area

In [8]:
# Function to plot the "3D" pie chart of property use classifications based on selected local area and year
def plot_property_use_pie(local_area, year):
    # Ensure the 'year' column is of integer type
    df['year'] = df['year'].astype(int)
    
    # Filter data for the selected local area and year
    df_local = df[(df['local_area'] == local_area) & (df['year'] == year)]
    
    # If no data for the selected combination, return
    if df_local.empty:
        print(f"No data available for {local_area} in year {year}")
        return
    
    # Data for the pie chart
    residential_property_use_count = df_local['residential_property_use_count'].values[0]
    non_residential_property_use_count = df_local['non_residential_property_use_count'].values[0]
    
    # Create a 2D pie chart with a 3D-like effect
    fig, ax = plt.subplots(figsize=(10, 6))
    
    # Property use labels and values
    labels = ['Residential Property Use', 'Non-Residential Property Use']
    values = [residential_property_use_count, non_residential_property_use_count]
    
    # Plot the pie chart with a shadow and explode effect for visual "3D" look
    wedges, texts, autotexts = ax.pie(values, labels=labels, autopct='%1.1f%%', startangle=90, 
                                      explode=(0.1, 0), shadow=True)
    
    # Set the title
    ax.set_title(f"Property Project Classifications for {local_area} in {year}")
    ax.axis('equal')  # Equal aspect ratio ensures the pie is drawn as a circle.
    
    # Show the plot
    plt.show()

# Create a dropdown widget for selecting the local area
local_area_dropdown = widgets.Dropdown(
    options=df['local_area'].unique(),
    value=df['local_area'].unique()[0],  # Default value
    description='Local Area:',
)

# Create a slider widget for selecting the year
year_slider = widgets.IntSlider(
    min=df['year'].min(),
    max=df['year'].max(),
    value=df['year'].min(),  # Default value
    description='Year:',
    continuous_update=False
)

# Create an interactive plot with the dropdown and slider
interactive_property_use_pie = widgets.interactive(plot_property_use_pie, local_area=local_area_dropdown, year=year_slider)

# Display the widgets and the plot
interactive_property_use_pie  # Just use the interactive widget


## **Observations**

The data shows the counts of residential and non-residential property uses for new projects in each district from 2020 to 2024. Key observations:

- **Downtown** consistently has a high count of both residential and non-residential property uses, with non-residential properties steadily increasing over the years.
- **Hastings-Sunrise** has shown an increase in residential property use from 91 in 2020 to 106 in 2024, though non-residential property use has fluctuated.
- **Kerrisdale** has seen a decline in residential property use, from 52 in 2020 to just 33 in 2024, with non-residential uses remaining relatively low.
- **Marpole** shows a decrease in non-residential property use, from 21 in 2020 to 13 in 2024, while residential use remained fairly steady.
- **Oakridge** and **Riley Park** have seen small declines in both residential and non-residential uses, though Riley Park maintained relatively higher numbers for residential properties.
- **Victoria-Fraserview** and **Dunbar-Southlands** consistently show higher residential property use in comparison to non-residential property use.
- **Strathcona** and **Killarney** have the lowest non-residential property uses, with Strathcona showing a noticeable dip in residential use by 2024.
- **West End** remains balanced with both residential and non-residential uses, but non-residential projects have generally seen a higher increase over the years.

## **Summary**

From 2020 to 2024, there have been noticeable trends in the shift of property types in Vancouver's districts:
- **Increased non-residential property use** in areas like Downtown, Fairview, and the West End, with some fluctuations in others such as Hastings-Sunrise.
- **Residential property use** has remained relatively stable or slightly decreased in several areas, especially in more established neighborhoods like Kerrisdale and Arbutus Ridge.
- **Districts like Downtown and West End** continue to see a high demand for both residential and non-residential projects, with Downtown leading in both categories.
- **Kerrisdale, Marpole, and Killarney** show a trend toward lower residential development, possibly indicating a shift in property demand or availability.
- Overall, the data reveals a dynamic landscape in property development: non-residential development is rising in several districts, while residential development shows mixed results.
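Statements such as "an increase from 91 in 2020 to 106 in 2024" can be derived directly from the data so the wording always matches the numbers. A minimal sketch that compares each area's earliest and latest year; `change_direction` is a hypothetical helper, and the sample rows reuse only the Hastings-Sunrise and Kerrisdale residential figures quoted in the observations.

```python
import pandas as pd


def change_direction(data, value_col, area_col="local_area", year_col="year"):
    """Compare each area's value_col in the earliest and latest year and label the change."""
    wide = data.pivot_table(index=area_col, columns=year_col, values=value_col, aggfunc="sum")
    first_year, last_year = wide.columns.min(), wide.columns.max()
    delta = wide[last_year] - wide[first_year]

    def label(d):
        if pd.isna(d):
            return "missing data"  # area absent in one of the two years
        if d > 0:
            return "increase"
        if d < 0:
            return "decrease"
        return "no change"

    return pd.DataFrame({
        first_year: wide[first_year],
        last_year: wide[last_year],
        "change": delta,
        "direction": delta.map(label),
    })


# Residential project counts quoted in the observations (2020 and 2024 only).
toy_use = pd.DataFrame({
    "local_area": ["Hastings-Sunrise", "Hastings-Sunrise", "Kerrisdale", "Kerrisdale"],
    "year": [2020, 2024, 2020, 2024],
    "residential_property_use_count": [91, 106, 52, 33],
})
print(change_direction(toy_use, "residential_property_use_count"))
```

Applied to `df` with `"non_residential_property_use_count"`, the same call gives the direction of change for every district at once.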

# ===== Overview of correlations between property investment and land value change =====

In [9]:
# Ensure 'year' is an integer column
posdf['year'] = posdf['year'].astype(int)

# Create a function to calculate correlations and plot the results
def plot_correlation_by_investment_type(investment_type):
    # Filter data based on the investment type
    if investment_type == 'residential':
        filtered_data = posdf[posdf['property_use'] == 'residential']
    elif investment_type == 'non-residential':
        filtered_data = posdf[posdf['property_use'] != 'residential']
    else:
        filtered_data = posdf  # 'both' includes all data

    # Build a year-by-year correlation matrix (rows: investment year, columns: land value year)
    years = sorted(filtered_data['year'].unique())
    correlation_table = pd.DataFrame(index=years, columns=years)

    # Populate the correlation table
    for row_year in years:
        for col_year in years:
            if col_year < row_year:
                correlation = None
            else:
                row_data = filtered_data[filtered_data['year'] == row_year]['total_project_value']
                col_data = filtered_data[filtered_data['year'] == col_year]['total_land_value']
                if not row_data.empty and not col_data.empty:
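                    # Note: reset_index pairs rows by position, so this assumes both years list postal areas in the same order
                    correlation = row_data.reset_index(drop=True).corr(col_data.reset_index(drop=True))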
                else:
                    correlation = None
            correlation_table.loc[row_year, col_year] = correlation

    # Plotting the correlations
    correlation_table = correlation_table.astype(float, errors='ignore')
    correlation_table.reset_index(inplace=True)
    correlation_table.rename(columns={'index': 'Investment Year'}, inplace=True)

    plt.figure(figsize=(10, 6))
    for investment_year in correlation_table['Investment Year']:
        row_data = correlation_table[correlation_table['Investment Year'] == investment_year].iloc[0, 1:]
        valid_columns = [col for col in correlation_table.columns[1:] if col >= investment_year]
        plt.plot(valid_columns, row_data.loc[valid_columns], marker='o', label=f'Investment in {investment_year}')

    # Customize the plot
    plt.title(f'Correlation Between Investment and Land Value by Year ({investment_type.capitalize()})', fontsize=16)
    plt.xlabel('Land Value Year', fontsize=12)
    plt.ylabel('Correlation', fontsize=12)
    plt.legend(title='Investment Year', fontsize=10)
    plt.grid(alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()


# Create a dropdown widget to select investment type
investment_type_dropdown = widgets.Dropdown(
    options=['both', 'residential', 'non-residential'],
    value='both',  # Default value
    description='Type:',
)

# Create an interactive plot with the dropdown
interactive_plot = widgets.interactive(plot_correlation_by_investment_type, investment_type=investment_type_dropdown)

# Display the widget and plot
interactive_plot



In [10]:
# Ensure 'year' is an integer column
posdf['year'] = posdf['year'].astype(int)

# Function to create and display the correlation table based on investment type
def display_correlation_table(investment_type):
    # Filter data based on the selected investment type
    if investment_type == 'residential':
        filtered_data = posdf[posdf['property_use'] == 'residential']
    elif investment_type == 'non-residential':
        filtered_data = posdf[posdf['property_use'] != 'residential']
    else:
        filtered_data = posdf  # 'both' includes all data
    
    # Build a year-by-year correlation matrix (rows: investment year, columns: land value year)
    years = sorted(filtered_data['year'].unique())
    correlation_table = pd.DataFrame(index=years, columns=years)

    # Populate the correlation table
    for row_year in years:
        for col_year in years:
            if col_year < row_year:
                correlation = None
            else:
                row_data = filtered_data[filtered_data['year'] == row_year]['total_project_value']
                col_data = filtered_data[filtered_data['year'] == col_year]['total_land_value']
                
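                if not row_data.empty and not col_data.empty:
                    # Series.corr aligns on the index, so after reset_index the two series are paired
                    # by row position; this assumes rows appear in the same location order in every year
                    correlation = row_data.reset_index(drop=True).corr(col_data.reset_index(drop=True))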
                else:
                    correlation = None  # Handle missing data gracefully
            
            correlation_table.loc[row_year, col_year] = correlation

    # Prepare the table for display
    correlation_table = correlation_table.astype(float, errors='ignore')  # Convert to float so empty cells display as NaN

    # Display the title and correlation table
    title = f"Correlation Between Investment and Land Value by Year ({investment_type.capitalize()})"
    print(title)  # Title above the table
    display(correlation_table)  # Display the table

# Create a dropdown widget for selecting the investment type
investment_type_dropdown = widgets.Dropdown(
    options=['both', 'residential', 'non-residential'],
    value='both',  # Default value
    description='Type:',
)

# Create an interactive table with the dropdown
interactive_table = widgets.interactive(display_correlation_table, investment_type=investment_type_dropdown)

# Display the widget and the table
interactive_table
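In the cell above, `Series.corr` aligns its two inputs on the index before correlating, so after `reset_index(drop=True)` the i-th investment value of one year is paired with the i-th land value of the other year, regardless of which area each row describes. If each row of `posdf` describes one postal area, a minimal sketch that pairs values by area instead could merge the two years on an identifier column. The column name `postal_code` and the helper `cross_year_correlation` are hypothetical; substitute the identifier the CSV actually contains.

```python
import pandas as pd


def cross_year_correlation(data, row_year, col_year, key="postal_code"):
    """Correlate investment in `row_year` with land value in `col_year`, matched on `key`.

    `key` is a hypothetical identifier column; this sketch assumes at most one
    row per key per year (aggregate with groupby first if that does not hold).
    """
    invest = data.loc[data["year"] == row_year, [key, "total_project_value"]]
    land = data.loc[data["year"] == col_year, [key, "total_land_value"]]
    # Inner join keeps only areas present in both years, so every pair refers to the same area
    paired = invest.merge(land, on=key, how="inner")
    if len(paired) < 2:
        return float("nan")  # a correlation needs at least two pairs
    return paired["total_project_value"].corr(paired["total_land_value"])


# Illustrative toy data (not project data): the 2023 rows are stored in reverse area order
toy = pd.DataFrame({
    "postal_code": ["V5K", "V5L", "V5M", "V5M", "V5L", "V5K"],
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "total_project_value": [100.0, 200.0, 300.0, 0.0, 0.0, 0.0],
    "total_land_value": [0.0, 0.0, 0.0, 30.0, 20.0, 10.0],
})

matched = cross_year_correlation(toy, 2022, 2023)

# Positional pairing, as in the notebook cell: first 2022 row vs first 2023 row, and so on
inv = toy.loc[toy["year"] == 2022, "total_project_value"].reset_index(drop=True)
land_2023 = toy.loc[toy["year"] == 2023, "total_land_value"].reset_index(drop=True)
positional = inv.corr(land_2023)

print(f"matched by area: {matched:+.2f}, paired by position: {positional:+.2f}")
```

On this toy data the two approaches disagree completely (+1 when matched by area, -1 when paired by position), because only the row order of the 2023 records differs.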


Summary
=====================================

The results above show the correlation between investment and land value for all properties combined (**both**), **residential** properties, and **non-residential** properties from **2017 to 2024**. The correlation coefficient (Pearson, the default for the pandas `Series.corr` call used in the table) measures the strength and direction of the linear relationship between investment and land value: values close to +1 indicate a strong positive correlation, values close to -1 a strong negative correlation, and values near 0 little or no linear relationship.
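As a quick illustration of how `Series.corr` behaves at the ends of that scale, the toy series below (illustrative values, not project data) produce a perfect positive, a perfect negative, and a zero linear correlation.

```python
import pandas as pd


def demo_correlations():
    """Return Pearson correlations for three toy relationships (illustrative data only)."""
    investment = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
    rising = 10 * investment + 3                      # rises exactly in step with investment
    falling = -2 * investment + 50                    # falls exactly as investment rises
    unrelated = pd.Series([2.0, 4.0, 3.0, 4.0, 2.0])  # no linear trend against investment
    return {
        "rising": investment.corr(rising),      # Series.corr defaults to method="pearson"
        "falling": investment.corr(falling),
        "unrelated": investment.corr(unrelated),
    }


for name, r in demo_correlations().items():
    print(f"{name:>9}: {r:+.3f}")
```

A coefficient near 0 rules out only a *linear* relationship; a strong non-linear pattern can still produce a value close to zero.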

1. Overall Properties (Both):
-----------------------------
- **2017-2023**: The correlation values fluctuate considerably and are mostly **low or negative**. Correlations drop sharply after 2018 and are negative in most years from 2019 to 2023. The negative correlations during **2020-2022** may reflect the **COVID-19 pandemic**, whose economic disruption and uncertainty could have made investment trends and land value shifts less predictable.
- **2024**: The correlation rises to **0.192159**, the highest value in the overall series. In absolute terms this is still a **weak positive correlation**, but it suggests that investment and land value moved together more closely in 2024 than in earlier years. More stable post-pandemic market conditions may have contributed to this shift.

2. Residential Properties:
-------------------------
- **2017-2023**: The correlation values are **low or negative** in most years, with several values near zero. The **only notable positive correlation** occurs in 2019 (**0.092306**), and even that is weak; the values turn negative again in 2020. **COVID-19** likely contributed to these fluctuations, as uncertainty and shifting demand affected housing markets during the pandemic. The **2020-2021** period in particular may have combined **volatile housing demand** (driven by lockdowns and work-from-home trends) with **disrupted investment flows**.
- **2024**: A **slightly positive correlation of 0.071242** indicates a mild positive relationship between investment and land value in the residential sector in the most recent year. The post-pandemic recovery may have brought small increases in both residential investment and land values, but this sector still shows a weaker association with investment than the non-residential sector.

3. Non-Residential Properties:
------------------------------
- **2017-2023**: The correlation fluctuates, with its highest positive values in **2018** (**0.257353**) and **2023** (**0.146542**). These are still weak correlations in absolute terms, but they suggest that investment and land value were more closely associated in those years, especially in **2018**. The **COVID-19 pandemic** likely disrupted the non-residential sector, particularly in 2020 and 2021, when demand for office and retail space fell sharply; this may explain the negative correlations in that period.
- **2024**: The non-residential correlation is **0.193192**, the highest 2024 value among the three groups. This points to a **positive, though still weak, relationship** between investment and land value in the non-residential sector. As the economy recovers and businesses adapt to new needs (e.g., hybrid workspaces, retail transformations), investment in non-residential properties may begin to have a more noticeable effect on land values.

Key Insights
-------------
- **General Trend**: Across all three groups (both, residential, and non-residential), the relationship between investment and land value is **weak** and **inconsistent**, particularly during the **COVID-19 pandemic** (2020-2022). Correlations turn **positive** in 2024, most clearly for **non-residential properties**, where investment and land value move together more closely than in most earlier years.
  
- **Sector-specific Trends**:
  - **Non-residential properties** fluctuate more and show the highest positive correlations, in **2018**, **2023**, and **2024**. This may indicate that the non-residential market, which was hit hard by the pandemic, is recovering as businesses re-evaluate their real estate needs in a post-COVID world.
  - **Residential properties**, on the other hand, exhibit **weak or negative correlations**, with the **only notable positive correlation** occurring in 2019. The impact of **COVID-19** on the residential sector has been multifaceted, with changes in work-from-home patterns and shifts in demand for suburban housing possibly explaining the lack of a clear, positive trend.

- **Overall Conclusion**: While 2024 shows **weak positive correlations** (about 0.19) for both the overall and non-residential groups, the trend over the years has been **inconsistent**, with some years showing negative or near-zero correlations. The **COVID-19 pandemic** likely played a significant role in disrupting these trends, and the 2024 values may indicate that the relationship between investment and land value is stabilizing during the post-pandemic recovery.
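Describing a coefficient as weak or strong refers to its size; whether it is distinguishable from zero also depends on how many paired observations produced it, which the tables do not report. As a worked check, the standard t-test for a Pearson coefficient uses `t = r * sqrt((n - 2) / (1 - r^2))` with `n - 2` degrees of freedom. The sketch below applies it with `scipy.stats.t.sf`; the helper name `pearson_p_value` and the sample sizes are illustrative assumptions, not counts from this dataset.

```python
import math

from scipy import stats


def pearson_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, given a sample Pearson r computed from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) with n - 2 degrees of freedom;
    requires n > 2 and |r| < 1.
    """
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Survival function gives P(T > |t|); doubling makes the test two-sided
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)


# The same coefficient (0.19, close to the 2024 values above) under different sample sizes
for n in (20, 100, 1000):
    print(f"r = 0.19, n = {n:>4}: p = {pearson_p_value(0.19, n):.4g}")
```

With r = 0.19 the p-value is well above 0.05 for 20 pairs but far below it for 1,000 pairs, so the number of pairs behind each coefficient matters when reading these results. `scipy.stats.pearsonr` returns the same p-value directly when the paired arrays are available.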

Possible Factors
----------------
- **COVID-19** had a profound effect on both residential and non-residential markets, particularly during the initial lockdown phases. Economic disruption, changing work patterns, shifting demand, and uncertainty about future market conditions likely led to **volatile investment behaviors** and **land value fluctuations**.
- Other factors, including economic conditions, supply-demand dynamics, government policies, and broader real estate market trends, may also explain the relationship between investment and land value during this period.

Final Conclusion
-----------------
The data shows **mixed trends** in the relationship between investment and land value across different sectors and years, with some positive correlations emerging in **2024**, especially in the **non-residential sector**. However, these correlations are far from consistent or strong, and the influence of **COVID-19** appears to have been a significant factor in the observed volatility, suggesting that other external factors beyond investment were heavily influencing land values during the pandemic period.
