# Project: Toronto Neighborhood Crime Rates Analysis

## Prerequisites
- **Python**
- **Data Visualization**
- **Data Analysis**
- **Automation**

## Objective
This project aims to visualize and analyze the crime rates in various neighborhoods of Toronto over the past decade. By leveraging the provided dataset, we will examine the trends of nine different crime types across multiple neighborhoods, identifying key patterns and insights.

## Data Source
The dataset is sourced from the [Toronto Open Data Portal](https://open.toronto.ca/dataset/neighbourhood-crime-rates/). It contains annual crime rates for nine crime types in 158 Toronto neighborhoods over a ten-year period (2014 to 2023). The nine crime types, as labeled in this project, are:
   - Assault
   - Auto Theft
   - Bike Theft
   - Break and Enter
   - Homicide
   - Robbery
   - Shooting
   - Theft from Motor Vehicle
   - Theft from Vehicle

## Methodology
1. **Data Loading and Preprocessing**:
   - Load the dataset from a CSV file.
   - Extract columns related to crime rates and identify unique crime types and years.
   - Reshape the data into a **long format** for easier analysis.

2. **Data Filtering and Mapping**:
   - Focus on the top five neighborhoods with the highest crime rates for each crime type in the most recent year, and plot the data for these five neighborhoods over the past decade.
   - Apply data mappings to standardize and clarify crime type labels, converting ambiguous or complex crime descriptions into more understandable categories.
   
3. **Visualization**:
   - Use **Plotly's** `graph_objects` to create interactive line plots, showing crime rate trends for each crime type in the selected neighborhoods.
   - Implement a **smooth curve** for better visualization of trends.
   - Customize the plot with appropriate labels, legends, and colors to enhance **readability and accessibility**.
   - Include **buttons** to allow the audience to easily select and view different crime types.
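
The reshaping step described in the methodology can be sketched on a toy table. This is a minimal illustration, assuming column names of the form `<CRIME>_RATE_<YEAR>` as in the dataset; the two neighborhoods, all values, and the names `wide` and `long_df` are made up for the example and are not the project's own variables.

```python
import pandas as pd

# Toy wide-format frame; the values are invented for illustration only.
wide = pd.DataFrame({
    "AREA_NAME": ["Alpha", "Beta"],
    "HOOD_ID": [1, 2],
    "ASSAULT_RATE_2022": [100.0, 50.0],
    "ASSAULT_RATE_2023": [120.0, 40.0],
    "ROBBERY_RATE_2022": [10.0, 5.0],
    "ROBBERY_RATE_2023": [8.0, 6.0],
})

# Keep only the rate columns, then melt them into one row per
# (neighborhood, original column) pair.
rate_columns = [col for col in wide.columns if "_RATE_" in col]
long_df = wide.melt(
    id_vars=["AREA_NAME", "HOOD_ID"],
    value_vars=rate_columns,
    var_name="Column",
    value_name="Rate",
)

# Split e.g. "ASSAULT_RATE_2023" into the crime type and the year.
parts = long_df["Column"].str.split("_RATE_", expand=True)
long_df["Crime"] = parts[0]
long_df["Year"] = parts[1].astype(int)
long_df = long_df.drop(columns="Column")

print(long_df.sort_values(["Crime", "AREA_NAME", "Year"]).to_string(index=False))
```

Each wide column becomes a block of rows, so two neighborhoods with four rate columns yield eight long-format rows that can be filtered by `Crime` and `Year`.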

## Tools and Libraries
- **plotly.graph_objects**: For creating interactive visualizations.
- **scipy.interpolate**: For generating smooth curves in the visualizations.
- **pandas**: For data manipulation and preprocessing.
- **numpy**: For numerical operations and data smoothing.

## Key Features
- **Interactive Plots**: Allows users to toggle between different crime types and view trends for specific neighborhoods.
- **Smooth Curves**: Enhances the visualization by providing a clear representation of trends over time.
- **Custom Color Scheme**: Uses **color-blind-friendly** palettes to ensure accessibility for all users.
- **Eye-Friendly Colors**: Incorporates colors that **reduce eye strain**, improving the experience for computer users.
- **Automation**: Automates all data-related processes through Python to streamline workflows and increase efficiency.

## Project Outcomes
By the end of this project, a comprehensive visual representation of crime trends in Toronto neighborhoods will be provided, offering valuable insights for policymakers, researchers, and the general public.

### 1. Top Crime Types in 2023

- In 2023, the following neighborhoods ranked among the top five for two or more crime types:


| Neighbourhood                | Crimes                                                            |
|:-----------------------------|:-------------------------------------------------------------------|
| Yonge-Bay Corridor           | Bike Theft, Theft from Vehicle, Assault, Robbery, Break and Enter  |
| Humber Summit                | Theft from Vehicle, Homicide, Theft from Motor Vehicle, Shooting, Auto Theft |
| Yorkdale-Glen Park           | Theft from Vehicle, Robbery, Theft from Motor Vehicle, Break and Enter, Auto Theft |
| Downtown Yonge East          | Bike Theft, Assault, Robbery, Theft from Motor Vehicle             |
| West Humber-Clairville       | Theft from Vehicle, Theft from Motor Vehicle, Auto Theft           |
| Etobicoke City Centre        | Theft from Vehicle, Theft from Motor Vehicle, Auto Theft           |
| Beechborough-Greenbrook      | Homicide, Shooting                                                 |
| Moss Park                    | Assault, Homicide                                                  |
| York University Heights      | Robbery, Break and Enter                                          |
| University                   | Bike Theft, Break and Enter                                        |
| Kensington-Chinatown         | Bike Theft, Assault                                                |


### 2. Top Neighborhoods by Average Crime Rate Over the Past Decade

- Eight of the neighborhoods listed above (Yonge-Bay Corridor, Downtown Yonge East, West Humber-Clairville, Moss Park, York University Heights, University, Kensington-Chinatown, and Yorkdale-Glen Park) also rank among the ten neighborhoods with the highest 10-year average composite crime rate, so their high rates are not limited to a single year. The composite rate is the severity-weighted sum of the nine crime rates, shown here without normalization.


| Top 10 Neighbourhood               | 10_year_avg  | 2014          | 2015          | 2016          | 2017          | 2018          | 2019          | 2020          | 2021          | 2022          | 2023          |
|-------------------------|--------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|
| Yonge-Bay Corridor      | 26739.042615 | 27936.046121  | 30271.456446  | 26812.407443  | 32093.447682  | 31061.328770  | 32884.570601  | 21800.772502  | 17457.553505  | 22587.064243  | 24485.778840  |
| Downtown Yonge East     | 23708.460123 | 21378.319050  | 22245.037901  | 20851.128619  | 25911.458267  | 26748.575362  | 29589.703231  | 22215.257259  | 24573.854629  | 20301.000543  | 23270.266368  |
| Moss Park               | 18876.019204 | 17500.143051  | 16989.706351  | 17816.575414  | 17231.147338  | 21495.605055  | 22641.145049  | 23245.343723  | 18753.509782  | 15930.263732  | 17156.752545  |
| Kensington-Chinatown    | 16976.678837 | 17901.667985  | 18605.412736  | 19262.272571  | 19835.995274  | 19809.355502  | 18819.525394  | 15536.892711  | 12258.925750  | 12347.414696  | 15389.325749  |
| University              | 15678.361229 | 19252.628629  | 18580.405821  | 17575.497451  | 20959.774558  | 18084.663494  | 16039.279822  | 10749.627121  | 9635.135324   | 13267.784594  | 12638.815474  |
| Wellington Place        | 14563.666659 | 19863.477048  | 18449.407026  | 18377.850928  | 17217.391106  | 15994.500339  | 13132.217156  | 10500.000000  | 10243.161371  | 11539.991868  | 10318.669747  |
| West Humber-Clairville  | 11773.586301 | 10752.969013  | 9976.601333   | 10382.883427  | 10677.385682  | 13246.204644  | 11580.369881  | 10535.894463  | 10887.899082  | 14349.937018  | 15345.718473  |
| Church-Wellesley        | 10537.896029 | 9462.219383   | 8805.031321   | 10555.079094  | 10913.508530  | 11541.336649  | 12069.048495  | 10149.015231  | 11426.989276  | 10355.590260  | 10101.142046  |
| Yorkdale-Glen Park      | 10332.252690 | 8106.799436   | 8273.281555   | 9178.309658   | 9191.003879   | 10378.891429  | 11383.604147  | 10610.653799  | 9526.710134   | 11996.021961  | 14677.250902  |
| York University Heights | 10248.629801 | 8171.125362   | 9179.401441   | 9728.498705   | 9650.945217   | 10088.858393  | 10360.589262  | 10819.483939  | 9979.329973   | 11125.451828  | 13382.613892  |

_The above results were derived from the code in the final extension section of this file._
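
To make the composite rate concrete, here is a minimal sketch of the weighted sum for one neighborhood in one year. The weights are the ones defined in `composite_crime_rates` in the extension section; the three rates and the helper name `composite_rate` are hypothetical and only illustrate the arithmetic.

```python
# Severity weights as defined in composite_crime_rates (extension section).
crime_weights = {
    "Assault": 4, "Auto Theft": 3, "Bike Theft": 2, "Break and Enter": 3,
    "Homicide": 5, "Robbery": 4, "Shooting": 5,
    "Theft from Motor Vehicle": 2, "Theft from Vehicle": 2,
}

# Hypothetical single-year rates for one neighborhood (not real data).
rates = {"Assault": 1000.0, "Auto Theft": 200.0, "Homicide": 2.0}


def composite_rate(rates, weights):
    """Sum of rate * weight over the crime types present in `rates`."""
    return sum(rate * weights[crime] for crime, rate in rates.items())


score = composite_rate(rates, crime_weights)
print(score)  # 1000*4 + 200*3 + 2*5 = 4610.0
```

Because the weights are a modeling choice, the ranking in the table depends on them; a different weighting could reorder the neighborhoods.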


### 3. Analysis of Exceptional Growth Rates by Crime Type

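- A year is flagged as exceptional when a crime type's total rate, summed across all neighborhoods, grew by more than 30% over the previous year. Shootings were flagged in 2015 and 2016, homicides in 2018 and again in 2021 during the pandemic, auto theft in 2018 and 2022, and theft from vehicle in 2022. The `growth_rate` column is a fraction, so 0.706547 means an increase of about 70.7%.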


| Year | Crime             | total_rate    | prev_rate     | growth_rate |
|------|-------------------|---------------|---------------|-------------|
| 2015 | Shooting          | 1691.807198   | 991.362940    | 0.706547    |
| 2016 | Shooting          | 2261.075214   | 1691.807198   | 0.336485    |
| 2018 | Auto Theft        | 25194.060106  | 19048.272379  | 0.322643    |
| 2018 | Homicide          | 573.563413    | 385.299930    | 0.488615    |
| 2021 | Homicide          | 470.003502    | 350.883750    | 0.339485    |
| 2022 | Auto Theft        | 50062.382374  | 34290.199391  | 0.459962    |
| 2022 | Theft from Vehicle| 7416.576961   | 5402.520970   | 0.372799    |

_The above results were derived from the code in the final extension section of this file._
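
As a worked check of the growth-rate column, the sketch below recomputes the 2015 Shooting entry from the two totals in the table above. The helper name `growth_rate` is illustrative; the formula is the same (current - previous) / previous used in `exceptional_growth_rate`.

```python
def growth_rate(current, previous):
    """Year-over-year growth as a fraction of the previous year's total."""
    return (current - previous) / previous


# Shooting totals copied from the table above (2014 -> 2015).
shooting_2014 = 991.362940
shooting_2015 = 1691.807198

rate = growth_rate(shooting_2015, shooting_2014)
print(round(rate, 6))  # 0.706547
print(rate > 0.30)     # True: above the 30% threshold used in the code
```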


### 4. Consistently High Crime Types Over the Past Decade

- Assault had the highest summed rate of any crime type in every year from 2014 to 2023, accounting for roughly 36% to 41% of the total across all nine crime types.


| Year | highest_rate_crime | highest_rate_total | overall_total_rate | percentage |
|------|---------------------|---------------------|---------------------|------------|
| 2014 | Assault             | 96443.483544        | 262831.458666        | 36.694041  |
| 2015 | Assault             | 103805.526954       | 261548.572843        | 39.688814  |
| 2016 | Assault             | 105886.836661       | 257488.348332        | 41.122962  |
| 2017 | Assault             | 106417.780532       | 265841.792909        | 40.030493  |
| 2018 | Assault             | 107460.323219       | 277148.141194        | 38.773604  |
| 2019 | Assault             | 111659.991059       | 290138.792459        | 38.485026  |
| 2020 | Assault             | 95609.981582        | 264277.270422        | 36.177906  |
| 2021 | Assault             | 100748.837433       | 245988.997499        | 40.956644  |
| 2022 | Assault             | 111078.814699       | 282051.145885        | 39.382508  |
| 2023 | Assault             | 127397.061081       | 315943.166996        | 40.322778  |

_The above results were derived from the code in the final extension section of this file._
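
The percentage column can be reproduced directly from the two totals in each row. The sketch below does this for three of the years, with values copied from the table above; `share_percent` is an illustrative helper name, not a function from the project.

```python
# (year, Assault total, total across all crime types), copied from the table above.
rows = [
    (2014, 96443.483544, 262831.458666),
    (2016, 105886.836661, 257488.348332),
    (2020, 95609.981582, 264277.270422),
]


def share_percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return part / whole * 100


shares = {year: share_percent(part, whole) for year, part, whole in rows}
for year, pct in shares.items():
    print(year, round(pct, 2))
```

Note that these totals are sums of per-neighborhood rates, so the percentage describes Assault's share of the summed rates rather than its share of incident counts.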


### 5. Visualization of Consistent High Crime Rates

- All the above analyses are reflected in the visualizations of this project. Through these visualizations, we can clearly see that crime rates in these neighborhoods have remained significantly elevated over the past decade. The data trends highlight a persistent issue, with these areas consistently experiencing higher-than-average crime rates, underscoring the need for targeted intervention and policy action.


![team_project2_visualization.png](attachment:team_project2_visualization.png)

_Please note: The png file is just a screenshot from the visualization. To enable interactive features, please execute the code in the subsequent cells of this file to generate the interactive plot._


(The End)



### 0. Install Python Modules

Check and install the required packages for the project:

In [27]:
import subprocess
import sys

# 0.1 The following function installs/tests the major required packages for this script

def install_and_import(package):
    try:
        import_name = package
        __import__(import_name)
    except ImportError:
        print(f"{package} not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"{package} installed successfully.")
    finally:
        globals()[package] = __import__(import_name)


import_packages = ['plotly', 'pandas', 'numpy', 'scipy', 'requests']

for package in import_packages:
    install_and_import(package)
print('Packages have been installed.')

Packages have been installed.


### 1. Import Libraries and Define Variables

In [28]:
import numpy as np
import pandas as pd
from scipy.interpolate import make_interp_spline
import plotly.graph_objects as go
import plotly.colors as pc
import requests
import plotly.express as px
import os

download_url = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/neighbourhood-crime-rates/resource/02898503-a367-4221-9e74-7addb260d110/download/neighbourhood-crime-rates%20-%204326.csv'
file_path ='neighbourhood_crime_rates.csv'
text_color = '#363636'
axis_color ='#A9A9A9'
backgroud_color ='#FFFFE0'
# Select color-blind-friendly color schemes
color_scale = px.colors.qualitative.Plotly
num_colors = len(color_scale)
font_family='Arial'
sans_serif='sans-serif'
# Select the top 5 neighborhoods with the highest crime rates for each crime type
num_neighbours = 5 
fig_path ='line.html'


### 2. Data Processing
 
- **2.1 Function to download the data file**

Use `requests` to download the data file to the local current working directory:


In [29]:
def download_file(download_url, file_path):
    # Send a GET request to the URL
    response = requests.get(download_url)

    # Check if the request was successful
    if response.status_code == 200:
        # Save the content of the response to a local file
        with open(file_path, 'wb') as file:
            file.write(response.content)
        print("Step 1. File downloaded successfully.")
    else:
        print("Failed to download file, status code:", response.status_code)


- **2.2 Data ETL (Extraction, Transformation, Loading)**

- **2.2.1 Create a mapping to transform the original crime types into more readable and user-friendly items**

In [30]:
def map_crime(crime):
    crime_mappings = {
        'ASSAULT': 'Assault',
        'AUTOTHEFT': 'Auto Theft',
        'BIKETHEFT': 'Bike Theft',
        'BREAKENTER': 'Break and Enter',
        'HOMICIDE': 'Homicide',
        'ROBBERY': 'Robbery',
        'SHOOTING': 'Shooting',
        'THEFTFROMMV': 'Theft from Motor Vehicle',
        'THEFTOVER': 'Theft from Vehicle'  # Label chosen by this project; in the source data THEFTOVER denotes Theft Over (theft over $5,000)
    }
    return crime_mappings.get(crime, 'Unknown')


- **2.2.2 Function for Data ETL**

In [31]:
def data_etl(file_path, num_neighbours):
    # 1) Load data into a dataframe
    report_data = pd.read_csv(file_path)

    # 2) Extract all columns containing 'RATE'
    rate_columns = [col for col in report_data.columns if 'RATE' in col]

    # 3) Extract crime types
    crime_types = set([col.split('_RATE')[0] for col in rate_columns])

    # 4) Dynamically extract year range
    years = sorted(set(int(col.split('_RATE_')[-1]) for col in rate_columns))
    
    # 5) Create long-format data
    crime_data = pd.DataFrame()
    df_long_total = pd.DataFrame() # Dataframe that contains top neighbours for each of crime type
    crime_2023 = {} # Dictionary that contains top neighbours for the most recent year -- 2023
    df_weighted = pd.DataFrame() # Dataframe that contains crime weights to calculate overall crime rate
    for crime in crime_types:
        value_vars = [f'{crime}_RATE_{year}' for year in years]
        existing_vars = [var for var in value_vars if var in report_data.columns]
        
        # Skip if no corresponding columns are found
        if not existing_vars:
            continue
        
        df_long = pd.melt(report_data, 
                        id_vars=['AREA_NAME', 'HOOD_ID'], 
                        value_vars=existing_vars,
                        var_name='Year',
                        value_name='Rate')
        
        # Extract year
        df_long['Year'] = df_long['Year'].str.extract(r'(\d+)', expand=False).astype(int)
        df_long['Crime'] = crime

        # Find specific crime types for the most recent year
        most_recent_year = df_long['Year'].max()

        # Filter data for the most recent year and specific crime type
        filtered_df = df_long[(df_long['Year'] == most_recent_year) & (df_long['Crime'] == crime)]

        # Find the top neighborhoods with the highest crime rates
        top_neighbours = filtered_df.groupby('HOOD_ID')['Rate'].mean().nlargest(num_neighbours).index
        crime_2023[crime] = top_neighbours
        # Select data for the top neighborhoods
        top_neighbour_data = df_long[df_long['HOOD_ID'].isin(top_neighbours)]
        
        # Merge individual dataframes for each crime type into a single comprehensive dataframe 
        # to facilitate unified analysis and visualization
        df_long_total = pd.concat([df_long_total, top_neighbour_data])
        df_weighted = pd.concat([df_weighted, df_long])
        
    # Update the values in the 'Crime' column using the mapping dictionary
    df_long_total['Crime'] = df_long_total['Crime'].map(map_crime)
    # print(crime_2023)
    df_weighted['Crime'] = df_weighted['Crime'].map(map_crime)
    return df_long_total, df_weighted
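
The top-neighborhood selection inside `data_etl` can be illustrated in isolation. This is a minimal sketch on made-up data for a single crime type; `num_top`, `latest`, and `top_history` are illustrative names, and the project itself uses `num_neighbours = 5`.

```python
import pandas as pd

# Toy long-format data for one crime type (values are invented).
long_df = pd.DataFrame({
    "HOOD_ID": [1, 1, 2, 2, 3, 3],
    "Year":    [2022, 2023, 2022, 2023, 2022, 2023],
    "Rate":    [90.0, 120.0, 300.0, 40.0, 60.0, 80.0],
})

num_top = 2  # hypothetical value for the example

# Rank neighborhoods by their rate in the most recent year only.
latest = long_df[long_df["Year"] == long_df["Year"].max()]
top_ids = latest.set_index("HOOD_ID")["Rate"].nlargest(num_top).index

# Keep every year of history for the selected neighborhoods.
top_history = long_df[long_df["HOOD_ID"].isin(top_ids)]
print(top_ids.tolist())  # [1, 3]
```

Neighborhood 2 has the highest rate in 2022 but is excluded, because the ranking looks only at the most recent year. The plotted decade of history is therefore conditioned on where a neighborhood stands at the end of the period.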


### 3. Data Visualization

- **3.1 Helper function to create a smooth curve for line charts.**

In [32]:
def create_smooth_curve(x, y):
    """
    Create a smooth curve using spline interpolation.

    Parameters:
    - x: The x-axis values (e.g., years).
    - y: The y-axis values (e.g., crime rates).

    Returns:
    - x_new: The new x-axis values for the smooth curve.
    - y_smooth: The smooth y-axis values.
    """

    
    # Generate new x values for a smooth curve
    x_new = np.linspace(x.min(), x.max(), 300)
    # Create smooth y values
    y_smooth = make_interp_spline(x, y)(x_new)
    # Ensure y_smooth values are non-negative
    y_smooth = np.maximum(y_smooth, 0)
    
    # Return the new x and smooth y values
    return x_new, y_smooth
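
`make_interp_spline` expects increasing x values and, for the default cubic spline, at least four data points. The variant below is a hedged sketch of how those cases could be guarded; `smooth_or_raw` and its parameters are hypothetical and not part of the project code.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def smooth_or_raw(x, y, num_points=300, k=3):
    """Return a spline-smoothed curve, or the raw points if too few exist."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # make_interp_spline expects x in increasing order.
    order = np.argsort(x)
    x, y = x[order], y[order]

    # A degree-k interpolating spline needs at least k + 1 data points.
    if len(x) < k + 1:
        return x, y

    x_new = np.linspace(x.min(), x.max(), num_points)
    y_smooth = make_interp_spline(x, y, k=k)(x_new)
    # Clip any overshoot below zero, since a rate cannot be negative.
    return x_new, np.maximum(y_smooth, 0)


# Unsorted toy input (made-up values) with zeros that invite overshoot.
years = [2020, 2021, 2022, 2023, 2019]
rates = [5.0, 0.0, 0.0, 4.0, 3.0]
x_new, y_smooth = smooth_or_raw(years, rates)
print(len(x_new), bool((y_smooth >= 0).all()))
```

The smoothed curve passes through every data point but the values between years are interpolated, so they should be read as a visual aid rather than as measured rates.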

- **3.2 Create interactive line chart using Plotly.**

In [33]:
def data_visualizing(df_long_total, backgroud_color, axis_color, font_family, sans_serif, text_color):

    # 1) Create plot
    fig = go.Figure()

    # 2) Refresh the list of unique crime types
    crime_types = df_long_total['Crime'].unique()
    # print(crime_types)
    # 3) Add lines for each crime type and neighborhood
    for crime in crime_types:
        df_crime = df_long_total[df_long_total['Crime'] == crime]
        # Under each crime type, different Hood will have different color
        color_map = {}
        for j, hood in enumerate(df_long_total['HOOD_ID'].unique()):
            # Ensure color index is within the color range
            color_index = j % num_colors  
            color_map[hood] = color_scale[color_index]

        # 4) Plot a line chart for each neighborhood by each crime type over a decade
        for hood in df_crime['HOOD_ID'].unique():
            df_hood = df_crime[df_crime['HOOD_ID'] == hood]
            area_name = df_hood['AREA_NAME'].iloc[0]
            color = color_map[hood]

            # 5) Create the smooth curve data
            x = df_hood['Year']
            y = df_hood['Rate']
            x_new, y_smooth = create_smooth_curve(x, y)

            # 6) Plot line charts
            fig.add_trace(go.Scatter(
                x=x_new, 
                y=y_smooth, 
                mode='lines', 
                name=f'{crime} - {area_name}', 
                line=dict(color=color), 
                visible=False,
                customdata=np.array([[area_name]] * len(x_new)),  # Attach the area name to every point for the hover text
                hovertemplate="<b>Area Name: %{customdata[0]}</b><br>" +
                            "Crime Rate: %{y:.2f}<br>" +
                            "<extra></extra>",  # Hide the default secondary hover box
                )            
            )

    # 7) Add buttons to control visibility of crime types
    buttons = []
    for crime in crime_types:
        visibility = [False] * len(fig.data)
        for i, trace in enumerate(fig.data):
            if trace.name.startswith(crime):
                visibility[i] = True
        buttons.append(dict(label=crime, method="update", args=[{"visible": visibility}]))

    # 8) Set initial visibility
    crime_list = list(crime_types)
    for i, trace in enumerate(fig.data):
        if trace.name.startswith(crime_list[0]):
            trace.visible = True

    # 9) Update buttons
    fig.update_layout(
        updatemenus=[dict(
            active=0,
            buttons=buttons,
            x=1,
            xanchor="right",
            y=1,
            yanchor="top"
        )],
        # Choose colors that reduce eye strain for the background.
        paper_bgcolor=backgroud_color,  
        # Transparent plot area background
        plot_bgcolor='rgba(0,0,0,0)',
        autosize = True, 
    )

    # 10) Update layout
    fig.update_layout(
        margin=dict(t=120, b=10),  # Adjust the top and bottom margins of the chart
        xaxis_title='Year',
        yaxis_title='Crime Rate',
        legend_title='Crime Type - Neighborhood Name',
        xaxis=dict(
            showgrid=False,  # Hide X-axis grid lines
            zeroline=False,  # Hide X-axis zero line
            showline=True,  # Show X-axis line
            linecolor=axis_color,  # X-axis line color set to gray
            tickcolor=axis_color,  # X-axis tick color set to gray
            showticklabels=True,  # Show X-axis tick labels
        ),
        yaxis=dict(
            showgrid=False,  # Hide Y-axis grid lines
            zeroline=False,  # Hide Y-axis zero line
            showline=True,  # Show Y-axis line
            linecolor=axis_color,  # Y-axis line color set to gray
            tickcolor=axis_color,  # Y-axis tick color set to gray
            showticklabels=True,  # Show Y-axis tick labels
        ),
        legend=dict(
            title_font=dict(
                family=font_family,  # Set legend title font family to Arial
                size=12,
                color=text_color  # Set legend title font color to gray
            ),
            font=dict(
                family=sans_serif,  # Set legend text font family to sans-serif
                size=10,
                color=text_color  # Set legend text font color to gray
            ),
        )
    )

    # 11) Update the title
    fig.update_layout(
        title={
            'text': "Crime Trends in Toronto: A Decadal Analysis of Leading Neighborhoods by Crime Type                              August, 2024<br>"
            "<br>"
            "<a href='https://open.toronto.ca/dataset/neighbourhood-crime-rates/'>Toronto Open Data</a><br>"
            "<a href='https://github.com/renrihui8415/visualization'>Github</a>"
            "<br>",
            'x': 0.05,  # Title horizontal alignment
            'xanchor': 'left',  # Horizontal anchor for the title
            'y': 0.95,  # Vertical position of the title (0.95 is near the top of the chart)
            'yanchor': 'top',  # Vertical anchor for the title
            'font': {'size': 14, 'color': text_color}  # Title font size and color
        },  
    )
    # Return figure
    return fig
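
The button logic above amounts to building one boolean mask per crime type over the list of traces. The sketch below isolates that step in plain Python, without Plotly; `visibility_masks` and the trace names are hypothetical. Matching on the crime label plus the `" - "` separator is an assumption added here so that one label cannot accidentally match another label that begins with the same text.

```python
def visibility_masks(trace_names, crime_types, sep=" - "):
    """Map each crime type to a list of booleans, one per trace."""
    return {
        crime: [name.startswith(crime + sep) for name in trace_names]
        for crime in crime_types
    }


# Hypothetical trace names in the "<crime> - <neighborhood>" format.
trace_names = [
    "Assault - Moss Park",
    "Assault - University",
    "Robbery - Moss Park",
]
masks = visibility_masks(trace_names, ["Assault", "Robbery"])

# Each mask is what an "update" button passes as the new "visible" list.
buttons = [
    dict(label=crime, method="update", args=[{"visible": mask}])
    for crime, mask in masks.items()
]
print(masks["Assault"])  # [True, True, False]
```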
    

### 4. Main() Function to streamline the workflow

In [34]:
def main():

    # 1. Download the data file
    download_file(download_url, file_path)

    # 2. Data Process (ETL)
    # Process data before visualization
    df_long_total,_ = data_etl(file_path, num_neighbours)
    print(f"Step 2. Data has been extracted and transformed. Total : {len(df_long_total)} rows.")
    print("Print the first 5 rows to inspect the data:")
    print(df_long_total.head(5))
    area_counts =df_long_total['AREA_NAME'].value_counts().reset_index()
    area_counts.columns = ['AREA_NAME', 'Count']
    df_with_counts = pd.merge(df_long_total, area_counts, on='AREA_NAME', how='left')
    unique_combinations = df_with_counts[['AREA_NAME', 'HOOD_ID']].drop_duplicates()
    # print(unique_combinations)
    
    # 3. Data Visualization
    fig = data_visualizing(df_long_total, backgroud_color, axis_color, font_family, sans_serif, text_color)
    fig.write_html(fig_path)
    fig.show()
    current_directory = os.getcwd()
    print(f'Step 3. Interactive Plot has been generated and saved in "{os.path.join(current_directory, fig_path)}".')
       
if __name__ == "__main__":
    main()

Step 1. File downloaded successfully.
Step 2. Data has been extracted and transformed. Total : 450 rows.
Print the first 5 rows to inspect the data:
                AREA_NAME  HOOD_ID  Year         Rate       Crime
4      Yonge-Bay Corridor      170  2014  2160.852783  Bike Theft
37    Downtown Yonge East      168  2014  1195.249268  Bike Theft
79             University       79  2014  1506.109741  Bike Theft
80   Kensington-Chinatown       78  2014   496.518982  Bike Theft
104        Dufferin Grove       83  2014   214.627701  Bike Theft


Step 3. Interactive Plot has been generated and saved in "/Users/julia/Desktop/Life/DSI/10. Team Project 2/2/line.html".


### 5. Extension

In [69]:
def composite_crime_rates(file_path, num_neighbours, normalize=True):
    # Define crime weights
    crime_weights = {
        'Assault': 4,
        'Auto Theft': 3,
        'Bike Theft': 2,
        'Break and Enter': 3,
        'Homicide': 5,
        'Robbery': 4,
        'Shooting': 5,
        'Theft from Motor Vehicle': 2,
        'Theft from Vehicle': 2
    }

    # Load the long-format data for all neighborhoods (second return value of data_etl)
    _, df = data_etl(file_path, num_neighbours)
    
    # Calculate weighted crime rates
    df['weight'] = df['Crime'].map(crime_weights)
    df['weighted_rate'] = df['Rate'] * df['weight']
    
    # Calculate annual weighted crime rate
    annual_crime_rate = df.groupby(['Year', 'AREA_NAME']).agg(
        total_weighted_rate=('weighted_rate', 'sum')
    ).reset_index()
    
    # Calculate total weighted crime rate for each year
    annual_total_rate = annual_crime_rate.groupby('Year')['total_weighted_rate'].sum().reset_index()
    annual_crime_rate = annual_crime_rate.merge(annual_total_rate, on='Year', suffixes=('', '_total'))
    
    if normalize:
        # Normalize the crime rate
        normalization_var = annual_crime_rate['total_weighted_rate_total']
        annual_crime_rate['normalized_rate'] = annual_crime_rate['total_weighted_rate'] / normalization_var
    else:
        annual_crime_rate['normalized_rate'] = annual_crime_rate['total_weighted_rate']
    
    # Pivot table to get each year as a column
    pivot_table = annual_crime_rate.pivot_table(
        index='AREA_NAME', 
        columns='Year', 
        values='normalized_rate', 
        fill_value=0
    ).reset_index()
  
    # Calculate the average crime rate over the last 10 years
    pivot_table['10_year_avg'] = pivot_table.iloc[:, 1:].mean(axis=1)
    
    # Sort by 10-year average crime rate in descending order
    sorted_df = pivot_table.sort_values(by='10_year_avg', ascending=False).reset_index(drop=True)

    # Rearrange columns: move '10_year_avg' to be after 'AREA_NAME'
    columns_order = ['AREA_NAME', '10_year_avg'] + [col for col in sorted_df.columns if col not in ['AREA_NAME', '10_year_avg']]
    sorted_df = sorted_df[columns_order]
    
    sorted_df = sorted_df.reset_index(drop=True)
    return sorted_df

# Call the function with normalization
result_df_normalized = composite_crime_rates(file_path, num_neighbours, normalize=True)

# Call the function without normalization
result_df_non_normalized = composite_crime_rates(file_path, num_neighbours, normalize=False)

# Print the top 10 neighborhoods ranked by composite crime rates for both cases
# result_df_normalized.head(10)

result_df_non_normalized.head(10)


|   | AREA_NAME | 10_year_avg | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
|---|-----------|-------------|------|------|------|------|------|------|------|------|------|------|
| 0 | Yonge-Bay Corridor | 26739.042615 | 27936.046121 | 30271.456446 | 26812.407443 | 32093.447682 | 31061.32877 | 32884.570601 | 21800.772502 | 17457.553505 | 22587.064243 | 24485.77884 |
| 1 | Downtown Yonge East | 23708.460123 | 21378.31905 | 22245.037901 | 20851.128619 | 25911.458267 | 26748.575362 | 29589.703231 | 22215.257259 | 24573.854629 | 20301.000543 | 23270.266368 |
| 2 | Moss Park | 18876.019204 | 17500.143051 | 16989.706351 | 17816.575414 | 17231.147338 | 21495.605055 | 22641.145049 | 23245.343723 | 18753.509782 | 15930.263732 | 17156.752545 |
| 3 | Kensington-Chinatown | 16976.678837 | 17901.667985 | 18605.412736 | 19262.272571 | 19835.995274 | 19809.355502 | 18819.525394 | 15536.892711 | 12258.92575 | 12347.414696 | 15389.325749 |
| 4 | University | 15678.361229 | 19252.628629 | 18580.405821 | 17575.497451 | 20959.774558 | 18084.663494 | 16039.279822 | 10749.627121 | 9635.135324 | 13267.784594 | 12638.815474 |
| 5 | Wellington Place | 14563.666659 | 19863.477048 | 18449.407026 | 18377.850928 | 17217.391106 | 15994.500339 | 13132.217156 | 10500.0 | 10243.161371 | 11539.991868 | 10318.669747 |
| 6 | West Humber-Clairville | 11773.586301 | 10752.969013 | 9976.601333 | 10382.883427 | 10677.385682 | 13246.204644 | 11580.369881 | 10535.894463 | 10887.899082 | 14349.937018 | 15345.718473 |
| 7 | Church-Wellesley | 10537.896029 | 9462.219383 | 8805.031321 | 10555.079094 | 10913.50853 | 11541.336649 | 12069.048495 | 10149.015231 | 11426.989276 | 10355.59026 | 10101.142046 |
| 8 | Yorkdale-Glen Park | 10332.25269 | 8106.799436 | 8273.281555 | 9178.309658 | 9191.003879 | 10378.891429 | 11383.604147 | 10610.653799 | 9526.710134 | 11996.021961 | 14677.250902 |
| 9 | York University Heights | 10248.629801 | 8171.125362 | 9179.401441 | 9728.498705 | 9650.945217 | 10088.858393 | 10360.589262 | 10819.483939 | 9979.329973 | 11125.451828 | 13382.613892 |


In [67]:
def highest_rate_crime_percentage(file_path, num_neighbours):
    # Load the long-format data for all neighborhoods (second return value of data_etl)
    _, df = data_etl(file_path, num_neighbours)
    
    # Calculate the total crime rate for each type per year
    yearly_crime_rate = df.groupby(['Year', 'Crime']).agg(
        total_rate=('Rate', 'sum')
    ).reset_index()
    
    # Find the crime type with the highest rate per year
    highest_rate_crime = yearly_crime_rate.loc[yearly_crime_rate.groupby('Year')['total_rate'].idxmax()].reset_index(drop=True)
    
    # Calculate the total rate for all crimes each year
    total_yearly_crime_rate = df.groupby('Year').agg(total_rate=('Rate', 'sum')).reset_index()
    
    # Merge the highest rate crime with the total yearly crime rate
    result = highest_rate_crime.merge(total_yearly_crime_rate, on='Year', suffixes=('_highest', '_total'))
    
    # Calculate the percentage
    result['percentage'] = (result['total_rate_highest'] / result['total_rate_total']) * 100
    
    # Select and rename columns
    result = result.rename(columns={
        'Crime': 'highest_rate_crime',
        'total_rate_highest': 'highest_rate_total',
        'total_rate_total': 'overall_total_rate'
    })
    result = result[['Year', 'highest_rate_crime', 'highest_rate_total', 'overall_total_rate', 'percentage']]
    
    return result

# Call the function
result_df = highest_rate_crime_percentage(file_path, num_neighbours)
result_df


|   | Year | highest_rate_crime | highest_rate_total | overall_total_rate | percentage |
|---|------|--------------------|--------------------|--------------------|------------|
| 0 | 2014 | Assault | 96443.483544 | 262831.458666 | 36.694041 |
| 1 | 2015 | Assault | 103805.526954 | 261548.572843 | 39.688814 |
| 2 | 2016 | Assault | 105886.836661 | 257488.348332 | 41.122962 |
| 3 | 2017 | Assault | 106417.780532 | 265841.792909 | 40.030493 |
| 4 | 2018 | Assault | 107460.323219 | 277148.141194 | 38.773604 |
| 5 | 2019 | Assault | 111659.991059 | 290138.792459 | 38.485026 |
| 6 | 2020 | Assault | 95609.981582 | 264277.270422 | 36.177906 |
| 7 | 2021 | Assault | 100748.837433 | 245988.997499 | 40.956644 |
| 8 | 2022 | Assault | 111078.814699 | 282051.145885 | 39.382508 |
| 9 | 2023 | Assault | 127397.061081 | 315943.166996 | 40.322778 |


In [62]:
def exceptional_growth_rate(file_path, num_neighbours):
    # Load the long-format data for all neighborhoods (second return value of data_etl)
    _, df = data_etl(file_path, num_neighbours)
    
    # Calculate the total crime rate for each type per year
    yearly_crime_rate = df.groupby(['Year', 'Crime']).agg(
        total_rate=('Rate', 'sum')
    ).reset_index()
    
    # Calculate the year-on-year growth rate for each crime type
    yearly_crime_rate['prev_rate'] = yearly_crime_rate.groupby('Crime')['total_rate'].shift(1)
    yearly_crime_rate['growth_rate'] = (yearly_crime_rate['total_rate'] - yearly_crime_rate['prev_rate']) / yearly_crime_rate['prev_rate']
    
    # Identify exceptionally high growth rates (greater than 30%)
    threshold = 0.30  # 30%
    exceptional_growth = yearly_crime_rate[yearly_crime_rate['growth_rate'] > threshold]
    
    return exceptional_growth

# Call the function
exceptional_growth_df = exceptional_growth_rate(file_path, num_neighbours)
exceptional_growth_df


|    | Year | Crime | total_rate | prev_rate | growth_rate |
|----|------|-------|------------|-----------|-------------|
| 15 | 2015 | Shooting | 1691.807198 | 991.36294 | 0.706547 |
| 24 | 2016 | Shooting | 2261.075214 | 1691.807198 | 0.336485 |
| 37 | 2018 | Auto Theft | 25194.060106 | 19048.272379 | 0.322643 |
| 40 | 2018 | Homicide | 573.563413 | 385.29993 | 0.488615 |
| 67 | 2021 | Homicide | 470.003502 | 350.88375 | 0.339485 |
| 73 | 2022 | Auto Theft | 50062.382374 | 34290.199391 | 0.459962 |
| 80 | 2022 | Theft from Vehicle | 7416.576961 | 5402.52097 | 0.372799 |
