# Group Assignment: Bicycle Thefts in Toronto, Ontario

### Group 12: 
- Cao, Huy, h69cao, tripico@gmail.com
- Jain, Dherya, d63jain, jaindherya16@gmail.com
- Prajapati, Bhavin, b4prajap, bhavin.prajapati@gmail.com
- Richards, Colin, c23richa, colinrichards@protonmail.com
- Sarmast Sakhvidi, Sepideh, ssarmast, ssarmast@uwaterloo.ca
- Thakur, Chinmay, c2thakur, chinmaythakur27@gmail.com



# Dataset
<a href="https://open.toronto.ca/dataset/bicycle-thefts/" data-toc-modified-id="Bicycle-Thefts-City-of-Toronto-Open-Data-Portal">Bicycle Thefts - City of Toronto Open Data Portal</a>
<br>
URL: https://open.toronto.ca/dataset/bicycle-thefts/

# Key Questions
### When are bicycle offences most likely to occur?
- Analyze yearly, monthly, and hourly trends to identify peak times.
- Determine high-risk days of the week and seasonal patterns.

### Where are bicycle thefts most frequent?
- Identify hotspots based on police divisions and location types.
- Assess geographic concentration using longitude and latitude data.

### What types of bicycles are targeted?
- Investigate if certain makes, models, or higher-cost bicycles are more likely to be stolen.
- Analyze the impact of bicycle characteristics (speed, type, colour) on theft likelihood.



## Import Libraries
First, we import the libraries needed for the analysis.

In [None]:
# Install libraries if needed by uncommenting
# !pip install numpy
# !pip install pandas
# !pip install matplotlib
# !pip install descartes
# !conda install -c conda-forge geopandas
# !pip install shapely
# !pip install mplcursors

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import descartes
import geopandas as gpd
import shapely
from shapely.geometry import Point, Polygon
from mplcursors import cursor

%matplotlib inline

## Read the Data
Next, we read the dataset, which was obtained from the City of Toronto Open Data Portal.

In [None]:
# Read the dataset
bicycle_thefts = pd.read_csv('bicycle-thefts - 4326.csv', sep=',') 
print(bicycle_thefts)
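
Printing the full DataFrame shows only a truncated view, so a quick structural summary can be more informative at this stage. The following is a minimal sketch, not part of the original analysis: it uses a hypothetical three-row sample with invented values (only the column names come from this notebook) to show how the shape, column types, and missing-value counts could be checked. The same calls should apply to `bicycle_thefts`.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the real CSV.
# Column names appear in this notebook; the values are invented.
sample_csv = io.StringIO(
    "OCC_YEAR,OCC_MONTH,LONG_WGS84,LAT_WGS84\n"
    "2021,July,-79.38,43.65\n"
    "2021,August,-79.40,\n"
    "2022,July,-79.39,43.66\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)         # (number of rows, number of columns)
print(sample.dtypes)        # inferred type of each column
print(sample.isna().sum())  # count of missing values per column
print(sample.head())        # first few rows only
```

The missing-value counts are worth checking before mapping, since rows without coordinates cannot be plotted.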

## Clean the Data
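Now we clean the data using techniques from our course material. In this step we drop the columns the analysis treats as redundant: the separate year, month, day-of-week, day, and day-of-year fields for the occurrence and report dates, and the `geometry` column.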

In [None]:
# Remove redundant columns in a single call
redundant_columns = [
    'OCC_YEAR', 'OCC_MONTH', 'OCC_DOW', 'OCC_DAY', 'OCC_DOY',
    'REPORT_YEAR', 'REPORT_MONTH', 'REPORT_DOW', 'REPORT_DAY', 'REPORT_DOY',
    'geometry',
]
bicycle_thefts_cleaned = bicycle_thefts.drop(columns=redundant_columns)

bicycle_thefts_cleaned
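
The date-part columns are only redundant if they can be rebuilt from a full timestamp when the temporal questions are analyzed. The sketch below illustrates that idea under an assumption: it supposes the dataset keeps a timestamp column, hypothetically named `OCC_DATE`, and uses invented values. It is an illustration of the technique, not a description of the actual file.

```python
import pandas as pd

# Hypothetical sample: OCC_DATE is an assumed column name, values are invented.
sample = pd.DataFrame({"OCC_DATE": ["2021-07-04 14:30:00", "2022-01-15 08:05:00"]})
sample["OCC_DATE"] = pd.to_datetime(sample["OCC_DATE"])

# Rebuild the dropped date parts from the timestamp via the .dt accessor.
derived = pd.DataFrame({
    "OCC_YEAR": sample["OCC_DATE"].dt.year,
    "OCC_MONTH": sample["OCC_DATE"].dt.month_name(),
    "OCC_DOW": sample["OCC_DATE"].dt.day_name(),
    "OCC_DAY": sample["OCC_DATE"].dt.day,
    "OCC_DOY": sample["OCC_DATE"].dt.dayofyear,
})
print(derived)
```

If the real file has no such timestamp column, the date-part columns would need to be kept for the yearly, monthly, and day-of-week analysis.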

## Where are bicycle thefts most frequent?
Now we analyze where the occurrences took place, using the map data that accompanies our dataset.

In [None]:
# Read the neighbourhood boundaries, stored as WGS84 (EPSG:4326)
street_map = gpd.read_file("Neighbourhoods - 4326.shp")
# neighbourhoods = street_map.plot(column='AREA_NA7', figsize=(15,15), edgecolor='black', legend=False)

# Build one point per theft from the WGS84 longitude/latitude columns
geometry = gpd.points_from_xy(bicycle_thefts_cleaned['LONG_WGS84'],
                              bicycle_thefts_cleaned['LAT_WGS84'])

# The coordinates are longitude/latitude, so the CRS must be EPSG:4326,
# the same CRS as the neighbourhood layer
geo_df = gpd.GeoDataFrame(bicycle_thefts_cleaned, # specify our data
                          crs='EPSG:4326',        # specify our coordinate reference system
                          geometry=geometry)      # specify the points we created

# Draw the neighbourhood outlines, then overlay the theft locations
fig, ax = plt.subplots(figsize=(15,15))
street_map.plot(ax=ax, alpha=0.4, color='white', edgecolor='black')

geo_df.plot(ax=ax,
            markersize=2,
            color='red',
            marker='o',
            label='Bike Thefts')

ax.set_title('Reported Bicycle Thefts in Toronto')
plt.legend(prop={'size':10})
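
A dot map shows where thefts occur, but dense areas overplot, so it helps to quantify concentration as well. The sketch below is one possible approach under stated assumptions, not the method used in this notebook: it snaps each point to a 0.01-degree grid cell and counts thefts per cell. The coordinates are invented, the column names are the ones used above, and the cell size is an arbitrary choice for illustration.

```python
import pandas as pd

# Hypothetical coordinates (invented values) using the notebook's column names.
sample = pd.DataFrame({
    "LONG_WGS84": [-79.381, -79.384, -79.383, -79.412, -79.290],
    "LAT_WGS84": [43.651, 43.654, 43.652, 43.662, 43.701],
})

# Snap each point to a 0.01-degree grid cell
# (on the order of 1 km at Toronto's latitude).
cells = sample.round({"LONG_WGS84": 2, "LAT_WGS84": 2})

# Count thefts per cell and list the busiest cells first.
hotspots = (
    cells.groupby(["LAT_WGS84", "LONG_WGS84"])
         .size()
         .sort_values(ascending=False)
         .rename("thefts")
         .reset_index()
)
print(hotspots.head())
```

Applied to `bicycle_thefts_cleaned`, the top rows of `hotspots` would give candidate hotspot cells to compare against the map.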