<a href="https://colab.research.google.com/github/KANAL1234/city-watch/blob/main/City_Watch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Project Introduction

Our project, City Watch, investigates patterns of criminal activity in Chicago using the Crimes - 2001 to Present dataset provided by the City of Chicago’s open data portal. The dataset contains over 8.4 million records of reported crimes spanning from 2001 to the present, including details such as the type of crime, location, date and time, arrest status, and FBI crime code.

The primary goal of our analysis is to uncover spatial and temporal crime trends, identify hotspots (areas and time periods with high concentrations of criminal activity), and predict potential future hotspots. By leveraging data analytics and machine learning, we aim to create insights that could help city planners, law enforcement, and the general public understand crime dynamics more effectively.

Key problems we are investigating include:
	1.	Crime Hotspot Detection: Which neighborhoods and time periods experience the highest frequency of crimes?
	2.	Temporal Patterns: How do crimes vary by season, time of day, and year?
	3.	Predictive Modeling: Can we forecast where and when future crimes are most likely to occur based on historical data?

Any Changes Since the Proposal

Since our initial project proposal, the scope of the project has evolved slightly based on feasibility and time constraints.

Removed or Modified Parts
	•	We initially planned to integrate socioeconomic data (e.g., income levels, education rates) to correlate with crime rates, but due to time and data integration challenges, this part was removed.
	•	The spatiotemporal neural network model was replaced with a simpler Random Forest classifier due to computational limitations.

Added Parts
	•	We added Kernel Density Estimation (KDE) visualization to complement DBSCAN clustering for hotspot detection.
	•	A CityWatch Dashboard prototype was added as an interactive visualization tool to display the analyzed results (heatmaps, time-series, etc.).
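
To illustrate what a KDE surface for hotspot detection looks like, here is a minimal sketch that evaluates a two-dimensional Gaussian kernel density on a regular grid using NumPy only. It is not the project's implementation: the function name `gaussian_kde_grid`, the bandwidth, the grid size, and the synthetic points are all assumptions made for illustration, and treating latitude/longitude degrees as planar coordinates is an approximation that is only reasonable over a city-sized area.

```python
import numpy as np

def gaussian_kde_grid(lat, lon, bandwidth=0.01, grid_size=50):
    """Evaluate a 2-D Gaussian KDE on a regular grid covering the points.

    bandwidth is in degrees (hypothetical value); degrees are treated as planar.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat_axis = np.linspace(lat.min(), lat.max(), grid_size)
    lon_axis = np.linspace(lon.min(), lon.max(), grid_size)
    grid_lat, grid_lon = np.meshgrid(lat_axis, lon_axis, indexing="ij")
    # Squared distance from every grid cell to every point: shape (grid, grid, n)
    d2 = (grid_lat[..., None] - lat) ** 2 + (grid_lon[..., None] - lon) ** 2
    # Sum the Gaussian kernels and normalise so the surface is a density
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=-1)
    density /= 2 * np.pi * bandwidth ** 2 * lat.size
    return lat_axis, lon_axis, density

# Synthetic points (not Chicago data): one dense cluster plus sparse background
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[41.88, -87.63], scale=0.005, size=(200, 2))
background = rng.uniform(low=[41.70, -87.90], high=[42.00, -87.50], size=(50, 2))
points = np.vstack([cluster, background])

lat_axis, lon_axis, density = gaussian_kde_grid(points[:, 0], points[:, 1])

# The grid cell with the highest density is the estimated hotspot centre
peak_i, peak_j = np.unravel_index(np.argmax(density), density.shape)
print("Estimated hotspot:", lat_axis[peak_i], lon_axis[peak_j])
```

A smaller bandwidth produces sharper, more localized peaks, while a larger one smooths neighboring clusters together, so the value would need tuning against the real data.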


Data Preparation

The dataset was sourced from:
🔗 Chicago Crimes - 2001 to Present

Data preparation included several preprocessing and cleaning steps:
	1.	Handling Missing Data:
		•	Records missing critical fields such as latitude, longitude, or primary type were removed.
		•	Null values in non-critical fields (e.g., description) were filled with “Unknown.”
	2.	Data Filtering:
		•	We excluded the most recent months, for which reporting is still incomplete, to keep the temporal analysis consistent.
	3.	Feature Engineering:
		•	Extracted hour, day of week, month, and year from the date column.
		•	Converted location descriptions into categorical variables.
		•	Created binary features for “Arrest Made” and “Domestic Crime.”
	4.	Standardization:
		•	All coordinates were converted into a consistent spatial reference system (latitude/longitude).
		•	Crime types were standardized based on FBI code groups.
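
The missing-data handling and feature-engineering steps above can be sketched in pandas as follows. This is an illustrative sketch rather than the exact code used in the project: the toy rows are invented, the column names (`Date`, `Primary Type`, `Description`, `Location Description`, `Arrest`, `Domestic`, `Latitude`, `Longitude`) are assumed to match the dataset, and the helper `prepare` and the derived column names are hypothetical.

```python
import pandas as pd

# Toy records (invented); column names are assumed to follow the Chicago dataset
raw = pd.DataFrame({
    "Date": ["07/04/2023 11:30:00 PM", "01/15/2023 08:05:00 AM",
             "03/10/2023 02:00:00 PM", "03/11/2023 01:00:00 AM"],
    "Primary Type": ["THEFT", "BATTERY", None, "THEFT"],
    "Description": ["RETAIL THEFT", None, "SIMPLE", "OVER $500"],
    "Location Description": ["STREET", "APARTMENT", "STREET", "STORE"],
    "Arrest": [True, False, False, True],
    "Domestic": [False, True, False, False],
    "Latitude": [41.88, 41.75, 41.90, None],
    "Longitude": [-87.63, -87.60, -87.65, None],
})

def prepare(df):
    """Hypothetical helper: clean the records and derive time-based features."""
    df = df.copy()
    # Parse timestamps; values that do not match the format become NaT
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p",
                                errors="coerce")
    # Remove records missing any critical field
    df = df.dropna(subset=["Date", "Latitude", "Longitude", "Primary Type"]).copy()
    # Fill nulls in a non-critical field
    df["Description"] = df["Description"].fillna("Unknown")
    # Temporal features extracted from the date column
    df["hour"] = df["Date"].dt.hour
    df["day_of_week"] = df["Date"].dt.dayofweek  # Monday = 0
    df["month"] = df["Date"].dt.month
    df["year"] = df["Date"].dt.year
    # Categorical and binary features
    df["Location Description"] = df["Location Description"].astype("category")
    df["arrest_made"] = df["Arrest"].astype(int)
    df["domestic_crime"] = df["Domestic"].astype(int)
    return df

clean = prepare(raw)
print(clean[["Primary Type", "Description", "hour", "month", "arrest_made"]])
```

The third and fourth toy rows are dropped because they lack a primary type and coordinates respectively, leaving two cleaned records.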

Exploratory Data Analysis (EDA)

Our EDA focused on identifying initial patterns and anomalies in the dataset through visual exploration and descriptive statistics.

Key Insights
	•	Crime Distribution: Theft, Battery, and Criminal Damage were the most common crime types.
	•	Temporal Trends:
		•	Crime frequency peaked during summer months (June–August) and dropped in winter.
		•	Most crimes occurred between noon and midnight, aligning with high public activity hours.
	•	Spatial Patterns:
		•	Downtown Chicago and the South Side exhibited higher concentrations of crimes.
		•	Residential areas experienced more domestic-related crimes, while business districts had more thefts.
	•	Arrest Trends:
		•	Arrest rates varied significantly by crime type, with narcotics-related offenses showing the highest arrest ratio.
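
Summaries of this kind can be computed with a few pandas aggregations. The sketch below shows the general shape of those computations; the rows are invented toy data that do not reproduce the project's findings, and the column names `Date`, `Primary Type`, and `Arrest` are assumed to match the dataset.

```python
import pandas as pd

# Invented toy events, used only to demonstrate the aggregations
events = pd.DataFrame({
    "Date": pd.to_datetime([
        "2023-07-01 14:00", "2023-07-15 22:00", "2023-07-20 14:00",
        "2023-01-05 09:00", "2023-08-02 18:00", "2023-08-09 20:00",
    ]),
    "Primary Type": ["THEFT", "THEFT", "BATTERY",
                     "NARCOTICS", "NARCOTICS", "THEFT"],
    "Arrest": [False, False, False, True, True, True],
})

# Crime distribution: number of records per crime type, most common first
type_counts = events["Primary Type"].value_counts()

# Temporal trends: records per calendar month and per hour of day
monthly_counts = events.groupby(events["Date"].dt.month).size()
hourly_counts = events.groupby(events["Date"].dt.hour).size()

# Arrest trends: share of records with an arrest, per crime type
arrest_rate = (
    events.groupby("Primary Type")["Arrest"].mean().sort_values(ascending=False)
)

print(type_counts, monthly_counts, hourly_counts, arrest_rate, sep="\n\n")
```

Because `Arrest` is boolean, its mean within each group is the fraction of records that led to an arrest.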

**Crime Hotspot Heatmap for October 2025**

Using the filtered data for October 2025, we generate a heatmap to visualize the spatial distribution of crimes during this period and to identify potential hotspots.
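
If the loaded CSV covers more than one month, the rows can be restricted to October 2025 once the date column has been parsed. The following is a minimal sketch under stated assumptions: the `Date`, `Latitude`, and `Longitude` column names and the date format are assumed to match the dataset, while the sample rows and the helper `filter_month` are hypothetical.

```python
import pandas as pd

# Invented sample rows straddling the boundaries of October 2025
sample = pd.DataFrame({
    "Date": ["09/30/2025 11:59:00 PM", "10/01/2025 12:00:00 AM",
             "10/31/2025 11:59:00 PM", "11/01/2025 12:00:00 AM"],
    "Latitude": [41.85, 41.88, 41.90, 41.79],
    "Longitude": [-87.65, -87.63, -87.62, -87.60],
})
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p",
                                errors="coerce")

def filter_month(df, year, month, date_col="Date"):
    """Hypothetical helper: keep rows whose timestamp falls in the given month."""
    mask = (df[date_col].dt.year == year) & (df[date_col].dt.month == month)
    return df.loc[mask]

october_2025 = filter_month(sample, 2025, 10)

# A list of [latitude, longitude] pairs, the input shape a heatmap layer expects
heat_points = october_2025[["Latitude", "Longitude"]].dropna().values.tolist()
print(heat_points)
```

Rows whose date failed to parse become NaT and are excluded by the mask, so it is worth checking how many rows survive the filter before plotting.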

In [None]:
import pandas as pd

# Raw URL of the CSV file in the GitHub repository
csv_file_url = 'https://media.githubusercontent.com/media/KANAL1234/city-watch/refs/heads/main/dataset.csv?token=AKBFS4QDVXIJRPOSJK6MM7TJBPWP6'

# Column names expected in the CSV; adjust them if the actual data differs
date_column_name = 'Date'
latitude_column_name = 'Latitude'
longitude_column_name = 'Longitude'

# Start from empty DataFrames so the heatmap cell can always check .empty
crime_df = pd.DataFrame()
crime_location_df = pd.DataFrame()

try:
    # Load the data from the CSV file URL into a pandas DataFrame
    crime_df = pd.read_csv(csv_file_url)

    if date_column_name in crime_df.columns:
        # Parse dates with an explicit format to avoid the pandas UserWarning;
        # values that do not match the format become NaT
        crime_df[date_column_name] = pd.to_datetime(
            crime_df[date_column_name],
            format='%m/%d/%Y %I:%M:%S %p',
            errors='coerce',
        )

        if latitude_column_name in crime_df.columns and longitude_column_name in crime_df.columns:
            # Keep only the coordinates, dropping rows with missing latitude or longitude
            crime_location_df = crime_df.dropna(
                subset=[latitude_column_name, longitude_column_name]
            )[[latitude_column_name, longitude_column_name]]
        else:
            print(f"Latitude or longitude column not found. Please check the column names: {latitude_column_name}, {longitude_column_name}")
    else:
        print(f"Date column '{date_column_name}' not found in the data.")

except Exception as e:
    # Fall back to empty DataFrames if loading or processing fails
    print(f"Error loading or processing CSV data from URL: {e}")
    crime_df = pd.DataFrame()
    crime_location_df = pd.DataFrame()

# crime_location_df is used by the heatmap generation cell

In [35]:
import folium
from folium.plugins import HeatMap
from IPython.display import display
import os

# Check if crime_location_df is not empty
if not crime_location_df.empty:
    # Create a base map centered around Chicago
    chicago_map = folium.Map(location=[41.8781, -87.6298], zoom_start=11)

    # Add the heatmap layer using the crime locations
    HeatMap(crime_location_df.values.tolist()).add_to(chicago_map)

    # Display the map directly
    display(chicago_map)

else:
    print("No crime data available to generate a heatmap.")