
# 📝 FBI Time Series Forecasting – Project Explanation

## Problem We’re Solving

The **Federal Bureau of Investigation (FBI)** and law enforcement agencies face a critical challenge:
👉 *How can we predict when and where crimes are likely to occur, so resources can be allocated effectively to prevent them?*

Crime is not random. It follows **patterns over time (temporal)** and **across locations (spatial)**.
If we can **forecast crime incidents** based on historical data, law enforcement can:

* Optimize **patrol schedules**
* Deploy officers to **high-risk neighborhoods**
* Install **preventive measures** (street lighting, CCTV) in the right places
* Provide insights to **policymakers & urban planners**

This project is about building a **predictive model** that estimates **monthly crime incidents** using past data.

---

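## Data We Have

The project uses two CSV files: `FBI_Train.csv`, which holds incident-level historical records, and `FBI_Test.csv`, which lists the year, month, and crime type combinations to forecast. Each file is described in detail where it is loaded.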

## 🧩 How They Fit Together

* The **training data** gives us a detailed view of past crimes: *when, where, and what type*.
* The **test data** gives us future records: *when and what type*, but **not how many crimes** — and that’s the forecasting challenge.

So the **goal of the project** is:
➡️ *Use the training data to learn patterns, then predict the missing “Incident\_Counts” in the test data.*
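
Because the training file has one row per incident while the test file has one row per (YEAR, MONTH, TYPE), the training records have to be aggregated before any model can learn from them. The following is a minimal sketch of that step, assuming the column names shown in the data previews; the rows are toy values made up for illustration and `monthly_counts` is a hypothetical helper name.

```python
import pandas as pd

# Toy incident-level records using the training file's column names.
# These rows are made up for illustration; they are not real data.
incidents = pd.DataFrame({
    "TYPE": ["Other Theft", "Other Theft", "Other Theft", "Theft of Bicycle"],
    "YEAR": [1999, 1999, 1999, 1999],
    "MONTH": [4, 4, 5, 5],
})


def monthly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count incidents per (YEAR, MONTH, TYPE), the granularity of the test file."""
    return (
        df.groupby(["YEAR", "MONTH", "TYPE"])
          .size()
          .reset_index(name="Incident_Counts")
    )


counts = monthly_counts(incidents)
print(counts)
```

Note that a month with no incidents of a given type produces no row at all in this output, so zero counts may need to be filled in explicitly before modeling.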

---

## Why This Is Important

Predicting crimes is not just for police efficiency — it impacts **society as a whole**:

* **Public safety**: Supports preventive action that can help reduce crime.
* **Urban planning**: Helps city authorities place streetlights, CCTV, and community policing in high-risk areas.
* **Policy-making**: Governments can allocate budgets more effectively.
* **Community awareness**: Residents can be warned about crime trends in their areas.

---

## Approach / Methodology

The project uses a **data-driven machine learning approach**:

1. **Data Cleaning & Preprocessing** → Handle missing values, encode categories, and aggregate incident records into monthly counts per crime type.
2. **Exploratory Data Analysis (EDA)** → Visualize trends (which months have more crime? which crime types are rising?).
3. **Feature Engineering** → Extract useful features from date/time (e.g., weekday, holiday, season).
4. **Modeling**:

   * **Time Series Models** → ARIMA, SARIMA for temporal patterns.
   * **Machine Learning Models** → XGBoost, Random Forest for structured prediction.
5. **Evaluation** → Measure performance using metrics like RMSE, MAE, R².
6. **Insights** → Identify hotspots, crime patterns, and make recommendations.
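
The evaluation metrics named in step 5 can be computed directly from their definitions. This is a minimal sketch using NumPy; `regression_metrics` is a hypothetical helper and the numbers are made up, so it illustrates the formulas rather than any result from this project.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for two equal-length sequences of counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    rmse = float(np.sqrt(np.mean(errors ** 2)))   # penalises large errors more
    mae = float(np.mean(np.abs(errors)))          # average absolute error
    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                    # undefined if y_true is constant
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Made-up actual and predicted monthly counts.
metrics = regression_metrics([10, 20, 30], [12, 18, 33])
print(metrics)
```

For a forecasting task, these metrics are usually computed on a hold-out period made of the most recent months rather than on a random split, so that the model is never evaluated on data older than what it was trained on.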

---



In [1]:
## importing libraries needed
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

In [4]:
## load dataset
df_test = pd.read_csv('FBI_Test.csv')    ## test data
df_train = pd.read_csv('FBI_Train.csv')  ## train data

In [6]:
df_train.head()

| Unnamed: 0 | TYPE | HUNDRED_BLOCK | NEIGHBOURHOOD | X | Y | Latitude | Longitude | HOUR | MINUTE | YEAR | MONTH | DAY | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 16.0 | 15.0 | 1999 | 5 | 12 | 12/05/1999 |
| 1 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 15.0 | 20.0 | 1999 | 5 | 7 | 07/05/1999 |
| 2 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 16.0 | 40.0 | 1999 | 4 | 23 | 23/04/1999 |
| 3 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 11.0 | 15.0 | 1999 | 4 | 20 | 20/04/1999 |
| 4 | Other Theft | 9XX TERMINAL AVE | Strathcona | 493906.5 | 5457452.47 | 49.269802 | -123.083763 | 17.0 | 45.0 | 1999 | 4 | 12 | 12/04/1999 |


## 🔍 Training Data (`FBI_Train.csv`)

This dataset contains **historical crime records** with detailed attributes:

* **TYPE** → Category of crime (e.g., theft, assault, vehicle collision).
* **HUNDRED\_BLOCK** → Approximate street address of the incident.
* **NEIGHBOURHOOD** → Area where the crime happened.
* **X, Y** → Spatial coordinates (city grid system).
* **Latitude, Longitude** → Geographic coordinates.
* **HOUR, MINUTE** → Time of the incident.
* **YEAR, MONTH, DAY** → Date breakdown.
* **Date** → Full date, stored as text in DD/MM/YYYY order (e.g., `12/05/1999` is 12 May 1999).

👉 This data is rich: it has **time, location, and type of crime**, making it ideal for both **time series forecasting** and **spatial crime analysis**.
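
The `Date` values in the preview (for example `12/05/1999` next to `MONTH = 5` and `DAY = 12`) appear to be day-first text, so parsing them with an explicit format avoids silently swapping day and month. A small sketch, using three of the previewed rows and assuming the whole column follows the same pattern:

```python
import pandas as pd

# Three rows copied from the df_train.head() preview.
sample = pd.DataFrame({
    "YEAR": [1999, 1999, 1999],
    "MONTH": [5, 5, 4],
    "DAY": [12, 7, 23],
    "Date": ["12/05/1999", "07/05/1999", "23/04/1999"],
})

# Parse with an explicit day-first format instead of letting pandas guess.
sample["Date"] = pd.to_datetime(sample["Date"], format="%d/%m/%Y")

# Cross-check the parsed dates against the separate YEAR/MONTH/DAY columns.
consistent = (
    (sample["Date"].dt.year == sample["YEAR"])
    & (sample["Date"].dt.month == sample["MONTH"])
    & (sample["Date"].dt.day == sample["DAY"])
)
print(consistent.all())
```

If the cross-check fails on the full file, the assumption about the format does not hold for every row and the mismatching rows should be inspected.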

---

In [5]:
df_test.head()

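| Unnamed: 0 | YEAR | MONTH | TYPE | Incident_Counts |
|---|---|---|---|---|
| 0 | 2013 | 6 | Vehicle Collision or Pedestrian Struck (with I... | NaN |
| 1 | 2013 | 6 | Theft of Vehicle | NaN |
| 2 | 2013 | 6 | Theft of Bicycle | NaN |
| 3 | 2013 | 6 | Theft from Vehicle | NaN |
| 4 | 2013 | 6 | Other Theft | NaN |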




## 🔍 Test Data (`FBI_Test.csv`)

This dataset is for making predictions and contains:

* **YEAR** → Year of the record.
* **MONTH** → Month of the record.
* **TYPE** → Category of crime (same as training set).
* **Incident\_Counts** → The **target variable**. It is empty in the test file and is what we need to predict.

👉 The test data tells us *when and what type of crime* to predict, but not the actual count — that’s what our model must forecast.
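
Before fitting any model, it can help to have a simple baseline to compare against. The sketch below fills `Incident_Counts` with the historical mean for the same crime type and calendar month. It assumes the training records have already been aggregated to monthly counts; all numbers are made up, and this seasonal-mean baseline is an illustration, not the method used in this project.

```python
import pandas as pd

# Hypothetical monthly training counts (made-up numbers).
train_counts = pd.DataFrame({
    "YEAR": [2011, 2012, 2011, 2012],
    "MONTH": [6, 6, 6, 6],
    "TYPE": ["Other Theft", "Other Theft", "Theft of Bicycle", "Theft of Bicycle"],
    "Incident_Counts": [100, 120, 40, 60],
})

# Test rows in the same shape as FBI_Test.csv, without the target.
test = pd.DataFrame({
    "YEAR": [2013, 2013],
    "MONTH": [6, 6],
    "TYPE": ["Theft of Bicycle", "Other Theft"],
})

# Baseline: mean historical count for the same TYPE and calendar MONTH.
seasonal_mean = (
    train_counts
    .groupby(["TYPE", "MONTH"], as_index=False)["Incident_Counts"]
    .mean()
)

# A left merge keeps every test row, in its original order.
forecast = test.merge(seasonal_mean, on=["TYPE", "MONTH"], how="left")
print(forecast)
```

Any test row whose (TYPE, MONTH) pair never occurs in the training counts would come back as NaN and would need a fallback value.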


---

# 🔍 Exploratory Data Analysis (EDA)

EDA is about **understanding the data before modeling**. Think of it as detective work: you carefully examine the evidence (data) before solving the crime (forecasting problem 😉).

We’ll break EDA into **5 key stages**:

---

## 1. Data Quality Check

Before anything else, we check:

* **Missing values** → Are there gaps in the data (e.g., crimes without a date or type)?
* **Duplicates** → Are the same crime events logged multiple times?
* **Consistency** → Are dates valid? Are crime types spelled the same way (e.g., "Theft of Vehicle" vs "Vehicle Theft")?

👉 Why important?
A dirty dataset will lead to misleading predictions. Cleaning ensures our model learns **true patterns, not noise**.
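
The three checks above map onto a few pandas calls. This is a rough sketch on a toy frame whose problems (a missing neighbourhood, a duplicated row, an impossible month, an inconsistently spelled type) are deliberately made up; the column names follow the training file.

```python
import pandas as pd

# Toy records with deliberately planted quality issues (not real data).
df = pd.DataFrame({
    "TYPE": ["Other Theft", "Other Theft", "other theft ", "Theft of Vehicle"],
    "NEIGHBOURHOOD": ["Strathcona", "Strathcona", None, "Strathcona"],
    "YEAR": [1999, 1999, 1999, 1999],
    "MONTH": [5, 5, 4, 13],
    "DAY": [12, 12, 23, 1],
})

# Missing values: count of nulls per column.
missing_per_column = df.isna().sum()

# Duplicates: rows identical to an earlier row.
duplicate_rows = int(df.duplicated().sum())

# Consistency of dates: impossible dates become NaT with errors="coerce".
date_parts = df[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower)
parsed = pd.to_datetime(date_parts, errors="coerce")
invalid_dates = int(parsed.isna().sum())

# Consistency of labels: compare distinct values before and after normalising.
normalised_types = df["TYPE"].str.strip().str.lower()

print(missing_per_column["NEIGHBOURHOOD"], duplicate_rows, invalid_dates)
print(df["TYPE"].nunique(), normalised_types.nunique())
```

Whether a duplicated row is a logging error or two genuine incidents at the same place and time cannot be decided from the data alone, so duplicates are better reviewed than dropped automatically.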

---

## 2. Time-based Patterns

Since this is a **time series problem**, we want to see:

* **Trends** → Is crime increasing or decreasing over the years?
* **Seasonality** → Do crimes peak in certain months (e.g., summer thefts, winter assaults)?
* **Weekly/Daily cycles** → Do certain crimes happen more on weekends or nights?

👉 Why important?
If crimes have a seasonal cycle (say, more burglaries in December), our model can capture this and make **smarter forecasts**.
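
Trend and seasonality can both be read off simple group-by summaries of monthly counts. The sketch below uses made-up counts for a single crime type and assumes the data has already been aggregated by month.

```python
import pandas as pd

# Made-up monthly counts for one crime type over two years.
monthly = pd.DataFrame({
    "YEAR": [2010] * 4 + [2011] * 4,
    "MONTH": [1, 2, 7, 8] * 2,
    "Incident_Counts": [80, 85, 120, 130, 90, 95, 140, 150],
})

# Trend: total incidents per year.
yearly_totals = monthly.groupby("YEAR")["Incident_Counts"].sum()

# Seasonality: average count per calendar month across years.
monthly_profile = monthly.groupby("MONTH")["Incident_Counts"].mean()
peak_month = int(monthly_profile.idxmax())

print(yearly_totals)
print(monthly_profile)
print(peak_month)
```

Plotting `yearly_totals` and `monthly_profile` as line or bar charts gives the usual visual version of the same check.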

---

## 3. Crime Type Analysis

We explore:

* **Most common crime types** → e.g., theft may dominate the dataset.
* **Rare crime types** → some categories might have very few incidents.
* **Yearly trends by type** → e.g., car theft might be rising, while assaults might be stable.

👉 Why important?
Some crime categories may need **separate models**, because their patterns differ (e.g., vehicle theft ≠ domestic violence trends).
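
A frequency table and a year-by-type cross-tabulation cover the three questions above. This is a small sketch on made-up rows; the threshold used to call a type "rare" is an arbitrary assumption chosen for the toy data.

```python
import pandas as pd

# Made-up incident records with the training file's TYPE and YEAR columns.
df = pd.DataFrame({
    "TYPE": [
        "Other Theft", "Other Theft", "Other Theft",
        "Theft of Bicycle", "Theft of Vehicle", "Theft of Vehicle",
    ],
    "YEAR": [1999, 1999, 2000, 2000, 1999, 2000],
})

# Most common types: absolute counts, sorted in descending order.
type_counts = df["TYPE"].value_counts()

# Rare types: fewer than RARE_THRESHOLD incidents (threshold is an assumption).
RARE_THRESHOLD = 2
rare_types = type_counts[type_counts < RARE_THRESHOLD].index.tolist()

# Yearly trend by type: rows are years, columns are crime types.
by_year = pd.crosstab(df["YEAR"], df["TYPE"])

print(type_counts)
print(rare_types)
print(by_year)
```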

---

## 4. Geographical Patterns

Since we have **neighborhood + latitude/longitude**:

* **Hotspots** → which neighborhoods consistently report high crime?
* **Crime clusters** → are certain crimes localized (like theft in downtown)?
* **Spatial consistency** → do crime hotspots remain stable across years, or shift?

👉 Why important?
Spatial analysis shows **where crimes are more likely**. The test data has no location columns, so these features cannot be fed to the model at prediction time, but knowing the hotspots helps interpret the forecasts and target recommendations.
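
Hotspots and their stability over time can be examined with a neighbourhood-by-year count table. The sketch below is illustrative only: apart from "Strathcona", which appears in the data preview, the neighbourhood names and all counts are placeholders.

```python
import pandas as pd

# Made-up records; "Area B" and "Area C" are placeholder neighbourhood names.
df = pd.DataFrame({
    "NEIGHBOURHOOD": [
        "Strathcona", "Strathcona", "Area B",
        "Strathcona", "Strathcona", "Area B", "Area C",
    ],
    "YEAR": [1999, 1999, 1999, 2000, 2000, 2000, 2000],
})

# Hotspots: neighbourhoods ranked by total incidents.
hotspot_counts = df["NEIGHBOURHOOD"].value_counts()

# Stability: incidents per neighbourhood for each year.
per_year = pd.crosstab(df["YEAR"], df["NEIGHBOURHOOD"])

# The busiest neighbourhood in each year; if it changes, hotspots are shifting.
top_by_year = per_year.idxmax(axis=1)

print(hotspot_counts)
print(per_year)
print(top_by_year)
```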

---

## 5. Correlations and Feature Relationships

We study:

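* **Association between features** → e.g., does the mix of crime types change with time of day? (For a categorical feature such as TYPE, a cross-tabulation is more suitable than a correlation coefficient.)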
* **Interactions** → Are thefts more common on weekends at night? Are collisions more frequent during weekdays at rush hour?

👉 Why important?
Understanding relationships between variables helps us **engineer better features** (like “Weekend flag” or “Rush hour flag”), which improve predictions.
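
The two flags mentioned above can be derived from the `Date` and `HOUR` columns. This sketch uses the five rows from the training preview; the rush-hour windows (07:00 to 09:59 and 16:00 to 18:59) are an assumption made for illustration, as are the flag column names.

```python
import pandas as pd

# Date and HOUR values copied from the df_train.head() preview.
df = pd.DataFrame({
    "Date": ["12/05/1999", "07/05/1999", "23/04/1999", "20/04/1999", "12/04/1999"],
    "HOUR": [16.0, 15.0, 16.0, 11.0, 17.0],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Weekend flag: pandas numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
df["is_weekend"] = df["Date"].dt.dayofweek >= 5

# Rush-hour flag: the hour windows below are an assumption, not from the source.
df["is_rush_hour"] = df["HOUR"].between(7, 9) | df["HOUR"].between(16, 18)

print(df)
```

Rows with a missing `HOUR` evaluate to `False` for the rush-hour flag, which may or may not be the desired treatment. Since the target is monthly, flags like these would typically be summarised per month (for example, as the share of weekend incidents) before being used as model inputs.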



## Outcome

The end goal is to deliver:

* A **forecasting model** that predicts crime incidents.
* **Visual dashboards/plots** showing crime trends.
* **Actionable recommendations** for resource allocation and prevention.

---

✨ So, in simple terms:
This project is about **teaching a computer to learn crime patterns from the past, so it can predict what might happen in the future**, and help make cities safer.

---
