# **Project Name**    - FBI Crime Time Series Forecasting


##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

**🔍 Project Objective:**  
This project aims to analyze and forecast crime trends using historical FBI crime data. Through **Exploratory Data Analysis (EDA)** and **time series forecasting techniques**, we will identify trends, seasonality, and anomalies in the data, enabling accurate crime predictions for strategic decision-making.

---

### **📂 Project Workflow:**
1️⃣ **Data Understanding & Preprocessing**  
   - Load and inspect dataset structure.  
   - Handle missing values, duplicates, and data inconsistencies.  
   - Convert timestamps into appropriate formats.  

2️⃣ **Exploratory Data Analysis (EDA)**  
   - Perform summary statistics.  
   - Visualize crime trends over time.  
   - Check for seasonality, outliers, and stationarity.  

3️⃣ **Time Series Decomposition & Feature Engineering**  
   - Break down time series into trend, seasonality, and residuals.  
   - Create new time-based features and lag-based variables.  
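
As a rough illustration of the time-based and lag-based features mentioned in step 3, the sketch below uses a small synthetic monthly series. The values and the column names (`crime_count`, `lag_1`, `lag_3`, `rolling_mean_3`) are assumptions made for this example and are not taken from the dataset.

```python
import pandas as pd

# Synthetic monthly counts; the numbers are made up for illustration only
counts = pd.Series(
    [120, 135, 128, 150, 160, 155, 170, 165],
    index=pd.date_range("2020-01-01", periods=8, freq="MS"),
    name="crime_count",
)

features = counts.to_frame()

# Time-based features derived from the datetime index
features["month"] = features.index.month
features["quarter"] = features.index.quarter

# Lag-based features: the value observed 1 and 3 months earlier
features["lag_1"] = features["crime_count"].shift(1)
features["lag_3"] = features["crime_count"].shift(3)

# Mean of the previous 3 months; shift(1) keeps the current month out of its own feature
features["rolling_mean_3"] = features["crime_count"].shift(1).rolling(3).mean()

# Rows without a full history contain NaN and are dropped before modelling
model_ready = features.dropna()
print(model_ready)
```

Shifting before taking the rolling mean is a deliberate choice here: it prevents the feature for a given month from including that month's own count.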


# **GitHub Link -**

https://github.com/Runal21/FBI-Crime-Time-Series-Forecasting

# **Problem Statement**


**🔍 Problem Statement:**  
Crime forecasting is crucial for law enforcement agencies to allocate resources efficiently, prevent criminal activities, and improve public safety. The FBI records historical crime data, but without proper analysis and forecasting, decision-making remains reactive rather than proactive.  

This project aims to analyze historical FBI crime data using **Exploratory Data Analysis (EDA) and Time Series Forecasting** to uncover crime trends, seasonal patterns, and future projections. The insights generated will help authorities make data-driven decisions for crime prevention and law enforcement planning.

#### **Define Your Business Objective?**

**🎯 Business Objective:**  

✅ **Understand & Analyze Crime Trends** – Detect patterns in crime data over time (e.g., increasing/decreasing trends).  
✅ **Identify Seasonality & Anomalies** – Recognize seasonal crime patterns (e.g., high crime rates during certain months).  
✅ **Develop an Accurate Forecasting Model** – Predict future crime trends using statistical and machine learning models.  
✅ **Optimize Resource Allocation** – Help law enforcement distribute resources efficiently based on forecasted crime spikes.  
✅ **Enhance Public Safety & Policy Planning** – Provide actionable insights to policymakers for crime prevention strategies.  


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
#Import Necessary Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset

file_path = "Train.xlsx"
df = pd.read_excel(file_path)

### Dataset First View

In [None]:
# Dataset First Look

df_head = df.head()
df_head

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
dataset_shape = df.shape

dataset_shape

### Dataset Information

In [None]:
# Dataset Info

# df.info() prints its summary and returns None, so call it directly
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

duplicate_count = df.duplicated().sum()
duplicate_count

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values = df.isnull().sum()
missing_values


In [None]:
# Visualizing the missing values

# Set figure size
plt.figure(figsize=(10, 6))

# Create a heatmap to visualize missing values
sns.heatmap(df.isnull(), cmap="viridis", cbar=False, yticklabels=False)

# Add title
plt.title("Missing Values Heatmap", fontsize=14)

# Show plot
plt.show()


### What did you know about your dataset?

**📊 Dataset Insights**  

✅ **Dataset Information:**  
- 13 columns with different data types:  
  - **Categorical:** `TYPE`, `HUNDRED_BLOCK`, `NEIGHBOURHOOD`  
  - **Numerical:** `X`, `Y`, `Latitude`, `Longitude`, `HOUR`, `MINUTE`, `YEAR`, `MONTH`, `DAY`  
  - **Date-Time:** `Date`  

✅ **Duplicate Values:**  
- **0 duplicate rows** (No duplicates found).  

✅ **Missing Values:**  
- **NEIGHBOURHOOD:** 10 missing values  
- **HOUR & MINUTE:** 10 missing values each  


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

dataset_columns = df.columns.tolist()
dataset_columns


In [None]:
# Dataset Describe

dataset_description = df.describe()
dataset_description


### Variables Description

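The descriptions below are inferred from the column names and data types; the dataset does not ship with a data dictionary in this notebook.

- `TYPE`: crime category.
- `HUNDRED_BLOCK`: street block where the incident was recorded.
- `NEIGHBOURHOOD`: neighbourhood of the incident.
- `X`, `Y`: map coordinates of the incident (coordinate system not stated).
- `Latitude`, `Longitude`: geographic coordinates of the incident.
- `HOUR`, `MINUTE`: time of day of the incident.
- `YEAR`, `MONTH`, `DAY`: calendar date components.
- `Date`: full date of the incident.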

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = {col: df[col].nunique() for col in df.columns}
unique_values

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Step 1: Handle Missing Values
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which is deprecated in recent pandas and may not modify df.
df['NEIGHBOURHOOD'] = df['NEIGHBOURHOOD'].fillna('Unknown')
df['HOUR'] = df['HOUR'].fillna(df['HOUR'].median())
df['MINUTE'] = df['MINUTE'].fillna(df['MINUTE'].median())

# Step 2: Convert Data Types
df['HOUR'] = df['HOUR'].astype(int)
df['MINUTE'] = df['MINUTE'].astype(int)

# Step 3: Feature Engineering
df['Date'] = pd.to_datetime(df['Date'])  # ensure datetime before using .dt
df['WEEKDAY'] = df['Date'].dt.day_name()

# Step 4: Identify Outliers (Longitude & Latitude should not be 0)
outlier_longitude = df[df['Longitude'] == 0]
outlier_latitude = df[df['Latitude'] == 0]

# Remove invalid location entries (rows where either coordinate is 0)
df = df[(df['Longitude'] != 0) & (df['Latitude'] != 0)].copy()

# Display cleaned dataset shape and outliers removed
df.shape, outlier_longitude.shape, outlier_latitude.shape


### What all manipulations have you done and insights you found?

**📊 Data Wrangling Summary**  

✅ **Missing Values Handled:**  
- `NEIGHBOURHOOD` → Replaced missing values with `"Unknown"`.  
- `HOUR` & `MINUTE` → Filled missing values with the median.  

✅ **Data Type Conversions:**  
- `HOUR` & `MINUTE` → Converted to integers for analysis.  

✅ **Feature Engineering:**  
- Added `WEEKDAY` column to analyze crimes by day of the week.  

✅ **Outliers Identified & Removed:**  
- **10 rows had invalid coordinates (`Longitude = 0`, `Latitude = 0`)** → Removed.  
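
The same wrangling steps can be tried on a tiny made-up frame to see exactly what each one does. The sketch below is only an illustration: the `toy` frame, its values, and the `valid` mask are hypothetical and do not come from `Train.xlsx`.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame with one missing neighbourhood, one missing hour,
# and one row whose coordinates are both 0 (treated as invalid)
toy = pd.DataFrame({
    "NEIGHBOURHOOD": ["A", None, "B", "C"],
    "HOUR": [10.0, np.nan, 22.0, 14.0],
    "Latitude": [40.1, 40.2, 0.0, 40.3],
    "Longitude": [-75.1, -75.2, 0.0, -75.3],
})

# Fill missing values: a label for categories, the median for hours
toy["NEIGHBOURHOOD"] = toy["NEIGHBOURHOOD"].fillna("Unknown")
toy["HOUR"] = toy["HOUR"].fillna(toy["HOUR"].median()).astype(int)

# Keep only rows where both coordinates are non-zero
valid = (toy["Latitude"] != 0) & (toy["Longitude"] != 0)
clean = toy[valid].copy()

print(clean)
```

Filtering on both coordinates in a single mask avoids keeping a row that has a valid longitude but a zero latitude.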


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1: Crime Type Distribution (Bar Chart)

import seaborn as sns
import matplotlib.pyplot as plt

# Set figure size
plt.figure(figsize=(10, 6))

# Count occurrences of each crime type
crime_counts = df['TYPE'].value_counts()

# Create a bar plot
sns.barplot(x=crime_counts.index, y=crime_counts.values, palette="viridis")

# Add labels and title
plt.xlabel("Crime Type", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Type Distribution", fontsize=14)
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **bar chart** is ideal for visualizing categorical data like crime types.  
- It effectively highlights which crime type occurs most frequently.  

##### 2. What is/are the insight(s) found from the chart?

  
- Some crime types are significantly more common than others.  
- The distribution may indicate high-risk crime types requiring more attention.  


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Identifies the most frequent crimes, helping law enforcement prioritize.  
- **Negative Impact:** If high-crime categories increase, it may indicate rising crime trends, requiring policy intervention.

#### Chart - 2

In [None]:
 # Chart - 2: Top 10 Crime Locations (Bar Chart)

# Count occurrences of crimes by location
top_locations = df['HUNDRED_BLOCK'].value_counts().head(10)

# Set figure size
plt.figure(figsize=(12, 6))

# Create a bar plot
sns.barplot(x=top_locations.values, y=top_locations.index, palette="magma")

# Add labels and title
plt.xlabel("Number of Crimes", fontsize=12)
plt.ylabel("Location (Hundred Block)", fontsize=12)
plt.title("Top 10 Crime Locations", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **horizontal bar chart** suits ranking categories with long labels, such as hundred-block addresses.  
- It shows which locations record the most crimes.  

##### 2. What is/are the insight(s) found from the chart?


- Some locations record far more crimes than others.  
- Law enforcement can prioritize patrols at these high-occurrence locations.  


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps allocate crime-prevention resources to the locations with the most incidents.  
- **Negative Impact:** If counts at these locations keep rising, it signals persistent hotspots that current measures are not addressing.

#### Chart - 3

In [None]:
# Chart - 3: Crime Trends Over Time (Line Chart)

# Convert 'Date' to datetime format (if not already converted)
df['Date'] = pd.to_datetime(df['Date'])

# Group data by Date to get daily crime counts
crime_trend = df.groupby('Date').size()

# Set figure size
plt.figure(figsize=(14, 6))

# Create a line plot
plt.plot(crime_trend.index, crime_trend.values, color='blue', linewidth=2)

# Add labels and title
plt.xlabel("Date", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Trends Over Time", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **line chart** is ideal for time-series data, showing trends over time.  
- It helps identify **crime peaks and drops** for different time periods.

##### 2. What is/are the insight(s) found from the chart?


- Crime rates fluctuate over time, with noticeable spikes on certain days.  
- Identifying these spikes can help understand external factors (e.g., holidays, events).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps authorities allocate resources during high-crime periods.  
- **Negative Impact:** If crime rates are increasing, it signals worsening security.  


#### Chart - 4

In [None]:
# Chart - 4: Crime by Weekday (Heatmap)

# Group data by Weekday to get crime counts
crime_weekday = df['WEEKDAY'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

# Set figure size
plt.figure(figsize=(10, 5))

# Create a heatmap
sns.heatmap(crime_weekday.values.reshape(1, -1), annot=True, fmt="d", cmap="coolwarm",
            xticklabels=crime_weekday.index, yticklabels=["Crime Count"])

# Add title
plt.title("Crime Distribution by Weekday", fontsize=14)

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?



- A **heatmap** visually highlights patterns in crime occurrences across weekdays.  
- It helps identify which days have the highest and lowest crime rates.

##### 2. What is/are the insight(s) found from the chart?


- Crime occurrences vary by day, with certain weekdays showing significantly higher activity.  
- Weekends or specific days may have **higher crime rates** due to social events, nightlife, or other factors.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps law enforcement **increase patrols** on high-crime days.  
- **Negative Impact:** If crime rates are high on specific days, businesses and residents may feel unsafe, affecting economic growth and community stability.

#### Chart - 5

In [None]:
# Chart - 5: Crime by Hour of the Day (Bar Chart)

# Group data by Hour to get crime counts
crime_by_hour = df['HOUR'].value_counts().sort_index()

# Set figure size
plt.figure(figsize=(12, 6))

# Create a bar plot
sns.barplot(x=crime_by_hour.index, y=crime_by_hour.values, palette="coolwarm")

# Add labels and title
plt.xlabel("Hour of the Day", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Distribution by Hour of the Day", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **bar chart** is best for analyzing crime patterns at different hours of the day.  
- It helps identify peak crime hours when law enforcement should be most active.

##### 2. What is/are the insight(s) found from the chart?


- Certain hours (e.g., late night or early morning) may show higher crime rates.  
- Crime patterns may align with social activities, nightlife, or commuting hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps police and businesses adjust security measures during high-risk hours.  
- **Negative Impact:** If crimes peak during business hours, it may deter customers and affect economic activity.

#### Chart - 6

In [None]:
# Chart - 6: Crime Heatmap by Location (Displayed in Notebook)

import folium
from folium.plugins import HeatMap
from IPython.display import display

# Create a base map centered at the mean coordinates
crime_map = folium.Map(location=[df['Latitude'].mean(), df['Longitude'].mean()], zoom_start=12)

# Add a heatmap layer
heat_data = list(zip(df['Latitude'], df['Longitude']))
HeatMap(heat_data, radius=10).add_to(crime_map)

# Display the map inside the notebook
display(crime_map)

##### 1. Why did you pick the specific chart?


- A **heatmap** helps visualize crime density across different locations.  
- It highlights **crime hotspots** where incidents are most frequent.

##### 2. What is/are the insight(s) found from the chart?


- Some areas show **high concentrations of crime**, suggesting specific locations need more attention.  
- Crime patterns might correlate with urban structures like bars, parks, or transportation hubs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps law enforcement deploy resources effectively.  
- **Negative Impact:** High-crime areas can negatively impact real estate values and business investments.

#### Chart - 7

In [None]:
# Chart - 7: Crime Distribution by Neighborhood (Bar Chart)

# Count occurrences of crimes per neighborhood
crime_by_neighborhood = df['NEIGHBOURHOOD'].value_counts()

# Set figure size
plt.figure(figsize=(12, 6))

# Create a bar plot
sns.barplot(x=crime_by_neighborhood.index, y=crime_by_neighborhood.values, palette="coolwarm")

# Add labels and title
plt.xlabel("Neighborhood", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Distribution by Neighborhood", fontsize=14)
plt.xticks(rotation=90)  # Rotate labels for better readability

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?



- A **bar chart** clearly shows which neighborhoods experience the highest crime rates.  
- It helps identify crime-prone areas requiring more attention.

##### 2. What is/are the insight(s) found from the chart?


- Some neighborhoods have significantly higher crime rates than others.  
- This could indicate areas with **poor security, high population density, or economic struggles**.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps policymakers focus on high-crime areas for security improvement.  
- **Negative Impact:** Crime-heavy neighborhoods may see **lower property values and business investment**.

#### Chart - 8

In [None]:
# Chart - 8: Crime Hotspots Map (Scatter Plot)

import matplotlib.pyplot as plt

# Set figure size
plt.figure(figsize=(10, 6))

# Create scatter plot using Latitude & Longitude
plt.scatter(df['Longitude'], df['Latitude'], alpha=0.5, c='red', s=10)

# Add labels and title
plt.xlabel("Longitude", fontsize=12)
plt.ylabel("Latitude", fontsize=12)
plt.title("Crime Hotspots Map", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **scatter plot** is ideal for visualizing crime concentration across locations.  
- It highlights **crime clusters**, showing high-risk areas.

##### 2. What is/are the insight(s) found from the chart?


- Certain areas show **dense crime activity**, indicating crime hotspots.  
- Crime might be concentrated near **business centers, transport hubs, or nightlife areas**.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps law enforcement **monitor high-risk areas effectively**.  
- **Negative Impact:** Crime-heavy areas may **deter new businesses and reduce property values**.

#### Chart - 9

In [None]:
# Chart - 9: Monthly Crime Trends (Line Chart)

# Group data by Year and Month to get crime counts
crime_by_month = df.groupby(['YEAR', 'MONTH']).size().reset_index(name='Crime Count')

# Convert Year & Month into a proper datetime format
crime_by_month['Date'] = pd.to_datetime(crime_by_month[['YEAR', 'MONTH']].assign(DAY=1))

# Set figure size
plt.figure(figsize=(12, 6))

# Create a line plot
plt.plot(crime_by_month['Date'], crime_by_month['Crime Count'], marker='o', linestyle='-', color='blue')

# Add labels and title
plt.xlabel("Date (Year-Month)", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Monthly Crime Trends", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **line chart** is perfect for tracking **crime fluctuations over months and years**.  
- It helps identify **seasonal trends** and long-term patterns.

##### 2. What is/are the insight(s) found from the chart?


- Crime rates **rise and fall across months**, showing seasonal effects.  
- Certain months might have **higher crime rates due to holidays, weather, or events**.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps law enforcement **plan resources for high-crime months**.  
- **Negative Impact:** If crime increases seasonally, it could affect **tourism, business activity, or public safety perception**.

#### Chart - 10

In [None]:
# Chart - 10: Seasonal Decomposition of Crime Trends

import statsmodels.api as sm  # provides sm.tsa.seasonal_decompose

# Ensure 'Date' column exists and is in datetime format
if 'Date' not in df.columns:
    raise KeyError("'Date' column not found in the dataset!")
df['Date'] = pd.to_datetime(df['Date'])

# Resample to monthly crime counts without changing df's index,
# so later cells can still use the 'Date' column.
# 'MS' labels each month by its first day.
crime_monthly = df.set_index('Date').resample('MS').size()

# Apply seasonal decomposition (12-month seasonal period for monthly data)
decomposition = sm.tsa.seasonal_decompose(crime_monthly, model='additive', period=12)

# Plot decomposition components; plot() creates and returns its own figure
fig = decomposition.plot()
fig.set_size_inches(12, 8)
fig.suptitle("Seasonal Decomposition of Crime Trends", fontsize=14)
fig.tight_layout()

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?


- **Seasonal decomposition** breaks crime trends into **three components**:  
  - **Trend:** Long-term increase/decrease in crime rates.  
  - **Seasonality:** Repeating crime patterns within each year (e.g., months or holiday periods that peak every year).  
  - **Residuals:** Random variations not explained by trend or seasonality.

##### 2. What is/are the insight(s) found from the chart?


- Crime follows a **clear trend**, showing growth or decline over time.  
- **Seasonality effects** (e.g., increased crime in summer months) may be visible.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps **forecast crime patterns** and improve prevention strategies.  
- **Negative Impact:** Seasonal crime spikes could **strain law enforcement resources** if unplanned.

#### Chart - 11

In [None]:
# Chart - 11: Rolling Average of Crime Trends (Line Chart)
# Step 1: Build daily timestamps from YEAR, MONTH, and DAY without modifying df
dates = pd.DatetimeIndex(pd.to_datetime(df[['YEAR', 'MONTH', 'DAY']]), name='Date')

# Step 2: Resample to daily crime counts (days without records count as 0)
crime_daily = df.set_index(dates).resample('D').size()

# Step 3: Compute rolling average (7-day window)
rolling_avg = crime_daily.rolling(window=7).mean()

# Step 4: Plot the rolling average trend
plt.figure(figsize=(12, 6))

# Plot rolling average & daily crime count
plt.plot(rolling_avg, color='red', linewidth=2, label="7-Day Rolling Average")
plt.plot(crime_daily, color='gray', alpha=0.5, label="Daily Crime Count")

# Add labels and title
plt.xlabel("Date", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Rolling Average of Crime Trends", fontsize=14)
plt.legend()

# Show plot
plt.show()



##### 1. Why did you pick the specific chart?


- A **rolling average** helps smooth out daily fluctuations in crime data.  
- It highlights **long-term crime trends** by reducing noise from short-term variations.
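
A small sketch of how the 7-day window behaves, using ten made-up daily counts with one spike (the numbers and the `centered_7` variant are assumptions added for illustration):

```python
import pandas as pd

# Ten synthetic daily counts with a single spike on day 4
daily = pd.Series(
    [10, 12, 8, 30, 11, 9, 10, 12, 11, 10],
    index=pd.date_range("2021-01-01", periods=10, freq="D"),
)

# Trailing window: each value averages the current day and the 6 days before it,
# so the first 6 days have no result (NaN)
rolling_7 = daily.rolling(window=7).mean()

# Centered window: each value averages 3 days before and 3 days after,
# which lines the smoothed curve up with the raw data instead of lagging it
centered_7 = daily.rolling(window=7, center=True).mean()

print(pd.DataFrame({"daily": daily, "trailing": rolling_7, "centered": centered_7}))
```

The trailing version only uses past data, which is the safer choice when the smoothed series feeds a forecast; the centered version is often easier to read on a chart.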

##### 2. What is/are the insight(s) found from the chart?


- The **rolling average curve** shows whether crime is **increasing, decreasing, or stable** over time.  
- **Sudden spikes or drops** indicate external factors like holidays, law enforcement actions, or seasonal effects.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps authorities **monitor long-term crime trends** and plan resources accordingly.  
- **Negative Impact:** If the rolling average shows a **steady increase in crime**, it signals the need for immediate action to prevent further escalation.

#### Chart - 12

In [None]:
# Chart - 12: Crime Forecasting (ARIMA Predictions)

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Build a datetime index from YEAR, MONTH, and DAY without modifying df
dates = pd.DatetimeIndex(pd.to_datetime(df[['YEAR', 'MONTH', 'DAY']]), name='Date')

# Resample data to get monthly crime counts ('MS' labels each month by its first day)
crime_monthly = df.set_index(dates).resample('MS').size()

# Fit ARIMA Model
# Order (p, d, q) = (2, 1, 2) is fixed here; it has not been tuned in this notebook
model = sm.tsa.ARIMA(crime_monthly, order=(2, 1, 2))
model_fit = model.fit()

# Forecast next 12 months
forecast = model_fit.forecast(steps=12)

# Plot actual vs forecasted data
plt.figure(figsize=(12, 6))
plt.plot(crime_monthly, label="Actual Crime Data", color='blue')
plt.plot(forecast, label="Forecasted Crime", color='red', linestyle='dashed')

# Add labels and title
plt.xlabel("Date", fontsize=12)
plt.ylabel("Number of Crimes", fontsize=12)
plt.title("Crime Forecasting using ARIMA", fontsize=14)
plt.legend()

# Show plot
plt.show()

### **📊 Understanding ARIMA Model for Crime Forecasting**  

#### **🔹 What is ARIMA?**  
ARIMA (**AutoRegressive Integrated Moving Average**) is a powerful statistical model used for **time series forecasting**. It predicts future values based on past observations and trends.  

#### **🔹 How Does ARIMA Work?**  
ARIMA consists of **three key components**:  

1️⃣ **AutoRegressive (AR)** – Uses past values to predict future values (denoted as **p**).  
   - Example: If crime increased last month, it might increase again this month.  

2️⃣ **Integrated (I)** – Makes the time series stationary by differencing (denoted as **d**).  
   - Example: If crime trends show a long-term increase, differencing helps stabilize them.  

3️⃣ **Moving Average (MA)** – Uses past forecast errors to improve predictions (denoted as **q**).  
   - Example: If past crime predictions were inaccurate, the model adjusts based on previous errors.  

#### **🔹 ARIMA Order (p, d, q)**
- **p (AR term)** – Number of past observations to use.  
- **d (I term)** – Number of times data is differenced to remove trends.  
- **q (MA term)** – Number of past error terms included in the model.  
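
For readers who prefer a formula, one common way to write an ARIMA(p, d, q) model uses the backshift operator $B$, where $B y_t = y_{t-1}$. The symbols $\phi_i$, $\theta_j$, $c$, and $\varepsilon_t$ are standard notation introduced here, not names from the notebook:

$$
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right)(1 - B)^d \, y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right)\varepsilon_t
$$

For the order (2, 1, 2) used in Chart 12, writing $w_t = y_t - y_{t-1}$ for the once-differenced series, this becomes:

$$
w_t = c + \phi_1 w_{t-1} + \phi_2 w_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
$$

Here $\phi_1, \phi_2$ are the AR coefficients, $\theta_1, \theta_2$ are the MA coefficients, $\varepsilon_t$ is the error term, and the constant $c$ may be zero depending on how the model is specified.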

#### **🔹 Steps to Apply ARIMA for Crime Forecasting**  

1️⃣ **Convert data into a time series** (ensure 'Date' is the index).  
2️⃣ **Check stationarity** (e.g., with an Augmented Dickey-Fuller test; if the data has trends, apply differencing).  
3️⃣ **Choose ARIMA order (p, d, q)** using diagnostics such as ACF & PACF plots or information criteria (e.g., AIC).  
4️⃣ **Train the ARIMA model** on historical crime data.  
5️⃣ **Forecast future crime trends** using the trained model.  

#### **🔹 Why Use ARIMA for Crime Forecasting?**  
✅ Can capture **trends and autocorrelated fluctuations** in crime data; recurring seasonal patterns need a seasonal extension such as SARIMA.  
✅ Helps law enforcement **predict high-risk periods** and allocate resources.  
✅ Provides **data-driven insights** for policymakers to prevent crime.  


##### 1. Why did you pick the specific chart?


- **ARIMA is a powerful time series model** that captures past crime patterns to predict future trends.  
- Helps law enforcement **anticipate high-crime periods and act proactively**.

##### 2. What is/are the insight(s) found from the chart?


- The forecasted trend **shows expected crime increases or decreases** over the next 12 months.  
- If crime is predicted to **rise sharply**, preventive measures should be taken immediately.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Enables **strategic planning** for police and policymakers based on data-driven insights.  
- **Negative Impact:** If the model predicts a **continuous rise in crime**, it may indicate worsening security conditions.

#### Chart - 13

In [None]:
# Chart - 13: Crime Type vs. Time of Day (Heatmap)

# Create a pivot table to count crimes by TYPE and HOUR
crime_time_pivot = df.pivot_table(index='TYPE', columns='HOUR', aggfunc='size', fill_value=0)

# Set figure size
plt.figure(figsize=(17, 7))

# Create heatmap
sns.heatmap(crime_time_pivot, cmap="coolwarm", annot=True, fmt="d")

# Add labels and title
plt.xlabel("Hour of the Day", fontsize=9)
plt.ylabel("Crime Type", fontsize=9)
plt.title("Crime Type vs. Time of Day", fontsize=14)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **heatmap** highlights **when specific crime types peak during the day**.  
- It visually shows **which hours have the highest crime rates**.

##### 2. What is/are the insight(s) found from the chart?


- Some crimes (e.g., **theft**) may **increase in the evening**, while others (e.g., **burglary**) may occur late at night.  
- Law enforcement can **focus patrols during high-risk hours**.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


- **Positive Impact:** Helps in **strategic law enforcement** (e.g., more police during peak hours).  
- **Negative Impact:** If certain crimes peak during working hours, businesses may suffer from **security concerns**.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Chart - 14: Correlation Matrix (Heatmap)

# Select only numeric columns
numeric_df = df.select_dtypes(include=['number'])

# Compute correlation matrix
corr_matrix = numeric_df.corr()

# Set figure size
plt.figure(figsize=(10, 6))

# Create heatmap
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)

# Add title
plt.title("Correlation Matrix of Crime Data", fontsize=14)

# Show plot
plt.show()



##### 1. Why did you pick the specific chart?


- A **heatmap** visually highlights **strong and weak relationships** between variables.  
- It helps in **feature selection** for crime prediction models.

##### 2. What is/are the insight(s) found from the chart?


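- The matrix covers numeric columns only, so it shows relationships among coordinates and time fields (for example between `X`/`Y` and `Latitude`/`Longitude`), not crime types.  
- Strongly correlated columns carry overlapping information, which matters when selecting features for a prediction model.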

#### Chart - 15 - Pair Plot

In [None]:
# Chart - 15: Pairplot of Crime Data

# Select numerical columns for pairplot
numeric_columns = ['HOUR', 'MINUTE', 'YEAR', 'MONTH', 'DAY', 'Latitude', 'Longitude']

# Create pairplot
sns.pairplot(df[numeric_columns], diag_kind='kde', corner=True)

# Show plot
plt.show()


##### 1. Why did you pick the specific chart?


- A **pairplot** visually shows relationships between multiple crime-related variables.  
- It helps detect **clusters and correlations** (e.g., crime frequency vs. time).

##### 2. What is/are the insight(s) found from the chart?


- Some variables may have **strong correlations** (e.g., between the coordinate columns).  
- Crime may be **spatially concentrated** in particular locations.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**📌 5. Solution to Business Objective – Crime Analysis & Forecasting**  

Based on our **15 key charts and insights**, here are **recommended solutions** to help law enforcement and policymakers **reduce crime and improve public safety**:  

---

**1️⃣ Focus Law Enforcement on High-Crime Locations**  
**🔍 Chart Insights:**  
- **Crime Hotspots Map** and **Crime Density Heatmap** show **specific neighborhoods and locations** with high crime rates.  
- **Top 10 Crime Locations Chart** identifies areas needing **increased police presence**.  

✅ **Solution:**  
📌 **Deploy more police patrols** in high-crime areas during peak hours.  
📌 Install **CCTV surveillance** and improve **street lighting** in crime-heavy locations.  

---

**2️⃣ Optimize Crime Prevention Based on Time Trends**  
**🔍 Chart Insights:**  
- **Crime Trends Over Time Chart** shows fluctuations in crime rates.  
- **Crime by Hour Bar Chart** and **Crime by Weekday Heatmap** reveal **peak crime hours and days**.  

✅ **Solution:**  
📌 **Increase law enforcement presence** during high-crime periods (e.g., weekends, late-night hours).  
📌 Implement **community awareness programs** in areas with rising crime trends.  

---

**3️⃣ Predict and Prevent Future Crimes Using Forecasting**  
**🔍 Chart Insights:**  
- **ARIMA Crime Forecasting Model** predicts crime trends for the next 12 months.  
- **Rolling Average of Crime Trends** shows **long-term increases or decreases**.  

✅ **Solution:**  
📌 **Use predictive analytics** to allocate law enforcement resources efficiently.  
📌 **Enhance crime prevention policies** based on forecasted crime trends.  

---

**4️⃣ Address Specific Crime Types & Reduce Recidivism**  
**🔍 Chart Insights:**  
- **Crime Type Distribution Chart** reveals which crimes are most common.  
- **Crime Type vs. Time of Day Heatmap** shows when specific crimes peak.  

✅ **Solution:**  
📌 **Target specific crime types** with specialized law enforcement teams.  
📌 Implement **rehabilitation programs** for repeat offenders.  

---

**5️⃣ Improve City Safety & Policy Implementation**  
**🔍 Chart Insights:**  
- **Crime Correlation Matrix** identifies relationships between numeric variables such as **coordinates and time fields**.  
- **Pairplot Analysis** helps in **crime pattern recognition**.  

✅ **Solution:**  
📌 Implement **data-driven policing strategies** using crime trends and correlations.  
📌 Work with **local businesses and communities** to create safer public spaces.

# **Conclusion**

**📌 Conclusion: FBI Crime Time Series Forecasting & EDA**  

This project **analyzed, visualized, and forecasted** crime patterns using **time-series analysis and statistical forecasting (ARIMA)**. By leveraging **data-driven insights**, law enforcement agencies can **improve crime prevention strategies, allocate resources effectively, and enhance public safety**.  

---
**📊 Key Takeaways from the Analysis**  

✅ **Crime Hotspots Identified:**  
- Specific **neighborhoods and locations** exhibit high crime rates (**Crime Hotspots Map & Heatmaps**).  
- **Targeted policing** in these areas can help reduce crime.  

✅ **Crime Trends Over Time:**  
- Crime fluctuates **seasonally** (**Monthly Crime Trends & Rolling Average Charts**).  
- Certain **times of the day and week** experience more crime (**Crime by Hour Bar Chart & Crime by Weekday Heatmap**).  

✅ **Predictive Crime Modeling:**  
- **ARIMA Forecasting** predicts crime rates for **future months**.  
- Forecasted crime **trends can guide law enforcement decisions**.  

✅ **Crime Type & Temporal Patterns:**  
- **Theft and property crimes** are the most common (**Crime Type Distribution Chart**).  
- Certain crimes occur **at specific hours** (**Crime Type vs. Time of Day Heatmap**).  

✅ **Data-Driven Policy Recommendations:**  
- **Increase patrols in high-crime areas** during peak hours.  
- **Implement AI-driven surveillance** in crime-prone locations.  
- **Enhance community engagement programs** to prevent crime before it happens.  

---
**🚀 Final Business Impact & Next Steps**  

🔹 **Proactive Crime Prevention:** Data-driven insights enable authorities to **act before crime spikes occur**.  
🔹 **Efficient Resource Allocation:** Law enforcement can **deploy officers strategically** based on crime patterns.  
🔹 **Improved Public Safety:** Targeting **high-crime areas and times** will **reduce incidents and enhance security**.  
🔹 **Long-Term Crime Reduction:** By using **predictive analytics**, law enforcement can **anticipate and mitigate crime trends**.