# **Project Name**    - FBI Time Series Forecasting



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Name**    - Madhu Y

# **Project Summary -**

This project, titled FBI Crime Incident Time Series Forecasting, is a strategic data analytics initiative designed to predict crime patterns and enhance public safety by anticipating crime trends and facilitating strategic resource allocation.

The core focus is to develop a robust predictive model that estimates the number of crime incidents on a monthly basis. The model leverages detailed, granular data that captures both spatial and temporal patterns. This data includes crime types, geographical coordinates (latitude and longitude), neighborhood details, and time stamps down to the day and hour level.

The project employs advanced machine learning and time series techniques, utilizing libraries such as Pandas, NumPy, Scikit-Learn, Statsmodels (for ARIMA/SARIMA), and XGBoost.

The final output provides actionable insights for multiple stakeholders:

- **Law Enforcement:** can optimize patrol schedules and strategically allocate personnel and resources to high-risk areas.
- **Urban Planners/Policy Makers:** can guide the placement of public safety measures, such as street lighting and surveillance cameras, and inform public awareness campaigns and community policing initiatives.

# **Problem Statement**


Develop a robust, data-driven predictive model to accurately forecast the number of crime incidents on a monthly basis, segmented by crime type. The model must leverage granular spatial (location) and temporal (time stamp) data to identify when and where crimes are most likely to occur, providing law enforcement with the advanced tools necessary to allocate resources strategically and implement proactive measures to prevent criminal activities.

#### **Define Your Business Objective?**

The central business objective is to deliver actionable predictive intelligence to stakeholders (law enforcement, urban planners, policy makers) to achieve two primary outcomes:

- **Optimization of Resources:** enable law enforcement to optimize patrol schedules and allocate personnel more efficiently, focusing resources on high-risk times and locations.
- **Enhancement of Public Safety:** guide the placement of public safety measures, such as street lighting and surveillance cameras, and inform community-based initiatives to create safer and more resilient communities.

# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
df = pd.read_csv('/content/Test (2).csv')

### Dataset First View

In [None]:
display(df.head())

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

print(df.shape)

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
# Duplicate Values

print(df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()
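The heatmap is easy to scan on a small frame. A per-column percentage, however, is easier to read and to act on. The sketch below runs on a small hypothetical frame ('df_demo') that has the columns described in this notebook. On the real data, the same two expressions can be applied to 'df'.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the schema described in this notebook;
# on the real data, apply the same expressions to df
df_demo = pd.DataFrame({
    "YEAR": [2012, 2012, 2013, 2013],
    "MONTH": [1, 2, 1, 2],
    "TYPE": ["Other Theft", "Theft from Vehicle", "Other Theft", "Theft of Bicycle"],
    "Incident_Counts": [np.nan] * 4,
})

# Percentage of missing values per column, largest first
missing_pct = df_demo.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)

# Columns with no usable values at all
fully_missing = missing_pct[missing_pct == 100].index.tolist()
print("Entirely missing columns:", fully_missing)
```

A column that is 100% missing, such as a target that was withheld, has to be rebuilt before it can be modeled. It cannot be imputed.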

### What did you know about your dataset?

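Each row of the dataset appears to describe an individual crime occurrence through its YEAR, MONTH and TYPE fields. The Incident_Counts column is entirely missing (NaN), so the monthly counts needed for time series forecasting must be derived by aggregating the rows.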

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns)

In [None]:
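# Dataset Describe
# include='all' also summarizes the categorical TYPE column (count, unique, top, freq)
display(df.describe(include='all'))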

### Variables Description

- **YEAR:** the year of the incident (integer).
- **MONTH:** the month of the incident (integer).
- **TYPE:** the type of incident (categorical, object dtype).
- **Incident_Counts:** the count of incidents (float). This variable appears to be entirely missing (NaN).

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"Column: {col}")
    unique_values = df[col].unique()
    print(f"Number of unique values: {len(unique_values)}")
    if len(unique_values) < 20:
        print(f"Unique values: {unique_values}")
    print("-" * 30)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Group by YEAR, MONTH, and TYPE and count incidents
df_analysis = df.groupby(['YEAR', 'MONTH', 'TYPE']).size().reset_index(name='Incident_Counts')

# Display the first few rows of the new dataframe
display(df_analysis.head())

# Display the info of the new dataframe
df_analysis.info()

### What all manipulations have you done and insights you found?

- **Grouped the data:** I grouped the original DataFrame df by the 'YEAR', 'MONTH', and 'TYPE' columns.
- **Counted incidents:** For each unique combination of year, month, and incident type, I counted the number of occurrences. This count represents the 'Incident_Counts' for that type of incident in that month and year.
- **Created a new DataFrame:** The result of this grouping and counting was stored in a new DataFrame called df_analysis. The original 'Incident_Counts' column, which was entirely missing, has been effectively replaced by these calculated counts.
- **Insight:** The incident counts were not explicitly provided, but they could be derived from the granularity of the existing data. Grouping and counting produces a time series dataset in which each row represents one type of incident in a given month and year, together with its count. This allows the time series analysis and visualization to proceed on these derived counts.
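Two caveats apply to this step.

1. size() counts rows, so it yields incident counts only if each row of the file is one incident. If the file already holds one row per (YEAR, MONTH, TYPE), every count will be 1.
2. groupby().size() produces no row at all for a month in which a type had zero incidents. This leaves gaps in that type's series.

The minimal sketch below checks both on a small hypothetical frame. The names 'df_demo' and 'df_complete' are illustrative only. The sketch assumes each type's series should cover every month between the first and last observed month.

```python
import pandas as pd

# Hypothetical incident-level rows (one row per incident), standing in for df
df_demo = pd.DataFrame({
    "YEAR":  [2012, 2012, 2012, 2012, 2013],
    "MONTH": [1, 1, 2, 2, 1],
    "TYPE":  ["Other Theft", "Other Theft", "Other Theft", "Theft of Bicycle", "Theft of Bicycle"],
})

dates = pd.to_datetime(df_demo[["YEAR", "MONTH"]].assign(DAY=1))
counts = df_demo.assign(Date=dates).groupby(["Date", "TYPE"]).size()

# Sanity check: if every group has exactly one row, the input was already
# aggregated and size() is not a real incident count
all_ones = bool((counts == 1).all())
print("Every group has size 1:", all_ones)

# size() omits month/type combinations with zero incidents; reindex onto the
# full grid (every month in the observed span x every type) so they appear as 0
months = pd.date_range(dates.min(), dates.max(), freq="MS")
types = sorted(df_demo["TYPE"].unique())
grid = pd.MultiIndex.from_product([months, types], names=["Date", "TYPE"])
df_complete = counts.reindex(grid, fill_value=0).reset_index(name="Incident_Counts")
print(df_complete.head())
```

Filling the zero months matters for forecasting. Models such as SARIMA assume evenly spaced observations, and a missing month would otherwise be silently skipped.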

## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# Create a datetime column for easier plotting
df_analysis['Date'] = pd.to_datetime(df_analysis[['YEAR', 'MONTH']].assign(DAY=1))

# Group by date and sum the incident counts
monthly_counts = df_analysis.groupby('Date')['Incident_Counts'].sum().reset_index()

# Plot the total incidents over time
plt.figure(figsize=(12, 6))
sns.lineplot(data=monthly_counts, x='Date', y='Incident_Counts')
plt.title('Total FBI Incidents Over Time (Monthly)')
plt.xlabel('Date')
plt.ylabel('Number of Incidents')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I picked the line plot because it is the most appropriate chart for visualizing the trend of a variable (total incident counts) over a continuous time period (months). It clearly shows how the number of incidents changes from one month to the next, making trends and patterns easily discernible.

##### 2. What is/are the insight(s) found from the chart?

Based on the line plot of total FBI incidents over time, the primary insight is that there is no significant variation in the total number of incidents per month across the observed period (2012-2013). The line appears relatively flat, suggesting a stable trend in the overall number of reported incidents each month.
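"Relatively flat" is a visual judgment. The coefficient of variation (standard deviation divided by mean) and the max-min range relative to the mean turn it into numbers that can be checked. The sketch below uses hypothetical monthly totals ('monthly_demo'). On the real data, the same lines can be applied to monthly_counts['Incident_Counts']. Deciding what CV counts as "stable" is left as an assumption for the analyst.

```python
import pandas as pd

# Hypothetical monthly totals standing in for monthly_counts from Chart 1
monthly_demo = pd.Series(
    [100, 104, 98, 101, 97, 100],
    index=pd.date_range("2012-01-01", periods=6, freq="MS"),
)

mean_total = monthly_demo.mean()
cv = monthly_demo.std() / mean_total  # sample standard deviation relative to the mean
pct_range = (monthly_demo.max() - monthly_demo.min()) / mean_total
print(f"mean = {mean_total:.1f}, CV = {cv:.3f}, max-min range = {pct_range:.1%} of the mean")
```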

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The stable trend provides a crucial baseline: a predictable monthly volume supports steady staffing and patrol planning, which is a positive impact. If the objective is crime reduction, however, a flat trend is a negative signal, because it shows that the overall monthly volume is not declining.

#### Chart - 2

In [None]:
# Chart - 2 visualization code

# Plot the trend for each incident type
plt.figure(figsize=(14, 7))
sns.lineplot(data=df_analysis, x='Date', y='Incident_Counts', hue='TYPE')
plt.title('FBI Incident Counts Over Time by Type (Monthly)')
plt.xlabel('Date')
plt.ylabel('Number of Incidents')
plt.xticks(rotation=45)
plt.legend(title='Incident Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose this multi-line plot to compare the time series trends of different incident types. By having a separate line for each TYPE, we can easily see if certain types of incidents have different monthly patterns or if they follow the overall stable trend observed in the first chart. It allows for a direct visual comparison of how the frequency of each incident type changes over time relative to the others.

##### 2. What is/are the insight(s) found from the chart?

Based on the multi-line plot showing incident counts by type over time, the main insight is that while the total number of incidents is stable, the distribution of incident types varies from month to month. Some incident types show slight fluctuations, increasing or decreasing in certain months, even though the overall sum remains relatively constant. This suggests that while the total workload might be consistent, the nature of the incidents the FBI deals with changes over time.
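The claim that the mix of types shifts while the total stays flat can be checked directly. Convert each month's counts into shares that sum to 1, then measure how far each type's share moves across months. The sketch below uses a hypothetical long-format frame ('demo') with the same columns as df_analysis.

```python
import pandas as pd

# Hypothetical long-format counts with the same columns as df_analysis
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2012-01-01"] * 2 + ["2012-02-01"] * 2),
    "TYPE": ["Other Theft", "Theft from Vehicle"] * 2,
    "Incident_Counts": [60, 40, 45, 55],
})

# One row per month, one column per type
wide = demo.pivot_table(index="Date", columns="TYPE", values="Incident_Counts",
                        aggfunc="sum", fill_value=0)
# Divide each row by its monthly total so every row sums to 1
shares = wide.div(wide.sum(axis=1), axis=0)
# How far each type's share moves across months, in percentage points
share_spread_pp = (shares.max() - shares.min()) * 100
print(shares.round(3))
print(share_spread_pp.round(1))
```

In this toy example both months total 100, yet each type's share moves by 15 percentage points. A flat total line can hide exactly this pattern.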

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights suggest that while the overall workload is stable, a deeper look into the specific types of incidents and their temporal patterns is necessary for effective resource allocation and targeted crime prevention strategies.

#### Chart - 3

In [None]:
# Chart - 3 visualization code

# Group by TYPE and sum the incident counts
type_counts = df_analysis.groupby('TYPE')['Incident_Counts'].sum().reset_index()

# Sort the types by total count
type_counts = type_counts.sort_values('Incident_Counts', ascending=False)

# Create a bar plot of incident counts by type
plt.figure(figsize=(12, 7))
sns.barplot(data=type_counts, x='Incident_Counts', y='TYPE', palette='viridis')
plt.title('Total FBI Incident Counts by Type (Overall Period)')
plt.xlabel('Total Number of Incidents')
plt.ylabel('Incident Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the bar plot to visualize the total incident counts by type because it's an effective way to compare the magnitudes of different categories. The length of each bar clearly represents the total number of incidents for each specific type over the entire period, making it easy to identify which incident types are the most and least frequent.

##### 2. What is/are the insight(s) found from the chart?

Based on the bar plot showing the total FBI incident counts by type, the main insight is the clear disparity in the frequency of different incident types. Some types, like "Other Theft" and "Theft from Vehicle", have significantly higher total counts over the period compared to types like "Theft of Bicycle" or "Break and Enter Commercial". This highlights that certain types of incidents are much more prevalent than others.
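"Much more prevalent" can be quantified with a Pareto-style summary: each type's share of all incidents, the running total of those shares, and the number of types needed to reach a chosen threshold. The sketch below uses hypothetical totals ('type_totals') and an 80% threshold, both assumptions for illustration.

```python
import pandas as pd

# Hypothetical totals by type, standing in for type_counts from Chart 3
type_totals = pd.Series({
    "Other Theft": 500,
    "Theft from Vehicle": 250,
    "Break and Enter Commercial": 150,
    "Theft of Bicycle": 100,
})

share = type_totals.sort_values(ascending=False) / type_totals.sum()
cumulative = share.cumsum()
# Smallest number of types whose combined share reaches 80%
n_for_80 = int((cumulative < 0.8).sum()) + 1
print(pd.DataFrame({"share": share.round(3), "cumulative": cumulative.round(3)}))
print(f"{n_for_80} type(s) account for at least 80% of incidents")
```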

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights collectively suggest that while the overall incident volume is consistent, focusing on the temporal variations and the most frequent types of incidents is crucial for optimizing resource allocation and developing effective, targeted crime prevention strategies.

#### Chart - 4

In [None]:
# Chart - 4 visualization code

# Select the top N most frequent incident types (e.g., top 3)
top_n_types = type_counts.head(3)['TYPE'].tolist()

# Filter the dataframe to include only the top N types
df_top_types = df_analysis[df_analysis['TYPE'].isin(top_n_types)].copy()

# Plot the trend for the top N incident types
plt.figure(figsize=(14, 7))
sns.lineplot(data=df_top_types, x='Date', y='Incident_Counts', hue='TYPE')
plt.title(f'Monthly Incident Counts for Top {len(top_n_types)} FBI Incident Types')
plt.xlabel('Date')
plt.ylabel('Number of Incidents')
plt.xticks(rotation=45)
plt.legend(title='Incident Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose this multi-line plot specifically to examine and compare the monthly trends of the most frequent incident types. While the second chart showed all incident types, this chart focuses on the top ones identified in the bar plot. This allows for a clearer view of the temporal patterns for the crimes that contribute most to the overall incident volume and helps in identifying any specific seasonality or trends within these major categories.

##### 2. What is/are the insight(s) found from the chart?

Based on the line plot of the top incident types, the insight is that even among the most frequent types, the monthly trends are relatively stable and mirror the overall stable trend. There are no dramatic peaks or valleys for these individual top categories within the observed period. This reinforces the idea that the stability in the total incident count is reflected across the most common types of incidents as well.
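When types of very different volume share one y-axis, the smaller type's line can look flatter than it really is. A per-type coefficient of variation divides each type's spread by its own mean, so the types can be compared on one scale. The sketch below uses a hypothetical frame ('demo') shaped like df_top_types.

```python
import pandas as pd

# Hypothetical monthly counts for two of the top types (same columns as df_top_types)
demo = pd.DataFrame({
    "TYPE": ["Other Theft"] * 4 + ["Theft from Vehicle"] * 4,
    "Incident_Counts": [50, 52, 48, 50, 30, 36, 24, 30],
})

# Coefficient of variation per type: sample std divided by that type's own mean
cv_by_type = demo.groupby("TYPE")["Incident_Counts"].agg(lambda s: s.std() / s.mean())
print(cv_by_type.round(3))
```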

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

While the stability offers predictability for resource management, it also underscores the challenge in reducing the most common types of crime and emphasizes the need for potentially new or more focused strategies if crime reduction is a key business objective.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

# Group by month and sum the incident counts
monthly_seasonal_counts = df_analysis.groupby('MONTH')['Incident_Counts'].sum().reset_index()

# Plot the total incidents by month (seasonal view)
plt.figure(figsize=(10, 6))
sns.barplot(data=monthly_seasonal_counts, x='MONTH', y='Incident_Counts', palette='viridis')
plt.title('Total FBI Incident Counts by Month (Seasonal View)')
plt.xlabel('Month')
plt.ylabel('Total Number of Incidents')
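# seaborn places the bars at positions 0-11 (MONTH is treated as categorical), so ticks start at 0; assumes all 12 months are present
plt.xticks(np.arange(0, 12), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])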
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the bar plot to visualize the total incident counts by month (seasonal view) because it's an effective way to compare the aggregate number of incidents across the 12 distinct months of the year. This chart makes it easy to see if certain months consistently have higher or lower incident counts when summed across the entire period, helping to identify any potential seasonal patterns.

##### 2. What is/are the insight(s) found from the chart?

Based on the bar plot showing the total FBI incident counts by month (seasonal view), the main insight is that there is no strong or consistent seasonal pattern in the total number of incidents across the months. The bars are all roughly the same height, indicating that when summed across the available years, each month contributes a similar number of incidents to the total. This reinforces the earlier observation of overall stability in incident volume.
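Summing by calendar month across years assumes every year covers every month. If one year stops early, the months it does cover get an extra year's worth of incidents, and the bars become uneven for reasons unrelated to seasonality. Averaging by month avoids this bias. Dividing by the mean of those averages then gives a simple seasonal index, where 1.0 means a typical month. The sketch below uses a hypothetical frame ('demo') in which 2013 ends after March. Whether this matters for the real file depends on its month coverage.

```python
import pandas as pd

# Hypothetical monthly totals: 2012 is complete, 2013 stops after March
demo = pd.DataFrame({
    "YEAR":  [2012] * 12 + [2013] * 3,
    "MONTH": list(range(1, 13)) + [1, 2, 3],
    "Incident_Counts": [100] * 12 + [90, 90, 90],
})

# Summing counts January-March twice (once per year) and April-December once
summed = demo.groupby("MONTH")["Incident_Counts"].sum()
# Averaging compares like with like; dividing by the overall mean gives a seasonal index
avg = demo.groupby("MONTH")["Incident_Counts"].mean()
seasonal_index = avg / avg.mean()
print(pd.DataFrame({"summed": summed, "mean": avg, "index": seasonal_index.round(3)}))
```

Here the summed bars make January-March look almost twice as busy as April. The averaged index shows those months are actually slightly below a typical month.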

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

While the overall lack of seasonality simplifies high-level planning, it highlights the importance of examining individual incident types for seasonal patterns, as those could still require seasonal adjustments in strategy and resource allocation to achieve positive impacts (like reducing specific crime types during their peak seasons).

#### Chart - 6

In [None]:
# Chart - 6 visualization code

# Group by month and type and sum the incident counts for the top types
monthly_type_seasonal_counts = df_top_types.groupby(['MONTH', 'TYPE'])['Incident_Counts'].sum().reset_index()

# Create a grouped bar plot for seasonal patterns of top types
plt.figure(figsize=(14, 7))
sns.barplot(data=monthly_type_seasonal_counts, x='MONTH', y='Incident_Counts', hue='TYPE', palette='viridis')
plt.title(f'Seasonal Incident Counts by Type for Top {len(top_n_types)} FBI Incident Types')
plt.xlabel('Month')
plt.ylabel('Total Number of Incidents')
plt.xticks(np.arange(0, 12), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.legend(title='Incident Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the grouped bar plot to visualize the seasonal incident counts by type because it effectively allows us to compare the incident counts across two categorical variables at once: the month and the incident type. By grouping the bars by month and using different colors for each incident type, we can easily see if certain incident types have higher or lower counts during specific months, revealing potential seasonal patterns within those categories, even if the overall total incidents per month are stable.

##### 2. What is/are the insight(s) found from the chart?

Based on the grouped bar plot showing the seasonal incident counts by type for the top incident types, the key insight is that even within these most frequent categories, there are no pronounced or consistent seasonal peaks or dips across the months in the observed data. The counts for each of the top incident types remain relatively uniform throughout the year. This further supports the earlier observation that the overall stability in incident numbers is reflected even when examining the most common types on a monthly basis.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

In essence, the lack of strong seasonality in the most frequent incident types simplifies seasonal resource planning for these specific crimes but also highlights that their consistent occurrence is a year-round challenge that requires consistent, potentially intensified, efforts if a reduction is the goal.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

# Group by year and type and sum the incident counts
yearly_type_counts = df_analysis.groupby(['YEAR', 'TYPE'])['Incident_Counts'].sum().reset_index()

# Create a stacked bar chart of incident counts by year and type
plt.figure(figsize=(12, 7))
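# discrete=True centers exactly one bar on each year instead of using automatic bins
ax = sns.histplot(data=yearly_type_counts, x='YEAR', weights='Incident_Counts', hue='TYPE', multiple='stack', palette='viridis', shrink=0.8, discrete=True)
plt.title('Yearly FBI Incident Counts by Type')
plt.xlabel('Year')
plt.ylabel('Total Number of Incidents')
plt.xticks(yearly_type_counts['YEAR'].unique())
# histplot builds its own legend from unlabeled artists, so plt.legend() would lose it; move it instead
sns.move_legend(ax, 'upper left', bbox_to_anchor=(1.05, 1), title='Incident Type')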
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the stacked bar chart to visualize the yearly FBI incident counts by type because it is effective in showing the total number of incidents for each year while simultaneously illustrating the contribution of each incident type to that yearly total. This allows for easy comparison of the overall incident volume between years and helps in identifying if the proportion of different incident types changes from one year to the next.

##### 2. What is/are the insight(s) found from the chart?

Based on the stacked bar chart showing yearly FBI incident counts by type, the main insight is that 2012 had a higher total number of reported incidents compared to 2013 within this dataset. Additionally, the proportion of each incident type appears relatively similar between the two years, suggesting that while the overall volume differed, the mix of incident types remained consistent year-over-year.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The yearly comparison provides a high-level view of changes in incident volume and composition, which can inform strategic adjustments and help evaluate the potential impact of past efforts, aiming for sustained positive impact (reduction in incidents).



#### Chart - 8

In [None]:
# Chart - 8 visualization code

# Plot the trend for each incident type, separated by year
plt.figure(figsize=(14, 7))
sns.lineplot(data=df_analysis, x='MONTH', y='Incident_Counts', hue='TYPE', style='YEAR', palette='viridis')
plt.title('Monthly FBI Incident Counts by Type and Year')
plt.xlabel('Month')
plt.ylabel('Number of Incidents')
plt.xticks(np.arange(1, 13), ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.legend(title='Incident Type and Year', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose this multi-line plot to visualize the monthly incident counts by type and year because it allows for a direct comparison of the monthly trends for each incident type between 2012 and 2013. By having separate lines for each year, we can easily see if the monthly patterns for a specific incident type are consistent or if they differ between the two years. This helps in understanding if any observed monthly variations are consistent seasonal patterns or if they are specific to a particular year.

##### 2. What is/are the insight(s) found from the chart?

Based on the multi-line plot showing monthly FBI incident counts by type and year, the main insight is that for most incident types, the monthly patterns are relatively consistent between 2012 and 2013, although the overall counts for each type are generally lower in 2013. This reinforces the idea that while the total volume of incidents decreased from 2012 to 2013, the underlying monthly behavior of individual incident types remained largely similar. There are no dramatic shifts in monthly trends for specific types between the two years.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The consistency of monthly patterns within incident types allows for stable and potentially more refined tactical planning for specific crimes, while the overall reduction in 2013 is a promising trend that warrants further investigation to understand its drivers and ensure continued positive impact.



#### Chart - 9

In [None]:
# Chart - 9 visualization code

# Group by year and sum the incident counts
yearly_counts = df_analysis.groupby('YEAR')['Incident_Counts'].sum().reset_index()

# Plot the total incidents by year
plt.figure(figsize=(8, 5))
sns.barplot(data=yearly_counts, x='YEAR', y='Incident_Counts', palette='viridis')
plt.title('Total FBI Incident Counts by Year')
plt.xlabel('Year')
plt.ylabel('Total Number of Incidents')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the bar plot to visualize the total FBI incident counts by year because it's a straightforward and effective way to visually compare the total incident volume between the two distinct years (2012 and 2013). The height of each bar directly represents the total number of incidents in that year, making the difference in volume between the years immediately clear.

##### 2. What is/are the insight(s) found from the chart?

Based on the bar plot showing the total FBI incident counts by year, the clear insight is that the total number of reported incidents was significantly higher in 2012 compared to 2013. This confirms the observation made from the stacked bar chart and provides a simple, direct comparison of the overall incident volume between the two years.
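A difference in yearly totals can come from a real change in crime or from the two years covering different numbers of months. Comparing only the calendar months observed in every year separates the two effects. The sketch below uses a hypothetical frame ('demo') in which 2013 covers only January-June. The real file's coverage should be checked before interpreting the 2012 vs 2013 gap.

```python
import pandas as pd

# Hypothetical monthly totals: 2012 is complete, 2013 only covers January-June
demo = pd.DataFrame({
    "YEAR":  [2012] * 12 + [2013] * 6,
    "MONTH": list(range(1, 13)) + list(range(1, 7)),
    "Incident_Counts": [100] * 12 + [95] * 6,
})

# Raw yearly totals mix a real change with a difference in coverage
raw_totals = demo.groupby("YEAR")["Incident_Counts"].sum()

# Keep only the calendar months observed in every year, then compare
common_months = set.intersection(*(set(g["MONTH"]) for _, g in demo.groupby("YEAR")))
like_for_like = demo[demo["MONTH"].isin(common_months)].groupby("YEAR")["Incident_Counts"].sum()
yoy_change = like_for_like.iloc[-1] / like_for_like.iloc[0] - 1

print(raw_totals)
print("Months observed in every year:", sorted(common_months))
print(f"Like-for-like change: {yoy_change:.1%}")
```

With these toy numbers the raw totals suggest a drop of more than half, while the like-for-like comparison shows a 5% decrease.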

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The observed reduction in total incidents in 2013 is a positive indicator of "negative growth" in crime volume, which has significant positive business implications for evaluating strategies and justifying resources. However, it's crucial to delve deeper to understand the underlying causes of this reduction to ensure sustainable positive impact.

#### Chart - 10

In [None]:
# Chart - 10 visualization code

# Calculate the total incidents per year
yearly_totals = df_analysis.groupby('YEAR')['Incident_Counts'].sum()

# Calculate the proportion of each incident type within each year
yearly_type_proportion = yearly_type_counts.copy()
# Map each row's YEAR to that year's total (vectorized instead of a row-wise apply)
yearly_type_proportion['Proportion'] = yearly_type_proportion['Incident_Counts'] / yearly_type_proportion['YEAR'].map(yearly_totals)

# Create a proportional stacked bar chart (discrete=True centers one bar on each year)
plt.figure(figsize=(10, 7))
ax = sns.histplot(data=yearly_type_proportion, x='YEAR', weights='Proportion', hue='TYPE', multiple='stack', palette='viridis', shrink=0.8, discrete=True)
plt.title('Proportion of FBI Incident Types by Year')
plt.xlabel('Year')
plt.ylabel('Proportion of Incidents')
plt.xticks(yearly_type_proportion['YEAR'].unique())
# Move seaborn's own legend; plt.legend() would replace it with an empty one
sns.move_legend(ax, 'upper left', bbox_to_anchor=(1.05, 1), title='Incident Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the proportional stacked bar chart to visualize the proportion of FBI incident types by year because it clearly shows how the composition of incidents changes (or doesn't change) from year to year. Unlike a regular stacked bar chart, which shows raw counts, this chart normalizes the counts within each year so that they sum to 1 (100%). This allows a direct comparison of the share that each incident type contributes to the total in 2012 versus 2013.

##### 2. What is/are the insight(s) found from the chart?

Based on the proportional stacked bar chart showing the proportion of FBI incident types by year, the clear insight is that the proportion of each incident type remained remarkably consistent between 2012 and 2013. This means that even though the overall number of incidents was lower in 2013, the relative frequency of each type of crime within that total stayed largely the same. This reinforces the earlier observation from the stacked bar chart that the mix of incidents didn't change significantly, only the total volume.
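"Remarkably consistent" can be stated as a number: the change in each type's share between the two years, in percentage points. The sketch below computes the shares with groupby().transform('sum'), which keeps rows aligned with their yearly totals. It uses a hypothetical frame ('demo') shaped like yearly_type_counts.

```python
import pandas as pd

# Hypothetical yearly counts by type (same columns as yearly_type_counts)
demo = pd.DataFrame({
    "YEAR": [2012, 2012, 2013, 2013],
    "TYPE": ["Other Theft", "Theft of Bicycle"] * 2,
    "Incident_Counts": [800, 200, 600, 200],
})

# Share of each type within its year; transform returns one total per original row
demo["Proportion"] = demo["Incident_Counts"] / demo.groupby("YEAR")["Incident_Counts"].transform("sum")

# Change in share between the two years, in percentage points
shares = demo.pivot(index="TYPE", columns="YEAR", values="Proportion")
pp_change = (shares[2013] - shares[2012]) * 100
print(pp_change.round(1))
```

In this toy data the bicycle count is unchanged, yet its share rises by 5 points because the total fell. Shares and raw counts answer different questions.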

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The consistent proportion of incident types provides stability for strategic planning across different crime categories, which is a positive. However, it also highlights that without targeted interventions to specifically alter the mix of crimes, the relative prominence of certain types will likely persist, which could be a negative if a shift in the crime landscape is desired.

#### Chart - 11

In [None]:
# Chart - 11 visualization code

# Group by year and type and sum the incident counts (already have yearly_type_counts from Chart 7)

# Create a grouped bar chart of incident counts by year and type
plt.figure(figsize=(14, 7))
sns.barplot(data=yearly_type_counts, x='TYPE', y='Incident_Counts', hue='YEAR', palette='viridis')
plt.title('Yearly FBI Incident Counts by Type')
plt.xlabel('Incident Type')
plt.ylabel('Total Number of Incidents')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Year')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the grouped bar chart to visualize the yearly incident counts by type because it is an effective way to directly compare the total count of each specific incident type between 2012 and 2013. By having bars for each year grouped together for every incident type, we can easily see which incident types saw an increase, decrease, or remained stable in terms of raw counts from one year to the next.

##### 2. What is/are the insight(s) found from the chart?

Based on the grouped bar chart showing yearly FBI incident counts by type, the main insight is that the decrease in total incidents from 2012 to 2013 appears to be a broad trend affecting all incident types rather than being driven by a significant reduction in just one or two specific categories. For almost every incident type, the bar for 2013 is lower than the bar for 2012, indicating a general reduction across the board.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The broad reduction in incident types in 2013 is a very positive sign ("negative growth" in crime) with strong implications for the potential effectiveness of general crime reduction approaches. Understanding the reasons behind this widespread decrease is key to sustaining and enhancing this positive trend in the future.

#### Chart - 12

In [None]:
# Chart - 12 visualization code

# Pivot the yearly_type_counts to have years as columns
yearly_type_pivot = yearly_type_counts.pivot(index='TYPE', columns='YEAR', values='Incident_Counts').reset_index()

# Calculate the change in incident counts from 2012 to 2013
yearly_type_pivot['Change_2012_to_2013'] = yearly_type_pivot[2013] - yearly_type_pivot[2012]

# Sort by the signed change so that the largest decreases come first
yearly_type_pivot = yearly_type_pivot.sort_values('Change_2012_to_2013', ascending=True)

# Create a bar chart of the change in incident counts by type
plt.figure(figsize=(12, 7))
sns.barplot(data=yearly_type_pivot, x='Change_2012_to_2013', y='TYPE', palette='coolwarm')
plt.title('Change in FBI Incident Counts by Type (2012 to 2013)')
plt.xlabel('Change in Number of Incidents')
plt.ylabel('Incident Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the bar chart showing the change in FBI incident counts by type from 2012 to 2013 because it provides a clear and direct visualization of the magnitude and direction of change for each specific incident type. By plotting the difference in counts, we can easily see which incident types experienced the largest absolute increases or decreases, allowing for a focused analysis on the categories that contributed most to the overall yearly change.

##### 2. What is/are the insight(s) found from the chart?

Based on the bar chart showing the change in FBI incident counts by type from 2012 to 2013, the main insight is that all incident types saw a decrease in the number of reported incidents from 2012 to 2013. While the magnitude of the decrease varies slightly between types, the trend is consistent across the board. This reinforces the earlier observation that the overall reduction in incidents was a broad phenomenon, not limited to just a few categories.
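Absolute changes favor high-volume types: a 10% drop in a large category can outweigh a 40% drop in a small one. Adding the relative change, measured against the 2012 value, shows which types fell fastest. The sketch below uses a hypothetical pivot ('pivot_demo') laid out like yearly_type_pivot.

```python
import pandas as pd

# Hypothetical 2012 and 2013 totals per type (same layout as yearly_type_pivot)
pivot_demo = pd.DataFrame(
    {2012: [1000, 100], 2013: [900, 60]},
    index=pd.Index(["Other Theft", "Theft of Bicycle"], name="TYPE"),
)

pivot_demo["abs_change"] = pivot_demo[2013] - pivot_demo[2012]
pivot_demo["pct_change"] = pivot_demo["abs_change"] / pivot_demo[2012]
print(pivot_demo.sort_values("pct_change"))
```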

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The observation that all incident types decreased in 2013 compared to 2012 is a very strong positive business insight, indicating a broad "negative growth" in crime. This supports the potential effectiveness of general crime reduction efforts, but understanding the specific causes of this widespread decrease is essential for sustained positive impact.

#### Chart - 13

In [None]:
# Chart - 13 visualization code

# Pivot the dataframe to have months as index, years as columns and incident counts as values
monthly_yearly_pivot = df_analysis.pivot_table(index='MONTH', columns='YEAR', values='Incident_Counts', aggfunc='sum')

# Create a heatmap of monthly incident counts by year
plt.figure(figsize=(10, 7))
sns.heatmap(monthly_yearly_pivot, annot=True, fmt='g', cmap='viridis', linewidths=.5)
plt.title('FBI Incident Counts by Month and Year')
plt.xlabel('Year')
plt.ylabel('Month')
plt.yticks(ticks=np.arange(0.5, 12.5), labels=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose the heatmap to visualize the FBI incident counts by month and year because it provides a compact and color-coded representation of the incident volume across all month-year combinations in the dataset. The intensity of the color in each cell immediately shows the relative number of incidents for that specific month and year, making it easy to spot any month-year combinations that stand out or to see if there are any visual patterns in the distribution across the grid.

##### 2. What is/are the insight(s) found from the chart?

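The heatmap provides a consolidated view that supports the conclusions drawn from the separate yearly and monthly total plots: overall incident volume is lower in 2013, and no single month stands out consistently, indicating weak total monthly seasonality.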

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

The heatmap serves as a visual confirmation of the key trends observed in earlier charts: a lower overall incident volume in 2013 and a lack of strong total monthly seasonality. This reinforces the positive business implications for evaluating broad crime reduction efforts and for stable high-level resource planning, while still highlighting the need to understand the drivers of the 2013 reduction and to examine seasonality within specific crime types.
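One way to put a number on the "lack of strong total monthly seasonality" is a seasonal index: each month's average count across years divided by the mean of those monthly averages. The sketch below builds a toy table shaped like `monthly_yearly_pivot` from seeded random counts (not real data); applied to the actual pivot, values close to 1.0 for every month would support the visual reading of the heatmap.

```python
import numpy as np
import pandas as pd

# Toy month x year table shaped like monthly_yearly_pivot (illustrative counts only).
rng = np.random.default_rng(0)
pivot = pd.DataFrame(
    rng.integers(2900, 3100, size=(12, 2)),
    index=pd.Index(range(1, 13), name="MONTH"),
    columns=pd.Index([2012, 2013], name="YEAR"),
)

# Average count for each month across the available years.
month_means = pivot.mean(axis=1)

# Seasonal index: each month's average relative to the mean of all monthly averages.
# Values near 1.0 for every month indicate weak total monthly seasonality.
seasonal_index = month_means / month_means.mean()

# Coefficient of variation of the monthly averages as a single summary of seasonal spread.
cv = month_means.std() / month_means.mean()

print(seasonal_index.round(3))
print(f"CV of monthly averages: {cv:.3f}")
```

The same calculation could be repeated on a per-type pivot to check for seasonality that is hidden in the overall totals.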

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

# Select only the numerical columns for correlation analysis
numerical_df = df_analysis[['YEAR', 'MONTH', 'Incident_Counts']]

# Calculate the correlation matrix
correlation_matrix = numerical_df.corr()

# Create a heatmap of the correlation matrix
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

I chose the correlation heatmap to visualize the relationships between the numerical variables ('YEAR', 'MONTH', and 'Incident_Counts') because it provides a quick and clear overview of how strongly each pair of variables is linearly related. The color intensity and the annotation in each cell show the correlation coefficient, making it easy to identify positive or negative correlations and their strength. This helps in understanding potential linear dependencies between these numerical features.

##### 2. What is/are the insight(s) found from the chart?

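The heatmap confirms the lack of strong linear relationships between incident counts and the temporal variables (YEAR and MONTH). Any correlation between YEAR and MONTH themselves reflects how the time columns cover the dataset (for example, uneven month coverage across years) rather than a meaningful relationship, so it should be read as a structural artifact of the data's time representation.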
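`DataFrame.corr()` computes Pearson correlation by default, which only captures linear association. Because MONTH is cyclical, a low correlation with `Incident_Counts` does not by itself rule out seasonality. The hypothetical example below uses synthetic counts that peak mid-year (not FBI data): the Pearson r is essentially zero, yet a least-squares fit on sine/cosine encodings of the month explains the pattern almost perfectly.

```python
import numpy as np

# Hypothetical monthly counts with a clear mid-year peak, symmetric around June/July.
months = np.arange(1, 13)
counts = 1000 + 200 * np.cos(2 * np.pi * (months - 6.5) / 12)

# Pearson correlation between MONTH and counts (what .corr() reports by default).
pearson_r = np.corrcoef(months, counts)[0, 1]

# Cyclical encoding of MONTH: fit counts ~ a + b*sin + c*cos by least squares.
X = np.column_stack([
    np.ones_like(months, dtype=float),
    np.sin(2 * np.pi * months / 12),
    np.cos(2 * np.pi * months / 12),
])
coef, *_ = np.linalg.lstsq(X, counts, rcond=None)
fitted = X @ coef

# Share of variance in counts explained by the sine/cosine terms.
r_squared = 1 - np.sum((counts - fitted) ** 2) / np.sum((counts - counts.mean()) ** 2)

print(f"Pearson r(MONTH, counts) = {pearson_r:.3f}")
print(f"R^2 with sin/cos month terms = {r_squared:.3f}")
```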

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

# Create a pair plot of the numerical variables
sns.pairplot(numerical_df)
plt.suptitle('Pair Plot of Numerical Variables', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

I chose the pair plot to visualize the relationships between the numerical variables because it provides a matrix of scatter plots for every pair of numerical variables and histograms for the distribution of each individual variable along the diagonal. This allows for a quick visual assessment of potential relationships, correlations (linear or non-linear), and the distribution of each numerical feature in one comprehensive view.

##### 2. What is/are the insight(s) found from the chart?

The pair plot provides a visual confirmation of the distributions and relationships between the numerical variables, reinforcing the insights about the lower incident counts in 2013 and the lack of strong linear relationships or overall monthly seasonality.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

The analysis provides a solid foundation for understanding the temporal and categorical patterns of FBI incidents in this dataset. The key is to leverage the insight of the broad reduction in 2013 to inform future strategies while using the knowledge of stable trends and proportions to optimize current resource allocation and develop targeted interventions for persistent crime types.
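As one illustrative reading of "using proportions to optimize resource allocation", the sketch below splits a hypothetical pool of patrol units across incident types in proportion to their counts. It uses the largest-remainder method so the integer shares add up exactly to the pool size. The function name `allocate_units`, the unit total, and the counts are assumptions for illustration, not figures or methods from this analysis.

```python
import math


def allocate_units(counts: dict[str, int], total_units: int) -> dict[str, int]:
    """Hypothetical helper: split total_units across incident types in proportion
    to their counts, using the largest-remainder method so the integer
    allocations sum exactly to total_units."""
    total = sum(counts.values())
    # Exact (fractional) proportional share for each type.
    quotas = {k: total_units * v / total for k, v in counts.items()}
    # Start from the whole-number part of each share.
    alloc = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total_units - sum(alloc.values())
    # Hand remaining units to the types with the largest fractional remainders.
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftover]:
        alloc[k] += 1
    return alloc


# Illustrative counts and pool size only.
example = allocate_units({"Theft": 450, "Mischief": 190, "Break and Enter": 100}, total_units=20)
print(example)
```

Proportional splitting is only a starting point; weighting by severity or by forecast rather than historical counts would be a policy decision outside this sketch.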


# **Conclusion**

The data reveals a positive trend of decreasing incident volume in 2013, spread across all crime types, within a context of overall monthly stability and consistent incident type proportions. Future efforts should focus on understanding the drivers of the 2013 reduction and maintaining targeted strategies for the most frequent crime types to sustain and enhance this positive trajectory.

