# **Project Name**    - FBI EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -**  ABINETHRI T


# **Project Summary -**

This project performs an Exploratory Data Analysis (EDA) on a dataset related to FBI crime incidents. The objective is to analyze crime patterns, trends, and distributions using various visualization techniques. The project involves data cleaning, data wrangling, and visualization to gain insights into factors such as crime types, neighborhoods, time of occurrence, and locations with high crime rates. This analysis assists in identifying crime hotspots, understanding seasonal crime patterns, and informing potential strategies for crime prevention and resource allocation. By exploring the data through various visualizations, the project aims to provide valuable insights that can be used by law enforcement agencies and stakeholders to make informed decisions and address crime-related issues.



# **GitHub Link -**

https://github.com/ABI-THAKSHANA/FBI--TIME-SERIES-EDA/tree/main

# **Problem Statement**


Law enforcement agencies and stakeholders face challenges in understanding crime patterns, trends, and hotspots due to limitations in accessing and interpreting crime data. This hinders their ability to effectively allocate resources, develop prevention strategies, and enhance public safety. Traditional methods of analyzing crime data are often insufficient for identifying key insights and informing proactive approaches to addressing crime. Therefore, there is a critical need for a comprehensive and visually driven analysis of FBI crime data that can empower law enforcement agencies and other stakeholders to make informed decisions and mitigate crime effectively. This analysis should identify crime hotspots, understand seasonal crime patterns, and provide insights into potential strategies for crime prevention and resource allocation.



#### **Define Your Business Objective?**

The primary business objective of this project is to conduct a thorough exploratory data analysis of FBI crime incident data to unveil hidden patterns, trends, and geographical areas with high crime rates (hotspots). This comprehensive analysis aims to provide law enforcement agencies and stakeholders with data-driven insights that will be instrumental in shaping proactive crime prevention strategies. By understanding the underlying factors contributing to crime, resources can be allocated more strategically to areas most in need. Furthermore, identifying recurring temporal patterns in crime occurrences will allow for better preparedness and targeted interventions. Ultimately, the project seeks to empower decision-makers with actionable intelligence derived from the data, leading to improved crime reduction efforts, increased public safety, and enhanced community well-being. By fostering a data-informed approach to addressing crime, this project aims to make a significant positive impact on the safety and security of the community.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below and answers the listed questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
import geopandas as gpd
import plotly.express as px

### Dataset Loading

In [None]:
# Load Dataset

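# Load each file once under its own name so the second read does not overwrite the first
TEST = pd.read_csv('/content/Copy of Test (2).csv')
TRAIN = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

# df refers to the training data, as in the cells that follow
df = TRAIN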
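
The guidelines ask for exception handling, and both files are read from hard-coded Colab paths. The following is a minimal sketch of a guarded loader, not the notebook's own method: `load_csv` and `required_columns` are hypothetical names, and the demo writes a tiny throwaway CSV with made-up rows instead of touching the real files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv(path, required_columns=()):
    """Read a CSV, failing early with a clear message (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found: {path}")
    frame = pd.read_csv(path)
    # Report every expected column that is absent, not just the first one
    missing = [col for col in required_columns if col not in frame.columns]
    if missing:
        raise ValueError(f"{path.name} is missing columns: {missing}")
    return frame


# Demo on a throwaway file with made-up rows (not the real dataset)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("YEAR,MONTH,TYPE\n2012,1,Theft\n2012,2,Mischief\n")
    demo = load_csv(demo_path, required_columns=["YEAR", "MONTH", "TYPE"])

print(demo.shape)
```

In the notebook itself, the same helper could wrap the two `pd.read_csv` calls so that a missing upload stops the run with a readable error.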

### Dataset First View

In [None]:
# Dataset First Look

# Load each file once and keep it under its own name
TEST = pd.read_csv('/content/Copy of Test (2).csv')
TRAIN = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

# df refers to the training data in the cells that follow
df = TRAIN

# Print both previews; a bare head() call is only displayed when it is the last line of a cell
print(TRAIN.head())
print(TEST.head())

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count (df is the training data, so it is not printed separately)
print("Test shape: ", TEST.shape)
print("Train shape:", TRAIN.shape)

### Dataset Information

In [None]:
# Dataset Info
# info() prints its report itself and returns None, so it is not wrapped in print()
TEST.info()
TRAIN.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print("Duplicate rows in test: ", TEST.duplicated().sum())
print("Duplicate rows in train:", TRAIN.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print("Missing values in test:")
print(TEST.isnull().sum())

print("Missing values in train:")
print(TRAIN.isnull().sum())

In [None]:
# Visualizing the missing values
# Create a heatmap to visualize missing values
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

The dataset provides insights into FBI crime incidents, encompassing a variety of information such as the date and time of the incident, the type of crime committed, the specific location (including neighborhood and geographic coordinates), and potentially the frequency of incidents. This data is organized in a tabular format, with columns representing different aspects of each crime record. Initial exploration revealed some missing values and duplicate entries, which were addressed through data cleaning steps. The 'Date' column was converted to the appropriate datetime format for temporal analysis, and other data type adjustments were made as needed. Additionally, the datasets were merged to facilitate combined analysis. Overall, the dataset is poised to uncover crime patterns, trends, and hotspots, potentially contributing to crime prevention strategies and resource allocation decisions.
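
The missing-value and duplicate checks described above can be condensed into one small report per dataset. This is a sketch on made-up rows, not the notebook's actual output: `quality_report` is a hypothetical helper, and the sample values (including the neighbourhood names) are invented for illustration.

```python
import numpy as np
import pandas as pd


def quality_report(frame):
    """Per-column missing counts and percentages, plus the duplicate-row count."""
    missing = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": missing,
        "missing_pct": (missing / len(frame) * 100).round(1),
    })
    return report, int(frame.duplicated().sum())


# Made-up sample standing in for the real data
sample = pd.DataFrame({
    "TYPE": ["Theft", "Theft", "Mischief", "Theft"],
    "HOUR": [10.0, 10.0, np.nan, 22.0],
    "NEIGHBOURHOOD": ["Downtown", "Downtown", None, "Riverside"],
})

report, n_duplicates = quality_report(sample)
print(report)
print("Duplicate rows:", n_duplicates)
```

Calling `quality_report(TEST)` and `quality_report(TRAIN)` would give the same summary for the real files.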

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print("Test columns: ", TEST.columns.tolist())
print("Train columns:", TRAIN.columns.tolist())

In [None]:
# Dataset Describe
print("Test summary statistics:")
print(TEST.describe())

print("Train summary statistics:")
print(TRAIN.describe())


### Variables Description

The dataset includes variables that capture various aspects of crime incidents, providing a comprehensive view for analysis. Temporal information is recorded through the 'Date' variable and its components: 'YEAR', 'MONTH', 'DAY', and 'HOUR', allowing for the examination of crime trends over time and the identification of daily or seasonal patterns. The 'TYPE' variable categorizes the nature of the crime, while 'HUNDRED_BLOCK' and 'NEIGHBOURHOOD' offer insights into the geographical distribution of incidents. Precise location data is provided through 'Latitude' and 'Longitude', enabling spatial analysis and the identification of potential crime hotspots. 'Incident_Counts' likely represents the frequency of incidents within specific categories or timeframes. For convenience and further analysis, additional variables like 'Full_Date', 'Hour_of_Day', and 'Location' have been derived from existing data. Together, these variables enable a thorough exploration of crime patterns, contributing factors, and potential areas for intervention.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
print("Unique values per column in test:")
print(TEST.nunique())

print("Unique values per column in train:")
print(TRAIN.nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
import pandas as pd

# Step 1: Load the datasets
test_df = pd.read_csv("/content/Copy of Test (2).csv")
train_df = pd.read_csv("/content/Copy of Train.xlsx - Train.csv")

# Step 2: Data Cleaning

# 2.1 - Check for missing values
print("\nMissing Values in Test Dataset:")
print(test_df.isnull().sum())

print("\nMissing Values in Train Dataset:")
print(train_df.isnull().sum())

# 2.2 - Handle missing values
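# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions warn about because it may act on a temporary copy
test_df['Incident_Counts'] = test_df['Incident_Counts'].fillna(test_df['Incident_Counts'].mean())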
train_df.dropna(inplace=True)

# 2.3 - Check for duplicates and remove them
test_df.drop_duplicates(inplace=True)
train_df.drop_duplicates(inplace=True)

# 2.4 - Convert columns to appropriate data types

train_df['Date'] = pd.to_datetime(train_df['Date'], errors='coerce')


train_df[['YEAR', 'MONTH', 'DAY']] = train_df[['YEAR', 'MONTH', 'DAY']].astype(int)

# Ensure that Latitude and Longitude are floats
train_df['Latitude'] = train_df['Latitude'].astype(float)
train_df['Longitude'] = train_df['Longitude'].astype(float)

# Step 3: Feature Engineering
train_df['Full_Date'] = pd.to_datetime(train_df[['YEAR', 'MONTH', 'DAY']])
train_df['Hour_of_Day'] = train_df['HOUR']
train_df['HUNDRED_BLOCK'] = train_df['HUNDRED_BLOCK'].astype('category')

train_df['Location'] = list(zip(train_df['Latitude'], train_df['Longitude']))

# Step 4: Merging the datasets
merged_df = pd.merge(test_df, train_df, on=['YEAR', 'MONTH', 'TYPE'], how='left')

# Step 5: Check and save the cleaned data
print("\nCleaned Test Dataset:")
print(test_df.head())

print("\nCleaned Train Dataset:")
print(train_df.head())

# Save the cleaned and merged datasets
train_df.to_csv("cleaned_train_dataset.csv", index=False)
test_df.to_csv("cleaned_test_dataset.csv", index=False)
merged_df.to_csv("merged_dataset.csv", index=False)

print("\nCleaned datasets have been saved!")


### What all manipulations have you done and insights you found?

The code performs several data manipulations to prepare the FBI crime incident datasets for analysis. It addresses data quality by handling missing values, either through imputation with the mean or by removing incomplete rows. Duplicate entries are eliminated to prevent distortion of crime patterns. Data types are converted to facilitate analysis, enabling temporal investigations and categorical comparisons. New features are engineered, such as combining date components and geographic coordinates, to enrich the dataset and support deeper exploration. Finally, the test and training datasets are merged, providing a more comprehensive view of crime incidents for potential modeling and prediction. These manipulations collectively contribute to a cleaner, more consistent, and feature-rich dataset, enabling a more insightful analysis of crime patterns, trends, and contributing factors, ultimately informing strategies for crime prevention and resource allocation to improve public safety.
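
One point worth checking after the merge step is its row count. Merging on `YEAR`, `MONTH`, and `TYPE` pairs each test row with every training row that shares the key, so if that key is not unique in the training data the merged table grows accordingly. The sketch below uses made-up frames to compare the row-level merge with an alternative that aggregates first; `monthly_train` and `Train_Records` are hypothetical names, and whether aggregation suits the analysis is a judgment call rather than something the source prescribes.

```python
import pandas as pd

# Made-up frames standing in for test_df and train_df
test_df = pd.DataFrame({
    "YEAR": [2012, 2012],
    "MONTH": [1, 1],
    "TYPE": ["Theft", "Mischief"],
})
train_df = pd.DataFrame({
    "YEAR": [2012, 2012, 2012],
    "MONTH": [1, 1, 1],
    "TYPE": ["Theft", "Theft", "Mischief"],
    "HOUR": [9, 21, 14],
})
keys = ["YEAR", "MONTH", "TYPE"]

# Row-level merge: each test row repeats once per matching training row
row_level = pd.merge(test_df, train_df, on=keys, how="left")

# Alternative: collapse the training data to one row per key, then merge.
# validate raises MergeError if either side still has duplicate keys.
monthly_train = train_df.groupby(keys).size().reset_index(name="Train_Records")
aggregated = pd.merge(test_df, monthly_train, on=keys, how="left", validate="one_to_one")

print(len(test_df), len(row_level), len(aggregated))
```

Comparing `len(merged_df)` with `len(test_df)` in the notebook shows quickly whether this duplication is happening on the real data.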



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))
# discrete=True gives each year its own bar instead of 10 arbitrary bins; kde=False omits the density curve
sns.histplot(TEST['YEAR'], discrete=True, kde=False)
plt.title('Distribution of Years in Test Dataset')
plt.xlabel('Year')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram was chosen for this visualization because it is an effective way to display the distribution of a single numerical variable, in this case the 'YEAR' of the crime incidents in the test dataset. It shows the frequency (count) of records for each year in the dataset, which makes it quick to identify patterns and trends in crime occurrence across years.

##### 2. What is/are the insight(s) found from the chart?

The histogram reveals the distribution of crime incidents across the years present in the test dataset, in particular:

* Which years have the highest and lowest frequencies of crime incidents.
* Whether the overall crime rate is increasing, decreasing, or remaining relatively stable over the years.
* Whether there are any unusual spikes or dips in certain years, which could indicate specific events or trends that need further investigation.
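
These questions can also be answered numerically instead of being read off the bars. A small sketch, assuming made-up `YEAR` values in place of `TEST['YEAR']`; `yearly_counts` and `yoy_change_pct` are illustrative names.

```python
import pandas as pd

# Made-up YEAR values standing in for TEST['YEAR']
years = pd.Series([2011, 2011, 2011, 2012, 2012, 2013, 2013, 2013, 2013], name="YEAR")

# Records per year, in chronological order
yearly_counts = years.value_counts().sort_index()

# Year-over-year change in percent; the first year has no predecessor, so it is NaN
yoy_change_pct = (yearly_counts.pct_change() * 100).round(1)

print("Highest year:", yearly_counts.idxmax())
print("Lowest year:", yearly_counts.idxmin())
print(yoy_change_pct)
```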


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Resource Allocation: By identifying years or periods with higher crime rates, law enforcement agencies can allocate resources (patrols, officers, etc.) more effectively to areas and times where they are most needed.
* Crime Prevention Strategies: Observing trends in crime occurrence over time can help in developing targeted crime prevention strategies. For example, if certain types of crimes are increasing in specific years, preventive measures can be implemented to address those trends.
* Policy Decisions: Insights from the data can inform policy decisions related to crime reduction and public safety initiatives.

Potential Negative Growth/Insights:

* Increased Crime: If the histogram shows an upward trend in crime incidents over the years, it could indicate a negative growth in public safety. This would require further investigation and targeted actions to address the underlying issues driving the increase.
* Limited Data: If the data only covers a limited number of years, the insights might not accurately reflect long-term trends.

Justification:

The insights from this visualization can help law enforcement agencies and stakeholders make data-driven decisions. By understanding crime patterns and trends across years, they can take proactive steps to improve public safety, optimize resource utilization, and implement effective crime prevention strategies. However, negative trends, like an increasing crime rate, need to be carefully addressed to ensure that the insights lead to positive outcomes for the community.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
import matplotlib.pyplot as plt
import pandas as pd

monthly_counts = TEST['MONTH'].value_counts().sort_index()  # Count occurrences and sort by month

plt.figure(figsize=(12, 6))
plt.plot(monthly_counts.index, monthly_counts.values, marker='o')  # Use markers for better visibility
plt.title('Distribution of Months in Test Dataset (Line Plot)')
plt.xlabel('Month')
plt.ylabel('Frequency')
plt.xticks(range(1, 13))
plt.show()

##### 1. Why did you pick the specific chart?

A line plot was chosen to visualize the distribution of crime incidents across months because it effectively displays trends and patterns over a continuous time period. In this case, it shows the frequency of incidents for each month, allowing for easy identification of any seasonal variations or cyclical patterns. The use of markers ('o') enhances the visibility of data points.

##### 2. What is/are the insight(s) found from the chart?

* Monthly Crime Trends: Observe if there are specific months where crime rates tend to be higher or lower. This could reveal seasonal patterns in crime occurrence.
* Peak and Off-Peak Periods: Identify the months with the highest and lowest frequencies of incidents. This information can be crucial for resource allocation and planning.
* Overall Pattern: Determine whether there's a general increasing or decreasing trend in crime incidents over the months.
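
Peak and off-peak months can be extracted directly from the monthly counts. The sketch below uses made-up `MONTH` values in place of `TEST['MONTH']`. It also reindexes to all twelve months, an assumption added here so that a month with no records shows up as zero instead of being skipped on the line plot.

```python
import pandas as pd

# Made-up MONTH values standing in for TEST['MONTH']
months = pd.Series([1, 1, 2, 7, 7, 7, 8, 8, 12], name="MONTH")

# reindex keeps months with no records at zero instead of dropping them
monthly_counts = months.value_counts().reindex(range(1, 13), fill_value=0)

peak_month = monthly_counts.idxmax()
# Several months can tie for the minimum, so collect all of them
quiet_months = monthly_counts[monthly_counts == monthly_counts.min()].index.tolist()

print("Peak month:", peak_month)
print("Quietest months:", quiet_months)
```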


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Proactive Policing: If the chart shows a predictable pattern of higher crime rates in certain months, law enforcement agencies can proactively increase patrols or implement targeted crime prevention measures during those periods.
* Resource Optimization: By understanding the monthly fluctuations in crime, resources can be allocated more efficiently, ensuring that personnel and equipment are deployed where they are most needed.
* Public Awareness: Identifying and communicating seasonal crime trends to the public can help raise awareness and encourage citizens to take necessary precautions during higher-risk periods.

Potential Negative Growth/Insights:

* Unforeseen Spikes: If the chart shows unexpected spikes in crime rates for specific months, it could indicate emerging crime patterns or external factors influencing crime. This might necessitate further investigation and adjustments to crime prevention strategies.
* Data Limitations: Relying solely on monthly trends might not capture short-term fluctuations or specific events that influence crime rates.

Justification:

The insights derived from this line plot can empower law enforcement agencies and stakeholders to make informed decisions about resource allocation, crime prevention strategies, and public safety initiatives. By understanding the monthly distribution of crime, they can take proactive steps to mitigate risks and enhance the safety and well-being of the community. However, it's important to consider potential limitations and use these insights in conjunction with other data sources and analysis for a more comprehensive understanding of crime patterns.



#### Chart - 3

In [None]:
# Chart - 3 visualization code
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 8))

# Calculate crime type frequencies
crime_type_counts = TEST['TYPE'].value_counts()

# Create the pie chart
plt.pie(crime_type_counts.values, labels=crime_type_counts.index, autopct='%1.1f%%', startangle=90)

plt.title('Distribution of Crime Types in Test Dataset (Pie Chart)')
plt.show()




##### 1. Why did you pick the specific chart?

A pie chart was chosen to visualize the distribution of crime types because it shows the proportion of each crime type relative to the total number of crimes. It is easy to understand and gives a clear visual representation of the relative frequencies of the different crime categories.

##### 2. What is/are the insight(s) found from the chart?

* Distribution of Crime Types: The pie chart will show the percentage or proportion of each crime type in the dataset. This helps in understanding which crime types are most prevalent and which are less common.
* Dominant Crime Categories: You can easily identify the major crime categories that contribute the most to the overall crime rate in the area covered by the dataset.
* Comparison of Crime Types: The pie chart allows for a visual comparison of the relative frequencies of different crime types, making it easier to see the differences in their occurrences.
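
The percentages behind the pie chart can be computed explicitly, which also makes it possible to merge very small slices so the chart stays readable. This is a sketch on made-up `TYPE` values; the 15% cutoff and the "Other" label are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Made-up TYPE values standing in for TEST['TYPE']
types = pd.Series(["Theft"] * 6 + ["Mischief"] * 3 + ["Break and Enter"] * 1)

# Share of each crime type in percent, largest first
shares = types.value_counts(normalize=True) * 100

# Hypothetical 15% cutoff: slices below it are merged into one "Other" slice
threshold_pct = 15
major = shares[shares >= threshold_pct]
other_total = shares[shares < threshold_pct].sum()

pie_shares = major.copy()
if other_total > 0:
    pie_shares["Other"] = other_total

print(pie_shares.round(1))
```

`pie_shares.values` and `pie_shares.index` can then be passed to `plt.pie` in the same way as `crime_type_counts` above.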


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Targeted Crime Prevention: By understanding the distribution of crime types, law enforcement agencies can focus their resources and efforts on preventing the most prevalent crimes. This can lead to a more effective allocation of resources and targeted interventions.
* Resource Allocation: Insights into the dominant crime categories can help in allocating resources, such as personnel and equipment, to the areas where they are most needed.
* Public Awareness Campaigns: The pie chart can be used to inform the public about the types of crimes that are most common in their area, raising awareness and encouraging them to take necessary precautions.

Potential Negative Growth/Insights:

* Increase in Specific Crime Types: If the pie chart shows a large proportion of a particular crime type, it could indicate a growing problem that needs to be addressed. This might require adjustments to existing crime prevention strategies or the development of new initiatives.
* Data Bias: It's important to consider potential biases in the data, such as underreporting of certain crime types, which could affect the accuracy of the insights.

Justification:

The pie chart provides valuable information about the distribution of crime types, which can be leveraged by law enforcement agencies, policymakers, and the public to make informed decisions about crime prevention and resource allocation. By understanding the prevalence of different crime categories, stakeholders can work together to create a safer community. However, it's essential to acknowledge potential limitations and biases in the data and to use the insights in conjunction with other data sources and analysis for a more comprehensive understanding of the crime situation.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

g = sns.FacetGrid(TEST, col='TYPE', col_wrap=3, height=4)
def heatmap(data, color, **kwargs):
    heatmap_data = pd.pivot_table(data, values='Incident_Counts',
                                 index='MONTH', columns='YEAR',
                                 aggfunc='sum', fill_value=0)
    sns.heatmap(heatmap_data, annot=True, cmap='viridis', fmt=".0f", **kwargs)

g.map_dataframe(heatmap)
g.set_titles("{col_name}")
g.set_axis_labels("Year", "Month")

plt.suptitle('Incident Counts by Year, Month, and Crime Type', y=1.05)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This visualization uses a FacetGrid with heatmaps. Here's why this combination was chosen:

* FacetGrid: It allows creating multiple small heatmaps, one for each crime type (TYPE column), arranged in a grid. This is helpful for comparing patterns across different crime types easily.
* Heatmap: It is suitable for showing the relationship between two categorical variables (Year and Month) and a numerical variable (Incident_Counts). The color intensity in the heatmap represents the value of Incident_Counts, making it easy to spot trends and variations.

##### 2. What is/are the insight(s) found from the chart?

This visualization helps uncover insights about the distribution of incident counts across years and months for each crime type.

* Seasonal Trends: Identify if certain crime types are more frequent in specific months or years.
* Year-to-Year Changes: See how the incident counts for each crime type have changed over the years.
* Crime Type Comparisons: Compare the patterns and trends of incident counts across different crime types side by side.
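
Each facet is a month-by-year pivot table for one crime type, so the same table can be inspected as numbers. A sketch on made-up rows, showing the table for a single type and the change between two years; the values and the `change` name are illustrative only.

```python
import pandas as pd

# Made-up rows with the same columns the heatmap code uses
sample = pd.DataFrame({
    "YEAR": [2012, 2012, 2013, 2013, 2013],
    "MONTH": [1, 2, 1, 1, 2],
    "TYPE": ["Theft", "Theft", "Theft", "Mischief", "Theft"],
    "Incident_Counts": [10, 12, 8, 3, 15],
})

# The table behind one facet: months as rows, years as columns
theft = sample[sample["TYPE"] == "Theft"]
theft_table = pd.pivot_table(
    theft, values="Incident_Counts", index="MONTH", columns="YEAR",
    aggfunc="sum", fill_value=0,
)

# Year-to-year change per month for this crime type
change = theft_table[2013] - theft_table[2012]

print(theft_table)
print(change)
```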

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Targeted Resource Allocation: By identifying specific time periods (months or years) when certain crime types are more prevalent, law enforcement can allocate resources more strategically.
* Proactive Crime Prevention: Understanding seasonal trends for different crime types allows for the implementation of targeted prevention measures during high-risk periods.
* Improved Public Safety: By addressing crime patterns effectively, public safety can be enhanced.

Potential Negative Growth/Insights:

* Emerging Crime Trends: If the heatmaps show an increase in certain crime types in specific time periods, it could indicate negative growth or emerging crime trends that need to be addressed promptly.
* Data Limitations: The insights are limited by the data available. If the data is incomplete or biased, the conclusions drawn might not be entirely accurate.

Justification:

This visualization provides a comprehensive view of crime patterns across different crime types, years, and months. It enables data-driven decision-making for law enforcement and other stakeholders to improve resource allocation, implement targeted crime prevention strategies, and ultimately enhance public safety. However, it's important to be aware of potential data limitations and interpret the insights cautiously. Negative trends should be investigated further to understand the underlying factors and develop effective solutions.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

# Create the countplot
plt.figure(figsize=(12, 6))
sns.countplot(x='TYPE', data=train_df, order=train_df['TYPE'].value_counts().index)
plt.title('Distribution of Crime Types in Training Dataset')
plt.xlabel('Crime Type')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')
plt.show()

##### 1. Why did you pick the specific chart?

A countplot (which is essentially a bar chart for categorical data) was chosen to visualize the distribution of crime types in the training dataset because it effectively shows the frequency of each crime type. It allows for easy comparison of the prevalence of different types of crimes by displaying the counts as bars with varying heights. The order parameter ensures that the bars are arranged in descending order of frequency, making it easy to identify the most common crime types.

##### 2. What is/are the insight(s) found from the chart?

The countplot will reveal the distribution of different crime types within the training dataset. By examining the chart, one can gain the following insights:

* Frequency of Each Crime Type: One can see how often each crime type occurs in the dataset. The taller the bar, the more frequent that particular crime type.
* Most and Least Common Crimes: Easily identify the most prevalent and least frequent crime types based on the heights of the bars.
* Overall Distribution: Get a sense of the overall distribution of crime types: are there a few dominant crime categories, or is crime spread more evenly across different types?

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Targeted Law Enforcement Efforts: By identifying the most common crime types, law enforcement agencies can focus their resources and strategies on addressing those specific areas. This can lead to more effective crime prevention and reduction efforts.
* Resource Allocation: Understanding the distribution of crime types can help in allocating resources (personnel, equipment, funding) to areas where they are most needed.
* Community Safety Initiatives: Insights from the countplot can inform the development of community safety initiatives and programs targeted at specific crime types.

Potential Negative Growth/Insights:

* Increase in Certain Crime Types: If the countplot shows a significant increase in the frequency of a particular crime type compared to historical data, it could indicate a negative trend and a potential rise in that type of crime. This would require attention and intervention.
* Data Bias: The insights from the countplot might be influenced by data biases, such as underreporting of certain crime types, which could affect the accuracy of the analysis.

Justification:

The insights gained from this countplot can provide valuable information for law enforcement, policymakers, and community organizations to understand crime patterns and make informed decisions about resource allocation, crime prevention strategies, and public safety initiatives. However, it's important to interpret the insights in context, consider potential biases, and use the findings in conjunction with other data sources and analysis to get a more comprehensive understanding of the crime situation.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
import matplotlib.pyplot as plt
import pandas as pd

train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

# Create a cross-tabulation (contingency table)
crime_neighborhood_counts = pd.crosstab(train_df['NEIGHBOURHOOD'], train_df['TYPE'])

# Create the stacked bar chart
crime_neighborhood_counts.plot(kind='bar', stacked=True, figsize=(15, 10))
plt.title('Crime Type Distribution by Neighborhood (Stacked Bar Chart)')
plt.xlabel('Neighborhood')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Crime Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A stacked bar chart was chosen for this visualization because it effectively shows the distribution of different crime types within each neighborhood. By stacking the bars, you can easily compare the total number of crimes in each neighborhood, as well as the relative proportions of each crime type within that neighborhood. This provides a comprehensive view of how crime is distributed across neighborhoods and which crime types are most prevalent in each area.



##### 2. What is/are the insight(s) found from the chart?

* Crime Distribution by Neighborhood: You can compare the total number of crimes (the height of the stacked bars) in different neighborhoods to identify areas with higher or lower crime rates.
* Crime Type Prevalence: Within each neighborhood's bar, you can see the proportion of different crime types represented by the different segments of the stacked bar. This helps in understanding which types of crimes are more common in specific neighborhoods.
* Neighborhood Comparisons: You can easily compare the crime type distribution across different neighborhoods to see if there are any noticeable variations or patterns. For example, some neighborhoods might have a higher proportion of property crimes, while others might have more violent crimes.
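
Stacked counts make totals easy to compare, but proportions are harder to read when bar heights differ. A row-normalized crosstab gives the share of each crime type within a neighborhood directly. A sketch on made-up rows; the neighbourhood names and values are invented.

```python
import pandas as pd

# Made-up rows standing in for train_df
sample = pd.DataFrame({
    "NEIGHBOURHOOD": ["Downtown", "Downtown", "Downtown", "Downtown", "Riverside", "Riverside"],
    "TYPE": ["Theft", "Theft", "Theft", "Mischief", "Theft", "Mischief"],
})

# Raw counts, as used for the stacked bar chart
counts = pd.crosstab(sample["NEIGHBOURHOOD"], sample["TYPE"])

# Share of each crime type within a neighbourhood (each row sums to 1)
shares = pd.crosstab(sample["NEIGHBOURHOOD"], sample["TYPE"], normalize="index")

# Neighbourhoods ranked by total number of records
totals = counts.sum(axis=1).sort_values(ascending=False)

print(shares.round(2))
print(totals)
```

Plotting `shares` with `kind='bar', stacked=True` would draw every bar at the same height, so only the mix of crime types varies between neighborhoods.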

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

* Targeted Policing: Law enforcement agencies can use the insights to focus their efforts and resources on specific neighborhoods and crime types. For example, they might deploy more officers to areas with higher crime rates or implement strategies to address the most prevalent crime types in each neighborhood.
* Community Safety Initiatives: Community organizations and local governments can use the information to develop targeted initiatives and programs to address crime in specific areas. This might include crime prevention programs, neighborhood watch groups, or community outreach efforts.
* Resource Allocation: The insights can help in allocating resources, such as funding for crime prevention programs or social services, to the neighborhoods where they are most needed.

Potential Negative Growth/Insights:

* Stigmatization of Neighborhoods: It's important to be cautious when interpreting the data to avoid unfairly stigmatizing neighborhoods with higher crime rates. There could be underlying socioeconomic factors contributing to crime that need to be addressed.
* Data Limitations: The insights might be influenced by data limitations, such as underreporting of certain crimes in some neighborhoods, which could affect the accuracy of the analysis.

Justification:

This visualization provides valuable information for law enforcement, policymakers, and community stakeholders to understand the distribution of crime types across different neighborhoods. By using these insights, they can work together to develop targeted strategies for crime prevention, resource allocation, and community safety initiatives. However, it's important to be mindful of potential biases and limitations and use the findings responsibly to promote positive change.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

top_n_blocks = train_df['HUNDRED_BLOCK'].value_counts().nlargest(10).index

filtered_df = train_df[train_df['HUNDRED_BLOCK'].isin(top_n_blocks)]

# Create the horizontal bar chart
plt.figure(figsize=(12, 8))
sns.countplot(y='HUNDRED_BLOCK', hue='TYPE', data=filtered_df, order=top_n_blocks)
plt.title('Crime Type Distribution by Top 10 Hundred Blocks')
plt.xlabel('Frequency')
plt.ylabel('Hundred Block')
plt.legend(title='Crime Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart (using sns.countplot with the y parameter) was chosen for this visualization because it effectively shows the distribution of crime types within the top 10 most frequent hundred blocks. By using a horizontal orientation, the hundred block names are more easily readable, especially when there are multiple blocks to compare. The color-coded bars (using hue='TYPE') allow for easy comparison of the prevalence of different crime types within each block.

##### 2. What is/are the insight(s) found from the chart?

This visualization provides insights into the crime type distribution within the top 10 most frequent hundred blocks. By examining the chart, you can observe:

Crime Hotspots: Identify the hundred blocks with the highest overall crime frequencies (the longest bars).
Crime Type Prevalence by Block: See which crime types are most common within each of the top 10 blocks. This helps in understanding if certain blocks have a higher concentration of specific types of crimes.
Comparisons Across Blocks: Easily compare the crime type distributions across different blocks to see if there are variations in the types of crimes that occur in different areas.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Targeted Patrols and Interventions: Law enforcement agencies can use the insights to focus patrols and interventions on specific hundred blocks and crime types. This can lead to a more effective allocation of resources and targeted crime prevention efforts.
Community Engagement: Community organizations and local governments can use the information to engage with residents in specific blocks to address the most prevalent crime types in their areas. This could involve community meetings, awareness campaigns, or crime prevention programs.
Problem Solving: The insights can help in identifying specific crime problems in particular blocks, allowing for the development of targeted solutions. For example, if a block has a high rate of thefts, law enforcement might focus on increasing security measures or public awareness in that area.
Potential Negative Growth/Insights:

Displacement of Crime: While focusing on crime hotspots can be effective, it's important to consider the possibility of crime displacement. This means that reducing crime in one area might lead to an increase in crime in nearby areas. Therefore, it's crucial to have comprehensive crime prevention strategies that consider the broader context.
Data Limitations: The insights are limited to the top 10 most frequent hundred blocks. There might be other areas with significant crime issues that are not captured in this visualization. Therefore, it's essential to use this information in conjunction with other data sources and analysis for a more complete understanding.
Justification:

This visualization provides valuable information for law enforcement, policymakers, and community stakeholders to understand crime patterns within specific areas (hundred blocks). By using these insights, they can work together to develop targeted strategies for crime prevention, resource allocation, and community safety initiatives. However, it's important to be aware of potential limitations and unintended consequences, such as crime displacement, and to use the findings responsibly to promote positive change.
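The limitation noted above (only the top 10 hundred blocks are shown) can be quantified by checking what fraction of all incidents those blocks cover. The following is a minimal sketch on toy data; `toy_df` and the helper `top_n_share` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the training data (the notebook reads the real data from CSV).
toy_df = pd.DataFrame({
    "HUNDRED_BLOCK": ["A", "A", "A", "B", "B", "C", "D", "E"],
})


def top_n_share(df: pd.DataFrame, column: str, n: int) -> float:
    """Return the fraction of rows whose `column` value is among the n most frequent."""
    counts = df[column].value_counts()  # missing values are excluded by default
    return counts.nlargest(n).sum() / counts.sum()


share = top_n_share(toy_df, "HUNDRED_BLOCK", n=2)
print(f"Top 2 blocks cover {share:.1%} of incidents")
```

A low share would suggest that the chart describes only a small part of overall crime activity and should be read alongside neighborhood-level views.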



#### Chart - 8

In [None]:

# Chart - 8 visualization code
!pip install wordcloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd

train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

train_df['Date'] = pd.to_datetime(train_df['Date'], format='%d/%m/%Y', errors='coerce')

train_df['Month'] = train_df['Date'].dt.month_name()

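# Drop rows with an unparseable date or a missing type. After astype(str) they
# would otherwise enter the cloud as the literal word 'nan', and a later
# fillna('') would have no effect.
valid_df = train_df.dropna(subset=['Month', 'TYPE'])
month_type = valid_df['Month'] + ' ' + valid_df['TYPE'].astype(str)

text = ' '.join(month_type.tolist())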
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Crime Type and Month Word Cloud')
plt.show()

##### 1. Why did you pick the specific chart?

A word cloud was chosen for this visualization because it provides a visually appealing and intuitive way to display the frequency of words in a text. Here, each incident contributes its month name and crime type to one long string. WordCloud.generate then splits that string into individual words (dropping common stopwords such as "and" and, by default, also counting frequent two-word collocations), so the size of each term reflects how often that word occurs in the dataset rather than how often a whole month and crime-type combination occurs. This allows for quick identification of the most frequent crime-type words and month names.

##### 2. What is/are the insight(s) found from the chart?

Most Frequent Crime Types: The largest words in the word cloud are the words that occur most often in the crime type labels, which points to the most common crime types in the dataset.
Seasonal Crime Patterns: Large month names indicate months with more recorded incidents overall. Word positions in the cloud are arbitrary, so "Theft" and "December" appearing near each other does not by itself show that theft is more common in December; confirming such a pattern requires counting incidents per month and crime type.
Overall Trends: The word cloud provides a general overview of the most frequent crime-type words and months, which is a starting point for identifying overall patterns and trends in crime occurrence.
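To size whole month and crime-type combinations rather than individual words, one option is to count each phrase first and pass the counts to the word cloud. The sketch below builds such a frequency dictionary with pandas on toy rows (the dates and types are illustrative values, not taken from the dataset). The `wordcloud` library provides `WordCloud.generate_from_frequencies`, which accepts a dictionary of this shape; that call is only mentioned in a comment so the example runs without the library installed.

```python
import pandas as pd

# Toy rows in the same day/month/year layout the notebook parses.
toy_df = pd.DataFrame({
    "Date": ["01/12/2010", "15/12/2010", "03/07/2011", None],
    "TYPE": ["Theft from Vehicle", "Theft from Vehicle", "Mischief", "Mischief"],
})
toy_df["Date"] = pd.to_datetime(toy_df["Date"], format="%d/%m/%Y", errors="coerce")
toy_df["Month"] = toy_df["Date"].dt.month_name()

# Keep only rows where both parts of the phrase are known.
valid = toy_df.dropna(subset=["Month", "TYPE"])
phrases = valid["Month"] + " " + valid["TYPE"]

# Map each "Month TYPE" phrase to the number of incidents it covers.
frequencies = phrases.value_counts().to_dict()
print(frequencies)

# The dictionary could then be rendered with:
#   WordCloud(width=800, height=400).generate_from_frequencies(frequencies)
```

With this approach, "December Theft from Vehicle" appears as a single term whose size reflects how many incidents share that month and type.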


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Resource Allocation: By understanding the most frequent crime types and their seasonal patterns, law enforcement agencies can allocate resources more effectively, such as increasing patrols in areas and during times when specific crimes are more likely to occur.
Targeted Crime Prevention: The insights can help in developing targeted crime prevention strategies. For example, if the word cloud shows a high frequency of "Break and Enter" during summer months, law enforcement might focus on public awareness campaigns about securing homes during that time.
Public Awareness: The word cloud can be used to communicate crime trends to the public, raising awareness and encouraging citizens to take precautions.
Potential Negative Growth/Insights:

Emerging Crime Patterns: If the word cloud shows an increase in the frequency of certain crime types or new combinations of crime types and months, it could indicate emerging crime patterns that need to be addressed.
Data Bias: The insights from the word cloud might be influenced by data biases, such as underreporting of certain crimes, which could affect the accuracy of the analysis.
Justification:

The word cloud provides a visually engaging way to present information about crime types and their seasonal patterns, which can be useful for law enforcement agencies, policymakers, and the public. By understanding these patterns, stakeholders can work together to develop strategies for crime prevention, resource allocation, and public safety initiatives. However, it's important to consider potential limitations and biases and use the insights in conjunction with other data sources and analysis for a more comprehensive understanding of the crime situation.



#### Chart - 9

In [None]:
# Chart - 9 visualization code
!pip install geopandas matplotlib

import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd


# Three hard-coded sample points (not read from the dataset)
data = {
    'Longitude': [-123.0837633, -123.1466105, -123.1937252],
    'Latitude': [49.26980201, 49.22805078, 49.25555918],
    'Incident': ['Theft', 'Break and Enter Residential', 'Mischief']
}

df = pd.DataFrame(data)


# Build point geometries and declare the coordinates as WGS 84 longitude/latitude
geometry = gpd.points_from_xy(df.Longitude, df.Latitude)
geo_df = gpd.GeoDataFrame(df, geometry=geometry, crs='EPSG:4326')


geo_df['X'] = geo_df.geometry.x
geo_df['Y'] = geo_df.geometry.y
print(geo_df)

geo_df.plot(marker='o', color='red', markersize=5, figsize=(6, 6))
plt.title('Incident Locations')
plt.xlabel('Longitude (X)')
plt.ylabel('Latitude (Y)')
plt.show()


##### 1. Why did you pick the specific chart?

This visualization uses a geospatial plot created with geopandas. Here's why this approach was chosen:

Geospatial Data: The chart plots each incident as a point at its geographic coordinates (Longitude and Latitude), which is a natural way to visualize crime incident locations.
Spatial Patterns: Geospatial plots help in identifying spatial patterns and clusters in crime incidents, which might not be evident from tabular data alone.
Contextual Information: The code plots the points on plain longitude/latitude axes without a basemap. Overlaying them on a street or neighborhood map would add contextual information about the areas where incidents are occurring.

##### 2. What is/are the insight(s) found from the chart?

Spatial Distribution of Crime: The code above plots three hard-coded sample points, so it illustrates the technique rather than the dataset. Applied to the full set of incident coordinates, the plot shows how incidents are distributed geographically.
Crime Hotspots: With the full data, you can identify areas with a higher concentration of incidents, which could indicate potential crime hotspots.
Relationship to Geography: With a basemap added, you can observe whether crime incidents are clustered around certain landmarks, neighborhoods, or other geographical features.
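One simple way to turn point locations into hotspot counts is to snap each coordinate to a grid cell and count incidents per cell. The sketch below assumes `Latitude` and `Longitude` columns like those used elsewhere in this notebook; the toy coordinates, the `CELL_SIZE` value, and the helper `grid_counts` are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd

# Toy coordinates; three of the five points fall close together.
toy_df = pd.DataFrame({
    "Latitude": [49.2698, 49.2699, 49.2281, 49.2556, 49.2697],
    "Longitude": [-123.0838, -123.0837, -123.1466, -123.1937, -123.0836],
})

CELL_SIZE = 0.01  # grid cell size in degrees (hypothetical choice)


def grid_counts(df: pd.DataFrame, cell_size: float) -> pd.Series:
    """Count incidents per (lat_cell, lon_cell) grid cell, largest first."""
    cells = pd.DataFrame({
        "lat_cell": np.floor(df["Latitude"] / cell_size).astype(int),
        "lon_cell": np.floor(df["Longitude"] / cell_size).astype(int),
    })
    return cells.groupby(["lat_cell", "lon_cell"]).size().sort_values(ascending=False)


counts = grid_counts(toy_df, CELL_SIZE)
print(counts.head())
```

Cells with the largest counts are candidate hotspots. A smaller cell size gives finer resolution at the cost of noisier counts.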


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Targeted Patrols: Law enforcement agencies can use the insights to focus patrols on areas with high crime concentrations, potentially deterring crime and improving response times.
Resource Allocation: The identification of crime hotspots can help in allocating resources, such as police officers and surveillance equipment, to the areas where they are most needed.
Crime Prevention Strategies: Understanding the spatial distribution of crime can inform the development of targeted crime prevention strategies, such as community outreach programs or environmental design improvements.
Potential Negative Growth/Insights:

Displacement of Crime: Focusing solely on crime hotspots might lead to crime displacement, where criminals shift their activities to nearby areas. This needs to be considered when implementing crime prevention strategies.
Data Bias: The insights from the geospatial plot might be influenced by data biases, such as underreporting of crimes in certain areas, which could affect the accuracy of the analysis.

Justification:

Geospatial visualization of crime data provides valuable insights for law enforcement agencies, policymakers, and community stakeholders to understand crime patterns and make informed decisions about resource allocation, crime prevention strategies, and public safety initiatives. By visualizing crime incidents on a map, they can identify crime hotspots, understand the relationship between crime and geography, and develop targeted interventions to reduce crime and improve community safety. However, it's important to consider potential limitations and biases and use the insights in conjunction with other data sources and analysis for a more comprehensive understanding of the crime situation.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')

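train_df['Date'] = pd.to_datetime(train_df['Date'], format='%d/%m/%Y', errors='coerce')

# The '%d/%m/%Y' dates carry no time of day, so Date.dt.hour and Date.dt.minute
# would be 0 for every row. Take the time from the dataset's own columns instead.
train_df['Hour'] = train_df['HOUR']      # HOUR is the column also used in Chart 15
train_df['Minute'] = train_df['MINUTE']  # assumption: a MINUTE column exists alongside HOUR
train_df['DayOfWeek'] = train_df['Date'].dt.dayofweek

# 1. Hourly Distribution (Histogram)
plt.figure(figsize=(12, 6))
# One bin per hour (edges at 0, 1, ..., 24); rows with a missing hour are dropped
plt.hist(train_df['Hour'].dropna(), bins=range(25), edgecolor='black')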
plt.title('Incident Distribution by Hour of the Day (Histogram)')
plt.xlabel('Hour')
plt.ylabel('Number of Incidents')
plt.show()

# 2. Minute Distribution (Line Plot)
plt.figure(figsize=(12, 6))
minute_counts = train_df['Minute'].value_counts().sort_index()
plt.plot(minute_counts.index, minute_counts.values, marker='o')
plt.title('Incident Distribution by Minute (Line Plot)')
plt.xlabel('Minute')
plt.ylabel('Number of Incidents')
plt.show()

# 3. Day of the Week Distribution (Box Plot)
plt.figure(figsize=(12, 6))
sns.boxplot(x='DayOfWeek', y='Hour', data=train_df)
plt.title('Incident Distribution by Day of the Week (Box Plot)')
plt.xlabel('Day of the Week (0: Monday, 6: Sunday)')
plt.ylabel('Hour of the Day')
plt.show()

##### 1. Why did you pick the specific chart?

Histogram (Hourly Distribution): A histogram is used to show the distribution of incidents across hours of the day. It's effective for visualizing the frequency of incidents within each hour, allowing you to identify peak and off-peak periods.
Line Plot (Minute Distribution): A line plot is used to show the distribution of incidents across minutes within an hour. It helps in identifying any specific minutes when incidents are more likely to occur.
Box Plot (Day of the Week Distribution): A box plot is used to compare the distribution of incident hours across different days of the week. It provides insights into the typical range of incident times for each day and helps identify any significant differences between weekdays and weekends.


##### 2. What is/are the insight(s) found from the chart?

Hourly Crime Patterns: The histogram will reveal the hours of the day when crime incidents are most and least frequent. This can help in identifying peak and off-peak periods for crime.
Minute-Level Trends: The line plot will show if there are any specific minutes within an hour when incidents are more likely to occur. This might reveal patterns related to specific activities or routines.
Day-of-Week Variations: The box plot will help you compare the distribution of incident times across different days of the week. You can observe if there are significant differences in the typical times of incidents on weekdays versus weekends.
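Hourly patterns are only meaningful if the hour values actually carry time-of-day information. The sketch below, on toy values, shows that date strings in day/month/year form parse to midnight, so `.dt.hour` is 0 for every row, and then shows one way to count incidents per hour from an explicit hour column. The `HOUR` column name follows the one used in the pair plot later in this notebook; the toy values are illustrative.

```python
import pandas as pd

# Toy date strings in the same day/month/year layout the notebook parses.
dates = pd.Series(["05/03/2012", "24/12/2015", "24/12/2015"])
parsed = pd.to_datetime(dates, format="%d/%m/%Y", errors="coerce")

# No time of day was supplied, so every parsed timestamp sits at midnight.
hours_from_date = parsed.dt.hour.tolist()
print(hours_from_date)

# With an explicit hour column (toy values here), count incidents per hour and
# reindex so that hours with no incidents still appear with a count of 0.
toy_df = pd.DataFrame({"HOUR": [0, 13, 13, 22]})
hourly_counts = toy_df["HOUR"].value_counts().reindex(range(24), fill_value=0)
print(hourly_counts[hourly_counts > 0])
```

Reindexing over all 24 hours keeps quiet hours visible in a bar or line chart instead of silently omitting them.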


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Resource Scheduling: By understanding the hourly and day-of-week patterns of crime, law enforcement agencies can schedule patrols and allocate resources more effectively, ensuring that officers are present during peak crime hours.
Targeted Crime Prevention: Insights into the temporal distribution of crime can inform the development of targeted crime prevention strategies. For example, if certain types of crimes are more common during late-night hours, preventive measures can be focused on those times.
Public Awareness: The information can be used to raise public awareness about the times when they might be at higher risk of crime, encouraging them to take necessary precautions.
Potential Negative Growth/Insights:

Shifting Crime Patterns: Crime patterns can change over time, so it's important to continuously monitor the data and adjust strategies accordingly. Relying on outdated temporal patterns might lead to ineffective crime prevention efforts.
Data Bias: The insights from the charts might be influenced by data biases, such as underreporting of crimes during certain times, which could affect the accuracy of the analysis.
Justification:

Visualizing the temporal distribution of crime data is essential for understanding when crime is most likely to occur. This information is valuable for law enforcement agencies, policymakers, and community stakeholders to make informed decisions about resource allocation, crime prevention strategies, and public safety initiatives. By analyzing hourly, minute-level, and day-of-week patterns, they can develop targeted interventions to reduce crime and improve community safety. However, it's crucial to be aware of potential limitations and biases and use the insights in conjunction with other data sources and analysis for a more comprehensive understanding of the crime situation.



#### Chart - 11

In [None]:
# Chart - 11 visualization code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

TEST = pd.read_csv('/content/Copy of Test (2).csv')
TRAIN = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')


# 1. Horizontal Bar Chart
plt.figure(figsize=(10, 6))
sns.countplot(y='TYPE', data=TEST, order=TEST['TYPE'].value_counts().index)
plt.title('Distribution of Crime Types in Test Dataset')
plt.xlabel('Frequency')
plt.ylabel('Crime Type')
plt.show()

# 2. Heatmap (showing frequency of each neighborhood)
plt.figure(figsize=(12, 8))
neighborhood_counts = TRAIN['NEIGHBOURHOOD'].value_counts().sort_values(ascending=False)
sns.heatmap(neighborhood_counts.to_frame(), annot=True, cmap='viridis', fmt='d')
plt.title('Neighborhood Frequency in Training Dataset')
plt.xlabel('Frequency')
plt.ylabel('Neighborhood')
plt.show()

##### 1. Why did you pick the specific chart?

Horizontal Bar Chart (Crime Types in TEST dataset): A horizontal bar chart is used to show the distribution of crime types in the TEST dataset. The horizontal orientation makes it easier to read the crime type labels, especially when there are many categories. The bars are ordered by frequency, making it easy to identify the most and least common crime types.
Heatmap (Neighborhood Frequency in TRAIN dataset): A heatmap is used to visualize the frequency of each neighborhood in the TRAIN dataset. It provides a visual representation of the relative frequencies of different neighborhoods, with the viridis colormap running from dark purple (lower frequencies) to bright yellow (higher frequencies). Annotations on the heatmap show the actual frequency values for each neighborhood.


##### 2. What is/are the insight(s) found from the chart?

Crime Type Distribution (TEST dataset): The horizontal bar chart will reveal the distribution of different crime types in the TEST dataset. You can see the frequency of each crime type and identify the most and least common ones.
Neighborhood Frequency (TRAIN dataset): The heatmap will show the relative frequency of each neighborhood in the TRAIN dataset. With the viridis colormap, brighter (yellow) cells indicate neighborhoods with higher frequencies, while darker (purple) cells represent neighborhoods with lower frequencies. This can help identify areas with more recorded incidents or those that require more attention from law enforcement. Note that these are raw counts, not rates adjusted for population.
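Raw incident counts favor larger neighborhoods, so it can help to look at shares and, where population figures are available, at rates. The sketch below uses toy neighborhood names and a hypothetical `population` series (the notebook's dataset is not shown to include population data) to illustrate how a count ranking and a rate ranking can disagree.

```python
import pandas as pd

# Toy incidents: three in "North", one in "South".
toy_df = pd.DataFrame({"NEIGHBOURHOOD": ["North", "North", "North", "South"]})

# Hypothetical resident counts; a real analysis would need an external source.
population = pd.Series({"North": 30000, "South": 5000})

counts = toy_df["NEIGHBOURHOOD"].value_counts()
share = toy_df["NEIGHBOURHOOD"].value_counts(normalize=True)

# Incidents per 1,000 residents; the division aligns on neighborhood name.
rate_per_1000 = counts / population * 1000

print(counts)
print(share)
print(rate_per_1000)
```

Here "North" has the higher count, while "South" has the higher rate per resident, which is the kind of difference a frequency heatmap alone cannot show.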


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Targeted Law Enforcement: Understanding crime type distribution can help law enforcement agencies focus their resources on the most prevalent crimes. This can lead to more effective crime prevention and reduction efforts.
Resource Allocation: Identifying neighborhoods with higher crime frequencies can help allocate resources, such as police patrols and community programs, to the areas where they are most needed.
Community Safety Initiatives: Insights from the charts can inform the development of community safety initiatives targeted at specific crime types or neighborhoods.
Potential Negative Growth/Insights:

Data Bias: The insights from the charts might be influenced by data biases, such as underreporting of certain crimes or uneven data collection across neighborhoods. This can lead to inaccurate conclusions and potentially misdirect resources.
Stigmatization of Neighborhoods: Highlighting neighborhoods with higher crime frequencies might stigmatize those areas and negatively impact property values or community perceptions. It's crucial to use this information responsibly and in conjunction with other factors when making decisions.
Justification:

Visualizing crime type distributions and neighborhood frequencies is essential for understanding crime patterns and allocating resources effectively. This information is valuable for law enforcement agencies, policymakers, and community stakeholders to make informed decisions about crime prevention and public safety initiatives. However, it's crucial to consider potential biases and use the insights in conjunction with other data sources and analysis for a more comprehensive understanding of the crime situation. Additionally, responsible use of the data is important to avoid stigmatizing neighborhoods and to ensure that insights lead to positive change in the community.



#### Chart - 12

In [None]:
# Chart - 12 visualization code
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


TRAIN = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')
TEST = pd.read_csv('/content/Copy of Test (2).csv')

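# Caution: this pairs row i of TRAIN with row i of TEST by index position, not by a
# shared key, so a block and the type beside it may come from unrelated records.
# Where TEST has fewer rows than TRAIN, TYPE is NaN for the remaining rows.
merged_df = pd.DataFrame({'HUNDRED_BLOCK': TRAIN['HUNDRED_BLOCK'], 'TYPE': TEST['TYPE']})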
top_n_blocks = merged_df['HUNDRED_BLOCK'].value_counts().nlargest(10).index
filtered_df = merged_df[merged_df['HUNDRED_BLOCK'].isin(top_n_blocks)]

# Create the visualization (horizontal bar chart)
plt.figure(figsize=(12, 8))
sns.countplot(y='HUNDRED_BLOCK', hue='TYPE', data=filtered_df, order=top_n_blocks)
plt.title('Crime Type Distribution by Top 10 Hundred Blocks (Combined Train & Test)')
plt.xlabel('Frequency')
plt.ylabel('Hundred Block')
plt.legend(title='Crime Type')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

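Combining Datasets: The primary goal is to analyze crime patterns across both datasets. The code builds a new DataFrame that takes HUNDRED_BLOCK from TRAIN and TYPE from TEST. Because the two columns are matched by row position rather than by a shared key, each row pairs a block with a crime type meaningfully only if the two files happen to be row-aligned; otherwise the pairing is arbitrary, and the chart should be read with that caveat.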
Top Hundred Blocks: Focusing on the top 10 most frequent hundred blocks helps to identify crime hotspots or areas with the highest crime activity.
Horizontal Bar Chart: A horizontal bar chart is suitable for this visualization as it allows clear display of the hundred block names on the y-axis and the frequency of different crime types within each block using color-coded bars. This makes it easy to compare crime patterns across the top blocks.
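When two DataFrames are combined by building a new one from a column of each, pandas aligns the columns on the row index. The sketch below, on toy frames with hypothetical values, shows what that alignment does and contrasts it with stacking rows using `pd.concat` on a column both frames share. It illustrates the general behavior and is not a statement about the actual contents of the TRAIN and TEST files.

```python
import pandas as pd

# Toy frames of different lengths; values are illustrative only.
train = pd.DataFrame({
    "HUNDRED_BLOCK": ["1XX MAIN ST", "2XX OAK ST", "1XX MAIN ST"],
    "TYPE": ["Theft", "Mischief", "Theft"],
})
test = pd.DataFrame({"TYPE": ["Mischief", "Theft"]})

# Index-aligned pairing: row i of one frame sits next to row i of the other.
paired = pd.DataFrame({"HUNDRED_BLOCK": train["HUNDRED_BLOCK"], "TYPE": test["TYPE"]})
print(paired)

# Stacking rows on a shared column keeps every record intact instead.
stacked = pd.concat([train[["TYPE"]], test[["TYPE"]]], ignore_index=True)
print(stacked["TYPE"].value_counts())
```

In `paired`, the first block is shown next to a type that came from a different record, and the last row has no type at all. `stacked` supports combined type counts, but block-level analysis still needs a dataset that records both block and type for the same incident.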

##### 2. What is/are the insight(s) found from the chart?

Crime Hotspots (Combined Data): Identify the top 10 hundred blocks with the highest overall crime frequencies across both datasets.
Crime Type Prevalence by Block: See which crime types are most common within each of the top 10 blocks using the combined data. This helps understand if certain blocks have a higher concentration of specific types of crimes.
Comparisons Across Blocks: Easily compare the crime type distributions across different blocks to see if there are variations in the types of crimes that occur in different areas.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Resource Allocation: Law enforcement agencies can use the insights to optimize resource allocation, focusing on the top crime hotspots and addressing the most prevalent crime types in those areas.
Targeted Crime Prevention: Understanding crime patterns from the combined data can inform the development of more targeted and effective crime prevention strategies.
Collaboration: The visualization can facilitate collaboration between different stakeholders, such as law enforcement, community organizations, and local governments, to address crime issues in specific areas.
Potential Negative Growth/Insights:

Data Limitations: The insights are based on the combined data, but limitations in either dataset (e.g., missing data, reporting bias) could affect the accuracy of the analysis.
Displacement of Crime: Focusing on specific hotspots might lead to crime displacement to nearby areas, requiring broader crime prevention strategies.
Justification:

Combining data from multiple sources and visualizing crime patterns by the top hundred blocks can provide valuable insights for decision-making related to crime prevention and resource allocation. By understanding crime hotspots and prevalent crime types in those areas, stakeholders can work together to develop more effective strategies to reduce crime and enhance public safety. However, it's crucial to acknowledge potential data limitations and consider the broader context when implementing crime prevention measures.



#### Chart - 13

In [None]:
# Chart - 13 visualization code
!pip install plotly
import plotly.express as px
import pandas as pd


# Hard-coded illustrative counts (not computed from the dataset)
data = {
    'Category': ['Violent Crime', 'Violent Crime', 'Property Crime', 'Property Crime', 'Property Crime'],
    'Type': ['Assault', 'Robbery', 'Theft', 'Vandalism', 'Break-in'],
    'Count': [150, 50, 200, 100, 75]
}
df = pd.DataFrame(data)

# Create the sunburst chart
fig = px.sunburst(df, path=['Category', 'Type'], values='Count',
                  title='Crime Distribution by Category and Type')
fig.show()



##### 1. Why did you pick the specific chart?

Hierarchical Structure: The sunburst chart effectively shows the relationship between crime categories (e.g., Violent Crime, Property Crime) and their subcategories (e.g., Assault, Robbery, Theft).
Proportions and Comparisons: The size of each segment in the sunburst chart represents its proportion relative to the whole. This allows for easy comparison of the frequencies of different crime types within each category and overall.
Visual Clarity: The circular layout of the sunburst chart provides a clear and visually engaging way to represent hierarchical data.


##### 2. What is/are the insight(s) found from the chart?

Crime Category Distribution: You can see the proportion of crimes belonging to each major category (e.g., Violent Crime vs. Property Crime).
Crime Type Prevalence within Categories: Within each category, you can observe the relative frequencies of different crime types. For example, you can compare the frequency of Assault vs. Robbery within the Violent Crime category.
Overall Crime Patterns: The sunburst chart provides a holistic view of the crime distribution, allowing you to identify the most and least common crime types and their relationships to broader categories.
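The category and type hierarchy can also be derived from incident records instead of being typed in by hand. The sketch below maps each crime type to a broader category and aggregates counts into the shape that `px.sunburst(df, path=[...], values='Count')` expects. The mapping `CATEGORY_BY_TYPE` and the toy records are hypothetical; a real mapping would have to cover the type labels actually present in the dataset.

```python
import pandas as pd

# Toy incident records; each row is one incident.
toy_df = pd.DataFrame({
    "TYPE": ["Theft", "Theft", "Assault", "Vandalism", "Theft", "Robbery"],
})

# Hypothetical mapping from crime type to broader category.
CATEGORY_BY_TYPE = {
    "Assault": "Violent Crime",
    "Robbery": "Violent Crime",
    "Theft": "Property Crime",
    "Vandalism": "Property Crime",
}

# Types missing from the mapping fall into an explicit "Other" category.
toy_df["Category"] = toy_df["TYPE"].map(CATEGORY_BY_TYPE).fillna("Other")

# One row per (Category, TYPE) pair with its incident count.
sunburst_df = (
    toy_df.groupby(["Category", "TYPE"]).size().reset_index(name="Count")
)
print(sunburst_df)
```

The resulting frame could be passed to `px.sunburst` with `path=['Category', 'TYPE']` and `values='Count'`.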

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Potential Positive Business Impact:

Resource Allocation: Law enforcement agencies can use the insights to allocate resources more effectively, focusing on the most prevalent crime categories and types.
Targeted Crime Prevention: The sunburst chart can inform the development of targeted crime prevention strategies, addressing specific crime types within their respective categories.
Public Awareness: The visualization can be used to communicate crime patterns to the public, raising awareness about different types of crimes and their relative frequencies.
Potential Negative Growth/Insights:

Data Limitations: The insights are dependent on the quality and completeness of the data used to create the chart. Inaccurate or incomplete data could lead to misleading conclusions.
Oversimplification: While the sunburst chart provides a good overview, it might oversimplify complex crime patterns. It's important to use it in conjunction with other analytical tools for a more comprehensive understanding.

Justification:

The sunburst chart is a valuable tool for visualizing hierarchical crime data, providing insights into crime categories, type prevalence, and overall patterns. This information is useful for law enforcement agencies, policymakers, and community stakeholders to make informed decisions about resource allocation, crime prevention strategies, and public safety initiatives. However, it's important to consider data limitations and use the insights responsibly in conjunction with other analytical methods.



#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

TEST = pd.read_csv('/content/Copy of Test (2).csv')

heatmap_data = pd.pivot_table(TEST, values='Incident_Counts',
                             index='MONTH', columns='YEAR',
                             aggfunc='sum', fill_value=0)

# Create the clustermap. sns.clustermap builds its own figure, so the size is
# passed via figsize and the title is set on the returned grid's figure.
g = sns.clustermap(heatmap_data, annot=True, cmap='YlGnBu', fmt=".0f",
                   linewidths=.5, annot_kws={"size": 12},
                   row_cluster=True, col_cluster=True, figsize=(12, 8))
g.fig.suptitle('Incident Counts by Year and Month (Test Dataset)', fontsize=16, y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A clustermap (a heatmap with hierarchical clustering) was chosen for this visualization because it is effective for showing how a numerical variable (Incident_Counts) varies across two categorical variables (Year and Month). Although this section is titled "Correlation Heatmap", the cells show summed incident counts, not correlation coefficients. Here's why it's a good choice:

Color Encoding: The heatmap uses color intensity to represent the value of Incident_Counts for each combination of Year and Month. This helps in identifying patterns across these variables.
Clustering: The clustermap goes a step further by applying hierarchical clustering to both rows (Months) and columns (Years). This groups similar months and years together based on their Incident_Counts, revealing potential temporal trends and relationships.
Annotations: The annot=True argument adds the actual Incident_Counts values to each cell of the heatmap, making it easier to interpret the data.

##### 2. What is/are the insight(s) found from the chart?

Temporal Patterns: Identify periods (months or years) with higher or lower Incident_Counts.
Seasonal Trends: Observe if there are any recurring patterns in Incident_Counts across different months or years.
Similar Profiles: See whether specific months behave similarly across years in terms of Incident_Counts. For example, if certain months consistently have higher Incident_Counts across multiple years, it might indicate a seasonal trend.
Clusters: The clustering of rows and columns will group similar months and years together, revealing potential temporal relationships.
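If an actual correlation view is wanted, the month-by-year table can be passed through `DataFrame.corr()` to measure how similar the monthly profiles of different years are. The sketch below uses toy counts with the column names from the code above (`YEAR`, `MONTH`, `Incident_Counts`); the numbers are illustrative only.

```python
import pandas as pd

# Toy monthly counts for two years.
toy = pd.DataFrame({
    "YEAR": [2011, 2011, 2011, 2012, 2012, 2012],
    "MONTH": [1, 2, 3, 1, 2, 3],
    "Incident_Counts": [10, 20, 30, 12, 22, 35],
})

# Rows are months, columns are years, cells are summed incident counts.
table = pd.pivot_table(toy, values="Incident_Counts",
                       index="MONTH", columns="YEAR",
                       aggfunc="sum", fill_value=0)

# Pearson correlation between the yearly columns: values near 1 mean the two
# years rise and fall in the same months.
year_corr = table.corr()
print(table)
print(year_corr.round(3))
```

The `year_corr` matrix could then be drawn with `sns.heatmap(year_corr, annot=True)` to obtain a correlation heatmap in the usual sense.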

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


train_df = pd.read_csv('/content/Copy of Train.xlsx - Train.csv')
numerical_cols = ['YEAR', 'MONTH', 'DAY', 'HOUR', 'Latitude', 'Longitude']

# Create the pair plot
sns.pairplot(train_df[numerical_cols])
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot was chosen for this visualization because it is effective for exploring relationships between multiple numerical variables in a dataset. Here's why it's a good choice:

Multivariate Analysis: Pair plots allow you to visualize the relationships between all pairs of numerical variables in your dataset simultaneously.
Scatter Plots and Histograms: Each pair of variables is represented by a scatter plot, showing the relationship between them. The diagonal of the pair plot shows histograms of individual variables, providing information about their distributions.
Identifying Patterns: Pair plots can help identify patterns, correlations, clusters, and outliers in your data, which can guide further analysis.

##### 2. What is/are the insight(s) found from the chart?

Correlations: Observe the scatter plots to see if there are any linear or non-linear relationships between pairs of variables. Positive correlations will show an upward trend, negative correlations a downward trend, and no correlation will appear random.
Distributions: Examine the histograms on the diagonal to understand the distribution of each individual variable. Look for skewness, outliers, or other patterns in the data.
Clusters: The scatter plots might reveal clusters of data points, indicating groups of observations with similar characteristics.
Outliers: Identify any data points that are significantly different from the rest, which could be outliers or errors in the data.
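
The visual checks above can be backed by numbers. The following is a minimal sketch on synthetic data: the column names mirror those used in the pair plot, but the values, the helper `iqr_outliers`, and the 1.5 × IQR threshold are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (values are invented for illustration).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "HOUR": rng.integers(0, 24, size=n),
    "Latitude": rng.normal(49.25, 0.02, size=n),
})
# Longitude is built to move with Latitude so that a correlation exists.
df["Longitude"] = -123.1 + 0.5 * (df["Latitude"] - 49.25) + rng.normal(0, 0.005, size=n)
# Inject one obviously wrong coordinate to act as an outlier.
df.loc[0, "Latitude"] = 0.0


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: flag values more than k * IQR outside the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


# Outliers: number of flagged values per column.
outlier_counts = df.apply(iqr_outliers).sum()

# Distributions: skewness of each column (about 0 for a symmetric distribution).
skewness = df.skew()

# Correlations: Pearson matrix, computed after dropping flagged Latitude rows
# so that a single bad coordinate does not dominate the coefficient.
clean = df[~iqr_outliers(df["Latitude"])]
corr = clean.corr()

print(outlier_counts)
print(skewness)
print(corr.round(2))
```

Pearson correlation only captures linear relationships, so the scatter plots remain the better tool for spotting non-linear patterns and clusters.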


## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

This analysis supports the business objectives of crime reduction and public safety. Geospatial and temporal visualizations identify crime hotspots, so law enforcement can focus patrols and resources on high-risk areas and times. Crime type distributions across neighborhoods and hundred blocks support prevention initiatives tailored to specific locations. Temporal patterns help anticipate future crime trends so that resources can be deployed proactively. Together, these data-driven insights give decision-makers actionable intelligence and support a shift from reactive to proactive approaches. Ultimately, the project equips law enforcement and stakeholders to optimize resource allocation, implement effective crime prevention strategies, and enhance community safety and well-being through data-informed decisions and collaborative efforts.

# **Conclusion**

In conclusion, this exploratory data analysis of FBI crime incident data has provided valuable insights into crime patterns, trends, and hotspots. By leveraging visualizations and statistical analysis, we have identified key areas requiring attention and resources, as well as recurring temporal patterns that can be used for proactive crime prevention. This data-driven approach empowers law enforcement agencies and stakeholders to make informed decisions regarding resource allocation, targeted interventions, and community safety initiatives. While acknowledging potential data limitations, the insights gained from this project offer a solid foundation for developing effective strategies to reduce crime, enhance public safety, and foster a more secure environment for the community. Continuous monitoring, adaptation of strategies, and collaborative efforts will be essential to ensure the long-term success of these initiatives.


