# Virginia Beach Incident Report - Data Analysis

### Table of Contents
- 0. Introduction
- 1. Import Libraries
- 2. Set Variables
- 3. Data Clean-Up
- 4. Data Analysis + Visualization
    - 4.1. What top 5 dates have recorded the most incidents? What was the top date?
    - 4.2. Which days/months have the highest number of total police incidents?
    - 4.3. What are the top 5 incidents?
    - 4.4. Which days/months have the highest number of total police incidents for each of the top 5 incidents?
    - 4.5. Top 5 Dates for each of the top five incidents.

### Introduction

Virginia Beach, Virginia is an independent city in the Hampton Roads region of southeastern Virginia. With a population of roughly 450,000, it is the largest city in the Commonwealth of Virginia and the 42nd most populous city in the United States. It has the longest pleasure beach in the world, 28 miles, making Virginia Beach a tourist destination. Even though it is a large city, its population is mainly suburban, with the main "urban" areas being the Oceanfront and Town Center.

Virginia Beach has been consistently ranked as one of the safest big cities. However, like every city, it has crime. Using the "Virginia Beach Police Incident" dataset from the "Virginia Beach Open Data" portal, we will analyze crime in Virginia Beach from 2018 to 2023.

This notebook takes inspiration from [Alaa Mohamedahmed](https://github.com/alaa-mohamedahmed/mtl-crime-data/blob/main/Montreal%20Crime%20Data%20Analysis%20(2015-2021).ipynb). I highly recommend you check out her work.

### 1. Import Libraries

In [None]:
import os
import pandas as pd 
import seaborn as sns 
import numpy as np 
import matplotlib.pyplot as plt 
%matplotlib inline

### 2. Set Variables

In [None]:
# Set Variables
ws = "C:/Desktop/VSCODE/VirginiaBeach/"

# Read in dataset
inCSV_one = os.path.join(ws, "VirginiaBeachIncidentReport.csv")
inCSV_two = os.path.join(ws, "MasterCodeList.csv")

### 3. Data Clean-Up

In [None]:
# Read in datasets
data = pd.read_csv(inCSV_one)
code_list = pd.read_csv(inCSV_two)
data.head()

In [None]:
# Let's look at the list of offense descriptions to see if there are any discrepancies
offense_list = data['Offense_Description'].unique().tolist()
offense_list.sort()
offense_list

The list shows that some 'Offense_Code' values appear with more than one variation of 'Offense_Description'. This can negatively influence the analysis, because counts for the same offense get split across several labels. Let's join the 'MasterCodeList' to the 'VirginiaBeachIncidentReport' dataframe so that each code carries a single description. This can be done with `merge`.

In [None]:
# Merge the code list to the incident report dataframe using the 'Offense_Code' field in both tables
merged_data = data.merge(code_list, on='Offense_Code')
merged_data.head()
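
One thing worth checking: `merge` defaults to an inner join, so any incident whose 'Offense_Code' is missing from the code list would be dropped without notice. The sketch below shows one way to check for that on toy data. The two small dataframes and their values are made up for illustration; only the column names come from this notebook.

```python
import pandas as pd

# Toy stand-ins for the incident table and the master code list (hypothetical values).
incidents = pd.DataFrame({
    "Offense_Code": ["23F", "13B1", "23F", "ZZZ"],
    "Offense_Description": ["LARCENY FROM MV", "SIMPLE ASSAULT",
                            "Larceny, From Motor Vehicle", "UNKNOWN"],
})
codes = pd.DataFrame({
    "Offense_Code": ["23F", "13B1"],
    "Description": ["Larceny, From Motor Vehicle", "Simple Assault"],
})

# how="left" keeps every incident row.
# validate="many_to_one" raises if a code appears more than once in the code list.
# indicator=True adds a "_merge" column that flags rows with no match.
checked = incidents.merge(codes, on="Offense_Code", how="left",
                          validate="many_to_one", indicator=True)

# Codes that exist in the incidents but not in the code list.
unmatched = checked.loc[checked["_merge"] == "left_only", "Offense_Code"].unique().tolist()
print(unmatched)
```

If `unmatched` comes back empty on the real data, the inner join used above loses nothing.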

In [None]:
offense_list = merged_data['Description'].unique().tolist()
offense_list.sort()
offense_list
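
Reading the list by eye works, but the same check can be done in code. This is a minimal sketch, assuming toy data with the same column names as the merged dataframe, that counts how many distinct descriptions each code has before and after the merge.

```python
import pandas as pd

# Toy rows (hypothetical values) with the raw and the cleaned description side by side.
toy = pd.DataFrame({
    "Offense_Code": ["23F", "23F", "13B1", "13B1"],
    "Offense_Description": ["LARCENY FROM MV", "Larceny, From Motor Vehicle",
                            "Simple Assault", "Simple Assault"],
    "Description": ["Larceny, From Motor Vehicle"] * 2 + ["Simple Assault"] * 2,
})

# Number of distinct labels per code in each column.
raw_variants = toy.groupby("Offense_Code")["Offense_Description"].nunique()
clean_variants = toy.groupby("Offense_Code")["Description"].nunique()

# Codes whose raw description is inconsistent.
inconsistent_codes = raw_variants[raw_variants > 1].index.tolist()
print(inconsistent_codes)
print((clean_variants == 1).all())
```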

In [None]:
# Find column data types
merged_data.info()

Notice that "Date_Occurred" and "Date_Found" are stored as objects, even though they are both dates. Keep that in mind for the data clean-up below.

In [None]:
# Check for NaN values
pd.DataFrame(merged_data.isnull().sum())

Personally, I like to leave NaN or null values in place because they can be useful in storytelling.

In [None]:
# Converting datatypes to required format
merged_data['Date_Occurred'] = pd.to_datetime(merged_data['Date_Occurred'])
merged_data['year'], merged_data['month'], merged_data['day_of_week'] = merged_data['Date_Occurred'].dt.year, merged_data['Date_Occurred'].dt.month, merged_data['Date_Occurred'].dt.dayofweek

In [None]:
dmap = {0:'Mon',1:'Tue',2:'Wed',3:'Thu',4:'Fri',5:'Sat',6:'Sun'}
merged_data['day_of_week'] = merged_data['day_of_week'].map(dmap)
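
As an alternative to the manual mapping, pandas can produce the day names directly. The sketch below is only an illustration on toy dates, not a replacement for the cells above. It also uses `errors="coerce"`, which is an assumption on my part about how you might want to handle unparseable dates, and an ordered categorical so that the days sort Monday to Sunday instead of alphabetically.

```python
import pandas as pd

# Toy column (hypothetical values), including one string that is not a date.
toy = pd.DataFrame({"Date_Occurred": ["2021-01-01", "2021-07-03", "not a date"]})

# errors="coerce" turns unparseable strings into NaT instead of raising.
toy["Date_Occurred"] = pd.to_datetime(toy["Date_Occurred"], errors="coerce")

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# day_name() returns full names such as "Friday"; the first three letters
# match the labels used in dmap.
short_names = toy["Date_Occurred"].dt.day_name().str[:3]

# An ordered categorical keeps Mon..Sun order in groupby results and plots.
toy["day_of_week"] = pd.Categorical(short_names, categories=day_order, ordered=True)
print(toy)
```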

### 4. Data Analysis + Visualization

#### 4.1 What top 5 dates have recorded the most incidents? What was the top date?

In [None]:
# Count incidents per date and keep the five busiest dates
top_dates = merged_data['Date_Occurred'].value_counts().nlargest(5)
top_dates

Q. What top 5 dates have recorded the most incidents? What was the top date?

A. January 1st, 2021 has the highest count of police incidents in Virginia Beach since 2018. Four of the top five dates fall on New Year's Day.
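
The New Year's Day pattern can also be checked in code rather than by reading the output. This is a small sketch on made-up dates (the counts are hypothetical, not the real results) that flags which of the top dates fall on January 1st.

```python
import pandas as pd

# Toy incident dates (hypothetical counts per date).
dates = pd.to_datetime(
    ["2019-01-01"] * 3 + ["2020-01-01"] * 4 + ["2021-01-01"] * 5
    + ["2021-07-04"] * 2 + ["2022-03-15"]
)
toy = pd.DataFrame({"Date_Occurred": dates})

# Three busiest dates in the toy data.
top = toy["Date_Occurred"].value_counts().nlargest(3)

# True where a top date is January 1st.
is_new_year = (top.index.month == 1) & (top.index.day == 1)
print(top)
print(int(is_new_year.sum()), "of", len(top), "top dates are New Year's Day")
```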

Let's dig deeper and find out what crimes were committed on January 1st, 2021.

In [None]:
jan012021 = merged_data.loc[merged_data['Date_Occurred'] == '2021-01-01']
top_crimes_jan012021 = jan012021.groupby(['Description', 'Offense_Code']).Date_Occurred.value_counts().nlargest(5)
top_crimes_jan012021

***
#### 4.2 Which days/months have the highest number of total police incidents?

Some people are starting the year off with fighting, but far more are stealing identities.

Let's now create a temporal heat map to visualize how incidents since 2018 are distributed across days of the week and months.

*Q. Which days/months have the highest number of total police incidents?*

In [None]:
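# Helper column of ones so that count() tallies incidents per group
merged_data['num'] = 1
dayMonth = merged_data.groupby(by=['day_of_week', 'month']).count()['num'].unstack()
# Put the rows in calendar order (the default is alphabetical: Fri, Mon, Sat, ...)
dayMonth = dayMonth.reindex(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
plt.figure(figsize=(12,6))
sns.heatmap(dayMonth,cmap='coolwarm')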

plt.title("Temporal Heat Map of Crime in Virginia Beach")
plt.xlabel("Month")
plt.ylabel("Day of the Week")

plt.tight_layout()
plt.show()

Q. Which days/months have the highest number of total police incidents?

A. Saturdays and Fridays in July
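
Reading the hottest cell off a heat map is a judgment call, so it can help to pull it out of the table directly. The sketch below builds the same kind of day-by-month table with `pd.crosstab` on toy data (hypothetical rows, same column names) and looks up the cell with the highest count.

```python
import pandas as pd

# Toy incidents (hypothetical values): one row per incident.
toy = pd.DataFrame({
    "day_of_week": ["Sat", "Sat", "Sat", "Fri", "Fri", "Mon"],
    "month": [7, 7, 7, 7, 7, 1],
})

day_order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Rows are days, columns are months, values are incident counts.
table = pd.crosstab(toy["day_of_week"], toy["month"]).reindex(day_order, fill_value=0)

# unstack() flattens the table into a Series indexed by (month, day),
# so idxmax() returns the label pair of the largest cell.
peak_month, peak_day = table.unstack().idxmax()
print(peak_day, peak_month)
```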

***
#### 4.3 What are the top 5 incidents?

*Q. What are the top 5 incidents?*

In [None]:
top_offense = merged_data.groupby(['Description', 'Offense_Code']).Description.value_counts().nlargest(5)
top_offense

*Q. What are the top 5 incidents?*

*A. It looks like "Larceny, from Motor Vehicle" has the most reports. Don't keep anything valuable in your car.*
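
Raw counts are easier to compare when shown next to their share of all reports. A minimal sketch, using made-up counts rather than the real results:

```python
import pandas as pd

# Toy descriptions (hypothetical counts): 5, 3 and 2 reports.
toy = pd.Series(
    ["Larceny, From Motor Vehicle"] * 5 + ["Simple Assault"] * 3 + ["Hit & Run"] * 2,
    name="Description",
)

counts = toy.value_counts()

# Count and percentage share for each description, largest first.
summary = pd.DataFrame({
    "count": counts,
    "share_pct": (counts / counts.sum() * 100).round(1),
})
print(summary)
```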

***
#### 4.4 Which days/months have the highest number of total police incidents for each of the top 5 incidents?

Let's create a helper function that takes the 'Offense_Code' corresponding to each of the top descriptions and draws a heatmap for each one.

*Q. Which days/months have the highest number of total police incidents for each of the top 5 incidents?*

In [None]:
def heatmap(crime_code, name):
    # Keep only the incidents recorded under the given offense code
    crime_name = merged_data.loc[merged_data['Offense_Code'] == crime_code]
    dayMonth = crime_name.groupby(by=['day_of_week', 'month']).count()['num'].unstack()
    # Put the rows in calendar order instead of alphabetical order
    dayMonth = dayMonth.reindex(['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])
    plt.figure(figsize=(15,6))
    heat_map = sns.heatmap(dayMonth, cmap='coolwarm')
    plt.title(name)
    plt.xlabel("Month")
    plt.ylabel("Day of the Week")
    plt.show()
    # Return the Axes object (not the function itself) so the plot can be reused
    return heat_map

heatmap('23F', 'Temporal HeatMap of Larceny from Motor Vehicle Incidents')
heatmap('13B1', 'Temporal HeatMap of Simple Assault Incidents')
heatmap('90ZC', 'Temporal HeatMap of Hit & Run Incidents')
heatmap('290B', 'Temporal HeatMap of Destruction of Private Property Incidents')
heatmap('13B2', 'Temporal HeatMap of Domestic Simple Assault Incidents')

Q. Which days/months have the highest number of total police incidents for each of the top 5 incidents?

A. 
- Larceny, From Motor Vehicle: Mondays and Sundays in July
- Simple Assault: Saturdays in June and Sundays in August
- Hit & Run: Saturdays in June and Fridays in July
- Destruction of Property, Private Property: Fridays in July
- Simple Domestic Assault: Saturdays in July, but Sundays have consistently high counts throughout the year

Takeaway: Stay home on weekends in July (my birth month)
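
The peak day and month for each offense can also be computed without looking at five separate heat maps. The sketch below is one possible approach on toy data (hypothetical rows, same column names): count incidents per code, day and month, then keep the largest cell for each code. Ties are not handled specially here; the first row after sorting wins.

```python
import pandas as pd

# Toy incidents (hypothetical values) for two offense codes.
toy = pd.DataFrame({
    "Offense_Code": ["23F", "23F", "23F", "13B1", "13B1", "13B1"],
    "day_of_week": ["Mon", "Mon", "Sun", "Sat", "Sat", "Fri"],
    "month": [7, 7, 7, 6, 6, 8],
})

# One row per (code, day, month) combination with its incident count "n".
counts = (toy.groupby(["Offense_Code", "day_of_week", "month"])
             .size()
             .reset_index(name="n"))

# Sort by count and keep the first (largest) row for each code.
peaks = (counts.sort_values("n", ascending=False)
               .drop_duplicates("Offense_Code")
               .set_index("Offense_Code"))
print(peaks)
```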
   

***
#### 4.5 Top 5 Dates for each of the top five incidents

*Q. What are the Top 5 Dates for each of the Top 5 Incidents?*

In [None]:
def top_dates(crime_code, name):
    # Count incidents per date for the given offense code and keep the five busiest dates
    dates_filter = merged_data.loc[merged_data['Offense_Code'] == crime_code]
    dates_top5 = dates_filter['Date_Occurred'].value_counts().nlargest(5)
    # Name the date and count columns explicitly so they are the same across pandas versions
    df = dates_top5.rename_axis('Date_Occurred').reset_index(name='count')
    df = df.sort_values('Date_Occurred')
    sns.barplot(x='Date_Occurred', y='count', data=df, order=df['Date_Occurred'].tolist())
    plt.title(name)
    plt.xlabel("Date")
    plt.ylabel("Count")
    plt.tight_layout()
    plt.show()
    # Return the table of top dates (not the function itself)
    return df


top_dates('23F', 'Larceny from Motor Vehicle Incidents')
top_dates('13B1', 'Simple Assault Incidents')
top_dates('90ZC', 'Hit & Run Incidents')
top_dates('290B', 'Destruction of Private Property Incidents')
top_dates('13B2', 'Domestic Simple Assault Incidents')
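
If you only need the numbers and not the bar charts, the top dates for every offense can be collected in a single table. This is a rough sketch on toy data (hypothetical rows, same column names) that keeps the two busiest dates per code; swapping `head(2)` for `head(5)` would match the plots above.

```python
import pandas as pd

# Toy incidents (hypothetical values) for two offense codes.
toy = pd.DataFrame({
    "Offense_Code": ["23F"] * 4 + ["13B1"] * 3,
    "Date_Occurred": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-01", "2021-07-04",
        "2020-06-06", "2020-06-06", "2020-08-02",
    ]),
})

# Incident count for every (code, date) pair.
per_date = (toy.groupby(["Offense_Code", "Date_Occurred"])
               .size()
               .reset_index(name="count"))

# Sort so the busiest dates come first within each code, then keep the top rows.
top_per_offense = (per_date.sort_values(["Offense_Code", "count"],
                                        ascending=[True, False])
                           .groupby("Offense_Code")
                           .head(2))
print(top_per_offense)
```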