# Analysis of Crime in Virginia Beach, Virginia

Purpose: Learn different Python tools using the **Virginia Beach Crime** dataset.

Inspired by [Alaa Mohamedahmed](https://github.com/alaa-mohamedahmed/mtl-crime-data/blob/main/Montreal%20Crime%20Data%20Analysis%20(2015-2021).ipynb). I highly recommend checking out her work.

_Original dataset is available from the [Virginia Beach Open Data Portal](https://data.virginiabeach.gov/datasets/67bc708103e746f18e216c32ba39febe_0/about)_


In [None]:
# Import packages

import pandas as pd
import numpy as np
import os
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
pd.options.display.max_rows = None

## Explore the data

In [None]:
# Read in crime dataset
crime = pd.read_csv(r'D:\Projects\Learning\VSCode\Crime\vbcrime.csv') 

# Look at first 5 rows
crime.head()                  

In [None]:
# Find data types
crime.info()

In [None]:
# Check Unique values
crime.groupby('Offense_Description')['Offense_Code'].unique()

As you can see, there are multiple variations of _Offense_Description_ per _Offense_Code_.

For example:
>OVERDOSE - DEATH = 90ZK

>OVERDOSE DEATH = 90ZK

We want to normalize _Offense_Description_ into a single value per code:
>Overdose = 90ZK
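
To list every code that has more than one spelling of its description, one option is to count the distinct descriptions per code. This is a minimal sketch on a toy frame; the rows are illustrative stand-ins, not values read from the real file.

```python
import pandas as pd

# Toy data standing in for the crime table (illustrative rows only)
toy = pd.DataFrame({
    "Offense_Code": ["90ZK", "90ZK", "23C", "23C"],
    "Offense_Description": [
        "OVERDOSE - DEATH",
        "OVERDOSE DEATH",
        "SHOPLIFTING",
        "SHOPLIFTING",
    ],
})

# Count the distinct descriptions recorded for each code
n_descriptions = toy.groupby("Offense_Code")["Offense_Description"].nunique()

# Keep only the codes that have more than one description
inconsistent_codes = n_descriptions[n_descriptions > 1]
print(inconsistent_codes)
```

Running the same two lines on `crime` instead of `toy` should give the list of codes that need normalizing.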

But first, let's perform some basic housekeeping.

***
### Clean up dataset
***

In [None]:
# remove unwanted columns
crime.drop(['IncidentNumber', 'Date_Found', 'Zone_ID', 'Precinct'], axis=1, inplace=True)

In [None]:
# find rows with NaN values (print so the counts show even though this is not the last line of the cell)
print(crime.isnull().sum())

# remove rows with NaN values
crime = crime.dropna()
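
A small sketch of what the two steps above do, using a toy frame. The `Location` column and all values are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value in each column (hypothetical data)
toy = pd.DataFrame({
    "Offense_Code": ["23C", "11A", None],
    "Location": ["A ST", np.nan, "B AVE"],
})

# Number of missing values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# dropna() removes every row that has at least one missing value
cleaned = toy.dropna()
print(len(toy), "rows before,", len(cleaned), "rows after")
```

Because `dropna()` with no arguments drops a row if any column is missing, it can remove more rows than expected; comparing the row counts before and after is a quick sanity check.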

In [None]:
# Convert to Date
crime['Date_Occurred'] = pd.to_datetime(crime['Date_Occurred'])
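
A minimal sketch of the conversion on toy strings, assuming timestamps that look like the ones below (the real column's format may differ). The `errors="coerce"` argument is an optional extra that is not used in the cell above.

```python
import pandas as pd

# Hypothetical date strings, including one that cannot be parsed
toy = pd.DataFrame({
    "Date_Occurred": ["2021-03-15 14:30:00", "2019-12-31 23:59:00", "not a date"]
})

# errors="coerce" turns unparseable entries into NaT instead of raising an error
toy["Date_Occurred"] = pd.to_datetime(toy["Date_Occurred"], errors="coerce")

# Once the column is datetime, the .dt accessor exposes parts such as the year
toy["year"] = toy["Date_Occurred"].dt.year
print(toy)
```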
                                      

***
### Subset crime
***

In the following code, we will:
>1. Create a new dataframe that contains only a few selected _Offense_Code_ values.
>2. Normalize the _Offense_Description_ values.
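
The two steps can be sketched on a toy frame. This is an alternative formulation using `isin` and `map` with a code-to-label dictionary, not the approach taken in the notebook cell; the `wanted` dictionary and the toy rows are assumptions made for the illustration.

```python
import pandas as pd

# Toy data standing in for the crime table (illustrative rows only)
toy = pd.DataFrame({
    "Offense_Code": ["90ZK", "90ZK", "23C", "120A"],
    "Offense_Description": [
        "OVERDOSE - DEATH",
        "OVERDOSE DEATH",
        "SHOPLIFTING",
        "ROBBERY",
    ],
})

# Hypothetical lookup: the codes to keep and the single label each should get
wanted = {"90ZK": "Overdose", "23C": "Shoplifting"}

# Step 1: keep only the selected codes; .copy() gives an independent dataframe
subset = toy[toy["Offense_Code"].isin(list(wanted))].copy()

# Step 2: replace every description with the label for its code
subset["Offense_Description"] = subset["Offense_Code"].map(wanted)
print(subset)
```

Keeping the codes and labels in one dictionary means a new crime type can be added in a single place.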

In [None]:
crime.info()

In [None]:
# Create a new dataframe containing select crimes
year_crime = crime.loc[
    (crime.Offense_Code == '09A')
    | (crime.Offense_Code == '13B2')
    | (crime.Offense_Code == '35A1')
    | (crime.Offense_Code == '11A')
    | (crime.Offense_Code == '90ZK')
    | (crime.Offense_Code == '23C')
    | (crime.Offense_Code == '120A')
    | (crime.Offense_Code == '13B1')
].copy()  # copy so the edits below do not act on a view of crime

# The "|" means "OR"

# Normalize data to ensure Offense Description is the same for each Offense Code.
# mask() returns a new Series, so the result is assigned back to the column.
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '13B2', "Simple Domestic Assault")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '13B1', "Simple Assault")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '35A1', "Drug / Narcotic Violations")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '09A', "Murder")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '11A', "Rape")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '90ZK', "Overdose")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '23C', "Shoplifting")
year_crime['Offense_Description'] = year_crime['Offense_Description'].mask(year_crime['Offense_Code'] == '120A', "Robbery")

# Extract year into new field
year_crime['year'] = year_crime['Date_Occurred'].dt.year

How each normalization line works: `year_crime['Offense_Description'].mask(...)` selects the column to normalize; the condition `year_crime['Offense_Code'] == '13B2'` selects the rows with that _Offense_Code_; and `"Simple Domestic Assault"` is the new value written into _Offense_Description_ for those rows.

In [None]:
year_crime.head()

***
### Plot by Year
***


In [None]:
# Create color palette
custom_colors = ["#ff0000", "#ffa500", "#ffff00", "#22d933", "#004aad", "#8a2be2", "#9c6860", "#261c00"]

# Offense descriptions in display order (must match the normalized values above)
offense_order = ["Simple Assault", "Simple Domestic Assault", "Shoplifting", "Robbery", "Drug / Narcotic Violations", "Overdose", "Rape", "Murder"]

# Graph
plt.figure(figsize=(20, 10))
ax = sns.countplot(data=year_crime, x='year', hue='Offense_Description', palette=custom_colors, hue_order=offense_order)

# Legend: seaborn already labels the entries from hue_order, so only the title is set
ax.get_legend().set_title("Offense Description")

# add labels
plt.xlabel('Year')
plt.ylabel('Count')
plt.title('Distribution of Crime in Virginia Beach (2018-2023)')

# Display
plt.tight_layout()
plt.show()
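
The counts behind a plot like this can also be inspected as a table. A minimal sketch with `pd.crosstab` on toy rows (the values are made up for illustration):

```python
import pandas as pd

# Toy rows standing in for year_crime (illustrative values only)
toy = pd.DataFrame({
    "year": [2018, 2018, 2019, 2019, 2019],
    "Offense_Description": [
        "Shoplifting",
        "Robbery",
        "Shoplifting",
        "Shoplifting",
        "Robbery",
    ],
})

# One row per year, one column per offense, each cell holding the number of records
counts = pd.crosstab(toy["year"], toy["Offense_Description"])
print(counts)
```

Applied to `year_crime`, the same call should give the exact numbers for each bar, which is handy for checking the chart.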