# Montgomery Police Traffic Stops Analysis with Pandas

## This project explores Montgomery County Police traffic stops, analyzing driver gender and race, time of day, and how often each subagency carries out stops.

The dataset contains traffic stops made by Montgomery County police officers. It was obtained from https://data.montgomerycountymd.gov/Public-Safety/Traffic-Violations/4mse-ku6q and covers Montgomery County in the state of Maryland.

### Data Preparation/Cleaning

A good analysis requires the data to be thoroughly examined and cleaned.
A clean dataset is easier to work with.
Data preparation involves importing the dataset, handling missing values,
placeholders and null values, converting columns to the appropriate data types,
and dropping less useful columns.

The dataset covers stops from 01/01/2012 to 12/2/2020.
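
The cleaning steps described above can be sketched on a small, made-up DataFrame. This is only an illustration: the `toy` frame and its values are hypothetical stand-ins for the real records, and the choice to cast `Year` to an integer is an assumption about what an "appropriate" data type would be.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the traffic-stop records
toy = pd.DataFrame({
    "Gender": ["M", "F", "M", "?", "F"],
    "State": ["MD", "VA", "MD", "MD", None],
    "Year": ["2006", "2007", "2006", "2005", "2010"],
})

cleaned = (
    toy.replace("?", np.nan)        # turn the '?' placeholder into a missing value
       .drop_duplicates()           # keep the first copy of each repeated row
       .dropna()                    # drop rows that still have a missing value
       .astype({"Year": "int64"})   # store the model year as an integer
)
print(cleaned)
```

Chaining the steps keeps the original frame untouched, which makes it easy to compare row counts before and after cleaning.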

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)


In [None]:
# Importing the dataset into a DataFrame and naming it df
df = pd.read_csv(r'C:\Users\Ice Asortse\Desktop\Traffic_Violations.csv')

In [None]:
# Examine the dataset
df.head()

In [None]:
# Rename 'Time Of Stop' and 'Violation Type' to shorter names for easier manipulation
df.rename(columns = {'Time Of Stop': 'Time','Violation Type': 'Violation'}, inplace=True)

In [None]:
# Examine the DataFrame after renaming the columns
df.head()

In [None]:
# Examine the shape of the DataFrame
df.shape

The Dataset has 1,048,575 rows and 43 columns

In [None]:
# Examine the DataFrame's column types and non-null counts
df.info()

In [None]:
df.head()

In [None]:
# Examine the missing values
df.isna().any()

In [None]:
# Count the missing values in each column
df.isna().sum()

Looks like we have a lot of missing values in some of the columns

In [None]:
# Checking for duplicates
duplicates = df[df.duplicated()]


In [None]:
print(len(duplicates))

We have 1,593 duplicates

In [None]:
# Dropping the duplicates
df.drop_duplicates(keep='first',inplace=True) 

In [None]:
# Confirm that no duplicates remain
duplicates = df[df.duplicated()]
print(len(duplicates))
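
As a small illustration of the duplicate check, the sketch below uses a made-up frame (the `toy` values are hypothetical). With the default `keep='first'`, `duplicated()` flags only the later copies of a row, so the number of flagged rows equals the number of rows that `drop_duplicates` removes.

```python
import pandas as pd

# Hypothetical rows; the first, second and fourth are identical
toy = pd.DataFrame({
    "Date Of Stop": ["01/02/2019", "01/02/2019", "01/03/2019", "01/02/2019"],
    "SubAgency": ["A", "A", "B", "A"],
})

dup_mask = toy.duplicated()                   # True for every repeat after the first copy
n_dups = int(dup_mask.sum())                  # number of rows that would be dropped
deduped = toy.drop_duplicates(keep="first")   # keep one copy of each distinct row

print(n_dups, len(toy), len(deduped))
```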

In [None]:
# Check for percentage of value counts for entries
for col in df.columns:
    print(col, '\n', df[col].value_counts(normalize=True).head(10), '\n\n')

In [None]:
# Replace placeholders with NaN
df.replace(['?',], np.nan, inplace=True)

In [None]:
# Make a list of the columns that contain missing values
vars_with_na = [var for var in df.columns if df[var].isnull().sum() > 0]

# Print each column name and its percentage of missing values
for var in vars_with_na:
    print(var, np.round(df[var].isnull().mean() * 100, 1), '% missing values')
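
A quick way to sanity-check the missing-value report is to run the same idea on a tiny frame where the answer is obvious. The `toy` frame below is hypothetical. Note that `isnull().mean()` returns a fraction between 0 and 1, so it has to be multiplied by 100 to read as a percentage.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'Color' is missing in two of four rows, 'State' is complete
toy = pd.DataFrame({
    "Color": ["RED", np.nan, "BLUE", np.nan],
    "State": ["MD", "MD", "VA", "DC"],
})

pct_missing = toy.isnull().mean().mul(100).round(1)   # percent missing per column
cols_with_na = pct_missing[pct_missing > 0]           # keep only columns with gaps

print(cols_with_na)
```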

In [None]:
# drop columns that are not useful for the project analysis
drop_column = ['SeqID','Agency','Description','Location','Latitude','Longitude','Accident', 
               'Belts','Personal Injury', 'Property Damage','Fatal','Commercial License',
               'HAZMAT','Commercial Vehicle','Alcohol','Work Zone','Search Conducted',
               'Search Disposition','Search Outcome','Search Reason','Search Reason For Stop',
               'Search Type','Search Arrest Reason','VehicleType','Model','Color','Charge','Article',
               'Contributed To Accident','Arrest Type','Geolocation']
df.drop(drop_column, axis=1, inplace=True)

In [None]:
# Drop rows with missing values; copy so later column assignments do not act on a view of df
data = df.dropna().copy()

In [None]:
# Examine data shape
data.shape

In [None]:
# Examine whether any missing values are left
data.isna().sum()

In [None]:
# Examine whether any null values are left (isnull is an alias of isna)
data.isnull().sum()

### Data Visualization and Analysis

In [None]:
# Examine the shape of the clean dataset
data.shape

The clean dataset has 1,039,614 rows and 12 columns

In [None]:
# Extract the year of each stop from 'Date Of Stop' into a new 'Year Of Stop' column
data['Year Of Stop'] = pd.DatetimeIndex(data['Date Of Stop']).year
data.head()

In [None]:
# Examine the top 15 states with most stops
data['State'].value_counts(normalize=True).head(15).mul(100).to_frame('Percent of stops')

Cars from Maryland make up 87.4% of the stops in Montgomery County.
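
The percentage shares can be checked on a small example. The sketch below uses made-up state codes (not the real proportions) and relies on `value_counts(normalize=True)`, which returns each value's share of the rows, already sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical sample: 7 Maryland, 2 Virginia and 1 DC plate out of 10 stops
states = pd.Series(["MD"] * 7 + ["VA"] * 2 + ["DC"], name="State")

# Share of stops per state, expressed as a percentage
shares = states.value_counts(normalize=True).mul(100).round(1)

print(shares)
```

Because the result is already sorted in descending order, taking `.head(n)` is enough to get the top `n` states.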

In [None]:
# Visualize stops by month
m_count = pd.to_datetime(data['Date Of Stop']).dt.month.value_counts()

plt.figure(figsize=(12,8))
sns.barplot(y=m_count.values, x=m_count.index, alpha=0.6)
plt.title("Number of Stops Each Month", fontsize=16)
plt.xlabel("Month", fontsize=16)
plt.ylabel("Number of stops", fontsize=16)
plt.show()

From our visualization, March has the most stops, followed closely by May and then April.

In [None]:
# examine the year model of the vehicles stopped
data['Year'].value_counts().head(10).to_frame('Stops')

Cars from model year 2006 are stopped the most, followed closely by 2007 and then 2005.

In [None]:
# examine the number of stops based on Race
data['Race'].value_counts().head(10).to_frame('Stops')

In [None]:
# Examine the number of stops based on Gender
data['Gender'].value_counts().head(10).to_frame('Stops')

Male drivers are stopped most often.

In [None]:
# Visuals for the Race and Gender disparity
fig, ax = plt.subplots(1, 2, figsize=(16,8))

fig.subplots_adjust(hspace=0.5)

sns.countplot(x=data['Gender'], ax=ax[0], color='blue')
ax[0].set_title("Gender", fontsize=14)

# Use the cleaned 'data' frame here too, so both panels count the same rows
sns.countplot(x=data['Race'], ax=ax[1], color='salmon')
ax[1].set_title("Race", fontsize=14)

sns.despine()


In [None]:
# Examine the cities with the most stopped drivers, as a percentage of all stops
violation_county = data['Driver City'].value_counts(normalize=True).head(10).mul(100).to_frame('Percent of stops')

violation_county

Silver Spring has the most drivers stopped, followed by Gaithersburg and then Germantown.

In [None]:
#Check the data
data.head()

In [None]:
# Concatenate 'Date Of Stop' and 'Time' (separated by a space)
combined = data['Date Of Stop'].str.cat(data['Time'], sep = ' ')

# Convert 'combined' to datetime format
data['stop_datetime'] = pd.to_datetime(combined)

# Examine the data type of 'stop_datetime'
print(data.stop_datetime.dtype)

data.head()


In [None]:
# Set index to 'stop_datetime'
data.set_index('stop_datetime', inplace=True)

In [None]:
# Check the dataset
data.head()

In [None]:
# print index to make sure 
print(data.index)

In [None]:
# Count the number of stops for each hour of the day
time_of_stops = data.groupby(data.index.hour).Time.count()
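
The date and time handling above can be illustrated end to end on a few made-up rows. This is a sketch under stated assumptions: the `toy` values are hypothetical, and the explicit `format` string assumes dates are written as MM/DD/YYYY and times as HH:MM:SS, which may not match every row of the real file.

```python
import pandas as pd

# Hypothetical stops: two in the 10 PM hour, one in the 8 AM hour
toy = pd.DataFrame({
    "Date Of Stop": ["03/14/2019", "03/14/2019", "05/02/2020"],
    "Time": ["22:15:00", "08:05:00", "22:40:00"],
})

# Join the two text columns, then parse them into a single timestamp
combined = toy["Date Of Stop"].str.cat(toy["Time"], sep=" ")
toy["stop_datetime"] = pd.to_datetime(combined, format="%m/%d/%Y %H:%M:%S")

# Group by the hour component and count the rows in each group
stops_per_hour = toy.groupby(toy["stop_datetime"].dt.hour).size()

print(stops_per_hour)
```

Grouping on `.dt.hour` gives the same result as setting the timestamp as the index and grouping on `index.hour`; it just avoids changing the index for a one-off count.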

In [None]:
# Create a bar plot of 'time_of_stops'
time_of_stops.plot(kind='bar', figsize=(16,8))

# Add the xlabel, ylabel, and title
plt.xlabel('Hour', fontsize=16)
plt.ylabel('Number of Stops', fontsize=16)
plt.title('Stops By the Hour', fontsize =20)

# Display the plot
plt.show()

In [None]:
# Count the occurrences of each value in 'Violation' and print
violations = pd.DataFrame(data.Violation.value_counts())
print(violations)

print('----------------------')

# Express the counts as proportions and print
violation_perct = pd.DataFrame(data.Violation.value_counts(normalize = True))
print(violation_perct)

In [None]:
# plot 'violations'
violations.plot(kind='bar', color='green')

As we can see, most of the stops end with a citation.

In [None]:
# Count the number of stops for each precinct
precints = pd.DataFrame(data.SubAgency.value_counts())
precints

In [None]:
# Express the number of stops for each precinct as a percentage
precints_perct = pd.DataFrame(data.SubAgency.value_counts(normalize = True))*100
precints_perct

In [None]:
# Plot the number of stops per precinct
precints.plot(kind='bar', color='tan', figsize= (8,6))

### Conclusion

The analysis of the Montgomery Police stops data tells us a lot about the department's traffic stops. The main findings are:

* More than 87% of the vehicles stopped were from Maryland, followed by Virginia with 4.6% and DC with 2.4%.

* More than 100,000 stops were made in the months of March and May. The fewest stops were made in December.

* About 370,000 of the drivers stopped were White, followed by Black drivers with about 320,000 and Hispanic drivers with about 230,000.

* More than 710,000 people stopped by the police were male and about 327,000 were female.

* About 24% of the people stopped were from Silver Spring, while 10% were from Gaithersburg, 8.4% from Germantown and 7.8% from Rockville.

* Most stops happen between 10 PM and 11 PM, followed by 8 AM.

* About 69% of the stops resulted in citations, while 29% resulted in warnings.

* About 24% of the stops were made by the 4th District, Wheaton; about 20% by the 3rd District, Silver Spring; and 16% by the 2nd District, Bethesda.
