<a href="https://colab.research.google.com/github/alexwells-22/storytelling-with-data/blob/master/data-stories/vt-traffic-stops/vt-traffic-stops.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

Run this code so that the rest of the notebook works!

In [None]:
import numpy as np
import pandas as pd

import plotly # pip: plotly==4.14.3
import plotly.express as px
import plotly.graph_objects as go

# Project team

Vermont 2020 Traffic Stops is a data story created by Alex Wells (alexwells-22 on GitHub).

# Background and overview

I was pulled over for the first time in my life last December, so I thought it would be interesting to examine who gets pulled over the most (that is, who is overpoliced). Men tend to be considered riskier drivers than women, and I wanted to back that assumption up with data. I also wanted to see what the age and race distributions could say about policing patterns.

The data story can be viewed on [Panopto](https://dartmouth.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0015daa9-93cb-4363-84ec-ae5a0018951a).

Other information used in telling this data story can be found at https://governor.vermont.gov/covid19response

# Approach

This data story uses bar charts and scatter plots to show who is pulled over the most and when those stops happen.

# Quick summary

The analysis generally confirmed my initial assumptions about who gets pulled over. It also suggests that the number of traffic stops dropped while COVID lockdown measures were in place.



# Data

Data for this story can be downloaded directly from the [VT Department of Public Safety website](https://vsp.vermont.gov/communityaffairs/trafficstops), or as a ready-to-use version from [GitHub](https://github.com/alexwells-22/storytelling-with-data/blob/master/data-stories/vt-traffic-stops/2020Trafficstops.xlsx?raw=true).

In [None]:
data = pd.read_excel('https://github.com/alexwells-22/storytelling-with-data/blob/master/data-stories/vt-traffic-stops/2020Trafficstops.xlsx?raw=true')

In [None]:
# vt_race = pd.read_html('https://datausa.io/api/data?Geography=04000US50&drilldowns=Race,Ethnicity&measures=Hispanic%20Population,Hispanic%20Population%20Moe')
# vt_race.head()

# Analysis

First, check which values appear in the Gender column, then remove incomplete rows and blank gender entries before plotting the distribution of stops by gender and by race.

In [None]:
np.unique(data['Gender'])

array(['   ', 'F  ', 'M  ', 'U  ', 'X  '], dtype=object)
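The output above shows that the gender codes are padded with trailing spaces and that some entries are blank. Here is a minimal sketch of one way to normalize codes like these. It uses a small made-up Series rather than the real data, and the helper name `clean_codes` is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_codes(codes: pd.Series) -> pd.Series:
    """Strip padding from string codes and mark blank entries as missing."""
    stripped = codes.str.strip()
    # After stripping, blank entries are empty strings; treat them as missing
    return stripped.replace("", np.nan)


# Made-up sample that mimics the padded values seen in the Gender column
sample = pd.Series(["F  ", "M  ", "   ", "X  ", "U  "])
cleaned = clean_codes(sample)
print(cleaned.dropna().tolist())
```

Cleaning the codes this way would let later steps filter with a plain `dropna()` and compare against "F" or "M" without worrying about the padding.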

In [None]:
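# Drop rows with missing values without modifying `data` itself
Gender_data = data.dropna()
# Keep only rows whose Gender code is F, M, U, or X (this removes blank entries)
Gender_data = Gender_data[Gender_data['Gender'].str.contains('F|M|U|X')]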
np.unique(Gender_data['Gender'])

array(['F  ', 'M  ', 'U  ', 'X  '], dtype=object)

In [None]:
Gender_data.head()

Unnamed: 0,Profile Number,Agency Involved,Agency Name,Stop Date and Time,Street Address of Stop,City of Stop,Reason for Stop 1,Reason for Stop 2,Reason for Stop 3,Reason for Stop 4,...,Outcome3,Outcome4,Search Outcome 1,Search Outcome 2,Search Outcome 3,Search Outcome 4,Gender,Driver Race,Driver Description,Age Range
0,803548.0,SPB1,WESTMINSTER VSP,2020-01-01 00:01:00,VT ROUTE 131 & US ROUTE 5,WET,V,,,,...,,,X,,,,M,W,White,34.0
1,H713829,SPB4,RUTLAND VSP,2020-01-01 00:08:00,NORTH MAIN STREET,RUC,M,,,,...,,,X,,,,F,W,White,43.0
2,808440.0,SPB2,ROYALTON VSP,2020-01-01 00:10:00,North Rd & VT Rt 107,BET,M,,,,...,,,X,,,,F,W,White,28.0
3,801885.0,SPA4,ST JOHNSBURY VSP,2020-01-01 00:14:00,College Rd; Lower Campus Rd,LYN,E,,,,...,,,X,,,,M,W,White,18.0
4,H714298,SPA3,MIDDLESEX VSP,2020-01-01 00:15:00,ROUTE 107 / I89,ROY,V,,,,...,,,X,,,,M,W,White,24.0


In [None]:
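# `labels` renames columns (axis titles), not category values, so map the codes to names first
gender_names = {"F": "Female", "M": "Male", "X": "Other", "U": "Unknown"}
plot_data = Gender_data.assign(Gender=Gender_data['Gender'].str.strip().map(gender_names))
fig = px.histogram(plot_data, x='Gender')

fig.show()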

In [None]:
fig = px.histogram(data, x='Driver Race')

fig

Now, on to explore how age is involved in traffic stops!

In [None]:
# Age dataframe abbreviated "ad"; dropna() returns a new dataframe, so `data` is left unchanged
ad = data.dropna()

In [None]:
fig = px.scatter(ad, x='Stop Date and Time', y = "Age Range")
fig
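A scatter plot of individual ages can be hard to read, so grouping ages into bins is one way to summarize them. This is a minimal sketch with made-up ages. The bin edges and labels are assumptions chosen for illustration, not categories from the Vermont data.

```python
import pandas as pd

# Made-up ages for illustration only
ages = pd.Series([18, 24, 28, 34, 43, 67])

# Hypothetical bin edges; right=False makes each bin include its left edge
bins = [16, 25, 35, 50, 65, 120]
labels = ["16-24", "25-34", "35-49", "50-64", "65+"]
age_groups = pd.cut(ages, bins=bins, labels=labels, right=False)

# Number of drivers per age group, kept in bin order
counts = age_groups.value_counts(sort=False)
print(counts)
```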

And now, the time of day to look out for!

In [None]:
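# Time-of-day dataframe abbreviated "td"; copy so that `data` is not modified
td = data.copy()
td['Date_time'] = pd.to_datetime(td['Stop Date and Time'])
# Seconds after midnight: (hours * 60 + minutes) * 60 + seconds
times = (60*td.Date_time.dt.hour + td.Date_time.dt.minute) * 60 + td.Date_time.dt.second
td['Times'] = times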

fig = px.scatter(td, x="Stop Date and Time", y='Times', labels={'Stop Date and Time':"Date", 'Times':'Time of Day (seconds after midnight)'})
fig.update_yaxes(autorange="reversed")

fig
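Seconds after midnight are (hours × 60 + minutes) × 60 + seconds. This is a minimal sketch of that conversion on two made-up timestamps, not rows from the real data.

```python
import pandas as pd

# Made-up timestamps for illustration only
stamps = pd.Series(pd.to_datetime(["2020-01-01 00:01:00", "2020-03-15 13:30:15"]))

# (hours * 60 + minutes) * 60 + seconds
seconds = (stamps.dt.hour * 60 + stamps.dt.minute) * 60 + stamps.dt.second
print(seconds.tolist())  # [60, 48615]
```

A full day is 86,400 seconds, so every value should fall between 0 and 86,399. That range makes a quick sanity check for the y-axis of the plot.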

In [None]:
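# Time-of-day dataframe abbreviated "td"; copy so that `data` is not modified
td = data.copy()
td['Date_time'] = pd.to_datetime(td['Stop Date and Time'])
# Seconds after midnight: (hours * 60 + minutes) * 60 + seconds
times = (60*td.Date_time.dt.hour + td.Date_time.dt.minute) * 60 + td.Date_time.dt.second
td['Times'] = times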

fig = px.scatter(td, x="Stop Date and Time", y='Times', labels={'Stop Date and Time':"Date", 'Times':'Time of Day (seconds after midnight)'}, animation_frame=td.Date_time.dt.month)
fig.update_yaxes(autorange="reversed")

# Commented out code is for future exploration that would show an animation by month. Code used from stack overflow: https://stackoverflow.com/questions/69584171/is-there-a-way-to-dynamically-change-a-plotly-animation-axis-scale-per-frame

# go.Figure(
#     data=fig.data,
#     frames=[
#         fr.update(
#             layout={
#                 "xaxis": {"range": [min(fr.data[0].x) - 0.1, max(fr.data[0].x) + 0.1]},
#                 "yaxis": {"range": [min(fr.data[0].y) - 0.1, max(fr.data[0].y) + 0.1]},
#             }
#         )
#         for fr in fig.frames
#     ],
#     layout=fig.layout,
# )

fig

In [None]:
fig = px.histogram(td, x=td.Date_time.dt.month, labels={'x':'Month'})

fig
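The monthly histogram can also be expressed as a table of counts, which makes individual months easier to compare. This is a minimal sketch with made-up stop times, so the numbers say nothing about the real data.

```python
import pandas as pd

# Made-up stop times for illustration only
stops = pd.Series(pd.to_datetime([
    "2020-01-05 08:00", "2020-01-20 17:30", "2020-02-11 09:15",
    "2020-04-02 22:45", "2020-04-18 07:05", "2020-04-30 12:00",
]))

# Count stops per calendar month; reindex so months without stops show as 0
monthly = stops.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
print(monthly.loc[1:4].tolist())  # [2, 1, 0, 3]
```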

# Interpretations and conclusions

This data story did not produce the conclusions that I might have expected. While data exists to make more concrete assertions about the fairness of policing practices in Vermont, the connection between COVID restrictions and the number of traffic stops was beautifully apparent and was worth exploring. This data story has room for expansion in many directions detailed below.

# Future directions

This project could be continued by increasing the temporal resolution of some of the graphs to make the connections between COVID restrictions and traffic stops clearer. Additional tweaks could include color-coding the data points to indicate the gender and/or ethnicity of the drivers stopped. The number of stops could also be correlated with overall COVID case numbers in Vermont, which could in turn be lined up against the dates when restrictions were added and removed.
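One way to get finer temporal resolution is to count stops per week instead of per month. This is a minimal sketch using pandas `resample` on made-up timestamps. The column name matches the dataset, but the values are invented for illustration.

```python
import pandas as pd

# Made-up stop times for illustration only
stops = pd.DataFrame({"Stop Date and Time": pd.to_datetime([
    "2020-03-02 08:00", "2020-03-04 17:30", "2020-03-10 09:15", "2020-03-25 22:45",
])})

# Weekly counts: index by timestamp, then count the rows in each week (weeks end on Sunday)
weekly = stops.set_index("Stop Date and Time").resample("W").size()
print(weekly)
```

The resulting Series is indexed by week-ending date, so it could be plotted as a bar chart and compared against the dates on which restrictions changed.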