# Analysis of Terry Stops in Seattle, Washington, USA
***

<img src= "images/seattle-pd.jpg" width=700/> 

## Terry Traffic Stops
***

In Terry v. Ohio, a landmark Supreme Court case decided in 1968, the court found that a police officer did not violate the "unreasonable search and seizure" clause of the Fourth Amendment when he stopped and frisked suspects solely because their behavior was suspicious. Thus was born the notion of "reasonable suspicion", under which a police officer may, for example, temporarily detain a person even without the stronger evidence required for a full arrest. **In the traffic context, a Terry Stop is a stop of a driver whose behavior an officer finds suspicious.**

We are looking to predict whether an arrest was made after a Terry Stop, given information about the presence of weapons, the time of day of the call, etc.

## Objectives
***
The objective of this data analysis was to gain perspective on and understanding of the Terry Stops data. Using this data, I was able to predict arrests made after Terry Stops using binary classification. This analysis can give the Seattle Police Department and local Seattle government better insight into the demographics of these Terry Stops so that they can improve their interactions with the citizens of Seattle.

## Other Resources
***

## Data Understanding
***
#### Data column names and descriptions
* Subject Age Group: 10-year increments as reported by the officer
* Subject ID: Key, generated daily, identifying unique subjects
* GO / SC Num: "General Offense" or "Street Check" number, relating the Terry Stop to the parent report
* Terry Stop ID: Key identifying unique Terry Stop reports
* Stop Resolution: Resolution of the stop as reported by the officer
* Weapon Type: Type of weapon, if any, identified during a search or frisk of the subject. Indicates "None" if no weapon was found
* Officer ID: Unique key identifying officers in the dataset
* Officer YOB: Year of birth as reported by the officer
* Officer Gender: Gender of the officer
* Officer Race: Race of the officer
* Subject Perceived Race: Perceived race of the subject as reported by the officer
* Subject Perceived Gender: Perceived gender of the subject as reported by the officer
* Reported Date: Date the report was filed
* Reported Time: Time the stop was reported
* Initial Call Type: Initial classification of the call as assigned by 911
* Final Call Type: Final classification of the call as assigned by 911
* Call Type: How the call was received by the communication center
* Officer Squad: Functional squad assignment (not budget) of the officer as reported by the Data Analytics Platform (DAP)
* Arrest Flag: Indicator of whether a physical arrest of the subject was made during the Terry Stop. Does not necessarily reflect a report of an arrest in the Records Management System (RMS)
* Frisk Flag: Indicator of whether a frisk was conducted
* Sector: Sector of the address associated with the Computer Aided Dispatch (CAD) event. Not necessarily where the Terry Stop occurred
* Precinct: Precinct of the address associated with the CAD event. Not necessarily where the Terry Stop occurred
* Beat: Beat of the address associated with the underlying CAD event. Not necessarily where the Terry Stop occurred

This data was obtained from the City of Seattle via www.data.gov.
The data represents records of police-reported stops under Terry v. Ohio, 392 U.S. 1 (1968). Each row represents a unique stop. Each record contains the perceived demographics of the subject, as reported by the officer making the stop, and the officer's demographics, as reported to the Seattle Police Department for employment purposes. Where available, data elements from the associated Computer Aided Dispatch (CAD) event (e.g. Call Type, Initial Call Type, Final Call Type) are included. For this dataset, I am looking to predict arrests using the other features in the dataset.
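
A minimal sketch of how the prediction target could be derived from the Arrest Flag column. The tiny hand-made sample, the helper name `add_binary_target`, the snake_case renaming, and the "Y"/"N" coding of the flag are all assumptions made for illustration, not details confirmed by the project.

```python
import pandas as pd

# Hand-made sample standing in for the real export; values are illustrative only.
sample = pd.DataFrame(
    {
        "Subject Age Group": ["26 - 35", "18 - 25", "36 - 45", "26 - 35"],
        "Weapon Type": ["None", "Handgun", "None", "None"],
        "Arrest Flag": ["N", "Y", "N", "Y"],  # assumed "Y"/"N" coding
        "Frisk Flag": ["N", "Y", "Y", "N"],
    }
)


def add_binary_target(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with snake_case column names and a 0/1 `arrest` target."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # 1 when an arrest was flagged, 0 otherwise.
    out["arrest"] = (out["arrest_flag"] == "Y").astype(int)
    return out


stops = add_binary_target(sample)
print(stops["arrest"].value_counts())
```

With the real data, the sample frame would be replaced by the downloaded file, and the flag coding should be checked before relying on the comparison above.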

## Exploratory Data Analysis
***

### Which Race is Stopped the Most?
<img src= "images/stops_by_race.png" /> 

#### Graphic Description
The graph above displays Terry Stops according to the subjects' race. It shows that White subjects are stopped the most, followed by Black/African-American subjects. However, some subject demographics were missing from the data, so further data collection would be necessary to confirm this ranking.

***
### What Demographic of Officers Performs the Most Stops?
<img src= "images/stops_by_off_race.png"/> 

#### Graphic Description
The graph above displays Terry Stops by officer race. White officers performed the most stops. However, some officer demographics were missing from the data, so further data collection would be necessary to confirm this.

***
### Is there a Difference in Subject Ages across their Races?
<img src= "images/sub_ages_by_race.png"/> 

#### Graphic Description
The graph above shows the age groups of Terry Stops subjects, broken down by race. Across all races, the majority of subjects who were stopped were in the 26-35 age group.
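
A count table like the one behind this chart could be built with `pd.crosstab`. This is only a sketch: the column names and the handful of rows below are hypothetical stand-ins for the cleaned data.

```python
import pandas as pd

# Illustrative rows only; column names are assumed snake_case versions of the originals.
stops = pd.DataFrame(
    {
        "subject_perceived_race": [
            "White", "White", "White", "Black", "Black", "Black", "Asian",
        ],
        "subject_age_group": [
            "26 - 35", "18 - 25", "26 - 35", "26 - 35", "36 - 45", "26 - 35", "26 - 35",
        ],
    }
)

# Rows are races, columns are age groups, cells are stop counts.
age_by_race = pd.crosstab(stops["subject_perceived_race"], stops["subject_age_group"])

# Most frequent age group within each race.
top_age_group = age_by_race.idxmax(axis=1)

print(age_by_race)
print(top_age_group)
```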

***
### What is the Distribution of Officer Ages?
<img src= "images/barplot_off_age.png"/> 
<img src= "images/distplot_off_age.png"/> 

#### Graphic Description
The graphs above display the distribution of ages among officers who made Terry Stops.
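
The dataset provides the officer's year of birth rather than an age, so an age has to be derived. One plausible way, sketched below, is to subtract Officer YOB from the year of the Reported Date. The column names, the ISO date format, and this derivation itself are assumptions; the project may have computed ages differently.

```python
import pandas as pd

# Illustrative rows only.
stops = pd.DataFrame(
    {
        "reported_date": ["2019-03-14", "2017-08-02", "2020-01-30"],
        "officer_yob": [1985, 1972, 1991],
    }
)

# Approximate age in whole years at the time of the report.
# Only the birth year is known, so the result can be off by one year.
report_year = pd.to_datetime(stops["reported_date"]).dt.year
stops["officer_age"] = report_year - stops["officer_yob"]

print(stops)
```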

***
### Which Subject Age Groups are Most Stopped and Do They Carry Weapons?
<img src= "images/age-weapons.png"/>

#### Graphic Description
The graph above displays the Terry Stops subjects' age groups and whether or not the subjects were carrying weapons.

***
### Which Subject Races are Most Stopped and Do They Carry Weapons?
<img src= "images/race-weapons.png"/>

#### Graphic Description
The graph above displays the Terry Stops subjects' races and whether or not the subjects were carrying weapons.
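
Since Weapon Type is "None" when no weapon was found, a weapon indicator for charts like the two above can be derived by comparing against that value. The sketch below uses hypothetical column names and made-up rows; real data may also contain other placeholder values for missing entries, which would need separate handling.

```python
import pandas as pd

# Illustrative rows only.
stops = pd.DataFrame(
    {
        "subject_age_group": ["18 - 25", "26 - 35", "26 - 35", "36 - 45", "26 - 35"],
        "weapon_type": ["None", "Handgun", "None", "None", "Knife"],
    }
)

# True when any weapon type other than "None" was recorded.
stops["has_weapon"] = stops["weapon_type"] != "None"

# Per age group: number of stops, number with a weapon, and the share with a weapon.
weapons_by_age = stops.groupby("subject_age_group")["has_weapon"].agg(
    n_stops="size", with_weapon="sum"
)
weapons_by_age["weapon_rate"] = weapons_by_age["with_weapon"] / weapons_by_age["n_stops"]

print(weapons_by_age)
```

Grouping by a race column instead of the age group column gives the corresponding breakdown by race.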

## Model and Model Performance
***

#### Model
<img src= "images/model.png" width=700/> 
#### Model Confusion Matrix
<img src= "images/conf_matrix.png" width=300/> 
#### Model Description

After cleaning the Terry Stops dataset, the data was split 75/25 into training and test sets using the train_test_split function from sklearn.model_selection. Continuous features were standardized with StandardScaler() from sklearn.preprocessing, and categorical features were one-hot encoded. Because the classes were imbalanced, the minority class was oversampled with SMOTE() from imblearn.over_sampling. A custom classifier, ClfSwitcher, was built on sklearn's BaseEstimator so that any classifier and its parameters could be passed in. This custom classifier was used with a Pipeline and GridSearchCV. The classifiers compared were KNeighborsClassifier(), RandomForestClassifier(), AdaBoostClassifier(), and GradientBoostingClassifier().
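
The workflow described above might look roughly like the following sketch. It is not the project's actual code: the synthetic data, column choices, and parameter grids are invented for illustration, only two of the four classifiers are shown, and the SMOTE step is left out so the example depends on scikit-learn alone.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class ClfSwitcher(ClassifierMixin, BaseEstimator):
    """Thin wrapper that lets GridSearchCV treat the classifier as a parameter."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator.fit(X, y)
        self.classes_ = self.estimator.classes_
        return self

    def predict(self, X):
        return self.estimator.predict(X)

    def predict_proba(self, X):
        return self.estimator.predict_proba(X)


# Synthetic stand-in for the cleaned data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "officer_yob": rng.integers(1960, 1998, size=n),
        "frisk_flag": rng.choice(["Y", "N"], size=n),
        "weapon_type": rng.choice(["None", "Handgun", "Knife"], size=n, p=[0.8, 0.1, 0.1]),
    }
)
# Synthetic target: an arrest is only possible here when a frisk occurred.
y = ((X["frisk_flag"] == "Y") & (rng.random(n) < 0.8)).astype(int)

# 75/25 train/test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize continuous columns and one-hot encode categorical columns.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["officer_yob"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["frisk_flag", "weapon_type"]),
    ]
)

pipe = Pipeline([("prep", preprocess), ("clf", ClfSwitcher())])

# Each dict swaps in a different classifier together with its own parameter grid.
param_grid = [
    {
        "clf__estimator": [RandomForestClassifier(random_state=0)],
        "clf__estimator__n_estimators": [25, 50],
    },
    {
        "clf__estimator": [GradientBoostingClassifier(random_state=0)],
        "clf__estimator__learning_rate": [0.05, 0.1],
    },
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_name = type(search.best_estimator_.named_steps["clf"].estimator).__name__
test_accuracy = search.score(X_test, y_test)
print(best_name, round(test_accuracy, 3))
```

To include oversampling inside cross-validation, the usual approach is to swap in the `Pipeline` class from `imblearn.pipeline` and add a `SMOTE()` step before the classifier, so that resampling is applied only when fitting.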


GradientBoostingClassifier() had the best model performance, with a training accuracy score of 0.945 and a testing accuracy score of 0.820. I found the most important features to be 'frisk', 'officer_yob', and 'stop_resolution_Arrest'.
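
For reference, training and testing accuracy and a feature ranking of this kind can be read off a fitted GradientBoostingClassifier as sketched below. The data is synthetic and the feature names are placeholders, so the numbers it prints are unrelated to the scores reported above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
feature_names = ["frisk", "officer_yob", "noise"]  # placeholder columns
n = 400
X = np.column_stack(
    [
        rng.integers(0, 2, size=n),        # frisk: 0 or 1
        rng.integers(1960, 1998, size=n),  # officer year of birth
        rng.normal(size=n),                # unrelated noise
    ]
)
# Synthetic target: follows 'frisk', with about 10% of labels flipped.
y = X[:, 0].astype(int)
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - y, y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Accuracy on seen versus unseen data; a large gap suggests overfitting.
train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

# Impurity-based importances, sorted from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(round(train_accuracy, 3), round(test_accuracy, 3))
for name, importance in ranked:
    print(name, round(importance, 3))
```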

## Conclusions

***
Questions to consider:
* What would you recommend the business do as a result of this work?
* What are some reasons why your analysis might not fully solve the business problem?
* What else could you do in the future to improve this project?
***