Build a classifier to predict whether an arrest was made after a Terry Stop, given information such as the presence of weapons, the time of day of the call, etc. This is a binary classification problem.
Civil rights organizations: groups that work to protect civil liberties and ensure fair treatment of individuals by law enforcement. They can use this classifier’s predictions to highlight potential biases or disproportionate targeting during Terry Stops.
Law enforcement agencies: police departments, particularly the street crime and narcotics units responsible for conducting Terry Stops. They could use the predictions to evaluate the effectiveness of their practices and identify areas for improvement, such as increasing successful arrests or reducing false positives.
The general public: anyone interested in civil rights and law enforcement practices. The model could contribute to public discourse on policing methods and their impact on communities.
Data scientists who are interested in working with law enforcement agencies.
The overall goal is to build a binary classification model and identify the top two features driving its predictions.
Data source: Terry Stops dataset, City of Seattle Open Data Portal
Description of Data: This data represents records of police-reported stops under Terry v. Ohio, 392 U.S. 1 (1968). Each row represents a unique stop. Each record contains the perceived demographics of the subject, as reported by the officer making the stop, and officer demographics as reported to the Seattle Police Department, for employment purposes.
The Decision Tree classifier has the best performance in predicting whether an arrest follows a Terry Stop, yielding an F1 score of 0.89.
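A minimal sketch of this modeling step, assuming a scikit-learn workflow; the data here is synthetic (generated with `make_classification`) as a stand-in for the encoded Terry Stops records, so the exact score will differ from the reported 0.89.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic binary-classification data standing in for the encoded stop records
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit a decision tree and evaluate with F1, the metric used in this project
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)
print(f"F1 score: {f1_score(y_test, tree.predict(X_test)):.2f}")
```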
The top two features in this model are the arrest flag and the frisk flag.
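Feature rankings like this can be read from a fitted tree's `feature_importances_` attribute. The sketch below uses synthetic data and hypothetical column names (`arrest_flag`, `frisk_flag`, etc.) to illustrate the pattern, not the project's actual dataset.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names standing in for the encoded dataset columns
feature_names = ["arrest_flag", "frisk_flag", "weapon_present", "call_hour"]
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Pair each feature with its importance and sort in descending order
ranked = sorted(zip(feature_names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
top_two = [name for name, _ in ranked[:2]]
print(top_two)
```

Importances from a single tree sum to 1 and reflect how much each feature reduces impurity across the splits.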
To further improve the model, I will use GridSearchCV for hyperparameter tuning in order to obtain a better F1 score. I will also examine feature importance to identify the most influential features in the model, and apply other models such as random forest and k-nearest neighbors.
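The planned tuning step could be sketched as follows; the parameter grid is an illustrative assumption rather than the project's final search space, and synthetic data again stands in for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=1)

# Candidate hyperparameters for the decision tree (illustrative grid)
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}

# Optimize for F1 so the search matches the project's evaluation metric
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1),
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 2))
```

The same pattern extends to the other candidate models (e.g. `RandomForestClassifier`, `KNeighborsClassifier`) by swapping the estimator and grid.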