# Terry Stop Legal Analysis and Prediction

## Project Overview

The goal of this project is to build a machine learning model that predicts whether an arrest was made following a Terry Stop. Using features such as the presence of weapons, the time of day, and other contextual factors recorded during the stop, the model classifies each case as either resulting in an arrest or not.
This is a binary classification problem.
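
To make the binary framing concrete, here is a minimal sketch of turning the arrest indicator into a 0/1 target. It uses a small made-up frame rather than the real file, and it assumes the `Arrest Flag` column takes the values `Y` and `N` (the sample rows shown in this notebook only contain `N`). The `arrest` column name is hypothetical.

```python
import pandas as pd

# Toy rows standing in for the real data. Only the 'Arrest Flag' column name
# comes from the dataset; the values below are made up for illustration.
toy = pd.DataFrame({"Arrest Flag": ["N", "Y", "N", "N", "Y"]})

# Map the Y/N flag to 1/0 so it can serve as a binary target
# ('arrest' is a hypothetical column name).
toy["arrest"] = toy["Arrest Flag"].map({"Y": 1, "N": 0})

# Share of each class, useful for spotting class imbalance early.
class_share = toy["arrest"].value_counts(normalize=True)
print(class_share)
```

Checking the class shares before modeling matters because accuracy alone can look good on an imbalanced target.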

## Business Understanding

By analyzing this data, the project aims to:
- Understand which factors most influence the likelihood of an arrest.
- Explore patterns or potential biases in Terry Stops.
- Demonstrate the use of supervised learning techniques in a real-world legal and social context.


Include stakeholder and key business questions

## Data Understanding
Source of data: This data is from the Seattle Police Department.

In [1]:
#Importing the needed Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler 
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_curve, roc_auc_score

In [2]:
#Load the data
df = pd.read_csv("Terry_Stops_20250507.csv")

In [3]:
#Look at the dataset
df.head(5)

| | Subject Age Group | Subject ID | GO / SC Num | Terry Stop ID | Stop Resolution | Weapon Type | Officer ID | Officer YOB | Officer Gender | Officer Race | ... | Reported Time | Initial Call Type | Final Call Type | Call Type | Officer Squad | Arrest Flag | Frisk Flag | Precinct | Sector | Beat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 - 35 | 9770358745 | 20190000313099 | 9770376049 | Field Contact | - | 5653 | 1967 | M | Black or African American | ... | 00:25:27.0000000 | SHOPLIFT - THEFT | SUSPICIOUS CIRCUM. - SUSPICIOUS PERSON | ONVIEW | SOUTH PCT 2ND W - ROBERT - PLATOON 2 | N | N | - | - | - |
| 1 | 26 - 35 | -1 | 20160000282794 | 180985 | Arrest | Handgun | 6355 | 1970 | F | White | ... | 07:09:00.0000000 | DISTURBANCE | NARCOTICS - OTHER | 911 | EAST PCT 1ST W - E/G RELIEF (CHARLIE) | N | N | East | C | C3 |
| 2 | 26 - 35 | -1 | 20180000002480 | 438562 | Field Contact | | 7564 | 1979 | M | Declined to Answer | ... | 13:50:00.0000000 | - | - | - | WEST PCT 2ND W - DAVID BEATS | N | N | - | - | - |
| 3 | 36 - 45 | -1 | 20180000065356 | 392012 | Offense Report | | 7514 | 1987 | M | White | ... | 02:09:00.0000000 | ROBBERY - CRITICAL (INCLUDES STRONG ARM) | ROBBERY - ARMED | 911 | NORTH PCT 3RD W - BOY (JOHN) - PLATOON 1 | N | N | North | L | L2 |
| 4 | 56 and Above | -1 | 20170000004325 | 309626 | Field Contact | | 6783 | 1976 | M | White | ... | 15:00:00.0000000 | - | - | - | EAST PCT 2ND W - GEORGE - PLATOON 2 | N | N | - | - | - |


In [4]:
#Number of rows and columns
df.shape

(63462, 23)

In [5]:
#Overview of columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63462 entries, 0 to 63461
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Subject Age Group         63462 non-null  object
 1   Subject ID                63462 non-null  int64 
 2   GO / SC Num               63462 non-null  int64 
 3   Terry Stop ID             63462 non-null  int64 
 4   Stop Resolution           63462 non-null  object
 5   Weapon Type               30897 non-null  object
 6   Officer ID                63462 non-null  object
 7   Officer YOB               63462 non-null  int64 
 8   Officer Gender            63462 non-null  object
 9   Officer Race              63462 non-null  object
 10  Subject Perceived Race    63462 non-null  object
 11  Subject Perceived Gender  63462 non-null  object
 12  Reported Date             63462 non-null  object
 13  Reported Time             63462 non-null  object
 14  Initial Call Type     

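Description of data: The dataset contains 63,462 records and 23 columns, including the target column 'Arrest Flag' (Y/N). Most columns are stored as objects (strings); the integer columns are Officer YOB, which holds a year, and three ID columns (Subject ID, GO / SC Num, and Terry Stop ID). Weapon Type is missing for roughly half of the rows (30,897 non-null values).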

## Data Preparation
First, I will check for nulls and duplicate rows. Then I will convert Officer YOB into a datetime and the ID columns into strings to help with filtering and parsing.


In [7]:
#Finding Nulls
df.isna().sum()

Subject Age Group               0
Subject ID                      0
GO / SC Num                     0
Terry Stop ID                   0
Stop Resolution                 0
Weapon Type                 32565
Officer ID                      0
Officer YOB                     0
Officer Gender                  0
Officer Race                    0
Subject Perceived Race          0
Subject Perceived Gender        0
Reported Date                   0
Reported Time                   0
Initial Call Type               0
Final Call Type                 0
Call Type                       0
Officer Squad                 561
Arrest Flag                     0
Frisk Flag                      0
Precinct                        0
Sector                          0
Beat                            0
dtype: int64
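
The null counts above only capture true missing values. The sample rows also show `-` used as a placeholder in columns such as `Weapon Type` and `Precinct`, and those cells are not counted by `isna()`. A minimal sketch of how such placeholders could be counted and converted, using a made-up frame and assuming `-` always means "not recorded" (that interpretation is an assumption, not something the source states):

```python
import numpy as np
import pandas as pd

# Toy frame that mimics the mix of real nulls and "-" placeholders
# seen in the sample rows; the values are made up for illustration.
toy = pd.DataFrame({
    "Weapon Type": ["-", "Handgun", None, None],
    "Precinct": ["-", "East", "-", "North"],
})

# Count cells that hold the "-" placeholder, per column.
dash_counts = (toy == "-").sum()

# Treat "-" as missing so it shows up in isna() alongside real nulls.
cleaned = toy.replace("-", np.nan)
missing_after = cleaned.isna().sum()

print(dash_counts)
print(missing_after)
```

Whether to convert the placeholder or keep it as its own category is a modeling choice; for a column like `Weapon Type`, "-" and a true null may carry different information.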

In [8]:
#Looking for duplicate rows
df.duplicated().value_counts()

False    63462
Name: count, dtype: int64

In [9]:
#Convert data types
df['Officer YOB'] = pd.to_datetime(df['Officer YOB'], format='%Y')
df[['Subject ID', 'GO / SC Num' ,'Terry Stop ID']] = df[['Subject ID', 'GO / SC Num' ,'Terry Stop ID']].astype(str)
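
The overview lists time of day as a candidate feature. Here is a minimal sketch of one way to derive it from `Reported Time`, assuming the values keep the `HH:MM:SS.fffffff` layout seen in the sample rows. The `reported_hour` and `time_of_day` names and the bucket boundaries are hypothetical choices for illustration, not part of the dataset or this notebook.

```python
import pandas as pd

# Toy rows copied from the layout of the sample output above.
toy = pd.DataFrame({
    "Reported Time": ["00:25:27.0000000", "07:09:00.0000000", "13:50:00.0000000"]
})

# The first two characters hold the hour (assumes a fixed HH:MM:SS layout).
toy["reported_hour"] = toy["Reported Time"].str.slice(0, 2).astype(int)


def time_of_day(hour: int) -> str:
    """Hypothetical helper: bucket an hour (0-23) into a coarse label."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


toy["time_of_day"] = toy["reported_hour"].apply(time_of_day)
print(toy)
```

A coarse bucket keeps the feature easy to one-hot encode; the raw hour could be kept instead if finer resolution turns out to matter.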

In [10]:
#Overview of updated columns
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63462 entries, 0 to 63461
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Subject Age Group         63462 non-null  object        
 1   Subject ID                63462 non-null  object        
 2   GO / SC Num               63462 non-null  object        
 3   Terry Stop ID             63462 non-null  object        
 4   Stop Resolution           63462 non-null  object        
 5   Weapon Type               30897 non-null  object        
 6   Officer ID                63462 non-null  object        
 7   Officer YOB               63462 non-null  datetime64[ns]
 8   Officer Gender            63462 non-null  object        
 9   Officer Race              63462 non-null  object        
 10  Subject Perceived Race    63462 non-null  object        
 11  Subject Perceived Gender  63462 non-null  object        
 12  Reported Date     

## Modeling
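
A minimal sketch of how the tools imported at the top of the notebook could fit together for this binary classification problem. It runs on a small synthetic frame, not on the Terry Stops data, so the scores it prints say nothing about the real dataset. The feature names `has_weapon` and `reported_hour`, the synthetic arrest probabilities, and the split settings are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical features, not the real dataset).
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    "has_weapon": rng.integers(0, 2, size=n),
    "reported_hour": rng.integers(0, 24, size=n),
})

# Synthetic target: an arrest is made more likely when has_weapon == 1.
prob = np.where(X["has_weapon"] == 1, 0.7, 0.2)
y = (rng.random(n) < prob).astype(int)

# Hold out a test set, stratified so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the scaler on the training data only to avoid leaking test statistics.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Baseline logistic regression classifier.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Evaluate with accuracy and ROC AUC on the held-out set.
pred = model.predict(X_test_scaled)
proba = model.predict_proba(X_test_scaled)[:, 1]
accuracy = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, proba)
print(f"accuracy={accuracy:.3f}, roc_auc={auc:.3f}")
```

On the real data, the categorical columns would need encoding before this step, and ROC AUC is likely the more informative score if arrests turn out to be the minority class.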

## Evaluation
Summary of conclusions including three relevant findings