# Analysis of Racial Equity in the San Diego Criminal Justice System

Jiayi Zhao & Sreetama Chowdhury

## Abstract 

The California Racial Justice Act (CRJA), approved in 2020, prohibits state prosecutors from racially discriminating against suspects: a suspect cannot be prosecuted if they can prove that law enforcement officials discriminated against them, and a suspect who has already been convicted can obtain a retrial. However, the CRJA neither mandates the publication of aggregated data on racial disparity allegations, convictions, and sentences, nor defines acceptable statistical methods for demonstrating racial disparity. This project is a preliminary search for the aspects of criminal justice data that might be useful to those citing the CRJA. Accordingly, it aims to seek and disclose data on the racial composition of charges, convictions, and sentences in San Diego County and its municipalities by submitting Public Records Act (PRA) and Freedom of Information Act (FOIA) requests, and to analyze the data with the pandas Python library. To some extent, the project will also include a more precise analysis of stop reasons and resultant charges in San Diego neighborhoods, using traffic stop data collected under the Racial and Identity Profiling Act of 2015 (RIPA), to investigate whether there is statistical evidence of racial bias affecting the outcomes of different stop reasons and their resultant charges in different neighborhoods of San Diego County.

## Table of Contents

1. ### [Background & Prior Work](#intro)
2. ### [Methods](#methods)
3. ### [References](#ref)

## Background & Prior Work <a id='intro'></a>

After taking office in 2019, California Governor Gavin Newsom signed several laws to promote racial equality and justice. Among them is the California Racial Justice Act (AB 2542), proposed by Assemblyman Ash Kalra, who represents a South Bay constituency, and signed by Newsom in September 2020. It prohibits the use of race or ethnicity to seek or obtain a conviction or sentence. Under this law, state prosecutors may not racially discriminate against suspects: a suspect cannot be prosecuted if they can prove that law enforcement officials discriminated against them, and a suspect who has already been convicted can obtain a retrial.

However, the only public data set related to race and ethnicity is the traffic stop data collected under the Racial and Identity Profiling Act (RIPA) by the San Diego Police Department. Researchers and journalists have used this traffic stop data to analyze whether there is racial “bias affecting the outcomes of police stops and the resultant charges” [[2](#ref2)]. The prior works by Greg Moran et al. and by Sreetama Chowdhury both conclude that there is no statistically significant evidence of bias. However, richer and whiter neighborhoods have lower stop rates than equivalently sized poorer and browner neighborhoods [[1](#ref1)].

This research extends both prior works: it updates the traffic stop database that Chowdhury constructed and pursues a more precise analysis of stop reasons and resultant charges in San Diego neighborhoods, to investigate whether the disproportionate stop rate among races leads to disproportionate arrest and citation rates. In addition, this project aims to seek and disclose more data on the racial composition of charges, convictions, and sentences in San Diego County and its municipalities by submitting Public Records Act (PRA) and Freedom of Information Act (FOIA) requests, and to analyze the data with the pandas Python library.
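
As a rough illustration of the comparison described above, the following sketch computes, for each group, the share of stops that end in each outcome. The toy records and the column names `race` and `result` are assumptions made for illustration; they are not values taken from the RIPA data.

```python
import pandas as pd

# Toy stop records; the values and the column names are hypothetical
stops = pd.DataFrame({
    "race": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "result": ["Citation", "Arrest", "No action", "No action",
               "Citation", "Citation", "Arrest", "No action"],
})

def outcome_rates(df, group_col="race", result_col="result"):
    """Return the share of stops per group that end in each outcome."""
    # normalize="index" divides each row by its total, so every row sums to 1
    return pd.crosstab(df[group_col], df[result_col], normalize="index")

rates = outcome_rates(stops)
print(rates)
```

Comparing rows of `rates` shows whether groups that are stopped at different rates also differ in how often a stop ends in an arrest or a citation. Whether any such difference is statistically significant would need a separate test.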

## Methods <a id='methods'></a>

All data used can be found & downloaded [here](https://data.sandiego.gov/datasets/police-ripa-stops/) and on linked pages.

In [None]:
#import + consolidate San Diego RIPA data into one massive df (result_df)

In [1]:
import pandas as pd   
import missingno as msno
import matplotlib.pyplot as plt
from astral import LocationInfo
from astral.geocoder import database, lookup
import seaborn as sns
import datetime
from datetime import date
from astral.sun import sun

In [None]:
# city = lookup("San Diego", database())

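# def convert_dtype_float(x):
#     """Return x as a float, or 0 when it is empty or not numeric."""
#     if not x:
#         return 0
#     try:
#         return float(x)
#     except (TypeError, ValueError):
#         return 0

# def convert_dtype_string(x):
#     """Return x as a string, or '' when it is empty."""
#     if not x:
#         return ''
#     try:
#         return str(x)
#     except Exception:
#         return ''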
    
# convert_dict = {'date_stop': str}
    
# contraband_evid_df = pd.read_csv("data/ripa_contraband_evid_datasd.csv")                                
# disability_df = pd.read_csv("data/ripa_disability_datasd.csv")
# gender_df = pd.read_csv("data/ripa_gender_datasd.csv", converters = {"gender": convert_dtype_string})
# prop_seize_basis_df = pd.read_csv("data/ripa_prop_seize_basis_datasd.csv", converters = {"basisforpropertyseizure": convert_dtype_string})
# prop_seize_type_df = pd.read_csv("data/ripa_prop_seize_type_datasd.csv", converters = {"type_of_property_seized": convert_dtype_string})
# race_df = pd.read_csv("data/ripa_race_datasd.csv")
# stop_result_df = pd.read_csv("data/ripa_stop_result_datasd.csv")
# stop_reason_df = pd.read_csv("data/ripa_stop_reason_datasd.csv", converters = {"reason_for_stopcode": convert_dtype_float})
# stop_details_df = pd.read_csv("data/ripa_stops_datasd.csv", converters = {"land_mark": convert_dtype_string}) 
# stop_details_df = stop_details_df.astype(convert_dict)
# result_df = pd.merge(contraband_evid_df, disability_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, gender_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, prop_seize_basis_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, prop_seize_type_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, race_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, stop_result_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, stop_reason_df, how="outer", on=["stop_id","pid"])
# result_df = pd.merge(result_df, stop_details_df, how="outer", on=["stop_id","pid"])
# result_df['datetime_stop'] = pd.to_datetime(result_df['date_stop'] + ' ' + result_df['time_stop'])
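
The chained merges above can also be written as a single fold over a list of tables. The sketch below is one possible way to do this with `functools.reduce`; the toy tables, their values, and the helper name `merge_all` are hypothetical and only mirror the `stop_id` and `pid` keys used above.

```python
from functools import reduce
import pandas as pd

# Toy stand-ins for the per-topic RIPA tables; the values are made up
race_df = pd.DataFrame({"stop_id": [1, 1, 2], "pid": [1, 2, 1], "race": ["A", "B", "A"]})
gender_df = pd.DataFrame({"stop_id": [1, 2], "pid": [1, 1], "gender": ["F", "M"]})
stop_result_df = pd.DataFrame({"stop_id": [1, 2, 3], "pid": [1, 1, 1],
                               "result": ["Citation", "Arrest", "Warning"]})

def merge_all(frames, keys=("stop_id", "pid")):
    """Outer-merge a list of DataFrames on the shared key columns."""
    return reduce(
        lambda left, right: pd.merge(left, right, how="outer", on=list(keys)),
        frames,
    )

merged = merge_all([race_df, gender_df, stop_result_df])
print(merged)
```

An outer merge keeps every `(stop_id, pid)` pair that appears in any table and fills the gaps with missing values. If a table holds several rows for the same pair, the merge repeats the matching rows from the other tables, so row counts are worth checking after consolidation.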

In [None]:
# result_df.to_csv('df.csv', index=False)  # index=False avoids an extra 'Unnamed: 0' column on reload

In [2]:
result_df = pd.read_csv('df.csv')



### PART 1: PRELIMINARY DATA BREAKDOWNS & ANALYSIS

In [None]:
# missingno package allows for visualization of missing data within result_df

In [None]:
msno.bar(result_df)
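
A numeric summary can complement the bar chart. The sketch below computes the share of missing values per column with plain pandas; `toy_df` is a made-up stand-in for `result_df`, and its values are not taken from the data.

```python
import pandas as pd

# Toy frame standing in for result_df; the values are made up
toy_df = pd.DataFrame({
    "stop_id": [1, 2, 3, 4],
    "land_mark": [None, None, "Park", None],
    "address_street": ["Grand Avenue", None, "59th Street", "NIAGARA AVE"],
})

# isna() marks missing cells and mean() turns the marks into a per-column share
missing_share = toy_df.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Columns near 1.0 are almost entirely empty, while columns at 0.0 are complete.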

In [None]:
# construct a column of the full address of stops

In [3]:
# Copy the address columns so later assignments do not write to a slice of result_df
t = result_df[['intersection', 'address_block', 'land_mark', 'address_street', 'address_city']].copy()
# Cast block numbers to nullable integers so that 700.0 becomes '700'
t['address_block'] = t['address_block'].astype('Int64').astype(str)
t = t.astype(str).replace('<NA>', '').replace('nan', '')
# Join only the non-empty address parts, separated by single spaces
t['full_address'] = t.apply(lambda x: ' '.join(part for part in x if part), axis=1)


In [5]:
# apply geocode to convert the full address into coordinates

In [6]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(timeout=10, user_agent = "myGeolocator")
# Geocode only the first five addresses; country_codes expects an ISO 3166-1 alpha-2 code
t['geocodes'] = t['full_address'].head(5).apply(geolocator.geocode, country_codes='us')

# Rows without a geocoding match get None for both coordinates
t['lat'] = t['geocodes'].head(5).apply(lambda x: x.latitude if x else None)
t['lon'] = t['geocodes'].head(5).apply(lambda x: x.longitude if x else None)
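
Because several rows can share the same address, one way to reduce the number of geocoding requests is to look up each distinct address only once. The sketch below shows the idea with a stand-in function; `geocode_unique` and `fake_geocode` are hypothetical names, and `fake_geocode` returns placeholder coordinates rather than calling a real geocoder.

```python
def geocode_unique(addresses, geocode_fn):
    """Geocode each distinct address once and return {address: result}."""
    results = {}
    for address in addresses:
        if address not in results:
            results[address] = geocode_fn(address)
    return results

# Hypothetical stand-in for geolocator.geocode that records every call
calls = []

def fake_geocode(address):
    calls.append(address)
    return (32.0, -117.0)  # placeholder coordinates, not real results

addresses = ["4400 59th Street SAN DIEGO", "4400 59th Street SAN DIEGO",
             "700 Grand Avenue SAN DIEGO"]
lookup_table = geocode_unique(addresses, fake_geocode)
# Map the cached results back onto the original, possibly repeated, addresses
coords = [lookup_table[a] for a in addresses]
print(len(calls))
```

With a real geocoder, `geocode_fn` could be `geolocator.geocode`, and the resulting dictionary could be mapped back onto the `full_address` column.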

In [7]:
t = t.head(5)
t

| | intersection | address_block | land_mark | address_street | address_city | full_address | geocodes | lat | lon |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | 700.0 | | Grand Avenue | SAN DIEGO | 700 Grand Avenue SAN DIEGO | (700, Grand Avenue, Mission Beach, San Diego, ... | 32.794545 | -117.255888 |
| 1 | I-5 | | | NOBEL DRIVE | SAN DIEGO | I-5 NOBEL DRIVE SAN DIEGO | (Nobel Drive, La Jolla Colony, University City... | 32.868219 | -117.220656 |
| 2 | | 4400.0 | | 59th Street | SAN DIEGO | 4400 59th Street SAN DIEGO | (4400, 59th Street, El Cerrito Heights, San Di... | 32.758321 | -117.07037 |
| 3 | | 4400.0 | | 59th Street | SAN DIEGO | 4400 59th Street SAN DIEGO | (4400, 59th Street, El Cerrito Heights, San Di... | 32.758321 | -117.07037 |
| 4 | | 4800.0 | | NIAGARA AVE | SAN DIEGO | 4800 NIAGARA AVE SAN DIEGO | (4800, Niagara Avenue, Ocean Beach, San Diego,... | 32.744019 | -117.248383 |


In [8]:
# Addresses of the destination points
encamp = ["2500 Sports Arena Blvd San Diego", "2600 Sports Arena Blvd San Diego", "2700 Sports Arena Blvd San Diego",
         "16th street San Diego", "17th street San Diego", "Imperial street San Diego",
          "Market street San Diego", "G street San Diego"]

# Geocode each destination once and store it in [lon, lat] order
dest = []
for x in encamp:
    location = geolocator.geocode(x)
    if location is None:
        raise ValueError(f"Could not geocode destination: {x}")
    dest.append([location.longitude, location.latitude])

dest

[[-117.197842, 32.747106],
 [-117.198277, 32.747372],
 [-117.2218506, 32.7563492],
 [-117.1494912, 32.7173509],
 [-117.148547, 32.7065843],
 [-117.1141481936995, 32.5640853],
 [-117.1500023, 32.7115382],
 [-117.1458286, 32.7126201]]

In [9]:
from route import commute_time

# Work on a copy so the new columns are not written to a slice of t
df = t.copy()
for i in range(len(encamp)):
    # Add walking distance and time columns for destination i
    df = commute_time(df, str(i), dest[i], 'foot-walking')


In [10]:
df

| | intersection | address_block | land_mark | address_street | address_city | full_address | geocodes | lat | lon | To_0_dist(km) | ... | To_3_dist(km) | To_3_time(min) | To_4_dist(km) | To_4_time(min) | To_5_dist(km) | To_5_time(min) | To_6_dist(km) | To_6_time(min) | To_7_dist(km) | To_7_time(min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 700.0 | | Grand Avenue | SAN DIEGO | 700 Grand Avenue SAN DIEGO | (700, Grand Avenue, Mission Beach, San Diego, ... | 32.794545 | -117.255888 | 9.07 | ... | 18.08 | 216.986667 | 19.05 | 228.625333 | 37.29 | 439.342333 | 18.69 | 224.220333 | 19.2 | 230.447167 |
| 1 | I-5 | | | NOBEL DRIVE | SAN DIEGO | I-5 NOBEL DRIVE SAN DIEGO | (Nobel Drive, La Jolla Colony, University City... | 32.868219 | -117.220656 | 17.34 | ... | 22.59 | 271.117333 | 23.83 | 285.9415 | 43.93 | 518.9765 | 23.15 | 277.7585 | 23.91 | 286.889833 |
| 2 | | 4400.0 | | 59th Street | SAN DIEGO | 4400 59th Street SAN DIEGO | (4400, 59th Street, El Cerrito Heights, San Di... | 32.758321 | -117.07037 | 14.3 | ... | 11.75 | 141.052167 | 12.83 | 153.948333 | 28.63 | 343.521 | 12.52 | 150.182 | 11.91 | 142.882833 |
| 3 | | 4400.0 | | 59th Street | SAN DIEGO | 4400 59th Street SAN DIEGO | (4400, 59th Street, El Cerrito Heights, San Di... | 32.758321 | -117.07037 | 14.3 | ... | 11.75 | 141.052167 | 12.83 | 153.948333 | 28.63 | 343.521 | 12.52 | 150.182 | 11.91 | 142.882833 |
| 4 | | 4800.0 | | NIAGARA AVE | SAN DIEGO | 4800 NIAGARA AVE SAN DIEGO | (4800, Niagara Avenue, Ocean Beach, San Diego,... | 32.744019 | -117.248383 | 6.02 | ... | 12.1 | 145.227 | 13.07 | 156.8655 | 31.31 | 367.582667 | 12.71 | 152.460667 | 13.22 | 158.687333 |
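
A quick sanity check on the routed distances is to compare them with straight-line distances, which should never be longer than a walking route. The sketch below uses the haversine formula with the first stop and the first destination from the outputs above; the helper name `haversine_km` is hypothetical, and the formula treats the Earth as a sphere, so the result is only approximate.

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# First stop and first destination from the outputs above, in [lon, lat] order
stop = [-117.255888, 32.794545]
destination = [-117.197842, 32.747106]
straight_line = haversine_km(stop[0], stop[1], destination[0], destination[1])
print(round(straight_line, 2))
```

The routed walking distance reported for this pair is 9.07 km (`To_0_dist(km)` in row 0), so a straight-line value somewhat below that figure is what one would expect. A routed distance far above the straight-line value, or below it, would suggest a geocoding or routing problem worth inspecting.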


## References <a id='ref'></a>

1. <a id='ref1'></a> [“Stopped by police in San Diego? A lot depends on where you live, drive”](https://www.sandiegouniontribune.com/news/watchdog/story/2022-01-16/stopped-by-police-in-san-diego-a-lot-depends-on-where-you-live-drive) by Greg Moran, Lyndsay Winkley, Lauryn Schroeder, Cristina Byvik, and Michelle Gilchrist

2. <a id='ref2'></a> [“The California Racial Justice Act & San Diego PD RIPA Data”](https://github.com/FleischerResearchLab/CRJA-analysis/blob/main/src/CRJA%20%2B%20SDPD%20Ripa%20Data.ipynb) by Sreetama Chowdhury