# **Analyze the Influence of Crime Types on the Arrest Rate in Different Areas of Chicago.**

## Introduction

Crime is a complex and multifaceted issue with significant implications for public safety and community well-being. In general, crimes can be divided into two types, violent and non-violent: crimes such as "Homicide", "Assault", and "Robbery" are considered violent, while crimes such as "Theft" and "Property damage" are non-violent. Like many urban centers, Chicago experiences a diverse range of criminal activity and has been grappling with crime-related concerns, so understanding the factors that influence arrest rates can offer valuable insights into effective law enforcement strategies and community safety. The dataset used here is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system and hosted in Google BigQuery; it contains reported incidents of crime, except murders, that occurred in the city of Chicago from 2001 to 2024.
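
If the violent/non-violent split is used later in the analysis, it could be encoded as a lookup on the 'Primary Type' column. The sketch below is hypothetical: the category sets are built only from the examples named in the paragraph above, "Property damage" is assumed to correspond to the dataset's CRIMINAL DAMAGE type, `label_violence` is an illustrative helper, and every other crime type is labeled "other" until the sets are extended.

```python
import pandas as pd

# Hypothetical category sets built from the examples in the text.
# Assumption: "Property damage" corresponds to the CRIMINAL DAMAGE primary type.
VIOLENT_TYPES = {"HOMICIDE", "ASSAULT", "ROBBERY"}
NON_VIOLENT_TYPES = {"THEFT", "CRIMINAL DAMAGE"}


def label_violence(primary_type: pd.Series) -> pd.Series:
    """Return 'violent', 'non-violent', or 'other' for each 'Primary Type' value."""
    labels = pd.Series("other", index=primary_type.index)
    labels.loc[primary_type.isin(VIOLENT_TYPES)] = "violent"
    labels.loc[primary_type.isin(NON_VIOLENT_TYPES)] = "non-violent"
    return labels


# Toy rows standing in for chicago_crime_clean['Primary Type'].
toy = pd.DataFrame({"Primary Type": ["THEFT", "ROBBERY", "NARCOTICS", "ASSAULT"]})
toy["Violence"] = label_violence(toy["Primary Type"])
print(toy)
```

Keeping an explicit "other" label makes it easy to see which crime types still need to be assigned before comparing arrest rates between the two groups.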

This project seeks to analyze how arrest rates in Chicago relate to crime type and to location within the city, using incidents from 2002 to 2023. We will also explore the influence of some economic factors on different crime types in various regions, contributing to our understanding of the nuanced factors affecting arrest rates in Chicago. Throughout the analysis, we will use visualizations, including histograms, box plots, and line charts, to represent the relationships between crime types, locations, and arrest rates and to reveal patterns and trends within the dataset.

The outcome (dependent) variable of this project is the arrest rate, and there are two independent variables: crime type and location. The first variable ($X_1$) is crime type, with 35 crime types in the dataset. The second variable ($X_2$) is location, given by the latitude and longitude where each incident occurred in Chicago. The type of crime in a given area can significantly affect law enforcement strategies, community safety perceptions, and arrest rates. In addition, factors such as socioeconomic conditions, population density, and community policing strategies may contribute to differences in arrest rates across locations. This research should therefore help us understand how different crime types and locations relate to arrest rates, and so learn more about Chicago's crime dynamics. However, it is essential to acknowledge the limitations inherent in this study, such as potential data constraints and external factors influencing crime dynamics.
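
One natural way to make the outcome precise, assumed here rather than taken from the analysis, is to aggregate incidents by crime type $c$ and by an area $a$ (for example a community area, which is one assumed way of grouping the latitude/longitude locations). Let $N_{c,a}$ be the number of incidents of type $c$ in area $a$ and $A_{c,a}$ the number of those with Arrest = True. Then the arrest rate per cell, and the city-wide rate for a crime type, are:

$$
r_{c,a} = \frac{A_{c,a}}{N_{c,a}}, \qquad
r_{c} = \frac{\sum_{a} A_{c,a}}{\sum_{a} N_{c,a}} = \sum_{a} \frac{N_{c,a}}{\sum_{a'} N_{c,a'}}\, r_{c,a}
$$

The second identity shows that the city-wide rate for a crime type is a count-weighted average of the area rates, not their simple mean, so areas with many incidents dominate it.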

## Data Cleaning and Loading

Before summarizing, we need to clean the original dataset. In this section, we drop columns that are not needed for our research, such as 'IUCR', 'Domestic', 'Beat', and 'Ward'. Moreover, 2001 and 2024 contain far fewer incidents than the other years, so we remove the data from these two years. Some incidents also lack complete records, so we drop rows with missing values to obtain a more complete dataset, called 'chicago_crime_clean'. Finally, we convert 'District' and 'Community Area' from float to integer.

```python
import os

import pandas as pd
```

```python
# Build the path to ../Data/Chicago_Crime.csv relative to the working directory
relative_path = os.path.join('..', 'Data', 'Chicago_Crime.csv')
abs_path = os.path.abspath(relative_path)

# Read the CSV file
chicago_crime = pd.read_csv(abs_path)
```

```python
# Drop rows with a missing value in any column (including columns dropped below)
chicago_crime_clean = chicago_crime.dropna()
# Drop columns not needed for the analysis
chicago_crime_clean = chicago_crime_clean.drop(['Date', 'IUCR', 'Location Description', 'Domestic', 'Beat', 'Ward',
                                                'Block', 'FBI Code', 'X Coordinate', 'Y Coordinate', 'Updated On'], axis=1)
# Drop years 2001 and 2024; .copy() avoids SettingWithCopyWarning on the assignments below
chicago_crime_clean = chicago_crime_clean[~chicago_crime_clean['Year'].isin([2001, 2024])].copy()
# Convert 'District' and 'Community Area' to int
chicago_crime_clean['District'] = chicago_crime_clean['District'].astype(int)
chicago_crime_clean['Community Area'] = chicago_crime_clean['Community Area'].astype(int)
chicago_crime_clean.head()
```

|  | ID | Case Number | Primary Type | Description | Arrest | District | Community Area | Year | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 12045583 | JD226426 | THEFT | \$500 AND UNDER | False | 2 | 35 | 2020 | 41.830482 | -87.621752 | (41.830481843, -87.621751752) |
| 12 | 12031001 | JD209965 | BATTERY | SIMPLE | True | 9 | 60 | 2020 | 41.83631 | -87.639624 | (41.836310224, -87.639624112) |
| 13 | 12093529 | JD282112 | ASSAULT | AGGRAVATED - HANDGUN | True | 4 | 46 | 2020 | 41.74761 | -87.549179 | (41.747609555, -87.549179329) |
| 14 | 12178140 | JD381597 | BATTERY | SIMPLE | False | 7 | 67 | 2020 | 41.774878 | -87.671375 | (41.77487752, -87.671374872) |
| 15 | 4144897 | HL474854 | BATTERY | AGGRAVATED: OTHER DANG WEAPON | False | 7 | 68 | 2005 | 41.781003 | -87.652107 | (41.781002663, -87.652107119) |


```python
chicago_crime_clean.shape
```

```
(7268159, 11)
```
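
A quick way to confirm that the cleaning rules described above actually hold is a small validation helper. This is a sketch, not part of the original workflow: `check_clean` is a hypothetical name, it checks only the three rules stated in the text (no missing values, years 2002 to 2023, integer 'District' and 'Community Area'), and it is demonstrated on a toy frame rather than the real data.

```python
import pandas as pd


def check_clean(df: pd.DataFrame) -> None:
    """Raise ValueError if df breaks one of the stated cleaning rules."""
    # dropna() should have removed every missing value.
    if df.isna().any().any():
        raise ValueError("missing values remain")
    # Dropping 2001 and 2024 should leave only 2002-2023 (inclusive).
    if not df["Year"].between(2002, 2023).all():
        raise ValueError("unexpected years present")
    # 'District' and 'Community Area' were cast from float to int.
    for col in ("District", "Community Area"):
        if not pd.api.types.is_integer_dtype(df[col]):
            raise ValueError(f"{col!r} is not an integer column")


# Toy frame with a few of the cleaned columns.
toy = pd.DataFrame({
    "Year": [2005, 2020],
    "District": [7, 2],
    "Community Area": [68, 35],
    "Arrest": [False, True],
})
check_clean(toy)
print("toy frame passes")
```

On the real data, the same call would be `check_clean(chicago_crime_clean)`; raising an exception instead of printing makes a failed rule impossible to miss when the notebook is re-run.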

## Summary Statistics Tables

```python
# Number of incidents per crime type, most frequent first
crime_type = (chicago_crime_clean.groupby('Primary Type').size()
              .sort_values(ascending=False)
              .rename('counts')
              .reset_index())
crime_type
```

|  | Primary Type | counts |
|---|---|---|
| 0 | THEFT | 1540384 |
| 1 | BATTERY | 1331644 |
| 2 | CRIMINAL DAMAGE | 833588 |
| 3 | NARCOTICS | 672582 |
| 4 | ASSAULT | 482086 |
| 5 | OTHER OFFENSE | 451066 |
| 6 | BURGLARY | 395515 |
| 7 | MOTOR VEHICLE THEFT | 359338 |
| 8 | DECEPTIVE PRACTICE | 313534 |
| 9 | ROBBERY | 276283 |
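
To put these counts in proportion, each crime type's share of all incidents can be added as a column. A minimal sketch on hypothetical toy values (the `types` series stands in for `chicago_crime_clean['Primary Type']`); `value_counts()` yields the same descending counts as the `groupby(...).size().sort_values(...)` chain used for `crime_type`.

```python
import pandas as pd

# Toy 'Primary Type' values; the real input would be chicago_crime_clean['Primary Type'].
types = pd.Series(["THEFT", "THEFT", "BATTERY", "THEFT", "ASSAULT"], name="Primary Type")

summary = (
    types.value_counts()              # counts per type, most frequent first
    .rename_axis("Primary Type")      # name the index so reset_index gives a proper column
    .reset_index(name="counts")
)
# Share of all incidents, in percent.
summary["share (%)"] = 100 * summary["counts"] / summary["counts"].sum()
print(summary)
```

The explicit `rename_axis` keeps the column name stable across pandas versions, which name the `value_counts()` index differently.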


```python
# Number of incidents per crime type
crime_counts = chicago_crime_clean.groupby('Primary Type').size().reset_index(name='Crime Count')

# Flag whether at least one incident of each type led to an arrest.
# .any() returns a single True/False per type; it is not an arrest rate.
any_arrest = chicago_crime_clean.groupby('Primary Type')['Arrest'].any().reset_index(name='Arrested')
crime_arrest_relation = pd.merge(crime_counts, any_arrest, on='Primary Type')
crime_arrest_relation
```

|  | Primary Type | Crime Count | Arrested |
|---|---|---|---|
| 0 | ARSON | 12239 | True |
| 1 | ASSAULT | 482086 | True |
| 2 | BATTERY | 1331644 | True |
| 3 | BURGLARY | 395515 | True |
| 4 | CONCEALED CARRY LICENSE VIOLATION | 1216 | True |
| 5 | CRIM SEXUAL ASSAULT | 23914 | True |
| 6 | CRIMINAL DAMAGE | 833588 | True |
| 7 | CRIMINAL SEXUAL ASSAULT | 7124 | True |
| 8 | CRIMINAL TRESPASS | 199238 | True |
| 9 | DECEPTIVE PRACTICE | 313534 | True |
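
Because `.any()` only reports whether at least one arrest occurred, every crime type in the rows shown is `True`, so this table cannot distinguish types by how often they lead to arrests. Taking the mean of the boolean 'Arrest' column instead gives the fraction of incidents that ended in an arrest. The sketch below uses hypothetical toy rows; on the real data, `toy` would be replaced by `chicago_crime_clean`, and grouping by 'Community Area' is one assumed way of turning location into areas.

```python
import pandas as pd

# Toy incidents; the real input would be chicago_crime_clean.
toy = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "THEFT", "THEFT", "NARCOTICS", "NARCOTICS"],
    "Community Area": [35, 35, 60, 60, 35, 60],
    "Arrest": [True, False, False, False, True, True],
})

# The mean of a boolean column is the fraction of True values, i.e. the arrest rate.
by_type = (
    toy.groupby("Primary Type")["Arrest"]
    .agg(crime_count="size", arrest_rate="mean")
    .reset_index()
)
print(by_type)

# Arrest rate for each (community area, crime type) pair: areas as rows, types as columns.
by_area_type = toy.groupby(["Community Area", "Primary Type"])["Arrest"].mean().unstack("Primary Type")
print(by_area_type)
```

The per-type table keeps the incident count next to the rate, which helps flag rates computed from very few incidents.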


## Plots, Histograms, Figures
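
The line charts mentioned in the introduction need one arrest-rate series per crime type over time. A hedged sketch of that data preparation on hypothetical toy rows, assuming the same integer 'Year' and boolean 'Arrest' columns as in `chicago_crime_clean`:

```python
import pandas as pd

# Toy incidents spanning two years; the real input would be chicago_crime_clean.
toy = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2023, 2023, 2023],
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY", "BATTERY"],
    "Arrest": [True, False, True, False, False, True],
})

# One row per year, one column per crime type, values = arrest rate.
yearly = toy.groupby(["Year", "Primary Type"])["Arrest"].mean().unstack("Primary Type")
print(yearly)

# With matplotlib installed, yearly.plot() would draw one line per crime type.
```

Shaping the data as years by crime types means each column maps directly to one line on the chart.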

## Conclusion