In [1]:
%pip install holidays

Collecting holidays
  Downloading holidays-0.30-py3-none-any.whl (736 kB)
     -------------------------------------- 736.9/736.9 kB 9.3 MB/s eta 0:00:00
Installing collected packages: holidays
Successfully installed holidays-0.30
Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

# Load the data for 2021
file_path_2021 = "C:/Users/aharo/OneDrive/Documents/GitHub/Project-4---Chicago-Crime-Data/Data/DataChicago-Crime_2021.csv"
data_2021 = pd.read_csv(file_path_2021)

# Load the data for 2022
file_path_2022 = "C:/Users/aharo/OneDrive/Documents/GitHub/Project-4---Chicago-Crime-Data/Data/DataChicago-Crime_2022.csv"
data_2022 = pd.read_csv(file_path_2022)

# Display the first few rows of each dataset to get an overview
data_2021.head(), data_2022.head()

(         ID                    Date        Primary Type  \
 0  12259461  01/01/2021 01:00:00 AM             BATTERY   
 1  12258531  01/01/2021 01:00:00 AM   WEAPONS VIOLATION   
 2  12258697  01/01/2021 01:00:00 AM             BATTERY   
 3  13115666  01/01/2021 01:00:00 PM  DECEPTIVE PRACTICE   
 4  12259015  01/01/2021 01:00:00 AM     CRIMINAL DAMAGE   
 
                   Description         Location Description  Arrest  Domestic  \
 0                      SIMPLE                    RESIDENCE   False     False   
 1  RECKLESS FIREARM DISCHARGE  RESIDENCE - PORCH / HALLWAY   False     False   
 2     DOMESTIC BATTERY SIMPLE                HOTEL / MOTEL   False      True   
 3              COMPUTER FRAUD                    APARTMENT   False     False   
 4                 TO PROPERTY                    RESIDENCE   False     False   
 
    Beat  District  Ward   Latitude  Longitude  
 0   624       6.0   6.0  41.755932 -87.611409  
 1  1032      10.0  22.0  41.837774 -87.712169  
 2 
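Later cells convert `Date` with `pd.to_datetime` and no format, which makes pandas guess the layout and can emit a warning. One option is to parse the column once, at load time, with an explicit format. The sketch below assumes pandas 2.0 or newer (for the `date_format` argument of `read_csv`) and the `MM/DD/YYYY HH:MM:SS AM/PM` layout seen in the preview. `load_crime_csv` is a hypothetical helper, and an in-memory sample stands in for the real CSV file.

```python
import io

import pandas as pd

# Layout of the Date column as it appears in the preview, e.g. "01/01/2021 01:00:00 AM"
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def load_crime_csv(path_or_buffer):
    """Read a crime CSV and parse Date as datetime at load time (pandas >= 2.0)."""
    return pd.read_csv(path_or_buffer, parse_dates=["Date"], date_format=DATE_FORMAT)


# Tiny in-memory sample standing in for the real file
sample = io.StringIO(
    "ID,Date,Primary Type\n"
    "1,01/01/2021 01:00:00 PM,THEFT\n"
    "2,01/02/2021 11:30:00 AM,BATTERY\n"
)
df = load_crime_csv(sample)
print(df["Date"].dt.hour.tolist())  # [13, 11]
```

With the real files, `load_crime_csv(file_path_2021)` would replace the plain `read_csv` call, and later conversions would no longer need to parse strings.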

# Data Dictionary

- ID: Unique identifier for the crime record.
- Date: Date and time the crime occurred.
- Primary Type: Primary category of the crime.
- Description: More detailed description of the crime within its primary type.
- Location Description: Type of location where the crime occurred.
- Arrest: Whether an arrest was made (True/False).
- Domestic: Whether the crime was domestic-related (True/False).
- Beat: Police beat code (a beat is the smallest police geographic area).
- District: Police district code.
- Ward: City Council ward number.
- Latitude: Latitude coordinate of the crime location.
- Longitude: Longitude coordinate of the crime location.
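In the preview, `District` and `Ward` print as `6.0` and `22.0`. pandas typically stores an integer column as float when it contains missing values, because NumPy integer arrays cannot hold `NaN`. A minimal sketch, on a small made-up frame, of converting such columns to pandas' nullable `Int64` dtype so the codes display as whole numbers while gaps stay missing:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the third record is missing its District and Ward
df = pd.DataFrame({"District": [6.0, 10.0, np.nan], "Ward": [6.0, 22.0, np.nan]})

# Convert the float codes to the nullable integer dtype; NaN becomes <NA>
for col in ["District", "Ward"]:
    df[col] = df[col].astype("Int64")

print(df.dtypes)
print(df)
```

The cast should raise an error if a column holds non-whole floats, so it also doubles as a quick check that the codes really are integers.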

# Topic 1: Comparing Police Districts 

We'll start by analyzing the police districts to determine which district has the most crimes and which has the least.

## Q1: Which district has the most crimes? Which has the fewest?

In [3]:
# Grouping the data by the 'District' column and counting the number of crimes in each district for both 2021 and 2022
district_crimes_2021 = data_2021['District'].value_counts()
district_crimes_2022 = data_2022['District'].value_counts()

# Finding the districts with the most and least crimes for both years
most_crimes_district_2021 = district_crimes_2021.idxmax()
least_crimes_district_2021 = district_crimes_2021.idxmin()
most_crimes_district_2022 = district_crimes_2022.idxmax()
least_crimes_district_2022 = district_crimes_2022.idxmin()

most_crimes_district_2021, least_crimes_district_2021, most_crimes_district_2022, least_crimes_district_2022

(11.0, 31.0, 8.0, 31.0)

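Here are the results for Topic 1, comparing police districts:

For the year 2021:

- District with the most crimes: District 11
- District with the fewest crimes: District 31

For the year 2022:

- District with the most crimes: District 8
- District with the fewest crimes: District 31

District 31 has the fewest crimes in both years, while the district with the most crimes changed from District 11 in 2021 to District 8 in 2022. District 31 is not one of Chicago's 22 regular patrol districts, so its last-place ranking may reflect a rarely used code rather than an unusually safe area; its raw count is worth checking before drawing conclusions.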
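`idxmax` and `idxmin` return only the district labels. Showing the counts next to the labels makes it easier to judge how far apart the extremes are and to spot codes with only a handful of records. A minimal sketch on made-up data (the real call would pass `data_2021` or `data_2022`); `district_summary` is a hypothetical helper:

```python
import pandas as pd


def district_summary(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n busiest and n quietest districts with their crime counts."""
    counts = df["District"].value_counts()
    top = counts.nlargest(n).rename("crimes").to_frame().assign(group="most")
    bottom = counts.nsmallest(n).rename("crimes").to_frame().assign(group="fewest")
    return pd.concat([top, bottom])


# Hypothetical records: District 11 appears 4 times, 8 three times, 6 twice, 31 once
toy = pd.DataFrame({"District": [11.0] * 4 + [8.0] * 3 + [6.0] * 2 + [31.0]})
print(district_summary(toy, n=2))
```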

# Topic 2: Crimes Across the Years

We'll analyze the total number of crimes and see whether any individual crime types move in the opposite direction (e.g., decreasing while overall crime increases, or vice versa).

## Q2: Is the total number of crimes increasing or decreasing across the years? 

In [4]:
# Calculating the total number of crimes for both 2021 and 2022
total_crimes_2021 = len(data_2021)
total_crimes_2022 = len(data_2022)

# Finding the trend (increasing or decreasing) between 2021 and 2022
crime_trend = "increasing" if total_crimes_2022 > total_crimes_2021 else "decreasing"

total_crimes_2021, total_crimes_2022, crime_trend

(208763, 238742, 'increasing')

The analysis for Topic 2 reveals the following:

- Total number of crimes in 2021: 208,763
- Total number of crimes in 2022: 238,742

The total number of crimes increased from 2021 to 2022.
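The size of the increase is easier to compare with other results when expressed as a percentage change, (2022 - 2021) / 2021. A quick check using the totals reported above:

```python
total_2021 = 208_763
total_2022 = 238_742

# Percentage change relative to the 2021 baseline
pct_change = (total_2022 - total_2021) / total_2021 * 100
print(f"{total_2022 - total_2021:,} more crimes ({pct_change:.1f}% increase)")
# 29,979 more crimes (14.4% increase)
```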

## Q2.1: Are there any individual crimes that are doing the opposite?


To answer this question, we'll compare the counts of individual crime types between the two years to identify any that are showing a trend opposite to the overall increase.

In [6]:
# Grouping the data by 'Primary Type' and counting the number of occurrences for each crime type
crime_types_2021 = data_2021['Primary Type'].value_counts()
crime_types_2022 = data_2022['Primary Type'].value_counts()

# Combining the data for both years into a single DataFrame for comparison
crime_comparison = pd.concat([crime_types_2021, crime_types_2022], axis=1, keys=['2021', '2022']).fillna(0)

# Finding individual crimes that are doing the opposite of the overall trend (i.e., decreasing when overall is increasing)
opposite_trend_crimes = crime_comparison[(crime_comparison['2021'] > crime_comparison['2022'])]

opposite_trend_crimes

| Primary Type | 2021 | 2022 |
|---|---|---|
| DECEPTIVE PRACTICE | 17354 | 16352 |
| WEAPONS VIOLATION | 8980 | 8766 |
| NARCOTICS | 5337 | 4716 |
| OFFENSE INVOLVING CHILDREN | 1915 | 1873 |
| HOMICIDE | 810 | 725 |
| ARSON | 529 | 422 |
| OBSCENITY | 51 | 49 |
| GAMBLING | 13 | 9 |


The analysis reveals individual crimes that showed a decreasing trend from 2021 to 2022, even though the overall number of crimes increased:

- Deceptive Practice: Decreased from 17,354 to 16,352
- Weapons Violation: Decreased from 8,980 to 8,766
- Narcotics: Decreased from 5,337 to 4,716
- Offense Involving Children: Decreased from 1,915 to 1,873
- Homicide: Decreased from 810 to 725
- Arson: Decreased from 529 to 422
- Obscenity: Decreased from 51 to 49
- Gambling: Decreased from 13 to 9


These individual crimes did not follow the overall increasing trend in the number of crimes from 2021 to 2022.
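Ranking these decreases by percentage change, rather than by raw count, puts categories of very different sizes on one scale. A sketch using the counts from the table above, copied by hand:

```python
import pandas as pd

# Counts copied from the opposite-trend table above
counts = pd.DataFrame(
    {
        "2021": [17354, 8980, 5337, 1915, 810, 529, 51, 13],
        "2022": [16352, 8766, 4716, 1873, 725, 422, 49, 9],
    },
    index=[
        "DECEPTIVE PRACTICE", "WEAPONS VIOLATION", "NARCOTICS",
        "OFFENSE INVOLVING CHILDREN", "HOMICIDE", "ARSON", "OBSCENITY", "GAMBLING",
    ],
)

# Percentage change from 2021 to 2022, most negative first
counts["pct_change"] = ((counts["2022"] - counts["2021"]) / counts["2021"] * 100).round(1)
ranked = counts.sort_values("pct_change")
print(ranked)
```

Large percentage drops for rare offenses such as gambling (13 to 9 cases) rest on very few events, so they say less than a smaller drop in a high-volume category such as deceptive practice.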

# Topic 3: Comparing AM vs PM Rush Hour

## Q3: Are crimes more common during AM rush hour or PM rush hour?

We'll start by classifying the crimes into AM and PM rush hours based on the time they occurred. Then, we'll compare the counts to determine which rush hour has more crimes.

- AM rush hour: 7 AM - 10 AM
- PM rush hour: 4 PM - 7 PM


Let's begin with this analysis for both 2021 and 2022.

In [7]:
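# Function to classify crimes into AM and PM rush hours
# Note: this converts 'Date' and adds an 'Hour' column to `data` in place; later cells rely on both
def classify_rush_hours(data):
    # Convert the 'Date' column to datetime with the explicit format seen in the data
    # (e.g. "01/01/2021 01:00:00 AM"), so pandas does not have to infer the format
    data['Date'] = pd.to_datetime(data['Date'], format='%m/%d/%Y %I:%M:%S %p')
    
    # Extract the hour (0-23) from the 'Date' column
    data['Hour'] = data['Date'].dt.hour
    
    # Classify crimes into AM (7:00-9:59) and PM (16:00-18:59) rush hours
    am_rush_hour = data[(data['Hour'] >= 7) & (data['Hour'] < 10)]
    pm_rush_hour = data[(data['Hour'] >= 16) & (data['Hour'] < 19)]
    
    return len(am_rush_hour), len(pm_rush_hour)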

# Classify crimes for both 2021 and 2022
am_rush_hour_2021, pm_rush_hour_2021 = classify_rush_hours(data_2021)
am_rush_hour_2022, pm_rush_hour_2022 = classify_rush_hours(data_2022)

am_rush_hour_2021, pm_rush_hour_2021, am_rush_hour_2022, pm_rush_hour_2022


(21275, 31859, 24444, 37097)

Here are the results for Topic 3, comparing AM vs. PM rush hour crimes:

1. For the year 2021:

- AM rush hour crimes (7 AM - 10 AM): 21,275
- PM rush hour crimes (4 PM - 7 PM): 31,859


2. For the year 2022:

- AM rush hour crimes (7 AM - 10 AM): 24,444
- PM rush hour crimes (4 PM - 7 PM): 37,097


From this analysis, we can conclude that crimes are more common during PM rush hour (4 PM - 7 PM) in both 2021 and 2022.
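Both windows are three hours long, so the raw counts compare directly: PM rush hour had roughly 1.5 times as many crimes as AM rush hour in both years (31,859 / 21,275 ≈ 1.50 and 37,097 / 24,444 ≈ 1.52). The same hour filters are repeated in each of the following cells; a sketch of labelling every record with its window in one pass is below. The labels `"AM rush"`, `"PM rush"` and `"other"` and the helper `label_rush_hour` are hypothetical, and a short Series of hours stands in for `data['Hour']`.

```python
import numpy as np
import pandas as pd


def label_rush_hour(hours: pd.Series) -> pd.Series:
    """Label each hour as AM rush (7-9), PM rush (16-18) or other."""
    conditions = [
        hours.between(7, 9),    # 7:00-9:59, same as (Hour >= 7) & (Hour < 10)
        hours.between(16, 18),  # 16:00-18:59, same as (Hour >= 16) & (Hour < 19)
    ]
    labels = np.select(conditions, ["AM rush", "PM rush"], default="other")
    return pd.Series(labels, index=hours.index, name="Window")


hours = pd.Series([6, 7, 9, 10, 16, 18, 19])
print(label_rush_hour(hours).value_counts())
```

With the real data, `data['Window'] = label_rush_hour(data['Hour'])` followed by `data['Window'].value_counts()` should reproduce the two counts, and the column can be reused for grouping afterwards.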

## Q3.1: What are the top 5 most common crimes during AM rush hour? What are the top 5 most common crimes during PM rush hour?


Next, we'll identify the top 5 most common crimes during both AM and PM rush hours for the years 2021 and 2022. Let's proceed with this analysis.

In [8]:
# Function to find the top 5 most common crimes during AM and PM rush hours
def top_5_crimes(data):
    # AM rush hour crimes
    am_rush_hour = data[(data['Hour'] >= 7) & (data['Hour'] < 10)]
    top_5_am = am_rush_hour['Primary Type'].value_counts().head(5)

    # PM rush hour crimes
    pm_rush_hour = data[(data['Hour'] >= 16) & (data['Hour'] < 19)]
    top_5_pm = pm_rush_hour['Primary Type'].value_counts().head(5)

    return top_5_am, top_5_pm

# Find the top 5 most common crimes during AM and PM rush hours for both 2021 and 2022
top_5_am_2021, top_5_pm_2021 = top_5_crimes(data_2021)
top_5_am_2022, top_5_pm_2022 = top_5_crimes(data_2022)

top_5_am_2021, top_5_pm_2021, top_5_am_2022, top_5_pm_2022

(Primary Type
 THEFT                 4080
 BATTERY               3552
 DECEPTIVE PRACTICE    3047
 CRIMINAL DAMAGE       2407
 ASSAULT               2285
 Name: count, dtype: int64,
 Primary Type
 THEFT                 7492
 BATTERY               6049
 CRIMINAL DAMAGE       3627
 ASSAULT               3599
 DECEPTIVE PRACTICE    2176
 Name: count, dtype: int64,
 Primary Type
 THEFT                 5701
 BATTERY               3811
 CRIMINAL DAMAGE       2753
 DECEPTIVE PRACTICE    2601
 ASSAULT               2323
 Name: count, dtype: int64,
 Primary Type
 THEFT                  9790
 BATTERY                6116
 CRIMINAL DAMAGE        4061
 ASSAULT                3726
 MOTOR VEHICLE THEFT    3557
 Name: count, dtype: int64)

Here are the top 5 most common crimes during AM and PM rush hours:

For the year 2021:

- AM rush hour (7 AM - 10 AM):
  1. Theft: 4,080
  2. Battery: 3,552
  3. Deceptive Practice: 3,047
  4. Criminal Damage: 2,407
  5. Assault: 2,285
- PM rush hour (4 PM - 7 PM):
  1. Theft: 7,492
  2. Battery: 6,049
  3. Criminal Damage: 3,627
  4. Assault: 3,599
  5. Deceptive Practice: 2,176

For the year 2022:

- AM rush hour (7 AM - 10 AM):
  1. Theft: 5,701
  2. Battery: 3,811
  3. Criminal Damage: 2,753
  4. Deceptive Practice: 2,601
  5. Assault: 2,323
- PM rush hour (4 PM - 7 PM):
  1. Theft: 9,790
  2. Battery: 6,116
  3. Criminal Damage: 4,061
  4. Assault: 3,726
  5. Motor Vehicle Theft: 3,557


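Theft and battery are the two most common crimes in every window. Deceptive practice ranks higher in the morning (third in 2021, fourth in 2022) than in the evening (fifth in 2021), and in 2022 motor vehicle theft enters the PM top 5, displacing deceptive practice. Every crime type listed is more frequent in the PM window than in the AM window, except deceptive practice.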
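The top-5 lists can also be produced in one grouped call rather than by filtering each window separately. A minimal sketch, assuming each record already carries a window label in a hypothetical `Window` column (for example, derived from `Hour`); `top_n_by_window` is a hypothetical helper and the records are made up:

```python
import pandas as pd


def top_n_by_window(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Top-n crime types within each window, as one Series indexed by (Window, Primary Type)."""
    counts = df.groupby("Window")["Primary Type"].value_counts()
    # Sort all counts descending, then keep the first n rows of each window
    return counts.sort_values(ascending=False).groupby(level="Window").head(n)


# Hypothetical records already labelled with a window
toy = pd.DataFrame({
    "Window": ["AM"] * 5 + ["PM"] * 5,
    "Primary Type": ["THEFT", "THEFT", "THEFT", "BATTERY", "ASSAULT",
                     "BATTERY", "BATTERY", "BATTERY", "THEFT", "ASSAULT"],
})
print(top_n_by_window(toy, n=1))
```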

## Q3.2: Are Motor Vehicle Thefts more common during AM rush hour or PM rush hour?

We'll proceed with the analysis to determine whether Motor Vehicle Thefts are more common during AM rush hour (7 AM - 10 AM) or PM rush hour (4 PM - 7 PM) for both 2021 and 2022.

In [9]:
# Function to compare Motor Vehicle Thefts during AM and PM rush hours
def motor_vehicle_thefts(data):
    # Filter for Motor Vehicle Thefts
    motor_vehicle_thefts_data = data[data['Primary Type'] == 'MOTOR VEHICLE THEFT']

    # AM rush hour crimes
    am_rush_hour = motor_vehicle_thefts_data[(motor_vehicle_thefts_data['Hour'] >= 7) & (motor_vehicle_thefts_data['Hour'] < 10)]

    # PM rush hour crimes
    pm_rush_hour = motor_vehicle_thefts_data[(motor_vehicle_thefts_data['Hour'] >= 16) & (motor_vehicle_thefts_data['Hour'] < 19)]

    return len(am_rush_hour), len(pm_rush_hour)

# Compare Motor Vehicle Thefts during AM and PM rush hours for both 2021 and 2022
motor_vehicle_thefts_2021_am, motor_vehicle_thefts_2021_pm = motor_vehicle_thefts(data_2021)
motor_vehicle_thefts_2022_am, motor_vehicle_thefts_2022_pm = motor_vehicle_thefts(data_2022)

motor_vehicle_thefts_2021_am, motor_vehicle_thefts_2021_pm, motor_vehicle_thefts_2022_am, motor_vehicle_thefts_2022_pm

(1039, 1741, 2174, 3557)

Here are the results for Motor Vehicle Thefts during AM and PM rush hours:

- For the year 2021:

1. AM rush hour (7 AM - 10 AM): 1,039
2. PM rush hour (4 PM - 7 PM): 1,741

- For the year 2022:

1. AM rush hour (7 AM - 10 AM): 2,174
2. PM rush hour (4 PM - 7 PM): 3,557


From this analysis, we can conclude that Motor Vehicle Thefts are more common during PM rush hour (4 PM - 7 PM) in both 2021 and 2022.
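The same four numbers also show motor vehicle thefts roughly doubling between 2021 and 2022 in both windows, while the PM-to-AM ratio stayed between about 1.6 and 1.7. A quick way to compute both from the reported counts:

```python
import pandas as pd

# Motor vehicle theft counts reported above
mvt = pd.DataFrame(
    {"AM rush": [1039, 2174], "PM rush": [1741, 3557]},
    index=[2021, 2022],
)

# Year-over-year growth in each window, in percent
growth = (mvt.loc[2022] / mvt.loc[2021] - 1) * 100
# PM-to-AM ratio within each year
pm_to_am = mvt["PM rush"] / mvt["AM rush"]

print(growth.round(1))    # AM rush 109.2, PM rush 104.3
print(pm_to_am.round(2))  # 2021 1.68, 2022 1.64
```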

# Topic 4: Comparing Months 

We'll analyze which months have the most and least crime and see if there are any individual crimes that do not follow this pattern.

## Q4: What months have the most crime? What months have the least?


We'll start by grouping the crimes by month for both 2021 and 2022 to identify the months with the most and least crimes.

In [10]:
# Function to group crimes by month and find the months with the most and least crimes
def crimes_by_month(data):
    # Extract the month from the 'Date' column
    data['Month'] = data['Date'].dt.month
    
    # Group by month and count the crimes
    crimes_per_month = data['Month'].value_counts().sort_index()
    
    # Find the months with the most and least crimes
    most_crimes_month = crimes_per_month.idxmax()
    least_crimes_month = crimes_per_month.idxmin()

    return most_crimes_month, least_crimes_month, crimes_per_month

# Analyzing crimes by month for both 2021 and 2022
most_crimes_month_2021, least_crimes_month_2021, crimes_per_month_2021 = crimes_by_month(data_2021)
most_crimes_month_2022, least_crimes_month_2022, crimes_per_month_2022 = crimes_by_month(data_2022)

most_crimes_month_2021, least_crimes_month_2021, most_crimes_month_2022, least_crimes_month_2022

(10, 2, 10, 2)

The analysis for Topic 4 reveals the following:

- For the year 2021:

1. Month with the most crimes: October (10th month)
2. Month with the least crimes: February (2nd month)


- For the year 2022:

1. Month with the most crimes: October (10th month)
2. Month with the least crimes: February (2nd month)


It's interesting to observe that both years have the same months for the most and least crimes.
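February has 28 days (29 in a leap year) while October has 31, so part of February's low total may simply reflect its length. Dividing each month's count by its number of days gives crimes per day, which compares months on an equal footing. A minimal sketch; `avg_crimes_per_day_by_month` is a hypothetical helper and the dates are made up:

```python
import calendar

import pandas as pd


def avg_crimes_per_day_by_month(dates: pd.Series, year: int) -> pd.Series:
    """Crimes per day for each month (1-12) of `year`, correcting for month length."""
    counts = dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
    days = pd.Series({m: calendar.monthrange(year, m)[1] for m in range(1, 13)})
    return counts / days


# Hypothetical data: exactly one crime per day in February and in October 2021
dates = pd.Series(
    pd.date_range("2021-02-01", "2021-02-28").append(pd.date_range("2021-10-01", "2021-10-31"))
)
print(dates.dt.month.value_counts().sort_index())         # raw counts: 28 vs 31
print(avg_crimes_per_day_by_month(dates, 2021)[[2, 10]])  # both 1.0 per day
```

With the real data, the call would be `avg_crimes_per_day_by_month(data_2021['Date'], 2021)`.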

## Q4.1: Are there any individual crimes that do not follow this pattern? If so, which crimes?


Next, we'll analyze individual crimes by month to identify any that do not follow the pattern of most and least crimes in October and February, respectively.

In [11]:
# Function to find individual crimes that do not follow the pattern of most and least crimes in October and February
def individual_crimes_pattern(data):
    # Group by 'Primary Type' and 'Month', and count the occurrences
    crimes_by_type_month = data.groupby(['Primary Type', 'Month']).size().reset_index(name='Count')

    # Pivot the DataFrame to have months as columns
    crimes_by_type_month_pivot = crimes_by_type_month.pivot_table(index='Primary Type', columns='Month', values='Count', fill_value=0)

    # Find crimes that do not have the most occurrences in October (10) or least occurrences in February (2)
    opposite_pattern_crimes = crimes_by_type_month_pivot[(crimes_by_type_month_pivot.idxmax(axis=1) != 10) | (crimes_by_type_month_pivot.idxmin(axis=1) != 2)]

    return opposite_pattern_crimes

# Analyzing individual crimes that do not follow the pattern for both 2021 and 2022
opposite_pattern_crimes_2021 = individual_crimes_pattern(data_2021)
opposite_pattern_crimes_2022 = individual_crimes_pattern(data_2022)

opposite_pattern_crimes_2021, opposite_pattern_crimes_2022

(Month                                1     2     3     4     5     6     7   \
 Primary Type                                                                  
 ARSON                                39    23    49    41    49    49    51   
 ASSAULT                            1339  1284  1452  1601  1800  1905  1956   
 BATTERY                            2776  2439  3101  3077  3679  3845  3886   
 CONCEALED CARRY LICENSE VIOLATION    11    15    14    15    12    18    16   
 CRIMINAL DAMAGE                    1727  1457  1896  1948  2287  2317  2489   
 CRIMINAL SEXUAL ASSAULT              86    95   130   114   133   147   147   
 CRIMINAL TRESPASS                   254   214   279   271   272   279   308   
 DECEPTIVE PRACTICE                 2587  1627  1640  1225  1272  1500  1329   
 GAMBLING                              0     0     1     2     1     3     1   
 HOMICIDE                             55    39    45    55    66    84   112   
 HUMAN TRAFFICKING                     2

The analysis for Topic 4 has revealed that many individual crimes do not follow the pattern of having the most occurrences in October and the least in February.

Here are some examples for the year 2021:

- Arson: Most occurrences in July, least in February.
- Assault: Most occurrences in August, least in February.
- Battery: Most occurrences in July, least in February.
- Narcotics: Most occurrences in March, least in October.
- Theft: Most occurrences in August, least in February.


Many other crime types deviate as well, and the 2022 results show a similar picture.

This detailed analysis shows that while crime overall peaks in October and dips in February, the pattern does not hold for every crime type: individual crime types follow their own seasonal patterns.
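The printed pivot table is wide and truncated, which makes it hard to read off each crime type's own peak and low month. A compact alternative is one row per crime type with the names of its busiest and quietest months. A minimal sketch, assuming a `Month` column like the one added in the monthly analysis; `peak_and_low_months` is a hypothetical helper and the records are invented for illustration:

```python
import calendar

import pandas as pd

# Map month number -> abbreviated name, e.g. 7 -> "Jul"
MONTH_ABBR = dict(enumerate(calendar.month_abbr))


def peak_and_low_months(df: pd.DataFrame) -> pd.DataFrame:
    """For each crime type, the month with the most and the fewest records."""
    # Rows: crime type; columns: months 1-12 (months with no records filled with 0)
    table = pd.crosstab(df["Primary Type"], df["Month"]).reindex(
        columns=range(1, 13), fill_value=0
    )
    # idxmax/idxmin return the first month when there is a tie
    return pd.DataFrame({
        "peak_month": table.idxmax(axis=1).map(MONTH_ABBR),
        "low_month": table.idxmin(axis=1).map(MONTH_ABBR),
    })


# Hypothetical records: one per month, plus extra July arsons and no February arson,
# and extra March narcotics cases with no October case
arson = [m for m in range(1, 13) if m != 2] + [7, 7]
narcotics = [m for m in range(1, 13) if m != 10] + [3, 3]
toy = pd.DataFrame({
    "Primary Type": ["ARSON"] * len(arson) + ["NARCOTICS"] * len(narcotics),
    "Month": arson + narcotics,
})
print(peak_and_low_months(toy))
```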

# Topic 5: Comparing Holidays

## Q5: Are there any holidays that show an increase in the number of crimes?

We'll use the holidays package to get the holidays that apply to Chicago (US federal holidays plus Illinois state holidays) for both 2021 and 2022. Then we'll compare the average number of crimes per day on those holidays with the average on non-holidays to see whether there is any noticeable increase or decrease.

In [12]:
# Importing the holidays package
import holidays

# Function to compare the average number of crimes per day on holidays vs. non-holidays
def crimes_on_holidays(data, year):
    # Holidays for Chicago: US federal holidays plus Illinois state holidays
    # (`subdiv` replaces the older, deprecated `state` argument)
    us_holidays = holidays.US(years=year, observed=True, subdiv='IL')
    
    # Mark the rows whose calendar date is a holiday
    data['Is_Holiday'] = data['Date'].dt.date.isin(list(us_holidays))
    
    # Count crimes on holidays and on all other days
    holiday_count = data['Is_Holiday'].sum()
    non_holiday_count = (~data['Is_Holiday']).sum()

    # Number of days in the year (365, or 366 in a leap year)
    days_in_year = pd.Timestamp(year=year, month=12, day=31).dayofyear

    # Calculating the average crimes per day for holidays and non-holidays
    avg_crimes_holidays = holiday_count / len(us_holidays)
    avg_crimes_non_holidays = non_holiday_count / (days_in_year - len(us_holidays))

    return avg_crimes_holidays, avg_crimes_non_holidays

# Analyzing crimes on holidays for both 2021 and 2022
avg_crimes_holidays_2021, avg_crimes_non_holidays_2021 = crimes_on_holidays(data_2021, 2021)
avg_crimes_holidays_2022, avg_crimes_non_holidays_2022 = crimes_on_holidays(data_2022, 2022)

avg_crimes_holidays_2021, avg_crimes_non_holidays_2021, avg_crimes_holidays_2022, avg_crimes_non_holidays_2022

(570.7647058823529, 572.0114942528736, 630.8823529411765, 655.221264367816)

- For the year 2021:

1. Average crimes on holidays: 570.76
2. Average crimes on non-holidays: 572.01


- For the year 2022:

1. Average crimes on holidays: 630.88
2. Average crimes on non-holidays: 655.22


In both years the average number of crimes on holidays is close to the non-holiday average, and in fact slightly lower: by about 1.2 crimes per day in 2021 (570.76 vs. 572.01) and about 24 crimes per day in 2022 (630.88 vs. 655.22). No statistical test was run, so these gaps should not be read as significant.

This suggests that holidays, taken as a group, do not noticeably raise the number of crimes in Chicago, at least for 2021 and 2022. Because all holidays are pooled into one average, it does not rule out an increase on an individual holiday.
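Q5 asks whether *any* holiday shows an increase, while the averages above pool all holidays together, so one busy holiday can be hidden by quieter ones. A per-holiday view compares each holiday's count with the average non-holiday day. A minimal sketch; `crimes_per_holiday` is a hypothetical helper that accepts any mapping of `datetime.date` to holiday name, such as the object returned by `holidays.US(years=2021, subdiv='IL')`. It assumes every day of the year has at least one recorded crime, since days with zero crimes would otherwise be missing from the baseline.

```python
import pandas as pd


def crimes_per_holiday(dates: pd.Series, holiday_map) -> pd.DataFrame:
    """Compare each holiday's crime count with the average non-holiday day.

    `holiday_map` maps datetime.date -> holiday name. Assumes every day of the
    year has at least one record, so no zero-crime days are missing.
    """
    daily = dates.dt.date.value_counts()  # crimes per calendar day
    is_holiday = daily.index.isin(list(holiday_map))
    baseline = daily[~is_holiday].mean()  # average crimes on a non-holiday day

    rows = [
        {"holiday": name, "date": day, "crimes": int(daily.get(day, 0))}
        for day, name in sorted(holiday_map.items())
    ]
    result = pd.DataFrame(rows)
    result["vs_non_holiday_avg"] = result["crimes"] - baseline
    return result.sort_values("vs_non_holiday_avg", ascending=False).reset_index(drop=True)


# Hypothetical records: 5 crimes on July 4, 2 on July 5, 4 on July 6 (2021)
dates = pd.Series(pd.to_datetime(["2021-07-04"] * 5 + ["2021-07-05"] * 2 + ["2021-07-06"] * 4))
holiday_map = {pd.Timestamp("2021-07-04").date(): "Independence Day"}
print(crimes_per_holiday(dates, holiday_map))
```

With the real data, the call would be `crimes_per_holiday(data_2021['Date'], holidays.US(years=2021, subdiv='IL'))`. Holidays at the top of the sorted result are the candidates for an increase, though a single day per holiday is a small sample.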