[Watch the YouTube Video](https://www.youtube.com/watch?v=8z5jsznoB0A)

![Description of the graphic](graphic.jpg)

[All data available here](https://opendata.minneapolismn.gov/)


## Description

This notebook analyzes Minneapolis crime data to determine whether there was any period before April 28, 2023, in which the homicide rate decreased by 43%. It compares homicide data across different time intervals, using data visualization and statistical methods to identify any significant reduction in the rate. The analysis groups the relevant incidents and calculates percentage changes to verify the accuracy of the claim.
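The percentage change mentioned above can be written as a small helper. This is a minimal sketch, not the notebook's own code: the function name `percent_change` and the example counts are illustrative assumptions.

```python
def percent_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    if before == 0:
        # A percentage change has no meaning when the baseline is zero.
        raise ValueError("percent change is undefined when the baseline is zero")
    return (after - before) / before * 100.0


# Hypothetical counts for illustration only (not taken from the dataset).
print(round(percent_change(100, 57), 1))  # -43.0
```

A negative result is a decrease, so a 43% drop shows up as a value of about -43.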


In [15]:
import pandas as pd
import plotly.express as px

# Step 1: Load the data
df_2022 = pd.read_csv('Police_Incidents_2022.csv')
df_2023 = pd.read_csv('Police_Incidents_2023.csv')

In [16]:
# Get a sorted list of the unique incident descriptions
unique_descriptions = sorted(df_2022['description'].unique())
print(unique_descriptions)


['1ST DEG DOMES ASLT', '2ND DEG DOMES ASLT', '3RD DEG DOMES ASLT', 'ACCESS/ALTER SYSTEM/NETWORK', 'ADULTERATION - HARM/ILLNESS', 'ADULTERATION - NO HARM', 'ARSON', 'ARSON-1ST DEGREE', 'ARSON-3RD DEGREE', 'ARSON-4TH DEGREE', 'ASLT-GREAT BODILY HM', 'ASLT-SGNFCNT BDLY HM', 'ASLT4-LESS THAN SUBST HARM', 'ASLT4-SUBST HARM OR WEAPON', 'ASSLT W/DNGRS WEAPON', 'AUTOMOBILE THEFT', 'BIKE THEFT', 'BURGLARY OF BUSINESS', 'BURGLARY OF DWELLING', 'CSC - PENETRATE WITH OBJECT', 'CSC - RAPE', 'CSC - SODOMY', 'DISARM A POLICE OFFICER', 'DOMESTIC ASSAULT/STRANGULATION', 'FAIL TO PAY - TAXI/HOTEL/REST', 'GAS STATION DRIV-OFF', 'HACKING - THEFT OF SERVICE', 'JUSTIFIABLE HOMICIDE', 'MURDER (GENERAL)', 'OBS - COMPUTER HACKING', 'OBS - PETTY THEFT', 'ONLINE THEFT', 'OTHER THEFT', 'OTHER VEHICLE THEFT', 'POCKET-PICKING', 'RAPE - VULNERABLE ADULT', 'ROBBERY INCLUDING AUTO THEFT', 'ROBBERY OF BUSINESS', 'ROBBERY OF PERSON', 'ROBBERY PER AGG', 'ROOFIE / DRUGS TO COMMIT CRIME', 'SCRAPPING-RECYCLING THEFT', 'SEX 
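The list above contains two homicide-related labels, 'JUSTIFIABLE HOMICIDE' and 'MURDER (GENERAL)'. As a rough sketch, labels like these can be picked out with a keyword match instead of by eye. The short `descriptions` list below is a hand-copied subset of the printed output, and the `keywords` tuple is an assumption about which terms matter.

```python
# A few of the description strings printed above; the full list is longer.
descriptions = [
    'ARSON',
    'JUSTIFIABLE HOMICIDE',
    'MURDER (GENERAL)',
    'OTHER THEFT',
    'ROBBERY OF PERSON',
]

# Assumed search terms for homicide-related categories.
keywords = ('MURDER', 'HOMICIDE')

# Keep every description that contains at least one keyword.
homicide_like = [d for d in descriptions if any(k in d for k in keywords)]
print(homicide_like)  # ['JUSTIFIABLE HOMICIDE', 'MURDER (GENERAL)']
```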

In [17]:
# Show the unique descriptions as a DataFrame (in order of first appearance, not sorted)
unique_descriptions_2022 = pd.DataFrame(df_2022['description'].unique(), columns=['description'])
print(unique_descriptions_2022)


                       description
0                      OTHER THEFT
1             THEFT-MOTR VEH PARTS
2             THEFT FROM MOTR VEHC
3                       BIKE THEFT
4                      SHOPLIFTING
5             BURGLARY OF DWELLING
6                 THEFT BY SWINDLE
7             BURGLARY OF BUSINESS
8                ROBBERY OF PERSON
9                 AUTOMOBILE THEFT
10             THEFT FROM BUILDING
11                    ONLINE THEFT
12                 ROBBERY PER AGG
13   FAIL TO PAY - TAXI/HOTEL/REST
14             ROBBERY OF BUSINESS
15            ASSLT W/DNGRS WEAPON
16                      CSC - RAPE
17              2ND DEG DOMES ASLT
18    ROBBERY INCLUDING AUTO THEFT
19              3RD DEG DOMES ASLT
20            ASLT-SGNFCNT BDLY HM
21  DOMESTIC ASSAULT/STRANGULATION
22   THEFT FROM PERSON SNATCH/GRAB
23     TRESPASSED - BURG BUISINESS
24                MURDER (GENERAL)
25      ASLT4-LESS THAN SUBST HARM
26                           ARSON
27                  

In [19]:
# Let Pandas infer the date format automatically
df_2022['reportedDate'] = pd.to_datetime(df_2022['reportedDate'], errors='coerce')
df_2023['reportedDate'] = pd.to_datetime(df_2023['reportedDate'], errors='coerce')

# Add a new column "DayOfYear" based on the reportedDate column
df_2022['DayOfYear'] = df_2022['reportedDate'].dt.dayofyear
df_2023['DayOfYear'] = df_2023['reportedDate'].dt.dayofyear

# Display the updated DataFrames with 'DayOfYear'
print(df_2022[['reportedDate', 'DayOfYear']])
print(df_2023[['reportedDate', 'DayOfYear']])


      reportedDate  DayOfYear
0       2022-06-28        179
1       2022-06-28        179
2       2022-06-28        179
3       2022-06-28        179
4       2022-06-28        179
...            ...        ...
26521   2022-11-01        305
26522   2022-11-01        305
26523   2022-11-01        305
26524   2022-11-01        305
26525   2022-11-01        305

[26526 rows x 2 columns]
                   reportedDate  DayOfYear
0     2023-02-03 00:00:00+00:00         34
1     2023-02-13 00:00:00+00:00         44
2     2023-02-15 00:00:00+00:00         46
3     2023-02-28 00:00:00+00:00         59
4     2023-02-27 00:00:00+00:00         58
...                         ...        ...
26008 2023-12-27 00:00:00+00:00        361
26009 2023-12-27 00:00:00+00:00        361
26010 2023-12-27 00:00:00+00:00        361
26011 2023-12-27 00:00:00+00:00        361
26012 2023-10-22 00:00:00+00:00        295

[26013 rows x 2 columns]
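The two printouts above differ: the 2022 dates were parsed as timezone-naive, while the 2023 dates carry a `+00:00` (UTC) offset. Mixing the two can cause errors when the columns are compared with each other or combined. The sketch below shows one possible way to bring both to the same naive form. The helper name `to_naive_utc` and the sample values are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Hypothetical sample values mimicking the two styles seen in the output above.
naive = pd.Series(pd.to_datetime(["2022-06-28", "2022-11-01"]))
aware = pd.Series(pd.to_datetime(["2023-02-03T00:00:00+00:00",
                                  "2023-12-27T00:00:00+00:00"], utc=True))


def to_naive_utc(dates: pd.Series) -> pd.Series:
    """Return the datetimes in UTC with the timezone information dropped."""
    if dates.dt.tz is not None:
        # Convert to UTC first, then strip the timezone.
        return dates.dt.tz_convert("UTC").dt.tz_localize(None)
    return dates  # already timezone-naive


naive_2022 = to_naive_utc(naive)
naive_2023 = to_naive_utc(aware)
print(naive_2022.dtype, naive_2023.dtype)
print(naive_2023.dt.dayofyear.tolist())  # [34, 361]
```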




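## Deprecated code

This earlier version of the filtering cell assigned `DayOfYear` directly on the filtered DataFrames, which can trigger pandas' `SettingWithCopyWarning`. It is kept for reference only.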

``` python
# Define the end date for filtering
end_date = '2022-04-28'

# Filter both DataFrames for 'MURDER (GENERAL)' and 'JUSTIFIABLE HOMICIDE', with reportedDate on or before April 28
filtered_df_2022 = df_2022[
    (df_2022['description'].isin(['MURDER (GENERAL)', 'JUSTIFIABLE HOMICIDE'])) &
    (df_2022['reportedDate'] <= end_date)
]

# Do the same for df_2023 with the corresponding end date for 2023
end_date_2023 = '2023-04-28'
filtered_df_2023 = df_2023[
    (df_2023['description'].isin(['MURDER (GENERAL)', 'JUSTIFIABLE HOMICIDE'])) &
    (df_2023['reportedDate'] <= end_date_2023)
]

# Add the "DayOfYear" column to both filtered DataFrames
filtered_df_2022['DayOfYear'] = filtered_df_2022['reportedDate'].dt.dayofyear
filtered_df_2023['DayOfYear'] = filtered_df_2023['reportedDate'].dt.dayofyear

# Display the filtered DataFrames with the "DayOfYear" column
print(filtered_df_2022[['reportedDate', 'description', 'DayOfYear']])
print(filtered_df_2023[['reportedDate', 'description', 'DayOfYear']])
```


In [21]:
# Define the end date for filtering
end_date = '2022-04-28'

# Filter both DataFrames for 'MURDER (GENERAL)' and 'JUSTIFIABLE HOMICIDE', with reportedDate on or before April 28.
# .copy() makes each result an independent DataFrame, so the column assignment
# below does not raise SettingWithCopyWarning.
filtered_df_2022 = df_2022[
    (df_2022['description'].isin(['MURDER (GENERAL)', 'JUSTIFIABLE HOMICIDE'])) &
    (df_2022['reportedDate'] <= end_date)
].copy()

# Do the same for df_2023 with the corresponding end date for 2023
end_date_2023 = '2023-04-28'
filtered_df_2023 = df_2023[
    (df_2023['description'].isin(['MURDER (GENERAL)', 'JUSTIFIABLE HOMICIDE'])) &
    (df_2023['reportedDate'] <= end_date_2023)
].copy()

# Set the "DayOfYear" column on the filtered copies
filtered_df_2022.loc[:, 'DayOfYear'] = filtered_df_2022['reportedDate'].dt.dayofyear
filtered_df_2023.loc[:, 'DayOfYear'] = filtered_df_2023['reportedDate'].dt.dayofyear

# Display the filtered DataFrames with the "DayOfYear" column
print(filtered_df_2022[['reportedDate', 'description', 'DayOfYear']])
print(filtered_df_2023[['reportedDate', 'description', 'DayOfYear']])


      reportedDate           description  DayOfYear
3040    2022-04-07      MURDER (GENERAL)         97
3083    2022-04-07      MURDER (GENERAL)         97
3441    2022-04-16      MURDER (GENERAL)        106
3497    2022-04-17      MURDER (GENERAL)        107
3662    2022-04-20      MURDER (GENERAL)        110
3723    2022-04-22      MURDER (GENERAL)        112
3785    2022-04-23      MURDER (GENERAL)        113
3819    2022-04-24      MURDER (GENERAL)        114
3823    2022-04-24      MURDER (GENERAL)        114
4025    2022-04-27      MURDER (GENERAL)        117
4084    2022-01-14      MURDER (GENERAL)         14
9806    2022-02-27  JUSTIFIABLE HOMICIDE         58
10272   2022-02-02  JUSTIFIABLE HOMICIDE         33
12938   2022-01-20      MURDER (GENERAL)         20
19557   2022-02-05      MURDER (GENERAL)         36
19789   2022-02-09      MURDER (GENERAL)         40
19861   2022-02-10      MURDER (GENERAL)         41
20122   2022-02-16      MURDER (GENERAL)         47
20437   2022
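With a `DayOfYear` value for each filtered incident, one way to look for a period showing a 43% drop is to compare year-to-date counts at every cutoff day up to April 28 (day 118 in a non-leap year). The sketch below assumes the `DayOfYear` values have been pulled into plain lists, for example with `.tolist()`. The function name `year_to_date_changes` and the day values are hypothetical, not the real incident data. It compares raw counts, not population-adjusted rates.

```python
from bisect import bisect_right


def year_to_date_changes(days_prev, days_curr, last_day):
    """Map each cutoff day to the percent change in cumulative incident counts."""
    prev_sorted = sorted(days_prev)
    curr_sorted = sorted(days_curr)
    changes = {}
    for cutoff in range(1, last_day + 1):
        # Count incidents reported on or before the cutoff day in each year.
        prev_count = bisect_right(prev_sorted, cutoff)
        curr_count = bisect_right(curr_sorted, cutoff)
        if prev_count == 0:
            continue  # percent change is undefined with a zero baseline
        changes[cutoff] = (curr_count - prev_count) / prev_count * 100.0
    return changes


# Hypothetical day-of-year values for two consecutive years.
days_prev = [5, 12, 20, 31, 44, 60, 75]
days_curr = [8, 25, 50, 70]

changes = year_to_date_changes(days_prev, days_curr, last_day=118)

# Cutoff days where the year-to-date count fell by at least 43%.
cutoffs_meeting_claim = [day for day, change in changes.items() if change <= -43.0]
print(round(changes[118], 1))  # -42.9
```

With these made-up values, the drop at day 118 falls just short of 43%, while several earlier cutoffs exceed it, which shows how sensitive the result is to the chosen end date.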

In [23]:
filtered_df_2022.info()
filtered_df_2023.info()

<class 'pandas.core.frame.DataFrame'>
Index: 28 entries, 3040 to 22165
Data columns (total 24 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   publicaddress      28 non-null     object        
 1   caseNumber         28 non-null     object        
 2   precinct           28 non-null     object        
 3   reportedDate       28 non-null     datetime64[ns]
 4   reportedTime       28 non-null     int64         
 5   beginDate          28 non-null     object        
 6   reportedDateTime   28 non-null     object        
 7   beginTime          28 non-null     int64         
 8   offense            28 non-null     object        
 9   description        28 non-null     object        
 10  UCRCode            28 non-null     float64       
 11  enteredDate        28 non-null     object        
 12  centergbsid        0 non-null      float64       
 13  centerLong         28 non-null     float64       
 14  centerLat  