# Crime Peaks Time Period Analysis

## Summary

## Introduction

Law enforcement agencies worldwide prioritize crime prevention and public safety. Effective policing requires allocating resources strategically to prevent crime and respond promptly to incidents (Perry et al., 2013). Historically, police departments have relied on experience and intuition to set patrol schedules and allocate resources. Advances in data analysis and predictive modeling now allow law enforcement organizations to adopt a more data-driven strategy. For example, one trade publication reports that many police incidents take place during the day, while more violent incidents happen at night (Security, 2019).

Therefore, this analysis investigates the relationship between crime rates and periods of the day (morning, afternoon, evening, night, and late night) using 2023 police reports from San Francisco. The goal is to identify trends that indicate whether certain time periods are associated with higher or lower crime rates (Braga and Weisburd, 2010). This information could improve police patrols by helping law enforcement organizations distribute resources wisely and take a proactive approach to crime prevention.

Understanding the temporal patterns of criminal behavior is essential for designing targeted interventions that can reduce crime rates and improve community safety. This study aims to classify time periods as having either high or low crime rates using 2023 police report data. The goal is to provide a forecasting tool that can support decisions about police patrol scheduling and resource allocation. This approach has the potential to strengthen law enforcement activities and improve public safety.

Our research aims to connect conventional policing approaches with current, data-driven methodologies and to offer practical insights that improve the efficiency and effectiveness of crime prevention efforts.


**Research Question**: Given 2023 San Francisco police reports, can we predict the likelihood that an incident is criminal based on time period, day of week, and police district?

### Define Time Period & Crime Rate
**Time Periods**:
* Morning: 6:01am to 12:00pm
* Afternoon: 12:01pm to 6:00pm
* Evening: 6:01pm to 9:00pm
* Night: 9:01pm to 12:00am
* Late Night: 12:01am to 6:00am

**Data Set**: The data come from the San Francisco Police Department's (SFPD) Incident Report Dataset, available at *https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783/about_data*. The original data set contains incident reports from 2018 to the present, but the download link used here is filtered to 2023 incidents only. Incident reports are filed by officers or self-reported by members of the public using SFPD's online reporting system. The data set contains 27 columns of information about each incident.








## Methods & Results 

### Load Data From the Web

In [26]:
import pandas as pd
import numpy as np

In [27]:
url="https://data.sfgov.org/resource/wg3w-h783.csv?$query=SELECT%0A%20%20%60incident_datetime%60%2C%0A%20%20%60incident_date%60%2C%0A%20%20%60incident_time%60%2C%0A%20%20%60incident_year%60%2C%0A%20%20%60incident_day_of_week%60%2C%0A%20%20%60report_datetime%60%2C%0A%20%20%60row_id%60%2C%0A%20%20%60incident_id%60%2C%0A%20%20%60incident_number%60%2C%0A%20%20%60cad_number%60%2C%0A%20%20%60report_type_code%60%2C%0A%20%20%60report_type_description%60%2C%0A%20%20%60filed_online%60%2C%0A%20%20%60incident_code%60%2C%0A%20%20%60incident_category%60%2C%0A%20%20%60incident_subcategory%60%2C%0A%20%20%60incident_description%60%2C%0A%20%20%60resolution%60%2C%0A%20%20%60intersection%60%2C%0A%20%20%60cnn%60%2C%0A%20%20%60police_district%60%2C%0A%20%20%60analysis_neighborhood%60%2C%0A%20%20%60supervisor_district%60%2C%0A%20%20%60supervisor_district_2012%60%2C%0A%20%20%60latitude%60%2C%0A%20%20%60longitude%60%2C%0A%20%20%60point%60%2C%0A%20%20%60%3A%40computed_region_jwn9_ihcz%60%2C%0A%20%20%60%3A%40computed_region_jg9y_a9du%60%2C%0A%20%20%60%3A%40computed_region_h4ep_8xdi%60%2C%0A%20%20%60%3A%40computed_region_n4xg_c4py%60%2C%0A%20%20%60%3A%40computed_region_nqbw_i6c3%60%2C%0A%20%20%60%3A%40computed_region_viu7_rrfi%60%2C%0A%20%20%60%3A%40computed_region_26cr_cadq%60%2C%0A%20%20%60%3A%40computed_region_qgnn_b9vv%60%0AWHERE%20caseless_one_of(%60incident_year%60%2C%20%222023%22)"
raw_data=pd.read_csv(url,parse_dates=['incident_datetime']) # parse date time
raw_data.head()

Unnamed: 0,incident_datetime,incident_date,incident_time,incident_year,incident_day_of_week,report_datetime,row_id,incident_id,incident_number,cad_number,...,longitude,point,:@computed_region_jwn9_ihcz,:@computed_region_jg9y_a9du,:@computed_region_h4ep_8xdi,:@computed_region_n4xg_c4py,:@computed_region_nqbw_i6c3,:@computed_region_viu7_rrfi,:@computed_region_26cr_cadq,:@computed_region_qgnn_b9vv
0,2023-03-13 23:41:00,2023-03-13T00:00:00.000,23:41,2023,Monday,2023-03-13T23:41:00.000,125373607041,1253736,230167874,,...,,,,,,,,,,
1,2023-03-01 05:02:00,2023-03-01T00:00:00.000,05:02,2023,Wednesday,2023-03-11T15:40:00.000,125379506374,1253795,236046151,,...,,,,,,,,,,
2,2023-03-13 13:16:00,2023-03-13T00:00:00.000,13:16,2023,Monday,2023-03-13T13:17:00.000,125357107041,1253571,220343896,,...,,,,,,,,,,
3,2023-03-13 10:59:00,2023-03-13T00:00:00.000,10:59,2023,Monday,2023-03-13T11:00:00.000,125355107041,1253551,230174885,,...,,,,,,,,,,
4,2023-03-14 18:44:00,2023-03-14T00:00:00.000,18:44,2023,Tuesday,2023-03-14T18:45:00.000,125402407041,1254024,230176728,,...,,,,,,,,,,
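
The row indices in the outputs of this notebook stop just below 1,000, which suggests the endpoint returned a capped page rather than every 2023 report. The following is a minimal sketch of building a request URL with an explicit row limit. It assumes the endpoint accepts the `$where` and `$limit` query parameters and that `incident_year` is stored as text; the `build_url` helper and the limit of 50,000 are hypothetical choices for illustration, not part of the original analysis.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.sfgov.org/resource/wg3w-h783.csv"


def build_url(year: str, limit: int) -> str:
    """Return a request URL filtered to one incident year with an explicit row cap."""
    params = {
        "$where": f"incident_year = '{year}'",  # assumption: incident_year is a text column
        "$limit": limit,                         # hypothetical cap; adjust to the data size
    }
    # keep "$" unescaped so the parameter names stay readable
    return f"{BASE_URL}?{urlencode(params, safe='$')}"


url_2023 = build_url("2023", 50000)
print(url_2023)

# The download itself is left commented out because it needs network access:
# raw_data = pd.read_csv(url_2023, parse_dates=["incident_datetime"])
```

Checking `len(raw_data)` after the download is a quick way to see whether the cap was reached.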


### Save the original dataset into folder

In [28]:
# Uncomment to save a local copy of the raw data (assumes a data/ folder exists).
# raw_data.to_csv('data/raw_data_file.csv')

### Wrangle and Clean the Data

In [29]:
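# keep only the columns needed for the analysis; .copy() makes `data` an independent
# DataFrame, so adding columns later does not trigger SettingWithCopyWarning
data = raw_data[['incident_datetime','incident_time','incident_day_of_week','incident_category','incident_subcategory','police_district']].copy()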
data.head()

Unnamed: 0,incident_datetime,incident_time,incident_day_of_week,incident_category,incident_subcategory,police_district
0,2023-03-13 23:41:00,23:41,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF
1,2023-03-01 05:02:00,05:02,Wednesday,Larceny Theft,Larceny Theft - Other,Mission
2,2023-03-13 13:16:00,13:16,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF
3,2023-03-13 10:59:00,10:59,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF
4,2023-03-14 18:44:00,18:44,Tuesday,Recovered Vehicle,Recovered Vehicle,Out of SF


In [30]:
# extract month, hour, and minute from the incident timestamp
data["month"] = data["incident_datetime"].dt.month
data['hour'] = data['incident_datetime'].dt.hour
data['minute'] = data['incident_datetime'].dt.minute
data.head()


Unnamed: 0,incident_datetime,incident_time,incident_day_of_week,incident_category,incident_subcategory,police_district,month,hour,minute
0,2023-03-13 23:41:00,23:41,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,23,41
1,2023-03-01 05:02:00,05:02,Wednesday,Larceny Theft,Larceny Theft - Other,Mission,3,5,2
2,2023-03-13 13:16:00,13:16,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,13,16
3,2023-03-13 10:59:00,10:59,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,10,59
4,2023-03-14 18:44:00,18:44,Tuesday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,18,44


### Add a column identifying the time period

In [31]:
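# map an (hour, minute) pair to one of the five time periods defined above
def get_time_period(hour, minute):
    # 00:01 to 06:00; the hour == 0 check keeps 00:01-00:59 out of 'Night'
    if (hour == 0 and minute > 0) or 0 < hour < 6 or (hour == 6 and minute == 0):
        return 'Late Night'
    # 06:01 to 12:00
    elif 6 < hour < 12 or (hour == 6 and minute > 0) or (hour == 12 and minute == 0):
        return 'Morning'
    # 12:01 to 18:00
    elif 12 < hour < 18 or (hour == 12 and minute > 0) or (hour == 18 and minute == 0):
        return 'Afternoon'
    # 18:01 to 21:00
    elif 18 < hour < 21 or (hour == 18 and minute > 0) or (hour == 21 and minute == 0):
        return 'Evening'
    # 21:01 to 00:00 (midnight itself counts as 'Night')
    else:
        return 'Night'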

In [32]:
data['time_period'] = data.apply(lambda row: get_time_period(row['hour'], row['minute']), axis=1)
data



Unnamed: 0,incident_datetime,incident_time,incident_day_of_week,incident_category,incident_subcategory,police_district,month,hour,minute,time_period
0,2023-03-13 23:41:00,23:41,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,23,41,Night
1,2023-03-01 05:02:00,05:02,Wednesday,Larceny Theft,Larceny Theft - Other,Mission,3,5,2,Late Night
2,2023-03-13 13:16:00,13:16,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,13,16,Afternoon
3,2023-03-13 10:59:00,10:59,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,10,59,Morning
4,2023-03-14 18:44:00,18:44,Tuesday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,18,44,Evening
...,...,...,...,...,...,...,...,...,...,...
995,2023-03-24 00:00:00,00:00,Friday,Malicious Mischief,Vandalism,Taraval,3,0,0,Night
996,2023-03-24 11:00:00,11:00,Friday,Recovered Vehicle,Recovered Vehicle,Park,3,11,0,Morning
997,2023-03-24 13:43:00,13:43,Friday,Other Miscellaneous,Other,Central,3,13,43,Afternoon
998,2023-03-24 19:45:00,19:45,Friday,Larceny Theft,Larceny - From Vehicle,Richmond,3,19,45,Evening
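
The row-wise `apply` above works, but the same labelling can be done on whole columns at once. The following is a minimal sketch, not the notebook's own method: it converts each time to minutes since midnight and looks up the period with `np.searchsorted`. It assumes that 6:00 exactly counts as Late Night (as in `get_time_period`), that times from 00:01 belong to Late Night (as in the period list), and that midnight itself is Night. The names `EDGES`, `LABELS`, `time_period_vectorized`, and the `toy` frame are hypothetical.

```python
import numpy as np
import pandas as pd

# Upper edges, in minutes since midnight, of the first five right-closed intervals:
# [0], (0, 360], (360, 720], (720, 1080], (1080, 1260]; anything later is Night.
EDGES = np.array([0, 360, 720, 1080, 1260])
LABELS = np.array(["Night", "Late Night", "Morning", "Afternoon", "Evening", "Night"])


def time_period_vectorized(hour: pd.Series, minute: pd.Series) -> pd.Series:
    """Label every row at once instead of calling a Python function per row."""
    minutes = hour * 60 + minute
    # side="left" returns the first edge that is >= the value, i.e. right-closed bins
    idx = np.searchsorted(EDGES, minutes.to_numpy(), side="left")
    return pd.Series(LABELS[idx], index=hour.index, name="time_period")


# toy boundary cases, not real incident data
toy = pd.DataFrame({
    "hour":   [0, 0, 5, 6, 6, 12, 12, 18, 21, 21, 23],
    "minute": [0, 1, 2, 0, 1, 0, 1, 44, 0, 1, 41],
})
toy["time_period"] = time_period_vectorized(toy["hour"], toy["minute"])
print(toy)
```

Listing the boundary minutes explicitly in a small frame like `toy` makes it easy to check each cut-off against the period definitions.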


### Add a column identifying criminal incidents

In [33]:
# identify criminal incidents using subcategory
criminal_incident = [
    "Larceny - From Vehicle", "Vandalism", "Larceny Theft - Other", "Motor Vehicle Theft",
    "Simple Assault", "Drug Violation", "Aggravated Assault", "Fraud", "Theft From Vehicle",
    "Burglary - Other", "Weapons Offense", "Intimidation", "Warrant", "Larceny - Auto Parts",
    "Other Offenses", "Larceny Theft - From Building", "Larceny Theft - Shoplifting",
    "Robbery - Other", "Burglary - Residential", "Robbery - Street",
    "Traffic Violation Arrest", "Robbery - Commercial", "Larceny Theft - Pickpocket",
    "Forgery And Counterfeiting", "Motor Vehicle Theft (Attempted)", "Burglary - Hot Prowl",
    "Prostitution", "Burglary - Commercial", "Disorderly Conduct", "Arson",
    "Larceny Theft - Bicycle", "Embezzlement", "Extortion-Blackmail", "Sex Offense",
]

In [109]:
def find_criminal_incident(incident):
    return int(incident in criminal_incident)

In [110]:
data['if_crime'] = data.apply(lambda row: find_criminal_incident(row['incident_subcategory']), axis=1)
data



Unnamed: 0,incident_datetime,incident_time,incident_day_of_week,incident_category,incident_subcategory,police_district,month,hour,minute,time_period,if_crime
0,2023-03-13 23:41:00,23:41,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,23,41,Night,0
1,2023-03-01 05:02:00,05:02,Wednesday,Larceny Theft,Larceny Theft - Other,Mission,3,5,2,Late Night,1
2,2023-03-13 13:16:00,13:16,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,13,16,Afternoon,0
3,2023-03-13 10:59:00,10:59,Monday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,10,59,Morning,0
4,2023-03-14 18:44:00,18:44,Tuesday,Recovered Vehicle,Recovered Vehicle,Out of SF,3,18,44,Evening,0
...,...,...,...,...,...,...,...,...,...,...,...
995,2023-03-24 00:00:00,00:00,Friday,Malicious Mischief,Vandalism,Taraval,3,0,0,Night,1
996,2023-03-24 11:00:00,11:00,Friday,Recovered Vehicle,Recovered Vehicle,Park,3,11,0,Morning,0
997,2023-03-24 13:43:00,13:43,Friday,Other Miscellaneous,Other,Central,3,13,43,Afternoon,0
998,2023-03-24 19:45:00,19:45,Friday,Larceny Theft,Larceny - From Vehicle,Richmond,3,19,45,Evening,1
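
The membership check can also be written without a row-wise `apply`. The following is a minimal sketch that uses `Series.isin` on a toy frame. The abbreviated `criminal_subcategories` set and the `toy` frame are hypothetical stand-ins for the full list and the real data.

```python
import pandas as pd

# abbreviated, hypothetical subset of the subcategory list used in this notebook
criminal_subcategories = {"Vandalism", "Larceny - From Vehicle", "Larceny Theft - Other"}

# toy data, including a missing subcategory
toy = pd.DataFrame({
    "incident_subcategory": [
        "Recovered Vehicle", "Larceny Theft - Other", "Vandalism", "Other", None,
    ]
})

# isin() returns a boolean Series; missing values are treated as non-members
toy["if_crime"] = toy["incident_subcategory"].isin(criminal_subcategories).astype(int)
print(toy)
```

Rows with a missing subcategory are labelled 0 here, which matches the behaviour of the `in` check in `find_criminal_incident`.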


### Summary of the Data

In [36]:
categorical_fields = ["incident_day_of_week", 'incident_category', 'incident_subcategory',
                      'police_district','time_period']

In [37]:
def data_cardinality():
    for field in categorical_fields:
        print(field, '', data[field].nunique())
        if data[field].nunique() > 0:
            print(data[field].value_counts())
        print('********\n')

In [38]:
data_cardinality()

incident_day_of_week  7
incident_day_of_week
Thursday     339
Friday       294
Wednesday    153
Tuesday       66
Monday        65
Saturday      46
Sunday        37
Name: count, dtype: int64
********

incident_category  36
incident_category
Larceny Theft                               242
Assault                                      69
Other Miscellaneous                          68
Motor Vehicle Theft                          67
Non-Criminal                                 64
Malicious Mischief                           63
Recovered Vehicle                            62
Burglary                                     50
Drug Offense                                 44
Lost Property                                34
Warrant                                      24
Disorderly Conduct                           24
Suspicious Occ                               23
Fraud                                        22
Robbery                                      22
Missing Person                          

### Visualization

In [39]:
import altair as alt

In [40]:
# visualize how many records fall in each time period
time_period_dist = alt.Chart(data).mark_bar().encode(
    x = alt.X('time_period', title = 'Time Period', sort=['Morning', 'Afternoon', 'Evening', 'Night', 'Late Night']),
    y = alt.Y('count()', title = 'Count of Records')
).configure_axisX(
    labelAngle=45
).properties(title='Chart 1: Distribution of Records by Time Period')
time_period_dist
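
Chart 1 counts all records, criminal or not. One plausible reading of "crime rate" per time period is the share of records flagged by `if_crime`. The following is a minimal sketch of that summary on toy data; the `toy` frame, the `PERIOD_ORDER` constant, and the output column names are hypothetical, and this definition is an assumption rather than one stated in the analysis.

```python
import pandas as pd

PERIOD_ORDER = ["Morning", "Afternoon", "Evening", "Night", "Late Night"]

# toy records; Afternoon is deliberately absent
toy = pd.DataFrame({
    "time_period": ["Morning", "Morning", "Evening", "Night", "Night", "Night", "Late Night"],
    "if_crime":    [0, 1, 1, 1, 1, 0, 0],
})

summary = (
    toy.groupby("time_period")["if_crime"]
       .agg(n_records="count", crime_share="mean")  # mean of a 0/1 flag is a proportion
       .reindex(PERIOD_ORDER)                        # chart order; empty periods become NaN
)
print(summary)
```

Reporting `n_records` next to the share keeps small groups visible, since a proportion based on a handful of records is unstable.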

## !! some explanation ##

In [41]:
# record counts per time period, with one panel per day of the week
day_time = alt.Chart(data).mark_point().encode(
    x = alt.X('time_period', title='Time Period', sort=['Morning', 'Afternoon', 'Evening', 'Night', 'Late Night']),
    y = alt.Y('count()', title='Count of Records')
).properties(height=150).facet(
    facet = alt.Facet('incident_day_of_week', title=None),
    title = 'Chart 2: Incident Records by Time Period & Day'
)
day_time
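
The counts behind Chart 2 can also be read as a table. The following is a minimal sketch that uses `pd.crosstab` on toy data, with rows and columns put in calendar and chart order. The `toy` frame and the `DAY_ORDER` and `PERIOD_ORDER` constants are hypothetical.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
PERIOD_ORDER = ["Morning", "Afternoon", "Evening", "Night", "Late Night"]

# toy records, not real incident data
toy = pd.DataFrame({
    "incident_day_of_week": ["Monday", "Monday", "Friday", "Friday", "Friday"],
    "time_period":          ["Morning", "Night", "Evening", "Evening", "Late Night"],
})

# count records per (day, period) pair, then fill in combinations with no records
counts = (
    pd.crosstab(toy["incident_day_of_week"], toy["time_period"])
      .reindex(index=DAY_ORDER, columns=PERIOD_ORDER, fill_value=0)
)
print(counts)
```

A table like this makes sparse days easy to spot, for example a day of the week with only a few dozen records.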

## ! some explanation !

### Perform an Analysis

In [49]:
data_used= data[['incident_day_of_week','police_district','if_crime','time_period']]

X = data_used.drop(columns=['if_crime'])
y = data_used['if_crime']

In [94]:
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (train_test_split, cross_validate)

In [104]:
X = pd.get_dummies(X).astype(float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

dummy = DummyClassifier()
lr = LogisticRegression()

# baseline: accuracy of the majority-class classifier on the held-out test set
dummy.fit(X_train, y_train)
baseline_accuracy = dummy.score(X_test, y_test)

# logistic regression: 5-fold cross-validation on the training set only
cv_results_lr = pd.DataFrame(cross_validate(lr, X_train, y_train, return_train_score=True))

print("Baseline:", baseline_accuracy, "\n")
display(cv_results_lr)

Baseline: 0.6366666666666667 



Unnamed: 0,fit_time,score_time,test_score,train_score
0,0.012556,0.001735,0.678571,0.717857
1,0.006507,0.001549,0.728571,0.707143
2,0.006032,0.0013,0.707143,0.710714
3,0.007671,0.001613,0.664286,0.717857
4,0.00717,0.002461,0.707143,0.705357
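
The baseline above is scored on the test set, while the logistic regression is scored by cross-validation on the training set, so the two numbers are not measured the same way. The following is a minimal sketch of scoring both models with the same cross-validation procedure. It runs on synthetic data, and the `toy` frame, the label probabilities, and the `mean_cv_accuracy` helper are all hypothetical; the printed values say nothing about the real incident data.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(123)
n = 400

# synthetic features with the same kind of columns as the notebook
toy = pd.DataFrame({
    "time_period": rng.choice(["Morning", "Afternoon", "Evening", "Night", "Late Night"], size=n),
    "police_district": rng.choice(["Mission", "Central", "Park"], size=n),
})
# synthetic label: made more likely at Night purely for illustration
p_crime = np.where(toy["time_period"] == "Night", 0.8, 0.4)
toy["if_crime"] = (rng.random(n) < p_crime).astype(int)

X_toy = pd.get_dummies(toy.drop(columns="if_crime")).astype(float)
y_toy = toy["if_crime"]


def mean_cv_accuracy(model, X, y, cv=5):
    """Average accuracy over cv folds, computed the same way for every model."""
    scores = cross_validate(model, X, y, cv=cv)
    return float(np.mean(scores["test_score"]))


comparison = pd.Series({
    "dummy": mean_cv_accuracy(DummyClassifier(), X_toy, y_toy),
    "logistic_regression": mean_cv_accuracy(LogisticRegression(), X_toy, y_toy),
})
print(comparison)
```

With both models evaluated on the same folds, any gap between them reflects the features rather than a difference in evaluation sets.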


### Visualization of the Analysis

## Discussion


## References

1. Braga, A. A., & Weisburd, D. (2010). *Policing problem places: Crime hot spots and effective prevention.* Oxford University Press. https://doi.org/10.1093/acprof:oso/9780195341966.001.0001
2. Perry, W. L., McInnis, B., Price, C. C., Smith, S. C., & Hollywood, J. S. (2013). *Predictive Policing: The Role of Crime Forecasting in Law Enforcement Operations.* RAND Corporation. http://www.jstor.org/stable/10.7249/j.ctt4cgdcz
3. Security (2019). *Violent Crimes Most Likely to Occur At Night.* Security. https://www.securitymagazine.com/articles/90384-murder-robbery-and-driving-while-impaired-happen-at-night