# Explore the areas likely to experience crime under seasonal and temporal change

### Introduction

The dataset is provided by the Chicago Data Portal and is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data. It reflects reported incidents of crime that occurred in the City of Chicago from 2001 to the present and records various types of crime, with the exception of murders. This project analyzes which areas of the city experience the most crime across different seasons and times of day.

In [1]:
import pandas as pd
import qeds
%matplotlib inline

from IPython.display import display

### Data Cleaning 

In [2]:
url = "Crimes_-_2001_to_Present_20240130.csv"
df = pd.read_csv(url)
columns = ['Year'] + [col for col in df.columns if col != 'Year']
df = df[columns]

In [3]:
# Drop fully duplicated rows, in case some incidents were recorded more than once.
df = df.drop_duplicates()
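
Called without arguments, `drop_duplicates()` removes only rows that match in every column. The summary table later in this notebook reports fewer unique `Case Number` values than rows, so some case numbers still repeat after this step. The sketch below shows one way to inspect such repeats. The `toy` frame and its values are hypothetical stand-ins for the real data, not records from the dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for the crime data.
toy = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Case Number": ["A1", "B2", "B2", "A1"],
    "Primary Type": ["THEFT", "BATTERY", "BATTERY", "ASSAULT"],
})

# Exact duplicates: rows that match in every column are collapsed to one.
deduped = toy.drop_duplicates()

# Rows that still share a Case Number after exact de-duplication.
# keep=False marks every member of a repeated group, not just the later ones.
repeated_cases = deduped[deduped.duplicated(subset=["Case Number"], keep=False)]

print(len(toy), len(deduped), len(repeated_cases))
```

Whether repeated case numbers should be merged or kept depends on what one row represents, so this sketch only surfaces them and leaves them in place.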

In [4]:
# First check which columns contain NaN values, so that rows are not dropped for
# missing values in unneeded columns while they still hold other important data.

columns_with_nan = df.isna().any()
print(columns_with_nan)

Year                          False
ID                            False
Case Number                   False
Date                          False
Block                         False
IUCR                          False
Primary Type                  False
Description                   False
Location Description           True
Arrest                        False
Domestic                      False
Beat                          False
District                       True
Ward                           True
Community Area                 True
FBI Code                      False
X Coordinate                   True
Y Coordinate                   True
Updated On                    False
Latitude                       True
Longitude                      True
Location                       True
Historical Wards 2003-2015     True
Zip Codes                      True
Community Areas                True
Census Tracts                  True
Wards                          True
Boundaries - ZIP Codes      

In [5]:
# Drop rows with NaN values only in the columns that this research needs.
df = df.dropna(subset=['Latitude', 'Longitude'])
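
To see how much data this step removes, it can help to count missing values per column and compare row counts before and after the drop. The following is a minimal sketch on a hypothetical `toy` frame. The column names mirror the dataset, but the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "Latitude": [41.88, np.nan, 41.75, 41.90],
    "Longitude": [-87.63, np.nan, -87.60, -87.65],
    "Ward": [42.0, 3.0, np.nan, 27.0],
})

# Number of missing values in each column.
missing_counts = toy.isna().sum()

# Drop rows only when a coordinate is missing; a missing Ward is tolerated.
rows_before = len(toy)
cleaned = toy.dropna(subset=["Latitude", "Longitude"])
rows_dropped = rows_before - len(cleaned)

print(missing_counts)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

Restricting `subset` to the coordinate columns keeps rows that are missing only, for example, `Ward`, which matches the intent stated in the comment above.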

In [7]:
# Convert the original 'Date' strings into datetime values.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p')
# Extract the month and the time of day from 'Date' to support the later
# seasonal and time-of-day analysis.
df['Month'] = df['Date'].dt.month
df['Time'] = df['Date'].dt.time
df[['Date', 'Month', 'Time']]
df.head()

| | Year | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | ... | Historical Wards 2003-2015 | Zip Codes | Community Areas | Census Tracts | Wards | Boundaries - ZIP Codes | Police Districts | Police Beats | Month | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 2020 | 12045583 | JD226426 | 2020-05-07 10:24:00 | 035XX S INDIANA AVE | 0820 | THEFT | $500 AND UNDER | APARTMENT | False | ... | 12.0 | 4301.0 | 1.0 | 446.0 | 9.0 | 36.0 | 24.0 | 101.0 | 5 | 10:24:00 |
| 12 | 2020 | 12031001 | JD209965 | 2020-04-16 05:00:00 | 005XX W 32ND ST | 0460 | BATTERY | SIMPLE | APARTMENT | True | ... | 26.0 | 21194.0 | 58.0 | 223.0 | 48.0 | 40.0 | 23.0 | 170.0 | 4 | 05:00:00 |
| 13 | 2020 | 12093529 | JD282112 | 2020-07-01 10:16:00 | 081XX S COLES AVE | 051A | ASSAULT | AGGRAVATED - HANDGUN | STREET | True | ... | 43.0 | 21202.0 | 42.0 | 505.0 | 37.0 | 25.0 | 19.0 | 234.0 | 7 | 10:16:00 |
| 14 | 2020 | 12178140 | JD381597 | 2020-09-27 23:29:00 | 065XX S WOLCOTT AVE | 0460 | BATTERY | SIMPLE | RESIDENCE - PORCH / HALLWAY | False | ... | 44.0 | 22257.0 | 65.0 | 281.0 | 3.0 | 23.0 | 17.0 | 205.0 | 9 | 23:29:00 |
| 15 | 2005 | 4144897 | HL474854 | 2005-07-10 15:00:00 | 062XX S ABERDEEN ST | 0430 | BATTERY | AGGRAVATED: OTHER DANG WEAPON | STREET | False | ... | 19.0 | 21559.0 | 66.0 | 434.0 | 2.0 | 11.0 | 17.0 | 261.0 | 7 | 15:00:00 |
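
Since the project compares crime across seasons and times of day, the extracted month and time eventually need to be grouped into categories. The sketch below shows one possible grouping, using the five timestamps from the preview above. The season boundaries (meteorological seasons), the four six-hour periods, and the column names `Season` and `Period` are assumptions made for illustration; the notebook does not define them at this point.

```python
import pandas as pd

# Timestamps taken from the five preview rows above.
toy = pd.DataFrame({"Date": pd.to_datetime([
    "2020-05-07 10:24:00",
    "2020-04-16 05:00:00",
    "2020-07-01 10:16:00",
    "2020-09-27 23:29:00",
    "2005-07-10 15:00:00",
])})

# Assumption: meteorological seasons (Dec-Feb is winter, and so on).
season_by_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}
toy["Season"] = toy["Date"].dt.month.map(season_by_month)

# Assumption: four six-hour periods; right=False makes each bin [start, end).
toy["Period"] = pd.cut(
    toy["Date"].dt.hour,
    bins=[0, 6, 12, 18, 24],
    right=False,
    labels=["Night", "Morning", "Afternoon", "Evening"],
)

print(toy)
```

Binning on `dt.hour` rather than on the `Time` column avoids comparing `datetime.time` objects, and the bin edges can be adjusted if a different definition of "night" or "evening" suits the analysis better.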


### Summary Statistics Tables

In [8]:
df.describe(include = "all")


Unnamed: 0,Year,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,...,Historical Wards 2003-2015,Zip Codes,Community Areas,Census Tracts,Wards,Boundaries - ZIP Codes,Police Districts,Police Beats,Month,Time
count,7898640.0,7898640.0,7898640,7898640,7898640,7898640.0,7898640,7898640,7890749,7898640,...,7875087.0,7898640.0,7878278.0,7880311.0,7878399.0,7878331.0,7879458.0,7879484.0,7898640.0,7898640
unique,,,7898083,3283123,61970,403.0,36,546,217,2,...,,,,,,,,,,85150
top,,,HZ140230,2007-01-01 00:01:00,100XX W OHARE ST,820.0,THEFT,SIMPLE,STREET,False,...,,,,,,,,,,12:00:00
freq,,,6,172,16295,639967.0,1666063,931528,2070806,5859186,...,,,,,,,,,,182862
first,,,,2001-01-01 00:00:00,,,,,,,...,,,,,,,,,,
last,,,,2024-01-22 00:00:00,,,,,,,...,,,,,,,,,,
mean,2010.268,7170816.0,,,,,,,,,...,27.42117,19088.9,38.68966,381.0766,25.59066,31.5318,14.91966,150.2043,6.564897,
std,6.505648,3584757.0,,,,,,,,,...,15.25245,5748.504,20.06934,230.3197,14.72498,19.134,6.449355,78.53059,3.347816,
min,2001.0,634.0,,,,,,,,,...,1.0,2733.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
25%,2005.0,3866876.0,,,,,,,,,...,14.0,21184.0,25.0,176.0,12.0,15.0,10.0,83.0,4.0,
