# Analysis and Prediction of Crimes in Chicago city

### Introduction

- The data we are analysing comes from the Chicago Data Portal (https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/data), which provides information about all the crimes that took place in the city of Chicago from 2001 to the present.

- The questions we will investigate, and try to predict answers to, are:
    1. What type of crime is likely to happen
    2. Where a crime is likely to happen
    3. Whether a crime ends in an arrest
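
As a rough illustration of how these three questions could map onto the dataset, the sketch below pairs each question with a target column and computes a majority-class baseline, i.e. the accuracy of always guessing the most frequent value. The mapping in `TARGETS` and the helper `majority_baseline` are assumptions made for illustration, not the project's chosen setup, and the sample rows are a tiny stand-in for the real data.

```python
import pandas as pd

# Assumed mapping from each question to a column of the dataset (illustrative only).
TARGETS = {
    "crime_type": "Primary Type",
    "place": "Location Description",
    "arrest": "Arrest",
}

def majority_baseline(df: pd.DataFrame, column: str) -> float:
    """Return the share of rows taken by the most frequent value of `column`."""
    return float(df[column].value_counts(normalize=True).iloc[0])

# Tiny stand-in frame; the real analysis would use the crimes DataFrame.
sample = pd.DataFrame({
    "Primary Type": ["DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE",
                     "CRIMINAL DAMAGE", "OTHER OFFENSE"],
    "Location Description": ["APARTMENT", "APARTMENT", "APARTMENT", "RESIDENCE"],
    "Arrest": [False, False, False, False],
})

baselines = {name: majority_baseline(sample, col) for name, col in TARGETS.items()}
print(baselines)
```

Any model trained later would need to beat these baseline scores to be considered useful.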


### Any changes?

We initially planned to work with the entire dataset, which covers 2001 to the present. That is 7,662,271 rows (as of Nov 2, 2022).
Currently, we are working with a subset of the dataset covering 2019 to 2021, which has 680,425 rows (as of Nov 2, 2022).

This is being done to fit the time frame of our project. Later, we plan to incorporate the entire dataset to fine-tune our model.
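
One way such a subset could be cut from the full export is to filter on the `Year` column. The following is a minimal sketch under that assumption; the helper `filter_years` and the sample rows are hypothetical and only show the inclusive year-range filter.

```python
import pandas as pd

def filter_years(df: pd.DataFrame, start: int, end: int) -> pd.DataFrame:
    """Return the rows whose Year lies in [start, end], both ends inclusive."""
    mask = df["Year"].between(start, end)
    return df.loc[mask].reset_index(drop=True)

# Tiny stand-in frame; the real data would come from the portal export.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Year": [2001, 2019, 2021, 2022],
})

subset = filter_years(sample, 2019, 2021)
print(subset)
```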

### Data initialisation

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [5]:
crimes_df = pd.read_csv('Crimes-2021_to_2022.csv')
crimes_df.head(5)

| Unnamed: 0 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12345411 | JE205618 | 1/1/21 0:00 | 036XX S ASHLAND AVE | 1320 | CRIMINAL DAMAGE | TO VEHICLE | PARKING LOT / GARAGE (NON RESIDENTIAL) | False | False | ... | 11.0 | 59 | 14 | 1166266.0 | 1880505.0 | 2021 | 4/23/21 16:49 | 41.827682 | -87.665496 | (41.827681913, -87.665496311) |
| 1 | 12449065 | JE319016 | 1/1/21 0:00 | 100XX S AVENUE L | 1750 | OFFENSE INVOLVING CHILDREN | CHILD ABUSE | RESIDENCE | False | True | ... | 10.0 | 52 | 08B | 1201814.0 | 1838991.0 | 2021 | 8/12/21 16:59 | 41.712933 | -87.536489 | (41.712932999, -87.53648903) |
| 2 | 12349639 | JE210703 | 1/1/21 0:00 | 037XX N PITTSBURGH AVE | 1150 | DECEPTIVE PRACTICE | CREDIT CARD FRAUD | APARTMENT | False | False | ... | 38.0 | 17 | 11 | 1120403.0 | 1923742.0 | 2021 | 4/28/21 16:51 | 41.947186 | -87.83284 | (41.94718614, -87.832840321) |
| 3 | 12354069 | JE216275 | 1/1/21 0:00 | 016XX W HOWARD ST | 1154 | DECEPTIVE PRACTICE | FINANCIAL IDENTITY THEFT $300 AND UNDER | APARTMENT | False | True | ... | 49.0 | 1 | 11 | 1163887.0 | 1950346.0 | 2021 | 5/4/21 16:47 | 42.01938 | -87.672249 | (42.019380398, -87.672249127) |
| 4 | 12355943 | JE218564 | 1/1/21 0:00 | 051XX N SHERIDAN RD | 2826 | OTHER OFFENSE | HARASSMENT BY ELECTRONIC MEANS | APARTMENT | False | False | ... | 48.0 | 3 | 26 | 1168704.0 | 1934414.0 | 2021 | 5/5/21 16:49 | 41.975559 | -87.654988 | (41.975559264, -87.654987667) |


##### Meanings of columns

1. ID: unique identifier of the crime record
2. Date: listed date of the crime
3. Block: block where the crime occurred
4. IUCR: four-character Illinois Uniform Crime Reporting (IUCR) code
5. Description: short description of the type of crime
6. Location Description: description of the place where the crime occurred
7. Arrest: boolean value (True/False) indicating whether an arrest was made
8. Domestic: boolean value (True/False) indicating whether the crime was domestic
9. Community Area: numeric code of the community area where the crime occurred
10. FBI Code: code indicating the FBI crime classification (not always numeric, e.g. 08B)
11. X & Y Coordinate: map coordinates of the location where the crime occurred
12. Year: year the crime occurred
13. Updated On: date and time the record was last updated
14. Latitude & Longitude: latitude and longitude of the location where the crime occurred
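
With the column meanings in hand, a light cleaning pass might drop the leftover `Unnamed: 0` column and parse the two timestamp columns. The sketch below assumes the timestamps follow the `month/day/2-digit-year hour:minute` layout seen in the preview (for example `1/1/21 0:00`); the helper name `load_crimes` is hypothetical, and the in-memory CSV holds two preview rows trimmed to a few columns.

```python
import io
import pandas as pd

# Two rows copied from the preview, trimmed to a handful of columns.
raw = io.StringIO(
    "Unnamed: 0,ID,Date,Primary Type,Arrest,Domestic,Updated On\n"
    "0,12345411,1/1/21 0:00,CRIMINAL DAMAGE,False,False,4/23/21 16:49\n"
    "1,12449065,1/1/21 0:00,OFFENSE INVOLVING CHILDREN,False,True,8/12/21 16:59\n"
)

def load_crimes(source) -> pd.DataFrame:
    """Read the CSV, drop the stray index column, and parse timestamp columns."""
    df = pd.read_csv(source).drop(columns="Unnamed: 0", errors="ignore")
    # Assumed timestamp layout, inferred from the preview rows.
    fmt = "%m/%d/%y %H:%M"
    for col in ("Date", "Updated On"):
        df[col] = pd.to_datetime(df[col], format=fmt)
    return df

crimes = load_crimes(raw)
print(crimes.dtypes)
```

Passing an explicit `format` avoids ambiguity between month-first and day-first dates; if the real file uses a different layout, the format string would need to change accordingly.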