# Draft analysis 

---

Group name: B

---


## Introduction

*This section introduces the project motivation, the data, and the research question, and includes a data dictionary.*

### Motivation

Many crimes happen in Chicago every day, ranging from minor thefts to murders. To reduce violence, the city plans to open a new **Crime Prevention Center** and has asked our team which crimes occur particularly frequently and where they happen. With this information, the centre can be built in a well-situated location, and its specialised departments can be trained for the most relevant offences. The goal is to make Chicago a safer city and to ensure that preventive measures are taken at an early stage.

Various studies indicate that targeted measures can help prevent crime in cities. With the new **Crime Prevention Center**, we want to take a new approach in Chicago that prevents crime before it happens.

### Question

Which kinds of crime occur particularly frequently, and where do they occur?

### Hypotheses

Crimes, and in particular dangerous crimes, are concentrated in a limited number of places (districts or blocks) in Chicago.

### Data dictionary


| Name | Description | Type | Format |
|---|---|---|---|
| id | Unique identifier for the record. | nominal | category |
| date | Date and time when the incident occurred. | ordinal | datetime |
| block | The partially redacted address where the incident occurred. | nominal | category |
| primary_type | The primary description of the IUCR (Illinois Uniform Crime Reporting) code. | nominal | category |
| arrest | Indicates whether an arrest was made. | nominal | category |
| domestic | Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. | nominal | category |
| beat | The police beat where the incident occurred. | numeric, discrete | category |
| district | The police district where the incident occurred. | numeric, discrete | category |
| ward | The ward (City Council district) where the incident occurred. | numeric, discrete | category |
| community_area | The community area where the incident occurred. | numeric, discrete | category |
| year | Year in which the incident occurred. | ordinal | category |
| month | Month in which the incident occurred. | ordinal | category |
| day | Day of the month on which the incident occurred. | ordinal | category |
| hour | Hour of the day (0 to 23) at which the incident occurred. | ordinal | category |
| latitude | The latitude of the location where the incident occurred. | numeric | float |
| longitude | The longitude of the location where the incident occurred. | numeric | float |
| arrest__False | Dummy variable: 1 if no arrest was made, otherwise 0. | nominal | category |
| arrest__True | Dummy variable: 1 if an arrest was made, otherwise 0. | nominal | category |
| domestic__False | Dummy variable: 1 if the incident was not domestic-related, otherwise 0. | nominal | category |
| domestic__True | Dummy variable: 1 if the incident was domestic-related as defined by the Illinois Domestic Violence Act, otherwise 0. | nominal | category |
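
The `arrest__*` and `domestic__*` columns look like one-hot (dummy) encodings of the boolean `arrest` and `domestic` columns. The preprocessing code is not part of this notebook, so the following is only a sketch, assuming the dummies were produced with `pd.get_dummies` and a double-underscore separator. The three rows are made up for illustration.

```python
import pandas as pd

# Toy data (made up): two boolean flags per incident
toy = pd.DataFrame({
    "arrest": [False, True, False],
    "domestic": [False, False, True],
})

# One-hot encode both flags; prefix_sep="__" yields names such as "arrest__True".
# dtype=int returns 0/1 instead of the boolean dummies pandas >= 2.0 uses by default.
dummies = pd.get_dummies(toy, columns=["arrest", "domestic"], prefix_sep="__", dtype=int)

# Keep the original flags next to the dummies, as in the interim dataset
encoded = pd.concat([toy, dummies], axis=1)
print(encoded)
```

Each pair of dummies is redundant (`arrest__False` is always `1 - arrest__True`), so models that include an intercept usually need only one column per pair.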

<br>

## Setup

```python
from pathlib import Path

# Pandas library
import pandas as pd

# Altair library for visualisations
import altair as alt

# Stop showing FutureWarning messages
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

# GeoPandas library to work with the Chicago map
import geopandas as gpd

# NumPy library for data corrections
import numpy as np
```

## Data

### Import data

```python
# Project root: parent folder of the notebook's working directory
PARENT_PATH = Path().resolve().parent
DATA_PATH = PARENT_PATH / "data"

# Import the crime dataset (interim data)
df = pd.read_csv(DATA_PATH / "interim" / "chicago_crimes-20221130-1405.csv")

# Import the wards shapefile as a GeoDataFrame (external data)
gdf = gpd.read_file(DATA_PATH / "external" / "wards.shp")
```

### Data structure

```python
df.head()
```

| | id | date | block | primary_type | arrest | domestic | beat | district | ward | community_area | year | latitude | longitude | month | day | hour | arrest__False | arrest__True | domestic__False | domestic__True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11452981 | 2018-09-15 23:00:00 | 045XX S FAIRFIELD AVE | sexual_crime | False | False | 922 | 9.0 | 12.0 | 58.0 | 2018 | 41.811058 | -87.693066 | 9 | 15 | 23 | 1 | 0 | 1 | 0 |
| 1 | 11370943 | 2018-07-05 22:00:00 | 022XX S MICHIGAN AVE | theft | False | False | 131 | 1.0 | 2.0 | 33.0 | 2018 | 41.85241 | -87.623792 | 7 | 5 | 22 | 1 | 0 | 1 | 0 |
| 2 | 11805144 | 2019-08-24 01:00:00 | 084XX S KENNETH AVE | theft | False | False | 834 | 8.0 | 18.0 | 70.0 | 2019 | 41.739312 | -87.732509 | 8 | 24 | 1 | 1 | 0 | 1 | 0 |
| 3 | 11208069 | 2018-01-19 11:44:00 | 0000X N LATROBE AVE | narcotics | True | False | 1522 | 15.0 | 28.0 | 25.0 | 2018 | 41.880998 | -87.756304 | 1 | 19 | 11 | 0 | 1 | 1 | 0 |
| 4 | 11352516 | 2018-06-18 23:00:00 | 009XX W BARRY AVE | criminal_damage | False | False | 1933 | 19.0 | 44.0 | 6.0 | 2018 | 41.938104 | -87.653157 | 6 | 18 | 23 | 1 | 0 | 1 | 0 |


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76363 entries, 0 to 76362
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   id               76363 non-null  int64  
 1   date             76363 non-null  object 
 2   block            76363 non-null  object 
 3   primary_type     76363 non-null  object 
 4   arrest           76363 non-null  bool   
 5   domestic         76363 non-null  bool   
 6   beat             76363 non-null  int64  
 7   district         76363 non-null  float64
 8   ward             76361 non-null  float64
 9   community_area   76363 non-null  float64
 10  year             76363 non-null  int64  
 11  latitude         76363 non-null  float64
 12  longitude        76363 non-null  float64
 13  month            76363 non-null  int64  
 14  day              76363 non-null  int64  
 15  hour             76363 non-null  int64  
 16  arrest__False    76363 non-null  int64  
 17  arrest__True
```
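
The `df.info()` output reports 76,361 non-null values for `ward` against 76,363 rows, so two incidents have no ward. A quick way to list such gaps per column is `isna().sum()`. The sketch below runs it on a made-up three-row frame; in the notebook the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Made-up rows; in the notebook this would be the crime DataFrame `df`
toy = pd.DataFrame({
    "district": [9.0, 1.0, 8.0],
    "ward": [12.0, np.nan, 18.0],
})

# Number of missing values per column, keeping only columns that have gaps
missing = toy.isna().sum()
missing = missing[missing > 0]
print(missing)

# Rows without a ward, to inspect before deciding whether to drop or impute them
rows_without_ward = toy[toy["ward"].isna()]
print(rows_without_ward)
```

With only two affected rows out of 76,363, dropping them would barely change any count; inspecting them first shows whether the coordinates could be used to recover the ward instead.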

```python
gdf.head()
```

```python
gdf.info()
```

### Data corrections

```python
# Columns to convert to the pandas "category" dtype
list_categorical = ["id", "block", "primary_type", "arrest", "domestic", "beat", "district", "ward", "community_area", "year", "month", "day", "hour", "arrest__False", "arrest__True", "domestic__False", "domestic__True"]
```

```python
for col in list_categorical:
    df[col] = df[col].astype("category")
```

```python
# Convert date to datetime
df["date"] = pd.to_datetime(df["date"])
```

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76363 entries, 0 to 76362
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   id               76363 non-null  category      
 1   date             76363 non-null  datetime64[ns]
 2   block            76363 non-null  category      
 3   primary_type     76363 non-null  category      
 4   arrest           76363 non-null  category      
 5   domestic         76363 non-null  category      
 6   beat             76363 non-null  category      
 7   district         76363 non-null  category      
 8   ward             76361 non-null  category      
 9   community_area   76363 non-null  category      
 10  year             76363 non-null  category      
 11  latitude         76363 non-null  float64       
 12  longitude        76363 non-null  float64       
 13  month            76363 non-null  category      
 14  day              76363 non-null  categ
```
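
As an alternative to converting the columns after loading, `pd.read_csv` can parse dates and assign the `category` dtype in one step through its `parse_dates` and `dtype` parameters. This is a minimal sketch on an in-memory CSV with two made-up rows, assuming the same column names as the interim file (only a subset of columns is shown).

```python
from io import StringIO

import pandas as pd

# Two made-up rows in the layout of the interim CSV (subset of columns)
csv_text = """id,date,primary_type,district,ward
1,2018-09-15 23:00:00,theft,9.0,12.0
2,2018-07-05 22:00:00,narcotics,1.0,
"""

categorical = ["id", "primary_type", "district", "ward"]

df_small = pd.read_csv(
    StringIO(csv_text),
    parse_dates=["date"],                            # date -> datetime64
    dtype={col: "category" for col in categorical},  # read these columns as categories
)
print(df_small.dtypes)
```

One difference to keep in mind: with `dtype="category"`, `read_csv` parses the categories as strings (for example `"9.0"`), whereas `astype("category")` on an already loaded float column keeps numeric categories (`9.0`). Missing values, such as the empty `ward` field above, stay missing in both cases.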

## Analysis

### Descriptive statistics
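
A possible first step toward the research question (which crimes occur most frequently, and where) is to count incidents per `primary_type` and per `district`. The sketch below uses a made-up frame with the notebook's column names; on the real data the same calls would run on `df`. `value_counts` and `pd.crosstab` are standard pandas functions.

```python
import pandas as pd

# Made-up incidents with the same column names as the crime dataset
toy = pd.DataFrame({
    "primary_type": ["theft", "theft", "narcotics", "theft", "battery", "narcotics"],
    "district": [1, 1, 15, 8, 1, 15],
})

# How often does each crime type occur? (sorted from most to least frequent)
type_counts = toy["primary_type"].value_counts()
print(type_counts)

# Incidents per district (rows) and crime type (columns)
by_district = pd.crosstab(toy["district"], toy["primary_type"])
print(by_district)

# District with the most incidents for each crime type
top_district = by_district.idxmax()
print(top_district)
```

Because `primary_type` and `district` are stored as categories in the notebook, `value_counts` on the real data also lists categories with zero incidents.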

### Exploratory data analysis
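
The hypothesis in the introduction states that crime is concentrated in certain places. One simple way to check this, sketched here as a suggestion rather than the group's chosen method, is to compute the share of all incidents that falls into the busiest blocks. The block labels below are made up.

```python
import pandas as pd

# Made-up block labels, one entry per incident
blocks = pd.Series(["A", "A", "A", "A", "B", "B", "C", "D", "E", "F"])

counts = blocks.value_counts()   # incidents per block, busiest first
share = counts / counts.sum()    # fraction of all incidents per block
cumulative = share.cumsum()      # running share, busiest block first

# Share of incidents in the top 20% of blocks (at least one block)
n_top = max(1, int(len(counts) * 0.2))
top_share = cumulative.iloc[n_top - 1]
print(f"Top {n_top} of {len(counts)} blocks: {top_share:.0%} of incidents")
# prints "Top 1 of 6 blocks: 40% of incidents"
```

If incidents were spread evenly, the top 20% of blocks would account for roughly 20% of incidents; a clearly higher share on the real `block` or `district` column would support the hypothesis.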

## Visualizations

### Visualization ideas

### Save Visualizations



Save your draft visualizations in the folder `reports/visualizations/`. Use a meaningful name (always include the word `draft` and a `timestamp` in your filename).
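
A small helper can build file names that follow this convention. The function name `draft_filename`, the `.html` default suffix and the `YYYYMMDD-HHMM` timestamp format are assumptions; the format mirrors the one in the data file name `chicago_crimes-20221130-1405`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def draft_filename(name: str, when: Optional[datetime] = None, suffix: str = ".html") -> Path:
    """Hypothetical helper: reports/visualizations/<name>_draft_<YYYYMMDD-HHMM><suffix>."""
    if when is None:
        when = datetime.now()  # default: current local time
    stamp = when.strftime("%Y%m%d-%H%M")
    return Path("reports") / "visualizations" / f"{name}_draft_{stamp}{suffix}"


path = draft_filename("crimes_per_type", datetime(2022, 12, 1, 9, 30))
print(path.as_posix())  # reports/visualizations/crimes_per_type_draft_20221201-0930.html
```

The returned path is relative, so in the notebook it would be joined to the project root. With an Altair chart `chart`, the result could then be passed to `chart.save(str(path))` after creating the folder with `path.parent.mkdir(parents=True, exist_ok=True)`.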

## Conclusion and recommended action