# Draft analysis 

---

Group name: B

---


## Introduction

*This section includes an introduction to the project motivation, data, and research question. Include a data dictionary* 

### Motivation

Many crimes are committed in Chicago every day, ranging from minor thefts to murders. To reduce violence, the city wants to open a new **Crime Prevention Center** and has asked our team which crimes occur most frequently and where they occur. This information helps the city choose a well-situated location for the center and train its specialised departments for the most relevant offences. The aim is to make Chicago safer by taking preventive measures at an early stage.

Various studies show that targeted measures can help prevent crime in cities. With the new **Crime Prevention Center**, we want to take a new approach in Chicago and prevent crime before it happens.

### Question

Which kinds of crime occur most frequently, and where do they occur?

### Hypotheses

Crime in Chicago is concentrated in certain places (districts/blocks), which account for most incidents, including the most dangerous ones.

### Data dictionary


| Name | Description | Type | Format |
|---|---|---|---|
| id | Unique identifier for the record. | numeric | category |
| date | Date when the incident occurred. | ordinal | date |
| block | The partially redacted address where the incident occurred. | nominal | category |
| primary_type | The primary description of the IUCR code. | nominal | category |
| arrest | Indicates whether an arrest was made. | nominal | category |
| domestic | Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. | nominal | category |
| beat | Indicates the beat where the incident occurred. | numeric, discrete | category |
| district | Indicates the police district where the incident occurred. | numeric, discrete | category |
| ward | The ward (City Council district) where the incident occurred. | numeric, discrete | category |
| community_area | Indicates the community area where the incident occurred. | numeric, discrete | category |
| year | Year the incident occurred. | nominal | category |
| month | Month the incident occurred. | nominal | category |
| day | Day the incident occurred. | nominal | category |
| hour | Hour the incident occurred. | nominal | category |
| latitude | The latitude of the location where the incident occurred. | numeric | float |
| longitude | The longitude of the location where the incident occurred. | numeric | float |
| arrest__False | Indicates whether an arrest was made. 0 means False. | nominal | category |
| arrest__True | Indicates whether an arrest was made. 1 means True. | nominal | category |
| domestic__False | Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. 0 means False. | nominal | category |
| domestic__True | Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. 1 means True. | nominal | category |

<br>
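
The double-underscore columns at the end of the dictionary look like indicator (dummy) columns. How they were created is not documented here, so the following is only a sketch of one way such columns can be derived with `pd.get_dummies`. The toy frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy records standing in for the real crime data.
toy = pd.DataFrame({
    "arrest": [True, False, False],
    "domestic": [False, False, True],
})

# prefix_sep="__" produces names such as arrest__False and arrest__True;
# dtype=int stores the indicators as 0/1 instead of booleans.
dummies = pd.get_dummies(toy, columns=["arrest", "domestic"],
                         prefix_sep="__", dtype=int)

print(dummies.columns.tolist())
```

In each pair exactly one column is 1 per row, so `arrest__False` and `arrest__True` carry the same information as `arrest`.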

## Setup

In [None]:
from pathlib import Path

# pandas for data handling
import pandas as pd

# Altair for visualisations
import altair as alt

# Lift Altair's default row limit so the full dataset can be plotted
alt.data_transformers.disable_max_rows()

# Suppress FutureWarning messages
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# GeoPandas to work with the Chicago map (optional)
import geopandas as gpd



## Data

### Import data

In [None]:
# Project root (the parent of the notebook folder) and the data folder
PARENT_PATH = Path().resolve().parent
DATA_PATH = PARENT_PATH / "data"

# Import the crime dataset
df = pd.read_csv(DATA_PATH / "interim" / "chicago_crimes-20221130-1405.csv")

# Import the ward boundaries (shapefile) used for the map
gdf = gpd.read_file(DATA_PATH / "external" / "wards.shp")

### Data structure

In [None]:
df.head()

In [None]:
df.info()

In [None]:
# drop every row that contains at least one missing value
df = df.dropna()
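
Before dropping rows, it can help to see how many values are missing per column and how many rows the drop removes. A minimal sketch on a hypothetical toy frame (the values are made up; in the notebook the same calls would be applied to `df`):

```python
import pandas as pd

# Hypothetical toy records with a few gaps.
toy = pd.DataFrame({
    "primary_type": ["THEFT", "BATTERY", "THEFT"],
    "ward": [27.0, None, 42.0],
    "latitude": [41.88, 41.75, None],
})

# Number of missing values per column.
missing_per_column = toy.isna().sum()

# dropna() removes every row with at least one missing value.
rows_before = len(toy)
cleaned = toy.dropna()
rows_dropped = rows_before - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

If a large share of rows is lost, dropping only on the columns that the analysis needs (`dropna(subset=[...])`) may be a gentler alternative.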

In [None]:
#rename column ward to wards
gdf = gdf.rename(columns={'ward': 'wards'})

In [None]:
gdf.head()

In [None]:
gdf.info()

### Data corrections

In [None]:
list_categorial = ["id", "block", "primary_type", "arrest", "domestic", "beat", "year", "month", "day", "hour", "arrest__False", "arrest__True", "domestic__False", "domestic__True"]

In [None]:
for i in list_categorial:
    df[i] = df[i].astype("category")

In [None]:
list_cat = ["district", "ward", "community_area"]

In [None]:
# strip the trailing ".0" left over from the float representation (e.g. "12.0" -> "12")
for col in list_cat:
    df[col] = df[col].astype(str).str.replace(r"\.0$", "", regex=True)

In [None]:
df["district"] = df["district"].astype("category")
df["ward"] = df["ward"].astype("category")
df["community_area"] = df["community_area"].astype("category")
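
An alternative to editing the strings is to cast the float codes through `int` first. This is only a sketch, and it assumes that no missing values remain (as is the case after `dropna()`) and that all codes are whole numbers. The toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy codes stored as floats, as they appear after reading the CSV.
toy = pd.DataFrame({
    "district": [1.0, 10.0, 20.0],
    "ward": [27.0, 2.0, 42.0],
})

# float -> int drops the ".0", int -> str gives a label, str -> category saves memory.
for col in ["district", "ward"]:
    toy[col] = toy[col].astype(int).astype(str).astype("category")

print(toy["district"].tolist())
print(toy.dtypes)
```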

In [None]:
# convert date to datetime
df["date"] = pd.to_datetime(df.date)

In [None]:
df.info()

In [None]:
df.head(3)

## Analysis

### Descriptive statistics

In [None]:
# number of incidents per crime type
df["primary_type"].value_counts()

In [None]:
#crime distribution in district
df["district"].value_counts()

In [None]:
#crime distribution in ward
df["ward"].value_counts()

In [None]:
#crime distribution in community_area
df["community_area"].value_counts()
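
The raw counts above can be turned into shares to check the hypothesis that crime is concentrated in a few places. A minimal sketch on hypothetical toy data (in the notebook the same calls would be applied to `df["district"]`):

```python
import pandas as pd

# Hypothetical toy records: one row per incident.
toy = pd.DataFrame({"district": ["11", "11", "11", "8", "8", "1"]})

# Share of all incidents per district, sorted from most to least affected.
share = toy["district"].value_counts(normalize=True)

# Cumulative share: how much of all crime the top k districts account for.
cumulative = share.cumsum()

print(share.round(2))
print(cumulative.round(2))
```

If a small number of districts already reaches a high cumulative share, that supports the hypothesis; a flat distribution would speak against it.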

### Exploratory data analysis

In [None]:
# contingency table for arrest and primary_type.
pd.crosstab(df['arrest'],    # rows: arrest
            df['primary_type'],    # columns: primary_type
            margins=True)          # with total count

In [None]:
# contingency table for domestic and primary_type.
pd.crosstab(df['domestic'],    # rows: domestic
            df['primary_type'],    # columns: primary_type
            margins=True)          # with total count
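
Absolute counts are hard to compare across crime types of very different size. As a sketch, `pd.crosstab` can normalize each column so that the table shows the share of incidents with and without an arrest per crime type. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy records.
toy = pd.DataFrame({
    "primary_type": ["THEFT", "THEFT", "THEFT", "THEFT", "NARCOTICS", "NARCOTICS"],
    "arrest": [False, False, False, True, True, True],
})

# normalize="columns" makes every crime-type column sum to 1.
arrest_rate = pd.crosstab(toy["arrest"], toy["primary_type"], normalize="columns")

print(arrest_rate.round(2))
```

The row for `True` is then the arrest rate per crime type. The same pattern works for `domestic`.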

## Visualizations

### Visualization ideas

In [None]:
# Bar chart with the count of the different types of crime

chart_1 = alt.Chart(df).mark_bar().encode(
    x=alt.X("primary_type", sort="-y",
            axis=alt.Axis(title="Crime type",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per type',
    width=1000,
    height=400
)

txt_1 = chart_1.mark_text(
    baseline = 'middle',
    dy= - 15
).encode(
    text='count(primary_type)'
)

chart_1+txt_1

In [None]:
# Bar chart that counts the types of crime and shows whether an arrest was made

chart = alt.Chart(df).mark_bar().encode(
    x=alt.X("primary_type", sort="-y",
            axis=alt.Axis(title="Crime type",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
    color=alt.Color("arrest", legend=alt.Legend(title="Arrest"))
).properties(
    title='Count of committed crimes per type and arrest',
    width=1000,
    height=400
)

text = chart.mark_text(
    baseline = 'middle',
    dy= - 15
).encode(
    text='count(primary_type)'
)

chart+text

In [None]:
# Bar chart that counts the types of crime and shows whether the incident was domestic-related

chart_10 = alt.Chart(df).mark_bar().encode(
    x=alt.X("primary_type", sort="-y",
            axis=alt.Axis(title="Crime type",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
    color=alt.Color("domestic", legend=alt.Legend(title="Domestic"))
).properties(
    title='Count of committed crimes per type and domestic',
    width=1000,
    height=400
)

# build the labels from chart_10 (not from the arrest chart above)
text_10 = chart_10.mark_text(
    baseline = 'middle',
    dy= - 15
).encode(
    text='count(primary_type)'
)

chart_10+text_10

In [None]:
# Bar chart with committed crimes in the districts
# (district is categorical after the data corrections, so it is not binned)

ch = alt.Chart(df).mark_bar().encode(
    x=alt.X('district',
            sort="-y",
            axis=alt.Axis(title="District",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes in the districts',
    width=1000,
    height=400
)

txt = ch.mark_text(
    baseline='middle',
    dy=30
).encode(
    text='count(primary_type)'
)

ch + txt

In [None]:
# Bar chart with committed crimes in the wards

chart_2 = alt.Chart(df).mark_bar(size=20).encode(
    x=alt.X('ward',
            sort="-y",
            axis=alt.Axis(title="Ward",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes in the wards',
    width=1500,
    height=400
)


chart_2.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
# count of committed crimes per year

chart_3 = alt.Chart(df).mark_bar().encode(
    x=alt.X('year',
            axis=alt.Axis(title="Year",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per year',
    width=200,
    height=400
)


chart_3.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
# count of committed crimes per month

chart_4 = alt.Chart(df).mark_bar(size=30).encode(
    x=alt.X('month',
            axis=alt.Axis(title="Month",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per month',
    width=500,
    height=400
)


chart_4.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
# count of committed crimes per hour

chart_5 = alt.Chart(df).mark_bar(size=15).encode(
    x=alt.X('hour',
            axis=alt.Axis(title="Hour",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per hour',
    width=600,
    height=400
)


chart_5.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
#Map of Chicago with Crimes as Dots on the Map

choro = alt.Chart(gdf).mark_geoshape(
    fill="lightgrey", stroke='grey'
).encode()


p = alt.Chart(df).mark_circle().encode(
    longitude='longitude',
    latitude='latitude',
    size=alt.value(10),
).properties(
    title="Location of crimes in Chicago City",
    width=1000,
    height=1000
)

choro + p

### Save Visualizations



Save your draft visualizations in the folder `reports/visualizations/`. Use a meaningful name (always include the word `draft` and a `timestamp` in your filename).
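
One way to follow this naming rule is a small helper that builds the file path. The helper name `draft_path`, its parameters, and the timestamp format are assumptions for illustration (the format mirrors the one in the data file name). The folder is relative to the working directory, so from the notebook folder it may need a leading `../`.

```python
from datetime import datetime
from pathlib import Path


def draft_path(name, folder="reports/visualizations", ext="html", now=None):
    """Return a path such as <folder>/<name>-draft-<YYYYMMDD-HHMM>.<ext>."""
    # Use the given time if provided (handy for reproducible names), else the current time.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M")
    return Path(folder) / f"{name}-draft-{stamp}.{ext}"


# Fixed time only to make the example reproducible.
example = draft_path("crimes_per_type", now=datetime(2022, 11, 30, 14, 5))
print(example.as_posix())
```

A chart could then be saved with, for example, `chart_1.save(str(draft_path("crimes_per_type")))`. The HTML format needs no extra packages, whereas image formats such as PNG require additional dependencies.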

## Conclusion and recommended action

At this point we do not have a final conclusion or recommended action.
We need to look more closely at the individual wards and at whether there are dependencies between the type of crime, time and location.
After these steps we can give a recommendation.
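
As a possible starting point for this next step, the following sketch finds the most frequent crime type per ward. It runs on hypothetical toy data and is not a result of the analysis.

```python
import pandas as pd

# Hypothetical toy records.
toy = pd.DataFrame({
    "ward": ["27", "27", "27", "42", "42", "42"],
    "primary_type": ["THEFT", "THEFT", "BATTERY", "BATTERY", "BATTERY", "BATTERY"],
})

# Count incidents per ward and crime type.
# observed=True skips unused combinations when the columns are categorical.
counts = (
    toy.groupby(["ward", "primary_type"], observed=True)
       .size()
       .reset_index(name="n")
)

# Keep the row with the highest count for each ward.
top_per_ward = (
    counts.sort_values("n", ascending=False)
          .drop_duplicates("ward")
          .set_index("ward")
)

print(top_per_ward)
```

Adding `hour` or `month` to the grouping keys would extend the same pattern to the time dimension.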