# Project proposal

---

Group name: Group B

---


## Introduction

The introduction section includes

-   an introduction to the subject matter you're investigating
-   the motivation for your question (citing any relevant literature/study results ...)
-   the general (research) question you wish to explore
-   your hypotheses regarding the (research) question of interest.

### Subject Matter:

Many crimes occur in the city of Chicago every day, ranging from minor thefts to murders. To reduce violence, the city wants to open a new crime prevention centre and has asked our team which crimes occur particularly frequently and where they happen. With this information, the **Crime Prevention Center** can be built in a well-situated location, and its specialised departments can be trained for the most relevant criminal offences. This should make Chicago a safer city and ensure that preventive measures are taken at an early stage.


### Motivation:

Various studies show that it is possible to prevent crime in cities with the help of specific actions. With the new **Crime Prevention Center**, we want to take a new approach in Chicago to prevent crime from the very beginning.

* Crime Prevention and the Safer Cities Story
https://onlinelibrary.wiley.com/doi/10.1111/j.1468-2311.1993.tb00758.x

* Social Crime Prevention in South Africa's Major Cities 
http://csvr.org.za/docs/urbansafety/socialcrimeprevention.pdf




### General Question:

Which kinds of crime happen particularly frequently, and where do they happen?

### Hypotheses:

There are places (districts/blocks) in Chicago where most of the (dangerous) crimes/incidents happen.
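
One way this hypothesis could be checked is to measure how concentrated the incidents are across districts. The following is only a minimal sketch: it assumes a `district` column as used in the notebook cells, and the sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample: one row per incident, labelled with a police district.
sample = pd.DataFrame({"district": [11, 11, 11, 11, 6, 6, 6, 8, 8, 20]})

# Share of all incidents per district, sorted from most to least affected.
share = sample["district"].value_counts(normalize=True)

# Cumulative share: how much of all crime the top-k districts account for.
cumulative = share.cumsum()
top2_share = cumulative.iloc[1]

print(share)
print(f"Top 2 districts account for {top2_share:.0%} of incidents")
```

If a small number of districts accounts for a large cumulative share in the real data, that would support the hypothesis.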


## Data description

In this section, you will describe the data set you wish to explore. This includes

-   description of the observations in the data set,
-   description of how the data was originally collected (not how you found the data but how the original curator of the data collected it).

### Observations:

* **ID** Unique identifier for the record.	

* **Case Number** The Chicago Police Department RD Number (Records Division Number), which is unique to the incident. 	

* **Date** Date when the incident occurred. This is sometimes a best estimate. 

* **Block**	The partially redacted address where the incident occurred, placing it on the same block as the actual address.

* **IUCR** The Illinois Uniform Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.

* **Primary Type** The primary description of the IUCR code.

* **Type** The primary description of the IUCR code.	

* **Description** The secondary description of the IUCR code, a subcategory of the primary description.	

* **Location Description** Description of the location where the incident occurred. 

* **Arrest** Indicates whether an arrest was made.	

* **Domestic** Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.	

* **Beat** 	Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74. 	

* **District** Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.	

* **Ward** The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.	

* **Community Area** Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6. 

* **Year** Year the incident occurred.	

* **Latitude** The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.

* **Longitude** The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
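
The notebook cells further down refer to these columns by lower-case names such as `primary_type`, so the column labels listed above were presumably normalised at some point. A minimal sketch of such a step, assuming a simple whitespace-to-underscore rule (the helper `to_snake_case` and the sample row are hypothetical, not part of the original pipeline):

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Lower-case a column label and join its words with underscores."""
    return re.sub(r"\s+", "_", name.strip()).lower()


# Hypothetical one-row sample that uses the column labels listed above.
raw = pd.DataFrame(
    {"Case Number": ["XX000000"], "Primary Type": ["THEFT"], "Community Area": [32]}
)

# rename() applies the function to every column label.
clean = raw.rename(columns=to_snake_case)
print(list(clean.columns))
```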

### How the data was collected:

**Note:** Because the original dataset was too large, we reduced it so that it only contains criminal cases from 2018 and 2019.
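
A reduction like this could be done by reading the large file in chunks and keeping only the rows for the two years. This is a sketch under assumptions: the in-memory CSV stands in for the real export, and the `year` column name and chunk size are illustrative.

```python
import io

import pandas as pd

# Stand-in for the full export; the real file would be passed by path instead.
csv_text = "id,year\n1,2017\n2,2018\n3,2019\n4,2020\n"

# Read a few rows at a time so the whole file never has to fit in memory.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Keep only incidents from 2018 and 2019 in each chunk, then combine them.
kept = [chunk[chunk["year"].isin([2018, 2019])] for chunk in chunks]
subset = pd.concat(kept).reset_index(drop=True)
print(subset)
```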

Crimes - 2001 to Present

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Data Fulfillment and Analysis Division of the Chicago Police Department at DFA@ChicagoPolice.org.

Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data are updated daily. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e



## Analysis approach

In this section, you will provide a brief overview of your analysis approach. This includes:

-   Description of the relevant variable.
-   Exploratory data analysis and summary statistics for the relevant variables.
-   The visualization types (what kind of visualizations will you use)

*Variables:*

Information to figure out *when* the crime was committed:
- Date
- Year

Information to figure out *where* the crime was committed:
- Location Description
- Block
- Domestic
- District
- Ward
- Community Area
- Latitude
- Longitude

Information whether an arrest was made:
- Arrest
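
The time-related columns used later in the notebook (`month`, `hour`) can be derived from the date. A minimal sketch, assuming the date is stored as text in the form `MM/DD/YYYY HH:MM:SS AM/PM`; the format string and the sample rows are assumptions and should be checked against the actual file.

```python
import pandas as pd

# Made-up sample rows in the assumed date format.
sample = pd.DataFrame({"date": ["01/15/2018 11:30:00 PM", "07/04/2019 08:05:00 AM"]})

# Parse the text into timestamps; %I and %p handle the 12-hour clock.
parsed = pd.to_datetime(sample["date"], format="%m/%d/%Y %I:%M:%S %p")

# Split the timestamp into the parts used for the per-month and per-hour charts.
sample["year"] = parsed.dt.year
sample["month"] = parsed.dt.month
sample["day"] = parsed.dt.day
sample["hour"] = parsed.dt.hour
print(sample)
```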

*Data:*



*Visualization types:*
- Point Maps: A map that will show where the most crimes are committed in Chicago. (https://uwdata.github.io/visualization-curriculum/altair_cartographic.html#geoshape-marks)
- Bar Chart with rounded edges: A bar chart that will sum up the different types of crime and violence in every district. (https://altair-viz.github.io/gallery/bar_rounded.html)
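
For the bar chart per district, the incidents first need to be counted per district and crime type. A minimal sketch of that aggregation, assuming the `district` and `primary_type` columns used in the notebook; the sample rows are made up.

```python
import pandas as pd

# Made-up sample: one row per incident.
sample = pd.DataFrame(
    {
        "district": [1, 1, 1, 2, 2],
        "primary_type": ["THEFT", "THEFT", "BATTERY", "THEFT", "ASSAULT"],
    }
)

# Count incidents per (district, crime type); the result has one row per pair,
# which is the long format that a grouped or stacked bar chart expects.
counts = (
    sample.groupby(["district", "primary_type"])
    .size()
    .reset_index(name="count")
)
print(counts)
```

Aggregating before plotting also keeps the data passed to the chart small, which matters for a dataset of this size.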

Exploratory Data Analysis:

In [None]:
from pathlib import Path

PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "interim/"
FILE = "chicago_crimes-20221130-1405"
FORMAT = ".csv"

In [None]:
import pandas as pd

df = pd.read_csv(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

In [None]:
df.head()

In [None]:
df.info()

In [None]:
# How often each type of crime occurred
df.primary_type.value_counts()

In [None]:
df["primary_type"].describe()

In [None]:
# Crimes per district
df.district.value_counts()

In [None]:
df["district"].max()

In [None]:
import altair as alt

# Allow charts built from more than 5000 rows
alt.data_transformers.disable_max_rows()

In [None]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
var_list = ['primary_type']

source = df[var_list]

In [None]:
# Bar chart of the number of crimes per crime type

ch = alt.Chart(source).mark_bar().encode(
    x=alt.X("primary_type", sort="-y",
            axis=alt.Axis(title="Crime type",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title="Count",
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per type',
    width=1000,
    height=400
)

txt = ch.mark_text(
    baseline = 'middle',
    dy= - 15
).encode(
    text='count(primary_type)'
)

ch+txt

In [None]:
chart = alt.Chart(df).mark_bar().encode(
    x=alt.X('district',
            bin=alt.BinParams(maxbins=50),
            axis=alt.Axis(title="district",  
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes in the districts',
    width=1000,
    height=400
)

text = chart.mark_text(
    baseline='middle',
    dy=30
).encode(
    text='count(primary_type)'
)

chart + text

In [None]:
#wards are like districts
#there are 50 wards in Chicago
df["ward"].unique()

In [None]:
# Convert the float ward numbers to pandas' nullable integer type;
# missing values stay as <NA>, so no dropna() is needed
df['ward'] = df['ward'].astype("Int64")

In [None]:
df["ward"].head()

In [None]:
df["ward"].unique()

In [None]:
df['ward'] = df['ward'].astype("category")


In [None]:
chart_2 = alt.Chart(df).mark_bar(size=20).encode(
    x=alt.X('ward',
            sort="-y",
            axis=alt.Axis(title="Ward",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes in the wards',
    width=1500,
    height=400
)


chart_2.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
df['year'] = df['year'].astype("category")


In [None]:
chart_3 = alt.Chart(df).mark_bar().encode(
    x=alt.X('year',
            axis=alt.Axis(title="Year",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per year',
    width=200,
    height=400
)


chart_3.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
df['month'] = df['month'].astype("category")


In [None]:
chart_4 = alt.Chart(df).mark_bar(size=30).encode(
    x=alt.X('month',
            axis=alt.Axis(title="Month",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per month',
    width=500,
    height=400
)


chart_4.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
df['hour'] = df['hour'].astype("category")

In [None]:
chart_5 = alt.Chart(df).mark_bar(size=15).encode(
    x=alt.X('hour',
            axis=alt.Axis(title="Hour",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per hour',
    width=600,
    height=400
)


chart_5.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
#Geopandas library to work with Chicago map
import geopandas as gpd

In [None]:
PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "external/"
FILE = "wards"
FORMAT = ".shp"

gdf = gpd.read_file(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

In [None]:
gdf.head()

In [None]:
gdf.info()

In [None]:
#Map of Chicago with Crimes as Dots on the Map

choro = alt.Chart(gdf).mark_geoshape(
    fill="lightgrey", stroke='grey'
).encode()


p = alt.Chart(df).mark_circle().encode(
    longitude='longitude',
    latitude='latitude',
    size=alt.value(10)
).properties(
    title="Location of crimes in Chicago City",
    width=1000,
    height=1000
)

choro + p

## Data dictionary

*Create a data dictionary for all the variables in your data set. You may fill out the data description table or create your own table with Pandas:*

<br>


| Name  |   Description	   	| Type   	|  Format 	|
|---	|---	          	|---	    |---	|
|id   	|Unique identifier for the record.   	            |numeric   	    |category   	|
|date   	|Date when the incident occurred.   	       	    |ordinal   	    |date   	|
|block   	|The partially redacted address where the incident occurred.   	            |nominal   	    |category   	|
|primary_type   	|The primary description of the IUCR code.   	       	    |nominal   	    |category   	|
|arrest   	|Indicates whether an arrest was made.   	            |nominal   	    |category   	|
|domestic   	|Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.   	       	    |nominal   	    |category   	|
|beat   	|Indicates the beat where the incident occurred.   	       	    |numeric, discrete   	    |category   	|
|district   	|Indicates the police district where the incident occurred.   	       	    |numeric, discrete   	    |category   	|
|ward   	|The ward (City Council district) where the incident occurred.   	       	    |numeric, discrete   	    |category   	|
|community_area   	|Indicates the community area where the incident occurred.   	       	    |numeric, discrete   	    |category   	|
|year   	|Year the incident occurred.   	       	    |nominal   	    |category   	|
|month   	|Month the incident occurred.   	       	    |nominal   	    |category   	|
|day   	|Day the incident occurred.   	       	    |nominal   	    |category   	|
|hour   	|Hour the incident occurred.   	       	    |nominal   	    |category   	|
|latitude   	|The latitude of the location where the incident occurred.   	       	    |numeric   	    |float   	|
|longitude   	|The longitude of the location where the incident occurred.   	       	    |numeric   	    |float   	|
|arrest__False   	|Indicates whether an arrest was made. 0 means False   	            |nominal   	    |category   	|
|arrest__True   	|Indicates whether an arrest was made. 1 means True   	            |nominal   	    |category   	|
|domestic__False   	|Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. 0 means False  	       	    |nominal   	    |category   	|
|domestic__True   	|Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. 1 means True   	       	    |nominal   	    |category   	|

<br>


- `Type`: nominal, ordinal or numeric

- `Format`: int, float, string, category, date or object
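
The indicator columns `arrest__False`, `arrest__True`, `domestic__False` and `domestic__True` in the table above look like the result of one-hot encoding. A minimal sketch of how such columns could be produced with `pd.get_dummies`; the double-underscore separator is inferred from the column names and the sample rows are made up.

```python
import pandas as pd

# Made-up sample with the two boolean variables from the data dictionary.
sample = pd.DataFrame(
    {"arrest": [True, False, True], "domestic": [False, False, True]}
)

# One indicator column per value; prefix_sep="__" matches the names in the table.
dummies = pd.get_dummies(
    sample, columns=["arrest", "domestic"], prefix_sep="__", dtype=int
)
print(dummies)
```

Note that each pair of indicator columns carries the same information as the original boolean column, so only one column per pair is needed for modelling.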