# Project proposal

---

Group name: Group B

---


## Introduction

The introduction section includes

-   an introduction to the subject matter you're investigating
-   the motivation for your question (citing any relevant literature/study results ...)
-   the general (research) question you wish to explore
-   your hypotheses regarding the (research) question of interest.

### Subject Matter:

In the city of Chicago, many incidents and crimes happen every day, ranging from minor thefts to murders. To reduce violence, the city wants to open a new **Crime Prevention Center** and has asked our team which crimes occur particularly frequently and where they happen. With this information, the center can be built in a well-situated location, and its specialized departments can be trained for the most relevant criminal offenses. This should make Chicago a safer city and ensure that measures against crime are taken at an early stage.


### Motivation:

Various studies, such as the two listed below, show that specific measures can help prevent crime in cities. With the new **Crime Prevention Center**, we want to take a new approach in Chicago and prevent crime before it happens.

* Crime Prevention and the Safer Cities Story: https://onlinelibrary.wiley.com/doi/10.1111/j.1468-2311.1993.tb00758.x
* Social Crime Prevention in South Africa's Major Cities: http://csvr.org.za/docs/urbansafety/socialcrimeprevention.pdf




### General Question:

Which kinds of crime occur particularly frequently, and where do they occur?

### Hypotheses:

Crimes are not evenly distributed across Chicago: a small number of places (districts or blocks) account for a disproportionately large share of all incidents, including the most dangerous ones.


## Data description

In this section, you will describe the data set you wish to explore. This includes

-   description of the observations in the data set,
-   description of how the data was originally collected (not how you found the data but how the original curator of the data collected it).

### Observations:

* **ID** Unique identifier for the record.	

* **Case Number** The Chicago Police Department RD Number (Records Division Number), which is unique to the incident. 	

* **Date** Date when the incident occurred. This is sometimes a best estimate. 

* **Block** The partially redacted address where the incident occurred, placing it on the same block as the actual address.

* **IUCR** The Illinois Uniform Crime Reporting code. This is directly linked to the Primary Type and Description. See the list of IUCR codes at https://data.cityofchicago.org/d/c7ck-438e.

* **Primary Type** The primary description of the IUCR code.

* **Type** The primary description of the IUCR code.	

* **Description** The secondary description of the IUCR code, a subcategory of the primary description.	

* **Location Description** Description of the location where the incident occurred. 

* **Arrest** Indicates whether an arrest was made.	

* **Domestic** Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.	

* **Beat** 	Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. Three to five beats make up a police sector, and three sectors make up a police district. The Chicago Police Department has 22 police districts. See the beats at https://data.cityofchicago.org/d/aerh-rz74. 	

* **District** Indicates the police district where the incident occurred. See the districts at https://data.cityofchicago.org/d/fthy-xz3r.	

* **Ward** The ward (City Council district) where the incident occurred. See the wards at https://data.cityofchicago.org/d/sp34-6z76.	

* **Community Area** Indicates the community area where the incident occurred. Chicago has 77 community areas. See the community areas at https://data.cityofchicago.org/d/cauq-8yn6. 

* **Year** Year the incident occurred.	

* **Latitude** The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.

* **Longitude** The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
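Two of the location fields encode more structure than their raw values suggest. The sketch below decodes them under stated assumptions: that a beat number is a four-digit code laid out as two digits for the district, one for the sector and one for the beat within the sector (this matches every beat/district pair in the `df.head()` output further down, but is not stated in the dataset documentation), and that the `X` characters in `Block` mask the trailing digits of the house number. `decode_beat` and `hundred_block` are hypothetical helpers, not part of any library.

```python
import re


def decode_beat(beat: int) -> dict[str, int]:
    """Split a beat number into district, sector and beat within the sector.

    Assumption: beats are four-digit codes (leading zero dropped in the data)
    laid out as DDSB = district, district, sector, beat.
    """
    code = f"{beat:04d}"  # 922 -> "0922"
    return {
        "district": int(code[:2]),
        "sector": int(code[2]),
        "beat_in_sector": int(code[3]),
    }


def hundred_block(block: str) -> tuple[int, str]:
    """Split a redacted block such as '045XX S FAIRFIELD AVE' into (4500, 'S FAIRFIELD AVE').

    Assumption: the first token is a house number whose trailing digits are
    masked with 'X'; replacing each 'X' by '0' gives the start of the block.
    """
    number, _, street = block.partition(" ")
    if not re.fullmatch(r"\d+X+", number):
        raise ValueError(f"unexpected block format: {block!r}")
    return int(number.replace("X", "0")), street


# Beat/district pairs taken from the first rows of the interim data
for beat, district in [(922, 9), (131, 1), (834, 8), (1522, 15), (1933, 19)]:
    assert decode_beat(beat)["district"] == district

print(hundred_block("045XX S FAIRFIELD AVE"))  # (4500, 'S FAIRFIELD AVE')
```

Decoding the beat this way would also allow incidents to be grouped by police sector, a level that is not a column of its own in the data.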

### How the data was collected:

**Note:** Because the original dataset is very large, we reduced it to criminal cases from 2018 and 2019 only.
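The proposal does not say how the reduction was done. A minimal sketch, assuming the full export is a CSV with a `Year` column (as listed under Observations) and that it is too large to load at once, streams the file row by row and keeps only the selected years. `filter_years`, `reduce_file` and `KEEP_YEARS` are hypothetical names.

```python
import csv
from collections.abc import Iterable, Iterator

KEEP_YEARS = {2018, 2019}


def filter_years(rows: Iterable[dict], years: set[int] = KEEP_YEARS) -> Iterator[dict]:
    """Yield only the records whose 'Year' value is one of `years`."""
    for row in rows:
        if int(row["Year"]) in years:
            yield row


def reduce_file(src: str, dst: str, years: set[int] = KEEP_YEARS) -> int:
    """Stream `src` row by row and write the matching records to `dst`.

    Streaming keeps memory use flat, so the full export never has to fit in RAM.
    Returns the number of rows written.
    """
    written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in filter_years(reader, years):
            writer.writerow(row)
            written += 1
    return written
```

A call such as `reduce_file("Crimes_-_2001_to_Present.csv", "crimes_2018_2019.csv")` (both file names hypothetical) would produce the reduced file. With pandas, reading in chunks via `pd.read_csv(..., chunksize=...)` and keeping `chunk[chunk["Year"].isin([2018, 2019])]` achieves the same.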

Crimes - 2001 to Present

This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Data Fulfillment and Analysis Division of the Chicago Police Department at DFA@ChicagoPolice.org.

Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words or the unauthorized use of the Chicago Police Department logo is unlawful. This web page does not, in any way, authorize such use. Data are updated daily. To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e



## Analysis approach

In this section, you will provide a brief overview of your analysis approach. This includes:

-   Description of the relevant variable.
-   Exploratory data analysis and summary statistics for the relevant variables.
-   The visualization types (what kind of visualizations will you use)

*Variables:*

Information to figure out *when* the crime was committed:
- Date
- Year

Information to figure out *where* the crime was committed:
- Location Description
- Block
- Domestic
- District
- Ward
- Community Area
- Latitude
- Longitude

Information whether an arrest was made:
- Arrest
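The interim file also contains derived columns (`month`, `day`, `hour`, `arrest__False`, `arrest__True`, `domestic__False`, `domestic__True`, see the `df.head()` output below). How they were produced is not documented here; the sketch below shows one plausible derivation, assuming timestamps in the `YYYY-MM-DD hh:mm:ss` form seen in the interim data. The helper names are hypothetical.

```python
from datetime import datetime


def time_features(date_str: str) -> dict[str, int]:
    """Derive year, month, day and hour from a timestamp such as '2018-09-15 23:00:00'."""
    ts = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
    return {"year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour}


def indicator_columns(name: str, flag: bool) -> dict[str, int]:
    """One-hot encode a boolean flag into '<name>__False' and '<name>__True' columns."""
    return {f"{name}__False": int(not flag), f"{name}__True": int(flag)}


# Row 3 of the interim data: a narcotics case with an arrest
row = {"date": "2018-01-19 11:44:00", "arrest": True, "domestic": False}
features = {
    **time_features(row["date"]),
    **indicator_columns("arrest", row["arrest"]),
    **indicator_columns("domestic", row["domestic"]),
}
print(features)  # same month/day/hour and indicator values as row 3 in df.head()
```

In pandas, `pd.to_datetime(df["date"]).dt.hour` gives the hour column, and `pd.get_dummies(df[["arrest", "domestic"]].astype(str), prefix_sep="__")` produces indicator columns with these names.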

*Data:*



*Visualization types:*
- Choropleth Map: a map showing where in Chicago the most crimes are committed. (https://altair-viz.github.io/gallery/choropleth.html)
- Bar Chart with rounded edges: a bar chart summarizing the number of crimes of each type in every district. (https://altair-viz.github.io/gallery/bar_rounded.html)
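Both charts need the incidents aggregated before plotting: counts per district and crime type for the bar chart, and counts per community area for the choropleth. A dependency-free sketch of that aggregation step follows, using the first five rows of the interim data as input; `count_by` and `to_long` are hypothetical helpers, and the long format (one row per group with a `count` field) is an assumption about what the charts will consume.

```python
from collections import Counter


def count_by(records: list[dict], *keys: str) -> Counter:
    """Count records per combination of values of the given keys."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def to_long(counts: Counter, keys: tuple[str, ...]) -> list[dict]:
    """Turn a Counter into long-format rows (one dict per group) for plotting."""
    return [
        {**dict(zip(keys, group)), "count": n}
        for group, n in sorted(counts.items())
    ]


# First five rows of the interim data, reduced to the fields needed here
records = [
    {"district": 9, "community_area": 58, "primary_type": "sexual_crime"},
    {"district": 1, "community_area": 33, "primary_type": "theft"},
    {"district": 8, "community_area": 70, "primary_type": "theft"},
    {"district": 15, "community_area": 25, "primary_type": "narcotics"},
    {"district": 19, "community_area": 6, "primary_type": "criminal_damage"},
]

# Bar chart input: crimes per district and crime type
bar_rows = to_long(count_by(records, "district", "primary_type"), ("district", "primary_type"))
# Choropleth input: crimes per community area
map_rows = to_long(count_by(records, "community_area"), ("community_area",))
print(bar_rows[0])
```

On the full DataFrame, `df.groupby(["district", "primary_type"]).size().reset_index(name="count")` gives the same long table. For the choropleth, the per-area counts would still have to be joined to the community area boundaries linked in the data description.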

*Exploratory Data Analysis:*

```python
from pathlib import Path

import pandas as pd

# The project root is the parent of the notebook's working directory
PARENT_PATH = Path().resolve().parent
PATH = "data"
SUBPATH = "interim"
FILE = "chicago_crimes-20221130-0952"
FORMAT = ".csv"

df = pd.read_csv(PARENT_PATH / PATH / SUBPATH / (FILE + FORMAT))
```

```python
df.head()
```

| | id | date | block | primary_type | arrest | domestic | beat | district | ward | community_area | year | latitude | longitude | month | day | hour | arrest__False | arrest__True | domestic__False | domestic__True |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11452981 | 2018-09-15 23:00:00 | 045XX S FAIRFIELD AVE | sexual_crime | False | False | 922 | 9.0 | 12.0 | 58.0 | 2018 | 41.811058 | -87.693066 | 9 | 15 | 23 | 1 | 0 | 1 | 0 |
| 1 | 11370943 | 2018-07-05 22:00:00 | 022XX S MICHIGAN AVE | theft | False | False | 131 | 1.0 | 2.0 | 33.0 | 2018 | 41.85241 | -87.623792 | 7 | 5 | 22 | 1 | 0 | 1 | 0 |
| 2 | 11805144 | 2019-08-24 01:00:00 | 084XX S KENNETH AVE | theft | False | False | 834 | 8.0 | 18.0 | 70.0 | 2019 | 41.739312 | -87.732509 | 8 | 24 | 1 | 1 | 0 | 1 | 0 |
| 3 | 11208069 | 2018-01-19 11:44:00 | 0000X N LATROBE AVE | narcotics | True | False | 1522 | 15.0 | 28.0 | 25.0 | 2018 | 41.880998 | -87.756304 | 1 | 19 | 11 | 0 | 1 | 1 | 0 |
| 4 | 11352516 | 2018-06-18 23:00:00 | 009XX W BARRY AVE | criminal_damage | False | False | 1933 | 19.0 | 44.0 | 6.0 | 2018 | 41.938104 | -87.653157 | 6 | 18 | 23 | 1 | 0 | 1 | 0 |


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 76204 entries, 0 to 76203
Data columns (total 20 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   id               76204 non-null  int64  
 1   date             76204 non-null  object 
 2   block            76204 non-null  object 
 3   primary_type     76204 non-null  object 
 4   arrest           76204 non-null  bool   
 5   domestic         76204 non-null  bool   
 6   beat             76204 non-null  int64  
 7   district         76204 non-null  float64
 8   ward             76202 non-null  float64
 9   community_area   76204 non-null  float64
 10  year             76204 non-null  int64  
 11  latitude         76204 non-null  float64
 12  longitude        76204 non-null  float64
 13  month            76204 non-null  int64  
 14  day              76204 non-null  int64  
 15  hour             76204 non-null  int64  
 16  arrest__False    76204 non-null  int64  
 17  arrest__True
```

```python
# How often each type of crime occurred
df.primary_type.value_counts()
```

```text
theft                  21637
assault_and_battery    21004
criminal_damage         8151
deceptive_practice      5404
burglary                5260
other_offense           5136
robbery_and_weapons     4415
narcotics               4170
sexual_crime            1027
Name: primary_type, dtype: int64
```
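Raw counts are easier to compare as shares of all incidents. The sketch below uses the counts from the output above and adds a cumulative share, which shows how much of the total the most frequent crime types cover together (theft and assault/battery alone account for roughly 56 %). `cumulative_shares` is a hypothetical helper.

```python
# Counts copied from the value_counts() output above
type_counts = {
    "theft": 21637,
    "assault_and_battery": 21004,
    "criminal_damage": 8151,
    "deceptive_practice": 5404,
    "burglary": 5260,
    "other_offense": 5136,
    "robbery_and_weapons": 4415,
    "narcotics": 4170,
    "sexual_crime": 1027,
}


def cumulative_shares(counts: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (category, share, cumulative share), most frequent category first."""
    total = sum(counts.values())
    running = 0
    rows = []
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((name, n / total, running / total))
    return rows


for name, share, cum in cumulative_shares(type_counts):
    print(f"{name:<20} {share:6.1%} {cum:6.1%}")
```

In pandas, `df.primary_type.value_counts(normalize=True)` gives the shares directly, and `.cumsum()` on that result gives the cumulative column.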

```python
df["primary_type"].describe()
```

```text
count     76204
unique        9
top       theft
freq      21637
Name: primary_type, dtype: object
```

```python
# Number of crimes per police district
df.district.value_counts()
```

```text
11.0    5400
6.0     4698
8.0     4578
1.0     4559
18.0    4483
7.0     4093
4.0     3984
25.0    3821
12.0    3780
10.0    3617
19.0    3593
3.0     3501
2.0     3403
5.0     3396
9.0     3203
15.0    2876
14.0    2724
16.0    2503
22.0    2298
24.0    2274
17.0    2058
20.0    1361
31.0       1
Name: district, dtype: int64
```
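These counts allow a first, crude check of the hypothesis at district level: if crimes were spread evenly, the five busiest districts would hold about 5/22 of all incidents. The sketch below compares that baseline with the observed share, using the counts above. The even-spread baseline is a simplifying assumption (districts differ in area and population), and the block-level part of the hypothesis would need a finer analysis. `top_k_share` is a hypothetical helper.

```python
# District counts copied from the value_counts() output above.
# The single record in district 31 is left out: it falls outside the
# 22 district numbers that hold all other records.
district_counts = {
    11: 5400, 6: 4698, 8: 4578, 1: 4559, 18: 4483, 7: 4093, 4: 3984,
    25: 3821, 12: 3780, 10: 3617, 19: 3593, 3: 3501, 2: 3403, 5: 3396,
    9: 3203, 15: 2876, 14: 2724, 16: 2503, 22: 2298, 24: 2274, 17: 2058,
    20: 1361,
}


def top_k_share(counts: dict[int, int], k: int) -> float:
    """Share of all incidents that falls into the k areas with the most incidents."""
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / sum(counts.values())


k = 5
observed = top_k_share(district_counts, k)
uniform = k / len(district_counts)  # share expected if every district had the same count
print(f"top {k} districts: {observed:.1%} of incidents (even spread: {uniform:.1%})")
```

The same function can be applied to counts per beat, ward or community area to see whether the concentration grows at finer spatial levels.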

```python
df["district"].max()
```

```text
31.0
```
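The maximum of 31.0 comes from the single record in district 31, which lies outside the 22 police districts mentioned in the data description. Before mapping, such records can be flagged. The sketch below assumes the set of valid districts is exactly the 22 numbers that appear in the counts above; `KNOWN_DISTRICTS` and `unexpected_districts` are hypothetical names.

```python
# District numbers that appear in the value_counts() output above, apart from 31
KNOWN_DISTRICTS = set(range(1, 13)) | set(range(14, 21)) | {22, 24, 25}


def unexpected_districts(records: list[dict]) -> list[dict]:
    """Return records whose district is missing or not one of KNOWN_DISTRICTS."""
    # Float codes such as 9.0 compare equal to the int 9, so they pass the check
    return [r for r in records if r.get("district") not in KNOWN_DISTRICTS]


sample = [{"district": 9.0}, {"district": 31.0}, {"district": 25.0}, {}]
print(unexpected_districts(sample))
```

On the DataFrame, something like `df[~df["district"].isin(KNOWN_DISTRICTS)]` should select the same record; whether to drop it or look it up via its beat is a project decision.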

```python
import altair as alt

# Allow Altair to plot the full dataset (more than its default 5,000-row limit)
alt.data_transformers.disable_max_rows()
```

```text
DataTransformerRegistry.enable('default')
```

## Data dictionary

*Create a data dictionary for all the variables in your data set. You may fill out the data description table or create your own table with Pandas:*

<br>


| Name | Description | Type | Format |
|---|---|---|---|
| ID | Unique identifier for the record. | numeric | category |
| Case Number | The Chicago Police Department RD Number (Records Division Number), which is unique to the incident. | nominal | string |
| Date | Date when the incident occurred. | ordinal | date |
| Block | The partially redacted address where the incident occurred. | nominal | string |
| IUCR | The Illinois Uniform Crime Reporting code. | nominal | category |
| Primary Type | The primary description of the IUCR code. | nominal | category |
| Type | The primary description of the IUCR code. | nominal | category |
| Description | The secondary description of the IUCR code. | nominal | category |
| Location Description | Description of the location where the incident occurred. | nominal | category |
| Arrest | Indicates whether an arrest was made. | nominal | category |
| Domestic | Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act. | nominal | category |
| Beat | Indicates the beat where the incident occurred. | numeric, discrete | category |
| District | Indicates the police district where the incident occurred. | numeric, discrete | category |
| Ward | The ward (City Council district) where the incident occurred. | numeric, discrete | category |
| Community Area | Indicates the community area where the incident occurred. | numeric, discrete | category |
| Year | Year the incident occurred. | ordinal | category |
| Latitude | The latitude of the location where the incident occurred. | numeric | float |
| Longitude | The longitude of the location where the incident occurred. | numeric | float |



<br>


- `Type`: nominal, ordinal or numeric

- `Format`: int, float, string, category, date or object
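Part of the table can be generated from the data itself, as the instructions above suggest. The sketch below guesses the `Format` column from a single raw value and renders the result as a Markdown table. It assumes raw values are strings as read from the CSV, and it follows this table's convention of listing boolean flags as `category`. `infer_format` and `markdown_table` are hypothetical helpers, and the `Type` and description columns still need to be filled in by hand.

```python
import re
from datetime import datetime


def infer_format(value: str) -> str:
    """Guess a data-dictionary Format (int, float, date, category or string) from one raw CSV value."""
    if value in ("True", "False"):
        return "category"  # boolean flags are listed as categories in the table above
    if re.fullmatch(r"-?\d+", value):
        return "int"
    if re.fullmatch(r"-?\d+\.\d+", value):
        return "float"
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return "date"
    except ValueError:
        return "string"


def markdown_table(rows: list[tuple[str, ...]], header: tuple[str, ...]) -> str:
    """Render rows as a Markdown table with the given header."""
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


# Values from the first row of the interim data
sample = {"id": "11452981", "date": "2018-09-15 23:00:00", "block": "045XX S FAIRFIELD AVE",
          "arrest": "False", "latitude": "41.811058"}
print(markdown_table([(name, infer_format(v)) for name, v in sample.items()], ("Name", "Format")))
```

The guess is only a starting point: `id`, for example, is inferred as `int`, whereas the table above treats it as a category.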