# Report

## Introduction and data

> REMOVE THE FOLLOWING TEXT

This section includes an introduction to the project motivation, data, and (research) question.

> Use content from the [BIG IDEA worksheet](https://docs.google.com/document/d/1-GZvhdbhLYLB_Bo1arj1rgTqbJ5SUoU21vtgbYEhVqk/edit?usp=sharing)   

Describe the data and definitions of key variables.

It should also include some exploratory data analysis.

*All of the EDA won't fit in the paper, so focus on the EDA for the response variable and a few other interesting variables and relationships.*

In [None]:
from pathlib import Path

PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "processed/"
FILE = "chicago_crimes-20230125-1544"
FORMAT = ".csv"
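
The path pieces above are joined by string concatenation. A minimal sketch of the same path built with `pathlib`'s `/` operator, assuming the same `data/processed/` layout; the names `DATA_DIR` and `csv_path` are hypothetical and not used elsewhere in this notebook:

```python
from pathlib import Path

# Assumed layout, as above: <parent of working directory>/data/processed/<file>.csv
DATA_DIR = Path().resolve().parent / "data" / "processed"

# Hypothetical helper name; pd.read_csv accepts a Path object directly.
csv_path = DATA_DIR / "chicago_crimes-20230125-1544.csv"
```

Building the path this way avoids keeping track of trailing slashes and works the same on Windows and POSIX systems.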

In [None]:
import altair as alt
from vega_datasets import data
alt.data_transformers.disable_max_rows()

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
import pandas as pd

df = pd.read_csv(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

In [None]:
df.head()

In [None]:
df.info()

In [None]:
# How often each type of crime occurred
df.primary_type.value_counts()

In [None]:
df["primary_type"].describe()

In [None]:
var_list = ['primary_type']

source = df[var_list]

In [None]:
# Bar chart of the number of crimes per crime type

ch = alt.Chart(source).mark_bar().encode(
    x=alt.X("primary_type", sort="-y",
            axis=alt.Axis(title="Crime type",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)', 
            axis=alt.Axis(title = "Count", 
                        titleAnchor="end")),
).properties(
    title='Count of committed crimes per type',
    width=1000,
    height=400
)

txt = ch.mark_text(
    baseline = 'middle',
    dy= - 15
).encode(
    text='count(primary_type)'
)

ch+txt

In [None]:
# Bar chart of the number of crimes per primary group

ch = alt.Chart(df).mark_bar().encode(
    x=alt.X('count(primary_group)', 
            axis=alt.Axis(title = "COUNT", 
                        titleAnchor="end")),
    y=alt.Y("primary_group", sort="y",
            axis=alt.Axis(title="PRIMARY GROUP",
                          titleAnchor="start")),

).properties(
    title='Count of committed crimes per group',
    width=1000,
    height=400
)

ch

In [None]:
# districts crosstab // not easy to read

cross_table = pd.crosstab(df["district"], df["district"],
    margins=True,
    normalize=True,
    rownames=["District"],
    colnames=["Result"]
    )* 100


cross_table

In [None]:
df["district"].value_counts(normalize=True) * 100

In [None]:
#arrest crime crosstab in percent

cross_table = pd.crosstab(df["primary_type"], df["arrest"],
    margins=True,
    normalize=True,
    rownames=["Crime"],
    colnames=["Arrest"]
    )* 100


cross_table

In [None]:
#arrest crime crosstab in total

cross_table = pd.crosstab(df["primary_type"], df["arrest"],
    margins=True,
    normalize=False,
    rownames=["Crime"],
    colnames=["Arrest"]
    )


cross_table

In [None]:
# arrest homicide crosstab

cross_table = pd.crosstab(df["primary_type"] =="homicide" , df["arrest"],
    margins=True,
    normalize=True,
    rownames=["Crime"],
    colnames=["Arrest"]
    )* 100


cross_table
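
With `normalize=True`, every cell is divided by the grand total, so the percentages are shares of all records rather than arrest rates. A minimal sketch of row-wise normalization with `normalize="index"`, which gives the share of arrests within each crime type. The toy frame `toy` and the name `row_pct` are made up for illustration; only the column names `primary_type` and `arrest` come from the data above.

```python
import pandas as pd

# Toy stand-in for the crime data (values invented for illustration only).
toy = pd.DataFrame({
    "primary_type": ["homicide", "homicide", "theft", "theft", "theft", "theft"],
    "arrest": [True, False, False, False, False, True],
})

# normalize="index" divides each row by its own total,
# so every row sums to 100 percent.
row_pct = pd.crosstab(
    toy["primary_type"], toy["arrest"],
    normalize="index",
    rownames=["Crime"],
    colnames=["Arrest"],
) * 100
```

Read row by row, the `True` column is then the arrest rate for that crime type, which makes types with very different frequencies directly comparable.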

In [None]:
#arrest group crosstab

cross_table = pd.crosstab(df["primary_group"], df["arrest"],
    margins=True,
    normalize=True,
    rownames=["Group"],
    colnames=["Arrest"]
    )* 100


cross_table

In [None]:
homicide = alt.Chart(df).mark_bar().encode(
    x=alt.X("count(primary_type)"),
    y=alt.Y("arrest")
).transform_filter(
alt.FieldEqualPredicate(field='primary_type', equal="homicide")
)



homicide

In [None]:
#primary_type and district crosstab

cross_table = pd.crosstab(df["primary_type"], df["district"],
    margins=True,
    normalize=True,
    rownames=["Type"],
    colnames=["District"]
    )* 100


cross_table

In [None]:
district = alt.Chart(df).mark_bar().encode(
    x=alt.X("district:N",
    sort="-y",
    axis=alt.Axis(title="DISTRICT",  
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y("count(primary_type)")
).properties(
    title='Count of committed crimes per district',
    width=1000,
    height=400
)



district

In [None]:
#districts with most crime

df["district"].value_counts().nlargest(5)

In [None]:
district_5 = alt.Chart(df).mark_bar().encode(
    x=alt.X("district:N",
    sort="-y",
    axis=alt.Axis(title="DISTRICT",  
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y("count(primary_type):Q"),
    color=alt.condition(
        alt.FieldOneOfPredicate('district', [11, 6, 8, 1, 18]),  # True for the five districts with the most crimes,
        alt.value('orange'),     # which are drawn in orange.
        alt.value('steelblue')   # All other districts are drawn in steelblue.
    )
).properties(
    title='Count of committed crimes per district',
    width=1000,
    height=400
)



district_5
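
The highlighted districts are hard-coded in the chart above. A minimal sketch of deriving such a list from the counts instead, so it stays in sync if the data changes; the toy frame `toy` and the name `top_districts` are hypothetical, and the sketch takes the top two rather than the top five to keep the example small:

```python
import pandas as pd

# Toy stand-in for the crime data: one row per reported crime (values invented).
toy = pd.DataFrame({"district": [11, 11, 11, 6, 6, 8, 8, 8, 8, 1, 18, 18, 25]})

# Count records per district and keep the labels of the most frequent ones.
top_districts = toy["district"].value_counts().nlargest(2).index.tolist()
```

The resulting list could then be passed to `alt.FieldOneOfPredicate('district', top_districts)` in place of the literal list.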

In [None]:
#wards with most crime

df["ward"].value_counts().nlargest(5)

In [None]:
selection = alt.selection_multi(fields=['ward'])

chart_2 = alt.Chart(df[~df['ward'].isna()]).mark_bar(size=20).encode(
    x=alt.X('ward:N',
            sort="-y",
            axis=alt.Axis(title="Ward",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
    tooltip="ward"
).properties(
    title='Count of committed crimes per ward',
    width=1500,
    height=400
).add_selection(
    selection
)


chart_2.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
ward_5 = alt.Chart(df[~df['ward'].isna()]).mark_bar(size=20).encode(
    x=alt.X('ward:N',
            sort="-y",
            axis=alt.Axis(title="Ward",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
            color=alt.condition(
        alt.FieldOneOfPredicate('ward', [42.0, 28.0, 24.0, 27.0, 6.0]),  # True for the five wards with the most crimes,
        alt.value('orange'),     # which are drawn in orange.
        alt.value('steelblue')   # All other wards are drawn in steelblue.
    )              
).properties(
    title='Count of committed crimes per ward',
    width=1500,
    height=400
)


ward_5.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
#block/streets with most crimes
df["block"].value_counts().nlargest(10)

In [None]:
chart_3 = alt.Chart(df).mark_bar().encode(
    y=alt.Y('year:N',
            axis=alt.Axis(title="Year",
                          titleAnchor="start")),
    x=alt.X('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
).properties(
    title='Count of committed crimes per year',
    width=400,
    height=200
)


chart_3.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
chart_4 = alt.Chart(df).mark_line().encode(
    x=alt.X('month:N',
            axis=alt.Axis(title="Month",
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
    color=alt.Color("year:N", legend=alt.Legend(title="YEAR"))                  
).properties(
    title='Count of committed crimes per month',
    width=500,
    height=400
)

chart_4.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
chart_5 = alt.Chart(df).mark_line().encode(
    x=alt.X('hour:N',
            axis=alt.Axis(title="Hour",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
    color=alt.Color("year:N", legend=alt.Legend(title="YEAR"))
).properties(
    title='Count of committed crimes per hour',
    width=600,
    height=400
)


chart_5.configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
)

In [None]:
# select a point for which to provide details-on-demand
label = alt.selection_single(
    encodings=['x'], # limit selection to x-axis value
    on='mouseover',  # select on mouseover events
    nearest=True,    # select data point nearest the cursor
    empty='none'     # empty selection includes no data points
)

chart_5 = alt.Chart().mark_line().encode(
    x=alt.X('hour:N',
            axis=alt.Axis(title="Hour",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "Count", 
                          titleAnchor="end")),
    color=alt.Color("year:N", legend=alt.Legend(title="YEAR"))
)


alt.layer(
    # base line chart
    chart_5,
    # vertical rule at the hovered hour
    alt.Chart().mark_rule(color='lightgrey').encode(
        x='hour:N'
    ).transform_filter(label),
    # point markers, visible only at the hovered hour
    chart_5.mark_circle().encode(
        opacity=alt.condition(label, alt.value(1), alt.value(0))
    ).add_selection(label),
    # white outline behind the label so it stays legible
    chart_5.mark_text(align='left', dx=5, dy=-5, stroke='white', strokeWidth=2).encode(
        text='count(primary_type)'
    ).transform_filter(label),
    # count label at the hovered hour
    chart_5.mark_text(align='left', dx=5, dy=-5).encode(
        text='count(primary_type)'
    ).transform_filter(label),
    data=df
).properties(
    width=800,
    height=600
)

In [None]:
# not usable because of group_3
alt.Chart(df).mark_area().encode(
    x="hour:N",
    y=alt.Y("count(primary_group)", stack="normalize"),
    color="primary_group:N"
)

## Visualizations

In [None]:
#Geopandas library to work with Chicago map
import geopandas as gpd

In [None]:
PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "external/"
FILE = "wards"
FORMAT = ".shp"

gdf = gpd.read_file(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

In [None]:
#Map of Chicago with Crimes as Dots on the Map

choro = alt.Chart(gdf).mark_geoshape(
    fill="lightgrey", stroke='grey'
).encode()


p = alt.Chart(df).mark_circle(opacity=0.2).encode(
    longitude='longitude',
    latitude='latitude',
    size=alt.value(10)
).properties(
    title="Location of crimes in Chicago City",
    width=1000,
    height=1000
)

choro + p

> REMOVE THE FOLLOWING TEXT

This section includes a brief description of your visualization creation process.

Explain the reasoning for the type of visualization you're using and what other types you considered. 

Additionally, show how you arrived at the final visualization by describing the plot selection process, variable transformations (if needed), and any other relevant considerations that were part of the visualization creation process.

## Conclusion + recommended action


> REMOVE THE FOLLOWING TEXT

In this section you'll include a summary of what you have learned about your (research) question along with (statistical) arguments supporting your conclusions.

In addition, discuss the limitations of your analysis and provide suggestions on ways the analysis could be improved.

Any potential issues pertaining to the reliability and validity of your data and appropriateness of the statistical analysis should also be discussed here.

Lastly, this section will include your recommended action.