---
format:
  html:
    theme: zephyr
    code-fold: true
    embed-resources: true
jupyter: python3
---

In [None]:
from pathlib import Path

PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "processed/"
FILE = "chicago_crimes-20230130-1108"
FORMAT = ".csv"

import altair as alt
from vega_datasets import data
alt.data_transformers.disable_max_rows()

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd

#load crime dataset
df = pd.read_csv(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

#Geopandas library to work with Chicago map
import geopandas as gpd

PARENT_PATH = str(Path().resolve().parent) + "/"
PATH = "data/"
SUBPATH = "external/"
FILE = "wards"
FORMAT = ".shp"

#load map dataset
gdf = gpd.read_file(PARENT_PATH + PATH + SUBPATH + FILE + FORMAT)

#drop columns that are not used in the report
df.drop(['id', 'ward', 'day'], axis=1, inplace=True)


cat_list = ["block","primary_type", "primary_group", "district", "year", "month", "hour"]

#convert to category
for i in cat_list:
    df[i] = df[i].astype("category")

date_list = ["date"]

#convert date to datetime
for i in date_list:
    df[i] = pd.to_datetime(df[i])  # unit-less astype("datetime64") is rejected by recent pandas versions

## Visualizations

The first visualisation is a map of Chicago on which the individual crimes from 2018 and 2019 are displayed as small squares, coloured by primary_group. Because the squares are drawn with reduced opacity, areas with many crimes appear more intense, so it is easy to see in which districts most crimes took place. The view can also be filtered to one of the three primary_group values. The map gives the viewer a quick overview of where crimes occur, how often, and how severe they are.

The second visualisation is an interactive line chart that shows the distribution of cases over the 24 hours of a day, with one line each for 2018 and 2019. Knowing at what time crimes are committed is important for adapting the opening hours of the Crime Prevention Centers to local conditions.

The third visualisation contains two plots. The first shows the number of crimes committed per district in descending order, so the viewer can quickly see in which districts the most crimes were committed.
For a more detailed insight into the individual districts, a stacked bar plot was added as the second plot. It shows the crime types in each district as percentages, which helps to evaluate the districts not only by the frequency of crimes but also by their severity.
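
The percentage view of the stacked bar plot corresponds to normalising the counts within each district. A minimal plain-Python sketch of that computation is shown here, assuming a handful of made-up records rather than the real dataset; the helper name `type_shares` is hypothetical and not part of the report's code.

```python
from collections import Counter, defaultdict

# Toy records of (district, primary_type); made-up values, not taken from the dataset.
records = [
    (1, "theft"), (1, "theft"), (1, "deceptive_practice"), (1, "burglary"),
    (11, "narcotics"), (11, "narcotics"), (11, "theft"),
]

def type_shares(rows):
    """Return {district: {primary_type: share}} where the shares sum to 1 per district."""
    counts = defaultdict(Counter)
    for district, primary_type in rows:
        counts[district][primary_type] += 1
    shares = {}
    for district, type_counts in counts.items():
        total = sum(type_counts.values())
        shares[district] = {t: n / total for t, n in type_counts.items()}
    return shares

shares = type_shares(records)
print(shares[1]["theft"])  # 2 of the 4 toy records in district 1 are thefts
```

Because each district is normalised separately, a district with few crimes can still show a large share for one type, which is why the count plot and the percentage plot are read together.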

In all visualisations, care was taken to ensure that the viewer can grasp the message of the graphic as easily as possible.
Unnecessary distractions such as grids or frames were avoided. The axes were labelled so that they guide the viewer's eye. The interactivity of the charts encourages the viewer to engage with the visualisations and to explore the data.



### Map (interactive)
This map shows the locations of the crimes that occurred in Chicago. The more intense the colour at a point on the map, the more crimes were committed there. Hover over a square to see its district, block and crime type. You can also filter the view by one of the three primary_group values.

In [None]:
#map of chicago (interactive)
#map
choro = alt.Chart(gdf).mark_geoshape(
    fill="white", stroke='grey'
).encode()

#selection
group_radio = alt.binding_radio(options=['group_1','group_2','group_3'], name='Select_Group: ')
group_select = alt.selection_single(
    fields=["primary_group"], bind=group_radio
)

group_color_condition = alt.condition(
    group_select,
    alt.Color("primary_group:N", legend=alt.Legend(title="GROUP", orient='none', legendX=750, legendY=10)),
    alt.value("lightgrey"),
)

#squares
p = alt.Chart(df).mark_square(opacity=0.3).encode(
        longitude='longitude', 
        latitude='latitude', 
        size=alt.value(10), 
        tooltip=["district", "block", "primary_type"]
).add_selection(group_select
).encode(color=group_color_condition
).properties(
    title="Locations of crimes in Chicago City",
    width=800,
    height=1000)
    
layer = alt.layer(choro + p
).configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
).configure_axis(grid=False
).configure_view(strokeOpacity=0)

layer


### Line chart (interactive)
Here we can see the difference between 2018 and 2019. The graph shows at what time of day the crimes were committed. Most crimes were committed at 12pm and at 7pm.
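
The aggregation behind the line chart is a count of records per year and hour. The following is a small plain-Python sketch of that idea, assuming made-up records instead of the real dataset; `hourly_counts` and `peak_hour` are hypothetical helper names.

```python
from collections import Counter

# Toy records of (year, hour on a 24-hour clock); made-up values, not taken from the dataset.
records = [
    (2018, 12), (2018, 12), (2018, 19), (2018, 3),
    (2019, 12), (2019, 19), (2019, 19), (2019, 19),
]

def hourly_counts(rows):
    """Count records per (year, hour), mirroring a count grouped by year and hour."""
    return Counter(rows)

def peak_hour(rows, year):
    """Return the hour with the most records in the given year."""
    per_hour = Counter(hour for y, hour in rows if y == year)
    return per_hour.most_common(1)[0][0]

counts = hourly_counts(records)
print(counts[(2018, 12)])        # number of toy records at 12pm in 2018
print(peak_hour(records, 2019))  # busiest hour in the toy 2019 data
```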


In [None]:
# Line chart (interactive)
# select a point for which to provide details-on-demand
label = alt.selection_single(
    encodings=['x'], # limit selection to x-axis value
    on='mouseover',  # select on mouseover events
    nearest=True,    # select data point nearest the cursor
    empty='none'     # empty selection includes no data points
)

chart_6 = alt.Chart().mark_line().encode(
    x=alt.X('hour:N',
            axis=alt.Axis(title="HOUR",
                          titleAnchor="start",
                          labelAngle=0)),
    y=alt.Y('count(primary_type)',
            axis=alt.Axis(title = "COUNT", 
                          titleAnchor="end")),
    color=alt.Color("year:N", legend=alt.Legend(title=" ", orient='none', legendX=820, legendY=180))
)


alt.layer(
    # base lines, one per year
    chart_6,
    # vertical rule at the hour nearest to the cursor
    alt.Chart().mark_rule(color='lightgrey').encode(
        x='hour:N'
    ).transform_filter(label),
    # points that only become visible for the selected hour
    chart_6.mark_circle().encode(
        opacity=alt.condition(label, alt.value(1), alt.value(0))
    ).add_selection(label),
    # white outline behind the count label to keep it legible
    chart_6.mark_text(align='left', dx=5, dy=-5, stroke='white', strokeWidth=2).encode(
        text='count(primary_type)'
    ).transform_filter(label),
    # count label for the selected hour
    chart_6.mark_text(align='left', dx=5, dy=-5).encode(
        text='count(primary_type)'
    ).transform_filter(label),
    data=df
).properties(
    title="Distribution of committed crimes per hour",
    width=800,
    height=600
).configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
).configure_axis(grid=False
).configure_view(strokeOpacity=0)

### Bar chart & stacked bar chart (interactive: drag to select single or multiple districts)
- The first bar chart shows again in which districts the most crimes were committed. The districts with the most crimes are 11, 6, 8, 1 and 18; those with the fewest are 20, 17 and 24.
- The stacked bar chart shows how the different types of crime are distributed within the districts. For example, there were 2271 thefts in district 1.
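
The descending order of the first bar chart amounts to ranking the districts by their number of records. A minimal sketch in plain Python, assuming a made-up list of district numbers (one entry per recorded crime); `rank_districts` is a hypothetical helper name.

```python
from collections import Counter

# Toy list of district numbers, one entry per recorded crime; made-up values.
districts = [11, 11, 11, 6, 6, 1, 20]

def rank_districts(values):
    """Return (district, count) pairs sorted by count, highest first."""
    return Counter(values).most_common()

ranking = rank_districts(districts)
print(ranking[:2])  # the two districts with the most toy records
```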

In [None]:
# Bar chart & stacked bar chart (interactive: drag to select single or multiple districts)
order_crime = ["theft", "assault_and_battery","criminal_damage", "deceptive_practice", "burglary", "other_offense", "robbery_and_weapons", "narcotics", "homicide", "sexual_crime"]

brush = alt.selection(type='interval')

bar = alt.Chart(df).mark_bar().encode(
    x=alt.X("district:N",
    sort="-y",
    axis=alt.Axis(title="DISTRICT",  
                          titleAnchor="start", 
                          labelAngle=0)),
    y=alt.Y("count(primary_type):Q",
    axis=alt.Axis(title="COUNT",  
                          titleAnchor="end")),
    tooltip=[alt.Tooltip('count(primary_type)', title='count')]
).add_selection(
    brush
).properties(
    title='Count of committed crime per districts',
    width=1000,
    height=400
)



bars = alt.Chart(df).mark_bar().encode(
    x=alt.X('count(primary_type)', stack="normalize",
    axis=alt.Axis(format="%",title = "DISTRIBUTION", 
                          titleAnchor="start")),
    y=alt.Y('district:N',
    axis=alt.Axis(title="DISTRICT",  
                          titleY=25)),
    color=alt.Color('primary_type', sort=order_crime, 
    legend=alt.Legend(title="TYPE", orient='none', legendX=1050, legendY=475)),
    tooltip=["primary_type", alt.Tooltip('count(primary_type)', title='count')]
).transform_filter(
    brush
).properties(
    title='Distribution of crime types per district',
    width=1000,
    height=600)

alt.vconcat(bar & bars).configure_title(
    fontSize=16,
    font='Arial',
    color='black',
    anchor='start'
).configure_axis(grid=False
).configure_view(strokeOpacity=0)


## Conclusion + recommended action


### Conclusion

Our analysis clarified the key questions "Which kinds of crime happen particularly frequently, and where and when do they happen?" in order to answer the question "Where should the new Crime Prevention Centers be built in Chicago?".

We have learned that the crime types vary in frequency. For example, the offences "theft" and "assault_and_battery" account for over 50% of all registered cases. Other crimes such as "sexual_crime" or "homicide" are (fortunately) much less common. We also found that most of the crimes occur during the day, between 9am and 7pm. The course of the line shows how crimes follow people's daily rhythms. We were also able to get an overview of which districts of Chicago have the most crimes and how serious they are.

The data show that there are particular hot spots in the city. These are located around the city center in district 1, where a particularly high number of (lighter) crimes of "group_1" occur. In the outer areas of the city, e.g. in and around district 11 and districts 6 and 8, there is an increase in (more serious) crimes of "group_2".


Because the data cover only 2018 and 2019, we were unable to determine whether crime is trending up or down over time. To determine this, we would need to analyse data covering a longer period.
One limitation of our analysis is that we could do little statistical testing. This is mainly because the data are mostly categorical. The observations tell us where, when and which crime took place, but not who committed it or why. We were also not able to statistically verify the two hypotheses. This should be addressed in future work.


In addition, it was not easy at the beginning to summarize the more than 40 different types of offenses in a meaningful way. As a result, the individual offenses may not always be accurately reflected in our analysis.
Also, grouping the offenses into three severity levels is rather subjective and may vary from viewer to viewer.
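
One way to make such a grouping easier to review is to keep it in a single explicit lookup table. The sketch below only illustrates the idea: the raw labels, the assignments in `TYPE_MAP` and the helper `merge_type` are assumptions for illustration and are not the mapping that was actually used for this report.

```python
# Hypothetical mapping from raw offense labels to merged categories.
# The raw labels and the assignments are illustrative assumptions only.
TYPE_MAP = {
    "ASSAULT": "assault_and_battery",
    "BATTERY": "assault_and_battery",
    "ROBBERY": "robbery_and_weapons",
    "WEAPONS VIOLATION": "robbery_and_weapons",
    "THEFT": "theft",
}

def merge_type(raw_label, default="other_offense"):
    """Map a raw offense label to its merged category, falling back to a catch-all."""
    return TYPE_MAP.get(raw_label.strip().upper(), default)

print(merge_type(" battery "))  # whitespace and case are normalised before the lookup
print(merge_type("GAMBLING"))   # labels without an entry fall into the catch-all
```

With the table in one place, a reader who disagrees with an assignment can change a single line and rerun the analysis.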



### Recommended action

We can clearly see that Chicago is a violent city and that the city council needs to react in order to prevent or reduce crime in the future.

Therefore we suggest opening Crime Prevention Centers in the districts. We understand that it is too expensive to open one in every district. Based on our analysis and graphs, we would open at least 5 centers in the most violent districts, each with a focus on specific crime types/groups:

1.  District 11 (5432 incidents) - focus on "narcotics" and general "group_2". 
2.  District 6 (4712) - focus on "criminal damage" and general "group_2".
3.  District 8 (4591) - focus on "criminal damage" and general "group_2".
4.  District 1 (4560) - focus on "theft" and "deceptive practice" and general "group_1".
5.  District 18 (4485) - focus on "theft" and "deceptive practice" and general "group_1".

It also makes sense to place the Crime Prevention Centers on specific streets, as some streets are very violent. Most of these streets run through two or more districts, like State Street or Michigan Avenue, which are the most violent streets in Chicago.

We also understand that the Prevention Centers can't be open 24 hours a day, so we would adapt their opening hours to the times when most crimes statistically happen. We suggest opening at least from 11am to 8pm, because these are the most violent hours, especially around 12pm and 7pm. The 5 most violent hours are:

1.  12pm   (4663 incidents)
2.  7pm    (4413)
3.  6pm    (4383)
4.  3pm    (4226)
5.  5pm    (4225)
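
One way to arrive at a window like 11am to 8pm from these five hours is to open shortly before the earliest busy hour and close when the latest one ends. The sketch below uses only the counts listed above; the one-hour lead-in and the helper names `opening_window` and `to_12h` are assumptions for illustration.

```python
# Incident counts for the five busiest hours listed above, keyed by hour on a 24-hour clock.
top_hours = {12: 4663, 19: 4413, 18: 4383, 15: 4226, 17: 4225}

def opening_window(hours, lead_in=1):
    """Open `lead_in` hours before the earliest busy hour; close when the latest busy hour ends."""
    start = max(min(hours) - lead_in, 0)
    end = min(max(hours) + 1, 24)
    return start, end

def to_12h(hour):
    """Format an hour on a 24-hour clock as a 12-hour label such as '7pm'."""
    suffix = "am" if hour % 24 < 12 else "pm"
    return f"{hour % 12 or 12}{suffix}"

start, end = opening_window(top_hours)
print(f"{to_12h(start)} to {to_12h(end)}")  # prints "11am to 8pm"
```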

Considering the months of the year, we would still recommend keeping the Prevention Centers open all year round, because no month shows a significant decrease in crime.

