# **Dataset and Preprocessing**

## Visual 1
Theft, Battery and Criminal Damage were the three most common crime types in Chicago between 2018 and 2023. For each year, this graph shows what percentage of all crimes committed that year belongs to each of that year's top three crime types. In every year except 2020, Theft was the most common crime. We believe less theft was committed in 2020 because of the COVID-19 pandemic.

```python
import pandas as pd
import plotly.express as px
import plotly.offline as pyo
from IPython.display import HTML

# Load and prepare data
df = pd.read_csv("dataset/graph123.csv")
df['Primary Type'] = df['Primary Type'].astype(str)
df = df[df['Year'].between(2018, 2023)]

top3_list = []

for year in sorted(df['Year'].unique()):
    year_data = df[df['Year'] == year]
    top3 = year_data['Primary Type'].value_counts(normalize=True).nlargest(3) * 100
    for crime, pct in top3.items():
        top3_list.append({
            "Year": year,
            "Primary Type": crime,
            "Percentage": pct
        })

top3_df = pd.DataFrame(top3_list)

# Create Plotly figure
fig = px.bar(
    top3_df,
    x="Year",
    y="Percentage",
    color="Primary Type",
    barmode="group",
    text=top3_df["Percentage"].round(1).astype(str) + "%",
    color_discrete_sequence=px.colors.qualitative.Set2
)

fig.update_traces(textposition='outside')
fig.update_layout(
    title="Top 3 Crime Types per Year (2018–2023)",
    xaxis_title="Year",
    yaxis_title="Percentage of Crimes",
    yaxis=dict(range=[0, top3_df["Percentage"].max() + 5]),
    legend_title="Primary Type",
    template="plotly_white",
    bargap=0.3
)

# Render HTML using plotly.offline
plot_html = pyo.plot(fig, include_plotlyjs='cdn', output_type='div')

# Display HTML in notebook (safe for MyST/Jupyter Book)
HTML(plot_html)
```
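The per-year loop above can also be expressed as a single grouped operation. The sketch below is a minimal alternative, not the notebook's own method: it assumes only the `Year` and `Primary Type` columns used above, `top_n_shares` is a hypothetical helper name, and the toy rows are made up for illustration rather than taken from the Chicago data.

```python
import pandas as pd


# Hypothetical helper (not part of the original notebook)
def top_n_shares(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """For each year, return the n most common crime types and their share (%) of that year's crimes."""
    shares = (
        df.groupby('Year')['Primary Type']
        .value_counts(normalize=True)  # proportion of each crime type within its year
        .mul(100)                      # convert proportions to percentages
        .rename('Percentage')
        .reset_index()
    )
    # Sort explicitly so the largest shares come first within each year, then keep n rows per year
    shares = shares.sort_values(['Year', 'Percentage'], ascending=[True, False])
    return shares.groupby('Year').head(n).reset_index(drop=True)


# Made-up toy rows, for illustration only
toy = pd.DataFrame({
    'Year': [2019] * 6 + [2020] * 6,
    'Primary Type': ['THEFT', 'THEFT', 'THEFT', 'BATTERY', 'BATTERY', 'ASSAULT',
                     'BATTERY', 'BATTERY', 'BATTERY', 'THEFT', 'THEFT', 'ASSAULT'],
})
print(top_n_shares(toy, n=2))
```

The result has the same `Year`, `Primary Type` and `Percentage` columns as `top3_df`, so it could be passed to `px.bar` unchanged.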


## Visual 2
The total number of reported crimes per year can indicate whether something significant happened in a given year. This graph shows the total number of reported crimes per year from 2018 to 2023. We expected a large dip in 2020, but we were surprised by the decline from 2018 to 2019, as we had expected the total to increase.

```python
import pandas as pd
import plotly.express as px
import plotly.offline as pyo
from IPython.display import HTML

# Load and filter the dataset
df = pd.read_csv('dataset/graph123.csv')

# Filter years and drop unnecessary columns
df = df[(df['Year'] >= 2018) & (df['Year'] <= 2023)]
df = df.drop(columns=['ID', 'Case Number'], errors='ignore')

# Group by year and count crimes
yearly_totals = df.groupby('Year').size().reset_index(name='Total Crimes')

# Create the line chart
fig = px.line(
    yearly_totals,
    x='Year',
    y='Total Crimes',
    title='Total Number of Crimes per Year (2018–2023)',
    markers=True
)

# Improve layout
fig.update_traces(line=dict(color='#636EFA', width=3))
fig.update_layout(
    xaxis=dict(type='category'),
    yaxis_title='Total Crimes',
    title_font=dict(size=22),
    margin=dict(t=60, b=40, l=20, r=20)
)

# Render HTML using plotly.offline
plot_html = pyo.plot(fig, include_plotlyjs='cdn', output_type='div')

# Display HTML in notebook (safe for MyST/Jupyter Book)
HTML(plot_html)
```
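To put a number on the decline from 2018 to 2019 and on the 2020 dip, the yearly totals can be converted into year-over-year percentage changes. This is a minimal sketch assuming a frame shaped like `yearly_totals` (columns `Year` and `Total Crimes`); the helper `add_yoy_change` and the toy totals are hypothetical and are not real figures from the dataset.

```python
import pandas as pd


# Hypothetical helper (not part of the original notebook)
def add_yoy_change(yearly_totals: pd.DataFrame) -> pd.DataFrame:
    """Add a column with the percentage change in total crimes versus the previous year."""
    out = yearly_totals.sort_values('Year').reset_index(drop=True)
    # pct_change() compares each row with the previous one; the first year has no prior value (NaN)
    out['Change (%)'] = out['Total Crimes'].pct_change() * 100
    return out


# Made-up totals, for illustration only
toy_totals = pd.DataFrame({'Year': [2018, 2019, 2020], 'Total Crimes': [1000, 950, 760]})
print(add_yoy_change(toy_totals))
```

A negative value marks a decrease relative to the previous year, so `add_yoy_change(yearly_totals)` would show directly how large each drop was.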




## Visual 3
Crimes are committed in many different places across the city. This graph shows the ten locations where crimes were committed most often between 2018 and 2023; the percentages are shares of these ten locations combined, not of all crimes. According to the graph, the most common locations are the street, apartments and residences. We also noticed how few crimes were committed in alleys, where Hollywood would lead you to expect many more.

```python
import pandas as pd
import plotly.express as px
import plotly.offline as pyo
from IPython.display import HTML

# Load and filter the dataset
df = pd.read_csv('dataset/graph123.csv')

# Filter by year and drop unused columns
df = df[(df['Year'] >= 2018) & (df['Year'] <= 2023)]
df = df.drop(columns=['ID', 'Case Number'], errors='ignore')

# Drop missing values and convert 'Location Description' to string
df = df.dropna(subset=['Location Description'])
df['Location Description'] = df['Location Description'].astype(str)

# Count occurrences of each location
location_counts = df['Location Description'].value_counts().reset_index()
location_counts.columns = ['Location Description', 'Count']

# Optional: show only top N locations to reduce clutter
top_n = 10
location_counts = location_counts.head(top_n)

# Create donut chart
fig = px.pie(
    location_counts,
    values='Count',
    names='Location Description',
    title='Top Locations of Crime (2018–2023)',
    hole=0.6,
    color_discrete_sequence=[
        "#636EFA", "#EF553B", "#00CC96", "#AB63FA", "#FFA15A",
        "#19D3F3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52"
    ]
)

# Improve layout
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.update_layout(
    showlegend=True,
    legend_title_text='Location',
    title_font=dict(size=22),
    margin=dict(t=60, b=40, l=20, r=20)
)

# Render HTML using plotly.offline
plot_html = pyo.plot(fig, include_plotlyjs='cdn', output_type='div')

# Display HTML in notebook (safe for MyST/Jupyter Book)
HTML(plot_html)
```
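Because `px.pie` computes each slice's percentage relative to the slices it is given, the donut above shows shares of the top ten locations only. One way to keep the percentages relative to all crimes is to fold the remaining locations into a single extra slice. This is a minimal sketch, not the notebook's method: the helper `top_n_with_other` and the `'OTHER'` label are hypothetical choices, and the toy series is made up.

```python
import pandas as pd


# Hypothetical helper (not part of the original notebook)
def top_n_with_other(locations: pd.Series, n: int = 10, other_label: str = 'OTHER') -> pd.DataFrame:
    """Count locations, keep the n most common, and lump all remaining locations into one row."""
    counts = locations.dropna().astype(str).value_counts()  # sorted by count, descending
    top = counts.head(n)
    rest = counts.iloc[n:].sum()
    result = top.rename_axis('Location Description').reset_index(name='Count')
    if rest > 0:
        other_row = pd.DataFrame({'Location Description': [other_label], 'Count': [rest]})
        result = pd.concat([result, other_row], ignore_index=True)
    # Share of all non-missing locations, so the percentages add up to 100 across every crime
    result['Percentage'] = result['Count'] / counts.sum() * 100
    return result


# Made-up toy data, for illustration only
toy_locations = pd.Series(['STREET', 'STREET', 'STREET', 'APARTMENT', 'APARTMENT', 'ALLEY', None])
print(top_n_with_other(toy_locations, n=2))
```

Passing the result of `top_n_with_other(df['Location Description'])` to `px.pie` in place of `location_counts` would make the slice percentages relative to all crimes, at the cost of one large "other" slice.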



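## Visual 4
This graph shows the number of reported crimes in each Chicago police district, based on the separate `dataset/graph4.csv` file.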

```python
import pandas as pd
import plotly.express as px
import plotly.offline as pyo
from IPython.display import HTML

# 1. Read the data
df = pd.read_csv("dataset/graph4.csv")

# 2. Group by police district and count the number of crimes
district_counts = df['District'].value_counts().reset_index()
district_counts.columns = ['Police District', 'Reported Crimes']

# 3. Sort by number of crimes (optional, for visual clarity)
district_counts = district_counts.sort_values(by='Reported Crimes', ascending=False)

# 4. Create the plot
fig = px.bar(
    district_counts,
    x='Police District',
    y='Reported Crimes',
    title='Reported Crimes per Police District',
    text='Reported Crimes'
)

# If 'District' is stored as a number, Plotly uses a numeric x-axis and orders the bars
# by district number, ignoring the sort above; a categorical axis keeps the sorted order
fig.update_layout(
    xaxis=dict(type='category'),
    xaxis_title='Police District',
    yaxis_title='Reported Crimes'
)

# Render HTML using plotly.offline
plot_html = pyo.plot(fig, include_plotlyjs='cdn', output_type='div')

# Display HTML in notebook (safe for MyST/Jupyter Book)
HTML(plot_html)
```
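Plotly treats a numeric `District` column as a continuous scale, so turning district numbers into string labels is another way to get one bar per district in sorted order. The sketch below also adds each district's share of all reported crimes, in line with the percentages used in the other visuals. It is a minimal sketch: `district_summary` is a hypothetical helper, it assumes `District` holds whole-number codes (possibly stored as floats because of missing values), and the toy data is made up.

```python
import pandas as pd


# Hypothetical helper (not part of the original notebook)
def district_summary(districts: pd.Series) -> pd.DataFrame:
    """Count reported crimes per police district, with string labels and percentage shares."""
    counts = districts.dropna().astype(int).value_counts()  # sorted by count, descending
    summary = counts.rename_axis('Police District').reset_index(name='Reported Crimes')
    # String labels make plotting libraries treat districts as categories, not a numeric scale
    summary['Police District'] = summary['Police District'].astype(str)
    summary['Share (%)'] = summary['Reported Crimes'] / summary['Reported Crimes'].sum() * 100
    return summary


# Made-up toy data, for illustration only
toy_districts = pd.Series([11.0, 11.0, 11.0, 8.0, 8.0, 1.0, None])
print(district_summary(toy_districts))
```

Rows with a missing district are dropped before counting, so the shares refer only to crimes with a known district.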