<hr/>
<div class="alert alert-success alertsuccess" style="margin-top: 20px">
[Tip]: To execute the Python code in the code cell below, click on the cell to select it and press <kbd>Shift</kbd> + <kbd>Enter</kbd>.
</div>
<hr/>

# Required Imports for This Notebook

In [None]:
try:
    import plotly.express as px
except ImportError:
    # px.scatter_map (used for the map below) requires plotly 5.24 or newer
    !pip install "plotly>=5.24"

import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px

import plotly.io as pio 
pio.renderers.default = 'notebook'


from os.path import exists
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}

# Loading the Dataset

In [None]:
def load_crimes():
    """Load the crime dataset from a local copy if present, else from HU-Box."""
    local = "crime.csv.zip"
    if exists(local):
        print("Read from local file")
        return pd.read_csv(local, compression='zip')
    else:
        print("Read from HU-Box")
        return pd.read_csv('https://box.hu-berlin.de/f/d0c59bb99af24dbf9c81/?dl=1', compression='zip')

df = load_crimes()

# Fill missing values in SHOOTING with 'N'
df.SHOOTING = df.SHOOTING.fillna('N')

# Replace -1 placeholder values in Lat/Long with NaN
df.Lat = df.Lat.replace(-1, np.nan)
df.Long = df.Long.replace(-1, np.nan)

# Parse the timestamp column as datetime
df['OCCURRED_ON_DATE'] = pd.to_datetime(df['OCCURRED_ON_DATE'])

df = df.convert_dtypes()
df.head()
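The cleaning steps above can be tried in isolation on a small made-up table. This is only a sketch: the column names mirror the dataset, but the values in `toy` are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made-up values); -1 stands in for an unknown coordinate.
toy = pd.DataFrame({
    "Lat": [42.36, -1.0, 42.33],
    "Long": [-71.06, -1.0, -71.09],
    "SHOOTING": ["Y", None, None],
})

# Missing SHOOTING entries become 'N'
toy["SHOOTING"] = toy["SHOOTING"].fillna("N")

# The -1 placeholder becomes NaN so that dropna() and plots can skip it
toy[["Lat", "Long"]] = toy[["Lat", "Long"]].replace(-1, np.nan)

# Keep only the rows that have usable coordinates
located = toy.dropna(subset=["Lat", "Long"])

print(toy.isna().sum())
print(len(located))
```

Turning the placeholder into NaN matters because a literal -1 would otherwise be treated as a real coordinate in statistics and on the map.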

In [None]:
# List the data types of each column
df.dtypes

# Show the Crimes on OpenStreetMap

In [None]:
# Drop rows with missing values and draw a 10k-row sample to avoid excessive memory usage
data = df.dropna(subset=['Lat', 'Long', 'DISTRICT']).sample(n=10_000)

fig = px.scatter_map(
    data,
    lat='Lat',
    lon='Long',
    color='DISTRICT',
    zoom=10,
)

fig.update_layout(
    font_family='serif',
    title_font_size=24,
    title_font_weight='bold',
    width=1000,
    height=500,
    mapbox_style=None,
    title=dict(
        text="Crimes by District",
        font=dict(size=24)),
    title_subtitle=dict(
        text="Boston is divided into 12 districts",
        font=dict(size=18))
)

fig.show()

# Information

In [None]:
df.describe()

In [None]:
df.nunique()

# Selection / Filtering

In [None]:
df[(df.UCR_PART == "Part One")].INCIDENT_NUMBER.count()
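How the filter above works can be sketched on a small made-up table (the rows in `toy` are illustrative, not taken from the Boston data). The comparison produces a boolean mask, and indexing with that mask keeps only the matching rows. Several conditions are combined with `&` or `|`, each wrapped in parentheses.

```python
import pandas as pd

# Toy data; column names mirror the crime dataset, values are made up.
toy = pd.DataFrame({
    "INCIDENT_NUMBER": ["I1", "I2", "I3", "I4"],
    "UCR_PART": ["Part One", "Part Two", "Part One", "Part Three"],
    "YEAR": [2016, 2016, 2017, 2017],
})

# Step 1: the comparison yields one True/False value per row
mask = toy.UCR_PART == "Part One"

# Step 2: indexing with the mask keeps the rows where it is True
part_one_count = toy[mask].INCIDENT_NUMBER.count()

# Two conditions: each one in parentheses, joined with & (and)
combined = toy[(toy.UCR_PART == "Part One") & (toy.YEAR == 2016)]

print(part_one_count, len(combined))
```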

# Grouping and Aggregation

In [None]:
df.groupby("YEAR").INCIDENT_NUMBER.count()

## Alternative

In [None]:
df.groupby("YEAR")[["INCIDENT_NUMBER"]].count()
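The difference between the two variants is the return type, which the following sketch shows on made-up data (the values in `toy` are illustrative only). Selecting the column with attribute access or a single label returns a Series; selecting it with a list of labels returns a DataFrame.

```python
import pandas as pd

# Toy data (made-up values)
toy = pd.DataFrame({
    "INCIDENT_NUMBER": ["I1", "I2", "I3", "I4", "I5", "I6"],
    "YEAR": [2015, 2016, 2016, 2017, 2017, 2017],
})

# Single column selected: the result is a Series indexed by YEAR
as_series = toy.groupby("YEAR").INCIDENT_NUMBER.count()

# List of columns selected: the result is a DataFrame indexed by YEAR
as_frame = toy.groupby("YEAR")[["INCIDENT_NUMBER"]].count()

print(type(as_series).__name__, type(as_frame).__name__)
print(as_frame)
```

The DataFrame form renders as a table in the notebook and makes it easy to add further columns later. Note that `count()` ignores missing values in the selected column.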

## Sorting

In [None]:
df.sort_values(by="YEAR", ascending=False).head()
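Sorting by more than one column follows the same pattern. The sketch below uses made-up values, and the `MONTH` column is an assumption made for illustration; `by` and `ascending` each take a list with one entry per sort key.

```python
import pandas as pd

# Toy data (made-up values); MONTH is assumed here for illustration
toy = pd.DataFrame({
    "YEAR": [2016, 2017, 2016, 2017],
    "MONTH": [5, 1, 2, 9],
})

# Newest year first; within each year, earliest month first
ordered = toy.sort_values(by=["YEAR", "MONTH"], ascending=[False, True])

# The original row labels are kept; reset them if a fresh 0..n-1 index is wanted
ordered_reset = ordered.reset_index(drop=True)

print(ordered)
```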

### Documentation: https://pandas.pydata.org

# Plotting with Seaborn

In [None]:
# First filter some data
data = df[(df.UCR_PART == 'Part One')  & (df.YEAR == 2016)]

# Plot some data
g = sns.catplot(
    x='DAY_OF_WEEK',
    kind='count',
    height=2,
    aspect=3.0,
    data=data,
    order=["Monday", "Tuesday", "Wednesday", 
           "Thursday", "Friday", "Saturday", "Sunday"])

# add title
g.figure.text(0.12, 1.2, 'Total number of offences by weekday', 
         fontsize=18, fontweight='bold', fontfamily='serif')
g.figure.text(0.12, 1.05, 'Crime rates peak on Fridays', 
         fontsize=14, fontweight='light', fontfamily='serif');

# annotate plot
plt.annotate('Highest\n crime\n rates', xy=(4, 2900), xytext=(5, 3000),
             arrowprops=dict(facecolor='steelblue',arrowstyle="->",
                             connectionstyle="arc3,rad=.3"), 
             fontsize=10,fontfamily='monospace', ha='left');

plt.show()

In [None]:
# First filter some data
data = df[(df.UCR_PART == 'Part One')].dropna(subset=['Lat', 'Long'])

# Plot some data
g = sns.relplot(
    x='Lat',
    y='Long',   
    col='YEAR',
    alpha=0.01,
    data=data
)

# add title
g.figure.text(0.05, 1.15, 'Urban areas, broken down by year', 
         fontsize=34, fontweight='bold', fontfamily='serif')
g.figure.text(0.05, 1.05, 'Most crimes in the eastern part of the city?', 
         fontsize=26, fontweight='light', fontfamily='serif');

# annotate plot
for axes in g.axes:
    for ax in axes:
        ax.annotate('Is this the\ncrime hotspot?', xy=(42.36, -71.075), xytext=(42.36, -71.155),
             arrowprops=dict(facecolor='steelblue',arrowstyle="->",
                             connectionstyle="arc3,rad=-.3"), 
             fontsize=10,fontfamily='monospace', ha='left');


sns.despine()
plt.show()

### Documentation: https://seaborn.pydata.org

<hr>

# Your Solutions - Submit via Moodle

<hr/>


<div class="alert alert-block alert-success" style="margin-top: 20px">

<h1>1. Classify Columns</h1>

<h2>a) Classify the data types of each column</h2>
<ul>
<li><strong>Numerical</strong>
<ul>
<li>Continuous or Discrete</li></ul>
</li>
<li><strong>Categorical</strong>
<ul>
<li>Nominal or Ordinal</li></ul>
</li>
</ul>

<b>If you classify a column as discrete or ordinal,
state the reason for your decision.</b>

</div>

In [None]:
# Answer / Code / etc

<hr/> 

<div class="alert alert-block alert-success" style="margin-top: 20px">

<h1>2. Key Questions:</h1>

<p>Your aim is to support the police by developing preventive measures based on historical data from 2015-2018.</p>

<ol>
<li><p><strong>How has the total number of offences developed over the years?</strong></p>

<ul>
<li>Which offences are the most frequent?</li>

<li>How has the number of serious crimes ('Part One') developed over the years?</li>

<li>Why is the total number of offences (so) low in 2015 and 2018?</li></ul></li>

<br/>

<li><p><strong>In which urban areas (district), broken down by year, were most crimes committed?</strong></p>

<ul>
<li>In which urban areas (district) are most serious crimes ('Part One') committed? </li>

<li>Which types of serious crimes ('Part One') occur most frequently in the urban area 'B2'? </li></ul></li>

<br/>

<li><p><strong>Are there (a) times, (b) days or (c) months when more serious crimes ('Part One') occur?</strong></p>

<ul>
<li>Do crimes tend to occur at night or during the day?</li>

<li>When are the most police officers needed?</li></ul></li>

<br/>

<li><p><strong>How has the number of shootings developed in recent years?</strong></p>

<ul>
<li>In which district do most shootings take place?</li>

<li>In which street do most shootings take place?</li>

<li>At what times do most shootings take place?</li></ul></li>
</ol>
    
</div> 

<hr/>

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h1>Key Question 1:</h1>

<h2>How has the total number of offences developed over the years?</h2>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">

<h3>a) Which offences are the most frequent?</h3>

</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">

<h3>b) How has the number of serious crimes ('Part One') developed over the years?</h3>

</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3>c) Why is the total number of offences (so) low in 2015 and 2018?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<hr/> 

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h1>Key Question 2:</h1>

<h2>In which urban areas (district), broken down by year, were most crimes committed?</h2>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3> a) In which urban areas (district) are most serious crimes ('Part One') committed? </h3>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3> b) Which types of serious crimes ('Part One') occur most frequently in the urban area 'B2'? </h3>
    

</div>

In [None]:
# Answer / Code / etc

<hr/>

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h1>Key Question 3:</h1>

<h2>Are there (a) times, (b) days or (c) months when more serious crimes ('Part One') occur?</h2>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">

<h3> a) Do crimes tend to occur at night or during the day?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">

<h3> b) When are the most police officers needed?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<hr/>

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h1>Key Question 4:</h1>

<h2>How has the number of shootings developed in recent years?</h2>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3> a) In which district do most shootings take place?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3>b) In which street do most shootings take place?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<div class="alert alert-block alert-success" style="margin-top: 20px">
    
<h3>c) At what times do most shootings take place?</h3>
    
</div>

In [None]:
# Answer / Code / etc

<hr/>

# Finally: 
From these solutions: 
- **create a report in PPT, Word, HTML, etc.** 
- list each Key Question and its answers
- show plots, tables, etc. to support your statements

**Code alone is not enough to answer the key questions. Each answer needs a visualization and an explanation.**

**If, and only if, this Jupyter notebook is very clean, you may instead polish it, add explanations as markdown, and hand it in as an HTML report.**