# Explainer Notebook

This Explainer Notebook contains details about the chosen datasets, the reasoning behind the selected visualizations, the methodology, etc.

Besides that, the notebook will also contain our analysis and code. 

## 1.	Motivation

### 1.1 What is your dataset?
We selected four datasets obtained from the New York Police Department (NYPD), which include information about crime complaints and arrests in New York City (NYC) spanning from 2006 to 2022.

##### Links to the four datasets:
- Crime complaint data (2006-2021): https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i
- Crime complaint data (2022): https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243
- Arrest data (2006-2021): https://data.cityofnewyork.us/Public-Safety/NYPD-Arrests-Data-Historic-/8h9b-rp9u
- Arrest data (2022): https://data.cityofnewyork.us/Public-Safety/NYPD-Arrest-Data-Year-to-Date-/uip8-fykc


### 1.2 Why did you choose this/these particular datasets?
We specifically chose these four datasets for our crime study to examine the impact of COVID-19 on the crime rates in NYC between 2019 and 2022. By analyzing the data, we aimed to understand how the pandemic influenced the level of criminal activity in the city.

### 1.3 What was your goal for the end user's experience?
Given the significant changes brought about by COVID-19 in our daily lives, our objective in conducting this analysis 
was to provide users with valuable insights. Through our findings, we aimed to shed light on the relationship between 
the pandemic and crime trends in NYC, enabling users to gain a deeper understanding of these dynamics.

## 2. Basic stats

### 2.1 Write about your choices in data cleaning and preprocessing

To perform the analysis on the four datasets, we conducted data cleaning and preprocessing using the following steps:

##### Merging datasets
- We merged the four datasets into two new datasets: one for complaints (2006-2022) and another for arrests (2006-
  2022).

##### Filtering datasets (2019-2022)
- Both datasets were filtered to include data only from the years 2019-2022, focusing on recent information.

##### Merged Larceny
- We merged the two larceny categories, 'GRAND LARCENY' and 'PETIT LARCENY', into a single 'LARCENY' 
  category.

##### Corrected the spelling
- In the arrest dataset, we added an "R" to the spelling of "MURDER & NON-NEGL. MANSLAUGHTE" to align it with the 
  complaint dataset for consistency.

##### Replacing incorrect dates
- Any incorrect dates, such as those falling outside the range of dates and times that 
  pandas can represent (e.g., 1028-06-21), were replaced with NaT (Not-a-Time) in the complaint dataset.
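
A minimal sketch of this step, using toy date strings rather than the real complaint data: `pd.to_datetime` with `errors='coerce'` turns values it cannot parse or represent into NaT instead of raising an error. Whether a far-past date such as 1028 is rejected depends on the timestamp resolution of the installed pandas version, so the sketch only relies on the malformed entry.

```python
import pandas as pd

# Toy stand-in for the CMPLNT_FR_DT column (hypothetical values)
raw_dates = pd.Series(["06/21/2019", "06/21/1028", "not a date"])

# Values that cannot be parsed or represented become NaT instead of raising
parsed = pd.to_datetime(raw_dates, format="%m/%d/%Y", errors="coerce")

# Count how many entries were replaced with NaT
n_bad = int(parsed.isna().sum())
print(parsed)
print("Entries replaced with NaT:", n_bad)
```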

##### Only ten most affected focus crimes
- Both datasets were filtered to include only the ten most affected focus crimes: 'SEX CRIMES', 'DANGEROUS DRUGS', 
  'BURGLARY', 'LARCENY', 'ASSAULT 3 & RELATED OFFENSES', 'MURDER & NON-NEGL. MANSLAUGHTER', 'RAPE', 'ROBBERY', 
  'PROSTITUTION & RELATED OFFENSES', and 'DISORDERLY CONDUCT'.

##### Please note that the cleaning and preprocessing code can be found below.

In [None]:
# Import libraries
import pandas as pd
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plot
import geopandas as gpd
import plotly.express as px
import json
import plotly.io as pio
import pandas_bokeh
from bokeh.io import output_file, show
from bokeh.layouts import row
from bokeh.plotting import figure
import plotly.graph_objects as go

In [None]:
# Complaint data

# Import NYC crime complaint data (2006-2021) into a df
df = pd.read_csv('NYPD_Complaint_Data_Historic.csv')

# Import New York crime complaint data (2022) into a df
df1 = pd.read_csv('NYPD_Complaint_Data_Current__Year_To_Date_.csv')

In [None]:
# Merge the df and df1
df_complaint = pd.concat([df,df1])

# Delete old dataframes
del df,df1

In [None]:
# Create a new dictionary for merging "GRAND LARCENY" and "PETIT LARCENY"
larceny_dict = {'GRAND LARCENY': 'LARCENY', 'PETIT LARCENY': 'LARCENY'}
              
# Using the replace method to merge the "LARCENY" categories
df_complaint['OFNS_DESC'] = df_complaint['OFNS_DESC'].replace(larceny_dict)

# Make a set with the 10 focus crimes
focuscrimes = set(['SEX CRIMES', 'DANGEROUS DRUGS', 'BURGLARY', 'LARCENY', 'ASSAULT 3 & RELATED OFFENSES', 'MURDER & NON-NEGL. MANSLAUGHTER', 'RAPE', 'ROBBERY', 'PROSTITUTION & RELATED OFFENSES', 'DISORDERLY CONDUCT'])

# New df_complaint with the 10 focus crimes only
df_complaint = df_complaint[df_complaint['OFNS_DESC'].isin(focuscrimes)].copy()

# Parse the date columns; dates that cannot be parsed or that fall outside the range pandas can represent (e.g. 1028-06-21) become NaT
df_complaint['CMPLNT_FR_DT'] = pd.to_datetime(df_complaint['CMPLNT_FR_DT'], format='%m/%d/%Y', errors='coerce')
df_complaint['CMPLNT_TO_DT'] = pd.to_datetime(df_complaint['CMPLNT_TO_DT'], format='%m/%d/%Y', errors='coerce')

# Count the complaints whose start date is missing or could not be parsed
nan_count = df_complaint['CMPLNT_FR_DT'].isna().sum()

# Keep only the years 2019-2022 (rows with NaT dates are dropped by this filter)
df_complaint = df_complaint.loc[(df_complaint['CMPLNT_FR_DT'] >= '2019-01-01') & (df_complaint['CMPLNT_FR_DT'] < '2023-01-01')].copy()

# Add "Year" column to df_complaint
df_complaint['Year'] = df_complaint['CMPLNT_FR_DT'].dt.year

# Add "Month" column to df_complaint
df_complaint['Month'] = df_complaint['CMPLNT_FR_DT'].dt.month

# Add "Weekday" column to df_complaint (Monday = 0)
df_complaint['Weekday'] = df_complaint['CMPLNT_FR_DT'].dt.weekday

In [None]:
# Arrest data

# Import New York Arrest data (2006-2021) into a dataframe 
df = pd.read_csv('NYPD_Arrests_Data__Historic_.csv')

# Import New York Arrest data (2022) into a dataframe 
df1 = pd.read_csv('NYPD_Arrest_Data__Year_to_Date_.csv')

In [None]:
# Merge the df and df1
df_arrest = pd.concat([df,df1])

# Delete old dataframes
del df,df1

In [None]:
# Map the truncated "MURDER & NON-NEGL. MANSLAUGHTE" to the spelling used in the complaint dataset
murder_dict = {'MURDER & NON-NEGL. MANSLAUGHTE': 'MURDER & NON-NEGL. MANSLAUGHTER'}

# Borough codes and their full names (the codes are stored in ARREST_BORO, not in OFNS_DESC,
# so this mapping is not applied to the offense column here)
district_dict = {'B': 'Bronx', 'S': 'Staten Island', 'K':'Brooklyn', 'M':'Manhattan', 'Q':'Queens'}

# Correct the spelling of "MURDER & NON-NEGL. MANSLAUGHTER" and merge the "LARCENY" categories
df_arrest['OFNS_DESC'] = df_arrest['OFNS_DESC'].replace(murder_dict)
df_arrest['OFNS_DESC'] = df_arrest['OFNS_DESC'].replace(larceny_dict)

# New df_arrest with the 10 focuscrimes only
df_arrest = df_arrest[df_arrest['OFNS_DESC'].isin(focuscrimes)].copy()

# Parse the arrest date column
df_arrest['ARREST_DATE'] = pd.to_datetime(df_arrest['ARREST_DATE'], format='%m/%d/%Y')

# Keep only the years 2019-2022
df_arrest = df_arrest.loc[(df_arrest['ARREST_DATE'] >= '2019-01-01') & (df_arrest['ARREST_DATE'] < '2023-01-01')].copy()

# Add "Year" column to df_arrest
df_arrest['Year'] = df_arrest['ARREST_DATE'].dt.year

# Add "Month" column to df_arrest
df_arrest['Month'] = df_arrest['ARREST_DATE'].dt.month

# Add "Weekday" column to df_arrest (Monday = 0)
df_arrest['Weekday'] = df_arrest['ARREST_DATE'].dt.weekday

### 2.2 Write a short section that discusses the dataset stats, containing key points/plots from your exploratory data analysis.

#### Wrong dates
As mentioned earlier (2.1), we identified a significant number of incorrect dates in the complaint dataset. These incorrect dates were replaced with NaT (Not-a-Time) values during the cleaning process. It's important to note that this may have a significant impact on the yearly overview, as these complaints were excluded from the analysis. Consequently, the visualizations may show lower numbers than the actual crime levels.


#### Incomplete data for the Murder category in the boroughs from 2019 to 2021
Another issue we encountered is incomplete data for the complaints dataset, specifically regarding the borough information for the Murder category from 2019 to 2021. 

This missing information creates the impression that there were no murders during that period when examining the data from the perspective of the five boroughs. However, this conclusion is inaccurate due to the incomplete nature of the data.

### 2.2.1 Wrong dates

In [None]:
# Number of complaints whose start date is missing or could not be parsed
display(nan_count)

### 2.2.2 Incomplete data for the Murder category in the boroughs from 2019 to 2021

In [None]:
# Fill NaNs in the borough column with the value "(null)"
df_complaint = df_complaint.assign(BORO_NM=df_complaint['BORO_NM'].fillna("(null)"))

# Create a crosstab between boroughs and focus crimes
complaint_cross = pd.crosstab(df_complaint.BORO_NM, df_complaint.OFNS_DESC)

# Show the crosstab
display(complaint_cross)

In [None]:
# New dataframe with the murder complaints only (all boroughs)
Comp_murder = df_complaint[df_complaint['OFNS_DESC'] == "MURDER & NON-NEGL. MANSLAUGHTER"].copy()

# Create a crosstab for boroughs and year
murder_cross = pd.crosstab(Comp_murder.BORO_NM, Comp_murder.Year)

display(murder_cross)

## 3. Data Analysis

### 3.1 Describe your data analysis and explain what you've learned about the datasets

In our crime study, we focused on understanding the impact of COVID-19 on crime complaints and arrests in NYC from 2019 to 2022, with a specific focus on the ten most affected crime categories: sex crimes, dangerous drugs, burglary, larceny, assault, murder, rape, robbery, prostitution, and disorderly conduct.

Our data analysis involved various methods, including cross tables, groupings, and visualizations such as bar and line plots, to gain insights and uncover meaningful patterns.

### 3.1.1 The Data Analysis

The data analysis can be divided into five parts:

#### Yearly Overview:

We examined the yearly trends in crime complaints and arrests, analyzing both the actual numbers and the yearly
percentage changes. This allowed us to observe the overall development of crime during the pandemic.
  
#### Crime Levels in the 5 Boroughs:

We compared the number of crime complaints and arrests per 100,000 residents across the five boroughs for each 
year. This allowed us to observe how the crime level developed in each borough during the pandemic.

#### Breakdown of the 10 Most Affected Crimes:

We delved into the ten most affected crime categories and examined specific trends within prostitution, murder,
and burglary. These categories did not follow the overall crime trends, warranting further investigation.

#### Geographic Analysis:

We conducted a geographic analysis of prostitution, murder, and burglary in the five boroughs of NYC. This provided 
insights into the spatial distribution and potential localized patterns of these crimes.

#### Impact of COVID-19:

Finally, we assessed the current state of crime in NYC and determined whether COVID-19 had any significant impact on 
the crime level and trends in 2023.

  
### 3.1.2 What have we learned about the datasets? 
#### Incorrect dates:

We discovered a significant number of incorrect dates in the complaint dataset, which led us to exclude those 
entries by replacing the dates with NaT values. This issue could result in the reported crime levels appearing smaller 
than the actual numbers, especially if the incorrect dates consistently pertain to specific focus crimes or boroughs.
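
One way to check whether the missing dates cluster in particular focus crimes is to compute the share of NaT dates per category. The sketch below uses made-up rows, not the real complaint data, and the name `missing_share` is introduced here only for illustration. In the notebook this check would have to run before the 2019-2022 filter, since comparisons against NaT are false and that filter therefore drops those rows.

```python
import pandas as pd

# Toy complaint data (hypothetical values); None stands for a date that became NaT
toy = pd.DataFrame({
    "OFNS_DESC": ["BURGLARY", "BURGLARY", "ROBBERY", "ROBBERY", "ROBBERY", "RAPE"],
    "CMPLNT_FR_DT": pd.to_datetime(
        ["2019-03-01", None, "2020-07-15", "2021-01-09", None, "2022-05-30"]
    ),
})

# Share of missing start dates per offense category, in percent
missing_share = (
    toy["CMPLNT_FR_DT"].isna()
    .groupby(toy["OFNS_DESC"])
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)
print(missing_share.round(1))
```

The same grouping with `BORO_NM` in place of `OFNS_DESC` would show whether the missing dates cluster in particular boroughs.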

#### Missing information in the borough column:

The Murder category in the complaints dataset has a high number of missing values for the borough where the murders 
took place. This missing information limits our ability to analyze the Murder category accurately when examining the 
five boroughs separately.

#### Visibility:

In general, the New York Police Department (NYPD) records and publishes a large amount of information about crime 
levels, offenders, victims, etc., which gives the public a high degree of visibility. This level of transparency is 
less common in Denmark, where the police mostly communicate crime numbers to the public.

## 4. Genre - which genre of data story did you use?

We selected the Magazine Style genre for our crime study because it effectively tells our story and allows us to combine text with interactive data visualizations.

### 4.1 Visual Narrative - Which tools did you use from each of the 3 categories? Why?
When it comes to Visual Narrative, which refers to the visual devices that assist and facilitate the narrative (Segel & Heer, 2010), we employed different tools from the three categories: 1) Visual Structuring, 2) Highlighting, and 3) Transition Guidance.

#### Visual Structuring:
For visual structuring, we utilized two types of techniques in our visualizations:
- Consistent Visual Platform 
- Time Bar

To ensure consistency and ease of navigation for readers, all our visualizations feature an interactive Consistent Visual Platform. This means that while the content within each plot changes, the general layout and visual elements remain intact. Additionally, for our maps, we incorporated a timeline slider, allowing readers to track the progress and explore the visualization.

#### Highlighting: 
Regarding highlighting, we employed one method in our visualizations:
- Close-Ups

Most of our visualizations include a drop-down menu and a right panel that enable readers to select or deselect specific data. This approach encourages viewers to use the narrative as a starting point and then explore the data independently, generating their own close-ups and uncovering interesting insights within the crime data.

#### Transition Guidance:
In terms of transition guidance, we utilized one technique in our visualizations:
- Animated Transitions

Each visualization includes animated transitions for the titles, ensuring they are linked to the corresponding plot and shift appropriately when the plot changes. This helps maintain viewer orientation throughout the narrative.


### 4.2 Narrative Structure - Which tools did you use from each of the 3 categories? Why?
Regarding narrative structure, which encompasses the tactics used by each visualization or non-visual mechanisms that assist and facilitate the narrative (Segel & Heer, 2010), we employed tools from the three categories: 1) Ordering, 2) Interactivity, and 3) Messaging.

#### Ordering:
For ordering, we utilized two tools in our data story:
- Linear
- User Directed Path

We adopted an overall linear storytelling structure, where visualizations support the narrative. However, we also provided a User Directed Path, allowing readers to create their own data story by interacting with the interactive plots.

#### Interactivity: 
In terms of interactivity, we employed two tools in our data story:
- Filtering and Selection
- Navigation Buttons

We incorporated various forms of interactivity in our data story, allowing users to manipulate the plots. Most plots offer drop-down menus for selecting and filtering data, and the interactive right panel functions as navigation buttons, enabling users to select or unselect specific data categories.

#### Messaging: 
When it comes to messaging, we employed several tools in our data story:
- Introductory Text
- Captions / Headlines
- Accompanying Article
- Summary

To communicate effectively with readers, we employed various messaging methods, including introductory text at the beginning, headlines for each chapter, labels in all our plots, insights from accompanying articles, and a summary of our findings at the end. We also used colors to separate complaints from arrests in the visualizations.

## 5. Visualizations

### 5.1 Explain the visualizations you've chosen
We have chosen a combination of interactive maps, bar plots, and line plots to effectively present our data story. Each visualization serves a specific purpose in conveying the main findings and providing a comprehensive exploration of the crime data.

The interactive maps with a colorbar allow us to showcase the number of crime complaints and arrests per 100,000 residents in NYC's boroughs. By incorporating a timeline slider, readers can observe how the crime level has evolved in each borough over time. The use of colors facilitates easy identification of boroughs with the highest and lowest crime rates.

In addition to the maps, we have utilized interactive bar and line plots to highlight the number of crime complaints and arrests. These visualizations present the data in both total numbers and yearly percentage change. The bar plots effectively demonstrate the relationship between the two datasets, while the line plots allow for a simultaneous display of multiple focus crimes and their changes over time. The breaks in the lines make it easier to identify significant shifts or trends.

### 5.2 Why are they right for the story you want to tell?
The chosen visualizations are appropriate for the story we want to tell for several reasons. The interactive maps provide a geographical context, allowing readers to understand the distribution of crime across NYC's boroughs. The timeline slider enables them to explore temporal patterns and draw comparisons between different periods.

Bar plots are effective in illustrating the relationship between crime complaints and arrests. They provide a clear visual representation of the quantitative differences and allow readers to discern any discrepancies or correlations.

Line plots are well-suited for visualizing the changes in multiple focus crimes over time. By presenting the data as continuous lines, it becomes easier to identify variations and spot any significant shifts or patterns. The inclusion of multiple focus crimes in a single plot enables readers to make comparisons and draw insights about crime trends.

Overall, the chosen visualizations offer a comprehensive and interactive experience that enables readers to delve deeper into the crime data, understand the story being presented, and make informed interpretations.

##### Please note that the code for the interactive plots can be found below.

## Interactive plots

This section provides the opportunity to delve into the code underlying the interactive plots shown in our data story.

### 5.3 Yearly Overview

#### 5.3.1 Yearly overview for complaint and arrest as bar plots ('total number' + '% change')
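
The plotting cell in this subsection reads `complaint_year`, `complaint_year_pct`, `arrest_year` and `arrest_year_pct`, which are not defined in the cells shown in this notebook. The sketch below shows one way such series could be built, assuming they are yearly counts and year-over-year percentage changes. It uses a toy dataframe (`toy_complaint`, a hypothetical stand-in for the cleaned `df_complaint`), so it illustrates the calculation rather than reproducing the original code.

```python
import pandas as pd

# Toy stand-in for the cleaned df_complaint (hypothetical rows, one per complaint)
toy_complaint = pd.DataFrame({"Year": [2019] * 4 + [2020] * 3 + [2021] * 3 + [2022] * 6})

# Number of complaints per year, indexed by year
complaint_year = toy_complaint["Year"].value_counts().sort_index()

# Year-over-year change in percent; the first year has no predecessor and is NaN
complaint_year_pct = complaint_year.pct_change() * 100

print(complaint_year)
print(complaint_year_pct.round(2))
```

The arrest series would follow the same pattern with the `Year` column of `df_arrest`.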

In [None]:
# Initializing figure object
fig = go.Figure()

# Adding complaint total
fig.add_trace(
    go.Bar(
        y=list(complaint_year.values),
        x=list(complaint_year.index),
        name='Complaints',
        visible=True,
        orientation='v',
        marker=dict(color='#007f4b'), 
        hovertemplate='%{y}'
    )
)

# Adding complaint %
fig.add_trace(
    go.Bar(
        y=list(complaint_year_pct.values),
        x=list(complaint_year_pct.index),
        name='Complaints',
        visible=False,
        orientation='v',
        marker=dict(color='#007f4b'), 
        hovertemplate='%{y:.2f}%'
    )
)

# Adding arrest total
fig.add_trace(
    go.Bar(
        y=list(arrest_year.values),
        x=list(arrest_year.index),
        name='Arrests',
        visible=True,
        orientation='v',
        marker=dict(color='#0047ab'),  
        hovertemplate='%{y}'
    )
)

# Adding arrest %
fig.add_trace(
    go.Bar(
        y=list(arrest_year_pct.values),
        x=list(arrest_year_pct.index),
        name='Arrests',
        visible=False,
        orientation='v',
        marker=dict(color='#0047ab'), 
        hovertemplate='%{y:.2f}%'
    )
)

fig.update_layout(
    updatemenus=[
        dict(
            xanchor='left',
            x=0,
            yanchor='top',
            y=1.2,
            active=0,
            buttons=list([
                dict(label="Total number",
                     method="update",
                     args=[{"visible": [True, False, True, False]},
                           {"title": "<b>Yearly number of complaints/arrests, 2019-2022</b>",
                            "yaxis": {"title": "No. of complaints/arrests", "ticksuffix": " ", "title_font": {"size": 13}, "tickfont": {"size": 12}}}]
                    ),
                dict(label="% change",
                     method="update",
                     args=[{"visible": [False, True, False, True]},
                           {"title": "<b>Yearly change in %, 2019-2022</b>",
                            "yaxis": {"title": "Yearly change in %", "ticksuffix": "%", "title_font": {"size": 13}, "tickfont": {"size": 12}}, "yaxis2": {"overlaying": "y", "side": "right"}}]
                    ),
            ]),
        )
    ],
    title='<b>Yearly number of complaints/arrests, 2019-2022</b>',
    title_x=0.48,
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(t=50),
    xaxis=dict(tickmode='linear', tick0=2019, dtick=1),
    yaxis=dict(title='No. of complaints/arrests', ticksuffix=' ', title_font=dict(size=14, color='#444')),
    yaxis2=dict(title='Yearly change in %', ticksuffix='%', overlaying='y', side='right', title_font=dict(size=14, color='#444'), hoverformat=".2f"),
    hovermode='x unified'  # To show all hover data at once
)

fig.show()

pio.write_html(fig, file='Complaint_arrest_overview.html', auto_open=True)

#### 5.3.2 Yearly overview for complaint and arrest as line plots ('share in %')
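
The plotting cell in this subsection reads `complaint_focuscrime_share` and `arrest_focuscrime_share`, which are not defined in the cells shown in this notebook. A minimal sketch of how such a table could be built, assuming it holds each focus crime's share of that year's total in percent, with years as rows and focus crimes as columns. The toy rows are made up, and this is an assumption about the calculation rather than the original code.

```python
import pandas as pd

# Toy stand-in for the cleaned df_complaint (hypothetical rows, one per complaint)
toy = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019, 2020, 2020],
    "OFNS_DESC": ["LARCENY", "LARCENY", "LARCENY", "ROBBERY", "LARCENY", "ROBBERY"],
})

# Rows are years, columns are focus crimes;
# normalize="index" scales each row to sum to 1, so multiplying by 100 gives percent
complaint_focuscrime_share = pd.crosstab(toy["Year"], toy["OFNS_DESC"], normalize="index") * 100

print(complaint_focuscrime_share)
```

Because each row sums to 100, a rising line for one focus crime means its share of the yearly total grew, not necessarily that its absolute count did.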

In [None]:
# Initializing figure object
fig = go.Figure()

# Define a color map for the 10 focus crimes
color_map = {'ASSAULT 3 & RELATED OFFENSES': '#1f77b4', 'BURGLARY': '#17becf', 'DANGEROUS DRUGS': '#ff7f0e',
             'DISORDERLY CONDUCT': '#2ca02c', 'LARCENY': '#7f7f7f', 'MURDER & NON-NEGL. MANSLAUGHTER': '#9467bd',
             'PROSTITUTION & RELATED OFFENSES': '#8c564b', 'RAPE': '#e377c2', 'ROBBERY': '#d62728',
             'SEX CRIMES': '#bcbd22'}

# Adding traces for complaints and arrests for each focus crime
for fc in focuscrimes:
    fig.add_trace(
        go.Scatter(
            x=complaint_focuscrime_share.index,
            y=complaint_focuscrime_share[fc],
            name=f'{fc}',
            mode='lines+markers',
            hovertemplate='%{y:.2f}%',
            line=dict(color=color_map[fc]),  # Updated line color definition
            visible=True
        )
    )

    fig.add_trace(
        go.Scatter(
            x=arrest_focuscrime_share.index,
            y=arrest_focuscrime_share[fc],
            name=f'{fc}',
            mode='lines+markers',
            hovertemplate='%{y:.2f}%',
            line=dict(color=color_map[fc]),  # Updated line color definition
            visible=False
        )
    )

# Adding dropdown for selecting complaints or arrests
fig.update_layout(
    updatemenus=[
        dict(
            direction="down",
            showactive=True,
            xanchor='left',
            yanchor='top',
            x=0,
            y=1.2,
            active=0,
            buttons=list([
                dict(
                    label="Complaints",
                    method="update",
                    args=[{"visible": [True, False] * len(focuscrimes)}] +
                         [{"title": "<b>Yearly share in % for complaints for each focus crime, 2019-2022</b>",
                           "ticksuffix": "%", "title_font": {"size": 12}, "tickfont": {"size": 12}}]
                ),

                dict(
                    label="Arrests",
                    method="update",
                    args=[{"visible": [False, True] * len(focuscrimes)}] +
                         [{"title": "<b>Yearly share in % for arrests for each focus crime, 2019-2022</b>",
                           "ticksuffix": "%", "title_font": {"size": 12}, "tickfont": {"size": 12}}]
                ),
            ]),
        ),
    ],
)

# Updating layout
fig.update_layout(
    title={
        'text': "<b>Yearly share in % of complaints for each focus crime</b>",
        'x': 0.52,  # Centered title
        'xanchor': 'center',
    },
    xaxis_title="Year",
    yaxis_title="Share (%)",
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(l=50, r=50, t=80, b=50),
    xaxis=dict(tickmode='array', tickvals=[0,1,2,3], ticktext=['2019', '2020', '2021', '2022']),
    yaxis=dict(title='Yearly share in %', ticksuffix='%', title_font=dict(size=14, color='#444')),
    yaxis2=dict(title='Yearly change in %', ticksuffix='%', overlaying='y', side='right', title_font=dict(size=14, color='#444'),hoverformat=".0f"),
    hovermode="x",
)

fig.show()

pio.write_html(fig, file='Complaint_arrest_focus_share.html', auto_open=True)

### 5.4 Crime Levels in the 5 Boroughs - Choropleth Maps

In [None]:
#Preparing the geometries for Borough Boundaries
Boro = gpd.read_file('C:/Users/Emili/Desktop/Borough Boundaries.geojson')

#The index of the json has to be the borough name
Boro.index = Boro['boro_name']

#Choropleth mapbox accepts a json for the geometries of borough
Boro_json = json.loads(Boro.to_json())

#### 5.4.1 Complaints per 100,000 residents

In [None]:
# Group the complaints data by Borough Boundaries and year and count the number of complaints
df_compmap = df_complaint.groupby(['BORO_NM', 'Year']).size().reset_index(name='Number of complaints')

# replace the string '(null)' with NaN values
df_compmap = df_compmap.replace('(null)', pd.NA)

#reset index and dropping NA values 
df_compmap = df_compmap.reset_index().rename(columns={'BORO_NM': 'Borough'}).dropna()

# replace boroughs with capitalized letters so that it matches with the Borough Boundaries
df_compmap['Borough']  = df_compmap['Borough'].str.capitalize().str.replace('island', 'Island')

In [None]:
# Define a dictionary with the population for each year and borough
population = {'Bronx': {2019: 1418207, 2020: 1461125, 2021: 1421089, 2022: 1379946},
              'Brooklyn': {2019: 2559903, 2020: 2719044, 2021: 2637486, 2022: 2590516},
              'Manhattan': {2019: 1628706, 2020: 1677306, 2021: 1578801, 2022: 1596273},
              'Queens': {2019: 2253858, 2020: 2388586, 2021: 2328141, 2022: 2278029},
              'Staten Island': {2019: 476143, 2020: 494586, 2021: 493484, 2022: 449133}}

# Sources:
#https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/population-estimates/current-population-estimates-2022.pdf?r=a
#https://www.nyc.gov/assets/planning/download/pdf/planning-level/nyc-population/population-estimates/current-population-estimates-2019.pdf

# Define a function to calculate complaints per 100,000 residents
def per_capita_complaints(row):
    borough = row['Borough']
    year = row['Year']
    complaints = row['Number of complaints']
    
    # Calculate number of complaints per 100,000 residents
    return complaints / population[borough][year] * 100000

# Use the function to calculate complaints per 100,000 residents for each row
df_compmap['Complaints per 100k'] = df_compmap.apply(per_capita_complaints, axis=1).apply(round).apply(int)

In [None]:
#using plotly for an animated choropleth map
fig = px.choropleth_mapbox(data_frame=df_compmap,
                           geojson=Boro_json,
                           locations=df_compmap.Borough,
                           color='Complaints per 100k',
                           center={"lat": 40.7250, "lon": -73.9851},
                           mapbox_style='carto-positron',
                           zoom=9,
                           color_continuous_scale='blues',
                           range_color=(1000, 5500),
                           animation_frame='Year',
                           width=800,
                           height=600)

# Start the slider at the first frame (2019); the slider has one step per year, so 2019 is index 0
fig.layout.sliders[0].active = 0


fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

# Display the plot
fig.show()

# Write to HTML file with animation initially paused
pio.write_html(fig, file='Complaints_map_per_100.html', auto_open=True, auto_play=False)

#### 5.4.2 Arrests per 100,000 residents

In [None]:
# Group the arrests data by borough and year and count the number of arrests
df_arrmap = df_arrest.groupby(['ARREST_BORO', 'Year']).size().reset_index(name='Number of arrests')

# Replace the ARREST_BORO codes with full borough names
# (assigning the result back avoids relying on an in-place replace on a column selection)
df_arrmap['ARREST_BORO'] = df_arrmap['ARREST_BORO'].replace({"B": "Bronx", "S": "Staten Island", "M": "Manhattan",
                                                             "K": "Brooklyn", "Q": "Queens"})

In [None]:
# Define a dictionary with the population for each year and borough
population = {'Bronx': {2019: 1418207, 2020: 1461125, 2021: 1421089, 2022: 1379946},
              'Brooklyn': {2019: 2559903, 2020: 2719044, 2021: 2637486, 2022: 2590516},
              'Manhattan': {2019: 1628706, 2020: 1677306, 2021: 1578801, 2022: 1596273},
              'Queens': {2019: 2253858, 2020: 2388586, 2021: 2328141, 2022: 2278029},
              'Staten Island': {2019: 476143, 2020: 494586, 2021: 493484, 2022: 449133}}

# Define a function to calculate arrests per 100,000 residents
def per_capita_arrests(row):
    borough = row['ARREST_BORO']
    year = row['Year']
    arrests = row['Number of arrests']
    
    # Calculate number of arrests per 100,000 residents
    return arrests / population[borough][year] * 100000

# Use the function to calculate arrests per 100,000 residents for each row
df_arrmap['Arrests per 100k'] = df_arrmap.apply(per_capita_arrests, axis=1).apply(round).apply(int)

In [None]:
#using plotly for an animated choropleth map
fig2 = px.choropleth_mapbox(data_frame=df_arrmap,
                           geojson=Boro_json,
                           locations=df_arrmap.ARREST_BORO,
                           color='Arrests per 100k',
                           center={"lat": 40.7250, "lon": -73.9851},
                           mapbox_style='carto-positron',
                           zoom=9,
                           color_continuous_scale='greens',
                           range_color=(500, 2000),
                           animation_frame='Year',
                           width=800,
                           height=600)

# Start the slider at the first frame (2019); the slider has one step per year, so 2019 is index 0
fig2.layout.sliders[0].active = 0


fig2.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

# Display the plot
fig2.show()

#Write to html file
pio.write_html(fig2, file='Arrest_map_per_100.html', auto_open=True, auto_play=False)

### 5.5 Breakdown of the 10 Most Affected Crimes - Bar plots

#### 5.5.1 Yearly number of complaints and arrests for each focus crime ('Total number')

In [None]:
# Initializing figure object
fig = go.Figure()

# Define a color map for the 10 focus crimes
color_map = {'ASSAULT 3 & RELATED OFFENSES': '#1f77b4','BURGLARY': '#17becf','DANGEROUS DRUGS': '#ff7f0e', 'DISORDERLY CONDUCT': '#2ca02c',
             'LARCENY': '#7f7f7f', 'MURDER & NON-NEGL. MANSLAUGHTER': '#9467bd',
             'PROSTITUTION & RELATED OFFENSES': '#8c564b', 'RAPE': '#e377c2',
             'ROBBERY': '#d62728', 'SEX CRIMES': '#bcbd22'}

# Pivot data to create separate sets of data for each focus crime
df_complaint_pivot = pd.pivot_table(df_complaint, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')
df_arrest_pivot = pd.pivot_table(df_arrest, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')

# Adding data traces for each focus crime
for i, focuscrime in enumerate(df_complaint_pivot.columns):
    visible = [False] * (2 * len(df_complaint_pivot.columns))
    if i == 0:
        visible[0] = False
        visible[1] = True
    fig.add_trace(
        go.Bar(
            x=df_complaint_pivot.index,
            y=df_complaint_pivot[focuscrime],
            name=f"Complaints: {focuscrime}",
            hovertemplate='%{y}',
            marker=dict(color=color_map[focuscrime]),
            visible=visible[0]
        )
    )
    
    fig.add_trace(
        go.Bar(
            x=df_arrest_pivot.index,
            y=df_arrest_pivot[focuscrime],
            name=f"Arrests: {focuscrime}",
            marker=dict(color='#0047ab'),
            hovertemplate='%{y}',
            visible=visible[1]
        )
    )

# Make all data visible by default
fig.update_traces(visible=True)

# Create dropdown menu
n_traces = 2 * len(df_complaint_pivot.columns)
title_text = "<b>Yearly number of complaints/arrests, 2019-2022</b>"
dropdown_menu = []
for focuscrime in ['Select a focuscrime'] + list(df_complaint_pivot.columns):
    if focuscrime == 'Select a focuscrime':
        # Placeholder entry: restore the default view with every trace visible
        visible = [True] * n_traces
    else:
        # Show only the complaint/arrest pair belonging to the selected focus crime
        visible = [False] * n_traces
        idx = df_complaint_pivot.columns.get_loc(focuscrime)
        visible[2 * idx] = True
        visible[2 * idx + 1] = True

    dropdown_menu.append(
        dict(
            label=focuscrime,
            method='update',
            args=[
                {'visible': visible},
                {'title': title_text}
            ]
        )
    )

# Add dropdown menu to layout
fig.update_layout(
    title={
        'text': "<b>Yearly number of complaints/arrests, 2019-2022</b>",
        'x': 0.65,  # Horizontal title position as a fraction of the figure width (0.5 is the center)
        'xanchor': 'center',
    },
    updatemenus=[dict(
        buttons=dropdown_menu,
        direction='down',
        pad={'r': 10, 't': 10},
        showactive=True,
        x=0.0,
        xanchor='left',
        y=1.2,
        yanchor='top'
    )],
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(t=50),
    xaxis=dict(tickmode='linear', tick0=2019, dtick=1),
    yaxis=dict(title='No. of complaints/arrests', ticksuffix=' ', title_font=dict(size=14, color='#444')),
    hovermode='x unified'  # To show all hover data at once
)


fig.show()

pio.write_html(fig, file='Complaint_arrest_focus_total.html', auto_open=True)

#### 5.3.2 Yearly number of complaints and arrests for each focus crime ('% change')
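
The percentage figures in this section are year-over-year changes. As a plain-Python illustration of what `pct_change().mul(100)` computes in the cell below, here is a small sketch. The function name and the yearly counts are made up for this example and do not come from the datasets.

```python
def pct_change(counts):
    """Return the year-over-year change in percent for a {year: count} mapping.

    The first year has no predecessor and is therefore left out,
    which corresponds to dropping the 2019 row in the notebook cell.
    """
    years = sorted(counts)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        changes[curr] = (counts[curr] - counts[prev]) * 100 / counts[prev]
    return changes


# Made-up yearly counts, for illustration only
example_counts = {2019: 200, 2020: 150, 2021: 180, 2022: 198}
example_changes = pct_change(example_counts)
print(example_changes)
# → {2020: -25.0, 2021: 20.0, 2022: 10.0}
```

Each value is relative to the preceding year, not to 2019, so a drop followed by an equal-sized percentage rise does not bring the count back to its starting level.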

In [None]:
# Initializing figure object
fig = go.Figure()

# Define a color map for the 10 focus crimes
color_map = {'ASSAULT 3 & RELATED OFFENSES': '#1f77b4','BURGLARY': '#17becf','DANGEROUS DRUGS': '#ff7f0e', 'DISORDERLY CONDUCT': '#2ca02c',
             'LARCENY': '#7f7f7f', 'MURDER & NON-NEGL. MANSLAUGHTER': '#9467bd',
             'PROSTITUTION & RELATED OFFENSES': '#8c564b', 'RAPE': '#e377c2',
             'ROBBERY': '#d62728', 'SEX CRIMES': '#bcbd22'}

# Adding data traces for each focus crime.
# Traces are added in pairs, so focus crime i owns trace positions 2*i and 2*i + 1.
# All traces are visible by default.
for focuscrime in df_complaint_pivot.columns:
    # Year-over-year change in percent
    df_complaint_pivot_pct = df_complaint_pivot[focuscrime].pct_change().mul(100)
    df_arrest_pivot_pct = df_arrest_pivot[focuscrime].pct_change().mul(100)

    # Drop 2019: it is the first year, so it has no previous year to compare with (NaN)
    df_complaint_pivot_pct = df_complaint_pivot_pct.drop(2019)
    df_arrest_pivot_pct = df_arrest_pivot_pct.drop(2019)

    fig.add_trace(
        go.Bar(
            # Use the index of the percentage series so that x and y cover the same years (2020-2022)
            x=df_complaint_pivot_pct.index,
            y=df_complaint_pivot_pct,
            name=f"Complaints: {focuscrime}",
            marker=dict(color=color_map[focuscrime]),
            hovertemplate='%{y:.1f}%',
            visible=True
        )
    )

    fig.add_trace(
        go.Bar(
            x=df_arrest_pivot_pct.index,
            y=df_arrest_pivot_pct,
            name=f"Arrests: {focuscrime}",
            marker=dict(color='#0047ab'),
            hovertemplate='%{y:.1f}%',
            visible=True
        )
    )

# Create dropdown menu
n_traces = 2 * len(df_complaint_pivot.columns)
title_text = "<b>Yearly change in % for complaints/arrests, 2019-2022</b>"
dropdown_menu = []
for focuscrime in ['Select a focuscrime'] + list(df_complaint_pivot.columns):
    if focuscrime == 'Select a focuscrime':
        # Placeholder entry: restore the default view with every trace visible
        visible = [True] * n_traces
    else:
        # Show only the complaint/arrest pair belonging to the selected focus crime
        visible = [False] * n_traces
        idx = df_complaint_pivot.columns.get_loc(focuscrime)
        visible[2 * idx] = True
        visible[2 * idx + 1] = True

    dropdown_menu.append(
        dict(
            label=focuscrime,
            method='update',
            args=[
                {'visible': visible},
                {'title': title_text}
            ]
        )
    )

# Add dropdown menu to layout
fig.update_layout(
    title={
        'text': title_text,
        'x': 0.66,  # Horizontal title position as a fraction of the figure width (0.5 is the center)
        'xanchor': 'center',
    },
    updatemenus=[dict(
        buttons=dropdown_menu,
        direction='down',
        pad={'r': 10, 't': 10},
        showactive=True,
        x=0.0,
        xanchor='left',
        y=1.15,
        yanchor='top'
    )],
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(t=50),
    # The bars now sit on their actual years, so no tick relabeling is needed
    xaxis=dict(tickmode='linear', tick0=2020, dtick=1),
    yaxis=dict(title='Yearly change in %', ticksuffix='%'),
    hovermode='x unified'  # To show all hover data at once
)

fig.show()

pio.write_html(fig, file='Complaint_arrest_focus_%.html', auto_open=True)

### 5.4 Geographic Analysis - Bar plots

#### 5.4.1 Boroughs and focuscrimes - complaint

In [None]:
#New complaint dataframe for each borough
Comp_Bronx2= df_complaint[df_complaint['BORO_NM'] == "BRONX"].copy()
Comp_Brooklyn2= df_complaint[df_complaint['BORO_NM'] == "BROOKLYN"].copy()
Comp_Manhattan2= df_complaint[df_complaint['BORO_NM'] == "MANHATTAN"].copy()
Comp_Queens2= df_complaint[df_complaint['BORO_NM'] == "QUEENS"].copy()
Comp_Staten2 = df_complaint[df_complaint['BORO_NM'] == "STATEN ISLAND"].copy()

#New pivot for each borough
df_bronx_pivot = pd.pivot_table(Comp_Bronx2, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')
df_brooklyn_pivot = pd.pivot_table(Comp_Brooklyn2, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')
df_manhattan_pivot = pd.pivot_table(Comp_Manhattan2, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')
df_queens_pivot = pd.pivot_table(Comp_Queens2, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')
df_staten_pivot = pd.pivot_table(Comp_Staten2, values='CMPLNT_NUM', index='Year', columns='OFNS_DESC', aggfunc='count')

In [None]:
fig = go.Figure()   

# Borough pivots paired with their legend label and bar color
borough_pivots = [
    ('Bronx', df_bronx_pivot, '#0047ab'),
    ('Brooklyn', df_brooklyn_pivot, '#91c4e6'),
    ('Manhattan', df_manhattan_pivot, '#f8b17d'),
    ('Queens', df_queens_pivot, '#ffd571'),
    ('Staten Island', df_staten_pivot, '#a3d9b1'),
]

# Adding one trace per borough for each focus crime.
# Traces are added in groups of five, so focus crime i owns trace positions 5*i to 5*i + 4;
# the dropdown menu below relies on this ordering. All traces are visible by default.
for focuscrime in df_bronx_pivot.columns:
    for borough, pivot, color in borough_pivots:
        fig.add_trace(
            go.Bar(
                x=pivot.index,
                y=pivot[focuscrime],
                name=f"{borough}: {focuscrime}",
                marker=dict(color=color),
                hovertemplate='%{y}',
                visible=True
            )
        )

# Create dropdown menu
n_traces = 5 * len(df_bronx_pivot.columns)
title_text = "<b>Yearly number of complaints in the five boroughs</b>"
dropdown_menu = []
for focuscrime in ['Select a focuscrime'] + list(df_bronx_pivot.columns):
    if focuscrime == 'Select a focuscrime':
        # Placeholder entry: restore the default view with every trace visible
        visible = [True] * n_traces
    else:
        # Show only the five borough traces belonging to the selected focus crime
        visible = [False] * n_traces
        idx = df_bronx_pivot.columns.get_loc(focuscrime)
        visible[5 * idx:5 * idx + 5] = [True] * 5
    dropdown_menu.append(
        dict(
            label=focuscrime,
            method='update',
            args=[
                {'visible': visible},
                {'title': title_text}
            ]
        )
    )

# Add dropdown menu to layout
fig.update_layout(
    title={
        'text': "<b>Yearly number of complaints in the five boroughs</b>",
        'x': 0.65,  # Horizontal title position as a fraction of the figure width (0.5 is the center)
        'xanchor': 'center',
    },
    updatemenus=[dict(
        buttons=dropdown_menu,
        direction='down',
        pad={'r': 10, 't': 10},
        showactive=True,
        x=0.0,
        xanchor='left',
        y=1.2,
        yanchor='top'
    )],
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(t=50),
    xaxis=dict(tickmode='linear', tick0=2019, dtick=1),
    yaxis=dict(title='No. of complaints', ticksuffix=' ', title_font=dict(size=14, color='#444')),
    hovermode='x unified'  # To show all hover data at once
)


fig.show()

#Write to html file
pio.write_html(fig, file='Complaint_Borough.html', auto_open=True)

#### 5.4.2 Boroughs and focuscrimes - arrest
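
As the filters in this notebook show, the arrest data encodes boroughs as single letters (`ARREST_BORO`), whereas the complaint data uses full names (`BORO_NM`). A small sketch of a lookup that puts both on the same borough key is given here; the names `ARREST_BORO_TO_NAME` and `borough_name` are hypothetical helpers introduced for illustration.

```python
# Letter codes used in ARREST_BORO mapped to the names used in BORO_NM
ARREST_BORO_TO_NAME = {
    "B": "BRONX",
    "K": "BROOKLYN",
    "M": "MANHATTAN",
    "Q": "QUEENS",
    "S": "STATEN ISLAND",
}


def borough_name(code):
    """Translate an arrest borough code to its full name, or None if the code is unknown."""
    return ARREST_BORO_TO_NAME.get(code.strip().upper())


print(borough_name("K"))
# → BROOKLYN
```

In pandas, the same dictionary could be applied to a whole column with `Series.map`, which would allow a single grouped pivot instead of five separate filters.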

In [None]:
#New arrest dataframe for each borough
Arrest_Bronx= df_arrest[df_arrest['ARREST_BORO'] == "B"].copy()
Arrest_Brooklyn= df_arrest[df_arrest['ARREST_BORO'] == "K"].copy()
Arrest_Manhattan= df_arrest[df_arrest['ARREST_BORO'] == "M"].copy()
Arrest_Queens= df_arrest[df_arrest['ARREST_BORO'] == "Q"].copy()
Arrest_Staten = df_arrest[df_arrest['ARREST_BORO'] == "S"].copy()

#New pivot for each borough
arrest_bronx_pivot = pd.pivot_table(Arrest_Bronx, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')
arrest_brooklyn_pivot = pd.pivot_table(Arrest_Brooklyn, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')
arrest_manhattan_pivot = pd.pivot_table(Arrest_Manhattan, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')
arrest_queens_pivot = pd.pivot_table(Arrest_Queens, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')
arrest_staten_pivot = pd.pivot_table(Arrest_Staten, values='ARREST_KEY', index='Year', columns='OFNS_DESC', aggfunc='count')

In [None]:
fig = go.Figure()   

# Borough pivots paired with their legend label and bar color
borough_pivots = [
    ('Bronx', arrest_bronx_pivot, '#0047ab'),
    ('Brooklyn', arrest_brooklyn_pivot, '#91c4e6'),
    ('Manhattan', arrest_manhattan_pivot, '#f8b17d'),
    ('Queens', arrest_queens_pivot, '#ffd571'),
    ('Staten Island', arrest_staten_pivot, '#a3d9b1'),
]

# Adding one trace per borough for each focus crime.
# Traces are added in groups of five, so focus crime i owns trace positions 5*i to 5*i + 4;
# the dropdown menu below relies on this ordering. All traces are visible by default.
for focuscrime in arrest_bronx_pivot.columns:
    for borough, pivot, color in borough_pivots:
        fig.add_trace(
            go.Bar(
                x=pivot.index,
                y=pivot[focuscrime],
                name=f"{borough}: {focuscrime}",
                marker=dict(color=color),
                hovertemplate='%{y}',
                visible=True
            )
        )

# Create dropdown menu
n_traces = 5 * len(arrest_bronx_pivot.columns)
title_text = "<b>Yearly number of arrests in the five boroughs</b>"
dropdown_menu = []
for focuscrime in ['Select a focuscrime'] + list(arrest_bronx_pivot.columns):
    if focuscrime == 'Select a focuscrime':
        # Placeholder entry: restore the default view with every trace visible
        visible = [True] * n_traces
    else:
        # Show only the five borough traces belonging to the selected focus crime
        visible = [False] * n_traces
        idx = arrest_bronx_pivot.columns.get_loc(focuscrime)
        visible[5 * idx:5 * idx + 5] = [True] * 5
    dropdown_menu.append(
        dict(
            label=focuscrime,
            method='update',
            args=[
                {'visible': visible},
                {'title': title_text}
            ]
        )
    )

# Add dropdown menu to layout
fig.update_layout(
    title={
        'text': "<b>Yearly number of arrests in the five boroughs</b>",
        'x': 0.65,  # Horizontal title position as a fraction of the figure width (0.5 is the center)
        'xanchor': 'center',
    },
    updatemenus=[dict(
        buttons=dropdown_menu,
        direction='down',
        pad={'r': 10, 't': 10},
        showactive=True,
        x=0.0,
        xanchor='left',
        y=1.2,
        yanchor='top'
    )],
    legend=dict(title="<b>Click to select/deselect:</b>", font=dict(size=11)),
    margin=dict(t=50),
    xaxis=dict(tickmode='linear', tick0=2019, dtick=1),
    yaxis=dict(title='No. of arrests', ticksuffix=' ', title_font=dict(size=14, color='#444')),
    hovermode='x unified'  # To show all hover data at once
)


fig.show()

#Write to html file
pio.write_html(fig, file='Arrest_borough.html', auto_open=True)

## 6. Discussion - think critically about your creation

### 6.1 What went well?

#### Informative interactive visualizations:

We created informative and interactive visualizations that allowed for exploration and valuable insights from the 
datasets. Each plot incorporated interactive features such as drop-down menus, interactive right panels, and 
timeline sliders for a comprehensive analysis of the crime data. The interactive nature of the visualizations 
enabled viewers to engage with the data and conduct their own exploration.

#### Deep dive focus on three specific crimes:

By focusing on three specific crimes (Prostitution, Murder, and Burglary), we provided a deeper understanding of 
their development and the external factors that may have influenced them. This narrower scope helped keep the 
data story coherent.


### 6.2 What is still missing? What could be improved? Why?

#### Analysis is at an overview level:

The analysis stays at a high-level overview and could have delved deeper into the factors influencing the observed 
trends. More in-depth investigation and explanation of the underlying reasons for the trends would have added 
depth to the analysis.

#### Limited pre-COVID-19 data:

Since the analysis focused only on the years 2019-2022, there is limited insight into the trends before the COVID-19 
pandemic. Understanding the baseline crime levels and comparing them to the pandemic years would provide a more 
comprehensive analysis.
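
One way to make such a baseline comparison concrete is sketched below: average the yearly counts over a set of pre-pandemic years and express each pandemic year as a percentage deviation from that average. The function name, the choice of baseline years, and all counts are assumptions made up for illustration; they are not results from the datasets.

```python
from statistics import mean


def deviation_from_baseline(yearly_counts, baseline_years, target_years):
    """Return {year: deviation in percent} relative to the mean count of the baseline years."""
    baseline = mean(yearly_counts[year] for year in baseline_years)
    return {
        year: (yearly_counts[year] - baseline) * 100 / baseline
        for year in target_years
    }


# Made-up yearly counts, for illustration only
example_yearly = {2015: 100, 2016: 110, 2017: 90, 2018: 100, 2019: 100,
                  2020: 80, 2021: 90, 2022: 105}
example_deviation = deviation_from_baseline(
    example_yearly,
    baseline_years=range(2015, 2020),  # assumed pre-pandemic window
    target_years=[2020, 2021, 2022],
)
print(example_deviation)
```

A multi-year baseline is less sensitive to a single unusual year than a comparison against 2019 alone, at the cost of blending in longer-term trends.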

#### Focus only on increasing crimes in 2020:

The analysis only considered the crimes that experienced an increase in complaints from 2019 to 2020. Exploring the 
reasons behind the decreases in the other seven focus crimes during the same period would provide a more balanced 
understanding of the overall crime landscape.
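
A possible starting point for that extension is sketched here: split the crime categories by the sign of their 2019 to 2020 change so that the decreasing group can be examined alongside the increasing one. The function name and the counts are hypothetical and use placeholder category names.

```python
def split_by_direction(counts_2019, counts_2020):
    """Split categories into those that increased and those that did not.

    Returns two {category: change in percent} dictionaries.
    Categories with no change end up in the second dictionary.
    """
    increased, not_increased = {}, {}
    for crime, before in counts_2019.items():
        change = (counts_2020[crime] - before) * 100 / before
        if change > 0:
            increased[crime] = change
        else:
            not_increased[crime] = change
    return increased, not_increased


# Made-up counts with placeholder category names, for illustration only
example_2019 = {"CRIME A": 100, "CRIME B": 200, "CRIME C": 50}
example_2020 = {"CRIME A": 120, "CRIME B": 150, "CRIME C": 50}
example_up, example_not_up = split_by_direction(example_2019, example_2020)
print(example_up)
print(example_not_up)
```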
  
#### No data for 2023:

The analysis does not include data for 2023, relying instead on insights from articles and media sources. Including 
updated data for the current year would enhance the accuracy and relevance of the analysis.

#### Limited availability of articles and information about COVID-19's impact on crime:

Finding valuable articles and up-to-date information specifically focused on the influence of COVID-19 on crime 
levels in New York City has been challenging. It appears that many articles on this topic may have been removed.

## 7. Contributions - who did what?

### 7.1 Data Collection and Preparation:
We jointly collected the datasets from the New York Police Department (NYPD) containing crime complaints and arrests in NYC from 2006 to 2022 and performed the data cleaning and preprocessing tasks. The data cleaning included merging the datasets, filtering data for the desired time period (2019-2022), merging similar crime categories, addressing spelling inconsistencies, and handling incorrect dates.

### 7.2 Data Analysis:
The data analysis phase was a collaborative effort. Cross tables, groupings, and statistical analysis techniques were applied to explore and extract insights from the datasets. Visualization methods such as bar plots and charts were used to present the data distribution, identify patterns, and decide which story to tell the reader.

### 7.3 Overview and Trends:
Each team member took the lead on different parts of the data analysis, examining how COVID-19 has impacted the crime levels, including crime complaints and arrests, in New York City (NYC).


##### Mie: 

- Yearly Overview and Trends:

  Analyzed the yearly overview and development of crime complaints and arrests. Examined the actual numbers and 
  calculated the percentage changes to understand the annual trends and the overall development of crime 
  during the pandemic. Also authored the visualizations for this part.
  
  
- Analysis of Specific Crime Categories:

  Conducted an in-depth analysis of the ten most affected crime categories, with a specific focus on trends within 
  prostitution, murder, and burglary, which were observed to deviate from the overall crime trends. Also authored 
  the visualizations for this part.


##### Emilie:

- Borough-level Analysis:

  Analyzed the crime levels in the five boroughs of NYC collectively, then compared the borough-specific data with 
  the overall crime trends to determine whether there were any divergences. Also authored the visualizations for 
  this part.

- Geographic Analysis:

  Performed the geographic analysis of prostitution, murder, and burglary across the five boroughs of NYC and 
  investigated the spatial distribution and localized patterns of these crimes. Also authored the visualizations 
  for this part.


##### Team:

- Impact of COVID-19:

  Finally, we assessed the current state of crime in NYC and determined whether COVID-19 had any significant impact on 
  the crime level and trends in 2023.

### 7.4 Overall Findings and Conclusions:
The team collaborated to draw key insights and conclusions from the data analysis. We identified the impact of COVID-19 on crime levels and trends in NYC based on the study findings.

### 7.5 GitHub site
Emilie is the primary author of the GitHub site.

### 7.6 Explainer Notebook
Mie is the primary author of the Explainer Notebook.