## Motivation

- What is your dataset?
- Why did you choose this/these particular dataset(s)?
- What was your goal for the end user's experience?

This dataset provides a detailed look at shooting incidents in New York from 2006 to 2024, revealing trends and details about these events. It is an important tool for understanding city crime and safety over almost twenty years.

#### Dataset Attributes
 - __Incident Date:__ Captures the exact date each shooting occurred, providing a timeline for analysis.
 - __Borough:__ Indicates the specific borough within New York where the incident took place, crucial for geographic analysis.
 - __Incident Time:__ The time at which the incident occurred, valuable for identifying patterns related to time of day or night.
 - __Victim Count:__ Records the number of individuals impacted in each incident, which helps in understanding the severity.
 - __Incident Type:__ Describes the nature of the incident, such as robbery or altercation, offering insights into the causes of shootings.
 - __Perpetrator and Victim Demographics:__ Includes age, gender, and race of those involved, allowing for demographic analysis.
 - __Location Coordinates:__ Latitude and longitude for mapping and spatial analysis.


#### __Why This Dataset?__

- __Relevance:__ Gun violence remains a critical issue in urban centers, particularly in major cities like New York. By analyzing this dataset, it is possible to identify patterns, trends, and hotspots that can help policymakers and communities understand the scale and nature of the issue.
- __Rich Data:__ The dataset is comprehensive and contains demographic and geographic information that can uncover crucial insights. These attributes can help reveal correlations between the nature of incidents and socioeconomic or demographic characteristics.
- __Potential for Actionable Insights:__ Understanding patterns and trends can guide resource allocation for policing, identify vulnerable areas or groups, and contribute to policy formulation.

#### __Goal for the End User Experience:__

__The aim is to deliver a clear and data-driven story that empowers stakeholders:__

- __Visualize Patterns:__ Interactive visualizations will help users intuitively explore the dataset. They can understand when and where incidents are more frequent and identify recurring patterns.
- __Geographical Understanding:__ Maps showing shooting incident locations will help understand spatial distribution, clustering, and high-risk zones.
- __Demographic Insights:__ Highlighting data by victim demographics provides insights into the most affected groups.

# __Basic Stats and Data Cleaning__
 - Missing Values Analysis:
    - Columns with High Missing Values: LOC_CLASSFCTN_DESC, LOCATION_DESC, LOC_OF_OCCUR_DESC, PERP_RACE, PERP_SEX, and PERP_AGE_GROUP
    - Decision: These columns had significant amounts of missing data. Despite this, they were retained to preserve the overall integrity of the dataset. Dropping every row with a missing value in these columns would have discarded a large portion of the dataset.

 - Columns with Low Missing Values:
   - Other columns had relatively few missing values.
   - Decision: Rows with missing data in these columns were removed to maintain a cleaner dataset.

 - Rationale:
   - Most records had only one or two missing values from the columns with high missing data.
   - Conclusion: Retaining these records allowed us to preserve valuable information across other attributes, although the missing columns could not be used for direct analysis.
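
The split between high- and low-missingness columns described above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the original analysis: the helper name `split_by_missingness`, the 30% threshold, and the toy values are all assumptions.

```python
import pandas as pd


def split_by_missingness(df, threshold=0.3):
    """Hypothetical helper: return (high, low) lists of column names.

    high: missing fraction above `threshold`
    low:  some missing values, but at or below `threshold`
    """
    missing_fraction = df.isna().mean()
    high = missing_fraction[missing_fraction > threshold].index.tolist()
    low = missing_fraction[(missing_fraction > 0) & (missing_fraction <= threshold)].index.tolist()
    return high, low


# Toy data standing in for the real CSV (values are made up for illustration)
toy = pd.DataFrame({
    "PERP_RACE": ["BLACK", None, None, None, "WHITE"],
    "Latitude": [40.7, 40.8, None, 40.6, 40.9],
    "BORO": ["BRONX", "QUEENS", "BROOKLYN", "BRONX", "MANHATTAN"],
})

high_cols, low_cols = split_by_missingness(toy, threshold=0.3)

# Keep the sparse columns as they are; drop only rows that are missing
# a value in one of the mostly complete columns.
toy_cleaned = toy.dropna(subset=low_cols)
print(high_cols, low_cols, len(toy_cleaned))
```

Choosing the threshold is a judgment call; inspecting `df.isna().mean()` first makes the cut-off easier to justify.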

In [None]:
import json

import folium
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import seaborn as sns
from folium.plugins import HeatMap, HeatMapWithTime
from plotly.subplots import make_subplots



# Load the dataset to understand its structure
data_path = 'NYPD_Shooting_Incident_Data__Historic__20240410.csv'
data = pd.read_csv(data_path)

In [None]:
# Calculate the number of missing (null) values in each column
missing_values = data.isnull().sum()
missing_values

In [None]:
# Dropping rows where JURISDICTION_CODE, Latitude, Longitude, or Lon_Lat have missing values
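# .copy() makes cleaned_data an independent DataFrame, so the column assignments below
# do not trigger pandas' SettingWithCopyWarning
cleaned_data = data.dropna(subset=['JURISDICTION_CODE', 'Latitude', 'Longitude', 'Lon_Lat']).copy()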

# Check the number of remaining missing values in these columns to confirm the operation
remaining_missing = cleaned_data.isnull().sum()

# Convert the 'OCCUR_DATE' to datetime format and extract the year
cleaned_data['OCCUR_DATE'] = pd.to_datetime(cleaned_data['OCCUR_DATE'])
cleaned_data['YEAR'] = cleaned_data['OCCUR_DATE'].dt.year
cleaned_data['OCCUR_TIME'] = pd.to_datetime(cleaned_data['OCCUR_TIME'], format='%H:%M:%S').dt.hour

# Filter out rows containing any of these unrealistic age groups
unrealistic_ages = ['1020', '224', '940', '(null)']
cleaned_data = cleaned_data[~cleaned_data['PERP_AGE_GROUP'].isin(unrealistic_ages)]
cleaned_data = cleaned_data[cleaned_data['VIC_AGE_GROUP'] != '1022']

# Drop rows whose perpetrator sex is recorded as the literal string '(null)'
cleaned_data = cleaned_data[cleaned_data['PERP_SEX'] != '(null)']

# Count incidents per year
incidents_per_year = cleaned_data.groupby('YEAR').size().reset_index(name='Incidents')



In [None]:
# Continuous color scale: light gray to black
continuous_colorscale = [
    [0.0, 'rgb(220,220,220)'],  # light gray
    [1.0, 'rgb(0,0,0)']        # black
]

# Discrete color palette: Adding more colors if needed, depending on the number of years
discrete_colorscale = [
    'rgb(143,188,143)',  # Sea green
    'rgb(119,136,153)',  # Light slate gray
    'rgb(192,192,192)',  # Silver
    'rgb(47,79,79)',     # Dark slate gray
    'rgb(105,105,105)',  # Dim gray
]

year_to_color_2020_2021 = {
    2006: 'rgb(105,105,105)',  # Dim gray
    2007: 'rgb(105,105,105)',  # Dim gray
    2008: 'rgb(105,105,105)',  # Dim gray
    2009: 'rgb(105,105,105)',  # Dim gray
    2010: 'rgb(105,105,105)',  # Dim gray
    2011: 'rgb(105,105,105)',  # Dim gray
    2012: 'rgb(105,105,105)',  # Dim gray
    2013: 'rgb(105,105,105)',  # Dim gray
    2014: 'rgb(105,105,105)',  # Dim gray
    2015: 'rgb(105,105,105)',  # Dim gray
    2016: 'rgb(105,105,105)',  # Dim gray
    2017: 'rgb(105,105,105)',  # Dim gray
    2018: 'rgb(105,105,105)',  # Dim gray
    2019: 'rgb(105,105,105)',  # Dim gray
    2020: 'rgb(93,214,145)', # Bright green
    2021: 'rgb(143,188,143)', # Sea green
    2022: 'rgb(105,105,105)',  # Dim gray
}

year_to_color = {
    2018: 'rgb(119,136,153)',  # Light slate gray
    2019: 'rgb(119,136,192)',  # Muted blue-gray (true silver would be 192,192,192)
    2020: 'rgb(93,214,145)', # Bright green
    2021: 'rgb(143,188,143)', # Sea green
    2022: 'rgb(47,79,79)'} # Dark slate gray


emphasis_color = 'rgb(255,165,0)'
extra_color = '#D4F3CC'

In [None]:
cleaned_data.columns

In [None]:
cleaned_data['OCCUR_DATE'] = pd.to_datetime(cleaned_data['OCCUR_DATE'])
cleaned_data['YEAR'] = cleaned_data['OCCUR_DATE'].dt.year
# cleaned_data['OCCUR_TIME'] = pd.to_datetime(cleaned_data['OCCUR_TIME'], format='%H:%M:%S').dt.hour

# Group data by year and count incidents
incidents_per_year = cleaned_data.groupby('YEAR').size().reset_index(name='Incidents')
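# Years missing from the color map (e.g. after 2022) fall back to dim gray
incidents_per_year['color'] = incidents_per_year['YEAR'].map(year_to_color_2020_2021).fillna('rgb(105,105,105)')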

# Calculate the trend line
x = incidents_per_year['YEAR']  # Years
y = incidents_per_year['Incidents']  # Incident counts
coefficients = np.polyfit(x, y, 1)
trend_line = np.poly1d(coefficients)

# Create a DataFrame for the trend line
trend_df = pd.DataFrame({
    'YEAR': x,
    'Trend': trend_line(x)
})


# Plotting with Plotly Express
fig = px.bar(incidents_per_year, x='YEAR', y='Incidents', text='Incidents',
             labels={'YEAR': 'Year', 'Incidents': 'Number of Shootings'},
             title='Number of Shooting Incidents in NYC per Year (2006-2024)',
             color_discrete_sequence=['skyblue'])
fig.update_traces(texttemplate='%{text}', textposition='outside', textfont=dict(size=10))
fig.add_scatter(x=trend_df['YEAR'], y=trend_df['Trend'], mode='lines',
                name='Trend Line', line=dict(color=emphasis_color))


# Create the bar chart
fig = go.Figure()

# Add bars
for idx, row in incidents_per_year.iterrows():
    fig.add_trace(go.Bar(x=[row['YEAR']], y=[row['Incidents']], 
                         marker_color=row['color'], name=str(row['YEAR'])))

# Add trend line
fig.add_trace(go.Scatter(x=trend_df['YEAR'], y=trend_df['Trend'], mode='lines',
                         name='Trend Line', line=dict(color=emphasis_color)))

# Enhance the plot
fig.update_layout(
    title='Number of Shooting Incidents in NYC per Year (2006-2024)',
    title_font_color='white',  # Set title color to white for visibility
    xaxis=dict(
        title='Year',
        tickangle=-45,
        # title_font_color='white',  # Set x-axis title color to white
        # tickfont_color='white',  # Set x-axis tick labels to white
        gridcolor='gray',  # Set grid color for better visibility against black
    ),
    yaxis=dict(
        title='Number of Shootings',
        title_font_color='white',  # Set y-axis title color to white
        tickfont_color='white',  # Set y-axis tick labels to white
        gridcolor='gray',  # Set grid color for better visibility against black
    ),
    plot_bgcolor='black',  # Set the plotting area background to black
    paper_bgcolor='black',  # Set the entire chart background to black
    showlegend=False,
    margin=dict(l=30, r=20, t=60, b=20),
    font=dict(
        color='white'  # Ensure all default text (like legend, annotations) is white
    )
)


fig.show()
pio.write_html(fig, file='shootings_by_year.html', auto_open=False)


#### Number of Shooting Incidents per Year

- **Purpose:**  
  - Bar plot depicting annual shooting incidents from 2006 to 2022 in New York City.

- **Trends Identified:**  
  - Counts were relatively stable, with minor fluctuations, until 2013.
  - From 2013 to 2017, shootings decreased noticeably, to about 970 per year.
  - Sharp increase post-2017, peaking in 2020 with around 1,200 incidents.

- **Contextual Observations:**  
  - The 2020 peak coincides with the COVID-19 pandemic and the social unrest following major socio-political events.
  - Increased economic and social stressors likely contributed to this rise.
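
One way to put numbers behind phrases such as "sharp increase" is a year-over-year percentage change. The sketch below is illustrative only: the counts are made-up placeholders rather than values from the dataset, and the frame name `yearly` is an assumption.

```python
import pandas as pd

# Placeholder counts (NOT taken from the dataset), shaped like incidents_per_year
yearly = pd.DataFrame({
    "YEAR": [2017, 2018, 2019, 2020],
    "Incidents": [100, 90, 99, 198],
})

# Percentage change relative to the previous year (first year has no predecessor)
yearly["pct_change"] = yearly["Incidents"].pct_change() * 100

# Year with the highest count
peak_year = int(yearly.loc[yearly["Incidents"].idxmax(), "YEAR"])
print(yearly)
print("Peak year:", peak_year)
```

Applied to the real `incidents_per_year` frame, the same two lines would show which years drive the trend line up or down.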


In [None]:
# Counting incidents per borough (BORO)
incidents_per_boro = cleaned_data['BORO'].value_counts().sort_index()

# Plotting the data
plt.figure(figsize=(12, 6))
incidents_per_boro.plot(kind='bar', color='skyblue')
plt.title('Number of Shooting Incidents by Borough')
plt.xlabel('Borough')
plt.ylabel('Number of Incidents')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.xticks(rotation=45)  # Rotating for better label visibility
plt.tight_layout()
plt.show()

#### Number of Shooting Incidents by Borough

- **Purpose:**  
  - Bar plot comparing shooting incidents across the five boroughs (2006-2024).

- **Trends Identified:**  
  - Brooklyn and the Bronx report the highest number of incidents, indicating possible hotspots.
  - Manhattan, Queens, and Staten Island have fewer incidents comparatively.

In [None]:
# Exclude '(null)' placeholders and NaN for this plot only, leaving cleaned_data untouched
location_desc = cleaned_data['LOCATION_DESC']
filtered_location_desc = location_desc[location_desc != '(null)'].dropna()
incidents_per_location_desc = filtered_location_desc.value_counts().sort_index()

plt.figure(figsize=(14, 8))
incidents_per_location_desc.plot(kind='bar', color='skyblue')
plt.title('Number of Shooting Incidents by Location Description')
plt.xlabel('Location Description')
plt.ylabel('Number of Incidents')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.xticks(rotation=90)  # Rotating the labels for better readability
plt.tight_layout()
plt.show()

plt.figure(figsize=(14, 8))
incidents_per_location_desc.plot(kind='bar', color='skyblue')
plt.title('Number of Shooting Incidents by Location Description')
plt.xlabel('Location Description')
plt.ylabel('Number of Incidents')
plt.yscale('log')  # Applying a logarithmic scale
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

#### Number of Shooting Incidents by Location Description

- **Purpose:**  
  - Bar plot categorizing incidents by location type (2006-2024).

- **Trends Identified:**  
  - High occurrence at bars, gas stations, liquor stores, and nightclubs.
  - Lower incidents at banks, schools, and multi-dwelling houses.

- **Contextual Observations:**  
  - Social interaction and alcohol presence may increase shooting risks in venues like bars.
  - The prevalence of shootings on "Street" indicates pervasive urban gun violence beyond controlled environments.

In [None]:
# Group data by perpetrator age group and count incidents
incidents_per_perp_age_group = cleaned_data['PERP_AGE_GROUP'].value_counts().sort_index()

# Group data by victim age group and count incidents
incidents_per_vic_age_group = cleaned_data['VIC_AGE_GROUP'].value_counts().sort_index()

# Count incidents by perpetrator sex
incidents_per_perp_sex = cleaned_data['PERP_SEX'].value_counts().sort_index()

# Count incidents by victim sex
incidents_per_vic_sex = cleaned_data['VIC_SEX'].value_counts().sort_index()

# Count incidents by perpetrator race (uses the cleaned DataFrame, cleaned_data)
incidents_per_perp_race = cleaned_data['PERP_RACE'].value_counts().sort_index()

# Count incidents by victim race
incidents_per_vic_race = cleaned_data['VIC_RACE'].value_counts().sort_index()


In [None]:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Assuming the necessary data preparation steps have been completed for age_group, sex, and race

# Create subplots
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=("Number of Shooting Incidents by Perpetrator and Victim Age Group",
                    "Number of Shooting Incidents by Sex",
                    "Number of Shooting Incidents by Race")
)

# Add first subplot for age group
fig.add_trace(
    go.Bar(name='Perpetrator Age Group', x=incidents_per_perp_age_group.index, y=incidents_per_perp_age_group.values, marker_color='rgb(255, 99, 71)'),
    row=1, col=1
)
fig.add_trace(
    go.Bar(name='Victim Age Group', x=incidents_per_vic_age_group.index, y=incidents_per_vic_age_group.values, marker_color='rgb(100, 149, 237)'),
    row=1, col=1
)

# Add second subplot for sex
fig.add_trace(
    go.Bar(name='Perpetrator Sex', x=incidents_per_perp_sex.index, y=incidents_per_perp_sex.values, marker_color='rgb(255, 99, 71)'),
    row=2, col=1
)
fig.add_trace(
    go.Bar(name='Victim Sex', x=incidents_per_vic_sex.index, y=incidents_per_vic_sex.values, marker_color='rgb(100, 149, 237)'),
    row=2, col=1
)

# Add third subplot for race
fig.add_trace(
    go.Bar(name='Perpetrator Race', x=incidents_per_perp_race.index, y=incidents_per_perp_race.values, marker_color='rgb(255, 99, 71)'),
    row=3, col=1
)
fig.add_trace(
    go.Bar(name='Victim Race', x=incidents_per_vic_race.index, y=incidents_per_vic_race.values, marker_color='rgb(100, 149, 237)'),
    row=3, col=1
)

# Update layout for a cohesive look
fig.update_layout(
    title='Comprehensive Analysis of Shooting Incidents',
    barmode='group',
    height=1200,  # Adjust height to fit all plots comfortably
    showlegend=True,
    legend_title="Category",
    plot_bgcolor='rgba(0,0,0,0)'
)

# Update x-axis and y-axis titles individually if needed
fig.update_xaxes(title_text="Age Group", row=1, col=1)
fig.update_xaxes(title_text="Sex", row=2, col=1)
fig.update_xaxes(title_text="Race", row=3, col=1)
fig.update_yaxes(title_text="Number of Incidents", row=1, col=1)
fig.update_yaxes(title_text="Number of Incidents", row=2, col=1)
fig.update_yaxes(title_text="Number of Incidents", row=3, col=1)

# Show the plot
fig.show()

# Optionally, save the figure to HTML to be opened in any web browser
fig.write_html('Comprehensive_Shooting_Incidents_Analysis.html')  # Update the save path


#### Analysis of Demographic Data in Shooting Incidents

- **Purpose:**  
  - Series of bar plots breaking down incidents by age, sex, and race of perpetrators and victims.

- **Key Findings:**  
  - **Age Group:**  
    - Significant concentration in the 18-44 age range for both perpetrators and victims.
    - Marked decline in involvement above age 45.

  - **Sex:**  
    - Males are significantly more affected than females, highlighting a strong gender disparity.

  - **Race:**  
    - Black individuals are the most represented among both perpetrators and victims, followed by Hispanic individuals, a pattern that may reflect underlying socio-economic factors.


The noticeable increase in shooting incidents during 2020 and 2021 called for careful analysis to identify patterns and contributing factors. Because high rates of missing data could make them inaccurate, columns such as age, perpetrator race, sex, location, and incident description were set aside. Instead, we focused on yearly incident counts, several map plots, and race to spot clear trends. Comparing 2020 and 2021 with earlier and later years showed the spike clearly, with 2020 reaching the highest number of incidents in recent years.

An analysis of victim race each year showed that while overall shootings have gone down, the distribution across racial groups remained largely unchanged. 

The heatmap of race-on-race crime percentages showed that most incidents happened between people of the same race, with "Black-on-Black" crimes being the most frequent. The increase in shootings in 2020 and 2021 was unusual but consistent across different groups, suggesting widespread underlying issues. Although the overall decline in shootings is encouraging, the uneven impact on certain racial groups remains a concern. Understanding these trends will help shape focused policies and allocate resources for violence prevention and support programs.
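
The percentages behind such a heatmap are row-normalized: each perpetrator group's row sums to 100%. A minimal sketch of that calculation follows, using `pd.crosstab` with `normalize="index"`. The group labels "A" and "B" and the toy rows are hypothetical; only the column names `PERP_RACE` and `VIC_RACE` come from the dataset.

```python
import pandas as pd

# Toy incidents with hypothetical group labels (not real data)
toy = pd.DataFrame({
    "PERP_RACE": ["A", "A", "A", "A", "B", "B"],
    "VIC_RACE":  ["A", "A", "A", "B", "B", "A"],
})

# normalize="index" divides every row by its own total, so each row
# describes how one perpetrator group's incidents split across victim groups
row_pct = (pd.crosstab(toy["PERP_RACE"], toy["VIC_RACE"], normalize="index") * 100).round(2)
print(row_pct)
```

Because the normalization is per row, cells are comparable within a row but not across rows: a high percentage in a small group can correspond to few incidents.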

In [None]:
# Convert 'OCCUR_DATE' to datetime if it's not already
cleaned_data['OCCUR_DATE'] = pd.to_datetime(cleaned_data['OCCUR_DATE'])


# Extract year and month
cleaned_data['YEAR'] = cleaned_data['OCCUR_DATE'].dt.year
cleaned_data['MONTH'] = cleaned_data['OCCUR_DATE'].dt.month

# Group by year and month and count the occurrences
monthly_crime_data = cleaned_data.groupby(['YEAR', 'MONTH']).size().reset_index(name='INCIDENTS')


# Keep only 2018 onward
monthly_crime_data = monthly_crime_data[monthly_crime_data['YEAR'] >= 2018]

# Calculate maximum incidents per year for color scaling
max_incidents_per_year = monthly_crime_data.groupby('YEAR')['INCIDENTS'].max()

# Normalize these values to a 0-1 range for color mapping, set a minimum opacity
min_opacity = 0.2  # Adjust minimum opacity here
norm = (max_incidents_per_year - max_incidents_per_year.min()) / (max_incidents_per_year.max() - max_incidents_per_year.min()) * (1 - min_opacity) + min_opacity

# Map these normalized values back to the main DataFrame
monthly_crime_data['COLOR'] = monthly_crime_data['YEAR'].map(norm)

# Make sure we have enough colors for the number of years, repeat if needed
unique_years = monthly_crime_data['YEAR'].unique()
if len(discrete_colorscale) < len(unique_years):
    num_repeats = -(-len(unique_years) // len(discrete_colorscale))  # Ceiling division
    discrete_colorscale = (discrete_colorscale * num_repeats)[:len(unique_years)]

# Map years to colors
# year_to_color = {year: color for year, color in zip(unique_years, discrete_colorscale)}
# year_to_color[2020] = emphasis_color  # Assign the emphasis color to 2020

# Create the plot using Graph Objects for more control
fig = go.Figure()

# Add each year as a separate trace
for year in unique_years:
    year_data = monthly_crime_data[monthly_crime_data['YEAR'] == year]
    fig.add_trace(go.Scatter(
        x=year_data['MONTH'], 
        y=year_data['INCIDENTS'], 
        mode='lines',
        name=str(year),
        line=dict(color=year_to_color.get(year, 'rgb(105,105,105)'))  # Color by year; dim gray for years not in the map
    ))

# Update the layout to add customizations
fig.update_layout(
    title='Monthly Shooting Incidents by Year',
    xaxis=dict(
        title='Month',
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
        gridcolor='gray',
    ),
    yaxis=dict(
        title='Number of Incidents',
        gridcolor='gray',
    ),
    legend_title_text='Year',
    plot_bgcolor='black',  # Set the plotting area background to black
    paper_bgcolor='black',  # Set the entire chart background to black
    font=dict(
        color='white'  # Ensure all default text (like legend, annotations) is white
    )
    # margin=dict(l=30, r=20, t=60, b=20)
)

fig.show()
pio.write_html(fig, file='monthly_shootings_per_year.html', auto_open=False)




In [None]:
discrete_colorscale = [
    'rgb(143,188,143)',  # Sea green
    'rgb(119,136,153)',  # Light slate gray
    'rgb(192,192,192)',  # Silver
    'rgb(47,79,79)',     # Dark slate gray
    'rgb(105,105,105)',  # Dim gray
    'rgb(255,165,0)'     # Orange for emphasis
]

# Replace 'nan' and any other identifier for unknown data with 'UNKNOWN'
cleaned_data['VIC_RACE'] = cleaned_data['VIC_RACE'].fillna('UNKNOWN').replace(['', ' ', 'N/A'], 'UNKNOWN')

# Group by year and victim race, count the occurrences
vic_race_yearly = cleaned_data.groupby(['YEAR', 'VIC_RACE']).size().unstack(fill_value=0)

# Combine 'UNKNOWN' and any other nan categories into one
vic_race_yearly['UNKNOWN'] = vic_race_yearly.get('UNKNOWN', 0) + vic_race_yearly.get('', 0)


# Calculate total incidents per year
total_incidents_per_year = vic_race_yearly.sum(axis=1)

# Calculate percentages
percentages = vic_race_yearly.divide(total_incidents_per_year, axis=0) * 100

# Assuming 'vic_race_yearly' and 'percentages' are already calculated as per your previous code

# Reset the index to turn 'YEAR' into a column and prepare data for plotting
vic_race_yearly_reset = vic_race_yearly.reset_index()
percentages_reset = percentages.reset_index()
percentages_reset = percentages_reset.round(0)

# Melt the data to long format, which Plotly can use to plot stacked area charts
vic_race_yearly_melted = vic_race_yearly_reset.melt(id_vars='YEAR', var_name='Victim Race', value_name='Number of Incidents')
percentages_melted = percentages_reset.melt(id_vars='YEAR', var_name='Victim Race', value_name='Percentage')

# Merge the counts and percentages into a single DataFrame
area_chart_data = vic_race_yearly_melted.merge(percentages_melted, on=['YEAR', 'Victim Race'])
# Filter out 'UNKNOWN' from visualization, not from calculation
area_chart_data = area_chart_data[area_chart_data['Victim Race'] != 'UNKNOWN']
area_chart_data.sort_values(by=['YEAR', 'Percentage'], inplace=True)


# Create a stacked area chart with hover data showing percentages
# Assuming you know the races involved or you extract them from the DataFrame
races = area_chart_data['Victim Race'].unique()
color_map = {race: color for race, color in zip(races, discrete_colorscale)}

# Apply the discrete color palette to the area chart
fig_area = px.area(area_chart_data, x='YEAR', y='Number of Incidents', color='Victim Race',
                   title="Victim Race Distribution by Year",
                   labels={'YEAR': 'Year', 'Number of Incidents': 'Number of Incidents', 'Victim Race': 'Victim Race'},
                   hover_data={'Number of Incidents': True, 'Percentage': True},
                   color_discrete_map=color_map)  # Using the color mapping

# Update layout
fig_area.update_layout(
    xaxis=dict(
        title='Year',
        gridcolor='gray',
    ),
    yaxis=dict(
        title='Number of Incidents',
        gridcolor='gray',
    ),
    legend_title_text='Victim Race',
    plot_bgcolor='black',  # Set the plotting area background to black
    paper_bgcolor='black',  # Set the entire chart background to black
    font=dict(
        color='white'  # Ensure all default text (like legend, annotations) is white
    )
)

fig_area.show()
pio.write_html(fig_area, file='victim_race_distribution.html', auto_open=False)


In [None]:
# Filter out unknown and nan races for perpetrators and victims
filtered_data = cleaned_data[
    (cleaned_data['PERP_RACE'].notna()) & (cleaned_data['PERP_RACE'] != 'UNKNOWN') &
    (cleaned_data['VIC_RACE'].notna()) & (cleaned_data['VIC_RACE'] != 'UNKNOWN') &
    (cleaned_data['VIC_RACE'] != 'AMERICAN INDIAN/ALASKAN NATIVE') &
    (cleaned_data['PERP_RACE'] != 'AMERICAN INDIAN/ALASKAN NATIVE')
]

# Create a crosstab of perpetrator and victim races
# Ensure PERP_RACE are rows and VIC_RACE are columns
race_on_race = pd.crosstab(filtered_data['PERP_RACE'], filtered_data['VIC_RACE'])

# Convert each row to percentages of that perpetrator group's incidents
# (vectorized, so it works for any number of rows and avoids writing floats into an integer table)
race_on_race = race_on_race.div(race_on_race.sum(axis=1), axis=0).mul(100).round(2)


# Plotting a density heatmap using the grayscale color scale
fig = px.imshow(
    race_on_race,
    labels=dict(x="Victim Race", y="Perpetrator Race", color="Row percentage (%)"),
    title='Heatmap of Race-on-Race Crime Percentages',
    aspect='auto',  # Adjust the aspect ratio to auto if needed
    color_continuous_scale=continuous_colorscale  # Apply the continuous color scale
)

# Ensure axes labels are correct and on the appropriate side
fig.update_xaxes(side="bottom", tickmode='array', tickvals=list(range(len(race_on_race.columns))), ticktext=race_on_race.columns)
fig.update_yaxes(tickmode='array', tickvals=list(range(len(race_on_race.index))), ticktext=race_on_race.index)

# Leave the colorbar title empty and apply the dark theme
fig.update_layout(coloraxis_colorbar=dict(
    title=dict(text="", side="top")  # title.side replaces the older `titleside` attribute
    ),
    plot_bgcolor='black',  # Set the plotting area background to black
    paper_bgcolor='black',  # Set the entire chart background to black
    font=dict(
        color='white'  # Ensure all default text (like legend, annotations) is white
    )
    
    )
fig.show()
pio.write_html(fig, file='confusion_matrix_interracial_shootings.html', auto_open=False)


To understand the spike in incidents during this period, we looked for potential causes. In May 2020, George Floyd was killed by police, triggering widespread protests across the United States. We believe these events, combined with the COVID-19 pandemic and the start of the lockdowns, contributed to the increase in shootings, at a time when crime rates were rising more broadly. To dig deeper, we examined the NYC_layoff_interest.csv dataset, which records Google search interest in layoffs in New York, to see whether layoffs correlated with the spike in incidents. Search interest rose clearly from February to a peak in May. This is not the only factor and does not establish causation, but it suggests that a combination of social unrest and economic stress from mass layoffs may have contributed to the surge.
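
A simple way to check the relationship described above would be to correlate the two monthly series. The sketch below assumes both series have already been aggregated by month; the numbers are placeholders rather than values from either dataset, and the column names `layoff_interest` and `shootings` are hypothetical.

```python
import pandas as pd

# Placeholder monthly values (NOT real data) for January through June
toy = pd.DataFrame({
    "Month": [1, 2, 3, 4, 5, 6],
    "layoff_interest": [10, 12, 60, 100, 80, 50],
    "shootings": [50, 48, 55, 70, 90, 120],
})

# Pearson correlation between the two series in the same month
pearson_r = toy["layoff_interest"].corr(toy["shootings"])

# Lagged version: compare search interest with shootings one month later
# (Series.corr ignores the row left without a partner after the shift)
lagged_r = toy["layoff_interest"].corr(toy["shootings"].shift(-1))

print(f"same-month r = {pearson_r:.2f}, one-month-lag r = {lagged_r:.2f}")
```

With only twelve monthly points per year, any such coefficient is a rough indication at best and says nothing about causation.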

In [None]:
layoffs = pd.read_csv("NYC_layoff_interest.csv")
# Ensure the 'Week' column is datetime type
layoffs['Week'] = pd.to_datetime(layoffs['Week'])

# Extract the month from the 'Week' column
layoffs['Month'] = layoffs['Week'].dt.month

monthly_layoffs = layoffs.groupby('Month')['Layoff Google Searches in New York'].sum().reset_index()

max_val = monthly_layoffs['Layoff Google Searches in New York'].max()
monthly_layoffs['Layoff Google Interest in New York for 2020'] = monthly_layoffs['Layoff Google Searches in New York'] * 100/max_val


# Create the line graph with Plotly Express
fig = px.line(
    monthly_layoffs, 
    x='Month', 
    y='Layoff Google Interest in New York for 2020', 
    title='Monthly Layoff Interest in New York for 2020',
    labels={'Month': 'Month', 'Layoff Google Interest in New York for 2020': 'Layoffs Interest'},
    color_discrete_sequence=['green']
)

# Add layout customization
fig.update_layout(
    xaxis=dict(
        title='Month',
        tickmode='array',
        tickvals=list(range(1, 13)),
        ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'],
         gridcolor='gray',
    ),
    yaxis=dict(
        title='Layoff Search Interest (scaled to 0-100)',
        gridcolor='gray',
    ),
    plot_bgcolor='black',  # Set the plotting area background to black
    paper_bgcolor='black',  # Set the entire chart background to black
    font=dict(
        color='white'  # Ensure all default text (like legend, annotations) is white
    )
)

fig.show()
pio.write_html(fig, file='monthly_layoffs.html', auto_open=False)



Our first map graph was a dynamic heatmap, which allowed us to visualize the ebb and flow of shooting incidents over the years. By watching how crime hotspots shifted annually, we could identify any emerging trends or persistent patterns, providing valuable insights into the city's evolving crime landscape.

We then turned our attention to the socio-economic context of shooting incidents, examining shootings by neighborhood to see how socio-economic factors related to crime rates. This approach not only helped us pinpoint areas with higher levels of gun violence but also provided a deeper understanding of the underlying socio-economic dynamics driving these trends.

We further enriched our analysis with a detailed map graph incorporating a range of socio-economic indicators. This multifaceted visualization painted a nuanced picture of New York's socio-economic landscape, highlighting areas of disparity and revealing potential correlations between socio-economic conditions and the prevalence of shooting incidents. By integrating data on poverty levels, unemployment rates, community trust, and the distribution of non-white populations, we gained a deeper understanding of the complex interplay between socio-economic factors and gun violence in the city.

In [None]:
# Create a list of lists for each year with the format needed for HeatMapWithTime
data_by_year = []
years = sorted(cleaned_data['YEAR'].unique())
for year in years:
    year_data = cleaned_data[cleaned_data['YEAR'] == year][['Latitude', 'Longitude']].dropna()
    data_by_year.append(year_data.values.tolist())

# Initialize the Folium map centered around New York City
map_with_time = folium.Map(location=[40.7128, -74.0060], zoom_start=11)

# Create a HeatMapWithTime, labelling each animation frame with its year
heatmap = HeatMapWithTime(data_by_year, index=[str(year) for year in years], auto_play=True, max_opacity=0.8)
heatmap.add_to(map_with_time)

# Save to HTML
map_html_path = '/mnt/data/NYC_Shooting_HeatMapWithTime.html'
# map_with_time.save(map_html_path)
map_with_time

## Shooting incidents over the years map

The following three maps show where shooting incidents occurred, and how many, across New York City's police precincts in 2006, 2020, and 2021.
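
Each per-year map repeats the same two steps: count incidents per precinct, then write the count into the matching GeoJSON feature. A hedged sketch of how those steps could be factored into one function is shown here. The helper name `attach_counts` and the toy GeoJSON are hypothetical; only the `YEAR` and `PRECINCT` columns and the `precinct` property come from the notebook.

```python
import pandas as pd


def attach_counts(geojson, incidents, year):
    """Hypothetical helper: add an 'incident_count' property to every feature.

    Returns the {precinct (str): count} dictionary for the given year.
    """
    counts = incidents.loc[incidents["YEAR"] == year, "PRECINCT"].value_counts()
    counts_by_precinct = {str(key): int(value) for key, value in counts.items()}
    for feature in geojson["features"]:
        precinct_num = str(feature["properties"]["precinct"]).strip()
        # Precincts with no recorded incidents that year get a count of 0
        feature["properties"]["incident_count"] = counts_by_precinct.get(precinct_num, 0)
    return counts_by_precinct


# Toy inputs standing in for 'Police Precincts.geojson' and cleaned_data
toy_geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"precinct": "1"}, "geometry": None},
    {"type": "Feature", "properties": {"precinct": "2"}, "geometry": None},
]}
toy_incidents = pd.DataFrame({"YEAR": [2006, 2006, 2020], "PRECINCT": [1, 1, 2]})

counts_2006 = attach_counts(toy_geojson, toy_incidents, 2006)
print(counts_2006)
```

The function mutates the GeoJSON dictionary in place, so the file should be reloaded (or deep-copied) before annotating a different year.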

## 2006

In [None]:
# Extract the data for the year 2006
data_2006 = cleaned_data[cleaned_data['YEAR'] == 2006]
data_2006_counts_by_precinct = data_2006['PRECINCT'].value_counts().to_dict()

# Load the GeoJSON file and associate shooting incident data with the precincts
with open('Police Precincts.geojson') as f:
    precinct_geojson = json.load(f)

data_2006_counts_by_precinct = {str(key): value for key, value in data_2006_counts_by_precinct.items()}

# Add incident count data to GeoJSON features
for feature in precinct_geojson['features']:
    # Extract the precinct number as a string to match the dictionary keys
    precinct_num = str(feature['properties']['precinct']).strip()
    # Retrieve the incident count from the dictionary using the precinct number
    feature['properties']['incident_count'] = data_2006_counts_by_precinct.get(precinct_num, 0)
    
map_with_precincts_2006 = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with a specific color for missing/zero values
choropleth = folium.Choropleth(
    geo_data=precinct_geojson,
    name='choropleth',
    data=pd.DataFrame(list(data_2006_counts_by_precinct.items()), columns=['Precinct', 'Incidents']),
    columns=['Precinct', 'Incidents'],
    key_on='feature.properties.precinct',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Shooting Incidents per Precinct',
    # near-white fill for precincts with no matching data
    nan_fill_color= '#f7f7f7',
    nan_fill_opacity=0.7
).add_to(map_with_precincts_2006)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width (0.5 will make it narrower)
    }

# Add the GeoJSON layer with the custom style function
folium.GeoJson(
    precinct_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['precinct', 'incident_count'],
        aliases=['Precinct Number:', 'Incident Count:'],
        localize=True
    )
).add_to(map_with_precincts_2006)


# save the map as an HTML file
maps_path = 'maps/'
map_with_precincts_2006.save(maps_path + 'shooting_incidents_2006.html')

# Display the map
map_with_precincts_2006

## 2020

In [None]:
# Extract the data for the year 2020
data_2020 = cleaned_data[cleaned_data['YEAR'] == 2020]
data_2020_counts_by_precinct = data_2020['PRECINCT'].value_counts().to_dict()

# Load the GeoJSON file and associate shooting incident data with the precincts
with open('Police Precincts.geojson') as f:
    precinct_geojson = json.load(f)

data_2020_counts_by_precinct = {str(key): value for key, value in data_2020_counts_by_precinct.items()}

# Add incident count data to GeoJSON features
for feature in precinct_geojson['features']:
    # Extract the precinct number as a string to match the dictionary keys
    precinct_num = str(feature['properties']['precinct']).strip()
    # Retrieve the incident count from the dictionary using the precinct number
    feature['properties']['incident_count'] = data_2020_counts_by_precinct.get(precinct_num, 0)
    
map_with_precincts = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with a specific color for missing/zero values
choropleth = folium.Choropleth(
    geo_data=precinct_geojson,
    name='choropleth',
    data=pd.DataFrame(list(data_2020_counts_by_precinct.items()), columns=['Precinct', 'Incidents']),
    columns=['Precinct', 'Incidents'],
    key_on='feature.properties.precinct',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Shooting Incidents per Precinct',
    # near-white fill for precincts with no matching data
    nan_fill_color= '#f7f7f7',
    nan_fill_opacity=0.7
).add_to(map_with_precincts)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width (0.5 will make it narrower)
    }

# Add the GeoJSON layer with the custom style function
folium.GeoJson(
    precinct_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['precinct', 'incident_count'],
        aliases=['Precinct Number:', 'Incident Count:'],
        localize=True
    )
).add_to(map_with_precincts)


# save the map as an HTML file
maps_path = 'maps/'
map_with_precincts.save(maps_path + 'shooting_incidents_2020.html')

# Display the map
map_with_precincts

## 2021

In [None]:
# Extract the data for the year 2021
data_2021 = cleaned_data[cleaned_data['YEAR'] == 2021]
data_2021_counts_by_precinct = data_2021['PRECINCT'].value_counts().to_dict()

# Load the GeoJSON file and associate shooting incident data with the precincts
with open('Police Precincts.geojson') as f:
    precinct_geojson = json.load(f)

data_2021_counts_by_precinct = {str(key): value for key, value in data_2021_counts_by_precinct.items()}

# Add incident count data to GeoJSON features
for feature in precinct_geojson['features']:
    # Extract the precinct number as a string to match the dictionary keys
    precinct_num = str(feature['properties']['precinct']).strip()
    # Retrieve the incident count from the dictionary using the precinct number
    feature['properties']['incident_count'] = data_2021_counts_by_precinct.get(precinct_num, 0)
    
map_with_precincts = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with a specific color for missing/zero values
choropleth = folium.Choropleth(
    geo_data=precinct_geojson,
    name='choropleth',
    data=pd.DataFrame(list(data_2021_counts_by_precinct.items()), columns=['Precinct', 'Incidents']),
    columns=['Precinct', 'Incidents'],
    key_on='feature.properties.precinct',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Shooting Incidents per Precinct',
    # near-white fill for precincts with no matching data
    nan_fill_color= '#f7f7f7',
    nan_fill_opacity=0.7
).add_to(map_with_precincts)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width (0.5 will make it narrower)
    }

# Add the GeoJSON layer with the custom style function
folium.GeoJson(
    precinct_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['precinct', 'incident_count'],
        aliases=['Precinct Number:', 'Incident Count:'],
        localize=True
    )
).add_to(map_with_precincts)


# save the map as an HTML file
maps_path = 'maps/'
map_with_precincts.save(maps_path + 'shooting_incidents_2021.html')

# Display the map
map_with_precincts
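
The 2006, 2020, and 2021 cells repeat the same step of attaching per-precinct counts to the GeoJSON features. As a minimal sketch (not the notebook's actual code), that step could be factored into one helper. The function name `attach_counts`, the toy GeoJSON, and the counts are illustrative assumptions; only the `precinct` and `incident_count` property names come from the cells above.

```python
import copy


def attach_counts(geojson, counts, key="precinct", out="incident_count"):
    """Return a copy of `geojson` whose features carry a count under `out`.

    Keys are compared as stripped strings so that 40, "40" and " 40 " match.
    Features without a matching entry get 0, as in the cells above.
    """
    lookup = {str(k).strip(): v for k, v in counts.items()}
    result = copy.deepcopy(geojson)  # leave the loaded GeoJSON untouched
    for feature in result["features"]:
        feature_key = str(feature["properties"][key]).strip()
        feature["properties"][out] = lookup.get(feature_key, 0)
    return result


# Hypothetical two-precinct GeoJSON (geometry omitted for brevity)
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"precinct": "40"}, "geometry": None},
        {"type": "Feature", "properties": {"precinct": "75"}, "geometry": None},
    ],
}

# Made-up counts keyed by year, then by precinct number
counts_by_year = {2006: {40: 12}, 2020: {40: 7, 75: 9}}

geojson_by_year = {
    year: attach_counts(toy_geojson, counts)
    for year, counts in counts_by_year.items()
}
```

Because the helper works on a deep copy, the GeoJSON file would only need to be loaded once and could be reused for every year.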

To build the final map of poverty levels, unemployment rates, community trust, and the distribution of non-white populations in New York, we gathered and analyzed four additional datasets and integrated them into a single map with one layer per indicator.

We integrated the 2006, 2020, and 2021 shooting-incident maps in the same way.



### Unemployment

In [None]:
file_path = 'other_datasets/NYC Data - Unemployment Rate 2021.csv'
unemployment_data = pd.read_csv(file_path)
# Function to map community districts to geojson numeric notation
def map_districts(district_str):
    borough, num_str = district_str[-4], district_str[-3:-1]
    num = int(num_str)
    if borough == 'M':
        return 100 + num
    elif borough == 'B':
        return 200 + num
    elif borough == 'K':
        return 300 + num
    elif borough == 'Q':
        return 400 + num
    elif borough == 'S':
        return 500 + num
    return None

# Apply the mapping function to the 'Community District' column
unemployment_data['Numeric Notation'] = unemployment_data['Community District'].apply(map_districts)

# Show the first few rows of the dataset with the new 'Numeric Notation' column
unemployment_data.head()
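
`map_districts` relies on fixed character positions at the end of each label. A stricter variant, sketched below, uses a regular expression and returns `None` for labels that do not match. It assumes labels end in a code such as `(M01)`, which is inferred from the slicing above rather than confirmed from the CSV; the names `parse_district`, `BOROUGH_OFFSETS`, and `DISTRICT_CODE` are hypothetical.

```python
import re

# Borough letter -> offset, as in map_districts above
BOROUGH_OFFSETS = {"M": 100, "B": 200, "K": 300, "Q": 400, "S": 500}

# Assumed label format: "<name> (<borough letter><two digits>)"
DISTRICT_CODE = re.compile(r"\(([MBKQS])(\d{2})\)\s*$")


def parse_district(label):
    """Convert a label such as 'Example District (M01)' to 101, or None."""
    match = DISTRICT_CODE.search(label)
    if match is None:
        return None
    borough, number = match.groups()
    return BOROUGH_OFFSETS[borough] + int(number)


print(parse_district("Example District (M01)"))  # 101
print(parse_district("Example District (K16)"))  # 316
print(parse_district("No code here"))            # None
```

Unlike positional slicing, this version tolerates trailing whitespace and does not raise on labels that lack a district code.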


In [None]:

# Load the community district geojson file
geojson_path = 'other_datasets/Community Districts.geojson'
with open(geojson_path, 'r') as f:
    district_geojson = json.load(f)

# Extract relevant unemployment data
unemployment_mapping = unemployment_data.set_index('Numeric Notation')['Unemployment Rate'].str.replace('%', '').astype(float).to_dict()

# Add unemployment data to GeoJSON features
for feature in district_geojson['features']:
    district_num = feature['properties'].get('boro_cd', None)
    if district_num:
        # boro_cd may be stored as a string; the mapping keys are numeric
        feature['properties']['Unemployment Rate'] = unemployment_mapping.get(int(district_num), 0)

# Create a Folium map centered on New York City
map_with_districts = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with unemployment data
choropleth = folium.Choropleth(
    geo_data=district_geojson,
    name='choropleth',
    data=pd.DataFrame(list(unemployment_mapping.items()), columns=['Numeric Notation', 'Unemployment Rate']),
    columns=['Numeric Notation', 'Unemployment Rate'],
    key_on='feature.properties.boro_cd',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Unemployment Rate (%) by Community District',
    nan_fill_color='#f0f0f0',  # Light grey for districts with no data
    nan_fill_opacity=0.7
).add_to(map_with_districts)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width
    }

# Add the GeoJson layer with the custom style function
folium.GeoJson(
    district_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['boro_cd'],
        aliases=['Community District:'],
        localize=True
    )
).add_to(map_with_districts)

# Save the map as an HTML file
maps_path = 'maps/'
map_with_districts.save(maps_path + 'unemployment_rate.html')

# Display the map
map_with_districts

### Poverty Rate

In [None]:
file_path = 'other_datasets/NYC Data - Poverty Rate 2019.csv'
poverty_data = pd.read_csv(file_path)

# Apply the mapping function to the 'Community District' column
poverty_data['Numeric Notation'] = poverty_data['Community District'].apply(map_districts)

# Show the first few rows of the dataset with the new 'Numeric Notation' column
poverty_data.head()

In [None]:
# Load the community district geojson file
geojson_path = 'other_datasets/Community Districts.geojson'
with open(geojson_path, 'r') as f:
    district_geojson = json.load(f)

# Extract relevant Poverty data
poverty_mapping = poverty_data.set_index('Numeric Notation')['Poverty Rate'].str.replace('%', '').astype(float).to_dict()

# Add poverty data to GeoJSON features
for feature in district_geojson['features']:
    district_num = feature['properties'].get('boro_cd', None)
    if district_num:
        # boro_cd may be stored as a string; the mapping keys are numeric
        feature['properties']['Poverty Rate'] = poverty_mapping.get(int(district_num), 0)

# Create a Folium map centered on New York City
map_with_districts_poverty = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with poverty data
choropleth = folium.Choropleth(
    geo_data=district_geojson,
    name='choropleth',
    data=pd.DataFrame(list(poverty_mapping.items()), columns=['Numeric Notation', 'Poverty Rate']),
    columns=['Numeric Notation', 'Poverty Rate'],
    key_on='feature.properties.boro_cd',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Poverty Rate (%) by Community District',
    nan_fill_color='#f0f0f0',  # Light grey for districts with no data
    nan_fill_opacity=0.7
).add_to(map_with_districts_poverty)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width
    }

# Add the GeoJson layer with the custom style function
folium.GeoJson(
    district_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['boro_cd'],
        aliases=['Community District:'],
        localize=True
    )
).add_to(map_with_districts_poverty)

# Save the map as an HTML file
maps_path = 'maps/'
map_with_districts_poverty.save(maps_path + 'poverty_rate.html')

# Display the map
map_with_districts_poverty

### Community Distrust

In [None]:
trust_data_path = 'other_datasets/NYC Data - Community Trust 2017_2018.csv'
trust_data = pd.read_csv(trust_data_path)

# Apply the mapping function to the 'Community District' column
trust_data['Numeric Notation'] = trust_data['Community District'].apply(map_districts)

# Community Trust values are strings such as '77.40%'; derive distrust as 100 minus the trust percentage
trust_data['Community Distrust'] = 100 - trust_data['Community Trust'].str.replace('%', '').astype(float)

# Show the first few rows of the dataset with the new 'Numeric Notation' column
trust_data.head()
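
The distrust derivation can be checked on a single value. This is a small standard-library sketch of the same arithmetic; the helper names `parse_percent` and `distrust_from_trust` are hypothetical, and the rounding is added here only to keep the printed value tidy (the cell above does not round).

```python
def parse_percent(text):
    """Convert a string such as '77.40%' to the float 77.4."""
    return float(text.strip().rstrip("%"))


def distrust_from_trust(trust_text):
    """Distrust is the complement of trust on a 0-100 scale."""
    return round(100 - parse_percent(trust_text), 2)


print(parse_percent("77.40%"))        # 77.4
print(distrust_from_trust("77.40%"))  # 22.6
```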



In [None]:
# Load the community district geojson file
geojson_path = 'other_datasets/Community Districts.geojson'
with open(geojson_path, 'r') as f:
    district_geojson = json.load(f)

# Extract relevant trust data
distrust_mapping = trust_data.set_index('Numeric Notation')['Community Distrust'].to_dict()

# Add community distrust data to GeoJSON features
for feature in district_geojson['features']:
    district_num = feature['properties'].get('boro_cd', None)
    if district_num:
        # boro_cd may be stored as a string; the mapping keys are numeric
        feature['properties']['Community Distrust'] = distrust_mapping.get(int(district_num), 0)

# Create a Folium map centered on New York City
map_with_districts_distrust = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with distrust data
choropleth = folium.Choropleth(
    geo_data=district_geojson,
    name='choropleth',
    data=pd.DataFrame(list(distrust_mapping.items()), columns=['Numeric Notation', 'Community Distrust']),
    columns=['Numeric Notation', 'Community Distrust'],
    key_on='feature.properties.boro_cd',
    fill_color='GnBu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Community Distrust (%) by Community District',
    nan_fill_color='#f0f0f0',  # Light grey for districts with no data
    nan_fill_opacity=0.7
).add_to(map_with_districts_distrust)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width
    }

# Add the GeoJson layer with the custom style function
folium.GeoJson(
    district_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['boro_cd'],
        aliases=['Community District:'],
        localize=True
    )
).add_to(map_with_districts_distrust)

# Save the map as an HTML file
maps_path = 'maps/'
map_with_districts_distrust.save(maps_path + 'community_distrust.html')

# Display the map
map_with_districts_distrust

### People of Color map

In [None]:
demographics_data_path = 'other_datasets/NYC Data - Total Population 2021.csv'
demographics_data = pd.read_csv(demographics_data_path)

# Apply the mapping function to the 'Community District' column
demographics_data['Numeric Notation'] = demographics_data['Community District'].apply(map_districts)

# Show the first few rows of the dataset with the new 'Numeric Notation' column
demographics_data.head()

In [None]:
for col in demographics_data.columns[1:]:
    demographics_data[col] = demographics_data[col].astype(str).str.replace(',', '').astype(int)


In [None]:
# Columns: Community District, Black, White, Hispanic or Latino, Asian and Pacific Islander,
# Combination or Another Race, Native American, Numeric Notation
# Create a new column for People of Color counts (every listed group except White)
poc_columns = ['Black', 'Hispanic or Latino', 'Asian and Pacific Islander',
               'Combination or Another Race', 'Native American']
demographics_data['People of Color'] = demographics_data[poc_columns].sum(axis=1)

demographics_data.head()

In [None]:
# Load the community district geojson file
geojson_path = 'other_datasets/Community Districts.geojson'
with open(geojson_path, 'r') as f:
    district_geojson = json.load(f)

# Extract relevant people of color data
poc_mapping = demographics_data.set_index('Numeric Notation')['People of Color'].to_dict()

# Add People of Color data to GeoJSON features
for feature in district_geojson['features']:
    district_num = feature['properties'].get('boro_cd', None)
    if district_num:
        # boro_cd may be stored as a string; the mapping keys are numeric
        feature['properties']['People of Color'] = poc_mapping.get(int(district_num), 0)

# Create a Folium map centered on New York City
map_with_districts_poc = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles='CartoDB dark_matter')


# Create the choropleth layer with poc data
choropleth = folium.Choropleth(
    geo_data=district_geojson,
    name='choropleth',
    data=pd.DataFrame(list(poc_mapping.items()), columns=['Numeric Notation', 'People of Color']),
    columns=['Numeric Notation', 'People of Color'],
    key_on='feature.properties.boro_cd',
    fill_color='GnBu', 
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='People of Color Count by Community District',
    nan_fill_color='#f0f0f0',  # Light grey for districts with no data
    nan_fill_opacity=0.7
).add_to(map_with_districts_poc)

# Define a style function for the GeoJSON layer
def style_function(feature):
    return {
        'fillOpacity': 0,
        'color': 'black',  # Set the line color
        'weight': 0.5  # Set the line width
    }

# Add the GeoJson layer with the custom style function
folium.GeoJson(
    district_geojson,
    name='geojson',
    style_function=style_function,
    tooltip=folium.GeoJsonTooltip(
        fields=['boro_cd'],
        aliases=['Community District:'],
        localize=True
    )
).add_to(map_with_districts_poc)

# Save the map as an HTML file
maps_path = 'maps/'
map_with_districts_poc.save(maps_path + 'people_of_color.html')

# Display the map
map_with_districts_poc

## __Genre. Which genre of data story did you use?__
- __Which tools did you use from each of the 3 categories of Visual Narrative (Figure 7 in Segel and Heer). Why?__
- __Which tools did you use from each of the 3 categories of Narrative Structure (Figure 7 in Segel and Heer). Why?__

We implemented several tools from the *visual narrative* section.
- For the overall visual structuring, we kept a consistent theme that combines darker shades of grey and green, a dimmer look meant to underline the severity of shooting incidents. Our establishing shot is the introduction on our website. Although we did not implement a progress bar, the scrolling layout of the website gives the reader a sense of how much of the story is left.
- For highlighting, we did not use motion or dynamic effects; instead we set apart the most important features of our data with highlight colours (brighter green for the 2020 monthly plot, more intense colours in the heatmaps).
- For transition guidance, we did not implement special transitions but kept elements within the slideshows consistent (for example, the same scaling for the maps that can be toggled).

As for the *narrative structure*, we implemented the following components:
- Ordering: we used a linear ordering so that the viewer uncovers the story by going through the graphics one by one.
- Interactivity: hovering over graph elements shows additional information such as incident counts, time (months or years), areas (precincts and districts), percentages, and demographics. Some time plots let the viewer toggle specific elements on and off to filter the data. We also added toggles to switch between plots in slideshows and between the maps showing different datasets.
- Messaging: we share our message through an accompanying article that includes an introduction and a synthesis, and we use captions and annotations throughout our graphs.

## __Visualizations.__
- Explain the visualizations you've chosen.
- Why are they right for the story you want to tell?

We chose different kinds of visualizations to convey different aspects of our dataset. Bar plots and line plots were ideal for showing trends over time periods such as years and months. When we wanted to show the number of victims without losing information about how they are distributed, we used stacked area charts that reveal details such as the percentage distribution on hover. The map plots suit data that is closely tied to areas such as precincts and community districts, and they make areas with a higher intensity of shooting incidents easy to spot. Finally, the confusion-matrix-style heatmap shows the relationship between perpetrator race and victim race. Together, these visualizations tie our narrative together across time, locations, and communities.

__Discussion.__ Think critically about your creation
- What went well?
- What is still missing? What could be improved? Why?

What went well was our initial observation of a spike in shooting incidents, prompting us to focus our analysis on understanding this trend. We utilized map graphs effectively to visualize various socio-economic factors such as poverty, unemployment, community trust, and demographic distributions across New York City. However, a significant limitation was the absence of specific crime locations due to missing data, hindering our ability to conduct a detailed spatial analysis. Additionally, incomplete data on perpetrator race limited our exploration of potential correlations between race and crime. To improve, we could seek additional datasets or data sources to enhance our analysis and provide a more comprehensive understanding of the factors influencing gun violence in New York City.

__Contributions.__ Who did what?
- You should write (just briefly) which group member was the main responsible for which elements of the assignment. (I want you guys to understand every part of the assignment, but usually there is someone who took lead role on certain portions of the work. That's what you should explain).

For this assignment, all three members worked together. While we distributed the tasks for efficiency, each of us stayed in touch with the other parts of the project.

|                     | Artemis Doumeni (s234061) | Ioannis Tselios (s233516) | Mario Medoni (s204684) |
| ------------------- | --------------- | --------------- | ------------ |
| Dataset Exploration | 15%             | 70%             | 15%          |
| Website             | 15%             | 70%             | 15%          |
| Story               | 70%             | 0%              | 30%          |
| Jupyter Notebook    | 10%             | 50%             | 40%          |
| Time series plots   | 80%             | 0%              | 20%          |
| Map plots           | 10%             | 10%             | 80%          |