# NYPD Shooting Incident Data Visualization

## Summary
This is a visualization of New York City shooting incidents, based on historical data on shooting incidents across the boroughs from 2006 to 2021. The visualization shows the borough, time of day, and victim data for the incidents. By interacting with it, users should be able to answer what should be taken into account when trying to predict shooting incidents in the NYC area.

The target audiences for this visualization are executive decision-making teams for police force planning, as well as others who work in the public safety domain.

The FDS method was used to design the visualization, which combines a bar chart, a radial chart, and a line chart. Selection and filtering interactions let users drill into the data and connect related views, which helps them build deeper insights. The "Tableau 10" color scheme is used consistently for the "BORO" (borough) incident counts. Selecting a bar in a bar chart lowers the opacity of the other bars, highlighting the selected data in the view. Tooltips, labels, and legends were added to improve readability and comprehension.


## About the Dataset

<b>Dataset of shooting incidents in NYC:</b>
<br> This dataset was created to analyze shooting incidents that occurred in NYC from 2006 to 2021.
It covers every recorded shooting incident in that period and was created from [NYPD Shooting Incident Data](https://data.cityofnewyork.us/api/views/833y-fsy8/rows.csv?accessType=DOWNLOAD) from DATA.GOV and [Wikipedia](https://en.wikipedia.org/wiki/Boroughs_of_New_York_City).

## Content
This dataset contains two files, one for the incident information (NYPD_Shooting_Incident.csv) and another for the BORO population (NY_Population.csv).

The <b>NYPD_Shooting_Incident.csv</b> file covers all five boroughs and has four columns:
- Hour_of_Day: hour of the day when the incident occurred
- BORO: Borough name
- Age: Victim's age group
- Count_of_Incident: Number of incidents

The <b>NY_Population.csv</b> file contains the population of each of the five boroughs from the 2020 census:
- Borough: Borough names
- Population: Borough population
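
As a quick sanity check on the layout described above, the sketch below parses a few rows in plain Python and verifies that the expected columns are present. The sample rows are the Bronx, hour 0 rows of the incident file, with only the four documented columns included; the helper name `check_columns` is made up for illustration and is not part of the notebook.

```python
import csv
import io

EXPECTED_COLUMNS = {"Hour_of_Day", "BORO", "Age", "Count_of_Incident"}

# Sample rows: the Bronx, hour 0 rows of NYPD_Shooting_Incident.csv,
# restricted to the four documented columns.
SAMPLE_CSV = """Hour_of_Day,BORO,Age,Count_of_Incident
0,BRONX,<18,91
0,BRONX,18-24,261
0,BRONX,25-44,269
0,BRONX,45-64,21
0,BRONX,65+,2
"""


def check_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Parse CSV text and return its rows; raise ValueError if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = expected - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        # Convert the two numeric columns from text to integers.
        row["Hour_of_Day"] = int(row["Hour_of_Day"])
        row["Count_of_Incident"] = int(row["Count_of_Incident"])
        rows.append(row)
    return rows


rows = check_columns(SAMPLE_CSV)
print(len(rows), sum(r["Count_of_Incident"] for r in rows))
```

The same check could be run on the full file by passing its text to `check_columns`.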


## Conclusion
Area incident rates should be compared per capita rather than as raw incident counts. For example, Brooklyn has the most incidents, but the Bronx, despite its smaller population, has the highest incident rate per million residents. Other factors, such as income and education levels, likely also help explain the higher incident rate, although they are not covered in this report.
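
To make the per-capita normalization concrete, here is a minimal sketch of the rate calculation. The function name `incident_rate` and its `per` parameter are illustrative, not taken from the notebook. The population is the Bronx figure from the 2020 census file; the count of 644 is only the Bronx midnight-hour slice of the incident file, not the borough total.

```python
def incident_rate(count, population, per=1_000_000):
    """Return incidents per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * per


bronx_population = 1_472_654
# Bronx incidents at hour 0, summed over the five age groups.
bronx_midnight_incidents = 91 + 261 + 269 + 21 + 2

per_million = incident_rate(bronx_midnight_incidents, bronx_population)
per_thousand = incident_rate(bronx_midnight_incidents, bronx_population, per=1_000)
print(round(per_million), round(per_thousand, 3))
```

The unit only rescales the numbers, so ranking boroughs by incidents per thousand or per million gives the same order.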

Incident frequency varies strongly with both time of day and month of year. Incidents occur least frequently around 9 AM, then rise steadily until they peak at midnight. Seasonally, incidents occur least frequently during the holidays.

Finally, we found a strong correlation between the unemployment rate and shootings, so this economic trend can be used as a predictor of increased violence in New York City.


In [2]:
import altair as alt
import pandas as pd
import numpy as np

In [3]:
data = pd.read_csv('NYPD_Shooting_Incident.csv')
population = pd.read_csv('NY_Population.csv')

## Load the files and inspect the data

<b>NYPD_Shooting_Incident.csv</b>

In [35]:
data.head() 

Unnamed: 0,Hour_of_Day,BORO,Age,Count_of_Incident
0,0,BRONX,<18,91
1,0,BRONX,18-24,261
2,0,BRONX,25-44,269
3,0,BRONX,45-64,21
4,0,BRONX,65+,2


<b>NY_Population.csv</b>

In [33]:
population['Borough'] = population['Borough'].str.upper()

In [34]:
population.head()

Unnamed: 0,Borough,Population
0,BRONX,1472654
1,BROOKLYN,2736074
2,MANHATTAN,1694263
3,QUEENS,2405464
4,STATEN ISLAND,495747
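
The upper-casing step above matters because a join on borough name is an exact string match, and the incident file spells boroughs in capitals. The sketch below illustrates the idea in plain Python. The mixed-case spellings in `population_raw` are an assumption about how the source table wrote the names, and `normalize_borough` is a hypothetical helper.

```python
def normalize_borough(name):
    """Trim whitespace and upper-case a name so it matches the BORO column."""
    return name.strip().upper()


# Populations from the 2020 census file; the mixed-case keys are assumed.
population_raw = {
    "Bronx": 1472654,
    "Brooklyn": 2736074,
    "Manhattan": 1694263,
    "Queens": 2405464,
    "Staten Island": 495747,
}
incident_boros = ["BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"]

population_by_boro = {normalize_borough(k): v for k, v in population_raw.items()}

# Boroughs from the incident file with no population match, before and after.
unmatched_before = [b for b in incident_boros if b not in population_raw]
unmatched_after = [b for b in incident_boros if b not in population_by_boro]
print(unmatched_before)
print(unmatched_after)
```

Checking for unmatched keys before joining is a cheap way to catch spelling differences that would otherwise show up as missing population values.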


## Incident by Time of Day
Analyze whether time of day is a factor in incidents. Use the dropdown list to highlight a specific borough.
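
The bar heights in this section's chart come from summing `Count_of_Incident` over boroughs and age groups for each hour. A plain-Python sketch of that aggregation follows. The counts are invented for illustration and are not values from the dataset; `hourly_totals` is a hypothetical helper.

```python
from collections import defaultdict

# (Hour_of_Day, BORO, Age, Count_of_Incident): made-up counts, illustration only.
records = [
    (0, "BRONX", "18-24", 30),
    (0, "QUEENS", "25-44", 20),
    (9, "BRONX", "18-24", 3),
    (9, "QUEENS", "25-44", 2),
    (23, "BRONX", "18-24", 25),
    (23, "QUEENS", "25-44", 15),
]


def hourly_totals(rows, boro=None):
    """Sum counts per hour, optionally restricted to one borough."""
    totals = defaultdict(int)
    for hour, row_boro, _age, count in rows:
        if boro is None or row_boro == boro:
            totals[hour] += count
    return dict(totals)


totals = hourly_totals(records)
quietest_hour = min(totals, key=totals.get)
busiest_hour = max(totals, key=totals.get)
print(totals, quietest_hour, busiest_hour)
```

Passing `boro="BRONX"` corresponds to reading only the highlighted segments of the stacked bars after choosing a borough in the dropdown.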

In [92]:
dropdown = alt.binding_select(options=data['BORO'].unique(), name='Select a BORO:')
selection = alt.selection(type='single', fields=['BORO'], bind=dropdown)

Hour_of_Day = alt.Chart(data).mark_bar().encode(
    x=alt.X('Hour_of_Day:O', title='Hour of Day'),
    y=alt.Y('sum(Count_of_Incident):Q', title='Count of Incident'),
    color=alt.Color('BORO', scale=alt.Scale(scheme='tableau10')),
    #tooltip=['BORO', 'Hour_of_Day:O', alt.Tooltip('sum(Count_of_Incident):Q', title='Count of Incident')],
    opacity=alt.condition(selection, alt.value(1), alt.value(.2)),
).add_selection(selection)
Hour_of_Day

In [181]:
selection = alt.selection(type='multi', fields=['BORO'])

base_boro =  alt.Chart(data).properties(width=300, height=250)
BORO = base_boro.mark_bar().encode(
    x = 'BORO',
    y = alt.Y('sum(Count_of_Incident):Q', title='Count of Incident'),
    color = alt.Color('BORO', scale=alt.Scale(scheme='tableau10')),
    opacity=alt.condition(selection,alt.value(1),alt.value(.2)),
    tooltip=['BORO', alt.Tooltip('sum(Count_of_Incident):Q', title= 'Total Incident')]
).add_selection(selection)


base_age = alt.Chart(data).encode(
    theta=alt.Theta("sum(Count_of_Incident):Q", stack=True, type="quantitative"),
    radius=alt.Radius("sum(Count_of_Incident)", scale=alt.Scale(type="sqrt", zero=False, rangeMax=110)),
    color = alt.Color('Age', scale=alt.Scale(scheme='orangered'),sort=["sum(Count_of_Incident):Q"]),
    tooltip=[alt.Tooltip('Age', title= 'Age Group'), alt.Tooltip('sum(Count_of_Incident):Q', title= 'Total Incident')],
)

age_pie = base_age.mark_arc(innerRadius=10, stroke="#fff")


Age=age_pie.transform_filter(selection)



In [182]:
# Total incidents per borough; sum only the count column so that
# Hour_of_Day and Age are not aggregated by accident.
boro_sum = data.groupby('BORO', as_index=False)['Count_of_Incident'].sum()


In [183]:
data_percentage = boro_sum.join(population.set_index('Borough'), on='BORO')
data_percentage['Incident_per_Million'] = round(data_percentage.Count_of_Incident / data_percentage.Population *1000000,0)
#print(data_percentage)

In [184]:
base_percent = alt.Chart(data_percentage).properties(width=300, height=250)
incident_population_percentage = base_percent.mark_line(point=True, color="#FFAA00").encode(
    x = 'BORO',
    y = alt.Y(
        'Incident_per_Million:Q', 
        title='Incident_per_Million',
    ),
    color=alt.value("#000000"),
    #opacity=alt.condition(selection,alt.value(1),alt.value(.2)),
    tooltip=['BORO', alt.Tooltip('Incident_per_Million:Q', title= 'Incident_per_Million')]
)

## Incidents by Borough
The incidents-per-million line is layered on top of the total incident count by borough; the plot below shows that the incident count does not simply increase with population.
Select a borough to filter the shooting incident counts by victim age group.
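
What the selection does to the age chart can be sketched without Altair: keep only the rows for the selected boroughs, then sum the counts per age group. The Bronx rows below are the hour 0 rows of the incident file; the Brooklyn row is invented to show the filter, and `age_totals` is a hypothetical helper. An empty selection is treated here as "all boroughs", which is an assumption of this sketch.

```python
from collections import Counter

# (BORO, Age, Count_of_Incident)
rows = [
    ("BRONX", "<18", 91),
    ("BRONX", "18-24", 261),
    ("BRONX", "25-44", 269),
    ("BRONX", "45-64", 21),
    ("BRONX", "65+", 2),
    ("BROOKLYN", "25-44", 100),  # invented row, for illustration only
]


def age_totals(rows, selected_boros=()):
    """Sum counts per age group for the selected boroughs (all if empty)."""
    totals = Counter()
    for boro, age, count in rows:
        if not selected_boros or boro in selected_boros:
            totals[age] += count
    return dict(totals)


bronx = age_totals(rows, {"BRONX"})
largest_group = max(bronx, key=bronx.get)
share = bronx[largest_group] / sum(bronx.values())
print(largest_group, round(share, 3))
```

Each arc in the radial chart is sized by one of these per-group totals, so changing the selection changes the arcs.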

In [185]:
BORO_combine = BORO+incident_population_percentage
alt.hconcat(BORO_combine,Age).resolve_scale(color='independent')

## Evaluation
A think-aloud study was performed for this visualization.

<b>Objective</b>:
The goal of this think-aloud study was to obtain direct feedback from users, understand their thought processes, and assess their ability to discover patterns and trends in the data visualization related to shooting incidents in New York City.

<b>Participants</b>
Three family members (B1, B2, B3) participated in the study.

<b>Introduction</b>
Participants were given a 10-15 minute introduction to the purpose of the study, the data used in the visualization, and the think-aloud method. They were encouraged to share their feelings and questions.

<b>Questions</b>
Participants were asked to answer the following questions while thinking aloud:
- Do you notice any trends?
- Which borough do you think has the worst shooting incident situation?
- Which age group has the most victims?

<b>Findings</b>
- All three participants quickly noticed that time of day matters for shootings in NYC. They saw that the fewest incidents happen around 9 in the morning and that incidents peak around midnight.
- B1 and B2 figured out that Brooklyn has the most shootings, but they also realized that this doesn't make it the worst borough, because population was not taken into account. When they checked the incidents per million, they saw that the Bronx was actually the highest. B3 had a hard time seeing this because it wasn't clear that the line chart showed the incidents-per-million rate, and they didn't check the tooltips.
- All three participants quickly figured out that victims were most frequently in the 25-44 age group, and they selected different boroughs in the chart to confirm the conclusion.

<b>After think-aloud session feedback</b>: 
- Change chart titles to be more descriptive
- Add labels for the radial chart for better readability

<b>Conclusion</b>:
The overall design of the visualization worked well. Participants were able to identify patterns and trends in the NYC shooting incident visualization and build understanding through interactions and tasks. In general, people can quickly and intuitively interact with the data. However, some parts of the visualization could be improved. Besides more descriptive titles and additional labels, we could also plot the incident count by borough and the incidents per million as separate charts.

<b>Implement Changes</b>: 
- Add titles for each chart
- Plot Incident by Borough and Incident per Million separately
- Add Age Group labels for Count of Incidents by Victim Age Group chart

## Revised Charts
The charts below were revised according to the feedback.

In [216]:
dropdown = alt.binding_select(options=data['BORO'].unique(), name='Select a BORO:')
selection = alt.selection(type='single', fields=['BORO'], bind=dropdown)

Hour_of_Day = alt.Chart(data).mark_bar().encode(
    x=alt.X('Hour_of_Day:O', title='Hour of Day'),
    y=alt.Y('sum(Count_of_Incident):Q', title='Count of Incident'),
    color=alt.Color('BORO', scale=alt.Scale(scheme='tableau10')),
    #tooltip=['BORO', 'Hour_of_Day:O', alt.Tooltip('sum(Count_of_Incident):Q', title='Count of Incident')],
    opacity=alt.condition(selection, alt.value(1), alt.value(.2)),
).add_selection(selection).properties(title='Count of Incident by Hour of Day')
Hour_of_Day

In [210]:
selection = alt.selection(type='multi', fields=['BORO'])

base_boro =  alt.Chart(data).properties(width=300, height=250)
BORO = base_boro.mark_bar().encode(
    x = 'BORO',
    y = alt.Y('sum(Count_of_Incident):Q', title='Count of Incident'),
    color = alt.Color('BORO', scale=alt.Scale(scheme='tableau10')),
    opacity=alt.condition(selection,alt.value(1),alt.value(.2)),
    tooltip=['BORO', alt.Tooltip('sum(Count_of_Incident):Q', title= 'Total Incident')]
).add_selection(selection).properties(title='Count of Incident by Borough')


base_age = alt.Chart(data).encode(
    theta=alt.Theta("sum(Count_of_Incident):Q", stack=True, type="quantitative"),
    radius=alt.Radius("sum(Count_of_Incident)", scale=alt.Scale(type="sqrt", zero=False, rangeMax=110)),
    color = alt.Color('Age', scale=alt.Scale(scheme='orangered'),sort=["sum(Count_of_Incident):Q"]),
    tooltip=[alt.Tooltip('Age', title= 'Age Group'), alt.Tooltip('sum(Count_of_Incident):Q', title= 'Total Incident')],
)

age_pie = base_age.mark_arc(innerRadius=10, stroke="#fff").properties(title='Count of Incident by Victim Age Group')
Age_label= age_pie.mark_text(radiusOffset=10).encode(text="Age:N", color=alt.value("#000000"))



In [211]:
Age = age_pie + Age_label

In [212]:
base_percent = alt.Chart(data_percentage).properties(width=300, height=250)
incident_population_percentage = base_percent.mark_line(point=True, color="#FFAA00").encode(
    x = 'BORO',
    y = alt.Y(
        'Incident_per_Million:Q', 
        title='Incident per Million',
    ),
    color=alt.value("#000000"),
    #opacity=alt.condition(selection,alt.value(1),alt.value(.2)),
    tooltip=['BORO', alt.Tooltip('Incident_per_Million:Q', title= 'Incident per Million')]
).properties(title='Incident per Million Population')

In [213]:

Age=Age.transform_filter(selection)

In [214]:
alt.hconcat(BORO,incident_population_percentage,Age).resolve_scale(color='independent')