Q1: How has the number of mass shootings evolved in the big US regions between two concrete years? For this, we need you to aggregate the data in the 5 regions (Southeast, Northeast, Midwest, Northwest, and Southwest), and let the user select the first and last year of the comparison. Same for states, both views coordinated.
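
As a rough illustration of the comparison the question asks for, the sketch below computes the change between a first and a last year per region with pandas. It assumes a long table with one row per region and year; the toy values and the helper name `compare_years` are hypothetical and not part of the notebook's data or code.

```python
import pandas as pd

# Hypothetical toy data: yearly incident counts per region (not the real dataset)
toy = pd.DataFrame({
    "Region": ["Midwest", "Midwest", "Midwest", "Southeast", "Southeast", "Southeast"],
    "Year": [2014, 2015, 2016, 2014, 2015, 2016],
    "Total Incidents": [10, 12, 18, 20, 15, 30],
})


def compare_years(df, first_year, last_year,
                  group_col="Region", value_col="Total Incidents"):
    """Return the first-year and last-year values and their difference per group."""
    wide = df[df["Year"].isin([first_year, last_year])].pivot_table(
        index=group_col, columns="Year", values=value_col,
        aggfunc="sum", fill_value=0,
    )
    # Guarantee both year columns exist, even if one year has no rows at all
    wide = wide.reindex(columns=[first_year, last_year], fill_value=0)
    wide["Change"] = wide[last_year] - wide[first_year]
    return wide


result = compare_years(toy, 2014, 2016)
print(result)
```

The same helper would work at state level by passing `group_col="State"`, which is one way to keep the two views consistent.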

In [1]:
# %pip install geopandas --quiet 

Ideas: Use a double slider to pick the first and last year. Use a map to select a state and, through it, the region it belongs to (the map should also encode other information, so that it does not end up with a poor data-ink ratio). Use a line chart to show the evolution, since this is time-series data. Place two plots next to each other, since comparing region-level and state-level data directly does not make sense. Both line plots should be coordinated: selecting a year in one should highlight the corresponding point, with its tooltip, in the other.

LINEPLOT xxx STATEMAP

LINEPLOT xxx &LEGEND

Need to determine what is the default view, i.e. what to show when the user opens the page and has not selected a state/region.
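
One possible default, sketched here in pandas rather than as a Vega expression, is to always keep the nationwide series and add the state and region series only once something is selected. This mirrors the filter used on the line chart later in the notebook. The helper name `visible_series` and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows, one per series (values are illustrative only)
toy = pd.DataFrame({
    "StateOrRegion": ["Alabama", "Southeast", "Texas", "United States"],
    "Year": [2014, 2014, 2014, 2014],
    "Count per 100k": [0.04, 0.10, 0.08, 0.09],
})


def visible_series(df, state=None, region=None):
    """Keep the selected state, the selected region, and always the US total."""
    keep = {"United States"}
    if state is not None:
        keep.add(state)
    if region is not None:
        keep.add(region)
    return df[df["StateOrRegion"].isin(keep)]


default_view = visible_series(toy)  # nothing selected yet
selected_view = visible_series(toy, state="Alabama", region="Southeast")
print(default_view)
print(selected_view)
```

With this choice the page never opens on an empty chart: the national line is the baseline, and a click on the map adds the two series to compare against it.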


In [2]:
import altair as alt
import pandas as pd

# Load the data
data_shootings = pd.read_csv('data/gun_violence_processed.csv')

state_data = pd.read_csv('data/state_data.csv').rename(columns={'state': 'State'})

# Add extra state information to the shootings data
data_shootings = data_shootings.merge(state_data, on='State')

data_shootings['Incident Date'] = pd.to_datetime(data_shootings['Incident Date'])
data_shootings['Year'] = data_shootings['Incident Date'].dt.year
data_shootings['Month'] = data_shootings['Incident Date'].dt.month

data_shootings.loc[:,'count'] = 1  # Add a count column for aggregation


# group by state and year
state_df = data_shootings.groupby(['State', 'Year']).agg({
    'region': 'first',
    'Victims Killed': 'sum',
    'Victims Injured': 'sum',
    'Population_per_state_2023': 'first',
    'count': 'sum',
    'FIPS': 'first'
}).rename(columns={
    'region': 'Region',
    'Victims Killed': 'Total Victims Killed',
    'Victims Injured': 'Total Victims Injured',
    'Population_per_state_2023': 'Population',
    'count': 'Total Incidents'
}).reset_index()

# fill empty years with 0
all_years = state_df['Year'].unique()
all_states = state_df['State'].unique()

for state in all_states:
    pop = state_df[state_df['State'] == state]['Population'].values[0]
    region = state_df[state_df['State'] == state]['Region'].values[0]
    fips = state_df[state_df['State'] == state]['FIPS'].values[0]
    current_years = state_df[state_df['State'] == state]['Year'].values
    for year in all_years:
        if year not in current_years:
            state_df = pd.concat([state_df, pd.DataFrame([{
                'State': state,
                'Year': year,
                'Region': region,
                'Total Victims Killed': 0,
                'Total Victims Injured': 0,
                'Population': pop,
                'Total Incidents': 0,
                'FIPS': fips
            }])], ignore_index=True)
        
# dataset used for the choropleth map
state_statistics_df = state_df.groupby(['State']).agg({
    'Region': 'first',
    'Total Victims Killed': 'sum',
    'Total Victims Injured': 'sum',
    'Population': 'first',
    'Total Incidents': 'sum',
    'FIPS': 'first'
}).reset_index()


region_df = state_df.groupby(['Region', 'Year']).agg({
    'Total Victims Killed': 'sum',
    'Total Victims Injured': 'sum',
    'Population': 'sum',
    'Total Incidents': 'sum'
}).reset_index()

state_df["Type"] = "State"
region_df["Type"] = "Region"

state_df.rename(columns={"State": "StateOrRegion"}, inplace=True)
region_df.rename(columns={"Region": "StateOrRegion"}, inplace=True)


state_region_df = pd.concat([state_df, region_df])

# add the nationwide totals for the US (turned into a per-100k rate below)
us_df = state_df.groupby(['Year']).agg({
    'Total Victims Killed': 'sum',
    'Total Victims Injured': 'sum',
    'Population': 'sum',
    'Total Incidents': 'sum'
}).reset_index()

us_df['StateOrRegion'] = 'United States'
us_df['Type'] = 'Country'

state_region_df = pd.concat([state_region_df, us_df])

state_region_df['Count per 100k'] = state_region_df['Total Incidents'] / state_region_df['Population'] * 100_000


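# FIPS holds a county code (state code * 1000 + county code); integer division keeps the state part
# and, unlike string slicing, also works if the column was read as float
state_statistics_df['StateFIPS'] = (state_statistics_df['FIPS'] // 1000).astype(int)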

state_statistics_df['Count per 100k'] = state_statistics_df['Total Incidents'] / state_statistics_df['Population'] * 100_000


# TODO: drop useless columns
state_region_df

Unnamed: 0,StateOrRegion,Year,Region,Total Victims Killed,Total Victims Injured,Population,Total Incidents,FIPS,Type,Count per 100k
0,Alabama,2014,Southeast,1,9,5108468,2,1039.0,State,0.039151
1,Alabama,2015,Southeast,4,13,5108468,4,1097.0,State,0.078301
2,Alabama,2016,Southeast,15,53,5108468,15,1045.0,State,0.293630
3,Alabama,2017,Southeast,6,38,5108468,7,1109.0,State,0.137027
4,Alabama,2018,Southeast,11,53,5108468,14,1091.0,State,0.274055
...,...,...,...,...,...,...,...,...,...,...
5,United States,2019,,428,1690,334914895,414,,Country,0.123613
6,United States,2020,,495,2526,334914895,611,,Country,0.182434
7,United States,2021,,668,2784,334914895,689,,Country,0.205724
8,United States,2022,,642,2647,334914895,644,,Country,0.192288
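
The zero-filling loop in the cell above could also be written without repeated `pd.concat` calls. The sketch below is one alternative, assuming every state has at least one observed year from which its static attributes can be copied. The second state, its region, and its population are hypothetical placeholders.

```python
import pandas as pd

# Toy data: "State B" has no row for 2014 (its values are hypothetical)
toy = pd.DataFrame({
    "State": ["Alabama", "Alabama", "State B"],
    "Year": [2014, 2015, 2015],
    "Region": ["Southeast", "Southeast", "Hypothetical region"],
    "Population": [5108468, 5108468, 1_000_000],
    "Total Incidents": [2, 4, 1],
})

count_cols = ["Total Incidents"]        # filled with 0 for missing years
static_cols = ["Region", "Population"]  # constant within a state

# Build every (State, Year) combination and align the data to it
full_index = pd.MultiIndex.from_product(
    [toy["State"].unique(), sorted(toy["Year"].unique())],
    names=["State", "Year"],
)
filled = toy.set_index(["State", "Year"]).reindex(full_index)

# Missing years mean zero incidents
filled[count_cols] = filled[count_cols].fillna(0).astype(int)

# Copy each state's static attributes from its first non-missing row
filled[static_cols] = filled.groupby(level="State")[static_cols].transform("first")

filled = filled.reset_index()
print(filled)
```

Whether this is worth adopting depends on the data size; for about fifty states and ten years the original loop is fast enough, so the gain is mostly in readability.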


In [3]:
from vega_datasets import data as vega_data

# ======== Chart dimensions ========
MAP_DIM = (500, 300)
LINE_CHART_DIM = (600, 400)

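# Point selections for the clicked state and its region.
# empty=False: nothing matches until the user clicks (replaces the deprecated empty='none')
state_selection = alt.selection_point(fields=['State'], name='SelectState', empty=False)
region_selection = alt.selection_point(fields=['Region'], name='SelectRegion', empty=False)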

year_selection = alt.selection_interval(fields=['Year'], encodings=['x'], translate=False)

# Load US states topojson
states_topo = alt.topo_feature(vega_data.us_10m.url, feature='states')

# Create the choropleth map
state_map = alt.Chart(states_topo).mark_geoshape(
    stroke='white',
    strokeWidth=1
).encode(
    color=alt.Color(
        'Region:N',
        scale=alt.Scale(
            domain=['Southeast', 'Northeast', 'Midwest', 'Northwest', 'Southwest']
        ),
        legend=alt.Legend(orient='left')
    ),
    opacity=alt.condition(
        state_selection,
        alt.value(1),
        alt.value(0.8)
    ),
    tooltip=['State:N', 'Region:N']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(state_statistics_df, 'StateFIPS', ['State', 'Region'])
).add_params(
    state_selection, region_selection
).properties(
    width=MAP_DIM[0],
    height=MAP_DIM[1],
).project(
    type='albersUsa'
)

# generate a dataframe with all the possible titles to make the title dynamic
title_df = state_data[["State", "region"]].copy()
title_df["title"] = title_df.apply(
    lambda x: f"Gun Violence evolution in {x['State']} and {x['region']}",
    axis=1
)

# add a base title where state and region are not selected
title_df = pd.concat([title_df, pd.DataFrame([{
    'State': pd.NA,
    'region': pd.NA,
    'title': 'Gun Violence evolution in the United States'
}])])

line_chart_title = alt.Chart(title_df).mark_text(
    align='left',
    fontSize=13,
    fontWeight='bold',
).encode(
    text=alt.condition(
        "datum.State == SelectState.State && datum.region == SelectRegion.Region",
        'title:N',
        alt.value('')
    )
)

line_chart = alt.Chart(state_region_df).mark_line(point=True).encode(
    x=alt.X('Year:O', title='', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('Count per 100k:Q'),
    color=alt.Color(
        'StateOrRegion:N',
        title='State',
        scale=alt.Scale(scheme='category20'),
        sort=alt.Sort(field='Type', order='ascending')
    ),
    tooltip=['Year:O', 'Count per 100k:Q']
).transform_filter(
    "datum.StateOrRegion == SelectState.State || datum.StateOrRegion == SelectRegion.Region || datum.StateOrRegion == 'United States'"
).transform_filter(
    year_selection
).properties(
    width=LINE_CHART_DIM[0],
    height=LINE_CHART_DIM[1] - 100
)


party_data = pd.DataFrame({
    'Year': [2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
    'Party': ['Democrat', 'Democrat', 'Democrat', 'Republican', 'Republican', 'Republican', 'Republican', 'Democrat', 'Democrat', 'Democrat']
})

year_selector = alt.Chart(party_data).mark_rect().encode(
    x=alt.X('Year:O', title='', axis=None),
    color=alt.Color(
        'Party:N',
        title='Party in power',
        scale=alt.Scale(
            domain=['Democrat', 'Republican'],
            range=['blue', 'red']
        ),
        legend=alt.Legend(orient='left')
    ),
    opacity=alt.condition(
        year_selection,
        alt.value(0.8),
        alt.value(0.4)
    ),
    tooltip=['Party']
).properties(
    width=MAP_DIM[0],
    height=15
).add_params(
    year_selection
)

year_text = alt.Chart(party_data).mark_text(
    align='center',
    baseline='middle',
    dy=1,
    fontWeight='bold',
).encode(
    x=alt.X('Year:O'),
    text=alt.Text('Year:O')
)

final_chart = alt.hconcat(
    alt.vconcat(
        state_map,
        year_selector + year_text
    ).resolve_scale(
        color='independent',
    ),
    alt.vconcat(
        line_chart_title,
        line_chart
    )
).resolve_scale(
    color='independent'
).configure_title(
    fontSize=16
)

final_chart

For some states the labels overlap, so we keep the legend as before.

In [8]:
# Shade the line chart background according to the political party in power
RDplot = alt.Chart(party_data).mark_rect(opacity=0.2).encode(
    x=alt.X('Year:O', title='', scale=alt.Scale(padding=0)),
    color=alt.Color(
        'Party:N',
        scale=alt.Scale(domain=['Democrat', 'Republican'], range=['blue', 'red']),
        legend=None
    )
).transform_filter(
    year_selection
)

# Combine the line chart with the background color
RDplot = (line_chart + RDplot).resolve_scale(color='independent')


final_chart = alt.hconcat(
    alt.vconcat(
        state_map,
        year_selector + year_text
    ).resolve_scale(
        color='independent',
    ),
    alt.vconcat(
        line_chart_title,
        RDplot
    )
).resolve_scale(
    color='independent'
).configure_title(
    fontSize=16
)

final_chart

Too much clutter: the shaded background adds ink without adding data, so the data-ink ratio is poor.

In [9]:
# Color the points of the line chart according to the political party in power
RDplot = alt.Chart(state_region_df).mark_point(filled=True).encode(
    x=alt.X('Year:O', title='', axis=alt.Axis(labelAngle=0), scale=alt.Scale(padding=0)),
    y=alt.Y('Count per 100k:Q'),
    color=alt.Color(
        'Party:N',
        scale=alt.Scale(domain=['Democrat', 'Republican'], range=['blue', 'red']),
        legend=None
    ),
    size=alt.value(40),
    tooltip=['Year:O', 'Count per 100k:Q', 'Party:N']
).transform_lookup(
    lookup='Year',
    from_=alt.LookupData(party_data, 'Year', ['Party'])
).transform_filter(
    "datum.StateOrRegion == SelectState.State || datum.StateOrRegion == SelectRegion.Region || datum.StateOrRegion == 'United States'"
).transform_filter(
    year_selection
).properties(
    width=LINE_CHART_DIM[0],
    height=LINE_CHART_DIM[1] - 100
)

# Overlay the party-colored points on the line chart
RDplot = (line_chart + RDplot).resolve_scale(color='independent')


final_chart = alt.hconcat(
    alt.vconcat(
        state_map,
        year_selector + year_text
    ).resolve_scale(
        color='independent',
    ),
    alt.vconcat(
        line_chart_title,
        RDplot
    )
).resolve_scale(
    color='independent'
).configure_title(
    fontSize=16
)

final_chart

Things to improve:
- Decide where to put the titles: a general title centered above the whole view, plus a title for each line plot.
- Add extra information to the line plot, as we did in the previous project.