# CSE 163 Final Project: U.S. Hate Crime Analysis

To set up this project, unzip "HATECRIMES163.zip" and open "Hate Crime Project.ipynb" in your preferred workspace (this project was created in JetBrains Datalore). If you do not have the .zip file, download the hate crimes dataset (https://www.kaggle.com/louissebye/united-states-hate-crimes-19912017?select=hate_crime.csv) and the states dataset (https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html). For the states dataset, we downloaded "cb_2018_us_state_500k.zip." Be sure to add both datasets to your workspace.

To run this project, you will need pandas (imported as pd), plotly.express (as px), geopandas (as gpd), and plotly.graph_objects (as go). For the testing cells, add cse163_utils.py from the .zip to the workspace and run from cse163_utils import assert_equals. Run the code cells one at a time from top to bottom, or use your workspace's "run all" option.
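
Before running the cells, it can help to confirm that the files are where the notebook expects them. The sketch below is only an illustration: `missing_files` and `REQUIRED` are hypothetical names that are not part of the project, and the workspace folder is an assumption (the notebook itself reads the CSV from /data/workspace_files/).

```python
from pathlib import Path

# File names taken from the setup notes; adjust if your copies are named differently.
REQUIRED = ["hate_crime.csv", "cb_2018_us_state_500k.shp", "cse163_utils.py"]


def missing_files(workspace, required=REQUIRED):
    """Return the required file names that are not present in the workspace folder."""
    root = Path(workspace)
    return [name for name in required if not (root / name).exists()]


# "." is an assumed workspace location; an empty list means nothing is missing.
print(missing_files("."))
```

Shapefile readers generally expect the sidecar files from the same archive (such as .shx and .dbf) to sit next to the .shp file, so it is safest to unzip the whole states archive into the workspace.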

### Import necessary libraries

In [1]:
import pandas as pd
import plotly.express as px
import geopandas as gpd
import plotly.graph_objects as go
from cse163_utils import assert_equals

### Read in hate crimes dataframe

In [2]:
df = pd.read_csv('/data/workspace_files/hate_crime.csv')
df.head()


Unnamed: 0,INCIDENT_ID,DATA_YEAR,ORI,PUB_AGENCY_NAME,PUB_AGENCY_UNIT,AGENCY_TYPE_NAME,STATE_ABBR,STATE_NAME,DIVISION_NAME,REGION_NAME,...,OFFENDER_RACE,OFFENDER_ETHNICITY,VICTIM_COUNT,OFFENSE_NAME,TOTAL_INDIVIDUAL_VICTIMS,LOCATION_NAME,BIAS_DESC,VICTIM_TYPES,MULTIPLE_OFFENSE,MULTIPLE_BIAS
0,3015,1991,AR0040200,Rogers,,City,AR,Arkansas,West South Central,South,...,White,,1,Intimidation,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-Black or African American,Individual,S,S
1,3016,1991,AR0290100,Hope,,City,AR,Arkansas,West South Central,South,...,Black or African American,,1,Simple Assault,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-White,Individual,S,S
2,43,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,...,Black or African American,,1,Aggravated Assault,1.0,Residence/Home,Anti-Black or African American,Individual,S,S
3,44,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,...,Black or African American,,2,Aggravated Assault;Destruction/Damage/Vandalis...,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-White,Individual,M,S
4,3017,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,...,Black or African American,,1,Aggravated Assault,1.0,Service/Gas Station,Anti-White,Individual,S,S


# How has the number of hate crime incidents per year changed from 1991-2018?

### Create series counting incidents per year using pandas

In [3]:
incidences_per_year = df.groupby('DATA_YEAR')['INCIDENT_ID'].count()
incidences_per_year

DATA_YEAR
1991    4589
1992    6667
1993    7608
1994    5954
1995    7950
1996    8790
1997    8107
1998    7902
1999    7943
2000    8219
2001    9730
2002    7485
2003    7545
2004    7685
2005    7411
2006    7716
2007    7625
2008    8039
2009    6613
2010    6630
2011    6300
2012    6594
2013    6044
2014    5599
2015    5879
2016    6268
2017    7317
2018    7194
Name: INCIDENT_ID, dtype: int64

### Use plotly to create line graph visualization

In [4]:
fig = px.line(incidences_per_year, y="INCIDENT_ID", title='Number of Reported Hate Crime Incidents in the United States Over Time',
               labels={
                     "INCIDENT_ID": "Number of Incidents",
                     "DATA_YEAR": "Year",
                 }
            )
fig.show()


### Testing incident accuracy for three years ("None" indicates no difference)

In [12]:
print(assert_equals(9730, incidences_per_year.loc[2001]))
print(assert_equals(5599, incidences_per_year.loc[2014]))
print(assert_equals(5954, incidences_per_year.loc[1994]))

None
None
None
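
Each test line prints "None" because an assert-style helper returns nothing when the check passes, so `print` shows the implicit return value. The sketch below illustrates that behavior with a hypothetical stand-in named `check_equal`; it is not the code in cse163_utils, and the `tol` parameter for floats is an assumption made for illustration.

```python
import math


def check_equal(expected, received, tol=0.001):
    """Hypothetical stand-in for an assert-style test helper (not cse163_utils)."""
    if isinstance(expected, float) or isinstance(received, float):
        # Assumed behavior: floats are compared within a small absolute tolerance.
        ok = math.isclose(expected, received, abs_tol=tol)
    else:
        ok = expected == received
    assert ok, f"Expected {expected!r}, but received {received!r}"
    # No return statement: a passing check evaluates to None.


print(check_equal(9730, 9730))  # prints None
```

A failing check raises an `AssertionError` instead of printing anything, which is why a column of "None" values means every comparison passed.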


# What groups are the most targeted?

### Create series counting incidents per group using pandas

In [15]:
crimes_per_group = df.groupby('BIAS_DESC')['INCIDENT_ID'].count()
crimes_per_group

BIAS_DESC
Anti-American Indian or Alaska Native                                    2160
Anti-American Indian or Alaska Native;Anti-Asian                            1
Anti-American Indian or Alaska Native;Anti-Black or African American        4
Anti-American Indian or Alaska Native;Anti-Hispanic or Latino               2
Anti-American Indian or Alaska Native;Anti-Islamic (Muslim)                 1
                                                                        ...  
Anti-Sikh                                                                  81
Anti-Transgender                                                          516
Anti-Transgender;Anti-White                                                 2
Anti-White                                                              23345
Unknown (offender's motivation not known)                                   1
Name: INCIDENT_ID, Length: 144, dtype: int64
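
The listing above includes combined labels such as "Anti-Transgender;Anti-White", so looking up a single label by exact name counts only incidents recorded with that one bias. As a rough sketch of one alternative (not what this notebook does), the code below splits `BIAS_DESC` on ";" so that each incident contributes one row per bias. The toy rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the hate crimes dataframe; the values are illustrative only.
toy = pd.DataFrame({
    "INCIDENT_ID": [1, 2, 3, 4],
    "BIAS_DESC": [
        "Anti-Asian",
        "Anti-Asian;Anti-White",
        "Anti-White",
        "Anti-Transgender;Anti-White",
    ],
})

# Split combined labels on ';' and give each bias its own row before counting.
per_bias = (
    toy.assign(BIAS=toy["BIAS_DESC"].str.split(";"))
       .explode("BIAS")
       .groupby("BIAS")["INCIDENT_ID"]
       .count()
)
print(per_bias)
```

With this approach an incident with two biases is counted once under each, so the per-bias totals can add up to more than the number of incidents.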

### Create filtered series composed of race/ethnicity groups & create plotly bar chart visualization

In [16]:
races = crimes_per_group.loc[['Anti-American Indian or Alaska Native', 'Anti-Asian', 'Anti-Arab', 'Anti-Black or African American',
                        'Anti-Hispanic or Latino', 'Anti-White', 'Anti-Other Race/Ethnicity/Ancestry']]
fig = px.bar(races, y='INCIDENT_ID', title='Number of Hate Crime Incidents by Race/Ethnicity from 1991-2018',
             labels={
                     "INCIDENT_ID": "Number of Incidents",
                     "BIAS_DESC": "Race/Ethnicity Bias",
                 }
             )
fig.show()


### Testing incident accuracy for race groups

In [22]:
print(assert_equals(69056, races.loc['Anti-Black or African American']))
print(assert_equals(5913, races.loc['Anti-Asian']))
print(assert_equals(2160, races.loc['Anti-American Indian or Alaska Native']))

None
None
None


### Create filtered series composed of gender groups & create plotly bar chart visualization

In [24]:
genders = crimes_per_group.loc[['Anti-Female', 'Anti-Gender Non-Conforming', 'Anti-Male']]
fig = px.bar(genders, y='INCIDENT_ID', title='Number of Hate Crime Incidents by Gender from 1991-2018',
             labels={
                     "INCIDENT_ID": "Number of Incidents",
                     "BIAS_DESC": "Gender Bias",
                 }
             )
fig.show()


### Testing incident accuracy for gender groups

In [26]:
print(assert_equals(159, genders.loc['Anti-Female']))
print(assert_equals(154, genders.loc['Anti-Gender Non-Conforming']))
print(assert_equals(76, genders.loc['Anti-Male']))

None
None
None


### Create filtered series composed of sexuality groups & create plotly bar chart visualization

In [27]:
sexualities = crimes_per_group.loc[['Anti-Bisexual', 'Anti-Gay (Male)', 'Anti-Heterosexual', 'Anti-Lesbian (Female)',
                             'Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)',]]
fig = px.bar(sexualities, y='INCIDENT_ID', title='Number of Hate Crime Incidents by Sexuality from 1991-2018',
             labels={
                     "INCIDENT_ID": "Number of Incidents",
                     "BIAS_DESC": "Sexuality Bias",
                 }
             )
fig.show()


### Testing incident accuracy for sexuality groups

In [30]:
print(assert_equals(20316, sexualities.loc['Anti-Gay (Male)']))
print(assert_equals(4266, sexualities.loc['Anti-Lesbian (Female)']))
print(assert_equals(6077, sexualities.loc['Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)']))

None
None
None


### Create filtered series composed of religion groups & create plotly bar chart visualization

In [32]:
religions = crimes_per_group.loc[['Anti-Atheism/Agnosticism', 'Anti-Buddhist', 'Anti-Catholic', 
                                  'Anti-Eastern Orthodox (Russian, Greek, Other)', 'Anti-Hindu', 'Anti-Islamic (Muslim)', 
                                  'Anti-Jewish', "Anti-Jehovah's Witness", 'Anti-Other Religion',
                                  'Anti-Protestant', 'Anti-Sikh']]
fig = px.bar(religions, y='INCIDENT_ID', title='Number of Hate Crime Incidents by Religion from 1991-2018',
             labels={
                     "INCIDENT_ID": "Number of Incidents",
                     "BIAS_DESC": "Religious Bias",
                 }
             )
fig.show()


### Testing incident accuracy for religion groups

In [34]:
print(assert_equals(1473, religions.loc['Anti-Catholic']))
print(assert_equals(26109, religions.loc['Anti-Jewish']))
print(assert_equals(3337, religions.loc['Anti-Other Religion']))

None
None
None


# Where do most hate crimes occur? Does this change depending on the bias of the crime?

### Read in states dataframe

In [40]:
states = gpd.read_file('cb_2018_us_state_500k.shp')
states.head()

Unnamed: 0,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,28,1779790,0400000US28,28,MS,Mississippi,0,121533519481,3926919758,"MULTIPOLYGON (((-88.50297 30.21523, -88.49176 ..."
1,37,1027616,0400000US37,37,NC,North Carolina,0,125923656064,13466071395,"MULTIPOLYGON (((-75.72681 35.93584, -75.71827 ..."
2,40,1102857,0400000US40,40,OK,Oklahoma,0,177662925723,3374587997,"POLYGON ((-103.00257 36.52659, -103.00219 36.6..."
3,51,1779803,0400000US51,51,VA,Virginia,0,102257717110,8528531774,"MULTIPOLYGON (((-75.74241 37.80835, -75.74151 ..."
4,54,1779805,0400000US54,54,WV,West Virginia,0,62266474513,489028543,"POLYGON ((-82.64320 38.16909, -82.64300 38.169..."


### Create new total occurrences per state dataframe using original hate crimes df

In [51]:
state_occurences = df.groupby('STATE_NAME')['INCIDENT_ID'].count()
state_occurences = state_occurences.to_frame()
state_occurences['STATE'] = state_occurences.index
state_occurences = state_occurences.rename(columns={"INCIDENT_ID": "Number of Crimes"})
state_occurences

STATE_NAME,Number of Crimes,STATE
Alabama,205,Alabama
Alaska,218,Alaska
Arizona,6273,Arizona
Arkansas,1056,Arkansas
California,33891,California
Colorado,3824,Colorado
Connecticut,3466,Connecticut
Delaware,892,Delaware
District of Columbia,1354,District of Columbia
Federal,22,Federal


### Merge states df with state_occurences df in order to add geo data for choropleth graphs

In [64]:
geo_crimes_df = state_occurences.merge(states, left_on='STATE', right_on='NAME', how='left')
geo_crimes_df

Unnamed: 0,Number of Crimes,STATE,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,205,Alabama,1.0,1779775.0,0400000US01,1.0,AL,Alabama,0.0,131174000000.0,4593327000.0,"MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ..."
1,218,Alaska,2.0,1785533.0,0400000US02,2.0,AK,Alaska,0.0,1478840000000.0,245481600000.0,"MULTIPOLYGON (((179.48246 51.98283, 179.48656 ..."
2,6273,Arizona,4.0,1779777.0,0400000US04,4.0,AZ,Arizona,0.0,294198600000.0,1027338000.0,"POLYGON ((-114.81629 32.50804, -114.81432 32.5..."
3,1056,Arkansas,5.0,68085.0,0400000US05,5.0,AR,Arkansas,0.0,134768900000.0,2962860000.0,"POLYGON ((-94.61783 36.49941, -94.61765 36.499..."
4,33891,California,6.0,1779778.0,0400000US06,6.0,CA,California,0.0,403503900000.0,20463870000.0,"MULTIPOLYGON (((-118.60442 33.47855, -118.5987..."
5,3824,Colorado,8.0,1779779.0,0400000US08,8.0,CO,Colorado,0.0,268422900000.0,1181622000.0,"POLYGON ((-109.06025 38.59933, -109.05954 38.7..."
6,3466,Connecticut,9.0,1779780.0,0400000US09,9.0,CT,Connecticut,0.0,12542500000.0,1815618000.0,"MULTIPOLYGON (((-72.76143 41.24233, -72.75973 ..."
7,892,Delaware,10.0,1779781.0,0400000US10,10.0,DE,Delaware,0.0,5045926000.0,1399986000.0,"MULTIPOLYGON (((-75.56555 39.51485, -75.56174 ..."
8,1354,District of Columbia,11.0,1702382.0,0400000US11,11.0,DC,District of Columbia,0.0,158340400.0,18687200.0,"POLYGON ((-77.11976 38.93434, -77.11253 38.940..."
9,22,Federal,,,,,,,,,,


### Testing efficacy of dataframe merge by comparing observation amounts

In [63]:
print(assert_equals(len(state_occurences), len(geo_crimes_df)))

None
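
The length comparison confirms that the left merge neither dropped nor duplicated rows, but it does not reveal rows that found no match, such as the "Federal" row in the merged output. As a hedged sketch, the `indicator=True` option of `DataFrame.merge` adds a `_merge` column that flags those rows. The toy frames below reuse three counts and two abbreviations shown in this notebook and stand in for the real dataframes; the variable names are illustrative.

```python
import pandas as pd

# Toy stand-ins for state_occurences and states (plain DataFrames, no geometry).
counts = pd.DataFrame({
    "STATE": ["Alabama", "Alaska", "Federal"],
    "Number of Crimes": [205, 218, 22],
})
shapes = pd.DataFrame({
    "NAME": ["Alabama", "Alaska"],
    "STUSPS": ["AL", "AK"],
})

# indicator=True adds a '_merge' column: 'both' for matches, 'left_only' otherwise.
merged = counts.merge(shapes, left_on="STATE", right_on="NAME",
                      how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "STATE"].tolist()
print(unmatched)  # prints ['Federal']
```

Rows flagged `left_only` have no state shape, so they cannot be drawn on a state-level map.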


### Create visualization of state occurrences using plotly.graph_objects

In [65]:
fig = go.Figure(data=go.Choropleth(
    locations=geo_crimes_df['STUSPS'],
    z = geo_crimes_df['Number of Crimes'],
    locationmode = 'USA-states',
    colorscale = 'Reds',
    text=geo_crimes_df['STUSPS'],
    marker_line_color='white',
    colorbar_title="Number of Incidents"
))

fig.update_layout(
    title_text='Number of Hate Crime Incidents Across States from 1991-2018',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
    )
)

fig.show()


### Testing occurrence accuracy for states

In [72]:
print(assert_equals(33891, geo_crimes_df.loc[4, 'Number of Crimes']))
print(assert_equals(7517, geo_crimes_df.loc[49, 'Number of Crimes']))
print(assert_equals(392, geo_crimes_df.loc[36, 'Number of Crimes']))

None
None
None


### Create dataframe with geography information for exclusively "Anti-Gay (Male)"

In [66]:
anti_gay_df = df[df["BIAS_DESC"] == "Anti-Gay (Male)"]
ag_frametotal = len(anti_gay_df)
ag_state_occurences = 100 * anti_gay_df.groupby('STATE_NAME')['INCIDENT_ID'].count() / ag_frametotal
ag_state_occurences = ag_state_occurences.to_frame()
ag_state_occurences['STATE'] = ag_state_occurences.index
ag_state_occurences = ag_state_occurences.rename(columns={"INCIDENT_ID": "Number of Crimes"})
ag_total = ag_state_occurences['Number of Crimes'].sum()
ag_geo_crimes_df = ag_state_occurences.merge(states, left_on='STATE', right_on='NAME', how='left')
ag_geo_crimes_df = ag_geo_crimes_df.dropna()

20316
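
The cell above converts per-state counts into each state's share of all incidents with the chosen bias, so the shares should sum to 100. Because the same steps are repeated for other biases, they could be wrapped in a small helper. The sketch below is one possible shape for it: `state_percentages` is a hypothetical name, the "Percent of Crimes" column label is an assumption (the notebook keeps "Number of Crimes"), and the toy rows are invented for illustration.

```python
import pandas as pd


def state_percentages(crimes, bias):
    """Return each state's percentage of the incidents recorded with one exact bias label."""
    subset = crimes[crimes["BIAS_DESC"] == bias]
    # Count incidents per state, then scale by the total for this bias.
    share = 100 * subset.groupby("STATE_NAME")["INCIDENT_ID"].count() / len(subset)
    return share.rename("Percent of Crimes").reset_index()


# Toy stand-in for the hate crimes dataframe; the values are illustrative only.
toy = pd.DataFrame({
    "INCIDENT_ID": [1, 2, 3, 4, 5],
    "STATE_NAME": ["Alabama", "Alaska", "Alaska", "Alaska", "Alabama"],
    "BIAS_DESC": ["Anti-White", "Anti-White", "Anti-White", "Anti-White", "Anti-Asian"],
})

result = state_percentages(toy, "Anti-White")
print(result)
```

The returned frame can then be merged with the states dataframe on the state name in the same way as the cell above.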


### Create visualization of state occurrences for "Anti-Gay (Male)" using plotly.graph_objects

In [67]:
fig = go.Figure(data=go.Choropleth(
    locations=ag_geo_crimes_df['STUSPS'],
    z = ag_geo_crimes_df['Number of Crimes'],
    locationmode = 'USA-states',
    colorscale = 'Reds',
    text=ag_geo_crimes_df['STUSPS'],
    marker_line_color='white',
    colorbar_title="% of Crimes"
))

fig.update_layout(
    title_text='Percentage Distribution of Anti-Gay (Male) Hate Crime Incidences Across States from 1991-2018',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
    )
)


### Testing percentage accuracy for states for exclusively "Anti-Gay (Male)"

In [76]:
ag_geo_crimes_df

Unnamed: 0,Number of Crimes,STATE,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,0.063989,Alabama,1,1779775,0400000US01,1,AL,Alabama,0,131174000000.0,4593327000.0,"MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ..."
1,0.0886,Alaska,2,1785533,0400000US02,2,AK,Alaska,0,1478840000000.0,245481600000.0,"MULTIPOLYGON (((179.48246 51.98283, 179.48656 ..."
2,4.139594,Arizona,4,1779777,0400000US04,4,AZ,Arizona,0,294198600000.0,1027338000.0,"POLYGON ((-114.81629 32.50804, -114.81432 32.5..."
3,0.236267,Arkansas,5,68085,0400000US05,5,AR,Arkansas,0,134768900000.0,2962860000.0,"POLYGON ((-94.61783 36.49941, -94.61765 36.499..."
4,23.96141,California,6,1779778,0400000US06,6,CA,California,0,403503900000.0,20463870000.0,"MULTIPOLYGON (((-118.60442 33.47855, -118.5987..."
5,1.63418,Colorado,8,1779779,0400000US08,8,CO,Colorado,0,268422900000.0,1181622000.0,"POLYGON ((-109.06025 38.59933, -109.05954 38.7..."
6,1.447135,Connecticut,9,1779780,0400000US09,9,CT,Connecticut,0,12542500000.0,1815618000.0,"MULTIPOLYGON (((-72.76143 41.24233, -72.75973 ..."
7,0.59559,Delaware,10,1779781,0400000US10,10,DE,Delaware,0,5045926000.0,1399986000.0,"MULTIPOLYGON (((-75.56555 39.51485, -75.56174 ..."
8,2.559559,District of Columbia,11,1702382,0400000US11,11,DC,District of Columbia,0,158340400.0,18687200.0,"POLYGON ((-77.11976 38.93434, -77.11253 38.940..."
10,0.881079,Florida,12,294478,0400000US12,12,FL,Florida,0,138949100000.0,31361100000.0,"MULTIPOLYGON (((-80.17628 25.52505, -80.17395 ..."


In [79]:
print(assert_equals(3.927939, round(ag_geo_crimes_df.loc[48, 'Number of Crimes'], 6)))
print(assert_equals(23.961410, round(ag_geo_crimes_df.loc[4, 'Number of Crimes'], 6)))
print(assert_equals(0.546367, round(ag_geo_crimes_df.loc[45, 'Number of Crimes'], 6)))

None
None
None


### Create dataframe with geography information for exclusively "Anti-Black or African American"

In [68]:
anti_black_df = df[df["BIAS_DESC"] == "Anti-Black or African American"]
ab_frametotal = len(anti_black_df)
ab_state_occurences = 100 * anti_black_df.groupby('STATE_NAME')['INCIDENT_ID'].count() / ab_frametotal
ab_state_occurences = ab_state_occurences.to_frame()
ab_state_occurences['STATE'] = ab_state_occurences.index
ab_state_occurences = ab_state_occurences.rename(columns={"INCIDENT_ID": "Number of Crimes"})
ab_total = ab_state_occurences['Number of Crimes'].sum()
ab_geo_crimes_df = ab_state_occurences.merge(states, left_on='STATE', right_on='NAME', how='left')
ab_geo_crimes_df = ab_geo_crimes_df.dropna()

69056


### Create visualization of state occurrences for "Anti-Black or African American" using plotly.graph_objects

In [69]:
fig = go.Figure(data=go.Choropleth(
    locations=ab_geo_crimes_df['STUSPS'],
    z = ab_geo_crimes_df['Number of Crimes'],
    locationmode = 'USA-states',
    colorscale = 'Reds',
    text=ab_geo_crimes_df['STUSPS'],
    marker_line_color='white',
    colorbar_title="% of Crimes"
))

fig.update_layout(
    title_text='Percentage Distribution of Anti-Black Hate Crime Incidences Across States from 1991-2018',
    geo = dict(
        scope='usa',
        projection=go.layout.geo.Projection(type = 'albers usa'),
    )
)


### Testing percentage accuracy for states for exclusively "Anti-Black or African American"

In [77]:
ab_geo_crimes_df

Unnamed: 0,Number of Crimes,STATE,STATEFP,STATENS,AFFGEOID,GEOID,STUSPS,NAME,LSAD,ALAND,AWATER,geometry
0,0.149154,Alabama,1,1779775,0400000US01,1,AL,Alabama,0,131174000000.0,4593327000.0,"MULTIPOLYGON (((-88.05338 30.50699, -88.05109 ..."
1,0.115848,Alaska,2,1785533,0400000US02,2,AK,Alaska,0,1478840000000.0,245481600000.0,"MULTIPOLYGON (((179.48246 51.98283, 179.48656 ..."
2,3.04101,Arizona,4,1779777,0400000US04,4,AZ,Arizona,0,294198600000.0,1027338000.0,"POLYGON ((-114.81629 32.50804, -114.81432 32.5..."
3,0.505387,Arkansas,5,68085,0400000US05,5,AR,Arkansas,0,134768900000.0,2962860000.0,"POLYGON ((-94.61783 36.49941, -94.61765 36.499..."
4,15.361446,California,6,1779778,0400000US06,6,CA,California,0,403503900000.0,20463870000.0,"MULTIPOLYGON (((-118.60442 33.47855, -118.5987..."
5,1.711654,Colorado,8,1779779,0400000US08,8,CO,Colorado,0,268422900000.0,1181622000.0,"POLYGON ((-109.06025 38.59933, -109.05954 38.7..."
6,1.813021,Connecticut,9,1779780,0400000US09,9,CT,Connecticut,0,12542500000.0,1815618000.0,"MULTIPOLYGON (((-72.76143 41.24233, -72.75973 ..."
7,0.580688,Delaware,10,1779781,0400000US10,10,DE,Delaware,0,5045926000.0,1399986000.0,"MULTIPOLYGON (((-75.56555 39.51485, -75.56174 ..."
8,0.243281,District of Columbia,11,1702382,0400000US11,11,DC,District of Columbia,0,158340400.0,18687200.0,"POLYGON ((-77.11976 38.93434, -77.11253 38.940..."
10,2.487836,Florida,12,294478,0400000US12,12,FL,Florida,0,138949100000.0,31361100000.0,"MULTIPOLYGON (((-80.17628 25.52505, -80.17395 ..."


In [0]:
print(assert_equals(3.534812, round(ab_geo_crimes_df.loc[49, 'Number of Crimes'], 6)))
print(assert_equals(15.361446, round(ab_geo_crimes_df.loc[4, 'Number of Crimes'], 6)))
print(assert_equals(4.119845, round(ab_geo_crimes_df.loc[45, 'Number of Crimes'], 6)))