# US Hate Crime Analysis

1. [Background](#background)
2. [Analysis Questions](#inspiration)
3. [Methodologies](#methodologies)
    - [Data Gathering](#data-gathering)
    - [Data Exploration](#data-exploration)
    - [Data Pre-processing](#data-pre-processing)

<a name='background'></a>
## Background

**Hate crime**, also known as bias-motivated crime, is a crime motivated by prejudice or intolerance toward a person's membership (or perceived membership) in a certain group. For example, some reported hate crime victims were assaulted solely because they are Black. Unfortunately, this kind of crime has slowly become common, and discrimination keeps increasing even though the majority of society constantly protests against it. This analysis creates simple visualizations of hate crime characteristics in the US from 2010 to 2018. The same approach can be applied to other countries where comparable data is available.

<a name='inspiration'></a>
## Analysis Questions

1. How did US hate crime trend from 2010 to 2018?
2. Which biases motivated most of the cases?
3. Which criminal acts are most commonly committed in US hate crimes?

<a name='methodologies'></a>
## Methodologies

This analysis uses three tools:
1. **Python 3.7 in Jupyter Notebook**, for data gathering and preprocessing.
2. **Excel 365**, for data visualization.
3. **PowerPoint 365**, for the analysis report poster.

The analysis aims to answer the analysis questions above.

```python
import numpy as np
import pandas as pd
```

<a name='data-gathering'></a>
### Data Gathering

The data was obtained from [United States Hate Crimes (1991-2018)](https://www.kaggle.com/louissebye/united-states-hate-crimes-19912017), which was preprocessed and posted by [Louisse Bye](https://www.kaggle.com/louissebye). It was originally gathered from the [FBI: Crime Data Explorer](https://crime-data-explorer.fr.cloud.gov/downloads-and-docs). The dataset consists of several files:

|File Name|Description|
|:---|:---|
|HC Readme.docx|Provides further information about the technical definitions used in the dataset|
|NIBRS_DataDictionary.pdf|Data dictionary with definitions of the fields|
|hate_crime.csv|The hate crime data|

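```python
# low_memory=False parses the file in one pass, so each column gets a single
# inferred dtype instead of triggering a mixed-type DtypeWarning.
hc = pd.read_csv('data/raw/hate_crime.csv', low_memory=False)

print(hc.shape)
print(hc.columns)
hc.head()
```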

```
(201403, 28)
Index(['INCIDENT_ID', 'DATA_YEAR', 'ORI', 'PUB_AGENCY_NAME', 'PUB_AGENCY_UNIT',
       'AGENCY_TYPE_NAME', 'STATE_ABBR', 'STATE_NAME', 'DIVISION_NAME',
       'REGION_NAME', 'POPULATION_GROUP_CODE', 'POPULATION_GROUP_DESC',
       'INCIDENT_DATE', 'ADULT_VICTIM_COUNT', 'JUVENILE_VICTIM_COUNT',
       'TOTAL_OFFENDER_COUNT', 'ADULT_OFFENDER_COUNT',
       'JUVENILE_OFFENDER_COUNT', 'OFFENDER_RACE', 'OFFENDER_ETHNICITY',
       'VICTIM_COUNT', 'OFFENSE_NAME', 'TOTAL_INDIVIDUAL_VICTIMS',
       'LOCATION_NAME', 'BIAS_DESC', 'VICTIM_TYPES', 'MULTIPLE_OFFENSE',
       'MULTIPLE_BIAS'],
      dtype='object')
```




| |INCIDENT_ID|DATA_YEAR|ORI|PUB_AGENCY_NAME|PUB_AGENCY_UNIT|AGENCY_TYPE_NAME|STATE_ABBR|STATE_NAME|DIVISION_NAME|REGION_NAME|...|OFFENDER_RACE|OFFENDER_ETHNICITY|VICTIM_COUNT|OFFENSE_NAME|TOTAL_INDIVIDUAL_VICTIMS|LOCATION_NAME|BIAS_DESC|VICTIM_TYPES|MULTIPLE_OFFENSE|MULTIPLE_BIAS|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|0|3015|1991|AR0040200|Rogers| |City|AR|Arkansas|West South Central|South|...|White| |1|Intimidation|1.0|Highway/Road/Alley/Street/Sidewalk|Anti-Black or African American|Individual|S|S|
|1|3016|1991|AR0290100|Hope| |City|AR|Arkansas|West South Central|South|...|Black or African American| |1|Simple Assault|1.0|Highway/Road/Alley/Street/Sidewalk|Anti-White|Individual|S|S|
|2|43|1991|AR0350100|Pine Bluff| |City|AR|Arkansas|West South Central|South|...|Black or African American| |1|Aggravated Assault|1.0|Residence/Home|Anti-Black or African American|Individual|S|S|
|3|44|1991|AR0350100|Pine Bluff| |City|AR|Arkansas|West South Central|South|...|Black or African American| |2|Aggravated Assault;Destruction/Damage/Vandalis...|1.0|Highway/Road/Alley/Street/Sidewalk|Anti-White|Individual|M|S|
|4|3017|1991|AR0350100|Pine Bluff| |City|AR|Arkansas|West South Central|South|...|Black or African American| |1|Aggravated Assault|1.0|Service/Gas Station|Anti-White|Individual|S|S|


<a name='data-exploration'></a>
### Data Exploration

**1. Multi-Value Columns Exploration**

Some columns in the hate crime table (`hate_crime.csv`) hold multiple values in a single cell, separated by a semicolon (`;`). These columns are `OFFENSE_NAME`, `LOCATION_NAME`, and `BIAS_DESC`.
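Before splitting these columns on `;`, it is worth confirming that they contain no missing values: a missing cell is read as NaN (a float), and calling `.split(';')` on it raises an `AttributeError`. The sketch below runs the check on hypothetical toy rows; on the real data, the same `isna().sum()` call would run against `hc`.

```python
import pandas as pd

# Hypothetical toy rows standing in for hate_crime.csv (the real file has 28 columns).
hc = pd.DataFrame({
    'INCIDENT_ID': [1, 2, 3],
    'OFFENSE_NAME': ['Intimidation', 'Aggravated Assault;Simple Assault', None],
    'LOCATION_NAME': ['Residence/Home', 'Highway/Road/Alley/Street/Sidewalk', 'Restaurant'],
    'BIAS_DESC': ['Anti-White', 'Anti-Jewish', 'Anti-Asian'],
})

multi_value_cols = ['OFFENSE_NAME', 'LOCATION_NAME', 'BIAS_DESC']

# Count missing cells per multi-value column; any non-zero count would break str.split.
missing = hc[multi_value_cols].isna().sum()
print(missing)
```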

```python
# Collect the distinct values of each multi-value column by splitting every cell on ';'.
oset = set()
for offenses in hc['OFFENSE_NAME'].tolist():
    for offense in offenses.split(';'):
        oset.add(offense)

lset = set()
for locations in hc['LOCATION_NAME'].tolist():
    for location in locations.split(';'):
        lset.add(location)

bset = set()
for biases in hc['BIAS_DESC'].tolist():
    for bias in biases.split(';'):
        bset.add(bias)
```

```python
print("Unique Value")
print("OFFENSE_NAME : ", len(oset))
print("LOCATION_NAME: ", len(lset))
print("BIAS_DESC    : ", len(bset))
```

```
Unique Value
OFFENSE_NAME :  48
LOCATION_NAME:  46
BIAS_DESC    :  35
```
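The three loops above can also be written with pandas string methods: `Series.str.split(';')` turns each cell into a list, `explode()` gives every list element its own row, and `nunique()` counts the distinct values. This is a sketch on hypothetical toy data (substitute `hc` for `toy` on the real file); `explode` assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical toy data with ';'-separated multi-value cells.
toy = pd.DataFrame({
    'OFFENSE_NAME': ['Intimidation', 'Aggravated Assault;Simple Assault', 'Simple Assault'],
    'LOCATION_NAME': ['Residence/Home', 'Residence/Home;Restaurant', 'Restaurant'],
    'BIAS_DESC': ['Anti-White', 'Anti-Jewish', 'Anti-Asian;Anti-Jewish'],
})

# Split each cell on ';', put every value on its own row, then count distinct values.
unique_counts = {
    col: toy[col].str.split(';').explode().nunique()
    for col in ['OFFENSE_NAME', 'LOCATION_NAME', 'BIAS_DESC']
}
print(unique_counts)  # {'OFFENSE_NAME': 3, 'LOCATION_NAME': 2, 'BIAS_DESC': 3}
```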


These multi-value columns need to be parsed and converted into separate tables. Readability can be improved further by clustering the categories in each multi-value column into broader groups. The clustering is done manually, using external information as a reference. Both steps make the visualizations easier to build and the data easier to present.

<a name='data-pre-processing'></a>
### Data Pre-processing

**1. Cleaning Data**

This analysis only uses incidents reported from 2010 to 2018. Records whose `STATE_NAME` is Guam, Hawaii, or Federal are also excluded, for the sake of a cleaner map visualization.

```python
# Keep incidents from 2010 to 2018 (inclusive), excluding the Guam, Hawaii,
# and Federal STATE_NAME values.
excluded = ['Guam', 'Hawaii', 'Federal']
df = hc[hc['DATA_YEAR'].between(2010, 2018) & ~hc['STATE_NAME'].isin(excluded)]
print(df.shape)
```

```
(57758, 28)
```
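Question 1 asks about the yearly trend. Assuming each row of `hate_crime.csv` is one incident (the offense and bias columns pack several values into a single row rather than repeating it), the trend can be previewed in pandas before the Excel step. A sketch on hypothetical toy data; on the real data, the same line would run on the filtered `df`:

```python
import pandas as pd

# Hypothetical toy incidents, one row per incident.
toy = pd.DataFrame({
    'INCIDENT_ID': [1, 2, 3, 4, 5],
    'DATA_YEAR': [2010, 2010, 2011, 2012, 2012],
})

# Number of incidents per year, in chronological order.
per_year = toy['DATA_YEAR'].value_counts().sort_index()
print(per_year)
```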


**2. Clustering Each Category in Multi-Value Columns**

The category-to-cluster mapping can be seen in this file. Clusters reduce the number of distinct categories, which makes the visualizations more readable.

```python
# Offense Name: map each raw OFFENSE_NAME value to a broader offense cluster
omap = {
    'Aggravated Assault': 'Assault',
    'Animal Cruelty': 'Assault',
    'Arson': 'Assault',
    'Destruction/Damage/Vandalism of Property': 'Assault',
    'Simple Assault': 'Assault',
    'Burglary/Breaking & Entering': 'Burglary or Robbery',
    'Motor Vehicle Theft': 'Burglary or Robbery',
    'Pocket-picking': 'Burglary or Robbery',
    'Purse-snatching': 'Burglary or Robbery',
    'Robbery': 'Burglary or Robbery',
    'Shoplifting': 'Burglary or Robbery',
    'Stolen Property Offenses': 'Burglary or Robbery',
    'Theft From Building': 'Burglary or Robbery',
    'Theft From Coin-Operated Machine or Device': 'Burglary or Robbery',
    'Theft From Motor Vehicle': 'Burglary or Robbery',
    'Theft of Motor Vehicle Parts or Accessories': 'Burglary or Robbery',
    'Drug Equipment Violations': 'Drug/Weapon Law Violations',
    'Drug/Narcotic Violations': 'Drug/Weapon Law Violations',
    'Weapon Law Violations': 'Drug/Weapon Law Violations',
    'Human Trafficking, Commercial Sex Acts': 'Human Trafficking or Abduction',
    'Kidnapping/Abduction': 'Human Trafficking or Abduction',
    'Betting/Wagering': 'Illegal Activities, Fraud and Identity Theft',
    'Embezzlement': 'Illegal Activities, Fraud and Identity Theft',
    'Counterfeiting/Forgery': 'Illegal Activities, Fraud and Identity Theft',
    'Bribery': 'Illegal Activities, Fraud and Identity Theft',
    'Impersonation': 'Illegal Activities, Fraud and Identity Theft',
    'Hacking/Computer Invasion': 'Illegal Activities, Fraud and Identity Theft',
    'Wire Fraud': 'Illegal Activities, Fraud and Identity Theft',
    'Welfare Fraud': 'Illegal Activities, Fraud and Identity Theft',
    'Identity Theft': 'Illegal Activities, Fraud and Identity Theft',
    'Credit Card/Automated Teller Machine Fraud': 'Illegal Activities, Fraud and Identity Theft',
    'False Pretenses/Swindle/Confidence Game': 'Illegal Activities, Fraud and Identity Theft',
    'Murder and Nonnegligent Manslaughter': 'Murder or Manslaughter',
    'Negligent Manslaughter': 'Murder or Manslaughter',
    'Extortion/Blackmail': 'Psychological Harassment',
    'Intimidation': 'Psychological Harassment',
    'Assisting or Promoting Prostitution': 'Sexual Harassment or Assault',
    'Fondling': 'Sexual Harassment or Assault',
    'Incest': 'Sexual Harassment or Assault',
    'Pornography/Obscene Material': 'Sexual Harassment or Assault',
    'Prostitution': 'Sexual Harassment or Assault',
    'Purchasing Prostitution': 'Sexual Harassment or Assault',
    'Rape': 'Sexual Harassment or Assault',
    'Sexual Assault With An Object': 'Sexual Harassment or Assault',
    'Sodomy': 'Sexual Harassment or Assault',
    'Statutory Rape': 'Sexual Harassment or Assault',
    'Not Specified': 'Other',
    'All Other Larceny': 'Other',
}
```

```python
# Location Name: map each raw LOCATION_NAME value to a broader location cluster
lmap = {
    "School-College/University": "Educational Buildings",
    "School/College": "Educational Buildings",
    "School-Elementary/Secondary": "Educational Buildings",
    "Government/Public Building": "Government Buildings",
    "Jail/Prison/Penitentiary/Corrections Facility": "Government Buildings",
    "Military Installation": "Government Buildings",
    "ATM Separate from Bank": "Highway, Road or Other Open Space",
    "Bank/Savings and Loan": "Highway, Road or Other Open Space",
    "Highway/Road/Alley/Street/Sidewalk": "Highway, Road or Other Open Space",
    "Parking/Drop Lot/Garage": "Highway, Road or Other Open Space",
    "Rest Area": "Highway, Road or Other Open Space",
    "Service/Gas Station": "Highway, Road or Other Open Space",
    "Abandoned/Condemned Structure": "Industrial",
    "Construction Site": "Industrial",
    "Industrial Site": "Industrial",
    "Amusement Park": "Public Places and Recreation Sites",
    "Arena/Stadium/Fairgrounds/Coliseum": "Public Places and Recreation Sites",
    "Camp/Campground": "Public Places and Recreation Sites",
    "Church/Synagogue/Temple/Mosque": "Public Places and Recreation Sites",
    "Field/Woods": "Public Places and Recreation Sites",
    "Lake/Waterway/Beach": "Public Places and Recreation Sites",
    "Park/Playground": "Public Places and Recreation Sites",
    "Community Center": "Residential or Community Center",
    "Daycare Facility": "Residential or Community Center",
    "Residence/Home": "Residential or Community Center",
    "Shelter-Mission/Homeless": "Residential or Community Center",
    "Tribal Lands": "Residential or Community Center",
    "Auto Dealership New/Used": "Store and Commercial Buildings",
    "Bar/Nightclub": "Store and Commercial Buildings",
    "Commercial/Office Building": "Store and Commercial Buildings",
    "Convenience Store": "Store and Commercial Buildings",
    "Cyberspace": "Store and Commercial Buildings",
    "Department/Discount Store": "Store and Commercial Buildings",
    "Drug Store/Doctor's Office/Hospital": "Store and Commercial Buildings",
    "Farm Facility": "Store and Commercial Buildings",
    "Gambling Facility/Casino/Race Track": "Store and Commercial Buildings",
    "Grocery/Supermarket": "Store and Commercial Buildings",
    "Hotel/Motel/Etc.": "Store and Commercial Buildings",
    "Liquor Store": "Store and Commercial Buildings",
    "Rental Storage Facility": "Store and Commercial Buildings",
    "Restaurant": "Store and Commercial Buildings",
    "Shopping Mall": "Store and Commercial Buildings",
    "Specialty Store": "Store and Commercial Buildings",
    "Air/Bus/Train Terminal": "Transportation Terminals",
    "Dock/Wharf/Freight/Modal Terminal": "Transportation Terminals",
    "Other/Unknown": "Other/Unknown",
}
```

```python
# Bias Desc: map each raw BIAS_DESC value to a broader bias cluster.
# Anti-Male sits with Anti-Female under Gender, and Anti-Heterosexual under
# Sexual Orientation, matching the FBI bias categories.
bmap = {
    "Anti-Mental Disability": "Disability",
    "Anti-Physical Disability": "Disability",
    "Anti-Female": "Gender",
    "Anti-Male": "Gender",
    "Anti-Gender Non-Conforming": "Gender Identity",
    "Anti-Transgender": "Gender Identity",
    "Anti-American Indian or Alaska Native": "Race or Ethnicity",
    "Anti-Arab": "Race or Ethnicity",
    "Anti-Asian": "Race or Ethnicity",
    "Anti-Black or African American": "Race or Ethnicity",
    "Anti-Hispanic or Latino": "Race or Ethnicity",
    "Anti-Multiple Races, Group": "Race or Ethnicity",
    "Anti-Native Hawaiian or Other Pacific Islander": "Race or Ethnicity",
    "Anti-Other Race/Ethnicity/Ancestry": "Race or Ethnicity",
    "Anti-White": "Race or Ethnicity",
    "Anti-Atheism/Agnosticism": "Religion or Belief",
    "Anti-Buddhist": "Religion or Belief",
    "Anti-Catholic": "Religion or Belief",
    "Anti-Eastern Orthodox (Russian, Greek, Other)": "Religion or Belief",
    "Anti-Hindu": "Religion or Belief",
    "Anti-Islamic (Muslim)": "Religion or Belief",
    "Anti-Jehovah's Witness": "Religion or Belief",
    "Anti-Jewish": "Religion or Belief",
    "Anti-Mormon": "Religion or Belief",
    "Anti-Multiple Religions, Group": "Religion or Belief",
    "Anti-Other Christian": "Religion or Belief",
    "Anti-Other Religion": "Religion or Belief",
    "Anti-Protestant": "Religion or Belief",
    "Anti-Sikh": "Religion or Belief",
    "Anti-Bisexual": "Sexual Orientation",
    "Anti-Gay (Male)": "Sexual Orientation",
    "Anti-Heterosexual": "Sexual Orientation",
    "Anti-Lesbian (Female)": "Sexual Orientation",
    "Anti-Lesbian, Gay, Bisexual, or Transgender (Mixed Group)": "Sexual Orientation",
    "Unknown (offender's motivation not known)": "Unknown",
}
```
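Every raw category needs an entry in its mapping, because the parsing step looks values up with `cluster_map[v]` and stops with a `KeyError` on the first gap. A small helper (the name `unmapped_values` is hypothetical, not part of the original notebook) can list the gaps up front. It is shown here on toy data; on the real data it would be called as `unmapped_values(df['OFFENSE_NAME'], omap)`, and likewise for `lmap` and `bmap`.

```python
import pandas as pd

def unmapped_values(series, cluster_map):
    """Return the sorted ';'-separated values in series that have no key in cluster_map."""
    values = set(series.str.split(';').explode())
    return sorted(values - set(cluster_map))

# Hypothetical toy column and a deliberately incomplete mapping.
offense_col = pd.Series(['Intimidation', 'Simple Assault;Robbery', 'Arson'])
toy_map = {
    'Intimidation': 'Psychological Harassment',
    'Simple Assault': 'Assault',
    'Arson': 'Assault',
}

print(unmapped_values(offense_col, toy_map))  # ['Robbery']
```

An empty list means the mapping covers every category in the column.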

**3. Parse Multi-Value Columns**

Each multi-value column is normalized into its own table, with one row per value, linked back to the incident table by `Incident_Id`. The data diagram for this implementation is shown as follows:

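```python
def parseMVC(df, column_name, cluster_map, label):
    """Split a ';'-separated column into a long table of (Incident_Id, value, Cluster)."""
    value = []
    # Iterate over only the two columns needed instead of converting the whole frame.
    for incident_id, cell in zip(df['INCIDENT_ID'], df[column_name]):
        for v in cell.split(';'):
            # cluster_map[v] raises KeyError if a category has no cluster assigned.
            value.append([incident_id, v, cluster_map[v]])
    return pd.DataFrame(value, columns=['Incident_Id', label, 'Cluster'])
```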
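The same normalization can be sketched without an explicit Python loop by combining `str.split`, `DataFrame.explode` (pandas 0.25 or later), and `Series.map`. The function name `parse_mvc_explode` and the toy data are hypothetical. One design difference from `parseMVC`: `map` leaves an unmapped category as NaN in `Cluster` instead of raising `KeyError`, so gaps have to be checked afterwards.

```python
import pandas as pd

def parse_mvc_explode(df, column_name, cluster_map, label):
    """Vectorized variant of parseMVC: one row per (incident, value) pair."""
    out = (
        df[['INCIDENT_ID', column_name]]
        # Replace each ';'-separated string with a list of its values.
        .assign(**{column_name: df[column_name].str.split(';')})
        # Give every list element its own row, repeating INCIDENT_ID.
        .explode(column_name)
        .rename(columns={'INCIDENT_ID': 'Incident_Id', column_name: label})
        .reset_index(drop=True)
    )
    # Unmapped values become NaN here rather than raising KeyError.
    out['Cluster'] = out[label].map(cluster_map)
    return out

# Hypothetical toy incidents and mapping.
toy = pd.DataFrame({
    'INCIDENT_ID': [10, 11],
    'OFFENSE_NAME': ['Intimidation', 'Aggravated Assault;Robbery'],
})
toy_map = {
    'Intimidation': 'Psychological Harassment',
    'Aggravated Assault': 'Assault',
    'Robbery': 'Burglary or Robbery',
}

result = parse_mvc_explode(toy, 'OFFENSE_NAME', toy_map, 'Offense')
print(result)
```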

```python
offense_df = parseMVC(df, 'OFFENSE_NAME', omap, 'Offense')
location_df = parseMVC(df, 'LOCATION_NAME', lmap, 'Location')
bias_df = parseMVC(df, 'BIAS_DESC', bmap, 'Bias')

print(offense_df.shape)
offense_df.head(3)
```

```
(59856, 3)
```


| |Incident_Id|Offense|Cluster|
|---|---|---|---|
|0|147593|Aggravated Assault|Assault|
|1|147594|Simple Assault|Assault|
|2|147595|Simple Assault|Assault|


**4. Rename, Rearrange, and Convert DataFrames to .csv**

Further analysis is done in Excel. The CSV format is chosen because Excel can open and process it easily.

```python
# Keep only the incident-level columns and rename them for the exported table.
used_columns = ['INCIDENT_ID', 'DATA_YEAR', 'STATE_NAME', 'REGION_NAME']
incident_df = df[used_columns]
incident_df.columns = ['Incident_Id', 'Year', 'State', 'Region']

print(incident_df.shape)
incident_df.head(3)
```

```
(57758, 4)
```


| |Incident_Id|Year|State|Region|
|---|---|---|---|---|
|143578|147593|2010|Alaska|West|
|143579|147594|2010|Alaska|West|
|143580|147595|2010|Alaska|West|


```python
# Export the incident table and the three normalized multi-value tables for Excel.
incident_df.to_csv('data/incident.csv', index=False)
offense_df.to_csv('data/offense.csv', index=False)
location_df.to_csv('data/location.csv', index=False)
bias_df.to_csv('data/bias.csv', index=False)
```
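The exported tables share the `Incident_Id` key, so the Excel pivots behind question 2 can also be sketched in pandas by joining a normalized table back to the incident table. The example below uses hypothetical toy versions of `incident_df` and `bias_df`, joins them with `DataFrame.merge`, and counts bias values per year and cluster with `pd.crosstab`. Note that an incident with several biases is counted once per bias, not once per incident.

```python
import pandas as pd

# Hypothetical toy versions of the exported tables.
incident_df = pd.DataFrame({
    'Incident_Id': [1, 2, 3],
    'Year': [2010, 2010, 2011],
    'State': ['Alaska', 'Texas', 'Ohio'],
    'Region': ['West', 'South', 'Midwest'],
})
bias_df = pd.DataFrame({
    'Incident_Id': [1, 2, 2, 3],
    'Bias': ['Anti-Lesbian (Female)', 'Anti-Jewish', 'Anti-Gay (Male)', 'Anti-Jewish'],
    'Cluster': ['Sexual Orientation', 'Religion or Belief',
                'Sexual Orientation', 'Religion or Belief'],
})

# Attach the year to every bias row, then count rows per (Year, Cluster) pair.
merged = bias_df.merge(incident_df[['Incident_Id', 'Year']], on='Incident_Id', how='left')
table = pd.crosstab(merged['Year'], merged['Cluster'])
print(table)
```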