# Machine Learning Model

## Approach
- Use as many fields as possible from the shooter information
- Try to cluster shooters based on the available information
- Create modelled individuals based on census data
- Train an ML model on the generated individuals (assumed safe users) and the shooters (unsafe users)
- The ML model is also called the "BlackBox" model because its inner workings are less obvious
- Test the accuracy and precision of the model
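
The last step of the approach can be sketched with the standard definitions of accuracy and precision for a binary classifier. This is a minimal illustration, not the notebook's evaluation code: the function name and the toy labels are assumptions made up for the example.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels where 1 = unsafe, 0 = safe."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 here.
    predicted_pos = true_pos + false_pos
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    return accuracy, precision


# Toy labels, invented for illustration only
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
```

Accuracy counts every correct prediction, while precision only looks at the individuals flagged as unsafe, so the two can diverge sharply when one class is much larger than the other.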

## Hypothesis
- OpenBox will provide more subjective results, based on only a small subset of the data
- BlackBox will provide more objective results (the probability that an individual is high risk), based on more data
- BlackBox will be better at taking dependencies between variables into account, while OpenBox assumes all variables to be fully independent
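
To make the independence assumption concrete, the sketch below compares a joint probability computed as a product of marginals with the joint frequency observed in the data. The records are invented for illustration, and the notebook does not state how OpenBox actually combines its variables, so this only shows what "fully independent" implies.

```python
from collections import Counter

# Toy records (invented): (Employed, Arrested)
records = (
    [("No", "Yes")] * 4
    + [("No", "No")] * 1
    + [("Yes", "Yes")] * 1
    + [("Yes", "No")] * 4
)
n = len(records)

# Marginal probabilities of each field taken separately
p_not_employed = sum(1 for employed, _ in records if employed == "No") / n
p_arrested = sum(1 for _, arrested in records if arrested == "Yes") / n

# Under independence the joint probability is the product of the marginals
p_joint_independent = p_not_employed * p_arrested

# The observed joint frequency can differ when the fields are correlated
p_joint_observed = Counter(records)[("No", "Yes")] / n
```

In this toy data the product of marginals underestimates how often the two values occur together, which is the kind of gap a model that learns interactions can pick up.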

## Data Sources
- **Shooter information**: Peterson, J., & Densley, J. (2023). The Violence Project database of mass shootings in the United States (Version 7). https://www.theviolenceproject.org
- **Mental Illness Information**: States with the highest levels of mental health illness - NiceRx. https://worldpopulationreview.com/state-rankings/mental-health-statistics-by-state
- **Arrests by State**: Federal Bureau of Investigation (2018). https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/topic-pages/tables/table-69 (Data for Iowa based on 2019 figures due to lack of information in 2018)
- **Autism prevalence**: National Library of Medicine, J Autism Dev Disord. 2020 Dec; 50(12): 4258–4266. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128411/table/T2/
- **Other census data**: U.S. Census Bureau (2018-2021). Accessed through `census` Python module API

### Import Dependencies

In [2]:
import pandas as pd
from pathlib import Path
from census import Census
from us import states

# Ignore warnings
import warnings
warnings.simplefilter(action='ignore')

# Import SQLAlchemy
from sqlalchemy import create_engine

# Import and establish Base for which classes will be constructed
import sqlalchemy
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
from sqlalchemy import func
from sqlalchemy import desc

# Import modules to declare columns and column data types
from sqlalchemy import Column, Integer, String, Float, Boolean

# Local modules
import codebook
from codebook import replace_code_by_value
from config import api_census_key

## Step 1: Extract and Transform Shooter Data
### Import and keep only relevant columns

In [3]:
# Import shooter data
shooters_df = pd.read_csv(Path('../Datasets/clean_data/clean_shooters.csv'))

# Remove fields for which we have no information from a third-party source
shooter_profile = shooters_df[
        [
        'Age',
        'Gender',
        'Race',
        'Immigrant',
        'Education',
        'Relationship Status',
        'Employment Status',
        'Employment Type',
        'Military Service',
        'Highest Level of Justice System Involvement',
        'Parental Divorce / Separation',
        'Childhood SES',
        'Mental Illness',
        'Known Family Mental Health History',
        'Autism Spectrum',
        'Health Issues'
        ]].copy()  # explicit copy so the in-place edits below never act on a view of shooters_df

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,Relationship Status,Employment Status,Employment Type,Military Service,Highest Level of Justice System Involvement,Parental Divorce / Separation,Childhood SES,Mental Illness,Known Family Mental Health History,Autism Spectrum,Health Issues
0,25,0,0.0,0.0,2.0,2.0,0,-1.0,1,4,0,1.0,1,0,0,1
1,18,0,0.0,0.0,0.0,0.0,0,-1.0,0,0,0,1.0,2,0,0,0
2,39,0,0.0,0.0,2.0,2.0,1,2.0,1,1,0,1.0,4,0,0,0


### Replace codes by explicit values

In [4]:
replace_code_by_value(shooter_profile, 'Gender', codebook.codes_shooter_background_gender)
replace_code_by_value(shooter_profile, 'Race', codebook.codes_shooter_background_race)
replace_code_by_value(shooter_profile, 'Immigrant', codebook.codes_shooter_background_immigrant)
replace_code_by_value(shooter_profile, 'Education', codebook.codes_shooter_background_education)
replace_code_by_value(shooter_profile, 'Relationship Status', codebook.codes_shooter_background_relationship)
replace_code_by_value(shooter_profile, 'Employment Status', codebook.codes_shooter_background_employstatus)
replace_code_by_value(shooter_profile, 'Employment Type', codebook.codes_shooter_background_employtype)
replace_code_by_value(shooter_profile, 'Military Service', codebook.codes_shooter_background_milservice)
replace_code_by_value(shooter_profile, 'Highest Level of Justice System Involvement', codebook.codes_shooter_crime_justice)
replace_code_by_value(shooter_profile, 'Parental Divorce / Separation', codebook.codes_shooter_trauma_divorce)
replace_code_by_value(shooter_profile, 'Childhood SES', codebook.codes_shooter_trauma_ses)
replace_code_by_value(shooter_profile, 'Mental Illness', codebook.codes_shooter_health_illness)
replace_code_by_value(shooter_profile, 'Known Family Mental Health History', codebook.codes_shooter_health_family)
replace_code_by_value(shooter_profile, 'Autism Spectrum', codebook.codes_shooter_health_autism)
replace_code_by_value(shooter_profile, 'Health Issues', codebook.codes_shooter_health_issues)

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,Relationship Status,Employment Status,Employment Type,Military Service,Highest Level of Justice System Involvement,Parental Divorce / Separation,Childhood SES,Mental Illness,Known Family Mental Health History,Autism Spectrum,Health Issues
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Convicted,No evidence,Middle class,Mood disorder,No evidence,No evidence,Yes
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,,No evidence,Middle class,Thought disorder,No evidence,No evidence,No evidence
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,Suspected,No evidence,Middle class,Indication of psychiatric disorder but no diag...,No evidence,No evidence,No evidence
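
The helper `replace_code_by_value` lives in the local `codebook` module, which is not shown here. Judging only from how it is called (DataFrame, column name, code table, no return value used), it could look roughly like the sketch below. The function name with the `_sketch` suffix, the dictionary format of the code table, and the `1: 'Female'` entry are assumptions for illustration; only the `0` to `Male` mapping is visible in the outputs above.

```python
import pandas as pd


def replace_code_by_value_sketch(df, column, codes):
    """Replace numeric codes in df[column] with labels from `codes`, in place."""
    # Codes missing from the table become NaN, which makes gaps easy to spot.
    df[column] = df[column].map(codes)


# Hypothetical code table; the real ones are defined in codebook.py
codes_gender_example = {0: 'Male', 1: 'Female'}

demo = pd.DataFrame({'Gender': [0, 1, 0]})
replace_code_by_value_sketch(demo, 'Gender', codes_gender_example)
```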


### Simplify complex fields

In [5]:
# Gender: replace non-male and non-female genders by 'Other'
shooter_profile.loc[~shooter_profile['Gender'].isin(['Male', 'Female']), 'Gender'] = 'Other'

# Race: replace less common races by 'Other'
other_races = ['Middle Eastern', 'Unknown', 'Native American']
shooter_profile.loc[shooter_profile['Race'].isin(other_races), 'Race'] = 'Other'

# Employment status, replace empty field by 'Unknown'
shooter_profile.loc[shooter_profile['Employment Status'] == ' ', 'Employment Status'] = 'Unknown'

# Military service: if training not completed -> No
shooter_profile.loc[shooter_profile['Military Service'] == 'Joined but did not make it through training', 'Military Service'] = 'No'

# Change Highest Level of Justice System Involvement to Arrested or Not Arrested
shooter_profile = shooter_profile.rename(columns={'Highest Level of Justice System Involvement': 'Arrested'})
shooter_profile.loc[shooter_profile['Arrested'] == 'Arrested', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'Charged', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'Convicted', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'NA', 'Arrested'] = 'No'
shooter_profile.loc[shooter_profile['Arrested'] == 'Suspected', 'Arrested'] = 'No'

# Mental illness
shooter_profile.loc[shooter_profile['Mental Illness'] == 'Indication of psychiatric disorder but no diagnosis', 'Mental Illness'] = 'No evidence'
shooter_profile.loc[shooter_profile['Mental Illness'] != 'No evidence', 'Mental Illness'] = 'Yes'

# Parent mental illness
shooter_profile.loc[shooter_profile['Known Family Mental Health History'] != 'No evidence', 'Known Family Mental Health History'] = 'Yes'
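
The arrest recoding above applies one `.loc` assignment per label. An equivalent and more compact way to express the same mapping is a single dictionary passed to `Series.replace`, sketched here on a small made-up DataFrame. The names `arrest_map` and `demo` are introduced for this example only.

```python
import pandas as pd

# Same label mapping as the .loc assignments above, gathered in one place
arrest_map = {
    'Arrested': 'Yes',
    'Charged': 'Yes',
    'Convicted': 'Yes',
    'NA': 'No',
    'Suspected': 'No',
}

# Made-up values for illustration
demo = pd.DataFrame({'Arrested': ['Convicted', 'Suspected', 'NA', 'Charged']})
demo['Arrested'] = demo['Arrested'].replace(arrest_map)
```

Keeping the mapping in a dictionary makes it easier to review which justice-system levels count as an arrest.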

### Rename columns with shorter names

In [6]:
shooter_profile = shooter_profile.rename(columns={
    'Relationship Status': 'RelStatus',
    'Employment Status': 'Employed',
    'Employment Type': 'Work',
    'Military Service': 'MilService',
    'Parental Divorce / Separation': 'ParentDivorce',
    'Childhood SES': 'SES',
    'Mental Illness': 'MentalIllness',
    'Known Family Mental Health History': 'MentalIllnessHistory',
    'Autism Spectrum': 'Autism',
    'Health Issues': 'HealthIssues'
})

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,RelStatus,Employed,Work,MilService,Arrested,ParentDivorce,SES,MentalIllness,MentalIllnessHistory,Autism,HealthIssues
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Yes,No evidence,Middle class,Yes,No evidence,No evidence,Yes
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,No,No evidence,Middle class,Yes,No evidence,No evidence,No evidence
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,No,No evidence,Middle class,No evidence,No evidence,No evidence,No evidence


### Add classification column
- 1 = participated in a mass shooting
- 0 = did not participate

In [7]:
shooter_profile['Classification'] = 1
shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,RelStatus,Employed,Work,MilService,Arrested,ParentDivorce,SES,MentalIllness,MentalIllnessHistory,Autism,HealthIssues,Classification
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Yes,No evidence,Middle class,Yes,No evidence,No evidence,Yes,1
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,No,No evidence,Middle class,Yes,No evidence,No evidence,No evidence,1
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,No,No evidence,Middle class,No evidence,No evidence,No evidence,No evidence,1


### Save table as CSV
This data will be used to train the BlackBox model

In [9]:
csv_out = Path('../Model/model_blackbox_shooters.csv')
shooter_profile.to_csv(csv_out, index=False)

### Save table in SQL
- Table `shooters`: contains all the shooter data (same as the CSV above)
- Table `analysis`: will contain data that can be loaded and analysed from the Web App

In [10]:
# Get Base
Base = declarative_base()

# Create Shooter class
class Shooter(Base):
    __tablename__ = 'shooters'
    shooter_id = Column(Integer, primary_key=True)
    Age = Column(Integer)
    Gender = Column(String)
    Race = Column(String)
    Immigrant = Column(String)
    Education = Column(String)
    RelStatus = Column(String)
    Employed = Column(String)
    Work = Column(String)
    MilService = Column(String)
    Arrested = Column(String)
    ParentDivorce = Column(String)
    SES = Column(String)
    MentalIllness = Column(String)
    MentalIllnessHistory = Column(String)
    Autism = Column(String)
    HealthIssues = Column(String)
    Classification = Column(Integer)
    Probability = Column(Float)

# Create Analysis class (used to load data to classify)
class Analysis(Base):
    __tablename__ = 'analysis'
    shooter_id = Column(Integer, primary_key=True)
    Age = Column(Integer)
    Gender = Column(String)
    Race = Column(String)
    Immigrant = Column(String)
    Education = Column(String)
    RelStatus = Column(String)
    Employed = Column(String)
    Work = Column(String)
    MilService = Column(String)
    Arrested = Column(String)
    ParentDivorce = Column(String)
    SES = Column(String)
    MentalIllness = Column(String)
    MentalIllnessHistory = Column(String)
    Autism = Column(String)
    HealthIssues = Column(String)
    Classification = Column(Integer)
    Probability = Column(Float)
    

# Create a connection to a SQLite database
engine = create_engine('sqlite:///../Server/blackbox_db.sqlite')

# Create the tables within the database
Base.metadata.create_all(engine)

# Start session
session = Session(bind=engine)

In [11]:
# Loop through the shooter DataFrame and add each row to the session
for index, row in shooter_profile.iterrows():
    # Add data to database
    session.add(Shooter(
        Age = int(row['Age']),
        Gender = row['Gender'],
        Race = row['Race'],
        Immigrant = row['Immigrant'],
        Education = row['Education'],
        RelStatus = row['RelStatus'],
        Employed = row['Employed'],
        Work = row['Work'],
        MilService = row['MilService'],
        Arrested = row['Arrested'],
        ParentDivorce = row['ParentDivorce'],
        SES = row['SES'],
        MentalIllness = row['MentalIllness'],
        MentalIllnessHistory = row['MentalIllnessHistory'],
        Autism = row['Autism'],
        HealthIssues = row['HealthIssues'],
        Classification = 1,
        Probability = 100.0
    ))

print(f"{len(shooter_profile)} rows ready for commit.")

193 rows ready for commit.


In [12]:
# Commit changes to session
session.commit()

In [13]:
# Close session
session.close()
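
After the commit, it can be useful to confirm how many rows actually landed in the database. The sketch below uses the standard-library `sqlite3` module rather than SQLAlchemy; the helper name `count_rows` is an assumption introduced for this check.

```python
import sqlite3


def count_rows(db_path, table):
    """Return the number of rows in `table` of the SQLite database at `db_path`."""
    # Table names cannot be bound as SQL parameters, so whitelist them instead.
    if table not in {'shooters', 'analysis'}:
        raise ValueError(f"unexpected table name: {table}")
    conn = sqlite3.connect(db_path)
    try:
        (n_rows,) = conn.execute(f'SELECT COUNT(*) FROM {table}').fetchone()
        return n_rows
    finally:
        conn.close()
```

Calling `count_rows('../Server/blackbox_db.sqlite', 'shooters')` should return 193 when the cells above have run once against a fresh database. Because `create_all` does not drop existing tables, re-running the insert cell would append another 193 rows, so a larger count is a sign of duplicate loads.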