# Machine Learning Model

## Approach
- Use as many fields as possible from shooter information
- Try to cluster shooters based on the available information
- Create modelled individuals based on census data
- Train an ML model on the generated individuals (assumed safe users) and the shooters (unsafe users)
- The ML model is also called the "BlackBox" model because its inner workings may be less obvious
- Test accuracy and precision of model
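
The last step relies on two metrics. As a minimal sketch (the labels below are made up for illustration and are not project results), accuracy is the share of all predictions that are correct, and precision is the share of predicted shooters (class 1) that really are shooters:

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    pred_pos = sum(1 for _, p in pairs if p == 1)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 in that case
    precision = true_pos / pred_pos if pred_pos else 0.0
    return accuracy, precision

# Illustrative labels only (1 = shooter, 0 = modelled individual)
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
```

When one class is much larger than the other, accuracy alone can look good even if most positive predictions are wrong, which is why both metrics are tracked.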

## Hypothesis
- OpenBox will provide more subjective results, based on only a small subset of the data
- BlackBox will provide more objective results (the probability that an individual is high risk), based on more data
- BlackBox will be better at accounting for dependencies between variables, while OpenBox assumes all variables are fully independent
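
The third hypothesis can be illustrated with a toy calculation. The sketch below uses invented counts (not project data) for two binary traits and compares the joint frequency observed in the data with the product of the marginal frequencies, which is what a model assuming full independence would use:

```python
# Invented counts for two binary traits A and B over 100 individuals (illustration only)
counts = {
    (1, 1): 30,  # A and B
    (1, 0): 10,  # A only
    (0, 1): 10,  # B only
    (0, 0): 50,  # neither
}
total = sum(counts.values())

# Marginal frequencies of each trait
p_a = sum(n for (a, _), n in counts.items() if a == 1) / total
p_b = sum(n for (_, b), n in counts.items() if b == 1) / total

# Joint frequency as observed, and as estimated under the independence assumption
p_joint_observed = counts[(1, 1)] / total
p_joint_independent = p_a * p_b
```

In this toy table the two traits co-occur almost twice as often as independence would predict (0.30 versus 0.16), which is the kind of gap a model that learns variable interactions could capture.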

## Data Sources
- **Shooter information**: Peterson, J., & Densley, J. (2023). The Violence Project database of mass shootings in the United States (Version 7). https://www.theviolenceproject.org
- **Mental Illness Information**: States with the highest levels of mental health illness - NiceRx. https://worldpopulationreview.com/state-rankings/mental-health-statistics-by-state
- **Arrests by State**: Federal Bureau of Investigation (2018). https://ucr.fbi.gov/crime-in-the-u.s/2018/crime-in-the-u.s.-2018/topic-pages/tables/table-69 (Data for Iowa based on 2019 figures due to lack of information in 2018)
- **Autism prevalence**: National Library of Medicine, J Autism Dev Disord. 2020 Dec; 50(12): 4258–4266. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9128411/table/T2/
- **Other census data**: U.S. Census Bureau (2018-2021). Accessed through `census` Python module API

### Import Dependencies

In [14]:
import pandas as pd
from pathlib import Path
import sys

# Ignore warnings
import warnings
warnings.simplefilter(action='ignore')

# Local modules
sys.path.append("../Local_Modules/")
import codebook
from codebook import replace_code_by_value

## Step 1: Transform Shooter Data for Clustering
### Import and keep only relevant columns

In [15]:
# Import shooter data
shooters_df = pd.read_csv(Path('../clean_data/clean_shooters.csv'))

# Keep only the fields for which we have matching information from a third-party source
shooter_profile = shooters_df[
        [
        'Age',
        'Gender',
        'Race',
        'Immigrant',
        'Education',
        'Relationship Status',
        'Employment Status',
        'Employment Type',
        'Military Service',
        'Highest Level of Justice System Involvement',
        'Parental Divorce / Separation',
        'Childhood SES',
        'Mental Illness',
        'Known Family Mental Health History',
        'Autism Spectrum',
        'Health Issues'
        ]].copy()  # copy so that later in-place edits do not act on a view of shooters_df

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,Relationship Status,Employment Status,Employment Type,Military Service,Highest Level of Justice System Involvement,Parental Divorce / Separation,Childhood SES,Mental Illness,Known Family Mental Health History,Autism Spectrum,Health Issues
0,25,0,0.0,0.0,2.0,2.0,0,-1.0,1,4,0,1.0,1,0,0,1
1,18,0,0.0,0.0,0.0,0.0,0,-1.0,0,0,0,1.0,2,0,0,0
2,39,0,0.0,0.0,2.0,2.0,1,2.0,1,1,0,1.0,4,0,0,0


### Replace codes by explicit values

In [16]:
replace_code_by_value(shooter_profile, 'Gender', codebook.codes_shooter_background_gender)
replace_code_by_value(shooter_profile, 'Race', codebook.codes_shooter_background_race)
replace_code_by_value(shooter_profile, 'Immigrant', codebook.codes_shooter_background_immigrant)
replace_code_by_value(shooter_profile, 'Education', codebook.codes_shooter_background_education)
replace_code_by_value(shooter_profile, 'Relationship Status', codebook.codes_shooter_background_relationship)
replace_code_by_value(shooter_profile, 'Employment Status', codebook.codes_shooter_background_employstatus)
replace_code_by_value(shooter_profile, 'Employment Type', codebook.codes_shooter_background_employtype)
replace_code_by_value(shooter_profile, 'Military Service', codebook.codes_shooter_background_milservice)
replace_code_by_value(shooter_profile, 'Highest Level of Justice System Involvement', codebook.codes_shooter_crime_justice)
replace_code_by_value(shooter_profile, 'Parental Divorce / Separation', codebook.codes_shooter_trauma_divorce)
replace_code_by_value(shooter_profile, 'Childhood SES', codebook.codes_shooter_trauma_ses)
replace_code_by_value(shooter_profile, 'Mental Illness', codebook.codes_shooter_health_illness)
replace_code_by_value(shooter_profile, 'Known Family Mental Health History', codebook.codes_shooter_health_family)
replace_code_by_value(shooter_profile, 'Autism Spectrum', codebook.codes_shooter_health_autism)
replace_code_by_value(shooter_profile, 'Health Issues', codebook.codes_shooter_health_issues)

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,Relationship Status,Employment Status,Employment Type,Military Service,Highest Level of Justice System Involvement,Parental Divorce / Separation,Childhood SES,Mental Illness,Known Family Mental Health History,Autism Spectrum,Health Issues
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Convicted,No evidence,Middle class,Mood disorder,No evidence,No evidence,Yes
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,,No evidence,Middle class,Thought disorder,No evidence,No evidence,No evidence
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,Suspected,No evidence,Middle class,Indication of psychiatric disorder but no diag...,No evidence,No evidence,No evidence
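
The `replace_code_by_value` helper lives in the local `codebook` module and is not shown here. As a rough sketch of what such a helper might do, assuming it maps numeric codes to labels in place through a dictionary (the function name and the code table below are hypothetical stand-ins, not the project's actual code):

```python
import pandas as pd

def replace_code_by_value_sketch(df, column, codes):
    """Hypothetical stand-in: map the numeric codes in `column` to labels, in place."""
    # Codes missing from the dictionary become NaN
    df[column] = df[column].map(codes)

# Hypothetical code table; the real tables are defined in codebook.py
codes_gender_example = {0: 'Male', 1: 'Female'}

demo = pd.DataFrame({'Gender': [0, 1, 0]})
replace_code_by_value_sketch(demo, 'Gender', codes_gender_example)
```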


### Simplify complex fields

In [37]:
shooter_profile['Health Issues'].value_counts()

No evidence    148
Yes             45
Name: Health Issues, dtype: int64

In [42]:
# Gender: replace any value other than 'Male' or 'Female' by 'Other'
shooter_profile.loc[~shooter_profile['Gender'].isin(['Male', 'Female']), 'Gender'] = 'Other'

# Race: group less common races under 'Other'
other_races = ['Middle Eastern', 'Unknown', 'Native American']
shooter_profile.loc[shooter_profile['Race'].isin(other_races), 'Race'] = 'Other'

# Employment status, replace empty field by 'Unknown'
shooter_profile.loc[shooter_profile['Employment Status'] == ' ', 'Employment Status'] = 'Unknown'

# Military service: if training not completed -> No
shooter_profile.loc[shooter_profile['Military Service'] == 'Joined but did not make it through training', 'Military Service'] = 'No'

# Change Highest Level of Justice System Involvement to Arrested or Not Arrested
shooter_profile = shooter_profile.rename(columns={'Highest Level of Justice System Involvement': 'Arrested'})
shooter_profile.loc[shooter_profile['Arrested'] == 'Arrested', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'Charged', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'Convicted', 'Arrested'] = 'Yes'
shooter_profile.loc[shooter_profile['Arrested'] == 'NA', 'Arrested'] = 'No'
shooter_profile.loc[shooter_profile['Arrested'] == 'Suspected', 'Arrested'] = 'No'

# Mental illness: an indication without diagnosis counts as 'No evidence'; any other value becomes 'Yes'
shooter_profile.loc[shooter_profile['Mental Illness'] == 'Indication of psychiatric disorder but no diagnosis', 'Mental Illness'] = 'No evidence'
shooter_profile.loc[shooter_profile['Mental Illness'] != 'No evidence', 'Mental Illness'] = 'Yes'

# Family mental health history: anything other than 'No evidence' becomes 'Yes'
shooter_profile.loc[shooter_profile['Known Family Mental Health History'] != 'No evidence', 'Known Family Mental Health History'] = 'Yes'
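
The chained `.loc` assignments for `Arrested` can also be written as a single lookup table. A minimal sketch, assuming the column only contains the five labels handled above (the sample values are illustrative):

```python
import pandas as pd

# One entry per original label, mirroring the assignments in the cell above
arrest_map = {
    'Arrested': 'Yes',
    'Charged': 'Yes',
    'Convicted': 'Yes',
    'NA': 'No',
    'Suspected': 'No',
}

arrested = pd.Series(['Convicted', 'NA', 'Suspected', 'Charged'])
# replace() leaves any label that is missing from the mapping unchanged
simplified = arrested.replace(arrest_map)
```

Keeping the mapping in one dictionary makes it easier to review which justice-system outcomes count as an arrest.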

### Rename columns with shorter names

In [60]:
shooter_profile = shooter_profile.rename(columns={
    'Relationship Status': 'RelStatus',
    'Employment Status': 'Employed',
    'Employment Type': 'Work',
    'Military Service': 'MilService',
    'Parental Divorce / Separation': 'ParentDivorce',
    'Childhood SES': 'SES',
    'Mental Illness': 'MentalIllness',
    'Known Family Mental Health History': 'MentalIllnessHistory',
    'Autism Spectrum': 'Autism',
    'Health Issues': 'HealthIssues'
})

shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,RelStatus,Employed,Work,MilService,Arrested,ParentDivorce,SES,MentalIllness,MentalIllnessHistory,Autism,HealthIssues
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Yes,No evidence,Middle class,Yes,No evidence,No evidence,Yes
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,No,No evidence,Middle class,Yes,No evidence,No evidence,No evidence
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,No,No evidence,Middle class,No evidence,No evidence,No evidence,No evidence


### Add classification column
- 1 = participated in a mass shooting
- 0 = did not participate

In [61]:
shooter_profile['Classification'] = 1
shooter_profile.head(3)

Unnamed: 0,Age,Gender,Race,Immigrant,Education,RelStatus,Employed,Work,MilService,Arrested,ParentDivorce,SES,MentalIllness,MentalIllnessHistory,Autism,HealthIssues,Classification
0,25,Male,White,No,Some college/trade school,Married,Not working,Unknown,Yes,Yes,No evidence,Middle class,Yes,No evidence,No evidence,Yes,1
1,18,Male,White,No,Less than high school,Single,Not working,Unknown,No,No,No evidence,Middle class,Yes,No evidence,No evidence,No evidence,1
2,39,Male,White,No,Some college/trade school,Married,Working,In between,Yes,No,No evidence,Middle class,No evidence,No evidence,No evidence,No evidence,1


### Save table as CSV

In [62]:
csv_out = Path('model_blackbox_shooters.csv')
shooter_profile.to_csv(csv_out, index=False)

In [64]:
blackbox_fields = shooter_profile.columns
blackbox_fields

Index(['Age', 'Gender', 'Race', 'Immigrant', 'Education', 'RelStatus',
       'Employed', 'Work', 'MilService', 'Arrested', 'ParentDivorce', 'SES',
       'MentalIllness', 'MentalIllnessHistory', 'Autism', 'HealthIssues',
       'Classification'],
      dtype='object')

## Step 2: Load census data to compare shooters to the general population
### Employment, income, family, age, gender, race

In [67]:
len(renamed_variables)

44

In [73]:
from census import Census
from us import states

c1 = Census("01518d8c4cd0a768de3d4e54c9b2e07b24400070", year=2018)

# Data for male only (acs5dp = 5-year estimates, dataprofiles)
# Variables: https://api.census.gov/data/2018/acs/acs5/profile/variables.html
census_variables = [
    'DP03_0004PE',
    'DP03_0005PE',
    'DP03_0006PE',
    'DP03_0007PE',
    'DP03_0047PE',
    'DP03_0048PE',
    'DP03_0049PE',
    'DP03_0050PE',
    'DP03_0052PE',
    'DP03_0053PE',
    'DP03_0054PE',
    'DP03_0055PE',
    'DP03_0056PE',
    'DP03_0057PE',
    'DP03_0058PE',
    'DP03_0059PE',
    'DP03_0060PE',
    'DP03_0061PE',
    'DP03_0075PE',
    'DP03_0089PE',
    'DP03_0133PE',
    'DP05_0005PE',
    'DP05_0006PE',
    'DP05_0007PE',
    'DP05_0008PE',
    'DP05_0009PE',
    'DP05_0010PE',
    'DP05_0011PE',
    'DP05_0012PE',
    'DP05_0013PE',
    'DP05_0014PE',
    'DP05_0015PE',
    'DP05_0016PE',
    'DP05_0017PE',
    'DP05_0028E',
    'DP05_0037PE',
    'DP05_0038PE',
    'DP05_0039PE',
    'DP05_0044PE',
    'DP05_0052PE',
    'DP05_0057PE',
    'DP05_0058PE',
    'DP05_0071PE',
    'DP05_0077PE',
    'DP05_0078PE',
    'DP05_0079PE',
    'DP05_0080PE',
    'DP05_0081PE',
    'DP05_0082PE',
    'DP05_0083PE'
]

renamed_variables = [
    'Employed_Employed',
    'Employed_Unemployed',
    'Employed_Military',
    'Employed_NotLabour',
    'Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Private wage and salary workers',
    'Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Government workers',
    'Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Self-employed in own not incorporated business workers',
    'Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Unpaid family workers',
    'SES less than $10,000',
    'SES between $10,000 to $14,999',
    'SES between $15,000 to $24,999',
    'SES between $25,000 to $34,999',
    'SES between $35,000 to $49,999',
    'SES between $50,000 to $74,999',
    'SES between $75,000 to $99,999',
    'SES between $100,000 to $149,999',
    'SES between $150,000 to $199,999',
    'SES $200,000 or more',
    'Families households',
    'Nonfamily households',
    'SES Below poverty line',
    'Age Under 5 years',
    'Age 5 to 9 years',
    'Age 10 to 14 years',
    'Age 15 to 19 years',
    'Age 20 to 24 years',
    'Age 25 to 34 years',
    'Age 35 to 44 years',
    'Age 45 to 54 years',
    'Age 55 to 59 years',
    'Age 60 to 64 years',
    'Age 65 to 74 years',
    'Age 75 to 84 years',
    'Age 85 years and over',
    'Gender ratio (males per 100 females)',
    'Race White',
    'Race Black or African American',
    'Race American Indian and Alaska Native',
    'Race Asian',
    'Race Native Hawaiian and Other Pacific Islander',
    'Race Some other race',
    'Race Two or more races',
    'Race Hispanic or Latino (of any race)',
    'Race Not Hispanic or Latino!!White alone',
    'Race Not Hispanic or Latino!!Black or African American alone',
    'Race Not Hispanic or Latino!!American Indian and Alaska Native alone',
    'Race Not Hispanic or Latino!!Asian alone',
    'Race Not Hispanic or Latino!!Native Hawaiian and Other Pacific Islander alone',
    'Race Not Hispanic or Latino!!Some other race alone',
    'Race Not Hispanic or Latino!!Two or more races'
]

census_data = c1.acs5dp.get(tuple(census_variables),{'for': 'state:*'})

# Convert to DataFrame
census1_pd = pd.DataFrame(census_data)

# Rename columns (the two lists are paired by position)
census1_pd = census1_pd.rename(columns=dict(zip(census_variables, renamed_variables)))

# Replace state codes by two-letter abbreviations
for index in census1_pd.index:
    state = states.lookup(census1_pd.loc[index, 'state'])
    # lookup() returns None for codes it does not recognise; these are labelled 'DC'
    census1_pd.loc[index, 'state'] = state.abbr if state is not None else 'DC'



In [77]:
census1_pd.head(3)

Unnamed: 0,Employed_Employed,Employed_Unemployed,Employed_Military,Employed_NotLabour,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Private wage and salary workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Government workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Self-employed in own not incorporated business workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Unpaid family workers,"SES less than $10,000","SES between $10,000 to $14,999",...,Race Hispanic or Latino (of any race),Race Not Hispanic or Latino!!White alone,Race Not Hispanic or Latino!!Black or African American alone,Race Not Hispanic or Latino!!American Indian and Alaska Native alone,Race Not Hispanic or Latino!!Asian alone,Race Not Hispanic or Latino!!Native Hawaiian and Other Pacific Islander alone,Race Not Hispanic or Latino!!Some other race alone,GEO_ID,state,Race Not Hispanic or Latino!!Two or more races
0,53.3,3.8,0.3,42.6,79.3,15.3,5.2,0.2,8.8,6.3,...,4.2,65.7,26.4,0.5,1.3,0.0,0.2,0400000US01,AL,1.7
1,61.4,4.9,3.0,30.7,67.8,25.5,6.4,0.3,4.0,3.0,...,6.9,61.0,3.1,14.0,6.2,1.2,0.2,0400000US02,AK,7.4
2,55.4,3.8,0.3,40.5,80.1,13.7,6.1,0.2,6.8,4.4,...,31.1,55.1,4.1,3.9,3.2,0.2,0.1,0400000US04,AZ,2.2
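
The renaming above pairs the variable codes and the readable labels purely by position, so a missing or extra entry in either list would silently shift every label after it. A small sketch of a guard (the helper name is hypothetical, and the two short lists are placeholder excerpts of `census_variables` and `renamed_variables`):

```python
def build_rename_map(codes, labels):
    """Pair each census variable code with its readable label, refusing mismatched lists."""
    if len(codes) != len(labels):
        raise ValueError(f"{len(codes)} codes but {len(labels)} labels")
    return dict(zip(codes, labels))

# Placeholder excerpts of the two lists used above
codes = ['DP03_0004PE', 'DP03_0005PE']
labels = ['Employed_Employed', 'Employed_Unemployed']
rename_map = build_rename_map(codes, labels)
```

The resulting dictionary can be passed directly to `DataFrame.rename(columns=rename_map)`.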


### Education, disability, immigration, marital status

In [76]:
c2 = Census("01518d8c4cd0a768de3d4e54c9b2e07b24400070", year=2018)

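# Variables: https://api.census.gov/data/2018/acs/acs5/profile/variables.html
# Note: the query below uses acs1dp (1-year estimates), whereas the first query used acs5dp (5-year estimates)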
census_variables = [
    'DP02_0059PE',
    'DP02_0060PE',
    'DP02_0061PE',
    'DP02_0062PE',
    'DP02_0063PE',
    'DP02_0064PE',
    'DP02_0065PE',
    'DP02_0066PE',
    'DP02_0067PE',
    'DP02_0071PE',
    'DP02_0094PE',
    'DP02_0095PE',
    'DP02_0092PE',
    'DP02_0025PE',
    'DP02_0026PE',
    'DP02_0027PE',
    'DP02_0028PE',
    'DP02_0029PE',
    'DP02_0031PE',
    'DP02_0032PE',
    'DP02_0033PE',
    'DP02_0034PE',
    'DP02_0035PE'
]

renamed_variables = [
    'Education Less than 9th grade',
    'Education 9th to 12th grade, no diploma', 
    'Education High school graduate (includes equivalency)', 
    'Education Some college, no degree', 
    "Education Associate's degree", 
    "Education Bachelor's degree", 
    "Education Graduate or professional degree", 
    "Education High school graduate or higher", 
    "Education Bachelor's degree or higher",
    "HealthIssues Total Civilian Noninstitutionalized Population!!With a disability",
    "Immigrant Naturalized U.S. citizen",
    "Immigrant Not a U.S. citizen",
    "Immigrant Foreign born",
    "RelStatus Male Never married",
    "RelStatus Male Now married, except separated",
    "RelStatus Male Separated",
    "RelStatus Male Widowed",
    "RelStatus Male Divorced",
    "RelStatus Female Never married",
    "RelStatus Female Now married, except separated",
    "RelStatus Female Separated",
    "RelStatus Female Widowed",
    "RelStatus Femaleale Divorced"
]

census_data = c2.acs1dp.get(tuple(census_variables),{'for': 'state:*'})

# Convert to DataFrame
census2_pd = pd.DataFrame(census_data)

# Rename columns (the two lists are paired by position)
census2_pd = census2_pd.rename(columns=dict(zip(census_variables, renamed_variables)))

# Replace state codes by two-letter abbreviations
for index in census2_pd.index:
    state = states.lookup(census2_pd.loc[index, 'state'])
    # lookup() returns None for codes it does not recognise; these are labelled 'DC'
    census2_pd.loc[index, 'state'] = state.abbr if state is not None else 'DC'

In [78]:
census2_pd.head(3)

Unnamed: 0,Education Less than 9th grade,"Education 9th to 12th grade, no diploma",Education High school graduate (includes equivalency),"Education Some college, no degree",Education Associate's degree,Education Bachelor's degree,Education Graduate or professional degree,Education High school graduate or higher,Education Bachelor's degree or higher,HealthIssues Total Civilian Noninstitutionalized Population!!With a disability,...,"RelStatus Male Now married, except separated",RelStatus Male Separated,RelStatus Male Widowed,RelStatus Male Divorced,RelStatus Female Never married,"RelStatus Female Now married, except separated",RelStatus Female Separated,RelStatus Female Widowed,RelStatus Femaleale Divorced,state
0,2.7,5.2,30.6,20.5,11.0,19.4,10.6,92.1,30.0,11.6,...,50.8,0.9,2.6,10.1,29.1,49.9,1.2,8.4,11.4,WI
1,1.8,4.9,28.8,26.4,11.2,16.9,10.0,93.3,26.9,12.8,...,52.7,1.4,3.3,12.2,24.7,53.1,1.3,7.6,13.3,WY
2,,,,,,,,,,,...,,,,,,,,,,PR


### Merge census data and drop Puerto Rico data

In [91]:
census_pd = pd.merge(census1_pd, census2_pd, how='outer', on='state')

# Drop row for Puerto Rico (PR)
census_pd = census_pd[census_pd['state'] != 'PR']

# Rename state to State
census_pd = census_pd.rename(columns={'state': 'State'})

census_pd.head(3)

Unnamed: 0,Employed_Employed,Employed_Unemployed,Employed_Military,Employed_NotLabour,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Private wage and salary workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Government workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Self-employed in own not incorporated business workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Unpaid family workers,"SES less than $10,000","SES between $10,000 to $14,999",...,RelStatus Male Never married,"RelStatus Male Now married, except separated",RelStatus Male Separated,RelStatus Male Widowed,RelStatus Male Divorced,RelStatus Female Never married,"RelStatus Female Now married, except separated",RelStatus Female Separated,RelStatus Female Widowed,RelStatus Femaleale Divorced
0,53.3,3.8,0.3,42.6,79.3,15.3,5.2,0.2,8.8,6.3,...,34.1,49.3,1.9,3.1,11.6,28.1,45.7,2.7,10.3,13.2
1,61.4,4.9,3.0,30.7,67.8,25.5,6.4,0.3,4.0,3.0,...,37.3,49.2,1.6,2.0,9.9,29.5,50.3,1.4,5.9,12.9
2,55.4,3.8,0.3,40.5,80.1,13.7,6.1,0.2,6.8,4.4,...,36.7,48.0,1.5,2.7,11.1,30.0,46.2,2.0,8.5,13.3
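
An outer merge keeps states that appear in only one of the two tables and fills the missing side with NaN. A minimal sketch with made-up values (the frames below are placeholders, not the census tables) showing how `indicator=True` could reveal such rows before deciding what to drop:

```python
import pandas as pd

# Placeholder frames with made-up values; 'PR' is present on the left side only
left = pd.DataFrame({'state': ['AL', 'AK', 'PR'], 'left_value': [1.0, 2.0, 3.0]})
right = pd.DataFrame({'state': ['AL', 'AK'], 'right_value': [10.0, 20.0]})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'
merged = pd.merge(left, right, how='outer', on='state', indicator=True)
unmatched = merged.loc[merged['_merge'] != 'both', 'state'].tolist()
```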


## Step 3: Get mental illness, arrest and autism data from other data sources

In [88]:
# Import mental illness data per state
csv = Path('../raw_data/mental-health-statistics-by-state-[updated-may-2023].csv')
mentalillness_df = pd.read_csv(csv)

# Replace state names by their 2-letter state codes
for key in codebook.codes_states.keys():
    mentalillness_df.loc[mentalillness_df['state']==key,'state'] = codebook.codes_states[key]

# Rename columns and keep only state and mental illness rate columns
mentalillness_df = mentalillness_df.rename(columns={'state': 'State', 'RatesOfMentalIllness':'Mental_Illness_Rate'})
mentalillness_df = mentalillness_df[['State', 'Mental_Illness_Rate']]
mentalillness_df.head(3)

Unnamed: 0,State,Mental_Illness_Rate
0,UT,29.68
1,OR,27.33
2,WV,26.05


In [89]:
# Import arrest data
csv = Path('../raw_data/FBI_2018_crimeByStates.csv')
arrests_df = pd.read_csv(csv)

# Replace state names by their 2-letter state codes
for key in codebook.codes_states.keys():
    arrests_df.loc[arrests_df['State']==key,'State'] = codebook.codes_states[key]

# Delete rows with any missing value
arrests_df = arrests_df.dropna(how='any')

# Recalculate the rate as arrests per 100 inhabitants (percent)
arrests_df['Rate'] = 100*arrests_df['Arrests']/arrests_df['Population']

# Rename columns and keep only state and arrest rate columns
arrests_df = arrests_df.rename(columns={'Rate':'Arrest_Rate'})
arrests_df = arrests_df[['State', 'Arrest_Rate']]

arrests_df.head(3)

Unnamed: 0,State,Arrest_Rate
0,AL,2.586075
1,AK,4.152213
2,AZ,3.580559


In [90]:
# Import Autism Data
csv = Path('../clean_data/clean_autism.csv')
autism_df = pd.read_csv(csv)

# Rename Prevalence to Autism_Rate and keep only the State and Autism_Rate columns
autism_df = autism_df.rename(columns={'Prevalence':'Autism_Rate'})
autism_df = autism_df[['State', 'Autism_Rate']]

autism_df.head(3)

Unnamed: 0,State,Autism_Rate
0,AL,2.12
1,AK,2.19
2,AZ,2.29


## Step 4: Merge all statistics

In [92]:
statistics_df = pd.merge(census_pd, mentalillness_df, how='inner', on='State')
statistics_df = pd.merge(statistics_df, arrests_df, how='inner', on='State')
statistics_df = pd.merge(statistics_df, autism_df, how='inner', on='State')

statistics_df

Unnamed: 0,Employed_Employed,Employed_Unemployed,Employed_Military,Employed_NotLabour,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Private wage and salary workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Government workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Self-employed in own not incorporated business workers,Percent Estimate!!CLASS OF WORKER!!Civilian employed population 16 years and over!!Unpaid family workers,"SES less than $10,000","SES between $10,000 to $14,999",...,RelStatus Male Widowed,RelStatus Male Divorced,RelStatus Female Never married,"RelStatus Female Now married, except separated",RelStatus Female Separated,RelStatus Female Widowed,RelStatus Femaleale Divorced,Mental_Illness_Rate,Arrest_Rate,Autism_Rate
0,53.3,3.8,0.3,42.6,79.3,15.3,5.2,0.2,8.8,6.3,...,3.1,11.6,28.1,45.7,2.7,10.3,13.2,21.24,2.586075,2.12
1,61.4,4.9,3.0,30.7,67.8,25.5,6.4,0.3,4.0,3.0,...,2.0,9.9,29.5,50.3,1.4,5.9,12.9,22.2,4.152213,2.19
2,55.4,3.8,0.3,40.5,80.1,13.7,6.1,0.2,6.8,4.4,...,2.7,11.1,30.0,46.2,2.0,8.5,13.3,23.89,3.580559,2.29
3,54.7,3.2,0.2,42.0,78.4,15.3,6.2,0.2,8.1,6.5,...,3.0,11.9,26.5,46.7,2.4,10.2,14.3,22.61,3.989615,2.03
4,58.9,4.2,0.4,36.5,78.3,13.4,8.1,0.2,5.1,4.4,...,2.2,7.5,33.9,45.3,2.4,7.7,10.8,20.49,2.7633,2.36
5,64.4,3.2,0.8,31.7,80.3,13.3,6.2,0.2,5.1,3.5,...,2.2,10.3,28.9,49.5,1.7,6.7,13.2,23.16,3.392395,2.28
6,61.9,4.3,0.3,33.5,80.8,12.8,6.3,0.2,5.2,3.6,...,2.5,9.3,32.0,45.3,1.5,8.8,12.4,18.77,2.678925,2.37
7,58.6,3.7,0.4,37.3,81.3,14.5,4.1,0.1,5.6,3.6,...,3.0,8.8,31.1,45.4,1.9,8.5,13.0,20.52,2.97176,2.26
8,54.7,3.7,0.3,41.3,82.2,11.7,5.9,0.2,6.8,4.7,...,3.2,11.4,28.7,43.9,2.4,10.2,14.8,17.49,3.358905,2.03
9,58.4,4.0,0.6,37.0,79.9,14.5,5.4,0.2,7.3,4.8,...,2.4,9.7,32.4,44.2,2.5,8.2,12.7,17.55,1.907348,2.25
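
Because all three merges use `how='inner'`, a state missing from any one source disappears from `statistics_df`. A small sketch of a check that could be run on the `State` columns beforehand (the helper name is hypothetical and the lists are placeholders, not the actual contents of the sources):

```python
def states_lost_by_inner_join(*state_lists):
    """Return the states that appear in at least one list but not in all of them."""
    sets = [set(s) for s in state_lists]
    return sorted(set.union(*sets) - set.intersection(*sets))

# Placeholder lists for illustration only
census_states = ['AL', 'AK', 'AZ', 'DC']
mental_states = ['AL', 'AK', 'AZ']
arrest_states = ['AL', 'AK', 'AZ', 'DC']
lost = states_lost_by_inner_join(census_states, mental_states, arrest_states)
```

An empty result would confirm that the inner merges keep every state.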


### Save statistics as CSV

In [None]:
csv_out = Path('model_blackbox_stats.csv')
statistics_df.to_csv(csv_out, index=False)
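
The saved statistics are the input for creating modelled individuals. As a hedged sketch of how state-level percentages could be turned into individuals (not necessarily the method used by this project), the example below draws one field by weighted random sampling. The helper name is hypothetical, and the weights are the four employment percentages from the first row (AL) of the table above:

```python
import random

def sample_category(percentages, rng):
    """Draw one category label with probability proportional to its percentage."""
    labels = list(percentages)
    weights = [percentages[label] for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

# Employment percentages copied from the first row (AL) of the statistics table
employment_al = {
    'Employed': 53.3,
    'Unemployed': 3.8,
    'Military': 0.3,
    'NotLabour': 42.6,
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
individuals = [
    {'State': 'AL', 'Employed': sample_category(employment_al, rng), 'Classification': 0}
    for _ in range(1000)
]
```

Drawing each field separately in this way treats the fields as independent, which is a simplification: real individuals would show correlations between, for example, age and employment.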