<div style='background-color: orange'>
<a id="TableOfContents"></a>
    <h1 style='text-align: center'>
        <b><i>
            TABLE OF CONTENTS:
        </i></b></h1>
<li><a href='#imports'>Imports</a></li>
<li><a href="#acquire">Acquire</a></li>
<li><a href='#prepare'>Prepare</a></li>
<li><a href="#wrangle">Wrangle</a></li>
<li><a href='#misc'>Miscellaneous</a></li>

<div style='background-color: orange'>
<a id="imports"></a>
    <h1 style='text-align: center'>
        <b><i>
            Imports
        </i></b></h1>
<li><a href='#TableOfContents'>Table of Contents</a></li>

In [184]:
# Vectorization and tables
import numpy as np
import pandas as pd

# Ignore Warnings
import warnings
warnings.filterwarnings('ignore')

# .py files
import wrangle as w

<div style='background-color: orange'>
<a id="acquire"></a>
    <h1 style='text-align: center'>
        <b><i>
            Acquire
        </i></b></h1>
<li><a href='#TableOfContents'>Table of Contents</a></li>

Acquire the full, unmodified ("vanilla") mass_shooters database from its Excel workbook.

- mass_shooters Vanilla Shape:
    - Rows: 189
    - Columns: 153

In [144]:
# Acquire the vanilla data
mass_shooters = pd.read_excel('mass_shooters.xlsx', sheet_name='Full Database', header=1)
mass_shooters.shape

(189, 153)

In [145]:
# Verify .py file functionality
mass_shooters_py = w.acquire_mass_shooters()
mass_shooters_py.shape

(189, 153)

<div style='background-color: orange'>
<a id="prepare"></a>
    <h1 style='text-align: center'>
        <b><i>
            Prepare
        </i></b></h1>
<li><a href='#TableOfContents'>Table of Contents</a></li>
<li><a href='#preparedrop'>Drop Data</a></li>
<li><a href='#preparenull'>Null Handling</a></li>
<li><a href='#preparedtypes'>Dtype Cleaning</a></li>
<li><a href='#preparedisseminate'>Disseminate Column Information</a></li>
<li><a href='#prepareaggregate'>Aggregate Column Creation</a></li>
<li><a href='#preparetext'>Text Modifications</a></li>
<li><a href='#preparesummary'>Summary of Preparation</a></li>

<a id='preparedrop'></a>
<h3><b><i>
    Drop Data
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

- 1 Row
    - Missing majority of info
- 50 columns (29 dropped manually, 21 by null cutoff)
    - Useless in scope of predictive value
    - Percent nulls above 20%
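
The null-percentage cutoff can also be expressed without an explicit loop. The following is a minimal sketch on a toy frame, not the notebook's own helper; `drop_high_null_cols` and the sample columns are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def drop_high_null_cols(df, cutoff=0.20):
    """Drop columns whose share of nulls is strictly above `cutoff`.

    Hypothetical helper for illustration; returns the trimmed frame and a
    Series holding the null fraction of each dropped column.
    """
    null_pct = df.isna().mean()            # fraction of nulls per column
    dropped = null_pct[null_pct > cutoff]  # columns over the cutoff
    return df.drop(columns=dropped.index), dropped


toy = pd.DataFrame({
    'Age': [25, 31, 40, 19, 52],
    'Height': [np.nan, np.nan, np.nan, 70, np.nan],  # 80% null, dropped
    'Religion': [1, np.nan, 0, 2, 1],                 # exactly 20% null, kept
})
trimmed, dropped = drop_high_null_cols(toy)
print(list(trimmed.columns))
print(dropped)
```

Because the comparison is strict, a column sitting exactly at the cutoff is kept, which matches the `>` comparison used in `drop_nullpct`.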

In [146]:
# Remove row '145, 146' (Too many nulls)
mass_shooters = mass_shooters.drop(mass_shooters[mass_shooters['Case #'] == '145, 146'].index)

In [147]:
# Drop cols that aren't necessary
# Generally anything that would've been known during
# and/or after the shooting
# 29 columns
drop_perpetratorname_cols = [
    'Shooter Last Name',
    'Shooter First Name'
]

drop_date_cols = [
    'Full Date'
]

drop_location_cols = [
    'Street Number',
    'Street Name',
    'Zip Code',
    'Latitude',
    'Longitude',
    'State Code',
    'Region',
    'Metro/Micro Statistical Area Type',
    'Location',
    'Insider or Outsider',
    'Workplace Shooting',
    'Multiple Locations',
    'Other Location',
    'Armed Person on Scene',
    'Specify Armed Person'
]

drop_victim_cols = [
    'Family Member Victim',
    'Romantic Partner Victim',
    'Kidnapping or Hostage Situation'
]

drop_weapons_cols = [
    'Total Firearms Brought to the Scene',
    'Other Weapons or Gear',
    'Specify Other Weapons or Gear',
]

drop_resolutionofcase_cols = [
    'On-Scene Outcome',
    'Who Killed Shooter On Scene',
    'Attempt to Flee',
    'Insanity Defense',
    'Criminal Sentence'
]

In [148]:
# Drop the above cols
mass_shooters = mass_shooters.drop(columns=drop_perpetratorname_cols)
mass_shooters = mass_shooters.drop(columns=drop_date_cols)
mass_shooters = mass_shooters.drop(columns=drop_location_cols)
mass_shooters = mass_shooters.drop(columns=drop_victim_cols)
mass_shooters = mass_shooters.drop(columns=drop_weapons_cols)
mass_shooters = mass_shooters.drop(columns=drop_resolutionofcase_cols)

In [149]:
mass_shooters.shape

(188, 124)

In [150]:
# Drop cols based off of a null percent cutoff
# 21 columns
mass_shooters, drop_null_pct_dict = drop_nullpct(mass_shooters, 0.20)

In [151]:
# Get list and percentages of everything dropped
pd.DataFrame(drop_null_pct_dict)

Unnamed: 0,column_name,percent_null
0,Height,0.702128
1,Weight,0.760638
2,Religion,0.505319
3,Education,0.265957
4,School Performance,0.531915
5,School Performance Specified,0.526596
6,Birth Order,0.420213
7,Number of Siblings,0.255319
8,Older Siblings,0.43617
9,Younger Siblings,0.446809


In [152]:
mass_shooters.shape

(188, 103)

---

<a id='preparenull'></a>
<h3><b><i>
    Null Handling
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

45 of the 103 remaining columns contain nulls:

- Fill with mode
    - 44 columns
- Fill uniquely
    - 1 column ('Signs of Crisis Expanded', filled with 'None')
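
As a point of comparison, the same fill strategy can be sketched on a toy frame: one column gets a sentinel value and the rest get their per-column mode. The frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Signs of Crisis Expanded': ['Isolation', np.nan, 'Paranoia'],
    'Immigrant': [0.0, 0.0, np.nan],
    'Bullied': [1.0, np.nan, 1.0],
})

# Fill the free-text column with a sentinel first so it is not mode-filled
toy['Signs of Crisis Expanded'] = toy['Signs of Crisis Expanded'].fillna('None')

# DataFrame.mode() returns one row per tied mode; row 0 holds the first mode
# of every column, so it can be passed straight to fillna
toy = toy.fillna(toy.mode().iloc[0])
print(toy)
```

Note that mode-filling treats every missing value as the most common category, which can understate rarer categories in columns with many nulls.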

In [153]:
# Identify cols with nulls
has_nulls = check_nulls(mass_shooters)
len(has_nulls)

45

In [154]:
# Filling nulls
mass_shooters['Signs of Crisis Expanded'] = mass_shooters['Signs of Crisis Expanded'].fillna('None')
for col in has_nulls:
    mass_shooters[col] = mass_shooters[col].fillna(mass_shooters[col].mode()[0])

In [155]:
# Recheck for any nulls
has_nulls_verify = check_nulls(mass_shooters)
len(has_nulls_verify)

0

---

<a id='preparedtypes'></a>
<h3><b><i>
    Dtype Cleaning
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

87 of the 103 columns adjusted:
- Object
    - 4 columns to int
    - 3 columns fixed
         - 'Day',
             - '19-20' == 19
         - 'Race',
             - 'Moroccan' == 6
             - 'Bosnian' == 7
         - 'Criminal Record',
             - '1`' == 1
- Float
    - 79 columns to int
    - 3 columns fixed
         - 'Gender',
             - 3.0 == 0
         - 'Children',
             - 5.0 == 0
         - 'History of Domestic Abuse',
             - 3.0 == 2
- Int
    - Nothing changed
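
Casting floats to int truncates fractional values and raises on NaN, so it can help to check both conditions before calling `astype(int)`. The following is a minimal sketch on toy data; `safe_float_to_int` is a hypothetical helper, not part of the notebook or of `wrangle.py`.

```python
import pandas as pd


def safe_float_to_int(df):
    """Cast float columns to int only when no information would be lost.

    Hypothetical helper: columns with nulls or fractional values are left
    untouched and their names are reported back to the caller.
    """
    out = df.copy()
    skipped = []
    for col in out.select_dtypes(include='float').columns:
        s = out[col]
        if s.isna().any() or not (s % 1 == 0).all():
            skipped.append(col)
            continue
        out[col] = s.astype(int)
    return out, skipped


toy = pd.DataFrame({
    'Gender': [0.0, 1.0, 3.0],
    'Age': [25.5, 31.0, 40.0],
})
# Recode the stray category first, mirroring the 3.0 to 0 fix for 'Gender'
toy['Gender'] = toy['Gender'].replace(3.0, 0.0)
converted, skipped = safe_float_to_int(toy)
print(converted.dtypes)
print(skipped)
```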

In [156]:
# Ensure there are no ' ' values...
for col in mass_shooters.columns:
    if mass_shooters[col].dtype == 'object':
        if mass_shooters[col].apply(lambda x: isinstance(x, str) and x.isspace()).any():
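            # Assign back instead of using inplace=True on a column selection,
            # which may operate on a copy in newer pandas versions
            mass_shooters[col] = mass_shooters[col].replace(r'^\s*$', np.nan, regex=True)
            mass_shooters[col] = mass_shooters[col].fillna(mass_shooters[col].mode()[0])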

In [158]:
# Change Object Columns
mass_shooters['Case #'] = mass_shooters['Case #'].astype(int)
mass_shooters['Day'] = np.where(mass_shooters['Day'] == '19-20', 19, mass_shooters['Day'])
mass_shooters['Day'] = mass_shooters['Day'].astype(int)
mass_shooters['Race'] = np.where(mass_shooters['Race'] == 'Moroccan', 6, mass_shooters['Race'])
mass_shooters['Race'] = np.where(mass_shooters['Race'] == 'Bosnian', 7, mass_shooters['Race'])
mass_shooters['Race'] = mass_shooters['Race'].astype(int)
mass_shooters['Criminal Record'] = np.where(mass_shooters['Criminal Record'] == '1`', 1, mass_shooters['Criminal Record'])
mass_shooters['Criminal Record'] = mass_shooters['Criminal Record'].astype(int)

In [229]:
# Make preliminary changes before astype changes
mass_shooters['Gender'] = np.where(mass_shooters['Gender'] == 3.0, 0, mass_shooters['Gender'])
mass_shooters['Children'] = np.where(mass_shooters['Children'] == 5.0, 0, mass_shooters['Children'])
mass_shooters['History of Domestic Abuse'] = np.where(mass_shooters['History of Domestic Abuse'] == 3.0, 2, mass_shooters['History of Domestic Abuse'])

In [160]:
# All floats confirmed ready to change to int
for col in mass_shooters.select_dtypes(include=float).columns.to_list():
    mass_shooters[col] = mass_shooters[col].astype(int)

In [161]:
# No changes to int dtypes necessary...

---

<a id='preparedisseminate'></a>
<h3><b><i>
    Disseminate Column Information
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

Because of how the column information is formatted, encoders will not create the binary columns properly, so they must be created manually.

- 31 columns manually disseminated
    - 137 new columns generated
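
The `np.where` lines in this section follow one pattern per source column, so they could be generated from a code-to-label mapping. The sketch below is one possible way to do that, not the notebook's method: `disseminate` is a hypothetical helper, and it assumes multi-valued cells hold comma-separated codes such as '1, 2'. It matches whole tokens rather than substrings, which would matter if a column ever used two-digit codes.

```python
import pandas as pd


def disseminate(df, col, labels, prefix):
    """Add one 0/1 column per code in `labels` (hypothetical helper).

    labels maps a code (as a string) to the suffix of the new column name.
    Cells may hold a single code (3, 3.0, '3') or several ('1, 2').
    """
    def tokens(value):
        # Normalise a cell to a set of code strings:
        # 3.0 becomes {'3'}, '1, 2' becomes {'1', '2'}
        found = set()
        for part in str(value).split(','):
            part = part.strip()
            if part.endswith('.0'):
                part = part[:-2]
            found.add(part)
        return found

    codes = df[col].map(tokens)
    for code, suffix in labels.items():
        df[f'{prefix}_{suffix}'] = codes.map(lambda s, c=code: int(c in s))
    return df


toy = pd.DataFrame({'Substance Use': [0, '1, 2', 3.0, '1']})
toy = disseminate(
    toy,
    'Substance Use',
    {'0': 'no_evidence', '1': 'alcohol', '2': 'marijuana', '3': 'other_drugs'},
    prefix='substance_use',
)
print(toy.filter(like='substance_use_'))
```

Keeping the mapping in a dict also documents the codebook in one place, at the cost of hiding the individual assignments that the explicit cells make visible.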

In [177]:
# Disseminate 'Adult Trauma'
mass_shooters['adult_trauma_no_evidence'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('0'), 1, 0)
mass_shooters['adult_trauma_death_of_parent'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('1'), 1, 0)
mass_shooters['adult_trauma_death_or_loss_of_child'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('2'), 1, 0)
mass_shooters['adult_trauma_death_of_family_member_causing_significant_distress'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('3'), 1, 0)
mass_shooters['adult_trauma_from_war'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('4'), 1, 0)
mass_shooters['adult_trauma_accident'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('5'), 1, 0)
mass_shooters['adult_trauma_other'] = np.where(mass_shooters['Adult Trauma'].astype(str).str.contains('6'), 1, 0)

In [179]:
# Disseminate 'Voluntary or Mandatory Counseling'
mass_shooters['voluntary_or_mandatory_counseling_na'] = np.where(mass_shooters['Voluntary or Mandatory Counseling'].astype(str).str.contains('0'), 1, 0)
mass_shooters['voluntary_or_mandatory_counseling_voluntary'] = np.where(mass_shooters['Voluntary or Mandatory Counseling'].astype(str).str.contains('1'), 1, 0)
mass_shooters['voluntary_or_mandatory_counseling_involuntary'] = np.where(mass_shooters['Voluntary or Mandatory Counseling'].astype(str).str.contains('2'), 1, 0)

In [185]:
# Disseminate 'Mental Illness'
mass_shooters['mental_illness_no_evidence'] = np.where(mass_shooters['Mental Illness'].astype(str).str.contains('0'), 1, 0)
mass_shooters['mental_illness_mood_disorder'] = np.where(mass_shooters['Mental Illness'].astype(str).str.contains('1'), 1, 0)
mass_shooters['mental_illness_thought_disorder'] = np.where(mass_shooters['Mental Illness'].astype(str).str.contains('2'), 1, 0)
mass_shooters['mental_illness_other_psychiatric_disorder'] = np.where(mass_shooters['Mental Illness'].astype(str).str.contains('3'), 1, 0)
mass_shooters['mental_illness_indication_but_no_diagnosis'] = np.where(mass_shooters['Mental Illness'].astype(str).str.contains('4'), 1, 0)

In [187]:
# Disseminate 'Known Family Mental Health History'
mass_shooters['known_family_mental_health_history_no_evidence'] = np.where(mass_shooters['Known Family Mental Health History'].astype(str).str.contains('0'), 1, 0)
mass_shooters['known_family_mental_health_history_parents'] = np.where(mass_shooters['Known Family Mental Health History'].astype(str).str.contains('1'), 1, 0)
mass_shooters['known_family_mental_health_history_other_relative'] = np.where(mass_shooters['Known Family Mental Health History'].astype(str).str.contains('2'), 1, 0)

In [189]:
# Disseminate 'Part I Crimes'
mass_shooters['part_i_crimes_no_evidence'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('0'), 1, 0)
mass_shooters['part_i_crimes_homicide'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('1'), 1, 0)
mass_shooters['part_i_crimes_forcible_rape'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('2'), 1, 0)
mass_shooters['part_i_crimes_robbery'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('3'), 1, 0)
mass_shooters['part_i_crimes_aggravated_assault'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('4'), 1, 0)
mass_shooters['part_i_crimes_burglary'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('5'), 1, 0)
mass_shooters['part_i_crimes_larceny_theft'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('6'), 1, 0)
mass_shooters['part_i_crimes_motor_vehicle_theft'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('7'), 1, 0)
mass_shooters['part_i_crimes_arson'] = np.where(mass_shooters['Part I Crimes'].astype(str).str.contains('8'), 1, 0)

In [191]:
# Disseminate 'Part II Crimes'
mass_shooters['part_ii_crimes_no_evidence'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('0'), 1, 0)
mass_shooters['part_ii_crimes_simple_assault'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('1'), 1, 0)
mass_shooters['part_ii_crimes_fraud_forgery_embezzlement'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('2'), 1, 0)
mass_shooters['part_ii_crimes_stolen_property'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('3'), 1, 0)
mass_shooters['part_ii_crimes_vandalism'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('4'), 1, 0)
mass_shooters['part_ii_crimes_weapons_offenses'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('5'), 1, 0)
mass_shooters['part_ii_crimes_prostitution'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('6'), 1, 0)
mass_shooters['part_ii_crimes_drugs'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('7'), 1, 0)
mass_shooters['part_ii_crimes_dui'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('8'), 1, 0)
mass_shooters['part_ii_crimes_other'] = np.where(mass_shooters['Part II Crimes'].astype(str).str.contains('9'), 1, 0)

In [193]:
# Disseminate 'Domestic Abuse Specified'
mass_shooters['domestic_abuse_specified_na'] = np.where(mass_shooters['Domestic Abuse Specified'].astype(str).str.contains('0'), 1, 0)
mass_shooters['domestic_abuse_specified_non_sexual'] = np.where(mass_shooters['Domestic Abuse Specified'].astype(str).str.contains('1'), 1, 0)
mass_shooters['domestic_abuse_specified_sexual_violence'] = np.where(mass_shooters['Domestic Abuse Specified'].astype(str).str.contains('2'), 1, 0)
mass_shooters['domestic_abuse_specified_threats_coercive_control'] = np.where(mass_shooters['Domestic Abuse Specified'].astype(str).str.contains('3'), 1, 0)
mass_shooters['domestic_abuse_specified_threats_with_deadly_weapon'] = np.where(mass_shooters['Domestic Abuse Specified'].astype(str).str.contains('4'), 1, 0)

In [195]:
# Disseminate 'Recent or Ongoing Stressor'
mass_shooters['recent_or_ongoing_stressor_no_evidence'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('0'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_recent_breakup'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('1'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_employment'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('2'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_economic_stressor'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('3'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_family_issue'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('4'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_legal_issue'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('5'), 1, 0)
mass_shooters['recent_or_ongoing_stressor_other'] = np.where(mass_shooters['Recent or Ongoing Stressor'].astype(str).str.contains('6'), 1, 0)

In [197]:
# Disseminate 'Substance Use'
mass_shooters['substance_use_no_evidence'] = np.where(mass_shooters['Substance Use'].astype(str).str.contains('0'), 1, 0)
mass_shooters['substance_use_alcohol'] = np.where(mass_shooters['Substance Use'].astype(str).str.contains('1'), 1, 0)
mass_shooters['substance_use_marijuana'] = np.where(mass_shooters['Substance Use'].astype(str).str.contains('2'), 1, 0)
mass_shooters['substance_use_other_drugs'] = np.where(mass_shooters['Substance Use'].astype(str).str.contains('3'), 1, 0)

In [199]:
# Disseminate 'Known Prejudices\xa0'
mass_shooters['known_prejudices_no_evidence'] = np.where(mass_shooters['Known Prejudices\xa0'].astype(str).str.contains('0'), 1, 0)
mass_shooters['known_prejudices_racism'] = np.where(mass_shooters['Known Prejudices\xa0'].astype(str).str.contains('1'), 1, 0)
mass_shooters['known_prejudices_misogyny'] = np.where(mass_shooters['Known Prejudices\xa0'].astype(str).str.contains('2'), 1, 0)
mass_shooters['known_prejudices_homophobia'] = np.where(mass_shooters['Known Prejudices\xa0'].astype(str).str.contains('3'), 1, 0)
mass_shooters['known_prejudices_religious_hatred'] = np.where(mass_shooters['Known Prejudices\xa0'].astype(str).str.contains('4'), 1, 0)

In [213]:
# Disseminate 'Urban/Suburban/Rural'
mass_shooters['urban'] = np.where(mass_shooters['Urban/Suburban/Rural'] == 0, 1, 0)
mass_shooters['suburban'] = np.where(mass_shooters['Urban/Suburban/Rural'] == 1, 1, 0)
mass_shooters['rural'] = np.where(mass_shooters['Urban/Suburban/Rural'] == 2, 1, 0)

In [216]:
# Disseminate 'Race'
mass_shooters['race_white'] = np.where(mass_shooters['Race'] == 0, 1, 0)
mass_shooters['race_black'] = np.where(mass_shooters['Race'] == 1, 1, 0)
mass_shooters['race_hispanic'] = np.where(mass_shooters['Race'] == 2, 1, 0)
mass_shooters['race_asian'] = np.where(mass_shooters['Race'] == 3, 1, 0)
mass_shooters['race_middle_eastern'] = np.where(mass_shooters['Race'] == 4, 1, 0)
mass_shooters['race_native_american'] = np.where(mass_shooters['Race'] == 5, 1, 0)
mass_shooters['race_moroccan'] = np.where(mass_shooters['Race'] == 6, 1, 0)
mass_shooters['race_bosnian'] = np.where(mass_shooters['Race'] == 7, 1, 0)

In [217]:
# Disseminate 'Relationship Status'
mass_shooters['relationship_status_single'] = np.where(mass_shooters['Relationship Status'] == 0, 1, 0)
mass_shooters['relationship_status_boyfriend_girlfriend'] = np.where(mass_shooters['Relationship Status'] == 1, 1, 0)
mass_shooters['relationship_status_married'] = np.where(mass_shooters['Relationship Status'] == 2, 1, 0)
mass_shooters['relationship_status_divorce_separated'] = np.where(mass_shooters['Relationship Status'] == 3, 1, 0)

In [219]:
# Disseminate 'Employment Type\xa0'
mass_shooters['employment_type_blue_collar'] = np.where(mass_shooters['Employment Type\xa0'] == 0, 1, 0)
mass_shooters['employment_type_white_collar'] = np.where(mass_shooters['Employment Type\xa0'] == 1, 1, 0)
mass_shooters['employment_type_in_between'] = np.where(mass_shooters['Employment Type\xa0'] == 2, 1, 0)

In [223]:
# Disseminate 'Military Service'
mass_shooters['military_service_no'] = np.where(mass_shooters['Military Service'] == 0, 1, 0)
mass_shooters['military_service_yes'] = np.where(mass_shooters['Military Service'] == 1, 1, 0)
mass_shooters['military_service_joined_but_did_not_complete_training'] = np.where(mass_shooters['Military Service'] == 2, 1, 0)

In [224]:
# Disseminate 'Community Involvement'
mass_shooters['community_involvement_no_evidence'] = np.where(mass_shooters['Community Involvement'] == 0, 1, 0)
mass_shooters['community_involvement_somewhat'] = np.where(mass_shooters['Community Involvement'] == 1, 1, 0)
mass_shooters['community_involvement_heavily_involved'] = np.where(mass_shooters['Community Involvement'] == 2, 1, 0)
mass_shooters['community_involvement_formerly_involved'] = np.where(mass_shooters['Community Involvement'] == 3, 1, 0)

In [225]:
# Disseminate 'Highest Level of Justice System Involvement'
mass_shooters['highest_level_of_justice_system_involvement_na'] = np.where(mass_shooters['Highest Level of Justice System Involvement'] == 0, 1, 0)
mass_shooters['highest_level_of_justice_system_involvement_suspected'] = np.where(mass_shooters['Highest Level of Justice System Involvement'] == 1, 1, 0)
mass_shooters['highest_level_of_justice_system_involvement_arrested'] = np.where(mass_shooters['Highest Level of Justice System Involvement'] == 2, 1, 0)
mass_shooters['highest_level_of_justice_system_involvement_charged'] = np.where(mass_shooters['Highest Level of Justice System Involvement'] == 3, 1, 0)
mass_shooters['highest_level_of_justice_system_involvement_convicted'] = np.where(mass_shooters['Highest Level of Justice System Involvement'] == 4, 1, 0)

In [227]:
# Disseminate 'History of Physical Altercations'
mass_shooters['history_of_physical_altercations_na'] = np.where(mass_shooters['History of Physical Altercations'] == 0, 1, 0)
mass_shooters['history_of_physical_altercations_yes'] = np.where(mass_shooters['History of Physical Altercations'] == 1, 1, 0)
mass_shooters['history_of_physical_altercations_attacked_inanimate_objects_during_arguments'] = np.where(mass_shooters['History of Physical Altercations'] == 2, 1, 0)

In [None]:
# Disseminate 'History of Domestic Abuse'
mass_shooters['history_of_domestic_abuse_na'] = np.where(mass_shooters['History of Domestic Abuse'] == 0, 1, 0)
mass_shooters['history_of_domestic_abuse_abused_romantic_partner'] = np.where(mass_shooters['History of Domestic Abuse'] == 1, 1, 0)
mass_shooters['history_of_domestic_abuse_abused_other_family'] = np.where(mass_shooters['History of Domestic Abuse'] == 2, 1, 0)

In [None]:
# Disseminate 'Known Hate Group or Chat Room Affiliation'
mass_shooters['known_hate_group_or_chat_room_affiliation_no_evidence'] = np.where(mass_shooters['Known Hate Group or Chat Room Affiliation'] == 0, 1, 0)
mass_shooters['known_hate_group_or_chat_room_affiliation_hate_group'] = np.where(mass_shooters['Known Hate Group or Chat Room Affiliation'] == 1, 1, 0)
mass_shooters['known_hate_group_or_chat_room_affiliation_other_radical_group'] = np.where(mass_shooters['Known Hate Group or Chat Room Affiliation'] == 2, 1, 0)
mass_shooters['known_hate_group_or_chat_room_affiliation_inspired_by_hate_group_but_no_connection'] = np.where(mass_shooters['Known Hate Group or Chat Room Affiliation'] == 3, 1, 0)
mass_shooters['known_hate_group_or_chat_room_affiliation_website_or_chat'] = np.where(mass_shooters['Known Hate Group or Chat Room Affiliation'] == 4, 1, 0)

In [None]:
# Disseminate 'Violent Video Games'
mass_shooters['violent_video_games_no_evidence'] = np.where(mass_shooters['Violent Video Games'] == 0, 1, 0)
mass_shooters['violent_video_games_yes'] = np.where(mass_shooters['Violent Video Games'] == 1, 1, 0)
mass_shooters['violent_video_games_played_unspecified'] = np.where(mass_shooters['Violent Video Games'] == 2, 1, 0)
mass_shooters['violent_video_games_na'] = np.where(mass_shooters['Violent Video Games'] == 3, 1, 0)

In [None]:
# Disseminate 'Timeline of Signs of Crisis'
mass_shooters['timeline_of_signs_of_crisis_days_before'] = np.where(mass_shooters['Timeline of Signs of Crisis'] == 0, 1, 0)
mass_shooters['timeline_of_signs_of_crisis_weeks_before'] = np.where(mass_shooters['Timeline of Signs of Crisis'] == 1, 1, 0)
mass_shooters['timeline_of_signs_of_crisis_months_before'] = np.where(mass_shooters['Timeline of Signs of Crisis'] == 2, 1, 0)
mass_shooters['timeline_of_signs_of_crisis_years_before'] = np.where(mass_shooters['Timeline of Signs of Crisis'] == 3, 1, 0)

In [None]:
# Disseminate 'Suicidality'
mass_shooters['suicidality_no_evidence'] = np.where(mass_shooters['Suicidality'] == 0, 1, 0)
mass_shooters['suicidality_yes_prior_to_shooting'] = np.where(mass_shooters['Suicidality'] == 1, 1, 0)
mass_shooters['suicidality_intended_to_die_in_shooting_no_prior'] = np.where(mass_shooters['Suicidality'] == 2, 1, 0)

In [None]:
# Disseminate 'Voluntary or Involuntary Hospitalization'
mass_shooters['voluntary_or_involuntary_hospitalization_na'] = np.where(mass_shooters['Voluntary or Involuntary Hospitalization'] == 0, 1, 0)
mass_shooters['voluntary_or_involuntary_hospitalization_voluntary'] = np.where(mass_shooters['Voluntary or Involuntary Hospitalization'] == 1, 1, 0)
mass_shooters['voluntary_or_involuntary_hospitalization_involuntary'] = np.where(mass_shooters['Voluntary or Involuntary Hospitalization'] == 2, 1, 0)

In [None]:
# Disseminate 'Motive: Racism/Xenophobia'
mass_shooters['motive_racism_xenophobia_no_evidence'] = np.where(mass_shooters['Motive: Racism/Xenophobia'] == 0, 1, 0)
mass_shooters['motive_racism_xenophobia_targeting_color'] = np.where(mass_shooters['Motive: Racism/Xenophobia'] == 1, 1, 0)
mass_shooters['motive_racism_xenophobia_targeting_white'] = np.where(mass_shooters['Motive: Racism/Xenophobia'] == 2, 1, 0)

In [None]:
# Disseminate 'Motive: Religious Hate'
mass_shooters['motive_religious_hate_no_evidence'] = np.where(mass_shooters['Motive: Religious Hate'] == 0, 1, 0)
mass_shooters['motive_religious_hate_antisemitism'] = np.where(mass_shooters['Motive: Religious Hate'] == 1, 1, 0)
mass_shooters['motive_religious_hate_islamophobia'] = np.where(mass_shooters['Motive: Religious Hate'] == 2, 1, 0)
mass_shooters['motive_religious_hate_angry_with_christianity_or_god'] = np.where(mass_shooters['Motive: Religious Hate'] == 3, 1, 0)

In [None]:
# Disseminate 'Motive: Other\xa0'
mass_shooters['motive_other_no_evidence'] = np.where(mass_shooters['Motive: Other\xa0'] == 0, 1, 0)
mass_shooters['motive_other_yes'] = np.where(mass_shooters['Motive: Other\xa0'] == 1, 1, 0)
mass_shooters['motive_other_generalized_anger'] = np.where(mass_shooters['Motive: Other\xa0'] == 2, 1, 0)

In [None]:
# Disseminate 'Role of Psychosis in the Shooting'
mass_shooters['role_of_psychosis_in_the_shooting_no_evidence'] = np.where(mass_shooters['Role of Psychosis in the Shooting'] == 0, 1, 0)
mass_shooters['role_of_psychosis_in_the_shooting_minor_role'] = np.where(mass_shooters['Role of Psychosis in the Shooting'] == 1, 1, 0)
mass_shooters['role_of_psychosis_in_the_shooting_moderate_role'] = np.where(mass_shooters['Role of Psychosis in the Shooting'] == 2, 1, 0)
mass_shooters['role_of_psychosis_in_the_shooting_major_role'] = np.where(mass_shooters['Role of Psychosis in the Shooting'] == 3, 1, 0)

In [None]:
# Disseminate 'Social Media Use\xa0'
mass_shooters['social_media_use_no_evidence'] = np.where(mass_shooters['Social Media Use\xa0'] == 0, 1, 0)
mass_shooters['social_media_use_yes'] = np.where(mass_shooters['Social Media Use\xa0'] == 1, 1, 0)
mass_shooters['social_media_use_na'] = np.where(mass_shooters['Social Media Use\xa0'] == 2, 1, 0)

In [None]:
# Disseminate 'Pop Culture Connection'
mass_shooters['pop_culture_connection_no_evidence'] = np.where(mass_shooters['Pop Culture Connection'] == 0, 1, 0)
mass_shooters['pop_culture_connection_explicit_reference'] = np.where(mass_shooters['Pop Culture Connection'] == 1, 1, 0)
mass_shooters['pop_culture_connection_tangential_reference'] = np.where(mass_shooters['Pop Culture Connection'] == 2, 1, 0)

In [None]:
# Disseminate 'Firearm Proficiency'
mass_shooters['firearm_proficiency_no_experience'] = np.where(mass_shooters['Firearm Proficiency'] == 0, 1, 0)
mass_shooters['firearm_proficiency_some_experience'] = np.where(mass_shooters['Firearm Proficiency'] == 1, 1, 0)
mass_shooters['firearm_proficiency_more_experienced'] = np.where(mass_shooters['Firearm Proficiency'] == 2, 1, 0)
mass_shooters['firearm_proficiency_very_experienced'] = np.where(mass_shooters['Firearm Proficiency'] == 3, 1, 0)

---

<a id='prepareaggregate'></a>
<h3><b><i>
    Aggregate Column Creation
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

- Columns Created: 12
    - agg_signs_of_crisis
        - 10 columns used
    - agg_motivation_hatred
        - 2 columns used
    - agg_motivation_personal
        - 6 columns used
    - agg_social
        - 5 columns used
    - agg_trauma
        - 20 columns used
    - agg_health
        - 13 columns used
    - agg_background
        - 4 columns used
    - agg_crime
        - 32 columns used
    - agg_stress
        - 9 columns used
    - agg_substance_abuse
        - 3 columns used
    - agg_prejudice
        - 4 columns used
    - agg_grand_total
        - 109 columns used
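
The notebook defines the column lists for each aggregate but does not show the aggregation step itself. One plausible reading, assumed here, is a row-wise sum of the columns in each list, with `agg_grand_total` summing every listed column plus 'Interest in Firearms'. The sketch uses a toy frame and shortened lists; the `agg_groups` dict is a hypothetical structure, and the approach assumes each source column is coded 0/1, which the notebook does not confirm for every column.

```python
import pandas as pd

toy = pd.DataFrame({
    'Isolation': [1, 0, 1],
    'Paranoia': [1, 0, 0],
    'substance_use_alcohol': [0, 1, 1],
    'substance_use_marijuana': [0, 1, 0],
    'Interest in Firearms': [1, 1, 0],
})

# Assumed layout: aggregate name mapped to the columns that feed it
agg_groups = {
    'agg_signs_of_crisis': ['Isolation', 'Paranoia'],
    'agg_substance_abuse': ['substance_use_alcohol', 'substance_use_marijuana'],
}
grand_total_cols = ['Interest in Firearms']

for name, cols in agg_groups.items():
    toy[name] = toy[cols].sum(axis=1)           # row-wise count of flags
    grand_total_cols = grand_total_cols + cols  # collect every source column

toy['agg_grand_total'] = toy[grand_total_cols].sum(axis=1)
print(toy[['agg_signs_of_crisis', 'agg_substance_abuse', 'agg_grand_total']])
```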

In [None]:
signs_of_crisis_cols = [
     'Signs of Being in Crisis',
     'Inability to Perform Daily Tasks',
     'Notably Depressed Mood',
     'Unusually Calm or Happy',
     'Rapid Mood Swings',
     'Increased Agitation',
     'Abusive Behavior',
     'Isolation',
     'Losing Touch with Reality',
     'Paranoia'
]
motivation_hatred_cols = [
    'Motive: Misogyny',
    'Motive: Homophobia'
]
motivation_personal_cols = [
    'Motive: Employment Issue',
    'Motive: Economic Issue',
    'Motive: Legal Issue',
    'Motive: Relationship Issue',
    'Motive: Interpersonal Conflict\xa0',
    'Motive: Fame-Seeking'
]
social_cols = [
    'Leakage\xa0',
    'Interest in Past Mass Violence',
    'Relationship with Other Shooting(s)',
    'Planning',
    'Performance'
]
trauma_cols = [
    'Bullied',
    'Raised by Single Parent',
    'Parental Divorce / Separation',
    'Parental Death in Childhood',
    'Parental Suicide',
    'Childhood Trauma',
    'Physically Abused',
    'Sexually Abused',
    'Emotionally Abused',
    'Neglected',
    'Mother Violent Treatment',
    'Parental Substance Abuse',
    'Parent Criminal Record',
    'Family Member Incarcerated',
    'adult_trauma_death_of_parent',
    'adult_trauma_death_or_loss_of_child',
    'adult_trauma_death_of_family_member_causing_significant_distress',
    'adult_trauma_from_war',
    'adult_trauma_accident',
    'adult_trauma_other'
]
health_cols = [
    'Prior Hospitalization',
    'Prior Counseling',
    'Psychiatric Medication',
    'FASD (Fetal Alcohol Spectrum Disorder)',
    'Autism Spectrum',
    'Health Issues',
    'Head Injury / Possible TBI',
    'voluntary_or_mandatory_counseling_voluntary',
    'voluntary_or_mandatory_counseling_involuntary',
    'mental_illness_mood_disorder',
    'mental_illness_thought_disorder',
    'mental_illness_other_psychiatric_disorder',
    'mental_illness_indication_but_no_diagnosis'
]
background_cols = [
    'Immigrant',
    'Sexual Orientation',
    'known_family_mental_health_history_parents',
    'known_family_mental_health_history_other_relative'
]
crime_cols = [
    'Known to Police or FBI',
    'Criminal Record',
    'History of Animal Abuse',
    'History of Sexual Offenses',
    'Gang Affiliation',
    'Terror Group Affiliation',
    'Bully',
    'part_i_crimes_homicide',
    'part_i_crimes_forcible_rape',
    'part_i_crimes_robbery',
    'part_i_crimes_aggravated_assault',
    'part_i_crimes_burglary',
    'part_i_crimes_larceny_theft',
    'part_i_crimes_motor_vehicle_theft',
    'part_i_crimes_arson',
    'part_ii_crimes_simple_assault',
    'part_ii_crimes_fraud_forgery_embezzlement',
    'part_ii_crimes_stolen_property',
    'part_ii_crimes_vandalism',
    'part_ii_crimes_weapons_offenses',
    'part_ii_crimes_prostitution',
    'part_ii_crimes_drugs',
    'part_ii_crimes_dui',
    'part_ii_crimes_other',
    'domestic_abuse_specified_non_sexual',
    'domestic_abuse_specified_sexual_violence',
    'domestic_abuse_specified_threats_coercive_control',
    'domestic_abuse_specified_threats_with_deadly_weapon',
    'highest_level_of_justice_system_involvement_suspected',
    'highest_level_of_justice_system_involvement_arrested',
    'highest_level_of_justice_system_involvement_charged',
    'highest_level_of_justice_system_involvement_convicted'
]
stress_cols = [
    'Employment Status',
    'recent_or_ongoing_stressor_recent_breakup',
    'recent_or_ongoing_stressor_employment',
    'recent_or_ongoing_stressor_economic_stressor',
    'recent_or_ongoing_stressor_family_issue',
    'recent_or_ongoing_stressor_legal_issue',
    'recent_or_ongoing_stressor_other',
    'history_of_physical_altercations_yes',
    'history_of_physical_altercations_attacked_inanimate_objects_during_arguments'
]
substance_abuse_cols = [
    'substance_use_alcohol',
    'substance_use_marijuana',
    'substance_use_other_drugs'
]
prejudice_cols = [
    'known_prejudices_racism',
    'known_prejudices_misogyny',
    'known_prejudices_homophobia',
    'known_prejudices_religious_hatred'
]
grand_total_cols = [
    'Interest in Firearms',
]

In [250]:
mass_shooters['Sexual Orientation'].value_counts()

0    183
1      5
Name: Sexual Orientation, dtype: int64

In [251]:
mass_shooters['History of Animal Abuse'].value_counts()

0    177
1     11
Name: History of Animal Abuse, dtype: int64

---

<a id='preparetext'></a>
<h3><b><i>
    Text Modifications
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

---

<a id='preparesummary'></a>
<h3><b><i>
    Summary of Preparation
</i></b></h3>
<li><a href='#prepare'>Prepare Top</a></li>

- <h5><b>Dropped Data</b></h5>

    - 1 Row
        - Missing majority of info
    - 50 columns (29 dropped manually, 21 by null cutoff)
        - Useless in scope of predictive value
        - Percent nulls above 20%
        
- <h5><b>Null Handling</b></h5>

    - Fill with mode
        - 44 columns
    - Fill uniquely
        - 1 column ('Signs of Crisis Expanded', filled with 'None')

- <h5><b>Dtype Cleaning</b></h5>
    
    - Object
        - 4 columns to int
        - 3 columns fixed
             - 'Day',
                 - '19-20' == 19
             - 'Race',
                 - 'Moroccan' == 6
                 - 'Bosnian' == 7
             - 'Criminal Record',
                 - '1`' == 1
    - Float
        - 79 columns to int
        - 3 columns fixed
             - 'Gender',
                 - 3.0 == 0
             - 'Children',
                 - 5.0 == 0
             - 'History of Domestic Abuse',
                 - 3.0 == 2
    - Int
        - Nothing changed
    
- <h5><b>Disseminate Column Information</b></h5>
    
    - 31 columns manually disseminated
        - 137 new columns generated
    
- <h5><b>Aggregate Column Creation</b></h5>    


- <h5><b>Text Modifications</b></h5>


- <h5><b>Final Dataframe</b></h5>

    - Rows: 
    - Columns: 

<div style='background-color: orange'>
<a id="wrangle"></a>
    <h1 style='text-align: center'>
        <b><i>
            Wrangle
        </i></b></h1>
<li><a href='#TableOfContents'>Table of Contents</a></li>

<div style='background-color: orange'>
<a id="misc"></a>
    <h1 style='text-align: center'>
        <b><i>
            Miscellaneous
        </i></b></h1>
<li><a href='#TableOfContents'>Table of Contents</a></li>

In [1]:
def drop_nullpct(df, percent_cutoff):
    '''
    Takes in a dataframe and a percent_cutoff of nulls to drop a column on
    and returns the new dataframe and a dictionary of dropped columns and their pct...
    
    INPUT:
    df = pandas dataframe
    percent_cutoff = Null percent cutoff amount
    
    OUTPUT:
    new_df = pandas dataframe with dropped columns
    drop_null_pct_dict = dict of column names dropped and pcts
    '''
    drop_null_pct_dict = {
        'column_name' : [],
        'percent_null' : []
    }
    for col in df:
        pct = df[col].isna().sum() / df.shape[0]
        # Use the caller's cutoff rather than a hard-coded 0.20
        if pct > percent_cutoff:
            df = df.drop(columns=col)
            drop_null_pct_dict['column_name'].append(col)
            drop_null_pct_dict['percent_null'].append(pct)
    new_df = df
    return new_df, drop_null_pct_dict

In [2]:
def check_nulls(df):
    '''
    Takes a dataframe and returns a list of columns that have at least one null value
    
    INPUT:
    df = pandas dataframe
    
    OUTPUT:
    has_nulls = List of column names with at least one null
    '''
    has_nulls = []
    for col in df:
        nulls = df[col].isna().sum()
        if nulls > 0:
            has_nulls.append(col)
    return has_nulls
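
For reference, pandas can produce the same list without a loop. This is a small sketch on made-up data; `check_nulls_vectorized` is a hypothetical name and is not part of `wrangle.py`.

```python
import numpy as np
import pandas as pd


def check_nulls_vectorized(df):
    """Return the names of columns containing at least one null."""
    return df.columns[df.isna().any()].tolist()


toy = pd.DataFrame({'a': [1, np.nan], 'b': ['x', 'y'], 'c': [None, 'z']})
print(check_nulls_vectorized(toy))
```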