# Mother Jones Mass Shooting Data Cleaning
### Authors: Joe Acosta
Imports the initial data collected by Mother Jones and cleans it. The data is currently imported directly from the website and cleaned so that future research can build on the original findings.

Mother Jones Initial Data: https://www.motherjones.com/politics/2012/12/mass-shootings-mother-jones-full-data/

In [1]:
import pandas as pd
import numpy as np
# import re

## Initial Data Import
Creates a Pandas DataFrame given the link to the initial data.
Verifies the data types and corrects any discrepancies.

In [2]:
mj_url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQBEbQoWMn_P81DuwmlQC0_jr2sJDzkkC0mvF6WLcM53ZYXi8RMfUlunvP1B5W0jRrJvH-wc-WGjDB1/pub?gid=0&single=true&output=csv'
mjms_df = pd.read_csv(mj_url)

In [3]:
mjms_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 151 entries, 0 to 150
Data columns (total 24 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   case                              151 non-null    object
 1   location                          151 non-null    object
 2   date                              151 non-null    object
 3   summary                           151 non-null    object
 4   fatalities                        151 non-null    int64 
 5   injured                           151 non-null    int64 
 6   total_victims                     151 non-null    int64 
 7   location.1                        151 non-null    object
 8   age_of_shooter                    151 non-null    object
 9   prior_signs_mental_health_issues  151 non-null    object
 10  mental_health_details             151 non-null    object
 11  weapons_obtained_legally          151 non-null    object
 12  where_obtained        

## Data Cleaning

### Date Data Cleaning
The year appears in several formats in the date column, and a separate year column also exists. Updates each date to mm/dd/yyyy format using the year column, then drops the year column. Also converts the date column to datetime64[ns], which allows date-related operations such as sorting or grouping by year, month, day, or decade.

In [4]:
def update_year(date, year):
    '''
    Updates the date ensuring it's in mm/dd/yyyy format

    Parameters:
     - date (String): date in mm/dd/yy or mm/dd/yyyy
     - year (String): year in yyyy format

    Return:
     - _date (String): date in the format mm/dd/yyyy
    
    // Include into the doc created
    '''
    _month_day = '/'.join(date.split('/')[:2])
    _date = _month_day + '/' + year
    return _date

In [5]:
# Converts date to mm/dd/yyyy using the year column
mjms_df.date = mjms_df.apply(lambda row: update_year(row.date, str(row.year)), axis=1)

# Converts date series to pd date type
mjms_df.date = pd.to_datetime(mjms_df.date)

# Drops the year column
mjms_df = mjms_df.drop('year', axis=1)

In [6]:
# Verifies year can get extracted from the date col
mjms_df.date.sample(3).dt.year

75    2015
13    2023
50    2018
Name: date, dtype: int32
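
The row-wise `apply` used for the date fix can also be written with vectorized string operations. The following is a minimal sketch on a toy frame; the sample rows and the name `toy` are made up for illustration and are not taken from the Mother Jones data.

```python
import pandas as pd

# Toy rows (hypothetical) mixing two-digit and four-digit years
toy = pd.DataFrame({
    'date': ['9/4/24', '12/06/2023', '3/27/23'],
    'year': [2024, 2023, 2023],
})

# Keep month and day from `date`, take the four-digit year from `year`
month_day = toy['date'].str.split('/').str[:2].str.join('/')
full_date = month_day + '/' + toy['year'].astype(str)

# An explicit format avoids any day/month ambiguity during parsing
toy['date'] = pd.to_datetime(full_date, format='%m/%d/%Y')
toy = toy.drop(columns='year')

print(toy['date'].dt.year.tolist())
```

Passing an explicit `format` is a design choice here: every string has the same layout after the rebuild, so strict parsing will raise on any row that does not fit instead of silently guessing.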

### Fatalities Data Cleaning
Recalculates the total_victims field from the fatalities and injured fields. In the original data, total_victims isn't a calculated field. Recomputing it prevents human error in total_victims, so long as the fatalities and injured fields are accurate.

In [7]:
mjms_df.total_victims = mjms_df.fatalities + mjms_df.injured
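
Before overwriting the column, it can be useful to see which rows disagree with the recomputed total. This is a sketch on toy data only; the rows and the `mismatches` name are hypothetical, and one row is deliberately inconsistent.

```python
import pandas as pd

# Toy rows (hypothetical); case 'B' has a deliberately wrong total
toy = pd.DataFrame({
    'case': ['A', 'B', 'C'],
    'fatalities': [4, 3, 6],
    'injured': [1, 0, 6],
    'total_victims': [5, 4, 12],
})

# Recompute the total and list the cases whose recorded total differs
computed = toy['fatalities'] + toy['injured']
mismatches = toy.loc[toy['total_victims'] != computed, 'case'].tolist()
print(mismatches)

# Overwrite with the computed value, as in the cell above
toy['total_victims'] = computed
```

Listing the mismatches first leaves a record of which source rows were changed by the recalculation.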

### Age of Shooter Data Cleaning
Cleans the age of shooter. For cases with multiple shooters, the ages of all shooters are recorded as a list.

#### Verifying missing data
Checking for the missing ages that prevent the data from being stored as an int

In [8]:
for i in mjms_df.age_of_shooter.unique():
    print(i, end=', ')

14, 44, 67, 40, 21, 59, 18, 33, 25, 28, 43, 72, 31, 22, 15, 20, 70, 23, 45, -, 57, 19, 51, 36, 24, 32, 46, 26, 54, 29, 38, 17, 47, 37, 64, 39, 27, 34, 42, 41, 52, 16, 48, 66, 11, 35, 55, 50, 

#### Converts Data to Numeric
Replaces all missing values with -1 then converts the Series to numeric (int)

In [9]:
# Replaces missing ages with -1 then converts the column to integer
mjms_df.age_of_shooter = mjms_df.age_of_shooter.replace('-', -1)
mjms_df.age_of_shooter = pd.to_numeric(mjms_df.age_of_shooter)
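
An alternative to the -1 sentinel, not used in this notebook, is to coerce unparseable entries to missing values and store the column in pandas' nullable integer type. A sentinel of -1 would skew statistics such as the mean if it were left in place, whereas missing values are skipped. The sketch below uses a toy Series; the values are hypothetical.

```python
import pandas as pd

# Toy ages (hypothetical), with '-' standing in for a missing age
ages = pd.Series(['14', '44', '-', '67'])

# Entries that cannot be parsed become NaN, then <NA> in the nullable Int64 dtype
numeric = pd.to_numeric(ages, errors='coerce').astype('Int64')

# Rows still needing a manual lookup
print(numeric[numeric.isna()].index.tolist())
```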

In [10]:
pd.set_option('display.max_colwidth', None)

#### Identify Data Cases
For the data that's missing or wrong, returns the case name to use for updating the ages

In [11]:
# Isolate the cases without ages
mjms_df.loc[mjms_df.age_of_shooter == -1, ('case', 'summary', 'age_of_shooter')]

Unnamed: 0,case,summary,age_of_shooter
25,Sacramento County church shooting,"""A man believed to be meeting his three children for a supervised visit at a church just outside Sacramento on Monday afternoon fatally shot the children and an adult accompanying them before killing himself, police officials said. Sheriff Scott Jones of Sacramento County told reporters at the scene that the gunman had a restraining order against him, and that he had to have supervised visits with his children, who were younger than 15."" (NYTimes)",-1
34,Jersey City kosher market shooting,"David N. Anderson, 47, and Francine Graham, 50, were heavily armed and traveling in a white van when they first killed a police officer in a cemetery, and then opened fire at a kosher market, “fueled both by anti-Semitism and anti-law enforcement beliefs,” according to New Jersey authorities. The pair, linked to the antisemitic ideology of the Black Hebrew Israelites extremist group, were killed after a lenghty gun battle with police at the market.",-1


In [12]:
mjms_df.loc[mjms_df.age_of_shooter == 11, ('case', 'summary', 'age_of_shooter')]

Unnamed: 0,case,summary,age_of_shooter
126,Westside Middle School killings,"Mitchell Scott Johnson, 13, and Andrew Douglas Golden, 11, two juveniles, ambushed students and teachers as they left the school; they were apprehended by police at the scene.",11


#### Supporting documentation
For the 'Sacramento County church shooting' case, source @ https://www.cnn.com/2022/02/28/us/sacramento-church-shooting identified the shooter as 39-year-old David Mora Rojas

In [13]:
# Sets age of shooter for 'Sacramento County church shooting'
mjms_df.loc[mjms_df.case == 'Sacramento County church shooting', 'age_of_shooter'] = 39

#### Fixing Data Input Error
For the 'Westside Middle School killings' case, the shooters' ages were 13 and 11. There were two assailants, though only one age was recorded.
For the 'Jersey City kosher market shooting' case, the shooters' ages were 47 and 50 (see the case summary).

In [14]:
# Updates age_of_shooter to allow lists
mjms_df['age_of_shooter'] = mjms_df['age_of_shooter'].astype('object')

# Gets index regardless of future updates
# Updates the instances where there were multiple shooters
mjms_df.at[mjms_df[mjms_df['case'] == 'Jersey City kosher market shooting'].index[0], 'age_of_shooter'] = [47, 50]
mjms_df.at[mjms_df[mjms_df['case'] == 'Westside Middle School killings'].index[0], 'age_of_shooter'] = [11, 13]
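
Storing lists in `age_of_shooter` makes the column an object column, so numeric operations need an extra step. One possible way to analyze such a column later is `DataFrame.explode`, which gives each shooter its own row. This is only a sketch on toy data; the frame and the names `long_form` and `mean_age` are hypothetical.

```python
import pandas as pd

# Toy frame (hypothetical) with one single-shooter and one two-shooter case
toy = pd.DataFrame({
    'case': ['X', 'Y'],
    'age_of_shooter': [39, [47, 50]],
})

# One row per shooter; scalar entries pass through unchanged
long_form = toy.explode('age_of_shooter')
long_form['age_of_shooter'] = long_form['age_of_shooter'].astype(int)

# Per-case mean age, for analyses that need a single number per case
mean_age = long_form.groupby('case')['age_of_shooter'].mean()
print(mean_age.to_dict())
```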

#### Verify Age Corrections

In [15]:
mjms_df.loc[mjms_df.case.isin(['Sacramento County church shooting', 'Jersey City kosher market shooting', 'Westside Middle School killings']), ('case', 'summary', 'age_of_shooter')]

Unnamed: 0,case,summary,age_of_shooter
25,Sacramento County church shooting,"""A man believed to be meeting his three children for a supervised visit at a church just outside Sacramento on Monday afternoon fatally shot the children and an adult accompanying them before killing himself, police officials said. Sheriff Scott Jones of Sacramento County told reporters at the scene that the gunman had a restraining order against him, and that he had to have supervised visits with his children, who were younger than 15."" (NYTimes)",39
34,Jersey City kosher market shooting,"David N. Anderson, 47, and Francine Graham, 50, were heavily armed and traveling in a white van when they first killed a police officer in a cemetery, and then opened fire at a kosher market, “fueled both by anti-Semitism and anti-law enforcement beliefs,” according to New Jersey authorities. The pair, linked to the antisemitic ideology of the Black Hebrew Israelites extremist group, were killed after a lenghty gun battle with police at the market.","[47, 50]"
126,Westside Middle School killings,"Mitchell Scott Johnson, 13, and Andrew Douglas Golden, 11, two juveniles, ambushed students and teachers as they left the school; they were apprehended by police at the scene.","[11, 13]"


In [16]:
pd.reset_option('display.max_colwidth')

### Shooter Gender Data Cleaning
Normalizes the genders of the shooters

- M: Male gender 
- F: Female gender
- T->F: Transgender transitioning from male to female
- T->M: Transgender transitioning from female to male
- M/M: A slash separates multiple shooters; each letter represents one shooter
- O: Gender non-conforming including agender, non-binary, bigender, etc.

In [17]:
mjms_df.gender.unique()

array(['M',
       'F ("identifies as transgender" and "Audrey Hale is a biological woman who, on a social media profile, used male pronouns,” according to Nashville Metro PD officials)',
       'Male & Female', 'F', 'Male', 'Female'], dtype=object)

In [18]:
mjms_df.gender = mjms_df.gender.replace('Female', 'F')
mjms_df.gender = mjms_df.gender.replace('Male', 'M')
mjms_df.gender = mjms_df.gender.replace('Male & Female', 'M/F')
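
The three replacements can also be passed as a single mapping. `Series.replace` with a dict of plain strings matches whole cell values, so 'Male & Female' is not affected by the 'Male' entry. A minimal sketch on a toy Series (the values mirror the labels listed in the output above; `gender_map` is a hypothetical name):

```python
import pandas as pd

# Toy values covering the spellings seen in the column
gender = pd.Series(['M', 'Male', 'Female', 'Male & Female', 'F'])

# Whole-value mapping from long labels to the short codes
gender_map = {'Male': 'M', 'Female': 'F', 'Male & Female': 'M/F'}
normalized = gender.replace(gender_map)

print(normalized.tolist())
```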

#### Cleaning 'Nashville Christian school shooting' case
Identifies the row and updates the gender to T->M (transgender, female to male), following the codes listed above

In [19]:
mjms_df[mjms_df.gender.str.contains('F ("identifies as transgender', regex=False)]

Unnamed: 0,case,location,date,summary,fatalities,injured,total_victims,location.1,age_of_shooter,prior_signs_mental_health_issues,...,weapon_type,weapon_details,race,gender,sources,mental_health_sources,sources_additional_age,latitude,longitude,type
10,Nashville Christian school shooting,"Nashville, Tennessee",2023-03-27,"Audrey Hale, 28, who was a former student at t...",6,6,12,School,28,-,...,"semiautomatic rifle, semiautomatic handgun",-,White,"F (""identifies as transgender"" and ""Audrey Hal...",https://www.tennessean.com/story/news/crime/20...,-,-,-,-,Mass


In [20]:
mjms_df.loc[mjms_df.case == 'Nashville Christian school shooting', 'gender'] = 'T->M'

#### Cleaning 'Westside Middle School killings' case
The genders of both shooters were not reflected. Updates the entry to account for two shooters

In [21]:
mjms_df.loc[mjms_df.case == 'Westside Middle School killings', 'gender'] = 'M/M'

#### Verifying Gender Data Correction

In [22]:
mjms_df.gender.unique()

array(['M', 'T->M', 'M/F', 'F', 'M/M'], dtype=object)

### Race Data Cleaning
Normalizes the race data. Some entries were capitalized while others weren't. Missing data was entered as unknown. Race is specific to the person, and without public information on the individual, no attempt was made to identify them by their race.

In [23]:
mjms_df.race.unique()

array(['White', 'Black', 'Latino', 'Asian', '-', 'Other', 'White ',
       'Native American', 'white', 'black', 'unclear'], dtype=object)

In [24]:
# Converts to lowercase
mjms_df.race = mjms_df.race.str.lower()
# Removes whitespace
mjms_df.race = mjms_df.race.str.strip()
# Updates '-' inputs to unknown
mjms_df.loc[mjms_df.race == '-', 'race'] = 'unknown'
# Updates 'unclear' inputs to unknown
mjms_df.loc[mjms_df.race == 'unclear', 'race'] = 'unknown'
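
The same normalization can be expressed as one chained expression. The sketch below runs on a toy Series whose values are a subset of those shown in the output above; it is an equivalent formulation for illustration, not a change to the notebook's result.

```python
import pandas as pd

# Toy values with mixed case, trailing whitespace, and two "missing" spellings
race = pd.Series(['White', 'White ', 'white', 'black', '-', 'unclear', 'Native American'])

# Strip whitespace, lowercase, then fold both missing markers into 'unknown'
cleaned = (
    race.str.strip()
        .str.lower()
        .replace({'-': 'unknown', 'unclear': 'unknown'})
)

print(sorted(cleaned.unique()))
```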


In [25]:
mjms_df.race.unique()

array(['white', 'black', 'latino', 'asian', 'unknown', 'other',
       'native american'], dtype=object)

### Weapon Obtained Legally Data Cleaning
Identifies if the weapon used in the crime was legally obtained or not

- Yes/No
- TBD
- Unknown
- Some

In [26]:
mjms_df.weapons_obtained_legally.unique()

array(['-', 'yes', 'Yes', 'No', 'TBD',
       'Kelley passed federal criminal background checks; the US Air Force failed to provide information on his criminal history to the FBI',
       'Unknown', '\nYes',
       'Yes ("some of the weapons were purchased legally and some of them may not have been")',
       'Yes '], dtype=object)

In [27]:
# Converts to lowercase
mjms_df.weapons_obtained_legally = mjms_df.weapons_obtained_legally.str.lower()
# Removes whitespace
mjms_df.weapons_obtained_legally = mjms_df.weapons_obtained_legally.str.strip()

#### Attempt to Identify Unknown
An attempt to determine whether any shooter's weapon acquisition status has been verified since the incident

In [28]:
mjms_df.loc[mjms_df.weapons_obtained_legally == 'unknown', ('case', 'summary', 'weapons_obtained_legally')]

Unnamed: 0,case,summary,weapons_obtained_legally
65,Fresno downtown shooting,"Kori Ali Muhammad, 39, opened fire along a str...",unknown
68,Baton Rouge police shooting,"Gavin Long, 29, a former Marine who served in ...",unknown
74,Planned Parenthood clinic,"Robert Lewis Dear, 57, shot and killed a polic...",unknown
83,Alturas tribal shooting,"Cherie Lash Rhoades, 44, opened fire at the Ce...",unknown
110,Trolley Square shooting,"Sulejman Talović, 18, rampaged through the sh...",unknown
133,Chuck E. Cheese's killings,"Nathan Dunlap, 19, a recently fired Chuck E. C...",unknown


In [29]:
mjms_df.loc[mjms_df.weapons_obtained_legally == '-', ('case', 'summary', 'weapons_obtained_legally')]

Unnamed: 0,case,summary,weapons_obtained_legally
0,Apalachee High School shooting,"Colt Gray, 14, was apprehended by responding p...",-
1,Arkansas grocery store shooting,"Travis Posey, 44, opened fire in the parking l...",-
2,UNLV shooting,"Anthony Polito, 67, a former university profes...",-
3,Maine bowling alley and bar shootings,"Robert Card, 40, an Army reservist and firearm...",-
5,Orange County biker bar shooting,"John Snowling, 59, a retired sergeant from the...",-
12,Half Moon Bay spree shooting,"Chunli Zhao, 67, suspected of carrying out the...",-
13,LA dance studio mass shooting,"Huu Can Tran, 72, fled the scene in a white va...",-
14,Virginia Walmart shooting,"Andre Bing, 31, who worked as a supervisor at ...",-
15,LGBTQ club shooting,"Anderson L. Aldrich, 22, wore body armor and o...",-
17,Raleigh spree shooting,"Austin Thompson, 15, went on a rampage in the ...",-


#### Supporting documentation
##### 'unknown' cases
For the 'Fresno downtown shooting' case, source @ https://www.fresnobee.com/news/local/crime/article145336334.html determined the weapon was illegally obtained. In 2006, Kori Ali Muhammad was sentenced by Judge Oliver W. Wanger for felony possession of a firearm and cocaine, for which he was incarcerated for 110 months. The court also revoked his right to possess a firearm upon release

For the 'Baton Rouge police shooting' case, source @ https://www.kansascity.com/news/local/article159145529.html?utm_source=chatgpt.com determined Gavin Long obtained his weapon legally

For the 'Planned Parenthood clinic' case, source @ https://www.coloradojudicial.gov/sites/default/files/2023-08/Search%20Warrant%2015-2022_Redacted.pdf reflects that the detectives in the matter didn’t inquire about the legality of the weapons used during the assault.

For the 'Trolley Square shooting' case, source @ https://www.deseret.com/2007/3/30/20010193/salt-lake-police-investigating-3-who-owned-gun-before-talovic/ determined that the pistol was illegally obtained and the shotgun was legally obtained

##### '-' cases
For the 'Apalachee High School shooting' case, source @ https://www.cnn.com/2024/09/06/us/colin-gray-georgia-shooting-suspect-father-charges determined that Colin Gray had legally purchased a gun that he then illegally gave to his son. The weapon will be marked as a legal purchase

For the 'UNLV shooting' case, source @ https://www.ktnv.com/news/unlv-gunman-had-list-of-targets-brought-handgun-11-mags-to-campus?utm_source=chatgpt.com determined the weapon was legally purchased

In [30]:
# Fresno downtown shooting
mjms_df.loc[mjms_df.case == 'Fresno downtown shooting', 'weapons_obtained_legally'] = 'no'
# Baton Rouge police shooting
mjms_df.loc[mjms_df.case == 'Baton Rouge police shooting', 'weapons_obtained_legally'] = 'yes'
# Trolley Square shooting
mjms_df.loc[mjms_df.case == 'Trolley Square shooting', 'weapons_obtained_legally'] = 'some'
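
Because each correction depends on an exact match against the case name, a misspelled name would leave the row silently unchanged. One way to guard against that is to keep the corrections in a dict and check that every name matches exactly one row. This is a sketch on a toy frame; the `corrections` dict and the 'Other case' row are hypothetical, and the values are the ones assigned in the cell above.

```python
import pandas as pd

# Toy frame (hypothetical rows) standing in for the full dataset
toy = pd.DataFrame({
    'case': ['Fresno downtown shooting', 'Baton Rouge police shooting',
             'Trolley Square shooting', 'Other case'],
    'weapons_obtained_legally': ['unknown', 'unknown', 'unknown', 'yes'],
})

# Case name -> corrected value, mirroring the manual updates above
corrections = {
    'Fresno downtown shooting': 'no',
    'Baton Rouge police shooting': 'yes',
    'Trolley Square shooting': 'some',
}

for case_name, value in corrections.items():
    mask = toy['case'] == case_name
    # Fail loudly if the name is misspelled or duplicated
    if mask.sum() != 1:
        raise ValueError(f'expected exactly one row for {case_name!r}, found {mask.sum()}')
    toy.loc[mask, 'weapons_obtained_legally'] = value

print(toy['weapons_obtained_legally'].tolist())
```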

In [31]:
# Apalachee High School shooting
# mjms_df.at[0, 'weapons_obtained_legally'] = 'yes'
# Arkansas grocery store shooting
# mjms_df.at[1, 'weapons_obtained_legally'] = 'unknown'
#UNLV shooting
# mjms_df.at[2, 'weapons_obtained_legally'] = 'yes'

#### Cleaning 'Chattanooga military recruitment center' case
Some of the weapons were legal but not all

In [33]:
mjms_df.loc[mjms_df.weapons_obtained_legally.str.contains('yes ("some of the weapon', regex=False)]

Unnamed: 0,case,location,date,summary,fatalities,injured,total_victims,location.1,age_of_shooter,prior_signs_mental_health_issues,...,weapon_type,weapon_details,race,gender,sources,mental_health_sources,sources_additional_age,latitude,longitude,type
77,Chattanooga military recruitment center,"Chattanooga, Tennessee",2015-07-16,"Kuwaiti-born Mohammod Youssuf Abdulazeez, 24, ...",5,2,7,Military,24,Unclear,...,2 assault rifles; semiautomatic handgun\n,"AK-47, AR-15, and 30-round magazines; 9mm handgun",other,M,http://www.reuters.com/article/2015/07/16/us-u...,-,http://www.reuters.com/article/2015/07/16/us-u...,35.047157,-85.311819,Mass


In [36]:
mjms_df.loc[mjms_df.case == 'Chattanooga military recruitment center', 'weapons_obtained_legally'] = 'some'

In [37]:
mjms_df.weapons_obtained_legally.unique()

array(['-', 'yes', 'no', 'tbd',
       'kelley passed federal criminal background checks; the us air force failed to provide information on his criminal history to the fbi',
       'unknown', 'some'], dtype=object)
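
The listing above still contains values outside the Yes/No, TBD, Unknown, Some vocabulary given at the top of this section. A possible next step is to flag those rows for manual review. The sketch below assumes that vocabulary is the intended target set and uses a toy Series with a shortened version of the long free-text entry.

```python
import pandas as pd

# Toy values (a shortened subset of those in the output above)
status = pd.Series(['-', 'yes', 'no', 'tbd', 'unknown', 'some',
                    'kelley passed federal criminal background checks'])

# Assumed target vocabulary, taken from the bullet list at the top of this section
allowed = {'yes', 'no', 'tbd', 'unknown', 'some'}

# Anything outside the vocabulary needs a manual decision
needs_review = status[~status.isin(allowed)]
print(needs_review.tolist())
```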