# Data Visualization of US Mass Shootings from 1982 to 2022

1. About the dataset: 

    The Second Amendment to the United States Constitution guarantees citizens the right to bear arms, and guns can also be legally traded. At the same time, the United States has one of the highest rates of shootings in the world. The Investigative Assistance for Violent Crimes Act of 2012, passed by the United States Congress, states that a mass killing is a case in which at least three people are killed, not counting the killers themselves. This dataset covers mass shootings in the United States over the 40-year period from 1982 to 2022. We believe that analyzing it can improve risk assessment, incident prevention, and incident response, and so help reduce the incidence of such tragedies.
    
2. Data source:

    The data was downloaded from Kaggle: https://www.kaggle.com/datasets/zusmani/us-mass-shootings-last-50-years

# Problem definition:

**First module:**
1. The overall trend of mass shooting incidents in America

2. Number of total victims of mass shooting cases in America

3. Number of mass shooting incidents by state

**Second module:**
4. Where are the most common shooting locations in the most frequent shooting states?
5. Specific analysis of frequent shooting locations:

 5.1 Workplace

 5.2 School

 5.3 Religious

 5.4 Comparison and overview of three shooting-prone places


**Third module:**
6. Conclusion
 

# 0.  Download the dataset

In [None]:
from google.colab import drive
drivePath = '/content/drive' #please do not change
drive.mount(drivePath)

Mounted at /content/drive


In [None]:
# Install the library on your environment
!pip install wget

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9676 sha256=a648dae53557876f0e97fc01b207f9e6446509af6ad14ac332d977d05575feee
  Stored in directory: /root/.cache/pip/wheels/04/5f/3e/46cc37c5d698415694d83f607f833f83f0149e49b3af9d0f38
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


In [None]:
# Import the library
import wget

# Setup URL and path variables
baseURL = 'https://raw.githubusercontent.com/HardCoreFatLady/US-Mass-Shootings/main/'       # github raw file 
doc = 'US Mass Shootings May 24 2022.csv'
fullURL = baseURL + doc

dataPath = drivePath + '/MyDrive/Colab Notebooks/data'

# Download the file
fileName = wget.download(fullURL, out = dataPath)

# Print the file name including the local path
print(fileName)

/content/drive/MyDrive/Colab Notebooks/data/US Mass Shootings May 24 2022 (1).csv
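
The ` (1)` suffix in the printed path suggests that a copy of the file already existed in the data folder, so the download was saved under a new name. One way to avoid accumulating copies is to download only when the file is missing. The following is a minimal sketch of that idea; the helper name `download_once`, its `fetch` parameter, the `example.invalid` URL, and the use of `urllib` in place of `wget` are all assumptions made for illustration and are not part of the original notebook.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve


def download_once(base_url, doc, data_dir, fetch=urlretrieve):
    """Download base_url + doc into data_dir unless the file is already there.

    `fetch` is a hypothetical hook: any callable taking (url, filename).
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / doc
    if not target.exists():
        # Percent-encode spaces and other unsafe characters in the file name.
        fetch(base_url + quote(doc), str(target))
    return str(target)


# Demonstration with a stand-in fetcher, so no network access is needed.
calls = []


def fake_fetch(url, filename):
    calls.append(url)
    Path(filename).write_text("stub")


with tempfile.TemporaryDirectory() as tmp:
    first = download_once("https://example.invalid/", "my file.csv", tmp, fetch=fake_fetch)
    second = download_once("https://example.invalid/", "my file.csv", tmp, fetch=fake_fetch)

print(calls)
```

In the notebook, the same helper could be called with `baseURL`, `doc`, and `dataPath`, leaving `fetch` at its default; the second call above returns the existing path without fetching again.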


In [None]:
import pandas as pd
import numpy as np

#Load the CSV into a Pandas dataframe
data = pd.read_csv(fileName, encoding='UTF-8')
data.head()

| | case | location | date | summary | fatalities | injured | total_victims | incident_location | age_of_shooter | prior_signs_mental_health_issues | mental_health_details | weapons_obtained_legally | where_obtained | weapon_type | weapon_details | race | gender | type | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Uvalde elementary school massacre | Uvalde, Texas | 5/24/2022 | DETAILS PENDING | 15 | - | - | School | 18 | - | - | - | - | - | - | - | M | Mass | 2022 |
| 1 | Buffalo supermarket massacre | Buffalo, New York | 5/14/2022 | Payton S. Gendron, 18, committed a racially mo... | 10 | 3 | 13 | workplace | 18 | yes | previous threats and a mental health evaluatio... | yes | - | semiautomatic rifle | Bushmaster XM-15 semiautomatic rifle | White | M | Mass | 2022 |
| 2 | Sacramento County church shooting | Sacramento, California | 2/28/2022 | "A man believed to be meeting his three childr... | 4 | 0 | 4 | Religious | - | - | - | - | - | - | - | - | M | Mass | 2022 |
| 3 | Oxford High School shooting | Oxford, Michigan | 11/30/2021 | Ethan Crumbley, a 15-year-old student at Oxfor... | 4 | 7 | 11 | School | 15 | - | - | - | - | semiautomatic handgun | Sig Sauer 9mm pistol | - | M | Mass | 2021 |
| 4 | San Jose VTA shooting | San Jose, California | 5/26/2021 | Samuel Cassidy, 57, a Valley Transportation Au... | 9 | 0 | 9 | Workplace | 57 | yes | Perpetrator had a history of depression, angry... | - | - | semiautomatic handguns | - | - | M | Mass | 2021 |


# 1. Perform data exploration and analysis on the dataset

- **Examining the attributes of the DataFrame (standard procedures)**

    - `df.shape` (like `dim` in R)
    - `df.columns` (lists the variables, like `names` in R)
    - `df.index` (the row index)
    - `df.info()` (column dtypes and non-null counts)
    - `df.describe()` (descriptive statistics for numerical variables)

In [None]:
data.shape

(128, 19)

In [None]:
# Print each column name with a 1-based position
for i, name in enumerate(data.columns, start=1):
  print('Name of Column', i, ':', name)

Name of Column 1 : case
Name of Column 2 : location
Name of Column 3 : date
Name of Column 4 : summary
Name of Column 5 : fatalities
Name of Column 6 : injured
Name of Column 7 : total_victims
Name of Column 8 : incident_location
Name of Column 9 : age_of_shooter
Name of Column 10 : prior_signs_mental_health_issues
Name of Column 11 : mental_health_details
Name of Column 12 : weapons_obtained_legally
Name of Column 13 : where_obtained
Name of Column 14 : weapon_type
Name of Column 15 : weapon_details
Name of Column 16 : race
Name of Column 17 : gender
Name of Column 18 : type
Name of Column 19 : year


In [None]:
data.index

RangeIndex(start=0, stop=128, step=1)

In [None]:
data.isnull().sum()

case                                0
location                            0
date                                0
summary                             0
fatalities                          0
injured                             0
total_victims                       0
incident_location                   0
age_of_shooter                      0
prior_signs_mental_health_issues    0
mental_health_details               0
weapons_obtained_legally            0
where_obtained                      0
weapon_type                         0
weapon_details                      0
race                                0
gender                              0
type                                0
year                                0
dtype: int64
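
The zeros above only count values that pandas already recognizes as missing. The preview of the first rows shows `-` used as a placeholder in several columns, and those cells are counted as present. A minimal sketch of one way to surface them is to declare `-` as a missing-value marker when reading the CSV. The small inline CSV below is made-up sample data in the shape of the dataset, not the real file, and other placeholders may exist in the real data that this sketch does not cover.

```python
import io

import pandas as pd

# Made-up rows that mimic the dataset's use of "-" for unknown values.
sample_csv = io.StringIO(
    "case,fatalities,injured,age_of_shooter\n"
    "A,15,-,18\n"
    "B,10,3,18\n"
    "C,4,0,-\n"
)

# na_values tells read_csv to treat "-" as NaN in addition to its defaults.
sample = pd.read_csv(sample_csv, na_values=["-"])

missing = sample.isnull().sum()
print(missing)
print(sample.dtypes)
```

With the placeholder declared, `injured` and `age_of_shooter` each report one missing value and are parsed as numeric columns rather than `object`. The same `na_values=["-"]` argument could be passed to the `pd.read_csv(fileName, ...)` call used earlier.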

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128 entries, 0 to 127
Data columns (total 19 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   case                              128 non-null    object
 1   location                          128 non-null    object
 2   date                              128 non-null    object
 3   summary                           128 non-null    object
 4   fatalities                        128 non-null    int64 
 5   injured                           128 non-null    object
 6   total_victims                     128 non-null    object
 7   incident_location                 128 non-null    object
 8   age_of_shooter                    128 non-null    object
 9   prior_signs_mental_health_issues  128 non-null    object
 10  mental_health_details             128 non-null    object
 11  weapons_obtained_legally          128 non-null    object
 12  where_obtained        

In [None]:
data.describe()

| | fatalities | year |
|---|---|---|
| count | 128.0 | 128.0 |
| mean | 8.039062 | 2009.171875 |
| std | 7.687194 | 10.603899 |
| min | 3.0 | 1982.0 |
| 25% | 4.0 | 2000.75 |
| 50% | 6.0 | 2013.0 |
| 75% | 9.0 | 2018.0 |
| max | 58.0 | 2022.0 |
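
`describe()` summarizes only `fatalities` and `year` because `injured`, `total_victims`, and `age_of_shooter` were loaded as `object` columns, as the `data.info()` output shows. A possible way to include them is to convert those columns with `pd.to_numeric(..., errors="coerce")`, which turns any non-numeric entry into `NaN`. This is a sketch on a small made-up frame, not the real data; the choice of columns to convert is an assumption based on their names.

```python
import pandas as pd

# Made-up rows: numbers stored as strings, with "-" standing in for unknowns.
sample = pd.DataFrame({
    "fatalities": [15, 10, 4],
    "injured": ["-", "3", "0"],
    "total_victims": ["-", "13", "4"],
    "age_of_shooter": ["18", "18", "-"],
})

# Columns assumed to hold numeric data despite their object dtype.
numeric_cols = ["injured", "total_victims", "age_of_shooter"]

converted = sample.copy()
# errors="coerce" replaces anything that cannot be parsed with NaN.
converted[numeric_cols] = converted[numeric_cols].apply(pd.to_numeric, errors="coerce")

summary = converted.describe()
print(summary)
```

After the conversion, all four columns appear in the summary, and the `count` row shows how many usable values each column has. Coercion silently discards unparseable entries, so it is worth checking how many `NaN` values it introduces before relying on the statistics.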
