<a href="https://colab.research.google.com/github/strivedi2/Gun-Violence-in-United-States/blob/master/Gun_Violence_v1_Team4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gun Violence in United States

---

Project Team 4 <br>
**Team Members**: Bharati Malik, Gaurav Hassija, Prachi Sharma, Shruti Trivedi, Vikita Nayak

## Introduction
This project is inspired by the [Vox article](https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts) on America's unique **Gun Violence** problem. The article provides 17 maps and charts that illustrate various aspects of gun ownership in the United States. <br> 


## **Background**

The US has a large number of guns, and its very loose or non-existent regulations on who may access firearms set it apart from other developed nations in terms of gun violence.


## **Search for relevant data**

We were interested in exploring Gun ownership within each state and its relationship with suicide rates, mass shootings and officer involved shootings. After going through various sources of data, we were directed to [Gun Violence Archive](https://www.gunviolencearchive.org/). As the website states, it is an online archive of gun violence incidents collected from over 2,500  law enforcement, media, government and commercial sources daily in an effort to provide near-real time data about the results of gun violence. 

However, the data available for download on the website is limited by the number of rows and attributes that can  be exported as CSV. This led us to a larger and richer dataset on [Kaggle](https://www.kaggle.com/jameslko/gun-violence-data/downloads/gun-violence-data.zip/1). 

In [0]:
import pandas as pd
import numpy as np
import altair as alt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.max_columns',200)
pd.set_option('display.max_rows',200)
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is deprecated in newer pandas

## Dataset 1: Gun Violence Dataset on Kaggle
We downloaded the gun violence dataset available on [Kaggle](https://www.kaggle.com/jameslko/gun-violence-data/downloads/gun-violence-data.zip/1). This dataset was built by scraping the same [Gun Violence Archive](https://www.gunviolencearchive.org) website with a Python script (_we could not get access to the script_). 

The dataset records more than **260,000 gun violence incidents** between **Jan 2013 - March 2018**. Note that the yearly files we load below contain 239,677 rows.

In addition to incident date, state, address and the number of people injured and killed, the dataset also contains information about the **type of incident** (such as mass shootings, suicides, officer involved shootings etc.), **guns involved** and **participant information including age, gender** etc., which makes it a richer dataset than the reports available on [Gun Violence Archive](https://www.gunviolencearchive.org).


### Preliminary Data Cleaning using MS Excel

During preliminary analysis of the dataset, we deleted the following columns from the CSV file before uploading it to GitHub, since they were not relevant to our analysis.


*   Address and location description: we decided to retain state, city, latitude and longitude for each incident and believe the address is not pertinent to our analysis.

*   URL columns pointing to the source of each incident.




Next, separate CSV files were created for each year from 2013 to 2018 (to satisfy the 25 MB file limit on GitHub) and uploaded to [GitHub](https://github.com/strivedi2/Gun-Violence-in-United-States). We read those files in the following lines of code.



In [0]:
guns_2013 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2013.csv')
guns_2014 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2014.csv')


In [0]:
guns_2015 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2015.csv')
guns_2016 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2016.csv')
guns_2017 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2017.csv')
guns_2018 = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gun-violence-2018.csv')

We concatenated the above dataframes to create a single dataframe.

In [0]:
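# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
guns_df = pd.concat([guns_2013, guns_2014, guns_2015, guns_2016, guns_2017, guns_2018], ignore_index=True)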

We then checked the total number of rows and columns, and identified any columns with a very high number of null values.

In [0]:
guns_df.shape

(239677, 19)

In [0]:
# To check for null values in the dataset
guns_df.isna().sum()

incident_id                 0    
date                        0    
state                       0    
city_or_county              0    
n_killed                    0    
n_injured                   0    
congressional_district      11944
gun_stolen                  99498
gun_type                    99451
incident_characteristics    326  
latitude                    7923 
longitude                   7923 
n_guns_involved             99451
participant_age_group       42119
participant_gender          36362
participant_status          27626
participant_type            24863
state_house_district        38772
state_senate_district       32335
dtype: int64

### **Data Cleaning**

**Drop columns with high null values** <br>
We dropped columns with more than 90,000 null values, since we believe columns this sparse will not be useful for our analyses. <br>
We also dropped the participant_status column, since the same information is available in numeric form in the n_killed and n_injured columns.

In [0]:
guns_df.drop(columns =['gun_stolen','gun_type','n_guns_involved','participant_status'], inplace = True)
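
The cell above hard-codes the sparse column names. As a minimal sketch, the same list can be derived from the null counts and a threshold. The toy frame, its values, and the `NULL_THRESHOLD` name are assumptions made for illustration; on the real data the threshold would be 90,000.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for guns_df; column names are borrowed from the
# notebook, but the values are made up for illustration.
toy = pd.DataFrame({
    "incident_id": [1, 2, 3, 4],
    "gun_type": [np.nan, np.nan, np.nan, "Handgun"],
    "latitude": [40.3, np.nan, 33.9, 41.0],
})

NULL_THRESHOLD = 2  # hypothetical cutoff; the notebook uses 90,000 on the full data

# Count nulls per column, then keep only the names that exceed the cutoff
null_counts = toy.isna().sum()
cols_to_drop = null_counts[null_counts > NULL_THRESHOLD].index.tolist()

cleaned = toy.drop(columns=cols_to_drop)
print(cols_to_drop)
print(cleaned.columns.tolist())
```

Deriving the list this way keeps the drop step in sync with the data if the input files change.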

At this stage, the dataframe has the following columns:

In [0]:
guns_df.columns

Index(['incident_id', 'date', 'state', 'city_or_county', 'n_killed',
       'n_injured', 'congressional_district', 'incident_characteristics',
       'latitude', 'longitude', 'participant_age_group', 'participant_gender',
       'participant_type', 'state_house_district', 'state_senate_district'],
      dtype='object')

In [0]:
guns_df.head(2)

Unnamed: 0,incident_id,date,state,city_or_county,n_killed,n_injured,congressional_district,incident_characteristics,latitude,longitude,participant_age_group,participant_gender,participant_type,state_house_district,state_senate_district
0,461105,1/1/13,Pennsylvania,Mckeesport,0,4,14.0,"Shot - Wounded/Injured||Mass Shooting (4+ victims injured or killed excluding the subject/suspect/perpetrator, one location)||Possession (gun(s) found during commission of other crimes)||Possession of gun by felon or prohibited person",40.3467,-79.8559,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::Adult 18+||4::Adult 18+,0::Male||1::Male||3::Male||4::Female,0::Victim||1::Victim||2::Victim||3::Victim||4::Subject-Suspect,,
1,460726,1/1/13,California,Hawthorne,1,3,43.0,"Shot - Wounded/Injured||Shot - Dead (murder, accidental, suicide)||Mass Shooting (4+ victims injured or killed excluding the subject/suspect/perpetrator, one location)||Gang involvement",33.909,-118.333,0::Adult 18+||1::Adult 18+||2::Adult 18+||3::Adult 18+,0::Male,0::Victim||1::Victim||2::Victim||3::Victim||4::Subject-Suspect,62.0,35.0


We have created Date Fields from the date column to help us plot yearly, monthly or weekly gun violence trends.

In [0]:
# Creating Date fields 

guns_df['date'] = pd.to_datetime(guns_df['date'])
guns_df['year'] = guns_df['date'].dt.year
guns_df['month'] = guns_df['date'].dt.month
guns_df['monthday'] = guns_df['date'].dt.day
guns_df['weekday'] = guns_df['date'].dt.weekday
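
To make the derived date fields concrete, here is a small sketch on two sample dates in the same `m/d/yy` style as the dataset. The explicit `format` string and the extra `weekday_name` column are our additions for illustration and are not part of the notebook's dataframe.

```python
import pandas as pd

# Two sample dates written the way the raw 'date' column stores them
dates = pd.to_datetime(pd.Series(["1/1/13", "3/31/18"]), format="%m/%d/%y")

parts = pd.DataFrame({
    "year": dates.dt.year,
    "month": dates.dt.month,
    "monthday": dates.dt.day,
    "weekday": dates.dt.weekday,          # integer: Monday = 0 ... Sunday = 6
    "weekday_name": dates.dt.day_name(),  # readable label, added for illustration
})
print(parts)
```

Note that `dt.weekday` yields integers rather than day names, which matters when labelling weekly charts.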

### **Understanding types of gun violence incidents**

We wanted to flag each incident as a **Mass Shooting, Suicide and/or Officer Involved Shooting.**

We checked for these keywords in the incident_characteristics column. Since the categories are not mutually exclusive, we created one indicator column per category: 1 if the incident was a Mass Shooting and 0 if not, and likewise for Suicide and Officer Involved Shooting.

In [0]:
# New columns for Mass Shootings, Suicides, Officer Involved Shooting in original dataframe

# na=False makes incidents with no recorded characteristics count as 0 instead of NaN
guns_df['Mass_Shooting'] = guns_df['incident_characteristics'].str.contains("Mass Shooting", na=False) * 1.0

guns_df['Officer_Involved_Shooting'] = guns_df['incident_characteristics'].str.contains("Officer Involved Shooting", na=False) * 1.0

guns_df['Suicide'] = guns_df['incident_characteristics'].str.contains("Suicide", na=False) * 1.0
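
The keyword flags can be sketched on a toy column to show two details that are easy to miss: the match is case-sensitive, and rows with missing text need explicit handling. The `flag` helper and the sample strings are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Illustrative strings in the '||'-delimited style of incident_characteristics
toy = pd.DataFrame({
    "incident_characteristics": [
        "Shot - Wounded/Injured||Mass Shooting (4+ victims injured or killed)",
        "Shot - Dead (murder, accidental, suicide)||Gang involvement",
        np.nan,  # some incidents have no recorded characteristics
    ]
})

def flag(series, keyword):
    """Return 1 where `keyword` occurs in the text, else 0 (missing text counts as 0)."""
    # regex=False treats the keyword as a literal substring; na=False maps NaN to False
    return series.str.contains(keyword, regex=False, na=False).astype(int)

toy["Mass_Shooting"] = flag(toy["incident_characteristics"], "Mass Shooting")
toy["Suicide"] = flag(toy["incident_characteristics"], "Suicide")
print(toy[["Mass_Shooting", "Suicide"]])
```

The second row is not flagged as a suicide because the lowercase word "suicide" inside the generic "Shot - Dead" label does not match the capitalized keyword.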


We also found a high number of incidents involving **gun possession by felons** and **accidental shootings**, and created new columns to record these characteristics. 

We believe these aspects are important for comparing states: states with stricter laws should, ideally, keep guns out of the hands of felons. 

We also want to understand how many people are injured in accidental shootings.

In [0]:
guns_df['Gun_Possession_felon'] = guns_df['incident_characteristics'].str.contains("Possession of gun by felon", na=False) * 1.0

guns_df['Accidental_Shootings'] = guns_df['incident_characteristics'].str.contains("Accidental Shooting", na=False) * 1.0

In [0]:
# Drop the incident_characteristics column now that the indicator columns exist
guns_df.drop(columns ='incident_characteristics', inplace = True)

In [0]:
guns_df.columns

Index(['incident_id', 'date', 'state', 'city_or_county', 'n_killed',
       'n_injured', 'congressional_district', 'latitude', 'longitude',
       'participant_age_group', 'participant_gender', 'participant_type',
       'state_house_district', 'state_senate_district', 'year', 'month',
       'monthday', 'weekday', 'Mass_Shooting', 'Officer_Involved_Shooting',
       'Suicide', 'Gun_Possession_felon', 'Accidental_Shootings'],
      dtype='object')

### Overall trend of gun violence incidents

In [0]:
# Plotting yearly trend of gun violence incidents in United States
alt.Chart(guns_df.groupby('year')['incident_id'].count().reset_index()).mark_line().encode(
    alt.X('year:O', title = 'Year'),
    alt.Y('incident_id', title = 'Total number of gun violence incidents')
).properties(
    title = 'Trend of gun violence incidents, Jan. 2013 - Mar. 2018',
    width=600).configure_axis(
    grid=False
)



**Number of mass shootings by state.**

In [0]:
# Sum the 0/1 indicator to count mass shootings; count() would return all incidents per state
alt.Chart(guns_df.groupby('state')['Mass_Shooting'].sum().reset_index()).mark_bar().encode(
    alt.X('state', title = 'State', 
         sort=alt.EncodingSortField(
            field="Mass_Shooting",
            op="sum",
            order="descending"
        )),
    alt.Y('Mass_Shooting', title = 'Total number of mass shootings')
).properties(
    title='Number of mass shootings by state (2013-2018)',
    width=650,
    height=400
).configure_axis(
    grid=False
)

### **Limitations in the Dataset**

One of the challenges in our analysis came from how gun violence incidents are recorded in the dataset. Each row describes one incident, so participant-level attributes such as age group, participant type (suspect or victim) and gender are packed together into a single delimited string per column. 



Participant Gender  | Participant Type
--- | ---
0::Male\|\|1::Male\|\|2::Male\|\|3::Female | 0::Victim\|\|1::Victim\|\|2::Victim\|\|3::Subject-Suspect


The challenge was to map these columns to each other to derive insights. For instance, finding the age group of a particular participant, such as the suspect, requires matching the participant index (the number before `::`) across columns. Each record has a different number of participants, which makes extracting the values even more difficult.

For analysing the profile of victims and suspects, we followed the approach taken in this [Kaggle Kernel](https://www.kaggle.com/shivamb/deep-exploration-of-gun-violence-in-us).

In [0]:
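## Function for converting the values into key-value pairs
def get_user_mapping(txt):
    """Parse '0::Male||1::Female' into {'0': 'Male', '1': 'Female'}."""
    if txt == "NA":
        return {}
    mapping = {}
    for d in txt.split("||"):
        parts = d.split("::")
        if len(parts) < 2:
            continue  # skip malformed entries that have no '::' separator
        key, val = parts[0], parts[1]
        # keep the first value seen for each participant index
        if key not in mapping:
            mapping[key] = val

    return mapping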

In [0]:
# Creating mapped columns for participants age group, type and gender
guns_df['participant_type'] = guns_df['participant_type'].fillna("NA")
guns_df['participant_type_map'] = guns_df['participant_type'].apply(lambda x : get_user_mapping(x))
guns_df['participant_age_group'] = guns_df['participant_age_group'].fillna("NA")
guns_df['participant_age_map'] = guns_df['participant_age_group'].apply(lambda x : get_user_mapping(x))
guns_df['participant_gender'] = guns_df['participant_gender'].fillna("NA")
guns_df['participant_gender_map'] = guns_df['participant_gender'].apply(lambda x : get_user_mapping(x))
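
The following sketch shows how the per-column mappings line up for a single incident, using the participant strings from the first row of `guns_df.head(2)`. The helper name `parse_participants` and the use of `collections.Counter` are our own illustrative choices, not the notebook's code.

```python
from collections import Counter

def parse_participants(txt):
    """Turn '0::Male||1::Female' into {'0': 'Male', '1': 'Female'}."""
    mapping = {}
    for item in txt.split("||"):
        key, sep, val = item.partition("::")
        # sep is empty when an entry has no '::', so such entries are skipped
        if sep and key not in mapping:
            mapping[key] = val
    return mapping

# Strings copied from the first row shown by guns_df.head(2)
types = parse_participants("0::Victim||1::Victim||2::Victim||3::Victim||4::Subject-Suspect")
genders = parse_participants("0::Male||1::Male||3::Male||4::Female")

# Join the two mappings on the participant index and tally suspects by gender
suspect_genders = Counter(
    genders[idx]
    for idx, role in types.items()
    if "suspect" in role.lower() and idx in genders
)
print(suspect_genders)
```

Participant 2 has a type but no gender in this row, which is why every lookup checks that the index exists in the second mapping before using it.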

### **Understanding Gun Violence Suspects Profile**

We wanted to analyze age and gender of suspects and victims of gun violence to identify any significant trends/insights.



In [0]:
## Finding the Suspect Age Groups
suspect_age_groups = {}
for i, row in guns_df.iterrows():
    suspects = []
    for k,v in row['participant_type_map'].items():
        if "suspect" in v.lower():
            suspects.append(k)
    for suspect in suspects:
        if suspect in row['participant_age_map']:
            ag = row['participant_age_map'][suspect]
            # count every occurrence, including the first one
            suspect_age_groups[ag] = suspect_age_groups.get(ag, 0) + 1

In [0]:
# Plotting suspects age distribution
source = pd.DataFrame({
    'a': list(suspect_age_groups.keys()),
    'b': list(suspect_age_groups.values())
})
alt.Chart(source).mark_bar(size = 80).encode(
    alt.X('a:N', title='Age Group', 
         sort=alt.EncodingSortField(
            field="b",
            op="sum",
            order="ascending"
        )),
    alt.Y('b:Q', title = 'Number of Suspects',axis=alt.Axis(format='s')),
    tooltip = ("b")
).properties(
    title='Suspects: Age Distribution',
    width=400,
    height=400
).configure_axis(
    grid=False
)

From the above chart we see that most suspects are adults (18 and older), but a significant number of incidents also involve teen suspects aged 12-17.

In [0]:
## Finding the Suspect's Gender
suspect_gender = {}
for i, row in guns_df.iterrows():
    suspects = []
    for k,v in row['participant_type_map'].items():
        if "suspect" in v.lower():
            suspects.append(k)
    for suspect in suspects:
        if suspect in row['participant_gender_map']:
            g = row['participant_gender_map'][suspect]
            # count every occurrence, including the first one
            suspect_gender[g] = suspect_gender.get(g, 0) + 1

In [0]:
source = pd.DataFrame({
    'a': list(suspect_gender.keys()),
    'b': list(suspect_gender.values())
})

alt.Chart(source).mark_bar(size = 80).encode(
    alt.X('a:N', title='Gender'),
    alt.Y('b:Q', title = 'Number of Suspects',axis=alt.Axis(format='s')),
    tooltip = ("b")
).properties(
    title='Suspects: Gender Distribution',
    width=400,
    height=400
).configure_axis(
    grid=False
)


From the above chart we can see that most suspects are male.

### Analysis on Gun ownership in US by state
In the [article on Vox's website](https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts), chart 5 depicts the relationship between gun ownership and gun deaths across US states. The chart was created using 2013 data for gun ownership and gun deaths. 

Since we have more recent data (up to March 2018), we were interested in exploring this relationship using gun ownership data for a later time period. We found [2017 gun ownership by state](https://www.thoughtco.com/gun-owners-percentage-of-state-populations-3325153) as compiled by the website [HuntingMark.com](https://huntingmark.com/gun-ownership-stats/#_ftn1%20). The data on this website is taken from the ATF (United States Department of Justice Bureau of Alcohol, Tobacco, Firearms and Explosives) report [Firearm Commerce in the United States, 2017 statistics](https://www.atf.gov/resource-center/docs/undefined/firearms-commerce-united-states-annual-statistical-update-2017/download), **Exhibit 8**, **National Firearms Act Registered Weapons by State (April 2017)**, which lists the number of registered weapons by state.

It is important to note that the actual number of guns may be much higher than the numbers shown here, since the United States does not require registration of all guns. However, this was the most reliable data source on gun ownership available to us, so we decided to use it for our analysis.



In [0]:
# importing gun ownership data by state
state_guns_owned = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Statewise_Gun_Ownership2017.csv')

In [0]:
state_guns_owned.head(1)

Unnamed: 0,State,Any Other Weapon1,Destructive Device2,Machinegun3,Silencer4,Short Barreled Rifle5,Short Barreled Shotgun6,Total
0,Alabama,1203,78434,26307,48118,5285,2294,161641


In order to normalize gun ownership by state population we also decided to add state population data for 2017 from [here](https://www.enchantedlearning.com/usa/states/population.shtml). 

We will now load state population data and create a new dataframe with total guns and population columns for each state for 2017.  

In [0]:
# load state population dataset
state_pop = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/StatePopulation%202017.csv')
state_pop.head(1)

Unnamed: 0,State,Population (2017)
0,California,39536653


In [0]:
# merge gun ownership and state population dataframes
state_data = state_guns_owned.merge(state_pop, how ='inner', left_on='State', right_on='State')

# drop the remaining gun type columns since we won't be using them for our analysis 
state_data.drop(columns=['Any Other Weapon1','Destructive Device2','Machinegun3','Silencer4','Short Barreled Rifle5','Short Barreled Shotgun6'],inplace=True)

In [0]:
state_data.head(2)

Unnamed: 0,State,Total,Population (2017)
0,Alabama,161641,4874747
1,Alaska,15824,739795


We will now merge the above dataframe with our original dataset to add these columns.

In [0]:
guns_df = guns_df.merge(state_data, how='left', left_on='state', right_on='State')
guns_df.drop(columns='State',inplace=True)

# rename the Total column to Total_guns
guns_df.rename(columns={'Total':'Total_guns'},inplace=True)
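
A left merge silently leaves `NaN` in the new columns for any state name that does not match. As a hedged sketch, pandas' `indicator=True` option can list such rows. The toy frames reuse two rows shown earlier, and "District of Columbia" is a hypothetical unmatched name; we have not checked which names, if any, fail to match in the real files.

```python
import pandas as pd

# Toy stand-ins: incident states on the left, per-state totals on the right
incidents = pd.DataFrame({"state": ["Alabama", "Alaska", "District of Columbia"]})
state_info = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "Total": [161641, 15824],
    "Population (2017)": [4874747, 739795],
})

# indicator=True adds a '_merge' column marking where each row was found
merged = incidents.merge(
    state_info, how="left", left_on="state", right_on="State", indicator=True
)

# Rows present only in the left frame had no matching state record
unmatched = merged.loc[merged["_merge"] == "left_only", "state"].unique().tolist()
print(unmatched)
```

Running a check like this before dropping the `State` column makes it clear which incidents will lack population and gun totals.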

### Data dictionary of cleaned Dataset 1: Gun Violence Dataset on Kaggle
Column Name  | Description
--- | ---
incident_id | Incident ID
date	| Date of crime (Format: YYYY-MM-DD)
state	|State where crime was committed	
city_or_county	|City/ County of crime	
n_killed	|Number of people killed in the incident	
n_injured	|Number of people injured in the incident	
congressional_district |Congressional district id	
latitude	|Latitude of the location of the incident	
longitude	|Longitude of the location of the incident	
participant_age_group	|Age group of participant(s) (Child, Teen, Adult)	
participant_gender	|Gender of participant(s)	
participant_type	|Type of participant (Victim, Suspect)	
state_house_district	|Voting house district	
state_senate_district	|Territorial district from which a senator to a state legislature is elected
year | Year of incident
month | Month of incident
monthday | Day of incident
weekday | Day of the week of incident (integer, 0 = Monday through 6 = Sunday)
Mass_Shooting | Incident characteristic: 1 means Mass Shooting
Officer_Involved_Shooting | Incident characteristic: 1 means an officer was involved in the shooting
Suicide | Incident characteristic: 1 means Suicide
Gun_Possession_felon | Incident characteristic: 1 means gun possessed by a felon
Accidental_Shootings | Incident characteristic: 1 means the shooting occurred by accident
participant_type_map | Map Participant type
participant_age_map | Map Participant age
participant_gender_map | Map Participant gender
Total_guns|Total guns owned in each State in 2017
Population (2017)|Population of the State in 2017

## **Dataset 2: Gun ownership by country**

We also wanted to understand the relationship between gun ownership and gun violence incidents across countries.

[Here](https://docs.google.com/spreadsheets/d/1chqUZHuY6cXYrRYkuE0uwXisGaYvr7durZHJhpLGycs/edit#gid=0) is the dataset on Gun homicide and gun ownership listed by country. More information about the dataset can be found [here](https://www.theguardian.com/news/datablog/2012/jul/22/gun-homicides-ownership-world-list)

**About the dataset:**
The world’s crime figures are collected by the UNODC (United Nations Office on Drugs and Crime) through its annual crime survey. It has a special section of data on firearm homicides and provides detailed information by size of population and compared to other crimes. 

**Limitations of the dataset:** <br>
1. Some key nations are missing from the data, including Russia, China and Afghanistan. But it does include the US, UK and many other developed nations. <br>
2. Also, this dataset is from 2012. We know that we cannot compare it over the same time period as the gun ownership dataset for the United States, but we are still interested in knowing whether any trend exists.



In [0]:
# We saved the data from [google sheet](https://docs.google.com/spreadsheets/d/1chqUZHuY6cXYrRYkuE0uwXisGaYvr7durZHJhpLGycs/edit#gid=0) to a csv file and uploaded it on github
# importing gun ownership data by country
gun_ownership = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/World_firearms.csv')

In [0]:
gun_ownership.head(2)

Unnamed: 0,Country/Territory,ISO code,Source,% of homicides by firearm,Number of homicides by firearm,"Homicide by firearm rate per 100,000 pop",Rank by rate of ownership,Average firearms per 100 people,Average total all civilian firearms
0,Albania,AL,CTS,65.9,56.0,1.76,70.0,8.6,270000.0
1,Algeria,DZ,CTS,4.8,20.0,0.06,78.0,7.6,1900000.0


In [0]:
# Removed ISO code column as we already have country information we don't need the country code for analysis
# Removed Source column as it is not relevant for our analysis
# Removed Rank by rate of ownership as rank will not be relevant for analysis
gun_ownership = gun_ownership[['Country/Territory','% of homicides by firearm',\
                               'Number of homicides by firearm','Homicide by firearm rate per 100,000 pop',\
                               'Average firearms per 100 people','Average total all civilian firearms']]

In [0]:
# Renamed few columns
gun_ownership.columns = ['Country', 'Percentage of Homicides by firearm','Number of homicide by firearm',\
                        'Homicide by firearm rate per 100,000 pop','Average firearms per 100 people','Average total all civilian firearms']

In [0]:
gun_ownership.isna().sum()

Country                                     0 
Percentage of Homicides by firearm          69
Number of homicide by firearm               69
Homicide by firearm rate per 100,000 pop    69
Average firearms per 100 people             9 
Average total all civilian firearms         9 
dtype: int64

In [0]:
# Filled the na values with 0
gun_ownership = gun_ownership.fillna(0)

In [0]:
gun_ownership.isna().sum()

Country                                     0
Percentage of Homicides by firearm          0
Number of homicide by firearm               0
Homicide by firearm rate per 100,000 pop    0
Average firearms per 100 people             0
Average total all civilian firearms         0
dtype: int64

In [0]:
# Creating a dataframe of top 20 countries with the highest average firearms per 100 people by Country
# nlargest already returns rows sorted in descending order, so no separate sort is needed
top20 = gun_ownership[['Country','Average firearms per 100 people']].nlargest(20, 'Average firearms per 100 people')

In [0]:
alt.Chart(top20).mark_bar().encode(
    alt.X('Country', title = 'Country', 
         sort=alt.EncodingSortField(
            field="Average firearms per 100 people",
            #op="sum",
            order="descending"
        )),
    alt.Y('Average firearms per 100 people', title = 'Average firearms per 100 people')
).properties(
    title='Average firearms per 100 people for top 20 countries',
    width=650,
    height=400
).configure_axis(
    grid=False
)

From the above chart we can see that the United States ranks highest in average firearms per 100 people.

### Data dictionary of Dataset 2: Gun ownership by Country
Column Name  | Description
--- | ---
Country | Country Name
Percentage of Homicides by firearm | Percentage of Homicides by firearm
Number of homicide by firearm | Number of homicides by firearm
Homicide by firearm rate per 100,000 pop | Homicides by firearm rate per 100,000 population
Average firearms per 100 people | Average firearms per 100 people
Average total all civilian firearms | Estimated total number of firearms held by civilians

## Road map

We want to explore the following trends/relationships and also replicate/improve the existing charts on the [Vox article](https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts)

1. **Relationship between gun ownership and gun violence:** We would like to understand if there exists any relationship between gun ownership and gun violence incidents across US states, and also see if similar relationships can be found with the data available for countries. 
2. **Gun violence incident characteristics:** We would like to understand the trend of mass shootings, suicides and officer-involved shootings across the US. 
3. **Understanding suspect profiles with respect to gun violence incidents:** We would like to explore gun violence incidents and their characteristics with respect to suspect age: teens (aged 12-17) and adults (aged 18 and older).

We are also interested in exploring gun ownership and gun violence incidents with respect to **gun laws** across states in US in later versions.

# Team Project First Version 
In this version we start with exploring the relationships mentioned under Roadmap and create visualizations for interesting, non-trivial insights. <br>

To create the visualizations, we exported cleaned datasets to csv files in our local machines. This was achieved by downloading and running this notebook on our individual systems. The csv files were then used as data sources to create visualizations on Tableau. 


## Insight 1: Relationship between Gun Ownership and Gun violence related deaths across states in US, 2017

We want to understand if there is any relationship between gun ownership and the number of gun deaths that occur in states across the US. Chart 5 on the [Vox website](https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts), originally created by Mother Jones, shows that there is a direct relationship between gun ownership and gun violence using data for 2013. 
<br>
For our analysis, we are trying to recreate this chart using data from 2017.
<br>

In Tableau we loaded the cleaned csv file, and created a crosstab to aggregate number of people killed in gun violence incidents by state in 2017. We also added Total Guns and Population columns (*these columns already contain 2017 data*) to this table view and exported the data as a csv file. 

We then loaded this csv as a new data source in our Tableau file and created two new calculated fields for normalization:
1. **Gun Deaths per 100,000 people**: (N Killed / Avg. Population (2017)) * 100,000
2. **Guns per 100 people**: (Avg. Total guns / Avg. Population (2017)) * 100

<br>
The cleaned dataset looks as follows:

In [0]:
# read state data for 2017
state = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/State%20data.csv')

In [0]:
state.rename(columns={"N Killed":'Gun_Deaths',
                     'Avg. Population (2017)': 'Population_2017',
                     'Avg. Total guns':'TotalGuns',
                     'Guns per 100 people': 'GunsPer100People'}, inplace=True)
state.head(2)

Unnamed: 0,State,Gun_Deaths,Population_2017,TotalGuns,GunDeathsPer100K,GunsPer100People
0,Alabama,544,4874747,161641,11.159554,3.315885
1,Alaska,69,739795,15824,9.326908,2.138971
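
As a cross-check on the Tableau calculated fields, the two normalized columns can be recomputed in pandas from the raw counts. This sketch uses only the two rows shown above; the frame name `state_toy` is ours.

```python
import pandas as pd

# The two rows displayed by state.head(2), without the precomputed rate columns
state_toy = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "Gun_Deaths": [544, 69],
    "Population_2017": [4874747, 739795],
    "TotalGuns": [161641, 15824],
})

# Gun deaths per 100,000 people = (deaths / population) * 100,000
state_toy["GunDeathsPer100K"] = (
    state_toy["Gun_Deaths"] / state_toy["Population_2017"] * 100_000
)
# Guns per 100 people = (registered guns / population) * 100
state_toy["GunsPer100People"] = (
    state_toy["TotalGuns"] / state_toy["Population_2017"] * 100
)
print(state_toy.round(6))
```

The recomputed values agree with the `GunDeathsPer100K` and `GunsPer100People` columns in the exported file for these two states.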


The scatter plot between Gun Deaths per 100,000 people and Guns owned per 100 people initially does not reveal a significant trend. 

The chart for this trend can be seen [here](https://public.tableau.com/views/GunOwnershipDeaths_chart1/GunDeathsandGunOwnershipUS2017?:embed=y&:display_count=yes&publish=yes&:origin=viz_share_link):
  



In [0]:
# HTML and display are the only names used below; both live in IPython.display
from IPython.display import HTML, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Gundeaths_gunownership.PNG" style="width=100px;"/>'))

 In the chart, Wyoming appears to be an outlier with ~23 guns per 100 people and 1.5 deaths per 100,000 people. Due to this outlier, the trend represented in the Mother Jones chart cannot be seen.
<br>  
  
 Once the outlier is removed, we could see a **_significant trend between gun deaths and rate of gun ownership_**. Below is the link to the scatter plot  created after removing the outlier state of Wyoming. 
 <insert Tableau public link> <br>
In the chart one can see that the number of deaths due to gun violence incidents increases as gun ownership increases. The p-value of the trend line is approximately 0.002, which indicates that the trend is statistically significant.
 
 <br> 
 For the statistically inclined audience, below is a description of the trend line from the  chart: 
  
**P-value:** 0.0017479 <br> 
**Equation:**
Gun Deaths per 100,000 people = 1.11325*Guns per 100 people + 2.88569
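
To illustrate why a single extreme point can hide a trend, here is a minimal sketch that fits a least-squares line with and without an outlier. The data are made up: five points on the line y = x + 3 plus one point placed roughly where the text locates Wyoming. It uses `numpy.polyfit` rather than Tableau's trend line, so the coefficients are not comparable to the equation above.

```python
import numpy as np

# Made-up points on y = x + 3, plus one outlier at about (23, 1.5)
guns_per_100 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 23.0])
deaths_per_100k = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 1.5])

# A degree-1 polyfit returns [slope, intercept] of the least-squares line
slope_all, intercept_all = np.polyfit(guns_per_100, deaths_per_100k, 1)

# Hypothetical cutoff that removes only the outlier
keep = guns_per_100 < 20
slope_trim, intercept_trim = np.polyfit(guns_per_100[keep], deaths_per_100k[keep], 1)

print(f"with outlier:    slope={slope_all:.3f}, intercept={intercept_all:.3f}")
print(f"without outlier: slope={slope_trim:.3f}, intercept={intercept_trim:.3f}")
```

In this toy case the outlier turns a clearly positive slope negative. A p-value for the fitted slope could be obtained with, for example, `scipy.stats.linregress`, which we have not run here.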


### Mass Shooting and Suicide related incidents with gun ownership

Thus, we were able to recreate the same relationship as depicted in Chart 5. We then wanted to explore the same trend with Mass shooting incidents and suicides.

To achieve this, we added "Mass Shooting" and "Suicide" columns to the original crosstab to group the number of incidents by state. We then exported the data and added it back as a new data source. We created the following normalized columns:

1. Mass shootings per 100,000 people
2. Suicides per 100,000 people

We then created two scatter plots in the same chart using these columns and comparing them with Guns owned per 100 people.

Here is a link to the chart:

From these charts we observe that mass shootings show the same relationship as gun deaths - **_a significant increasing trend with increase in gun ownership_**. 
However, **_suicides do not follow the same trend_**. There is no significant relationship between the rate of gun ownership and the number of suicides committed through guns. 

We will now calculate the number of deaths occurring due to mass shootings, suicides and accidental shootings across states and compare the data.



In [0]:
guns_df.head(1)

#rename state column to State to ease merging columns with state dataframe
guns_df.rename(columns={'state':'State'}, inplace=True)

In [0]:
# calculate deaths due to Mass Shootings, Suicide and Accidental Shootings in 2017 

MS_deaths = guns_df.loc[(guns_df.Mass_Shooting == 1) & (guns_df.year == 2017)].groupby('State')['n_killed'].sum().reset_index().rename(columns={'n_killed':'MassShootingDeaths'})
Suicide_deaths = guns_df.loc[(guns_df.Suicide == 1) & (guns_df.year == 2017)].groupby('State')['n_killed'].sum().reset_index().rename(columns={'n_killed':'SuicideDeaths'})
Accidental_deaths = guns_df.loc[(guns_df.Accidental_Shootings == 1) & (guns_df.year == 2017)].groupby('State')['n_killed'].sum().reset_index().rename(columns={'n_killed':'AccidentalDeaths'})

In [0]:
# add the above dataframes as new columns in 'state' dataframe 

state = state.merge(MS_deaths, how='left', on='State')
state = state.merge(Suicide_deaths, how='left', on='State')
state = state.merge(Accidental_deaths, how='left', on='State')

In [0]:
# fill null values with 0 for states with no deaths in any of the columns
state.fillna(0,inplace=True)

# perform check for null values
state.isna().sum()

State                 0
Gun_Deaths            0
Population_2017       0
TotalGuns             0
GunDeathsPer100K      0
GunsPer100People      0
MassShootingDeaths    0
SuicideDeaths         0
AccidentalDeaths      0
dtype: int64
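
The `fillna(0)` call above fills every column of `state`. A more targeted variant fills only the death-count columns that the left merges can leave empty. The sketch below uses small made-up frames (`state_demo`, `ms_demo`) as stand-ins for the notebook's data, so the names and numbers are assumptions.

```python
import pandas as pd

# Toy stand-ins for the notebook's `state` and `MS_deaths` frames (made-up numbers).
state_demo = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Population_2017": [1_000_000, 2_000_000, 500_000],
})
ms_demo = pd.DataFrame({"State": ["A", "C"], "MassShootingDeaths": [4, 1]})

# A left merge keeps every state; states without a match get NaN.
merged = state_demo.merge(ms_demo, how="left", on="State")

# Fill only the count columns, and restore an integer dtype afterwards.
death_cols = ["MassShootingDeaths"]
for col in death_cols:
    merged[col] = merged[col].fillna(0).astype(int)

print(merged)
```

Limiting the fill to known count columns avoids silently replacing a missing population or ownership value with 0.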

In the next step we create bins for the columns 'GunsPer100People' to help group states by rate of gun ownership.

In [0]:
# create bins for rate of gun ownership
state['Gunsowned_range'] = pd.cut(state['GunsPer100People'], bins=[0,1,1.5,2,3,5,7,23])

state.groupby('Gunsowned_range')['State'].count()

Gunsowned_range
(0.0, 1.0]     10
(1.0, 1.5]     12
(1.5, 2.0]     11
(2.0, 3.0]     12
(3.0, 5.0]     4 
(5.0, 7.0]     1 
(7.0, 23.0]    1 
Name: State, dtype: int64

As seen from above distribution, 23 states have gun ownership rates in the range of 1-2. In order to prevent any bias from being introduced in our analysis due to a large number of observations in one group, we decided to split the bin into 1 - 1.5 and 1.5 - 2.

We are interested to see if there is any change due to the slight increase in gun ownership.
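
To make the bin boundaries concrete, here is a minimal sketch of how `pd.cut` assigns values that sit exactly on an edge. The bin edges are the ones used above; the ownership rates are made-up values chosen to hit the boundaries.

```python
import pandas as pd

# Made-up ownership rates, including values exactly on bin edges.
rates = pd.Series([0.8, 1.0, 1.2, 1.5, 1.7], name="GunsPer100People")
edges = [0, 1, 1.5, 2, 3, 5, 7, 23]  # same edges as in the cell above

# By default pd.cut builds right-closed intervals such as (0.0, 1.0].
binned = pd.cut(rates, bins=edges)
first_bin = binned.cat.categories[0]

print(pd.DataFrame({"rate": rates, "bin": binned}))
print(1.0 in first_bin)
```

With the default `right=True`, a rate of exactly 1.0 lands in the first bin and a rate of exactly 1.5 lands in the second.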

We then extract this data into our local system and load it as a data source on Tableau. We create normalized fields for deaths due to mass shootings, accidental shootings and suicides per 100,000 people.

We then create a chart in Tableau to identify if there is any change in the number of deaths with increase in gun ownership rate across states.
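
The normalized fields were created in Tableau, but the same calculation can be sketched in pandas. The example below assumes the column names of the `state` dataframe shown earlier and uses made-up numbers; the `...Per100K` output names are hypothetical.

```python
import pandas as pd

# Toy stand-in for the `state` dataframe (made-up numbers).
state_demo = pd.DataFrame({
    "State": ["A", "B"],
    "Population_2017": [2_000_000, 500_000],
    "SuicideDeaths": [40, 5],
    "AccidentalDeaths": [10, 0],
    "MassShootingDeaths": [6, 1],
})

# Deaths per 100,000 people = deaths / population * 100,000.
for col in ["SuicideDeaths", "AccidentalDeaths", "MassShootingDeaths"]:
    state_demo[f"{col}Per100K"] = (
        state_demo[col] / state_demo["Population_2017"] * 100_000
    )

print(state_demo.filter(like="Per100K"))
```

Normalizing by population is what makes small and large states comparable on the same axis.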

Here is a link to the [chart](https://public.tableau.com/views/GunViolence_2017/GunownershipandresultingdeathsUS2017?:embed=y&:display_count=yes&publish=yes&:origin=viz_share_link).



In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/gundeaths_massshootings.PNG" style="width=100px;"/>'))


From the chart, one can see that the rates of suicide and accidental deaths rise as gun ownership increases. 
The chart has been filtered to gun ownership between 0 and 3 guns per 100 people, since those bins contain a similar number of states. The remaining bins, covering 3 to 23 guns per 100 people, have been filtered out due to lack of adequate data.

Focusing our analysis on gun ownership between 0 and 3 guns per 100 people, we see that moving from little or no access to a gun to slightly more access, i.e. going from the 0 to 1 bin to the 1 to 1.5 bin, increases the likelihood of death by suicide by 170% and the likelihood of accidental death by 1114%. So if you are a gun enthusiast, those are some interesting odds to consider before deciding to own a gun.
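
For readers who want to reproduce this kind of figure, the sketch below shows how a percentage increase between two bins is computed. The two rates are hypothetical placeholders, not values taken from the Tableau chart.

```python
def percent_increase(old_rate: float, new_rate: float) -> float:
    """Percentage change from old_rate to new_rate."""
    if old_rate == 0:
        raise ValueError("percent increase is undefined for a zero baseline")
    return (new_rate - old_rate) / old_rate * 100


# Hypothetical per-100K death rates for two adjacent ownership bins.
rate_bin_0_to_1 = 1.0
rate_bin_1_to_1_5 = 2.7

change = percent_increase(rate_bin_0_to_1, rate_bin_1_to_1_5)
print(round(change, 1))
```

Note that a 170% increase means the new rate is 2.7 times the old one, not 1.7 times.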

**Making of the chart in Tableau**<br>
Some key steps involved while making the chart in Tableau include:
1. Renaming the bin labels from (0.0, 1.0] to 0 to 1. Here it is important to understand that the bins are right-closed: an ownership rate of exactly 1 gun per 100 people is counted in this bin and not in the second bin.
2. Adding annotations for gun ownership at 1 gun per 100 people, which shows the most significant increase in suicide and accidental deaths.

We understand that the 1 to 1.5 and 1.5 to 2 bins can be confusing for the audience, and that is something we will consider while working on the second version of these charts. For now, since this is a common dataset, we have decided to stick to the same categorization. 

Even so, it is clearly visible to the naked eye that even a 0.5 increase in gun ownership increases the likelihood of death by suicide.

### Summary: Relationship between gun ownership, gun violence incidents and resulting deaths
- To summarize the finding, we see that an increase in gun ownership results in an increase in deaths due to gun violence incidents and also an increase in the number of mass shootings.
- While incidents of suicide do not increase with increasing gun ownership, having access to a gun does increase the likelihood of intentional (suicide) death by ~170% and non-intentional (accidental) death by more than 1000%.




## Insight 2: Relationship between State Gun Laws and Gun Incidents in US

To take our analysis of the states a step further, we then tried to understand the role of policies in gun-related incidents. For this we used the State Firearm Law Database compiled by RAND Corporation as part of the Gun Policy in America initiative. RAND developed this longitudinal dataset of state firearm laws, which is free to the public, including other researchers, to support improved analysis and understanding of the effects of various laws. The database contains information on state and District of Columbia firearm laws in the United States from 1979 to 2016. It does not capture all firearm-related laws, only those of specific types. 

Here is a link to the [site](https://www.rand.org/pubs/tools/TL283.html).

The dataset looks like this:

In [0]:
# read RAND state Law data

rand_state_df = pd.read_csv('https://raw.githubusercontent.com/Psharma2193/Individual-Project---First-Version/master/StateFirearmLaw.csv')

In [0]:
rand_state_df.head(1)

Unnamed: 0,Law ID,State,State Postal Abbreviation,Type of Law,Effect,Type of Change,Effective Date,Effective Date Note,Effective Date Month,Effective Date Day,Effective Date Year,Statutory Citation,Content,Controlling Law at Beginning of Period (1979),Age for Minimum Age Laws,"Length of Waiting Period (days, handguns)",Additional Context and Notes,Caveats and Ambiguities,Confirmation Code
0,AL1001,Alabama,AL,Background Checks for private sales - handguns and long guns,,,,,,,,,No law requiring background checks for private sales of handguns or long guns,1.0,,,,,


The dataset provides rich detail on the type of law implemented in each state, the date from which it has been effective, and so on. For our analysis, we focused on the Effect of each law, i.e. whether it is Restrictive or Permissive in nature.

For example, dealer licensing of handguns is Restrictive in Alabama, whereas carrying a concealed weapon (CCW) is Permissive.

In [0]:
rand_state_df['Effect'].unique()

array([nan, 'Permissive', 'Restrictive'], dtype=object)

We extracted the total number of Restrictive laws and the total number of Permissive laws for each state into a separate file to have aggregated values for these. 
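
The aggregation itself is not shown in this notebook. One way it could be done in pandas is sketched below with a tiny made-up stand-in for `rand_state_df`; whether the published `state_stringency.csv` was produced exactly this way is an assumption.

```python
import pandas as pd

# Toy stand-in for rand_state_df with only the two columns needed (made-up rows).
rand_demo = pd.DataFrame({
    "State": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "Effect": ["Restrictive", "Permissive", None, "Permissive", "Permissive"],
})

# Count laws per state and effect; rows with a missing Effect are dropped by crosstab.
counts = (
    pd.crosstab(rand_demo["State"], rand_demo["Effect"])
    .reindex(columns=["Permissive", "Restrictive"], fill_value=0)
    .reset_index()
)

print(counts)
```

Because rows with a null `Effect` are excluded from the counts, any law without a recorded effect does not contribute to either total.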

In [0]:
# read state Gun Law data

stringency_df = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/state_stringency.csv')

In [0]:
stringency_df.head()

Unnamed: 0,State,Permissive,Restrictive
0,Alabama,5,5
1,Alaska,4,3
2,Arizona,4,5
3,Arkansas,2,3
4,California,2,27


This dataset was merged with the state dataset to analyze gun incidents in light of the governing gun laws in each state.

In [0]:
#add the above dataframes as new columns in 'state' dataframe 

law_incident_df = state.merge(stringency_df, how='left', on='State')

In [0]:
law_incident_df.head(1)

Unnamed: 0,State,Gun_Deaths,Population_2017,TotalGuns,GunDeathsPer100K,GunsPer100People,MassShootingDeaths,SuicideDeaths,AccidentalDeaths,Gunsowned_range,Permissive,Restrictive
0,Alabama,544,4874747,161641,11.159554,3.315885,6.0,63.0,14,"(3.0, 5.0]",5,5


As a preliminary analysis, we roughly plotted the figures to see whether any trend or pattern was present.


In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Gun_Laws_Incidents_PriAnalysis.PNG" style="width=100px;"/>'))

From the above graph, we felt that instead of communicating two different parameters in the form of Restrictive and Permissive, it is more appropriate to have one derived field that holds the essence of both. Hence, we created a calculated field called Stringency Score to depict how stringent or strict the laws are in any given state. 
<br>
<br>
A positive Stringency Score indicates that the state has more Restrictive laws than Permissive laws, which suggests that the state authorities hold a strong opinion on controlling gun incidents through policy changes. On the other hand, a negative Stringency Score indicates that the state is lenient towards gun incidents.


In [0]:
law_incident_df['Stringency_score'] = law_incident_df['Restrictive'] - law_incident_df['Permissive']

In [0]:
law_incident_df.head(1)

Unnamed: 0,State,Gun_Deaths,Population_2017,TotalGuns,GunDeathsPer100K,GunsPer100People,MassShootingDeaths,SuicideDeaths,AccidentalDeaths,Gunsowned_range,Permissive,Restrictive,Stringency_score
0,Alabama,544,4874747,161641,11.159554,3.315885,6.0,63.0,14,"(3.0, 5.0]",5,5,0


This dataset was then used to create the visualizations in Tableau for this analysis.

### State's Gun Deaths & Stringency Score Analysis

**Preliminary Analysis** - To identify whether there is any underlying relationship between gun deaths and the restrictiveness of state policies, we started with a correlation plot. There was no apparent relationship visible from the chart. We could see outliers in the data, such as California, which has the highest number of restrictive laws, and the District of Columbia, which has the highest number of gun deaths per 100K.
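
The correlation plot was built in Tableau. As a numeric companion, the sketch below computes a Pearson correlation in pandas with and without one extreme point. The column names match `law_incident_df`, but all values are made up, so the printed coefficients say nothing about the real data.

```python
import pandas as pd

# Toy stand-in for law_incident_df (made-up values, one extreme stringency score).
demo = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "Stringency_score": [0, -1, 1, 3, 25],
    "GunDeathsPer100K": [11.2, 9.0, 8.1, 10.5, 7.4],
})

# Pearson correlation on all rows.
r_all = demo["Stringency_score"].corr(demo["GunDeathsPer100K"])

# Same correlation after dropping the row with the largest stringency score.
trimmed = demo.drop(index=demo["Stringency_score"].idxmax())
r_trimmed = trimmed["Stringency_score"].corr(trimmed["GunDeathsPer100K"])

print(round(r_all, 3), round(r_trimmed, 3))
```

Comparing the two coefficients is a simple way to check how much a single outlier drives the apparent relationship.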

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/GunDeaths_StringencyScore_PriAnalysis.PNG" style="width=50px;"/>'))

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/GunDeathsVsGunLaws_V1.PNG" style="width=50px;"/>'))

The view is sorted by the sum of gun deaths and filtered on Stringency Score. 
<br>
**Key Insight** - Southern states along the Mississippi River, such as Louisiana, Mississippi, Alabama, Tennessee and Arkansas, are currently in the top 10 for firearm deaths. One legislative similarity these states share is that none of them requires a license, registration or permit to buy a gun, though there are dozens of other states with the same regulations. Even with staggering numbers of gun deaths being reported in these states, their approach to gun policies is very lenient. 


### State's Mass Shooting & Stringency Score Analysis

**Preliminary Analysis** - Similar to the gun deaths scenario, there was no apparent relationship visible from the correlation chart. We could see outliers in the data, such as California, which has the highest number of restrictive laws, and Texas, which has the highest number of mass shootings.

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Shooting_StringencyScore_PriAnalysis.PNG" style="width=100px;"/>'))

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/MassShootsVsGunLaws_V1.PNG" style="width=100px;"/>'))

The view is sorted by the sum of mass shooting deaths and filtered on Stringency Score. 
<br>
**Key Insight** - Looking at the states with the highest number of mass shootings, we can say that mass shootings have occurred even in states with gun policies as restrictive as California's. Thus it can't be said that more restrictive policies alone are the solution for dealing with mass shootings.

**The visualization can be viewed here**: [chart](https://public.tableau.com/profile/prachisharma#!/vizhome/StateGunLawsIncidentsAnalysis/AnalyzingGunLawsandGunIncidentsrelationship?publish=yes)


**Limitations of the analysis**

- We found that a lot of records have a Null value in the Effect column. For a given law, we observed that the Effect type was Restrictive in some rows and Null in others. This may have affected the analysis. 

- It's hard to classify the seriousness of each law that a state authority defines as Restrictive or Permissive. The Stringency Score field only gives us insight into the weight given to policies, but it is subject to what state authorities deem restrictive and permissive, and this could be different for each state.


**Summary: Relationship between Gun related Incidents and Gun Laws in states of US**

The issue of gun violence is extremely complicated and varies with state laws and local culture. Many states hold that strict firearm regulation is a necessary measure for preventing and reducing gun deaths; however, many other states do not take this stance. 

From the above analysis we could see that states with strict gun policies were major victims of mass shootings, while states with lenient policies reported the highest number of gun deaths. This suggests that restrictive laws are not necessarily impacting gun violence as many expect them to.


Though the facts on gun violence may vary, they don’t necessarily point toward a catch-all solution of controlling it through policies. Until a solution is found, we can only contribute to this cause by staying knowledgeable about the issues and by practicing responsible gun ownership.

## Insight 3: Relationship between population and gun ownership across US and other developed countries

During our exploratory phase, we identified that, compared to other developed countries, the US was the highest in terms of average firearms per 100 people. 

**Preliminary Analysis** - To identify whether highly populated developed countries have a higher rate of gun violence. 

We then identified a population and growth dataset for the year 2019 [here](http://worldpopulationreview.com/countries/developed-countries/). 

As we had gun ownership data for the year 2012, we used the growth rate mentioned in [this](http://worldpopulationreview.com/countries/developed-countries/) dataset to calculate the population for the year 2012. The assumption made was that the growth rate was constant for the last 10 years. This calculation was performed in a local Excel file. [Here](https://github.com/strivedi2/Gun-Violence-in-United-States/blob/master/population.csv) is the CSV file with the calculated population for the year 2012, which will be used further in our analysis.
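
The Excel formula is not included in the notebook. The sketch below shows one back-calculation that is consistent with the Iceland row of `population.csv` (`pop2019` = 340.566, `GrowthRate` = 1.008248, `pop2012` = 321.536136): dividing the 2019 population by the annual growth factor once per year for the seven years between 2012 and 2019. Treat the formula as an inference from the published numbers, not as the authors' confirmed method; the function name is hypothetical.

```python
YEARS_BACK = 2019 - 2012  # seven annual steps


def backcast_population(pop_2019: float, growth_rate: float,
                        years: int = YEARS_BACK) -> float:
    """Undo `years` of constant annual growth (growth_rate is a factor such as 1.008)."""
    return pop_2019 / growth_rate ** years


# Values copied from the Iceland row of population.csv (population in thousands).
iceland_2012 = backcast_population(340.566, 1.008248)
print(round(iceland_2012, 2))
```

The result agrees with the `pop2012` value in the file to within rounding of the published growth rate.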

In [0]:
# Reading the csv file
pop = pd.read_csv('https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/population.csv')

In [0]:
pop.head(3)

Unnamed: 0,cca2,name,area,pop2019,GrowthRate,pop2012
0,IS,Iceland,103000,340.566,1.008248,321.536136
1,LU,Luxembourg,2586,596.992,1.011301,551.831596
2,CY,Cyprus,9251,1198.427,1.007856,1134.542567


In [0]:
# Checking for any na values
pop.isna().sum()

cca2          0
name          0
area          0
pop2019       0
GrowthRate    0
pop2012       0
dtype: int64

In [0]:
pop.rename(columns={'name':'Country'},inplace=True)

In [0]:
# Removing columns that will not be used for any analysis
pop.drop(columns = ['cca2', 'area','pop2019'], inplace = True)

In [0]:
# Merging the existing dataset with population dataset
gun_ownership_pop = gun_ownership.merge(pop, how = "inner", on="Country")

In [0]:
gun_ownership_pop.head(2)

Unnamed: 0,Country,Percentage of Homicides by firearm,Number of homicide by firearm,"Homicide by firearm rate per 100,000 pop",Average firearms per 100 people,Average total all civilian firearms,GrowthRate,pop2012
0,Australia,11.5,30.0,0.14,15.0,3050000.0,1.012772,22955.96258
1,Austria,29.5,18.0,0.22,30.4,2500000.0,1.001643,8666.028082
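
An inner merge silently drops any country whose name does not match in both tables. A quick check for such rows is sketched below with made-up frames; `ownership_demo`, `pop_demo`, "Country X" and "Country Y" are hypothetical stand-ins, not part of the datasets.

```python
import pandas as pd

# Toy stand-ins for gun_ownership and pop (made-up rows and values).
ownership_demo = pd.DataFrame({
    "Country": ["Australia", "Austria", "Country X"],
    "Average firearms per 100 people": [15.0, 30.4, 1.0],
})
pop_demo = pd.DataFrame({
    "Country": ["Australia", "Austria", "Country Y"],
    "pop2012": [22955.96, 8666.03, 100.0],
})

# An outer merge with indicator=True labels each row by where it was found.
check = ownership_demo.merge(pop_demo, how="outer", on="Country", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Country", "_merge"]]

print(unmatched)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would have discarded, often because of spelling differences in country names.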


In [0]:
# To create a local copy of the merged file
# gun_ownership_pop.to_csv("gun_ownership_pop.csv")

The merged file was then copied on the local machine for developing visualizations in Tableau. The same has been uploaded [here](https://github.com/strivedi2/Gun-Violence-in-United-States/blob/master/gun_ownership_pop.csv)

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/vox.JPG" style="width=100px;"/>'))

As per chart 6 in the [Vox](https://www.vox.com/policy-and-politics/2017/10/2/16399418/us-gun-violence-statistics-maps-charts) article, we know that developed countries with more guns also have more gun deaths. 

A similar trend was seen for the countries in our merged dataset, as shown in the chart above.



In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/go1.JPG" style="width=100px;"/>'))

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/po1.JPG" style="width=100px;"/>'))

We can see from the above two charts that the US is ranked first for both gun violence and population compared to all other developed countries. A quick table calculation (Percent of Total) was also performed in order to make normalized comparisons.

We then created a set in Tableau consisting of all developed countries except the US, in order to compare US gun violence and population with those of the other developed countries.
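
The Percent of Total calculation and the US versus rest-of-set comparison were done in Tableau. An equivalent pandas sketch is shown below; the numbers and the `firearms` column name are made up purely to illustrate the calculation.

```python
import pandas as pd

# Made-up populations and firearm counts, for illustration only.
demo = pd.DataFrame({
    "Country": ["United States", "Country A", "Country B"],
    "pop2012": [300.0, 500.0, 200.0],
    "firearms": [90.0, 5.0, 5.0],
})

# Mimic the Tableau set: the US on one side, every other country on the other.
demo["group"] = demo["Country"].where(
    demo["Country"] == "United States", "Other developed countries"
)

# Sum each measure per group, then express it as a percent of the column total.
totals = demo.groupby("group")[["pop2012", "firearms"]].sum()
shares_pct = totals / totals.sum() * 100

print(shares_pct.round(1))
```

Putting both measures on a percent-of-total scale makes it possible to compare a country's share of population with its share of firearms directly.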

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/comparison.JPG" style="width=100px;"/>'))

**Key Insight** - A high population may not mean a higher rate of gun violence incidents.

**Summary: Relationship between population and gun ownership in US and other developed countries**

We see that developed countries that are highly populated do not have a higher rate of gun violence (28% approx. cumulative). <br>
Even though the United States' population is approximately 33% when compared to the cumulative population of the other developed countries, its gun ownership rate is significantly higher, leading to a high number of gun violence incidents.



The above visualizations can be found on Tableau Public at this [link](https://public.tableau.com/profile/bharati.malik#!/vizhome/gun_ownership_pop/Story1?publish=yes).

## **Insight 4: Trend analysis of gun violence incidents, suicides, mass shootings and accidental shootings (2013 - 2018)**

In this analysis, we want to explore whether there is any trend by month and weekday in the following areas:

1. Number of incidents 
2. Number of Suicides
3. Number of Mass Shootings
4. Number of Accidental Shootings

We also want to find any relation between the above-mentioned trends.

**1. Most Dangerous Months & Days of gun violence incidents**

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Most_Dangerous_Months_days.PNG"/>'))

The graph above is a heat map showing the trend of gun violence incidents by month and weekday in the US from 2013 to 2018.

**Tableau link for the chart:** [here](https://public.tableau.com/profile/vikita2152#!/vizhome/MostMassShootingsMonthsDays/MostDangerousDays)

**Insight:** From the analysis, it seems the start of the year (winter season) is the most dangerous in terms of gun violence incidents. In the other months, Saturdays and Sundays are the days when most gun violence incidents are committed. 


**Making of the chart  in Tableau**


1. The Date field is placed on Columns, using only its Month part.
2. The Date field is placed on Rows, using only its Weekday part.
3. The sum of the number of gun violence incidents is placed on Color in the Marks card to show the intensity of gun violence incidents.
4. Dark red indicates the highest number of gun violence incidents and light red indicates a lower number.
5. The heat map also shows labels with the exact number of gun violence incidents by month and weekday.
6. The legend shows the intensity of the number of gun violence incidents by color and by number range (low to high).
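
The month by weekday grid behind such a heat map can also be built in pandas. The sketch below counts incidents per weekday and month from a date column; the `date` column name and the four sample dates are assumptions for illustration.

```python
import pandas as pd

# Made-up incident dates; the real data would have one row per incident.
incidents_demo = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-07", "2017-01-08", "2017-07-04"]),
})

weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
month_order = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]

# Rows are weekdays, columns are months, cells are incident counts.
weekday = incidents_demo["date"].dt.day_name().rename("weekday")
month = incidents_demo["date"].dt.month_name().rename("month")
counts = pd.crosstab(weekday, month).reindex(
    index=weekday_order, columns=month_order, fill_value=0
)

print(counts)
```

Reindexing with explicit weekday and month orders keeps the grid in calendar order and fills combinations with no incidents with 0.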





**2. Most Suicidal Months & Days using guns**

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Most_suicidal_months_days.PNG"/>'))

The graph above is a heat map showing the trend of suicides using guns by month and weekday in the US from 2013 to 2018.

**Tableau link for the chart:** [here](https://public.tableau.com/profile/vikita2152#!/vizhome/MostMassShootingsMonthsDays/MostSuicidalMonthsDays)


**Insight:** Generally, we believe that the start of the year is fresh and that people begin it with great enthusiasm. However, this is not the case here. Surprisingly, the analysis shows that people in the US commit more suicides using guns at the start of the year (Jan - March). Suicides are committed less often over weekends and more often during weekdays.


**Making of the chart in Tableau**



1. The Date field is placed on Columns, using only its Month part.
2. The Date field is placed on Rows, using only its Weekday part.
3. The sum of the number of suicides is placed on Color in the Marks card to show the intensity of suicides using guns.
4. Dark red indicates the highest number of suicides using guns and light red indicates a lower number.
5. The heat map also shows labels with the exact number of suicides using guns by month and weekday.
6. The legend shows the intensity of the number of suicides using guns by color and by number range (low to high).


**3. Most Mass Shootings Months & Days**

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Most_massshootings_months_days.PNG"/>'))

The graph above is a heat map showing the trend of mass shootings by month and weekday in the US from 2013 to 2018.

**Tableau link for the chart:** [here](https://public.tableau.com/profile/vikita2152#!/vizhome/MostMassShootingsMonthsDays/MostMassShootingsMonthsDays)


**Insight:** Interestingly, mass shootings do not peak at the start of the year (winter season). They peak during the summer months (June, July and August). One possible explanation, which we have not tested, is that more people travel for vacations around the US during the summer. Also, the maximum number of shootings occurred during weekends, similar to gun violence incidents in the US overall.


**Making of the chart in Tableau**



1. The Date field is placed on Columns, using only its Month part.
2. The Date field is placed on Rows, using only its Weekday part.
3. The sum of the number of mass shootings is placed on Color in the Marks card to show the intensity of mass shootings.
4. Dark red indicates the highest number of mass shootings and light red indicates a lower number.
5. The heat map also shows labels with the exact number of mass shootings by month and weekday.
6. The legend shows the intensity of the number of mass shootings by color and by number range (low to high).


**4. Most Accidental Shootings Months & Days**

In [0]:
# display and HTML are imported from IPython.display (IPython.core.display is deprecated for this)
from IPython.display import HTML, Image, display
display(HTML('<img src="https://raw.githubusercontent.com/strivedi2/Gun-Violence-in-United-States/master/Most_accidenta_shootings_months_days.PNG"/>'))

The graph above is a heat map showing the trend of accidental shootings by month and weekday in the US from 2013 to 2018.

**Insight:** One would generally expect accidental shootings to be scattered, since they are accidental. Interestingly, however, accidental shootings also peak during weekends and in the winter season (Jan - March), with November Sundays as an outlier. 

**Tableau link for the chart:** [here](https://public.tableau.com/profile/vikita2152#!/vizhome/MostMassShootingsMonthsDays/MostAccidentalShootingsMonthsDays)

**Making of the chart in Tableau**



1. The Date field is placed on Columns, using only its Month part.
2. The Date field is placed on Rows, using only its Weekday part.
3. The sum of the number of accidental shootings is placed on Color in the Marks card to show the intensity of accidental shootings.
4. Dark red indicates the highest number of accidental shootings and light red indicates a lower number.
5. The heat map also shows labels with the exact number of accidental shootings by month and weekday.
6. The legend shows the intensity of the number of accidental shootings by color and by number range (low to high).


**Summary of the trend analysis:**

From this trend analysis, we found the following insights:
1. Gun violence incidents are higher in the winter season (Jan - March) and, in general, during weekends.
2. Suicides using guns are also higher in the winter season (Jan - March). However, unlike gun violence incidents overall, they are committed more during weekdays than weekends. 
3. Mass shootings do not peak in the winter season; they occur more during summer and, in general, during weekends.
4. Accidental shootings, for which one would not expect any trend, interestingly peak during the winter season, similar to gun violence incidents overall, with November Sundays as an outlier.


