> This project seeks to identify the states with the highest firearm mortality in the United States, based on the available dataset. The data contains deaths recorded in **2005, 2014, 2015, 2016, 2017 & 2018**. The RATE is the number of deaths per 100,000 total population.
  <a id="top_page"></a>

## Data Source: https://www.cdc.gov/nchs/pressroom/sosmap/firearm_mortality/firearm.htm

In [0]:
# Importing Libraries.
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [103]:
# Loading our data.
Df=pd.read_csv("Firearm Mortality by State.csv")
Df

Unnamed: 0,YEAR,STATE,RATE,DEATHS,URL
0,2018.0,AL,21.8,1064.0,/nchs/pressroom/states/alabama/alabama.htm
1,2018.0,AK,21.0,155.0,/nchs/pressroom/states/alaska/alaska.htm
2,2018.0,AZ,15.3,1147.0,/nchs/pressroom/states/arizona/arizona.htm
3,2018.0,AR,18.9,573.0,/nchs/pressroom/states/arkansas/arkansas.htm
4,2018.0,CA,7.5,3040.0,/nchs/pressroom/states/california/california.htm
...,...,...,...,...,...
296,2005.0,WA,8.8,567.0,/nchs/pressroom/states/washington/washington.htm
297,2005.0,WV,13.8,261.0,/nchs/pressroom/states/westvirginia/westvirgin...
298,2005.0,WI,8.5,474.0,/nchs/pressroom/states/wisconsin/wisconsin.htm
299,2005.0,WY,13.4,71.0,/nchs/pressroom/states/wyoming/wyoming.htm


In [104]:
# Investigating the columns in our dataset
Df.columns

Index(['YEAR', 'STATE', 'RATE', 'DEATHS', 'URL'], dtype='object')

In [0]:
# Filtering out columns that are not useful
Df=Df.drop(columns=["URL"])

In [106]:
Df.dtypes

YEAR      float64
STATE      object
RATE      float64
DEATHS    float64
dtype: object

In [0]:
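# Convert YEAR to a string so it is treated as a categorical label.
# Floats become labels such as "2018.0" and missing values become "nan".
Df["YEAR"] = Df["YEAR"].astype("str")
Df.describe(include="all")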

In [0]:
# Selecting a random sample of 50 rows from our population data
Df.sample(50)

In [108]:
# Calculating the sample mean of DEATHS for a random sample of 50 rows
sample_size=50
Df_sample=Df.sample(sample_size)
sample_mean=Df_sample["DEATHS"].mean()
sample_mean

803.86
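
Because `Df.sample(sample_size)` draws new rows on every run, the sample mean above will differ between executions. The following is a minimal sketch of a repeatable draw, assuming a fixed seed is acceptable for the analysis. The `toy` frame (built from the first five rows displayed earlier) and the seed value `42` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for Df, using the first five rows shown in the output
toy = pd.DataFrame({
    "STATE": ["AL", "AK", "AZ", "AR", "CA"],
    "DEATHS": [1064.0, 155.0, 1147.0, 573.0, 3040.0],
})

# Passing the same random_state returns the same rows on every run
sample_a = toy.sample(3, random_state=42)
sample_b = toy.sample(3, random_state=42)

print(sample_a["DEATHS"].mean())
print(sample_a.equals(sample_b))
```

With a seed in place, the sample mean, standard deviation and confidence limits computed afterwards stay the same across reruns.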

In [109]:
# calculating sample standard deviation
sample_std= Df_sample["DEATHS"].std()
sample_std

842.904212002909

In [110]:
# Calculating the sample standard deviation manually with a for loop.

deaths_list = list(Df_sample["DEATHS"])
degree_of_freedom = sample_size - 1 # divide by (n - 1) for sample data
sum_sq_dev = 0 # running sum of squared deviations from the sample mean
for deaths in deaths_list:
    sum_sq_dev += (deaths - sample_mean)**2
sample_std = math.sqrt(sum_sq_dev/degree_of_freedom) # equivalent to ddof=1, the pandas default
round(sample_std, 2)

842.9
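
The loop above divides by n - 1 rather than n, which is why it agrees with the pandas result. As a small cross-check using only the standard library, the sketch below repeats the calculation on five DEATHS values taken from the first rows displayed earlier (a toy list, not the actual random sample) and compares it with `statistics.stdev`, which also uses the n - 1 denominator.

```python
import math
import statistics

# Toy values standing in for Df_sample["DEATHS"]
values = [1064.0, 155.0, 1147.0, 573.0, 3040.0]

n = len(values)
mean = sum(values) / n

# Sum of squared deviations divided by (n - 1), then the square root
manual_std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

# statistics.stdev uses the same (n - 1) denominator
library_std = statistics.stdev(values)

# statistics.pstdev divides by n instead and is therefore smaller
population_std = statistics.pstdev(values)

print(round(manual_std, 2), round(library_std, 2), round(population_std, 2))
```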

### *Confidence Intervals*
#### We will use the following characteristics of a normal distribution:

#### 68% of values are within 1 standard deviation of the mean, 𝜇±𝜎
#### 95% of values are within 2 standard deviations of the mean, 𝜇±2𝜎
#### 99.7% of values are within 3 standard deviations of the mean, 𝜇±3𝜎 
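
These three percentages can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean is erf(k/√2). The helper name `prob_within` below is introduced only for this illustration.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

Note that two standard deviations cover slightly more than 95%; the multiplier that gives exactly 95% is about 1.96.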

In [111]:
# Calculating the standard error of the mean.
std_err= sample_std/math.sqrt(sample_size) # sample_size(n=50)
std_err

119.20465683959205

In [112]:
# Calculating the 68% Confidence Interval (CI) - one standard error (SE) either side of the sample mean
# Roughly 68% chance that the population mean is within sample_mean (+ or -) 1 * SE

LCL_68 = sample_mean -  std_err
UCL_68 = sample_mean +  std_err

print("Lower confidence limit at 68% confidence level = ", round(LCL_68,2))
print("Upper confidence limit at 68% confidence level = ", round(UCL_68,2))

Lower confidence limit at 68% confidence level =  684.66
Upper confidence limit at 68% confidence level =  923.06


In [113]:
# Calculating the 95% Confidence Interval (CI) - two standard errors (SE) either side of the sample mean
# Roughly 95% chance that the population mean is within sample_mean (+ or -) 2 * SE

LCL_95= sample_mean - 2*std_err
UCL_95 = sample_mean + 2*std_err

print("Lower confidence limit at 95% confidence level = ", round(LCL_95,2))
print("Upper confidence limit at 95% confidence level = ", round(UCL_95,2))

Lower confidence limit at 95% confidence level =  565.45
Upper confidence limit at 95% confidence level =  1042.27


In [114]:
# Calculating the 99.7% Confidence Interval (CI) - three standard errors (SE) either side of the sample mean
# Roughly 99.7% chance that the population mean is within sample_mean (+ or -) 3 * SE

LCL_997 = sample_mean -  3 * std_err
UCL_997 = sample_mean +  3 * std_err
print("Lower confidence limit at 99.7% confidence level = ", round(LCL_997,2))
print("Upper confidence limit at 99.7% confidence level = ", round(UCL_997,2))

Lower confidence limit at 99.7% confidence level =  446.25
Upper confidence limit at 99.7% confidence level =  1161.47
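
The three cells above repeat the same arithmetic with a different multiplier. As a sketch, the calculation can be wrapped in one function; the name `confidence_interval` and its parameters are hypothetical, and the inputs are the sample mean, sample standard deviation and sample size printed earlier.

```python
import math

def confidence_interval(mean, std, n, k):
    """Return (lower, upper) limits: mean minus/plus k standard errors."""
    std_err = std / math.sqrt(n)
    return mean - k * std_err, mean + k * std_err

# Values taken from the outputs above
sample_mean = 803.86
sample_std = 842.904212002909
sample_size = 50

for level, k in (("68%", 1), ("95%", 2), ("99.7%", 3)):
    lcl, ucl = confidence_interval(sample_mean, sample_std, sample_size, k)
    print(f"{level} confidence level: {lcl:.2f} to {ucl:.2f}")
```

The multipliers 1, 2 and 3 follow the rule of thumb used in this notebook. A more exact 95% interval would use about 1.96 standard errors.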


In [115]:
# Calculating the population mean
Df["DEATHS"].mean()

726.73
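
The population mean of 726.73 falls inside all three intervals computed from the sample. To illustrate what a confidence level means in practice, the sketch below repeatedly draws samples of 50 from a synthetic, right-skewed population of 300 values and counts how often the interval of two standard errors contains the population mean. The population is generated with `random.expovariate` purely for illustration and is not the real DEATHS column; the seed, scale and number of trials are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic right-skewed population (NOT the real DEATHS data)
population = [random.expovariate(1 / 700) for _ in range(300)]
pop_mean = statistics.mean(population)

trials = 2000
n = 50
covered = 0
for _ in range(trials):
    # Sampling without replacement, as DataFrame.sample does by default
    sample = random.sample(population, n)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    if mean - 2 * std_err <= pop_mean <= mean + 2 * std_err:
        covered += 1

coverage = covered / trials
print(f"Share of intervals containing the population mean: {coverage:.3f}")
```

The share should land in the neighbourhood of 95%, although skewed data and a finite population can shift it somewhat.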

In [116]:
# States with highest number of deaths.
Df.sort_values("DEATHS", ascending=False)

Unnamed: 0,YEAR,STATE,RATE,DEATHS
42,2018.0,TX,12.2,3522.0
92,2017.0,TX,12.4,3513.0
254,2005.0,CA,9.5,3453.0
142,2016.0,TX,12.1,3353.0
192,2015.0,TX,11.7,3203.0
...,...,...,...,...
288,2005.0,RI,3.6,39.0
38,2018.0,RI,3.3,37.0
238,2014.0,RI,3.0,34.0
260,2005.0,HI,2.1,28.0


In [117]:
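# Investigating the total number of deaths & rates in the respective years.
# Note: RATE summed across states is not itself a rate; it only allows a rough year-to-year comparison.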
Df.groupby('YEAR', as_index = False).agg({'DEATHS':"sum", 'RATE':'sum'})

Unnamed: 0,YEAR,DEATHS,RATE
0,2005.0,30540.0,540.5
1,2014.0,33508.0,572.0
2,2015.0,36132.0,620.3
3,2016.0,38551.0,652.7
4,2017.0,39673.0,677.6
5,2018.0,39615.0,669.4
6,,0.0,0.0


### Remarks:
* From the above we can see a steady increase in the number of deaths due to firearms across the recorded years, from 2005 through 2017.
* There is a slight drop in deaths from 2017 to 2018, by 58 deaths.
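
The change between consecutive recorded years can be computed directly from the yearly totals in the table above. This is a small sketch with the totals copied by hand; keep in mind that the step from 2005 to 2014 spans nine years, while the others span one.

```python
# Yearly DEATHS totals copied from the groupby output above
totals = {2005: 30540, 2014: 33508, 2015: 36132, 2016: 38551, 2017: 39673, 2018: 39615}

years = sorted(totals)
changes = {}
for prev, curr in zip(years, years[1:]):
    diff = totals[curr] - totals[prev]
    pct = 100 * diff / totals[prev]
    changes[(prev, curr)] = diff
    print(f"{prev} -> {curr}: {diff:+d} deaths ({pct:+.1f}%)")
```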

In [118]:
# Investigating the years in our dataset.
Df["YEAR"].unique()

array(['2018.0', '2017.0', '2016.0', '2015.0', '2014.0', '2005.0', 'nan'],
      dtype=object)
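
The `'nan'` entry and the `'2018.0'`-style labels both come from casting a float YEAR column that contains a missing value to strings. One possible cleanup, sketched here on a hypothetical miniature of the dataset, is to drop rows without a year and cast to integers. Whether dropping such rows is appropriate for the full dataset is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset, including one empty row
toy = pd.DataFrame({
    "YEAR": [2018.0, 2018.0, 2005.0, np.nan],
    "STATE": ["AL", "AK", "WY", None],
    "RATE": [21.8, 21.0, 13.4, np.nan],
    "DEATHS": [1064.0, 155.0, 71.0, np.nan],
})

# Drop rows with no year, then store the year as an integer
clean = toy.dropna(subset=["YEAR"]).copy()
clean["YEAR"] = clean["YEAR"].astype(int)

years = sorted(clean["YEAR"].unique().tolist())
print(years)
```

Applied before grouping, a step like this would also remove the empty-year row from the yearly totals.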

In [119]:
# Finding the top 10 states by mean number of deaths across the recorded years.
D = Df.groupby('STATE', as_index=False).agg({"DEATHS":"mean","RATE":"mean"})
D.sort_values("DEATHS", ascending= False,ignore_index=True).head(10)

Unnamed: 0,STATE,DEATHS,RATE
0,TX,3154.833333,11.7
1,CA,3149.666667,7.983333
2,FL,2522.833333,11.9
3,PA,1512.0,11.616667
4,GA,1462.833333,14.333333
5,OH,1398.666667,11.916667
6,NC,1311.5,12.966667
7,IL,1305.5,10.2
8,MI,1168.5,11.683333
9,TN,1114.833333,16.733333


In [120]:
# Finding the 10 states with the highest mean death rates.
D = Df.groupby('STATE', as_index=False).agg({"RATE":"mean"})
D.sort_values("RATE", ascending= False,ignore_index=True).head(10)

Unnamed: 0,STATE,RATE
0,AK,21.483333
1,LA,20.383333
2,AL,19.783333
3,MS,19.7
4,MT,18.483333
5,MO,18.05
6,WY,17.816667
7,AR,17.7
8,NM,17.633333
9,OK,16.75


### **Remarks**:
* Although **TX, CA, FL, PA, GA** recorded the highest number of deaths in the available data, these values do not take the states' populations into consideration.
* Taking the states' populations into consideration, the states with the highest firearm death rates are: **AK, LA, AL, MS, MT..** in that order. This description gives a better representation since it takes into account the population of the different states **(ratio of deaths per 100,000 population)**.
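
To make the difference between counts and rates concrete, here is a minimal sketch that assumes RATE is a simple ratio of deaths to population, as described at the top of this notebook (the published figures may be adjusted differently). The function name `rate_per_100k` and both example states are hypothetical.

```python
def rate_per_100k(deaths, population):
    """Deaths per 100,000 population."""
    return deaths / population * 100_000

# Hypothetical states: A has far more deaths, B has the higher rate
state_a = rate_per_100k(deaths=3000, population=30_000_000)
state_b = rate_per_100k(deaths=150, population=750_000)

print(round(state_a, 1), round(state_b, 1))
# 10.0 20.0
```

A large state can therefore top the ranking by deaths while sitting much lower in the ranking by rate.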

In [121]:
# Finding the 10 states with the lowest mean death rates.
D.sort_values("RATE", ascending= False,ignore_index=True).tail(10)

Unnamed: 0,STATE,RATE
40,NE,8.75
41,IA,8.15
42,CA,7.983333
43,MN,7.433333
44,NJ,5.25
45,CT,5.033333
46,NY,4.316667
47,RI,3.766667
48,MA,3.366667
49,HI,3.216667


### **Remark**:
* States that recorded the fewest deaths relative to their population are: **HI, MA, RI, NY, CT, NJ, ...** in increasing order.