# Exploratory Data Analysis on World Suicide Rates from 1985 to 2016

## Resources and Information Used

* [Dataset](https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016)
* [Cleaning and Prepping Data with Python for Data Science](https://medium.com/@rrfd/cleaning-and-prepping-data-with-python-for-data-science-best-practices-and-helpful-packages-af1edfbe2a3)


The purpose of this project is to aid suicide prevention by looking for patterns and correlations in suicide rates across demographic groups worldwide.

## What questions do I hope to answer?

* Is there a correlation between the suicide rate and generation?
* What is the correlation, if any, between the suicide rate and GDP?
* Are there any countries with significantly higher suicide rates than others?
* How does the suicide rate compare between the male and female populations?
* What is the correlation between the Human Development Index and the suicide rate?
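
As a rough illustration of how questions like these might be approached, the sketch below computes a pooled suicide rate per 100k population for each sex and each generation. The ten rows are copied from the `df.head(10)` preview of `master.csv` shown later in this notebook (Albania, 1987), so the numbers say nothing about worldwide patterns. The helper name `rate_per_100k` is hypothetical and not part of the original analysis.

```python
import pandas as pd

# Ten Albania 1987 rows copied from the notebook's df.head(10) preview.
sample = pd.DataFrame({
    "sex": ["male", "male", "female", "male", "male",
            "female", "female", "female", "male", "female"],
    "generation": ["Generation X", "Silent", "Generation X", "G.I. Generation",
                   "Boomers", "G.I. Generation", "Silent", "Boomers",
                   "G.I. Generation", "Generation X"],
    "suicides_no": [21, 16, 14, 1, 9, 1, 6, 4, 1, 0],
    "population": [312900, 308000, 289700, 21800, 274300,
                   35600, 278800, 257200, 137500, 311000],
})


def rate_per_100k(frame, by):
    """Pooled suicides per 100k population for each group in column `by`.

    Hypothetical helper: sums suicides and population within each group
    first, then divides, instead of averaging the per-row rates.
    """
    totals = frame.groupby(by)[["suicides_no", "population"]].sum()
    return totals["suicides_no"] / totals["population"] * 100000


rates_by_sex = rate_per_100k(sample, "sex")
rates_by_generation = rate_per_100k(sample, "generation")
print(rates_by_sex)
print(rates_by_generation)
```

Pooling the counts before dividing weights each age band by its population, which can give a different answer from taking the mean of the `suicides/100k pop` column.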

In [1]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import csv
import seaborn as sns
import plotly
import numpy as np

### Inspecting/Cleaning the data 

In [2]:
# Load the data into a pandas DataFrame and preview the first 10 rows
df = pd.read_csv('master.csv')
df.head(10)

| | country | year | sex | age | suicides_no | population | suicides/100k pop | country-year | HDI for year | gdp_for_year ($) | gdp_per_capita ($) | generation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Albania | 1987 | male | 15-24 years | 21 | 312900 | 6.71 | Albania1987 | NaN | 2156624900 | 796 | Generation X |
| 1 | Albania | 1987 | male | 35-54 years | 16 | 308000 | 5.19 | Albania1987 | NaN | 2156624900 | 796 | Silent |
| 2 | Albania | 1987 | female | 15-24 years | 14 | 289700 | 4.83 | Albania1987 | NaN | 2156624900 | 796 | Generation X |
| 3 | Albania | 1987 | male | 75+ years | 1 | 21800 | 4.59 | Albania1987 | NaN | 2156624900 | 796 | G.I. Generation |
| 4 | Albania | 1987 | male | 25-34 years | 9 | 274300 | 3.28 | Albania1987 | NaN | 2156624900 | 796 | Boomers |
| 5 | Albania | 1987 | female | 75+ years | 1 | 35600 | 2.81 | Albania1987 | NaN | 2156624900 | 796 | G.I. Generation |
| 6 | Albania | 1987 | female | 35-54 years | 6 | 278800 | 2.15 | Albania1987 | NaN | 2156624900 | 796 | Silent |
| 7 | Albania | 1987 | female | 25-34 years | 4 | 257200 | 1.56 | Albania1987 | NaN | 2156624900 | 796 | Boomers |
| 8 | Albania | 1987 | male | 55-74 years | 1 | 137500 | 0.73 | Albania1987 | NaN | 2156624900 | 796 | G.I. Generation |
| 9 | Albania | 1987 | female | 5-14 years | 0 | 311000 | 0.0 | Albania1987 | NaN | 2156624900 | 796 | Generation X |


In [3]:
# Check column names
col_names = df.columns
print(col_names)

# Get the data type of each column
df.dtypes

Index([u'country', u'year', u'sex', u'age', u'suicides_no', u'population',
       u'suicides/100k pop', u'country-year', u'HDI for year',
       u' gdp_for_year ($) ', u'gdp_per_capita ($)', u'generation'],
      dtype='object')


country                object
year                    int64
sex                    object
age                    object
suicides_no             int64
population              int64
suicides/100k pop     float64
country-year           object
HDI for year          float64
 gdp_for_year ($)      object
gdp_per_capita ($)      int64
generation             object
dtype: object
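
The listing above points to two things that would need cleaning before any GDP analysis: the ` gdp_for_year ($) ` column name carries leading and trailing spaces, and its dtype is `object` rather than a numeric type. The sketch below shows one possible way to handle both. It assumes the values are stored as comma-formatted strings such as `"2,156,624,900"`, which is a guess at why pandas did not parse them as integers. The two-row frame `raw` and the helper `clean_columns` are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-row frame that mimics the column names printed above.
# ASSUMPTION: GDP values are comma-formatted strings in the real file.
raw = pd.DataFrame({
    "country": ["Albania", "Albania"],
    " gdp_for_year ($) ": ["2,156,624,900", "2,156,624,900"],
    "gdp_per_capita ($)": [796, 796],
})


def clean_columns(frame):
    """Return a copy with stripped column names and a numeric GDP column."""
    out = frame.copy()
    # Remove the stray spaces around column names.
    out.columns = out.columns.str.strip()
    # Drop thousands separators, then convert the strings to numbers.
    gdp = out["gdp_for_year ($)"].astype(str).str.replace(",", "", regex=False)
    out["gdp_for_year ($)"] = pd.to_numeric(gdp)
    return out


cleaned = clean_columns(raw)
print(cleaned.dtypes)
```

Working on a copy leaves the originally loaded frame untouched, so the raw values remain available if the comma assumption turns out to be wrong.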

**Note**: The `HDI for year` column contains NaN values because the dataset does not report an HDI for every country and year; the 1987 Albania rows above have none. HDI stands for *Human Development Index*. I will leave those rows in place for now because they are still useful for looking at patterns in earlier decades.
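
To gauge how much of the HDI column is missing before deciding what to do with it, a count of NaN values per column is a reasonable first check. The sketch below uses three rows taken from the Albania 1987 preview. On the full data the same calls would be made on `df`, and the variable names here are illustrative only.

```python
import numpy as np
import pandas as pd

# Three rows from the Albania 1987 preview, where HDI is missing.
sample = pd.DataFrame({
    "country": ["Albania", "Albania", "Albania"],
    "year": [1987, 1987, 1987],
    "suicides/100k pop": [6.71, 5.19, 4.83],
    "HDI for year": [np.nan, np.nan, np.nan],
})

# Number of missing values in each column.
missing_counts = sample.isna().sum()

# Fraction of rows with no HDI value.
missing_share = sample["HDI for year"].isna().mean()

# Rows without an HDI still carry usable rate data, so they are kept.
rates = sample[["country", "year", "suicides/100k pop"]]

print(missing_counts)
print(missing_share)
```

Keeping the rows and excluding them only from HDI-specific calculations preserves the earlier years for the other questions.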
