# The explosion of drug abuse-related suicides in India

I built this quick notebook to look at how the leading causes of suicide in India evolved between 2001 and 2012. The data shows that suicides attributed to drug abuse have skyrocketed. Run the code to see the graphs: I was also testing Bokeh, which writes each graph to an HTML file and opens it in a new tab.

In [44]:
import pandas as pd, numpy as np

In [45]:
df=pd.read_csv('Suicides in India 2001-2012.csv')
df.head(3)

| | State | Year | Type_code | Type | Gender | Age_group | Total |
|---|---|---|---|---|---|---|---|
| 0 | A & N Islands | 2001 | Causes | Illness (Aids/STD) | Female | 0-14 | 0 |
| 1 | A & N Islands | 2001 | Causes | Bankruptcy or Sudden change in Economic | Female | 0-14 | 0 |
| 2 | A & N Islands | 2001 | Causes | Cancellation/Non-Settlement of Marriage | Female | 0-14 | 0 |


In [46]:
df.shape

(237519, 7)

In [47]:
df.Type_code.value_counts()

Causes                  109200
Means_adopted            67200
Professional_Profile     49263
Education_Status          7296
Social_Status             4560
Name: Type_code, dtype: int64

How have the top causes evolved from 2001 to 2012?

In [49]:
#How have causes evolved over the years? Looking at overall data, summed across gender and age group.
causes=df.loc[df['Type_code']=='Causes']
#Keep only the grouping keys and the count, so that 'sum' never touches text columns.
causes_total=causes[['State', 'Year', 'Type', 'Total']]
causes_by_y_state=causes_total.groupby(['State', 'Year', 'Type']).agg('sum')

In [50]:
causes_by_y_state.reset_index(inplace=True)
causes_by_y=causes_by_y_state.groupby(['Year', 'Type'])[['Total']].agg('sum')

In [51]:
causes_by_y.reset_index(inplace=True)
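The two aggregation steps above (first by state, year and cause, then by year and cause) should give the same national totals as a single `groupby` on year and cause. Here is a minimal sketch on a made-up frame; the `toy` data and the names `two_step` and `one_step` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Invented rows that mimic the shape of the 'Causes' subset.
toy = pd.DataFrame({
    'State':  ['A', 'A', 'B', 'B'],
    'Year':   [2001, 2001, 2001, 2002],
    'Type':   ['Poverty', 'Poverty', 'Poverty', 'Poverty'],
    'Gender': ['Female', 'Male', 'Male', 'Male'],
    'Total':  [1, 2, 3, 4],
})

# Same two steps as in the notebook: per state first, then nationwide.
two_step = (toy.groupby(['State', 'Year', 'Type'])[['Total']].sum()
               .reset_index()
               .groupby(['Year', 'Type'])[['Total']].sum()
               .reset_index())

# One step: sum straight to the national level.
one_step = toy.groupby(['Year', 'Type'])[['Total']].sum().reset_index()

print(one_step)
```

The per-state table is still handy if you want to drill down by state later, which is presumably why the intermediate step exists.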

In [52]:
def index_cause_totals(total, cause, year, df):
    #Index a cause's yearly total against its total in the base year.
    #Not super efficient to keep recalculating the index values, but the dataset is 312 rows so my cpu will survive.
    #If the cause has no row for the base year, use the next year, until an index base exists.
    last_year=df['Year'].max()
    while year<=last_year:
        indexer=df.loc[(df['Year']==year) & (df['Type']==cause), 'Total'].values
        if indexer.size:
            return float(total/indexer[0])
        year+=1
    return np.nan #no base value found in any year

In [53]:
causes_by_y['indexed_total']=causes_by_y.apply(lambda x: index_cause_totals(x['Total'], x['Type'], 2001, causes_by_y), axis=1)
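As the comment in the function admits, looking up the base value row by row is wasteful. A vectorized alternative is sketched below, assuming the base year is the earliest year in which each cause appears. That matches the 2001-with-fallback rule only when the data starts in 2001. The `toy` frame and its numbers are invented for illustration.

```python
import pandas as pd

# Toy stand-in for causes_by_y; 'Poverty' deliberately has no 2001 row.
toy = pd.DataFrame({
    'Year':  [2001, 2002, 2003, 2002, 2003],
    'Type':  ['Cancer', 'Cancer', 'Cancer', 'Poverty', 'Poverty'],
    'Total': [100, 150, 200, 40, 60],
})

# Sort by year so that 'first' picks each cause's earliest available total.
toy = toy.sort_values(['Type', 'Year']).reset_index(drop=True)

# Broadcast the base value to every row of the same cause, then divide.
base = toy.groupby('Type')['Total'].transform('first')
toy['indexed_total'] = toy['Total'] / base

print(toy)
```

Each cause starts at 1.0 in its first available year, so the lines in the plot are directly comparable.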

In [54]:
causes_by_y.head(3)

| | Year | Type | Total | indexed_total |
|---|---|---|---|---|
| 0 | 2001 | Bankruptcy or Sudden change in Economic | 2918 | 1.0 |
| 1 | 2001 | Cancellation/Non-Settlement of Marriage | 924 | 1.0 |
| 2 | 2001 | Cancer | 780 | 1.0 |


In [55]:
#Keep the 10 largest causes of suicide, then drop 2 catch-all categories with limited information (8 remain).
#The names say top_5, but they hold this filtered top-10 list.
top_5=causes_by_y.groupby('Type')[['Total']].sum().sort_values(by='Total', ascending=False).iloc[:10]
top_5_list=[x for x in top_5.index if x not in ['Causes Not known', 'Other Causes (Please Specity)']]
top_5_list

['Family Problems',
 'Other Prolonged Illness',
 'Insanity/Mental Illness',
 'Love Affairs',
 'Bankruptcy or Sudden change in Economic',
 'Poverty',
 'Dowry Dispute',
 'Drug Abuse/Addiction']
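The list above has 8 entries rather than 10 because the two catch-all categories are filtered out after the top-10 cut. A small sketch of the same select-then-filter pattern, using `Series.nlargest` on invented totals (the `toy` frame and the cut-off of 3 are assumptions for illustration):

```python
import pandas as pd

# Invented totals; only the category labels come from the dataset.
toy = pd.DataFrame({
    'Type':  ['Family Problems', 'Causes Not known', 'Poverty', 'Cancer', 'Family Problems'],
    'Total': [50, 70, 20, 5, 30],
})
excluded = ['Causes Not known', 'Other Causes (Please Specity)']

# Sum per cause, take the 3 largest, then drop the catch-all categories.
totals = toy.groupby('Type')['Total'].sum()
top_causes = [c for c in totals.nlargest(3).index if c not in excluded]

print(top_causes)
```

Filtering before the cut instead would always return the full number of named causes, if that is what you prefer.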

In [56]:
causes_by_y_top_5=causes_by_y.loc[causes_by_y.Type.isin(top_5_list), ['Year', 'Type', 'indexed_total']]

In [57]:
from bokeh.plotting import figure, output_file, show
from bokeh.charts import Line

In [58]:
p=Line(causes_by_y_top_5, x='Year', y='indexed_total', title='Drug Abuse Suicides Skyrocket',ylabel='Suicides, indexed to 2001 values',
       color='Type')
show(p)
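`Line` with `color='Type'` draws one line per cause straight from the long-format table. As far as I know, `bokeh.charts` was later split out of the core Bokeh library, so with other plotting tools you may need the data in wide format instead, one column per cause. A minimal sketch of that reshaping with pandas, on invented numbers (the `toy` frame and the name `wide` are illustrative):

```python
import pandas as pd

# Invented long-format data with the same columns as causes_by_y_top_5.
toy = pd.DataFrame({
    'Year': [2001, 2002, 2001, 2002],
    'Type': ['Poverty', 'Poverty', 'Cancer', 'Cancer'],
    'indexed_total': [1.0, 1.2, 1.0, 0.9],
})

# One row per year, one column per cause: each column is one line of the chart.
wide = toy.pivot(index='Year', columns='Type', values='indexed_total')

print(wide)
```

Each column of `wide` can then be passed to a plotting call as a separate series against the `Year` index.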

Where is this increase in drug abuse-related suicides coming from? Let's look at age and gender first.

In [60]:
#Get drug abuse data only. Break down by age, gender, and an interaction of the two.
#Selecting [['Total']] keeps 'sum' away from the remaining text columns.
causes_sub=causes[['Year', 'Type', 'Gender', 'Age_group', 'Total']].loc[causes.Type == 'Drug Abuse/Addiction']
drug_by_gender_age=causes_sub.groupby(['Year', 'Gender', 'Age_group'])[['Total']].agg('sum').reset_index()
drug_by_gender=causes_sub.groupby(['Year', 'Gender'])[['Total']].agg('sum').reset_index()
drug_by_age=causes_sub.groupby(['Year', 'Age_group'])[['Total']].agg('sum').reset_index()

In [61]:
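def index_variable_totals(total_value, var_name, var_value, index_year, df):
    #Index a total against the value of the same category (e.g. one gender) in the base year.
    #If the category has no row for the base year, use the next year, until an index base exists.
    last_year=df['Year'].max()
    while index_year<=last_year:
        indexer=df.loc[(df['Year']==index_year) & (df[var_name]==var_value), 'Total'].values
        if indexer.size:
            return float(total_value/indexer[0])
        index_year+=1
    return np.nan #no base value found in any year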

In [62]:
#Loop variable is named frame so that it does not overwrite the original df.
for frame,category in zip([drug_by_gender, drug_by_age], ['Gender', 'Age_group']):
    frame['indexed_total']=frame.apply(lambda x: index_variable_totals(x['Total'],category, x[category], 2001, frame), axis=1)

In [65]:
p=Line(drug_by_gender, x='Year', y='indexed_total', title='Indexed drug abuse suicides by gender',ylabel='Suicides, indexed to 2001 values',
       color='Gender')
show(p)

In [64]:
p=Line(drug_by_age, x='Year', y='indexed_total', title='Indexed drug abuse suicides by age group',ylabel='Suicides, indexed to 2001 values',
       color='Age_group')
show(p)

Drug abuse-related suicides are increasing in every notable age group, and especially among men.
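To put a number on a comparison like this, one option is to divide each group's total in the last year by its total in the first year. The sketch below does that by gender on invented figures. The `toy` frame, its values, and the names `by_gender` and `growth` are assumptions for illustration and are not results from the dataset.

```python
import pandas as pd

# Invented totals in the same shape as drug_by_gender.
toy = pd.DataFrame({
    'Year':   [2001, 2001, 2012, 2012],
    'Gender': ['Female', 'Male', 'Female', 'Male'],
    'Total':  [10, 100, 15, 300],
})

# One row per year, one column per gender.
by_gender = toy.pivot(index='Year', columns='Gender', values='Total')

# Growth factor between the first and last year, per gender.
growth = by_gender.loc[2012] / by_gender.loc[2001]

print(growth)
```

A growth factor is just the last point of the indexed series, so it should agree with what the charts show.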