# National Survey on Drug Use and Health, 2012

[![Binder](http://mybinder.org/badge.svg)](http://mybinder.org:/repo/aliaalaaeldinadly/va_project)


## 0. Data Set Overview

The dataset chosen is the [National Survey on Drug Use and Health 2012](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/34933?q=&paging.rows=25&sortBy=10), which provides measures of the prevalence and correlates of drug use in the United States in 2012. It covers the use of illicit drugs, alcohol, and tobacco among members of the U.S. population aged 12 and older, including residents of college dormitories, group homes, shelters, and rooming houses, as well as civilians living on military installations.


The dataset also includes treatment information and personal/family information such as income and health care coverage, among many other variables. Survey questions include age at first use as well as lifetime, past-year, and past-month use of each drug.


## 0.1 Task Definition

For our task definition, we chose to focus only on the classes below, since the dataset contains a large amount of information across many branches. We are mainly interested in which drugs people are most likely to become addicted to and, most importantly, at which age range. The dataset also appears large enough for each class to be analyzed separately before analyzing the correlations between classes.

* [Tobacco](Tabacco.ipynb)
* [Alcohol](Alcohol.ipynb)
* [Cocaine and Crack](Cocaine%20and%20Crack.ipynb)
* [Marijuana](Marijuana.ipynb)
* [Heroin](Heroin.ipynb)
* [Hallucinogens](Hallucinogens.ipynb)
* [Inhalants](Inhalants.ipynb)
* [Pain Relievers](Pain%20Relievers.ipynb)
* [Tranquilizers](Tranquilizers.ipynb)
* [Stimulants](Stimulants.ipynb)
* [Sedatives](Sedatives.ipynb)

### 0.1.1 Approach

For every class listed above, the dataset will be visually analyzed to extract the following information.

Questions for each drug class:

* How many cases use the drug?
* What is their age range?
* How frequently do they use it?
* Did they stop and later go back to it?
* Are they planning to quit?
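As a rough illustration of how the first two questions might be answered for one class, the sketch below counts users and buckets their age at first use. It is a minimal sketch on an invented toy frame: the column names `ever_used` and `age_first_use`, the coding (1 = yes, 2 = no), and the age bins are all assumptions, not values taken from the NSDUH codebook.

```python
import pandas as pd

# Invented toy data. Assumed coding: ever_used 1 = yes, 2 = no;
# age_first_use in years, missing (NaN) for non-users.
demo = pd.DataFrame({
    'ever_used':     [1, 2, 1, 1, 2, 1],
    'age_first_use': [14, None, 19, 27, None, 16],
})

# "How many cases use the drug?"
users = demo[demo['ever_used'] == 1]
print('users:', len(users))

# "What is their age range?" Bin edges are chosen for illustration only;
# pd.cut uses right-closed intervals, e.g. (11, 17] -> '12-17'.
bins = [11, 17, 25, 34, 120]
labels = ['12-17', '18-25', '26-34', '35+']
age_range = pd.cut(users['age_first_use'], bins=bins, labels=labels)
print(age_range.value_counts(sort=False))
```

Because `age_range` is categorical, `value_counts(sort=False)` lists every bracket in order, including empty ones, which keeps plots across drug classes aligned.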

### 0.1.2 Results

In [Results](Results.ipynb), the correlations between classes will be visually analyzed to extract the following information.

* Which drugs are used most frequently?
* Which age ranges use which drugs?
* Did they quit a specific drug and start another? If so, which?
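One possible way to approach the cross-class questions is to compute, per age group, the share of respondents who have used each drug. This sketch assumes the per-class subsets have already been reduced to hypothetical boolean "ever used" flags plus an `age_group` label; the column names and all values below are made up.

```python
import pandas as pd

# Hypothetical per-respondent flags (True = has used the drug) and age group.
demo = pd.DataFrame({
    'age_group': ['12-17', '12-17', '18-25', '18-25', '26+', '26+'],
    'alcohol':   [True, False, True, True, True, True],
    'marijuana': [False, False, True, True, False, True],
    'tobacco':   [False, True, True, False, True, True],
})
drugs = ['alcohol', 'marijuana', 'tobacco']

# "Which age ranges use which drugs?": share of each group that used each drug
# (the mean of a boolean column is the fraction of True values).
share = demo.groupby('age_group')[drugs].mean()
print(share)

# "Which drugs are used most frequently?": count of users per drug.
totals = demo[drugs].sum()
print(totals.idxmax())
```

A table like `share` maps directly onto a heatmap (for example with seaborn's `heatmap`), which fits the visual analysis described above.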


```python
# Disable some noisy FutureWarnings
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

# Render figures inline in the notebook instead of in a new window
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np
import pandas as pd
```

# 1. Data Preprocessing
We start by loading and wrangling the dataset, which includes handling missing values and formatting the data for later use.


As shown below, the dataset contains 55,268 cases (survey respondents) and 3,120 variables.

```python
# The data is a TSV file, so read it with read_csv using a tab as the separator
dataset = pd.read_csv('data/drugs-dataset.tsv', sep='\t')
```

```python
dataset.shape
```

```
(55268, 3120)
```
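With 3,120 columns, loading only the variables needed for one class can save memory. A minimal sketch, assuming the wanted column names are known in advance: `read_csv`'s `usecols` accepts a callable that keeps a column when it returns `True`. The in-memory TSV stands in for `data/drugs-dataset.tsv`; its `CASEID` column and all values are invented.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real TSV file (column CASEID and values invented).
tsv = io.StringIO(
    "CASEID\tCIGEVER\tALCEVER\tMJEVER\n"
    "1\t1\t1\t2\n"
    "2\t2\t1\t2\n"
)

# Keep only the columns we need; the callable is evaluated per column name.
wanted = {'CASEID', 'CIGEVER', 'ALCEVER'}
subset = pd.read_csv(tsv, sep='\t', usecols=lambda col: col in wanted)
print(subset.shape)
print(list(subset.columns))
```

Columns come back in file order, not in the order of `wanted`, since a set has no order.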

## 1.1 Data Formatting

The dataset will be divided into 12 subsets, one per drug class (cocaine and crack are kept as separate subsets), and each class will be discussed separately. Missing data are encoded in the dataset as special values such as 9985; these values will be excluded from the analysis depending on the information in focus.
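A minimal sketch of excluding such special values, assuming they can be treated as missing for the variable at hand: `DataFrame.mask` combined with `isin` replaces every listed code with `NaN`, so pandas statistics such as `mean` skip them. Only 9985 comes from the text; the helper name `mask_special_codes`, the column names, and the values are hypothetical, and the real codes for each variable should be taken from the NSDUH 2012 codebook.

```python
import pandas as pd

# Only 9985 is taken from the text; extend this list from the codebook.
SPECIAL_CODES = [9985]


def mask_special_codes(df, codes=SPECIAL_CODES):
    """Return a copy of df where every value in `codes` is replaced by NaN."""
    return df.mask(df.isin(codes))


# Invented toy data: one respondent has the special code instead of an age.
demo = pd.DataFrame({'ever_used': [1, 1, 1], 'age_first_use': [15, 9985, 17]})
clean = mask_special_codes(demo)
print(clean)
print(clean['age_first_use'].mean())  # the masked value is skipped
```

Masking returns a new frame, so the raw codes remain available in `demo` if a later analysis needs to distinguish, say, "never used" from "refused".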

```python
# .ix was removed in pandas 1.0: use .iloc for positions and .loc for labels.
# Note that .loc label slices include both the start and the end column.
case_quest_data = dataset.iloc[:, 0:2].copy()  # first two columns, by position
tabacco = dataset.loc[:, 'CIGEVER':'PIPE30DY'].copy()  # tobacco dataset
alcohol = dataset.loc[:, 'ALCEVER':'DR5DAY'].copy()  # alcohol dataset
cocaine = dataset.loc[:, 'COCEVER':'CC30EST'].copy()  # cocaine dataset
crack = dataset.loc[:, 'CRKEVER':'CR30EST'].copy()  # crack dataset
marijuana = dataset.loc[:, 'MJEVER':'MR30EST'].copy()  # marijuana dataset
heroin = dataset.loc[:, 'HEREVER':'HR30EST'].copy()  # heroin dataset
hallucinogens = dataset.loc[:, 'LSD':'ECSREC'].copy()  # hallucinogens dataset
inhalants = dataset.loc[:, 'AMYLNIT':'IN30EST'].copy()  # inhalants dataset
pain_relievers = dataset.loc[:, 'DARVTYLC':'OXDAYPWK'].copy()  # pain relievers dataset
tranquilizers = dataset.loc[:, 'KLONOPIN':'TRDAYPWK'].copy()  # tranquilizers dataset
stimulants = dataset.loc[:, 'METHDES':'MTDAYPWK'].copy()  # stimulants dataset
sedatives = dataset.loc[:, 'METHAQ':'SVDAYPWK'].copy()  # sedatives dataset
```
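The same split could also be driven by a dictionary of column ranges, which keeps each class's boundaries in one place. This is a hypothetical refactoring, not the notebook's own code: `COLUMN_RANGES` and `split_by_class` are illustrative names, only two of the ranges from the cell above are included, and the toy column `CIG_OTHER` between the endpoints is invented. As with the cell above, `.loc` label slices include both endpoints.

```python
import pandas as pd

# Hypothetical mapping: class name -> (first column, last column).
# The two ranges are copied from the cell above.
COLUMN_RANGES = {
    'tabacco': ('CIGEVER', 'PIPE30DY'),
    'alcohol': ('ALCEVER', 'DR5DAY'),
}


def split_by_class(df, ranges):
    """Return {class name: copy of df restricted to that class's columns}.

    .loc label slicing includes both the first and the last column.
    """
    return {name: df.loc[:, first:last].copy()
            for name, (first, last) in ranges.items()}


# Toy one-row frame; CIG_OTHER is an invented column between the endpoints.
demo = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=['CIGEVER', 'CIG_OTHER', 'PIPE30DY', 'ALCEVER', 'DR5DAY'],
)
parts = split_by_class(demo, COLUMN_RANGES)
for name, frame in parts.items():
    print(name, list(frame.columns))
```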

## Next

[Tobacco](Tabacco.ipynb)