# National Survey on Drug Use and Health, 2012

The dataset chosen is the [National Survey on Drug Use and Health 2012](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/34933?q=&paging.rows=25&sortBy=10), which provides measures of the prevalence and correlates of drug use in the United States in 2012. It covers the use of illicit drugs, alcohol, and tobacco among members of the United States population aged 12 and older, including residents of college dormitories, group homes, shelters, and rooming houses, as well as civilians dwelling on military installations.


The data set also includes treatment information and personal and family information such as income and health care coverage, among much else. Because the data set covers so many topics, we restrict our analysis to the drug classes listed below. For each class, the questions include age at first use as well as lifetime, annual, and past-month usage:

* [Tobacco](Tabacco.ipynb)
* [Alcohol](Alcohol.ipynb)
* [Cocaine and Crack](Cocaine and Crack.ipynb)
* [Marijuana](Marijuana.ipynb)
* [Heroin](Heroin.ipynb)
* [Hallucinogens](Hallucinogens.ipynb)
* [Inhalants](Inhalants.ipynb)
* [Pain Relievers](Pain Relievers.ipynb)
* [Tranquilizers](Tranquilizers.ipynb)
* [Stimulants](Stimulants.ipynb)
* [Sedatives](Sedatives.ipynb)

# Data Formatting

As can be seen below, the data set contains 55,268 cases (survey respondents), each described by 3,120 variables.

The data set will be divided into the 12 classes to be analyzed (cocaine and crack, listed together above, are extracted as two separate subsets). Each class will be discussed separately, as some are further divided into sub-classes.

In [3]:
#disable some annoying warnings
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

#plots the figures in place instead of a new window
%matplotlib inline

import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np
import pandas as pd

In [4]:
#the data is a TSV file, so read it with read_csv using \t as the separator
dataset = pd.read_csv('data/drugs-dataset.tsv', delimiter = '\t')
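
With several thousand columns, it can be convenient to load only the variables needed for one class. The following is a minimal sketch of that idea, not part of the original analysis: it reads a made-up, in-memory tab-separated excerpt (the values are hypothetical) and uses the `usecols` parameter of `pd.read_csv` to keep a subset of columns.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt in tab-separated form; the values are made up.
sample_tsv = "CASEID\tCIGEVER\tALCEVER\n1\t1\t1\n2\t2\t1\n3\t2\t2\n"

# sep='\t' tells read_csv that fields are tab-separated;
# usecols restricts parsing to the listed columns.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t", usecols=["CASEID", "CIGEVER"])

print(sample.shape)
print(list(sample.columns))
```

The same call pattern should work on a file path in place of the `io.StringIO` object.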

In [5]:
dataset.shape

(55268, 3120)

In [8]:
dataset.isnull().sum()

CASEID      0
QUESTID2    0
CIGEVER     0
CIGOFRSM    0
CIGWILYR    0
CIGTRY      0
CIGYFU      0
CIGMFU      0
CIGREC      0
CIG30USE    0
CG30EST     0
CIG30AV     0
CIG30BR2    0
CIG30TPE    0
CIG30MEN    0
CIG30MLN    0
CIG30RO2    0
CIGDLYMO    0
CIGAGE      0
CIGDLYFU    0
CIGDLMFU    0
CIG100LF    0
SNFEVER     0
SNUFTRY     0
SNUFYFU     0
SNUFMFU     0
SNFREC      0
SNF30USE    0
SN30EST     0
SNF30BR2    0
           ..
WRKUNWKS    0
WRKLSTY2    0
WRKIDSY2    0
WRKOCUY2    0
WRKBZCY2    0
WORKDAYS    0
WORKBLAH    0
LOCSIZE     0
DRGPLCY     0
PLCYCOV     0
WKDRGED     0
DRGPRGM     0
USALCTST    0
USDRGTST    0
TSTHIRE     0
TSTRAND     0
FIRSTPOS    0
WRKHIRE     0
WORKRAND    0
EMPSTATY    0
IIEMPSTY    0
II2EMSTY    0
EMPSTAT4    0
IIEMPST4    0
II2EMST4    0
PDEN00      0
COUTYP2     0
ANALWT_C    0
VESTR       0
VEREP       0
dtype: int64
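
The printed summary above is truncated (the `..` row), so it shows only a fraction of the columns. A minimal sketch of how the per-column counts could be collapsed into a single total and a list of affected columns is given here on a toy frame; the column names are borrowed from the survey, but the values are made up. Note that a count of zero only means there are no `NaN` cells; survey files often encode non-responses as numeric codes, which is an assumption to check against the codebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are made up for illustration.
toy = pd.DataFrame({
    "CIGEVER": [1, 2, 2],
    "ALCEVER": [1, np.nan, 2],
    "MJEVER": [2, 2, 1],
})

null_counts = toy.isnull().sum()              # NaN count per column
total_missing = int(null_counts.sum())        # NaN count over the whole frame
columns_with_nulls = null_counts[null_counts > 0].index.tolist()

print(total_missing)
print(columns_with_nulls)
```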

In [7]:
# first two columns (CASEID, QUESTID2), selected by position
case_quest_data = dataset.iloc[:, 0:2].copy()
# each drug class is an inclusive, label-based range of columns
tabacco = dataset.loc[:, 'CIGEVER':'PIPE30DY'].copy() # tobacco dataset
alcohol = dataset.loc[:, 'ALCEVER':'DR5DAY'].copy() # alcohol dataset
cocaine = dataset.loc[:, 'COCEVER':'CC30EST'].copy() # cocaine dataset
crack = dataset.loc[:, 'CRKEVER':'CR30EST'].copy() # crack dataset
marijuana = dataset.loc[:, 'MJEVER':'MR30EST'].copy() # marijuana dataset
heroin = dataset.loc[:, 'HEREVER':'HR30EST'].copy() # heroin dataset
hallucinogens = dataset.loc[:, 'LSD':'ECSREC'].copy() # hallucinogens dataset
inhalants = dataset.loc[:, 'AMYLNIT':'IN30EST'].copy() # inhalants dataset
pain_relievers = dataset.loc[:, 'DARVTYLC':'OXDAYPWK'].copy() # pain_relievers dataset
tranquilizers = dataset.loc[:, 'KLONOPIN':'TRDAYPWK'].copy() # tranquilizers dataset
stimulants = dataset.loc[:, 'METHDES':'MTDAYPWK'].copy() # stimulants dataset
sedatives = dataset.loc[:, 'METHAQ':'SVDAYPWK'].copy() # sedatives dataset
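
The subsets above are taken as column ranges, and it matters that a label-based range includes both endpoints while a positional range excludes its end. The sketch below illustrates this on a small toy frame; the column order and all values are hypothetical, and it also shows that `.copy()` keeps the subset independent of the parent frame.

```python
import pandas as pd

# Hypothetical toy frame; column order and values are made up.
toy = pd.DataFrame(
    [[1, 10, 1, 2, 2, 1], [2, 11, 2, 1, 1, 2]],
    columns=["CASEID", "QUESTID2", "CIGEVER", "CIGTRY", "ALCEVER", "ALCTRY"],
)

ids = toy.iloc[:, 0:2].copy()                # positional slice: end position excluded
cig = toy.loc[:, "CIGEVER":"CIGTRY"].copy()  # label slice: both endpoints included

# Writing to the copy leaves the parent frame untouched.
cig.iloc[0, 0] = 99

print(list(ids.columns))
print(list(cig.columns))
print(toy.loc[0, "CIGEVER"])
```

Because the ranges depend on the column order in the file, a reordered file would silently change what each subset contains, so the endpoints are worth checking against the codebook.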

## Next

[Tobacco](Tabacco.ipynb)