# Descriptive Statistics and Data Visualization
This notebook covers exploratory analysis of 430 features<br>
drawn from 9 different categories of clinical data.<br>
The analysis has three steps:<br>
1. Extract descriptive statistics for each feature
2. Visualize the distribution of each feature
3. Briefly summarize the findings
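
As a rough illustration of step 1, the sketch below summarizes a small, made-up DataFrame. The values are hypothetical and only the column naming scheme mirrors the dataset; the notebook's own summaries may be computed differently (for example through the `helper` module).

```python
import pandas as pd

# Hypothetical toy data: the values are invented for illustration,
# only the column names follow the dataset's naming scheme.
toy = pd.DataFrame({
    "test_cocaine_0": [0.0, 1.0, 1.0, 0.0],        # binary drug test result
    "survey_cannabis_0": [0.0, 3.0, 12.0, 1.0],    # self-reported use count
    "mdh_asthma": ["no_history", "yes_history", "no_history", "no_history"],
})

# Numeric and binary features: count, mean, std, min, quartiles, max.
numeric_summary = toy.select_dtypes(include="number").describe().T

# Categorical features: share of each level.
categorical_share = toy["mdh_asthma"].value_counts(normalize=True)

print(numeric_summary[["mean", "std", "min", "max"]])
print(categorical_share)
```

For a binary feature the mean is simply the share of positive results, which makes `describe()` a quick first check before plotting.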

<br>

| Table Name | Variable Prefix | Columns | DType | Description |
| :--- | :--- | :--- | :--- | :--- |
| Research Session Attendance | rsa_ | 25 | Binary | Records attendance for each week of treatment |
| Demographics | dem_ | 10 | Categorical | Sex, ethnicity, race |
| Urine Drug Screen | test_ | 225 | Binary | Drug test for 8 different drug classes, taken weekly for 24 weeks |
| DSM-IV Diagnosis | dsm_ | 6 | Categorical | Tracks clinical diagnosis for substance use disorder, in accordance with DSM guidelines |
| Medical and Psychiatric History | mdh_ | 18 | Categorical | Tracks medical and psychiatric history for 18 different conditions |
| Physical Exam | pex_ | 12 | Categorical | Tracks the appearance and condition of patients for 12 different physical observations |
| Timeline Follow Back Survey | survey_ | 70 | Numeric | Self-reported drug use surveys, collected every 4 weeks; records the total number of instances of drug use over the previous 30 days |
| Dose Record | meds_ | 50 | Numeric | Records the medication, average weekly dose and week of treatment |
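
Because each table contributes columns under a shared prefix, a quick way to check how many features each table supplies is to count columns by prefix. The following is a minimal sketch using only the standard library. The column list is a hand-picked sample of names from this dataset, and `prefix_of` is a hypothetical helper, not part of the notebook's `helper` module.

```python
import re
from collections import Counter

# Small sample of column names; in the notebook this would be
# data.columns.tolist().
columns = [
    "test_oxycodone_0", "test_cocaine_0", "survey_cannabis_0",
    "mdh_asthma", "mdh_epilepsy", "rbs_speedball", "gender", "dropout",
]

def prefix_of(name: str) -> str:
    """Return the leading 'xxx_' prefix, or a placeholder if there is none."""
    match = re.match(r"([a-z]+)_", name)
    return match.group(1) + "_" if match else "(no prefix)"

# Count how many columns fall under each prefix.
counts = Counter(prefix_of(c) for c in columns)
for prefix, n in sorted(counts.items()):
    print(f"{prefix:12s} {n}")
```

Columns without an underscore, such as `gender` and `dropout`, fall into the placeholder group, which makes unprefixed features easy to spot.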


### Import Required Libraries

In [2]:
import pandas as pd # data manipulation
import numpy as np # numerical computation
import matplotlib.pyplot as plt # visualization
import seaborn as sns # enhanced visualization
import warnings # warning control
import helper # custom data transformation functions
from IPython.display import display, Markdown # display and markdown conversion
import re # regular expressions
warnings.filterwarnings('ignore') # suppress all warnings in notebook output

# Load the data
data = pd.read_csv('../data/59_features.csv')

# Display the DataFrame (pandas truncates the output to the first and last rows)
data

Unnamed: 0,test_oxycodone_0,test_cocaine_0,test_methamphetamine_0,test_opiate300_0,test_oxycodone_1,test_cocaine_1,test_methamphetamine_1,test_opiate300_1,test_oxycodone_2,test_cocaine_2,...,mdh_gi_problems,mdh_thyroid_problems,mdh_heart_condition,mdh_asthma,mdh_hypertension,mdh_skin_disease,mdh_head_injury,mdh_opi_withdrawal,mdh_epilepsy,dropout
0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,yes_history,no_history,no_history,no_history,no_history,no_history,no_history,yes_history,no_history,0.0
1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,...,no_history,no_history,no_history,no_history,no_history,yes_history,no_history,no_history,no_history,0.0
2,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,...,no_history,no_history,no_history,no_history,no_history,yes_history,no_history,yes_history,no_history,0.0
3,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,...,no_history,no_history,no_history,no_history,no_history,no_history,no_history,yes_history,no_history,0.0
4,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,yes_history,no_history,no_history,no_history,yes_history,no_history,no_history,yes_history,no_history,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1264,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,no_history,no_history,no_history,yes_history,no_history,no_history,yes_history,no_history,no_history,1.0
1265,0.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,no_history,no_history,no_history,no_history,no_history,no_history,no_history,no_history,no_history,1.0
1266,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,no_history,no_history,no_history,yes_history,yes_history,no_history,no_history,yes_history,no_history,1.0
1267,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,no_history,no_history,no_history,no_history,yes_history,yes_history,no_history,yes_history,no_history,1.0


In [3]:
data.columns.tolist()

['test_oxycodone_0',
 'test_cocaine_0',
 'test_methamphetamine_0',
 'test_opiate300_0',
 'test_oxycodone_1',
 'test_cocaine_1',
 'test_methamphetamine_1',
 'test_opiate300_1',
 'test_oxycodone_2',
 'test_cocaine_2',
 'test_methamphetamine_2',
 'test_opiate300_2',
 'test_oxycodone_3',
 'test_cocaine_3',
 'test_methamphetamine_3',
 'test_opiate300_3',
 'test_oxycodone_4',
 'test_cocaine_4',
 'test_methamphetamine_4',
 'test_opiate300_4',
 'survey_cannabis_0',
 'survey_cocaine_0',
 'survey_oxycodone_0',
 'survey_methamphetamine_0',
 'survey_opiates_0',
 'survey_cannabis_4',
 'survey_cocaine_4',
 'survey_oxycodone_4',
 'survey_methamphetamine_4',
 'survey_opiates_4',
 'medication',
 'cows_predose',
 'cows_postdose',
 'rbs_n_sexual_activity',
 'rbs_n_heroin_injection',
 'rbs_heroin_daily_injection',
 'rbs_heroin_non_iv_use',
 'rbs_speedball',
 'rbs_other_opiates',
 'gender',
 'mdh_liver_problems',
 'mdh_kidney_problems',
 'mdh_alc_withdrawal',
 'mdh_schizophrenia',
 'mdh_major_depressive_di