
In [22]:
#import libraries
import pandas as pd
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt
import re
import numpy as np
%matplotlib inline

# Introduction

In this notebook, hyperparameters for the **depressants models** are optimized using GridSearchCV. GridSearchCV performs an exhaustive search over every combination of the parameter values defined in the parameter grids in the Testing Hyperparameters section.

The best parameter values found for each model are output and then used to recalibrate the models from the previous notebook, giving the final models.


# Data Preprocessing (Data Transformations)

In this first section, the dataset is prepared for modelling through a series of data transformations.

Eighteen of the nineteen drug outcome variables are collapsed into three new outcome variables representing broader classes of drugs: _**Stimulants, Depressants, and Hallucinogens**_ (SEMER is not assigned to any group).

Additionally, the seven levels of drug use (CL0 to CL6) are collapsed into three levels, coded as in the recoding function further down: _**0 - unlikely to use, 2 - medium use, 3 - high use**_.


In [23]:
#read in dataset
df = pd.read_csv("../drug_consumption_cap_20230505.csv")
#show df
df

Unnamed: 0,ID,Age,Gender,Education,Country,Ethnicity,NEO_N,NEO_E,NEO_O,NEO_A,...,ECST,HEROIN,KETA,LEGALH,LSD,METH,MUSHRM,NICO,SEMER,VSA
0,1,0.49788,0.48246,-0.05921,0.96082,0.12600,0.31287,-0.57545,-0.58331,-0.91699,...,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL2,CL0,CL0
1,2,-0.07854,-0.48246,1.98437,0.96082,-0.31685,-0.67825,1.93886,1.43533,0.76096,...,CL4,CL0,CL2,CL0,CL2,CL3,CL0,CL4,CL0,CL0
2,3,0.49788,-0.48246,-0.05921,0.96082,-0.31685,-0.46725,0.80523,-0.84732,-1.62090,...,CL0,CL0,CL0,CL0,CL0,CL0,CL1,CL0,CL0,CL0
3,4,-0.95197,0.48246,1.16365,0.96082,-0.31685,-0.14882,-0.80615,-0.01928,0.59042,...,CL0,CL0,CL2,CL0,CL0,CL0,CL0,CL2,CL0,CL0
4,5,0.49788,0.48246,1.98437,0.96082,-0.31685,0.73545,-1.63340,-0.45174,-0.30172,...,CL1,CL0,CL0,CL1,CL0,CL0,CL2,CL2,CL0,CL0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1880,1884,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,-1.19430,1.74091,1.88511,0.76096,...,CL0,CL0,CL0,CL3,CL3,CL0,CL0,CL0,CL0,CL5
1881,1885,-0.95197,-0.48246,-0.61113,-0.57009,-0.31685,-0.24649,1.74091,0.58331,0.76096,...,CL2,CL0,CL0,CL3,CL5,CL4,CL4,CL5,CL0,CL0
1882,1886,-0.07854,0.48246,0.45468,-0.57009,-0.31685,1.13281,-1.37639,-1.27553,-1.77200,...,CL4,CL0,CL2,CL0,CL2,CL0,CL2,CL6,CL0,CL0
1883,1887,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,0.91093,-1.92173,0.29338,-1.62090,...,CL3,CL0,CL0,CL3,CL3,CL0,CL3,CL4,CL0,CL0


## Drug Outcome variable transformations

### Remove string and change to integer

In [24]:
#select only the drug variable columns
df.iloc[:,13:]

Unnamed: 0,ALC,AMPHET,AMYL,BENZOS,CAFF,CANNABIS,CHOC,COCAINE,CRACK,ECST,HEROIN,KETA,LEGALH,LSD,METH,MUSHRM,NICO,SEMER,VSA
0,CL5,CL2,CL0,CL2,CL6,CL0,CL5,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL2,CL0,CL0
1,CL5,CL2,CL2,CL0,CL6,CL4,CL6,CL3,CL0,CL4,CL0,CL2,CL0,CL2,CL3,CL0,CL4,CL0,CL0
2,CL6,CL0,CL0,CL0,CL6,CL3,CL4,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL0,CL1,CL0,CL0,CL0
3,CL4,CL0,CL0,CL3,CL5,CL2,CL4,CL2,CL0,CL0,CL0,CL2,CL0,CL0,CL0,CL0,CL2,CL0,CL0
4,CL4,CL1,CL1,CL0,CL6,CL3,CL6,CL0,CL0,CL1,CL0,CL0,CL1,CL0,CL0,CL2,CL2,CL0,CL0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1880,CL5,CL0,CL0,CL0,CL4,CL5,CL4,CL0,CL0,CL0,CL0,CL0,CL3,CL3,CL0,CL0,CL0,CL0,CL5
1881,CL5,CL0,CL0,CL0,CL5,CL3,CL4,CL0,CL0,CL2,CL0,CL0,CL3,CL5,CL4,CL4,CL5,CL0,CL0
1882,CL4,CL6,CL5,CL5,CL6,CL6,CL6,CL4,CL0,CL4,CL0,CL2,CL0,CL2,CL0,CL2,CL6,CL0,CL0
1883,CL5,CL0,CL0,CL0,CL6,CL6,CL5,CL0,CL0,CL3,CL0,CL0,CL3,CL3,CL0,CL3,CL4,CL0,CL0


In [25]:
#remove 'CL' prefix
df.iloc[:,13:] = df.iloc[:,13:].applymap(lambda x: re.sub('CL','',x))
df.iloc[:,13:]

Unnamed: 0,ALC,AMPHET,AMYL,BENZOS,CAFF,CANNABIS,CHOC,COCAINE,CRACK,ECST,HEROIN,KETA,LEGALH,LSD,METH,MUSHRM,NICO,SEMER,VSA
0,5,2,0,2,6,0,5,0,0,0,0,0,0,0,0,0,2,0,0
1,5,2,2,0,6,4,6,3,0,4,0,2,0,2,3,0,4,0,0
2,6,0,0,0,6,3,4,0,0,0,0,0,0,0,0,1,0,0,0
3,4,0,0,3,5,2,4,2,0,0,0,2,0,0,0,0,2,0,0
4,4,1,1,0,6,3,6,0,0,1,0,0,1,0,0,2,2,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1880,5,0,0,0,4,5,4,0,0,0,0,0,3,3,0,0,0,0,5
1881,5,0,0,0,5,3,4,0,0,2,0,0,3,5,4,4,5,0,0
1882,4,6,5,5,6,6,6,4,0,4,0,2,0,2,0,2,6,0,0
1883,5,0,0,0,6,6,5,0,0,3,0,0,3,3,0,3,4,0,0


In [26]:
#recode as integer field type
df.iloc[:,13:] = df.iloc[:,13:].apply(lambda x: x.astype(int))
#check for field type of outcomes (should be integers)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1885 entries, 0 to 1884
Data columns (total 32 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   ID         1885 non-null   int64  
 1   Age        1885 non-null   float64
 2   Gender     1885 non-null   float64
 3   Education  1885 non-null   float64
 4   Country    1885 non-null   float64
 5   Ethnicity  1885 non-null   float64
 6   NEO_N      1885 non-null   float64
 7   NEO_E      1885 non-null   float64
 8   NEO_O      1885 non-null   float64
 9   NEO_A      1885 non-null   float64
 10  NEO_C      1885 non-null   float64
 11  IMP        1885 non-null   float64
 12  SS         1885 non-null   float64
 13  ALC        1885 non-null   int32  
 14  AMPHET     1885 non-null   int32  
 15  AMYL       1885 non-null   int32  
 16  BENZOS     1885 non-null   int32  
 17  CAFF       1885 non-null   int32  
 18  CANNABIS   1885 non-null   int32  
 19  CHOC       1885 non-null   int32  
 20  COCAINE 

### Create 3 broader outcome variables (*Stimulants, Depressants and Hallucinogens*)

In [27]:
#testing function to group drugs to create a new outcome variable
def create_drug_test(row):      
    return max(row['ALC'],row['AMPHET'],row['AMYL'],\
              row['BENZOS'],row['CANNABIS'])

In [28]:
#test on first three rows - before and after
display(df.iloc[:3,13:])

#selection from row
display(df.iloc[:3,13:].apply(lambda x: create_drug_test(x), axis=1))

Unnamed: 0,ALC,AMPHET,AMYL,BENZOS,CAFF,CANNABIS,CHOC,COCAINE,CRACK,ECST,HEROIN,KETA,LEGALH,LSD,METH,MUSHRM,NICO,SEMER,VSA
0,5,2,0,2,6,0,5,0,0,0,0,0,0,0,0,0,2,0,0
1,5,2,2,0,6,4,6,3,0,4,0,2,0,2,3,0,4,0,0
2,6,0,0,0,6,3,4,0,0,0,0,0,0,0,0,1,0,0,0


0    5
1    5
2    6
dtype: int64

In [29]:
#function to group drugs to create a new stimulants outcome variable
def create_stimulants(row):      
    return max(row['AMPHET'],row['NICO'],row['COCAINE'],\
              row['CRACK'],row['CAFF'],row['CHOC'])

In [30]:
#function to group drugs to create a new depressants outcome variable
def create_depressants(row):      
    return max(row['ALC'],row['AMYL'],row['BENZOS'],row['VSA'],row['HEROIN'],\
              row['METH'])

In [31]:
#function to group drugs to create a new hallucinogens outcome variable
def create_hallucinogens(row):      
    return max(row['CANNABIS'],row['ECST'],row['KETA'],row['LSD'],\
               row['MUSHRM'],row['LEGALH'])

In [32]:
df["stimulants"] = df.iloc[:,13:].apply(lambda x: create_stimulants(x).astype(int), axis=1)
df["depressants"] = df.iloc[:,13:].apply(lambda x: create_depressants(x).astype(int), axis=1)
df["hallucinogens"] = df.iloc[:,13:].apply(lambda x: create_hallucinogens(x).astype(int), axis=1)
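
The three grouping functions above each take the row-wise maximum over a fixed set of columns, so the same result can also be computed without a row-by-row `apply`. The sketch below is one possible vectorised alternative, not the method used in this notebook. The names `toy` and `groups` are illustrative, and the three rows are the first three rows of the drug columns shown earlier.

```python
import pandas as pd

# First three rows of the integer-coded drug columns shown earlier
cols = ["ALC", "AMPHET", "AMYL", "BENZOS", "CAFF", "CANNABIS", "CHOC",
        "COCAINE", "CRACK", "ECST", "HEROIN", "KETA", "LEGALH", "LSD",
        "METH", "MUSHRM", "NICO", "SEMER", "VSA"]
rows = [
    [5, 2, 0, 2, 6, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0],
    [5, 2, 2, 0, 6, 4, 6, 3, 0, 4, 0, 2, 0, 2, 3, 0, 4, 0, 0],
    [6, 0, 0, 0, 6, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
]
toy = pd.DataFrame(rows, columns=cols)

# Same column groupings as create_stimulants / create_depressants /
# create_hallucinogens
groups = {
    "stimulants": ["AMPHET", "NICO", "COCAINE", "CRACK", "CAFF", "CHOC"],
    "depressants": ["ALC", "AMYL", "BENZOS", "VSA", "HEROIN", "METH"],
    "hallucinogens": ["CANNABIS", "ECST", "KETA", "LSD", "MUSHRM", "LEGALH"],
}

# Row-wise maximum over each group of columns
for name, members in groups.items():
    toy[name] = toy[members].max(axis=1)

print(toy[list(groups)])
```

For these three rows the values agree with the `stimulants`, `depressants` and `hallucinogens` columns in the output of the next cell.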



In [33]:
df

Unnamed: 0,ID,Age,Gender,Education,Country,Ethnicity,NEO_N,NEO_E,NEO_O,NEO_A,...,LEGALH,LSD,METH,MUSHRM,NICO,SEMER,VSA,stimulants,depressants,hallucinogens
0,1,0.49788,0.48246,-0.05921,0.96082,0.12600,0.31287,-0.57545,-0.58331,-0.91699,...,0,0,0,0,2,0,0,6,5,0
1,2,-0.07854,-0.48246,1.98437,0.96082,-0.31685,-0.67825,1.93886,1.43533,0.76096,...,0,2,3,0,4,0,0,6,5,4
2,3,0.49788,-0.48246,-0.05921,0.96082,-0.31685,-0.46725,0.80523,-0.84732,-1.62090,...,0,0,0,1,0,0,0,6,6,3
3,4,-0.95197,0.48246,1.16365,0.96082,-0.31685,-0.14882,-0.80615,-0.01928,0.59042,...,0,0,0,0,2,0,0,5,4,2
4,5,0.49788,0.48246,1.98437,0.96082,-0.31685,0.73545,-1.63340,-0.45174,-0.30172,...,1,0,0,2,2,0,0,6,4,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1880,1884,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,-1.19430,1.74091,1.88511,0.76096,...,3,3,0,0,0,0,5,4,5,5
1881,1885,-0.95197,-0.48246,-0.61113,-0.57009,-0.31685,-0.24649,1.74091,0.58331,0.76096,...,3,5,4,4,5,0,0,5,5,5
1882,1886,-0.07854,0.48246,0.45468,-0.57009,-0.31685,1.13281,-1.37639,-1.27553,-1.77200,...,0,2,0,2,6,0,0,6,5,6
1883,1887,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,0.91093,-1.92173,0.29338,-1.62090,...,3,3,0,3,4,0,0,6,5,6


### Recode from 7 levels to 3 levels

In [34]:
#define recoding function
def recode(val):
    #for values 4 or greater
    if val >= 4:
        return 3
    #for values 2 and 3
    elif val >= 2:
        return 2
    #for values 0 and 1
    else:
        return 0

In [35]:
#apply recoding
df[['stim_final','dep_final','hallu_final']] = df[['stimulants','depressants','hallucinogens']].applymap(lambda x: recode(x))
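
The recoding is a binning of the integer levels, so it can also be written with `pd.cut`. This is a minimal sketch of that alternative, assuming all values lie between 0 and 6; `recode_series` is a hypothetical helper and is not part of the notebook.

```python
import pandas as pd

def recode_series(values: pd.Series) -> pd.Series:
    """Map levels 0-1 to 0, 2-3 to 2 and 4-6 to 3, mirroring recode()."""
    # Bins are right-inclusive: (-1, 1], (1, 3], (3, 6]
    binned = pd.cut(values, bins=[-1, 1, 3, 6], labels=[0, 2, 3])
    return binned.astype(int)

levels = pd.Series(range(7))
recoded = recode_series(levels)
print(recoded.tolist())  # [0, 0, 2, 2, 3, 3, 3]
```

Values outside 0 to 6 fall outside every bin and would make the integer conversion fail, which can act as a cheap sanity check on the input.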

In [36]:
#show df
df

Unnamed: 0,ID,Age,Gender,Education,Country,Ethnicity,NEO_N,NEO_E,NEO_O,NEO_A,...,MUSHRM,NICO,SEMER,VSA,stimulants,depressants,hallucinogens,stim_final,dep_final,hallu_final
0,1,0.49788,0.48246,-0.05921,0.96082,0.12600,0.31287,-0.57545,-0.58331,-0.91699,...,0,2,0,0,6,5,0,3,3,0
1,2,-0.07854,-0.48246,1.98437,0.96082,-0.31685,-0.67825,1.93886,1.43533,0.76096,...,0,4,0,0,6,5,4,3,3,3
2,3,0.49788,-0.48246,-0.05921,0.96082,-0.31685,-0.46725,0.80523,-0.84732,-1.62090,...,1,0,0,0,6,6,3,3,3,2
3,4,-0.95197,0.48246,1.16365,0.96082,-0.31685,-0.14882,-0.80615,-0.01928,0.59042,...,0,2,0,0,5,4,2,3,3,2
4,5,0.49788,0.48246,1.98437,0.96082,-0.31685,0.73545,-1.63340,-0.45174,-0.30172,...,2,2,0,0,6,4,3,3,3,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1880,1884,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,-1.19430,1.74091,1.88511,0.76096,...,0,0,0,5,4,5,5,3,3,3
1881,1885,-0.95197,-0.48246,-0.61113,-0.57009,-0.31685,-0.24649,1.74091,0.58331,0.76096,...,4,5,0,0,5,5,5,3,3,3
1882,1886,-0.07854,0.48246,0.45468,-0.57009,-0.31685,1.13281,-1.37639,-1.27553,-1.77200,...,2,6,0,0,6,5,6,3,3,3
1883,1887,-0.95197,0.48246,-0.61113,-0.57009,-0.31685,0.91093,-1.92173,0.29338,-1.62090,...,3,4,0,0,6,5,6,3,3,3


In [37]:
#value counts check
df[['stim_final','dep_final','hallu_final']].apply(pd.Series.value_counts)

Unnamed: 0,stim_final,dep_final,hallu_final
0,4,42,584
2,8,215,418
3,1873,1628,883


# Rebalance

In this section the outcome variable classes are rebalanced using the SMOTE oversampling method. The RandomOverSampler method was also tried and compared with SMOTE. The four selected classifiers were then re-calibrated on the rebalanced data. All of the rebalanced models overfit less than the initial models. SMOTE gave better overall model accuracy than RandomOverSampler and was selected as the preferred balancing method.

In [38]:
#import re-balancing libraries
from imblearn.over_sampling import SMOTE, RandomOverSampler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

In [39]:
#subset independent variables
x_vars = df.iloc[:,:13]
#x_vars = df_d

#dependent variables
out = df[['stim_final','dep_final','hallu_final']]

#create test/train set
X_train, X_test, y_train, y_test = train_test_split(x_vars,out, test_size=.4, random_state=42)

In [40]:
#instantiate oversampler
over = SMOTE(k_neighbors=2,random_state=42)
#over = RandomOverSampler(random_state=42)

In [41]:
#oversample on training data
X_sampled,y_sampled = over.fit_resample(X_train,y_train["dep_final"])

In [42]:
#check value counts for re-balanced
y_sampled.value_counts()

0    986
2    986
3    986
Name: dep_final, dtype: int64
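
To give an intuition for why SMOTE and RandomOverSampler can behave differently: RandomOverSampler duplicates existing minority rows, whereas SMOTE creates new points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that interpolation idea with NumPy only. It is a simplified illustration, not imblearn's implementation, and `smote_like_sample` and its parameters are hypothetical names.

```python
import numpy as np

def smote_like_sample(minority, k=2, n_new=5, seed=42):
    """Create n_new synthetic points by interpolating towards near neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    new_points = []
    for _ in range(n_new):
        # Pick a random minority sample
        i = rng.integers(len(minority))
        # Find its k nearest minority neighbours (excluding itself)
        dists = np.linalg.norm(minority - minority[i], axis=1)
        dists[i] = np.inf
        neighbours = np.argsort(dists)[:k]
        j = rng.choice(neighbours)
        # Place the new point somewhere on the segment between the two
        gap = rng.random()
        new_points.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(new_points)

# Four made-up minority points in two dimensions
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like_sample(minority, k=2, n_new=5)
print(synthetic)
```

Each synthetic point lies on a line segment between two existing minority points, so the new samples stay inside the region the minority class already occupies.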

# Testing Hyperparameters - GridSearchCV

In this section the parameter search is carried out with GridSearchCV, using 5-fold cross-validation on the rebalanced depressants training data.

In [43]:
#binarize
y = label_binarize(y_sampled, classes=[0, 2, 3])
#get number of classes
n_classes = y.shape[1]
#set random state for repeatability
random_state = np.random.RandomState(42)

# shuffle and split training and test sets
#X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(X_sampled, y, test_size=0.5, random_state=0)
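
A note on what binarizing implies for scoring: `label_binarize` turns the 1-D labels into a 0/1 indicator matrix with one column per class. When such a matrix is passed as the target, scikit-learn treats the problem as multilabel, and accuracy becomes subset accuracy, where a row counts as correct only if all of its columns match. The sketch below shows this on four made-up labels; it assumes the default scoring is used in the grid searches, and `labels`, `y_true` and `y_pred` are illustrative names.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import label_binarize

# Four made-up labels using the same class codes as dep_final
labels = np.array([0, 2, 3, 3])
y_true = label_binarize(labels, classes=[0, 2, 3])
print(y_true)

# Pretend prediction: the last row has one extra positive column
y_pred = y_true.copy()
y_pred[3] = [0, 1, 1]

# Subset accuracy: the last row is wrong as a whole, so 3 of 4 rows match
subset_acc = accuracy_score(y_true, y_pred)
# Per-cell agreement is higher because only 1 of 12 cells differs
cell_acc = (y_true == y_pred).mean()
print(subset_acc, cell_acc)
```

This stricter metric may partly explain why the cross-validation scores printed by the searches look low. Fitting on the 1-D labels in `y_sampled` instead would give ordinary multiclass accuracy.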

In [44]:
#import libraries for GridSearch CV
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
import time

In [59]:
#final parameter grids
param_grid = [#LR parameter grid
    {'estimator__penalty' : ['l1', 'l2'],
    'estimator__C': [1.0, 0.1, .001, .0001],
    'estimator__solver' : ['liblinear','newton-cg', 'lbfgs','saga']}
    ]

param_grid_svm = [#svm parameter grid
    {'estimator__penalty' : ['l1', 'l2'],
     'estimator__dual': [True, False],
    'estimator__C': [1.0, 0.1, .001, .0001],
    'estimator__loss' : ['squared_hinge','hinge'],
    'estimator__max_iter': [1000, 5000, 10000],
    'estimator__multi_class': ['ovr', 'crammer_singer']}
    ]

param_grid_RF = [#RF parameter grid
    {#'estimator' : [OneVsRestClassifier(RandomForestClassifier())],
    #'estimator__n_estimators' : [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
    'estimator__n_estimators' : [200, 400, 600],
    'estimator__max_features' : ['auto', 'sqrt','log2'],
    'estimator__bootstrap' : [True, False],
    #'estimator__max_depth': [5,10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
    'estimator__max_depth': [5,10, 20, 30, 40, 50, None],
     'estimator__min_samples_leaf': [1, 2, 4],
     'estimator__min_samples_split': [2, 5, 10],
    'estimator__criterion': ['gini','entropy','log_loss']}
]

param_grid_MLP = [#Neural Network parameter grid
    
    {#'estimator' : [OneVsRestClassifier(MLPClassifier())],
    'estimator__hidden_layer_sizes': [(50,50,50), (50,100,50), (100,)],
    'estimator__activation': ['tanh', 'relu'],
    'estimator__solver': ['sgd', 'adam'],
    'estimator__alpha': [0.0001, 0.05, 0.1],
    'estimator__learning_rate': ['constant','adaptive', 'invscaling'],
    'estimator__max_iter' :[200,300,400]}
    
    ]
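
Because the search is exhaustive, the number of candidates for each grid is the product of the lengths of its value lists, and the number of fits is that product times the number of folds. The sketch below recomputes these counts from the active (uncommented) entries of the grids above; `grids` and `n_candidates` are illustrative names.

```python
from math import prod

# Active entries of the four grids, keyed by a short model name
grids = {
    "LR": {
        "estimator__penalty": ["l1", "l2"],
        "estimator__C": [1.0, 0.1, .001, .0001],
        "estimator__solver": ["liblinear", "newton-cg", "lbfgs", "saga"],
    },
    "SVM": {
        "estimator__penalty": ["l1", "l2"],
        "estimator__dual": [True, False],
        "estimator__C": [1.0, 0.1, .001, .0001],
        "estimator__loss": ["squared_hinge", "hinge"],
        "estimator__max_iter": [1000, 5000, 10000],
        "estimator__multi_class": ["ovr", "crammer_singer"],
    },
    "RF": {
        "estimator__n_estimators": [200, 400, 600],
        "estimator__max_features": ["auto", "sqrt", "log2"],
        "estimator__bootstrap": [True, False],
        "estimator__max_depth": [5, 10, 20, 30, 40, 50, None],
        "estimator__min_samples_leaf": [1, 2, 4],
        "estimator__min_samples_split": [2, 5, 10],
        "estimator__criterion": ["gini", "entropy", "log_loss"],
    },
    "MLP": {
        "estimator__hidden_layer_sizes": [(50, 50, 50), (50, 100, 50), (100,)],
        "estimator__activation": ["tanh", "relu"],
        "estimator__solver": ["sgd", "adam"],
        "estimator__alpha": [0.0001, 0.05, 0.1],
        "estimator__learning_rate": ["constant", "adaptive", "invscaling"],
        "estimator__max_iter": [200, 300, 400],
    },
}

cv_folds = 5
# Candidates = product of the number of values tried for each parameter
n_candidates = {name: prod(len(v) for v in grid.values())
                for name, grid in grids.items()}
for name, n in n_candidates.items():
    print(f"{name}: {n} candidates, {n * cv_folds} fits")
```

These counts (32, 192, 3402 and 324 candidates) match the "Fitting 5 folds for each of ... candidates" lines printed by the searches in the following sections, and show why the Random Forest search takes by far the longest.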



## LinearSVC GridSearchCV

In [60]:
#instantiate GridSearchCV object for SVM
SVM_clf = GridSearchCV(OneVsRestClassifier(svm.LinearSVC(C=.001, random_state=42, max_iter=10000, dual=False)),
                       param_grid = param_grid_svm, cv = 5, verbose=True, n_jobs=-1)

In [61]:
#get start time
start = time.time()

#fit on depressants data
best_SVM_clf = SVM_clf.fit(X_sampled, y)

#print duration
print(f"GridSearchCV Total Time in seconds: {time.time()-start}")

Fitting 5 folds for each of 192 candidates, totalling 960 fits


 0.09258529 0.09258529        nan 0.27134095 0.09258529 0.09258529
        nan 0.1196186  0.09258529 0.09258529        nan 0.16832099
 0.09258529 0.09258529        nan 0.19372742 0.09258529 0.09258529
 0.19370798 0.19134198 0.09258529 0.09258529 0.19370798 0.19134198
 0.09258529 0.09258529 0.19370798 0.19134198 0.09258529 0.09258529
        nan        nan 0.09258529 0.09258529        nan        nan
 0.09258529 0.09258529        nan        nan 0.09258529 0.09258529
        nan 0.31055357 0.14202966 0.14202966        nan 0.20930683
 0.14202966 0.14202966        nan 0.18790186 0.14202966 0.14202966
        nan 0.1934033  0.14202966 0.14202966        nan 0.16971064
 0.14202966 0.14202966        nan 0.20152684 0.14202966 0.14202966
 0.18728735 0.18863699 0.14202966 0.14202966 0.18728735 0.18863699
 0.14202966 0.14202966 0.18728735 0.18863699 0.14202966 0.14202966
        nan        nan 0.14202966 0.14202966        nan        nan
 0.14202966 0.14202966        nan        nan 0.14202966 0.1420

GridSearchCV Total Time in seconds: 814.3755588531494




In [62]:
#get best parameters
best_SVM_clf.best_estimator_

OneVsRestClassifier(estimator=LinearSVC(C=0.1, random_state=42))
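
The `nan` entries in the score output above are consistent with parameter combinations that the estimator rejects: by default GridSearchCV records a failed fit as `nan` instead of stopping. The sketch below reproduces this on synthetic data and lists the failing combinations from `cv_results_`. It is an illustration under assumptions, not a rerun of the search above: it uses a bare `LinearSVC` (so the parameter names have no `estimator__` prefix), a two-parameter grid, and made-up data in `X_demo` and `y_demo`.

```python
import warnings

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Small synthetic binary problem, for illustration only
X_demo, y_demo = make_classification(n_samples=120, n_features=6,
                                     random_state=42)

# With dual=False, LinearSVC does not support the hinge loss
demo_grid = {"penalty": ["l1", "l2"], "loss": ["squared_hinge", "hinge"]}
search = GridSearchCV(LinearSVC(dual=False, max_iter=10000, random_state=42),
                      param_grid=demo_grid, cv=3)

# Failed fits raise warnings rather than errors; silence them for the demo
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    search.fit(X_demo, y_demo)

# Candidates whose mean test score is nan are the ones that failed to fit
results = pd.DataFrame(search.cv_results_)
failed = results[results["mean_test_score"].isna()]
print(failed[["param_penalty", "param_loss"]])
```

Dropping unsupported combinations from a grid, or splitting it into several dictionaries that contain only valid combinations, avoids spending fits on candidates that can never be selected.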

In [46]:
#pipe = Pipeline([('estimator' , OneVsRestClassifier(LogisticRegression()))])

## Logistic Regression GridSearchCV

In [47]:
#instantiate LR GridSearchCV object
LR_clf = GridSearchCV(OneVsRestClassifier(LogisticRegression()), param_grid = param_grid, cv = 5, verbose=True, n_jobs=-1)

In [48]:
#get start time
start = time.time()

#fit to depressants data
best_LR_clf = LR_clf.fit(X_sampled, y)

#print duration
print(f"GridSearchCV Total Time in seconds: {time.time()-start}")

Fitting 5 folds for each of 32 candidates, totalling 160 fits
GridSearchCV Total Time in seconds: 3.3719730377197266


 0.18154982 0.01216845 0.19979078        nan        nan 0.01216845
 0.19236063 0.1954006  0.1697215  0.01216845 0.                nan
        nan 0.         0.09466491 0.02703389 0.02703389 0.01216845
 0.                nan        nan 0.         0.01757729 0.
 0.         0.00980244]


In [49]:
#get best params
best_LR_clf.best_estimator_

OneVsRestClassifier(estimator=LogisticRegression(penalty='l1',
                                                 solver='liblinear'))

## Random Forest GridSearchCV

In [52]:
#instantiate RF GridSearchCV
RF_clf = GridSearchCV(OneVsRestClassifier(RandomForestClassifier()), param_grid = param_grid_RF, cv = 5, verbose=True, n_jobs=-1)

In [53]:
#get start time
start = time.time()

#fit on depressants data
best_RF_clf = RF_clf.fit(X_sampled, y)

#print duration
print(f"GridSearchCV Total Time in seconds: {time.time()-start}")

Fitting 5 folds for each of 3402 candidates, totalling 17010 fits




GridSearchCV Total Time in seconds: 4243.851388454437


In [54]:
#get best parameters
best_RF_clf.best_estimator_

OneVsRestClassifier(estimator=RandomForestClassifier(bootstrap=False,
                                                     criterion='entropy',
                                                     max_depth=40,
                                                     n_estimators=400))

## Neural Network GridSearchCV

In [55]:
#instantiate NN GridSearchCV object
MLP_clf = GridSearchCV(OneVsRestClassifier(MLPClassifier()), param_grid = param_grid_MLP, cv = 5, verbose=True, n_jobs=-1)

In [56]:
#get start time
start = time.time()

#fit on depressants data
best_MLP_clf = MLP_clf.fit(X_sampled, y)

#print duration
print(f"GridSearchCV Total Time in seconds: {time.time()-start}")

Fitting 5 folds for each of 324 candidates, totalling 1620 fits
GridSearchCV Total Time in seconds: 523.813027381897


In [57]:
#get best estimators
best_MLP_clf.best_estimator_

OneVsRestClassifier(estimator=MLPClassifier(learning_rate='invscaling',
                                            max_iter=400, solver='sgd'))