# COMP47590: Advanced Machine Learning
# Assignment 1: Multi-label Classification

Name(s): Aditi Bansal

Student Number(s): 19200465

## Import Packages Etc

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
from sklearn.model_selection import train_test_split
from sklearn.base import BaseEstimator
from sklearn.base import ClassifierMixin
from sklearn.base import MetaEstimatorMixin
from sklearn.base import clone
from sklearn.utils import check_X_y, check_array
from sklearn.utils.validation import check_is_fitted
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score, hamming_loss
from sklearn import metrics
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import GridSearchCV

Using TensorFlow backend.


## Task 0: Load the Yeast Dataset

Loading features in X and 14 labels in Y

In [2]:
# Write your code here
dataset = pd.read_csv('yeast.csv')
dataset.head(3)
X = dataset[dataset.columns[1:103]]
Y = dataset[dataset.columns[103:117]]
Y.head(3)

Unnamed: 0,Class1,Class2,Class3,Class4,Class5,Class6,Class7,Class8,Class9,Class10,Class11,Class12,Class13,Class14
0,0,0,0,0,0,0,1,1,0,0,0,1,1,0
1,0,0,1,1,0,0,0,0,0,0,0,0,0,0
2,0,1,1,0,0,0,0,0,0,0,0,1,1,0


## Task 1: Implement the Binary Relevance Algorithm

Passing the base estimator to Binary relevance and appending each estimator prediction for 14 labels in a list. 
We will first check the X and Y to be 2D and 1D array respectively.
We will then fit the ith column of Y and append it in estimator list and will clone it every time to keep a copy of estimator for every ith column, instead of updating it for all 14 labels.
Prediction is made after checking whether the data is fitted or not. 

In [5]:
class BinaryRelevance(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    def __init__(self,estimator):
        
        self.estimator = estimator
        self.estimators_ =[]
       
        
    def fit(self, X, y):
        
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        for i in range(y.shape[1]):
            self.estimators_.append(clone(self.estimator).fit(X,y[:,i])) 
    
    
            
    def predict(self,X):
        pred =[]
                
        check_is_fitted(self,'estimators_')
        for e in self.estimators_ :
            pred.append(e.predict(X))
        arr = np.asarray(pred)
        return arr.T

# Splitting the dataset in 70% train and 30% test set

In [6]:
X_train,X_test,y_train,y_test = train_test_split(X,Y,train_size=0.7) 

## Task 2: Implement the Binary Relevance Algorithm with Under-Sampling

Traversed the columns of Y, one at a time to find whether the target label is imbalanced or not and under sampled the dataset if the target label are not balanced. Fit the under sampled data by appending estimator for each label in a list.
After checking whether the data is fitted or not, made predictions on the fitted data. 

In [55]:
class BinaryRelevanceSampling(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    def __init__(self,estimator):
        
        self.estimator = estimator
        self.estimators_ =[]
       
        
    def fit(self, X, y):
        
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        for i in range(y.shape[1]):
            
            unique_elements, target_count = np.unique(y[:,i],return_counts =True)
            label0_ratio = round(target_count[0] / (target_count[1] + target_count[0]), 2)
            label1_ratio = round(target_count[1] / (target_count[1] + target_count[0]), 2)
            label_ratio = (label0_ratio/label1_ratio)

            if label_ratio != 1:
                rus = RandomUnderSampler(random_state=42)
                X_res, y_res = rus.fit_resample(X, y[:,i])
                self.estimators_.append(clone(self.estimator).fit(X,y[:,i])) 
            else:
                self.estimators_.append(clone(self.estimator).fit(X,y[:,i]))
                
    
    
            
    def predict(self,X):
        pred =[]
                
        check_is_fitted(self,'estimators_')
        for e in self.estimators_ :
            pred.append(e.predict(X))
        arr = np.asarray(pred)
        return arr.T

# Task 3: Compare the Performance of Different Binary Relevance Approaches

Used GridSearchCV to find the best performing estimator from DecisionTree, SVC, LogisticRegression for our binary relevance. Passed a dictionary of classifiers with hyper tuned parameters and used hamming_loss scorer for finding out the performance of each classifier. 

### Without Under Sampling 

In [19]:
from sklearn.metrics import make_scorer
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

model_tuned = GridSearchCV(BinaryRelevance(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
model_tuned.fit(X_train, y_train)

# print("Best param set:")
print(model_tuned.best_params_)
print(model_tuned.best_score_)


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.2s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.3s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__



[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 
[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
               




[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 


[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 26.7min finished


[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
{'estimator': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), 'estimator__C': 1, 'estimator__gamma': 1, 'estimator__kernel': 'rbf'}
-0.19126467855030838


### Performance of Binary Relevance with best model estimated from GridSearchCV

In [20]:
#Binary Relevance with best model SVM as found from GridSearchCV above
y_pred = model_tuned.predict(X_test)

#Evaluating performance with SVM
print(("Hamming Loss with SVM : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss with SVM : 0.186639 


### With under Sampling

In [21]:
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

model_tuned = GridSearchCV(BinaryRelevanceSampling(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
model_tuned.fit(X_train, y_train)

print("Best param set:")
print(model_tuned.best_params_)
print(model_tuned.best_score_)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.3s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__



[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 


[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 46.2min finished


[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
Best param set:
{'estimator': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), 'estimator__C': 1, 'estimator__gamma': 1, 'estimator__kernel': 'rbf'}
-0.19126467855030838


In [22]:
#Binary Relevance with best model SVM for under sampled dataset as found from GridSearchCV above
y_pred = model_tuned.predict(X_test)

#Evaluating performance with SVM
print(("Hamming Loss with SVM : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss with SVM : 0.186639 


## Task 4: Implement the Classifier Chains Algorithm

In [23]:
class ClassifierChain(BaseEstimator, ClassifierMixin, MetaEstimatorMixin):
    
    def __init__(self,estimator):
        
        self.estimator = estimator
        self.estimators_ = []
       
        
    def fit(self, X, y):
        
        X,y = check_X_y(X,y, accept_sparse = True, multi_output = True)
        y = check_array(y,accept_sparse=True)
        
        
        for i in range(y.shape[1]):
            
            c = clone(self.estimator)
            
            if (i==0):
                c.fit(X,y[:,0])
            else:
                stacked = np.hstack((X, y[:, :i]))
                c.fit(stacked, y[:, i])
                
        
            self.estimators_.append(c)          
            
    def predict(self,X):
        
        check_is_fitted(self,'estimators_')
        
        for i, c in enumerate(self.estimators_):
            
            if i == 0:
                y_pred = (c.predict(X)).reshape(-1, 1)
            else:
                stacked = np.hstack((X, y_pred))
                new_y = c.predict(stacked)
                y_pred = np.hstack((y_pred, new_y.reshape(-1, 1)))
                
        return y_pred

## Task 5: Evaluate the Performance of the Classifier Chains Algorithm

In [24]:
param_grid = [
           
            {'estimator':[DecisionTreeClassifier()], 'estimator__criterion': ['gini', 'entropy'], 'estimator__max_depth':[10, 25, 50], 'estimator__min_samples_split': [2,5,10]},
            {'estimator':[SVC()], 'estimator__C':[0.1,1,10,100,1000], 'estimator__gamma':[1,0.1,0.01,0.001,0.0001], 'estimator__kernel':['rbf','linear']},
            {'estimator':[LogisticRegression()], 'estimator__max_iter':[10000,15000,20000]}
]

model_tuned = GridSearchCV(ClassifierChain(param_grid), param_grid, verbose=2, n_jobs=1, scoring=make_scorer(hamming_loss, greater_is_better = False))
model_tuned.fit(X_train, y_train)

print("Best param set:")
print(model_tuned.best_params_)
print(model_tuned.best_score_)

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 3 folds for each of 71 candidates, totalling 213 fits
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2 
[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    1.1s remaining:    0.0s


[CV]  estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__min_samples_split=2, total=   1.1s
[CV] estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=None, splitter='best'), estimator__criterion=gini, estimator__max_depth=10, estimator__



[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=10000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.2s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=15000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=15000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 




[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
[CV] estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000 


[Parallel(n_jobs=1)]: Done 213 out of 213 | elapsed: 21.8min finished


[CV]  estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=20000,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False), estimator__max_iter=20000, total=   0.3s
Best param set:
{'estimator': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf', max_iter=-1,
    probability=False, random_state=None, shrinking=True, tol=0.001,
    verbose=False), 'estimator__C': 1, 'estimator__gamma': 1, 'estimator__kernel': 'rbf'}
-0.21479259947621868


In [56]:
#Classifier Chain with best model SVM as found from GridSearchCV above
y_pred = model_tuned.predict(X_test)

#Evaluating performance with SVM
print(("Hamming Loss with SVM : %f " ) % (hamming_loss(y_test,y_pred)))

Hamming Loss with SVM : 0.208284 


## Task 6: Reflect on the Performance of the Different Models Evaluated


Evaluating performance: For evaluating the performance of Binary Relevance I splitted the data in 70% train and 30% test dataset. I evaluated the performance using hamming loss and F1 measure. It was observed that hamming loss gave better performance insight than F1 measure as hamming loss gives equal weight to each label in multi label classification whereas in F1 measure, for each label the metrics (eg. precision, recall) are computed and then these label-wise metrics are aggregated. Hence, in this case we end up computing the precision/recall for each label over the entire dataset, as we do for a binary classification, then aggregate it. Thus, for a multi label imbalanced dataset hamming loss is a better performance score. 

Handling Imbalanced dataset: I checked the target value count in each of the 14 labels to find out which one is imbalanced, if the ratio of target values in a label is not 50%, I performed under sampling for that label column using RandomUnderSampler which down sampled the majority target value by randomly removing data rows until the dataset is balanced.

Performance of Binary Relevance from different approaches: I used GridSearchCV to find the best performing model from DecisionTree, LogisticRegression and SVM for Binary Relevance with and without under sampling. It estimated that SVM performed best for Binary Relevance with the hamming loss of 0.19 without undersampling whereas SVM performed best for undersampled data as well with hamming loss of 0.18, this is beacuse SVM(Support Vector Machine) is very fast and suitable for large datasets. SVM can be used for both binary and multi-label classifications as it handles both linear and non-linearly seaparable data very well. 

Classifier Chain: Stacked each label column to dataset with previous predicted label in classifier chain to make prediction for next label in the dataset. Used GridSearchCV to find the best performing model for classifier chain and evaluate the performance with the estimated model. SVM was estimated to be best classifier for making predictions with classifier chain with hamming loss of 0.21. Binary Relevance performed a little better than Classifier Chain here probably as hamming loss is more in classifier chain because the loss is propagated through label predictions as each predicated label value is stacked in the dataset for next label prediction. 

Conclusion: SVM performed better than decision tree and Logistic Regression classifier in both binary relevance and classifier chain, with or without sampling. 