<img src="http://www.ubu.es/sites/default/files/portal_page/images/logo_color_2l_dcha.jpg" height="200" width="200" align="right"/> 
### Author: Abel Aioanei 
### Director: César García-Osorio 
### Director: Juan José Rodríguez Díez
### Title: Using my BinaryRelevance classifier and some metrics

### Table of contents:
* [Initializing the classifier](#classifier)
* [Fiting the classifier](#fit)
* [Compute predictions](#predict)
* [Compute probabilities of the predictions](#predict_proba)
* [Compute metrics](#metrics)
* [Feature Selection](#feature_selection)
* [Feature Selection for Binary Relevance](#feature_selection_bv)
* [Feature Selection](#feature_selection)
* [Make CrossValidation](#cross)

In [2]:
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC

from sklearn import datasets

import pandas as pd

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from sklearn.feature_selection import GenericUnivariateSelect

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss
from sklearn.metrics import accuracy_score


from library.MyBinaryRelevanceFeatureSelect import MyBinaryRelevanceFeatureSelect


Parameters

In [3]:
number_samples= 100
number_classes = 6
number_labels = 2#in case the data chosen to work with is constructed
cross_validation = 5
test_size=0.33
features_to_be_selected = 3

Choosing the Dataset to use or Contructing it by 'make_multilabel_classification'

In [4]:
iris = datasets.load_iris()

X = iris.data  
y = iris.target 
#it has a SINGLE LABEL
#have to split it other way

In [5]:
wine = datasets.load_wine()

X = wine.data  
y = wine.target
#it has a SINGLE LABEL
#have to split it other way

In [6]:
X, y = make_multilabel_classification(n_samples=number_samples, n_classes=number_classes, n_labels=number_classes,sparse = True, allow_unlabeled=False, random_state=1)

Splitting in Training and Test Data

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size)


<a id='classifier'></a>
Intializing with the my classifier class

In [8]:
clf = MyBinaryRelevanceFeatureSelect()

<a id='fit'></a>
Fit the classifier

In [9]:
clf.fit(X_train, y_train)

BinaryRelevance(classifier=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
        require_dense=[True, True])

<a id='predict'></a>
Compute predictions

In [10]:
predictions = clf.predict(X_test)
pd.DataFrame(predictions.toarray()).head()

Unnamed: 0,0,1,2,3,4,5
0,1,1,0,1,1,1
1,1,1,0,1,1,1
2,1,1,0,1,1,1
3,1,1,0,1,1,1
4,1,1,0,1,1,1


<a id='predict_proba'></a>
Compute probabilities of the predictions

In [11]:
probability_predictions = clf.predict_proba(X_test)
pd.DataFrame(probability_predictions.toarray()).head()


Unnamed: 0,0,1,2,3,4,5
0,0.928954,0.870272,0.140762,0.764143,0.583313,0.582269
1,0.915178,0.93886,0.22628,0.807937,0.575524,0.597128
2,0.922847,0.925715,0.233787,0.862355,0.750064,0.584054
3,0.92659,0.882388,0.161468,0.924823,0.612838,0.584973
4,0.918552,0.901744,0.102528,0.663504,0.594288,0.592304


<a id='metrics'></a>
Compute metrics

In [12]:
hamming = hamming_loss(y_test, predictions.toarray())
print("Hamming Loss:", "%.3f" % hamming)

accuracy = accuracy_score(y_test, predictions.toarray())
print("Accuracy Score:", "%.3f" % accuracy)

Hamming Loss: 0.177
Accuracy Score: 0.273


<a id='feature_selection'></a>

Feature Selection function

In this part I defin a simple function that can do feature selection over a dataset using a given estimator

In [13]:
def feature_select(X, y, estimator):   
        estimator.fit(X, y)
        selected_attributes_indices = estimator.get_support(indices = True)
        
        return selected_attributes_indices

<a id='feature_selection_bv'></a>

Feature Selection for single labeled multi-class set

Using this for this specific case regarding Binary Relevance

Have to detail more this

It returns an array of selected features in accordance with the estimator given


In [14]:
def feature_selection_br(X, y, estimator):
    selected_features_array = []

    for i in range(0,predictions.shape[1]):
        indices_of_selected_features = feature_select(X_test, predictions[:,i].toarray(), estimator)
        selected_features_array.append(indices_of_selected_features)
        #print(indices_of_selected_features,'indices #',i)

    df = pd.DataFrame(selected_features_array)
    return df

<a id='estimator'></a>

Select the Feature Selection estimator: SelectKBest or GenericUnivariateSelect

In [15]:
estimator = GenericUnivariateSelect(chi2, 'k_best', param=features_to_be_selected)

In [16]:
estimator = SelectKBest(chi2, k=features_to_be_selected)

Perform the Feature Selection for BR

returns a dataframe with the atributes selected for each instance

In [17]:
feature_selection_br(X, y, estimator)

Unnamed: 0,0,1,2
0,17,18,19
1,17,18,19
2,17,18,19
3,17,18,19
4,1,2,16
5,17,18,19


<a id='cross_validation'></a>
Perform the Cross Validation for the model

In [18]:
scores = cross_val_score(clf, X, y, cv=cross_validation, scoring='recall_macro')
average=np.mean(scores)
print(scores)
print(average)

TypeError: Cannot clone object '<library.MyBinaryRelevanceFeatureSelect.MyBinaryRelevanceFeatureSelect object at 0x000001F862D3F898>' (type <class 'library.MyBinaryRelevanceFeatureSelect.MyBinaryRelevanceFeatureSelect'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

In [None]:
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB

# initialize Binary Relevance multi-label classifier
# with an SVM classifier
# SVM in scikit only supports the X matrix in sparse representation

classifier = BinaryRelevance(
    classifier = MultinomialNB(),
    require_dense = [False, True]
)

# train
classifier.fit(X_train, y_train)

# predict
predictions2 = classifier.predict(X_test)

In [None]:
predictions1
df = pd.DataFrame(predictions1.toarray())
df.head()

In [None]:
predictions2
df = pd.DataFrame(predictions2.toarray())
df.head()