<img src="http://www.ubu.es/sites/default/files/portal_page/images/logo_color_2l_dcha.jpg" height="200" width="200" align="right"/> 
### Author: Eduardo Tubilleja Calvo 
### Director: Álvar Arnaiz González 
### Director: Juan José Rodríguez Díez
### Title: Example with real ML data set

### Contents:
* [Classifiers](#classifier)
* [Fit](#fit)
* [Predict](#predict)
* [Predict_proba](#predict_proba)
* [Measures](#measures)
* [Tree](#tree)
* [CrossValidation](#cross)

In this notebook we build an ensemble of base classifiers, train it on a real data set obtained from [Mulan](http://mulan.sourceforge.net/datasets-mlc.html), and use it to make predictions.

We then compute several sklearn distances and measures on the predictions, and draw a tree to better appreciate the results.

Finally, we evaluate the classifier with cross validation.

In [1]:
from sklearn_ubu.disturbing_neighbors import DisturbingNeighbors
from sklearn_ubu.random_oracles import RandomOracles
from sklearn_ubu.rotation_forest import RotationForest
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import hamming_loss
from sklearn.metrics import accuracy_score
from sklearn.metrics import jaccard_similarity_score
from sklearn.metrics import zero_one_loss
from sklearn.metrics import f1_score
from sklearn.metrics import precision_score
from sklearn.metrics import fbeta_score
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
import numpy as np
import graphviz
import arff


Read the file containing the data set we are going to use.

In [2]:
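# Open the file in a context manager so the handle is closed after loading
with open('flags.arff', "r") as arff_file:
    dataset = arff.load(arff_file)
data = np.array(dataset['data'], dtype=float)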
data

array([[ 0.,  0.,  0., ...,  1.,  1.,  0.],
       [ 0.,  0.,  1., ...,  0.,  1.,  0.],
       [ 0.,  0.,  0., ...,  1.,  0.,  0.],
       ..., 
       [ 0.,  0.,  0., ...,  0.,  0.,  1.],
       [ 0.,  0.,  0., ...,  0.,  1.,  1.],
       [ 0.,  0.,  0., ...,  1.,  1.,  0.]])

In [3]:
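# The last 7 columns are used as labels (y); the remaining columns are features (X)
n_labels = 7
rows = np.arange(data.shape[0])
columnsX = np.arange(data.shape[1] - n_labels)
columnsy = np.arange(data.shape[1] - n_labels, data.shape[1])
# Index with a column vector of rows so that NumPy selects a 2-D block
X = data[rows[:, np.newaxis], columnsX]
y = data[rows[:, np.newaxis], columnsy]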
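The cell above separates features from labels with index arrays. As a minimal sketch on a made-up toy array (the names `toy`, `X_fancy`, `X_slice` and `y_slice` are illustrative, not part of the notebook), the same block selection can be written with plain slices, which is usually easier to read:

```python
import numpy as np

n_labels = 7  # the notebook treats the last 7 columns as labels

# Made-up stand-in for `data`: 3 rows, 10 columns
toy = np.arange(3 * 10, dtype=float).reshape(3, 10)

# Index-array selection: a column vector of rows broadcast against
# a 1-D array of columns picks out a 2-D block
rows = np.arange(toy.shape[0])
cols_X = np.arange(toy.shape[1] - n_labels)
X_fancy = toy[rows[:, np.newaxis], cols_X]

# Equivalent basic slicing
X_slice = toy[:, :-n_labels]   # every column except the last 7
y_slice = toy[:, -n_labels:]   # the last 7 columns

print(np.array_equal(X_fancy, X_slice))
print(X_slice.shape, y_slice.shape)
```

One difference worth knowing: index arrays always return a copy, while basic slices return views of the original array.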


Select a classifier: Disturbing Neighbors, Random Oracles, or Rotation Forest. Each of the following cells overwrites `classifier`, so run only the one you want to use. <a id='classifier'></a>

In [4]:
classifier = DisturbingNeighbors()

In [8]:
classifier = RandomOracles()

In [5]:
classifier = RotationForest()

In [5]:
X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.5, train_size=0.5)
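The split above is not seeded, so every run of the notebook trains and tests on different rows and the measures reported later will vary. A minimal sketch, on made-up stand-in arrays, of how passing `random_state` to `train_test_split` makes the split repeatable (the seed value 0 is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up stand-in data: 10 samples, 2 features, 3 labels per sample
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(30).reshape(10, 3) % 2

# Two calls with the same seed; each returns X_train, X_test, y_train, y_test
first = train_test_split(X_demo, y_demo, test_size=0.5, random_state=0)
second = train_test_split(X_demo, y_demo, test_size=0.5, random_state=0)

# With a fixed seed, both calls produce identical partitions
same_split = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same_split)
```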


Train the classifier<a id='fit'></a>

In [6]:
classifier_train = classifier.fit(X_train, y_train)

Once fitted, the model can be used to predict the labels of new samples: <a id='predict'></a>

In [7]:
y_predict = classifier.predict(X_test)


Alternatively, the probability of each class can be predicted. In a single decision tree, this is the fraction of training samples of that class in the leaf the sample falls into:<a id='predict_proba'></a>

In [8]:
y_predict_proba = classifier.predict_proba(X_test)


Compute several distances and measures on the test predictions <a id='measures'></a>

In [11]:
dist_hamming = hamming_loss(y_test, y_predict)
print("Hamming Loss:", dist_hamming)

dist_accuracy = accuracy_score(y_test, y_predict)
print("Accuracy Score:", dist_accuracy)

dist_jaccard = jaccard_similarity_score(y_test, y_predict)
print("Jaccard Similarity Score:", dist_jaccard)

dist_zero_one = zero_one_loss(y_test, y_predict)
print("Zero One Loss:", dist_zero_one)

measure_f1 = f1_score(y_test, y_predict, average='micro')
print("F1 Score:", measure_f1)

measure_precision = precision_score(y_test, y_predict, average='micro')
print("Precision Score:", measure_precision)

measure_fbeta = fbeta_score(y_test, y_predict, average='micro', beta=0.5)
print("Fbeta Score:", measure_fbeta)

measure_recall = recall_score(y_test, y_predict, average='micro')
print("Recall Score:", measure_recall)

F1 Score: 0.706422018349
Precision Score: 0.702127659574
Fbeta Score: 0.703839122486
Recall Score: 0.710769230769
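To make the measures above concrete, here is a minimal pure-Python sketch that computes some of them by hand on a made-up toy example (4 samples, 3 labels; the values are not taken from the flags data set). It follows the multilabel definitions sklearn uses: Hamming loss counts wrong label entries, accuracy requires every label of a sample to match, zero-one loss is its complement, and micro averaging pools true positives, false positives and false negatives over all labels.

```python
# Made-up multilabel ground truth and predictions: 4 samples, 3 labels each
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

n_samples = len(y_true)
n_labels = len(y_true[0])

# Flatten to (true, predicted) pairs, one per label entry
pairs = [(t, p) for yt, yp in zip(y_true, y_pred) for t, p in zip(yt, yp)]

# Hamming loss: fraction of individual label entries that are wrong
hamming = sum(t != p for t, p in pairs) / (n_samples * n_labels)

# Subset accuracy: fraction of samples whose whole label vector is correct
subset_accuracy = sum(yt == yp for yt, yp in zip(y_true, y_pred)) / n_samples
zero_one = 1 - subset_accuracy

# Micro-averaged precision, recall and F1: pool the counts over all labels
tp = sum(t == 1 and p == 1 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("Hamming loss:", hamming)
print("Subset accuracy:", subset_accuracy)
print("Zero-one loss:", zero_one)
print("Micro precision, recall, F1:", precision, recall, f1)
```

Only one of the four toy samples is predicted exactly, so the subset accuracy is low even though three quarters of the individual labels are correct. Note also that `jaccard_similarity_score` was deprecated and later removed from scikit-learn, so the measures cell needs an older release to run unchanged.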


Once trained, a decision tree can be exported in Graphviz format using the export_graphviz exporter. If you use the conda package manager, the Graphviz binaries and the Python package can be installed with

conda install python-graphviz

The export_graphviz exporter also supports a variety of aesthetic options. Jupyter notebooks also render these plots inline automatically:
<a id='tree'></a>

In [12]:
dot_data = export_graphviz(classifier_train, out_file=None)
graph = graphviz.Source(dot_data)
graph

AttributeError: 'NoneType' object has no attribute 'tree_'
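The error message shows that `classifier_train` is `None`: the `fit` call returned nothing, so `export_graphviz` has no tree to read. More generally, `export_graphviz` expects a single fitted decision tree rather than an ensemble. A minimal sketch of a working export, assuming a standalone `DecisionTreeClassifier` trained on synthetic multilabel data as a stand-in for the flags data set (how to reach the individual trees inside the sklearn_ubu ensembles is not covered here):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data (assumption): 100 samples, 19 features, 7 labels
X_demo, y_demo = make_multilabel_classification(
    n_samples=100, n_features=19, n_classes=7, random_state=0)

# Keep a reference to the estimator itself, not to the return value of fit()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_demo, y_demo)

# With out_file=None the DOT source is returned as a string
dot_data = export_graphviz(tree, out_file=None)
print(dot_data[:40])
```

In a notebook, `graphviz.Source(dot_data)` then renders the string inline, as in the cell above.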

Cross Validation <a id='cross'></a>

In [16]:
scores = cross_val_score(classifier, X, y, cv=5)
print(scores)

[ 0.20512821  0.23076923  0.17948718  0.28205128  0.15789474]
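These scores are much lower than the micro-averaged measures computed earlier. A plausible reading, offered as an assumption rather than a confirmed property of the sklearn_ubu classifiers, is that `cross_val_score` falls back to the estimator's default score, which for scikit-learn classifiers is accuracy, and for multilabel targets that means all labels of a sample must match. The sketch below checks that reading against the printed numbers, assuming 194 samples (the documented size of the Mulan flags data set, not shown in this notebook) and an unshuffled 5-fold split:

```python
def kfold_sizes(n_samples, n_splits):
    """Fold sizes of an unshuffled k-fold split: the first
    n_samples % n_splits folds receive one extra sample."""
    base, extra = divmod(n_samples, n_splits)
    return [base + 1 if i < extra else base for i in range(n_splits)]

# Scores printed by the cross validation cell above
scores = [0.20512821, 0.23076923, 0.17948718, 0.28205128, 0.15789474]

# Assumption: the data set has 194 samples
sizes = kfold_sizes(194, 5)

# If each score is a fraction of exactly matched samples,
# score * fold size should be (almost) a whole number
exact_matches = [round(s * n) for s, n in zip(scores, sizes)]
residuals = [abs(s * n - m) for s, n, m in zip(scores, sizes, exact_matches)]

print(sizes)
print(exact_matches)
print(max(residuals))
```

Under these assumptions each score corresponds to a whole number of test samples per fold whose label vectors were predicted exactly, which is consistent with subset accuracy being the reported measure.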
