# Multilabel Classification Example

Import the required libraries and packages.

In [9]:
from numpy import mean
from numpy import std
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import RepeatedKFold
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import accuracy_score
from numpy import asarray


Create a synthetic multi-label classification dataset using the make_multilabel_classification() function from the scikit-learn library.
This dataset has 1,000 samples with 10 input features. Each sample has three class label outputs, and each label takes one of two values (0 for not present, 1 for present).

In [2]:
def get_dataset():
	X, y = make_multilabel_classification(n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1)
	return X, y
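
As a quick sanity check on the description above, the following sketch generates the same dataset and inspects its shape and first few rows. It repeats the arguments used in get_dataset(); note that in scikit-learn `n_labels=2` is the average number of labels per sample, not a hard limit.

```python
from sklearn.datasets import make_multilabel_classification

# Same arguments as get_dataset() above
X, y = make_multilabel_classification(
    n_samples=1000, n_features=10, n_classes=3, n_labels=2, random_state=1
)

print(X.shape, y.shape)  # (1000, 10) (1000, 3)

# Show the first few samples: 10 input features and a 0/1 vector of 3 labels
for i in range(3):
    print(X[i], y[i])
```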

# Model Description
The task has three output labels (classes), so the neural network needs an output layer with three nodes.

We use the ReLU activation function in the hidden layer. The hidden layer has 20 nodes.

Each node in the output layer uses the sigmoid activation. This will predict a probability of class membership for the label, a value between 0 and 1. 

We fit the model using binary cross-entropy loss and the Adam version of stochastic gradient descent.
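
To make the sigmoid output and the binary cross-entropy loss concrete, here is a minimal NumPy sketch of both. The logits and targets are made-up values for illustration, and the helper names `sigmoid` and `binary_cross_entropy` are hypothetical; this approximates what the Keras loss computes rather than reproducing its exact implementation.

```python
import numpy as np

def sigmoid(z):
    # Squash each raw output independently into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average the per-label losses
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_label = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return per_label.mean()

logits = np.array([[2.0, -1.0, 0.0]])  # hypothetical raw output-layer values
y_true = np.array([[1.0, 0.0, 1.0]])   # hypothetical target labels

probs = sigmoid(logits)
loss = binary_cross_entropy(y_true, probs)
print(probs, loss)
```

Because each label gets its own sigmoid, the three probabilities do not need to sum to 1, which is what allows several labels to be active for the same sample.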

### **Input and Output Dimensions**
Each sample has 10 inputs and three outputs; therefore, the network requires an input layer that expects 10 inputs specified via the “input_dim” argument in the first hidden layer and three nodes in the output layer.
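
The dimensions described above can be traced with a small NumPy forward pass. This is only a sketch with random placeholder weights (not trained values, and not how Keras initializes them); it shows how a batch of 10-feature inputs becomes 20 hidden activations and then 3 outputs, and how many trainable parameters that implies.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 10, 20, 3

rng = np.random.default_rng(0)
# Random placeholder weights, used only to illustrate the shapes
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_outputs)), np.zeros(n_outputs)

x = rng.normal(size=(1, n_inputs))                   # one sample, 10 features
hidden = np.maximum(0.0, x @ W1 + b1)                # ReLU hidden layer
output = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))   # sigmoid output layer

# (10 * 20 + 20) + (20 * 3 + 3) weights and biases
n_params = W1.size + b1.size + W2.size + b2.size
print(hidden.shape, output.shape, n_params)  # (1, 20) (1, 3) 283
```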

 

In [3]:
# get the model
def get_model(n_inputs, n_outputs):
	model = Sequential()
	model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
	model.add(Dense(n_outputs, activation='sigmoid'))
	model.compile(loss='binary_crossentropy', optimizer='adam')
	return model

In [4]:
# evaluate a model using repeated k-fold cross-validation
# takes the dataset, evaluates the model, and returns a list of accuracy scores (one per fold)
def evaluate_model(X, y):
	results = list()
	n_inputs, n_outputs = X.shape[1], y.shape[1]
	# define evaluation procedure
	cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
	# enumerate folds
	for train_ix, test_ix in cv.split(X):
		# prepare data
		X_train, X_test = X[train_ix], X[test_ix]
		y_train, y_test = y[train_ix], y[test_ix]
		# define model
		model = get_model(n_inputs, n_outputs)
		# fit model
		model.fit(X_train, y_train, verbose=0, epochs=100)
		# make a prediction on the test set
		yhat = model.predict(X_test)
		# round probabilities to class labels
		yhat = yhat.round()
		# calculate accuracy (for multi-label targets this is exact-match subset accuracy)
		acc = accuracy_score(y_test, yhat)
		# report and store result
		print('>%.3f' % acc)
		results.append(acc)
	return results
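
It helps to know what accuracy_score measures in the evaluation above. For multi-label targets it reports subset accuracy: a sample counts as correct only if all of its labels match. The sketch below uses small made-up label arrays to contrast this with a per-label accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for four samples with three labels each
y_true = np.array([[1, 1, 0], [0, 1, 0], [1, 0, 1], [0, 0, 1]])
y_pred = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1], [0, 0, 1]])

# Subset accuracy: the second sample has one wrong label, so the whole row is wrong
subset_acc = accuracy_score(y_true, y_pred)

# Per-label accuracy: fraction of individual label predictions that are correct
per_label_acc = (y_true == y_pred).mean()

print(subset_acc, per_label_acc)
```

Subset accuracy is the stricter of the two, so the scores reported by evaluate_model() are a conservative view of how many individual labels are predicted correctly.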

# Training

In [5]:
# load dataset
X, y = get_dataset()
# evaluate model
results = evaluate_model(X, y)
# summarize performance
print('Accuracy: %.3f (%.3f)' % (mean(results), std(results)))

>0.860
>0.860
>0.850
>0.810
>0.850
>0.860
>0.770
>0.810
>0.820
>0.760
>0.820
>0.780
>0.830
>0.760
>0.780
>0.810
>0.840
>0.810
>0.730
>0.850
>0.780
>0.860
>0.800
>0.830
>0.820
>0.800
>0.840
>0.800
>0.810
>0.790
Accuracy: 0.813 (0.034)
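
The 30 scores listed above correspond to 10 folds repeated 3 times. The following sketch confirms the split sizes using a placeholder array of the same shape as the dataset; the variable `X_placeholder` is introduced here only for illustration.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

# Same cross-validation settings as evaluate_model()
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

X_placeholder = np.zeros((1000, 10))  # stands in for the 1,000-sample dataset
splits = list(cv.split(X_placeholder))

train_ix, test_ix = splits[0]
print(len(splits), len(train_ix), len(test_ix))  # 30 900 100
```

With 100 test samples per fold, each fold's accuracy is a multiple of 0.01, which matches the values printed above.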


# Prediction
The model will predict the probability for each class label. This means it will predict three probabilities for each sample.

In [10]:
# fit the model on all data
n_inputs, n_outputs = X.shape[1], y.shape[1]
model = get_model(n_inputs, n_outputs)
model.fit(X, y, verbose=0, epochs=100)
# make a prediction for new data
row = [3, 3, 6, 7, 8, 2, 11, 11, 1, 3]
newX = asarray([row])
yhat = model.predict(newX)
print('Predicted: %s' % yhat[0])

Predicted: [0.9996196  0.97865856 0.00253844]


The prediction contains the three output variables required for the multi-label classification task: the predicted probability of each class label.

It shows that class 1 and class 2 are predicted as present while class 3 is predicted as absent, since the predicted probabilities for class 1 and class 2 are close to 1 whereas the probability for class 3 is close to 0.
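
To turn the predicted probabilities into hard 0/1 labels, one option is to apply a cut-off. The sketch below assumes a threshold of 0.5, in line with the rounding used during evaluation; a different threshold may suit other applications.

```python
import numpy as np

# Probabilities taken from the prediction output shown above
probs = np.array([0.9996196, 0.97865856, 0.00253844])

threshold = 0.5  # assumed cut-off
labels = (probs >= threshold).astype(int)
print(labels)  # [1 1 0]
```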