# Multi-label classification 

Multi-label classification is a variant of the classification problem in which multiple target labels are assigned to each sample.

MultLabelTrainData contains 500 samples with 103 features each. MultLabelTestData contains 100 samples with the same 103 features. The label file for the training data can be downloaded at MultLabelTrainLabel.

The dataset has 14 target labels in total, and a training sample can carry more than one of them. For example, the first sample in MultLabelTrainLabel is assigned labels 7, 8, 12 and 13, so those positions are marked with 1:

0	0	0	0	0	0	1	1	0	0	0	1	1	0
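
The encoding above can be sketched in a few lines of plain Python. The helper names `labels_to_indicator` and `indicator_to_labels` are illustrative only and do not come from the assignment; the sketch assumes labels are numbered from 1 to 14.

```python
def labels_to_indicator(labels, n_labels=14):
    """Turn 1-based label numbers into a 0/1 indicator row."""
    row = [0] * n_labels
    for label in labels:
        row[label - 1] = 1  # label k is stored at position k - 1
    return row


def indicator_to_labels(row):
    """Recover the 1-based label numbers from a 0/1 indicator row."""
    return [i + 1 for i, flag in enumerate(row) if flag == 1]


first_row = labels_to_indicator([7, 8, 12, 13])
print(first_row)
# [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
print(indicator_to_labels(first_row))
# [7, 8, 12, 13]
```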

Please predict the labels for the test samples. The output file must follow the same format as MultLabelTrainLabel. For example, suppose there are 3 test samples, and the predicted labels are 2 and 3 for the first sample, 12 and 14 for the second, and 2 and 5 for the third. The output is then:

0	1	1	0	0	0	0	0	0	0	0	0	0	0
0	0	0	0	0	0	0	0	0	0	0	1	0	1
0	1	0	0	1	0	0	0	0	0	0	0	0	0
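
As a rough illustration, the three rows above can be produced from the predicted label sets as follows. The function name `format_rows` is hypothetical, and the tab separator is an assumption based on how the rows are displayed here; the actual label file may use a different whitespace separator.

```python
def format_rows(label_sets, n_labels=14, sep="\t"):
    """Render one 0/1 row per sample, with 1 at each predicted label position."""
    lines = []
    for labels in label_sets:
        row = ["1" if k in labels else "0" for k in range(1, n_labels + 1)]
        lines.append(sep.join(row))
    return "\n".join(lines)


# Predicted labels for the three example test samples.
text = format_rows([{2, 3}, {12, 14}, {2, 5}])
print(text)
```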

In [1]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multioutput import MultiOutputClassifier

In [2]:
#Loading the data

ml = np.loadtxt('MultLabelTrainData.txt')
ml_label = np.loadtxt('MultLabelTrainLabel.txt')
ml.shape

(500, 103)

In [3]:
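# Standardize the features, then hold out 20% of the labelled data for validation.
# Note: the scaler is fit on all 500 samples, so the held-out split is not fully unseen.
scaler = StandardScaler()
ml = scaler.fit_transform(ml)
X_train, X_test, Y_train, Y_test = train_test_split(ml, ml_label, test_size=0.2, random_state=0)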
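
The cell above scales all 500 samples before splitting them. A stricter variant, sketched here with NumPy on made-up data, computes the mean and standard deviation on the training portion only and reuses them for the held-out portion. The array sizes and random values are assumptions for illustration and are not the assignment data.

```python
import numpy as np

# Made-up data standing in for a train/validation split.
rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(80, 3))
X_test = rng.normal(5.0, 2.0, size=(20, 3))

# Statistics come from the training portion only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std  # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6))
print(X_train_scaled.std(axis=0).round(6))
```

With scikit-learn the same idea is usually written as `scaler.fit(X_train)` followed by `scaler.transform(...)` on every other split.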

In [4]:
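# Fit one 5-nearest-neighbour classifier per label.
clf5 = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=5)).fit(X_train, Y_train)
# score() is subset accuracy: a sample counts only if all 14 labels are predicted correctly.
clf5.score(X_test, Y_test)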

0.21
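
A score of 0.21 on the 100 held-out samples means that 21 of them had all 14 labels predicted correctly. The sketch below reproduces that kind of exact-match score by hand on a tiny made-up example; the arrays are illustrative and not taken from the dataset.

```python
import numpy as np

# Toy ground truth and predictions: 3 samples, 4 labels.
y_true = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 1],
                   [0, 0, 1, 0]])
y_pred = np.array([[0, 1, 1, 0],   # every label correct
                   [1, 0, 0, 0],   # one label missed
                   [0, 1, 1, 0]])  # one extra label

# A sample is correct only if its whole row matches.
exact_match = np.all(y_true == y_pred, axis=1)
subset_accuracy = exact_match.mean()
print(exact_match)       # [ True False False]
print(subset_accuracy)   # one sample out of three matches exactly
```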

In [5]:
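# Hamming loss: the fraction of individual label entries predicted incorrectly.
# Unlike subset accuracy it gives credit for partial matches; lower is better.
from sklearn.metrics import hamming_loss
Y_pred = clf5.predict(X_test)
hamming_loss(Y_test, Y_pred)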

0.22071428571428572
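
To make the number above concrete, the sketch below computes a Hamming loss by hand on made-up arrays and then converts the reported value back into a count of wrong entries, assuming the 100 held-out samples and 14 labels used in this notebook.

```python
import numpy as np

# Toy ground truth and predictions: 2 samples, 5 labels.
y_true = np.array([[1, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0]])
y_pred = np.array([[1, 0, 1, 1, 0],   # one wrong entry
                   [0, 0, 0, 0, 1]])  # two wrong entries

# Hamming loss is the share of label entries that differ.
wrong = y_true != y_pred
toy_hamming = wrong.mean()
print(toy_hamming)  # 3 wrong entries out of 10

# The reported loss over 100 samples x 14 labels.
reported = 0.22071428571428572
n_wrong_entries = round(reported * 100 * 14)
print(n_wrong_entries)        # 309 wrong entries out of 1400
print(n_wrong_entries / 100)  # about 3 wrong labels per sample
```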

In [6]:
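# The classifier was trained on standardized features, so scale the test data with the same fitted scaler.
mltest = scaler.transform(np.loadtxt('MultLabelTestData.txt'))
pred = clf5.predict(mltest)
np.savetxt("GelliMultiLabelClassification.txt", pred, fmt='%d')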

In [7]:
# Equivalent export via pandas; this overwrites the file written by np.savetxt with the same content.
df1 = pd.DataFrame(pred)
df1.to_csv("GelliMultiLabelClassification.txt", header=None, index=None, sep=' ', float_format='%.0f')
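
Before submitting, it may be worth reading the file back and checking that it has one row per test sample, 14 columns, and only 0/1 entries. The helper `check_label_file` and the file name `toy_labels.txt` below are hypothetical, and the sketch runs on a small made-up matrix rather than on `pred`.

```python
import os
import tempfile

import numpy as np


def check_label_file(path, n_samples, n_labels=14):
    """Return True if the file holds an n_samples x n_labels matrix of 0/1 values."""
    data = np.loadtxt(path, ndmin=2)
    has_shape = data.shape == (n_samples, n_labels)
    is_binary = bool(np.isin(data, (0, 1)).all())
    return has_shape and is_binary


# Toy stand-in for the prediction matrix: 3 samples, 14 labels.
toy_pred = np.zeros((3, 14), dtype=int)
toy_pred[0, [1, 2]] = 1    # labels 2 and 3
toy_pred[1, [11, 13]] = 1  # labels 12 and 14
toy_pred[2, [1, 4]] = 1    # labels 2 and 5

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_labels.txt")
    np.savetxt(path, toy_pred, fmt="%d")
    ok = check_label_file(path, n_samples=3)
    wrong_count = check_label_file(path, n_samples=100)

print(ok)           # True
print(wrong_count)  # False
```

For the real submission the call would be `check_label_file("GelliMultiLabelClassification.txt", n_samples=100)`.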