# Homework 7

#### 1. Multi-class and Multi-Label Classification Using Support Vector Machines

#### (a) Download the Anuran Calls (MFCCs) Data Set from: https://archive.ics.uci.edu/ml/datasets/Anuran+Calls+%28MFCCs%29 . Choose 70% of the data randomly as the training set.

In [328]:
#Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import silhouette_score
from scipy.spatial.distance import hamming
import joblib

from imblearn.over_sampling import SMOTE

from sklearn.cluster import KMeans

In [329]:
#Reading the dataset
df = pd.read_csv("../data/Frogs_MFCCs.csv")

In [330]:
#Viewing data
df.head()

Unnamed: 0,MFCCs_ 1,MFCCs_ 2,MFCCs_ 3,MFCCs_ 4,MFCCs_ 5,MFCCs_ 6,MFCCs_ 7,MFCCs_ 8,MFCCs_ 9,MFCCs_10,...,MFCCs_17,MFCCs_18,MFCCs_19,MFCCs_20,MFCCs_21,MFCCs_22,Family,Genus,Species,RecordID
0,1.0,0.152936,-0.105586,0.200722,0.317201,0.260764,0.100945,-0.150063,-0.171128,0.124676,...,-0.108351,-0.077623,-0.009568,0.057684,0.11868,0.014038,Leptodactylidae,Adenomera,AdenomeraAndre,1
1,1.0,0.171534,-0.098975,0.268425,0.338672,0.268353,0.060835,-0.222475,-0.207693,0.170883,...,-0.090974,-0.05651,-0.035303,0.02014,0.082263,0.029056,Leptodactylidae,Adenomera,AdenomeraAndre,1
2,1.0,0.152317,-0.082973,0.287128,0.276014,0.189867,0.008714,-0.242234,-0.219153,0.232538,...,-0.050691,-0.02359,-0.066722,-0.025083,0.099108,0.077162,Leptodactylidae,Adenomera,AdenomeraAndre,1
3,1.0,0.224392,0.118985,0.329432,0.372088,0.361005,0.015501,-0.194347,-0.098181,0.270375,...,-0.136009,-0.177037,-0.130498,-0.054766,-0.018691,0.023954,Leptodactylidae,Adenomera,AdenomeraAndre,1
4,1.0,0.087817,-0.068345,0.306967,0.330923,0.249144,0.006884,-0.265423,-0.1727,0.266434,...,-0.048885,-0.053074,-0.08855,-0.031346,0.10861,0.079244,Leptodactylidae,Adenomera,AdenomeraAndre,1


#### This dataset was created segmenting 60 audio records belonging to 4 different families, 8 genus, and 10 species. Each audio corresponds to one specimen (an individual frog), the record ID is also included as an extra column.

In [331]:
def get_train_test(data:pd.DataFrame):
    
    '''Returns train and test sets'''
    data1 = data.drop("RecordID",axis=1).copy()
    
    labels = ['Family','Genus','Species']
    train = data1.sample(frac=0.70,random_state=0)
    test = data1[data1.index.isin(train.index)==False]

    train = train.reset_index(drop=True)
    test = test.reset_index(drop=True)
    
    Xtrain = train.drop(labels,axis=1)
    ytrain = train[labels]
    
    Xtest = test.drop(labels,axis=1)
    ytest = test[labels]

    return Xtrain,Xtest,ytrain,ytest


def exact_match(ytrue,ypred):
    '''Returns exact match score'''
    return np.all(ypred == ytrue, axis=1).mean()

def hammingloss(ytrue,ypred):
    '''In multi-label classification, hamming loss penalizes only the individual labels.'''
    temp=0
    for i in range(ytrue.shape[0]):
        temp += np.size(ytrue[i] == ypred[i]) - np.count_nonzero(ytrue[i] == ypred[i])
    return temp/(ytrue.shape[0] * ytrue.shape[1])

def hamming_score(ytrue, ypred):
    '''The proportion of the predicted correct labels to the total number (predicted and actual) of labels 
    for that instance. Overall hamming score is the average across all instances.'''
    temp = 0
    for i in range(ytrue.shape[0]):
        temp += np.count_nonzero(np.intersect1d(ytrue[i],ypred[i])) / np.count_nonzero(np.union1d(ytrue[i],ypred[i]))
    return temp / ytrue.shape[0]

def hamming_distance(ytrue,ypred):
    ''' The Hamming distance between two vectors is simply the sum of corresponding elements that differ between the vectors.'''
    s = 0
    class_cnt = ytrue.shape[1]
    for i in range(class_cnt):
        s+=hamming(ytrue[:,i],ypred[:,i])

    return int((s/class_cnt)*ytrue.shape[0])

In [332]:
Xtrain,Xtest,ytrain,ytest = get_train_test(df)

#### (b) Each instance has three labels: Families, Genus, and Species. Each of the labels has multiple classes. We wish to solve a multi-class and multi-label problem. One of the most important approaches to multi-label classification is to train a classifier for each label (binary relevance). We first try this approach:

#### i. Research exact match and hamming score/ loss methods for evaluating multi-label classification and use them in evaluating the classifiers in this problem.

**Exact Match Ratio:** Simply ignore partially correct (consider them incorrect) and extend the accuracy used in single label case for multi-label prediction.
    
$MR = (1/n) \sum_{i=1}^n I(y_i=\hat y_i)$
  
I is the indicator function. Clearly, a disadvantage of this measure is that it does not distinguish between complete incorrect and partially correct which might be considered harsh.

--------------------------------------------------------------------------------------------------------------------------
**0/1 Loss:** Here, we calculate proportions of instances whose actual value is not equal to predicted value.

$0/1 Loss = (1/n) \sum_{i=1}^n I(y_i ≠ \hat y_i)$

--------------------------------------------------------------------------------------------------------------------------

**Hamming Score:** Accuracy for each instance is defined as the proportion of the predicted correct labels to the total number (predicted and actual) of labels for that instance. Overall accuracy is the average across all instances. It is less ambiguously referred to as the Hamming score.

$Accuracy = (1/n) \sum_{i=1}^n \frac{|y_i ∩ \hat y_i|}{|y_i ∪ \hat y_i|}$

--------------------------------------------------------------------------------------------------------------------------
**Hamming Loss:** It reports how many times on average, the relevance of an example to a class label is incorrectly predicted. Therefore, hamming loss takes into account the prediction error (an incorrect label is predicted) and missing error (a relevant label not predicted), normalized over total number of classes and total number of examples.

$Hamming Loss = (1/nL)\sum_{i=1}^n \sum_{j=1}^L I(y_{i}^j ≠ \hat y_i^j)$

where I is the indicator function. Ideally, we would expect the hamming loss to be 0, which would imply no error; practically the smaller the value of hamming loss, the better the performance of the learning algorithm.

--------------------------------------------------------------------------------------------------------------------------
**Hamming Distance:** The Hamming Distance finds the sum of corresponding elements that differ between two vectors. Practically-speaking, the greater the Hamming Distance is, the more the two vectors differ. Inversely, the smaller the Hamming Distance, the more similar the two vectors are.

#### ii. Train a SVM for each of the labels, using Gaussian kernels and one versus all classifiers. Determine the weight of the SVM penalty and the width of the Gaussian Kernel using 10 fold cross validation. You are welcome to try to solve the problem with both standardized and raw attributes and report the results.

One of the most important approaches to multi-label classification is to train a
classifier for each label (binary relevance)

In [333]:
def get_params_SVM(Xtrain,ytrain,decision_function_shape='ovr'):   
    svm = SVC(kernel="rbf",decision_function_shape=decision_function_shape)
    gridsearch_cv = GridSearchCV(estimator=svm, param_grid={"C":np.logspace(-2,6,5),"gamma":np.logspace(0.1,3,5)}, cv=10)
    gridsearch_cv.fit(Xtrain,ytrain)
    return gridsearch_cv

In [338]:
#Family
fam_gridcv = get_params_SVM(Xtrain,ytrain["Family"])

#Genus
gen_gridcv = get_params_SVM(Xtrain,ytrain["Genus"])

# Species
species_gridcv = get_params_SVM(Xtrain,ytrain["Species"])

In [340]:
# Saving the models
joblib.dump(fam_gridcv,'fam_svm_ii.pkl')
joblib.dump(gen_gridcv,'gen_svm_ii.pkl')
joblib.dump(species_gridcv,'species_svm_ii.pkl')

In [341]:
#Loading the models
fam_svm = joblib.load('fam_svm_ii.pkl')
gen_svm = joblib.load('gen_svm_ii.pkl')
species_svm = joblib.load('species_svm_ii.pkl')

In [342]:
print("Family:")
print("Best Parameters: ",fam_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Genus:")
print("Best Parameters: ",gen_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Species:")
print("Best Parameters: ",species_svm.best_params_)
print("Best Score:",species_svm.best_score_)

Family:
Best Parameters:  {'C': 100.0, 'gamma': 1.2589254117941673}
Best Score: 0.990864337782827

Genus:
Best Parameters:  {'C': 100.0, 'gamma': 1.2589254117941673}
Best Score: 0.990864337782827

Species:
Best Parameters:  {'C': 100.0, 'gamma': 1.2589254117941673}
Best Score: 0.9898718798321184


In [343]:
#Making predictions
ypred_fam = fam_svm.predict(Xtest)[:,np.newaxis]
ypred_gen = gen_svm.predict(Xtest)[:,np.newaxis]
ypred_spe = species_svm.predict(Xtest)[:,np.newaxis]

#Combining predictions
ypred_test = np.concatenate([ypred_fam,ypred_gen,ypred_spe],axis=1)

print("Performance on Test Set:\n")
print("Exact Match:", exact_match(ytest.values,ypred_test))
print("Hamming Loss:", hammingloss(ytest.values,ypred_test))

Performance on Test Set:

Exact Match: 0.9856415006947661
Hamming Loss: 0.010653080129689671


It seems that the data are already normalized.

#### iii. Repeat 1(b)ii with L1 -penalized SVMs. 3 Remember to standardize the attributes. Determine the weight of the SVM penalty using 10 fold cross validation.

In [344]:
from sklearn.preprocessing import StandardScaler
m = StandardScaler()
Xtrain1 = m.fit_transform(Xtrain)
Xtest1 = m.transform(Xtest)

In [345]:
import sys
def get_params_LinearSVM(Xtrain,ytrain,multi_class='ovr'):   
    '''Returns fitted grid search cv object on Xtrain, ytrain.'''
    svm = LinearSVC(penalty='l1', dual=False,max_iter=10000)
    gridsearch_cv = GridSearchCV(estimator=svm, param_grid={"C": np.logspace(-2, 6, 5)}, cv=10)
    gridsearch_cv.fit(Xtrain, ytrain)   
    return gridsearch_cv

In [348]:
#Family
fam_gridcv1 = get_params_LinearSVM(Xtrain1,ytrain["Family"])

#Genus
gen_gridcv1 = get_params_LinearSVM(Xtrain1,ytrain["Genus"])

# Species
species_gridcv1 = get_params_LinearSVM(Xtrain1,ytrain["Species"])

In [349]:
# #Saving the models
joblib.dump(fam_gridcv1,'fam_svm_iii.pkl')
joblib.dump(gen_gridcv1,'gen_svm_iii.pkl')
joblib.dump(species_gridcv1,'species_svm_iii.pkl')

In [350]:
#Loading the models
fam_svm = joblib.load('fam_svm_iii.pkl')
gen_svm = joblib.load('gen_svm_iii.pkl')
species_svm = joblib.load('species_svm_iii.pkl')

In [352]:
print("Family:")
print("Best Parameters: ",fam_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Genus:")
print("Best Parameters: ",gen_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Species:")
print("Best Parameters: ",species_svm.best_params_)
print("Best Score:",species_svm.best_score_)

Family:
Best Parameters:  {'C': 100.0}
Best Score: 0.9513506264003281

Genus:
Best Parameters:  {'C': 100.0}
Best Score: 0.9513506264003281

Species:
Best Parameters:  {'C': 100.0}
Best Score: 0.9590962163526775


In [355]:
#Making predictions
ypred_fam = fam_svm.predict(Xtest1)[:,np.newaxis]
ypred_gen = gen_svm.predict(Xtest1)[:,np.newaxis]
ypred_spe = species_svm.predict(Xtest1)[:,np.newaxis]
ypred_test = np.concatenate([ypred_fam,ypred_gen,ypred_spe],axis=1)

print("Performance on Test Set:\n")
print("Exact Match:", exact_match(ytest.values,ypred_test))
print("Hamming Loss:", hammingloss(ytest.values,ypred_test))

Performance on Test Set:

Exact Match: 0.9198703103288559
Hamming Loss: 0.04832484174772271


#### iv. Repeat 1(b)iii by using SMOTE or any other method you know to remedy class imbalance. Report your conclusions about the classifiers you trained.

In [356]:
import sys
def get_params_SmoteSVM(Xtrain,ytrain,multi_class='ovr'):   
    '''Returns fitted grid search cv object on resampled train set'''
    
    s = SMOTE(sampling_strategy='minority')
    Xtrain2, ytrain2 = s.fit_resample(Xtrain,ytrain)
    
    svm = LinearSVC(penalty='l1', dual=False,max_iter=10000)
    gridsearch_cv = GridSearchCV(estimator=svm, param_grid={"C": np.logspace(-2, 6, 5)}, cv=10)
    gridsearch_cv.fit(Xtrain2, ytrain2)   
    return gridsearch_cv

In [359]:
# Family
fam_gridcv2 = get_params_SmoteSVM(Xtrain1,ytrain["Family"])

#Genus
gen_gridcv2 = get_params_SmoteSVM(Xtrain1,ytrain["Genus"])

# Species
species_gridcv2 = get_params_SmoteSVM(Xtrain1,ytrain["Species"])

In [362]:
#Saving the models
joblib.dump(fam_gridcv2,'fam_svm_iv.pkl')
joblib.dump(gen_gridcv2,'gen_svm_iv.pkl')
joblib.dump(species_gridcv2,'species_svm_iv.pkl')

In [363]:
#Loading the models
fam_svm = joblib.load('fam_svm_iv.pkl')
gen_svm = joblib.load('gen_svm_iv.pkl')
species_svm = joblib.load('species_svm_iv.pkl')

In [364]:
print("Family:")
print("Best Parameters: ",fam_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Genus:")
print("Best Parameters: ",gen_svm.best_params_)
print("Best Score:",gen_svm.best_score_)
print()

print("Species:")
print("Best Parameters: ",species_svm.best_params_)
print("Best Score:",species_svm.best_score_)

Family:
Best Parameters:  {'C': 10000.0}
Best Score: 0.9634542079405977

Genus:
Best Parameters:  {'C': 100.0}
Best Score: 0.9634542079405977

Species:
Best Parameters:  {'C': 100.0}
Best Score: 0.9679713165168817


In [365]:
#Making Predictions
ypred_fam = fam_svm.predict(Xtest1)[:,np.newaxis]
ypred_gen = gen_svm.predict(Xtest1)[:,np.newaxis]
ypred_spe = species_svm.predict(Xtest1)[:,np.newaxis]
ypred_test = np.concatenate([ypred_fam,ypred_gen,ypred_spe],axis=1)

print("Performance on Test Set:\n")
print("Exact Match:", exact_match(ytest.values,ypred_test))
print("Hamming Loss:", hammingloss(ytest.values,ypred_test))

Performance on Test Set:

Exact Match: 0.9106067623899954
Hamming Loss: 0.05681642735834491


#### Report your conclusions about the classifiers you trained.

From the results, Gaussian Kernel gives the best performance.

#### 2. K-Means Clustering on a Multi-Class and Multi-Label Data Set

Monte-Carlo Simulation: Perform the following procedures 50 times, and report the average and standard deviation of the 50 Hamming Distances that you calculate.

#### (a) Use k-means clustering on the whole Anuran Calls (MFCCs) Data Set (do not split the data into train and test, as we are not performing supervised learning in this exercise). Choose k ∈ { 1 , 2 , . . . , 50 } automatically based on one of the methods provided in the slides (CH or Gap Statistics or scree plots or Silhouettes) or any other method you know.

#### (b) In each cluster, determine which family is the majority by reading the true labels. Repeat for genus and species.

#### (c) Now for each cluster you have a majority label triplet (family, genus, species). Calculate the average Hamming distance, Hamming score, and Hamming loss between the true labels and the labels assigned by clusters.

In [366]:
X = df.drop(["Family","Genus","Species","RecordID"],axis=1).values
y = df[["Family","Genus","Species"]].values

In [375]:
#Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.svm import SVC, LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import silhouette_score
from scipy.spatial.distance import hamming
import joblib

from imblearn.over_sampling import SMOTE

from sklearn.cluster import KMeans



hamloss = []
hamscore = []
hamdist = []
for iteration in range(50):
    
    print("\nIteration:",iteration)
    silhouette_avg = []
    classifiers = []
    krange = range(2,51)

    for kval in krange:
        k = KMeans(n_clusters=kval)
        k.fit(X)
        
        #Fitted Classifier
        classifiers.append(k)
    
        # silhouette score
        silhouette_avg.append(silhouette_score(X, k.labels_))
            
    index = silhouette_avg.index(max(silhouette_avg))
    best_k = index + 2
    best_clf = classifiers[index]
    labels = best_clf.labels_
    df["Label"] = labels
    fam_data = df[["Family","Label"]]
    gen_data = df[["Genus","Label"]]
    spe_data = df[["Species","Label"]]

    fam_major = fam_data.groupby(["Label","Family"])["Family"].count().unstack().idxmax(axis=1)
    gen_major = gen_data.groupby(["Label","Genus"])["Genus"].count().unstack().idxmax(axis=1)
    spe_major = spe_data.groupby(["Label","Species"])["Species"].count().unstack().idxmax(axis=1)
    
    print("Label Distribution:",np.unique(labels,return_counts=True))
    print("\nMajority in each class:")
    print("\nFAMILY:\n",fam_major)
    print("\nGENUS:\n",gen_major)
    print("\nSPECIES:\n",spe_major)
    print("-----------------------------------------------------------------------------------------------------------------\n\n")
    
    flist = []
    glist = []
    slist = []
    ypred_k = [0]*df.shape[0]
    for j in range(len(df['Label'])):
        lval = df.loc[j,'Label'] 
        ypred_k[j] = [fam_major.loc[lval],gen_major.loc[lval],spe_major.loc[lval]]

    ypred_k = np.array(ypred_k)
    
    hamloss.append(hammingloss(y,ypred_k))
    hamscore.append(hamming_score(y,ypred_k))
    hamdist.append(hamming_distance(y,ypred_k))
 


Iteration: 0
Label Distribution: (array([0, 1, 2, 3]), array([ 614, 3568, 1982, 1031], dtype=int64))

Majority in each class:

FAMILY:
 Label
0            Hylidae
1    Leptodactylidae
2            Hylidae
3      Dendrobatidae
dtype: object

GENUS:
 Label
0    Hypsiboas
1    Adenomera
2    Hypsiboas
3     Ameerega
dtype: object

SPECIES:
 Label
0      HypsiboasCinerascens
1    AdenomeraHylaedactylus
2         HypsiboasCordobae
3        Ameeregatrivittata
dtype: object
-----------------------------------------------------------------------------------------------------------------



Iteration: 1
Label Distribution: (array([0, 1, 2, 3]), array([3568, 1982,  614, 1031], dtype=int64))

Majority in each class:

FAMILY:
 Label
0    Leptodactylidae
1            Hylidae
2            Hylidae
3      Dendrobatidae
dtype: object

GENUS:
 Label
0    Adenomera
1    Hypsiboas
2    Hypsiboas
3     Ameerega
dtype: object

SPECIES:
 Label
0    AdenomeraHylaedactylus
1         HypsiboasCordobae
2      H


Iteration: 14
Label Distribution: (array([0, 1, 2, 3]), array([3568, 1031, 1982,  614], dtype=int64))

Majority in each class:

FAMILY:
 Label
0    Leptodactylidae
1      Dendrobatidae
2            Hylidae
3            Hylidae
dtype: object

GENUS:
 Label
0    Adenomera
1     Ameerega
2    Hypsiboas
3    Hypsiboas
dtype: object

SPECIES:
 Label
0    AdenomeraHylaedactylus
1        Ameeregatrivittata
2         HypsiboasCordobae
3      HypsiboasCinerascens
dtype: object
-----------------------------------------------------------------------------------------------------------------



Iteration: 15
Label Distribution: (array([0, 1, 2, 3]), array([1031, 3568, 1982,  614], dtype=int64))

Majority in each class:

FAMILY:
 Label
0      Dendrobatidae
1    Leptodactylidae
2            Hylidae
3            Hylidae
dtype: object

GENUS:
 Label
0     Ameerega
1    Adenomera
2    Hypsiboas
3    Hypsiboas
dtype: object

SPECIES:
 Label
0        Ameeregatrivittata
1    AdenomeraHylaedactylus
2     


Iteration: 28
Label Distribution: (array([0, 1, 2, 3]), array([3568,  614, 1982, 1031], dtype=int64))

Majority in each class:

FAMILY:
 Label
0    Leptodactylidae
1            Hylidae
2            Hylidae
3      Dendrobatidae
dtype: object

GENUS:
 Label
0    Adenomera
1    Hypsiboas
2    Hypsiboas
3     Ameerega
dtype: object

SPECIES:
 Label
0    AdenomeraHylaedactylus
1      HypsiboasCinerascens
2         HypsiboasCordobae
3        Ameeregatrivittata
dtype: object
-----------------------------------------------------------------------------------------------------------------



Iteration: 29
Label Distribution: (array([0, 1, 2, 3]), array([1982, 1035,  610, 3568], dtype=int64))

Majority in each class:

FAMILY:
 Label
0            Hylidae
1      Dendrobatidae
2            Hylidae
3    Leptodactylidae
dtype: object

GENUS:
 Label
0    Hypsiboas
1     Ameerega
2    Hypsiboas
3    Adenomera
dtype: object

SPECIES:
 Label
0         HypsiboasCordobae
1        Ameeregatrivittata
2     


Iteration: 42
Label Distribution: (array([0, 1, 2, 3]), array([3568, 1031,  614, 1982], dtype=int64))

Majority in each class:

FAMILY:
 Label
0    Leptodactylidae
1      Dendrobatidae
2            Hylidae
3            Hylidae
dtype: object

GENUS:
 Label
0    Adenomera
1     Ameerega
2    Hypsiboas
3    Hypsiboas
dtype: object

SPECIES:
 Label
0    AdenomeraHylaedactylus
1        Ameeregatrivittata
2      HypsiboasCinerascens
3         HypsiboasCordobae
dtype: object
-----------------------------------------------------------------------------------------------------------------



Iteration: 43
Label Distribution: (array([0, 1, 2, 3]), array([3568, 1982,  614, 1031], dtype=int64))

Majority in each class:

FAMILY:
 Label
0    Leptodactylidae
1            Hylidae
2            Hylidae
3      Dendrobatidae
dtype: object

GENUS:
 Label
0    Adenomera
1    Hypsiboas
2    Hypsiboas
3     Ameerega
dtype: object

SPECIES:
 Label
0    AdenomeraHylaedactylus
1         HypsiboasCordobae
2     

In [376]:
print("Hamming Distance:\n")
print("Average:",int(np.mean(hamdist)))
print("Standard Deviation:",np.std(hamdist,ddof=1))

Hamming Distance:

Average: 1605
Standard Deviation: 32.61681674679302


In [377]:
print("Average Hamming Score:",np.mean(hamscore))

Average Hamming Score: 0.7688416956219528


In [378]:
print("Average Hamming Loss:",np.mean(hamloss))

Average Hamming Loss: 0.22326337734537874


In [379]:
import json
f = open("monte-carlo-info.json","w")
json.dump(obj={"hamloss":hamloss,"hamdist":hamdist,"hamscore":hamscore},fp=f)
f.close()

f = open("monte-carlo-info.json","r")
mcs = json.load(f)

#You can use this dictionary to verify results
mcs

{'hamloss': [0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.24526291406069028,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.22246930738939077,
  0.22177438035672922,
  0.22214500810748206,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.24526291406069028,
  0.22214500810748206,
  0.2219133657632615,
  0.2224229789205467,
  0.22177438035672922,
  0.2224229789205467,
  0.22177438035672922,
  0.2224229789205467,
  0.2224229789205467,
  0.22233032198285846,
  0.2224229789205467,
  0.22177438035672922,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229789205467,
  0.2224229

In [380]:
f.close()

### References:

- https://mmuratarat.github.io/2020-01-25/multilabel_classification_metrics
- https://www.semanticscholar.org/paper/A-Literature-Survey-on-Algorithms-for-Multi-label-Sorower/6b5691db1e3a79af5e3c136d2dd322016a687a0b?p2df
- https://www.sciencedirect.com/science/article/pii/0377042787901257