## Task 1

(a) Prepare a function named binary_relevance_ldc which will implement the Binary Relevance
Method for multilabel classification. The classifier model for the binary classification should be the
Linear Discriminant Classifier (LDC). The input should be: training data (N × n), training labels
(N × c, containing integers 1, 2, 3, ...), and testing data (Ntest × n). The function should return a
binary matrix of assigned labels of size Ntest × c

In [79]:
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def binary_relevance_ldc(train_data, train_binary_labels, test_data):
    N, n = train_data.shape
    N_test, _ = test_data.shape
    c = train_binary_labels.shape[1]

    ldc_classifiers = []
    for j in range(c):
        ldc = LinearDiscriminantAnalysis()
        ldc.fit(train_data, train_binary_labels[:, j])
        ldc_classifiers.append(ldc)

    assigned_labels = np.zeros((N_test, c))
    for i, ldc_classifier in enumerate(ldc_classifiers):
        predictions = ldc_classifier.predict(test_data)
        assigned_labels[:, i] = predictions

    return assigned_labels



(b) Subsequently, write a function named adaptive_knn which will implement the Adaptive knn
methods for multilabel classification as explained in the lectures (The slide is reprodiced in Figure 1).

### Here's the revised solution for the adaptive_knn function using only the specified libraries:

In [80]:

def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

def adaptive_knn(train_data, train_labels, test_data, k=10, threshold=3):
    N, n = train_data.shape
    N_test, _ = test_data.shape
    c = train_labels.max()

    binary_labels = np.zeros((N, c))
    for i in range(N):
        binary_labels[i, train_labels[i] - 1] = 1

    assigned_labels = np.zeros((N_test, c))

    for i, x in enumerate(test_data):
        distances = np.array([euclidean_distance(x, train_data[j]) for j in range(N)])
        sorted_indices = np.argsort(distances)[:k]
        neighbors_labels = binary_labels[sorted_indices, :]
        membership_counting_vector = np.sum(neighbors_labels, axis=0)

        for j in range(c):
            if membership_counting_vector[j] > threshold:
                assigned_labels[i, j] = 1

    return assigned_labels



(c) Use the Bird data set from Lab 6. (The csv files are also provided with this script for convenience.)
Apply the two functions you programmed in (a) and (b) to compare the two classifiers for multilabel
data on this dataset. Program and use the hamming loss for your comparison

In [81]:
# test_data

In [82]:
def hamming_loss(y_true, y_pred):
    return np.mean(np.not_equal(y_true, y_pred))

def bird_accuracies(y_true, y_pred):
    bird_accuracies = []
    for i in range(y_true.shape[1]):
        accuracy = np.sum((y_true[:, i] == y_pred[:, i])) / y_true.shape[0]
        bird_accuracies.append(accuracy)
    return np.array(bird_accuracies)

# Load the train and test datasets:
train_data = pd.read_csv('csv_result-birds-train.csv')
test_data = pd.read_csv('csv_result-birds-test.csv')

# Separate the features and labels in both train and test datasets:
train_features = train_data.iloc[:, :-19].values
train_labels = train_data.iloc[:, -19:].values
test_features = test_data.iloc[:, :-19].values
test_labels = test_data.iloc[:, -19:].values

# Apply the two classifiers on the dataset
assigned_labels_ldc = binary_relevance_ldc(train_features, train_labels, test_features)
assigned_labels_knn = adaptive_knn(train_features, train_labels, test_features)

# Calculate the Hamming loss for each classifier
hamming_loss_ldc = hamming_loss(test_labels, assigned_labels_ldc)
hamming_loss_knn = hamming_loss(test_labels, assigned_labels_knn)

# Print the Hamming loss for each classifier
print("Hamming Loss for Binary Relevance LDC:", hamming_loss_ldc)
print("Hamming Loss for Adaptive kNN:", hamming_loss_knn)

Hamming Loss for Binary Relevance LDC: 0.1952093856933355
Hamming Loss for Adaptive kNN: 0.948997881701157


(d) Find out the bird that is most well recognised and the one that is least well recognised. Find
images of these birds and show them in a figure. You may use plt.imread() and plt.imshow(). An example of the expected output is shown in Figure 2

In [83]:
# Calculate bird accuracies
bird_accuracies_ldc = bird_accuracies(test_labels, assigned_labels_ldc)

# Find the indices for the most and least well-recognized birds
max_index = np.argmax(bird_accuracies_ldc)
min_index = np.argmin(bird_accuracies_ldc)

print("Most well-recognized bird index:", max_index)
print("Least well-recognized bird index:", min_index)

Most well-recognized bird index: 13
Least well-recognized bird index: 7


Problem 2. Streaming data
Consider the following experiment. A data stream comes from two classes with normal distributions.
The distributions are static; their parameters don’t change with time. The parameters are