# Deep Learning for Multi-Label Learning - A Comprehensive Survey

This is just a summary of what I could understand from the original paper.

Paper link: [here](https://arxiv.org/abs/2401.16549)

Year: 2024

## Introduction

Depending on the goal, there are two primary tasks in 
multi-label learning (MLL): multi-label classification (MLC) and 
multi-label ranking (MLR).

MLC constitutes the primary learning task, aiming 
to train a model that separates the label set into relevant and 
irrelevant categories with respect to a query instance. On the 
other hand, MLR focuses on training a model to order the 
class labels by their relevance to a query instance.

Currently, a 
predominant trend in MLC is the extensive use of 
deep learning (DL) techniques, even for more challenging problems such as 
extreme MLC, imbalanced MLC, weakly 
supervised MLC, and MLC with missing labels.

## FUNDAMENTAL CONCEPTS OF MLL

MLL is founded on a dataset where instances are associated 
with several target variables or labels simultaneously. The main 
goal when working with such data is MLC, which aims to 
categorize the target variables into relevant and irrelevant 
groups for a specific instance. 

Two traditional approaches exist for solving the MLL task: 
algorithm adaptation and problem transformation. 

Algorithm 
adaptation aims to modify or extend conventional learning 
methods so that they directly handle a multi-label dataset (MLD).

On the other 
hand, problem transformation involves converting the MLC 
task into either one or multiple single-label classification tasks.

The three most prominent 
methods from the problem transformation category include:
+ label power-set (LP)
+ binary relevance (BR)
+ classifier chains (CC)

The BR method breaks down the multi-label problem into a series of independent binary 
problems, one per label. Each binary problem is then addressed 
using a traditional classifier. Finally, the individual predictions 
are combined to get the subset of labels relevant to each test 
instance.
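As a rough illustration of the BR decomposition described above, the sketch below builds one binary target vector per label and then merges per-label binary predictions back into label sets. The function names and the toy labels are hypothetical, and the base classifier itself is left out on purpose, since any traditional binary classifier could be plugged in.

```python
def binary_relevance_targets(label_sets, all_labels):
    """Turn multi-label targets into one 0/1 target vector per label."""
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in all_labels
    }


def combine_predictions(per_label_predictions):
    """Merge per-label 0/1 predictions into one predicted label set per instance."""
    n_instances = len(next(iter(per_label_predictions.values())))
    return [
        {label for label, preds in per_label_predictions.items() if preds[i] == 1}
        for i in range(n_instances)
    ]


# Toy data (hypothetical): three training instances and three possible labels.
train_label_sets = [{"cat", "dog"}, {"dog"}, set()]
targets = binary_relevance_targets(train_label_sets, ["cat", "dog", "bird"])

# Pretend two independent binary classifiers produced these test predictions.
predicted_sets = combine_predictions({"cat": [1, 0], "dog": [1, 1]})
```

Each entry of `targets` would be used to train its own binary classifier, which is exactly why BR cannot see relationships between labels.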

Although BR is relatively simple to implement, it 
ignores the possible relationships between 
labels (such as label dependency, co-occurrence, and 
correlation). To deal with this limitation of the BR method, the 
classifier chain (CC) was introduced.

This method 
links binary classifiers in a sequential chain, where the 
predictions of preceding classifiers serve as features for 
subsequent classifiers. This allows the later classifiers to 
leverage the correlation with earlier predictions to enhance the 
quality of their predictions.
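A minimal sketch of the chaining idea, under a few assumptions: the base learner is a hypothetical 1-nearest-neighbour classifier written only for this example, the label order of the chain is simply the column order, and during training the true values of earlier labels are appended as features (a common practice, not something stated in this summary), while at prediction time the earlier predictions are appended.

```python
def nn_fit(X, y):
    """Hypothetical base learner: 1-nearest-neighbour, stored as (features, label) pairs."""
    return list(zip(X, y))


def nn_predict(model, x):
    """Return the label of the stored example closest to x (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda pair: sq_dist(pair[0], x))[1]


def fit_chain(X, Y):
    """X: list of feature lists. Y: list of 0/1 label lists, all in the same label order."""
    chain = []
    for j in range(len(Y[0])):
        # Classifier j sees the original features plus the values of labels 0..j-1.
        X_augmented = [x + y[:j] for x, y in zip(X, Y)]
        chain.append(nn_fit(X_augmented, [y[j] for y in Y]))
    return chain


def predict_chain(chain, x):
    """Predict labels one by one, feeding earlier predictions to later classifiers."""
    predictions = []
    for model in chain:
        predictions.append(nn_predict(model, x + predictions))
    return predictions


# Toy data (hypothetical): one feature, two labels.
X_train = [[0.0], [1.0], [2.0]]
Y_train = [[0, 0], [1, 0], [1, 1]]
chain = fit_chain(X_train, Y_train)
```

Calling `predict_chain(chain, [2.0])` runs the first classifier on the raw feature and the second one on the feature plus the first prediction, which is how the chain passes label information forward.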

The LP approach 
treats every distinct label combination as a single class identifier, 
thereby converting the original MLD into a multi-class dataset.
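The LP transformation can be sketched in a few lines. This is only an illustration with hypothetical names: each distinct label set is mapped to an integer class id in order of first appearance, and the resulting list can be fed to any multi-class classifier.

```python
def label_powerset_transform(label_sets):
    """Map every distinct label combination to its own class id."""
    class_of = {}          # frozenset of labels -> class id
    y_multiclass = []      # one class id per instance
    for labels in label_sets:
        key = frozenset(labels)
        if key not in class_of:
            class_of[key] = len(class_of)
        y_multiclass.append(class_of[key])
    return y_multiclass, class_of


# Toy data (hypothetical): four instances, three distinct label combinations.
y, class_of = label_powerset_transform([{"a", "b"}, {"a"}, {"a", "b"}, set()])
# y == [0, 1, 0, 2]
```

Note that the number of classes grows with the number of distinct label combinations seen in the data, and a combination never seen during training cannot be predicted.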

The classical 
approaches mentioned earlier prove ineffective in the more challenging 
MLC settings noted in the introduction. Recently, deep learning (DL) techniques have 
gained increased popularity across diverse disciplines, and 
MLC has also benefited from the latest developments in DL.

## DEEP LEARNING FOR MLC

### Neural Networks for MLC

#### Deep Neural Networks for MLC

Deep neural networks (DNNs) have been employed to address 
MLC problems, and the simplest approach is to decompose the 
MLC problem into a set of binary classification 
problems, one for each label.

However, this strategy encounters 
scalability issues, particularly when handling a substantial 
number of labels. Additionally, it considers missing labels as 
negatives, resulting in a performance decline, and ignores 
dependencies among labels, which is an important aspect of 
effective recognition.

Therefore, a different approach that 
exploits label relationships needs to be explored. 
One such approach is BP-MLL (Backpropagation for Multi-label 
Learning), which frames an MLC problem as a neural 
network featuring numerous output nodes, with each node 
representing a distinct label.

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:300px">
    <img src="images/NNinMLC.png" alt="" />
</div>

For a training instance with label set $Y_i$, the labels within $Y_i$ should be considered 
more significant than those outside of $Y_i$.

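BP-MLL views 
each output node as a binary classification problem, and 
training is based on the classical backpropagation (BP) algorithm. To 
address the dependencies across labels, a new global error 
function is proposed:

$$
E = \sum_{i=1}^{m} E_i = \sum_{i=1}^{m} \frac{1}{|\mathcal{Y}_i||\hat{\mathcal{Y}}_i|} \sum_{(k,l) \in \mathcal{Y}_i \times \hat{\mathcal{Y}}_i} \exp\left( -\left( C_k^i - C_l^i \right) \right)
$$

Here $m$ is the number of training instances, $\mathcal{Y}_i$ is the set of labels relevant to the $i$-th instance, $\hat{\mathcal{Y}}_i$ is its complement (the irrelevant labels), and $C_k^i$ is the network output for label $k$ on the $i$-th instance.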


Subsequently, correlations between label pairs are computed. 
The error function quantifies the disparity between the outputs of 
relevant and irrelevant labels. Learning 
entails minimizing this function by increasing the output values for 
labels that belong to a training sample and decreasing those for labels 
that do not.
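To make the pairwise error concrete, here is a small sketch that evaluates the per-instance term $E_i$ of the BP-MLL error function for given network outputs. The function name is hypothetical, and returning 0 when an instance has no relevant or no irrelevant labels is an assumption of this sketch, since the term is undefined in that case.

```python
import math


def bpmll_instance_error(outputs, relevant):
    """Per-instance BP-MLL error.

    outputs:  list of network outputs, one per label.
    relevant: set of label indices that belong to the instance.
    """
    irrelevant = [l for l in range(len(outputs)) if l not in relevant]
    if not relevant or not irrelevant:
        return 0.0  # assumption: skip instances where the pairwise term is undefined
    total = sum(
        math.exp(-(outputs[k] - outputs[l]))
        for k in relevant
        for l in irrelevant
    )
    return total / (len(relevant) * len(irrelevant))


# Label 0 is relevant, label 1 is not (hypothetical outputs).
error_no_separation = bpmll_instance_error([0.0, 0.0], {0})
error_good_separation = bpmll_instance_error([2.0, -2.0], {0})
```

The error shrinks as relevant labels are scored higher than irrelevant ones, so `error_good_separation` is smaller than `error_no_separation`.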

The subsequent stage in 
attaining the MLC classifier involves identifying the label set 
associated with the input instance, which can be extracted from 
the output values of the neural network using the threshold 
function. If the value of the output neuron surpasses the 
threshold, the respective label is attributed to the input instance; 
otherwise, it is not. 
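The thresholding step can be sketched as follows. The function name, the example outputs, and the default threshold of 0.5 are assumptions made for illustration only; this summary does not specify how the threshold is chosen.

```python
def labels_above_threshold(outputs, label_names, threshold=0.5):
    """Assign every label whose output value exceeds the threshold."""
    return {name for name, value in zip(label_names, outputs) if value > threshold}


# Hypothetical network outputs for three labels.
predicted = labels_above_threshold([0.9, 0.2, 0.7], ["cat", "dog", "bird"])
```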

A later improvement to the BP-MLL method modified the global error function. The modified 
error function allows the threshold value to be determined 
automatically, by adaptation during neural network learning.

Furthermore, another study found that BP-MLL performs 
suboptimally on textual datasets.

In response to this 
limitation, the authors explored the constraints of BP-MLL by 
substituting ranking loss minimization with the more 
commonly employed cross-entropy error function.
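For comparison with the pairwise ranking error, a cross-entropy objective for multi-label outputs can be sketched as below. It assumes one sigmoid output per label and sums the binary cross-entropy over labels, which is a common formulation; whether it matches the cited study in every detail is not confirmed by this summary.

```python
import math


def sigmoid(z):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def multilabel_cross_entropy(logits, targets):
    """Sum of per-label binary cross-entropy terms for one instance.

    logits:  raw network outputs, one per label.
    targets: 0/1 indicators, one per label.
    """
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss


# Hypothetical example: label 0 is relevant, label 1 is not.
loss_uninformative = multilabel_cross_entropy([0.0, 0.0], [1, 0])
loss_confident = multilabel_cross_entropy([3.0, -3.0], [1, 0])
```

Unlike the pairwise error, each label contributes its own term here, so the cost grows linearly with the number of labels rather than with the number of relevant and irrelevant label pairs.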

The authors 
demonstrate that a single-hidden-layer neural 
network can reach state-of-the-art performance in large-scale 
multi-label text classification tasks by leveraging 
available DL techniques such as ReLUs, AdaGrad, and 
Dropout.

In a different study, a label-decision 
module was integrated into DNNs, achieving 
top-tier accuracy in multi-label image classification tasks.

Building upon this framework, others introduced ML-Net, 
a DNN designed for the MLC of biomedical texts.

ML-Net 
incorporates the label-decision module from that study, but it 
adapts the framework from image processing to text 
classification. The ML-Net model integrates label prediction 
and decision-making within the same network, enabling the 
determination of output labels through a combination of label 
confidence scores and document context.

Its objective is to 
reduce pairwise ranking errors among labels, allowing for 
end-to-end training and prediction of the label set without requiring 
an additional step for determining output labels.

Recently, others proposed a new loss for MLC, named ZLPR 
loss, to extend the application of DL in MLC. The authors 
extended the cross-entropy loss from single-label 
classification, which gives:

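$$
Loss_{z\_lpr} = \log\left(1 + \sum_{i \in \Omega_{pos}} e^{-s_i}\right) + \log\left(1 + \sum_{j \in \Omega_{neg}} e^{s_j}\right)
$$

Here $\Omega_{pos}$ and $\Omega_{neg}$ are the sets of positive and negative labels of an instance, and $s_i$ is the model's score for label $i$.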


In contrast to earlier ranking-based losses, ZLPR 
exhibits the capability to dynamically determine the number of 
target categories while enhancing a model's label-ranking 
proficiency.

 In comparison to certain binary losses, the ZLPR 
loss excels in capturing a more robust correlation of labels and 
elucidating the ranking relationship between negative and 
positive categories.  
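The ZLPR loss for a single instance can be evaluated directly from its formula, as in the sketch below. The function names are hypothetical. The prediction rule that keeps labels with a score above 0 is an assumption of this sketch about how the number of target categories is determined; it is not stated in this summary.

```python
import math


def zlpr_loss(scores, positive):
    """ZLPR loss for one instance.

    scores:   list of model scores, one per label.
    positive: set of indices of the positive labels.
    """
    pos_term = sum(math.exp(-s) for i, s in enumerate(scores) if i in positive)
    neg_term = sum(math.exp(s) for j, s in enumerate(scores) if j not in positive)
    return math.log(1 + pos_term) + math.log(1 + neg_term)


def zlpr_predict(scores):
    """Assumed decision rule: keep every label whose score is above zero."""
    return {i for i, s in enumerate(scores) if s > 0}


# Hypothetical scores: label 0 is positive, label 1 is negative.
loss_uninformative = zlpr_loss([0.0, 0.0], {0})
loss_confident = zlpr_loss([5.0, -5.0], {0})
predicted = zlpr_predict([5.0, -5.0])
```

The loss falls as positive scores move above zero and negative scores move below it, so no fixed number of labels has to be chosen in advance.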