# Setups of Noisy Labels

In this section, you will become familiar with:

* What are noisy labels?

* How do we characterize noisy labels?

* How do we simulate noisy labels for controlled experiments?

## 1. What are noisy labels?

* Consider an instance $(X,Y)$, where $X$ is the feature and $Y$ is the clean (true) label.
* A noisy label $\widetilde Y$ is the label we actually observe; it may or may not be the same as the true label $Y$.

<img src="tutorial_imgs/label_noise.png" width="1000"> 

## 2. How noisy is the data?

**Statistical measure: Label noise transition matrix**


### General: Instance-dependent label noise

**Instance-dependent label noise transition matrix $T(X)$** 
* The matrix can differ for different features $X$
* Each element $T_{ij}(X)$: the probability that the clean label $Y=i$ is observed as the noisy label $\widetilde{Y} = j$ (a flip when $i \neq j$), given feature $X$

$$
T_{ij}(X)=\mathbb P(\widetilde{Y}=j|Y=i,X).
$$

### Simplified: Class-dependent label noise

**Class-dependent label noise transition matrix $T$**
* Assume $T(X) = T, \forall X$
* Each element $T_{ij}=\mathbb P(\widetilde{Y}=j|Y=i)$: the probability that the clean label $Y=i$ is observed as the noisy label $\widetilde{Y} = j$ (a flip when $i \neq j$)
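
Under the class-dependent assumption, $T$ and the clean label prior together determine the distribution of noisy labels: $\mathbb P(\widetilde{Y}=j)=\sum_i \mathbb P(Y=i)\,T_{ij}$. The following is a minimal sketch of that computation; the values of `T` and `p_clean` are illustrative assumptions and are not estimated from any dataset.

```python
import numpy as np

# Illustrative values (assumptions, not estimated from data).
T = np.array([
    [0.6, 0.4],   # row 0: P(noisy=0 | clean=0), P(noisy=1 | clean=0)
    [0.2, 0.8],   # row 1: P(noisy=0 | clean=1), P(noisy=1 | clean=1)
])
p_clean = np.array([0.3, 0.7])  # P(Y=0), P(Y=1)

# Each row of T is a probability distribution over noisy labels.
assert np.allclose(T.sum(axis=1), 1.0)

# Marginal distribution of noisy labels: P(noisy=j) = sum_i P(clean=i) * T[i, j].
p_noisy = p_clean @ T
print(p_noisy)
```

Note that the rows of $T$ sum to 1, while the columns need not.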


### Intuitions:
* Captures random flipping errors, averaged over all instances of a class
* Makes more theoretical analyses possible

Recalling the earlier illustration, we can derive the following rough estimates:

<img src="./tutorial_imgs/label_noise_cut.png" width="700"> 

$$
\mathbb P(\widetilde{Y} = \text{dog}| Y = \text{cat}) = 1/10 = 0.1
$$

$$
\mathbb P(\widetilde{Y} = \text{cat}| Y = \text{cat}) = 7/10 = 0.7
$$

### Importance of $T$

* Understand the pattern/structure of label noise
* Design robust loss functions
* Help label aggregation (weighted majority vote)

#### Importance 1: Understand the pattern of label noise
**Examples from CIFAR-10N**

* CIFAR-10N: 
  * 10 classes. 
  * Each image is annotated by 3 independent human workers.
* Aggregation labels: 
  * Take the majority vote from 3 annotations.
  * Break ties uniformly at random.

<table>
  <tr>
    <td ><img src="./tutorial_imgs/c10_agg.png" width="800"> 

    Figure: Label noise transition matrix of CIFAR-10N.
</td>
    <td>
    * Humans can be very accurate on some classes (ship 97%, horse 96%)<br/>
    * Humans can be inaccurate on other classes (cat 83%, deer 83%)<br/>
    * Human annotations have bias:<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- Horse-deer is a pair with high similarity, <b>but</b>..<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- Humans tend to annotate deer as horse: deer &rarr; horse 0.04<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- Humans tend <b>not</b> to annotate horse as deer: horse &rarr; deer 0.01 <br/>
    </td>
  </tr>
</table> 

**Examples from CIFAR-100N**

* CIFAR-100N: 
  * 20 coarse classes, 100 fine classes. Each coarse class contains 5 fine classes.
  * The human workers are asked to annotate the fine classes for each image (choose one class from the pool of 100 labels).
  * Each image is annotated by only one human worker.
  * The following $T$ shows the transitions between coarse labels.


<table>
  <tr>
    <td ><img src="tutorial_imgs/c100_coarse.png" width="1100"> 

    Figure: Label noise transition matrix of CIFAR-100N.
</td>
    <td>
    * Humans can be very accurate on some classes<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- people 94%<br/>
    * Humans can be inaccurate on other classes <br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- medium-sized mammals 47%<br/>
    * Human annotations have bias:<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- man-made &rarr; natural 0.09<br/>
      &nbsp;&nbsp;&nbsp;&nbsp;- natural &rarr; man-made 0.03 <br/>
    </td>
  </tr>
</table> 

#### Importance 2: Design robust loss functions

Recall that:
* Feature $X$, noisy label $\widetilde Y$. 
* Model: $\bm f(\cdot)$ (Input: $X$, output: a column vector, probability of predicting each label class)
* Loss function: $\ell$.
* Label noise transition matrix $\bm T$, and its transpose $\bm T^\top$.

#### Forward loss correction:
$$
\ell^{\rightarrow}(\bm f(X),\widetilde Y):= \ell(\bm T^\top \bm f(X),\widetilde Y).
$$
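
The $j$-th entry of $\bm T^\top \bm f(X)$ is $\sum_i T_{ij} f_i(X)$, i.e., the model's predicted probability of the *noisy* label $j$, which is why the corrected prediction can be compared directly with $\widetilde Y$. Below is a minimal NumPy sketch of this idea. It assumes $\ell$ is the cross-entropy loss; the function name `forward_corrected_cross_entropy` and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward_corrected_cross_entropy(probs, noisy_label, T):
    """Cross-entropy on the noisy label after mapping the model's
    clean-class probabilities through T^T (forward correction).

    probs:       model output f(X), shape (K,), sums to 1
    noisy_label: observed noisy label, an int in [0, K)
    T:           transition matrix, T[i, j] = P(noisy=j | clean=i)
    """
    noisy_probs = T.T @ probs  # predicted distribution over noisy labels
    return -np.log(noisy_probs[noisy_label])

# Illustrative values (assumptions).
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
f_x = np.array([0.9, 0.1])  # the model believes the clean label is 0

loss_noisy_0 = forward_corrected_cross_entropy(f_x, 0, T)
loss_noisy_1 = forward_corrected_cross_entropy(f_x, 1, T)
print(loss_noisy_0, loss_noisy_1)
```

Here the corrected prediction is $(0.75, 0.25)$, so a noisy label of 1 is penalized less than it would be under the uncorrected prediction $(0.9, 0.1)$.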

#### Importance 3: Help label aggregation (weighted majority vote)

Intuition:
* Normal majority vote: each labeler has the same weight. 
  * E.g., $\text{MV}(1,1,0) = 1$.
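* Weighted majority vote: human annotations are biased (refer to Importance 1), so votes should be weighted by how reliable they are. 
  * E.g., if
    * label class 1 is rare, and
    * labelers often make mistakes on clean class 1 (clean 1 --> noisy 1 has low probability),
    * ==> we may have $\text{MV}_\text{Weighted}(1,1,0) = 0$.
  * Condition:  
    * $\mathbb P(Y=1) = 0.2$, 
    * $T = \begin{pmatrix} 0.8 & 0.2 \\ 0.7 & 0.3 \end{pmatrix}$,
    * the three labelers share the same $T$ and are conditionally independent given $Y$.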
  * Probability of label 1: $$   \begin{align*} & \mathbb P(Y=1| \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)  \\ = & \frac{\mathbb P(Y=1)}{\mathbb P( \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)} \cdot \mathbb P(\widetilde Y_1=1|Y=1) \cdot \mathbb P(\widetilde Y_2=1|Y=1) \cdot \mathbb P(\widetilde Y_3=0|Y=1) \\ = & \frac{0.0126}{\mathbb P( \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)}\end{align*}  $$
  * Probability of label 0: $$   \begin{align*} & \mathbb P(Y=0| \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)  \\ = & \frac{\mathbb P(Y=0)}{\mathbb P( \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)} \cdot \mathbb P(\widetilde Y_1=1|Y=0) \cdot \mathbb P(\widetilde Y_2=1|Y=0) \cdot \mathbb P(\widetilde Y_3=0|Y=0) \\ = & \frac{0.0256}{\mathbb P( \widetilde Y_1 = 1, \widetilde Y_2 = 1, \widetilde Y_3 = 0)} \end{align*} $$
  * Conclusion: both posteriors share the same denominator and $0.0256 > 0.0126$, so label 0 is more likely (posterior $0.0256 / (0.0256 + 0.0126) \approx 0.67$) and $\text{MV}_\text{Weighted}(1,1,0) = 0$.
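
The two posterior computations above can be reproduced numerically. The sketch below assumes, as the product form of the derivation does, that the labelers are conditionally independent given $Y$ and share the same $T$; the helper name `label_posterior` is hypothetical.

```python
import numpy as np

def label_posterior(noisy_votes, T, prior):
    """Return the unnormalized and normalized posterior over the clean label,
    assuming votes are conditionally independent given Y and share T."""
    prior = np.asarray(prior, dtype=float)
    # Likelihood of the votes under each clean class i: prod_k T[i, vote_k].
    likelihood = np.prod(T[:, noisy_votes], axis=1)
    joint = prior * likelihood  # numerators of Bayes' rule
    return joint, joint / joint.sum()

T = np.array([[0.8, 0.2],
              [0.7, 0.3]])
prior = [0.8, 0.2]  # P(Y=0), P(Y=1)

joint, posterior = label_posterior([1, 1, 0], T, prior)
print(joint)      # numerators for Y=0 and Y=1
print(posterior)  # normalized posterior
weighted_vote = int(np.argmax(posterior))
print(weighted_vote)
```

The numerators match the values 0.0256 (label 0) and 0.0126 (label 1) derived above, so the weighted vote selects label 0 even though two of the three labelers voted 1.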
  
 

## 3. How can we simulate label noise?

* **Symmetric noise** 
    * Every class has the same noise rate, and a flipped label is equally likely to go to any other class. 
    * E.g., $T=\begin{pmatrix}0.8 & 0.1 & 0.1 \\ 0.1 & 0.8 & 0.1 \\ 0.1 & 0.1 & 0.8\end{pmatrix}$
  
* **Asymmetric noise (Pairflip)** 
    * Flip only to the next label class, e.g., 1-->2, 2-->3, 3-->1
    * E.g., $T=\begin{pmatrix}0.8 & 0.2 & 0.0 \\ 0.0 & 0.8 & 0.2 \\ 0.2 & 0.0 & 0.8\end{pmatrix}$
  
* **Random noise** 
    * A random $T$
    * E.g., $T=\begin{pmatrix}0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.2 & 0.3 & 0.5\end{pmatrix}$
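
The three kinds of matrices above could be constructed as in the following sketch. The function names (`symmetric_T`, `pairflip_T`, `random_T`) are hypothetical, and drawing the rows of a random $T$ from a uniform Dirichlet distribution is only one possible choice; in particular, it does not guarantee that the diagonal entries are the largest in each row.

```python
import numpy as np

def symmetric_T(num_classes, noise_rate):
    """Keep a label with prob. 1 - noise_rate; otherwise flip it uniformly
    to one of the other classes."""
    off_diag = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def pairflip_T(num_classes, noise_rate):
    """Keep a label with prob. 1 - noise_rate; otherwise flip it to the next
    class (the last class wraps around to the first)."""
    T = (1.0 - noise_rate) * np.eye(num_classes)
    for i in range(num_classes):
        T[i, (i + 1) % num_classes] = noise_rate
    return T

def random_T(num_classes, rng):
    """Draw each row independently from a uniform Dirichlet distribution
    (an assumption; other sampling schemes are equally valid)."""
    return rng.dirichlet(np.ones(num_classes), size=num_classes)

T_sym = symmetric_T(3, 0.2)
T_pair = pairflip_T(3, 0.2)
T_rand = random_T(3, np.random.default_rng(0))
print(T_sym)
print(T_pair)
print(T_rand)
```

With 3 classes and a noise rate of 0.2, `symmetric_T` and `pairflip_T` reproduce the two example matrices given above.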

### Simulation of class-dependent label noise

In [1]:
import numpy as np
from numpy.testing import assert_array_almost_equal
def multiclass_noisify(y, T, random_state=0):
    """ Flip classes according to transition probability matrix T.
    """

    # The inputs must satisfy the following four properties:
    assert T.shape[0] == T.shape[1]  # requires a square matrix
    assert np.max(y) < T.shape[0]    # E.g., 3-class classifications, np.max(y)<=2, T.shape[0]=3.
    assert_array_almost_equal(T.sum(axis=1), np.ones(T.shape[1])) # row sum should be 1
    assert (T >= 0.0).all()  # non-negative

    m = y.shape[0]
    noisy_y = y.copy()
    flipper = np.random.RandomState(random_state)

    for idx in np.arange(m):
        i = y[idx] # clean label
        flipped = flipper.multinomial(1, T[i, :], 1)[0] # take the i-th row from T, draw a vector according to the probability
        noisy_y[idx] = np.where(flipped == 1)[0][0] # noisy label: index of the sampled class, extracted as a scalar

    return noisy_y

### Example:

**A Toy Example**
* Synthesize a dataset of 4,000 instances
* Binary classification
* Each instance has three noisy labels (given by three independent labelers)

In [9]:
import numpy as np
np.random.seed(0)
num_samples = 4000

# Set the label transition matrix T
T = np.array([
    [0.6, 0.4],
    [0.2, 0.8],
])

# Set the clean label distribution p
p = [0.3, 0.7]

# Generate clean labels
clean_labels = np.array([0] * int(num_samples * p[0]) + [1] * (num_samples - int(num_samples * p[0])))
np.random.shuffle(clean_labels)

# Generate three noisy labels
noisy_labels_1 = multiclass_noisify(clean_labels, T, random_state=1)
noisy_labels_2 = multiclass_noisify(clean_labels, T, random_state=2)
noisy_labels_3 = multiclass_noisify(clean_labels, T, random_state=3)
noisy_labels = [[noisy_labels_1[i], noisy_labels_2[i], noisy_labels_3[i]] for i in range(len(clean_labels))] # restructure

print(noisy_labels[:5])


[[1, 1, 1], [1, 0, 1], [1, 1, 1], [1, 1, 1], [0, 0, 1]]


Now we have clean labels and noisy labels. To make sure we are on the right track, we estimate $T$ and $p$ empirically from the generated labels and compare them with the predefined values.

In [11]:
# Get the true (empirical) T and p by counting (clean label, noisy label) pairs
true_T = np.zeros((2,2))
true_p = np.zeros(2)
for i in range(len(clean_labels)):
    for j in range(len(noisy_labels[0])):
        true_T[clean_labels[i]][noisy_labels[i][j]] += 1
    true_p[clean_labels[i]] += 1
true_T /= np.sum(true_T, 1).reshape(-1,1)
true_p /= np.sum(true_p)

# Set precisions
np.set_printoptions(precision=3)

# Print the True T and p
print(f"The true T is:\n{true_T}")
print(f"The true p is:\n{true_p}")
print("------"*7)
print(f"The predefined T is:\n{T}")
print(f"The predefined p is:\n{p}")
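# Save the labels (create the output directory first if it does not exist)
import os
os.makedirs("./data", exist_ok=True)
np.save("./data/clean_labels.npy", clean_labels)
np.save("./data/noisy_labels.npy", noisy_labels)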

The true T is:
[[0.611 0.389]
 [0.203 0.797]]
The true p is:
[0.3 0.7]
------------------------------------------
The predefined T is:
[[0.6 0.4]
 [0.2 0.8]]
The predefined p is:
[0.3, 0.7]
