# Noise in Data Labels
### IOAI 2025 Poland - Stage I

This task addresses training robust models on imbalanced datasets with noisy (incorrect) labels. We show how to identify likely mislabeled ('dirty') samples and how to adjust the loss function to limit their influence.

## 1. Noise Profiling

We use prediction entropy and Confident Learning to identify samples whose labels are likely flipped. A sample is flagged when the model assigns low probability to its given label and therefore places most of its probability on a different class.

In [1]:
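import numpy as np

def identify_label_noise(probs, labels, threshold=0.2):
    # Simplified heuristic inspired by Confident Learning (Northcutt et al.):
    # flag a sample when the predicted probability of its given label
    # falls below a fixed threshold.
    # probs: (n_samples, n_classes) array; labels: (n_samples,) integer array
    given_label_prob = probs[np.arange(len(labels)), labels]
    return given_label_prob < threshold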

print("Noise detection algorithm implemented.")
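
The fixed 0.2 cutoff above is one simple choice. As a rough sketch of the two ideas named in the text, the block below computes per-sample prediction entropy and applies a Confident Learning style rule with per-class thresholds, where the threshold for class j is the mean predicted probability of class j over samples labeled j. The function names `prediction_entropy` and `flag_likely_flips` are hypothetical, and this is a simplified illustration, not the full Confident Learning procedure.

```python
import numpy as np

def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of predicted probabilities."""
    p = np.clip(probs, eps, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def flag_likely_flips(probs, labels):
    """Flag samples that look confidently like a class other than their label.

    Assumption: threshold t_j is the mean predicted probability of class j
    over the samples whose given label is j.
    """
    k = probs.shape[1]
    thresholds = np.array([
        probs[labels == j, j].mean() if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    # Classes whose probability clears their own per-class threshold
    above = probs >= thresholds
    # Among those, take the most probable class; -1 if no class qualifies
    masked = np.where(above, probs, -np.inf)
    confident_class = np.where(above.any(axis=1), masked.argmax(axis=1), -1)
    # Suspect: confidently assigned to a class that differs from the label
    return (confident_class != -1) & (confident_class != labels)
```

Entropy can then be used to rank the flagged samples: a flagged sample with low entropy is one where the model is decisive about a class that contradicts the label.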

## 2. Robust Loss Function

Instead of standard cross-entropy, we use either Symmetric Cross-Entropy (SCE) or a weighted loss that penalizes misclassifications less when the label quality is suspect.
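
The weighted option could be sketched as follows, assuming a boolean mask of suspect samples is available (for instance from a noise-profiling step). The names `noise_weighted_cross_entropy`, `suspect_mask`, and `suspect_weight` are hypothetical, and the weight of 0.2 is an illustrative value, not a tuned one.

```python
import torch
import torch.nn.functional as F

def noise_weighted_cross_entropy(logits, target, suspect_mask, suspect_weight=0.2):
    """Cross-entropy in which suspect samples contribute with a reduced weight."""
    # Per-sample losses, so each sample can be weighted individually
    per_sample = F.cross_entropy(logits, target, reduction="none")
    weights = torch.where(
        suspect_mask,
        torch.full_like(per_sample, suspect_weight),
        torch.ones_like(per_sample),
    )
    # Weighted mean: normalizing by the weight sum keeps the loss scale stable
    return (weights * per_sample).sum() / weights.sum()
```

With an all-`False` mask this reduces to ordinary mean cross-entropy, which makes it easy to compare against a baseline.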

In [2]:
import torch
import torch.nn as nn

class RobustLoss(nn.Module):
    def __init__(self, alpha=0.1):
        super().__init__()
        self.alpha = alpha
        
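    def forward(self, pred, target):
        # pred: raw logits of shape (batch, n_classes); target: integer class indices
        ce = nn.functional.cross_entropy(pred, target)
        # Placeholder: the symmetrization or filtering term is not implemented here.
        # Scaling by the constant (1 - alpha) only rescales the loss and its gradients.
        return ce * (1 - self.alpha)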

print("Robust loss module defined.")
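
For comparison, a minimal sketch of Symmetric Cross-Entropy in the sense of Wang et al. (2019) is given below: the usual cross-entropy is combined with a reverse cross-entropy term in which the roles of prediction and label are swapped. The class name `SymmetricCrossEntropy`, the `beta` and `log_zero` parameters, and all default values are assumptions made for illustration, not settings taken from this task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricCrossEntropy(nn.Module):
    """SCE sketch: alpha * CE + beta * reverse CE."""

    def __init__(self, alpha=0.1, beta=1.0, log_zero=-4.0):
        super().__init__()
        self.alpha = alpha
        self.beta = beta
        self.log_zero = log_zero  # finite stand-in for log(0) on one-hot labels

    def forward(self, logits, target):
        ce = F.cross_entropy(logits, target)
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, num_classes=logits.size(1)).float()
        # log of the one-hot labels: 0 for the labeled class, log_zero elsewhere
        log_labels = (1.0 - one_hot) * self.log_zero
        # Reverse CE: predictions act as the "target", labels as the "prediction"
        rce = -(probs * log_labels).sum(dim=1).mean()
        return self.alpha * ce + self.beta * rce
```

Because the reverse term is bounded by `-log_zero` per sample, a confidently contradicted label cannot produce an arbitrarily large loss, which is the property that makes this formulation more tolerant of label noise than plain cross-entropy.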