# Mondrian Conformal Prediction with Class-wise Coverage Guarantees
## 1. Overview
This implementation focuses on Mondrian Conformal Prediction, a variant of conformal prediction that calibrates separately for each class. The resulting prediction sets therefore meet the target coverage level within every class, not only on average across classes.
## 2. Methodology
The non-conformity score sums, over the positions of a cumulative label encoding, the absolute differences between the true labels and the predicted probabilities. The method proceeds as follows:

1. For each calibration example, we compute a non-conformity score from the cumulative probability distribution.
2. The scores are grouped by class label, so that each class is calibrated separately.
3. The result is a prediction set that guarantees the desired coverage level within each class.

The non-conformity score $C_i$ for a calibration example $(x_i, y_i)$ is computed as:
$$
C_i = \sum_{k=1}^{K} |y_{i,k} - \hat{h}_{i,k}|, \quad i \in X_{\text{calibration}}
$$
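
The class-wise calibration in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the notebook's own implementation: the function name `classwise_thresholds`, the toy scores, and the use of the standard split-conformal rank $\lceil (n+1)(1-\alpha) \rceil$ are all assumptions, since the text above does not state which quantile rule is used.

```python
import math
from collections import defaultdict

def classwise_thresholds(scores, labels, alpha=0.1):
    """Hypothetical helper: one conformal threshold per class (Mondrian calibration)."""
    # Group the calibration scores by their true class label.
    by_class = defaultdict(list)
    for score, label in zip(scores, labels):
        by_class[label].append(score)

    thresholds = {}
    for label, class_scores in by_class.items():
        n = len(class_scores)
        # Assumed rule: finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
        rank = math.ceil((n + 1) * (1 - alpha))
        # With too few calibration points the rank exceeds n; fall back to +inf,
        # which means every candidate of this class is accepted.
        thresholds[label] = sorted(class_scores)[rank - 1] if rank <= n else math.inf
    return thresholds

# Toy calibration data (made up for illustration only).
cal_scores = [0.2, 0.9, 0.4, 0.6, 0.1, 0.3, 0.8, 0.5]
cal_labels = [1, 1, 1, 1, 2, 2, 2, 2]
q = classwise_thresholds(cal_scores, cal_labels, alpha=0.25)
print(q)  # {1: 0.9, 2: 0.8}
```

Under this sketch, a candidate class `c` would enter the prediction set for a test instance when the score computed with `c` as the hypothesised label is at most `q[c]`.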

## 3. Notation and Example
#### 1) Variable Definitions
Let:

1. $K$ represent the total number of classes, which is also the length of the cumulative label vector
2. $y_{i,k}$ denote the $k$-th entry of the cumulative binary label for the $i$-th instance
3. $\hat{h}_{i,k}$ represent the predicted probability for the $i$-th instance at position $k$

#### 2) Example Illustration
Consider an ordinal classification case with four classes, where labels are represented in cumulative binary form:
#### True Labels
$y = [1, 1, 0, 0]$ represents class 2, where:

- 1's indicate the classes up to and including the true class
- 0's indicate the classes above the true class
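
The encoding described above can be written as a one-line helper. The name `cumulative_encode` and the 1-based class numbering are assumptions made for this sketch; the numbering is chosen so that class 2 with four positions gives the vector shown.

```python
def cumulative_encode(true_class, num_positions):
    """Hypothetical helper: 1 up to and including the true class, 0 above it."""
    if not 1 <= true_class <= num_positions:
        raise ValueError("true_class must lie between 1 and num_positions")
    # Position k (1-based) is 1 when k <= true_class, otherwise 0.
    return [1 if k <= true_class else 0 for k in range(1, num_positions + 1)]

y = cumulative_encode(2, 4)
print(y)  # [1, 1, 0, 0]
```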

#### 3) Predicted Probabilities
$\hat{h} = [0.96, 0.82, 0.45, 0.15]$ represents our model's predictions, where:

- Each value represents the predicted probability for the corresponding position
- Values typically decrease as we move through the cumulative binary representation

#### 4) Mathematical Representation
For this example:

- $K = 4$ (number of positions in the vector)
- $i$ represents a single instance
- Each position $k \in \{1,2,3,4\}$ has a corresponding $y_{i,k}$ and $\hat{h}_{i,k}$

$$
C = |1 - 0.96| + |1 - 0.82| + |0 - 0.45| + |0 - 0.15| = 0.82
$$
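
The arithmetic can be checked in a few lines of plain Python. The variable names are illustrative only; the numbers are the ones from the example.

```python
y = [1, 1, 0, 0]                  # cumulative binary label for class 2
h_hat = [0.96, 0.82, 0.45, 0.15]  # predicted probabilities per position

# Absolute difference at each position, then the sum over all K = 4 positions.
terms = [abs(y_k - h_k) for y_k, h_k in zip(y, h_hat)]
score = sum(terms)

print([round(t, 2) for t in terms])  # [0.04, 0.18, 0.45, 0.15]
print(round(score, 2))               # 0.82
```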


In [3]:
import torch
import torch.nn as nn

In [38]:
m = nn.Sigmoid()
loss = nn.BCELoss(reduction = 'mean')
input = torch.randn(3, 3, requires_grad=True)
target = torch.tensor([[1, 0, 0], [1, 1, 1], [0, 0, 0]], dtype = torch.float)
output = loss(m(input), target)
output.backward()

In [39]:
m(input)

tensor([[0.0378, 0.7450, 0.3553],
        [0.7459, 0.7854, 0.4061],
        [0.3720, 0.6727, 0.9120]], grad_fn=<SigmoidBackward0>)

In [40]:
torch.log10(m(input))

tensor([[-1.4228, -0.1278, -0.4494],
        [-0.1273, -0.1049, -0.3914],
        [-0.4294, -0.1722, -0.0400]], grad_fn=<Log10Backward0>)

In [41]:
torch.log10(1-m(input))

tensor([[-0.0167, -0.5935, -0.1906],
        [-0.5951, -0.6683, -0.2263],
        [-0.2020, -0.4851, -1.0556]], grad_fn=<Log10Backward0>)

In [44]:
# Base-10 logs and no leading minus sign: this equals -BCE / ln(10), not the BCELoss value itself
torch.mean(target*torch.log10(m(input)) + (1-target)*torch.log10(1-m(input)))

tensor(-0.5081, grad_fn=<MeanBackward0>)

In [43]:
output

tensor(1.1700, grad_fn=<BinaryCrossEntropyBackward0>)
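
The two numbers differ because `nn.BCELoss` uses the natural logarithm and negates the mean, whereas the manual calculation uses `log10` without the sign flip. The sketch below reproduces both values in plain Python from the printed sigmoid outputs. The probabilities are copied from the rounded display above, so agreement is only expected to about three decimal places; the helper name `mean_log_likelihood` is made up for this illustration.

```python
import math

# Sigmoid outputs and targets copied from the printed cells (rounded to 4 decimals).
probs = [[0.0378, 0.7450, 0.3553],
         [0.7459, 0.7854, 0.4061],
         [0.3720, 0.6727, 0.9120]]
target = [[1, 0, 0], [1, 1, 1], [0, 0, 0]]

def mean_log_likelihood(log_fn):
    """Mean of t*log(p) + (1-t)*log(1-p) over all entries, for a given log function."""
    terms = [t * log_fn(p) + (1 - t) * log_fn(1 - p)
             for p_row, t_row in zip(probs, target)
             for p, t in zip(p_row, t_row)]
    return sum(terms) / len(terms)

base10_mean = mean_log_likelihood(math.log10)  # what the log10 cell computes
bce = -mean_log_likelihood(math.log)           # natural log, negated: the BCE definition

print(round(base10_mean, 2))  # -0.51
print(round(bce, 2))          # 1.17
```

The two quantities are related by `bce = -base10_mean * ln(10)`, which is why -0.5081 and 1.1700 describe the same fit.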

In [None]:
# Nonconformity measure (NCM)
# for ordinal regression
def NCM_Cum_Prob(arr = None):
    ...
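
One possible way to implement the score from Section 2 is sketched below. It is an assumption-laden illustration rather than the intended body of `NCM_Cum_Prob`: the name `ncm_cum_prob_sketch`, its two-argument signature, and the 1-based class numbering are all hypothetical. It also shows how the score would be evaluated for every candidate class of a test instance, which is what a prediction set needs.

```python
def ncm_cum_prob_sketch(cum_probs, candidate_class):
    """Hypothetical sketch: sum over positions of |cumulative label - predicted probability|."""
    # Cumulative binary label for the candidate class (1-based positions).
    label = [1 if k <= candidate_class else 0 for k in range(1, len(cum_probs) + 1)]
    return sum(abs(y_k - h_k) for y_k, h_k in zip(label, cum_probs))

# Predicted probabilities from the worked example in Section 3.
h_hat = [0.96, 0.82, 0.45, 0.15]

# Score the instance under each candidate class; lower means better conformity.
candidate_scores = {c: ncm_cum_prob_sketch(h_hat, c) for c in range(1, len(h_hat) + 1)}
print({c: round(s, 2) for c, s in candidate_scores.items()})
# {1: 1.46, 2: 0.82, 3: 0.92, 4: 1.62}
```

Class 2 receives the lowest score, which matches the value 0.82 computed by hand for the true label $y = [1, 1, 0, 0]$.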