# Conformal Prediction on CIFAR-10 and CIFAR-100: Experimental Report

## Introduction

This report applies conformal prediction, using TorchCP, to two widely used image classification datasets, CIFAR-10 and CIFAR-100. The goal is to compare predictors and score functions in terms of coverage rate, average set size, accuracy, precision, recall, and F1 score. We ran experiments with various combinations of predictors and score functions and analyze the results below.

## Experimental Setup

### Datasets

We used two datasets for our experiments:

1. **CIFAR-10**: A dataset consisting of 60,000 32x32 color images in 10 different classes, with 6,000 images per class.

2. **CIFAR-100**: The same image format and total size as CIFAR-10 (60,000 32x32 color images), but divided into 100 classes of 600 images each.

### Model Architecture

We used the same convolutional neural network (CNN) design for both datasets. The architecture consists of a convolutional layer, a ReLU activation, and a fully connected layer. The output dimension of the final layer equals the number of classes in the dataset (10 or 100).
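
As a rough illustration of the architecture described above, here is a minimal PyTorch sketch. The class name `SimpleCNN`, the channel count, and the kernel size are assumptions made for this example; the report does not state them.

```python
import torch
from torch import nn


class SimpleCNN(nn.Module):
    """One conv layer -> ReLU -> one fully connected layer (illustrative sizes)."""

    def __init__(self, num_classes: int, channels: int = 32):
        super().__init__()
        # 3 input channels (RGB); padding=1 keeps the 32x32 spatial size.
        self.conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        # The flattened feature map has channels * 32 * 32 values per image.
        self.fc = nn.Linear(channels * 32 * 32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.conv(x))
        # Return raw logits; softmax is applied later when scoring.
        return self.fc(torch.flatten(x, start_dim=1))


# Only the output dimension differs between the two datasets.
model_c10 = SimpleCNN(num_classes=10)
model_c100 = SimpleCNN(num_classes=100)

batch = torch.randn(4, 3, 32, 32)  # four random CIFAR-sized images
logits_c10 = model_c10(batch)
logits_c100 = model_c100(batch)
print(logits_c10.shape, logits_c100.shape)
```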

### Predictors and Score Functions

We experimented with three predictors: `SplitPredictor`, `ClusterPredictor`, and `ClassWisePredictor`. They differ in how the conformal threshold is calibrated: one threshold shared by all classes, one per cluster of classes, and one per class, respectively. We also used four score functions: `THR`, `APS`, `SAPS`, and `RAPS`. Pairing predictors with score functions lets us assess how each choice affects conformal prediction performance.
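
To make the roles of a score function and a split-style calibration step concrete, the following NumPy sketch implements the THR score and a non-randomized variant of APS, then calibrates a single threshold. It is an illustration under stated assumptions, not TorchCP's implementation: the function names are hypothetical, the probabilities are synthetic stand-ins for model outputs, and `SAPS` and `RAPS` are not covered.

```python
import numpy as np


def thr_scores(probs: np.ndarray) -> np.ndarray:
    """THR: the score of class y is 1 - p_y (lower means more plausible)."""
    return 1.0 - probs


def aps_scores(probs: np.ndarray) -> np.ndarray:
    """APS without randomization: total probability of the classes ranked
    at or above y when classes are sorted by descending probability."""
    order = np.argsort(-probs, axis=1)
    cum = np.cumsum(np.take_along_axis(probs, order, axis=1), axis=1)
    scores = np.empty_like(probs)
    np.put_along_axis(scores, order, cum, axis=1)  # undo the sort
    return scores


def conformal_quantile(cal_scores: np.ndarray, alpha: float) -> float:
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points: fall back to the full label set
        return float("inf")
    return float(np.sort(cal_scores)[k - 1])


# Synthetic softmax outputs standing in for a trained model (assumption).
rng = np.random.default_rng(0)
num_classes, n_cal, n_test = 10, 500, 200
labels = rng.integers(num_classes, size=n_cal + n_test)
logits = rng.normal(size=(n_cal + n_test, num_classes))
logits[np.arange(len(labels)), labels] += 2.0  # favor the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cal_probs, test_probs = probs[:n_cal], probs[n_cal:]
cal_labels = labels[:n_cal]

alpha = 0.1  # target miscoverage, so nominal coverage is 90%
for name, score_fn in [("THR", thr_scores), ("APS", aps_scores)]:
    # Calibrate on the scores of the true labels only.
    cal = score_fn(cal_probs)[np.arange(n_cal), cal_labels]
    q_hat = conformal_quantile(cal, alpha)
    # A class enters the prediction set when its score is within the threshold.
    pred_sets = score_fn(test_probs) <= q_hat
    print(name, "threshold:", round(q_hat, 3),
          "mean set size:", pred_sets.sum(axis=1).mean())
```

A single shared threshold, as computed here, corresponds to the split-style calibration; the cluster-wise and class-wise variants repeat the same quantile step on subsets of the calibration scores.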

## Experimental Results and Analysis

### CIFAR-10
![10.png](attachment:55b2ea19-499c-4365-91cc-572b529928cc.png)

### CIFAR-100
![100.png](attachment:5a1b8595-3cd0-486e-b6f6-121a614b90e2.png)

## Discussion: CIFAR-10 vs. CIFAR-100 Analysis

### Differences between CIFAR-10 and CIFAR-100

By comparing the results of CIFAR-10 and CIFAR-100, we can observe the following trends and differences:

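1. **Coverage Rate and Average Set Size:** On CIFAR-100, the coverage rate is slightly lower than on CIFAR-10, while the average set size is slightly higher. Because conformal calibration targets the same nominal coverage on both datasets, the coverage gap is likely finite-sample variation; the added difficulty of distinguishing 100 classes shows up mainly in the larger prediction sets.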

2. **Performance Metrics:** Accuracy, precision, recall, and F1 score are all lower on CIFAR-100 than on CIFAR-10. With ten times as many classes and one-tenth as many images per class, the model has more ways to be wrong and less data from which to learn each class.
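
For reference, the metrics compared above can be computed as in the sketch below. The helper names are hypothetical, and two choices are assumptions not confirmed by the report: precision, recall, and F1 are taken from top-1 predictions, and they are macro-averaged over classes.

```python
import numpy as np


def coverage_and_set_size(pred_sets: np.ndarray, labels: np.ndarray):
    """pred_sets: boolean (n, K) mask of included classes; labels: (n,)."""
    covered = pred_sets[np.arange(len(labels)), labels]
    return float(covered.mean()), float(pred_sets.sum(axis=1).mean())


def macro_precision_recall_f1(preds: np.ndarray, labels: np.ndarray,
                              num_classes: int):
    """Macro-averaged precision, recall, and F1 from top-1 predictions."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        p = tp / (tp + fp) if tp + fp > 0 else 0.0
        r = tp / (tp + fn) if tp + fn > 0 else 0.0
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return float(np.mean(precisions)), float(np.mean(recalls)), float(np.mean(f1s))


# Tiny hand-made example: 4 samples, 3 classes.
pred_sets = np.array([[1, 0, 0],
                      [1, 1, 0],
                      [0, 1, 1],
                      [0, 0, 1]], dtype=bool)
labels = np.array([0, 2, 1, 2])
top1 = np.array([0, 0, 1, 2])  # assumed top-1 predictions for the same samples

coverage, avg_size = coverage_and_set_size(pred_sets, labels)
accuracy = float((top1 == labels).mean())
precision, recall, f1 = macro_precision_recall_f1(top1, labels, num_classes=3)
print(coverage, avg_size, accuracy, precision, recall, f1)
```

In this toy example the second sample's true label (class 2) is missing from its prediction set, so three of four samples are covered.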

### Impact of Predictors and Score Functions

#### ClassWisePredictor

- On CIFAR-10, ClassWisePredictor performs relatively well across all performance metrics, especially precision and F1 score.
- On CIFAR-100, the performance of ClassWisePredictor decreases. One possible cause is that it calibrates a separate threshold for each class, so with 100 classes each threshold is estimated from far fewer calibration examples.

#### ClusterPredictor

- On both CIFAR-10 and CIFAR-100, ClusterPredictor shows relatively stable performance, with a slight decrease on CIFAR-100. The decrease may stem from greater heterogeneity among the classes grouped into each cluster when there are 100 categories.

#### SplitPredictor

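- On CIFAR-10, SplitPredictor performs well in terms of recall, but its performance decreases on CIFAR-100. SplitPredictor calibrates a single threshold on a held-out calibration split and shares it across all classes, so the drop may reflect that one global threshold fits 100 heterogeneous classes less well, in addition to the weaker base model.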
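
The difference between a single global threshold and class-wise thresholds can be sketched in NumPy as follows. This is an illustration with hypothetical helper names, synthetic scores, and an assumed calibration budget of 5,000 examples; it does not reproduce TorchCP's predictors or the report's setup. Every class is given the same score distribution on purpose, so any spread in the class-wise thresholds comes purely from estimating each one on fewer samples.

```python
import numpy as np


def conformal_quantile(scores: np.ndarray, alpha: float) -> float:
    """The ceil((n + 1)(1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")
    return float(np.sort(scores)[k - 1])


def classwise_thresholds(cal_scores: np.ndarray, cal_labels: np.ndarray,
                         num_classes: int, alpha: float) -> np.ndarray:
    """One threshold per class, each using only that class's calibration scores."""
    return np.array([
        conformal_quantile(cal_scores[cal_labels == c], alpha)
        for c in range(num_classes)
    ])


rng = np.random.default_rng(0)
alpha, n_cal = 0.1, 5000  # assumed values, not taken from the report
spread = {}
for num_classes in (10, 100):
    # Balanced labels: 500 calibration points per class vs. 50 per class.
    cal_labels = np.repeat(np.arange(num_classes), n_cal // num_classes)
    cal_scores = rng.uniform(size=n_cal)  # same distribution for every class
    global_q = conformal_quantile(cal_scores, alpha)
    class_q = classwise_thresholds(cal_scores, cal_labels, num_classes, alpha)
    spread[num_classes] = float(class_q.std())
    print(num_classes, "classes | global threshold:", round(global_q, 3),
          "| std of class-wise thresholds:", round(spread[num_classes], 3))
```

With the same calibration budget, each class-wise threshold on a 100-class problem rests on a tenth as many samples as on a 10-class problem, which is one plausible reason class-wise calibration becomes less reliable as the number of categories grows.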

### Conclusions

In conclusion, our experiments indicate that:

- Moving from CIFAR-10 to CIFAR-100 lowers model performance overall, consistent with the tenfold increase in the number of categories.
- Different predictors exhibit varying performance on the two datasets, likely influenced by the number of categories and relationships between categories.
- Future work could explore ways to adapt models to scenarios with more categories to improve performance.