🚀 O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models (CVPR 2025 - Highlighted, Top 13%)
Our contributions are summarized as follows:
- We provide new insights into why an existing top-performing calibration method for test-time prompt tuning delivers suboptimal performance.
- We propose a novel approach, named O-TPT, for calibrating test-time prompt tuning in VLMs by enforcing orthogonality constraints. This is accomplished by introducing orthogonal regularization on the textual features.
- We perform an extensive evaluation to validate our approach on various datasets and across different baselines. Results reveal that O-TPT provides consistent gains over state-of-the-art methods in overall average calibration performance across several baselines. Moreover, O-TPT achieves better calibration than zero-shot CLIP, an improvement over existing SOTA methods.
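The core idea of the orthogonal regularization can be illustrated with a minimal sketch (not the repository's implementation; the function name and NumPy formulation below are illustrative assumptions): L2-normalize the per-class text features and penalize the off-diagonal entries of their Gram (cosine-similarity) matrix, which pushes the class-wise text features toward orthogonality.

```python
import numpy as np

def orthogonality_loss(text_features: np.ndarray) -> float:
    """Illustrative orthogonal regularizer: Frobenius norm of the
    off-diagonal part of the cosine-similarity (Gram) matrix of the
    per-class text features. Zero iff the features are orthogonal."""
    # L2-normalize each class's text feature vector.
    f = text_features / np.linalg.norm(text_features, axis=1, keepdims=True)
    gram = f @ f.T                              # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))    # zero out the diagonal
    return float(np.linalg.norm(off_diag))

# Orthogonal features incur zero penalty; collinear ones are penalized.
print(orthogonality_loss(np.eye(3)))        # 0.0
print(orthogonality_loss(np.ones((2, 4))))  # > 0 (identical features)
```

In the actual method this penalty is added to the test-time prompt-tuning objective, so the tuned prompts cannot collapse the text features together.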
⚙️ Steps to set up the environment
1. `git clone https://github.com/ashshaksharifdeen/O-TPT.git`
2. `cd O-TPT`
3. `conda env create -f environment.yml`
4. `conda activate otpt`

We have conducted our main experiments on fine-grained and natural distribution shift datasets:
- Fine-grained datasets:
- ImageNet
- Flower102
- OxfordPets
- SUN397
- DTD
- Food101
- StanfordCars
- Aircraft
- UCF101
- EuroSAT
- Caltech101
- Natural distribution shift datasets:
- ImageNet-V2
- ImageNet-A
- ImageNet-R
- ImageNet-Sketch
Follow this repository for dataset preparation: TPT
In each `.sh` file, you can edit the root dataset directory location and configure the backbone, either `RN50` or `ViT-B/16`. You can also switch between experiment modes by changing `run_type`: `opt`, the TPT baselines, or calibration with temperature scaling.
🏁 Baseline Experiment
# For fine-grained classification, {dataset} is one of: I, DTD, Flower102, Food101, Cars, SUN397, Aircraft, Pets, Caltech101, UCF101, eurosat
bash scripts/test_baseline.sh {dataset}
🎯 TPT Experiment
# Fine-grained classification
bash scripts/test_tpt_fg.sh {dataset}
# Natural distribution shift
bash scripts/test_tpt_ds.sh {dataset}
🔥 O-TPT Experiment
# Fine-grained classification
bash scripts/test_tpt_otpt_fg.sh {dataset}
# Natural distribution shift
bash scripts/test_tpt_otpt_ds.sh {dataset}
📊 Comparison of calibration performance on fine-grained datasets:
| Method | Metric | INet | DTD | FLW | Food | SUN | Air | Pets | Calt | UCF | SAT | Car | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zero Shot | Acc. | 66.7 | 44.3 | 67.3 | 83.6 | 62.5 | 23.9 | 88.0 | 92.9 | 65.0 | 41.3 | 65.3 | 63.7 |
| | ECE | 2.12 | 8.50 | 3.00 | 2.39 | 2.53 | 5.11 | 4.37 | 5.50 | 3.59 | 13.89 | 4.25 | 4.43 |
| TPT | Acc. | 69.0 | 46.7 | 69.0 | 84.7 | 64.5 | 23.4 | 87.1 | 93.8 | 67.3 | 42.4 | 66.3 | 65.0 |
| | ECE | 10.6 | 21.2 | 13.5 | 3.98 | 11.3 | 16.8 | 5.77 | 4.51 | 2.54 | 13.2 | 5.16 | 11.6 |
| C-TPT | Acc. | 68.5 | 46.0 | 69.8 | 83.7 | 64.8 | 24.85 | 88.2 | 93.63 | 65.7 | 43.2 | 65.8 | 64.57 |
| | ECE | 3.15 | 11.9 | 5.04 | 3.43 | 5.04 | 4.36 | 1.9 | 4.24 | 2.54 | 13.2 | 1.59 | 5.13 |
| Robust-adapt-SaLs-CTPT | Acc. | 68.04 | 45.51 | 69.43 | 83.18 | 64.38 | 23.94 | 88.12 | 93.63 | 65.32 | 43.05 | 65.48 | 64.55 |
| | ECE | 2.63 | 14.56 | 2.74 | 1.26 | 3.56 | 6.21 | 3.16 | 3.78 | 6.96 | 14.92 | 2.82 | 5.69 |
| Robust-adapt-Penalty-CTPT | Acc. | 68.04 | 45.69 | 69.55 | 83.28 | 64.36 | 23.91 | 87.95 | 93.47 | 65.32 | 44.06 | 65.53 | 64.65 |
| | ECE | 2.63 | 13.9 | 5.27 | 3.35 | 4.87 | 4.43 | 1.63 | 4.56 | 2.29 | 7.08 | 1.25 | 4.66 |
| Robust-adapt-ZS-CTPT | Acc. | 68.01 | 45.63 | 69.55 | 83.25 | 64.41 | 23.88 | 88.03 | 93.31 | 65.24 | 42.64 | 65.45 | 64.51 |
| | ECE | 3.01 | 12.35 | 4.94 | 3.8 | 5.16 | 4.31 | 2.06 | 4.34 | 2.17 | 12.23 | 1.7 | 5.09 |
| O-TPT (Ours) | Acc. | 67.33 | 45.68 | 70.07 | 84.13 | 64.23 | 23.64 | 87.95 | 93.95 | 64.16 | 42.84 | 64.53 | 64.41 |
| | ECE | 1.96 | 7.88 | 3.87 | 1.46 | 4.93 | 3.68 | 1.9 | 3.8 | 2.34 | 12.98 | 1.78 | 4.21 |
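For reference, the ECE rows report Expected Calibration Error: predictions are binned by confidence, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. A minimal NumPy sketch (using 15 equal-width bins, a common choice; the exact binning in our evaluation code may differ):

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight gap by fraction of samples in bin
    return ece

# A fully confident, always-correct model is perfectly calibrated.
print(expected_calibration_error([1.0, 1.0], [0, 1], [0, 1]))  # 0.0
```

Lower ECE means confidence tracks accuracy more closely, which is why a method can trade a little accuracy for substantially better calibration in the tables above.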
🌍 Comparison of calibration performance with CLIP-ViT-B/16 backbone on natural distribution shift datasets:
| Method | Metric | I-A | I-V2 | I-R | I-S | Avg |
|---|---|---|---|---|---|---|
| CLIP-ViT-B/16 | Acc. | 47.8 | 60.8 | 74.0 | 46.1 | 57.2 |
| | ECE | 8.61 | 3.01 | 3.58 | 4.95 | 5.04 |
| TPT | Acc. | 52.6 | 63.0 | 76.7 | 47.5 | 59.9 |
| | ECE | 16.4 | 11.1 | 4.36 | 16.1 | 12.0 |
| C-TPT | Acc. | 51.6 | 62.7 | 76.0 | 47.9 | 59.6 |
| | ECE | 8.16 | 6.23 | 1.54 | 7.35 | 5.82 |
| O-TPT (Ours) | Acc. | 49.87 | 61.65 | 72.55 | 47.12 | 57.80 |
| | ECE | 7.22 | 3.97 | 1.46 | 6.87 | 4.88 |
We are thankful to the authors of TPT, C-TPT, and CoOp/CoCoOp for their open-source contributions.
If you find our work useful for your research, please consider citing it:
@InProceedings{Sharifdeen_2025_CVPR,
    author    = {Sharifdeen, Ashshak and Munir, Muhammad Akhtar and Baliah, Sanoojan and Khan, Salman and Khan, Muhammad Haris},
    title     = {O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {19942-19951}
}

If you need any further clarification, please feel free to contact me at ashshaks@gmail.com.

