# Backward-Compatible Representation Learning with DualPrompt

## Continual Learning and Feature Compatibility Analysis

This project investigates the backward compatibility of feature representations learned by a continual learning model (DualPrompt), following the evaluation principles introduced in *Towards Backward-Compatible Representation Learning*.

The goal is to verify whether features extracted from an updated model can be used as **queries** against a **gallery indexed with features from a previous model**, without re-extracting the gallery features.
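
The cross-model retrieval described above can be illustrated with a minimal, dependency-free sketch. It assumes cosine similarity and top-1 accuracy as the retrieval protocol, which this notebook does not state explicitly; the function names and the toy vectors are hypothetical and only serve to show the query/gallery roles.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1_retrieval_accuracy(query_feats, query_labels, gallery_feats, gallery_labels):
    """Fraction of queries whose most similar gallery item has the same label."""
    correct = 0
    for q_feat, q_label in zip(query_feats, query_labels):
        sims = [cosine_similarity(q_feat, g_feat) for g_feat in gallery_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        correct += int(gallery_labels[best] == q_label)
    return correct / len(query_feats)

# Toy data (hypothetical): the gallery is embedded by the OLD model,
# the queries by the NEW model. The gallery is never re-extracted.
old_gallery = [[1.0, 0.0], [0.0, 1.0]]
gallery_labels = [0, 1]
new_queries = [[0.9, 0.2], [0.1, 0.8]]
query_labels = [0, 1]

cross_acc = top1_retrieval_accuracy(new_queries, query_labels, old_gallery, gallery_labels)
print(f"Cross-model top-1 accuracy: {cross_acc:.2f}")
```

In the actual experiments the vectors would be the embeddings produced by two different checkpoints for the same images; only the query side changes when the model is updated.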


## Environment Setup

This section reports the execution environment by checking:
- the PyTorch version
- CUDA availability
- the number of available GPUs

This information is provided to ensure reproducibility of the experiments.


In [None]:
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Num GPUs:", torch.cuda.device_count())


PyTorch version: 2.9.0+cu126
CUDA available: True
Num GPUs: 1


## DualPrompt Codebase

For the experimental phase, a PyTorch implementation of DualPrompt was used instead of the official release because it is more portable and runs on Google Colab. The codebase is intended to reproduce the original results and follow the original design, while offering a structure that is easier to adapt for testing and integration.


In [None]:
!git clone https://github.com/JH-LEE-KR/dualprompt-pytorch.git
%cd dualprompt-pytorch


Cloning into 'dualprompt-pytorch'...
remote: Enumerating objects: 79, done.
remote: Counting objects: 100% (35/35), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 79 (delta 27), reused 24 (delta 24), pack-reused 44 (from 1)
Receiving objects: 100% (79/79), 58.17 KiB | 945.00 KiB/s, done.
Resolving deltas: 100% (41/41), done.
/content/dualprompt-pytorch


## Dependency Installation

The main dependencies required for training and evaluation are installed, including:
- PyTorch and torchvision
- timm for Vision Transformer models
- utility libraries for preprocessing and visualization


In [None]:
!pip install --upgrade pip
!pip install torch torchvision timm pillow matplotlib


Collecting pip
  Downloading pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.3-py3-none-any.whl (1.8 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.1.2
    Uninstalling pip-24.1.2:
      Successfully uninstalled pip-24.1.2
Successfully installed pip-25.3


In [None]:
!pip install timm==0.6.7

Collecting timm==0.6.7
  Downloading timm-0.6.7-py3-none-any.whl.metadata (33 kB)
Downloading timm-0.6.7-py3-none-any.whl (509 kB)
Installing collected packages: timm
  Attempting uninstall: timm
    Found existing installation: timm 1.0.24
    Uninstalling timm-1.0.24:
      Successfully uninstalled timm-1.0.24
Successfully installed timm-0.6.7


## Dataset: CIFAR-100

The DualPrompt model is trained on **CIFAR-100**, a standard benchmark for continual learning.

The dataset contains:
- 100 object classes
- RGB images of size 32×32
- standard train/test splits

The dataset is downloaded locally and used for training.


In [None]:
import os
from torchvision import datasets

data_path = "./data"
os.makedirs(data_path, exist_ok=True)
datasets.CIFAR100(root=data_path, download=True, train=True)
datasets.CIFAR100(root=data_path, download=True, train=False)
print(" CIFAR-100 saved in:", data_path)


100%|██████████| 169M/169M [00:03<00:00, 44.0MB/s]


 CIFAR-100 saved in: ./data


## Pre-trained Model Cache

A local directory is specified for caching pre-trained models used by `timm`.
This avoids repeated downloads and improves execution efficiency.


In [None]:
import os

os.environ['TORCH_HOME'] = './torch_cache'
os.makedirs(os.environ['TORCH_HOME'], exist_ok=True)
print("Pre-trained model saved in:", os.environ['TORCH_HOME'])


Pre-trained model saved in: ./torch_cache


## Continual Learning Training with DualPrompt

In this step, the model is trained using **DualPrompt** on CIFAR-100.

Key settings:
- Backbone: Vision Transformer (ViT-B/16)
- Method: DualPrompt
- Scenario: class-incremental continual learning
- Output: a trained model with dynamically selected prompts

The training follows the default configuration provided in the repository cloned above.
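
To make the class-incremental scenario concrete, the sketch below partitions the class IDs into disjoint tasks. It assumes the 100 CIFAR-100 classes are divided evenly into 10 tasks of 10 classes each; the helper name `split_classes_into_tasks` and its `shuffle`/`seed` parameters are hypothetical and are not taken from the repository.

```python
import random

def split_classes_into_tasks(num_classes, num_tasks, shuffle=False, seed=0):
    """Partition class IDs into num_tasks disjoint, equally sized groups."""
    if num_classes % num_tasks != 0:
        raise ValueError("num_classes must be divisible by num_tasks")
    class_ids = list(range(num_classes))
    if shuffle:
        # Seeded shuffle so the class order is reproducible.
        random.Random(seed).shuffle(class_ids)
    per_task = num_classes // num_tasks
    return [class_ids[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

tasks = split_classes_into_tasks(num_classes=100, num_tasks=10)
print(f"{len(tasks)} tasks, first task classes: {tasks[0]}")
```

The model sees the tasks one after another, and in the class-incremental setting the task identity is not provided at test time.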


In [None]:
!python -m torch.distributed.run \
    --nproc-per-node=1 \
    main.py cifar100_dualprompt \
    --model vit_base_patch16_224 \
    --batch-size 64 \
    --data-path ./data \
    --output_dir ./output


| distributed init (rank 0): env://
[rank0]:[W122 16:08:39.217019991 ProcessGroupNCCL.cpp:5068] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group()
Creating original model: vit_base_patch16_224
Creating model: vit_base_patch16_224
Namespace(subparser_name='cifar100_dualprompt', batch_size=64, epochs=5, model='vit_base_patch16_224', input_size=224, pretrained=True, drop=0.0, drop_path=0.0, opt='adam', opt_eps=1e-08, opt_betas=(0.9, 0.999), clip_grad=1.0, momentum=0.9, weight_decay=0.0, reinit_optimizer=True, sched='constant', lr=0.03, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, warmup_lr=1e-06, min_lr=1e-05, decay_epochs=30, warmup_epochs=5, cooldown_epochs=10, patience_epochs=10, decay_rate=0.1, unscale_lr=True, color_jitter=None, aa=None, smoothing=0.1, train_interpolation='bicubic', reprob=0.0, remode='pixel', recount=1, data_path='./data', dataset='Split-CIFAR100', shuffle=F

## Model Evaluation

After training, the model is evaluated in inference mode.
This step verifies that training has completed successfully and loads the saved model weights for subsequent experiments.


In [None]:
!python -m torch.distributed.run \
    --nproc-per-node=1 \
    main.py cifar100_dualprompt \
    --eval \
    --model vit_base_patch16_224 \
    --data-path ./data \
    --output_dir ./output
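# Note: since PyTorch 2.6, torch.load defaults to weights_only=True.
# Add weights_only=False to the torch.load call in main.py (line 107) so the checkpoint loads.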

| distributed init (rank 0): env://
[rank0]:[W122 19:00:43.004228735 ProcessGroupNCCL.cpp:5068] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group()
Creating original model: vit_base_patch16_224
Creating model: vit_base_patch16_224
Namespace(subparser_name='cifar100_dualprompt', batch_size=24, epochs=5, model='vit_base_patch16_224', input_size=224, pretrained=True, drop=0.0, drop_path=0.0, opt='adam', opt_eps=1e-08, opt_betas=(0.9, 0.999), clip_grad=1.0, momentum=0.9, weight_decay=0.0, reinit_optimizer=True, sched='constant', lr=0.03, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, warmup_lr=1e-06, min_lr=1e-05, decay_epochs=30, warmup_epochs=5, cooldown_epochs=10, patience_epochs=10, decay_rate=0.1, unscale_lr=True, color_jitter=None, aa=None, smoothing=0.1, train_interpolation='bicubic', reprob=0.0, remode='pixel', recount=1, data_path='./data', dataset='Split-CIFAR100', shuffle=F

## Dataset Structure and Overlap Analysis

Before proceeding with feature extraction and backward-compatible retrieval experiments, it is necessary to analyze the structure of the datasets involved and verify the absence of unintended overlaps.

Three datasets are used at different stages of the pipeline:

- **ImageNet**: used to pre-train the Vision Transformer (ViT-B/16) backbone.
- **CIFAR-100**: used to train the DualPrompt model in a continual learning setting.
- **CIFAR-10**: used exclusively for backward compatibility and retrieval experiments (gallery–query matching).

### ImageNet vs CIFAR Datasets

ImageNet and CIFAR datasets differ significantly in both image resolution and data curation process:

- ImageNet contains high-resolution natural images collected from the web and manually curated.
- CIFAR-10 and CIFAR-100 consist of 32×32 images derived from a different collection pipeline.

There is no known instance-level overlap between ImageNet and the CIFAR datasets. While some semantic categories share similar names (e.g., *dog*, *car*), the actual images are distinct. ImageNet pre-training is therefore not expected to introduce instance-level data leakage into the CIFAR-based experiments, although the backbone has already seen semantically related categories.

### CIFAR-100 vs CIFAR-10

CIFAR-10 and CIFAR-100 are both labeled subsets of the same original image pool (the 80 Million Tiny Images collection), but they use **disjoint label sets**:

- CIFAR-10 contains 10 coarse-grained classes.
- CIFAR-100 contains 100 fine-grained classes, grouped into 20 superclasses.

Each image is expected to appear in **only one** of the two datasets; the hash-based check at the end of this section tests this on a subset of the training images. Under this assumption:

- Training DualPrompt on CIFAR-100 does not expose the model to any images from CIFAR-10.
- CIFAR-10 can be safely used as an independent benchmark for evaluating backward compatibility.

### Implications for Backward-Compatible Evaluation

This dataset separation ensures that:

- Feature representations extracted from the DualPrompt model are evaluated on **unseen data**.
- Gallery–query matching on CIFAR-10 measures representation consistency rather than memorization.
- Backward compatibility is tested under a realistic deployment scenario, where a new model must interoperate with previously indexed features from a different data distribution.

This setup aligns with the assumptions of backward-compatible representation learning, where old embeddings and new embeddings coexist without requiring reprocessing of the original gallery.


In [None]:
import torchvision.transforms as transforms

# ----------------------------
# Load CIFAR-10 and CIFAR-100
# ----------------------------

transform = transforms.ToTensor()

cifar10_train = datasets.CIFAR10(root=data_path, train=True, download=True, transform=transform)
cifar10_test = datasets.CIFAR10(root=data_path, train=False, download=True, transform=transform)

cifar100_train = datasets.CIFAR100(root=data_path, train=True, download=False, transform=transform)
cifar100_test = datasets.CIFAR100(root=data_path, train=False, download=False, transform=transform)


In [None]:
# ----------------------------
# Dataset statistics
# ----------------------------

print("CIFAR-10: ")
print(" Train samples:", len(cifar10_train))
print(" Test samples:", len(cifar10_test))
print(" Classes:", cifar10_train.classes)

print("\nCIFAR-100: ")
print(" Train samples:", len(cifar100_train))
print(" Test samples:", len(cifar100_test))
print(" Classes:", cifar100_train.classes)

In [None]:
# ----------------------------
# Helper: compute image hash
# ----------------------------

import hashlib
import numpy as np

def image_hash(img_tensor):
    """
    Compute a hash for an image tensor.
    Tensor shape: [C, H, W]
    """
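    # ToTensor scales pixels to [0, 1]; round before casting so that
    # floating-point error cannot shift a value by one after truncation.
    img_bytes = np.rint(img_tensor.numpy() * 255).astype(np.uint8).tobytes()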
    return hashlib.md5(img_bytes).hexdigest()

In [None]:
num_samples = 5000  # only the first num_samples training images of each dataset are compared

cifar10_hashes = set(
    image_hash(cifar10_train[i][0]) for i in range(num_samples)
)

overlap_count = 0
for i in range(num_samples):
    h = image_hash(cifar100_train[i][0])
    if h in cifar10_hashes:
        overlap_count += 1

print(f"Checked {num_samples} images from each dataset.")
print(f"Number of overlapping images found: {overlap_count}")

if overlap_count == 0:
    print("No image-level overlap detected in the sampled subsets of CIFAR-10 and CIFAR-100.")
else:
    print("Potential overlap detected (unexpected).")

## Feature Extractor Construction from DualPrompt Checkpoints

To evaluate representation stability and backward compatibility, the classification head is removed and only the feature extractor is retained. The analysis focuses on the output of the last layer before the classification head, which represents the learned embedding used for retrieval.

### Checkpoint Selection

DualPrompt training produces a sequence of checkpoints corresponding to different tasks. Each checkpoint represents a model state after learning a new subset of classes. In this experiment:

- One feature extractor is built from each checkpoint.
- A total of 10 models are considered, corresponding to the 10 incremental tasks.
- All models share the same Vision Transformer backbone architecture.

This setup allows tracking how the learned representation evolves across tasks and enables direct comparison between embeddings extracted at different stages of continual learning.
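
Before building the extractors, it can help to confirm that one checkpoint per task is actually present. The sketch below assumes the `task{N}_checkpoint.pth` naming (numbered from 1) and the `./output/checkpoint` directory used in the loading code of this notebook; the helper names are hypothetical.

```python
import os

def expected_checkpoint_paths(checkpoint_dir, num_tasks):
    """Build the expected checkpoint path for each task, numbered from 1."""
    return [
        os.path.join(checkpoint_dir, f"task{task_id}_checkpoint.pth")
        for task_id in range(1, num_tasks + 1)
    ]

def find_missing(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]

paths = expected_checkpoint_paths("./output/checkpoint", num_tasks=10)
missing = find_missing(paths)
print(f"{len(paths) - len(missing)} of {len(paths)} checkpoints found")
for path in missing:
    print("Missing:", path)
```

Running this check first reports all missing files at once, rather than stopping at the first one during model loading.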

### Purpose for Backward Compatibility Evaluation

Using feature-only models enables:

- Direct comparison between embeddings from different checkpoints.
- Simulation of a real-world deployment scenario where older gallery features coexist with newer query features.
- Evaluation of whether queries generated by a newer model can successfully retrieve gallery items indexed with older embeddings.

The extracted embeddings serve as the basis for retrieval experiments on CIFAR-10, using backward compatibility criteria inspired by the BCT framework.
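
As understood here, the empirical criterion of the BCT paper treats a new model as backward compatible with an old one when cross-model retrieval (new queries, old gallery) outperforms the old model's self-retrieval (old queries, old gallery). The sketch below applies that comparison to every ordered pair of checkpoints. The accuracy values are hypothetical placeholders, not results from this project, and the function names are illustrative.

```python
def is_backward_compatible(cross_acc, old_self_acc):
    """BCT-style check: new-query/old-gallery must beat old-query/old-gallery."""
    return cross_acc > old_self_acc

def compatibility_pairs(acc):
    """
    acc[i][j] = retrieval accuracy with queries from model i and gallery from model j.
    Returns {(new, old): bool} for every pair where the query model is newer.
    """
    n = len(acc)
    return {
        (new, old): is_backward_compatible(acc[new][old], acc[old][old])
        for old in range(n)
        for new in range(old + 1, n)
    }

# Hypothetical accuracies for three checkpoints (illustrative numbers only).
# Entries above the diagonal (old queries, new gallery) are not used here.
acc = [
    [0.60, 0.00, 0.00],
    [0.62, 0.65, 0.00],
    [0.55, 0.66, 0.70],
]

result = compatibility_pairs(acc)
for (new, old), ok in sorted(result.items()):
    print(f"query model {new} -> gallery model {old}: compatible={ok}")
```

With 10 checkpoints the same function would produce 45 pairs, which gives a compact view of where compatibility holds or breaks as tasks accumulate.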


In [None]:
# Download DualPrompt checkpoints from Google Drive
import gdown

output_dir = './output/checkpoint'
os.makedirs(output_dir, exist_ok=True)

# The variable holds the full shared-folder URL, not just the folder ID
folder_url = 'https://drive.google.com/drive/folders/1PdhU4Ko7iRqoPjZzsAG_eZ4S8TRJSrM0?usp=sharing'
gdown.download_folder(url=folder_url, output=output_dir, quiet=True)
print("Checkpoints downloaded to:", output_dir)

In [None]:
from timm.models import create_model

# -----------------------------
# Configuration
# -----------------------------
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
NUM_TASKS = 10
CHECKPOINT_DIR = "./output/checkpoint"

MODEL_NAME = "vit_base_patch16_224"
NUM_CLASSES = 100   # CIFAR-100
PROMPT_CONFIG = dict(
    prompt_length=5,
    embedding_key="cls",
    prompt_init="uniform",
    prompt_pool=True,
    prompt_key=True,
    pool_size=10,
    top_k=1,
    batchwise_prompt=True,
    head_type="token",
    use_prompt_mask=True,
    use_g_prompt=True,
    g_prompt_length=5,
    g_prompt_layer_idx=[0, 1],
    use_prefix_tune_for_g_prompt=True,
    use_e_prompt=True,
    e_prompt_layer_idx=[2, 3, 4],
    use_prefix_tune_for_e_prompt=True,
    same_key_value=False,
)

# -----------------------------
# Load DualPrompt Models
# -----------------------------
def load_dualprompt_models():
    models = []

    for task_id in range(NUM_TASKS):
        print(f"Loading model for task {task_id + 1}")

        # Create model (same as main.py)
        model = create_model(
            MODEL_NAME,
            pretrained=False,
            num_classes=NUM_CLASSES,
            drop_rate=0.0,
            drop_path_rate=0.0,
            **PROMPT_CONFIG
        )

        checkpoint_path = os.path.join(
            CHECKPOINT_DIR,
            f"task{task_id + 1}_checkpoint.pth"
        )

        if not os.path.exists(checkpoint_path):
            raise FileNotFoundError(f"Checkpoint not found: {checkpoint_path}")

        # map_location lets GPU-saved checkpoints load on CPU-only machines
        checkpoint = torch.load(checkpoint_path, map_location=DEVICE, weights_only=False)
        model.load_state_dict(checkpoint["model"], strict=True)

        # Remove classification head → feature extractor
        model.head = torch.nn.Identity()

        model.to(DEVICE)
        model.eval()

        models.append(model)

    print("All DualPrompt models loaded successfully.")
    return models


In [None]:
# Load models
dualprompt_models = load_dualprompt_models()

In [None]:
# print model summary
from torchsummary import summary
summary(dualprompt_models[0], (3, 224, 224))