VILA: Vision-Language Analytic Learning for Class-Incremental Learning

🎉Introduction 📌Updates 🌈Datasets Supported 📋Methods Supported
📖Get Started 📊Reproduced Results 🏷️Citation 🤝Contact 🙏Acknowledgments


🎉 Introduction

This is the official PyTorch implementation of VILA [arXiv], a novel dual-branch framework that addresses representation rigidity in pre-trained model (PTM)-based analytic class-incremental learning (CIL).

VILA integrates a task‑adapted Vision Transformer with a frozen vision‑language model via geometric feature alignment (UGC) and semantic decision calibration (CSE). It leverages recursive least squares updates to achieve mathematically optimal weights with zero forgetting, consistently outperforming iterative methods across diverse benchmarks with higher training efficiency.
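
At its core, the analytic branch maintains a ridge-regression classifier whose weights are refreshed in closed form as new tasks arrive. The sketch below illustrates a generic recursive least squares update of this kind (via the Woodbury identity); it is a minimal illustration of the technique, not the actual VILA code, and all names, shapes, and the regularization value are assumptions.

import numpy as np

# Minimal sketch of a recursive least squares (analytic) classifier update.
# Illustration of the general technique only, NOT the VILA implementation;
# feature dimension, regularization, and naming are assumptions.
class AnalyticClassifier:
    def __init__(self, feat_dim, num_classes, gamma=1.0):
        # R tracks (X^T X + gamma * I)^{-1}; W holds the ridge-regression weights.
        self.R = np.eye(feat_dim) / gamma
        self.W = np.zeros((feat_dim, num_classes))

    def fit_increment(self, X, Y):
        # Absorb a new task (features X: [n, d], one-hot labels Y: [n, C])
        # without revisiting old data, using the Woodbury identity.
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R -= self.R @ X.T @ K @ X @ self.R
        # Closed-form correction of the weights toward the new targets.
        self.W += self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return (X @ self.W).argmax(axis=1)

Because each increment reproduces the exact least-squares solution over all data seen so far, fitting new classes does not degrade the weights fitted to earlier ones, which is the sense in which analytic CIL achieves zero forgetting at the classifier level.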

📌 Updates

  • [2026/02] Initial version of VILA is released.

🌈 Datasets Supported

  • CIFAR100: will be automatically downloaded by the code.
  • ImageNet-R: Google Drive: link or OneDrive: link
  • CUB: Google Drive: link or OneDrive: link
  • UCF: Google Drive: link or OneDrive: link
  • Aircraft: Google Drive: link or OneDrive: link
  • Cars: Google Drive: link or OneDrive: link
  • Food: Google Drive: link or OneDrive: link
  • SUN: OneDrive: link

These subsets are sampled from the original datasets. Please note that I do not have the right to redistribute these datasets; if sharing them violates any license, I will provide the list of filenames instead.

When training on datasets other than CIFAR100, specify the folder of your dataset in utils/data.py:

    def download_data(self):
        """You should specify the folder of your dataset."""
        # Replace [DATA-PATH] with the root folder of the corresponding dataset.
        train_dir = '[DATA-PATH]/train/'
        test_dir = '[DATA-PATH]/val/'
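
The placeholder paths above suggest an ImageFolder-style layout with one sub-folder per class under train/ and val/; this layout is an assumption inferred from the snippet, so check the loaders in utils/data.py for the exact expectation. For example:

[DATA-PATH]/
    train/
        class_001/xxx.jpg
        class_002/yyy.jpg
    val/
        class_001/zzz.jpg
        class_002/www.jpg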

📋 Methods Supported

These methods are implemented in the exps/ directory:

  • FineTune: Baseline method which simply updates parameters on new tasks.
  • iCaRL: CVPR 2017 [paper]
  • Coil: ACM MM 2021 [paper]
  • DER: CVPR 2021 [paper]
  • FOSTER: ECCV 2022 [paper]
  • L2P: CVPR 2022 [paper]
  • DualPrompt: ECCV 2022 [paper]
  • MEMO: ICLR 2023 Spotlight [paper]
  • CODA-Prompt: CVPR 2023 [paper]
  • RanPAC: NeurIPS 2023 [paper]
  • LAE: ICCV 2023 [paper]
  • SLCA: ICCV 2023 [paper]
  • FeCAM: NeurIPS 2023 [paper]
  • DGR: CVPR 2024 [paper]
  • Ease: CVPR 2024 [paper]
  • CoFiMA: ECCV 2024 [paper]
  • SimpleCIL: IJCV 2024 [paper]
  • Aper: IJCV 2024 [paper]
  • MOS: AAAI 2025 [paper]
  • DUCT: CVPR 2025 [paper]
  • TUNA: ICCV 2025 [paper]

📖 Get Started

1. Clone

Clone this GitHub repository:

git clone https://github.com/byzhaoAI/VILA.git
cd VILA

2. Installation

Dependencies

  1. torch 2.0.1
  2. torchvision 0.15.2
  3. timm 0.6.12
  4. tqdm
  5. numpy
  6. scipy
  7. easydict

Install via .yml

Coming soon ...
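
Until the environment file is released, a manual install following the dependency list above may work; the exact torch/torchvision command depends on your CUDA setup, so treat this as a rough sketch rather than a verified recipe.

pip install torch==2.0.1 torchvision==0.15.2 timm==0.6.12 tqdm numpy scipy easydict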

3. Run Experiment

Edit the [MODEL NAME].json file in vila/ (for the proposed method) or exps/ (for the compared methods) to set the global settings and hyperparameters, then run one of:

python main.py --config=vila/[MODEL NAME].json
python main.py --config=exps/[MODEL NAME].json

The main fields in the config file are:

  • model_name: The model's name, which should be selected from the methods listed in utils/factory.py.
  • init_cls: The number of classes in the initial incremental stage.
  • increment: The number of classes in each incremental stage $i$ ($i$ > 1).
  • backbone_type: The backbone network of the incremental model. It can be selected from a variety of pre-trained models available in the Timm library, such as ViT-B/16-IN1K and ViT-B/16-IN21K. Both are pre-trained on ImageNet21K, while the former is additionally fine-tuned on ImageNet1K.
  • seed: The random seed used for shuffling the class order (1993 by default).
  • fixed_memory: a Boolean parameter. When set to true, the model will maintain a fixed amount of memory per class.
  • memory_size: The total number of exemplars in the incremental learning process.
  • memory_per_class: If fixed memory is set to true, the model will preserve a fixed number of memory_per_class exemplars for each class.
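
For orientation, a config might look roughly like the sketch below. The field names mirror the options described above, but the concrete values and the backbone string are illustrative assumptions only, so always start from the [MODEL NAME].json files shipped in vila/ and exps/.

{
    "model_name": "vila",
    "backbone_type": "ViT-B/16-IN21K",
    "init_cls": 10,
    "increment": 10,
    "seed": 1993,
    "fixed_memory": false,
    "memory_size": 0,
    "memory_per_class": 0
}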

4. Issues

If Hugging Face is unreachable, manually download the weights and load them with the code from lines 122–125 in utils/inc_net.py:

import torch
import open_clip
# Build the CLIP ViT-B/16 architecture without downloading pretrained weights.
model, _, preprocess = open_clip.create_model_and_transforms('ViT-B-16', pretrained=None)
# Load the manually downloaded checkpoint from disk.
state_dict = torch.load("[THE DOWNLOADED MODEL WEIGHTS]")
msg = model.load_state_dict(state_dict)
print(msg)

📊 Reproduced Results

Accuracy Performance

Training Time

We report the total training time (in minutes, 20 tasks) for all methods across datasets. The y-axis is plotted on a logarithmic scale to handle the large variance between methods.

🏷️ Citation

If you use any content of this repo in your work, please cite the following BibTeX entry:

@article{zhao2026advancing,
  title={Advancing Analytic Class-Incremental Learning through Vision-Language Calibration},
  author={Binyu Zhao and Wei Zhang and Xingrui Yu and Zhaonian Zou and Ivor Tsang},
  journal={arXiv preprint arXiv:2602.13670},
  year={2026}
}

🤝 Contact

If you have any questions, feel free to open an issue to propose new features or contact the author: Binyu Zhao (binyu-zhao@outlook.com).

🙏 Acknowledgments

This repo is based on LAMDA-PILOT.

🚀 Star History

Star History Chart
