Thank you for your interest in our work! This repository contains the original implementation of "FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning", accepted to ACL 2026.
Important: Please ensure the following package versions:

```
transformers==4.57.0
peft==0.17.1
torch>=2.0.0
```
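These can be installed with pip, for example:

```bash
pip install "transformers==4.57.0" "peft==0.17.1" "torch>=2.0.0"
```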
Our data preprocessing follows standard formats, and we also provide preprocessed datasets that are ready to use.
Download the required backbone models from Hugging Face:
- Qwen3-0.6B / Qwen3-4B / Qwen3-14B
- LLaMA3.1-8B
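For example, the weights can be fetched with the `huggingface-cli` tool. The repository IDs below are the public Hugging Face ones (note that the LLaMA 3.1 weights are gated and require accepting Meta's license first); adjust the local directories to your setup:

```bash
huggingface-cli download Qwen/Qwen3-4B --local-dir ./models/Qwen3-4B
huggingface-cli download meta-llama/Llama-3.1-8B --local-dir ./models/Llama-3.1-8B
```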
To fine-tune the models using our FOREVER method, run the following commands:
For Qwen models:

```bash
./scripts/run_train_cl_ours_qwen.sh
```

For LLaMA models:

```bash
./scripts/run_train_cl_ours_llama.sh
```

Notes:
- Modify the dataset names in the training scripts according to your training data, and set the corresponding parameters.
- Use the `--base_model` argument to specify the location of your downloaded models.
- We use LoRA for parameter-efficient fine-tuning.
- Fine-tuned model weights are saved to the directory specified by `$output_path`.
- All visualized results are saved in the `./visualization` folder.
- Prediction results are stored in the folder specified by `$output_root`.
Generate Overall Performance results:

```bash
./scripts/run_generate_avgPerf.sh
```

Generate Backward Transfer (BWT) results:

```bash
./scripts/run_generate_bwt.sh
```

To calculate the metrics, run:

For the SuperNI dataset:

```bash
python src/eval_avgPerf_superni.py
python src/eval_bwt_superni.py
```

For the LongSequence dataset:

```bash
python src/eval_avgPerf_longsequence.py
python src/eval_bwt_longsequence.py
```

This project supports two dataset configurations:
- `data_superni`: SuperNI task dataset (task002, task363, task875, etc.)
- `data_longsequence`: long-sequence task dataset (yelp, amazon, dbpedia, agnews, etc.)
The dataset order is defined by the `get_dataset_order()` function in `utils/dataset_order.py`, and different task orders can be selected through the `dataset_id` parameter.
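To quickly inspect which tasks a given order contains, you can call the function directly. The one-liner below assumes `get_dataset_order()` takes the order ID as its only argument; adjust it if the actual signature differs:

```bash
# Print task order 1 (assumes get_dataset_order(dataset_id) returns a list of task names)
python -c "from utils.dataset_order import get_dataset_order; print(get_dataset_order(1))"
```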
Key training parameters:

- `--dataset_id`: dataset configuration ID (1-8)
- `--task_id`: current task ID
- `--memory_data_ratio`: historical data sampling ratio (default: 2%)
- `--memory_epochs`: number of epochs for training on historical data when replay is triggered
- `--steps_per_day`: defines 1 day = N steps (default: 24)
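As an illustration, a single-task training run might combine these flags as in the sketch below. The entry-point name `src/train_cl_ours.py` and the exact value formats are hypothetical; consult `scripts/run_train_cl_ours_qwen.sh` for the actual module and defaults.

```bash
# Hypothetical invocation; the real entry point is wrapped by the training scripts.
# --memory_data_ratio 0.02 is meant to match the 2% default (the value format may differ).
python src/train_cl_ours.py \
    --base_model ./models/Qwen3-4B \
    --dataset_id 1 \
    --task_id 0 \
    --memory_data_ratio 0.02 \
    --memory_epochs 1 \
    --steps_per_day 24
```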
If this work proves helpful or you use our code in your research, we would greatly appreciate a citation:
```bibtex
@article{feng2026forever,
  title={FOREVER: Forgetting Curve-Inspired Memory Replay for Language Model Continual Learning},
  author={Feng, Yujie and Wang, Hao and Li, Jian and Chu, Xu and Kang, Zhaolu and Liu, Yiran and Wang, Yasha and Yu, Philip S and Wu, Xiao-Ming},
  journal={arXiv preprint arXiv:2601.03938},
  year={2026}
}
```