Continual Learning of Large Language Models: A Comprehensive Survey

This repository is a continually updated and extended companion to the manuscript "Continual Learning of Large Language Models: A Comprehensive Survey", covering Continual Learning of Large Language Models (CL-LLMs).

Contributions to this survey are welcome: submit a pull request or open an issue!

Update History

[10/2024] (⭐) new papers: 09/2024 - 10/2024.
[09/2024] (🔥) new papers: 07/2024 - 09/2024.
[07/2024] new papers: 06/2024 - 07/2024.
[07/2024] the updated version of the paper has been released on arXiv.
[06/2024] new papers: 05/2024 - 06/2024.
[05/2024] new papers: 02/2024 - 05/2024.
[04/2024] initial release.

Relevant Survey Papers

Towards Lifelong Learning of Large Language Models: A Survey [paper][code]
Recent Advances of Foundation Language Models-based Continual Learning: A Survey [paper]
A Comprehensive Survey of Continual Learning: Theory, Method and Application (TPAMI 2024) [paper]
Continual Learning for Large Language Models: A Survey [paper]
Continual Lifelong Learning in Natural Language Processing: A Survey (COLING 2020) [paper]
Continual Learning of Natural Language Processing Tasks: A Survey [paper]
A Survey on Knowledge Distillation of Large Language Models [paper]

Continual Pre-Training of LLMs (CPT)

⭐ Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs [paper]
⭐ A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models [paper]
🔥 A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio [paper]
🔥 Towards Effective and Efficient Continual Pre-training of Large Language Models [paper][code]
Bilingual Adaptation of Monolingual Foundation Models [paper]
Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment [paper]
Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale [paper]
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [paper][code]
Efficient Continual Pre-training by Mitigating the Stability Gap [paper][huggingface]
How Do Large Language Models Acquire Factual Knowledge During Pretraining? [paper]
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion [paper]
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [paper][code]
Large Language Model Can Continue Evolving From Mistakes [paper]
Rho-1: Not All Tokens Are What You Need [paper][code]
Simple and Scalable Strategies to Continually Pre-train Large Language Models [paper]
Investigating Continual Pretraining in Large Language Models: Insights and Implications [paper]
Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [paper][code]
TimeLMs: Diachronic Language Models from Twitter (ACL 2022, Demo Track) [paper][code]
Continual Pre-Training of Large Language Models: How to (re)warm your model? [paper]
Continual Learning Under Language Shift [paper]
Examining Forgetting in Continual Pre-training of Aligned Large Language Models [paper]
Towards Continual Knowledge Learning of Language Models (ICLR 2022) [paper][code]
Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (NAACL 2022) [paper]
TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models (EMNLP 2022) [paper][code]
Continual Training of Language Models for Few-Shot Learning (EMNLP 2022) [paper][code]
ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding (AAAI 2020) [paper][code]
Dynamic Language Models for Continuously Evolving Content (KDD 2021) [paper]
Continual Pre-Training Mitigates Forgetting in Language and Vision [paper][code]
DEMix Layers: Disentangling Domains for Modular Language Modeling (NAACL 2022) [paper][code]
Time-Aware Language Models as Temporal Knowledge Bases (TACL 2022) [paper]
Recyclable Tuning for Continual Pre-training (ACL 2023 Findings) [paper][code]
Lifelong Language Pretraining with Distribution-Specialized Experts (ICML 2023) [paper]
ELLE: Efficient Lifelong Pre-training for Emerging Data (ACL 2022 Findings) [paper][code]

Domain-Adaptive Pre-Training of LLMs (DAP)

For General Domains

⭐ DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining [paper]
🔥 Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models [paper]
CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models [paper]
Task Oriented In-Domain Data Augmentation [paper]
Instruction Pre-Training: Language Models are Supervised Multitask Learners [paper][code][huggingface]
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models [paper]
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [paper]
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [paper]
Adapting Large Language Models via Reading Comprehension (ICLR 2024) [paper][code]

Legal Domain

SaulLM-7B: A pioneering Large Language Model for Law [paper][huggingface]
Lawyer LLaMA Technical Report [paper]

Medical Domain

PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications [paper]
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare [paper][project][huggingface]
Me LLaMA: Foundation Large Language Models for Medical Applications [paper][code]
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine [paper][code]
Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering [paper]
PMC-LLaMA: Towards Building Open-source Language Models for Medicine [paper][code]
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model [paper]
Continual Domain-Tuning for Pretrained Language Models [paper]
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs [paper][code]

Financial Domain

⭐ The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging [paper][huggingface]
🔥 Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [paper]
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation [paper][huggingface]
Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training [paper]
Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain [paper][huggingface]
BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark [paper][code]
CFGPT: Chinese Financial Assistant with Large Language Model [paper][code]
Efficient Continual Pre-training for Building Domain Specific Large Language Models [paper]
WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine [paper][code][huggingface][demo]
XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters [paper][huggingface]

Scientific Domain

⭐ MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science [paper][code]
⭐ AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy [paper][huggingface]
🔥 SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding [paper]
PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [paper][code]
ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change [paper][huggingface]
AstroLLaMA: Towards Specialized Foundation Models in Astronomy [paper]
OceanGPT: A Large Language Model for Ocean Science Tasks [paper][code]
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [paper][code][huggingface]
MarineGPT: Unlocking Secrets of "Ocean" to the Public [paper][code]
GeoGalactica: A Scientific Large Language Model in Geoscience [paper][code][huggingface]
Llemma: An Open Language Model For Mathematics [paper][code][huggingface]
PLLaMa: An Open-source Large Language Model for Plant Science [paper][code][huggingface]

Code Domain

CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis [paper][code][huggingface]
Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [code]
StarCoder: may the source be with you! [paper][code]
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence [paper][code][huggingface]
IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [paper][code]
Code Llama: Open Foundation Models for Code [paper][code]

Language Domain

🔥 RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [paper]
Unlocking the Potential of Model Merging for Low-Resource Languages [paper]
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging [paper]
Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data [paper]
BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM [paper]
InstructionCP: A fast approach to transfer Large Language Models into target language [paper]
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities [paper]
Sailor: Open Language Models for South-East Asia [paper][code]
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order [paper][huggingface]

Other Domains

LLaMA Pro: Progressive LLaMA with Block Expansion [paper][code][huggingface]
ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning [paper][code]
Pre-training Text-to-Text Transformers for Concept-centric Common Sense [paper][code][project]
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL 2020) [paper][code]
EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [paper]

Continual Fine-Tuning of LLMs (CFT)

General Continual Fine-Tuning

⭐ Preserving Generalization of Language models in Few-shot Continual Relation Extraction [paper]
🔥 MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning [paper]
Learn it or Leave it: Module Composition and Pruning for Continual Learning [paper]
Unlocking Continual Learning Abilities in Language Models [paper][code]
Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning (NeurIPS 2021) [paper][code]
Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (ICLR 2023) [paper][code]
CIRCLE: Continual Repair across Programming Languages (ISSTA 2022) [paper]
ConPET: Continual Parameter-Efficient Tuning for Large Language Models [paper][code]
Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift [paper]
Investigating Forgetting in Pre-Trained Representations Through Continual Learning [paper]
Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models [paper][code]
LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 (ICLR 2022) [paper][code]
On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code [paper]
Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning (ACL 2023 Findings) [paper]
Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing (NeurIPS 2023) [paper][code]

Continual Instruction Tuning (CIT)

Fine-tuned Language Models are Continual Learners [paper][code]
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [paper][code]
Large-scale Lifelong Learning of In-context Instructions and How to Tackle It [paper]
CITB: A Benchmark for Continual Instruction Tuning [paper][code]
Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal [paper]
Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [paper]
ConTinTin: Continual Learning from Task Instructions [paper]
Orthogonal Subspace Learning for Language Model Continual Learning [paper][code]
SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [paper]
InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions [paper]

Continual Model Refinement (CMR)

⭐ UniAdapt: A Universal Adapter for Knowledge Calibration [paper]
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models [paper]
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models [paper][code]
Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors [paper][code]
On Continual Model Refinement in Out-of-Distribution Data Streams [paper][code][project]
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA [paper][code]
Larimar: Large Language Models with Episodic Memory Control [paper]
WilKE: Wise-Layer Knowledge Editor for Lifelong Knowledge Editing [paper]

Continual Model Alignment (CMA)

Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [paper]
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment [paper][code]
Alpaca: A Strong, Replicable Instruction-Following Model [project][code]
Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems [paper][code]
Training language models to follow instructions with human feedback (NeurIPS 2022) [paper]
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (NeurIPS 2023) [paper]
COPF: Continual Learning Human Preference through Optimal Policy Fitting [paper]
CPPO: Continual Learning for Reinforcement Learning with Human Feedback (ICLR 2024) [paper]
A Moral Imperative: The Need for Continual Superalignment of Large Language Models [paper]
Mitigating the Alignment Tax of RLHF [paper]

Continual Multimodal LLMs (CMLLMs)

⭐ ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [paper][code]
⭐ Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [paper][code]
CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [paper]
Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments [paper]
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [paper]
CLIP model is an Efficient Online Lifelong Learner [paper]
CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [paper][code]
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters (CVPR 2024) [paper][code]
CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning [paper]
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models [paper]
Investigating the Catastrophic Forgetting in Multimodal Large Language Models (PMLR 2024) [paper]
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [paper][code]
Visual Instruction Tuning (NeurIPS 2023, Oral) [paper][code]
Continual Instruction Tuning for Large Multimodal Models [paper]
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [paper][code]
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [paper]
Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration [paper][code]

Continual LLMs Miscs

⭐ Scalable Data Ablation Approximations for Language Models through Modular Training and Merging [paper]
How Do Large Language Models Acquire Factual Knowledge During Pretraining? [paper]
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [paper][code]
Evaluating the External and Parametric Knowledge Fusion of Large Language Models [paper]
Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations [paper]
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees [paper]
COPAL: Continual Pruning in Large Language Generative Models [paper]
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models [paper][code]
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training [paper][code]

Reference

If you find our survey or this collection of papers useful, please consider citing our work:

@article{shi2024continual,
  title={Continual Learning of Large Language Models: A Comprehensive Survey},
  author={Shi, Haizhou and 
          Xu, Zihao and 
          Wang, Hengyi and 
          Qin, Weiyi and 
          Wang, Wenyuan and 
          Wang, Yibin and 
          Wang, Zifeng and 
          Ebrahimi, Sayna and 
          Wang, Hao},
  journal={arXiv preprint arXiv:2404.16789},
  year={2024}
}
