Continual Learning of Large Language Models: A Comprehensive Survey

This is a continually updated survey of Continual Learning of Large Language Models (CL-LLMs), an extended version of the manuscript "Continual Learning of Large Language Models: A Comprehensive Survey".

Contributions are welcome: submit a pull request or open an issue.

Update History

  • [10/2024] (⭐) new papers: 09/2024 - 10/2024.
  • [09/2024] (🔥) new papers: 07/2024 - 09/2024.
  • [07/2024] new papers: 06/2024 - 07/2024.
  • [07/2024] the updated version of the paper has been released on arXiv.
  • [06/2024] new papers: 05/2024 - 06/2024.
  • [05/2024] new papers: 02/2024 - 05/2024.
  • [04/2024] initial release.

Relevant Survey Papers

  • Towards Lifelong Learning of Large Language Models: A Survey [paper][code]
  • Recent Advances of Foundation Language Models-based Continual Learning: A Survey [paper]
  • A Comprehensive Survey of Continual Learning: Theory, Method and Application (TPAMI 2024) [paper]
  • Continual Learning for Large Language Models: A Survey [paper]
  • Continual Lifelong Learning in Natural Language Processing: A Survey (COLING 2020) [paper]
  • Continual Learning of Natural Language Processing Tasks: A Survey [paper]
  • A Survey on Knowledge Distillation of Large Language Models [paper]

Continual Pre-Training of LLMs (CPT)

  • ⭐ Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs [paper]
  • ⭐ A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models [paper]
  • 🔥 A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio [paper]
  • 🔥 Towards Effective and Efficient Continual Pre-training of Large Language Models [paper][code]
  • Bilingual Adaptation of Monolingual Foundation Models [paper]
  • Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment [paper]
  • Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale [paper]
  • LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training [paper][code]
  • Efficient Continual Pre-training by Mitigating the Stability Gap [paper][huggingface]
  • How Do Large Language Models Acquire Factual Knowledge During Pretraining? [paper]
  • DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion [paper]
  • MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning [paper][code]
  • Large Language Model Can Continue Evolving From Mistakes [paper]
  • Rho-1: Not All Tokens Are What You Need [paper][code]
  • Simple and Scalable Strategies to Continually Pre-train Large Language Models [paper]
  • Investigating Continual Pretraining in Large Language Models: Insights and Implications [paper]
  • Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [paper][code]
  • TimeLMs: Diachronic Language Models from Twitter (ACL 2022, Demo Track) [paper][code]
  • Continual Pre-Training of Large Language Models: How to (re)warm your model? [paper]
  • Continual Learning Under Language Shift [paper]
  • Examining Forgetting in Continual Pre-training of Aligned Large Language Models [paper]
  • Towards Continual Knowledge Learning of Language Models (ICLR 2022) [paper][code]
  • Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (NAACL 2022) [paper]
  • TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models (EMNLP 2022) [paper][code]
  • Continual Training of Language Models for Few-Shot Learning (EMNLP 2022) [paper][code]
  • ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding (AAAI 2020) [paper][code]
  • Dynamic Language Models for Continuously Evolving Content (KDD 2021) [paper]
  • Continual Pre-Training Mitigates Forgetting in Language and Vision [paper][code]
  • DEMix Layers: Disentangling Domains for Modular Language Modeling (NAACL 2022) [paper][code]
  • Time-Aware Language Models as Temporal Knowledge Bases (TACL 2022) [paper]
  • Recyclable Tuning for Continual Pre-training (ACL 2023 Findings) [paper][code]
  • Lifelong Language Pretraining with Distribution-Specialized Experts (ICML 2023) [paper]
  • ELLE: Efficient Lifelong Pre-training for Emerging Data (ACL 2022 Findings) [paper][code]

Domain-Adaptive Pre-Training of LLMs (DAP)

For General Domains

  • ⭐ DoPAMine: Domain-specific Pre-training Adaptation from seed-guided data Mining [paper]
  • 🔥 Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models [paper]
  • CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models [paper]
  • Task Oriented In-Domain Data Augmentation [paper]
  • Instruction Pre-Training: Language Models are Supervised Multitask Learners [paper][code][huggingface]
  • D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models [paper]
  • BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [paper]
  • Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains [paper]
  • Adapting Large Language Models via Reading Comprehension (ICLR 2024) [paper][code]

Legal Domain

  • SaulLM-7B: A pioneering Large Language Model for Law [paper][huggingface]
  • Lawyer LLaMA Technical Report [paper]

Medical Domain

  • PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications [paper]
  • Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare [paper][project][huggingface]
  • Me LLaMA: Foundation Large Language Models for Medical Applications [paper][code]
  • BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine [paper][code]
  • Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering [paper]
  • PMC-LLaMA: Towards Building Open-source Language Models for Medicine [paper][code]
  • AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model [paper]
  • Continual Domain-Tuning for Pretrained Language Models [paper]
  • HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs [paper][code]

Financial Domain

  • ⭐ The Construction of Instruction-tuned LLMs for Finance without Instruction Data Using Continual Pretraining and Model Merging [paper][huggingface]
  • 🔥 Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [paper]
  • Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation [paper][huggingface]
  • Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training [paper]
  • Pretraining and Updating Language- and Domain-specific Large Language Model: A Case Study in Japanese Business Domain [paper][huggingface]
  • BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark [paper][code]
  • CFGPT: Chinese Financial Assistant with Large Language Model [paper][code]
  • Efficient Continual Pre-training for Building Domain Specific Large Language Models [paper]
  • WeaverBird: Empowering Financial Decision-Making with Large Language Model, Knowledge Base, and Search Engine [paper][code][huggingface][demo]
  • XuanYuan 2.0: A Large Chinese Financial Chat Model with Hundreds of Billions Parameters [paper][huggingface]

Scientific Domain

  • ⭐ MELT: Materials-aware Continued Pre-training for Language Model Adaptation to Materials Science [paper][code]
  • ⭐ AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy [paper][huggingface]
  • 🔥 SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding [paper]
  • PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [paper][code]
  • ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change [paper][huggingface]
  • AstroLLaMA: Towards Specialized Foundation Models in Astronomy [paper]
  • OceanGPT: A Large Language Model for Ocean Science Tasks [paper][code]
  • K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization [paper][code][huggingface]
  • MarineGPT: Unlocking Secrets of "Ocean" to the Public [paper][code]
  • GeoGalactica: A Scientific Large Language Model in Geoscience [paper][code][huggingface]
  • Llemma: An Open Language Model For Mathematics [paper][code][huggingface]
  • PLLaMa: An Open-source Large Language Model for Plant Science [paper][code][huggingface]

Code Domain

  • CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis [paper][code][huggingface]
  • Code Needs Comments: Enhancing Code LLMs with Comment Augmentation [code]
  • StarCoder: may the source be with you! [paper][code]
  • DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence [paper][code][huggingface]
  • IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators [paper][code]
  • Code Llama: Open Foundation Models for Code [paper][code]

Language Domain

  • 🔥 RedWhale: An Adapted Korean LLM Through Efficient Continual Pretraining [paper]
  • Unlocking the Potential of Model Merging for Low-Resource Languages [paper]
  • Mitigating Catastrophic Forgetting in Language Transfer via Model Merging [paper]
  • Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data [paper]
  • BAMBINO-LM: (Bilingual-)Human-Inspired Continual Pretraining of BabyLM [paper]
  • InstructionCP: A fast approach to transfer Large Language Models into target language [paper]
  • Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities [paper]
  • Sailor: Open Language Models for South-East Asia [paper][code]
  • Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order [paper][huggingface]

Other Domains

  • LLaMA Pro: Progressive LLaMA with Block Expansion [paper][code][huggingface]
  • ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning [paper][code]
  • Pre-training Text-to-Text Transformers for Concept-centric Common Sense [paper][code][project]
  • Don't Stop Pretraining: Adapt Language Models to Domains and Tasks (ACL 2020) [paper][code]
  • EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data [paper]

Continual Fine-Tuning of LLMs (CFT)

General Continual Fine-Tuning

  • ⭐ Preserving Generalization of Language models in Few-shot Continual Relation Extraction [paper]
  • 🔥 MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning [paper]
  • Learn it or Leave it: Module Composition and Pruning for Continual Learning [paper]
  • Unlocking Continual Learning Abilities in Language Models [paper][code]
  • Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning (NeurIPS 2021) [paper][code]
  • Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study (ICLR 2023) [paper][code]
  • CIRCLE: Continual Repair across Programming Languages (ISSTA 2022) [paper]
  • ConPET: Continual Parameter-Efficient Tuning for Large Language Models [paper][code]
  • Enhancing Continual Learning with Global Prototypes: Counteracting Negative Representation Drift [paper]
  • Investigating Forgetting in Pre-Trained Representations Through Continual Learning [paper]
  • Learn or Recall? Revisiting Incremental Learning with Pre-trained Language Models [paper][code]
  • LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5 (ICLR 2022) [paper][code]
  • On the Usage of Continual Learning for Out-of-Distribution Generalization in Pre-trained Language Models of Code [paper]
  • Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning (ACL 2023 Findings) [paper]
  • Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing (NeurIPS 2023) [paper][code]

Continual Instruction Tuning (CIT)

  • Fine-tuned Language Models are Continual Learners [paper][code]
  • TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [paper][code]
  • Large-scale Lifelong Learning of In-context Instructions and How to Tackle It [paper]
  • CITB: A Benchmark for Continual Instruction Tuning [paper][code]
  • Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal [paper]
  • Don't Half-listen: Capturing Key-part Information in Continual Instruction Tuning [paper]
  • ConTinTin: Continual Learning from Task Instructions [paper]
  • Orthogonal Subspace Learning for Language Model Continual Learning [paper][code]
  • SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models [paper]
  • InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions [paper]

Continual Model Refinement (CMR)

  • ⭐ UniAdapt: A Universal Adapter for Knowledge Calibration [paper]
  • LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models [paper]
  • WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models [paper][code]
  • Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors [paper][code]
  • On Continual Model Refinement in Out-of-Distribution Data Streams [paper][code][project]
  • Melo: Enhancing model editing with neuron-indexed dynamic lora [paper][code]
  • Larimar: Large language models with episodic memory control [paper]
  • Wilke: Wise-layer knowledge editor for lifelong knowledge editing [paper]

Continual Model Alignment (CMA)

  • Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [paper]
  • Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment [paper][code]
  • Alpaca: A Strong, Replicable Instruction-Following Model [project] [code]
  • Self-training Improves Pre-training for Few-shot Learning in Task-oriented Dialog Systems [paper] [code]
  • Training language models to follow instructions with human feedback (NeurIPS 2022) [paper]
  • Direct preference optimization: Your language model is secretly a reward model (NeurIPS 2023) [paper]
  • Copf: Continual learning human preference through optimal policy fitting [paper]
  • CPPO: Continual Learning for Reinforcement Learning with Human Feedback (ICLR 2024) [paper]
  • A Moral Imperative: The Need for Continual Superalignment of Large Language Models [paper]
  • Mitigating the Alignment Tax of RLHF [paper]

Continual Multimodal LLMs (CMLLMs)

  • ⭐ ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [paper][code]
  • ⭐ Model Developmental Safety: A Safety-Centric Method and Applications in Vision-Language Models [paper][code]
  • CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning [paper]
  • Continually Learn to Map Visual Concepts to Large Language Models in Resource-constrained Environments [paper]
  • Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [paper]
  • CLIP model is an Efficient Online Lifelong Learner [paper]
  • CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [paper][code]
  • Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters (CVPR 2024) [paper][code]
  • CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning [paper]
  • Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models [paper]
  • Investigating the Catastrophic Forgetting in Multimodal Large Language Models (PMLR 2024) [paper]
  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models [paper] [code]
  • Visual Instruction Tuning (NeurIPS 2023, Oral) [paper] [code]
  • Continual Instruction Tuning for Large Multimodal Models [paper]
  • CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model [paper] [code]
  • Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [paper]
  • Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration [paper] [code]

Continual LLMs Miscs

  • ⭐ Scalable Data Ablation Approximations for Language Models through Modular Training and Merging [paper]
  • How Do Large Language Models Acquire Factual Knowledge During Pretraining? [paper]
  • Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [paper][code]
  • Evaluating the External and Parametric Knowledge Fusion of Large Language Models [paper]
  • Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations [paper]
  • AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees [paper]
  • COPAL: Continual Pruning in Large Language Generative Models [paper]
  • HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models [paper][code]
  • Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training [paper][code]

Reference

If you find our survey or this collection of papers useful, please consider citing our work:

@article{shi2024continual,
  title={Continual Learning of Large Language Models: A Comprehensive Survey},
  author={Shi, Haizhou and 
          Xu, Zihao and 
          Wang, Hengyi and 
          Qin, Weiyi and 
          Wang, Wenyuan and 
          Wang, Yibin and 
          Wang, Zifeng and 
          Ebrahimi, Sayna and 
          Wang, Hao},
  journal={arXiv preprint arXiv:2404.16789},
  year={2024}
}
