Official Implementation of Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning (ICLR 2026)
This repository contains the PyTorch implementation of FAME, the continual RL algorithm from "Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning". We apply FAME in three environments: MinAtar, Atari, and MetaWorld.
pip install -r MinAtar/requirements.txt # python=3.10
For example, we can run the baseline algorithms on task sequence 3.
# Reset
python DQN.py --lr1=1e-5 --seed=0 --save --save-model --seq 3 --reset 1 --gpu 0
# Finetune
python DQN.py --lr1=1e-5 --seed=0 --save --save-model --seq 3 --reset 0 --gpu 0
# Multi-head
python DQN_multi_task.py --lr1=1e-5 --seed=0 --save --save-model --seq 3 --gpu 0
# LargeBuffer
python DQN_large_buffer.py --lr1=1e-4 --seed=0 --save --save-model --seq 3 --gpu 0
# PT-DQN
python PT_DQN_half.py --lr1=1e-8 --lr2=1e-4 --decay=0.75 --seed=0 --save --save-model --seq 3 --gpu 0
| Argument | Value | Description |
|---|---|---|
| lr1 | 1e-3 | Learning rate for Fast Learner |
| lr2 | 1e-5 | Learning rate for Meta Learner |
| size_fast2meta | 12000 | N: number of samples collected in each fast-learning phase for the meta replay buffer (12000 × 7 = 84000 < 100000) |
| size_meta | 100000 | Size of meta replay buffer (same as original simple DQN setting) |
| warmstep | 50000 | L: warm-up steps with behavior cloning |
| detection_step | 1200 | n: number of steps for policy evaluation |
| lambda_reg | 1.0 | Regularization hyperparameter in behavior cloning |
| use_ttest | 0 | 0: empirical ranking, 1: t-test for one-vs-all hypothesis test |

python FAME.py --lr1=1e-3 --lr2=1e-5 --size_fast2meta 12000 --detection_step 600 --seed=1 --save --save-model --seq 1 --gpu 0 --warmstep 50000 --lambda_reg 1.0
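
The buffer-size note in the argument table (12000 × 7 = 84000 < 100000) can be re-checked whenever `size_fast2meta` or `size_meta` is changed. The sketch below is only that arithmetic; the helper name `meta_buffer_usage` is hypothetical, and reading the factor 7 as the number of fast-learning phases in a sequence is an assumption, not something the table states.

```python
# Values taken from the argument table.
SIZE_FAST2META = 12_000  # N: samples contributed by each fast-learning phase
SIZE_META = 100_000      # capacity of the meta replay buffer
NUM_PHASES = 7           # assumption: the factor 7 in "12000 x 7"


def meta_buffer_usage(samples_per_phase: int, num_phases: int, capacity: int):
    """Return the total number of samples and whether it stays below capacity."""
    total = samples_per_phase * num_phases
    return total, total < capacity


total, fits = meta_buffer_usage(SIZE_FAST2META, NUM_PHASES, SIZE_META)
print(f"{total} samples, below capacity: {fits}")
```

With the default values this gives 84000 samples, which is below the 100000 capacity.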
You can also run the provided bash script, but we suggest tailoring it to your compute resources.
bash run.sh
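
One way to adapt a batch of runs to the available hardware is to generate the commands programmatically. The following is a minimal sketch, not a reproduction of run.sh: the function `build_fame_commands` and the round-robin GPU assignment are illustrative assumptions, while the flags and values are copied from the FAME example command above.

```python
import itertools
import shlex


def build_fame_commands(seeds, seqs, gpus):
    """Build one FAME.py command per (seq, seed), cycling over the given GPU ids."""
    gpu_cycle = itertools.cycle(gpus)
    commands = []
    for seq, seed in itertools.product(seqs, seeds):
        commands.append([
            "python", "FAME.py",
            "--lr1=1e-3", "--lr2=1e-5",
            "--size_fast2meta", "12000",
            # 600 is the value in the example command; the table lists 1200.
            "--detection_step", "600",
            f"--seed={seed}",
            "--save", "--save-model",
            "--seq", str(seq),
            "--gpu", str(next(gpu_cycle)),
            "--warmstep", "50000",
            "--lambda_reg", "1.0",
        ])
    return commands


# Print the commands for three seeds on one sequence, spread over two GPUs.
for cmd in build_fame_commands(seeds=[0, 1, 2], seqs=[1], gpus=[0, 1]):
    print(shlex.join(cmd))
```

Each command is a list of arguments, so it can be passed to `subprocess.run(cmd, check=True)` to launch the runs one after another instead of printing them.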
This implementation in MinAtar is adapted from the released code of the paper Prediction and Control in Continual Reinforcement Learning (NeurIPS 2023).
pip install -r Atari/requirements.txt

| Argument | Value | Description |
|---|---|---|
| --algorithm | from-scratch | Reset |
| | finetune | Finetune |
| | packnet | PackNet |
| | prog-net | ProgressiveNet |
| --env | ALE/SpaceInvaders-v5 | Sequence in SpaceInvaders |
| | ALE/Freeway-v5 | Sequence in Freeway |

# example: packnet
python run_experiments.py --algorithm packnet --env ALE/SpaceInvaders-v5 --seed 1 --start-mode 0
We use a set of default hyperparameters without sweeping them; even without tuning, the results already suggest superior performance for FAME.
In the original code, run_experiments.py runs all baseline algorithms. Applying the same implementation logic directly to FAME is impractical because the meta buffer would require a huge amount of memory. We therefore modify the implementation logic and run FAME via run_ppo_FAME.py.
python run_ppo_FAME.py --model-type=FAME --env-id=ALE/Freeway-v5 --seed=0 --save-dir=agents --total-timesteps=1000000 --epoch_meta 200
python run_ppo_FAME.py --model-type=FAME --env-id=ALE/SpaceInvaders-v5 --seed=0 --save-dir=agents --total-timesteps=1000000 --epoch_meta 200
python process_results_pre.py # from event data to csv
python process_results.py # from csv to CRL-relevant metrics
This implementation in Atari is adapted from the released code of the paper Self-composing policies for scalable continual reinforcement learning (ICML 2024).
pip install -r Metaworld/requirements.txt
Note: The requirements file includes mujoco and metaworld. Please ensure you have the necessary system dependencies for MuJoCo installed.
To run the main experiments using FAME and standard baselines (Reset, Average, Finetune), use the following command:
python test_main.py --seed 0 --method buffer --gpu 0 --store_traj_num 10 --use_ttest 1 --env metaworld_sequence_set18

| Argument | Value | Description |
|---|---|---|
| --method | buffer | FAME-KL (Our method) |
| | buffer_wd | FAME-MD (Our method variant) |
| | independent | Reset (Baseline: Train from scratch) |
| | average | Average (Baseline: Parameter averaging) |
| | continue | Finetune (Baseline: Continual learning without regularization) |
| --env | metaworld_sequence_set6 | Sequence of 6 tasks |
| | metaworld_sequence_set12 | Sequence of 12 tasks |
| | metaworld_sequence_set18 | Sequence of 18 tasks |
| | metaworld_sequence_set22 | Sequence of CW10 |

The PackNet, ProgressiveNet, and CompoNet baselines are located in a separate directory.
Important: You must first run the simple algorithm (SAC) to generate the initial model for the first task, which is required by other baselines.
- Navigate to the experiment directory:
  cd Metaworld/baselines_packnet_progressivenet_componet/experiments/meta-world
- Run the experiment:
  python run_experiments.py --algorithm simple --seed 0 --start-mode 0 --task-sequence 6

| Argument | Value | Description |
|---|---|---|
| --algorithm | simple | SAC (Standard Soft Actor-Critic) |
| | packnet | PackNet |
| | prognet | ProgressiveNet |
| | componet | CompoNet |

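
Because the other baselines need the model produced by the `simple` (SAC) run, it can help to generate the commands in an order that always places that run first. This is a minimal sketch: the helper `ordered_runs` is hypothetical, and it assumes the other algorithms accept the same flags as the SAC example, which should be verified against run_experiments.py.

```python
import shlex


def ordered_runs(algorithms, seed=0, task_sequence=6):
    """Return run_experiments.py commands with the SAC ("simple") run first."""
    rest = [a for a in algorithms if a != "simple"]
    commands = []
    for algo in ["simple"] + rest:
        commands.append([
            "python", "run_experiments.py",
            "--algorithm", algo,
            "--seed", str(seed),
            "--start-mode", "0",  # assumption: same value as in the SAC example
            "--task-sequence", str(task_sequence),
        ])
    return commands


# Algorithm names are taken from the table; "simple" is moved to the front.
for cmd in ordered_runs(["packnet", "prognet", "componet"]):
    print(shlex.join(cmd))
```

The commands are meant to be run from the experiment directory given in the steps above.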
Please cite our paper if you use our implementation in your research:
@inproceedings{
sun2026principled,
title={Principled Fast and Meta Knowledge Learners for Continual Reinforcement Learning},
author={Ke Sun and Hongming Zhang and Jun Jin and Chao Gao and Xi Chen and Wulong Liu and Linglong Kong},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=loNTDX3wTn}
}
