This is the official implementation of the paper "Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge", accepted at IJCAI'24.
The effectiveness of model training relies heavily on the quality of the available training data. However, budget constraints often limit data collection. To tackle this challenge, we introduce causal exploration, a strategy that leverages the underlying causal knowledge for both data collection and model training. In particular, we focus on enhancing the sample efficiency and reliability of world model learning in task-agnostic reinforcement learning. During the exploration phase, the agent actively selects actions expected to yield the causal insights most beneficial for world model training. Concurrently, the causal knowledge is acquired and incrementally refined as data collection proceeds. We demonstrate that causal exploration helps learn accurate world models with less data, and we provide theoretical guarantees for its convergence. Empirical experiments on both synthetic data and real-world applications further validate the benefits of causal exploration.
Our key contributions are summarized as:
- To enhance the sample efficiency and reliability of model training with causal knowledge, we introduce a novel concept, causal exploration, focusing on task-agnostic reinforcement learning.
- To efficiently learn and use causal structural constraints, we develop an online method for causal discovery and formulate the world model with explicit structural embeddings. During exploration, we train the dynamics model under a novel weight-sharing-decomposition schema that avoids additional computational overhead.
- Theoretically, we show that, given strong convexity and smoothness assumptions, our approach attains a superior convergence rate compared to non-causal methods. Empirical experiments further demonstrate the robustness of our online causal discovery method and validate the effectiveness of causal exploration across a range of demanding reinforcement learning environments.
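To make the idea of a world model with explicit structural constraints concrete, here is a minimal sketch, not the implementation in this repository. It assumes a linear dynamics model whose weights are gated by a binary causal mask; the names `masked_predict`, `weights`, and `mask` are hypothetical and chosen only for illustration.

```python
def masked_predict(weights, mask, state, action):
    """Predict the next state with a linear model gated by a causal mask.

    mask[i][j] == 1 means input j (a state or action dimension) is treated
    as a causal parent of next-state dimension i; 0 removes its influence.
    This linear form is an illustrative assumption, not the paper's model.
    """
    inputs = list(state) + list(action)
    return [
        sum(w * m * x for w, m, x in zip(w_row, m_row, inputs))
        for w_row, m_row in zip(weights, mask)
    ]


# Hypothetical example: 2 state dimensions and 1 action dimension.
mask = [[1, 0, 1],   # next_state[0] depends on state[0] and the action
        [0, 1, 0]]   # next_state[1] depends on state[1] only
weights = [[0.5, 2.0, 1.0],
           [3.0, 0.9, 4.0]]

pred_a = masked_predict(weights, mask, state=[1.0, 2.0], action=[1.0])
pred_b = masked_predict(weights, mask, state=[1.0, 2.0], action=[5.0])
```

Because the action is masked out for the second dimension, `pred_a[1]` and `pred_b[1]` coincide, while the first dimension changes with the action.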
Throughout the process, the agent, guided by policy
Assumption 1.
The following theorem shows a reduced error bound with causal exploration.
Theorem 1. Suppose Assumption 1 holds, and suppose the density of the causal matrix
The first inequality establishes an upper bound for
Synthetic Datasets.
We build our simulated environment following the state space model with controls. When the agent takes an action
To install, run
cd ./simulation
conda env create -f environment.yml
or
cd ./simulation
conda create -n your_env_name python=3.7
conda activate your_env_name
pip install -r requirements.txt
To train, run
python main.py
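For readers who want a feel for a state space model with controls before running the code, the following is a minimal sketch under stated assumptions, not the environment shipped in './simulation'. It assumes linear transitions with Gaussian noise and sparse transition matrices whose sparsity is controlled by a density parameter; `make_sparse_matrix`, `step`, and all numeric values are hypothetical.

```python
import random


def make_sparse_matrix(n_rows, n_cols, density, rng):
    """Random matrix in which each entry is nonzero with probability `density`."""
    return [
        [rng.gauss(0.0, 0.5) if rng.random() < density else 0.0
         for _ in range(n_cols)]
        for _ in range(n_rows)
    ]


def step(state, action, A, B, noise_std, rng):
    """One transition: next_state = A @ state + B @ action + Gaussian noise."""
    next_state = []
    for a_row, b_row in zip(A, B):
        value = sum(a * s for a, s in zip(a_row, state))
        value += sum(b * u for b, u in zip(b_row, action))
        next_state.append(value + rng.gauss(0.0, noise_std))
    return next_state


rng = random.Random(0)
state_dim, action_dim = 5, 2  # assumed sizes, for illustration only
A = make_sparse_matrix(state_dim, state_dim, 0.3, rng)
B = make_sparse_matrix(state_dim, action_dim, 0.3, rng)

# Collect a short trajectory of (state, action, next_state) under random actions.
state = [0.0] * state_dim
trajectory = []
for _ in range(10):
    action = [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
    next_state = step(state, action, A, B, 0.01, rng)
    trajectory.append((state, action, next_state))
    state = next_state
```

The nonzero pattern of `A` and `B` plays the role of the causal structure here: a zero entry means the corresponding state or action dimension has no direct effect on that next-state dimension.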
Traffic Signal Control.
Traffic signal control is an important means of mitigating congestion in traffic management. Unlike fixed-duration traffic signals, an RL agent learns a policy that sets traffic signal states in real time based on current road conditions. The state observed by the agent at each time step consists of five kinds of information: the number of vehicles, the queue length, and the average waiting time in each lane, plus the current and next traffic signal states. The action is to decide whether or not to change the traffic signal state. For example, suppose the traffic signal is red at time
You may first need to install the packages or environments listed in './traffic'; then start training by running
python main.py
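The state and action described above could be represented roughly as follows. This is an illustrative sketch only, not the data structures used in './traffic'; the class `TrafficObservation`, the function `apply_action`, and the assumption that signal phases follow a fixed cyclic order are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrafficObservation:
    vehicle_counts: List[int]        # number of vehicles in each lane
    queue_lengths: List[int]         # queue length in each lane
    avg_waiting_times: List[float]   # average waiting time in each lane
    current_phase: int               # current traffic signal state
    next_phase: int                  # traffic signal state that would follow


def apply_action(obs: TrafficObservation, change: bool,
                 num_phases: int = 2) -> Tuple[int, int]:
    """Return (current_phase, next_phase) after a binary keep/change action.

    Assumes phases cycle in a fixed order, which is an illustrative choice.
    """
    if not change:
        return obs.current_phase, obs.next_phase
    new_current = obs.next_phase
    new_next = (new_current + 1) % num_phases
    return new_current, new_next


obs = TrafficObservation(
    vehicle_counts=[4, 1],
    queue_lengths=[3, 0],
    avg_waiting_times=[12.5, 0.0],
    current_phase=0,
    next_phase=1,
)
kept = apply_action(obs, change=False)
switched = apply_action(obs, change=True)
```

With two phases, keeping the signal leaves the pair of phases untouched, while changing it swaps the current and next phases.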
MuJoCo Tasks. We also evaluate causal exploration on the challenging MuJoCo tasks, where the state-action dimensions range from tens (Hopper-v2) to hundreds (Humanoid-v2). Implementation details and more experimental results including the identified causal structures are given in Supplementary_material.pdf.
You may first need to install the packages or environments listed in './mujoco'; then start training by running
python main.py
If you find this work useful for your research, please cite our paper:
@inproceedings{yang-boosting-2024,
title = {Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge},
author = {Yang, Yupei and Huang, Biwei and Tu, Shikui and Xu, Lei},
booktitle = {Proceedings of the Thirty-Third International Joint Conference on
Artificial Intelligence, {IJCAI-24}},
publisher = {International Joint Conferences on Artificial Intelligence Organization},
pages = {5344--5352},
year = {2024},
month = {8},
}
Our code is partly based on the following GitHub repositories: IntelliLight and Mujoco-Pytorch. We thank their authors for their excellent work.