# AlgorithmConfig API
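First create an instance of AlgorithmConfig, then call some of its methods to set various configuration options. RLlib uses the following black-compliant formatting throughout its code.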

In [1]:
from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

config = (
    # Create an `AlgorithmConfig` instance.
    AlgorithmConfig()
    # Change the learning rate.
    .training(lr=0.0005)
    # Change the number of Learner actors.
    .learners(num_learners=2)
)
config.environment(env="CartPole-v1")  # call the proper method

<ray.rllib.algorithms.algorithm_config.AlgorithmConfig at 0x7665c42fe3b0>
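The value echoed by the last call is the config object itself, which is why the setter methods can be chained inside parentheses or called one at a time. The following is a minimal sketch of that pattern in plain Python. `TinyConfig` and its fields are hypothetical stand-ins for illustration, not RLlib's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TinyConfig:
    """Hypothetical stand-in for a fluent config class (not RLlib code)."""

    env: Optional[str] = None
    lr: float = 0.001
    num_learners: int = 0

    def environment(self, env: str) -> "TinyConfig":
        self.env = env
        return self  # Returning `self` is what makes chaining possible.

    def training(self, lr: float) -> "TinyConfig":
        self.lr = lr
        return self

    def learners(self, num_learners: int) -> "TinyConfig":
        self.num_learners = num_learners
        return self


# Chained calls, formatted the way black lays them out.
config_sketch = (
    TinyConfig()
    .training(lr=0.0005)
    .learners(num_learners=2)
)

# A later, separate call mutates the same object and returns it again.
returned = config_sketch.environment(env="CartPole-v1")
print(returned is config_sketch)
print(config_sketch)
```

Because every setter returns the same instance, the chained and the statement-by-statement styles end up with identical settings.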

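## Algorithm-specific config classes
In practice, you don't use the base AlgorithmConfig class directly, but always one of its algorithm-specific subclasses, such as PPOConfig. Each subclass accepts its own set of additional arguments to the training() method.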


In [None]:
from ray.rllib.algorithms.impala import IMPALAConfig

config = (
    # Create an `IMPALAConfig` instance.
    IMPALAConfig()
    # Specify the RL environment.
    .environment("CartPole-v1")
    # Change the learning rate.
    .training(lr=0.0004)
)

# Change an IMPALA-specific setting (the entropy coefficient).
config.training(entropy_coeff=0.01)

# Build the algorithm instance.
impala = config.build_algo()

# Further alter the config without affecting the previously built IMPALA object ...
config.training(lr=0.00123)
# ... and build a new IMPALA from it.
another_impala = config.build_algo()



In [2]:
from ray import tune

tuner = tune.Tuner(
    "IMPALA",
    param_space=config,  # <- your RLlib AlgorithmConfig object
    run_config=tune.RunConfig(stop={"num_env_steps_sampled_lifetime": 4000}),
)
# Run the experiment with Ray Tune.
results = tuner.fit()

Current time: 2025-08-06 17:20:54
Running for: 00:00:39.63
Memory: 14.8/15.3 GiB

| Trial name | status | loc | iter | total time (s) | mean_num_learner_group_update_called | ...calls_since_last_synch_worker_weights | num_training_step_calls_per_iteration |
|---|---|---|---|---|---|---|---|
| IMPALA_CartPole-v1_8f71d_00000 | TERMINATED | 198.18.0.1:3984719 | 1 | 10.0061 | 0.00743095 | 35 | 3184 |


2025-08-06 17:20:54,176	INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/home/robotarm/ray_results/IMPALA_2025-08-06_17-20-14' in 0.0252s.
2025-08-06 17:20:54,214	INFO tune.py:1041 -- Total run time: 39.70 seconds (39.60 seconds for the tuning loop).


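## Generic config settings
Most config settings are generic and apply to all of RLlib's Algorithm classes. The following sections walk you through the most important settings, the ones to pay close attention to before exploring further options and before starting hyperparameter tuning.
### RL environment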
config.environment("Humanoid-v5")
### Learning rate `lr`
config.training(lr=0.0001)
### Train batch size
config.training(train_batch_size_per_learner=256)
### Discount factor `gamma`
config.training(gamma=0.995)
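To get a feel for what a value such as `gamma=0.995` means, it can help to compute the discounted return and the common rule-of-thumb "effective horizon" of 1 / (1 - gamma). The sketch below is plain Python and does not use RLlib. The helper names `discounted_return` and `effective_horizon` are introduced here for illustration only.

```python
def discounted_return(rewards, gamma):
    """Return the sum of gamma**t * rewards[t] over the sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma):
    """Rule-of-thumb number of steps that still matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


# Constant reward of 1.0 per step over a long episode.
rewards = [1.0] * 1000
for gamma in (0.99, 0.995):
    print(
        gamma,
        round(effective_horizon(gamma)),
        round(discounted_return(rewards, gamma), 2),
    )
```

Moving `gamma` from 0.99 to 0.995 roughly doubles the effective horizon, so rewards further in the future carry more weight in the return.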
### Scaling with `num_env_runners` and `num_learners`
config.env_runners(num_env_runners=4)

'''Also use `num_envs_per_env_runner` to vectorize your environment on each EnvRunner actor.
Note that this option is only available in single-agent setups.
The Ray Team is working on a solution for this restriction. '''

config.env_runners(num_envs_per_env_runner=10)

config.learners(num_learners=2)
### Disabling `explore` behavior
'''Disable exploration behavior.
When False, the EnvRunner calls `forward_inference()` on the RLModule to compute
actions instead of `forward_exploration()`. '''

config.env_runners(explore=False)
### Rollout length
config.env_runners(rollout_fragment_length=50)
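
The scaling, batch size, and rollout settings above interact, so a quick back-of-the-envelope calculation can be useful before launching a run. The sketch below is plain Python arithmetic, not an RLlib API. It assumes that every environment copy on every EnvRunner contributes `rollout_fragment_length` steps per sampling round, and that the overall train batch is `train_batch_size_per_learner` times the number of Learners. How RLlib actually counts steps can differ by algorithm and version, so treat the numbers as rough estimates.

```python
def env_steps_per_sampling_round(
    num_env_runners, num_envs_per_env_runner, rollout_fragment_length
):
    """Estimated env steps collected in one round (assumption, see text)."""
    return num_env_runners * num_envs_per_env_runner * rollout_fragment_length


def total_train_batch_size(train_batch_size_per_learner, num_learners):
    """Estimated overall batch size; 0 Learners is treated as one local Learner."""
    return train_batch_size_per_learner * max(num_learners, 1)


# Values taken from the settings shown in this section.
steps = env_steps_per_sampling_round(
    num_env_runners=4,
    num_envs_per_env_runner=10,
    rollout_fragment_length=50,
)
batch = total_train_batch_size(train_batch_size_per_learner=256, num_learners=2)
print(steps, batch)
```

Comparing the two estimates shows whether sampling or learning is likely to be the bottleneck for a given combination of settings.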