Add changes from vcha/stable by vcharraut · Pull Request #436 · Emerge-Lab/PufferDrive

vcharraut · 2026-05-21T16:25:26Z

No description provided.

… clarity
- Added 'amp' option to default.ini for automatic mixed precision support.
- Introduced 'resume_state_path' in default.ini for state restoration.
- Updated compilation settings in default.ini for better compatibility.
- Refined Waypoint structure in datatypes.h for clarity.
- Modified Drive class in drive.h to improve collision handling and agent initialization.
- Enhanced observation handling in drive.py, including padded observations and traffic control features.
- Implemented utility functions in pufferl.py for better device management and state handling.
- Improved training state loading and saving mechanisms in PuffeRL class.
- Adjusted training logic to support advanced features like mixed precision and dynamic batching.


Copilot

Pull request overview

This PR appears to merge in “stable” changes that extend PufferDrive’s training loop with improved checkpoint/resume support, additional evaluation utilities (multi-scenario evaluation + CSV export), and several Drive environment/config updates.

Changes:

- Extend PuffeRL with precision/AMP handling, state dict key cleaning, richer checkpoint state, and resume-from-state support.
- Add standalone multi-scenario evaluation helpers (config merging, overrides, CSV export, coverage verification, logging).
- Update Drive env observation construction/padding and configs (including new INI defaults and new weight config YAMLs).

Reviewed changes

Copilot reviewed 9 out of 45 changed files in this pull request and generated 8 comments.


| File | Description |
| --- | --- |
| `weigths/tomate/config.yaml` | Adds a new experiment/config preset for training/eval. |
| `weigths/salade/config.yaml` | Adds another experiment/config preset for training/eval. |
| `pufferlib/pufferl.py` | Major training/eval refactor: AMP/precision validation, compile tweaks, checkpoint state v2 + RNG capture/restore, resume, and new multi-scenario eval utilities. |
| `pufferlib/ocean/torch.py` | Refactors encoder+pooling and aligns one-hot dtypes with continuous features. |
| `pufferlib/ocean/drive/drive.py` | Adjusts `control_mode` error message text. |
| `pufferlib/ocean/drive/drive.h` | Changes observation padding strategy and removes a zero-drivable-cells guard; minor control logic tweak. |
| `pufferlib/ocean/drive/datatypes.h` | Edits a struct field comment. |
| `pufferlib/config/ocean/drive.ini` | Updates map_dir and adds an `[eval]` section with multi-scenario eval config. |
| `pufferlib/config/default.ini` | Adds `amp` and `resume_state_path` defaults; changes torch.compile defaults. |
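The summary mentions checkpoint state v2 with RNG capture/restore and a `resume_state_path` default. A minimal sketch of the idea using only Python's `random` module, assuming a dict-based checkpoint; the key names (`version`, `epoch`, `global_step`, `python_rng`) and both helpers are hypothetical, and a real PyTorch trainer would also need to save NumPy and torch (CPU/CUDA) generator states.

```python
import random

CHECKPOINT_VERSION = 2  # hypothetical version tag for the checkpoint layout


def capture_training_state(epoch: int, global_step: int) -> dict:
    """Bundle resumable counters with the Python RNG state (hypothetical layout)."""
    return {
        "version": CHECKPOINT_VERSION,
        "epoch": epoch,
        "global_step": global_step,
        "python_rng": random.getstate(),
    }


def restore_training_state(state: dict) -> tuple:
    """Reject unknown layouts, restore the RNG, and return (epoch, global_step)."""
    if state.get("version") != CHECKPOINT_VERSION:
        raise ValueError(f"unsupported checkpoint version: {state.get('version')!r}")
    random.setstate(state["python_rng"])
    return state["epoch"], state["global_step"]
```

Restoring the RNG alongside the counters means a resumed run draws the same random sequence it would have drawn had it never stopped, which is what makes resume-from-state reproducible.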


+[eval]
+; Set to True to enable periodic multi-scenario evaluation during training
+multi_scenario_eval = False
+; Frequency of evaluation during training (in epochs)
+eval_interval = 25
+num_agents = 512
+; Batch size for eval_multi_scenarios (number of scenarios per batch)
+; Path to dataset used for evaluation
+map_dir = "pufferlib/resources/drive/binaries/eval"
+; Simulation mode for evaluation: "gigaflow" or "replay"
+multi_scenario_simulation_mode = "replay"
+; Total number of scenarios to evaluate
+multi_scenario_num_scenarios = 250
+backend = PufferEnv
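The `[eval]` keys above could be consumed roughly as follows. This is a hedged sketch, not PufferLib's actual config loader: it assumes the section is parsed with Python's `configparser` and that quoted strings, booleans, and numbers are interpreted with `ast.literal_eval`. `load_eval_section`, `parse_value`, and `should_run_eval` are hypothetical names, and treating epoch 0 as "no eval" is an assumption.

```python
import ast
import configparser

EVAL_INI = """
[eval]
; Set to True to enable periodic multi-scenario evaluation during training
multi_scenario_eval = False
; Frequency of evaluation during training (in epochs)
eval_interval = 25
num_agents = 512
map_dir = "pufferlib/resources/drive/binaries/eval"
; Simulation mode for evaluation: "gigaflow" or "replay"
multi_scenario_simulation_mode = "replay"
multi_scenario_num_scenarios = 250
backend = PufferEnv
"""

VALID_SIMULATION_MODES = ("gigaflow", "replay")  # taken from the INI comment


def parse_value(raw: str):
    """Interpret an INI value as a Python literal when possible (hypothetical helper)."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # bare words such as PufferEnv stay plain strings


def load_eval_section(text: str) -> dict:
    parser = configparser.ConfigParser()  # ';' lines are treated as comments
    parser.read_string(text)
    cfg = {key: parse_value(raw) for key, raw in parser["eval"].items()}
    mode = cfg.get("multi_scenario_simulation_mode")
    if mode not in VALID_SIMULATION_MODES:
        raise ValueError(f"unknown simulation mode: {mode!r}")
    return cfg


def should_run_eval(epoch: int, cfg: dict) -> bool:
    """Hypothetical trigger: evaluate every `eval_interval` epochs when enabled."""
    return bool(cfg["multi_scenario_eval"]) and epoch > 0 and epoch % cfg["eval_interval"] == 0
```

With the defaults shown, `multi_scenario_eval = False` keeps periodic evaluation off even at epochs that are multiples of 25.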


        else:
            raise ValueError(
-                f"control_mode must be one of 'control_vehicles', 'control_agents', 'control_wosac', or 'control_sdc_only'. Got: {self.control_mode_str}"
+                f"control_mode must be one of 'control_vehicles', 'control_wosac', or 'control_agents'. Got: {self.control_mode_str}"
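This hunk dropped `control_sdc_only` from the message, and a later commit in this PR restores it. One way to keep the error text from drifting away from the accepted values is to build it from a single tuple. A hypothetical sketch; the tuple and function names are illustrative, not the code in `drive.py`:

```python
VALID_CONTROL_MODES = (
    "control_vehicles",
    "control_agents",
    "control_wosac",
    "control_sdc_only",
)


def validate_control_mode(mode: str) -> str:
    """Return `mode` unchanged, or raise with a message listing every accepted value."""
    if mode not in VALID_CONTROL_MODES:
        accepted = ", ".join(repr(m) for m in VALID_CONTROL_MODES)
        raise ValueError(f"control_mode must be one of {accepted}. Got: {mode}")
    return mode
```

Adding a new mode then only requires touching the tuple; the message follows automatically.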


    float sin_heading; // Cached sinf(heading) - set in build_path
    float kappa;       // Curvature at this point
-    int lane_idx;      // Index of the lane this waypoint belongs to (for GT path) or closest to (for expert path)
+    int lane_idx;      // Index of the lane this waypoint


+    if model_path:
+        experiment_dir = os.path.dirname(os.path.dirname(model_path))
+        config_yaml_path = os.path.join(experiment_dir, "config.yaml")
+        EXCLUDE_KEYS = eval_overrides["env"].keys()
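The snippet assumes a checkpoint sits two directory levels below the experiment folder that holds `config.yaml` (for example `<experiment>/<subdir>/<model file>`). A hedged sketch of that lookup and of letting evaluation overrides win for the excluded keys; the directory layout, `resolve_config_path`, and `merge_env_config` are assumptions for illustration, and YAML loading is left out.

```python
import os


def resolve_config_path(model_path: str) -> str:
    """Map <experiment>/<subdir>/<model file> to <experiment>/config.yaml."""
    experiment_dir = os.path.dirname(os.path.dirname(model_path))
    return os.path.join(experiment_dir, "config.yaml")


def merge_env_config(train_env: dict, eval_env_overrides: dict) -> dict:
    """Start from the training env config; keys present in the eval overrides always win."""
    exclude_keys = set(eval_env_overrides)  # mirrors EXCLUDE_KEYS = eval_overrides["env"].keys()
    merged = {k: v for k, v in train_env.items() if k not in exclude_keys}
    merged.update(eval_env_overrides)
    return merged
```

Taking the training config as the base keeps the evaluated environment consistent with how the policy was trained, while keys such as the eval `map_dir` or `num_agents` still come from the eval section.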


+                # Multi-worker backend returns infos as list of lists (one per worker)
+                if infos and infos[0]:
+                    for sub_env in infos:
+                        for env_idx, summary in enumerate(sub_env):
+                            env_map_name = summary["map_name"].split("/")[-1].split(".")[0]
+                            summary["episode_id"] = env_idx
+                            summary["map_name"] = env_map_name
+                            scenarios_processed += 1
+                            pbar.update(1)
+
+                            for k, v in summary.items():
+                                if k not in global_infos:
+                                    global_infos[k] = []
+                                global_infos[k].append(v)
+
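In the loop above, `episode_id` is the index within one worker's list, so ids repeat across workers and across batches; the `infos[0]` guard would also skip a batch whenever the first worker has nothing to report, even if later workers do. A hedged alternative that numbers episodes with a running counter; `collect_episode_summaries` is hypothetical and, like the original, it assumes every summary carries the same keys so the columns stay aligned.

```python
from collections import defaultdict


def collect_episode_summaries(infos, global_infos, next_id=0):
    """Append per-episode summaries from a multi-worker backend.

    infos: list with one list of summary dicts per worker.
    global_infos: defaultdict(list) mapping column name -> values.
    Returns the next unused episode id so the caller can carry it across batches.
    """
    for worker_infos in infos:
        for summary in worker_infos:
            row = dict(summary)  # copy so the caller's dicts are not mutated
            # Same normalization as the hunk: "dir/map_042.bin" -> "map_042"
            row["map_name"] = row["map_name"].split("/")[-1].split(".")[0]
            row["episode_id"] = next_id
            next_id += 1
            for key, value in row.items():
                global_infos[key].append(value)
    return next_id
```

Empty worker lists are simply skipped by the inner loop, so no explicit guard is needed.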


+    try:
+        df_episodes = pd.DataFrame(global_infos)
+        first_cols = ["episode_id", "map_name"]
+        other_cols = [col for col in df_episodes.columns if col not in first_cols]
+        new_col_order = first_cols + other_cols
+        df_episodes = df_episodes[new_col_order]
+
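The hunk is cut off before its `except` clause. For reference, the same "`episode_id` and `map_name` first" column ordering can be written out with the standard-library `csv` module instead of pandas; this is a swapped-in, dependency-free sketch, and `episodes_to_csv` is a hypothetical name. It checks up front that all columns have equal length, which `pd.DataFrame` would otherwise reject with a `ValueError`.

```python
import csv
import io


def episodes_to_csv(global_infos: dict, first_cols=("episode_id", "map_name")) -> str:
    """Serialize column lists to CSV text with `first_cols` leading (hypothetical helper)."""
    lengths = {len(values) for values in global_infos.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have mismatched lengths: {sorted(lengths)}")
    leading = [col for col in first_cols if col in global_infos]
    ordered = leading + [col for col in global_infos if col not in leading]
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(ordered)  # header row
    writer.writerows(zip(*(global_infos[col] for col in ordered)))  # one row per episode
    return buf.getvalue()
```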


-            return;
-        }
        int num_agents_to_create = env->num_controllable_agents;



+static inline void fill_padded_observation_rows(float *obs, int rows, int features) {
+    for (int r = 0; r < rows; r++) {
+        for (int c = 0; c < features; c++) {
+            obs[r * features + c] = PADDED_OBSERVATION_VALUE;
+        }
+    }
+}

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Add one-line comments to fill_padded_observation_rows / fill_padded_traffic_control_rows, and pull the road-edge heading fold into a reusable wrap_heading(angle) helper (folds a heading into [-pi/2, pi/2] so opposite directions map to one orientation). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The message omitted control_sdc_only (a valid mode → control_mode=3); list all four accepted values. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The folded range is [-pi/2, pi/2], not the [-pi, pi] that "wrap" implies, so the helper name was misleading. Inline it back at the road-edge block and replace it with a comment that states why the fold exists. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
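The commit messages describe folding a road-edge heading into [-pi/2, pi/2] so that opposite directions map to one orientation. A Python sketch of that fold, assuming the input is an arbitrary angle in radians; `fold_heading` is a hypothetical name, and the inlined C in `drive.h` may handle inputs outside [-pi, pi] differently.

```python
import math


def fold_heading(angle: float) -> float:
    """Map angle and angle + pi to one orientation in [-pi/2, pi/2].

    At the +/-pi/2 boundary both endpoints can appear; they describe the same orientation.
    """
    # First wrap into [-pi, pi).
    a = (angle + math.pi) % (2.0 * math.pi) - math.pi
    # Then shift by half a turn when outside [-pi/2, pi/2]; an undirected
    # road edge has the same orientation either way.
    if a > math.pi / 2:
        a -= math.pi
    elif a < -math.pi / 2:
        a += math.pi
    return a
```

As the last commit notes, this is a fold rather than a wrap: a true wrap would keep results in [-pi, pi] and preserve direction.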

eugenevinitsky · 2026-05-22T14:16:04Z

+                logits, value = self.policy.forward_eval(o_device.to(self.observations.dtype), state)
+                logits = logits_to_float(logits)
+                value = value.float()


@vcharraut why are we doing a cast here?

To support bfloat16 training
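For context on the reply above: bfloat16 keeps only 8 bits of significand precision, so values in [2, 4) are spaced 2^-6 = 0.015625 apart. A stdlib-only sketch that rounds floats to bfloat16 (round-half-to-even on the float32 bit pattern, finite inputs within float32 range) shows how a small log-probability difference can vanish if it were computed in bfloat16, leaving a PPO probability ratio of exactly 1. `to_bfloat16` is a hypothetical helper for illustration; it does not reproduce `logits_to_float`, whose implementation is not shown here.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (finite inputs; ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1                               # last bit bfloat16 keeps
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000            # round, then drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


old_logp, new_logp = -2.0, -2.001
ratio_fp64 = math.exp(new_logp - old_logp)                            # slightly below 1
ratio_bf16 = math.exp(to_bfloat16(new_logp) - to_bfloat16(old_logp))  # exactly 1.0
```

Casting logits and values back to float32 before computing log-probabilities, ratios, and losses avoids this loss of resolution while the forward pass itself can still run in bfloat16.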

eugenevinitsky · 2026-05-22T14:17:50Z

+            clipfrac = ((ratio - 1.0).abs() > config["clip_coef"]).float().mean()

-        mb_adv = (mb_adv - mb_adv.mean()) / (mb_adv.std(unbiased=unbiased_std) + 1e-8)
+        mb_adv = (mb_adv - mb_adv.mean()) / (mb_adv.std(unbiased=False) + 1e-8)


why this change?

The value was False for PPO with advantage filtering and True with advantage sampling; during the refactor I set it to False by default. There isn't much thought behind the choice.
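On the `unbiased` question: the only difference is whether the standard deviation divides by N (population, `unbiased=False`) or N - 1 (sample, Bessel-corrected). A stdlib sketch using `statistics.pstdev` and `statistics.stdev` as stand-ins for `torch.std(unbiased=...)`; `normalize_advantages` is a hypothetical name. With `unbiased=True` the normalized advantages end up with a population standard deviation of sqrt((N - 1) / N) instead of 1, a gap that shrinks as the minibatch grows.

```python
import statistics


def normalize_advantages(adv, unbiased=False, eps=1e-8):
    """Zero-mean, unit-scale advantages; `unbiased` picks N - 1 vs N in the std."""
    mean = statistics.fmean(adv)
    std = statistics.stdev(adv) if unbiased else statistics.pstdev(adv)
    return [(a - mean) / (std + eps) for a in adv]
```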


vcharraut added 5 commits May 21, 2026 18:00

Add multi-scenario evaluation configuration and functions for enhanced training evaluation

9b1485f

Update map directory and remove deprecated binary files for improved resource management

359a406

Add Salade

bcc3d5c

Add Tomate

94dbd7c

Copilot AI review requested due to automatic review settings May 21, 2026 16:25

Copilot started reviewing on behalf of vcharraut May 21, 2026 16:25

Copilot AI reviewed May 21, 2026


Eugene Vinitsky and others added 4 commits May 21, 2026 12:07

Fix misspelled weights directory: weigths -> weights

0dbdd23

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

drive.py: list control_sdc_only in the control_mode error message

5fbb741

The message omitted control_sdc_only (a valid mode → control_mode=3); list all four accepted values. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

eugenevinitsky reviewed May 22, 2026


Comment thread pufferlib/pufferl.py Outdated

eugenevinitsky reviewed May 22, 2026


Comment thread pufferlib/pufferl.py

eugenevinitsky reviewed May 22, 2026


vcharraut and others added 3 commits May 22, 2026 17:11

Remove duplicated cast

e2554f7

Remove useless casts

72cd5cc

eval: consolidate onto the unified Evaluator pipeline (+ viewer/docs) (#437)

0f8a730

eugenevinitsky merged commit cc5e0ea into emerge/temp_training May 22, 2026
11 checks passed


eugenevinitsky merged 12 commits into emerge/temp_training from vcha/update


3 participants
