Goal Radius Randomization #299

Merged
riccardosavorgnan merged 6 commits into 3.0_beta from ricky/goal_radius_randomization on Feb 16, 2026

Conversation

@riccardosavorgnan (Collaborator) commented Feb 16, 2026

Goal radius randomization. Based on the work by Kevin; remaking this branch was simpler than rebasing the other branch after all the merge conflicts we had.

  • first commit
  • Working version with rendering working as well
  • commit with config comments

@riccardosavorgnan marked this pull request as ready for review on February 16, 2026, 01:22

greptile-apps bot commented Feb 16, 2026

Greptile Summary

This PR adds per-goal randomization of the goal radius for each agent. Previously, the goal radius was a fixed global value (env->goal_radius). Now, when reward_randomization is enabled, each time a new goal is sampled (via sample_new_goal), the goal radius is independently randomized from a uniform distribution over [reward_bound_goal_radius_min, reward_bound_goal_radius_max] (configured as 2.0–12.0m). When randomization is disabled, the fixed goal_radius value is used as before.

  • New sample_new_goal_radius function added and called at the end of sample_new_goal to decouple goal radius sampling from the rest of reward coefficient generation
  • Distance check in c_step and all three rendering paths updated to use the per-agent reward_coefs[REWARD_COEF_GOAL_RADIUS] instead of the global env->goal_radius (see the sketch after this list)
  • Goal radius assignment is duplicated between generate_reward_coefs (called at init/reset) and the new sample_new_goal_radius (called per new goal) — consider consolidating to reduce redundancy
  • Config comment updated to note that the fixed goal_radius value is only active when reward_randomization = 0
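To make the distance-check change concrete, here is a minimal C sketch; the field names (x, y, goal_x, goal_y), the coefficient index, and the trimmed Agent struct are assumptions for illustration, not the actual definitions in drive.h:

#include <math.h>
#include <stdbool.h>

#define REWARD_COEF_GOAL_RADIUS 0       /* index assumed for illustration */

typedef struct {                        /* trimmed, hypothetical Agent fields */
    float x, y;                         /* current position */
    float goal_x, goal_y;               /* current goal position */
    float reward_coefs[8];              /* per-agent reward coefficients */
} Agent;

/* The goal check now compares against the per-agent radius rather than
   the global env->goal_radius. */
static bool within_goal_radius(const Agent *agent) {
    float dx = agent->goal_x - agent->x;
    float dy = agent->goal_y - agent->y;
    float distance = sqrtf(dx * dx + dy * dy);
    return distance < agent->reward_coefs[REWARD_COEF_GOAL_RADIUS];
}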

Confidence Score: 4/5

  • This PR is safe to merge — the logic is correct and the changes are well-scoped to goal radius randomization.
  • Score of 4 reflects that the core logic is correct and consistent across all code paths (step logic, rendering modes). The only concerns are stylistic: duplicated goal radius assignment between two functions and mixed comment syntax in the INI file. No functional bugs or runtime risks identified.
  • pufferlib/ocean/drive/drive.h — verify that the duplicated goal radius logic between generate_reward_coefs and sample_new_goal_radius behaves as intended across all goal behavior modes.

Important Files Changed

  • pufferlib/config/ocean/drive.ini: Comment for goal_radius updated to clarify that it is only active when reward_randomization = 0; minor mixed comment syntax (; and //) in the INI file.
  • pufferlib/ocean/drive/drive.h: Adds the sample_new_goal_radius function for per-goal radius randomization and updates the distance check and all rendering paths to use the per-agent radius. Logic is correct but duplicates part of generate_reward_coefs.

Flowchart

flowchart TD
    A[Agent Init / Reset] -->|"generate_reward_coefs()"| B[Set all reward coefs including goal_radius]
    B --> C{goal_behavior?}
    C -->|GOAL_GENERATE_NEW| D[Agent reaches goal]
    D --> E["sample_new_goal()"]
    E --> F["sample_new_goal_radius()"]
    F --> G{reward_randomization?}
    G -->|Yes| H["random_uniform(min, max)"]
    G -->|No| I["env->goal_radius (fixed)"]
    H --> J[Set agent reward_coefs GOAL_RADIUS]
    I --> J
    J --> K["within_distance = distance < agent.reward_coefs[GOAL_RADIUS]"]
    C -->|GOAL_RESPAWN| L[Agent respawns]
    L -->|"generate_reward_coefs()"| M[Goal radius set at respawn]
    M --> K
    C -->|GOAL_STOP| N[Agent stops at goal]
    N --> K

Last reviewed commit: b73b835

@greptile-apps (bot) left a comment

2 files reviewed, 2 comments


Comment on lines +469 to +479
static void sample_new_goal_radius(Drive *env, Agent *agent) {
    if (env->reward_randomization) {
        // Standard Uniform Randomization
        agent->reward_coefs[REWARD_COEF_GOAL_RADIUS] = random_uniform(
            env->reward_bounds[REWARD_COEF_GOAL_RADIUS].min_val,
            env->reward_bounds[REWARD_COEF_GOAL_RADIUS].max_val);
    } else {
        // Fixed coefficients
        agent->reward_coefs[REWARD_COEF_GOAL_RADIUS] = env->goal_radius;
    }
}
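For context, a minimal sketch of where this is invoked; the surrounding goal-sampling logic is elided and the body shown for sample_new_goal is an assumption, not the actual drive.h implementation:

static void sample_new_goal(Drive *env, Agent *agent) {
    /* ... choose the agent's new goal position (elided) ... */

    // Called at the end of sample_new_goal so the radius is re-sampled
    // (or reset to the fixed value) for every freshly generated goal.
    sample_new_goal_radius(env, agent);
}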

Duplicated goal radius logic with generate_reward_coefs
generate_reward_coefs (lines 417-418 and 449) already sets REWARD_COEF_GOAL_RADIUS using the same logic. When goal_behavior == GOAL_GENERATE_NEW, the value set by generate_reward_coefs at init/reset is immediately overwritten by this function when sample_new_goal is first called.

This isn't a bug — the separation is intentional so the goal radius can be re-randomized per goal. But you may want to consider removing the REWARD_COEF_GOAL_RADIUS assignment from generate_reward_coefs and instead calling sample_new_goal_radius directly after generate_reward_coefs in the init/reset paths. This would make sample_new_goal_radius the single source of truth for goal radius assignment, reducing duplication.
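A minimal sketch of that consolidation, assuming a combined init/reset helper (reset_agent_rewards is a hypothetical name, not existing code):

static void reset_agent_rewards(Drive *env, Agent *agent) {
    // generate_reward_coefs would no longer touch REWARD_COEF_GOAL_RADIUS
    generate_reward_coefs(env, agent);
    // sample_new_goal_radius becomes the single source of truth for the radius
    sample_new_goal_radius(env, agent);
}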

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@mpragnay left a comment

lgtm!

@riccardosavorgnan merged commit dddf28e into 3.0_beta on Feb 16, 2026
10 checks passed
@riccardosavorgnan deleted the ricky/goal_radius_randomization branch on February 16, 2026, 13:58