Obstacle-aware reaching for a FANUC LR Mate 200iC/5L + SG2 tool using Isaac Lab DirectRLEnv.
Train a PPO policy to reach a target TCP position while avoiding randomized obstacles, including partially or fully infeasible layouts.
- Overview
- Key features
- Components
- System architecture
- Requirements
- Quick start
- Run options
- Verify your setup
- Task details
- Environment design
- Invalid scenes and metrics
- Key configuration files
- Asset conversion (URDF to USD)
- Repo layout
- Roadmap
- References
Task ID: `Isaac-Reach-Fanuc-v0`
This repository implements a direct-workflow RL manipulation task (Isaac Lab DirectRLEnv) for a 6-DOF industrial arm:
- Reach a sampled 3D target TCP pose
- Avoid collisions with 0 to 2 rigid obstacles
- Produce smooth, bounded joint motion under randomized scene layouts
The design explicitly addresses a common manipulation RL reality: randomized scenes can create targets that are kinematically reachable in free space but geometrically blocked once obstacles are introduced.
- Direct-workflow environment (custom reset, reward, dones, and sampling logic)
- PPO training using RSL-RL scripts
- Parallel-training friendly (headless mode supported)
- Scene randomization (targets + obstacles) with retry budgets
- Curriculum staging (0 obstacles to 2 obstacles) driven by success rate
- Invalid-scene handling to prevent silent training corruption
- URDF-to-USD conversion workflow for Isaac Sim articulations
| Component | Purpose |
|---|---|
| FANUC LR Mate 200iC/5L (sim) | 6-DOF industrial arm used for reaching |
| SG2 tool (sim) | End-effector geometry and TCP frame correctness |
| Isaac Sim 5.1 | Physics (PhysX), contacts, rigid bodies, USD pipeline |
| Isaac Lab 5.1 | RL environment scaffolding (DirectRLEnv) and vectorized execution |
| RSL-RL (PPO) | Training loop, logging, checkpointing, policy rollout |
| USD assets | Robot articulation asset used in the environment |
- Install the project as an editable Isaac Lab extension
- Confirm the task is registered
- Run a no-learning sanity check (zero-agent)
- Train PPO headless for speed
- Play back a checkpoint to validate behavior
- Iterate on sampling, rewards, curriculum, and constraints
```
┌────────────────────────────────────────────┐
│              Isaac Sim (PhysX)             │
│    Robot articulation + rigid obstacles    │
└─────────────────────┬──────────────────────┘
                      │  (sim step)
                      ▼
┌────────────────────────────────────────────┐
│           Isaac Lab DirectRLEnv            │
│    reset() / dones() / reward() / obs()    │
│    target + obstacle sampling + validity   │
└─────────────────────┬──────────────────────┘
                      │  (vectorized env API)
                      ▼
┌────────────────────────────────────────────┐
│                 RSL-RL PPO                 │
│   rollout -> optimize -> checkpoint/logs   │
└─────────────────────┬──────────────────────┘
                      │  (actions)
                      ▼
          Joint targets / torques
```
- Isaac Sim 5.1 + Isaac Lab installed
- Python 3.11 (Isaac Sim 5.x requirement)
- Conda environment assumed: `env_isaaclab`
- NVIDIA GPU for parallel environments and faster training
- Enough VRAM to scale `num_envs` safely
From the repository root:
```bash
conda activate env_isaaclab
python -m pip install -e source/RL_Arm_Controller
python scripts/list_envs.py
python scripts/zero_agent.py --task Isaac-Reach-Fanuc-v0
python scripts/rsl_rl/train.py --task Isaac-Reach-Fanuc-v0 --headless
```

Monitor training with TensorBoard:

```bash
# All runs for this experiment
tensorboard serve --logdir logs/rsl_rl/reach_fanuc

# Or a single run
tensorboard serve --logdir logs/rsl_rl/reach_fanuc/<run_dir>
```

Open http://localhost:6006 in a browser.
```bash
# Option A: let play.py pick the latest checkpoint in the run folder
python scripts/rsl_rl/play.py --task Isaac-Reach-Fanuc-v0 --num_envs 32 --load_run <run_dir>

# Option B: specify a full path to the checkpoint file
python scripts/rsl_rl/play.py --task Isaac-Reach-Fanuc-v0 --num_envs 32 --checkpoint logs/rsl_rl/reach_fanuc/<run_dir>/model_499.pt
```

Remove `--headless` to watch rollouts and confirm that resets, collisions, and reward shaping behave as intended.
Training can be launched from the project conda environment or through the Isaac Lab launcher:

```bash
# From the project conda environment
python scripts/rsl_rl/train.py --task Isaac-Reach-Fanuc-v0 --headless

# Or via the Isaac Lab launcher (Windows)
C:\repos\IsaacLab\isaaclab.bat -p scripts\rsl_rl\train.py --task Isaac-Reach-Fanuc-v0 --headless
```

To verify the setup before long runs, list the registered tasks, step the environment with the zero and random agents, and check the GPU:

```bash
python scripts/list_envs.py
python scripts/zero_agent.py --task Isaac-Reach-Fanuc-v0
python scripts/random_agent.py --task Isaac-Reach-Fanuc-v0
nvidia-smi
```

TCP reference:

- Path: `/lrmate200ic5l_with_sg2/link_6/flange/tool0/sg2/tcp`
- Forward axis: `+Z`
High-level objectives:
- Minimize TCP-to-target distance
- Avoid collisions with obstacles
- Encourage smooth, bounded joint motion
Environment IO:
- Actions: 6D joint delta targets (scaled, clamped to joint limits)
- Observations: scaled joint positions, joint velocities, TCP-to-target vector, optional obstacle-relative positions, optional previous actions
- Termination: success, collision (if enabled), timeout, optional stuck detection
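A minimal sketch of how such an observation vector can be assembled with plain PyTorch (tensor names such as `joint_pos`, `tcp_pos_w`, and `target_pos_w` are illustrative, not the exact attributes used in `rl_arm_controller_env.py`):

```python
import torch


def build_observations(
    joint_pos: torch.Tensor,      # (num_envs, 6) joint positions [rad]
    joint_vel: torch.Tensor,      # (num_envs, 6) joint velocities
    tcp_pos_w: torch.Tensor,      # (num_envs, 3) TCP position, world frame
    target_pos_w: torch.Tensor,   # (num_envs, 3) sampled target position
    joint_limits: torch.Tensor,   # (6, 2) lower/upper joint limits
    prev_actions: torch.Tensor,   # (num_envs, 6) last commanded deltas
) -> torch.Tensor:
    """Concatenate the per-env observation terms listed above into one tensor."""
    # Scale joint positions to [-1, 1] using the joint limits.
    lower, upper = joint_limits[:, 0], joint_limits[:, 1]
    joint_pos_scaled = 2.0 * (joint_pos - lower) / (upper - lower) - 1.0
    # Relative TCP-to-target vector keeps the policy translation-invariant.
    tcp_to_target = target_pos_w - tcp_pos_w
    return torch.cat(
        [joint_pos_scaled, joint_vel, tcp_to_target, prev_actions], dim=-1
    )
```

The reward terms and their weights are summarized in the table below.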
| Term | Description | Scale | Notes |
|---|---|---|---|
| distance | -dist(TCP, target) | 0.1 | Always on |
| success | +reward on success | 100.0 | dist < success_tolerance |
| success_time | +time bonus on success | 50.0 | Scales with earlier success |
| progress | +previous_dist - dist | 1.0 | Only after first step |
| action_l2 | -sum(action^2) | -0.01 | Smoothness penalty |
| action_rate | -sum((a_t - a_{t-1})^2) | -0.01 | Penalizes jerks |
| joint_vel_l2 | -sum(joint_vel^2) | -0.005 | Damps motion |
| collision | penalty on collision | -10.0 | Once per episode by default |
| approach | +align TCP forward to target | 0.0 | Enabled if nonzero |
| proximity | +min_dist - radius | 1.0 | Only when inside proximity_radius |
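As a rough sketch, the dense terms from the table combine into a per-step reward along these lines (weights follow the table magnitudes, with penalties applied as negative contributions; variable names are illustrative):

```python
import torch


def reach_reward(
    dist: torch.Tensor,            # (num_envs,) TCP-to-target distance [m]
    prev_dist: torch.Tensor,       # (num_envs,) distance at the previous step
    actions: torch.Tensor,         # (num_envs, 6) current action
    prev_actions: torch.Tensor,    # (num_envs, 6) previous action
    joint_vel: torch.Tensor,       # (num_envs, 6) joint velocities
    success: torch.Tensor,         # (num_envs,) bool, dist < success_tolerance
    collided: torch.Tensor,        # (num_envs,) bool, new contact this step
    time_left_frac: torch.Tensor,  # (num_envs,) fraction of episode remaining
) -> torch.Tensor:
    reward = -0.1 * dist                                                 # distance
    reward += 1.0 * (prev_dist - dist)                                   # progress
    reward += -0.01 * torch.sum(actions**2, dim=-1)                      # action_l2
    reward += -0.01 * torch.sum((actions - prev_actions) ** 2, dim=-1)   # action_rate
    reward += -0.005 * torch.sum(joint_vel**2, dim=-1)                   # joint_vel_l2
    reward += 100.0 * success.float()                                    # success
    reward += 50.0 * success.float() * time_left_frac                    # success_time
    reward += -10.0 * collided.float()                                   # collision
    return reward
```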
This task uses the Isaac Lab direct workflow (DirectRLEnv), meaning environment logic is implemented explicitly:
- scene setup
- reset logic
- termination logic
- reward computation
- observation computation
Conceptually, this is closest to how you would write a custom env for real manipulation RL research when you need full control over sampling and edge cases.
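A bare-bones skeleton of what such a direct-workflow environment looks like (the real implementation lives in `rl_arm_controller_env.py`; the import path below assumes a recent Isaac Lab layout and may differ by version):

```python
import torch
from isaaclab.envs import DirectRLEnv  # module path may differ across Isaac Lab versions


class ReachFanucEnv(DirectRLEnv):
    """Illustrative skeleton of the DirectRLEnv callbacks this task overrides."""

    def _setup_scene(self):
        # Spawn the robot articulation, obstacles, ground plane, and contact
        # sensors, then clone them across all parallel envs.
        ...

    def _pre_physics_step(self, actions: torch.Tensor):
        # Scale/clamp the 6D joint deltas and cache them for _apply_action().
        self._actions = torch.clamp(actions, -1.0, 1.0)

    def _apply_action(self):
        # Convert cached deltas into joint position targets for the articulation.
        ...

    def _get_observations(self) -> dict:
        # Assemble the observation tensor described above.
        return {"policy": torch.zeros(self.num_envs, 24, device=self.device)}

    def _get_rewards(self) -> torch.Tensor:
        # Distance, progress, smoothness, collision, and success terms.
        return torch.zeros(self.num_envs, device=self.device)

    def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
        # (terminated, truncated): success/collision vs. episode timeout.
        zeros = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        return zeros, zeros

    def _reset_idx(self, env_ids: torch.Tensor):
        # Re-sample targets and obstacles (with retry budget) for the given envs.
        super()._reset_idx(env_ids)
```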
- Targets are sampled within bounded workspace limits
- A reachability check is applied before obstacle placement (IK ignoring obstacles)
- Obstacles are spawned as rigid bodies with contact sensing and optional line-of-sight clearance checks
- Rewards include distance-to-target, smoothness penalties, and collision penalties
- Optional forced placement of an obstacle in the path to the target for harder avoidance scenarios (see the sampling sketch below)
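The sampling-with-retries idea, reduced to its core (function and parameter names here are illustrative; the real logic is vectorized across envs in `rl_arm_controller_env.py`):

```python
# Illustrative helpers: the real checks are an IK reachability test and
# line-of-sight / clearance tests against the spawned obstacle prims.


def sample_scene(
    sample_target,      # () -> candidate target position
    is_reachable,       # (target) -> bool, IK check ignoring obstacles
    sample_obstacles,   # (target) -> list of candidate obstacle poses
    is_clear,           # (target, obstacles) -> bool, clearance / line-of-sight
    max_retries: int = 20,
):
    """Rejection-sample a (target, obstacles) layout within a retry budget."""
    for attempt in range(max_retries):
        target = sample_target()
        if not is_reachable(target):
            continue  # reject targets outside the free-space workspace
        obstacles = sample_obstacles(target)
        if is_clear(target, obstacles):
            return target, obstacles, attempt, True
    # Budget exhausted: the caller falls back to a degraded reset (see below).
    return None, None, max_retries, False
```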
Difficulty increases when the rolling success rate reaches thresholds:
- Stage 1: large tolerance, easy targets, 0 obstacles
- Stage 2: wider workspace, tighter tolerance, 1 obstacle
- Stage 3: full workspace, tight tolerance, 2 obstacles
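A simplified sketch of success-rate-driven stage advancement (thresholds and window size are placeholders; the actual values live in the env config):

```python
from collections import deque


class Curriculum:
    """Advance from stage 1 to 3 when the rolling success rate over valid episodes is high enough."""

    def __init__(self, thresholds=(0.8, 0.8), window=200):
        self.stage = 1
        self.thresholds = thresholds          # success rate required to leave stages 1 and 2
        self.results = deque(maxlen=window)   # recent episode outcomes (valid resets only)

    def record(self, success: bool):
        self.results.append(1.0 if success else 0.0)
        if self.stage - 1 < len(self.thresholds) and len(self.results) == self.results.maxlen:
            if sum(self.results) / len(self.results) >= self.thresholds[self.stage - 1]:
                self.stage += 1
                self.results.clear()          # re-measure at the new difficulty
```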
Config location: `source/RL_Arm_Controller/RL_Arm_Controller/tasks/direct/rl_arm_controller/rl_arm_controller_env_cfg.py`
Randomized scenes can be invalid (no feasible placement after retries). When sampling fails:
- Target is placed near the current TCP
- Obstacles are moved outside the active workspace
- Degraded resets are excluded from curriculum stats; invalid resets receive zero reward and immediate timeout
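In pseudocode terms, the fallback when `sample_scene` (above) exhausts its budget might look like this (names are illustrative):

```python
import torch


def degraded_reset(env_ids, tcp_pos_w, target_pos_w, obstacle_pos_w, parked_pos):
    """Fallback reset for envs whose sampling budget was exhausted.

    env_ids:        indices of the envs being degraded
    tcp_pos_w:      (num_envs, 3) current TCP positions
    target_pos_w:   (num_envs, 3) target buffer, modified in place
    obstacle_pos_w: (num_envs, n_obs, 3) obstacle buffer, modified in place
    parked_pos:     (3,) position well outside the active workspace
    """
    # Place the target trivially close to the current TCP so the episode is benign.
    target_pos_w[env_ids] = tcp_pos_w[env_ids] + tcp_pos_w.new_tensor([0.0, 0.0, 0.05])
    # Park obstacles far away so they cannot interfere.
    obstacle_pos_w[env_ids] = parked_pos
    # The caller also flags these envs so they are excluded from curriculum
    # statistics, given zero reward, and timed out immediately.
    return env_ids
```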
Logged metrics (emitted via `self.extras` every `env.stats_log_interval` resets) include:

- `Stats/success_rate_valid`
- `Stats/success_rate_all`
- `Stats/curriculum_stage`
- `Stats/invalid_env_fraction`
- `Stats/degraded_target_fraction`
- `Stats/degraded_obstacle_fraction`
- `Stats/exclude_from_curriculum_fraction`
- `Stats/repair_attempts_used_avg`
- `Stats/collision_fraction`
These are important because parallel RL can silently waste samples if invalid resets are not tracked.
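A sketch of how such counters can be aggregated and flushed into `extras` (the key names follow the list above; the counter names are illustrative):

```python
def flush_stats(extras: dict, counters: dict, num_resets: int):
    """Convert raw reset counters into the Stats/* fractions listed above."""
    if num_resets == 0:
        return
    extras["Stats/invalid_env_fraction"] = counters["invalid"] / num_resets
    extras["Stats/degraded_target_fraction"] = counters["degraded_target"] / num_resets
    extras["Stats/degraded_obstacle_fraction"] = counters["degraded_obstacle"] / num_resets
    extras["Stats/exclude_from_curriculum_fraction"] = counters["excluded"] / num_resets
    extras["Stats/repair_attempts_used_avg"] = counters["repair_attempts"] / num_resets
    extras["Stats/collision_fraction"] = counters["collisions"] / num_resets
    # Success rate over valid resets only, so degraded scenes do not skew curriculum decisions.
    valid = max(num_resets - counters["excluded"], 1)
    extras["Stats/success_rate_valid"] = counters["successes_valid"] / valid
    extras["Stats/success_rate_all"] = counters["successes_all"] / num_resets
```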
Key configuration files:

- `rl_arm_controller_env_cfg.py`
- `rl_arm_controller_env.py`
- `fanuc_cfg.py`
- `agents/rsl_rl_ppo_cfg.py`
Convert the robot URDF to a USD articulation:

```bat
cd C:\repos\IsaacLab
isaaclab.bat -p scripts\tools\convert_urdf.py ^
  C:\repos\RL_arm_controller\RL_Arm_Controller\assets\Robots\FANUC\urdf\fanuc200ic5l_w_sg2.urdf ^
  C:\repos\RL_arm_controller\RL_Arm_Controller\assets\Robots\FANUC\usd\fanuc200ic5l_sg2.usd ^
  --fix-base ^
  --merge-joints ^
  --joint-stiffness 0.0 ^
  --joint-damping 0.0 ^
  --joint-target-type none
```
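The converted USD is then referenced from `fanuc_cfg.py` via an articulation config; a trimmed sketch of that pattern (paths, gains, and joint-name patterns are placeholders, and the import roots may differ by Isaac Lab version):

```python
import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets import ArticulationCfg

FANUC_LRMATE_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="assets/Robots/FANUC/usd/fanuc200ic5l_sg2.usd",  # converted asset
        rigid_props=sim_utils.RigidBodyPropertiesCfg(disable_gravity=False),
    ),
    init_state=ArticulationCfg.InitialStateCfg(
        joint_pos={"joint_.*": 0.0},      # placeholder joint-name pattern
    ),
    actuators={
        "arm": ImplicitActuatorCfg(
            joint_names_expr=["joint_.*"],
            stiffness=400.0,              # placeholder PD gains
            damping=40.0,
        ),
    },
)
```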
Repo layout:

```
source/RL_Arm_Controller/
  RL_Arm_Controller/
    tasks/direct/rl_arm_controller/
      rl_arm_controller_env.py
      rl_arm_controller_env_cfg.py
      fanuc_cfg.py
      agents/
        rsl_rl_ppo_cfg.py
assets/Robots/FANUC/
  urdf/
  usd/
scripts/
  list_envs.py
  zero_agent.py
  random_agent.py
  rsl_rl/
    train.py
    play.py
```
High priority:
- Add a small evaluation script that reports success rate and collision rate over N episodes
- Add an option to export a trained policy (TorchScript or ONNX) for deployment experiments
Medium priority:
- Add richer domain randomization (sensor noise, friction, mass)
- Add a "reachability with obstacles" filter for data efficiency (optional toggle)
Low priority:
- Integrate the learned policy into a ROS 2 control loop for hardware-style command publishing
- Benchmark against a classical planner baseline (MoveIt, OMPL) for comparable scene sets
- Isaac Lab docs (DirectRLEnv vs ManagerBasedRLEnv): https://isaac-sim.github.io/IsaacLab/main/source/api/lab/isaaclab.envs.html
- Isaac Lab task workflows overview: https://isaac-sim.github.io/IsaacLab/main/source/overview/core-concepts/task_workflows.html
- Isaac Lab RSL-RL wrapper utilities: https://isaac-sim.github.io/IsaacLab/main/source/api/lab_rl/isaaclab_rl.html
- Isaac Lab installation notes (Python version compatibility): https://isaac-sim.github.io/IsaacLab/main/source/setup/installation/index.html
- Isaac Sim Python environment installation: https://docs.omniverse.nvidia.com/isaacsim/latest/installation/install_python.html
- NVIDIA Isaac Lab blog example (sim-to-real RL workflow patterns): https://developer.nvidia.com/blog/closing-the-sim-to-real-gap-training-spot-quadruped-locomotion-with-nvidia-isaac-lab/