FlightRL is a research-oriented drone RL scaffold built around a small C simulator and a thin PufferLib Ocean-style Python wrapper. The goal is fast simulation throughput, modular environment structure, and a clean path toward richer sensor models, manufacturer-specific parameter profiles, and later sim-to-real work on civilian developer platforms.
Clean previews exported from the live renderer. The inspection view shows a quadrotor airframe, per-rotor thrust state, target geometry, body orientation, color-coded force vectors, and compact telemetry. The underlying MVP dynamics are still planar, so direct `motor_quad` control is physically meaningful only through front-vs-rear pitch authority; left-vs-right asymmetry has no effect until a fuller 3D model lands:
| Reach waypoint | Hover |
|---|---|
| ![]() | ![]() |
- License: MIT
- Contributions: CONTRIBUTING.md
- Conduct: CODE_OF_CONDUCT.md
- Security reporting: SECURITY.md
- CI: GitHub Actions under `.github/workflows/`
The native simulator keeps state, stepping, reward logic, reset sampling, and observation assembly in C so Python overhead stays minimal. The Python wrapper only defines spaces, owns shared buffers, exposes config loading, and plugs the environment into pufferlib.PufferEnv and PuffeRL.
The implementation follows the current Ocean pattern:
- C writes directly into contiguous NumPy buffers.
- Vectorization happens inside the native env rather than through a pure Python loop.
- The binding layer is split into small local headers instead of copying the upstream Ocean bridge as one large file.
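As a rough sketch of that buffer contract (class and field names here are invented for illustration, not FlightRL's actual binding API), Python preallocates contiguous arrays once and the native step writes into them in place:

```python
import numpy as np

class BufferedVecEnv:
    """Sketch of the Ocean-style buffer contract: Python owns contiguous
    NumPy arrays, and the native step writes results into them in place
    (emulated here in pure NumPy). Names are illustrative only."""
    def __init__(self, num_envs=8, obs_dim=10, act_dim=2):
        # The real binding hands these base pointers to the C simulator.
        self.observations = np.zeros((num_envs, obs_dim), dtype=np.float32)
        self.rewards = np.zeros(num_envs, dtype=np.float32)
        self.terminals = np.zeros(num_envs, dtype=np.uint8)

    def step(self, actions):
        # Stand-in for the C step: update in place, allocate nothing.
        self.observations += 0.01 * actions.mean(axis=1, keepdims=True)
        self.rewards[:] = -np.abs(self.observations[:, 0])
        self.terminals[:] = 0

env = BufferedVecEnv()
before = env.observations
env.step(np.ones((8, 2), dtype=np.float32))
assert env.observations is before  # same buffer, mutated in place
```

The key property is that `step` never reallocates: the learner can hold references to the buffers for the lifetime of the run.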
- `src/flightrl/`: config loading, native env wrapper, policy, training helpers, rollout and plotting utilities.
- `src/flightrl/native/`: modular C simulator, reward/task logic, and Ocean-style binding bridge.
- `configs/tasks/`: runnable hover, waypoint, and sequence experiment configs.
- `configs/hardware/`: placeholder hardware-oriented profile examples.
- `scripts/`: train, eval, benchmark, rollout, plotting, comparison, and smoke-test entrypoints.
- `tests/`: lightweight regression and smoke coverage.
- `docs/architecture.md`: module boundaries and extension path.
Editable install:
```shell
python -m pip install -e . --no-build-isolation
```

PufferLib currently advertises an older NumPy constraint than many Python 3.13 environments already use. In a shared interpreter, pip may try to reshuffle NumPy during install; a dedicated virtualenv is the safer setup.
Direct extension rebuild:
```shell
python setup.py build_ext --inplace --force
```

Convenience targets:
```shell
make dev
make build
make test
```

Smoke test:

```shell
python scripts/smoke_test.py --config configs/tasks/hover.toml
```

Training:

```shell
python scripts/train.py --config configs/tasks/hover.toml
python scripts/train.py --config configs/tasks/reach.toml
```

The training loop uses a small Gaussian actor-critic and calls PuffeRL directly. Configurable sections live in TOML under:
`environment`, `drone`, `sensors`, `task`, `reward`, `training`, `domain_randomization`, `logging`
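The small Gaussian actor-critic mentioned above can be sketched as follows. This is a NumPy-only stand-in for the real torch policy trained by PuffeRL; the sizes, single hidden layer, and all names are illustrative:

```python
import numpy as np

class GaussianActorCritic:
    """Minimal diagonal-Gaussian actor-critic head (NumPy stand-in for
    the actual torch module). Sizes and initialization are illustrative."""
    def __init__(self, obs_dim=10, act_dim=2, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.w_mu = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.w_v = rng.normal(0.0, 0.1, (hidden, 1))
        self.log_std = np.zeros(act_dim)   # state-independent action std

    def forward(self, obs, rng):
        h = np.tanh(obs @ self.w1)
        mu = h @ self.w_mu                 # action mean
        value = (h @ self.w_v).squeeze(-1) # state-value estimate
        action = mu + np.exp(self.log_std) * rng.standard_normal(mu.shape)
        return action, value

pi = GaussianActorCritic()
act, val = pi.forward(np.zeros((4, 10)), np.random.default_rng(1))
print(act.shape, val.shape)  # -> (4, 2) (4,)
```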
Random rollout:
```shell
python scripts/rollout_random.py --config configs/tasks/hover.toml
python scripts/rollout_random.py --config configs/tasks/hover.toml --render-mode human
```

Policy evaluation:
```shell
python scripts/eval.py --config configs/tasks/reach.toml --checkpoint artifacts/<run>/model_000004.pt
python scripts/eval.py --config configs/tasks/reach.toml --checkpoint artifacts/<run>/model_000004.pt --render-mode human
```

Trajectory plotting:
```shell
python scripts/plot_trajectory.py --input artifacts/trajectories/random_rollout.csv
```

Reward comparison:
```shell
python scripts/compare_rewards.py --left rollout_a.csv --right rollout_b.csv
```

Environment-only throughput benchmark:
```shell
python scripts/benchmark_env.py --config configs/tasks/hover.toml
```

The environment also exposes Gymnasium-style rendering through `DronePlanarEnv(render_mode="human")` and `DronePlanarEnv(render_mode="rgb_array")`. Rendering is lazy and stays out of the fast path unless explicitly enabled.
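The lazy-rendering idea can be sketched like this; the class below is an illustrative stand-in, not FlightRL's actual renderer:

```python
import numpy as np

class LazyRenderer:
    """Sketch of lazy rendering: the canvas is only allocated the first
    time render() is called, so headless training never pays for it.
    Class, method, and size choices here are illustrative."""
    def __init__(self, render_mode=None):
        self.render_mode = render_mode
        self._canvas = None

    def render(self, state):
        if self.render_mode is None:
            return None                     # fast path: no work at all
        if self._canvas is None:            # constructed on first use only
            self._canvas = np.zeros((64, 64, 3), dtype=np.uint8)
        self._canvas[:] = 0
        x = int(state[0]) % 64
        self._canvas[:, x] = 255            # trivial stand-in for real drawing
        if self.render_mode == "rgb_array":
            return self._canvas.copy()
        return None

r = LazyRenderer("rgb_array")
frame = r.render(np.array([10.0, 0.0]))
print(frame.shape)  # -> (64, 64, 3)
```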
Supported action modes:
- `stabilized_planar`: two commands, collective thrust and pitch torque.
- `motor_pair`: two direct commands for front-pair and rear-pair thrust.
- `motor_quad`: four direct normalized rotor commands for front-left, front-right, rear-left, and rear-right actuators.
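To illustrate how the modes relate, a hypothetical mixer could map a `stabilized_planar` command pair onto the four `motor_quad` channels; the function name and scaling are assumptions, not the simulator's actual mixing:

```python
import numpy as np

def mix_planar(collective, pitch):
    """Hypothetical mixer from (collective thrust, pitch torque) to the four
    motor_quad channels (FL, FR, RL, RR). In the planar MVP only the
    front-vs-rear split carries pitch authority, so left and right channels
    are identical by construction."""
    front = collective - 0.5 * pitch
    rear = collective + 0.5 * pitch
    return np.clip(np.array([front, front, rear, rear]), 0.0, 1.0)

cmds = mix_planar(0.5, 0.2)  # front pair below hover, rear pair above
```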
Wind support is also built into the native dynamics through air-relative drag plus correlated gusts. Example config:
```toml
[wind]
enabled = true
steady_x = 2.0
steady_z = 0.0
gust_strength = 0.4
gust_tau = 0.3
```

To export a clean preview frame without a desktop window:
```shell
python scripts/export_render_preview.py --config configs/tasks/reach.toml --output docs/images/reach-preview.png
```

Supported tasks:

- `hover`: stabilize near a hover target for a configured hold duration.
- `reach_waypoint`: reach one sampled or fixed waypoint.
- `follow_waypoints`: progress through a sequence of waypoints.
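One plausible reading of the wind config's `gust_tau` and `gust_strength` (correlation time and stationary standard deviation) is a discretized Ornstein-Uhlenbeck process; the real C implementation may differ:

```python
import numpy as np

def simulate_gust(steps, dt=0.01, gust_tau=0.3, gust_strength=0.4, seed=0):
    """Guess at the correlated-gust model: Euler steps of an
    Ornstein-Uhlenbeck process. `gust_tau` sets how quickly gusts decay,
    `gust_strength` the long-run standard deviation."""
    rng = np.random.default_rng(seed)
    alpha = dt / gust_tau
    g = 0.0
    out = np.empty(steps)
    for i in range(steps):
        # Mean-reversion toward zero plus variance-matched white noise.
        g += -alpha * g + gust_strength * np.sqrt(2.0 * alpha) * rng.standard_normal()
        out[i] = g
    return out

gusts = simulate_gust(20000)  # long run: std settles near gust_strength
```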
Obstacle avoidance, live native rendering, and richer vision/range sensors are intentionally deferred. If unsupported sensor flags are enabled, the config path errors explicitly instead of falling back to mock data.
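For illustration, waypoint progression in the spirit of `reach_waypoint`/`follow_waypoints` can be as simple as an arrival-radius rule; the function name, signature, and radius value below are invented:

```python
import numpy as np

def advance_waypoint(position, waypoints, index, radius=0.25):
    """Illustrative progression rule: the active waypoint index advances
    once the drone is within `radius` of the current waypoint; the task
    completes when the index runs off the end of the sequence."""
    if index < len(waypoints) and np.linalg.norm(position - waypoints[index]) < radius:
        index += 1
    return index, index >= len(waypoints)

wps = np.array([[0.0, 1.0], [1.0, 1.0]])
idx, done = advance_waypoint(np.array([0.0, 0.95]), wps, 0)  # reached first
```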
- Add a new task enum mapping in `src/flightrl/env.py`.
- Extend native task progression in `src/flightrl/native/native_tasks.c`.
- Adjust reward or termination logic only if the new task needs different completion behavior.
- Add a new TOML task config under `configs/tasks/`.
- Add at least one regression test in `tests/`.
The scaffold is organized around swappable task, reset, reward, sensor, and action layers rather than a hardcoded one-off drone. The hardware profile placeholder under `configs/hardware/manufacturer_placeholder.toml` shows where to start for:
- manufacturer-specific mass, thrust, drag, and actuator lag
- noisier sensor profiles
- switching from stabilized commands to direct actuator-style control
- future parameter-fitting or replay-driven calibration workflows
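A hardware profile along these lines might look like the following; every field name and value below is an invented placeholder for illustration, not the contents of the actual file:

```toml
[drone]
mass_kg = 0.92
arm_length_m = 0.16
max_thrust_n = 14.0
motor_lag_s = 0.04

[sensors]
imu_accel_noise_std = 0.08
imu_gyro_noise_std = 0.012

[actions]
mode = "motor_quad"
```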
For future autonomy work, the intended control hierarchy is:
`camera + telemetry + mission context -> VLA navigator -> high-level commands -> stabilizer/controller -> motor mixing`
That keeps low-level stabilization fast and local while allowing a slower perception-conditioned model to handle navigation and mission semantics later.
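A toy two-rate loop makes the split concrete: a slow navigator refreshes the setpoint occasionally while a fast local stabilizer runs every tick. All names, gains, and the 1D dynamics below are invented for illustration:

```python
def run_hierarchy(steps=100, nav_every=10, dt=0.02, kp=4.0, kd=1.0):
    """Toy two-rate control loop: the 'navigator' updates the setpoint
    every `nav_every` ticks; a PD stabilizer tracks it every tick on a
    1D double-integrator. Returns the final position."""
    x, v, setpoint = 0.0, 0.0, 0.0
    for t in range(steps):
        if t % nav_every == 0:
            setpoint = 1.0                        # stand-in for a navigator command
        accel = kp * (setpoint - x) - kd * v      # fast local stabilization
        v += accel * dt
        x += v * dt
    return x
```

With enough steps the fast inner loop converges to the slowly updated setpoint, which is the property the hierarchy relies on.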

