FlightRL is a research-oriented drone RL scaffold built around a small C simulator and a thin PufferLib Ocean-style Python wrapper. The goal is fast simulation throughput, modular environment structure, and a clean path toward richer sensor models, manufacturer-specific parameter profiles, and later sim-to-real work on civilian developer platforms.
Clean previews exported from the live renderer. The inspection view shows a quadrotor airframe, per-rotor thrust state, target geometry, body orientation, color-coded force vectors, and compact telemetry. The underlying MVP dynamics are still planar, so direct `motor_quad` control is physically meaningful only through front-vs-rear pitch authority; left-vs-right asymmetry has no effect until a fuller 3D model lands:
| Reach waypoint | Hover |
|---|---|
| ![]() | ![]() |
- License: MIT
- Contributions: CONTRIBUTING.md
- Conduct: CODE_OF_CONDUCT.md
- Security reporting: SECURITY.md
- CI: GitHub Actions under `.github/workflows/`
The native simulator keeps state, stepping, reward logic, reset sampling, and observation assembly in C so Python overhead stays minimal. The Python wrapper only defines spaces, owns shared buffers, exposes config loading, and plugs the environment into pufferlib.PufferEnv and PuffeRL.
The implementation follows the current Ocean pattern:
- C writes directly into contiguous NumPy buffers.
- Vectorization happens inside the native env rather than through a pure Python loop.
- The binding layer is split into small local headers instead of copying the upstream Ocean bridge as one large file.
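As a rough sketch of that buffer contract (class and field names here are invented for illustration, not FlightRL's actual binding API), Python preallocates contiguous arrays once and the native step writes into them in place:

```python
import numpy as np

class BufferedVecEnv:
    """Sketch of the Ocean-style buffer contract: Python owns contiguous
    NumPy arrays, and the native step writes results into them in place
    (emulated here in pure NumPy). Names are illustrative only."""
    def __init__(self, num_envs=8, obs_dim=10, act_dim=2):
        # The real binding hands these base pointers to the C simulator.
        self.observations = np.zeros((num_envs, obs_dim), dtype=np.float32)
        self.rewards = np.zeros(num_envs, dtype=np.float32)
        self.terminals = np.zeros(num_envs, dtype=np.uint8)

    def step(self, actions):
        # Stand-in for the C step: update in place, allocate nothing.
        self.observations += 0.01 * actions.mean(axis=1, keepdims=True)
        self.rewards[:] = -np.abs(self.observations[:, 0])
        self.terminals[:] = 0

env = BufferedVecEnv()
before = env.observations
env.step(np.ones((8, 2), dtype=np.float32))
assert env.observations is before  # same buffer, mutated in place
```

The key property is that `step` never reallocates: the learner can hold references to the buffers for the lifetime of the run.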
- `src/flightrl/`: config loading, native env wrapper, policy, training helpers, rollout and plotting utilities.
- `src/flightrl/native/`: modular C simulator, reward/task logic, and Ocean-style binding bridge.
- `configs/tasks/`: runnable hover, waypoint, and sequence experiment configs.
- `configs/hardware/`: placeholder hardware-oriented profile examples.
- `scripts/`: train, eval, benchmark, rollout, plotting, comparison, and smoke-test entrypoints.
- `tests/`: lightweight regression and smoke coverage.
- `docs/architecture.md`: module boundaries and extension path.
Editable install:
```shell
python -m pip install -e . --no-build-isolation
```

PufferLib currently advertises an older NumPy constraint than many Python 3.13 environments already use. In a shared interpreter, pip may try to reshuffle NumPy during install; a dedicated virtualenv is the safer setup.
Direct extension rebuild:
```shell
python setup.py build_ext --inplace --force
```

Convenience targets:
```shell
make dev
make build
make test
```

Smoke test:

```shell
python scripts/smoke_test.py --config configs/tasks/hover.toml
```

Training:

```shell
python scripts/train.py --config configs/tasks/hover.toml
python scripts/train.py --config configs/tasks/reach.toml
```

The training loop uses a small Gaussian actor-critic and calls PuffeRL directly. Configurable sections live in TOML under:
`environment`, `drone`, `sensors`, `task`, `reward`, `training`, `domain_randomization`, `logging`
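The small Gaussian actor-critic mentioned above can be sketched as follows. This is a NumPy-only stand-in for the real torch policy trained by PuffeRL; the sizes, single hidden layer, and all names are illustrative:

```python
import numpy as np

class GaussianActorCritic:
    """Minimal diagonal-Gaussian actor-critic head (NumPy stand-in for
    the actual torch module). Sizes and initialization are illustrative."""
    def __init__(self, obs_dim=10, act_dim=2, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.w_mu = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.w_v = rng.normal(0.0, 0.1, (hidden, 1))
        self.log_std = np.zeros(act_dim)   # state-independent action std

    def forward(self, obs, rng):
        h = np.tanh(obs @ self.w1)
        mu = h @ self.w_mu                 # action mean
        value = (h @ self.w_v).squeeze(-1) # state-value estimate
        action = mu + np.exp(self.log_std) * rng.standard_normal(mu.shape)
        return action, value

pi = GaussianActorCritic()
act, val = pi.forward(np.zeros((4, 10)), np.random.default_rng(1))
print(act.shape, val.shape)  # -> (4, 2) (4,)
```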
Random rollout:
```shell
python scripts/rollout_random.py --config configs/tasks/hover.toml
python scripts/rollout_random.py --config configs/tasks/hover.toml --render-mode human
```

Policy evaluation:
```shell
python scripts/eval.py --config configs/tasks/reach.toml --checkpoint artifacts/<run>/model_000004.pt
python scripts/eval.py --config configs/tasks/reach.toml --checkpoint artifacts/<run>/model_000004.pt --render-mode human
```

Trajectory plotting:
```shell
python scripts/plot_trajectory.py --input artifacts/trajectories/random_rollout.csv
```

Reward comparison:
```shell
python scripts/compare_rewards.py --left rollout_a.csv --right rollout_b.csv
```

Environment-only throughput benchmark:
```shell
python scripts/benchmark_env.py --config configs/tasks/hover.toml
```

The environment also exposes Gymnasium-style rendering through `DronePlanarEnv(render_mode="human")` and `DronePlanarEnv(render_mode="rgb_array")`. Rendering is lazy and stays out of the fast path unless explicitly enabled.
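The lazy-rendering idea can be sketched like this; the class below is an illustrative stand-in, not FlightRL's actual renderer:

```python
import numpy as np

class LazyRenderer:
    """Sketch of lazy rendering: the canvas is only allocated the first
    time render() is called, so headless training never pays for it.
    Class, method, and size choices here are illustrative."""
    def __init__(self, render_mode=None):
        self.render_mode = render_mode
        self._canvas = None

    def render(self, state):
        if self.render_mode is None:
            return None                     # fast path: no work at all
        if self._canvas is None:            # constructed on first use only
            self._canvas = np.zeros((64, 64, 3), dtype=np.uint8)
        self._canvas[:] = 0
        x = int(state[0]) % 64
        self._canvas[:, x] = 255            # trivial stand-in for real drawing
        if self.render_mode == "rgb_array":
            return self._canvas.copy()
        return None

r = LazyRenderer("rgb_array")
frame = r.render(np.array([10.0, 0.0]))
print(frame.shape)  # -> (64, 64, 3)
```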
Supported action modes:
- `stabilized_planar`: two commands, collective thrust and pitch torque.
- `motor_pair`: two direct commands for front-pair and rear-pair thrust.
- `motor_quad`: four direct normalized rotor commands for front-left, front-right, rear-left, and rear-right actuators.
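To illustrate how the modes relate, a hypothetical mixer could map a `stabilized_planar` command pair onto the four `motor_quad` channels; the function name and scaling are assumptions, not the simulator's actual mixing:

```python
import numpy as np

def mix_planar(collective, pitch):
    """Hypothetical mixer from (collective thrust, pitch torque) to the four
    motor_quad channels (FL, FR, RL, RR). In the planar MVP only the
    front-vs-rear split carries pitch authority, so left and right channels
    are identical by construction."""
    front = collective - 0.5 * pitch
    rear = collective + 0.5 * pitch
    return np.clip(np.array([front, front, rear, rear]), 0.0, 1.0)

cmds = mix_planar(0.5, 0.2)  # front pair below hover, rear pair above
```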
Wind support is also built into the native dynamics through air-relative drag plus correlated gusts. Example config:
```toml
[wind]
enabled = true
steady_x = 2.0
steady_z = 0.0
gust_strength = 0.4
gust_tau = 0.3
```

To export a clean preview frame without a desktop window:
```shell
python scripts/export_render_preview.py --config configs/tasks/reach.toml --output docs/images/reach-preview.png
```

Supported tasks:

- `hover`: stabilize near a hover target for a configured hold duration.
- `reach_waypoint`: reach one sampled or fixed waypoint.
- `follow_waypoints`: progress through a sequence of waypoints.
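One plausible reading of the wind config's `gust_tau` and `gust_strength` (correlation time and stationary standard deviation) is a discretized Ornstein-Uhlenbeck process; the real C implementation may differ:

```python
import numpy as np

def simulate_gust(steps, dt=0.01, gust_tau=0.3, gust_strength=0.4, seed=0):
    """Guess at the correlated-gust model: Euler steps of an
    Ornstein-Uhlenbeck process. `gust_tau` sets how quickly gusts decay,
    `gust_strength` the long-run standard deviation."""
    rng = np.random.default_rng(seed)
    alpha = dt / gust_tau
    g = 0.0
    out = np.empty(steps)
    for i in range(steps):
        # Mean-reversion toward zero plus variance-matched white noise.
        g += -alpha * g + gust_strength * np.sqrt(2.0 * alpha) * rng.standard_normal()
        out[i] = g
    return out

gusts = simulate_gust(20000)  # long run: std settles near gust_strength
```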
Obstacle avoidance, live native rendering, and richer vision/range sensors are intentionally deferred. If unsupported sensor flags are enabled, the config path errors explicitly instead of falling back to mock data.
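For illustration, waypoint progression in the spirit of `reach_waypoint`/`follow_waypoints` can be as simple as an arrival-radius rule; the function name, signature, and radius value below are invented:

```python
import numpy as np

def advance_waypoint(position, waypoints, index, radius=0.25):
    """Illustrative progression rule: the active waypoint index advances
    once the drone is within `radius` of the current waypoint; the task
    completes when the index runs off the end of the sequence."""
    if index < len(waypoints) and np.linalg.norm(position - waypoints[index]) < radius:
        index += 1
    return index, index >= len(waypoints)

wps = np.array([[0.0, 1.0], [1.0, 1.0]])
idx, done = advance_waypoint(np.array([0.0, 0.95]), wps, 0)  # reached first
```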
- Add a new task enum mapping in `src/flightrl/env.py`.
- Extend native task progression in `src/flightrl/native/native_tasks.c`.
- Adjust reward or termination logic only if the new task needs different completion behavior.
- Add a new TOML task config under `configs/tasks/`.
- Add at least one regression test in `tests/`.
The scaffold is organized around swappable task, reset, reward, sensor, and action layers rather than a hardcoded one-off drone. The hardware profile placeholder under `configs/hardware/manufacturer_placeholder.toml` shows where to start for:
- manufacturer-specific mass, thrust, drag, and actuator lag
- noisier sensor profiles
- switching from stabilized commands to direct actuator-style control
- future parameter-fitting or replay-driven calibration workflows
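A hardware profile along these lines might look like the following; every field name and value below is an invented placeholder for illustration, not the contents of the actual file:

```toml
[drone]
mass_kg = 0.92
arm_length_m = 0.16
max_thrust_n = 14.0
motor_lag_s = 0.04

[sensors]
imu_accel_noise_std = 0.08
imu_gyro_noise_std = 0.012

[actions]
mode = "motor_quad"
```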
For future autonomy work, the intended control hierarchy is:
`camera + telemetry + mission context -> VLA navigator -> high-level commands -> stabilizer/controller -> motor mixing`
That keeps low-level stabilization fast and local while allowing a slower perception-conditioned model to handle navigation and mission semantics later.
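A toy two-rate loop makes the split concrete: a slow navigator refreshes the setpoint occasionally while a fast local stabilizer runs every tick. All names, gains, and the 1D dynamics below are invented for illustration:

```python
def run_hierarchy(steps=100, nav_every=10, dt=0.02, kp=4.0, kd=1.0):
    """Toy two-rate control loop: the 'navigator' updates the setpoint
    every `nav_every` ticks; a PD stabilizer tracks it every tick on a
    1D double-integrator. Returns the final position."""
    x, v, setpoint = 0.0, 0.0, 0.0
    for t in range(steps):
        if t % nav_every == 0:
            setpoint = 1.0                        # stand-in for a navigator command
        accel = kp * (setpoint - x) - kd * v      # fast local stabilization
        v += accel * dt
        x += v * dt
    return x
```

With enough steps the fast inner loop converges to the slowly updated setpoint, which is the property the hierarchy relies on.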

