Skip to content

v0.2.0 — Metrics registry, schema v6, ConstructableEnv

Latest

Choose a tag to compare

@anthonytorlucci anthonytorlucci released this 07 Jun 14:23
· 129 commits to main since this release

Changelog

All notable changes to this project will be documented in this file.

The format follows Keep a Changelog.
This project adheres to Semantic Versioning.


0.2.0 – 2026-06-07

Breaking changes

  • DIMRANK — the const generic parameter on State<D>, Observation<D>, Action<D>, and Environment<D, SD, AD> is renamed to RANK (or R, SR, AR at usage sites) across all crates. Update any downstream impl State<D> / impl Environment<D, …> declarations accordingly.
  • fn new removed from Environment trait (ADR 0011) — construction is no longer part of the shared trait contract. Replace call sites with the new ConstructableEnv factory trait or a concrete new method.

New crates

  • rlevo-examples — heavy visualisation, recording, and report examples extracted from the rlevo umbrella (ADR 0012). Lightweight environment/algorithm examples stay in crates/rlevo/.
  • rlevo-metrics-registry — wasm32-compatible leaf crate that owns the canonical metric descriptor list (CANONICAL_METRICS, MetricDescriptor, domain grouping). Eliminates the hand-copied duplicate that previously existed between rlevo-benchmarks and rlevo-benchmarks-report-client (ADR 0015).
  • rlevo-benchmarks-report-client — Leptos/WASM static-HTML post-run report viewer. Served from an embedded axum server. Shares the metric registry with rlevo-benchmarks without pulling in burn or rand.

Dependency upgrades

  • burn 0.20.00.21.0; migrated ndarray backend to the new flex backend.
  • rand 0.9.x0.10.1, rand_distr0.6.0.

rlevo-core

Added

  • ConstructableEnv factory trait — standalone fn new(render: bool) -> Self replacement for the removed Environment::new (ADR 0011).
  • StyledFrame, StyledLine, StyledSpan, SpanStyle, Color, Modifier, semantic palette module, and AsciiRenderable/AsciiRenderer hoisted from rlevo-environments::render into rlevo-core::render (ADR 0009). Import paths inside rlevo-environments are preserved via a re-export shim.

Changed

  • AsciiRenderable demoted from a required library invariant to an optional debug helper; implementing it is no longer implied by Environment (ADR 0013).

rlevo-environments

Changed

  • Render types (StyledFrame, AsciiRenderable, etc.) re-exported from rlevo-core::render; originals removed (ADR 0009).
  • Environment::new removed; each environment exposes its own new constructor and may opt into ConstructableEnv (ADR 0011).

rlevo-evolution

Changed

  • All EA algorithms and shared ops (selection, crossover, mutation, replacement) now draw random values through seed_stream on the host CPU rather than calling B::seed + Tensor::random, eliminating the process-wide RNG mutex contention that caused non-determinism in parallel tests.
  • SharedPopulationObserver unified to parking_lot::Mutex (was split between std::sync and parking_lot lock types, causing type mismatches in recording examples) (ADR 0010).

rlevo-reinforcement-learning

Added

  • polyak_update hoisted as a shared utility function available to all RL algorithm crates.

rlevo-benchmarks

Added

  • Record schema v6 (FORMAT_VERSION bumped 5 → 6, ADR 0014):
    • Expanded CANONICAL_METRICS (explained variance, per-iteration episode-return statistics, DQN/SAC loss terms) — list now owned by rlevo-metrics-registry.
    • Typed run-provenance fields on RunManifest: algorithm name, crate versions, git ref, device, seed count, success threshold.
    • EpisodeKind { Training, Evaluation } field in episode headers.
    • Episode wall-clock duration as a terminal metric.
    • checkpoints: Vec<CheckpointRef> seam for deep-RL Burn-Recorder model files (EA runs unaffected).
  • Metrics-only live ratatui TUI replaces the earlier three-tier visualisation plan (ADR 0013 supersedes ADR 0008); no environment render panel in the TUI.

Changed

  • CANONICAL_METRICS constant moved to rlevo-metrics-registry; rlevo-benchmarks re-exports it for back-compat.

rlevo-benchmarks-report-client

Added

  • Interactive post-run static HTML report (Leptos + WASM):
    • Min/max downsampling for long metric series (ADR 0013 / M8.2).
    • Multi-seed mean ± std band aggregation.
    • Hover crosshair with exact raw-value tooltip.
    • Per-panel SVG export buttons.
    • Step / episode / wall-clock x-axis toggle for episode panels.
    • Eval/training split via EpisodeKind in the episode index and table badge.
    • Landscape heatmap background for EA optimisation landscape runs.
    • Diversity-threshold guideline line with breach-pulse highlight.
    • Strip-plot overlay toggle on the population box-plot panel.

rlevo (umbrella)

Changed

  • Lightweight examples retained; heavy viz/record/report examples migrated to rlevo-examples (ADR 0012).

Infrastructure

Added

  • GitHub Actions CI: integration-test matrix (Linux × stable toolchain) and weekly full-workspace test run.
  • BACKEND_LOCK per-binary synchronisation for wgpu-backed integration tests; removes the previous --test-threads=1 requirement.

0.1.0 – 2026-04-28

Initial alpha release. All crates are published together at the same version.

rlevo-core

Added

  • State<D> / Observation<D> traits for typed, const-generic environment state and agent perception.
  • Action<D> trait hierarchy (DiscreteAction, ContinuousAction, MultiDiscreteAction, MultiBinaryAction) for compile-time action-space safety.
  • Environment<D, SD, AD> trait with reset / step / render contract; Snapshot<D> trait with SnapshotBase<D, O, R> concrete type.
  • Reward trait with ScalarReward and VectorReward implementations.
  • TensorConvertible<B, D> bridge trait for lifting state/action types onto Burn tensors.
  • Agent and BenchableAgent traits for uniform agent interaction.
  • FitnessEvaluable and Landscape traits for benchmarking evolutionary algorithms.
  • BenchEnv, BenchError, BenchStep, Metric, MetricsProvider, and SeedStream (moved from rlevo-benchmarks per ADR-0004).
  • util::seed — deterministic SeedStream for reproducible multi-run experiments.
  • EnvironmentError and StateError error types with thiserror derives.

rlevo-environments

Added

  • Classic controlCartPole, MountainCar, MountainCarContinuous, Pendulum, Acrobot.
  • BanditsKArmedBandit<K>, ContextualBandit, NonStationaryBandit, AdversarialBandit.
  • Toy textBlackjack, CliffWalking, FrozenLake, Taxi.
  • Gridworlds (MiniGrid-style) — Empty, DoorKey, Memory, FourRooms, Crossing, LavaGap, MultiRoom, Unlock, UnlockPickup, GoToDoor, DistShift, DynamicObstacles; shared GridCore (grid, entity, action, direction, observation, render, reward, dynamics).
  • Box2D physics (box2d feature, rapier2d) — BipedalWalker, LunarLander (discrete and continuous action spaces), CarRacing.
  • Locomotion (locomotion feature, rapier3d) — Reacher, Swimmer, InvertedPendulum, InvertedDoublePendulum.
  • GamesChess (full move generation and board state), ConnectFour.
  • Optimisation landscapesSphere, Ackley, Rastrigin for benchmarking evolutionary algorithms.
  • WrappersTimeLimit wraps any Environment with an episode step cap.
  • Bench adapter (bench feature) — BenchAdapter and preset Suite factories to drive any environment from rlevo-benchmarks.
  • ASCII render backend for text-based environments.

rlevo-evolution

Added

  • Strategy<B> pure trait (init / ask / tell / best) — stateless, parallelism-friendly, trivially checkpointable.
  • EvolutionaryHarness<B, S, F> — wraps any Strategy as a BenchEnv.
  • BatchFitnessFn trait with FromFitnessEvaluable adapter.
  • GenomeKind enum (RealValued, Binary, Integer, Program).
  • Classical familiesGeneticAlgorithm (real-valued, SBX crossover + polynomial mutation), BinaryGeneticAlgorithm (one-point/uniform crossover + bit-flip mutation), EvolutionStrategy ((1+1), (1+λ), (μ,λ), (μ+λ) with self-adaptive σ), EvolutionaryProgramming (Gaussian perturbation + tournament), DifferentialEvolution (Rand/1/Bin, Best/1/Bin, CurrentToBest/1/Bin), CartesianGeneticProgramming (symbolic regression via CGP graph).
  • MetaheuristicsParticleSwarmOptimization, AntColonyOptimizationReal, AntColonyOptimizationPermutation, ArtificialBeeColony, FireflyAlgorithm, BatAlgorithm, CuckooSearch (Lévy flights via Mantegna), GreyWolfOptimizer, SalpSwarmAlgorithm, WhaleOptimizationAlgorithm.
  • Genetic operators (ops) — selection (tournament, roulette, rank, SUS, elitism, NSGA-II crowding), crossover (uniform, one-point, multi-point, SBX, BLX-α, intermediate), mutation (Gaussian, uniform, polynomial, bit-flip, inversion), replacement (generational, steady-state, elitist, comma, plus).
  • Custom CubeCL kernels (custom-kernels feature) — fused pairwise-attract (Firefly large-N path) and fused Lévy-flight (Cuckoo/Bat) kernels; pure-tensor fallbacks used when feature is off.
  • PopulationState tensor wrapper; ShapingFn fitness shaping (linear rank, exponential rank, truncation).

rlevo-reinforcement-learning

Added

  • Replay memoryPrioritizedExperienceReplay (uniform-sampling mode in v0.1.0); TrainingBatch typed container.
  • ExperienceExperienceTuple (s, a, r, s', done), History trajectory buffer.
  • MetricsAgentStats (per-step), PerformanceRecord (per-episode).
  • DQNDqnModel, DqnAgent, DqnTrainingConfig; ε-greedy exploration schedule; Double-DQN target option.
  • C51C51Model, C51Agent, C51TrainingConfig; Bellman projection onto N-atom support; categorical cross-entropy loss.
  • QR-DQNQrDqnModel, QrDqnAgent, QrDqnTrainingConfig; quantile Huber loss; no [v_min, v_max] required.
  • PPOPpoAgent, PpoTrainingConfig, RolloutBuffer, GAE advantages, CategoricalPolicyHead (discrete), TanhGaussianPolicyHead (continuous), clipped surrogate + value loss, early-stop on approx_kl.
  • PPGPpgAgent, PpgConfig, AuxBuffer, PpgCategoricalPolicyHead; interleaved policy-phase + auxiliary-phase with KL distillation.
  • DDPGDdpgAgent, DdpgTrainingConfig; deterministic actor + Q-critic; Polyak target sync; Gaussian exploration noise.
  • TD3Td3Agent, Td3TrainingConfig; twin-critic min-bootstrap; delayed actor updates; target-policy smoothing.
  • SACSacAgent, SacTrainingConfig; squashed-Gaussian stochastic actor; twin critics; learnable temperature α with auto-tuning toward -|A|.
  • Shared EpsilonGreedy schedule (DQN / C51 / QR-DQN) and GaussianNoise exploration (DDPG / TD3).

rlevo-benchmarks

Added

  • Evaluator — drives any BenchEnv for N episodes, collecting per-step and per-episode metrics.
  • Suite — ordered sequence of (env, evaluator) pairs with shared reporter.
  • MetricsEaMetrics (best fitness, population diversity, convergence rate), RlMetrics (episode return, episode length, sample efficiency).
  • ReportersJsonReporter (newline-delimited JSON), LoggingReporter (tracing spans), TuiReporter (ratatui live dashboard, tui feature).
  • Checkpoint and Storage traits for saving/resuming benchmark state.
  • rayon-parallel episode evaluation for multi-seed sweeps.

rlevo-hybrid

Added

  • Stub crate establishing the dependency wiring between rlevo-evolution and rlevo-reinforcement-learning. No hybrid strategies are implemented in v0.1.0; see the crate README for the v0.2.0 roadmap.

rlevo (umbrella)

Added

  • Re-exports all public APIs from every workspace crate behind a single rlevo entry point.
  • keywords: reinforcement-learning, evolutionary, deep-learning, burn, neural-network.
  • categories: science, algorithms, simulation.
  • Full example suite (35 examples across gridworlds, classic control, Box2D, locomotion, evolutionary showcases, RL algorithms, and benchmarks harness).
  • Cross-crate integration tests.