Changelog
All notable changes to this project will be documented in this file.
The format follows Keep a Changelog.
This project adheres to Semantic Versioning.
0.2.0 – 2026-06-07
Breaking changes
- DIM → RANK: the const generic parameter on State<D>, Observation<D>, Action<D>, and Environment<D, SD, AD> is renamed to RANK (or R, SR, AR at usage sites) across all crates. Update any downstream impl State<D> / impl Environment<D, …> declarations accordingly.
- fn new removed from the Environment trait (ADR 0011): construction is no longer part of the shared trait contract. Replace call sites with the new ConstructableEnv factory trait or a concrete new method.
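The two migrations above can be sketched with simplified stand-in traits. This is an illustration under stated assumptions, not the rlevo-core API: only the `fn new(render: bool) -> Self` factory signature comes from this changelog, while `Corridor`, `CorridorState`, `shape`, `make_env`, and the `reset`/`step` signatures are hypothetical.

```rust
/// Stand-in for a trait using the renamed const generic (`RANK`, formerly `DIM`).
trait State<const RANK: usize> {
    fn shape(&self) -> [usize; RANK]; // hypothetical method
}

/// Simplified stand-in: the shared contract no longer includes construction.
trait Environment {
    fn reset(&mut self);
    fn step(&mut self, action: i32) -> f64; // hypothetical signature
}

/// Factory trait with the signature given in this changelog.
trait ConstructableEnv: Sized {
    fn new(render: bool) -> Self;
}

/// Hypothetical one-dimensional state: a position on a line.
struct CorridorState(i32);

impl State<1> for CorridorState {
    fn shape(&self) -> [usize; 1] {
        [1]
    }
}

/// Hypothetical environment: walk right until position 3 is reached.
struct Corridor {
    state: CorridorState,
    render: bool,
}

impl Environment for Corridor {
    fn reset(&mut self) {
        self.state = CorridorState(0);
    }

    fn step(&mut self, action: i32) -> f64 {
        self.state.0 += action;
        if self.render {
            println!("position = {}", self.state.0);
        }
        // Reward 1.0 once the goal is reached, 0.0 otherwise.
        if self.state.0 >= 3 { 1.0 } else { 0.0 }
    }
}

impl ConstructableEnv for Corridor {
    fn new(render: bool) -> Self {
        Corridor { state: CorridorState(0), render }
    }
}

/// Generic code that used to call `Environment::new` now asks for the
/// factory bound explicitly.
fn make_env<E: Environment + ConstructableEnv>(render: bool) -> E {
    let mut env = E::new(render);
    env.reset();
    env
}
```

Code that only steps an existing environment keeps the plain `Environment` bound; only call sites that construct environments generically need the extra `ConstructableEnv` bound.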
New crates
- rlevo-examples: heavy visualisation, recording, and report examples extracted from the rlevo umbrella (ADR 0012). Lightweight environment/algorithm examples stay in crates/rlevo/.
- rlevo-metrics-registry: wasm32-compatible leaf crate that owns the canonical metric descriptor list (CANONICAL_METRICS, MetricDescriptor, domain grouping). Eliminates the hand-copied duplicate that previously existed between rlevo-benchmarks and rlevo-benchmarks-report-client (ADR 0015).
- rlevo-benchmarks-report-client: Leptos/WASM static-HTML post-run report viewer. Served from an embedded axum server. Shares the metric registry with rlevo-benchmarks without pulling in burn or rand.
Dependency upgrades
- burn 0.20.0 → 0.21.0; migrated ndarray backend to the new flex backend.
- rand 0.9.x → 0.10.1, rand_distr → 0.6.0.
rlevo-core
Added
- ConstructableEnv factory trait: standalone fn new(render: bool) -> Self replacement for the removed Environment::new (ADR 0011).
- StyledFrame, StyledLine, StyledSpan, SpanStyle, Color, Modifier, semantic palette module, and AsciiRenderable / AsciiRenderer hoisted from rlevo-environments::render into rlevo-core::render (ADR 0009). Import paths inside rlevo-environments are preserved via a re-export shim.
Changed
- AsciiRenderable demoted from a required library invariant to an optional debug helper; implementing it is no longer implied by Environment (ADR 0013).
rlevo-environments
Changed
- Render types (StyledFrame, AsciiRenderable, etc.) re-exported from rlevo-core::render; originals removed (ADR 0009).
- Environment::new removed; each environment exposes its own new constructor and may opt into ConstructableEnv (ADR 0011).
rlevo-evolution
Changed
- All EA algorithms and shared ops (selection, crossover, mutation, replacement) now draw random values through seed_stream on the host CPU rather than calling B::seed + Tensor::random, eliminating the process-wide RNG mutex contention that caused non-determinism in parallel tests.
- SharedPopulationObserver unified to parking_lot::Mutex (was split between std::sync and parking_lot lock types, causing type mismatches in recording examples) (ADR 0010).
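The idea of drawing random values on the host instead of through a process-wide backend RNG can be sketched as follows. This is an assumption-laden illustration: `HostSeedStream`, `next_f64`, and `mutation_mask` are hypothetical names, and SplitMix64 is swapped in as a generic deterministic generator; the actual seed_stream implementation may differ.

```rust
/// Hypothetical stand-in for a host-side seed stream, using SplitMix64.
/// Each operator owns its stream, so no global RNG state is shared.
struct HostSeedStream {
    state: u64,
}

impl HostSeedStream {
    fn new(seed: u64) -> Self {
        HostSeedStream { state: seed }
    }

    /// One SplitMix64 step: advance the counter, then scramble it.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical mutation helper: decide on the host which genes mutate.
/// The resulting mask could then be uploaded to the device as a tensor.
fn mutation_mask(stream: &mut HostSeedStream, len: usize, rate: f64) -> Vec<bool> {
    (0..len).map(|_| stream.next_f64() < rate).collect()
}
```

Because every stream is a plain value seeded explicitly, two tests running in parallel with the same seed see the same draws regardless of thread scheduling.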
rlevo-reinforcement-learning
Added
- polyak_update hoisted as a shared utility function available to all RL algorithm crates.
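For readers unfamiliar with Polyak (soft) target updates, the rule is target ← τ · online + (1 − τ) · target. The sketch below applies it to flat parameter slices; the name `polyak_update_slice` and its signature are hypothetical, and the shared rlevo utility presumably operates on Burn modules rather than slices.

```rust
/// Soft-update `target` toward `online` with mixing factor `tau` in [0, 1].
/// tau = 1.0 copies the online parameters; tau = 0.0 leaves the target unchanged.
fn polyak_update_slice(target: &mut [f32], online: &[f32], tau: f32) {
    assert_eq!(target.len(), online.len(), "parameter counts must match");
    assert!((0.0..=1.0).contains(&tau), "tau must lie in [0, 1]");
    for (t, o) in target.iter_mut().zip(online) {
        *t = tau * *o + (1.0 - tau) * *t;
    }
}
```

Small values of τ make the target network trail the online network slowly, which is the usual choice for DDPG-, TD3-, and SAC-style critics.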
rlevo-benchmarks
Added
- Record schema v6 (FORMAT_VERSION bumped 5 → 6, ADR 0014):
  - Expanded CANONICAL_METRICS (explained variance, per-iteration episode-return statistics, DQN/SAC loss terms); list now owned by rlevo-metrics-registry.
  - Typed run-provenance fields on RunManifest: algorithm name, crate versions, git ref, device, seed count, success threshold.
  - EpisodeKind { Training, Evaluation } field in episode headers.
  - Episode wall-clock duration as a terminal metric.
  - checkpoints: Vec<CheckpointRef> seam for deep-RL Burn-Recorder model files (EA runs unaffected).
- Metrics-only live ratatui TUI replaces the earlier three-tier visualisation plan (ADR 0013 supersedes ADR 0008); no environment render panel in the TUI.
Changed
- CANONICAL_METRICS constant moved to rlevo-metrics-registry; rlevo-benchmarks re-exports it for back-compat.
rlevo-benchmarks-report-client
Added
- Interactive post-run static HTML report (Leptos + WASM):
  - Min/max downsampling for long metric series (ADR 0013 / M8.2).
  - Multi-seed mean ± std band aggregation.
  - Hover crosshair with exact raw-value tooltip.
  - Per-panel SVG export buttons.
  - Step / episode / wall-clock x-axis toggle for episode panels.
  - Eval/training split via EpisodeKind in the episode index and table badge.
  - Landscape heatmap background for EA optimisation landscape runs.
  - Diversity-threshold guideline line with breach-pulse highlight.
  - Strip-plot overlay toggle on the population box-plot panel.
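Two of the aggregation features listed above, min/max downsampling and the multi-seed mean ± std band, can be illustrated with a minimal sketch. The function names, the bucket layout, and the use of the population standard deviation are assumptions made for this example; the report client's actual implementation is not shown in this changelog.

```rust
/// Min/max downsampling: split the series into `buckets` windows and keep
/// only the minimum and maximum point of each window (in index order), so
/// spikes survive even when the series is far longer than the plot width.
fn min_max_downsample(series: &[f64], buckets: usize) -> Vec<(usize, f64)> {
    // Short series are returned untouched.
    if buckets == 0 || series.len() <= 2 * buckets {
        return series.iter().copied().enumerate().collect();
    }
    let chunk = (series.len() + buckets - 1) / buckets; // ceiling division
    let mut out = Vec::with_capacity(2 * buckets);
    for (b, window) in series.chunks(chunk).enumerate() {
        let base = b * chunk;
        let (mut lo, mut hi) = (0, 0);
        for (i, v) in window.iter().enumerate() {
            if *v < window[lo] {
                lo = i;
            }
            if *v > window[hi] {
                hi = i;
            }
        }
        let (first, second) = if lo <= hi { (lo, hi) } else { (hi, lo) };
        out.push((base + first, window[first]));
        if second != first {
            out.push((base + second, window[second]));
        }
    }
    out
}

/// Per-step (mean, std) across seeds, truncated to the shortest run.
/// Uses the population standard deviation (divide by n); this is an assumption.
fn mean_std_band(runs: &[Vec<f64>]) -> Vec<(f64, f64)> {
    let steps = runs.iter().map(|r| r.len()).min().unwrap_or(0);
    let n = runs.len() as f64;
    (0..steps)
        .map(|t| {
            let mean = runs.iter().map(|r| r[t]).sum::<f64>() / n;
            let var = runs.iter().map(|r| (r[t] - mean).powi(2)).sum::<f64>() / n;
            (mean, var.sqrt())
        })
        .collect()
}
```

Keeping both extremes per window is what distinguishes this from simple striding, which can silently drop a single-step collapse in episode return.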
rlevo (umbrella)
Changed
- Lightweight examples retained; heavy viz/record/report examples migrated to rlevo-examples (ADR 0012).
Infrastructure
Added
- GitHub Actions CI: integration-test matrix (Linux × stable toolchain) and weekly full-workspace test run.
- BACKEND_LOCK per-binary synchronisation for wgpu-backed integration tests; removes the previous --test-threads=1 requirement.
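A per-binary lock of this kind is commonly built from a static mutex that every backend-touching test acquires first. The sketch below uses `std::sync::Mutex`; the helper names `lock_backend` and `run_with_backend` are hypothetical, and the project's actual BACKEND_LOCK may use a different lock type.

```rust
use std::sync::{Mutex, MutexGuard};

/// One lock per test binary: tests that touch the GPU backend serialise on
/// it, while all other tests keep running in parallel.
static BACKEND_LOCK: Mutex<()> = Mutex::new(());

/// Acquire the backend lock, recovering from poisoning so that one panicking
/// test does not cascade into failures in every later test.
fn lock_backend() -> MutexGuard<'static, ()> {
    BACKEND_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Run `body` while holding the lock; the guard is released on return.
fn run_with_backend<T>(body: impl FnOnce() -> T) -> T {
    let _guard = lock_backend();
    body()
}
```

Each integration test would wrap its backend work in `run_with_backend(|| { ... })`, which lets the default parallel test runner be used without forcing a single thread for the whole binary.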
0.1.0 – 2026-04-28
Initial alpha release. All crates are published together at the same version.
rlevo-core
Added
- State<D> / Observation<D> traits for typed, const-generic environment state and agent perception.
- Action<D> trait hierarchy (DiscreteAction, ContinuousAction, MultiDiscreteAction, MultiBinaryAction) for compile-time action-space safety.
- Environment<D, SD, AD> trait with reset / step / render contract; Snapshot<D> trait with SnapshotBase<D, O, R> concrete type.
- Reward trait with ScalarReward and VectorReward implementations.
- TensorConvertible<B, D> bridge trait for lifting state/action types onto Burn tensors.
- Agent and BenchableAgent traits for uniform agent interaction.
- FitnessEvaluable and Landscape traits for benchmarking evolutionary algorithms.
- BenchEnv, BenchError, BenchStep, Metric, MetricsProvider, and SeedStream (moved from rlevo-benchmarks per ADR-0004).
- util::seed: deterministic SeedStream for reproducible multi-run experiments.
- EnvironmentError and StateError error types with thiserror derives.
rlevo-environments
Added
- Classic control: CartPole, MountainCar, MountainCarContinuous, Pendulum, Acrobot.
- Bandits: KArmedBandit<K>, ContextualBandit, NonStationaryBandit, AdversarialBandit.
- Toy text: Blackjack, CliffWalking, FrozenLake, Taxi.
- Gridworlds (MiniGrid-style): Empty, DoorKey, Memory, FourRooms, Crossing, LavaGap, MultiRoom, Unlock, UnlockPickup, GoToDoor, DistShift, DynamicObstacles; shared GridCore (grid, entity, action, direction, observation, render, reward, dynamics).
- Box2D physics (box2d feature, rapier2d): BipedalWalker, LunarLander (discrete and continuous action spaces), CarRacing.
- Locomotion (locomotion feature, rapier3d): Reacher, Swimmer, InvertedPendulum, InvertedDoublePendulum.
- Games: Chess (full move generation and board state), ConnectFour.
- Optimisation landscapes: Sphere, Ackley, Rastrigin for benchmarking evolutionary algorithms.
- Wrappers: TimeLimit wraps any Environment with an episode step cap.
- Bench adapter (bench feature): BenchAdapter and preset Suite factories to drive any environment from rlevo-benchmarks.
- ASCII render backend for text-based environments.
rlevo-evolution
Added
- Strategy<B> pure trait (init / ask / tell / best): stateless, parallelism-friendly, trivially checkpointable.
- EvolutionaryHarness<B, S, F>: wraps any Strategy as a BenchEnv.
- BatchFitnessFn trait with FromFitnessEvaluable adapter.
- GenomeKind enum (RealValued, Binary, Integer, Program).
- Classical families: GeneticAlgorithm (real-valued, SBX crossover + polynomial mutation), BinaryGeneticAlgorithm (one-point/uniform crossover + bit-flip mutation), EvolutionStrategy ((1+1), (1+λ), (μ,λ), (μ+λ) with self-adaptive σ), EvolutionaryProgramming (Gaussian perturbation + tournament), DifferentialEvolution (Rand/1/Bin, Best/1/Bin, CurrentToBest/1/Bin), CartesianGeneticProgramming (symbolic regression via CGP graph).
- Metaheuristics: ParticleSwarmOptimization, AntColonyOptimizationReal, AntColonyOptimizationPermutation, ArtificialBeeColony, FireflyAlgorithm, BatAlgorithm, CuckooSearch (Lévy flights via Mantegna), GreyWolfOptimizer, SalpSwarmAlgorithm, WhaleOptimizationAlgorithm.
- Genetic operators (ops): selection (tournament, roulette, rank, SUS, elitism, NSGA-II crowding), crossover (uniform, one-point, multi-point, SBX, BLX-α, intermediate), mutation (Gaussian, uniform, polynomial, bit-flip, inversion), replacement (generational, steady-state, elitist, comma, plus).
- Custom CubeCL kernels (custom-kernels feature): fused pairwise-attract (Firefly large-N path) and fused Lévy-flight (Cuckoo/Bat) kernels; pure-tensor fallbacks used when the feature is off.
- PopulationState tensor wrapper; ShapingFn fitness shaping (linear rank, exponential rank, truncation).
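The init / ask / tell / best contract of a pure strategy can be sketched in a few lines. Everything here is a simplified assumption: the real Strategy<B> is generic over a Burn backend and works on tensors, whereas this stand-in uses `f64` candidates, and `StepHalvingSearch`, `SearchState`, and `run` are hypothetical names.

```rust
/// Simplified stand-in for a pure ask/tell strategy: the strategy itself is
/// immutable, and all evolving data lives in an explicit state value.
trait Strategy {
    type State;
    fn init(&self) -> Self::State;
    fn ask(&self, state: &Self::State) -> Vec<f64>;
    fn tell(&self, state: Self::State, candidates: &[f64], fitness: &[f64]) -> Self::State;
    fn best(&self, state: &Self::State) -> Option<(f64, f64)>;
}

/// Hypothetical 1-D minimiser: probe both neighbours, halve the step on failure.
struct StepHalvingSearch {
    start: f64,
    initial_step: f64,
}

struct SearchState {
    best_x: f64,
    best_f: f64,
    step: f64,
}

impl Strategy for StepHalvingSearch {
    type State = SearchState;

    fn init(&self) -> SearchState {
        SearchState { best_x: self.start, best_f: f64::INFINITY, step: self.initial_step }
    }

    fn ask(&self, state: &SearchState) -> Vec<f64> {
        vec![state.best_x - state.step, state.best_x + state.step]
    }

    fn tell(&self, mut state: SearchState, candidates: &[f64], fitness: &[f64]) -> SearchState {
        let mut improved = false;
        for (&x, &f) in candidates.iter().zip(fitness) {
            if f < state.best_f {
                state.best_x = x;
                state.best_f = f;
                improved = true;
            }
        }
        if !improved {
            state.step *= 0.5; // no better neighbour: search more finely
        }
        state
    }

    fn best(&self, state: &SearchState) -> Option<(f64, f64)> {
        state.best_f.is_finite().then(|| (state.best_x, state.best_f))
    }
}

/// Generic driver loop: ask for candidates, evaluate them, tell the results.
fn run<S: Strategy>(strategy: &S, objective: impl Fn(f64) -> f64, generations: usize) -> S::State {
    let mut state = strategy.init();
    for _ in 0..generations {
        let candidates = strategy.ask(&state);
        let fitness: Vec<f64> = candidates.iter().map(|&x| objective(x)).collect();
        state = strategy.tell(state, &candidates, &fitness);
    }
    state
}
```

Because the state is an ordinary value passed in and out, checkpointing amounts to serialising that value, and several independent runs can share one strategy instance across threads.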
rlevo-reinforcement-learning
Added
- Replay memory: PrioritizedExperienceReplay (uniform-sampling mode in v0.1.0); TrainingBatch typed container.
- Experience: ExperienceTuple (s, a, r, s', done), History trajectory buffer.
- Metrics: AgentStats (per-step), PerformanceRecord (per-episode).
- DQN: DqnModel, DqnAgent, DqnTrainingConfig; ε-greedy exploration schedule; Double-DQN target option.
- C51: C51Model, C51Agent, C51TrainingConfig; Bellman projection onto N-atom support; categorical cross-entropy loss.
- QR-DQN: QrDqnModel, QrDqnAgent, QrDqnTrainingConfig; quantile Huber loss; no [v_min, v_max] required.
- PPO: PpoAgent, PpoTrainingConfig, RolloutBuffer, GAE advantages, CategoricalPolicyHead (discrete), TanhGaussianPolicyHead (continuous), clipped surrogate + value loss, early-stop on approx_kl.
- PPG: PpgAgent, PpgConfig, AuxBuffer, PpgCategoricalPolicyHead; interleaved policy-phase + auxiliary-phase with KL distillation.
- DDPG: DdpgAgent, DdpgTrainingConfig; deterministic actor + Q-critic; Polyak target sync; Gaussian exploration noise.
- TD3: Td3Agent, Td3TrainingConfig; twin-critic min-bootstrap; delayed actor updates; target-policy smoothing.
- SAC: SacAgent, SacTrainingConfig; squashed-Gaussian stochastic actor; twin critics; learnable temperature α with auto-tuning toward -|A|.
- Shared EpsilonGreedy schedule (DQN / C51 / QR-DQN) and GaussianNoise exploration (DDPG / TD3).
rlevo-benchmarks
Added
- Evaluator: drives any BenchEnv for N episodes, collecting per-step and per-episode metrics.
- Suite: ordered sequence of (env, evaluator) pairs with shared reporter.
- Metrics: EaMetrics (best fitness, population diversity, convergence rate), RlMetrics (episode return, episode length, sample efficiency).
- Reporters: JsonReporter (newline-delimited JSON), LoggingReporter (tracing spans), TuiReporter (ratatui live dashboard, tui feature).
- Checkpoint and Storage traits for saving/resuming benchmark state.
- rayon-parallel episode evaluation for multi-seed sweeps.
rlevo-hybrid
Added
- Stub crate establishing the dependency wiring between rlevo-evolution and rlevo-reinforcement-learning. No hybrid strategies are implemented in v0.1.0; see the crate README for the v0.2.0 roadmap.
rlevo (umbrella)
Added
- Re-exports all public APIs from every workspace crate behind a single rlevo entry point.
- keywords: reinforcement-learning, evolutionary, deep-learning, burn, neural-network.
- categories: science, algorithms, simulation.
- Full example suite (35 examples across gridworlds, classic control, Box2D, locomotion, evolutionary showcases, RL algorithms, and benchmarks harness).
- Cross-crate integration tests.