v0.1.0 — Initial Alpha Release
Changelog
All notable changes to this project will be documented in this file.
The format follows Keep a Changelog.
This project adheres to Semantic Versioning.
0.1.0 – 2026-04-28
Initial alpha release. All crates are published together at the same version.
rlevo-core
Added
State<D>/Observation<D>traits for typed, const-generic environment state and agent perception.Action<D>trait hierarchy (DiscreteAction,ContinuousAction,MultiDiscreteAction,MultiBinaryAction) for compile-time action-space safety.Environment<D, SD, AD>trait withreset/step/rendercontract;Snapshot<D>trait withSnapshotBase<D, O, R>concrete type.Rewardtrait withScalarRewardandVectorRewardimplementations.TensorConvertible<B, D>bridge trait for lifting state/action types onto Burn tensors.AgentandBenchableAgenttraits for uniform agent interaction.FitnessEvaluableandLandscapetraits for benchmarking evolutionary algorithms.BenchEnv,BenchError,BenchStep,Metric,MetricsProvider, andSeedStream(moved fromrlevo-benchmarksper ADR-0004).util::seed— deterministicSeedStreamfor reproducible multi-run experiments.EnvironmentErrorandStateErrorerror types withthiserrorderives.
rlevo-environments
Added
- Classic control —
CartPole,MountainCar,MountainCarContinuous,Pendulum,Acrobot. - Bandits —
KArmedBandit<K>,ContextualBandit,NonStationaryBandit,AdversarialBandit. - Toy text —
Blackjack,CliffWalking,FrozenLake,Taxi. - Gridworlds (MiniGrid-style) —
Empty,DoorKey,Memory,FourRooms,Crossing,LavaGap,MultiRoom,Unlock,UnlockPickup,GoToDoor,DistShift,DynamicObstacles; sharedGridCore(grid, entity, action, direction, observation, render, reward, dynamics). - Box2D physics (
box2dfeature, rapier2d) —BipedalWalker,LunarLander(discrete and continuous action spaces),CarRacing. - Locomotion (
locomotionfeature, rapier3d) —Reacher,Swimmer,InvertedPendulum,InvertedDoublePendulum. - Games —
Chess(full move generation and board state),ConnectFour. - Optimisation landscapes —
Sphere,Ackley,Rastriginfor benchmarking evolutionary algorithms. - Wrappers —
TimeLimitwraps anyEnvironmentwith an episode step cap. - Bench adapter (
benchfeature) —BenchAdapterand presetSuitefactories to drive any environment fromrlevo-benchmarks. - ASCII render backend for text-based environments.
rlevo-evolution
Added
Strategy<B>pure trait (init/ask/tell/best) — stateless, parallelism-friendly, trivially checkpointable.EvolutionaryHarness<B, S, F>— wraps anyStrategyas aBenchEnv.BatchFitnessFntrait withFromFitnessEvaluableadapter.GenomeKindenum (RealValued,Binary,Integer,Program).- Classical families —
GeneticAlgorithm(real-valued, SBX crossover + polynomial mutation),BinaryGeneticAlgorithm(one-point/uniform crossover + bit-flip mutation),EvolutionStrategy((1+1),(1+λ),(μ,λ),(μ+λ)with self-adaptive σ),EvolutionaryProgramming(Gaussian perturbation + tournament),DifferentialEvolution(Rand/1/Bin, Best/1/Bin, CurrentToBest/1/Bin),CartesianGeneticProgramming(symbolic regression via CGP graph). - Metaheuristics —
ParticleSwarmOptimization,AntColonyOptimizationReal,AntColonyOptimizationPermutation,ArtificialBeeColony,FireflyAlgorithm,BatAlgorithm,CuckooSearch(Lévy flights via Mantegna),GreyWolfOptimizer,SalpSwarmAlgorithm,WhaleOptimizationAlgorithm. - Genetic operators (
ops) — selection (tournament, roulette, rank, SUS, elitism, NSGA-II crowding), crossover (uniform, one-point, multi-point, SBX, BLX-α, intermediate), mutation (Gaussian, uniform, polynomial, bit-flip, inversion), replacement (generational, steady-state, elitist, comma, plus). - Custom CubeCL kernels (
custom-kernelsfeature) — fused pairwise-attract (Firefly large-N path) and fused Lévy-flight (Cuckoo/Bat) kernels; pure-tensor fallbacks used when feature is off. PopulationStatetensor wrapper;ShapingFnfitness shaping (linear rank, exponential rank, truncation).
rlevo-reinforcement-learning
Added
- Replay memory —
PrioritizedExperienceReplay(uniform-sampling mode in v0.1.0);TrainingBatchtyped container. - Experience —
ExperienceTuple(s, a, r, s', done),Historytrajectory buffer. - Metrics —
AgentStats(per-step),PerformanceRecord(per-episode). - DQN —
DqnModel,DqnAgent,DqnTrainingConfig; ε-greedy exploration schedule; Double-DQN target option. - C51 —
C51Model,C51Agent,C51TrainingConfig; Bellman projection onto N-atom support; categorical cross-entropy loss. - QR-DQN —
QrDqnModel,QrDqnAgent,QrDqnTrainingConfig; quantile Huber loss; no[v_min, v_max]required. - PPO —
PpoAgent,PpoTrainingConfig,RolloutBuffer, GAE advantages,CategoricalPolicyHead(discrete),TanhGaussianPolicyHead(continuous), clipped surrogate + value loss, early-stop onapprox_kl. - PPG —
PpgAgent,PpgConfig,AuxBuffer,PpgCategoricalPolicyHead; interleaved policy-phase + auxiliary-phase with KL distillation. - DDPG —
DdpgAgent,DdpgTrainingConfig; deterministic actor + Q-critic; Polyak target sync; Gaussian exploration noise. - TD3 —
Td3Agent,Td3TrainingConfig; twin-critic min-bootstrap; delayed actor updates; target-policy smoothing. - SAC —
SacAgent,SacTrainingConfig; squashed-Gaussian stochastic actor; twin critics; learnable temperature α with auto-tuning toward-|A|. - Shared
EpsilonGreedyschedule (DQN / C51 / QR-DQN) andGaussianNoiseexploration (DDPG / TD3).
rlevo-benchmarks
Added
Evaluator— drives anyBenchEnvfor N episodes, collecting per-step and per-episode metrics.Suite— ordered sequence of(env, evaluator)pairs with shared reporter.- Metrics —
EaMetrics(best fitness, population diversity, convergence rate),RlMetrics(episode return, episode length, sample efficiency). - Reporters —
JsonReporter(newline-delimited JSON),LoggingReporter(tracing spans),TuiReporter(ratatui live dashboard,tuifeature). CheckpointandStoragetraits for saving/resuming benchmark state.rayon-parallel episode evaluation for multi-seed sweeps.
rlevo-hybrid
Added
- Stub crate establishing the dependency wiring between
rlevo-evolutionandrlevo-reinforcement-learning. No hybrid strategies are implemented in v0.1.0; see the crate README for the v0.2.0 roadmap.
rlevo (umbrella)
Added
- Re-exports all public APIs from every workspace crate behind a single
rlevoentry point. keywords:reinforcement-learning,evolutionary,deep-learning,burn,neural-network.categories:science,algorithms,simulation.- Full example suite (35 examples across gridworlds, classic control, Box2D, locomotion, evolutionary showcases, RL algorithms, and benchmarks harness).
- Cross-crate integration tests.