
v0.26.0 - The Monte Carlo Release


@NadimGhaznavi NadimGhaznavi released this 05 Apr 15:51

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.


[0.26.0] - 2026-04-05 @ 11:19 - Monte Carlo Release

Added

  • Monte Carlo Tree Search
    • This policy implements a limited Monte Carlo Tree Search (MCTS). The goal is not to replace the neural network, but to selectively enrich the training data in complex situations.
    • By design, MCTS operates sparingly and in bursts.
    • A burst is a short sequence of consecutive steps during which decision-making is temporarily delegated from the neural network to the MCTS. Outside of these bursts, the system behaves normally.
    • The following settings are exposed in the TUI:
      • Gating P-Value - This is the probability that the MCTS policy is activated.
      • Search Depth - The maximum number of steps into the future that each simulation looks ahead.
      • Iterations - The number of MCTS simulations performed per decision.
        • Each iteration expands and evaluates part of the search tree.
        • Higher values → more accurate action selection.
      • Exploration P-Value - The exploration constant used in the UCB (Upper Confidence Bound) formula.
        • Controls the balance between:
          • exploiting known good actions (lower values)
          • exploring less visited actions (higher values)
      • Steps - The length of an MCTS burst.
        • Once triggered, MCTS remains active for this many consecutive steps.
        • Enables MCTS to guide short action sequences, not just individual moves.
      • Score Threshold - The game's score must have reached or exceeded this value before MCTS can trigger.
  • Refreshed the screenshots in the PyPI and RTD documentation.
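
To make the interaction between the Gating P-Value, Steps, Score Threshold and Exploration P-Value settings more concrete, here is a minimal Python sketch. It is not the project's implementation. The names `BurstGate`, `use_mcts` and `ucb1` are hypothetical. The sketch assumes the gate is rolled once per step outside a burst, that the threshold is compared against the current game score, and that the UCB formula is the standard UCB1 form.

```python
import math
import random


class BurstGate:
    """Hypothetical helper: decides per step whether MCTS or the network acts."""

    def __init__(self, gate_p, burst_steps, score_threshold, rng=None):
        self.gate_p = gate_p                    # "Gating P-Value"
        self.burst_steps = burst_steps          # "Steps" (assumed >= 1)
        self.score_threshold = score_threshold  # "Score Threshold"
        self.rng = rng or random.Random()
        self.remaining = 0                      # steps left in the active burst

    def use_mcts(self, score):
        # An active burst continues without re-rolling the gate.
        if self.remaining > 0:
            self.remaining -= 1
            return True
        # Outside a burst: the score must qualify, then the gate is rolled.
        if score >= self.score_threshold and self.rng.random() < self.gate_p:
            self.remaining = self.burst_steps - 1  # this step counts as the first
            return True
        return False


def ucb1(total_value, visits, parent_visits, c):
    """Standard UCB1 score; c plays the role of the exploration constant."""
    if visits == 0:
        return math.inf  # unvisited actions are tried first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore
```

In this sketch a larger `c` raises the score of rarely visited actions relative to well-explored ones, which matches the exploitation/exploration trade-off described for the Exploration P-Value.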

Fixed

  • Table of contents for the ATH Memory in the PyPI documentation.
  • Formatting of the settings fields.