Conversation

@NripeshN (Owner) commented Jan 20, 2026

Note

Major CI and engine updates with platform-aware tournaments and Apple GPU NNUE support.

  • CI/workflow: Split into setup-macos/setup-linux with separate matrices (requires_metal vs non-Metal); run matches on macOS (MetalFish) and Linux; add Berserk build/network; include opening book (8moves_v3.pgn); split artifact names; rename wrappers (parallel_hybrid, mctsmt); stricter concurrency and new defaults (games=16, time=600+0.1).
  • Build/CMake: Reorganize sources (GPU/MCTS unified, remove legacy files), switch Metal shader to nnue.metal, add Accelerate framework, adjust test targets, and general formatting/cleanup.
  • Engine/eval: Add Apple Silicon GPU NNUE toggle and fast path in Eval::evaluate with GPUNNUEManager; update GPU backend APIs (hardware capability queries, shutdown hooks); extend CUDA backend/utilities and kernels; remove deprecated gpu/batch_ops and gpu_accumulator.
  • Docs/housekeeping: Expand README (search modes, MCTS/Hybrid, GPU, tournament); update .gitignore; add _codeql_detected_source_root; tweak benchmark_gpu.cpp and minor headers.

Written by Cursor Bugbot for commit febc2a5.

- Updated HybridSearchBridge to utilize the Stockfish engine for move verification and search operations, improving accuracy and performance.
- Added support for synchronous search in the Engine class, allowing for real-time move evaluations.
- Enhanced fallback mechanisms to ensure functionality when the engine is not available, maintaining robustness in MCTS operations.
- Updated documentation to reflect the integration of the Stockfish engine and its impact on search capabilities.
cursor[bot]

This comment was marked as outdated.

…-to-move

- Updated score calculations in verify_with_alphabeta to correctly negate scores based on the player's perspective (white vs. black).
- Added comments for clarity on the score negation logic, improving code readability and maintainability.
- Removed redundant updates to ab_nodes in HybridSearchBridge, clarifying the statistics handling.
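
The perspective handling described above can be sketched as follows. This is a minimal illustration, assuming the alpha-beta score arrives relative to the side to move and the caller wants it from White's point of view; the direction of the conversion, the `Color` enum, and the function name are assumptions, not taken from `verify_with_alphabeta`.

```cpp
#include <cassert>

// Hypothetical colour type for this sketch.
enum class Color { White, Black };

// Assumption: score_stm is in centipawns, relative to the side to move.
// Negating it when Black is to move yields a White-relative score.
inline int to_white_perspective(int score_stm, Color side_to_move) {
    return side_to_move == Color::White ? score_stm : -score_stm;
}
```

A score of +120 with Black to move becomes -120 from White's point of view, so scores from different plies can be compared on one scale.
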
…e verification

- Modified the initialize method to accept an Engine parameter, enabling the use of Stockfish for more accurate alpha-beta verification.
- Enhanced the verify_with_alphabeta function to utilize the engine for move evaluations, providing a fallback to NNUE when the engine is unavailable.
- Updated the create_enhanced_hybrid_search factory function to pass the engine instance, ensuring seamless integration.
- Improved comments and code structure for better readability and maintainability.

- Removed EnhancedHybridSearch and its associated files, streamlining the codebase.
- Added ParallelHybridSearch, which integrates MCTS and Alpha-Beta search for improved performance.
- Updated CMakeLists.txt to include new source files and removed references to the deleted enhanced hybrid search.
- Modified UCI commands to support the new parallel hybrid search functionality.
- Adjusted CI workflow and wrapper scripts to accommodate the changes in search commands and engine configurations.

These updates enhance the search capabilities of MetalFish, optimizing for both strategic and tactical play through parallel processing.

- Introduced new tests for ABSearchResult, ABSearchConfig, and ABSearcher to validate functionality and default configurations.
- Added TacticalAnalyzer and HybridSearchBridge tests to ensure proper behavior and initialization.
- Updated test_mcts.cpp to include these new tests, enhancing coverage for the MCTS module and ensuring robustness in search capabilities.

- Removed legacy GPU components including nnue_eval, batch_ops, and persistent_pipeline to streamline the codebase.
- Introduced new Apple Silicon-specific optimizations in MCTS, including unified memory handling and SIMD-accelerated computations.
- Updated CMakeLists.txt to reflect the removal of obsolete files and added new sources for Apple Silicon optimizations.
- Enhanced the MCTS algorithms with Lc0-inspired techniques, improving performance and efficiency in search operations.
- Adjusted CI workflows to accommodate the changes in GPU and MCTS implementations, ensuring compatibility and robustness.

These updates significantly enhance the performance of MetalFish on Apple Silicon, optimizing both memory usage and computational efficiency.
…lHybridSearch

- Added an initialization check in the start_search method to prevent search execution without proper setup, ensuring stability.
- Refined the decision-making logic in make_final_decision to better weigh confidence and score differences, improving move selection accuracy.
- Updated async evaluation submission to safely capture batch count, preventing potential use-after-free issues.

These changes enhance the robustness and effectiveness of the ParallelHybridSearch component, optimizing its performance in various scenarios.
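
The capture fix mentioned above can be illustrated with a small sketch. It assumes the hazard was a completion callback referring to a value that no longer exists when the callback runs; `CallbackQueue` and `submit_batch` are hypothetical stand-ins, not the PR's async evaluation API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous evaluator: callbacks are stored
// now and invoked later, after the submitting function has returned.
class CallbackQueue {
public:
    void submit(std::function<void()> cb) { pending_.push_back(std::move(cb)); }
    void run_all() {
        for (auto& cb : pending_) cb();
        pending_.clear();
    }
private:
    std::vector<std::function<void()>> pending_;
};

// completed_positions must outlive the queue; batch_count must not be
// captured by reference because it is a local that dies on return.
void submit_batch(CallbackQueue& queue, std::size_t& completed_positions,
                  std::size_t positions_in_batch) {
    const std::size_t batch_count = positions_in_batch;
    // Capture batch_count BY VALUE so the callback owns its own copy.
    queue.submit([&completed_positions, batch_count] {
        completed_positions += batch_count;
    });
}
```

Capturing `batch_count` by reference would compile but leave the callback reading a dead stack slot once `submit_batch` returns.
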

- Introduced a custom deleter for posix_memalign-allocated memory to ensure proper deallocation using free() instead of delete[].
- Updated memory handling in the MCTS implementation to utilize unique_ptr with the new AlignedDeleter on Apple platforms, enhancing memory management and compatibility.

These changes improve memory safety and performance on Apple Silicon, aligning with recent optimizations in the codebase.
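
A minimal sketch of the deleter pattern described above, assuming a POSIX platform. The name `AlignedDeleter` comes from the commit message; `AlignedArray` and `make_aligned_array` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <type_traits>

// Memory from posix_memalign must be released with free(), never delete[].
struct AlignedDeleter {
    void operator()(void* p) const noexcept { free(p); }
};

template <typename T>
using AlignedArray = std::unique_ptr<T[], AlignedDeleter>;

// Returns an empty pointer if the allocation fails. `alignment` must be a
// power of two and a multiple of sizeof(void*), as posix_memalign requires.
template <typename T>
AlignedArray<T> make_aligned_array(std::size_t count, std::size_t alignment) {
    // Elements are never constructed or destroyed here, so restrict T.
    static_assert(std::is_trivially_default_constructible<T>::value &&
                      std::is_trivially_destructible<T>::value,
                  "sketch only supports trivial element types");
    void* raw = nullptr;
    if (posix_memalign(&raw, alignment, count * sizeof(T)) != 0) {
        return AlignedArray<T>(nullptr);
    }
    return AlignedArray<T>(static_cast<T*>(raw));
}
```

With the deleter baked into the pointer type, the buffer is freed correctly on every exit path without a manual `free()` call.
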

The FPU calculation was using -parent_q instead of parent_q, which inverted
the exploration behavior. This caused unvisited nodes to be over-punished in
winning positions and incorrectly preferred in losing positions.

Fixed by changing the formula from:
  fpu = -parent_q - reduction
to:
  fpu = parent_q - reduction

This fix was applied in:
- src/mcts/hybrid_search.cpp
- src/mcts/thread_safe_mcts.cpp
- src/mcts/lc0_mcts_core.h (ComputeFpu and ComputeFpuSimple)

The correct formula ensures unvisited nodes are slightly pessimistic compared
to visited children, as intended by the FPU reduction strategy.
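
The effect of the sign change can be checked with a small sketch. The two formulas are the ones quoted above; the function names and the clamp to [-1, 1] are assumptions for illustration and are not the code in `ComputeFpu`.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: Q values live in [-1, 1], so results are clamped to that range.
inline float clamp_q(float q) { return std::max(-1.0f, std::min(1.0f, q)); }

// Corrected formula: fpu = parent_q - reduction.
inline float compute_fpu(float parent_q, float reduction) {
    assert(reduction >= 0.0f);
    return clamp_q(parent_q - reduction);
}

// Pre-fix formula, kept only to show the inversion: fpu = -parent_q - reduction.
inline float compute_fpu_inverted(float parent_q, float reduction) {
    assert(reduction >= 0.0f);
    return clamp_q(-parent_q - reduction);
}
```

With `parent_q = 0.5` and `reduction = 0.25`, the corrected formula gives 0.25 while the inverted one gives -0.75 (over-punished). With `parent_q = -0.5`, the corrected formula gives -0.75 while the inverted one gives 0.25 (wrongly preferred).
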

cursor bot commented Jan 20, 2026

Bugbot Autofix resolved the bug found in the latest run.

  • ✅ Fixed: FPU formula has inverted sign causing wrong exploration
    • Changed FPU formula from -parent_q - reduction to parent_q - reduction in all four locations (hybrid_search.cpp, thread_safe_mcts.cpp, and both ComputeFpu and ComputeFpuSimple in lc0_mcts_core.h) to correctly implement pessimistic unvisited node values.

@NripeshN
Owner Author

@cursor review

…max functions

- Bug 1: Use config_.cpuct_base and config_.cpuct_factor instead of hardcoded local constants in select_child_puct
- Bug 2: Handle temperature==0 as argmax in compute_softmax_simd to prevent division by zero
- Bug 3: Handle temperature==0 as argmax in expand_node policy softmax to prevent division by zero
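
The temperature guard in Bugs 2 and 3 can be sketched in scalar form. This is not the SIMD code in `compute_softmax_simd`; the function name is hypothetical, and treating negative temperatures like zero is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over logits / temperature. As temperature approaches 0 the
// distribution collapses onto the largest logit, so temperature == 0 is
// handled as argmax instead of dividing by zero.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temperature) {
    std::vector<float> probs(logits.size(), 0.0f);
    if (logits.empty()) return probs;

    const auto max_it = std::max_element(logits.begin(), logits.end());

    if (temperature <= 0.0f) {  // assumption: negative values also mean argmax
        probs[static_cast<std::size_t>(max_it - logits.begin())] = 1.0f;
        return probs;
    }

    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        // Subtracting the maximum keeps exp() from overflowing.
        probs[i] = std::exp((logits[i] - *max_it) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}
```

Ties at temperature 0 go to the first maximal logit here, which is a choice made for the sketch rather than something the PR states.
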

cursor bot commented Jan 20, 2026

Bugbot Autofix resolved all 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: Config parameters cpuct_base and cpuct_factor are ignored
    • Replaced hardcoded local constants with config_.cpuct_base and config_.cpuct_factor in select_child_puct function.
  • ✅ Fixed: Division by zero when temperature is zero in softmax
    • Added temperature==0 check to handle argmax case before division in compute_softmax_simd function.
  • ✅ Fixed: Division by zero in expand_node policy softmax
    • Added temperature==0 check to handle argmax case before division in expand_node policy softmax.
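
For the first fix, a sketch of reading the exploration constants from configuration rather than from local constants. It assumes an Lc0-style growth formula; the PR only names `cpuct_base` and `cpuct_factor`, so `SearchConfig`, `cpuct_init`, the default values, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical config; the default values are placeholders, not tuned numbers.
struct SearchConfig {
    float cpuct_init = 1.5f;
    float cpuct_base = 10000.0f;
    float cpuct_factor = 2.0f;
};

// Assumed Lc0-style schedule: cpuct grows logarithmically with parent visits.
inline float compute_cpuct(const SearchConfig& cfg, std::uint32_t parent_visits) {
    assert(cfg.cpuct_base > 0.0f);
    return cfg.cpuct_init +
           cfg.cpuct_factor *
               std::log((static_cast<float>(parent_visits) + cfg.cpuct_base) /
                        cfg.cpuct_base);
}

// PUCT score: Q + cpuct * P * sqrt(N_parent) / (1 + N_child).
inline float puct_score(const SearchConfig& cfg, float child_q, float prior,
                        std::uint32_t parent_visits, std::uint32_t child_visits) {
    const float u = compute_cpuct(cfg, parent_visits) * prior *
                    std::sqrt(static_cast<float>(parent_visits)) /
                    (1.0f + static_cast<float>(child_visits));
    return child_q + u;
}
```

Because every constant comes from `cfg`, changing `cpuct_factor` or `cpuct_base` in the configuration changes the selection behaviour, which is what the hardcoded locals prevented.
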

- Introduced a global cleanup function to ensure proper shutdown of GPU resources before program exit, preventing potential crashes during static destruction.
- Registered the cleanup function with atexit to guarantee execution at the end of the program.
- Implemented shutdown functions for MCTS components, GPU feature extractor, and GPU NNUE manager to streamline resource management.
- Updated the main function to explicitly handle cleanup and ensure all GPU operations are synchronized before destruction.

These changes improve the stability and reliability of the MetalFish engine, particularly in scenarios involving GPU resources.
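
The shutdown ordering described above can be sketched with `std::atexit`. This is a hedged illustration only: `shutdown_gpu_resources`, `cleanup_gpu`, and `register_gpu_cleanup` are hypothetical names, and the counter stands in for the real MCTS, feature extractor, and NNUE manager shutdown calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

std::atomic<bool> g_gpu_shut_down{false};
std::atomic<int> g_shutdown_calls{0};

// Stand-in for the real work: synchronize the GPU, then release resources.
void shutdown_gpu_resources() { ++g_shutdown_calls; }

// Idempotent cleanup: safe to call explicitly from main() and again via
// atexit, because only the first caller performs the shutdown.
void cleanup_gpu() {
    if (g_gpu_shut_down.exchange(true)) return;
    shutdown_gpu_resources();
}

// Handlers registered with std::atexit run during normal program exit,
// before destructors of statics that finished construction earlier.
bool register_gpu_cleanup() { return std::atexit(cleanup_gpu) == 0; }
```

Calling `register_gpu_cleanup()` early in `main()` and `cleanup_gpu()` explicitly before returning gives both the explicit path and the safety net, with the work done exactly once.
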
…ce CPU backend shutdown handling

- Modified the GitHub Actions workflow to trigger on pull request events, allowing automatic updates of tournament results.
- Removed the manual input for PR number, streamlining the process by directly using the pull request context.
- Added no-op shutdown functions for the CPU backend to improve code clarity and maintainability, ensuring safe shutdown behavior in CPU fallback mode.

These changes enhance the automation of tournament result posting and improve the overall structure of the backend code.

…n MCTS

- Add nullptr check in AppleSiliconNodePool::allocate() to handle posix_memalign failures
- Add bounds checking when accessing GPU batch results to prevent out-of-bounds access
- Matches the defensive pattern used in hybrid_search.cpp

cursor bot commented Jan 21, 2026

Bugbot Autofix resolved 2 of the 2 bugs found in the latest run.

  • ✅ Fixed: Memory allocation failure causes undefined pointer arithmetic
    • Added nullptr check at the start of allocate() to return nullptr when posix_memalign fails in constructor, preventing undefined behavior from pointer arithmetic on null pointer.
  • ✅ Fixed: Missing bounds check on GPU batch results access
    • Added bounds checking before accessing gpu_batch.psqt_scores[i] and gpu_batch.positional_scores[i], matching the defensive pattern used in hybrid_search.cpp to handle partial GPU evaluation failures.
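
Both defensive checks can be sketched together. `NodePool`, `Node`, and `read_result` are simplified hypothetical stand-ins for `AppleSiliconNodePool` and the batch-reading code; the field names `psqt_scores` and `positional_scores` come from the comment above, while their types and the summing of the two scores are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <vector>

struct Node { int visits; float q; };  // hypothetical, trivially destructible

class NodePool {
public:
    explicit NodePool(std::size_t capacity) : capacity_(capacity) {
        void* raw = nullptr;
        // On failure base_ stays nullptr and allocate() reports it.
        if (posix_memalign(&raw, 64, capacity * sizeof(Node)) == 0) {
            base_ = static_cast<Node*>(raw);
        }
    }
    ~NodePool() { free(base_); }
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // Returns nullptr instead of doing arithmetic on a null base pointer
    // or handing out memory past the end of the pool.
    Node* allocate() {
        if (base_ == nullptr || used_ >= capacity_) return nullptr;
        return new (base_ + used_++) Node();
    }

private:
    Node* base_ = nullptr;
    std::size_t capacity_ = 0;
    std::size_t used_ = 0;
};

struct GpuBatchResult {
    std::vector<int> psqt_scores;
    std::vector<int> positional_scores;
};

// Reads result i only if both arrays contain it; a partial GPU evaluation
// can leave the arrays shorter than the number of submitted positions.
bool read_result(const GpuBatchResult& batch, std::size_t i, int& out) {
    if (i >= batch.psqt_scores.size() || i >= batch.positional_scores.size()) {
        return false;
    }
    out = batch.psqt_scores[i] + batch.positional_scores[i];
    return true;
}
```

Callers then need a fallback for both failure signals, for example evaluating the position on the CPU when `read_result` returns false.
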

- Introduced PGN parsing capabilities to extract individual game results and details, improving the tournament result reporting.
- Enhanced output formatting for game results, including color-coded results and move counts.
- Updated match summary display to include final scores and detailed game outputs.
- Added new shell scripts for MetalFish engine wrappers to facilitate hybrid and multi-threaded MCTS execution.

These changes significantly improve the usability and clarity of tournament results, while also streamlining engine interactions.

- Integrated the Berserk engine into the Elo tournament workflow, allowing it to be cloned and built alongside other engines.
- Updated the tournament logic to include Berserk as a configurable engine, with an expected Elo rating of 3550.
- Enhanced the output to check for the presence of the Berserk binary and report its status during the tournament execution.

These changes improve the tournament's engine diversity and provide additional options for performance evaluation.

- Introduced a new JSON configuration file for engine settings, allowing for dynamic loading of engine parameters such as expected Elo ratings and options.
- Updated the `elo_tournament.py` script to load engine configurations from the new `engines_config.json`, enhancing flexibility and maintainability.
- Added support for multiple engines including MetalFish variants, Patricia, Berserk, and Lc0, with their respective configurations.
- Improved the tournament setup process by integrating the new configuration loading mechanism, ensuring a more streamlined and customizable tournament experience.

These changes significantly enhance the configurability of the Elo tournament, allowing for easier adjustments and additions of new engines in the future.
Repository owner deleted a comment from Copilot AI Jan 24, 2026
@NripeshN
Owner Author

@copilot Fix this issue #15 (comment)

Contributor

Copilot AI commented Jan 24, 2026

@NripeshN I've opened a new pull request, #22, to work on those changes. Once the pull request is ready, I'll request review from you.

Repository owner deleted a comment from Copilot AI Jan 24, 2026
NripeshN and others added 12 commits January 24, 2026 14:51
…ction

- Implemented GPUTuningParams::select_strategy for CPU fallback in cpu_backend.cpp.
- Updated test_tuning in test_gpu_module.cpp to handle strategy selection based on GPU availability, ensuring correct fallback behavior when GPU is not available.
- Enhanced test coverage for different batch sizes to validate strategy outcomes under varying conditions.
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
[WIP] Enhance MCTS hybrid search features
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
- Added a conditional check in the test_eval_batch function to skip tests when the GPU is not available, ensuring that the test suite behaves correctly in environments without GPU support.
- Introduced a new test case to validate the no-GPU scenario, enhancing test coverage and robustness.
- Deleted the GPUTuningParams::select_strategy function and its associated comments, as it is no longer needed following recent refactoring.
- Cleaned up the code by removing the inclusion of the gpu_nnue_integration.h header file, streamlining the CPU backend implementation.
- Simplified the strategy selection logic by removing the GPU availability check for small batch sizes, which now consistently returns CPU_FALLBACK.
- Updated comments to clarify the behavior of larger batch sizes, which depend on GPU availability.
- Added a no-op test case for GPUPositionData when GPU is not available, improving test coverage for scenarios without GPU support.
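
The simplified selection rule described above (small batches always use the CPU, larger batches depend on GPU availability) can be sketched as follows. `CPU_FALLBACK` is named in the commit message; the `GPU_BATCH` enumerator, the threshold value, and the free-function signature are assumptions and may differ from `GPUTuningParams::select_strategy`.

```cpp
#include <cassert>

// CPU_FALLBACK comes from the commit message; GPU_BATCH is hypothetical.
enum class EvalStrategy { CPU_FALLBACK, GPU_BATCH };

// Hypothetical cut-off below which GPU dispatch overhead is assumed to
// outweigh the benefit of batching.
constexpr int kSmallBatchThreshold = 4;

inline EvalStrategy select_strategy(int batch_size, bool gpu_available) {
    // Small batches: always CPU, with no GPU availability check.
    if (batch_size < kSmallBatchThreshold) return EvalStrategy::CPU_FALLBACK;
    // Larger batches: use the GPU only when one is available.
    return gpu_available ? EvalStrategy::GPU_BATCH : EvalStrategy::CPU_FALLBACK;
}
```

A test for the no-GPU case then only needs to assert that every batch size maps to `CPU_FALLBACK` when `gpu_available` is false.
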
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

- Remove accidentally committed _codeql_build_dir/ directory containing CMake build artifacts
- Add _codeql_build_dir/ to .gitignore to prevent future accidental commits
- Add missing Accelerate framework linkage to metalfish_tests target for macOS builds

cursor bot commented Jan 24, 2026

Bugbot Autofix resolved both of the 2 bugs found in the latest run.

  • ✅ Fixed: CodeQL build directory accidentally committed to repository
    • Removed _codeql_build_dir/ directory from repository and added it to .gitignore to prevent future accidental commits.
  • ✅ Fixed: Tests missing Accelerate framework link on macOS
    • Added ${ACCELERATE_FRAMEWORK} to metalfish_tests target_link_libraries for macOS builds to resolve undefined symbols from apple_silicon_mcts.cpp.

NripeshN and others added 5 commits January 24, 2026 15:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…rganization

Remove Stockfish/Lc0 references and rebrand to MetalFish identity
Co-authored-by: NripeshN <86844847+NripeshN@users.noreply.github.com>
Phase 1: Remove unused tuning infrastructure
@NripeshN NripeshN merged commit 8b82703 into main Jan 24, 2026
11 of 242 checks passed
@NripeshN NripeshN deleted the hybrid-and-mcts-fix branch January 24, 2026 16:12
3 participants