
test(python): add comprehensive RecursiveHash test suite #485

Merged
junrushao merged 2 commits into apache:main from junrushao:2026-02-27/recursive-hash on Feb 28, 2026


junrushao (Member) commented Feb 28, 2026

Summary

  • Expose RecursiveHash to the Python FFI API (_ffi_api.py stub + __all__)
  • Add TestHash and TestCustomHash reflected test fixture classes to tvm_ffi.testing
  • Add comprehensive test_dataclass_hash.py covering the full RecursiveHash contract

Architecture

  • Two new reflected test fixture classes registered via C++ reflection:
    • TestHash (testing.TestHash): exercises Hash(false) field exclusion on hash_ignored
    • TestCustomHash (testing.TestCustomHash): exercises __ffi_hash__ custom hook (hashes only key, ignores label)
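For readers more familiar with Python than with the C++ reflection flags, the standard `dataclasses` module offers rough analogues of both fixtures. The sketch below is plain Python and does not touch `testing.TestHash` or `testing.TestCustomHash`; the class names and the `value` field are hypothetical, and the custom-hook analogue assumes that equality, like the hash, looks only at `key`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HashFixtureAnalog:
    """Hypothetical plain-Python stand-in for testing.TestHash."""

    value: int
    # Like Hash(false): still compared by ==, but left out of __hash__.
    hash_ignored: int = field(hash=False)


@dataclass(frozen=True)
class CompareOffAnalog:
    """compare=False drops a field from == and, by default, from __hash__ too."""

    key: int
    label: str = field(compare=False)


class CustomHashAnalog:
    """Hypothetical stand-in for testing.TestCustomHash: hand-written hooks on `key`."""

    def __init__(self, key: int, label: str) -> None:
        self.key = key
        self.label = label

    def __eq__(self, other: object) -> bool:
        # Assumption: equality also ignores `label`, so Eq => Hash holds.
        return isinstance(other, CustomHashAnalog) and self.key == other.key

    def __hash__(self) -> int:
        # Custom hook: hash only `key`, ignore `label`.
        return hash(self.key)
```

In each class the hash reads a subset of the fields that equality reads, which is what keeps Eq ⟹ Hash intact.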

Test Coverage

| Category | What's tested |
|---|---|
| Primitives | int, float, bool, str, bytes, None, DataType, Device |
| NaN handling | All NaN payloads hash equal; canonicalization in nested containers |
| Signed zero | +0.0 and -0.0 hash identically |
| Containers | Array, List, Shape, Map, Dict: equal/different/empty/nested |
| Reflected objects | TestIntPair, inherited fields (3-level), objects with container fields |
| Field exclusion | Hash(false) via TestHash; Compare(false) implies hash-off |
| Custom hooks | `__ffi_hash__` via TestCustomHash and TestCustomCompare |
| Cycle detection | Self-referential List/Dict hashing succeeds gracefully |
| Consistency law | RecursiveEq(a, b) ⟹ RecursiveHash(a) == RecursiveHash(b), for primitives, containers, reflected objects, and custom hooks |
| Aliasing invariants | Shared vs. duplicated references produce identical hashes |
| Recursion depth | 127 and 1000 levels of nesting (iterative heap-based stack) |
| DAG scaling | Shared binary DAG hashing is linear, not exponential (warm-up + averaged) |
| Guard | `__ffi_eq__` without `__ffi_hash__` raises ValueError |
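A toy, pure-Python model can make the primitive and container rows above concrete. It is a sketch under stated assumptions, not tvm_ffi's `RecursiveHash`: `toy_hash`, `toy_eq`, `nan_with_payload`, and `eq_implies_hash` are hypothetical names, Python lists/tuples/dicts stand in for Array/List/Map, and dict keys are assumed to be strings. NaN gets a fixed tag because CPython 3.10+ hashes a NaN by object identity, so `hash(float("nan"))` alone would not agree across different NaN objects.

```python
import math
import struct
from typing import Any, Iterable, Tuple


def nan_with_payload(payload: int) -> float:
    """Build a quiet NaN whose mantissa carries `payload` (illustrative helper)."""
    bits = 0x7FF8000000000000 | (payload & 0x0007FFFFFFFFFFFF)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def toy_hash(value: Any) -> int:
    """Toy structural hash over Python values; NOT tvm_ffi's RecursiveHash."""
    if isinstance(value, float):
        if math.isnan(value):
            return hash(("float", "nan"))  # every NaN payload maps to one tag
        if value == 0.0:
            value = 0.0  # fold -0.0 into +0.0
        return hash(("float", value))
    if value is None or isinstance(value, (bool, int, str, bytes)):
        return hash((type(value).__name__, value))
    if isinstance(value, (list, tuple)):
        return hash(("seq", tuple(toy_hash(v) for v in value)))
    if isinstance(value, dict):
        # Sort per-entry hashes so insertion order does not matter.
        entries = sorted(hash((toy_hash(k), toy_hash(v))) for k, v in value.items())
        return hash(("map", tuple(entries)))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def toy_eq(a: Any, b: Any) -> bool:
    """Structural equality that treats all NaNs as equal, matching toy_hash."""
    if isinstance(a, float) and isinstance(b, float):
        return a == b or (math.isnan(a) and math.isnan(b))
    if type(a) is not type(b):
        return False
    if isinstance(a, (list, tuple)):
        return len(a) == len(b) and all(toy_eq(x, y) for x, y in zip(a, b))
    if isinstance(a, dict):
        # Assumes str keys; keys are compared with ordinary ==.
        return a.keys() == b.keys() and all(toy_eq(a[k], b[k]) for k in a)
    return a == b


def eq_implies_hash(pairs: Iterable[Tuple[Any, Any]]) -> bool:
    """Check toy_eq(a, b) => toy_hash(a) == toy_hash(b) on sample pairs."""
    return all(toy_hash(a) == toy_hash(b) for a, b in pairs if toy_eq(a, b))
```

The law is one-directional: `()` and `[]` share a toy hash without being equal, which is an allowed collision rather than a violation.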

Test Plan

  • uv run pytest -vvs tests/python/test_dataclass_hash.py

gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request exposes RecursiveHash to the Python FFI and adds a comprehensive test suite for it. RecursiveHash hashes complex, nested data structures, including those with cyclic references and types with custom hashing hooks, such that values equal under RecursiveEq hash equally. The new tests validate this contract across data types and edge cases, making object comparison and hashing more predictable.

Highlights

  • RecursiveHash Exposure: The RecursiveHash function has been exposed to the Python FFI, allowing Python code to utilize this deep hashing functionality.
  • New Test Classes: Two new test classes, TestHash and TestCustomHash, were introduced to facilitate comprehensive testing of hashing behaviors, including ignored fields and custom hashing hooks.
  • Comprehensive Test Suite: A new, extensive test file (test_dataclass_hash.py) was added, covering primitives, containers, reflected objects, HashOff flags, custom hooks, cycle detection, consistency laws (RecursiveEq => RecursiveHash), and aliasing invariants.


Changelog
  • python/tvm_ffi/_ffi_api.py
    • Added RecursiveHash function to the FFI API definition.
    • Included RecursiveHash in the list of exported FFI functions.
  • python/tvm_ffi/testing/__init__.py
    • Imported the newly defined TestCustomHash and TestHash classes into the testing module's __init__.py.
  • python/tvm_ffi/testing/testing.py
    • Defined the TestHash class, an Object with a field (hash_ignored) marked to be excluded from hashing.
    • Defined the TestCustomHash class, an Object with a custom __ffi_hash__ hook that hashes only its key field.
  • tests/python/test_dataclass_hash.py
    • Added a new test file dedicated to ffi.RecursiveHash.
    • Implemented tests for primitive types (int, float, bool, string, bytes, None), including edge cases like NaN and signed zero.
    • Included tests for various container types (Array, List, Map, Dict) and their nested forms.
    • Added tests for reflected objects, inherited fields, and objects with container fields.
    • Verified behavior with HashOff and CompareOff flags.
    • Ensured cycle safety and handling of shared references/aliasing invariants.
    • Validated the consistency law: RecursiveEq(a, b) implies RecursiveHash(a) == RecursiveHash(b).
    • Included adversarial quality checks and tests for custom __ffi_hash__ hooks.
    • Added regression tests for the Eq=>Hash invariant and guards for types defining __ffi_eq__ but not __ffi_hash__.
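The guard for types that define `__ffi_eq__` without `__ffi_hash__` has a close parallel in plain Python: a class that overrides `__eq__` but not `__hash__` gets `__hash__ = None` and becomes unhashable, which prevents it from silently breaking the Eq ⟹ Hash law. The sketch below shows only that standard-library behavior and does not call tvm_ffi; the class and helper names are hypothetical. Note that Python raises `TypeError` here, whereas the guard in this PR raises `ValueError`.

```python
class EqOnly:
    """Overrides __eq__ only, so Python sets __hash__ to None."""

    def __init__(self, key: int) -> None:
        self.key = key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, EqOnly) and self.key == other.key


class EqAndHash(EqOnly):
    """Inherits __eq__ and adds a hash over the same field."""

    def __hash__(self) -> int:
        return hash(self.key)


def try_hash(obj: object) -> str:
    """Return "ok" if obj is hashable, else the TypeError text."""
    try:
        hash(obj)
        return "ok"
    except TypeError as err:
        return f"TypeError: {err}"
```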
Activity
  • The author has implemented and verified the changes with Python tests in test_dataclass_hash.py.
  • The pull request was generated using Claude Code.

Architecture:
- Adds two new reflected test fixture classes (TestHash, TestCustomHash) to
  the testing module, registered via the C++ reflection system.
  TestHash exercises Hash(false) field exclusion; TestCustomHash exercises
  the __ffi_hash__ custom hook (hashes only `key`, ignores `label`).
- Exposes RecursiveHash in the Python FFI API stub (_ffi_api.py) so tests
  can call it directly without private imports.

Public Interfaces:
- `tvm_ffi._ffi_api.RecursiveHash` added to TYPE_CHECKING stub and __all__.
- `tvm_ffi.testing.TestHash` and `tvm_ffi.testing.TestCustomHash` exported
  as public test fixtures.

UI/UX:
- none

Behavioral Changes:
- No runtime behavioral changes; this is a test-only addition.

Docs:
- Test docstrings serve as specification documentation for hash semantics.

Tests:
- Executed: N/A (test-only commit; no build validation in this cherry-pick)
- Result: 1033-line test file covering:
  - Primitives (int, float, bool, str, bytes, None, DataType, Device)
  - NaN canonicalization (all NaN payloads hash equal)
  - Signed-zero normalization (+0.0 == -0.0 for hashing)
  - Containers (Array, List, Shape, Map, Dict) including nesting
  - Reflected objects (TestIntPair, inherited fields, container fields)
  - Hash(false) / Compare(false) field exclusion
  - Custom __ffi_hash__ hook via TestCustomHash
  - Cycle detection (self-referential List/Dict)
  - Consistency law: RecursiveEq(a,b) => RecursiveHash(a)==RecursiveHash(b)
  - Aliasing invariants (shared vs duplicated references)
  - Recursion depth (127 and 1000 levels)
  - Shared DAG scaling (linear, not exponential)
  - Guard: __ffi_eq__ without __ffi_hash__ raises ValueError
  - Parametrized cyclic-structure mismatch tests
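One common way to make hashing cycle-safe is to track the containers on the current traversal path and replace a revisit with a relative back-reference. The sketch below applies that idea to Python lists and dicts; it is an illustration under that assumption, not a description of how tvm_ffi detects cycles, and `cycle_safe_hash` is a hypothetical name. Because only the active path is tracked, a shared but acyclic sub-object is simply hashed again, so shared and duplicated references also hash the same.

```python
from typing import Any, Dict, Optional


def cycle_safe_hash(value: Any, _path: Optional[Dict[int, int]] = None) -> int:
    """Toy hash for possibly self-referential lists/dicts (not tvm_ffi's algorithm).

    `_path` maps id() of each container on the current DFS path to its depth.
    Meeting a container that is still on the path means a cycle, so we emit a
    relative back-reference instead of recursing forever.
    """
    if not isinstance(value, (list, dict)):
        return hash((type(value).__name__, value))
    path: Dict[int, int] = {} if _path is None else _path
    key = id(value)
    if key in path:
        # Distance back up the path describes the cycle's shape, not the object.
        return hash(("backref", len(path) - path[key]))
    path[key] = len(path)
    try:
        if isinstance(value, list):
            return hash(("list", tuple(cycle_safe_hash(v, path) for v in value)))
        items = sorted(
            hash((cycle_safe_hash(k, path), cycle_safe_hash(v, path)))
            for k, v in value.items()
        )
        return hash(("dict", tuple(items)))
    finally:
        del path[key]  # leaving this container: it is no longer on the path
```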

Untested Edge Cases:
- Cross-process hash stability (hashes may differ across builds/platforms)
- Thread-safety of RecursiveHash under concurrent mutation
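The recursion-depth tests (127 and 1000 levels) relate to the note that hashing uses an iterative, heap-based stack. The sketch below shows the general idea in Python: a post-order traversal driven by an explicit list rather than the call stack, so nesting depth is bounded by memory instead of `sys.getrecursionlimit()`. It handles only nested lists and scalars, has no cycle handling, and `iterative_hash` / `make_nested` are hypothetical names; it illustrates the technique, not the C++ implementation.

```python
from typing import Any, List


def iterative_hash(root: Any) -> int:
    """Post-order hash of nested lists driven by an explicit, heap-allocated stack."""
    if not isinstance(root, list):
        return hash(("leaf", root))
    # Each frame is [list node, index of the next child, hashes of finished children].
    stack: List[list] = [[root, 0, []]]
    while True:
        node, idx, done = stack[-1]
        if idx < len(node):
            stack[-1][1] = idx + 1
            child = node[idx]
            if isinstance(child, list):
                stack.append([child, 0, []])  # descend without a Python-level call
            else:
                done.append(hash(("leaf", child)))
            continue
        # All children are finished: combine them, pop the frame, feed the parent.
        h = hash(("list", tuple(done)))
        stack.pop()
        if not stack:
            return h
        stack[-1][2].append(h)


def make_nested(depth: int) -> list:
    """Hypothetical helper: wrap [0] in `depth` levels of single-element lists."""
    node: list = [0]
    for _ in range(depth):
        node = [node]
    return node
```

A naive recursive version computes the same value on shallow inputs but would exceed CPython's default recursion limit on `make_nested(1000)`.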
junrushao force-pushed the 2026-02-27/recursive-hash branch from 05d52b8 to 0368686 on February 28, 2026 00:37
junrushao changed the title from "feat: support recursive hash (Python integration + tests)" to "test(python): add comprehensive RecursiveHash test suite" on Feb 28, 2026

gemini-code-assist bot left a comment

Code Review

This pull request introduces RecursiveHash to the Python FFI and adds a comprehensive test suite for it. The changes to expose the new function and test classes are correct. The new test file test_dataclass_hash.py is very thorough, covering a wide range of types, edge cases like cycles and aliasing, and consistency with RecursiveEq. I have a couple of suggestions to improve the maintainability and reliability of the new test suite.

Comment on lines +1 to +17
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Tests for ffi.RecursiveHash."""

medium

This is a very comprehensive test suite, which is great. However, at over 1000 lines, this file is becoming quite large and difficult to navigate. For better long-term maintainability, consider splitting it into smaller, more focused files based on the type of data being tested. For example, you could have test_hash_primitives.py, test_hash_containers.py, test_hash_objects.py, and test_hash_edge_cases.py. The existing sections in the file provide a good structure for such a split.

Comment on lines +852 to +858
t0 = time.perf_counter()
RecursiveHash(d18)
t18 = time.perf_counter() - t0

t0 = time.perf_counter()
RecursiveHash(d19)
t19 = time.perf_counter() - t0

medium

This performance test can be flaky because the execution time of a single RecursiveHash call can be very short and subject to system noise. To get a more stable and reliable measurement, it's better to run the function in a loop and average the time.

A warm-up call before starting the measurements can also help reduce noise from one-time setup costs (e.g., JIT compilation if applicable, cache warming).

Suggested change (replaces the timing lines above):

# Warm-up run to mitigate one-time setup costs
RecursiveHash(_make_shared_binary_dag(10))
repeats = 10
t0 = time.perf_counter()
for _ in range(repeats):
    RecursiveHash(d18)
t18 = (time.perf_counter() - t0) / repeats
t0 = time.perf_counter()
for _ in range(repeats):
    RecursiveHash(d19)
t19 = (time.perf_counter() - t0) / repeats
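A deterministic complement to averaged wall-clock timing is to count node visits: if hashing reuses results for shared nodes, visits grow linearly with depth even though the fully unfolded tree has 2^depth leaves. In the sketch below, `make_shared_binary_dag` is a hypothetical stand-in for the test file's `_make_shared_binary_dag` helper (whose real definition is not shown here), nodes are plain tuples, and memoizing by `id()` is an assumed strategy rather than a description of RecursiveHash internals.

```python
from typing import Any, Dict, Tuple


def make_shared_binary_dag(depth: int) -> Any:
    """Hypothetical helper: each level points twice at the same child object."""
    node: Any = ("leaf",)
    for _ in range(depth):
        node = ("node", node, node)  # both edges alias one object
    return node


def memo_hash(root: Any) -> Tuple[int, int]:
    """Hash a tuple DAG, memoizing by object identity; returns (hash, visits)."""
    memo: Dict[int, int] = {}
    visits = 0

    def go(node: Any) -> int:
        nonlocal visits
        key = id(node)
        if key in memo:
            return memo[key]  # shared node: reuse its hash, skip its subtree
        visits += 1
        if node[0] == "leaf":
            h = hash(("leaf",))
        else:
            h = hash(("node", go(node[1]), go(node[2])))
        memo[key] = h
        return h

    return go(root), visits
```

Asserting on visit counts (depth + 1 for the shared DAG) avoids timer noise entirely, while the structural hash still matches an unshared tree of the same shape.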

junrushao mentioned this pull request on Feb 28, 2026
- Remove 24 tests that incorrectly assumed RecursiveEq raises ValueError
  on distinct cyclic structures (it handles them gracefully instead).
  These also tested RecursiveEq behavior, not RecursiveHash.
- Add warm-up + averaging (10 repeats) to the DAG scaling perf test to
  reduce flakiness from system noise.
junrushao merged commit 5796ff4 into apache:main on Feb 28, 2026 (8 checks passed)