test(python): add comprehensive RecursiveHash test suite#485
test(python): add comprehensive RecursiveHash test suite#485junrushao merged 2 commits intoapache:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Python FFI by integrating recursive hashing capabilities. This feature allows for consistent and reliable hashing of complex, nested data structures, including those with cyclic references and custom hashing logic. The changes are thoroughly validated through a new, comprehensive test suite that ensures correctness across various data types and scenarios, thereby improving the robustness and predictability of object comparisons and hashing within the system. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Architecture: - Adds two new reflected test fixture classes (TestHash, TestCustomHash) to the testing module, registered via the C++ reflection system. TestHash exercises Hash(false) field exclusion; TestCustomHash exercises the __ffi_hash__ custom hook (hashes only `key`, ignores `label`). - Exposes RecursiveHash in the Python FFI API stub (_ffi_api.py) so tests can call it directly without private imports. Public Interfaces: - `tvm_ffi._ffi_api.RecursiveHash` added to TYPE_CHECKING stub and __all__. - `tvm_ffi.testing.TestHash` and `tvm_ffi.testing.TestCustomHash` exported as public test fixtures. UI/UX: - none Behavioral Changes: - No runtime behavioral changes; this is a test-only addition. Docs: - Test docstrings serve as specification documentation for hash semantics. Tests: - Executed: N/A (test-only commit; no build validation in this cherry-pick) - Result: 1033-line test file covering: - Primitives (int, float, bool, str, bytes, None, DataType, Device) - NaN canonicalization (all NaN payloads hash equal) - Signed-zero normalization (+0.0 == -0.0 for hashing) - Containers (Array, List, Shape, Map, Dict) including nesting - Reflected objects (TestIntPair, inherited fields, container fields) - Hash(false) / Compare(false) field exclusion - Custom __ffi_hash__ hook via TestCustomHash - Cycle detection (self-referential List/Dict) - Consistency law: RecursiveEq(a,b) => RecursiveHash(a)==RecursiveHash(b) - Aliasing invariants (shared vs duplicated references) - Recursion depth (127 and 1000 levels) - Shared DAG scaling (linear, not exponential) - Guard: __ffi_eq__ without __ffi_hash__ raises ValueError - Parametrized cyclic-structure mismatch tests Untested Edge Cases: - Cross-process hash stability (hashes may differ across builds/platforms) - Thread-safety of RecursiveHash under concurrent mutation
05d52b8 to
0368686
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces RecursiveHash to the Python FFI and adds a comprehensive test suite for it. The changes to expose the new function and test classes are correct. The new test file test_dataclass_hash.py is very thorough, covering a wide range of types, edge cases like cycles and aliasing, and consistency with RecursiveEq. I have a couple of suggestions to improve the maintainability and reliability of the new test suite.
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
| """Tests for ffi.RecursiveHash.""" |
There was a problem hiding this comment.
This is a very comprehensive test suite, which is great. However, at over 1000 lines, this file is becoming quite large and difficult to navigate. For better long-term maintainability, consider splitting it into smaller, more focused files based on the type of data being tested. For example, you could have test_hash_primitives.py, test_hash_containers.py, test_hash_objects.py, and test_hash_edge_cases.py. The existing sections in the file provide a good structure for such a split.
tests/python/test_dataclass_hash.py
Outdated
| t0 = time.perf_counter() | ||
| RecursiveHash(d18) | ||
| t18 = time.perf_counter() - t0 | ||
|
|
||
| t0 = time.perf_counter() | ||
| RecursiveHash(d19) | ||
| t19 = time.perf_counter() - t0 |
There was a problem hiding this comment.
This performance test can be flaky because the execution time of a single RecursiveHash call can be very short and subject to system noise. To get a more stable and reliable measurement, it's better to run the function in a loop and average the time.
A warm-up call before starting the measurements can also help reduce noise from one-time setup costs (e.g., JIT compilation if applicable, cache warming).
| t0 = time.perf_counter() | |
| RecursiveHash(d18) | |
| t18 = time.perf_counter() - t0 | |
| t0 = time.perf_counter() | |
| RecursiveHash(d19) | |
| t19 = time.perf_counter() - t0 | |
| # Warm-up run to mitigate one-time setup costs | |
| RecursiveHash(_make_shared_binary_dag(10)) | |
| repeats = 10 | |
| t0 = time.perf_counter() | |
| for _ in range(repeats): | |
| RecursiveHash(d18) | |
| t18 = (time.perf_counter() - t0) / repeats | |
| t0 = time.perf_counter() | |
| for _ in range(repeats): | |
| RecursiveHash(d19) | |
| t19 = (time.perf_counter() - t0) / repeats |
- Remove 24 tests that incorrectly assumed RecursiveEq raises ValueError on distinct cyclic structures (it handles them gracefully instead). These also tested RecursiveEq behavior, not RecursiveHash. - Add warm-up + averaging (10 repeats) to the DAG scaling perf test to reduce flakiness from system noise.
Summary
RecursiveHashto the Python FFI API (_ffi_api.pystub +__all__)TestHashandTestCustomHashreflected test fixture classes totvm_ffi.testingtest_dataclass_hash.pycovering the fullRecursiveHashcontractArchitecture
TestHash(testing.TestHash): exercisesHash(false)field exclusion onhash_ignoredTestCustomHash(testing.TestCustomHash): exercises__ffi_hash__custom hook (hashes onlykey, ignoreslabel)Test Coverage
+0.0and-0.0hash identicallyHash(false)via TestHash;Compare(false)implies hash-off__ffi_hash__via TestCustomHash and TestCustomCompareRecursiveEq(a, b) ⟹ RecursiveHash(a) == RecursiveHash(b)— primitives, containers, reflected objects, custom hooks__ffi_eq__without__ffi_hash__raises ValueErrorTest Plan
uv run pytest -vvs tests/python/test_dataclass_hash.py