feat(python): add field registration, _register_py_class, and Field descriptor #505

Merged
junrushao merged 1 commit into apache:main from junrushao:2026-03-19/py-class-field-registration
Mar 20, 2026

Conversation

@junrushao
Member

@junrushao junrushao commented Mar 20, 2026

Summary

  • Implement the Python-side type registration pipeline for @py_class-decorated classes: allocate dynamic type indices, compute native field layouts, register getters/setters with type-converting callbacks, and wire up __ffi_new__/__ffi_init__ for automatic construction.
  • Add Field descriptor and field() factory function mirroring stdlib dataclasses.field() API.
  • Fix dunder family-skip logic so user-defined __eq__ suppresses generated __ne__ (and vice versa), same for ordering operators.

Architecture

Three-layer implementation:

  • Cython object.pxi: _register_py_class (dynamic type index allocation, ancestor chain, TypeInfo insertion) and _rollback_py_class (cleanup on phase-2 failure).
  • Cython type_info.pxi: Field registration engine — computes per-field native layout from parent_type_info.total_size, obtains C getter/setter function pointers, and registers via TVMFFITypeRegisterField. Installs MakeFFINew/RegisterAutoInit.
  • Python field.py: Pure-Python Field descriptor with __slots__ matching stdlib dataclasses.field() signature, plus KW_ONLY sentinel (3.9-compat).

Behavioral Changes

  • __post_init__ is now called after __ffi_init__ if defined (previously silently ignored).
  • Dunder family-skip: defining any member of {__eq__, __ne__} or {__lt__, __le__, __gt__, __ge__} suppresses the entire family's auto-generation.

Test plan

  • Python tests pass: uv run pytest -vvs tests/python
  • Full @py_class end-to-end tests (in follow-up commits)
  • Pre-commit lints pass

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Python-side type system for TVM-FFI, providing a robust and flexible mechanism for defining custom Python classes that integrate deeply with the underlying C++ FFI. It introduces a comprehensive type registration pipeline, allowing for dynamic allocation of type indices, precise control over native field layouts, and automatic generation of FFI-compatible getters and setters. Furthermore, it improves the developer experience by offering a Field descriptor akin to Python's dataclasses, and refines the behavior of auto-generated dunder methods to better respect user-defined implementations, ensuring more predictable and customizable class behavior.

Highlights

  • Python-side Type Registration Pipeline: Implemented the full pipeline for @py_class-decorated classes, handling dynamic type indices, native field layouts, and FFI integration, including __ffi_new__ and __ffi_init__ for automatic construction.
  • Field Descriptor and Factory Function: Introduced a Field descriptor and field() factory function, mirroring stdlib dataclasses.field(), for customizing fields in TVM-FFI types.
  • Dunder Method Family-Skip Logic Fix: Corrected the dunder family-skip logic to ensure user-defined __eq__ suppresses generated __ne__ (and vice versa), and similarly for ordering operators, preventing conflicts between generated and custom methods.
  • __post_init__ Support: Enabled __post_init__ to be called after __ffi_init__ if defined in @py_class types, which was previously silently ignored.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant feature for Python-side type registration, including dynamic type index allocation, field layout computation, and automatic constructor generation. The implementation is split across Cython and Python, with a new Field descriptor mirroring dataclasses.field. The changes also improve dunder method generation to avoid conflicts with user-defined methods. My feedback includes suggestions to improve code clarity, consistency, and reduce duplication.

…escriptor

Implement the Python-side type registration pipeline for @py_class-decorated
classes. This is the infrastructure layer that bridges Python class definitions
to the C-level type table: it allocates dynamic type indices, computes native
field layouts, registers getters/setters with type-converting callbacks, and
wires up __ffi_new__/__ffi_init__ for automatic construction.

## Architecture

The registration flow is split across three layers, supported by two targeted changes in registry.py:

**Cython object.pxi** — Two new public functions:
- _register_py_class: allocates a dynamic type index via TVMFFITypeGetOrAllocIndex,
  builds the ancestor chain, creates a TypeInfo with fields=None (phase-1),
  and inserts it into all four global lookup tables.
- _rollback_py_class: removes the Python-level registry entries when phase-2
  validation fails. The C-level type index is permanently consumed (by design —
  the index allocator is monotonic), but Python dicts are cleaned up so the
  type key can be retried.
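
The phase-1/rollback contract above can be illustrated with a toy registry. This is only a sketch under stated assumptions: `ToyRegistry`, `register`, and `rollback` are hypothetical names, two dicts stand in for the four global lookup tables, and `_get_or_alloc_index` imitates a get-or-allocate index lookup (inferred from the name TVMFFITypeGetOrAllocIndex, not from its implementation). The starting index is arbitrary.

```python
import itertools


class ToyRegistry:
    """Hypothetical stand-in for the Python-level lookup tables."""

    def __init__(self):
        # Monotonic allocator: an index, once handed out, is never reclaimed.
        self._counter = itertools.count(128)  # arbitrary starting value
        self._allocated = {}                  # type_key -> index, never cleared
        self.by_key = {}
        self.by_index = {}

    def _get_or_alloc_index(self, type_key):
        # Assumption: the same key always maps to the same index.
        if type_key not in self._allocated:
            self._allocated[type_key] = next(self._counter)
        return self._allocated[type_key]

    def register(self, type_key):
        """Phase 1: reserve an index and insert an entry with fields=None."""
        if type_key in self.by_key:
            raise ValueError(f"type key already registered: {type_key}")
        index = self._get_or_alloc_index(type_key)
        info = {"type_key": type_key, "type_index": index, "fields": None}
        self.by_key[type_key] = info
        self.by_index[index] = info
        return info

    def rollback(self, info):
        """Remove the Python-level entries; the index stays consumed."""
        self.by_key.pop(info["type_key"], None)
        self.by_index.pop(info["type_index"], None)


registry = ToyRegistry()
first = registry.register("demo.Point")
registry.rollback(first)                   # pretend phase-2 validation failed
second = registry.register("demo.Point")   # the key can be registered again
```

In this model the retry succeeds because only the Python dicts are cleaned up; the allocator itself is never rewound.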

**Cython type_info.pxi** — Field registration engine:
- TypeInfo._register_fields: phase-2 entry point; delegates to the module-level
  _register_fields, then reads back C++-registered methods via _register_methods.
- _register_fields (module-level): iterates Field descriptors, computes per-field
  native layout (offset, size, alignment) starting from parent_type_info.total_size
  to avoid overlapping parent memory, obtains C getter function pointers and
  FunctionObj setters with type-converting callbacks, and registers each field
  via TVMFFITypeRegisterField. After all fields, calls ffi.MakeFFINew and
  ffi.RegisterAutoInit to install the object allocator and auto-constructor.
- _register_one_field: low-level cdef that populates TVMFFIFieldInfo (flags, layout,
  getter/setter, default value) and calls into the C API.
- _f_type_convert: noexcept+gil C callback for MakeFieldSetter's type conversion
  path. Unpacks AnyView to Python, dispatches through _TypeConverter, transfers
  ownership of the resulting CAny back to the caller.
- _ORIGIN_NATIVE_LAYOUT: maps TypeSchema origin strings to (size, alignment,
  field_static_type_index) triples. str/bytes/Optional/Union are stored as Any
  (16 bytes) because they can be inline SmallStr/SmallBytes, not just ObjectRef.
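
The layout step described above (start at the parent's total size, then align each field) can be sketched in a few lines. This is a minimal illustration, not the Cython code: `align_up` and `compute_layout` are hypothetical helpers, the 24-byte parent size and 8-byte alignments are assumed values, and only the 16-byte size for Any-stored fields comes from the description above.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def compute_layout(parent_total_size, fields):
    """Assign offsets to (name, size, alignment) triples after the parent's bytes.

    Returns a dict of per-field offsets and the end offset of the last field.
    Trailing padding of the whole object is deliberately left out of this sketch.
    """
    offset = parent_total_size  # never overlap memory owned by the parent
    offsets = {}
    for name, size, alignment in fields:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
    return offsets, offset


# Hypothetical child type: a 1-byte flag, an 8-byte integer, a 16-byte Any.
offsets, end = compute_layout(
    24,  # assumed parent total_size
    [("flag", 1, 1), ("count", 8, 8), ("label", 16, 8)],
)
```

Starting from the parent's size rather than zero is what keeps child fields from overlapping inherited storage; the 7 bytes of padding between `flag` and `count` come from the alignment step.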

**Python field.py** — Pure-Python Field descriptor:
- Field class with __slots__ matching the stdlib dataclasses.field() signature
  (default, default_factory, init, repr, hash, compare, kw_only, doc) plus
  name/ty filled in later by @py_class.
- field() factory function with return type Any (required by dataclass_transform
  field specifier protocol).
- KW_ONLY sentinel re-exported from stdlib (3.10+) or defined as a class
  sentinel for 3.9 compatibility.

**Python registry.py** — Two targeted changes:
- _make_init: checks for __post_init__ at init-generation time (not per-call)
  and emits one of two closures, so the hot path avoids a hasattr on every
  instantiation.
- _install_dataclass_dunders: treats __eq__/__ne__ and __lt__/__le__/__gt__/__ge__
  as semantic families — if the user defines any member of a family, the entire
  family is skipped to prevent generated and user-defined methods from disagreeing.
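
The closure-selection idea in `_make_init` can be sketched as follows. This is an illustration only: `make_init` and `toy_ffi_init` are hypothetical stand-ins, and the real generated initializer forwards to `__ffi_init__` rather than to a plain Python function.

```python
def make_init(cls, ffi_init):
    """Build an __init__ for cls, checking for __post_init__ only once."""
    has_post_init = hasattr(cls, "__post_init__")  # decided at generation time

    if has_post_init:
        def __init__(self, *args, **kwargs):
            ffi_init(self, *args, **kwargs)
            self.__post_init__()
    else:
        def __init__(self, *args, **kwargs):
            # Hot path: no hasattr lookup per instantiation.
            ffi_init(self, *args, **kwargs)

    return __init__


def toy_ffi_init(self, x):
    """Hypothetical replacement for the FFI-side constructor."""
    self.x = x


class Plain:
    pass


class Doubled:
    def __post_init__(self):
        self.x *= 2


Plain.__init__ = make_init(Plain, toy_ffi_init)
Doubled.__init__ = make_init(Doubled, toy_ffi_init)
```

The trade-off of deciding once is that a `__post_init__` attached to the class after the initializer is generated would not be picked up in this sketch.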

## Public Interfaces

- _register_py_class(parent_type_info, type_key, type_cls) -> TypeInfo
- _rollback_py_class(type_info) -> None
- TypeInfo._register_fields(fields) -> None
- Field class (tvm_ffi.dataclasses.field module)
- field() factory function
- KW_ONLY sentinel

These are internal APIs consumed by the @py_class decorator (not yet in this
commit). They are public in the sense that they are importable from the package,
but not part of the stable user-facing API.

## UI/UX

No user-facing changes. The Field descriptor and field() function mirror the
stdlib dataclasses.field() API intentionally, so @py_class (when added) will
feel familiar to users of dataclasses or attrs.

## Behavioral Changes

- __post_init__ is now called after __ffi_init__ if defined on the class.
  Previously __post_init__ was silently ignored. This is a semantic addition,
  not a breaking change.
- Dunder family-skip logic changes _install_dataclass_dunders: if a user
  defines __eq__, the generated __ne__ is also suppressed (and vice versa).
  Previously, defining __eq__ alone would still install a generated __ne__
  that could disagree with the user's __eq__. Same applies to the ordering
  family (__lt__/__le__/__gt__/__ge__).
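
The family-skip rule can be sketched like this. It is a simplified model, not the code in registry.py: `install_dunders` and `DUNDER_FAMILIES` are hypothetical names, and detecting user definitions through `cls.__dict__` is an assumption about how "user-defined" is decided.

```python
DUNDER_FAMILIES = (
    ("__eq__", "__ne__"),
    ("__lt__", "__le__", "__gt__", "__ge__"),
)


def install_dunders(cls, generated):
    """Install generated dunders, skipping every family the user has touched."""
    for family in DUNDER_FAMILIES:
        # Look at cls.__dict__ so methods inherited from object do not count.
        if any(name in cls.__dict__ for name in family):
            continue  # the user owns this family; install nothing from it
        for name in family:
            if name in generated:
                setattr(cls, name, generated[name])


class Version:
    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return self.n == other.n


generated = {
    "__ne__": lambda self, other: True,  # deliberately inconsistent with __eq__
    "__lt__": lambda self, other: self.n < other.n,
}
install_dunders(Version, generated)
```

Because `Version` defines `__eq__`, the inconsistent generated `__ne__` is never installed and Python's default `__ne__` keeps inverting the user's `__eq__`; the ordering family is untouched by the user, so the generated `__lt__` is installed.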

## Docs

No documentation changes. Docs will be added with the @py_class decorator
commit.

## Tests

No new tests in this commit. Tests are in a separate commit covering the
full @py_class workflow end-to-end.

## Untested Edge Cases

- Deeply nested inheritance chains (>3 levels) where parent_type_info.total_size
  compounding could cause surprising offsets.
- Concurrent _register_py_class calls from multiple threads (the Python GIL
  serializes them, but the C-level TVMFFITypeGetOrAllocIndex uses its own lock).
- _rollback_py_class after partial field registration (currently only called
  before _register_fields, but nothing enforces this ordering).
- Field with default_factory that raises during __ffi_init__ (the exception
  propagates, but the partially-constructed object may have some fields set).
@junrushao junrushao force-pushed the 2026-03-19/py-class-field-registration branch from 6cc9780 to 272ffa4 March 20, 2026 05:11
@junrushao junrushao requested a review from tqchen March 20, 2026 05:13
@junrushao junrushao merged commit 6a61156 into apache:main Mar 20, 2026
8 checks passed
2 participants