@chaokunyang commented on Jan 20, 2026

Why?

The current Fory compiler mixes FDL-native and protobuf-compatible syntax handling in a single parser, making it difficult to add support for new IDL formats like .proto and .fbs files. The validation logic is scattered across parsing and Schema.validate(), and there's no clear separation between parsing, semantic analysis, and code generation.

What does this PR do?

This PR refactors the Fory compiler into a hierarchical, multi-frontend architecture that establishes the Fory IDL AST as the canonical intermediate representation (IR), with separate frontend parsers for different IDL formats.

Key changes:

  1. New directory structure with clear separation of concerns:

    • ir/ - Intermediate Representation (canonical Fory AST)
      • ast.py - Core AST node definitions with SourceLocation tracking
      • types.py - Extended type system (primitives including varint, tagged types, etc.)
      • validator.py - Centralized semantic validation
      • emitter.py - FDL text emitter for debugging translated schemas
    • frontend/ - IDL Frontends
      • base.py - Base frontend interface
      • fdl/ - FDL Frontend (lexer + parser)
      • proto/ - Protobuf Frontend (lexer + parser + translator to Fory IR)
      • fbs/ - FlatBuffers Frontend (placeholder)
  2. Proto3 frontend - Full support for parsing .proto files and translating to Fory IR:

    • Proto3 syntax parsing (messages, enums, nested types, maps, repeated fields)
    • Type mapping (int32→var_uint32, sint32→varint32, fixed32→uint32, etc.)
    • Fory extension options ((fory).id, (fory).ref, (fory).nullable, etc.)
    • Well-known types support (google.protobuf.Timestamp, Duration)
  3. Simplified FDL syntax - Removed protobuf-style (fory) prefix from options:

    • File options: option use_record_for_java_message = true;
    • Type options: message Foo [id=100] { ... }
    • Field options: MyType data = 1 [ref=true, nullable=true];
  4. Extended type system with new primitive kinds:

    • Signed/unsigned variants: int8-int64, uint8-uint64
    • Variable-length encoding: varint32, varint64, var_uint32, var_uint64
    • Tagged types: tagged_int64, tagged_uint64
    • Additional types: float16, duration, decimal
  5. Improved code generators for all target languages with better type mapping

  6. CLI enhancements:

    • Auto-detect input format by file extension (.fdl, .proto)
    • New --emit-fdl flag to output translated FDL for debugging
  7. Cross-language integration tests for proto-based schemas
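The frontend → IR → validator layering from items 1 and 6 can be sketched in Python. This is a minimal illustration under stated assumptions, not the compiler's actual code: the names Frontend, FdlFrontend, ProtoFrontend, frontend_for and validate are hypothetical, the parsers are stubs, and the duplicate-field-number check stands in for whatever rules ir/validator.py really enforces.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


# Hypothetical, simplified IR nodes; the real ir/ast.py nodes are richer
# (for example, they also carry a SourceLocation).
@dataclass
class Field:
    name: str
    type_name: str
    number: int


@dataclass
class Message:
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Schema:
    messages: List[Message] = field(default_factory=list)


class Frontend:
    """Hypothetical base interface: each frontend turns source text into the shared IR."""

    def parse(self, source: str) -> Schema:
        raise NotImplementedError


class FdlFrontend(Frontend):
    def parse(self, source: str) -> Schema:
        return Schema()  # stub: a real frontend would run a lexer and parser here


class ProtoFrontend(Frontend):
    def parse(self, source: str) -> Schema:
        return Schema()  # stub: parse .proto, then translate to the IR


# Pick the frontend from the file extension (case-insensitive in this sketch).
FRONTENDS: Dict[str, type] = {".fdl": FdlFrontend, ".proto": ProtoFrontend}


def frontend_for(path: str) -> Frontend:
    suffix = Path(path).suffix.lower()
    if suffix not in FRONTENDS:
        raise ValueError(f"unsupported input format: {suffix or path}")
    return FRONTENDS[suffix]()


def validate(schema: Schema) -> List[str]:
    """Semantic checks run once on the IR, regardless of which frontend produced it."""
    errors = []
    for msg in schema.messages:
        seen: Dict[int, str] = {}
        for f in msg.fields:
            if f.number in seen:
                errors.append(
                    f"{msg.name}: field number {f.number} used by "
                    f"'{seen[f.number]}' and '{f.name}'"
                )
            else:
                seen[f.number] = f.name
    return errors
```

The point of the split is that validation and code generation only ever see the IR, so supporting another format (such as the placeholder .fbs frontend) would mainly mean adding one more parser and registering its extension.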

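The proto translation step (item 2) and the prefix-free option syntax (item 3) can be illustrated together. This hypothetical sketch maps two of the scalar types listed above (sint32 and fixed32), keeps only (fory)-namespaced options with the prefix removed, and renders the result in the FDL field syntax from item 3. The function names are illustrative, and dropping non-Fory options such as deprecated is an assumption of the sketch, not documented translator behavior.

```python
from typing import Dict, List, Tuple

# Two of the scalar mappings listed in this PR description; the full table
# lives in the proto frontend and is not reproduced here.
PROTO_TO_FORY: Dict[str, str] = {
    "sint32": "varint32",
    "fixed32": "uint32",
}

FORY_OPTION_PREFIX = "(fory)."


def translate_type(proto_type: str) -> str:
    # Names not in the table (e.g. message types) pass through unchanged here.
    return PROTO_TO_FORY.get(proto_type, proto_type)


def translate_options(options: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only Fory extension options and drop their "(fory)." prefix (assumption)."""
    return [
        (name[len(FORY_OPTION_PREFIX):], value)
        for name, value in options
        if name.startswith(FORY_OPTION_PREFIX)
    ]


def emit_field(proto_type: str, name: str, number: int,
               options: List[Tuple[str, str]]) -> str:
    """Render one translated field as an FDL field line (hypothetical helper)."""
    line = f"{translate_type(proto_type)} {name} = {number}"
    fory_opts = translate_options(options)
    if fory_opts:
        line += " [" + ", ".join(f"{k}={v}" for k, v in fory_opts) + "]"
    return line + ";"


# A proto field "MyType data = 1 [(fory).ref = true, (fory).nullable = true];"
print(emit_field("MyType", "data", 1,
                 [("(fory).ref", "true"), ("(fory).nullable", "true")]))
# prints: MyType data = 1 [ref=true, nullable=true];
```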
Related issues

Closes #3178

Does this PR introduce any user-facing change?

  • CLI now accepts .proto files directly (in addition to .fdl)

  • FDL syntax simplified: option (fory).xxx → option xxx

  • New primitive types available in FDL

  • Does this PR introduce any public API change?

  • Does this PR introduce any binary protocol compatibility change?
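For existing .fdl files, most of the option syntax change is mechanical. The following hypothetical migration helper is not part of this PR. It strips the (fory). prefix with a regular expression. It does not move options into the new bracket form (for example message Foo [id=100]), and it would also rewrite matches inside comments or string literals, so its output should be reviewed.

```python
import re

# Matches a "(fory)." prefix directly in front of an option name, e.g. "(fory).ref".
_FORY_PREFIX = re.compile(r"\(fory\)\.(?=[A-Za-z_])")


def strip_fory_prefix(fdl_text: str) -> str:
    """Hypothetical helper: rewrite `(fory).name` option references to bare `name`.

    Only the prefix is removed; options are not restructured.
    """
    return _FORY_PREFIX.sub("", fdl_text)


print(strip_fory_prefix("option (fory).use_record_for_java_message = true;"))
# prints: option use_record_for_java_message = true;
```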

Benchmark

N/A - This is a compiler refactoring that doesn't affect runtime performance.

@chaokunyang force-pushed the refactor_proto_compiler branch from 716c2f8 to 5a6cea9 on January 20, 2026 16:20
@chaokunyang force-pushed the refactor_proto_compiler branch from a71562d to dfe46b0 on January 21, 2026 03:55
@chaokunyang merged commit d82d97f into apache:main on Jan 21, 2026
59 checks passed

Linked issue: [Xlang][Compiler] Fory Compiler Refactoring: Hierarchical IDL Architecture