@chaokunyang chaokunyang commented Jan 1, 2026

Why?

Fory has lacked a schema-first Interface Definition Language (IDL) for cross-language serialization.

Currently, teams must hand-write type registration and cross-language models. This is fine, and even the preferred option, when the whole system is built in one language, since domain objects can be serialized directly.

With multiple languages, however, the manual approach is error-prone: type systems differ across languages, which makes it difficult to guarantee consistent schemas, type IDs, and reference-tracking behavior.

Users who want to migrate from Protocol Buffers to Fory for better performance, reference tracking, or polymorphism support have had to manually define structs in every language.

This PR addresses this gap by introducing FDL (Fory Definition Language) - a native schema IDL specifically designed for Fory's cross-language serialization capabilities.

Key motivations:

  • Enable schema-first development workflow for Fory
  • Provide deterministic, cross-language type IDs and registration rules
  • Generate native code with minimal/no runtime overhead
  • Ensure consistent reference-tracking and polymorphism behavior across languages
  • Simplify migration from Protocol Buffers by supporting similar syntax patterns

What does this PR do?

1. FDL (Fory Definition Language) Specification

  • Defines file structure: package, import, enum, message
  • Supports field modifiers: optional, ref, repeated
  • Supports collection types: repeated (list), map<K,V>
  • Supports nested types (message within message, enum within message)
  • Supports reserved names/IDs for backward compatibility
  • Supports file/type/field options (protobuf-style and bracket-style)
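To make the syntax elements above concrete, here is a minimal tokenizer sketch in Python. It is not the lexer from this PR: the PR describes a hand-written lexer, while this sketch uses a regular expression for brevity. The keyword set, the token kinds, and the `tokenize` function name are assumptions chosen for illustration; only constructs that appear in this description (keywords, identifiers, integers, punctuation) are covered.

```python
import re
from typing import Iterator, Tuple

# Keyword set assumed from the constructs named in this description.
KEYWORDS = {
    "package", "import", "enum", "message",
    "optional", "ref", "repeated", "map", "reserved",
}

# One named group per token kind; whitespace is matched so it can be skipped.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<ident>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<int>\d+)"
    r"|(?P<punct>[{}\[\]<>=;,.])"
)


def tokenize(src: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs for a fragment of FDL-like source."""
    pos = 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind == "ws":
            continue
        text = match.group()
        # Identifiers that are reserved words become keyword tokens.
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        yield kind, text


if __name__ == "__main__":
    for token in tokenize("message Person [id=100] { repeated string tags = 4; }"):
        print(token)
```

A parser would consume this stream one token at a time; comments and string literals are left out because their syntax is not shown here.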

2. Compiler Frontend (Python-based)

  • Hand-written Lexer (fory_compiler/parser/lexer.py): Tokenizes FDL source files
  • Recursive-Descent Parser (fory_compiler/parser/parser.py): Parses tokens into AST
  • AST/IR Definitions (fory_compiler/parser/ast.py): Schema, Message, Enum, Field, Import, and type representations
  • Schema Validation: Detects duplicate names, duplicate IDs, unknown type references, duplicate field numbers
  • Import Resolution: Supports relative and search-path based imports
  • Circular Import Detection: Prevents infinite recursion in imports
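The validation checks listed above can be sketched roughly as follows. This is an illustrative model, not the code in `fory_compiler`: the `Message` and `Field` dataclasses, the `validate` function, the error wording, and the `PRIMITIVES` set are all hypothetical and much smaller than a real AST would be.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Only the primitive names used in this PR's example schema; the real set is larger.
PRIMITIVES = {"string", "int32"}


@dataclass
class Field:
    name: str
    type_name: str
    number: int


@dataclass
class Message:
    name: str
    type_id: Optional[int] = None
    fields: List[Field] = field(default_factory=list)


def validate(messages: List[Message]) -> List[str]:
    """Return human-readable errors; an empty list means the schema passed."""
    errors: List[str] = []
    seen_names: Set[str] = set()
    seen_ids: Dict[int, str] = {}
    known_types = PRIMITIVES | {m.name for m in messages}

    for msg in messages:
        if msg.name in seen_names:
            errors.append(f"duplicate type name: {msg.name}")
        seen_names.add(msg.name)

        if msg.type_id is not None:
            if msg.type_id in seen_ids:
                errors.append(
                    f"duplicate type id {msg.type_id}: "
                    f"{seen_ids[msg.type_id]} and {msg.name}"
                )
            else:
                seen_ids[msg.type_id] = msg.name

        numbers: Set[int] = set()
        for f in msg.fields:
            if f.number in numbers:
                errors.append(f"duplicate field number {f.number} in {msg.name}")
            numbers.add(f.number)
            if f.type_name not in known_types:
                errors.append(f"unknown type {f.type_name} in {msg.name}.{f.name}")
    return errors
```

Collecting all errors instead of raising on the first one lets a schema author fix several problems per compiler run; whether the actual compiler reports errors this way is not stated in this description.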

3. Multi-Language Code Generation

| Language | Output | Features |
|----------|--------|----------|
| Java | POJOs with Fory annotations | Getters/setters, equals/hashCode, registration helper class, nested type support |
| Python | Dataclasses with type hints | Native type mappings, registration function, nested type flattening |
| Go | Structs with struct tags | Fory struct tags, registration function, nested type flattening |
| Rust | Structs with derive macros | #[derive(Fory, ...)], registration function, nested type flattening |
| C++ | Structs with FORY macros | FORY_STRUCT, FORY_FIELD_INFO, registration helper, nested type support |

4. Compiler CLI & Build Integration

  • CLI command: fory compile with comprehensive options
  • Language-specific output directories: --java_out, --python_out, --go_out, --rust_out, --cpp_out
  • Include paths: -I/--proto_path for import resolution
  • Package installable: pip install -e . with fory entrypoint
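As a rough illustration of how a CLI with the options above could be declared, here is an `argparse` sketch. It is not the actual `fory` entrypoint: the `build_parser` function, the `dest` names, and the choice of `argparse` itself are assumptions, and only flags that appear in this description are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this description."""
    parser = argparse.ArgumentParser(prog="fory")
    sub = parser.add_subparsers(dest="command", required=True)

    compile_cmd = sub.add_parser("compile", help="compile FDL schema files")
    compile_cmd.add_argument("files", nargs="+", help="input .fdl files")
    # Comma-separated target languages, e.g. java,python,go,rust,cpp.
    compile_cmd.add_argument("--lang", type=lambda s: s.split(","), default=[])
    compile_cmd.add_argument("-o", dest="output", help="shared output directory")
    # One output-directory flag per target language.
    for lang in ("java", "python", "go", "rust", "cpp"):
        compile_cmd.add_argument(f"--{lang}_out", dest=f"{lang}_out")
    # Repeatable include path used for import resolution.
    compile_cmd.add_argument(
        "-I", "--proto_path", action="append", default=[], dest="include_paths"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["compile", "--lang", "java,python", "schema.fdl", "-o", "output/"]
    )
    print(args.lang, args.files, args.output)
```

Making the include path repeatable matches the search-path import resolution described earlier, where several directories may need to be searched in order.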

5. C++ Improvements

  • Added FORY_PP_IS_EMPTY, FORY_PP_HAS_ARGS macros for empty struct support
  • Added FORY_STRUCT_0 variant to support structs with no fields
  • Fixed FORY_STRUCT macro to detect empty argument lists

6. Integration Tests

  • Full cross-language roundtrip tests in integration_tests/idl_tests/
  • Tests cover Java ↔ Python ↔ Go ↔ Rust ↔ C++ serialization compatibility
  • CI workflow integration for all language combinations

7. Comprehensive Documentation

  • FDL Overview - Introduction and quick start
  • FDL Syntax Reference - Complete language syntax
  • Type System - Primitive types, collections, and mappings
  • Compiler Guide - CLI usage and build integration
  • Generated Code - Output format for each language
  • Protocol Buffers vs FDL - Feature comparison and migration guide

Example FDL Schema

package addressbook;

message Person [id=100] {
    string name = 1;
    int32 id = 2;
    string email = 3;
    repeated string tags = 4;
    map<string, int32> scores = 5;
    
    enum PhoneType [id=101] {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }
    
    message PhoneNumber [id=102] {
        string number = 1;
        PhoneType phone_type = 2;
    }
    
    repeated PhoneNumber phones = 6;
}

message AddressBook [id=103] {
    repeated Person people = 1;
    map<string, Person> people_by_name = 2;
}
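For the schema above, the Python output (dataclasses with type hints plus a registration function) might look roughly like the sketch below. This is an assumed shape, not the generator's actual output: the default values, the `register_types` name, and the `registry.register(cls, type_id)` call are placeholders rather than Fory APIs. The nested `PhoneType` and `PhoneNumber` types and the `phones` field are omitted because the naming used for flattened nested types is not specified in this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:
    # FDL string -> str, int32 -> int, repeated -> List, map<K,V> -> Dict.
    name: str = ""
    id: int = 0
    email: str = ""
    tags: List[str] = field(default_factory=list)
    scores: Dict[str, int] = field(default_factory=dict)


@dataclass
class AddressBook:
    people: List[Person] = field(default_factory=list)
    people_by_name: Dict[str, Person] = field(default_factory=dict)


def register_types(registry) -> None:
    """Register each type under the ID declared in the schema.

    `registry.register` is a stand-in for whatever registration call
    the generated code really uses.
    """
    registry.register(Person, 100)
    registry.register(AddressBook, 103)
```

Keeping the type IDs in one generated function is what would make registration deterministic across languages: every target would emit the same ID for the same message without hand-maintained tables.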

Related issues

Closes #3163 #3164 #3165 #3167 #3168 #3169 #3173 #3174 #3175 #3176 #3177

#1197
#1945
#3099

Does this PR introduce any user-facing change?

Yes, this PR introduces a new FDL compiler tool that can be installed via pip:

cd compiler
pip install -e .
fory compile --lang java,python,go,rust,cpp schema.fdl -o output/
  • Does this PR introduce any public API change?

    • New fory compile CLI command
    • New FDL schema language specification
    • Generated code uses existing Fory APIs (annotations, macros, derive macros)
  • Does this PR introduce any binary protocol compatibility change?

    • No changes to the serialization protocol
    • Generated code uses standard Fory serialization

Benchmark

N/A - This PR focuses on code generation tooling. The generated code uses existing Fory serialization APIs which have been benchmarked separately. The compiler itself is a build-time tool and does not impact runtime serialization performance.

@chaokunyang chaokunyang marked this pull request as draft January 1, 2026 09:07
@chaokunyang chaokunyang changed the title from "feat(xlang: introduce fory idl and compiler" to "feat(xlang): introduce fory idl and compiler" Jan 1, 2026
@chaokunyang chaokunyang changed the title from "feat(xlang): introduce fory idl and compiler" to "feat(xlang): introduce fory schema idl and compiler" Jan 4, 2026
@chaokunyang chaokunyang changed the title from "feat(xlang): introduce fory schema idl and compiler" to "feat(xlang): fory schema idl and compiler" Jan 12, 2026
@chaokunyang chaokunyang marked this pull request as ready for review January 20, 2026 00:16
@chaokunyang chaokunyang merged commit 1cf3827 into apache:main Jan 20, 2026
59 checks passed