Add stream support to Thrift IDL and type system (THRIFT-1948) #1

Open
carlyeks wants to merge 9 commits into stream from claude/add-thrift-stream-support-LqD1O

@carlyeks
Owner

This commit adds foundational support for streaming to Apache Thrift,
allowing chunked data transfer between client and server without
buffering entire messages in memory.

Compiler changes:

  • Added stream keyword to lexer (thriftl.ll)
  • Added StreamType grammar rules to parser (thrifty.yy)
  • Created t_stream type class for representing streams
  • Updated t_type.h with is_stream() method
  • Added t_stream.h includes to t_program.h and t_scope.h

Protocol type system:

  • Added T_STREAM (17) to TType enumeration
  • Added TStreamMessageType enum with NEXT, ERROR, END states

Documentation and examples:

  • Created doc/STREAMS.md with comprehensive documentation
  • Added test/StreamTest.thrift with example streaming services
    demonstrating server, client, and bidirectional streaming

Stream primitives:

  • next(T): Send next value in stream
  • error(TException): Send exception and close stream
  • end(): Complete stream successfully
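
The three primitives above can be pictured with a small in-memory sketch. This is not the PR's runtime API: the `StreamSink` class, its `messages` list, and the rule that any call after `end()` or `error()` is rejected are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

# Hypothetical message kinds mirroring the NEXT / ERROR / END states.
NEXT, ERROR, END = "NEXT", "ERROR", "END"


@dataclass
class StreamSink:
    """Records stream messages in memory; a stand-in for a real transport."""

    messages: List[Tuple[str, Optional[Any]]] = field(default_factory=list)
    closed: bool = False

    def _emit(self, kind: str, payload: Any = None) -> None:
        # Assumption: once the stream is closed, nothing more may be sent.
        if self.closed:
            raise RuntimeError("stream already closed")
        self.messages.append((kind, payload))

    def next(self, value: Any) -> None:
        """Send the next value in the stream."""
        self._emit(NEXT, value)

    def error(self, exc: Exception) -> None:
        """Send an exception and close the stream."""
        self._emit(ERROR, exc)
        self.closed = True

    def end(self) -> None:
        """Complete the stream successfully."""
        self._emit(END)
        self.closed = True


sink = StreamSink()
sink.next(1)
sink.next(2)
sink.end()
```

After these calls, `sink.messages` holds two `NEXT` entries followed by one `END` entry, and the sink refuses further values.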

This implementation supports:

  • Server streaming: stream<T> methodName(args)
  • Client streaming: T methodName(stream<T> arg)
  • Bidirectional streaming: stream<T> methodName(stream<T> arg)

Future work includes protocol method implementations, code generator
updates, and runtime library support for various languages.

Related to JIRA issue THRIFT-1948

@carlyeks changed the base branch from master to stream on December 24, 2025, 17:08

Add stream<T> type to Thrift IDL for streaming RPC support.

Compiler changes:
- Add t_stream.h: New container type for streams
- Update lexer (thriftl.ll): Add 'stream' keyword token
- Update parser (thrifty.yy): Add StreamType grammar production
- Add T_STREAM to t_type.h: is_stream() virtual method
- Update t_generator.h: Stream support helpers
  * supports_streams(): Whether generator supports streams
  * get_effective_type(): Converts stream<T> to list<T> for non-supporting generators
  * is_effective_stream(): Checks if type is effectively a stream

Backward compatibility:
- Generators default to supports_streams() = false
- Non-supporting generators automatically get list<T> instead of stream<T>
- Allows incremental adoption across language backends
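
The fallback described above can be sketched in a few lines. The real helpers live in the C++ `t_generator.h`; the `TypeRef` class and the Python function signatures below are hypothetical stand-ins, and the sketch only rewrites the outermost type (whether the actual helper recurses into nested types is not stated here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypeRef:
    """Hypothetical, simplified stand-in for a compiler type node."""

    kind: str                          # e.g. "stream", "list", "i64"
    elem: Optional["TypeRef"] = None   # element type for containers


def get_effective_type(t: TypeRef, supports_streams: bool) -> TypeRef:
    # Generators that do not support streams see list<T> instead of stream<T>.
    if t.kind == "stream" and not supports_streams:
        return TypeRef("list", t.elem)
    return t


def is_effective_stream(t: TypeRef, supports_streams: bool) -> bool:
    return get_effective_type(t, supports_streams).kind == "stream"


stream_of_i64 = TypeRef("stream", TypeRef("i64"))
legacy_view = get_effective_type(stream_of_i64, supports_streams=False)
modern_view = get_effective_type(stream_of_i64, supports_streams=True)
```

Here `legacy_view` is a `list` of `i64`, while `modern_view` is the unchanged stream type.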

Test file:
- test/StreamTest.thrift: Example stream definitions
  * Server streaming: stream<Update> updates_for(string prefix)
  * Client streaming: i64 sum(stream<i64> values)
  * Bidirectional: stream<i64> random(stream<i64> max)

Implement stream protocol support at the base protocol layer.

Wire protocol design:
- T_STREAM (17): New wire type for streams
- Stream framing: has_more byte before each element
  * T_STREAM_END (0): Stream finished
  * T_STREAM_NEXT (1): Element follows
  * T_STREAM_ERROR (2): Exception follows
- Protocol methods: readStreamBegin/End, writeStreamBegin/End
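
As a rough illustration of the framing above, the sketch below encodes a stream of i32 values the way a binary-style protocol might: one element-type byte, then a `T_STREAM_NEXT` byte before each big-endian element, then a single `T_STREAM_END` byte. The function name `encode_i32_stream` is hypothetical, and the enclosing field header is omitted.

```python
import struct

T_I32 = 8            # TType value for i32
T_STREAM_END = 0     # stream finished
T_STREAM_NEXT = 1    # element follows


def encode_i32_stream(values):
    """Encode values as: elem type, (NEXT, i32)*, END."""
    out = bytearray([T_I32])              # stream begin: element type only
    for v in values:
        out.append(T_STREAM_NEXT)         # has_more byte before each element
        out += struct.pack(">i", v)       # big-endian 32-bit signed integer
    out.append(T_STREAM_END)              # no size prefix, just a terminator
    return bytes(out)


encoded = encode_i32_stream([1, 2])
```

Under these assumptions, each element costs one extra byte compared with a size-prefixed list, and an empty stream is two bytes: the element type and the end marker.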

TProtocol.h changes:
- Add T_STREAM and TStreamMessageType to TEnum.h
- Add pure virtual stream methods (_virt suffix)
- Add non-virtual stream method wrappers
- Implement skip(T_STREAM) for graceful unknown type handling

TVirtualProtocol.h changes:
- Add CRTP forwarding for stream methods
- Add fallback implementations in TProtocolDefaults

TProtocolDecorator.h changes:
- Add stream method forwarding to wrapped protocol
- Fixes StoredMessageProtocol and other decorators

Design rationale:
- T_STREAM vs size=-1: Avoid OOM (size=-1 → 0xFFFFFFFF → 4GB allocation)
- Element-level framing: Enables incremental processing
- Skip support: Old clients gracefully skip unknown T_STREAM fields
- Breaking change with clean failure mode vs catastrophic crash
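
The `size=-1` concern can be checked with a few lines of arithmetic. The snippet only shows how a signed 32-bit `-1` reads when reinterpreted as unsigned; it makes no claim about how any particular Thrift reader allocates memory.

```python
import struct

# A size field of -1 written as a big-endian signed 32-bit integer...
raw = struct.pack(">i", -1)

# ...reads back as 0xFFFFFFFF when interpreted as unsigned.
(as_unsigned,) = struct.unpack(">I", raw)

# Size of a reservation of that many single-byte elements, in GiB.
gib = as_unsigned / 2**30
```

`as_unsigned` is 4294967295, just under 4 GiB for single-byte elements, which is the allocation the rationale above aims to avoid.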

Add stream support to all 6 Thrift C++ protocol implementations.
Each protocol writes only the element type; framing is handled by the generated code.

TBinaryProtocol (.h, .tcc):
- writeStreamBegin(): Write element type byte
- readStreamBegin(): Read element type byte
- Standard binary encoding

TCompactProtocol (.h, .tcc):
- writeStreamBegin(): Write compact-encoded type using getCompactType()
- readStreamBegin(): Read compact type using getTType()
- Compact variable-length encoding

TJSONProtocol (.h, .cpp):
- writeStreamBegin(): JSON array ["elemType"]
- readStreamBegin(): Parse JSON array, use getTypeIDForTypeName()
- Human-readable JSON format

TDebugProtocol (.h, .cpp):
- Write-only debugging protocol
- writeStreamBegin(): Simple type byte output

THeaderProtocol (.h, .cpp):
- Wrapper protocol - delegates to wrapped proto_
- All stream methods forward to wrapped protocol

TProtocolTap (.h):
- Wiretap protocol for mirroring operations
- Stream methods read from source, mirror to sink

All implementations follow same pattern:
- writeStreamBegin(elemType): Write element type marker only
- writeStreamEnd(): No-op (return 0)
- readStreamBegin(elemType&): Read and return element type
- readStreamEnd(): No-op (return 0)

Stream framing (has_more, elements, errors) is handled by the higher layer.

Documentation:
- doc/STREAMS.md: Comprehensive stream protocol specification
  * Wire format using T_STREAM type
  * Backward compatibility via skip()
  * Migration path and use cases
  * Protocol method specifications

- doc/STREAM_WIRE_PROTOCOL_EXAMPLE.md: Byte-level examples
  * Traditional list vs stream comparison
  * Wire format byte-by-byte breakdown
  * Performance analysis
  * Old client vs new client behavior

Build configuration:
- .gitignore: Add build/ directory for build artifacts

Key documentation points:
- Why T_STREAM instead of size=-1 (OOM prevention)
- Stream framing with has_more bytes
- Graceful degradation for old clients
- Example services and use cases

@carlyeks force-pushed the claude/add-thrift-stream-support-LqD1O branch from e5e884c to 54ae336 on December 24, 2025, 23:29

Implements stream serialization and deserialization in the C++ generator:

- Added supports_streams() to enable stream type handling
- Map stream<T> types to std::vector<T> in generated code
- Implement stream serialization with T_STREAM_NEXT/END framing:
  * writeStreamBegin() with element type
  * T_STREAM_NEXT byte before each element
  * T_STREAM_END byte after all elements
  * writeStreamEnd() to close stream
- Implement stream deserialization with has_more validation:
  * readStreamBegin() to get element type
  * while loop reading has_more bytes
  * Validate has_more is T_STREAM_NEXT or T_STREAM_END
  * Throw TProtocolException on invalid continuation byte
  * readStreamEnd() to close stream
- Added generate_serialize_stream_element() and
  generate_deserialize_stream_element() helper methods

Generated code can now serialize/deserialize stream types using the
stream wire protocol with proper framing for incremental processing.
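
The read loop described above might look roughly like this when written by hand in Python rather than generated C++. `ProtocolError` stands in for `TProtocolException`, `decode_i32_stream` and `_read_exact` are hypothetical names, and the element encoding is assumed to be a big-endian i32. As in the description, only `T_STREAM_NEXT` and `T_STREAM_END` are accepted as continuation bytes.

```python
import io
import struct

T_I32 = 8
T_STREAM_END, T_STREAM_NEXT = 0, 1


class ProtocolError(Exception):
    """Stand-in for TProtocolException in this sketch."""


def _read_exact(buf, n):
    data = buf.read(n)
    if len(data) != n:
        raise ProtocolError("truncated stream")
    return data


def decode_i32_stream(data):
    buf = io.BytesIO(data)
    elem_type = _read_exact(buf, 1)[0]       # stream begin: element type
    if elem_type != T_I32:
        raise ProtocolError(f"unexpected element type {elem_type}")
    values = []
    while True:
        has_more = _read_exact(buf, 1)[0]    # continuation byte
        if has_more == T_STREAM_END:
            break
        if has_more != T_STREAM_NEXT:
            raise ProtocolError(f"invalid continuation byte {has_more}")
        values.append(struct.unpack(">i", _read_exact(buf, 4))[0])
    return values


decoded = decode_i32_stream(bytes([8, 1, 0, 0, 0, 7, 1, 0xFF, 0xFF, 0xFF, 0xFE, 0]))
```

Because the element count is not known up front, the result list grows one element at a time instead of being pre-sized.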

This commit completes the C++ stream implementation by fixing protocol
type mappings and adding a test file:

**Compiler fixes:**
- Added T_STREAM case to type_to_enum() function in t_cpp_generator.cc
  to prevent "INVALID TYPE" errors when generating stream code

**TCompactProtocol fixes:**
- Added CT_STREAM = 0x0E to compact protocol type enum
- Extended TTypeToCType array from 17 to 18 elements for T_STREAM
- Added CT_STREAM case to getTType() switch statement
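
To see why the lookup array grows from 17 to 18 entries, here is a Python rendering of a TType-indexed table. The compact type codes are taken from the Thrift compact protocol, with `0x0E` for streams as stated above; the names `TTYPE_TO_CTYPE` and `get_ttype` are illustrative and the C++ array layout is assumed rather than copied.

```python
T_STREAM = 17
CT_STREAM = 0x0E

# Indexed by TType value; unused TType slots (1, 5, 7, 9) map to 0.
TTYPE_TO_CTYPE = [
    0x00,       # 0  T_STOP
    0x00,       # 1  (unused)
    0x01,       # 2  T_BOOL   -> boolean true
    0x03,       # 3  T_BYTE
    0x07,       # 4  T_DOUBLE
    0x00,       # 5  (unused)
    0x04,       # 6  T_I16
    0x00,       # 7  (unused)
    0x05,       # 8  T_I32
    0x00,       # 9  (unused)
    0x06,       # 10 T_I64
    0x08,       # 11 T_STRING -> binary
    0x0C,       # 12 T_STRUCT
    0x0B,       # 13 T_MAP
    0x0A,       # 14 T_SET
    0x09,       # 15 T_LIST
    0x0D,       # 16 T_UUID
    CT_STREAM,  # 17 T_STREAM (new entry, making 18 in total)
]


def get_ttype(ctype):
    """Reverse lookup for the container-like types shown here."""
    reverse = {0x09: 15, 0x0A: 14, 0x0B: 13, 0x0C: 12, 0x0D: 16, CT_STREAM: T_STREAM}
    return reverse[ctype]
```

Since `T_STREAM` is 17, an array indexed by TType needs index 17 to exist, hence 18 elements.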

**TJSONProtocol fixes:**
- Added kTypeNameStream("stm") constant for JSON serialization
- Added T_STREAM case to getTypeNameForTypeID()
- Extended getTypeIDForTypeName() to parse "stm" type name

**Test file:**
- Added test/StreamBasicTest.thrift with simple struct examples
  demonstrating stream<i32>, stream<string>, and nested stream types

All three C++ protocols (Binary, Compact, JSON) now correctly
serialize and deserialize stream types with proper framing.

Implements stream code generation for Python, similar to C++:

**Generator support:**
- Added supports_streams() to enable stream type handling
- Map stream<T> types to list[T] in generated Python code

**Type system:**
- Added T_STREAM case to type_to_enum() returning "TType.STREAM"
- Added stream support to type_to_py_type() for type hints (list[T])
- Added stream support to type_to_spec_args() for thrift_spec

**Serialization:**
- Updated generate_serialize_container() to call writeStreamBegin()
- Write T_STREAM_NEXT (1) byte before each element
- Write T_STREAM_END (0) byte after all elements
- Call writeStreamEnd() to close stream
- Added generate_serialize_stream_element() helper method

**Deserialization:**
- Updated generate_deserialize_container() with while loop
- Read has_more bytes and check for T_STREAM_NEXT/END
- Validate continuation bytes and raise TProtocolException on errors
- Call readStreamEnd() after loop completes
- Added generate_deserialize_stream_element() helper method

Generated Python code uses lists as the underlying container type
with proper stream protocol framing at the wire level.

Implements stream type support in the Python library and code generator:

Generator changes (compiler/cpp/src/thrift/generate/t_py_generator.cc):
- Added supports_streams() returning true
- Updated type_to_enum() to return "TType.STREAM" for stream types
- Updated type_to_py_type() for list[T] type hints
- Updated type_to_spec_args() for thrift_spec generation
- Implemented stream serialization with T_STREAM_NEXT/END framing
- Implemented stream deserialization with while loop validation
- Added generate_serialize_stream_element() method
- Added generate_deserialize_stream_element() method

Runtime library changes:
- lib/py/src/Thrift.py: Added TType.STREAM = 17 and TType.UUID = 16
- lib/py/src/protocol/TProtocol.py:
  * Added writeStreamBegin/End() and readStreamBegin/End() base methods
  * Updated skip() to handle TType.STREAM with framing protocol
  * Updated _TTYPE_HANDLERS with readContainerStream/writeContainerStream
  * Implemented readContainerStream() with has_more validation
  * Implemented writeContainerStream() with T_STREAM_NEXT/END bytes
- lib/py/src/protocol/TBinaryProtocol.py:
  * Implemented writeStreamBegin() - writes element type byte
  * Implemented readStreamBegin() - reads element type byte
- lib/py/src/protocol/TCompactProtocol.py:
  * Added CompactType.UUID = 0x0D and CompactType.STREAM = 0x0E
  * Updated CTYPES/TTYPES mappings for UUID and STREAM
  * Implemented stream methods with state management
- lib/py/src/protocol/TJSONProtocol.py:
  * Added TType.STREAM: 'stm' to CTYPES dictionary
  * Implemented stream methods with JSON array format

Testing:
- Generated Python code from StreamBasicTest.thrift
- All three protocols tested successfully (Binary, Compact, JSON)
- Nested streams working correctly
- Byte sizes match C++ implementation exactly

Implements stream type support in the Java code generator:

- Added supports_streams() returning true
- Updated type_to_enum() to return TType.STREAM for stream types
- Updated get_java_type_string() to return TType.STREAM
- Updated type_name() to map stream<T> to List<T> (ArrayList for init)
- Added generate_serialize_stream_element() method
  * Writes T_STREAM_NEXT byte (1) before each element
  * Serializes element using generate_serialize_field()
- Added generate_deserialize_stream_element() method
  * Uses while loop with has_more byte checking
  * Validates has_more is 0 (T_STREAM_END) or 1 (T_STREAM_NEXT)
  * Throws TProtocolException on invalid continuation byte
- Updated generate_serialize_container() for streams
  * Calls writeStreamBegin() with element type
  * Iterates elements writing T_STREAM_NEXT + element
  * Writes T_STREAM_END byte (0) after loop
  * Calls writeStreamEnd()
- Updated generate_deserialize_container() for streams
  * Calls readStreamBegin() to get element type
  * Initializes empty ArrayList (streams have no size up front)
  * Uses while loop to read elements until T_STREAM_END
  * Calls readStreamEnd()
- Updated generate_field_value_meta_data() to handle streams
  * Uses ListMetaData with TType.STREAM for metadata

Generated code uses proper stream framing protocol matching C++ and Python.