Add stream support to Thrift IDL and type system (THRIFT-1948) #1
Open
carlyeks wants to merge 9 commits into
Add stream<T> type to Thrift IDL for streaming RPC support.

Compiler changes:
- Add t_stream.h: New container type for streams
- Update lexer (thriftl.ll): Add 'stream' keyword token
- Update parser (thrifty.yy): Add StreamType grammar production
- Add T_STREAM to t_type.h: is_stream() virtual method
- Update t_generator.h: Stream support helpers
  * supports_streams(): Whether generator supports streams
  * get_effective_type(): Converts stream<T> to list<T> for non-supporting generators
  * is_effective_stream(): Checks if type is effectively a stream

Backward compatibility:
- Generators default to supports_streams() = false
- Non-supporting generators automatically get list<T> instead of stream<T>
- Allows incremental adoption across language backends

Test file:
- test/StreamTest.thrift: Example stream definitions
  * Server streaming: stream<Update> updates_for(string prefix)
  * Client streaming: i64 sum(stream<i64> values)
  * Bidirectional: stream<i64> random(stream<i64> max)
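
The fallback described under "Backward compatibility" can be sketched as follows. This is a minimal Python model, not the compiler's C++ code: the classes `BaseType`, `ListType`, `StreamType`, `Generator`, and `StreamingGenerator` are hypothetical stand-ins, and only the three helper names come from the commit message. Whether the real helper also rewrites nested streams is not stated in the commit, so this sketch handles the top-level type only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseType:
    name: str  # e.g. "i64" (hypothetical stand-in for a compiler base type)


@dataclass(frozen=True)
class ListType:
    elem: object  # element type of list<T>


@dataclass(frozen=True)
class StreamType:
    elem: object  # element type of stream<T>


class Generator:
    # The commit states that generators default to supports_streams() = false.
    supports_streams = False

    def get_effective_type(self, t):
        # A generator without stream support sees stream<T> as list<T>.
        if isinstance(t, StreamType) and not self.supports_streams:
            return ListType(t.elem)
        return t

    def is_effective_stream(self, t):
        # True only when the type is still a stream after the fallback.
        return isinstance(self.get_effective_type(t), StreamType)


class StreamingGenerator(Generator):
    # A backend that has opted in to stream support.
    supports_streams = True
```

With this model, `Generator().get_effective_type(StreamType(BaseType("i64")))` yields `ListType(BaseType("i64"))`, while `StreamingGenerator` leaves the stream type unchanged.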
Implement stream protocol support at the base protocol layer.

Wire protocol design:
- T_STREAM (17): New wire type for streams
- Stream framing: has_more byte before each element
  * T_STREAM_END (0): Stream finished
  * T_STREAM_NEXT (1): Element follows
  * T_STREAM_ERROR (2): Exception follows
- Protocol methods: readStreamBegin/End, writeStreamBegin/End

TProtocol.h changes:
- Add T_STREAM and TStreamMessageType to TEnum.h
- Add pure virtual stream methods (_virt suffix)
- Add non-virtual stream method wrappers
- Implement skip(T_STREAM) for graceful unknown type handling

TVirtualProtocol.h changes:
- Add CRTP forwarding for stream methods
- Add fallback implementations in TProtocolDefaults

TProtocolDecorator.h changes:
- Add stream method forwarding to wrapped protocol
- Fixes StoredMessageProtocol and other decorators

Design rationale:
- T_STREAM vs size=-1: Avoid OOM (size=-1 → 0xFFFFFFFF → 4GB allocation)
- Element-level framing: Enables incremental processing
- Skip support: Old clients gracefully skip unknown T_STREAM fields
- Breaking change with clean failure mode vs catastrophic crash
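
The has_more framing above can be illustrated with a small standalone sketch. It assumes big-endian 8-byte integers as the element encoding (as in a binary-style protocol) and omits the element type byte written by writeStreamBegin. The function names `write_i64_stream` and `read_i64_stream` are hypothetical and are not part of the Thrift library; only the three marker values come from the commit message.

```python
import io
import struct

# Marker values taken from the wire protocol design in the commit message.
T_STREAM_END = 0
T_STREAM_NEXT = 1
T_STREAM_ERROR = 2


def write_i64_stream(out, values):
    """Write each value preceded by T_STREAM_NEXT, then a final T_STREAM_END."""
    for v in values:
        out.write(bytes([T_STREAM_NEXT]))
        out.write(struct.pack(">q", v))
    out.write(bytes([T_STREAM_END]))


def read_i64_stream(inp):
    """Yield elements one at a time until T_STREAM_END is read."""
    while True:
        marker = inp.read(1)
        if len(marker) != 1:
            raise EOFError("stream truncated before T_STREAM_END")
        has_more = marker[0]
        if has_more == T_STREAM_END:
            return
        if has_more == T_STREAM_ERROR:
            # The exception payload format is not described here, so stop.
            raise RuntimeError("peer signalled T_STREAM_ERROR")
        if has_more != T_STREAM_NEXT:
            raise ValueError(f"invalid continuation byte: {has_more}")
        payload = inp.read(8)
        if len(payload) != 8:
            raise EOFError("stream truncated inside an element")
        yield struct.unpack(">q", payload)[0]
```

Because the reader is a generator, a consumer can process each element as it arrives instead of waiting for a size-prefixed container to be read in full, which is the incremental processing the design rationale refers to.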
Add stream support to all 6 Thrift C++ protocol implementations. Each protocol writes only element type - framing handled by generated code.

TBinaryProtocol (.h, .tcc):
- writeStreamBegin(): Write element type byte
- readStreamBegin(): Read element type byte
- Standard binary encoding

TCompactProtocol (.h, .tcc):
- writeStreamBegin(): Write compact-encoded type using getCompactType()
- readStreamBegin(): Read compact type using getTType()
- Compact variable-length encoding

TJSONProtocol (.h, .cpp):
- writeStreamBegin(): JSON array ["elemType"]
- readStreamBegin(): Parse JSON array, use getTypeIDForTypeName()
- Human-readable JSON format

TDebugProtocol (.h, .cpp):
- Write-only debugging protocol
- writeStreamBegin(): Simple type byte output

THeaderProtocol (.h, .cpp):
- Wrapper protocol - delegates to wrapped proto_
- All stream methods forward to wrapped protocol

TProtocolTap (.h):
- Wiretap protocol for mirroring operations
- Stream methods read from source, mirror to sink

All implementations follow same pattern:
- writeStreamBegin(elemType): Write element type marker only
- writeStreamEnd(): No-op (return 0)
- readStreamBegin(elemType&): Read and return element type
- readStreamEnd(): No-op (return 0)

Stream framing (has_more, elements, errors) handled by higher layer.
Documentation:
- doc/STREAMS.md: Comprehensive stream protocol specification
  * Wire format using T_STREAM type
  * Backward compatibility via skip()
  * Migration path and use cases
  * Protocol method specifications
- doc/STREAM_WIRE_PROTOCOL_EXAMPLE.md: Byte-level examples
  * Traditional list vs stream comparison
  * Wire format byte-by-byte breakdown
  * Performance analysis
  * Old client vs new client behavior

Build configuration:
- .gitignore: Add build/ directory for build artifacts

Key documentation points:
- Why T_STREAM instead of size=-1 (OOM prevention)
- Stream framing with has_more bytes
- Graceful degradation for old clients
- Example services and use cases
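
The "size=-1" point can be checked with a short piece of arithmetic. The sketch below assumes a reader that takes a signed 32-bit size from the wire and then treats it as unsigned; the helper name `reinterpret_i32_as_u32` is hypothetical and exists only for this illustration.

```python
import struct


def reinterpret_i32_as_u32(size: int) -> int:
    """Return the unsigned value that shares a bit pattern with a signed i32."""
    return struct.unpack(">I", struct.pack(">i", size))[0]


unsigned = reinterpret_i32_as_u32(-1)
print(hex(unsigned))       # 0xffffffff
print(unsigned / 2**30)    # just under 4.0, i.e. about 4 GiB at one byte per element
```

A reader that reserves space for that many elements up front would attempt a multi-gigabyte allocation, which is the failure mode a dedicated T_STREAM wire type is meant to avoid.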
Implements stream serialization and deserialization in the C++ generator:
- Added supports_streams() to enable stream type handling
- Map stream<T> types to std::vector<T> in generated code
- Implement stream serialization with T_STREAM_NEXT/END framing:
  * writeStreamBegin() with element type
  * T_STREAM_NEXT byte before each element
  * T_STREAM_END byte after all elements
  * writeStreamEnd() to close stream
- Implement stream deserialization with has_more validation:
  * readStreamBegin() to get element type
  * while loop reading has_more bytes
  * Validate has_more is T_STREAM_NEXT or T_STREAM_END
  * Throw TProtocolException on invalid continuation byte
  * readStreamEnd() to close stream
- Added generate_serialize_stream_element() and generate_deserialize_stream_element() helper methods

Generated code can now serialize/deserialize stream types using the stream wire protocol with proper framing for incremental processing.
This commit completes the C++ stream implementation by fixing protocol
type mappings and adding a test file:
**Compiler fixes:**
- Added T_STREAM case to type_to_enum() function in t_cpp_generator.cc
to prevent "INVALID TYPE" errors when generating stream code
**TCompactProtocol fixes:**
- Added CT_STREAM = 0x0E to compact protocol type enum
- Extended TTypeToCType array from 17 to 18 elements for T_STREAM
- Added CT_STREAM case to getTType() switch statement
**TJSONProtocol fixes:**
- Added kTypeNameStream("stm") constant for JSON serialization
- Added T_STREAM case to getTypeNameForTypeID()
- Extended getTypeIDForTypeName() to parse "stm" type name
**Test file:**
- Added test/StreamBasicTest.thrift with simple struct examples
demonstrating stream<i32>, stream<string>, and nested stream types
All three C++ protocols (Binary, Compact, JSON) now correctly
serialize and deserialize stream types with proper framing.
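
The compact-protocol mapping described in this commit can be modelled as a lookup table indexed by TType id. In the sketch below, the values for T_STREAM (17) and CT_STREAM (0x0E) come from the commit messages in this PR. The ids and compact codes for the pre-existing types reflect my understanding of upstream Thrift and should be treated as assumptions to verify against TCompactProtocol.tcc. The Python names `TTYPE_TO_CTYPE` and `get_ttype` are hypothetical.

```python
# TType ids (T_STREAM = 17 is the value introduced in this PR).
T_STOP, T_VOID, T_BOOL, T_BYTE, T_DOUBLE = 0, 1, 2, 3, 4
T_I16, T_I32, T_I64, T_STRING, T_STRUCT = 6, 8, 10, 11, 12
T_MAP, T_SET, T_LIST, T_UUID, T_STREAM = 13, 14, 15, 16, 17

# Compact protocol type codes (CT_STREAM = 0x0E is the value added here).
CT_STOP, CT_BOOLEAN_TRUE, CT_BOOLEAN_FALSE, CT_BYTE = 0x00, 0x01, 0x02, 0x03
CT_I16, CT_I32, CT_I64, CT_DOUBLE, CT_BINARY = 0x04, 0x05, 0x06, 0x07, 0x08
CT_LIST, CT_SET, CT_MAP, CT_STRUCT = 0x09, 0x0A, 0x0B, 0x0C
CT_UUID, CT_STREAM = 0x0D, 0x0E

# Indexed by TType id; unused ids map to 0. 18 entries cover ids 0..17.
TTYPE_TO_CTYPE = [
    CT_STOP, 0, CT_BOOLEAN_TRUE, CT_BYTE, CT_DOUBLE, 0,
    CT_I16, 0, CT_I32, 0, CT_I64, CT_BINARY,
    CT_STRUCT, CT_MAP, CT_SET, CT_LIST, CT_UUID, CT_STREAM,
]

# Reverse mapping; both boolean codes decode to T_BOOL.
_CTYPE_TO_TTYPE = {
    CT_STOP: T_STOP, CT_BOOLEAN_TRUE: T_BOOL, CT_BOOLEAN_FALSE: T_BOOL,
    CT_BYTE: T_BYTE, CT_I16: T_I16, CT_I32: T_I32, CT_I64: T_I64,
    CT_DOUBLE: T_DOUBLE, CT_BINARY: T_STRING, CT_LIST: T_LIST,
    CT_SET: T_SET, CT_MAP: T_MAP, CT_STRUCT: T_STRUCT,
    CT_UUID: T_UUID, CT_STREAM: T_STREAM,
}


def get_ttype(ctype: int) -> int:
    """Map a compact type code back to a TType id, rejecting unknown codes."""
    try:
        return _CTYPE_TO_TTYPE[ctype]
    except KeyError:
        raise ValueError(f"unknown compact type code: {ctype:#04x}") from None
```

Without the eighteenth entry, indexing the table with T_STREAM would read past the end of the array, and without the CT_STREAM case the reverse lookup would reject the code, which is consistent with the fixes listed above.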
Implements stream code generation for Python similar to C++:

**Generator support:**
- Added supports_streams() to enable stream type handling
- Map stream<T> types to list[T] in generated Python code

**Type system:**
- Added T_STREAM case to type_to_enum() returning "TType.STREAM"
- Added stream support to type_to_py_type() for type hints (list[T])
- Added stream support to type_to_spec_args() for thrift_spec

**Serialization:**
- Updated generate_serialize_container() to call writeStreamBegin()
- Write T_STREAM_NEXT (1) byte before each element
- Write T_STREAM_END (0) byte after all elements
- Call writeStreamEnd() to close stream
- Added generate_serialize_stream_element() helper method

**Deserialization:**
- Updated generate_deserialize_container() with while loop
- Read has_more bytes and check for T_STREAM_NEXT/END
- Validate continuation bytes and raise TProtocolException on errors
- Call readStreamEnd() after loop completes
- Added generate_deserialize_stream_element() helper method

Generated Python code uses lists as the underlying container type with proper stream protocol framing at the wire level.
Implements stream type support in Python library and code generator:

Generator changes (compiler/cpp/src/thrift/generate/t_py_generator.cc):
- Added supports_streams() returning true
- Updated type_to_enum() to return "TType.STREAM" for stream types
- Updated type_to_py_type() for list[T] type hints
- Updated type_to_spec_args() for thrift_spec generation
- Implemented stream serialization with T_STREAM_NEXT/END framing
- Implemented stream deserialization with while loop validation
- Added generate_serialize_stream_element() method
- Added generate_deserialize_stream_element() method

Runtime library changes:
- lib/py/src/Thrift.py: Added TType.STREAM = 17 and TType.UUID = 16
- lib/py/src/protocol/TProtocol.py:
  * Added writeStreamBegin/End() and readStreamBegin/End() base methods
  * Updated skip() to handle TType.STREAM with framing protocol
  * Updated _TTYPE_HANDLERS with readContainerStream/writeContainerStream
  * Implemented readContainerStream() with has_more validation
  * Implemented writeContainerStream() with T_STREAM_NEXT/END bytes
- lib/py/src/protocol/TBinaryProtocol.py:
  * Implemented writeStreamBegin() - writes element type byte
  * Implemented readStreamBegin() - reads element type byte
- lib/py/src/protocol/TCompactProtocol.py:
  * Added CompactType.UUID = 0x0D and CompactType.STREAM = 0x0E
  * Updated CTYPES/TTYPES mappings for UUID and STREAM
  * Implemented stream methods with state management
- lib/py/src/protocol/TJSONProtocol.py:
  * Added TType.STREAM: 'stm' to CTYPES dictionary
  * Implemented stream methods with JSON array format

Testing:
- Generated Python code from StreamBasicTest.thrift
- All three protocols tested successfully (Binary, Compact, JSON)
- Nested streams working correctly
- Byte sizes match C++ implementation exactly
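
The skip() behaviour mentioned under TProtocol.py can be sketched for the simplest case. This is a standalone illustration, not the library's code: it assumes a binary-style encoding in which the stream begins with a one-byte element type, and it only handles fixed-width element types. The function name `skip_stream` and the `FIXED_WIDTHS` table are hypothetical.

```python
import io

T_STREAM_END, T_STREAM_NEXT = 0, 1

# TType id -> encoded width in bytes for fixed-width types in a binary-style
# encoding (bool, byte, double, i16, i32, i64). Assumed for this sketch.
FIXED_WIDTHS = {2: 1, 3: 1, 4: 8, 6: 2, 8: 4, 10: 8}


def _read_exact(inp, n):
    """Read exactly n bytes or raise EOFError."""
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("stream truncated")
    return data


def skip_stream(inp):
    """Consume one framed stream and return the number of elements skipped."""
    elem_type = _read_exact(inp, 1)[0]   # element type byte (readStreamBegin)
    if elem_type not in FIXED_WIDTHS:
        raise ValueError(f"sketch cannot skip element type {elem_type}")
    width = FIXED_WIDTHS[elem_type]
    skipped = 0
    while True:
        has_more = _read_exact(inp, 1)[0]
        if has_more == T_STREAM_END:
            return skipped
        if has_more != T_STREAM_NEXT:
            raise ValueError(f"invalid continuation byte: {has_more}")
        _read_exact(inp, width)          # discard the element payload
        skipped += 1
```

After `skip_stream` returns, the input is positioned on the byte that follows T_STREAM_END, so a reader that does not need the field can continue with the next one.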
Implements stream type support in Java code generator:
- Added supports_streams() returning true
- Updated type_to_enum() to return TType.STREAM for stream types
- Updated get_java_type_string() to return TType.STREAM
- Updated type_name() to map stream<T> to List<T> (ArrayList for init)
- Added generate_serialize_stream_element() method
  * Writes T_STREAM_NEXT byte (1) before each element
  * Serializes element using generate_serialize_field()
- Added generate_deserialize_stream_element() method
  * Uses while loop with has_more byte checking
  * Validates has_more is 0 (T_STREAM_END) or 1 (T_STREAM_NEXT)
  * Throws TProtocolException on invalid continuation byte
- Updated generate_serialize_container() for streams
  * Calls writeStreamBegin() with element type
  * Iterates elements writing T_STREAM_NEXT + element
  * Writes T_STREAM_END byte (0) after loop
  * Calls writeStreamEnd()
- Updated generate_deserialize_container() for streams
  * Calls readStreamBegin() to get element type
  * Initializes empty ArrayList (streams have no size up front)
  * Uses while loop to read elements until T_STREAM_END
  * Calls readStreamEnd()
- Updated generate_field_value_meta_data() to handle streams
  * Uses ListMetaData with TType.STREAM for metadata

Generated code uses proper stream framing protocol matching C++ and Python.
This commit adds foundational support for streaming to Apache Thrift,
allowing chunked data transfer between client and server without
buffering entire messages in memory.
Compiler changes:
Protocol type system:
Documentation and examples:
demonstrating server, client, and bidirectional streaming
Stream primitives:
This implementation supports:
Future work includes protocol method implementations, code generator
updates, and runtime library support for various languages.
Related to JIRA issue THRIFT-1948