@MehulBatra
Contributor

@MehulBatra MehulBatra commented Apr 8, 2025

Purpose
Linked issues: #723, #569

Adds a built-in PojoToRowData converter utility so that end users can get started quickly.

Brief change log:

  • Added a utility to convert a POJO to RowData
  • Added tests for primitive types (boolean, numeric, floating point)
  • Added tests for complex types (decimal, temporal, binary, char)
  • Added edge case tests for null values and nullable fields
  • Used Types.POJO(pojoClass) to get PojoTypeInfo
  • Implemented the converter pattern: added a FieldConverter interface and type-specific converter implementations for each data type.
  • Nested fields are intentionally rejected, as they are not currently supported.
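For readers unfamiliar with the converter pattern named in the change log, the following is a minimal, self-contained sketch of the idea. It is not the code from this PR: `PojoConverterSketch`, its nested `FieldConverter`, the example POJOs, and the `Object[]` row are hypothetical stand-ins that use only the JDK, and encoding a date as days since the epoch is an assumption made for illustration. Fields are boxed types so that null values pass through unchanged.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-field converter pattern; not the PR's implementation. */
final class PojoConverterSketch {

    /** Converts one non-null POJO field value into its row representation. */
    interface FieldConverter {
        Object convert(Object value);
    }

    // One converter per supported field type. Any other type, such as a nested POJO, is rejected.
    private static final Map<Class<?>, FieldConverter> CONVERTERS = new HashMap<>();

    static {
        FieldConverter identity = value -> value;
        CONVERTERS.put(Boolean.class, identity);
        CONVERTERS.put(Integer.class, identity);
        CONVERTERS.put(Long.class, identity);
        CONVERTERS.put(Double.class, identity);
        CONVERTERS.put(String.class, identity);
        CONVERTERS.put(BigDecimal.class, identity);
        // Assumption for illustration: dates are stored as days since 1970-01-01.
        CONVERTERS.put(LocalDate.class, value -> (int) ((LocalDate) value).toEpochDay());
    }

    private final List<Field> fields = new ArrayList<>();
    private final List<FieldConverter> converters = new ArrayList<>();

    PojoConverterSketch(Class<?> pojoClass) {
        Field[] declared = pojoClass.getDeclaredFields();
        // Sort by name so the column order does not depend on reflection order.
        Arrays.sort(declared, Comparator.comparing(Field::getName));
        for (Field field : declared) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            FieldConverter converter = CONVERTERS.get(field.getType());
            if (converter == null) {
                throw new UnsupportedOperationException(
                        "Unsupported type for field '" + field.getName() + "': "
                                + field.getType().getName());
            }
            field.setAccessible(true);
            fields.add(field);
            converters.add(converter);
        }
    }

    /** Returns one value per field, in field-name order; null fields stay null. */
    Object[] toRow(Object pojo) {
        Object[] row = new Object[fields.size()];
        try {
            for (int i = 0; i < row.length; i++) {
                Object value = fields.get(i).get(pojo);
                row[i] = value == null ? null : converters.get(i).convert(value);
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return row;
    }

    /** Flat example POJO with supported field types only. */
    static final class Order {
        Long id;
        String item;
        BigDecimal price;
        LocalDate orderDate;

        Order(Long id, String item, BigDecimal price, LocalDate orderDate) {
            this.id = id;
            this.item = item;
            this.price = price;
            this.orderDate = orderDate;
        }
    }

    static final class Address {
        String city;
    }

    /** Example POJO with a nested field, which the sketch rejects. */
    static final class Customer {
        String name;
        Address address;
    }

    public static void main(String[] args) {
        PojoConverterSketch converter = new PojoConverterSketch(Order.class);
        Order order = new Order(7L, "pen", new BigDecimal("1.50"), LocalDate.of(1970, 1, 11));
        System.out.println(Arrays.toString(converter.toRow(order)));
        try {
            new PojoConverterSketch(Customer.class);
        } catch (UnsupportedOperationException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Resolving the converters once in the constructor keeps the per-record path to a field read plus one converter call, and it surfaces unsupported types (including nested POJOs) before any record is processed.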

Tests:

  • Added unit tests for all Fluss data types including boolean, numeric, decimal, date/time, and binary types, with appropriate value verification.
  • Added a unit test verifying that passing a POJO with a nested field throws an error.

API and Format
No API or storage format changes.

Documentation
No documentation changes are required.

@MehulBatra
Contributor Author

I have also covered the complex types in the conversion.
@polyzos @wuchong Please help me with the review.

@MehulBatra MehulBatra changed the title Pojo To RowData Utility [Connector] Pojo To RowData Utility Apr 8, 2025
@wuchong wuchong linked an issue Apr 9, 2025 that may be closed by this pull request
2 tasks
Member

@wuchong wuchong left a comment

Thanks for the contribution, @MehulBatra. I left some comments about the implementation; please let me know if you have any questions.

@MehulBatra
Contributor Author

Thanks for the feedback. I will address these comments over the coming weekend and get back to you in case I am stuck!

@MehulBatra
Contributor Author

@wuchong Jark, I have made changes as per the comments. Please have a look and let me know if I underdid or overdid anything; I will adjust accordingly.

@MehulBatra MehulBatra requested a review from wuchong April 21, 2025 13:39
@wuchong wuchong force-pushed the Pojo-rowData-converter branch from 2c520b6 to c7aaf2b Compare April 27, 2025 11:44
Member

@wuchong wuchong left a comment

@MehulBatra

The pull request looks good to me overall, but I have one key concern: it currently implements a conversion from POJO to Flink's RowData. In my opinion, what we actually need is a utility for converting POJOs to Fluss' InternalRow. This utility would be essential for writing a DataStream<POJO> to the Fluss Sink, as all data types must eventually be converted into Fluss' InternalRow, which is the data type expected by AppendWriter and UpsertWriter.

On the other hand, the conversion from POJO to Flink's RowData is already implemented by Flink itself. This is used in scenarios where a DataStream<T> is converted into a Flink Table (e.g., via org.apache.flink.table.api.bridge.java.StreamTableEnvironment#fromDataStream(org.apache.flink.streaming.api.datastream.DataStream<T>)).

To address this, I have updated the pull request to focus on converting POJOs to Fluss' InternalRow. Additionally, I have added a pre-check for the field types of the POJO to ensure compatibility and prevent potential issues.
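As an illustration of what a field-type pre-check of this kind can look like, here is a minimal sketch that compares a POJO's fields against a declared table schema before any record is converted. It is an assumption-laden stand-in, not the merged code: `PojoSchemaPreCheckSketch`, its `ColumnType` enum, and the `Click` POJO are hypothetical and use only the JDK, whereas the real utility works with Fluss' own row and type classes.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a POJO-versus-schema pre-check; not the merged implementation. */
final class PojoSchemaPreCheckSketch {

    /** Hypothetical stand-in for a table column type, paired with the Java class it accepts. */
    enum ColumnType {
        BOOLEAN(Boolean.class),
        INT(Integer.class),
        BIGINT(Long.class),
        DOUBLE(Double.class),
        STRING(String.class);

        final Class<?> javaClass;

        ColumnType(Class<?> javaClass) {
            this.javaClass = javaClass;
        }
    }

    /** Returns one message per incompatible or missing column; empty means compatible. */
    static List<String> check(Class<?> pojoClass, Map<String, ColumnType> schema) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, ColumnType> column : schema.entrySet()) {
            String name = column.getKey();
            ColumnType expected = column.getValue();
            try {
                Field field = pojoClass.getDeclaredField(name);
                if (field.getType() != expected.javaClass) {
                    problems.add("Column '" + name + "' is " + expected + " but field type is "
                            + field.getType().getSimpleName());
                }
            } catch (NoSuchFieldException e) {
                problems.add("Column '" + name + "' has no matching POJO field");
            }
        }
        return problems;
    }

    /** Fails fast, before any record is written, if the POJO does not fit the schema. */
    static void requireCompatible(Class<?> pojoClass, Map<String, ColumnType> schema) {
        List<String> problems = check(pojoClass, schema);
        if (!problems.isEmpty()) {
            throw new IllegalArgumentException(String.join("; ", problems));
        }
    }

    /** Example POJO used only for this sketch. */
    static final class Click {
        Long userId;
        String url;
        Integer count;
    }

    public static void main(String[] args) {
        Map<String, ColumnType> schema = new LinkedHashMap<>();
        schema.put("userId", ColumnType.BIGINT);
        schema.put("url", ColumnType.STRING);
        schema.put("count", ColumnType.BIGINT); // deliberately mismatched with Integer
        System.out.println(check(Click.class, schema));
    }
}
```

Running the check once at construction time, rather than per record, means a mismatched POJO is reported when the job is set up instead of partway through a write.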

@wuchong
Member

wuchong commented Apr 27, 2025

Merging...

@wuchong wuchong merged commit 8033ca7 into apache:main Apr 27, 2025
4 of 7 checks passed
polyzos pushed a commit to polyzos/fluss that referenced this pull request Apr 27, 2025
@MehulBatra
Contributor Author

Thanks @wuchong for identifying and fixing this critical issue! I initially misunderstood the relationship between Flink's RowData and Fluss' InternalRow.
I'll make sure to pay closer attention to these format distinctions in future PRs. Really appreciate your help and guidance!

@MehulBatra MehulBatra changed the title [Connector] Pojo To RowData Utility [Connector] Pojo To InternalRow(Fluss) Utility Apr 27, 2025
ZmmBigdata pushed a commit to ZmmBigdata/fluss that referenced this pull request Jun 20, 2025
polyzos pushed a commit to polyzos/fluss that referenced this pull request Aug 30, 2025
polyzos pushed a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025
Development

Successfully merging this pull request may close these issues.

[Connector] Inbuilt PojoToRowConverter utility

2 participants