Contributor

@polyzos polyzos commented Apr 2, 2025

This PR introduces the FlussSource for the DataStream API.

@polyzos polyzos marked this pull request as ready for review April 2, 2025 08:46
Contributor Author

polyzos commented Apr 2, 2025

@wuchong RowConverters are used for testing now and can be improved and exposed as helper functions to users in another PR.
But let's discuss and keep track of the required improvements and overall thoughts.

@polyzos polyzos requested a review from wuchong April 2, 2025 09:10
Member

@wuchong wuchong left a comment

Thanks for the contribution, @polyzos. I left some comments. Please also rebase your branch onto the latest main branch, but do NOT squash the commits.

return this;
}

public FlussSourceBuilder<IN> setProjectedFields(int[] projectedFields) {
Member

Field indexes are not friendly for end users. Let's support projection by field names instead: setProjectedFields(String... fieldNames).

Contributor Author

@wuchong Changing to fieldNames right now might break compatibility with the FlinkTableSource, since it uses the same FlinkSource constructor but passes indexes. Maybe leave it as is and open a separate ticket/PR to address this properly, so it's easier to track later?

Contributor Author

@wuchong I created this for tracking.

Member

I mean we expose only field-name projection in the FlussSourceBuilder, but still use field-index projection in the underlying FlinkSource. We can map fieldNames to field indexes in the build() method, since the schema is already available there.

We can implement it in a follow-up PR.
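
The name-to-index mapping described above could look roughly like the following. This is a minimal sketch, assuming the schema's field names are available as an ordered list at build time. `ProjectionMapping` and `toFieldIndexes` are hypothetical names for illustration and are not part of the Fluss API.

```java
import java.util.List;

// Hypothetical helper, not part of Fluss: resolves projected field names
// to positional indexes against an ordered list of schema field names.
final class ProjectionMapping {

    private ProjectionMapping() {}

    static int[] toFieldIndexes(List<String> schemaFieldNames, String... projectedFieldNames) {
        int[] indexes = new int[projectedFieldNames.length];
        for (int i = 0; i < projectedFieldNames.length; i++) {
            int index = schemaFieldNames.indexOf(projectedFieldNames[i]);
            if (index < 0) {
                // Fail early with a readable message instead of passing a bad index down.
                throw new IllegalArgumentException(
                        "Field '" + projectedFieldNames[i]
                                + "' not found in schema fields " + schemaFieldNames);
            }
            indexes[i] = index;
        }
        return indexes;
    }
}
```

With schema fields `id, name, amount`, projecting `amount, id` would resolve to the indexes `2, 0`, which could then be handed to the existing index-based projection unchanged.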

@polyzos polyzos force-pushed the datastream-fluss-source branch from 09ff0e0 to a947de9 Compare April 27, 2025 15:18
@polyzos polyzos force-pushed the datastream-fluss-source branch from a947de9 to d62abee Compare April 28, 2025 05:58
Contributor Author

polyzos commented Apr 28, 2025

@wuchong Thank you for your time and detailed feedback. PTAL, as all comments should have been addressed. Let me know if I missed anything.

Member

@wuchong wuchong left a comment

I left only some minor comments and pushed a commit to fix them.

Will merge once CI passes.

offsetsInitializer,
scanPartitionDiscoveryIntervalMs,
- streaming);
+ true);
Member

Why hard-code this to true?

}

@VisibleForTesting
public OffsetsInitializer getOffsetsInitializer() {
Member

This can be package-private.

}

@VisibleForTesting
public long getScanPartitionDiscoveryIntervalMs() {
Member

Never used; can be removed.

return this;
}

public FlussSourceBuilder<OUT> setIsStreaming(boolean isStreaming) {
Member

Let's remove it, as we don't support it yet.

public class FlussSource<OUT> extends FlinkSource<OUT> {
private static final long serialVersionUID = 1L;

public FlussSource(
Member

Change this to package-private visibility so that users don't call this constructor directly, as the constructor is evolving frequently.
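
As an illustration of that suggestion, here is a minimal sketch of a package-private constructor reached only through a builder. `ExampleSource` and its fields are hypothetical, simplified stand-ins and do not reflect the real FlussSource signature. In a real library the class and builder would be public while the constructor stays package-private; everything is non-public here only so the snippet fits in one file.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a source class; not the real FlussSource.
final class ExampleSource {
    private final String table;
    private final int[] projectedFields;

    // No access modifier: package-private, so only classes in the same package
    // (such as the builder below) can call this constructor. Its parameter list
    // can change without breaking user code that goes through the builder.
    ExampleSource(String table, int[] projectedFields) {
        this.table = Objects.requireNonNull(table, "table must be set");
        this.projectedFields = projectedFields == null ? null : projectedFields.clone();
    }

    static Builder builder() {
        return new Builder();
    }

    String getTable() {
        return table;
    }

    int[] getProjectedFields() {
        return projectedFields == null ? null : projectedFields.clone();
    }

    static final class Builder {
        private String table;
        private int[] projectedFields;

        Builder setTable(String table) {
            this.table = table;
            return this;
        }

        Builder setProjectedFields(int[] projectedFields) {
            this.projectedFields = projectedFields;
            return this;
        }

        ExampleSource build() {
            return new ExampleSource(table, projectedFields);
        }
    }
}
```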

@wuchong wuchong merged commit 2412114 into apache:main Apr 29, 2025
3 checks passed
ZmmBigdata pushed a commit to ZmmBigdata/fluss that referenced this pull request Jun 20, 2025
polyzos added a commit to polyzos/fluss that referenced this pull request Aug 30, 2025
polyzos added a commit to Alibaba-HZY/fluss that referenced this pull request Aug 31, 2025
2 participants