[spark][core] Support input_file_name UDF #3094

Merged
merged 3 commits into from
Mar 26, 2024

Conversation

YannByron
Contributor

Purpose

Linked issue: close #xxx

Tests

API and Format

Documentation

@JingsongLi
Contributor

JingsongLi commented Mar 26, 2024

Thanks @YannByron for the contribution.

The hooks intrude into a lot of core code.
Consider a simpler solution:

Step 1: Modify paimon-core

Just rename RecordWithPositionIterator to FileRecordIterator.

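interface FileRecordIterator<T> extends RecordReader.RecordIterator<T> {

    // The file this iterator reads from; fixed for the iterator's lifetime.
    Path filePath();

    // Position of the record most recently returned by next().
    long returnedPosition();

    @Override
    T next() throws IOException;
}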

The filePath is a final member of a FileRecordIterator. Each FileRecordIterator should serve exactly one file.
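To make the "one iterator, one file" contract concrete, here is a minimal sketch of an implementation. It is not the paimon-core code: `RecordIterator` is a simplified stand-in for `RecordReader.RecordIterator`, `String` stands in for `Path`, and `ListFileRecordIterator` is a hypothetical class invented for illustration. It also assumes that `returnedPosition()` is the zero-based index of the last record returned, and `-1` before any record has been returned.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for RecordReader.RecordIterator (assumption for this sketch).
interface RecordIterator<T> {
    T next() throws IOException; // returns null when the batch is exhausted
    void releaseBatch();
}

// The interface proposed above, with String standing in for Path.
interface FileRecordIterator<T> extends RecordIterator<T> {
    String filePath();
    long returnedPosition();
}

// Hypothetical implementation: each instance is bound to exactly one file.
final class ListFileRecordIterator<T> implements FileRecordIterator<T> {
    private final String filePath;       // final: set once, never reassigned
    private final Iterator<T> records;
    private long returnedPosition = -1L; // assumed: -1 until next() returns a record

    ListFileRecordIterator(String filePath, List<T> records) {
        this.filePath = filePath;
        this.records = records.iterator();
    }

    @Override
    public String filePath() {
        return filePath;
    }

    @Override
    public long returnedPosition() {
        return returnedPosition;
    }

    @Override
    public T next() {
        if (!records.hasNext()) {
            return null; // exhausted; the position keeps its last value
        }
        returnedPosition++;
        return records.next();
    }

    @Override
    public void releaseBatch() {
        // Nothing to release for an in-memory list.
    }
}

final class FileRecordIteratorSketch {
    public static void main(String[] args) {
        ListFileRecordIterator<String> it =
                new ListFileRecordIterator<>("data-0.orc", Arrays.asList("a", "b"));
        String row;
        while ((row = it.next()) != null) {
            System.out.println(it.filePath() + " @" + it.returnedPosition() + ": " + row);
        }
    }
}
```

Because the path is fixed at construction, a caller can read `filePath()` once per batch instead of checking it per record.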

Step 2: Modify paimon-spark

We can introduce a FileHolderRecordReader in paimon-spark that wraps the reader returned by paimon-core. In this reader, if the returned iterator is a FileRecordIterator, we can invoke InputFileBlockHolder.set and unset.
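The wrapping idea in Step 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged code: the three interfaces are simplified stand-ins for the paimon-core types, and `FileHolder` is a hypothetical thread-local stand-in for Spark's `InputFileBlockHolder` (whose real `set` may take more arguments than a path). Clearing the holder for non-file batches and at end of input is a choice made for this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Simplified stand-ins for the paimon-core types (assumptions for this sketch).
interface RecordIterator<T> {
    T next() throws IOException; // null when the batch is exhausted
    void releaseBatch();
}

interface FileRecordIterator<T> extends RecordIterator<T> {
    String filePath();
    long returnedPosition();
}

interface RecordReader<T> {
    RecordIterator<T> readBatch() throws IOException; // null at end of input
    void close() throws IOException;
}

// Hypothetical stand-in for Spark's InputFileBlockHolder: tracks the current file per thread.
final class FileHolder {
    private static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "");
    static void set(String path) { CURRENT.set(path); }
    static void unset() { CURRENT.remove(); }
    static String current() { return CURRENT.get(); }
}

// Wraps the core reader and publishes the file of each file-backed batch.
final class FileHolderRecordReader<T> implements RecordReader<T> {
    private final RecordReader<T> delegate;

    FileHolderRecordReader(RecordReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public RecordIterator<T> readBatch() throws IOException {
        RecordIterator<T> batch = delegate.readBatch();
        if (batch instanceof FileRecordIterator) {
            FileHolder.set(((FileRecordIterator<?>) batch).filePath());
        } else {
            FileHolder.unset(); // end of input (null) or a batch without a single file
        }
        return batch;
    }

    @Override
    public void close() throws IOException {
        FileHolder.unset();
        delegate.close();
    }
}

final class FileHolderDemo {
    // Builds a one-file batch over fixed rows.
    static FileRecordIterator<String> fileBatch(String path, String... rows) {
        Queue<String> queue = new ArrayDeque<>(Arrays.asList(rows));
        return new FileRecordIterator<String>() {
            private long pos = -1L;
            @Override public String filePath() { return path; }
            @Override public long returnedPosition() { return pos; }
            @Override public String next() {
                String row = queue.poll();
                if (row != null) { pos++; }
                return row;
            }
            @Override public void releaseBatch() { }
        };
    }

    // Records what FileHolder reports after each readBatch call, including the final null batch.
    static List<String> run() {
        Queue<RecordIterator<String>> batches = new ArrayDeque<>(Arrays.asList(
                fileBatch("part-0.parquet", "a"), fileBatch("part-1.parquet", "b")));
        RecordReader<String> core = new RecordReader<String>() {
            @Override public RecordIterator<String> readBatch() { return batches.poll(); }
            @Override public void close() { }
        };
        RecordReader<String> reader = new FileHolderRecordReader<>(core);
        List<String> seen = new ArrayList<>();
        try {
            while (reader.readBatch() != null) {
                seen.add(FileHolder.current());
            }
            seen.add(FileHolder.current()); // holder was cleared at end of input
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

With this shape, paimon-core only needs to expose `filePath()` on its iterators; all Spark-specific bookkeeping stays in the wrapper on the paimon-spark side.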

@JingsongLi
Contributor

LGTM!

@JingsongLi JingsongLi merged commit 13de145 into apache:master Mar 26, 2024
9 checks passed
zhuangchong pushed a commit to zhuangchong/flink-table-store that referenced this pull request Mar 26, 2024
zhu3pang pushed a commit to zhu3pang/incubator-paimon that referenced this pull request Mar 29, 2024
@YannByron
Contributor Author

#3057
