
[FLINK] Support ORC filesystem sink format #12327

Open
zhanglistar wants to merge 32 commits into apache:main from zhanglistar:codex/flink-orc-sink-format

@zhanglistar zhanglistar commented Jun 22, 2026

What changes are proposed in this pull request?

Adds support for the ORC filesystem sink format to Gluten Flink; resolves #12203.
Depends on bigo-sg/velox4j#43 and bigo-sg/velox#52.

How was this patch tested?

Unit tests (UT).

Was this patch authored or co-authored using generative AI tooling?

Copilot AI review requested due to automatic review settings June 22, 2026 09:07

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Copilot AI review requested due to automatic review settings June 23, 2026 02:15

@zhanglistar zhanglistar force-pushed the codex/flink-orc-sink-format branch from b9fe77f to 4e42142 Compare June 23, 2026 03:39
Copilot AI review requested due to automatic review settings June 23, 2026 03:55

Copilot AI review requested due to automatic review settings June 23, 2026 04:07

Comment thread gluten-flink/ut/src/test/resources/nexmark/q10.sql Outdated
Copilot AI review requested due to automatic review settings June 23, 2026 04:29

Copilot AI review requested due to automatic review settings June 23, 2026 08:30

Copilot AI review requested due to automatic review settings June 24, 2026 05:16

@zhanglistar zhanglistar force-pushed the codex/flink-orc-sink-format branch from 647a12f to 43d0e3f Compare June 29, 2026 09:45
Copilot AI review requested due to automatic review settings June 29, 2026 11:13
Copilot AI review requested due to automatic review settings July 3, 2026 03:59

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Comment thread .github/workflows/flink.yml Outdated
Comment thread gluten-flink/pom.xml Outdated
Copilot AI review requested due to automatic review settings July 3, 2026 04:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment thread .github/workflows/flink.yml Outdated
Copilot AI review requested due to automatic review settings July 3, 2026 04:07

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comment thread gluten-flink/pom.xml Outdated
Copilot AI review requested due to automatic review settings July 3, 2026 04:11

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

gluten-flink/ut/src/test/java/org/apache/gluten/table/runtime/stream/custom/NexmarkTest.java:259

  • verifyQ10OrcOutput() only checks Files.exists(outputDir), which would also pass if the path exists but is not a directory (e.g., a leftover file). Using Files.isDirectory makes the assertion stricter and avoids misleading failures later in Files.walk.
    }

    String insertQuery = sqlStatements[sqlStatements.length - 2].trim();
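
The difference between the two checks in the suppressed comment above can be illustrated with a minimal sketch. The class and method names here (`OutputDirCheck`, `isUsableOutputDir`) are hypothetical and not part of the PR; only `Files.exists` and `Files.isDirectory` come from the comment itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputDirCheck {
  /** Hypothetical helper: true only when the path exists and is a directory. */
  static boolean isUsableOutputDir(Path outputDir) {
    return Files.isDirectory(outputDir);
  }

  public static void main(String[] args) throws IOException {
    // A regular file stands in for a "leftover file" at the output path.
    Path leftover = Files.createTempFile("bid_orc", ".tmp");
    Path realDir = Files.createTempDirectory("bid_orc");
    try {
      // Files.exists accepts the plain file, so an exists-only assertion would pass.
      System.out.println("exists(file)      = " + Files.exists(leftover));
      // Files.isDirectory rejects the plain file and accepts the real directory.
      System.out.println("isDirectory(file) = " + isUsableOutputDir(leftover));
      System.out.println("isDirectory(dir)  = " + isUsableOutputDir(realDir));
    } finally {
      Files.deleteIfExists(leftover);
      Files.deleteIfExists(realDir);
    }
  }
}
```

With the stricter check, a stray file at the output path fails the assertion immediately instead of surfacing later as a confusing error from `Files.walk`.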

Copilot AI review requested due to automatic review settings July 3, 2026 04:16
endInput() now calls drainOutput(this::finishTask) instead of
finishTask() directly, so the GlutenMailboxOperatorHelper's
reentrancy protection applies during end-of-input draining. This
prevents nested drain races if a Velox callback schedules a drain
concurrently. Also fixes IOException handling in ORC output dir
cleanup.
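
The reentrancy protection this commit message relies on might look roughly like the sketch below. It is not the actual GlutenMailboxOperatorHelper code, which is not shown in this conversation: `GuardedOperator`, its fields, and the "rerun once after the outer drain" policy are all assumptions made for illustration, and the sketch is single-threaded only.

```java
/** Hypothetical operator showing a reentrancy guard around end-of-input draining. */
public class GuardedOperator {
  private boolean draining = false;
  private boolean rerunRequested = false;
  private int depth = 0;

  int finishCalls = 0; // how many times finishTask() ran
  int maxDepth = 0;    // deepest nesting of finishTask() observed

  /** Runs the action unless a drain is already active; nested requests are deferred. */
  void drainOutput(Runnable action) {
    if (draining) {
      // Assumed policy: remember the request and let the outer loop rerun the action.
      rerunRequested = true;
      return;
    }
    draining = true;
    try {
      do {
        rerunRequested = false;
        action.run();
      } while (rerunRequested);
    } finally {
      draining = false;
    }
  }

  /** Stand-in for the real finishTask(); the first call simulates a callback asking for a drain. */
  void finishTask() {
    depth++;
    maxDepth = Math.max(maxDepth, depth);
    finishCalls++;
    if (finishCalls == 1) {
      drainOutput(this::finishTask); // simulated nested drain request
    }
    depth--;
  }

  /** Routes end-of-input through the guard instead of calling finishTask() directly. */
  void endInput() {
    drainOutput(this::finishTask);
  }
}
```

In this sketch, calling `endInput()` runs `finishTask()` twice in sequence but never nested, which is the property the guard is meant to provide. Calling `finishTask()` directly would bypass the guard for the outermost call.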

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

gluten-flink/ut/src/test/java/org/apache/gluten/table/runtime/stream/custom/NexmarkTest.java:263

  • This AssertJ call doesn't assert anything because it doesn't chain an assertion (e.g. .isTrue()). As written, the test will pass even if checkJobRunningStatus() returns false, reducing coverage for the Kafka-source path.
    String insertQuery = sqlStatements[sqlStatements.length - 2].trim();
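
The pitfall described in the suppressed comment above can be shown with a small sketch. To stay self-contained it uses a hypothetical miniature of a fluent assertion (`BoolAssert`, `assertThat`) rather than AssertJ itself, and `checkJobRunningStatus()` here is a stand-in that always returns false.

```java
public class DanglingAssertDemo {
  /** Hypothetical miniature of a fluent boolean assertion; not the AssertJ class. */
  static final class BoolAssert {
    private final boolean actual;

    BoolAssert(boolean actual) {
      this.actual = actual;
    }

    /** The check happens only here, when a terminal assertion is chained. */
    void isTrue() {
      if (!actual) {
        throw new AssertionError("expected true but was false");
      }
    }
  }

  static BoolAssert assertThat(boolean actual) {
    return new BoolAssert(actual);
  }

  /** Stand-in for a status check that reports failure. */
  static boolean checkJobRunningStatus() {
    return false;
  }

  /** Without a chained assertion, the call builds an object and verifies nothing. */
  static boolean danglingCallPasses() {
    assertThat(checkJobRunningStatus());
    return true; // reached even though the status is false
  }

  /** With .isTrue() chained, the false status is detected. */
  static boolean chainedCallFails() {
    try {
      assertThat(checkJobRunningStatus()).isTrue();
      return false;
    } catch (AssertionError expected) {
      return true;
    }
  }
}
```

Both helper methods return true: the dangling call silently passes, while the chained call raises an `AssertionError`.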

Copilot AI review requested due to automatic review settings July 3, 2026 04:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Align finishTask() with GlutenSourceFunction, which treats unknown
states as normal termination (return) rather than failing. Log a
warning and return, avoiding pipeline failures if velox4j adds
new non-error states.
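
A minimal sketch of the "warn and return on unknown states" behavior follows. The `State` enum and its values are hypothetical (the real velox4j states may differ), and throwing on `FAILED` is an assumption about how error states are handled, not something the commit message states.

```java
import java.util.logging.Logger;

public class FinishTaskSketch {
  private static final Logger LOG = Logger.getLogger(FinishTaskSketch.class.getName());

  /** Hypothetical task states; SOME_FUTURE_STATE models a state added later upstream. */
  enum State { FINISHED, FAILED, SOME_FUTURE_STATE }

  /** Returns normally for every state except the known error state. */
  static void finishTask(State state) {
    switch (state) {
      case FINISHED:
        return;
      case FAILED:
        // Assumption: known error states still fail the task.
        throw new IllegalStateException("Task failed");
      default:
        // Unknown states are logged and treated as normal termination.
        LOG.warning("Unexpected task state " + state + "; treating as normal termination");
        return;
    }
  }
}
```

The trade-off is that a genuinely new error state would also be swallowed with only a warning, so the log line is the only signal that the state list needs updating.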

The CI environment may not roll .inprogress files to part-* within the
test window because of the 1-minute rolling policy. Change the filter from
startsWith(part-) to excluding marker files (startsWith(_)),
so both final and in-progress ORC data files are verified.
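
The filter change described in this commit message could be sketched as below. `OrcDataFileFilter` and `dataFileNames` are hypothetical names, and the file names in the usage note are invented for illustration; the actual names Flink produces may differ.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OrcDataFileFilter {
  /** Lists regular files under dir, skipping marker files whose names start with "_". */
  static List<String> dataFileNames(Path dir) throws IOException {
    try (Stream<Path> files = Files.walk(dir)) {
      return files
          .filter(Files::isRegularFile)
          .map(p -> p.getFileName().toString())
          // Exclude markers rather than requiring a "part-" prefix,
          // so files that have not been rolled yet are still counted.
          .filter(name -> !name.startsWith("_"))
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

For a directory holding `part-0-0`, `part-0-1.inprogress.x`, and `_SUCCESS`, this returns the first two names and drops the marker. Any in-progress file whose name does not begin with `part-` would have been missed by the old prefix filter but is kept by this one.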
Copilot AI review requested due to automatic review settings July 3, 2026 06:11

Copilot AI left a comment

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Comment on lines +226 to +247
    // Clean the ORC output directory before running q10_orc to ensure deterministic verification.
    if ("q10_orc.sql".equals(queryFileName)) {
      Path orcOutputDir = Paths.get("/tmp/data/output/bid_orc");
      if (Files.exists(orcOutputDir)) {
        try {
          try (java.util.stream.Stream<Path> files = Files.walk(orcOutputDir)) {
            files
                .sorted(java.util.Comparator.reverseOrder())
                .forEach(
                    p -> {
                      try {
                        Files.deleteIfExists(p);
                      } catch (IOException e) {
                        throw new RuntimeException("Failed to delete " + p, e);
                      }
                    });
          }
        } catch (IOException e) {
          throw new RuntimeException("Failed to clean ORC output directory", e);
        }
      }
    }