Skip to content

[FLINK] Support Apache Pulsar as DataSource#12297

Merged
zhanglistar merged 17 commits into
apache:mainfrom
zhanglistar:fix-flink-nexmark-window-aggregate
Jun 29, 2026
Merged

[FLINK] Support Apache Pulsar as DataSource#12297
zhanglistar merged 17 commits into
apache:mainfrom
zhanglistar:fix-flink-nexmark-window-aggregate

Conversation

@zhanglistar

@zhanglistar zhanglistar commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Add Gluten Flink planner/runtime support for Pulsar sources backed by Velox. Solves #12310 .
Depends on bigo-sg/velox#42 and bigo-sg/velox4j#35.

This patch:

  • Adds a PulsarSourceSinkFactory that maps Flink Pulsar table/source options to Velox Pulsar connector parameters.
  • Registers connector-pulsar in Gluten Flink connector config and service discovery.
  • Passes the planner classloader into runtime source/sink factory discovery so planner-side factories can be loaded reliably.
  • Makes blocked Velox source tasks wait instead of spinning.
  • Adds unit tests for Pulsar factory discovery, connector config registration, option mapping, and wrapped Pulsar source detection.

Test Plan

  • mvn -pl ut -am -Dtest=PulsarSourceSinkFactoryTest -DfailIfNoTests=false test
  • Manually verified a Flink SQL Pulsar source job with parallelism 2 against local Pulsar standalone:
    • job remained RUNNING
    • two Pulsar CPP consumers were connected under a Shared subscription
    • both Flink source subtasks consumed records
    • Pulsar backlog reached 0 for the test subscription

AI Tooling Disclosure

Cowork with codex.

Copilot AI review requested due to automatic review settings June 15, 2026 04:02
@github-actions github-actions Bot added the FLINK label Jun 15, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@zhanglistar zhanglistar changed the title Fix Flink Nexmark window aggregate tests [FLINK] Fix Flink Nexmark window aggregate tests Jun 15, 2026
@zhanglistar zhanglistar changed the title [FLINK] Fix Flink Nexmark window aggregate tests [FLINK] Support Apache Pulsar connector Jun 17, 2026
@zhanglistar zhanglistar changed the title [FLINK] Support Apache Pulsar connector [FLINK] Support Apache Pulsar as DataSource Jun 17, 2026
@zhanglistar zhanglistar force-pushed the fix-flink-nexmark-window-aggregate branch from 7814777 to 13a947e Compare June 17, 2026 03:32
Copilot AI review requested due to automatic review settings June 17, 2026 03:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@github-actions github-actions Bot added the INFRA label Jun 17, 2026
Copilot AI review requested due to automatic review settings June 17, 2026 09:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@zhanglistar

Copy link
Copy Markdown
Contributor Author

@KevinyhZou @liujiayi771 @lgbo-ustc Pls review.

@zhanglistar zhanglistar force-pushed the fix-flink-nexmark-window-aggregate branch 2 times, most recently from 1c1b8b5 to 284a079 Compare June 23, 2026 03:58
Copilot AI review requested due to automatic review settings June 23, 2026 03:58

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

processAvailableElement(sourceContext);
break;
case BLOCKED:
task.waitFor();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explain the reason for adding this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was added in the Pulsar source support commit.

Previously the BLOCKED case only logged "Get empty row" and continued the loop, which was fine for Nexmark since it generates data in batches and rarely returns BLOCKED. However, for real streaming sources like Pulsar/Kafka, when there's no upstream data available, advance() returns BLOCKED. Without waitFor(), the run loop would spin indefinitely calling advance() in a tight loop and pin the CPU.

waitFor() blocks the thread until the native side has data available or the task is closed. This is the standard UpIterator usage pattern in velox4j.

Regarding cancellation: Flink calls cancel() (sets isRunning = false) followed by close() (calls task.close()). The native task close will complete the blocking future inside waitFor(), which unblocks the thread. The run loop then checks isRunning and exits.

Copilot AI review requested due to automatic review settings June 25, 2026 02:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@zhanglistar zhanglistar force-pushed the fix-flink-nexmark-window-aggregate branch 2 times, most recently from 7303242 to 8683a14 Compare June 25, 2026 08:52
Copilot AI review requested due to automatic review settings June 25, 2026 11:51

@KevinyhZou KevinyhZou left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments

return transform;
}

private int getWindowPropertyIndex(Class<? extends WindowProperty> propertyClass) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems useless code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

LOG.warn("Failed to load Velox source/sink factory", e);
}
for (String factoryClassName : FACTORY_CLASS_NAMES) {
Optional<VeloxSourceSinkFactory> factory = loadFactory(factoryClassName, parameters);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems by implemeting this, we can remove the spi org.apache.gluten.velox/.VeloxSourceSinkFactory

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, The FACTORY_CLASS_NAMES list and the SPI file register exactly the same 7 factories. getFactory() first attempts ServiceLoader (SPI), then falls back to manual class loading from FACTORY_CLASS_NAMES. This duplication is intentional — runtime modules cannot depend on planner module classes at compile time (there is a risk of circular dependencies), and the manual loadFactory explicitly tries three different class loaders (from the parameter, the thread context, and the interface's own loader), which is more robust than ServiceLoader's single-loader approach in Flink's class loader hierarchy.

If this is to be cleaned up in the future, the safer direction is to keep the FACTORY_CLASS_NAMES manual loading and remove the SPI file, because the manual path handles class loaders more robustly. However, this refactoring is outside the scope of this PR and should be verified in a real Flink deployment. For now, keep it as-is.

Copilot AI review requested due to automatic review settings June 29, 2026 02:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@KevinyhZou KevinyhZou left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhanglistar zhanglistar force-pushed the fix-flink-nexmark-window-aggregate branch from 4b2ba97 to 20ec499 Compare June 29, 2026 03:00
Copilot AI review requested due to automatic review settings June 29, 2026 04:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

zhanglistar and others added 17 commits June 29, 2026 14:13
- Simplify isPulsarSource to use class simple name check (like Kafka)
- Replace hand-rolled reflection with ReflectUtils.getObjectField
- Adapt NexmarkSourceFactory to new NexmarkGeneratorConfig API
- Adapt PrintSinkFactory to new PrintTableHandle constructor (isStdErr)
- Update CI velox4j pin to gluten-0530 latest commit
…tArgs

The NexmarkTest was failing with InaccessibleObjectException because
surefire forks a new JVM that reads argLine (not JAVA_TOOL_OPTIONS).
Adding --add-opens=java.base/jdk.internal.reflect=ALL-UNNAMED to the
POM's extraJavaTestArgs ensures the forked JVM always has the required
opens regardless of environment variables.
@zhanglistar zhanglistar force-pushed the fix-flink-nexmark-window-aggregate branch from fea7f8c to 9a326aa Compare June 29, 2026 06:13
@zhanglistar zhanglistar merged commit 22cefd5 into apache:main Jun 29, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants