
build: add examples module on a multi-module Maven build #32

Merged: andygrove merged 2 commits into apache:main from andygrove:feat/examples-module on May 13, 2026

@andygrove
Member

Which issue does this PR close?

  • Closes #.

Rationale for this change

The repository lacks runnable end-to-end examples. Code snippets in the docs drift out of sync with the API easily, because no build step fails when a public method is renamed or removed.

Adding an examples/ Maven module that depends on the library lets the reactor compile every example on each build, so they cannot fall behind the API. Doing this requires the repo to be a multi-module Maven project; while we're there, the parent POM gets shared dependencyManagement and plugin versions so child modules stay terse.

What changes are included in this PR?

Multi-module restructure:

  • Root pom.xml becomes the parent (datafusion-java-parent, packaging=pom) with dependencyManagement for arrow, protobuf, junit, and the library itself.
  • core/ is a new directory holding the existing library (datafusion-java). src/ moves to core/src/.
  • examples/ is a new module (datafusion-java-examples) depending on the library via ${project.version}. It wires exec-maven-plugin so each example launches with the right java.library.path and --add-opens flags.
  • native/, proto/, Makefile, and mvnw stay at the repo root unchanged.
  • Surefire's java.library.path now uses ${maven.multiModuleProjectDirectory} so it resolves under the reactor regardless of which module Maven is invoked from.
  • apache-rat-plugin runs only at the root (<inherited>false</inherited>); the rat exclude list is unchanged.
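
When a native library fails to load after a layout change like this one, it can help to see which java.library.path entries the JVM actually received. The following is a minimal diagnostic sketch using only the JDK; the class name LibraryPathCheck is hypothetical and not part of this PR, and it makes no assumption about where the native library lives in the repo.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the PR: reports which java.library.path
// entries exist as directories on disk.
class LibraryPathCheck {

    /** Splits a java.library.path value into its non-empty entries. */
    static List<Path> entries(String libraryPath) {
        List<Path> result = new ArrayList<>();
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String part : libraryPath.split(Pattern.quote(File.pathSeparator))) {
            if (!part.isEmpty()) {
                result.add(Paths.get(part));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String value = System.getProperty("java.library.path", "");
        for (Path entry : entries(value)) {
            String status = Files.isDirectory(entry) ? "ok" : "missing";
            System.out.println(status + "  " + entry.toAbsolutePath());
        }
    }
}
```

Running this with the same -Djava.library.path value that Surefire or exec-maven-plugin passes would show whether the path resolved to a real directory from the module Maven was invoked in.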

Three runnable examples under examples/src/main/java/org/apache/datafusion/examples/:

  • SqlQueryExample: registerCsv + a SQL GROUP BY aggregation.
  • DataFrameExample: readCsv -> filter / select / withColumnRenamed / distinct -> writeParquet(singleFileOutput) -> readParquet round-trip.
  • ProtoPlanExample: build a LogicalPlanNode directly via the generated protobuf classes and execute it through SessionContext.fromProto.

Each example creates its own throwaway data in a temp dir and cleans up, so no external fixtures (TPC-H, etc.) are required.
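
The throwaway-data pattern described above can be sketched with the JDK alone. This is an illustration under stated assumptions, not the code from the PR: the class name TempDataSketch, the file name orders.csv and the CSV contents are all hypothetical, and the step where a real example would hand the file to the library is replaced by a plain row count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, not the PR's code: create fixture data in a temp
// directory, use it, and always clean up afterwards.
class TempDataSketch {

    /** Deletes dir and everything under it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts nested paths ahead of their parent directories.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Writes a small CSV into a fresh temp dir and returns its data row count. */
    static long run() throws IOException {
        Path dir = Files.createTempDirectory("example-data-");
        try {
            Path csv = dir.resolve("orders.csv"); // hypothetical file name
            Files.write(csv, List.of("id,amount", "1,10", "2,20"));
            // A real example would pass csv to the library here; this sketch
            // only counts the rows after the header.
            try (Stream<String> lines = Files.lines(csv)) {
                return lines.skip(1).count();
            }
        } finally {
            deleteRecursively(dir); // runs even if the body above throws
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("rows: " + run());
    }
}
```

Putting the cleanup in a finally block keeps the temp directory from leaking when the example fails partway through.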

Docs: docs/source/contributor-guide/development.md is updated with the new repo layout and a "Running an example" section that documents the ./mvnw install -DskipTests + exec:exec flow.

Are these changes tested?

  • The full JVM test suite (./mvnw test) still passes against the relocated core/src/test/ sources — 61 tests run, 0 failures (12 skipped, same skip pattern as main when TPC-H data is absent).
  • Each example was executed end-to-end via ./mvnw -pl :datafusion-java-examples exec:exec and produces expected output:
    • SqlQueryExample prints HIGH 3 215 / MEDIUM 1 60 / LOW 1 25.
    • DataFrameExample prints a 3-row deduped table and Round-tripped row count: 3.
    • ProtoPlanExample prints 42 7.
  • spotless:check and the reactor build are clean across all three modules.

Are there any user-facing changes?

  • The published library artifact (org.apache.datafusion:datafusion-java) is unchanged — same groupId, artifactId, version, and package contents.
  • The repo layout changes: source paths move from src/main/java/... to core/src/main/java/.... IDE projects pointing at the old location need to re-import.
  • New datafusion-java-examples artifact exists but is marked maven.install.skip=true / maven.deploy.skip=true and is not intended for distribution.

andygrove added 2 commits May 13, 2026 15:45
Restructure the repository into a multi-module Maven build so a new
examples/ module can depend on the library and stay in sync with the
API automatically.

* pom.xml becomes the parent (datafusion-java-parent, packaging=pom)
  with dependencyManagement for shared versions.
* core/ holds the existing library; src/ moves to core/src/.
* examples/ is a new module (datafusion-java-examples) depending on
  datafusion-java via ${project.version}. It wires exec-maven-plugin
  so each example can be launched with the right java.library.path
  and --add-opens flags.
* native/, proto/, Makefile, and mvnw stay at the repo root.
* Surefire's java.library.path uses ${maven.multiModuleProjectDirectory}
  so the path resolves correctly under the reactor.

Three runnable examples:

* SqlQueryExample: registerCsv + SQL aggregation.
* DataFrameExample: readCsv -> filter / select / rename / distinct ->
  writeParquet -> readParquet round-trip.
* ProtoPlanExample: build a LogicalPlanNode via the generated protobuf
  classes and execute it through SessionContext.fromProto.
@andygrove andygrove merged commit 60d5342 into apache:main May 13, 2026
2 checks passed
@andygrove andygrove deleted the feat/examples-module branch May 13, 2026 21:52