build: add examples module on a multi-module Maven build #32
Merged
Restructure the repository into a multi-module Maven build so a new
examples/ module can depend on the library and stay in sync with the
API automatically.
* pom.xml becomes the parent (datafusion-java-parent, packaging=pom)
with dependencyManagement for shared versions.
* core/ holds the existing library; src/ moves to core/src/.
* examples/ is a new module (datafusion-java-examples) depending on
datafusion-java via ${project.version}. It wires exec-maven-plugin
so each example can be launched with the right java.library.path
and --add-opens flags.
* native/, proto/, Makefile, and mvnw stay at the repo root.
* Surefire's java.library.path uses ${maven.multiModuleProjectDirectory}
so the path resolves correctly under the reactor.
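The reason for anchoring java.library.path at the reactor root can be sketched in plain Java. This is an illustration only, not the project's build logic: the native/target/release directory, the /repo root, the classpath, and the exact --add-opens value are assumptions (the flag shown is the one Arrow's documentation asks for on recent JDKs; the PR does not list its flags).

```java
import java.nio.file.Path;
import java.util.List;

public class LaunchCommandSketch {

    // Hypothetical location of the built native library, relative to the repo root.
    static final String NATIVE_DIR = "native/target/release";

    /** Resolves the native library directory against the reactor root, not a module directory. */
    static Path nativeLibDir(Path reactorRoot) {
        return reactorRoot.resolve(NATIVE_DIR).normalize();
    }

    /** Assembles the kind of JVM command line an example launch would need. */
    static List<String> launchCommand(Path reactorRoot, String mainClass) {
        return List.of(
                "java",
                "-Djava.library.path=" + nativeLibDir(reactorRoot),
                // Assumed flag value; the PR only says "--add-opens flags".
                "--add-opens=java.base/java.nio=ALL-UNNAMED",
                "-cp", "examples/target/classes", // hypothetical classpath
                mainClass);
    }

    public static void main(String[] args) {
        Path root = Path.of("/repo"); // hypothetical checkout location

        // A module-relative path differs depending on the module it is resolved from...
        System.out.println(root.resolve("core").resolve(NATIVE_DIR));
        System.out.println(root.resolve("examples").resolve(NATIVE_DIR));

        // ...while a root-relative path is the same from every module.
        System.out.println(nativeLibDir(root));
        System.out.println(String.join(" ",
                launchCommand(root, "org.apache.datafusion.examples.SqlQueryExample")));
    }
}
```

In the actual build this resolution is done by Maven property interpolation rather than Java code; the sketch only shows why a root-anchored path stays stable under the reactor.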
Three runnable examples:
* SqlQueryExample: registerCsv + SQL aggregation.
* DataFrameExample: readCsv -> filter / select / rename / distinct ->
writeParquet -> readParquet round-trip.
* ProtoPlanExample: build a LogicalPlanNode via the generated protobuf
classes and execute it through SessionContext.fromProto.
# Conflicts: # pom.xml
Which issue does this PR close?
Rationale for this change
The repository lacks runnable end-to-end examples. Code snippets in the docs can easily drift out of sync with the API because no build step fails when a public method is renamed or removed.
Adding an examples/ Maven module that depends on the library lets the reactor compile every example on each build, so they cannot fall behind the API. Doing this requires the repo to be a multi-module Maven project; while we're there, the parent POM gets shared dependencyManagement and plugin versions so child modules stay terse.
What changes are included in this PR?
Multi-module restructure:
* pom.xml becomes the parent (datafusion-java-parent, packaging=pom) with dependencyManagement for arrow, protobuf, junit, and the library itself.
* core/ is a new directory holding the existing library (datafusion-java). src/ moves to core/src/.
* examples/ is a new module (datafusion-java-examples) depending on the library via ${project.version}. It wires exec-maven-plugin so each example launches with the right java.library.path and --add-opens flags.
* native/, proto/, Makefile, and mvnw stay at the repo root unchanged.
* Surefire's java.library.path now uses ${maven.multiModuleProjectDirectory} so it resolves under the reactor regardless of which module Maven is invoked from.
* apache-rat-plugin runs only at the root (<inherited>false</inherited>); the rat exclude list is unchanged.
Three runnable examples under examples/src/main/java/org/apache/datafusion/examples/:
* SqlQueryExample: registerCsv + a SQL GROUP BY aggregation.
* DataFrameExample: readCsv → filter / select / withColumnRenamed / distinct → writeParquet(singleFileOutput) → readParquet round-trip.
* ProtoPlanExample: build a LogicalPlanNode directly via the generated protobuf classes and execute it through SessionContext.fromProto.
Each example creates its own throwaway data in a temp dir and cleans up, so no external fixtures (TPC-H, etc.) are required.
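The throwaway-data pattern can be sketched with the JDK alone. This is a minimal illustration under stated assumptions, not the code in this PR: the class name, file name, and CSV columns are hypothetical, and the point where an example would call into the library is left as a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

public class ThrowawayDataSketch {

    /** Writes a small CSV (hypothetical columns) into dir and returns its path. */
    static Path writeSampleCsv(Path dir) throws IOException {
        Path csv = dir.resolve("input.csv");
        Files.write(csv, List.of("id,label", "1,a", "2,b"));
        return csv;
    }

    /** Deletes dir and everything under it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("datafusion-example-");
        try {
            Path csv = writeSampleCsv(dir);
            // A real example would pass csv to the library here (e.g. registerCsv or readCsv).
            System.out.println("lines written: " + Files.readAllLines(csv).size());
        } finally {
            // Clean up even if the example body throws.
            deleteRecursively(dir);
        }
    }
}
```

Putting the cleanup in a finally block keeps the temp dir from leaking when an example fails partway through.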
Docs:
* docs/source/contributor-guide/development.md is updated with the new repo layout and a "Running an example" section that documents the ./mvnw install -DskipTests + exec:exec flow.
Are these changes tested?
* ./mvnw test still passes against the relocated core/src/test/ sources: 61 tests run, 0 failures (12 skipped, same skip pattern as main when TPC-H data is absent).
* Running ./mvnw -pl :datafusion-java-examples exec:exec produces the expected output: SqlQueryExample prints HIGH 3 215 / MEDIUM 1 60 / LOW 1 25. DataFrameExample prints a 3-row deduped table and Round-tripped row count: 3. ProtoPlanExample prints 42 7.
* spotless:check and the reactor build are clean across all three modules.
Are there any user-facing changes?
* org.apache.datafusion:datafusion-java is unchanged: same groupId, artifactId, version, and package contents.
* Sources move from src/main/java/... to core/src/main/java/.... IDE projects pointing at the old location need to re-import.
* The datafusion-java-examples artifact exists but is marked maven.install.skip=true / maven.deploy.skip=true and is not intended for distribution.