Apache Beam Example Code

An example Apache Beam project.

Description

This example can be used with conference talks and for self-study. The examples are based on those in Beam's example directory. They are modified to use Beam as a dependency in the pom.xml instead of being compiled together with it. The example code is also changed to write its output to local directories.

How to clone and run

Open a terminal window.
Run git clone git@github.com:eljefe6a/beamexample.git
Run cd beamexample/BeamTutorial
Run mvn compile
Create local output directory: mkdir output
Run mvn exec:java -Dexec.mainClass="org.apache.beam.examples.tutorial.game.solution.Exercise1"
Run cat output/user_score to verify the program ran correctly and the output file was created.
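
The final verification step can also be done programmatically, for example from a test or a demo script. The following is a minimal sketch that uses only the JDK. The class name OutputCheck and the method hasContent are hypothetical and are not part of the tutorial code. It assumes the pipeline writes a single file at output/user_score, as the steps above describe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the tutorial code.
final class OutputCheck {

    // Returns true if the path is a regular file that contains at least one byte.
    static boolean hasContent(Path file) {
        try {
            return Files.isRegularFile(file) && Files.size(file) > 0;
        } catch (IOException e) {
            // Treat an unreadable file the same as a missing one.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the path used in the steps above; run from the BeamTutorial directory.
        Path output = Paths.get(args.length > 0 ? args[0] : "output/user_score");
        if (hasContent(output)) {
            System.out.println("Found output: " + output.toAbsolutePath());
        } else {
            System.out.println("Missing or empty output: " + output.toAbsolutePath());
        }
    }
}
```

This only checks that the file exists and is not empty. It does not validate the scores themselves, so cat remains the quickest way to inspect the results.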

Using a Java IDE

Follow the IDE Setup instructions in the Apache Beam Contribution Guide.

Other Runners

Apache Flink

Follow the first steps from Flink's Quickstart to download Flink.
Create the output directory.
To run on a JVM-local cluster: mvn exec:java -Dexec.mainClass=org.apache.beam.examples.tutorial.game.solution.Exercise1 -Dexec.args='--runner=FlinkRunner --flinkMaster=[local]'
To run on an out-of-process local cluster (note that the steps below should also work on a real cluster if you have one running):
1. Start a local Flink cluster.
2. Navigate to the WebUI (typically http://localhost:8081), click JobManager, and note the value of jobmanager.rpc.port. The default is probably 6123.
3. Run mvn package to generate a JAR file. Note the location of the generated JAR (probably target/Tutorial-0.0.1-SNAPSHOT.jar).
4. Run mvn -X -e exec:java -Dexec.mainClass=org.apache.beam.examples.tutorial.game.solution.Exercise1 -Dexec.args='--runner=FlinkRunner --flinkMaster=localhost:6123 --filesToStage=target/Tutorial-0.0.1-SNAPSHOT.jar', replacing the defaults for port and JAR file if they differ.
5. Check in the WebUI to see the job listed.
Run cat output/user_score to verify the pipeline ran correctly and the output file was created.

Apache Spark

Create the output directory.
Allow all users to write to the output directory, since Spark may run as a different user: chmod 1777 output
Change the output file to a fully-qualified path. For example, change this("output/user_score"); to this("/home/vmuser/output/user_score");
Run mvn package
Run spark-submit --jars ~/.m2/repository/org/apache/beam/beam-runners-spark/0.3.0-incubating-SNAPSHOT/beam-runners-spark-0.3.0-incubating-SNAPSHOT.jar --class org.apache.beam.examples.tutorial.game.solution.Exercise2 --master yarn-client target/Tutorial-0.0.1-SNAPSHOT.jar --runner=SparkRunner
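
The fully-qualified path from the steps above can also be computed rather than typed by hand. This is a minimal sketch that uses plain JDK calls. OutputPaths and qualify are hypothetical names, and /home/vmuser is only the example directory from the step above. It does not modify the tutorial code; the resulting string would still need to be passed wherever the output path is set.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the tutorial code.
final class OutputPaths {

    // Resolves a relative output path against a base directory and
    // returns the absolute, normalized result as a string.
    static String qualify(Path baseDir, String relativeOutput) {
        return baseDir.resolve(relativeOutput).toAbsolutePath().normalize().toString();
    }

    public static void main(String[] args) {
        // user.dir is the directory the JVM was started from.
        Path workingDir = Paths.get(System.getProperty("user.dir"));
        System.out.println(qualify(workingDir, "output/user_score"));
    }
}
```

On a Unix-like system, calling qualify with a base directory of /home/vmuser and the relative path output/user_score yields the path shown in the step above.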

Google Cloud Dataflow

Follow the steps in either of the Java quickstarts for Cloud Dataflow to initialize your Google Cloud setup.
Create a bucket on Google Cloud Storage for staging and output.
Run mvn -X exec:java -Dexec.mainClass="org.apache.beam.examples.tutorial.game.solution.Exercise1" -Dexec.args='--runner=DataflowRunner --project=<YOUR-GOOGLE-CLOUD-PROJECT> --gcpTempLocation=gs://<YOUR-BUCKET-NAME> --outputPrefix=gs://<YOUR-BUCKET-NAME>/output/', after replacing <YOUR-GOOGLE-CLOUD-PROJECT> and <YOUR-BUCKET-NAME> with the appropriate values.
Check the Cloud Dataflow Console to see the job running.
Check the output bucket to see the generated output: https://console.cloud.google.com/storage/browser/<YOUR-BUCKET-NAME>/
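
Because the bucket name appears twice in the exec.args value above, it is easy to replace one placeholder and miss the other. The following is a minimal sketch that assembles the same argument string from a project ID and a bucket name. DataflowArgs and build are hypothetical names, not part of the tutorial code or of Beam. The flags themselves are copied from the command above.

```java
// Hypothetical helper for illustration; not part of the tutorial code or of Beam.
final class DataflowArgs {

    // Builds the value for -Dexec.args using the flags from the command above.
    static String build(String project, String bucket) {
        requireFilledIn("project", project);
        requireFilledIn("bucket", bucket);
        String gcsRoot = "gs://" + bucket;
        return String.join(" ",
                "--runner=DataflowRunner",
                "--project=" + project,
                "--gcpTempLocation=" + gcsRoot,
                "--outputPrefix=" + gcsRoot + "/output/");
    }

    // Rejects empty values and placeholders such as <YOUR-BUCKET-NAME> left unreplaced.
    private static void requireFilledIn(String name, String value) {
        if (value == null || value.isEmpty() || value.contains("<") || value.contains(">")) {
            throw new IllegalArgumentException(name + " must be replaced with a real value");
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: DataflowArgs <project> <bucket>");
            return;
        }
        System.out.println(build(args[0], args[1]));
    }
}
```

The printed string can then be pasted into the -Dexec.args='...' portion of the mvn command.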
