
pipeline-samples

These are simple exercises with pipelines and functions that explore the usage of Apache Beam.
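
The samples follow the standard Beam shape: parse the command-line arguments into PipelineOptions, build a Pipeline, apply transforms, and run it. The sketch below illustrates that shape only; the class and transform names are illustrative, not the actual sample code.

package samples;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

// Illustrative pipeline, not one of the actual samples.
public class SamplePipeline {
    public static void main(String[] args) {
        // Arguments such as --runner are picked up here, which is why the same
        // shaded jar can target the local runner, Flink, or Dataflow.
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);

        pipeline
            .apply("CreateWords", Create.of("apache", "beam", "sandbox"))
            .apply("UpperCase", MapElements.into(TypeDescriptors.strings())
                .via((String word) -> word.toUpperCase()));

        pipeline.run().waitUntilFinish();
    }
}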

Run on local

mvn package
java -jar target/pipelines-samples-0.1-shaded.jar

Run on local Flink cluster

mvn package -Pflink-runner
cd flink/flink-1.11.0
./bin/flink run /Users/{user}/{somePath}/pipeline-samples/target/pipelines-samples-0.1-shaded.jar --runner=FlinkRunner
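
Passing --runner=FlinkRunner works because the flink-runner profile puts Beam's Flink runner on the classpath. Flink-specific settings can also be read through FlinkPipelineOptions; a minimal sketch, assuming the samples parse options in main (class name and values below are placeholders):

package samples;

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkRunSketch {
    public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);
        options.setFlinkMaster("[auto]"); // "[auto]" lets bin/flink run supply the cluster address
        options.setParallelism(2);        // placeholder parallelism

        Pipeline pipeline = Pipeline.create(options);
        // ...apply the sample transforms here...
        pipeline.run().waitUntilFinish();
    }
}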

Run on Docker Flink cluster

Package the jar file as a fat jar (dependencies included) using the Maven shade plugin:

mvn package -Pflink-runner

This will create the jar file at target/pipelines-samples-0.1-shaded.jar

From a Windows PowerShell prompt, start Flink with Docker Compose using the following commands:

$env:COMPOSE_CONVERT_WINDOWS_PATHS = 1
docker-compose up -d

Then open the Flink UI, which is configured here for port 8888: localhost:8888

(screenshot: Flink UI)

Upload the pipelines-samples-0.1-shaded.jar file and add the program argument:

--runner=FlinkRunner

(screenshot: Flink UI jar upload)

You should be able to run the job and see the results. (screenshot: running job)

Run on Google Dataflow

export GOOGLE_APPLICATION_CREDENTIALS="/Users/{user}/{somePath}/XXX_credentials.json"
gcloud auth application-default login
mvn package -Pdataflow-runner
java -jar target/pipelines-samples-0.1-shaded.jar --runner=DataflowRunner --project=deloitte-beam-284202 --tempLocation=gs://deloitte-beam-sandbox/temp/ --region=us-west1
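
The --project, --tempLocation, and --region arguments are bound to Beam's DataflowPipelineOptions by PipelineOptionsFactory. A minimal sketch of that binding, assuming the samples parse options from the command line (class name is illustrative):

package samples;

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataflowRunSketch {
    public static void main(String[] args) {
        // --project, --tempLocation and --region map to the corresponding option setters.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);

        Pipeline pipeline = Pipeline.create(options);
        // ...apply the sample transforms here...
        pipeline.run().waitUntilFinish();
    }
}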

Run on Amazon EMR (Flink)

mvn package -Pflink-runner
scp -i ~/.ssh/keypair.pem ./target/pipelines-samples-0.1-shaded.jar ec2-user@ec2-xxx-xxx-xxx:/home/hadoop
