beam-wordcount-go

Example from https://github.com/apache/beam/blob/master/sdks/go/examples/wordcount/wordcount.go

Windows Tools

PowerShell (cross-platform)
Git
Chocolatey (Windows Package Manager)

Install Go and Verify

go version

Review the installed files at C:\Program Files\Go.

Initialize Project Module

go mod init github.com/denisecase/beam-pagerank-go

Review go.mod.

Get Beam Go SDK

When building, read error messages. If a package is needed, a command to add will be provided. Run the necessary commands.

The command go get will download the package to the local module cache, and add it to the go.mod file. The -u flag indicates update.

go get -u github.com/apache/beam/sdks/v2/go/pkg/beam
go get -u github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/create
go get -u github.com/apache/beam/sdks/v2/go/pkg/beam/sideInput
go get github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs@v2.37.0

This allows us to use beam packages. Review go.mod.

Build to Executable

go build pagerank.go

Review go.sum. Read any error messages and run recommended commands as needed. Verify the new .exe executable file is created.

Run Executable

.\pagerank.exe --input <PATH_TO_INPUT_FILE> --output out.csv

Examples:

.\pagerank.exe --output denise.csv
.\pagerank.exe

Review the local dependencies at C:\Users<username>\AppData\Local\go-build.

About Go

go get - updates dependencies/versions listed in go.mod and updates local cache
go install - used to build and install the provided source file in $GOPATH$
go build - compiles and builds executable locally
go fmt - format go code
go mod tidy - keep things updated

About Beam Pipelines

From https://beam.apache.org/documentation/programming-guide/

A typical Beam driver program works as follows:

Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
Create an initial PCollection for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a PCollection from in-memory data.
Apply PTransforms to each PCollection. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph.
Use IOs to write the final, transformed PCollection(s) to an external source.
Run the pipeline using the designated Pipeline Runner.

Mapping Done by ParDo

A ParDo transform considers each element in the input PCollection, performs some processing function (your user code) on that element, and emits zero or more elements to an output PCollection.

The emit func(...) is useful when the number of output elements differ to the number of input elements. If its a 1:1 mapping a return makes the function easier to read.

From https://blog.gopheracademy.com/advent-2018/apache-beam/

Storing in GitHub

Create an empty repo (same folder name) in GitHub
In your local folder, git init
git remote add origin https://github.com/denisecase/beam-pagerank-go.git
git add / commit / push to repo

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
web04		web04
README.md		README.md
beam-pagerank-go.exe		beam-pagerank-go.exe
go.mod		go.mod
go.sum		go.sum
out.csv		out.csv
out1map.csv		out1map.csv
out1reduce.csv		out1reduce.csv
out2map.csv		out2map.csv
out2reduce01.csv		out2reduce01.csv
out2reduce02.csv		out2reduce02.csv
out2reduce03.csv		out2reduce03.csv
out2reduce04.csv		out2reduce04.csv
out2reduce05.csv		out2reduce05.csv
out2reduce06.csv		out2reduce06.csv
out2reduce07.csv		out2reduce07.csv
out2reduce08.csv		out2reduce08.csv
out2reduce09.csv		out2reduce09.csv
out2reduce10.csv		out2reduce10.csv
pagerank.exe		pagerank.exe
pagerank.go		pagerank.go

denisecase/beam-pagerank-go

Folders and files

Latest commit

History

Repository files navigation

beam-wordcount-go

Read More

Windows Tools

Install Go and Verify

Initialize Project Module

Get Beam Go SDK

Build to Executable

Run Executable

About Go

About Beam Pipelines

Mapping Done by ParDo

Storing in GitHub

About

Resources

Stars

Watchers

Forks

Languages