Example from https://github.com/apache/beam/blob/master/sdks/go/examples/wordcount/wordcount.go
- PowerShell (cross-platform)
- Git
- Chocolatey (Windows Package Manager)
go version
Review the installed files at C:\Program Files\Go.
go mod init github.com/denisecase/beam-pagerank-go
Review go.mod.
When building, read the error messages. If a package is missing, the message includes the go get command that adds it. Run the suggested commands.
The command go get downloads the package to the local module cache and adds it to the go.mod file. The -u flag also updates the package's dependencies to newer minor or patch releases when available.
go get -u github.com/apache/beam/sdks/v2/go/pkg/beam
go get -u github.com/apache/beam/sdks/v2/go/pkg/beam/transforms/create
go get -u github.com/apache/beam/sdks/v2/go/pkg/beam/sideInput
go get github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs@v2.37.0
This allows us to use beam packages. Review go.mod.
go build pagerank.go
Review go.sum. Read any error messages and run recommended commands as needed. Verify the new .exe executable file is created.
.\pagerank.exe --input <PATH_TO_INPUT_FILE> --output out.csv
Examples:
.\pagerank.exe --output denise.csv
.\pagerank.exe
Review the local build cache at C:\Users\<username>\AppData\Local\go-build.
- go get - updates dependencies/versions listed in go.mod and updates local cache
- go install - builds and installs the provided source file in $GOPATH
- go build - compiles and builds the executable locally
- go fmt - formats Go code
- go mod tidy - adds missing module requirements to go.mod and removes unused ones
From https://beam.apache.org/documentation/programming-guide/
A typical Beam driver program works as follows:
- Create a Pipeline object and set the pipeline execution options, including the Pipeline Runner.
- Create an initial PCollection for pipeline data, either using the IOs to read data from an external storage system, or using a Create transform to build a PCollection from in-memory data.
- Apply PTransforms to each PCollection. Transforms can change, filter, group, analyze, or otherwise process the elements in a PCollection. A transform creates a new output PCollection without modifying the input collection. A typical pipeline applies subsequent transforms to each new output PCollection in turn until processing is complete. However, note that a pipeline does not have to be a single straight line of transforms applied one after another: think of PCollections as variables and PTransforms as functions applied to these variables: the shape of the pipeline can be an arbitrarily complex processing graph.
- Use IOs to write the final, transformed PCollection(s) to an external source.
- Run the pipeline using the designated Pipeline Runner.
A ParDo transform considers each element in the input PCollection, performs some processing function (your user code) on that element, and emits zero or more elements to an output PCollection.
The emit func(...) form is useful when the number of output elements differs from the number of input elements. If it's a 1:1 mapping, a plain return makes the function easier to read.
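A minimal sketch of the two function shapes, written with the standard library only so it runs without a Beam runner. The names splitWords, shout, and collect are hypothetical. collect is a stand-in that calls the emit-style function on each input, roughly the way a runner would, and is not part of Beam.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords uses the emit style: one input line can produce zero,
// one, or many output words.
func splitWords(line string, emit func(string)) {
	for _, w := range strings.Fields(line) {
		emit(w)
	}
}

// shout uses the return style: exactly one output per input.
func shout(word string) string {
	return strings.ToUpper(word)
}

// collect is a hypothetical stand-in for a runner: it calls an
// emit-style function on every input and gathers whatever is emitted.
func collect(inputs []string, fn func(string, func(string))) []string {
	var out []string
	for _, in := range inputs {
		fn(in, func(s string) { out = append(out, s) })
	}
	return out
}

func main() {
	lines := []string{"to be or", "", "not to be"}

	// Three input lines yield six words; the empty line emits nothing.
	words := collect(lines, splitWords)

	// Each word maps to exactly one upper-cased word.
	for _, w := range words {
		fmt.Println(shout(w))
	}
}
```

In a Beam pipeline, functions with these shapes would be passed to beam.ParDo instead of being called directly.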
From https://blog.gopheracademy.com/advent-2018/apache-beam/
- Create an empty repo (same folder name) in GitHub
- In your local folder,
git init
git remote add origin https://github.com/denisecase/beam-pagerank-go.git
- git add / commit / push to repo