Skip to content

Combine Pub/Sub and BQ in an example Dataflow pipeline with Scio

Notifications You must be signed in to change notification settings

bartcode/dataflow-test

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Dataflow example

Example project which combines messages from Pub/Sub and data from BigQuery. There is one message queue: sensor. This is a stream of numbers which - for whatever reason - needs to be summed over a window of 5 seconds. The sum is checked against a lower and upper bound as described in a BigQuery table (see ./data/numbers.csv). The output will be a record in a BQ table, which looks as follows:

update_time number_sum number_count number_type version
1234 49 5 tens run-1
1235 490 100 hundreds run-1

The flow schema is as follows:

Architectural overview

Set up

A few things need to be set up on the Google Cloud Platform:

  • One topic in Cloud Pub/Sub:
    • sensor
  • One subscription in Cloud Pub/Sub:
    • sensor
  • Import ./data/numbers.csv into a BigQuery table called numbers.
  • Update application.conf with your configuration.

Running the code

The pipeline is deployed by running:

$ sbt "runMain example.SourceMerge"

Messages can be sent with the client, which is also included in this repository. See its README.md.

References

This project makes use of Protobuf in combination with Scio/Dataflow:

About

Combine Pub/Sub and BQ in an example Dataflow pipeline with Scio

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published