This repository has been archived by the owner on Dec 21, 2022. It is now read-only.

GoogleCloudPlatform/dataproc-pubsub-spark-streaming


In this tutorial, you learn how to deploy an Apache Spark streaming application on Cloud Dataproc and use it to process messages from Cloud Pub/Sub in near real time. The system you build generates thousands of random tweets, identifies trending hashtags over a sliding window, saves the results in Cloud Datastore, and displays them on a web page.
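
The windowed hashtag counting described above can be illustrated without a cluster. The following is a minimal pure-Python sketch, not the repository's Scala code. It treats each list of tweets as one micro-batch, keeps a running count over the last `window_batches` batches, and subtracts the oldest batch when it leaves the window. This is similar in spirit to Spark Streaming's windowed operations such as `reduceByKeyAndWindow` with an inverse reduce function. The class name, the hashtag regex, and the top-3 cutoff are illustrative assumptions.

```python
import re
from collections import Counter, deque

# Illustrative assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def extract_hashtags(tweet):
    """Return the lowercase hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


class SlidingHashtagWindow:
    """Hypothetical helper: counts hashtags over the last `window_batches` micro-batches."""

    def __init__(self, window_batches):
        self.window_batches = window_batches
        self.batches = deque()   # per-batch Counters, oldest first
        self.totals = Counter()  # running sum over the current window

    def add_batch(self, tweets):
        """Add one micro-batch of tweets and return the current top 3 hashtags."""
        batch_counts = Counter()
        for tweet in tweets:
            batch_counts.update(extract_hashtags(tweet))
        self.batches.append(batch_counts)
        self.totals.update(batch_counts)
        # Once the window is full, subtract the expired batch (the "inverse reduce").
        if len(self.batches) > self.window_batches:
            expired = self.batches.popleft()
            self.totals.subtract(expired)
            # Unary plus keeps only positive counts, dropping tags that fell to zero.
            self.totals = +self.totals
        return self.top(3)

    def top(self, k):
        """Return the k most frequent hashtags; ties are broken alphabetically."""
        return sorted(self.totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

For example, with `window_batches=2`, a hashtag that appears only in the oldest batch drops out of the results once two further batches arrive. Keeping a running total and subtracting expired batches avoids recounting the whole window on every slide.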

For the complete step-by-step instructions, see the companion article:

https://cloud.google.com/solutions/using-apache-spark-dstreams-with-dataproc-and-pubsub

Contents of this repository:

  • http_function: JavaScript code for the HTTP function deployed on Cloud Functions.
  • spark: Scala code for the Apache Spark streaming application.
  • tweet-generator: Python code for the randomized tweet generator.
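
As a rough illustration of what a randomized tweet generator can look like, here is a minimal sketch. It is not the code in tweet-generator: the word list, the hashtag list, and the JSON message shape with a `text` field are all hypothetical. Each tweet is a few random words followed by one or two distinct hashtags, encoded as UTF-8 bytes, since Pub/Sub message data is a byte string.

```python
import json
import random

# Hypothetical vocabulary; the repository's generator may use different words and tags.
WORDS = ["cloud", "data", "stream", "spark", "window", "message"]
HASHTAGS = ["#dataproc", "#pubsub", "#spark", "#bigdata", "#gcp"]


def random_tweet(rng, num_words=5, max_tags=2):
    """Build one random tweet: `num_words` words followed by 1..max_tags distinct hashtags."""
    words = [rng.choice(WORDS) for _ in range(num_words)]
    tags = rng.sample(HASHTAGS, rng.randint(1, max_tags))
    return " ".join(words + tags)


def generate_messages(count, seed=None):
    """Yield `count` tweets as UTF-8 encoded JSON payloads (assumed message format)."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    for _ in range(count):
        yield json.dumps({"text": random_tweet(rng)}).encode("utf-8")
```

Each payload could then be sent with the `google-cloud-pubsub` client, for example `PublisherClient().publish(topic_path, data=payload)`, where `topic_path` names your topic. Passing a seed is useful in tests, where repeatable input makes the downstream hashtag counts predictable.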

Running the tests

To run the unit tests for the Spark application with Maven:

cd spark
mvn test
