Skip to content

TayPark/spark-kinesis-integration

Repository files navigation

spark-kinesis-integration

PySpark 3 + Kinesis integration

Overview

Kinesis is a fully managed message broker service on AWS. This repo contains consumer and producer implemented with python3.

Producer is a data source which push messages to message broker. Consumer pulls messages from message broker for their usage.

In this example, producer generates JSON string messages as much as variable NUM_INSERT every seconds and push to Kinesis server, whereas consumer pulls it. Also, consumer implemented with Apache Spark which supports Spark Structued Streaming. Spark Structued Streaming can process data very quickly with data type name DataFrame, a table-like data, and can easily get lookup aggregation on every stream intervals. As a result, when consumer got data, parse it, process it, then show it as table-like.

Kinesis Producer

Configure producer with boto3. AWS credentials MUST BE PLACED IN ~/.aws/config or configure with AWS CLI.

Kinesis Consumer

An Apache Spark Structured Streaming application. Configure AWS credentials with config.ini file and configparser package.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages