PySpark 3 + Kinesis integration
Kinesis is a fully managed message broker service on AWS. This repo contains consumer and producer implemented with python3.
Producer is a data source which push messages to message broker. Consumer pulls messages from message broker for their usage.
In this example, producer generates JSON string messages as much as variable NUM_INSERT
every seconds and push to Kinesis server, whereas consumer pulls it. Also, consumer implemented with Apache Spark which supports Spark Structued Streaming. Spark Structued Streaming can process data very quickly with data type name DataFrame, a table-like data, and can easily get lookup aggregation on every stream intervals. As a result, when consumer got data, parse it, process it, then show it as table-like.
Configure producer with boto3. AWS credentials MUST BE PLACED IN ~/.aws/config
or configure with AWS CLI
.
An Apache Spark Structured Streaming application. Configure AWS credentials with config.ini
file and configparser package.