Kafka + Spark examples

This repository contains four examples of how to produce and consume data from Kafka topics, covering string, JSON, Avro, and Protobuf encodings.


Requirements

  • sbt
  • Docker and Docker Compose

Prerequisites

  1. Compile the Protobuf entities:
$ sbt compile
  2. Run Kafka locally:
$ docker-compose -f docker/docker-compose-kafka.dev.yml up

Executing different encoding formats

1. String

This example simulates a topic whose content is plain text; the consumer simply prints each message to the screen.

To run it, start the producer and then, in another terminal, the consumer:

  • Run the producer
$ make string-producer
  • Run the consumer
$ make string-consumer
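
For reference, here is a minimal sketch of what the consumer side could look like with Spark Structured Streaming. The topic name (string-topic) and bootstrap server are illustrative assumptions, not the repository's actual configuration, which lives behind the Makefile targets.

```scala
import org.apache.spark.sql.SparkSession

object StringConsumerSketch extends App {
  val spark = SparkSession.builder()
    .appName("string-consumer")
    .master("local[*]")
    .getOrCreate()

  // Kafka delivers keys and values as binary; cast the value to a string.
  val messages = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
    .option("subscribe", "string-topic")                 // assumed topic name
    .load()
    .selectExpr("CAST(value AS STRING) AS message")

  // Print each message to the console, as the example describes.
  messages.writeStream
    .format("console")
    .start()
    .awaitTermination()
}
```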

2. JSON

This example simulates a topic whose content is JSON; the consumer deserializes the data into a Spark DataFrame.

The schema represents an e-commerce product:

  • id: string
  • name: string
  • price: double

To run it, start the producer and then, in another terminal, the consumer:

  • Run the producer
$ make json-producer
  • Run the consumer
$ make json-consumer
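
The key step on the consumer side is parsing the JSON payload against the product schema. A minimal sketch, again with assumed topic and server names:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{DoubleType, StringType, StructType}

object JsonConsumerSketch extends App {
  val spark = SparkSession.builder()
    .appName("json-consumer")
    .master("local[*]")
    .getOrCreate()

  // Schema matching the product fields listed above.
  val productSchema = new StructType()
    .add("id", StringType)
    .add("name", StringType)
    .add("price", DoubleType)

  val products = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
    .option("subscribe", "json-topic")                   // assumed topic name
    .load()
    .selectExpr("CAST(value AS STRING) AS json")
    // Parse the JSON string into a struct, then flatten it into columns.
    .select(from_json(col("json"), productSchema).as("product"))
    .select("product.*")

  products.writeStream.format("console").start().awaitTermination()
}
```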

3. Avro

This example simulates a topic whose content is Avro-encoded; the consumer deserializes the data into a Spark DataFrame.

The schema represents an e-commerce product:

  • id: string
  • name: string
  • price: double

The Avro file that defines the schema is located at ./src/main/resources/product.avsc.

To run it, start the producer and then, in another terminal, the consumer:

  • Run the producer
$ make avro-producer
  • Run the consumer
$ make avro-consumer
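
On the consumer side, the Avro payload can be decoded with spark-avro's from_avro function using the schema file above. A minimal sketch under the same assumed topic and server names; it also assumes the messages are plain Avro, without a Confluent Schema Registry wire-format header:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro.functions.from_avro
import org.apache.spark.sql.functions.col
import scala.io.Source

object AvroConsumerSketch extends App {
  val spark = SparkSession.builder()
    .appName("avro-consumer")
    .master("local[*]")
    .getOrCreate()

  // Load the Avro schema that ships with the repository.
  val schemaJson = Source.fromFile("src/main/resources/product.avsc").mkString

  val products = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
    .option("subscribe", "avro-topic")                   // assumed topic name
    .load()
    // Decode the raw Avro bytes against the product schema.
    .select(from_avro(col("value"), schemaJson).as("product"))
    .select("product.*")

  products.writeStream.format("console").start().awaitTermination()
}
```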

4. Protobuf

This example simulates a topic whose content is Protobuf-encoded; the consumer deserializes the data into a Spark DataFrame. The message classes are generated with the ScalaPB library; for more details, check the ScalaPB documentation for SparkSQL.

The schema represents an e-commerce product:

  • id: string
  • name: string
  • price: double

The Protobuf definition can be found at ./src/main/protobuf/product.proto.

To run it, start the producer and then, in another terminal, the consumer:

  • Run the producer
$ make proto-producer
  • Run the consumer
$ make proto-consumer
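
With ScalaPB, product.proto is compiled into a Product case class during sbt compile, and the sparksql-scalapb integration supplies Dataset encoders for it. A rough sketch; the generated package name (com.example.product) and the topic and server names are assumptions, not the repository's actual values:

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.col
import scalapb.spark.Implicits._ // Dataset encoders for ScalaPB messages

object ProtoConsumerSketch extends App {
  val spark = SparkSession.builder()
    .appName("proto-consumer")
    .master("local[*]")
    .getOrCreate()

  val products = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed address
    .option("subscribe", "proto-topic")                  // assumed topic name
    .load()
    .select(col("value"))
    .as[Array[Byte]](Encoders.BINARY)
    // Parse each binary payload with the ScalaPB-generated parser.
    .map(bytes => com.example.product.Product.parseFrom(bytes))

  products.writeStream.format("console").start().awaitTermination()
}
```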
