Spark Streaming pipeline template (Consume and produce)

This is a simple project template for a Spark Streaming pipeline application.

Functionality

It will be very simple:

I expect a simple string from Kafka, for example 000120190501. The format is NNNNYYYYMMDD:
- NNNN: Information person code
- YYYYMMDD: The date when it was saved
A simple parsing step converts the raw string into an understandable Dataset of Info.
After that, I retrieve the full name from a Kudu table using the parsed Info object.
Once I have the full name, I build a specific Avro object in order to produce it.
Finally, I produce this Avro object.
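
The parsing step described above can be sketched in plain Scala as follows. This is only a minimal sketch, not the code from this repository: the field names of `Info`, the `InfoParser` object and the choice to return an `Option` for malformed input are all assumptions made for illustration, and the sketch has no Spark dependency.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical model: the README only says the parsed object is called Info.
final case class Info(personCode: String, savedOn: LocalDate)

// Hypothetical helper object, not part of the project.
object InfoParser {
  // BASIC_ISO_DATE parses dates written as yyyyMMdd, e.g. 20190501.
  private val DateFormat = DateTimeFormatter.BASIC_ISO_DATE

  // Returns None unless the input is exactly 4 digits (NNNN)
  // followed by 8 digits that form a valid date (YYYYMMDD).
  def parse(raw: String): Option[Info] = {
    val trimmed = raw.trim
    val onlyDigits = trimmed.forall(c => c >= '0' && c <= '9')
    if (trimmed.length != 12 || !onlyDigits) None
    else {
      val (code, date) = trimmed.splitAt(4)
      Try(LocalDate.parse(date, DateFormat)).toOption.map(d => Info(code, d))
    }
  }
}
```

With the sample string from above, `InfoParser.parse("000120190501")` yields an `Info` with person code `0001` and the date 2019-05-01, while a string with the wrong length or an impossible date yields `None`.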

The most important thing. I want to test the streaming!!

Actually, I want to focus on testing the entire streaming pipeline.

But there are two kinds of tests:

- Narrow (a mocked setup that gives an integration test with Kafka, Kudu, Confluent and Spark)
- Unit (pending, I'm sorry)

By default, the unit test tag is active. Its profile is default-test.

To launch the integration test, you need to specify the narrow-test profile:

mvn clean test -Pnarrow-test

Or you can run the class com.adelpozo.streaming.NarrowITTest in your favourite IDE.

Info needed before launching the Narrow Test

At the moment, I'm using docker-testkit to run Docker with the following containers: Kudu and Kafka. Their lifecycle is managed from the ScalaTest cycle.

It is highly recommended to download the following images before launching the test because, depending on the network, pulling them during the test could time out.

- Kafka image used: spotify/kafka
- Kudu image used: usuresearch/kudu-docker-slim

The Scala classes that run the containers are:

com.adelpozo.streaming.utils.docker.DockerKuduService
com.adelpozo.streaming.utils.docker.DockerKafkaService

I think the test is very simple to understand and, if you want to explore, very simple to tune.
