The banzai library is a transform-process-transform pipeline, that allows to implement complex data processing task in a well-structured and configurable approach.
The library is named after Banzai Pipelines, a surf reef break located in Hawaii known for its spectacular, huge waves that break in shallow water, forming large, hollow, thick curls of water that surfers can tube ride.
The benzai library is implemented as a highly configurable pipeline of processors. The pipeline can easily be extended with additional processors:
The pipeline starts with a transformer which converts the input data into the data format consumed by the processors, and ends with a tail transformer which converts the processor format into the required output format.
A transformer does an operation of type f: A->B
: It transforms an input object of one type into
another type. This can be a java type conversions, a deserialization, or a serialization.
The head transformer converts the input data of type I into the processing data type P. The tail transformer converts to processing type P into the output type P.
For pipelines that do not need this conversion (I == P == O) the library provides the
PassThroughTransformer
. It simply passes input to output.
Simple transformer converting a BibJson string into a java pojo. Full example
@SuperBuilder
public class BibJsonToBibliographyTransformer extends AbstractTransformer<String, Bibliography> {
@Override
public Bibliography processImpl(String correlationId, String input) {
Gson gson = new Gson();
return gson.fromJson(input, Bibliography.class);
}
}
A processor does an operation of type f: P->P
. Implement a concrete processor by extending
from AbstractSingleItemProcessor<T>
with your pipeline processing type.
For Type P == String:
public class StringCapitalizer extends AbstractSingleItemProcessor<String>
and
override public String processImpl(@NonNull final String correlationId, @NonNull final T input);
public String processImpl(@NonNull final String correlationId, @NonNull final String input) {
return input.toUpperCase()
}
Simple processor capitalizing all author names in a Bibliography
object. Full example
@SuperBuilder
public class BibliographyCapitalizeProcessor extends AbstractSingleItemProcessor<Bibliography> {
@Override
public Bibliography processImpl(@NonNull String correlationId,
@NonNull Bibliography bibliography) {
bibliography.setAuthor(bibliography.getAuthor().stream().map(author->new Author(author.name.toUpperCase(
Locale.ROOT))).collect(Collectors.toList()));
return bibliography;
}
}
The pipeline is the frame connecting all component (transformers, processors). It provides a pipeline builder as convenience helper to assemble the pipeline in a few lines of code:
PipeLine<InputObject, ProcessObject, OutputObject> pipeLine = PipeLine.builder(
HandlerConfiguration.builder().build(), InputObject.class, ProcessObject.class, OutputObject.class)
.addHeadTransformer(InputToProcessTransformer.builder().build())
.addStep(ProcessObjectProcessor.builder().build())
.addTailTransformer(ProcessToOutputTransformer.builder().build())
.build();
InputObject inputObject = ...;
OutputObject outputObjetc = pipeline.process(inputObject);
Bibliography pipeline connecting BibJsonToBibliographyTransformer
, BibliographyCapitalizeProcessor
, and BibliographyToBibJsonTransformer
.
Full example
PipeLine<String, Bibliography, String> pipeLine = PipeLine.builder(
HandlerConfiguration.builder().build(), String.class, Bibliography.class, String.class)
.addHeadTransformer(BibJsonToBibliographyTransformer.builder().build())
.addStep(BibliographyCapitalizeProcessor.builder().build())
.addTailTransformer(BibliographyToBibJsonTransformer.builder().build())
.build();
The library is provided through the github maven repository. Include the following dependency section in your pom file:
<repositories>
<repository>
<id>github</id>
<name>GitHub Datarocks Apache Maven Packages</name>
<url>https://maven.pkg.github.com/datarocks-ag/banzai</url>
</repository>
</repositories>
<dependency>
<groupId>org.datarocks</groupId>
<artifactId>banzai</artifactId>
<version>1.0.0</version>
</dependency>
The following code snippets show how to create a functional pipeline.
The required parameters for the handlers (transformers, processors) are supplied by
the HandlerConfiguration
.
Finally, create the HandlerConfiguration
object:
HandlerConfiguration handlerConfiguration =
HandlerConfiguration.builder()
.handlerConfigurationItem("key", "value")
.build();
PipeLine<Object, Object, Object> ipeline =
PipeLine.builder(handlerConfiguration, Object.class, Object.class, Object.class)
.addHeadTransformer(PassTroughTransformer.<Object>builder().build())
.addStep(PassThroughProcessor.builder().build())
.addTailTransformer(PassTroughTransformer.<Object>builder().build())
.build()
A fully working minimal implementation can be found in section Working Minimal Implementation.
Dummy transformer that does no conversion and just returns input object.
Dummy processor that does no conversion and just returns input object.
A working minimal implementation cab be found under BibJsonPipe