`auto_pipeline`
----------------

Say you want to create a quick linear pipeline that takes events from one data source, transforms them, and sends them on to a data sink. Since we're thinking in a straight line, we can treat the Jupyter notebook itself as the pipeline. In the simplest case, events come in at the top of the notebook, are processed, and come out at the bottom.

Since you might want to do some imports or setup before launching the pipeline, in practice we divide the notebook into two sections: the setup section and the pipeline section.

`auto_pipeline()` lets you turn simple Jupyter notebooks into pipelines.


To use it, call `auto_pipeline(source=<source>, sink=<sink>)` at some point in your notebook; the cells that follow the call become processors in that pipeline. Inside the pipeline, the special variable `event` holds the incoming event. Whatever value `event` has at the end of the pipeline is sent to the sink.
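To make the idea of a linear pipeline concrete, here is a minimal sketch in plain Python. It is not how BSPump implements `auto_pipeline`; the name `run_linear_pipeline` and the list-based source and sink are hypothetical stand-ins used only for illustration.

```python
from typing import Callable, Iterable, List

# A processor takes an event and returns the (possibly transformed) event.
Processor = Callable[[object], object]


def run_linear_pipeline(
    source: Iterable[object],
    processors: List[Processor],
    sink: Callable[[object], None],
) -> None:
    """Hypothetical helper: push every event from source through each
    processor in order, then hand the result to the sink."""
    for event in source:
        for processor in processors:
            event = processor(event)
        sink(event)


# Stand-ins for a real source and sink: a list of events and a list collector.
collected = []
run_linear_pipeline(
    source=[1, 2, 3],
    processors=[lambda e: e + 1, lambda e: e * 10],
    sink=collected.append,
)
print(collected)  # [20, 30, 40]
```

In the notebook setting, each cell after the `auto_pipeline` call plays the role of one entry in `processors`.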


Setup section 
-------------

The cells in this section run once, at launch.

In [1]:
from bspump.jupyter import *
import bspump.kafka
import json

In [2]:
some_constant = 3

In [3]:
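# Register a Kafka connection with the ID "KafkaConnection";
# the source and sink below refer to it by this ID.
@register_connection
def connection(app):
    return bspump.kafka.KafkaConnection(app, "KafkaConnection")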

BitSwan BSPump version devel


In [9]:
# Define a sample event so the pipeline cells can be tested interactively.
event = b"""{"foo":"bap"}"""

We use `auto_pipeline` to mark the start of the *pipeline section*. We also specify the source and sink for our pipeline at this point.

In [5]:
auto_pipeline(
    source=lambda app, pipeline: bspump.kafka.KafkaSource(app, pipeline, "KafkaConnection"),
    sink=lambda app, pipeline: bspump.kafka.KafkaSink(app, pipeline, "KafkaConnection")
)

Pipeline section
----------------

Every cell after the `auto_pipeline` call is rerun each time an event comes in. At run time, the `event` variable is automatically set to the value of the event that arrives from the source.

We can apply whatever transformations we please. Whatever value `event` holds at the end of the notebook is automatically sent to the sink.
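The split between "run once" and "rerun per event" can be emulated in plain Python. The sketch below is only an illustration under stated assumptions: cells are represented as source strings, `handle` is a hypothetical helper, and nothing here reflects BSPump's actual internals.

```python
# Hypothetical emulation: setup cells run once, pipeline cells run per event.
setup_cells = [
    "import json",
    "some_constant = 3",
]
pipeline_cells = [
    'event = json.loads(event.decode("utf8"))',
    'event["foo"] = event["foo"].upper()',
    "event = json.dumps(event).encode()",
]

# One shared namespace, like the notebook's global scope.
namespace: dict = {}

# Setup section: executed a single time at launch.
for cell in setup_cells:
    exec(cell, namespace)


def handle(raw_event: bytes) -> bytes:
    """Set `event`, rerun every pipeline cell, and return the final `event`,
    which is the value that would be passed on to the sink."""
    namespace["event"] = raw_event
    for cell in pipeline_cells:
        exec(cell, namespace)
    return namespace["event"]


print(handle(b'{"foo":"bap"}'))  # b'{"foo": "BAP"}'
```

Names defined during setup, such as `json` and `some_constant`, stay available to every later run because the namespace is shared.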

In [10]:
event = json.loads(event.decode("utf8"))
event

{'foo': 'bap'}

In [11]:
event["foo"] = event["foo"].upper()
event

{'foo': 'BAP'}

In [12]:
event["foo"] = (" " * some_constant).join(reversed(list(event["foo"])))
event

{'foo': 'P   A   B'}

In [13]:
event = json.dumps(event).encode()
event

b'{"foo": "P   A   B"}'
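To check the transformation outside the notebook, the four pipeline cells above can be collapsed into a single function. This is a sketch for offline testing only; the function name `transform` and its `some_constant` parameter are hypothetical and not part of BSPump.

```python
import json


def transform(event: bytes, some_constant: int = 3) -> bytes:
    """Hypothetical wrapper around the pipeline cells above."""
    # Decode the raw bytes from the source into a dict.
    data = json.loads(event.decode("utf8"))
    # Uppercase the value, then reverse it and space the characters out.
    data["foo"] = data["foo"].upper()
    data["foo"] = (" " * some_constant).join(reversed(list(data["foo"])))
    # Encode back to bytes, ready for the sink.
    return json.dumps(data).encode()


print(transform(b'{"foo":"bap"}'))  # b'{"foo": "P   A   B"}'
```

Keeping the logic in a plain function like this makes it easy to try several sample events without a running Kafka connection.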