The Slurry microframework is a foundation for building reactive data processing applications. Slurry provides building blocks that let you take one or more asynchronous, event-based data sources, process them using a variety of configurable components called sections, and feed the processed data to one or more consumers. One of the key features of Slurry is that processing occurs asynchronously, in a task-based fashion. This allows advanced time-based components to be built, such as sections that group input based on timers.
Slurry is inspired by a mix of ideas from programming paradigms such as functional programming, IoT data processing frameworks like Node-RED, and graphical data science frameworks like KNIME and Alteryx. From the Python world, Slurry takes inspiration from iteration libraries like the built-in itertools module. When combined with asynchronous features, those libraries have led to more advanced tools like aiostream, eventkit and asyncitertools. Especially aiostream deserves credit as an inspiration for many of the basic stream processing sections that Slurry provides out of the box.
Slurry is built on top of the Trio asynchronous concurrency library. Most data processing components are expected to use the async/await features introduced in Python 3.5, with coroutines running inside a Trio task. However, functionality is also provided that allows synchronous blocking operations to occur, either in threads or in separate processes (with some standard limitations).
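Slurry itself runs blocking work via Trio; as a rough, hypothetical analogy using only the standard library (asyncio standing in for Trio), offloading a blocking call to a worker thread so the event loop stays responsive looks like this:

```python
import asyncio
import time

def blocking_work(x):
    """A synchronous, blocking computation (stand-in for legacy code)."""
    time.sleep(0.01)
    return x * x

async def main():
    # Offload the blocking call to a worker thread; other tasks keep running.
    return await asyncio.to_thread(blocking_work, 7)

result = asyncio.run(main())
print(result)  # 49
```

Process-based offloading follows the same pattern with an executor, subject to the usual pickling limitations hinted at above.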
Slurry does not try to extend the Python syntax by overloading operators. Everything is plain Python with async/await syntax.
Oh, one final thing... Maybe you are wondering: why 'Slurry'? Well, Wikipedia says "A slurry is a mixture of solids denser than water suspended in liquid, usually water", and if you think about it, data is kind of like solids of different shapes and sizes, and it flows through pipelines, kind of like a slurry. So there we go.
The Pipeline <slurry.Pipeline> class is a composable stream processor. It consists of a chain of PipelineSection compatible objects, each of which handles a single stream processing operation. Slurry contains a helper function called weld that takes care of composing (welding) a pipeline out of the individual sections. A PipelineSection is any object that is valid input to the weld function. This currently includes the following types:
- AsyncIterables
  Async iterables are valid only as the very first PipelineSection. Subsequent sections will use this async iterable as their input source. Placing an AsyncIterable in the middle of a sequence of pipeline sections will cause a ValueError.
- Sections
  Any Section <slurry.sections.abc.Section> abc subclass is a valid PipelineSection, at any position in the pipeline.
- Tuples
  Pipeline sections can be nested to any level by supplying a Tuple[PipelineSection, ...] containing one or more pipeline section-compatible objects. Output from upstream sections is automatically used as input to the nested sequence of pipeline sections.
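To make the composition rules above concrete, here is a minimal, hypothetical sketch of a weld-like function using stdlib asyncio. It is not Slurry's implementation; `weld_sketch`, `numbers` and `double` are invented names for illustration only:

```python
import asyncio

def weld_sketch(*sections, source=None):
    """Compose pipeline sections left-to-right into one async iterable."""
    stream = source
    for section in sections:
        if isinstance(section, tuple):
            # Tuples nest: the upstream output feeds the nested sequence.
            stream = weld_sketch(*section, source=stream)
        elif hasattr(section, "__aiter__"):
            # Async iterables are only valid as the very first section.
            if stream is not None:
                raise ValueError("AsyncIterable not allowed mid-pipeline")
            stream = section
        else:
            # A section is modeled as a callable that transforms its input stream.
            stream = section(stream)
    return stream

async def numbers():
    for i in range(5):
        yield i

def double(stream):
    async def run():
        async for item in stream:
            yield item * 2
    return run()

async def main():
    # An async iterable source, followed by a nested (tuple) section.
    return [item async for item in weld_sketch(numbers(), (double,))]

result = asyncio.run(main())
print(result)  # [0, 2, 4, 6, 8]
```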
Note
The weld function is part of the developer API. See slurry.sections.weld.weld for more information.
The stream processing results are accessed by calling Pipeline.tap() <slurry.Pipeline.tap> to create an output channel. Each pipeline can have multiple open taps, each receiving a copy of the output stream.
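The "each tap receives a copy" behavior can be sketched with stdlib asyncio queues standing in for output channels. This is an illustrative model, not Slurry's tap implementation:

```python
import asyncio

async def broadcast(source, taps):
    """Copy every item from the source stream into each tap queue."""
    async for item in source:
        for q in taps:
            await q.put(item)
    for q in taps:
        await q.put(None)  # sentinel: stream ended

async def numbers():
    for i in range(3):
        yield i

async def consume(q):
    items = []
    while (item := await q.get()) is not None:
        items.append(item)
    return items

async def main():
    tap_a, tap_b = asyncio.Queue(), asyncio.Queue()
    # Both consumers run concurrently and each sees the full stream.
    _, a, b = await asyncio.gather(
        broadcast(numbers(), [tap_a, tap_b]),
        consume(tap_a),
        consume(tap_b),
    )
    return a, b

a, b = asyncio.run(main())
print(a, b)  # [0, 1, 2] [0, 1, 2]
```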
The pipeline can also be extended dynamically with new pipeline sections using Pipeline.extend() <slurry.Pipeline.extend>, adding additional processing.
slurry.Pipeline
Sections are individual processing steps that can be applied to an asynchronous stream of data. Sections receive input from the previous section, or from an asynchronous iterable, process it, and send the result to the section output. To use Slurry, all the user has to do is configure these sections and decide on the ordering of each step in the pipeline. The Pipeline takes care of wiring the sections together, so the output of each section gets routed to the input of the subsequent one.
Behind the scenes, data is sent between sections using message passing via Trio memory channels. Each section is executed as a Trio task and, from the user's perspective, sections are in principle completely non-blocking and independent of each other.
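The task-per-section, channel-connected design can be sketched with stdlib asyncio (queues standing in for Trio memory channels). All names here are hypothetical illustrations, not Slurry internals:

```python
import asyncio

DONE = object()  # end-of-stream sentinel

async def section_task(func, inbox, outbox):
    """Run one section as an independent task: read, transform, forward."""
    while (item := await inbox.get()) is not DONE:
        await outbox.put(func(item))
    await outbox.put(DONE)

async def produce(outbox, items):
    for item in items:
        await outbox.put(item)
    await outbox.put(DONE)

async def main():
    q1, q2, q3 = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    # Two sections chained by queues, each running as its own task.
    tasks = [
        asyncio.create_task(produce(q1, [1, 2, 3])),
        asyncio.create_task(section_task(lambda x: x + 1, q1, q2)),
        asyncio.create_task(section_task(lambda x: x * 10, q2, q3)),
    ]
    out = []
    while (item := await q3.get()) is not DONE:
        out.append(item)
    await asyncio.gather(*tasks)
    return out

result = asyncio.run(main())
print(result)  # [20, 30, 40]
```

Because every stage only awaits on its own channels, no stage blocks another, which is the property the paragraph above describes.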
Slurry includes a library of ready-made sections, with functionality inspired by other reactive frameworks. They are documented below. For more information about sections, and how to build your own, see the developer documentation.
Most ready-made sections support an optional source parameter. In most cases this is semantically identical to using that source as an async iterable input to the pipeline; however, using the source parameter instead may sometimes be more readable.
Some sections, like slurry.sections.Zip, support multiple inputs, which must be supplied as parameters.
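A Zip-style combiner pairs one item from each input per output. As a hedged sketch of the idea (stdlib asyncio only; `azip` is an invented name, not Slurry's Zip):

```python
import asyncio

async def azip(*sources):
    """Yield tuples with one item from each source; stop at the shortest."""
    iters = [s.__aiter__() for s in sources]
    while True:
        try:
            # Pull the next item from every input before emitting a tuple.
            yield tuple([await it.__anext__() for it in iters])
        except StopAsyncIteration:
            return  # one input is exhausted; the zipped stream ends

async def letters():
    for c in "ab":
        yield c

async def nums():
    for n in (1, 2, 3):
        yield n

async def main():
    return [pair async for pair in azip(letters(), nums())]

result = asyncio.run(main())
print(result)  # [('a', 1), ('b', 2)]
```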
slurry.sections._refiners
slurry.sections.Map
Note
Although individual sections can be thought of as running independently, this is not a guarantee. Slurry may now, or at any later time, choose to apply certain optimizations, such as merging a sequence of strictly functional operations like slurry.sections.Map into a single operation.
slurry.sections._filters
slurry.sections.Skip
slurry.sections.SkipWhile
slurry.sections.Filter
slurry.sections.Changes
slurry.sections.RateLimit
slurry.sections._buffers
slurry.sections.Window
slurry.sections.Group
slurry.sections.Delay
slurry.sections._producers
slurry.sections.Repeat
slurry.sections.Metronome
slurry.sections.InsertValue
slurry.sections._combiners
slurry.sections.Chain
slurry.sections.Merge
slurry.sections.Zip
slurry.sections.ZipLatest