
User Guide

Introduction

The Slurry microframework is a foundation for building reactive data processing applications. Slurry provides building blocks that let you take one or more asynchronous, event-based data sources, process them using a variety of configurable components called sections, and feed the processed data to one or more consumers. One of the key features of Slurry is that processing occurs asynchronously, in a task-based fashion. This allows advanced time-based components to be built, for example sections that group input based on timers.

Slurry is inspired by a mix of ideas from programming paradigms such as functional programming, IoT data processing frameworks like Node-RED, and graphical data science frameworks like KNIME and Alteryx. From the Python world, Slurry takes inspiration from iteration libraries like the built-in itertools module. Combined with asynchronous features, those libraries have led to more advanced tools like aiostream, eventkit and asyncitertools. Aiostream especially deserves credit as an inspiration for many of the basic stream processing sections that Slurry provides out of the box.

Slurry is built on top of the Trio asynchronous concurrency library. Most data processing components are expected to use the async/await features introduced in Python 3.5, with coroutines running inside a Trio task. However, functionality is provided that allows synchronous blocking operations to occur, either in threads or in separate processes (with some standard limitations).

Slurry does not try to extend the Python syntax by overloading operators. Everything is plain Python with async/await syntax.

Oh, one final thing... Maybe you are wondering: 'Why Slurry?' Well, Wikipedia says "A slurry is a mixture of solids denser than water suspended in liquid, usually water", and if you think about it, data is kind of like solids of different shapes and sizes, and it flows through pipelines, kind of like a slurry. So there we go.

Pipelines

The Pipeline <slurry.Pipeline> class is a composable stream processor. It consists of a chain of PipelineSection compatible objects, each of which handles a single stream processing operation. Slurry contains a helper function called weld, which takes care of composing (welding) a pipeline out of the individual sections. A PipelineSection is any object that is valid input to the weld function. This currently includes the following types:

AsyncIterables

Async iterables are valid only as the very first PipelineSection. Subsequent sections will use this async iterable as their input source. Placing an AsyncIterable in the middle of a sequence of pipeline sections will cause a ValueError.

Sections

Any subclass of the Section <slurry.sections.abc.Section> ABC is a valid PipelineSection, at any position in the pipeline.

Tuples

Pipeline sections can be nested to any level by supplying a Tuple[PipelineSection, ...] containing one or more pipeline section-compatible objects. Output from the upstream section is automatically used as input to the nested sequence of pipeline sections.

Note

The weld function is part of the developer api. See slurry.sections.weld.weld for more information.
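Putting these building blocks together, a pipeline composed from all three kinds of PipelineSection might look like the following sketch. The Map and Filter signatures shown are assumptions here, see the section reference below; output is consumed through a tap, described next:

    import trio
    from slurry import Pipeline
    from slurry.sections import Map, Filter

    async def numbers():
        # An async iterable is only valid as the very first section.
        for i in range(10):
            yield i

    async def main():
        async with Pipeline.create(
            numbers(),                     # AsyncIterable: the input source
            Map(lambda x: x * 2),          # Section: a single processing step
            (Filter(lambda x: x > 5),      # Tuple: a nested sequence of
             Map(str)),                    #   sections, welded as one unit
        ) as pipeline, pipeline.tap() as results:
            async for item in results:
                print(item)

    trio.run(main)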

The stream processing results are accessed by calling Pipeline.tap() <slurry.Pipeline.tap> to create an output channel. Each pipeline can have multiple open taps, each receiving a copy of the output stream.

The pipeline can also be extended dynamically with new pipeline sections using Pipeline.extend() <slurry.Pipeline.extend>, adding further processing to the stream.
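For example, two consumers might tap the same pipeline concurrently. A sketch, assuming taps behave like Trio receive channels (their exact semantics are documented in the API reference):

    import trio
    from slurry import Pipeline
    from slurry.sections import Map

    async def numbers():
        for i in range(5):
            yield i

    async def consume(name, tap):
        # Each tap receives its own copy of the output stream.
        async with tap:
            async for item in tap:
                print(name, item)

    async def main():
        async with Pipeline.create(numbers(), Map(lambda x: x * x)) as pipeline:
            async with trio.open_nursery() as nursery:
                nursery.start_soon(consume, 'a', pipeline.tap())
                nursery.start_soon(consume, 'b', pipeline.tap())

    trio.run(main)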

slurry.Pipeline

Sections

Sections are individual processing steps that can be applied to an asynchronous stream of data. Each section receives input from the previous section, or from an asynchronous iterable, processes it, and sends the result to the section output. To use Slurry, all the user has to do is configure these sections and decide on the ordering of the steps in the pipeline. The Pipeline takes care of wiring the sections together, so that the output of one section gets routed to the input of the next.

Behind the scenes, data is sent between sections using message passing via Trio memory channels. Each section is executed as a Trio task and, from the user's perspective, sections are in principle completely non-blocking and independent of each other.

Slurry includes a library of ready-made sections, with functionality inspired by other reactive frameworks. They are documented below. For more information about sections, and how to build your own, see the developer guide.

Most ready-made sections support an optional source parameter. In most cases, this is semantically identical to using that source as an async iterable input to the pipeline, but using the source parameter instead may sometimes be more readable.

Some sections, like slurry.sections.Zip, support multiple inputs, which must be supplied as parameters.
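A rough sketch of both conventions (the source keyword comes from the description above; the remaining signatures are assumptions, see the reference below):

    import trio
    from slurry import Pipeline
    from slurry.sections import Map, Zip

    async def letters():
        for c in 'abc':
            yield c

    async def numbers():
        for i in range(3):
            yield i

    async def main():
        # Passing the source as the first pipeline section...
        async with Pipeline.create(numbers(), Map(str)) as p, p.tap() as out:
            print([item async for item in out])

        # ...is, in most cases, equivalent to the source parameter:
        async with Pipeline.create(Map(str, source=numbers())) as p, p.tap() as out:
            print([item async for item in out])

        # Multiple-input sections take their sources as parameters:
        async with Pipeline.create(Zip(numbers(), letters())) as p, p.tap() as out:
            print([item async for item in out])  # e.g. (0, 'a'), (1, 'b'), ...

    trio.run(main)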

Transforming input

slurry.sections._refiners

slurry.sections.Map

Note

Although individual sections can be thought of as running independently, this is not a guarantee. Slurry may now, or at any later time, choose to apply certain optimizations, such as merging a sequence of strictly functional operations like slurry.sections.Map into a single operation.
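To make the note concrete: because the two maps below are strictly functional, fusing them does not change the observable output, which is what would make such an optimization legal (the Map(func) signature is an assumption; see the reference above):

    from slurry.sections import Map

    # A sequence of two pure maps...
    two_steps = (Map(lambda x: x + 1), Map(lambda x: x * 2))

    # ...has the same observable output as one fused map:
    one_step = Map(lambda x: (x + 1) * 2)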

Filtering input

slurry.sections._filters

slurry.sections.Skip

slurry.sections.SkipWhile

slurry.sections.Filter

slurry.sections.Changes

slurry.sections.RateLimit
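A rough usage sketch for some of the filtering sections. The constructor signatures here are assumptions; consult the generated reference above:

    import trio
    from slurry import Pipeline
    from slurry.sections import Skip, Filter, Changes

    async def readings():
        for value in [1, 1, 2, 3, 3, 4, 5, 6]:
            yield value

    async def main():
        async with Pipeline.create(
            readings(),
            Skip(2),                       # assumed: drop the first 2 items
            Changes(),                     # assumed: drop consecutive duplicates
            Filter(lambda x: x % 2 == 0),  # keep items matching the predicate
        ) as pipeline, pipeline.tap() as results:
            print([item async for item in results])

    trio.run(main)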

Buffering input

slurry.sections._buffers

slurry.sections.Window

slurry.sections.Group

slurry.sections.Delay
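A sketch of the buffering sections, under the assumption that Window(n) emits the n most recent items and Delay(seconds) holds items back; check the reference above for the actual semantics:

    import trio
    from slurry import Pipeline
    from slurry.sections import Window, Delay

    async def numbers():
        for i in range(5):
            yield i

    async def main():
        async with Pipeline.create(
            numbers(),
            Window(3),    # assumed: emit a window of the 3 most recent items
            Delay(0.5),   # assumed: pass items downstream after a delay
        ) as pipeline, pipeline.tap() as results:
            async for window in results:
                print(window)

    trio.run(main)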

Generating new output

slurry.sections._producers

slurry.sections.Repeat

slurry.sections.Metronome

slurry.sections.InsertValue
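The producer sections generate output on their own, typically on a timer. A minimal sketch using Repeat, mirroring the project README (the default keyword is assumed from there):

    import trio
    from slurry import Pipeline
    from slurry.sections import Repeat

    async def main():
        # Emits 'hello, world!' once per second until cancelled.
        async with Pipeline.create(
            Repeat(1, default='hello, world!')
        ) as pipeline, pipeline.tap() as results:
            async for item in results:
                print(item)

    trio.run(main)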

Combining multiple inputs

slurry.sections._combiners

slurry.sections.Chain

slurry.sections.Merge

slurry.sections.Zip

slurry.sections.ZipLatest
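Finally, a sketch contrasting two of the combiners: Merge interleaves items from its sources as they arrive, while Chain would exhaust one source before starting the next. Signatures are assumptions; see the reference above:

    import trio
    from slurry import Pipeline
    from slurry.sections import Merge

    async def fast():
        for i in range(5):
            yield ('fast', i)
            await trio.sleep(0.1)

    async def slow():
        for i in range(3):
            yield ('slow', i)
            await trio.sleep(0.3)

    async def main():
        # Merge takes its input sources as parameters and interleaves
        # their items in arrival order.
        async with Pipeline.create(Merge(fast(), slow())) as pipeline, \
                pipeline.tap() as results:
            async for item in results:
                print(item)

    trio.run(main)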