v0.18.0

@whoahbot released this 20 Dec 22:32

Overview

  • Support for schema registries, through bytewax.connectors.kafka.registry.RedpandaSchemaRegistry and bytewax.connectors.kafka.registry.ConfluentSchemaRegistry.

  • Custom Kafka operators in bytewax.connectors.kafka.operators:
    input, output, deserialize_key, deserialize_value, deserialize,
    serialize_key, serialize_value and serialize.

  • Breaking change KafkaSource now emits a special KafkaSourceMessage to allow access to all data on consumed messages. KafkaSink now consumes KafkaSinkMessage to allow setting additional fields on produced messages.

  • Non-linear dataflows are now possible. Each operator method returns
    a handle to the Streams it produces; add further steps by calling
    operator functions on those returned handles, not on the root
    Dataflow. See the migration guide for more info.

  • Auto-complete and type hinting on operators, inputs, outputs,
    streams, and logic functions now work.

  • A ton of new operators: collect_final, count_final,
    count_window, flatten, inspect_debug, join, join_named,
    max_final, max_window, merge, min_final, min_window,
    key_on, key_assert, key_split, and unary. Documentation
    for all operators is in bytewax.operators now.

  • New operators can be defined in Python by grouping existing
    operators. See the bytewax.dataflow module docstring for more info.

  • Breaking change Operators are now stand-alone functions; import bytewax.operators as op and use e.g. op.map("step_id", upstream, lambda x: x + 1).

  • Breaking change All operators must take a step_id argument now.

  • Breaking change fold and reduce operators have been renamed to
    fold_final and reduce_final. They now only emit on EOF and are
    only for use in batch contexts.

  • Breaking change batch operator renamed to collect, so as to
    not be confused with runtime batching. Behavior is unchanged.

  • Breaking change output operator no longer forwards its items
    downstream. Add operators on the upstream handle instead.

  • next_batch on input partitions can now return any Iterable, not
    just a List.

  • inspect operator now has a default inspector that prints out items
    with the step ID.

  • collect_window operator now can collect into sets and dicts.

  • Adds a get_fs_id argument to {Dir,File}Source to allow handling
    non-identical files per worker.

  • Adds TestingSource.EOF and TestingSource.ABORT sentinel values
    you can use to test recovery.

  • Breaking change Adds a datetime argument to
    FixedPartitionSource.build_part, DynamicSource.build_part,
    StatefulSourcePartition.next_batch, and
    StatelessSourcePartition.next_batch. You can now use this to
    update your next_awake time easily.

  • Breaking change Window operators now emit WindowMetadata objects
    downstream. These objects can be used to introspect the open_time
    and close_time of windows. This changes the output type of windowing
    operators from (key, values) to (key, (metadata, values)).

  • Breaking change IO classes and connectors have been renamed to
    better reflect their semantics and match up with documentation.

  • Moves the ability to start multiple Python processes with the
    -p or --processes flag to the bytewax.testing module.

  • Breaking change SimplePollingSource moved from
    bytewax.connectors.periodic to bytewax.inputs since it is an
    input helper.

  • SimplePollingSource's align_to argument now works.
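
As an illustration of the window output change listed above, where items move from (key, values) to (key, (metadata, values)), here is a minimal sketch of how downstream logic might unpack the new shape. It uses only the standard library so it runs without Bytewax installed. StandInWindowMetadata is a hypothetical stand-in that models only the open_time and close_time fields named in these notes, and summarize is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, List, Tuple


@dataclass(frozen=True)
class StandInWindowMetadata:
    """Hypothetical stand-in for WindowMetadata (assumption: only the
    open_time and close_time fields from the release notes are modeled)."""

    open_time: datetime
    close_time: datetime


def summarize(
    item: Tuple[str, Tuple[StandInWindowMetadata, List[Any]]],
) -> Tuple[str, float, int]:
    """Unpack a 0.18-style window item into (key, window seconds, count)."""
    # The extra nesting level is the breaking change: values now sit
    # next to the metadata inside the second tuple element.
    key, (metadata, values) = item
    seconds = (metadata.close_time - metadata.open_time).total_seconds()
    return key, seconds, len(values)


open_time = datetime(2023, 12, 20, 22, 0, tzinfo=timezone.utc)
meta = StandInWindowMetadata(open_time, open_time + timedelta(minutes=1))

# Before 0.18 the same window would have arrived as ("user_a", [1, 2, 3]).
new_style_item = ("user_a", (meta, [1, 2, 3]))

result = summarize(new_style_item)
print(result)  # ('user_a', 60.0, 3)
```

Logic written for the old (key, values) shape would fail or silently misbehave on the new items, so any step directly downstream of a windowing operator is a likely place to check when upgrading.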

New Contributors

  • @cra made their first contribution in #325

Full Changelog: v0.17.1...v0.18.0