Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Support stream/table joins #177

Closed
dberardo-com opened this issue Nov 28, 2022 · 3 comments
Closed

[FEATURE] Support stream/table joins #177

dberardo-com opened this issue Nov 28, 2022 · 3 comments
Labels
question Further information is requested

Comments

@dberardo-com
Copy link

Is it possible to use bytewax for joining content of different kafka topics (similar to what ksqldb is doing) ?

doing this will be an example of integrating "persistent queries" (permanent background processes that never stops). is this a good use case for bytewax?

also comparing to ksqldb, what happens if the bytewax workers are killed? will those persistent queries restart automatically when workers come back up and will they use the latest/earliest committed offset on the kafka topics ? or is the restart manual?

cheers

@github-actions github-actions bot added the needs triage New issue, needs triage label Nov 28, 2022
@awmatheson
Copy link
Contributor

👋 @dberardo-com, I will try and answer your questions below.

Is it possible to use bytewax for joining content of different kafka topics (similar to what ksqldb is doing)?

Yes, you can join streams together in Bytewax. The caveat is that the native Kafka connector (KafkaInputConfig) today does not provide this functionality and you would have to use the ManualInputConfig.

Today, a dataflow (like this example) can be written with the ManualInputConfig functionality that would allow the dataflow to consume from multiple topics and join the streams together. Consuming from kafka with the ManualInputConfig will require you to manage offsets and state recovery in the manual input.

doing this will be an example of integrating "persistent queries" (permanent background processes that never stops). is this a good use case for bytewax?

Yes, persistent queries, if I understand what you mean, are a good use case for Bytewax and Stateful operators (stateful_map, reduce_window, etc.) would allow for this type of behavior. You can see a very rudimentary example of the persistent query across multiple streams in the linked example above.

also comparing to ksqldb, what happens if the bytewax workers are killed? will those persistent queries restart automatically when workers come back up and will they use the latest/earliest committed offset on the kafka topics ? or is the restart manual?

If a worker dies and you have recovery enabled, you will be able to restart the workflow and recover the state and it will start at the appropriate offset automatically. If you are using Bytewax on k8s or as a service via (waxctl)[https://www.bytewax.io/docs/deployment/waxctl] you will be able to restart automatically as well.

@awmatheson awmatheson added question Further information is requested and removed needs triage New issue, needs triage labels Nov 28, 2022
@colebaileygit
Copy link

Would be great if this could be done more cleanly, e.g. having two different inputs in a flow which can be transformed independently, and then later keyed and joined. Otherwise the whole paradigm is untyped in python and would require messy if blocks 🤔

@davidselassie
Copy link
Contributor

This is now cleanly possible in the latest version of Bytewax https://github.com/bytewax/bytewax/releases/tag/v0.18.0 . It supports having multiple independent input sources and an explicit join operator. See our documentation on joins for how this works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants