-
I am building a service that takes the lineage and metrics captured by the Spline Spark Agent and publishes them into a target metadata store. The agent has worked well so far. The idea is to reuse the existing KafkaLineageDispatcher, but since it sends the ExecutionPlan and the ExecutionEvent as two separate messages, I would like to understand why they are not sent together, i.e. as a single Kafka message containing both the "plan" and "event" parts. Before I start implementing either a variant of KafkaLineageDispatcher that combines the ExecutionPlan and ExecutionEvent into one message, or a custom utility that joins the two messages from a topic of my choice, could somebody clarify why the messages are sent separately? Is this design driven by a specific set of edge cases? In the existing codebase, I am referring to the following:
|
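For context, what I mean by a combined message is sketched below. This is only an illustration: the envelope shape and the `plan`/`event` field names are my own assumption, not Spline's actual serialization format.

```python
import json

def combine_lineage_message(plan: dict, event: dict) -> bytes:
    """Wrap an ExecutionPlan and its ExecutionEvent in a single Kafka
    message payload (hypothetical envelope, not Spline's real schema)."""
    envelope = {"plan": plan, "event": event}
    return json.dumps(envelope).encode("utf-8")

# A producer would then send this one payload instead of two messages.
payload = combine_lineage_message(
    {"id": "plan-1", "operations": []},
    {"planId": "plan-1", "timestamp": 1700000000000},
)
decoded = json.loads(payload)
```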
-
Currently they could be sent in one message, but there is a plan to make use of the separation in the future. If the same job is executed twice, it would be enough to send the plan just once and then two events, one for each execution of the job. A similar approach could be used for streaming, where one job runs in micro-batches: the event could be used to update the time of the last micro-batch execution.
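That deduplication is what a consumer-side join would need to handle: one plan message may be followed by many event messages referencing it. A minimal sketch of such a join, assuming hypothetical message shapes where each event carries a `planId` field pointing at its plan:

```python
from collections import defaultdict

def join_plans_and_events(messages):
    """Join lineage messages read from a topic: each plan is stored once
    by its id, and every event is attached to its plan via the (assumed)
    `planId` field, yielding one joined record per event."""
    plans = {}                   # planId -> plan payload (sent only once)
    events = defaultdict(list)   # planId -> all events seen for that plan
    for kind, payload in messages:
        if kind == "plan":
            plans[payload["id"]] = payload
        elif kind == "event":
            events[payload["planId"]].append(payload)
    return [
        {"plan": plans[pid], "event": ev}
        for pid, evs in events.items()
        for ev in evs
        if pid in plans
    ]

# The same job executed twice: one plan message, two event messages.
stream = [
    ("plan", {"id": "p1"}),
    ("event", {"planId": "p1", "timestamp": 1}),
    ("event", {"planId": "p1", "timestamp": 2}),
]
joined = join_plans_and_events(stream)
```

In a real consumer the plan buffer would need eviction or persistence, since an event can arrive long after (or, with partitioning, out of order with) its plan.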
-
@rkrumins - can you share whether you succeeded in loading lineage data from Spline into a Kafka topic?