-
I am building a service that takes the lineage and metrics captured by the Spline Spark Agent and publishes them into a target metadata store. The agent has worked well so far. The idea is to reuse the existing KafkaLineageDispatcher, but since it sends the ExecutionPlan and the ExecutionEvent as two separate messages, I would like to understand why they are not sent together, i.e. as a single Kafka message containing both the "plan" and "event" parts. Before I start implementing either a variant of KafkaLineageDispatcher that combines the ExecutionPlan and ExecutionEvent into one message, or a custom utility that joins the two messages from a topic of my choice, could somebody clarify why the messages are sent separately? Is this design driven by a specific set of edge cases? In the existing codebase, I am referring to the following:
|
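For context, what I mean by a combined message is sketched below. This is only an illustration: the envelope shape and the `plan`/`event` field names are my own assumption, not Spline's actual serialization format.

```python
import json

def combine_lineage_message(plan: dict, event: dict) -> bytes:
    """Wrap an ExecutionPlan and its ExecutionEvent in a single Kafka
    message payload (hypothetical envelope, not Spline's real schema)."""
    envelope = {"plan": plan, "event": event}
    return json.dumps(envelope).encode("utf-8")

# A producer would then send this one payload instead of two messages.
payload = combine_lineage_message(
    {"id": "plan-1", "operations": []},
    {"planId": "plan-1", "timestamp": 1700000000000},
)
decoded = json.loads(payload)
```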
-
Currently they could be sent in one message, but there is a plan to make use of the separation in the future. If the same job is executed twice, it would be enough to send the plan just once and then two events, one for each execution of the job. A similar approach could be used for streaming, where one job runs in micro-batches: the event could be used to update the time of the last micro-batch execution.
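That deduplication is what a consumer-side join would need to handle: one plan message may be followed by many event messages referencing it. A minimal sketch of such a join, assuming hypothetical message shapes where each event carries a `planId` field pointing at its plan:

```python
from collections import defaultdict

def join_plans_and_events(messages):
    """Join lineage messages read from a topic: each plan is stored once
    by its id, and every event is attached to its plan via the (assumed)
    `planId` field, yielding one joined record per event."""
    plans = {}                   # planId -> plan payload (sent only once)
    events = defaultdict(list)   # planId -> all events seen for that plan
    for kind, payload in messages:
        if kind == "plan":
            plans[payload["id"]] = payload
        elif kind == "event":
            events[payload["planId"]].append(payload)
    return [
        {"plan": plans[pid], "event": ev}
        for pid, evs in events.items()
        for ev in evs
        if pid in plans
    ]

# The same job executed twice: one plan message, two event messages.
stream = [
    ("plan", {"id": "p1"}),
    ("event", {"planId": "p1", "timestamp": 1}),
    ("event", {"planId": "p1", "timestamp": 2}),
]
joined = join_plans_and_events(stream)
```

In a real consumer the plan buffer would need eviction or persistence, since an event can arrive long after (or, with partitioning, out of order with) its plan.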
-
@rkrumins - can you share whether you succeeded in loading lineage data from Spline into a Kafka topic?