Optimised Query to fetch attribute level dependency for all target attributes for a given execution plan for 0.6.x Spline model #936

@pratapmmmec

Description

Background [Optional]

Currently, Spline 0.6.x can track attribute-level (backward) lineage for a target data source. In the Spline UI this takes two clicks: first on the target attribute, then on Details.

Question

@wajda
We are looking for an AQL query that fetches the attribute-level dependencies for all attributes of the target data source, in an optimal way, for a given eventID.

For example, we have two datasets, Employee and Department, that go through the transformation below, where effectiveBonus is derived from Employee.bonus and Department.bonusMultiplier. What would be the optimized AQL for this case?

Employee:
empId
empName
deptId
bonus

Department:
deptId
deptName
bonusMultiplier

Code Logic:
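import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;

Dataset<Row> empDS = ...;  // Read Employee
Dataset<Row> deptDS = ...; // Read Department
// Join on deptId and derive effectiveBonus = bonus * bonusMultiplier
Dataset<Row> bonus = empDS.join(deptDS, "deptId").withColumn("effectiveBonus", col("bonus").multiply(col("bonusMultiplier")));
bonus.write().save("Final_Table"); // effectiveBonus table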

Expected result (the format is flexible, as long as it carries the required information):
Final_Table.empId, Employee.empId
Final_Table.empName, Employee.empName
Final_Table.bonus, Employee.bonus
Final_Table.deptId, Department.deptId
Final_Table.deptName, Department.deptName
Final_Table.effectiveBonus, Employee.bonus : Department.bonusMultiplier
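
To make the expected result above concrete, here is a minimal Java sketch of the computation being asked for: for each target attribute, walk backward over "derives from" edges until source attributes are reached. It does not use Spline's API or its ArangoDB schema. The class `AttributeLineage`, the in-memory edge map, and the intermediate `joined.*` attribute names are all hypothetical, introduced only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AttributeLineage {

    // Hypothetical model: attribute name -> attributes it is directly derived from.
    // An attribute with no entry is treated as a source (leaf) attribute.
    public static Set<String> resolveSources(Map<String, List<String>> derivesFrom, String attribute) {
        Set<String> sources = new TreeSet<>();   // sorted for stable output
        Set<String> visited = new HashSet<>();   // guards against revisiting shared ancestors
        Deque<String> stack = new ArrayDeque<>();
        stack.push(attribute);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!visited.add(current)) {
                continue;
            }
            List<String> parents = derivesFrom.getOrDefault(current, List.of());
            if (parents.isEmpty()) {
                sources.add(current);            // nothing upstream: this is a source attribute
            } else {
                parents.forEach(stack::push);    // keep walking backward
            }
        }
        return sources;
    }

    // Edges for the Employee/Department example; the "joined.*" names are assumed intermediates.
    public static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("Final_Table.empId", List.of("Employee.empId"));
        g.put("Final_Table.empName", List.of("Employee.empName"));
        g.put("Final_Table.bonus", List.of("Employee.bonus"));
        g.put("Final_Table.deptId", List.of("Department.deptId"));
        g.put("Final_Table.deptName", List.of("Department.deptName"));
        g.put("Final_Table.effectiveBonus", List.of("joined.bonus", "joined.bonusMultiplier"));
        g.put("joined.bonus", List.of("Employee.bonus"));
        g.put("joined.bonusMultiplier", List.of("Department.bonusMultiplier"));
        return g;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = exampleGraph();
        for (String target : g.keySet()) {
            if (target.startsWith("Final_Table.")) {
                System.out.println(target + ", " + String.join(" : ", resolveSources(g, target)));
            }
        }
    }
}
```

Resolving all target attributes in one pass over the plan's edges, rather than one request per attribute as in the UI flow, is the shape an optimized AQL traversal would presumably take as well.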
