spline #605 MergeIntoNodeBuilder: java.lang.IllegalArgumentException #606
Conversation
private lazy val mergeInputs: Seq[Seq[Attribute]] = {
  val Seq(srcAttrs, trgAttrs) = inputAttributes
  val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
  trgAttrs.map(trg => {
    val src = srcAttrsByName.get(trg.name)
    Seq(trg) ++ src
  })
}
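The pairing logic above can be sketched as a self-contained example. Note the `Attribute` case class here is a simplified stand-in for Spline's actual model, not the real API:

```scala
// Simplified sketch of the name-based pairing in mergeInputs above.
// `Attribute` is a hypothetical stand-in for Spline's attribute model.
object MergeInputsSketch {
  case class Attribute(id: String, name: String)

  def mergeInputs(srcAttrs: Seq[Attribute], trgAttrs: Seq[Attribute]): Seq[Seq[Attribute]] = {
    val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
    trgAttrs.map { trg =>
      // Pair each target attribute with the same-named source attribute,
      // if one exists; unmatched targets stand alone.
      val maybeSrc = srcAttrsByName.get(trg.name)
      Seq(trg) ++ maybeSrc
    }
  }

  def main(args: Array[String]): Unit = {
    val src = Seq(Attribute("s1", "id"), Attribute("s2", "full_name"))
    val trg = Seq(Attribute("t1", "id"), Attribute("t2", "name"))
    // "id" matches by name; "name" has no counterpart in the source,
    // which is why a plain transpose of the inputs cannot work here.
    println(mergeInputs(src, trg))
  }
}
```

This also illustrates why `transpose` fails: the two input sides need not have the same number of attributes.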
What happens when the names don't match:
| WHEN MATCHED THEN
| UPDATE SET
| name = src.full_name
I would also rename src -> maybeSrc
"I would also rename src -> maybeSrc"

Renamed.
"What happens when the names don't match"

Well, I guess nothing. Since there is no reference from a target attribute to the expression in the Spark exec plan (or at least I can't see one), we cannot do much about it. For example, from the screenshot below, how would I infer the dependencies full_name -> name or code -> name? The agent could guess based on e.g. Levenshtein distance, but that would be another feature.

Actually, thinking about it more, I realized that even if the names match it doesn't mean there is a dependency, as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a).

Do you have any better suggestion for how to do it properly?
"Actually, thinking about it more, I realized that even if the names match it doesn't mean there is a dependency, as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a)."

Exactly. I think the only way to do this properly is to get the attribute pairing from the UPDATE SET clause.
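That idea can be sketched as follows. The `Attribute` and `Assignment` case classes are hypothetical simplifications for illustration, not Spline's or Spark's actual types:

```scala
// Hypothetical sketch: derive the lineage pairing from UPDATE SET
// assignments instead of attribute names, so that mutual renaming
// like `UPDATE SET dst.a = src.b, dst.b = src.a` resolves correctly.
object UpdateSetPairingSketch {
  case class Attribute(id: String, name: String)
  // One `dst.x = src.y` clause from the MERGE statement.
  case class Assignment(target: Attribute, source: Attribute)

  def pairing(assignments: Seq[Assignment]): Map[Attribute, Attribute] =
    assignments.map(a => a.target -> a.source).toMap

  def main(args: Array[String]): Unit = {
    val dstA = Attribute("t1", "a"); val dstB = Attribute("t2", "b")
    val srcA = Attribute("s1", "a"); val srcB = Attribute("s2", "b")
    // Mutual renaming: dst.a = src.b, dst.b = src.a
    val deps = pairing(Seq(Assignment(dstA, srcB), Assignment(dstB, srcA)))
    // Name matching would wrongly link dst.a to src.a here;
    // assignment-based pairing links it to src.b.
    println(deps)
  }
}
```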
I've re-implemented the dependency resolver algorithm; it should now cover all the cases discussed above, plus sub-attribute references. Please check.
…on) to `maybeSrc`
…neageHarvester` to `OperationNodeBuilderFactory`, and delegate `MergeIntoCommand` children extraction logic to the `MergeIntoNodeBuilder` respectively.
…orythm to accommodate for renaming and sub-attributes in the MERGE clauses.
@@ -46,3 +47,7 @@ trait OperationNodeBuilder {

  def outputExprToAttMap: Map[sparkExprssions.ExprId, Attribute]
}

object OperationNodeBuilder {
  type Attributes = Seq[Attribute]
I feel like this will only decrease the readability. I don't see any problems with using Seq[Attribute] as a type.
It was primarily introduced to get rid of Seq[Seq[Attribute]] in other places.
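For illustration, a minimal sketch of the alias's effect on signatures (the `Attribute` case class is a simplified stand-in, not Spline's real model):

```scala
// Sketch: the type alias turns Seq[Seq[Attribute]] into Seq[Attributes],
// which reads as "a sequence of attribute lists" at call sites.
object AliasSketch {
  case class Attribute(id: String, name: String) // simplified stand-in

  object OperationNodeBuilder {
    type Attributes = Seq[Attribute]
  }
  import OperationNodeBuilder.Attributes

  // Before the alias: def inputAttributes: Seq[Seq[Attribute]]
  // After the alias:
  def inputAttributes: Seq[Attributes] = Seq(Seq(Attribute("1", "id")))

  def main(args: Array[String]): Unit =
    println(inputAttributes)
}
```

A type alias like this is purely cosmetic: `Attributes` and `Seq[Attribute]` are interchangeable to the compiler, which is why it is a readability judgment call either way.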
  extra = Map(CommonExtras.Synthetic -> true),
  name = attr1.name
)
override lazy val outputAttributes: Seq[Attribute] = {
Will this method work when the input attributes are synthetic? I think in that case the proper way is to take inputAttributes and use their IDs, because they may differ from the Spark ones.
… as agreed on PR discussion. Use the type alias in all places where appropriate.
Kudos, SonarCloud Quality Gate passed!
fixes #605

- MergeIntoNodeBuilder: replaced inputAttribute.transpose with a mapping algorithm based on attribute names. Transpose cannot be used for that purpose in this place, as the MERGE command can have inputs with different numbers of attributes.
- Updated DeltaMergeDSV2Job and DeltaDSV2Spec accordingly.
- DeltaMergeDSV2Job: fixed missing string interpolation.