
MergeIntoNodeBuilder: java.lang.IllegalArgumentException #606

Merged

Conversation

@wajda (Contributor) commented Feb 16, 2023

fixes #605

  • In MergeIntoNodeBuilder, replaced inputAttributes.transpose with a mapping algorithm based on attribute names. transpose cannot be used for this purpose here, as the MERGE command can have inputs with different numbers of attributes.
  • Updated DeltaMergeDSV2Job and DeltaDSV2Spec accordingly.
  • Fixed a minor bug in DeltaMergeDSV2Job (a missing string interpolation).
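To illustrate the first point, here is a minimal, self-contained sketch (the `Attribute` case class and the sample data are stand-ins, not Spline's actual model) of why `transpose` fails on MERGE inputs of different widths while the name-based mapping does not:

```scala
// Hypothetical stand-in for Spline's attribute model.
case class Attribute(name: String)

// A MERGE can have a source and a target with different numbers of attributes.
val srcAttrs = Seq(Attribute("id"), Attribute("full_name"))
val trgAttrs = Seq(Attribute("id"), Attribute("name"), Attribute("code"))

// `Seq(...).transpose` requires equally sized inner sequences, so this throws
// java.lang.IllegalArgumentException (the failure reported in #605):
val transposed = scala.util.Try(Seq(srcAttrs, trgAttrs).transpose)

// The name-based mapping tolerates the width mismatch: each target attribute
// is paired with a same-named source attribute when one exists.
val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
val mergeInputs: Seq[Seq[Attribute]] =
  trgAttrs.map { trg =>
    val maybeSrc = srcAttrsByName.get(trg.name)
    Seq(trg) ++ maybeSrc
  }
```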

@wajda wajda requested a review from cerveada February 16, 2023 19:11
@wajda wajda force-pushed the bugfix/spline-spark-agent-605-merge-attr-transpose branch from 76397f5 to 62244c9 Compare February 16, 2023 19:42
@wajda wajda linked an issue Feb 16, 2023 that may be closed by this pull request
Comment on lines 32 to 39
private lazy val mergeInputs: Seq[Seq[Attribute]] = {
  val Seq(srcAttrs, trgAttrs) = inputAttributes
  val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
  trgAttrs.map(trg => {
    val src = srcAttrsByName.get(trg.name)
    Seq(trg) ++ src
  })
}
@cerveada (Contributor) commented Feb 17, 2023
What happens when the names don't match:

  | WHEN MATCHED THEN
  |   UPDATE SET
  |     name = src.full_name

I would also rename src -> maybeSrc

@wajda (Contributor, Author)

I would also rename src -> maybeSrc

renamed

What happens when the names don't match

Well, I guess nothing. Since there is no reference from a target attribute to the corresponding expression in the Spark execution plan (or at least I can't see one), we cannot do much about it. For example, from the screenshot below, how would I infer the dependencies full_name -> name or code -> name? The agent could guess based on e.g. Levenshtein distance, but that would be a separate feature.

[screenshot: Spark execution plan]

Actually, thinking about it more, I realized that even if the names match it doesn't mean there is a dependency, as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a).

Do you have a better suggestion for how to do this properly?

@cerveada (Contributor)

Actually, now thinking about it more I realized that even if names match it doesn't mean there is a dependency as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a).

Exactly. I think the only way to do this properly is to get the attribute pairing from UPDATE SET condition.
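A rough sketch of that idea, assuming a simplified `Assignment(target, source)` pair in place of Spark's actual Catalyst `Assignment` node: deriving the pairing from the UPDATE SET assignments survives mutual renaming, where name-based matching would get it wrong.

```scala
// Simplified stand-ins; Spark's real node is
// org.apache.spark.sql.catalyst.plans.logical.Assignment.
case class Attr(name: String)
case class Assignment(target: Attr, source: Attr)

// UPDATE SET dst.a = src.b, dst.b = src.a  -- mutual renaming
val assignments = Seq(
  Assignment(Attr("a"), Attr("b")),
  Assignment(Attr("b"), Attr("a"))
)

// Pair attributes by assignment, not by name:
val pairing: Map[String, String] =
  assignments.map(a => a.target.name -> a.source.name).toMap
// pairing("a") == "b": target `a` really depends on source `b`,
// which pure name matching would have reported as a -> a.
```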

@wajda (Contributor, Author)

I've re-implemented the dependency resolver algorithm; it should now cover all the cases discussed above, plus sub-attribute references. Please check.

…neageHarvester` to `OperationNodeBuilderFactory`, and delegate `MergeIntoCommand` children extraction logic to the `MergeIntoNodeBuilder` respectively.
…orithm to accommodate renaming and sub-attributes in the MERGE clauses.
@@ -46,3 +47,7 @@ trait OperationNodeBuilder {

def outputExprToAttMap: Map[sparkExprssions.ExprId, Attribute]
}

object OperationNodeBuilder {
  type Attributes = Seq[Attribute]
@cerveada (Contributor)

I feel like this will only decrease readability. I don't see any problem with using Seq[Attribute] as the type.

@wajda (Contributor, Author)

It was primarily introduced to get rid of Seq[Seq[Attribute]] in other places.
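For illustration, a minimal sketch of the alias in use (the sample data is made up):

```scala
case class Attribute(name: String)

object OperationNodeBuilder {
  type Attributes = Seq[Attribute]
}
import OperationNodeBuilder.Attributes

// The alias flattens the doubly nested type in signatures:
// Seq[Seq[Attribute]] becomes Seq[Attributes].
val inputAttributes: Seq[Attributes] = Seq(
  Seq(Attribute("id"), Attribute("full_name")), // source
  Seq(Attribute("id"), Attribute("name"))       // target
)
```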

    extra = Map(CommonExtras.Synthetic -> true),
    name = attr1.name
  )

  override lazy val outputAttributes: Seq[Attribute] = {
@cerveada (Contributor)

Will this method work when the input attributes are synthetic? I think in that case the proper way is to take inputAttributes and use their IDs, because they may differ from the Spark ones.
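A hedged sketch of the reviewer's suggestion (all names here are illustrative, not Spline's actual API): resolving output attributes through inputAttributes reuses their Spline ids, including synthetic ones, instead of re-deriving ids from Spark's expression ids.

```scala
// Illustrative stand-in for a Spline attribute carrying its own id.
case class SplineAttr(id: String, name: String)

// Input attributes as Spline sees them; an attribute like "attr-2" could be
// synthetic, so its id has no counterpart among Spark's expression ids.
val inputAttributes = Seq(
  SplineAttr("attr-1", "id"),
  SplineAttr("attr-2", "name")
)

// Attribute names the operation outputs, e.g. taken from the Spark plan.
val outputNames = Seq("id", "name")

// Look the outputs up among the inputs to keep the Spline ids stable.
val byName = inputAttributes.map(a => a.name -> a).toMap
val outputAttributes: Seq[SplineAttr] = outputNames.flatMap(byName.get)
```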

… as agreed in the PR discussion. Use the type alias in all places where appropriate.
sonarcloud bot commented Mar 3, 2023

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rated A)
  • Vulnerabilities: 0 (rated A)
  • Security Hotspots: 0 (rated A)
  • Code Smells: 0 (rated A)
  • Coverage: no information
  • Duplication: 0.0%

@wajda wajda merged commit 879fd51 into release/1.0 Mar 3, 2023
@wajda wajda deleted the bugfix/spline-spark-agent-605-merge-attr-transpose branch March 3, 2023 14:44
Development

Successfully merging this pull request may close these issues.

MergeIntoNodeBuilder: java.lang.IllegalArgumentException: transpose
2 participants