spline #605 MergeIntoNodeBuilder: java.lang.IllegalArgumentException #606
Conversation
private lazy val mergeInputs: Seq[Seq[Attribute]] = {
  val Seq(srcAttrs, trgAttrs) = inputAttributes
  val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
  trgAttrs.map(trg => {
    val src = srcAttrsByName.get(trg.name)
    Seq(trg) ++ src
  })
}
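The pairing logic above can be sketched as a self-contained example. Note the `Attribute` case class here is a simplified stand-in for Spline's actual model, not the real API:

```scala
// Simplified sketch of the name-based pairing in mergeInputs above.
// `Attribute` is a hypothetical stand-in for Spline's attribute model.
object MergeInputsSketch {
  case class Attribute(id: String, name: String)

  def mergeInputs(srcAttrs: Seq[Attribute], trgAttrs: Seq[Attribute]): Seq[Seq[Attribute]] = {
    val srcAttrsByName = srcAttrs.map(a => a.name -> a).toMap
    trgAttrs.map { trg =>
      // Pair each target attribute with the same-named source attribute,
      // if one exists; unmatched targets stand alone.
      val maybeSrc = srcAttrsByName.get(trg.name)
      Seq(trg) ++ maybeSrc
    }
  }

  def main(args: Array[String]): Unit = {
    val src = Seq(Attribute("s1", "id"), Attribute("s2", "full_name"))
    val trg = Seq(Attribute("t1", "id"), Attribute("t2", "name"))
    // "id" matches by name; "name" has no counterpart in the source,
    // which is why a plain transpose of the inputs cannot work here.
    println(mergeInputs(src, trg))
  }
}
```

This also illustrates why `transpose` fails: the two input sides need not have the same number of attributes.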
What happens when the names don't match:
| WHEN MATCHED THEN
| UPDATE SET
| name = src.full_name
I would also rename src -> maybeSrc
"I would also rename src -> maybeSrc"

Renamed.
"What happens when the names don't match"

Well, I guess nothing. Since there is no reference from a target attribute to the expression in the Spark exec plan (or at least I can't see one), we cannot do much about it. For example, from the screenshot below, how would I infer the dependencies full_name -> name or code -> name? The agent could guess based on e.g. Levenshtein distance, but that would be another feature.

Actually, thinking about it more, I realized that even if the names match it doesn't mean there is a dependency, as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a).

Do you have any better suggestion for how to do it properly?
"Actually, thinking about it more, I realized that even if the names match it doesn't mean there is a dependency, as there could be mutual renaming in the query (UPDATE SET dst.a = src.b, dst.b = src.a)."

Exactly. I think the only way to do this properly is to get the attribute pairing from the UPDATE SET clause.
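That idea can be sketched as follows. The `Attribute` and `Assignment` case classes are hypothetical simplifications for illustration, not Spline's or Spark's actual types:

```scala
// Hypothetical sketch: derive the lineage pairing from UPDATE SET
// assignments instead of attribute names, so that mutual renaming
// like `UPDATE SET dst.a = src.b, dst.b = src.a` resolves correctly.
object UpdateSetPairingSketch {
  case class Attribute(id: String, name: String)
  // One `dst.x = src.y` clause from the MERGE statement.
  case class Assignment(target: Attribute, source: Attribute)

  def pairing(assignments: Seq[Assignment]): Map[Attribute, Attribute] =
    assignments.map(a => a.target -> a.source).toMap

  def main(args: Array[String]): Unit = {
    val dstA = Attribute("t1", "a"); val dstB = Attribute("t2", "b")
    val srcA = Attribute("s1", "a"); val srcB = Attribute("s2", "b")
    // Mutual renaming: dst.a = src.b, dst.b = src.a
    val deps = pairing(Seq(Assignment(dstA, srcB), Assignment(dstB, srcA)))
    // Name matching would wrongly link dst.a to src.a here;
    // assignment-based pairing links it to src.b.
    println(deps)
  }
}
```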
I've re-implemented the dependency resolver algorithm; it should now cover all the cases discussed above, plus sub-attribute references. Please check.
…on) to `maybeSrc`
…neageHarvester` to `OperationNodeBuilderFactory`, and delegate `MergeIntoCommand` children extraction logic to the `MergeIntoNodeBuilder` respectively.
…orythm to accommodate for renaming and sub-attributes in the MERGE clauses.
@@ -46,3 +47,7 @@ trait OperationNodeBuilder {

  def outputExprToAttMap: Map[sparkExprssions.ExprId, Attribute]
}

object OperationNodeBuilder {
  type Attributes = Seq[Attribute]
I feel like this will only decrease the readability. I don't see any problems with using Seq[Attribute] as a type.
It was primarily introduced to get rid of Seq[Seq[Attribute]] in other places.
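For illustration, a minimal sketch of the alias's effect on signatures (the `Attribute` case class is a simplified stand-in, not Spline's real model):

```scala
// Sketch: the type alias turns Seq[Seq[Attribute]] into Seq[Attributes],
// which reads as "a sequence of attribute lists" at call sites.
object AliasSketch {
  case class Attribute(id: String, name: String) // simplified stand-in

  object OperationNodeBuilder {
    type Attributes = Seq[Attribute]
  }
  import OperationNodeBuilder.Attributes

  // Before the alias: def inputAttributes: Seq[Seq[Attribute]]
  // After the alias:
  def inputAttributes: Seq[Attributes] = Seq(Seq(Attribute("1", "id")))

  def main(args: Array[String]): Unit =
    println(inputAttributes)
}
```

A type alias like this is purely cosmetic: `Attributes` and `Seq[Attribute]` are interchangeable to the compiler, which is why it is a readability judgment call either way.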
  extra = Map(CommonExtras.Synthetic -> true),
  name = attr1.name
)
override lazy val outputAttributes: Seq[Attribute] = {
Will this method work when the input attributes are synthetic? I think in that case the proper way is to take inputAttributes and use their IDs, because they may differ from the Spark ones.
… as agreed on PR discussion. Use the type alias in all places where appropriate.
Kudos, SonarCloud Quality Gate passed!
fixes #605

- MergeIntoNodeBuilder: replaced inputAttribute.transpose with a mapping algorithm based on attribute names. Transpose cannot be used for that purpose in this place, as the MERGE command can have inputs with different numbers of attributes.
- Updated DeltaMergeDSV2Job and DeltaDSV2Spec accordingly.
- DeltaMergeDSV2Job: fixed missing string interpolation.