[SPARK-14354][SQL] Let Expand take name expressions and infer output attributes #12138
@@ -1659,11 +1665,12 @@ object TimeWindowing extends Rule[LogicalPlan] {
val windowEnd = windowStart + window.windowDuration

CreateNamedStruct(
Literal(WINDOW_START) :: windowStart ::
Previously we manually set the output of Expand here as `TimestampType` (`windowAttr`). As `windowStart` and `windowEnd` are producing long values, when we infer output from Expand's projections, we will get `LongType` instead of `TimestampType`. So we need to explicitly convert the `LongType` to `TimestampType`.
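The type mismatch described in this comment can be illustrated with a small toy model. This is only a sketch: `Ref`, `Add`, `Cast`, `Alias`, and `inferredType` are hypothetical stand-ins written for this example, not Spark's Catalyst classes, and the rule that `Add` always yields `LongType` is an assumption of the model.

```scala
// Toy model only: every name below is hypothetical, not a Catalyst API.
object WindowTypeSketch {
  sealed trait DataType
  case object LongType extends DataType
  case object TimestampType extends DataType

  sealed trait Expr { def dataType: DataType }
  // A column reference with a fixed type.
  case class Ref(name: String, dataType: DataType) extends Expr
  // Arithmetic on window boundaries yields a long in this model.
  case class Add(left: Expr, right: Expr) extends Expr {
    val dataType: DataType = LongType
  }
  // An explicit cast replaces the type that would otherwise be inferred.
  case class Cast(child: Expr, dataType: DataType) extends Expr

  // A named expression; the attribute type is inferred from the child.
  case class Alias(child: Expr, name: String) {
    def inferredType: DataType = child.dataType
  }

  val windowStart: Expr = Ref("windowStart", LongType)
  val windowEnd: Expr = Add(windowStart, Ref("windowDuration", LongType))

  // Without a cast, inference reports LongType for the window boundary.
  val uncast: Alias = Alias(windowEnd, "end")
  // With an explicit cast, the inferred type is TimestampType again.
  val cast: Alias = Alias(Cast(windowEnd, TimestampType), "end")

  def main(args: Array[String]): Unit = {
    println(uncast.inferredType)
    println(cast.inferredType)
  }
}
```

Once the output is inferred rather than declared, the projection itself has to carry the intended type, which is why the cast moves into the expression.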
retest this please.
ping @marmbrus @yhuai @cloud-fan
child: LogicalPlan) extends UnaryNode {
override def output: Seq[Attribute] = {
// Take the first projection as output
val preOutput = projections.head.map(_.toAttribute)
It seems wasteful to make all projections `Seq[NamedExpression]` but only use the first one to produce the output attributes.
Yeah, kind of. If we only make the first projection a `Seq[NamedExpression]`, I think it might be a little confusing.
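The trade-off discussed in this thread can be sketched with a toy model. `Attribute`, `Named`, `output`, and `consistent` below are hypothetical names written for this example, not Catalyst's classes, and the model assumes every projection names its columns identically (same name and id in each position).

```scala
// Toy model only: all names are hypothetical stand-ins for Catalyst types.
object ExpandOutputSketch {
  case class Attribute(name: String, id: Long)

  // A named expression: a value plus the name and id of the column it fills.
  case class Named(name: String, id: Long, value: Option[Int]) {
    def toAttribute: Attribute = Attribute(name, id)
  }

  // Take the first projection as the output, as in the diff above.
  def output(projections: Seq[Seq[Named]]): Seq[Attribute] =
    projections.head.map(_.toAttribute)

  // Because every projection is named, we can also check that the
  // remaining projections agree with the first one.
  def consistent(projections: Seq[Seq[Named]]): Boolean =
    projections.forall(_.map(_.toAttribute) == output(projections))

  val projections: Seq[Seq[Named]] = Seq(
    Seq(Named("a", 1L, Some(1)), Named("b", 2L, Some(10))),
    Seq(Named("a", 1L, None), Named("b", 2L, Some(10))))

  def main(args: Array[String]): Unit = {
    println(output(projections))
    println(consistent(projections))
  }
}
```

In this model only the first projection is needed to derive the output, but keeping all projections named makes a uniform check like `consistent` possible.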
If my understanding is right, the problem is: when child output changes (e.g. making them null when doing a roll up), the output of
@cloud-fan Thanks for the comment. The problem is we re-use child's output as The obvious example for this inconsistency is constraints. Previously In previous PR we just set
Yea, and in this PR we use
@cloud-fan yea. this solution looks complicated due to that I need to fit it into current usage of
@cloud-fan As you replace the placeholder with child attributes, does it mean we re-use child's output too?
no it's different when we do it in a method. Every time the child output changes,
We may not be talking about the same thing. It is not the change of child output that causes the problem. We create new attributes in For example, if the child output is [a, b, c]. Currently we set it as
It's because
Why? Every time we call
I think the output attributes of
I may be misunderstanding this problem; could you add a test case in this PR to show what was wrong before?
@cloud-fan The obviously wrong case is Expand's constraints. I've modified the test in
What changes were proposed in this pull request?
JIRA: https://issues.apache.org/jira/browse/SPARK-14354
Currently we create the `Expand` operator by specifying its projections (`Seq[Seq[Expression]]`) and its output. We allow `Expand` to reuse the child operator's attributes, which makes its constraints invalid when we change the corresponding values of these attributes (e.g., making them null when doing a roll up). We should let it take named expressions and infer the output itself.

The problem is that we re-use the child's output as `Expand`'s output. We should create new attributes in `Expand` because `Expand` actually performs multiple projections. However, we declare the projections in `Expand` as `Expression` instead of `NamedExpression` and re-use the child's output attributes. Thus there is an inconsistency between `Expand`'s output attributes and the projected values.

The obvious example of this inconsistency is constraints. Previously `Expand` inherited the child's constraints. As the projections change the child's output values (e.g., set them to null), the constraints bound to the child's attributes are no longer valid.

In a previous PR we just set `Expand`'s `validConstraints` to empty to avoid such inconsistency. But as a result, we don't have reliable constraints after the `Expand` operator.

How was this patch tested?

Modified `ConstraintPropagationSuite`.
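The constraint problem described above can be reproduced in a toy model. This is a minimal sketch: `Row`, `aIsNotNull`, `projections`, and `expanded` are hypothetical names for this example, rows are plain maps rather than Catalyst rows, and the "not null" constraint is chosen only for illustration.

```scala
// Toy model only: rows are maps and all names are hypothetical.
object ExpandConstraintSketch {
  type Row = Map[String, Option[Int]]

  // A constraint that holds for every child row here: "a" is not null.
  val aIsNotNull: Row => Boolean = row => row("a").isDefined

  val childRows: Seq[Row] = Seq(
    Map("a" -> Option(1), "b" -> Option(10)),
    Map("a" -> Option(2), "b" -> Option(20)))

  // Two projections, as in a roll up: keep the row, and null out "a".
  val projections: Seq[Row => Row] = Seq(
    row => row,
    row => row.updated("a", None))

  // Expand emits one output row per (input row, projection) pair.
  val expanded: Seq[Row] =
    childRows.flatMap(row => projections.map(p => p(row)))

  def main(args: Array[String]): Unit = {
    // The constraint holds on the child but not on Expand's output.
    println(childRows.forall(aIsNotNull))
    println(expanded.forall(aIsNotNull))
  }
}
```

If the expanded column kept the child's attribute identity, a constraint on that attribute would be inherited even though half of the output rows violate it; giving `Expand` its own output attributes avoids that mismatch.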