Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14354][SQL] Let Expand take name expressions and infer output attributes #12138

Closed
wants to merge 4 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Apr 3, 2016

What changes were proposed in this pull request?

JIRA: https://issues.apache.org/jira/browse/SPARK-14354

Currently we create Expand operator by specifying projections (Seq[Seq[Expression]]) and its output. We allow Expand to reuse child operator's attributes and so make its constraints invalid when we change the corresponding values of these attributes (e.g., making them null when doing a roll up). We should let it take name expressions and infer output itself.

The problem is we re-use child's output as Expands output. We will create new attributes in Expand because Expand actually performs multiple projections. However, we let the projections in Expand as Expression instead of NamedExpression and re-use child output attributes. Thus there is a inconsistency between Expand's output attributes and projected values.

The obvious example for this inconsistency is constraints. Previously Expand inherits child's constraints. As we will change child's output values by projections (e.g., set it as null), these constraints bound on child's attributes are not valid.

In previous PR we just set Expands validConstraints to empty to avoid such inconsistency. But as the result, we don't have reliable constraints after Expand operator.

How was this patch tested?

Modified ConstraintPropagationSuite.

@SparkQA
Copy link

SparkQA commented Apr 3, 2016

Test build #54806 has finished for PR 12138 at commit 376d2e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 4, 2016

Test build #54845 has finished for PR 12138 at commit c9a3887.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1659,11 +1665,12 @@ object TimeWindowing extends Rule[LogicalPlan] {
val windowEnd = windowStart + window.windowDuration

CreateNamedStruct(
Literal(WINDOW_START) :: windowStart ::
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we manually set the output of Expand here as TimestampType (windowAttr). As windowStart and windowEnd are producing long values, when we infer output from Expand's projections, we will get LongType instead of TimestampType. So we need to explicitly convert the LongType to TimestampType.

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54991 has finished for PR 12138 at commit 8a18acd.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampFromLong(child: Expression) extends UnaryExpression with ExpectsInputTypes

@viirya
Copy link
Member Author

viirya commented Apr 5, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54993 has finished for PR 12138 at commit 8a18acd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TimestampFromLong(child: Expression) extends UnaryExpression with ExpectsInputTypes

@viirya
Copy link
Member Author

viirya commented Apr 9, 2016

ping @marmbrus @yhuai @cloud-fan

child: LogicalPlan) extends UnaryNode {
override def output: Seq[Attribute] = {
// Take the first projection as output
val preOutput = projections.head.map(_.toAttribute)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a waste that we make all projections Seq[NamedExpression], but only use the first one to produce attribute.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, kind of. If we only make the first projection as Seq[NamedExpression], I think it might cause little confusing.

@cloud-fan
Copy link
Contributor

If my understanding is right, the problem is: when child output changes(e.g. making them null when doing a roll up), the output of Expand can't reflect it. I have a simpler idea, when set the output for Expand, use a placeholder to reference to a child column, and define a output method in Expand that replace the placeholder with child attribute. How about it?

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

@cloud-fan Thanks for comment.

The problem is we re-use child's output as Expands output. We will create new attributes in Expand because Expand actually performs multiple projections. However, we let the projections in Expand as Expression instead of NamedExpression and re-use child output attributes. Thus there is a inconsistency between Expand's output attributes and projected values.

The obvious example for this inconsistency is constraints. Previously Expand inherits child's constraints. As we will change child's output values by projections (e.g., set it as null), these constraints bound on child's attributes are not valid.

In previous PR we just set Expands validConstraints to empty to avoid such inconsistency. But as the result, we don't have reliable constraints after Expand operator.

@cloud-fan
Copy link
Contributor

Yea, and in this PR we use NamedExpression to avoid re-using child's output, which should work. My proposal is that, we can use placeholders(maybe BoundReference) to reference child's output, and define an output method in Expand that replace the placeholder with child attribute. I haven't looked through all cases, but if this proposal can work, it should be simpler to implement.

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

@cloud-fan yea. this solution looks complicated due to that I need to fit it into current usage of Expand... I will try your proposal and see if it works. Thanks!

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

@cloud-fan As you replace the placeholder with child attributes, does it mean we re-use child's output too?

@cloud-fan
Copy link
Contributor

no it's different when we do it in a method. Everytime the child output changes, Expand.output will change too(think about Filter).

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

We may not talk the same thing. Not the change of child output causes problem. We create new attributes in Expand. But we use child's output as Expand's output.

For example, if the child output is [a, b, c]. Currently we set it as Expand's output too.
But when we do a roll up, we may set a, b, c to null. But our output in Expand doesn't reflect this change. So the constraints referring [a, b, c] become invalid.

@cloud-fan
Copy link
Contributor

But our output in Expand doesn't reflect this change.

It's because Expand.output is a val, if we make it a def, and use child.output to construct the output of Expand every time we call the output method, the problem should be fixed.

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

Why? Every time we call output method on Expand, it still uses child output to construct its output. But child's output doesn't know the values are changed now. Isn't it?

@viirya
Copy link
Member Author

viirya commented Apr 11, 2016

I think the output attributes of Expand operator should based on its projections, not child's output?

@cloud-fan
Copy link
Contributor

I may misunderstand this problem, could you add a test case in this PR to show what's wrong before?

@viirya
Copy link
Member Author

viirya commented Apr 13, 2016

@cloud-fan The obvious wrong case is Expand's constraints. I've modified the test in ConstraintPropagationSuite. Basically the previous usage works except for constraints inference.

@viirya viirya closed this Oct 6, 2016
@viirya viirya deleted the expand-name-expr branch December 27, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants