Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3363][SQL] Type Coercion should support every type to have null value #2246

Closed
wants to merge 5 commits into from

Conversation

adrian-wang
Copy link
Contributor

Type Coercion should support every type to have null value

@adrian-wang adrian-wang changed the title Type Coercion should support every type to have null value [SPARK-3363][SQL] Type Coercion should support every type to have null value Sep 3, 2014
@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have started for PR 2246 at commit c619f0a.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have finished for PR 2246 at commit c619f0a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

JsonRDD also called HiveTypeCoercion.allPromotion, but with a method owned by itself, which leads to the failure. I have fixed that also.

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have started for PR 2246 at commit 46daa90.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 3, 2014

QA tests have finished for PR 2246 at commit 46daa90.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// If found return the widest common type, otherwise None
val returnType = applicableConversion.map(_.filter(t => t == t1 || t == t2).last)
// If found return the widest common type, otherwise None
applicableConversion.map(_.filter(t => t == t1 || t == t2).last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we can't just call findTightestCommonType here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in complex type they are different. For example what JSON cares are if the key strings are the same(MapType), which is quite different from here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, yes after this the code is different, but up here this looks identical. Could this just be val returnType = findTightestCommonType(t1, t2)? (Sorry if I'm missing something obvious)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think in the future we should also support complex type promotion in findTightestCommonType(t1, t2). Anyway we can extract the current findTightestCommonType(t1, t2) as a util function. What do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see any reason to over plan for possible extension in the future. Since this code is the same in both places I think it'd be better to just call findTightestCommonType instead of duplicating it. If we decide to add more rules in the future we can add a new function that calls this simpler version.

@adrian-wang
Copy link
Contributor Author

I made a mistake here, This part cannot use findTightestCommonType(t1, t2), because we should not return the value only if the 2 types are the same. That is, for two Struct(field) type, we need to compare the field name to decide.

@SparkQA
Copy link

SparkQA commented Sep 4, 2014

QA tests have started for PR 2246 at commit ef6f986.

  • This patch merges cleanly.

@adrian-wang
Copy link
Contributor Author

I have changed the code, now compatibleType and findTightestCommonType looks different.

@SparkQA
Copy link

SparkQA commented Sep 4, 2014

QA tests have finished for PR 2246 at commit ef6f986.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

I hate graphx :(

@rxin
Copy link
Contributor

rxin commented Sep 4, 2014

Don't hate it :)

@rxin
Copy link
Contributor

rxin commented Sep 4, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Sep 4, 2014

QA tests have started for PR 2246 at commit ef6f986.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 4, 2014

QA tests have finished for PR 2246 at commit ef6f986.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Sep 5, 2014

I don't think the concern about using findTightestCommonType with struct types is valid. If the fields match up exactly we will return the type, otherwise we will fall back on the advanced resolution logic.

Here is a PR (against your PR) that implements this and seems to pass all our tests: adrian-wang#2

@adrian-wang
Copy link
Contributor Author

Hmm... you are right, I misunderstood some code in compatibleType there. Thank you!

@SparkQA
Copy link

SparkQA commented Sep 5, 2014

QA tests have started for PR 2246 at commit c6241de.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have finished for PR 2246 at commit c6241de.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have started for PR 2246 at commit c6241de.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 10, 2014

QA tests have finished for PR 2246 at commit c6241de.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Last(child: Expression) extends PartialAggregate with trees.UnaryNode[Expression]
    • case class LastFunction(expr: Expression, base: AggregateExpression) extends AggregateFunction
    • case class Abs(child: Expression) extends UnaryExpression

@adrian-wang
Copy link
Contributor Author

The fail is in BroadcastSuite, has nothing to do with this PR...

@marmbrus
Copy link
Contributor

@JoshRosen, another flakey one in Core

@marmbrus
Copy link
Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in f0c87dc Sep 10, 2014
@adrian-wang adrian-wang deleted the spark3363-0 branch September 24, 2014 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants