[SPARK-16287][SQL] Implement str_to_map SQL function #13990

techaddict · 2016-06-30T05:00:48Z

What changes were proposed in this pull request?

This PR adds the str_to_map SQL function in order to remove the Hive fallback.

How was this patch tested?

Passes the Jenkins tests, including the newly added ones.

SparkQA · 2016-06-30T06:58:52Z

Test build #61525 has finished for PR 13990 at commit 1f888ab.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

techaddict · 2016-07-03T10:25:34Z

cc: @cloud-fan @rxin

SparkQA · 2016-07-03T12:10:46Z

Test build #61685 has finished for PR 13990 at commit fa294bc.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-07-03T13:52:16Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+ * Creates a map after splitting the input text into key/value pairs using delimiters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into


delimiter1 and delimiter2 are not good names. delimiter1 is used to split the input text into key-value pairs, and delimiter2 is used to separate the key from the value within each pair. Do you have any ideas about the naming?

How about pairDelim and pairSeparatorDelim? I'm not very good with naming; what do you suggest?

I used delimiter1 and delimiter2 because that is how they are named in Hive.

How about pairDelim and keyValueDelim?

Yup, that sounds much better. Let me make the change.
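For reference, a minimal plain-Scala sketch of what the two delimiters do, written for illustration rather than taken from the patch. Assumptions: delimiters are treated as regular expressions (java.lang.String.split semantics), the defaults ',' and '=' follow the usage string in this revision, and a pair without keyValueDelim gets a missing value (None). StrToMapSketch and strToMap are hypothetical names.

```scala
// Hypothetical model of str_to_map semantics; not Spark code.
// Assumes regex delimiters (String.split) and the ',' / '=' defaults of this revision.
object StrToMapSketch {
  def strToMap(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = "="): Vector[(String, Option[String])] =
    text
      .split(pairDelim, -1) // pairDelim splits the text into key/value pairs; -1 keeps trailing empties
      .toVector
      .map { pair =>
        // keyValueDelim splits each pair once (limit 2), so the value may itself contain '='
        pair.split(keyValueDelim, 2) match {
          case Array(key) => (key, None) // no keyValueDelim in this pair
          case kv         => (kv(0), Some(kv(1)))
        }
      }
}
```

Because this sketch uses regex splitting, a delimiter such as '|' or '.' would need escaping, for example with java.util.regex.Pattern.quote.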

cloud-fan · 2016-07-03T15:09:41Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into
+    key/value pairs using delimiters.
+    Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
+case class StringToMap(child: Expression, pairDelim: Expression, keyValueDelim: Expression)


How about renaming child to text, to make it consistent with the usage string _FUNC_(text[, pairDelim, keyValueDelim])?

SparkQA · 2016-07-03T16:11:47Z

Test build #61690 has finished for PR 13990 at commit 6b2390d.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class StringToMap(child: Expression, pairDelim: Expression, keyValueDelim: Expression)

SparkQA · 2016-07-05T19:40:10Z

Test build #61767 has finished for PR 13990 at commit ca74f9a.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: Expression)

cloud-fan · 2016-07-06T00:23:08Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+      .split(delim1.asInstanceOf[UTF8String], -1)
+      .map{_.split(delim2.asInstanceOf[UTF8String], 2)}
+
+    ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData]


This asInstanceOf seems unnecessary?
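To illustrate the review point, here is a hedged sketch using hypothetical stand-ins (MapDataLike, ArrayBasedMapDataLike) rather than Catalyst's real MapData and ArrayBasedMapData. It shows that returning a subclass instance from a method typed with the supertype needs no asInstanceOf. It also pads pairs that lack the key/value delimiter with null: as quoted, array.map(_(1)) would appear to throw ArrayIndexOutOfBoundsException for such pairs, since a limit-2 split returns a one-element array when the delimiter is absent.

```scala
// Hypothetical stand-ins, not Catalyst classes; they only mirror the shape of the quoted eval.
object EvalSketch {
  abstract class MapDataLike {
    def keys: Array[String]
    def values: Array[String]
  }

  final case class ArrayBasedMapDataLike(keys: Array[String], values: Array[String])
      extends MapDataLike

  def eval(text: String, delim1: String, delim2: String): MapDataLike = {
    val pairs: Array[Array[String]] = text.split(delim1, -1).map(_.split(delim2, 2))
    // A pair without delim2 yields a one-element array; use null instead of indexing kv(1).
    val values = pairs.map(kv => if (kv.length > 1) kv(1) else null)
    // The subclass instance already conforms to MapDataLike, so no cast is needed.
    ArrayBasedMapDataLike(pairs.map(_(0)), values)
  }
}
```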

rxin · 2016-07-06T05:09:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+ * Creates a map after splitting the input text into key/value pairs using delimiters
+ */
+@ExpressionDescription(
+  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into


I think this will mess up the display?

Also, we really need an example here.

I'm not sure it does; the display currently looks like this:
Usage: str_to_map(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and '=' for keyValueDelim.
Added an example.
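The display concern comes from the fact that a triple-quoted Scala string keeps its line breaks and leading indentation verbatim. The sketch below shows the raw usage string from this revision and a hypothetical normalizeUsage helper (not part of Spark) that collapses whitespace into the single-line form shown in the reply; whether Spark's own function-description output does any such normalization is not established here.

```scala
// Illustration only: normalizeUsage is a hypothetical helper, not a Spark API.
object UsageDisplaySketch {
  // Same text as the usage string in this revision, as a triple-quoted literal.
  val usage: String =
    """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into
    key/value pairs using delimiters.
    Default delimiters are ',' for pairDelim and '=' for keyValueDelim."""

  // Substitute the function name, then collapse every whitespace run (including newlines).
  def normalizeUsage(s: String, funcName: String): String =
    s.replace("_FUNC_", funcName).replaceAll("\\s+", " ").trim
}
```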

rxin · 2016-07-06T05:10:01Z

cc @dongjoon-hyun, can you help review this?

SparkQA · 2016-07-06T05:14:50Z

Test build #61806 has finished for PR 13990 at commit f7c03c5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dongjoon-hyun · 2016-07-06T05:31:02Z

Sure, @rxin .

dongjoon-hyun · 2016-07-06T05:38:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+
+  def this(child: Expression) = {
+    this(child, Literal(","), Literal("="))
+  }


Hi, @techaddict .
Could you add one more constructor, this(child: Expression, pairDelim: Expression)?
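A sketch of the requested constructor, using a hypothetical StringToMapArgs case class with plain String fields instead of Catalyst Expressions so that it runs standalone. The defaults ',' and '=' follow this revision of the patch. Presumably each supported SQL arity needs its own constructor because the function registry looks one up by argument count; treat that as an assumption.

```scala
// Hypothetical, simplified stand-in for StringToMap; Strings replace Expressions.
object ConstructorSketch {
  final case class StringToMapArgs(text: String, pairDelim: String, keyValueDelim: String) {
    // str_to_map(text): both delimiters defaulted
    def this(text: String) = this(text, ",", "=")
    // str_to_map(text, pairDelim): the additional constructor requested in the review
    def this(text: String, pairDelim: String) = this(text, pairDelim, "=")
  }
}
```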

cloud-fan · 2016-07-13T17:04:15Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure(
+          s"$prettyName's arguments must be foldable, but got $children.")


Mistake? Only the 2 delimiters must be foldable, not all arguments.

cloud-fan · 2016-07-13T17:12:54Z

sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala

+    )
+
+    // All arguments should be string literals.
+    val m1 = intercept[AnalysisException]{


Let's remove these error tests from here; usually we only test the type-checking logic in unit tests, not in end-to-end tests.

SparkQA · 2016-07-13T17:15:28Z

Test build #62250 has finished for PR 13990 at commit cbc8798.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-13T19:08:47Z

Test build #62256 has finished for PR 13990 at commit e701716.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-07-13T19:14:44Z

Test build #62257 has finished for PR 13990 at commit 8172bd5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

techaddict · 2016-07-15T03:12:17Z

@cloud-fan, anything else, or is it good to merge?

cloud-fan · 2016-07-18T14:54:58Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+        TypeCheckResult.TypeCheckSuccess
+      } else {
+        TypeCheckResult.TypeCheckFailure(
+          s"$prettyName's delimiters must be foldable, but got $children.")


$children will print something like Seq(xxx, xxx). I think we can just say "$prettyName's delimiters must be foldable".
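A hedged sketch of the check as suggested: only pairDelim and keyValueDelim are required to be foldable (text may be any expression), and the message no longer interpolates $children. Expr, Lit, Column, Success, and Failure are hypothetical stand-ins for Catalyst's Expression and TypeCheckResult.

```scala
// Hypothetical mini-model of Catalyst expressions and type-check results.
object FoldableCheckSketch {
  sealed trait Expr { def foldable: Boolean }
  final case class Lit(value: String) extends Expr { val foldable = true }
  final case class Column(name: String) extends Expr { val foldable = false }

  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Failure(message: String) extends CheckResult

  // Only the delimiters are checked; the message names the function, not the children.
  def checkDelimiters(prettyName: String, pairDelim: Expr, keyValueDelim: Expr): CheckResult =
    if (Seq(pairDelim, keyValueDelim).forall(_.foldable)) Success
    else Failure(s"$prettyName's delimiters must be foldable.")
}
```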

cloud-fan · 2016-07-18T15:01:05Z

Sorry, I was out of office the last few days. LGTM except for some minor comments. Thanks for working on it!

SparkQA · 2016-07-18T17:04:36Z

Test build #62471 has finished for PR 13990 at commit 1e35779.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

techaddict · 2016-07-19T02:22:19Z

@cloud-fan Comment addressed, test passed 👍

cloud-fan · 2016-07-19T05:08:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala

+
+  override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false)
+
+  override def checkInputDataTypes(): TypeCheckResult = {


Looks like it's simpler to follow XPathExtract for the type check, i.e. implement ExpectsInputTypes to check the input types, and override checkInputDataTypes for the foldable check.
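A sketch of the suggested split of responsibilities, with hypothetical stand-ins (ExpectsInputTypesLike, StringToMapLike, and a tiny DataType hierarchy) rather than Catalyst's real ExpectsInputTypes: the trait performs the generic type check driven by inputTypes, and the expression overrides checkInputDataTypes to add the foldability check before delegating to super. The error messages are illustrative only.

```scala
// Hypothetical stand-ins; names and messages are illustrative, not Spark's.
object ExpectsInputTypesSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  sealed trait CheckResult
  case object Success extends CheckResult
  final case class Failure(message: String) extends CheckResult

  trait Expr { def dataType: DataType; def foldable: Boolean }
  final case class Lit(dataType: DataType) extends Expr { def foldable = true }
  final case class Col(dataType: DataType) extends Expr { def foldable = false }

  // Generic type check: each child must have the type listed in inputTypes.
  trait ExpectsInputTypesLike {
    def children: Seq[Expr]
    def inputTypes: Seq[DataType]
    def checkInputDataTypes(): CheckResult = {
      val mismatches = children.zip(inputTypes).zipWithIndex.collect {
        case ((child, expected), i) if child.dataType != expected =>
          s"argument ${i + 1} requires $expected, got ${child.dataType}"
      }
      if (mismatches.isEmpty) Success else Failure(mismatches.mkString("; "))
    }
  }

  final case class StringToMapLike(text: Expr, pairDelim: Expr, keyValueDelim: Expr)
      extends ExpectsInputTypesLike {
    def children: Seq[Expr] = Seq(text, pairDelim, keyValueDelim)
    def inputTypes: Seq[DataType] = Seq(StringType, StringType, StringType)

    // Foldability is checked here; the type check is delegated to the trait.
    override def checkInputDataTypes(): CheckResult =
      if (Seq(pairDelim, keyValueDelim).exists(!_.foldable))
        Failure("str_to_map's delimiters must be foldable.")
      else super.checkInputDataTypes()
  }
}
```

In this sketch the foldability check runs first, so a non-literal delimiter is reported even if its type is also wrong; running the checks in the other order would be an equally reasonable choice.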

SparkQA · 2016-07-21T19:00:29Z

Test build #62681 has finished for PR 13990 at commit 8aabd37.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

What changes were proposed in this pull request?
This PR adds the `str_to_map` SQL function in order to remove the Hive fallback.

How was this patch tested?
Passes the Jenkins tests, including the newly added ones.

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #13990 from techaddict/SPARK-16287.

(cherry picked from commit df2c6d5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2016-07-22T02:06:35Z

Thanks, merging to master and 2.0!

srowen · 2016-07-22T11:36:57Z

#14315 fixed the odd compile error for this.

Is this really something we should be merging into branch 2.0 now? This looks like part of a new feature, and not even obviously something for 2.0.1.

cloud-fan · 2016-07-22T13:42:42Z

@srowen, please see https://issues.apache.org/jira/browse/SPARK-16275; it explains why we want to merge these into 2.0.

techaddict added 2 commits July 3, 2016 15:45

[SPARK-16287][SQL] Implement str_to_map SQL function

2dd04f4

fix codeGen

fa294bc

techaddict force-pushed the SPARK-16287 branch from 1f888ab to fa294bc July 3, 2016 10:21

techaddict changed the title from [SPARK-16287][SQL][WIP] Implement str_to_map SQL function to [SPARK-16287][SQL] Implement str_to_map SQL function Jul 3, 2016

cloud-fan reviewed Jul 3, 2016

Address comments

6b2390d

cloud-fan reviewed Jul 3, 2016

address comments

ca74f9a

cloud-fan reviewed Jul 6, 2016

techaddict added 3 commits July 6, 2016 08:30

Merge master

da7d6ac

address comments

f7c03c5

remove ctx.addMutableState

d1573b6

rxin reviewed Jul 6, 2016

added example

94c18ff

dongjoon-hyun reviewed Jul 6, 2016

techaddict added 2 commits July 13, 2016 20:36

add argument type checking tests

f2727a7

fix test

cbc8798

cloud-fan reviewed Jul 13, 2016

techaddict added 2 commits July 13, 2016 22:38

fix error

c705988

fix typo

e701716

cloud-fan reviewed Jul 13, 2016

remove end2end tests

8172bd5

cloud-fan reviewed Jul 18, 2016

techaddict added 2 commits July 18, 2016 20:32

Merge branch 'master' into SPARK-16287

4b60c5a

address comments

1e35779

cloud-fan reviewed Jul 19, 2016

techaddict added 2 commits July 21, 2016 22:26

Merge branch 'master' into SPARK-16287

8e2e61f

extend expectsInputtypes

8aabd37

asfgit closed this in df2c6d5 Jul 22, 2016

viirya mentioned this pull request Dec 26, 2019

[SPARK-30356][SQL] Codegen support for the function str_to_map #27013

Closed

MaxGekk mentioned this pull request Sep 2, 2022

[SPARK-40308][SQL] Allow non-foldable delimiter arguments to str_to_map function #37763

Closed
