[SPARK-16287][SQL] Implement str_to_map SQL function #13990
Test build #61525 has finished for PR 13990 at commit
cc: @cloud-fan @rxin
Test build #61685 has finished for PR 13990 at commit
 * Creates a map after splitting the input text into key/value pairs using delimiters
 */
@ExpressionDescription(
  usage = """_FUNC_(text[, delimiter1, delimiter2]) - Creates a map after splitting the text into
`delimiter1` and `delimiter2` are not good names. `delimiter1` is used to separate key-value pairs in the input text, and `delimiter2` is used to separate the key and value within each pair. Do you have any ideas about the naming?
How about `pairDelim` and `pairSeperatorDelim`? I'm not very good with naming; what do you suggest?
I used `delimiter1` and `delimiter2` because that is how they are named in Hive.
How about `pairDelim` and `keyValueDelim`?
Yup, that sounds much better. Let me make the change.
  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into
    key/value pairs using delimiters.
    Default delimiters are ',' for pairDelim and '=' for keyValueDelim.""")
case class StringToMap(child: Expression, pairDelim: Expression, keyValueDelim: Expression)
How about renaming `child` to `text`, to make it consistent with the usage string `_FUNC_(text[, pairDelim, keyValueDelim])`?
Test build #61690 has finished for PR 13990 at commit
Test build #61767 has finished for PR 13990 at commit
  .split(delim1.asInstanceOf[UTF8String], -1)
  .map{_.split(delim2.asInstanceOf[UTF8String], 2)}

ArrayBasedMapData(array.map(_(0)), array.map(_(1))).asInstanceOf[MapData]
This `asInstanceOf` seems unnecessary?
 * Creates a map after splitting the input text into key/value pairs using delimiters
 */
@ExpressionDescription(
  usage = """_FUNC_(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into
I think this will mess up the display?
Also, we really need an example here.
Not sure about the display: [Usage: str_to_map(text[, pairDelim, keyValueDelim]) - Creates a map after splitting the text into key/value pairs using delimiters. Default delimiters are ',' for pairDelim and '=' for keyValueDelim.]
Added an example.
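For readers following the thread, the behaviour described by the usage string can be sketched in plain Scala, outside of Spark. This is only an illustration, not the code from this PR: `StrToMapSketch` and `strToMap` are hypothetical names, the sketch treats both delimiters as literal strings, and it represents a pair with no `keyValueDelim` as a key with no value. Both of those choices are assumptions.

```scala
import java.util.regex.Pattern

object StrToMapSketch {
  /** Splits `text` into pairs on `pairDelim`, then each pair into key and value on `keyValueDelim`.
    * Assumption: delimiters are matched literally, and a pair without a value maps to None.
    */
  def strToMap(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = "="): Map[String, Option[String]] = {
    text
      .split(Pattern.quote(pairDelim), -1) // keep trailing empty pairs
      .map { pair =>
        // Limit 2: only the first keyValueDelim separates key from value.
        pair.split(Pattern.quote(keyValueDelim), 2) match {
          case Array(key, value) => key -> Option(value)
          case Array(key) => key -> Option.empty[String]
        }
      }
      .toMap
  }
}
```

With the defaults from the usage string, `StrToMapSketch.strToMap("a=1,b=2")` yields a map with keys `a` and `b`.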
cc @dongjoon-hyun, can you help review this?
Test build #61806 has finished for PR 13990 at commit
Sure, @rxin.

def this(child: Expression) = {
  this(child, Literal(","), Literal("="))
}
Hi, @techaddict.
Could you add one more constructor, `this(child: Expression, pairDelim: Expression)`?
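A minimal sketch of what the requested set of constructors could look like, assuming the defaults from the usage string (',' and '='). `Expr`, `Lit`, `Col`, and `StringToMapSketch` are stand-ins defined here so the example compiles on its own; they are not Catalyst's `Expression` and `Literal`.

```scala
object ConstructorSketch {
  // Stand-ins for Catalyst's Expression and Literal (hypothetical, for illustration only).
  sealed trait Expr
  final case class Lit(value: String) extends Expr
  final case class Col(name: String) extends Expr

  final case class StringToMapSketch(text: Expr, pairDelim: Expr, keyValueDelim: Expr) {
    // Two-argument form: the caller supplies pairDelim, keyValueDelim defaults to "=".
    def this(text: Expr, pairDelim: Expr) = this(text, pairDelim, Lit("="))

    // One-argument form: both delimiters take their defaults.
    def this(text: Expr) = this(text, Lit(","), Lit("="))
  }
}
```

Auxiliary constructors of a case class are invoked with `new`, for example `new ConstructorSketch.StringToMapSketch(ConstructorSketch.Col("c"))`.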
  TypeCheckResult.TypeCheckSuccess
} else {
  TypeCheckResult.TypeCheckFailure(
    s"$prettyName's arguments must be foldable, but got $children.")
Is this a mistake? Only the 2 delimiters need to be foldable, not all arguments.
)

// All arguments should be string literals.
val m1 = intercept[AnalysisException]{
Let's remove these error tests from here. Usually we only test the type-checking logic in unit tests, not in end-to-end tests.
Test build #62250 has finished for PR 13990 at commit
Test build #62256 has finished for PR 13990 at commit
Test build #62257 has finished for PR 13990 at commit
@cloud-fan anything else, or is it good to merge?
  TypeCheckResult.TypeCheckSuccess
} else {
  TypeCheckResult.TypeCheckFailure(
    s"$prettyName's delimiters must be foldable, but got $children.")
`$children` will print something like `Seq(xxx, xxx)`. I think we can just say `$prettyName's delimiters must be foldable`.
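To make the difference concrete, here is a small standalone sketch of the two message variants. `MessageSketch`, `verbose`, and `concise` are hypothetical names, and `children` is modelled as a plain `Seq[String]` rather than a sequence of Catalyst expressions.

```scala
object MessageSketch {
  // Interpolating a Seq embeds the collection's toString in the message.
  def verbose(prettyName: String, children: Seq[String]): String =
    s"$prettyName's delimiters must be foldable, but got $children."

  // The shorter message suggested in the review.
  def concise(prettyName: String): String =
    s"$prettyName's delimiters must be foldable."
}
```

The verbose form leaks the collection wrapper and every argument into the error text, which is the noise the review comment is pointing at.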
Sorry, I was OOO for the last few days. LGTM except for some minor comments. Thanks for working on it!
Test build #62471 has finished for PR 13990 at commit
@cloud-fan Comments addressed, tests passed 👍

override def dataType: DataType = MapType(StringType, StringType, valueContainsNull = false)

override def checkInputDataTypes(): TypeCheckResult = {
It looks simpler to follow `XPathExtract` for the type check, i.e. implement `ExpectsInputTypes` to check the types, and override `checkInputDataTypes` for the foldable check.
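The pattern suggested here can be sketched with simplified stand-in types. Nothing below is Spark's real API: `TypeCheckSketch`, `Expr`, `Lit`, `Col`, and `ExpectsInputTypesSketch` are hypothetical and only model the idea that a shared trait checks the declared input types, while the expression overrides `checkInputDataTypes` to add the foldable check on the two delimiters.

```scala
object TypeCheckSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object IntegerType extends DataType

  sealed trait TypeCheckResult
  case object TypeCheckSuccess extends TypeCheckResult
  final case class TypeCheckFailure(message: String) extends TypeCheckResult

  // Stand-ins for expressions: literals are foldable, column references are not.
  trait Expr { def dataType: DataType; def foldable: Boolean }
  final case class Lit(value: String, dataType: DataType) extends Expr { val foldable = true }
  final case class Col(name: String, dataType: DataType) extends Expr { val foldable = false }

  // Generic check: every child must have the declared input type.
  trait ExpectsInputTypesSketch {
    def children: Seq[Expr]
    def inputTypes: Seq[DataType]
    def checkInputDataTypes(): TypeCheckResult =
      children.zip(inputTypes).zipWithIndex.collectFirst {
        case ((child, expected), idx) if child.dataType != expected =>
          TypeCheckFailure(s"argument ${idx + 1} requires $expected, but got ${child.dataType}")
      }.getOrElse(TypeCheckSuccess)
  }

  final case class StringToMapSketch(text: Expr, pairDelim: Expr, keyValueDelim: Expr)
      extends ExpectsInputTypesSketch {
    val prettyName = "str_to_map"
    val children: Seq[Expr] = Seq(text, pairDelim, keyValueDelim)
    val inputTypes: Seq[DataType] = Seq(StringType, StringType, StringType)

    // Run the generic type check first, then require foldable delimiters.
    override def checkInputDataTypes(): TypeCheckResult =
      super.checkInputDataTypes() match {
        case TypeCheckSuccess if !(pairDelim.foldable && keyValueDelim.foldable) =>
          TypeCheckFailure(s"$prettyName's delimiters must be foldable.")
        case other => other
      }
  }
}
```

Splitting the work this way keeps the type validation declarative (just a list of expected types) and leaves only the expression-specific rule in the override.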
Test build #62681 has finished for PR 13990 at commit
## What changes were proposed in this pull request?

This PR adds the `str_to_map` SQL function in order to remove the Hive fallback.

## How was this patch tested?

Passes the Jenkins tests, including the newly added ones.

Author: Sandeep Singh <sandeep@techaddict.me>

Closes #13990 from techaddict/SPARK-16287.

(cherry picked from commit df2c6d5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Thanks, merging to master and 2.0!
#14315 fixed the odd compile error for this. Is this really something we should be merging into branch 2.0 now? This looks like part of a new feature, and not even obviously something for 2.0.1.
@srowen please see https://issues.apache.org/jira/browse/SPARK-16275, which explains why we want to merge these into 2.0.