-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19084][SQL] Implement expression field #16476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
39 commits
Select commit
Hold shift + click to select a range
cec946e
Add field expression
0160654
Add field expression unit test
b441ef6
Add doGenCode and modify the function description
320fd67
Add field expression unit test cases
f340084
Modify function description
a864449
Add support for field children which is not List
04ccadc
Add field function API for DataFrame and unit test cases
657742f
Modify @since version
62e62dc
Add Python API
0edc0e0
Add Apache License
39a4181
Modify import order
16af01a
Move Field class def to ConditionalExpressions and modify the test
35be82e
Modify the line feed's location
45a58a7
Move Field function test to ColumnExpressionSuite
0eb9817
Add null support and corresponding test cases
e78698c
Remove blank lines
4fea138
Modify field function definition to require >= 2 params
7f0f24b
Add blank line
40b3ad2
Modify the typesetting and use syntax sugar to beautify the code
7d347c2
Remove Dataset API, simplify the test cases, and add some comments
46c3986
Tune the code
4761687
Improve the performance and add the tail-rec annotation
6954b27
Add description in comment for strNull match
c565e6c
Improve the performance by reducing the type check and giving up patt…
7f0a7e9
Improve the performance by only looping through columns where dataTyp…
e1914ea
Add explicit NULL support
321c373
Add more comments/docs for parameters with multi types
ce776ec
Remove unused lines and tune the code
c8ed3d4
Change size to length
1c61252
Remove unused import and tune the doc
76278fc
Tune the doc further
ab990fd
Optimization done and align with scala style requirement
d71245f
Switch off scalastyle check in test to allow non-ascii characters
e89ae0f
Remove whitespace
082189a
Modify type check failure hint
32fa22b
Support implicit cast
b5d1675
Not do implicit cast while all have the same type
4b63e94
Update docs
c75d786
Fix
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,10 +17,13 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.expressions | ||
|
|
||
| import scala.annotation.tailrec | ||
|
|
||
| import org.apache.spark.sql.catalyst.InternalRow | ||
| import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, TypeCoercion} | ||
| import org.apache.spark.sql.catalyst.expressions.codegen._ | ||
| import org.apache.spark.sql.catalyst.expressions.codegen.Block._ | ||
| import org.apache.spark.sql.catalyst.util.TypeUtils | ||
| import org.apache.spark.sql.types._ | ||
|
|
||
| // scalastyle:off line.size.limit | ||
|
|
@@ -313,3 +316,110 @@ object CaseKeyWhen { | |
| CaseWhen(cases, elseValue) | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. | ||
| * It takes at least 2 parameters, and all parameters can be of any type. | ||
| * Implicit cast will be done when at least 2 parameters have different types, and it will be based | ||
| * on the first parameter's type. | ||
| * If the first parameter is of NumericType, all parameters will be implicitly cast to DoubleType, | ||
| * and those that can't be cast to DoubleType will be regarded as NULL. | ||
| * If the first parameter is of any other type, all parameters will be implicitly cast to StringType | ||
| * and the comparison will follow String's comparing rules. | ||
| * If the search expression is NULL, the return value is 0 because NULL fails equality comparison | ||
| * with any value. | ||
| */ | ||
| // scalastyle:off line.size.limit | ||
| @ExpressionDescription( | ||
| usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", | ||
| extended = """ | ||
| Examples: | ||
| > SELECT _FUNC_(10, 9, 3, 10, 4); | ||
| 3 | ||
| > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); | ||
| 4 | ||
| > SELECT _FUNC_('999', 'a', 999.0, 999, '999'); | ||
| 2 | ||
| """) | ||
| // scalastyle:on line.size.limit | ||
| case class Field(children: Seq[Expression]) extends Expression | ||
| with ImplicitCastInputTypesToSameType { | ||
|
|
||
| /** Even if expr is not found in (expr1, expr2, ...) list, the value will be 0, not null */ | ||
| override def nullable: Boolean = false | ||
| override def foldable: Boolean = children.forall(_.foldable) | ||
|
|
||
| private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) | ||
|
|
||
| override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.length)( | ||
| TypeCollection( | ||
| BinaryType, BooleanType, DateType, NumericType, StringType, TimestampType, NullType)) | ||
|
|
||
| override def checkInputDataTypes(): TypeCheckResult = { | ||
| val result = super.checkInputDataTypes() | ||
| if (result == TypeCheckResult.TypeCheckSuccess) { | ||
| if (children.length <= 1) { | ||
| TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") | ||
| } else { | ||
| TypeCheckResult.TypeCheckSuccess | ||
| } | ||
| } else { | ||
| result | ||
| } | ||
| } | ||
|
|
||
| override def dataType: DataType = IntegerType | ||
|
|
||
| override def eval(input: InternalRow): Any = { | ||
| val target = children.head.eval(input) | ||
| @tailrec def findEqual(index: Int): Int = { | ||
| if (index == children.length) { | ||
| 0 | ||
| } else { | ||
| val value = children(index).eval(input) | ||
| if (value != null && ordering.equiv(target, value)) { | ||
| index | ||
| } else { | ||
| findEqual(index + 1) | ||
| } | ||
| } | ||
| } | ||
| if (target == null) 0 else findEqual(index = 1) | ||
| } | ||
|
|
||
| protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { | ||
| val evalChildren = children.map(_.genCode(ctx)) | ||
| val target = evalChildren(0) | ||
| val targetDataType = children(0).dataType | ||
| val dataTypes = children.map(_.dataType) | ||
|
|
||
| def updateEval(evalWithIndex: (ExprCode, Int)): String = { | ||
| val (eval, index) = evalWithIndex | ||
| s""" | ||
| ${eval.code} | ||
| if (${ctx.genEqual(targetDataType, eval.value, target.value)}) { | ||
| ${ev.value} = ${index}; | ||
| } | ||
| """ | ||
| } | ||
|
|
||
| def genIfElseStructure(code1: String, code2: String): String = { | ||
| s""" | ||
| ${code1} | ||
| else { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the floating |
||
| ${code2} | ||
| } | ||
| """ | ||
| } | ||
|
|
||
| ev.copy(code = | ||
| code""" | ||
| ${target.code} | ||
| boolean ${ev.isNull} = false; | ||
| int ${ev.value} = 0; | ||
| if (!${target.isNull}) { | ||
| ${evalChildren.zipWithIndex.tail.map(updateEval).reduceRight(genIfElseStructure)} | ||
| } | ||
| """.stripMargin) | ||
| } | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the name it felt like this method would put
code1inifblock andcode2inelseblock but turns out thats not the case. That floatingelselooks weird.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can change this function's name? But actually I can't think of a better name. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe we can use
foldLeftto replace current approach to get rid of the floatingelse.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't understand, how to use
foldLeftapproach? I think we can only usefoldRightorreduceRight, because the code for latter children should be nested inner.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, right, if use
foldLeft, there is still a floatingelse. We can only usefoldRightto remove it.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to use
foldRightto remove it?My thought: If I understand your meaning of floating
elseright(could you please explain it a little bit?),foldRightandreduceRightboth can't avoid floatingelse, because we need nestedelseinelseblock, like this:if (xxx) else { if (xxx) else { ... } }, so if we avoid floatingelseingenIfElseStructure,elseshould be inupdateEval, which will make the code unclear and complicated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks like:
You can do this with a function like you did before. It will have a empty "else" block at the end.
However this doesn't affect the functionality, just dealing with how the code looks. I don't have strong option about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whats already present in the code is ok. Given that there is no better option without adding more complexity, lets stick with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya Maybe the order of
codeandevalWithIndexparameters should be changed.@tejasapatil I agree with your opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current code is ok.