[SPARK-5579][SQL][DataFrame] Support for project/filter using SQL expressions #4348
rxin commented on Feb 4, 2015
Test build #26693 has started for PR 4348 at commit
Test build #26693 has finished for PR 4348 at commit
Test FAILed.
… to use UDFs

A more convenient way to define user-defined functions.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#4345 from rxin/defineUDF and squashes the following commits:

639c0f8 [Reynold Xin] udf tests.
0a0b339 [Reynold Xin] defineUDF -> udf.
b452b8d [Reynold Xin] Fix UDF registration.
d2e42c3 [Reynold Xin] SQLContext.udf.register() returns a UserDefinedFunction also.
4333605 [Reynold Xin] [SQL][DataFrame] defineUDF.
…ressions.

e.g.

```scala
df.selectExpr("abs(colA)", "colB")
df.filter("age > 21")
```
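Both example calls take plain strings that Spark parses as SQL expressions: a projection list for `selectExpr` and a predicate for `filter`. As a rough, hypothetical analogy only (it does not use Spark, and it borrows Python's `ast` parser instead of Spark's SQL parser, so SQL-only syntax such as `=` for equality or `AND` is not understood), the sketch below shows what "project by expression string" and "filter by predicate string" mean over a list of dict rows. `select_expr`, `filter_expr` and the supported operator set are illustrative assumptions, not Spark APIs.

```python
import ast
import operator

# Toy operator tables: a small, hypothetical subset of expression syntax.
_BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMPOPS = {ast.Gt: operator.gt, ast.GtE: operator.ge, ast.Lt: operator.lt,
           ast.LtE: operator.le, ast.Eq: operator.eq, ast.NotEq: operator.ne}
_FUNCS = {"abs": abs}


def _eval(node, row):
    """Evaluate one parsed expression node against a row (a dict of columns)."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, row)
    if isinstance(node, ast.Name):  # bare identifier -> column lookup
        return row[node.id]
    if isinstance(node, ast.Constant):  # numeric or string literal
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, row)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left, row),
                                      _eval(node.right, row))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMPOPS):
        return _CMPOPS[type(node.ops[0])](_eval(node.left, row),
                                          _eval(node.comparators[0], row))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS):
        return _FUNCS[node.func.id](*[_eval(a, row) for a in node.args])
    raise ValueError("unsupported expression: " + ast.dump(node))


def select_expr(rows, *exprs):
    """Toy projection: one output column per expression string."""
    trees = [ast.parse(e, mode="eval") for e in exprs]
    return [{e: _eval(t, row) for e, t in zip(exprs, trees)} for row in rows]


def filter_expr(rows, condition):
    """Toy filter: keep rows whose predicate string evaluates to True."""
    tree = ast.parse(condition, mode="eval")
    return [row for row in rows if _eval(tree, row)]


rows = [{"colA": -3, "colB": "x", "age": 30},
        {"colA": 4, "colB": "y", "age": 18}]
print(select_expr(rows, "abs(colA)", "colB"))
# [{'abs(colA)': 3, 'colB': 'x'}, {'abs(colA)': 4, 'colB': 'y'}]
print(filter_expr(rows, "age > 21"))
# [{'colA': -3, 'colB': 'x', 'age': 30}]
```

The point of the analogy is the calling convention: the caller hands over strings, and the library, not the caller, turns them into column references, function calls and comparisons.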
```diff
@@ -2126,10 +2126,9 @@ def sort(self, *cols):
         """
         if not cols:
             raise ValueError("should sort by at least one column")
-        jcols = ListConverter().convert([_to_java_column(c) for c in cols[1:]],
+        jcols = ListConverter().convert([_to_java_column(c) for c in cols],
```
@davies take a look at the Python changes.
Test build #26723 has started for PR 4348 at commit
```diff
@@ -179,10 +179,20 @@ private[sql] class DataFrameImpl protected[sql](
     select((col +: cols).map(Column(_)) :_*)
   }
 
+  override def selectExpr(exprs: String*): DataFrame = {
```
I think this one could be merged into select(); a column is also a valid expression.
Not if the column name has a space ... it will just fail.
It should work in these cases with this implementation:

```python
select('*', 'a', '`the name`', 'a + 1', 'min(b) * 3')
```
yea - but asking users to wrap a column name in backticks in strings is fairly annoying.
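The disagreement above turns on how an expression string is split into tokens. As a simplified illustration (a regex tokenizer written for this note, not Spark's actual SQL lexer; `tokenize` and the token pattern are hypothetical), a backtick-quoted name survives as a single identifier token, while the same name without backticks falls apart into two separate identifiers that a parser would not read as one column.

```python
import re

# Simplified token pattern (an illustration, not Spark's lexer):
#   1. a backtick-quoted identifier, where `` inside it stands for one backtick
#   2. a bare identifier
#   3. a number
#   4. any other single non-space character (operators, parentheses, '*')
_TOKEN = re.compile(r"`(?:[^`]|``)*`|[A-Za-z_][A-Za-z0-9_]*|\d+(?:\.\d+)?|\S")


def tokenize(expr):
    """Split an expression string into tokens; whitespace only separates them."""
    return _TOKEN.findall(expr)


for expr in ["*", "a", "`the name`", "the name", "a + 1", "min(b) * 3"]:
    print(repr(expr), "->", tokenize(expr))
```

Here `tokenize("the name")` yields the two tokens `the` and `name`, whereas `tokenize("`the name`")` yields one token, which is why a name containing a space only works in an expression string if the caller remembers the backticks.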
The select() and filter() in Python do not support expressions yet.
Test build #26723 has finished for PR 4348 at commit
Test PASSed.
We can discuss more offline. For now let's keep this separate; otherwise it can be fairly annoying to use column names that contain spaces or SQL keywords.
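The cost described in the last comment is concrete: if `select()` parsed every string as an expression, any column whose name contains a space or is a SQL keyword would have to be quoted by the caller first. A minimal sketch of that chore, assuming backtick quoting with an embedded backtick escaped by doubling it; `needs_quoting`, `quote_identifier`, `column_ref` and the keyword set are hypothetical, and the keyword set is far smaller than Spark SQL's real list.

```python
import re

# Hypothetical keyword sample for illustration; Spark SQL reserves many more words.
_KEYWORDS = {"select", "from", "where", "and", "or", "not", "as",
             "by", "group", "order", "table"}
_BARE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def needs_quoting(name):
    """True if `name` cannot safely appear unquoted in an expression string."""
    return _BARE_IDENTIFIER.match(name) is None or name.lower() in _KEYWORDS


def quote_identifier(name):
    """Wrap `name` in backticks, escaping an embedded backtick by doubling it."""
    return "`" + name.replace("`", "``") + "`"


def column_ref(name):
    """Quote a column name only when a bare reference would be misread."""
    return quote_identifier(name) if needs_quoting(name) else name


for name in ["age", "the name", "order", "a`b"]:
    print(name, "->", column_ref(name))
```

Quoting only when needed keeps ordinary names such as `age` readable; the trade-off is that the keyword check is only as reliable as the keyword list, which is one reason a separate `selectExpr` that leaves plain `select` name-based is the less surprising design.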
…ressions

```scala
df.selectExpr("abs(colA)", "colB")
df.filter("age > 21")
```

Author: Reynold Xin <rxin@databricks.com>

Closes #4348 from rxin/SPARK-5579 and squashes the following commits:

2baeef2 [Reynold Xin] Fix Python.
b416372 [Reynold Xin] [SPARK-5579][SQL][DataFrame] Support for project/filter using SQL expressions.

(cherry picked from commit 40c4cb2)
Signed-off-by: Reynold Xin <rxin@databricks.com>