[SPARK-13049] Add First/last with ignore nulls to functions.scala #10957
@hvanhovell Thanks for the PR. Do you know why expr/callUDF does not work?
@yhuai This is the cleaner fix. We could also add a match to the
retest this please
Why might this be a bug fix?
A user is trying to get this working on 1.6 using the DataFrame API. That doesn't work directly because functions.scala misses the functions implemented in this PR. The indirect approach using I guess this is more a feature than a bug fix...
Test build #50231 has finished for PR 10957 at commit
Actually can you update the Python API as well?
Test build #50462 has finished for PR 10957 at commit
Test build #50463 has finished for PR 10957 at commit
 */
def first(e: Column): Column = withAggregateFunction { new First(e.expr) }
 * Aggregate function: returns the first value in a group. The function does not consider null
 * values when the ignoreNulls flag is set to true.
Can you write something like this to be more clear? And update all the docs (including Python).
"The function by default includes the first value it sees. When ignoreNulls is set to true, then it ignores the null values and includes the first non-null value. If all values are null, then null is returned."
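The wording proposed above can be checked against a small model. The following is a minimal plain-Scala sketch of those semantics, not the Spark implementation: null is modeled as None, and firstValue and lastValue are hypothetical helper names that do not exist in functions.scala.

```scala
// Plain-Scala model of the documented semantics (assumption: null is modeled as None).
// firstValue and lastValue are hypothetical helpers, not Spark APIs.
object FirstLastSketch {
  def firstValue[A](values: Seq[Option[A]], ignoreNulls: Boolean = false): Option[A] =
    if (ignoreNulls) values.flatten.headOption // first non-null value; None if all are null
    else values.headOption.flatten             // first value seen, even if it is null

  def lastValue[A](values: Seq[Option[A]], ignoreNulls: Boolean = false): Option[A] =
    if (ignoreNulls) values.flatten.lastOption // last non-null value; None if all are null
    else values.lastOption.flatten             // last value seen, even if it is null
}
```

With a group such as Seq(None, Some(1), Some(2), None), the default call yields None for both helpers because the first and last values are null, while ignoreNulls = true skips them.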
Thanks - only some minor comments on the documentation to make it clearer.
Test build #50464 has finished for PR 10957 at commit
Thanks - merging this in master.
This PR adds the ability to specify the ignoreNulls option to the functions dsl, e.g.:
df.select($"id", last($"value", ignoreNulls = true).over(Window.partitionBy($"id").orderBy($"other")))
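To illustrate what the windowed example computes, here is a rough plain-Scala sketch that does not use Spark. It assumes the default window frame for an ordered window (all rows from the start of the partition up to the current row) and distinct values in the ordering column; under those assumptions, last with ignoreNulls = true behaves like a forward fill. Record and lastNonNullSoFar are hypothetical names introduced only for this sketch.

```scala
object LastIgnoreNullsWindowSketch {
  // Hypothetical record type mirroring the columns named in the example; null is modeled as None.
  final case class Record(id: Int, other: Int, value: Option[String])

  // For each record, the last non-null value among records with the same id,
  // ordered by `other`, up to and including the current record.
  def lastNonNullSoFar(rows: Seq[Record]): Seq[(Record, Option[String])] =
    rows.groupBy(_.id).toSeq.sortBy(_._1).flatMap { case (_, partition) =>
      val ordered = partition.sortBy(_.other)
      // Running state: the most recent non-null value seen so far in this partition.
      val filled = ordered
        .scanLeft(Option.empty[String])((prev, r) => r.value.orElse(prev))
        .tail // drop the initial empty state
      ordered.zip(filled)
    }
}
```

A record whose value is null picks up the most recent non-null value from earlier in its partition, and stays null only when no earlier non-null value exists.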
This PR is somewhere between a bug fix (see the JIRA) and a new feature. I am not sure if we should backport to 1.6.
cc @yhuai