Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13049] Add First/last with ignore nulls to functions.scala #10957

Closed
wants to merge 5 commits into from

Conversation

hvanhovell
Copy link
Contributor

This PR adds the ability to specify the ignoreNulls option to the functions dsl, e.g:
df.select($"id", last($"value", ignoreNulls = true).over(Window.partitionBy($"id").orderBy($"other"))

This PR is some where between a bug fix (see the JIRA) and a new feature. I am not sure if we should backport to 1.6.

cc @yhuai

@yhuai
Copy link
Contributor

yhuai commented Jan 27, 2016

@hvanhovell Thanks for the PR. Do you know why expr/callUDF does not work?

@hvanhovell
Copy link
Contributor Author

@yhuai expr("last(r, true)") would return an UnresolvedFunction(UnresolvedAttribute(r), Literal(true)). The problem is that the WindowSpec does not recognize UnresolvedFunction's.

This is the cleaner fix. We could also add a match to the WindowSpec for unresolved functions.

@hvanhovell
Copy link
Contributor Author

retest this please

1 similar comment
@hvanhovell
Copy link
Contributor Author

retest this please

@rxin
Copy link
Contributor

rxin commented Jan 27, 2016

Why might this be a bug fix?

@hvanhovell
Copy link
Contributor Author

A user is trying to get this working on 1.6 using the dataframe api. That doesn't work directly because functions.scala misses the functions implemented in this PR. The indirect approach using expr(...) doesn't work because WindowSpec does not support UnresolvedFunctions.

I guess this is more a feature than a bug fix....

@SparkQA
Copy link

SparkQA commented Jan 27, 2016

Test build #50231 has finished for PR 10957 at commit defcc02.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 29, 2016

Actually can you update the Python API as well?

@SparkQA
Copy link

SparkQA commented Jan 31, 2016

Test build #50462 has finished for PR 10957 at commit b002d60.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 31, 2016

Test build #50463 has finished for PR 10957 at commit 6e4da4f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
def first(e: Column): Column = withAggregateFunction { new First(e.expr) }
* Aggregate function: returns the first value in a group. The function does not consider null
* values when the ignoreNulls flag is set to true.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write something like this to be more clear? And update all the docs (including Python).

"The function by default includes the first value it sees. When ignoreNulls is set to true, then it ignores the null values and includes the first non-null value. If all values are null, then null is returned."

@rxin
Copy link
Contributor

rxin commented Jan 31, 2016

Thanks - only some minor comment on the documentation to make it more clear.

@SparkQA
Copy link

SparkQA commented Jan 31, 2016

Test build #50464 has finished for PR 10957 at commit 809c999.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 31, 2016

Thanks - merging this in master.

@asfgit asfgit closed this in 5a8b978 Jan 31, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants