Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20371][R] Add wrappers for collect_list and collect_set #17672

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions R/pkg/NAMESPACE
Expand Up @@ -203,6 +203,8 @@ exportMethods("%in%",
"cbrt",
"ceil",
"ceiling",
"collect_list",
"collect_set",
"column",
"concat",
"concat_ws",
Expand Down
40 changes: 40 additions & 0 deletions R/pkg/R/functions.R
Expand Up @@ -3705,3 +3705,43 @@ setMethod("create_map",
jc <- callJStatic("org.apache.spark.sql.functions", "map", jcols)
column(jc)
})

#' collect_list
#'
#' Creates a list of objects with duplicates.
#'
#' @param x Column to compute on
#'
#' @rdname collect_list
#' @name collect_list
#' @family agg_funcs
#' @aliases collect_list,Column-method
#' @export
#' @examples \dontrun{collect_list(df$x)}
#' @note collect_list since 2.3.0
setMethod("collect_list",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "collect_list", x@jc)
column(jc)
})

#' collect_set
#'
#' Creates a list of objects with duplicate elements eliminated.
#'
#' @param x Column to compute on
#'
#' @rdname collect_set
#' @name collect_set
#' @family agg_funcs
#' @aliases collect_set,Column-method
#' @export
#' @examples \dontrun{collect_set(df$x)}
#' @note collect_set since 2.3.0
setMethod("collect_set",
signature(x = "Column"),
function(x) {
jc <- callJStatic("org.apache.spark.sql.functions", "collect_set", x@jc)
column(jc)
})
9 changes: 9 additions & 0 deletions R/pkg/R/generics.R
Expand Up @@ -918,6 +918,14 @@ setGeneric("cbrt", function(x) { standardGeneric("cbrt") })
#' @export
setGeneric("ceil", function(x) { standardGeneric("ceil") })

#' @rdname collect_list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are under ###################### Expression Function Methods ##########################
which doesn't seem like the right group

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue here #17674 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it's good.

#' @export
setGeneric("collect_list", function(x) { standardGeneric("collect_list") })

#' @rdname collect_set
#' @export
setGeneric("collect_set", function(x) { standardGeneric("collect_set") })

#' @rdname column
#' @export
setGeneric("column", function(x) { standardGeneric("column") })
Expand Down Expand Up @@ -1358,6 +1366,7 @@ setGeneric("window", function(x, ...) { standardGeneric("window") })
#' @export
setGeneric("year", function(x) { standardGeneric("year") })


###################### Spark.ML Methods ##########################

#' @rdname fitted
Expand Down
22 changes: 22 additions & 0 deletions R/pkg/inst/tests/testthat/test_sparkSQL.R
Expand Up @@ -1731,6 +1731,28 @@ test_that("group by, agg functions", {
expect_true(abs(sd(1:2) - 0.7071068) < 1e-6)
expect_true(abs(var(1:5, 1:5) - 2.5) < 1e-6)

# Test collect_list and collect_set
gd3_collections_local <- collect(
agg(gd3, collect_set(df8$age), collect_list(df8$age))
)

expect_equal(
unlist(gd3_collections_local[gd3_collections_local$name == "Andy", 2]),
c(30)
)

expect_equal(
unlist(gd3_collections_local[gd3_collections_local$name == "Andy", 3]),
c(30, 30)
)

expect_equal(
sort(unlist(
gd3_collections_local[gd3_collections_local$name == "Justin", 3]
)),
c(1, 19)
)

unlink(jsonPath2)
unlink(jsonPath3)
})
Expand Down