
[SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API docs for non-MLib changes #13394

Closed
wants to merge 9 commits into from

Conversation

vectorijk
Contributor

What changes were proposed in this pull request?

R docs changes, including typos, formatting, and layout.

How was this patch tested?

Tested locally.

@vectorijk
Contributor Author

cc @felixcheung @shivaram @sun-rui


SparkQA commented May 29, 2016

Test build #59589 has finished for PR 13394 at commit 7961bbe.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vectorijk
Contributor Author

Jenkins test this please


SparkQA commented May 29, 2016

Test build #59600 has finished for PR 13394 at commit 7961bbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

@felixcheung Could you take a look at this PR ?

@shivaram
Contributor

@vectorijk Thanks for the PR. Changes look pretty good to me.
We also need to update the programming guide (the one at http://spark.apache.org/docs/latest/sparkr.html) to cover the major new features. This will include
(a) UDFs with dapply, dapplyCollect and
(b) spark.lapply for running parallel R functions
(c) the change to not require sqlContext

We can do that in a separate JIRA/PR or if you wish we can also do it in this PR.
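For readers of this thread, a rough sketch of the three Spark 2.0 SparkR features listed above. This is an untested illustration that assumes the Spark 2.0 SparkR API and requires a local Spark installation to run; column and variable names are made up for the example.

```r
library(SparkR)

# (c) sparkR.session() replaces the explicit sqlContext in Spark 2.0
sparkR.session()

df <- createDataFrame(faithful)

# (a) dapply: apply an R function to each partition of a SparkDataFrame;
# the output schema must be declared up front
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
df2 <- dapply(df, function(x) { cbind(x, x$waiting * 60) }, schema)

# dapplyCollect is similar but returns a local data.frame, so no schema
local_df <- dapplyCollect(df, function(x) {
  cbind(x, waiting_secs = x$waiting * 60)
})

# (b) spark.lapply: run an R function over a local list in parallel,
# returning the results as a local list
squares <- spark.lapply(1:4, function(i) { i * i })
```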

@felixcheung
Member

(c) the change to not require sqlContext
This was in the earlier PR, under the migration guide section.

#' @rdname tojson
#' @noRd
#' @rdname toJSON
#' @name toJSON
Member

wait, why are we changing this from @noRd? this is not exported from SparkR and should not be documented.
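For context on this review comment, a hypothetical roxygen2 fragment illustrating the distinction being discussed: `@noRd` suppresses generation of an `.Rd` man page for internal helpers, while `@rdname`/`@name` route an exported function's documentation to a named topic. The doc text below is illustrative, not the exact Spark source.

```r
# Internal helper: documented for developers in the source only,
# no .Rd file is generated, so it never appears in the package manual.
#' Convert a SparkDataFrame to a JSON RDD (internal)
#' @noRd
NULL

# Exported function: gets a man page, grouped under the "toJSON" topic.
#' Convert the contents of a SparkDataFrame to a JSON string
#' @rdname toJSON
#' @name toJSON
NULL
```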

@vectorijk
Contributor Author

@shivaram For updating the programming guide, I'd love to do this in a separate PR.


SparkQA commented May 31, 2016

Test build #59648 has finished for PR 13394 at commit 294cadd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Thanks @vectorijk - I created https://issues.apache.org/jira/browse/SPARK-15672 for that

@@ -2514,7 +2529,9 @@ setMethod("attach",
#' environment. Then, the given expression is evaluated in this new
#' environment.
#'
#' @title with
Member

@shivaram Is this supposed to be a long-form title, or just the name of the method? Looking at other examples, it looks like it should be a short description

Contributor Author

@shivaram Yes, I also noticed that the titles of other examples are not consistent. Which one should we use: a short description, or just the name of the method?

Contributor

I think we should follow the example of existing R packages and use the long form as the title. For example if you look at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html the title of the page is "Fitting Generalized Linear Models"
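Following that convention, a sketch of what the long-form roxygen header for `with` might look like (illustrative wording, not the exact Spark source):

```r
#' Evaluate an Expression in an Environment Constructed from a SparkDataFrame
#'
#' Attaches a SparkDataFrame to the R search path as a new environment
#' built from its columns, then evaluates the given expression in that
#' environment.
#'
#' @rdname with
#' @name with
NULL
```

The first roxygen line becomes the page title, so a descriptive phrase there (rather than repeating the method name) matches the style of base R man pages such as `?glm`.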

Member

@vectorijk Could you please update the PR this way?

Contributor Author

@jkbradley I will do it ASAP.


SparkQA commented Jun 3, 2016

Test build #59909 has finished for PR 13394 at commit 432710e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1069,6 +1079,8 @@ setMethod("first",
#'
#' @param x A SparkDataFrame
#'
#' @family SparkDataFrame functions
#' @rdname tordd
Member

let's revert these 2 lines as well? thanks

Contributor Author

@felixcheung OK, should we also remove these two lines in the toJSON part at line 631?

@@ -628,8 +628,6 @@ setMethod("repartition",
#'
#' @param x A SparkDataFrame
#' @return A StringRRDD of JSON objects
#' @family SparkDataFrame functions
Contributor Author

@felixcheung I removed these two lines in toJSON part. Correct me, if I am wrong.


SparkQA commented Jun 6, 2016

Test build #60030 has finished for PR 13394 at commit c6d516a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -647,7 +645,7 @@ setMethod("toJSON",
RDD(jrdd, serializedMode = "string")
})

#' write.json
#' Save the contents of DataFrame as a JSON file
Contributor

Can we use SparkDataFrame as opposed to DataFrame (see https://issues.apache.org/jira/browse/SPARK-12148 for some more details).

@@ -582,9 +586,9 @@ setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
return(list(coefficients = coefficients))
})

#' Make predictions from an AFT survival regression model
#' predict
Member

ditto: keep long title


SparkQA commented Jun 14, 2016

Test build #60473 has finished for PR 13394 at commit 2537b8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

This LGTM now. @shivaram @felixcheung let me know if I missed something; I'm still getting used to R doc syntax and conventions.
I'll merge after rerunning tests.

@felixcheung
Member

LGTM, thanks! Only a minor comment: in the cases where you have a short title (e.g. "predict", "Histogram"), can you think of a longer, more descriptive title? That would help make it consistent with everything else.

@jkbradley
Member

@felixcheung Check out the comment above from @vectorijk about putting multiple predict methods in a single page. Is there a better way to organize these?

@shivaram
Contributor

Yeah I think the approach used by @vectorijk is fine. We could have the title as Model Predictions instead of predict (this is what R uses when you do ?predict)
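The approach under discussion — one documentation page shared by several predict methods — can be sketched with a common `@rdname` topic. This is a simplified illustration of the pattern, assuming SparkR's internal `callJMethod` helper and `jobj`/`sdf` slots; it is not the exact Spark source.

```r
#' Model Predictions
#'
#' Makes predictions from a fitted MLlib model on a new SparkDataFrame.
#' All predict methods below share the same man page via @rdname.
#'
#' @rdname predict
setMethod("predict", signature(object = "NaiveBayesModel"),
          function(object, newData) {
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })

# A second model's predict method joins the same "predict" page.
#' @rdname predict
setMethod("predict", signature(object = "KMeansModel"),
          function(object, newData) {
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })
```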

@vectorijk
Contributor Author

Thanks! @jkbradley @felixcheung @shivaram Sure. How about using the title Predicted values based on model object instead of predict (like https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html),
and the title Compute histogram statistics for given column instead of Histogram?

@jkbradley
Member

How about Predicted values based on model (no "object")?

Compute histogram statistics for given column sounds good to me.


SparkQA commented Jun 15, 2016

Test build #3109 has finished for PR 13394 at commit 2537b8f.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@@ -402,6 +406,8 @@ setMethod("spark.naiveBayes", signature(data = "SparkDataFrame", formula = "form
return(new("NaiveBayesModel", jobj = jobj))
})

#' Save fitted MLlib model to the input path
Contributor Author

@jkbradley Likewise, I changed the write.ml title to Save fitted MLlib model to the input path rather than Save the Bernoulli naive Bayes model to the input path, for all four models.


SparkQA commented Jun 15, 2016

Test build #60611 has finished for PR 13394 at commit 84bf2aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM
Merging with master
Thanks!

@asfgit asfgit closed this in 5fd20b6 Jun 17, 2016
asfgit pushed a commit that referenced this pull request Jun 17, 2016
…MLib changes

## What changes were proposed in this pull request?
R docs changes, including typos, formatting, and layout.
## How was this patch tested?
Test locally.

Author: Kai Jiang <jiangkai@gmail.com>

Closes #13394 from vectorijk/spark-15490.

(cherry picked from commit 5fd20b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Jun 28, 2023
### What changes were proposed in this pull request?
This PR aims to upgrade Netty from 4.1.92 to 4.1.93.

### Why are the changes needed?
1. v4.1.92 vs. v4.1.93:
netty/netty@netty-4.1.92.Final...netty-4.1.93.Final

2. The new version brings some bug fixes, e.g.:
- Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320))
- OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365))
- Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366))
- HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368))
- Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375))
- H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388))
- Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389))
- HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394))
- Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399))

3. The release notes are as follows:
- https://netty.io/news/2023/05/25/4-1-93-Final.html

4. Why not upgrade to the `4.1.94.Final` version?
Because the return value of the `threadCache()` method of the Netty inner class used by `arrow-memory-netty` 12.0.1 has changed (from `PoolThreadCache` to `PoolArenasCache`), which is a breaking change; let's wait for `arrow-memory-netty` to upgrade before moving to `4.1.94.Final`.

The references are as follows:
https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164
https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA.

Closes #41681 from panbingkun/upgrade_netty.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>