
[SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs and API docs for non-MLib changes #13394

Closed
wants to merge 9 commits into from

Conversation

vectorijk
Contributor

What changes were proposed in this pull request?

R docs changes, including typos, formatting, and layout.

How was this patch tested?

Tested locally.

@vectorijk
Contributor Author

cc @felixcheung @shivaram @sun-rui


SparkQA commented May 29, 2016

Test build #59589 has finished for PR 13394 at commit 7961bbe.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vectorijk
Contributor Author

Jenkins test this please


SparkQA commented May 29, 2016

Test build #59600 has finished for PR 13394 at commit 7961bbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

@felixcheung Could you take a look at this PR ?

@shivaram
Contributor

@vectorijk Thanks for the PR. Changes look pretty good to me.
We also need to update the programming guide (the one at http://spark.apache.org/docs/latest/sparkr.html) to cover the major new features. This will include
(a) UDFs with dapply, dapplyCollect and
(b) spark.lapply for running parallel R functions
(c) the change to not require sqlContext

We can do that in a separate JIRA/PR or if you wish we can also do it in this PR.
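For readers of this thread, a rough sketch of the three Spark 2.0 SparkR features listed above. This is an untested illustration that assumes the Spark 2.0 SparkR API and requires a local Spark installation to run; column and variable names are made up for the example.

```r
library(SparkR)

# (c) sparkR.session() replaces the explicit sqlContext in Spark 2.0
sparkR.session()

df <- createDataFrame(faithful)

# (a) dapply: apply an R function to each partition of a SparkDataFrame;
# the output schema must be declared up front
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
df2 <- dapply(df, function(x) { cbind(x, x$waiting * 60) }, schema)

# dapplyCollect is similar but returns a local data.frame, so no schema
local_df <- dapplyCollect(df, function(x) {
  cbind(x, waiting_secs = x$waiting * 60)
})

# (b) spark.lapply: run an R function over a local list in parallel,
# returning the results as a local list
squares <- spark.lapply(1:4, function(i) { i * i })
```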

@felixcheung
Member

(c) the change to not require sqlContext
This was in the earlier PR, under the migration guide section.

#' @rdname tojson
#' @noRd
#' @rdname toJSON
#' @name toJSON
Member

wait, why are we changing this from @noRd? this is not exported from SparkR and should not be documented.
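For context on this review comment, a hypothetical roxygen2 fragment illustrating the distinction being discussed: `@noRd` suppresses generation of an `.Rd` man page for internal helpers, while `@rdname`/`@name` route an exported function's documentation to a named topic. The doc text below is illustrative, not the exact Spark source.

```r
# Internal helper: documented for developers in the source only,
# no .Rd file is generated, so it never appears in the package manual.
#' Convert a SparkDataFrame to a JSON RDD (internal)
#' @noRd
NULL

# Exported function: gets a man page, grouped under the "toJSON" topic.
#' Convert the contents of a SparkDataFrame to a JSON string
#' @rdname toJSON
#' @name toJSON
NULL
```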

@vectorijk
Contributor Author

@shivaram For updating the programming guide, I'd love to do this in a separate PR.


SparkQA commented May 31, 2016

Test build #59648 has finished for PR 13394 at commit 294cadd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

Thanks @vectorijk - I created https://issues.apache.org/jira/browse/SPARK-15672 for that

@@ -2514,7 +2529,9 @@ setMethod("attach",
#' environment. Then, the given expression is evaluated in this new
#' environment.
#'
#' @title with
Member

@shivaram Is this supposed to be a long-form title, or just the name of the method? Looking at other examples, it looks like it should be a short description

Contributor Author

@shivaram Yes, I also noticed that the titles of other examples are not consistent. Which one should we use: a short description, or just the name of the method?

Contributor

I think we should follow the example of existing R packages and use the long form as the title. For example if you look at https://stat.ethz.ch/R-manual/R-devel/library/stats/html/glm.html the title of the page is "Fitting Generalized Linear Models"
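Following that convention, a sketch of what the long-form roxygen header for `with` might look like (illustrative wording, not the exact Spark source):

```r
#' Evaluate an Expression in an Environment Constructed from a SparkDataFrame
#'
#' Attaches a SparkDataFrame to the R search path as a new environment
#' built from its columns, then evaluates the given expression in that
#' environment.
#'
#' @rdname with
#' @name with
NULL
```

The first roxygen line becomes the page title, so a descriptive phrase there (rather than repeating the method name) matches the style of base R man pages such as `?glm`.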

Member

@vectorijk Could you please update the PR this way?

Contributor Author

@jkbradley I will do it ASAP.


SparkQA commented Jun 3, 2016

Test build #59909 has finished for PR 13394 at commit 432710e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1069,6 +1079,8 @@ setMethod("first",
#'
#' @param x A SparkDataFrame
#'
#' @family SparkDataFrame functions
#' @rdname tordd
Member

let's revert these 2 lines as well? thanks

Contributor Author

@felixcheung OK, should we also remove these two lines in the toJSON part at line 631?

@@ -628,8 +628,6 @@ setMethod("repartition",
#'
#' @param x A SparkDataFrame
#' @return A StringRRDD of JSON objects
#' @family SparkDataFrame functions
Contributor Author

@felixcheung I removed these two lines in toJSON part. Correct me, if I am wrong.


SparkQA commented Jun 6, 2016

Test build #60030 has finished for PR 13394 at commit c6d516a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -647,7 +645,7 @@ setMethod("toJSON",
RDD(jrdd, serializedMode = "string")
})

#' write.json
#' Save the contents of DataFrame as a JSON file
Contributor

Can we use SparkDataFrame as opposed to DataFrame (see https://issues.apache.org/jira/browse/SPARK-12148 for some more details).

@@ -582,9 +586,9 @@ setMethod("summary", signature(object = "AFTSurvivalRegressionModel"),
return(list(coefficients = coefficients))
})

#' Make predictions from an AFT survival regression model
#' predict
Member

ditto: keep long title


SparkQA commented Jun 14, 2016

Test build #60473 has finished for PR 13394 at commit 2537b8f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

This LGTM now. @shivaram @felixcheung let me know if I missed something; I'm still getting used to R doc syntax and conventions.
I'll merge after rerunning tests.

@felixcheung
Member

LGTM, thanks! Only a minor comment: in the cases where you have a short title (e.g. "predict", "Histogram"), can you think of a longer, more descriptive title? That would help make it consistent with everything else.

@jkbradley
Member

@felixcheung Check out the comment above from @vectorijk about putting multiple predict methods in a single page. Is there a better way to organize these?

@shivaram
Contributor

Yeah I think the approach used by @vectorijk is fine. We could have the title as Model Predictions instead of predict (this is what R uses when you do ?predict)
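The approach under discussion — one documentation page shared by several predict methods — can be sketched with a common `@rdname` topic. This is a simplified illustration of the pattern, assuming SparkR's internal `callJMethod` helper and `jobj`/`sdf` slots; it is not the exact Spark source.

```r
#' Model Predictions
#'
#' Makes predictions from a fitted MLlib model on a new SparkDataFrame.
#' All predict methods below share the same man page via @rdname.
#'
#' @rdname predict
setMethod("predict", signature(object = "NaiveBayesModel"),
          function(object, newData) {
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })

# A second model's predict method joins the same "predict" page.
#' @rdname predict
setMethod("predict", signature(object = "KMeansModel"),
          function(object, newData) {
            dataFrame(callJMethod(object@jobj, "transform", newData@sdf))
          })
```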

@vectorijk
Contributor Author

Thanks! @jkbradley @felixcheung @shivaram Sure. How about using the title Predicted values based on model object instead of predict (like https://stat.ethz.ch/R-manual/R-devel/library/stats/html/predict.lm.html),
and the title Compute histogram statistics for given column instead of Histogram?

@jkbradley
Member

How about Predicted values based on model (no "object")?

Compute histogram statistics for given column sounds good to me.


SparkQA commented Jun 15, 2016

Test build #3109 has finished for PR 13394 at commit 2537b8f.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@@ -402,6 +406,8 @@ setMethod("spark.naiveBayes", signature(data = "SparkDataFrame", formula = "form
return(new("NaiveBayesModel", jobj = jobj))
})

#' Save fitted MLlib model to the input path
Contributor Author

@jkbradley Likewise, I changed the write.ml title to Save fitted MLlib model to the input path rather than Save the Bernoulli naive Bayes model to the input path, for all four models.


SparkQA commented Jun 15, 2016

Test build #60611 has finished for PR 13394 at commit 84bf2aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM
Merging with master
Thanks!

@asfgit asfgit closed this in 5fd20b6 Jun 17, 2016
asfgit pushed a commit that referenced this pull request Jun 17, 2016
…MLib changes

## What changes were proposed in this pull request?
R docs changes, including typos, formatting, and layout.
## How was this patch tested?
Test locally.

Author: Kai Jiang <jiangkai@gmail.com>

Closes #13394 from vectorijk/spark-15490.

(cherry picked from commit 5fd20b6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Jun 28, 2023
### What changes were proposed in this pull request?
This PR aims to upgrade Netty from 4.1.92 to 4.1.93.

### Why are the changes needed?
1. v4.1.92 vs. v4.1.93:
netty/netty@netty-4.1.92.Final...netty-4.1.93.Final

2. The new version brings some bug fixes, e.g.:
- Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320))
- OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365))
- Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366))
- HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368))
- Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375))
- H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388))
- Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389))
- HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394))
- Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399))

3. The release notes are as follows:
- https://netty.io/news/2023/05/25/4-1-93-Final.html

4. Why not upgrade to the `4.1.94.Final` version?
Because the return value of the `threadCache()` method of the Netty inner class used by `arrow-memory-netty` 12.0.1 has changed (from `PoolThreadCache` to `PoolArenasCache`), which is a breaking change; let's wait for `arrow-memory-netty` to upgrade before moving to `4.1.94.Final`.

The references are as follows:
https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164
https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA.

Closes #41681 from panbingkun/upgrade_netty.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>