[SPARK-20889][SparkR] Grouped documentation for DATETIME column methods #18114

Closed
wants to merge 7 commits into from

actuaryzhang
Contributor

What changes were proposed in this pull request?

Grouped documentation for datetime column methods.

@actuaryzhang
Contributor Author

actuaryzhang commented May 25, 2017

@felixcheung
Created this PR to update the docs for the date time methods, similar to #18025. About 27 date time methods are now documented on one page.
I'm attaching snapshots of part of the new help page.

[Three screenshots of the new help page]

if (class(x) == "Column") {
x <- x@jc
setMethod("datediff", signature(x = "Column"),
function(x, y) {
Contributor Author

Here, x and y are reversed for easy documentation. Similarly for other methods that take two arguments.

function(y, x) {
jc <- callJStatic("org.apache.spark.sql.functions", "from_utc_timestamp", y@jc, x)
setMethod("from_utc_timestamp", signature(x = "Column", tz = "character"),
function(x, tz) {
Contributor Author

Changed the second argument to tz to be consistent with Scala, which also makes it less confusing in the doc since other methods also have y as argument that often refers to a Column.

Member

ditto here on calling by name

@actuaryzhang actuaryzhang changed the title [SPARK-20889][SparkR] Grouped documentation for datetime column methods [SPARK-20889][SparkR] Grouped documentation for DATETIME column methods May 25, 2017
@SparkQA

SparkQA commented May 25, 2017

Test build #77391 has finished for PR 18114 at commit 0d2853d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

#' Returns the number of days from \code{start} to \code{end}.
#'
#' @param x start Column to use.
#' @param y end Column to use.
Member

two concerns here:

  • the doc description is now fairly generalized: it is supposed to be datediff(end, start), but it's not clear which is end and which is start
  • renaming this could break users calling by name, e.g. datediff(df$c, x = "foo")
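
To make the second concern concrete, here is a minimal base R sketch. The two functions are hypothetical stand-ins that use plain Date arithmetic instead of Spark Columns; they are not the SparkR generics. With S4 dispatch on Column, a named call might fail to find a method rather than change sign, so treat this only as an illustration of why argument names are part of the API.

```r
# Hypothetical stand-ins for the old (y, x) and renamed (x, y) signatures.
# Both return "first positional argument minus second" in days.
datediff_old <- function(y, x) as.numeric(as.Date(y) - as.Date(x))
datediff_new <- function(x, y) as.numeric(as.Date(x) - as.Date(y))

# Positional calls behave the same before and after the rename.
old_positional <- datediff_old("2017-05-25", "2017-05-20")  # 5
new_positional <- datediff_new("2017-05-25", "2017-05-20")  # 5

# A caller who passed the second argument by name now binds it to the
# first formal, so the result silently flips sign.
old_named <- datediff_old("2017-05-25", x = "2017-05-20")   # 5
new_named <- datediff_new("2017-05-25", x = "2017-05-20")   # -5

print(c(old_positional, new_positional, old_named, new_named))
```

Keeping the existing argument names, or documenting the two-argument methods separately, avoids this kind of breakage.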

Contributor Author

@felixcheung The names start and end are from the original doc. I have now changed them to x and y.

@felixcheung
Member

maybe it's worthwhile to separate the diff-type functions into a separate Rd file so we don't have to rename/switch the parameters?

@actuaryzhang
Contributor Author

@felixcheung Thank you. This is a great suggestion. I will split it into two help files, which should make the doc much cleaner without changing the functions.

@actuaryzhang
Contributor Author

@felixcheung The new commit addresses your concern by splitting methods with two arguments into a separate doc.

@SparkQA

SparkQA commented May 26, 2017

Test build #77441 has finished for PR 18114 at commit 944aa92.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' @param x if of class Column, it is used to perform arithmatic operations with

@felixcheung
Member

Jenkins, retest this please

@SparkQA

SparkQA commented May 27, 2017

Test build #77447 has finished for PR 18114 at commit 944aa92.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' @param x if of class Column, it is used to perform arithmatic operations with

@SparkQA

SparkQA commented May 27, 2017

Test build #77452 has started for PR 18114 at commit 016bb47.

@actuaryzhang
Contributor Author

Jenkins, retest this please

@SparkQA

SparkQA commented May 27, 2017

Test build #77459 has finished for PR 18114 at commit 016bb47.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' @param x For class Column, it is used to perform arithmetic operations with

@actuaryzhang
Contributor Author

For the date time functions, I created two groups: one for arithmetic functions that work with two columns (column_datetime_diff_functions), and the other for functions that work with only one column (column_datetime_functions). Below are screenshots for both.

[Three screenshots of the help page]

@actuaryzhang
Contributor Author

For the column_datetime_diff_functions:
[Three screenshots of the column_datetime_diff_functions help page]

@SparkQA

SparkQA commented Jun 20, 2017

Test build #78271 has finished for PR 18114 at commit 311ccc2.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' For class numeric, it is the number of months or days to be added to

@SparkQA

SparkQA commented Jun 20, 2017

Test build #78272 has finished for PR 18114 at commit aab9199.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

felixcheung left a comment

very cool, thanks

#' @name from_unixtime
#' @aliases from_unixtime,Column-method
#' @rdname column_datetime_functions
#
Member

why a line with # (vs #')?

Contributor Author

Fixed.

#' format.
#' @section Details:
#' \code{from_unixtime}: Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a
#' string representing the timestamp of that moment in the current system time zone in the given format.
Member

I think we need to call out that "current system time zone" is the one in JVM - in R one could set the default TZ
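
A minimal base R sketch of why this distinction matters: the same instant renders differently depending on the zone used to format it, and R's own default zone can be changed inside the session. The sketch does not call Spark at all; the zone names are arbitrary examples, and the assumption is only that the JVM formats timestamps with its own default zone, independently of R's setting.

```r
# The Unix epoch as a POSIXct value with no time zone attribute attached.
epoch <- structure(0, class = c("POSIXct", "POSIXt"))
fmt <- "%Y-%m-%d %H:%M:%S"

# The same instant rendered under two explicit zones.
utc_str <- format(epoch, format = fmt, tz = "UTC")                  # "1970-01-01 00:00:00"
la_str  <- format(epoch, format = fmt, tz = "America/Los_Angeles")  # "1969-12-31 16:00:00"

# Changing R's session default affects only R-side formatting;
# it does not touch the time zone of a separately running JVM.
old_tz <- Sys.getenv("TZ", unset = NA)
Sys.setenv(TZ = "Asia/Tokyo")
r_session_str <- format(epoch, format = fmt)
if (is.na(old_tz)) Sys.unsetenv("TZ") else Sys.setenv(TZ = old_tz)

print(c(utc = utc_str, los_angeles = la_str, r_session = r_session_str))
```

So a string produced by from_unixtime on the JVM side can disagree with what the same instant looks like when formatted in R, which is why the doc should say whose time zone is meant.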

Contributor Author

Done.

#' @note to_utc_timestamp since 1.5.0
setMethod("to_utc_timestamp", signature(y = "Column", x = "character"),
function(y, x) {
jc <- callJStatic("org.apache.spark.sql.functions", "to_utc_timestamp", y@jc, x)
column(jc)
})

#' add_months
#' @section Details:
#' \code{add_months}: Returns the date that is numMonths after startDate.
Member

might be a bit confusing what is numMonths (x) and what is startDate (y)

Contributor Author

Yes, this was the original description. Updated to make it clearer. Also, the examples now will help users figure out how to use these methods.

Member

actually, I wasn't sure this would be the form you are using in other documentation, so feel free to massage or change it.
also, it might be clearer here to say that it is startDate + numMonths, i.e. reverse the order in which the names are mentioned to be consistent with the parameter order

#'
#' Day of the week parameter is case insensitive, and accepts first three or two characters:
#' "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
#' @section Details:
Member

change to @details?

#' @examples
#'
#' \dontrun{
#' set.seed(11)
Member

why do we need to set the seed?

#'
#' @param y Column to compute on.
#' @param x For class Column, it is used to perform arithmetic operations with \code{y}.
#' For class numeric, it is the number of months or days to be added to \code{y}.
Member

to be added or subtracted?

Contributor Author

updated. thx

#'
#' @param x Column to compute on.
#' @param format For \code{to_date} and \code{to_timestamp}, it is the string to use to parse
#' x Column to DateType or TimestampType. For \code{trunc}, it is the string used
Member

nit: extra space in the string

Contributor Author

fixed

@@ -546,18 +598,20 @@ setMethod("hash",
column(jc)
})

#' dayofmonth
#' @section Details:
Member

change to @details

Contributor Author

Done

#' @export
#' @examples \dontrun{date_format(df$t, 'MM/dd/yyy')}
Member

I think this example is not in the new addition

Contributor Author

Added back.

#' @examples
#' \dontrun{
#' to_timestamp(df$c)
#' to_timestamp(df$c, 'yyyy-MM-dd')
Member

I think these examples are not added back - could you check?

Contributor Author

Added back.

@actuaryzhang
Contributor Author

@felixcheung Thanks so much for the review and comments. Super helpful!
I fixed all the issues you have pointed out in the new commit.

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78403 has finished for PR 18114 at commit a9b1049.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' For class numeric, it is the number of months or days to be added to or subtracted from

@SparkQA

SparkQA commented Jun 21, 2017

Test build #78407 has finished for PR 18114 at commit 1381dd5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' @param x For class

@actuaryzhang
Contributor Author

@felixcheung Any idea what this message means?
This patch adds the following public classes (experimental): #' @param x For class

Member

felixcheung left a comment

LGTM

@felixcheung
Member

I think it's just the new public class detection thing that

  • doesn't handle R code at all
  • is confused by any code line that has the word class somewhere in it
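
A small sketch of how such a false positive can arise. The pattern below is a hypothetical keyword heuristic written for illustration; it is not the actual check that SparkQA runs, whose exact rules are not shown in this thread.

```r
# Hypothetical heuristic: flag any added line that contains the word "class".
looks_like_new_class <- function(lines) grepl("\\bclass\\b", lines, perl = TRUE)

added <- c(
  "public class Foo {",                                                            # a real class declaration
  "#' @param x For class Column, it is used to perform arithmetic operations with",  # roxygen comment
  "column(jc)"                                                                      # ordinary R code
)

flags <- looks_like_new_class(added)
print(flags)  # TRUE TRUE FALSE
```

A keyword match with no awareness of the source language reports the roxygen comment line as a new public class, which is consistent with the SparkQA messages above.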

@felixcheung
Member

hmm, waiting for AppVeyor

@HyukjinKwon
Member

@felixcheung, would you give me a moment to double check? I am interested in this and want to help double check.

Member

HyukjinKwon left a comment

Sorry, last nitpicking ...

#'
#' @param y Column to compute on.
#' @param x For class \code{Column}, it is the column used to perform arithmetic operations
#' with column \code{y}.For class \code{numeric}, it is the number of months or
Member

little nit .F ->. F

#' to_utc = to_utc_timestamp(df$time, 'PST'),
#' to_unix = unix_timestamp(df$time),
#' to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'),
#' from_unix = from_unixtime(unix_timestamp(df$time)))
Member

It looks like this one should go to column_datetime_functions, or from_unixtime should be in column_datetime_diff_functions.

unix_timestamp looks the same.

Member

And ... from_unixtime(df$t, 'yyyy/MM/dd HH') looks like it is missing.

Member

this I don't get - why does from_unixtime belong to column_datetime_diff_functions?
ok I get it. looks like the *timestamp or *time methods here should not be in the diff functions group?

Member

Actually, what I found was that the documentation for unix_timestamp and from_unixtime was in column_datetime_functions, but their examples were in column_datetime_diff_functions.

Contributor Author

Fixed. The examples for unix_timestamp and from_unixtime are now documented in the correct file.

#' days to be added to or subtracted from \code{y}. For class \code{character}, it is
#' \itemize{
#' \item \code{date_format}: date format specification.
#' \item \code{from_utc_timestamp, to_utc_timestamp}: time zone to use.
Member

little nit \code{from_utc_timestamp}, \code{to_utc_timestamp}

@actuaryzhang
Contributor Author

@HyukjinKwon Great catch. Fixed all issues you pointed out. Thanks!

@SparkQA

SparkQA commented Jun 22, 2017

Test build #78433 has finished for PR 18114 at commit a291279.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • `#' with column

@HyukjinKwon
Member

This looks good to me too.

@felixcheung
Member

merged to master. thanks!

@asfgit asfgit closed this in 19331b8 Jun 22, 2017
@actuaryzhang actuaryzhang deleted the sparkRDocDate branch June 22, 2017 17:38
robert3005 pushed a commit to palantir/spark that referenced this pull request Jun 29, 2017
## What changes were proposed in this pull request?
Grouped documentation for datetime column methods.

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes apache#18114 from actuaryzhang/sparkRDocDate.