New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20889][SparkR] Grouped documentation for DATETIME column methods #18114
Conversation
@felixcheung |
R/pkg/R/functions.R
Outdated
if (class(x) == "Column") { | ||
x <- x@jc | ||
setMethod("datediff", signature(x = "Column"), | ||
function(x, y) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, x
and y
are reversed for easy documentation. Similarly for other methods that take two arguments.
R/pkg/R/functions.R
Outdated
function(y, x) { | ||
jc <- callJStatic("org.apache.spark.sql.functions", "from_utc_timestamp", y@jc, x) | ||
setMethod("from_utc_timestamp", signature(x = "Column", tz = "character"), | ||
function(x, tz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the second argument to tz
to be consistent with Scala, which also makes it less confusing in the doc since other methods also have y
as argument that often refers to a Column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here on calling by name
Test build #77391 has finished for PR 18114 at commit
|
R/pkg/R/functions.R
Outdated
#' Returns the number of days from \code{start} to \code{end}. | ||
#' | ||
#' @param x start Column to use. | ||
#' @param y end Column to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two concerns here:
- the doc description is now fairly generalized - is supposed to be
datediff(end, start)
but seems like it's not not clear which is end and which is start - renaming this could break user calling by name
datediff(df$c, x = "foo")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felixcheung These names start
and end
are from the original doc. I now changed it to x
and y
.
R/pkg/R/functions.R
Outdated
function(y, x) { | ||
jc <- callJStatic("org.apache.spark.sql.functions", "from_utc_timestamp", y@jc, x) | ||
setMethod("from_utc_timestamp", signature(x = "Column", tz = "character"), | ||
function(x, tz) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto here on calling by name
maybe it's worthwhile to separate the dff type functions into a separate rd so we don't have to rename/switch the parameter? |
@felixcheung Thank you. This is great suggestion. I will split it into two help files which should make the doc much cleaner without changing the functions. |
0d2853d
to
944aa92
Compare
@felixcheung The new commit addresses your concern by splitting methods with two arguments into a separate doc. |
Test build #77441 has finished for PR 18114 at commit
|
Jenkins, retest this please |
Test build #77447 has finished for PR 18114 at commit
|
944aa92
to
cccab2a
Compare
Test build #77452 has started for PR 18114 at commit |
Jenkins, retest this please |
Test build #77459 has finished for PR 18114 at commit
|
016bb47
to
311ccc2
Compare
Test build #78271 has finished for PR 18114 at commit
|
Test build #78272 has finished for PR 18114 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool, thanks
R/pkg/R/functions.R
Outdated
#' @name from_unixtime | ||
#' @aliases from_unixtime,Column-method | ||
#' @rdname column_datetime_functions | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a line with #
(vs #'
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
R/pkg/R/functions.R
Outdated
#' format. | ||
#' @section Details: | ||
#' \code{from_unixtime}: Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a | ||
#' string representing the timestamp of that moment in the current system time zone in the given format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to call out that "current system time zone" is the one in JVM - in R one could set the default TZ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
R/pkg/R/functions.R
Outdated
#' @note to_utc_timestamp since 1.5.0 | ||
setMethod("to_utc_timestamp", signature(y = "Column", x = "character"), | ||
function(y, x) { | ||
jc <- callJStatic("org.apache.spark.sql.functions", "to_utc_timestamp", y@jc, x) | ||
column(jc) | ||
}) | ||
|
||
#' add_months | ||
#' @section Details: | ||
#' \code{add_months}: Returns the date that is numMonths after startDate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a bit confusing what is numMonths
(x
) and what is startDate
(y
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was the original description. Updated to make it clearer. Also, the examples now will help users figure out how to use these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I wasn't sure this would be the form you are using in other documentation, feel free to massage it, change it.
also might be clearer here to say that is startDate + numMonths
- ie. reverse the order the names are mention to be consistent with parameter order
R/pkg/R/functions.R
Outdated
#' | ||
#' Day of the week parameter is case insensitive, and accepts first three or two characters: | ||
#' "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun". | ||
#' @section Details: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to @details
?
R/pkg/R/functions.R
Outdated
#' @examples | ||
#' | ||
#' \dontrun{ | ||
#' set.seed(11) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need to set seed?
R/pkg/R/functions.R
Outdated
#' | ||
#' @param y Column to compute on. | ||
#' @param x For class Column, it is used to perform arithmetic operations with \code{y}. | ||
#' For class numeric, it is the number of months or days to be added to \code{y}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be added or subtracted
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated. thx
R/pkg/R/functions.R
Outdated
#' | ||
#' @param x Column to compute on. | ||
#' @param format For \code{to_date} and \code{to_timestamp}, it is the string to use to parse | ||
#' x Column to DateType or TimestampType. For \code{trunc}, it is the string used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space in the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
R/pkg/R/functions.R
Outdated
@@ -546,18 +598,20 @@ setMethod("hash", | |||
column(jc) | |||
}) | |||
|
|||
#' dayofmonth | |||
#' @section Details: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to @details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
#' @export | ||
#' @examples \dontrun{date_format(df$t, 'MM/dd/yyy')} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this example is not in the new addition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back.
#' @examples | ||
#' \dontrun{ | ||
#' to_timestamp(df$c) | ||
#' to_timestamp(df$c, 'yyyy-MM-dd') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these examples are not added back - could you check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added back.
@felixcheung Thanks so much for the review and comments. Super helpful! |
Test build #78403 has finished for PR 18114 at commit
|
Test build #78407 has finished for PR 18114 at commit
|
@felixcheung Any idea what this message means? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I think it's just the new public class detection thing that
|
hmm, waiting for AppVeyor |
@felixcheung, would you give me a moment to double check? I am interested in this and want to help double check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, last nitpicking ...
R/pkg/R/functions.R
Outdated
#' | ||
#' @param y Column to compute on. | ||
#' @param x For class \code{Column}, it is the column used to perform arithmetic operations | ||
#' with column \code{y}.For class \code{numeric}, it is the number of months or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
little nit .F
->. F
R/pkg/R/functions.R
Outdated
#' to_utc = to_utc_timestamp(df$time, 'PST'), | ||
#' to_unix = unix_timestamp(df$time), | ||
#' to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'), | ||
#' from_unix = from_unixtime(unix_timestamp(df$time))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks this one should go to column_datetime_functions
or from_unixtime
should be in column_datetime_diff_functions
.
unix_timestamp
looks a ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And ... from_unixtime(df$t, 'yyyy/MM/dd HH')
looks missed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this I don't get - why from_unixtime
belongs to column_datetime_diff_functions
?
ok I get it. looks like the *timestamp or *time methods here should not in in diff functions group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what I found was the documentation for unix_timestamp
and from_unixtime
was in column_datetime_functions
but the examples in column_datetime_diff_functions
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. The examples for unix_timestamp
and from_unixtime
are now documented in the correct file.
R/pkg/R/functions.R
Outdated
#' days to be added to or subtracted from \code{y}. For class \code{character}, it is | ||
#' \itemize{ | ||
#' \item \code{date_format}: date format specification. | ||
#' \item \code{from_utc_timestamp, to_utc_timestamp}: time zone to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
little nit \code{from_utc_timestamp}, \code{to_utc_timestamp}
@HyukjinKwon Great catch. Fixed all issues you pointed out. Thanks! |
Test build #78433 has finished for PR 18114 at commit
|
This looks good to me too. |
merged to master. thanks! |
## What changes were proposed in this pull request? Grouped documentation for datetime column methods. Author: actuaryzhang <actuaryzhang10@gmail.com> Closes apache#18114 from actuaryzhang/sparkRDocDate.
What changes were proposed in this pull request?
Grouped documentation for datetime column methods.