[SPARK-39372][R] Support R 4.2.0 #36758
+1, LGTM.
```diff
@@ -129,7 +129,7 @@ $env:PATH = "$env:HADOOP_HOME\bin;" + $env:PATH
 Pop-Location

 # ========================== R
-$rVer = "4.0.2"
+$rVer = "4.2.0"
 $rToolsVer = "4.0.2"
```
Although the tests passed, downloading RTools 4.2.0 failed, so I reverted the RTools upgrade here for now.
```diff
@@ -58,7 +58,12 @@ writeObject <- function(con, object, writeType = TRUE) {
   # Checking types is needed here, since 'is.na' only handles atomic vectors,
   # lists and pairlists
   if (type %in% c("integer", "character", "logical", "double", "numeric")) {
-    if (is.na(object)) {
+    if (is.na(object[[1]])) {
```
R 4.1 and below:
```
Warning in if (is.na(c(1, 2))) print("abc") :
  the condition has length > 1 and only the first element will be used
```
R 4.2+:
```
Error in if (is.na(c(1, 2))) print("abc") : the condition has length > 1
```
Tests should pass now.
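A minimal R sketch of the rule behind these messages, assuming a plain atomic vector as input: `if` needs a length-one condition, so `is.na()` has to be reduced to a single value first. Taking the first element mirrors the workaround in this PR; `anyNA()` appears only for contrast, since it answers a different question. Both helper names are hypothetical.

```r
# Hypothetical helpers illustrating that, from R 4.2.0, an `if`
# condition must have length one.

# Mirrors the workaround in this PR: only the first element is inspected,
# which is what R < 4.2.0 did implicitly (with a warning).
first_is_na <- function(object) {
  if (is.na(object[[1]])) TRUE else FALSE
}

# For contrast only: anyNA() always returns a single logical, but it
# asks "is any element NA?", not "is the first element NA?".
any_is_na <- function(object) {
  if (anyNA(object)) TRUE else FALSE
}

x <- c(NA, 2)
y <- c(1, NA)

first_is_na(x)    # TRUE
first_is_na(y)    # FALSE: only the first element is looked at
any_is_na(y)      # TRUE

# The length-2 logical that R >= 4.2.0 rejects as an `if` condition
length(is.na(y))  # 2
```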
```diff
+    # Uses the first element for now to keep the behavior same as R before
+    # 4.2.0. This is wrong because we should differentiate c(NA) from a
+    # single NA as the former means array(null) and the latter means null
+    # in Spark SQL. However, it requires non-trivial comparison to distinguish
```
For example, we would need to check whether the input is a vector, list, array, etc., which is exactly what `getSerdeType` does. However, to the best of my knowledge, the check here is a shortcut that avoids the overhead of `getSerdeType`, so I decided to leave it as is for now.
cc @felixcheung @shivaram @viirya too FYI
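A small R illustration of why the value alone is not enough to separate the two cases named in the code comment: R has no distinct scalar type, so a single `NA` and `c(NA)` are the same length-one logical vector. This is only a sketch of that language fact; how SparkR's `getSerdeType` actually classifies inputs is not reproduced here.

```r
# In R, a "scalar" is simply a vector of length one.
a <- NA
b <- c(NA)

identical(a, b)  # TRUE
length(b)        # 1
is.vector(a)     # TRUE

# A genuine multi-element vector differs only by its length.
v <- c(NA, NA)
length(v)        # 2
```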
+1, LGTM. Merged to master.
lgtm
Thanks!!!
What changes were proposed in this pull request?
This PR proposes:
1. Updates AppVeyor to use the latest R version, 4.2.0.
2. Uses the correct way of checking whether an object is a matrix: `is.matrix`. Since R 4.0.0, `class()` of a matrix returns `c("matrix", "array")`, so `class(upperBoundsOnCoefficients) != "matrix"` yields a length-two logical, and from R 4.2.0 using it as an `if` condition fails. This fixes `spark.logit` when `lowerBoundsOnCoefficients` or `upperBoundsOnCoefficients` is specified.
3. Explicitly uses the first element in the `is.na` comparison. From R 4.2.0, an `if` condition of length greater than one throws an error; previously it was a warning. This fixes `createDataFrame` or `as.DataFrame` when the data type is a nested complex type.
Why are the changes needed?
To support and test against the latest R release. The R community tends to adopt new R versions quickly.
Does this PR introduce any user-facing change?
Yes, after this PR, we officially support R 4.2.0 in SparkR.
How was this patch tested?
CI in this PR should test it out.
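The `is.matrix` change proposed above can be sketched in R as follows. Since R 4.0.0, `class()` of a matrix returns `c("matrix", "array")`, so comparing it with `!= "matrix"` gives a length-two logical, which `if` rejects from R 4.2.0. The helper `check_bounds` is hypothetical and only loosely modeled on the bound checks in `spark.logit`.

```r
# Hypothetical validation helper; not SparkR's actual code.
check_bounds <- function(bounds) {
  # is.matrix() always returns a single TRUE/FALSE, so it is safe in `if`.
  if (!is.matrix(bounds)) {
    stop("bounds must be a matrix")
  }
  invisible(TRUE)
}

m <- matrix(0, nrow = 1, ncol = 3)

class(m)              # "matrix" "array" on R >= 4.0.0
class(m) != "matrix"  # FALSE TRUE on R >= 4.0.0: an error inside `if` on R >= 4.2.0
is.matrix(m)          # TRUE

check_bounds(m)       # passes silently
```

Relying on `is.matrix()` (or `inherits()`) rather than comparing `class()` strings keeps the check independent of how many classes R attaches to an object.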