Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39372][R] Support R 4.2.0 #36758

Closed
wants to merge 5 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Jun 3, 2022

What changes were proposed in this pull request?

This PR proposes:

  • Updates AppVeyor to use the latest R version 4.2.0.

  • Uses the correct way of checking if an object is a matrix: is.matrix.
    After R 4.2.0, class(upperBoundsOnCoefficients) != "matrix") fails:

    -- 1. Error (test_mllib_classification.R:245:3): spark.logit -------------------
    Error in `if (class(upperBoundsOnCoefficients) != "matrix") {
        stop("upperBoundsOnCoefficients must be a matrix.")
    }`: the condition has length > 1
    

    This fixes spark.logit when lowerBoundsOnCoefficients or upperBoundsOnCoefficients is specified.

  • Explicitly use the first element in is.na comparison. From R 4.2.0, it throws an exception as below:

    Error in if (is.na(c(1, 2))) print("abc") : the condition has length > 1
    

    Previously it was a warning.

    This fixes createDataFrame or as.DataFrame when the data type is a nested complex type.

Why are the changes needed?

To support/test the latest R. R community tends to use the latest versions aggressively.

Does this PR introduce any user-facing change?

Yes, after this PR, we officially support R 4.2.0 in SparkR.

How was this patch tested?

CI in this PR should test it out.

@github-actions github-actions bot added the BUILD label Jun 3, 2022
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@@ -129,7 +129,7 @@ $env:PATH = "$env:HADOOP_HOME\bin;" + $env:PATH
Pop-Location

# ========================== R
$rVer = "4.0.2"
$rVer = "4.2.0"
$rToolsVer = "4.0.2"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the tests passed, it failed to download RTools 4.2.0. I reverted the RTools upgrade here for now.

@HyukjinKwon HyukjinKwon changed the title [SPARK-39372][INFRA][R] Update R version to 4.2.0 in AppVeyor [SPARK-39372][R] Support R 4.2.0 Jun 3, 2022
@@ -58,7 +58,12 @@ writeObject <- function(con, object, writeType = TRUE) {
# Checking types is needed here, since 'is.na' only handles atomic vectors,
# lists and pairlists
if (type %in% c("integer", "character", "logical", "double", "numeric")) {
if (is.na(object)) {
if (is.na(object[[1]])) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R 4.1 and below:

Warning in if (is.na(c(1, 2))) print("abc") :
  the condition has length > 1 and only the first element will be used

R 4.2+:

Error in if (is.na(c(1, 2))) print("abc") : the condition has length > 1

@HyukjinKwon
Copy link
Member Author

Tests should pass now ..

# Uses the first element for now to keep the behavior same as R before
# 4.2.0. This is wrong because we should differenciate c(NA) from a
# single NA as the former means array(null) and the latter means null
# in Spark SQL. However, it requires non-trivial comparison to distinguish
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.) we should check if the input is vector, list, array, etc, which is exactly being done at getSerdeType. However, this comparison here (up to my best knowledge) is a shortcut to avoid the overhead from getSerdeType. So, I just decided to leave it as is for now.

@HyukjinKwon
Copy link
Member Author

cc @felixcheung @shivaram @viirya too FYI

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@HyukjinKwon
Copy link
Member Author

Thanks!!!

@HyukjinKwon HyukjinKwon deleted the upgrade-r-appveyor branch January 15, 2024 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants