Conversation

@vijoshi
Contributor

@vijoshi vijoshi commented Apr 22, 2017

What changes were proposed in this pull request?

Allow SparkR to ignore the "promise already under evaluation" error when the user has created a delayed binding for the .sparkRsession / .sparkRjsc names in SparkR:::.sparkREnv.
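For reference, the kind of binding this enables looks roughly like this (a sketch only; the sparkR.session() arguments are whatever the hosting environment wants to pre-configure):

# hypothetical sketch: a hosting environment pre-installs a lazy binding so
# the Spark session is only created when user code first touches it
delayedAssign(".sparkRsession",
              sparkR.session(...),
              assign.env = SparkR:::.sparkREnv)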

How was this patch tested?

Ran all unit tests - run-tests.sh

@SparkQA

SparkQA commented Apr 22, 2017

Test build #76073 has finished for PR 17731 at commit 4423f5c.

  • This patch fails R style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vijoshi
Contributor Author

vijoshi commented Apr 22, 2017

@felixcheung

@felixcheung
Member

felixcheung commented Apr 23, 2017

this might be reasonable, but sparkR.sparkContext is only called when sparkR.session() is called, and so I'm not sure I follow how this works if someone is doing this in a brand new R session:

# do not call sparkR.session(), ever
a <- createDataFrame(iris)

...which is what I understand from the email exchange on user@, I think. Could you elaborate if this is what you are trying to support?

@SparkQA

SparkQA commented Apr 23, 2017

Test build #76074 has finished for PR 17731 at commit c06da49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

also, what if a user wants to explicitly create a Spark session with specific parameters? the delayed binding model doesn't seem to support that properly?

@vijoshi
Contributor Author

vijoshi commented Apr 23, 2017

@felixcheung yes. We need to support these two possibilities:

#do not call sparkR.session() - followed by implicit reference to sparkSession
a <- createDataFrame(iris)

or

#do not call sparkR.session() - followed by implicit reference to sparkContext
doubled <- spark.lapply(1:10, function(x){2 * x})

Internal implementations of APIs like spark.lapply look up the sparkContext directly, so to account for these cases the sparkContext also needs to be friendly to lazy initialization.
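For example, the context could be lazily bound along these lines (a hypothetical sketch; sparkR.session() itself assigns ".sparkRjsc" as a side effect, and it is exactly that re-entrant lookup that needs the error to be tolerated):

# hypothetical sketch: lazy binding for the Java Spark context; the body
# creates the session, then returns the context that this created
delayedAssign(".sparkRjsc",
              { sparkR.session(...); get(".sparkRjsc", envir = SparkR:::.sparkREnv) },
              assign.env = SparkR:::.sparkREnv)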

@felixcheung
Member

I understand these 2 cases; can you explain how your change connects to them?
If you delayed-bind to ".sparkRjsc" in envir = .sparkREnv, doesn't it just work?

@vijoshi
Contributor Author

vijoshi commented Apr 23, 2017

"I understand these 2 cases, can you explain how your change connect to these two?"

Say, I do this:

delayedAssign(".sparkRsession", { sparkR.session(...) }, assign.env = SparkR:::.sparkREnv)

Now, when the user code such as this runs:

a <- createDataFrame(iris)

this sequence occurs:

createDataFrame()
  > getSparkSession()
    > get(".sparkRsession", envir = .sparkREnv)
      > delayed evaluation of sparkR.session(...)
        > if (exists(".sparkRsession", envir = .sparkREnv))
            sparkSession <- get(".sparkRsession", envir = .sparkREnv)  # error occurs here
              > Error: "promise already under evaluation"

The change is to ignore the "promise already under evaluation" error. At the line where the error occurs, there doesn't seem to be any other possible cause of failure, since the previous line has already checked that .sparkRsession exists in the environment. So if we take it that this error happens only when .sparkRsession is lazily bound and ignore it - which is what my change does - the code proceeds to compute the sparkSession normally.
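So the guarded lookup amounts to something like this (a sketch of the idea, not the literal patch):

# only the re-entrant promise error is swallowed; anything else propagates
sparkSession <- tryCatch(
  get(".sparkRsession", envir = .sparkREnv),
  error = function(e) {
    if (grepl("promise already under evaluation", conditionMessage(e))) NULL
    else stop(e)
  }
)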

The case with .sparkRjsc is similar. The SparkR code inside sparkR.sparkContext(...) does this:

    if (exists(".sparkRjsc", envir = .sparkREnv)) {
      sparkRjsc <- get(".sparkRjsc", envir = .sparkREnv) # "Promise under evaluation" error occurs here
    }

When .sparkRjsc is lazily bound, the exists(..) check succeeds and the "promise already under evaluation" error occurs on the get(). If the error is ignored - on the grounds that there can't be any other cause of failure at that point - the lazy initialization works.

@felixcheung
Member

felixcheung commented Apr 23, 2017

so essentially the get in the outer frame is still being evaluated when the second, inner-frame get is hit from the delayed binding (R raises the error as a way to prevent going into an infinite loop, really)

what if you have this instead to break the loop?

delayedAssign(".sparkRsession",
  { rm(".sparkRsession", envir = SparkR:::.sparkREnv); sparkR.session(...) },
  assign.env = SparkR:::.sparkREnv)
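The effect of the rm() can be seen in a self-contained toy (hypothetical names throughout):

e <- new.env()
delayedAssign("x", {
  rm("x", envir = e)            # break the self-reference first
  assign("x", 42, envir = e)    # roughly what sparkR.session() does internally
  get("x", envir = e)           # a re-entrant lookup now succeeds
}, assign.env = e)
e$x   # 42, with no "promise already under evaluation" error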

@vijoshi
Contributor Author

vijoshi commented Apr 23, 2017

Thanks, I tried this out - looks like doing a rm(".sparkRsession", envir = SparkR:::.sparkREnv) inside the promise is a way to prevent the infinite loop. If I need to set up an active binding for .sparkRsession etc., I can achieve that too, but I need to do the rm() twice: first to ensure we don't get into infinite recursion, and second to restore the active binding once SparkR has created and assigned a new session. Something like this:

sparkSessionFn <- local({
  function(v) {
    if (missing(v)) {
      # get SparkSession
      if (!exists("cachedSession", envir = .GlobalEnv)) {
        # rm first to ensure no infinite recursion
        rm(".sparkRsession", envir = SparkR:::.sparkREnv)

        cachedSession <<- sparkR.session(...)

        # rm again (sparkR.session() re-assigned the name), then restore
        # the active binding
        rm(".sparkRsession", envir = SparkR:::.sparkREnv)
        makeActiveBinding(".sparkRsession", sparkSessionFn, SparkR:::.sparkREnv)
      }
      # check and update runtime config on the session if needed
      get("cachedSession", envir = .GlobalEnv)
    } else {
      # set sparkSession
      cachedSession <<- v
    }
  }
})

makeActiveBinding(".sparkRsession", sparkSessionFn, SparkR:::.sparkREnv)
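With that in place, plain user code should trigger the binding transparently, e.g.:

# hypothetical usage: no explicit sparkR.session() call needed; the first
# access forces the active binding, which creates and caches the session
df <- createDataFrame(iris)
head(df)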


@felixcheung
Member

great - if you have a workaround, our preference would be not to change Spark for it.
.sparkREnv and its variables are not a public API that we support.

@vijoshi
Copy link
Contributor Author

vijoshi commented Apr 24, 2017

Thanks, I will close this PR.

@vijoshi vijoshi closed this Apr 24, 2017