
[SPARK-6832][SPARKR][WIP] Handle partial reads in SparkR #14741

Closed
wants to merge 1 commit into from

Conversation

krishnakalyan3
Member

@krishnakalyan3 krishnakalyan3 commented Aug 21, 2016

What changes were proposed in this pull request?

Handle partial reads in SparkR by implementing a retry method in R that will return partial results.
Reference: amplab-extras/SparkR-pkg#193 (comment)

How was this patch tested?

Locally by running the R test suite.


@AmplabJenkins

Can one of the admins verify this patch?

@krishnakalyan3
Member Author

@shivaram I am not sure how to go about the retry method. Could you please share an example that I could refer to?

@krishnakalyan3
Member Author

@shivaram @davies

  • The signature of the readBin function is readBin(con, what, n, as.integer(size), endian).
    What should the value of what be when the process is interrupted in the retry method?
  • I am also having problems simulating this issue. As soon as I send kill -9 PID, my R session restarts.
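For reference, here is a minimal sketch of readBin against an in-memory connection, matching the signature discussed above (the sample value 42 and the use of rawConnection are illustrative, not from the SparkR code):

```r
# Write one big-endian 4-byte integer into a raw buffer, then read it
# back with readBin(con, what, n, size, endian).
buf <- writeBin(42L, raw(), size = 4L, endian = "big")
con <- rawConnection(buf, open = "rb")
value <- readBin(con, what = "integer", n = 1L, size = 4L, endian = "big")
close(con)
value  # 42
```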

@shivaram
Contributor

The value of what shouldn't change if we are retrying the read. If it was integer it should remain integer. There is a corner case of what happens if we, say, read the first byte of an integer and not the remaining 3 bytes; one way to handle that would be to read everything into a raw byte stream and do the cast to integer later. But I think we can postpone this to a later PR.

Regarding your earlier question, the easiest way to retry is to run a while loop around readBin while there are more bytes to read. Also we can have a fixed upper bound and throw an error after that many retries.
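To make both ideas concrete, here is a hedged sketch (using an in-memory rawConnection as a stand-in for the real socket) that loops until all the expected bytes arrive and casts to integer only at the end:

```r
# Read n*size raw bytes in a loop (retrying on short reads), then cast
# the accumulated bytes to integers in one shot.
con <- rawConnection(writeBin(c(7L, 8L, 9L), raw(), size = 4L, endian = "big"),
                     open = "rb")
needed <- 3L * 4L  # 3 big-endian 4-byte integers
bytes <- raw(0)
while (length(bytes) < needed) {
  chunk <- readBin(con, "raw", n = needed - length(bytes))
  if (length(chunk) == 0) break  # a real helper would retry or error here
  bytes <- c(bytes, chunk)
}
close(con)
values <- readBin(bytes, "integer", n = 3L, size = 4L, endian = "big")
values  # 7 8 9
```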

Also I don't think SIGKILL is the appropriate signal to send; you might want to send SIGUSR1 (the code is 10 according to http://linuxandfriends.com/linux-signals/)

@krishnakalyan3
Member Author

@shivaram thanks for the advice.

Some issues I am facing:

  • While reading a large file from RStudio, I tried to interrupt the process (pid obtained via Sys.getpid()) using pskill(pid, signal = SIGUSR1/SIGCHLD). This does not seem to affect my R session and does not print "Interrupt" (as per my code below).
readBinFully <- function(con, what, n = 1L, size = NA_integer_, endian) {
  data <- NULL
  while (n > 0) {
    chunk <- readBin(con, what, n, size, endian = "big")
    if (length(chunk) == 0) {
      cat("Interrupt")
      break
    }
    data <- c(data, chunk)
    n <- n - length(chunk)
  }
  data
}
  • The sparkr.zip obtained after running install-dev.sh does not seem to reflect the changes in my R session (restarting RStudio solves this). Code below:
rm(list=ls())
Sys.setenv(SPARK_HOME="/Users/krishna/Experiment/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
  • To check if there are more bytes to read, I have tried the code below, which fails the tests in run-tests.sh:
while (size > 0) {
...
}

I see that the variable size takes the value NA.
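That is expected R behaviour when size defaults to NA_integer_: a comparison with NA is itself NA, which a while condition cannot evaluate. A small sketch reproducing it:

```r
# Comparing NA_integer_ with anything yields NA, not TRUE/FALSE.
size <- NA_integer_
print(NA_integer_ > 0)  # NA

# Using that NA as a loop condition raises an error.
msg <- tryCatch({ while (size > 0) break; "no error" },
                error = function(e) conditionMessage(e))
print(msg)  # "missing value where TRUE/FALSE needed"
```

Looping on the remaining element count n (decremented after each successful read), rather than on size, avoids the problem.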

Please advise on how I should approach these issues.

Thanks

@krishnakalyan3
Member Author

ping @shivaram @davies, I am planning to revisit this PR.
Could you please let me know which daemon process on Linux we are trying to interrupt? I am assuming it's the R process.

@shivaram
Contributor

Yes - we are trying to interrupt the daemon R process. This is launched on the executor machines and communicates with the worker R processes and the JVM.
