Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R crash on Redis server loss #20

Closed
richfitz opened this issue Nov 23, 2015 · 21 comments
Closed

R crash on Redis server loss #20

richfitz opened this issue Nov 23, 2015 · 21 comments

Comments

@richfitz
Copy link

Minimal, contrived, example:

system("redis-server --port 7777", wait=FALSE)
r <- new(RcppRedis::Redis, "localhost", 7777)
r$ping()
r$exec("SHUTDOWN")

will crash R with

> r$exec("SHUTDOWN")
[15435] 23 Nov 10:54:23.121 # User requested shutdown...
[15435] 23 Nov 10:54:23.122 # Redis is now ready to exit, bye bye...

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x2fab870>,     dll = list(name = "Rcpp", path = "/home/rich/lib/R/library/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x35d8390>,         info = <pointer: 0x7fc132594c40>), numParameters = -1L),     <pointer: 0x271b430>, <pointer: 0x362b4a0>, .pointer, ...)
 2: r$exec("SHUTDOWN")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Seen on various flavours of OSX and Linux. The gdb backtrace is:

(gdb) bt
#0  Redis::extract_reply (this=0xef74a0, reply=0x0) at Redis.cpp:76
#1  0x00007ffff2d54884 in Redis::exec (this=0xef74a0, cmd=...) at Redis.cpp:157
#2  0x00007ffff2d531e3 in Rcpp::CppMethod1<Redis, SEXPREC*, std::string>::operator() (this=this@entry=0x1d6dcd0, object=0xef74a0, args=args@entry=0x7fffffffbde0)
    at /home/rich/lib/R/library/Rcpp/include/Rcpp/module/Module_generated_CppMethod.h:111
#3  0x00007ffff2d5b25f in Rcpp::class_<Redis>::invoke_notvoid (this=<optimised out>, method_xp=<optimised out>, object=0x2007a60, args=0x7fffffffbde0, 
    nargs=<optimised out>) at /home/rich/lib/R/library/Rcpp/include/Rcpp/module/class.h:234
#4  0x00007ffff348b1ec in CppMethod__invoke_notvoid (args=<optimised out>) at Module.cpp:220
#5  0x00007ffff78f1161 in ?? () from /usr/lib/R/lib/libR.so
#6  0x00007ffff79369db in Rf_eval () from /usr/lib/R/lib/libR.so
#7  0x00007ffff7938b60 in ?? () from /usr/lib/R/lib/libR.so
#8  0x00007ffff79367e3 in Rf_eval () from /usr/lib/R/lib/libR.so
#9  0x00007ffff7937b6f in Rf_applyClosure () from /usr/lib/R/lib/libR.so
#10 0x00007ffff79365bf in Rf_eval () from /usr/lib/R/lib/libR.so
#11 0x00007ffff795da32 in Rf_ReplIteration () from /usr/lib/R/lib/libR.so
#12 0x00007ffff795dd81 in ?? () from /usr/lib/R/lib/libR.so
#13 0x00007ffff795de34 in run_Rmainloop () from /usr/lib/R/lib/libR.so
#14 0x00000000004007eb in main ()
#15 0x00007ffff7261ec5 in __libc_start_main (main=0x4007d0 <main>, argc=1, argv=0x7fffffffde78, init=<optimised out>, fini=<optimised out>, 
    rtld_fini=<optimised out>, stack_end=0x7fffffffde68) at libc-start.c:287
#16 0x000000000040081b in _start ()

A fix would be to add something like:

  if (reply == NULL) {
    error("Failure communicating with the Redis server");
  }

after every redisCommand / redisCommandArgv in src/Redis.cpp

While the above is through Redis server shutdown the same thing happens if the connection is lost (e.g. client and server on different machines on different hosts and a loss of the link between). I have confirmed this with toxiproxy.

@eddelbuettel
Copy link
Owner

Not sure how much I really care. Redis is a generally always up quality and when it is down access from R may be the least worry.

Pull requests welcome. I am unlikely to do this myself.

@russellpierce
Copy link
Sponsor Contributor

I might take this one on; I'm loathe to see an entire R session crash because of an intermittent network error. But, I suppose if I did take it on we should also add some of the benchmarking code to the testing framework - just to be sure that none of the changes result in a notable performance hit.

@eddelbuettel
Copy link
Owner

Easy -- prefix each user-space command with a ping :)

We would at least one line to each function emitting a request to Redis. Worth it?

@russellpierce
Copy link
Sponsor Contributor

Well, the ping will tell you that it is up at the time of the ping, but not that it was actually up/connected correctly at the time of your request. It also adds an additional round-trip to any transaction. Clearly NVG. Instead, I was thinking instead of wrapping redisCommand and redisCommandArgv with a safety check that a NULL wasn't received as the reply. It would mean refactoring the code to use the NULL safe versions of those functions.

I quickly banged out the following:

    void nullReplyCheck(redisReply *reply) {
        if (reply == NULL) {
            Rcpp::stop("Recieved NULL reply; potential connection loss with Redis");
        }
    }

    void *redisCommandNULLSafe(redisContext *c, const char *format, ...) {
        va_list myargs;
        va_start(myargs, format);

        redisReply *reply = 
            static_cast<redisReply*>(redisvCommand(c, format, myargs));
        nullReplyCheck(reply);
        return reply;
    }

    void *redisCommandArgvNULLSafe(redisContext *c, int argc, const char **argv, const size_t *argvlen) {
        redisReply *reply = 
            static_cast<redisReply*>(redisCommandArgv(c, argc, argv, argvlen));
        nullReplyCheck(reply);
        return reply;
    }

It seems to pass tests and survives the described example. What I don't know is if it would recover from a transitory loss of the connection. I haven't played around with toxiproxy yet, maybe @richfitz could do so and let us know if this solution saves us from anything other than the test example.

@eddelbuettel
Copy link
Owner

Nice. I had the same idea but was in a rush and didn't see / remember redisCommand ...

@russellpierce
Copy link
Sponsor Contributor

I stupidly have been making my edits to RcppRedis on my master branch. Once we've resolved the pending pull request, then I'll submit this one for consideration.

@russellpierce
Copy link
Sponsor Contributor

Replicated the test as described here https://github.com/richfitz/toxiproxyr. Confirmed that the proposed code works under that case as well.

@eddelbuettel
Copy link
Owner

I stupidly have been making my edits to RcppRedis on my master branch.

We've all been there. Maybe save the files by hand and do a git reset ? It would be cleaner.

@russellpierce
Copy link
Sponsor Contributor

@eddelbuettel You can probably close this issue.

@eddelbuettel
Copy link
Owner

Just to be plain: because your PR addressed it?

@russellpierce
Copy link
Sponsor Contributor

Yes; I believe this issue is fully addressed by #23. It might be nice if @richfitz would confirm; but I feel reasonably certain in my testing results.

@russellpierce
Copy link
Sponsor Contributor

I actually was able to experience this one in the wild while torturing a Redis server. The modifications ended up working as expected. Though, it got me wondering about whether trying to automatically rebuild the connection at least once would make sense before restoring to throwing an error.

@eddelbuettel
Copy link
Owner

One could. But then: try once? Twice? N+1 times? At some point you gotta error out.

@russellpierce
Copy link
Sponsor Contributor

True, and I kind of suspect that retrying the connection wouldn't have solved anything in the case I experienced; although it might protect users who incorrectly use a fork of the connection argument. In the latter vein, I was thinking of defaulting to one attempt after a 1 second delay, but perhaps adding yet another configurable parameter to init.

EDIT: Actually a hard limit of 1 would be easiest. Promising more than one would require thinking carefully about how to handle errors when making the initial connection / writing out separate reconnection code.

@eddelbuettel
Copy link
Owner

I can live with trying once. The server may just be away 'temporarily' (even for an upgrade). It would be a nice to have, but I don't think it is a priority.

(In general the edge-case handling in RcppRedis is of course rudimentary. We make day. "One day" @richfitz will gift us a real package on CRAN and then we all switch anyway :) Until then, hack mode...)

@russellpierce
Copy link
Sponsor Contributor

In addition to not being on CRAN one of my reservations about redux was not knowing how it would do in speed compared to RcppRedis especially on serialization/deserialization; for some reason I ended up with the misconception that was happening using methods similar to rredis. At any rate, I was pleasantly surprised, redux compares favorably to RcppRedis.

@eddelbuettel
Copy link
Owner

Nice benchmarks! And thumbs up for empirics. I only glanced (about to head out) but you also included serialization cost, right?

@russellpierce
Copy link
Sponsor Contributor

I believe I did account for the serialization costs, but I always welcome peer review.

@bwlewis
Copy link
Contributor

bwlewis commented Dec 13, 2015

I'm sorry that this comment is off-topic from the original issue posting.

Note that the first example serializes the object 1000 times in the
hiredis and rredis loops, but only once in the redux benchmark loop.

But to be fair I don't think that matters too much. I was a bit surprised
that rredis was ~ 10x slower here, so I made some easy performance optimizations to the rredis github master branch. For comparison, I now get (on my desktop Ubuntu quad-core PC running R-3.2.3):

       test replications elapsed relative
1 hiredis()         1000   0.110    1.000
2  rredis()         1000   0.281    2.555

for the file adapted from http://rpubs.com/rpierce/simpleBenchmarkRedux (and slightly faster yet if you uncomment the pipelining lines):

library(RcppRedis)
library(rredis)
library(rbenchmark)

data(trees)
fit <- lm(log(Volume) ~ log(Girth) + log(Height), data=trees)

redis <- new(Redis)
hiredis <- function() redis$set("fits", fit)
rredis <- function() redisSet("fits2", fit)
redisConnect()

#redisSetPipeline(TRUE)
res <- benchmark(hiredis(), rredis(), replications=1000)[,1:4]
#redisSetPipeline(FALSE)
print(res)

The rest of the ~ 2x difference with RcppRedis is not easy however. Much of it is in tryCatch for things like interrupts of transfers (important in many real-world settings), other problems (too large of payloads), etc.

Again, sorry for the diversion.

On 12/11/15, drknexus notifications@github.com wrote:

I believe I did account for the serialization costs, but I always welcome
peer review.


Reply to this email directly or view it on GitHub:
#20 (comment)

@russellpierce
Copy link
Sponsor Contributor

A welcome diversion; I'm not sure there is a good forum functionality in which we could have discussed this otherwise.

@bwlewis Are you sure the first example serializes the object 1000 times in the hiredis and rredis loops, but only once in the redux benchmark loop? I'm not seeing it. The difference is that both rredis and RcppRedis implicitly do the serialization whereas with redux you have to manually handle it to coerce the input to $SET.

Regardless, I'm thrilled to hear that you found some places for performance enhancement in rredis.

@eddelbuettel
Copy link
Owner

Closing per request earlier in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants