EINTR & Lua signal handling #8
-
Steve Estes (from YottaDB) answered: There are few if any redundant/obvious questions when it comes to the difficult subject of Linux Signals. This is not a place to tread lightly as mistakes here can be problematic to figure out. And the threaded nature of some languages compounds this issue tremendously. I'll try to point you in the right direction. But first, I think I need to back up and understand some basics of what this is. I initially searched the internet for "MLua" and found an "mlua" project on github that turned out to be some sort of binding project for Lua and Rust. But I've since been informed you are working on what basically is a Lua wrapper for YottaDB via an M/Lua binding from a much different repository than 'mlua'. This makes understanding how Lua handles signals fairly important to answer questions like these and right now, I have none of that. So what I'll do for now is answer the immediate questions you pose as well as I can given my lack of Lua understanding. Before we start, I do have one question though - are Lua processes threaded or in some way of its own does it support concurrent operations within a process? This would determine which of the simple API models you are using - either the Simple API or the Simple Threaded API (the Simple Threaded API calls mostly end with _st or _t). So let's go through the methods and questions you mention:
As a side-note, the questions you pose are somewhat similar to some of the questions we had to answer and deal with in creating the Go wrapper. Go definitely had its own ideas on how to handle signals and they didn't square with YottaDB's. So what we did was turn the situation around and let Go handle all of the signals; Go then notifies YottaDB (via a call) when certain signals have fired so that YottaDB can drive the appropriate timers. This support is currently specific to Go, though it could probably be generalized fairly quickly, as I always thought it might be useful in handling other wrappers. Perhaps this method is overkill for your solution, but I thought to mention it as a potential option if everything else failed. It means some fairly complex code to handle all the signals, and it needs to be written by you (use the Go wrapper as your guide here, mostly init.go).
We started off with EINTR and ended up with signals. While they can be fairly intertwined, this is only true if one won't or can't make the standard be to NOT specify SA_RESTART when registering signal handlers. If you or other packages must specify SA_RESTART, then things get more complicated. I know of no way to handle EINTR errors other than changing the code around them to check for that error and do the right thing.
Hopefully this answer helps your understanding at least, even if there's not a solution here yet. If you need more help in this area, I would like to ask for a link to the relevant Lua materials. I don't want to take the time to totally immerse myself and learn Lua at this stage, but if you can point me to docs on how Lua works with signals, that may be enough.
-
Thanks for this detailed answer. Let me, in turn, answer some of your questions.
MLua is a Lua language plugin for the MUMPS (currently only ydb) database. It embeds Lua 'into' MUMPS, meaning that MUMPS can call/access Lua code and vice versa. It has nothing to do with Rust. (I was not even aware of the other github projects named mlua. Fascinating. Thanks for pointing this out!) It is closely integrated with the lua-yottadb wrapper (which I have taken over maintenance of). The difference between the two projects is that lua-yottadb enables Lua access to M, whereas MLua enables M access to Lua. MLua is the master project that incorporates lua-yottadb and thus enables both directions. So the "main program" in this case will not be Rust at all, which is unrelated, but rather a MUMPS application. The MUMPS program will be calling Lua to do some sort of task. Having said that, this question is also applicable to Lua applications that use lua-yottadb to invoke MUMPS.
Lua itself is single-threaded and MLua currently uses the non-threaded API. If a need arises for it to co-exist with other ydb plugins that are threaded, MLua may need to switch to calling the threaded API, but this will be driven by the third-party (C/Go) application's threading, not by Lua. Lua does have co-operative co-routines, but they do not create the issues that threads have. There are third-party libraries that add threads to Lua, but there is no need to use them and there is a warning in the MLua documentation about threading.
Implementing EINTR support for native Lua is possible, maybe even as a dynamic patch rather than a Lua rebuild (I have to check). This sounds nice, but has a major problem: Lua is very small and doesn't have many libraries built-in, so common functions (like sockets) are often provided by third-party libraries. Those third-party libraries do not support EINTR, so we'd have to modify them as well -- and then the question is which third-party libraries to support. Thus you can see why masking or SA_RESTARTing would be attractive as a more general solution.
I hear you about YDB not supporting SA_RESTART. However, can I modify my idea slightly to say that MLua can dynamically switch on the SA_RESTART flag for signals only while it is running Lua code, and then restore the original flags after returning to ydb? This would be identical to the sigmask method except that this way YDB would still run its signal service routine to keep track of how many interrupts it got, but it would have to process them after the return from Lua. Would this be helpful? If we used this kind of SA_RESTART mechanism, can you tell me whether YDB would be able to keep time better (say, if it keeps time by counting repeated alarm signals), or would this make no difference from the mask option? Obviously, YDB still wouldn't be able to follow through on the effect of the signal until Lua returned to YDB.
It's interesting to know about the calls that Go uses to drive the YDB timers. Is this documented anywhere, or is it only available as code in the Go wrapper?
-
[Steve] Ah OK - so an M main program is calling Lua modules using external calls I assume?
[Steve] This response is for whichever product it applies to. Go's 'goroutines' are lightweight semi-threads that are supposed to avoid thread complications, but the devil is in the details. What must be prevented at all costs is two of these cooperative routines having made calls into the YottaDB runtime at the same time. If you can programmatically make sure that does not happen, you can continue to use the unthreaded forms of things. But if that cannot be guaranteed, then the threaded versions of the SimpleAPI calls must be used, as these acquire an engine lock when they run that keeps all others out. At this time you don't need to worry about those, though: if your routine is started as an M routine, this currently disables threaded access.
[Steve] Who would be setting up these handlers? When YottaDB initializes, it sets handlers for most things. For the suspend signals, the fatal signals, and SIGALRM it (always) sets YDB-specific handlers that are defined without SA_RESTART. For any other signal, if the existing signal handler is neither SIG_DFL nor SIG_IGN, then that handler is left alone (unless we are in Go, where SIGUSR2 gets used). But we do install handlers for the fatal signals and the suspend/resume type signals (SIGCHLD, SIGTSTP, SIGTTIN, SIGTTOU, SIGCONT, SIGUSR2). For most of those signals, when we take them over, we also save the previous handler. When the signal occurs, it drives our handler first; then, if we return from that, it drives the previous handler too.
[Steve] Saving/restoring the SA_RESTART flags on the handlers for each call to Lua sounds like it would be fairly expensive, as that's a system call for each signal to save it or restore it. Strictly from a performance standpoint, it would be much cheaper to mask the signals off on a call to Lua and restore the mask with a single system call on the way back. But it all depends on what signals you would be blocking. I would highly recommend not blocking any of the fatal or suspend/resume type of signals. Those need to get through as fast as possible. I would have to think about this one a bit before deciding if it could work. I can think of ways it could run into trouble, but with everything single-threaded, I'm not sure any of those situations apply. If threading is involved, though, it is probably a can of worms.
[Steve] YottaDB effectively ignores multiple signals seen at the same time. It doesn't affect anything one way or the other if duplicate signals are lost (if they are even kept by the OS).
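The remark about duplicate signals can be checked with a small experiment. The sketch below (hypothetical helper names, assuming Linux behaviour for standard, non-real-time signals) blocks SIGUSR1, raises it several times, and counts how often the handler runs once the signal is unblocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t deliveries = 0;

static void count_signal(int sig) { (void)sig; deliveries++; }

/* Raise SIGUSR1 `times` times while it is blocked, then unblock it and
 * return how many times the handler actually ran (-1 on setup failure). */
static int deliveries_after_blocked_burst(int times)
{
    struct sigaction sa;
    sa.sa_handler = count_signal;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    deliveries = 0;
    for (int i = 0; i < times; i++)
        raise(SIGUSR1);          /* each one only marks SIGUSR1 as pending */

    /* Restoring the mask delivers the single pending instance. */
    if (pthread_sigmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;
    return deliveries;
}
```

On Linux a blocked standard signal is recorded as pending only once, so a burst of three collapses into one delivery; only real-time signals are queued individually.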
[Steve] The calls used to handle signals are not documented as they require changes to YDB C source code to support additional languages so they are only documented through their use in the Go wrapper at this time.
-
Yes.
Correct me if I'm wrong, but I believe this means the key issue is whether the threading system is preemptive (i.e. whether a routine can be interrupted by the operating system and switched to another thread). Lua co-routines are not preemptive at all: Lua uses only one OS thread. Therefore Lua co-routines cannot interrupt YDB while it is running, and so they cannot make two simultaneous calls into YDB. Goroutines, on the other hand, DO allow OS preemption (they use OS threads under certain conditions), which means they are not safe for YDB.
YDB would set up the handlers: Lua does not use signals. However, my proposal is that MLua switches on and off SA_RESTART dynamically on the handler that YDB has already set up. My specific question here is: would this be better than the sigmask method for the reasons stated, or would it provide no benefit?
True. So what I'm thinking of here is something like using SA_RESTART on SIGALRM alone, and sigmask OFF all other signals (except fatal ones) during the call to Lua. That would mean only 2 system calls. But if SA_RESTART isn't going to help YDB any more than pure masking, then no point in adding SA_RESTART.
re fatal signals, I agree: no reason to block them. But Suspend/resume signals? Why do you think they would be speed critical? It seems that if a process allows itself to be suspended, it's not speed critical. Resumption might be more of an issue, but I'm having trouble envisaging a use-case.
For now, we only have to solve the single-threaded problem. If we ever switch to using the threaded YDB calls in future, it will be purely because the M app also needs to call a third-party plugin that is threaded (and YDB can't have an app use both single-threaded and threaded calls). If that happens, YDB will protect against its own preemption using the lock mechanism built into its threaded calls.
Ok, perhaps this answers my question: that SA_RESTART is no improvement over sigmask. But let me make sure. Could there be any other reason YDB might benefit from running its SIGALRM service routine immediately (as it can with SA_RESTART) as opposed to running it when the slow-I/O Lua call finally returns (the sigmask method)?
Summary
It sounds like these are my choices:
Why would 4 be better than 3? Can't YDB do whatever handling it needs to in the signal service routine? I'm guessing the problem is that YDB is not re-entrant, so it can't necessarily do what it needs to do in the service routine. But then wouldn't the Go wrapper have the same YDB re-entrancy problem when calling YDB to handle that timer? There's no way YDB could have finished its current task, since Go retried the EINTR and hasn't yet returned from its long-running I/O operation. Thanks again for your answers; the discussion is helpful.
-
Steve replied: You are correct. There is no issue if Lua co-routines are not pre-emptive.
[Steve] At this level (my necessarily very high level view), I don't see a difference between the two except for performance. SA_RESTART is likely to be more expensive.
[Steve] Some of the suspend signals are sent when a process tries to do a read or a write but cannot because it has been backgrounded or for some other reason. In that case, the OS sends a SIGTTIN or SIGTTOU as appropriate to give YottaDB a chance to do something before the process is put into a wait. There's some housekeeping that YottaDB does before it then sends itself the SIGSTOP to pause the process. I do not know the ramifications of not allowing YottaDB to handle these signals in a timely way.
[Steve] I think the signal block method is going to be not only the best (in terms of performance) but also the safest method. But what do you mean by the "slow-I/O Lua call"? Lua is going to be doing IO? Hopefully nothing in Lua needs SIGALRM. Have you ever tried running Lua with SIGALRM blocked?
-
Ok, it sounds like we're agreed that sigmask is the best solution for MLua, because M will mostly call Lua for short tasks and the programmer should expect to keep tasks short -- I will put a note in the manual warning the programmer that servicing dbase signals and interrupts will be stalled during MLua calls. But for the lua-yottadb wrapper (which I'm also working on), the story may be quite different. If I use the sigmask method for lua-yottadb as well, the dbase, SIGALRM, etc., will stall until Lua next calls the M API (perhaps?) or calls an M routine (which my newest lua-yottadb now supports). Can you talk a little bit about the fallout if we use sigmask in this scenario? Let me propose, as a foil, that it should be fine since the Lua program can't really expect ydb to be doing anything until it calls ydb.
I'm using the term "slow" IO as it's discussed in Yes, I'm certain that Lua itself doesn't need SIGALRM: I've searched the source and it does nothing with signals at all except to set a Ctrl-C handler. This will only be true until the user uses a third-party signal library, of course.
The confusion is that although my initial question is about MLua, which is not a wrapper, I am also updating lua-yottadb, which is a wrapper.
-
Steve interacts:
[Steve] So long as no timer lasts longer than the return back to Lua, it should be fine. The only such timers I'm aware of right now are the TP timeout timer ($ZMAXTPTIME), where one sets a timeout which pops when the timer is exhausted if it isn't cancelled before the TP transaction commits, and $ZTIMEOUT, which is not associated with TP or anything else; it just sets a timer that pops when it expires and drives M code. I think all other timeouts occur (if they occur at all) within the same call they would be set in. Note that these are both "well behaved" timers that set flags and are only looked at during safe points in the code, like the beginning of an M line of code, so they were always going to wait for a return to M space anyway.
[Steve] Actually, I just remembered one other timer that it will interfere with. Once data has been written, there is a flush timer that pops about every second (per region) and does some IO (flushes dirty buffers) that will be affected by the time spent in Lua/MLua with its disabled SIGALRMs. So long as it doesn't get too far behind, it should be OK. But keep a watch on the number of dirty buffers to make sure all buffers aren't dirty, which would make a read wait until some buffers were flushed.
-
I'm getting close to having everything I need for the warnings in the docs. Just a bit more clarification on this one:
Suppose a lua-yottadb app calls M, and M returns. Now the Lua app has what it needs and will spend the next 10 minutes without returning to M. It would make sense to flush YDB IO buffers at this point (especially write buffers!). Is there some API or other means that lua-yottadb can use to force YDB to flush buffers? What about YDB database changes -- will they also need flushing? Am I right in thinking that calling M command I'm having trouble determining the exact overlap between the various VIEW flush commands (DBFLUSH, DBSYNC, EPOCH, FLUSH). I'm guessing that FLUSH does all of them, including an fsync, is that right?
Are we talking about PEEKBYNAME? From the manual:
-
I'm editing out some email interaction here, but Steve's answers to the question above, come back later as follows:
[Steve] I don’t know that your questions apply anymore as we really don’t want to get you involved in buffer flushes and having to watch the free blocks but I’ll answer them anyway:
-
Steve's summary of the discussion so far: OK, that's great! We'll work on the "easier one" first (MLua)! I do want to pursue separate tracks for dealing with these two products as I think that while the questions may be the same for each of the two products, the answers may be different because of the difference in perspective (MLua - Spends most of its time in M, calls into Lua for short periods / lua-yottadb - Spends most of its time in Lua with sporadic calls into M). Also, please be sure to copy support on all email back to me/us so everyone knows what the situation is and someone can step in and respond if I'm not available. I’ve had a chance to sit back and think about these and have some internal discussions so let's back up a bit and summarize where we are - I think we went down a few rabbit holes in our previous wide-ranging discussions when I didn't understand the needs of the two products and where they fit into things. So I think we need to pull back and start at the top again with a more precise focus. First off, I’d like to talk about testing. I assume you have tests for both of these products? Are these tests things you can share with us? It would be good to have some tests that can be added to our test system so when we run regression tests (every weekend across several systems), the Lua wrapper gets a good workout. We’d actually like to incorporate the lua-yottadb wrapper into our own tests and get them to the level our other wrappers are – we have similar tests we write for each language wrapper supported. It may be that wrapper changes need to be made to get there as we do some significant testing using signals. Once MLua is complete, we could add its tests in as well. That gives us an early warning if something we change affects a wrapper or some other form of plug-in unintentionally. Next, lets talk about MLua:
Now let’s talk about lua-yottadb:
-
Hi Steve. Thanks for the clear email summarising our discussion, and the distinctions between the various VIEW "FLUSH" commands. I understand and agree with you that lua-yottadb has a different perspective since one would think it spends most of its time in Lua.
MLua
Yes, we have unit tests which you can run yourself with I have implemented MLua handling of signals using option 3 as we discussed. Blocking during Lua code is on by default but you can switch it off and handle EINTR errors yourself, if you wish. The list of blocked signals is in mlua.c:
For MLua I think it best not to handle (or block) SIGINT -- retaining the YDB ^C handler suits this scenario better. Please take the time to read my updated 'Signals' section of the README and comment on the PR if you see issues. You say you are now starting to think that the SA_RESTART option might be better, at least for SIGALRM. [Edit] What specific task might this help with? Flushing? Does the YDB signal handler actually perform the flush, or does it simply defer the flush until the current system call returns? If the latter, then the SA_RESTART method will have no benefit, because Lua will stay in its long-running system call and not return to YDB.
Lua-yottadb
I have also implemented lua-yottadb. You can also clone this functionality from my branch and from the associated PR #22 on upstream. I have not written a unit test for lua-yottadb yet, but it will be similar to the I have added reference docs on the relevant functions yottadb.require() and ydb_signals(). But I have not yet updated the README.md (though it will be similar to the MLua README linked above). Let me know if it looks good to you.
-
[Steve] The point of SA_RESTART is that the handler runs immediately. Now that said, buffer flushes can be deferred for a number of reasons but most probably, they will run immediately.
-
Excellent. Then it is worth implementing this. Thanks for the clarification.
-
I will record below a discussion I had with Steve from YottaDB on this topic on 10 March 2023.
I am looking into the best way to make MLua handle EINTR errors from signal interruptions. There appears to be more than one way to solve this, and I have some questions about which ones are permissible. I'm rather new to the details of Linux signals, so I've been reading up, but forgive me if these questions are a little obvious.
YDB recommends that the C application wrap all relevant system calls to re-try when EINTR is returned. This should be possible for the Lua core since it uses a few of the affected system calls. However, that will not protect user-imported modules like the socket library. Hence, and to avoid code changes internal to Lua itself, I'm looking into alternative ways.
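For reference, the "wrap the system call and retry on EINTR" approach looks roughly like this in C. `read_retry_eintr` is a hypothetical helper name, not something provided by Lua or YottaDB; the same loop shape would apply to `write()`, `accept()` and similar calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Retry read() whenever a signal handler that was installed without
 * SA_RESTART interrupts it. Any other error is returned to the caller. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count)
{
    ssize_t n;
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

The drawback raised in this thread still applies: every call site, including those inside third-party modules, would need the same treatment.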
Simplest option: Can I simply block the signals (e.g. SIGALRM) using pthread_sigmask() while running Lua code? They will be held pending and will thus be delivered when I return from Lua code. Or does ydb depend on counting how many were missed? In which case, ydb could have used a real-time signal to achieve this, since real-time signals queue all missed interrupts.
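A minimal sketch of this masking option, in C, might look like the following. The names are hypothetical (`call_with_sigalrm_blocked` is not the MLua function), `fake_lua_call` merely stands in for the call into Lua, and only SIGALRM is blocked to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t alarm_seen = 0;

static void on_alarm(int sig) { (void)sig; alarm_seen = 1; }

/* Run fn(arg) with SIGALRM blocked in this thread, then restore the
 * previous mask. Returns 0 on success, otherwise an error number. */
static int call_with_sigalrm_blocked(void (*fn)(void *), void *arg)
{
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);

    int rc = pthread_sigmask(SIG_BLOCK, &block, &saved);
    if (rc != 0)
        return rc;

    fn(arg);    /* stands in for the call into Lua */

    /* Restoring the mask delivers a SIGALRM that became pending meanwhile. */
    return pthread_sigmask(SIG_SETMASK, &saved, NULL);
}

/* Stand-in for Lua code: SIGALRM arrives while we are "inside Lua". */
static void fake_lua_call(void *arg)
{
    raise(SIGALRM);
    *(int *)arg = alarm_seen;   /* records that the handler has not run yet */
}

/* Returns 1 if the handler ran only after the mask was restored. */
static int demo_deferred_delivery(void)
{
    struct sigaction sa;
    sa.sa_handler = on_alarm;
    sa.sa_flags = 0;            /* no SA_RESTART, as in the discussion */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return 0;

    int seen_inside = -1;
    alarm_seen = 0;
    if (call_with_sigalrm_blocked(fake_lua_call, &seen_inside) != 0)
        return 0;
    return seen_inside == 0 && alarm_seen == 1;
}
```

Because the signal never interrupts the code running under the mask, no system call made there can fail with EINTR on its account; the cost is that the handler is delayed until the mask is restored.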
Another option: It seems to me that if YDB were to set the SA_RESTART flag when it sets up the signals, this would resolve the problem. Is there a particular reason that YDB does not set this flag? Specifically, would there be a problem if I made MLua switch on SA_RESTART for all the signals that ydb uses? This can be done dynamically using sigaction(). I see that resty-cli solved this problem this way here.