Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coroutines don't work in Hammerspon #2306

Closed
asmagill opened this issue Feb 18, 2020 · 22 comments
Closed

Coroutines don't work in Hammerspon #2306

asmagill opened this issue Feb 18, 2020 · 22 comments

Comments

@asmagill
Copy link
Member

asmagill commented Feb 18, 2020

At all.

With the following from a coroutine tutorial (https://riptutorial.com/lua/example/11751/create-and-use-a-coroutine):

thread2 = coroutine.create(function()
        for n = 1, 5 do
            print("honk "..n)
            coroutine.yield()
        end
    end)

as soon as I enter coroutine.resume(thread2), I get:

false	NSInvalidArgumentException: NSConcreteAttributedString initWithString:: nil value
(
	0   CoreFoundation                      0x00007fff4bcb7acd __exceptionPreprocess + 256
	1   libobjc.A.dylib                     0x00007fff763b9a17 objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff4bcb78ff +[NSException raise:format:] + 201
	3   Foundation                          0x00007fff4dec59b6 -[NSConcreteAttributedString initWithString:] + 222
	4   Foundation                          0x00007fff4dee2392 -[NSConcreteAttributedString initWithString:attributes:] + 27
	5   internal.so                         0x0000000115fa369e console_printStyledText + 3198
	6   LuaSkin                             0x000000010de00107 luaD_precall + 5239
	7   LuaSkin                             0x000000010dde63fd luaV_execute + 142605
	8   LuaSkin                             0x000000010de06f19 resume + 1065
	9   LuaSkin                             0x000000010dcd91ac luai_objcttry + 428
	10  LuaSkin                             0x000000010ddf8ed3 luaD_rawrunprotected + 1155
	11  LuaSkin                             0x000000010de056ce lua_resume + 4334
	12  LuaSkin                             0x000000010dd7231a auxresume + 186
	13  LuaSkin                             0x000000010dd71ce9 luaB_coresume + 121
	14  LuaSkin                             0x000000010de00107 luaD_precall + 5239
	15  LuaSkin                             0x000000010dde6a57 luaV_execute + 144231
	16  LuaSkin                             0x000000010de0415c luaD_call + 236
	17  LuaSkin                             0x000000010de04532 luaD_callnoyield + 194
	18  LuaSkin                             0x000000010dd9d86a f_call + 410
	19  LuaSkin                             0x000000010dcd91ac luai_objcttry + 428
	20  LuaSkin                             0x000000010ddf8ed3 luaD_rawrunprotected + 1155
	21  LuaSkin                             0x000000010de0a031 luaD_pcall + 1089
	22  LuaSkin                             0x000000010dd9c0e8 lua_pcallk + 4872
	23  LuaSkin                             0x000000010ddf5ffd luaB_xpcall + 189
	24  LuaSkin                             0x000000010de00107 luaD_precall + 5239
	25  LuaSkin                             0x000000010dde63fd luaV_execute + 142605
	26  LuaSkin                             0x000000010de0415c luaD_call + 236
	27  LuaSkin                             0x000000010de04532 luaD_callnoyield + 194
	28  LuaSkin                             0x000000010dd9d86a f_call + 410
	29  LuaSkin                             0x000000010dcd91ac luai_objcttry + 428
	30  LuaSkin                             0x000000010ddf8ed3 luaD_rawrunprotected + 1155
	31  LuaSkin                             0x000000010de0a031 luaD_pcall + 1089
	32  LuaSkin                             0x000000010dd9c0e8 lua_pcallk + 4872
	33  LuaSkin                             0x000000010dca4de5 -[LuaSkin protectedCallAndTraceback:nresults:] + 709
	34  Hammerspoon                         0x000000010d63d21b MJLuaRunString + 1147
	35  Hammerspoon                         0x000000010d631889 -[MJConsoleWindowController run:] + 345
	36  Hammerspoon                         0x000000010d631d26 -[MJConsoleWindowController tryMessage:] + 1014
	37  AppKit                              0x00007fff494e8644 -[NSApplication(NSResponder) sendAction:to:from:] + 312
	38  AppKit                              0x00007fff49552992 -[NSControl sendAction:to:] + 86
	39  AppKit                              0x00007fff494e468f -[NSTextField textDidEndEditing:] + 924
	40  Hammerspoon                         0x000000010d636288 -[HSGrowingTextField textDidEndEditing:] + 584
	41  CoreFoundation                      0x00007fff4bc64346 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 12
	42  CoreFoundation                      0x00007fff4bc642c0 ___CFXRegistrationPost_block_invoke + 63
	43  CoreFoundation                      0x00007fff4bc6422a _CFXRegistrationPost + 404
	44  CoreFoundation                      0x00007fff4bc6c6d8 ___CFXNotificationPost_block_invoke + 87
	45  CoreFoundation                      0x00007fff4bbd5064 -[_CFXNotificationRegistrar find:object:observer:enumerator:] + 1642
	46  CoreFoundation                      0x00007fff4bbd4417 _CFXNotificationPost + 732
	47  Foundation                          0x00007fff4de5ba7b -[NSNotificationCenter postNotificationName:object:userInfo:] + 66
	48  AppKit                              0x00007fff495f4efb -[NSTextView(NSPrivate) _giveUpFirstResponder:] + 415
	49  AppKit                              0x00007fff49521ca6 -[NSTextView doCommandBySelector:] + 194
	50  AppKit                              0x00007fff49521bba -[NSTextInputContext(NSInputContext_WithCompletion) doCommandBySelector:completionHandler:] + 228
	51  AppKit                              0x00007fff49516e1a -[NSKeyBindingManager(NSKeyBindingManager_MultiClients) interpretEventAsCommand:forClient:] + 2972
	52  AppKit                              0x00007fff49be2fdf __84-[NSTextInputContext _handleEvent:options:allowingSyntheticEvent:completionHandler:]_block_invoke_5 + 341
	53  AppKit                              0x00007fff49be2e75 __84-[NSTextInputContext _handleEvent:options:allowingSyntheticEvent:completionHandler:]_block_invoke_3.784 + 74
	54  AppKit                              0x00007fff4951df23 -[NSTextInputContext tryHandleEvent_HasMarkedText_withDispatchCondition:dispatchWork:continuation:] + 87
	55  AppKit                              0x00007fff49be2dfb __84-[NSTextInputContext _handleEvent:options:allowingSyntheticEvent:completionHandler:]_block_invoke.781 + 237
	56  HIToolbox                           0x00007fff4aec7efb __TSMProcessRawKeyEventWithOptionsAndCompletionHandler_block_invoke_5 + 70
	57  HIToolbox                           0x00007fff4aec6db9 ___ZL23DispatchEventToHandlersP14EventTargetRecP14OpaqueEventRefP14HandlerCallRec_block_invoke + 110
	58  AppKit                              0x00007fff49bdd70e __55-[NSTextInputContext handleTSMEvent:completionHandler:]_block_invoke.265 + 575
	59  AppKit                              0x00007fff49518512 __55-[NSTextInputContext handleTSMEvent:completionHandler:]_block_invoke_2 + 74
	60  AppKit                              0x00007fff4951849a -[NSTextInputContext tryHandleTSMEvent_HasMarkedText_withDispatchCondition:dispatchWork:continuation:] + 87
	61  AppKit                              0x00007fff49517c92 -[NSTextInputContext handleTSMEvent:completionHandler:] + 1749
	62  AppKit                              0x00007fff49517545 _NSTSMEventHandler + 306
	63  HIToolbox                           0x00007fff4ae5d22e _ZL23DispatchEventToHandlersP14EventTargetRecP14OpaqueEventRefP14HandlerCallRec + 1422
	64  HIToolbox                           0x00007fff4ae5c5df _ZL30SendEventToEventTargetInternalP14OpaqueEventRefP20OpaqueEventTargetRefP14HandlerCallRec + 371
	65  HIToolbox                           0x00007fff4ae5c465 SendEventToEventTargetWithOptions + 45
	66  HIToolbox                           0x00007fff4aec3f8f SendTSMEvent_WithCompletionHandler + 383
	67  HIToolbox                           0x00007fff4aec43fa __SendUnicodeTextAEToUnicodeDoc_WithCompletionHandler_block_invoke + 387
	68  HIToolbox                           0x00007fff4aec4268 __SendFilterTextEvent_WithCompletionHandler_block_invoke + 221
	69  HIToolbox                           0x00007fff4aec3fde SendTSMEvent_WithCompletionHandler + 462
	70  HIToolbox                           0x00007fff4aec3de3 SendFilterTextEvent_WithCompletionHandler + 225
	71  HIToolbox                           0x00007fff4aec3aa4 SendUnicodeTextAEToUnicodeDoc_WithCompletionHandler + 280
	72  HIToolbox                           0x00007fff4aec384e __utDeliverTSMEvent_WithCompletionHandler_block_invoke_2 + 283
	73  HIToolbox                           0x00007fff4aec36ad __utDeliverTSMEvent_WithCompletionHandler_block_invoke + 355
	74  HIToolbox                           0x00007fff4aec34cb TSMKeyEvent_WithCompletionHandler + 598
	75  HIToolbox                           0x00007fff4aec325a __TSMProcessRawKeyEventWithOptionsAndCompletionHandler_block_invoke_4 + 250
	76  HIToolbox                           0x00007fff4aec3089 __TSMProcessRawKeyEventWithOptionsAndCompletionHandler_block_invoke_3 + 257
	77  HIToolbox                           0x00007fff4aec2dce __TSMProcessRawKeyEventWithOptionsAndCompletionHandler_block_invoke_2 + 282
	78  HIToolbox                           0x00007fff4aec2b32 __TSMProcessRawKeyEventWithOptionsAndCompletionHandler_block_invoke + 274
	79  HIToolbox                           0x00007fff4aec2127 TSMProcessRawKeyEventWithOptionsAndCompletionHandler + 3398
	80  AppKit                              0x00007fff49be2cd9 __84-[NSTextInputContext _handleEvent:options:allowingSyntheticEvent:completionHandler:]_block_invoke_3.779 + 110
	81  AppKit                              0x00007fff49be22d6 __204-[NSTextInputContext tryTSMProcessRawKeyEvent_orSubstitution:dispatchCondition:setupForDispatch:furtherCondition:doubleSpaceSubstitutionCondition:doubleSpaceSubstitutionWork:dispatchTSMWork:continuation:]_block_invoke.734 + 115
	82  AppKit                              0x00007fff49be21c7 -[NSTextInputContext tryTSMProcessRawKeyEvent_orSubstitution:dispatchCondition:setupForDispatch:furtherCondition:doubleSpaceSubstitutionCondition:doubleSpaceSubstitutionWork:dispatchTSMWork:continuation:] + 245
	83  AppKit                              0x00007fff49be2892 -[NSTextInputContext _handleEvent:options:allowingSyntheticEvent:completionHandler:] + 1286
	84  AppKit                              0x00007fff49be2086 -[NSTextInputContext _handleEvent:allowingSyntheticEvent:] + 107
	85  AppKit                              0x00007fff49516004 -[NSView interpretKeyEvents:] + 209
	86  AppKit                              0x00007fff49515e2d -[NSTextView keyDown:] + 726
	87  AppKit                              0x00007fff49363367 -[NSWindow(NSEventRouting) _reallySendEvent:isDelayedEvent:] + 6840
	88  AppKit                              0x00007fff49361667 -[NSWindow(NSEventRouting) sendEvent:] + 478
	89  AppKit                              0x00007fff49201889 -[NSApplication(NSEvent) sendEvent:] + 2953
	90  AppKit                              0x00007fff491ef5c0 -[NSApplication run] + 755
	91  AppKit                              0x00007fff491deac8 NSApplicationMain + 777
	92  Hammerspoon                         0x000000010d62b3d2 main + 34
	93  libdyld.dylib                       0x00007fff77b883d5 start + 1
)

I suspected this might happen if the coroutine actually invoked anything that used LuaSkin, but had forgotten that we no longer use lua_pcall anywhere by itself -- it's always invoked indirectly through LuaSkin's protectedCallAndTraceback:nresults:.

As I'm coming to understand it:

  • Reason: creating a coroutine invokes lua_newthread, which creates a new lua_State variable; when the coroutine is running (i.e. resume is invoked), it's the new state's stack that has things pushed/pulled from it, not the one that LuaSkin was initialized with...

  • Fix: Nontrivial. Not only would we need to change every [LuaSkin shared] line in our C-functions to something like [LuaSkin sharedWithState:L], we'd need to implement lua_lock and lua_unlock so that callbacks could still use [LuaSkin shared] (and thus use the initial state created when first initialized/reloaded) to protect the shared global space between the differing lua_State's.

Which is half-way to making the whole thing support multi-threading (at the Application level, not the psuedo co-routine lua level) anyways (see point 3 at https://stackoverflow.com/a/54046050).

So while the fix might be a "good idea"™, implementing it will be a giant PITA.

Unless I'm missing something obvious (somebody smarter please tell me that I am! Please, please, please!)

@asmagill asmagill changed the title Coroutines don'r work in Hammerspon Coroutines don't work in Hammerspon Feb 18, 2020
@asmagill
Copy link
Member Author

asmagill commented Feb 18, 2020

Ok, not quite "don't work at all"...

if the code which creates and uses the coroutine is in one block, and doesn't use anything which involves LuaSkin (i.e. it's pure lua, not even a single function we've added that uses checkArgs:), then it will work:

This works (when pasted into the console as one "line" of input):

a = ""
cothread2 = coroutine.create(function()
    for n = 1, 5 do
        a = a .. "honk "..n.."\n"
        coroutine.yield()
    end
end)
while coroutine.status(cothread2) ~= "dead" do coroutine.resume(cothread2) end
print(a)

This crashes (when pasted into the console as one "line" of input) -- yeah, it's a stupid example, but hs.canCheckForUpdates was the first thing I found that only uses LuaSkin for a single [skin checkArgs:LS_TBREAK];, about the simplest usage I could think of):

a = ""
cothread2 = coroutine.create(function()
    for n = 1, 5 do
        a = a .. tostring(hs.canCheckForUpdates()) .. "\n"
        coroutine.yield()
    end
end)
while coroutine.status(cothread2) ~= "dead" do coroutine.resume(cothread2) end
print(a)

edit: made it more clear that blocks should be pasted in as one entry if testing in the console

@asmagill
Copy link
Member Author

Obviously few/none of our users currently utilize coroutines, or we'd have heard something by now, but it is a feature lua promotes, so its only a matter of time...

Possible short term band-aids that I can think of:

  • remove coroutine library from our implementation of lua (either by modifying lua source, or adding coroutine = nil to hs._coresetup. Cowardly cop-out, and removes the possibility of the one case that does work -- using them within a single file/function/block with pure lua.

  • partially implement above fix -- modify functions to use [LuaSkin sharedWithState:L] and throw a lua error if L doesn't == skin.L that says something like LuaSkin incompatible with coroutines. Pro: should prevent crashes; Con: inconsistent module support within coroutines -- older modules (pre LuaSkin) may partially work, if they don't use LuaSkin to check arguments or manipulate data; newer or updated stuff won't work at all, since we've pretty much standardized on some common patterns that use LuaSkin a lot.

  • continue to hope no one notices. Cons: obvious.

In any case, if we do end up adding #2304, hs_yield can't be invoked within a coroutine because we haven't implemented lua_lock and lua_unlock (yet); but I think I can easily make this detect such a situation by checking lua_isyieldable() so this issue isn't a show-stopper (though it does remove the need for one of the tests I was considering)

More thoughts if I think of them...

@latenitefilms
Copy link
Contributor

I've tried to use coroutines in the past, but it just made my head hurt so I gave up. These crashes could explain some of the issues I was running into at the time.

@asmagill
Copy link
Member Author

Ok, after some more pondering...

lua_lock and lua_unlock actually shouldn't be necessary, if we stay on a single application thread...

We'll still need to make LuaSkin multi lua_State* compatible for coroutine support (which is also a necessary step for supporting multiple application threads), but since only a single "lua thread" will ever be active at a given instant in a singe application threaded model, lua_lock/unlock don't help us.


@cmsj, some ideas I'd like your comments on about how to make LuaSkin support different lua_State values (these both assume adding sharedWithState:(lua_State *L) and changing functions in our modules -- but not in delegates or other callbacks -- to use it instead of the shared contructor):

  1. add (readonly?) property to store initial lua_State into during creation; when shared called, set self._L to this value; when sharedWithState:L called, set self._L to the L passed in. Pros: simple, Cons: may have to call sharedWithState:L multiple times within a function if it's possible the current value of skin._L might have changed -- currently (I think) this only affects the proposed hs_yield, but maybe also others, if we ever write C functions which utilize/participate in coroutines; won't be sufficient if we ever add true multi application thread support.

  2. create (and store to avoid re-creating?) new LuaSkin instance for each new lua_State seen. Pros: muti application thread safe (if/when we decide to go there), as long as no single lua_State is allowed to run on more than one thread at a time; Cons: more changes to LuaSkin because registered type helpers, etc. have to be moved to class properties; how do we detect when lua_thread is garbage collected (so we can remove its LuaSkin instance, if we saved it)? How do we handle a reload if multiple instances still exist (if we can attach a __gc to a lua_thread type, this might be a non-issue, but I don't think we can)

Other possibilities I'm missing?


Re multiple application threads -- If we do end up adding hs.yield, and maybe even consider attaching it to a debug hook so it runs after every n lua calls, the extra effort to support multiple application threads becomes less attractive to me. IMHO the times when yielding isn't sufficient might be better served by moving something into Objective-C itself which is a lot faster and most things become easy (well, easier) to multithread and trigger a callback upon completion.

Just my (current) 2 cents... ask me again in a month and I'll probably think something totally different 😜

@asmagill
Copy link
Member Author

  1. modify the lua5.3 source for coroutine.yield and coroutine.resume so that they change LuaSkin's internal property for lua_State; Pros: simplest of all requiring no (or very minimal) changes to existing modules; Cons: patching lua source further, which means may have to revisit with 5.4, not sure yet how nested coroutines can be, may need our own stack like structure to remember what to set the property back to. Potentially screwing up something that none of us are that familiar with, so will we really notice subtle changes/errors? Would complicate supporting multiple application threads, if that is something we decide to do later.

@cmsj
Copy link
Member

cmsj commented Feb 18, 2020

If I’m reading this right, it seems like we’ve broken coroutines by building LuaSkin (and really everything else) to assume that there is only ever one lua_State object.
That being the case, is it actually a good idea to try and remove such a core assumption?

@asmagill
Copy link
Member Author

In short, yes... though it's still probably a good idea to keep the main lua thread's state, maybe as a class property, so callbacks/delegates/other places where no lua state is passed in don't have to jump through too many hoops. As long as we make sure that the primary state always runs on the main application thread (even if we decide to allow application threading at a later date), they'll run one at a time and shouldn't pose a problem.

But otherwise, yes, we really should support multiple states everywhere else.

@cmsj
Copy link
Member

cmsj commented Feb 19, 2020

I'm pretty wary of making this kind of change for a Lua feature that's apparently so unpopular that nobody has noticed it's been broken for over 5 years.

For the pure C entrypoints (ie callbacks/delegates), how would we do anything other than fetch the main state object? As-in, how would the C entrypoint even know which state object it should try and use? If it could find the right state object, how would it know whether or not the co-routine that created it, still exists?

Even assuming we could figure that sort of stuff out, this would also mean some fairly significant changes to LuaSkin - not only do we only have one lua_State object at the moment, we intentionally only have one LuaSkin object no matter how many times [LuaSkin shared] is called. Would each new co-routine state object need to inherit all of the registered modules/helpers?

I take the point that it's bad we've broken co-routines, but I'm honestly not sure I want to fix them again, if the cost is significantly higher complexity, and a big risk of breaking in all sorts of weird new ways.

@latenitefilms
Copy link
Contributor

FWIW - I guess my question would be WHY do we need coroutines, and is there a better way to achieve the same outcome with another solution? Most of the things I'd use coroutines for can be implemented in other ways, such as running things on another thread in Objective-C-land and using a callback.

For now, I'd just make coroutines = nil and document it somewhere, so it's clear that coroutines aren't supported in Hammerspoon. If someone else NEEDS them in the future, we can always re-assess.

@sharpobject
Copy link

Hello,

Coroutines are a fundamental control flow construct that ships with the language. Removing them is comparable to removing the function keyword. I only recently learned that Hammerspoon exists, and I am willing to try to fix the mess that currently prevents it from working with coroutines and with moonjit.

@asmagill
Copy link
Member Author

Callbacks should always run on the main lua_State (the lua docs call it a thread, which I've sometimes called a "lua thread" to be pedantic, but it's really primarily a separate isolated lua stack) --the initial lua_State created when the lua engine was initialized -- because they are a new code-block triggered by the external event, not the resuming of something a coder suspended.

That's why I recommend creating the new method sharedWithState: and applying it only to functions and methods we add, but not within the callback/delegate/event code that handles externally triggered application events. If we go with the first option I outline above, the internal changes to LuaSkin are manageable and there is still only one instance required.

As to why its taken so long to find out... well, most of our users aren't programmers per se, and those that are didn't start with Lua... coroutines are a fundamental part of the language, and the primary answer to the fact that it's by default single threaded. I've meant to learn about them for a long time, but since (1) potentially long running lua, no matter whether its on 1 "lua thread" or 1000 causes Hammerspoon to block, but with #2304 that might no longer be an issue, and (2) I personally can code relatively quickly in Objective-C and can use a background thread it if I want.

If a true lua developer wanted to do something, they'd expect coroutines to exist and wouldn't bother without them.

As I have come to understand the problem, the fix outlined as the first option above won't be difficult, just tedious.

I'm opposed to burying our head in the sand by removing that the library -- we won't be using Lua, then, but a subset of it. Subsets are fine in memory constrained environments, not on a modern desktop.

As a first step, I think adding the following after the NSThread check in the sharedWithDelegate: method of LuaSkin will at least stop the application crashes, though 90% of our module functions/methods still won't be usable within a coroutine:

    int isMainThread = lua_pushthread(self.L) ;
    lua_pop(self.L, 1) ; // remove thread we just pushed onto the stack
    if (!isMainThread) {
        luaL_error(self.L, "LuaSkin shared object incompatible with coroutines") ;
        return nil ;
    }

I'll test this in a few hours to be certain.

@asmagill
Copy link
Member Author

Ok, no, that's stupid... we'd just be checking the saved state, which we already know is the main "thread"... let me think some more...

@sharpobject
Copy link

sharpobject commented Feb 19, 2020

If the limitation is only that you cannot call Hammerspoon API functions from coroutines, you may be able to "fix" the problem from the Lua side by implementing asymmetric coroutines on top of symmetric coroutines on top of asymmetric coroutines, placing a trampoline inside of your new fake coroutine.resume, and wrapping Hammerspoon API functions to make sure they are called from the main lua_thread even when the user tries to use them from inside a coroutine.

A sketch of what this would look like, on the library side, which I've called co_wrap.lua over here:

local select = select
local table_insert = table.insert
local table_remove = table.remove
-- utility to save multiple returns or varargs for later use
local NIL = {}
local function drop_one(x, ...)
  return ...
end
local save_exprs
save_exprs = function(return_first_x, ...)
  if return_first_x == 0 then
    local ret = {}
    for i=1,select("#", ...) do
      local item = select(i, ...)
      if item == nil then
        item = NIL
      end
      ret[i] = item
    end
    return ret
  end
  return select(1, ...), save_exprs(return_first_x-1, drop_one(...))
end
local unsave_exprs
unsave_exprs = function(t, idx, len)
  if not idx then
    idx = 0
    len = #t
  end
  if idx == len then
    return
  end
  idx = idx + 1
  local ret = t[idx]
  if ret == NIL then
    ret = nil
  end
  return ret, unsave_exprs(t, idx, len)
end

local co_stack = {}
local co_running = {}
local n_cos = 0
local real_yield = coroutine.yield
local real_resume = coroutine.resume
local real_status = coroutine.status
local real_create = coroutine.create
function coroutine.resume(co, ...)
  -- can't have the same coroutine running twice in our fake stack of coroutines 
  if co_running[co] then
    return false, "cannot resume non-suspended coroutine"
  end
  -- if we are called from the main thread, resume and trampoline
  if n_cos == 0 then
    n_cos = n_cos + 1
    co_stack[n_cos] = co
    co_running[co] = true
    local ok, op
    local stuff = save_exprs(0, co, ...)
    while true do
      local prev_co = stuff[1]
      ok, op, stuff = save_exprs(2, real_resume(unsave_exprs(stuff)))
      if ok and op == "resume" then
        -- trampoline and pass stuff to nested co
        n_cos = n_cos + 1
        co_stack[n_cos] = stuff[1]
        co_running[stuff[1]] = true
      elseif ok and (op == "yield" or op == "die") then
        -- coroutine yielded or died
        -- trampoline and pass stuff to parent co
        co_running[prev_co] = nil
        co_stack[n_cos] = nil
        n_cos = n_cos - 1
        if n_cos == 0 then
          -- parent co is topmost lua thread, so return
          return true, unsave_exprs(stuff)
        else
          table_insert(stuff, 1, true)
          table_insert(stuff, 1, co_stack[n_cos])
        end
      elseif ok and op == "call" then
        -- coroutine wants to call some function from the main thread
        -- get the results and pass them back to same co
        local func = table_remove(stuff, 1)
        stuff = save_exprs(0, func(unsave_exprs(stuff)))
        table_insert(stuff, 1, co_stack[n_cos])
      elseif not ok then
        -- coroutine errored
        -- trampoline and pass stuff to parent co
        co_running[prev_co] = nil
        co_stack[n_cos] = nil
        n_cos = n_cos - 1
        if n_cos == 0 then
          -- parent co is topmost lua thread, so return
          return false, op, unsave_exprs(stuff)
        else
          table_insert(stuff, 1, op)
          table_insert(stuff, 1, false)
          table_insert(stuff, 1, co_stack[n_cos])
        end
      else
        error("Fake coroutine lib: Not sure what happened")
      end
    end
  else
    -- tell the loop on the main thread that I'd like to resume pls
    return real_yield("resume", co, ...)
  end
end

function coroutine.yield(...)
  -- tell the loop on the main thread that I'd like to yield pls
  return real_yield("yield", ...)
end

function coroutine.status(co)
  if co_running[co] then
    return "running"
  else
    return real_status(co)
  end
end

function coroutine.create(f)
  return real_create(function(...)
    -- tell the loop on the main thread that I'd like to die pls
    return "die", f(...)
  end)
end

function coroutine.call_from_main_thread(f)
  return function(...)
    if n_cos == 0 then
      return f(...)
    else
      -- tell the loop on the main thread that I'd like to call pls
      return real_yield("call", f, ...)
    end
  end
end

--TODO: also reimplement coroutine.wrap

--coroutine.running should work fine as is.

On the user side:

require"co_wrap"
get_main_thread = coroutine.call_from_main_thread(coroutine.running)
get_my_thread = coroutine.running
a = coroutine.create(function()
    local cos = {}
    local function print_some_numbers(n)
      for i=1,n do print("inner", i, coroutine.yield(i*i)) end
      print(get_main_thread())
      print(get_my_thread())
    end
    for i=1,3 do
      cos[i] = coroutine.create(print_some_numbers)
    end
    local n = 1
    while true do
      for i=1, #cos do
        if coroutine.status(cos[i]) ~= "dead" then
          print("outer", coroutine.resume(cos[i],n))
          n = n + 1
        end
      end
      print(coroutine.yield())
    end
  end)

for i=1,5 do print(coroutine.resume(a)) end

This prints

outer	true	1
outer	true	1
outer	true	1
true

inner	1	4
thread: 0x7f901b800608	true
thread: 0x7f901b408c08	false
outer	true
inner	1	5
outer	true	4
inner	1	6
outer	true	4
true

inner	2	7
thread: 0x7f901b800608	true
thread: 0x7f901b408f88	false
outer	true
inner	2	8
outer	true	9
true

inner	3	9
thread: 0x7f901b800608	true
thread: 0x7f901b408928	false
outer	true
true

true

If you remove lines 8 and 9, then you can get output that doesn't change based on the addresses of coroutines, and you can try running it with and without co_wrap. Over here it produces identical output with and without co_wrap on lua 5.3.5. This isn't a guarantee that I haven't written any bugs, of course.

Edit: While this may let users run code that uses coroutines, it won't be as fast as you might like. If you'd like to go fast, I think the right play is to fix the problem for real and to use some version of luajit.

@asmagill
Copy link
Member Author

Maybe... If I'm understanding this correctly (can't test it right now, not at my computer, so this is all though experiments for me for the next couple of hours or so), it still puts the onus on the user to know what can and can't be run within a coroutine, and if they get it wrong, the Hammerspoon application crashes.

Long term, we need to implement sharedWithState: in LuaSkin.

Short term, I need a way to detect, without having access the the lua_State passed into a C function, whether or not the state stored in the LuaSkin properties matches it or not...

Let me make that clearer, cuz I only understand it because I'm thinking about it right now and it still sounds like mud to me...

For the following, the "active" lua_State is the one passed into the C function by the lua engine.

Lets say I have a C function accessible from lua:

static int my_func(lua_State *L) {
    LuaSkin *skin = [LuaSkin shared] ; // get the shared LuaSkin instance
    [skin checkArgs:LS_TNUMBER, LS_TBREAK] ; // check the args on the stack
    ... do our stuff
    return 1;
}

[skin checkArgs:...] does some type checking on the stack passed in... internally it uses a lot of lua_type(...) and lua_to...(...) function calls, but all of them are done with skin.L as the state, not the state that was passed into my_func. And this is what causes the application crash when skin.L != L.

We could add if (skin.L != L) { return luaL_error(L, "not coroutine safe") ; } after the first line, but we'd have to do it for every function and method we've added, and that's a crap-ton. Plus it doesn't make them work in coroutines, when simply changing skin.L for the duration of this C function would, so I'm willing to do the crap-ton once to switch [LuaSkin shared] to [LuaSkin sharedWithState:L] but not twice: once to make it safer (i.e. non-crashy), then again to make it work with coroutines, once I've made the necessary changes to LuaSkin itself.

It would be so much easier to test whether the "active" lua_State == skin.L within [LuaSkin shared], but as you can see, that's not passed L, and so far, rummaging through the source for lua, I'm not seeing a way to determine which lua_State is the "active" one (i.e. the one that initiated the most recent lua_(p)call).

To complicate matters, when handling a callback from the macOS, LuaSkin *skin = [LuaSkin shared] ; gets invoked when the lua engine is idle and NO lua_(p)call has been performed (so there is no "active" state)... now granted, in this case, a coroutine didn't cause the callback, so using skin.L on the main lua thread is fine, but it does complicate determining which lua_State is "active", assuming I can find a way at all...

@sharpobject
Copy link

Hammerspoon could wrap all the functions that must be wrapped before exposing them to the user. I'm not sure if you have a big list of those somewhere though.

@sharpobject
Copy link

I don't think there's an easy way to find the active state, but there is an easy way to find the main state. So it's too bad you're not asking the opposite question of "I have an active state, and can you tell me if it's the main one?"

It seems like to solve the problem for real, checkArgs should accept a lua_State.

@asmagill
Copy link
Member Author

asmagill commented Feb 19, 2020

There are actually a lot of other helpers that LuaSkin provides, so I think the better solution is to change the first line to LuaSkin *skin = [LuaSkin sharedWithState:L] in our functions, but leave it LuaSkin *skin = [LuaSkin shared] in our callback handlers...

Internally, if withState is called, skin.L is set to that, and if not (or NULL is passed, which might be clearer in intent), it will set it to the main one which we can cache in a class property.

That requires the least changes (outside of LuaSkin itself) but still leaves open the possibility for a crash if one our developers uses the wrong version of shared in the wrong place, which is why I was hoping for a way to detect it...

Oh well, I think then moving all of them to sharedWithState: is the way to go, but allow it to be NULL; that way, using [LuaSkin shared] won't compile and the dev will have to make an explicit choice.

@cmsj, not sure if you've been following these last comments, but unless you have objections, I think this is what I'm going to start doing on a new branch tonight/tomorrow so we can put it out for testing hopefully in a couple of days.

@cmsj
Copy link
Member

cmsj commented Feb 19, 2020

@asmagill I am following along. I have no objections to trying to fix this, but I have some fairly big concerns about the viability of this.

I hope I'm wrong, but my suspicion is that we're going to very quickly find that all sorts of things are going to break - e.g. we don't just have an assumption that there is only one lua_State object, we also assume there's only one LuaSkin object which is 100% coupled to the single Lua state. We assume that each library can have its own global refTable and that assumption is now on very shaky ground. Those are just the ones that come immediately to mind.

tl;dr I don't currently see how any of our C API can be safe when it's being interacted with by multiple Lua states.

@asmagill
Copy link
Member Author

I think we’re OK as long as we stick to one application thread, since every function (with the exception of the proposed hs.yield and I think I know how to address that) runs to completion before another L could sneak in there.

A few of the functions which offer to run in the background but don’t return a userdata will have to be examined to make sure they ask for a new skin object rather than inherit their parents before the dispatch_async, but I’ve been meaning to re-evaluate these anyways because they are also prone to crashing if the callback doesn’t trigger till after a reload, so...

At any rate, the only way to know is to try it; we can keep it in a branch so only devs run it for a while if you like.

@cmsj
Copy link
Member

cmsj commented Feb 19, 2020

@asmagill I think I'm mostly concerned about all the state in C that will now have multiple Lua environments mutating it.

I agree that the only way to be sure is to try, so maybe if we have a branch with the modifications to LuaSkin and then we need to hammer the HS API from the coroutines and the main thread and see what happens.

@asmagill
Copy link
Member Author

Admittedly this is a sort of band-aid over a framework that should probably (eventually) be refactored now that we've realized that one of the core assumptions that went into making it was mistaken, but I think it should work, once we've addressed all dispatch_* blocks that assume the persistence of skin and don't bother setting it again...

I'll keep you guys posted and let you know when a fork is ready for testing.

@asmagill
Copy link
Member Author

Closing as this has been addressed in master; reopen if you think I've missed some point that still needs to be addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants