Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't let Lua code cause panics or aborts #38

Closed
5 tasks done
jonas-schievink opened this issue Aug 12, 2017 · 17 comments
Closed
5 tasks done

Don't let Lua code cause panics or aborts #38

jonas-schievink opened this issue Aug 12, 2017 · 17 comments

Comments

@jonas-schievink
Copy link
Contributor

jonas-schievink commented Aug 12, 2017

This is a tracking issue for the remaining cases where Lua code can cause a Rust panic or a direct abort. This is generally unwanted, because untrusted Lua code should not be able to abort the Rust program. Although panics can be caught on the Rust-side, it is intended that all errors caused by Lua are propagated using LuaResult.

Note that fixing these issues will still not make it advisable to run arbitrary untrusted Lua code. Proper sandboxing support and execution limits need to be exposed first.

  • Accessing resurrected UserData directly causes a panic. Instead, a Lua error should be raised. This error can be caught by Lua (this isn't very useful, but also not a big problem) or will propagate to the Rust code that caused the Lua code to run.
  • Exceeding the stack size with Rust callbacks will lead either to an assertion failure (which is an abort) in Lua or to a panic. rlua should automatically provide and enforce a recursion limit, raising a Lua error when hit.
  • According to the readme: There are currently no checks on argument sizes. An abort can be caused by passing too large Variadics or MultiValues. When Rust calls a function with too many arguments, an Error should be directly returned. When Lua calls a Rust callback with too many arguments, the Lua runtime should already handle this. However, a Rust callback can also return too many values (Lua "only" guarantees that at least 1000 values can be returned). This can be handled by raising a Lua error. Alternatively, we can define this as a bug on the Rust side and cause a panic. This can lead to subtle bugs, however, when the number of values depends on the inputs from Lua or is otherwise unclear.
  • Errors in __gc cause an abort. Raising a Lua error from here is unsafe, so it is caught and the program is aborted. This might be a reasonable default behaviour in some cases, but it would be nice if a user-defined callback could be provided.
  • Allocation failure causes an abort. While this matches Rust's behaviour, a way to limit the amount of memory used by Lua code is essential to be able to run untrusted code.

I hope these are all cases. If not, feel free to comment/edit, of course.

@kyren
Copy link
Contributor

kyren commented Aug 13, 2017

This is a good overview of the remaining issues, and as far as I'm aware I think you've mentioned everything left.

The first three are pretty straightforward I think. I think, for the case of a rust callback returning too many values, raising a lua error is the right approach there.

The fourth and fifth... I'm not exactly sure what the best strategy is. I suppose for a __gc error, you could optionally provide a callback like you said, and one user strategy would be to simply log the error and ignore it. If that callback panics though, I think back on the rust side the only thing that rlua could do would be to abort at that point, because it can't propagate the panic. Maybe that's perfectly fine? If that strategy seems workable to you, I think that one is also not so bad.

The point I'm most unsure about is the fifth one. I'm perfectly happy to allow custom allocators, or a pooled allocator with a memory limit, or heck I guess I could even just keep track of allocated / freed memory using the standard allocator. The thing I'm unsure about is what do I do once the allocation limit is hit? I can't panic, because it's not panic safe unless I make every call to an 'm' function be within a pcall, and we already have decided that's ridiculous and we're avoiding that at all costs. I could.. abort, but it would abort anyway right? I could call a callback, but all that callback can do is set a flag. I SUPPOSE what I could do is set a flag that the memory limit has been reached, and then at every point where control would otherwise return to lua, check that flag and instead trigger a lua memory error (which I would also need to make un-catchable). Finally, everywhere control would pass back to rust, I would also need to pass back some kind of memory error, but there are APIs that currently do not return Result that also might allocate. Of all the points, this one is the one that seems hardest to come up with a plan of attack for, and I'm very interested in your input on it.

@jonas-schievink
Copy link
Contributor Author

Oh, right, I'm not sure what can be done if the memory limit is hit.

I SUPPOSE what I could do is set a flag that the memory limit has been reached, and then at every point where control would otherwise return to lua, check that flag and instead trigger a lua memory error (which I would also need to make un-catchable).

Seems like this is the best option? Or perhaps the only one, to be precise... It seems hard to get right because we'd need to do this at a lot of places, but I don't see any other option.

@kyren
Copy link
Contributor

kyren commented Aug 17, 2017

Seems like this is the best option? Or perhaps the only one, to be precise... It seems hard to get right because we'd need to do this at a lot of places, but I don't see any other option.

That would also require EVERY function to return Result, are we sure that's worth it? Maybe it is, I mean quite a lot of them already do..

@jonas-schievink
Copy link
Contributor Author

I think it would be sufficient if we only did the check when control is returned back to Lua (so that Rust code can allocate all it wants), and then raise an uncatchable error. That way only API functions that may invoke Lua code will report OOM errors, which will not change the API much.

Of course, this is a bit of a footgun, but I see no other way to implement this without cluttering the entire API with Results that are only ever useful when you're doing pretty specific things.

@kyren
Copy link
Contributor

kyren commented Aug 17, 2017

Right.. okay that's not TOO bad, but it's a very subtle behavior. It IS however exactly what you would want if you were trying to prevent scripts from being able to cause DOS attacks, so in that sense it seems like reasonable behavior. I like the rule that an out of memory error becomes an error only when control would otherwise return to a lua script, or when it occurs directly from lua itself.

@naturallymitchell
Copy link

Could safer-lua-api help?

@jonas-schievink
Copy link
Contributor Author

@mitchtbaum Looks interesting, it would probably fix #21 too

@kyren
Copy link
Contributor

kyren commented Aug 30, 2017

That's really interesting, we actually discussed doing the technique that safer-lua-api uses before, using thread local storage to store the arguments to functions before using lua_pcall.

You know, looking through safer-lua-api, I guess there are less 'm' functions in the lua api than I thought, so maybe I need to sit and think through this again about what the best approach is. I really like that safer-lua-api exists, and I do want to use it, but I need to think about whether I want another C depencency. If I did use it, I think I would definitely import it into the project in its generated C form rather than invoking lua or cmake during the build procedure.

In any case, thank you for the suggestion!

@kyren
Copy link
Contributor

kyren commented Oct 14, 2017

Just to add to this list, we also need to make recursive callbacks an error rather than a panic.

@psychon
Copy link

psychon commented Nov 10, 2017

I want to add to the list some more:

  • 'function f() end t={__gc=f} print(t.__gc) setmetatable({}, t) print(t.__gc) t.__gc()' (calls the __gc-metamethod that rlua created internally without the right arguments)
  • function f() end t={__gc=f} setmetatable({}, t) t.__gc = error() t=nil collectgarbage("collect")' (gets rid of the wrapper around __gc that rlua created internally, allowing __gc to fail without rlua noticing)

Basically, the whole magic you do with safe_setmetatable and __gc does not work since metatables can be modified even after they were set.

@jonas-schievink
Copy link
Contributor Author

The second example is pretty concerning (possible unsafety), I didn't know you could change __gc after it's set (but it makes sense that this works).

@psychon
Copy link

psychon commented Nov 10, 2017

Non-helpful comment ahead:
How about just documenting that having Lua mess with __gc is unsafe? You could even enforce this by having your current safe_setmetatable complain when a __gc metamethod is present (and e.g. unset it before the call to lua_setmetatable and reset it afterwards).

@jonas-schievink
Copy link
Contributor Author

There's no precedent for declaring something you can do in regular Lua code as unsafe in rlua - it would be a bit like "giving up", so it would be nice to get this working safely, somehow ;)

(we did previously ban the debug library, but it's well known that it can cause unsafe stuff, and it's localized to a certain part of the library only, not something inherent like __gc)

@kyren
Copy link
Contributor

kyren commented Dec 4, 2017

I have made a ton of changes in preparation for 0.10, I believe at least cases 1) and 4) are currently solved.. maybe.. hopefully.

The whole API should now be 'm' function safe, and there are 3 new error variants for gc errors, recursive callback errors, and expired userdata errors.

I'm looking for feedback before actually releasing 0.10, and to just let it sit a little bit to let you guys take a look at it. Also, I'd like to get it integrated into spellbound and make sure it doesn't cause any problems, because spellbound makes one hell of an integration test.

@jonas-schievink
Copy link
Contributor Author

@kyren Nice work! I've converted the list to a checklist, feel free to tick off items that are done. I'm relatively busy with university stuff so I won't have much time to check out all the changes, unfortunately.

@kyren
Copy link
Contributor

kyren commented Feb 10, 2018

I've (maybe) fixed bullet points 2 and 3 with d331e4b. I think the way Lua works internally is a bit different than I thought, because I didn't do anything explicit to fix 2, and fixing 3 simply involved checking for lua_checkstack failure.

I've added tests for everything I can think of, but it's possible I've missed something, or am missing a way that the fixes are not complete.

@kyren
Copy link
Contributor

kyren commented Jan 16, 2019

The worst thing about this bug is that I think it may have actually been too easy

@kyren kyren closed this as completed Jan 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants