Long compilation times when registering usertype with lots (>50) of member functions. #126

Closed
eliasdaler opened this Issue Jun 22, 2016 · 29 comments

6 participants
@eliasdaler
Contributor

eliasdaler commented Jun 22, 2016

I have a class which has a lot of member functions. I pass them all in the usertype constructor, but that probably generates huge template instantiations of some functions. This causes the CPP file to take 3-4 minutes to compile in VS2015 while consuming 1.5 gigabytes of RAM.

What I need is the ability to register functions one by one, not all at once. I realize that this may not be the most efficient way to do this, but compilation times are very important to me, more than runtime performance when registering this stuff.

I propose something like:

usertype<SomeClass> ut("SomeClass");
ut.add_function("someFunc", &SomeClass::someFunc);
...
@ThePhD

Owner

ThePhD commented Jun 22, 2016

The old system used a few less template instantiations. I'm going to have to spend some time to analyze the templates we instantiate and see if I can slim down / crush a few things. This is going to take me a long time to do, so I apologize in advance: there won't be a quick fix.

@ThePhD ThePhD self-assigned this Jun 22, 2016

@ThePhD ThePhD added this to the Bike.Shed milestone Jun 22, 2016

@eliasdaler

Contributor

eliasdaler commented Jun 22, 2016

Neat, anything I can help with?

And btw, even if we keep the number of template instantiations to a minimum, might the compiler still have problems with really long tuple types?
Do you actually need to have the types of ALL functions at once at all times? Passing them one by one should achieve the same thing (though it may be slower and less convenient, but at least stuff will compile much faster).

@ThePhD

Owner

ThePhD commented Jun 22, 2016

Maybe. Problem is, if you don't have all the functions/variables in one place -- or a way to refer to them -- then you cannot do things like override the __index metamethod in order to use that to call functions statically and such, because you need to make sure every function/variable can be retrieved through the __index C-Function you registered into Lua.

This is going to take some designing that I'm not going to have the time to brain-power through for the next few weeks, unfortunately.

@nabijaczleweli

nabijaczleweli commented Jun 22, 2016

lol are you serious
Are you really complaining about compilation times? What work is done at compile-time is not done at runtime, hence better performance, and performance is the thing we, as programmers, are striving for.
Compilation times do not exist, what exists is performance. You compile once and perform forever. You do not want to sacrifice performance for compilation time.

@Nava2

Contributor

Nava2 commented Jun 23, 2016

Not to be rude, but have you considered that 50 members might be too many? Perhaps if you slimmed down the class it'd help compile-time performance?

@eliasdaler

Contributor

eliasdaler commented Jun 23, 2016

I have a class which acts like a wrapper around game objects. It contains a pointer to entity and has lots of member functions like setAIState, setAnimation, setAttack, etc. which are not part of Entity class and do pretty complex stuff at times. (they use entity to get needed component and then call function for that or do some error checking which I don't want to expose in Lua).

Yes, I could have used free functions for that, but I find it easier to write someGuy:setAnimation("someAnimation") than setAnimation(someGuy, "someAnimation").
You can argue with that example, but I think there may be other uses of class with lots of member functions.

Compilation times are very important for me because I make a game and speed of iteration depends on compilation times. Having to wait for 3-4 minutes (sometimes even more) each time I make a small change in wrapper function is not something I want to do...
Runtime performance for registration was never a problem with other Lua/C++ libs. It took some milliseconds during game loading time and didn't perform that much worse than free functions.

@eliasdaler

Contributor

eliasdaler commented Jun 23, 2016

@ThePhD so, if I understand correctly, the problem is that you have to check if user passes only functions or only members when registering usertype so you can use __index metamethod for speeding things up? Or maybe there are some other reasons why you have to use std::tuple?

UPD: looked at how Kaguya handles registration: while addFunction(...) stuff may look verbose and not very cool to you, I think that it will generate a lot less code and will compile much faster. (Not sure about the performance, though...)
Are there any big differences in that regard between Kaguya and Sol2?

@Morwenn

Morwenn commented Jun 23, 2016

If you have a problem with long type names, this presentation about Boost.DI offers a way to reduce compile times by making the compiler use smaller type names in some cases. Apparently long type names generated by templates (e.g. tuples of dozens of elements) are one of the primary things that slow down compilation. The technique proposed in the presentation works well for Clang but is a bit harder to use in GCC.

Anyway, you can use Louis Dionne's metabench to estimate compile times. You can also read his presentation Metaprogramming for the brave where he presents indices-based tricks to reduce compile times (almost always better than template recursion). If you use these, don't hesitate to use Xeo's O(log n) implementation of std::index_sequence since some compilers only have a slower O(n) implementation (GCC prior to version 6). If you target the most recent Clang, use their implementation though, it's O(1) thanks to compiler intrinsics.

That's pretty much what you can do to reduce compile times: using even more template tricks to reduce the overhead of other template tricks. Good luck.
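
To make the indices-based trick concrete, here is a minimal sketch (not sol2's actual code) that visits every element of a tuple twice: once with template recursion and once with a single std::index_sequence pack expansion. The names visit_recursive, visit_flat, NameCollector and collect_names are made up for this illustration, and the tuple of strings only stands in for a list of registered members.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Recursive visit: one function instantiation per index, each one
// referring to the next (I, I + 1, ..., tuple size).
template <std::size_t I, typename Tuple, typename F>
typename std::enable_if<(I == std::tuple_size<Tuple>::value)>::type
visit_recursive(const Tuple&, F&) {}

template <std::size_t I, typename Tuple, typename F>
typename std::enable_if<(I < std::tuple_size<Tuple>::value)>::type
visit_recursive(const Tuple& t, F& f) {
    f(std::get<I>(t));
    visit_recursive<I + 1>(t, f);
}

// Index-based visit: a single instantiation expands all indices at once.
template <typename Tuple, typename F, std::size_t... Is>
void visit_flat_impl(const Tuple& t, F& f, std::index_sequence<Is...>) {
    // A braced initializer list evaluates its elements left to right.
    int expand[] = {0, (void(f(std::get<Is>(t))), 0)...};
    (void)expand;
}

template <typename Tuple, typename F>
void visit_flat(const Tuple& t, F& f) {
    visit_flat_impl(
        t, f, std::make_index_sequence<std::tuple_size<Tuple>::value>{});
}

// Hypothetical visitor that records the names it is handed.
struct NameCollector {
    std::vector<std::string> names;
    void operator()(const char* name) { names.push_back(name); }
};

// Runs both traversals and checks that they visit in the same order.
template <typename Tuple>
std::vector<std::string> collect_names(const Tuple& t) {
    NameCollector rec, flat;
    visit_recursive<0>(t, rec);
    visit_flat(t, flat);
    assert(rec.names == flat.names);
    return flat.names;
}
```

Calling collect_names(std::make_tuple("setAIState", "setAnimation", "setAttack")) goes through both versions. How much compile time the flat version saves depends on the compiler and on how std::make_index_sequence is implemented, so it is worth measuring rather than assuming.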

@eliasdaler

Contributor

eliasdaler commented Jun 23, 2016

Btw, with LuaBridge I get 1 mb .obj file when I register a huge class (VS 2015 Update 3, that's including all function implementations). With sol2 it's ~22 mb. O:
That shows how much of template stuff is generated, I think.

@nabijaczleweli

nabijaczleweli commented Jun 23, 2016

The bigger the object file, the better performance.
Also, 22 millibits is semiliterally nothing.

@ThePhD

Owner

ThePhD commented Jun 23, 2016

@Morwenn I didn't know you could actually shorten names at compile-time. That's probably going to be of a big help.

The other thing I need to do is strip the internal make_regs function, which basically lists the tuple twice thanks to having to apply the indices trick twice.

@ThePhD

Owner

ThePhD commented Jun 23, 2016

Unfortunately, we have to support GCC as far back as 4.9, and clang as far back as... was it 3.3 or 3.4? Either way, I'm probably going to come to regret it, but basically I support (Latest version of Visual Studio), and then every version of g++ / clang++ that can match that. It's actually the source of some dumb #ifdef's in the source.

@Ung0d

Ung0d commented Jun 25, 2016

What I want to add here is that code where I create a new_usertype with (in my case) more than 28 member functions registered won't compile until I activate the -ftemplate-depth=1024 compiler flag. So the current implementation of new_usertype seems to easily hit the default template depth limit (900, I think). Anyone else experiencing that?

@ThePhD

Owner

ThePhD commented Jun 27, 2016

I experienced that with older compilers @Ung0d . I'm going to be pushing an update that should vastly reduce the number of template instantiations done, but I'm not exactly sure if it'll save you.

For what it's worth, later versions of Clang and g++ up the natural limit to 1024 by default (they anticipate more code using templates).

ThePhD added a commit that referenced this issue Jun 27, 2016

@ThePhD

Owner

ThePhD commented Jun 27, 2016

@eliasd Can you give the latest commit a try? I tried some new template techniques that didn't involve deep recursion, and you might get some better compile times / object file sizes.

Note that the single-header-file is also included in the repository now, at https://github.com/ThePhD/sol2/blob/develop/single/sol/sol.hpp

@Ung0d

Ung0d commented Jun 27, 2016

I'm using gcc 5.3, wouldn't call that old.

About the latest commit: I don't need -ftemplate-depth=1024 anymore, so there is definitely a difference. I can't really say anything about compile time; it wasn't that slow before on my machine. I just wondered about the required flag. It's gone now.

@eliasdaler

Contributor

eliasdaler commented Jun 28, 2016

@ThePhD going to check it out soon. :)

And btw, is it still not possible to provide interface for adding member functions one by one?
Suppose I register class SomeClass:

class SomeClass {
    void f(int);
};

// Args = string, void (SomeClass::*)(int)
sol::new_usertype<SomeClass>("SomeClass", "f", &SomeClass::f);

but then I add new member function int SomeClass::f2(int) and want to register it:

// Args = string, void (SomeClass::*)(int), string, int (SomeClass::*)(int)
sol::new_usertype<SomeClass>("SomeClass", "f", &SomeClass::f, "f2", &SomeClass::f2);

Now the new_usertype function template has to be instantiated again, along with everything that shares its variadic template arguments. That's because the Args argument pack has changed.

If it's possible to add one function per function call then there'd be less template stuff to generate (no stuff to generate at all if I add member function with the same signature as the one that was already added).

Something like this:

auto usertype = sol::new_usertype<SomeClass>("SomeClass");
usertype.addFunction("f", &SomeClass::f);
usertype.addFunction("f2", &SomeClass::f2);
usertype.addFunction("f3", &SomeClass::f3);
// if decltype(&SomeClass::f3) == void (SomeClass::*)(int), then there'll be no new stuff for the compiler
// to generate, as the first call already instantiates the template function with the same type

If I understand correctly, users need to add all member functions / properties at once for you to be able to check if user added only properties or only member functions to optimize stuff with __index meta-method stuff. Yep, it's hard to make interface which will be able to do this if you let users add functions in several calls, but I think there's something that we can come up with.
Is there any other stuff I'm missing?
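
To make the instantiation argument concrete, here is a minimal sketch. It is not sol2 code: one_by_one_registry, add_function and calls_for_signature are hypothetical names invented for this example. It checks at compile time that f and f3 share one pointer-to-member type while the all-at-once argument list changes type whenever a function is added, and it keeps one counter per add_function instantiation so the sharing can be observed at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>

struct SomeClass {
    void f(int) {}
    int f2(int x) { return x; }
    void f3(int) {}
};

using F = decltype(&SomeClass::f);    // void (SomeClass::*)(int)
using F2 = decltype(&SomeClass::f2);  // int (SomeClass::*)(int)
using F3 = decltype(&SomeClass::f3);  // void (SomeClass::*)(int)

// Same signature, same type: a template taking F is instantiated once.
static_assert(std::is_same<F, F3>::value, "f and f3 share one type");
static_assert(!std::is_same<F, F2>::value, "f2 has its own type");

// Passing everything at once: adding f2 yields a brand-new pack type,
// so everything templated on the whole pack is instantiated again.
static_assert(!std::is_same<std::tuple<const char*, F>,
                            std::tuple<const char*, F, const char*, F2>>::value,
              "the combined argument list changes with every added member");

// One static counter exists per distinct Fn, mirroring instantiations.
template <typename Fn>
int& calls_for_signature() {
    static int calls = 0;
    return calls;
}

// Hypothetical one-by-one interface (it only records names).
class one_by_one_registry {
public:
    template <typename Fn>
    void add_function(const std::string& name, Fn) {
        assert(!name.empty());
        ++calls_for_signature<Fn>();  // shared by all members of type Fn
        names_.push_back(name);
    }
    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};
```

After registering f, f2 and f3 one at a time, calls_for_signature<F>() reports two calls (f and f3) and calls_for_signature<F2>() reports one. That only shows the two functions went through the same instantiation; it says nothing about how sol2 itself would have to store them.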

@ThePhD

Owner

ThePhD commented Jun 28, 2016

There is no way to make add_function work with the current interface without either reworking the entire internals of usertype (for the third time) or adding a runtime overhead penalty. add_function is a runtime expression, and thus would require us to concatenate its types to the tuple that usertype currently represents in your example and then return a new usertype with the additional information appended to the internal tuple.

The other way would be to accept the runtime penalty (how we were doing it before) and simply take the performance hit for performing type erasure (e.g., behind a virtual function call). However, that would essentially be reverting to the older way of doing metatables, which I specifically re-engineered for the speed.
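
For readers unfamiliar with the trade-off, this is roughly what "type erasure behind a virtual function call" looks like. It is a simplified sketch, not the old sol2 implementation: Entity, callable_base, callable_impl and runtime_usertype are hypothetical, and every bound member is assumed to take a single int and return void so the erased interface stays small.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical class being bound.
struct Entity {
    int hp = 10;
    void damage(int n) { hp -= n; }
    void heal(int n) { hp += n; }
};

// Erased interface: every member is reached through one virtual call.
struct callable_base {
    virtual ~callable_base() = default;
    virtual void call(Entity& self, int arg) = 0;
};

// The concrete pointer-to-member type is only known in here.
template <typename Fn>
struct callable_impl : callable_base {
    explicit callable_impl(Fn fn) : fn_(fn) {}
    void call(Entity& self, int arg) override { (self.*fn_)(arg); }
    Fn fn_;
};

// Members can be added one at a time because the container's type
// does not depend on which functions have been registered.
class runtime_usertype {
public:
    template <typename Fn>
    void add_function(const std::string& name, Fn fn) {
        assert(fn != nullptr);
        table_[name] = std::unique_ptr<callable_base>(new callable_impl<Fn>(fn));
    }

    // Returns false when no member with that name was registered.
    bool call(const std::string& name, Entity& self, int arg) {
        auto it = table_.find(name);
        if (it == table_.end()) return false;
        it->second->call(self, arg);  // hash lookup plus a virtual dispatch
        return true;
    }

private:
    std::unordered_map<std::string, std::unique_ptr<callable_base>> table_;
};
```

Each call here pays for a hash lookup and a virtual dispatch, which is the kind of per-call cost the tuple-based design avoids. In exchange, the type of runtime_usertype never changes as members are added.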

@ThePhD

Owner

ThePhD commented Jun 28, 2016

If you do not register any member variables, you may be able to get away with something like we do in the tests here. If you notice on line 753 and below, we pull out the metatable and manipulate things directly. If you like, you could potentially add items directly into the metatable: however, if the __index metamethod is overridden (e.g., when you add variables to the mix), then this technique won't exactly work.

@eliasdaler

Contributor

eliasdaler commented Jun 28, 2016

Tried this out, it worked, awesome! (Yeah, I'm still going to test how it performs with lots of member variables, but a bit later, sorry)

Am I doing this right?

class Test {
public:
    void f() { ... }
    void g() { ... }
private:
};

...

lua.new_usertype<Test>("Test", sol::constructors<sol::types<>>());
sol::table test_table = lua["Test"];
test_table["f"] = &Test::f;
test_table["g"] = &Test::g;
// or just test_table.set("f", &Test::f, "g", &Test::g);
// hopefully it won't generate as much template stuff as new_usertype. :)

But still, I think that runtime overhead when registering stuff is not that important because it happens so quickly no matter what method you use (need to benchmark this to be sure).
I completely understand the complexity of the situation and don't want to demand too much, but that was the only reason at this moment which was stopping me from using sol2 in my main project. (anyway, it looks like using meta-table approach is going to solve my problem).

Maybe it's worth mentioning this technique somewhere in the docs, btw. It's not very safe, but there are some things in sol (like sol::function without error checking) which are not safe and okay to use if you know what you do. :)

P. S. Can't properties be added if I override __index of __index? :s
Something like that:

someTable = {}
prop_mt = { property = 5 }
f_mt = { f = function() print("test") end }
setmetatable(f_mt, { __index = prop_mt})
setmetatable(someTable, { __index = f_mt })
someTable.f()
print(someTable.property)

(or does this suck?)

@ThePhD

Owner

ThePhD commented Jun 29, 2016

The method you speak of doesn't work in Lua or with the Lua C API. In many cases, we need to get a reference to "this" for properties (e.g., non-static member variables). If you put an index of an index, the first argument isn't the userdata (the "this" pointer) anymore, it's the table that lookup failed on (in your example, it would give us f_mt as the "this" object, not someTable).
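
The effect can be modelled without Lua at all. The sketch below is a toy C++ model of a chained __index lookup, not the Lua C API: table, lookup and receiver_of_property_lookup are made-up names, values are plain ints, and 0 stands in for nil. The property handler is modelled as a function-style __index on f_mt, which is the situation a binding is in when it needs a handler to read a member variable from the object.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Toy model of a Lua table together with its metatable's __index.
struct table {
    std::string name;
    std::map<std::string, int> fields;
    table* index_table = nullptr;  // __index set to a table
    std::function<int(table&, const std::string&)> index_fn;  // __index set to a function
};

// Mirrors the lookup order: raw field first, then __index.
int lookup(table& t, const std::string& key) {
    auto it = t.fields.find(key);
    if (it != t.fields.end()) return it->second;
    // A function handler receives the table the lookup failed on.
    if (t.index_fn) return t.index_fn(t, key);
    // A table handler simply repeats the lookup on that table.
    if (t.index_table) return lookup(*t.index_table, key);
    return 0;  // stands in for nil
}

// Builds someTable -> f_mt -> property handler and reports which
// table the handler was given as its first argument.
std::string receiver_of_property_lookup() {
    std::string seen;

    table f_mt;
    f_mt.name = "f_mt";
    f_mt.fields["f"] = 1;
    f_mt.index_fn = [&seen](table& self, const std::string&) {
        seen = self.name;  // the handler's idea of "this"
        return 5;
    };

    table some_table;
    some_table.name = "someTable";
    some_table.index_table = &f_mt;

    assert(lookup(some_table, "f") == 1);  // found directly in f_mt
    int value = lookup(some_table, "property");
    assert(value == 5);
    (void)value;
    return seen;
}
```

In this model receiver_of_property_lookup() returns "f_mt", not "someTable": by the time the handler runs, the original object is no longer the first argument, so there is nothing to read the property from.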

In our previous version, we already took an unacceptable amount of overhead compared to the theoretical maximum obtained by a heavily templated, macro-laden Lua binding called OOLua, as well as compared to luwra. While we have more features than both of those frameworks combined, we were not the fastest in all categories. The recent improvements detailed here helped change that.

@eliasdaler

Contributor

eliasdaler commented Jun 29, 2016

Oh, alright, I guess the situation is a lot more difficult than I thought.
What kinda sucks is that OOLua probably has pretty fast compilation times. Macros suck a lot, but sometimes they're a much faster way to do something that would otherwise take lots of templates (not advocating for templates, ha-ha)

Btw, am I losing out on final performance when I register member functions through metatables, or is this approach alright?

I've tried to test the speed of compilation with new version of sol2, but compiler runs out of available heap space, even though I have 2 GB of RAM available at the time of compilation.

UPD: tried registering class with 78 member functions with the method you've previously shown: it takes around 0.1 - 0.2 ms which is pretty fast. Couldn't do it with sol::table::set though (VS says that generated types are too long, this is probably one of the reasons new_usertype fails).

@ThePhD

Owner

ThePhD commented Jun 29, 2016

You might be taking a performance hit (serialization and deserialization of member function pointers through Lua can either cost nothing or cost a lot: the variance is pretty high), and you're definitely taking a storage size hit (there's registration overhead per-function now, instead of just once for all functions). It probably won't be big enough to matter.

@ThePhD ThePhD closed this in 09ee4db Jun 29, 2016

@ThePhD

Owner

ThePhD commented Jul 7, 2016

Note that we have made a more official version for this @eliasdaler, called "simple_usertype", which can be registered using the function lua.new_simple_usertype( ... );. All variable calls are turned into their function equivalents, among other things. Still working to write the docs for it, but it should get you the quick compile time speeds you want without too much (if any) runtime penalty.

@eliasdaler

Contributor

eliasdaler commented Jul 7, 2016

Thanks so much for this! Now it doesn't feel like a hack over existing system. :D
Will totally use it for heavier types! :)

@eliasdaler

Contributor

eliasdaler commented Jul 8, 2016

Btw, am I doing something wrong here? This doesn't work, Lua says that Test.new is nil

#include <sol.hpp>
#include <iostream>

struct Test {
    Test(int x) : x(x) {}
    void f() { std::cout << "Bark" << std::endl; }
    void g() { std::cout << "Bork" << std::endl; }
    int x;
};

int main()
{
    sol::state lua;
    lua.new_simple_usertype<Test>("Test",
        sol::constructors<sol::types<int>>(),
        "f", &Test::f,
        "g", &Test::g);
    try {
        lua.script("local t=Test.new(42); t:f()");
    }
    catch (sol::error& e) {
        std::cout << e.what() << std::endl;
    }
}
@ThePhD

Owner

ThePhD commented Jul 8, 2016

I herped the derp. Fixing now.

@ThePhD

Owner

ThePhD commented Jul 9, 2016

Should be all fixed now, with some tests to make sure I don't make the same mistake.

@eliasdaler

Contributor

eliasdaler commented Jul 9, 2016

Nice, thanks :)
