Many LLAMA constructs depend on compile-time iteration of type lists. For very large record dimensions, some constructs become painfully slow.
Consider the following clang timetrace from a LLAMA example using the ROOT HEP framework:

![image](https://user-images.githubusercontent.com/1224051/118399157-1b86d600-b65c-11eb-8b25-692a9eeea5f8.png)

The first part marked with "ROOT" is parsing ROOT's headers, the part marked with "D" is LLAMA's dumping code. The part between the black bars is parsing LLAMA's headers (~30ms). Then comes a very long "InstantiateFunction" part, which comes from this code:
```cpp
llama::forEachLeaf<Event>([&](auto coord) {
    using Name = llama::GetTag<Event, decltype(coord)>;
    using Type = llama::GetType<Event, decltype(coord)>;
    auto column = ntuple->GetView<Type>(llama::structName<Name>());
    for (std::size_t i = 0; i < n; i++)
        view(i)(coord) = column(i);
});
```
The big issue here is that the `Event` record dimension is large (~200 fields). The loop created by `forEachLeaf` is linear in compilation time and not yet the problem, but `view(i)(coord)` eventually calls into `AoS::blobNrAndOffset`, which calls `offsetOf<RD, RC>`, and the implementation of `offsetOf` is linear again. Thus, the whole code snippet is quadratic in compilation time.
The situation could be improved a lot by making use of template instantiation memoization. That is, if the value of `offsetOf<RD, RC>` could somehow depend on the value of `offsetOf<RD, RC - 1>`, the compiler could reuse memoized previous template instantiations and the code snippet should compile in linear time.
The big difficulty is that record coordinates are hierarchical, so `RC - 1` is a complex operation. A good solution would be to linearize record coordinates before they are passed to mappings, so that mappings deal solely in linear coordinates.
#241 improved the situation a lot. However, for a `llama::Record` with over 1000 fields, clang still takes around 2 minutes to compile. This is a lot better, but still annoying.