Improve compile time #240

Closed
bernhardmgruber opened this issue May 16, 2021 · 1 comment · Fixed by #241 or #246

Many LLAMA constructs depend on compile time iteration of type lists. For very big record dimensions, some constructs become painfully slow.

Consider the following clang timetrace from a LLAMA example using the ROOT HEP framework:
[Image: clang time-trace screenshot]
The first part marked with "ROOT" is parsing ROOT's headers, the part marked with "D" is LLAMA's dumping code. The part between the black bars is parsing LLAMA's headers (~30ms). Then comes a very long part "InstantiateFunction", which comes from this code:

llama::forEachLeaf<Event>([&](auto coord) {
    using Name = llama::GetTag<Event, decltype(coord)>;
    using Type = llama::GetType<Event, decltype(coord)>;
    auto column = ntuple->GetView<Type>(llama::structName<Name>());
    for (std::size_t i = 0; i < n; i++)
        view(i)(coord) = column(i);
});

The big issue here is that the Event record dimension is large (~200 fields). The loop created by forEachLeaf is linear in compilation time and not yet the problem, but view(i)(coord) eventually calls into AoS::blobNrAndOffset, which calls offsetOf<RD, RC>, and the implementation of offsetOf is linear again. Thus, the whole code snippet is quadratic in compilation time.

The situation could be improved a lot by making use of template instantiation memoization. That is, if the value of offsetOf<RD, RC> could somehow depend on the value of offsetOf<RD, RC - 1>, the compiler could reuse memoized previous template instantiations and the code snippet should compile in linear time.
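
To illustrate the idea, here is a minimal sketch, not LLAMA's actual implementation. It assumes an already flat type list and a packed layout without alignment padding; TypeList, ElementAt and PackedOffset are hypothetical names. Each offset is defined in terms of the previous one, so once PackedOffset<List, I - 1> has been instantiated, the compiler can reuse it when asked for PackedOffset<List, I>:

```cpp
#include <cstddef>
#include <tuple>

// Hypothetical flat type list (assumption: not LLAMA's record dimension type).
template <typename... Ts>
struct TypeList {};

// Looks up the I-th type of a TypeList by delegating to std::tuple_element_t.
template <std::size_t I, typename List>
struct ElementAt;

template <std::size_t I, typename... Ts>
struct ElementAt<I, TypeList<Ts...>> {
    using type = std::tuple_element_t<I, std::tuple<Ts...>>;
};

// Packed byte offset of field I (assumption: no alignment padding).
// The value for I is built from the already instantiated value for I - 1.
template <typename List, std::size_t I>
struct PackedOffset {
    static constexpr std::size_t value
        = PackedOffset<List, I - 1>::value + sizeof(typename ElementAt<I - 1, List>::type);
};

// The first field always starts at offset 0.
template <typename List>
struct PackedOffset<List, 0> {
    static constexpr std::size_t value = 0;
};

// Usage: the offset of field 3 is the sum of the sizes of fields 0 to 2.
using Fields = TypeList<double, float, char, int>;
static_assert(PackedOffset<Fields, 0>::value == 0);
static_assert(PackedOffset<Fields, 3>::value == sizeof(double) + sizeof(float) + sizeof(char));
```

Whether this is linear overall also depends on how cheap the element lookup is. The sketch leaves that to std::tuple_element_t, whose compile-time cost varies between standard library implementations.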

The big difficulty is that record coordinates are hierarchical and RC - 1 is a complex operation. A good solution would be to linearize record coordinates before they are passed to mappings, so that mappings deal solely with linear coordinates.
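
A rough sketch of what such a linearization could look like, under simplifying assumptions: Record here is a hypothetical nested type list without tags (unlike llama::Record), and Flatten, LeafCount and FlatIndex are made-up names. Flatten turns the hierarchy into a flat list of leaf types, and FlatIndex maps a hierarchical coordinate to the index of its leaf in that list:

```cpp
#include <cstddef>
#include <type_traits>

template <typename... Ts> struct Record {};   // hypothetical nested record, no tags
template <typename... Ts> struct TypeList {}; // hypothetical flat list of leaf types

template <typename A, typename B> struct Concat;
template <typename... As, typename... Bs>
struct Concat<TypeList<As...>, TypeList<Bs...>> { using type = TypeList<As..., Bs...>; };

// Flatten: a leaf becomes a one-element list, a record concatenates its children.
template <typename T> struct Flatten { using type = TypeList<T>; };
template <> struct Flatten<Record<>> { using type = TypeList<>; };
template <typename Head, typename... Tail>
struct Flatten<Record<Head, Tail...>> {
    using type = typename Concat<typename Flatten<Head>::type,
                                 typename Flatten<Record<Tail...>>::type>::type;
};

// Number of leaves below a node.
template <typename T> struct LeafCount : std::integral_constant<std::size_t, 1> {};
template <typename... Ts>
struct LeafCount<Record<Ts...>>
    : std::integral_constant<std::size_t, (LeafCount<Ts>::value + ... + 0)> {};

// FlatIndex: index of the leaf addressed by a hierarchical coordinate.
template <typename Rec, std::size_t... Coord> struct FlatIndex;
// Empty coordinate: we arrived at the addressed node.
template <typename T> struct FlatIndex<T> : std::integral_constant<std::size_t, 0> {};
// Leading 0: descend into the first child.
template <typename Head, typename... Tail, std::size_t... Rest>
struct FlatIndex<Record<Head, Tail...>, 0, Rest...> : FlatIndex<Head, Rest...> {};
// Leading I > 0: skip all leaves of the first child, continue with the siblings.
template <typename Head, typename... Tail, std::size_t I, std::size_t... Rest>
struct FlatIndex<Record<Head, Tail...>, I, Rest...>
    : std::integral_constant<std::size_t,
          LeafCount<Head>::value + FlatIndex<Record<Tail...>, I - 1, Rest...>::value> {};

// Usage: coordinate (2, 1) addresses the char, which is leaf number 5.
using Event = Record<Record<float, float, float>, int, Record<double, char>>;
static_assert(std::is_same_v<Flatten<Event>::type,
                             TypeList<float, float, float, int, double, char>>);
static_assert(FlatIndex<Event, 2, 1>::value == 5);
```

With something like this, the flattening would be paid once per record dimension, and a mapping would only ever see a plain index, for which "RC - 1" is trivial.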


bernhardmgruber commented May 17, 2021

#241 improved the situation a lot. However, for a llama::Record with over 1000 fields, clang still takes around 2min to compile. This is a lot better, but still annoying.
