Many LLAMA constructs depend on compile-time iteration of type lists. For very large record dimensions, some constructs become painfully slow.
Consider the following clang timetrace from a LLAMA example using the ROOT HEP framework:

![image](https://user-images.githubusercontent.com/1224051/118399157-1b86d600-b65c-11eb-8b25-692a9eeea5f8.png)

The first part marked with "ROOT" is parsing ROOT's headers, the part marked with "D" is LLAMA's dumping code. The part between the black bars is parsing LLAMA's headers (~30ms). Then comes a very long "InstantiateFunction" part, which comes from this code:
```cpp
llama::forEachLeaf<Event>([&](auto coord) {
    using Name = llama::GetTag<Event, decltype(coord)>;
    using Type = llama::GetType<Event, decltype(coord)>;
    auto column = ntuple->GetView<Type>(llama::structName<Name>());
    for (std::size_t i = 0; i < n; i++)
        view(i)(coord) = column(i);
});
```
The big issue here is that the `Event` record dimension is large (~200 fields). The loop created by `forEachLeaf` is linear in compilation time and not yet the problem, but `view(i)(coord)` eventually calls into `AoS::blobNrAndOffset`, which calls `offsetOf<RD, RC>`, and the implementation of `offsetOf` is linear again. Thus, the whole code snippet is quadratic in compilation time.
The situation could be improved a lot by making use of template instantiation memoization. That is, if the value of `offsetOf<RD, RC>` could somehow depend on the value of `offsetOf<RD, RC - 1>`, the compiler could reuse memoized previous template instantiations and the code snippet should compile in linear time.
The big difficulty is that record coordinates are hierarchical, so `RC - 1` is a complex operation. A good solution would be to linearize record coordinates before they are passed to mappings, so that mappings deal solely in linear coordinates.
#241 improved the situation a lot. However, for a `llama::Record` with over 1000 fields, clang still takes around 2 minutes to compile. This is a lot better, but still annoying.