API for optimized access order by 1uc · Pull Request #453 · BlueBrain/MorphIO

1uc · 2023-05-22T08:06:40Z

This MR proposes an API which will enable us to optimize morphology loading for certain reorderable loops. The targets are loops which know upfront the names of the morphologies they need to load, and don't require them to be loaded in a particular order.

The types of optimization this unlocks are:

Reordering the loop to reduce large strides in the file.
Loading morphologies in (small) batches.

This commit introduces two helper classes that reduce the verbosity of `std::enable_if<...>` constructs.

matz-e

Looks good to me. load_unordered seems a fitting name. I guess something like load_optimized or just bulk_load would obscure the biggest caveat (the list is not loaded in the order specified).

1uc · 2023-05-23T11:30:01Z

Back to draft, because the internals can be simplified.

The point of this API is to enable collections to load morphologies in any order they see fit. For HDF5 containers, tests have shown that the access pattern matters a lot in terms of performance. The idea is a generic API that allows optimizing the iteration order for reorderable loops, e.g.,

    for k, morph_name in enumerate(morphology_names):
        morph = collection.load(morph_name)
        f(k, morph)

can be replaced with

    for k, morph in collection.load_unordered(morphology_names):
        assert collection.load(morphology_names[k]) == morph
        f(k, morph)

This commit only adds the minimum required for this to work. In particular, it doesn't optimize the access pattern.
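The contract described above can be illustrated with a small, self-contained sketch. It is not MorphIO's implementation: `ToyCollection` and its per-name file offsets are hypothetical stand-ins, and sorting by offset is only an assumed example of an "order the collection sees fit". The point is that each result is paired with the original index `k`, so the caller can still relate it to `morphology_names[k]`.

```python
from typing import Dict, Iterator, List, Tuple


class ToyCollection:
    """Hypothetical stand-in for a morphology container (not MorphIO's API)."""

    def __init__(self, offsets: Dict[str, int]) -> None:
        # Assumption: each name maps to a position in a single backing file.
        self._offsets = offsets

    def load(self, name: str) -> Tuple[str, int]:
        # A real loader would read from disk; the toy returns a plain tuple.
        return (name, self._offsets[name])

    def load_unordered(
        self, names: List[str]
    ) -> Iterator[Tuple[int, Tuple[str, int]]]:
        # Visit names by ascending file offset, yielding the ORIGINAL index k
        # so that callers can still relate each result to names[k].
        order = sorted(range(len(names)), key=lambda k: self._offsets[names[k]])
        for k in order:
            yield k, self.load(names[k])


collection = ToyCollection({"a": 300, "b": 100, "c": 200})
morphology_names = ["a", "b", "c"]

results = {}
for k, morph in collection.load_unordered(morphology_names):
    # Same invariant as in the commit message above.
    assert collection.load(morphology_names[k]) == morph
    results[k] = morph

# The order in which the original indices were visited.
visit_order = [k for k, _ in collection.load_unordered(morphology_names)]
```

In this toy setup every requested morphology is visited exactly once, just not in the order given by `morphology_names`.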

This commit implements the optimized access order for merged containers.

1uc · 2023-05-23T13:23:22Z

This PR now also includes an optimized version for merged containers.

mgeplf · 2023-05-24T12:11:43Z

+        template <class U = M>
+        typename enable_if_mutable<U, std::pair<size_t, M>>::type operator*() const;
+
+        void operator++() const;


prob. not important, but I tend to like the symmetry of having operator++(int)

Okay, then we'd probably better also change the return type to const Iterator& and Iterator (or whatever the proper return types are), otherwise the distinction makes little sense.

We've had to redo the LoadUnordered<M>::Iterator; it had multiple bugs related to unintended shallow copying. Tests have been added to check for this type of issue.

This also restructures the file so that we declare in the header and, in the source file, first define and then explicitly instantiate.

1uc · 2023-05-30T13:49:29Z

Should we allow random access? This would matter in loops which look like this:

std::vector<std::string> morphology_names = ...;
auto load_unordered = collection.load_unordered(morphology_names);

// Add some mechanism which doesn't guarantee the order in which
// chunks are accessed. For example
#pragma omp parallel for
for(size_t chunk = 0; chunk < n_chunks; ++chunk) {
    size_t offset = chunk*chunk_size;
    process_chunk(chunk_size, load_unordered.begin()+offset);
}

void process_chunk(size_t chunk_size, LoadUnordered<Morphology>::Iterator it) {
    for(size_t i = 0; i < chunk_size; ++i) {
        auto [k, morph] = *(it++);
        // computations  
    }
}

But technically the above doesn't traverse the morphologies in the optimal order. However, if we make the iterator shared, then we have more thread-safety concerns.

What would be more useful in TD would be access to the internal argsort:

auto morphology_names = ...;
auto collection = ...;

auto access_order = morphio::argsort(collection, morphology_names);
for(size_t i : access_order) {
    auto morph_name = morphology_names[i];
    process(morph_name);
}

matz-e · 2023-05-30T14:12:34Z

Seems reasonable? Although, for "user-friendliness", maybe just sorting the morphology names without a round trip through indices on the user side would be nicer?

1uc · 2023-05-30T14:31:26Z

Sorting the morphology names directly loses valuable information, e.g. it prevents one from looking up auxiliary information if things are stored in vectors of equal size, with the convention that index i can be used to retrieve data for that morphology, e.g. morphology_names[i], metadata[i], etc.

More concretely, in TD we store stuff in a big std::vector<MetaData>. We then need to fish stuff out of it by passing the indices of the morphologies we want to load; that code grabs the morphology name and uses (a wrapper of) morphio::Collection::load to load the morphology. I don't see us injecting the iterator into that setup easily, because the step of selecting (which requires the optimized order) and the step of loading the morphology are separated by several function calls in TD, whereas with the iterator approach they always happen together. Note that knowing only the order of the names of the morphologies to be loaded is insufficient (or inefficient) in TD.
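To make the argument about parallel arrays concrete, here is a minimal sketch. The helper `argsort_by_offset` and the `offsets` mapping are hypothetical and are not the `morphio::argsort` proposed above; they only illustrate why returning indices keeps `metadata[i]` reachable, while returning sorted names does not.

```python
from typing import Dict, List


def argsort_by_offset(offsets: Dict[str, int], names: List[str]) -> List[int]:
    """Hypothetical helper: indices into `names`, ordered by file offset."""
    return sorted(range(len(names)), key=lambda i: offsets[names[i]])


# Assumed layout: name -> offset in the container file.
offsets = {"m0": 40, "m1": 10, "m2": 30}
morphology_names = ["m0", "m1", "m2"]
metadata = ["meta-0", "meta-1", "meta-2"]  # parallel to morphology_names

# Indices in access order; each index is valid for every parallel array.
access_order = argsort_by_offset(offsets, morphology_names)
visited = [(morphology_names[i], metadata[i]) for i in access_order]

# Sorting the names directly gives the same visiting order, but the index
# needed to look up metadata[i] is no longer available.
sorted_names = sorted(morphology_names, key=offsets.__getitem__)
```

With only `sorted_names`, recovering the metadata would need an extra name-to-index lookup, which is the inefficiency mentioned above.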

This commit allows users to perform an argsort of the morphology names they want to load. This can be useful in situations where it's hard to pass the iterator, e.g. through multiple layers of function calls.

1uc added 3 commits May 22, 2023 10:36

Copy-ctor takes a const-reference.

155a9dc

Add enable_if_{im,}mutable.

f775ea7

Refactor unit-tests to use parametrize.

b269f97

1uc force-pushed the 1uc/optimized-access-order-api branch from 280d929 to 83bcfb1 Compare May 22, 2023 08:37

1uc marked this pull request as ready for review May 22, 2023 09:43

matz-e reviewed May 22, 2023

1uc marked this pull request as draft May 23, 2023 11:29

1uc added 2 commits May 23, 2023 15:19

Implement optimized access order.

1891e88

1uc force-pushed the 1uc/optimized-access-order-api branch from ab7c787 to 1891e88 Compare May 23, 2023 13:19

1uc marked this pull request as ready for review May 23, 2023 14:11

mgeplf reviewed May 24, 2023

1uc added 2 commits May 24, 2023 17:27

Fix iterators.

1958110

Appease missing definition errors.

325b365

1uc force-pushed the 1uc/optimized-access-order-api branch 3 times, most recently from 527fc9d to d2cd472 Compare May 31, 2023 09:11

Access to the internal argsort of access order.

83b91f3

1uc force-pushed the 1uc/optimized-access-order-api branch from d2cd472 to 83b91f3 Compare May 31, 2023 09:30

Mark 'experimental' features as such.

89a76ea

1uc force-pushed the 1uc/optimized-access-order-api branch from 78def01 to 89a76ea Compare May 31, 2023 11:17

mgeplf approved these changes May 31, 2023

mgeplf merged commit d4b026e into master May 31, 2023

mgeplf deleted the 1uc/optimized-access-order-api branch May 31, 2023 12:24

mgeplf merged 9 commits into master from 1uc/optimized-access-order-api
