Performance improvements in compartment report mapping handling. by hernando · Pull Request #156 · BlueBrain/Brion

hernando · 2017-05-29T12:39:19Z

No description provided.

dnachbaur · 2017-05-29T14:39:36Z

+                                  report->getGIDs().end());
+// With 8 threads there's some speed up, but very small. We stick with 6
+// to avoid using more than 1 socket.
+#pragma omp parallel for num_threads(6)


What happens if, for some reason, you don't have 6 threads? Does OpenMP use fewer automatically, or does it overcommit here? I guess this observation or benchmark is valid for one specific machine of ours. At least I would mention that.

In the end it's a memory-bound operation (I don't see any computation), hence the speed-up is quite small indeed. I guess the change to indices.resize is probably enough, unless VTune really says threading makes sense here :)

According to MSDN, num_threads is the maximum number of threads that will execute the parallel region, unless dynamic adjustment is disabled, in which case the result of overcommitting is implementation-defined (it's even allowed to abort :(). This clause also overrides the effect of OMP_NUM_THREADS.
One thing I've observed for sure is that, due to the NUMA architecture, going to another socket becomes slower (even by more than 2x) and unpredictable.

I didn't use VTune because I did it remotely and I haven't spent time learning how to use the command line tools, but now that you're an "expert", we can look into it together tomorrow if you want. The only profiling I did was measuring the time of the whole operation and "sampling" with the debugger.
Without threading it takes about 3x longer.
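A minimal sketch of one way to avoid hard-coding a thread count that a machine may not have, assuming the cap of 6 from the diff above. The helper names clampThreads and copyOffsets are hypothetical and not part of Brion; the loop body is only a stand-in for a memory-bound copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: never request more threads than the OpenMP runtime
// would use by default, and never fewer than one.
inline int clampThreads(const int requested)
{
#ifdef _OPENMP
    const int available = omp_get_max_threads();
#else
    const int available = 1; // built without OpenMP: the loop runs serially
#endif
    return std::max(1, std::min(requested, available));
}

// Memory-bound stand-in for the real loop: copy one value per cell.
inline std::vector<uint64_t> copyOffsets(const std::vector<uint64_t>& in)
{
    std::vector<uint64_t> out(in.size());
    const int threads = clampThreads(6);
    (void)threads; // avoids an unused-variable warning without OpenMP
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(in.size());
#pragma omp parallel for num_threads(threads)
    for (std::ptrdiff_t i = 0; i < size; ++i)
        out[i] = in[i];
    return out;
}
```

Each iteration writes a distinct element of out, so no synchronization is needed. Whether 6 is the right cap still depends on the socket layout of the machine being measured.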

arsenius7 · 2017-05-30T05:52:27Z

@@ -734,8 +744,6 @@ bool CompartmentReportBinary::_parseMapping()
            sectionOffsets[k.first] = k.second.first;


const auto& and better name for 'k'?

arsenius7 · 2017-05-30T05:52:57Z

+        std::sort(sectionsMapping.begin(), sectionsMapping.end());

        // now convert the maps into the desired mapping format
        uint64_ts& sectionOffsets = offsetMapping[idx];


why do you prefer explicit type over & here?

I didn't touch this code.

arsenius7 · 2017-05-30T05:55:37Z


                if (previous != LB_UNDEFINED_UINT16)
-                    sectionsMapping[previous].second = count;
+                    sectionsMapping[sectionsMapping.size() - 2].second.second =


what happens for 1-section reports? ;)

Then this code is never executed. This is already the case of soma reports and the tests pass :)
Note that this line would only be executed if previous != undefined and current != previous. In that case the push back in line 722 has already been executed twice. You could argue that 730 is actually wrong if the cell has no compartments at all, but if that were the case, it would mean there's something really broken somewhere else.
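To make the single-section case concrete, here is a simplified sketch of grouping consecutive compartments by section ID into the vector type from the diff. It is not the PR's code: it increments the count of the last entry instead of patching the entry at size() - 2, and groupBySection and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// < sectionID, < frameIndex, numCompartments > >
using SectionMapping =
    std::vector<std::pair<uint16_t, std::pair<uint64_t, uint16_t>>>;

// `sections` holds one section ID per compartment of a single cell;
// `firstFrame` is the frame index of the cell's first compartment.
inline SectionMapping groupBySection(const std::vector<uint16_t>& sections,
                                     const uint64_t firstFrame)
{
    SectionMapping mapping;
    for (std::size_t j = 0; j < sections.size(); ++j)
    {
        // Start a new entry whenever the section ID changes.
        if (mapping.empty() || mapping.back().first != sections[j])
            mapping.emplace_back(
                sections[j], std::pair<uint64_t, uint16_t>(firstFrame + j, 0));
        // Count this compartment in the current (last) entry.
        ++mapping.back().second.second;
    }
    return mapping;
}
```

With this formulation a one-section cell yields exactly one entry and a cell without compartments yields an empty vector, so no index arithmetic on the vector size is involved.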

arsenius7 · 2017-05-30T05:57:46Z

@@ -706,23 +716,23 @@ bool CompartmentReportBinary::_parseMapping()
                const uint64_t frameIndex =
                    j + ((info.dataOffset - _header.dataBlockOffset) /
                         sizeof(float));


is sizeof(float) machine-dependent?

It is, but since all the machines we use are IEEE 754, I guess it's impossible to find something different from 4. In any case, I prefer keeping sizeof(float) for consistency with the rest of the code style.

You do realize, of course, that using sizeof(float) for dealing with locally allocated memory and for something created externally is a different story.
If the spec says it's 4 bytes for measurement in the report I would rather see the constant '4'.
But I'm not an expert on machine-dependent float type size to insist on that.

That's true
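One possible middle ground, sketched under the assumption that the file format stores each value in 4 bytes (the thread does not quote the spec): name the on-disk size as a constant and let the compiler verify that float matches it. REPORT_VALUE_SIZE and byteOffsetToFrameIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumed on-disk size of one report value, in bytes.
constexpr std::size_t REPORT_VALUE_SIZE = 4;

// Fail at compile time on a platform where float does not match the file.
static_assert(sizeof(float) == REPORT_VALUE_SIZE,
              "float does not have the size assumed for report values");
static_assert(std::numeric_limits<float>::is_iec559,
              "float is not an IEEE 754 type");

// Convert a byte offset relative to the data block into a value index.
inline uint64_t byteOffsetToFrameIndex(const uint64_t dataOffset,
                                       const uint64_t dataBlockOffset)
{
    return (dataOffset - dataBlockOffset) / REPORT_VALUE_SIZE;
}
```

This keeps the external size explicit while still allowing buffers to be read directly into float.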

arsenius7 · 2017-05-30T06:01:38Z

+        perCellOffsets[idx] = info.accumCompartments;

        for (int32_t j = 0; j < info.numCompartments; ++j)
        {


static_cast for current = value?
Why is it downcasted to uint16 from float?

I think uint16_t came first, inherited from the first code used to parse h5 reports. Why the binary report uses floats as the datatype for the mapping is a mystery to me. This can be changed if needed, but not in this PR.

I think the float is used so that "index" is of the same size as "frame" and thus is read with same code (although I do not justify this).
The thing is, by downcasting to uint16 you are risking an overflow, as you mentioned before.
But if the "original" type is float, why restricting to uint16?

Maybe using the same datatype makes sense in the reporting lib... If you're really concerned about it, please create a ticket so we don't forget. I'll add an assert for the moment, because changing uint16_t to uint32_t impacts the public API and the wrapping, not just this file.
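A sketch of a range-checked conversion from the float stored in the file to uint16_t. The thread settles on an assert; this variant throws instead, purely for illustration, and toSectionID is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Convert a section ID stored as float into uint16_t, rejecting values that
// are negative, too large, fractional, or NaN.
inline uint16_t toSectionID(const float value)
{
    const float max = std::numeric_limits<uint16_t>::max();
    // The negated comparison also rejects NaN, which compares false.
    if (!(value >= 0.f && value <= max) || std::floor(value) != value)
        throw std::runtime_error("Section ID does not fit in uint16_t");
    return static_cast<uint16_t>(value);
}
```

Every integer up to 65535 is exactly representable as a float, so the comparison against max is exact.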

arsenius7 · 2017-05-30T06:03:08Z

@@ -706,23 +716,23 @@ bool CompartmentReportBinary::_parseMapping()
                const uint64_t frameIndex =
                    j + ((info.dataOffset - _header.dataBlockOffset) /


I would probably add an assert that info.dataOffset >= _header.dataBlockOffset, unless it is guaranteed somewhere else.

It isn't. An assert will still crash in release builds, don't you prefer an exception?

logically, it's an assert since it should not happen, right?

Yes, my point is that an exception is better to avoid crashing the Python interpreter, and maybe you prefer that.

As long as there is a paranoid check that does not affect the performance, I have no preference here.
If you think logic_error is better, it's fine with me.

I added a check a few lines up to make this function return an error. That will throw a runtime_error. It's just an if that is not in the heaviest loop, and I haven't noticed any significant difference in performance.
In my opinion this is not a logic_error because it's not the users' fault (as opposed to writing e.g. sqrt(-1)), it could be due to a corrupted file.
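The reason the check matters is that both offsets are unsigned, so a smaller dataOffset would wrap around to a huge value rather than go negative. A minimal sketch follows; relativeOffset is a hypothetical helper, and it throws directly, whereas the PR makes the parsing function return an error that is turned into a runtime_error elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Offset of a cell's data relative to the start of the data block.
inline uint64_t relativeOffset(const uint64_t dataOffset,
                               const uint64_t dataBlockOffset)
{
    // Without this check the unsigned subtraction would silently wrap.
    if (dataOffset < dataBlockOffset)
        throw std::runtime_error(
            "Bad offset in report mapping, the file may be corrupted");
    return dataOffset - dataBlockOffset;
}
```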

arsenius7 · 2017-05-30T06:04:04Z

+#pragma omp parallel for num_threads(6)
+    for (size_t idx = 0; idx < cells.size(); ++idx)
    {
+        CellInfo& info = cells[idx];


arsenius7 · 2017-05-30T06:07:00Z


        // < sectionID, < frameIndex, numCompartments > >
-        typedef std::map<uint16_t, std::pair<uint64_t, uint16_t>>
+        typedef std::vector<std::pair<uint16_t, std::pair<uint64_t, uint16_t>>>


is it really worth a typedef here?

arsenius7 · 2017-05-30T06:11:10Z

@@ -653,6 +663,10 @@ bool CompartmentReportBinary::_parseMapping()

        if (_header.byteswap)


just out of curiosity: is there a way to combine get<> with byteswap if needed?
Could help to avoid funny errors if new get<>s are added for some reason.

Maybe, but get is a free function and whether bytes need swapping is a member variable inside the header, so the change you request requires either making get a member or adding an argument, and neither of them is a nice change.
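For reference, the "add an argument" option could look roughly like the sketch below. The signature of the real get in Brion is not shown in this thread, so the buffer type, the offset parameter, and the byteswap flag are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read a trivially copyable T at `offset` in `buffer`, optionally reversing
// its byte order. Hypothetical signature, for illustration only.
template <typename T>
T get(const uint8_t* buffer, const std::size_t offset, const bool byteswap)
{
    T value;
    std::memcpy(&value, buffer + offset, sizeof(T)); // safe for unaligned data
    if (byteswap)
    {
        unsigned char* bytes = reinterpret_cast<unsigned char*>(&value);
        std::reverse(bytes, bytes + sizeof(T));
    }
    return value;
}
```

The trade-off named above remains: every call site has to pass the flag, which is exactly the extra argument the author would rather avoid.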

arsenius7 · 2017-05-30T06:13:39Z

+    indices.resize(indicesCount);

-    for (auto gid : report->getGIDs())
+    std::vector<uint32_t> gidList(report->getGIDs().begin(),


why this copy is needed and why is it not const? does it have anything to do with threading?

It could be const, but the copy is needed because the OpenMP loop needs a random-access container and report->getGIDs() returns a std::set.
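A small sketch of that pattern: copy the std::set into a const std::vector so the parallel loop can index it. The function doubleGIDs and its loop body are placeholders, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Placeholder work: produce one output value per GID, in sorted GID order.
inline std::vector<uint32_t> doubleGIDs(const std::set<uint32_t>& gids)
{
    // std::set only offers bidirectional iterators, so copy it into a
    // random-access container that an integer loop index can address.
    const std::vector<uint32_t> gidList(gids.begin(), gids.end());
    std::vector<uint32_t> result(gidList.size());
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(gidList.size());
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < size; ++i)
        result[i] = 2 * gidList[i]; // each iteration writes its own slot
    return result;
}
```

The copy costs one pass over the set, which is usually small next to the per-GID work it enables in parallel.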

dnachbaur · 2017-06-02T13:53:15Z

+                                        report->getGIDs().end());
+// With 8 threads there's some speed up, but very small. We stick with 6
+// to avoid using more than 1 socket.
+#pragma omp parallel for num_threads(6)


this shows no effect for me (and vtune)

...so I suggest to remove it

dnachbaur · 2017-06-02T13:53:43Z

-    for (const CellInfo& info : cells)
+// With 8 threads there's some speed up, but very small. We stick with 6
+// to avoid using more than 1 socket.
+#pragma omp parallel for num_threads(6)


Add schedule(dynamic). It goes from 800 ms to 600 ms for me.
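A sketch of why schedule(dynamic) can help when the work per cell is uneven: iterations are handed out as threads become free instead of being split into equal ranges up front. The function sumPerCell is a hypothetical stand-in for the per-cell mapping work, and the gain on any given machine has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum the values of each cell; cells may differ a lot in size.
inline std::vector<uint64_t> sumPerCell(
    const std::vector<std::vector<uint32_t>>& cells)
{
    std::vector<uint64_t> sums(cells.size(), 0);
    const std::ptrdiff_t size = static_cast<std::ptrdiff_t>(cells.size());
#pragma omp parallel for schedule(dynamic)
    for (std::ptrdiff_t i = 0; i < size; ++i)
    {
        uint64_t sum = 0;
        for (const uint32_t value : cells[i])
            sum += value;
        sums[i] = sum; // distinct slot per iteration, no race
    }
    return sums;
}
```

Dynamic scheduling adds some dispatch overhead per iteration, so it tends to pay off only when iterations are long or imbalanced.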

hernando · 2017-06-06T10:09:22Z

Retest this please

hernando requested review from arsenius7, dnachbaur and mgeplf May 29, 2017 12:39

dnachbaur reviewed May 29, 2017

arsenius7 reviewed May 30, 2017

Performance improvements in compartment report mapping handling.

c24c32b

hernando force-pushed the mapping_perf branch from fc95c5b to 5433044 Compare June 1, 2017 15:59

hernando pushed a commit that referenced this pull request Jun 1, 2017

CR #156

5433044

hernando force-pushed the mapping_perf branch from 5433044 to 4e4e730 Compare June 1, 2017 16:22

hernando pushed a commit that referenced this pull request Jun 1, 2017

CR #156

4e4e730

dnachbaur approved these changes Jun 2, 2017

CR #156

801b9f2

hernando force-pushed the mapping_perf branch from 4e4e730 to 801b9f2 Compare June 2, 2017 14:20

hernando merged commit b5d0fd7 into master Jun 6, 2017

hernando deleted the mapping_perf branch June 6, 2017 11:09

