Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear the data buffer after constructing the Objects in a collection #129

Merged
merged 1 commit into from
Aug 28, 2020

Conversation

tmadlener
Copy link
Collaborator

BEGINRELEASENOTES

  • Reduce memory footprint by clearing intermediately used I/O buffers.

ENDRELEASENOTES

Since the data are copied into the Objs from the m_data buffer that is
filled by I/O there is no use to keep the second copy of the data around
any longer.

It would probably be possible to slightly change the layout of the Obj classes from using a Data data member (as value) to a Data* data member which could then point directly into the I/O buffer of the collections. We would then not need the copy of the data that is currently done in prepareAfterRead. Keeping track of the data would then be a bit more work as it is at the moment, but to the actual user classes this should be completely transparent. However, I am not actually sure how this would impact performance. Both for I/O and for later usage.

In any case this small fix here at least cleans up the I/O buffer after all its contents have been copied into the actual Objs, which should improve the memory footprint. The factor by which it is improved depends on how many different relations a given datatype has.

Since the data are copied into the Objs from the m_data buffer that is
filled by I/O there is no use to keep the second copy of the data around
any longer.
@hegner
Copy link
Collaborator

hegner commented Aug 27, 2020

Hi Thomas!
Thanks. There are two reasons for the copy. One is that you do not want to have that additional level of pointer indirection every time you want to access the data (data locality). And the other one is to keep the I/O buffers separated from the event data. That is important for multithreading as the I/O buffers may be used again while an event is being processed.

@tmadlener
Copy link
Collaborator Author

Hi Benedikt,

One is that you do not want to have that additional level of pointer indirection every time you want to access the data (data locality).

Yes that makes sense. This might also be a bit use case dependent, but it would be interesting to see whether having that additional pointer indirection could be worth the fact that we could then store all the Data blobs contiguously in memory. E.g. for cases where we only want to loop over a given collection and are only interested in the member data but not the relations.

In the current implementation the actually used objects are stored in a std::deque<Obj*> and to me it is not entirely obvious how contiguous the actual Obj are in memory in the end, since they are created on the heap in prepareAfterRead. From a brief look with gdb it seems like they are pretty much contiguous (at least in the tests), but as far as I know there is no guarantee that this always happens and in principle it is possible that the Objs (and the Data therein) are scattered around all over in memory, while only the Obj* in the std::deque are "more or less" contiguously stored. In that sense it could be beneficial to have all the Data blobs contiguous in one buffer.

In the end we would a "realistic" benchmark case (or better yet, multiple ones), to see if the potential (but maybe unlikely?) scattering of the Obj in memory implies a higher cost than an additional pointer indirection.


And the other one is to keep the I/O buffers separated from the event data. That is important for multithreading as the I/O buffers may be used again while an event is being processed.

Good point. However, as far as I can see this is not done in the current ROOTReader implementation, right? That will always allocate a new Collection and then associate the I/O buffer with the corresponding data branch in the input file and then hand off ownership to the EventStore that requested it. On the ROOTWriter side it is true that we currently re-use the I/O buffers for each event, but also that should be manageable in principle, since we could reset the buffer address before writing of each event. From the looks of it, this is what is currently done in K4FWCore.

@hegner
Copy link
Collaborator

hegner commented Aug 28, 2020

Yes, benchmarks would be good.
What concerns the multithreading - indeed that is not existing, but I made the overall layout with a particular design in mind. As I was absent from the project for more than a year so I could be a bit out of date of what is still left.
In any case - we should move that to a dedicated thread.

@hegner hegner merged commit 6a1e506 into AIDASoft:master Aug 28, 2020
@tmadlener tmadlener deleted the clear-io-buffer branch September 7, 2020 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants