Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SONATA compartment report performance improvements #239

Merged
merged 5 commits into from
Aug 24, 2018

Conversation

hernando
Copy link
Contributor

No description provided.

doc/Changelog.md Outdated
- Added two query options to configure chunking and chunk cache size when
opening/creating SONATA compartment reports.
- Added a new overload of brion::CompartmentReport::write to allow
optimizations for full frames writes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full frame(s) writes.

@@ -51,6 +51,11 @@ std::vector<hsize_t> _computeChunkDims(const std::vector<uint32_t>& cellSizes,
float cellsToFramesRatio,
size_t blockSize = _blockSize)
{
if (cellsToFramesRatio == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write 0.f since this is float comparison?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also check for negative input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically use integers and rely on the compiler to do the right thing. But it may be a bad practice on my side.
I've added the check in the parsing code, so this function will never get a negative number. Do you want an assert?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine, you can leave it as is.

@@ -89,6 +98,38 @@ bool _verifyFile(const HighFive::File& file)
return true;
}

size_t _parseCacheSizeOption(const URI& uri)
{
const auto i = uri.findQuery("cache_size");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does i stand for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an iterator to a std::pair<std::string, std::string> pair. I could call it keyValueIter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it, iter or keyValueIter is fine.

else
size = 0;
}
if (size == 0 and pos != value.size())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be enough to only check size == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because size == 0 is a valid input, it's used for disabling the cache (which may be beneficial in some situations)

@@ -116,7 +158,9 @@ CompartmentReportHDF5::CompartmentReportHDF5(
_parseBasicCellInfo();
_subset = false;
}
return;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add an else clause and remove the return above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here, but Stefan favored this coding style and it's used thorough all the library. If I do what you request, Daniel you suggest exactly the opposite.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, lets keep it.

// a full frame. Note that the maximum cache size allowed by this number
// and the actual maximum cache size are independent.
// To avoid collisions reading traces we also need a number of slots s,
// such as blocks[1] ∤ s. At the same time we want blocks[0] ∤ s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write "blocks[1] does not divide by s" to avoid using utf-8 symbols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a concern having utf-8 symbols in 2018?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually avoid it but I might be living in the past. Do as you please :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a comment, so worst case it would look wrong.

" [file://]/path/to/report.(h5|hdf5)";
" "
"[file://]/path/to/"
"report.(h5|hdf5)[?[cache_size=(auto|num_bytes):][cell_to_frames="
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code uses cellS_to_frames

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe try to put spaces differently to fix this horrendous clang formatting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It got formatted that way because I had a single string. If I split it by hand, clang-format won't break it.


const auto& value = i->second;
if (value == "auto")
return std::numeric_limits<size_t>::max();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know too much about the intrinsics here, but this is more like 'max' cache size. Shall it not be depend on the size of the files or number of rows/frames/cells? 'auto' would imply something 'smart' to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something smart, but I found a bit annoying to document it in the description. Auto means allocating space for a whole frame or trace, whichever is bigger. It's not really max, std::numeric_limits<size_t>::max() is used as a magic value to identify this request in the client code. To make it more semantic I've replaced it with a constant.

const double timestamp)
{
if (gids.empty())
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is more like 'false' to me intuitively

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to me. If you ask me to read nothing, I will succeed reading nothing. Why should it be a failure?

{frameCount, _targetMapping.frameSize});
slice.read(buffer);
return true;
// Cheking if the target GIDs are a single slice in the source file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo 'Cheking'


void CompartmentReportHDF5::_parseWriteOptions(const URI& uri)
{
for (auto i = uri.queryBegin(); i != uri.queryEnd(); ++i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto&

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it doesn't matter, but since this is just an iterator, don't you think that adding a reference would make the code slower for a long loop? Even with optimizations I have the feeling that the code would have more pointer chasing with the reference.

const auto& value = i->second;
if (key == "cells_to_frames")
{
if (value == "inf")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not documented that 'inf' is valid value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I've also documented what cache_size=auto does.

void CompartmentReportHDF5::_reopenDataSet(size_t cacheSizeHint)
{
// Getting the chunk dims
auto properties = H5Dget_create_plist(_data->getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HighFive has no support for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because it's a getter and we have started adding support for different properties quite recently. For the few number of lines it is I prefer not messing with HighFive.

Juan Hernando Vieites added 4 commits August 23, 2018 11:44
Chunk cache size can be configured with the query parameter cache_size
Chunk dimensions can be modified to favor traces vs frames using the query
parameter cells_to_frames.
@hernando
Copy link
Contributor Author

Any more comments on this one?

@karjonas
Copy link
Contributor

lgtm

@hernando hernando merged commit de59dc1 into BlueBrain:master Aug 24, 2018
@hernando hernando deleted the h5_report branch August 27, 2018 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants