New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OVDB-96: CMake Memory Allocators #447
OVDB-96: CMake Memory Allocators #447
Conversation
Signed-off-by: Dan Bailey <danbailey@ilm.com>
Signed-off-by: Dan Bailey <danbailey@ilm.com>
Signed-off-by: Dan Bailey <danbailey@ilm.com>
When building for Houdini or Maya will this prefer the version of jemalloc or tbbmalloc that ships with the app? If not, should it? |
Signed-off-by: Dan Bailey <danbailey@ilm.com>
Yes, it should. I just added a change which sets the Jemalloc library directory path in OpenVDBHoudiniSetup.cmake so it will now prefer to pick it up from the Houdini install. TBB malloc was already being picked up from the Houdini install if USE_TBBMALLOC was set to on. I also updated Circle CI to only install jemalloc for non-Houdini builds as well, just to be sure it doesn't fall back to the pre-installed version. I don't know about Maya. |
CMakeLists.txt
Outdated
@@ -152,6 +162,7 @@ mark_as_advanced( | |||
OPENVDB_ENABLE_RPATH | |||
USE_HOUDINI | |||
USE_MAYA | |||
USE_JEMALLOC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
USE_TBBMALLOC here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And actually, if we decide to set USE_JEMALLOC to ON by default, USE_JEMALLOC probably shouldn't be an advanced option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, mark_as_advanced is just about controlling what shows up when using CMake from the UI? So we should include both jemalloc and tbbmalloc in that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's correct, I'd classify choosing a concurrent allocator as an advanced option, one which we should try to automate or at least choose a good default for (in this case jemalloc). But if jemalloc isn't always available (and can thus cause CMake errors) it probably shouldn't be hidden from the user.
CMakeLists.txt
Outdated
Use Jemalloc while building openvdb components. If USE_JEMALLOC is OFF, CMake attempts to query the | ||
located openvdb build configuration to decide on Jemalloc support. You may set this to off to disable Jemalloc. | ||
Although not strictly required, it is strongly recommended to use a concurrent memory allocator. | ||
Note that you must choose between Jemalloc or TBB malloc, or otherwise disable them both.]=] ON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is on by default, is that correct? is jemalloc usually available on mac/centos/windows outside of houdini/maya builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, Maya ships with both tbbmalloc and (as of Maya 2017) "awjemalloc", which seems to be jemalloc with renamed symbols. Since we already require TBB, it might be better to make tbbmalloc the default allocator outside of Houdini builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have found jemalloc to provide a decent performance improvement over tbbmalloc, so perhaps we make jemalloc the default outside of non-Maya builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Idclip - do you have a good idea how we would do this in the CMake? Make USE_TBBMALLOC default to ON and USE_JEMALLOC default to OFF when USE_MAYA is ON otherwise OFF and ON respectively?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CMake support enum options? jemalloc and tbbmalloc should be mutually exclusive, so what we probably really want is a CONCURRENT_MALLOC option for which the choices are NONE, JEMALLOC, TBBMALLOC, and APPLICATION, where the latter specifies jemalloc for Houdini builds and tbbmalloc for Maya builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CMake support enum options?
afaik no - you have to use strings i.e.
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release CACHE STRING
"Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel." FORCE
)
endif()
You can have dependent CMake options - we use them a bit:
CMAKE_DEPENDENT_option(USE_JEMALLOC "use jemalloc" ON "USE_MAYA" OFF)
CMAKE_DEPENDENT_option(USE_TBBMALLOC "use tbbmalloc" OFF "USE_MAYA" ON)
https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For AUTO I would say the logic should be jemalloc for Houdini builds; tbbmalloc for Maya builds; and for all other builds jemalloc if it is available, otherwise tbbmalloc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that's ideal, but I'm not sure how easy it is to implement the "if it is available" part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove the REQUIRED keyword from the find_package call so that it doesn't error when searching for jemalloc. Then you can use if(TARGET Jemalloc::jemalloc) ... To check to see if the imported target was created/found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I'll give that a go.
This looks great, thanks @danrbailey. Do we have any more specific advice for choosing a mem allocator with vdb aside from "use a concurrent one" - i.e. none vs je vs tbb? Not part of this PR, but it would be great to run some tests and get some numbers out so we can support the recommendation. |
I happen to have some benchmarks I ran recently. I did a very simple test case where I was comparing a vector of around 100 million or so "512 float" buffers to mimic the data storage in OpenVDB and attempt to isolate allocator performance (run across 24 logical cores):
I haven't looked at any tuning, in particular jemalloc offers options to balance memory usage vs performance. I think what surprised me most was how much the deallocation time varied. I ran an additional test with jemalloc allocating in serial and then deallocating in parallel (related to OVDB-93) and the deallocation time remained the same as the serial one. I'm still trying to understand what causes this. Right now, we always deserialize .vdb files in serial, so this is an interesting aspect to consider regarding how we do IO. |
This is really interesting, thanks! Maybe even worth adding to the documentation, though perhaps we should do some more vdb specific testing first. Either way this definitely seems like enough of a reason to continue to default to jemalloc |
Signed-off-by: Dan Bailey <danbailey@ilm.com>
set(CONCURRENT_MALLOC "Jemalloc") | ||
endif() | ||
endif() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a warning here if CONCURRENT_MALLOC is something other than "Auto", "Tbbmalloc", "Jemalloc", or "None", and then maybe set it to either "Auto" or "None"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it fallback on Auto.
CMakeLists.txt
Outdated
set(CONCURRENT_MALLOC Auto CACHE STRING | ||
"Choose a concurrent malloc library, options are: None Auto Jemalloc Tbbmalloc. | ||
Although not required, it is strongly recommended to use a concurrent memory allocator. | ||
Auto is the default and implies Tbbmalloc for Maya and Jemalloc for non-MAYA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "Auto is the default and implies Jemalloc, unless USE_MAYA is ON or Jemalloc is unavailable, in which case Tbbmalloc is used"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks.
JE Malloc is only used on Linux. Windows uses tbb_malloc. It is very hard to inject into the windows runtime to replace the system allocator. TBB actually has to do some very suspicious binary hacking at the machine code level to replace the base allocator. (And it is a patch that windows updates sometimes breaks...) We've not managed to get such a patch for Windows. We're also still running jemalloc 3.6.0. I've tried several times to move up to 5.0, 5.1, and 5.2. The problem is that the 5.X series uses lots of address space. I've got nothing against this personally, it is a wise way to get faster performance with no significant cost (other than TLB allocation). But users will freak out and submit bugs if the VIRT column of top goes to 128GB, even if it is only using 20GB of actual physical RAM. |
Signed-off-by: Dan Bailey <danbailey@ilm.com>
cf5c4c8
to
c750819
Compare
Signed-off-by: Dan Bailey <danbailey@ilm.com>
I think I've addressed all the feedback now, let me know if there's anything else outstanding in this implementation. |
I've just tried to run Seems like a related issue here: facebook/rocksdb#1616 jemalloc/jemalloc#420 |
Thanks for highlighting this - I think the simplest solution is to default to using tbbmalloc on Mac, which would leave just Linux on jemalloc by default. I'll give that a go and verify that fixes this issue. |
Signed-off-by: Dan Bailey <danbailey@ilm.com>
I've pushed a change to use TBB malloc on Mac and that seems to be working correctly now. |
CMakeLists.txt
Outdated
@@ -417,6 +432,16 @@ if(USE_MAYA) | |||
include(OpenVDBMayaSetup) | |||
endif() | |||
|
|||
# Configure malloc library. Use Tbbmalloc for Windows or Maya and Jemalloc otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this comment to include the macos decision with a reference to the problem (as looks like we can actually use jemalloc on mac after a specific version). Also perhaps expand on why windows uses tbbmalloc too for future reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
openvdb/doc/changes.txt
Outdated
@@ -17,6 +17,8 @@ Improvements: | |||
@vdblink::tree::LeafNode::modifyValueAndActiveState() | |||
LeafNode::modifyValueAndActiveState@endlink now modify voxel values | |||
in place for improved performance. | |||
- Add CMake support for linking against Jemalloc and TBB malloc and enable | |||
Jemalloc by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably add windows/mac notes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Re-tested on mac, looks good |
Signed-off-by: Dan Bailey <danbailey@ilm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Haven't tested on centos, will do that tomorrow, but worked for me fine on macos
Add support for Jemalloc and TBB malloc and enable Jemalloc by default.
This seems to build for me on Linux and Windows fine. @jmlait - a quote from your chapter in "Multithreading for Visual Effects":
Can you expand on this? Are there any platforms that you'd suggest we make TBB malloc the default? Also curious to know if there is any configuration in which Houdini is shipped without using Jemalloc?