Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OVDB-96: CMake Memory Allocators #447

Merged

Conversation

danrbailey
Copy link
Contributor

Add support for Jemalloc and TBB malloc and enable Jemalloc by default.

This seems to build for me on Linux and Windows fine. @jmlait - a quote from your chapter in "Multithreading for Visual Effects":

On platforms where linking to jemalloc is not practical, we resort to tbb::malloc_proxy.

Can you expand on this? Are there any platforms that you'd suggest we make TBB malloc the default? Also curious to know if there is any configuration in which Houdini is shipped without using Jemalloc?

Signed-off-by: Dan Bailey <danbailey@ilm.com>
Signed-off-by: Dan Bailey <danbailey@ilm.com>
Signed-off-by: Dan Bailey <danbailey@ilm.com>
@danrbailey danrbailey requested review from jmlait and Idclip May 28, 2019 18:59
@pcucka
Copy link
Contributor

pcucka commented May 28, 2019

When building for Houdini or Maya will this prefer the version of jemalloc or tbbmalloc that ships with the app? If not, should it?

Signed-off-by: Dan Bailey <danbailey@ilm.com>
@danrbailey
Copy link
Contributor Author

When building for Houdini or Maya will this prefer the version of jemalloc or tbbmalloc that ships with the app? If not, should it?

Yes, it should. I just added a change which sets the Jemalloc library directory path in OpenVDBHoudiniSetup.cmake so it will now prefer to pick it up from the Houdini install. TBB malloc was already being picked up from the Houdini install if USE_TBBMALLOC was set to on.

I also updated Circle CI to only install jemalloc for non-Houdini builds as well, just to be sure it doesn't fall back to the pre-installed version. I don't know about Maya.

CMakeLists.txt Outdated
@@ -152,6 +162,7 @@ mark_as_advanced(
OPENVDB_ENABLE_RPATH
USE_HOUDINI
USE_MAYA
USE_JEMALLOC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USE_TBBMALLOC here as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually, if we decide to set USE_JEMALLOC to ON by default, USE_JEMALLOC probably shouldn't be an advanced option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, mark_as_advanced is just about controlling what shows up when using CMake from the UI? So we should include both jemalloc and tbbmalloc in that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's correct, I'd classify choosing a concurrent allocator as an advanced option, one which we should try to automate or at least choose a good default for (in this case jemalloc). But if jemalloc isn't always available (and can thus cause CMake errors) it probably shouldn't be hidden from the user.

CMakeLists.txt Outdated
Use Jemalloc while building openvdb components. If USE_JEMALLOC is OFF, CMake attempts to query the
located openvdb build configuration to decide on Jemalloc support. You may set this to off to disable Jemalloc.
Although not strictly required, it is strongly recommended to use a concurrent memory allocator.
Note that you must choose between Jemalloc or TBB malloc, or otherwise disable them both.]=] ON)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on by default, is that correct? is jemalloc usually available on mac/centos/windows outside of houdini/maya builds?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, Maya ships with both tbbmalloc and (as of Maya 2017) "awjemalloc", which seems to be jemalloc with renamed symbols. Since we already require TBB, it might be better to make tbbmalloc the default allocator outside of Houdini builds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have found jemalloc to provide a decent performance improvement over tbbmalloc, so perhaps we make jemalloc the default outside of non-Maya builds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Idclip - do you have a good idea how we would do this in the CMake? Make USE_TBBMALLOC default to ON and USE_JEMALLOC default to OFF when USE_MAYA is ON otherwise OFF and ON respectively?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does CMake support enum options? jemalloc and tbbmalloc should be mutually exclusive, so what we probably really want is a CONCURRENT_MALLOC option for which the choices are NONE, JEMALLOC, TBBMALLOC, and APPLICATION, where the latter specifies jemalloc for Houdini builds and tbbmalloc for Maya builds.

Copy link
Contributor

@Idclip Idclip May 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does CMake support enum options?

afaik no - you have to use strings i.e.

if(NOT CMAKE_BUILD_TYPE)
  set(CMAKE_BUILD_TYPE Release CACHE STRING
    "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel." FORCE
  )
endif()

You can have dependent CMake options - we use them a bit:

CMAKE_DEPENDENT_option(USE_JEMALLOC "use jemalloc" ON "USE_MAYA" OFF)
CMAKE_DEPENDENT_option(USE_TBBMALLOC "use tbbmalloc" OFF "USE_MAYA" ON)

https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AUTO I would say the logic should be jemalloc for Houdini builds; tbbmalloc for Maya builds; and for all other builds jemalloc if it is available, otherwise tbbmalloc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that's ideal, but I'm not sure how easy it is to implement the "if it is available" part.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the REQUIRED keyword from the find_package call so that it doesn't error when searching for jemalloc. Then you can use if(TARGET Jemalloc::jemalloc) ... To check to see if the imported target was created/found

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I'll give that a go.

@Idclip
Copy link
Contributor

Idclip commented May 29, 2019

This looks great, thanks @danrbailey. Do we have any more specific advice for choosing a mem allocator with vdb aside from "use a concurrent one" - i.e. none vs je vs tbb? Not part of this PR, but it would be great to run some tests and get some numbers out so we can support the recommendation.

@danrbailey
Copy link
Contributor Author

This looks great, thanks @danrbailey. Do we have any more specific advice for choosing a mem allocator with vdb aside from "use a concurrent one" - i.e. none vs je vs tbb? Not part of this PR, but it would be great to run some tests and get some numbers out so we can support the recommendation.

I happen to have some benchmarks I ran recently. I did a very simple test case where I was comparing a vector of around 100 million or so "512 float" buffers to mimic the data storage in OpenVDB and attempt to isolate allocator performance (run across 24 logical cores):

Allocator Allocate (seconds) Deallocate (seconds)
glibc malloc (serial) 22.6 24.6
glibc malloc (parallel) 90.0 12.9
tbb malloc (serial) 8.33 31.5
tbb malloc (parallel) 2.73 15.4
je malloc (serial) 2.21 1.64
je malloc (parallel) 1.15 0.37

I haven't looked at any tuning, in particular jemalloc offers options to balance memory usage vs performance.

I think what surprised me most was how much the deallocation time varied. I ran an additional test with jemalloc allocating in serial and then deallocating in parallel (related to OVDB-93) and the deallocation time remained the same as the serial one. I'm still trying to understand what causes this. Right now, we always deserialize .vdb files in serial, so this is an interesting aspect to consider regarding how we do IO.

@Idclip
Copy link
Contributor

Idclip commented May 29, 2019

I happen to have some benchmarks I ran recently. I did a very simple test case where I was comparing a vector of around 100 million or so "512 float" buffers to mimic the data storage in OpenVDB and attempt to isolate allocator performance (run across 24 logical cores):

This is really interesting, thanks! Maybe even worth adding to the documentation, though perhaps we should do some more vdb specific testing first. Either way this definitely seems like enough of a reason to continue to default to jemalloc

Signed-off-by: Dan Bailey <danbailey@ilm.com>
set(CONCURRENT_MALLOC "Jemalloc")
endif()
endif()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a warning here if CONCURRENT_MALLOC is something other than "Auto", "Tbbmalloc", "Jemalloc", or "None", and then maybe set it to either "Auto" or "None"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it fallback on Auto.

CMakeLists.txt Outdated
set(CONCURRENT_MALLOC Auto CACHE STRING
"Choose a concurrent malloc library, options are: None Auto Jemalloc Tbbmalloc.
Although not required, it is strongly recommended to use a concurrent memory allocator.
Auto is the default and implies Tbbmalloc for Maya and Jemalloc for non-MAYA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "Auto is the default and implies Jemalloc, unless USE_MAYA is ON or Jemalloc is unavailable, in which case Tbbmalloc is used"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@jmlait
Copy link
Contributor

jmlait commented May 30, 2019

Can you expand on this? Are there any platforms that you'd suggest we make TBB malloc the default? Also curious to know if there is any configuration in which Houdini is shipped without using Jemalloc?

JE Malloc is only used on Linux. Windows uses tbb_malloc. It is very hard to inject into the windows runtime to replace the system allocator. TBB actually has to do some very suspicious binary hacking at the machine code level to replace the base allocator. (And it is a patch that windows updates sometimes breaks...) We've not managed to get such a patch for Windows.

We're also still running jemalloc 3.6.0. I've tried several times to move up to 5.0, 5.1, and 5.2. The problem is that the 5.X series uses lots of address space. I've got nothing against this personally, it is a wise way to get faster performance with no significant cost (other than TLB allocation). But users will freak out and submit bugs if the VIRT column of top goes to 128GB, even if it is only using 20GB of actual physical RAM.

Signed-off-by: Dan Bailey <danbailey@ilm.com>
@danrbailey
Copy link
Contributor Author

I think I've addressed all the feedback now, let me know if there's anything else outstanding in this implementation.

@Idclip
Copy link
Contributor

Idclip commented May 31, 2019

I've just tried to run vdb_test on MacOS against Houdini 17.0.506 and immediately get the following error:
vdb_test(65605,0x1177d65c0) malloc: *** malloc_zone_unregister() failed for 0x7fff95729000
The last address doesn't seem to change.

Seems like a related issue here: facebook/rocksdb#1616 jemalloc/jemalloc#420
and the solution they went with: rust-lang/jemalloc#15

@danrbailey
Copy link
Contributor Author

I've just tried to run vdb_test on MacOS against Houdini 17.0.506 and immediately get the following error:
vdb_test(65605,0x1177d65c0) malloc: *** malloc_zone_unregister() failed for 0x7fff95729000
The last address doesn't seem to change.

Seems like a related issue here: facebook/rocksdb#1616 jemalloc/jemalloc#420
and the solution they went with: rust-lang/jemalloc#15

Thanks for highlighting this - I think the simplest solution is to default to using tbbmalloc on Mac, which would leave just Linux on jemalloc by default. I'll give that a go and verify that fixes this issue.

Signed-off-by: Dan Bailey <danbailey@ilm.com>
@danrbailey
Copy link
Contributor Author

I've pushed a change to use TBB malloc on Mac and that seems to be working correctly now.

CMakeLists.txt Outdated
@@ -417,6 +432,16 @@ if(USE_MAYA)
include(OpenVDBMayaSetup)
endif()

# Configure malloc library. Use Tbbmalloc for Windows or Maya and Jemalloc otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment to include the macos decision with a reference to the problem (as looks like we can actually use jemalloc on mac after a specific version). Also perhaps expand on why windows uses tbbmalloc too for future reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -17,6 +17,8 @@ Improvements:
@vdblink::tree::LeafNode::modifyValueAndActiveState()
LeafNode::modifyValueAndActiveState@endlink now modify voxel values
in place for improved performance.
- Add CMake support for linking against Jemalloc and TBB malloc and enable
Jemalloc by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably add windows/mac notes here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Idclip
Copy link
Contributor

Idclip commented Jun 1, 2019

Re-tested on mac, looks good

Signed-off-by: Dan Bailey <danbailey@ilm.com>
Copy link
Contributor

@Idclip Idclip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Haven't tested on centos, will do that tomorrow, but worked for me fine on macos

@danrbailey danrbailey merged commit 9c0bbd5 into AcademySoftwareFoundation:master Jun 4, 2019
@danrbailey danrbailey deleted the memoryallocator branch June 4, 2019 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants