Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triangulated cached conversions for IECoreGL::MeshPrimitive #9

Merged
merged 4 commits into from
Jul 10, 2013

Conversation

ldmoser
Copy link

@ldmoser ldmoser commented Jun 19, 2013

So this solution tries to solve two issues:

A) Our profiles indicate that there's lots of computation going on to the triangulation of meshes when rendering them with IECoreGL. We would like to take advantage of repetitive geometry and cache intermediate results (vertex triangulation, triangulated primVars, generation of normals) as opposed to only get the cache benefits for the entire Mesh (animated geo does not match and have to be reconverted entirely).

B) We envision developing ways to update the GL scene, using a scene edits mechanism. So ideally we would want to convert the IECore primitive to IECoreGL once, and then, replace the whole primitive OR update only a subset of the primitive variables (the ones that were affected), by providing the original non-triangulated prim vars.

I decided to focus the solution on Meshes, since that's the most common geometry in a SceneCache and leave other types of primitives unmodified for now, but the same ideas could be applied for PointsPrimitive and CurvesPrimitive.

Solution adopted: (1) expand the CachedConverter to accept custom converter objects (callable objects with a hash function). Then, (2) adding an additional constructor for IECoreGL::MeshPrimitive where it receives the original non-triangulated topology. And lastly, (3) create converters for each one of the basic operations that happens when converting a Mesh to IECoreGL (including triangulation).

I have some open questions for this approach:

  • The concept of the CachedConverter seems pretty useful for other situations. Should we instead create a IECore::CachedResults where it always expect a callable object to be passed (with a hash function) that does the operation? So it could be used on general IECore Ops, such as the Normals, Triangulate, etc. I think the only reason why not, would be that the IECoreGL::CachedConverter cache must be cleared at safe threads.
  • The triangulation algorithms are now in IECoreGL::MeshPrimitive. But I left some TODOs to consider unifying the code with the triangulate op in IECore.
  • The computation of normals is currently done in the conversion. But the issue (B) would require it to be in the addPrimitiveVariable() every time that "P" is redefined.
  • The computation of "st" is currently done in the ToGLMeshConverter (as before). I guess that we would want to support in Cortex V2f UV primVars and deprecate that code eventually.

Lucio Moser added 3 commits June 19, 2013 10:14
… converter object that also is responsible for giving a unique hash.
…he original non-triangulated topology and then, addPrimitiveVariable will triangulate any primvars added. This will be useful for updating scenes in the future.

Also using the CachedConverter for all triangulations and normal computation.
@andrewkaufman
Copy link
Member

John, did you have a chance to look over this one?

{
IECore::MurmurHash h;
h.append( "TriangulatedVertexPrimVar");
h.append( m_faceVaryingLength );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to tweak this to get it to build on mac :

h.append( (uint64_t)m_faceVaryingLength );

@johnhaddon
Copy link
Member

This looks grand to me, aside from the few very minor comments I've made inline.

I quite like the idea of having a generalised cached results mechanism in IECore, but I think for now it'd be preferable to go with the implementation as-is and worry about generalising later. The tricky part of the implementation (the LRUCache) is already shared among the different caching mechanisms we already have, so I'm not sure how much benefit we'd get from generalising further. Each candidate for the generalised form (ops, readers, etc) also seem like they might require a slightly different interface (searchpaths for the CachedReader, delayed destruction and contents for the GL converter) so I think generalising further might be more trouble than it's worth.

Do you have any performance figures you could share for this Lucio? From what I recall in our discussions before it was quite significant, and I've certainly seen profiles where triangulation and facevarying conversion was the primary bottleneck, so it might be nice to give people an idea of what benefit they can expect to see here.

I'm also curious to know if the bound computation in addPrimitiveVariable() is now showing up in profiles at all - might that also be a candidate for caching?

…is not present and clearing the object in the CachedConverter to prevent misuse.
@ldmoser
Copy link
Author

ldmoser commented Jul 10, 2013

Thanks for the review John! I believe I've addressed all of your notes.

I will try to get some data about performance, currently I don't have anything conclusive for this particular implementation, mainly because I was already testing the next round of changes, which would be about only loading the animated primVars as opposed to the whole geometry...

andrewkaufman added a commit that referenced this pull request Jul 10, 2013
Triangulated cached conversions for IECoreGL::MeshPrimitive
@andrewkaufman andrewkaufman merged commit 50f0822 into ImageEngine:master Jul 10, 2013
@ldmoser
Copy link
Author

ldmoser commented Jul 15, 2013

I'm having second thoughts about this pull request.

After profiling a bit and testing, I've realized some things:

  1. Now each Mesh that is converted is reporting twice the memory it actually consumes in the cache, because each primVar is stored there, plus the whole Mesh is also stored (due to the original cachedConverter mechanism used in the Renderer). As a result, I had to set the IECOREGL_CACHEDCONVERTER_MEMORY higher to prevent cache cleanups while changing the frames in an animation and take advantage of the intermediate cached results.

  2. The Normals converter is making the reported memory usage even worse because it passes the whole Mesh as the input object to convert and the CachedConverter takes it's memoryUsage value instead of the memory usage from the object that is stored in the cache. I have a local fix which is inlined with the other converters (it's a low level one that operated directly with "P" data), but the cost is still a bit wrong since it related to the input object, not the object that is stored in the cache.

  3. When turning off the original cache mechanism ( by setting automaticInstancing = false ), and using a huge cache size, then switching between two frames a couple times (to add everything to cache), I haven't managed to get same timings. There's a certain cost on reusing intermediate results and I'm not sure we have seen a scenario where it justifies the overhead.

What I have been thinking is on turning off the intermediate cache for now (by calling the custom converter functions directly, without the use of CachedConverter), until we are sure they are necessary. But, keep the ability to call the addPrimitiveVariable passing the non-triangulated primvars.

Thoughts?

@johnhaddon
Copy link
Member

I'd still like to see some hard performance numbers for this change - it's all about optimisation so without numbers it's very hard to have a meaningful discussion. How about the time for a single run through of a sequence of a character-sized deforming quad mesh with changing P and with IECOREGL_CACHEDCONVERTER_MEMORY set high enough that cache removals are a non-issue? By making sure the cache is cleared before each run we should negate any benefits of automaticInstancing and get some idea of the benefits of this change alone. I've definitely seen profiles where the triangulation and conversion of the entire mesh when only a primvar was changing was the major overhead, so I'd be hopeful that the numbers would be good.

Unless we do have compelling numbers (either for this case or for a use case using addPrimitiveVariable()) my preference would be to back out of the changes. They introduce a fair bit of additional complexity and duplicated code over the original which is absolutely fine if they bring benefits, but not really worth it if they don't.

Assuming we want to keep the changes, perhaps we can address the memory measurement problem properly, as it's really not a problem with your changes but with the existing mechanism. The CachedConverter was introduced mid-major-version so the current memory usage estimate is the closest we could get at the time without introducing incompatible changes. Perhaps we could introduce a base class shared by all the GL types (rather than just using RunTimeTyped as we do now), and that could provide a memoryUsage() method? I wonder also if there might be some mileage in converting direct from the non-triangulated IECore::Data to an IECoreGL::Buffer, thus genuinely reducing the data storage?

ldmoser pushed a commit to ldmoser/cortex that referenced this pull request Jul 18, 2013
…eds some rethinking as it competes resources from the GL buffers and the gains are not necessarily coming from the cache, but from the fact we stopped using the TriangulateOp and stop duplicating the Mesh.
johnhaddon added a commit to johnhaddon/cortex that referenced this pull request Jun 10, 2014
This reverts commit 6e257c8, which was meant to have been reverted in 153df0c, along with the rest of ImageEngine#9.
johnhaddon added a commit to johnhaddon/cortex that referenced this pull request Jun 17, 2014
This reverts commit 6e257c8, which was meant to have been reverted in 153df0c, along with the rest of ImageEngine#9.
johnhaddon added a commit to johnhaddon/cortex that referenced this pull request Dec 20, 2019
`DisplayTileCallbackFactory::create()` returned the same DisplayTileCallback instance every time, and up until now took responsibility for deleting it. This has always been sketchy because Appleseed holds the returned DisplayTileCallback's with `auto_release_ptr`, so `DisplayTileCallback::release()` must avoid double deletes by just doing nothing. This worked if `DisplayTileCallback::release()` was called before `DisplayTileCallbackFactory::release()`, but Appleseed 2.1 has reversed the call order so that the factory is destroyed before the callback is. This means `DisplayTileCallback::release()` is called on an already-deleted instance, and crashes, with a stack trace like so :

```
#0  0x0000000000000000 in ?? ()
#1  0x00007fffe18153a6 in foundation::auto_release_ptr<renderer::ITileCallback>::reset (this=0x7fff8c012148, ptr=0x0)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/foundation/utility/autoreleaseptr.h:187
ImageEngine#2  0x00007fffe181137f in renderer::(anonymous namespace)::ProgressiveFrameRenderer::~ProgressiveFrameRenderer (this=0x7fff8c0120a0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:178
ImageEngine#3  0x00007fffe1811592 in renderer::(anonymous namespace)::ProgressiveFrameRenderer::~ProgressiveFrameRenderer (this=0x7fff8c0120a0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:187
ImageEngine#4  0x00007fffe18115ca in renderer::(anonymous namespace)::ProgressiveFrameRenderer::release (this=0x7fff8c0120a0)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/progressive/progressiveframerenderer.cpp:191
ImageEngine#5  0x00007fffe1712fb7 in foundation::auto_release_ptr<renderer::IFrameRenderer>::~auto_release_ptr (this=0x7fff8c01f928, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/foundation/utility/autoreleaseptr.h:124
ImageEngine#6  0x00007fffe17121be in renderer::RendererComponents::~RendererComponents (this=0x7fff8c01f890, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/renderercomponents.h:73
ImageEngine#7  0x00007fffe1712288 in std::default_delete<renderer::RendererComponents>::operator() (this=0x7fff8c000ae0, __ptr=0x7fff8c01f890)
   at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:76
ImageEngine#8  0x00007fffe1711e0d in std::unique_ptr<renderer::RendererComponents, std::default_delete<renderer::RendererComponents> >::reset (this=0x7fff8c000ae0,
   __p=0x7fff8c01f890) at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:347
ImageEngine#9  0x00007fffe1710632 in renderer::CPURenderDevice::~CPURenderDevice (this=0x7fff8c0009b0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/device/cpu/cpurenderdevice.cpp:112
ImageEngine#10 0x00007fffe1710988 in renderer::CPURenderDevice::~CPURenderDevice (this=0x7fff8c0009b0, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/device/cpu/cpurenderdevice.cpp:129
ImageEngine#11 0x00007fffe181f21c in std::default_delete<renderer::IRenderDevice>::operator() (this=0x3093868, __ptr=0x7fff8c0009b0)
   at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:76
ImageEngine#12 0x00007fffe181eafd in std::unique_ptr<renderer::IRenderDevice, std::default_delete<renderer::IRenderDevice> >::~unique_ptr (this=0x3093868,
   __in_chrg=<optimized out>) at /opt/rh/devtoolset-6/root/usr/include/c++/6.3.1/bits/unique_ptr.h:239
ImageEngine#13 0x00007fffe181d545 in renderer::MasterRenderer::Impl::~Impl (this=0x3093820, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/masterrenderer.cpp:168
ImageEngine#14 0x00007fffe181cfdb in renderer::MasterRenderer::~MasterRenderer (this=0x3088d70, __in_chrg=<optimized out>)
   at /disk1/john/dev/gafferDependencies/Appleseed/working/appleseed-2.1.0-beta/src/appleseed/renderer/kernel/rendering/masterrenderer.cpp:584
ImageEngine#15 0x00007fffc67d0ae0 in (anonymous namespace)::AppleseedRenderer::~AppleseedRenderer() () from /disk1/john/dev/build/gaffer/lib/libGafferAppleseed.so
ImageEngine#16 0x00007fffd8a734b6 in GafferScene::InteractiveRender::stop() () from /disk1/john/dev/build/gaffer/lib/libGafferScene.so
ImageEngine#17 0x00007fffd8a735bf in GafferScene::InteractiveRender::update() () from /disk1/john/dev/build/gaffer/lib/libGafferScene.so
```

The solution here is to return a new DisplayTileCallback every time `DisplayTileCallbackFactory::create()` is called, and to destroy those the regular way in `DisplayTileCallback::release()`. This works fine for interactive renders, because Appleseed will only ever make a single DisplayTileCallback at a time. For batch renders it is completely broken though, because Appleseed will call `DisplayTileCallbackFactory::create()` once per thread, and every thread will end up writing to a different display driver. So far we've only used the display driver for interactive renders though, so maybe this is OK.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants