
SceneCache caching and GL optimizations #33

Closed
andrewkaufman opened this issue Jul 17, 2013 · 17 comments
Labels
core: Issues with core Cortex functionality
gl: Issues related to the Cortex OpenGL integration
houdini: Issues related to the Cortex Houdini integration
maya: Issues related to the Cortex Maya integration

Comments

@andrewkaufman
Member

Continuing the discussion in the comments of #9. What can we do to optimize SceneCache use when hosted in 3d apps? Lucio currently has 2 branches:

  1. an LRUCache for loaded objects, stored per SceneCache, but without the GL optimizations from #9 (triangulated cached conversions for IECoreGL::MeshPrimitive)
  2. the above, plus the ToGL optimizations from #9

So far his testing in Maya has shown branch 1 to be advantageous on a large crowd of similar low res characters, and on a single high res, constant topology, animated character with many small parts. Branch 2 helps, but only marginally in Maya. He found that the GL cache was getting overloaded with data, and he needed to set a higher max cost.

@andrewkaufman
Member Author

I've done similar testing in Houdini, with the following results from Houdini's performance monitor: (i) is loading a new frame, (ii) is loading the previous frame with a 500 MB GL cache limit, and (iii) is the same with a 2 GB GL cache limit.

Having the object cache for SceneCaches is certainly resulting in advantageous cook times for repeated frames, especially on the high res character. The GL optimizations seem to help quite dramatically on the crowd, regardless of cache size, while on the high res character, appropriate cache size is crucial.

crowd, branch 1)
    i:   1.213 cook, 4.656 draw
    ii:  1.086 cook, 4.628 draw
    iii: 1.056 cook, 4.638 draw

crowd, branch 2)
    i:   1.174 cook, 2.737 draw
    ii:  1.064 cook, 2.749 draw
    iii: 1.061 cook, 2.644 draw

heavyCharacter, branch 1)
    i:   0.863 cook, 1.993 draw
    ii:  0.173 cook, 1.967 draw
    iii: 0.170 cook, 0.375 draw

heavyCharacter, branch 2)
    i:   0.860 cook, 1.582 draw
    ii:  0.170 cook, 1.514 draw
    iii: 0.164 cook, 0.366 draw

@andrewkaufman
Member Author

Currently, Lucio's SceneCache object cache is implemented per file with a hardcoded limit. It would be nicer to have centralized control over such things. Current thoughts are to have a global object cache which stores objects by hash. This could then be used by all SceneInterfaces, Readers, Gaffer computations, etc. to access objects. Those classes could store internal caches which map their appropriate data to a hash that is used to access the global object cache. Something like:

ObjectCache g_objectCache; // shared by absolutely everyone!!

SceneCache::readObject()
{
        MurmurHash h = calculateObjectHashByMappingInputParameters();
        if( ConstObjectPtr o = g_objectCache.get( h ) )
        {
                return o;
        }
        else
        {
                // cache miss : read from file and publish for everyone else
                // ( note : C++ forbids redeclaring `o` here, since the name
                // declared in the if condition is in scope in the else branch )
                ConstObjectPtr loaded = readObjectForReal();
                g_objectCache.set( loaded->hash(), loaded );
                return loaded;
        }
}

If all the object creation mechanisms used a similar approach, we could have a single cachedObjectMemoryLimit that applied across the whole of Cortex and even on into Gaffer.

@andrewkaufman
Member Author

The global object cache would still be different from the OpenGL cache, so perhaps if this approach works out, then we should revert the OpenGL triangulation stuff to reduce memory consumption, and keep the idea in our back pocket.

@andrewkaufman
Member Author

It would also be nice to have easy ways to clear or query the global object cache. This way we could give app users a single button to clear out all the Cortex data, or display information to them about Cortex memory usage. It might also be useful to record statistics about cache misses and such.

@ldmoser

ldmoser commented Jul 18, 2013

If the global cache only deals with Objects, maybe it would be even easier to pass just the object to the set function:

  g_objectCache.set( o );

@ldmoser

ldmoser commented Jul 18, 2013

We found a problem in my branches exploring the SceneCache cache: I was not returning a copy of the IECore::Objects (transform, attributes, and the object), which is conceptually wrong per the SceneInterface definition. Adding the copy() slowed down my original measurements in Maya, and it shows in the profile as taking about 20% of the time in the update of the crowd scene. So, if we consider using the ObjectCache, here and potentially in other implementations of SceneInterface, maybe we should also consider changing SceneInterface to return constant pointers to data. Does that make sense to you?

@ldmoser

ldmoser commented Jul 18, 2013

Another thing to consider: if the SceneCache will know how to build an object from previously cached objects (e.g. animated objects sharing the same constant topology), then we could remove readObjectPrimitiveVariables() from the public interface of SceneInterface and make the implementations responsible for that...

@johnhaddon
Member

20% seems a lot, especially given that we're using lazy-copy-on-write. Did your profile show any details about which part of the copying might have been taking the time?

If having in-memory caching is something we intend to make a big part of our SceneCache implementation, then yes, I think moving to ConstPtr return types in SceneInterface is fine. If it's more of an optional thing that we only enable sometimes, then it seems a bit harder to justify the extra copy forced upon people who want to change the object they just loaded. Seems like the current feeling is that in-memory caching is good (particularly when it's shared across all sorts of Cortex use cases) so ConstPtr feels OK to me.

@johnhaddon
Member

I'd like to get rid of readObjectPrimitiveVariables() if possible - SceneInterface is quite a large API already and if the only reason for that method was an optimisation that is no longer necessary then I'd like to see it gone. Let's wait and see if it has any utility in implementing the Alembic SceneInterface in #29, and if not we can remove it.

@ldmoser

ldmoser commented Jul 18, 2013

Attached the profile showing the Object::copy.
[profile image attachment]

@johnhaddon
Member

OK - so actual copying of data isn't showing up, so that's good (the lazy-copy-on-write is working properly). So is the plan to go with ConstPtr returns from SceneInterface anyway?

@ldmoser

ldmoser commented Jul 18, 2013

I think I will create a new branch, where we can create the implementation of the ObjectCache and then start using it in the SceneCache before changing the SceneInterface. We can always "simulate" the benefits of the ConstPtr return by not calling the copy() function for now. I'm interested in seeing how the ObjectCache will help with caching the non-Mesh data in the scene, such as the transforms and attributes, which would exist in LinkedScenes with complex environments and affect performance. But this data would be way smaller than the meshes, and I wonder if we will benefit from having this indirection: [sceneCacheKey] => [hash] and then the ObjectCache for [hash] => [Object].

@johnhaddon
Member

Sounds like a plan.

@andrewkaufman
Member Author

I think Lucio's recently merged ObjectPool branch addresses this ticket. Can we close it now?

@johnhaddon
Member

Seems like it to me...

@ldmoser

ldmoser commented Aug 14, 2013

I guess so, although the subject of GL optimization will probably continue. Some ideas discussed so far were:

  • Optimization of bounding boxes: using shaders? reusing the CurvesPrimitives? creating a BoxPrimitive?
  • Using less memory by converting/triangulating from IECore directly to GL buffers.
  • Only converting/triangulating the primVars that the GL shaders will see.
  • Only compute the normals if the shaders need them.

@andrewkaufman
Member Author

Let's open a new issue to discuss those if we need to. This one covers a lot of details that are taken care of now, and it's quite hard to dig through and find the things still to do.
