
Shader define expression optimization #13936

Merged
11 commits merged into BabylonJS:master on Jun 12, 2023

Conversation

Dok11
Contributor

@Dok11 Dok11 commented Jun 2, 2023

In my scene with 50+ shaders, ShaderDefineExpression.infixToPostfix was called 15,000–35,000 times while the scene was loading. On average, a single load spent 54.28 ms across all calls, with 34.39 ms in the first 15,000 calls.
After the optimization in this PR, I get the following result: 9.68 ms for all calls and 7.09 ms for the first 15,000 calls.

I made two simple changes:

  1. If the infix expression contains no operators, it will not be modified, so we can return it as is.
  2. As mentioned above, this function is called thousands of times during scene loading, but the input strings are not unique each time; in my case there were only 168 unique infixes. So there is a good reason to use memoization: as far as I can see, the result of this function is deterministic and it has no side effects.
    To be safe with memory, I added a limit of 50,000 entries for the memoization cache. In my case, 100 entries take 6,504 characters of memory, so by simple extrapolation it might be around 640 KB for 50,000 entries. As we all know, 640K ought to be enough for anybody :)

So, with just these two changes, the function's performance was boosted 5.6x, and for my mid-sized (as I now think of it) scene it saves 45 ms (almost 3 frames) of loading time.
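The two changes can be sketched roughly as below. This is an illustrative TypeScript stand-in, not Babylon.js's actual implementation: `convert` is a toy shunting-yard over `&&`/`||` that stands in for the real define-expression conversion, and `CACHE_LIMIT` mirrors the PR's `InfixToPostfixCacheLimitSize`.

```typescript
// Operators recognized by the toy converter (the real grammar is richer).
const OPERATORS: readonly string[] = ["&&", "||"];

const postfixCache = new Map<string, string[]>();
const CACHE_LIMIT = 50000; // mirrors InfixToPostfixCacheLimitSize

function infixToPostfix(infix: string): string[] {
    // Optimization 1: an expression with no operators needs no conversion.
    if (!OPERATORS.some((op) => infix.includes(op))) {
        return [infix];
    }

    // Optimization 2: memoize — the conversion is deterministic
    // and side-effect free, so identical inputs can share a result.
    const cached = postfixCache.get(infix);
    if (cached) {
        return cached;
    }

    const result = convert(infix);

    // Hard cap on cache growth to bound memory use.
    if (postfixCache.size < CACHE_LIMIT) {
        postfixCache.set(infix, result);
    }
    return result;
}

// Toy stand-in for the real shunting-yard conversion work.
function convert(infix: string): string[] {
    const output: string[] = [];
    const ops: string[] = [];
    for (const token of infix.split(/\s+/)) {
        if (OPERATORS.includes(token)) {
            while (ops.length) output.push(ops.pop()!);
            ops.push(token);
        } else {
            output.push(token);
        }
    }
    while (ops.length) output.push(ops.pop()!);
    return output;
}
```

On a repeated input the second call returns the cached array directly, which is where the bulk of the reported speedup comes from.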

@bjsplat
Collaborator

bjsplat commented Jun 2, 2023

Please make sure to label your PR with the "bug", "new feature" or "breaking change" label(s).
To prevent this PR from appearing in the changelog, mark it with the "skip changelog" label.

@bjsplat
Collaborator

bjsplat commented Jun 2, 2023

Contributor

@Popov72 Popov72 left a comment


Nice!

Could you run additional tests with some of the PGs in the visualization test list (try to use the heaviest PGs), just to have a larger corpus of tests?

@sebavan
Member

sebavan commented Jun 8, 2023

@Popov72 I'll do a round of review once you validate the changes.

Member

@sebavan sebavan left a comment


I am not a huge fan of the static way the cache works: basically it only grows to 50,000 entries, and after that there is no caching at all. I wonder if there would be a way to drop the oldest entries or rotate them instead?

@Dok11
Contributor Author

Dok11 commented Jun 8, 2023

I am not a huge fan of the static way the cache works: basically it only grows to 50,000 entries, and after that there is no caching at all. I wonder if there would be a way to drop the oldest entries or rotate them instead?

Of course there are many things we could do with the memoization cache, but the current value seems like a very generous limit, and the property is not private, so a developer can change it to increase the cache size if needed.
I can make the cache smarter if you want.

@sebavan
Member

sebavan commented Jun 8, 2023

Would be cool, as long as it does not slow things down. @Popov72, any thoughts?

@Dok11
Contributor Author

Dok11 commented Jun 9, 2023

To preserve performance, we should remove outdated cache entries fairly rarely, not on each cache access.
So, as one idea: we store the last access time with each cache entry, and when we reach the limit we run a cache clear that removes half of the cache (25,000 in our case), picking the items that have gone the longest without being accessed.
As another option, we could store an access count with each cache item, and have the cache clear remove only the items that are rarely needed.
I can try one of these ideas if you like them. Or let's discuss what else we could do here.
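The first idea could look something like the sketch below (names and the class are hypothetical, not the PR's code): each entry carries a last-access stamp that is cheap to update on every hit, and the expensive eviction pass only runs when the limit is reached.

```typescript
// Cache entry carrying a last-access stamp (a monotonically
// increasing counter, cheaper and more reliable than Date.now()).
interface CacheEntry<V> {
    value: V;
    lastAccess: number;
}

class HalfEvictingCache<V> {
    private readonly _store = new Map<string, CacheEntry<V>>();
    private _tick = 0;

    constructor(public limit = 50000) {}

    get(key: string): V | undefined {
        const entry = this._store.get(key);
        if (!entry) return undefined;
        entry.lastAccess = ++this._tick; // cheap write on every hit
        return entry.value;
    }

    set(key: string, value: V): void {
        if (this._store.size >= this.limit) {
            this._evictStaleHalf(); // runs rarely, not on each access
        }
        this._store.set(key, { value, lastAccess: ++this._tick });
    }

    // Drops the half of the entries with the oldest access stamps in one pass.
    private _evictStaleHalf(): void {
        const entries = [...this._store.entries()].sort(
            (a, b) => a[1].lastAccess - b[1].lastAccess
        );
        for (let i = 0; i < entries.length >> 1; i++) {
            this._store.delete(entries[i][0]);
        }
    }
}
```

The sort makes eviction O(n log n), but since it fires at most once per `limit / 2` insertions, the amortized cost per access stays small.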

@sebavan
Member

sebavan commented Jun 9, 2023

Love the idea of cleaning like this !!!!

@sebavan
Member

sebavan commented Jun 9, 2023

Let's ensure the cache write access for the date won't slow things down more than before the PR.

@Dok11
Contributor Author

Dok11 commented Jun 10, 2023

Let's ensure the cache write access for the date won't slow things down more than before the PR.

If InfixToPostfixCacheLimitSize is not large enough and clearCache is called all the time, then there is a slowdown. For example, my scene uses 168 cache items while loading; with InfixToPostfixCacheLimitSize=50 (just fifty items), so that half of the cache is removed on roughly every fiftieth insert, it really slows down to 27.86 ms for all function calls and 15.37 ms for the first 10,000 calls. That is better than before the PR, but not as good as it could be.

But when InfixToPostfixCacheLimitSize is large enough, I measured 5.57 ms for the first 10,000 calls and 12.13 ms for all calls.
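The churn effect described above can be reproduced with a small counter-based harness (illustrative code, not Babylon.js's): when the limit is smaller than the number of unique inputs, entries keep being evicted and reconverted; when it is large enough, each unique input is converted exactly once.

```typescript
// Counts how many times the (expensive) conversion runs for a given
// cache limit, for a workload cycling through `unique` distinct
// inputs `rounds` times.
function conversionsFor(limit: number, unique: number, rounds: number): number {
    const cache = new Map<string, string>();
    let conversions = 0;

    const lookup = (key: string): string => {
        const hit = cache.get(key);
        if (hit !== undefined) return hit;
        conversions++; // stands in for the real infixToPostfix work
        const value = key.toLowerCase();
        if (cache.size >= limit) {
            // Naive "clear half on overflow", mimicking the idea
            // discussed above; Maps iterate in insertion order, so
            // this drops the oldest-inserted half.
            let toDrop = cache.size >> 1;
            for (const k of cache.keys()) {
                if (toDrop-- <= 0) break;
                cache.delete(k);
            }
        }
        cache.set(key, value);
        return value;
    };

    for (let r = 0; r < rounds; r++) {
        for (let i = 0; i < unique; i++) lookup(`DEFINE_${i}`);
    }
    return conversions;
}
```

With a limit above the unique-input count the conversion count equals the number of unique inputs; with a limit well below it, the cyclic workload defeats the cache and conversions multiply, matching the slowdown measured at InfixToPostfixCacheLimitSize=50.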

@sebavan sebavan enabled auto-merge June 12, 2023 15:08
@sebavan sebavan merged commit 7c6f37a into BabylonJS:master Jun 12, 2023
8 checks passed