Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Studio] Optimize FindLoopNoAcc #55

Merged
merged 8 commits into from
Feb 20, 2024

Conversation

takahiro0327
Copy link
Contributor

@takahiro0327 takahiro0327 commented Feb 16, 2024

Slightly optimized FindLoopNoAcc.
Loading large scenes, which used to take 95 seconds, is now 90 seconds.
Please merge if it looks OK.

The optimization I have implemented is simple, it records the path where the object was found in a previous query and searches that path the next time there is a query with the same name.

About 30% of the queries fall into this category and are accelerated.
Seventy percent of the queries return null.

In addition, I added the ability to output the scene loading time to the log.

[Info   :Studio Optimizations] Scene Loaded: 87.10[s]

@takahiro0327
Copy link
Contributor Author

In fact, this change may not be as fast.
Disturbance factors are more significant.
On average, I'd expect a 1-3 second reduction versus a 90s load time.

@ManlyMarco
Copy link
Collaborator

ManlyMarco commented Feb 16, 2024

In fact, this change may not be as fast. Disturbance factors are more significant. On average, I'd expect a 1-3 second reduction versus a 90s load time.

Would caching the results maybe be worthwhile? It could help if there are many repeated calls on the same Transform, maybe cache all child transforms in that case and re-check only if there is no match found in cache? I think I did something similar to this in a different plugin, maybe ABMX (edit: here).

@takahiro0327
Copy link
Contributor Author

Thanks for the review.
No, I also tried caching the child transforms.
But stopped because it was slightly slower.

Dictionary<string,GameObject> _nameToGObj;

There are more queries that return null because no object is found than queries that do not. Therefore, it would have been faster if child transforms could be cached and non-existent transforms could be determined faster.
But it did not.
There are probably a lot of unrelated child objects mixed in with the query. I think the cost of recording all those children is too high.

My method is caching the paths. I think the nice thing about this method is that the cached path may work for another transform search.

@takahiro0327 takahiro0327 marked this pull request as draft February 17, 2024 01:11
@takahiro0327
Copy link
Contributor Author

I've come up with an idea to speed up Find a bit, so I'm going to experiment with it.

@takahiro0327
Copy link
Contributor Author

Please review with this.

I've tried to keep the scanning progress on multiple queries.
Maybe it is faster.

Searches for the same name were about twice per character.
The cache in trasform is of relatively low value.
The first search would register in the cache, the second search would match the cache. That's it.

The number of offspring varied by trasform: there were trasforms with 700 children and others with 70-90 children.
I think the reason it was slower when I tried it before is that caching the children in the 700 one would cost more to add to the table.

Measured with a 200MB file with 20 characters. Frankly, it is ambiguous whether it is getting faster or slower.

before:

92.9
94.8
128.63
127.9
97.1

after:

94.6
94.7
116.7
123.8
109.6

after2(current):

95.4
90.8
113.6
112.8
95.7

It was faster as far as I could see in the mono profiler, but I think the overhead is too large to be inaccurate.
before1: 41.64[s]

0	2408390	IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)	41642654700	3002847232

after1: 4.19[s]

0	865620	IllusionFixes.StudioOptimizations:FindLoopNoAccWithPaths (UnityEngine.Transform,string,System.Collections.Generic.List`1<string>)	4192795500	274575360
0	14170	IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)	803156100	46448640

after2: 0.07[s]

0	14170	IllusionFixes.StudioOptimizations:FindLoopNoAcc (UnityEngine.Transform,string)	35402800	598016
0	14170	IllusionFixes.FindLoopAssistant:FindChild (UnityEngine.Transform,string)	32667100	598016
0	8321	IllusionFixes.FindLoopAssistant/Frame:NextChild ()	3458500	278528
0	132	IllusionFixes.FindLoopAssistant:RegisterPath (UnityEngine.Transform,UnityEngine.Transform,string)	2176800	94208

before_MonoProfilerOutput_2024-02-17_18-43-32.csv
after1_MonoProfilerOutput_2024-02-17_18-47-39.csv
after2_MonoProfilerOutput_2024-02-17_18-57-52.csv

@takahiro0327 takahiro0327 marked this pull request as ready for review February 17, 2024 10:15
@takahiro0327
Copy link
Contributor Author

My PC was the cause of the blurry numbers.
It was a few percent, but it was decently fast.

before: ave:55:64[s]

Scene Loaded: 58.17[s]
Scene Loaded: 54.54[s]
Scene Loaded: 54.77[s]
Scene Loaded: 56.30[s]
Scene Loaded: 54.42[s]

after1: ave:53.68[s]

Scene Loaded: 53.63[s]
Scene Loaded: 53.58[s]
Scene Loaded: 54.02[s]
Scene Loaded: 53.67[s]
Scene Loaded: 53.51[s]

after2: ave:53.34[s]

Scene loading completed: 53.2[s]
Scene loading completed: 53.8[s]
Scene loading completed: 53.5[s]
Scene loading completed: 53.2[s]
Scene loading completed: 53.0[s]

Copy link
Collaborator

@ManlyMarco ManlyMarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Someone else reviewed it on discord, you can have look if you want here.

@takahiro0327
Copy link
Contributor Author

I'll write it here because I don't want it to get caught in a search later.

Sorry about the title, but it's not FindLoop optimization, it's FindLoopNoAcc optimization.
It is only used in the bone initialization process.
The goal is to speed up scene loading.

image

Items accumulate in _nameToPath. However, most items accumulate only two.
This is because there are few kinds of root transforms.
image

The exception to this is the hair bone, which has variations, so it accumulates.

Still, it is much faster than doing Find on dozens of children each time.
If you only accumulate two items, you can almost always find one of them.
Much faster than scanning dozens or hundreds.

image

image

@takahiro0327
Copy link
Contributor Author

In the data I checked, 30% of the queries are initially returned and the remaining 70% return null.
The other percentage of returns is less than 1-2%.
The _childMap speeds up the 70% of queries that return null. It was necessary to keep track of all children to determine non-existent transforms without having to scan through dozens of children.

image

@takahiro0327
Copy link
Contributor Author

However, recording all of the root's children conversely slowed it down.
This was due to the existence of a root with hundreds of children, and the high cost of recording all of those children. Also, the root with hundreds of children did not return nulls.
In such a root, not all children were seen.

A _findStack was added to store the scanning status of children.
This minimizes the amount of Find children.
Together with _childMap, it can return null immediately.

@takahiro0327
Copy link
Contributor Author

takahiro0327 commented Feb 18, 2024

I don't think there is room for optimization in a regular FindLoop.
Removing the caller may be effective, but there does not seem to be room for optimization in the FindLoop itself.

In this case, optimization was performed based on the following assumptions.

  1. if a query comes in with the same transformation as the previous query, assume that the transformation has not been changed since the previous query.
  2. the structure of the transform is roughly the same
  3. but do not make assumptions about what kind of children you have or what kind of structure you have

Since it is a series of calls from one parent function, 1 is a reasonable assumption.

Copy link
Collaborator

@ManlyMarco ManlyMarco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I agree that it's probably as good as it can realistically get without modifying the game code to avoid the need for FindLoop in the first place.

@takahiro0327
Copy link
Contributor Author

You're right that the return is small compared to the complexity of the code.
But it is also true that it reduces about 100ms per character, and loading time has to be accumulated like that to reduce it. There is no doubt that loading time is reduced by seconds when there are many characters.

It is dangerous to complicate the code if the changes are frequent or have a large impact area.
This kind of change should not be included.
However, I believe this is not the case here.

@ManlyMarco ManlyMarco merged commit d1349a7 into IllusionMods:master Feb 20, 2024
@ManlyMarco ManlyMarco changed the title Optimize findloop [Studio] Optimize FindLoopNoAcc Feb 20, 2024
@takahiro0327
Copy link
Contributor Author

Thanks for the merge!

@takahiro0327 takahiro0327 deleted the optimize_findloop branch February 21, 2024 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants