Inline more ECS functions #8083

Merged
merged 10 commits into from
Apr 12, 2023

@james7132 james7132 commented Mar 14, 2023

Objective

Upon closer inspection, there are a few functions in the ECS that are not being inlined, even with the highest optimizations and LTO enabled:

  • Almost all WorldQuery::init_fetch calls. Affects Query::get calls in hot loops. In particular, the WorldQuery implementation for () is used everywhere as the default filter and is effectively a no-op.
  • Entities::get. Affects Query::get, World::get, and any component insertion or removal.
  • Entities::set. Affects any component insertion or removal.
  • Tick::new. I've only seen this in component insertion and spawning.
  • ArchetypeRow::new
  • BlobVec::set_len

Almost all of these have trivial or even empty implementations or have significant opportunity to be optimized into surrounding code when inlined with LTO enabled.

Solution

Inline them
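
As a rough illustration of the kind of change involved, here is a minimal sketch. The types below are hypothetical stand-ins, not Bevy's actual definitions or this PR's diff; the only point is the `#[inline]` attribute on trivial functions. Note that `#[inline]` is a hint to the compiler rather than a guarantee.

```rust
// Hypothetical stand-in for a small newtype such as a change tick.
// This is NOT the real Bevy definition; it only illustrates the pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tick {
    tick: u32,
}

impl Tick {
    /// Trivial constructor. Marking it `#[inline]` makes its body available
    /// to callers in other crates so it can be folded into surrounding code.
    #[inline]
    pub const fn new(tick: u32) -> Self {
        Self { tick }
    }

    /// Trivial accessor, annotated for the same reason.
    #[inline]
    pub const fn get(self) -> u32 {
        self.tick
    }
}

/// Hypothetical no-op, in the spirit of an effectively empty `init_fetch`
/// for a default filter. Once inlined, the call should disappear entirely.
#[inline]
pub fn init_noop_fetch() {}
```

For functions this small, a call that is not inlined costs more than the body itself, which is why they are the obvious candidates.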

@james7132 james7132 added A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times labels Mar 14, 2023
@james7132
Member Author

Holding off on taking this out of draft until #8053 is merged, since some of the optimizations using the inlined Entities::get and Entities::set rely on the unwrap when handling bundles being removed in release builds.

@cart
Member

cart commented Mar 21, 2023

I just kicked off a merge of #8053

@james7132 james7132 marked this pull request as ready for review March 22, 2023 13:36
@james7132 james7132 requested review from cart and JoJoJet March 28, 2023 20:21
Member

@JoJoJet JoJoJet left a comment

LGTM. I'd be interested in seeing if there's any visible difference in benchmarks, but it's not necessary.

@mockersf
Member

Is there an impact on build time with those new inlines?

@james7132
Member Author

Is there an impact on build time with those new inlines?

Good question. Definitely worth measuring. I would assume it increases the amount of generated code wherever larger functions like Entities::get and SparseSets::get are used, but it probably doesn't have that strong of an impact without LTO enabled.

@james7132 james7132 requested a review from mockersf March 31, 2023 23:24
@james7132
Member Author

@mockersf I checked the compiler output changes for this PR: james7132/bevy_asm_tests@214de16

Seems like there's a slight increase in codegen (~100-170 instructions) for component insertion/removal, and a significant decrease for any fetch or iteration (i.e. Query iteration, Query::get, and World::get).

@mockersf mockersf added the S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it label Apr 11, 2023
@james7132
Member Author

As a final sanity check, I reran microbenchmarks for this PR. Gains are generally pretty small or within error margin, with the exception of Query::get and World::get for sparse components.

group                                           main                                    more-inline
-----                                           ----                                    -----------
add_remove/sparse_set                           1.02   770.1±49.09µs        ? ?/sec     1.00   752.8±43.03µs        ? ?/sec
add_remove/table                                1.00  1146.2±35.93µs        ? ?/sec     1.00  1148.6±36.77µs        ? ?/sec
add_remove_big/sparse_set                       1.00  822.2±131.87µs        ? ?/sec     1.01  828.3±155.04µs        ? ?/sec
add_remove_big/table                            1.01      2.4±0.11ms        ? ?/sec     1.00      2.4±0.45ms        ? ?/sec
get_or_spawn/batched                            1.01   307.0±17.62µs        ? ?/sec     1.00   303.8±13.68µs        ? ?/sec
get_or_spawn/individual                         1.01   488.8±56.75µs        ? ?/sec     1.00   483.0±55.45µs        ? ?/sec
heavy_compute/base                              1.02   219.6±26.82µs        ? ?/sec     1.00    214.5±2.24µs        ? ?/sec
insert_commands/insert                          1.00   373.1±28.46µs        ? ?/sec     1.01   376.1±34.78µs        ? ?/sec
insert_commands/insert_batch                    1.01   301.4±22.15µs        ? ?/sec     1.00   299.6±18.21µs        ? ?/sec
insert_simple/base                              1.00    362.8±6.63µs        ? ?/sec     1.00    361.3±6.05µs        ? ?/sec
insert_simple/unbatched                         1.00   754.7±14.42µs        ? ?/sec     1.00   751.0±40.23µs        ? ?/sec
iter_fragmented/base                            1.00   346.2±10.32ns        ? ?/sec     1.00    344.7±7.17ns        ? ?/sec
iter_fragmented/foreach                         1.00   161.6±21.58ns        ? ?/sec     1.07   173.4±30.21ns        ? ?/sec
iter_fragmented/foreach_wide                    1.02      3.8±0.09µs        ? ?/sec     1.00      3.7±0.07µs        ? ?/sec
iter_fragmented/wide                            1.07      4.1±0.26µs        ? ?/sec     1.00      3.8±0.10µs        ? ?/sec
iter_fragmented_sparse/base                     1.00      7.6±0.20ns        ? ?/sec     1.05      7.9±0.37ns        ? ?/sec
iter_fragmented_sparse/foreach                  1.01      7.8±0.31ns        ? ?/sec     1.00      7.8±0.26ns        ? ?/sec
iter_fragmented_sparse/foreach_wide             1.00     40.1±1.55ns        ? ?/sec     1.01     40.6±2.27ns        ? ?/sec
iter_fragmented_sparse/wide                     1.00     42.1±1.40ns        ? ?/sec     1.00     42.3±1.50ns        ? ?/sec
iter_simple/base                                1.07      8.9±0.23µs        ? ?/sec     1.00      8.3±0.08µs        ? ?/sec
iter_simple/foreach                             1.00      8.4±0.28µs        ? ?/sec     1.00      8.4±0.17µs        ? ?/sec
iter_simple/foreach_sparse_set                  1.03     26.9±0.53µs        ? ?/sec     1.00     26.1±0.58µs        ? ?/sec
iter_simple/foreach_wide                        1.00     41.8±0.87µs        ? ?/sec     1.03     43.2±0.24µs        ? ?/sec
iter_simple/foreach_wide_sparse_set             1.00    114.9±1.70µs        ? ?/sec     1.04    119.8±3.59µs        ? ?/sec
iter_simple/sparse_set                          1.00     29.3±0.77µs        ? ?/sec     1.00     29.3±0.55µs        ? ?/sec
iter_simple/system                              1.00      8.5±0.17µs        ? ?/sec     1.00      8.5±0.29µs        ? ?/sec
iter_simple/wide                                1.00     39.4±0.74µs        ? ?/sec     1.00     39.3±0.35µs        ? ?/sec
iter_simple/wide_sparse_set                     1.02    129.2±6.91µs        ? ?/sec     1.00    127.1±4.60µs        ? ?/sec
query_get/50000_entities_sparse                 1.00    308.6±5.09µs        ? ?/sec     1.00    308.3±1.44µs        ? ?/sec
query_get/50000_entities_table                  1.00    266.0±1.41µs        ? ?/sec     1.00    266.0±2.35µs        ? ?/sec
query_get_component/50000_entities_sparse       1.03   742.1±39.70µs        ? ?/sec     1.00   722.9±24.46µs        ? ?/sec
query_get_component/50000_entities_table        1.00   757.5±13.13µs        ? ?/sec     1.00   757.3±32.86µs        ? ?/sec
query_get_component_simple/system               1.00    561.9±7.47µs        ? ?/sec     1.01   565.6±10.25µs        ? ?/sec
query_get_component_simple/unchecked            1.00    716.6±8.94µs        ? ?/sec     1.00    717.4±6.52µs        ? ?/sec
query_get_many_10/50000_calls_sparse            1.13      4.9±0.64ms        ? ?/sec     1.00      4.3±0.75ms        ? ?/sec
query_get_many_10/50000_calls_table             1.13      4.5±0.63ms        ? ?/sec     1.00      3.9±0.16ms        ? ?/sec
query_get_many_2/50000_calls_sparse             1.06   708.4±63.05µs        ? ?/sec     1.00  668.9±145.37µs        ? ?/sec
query_get_many_2/50000_calls_table              1.04   719.4±81.49µs        ? ?/sec     1.00   692.2±34.30µs        ? ?/sec
query_get_many_5/50000_calls_sparse             1.18      2.1±0.35ms        ? ?/sec     1.00  1755.6±141.57µs        ? ?/sec
query_get_many_5/50000_calls_table              1.05  1906.7±231.01µs        ? ?/sec    1.00  1813.9±96.22µs        ? ?/sec
spawn_commands/2000_entities                    1.03   182.8±13.81µs        ? ?/sec     1.00    177.0±6.97µs        ? ?/sec
spawn_commands/4000_entities                    1.03   367.3±25.23µs        ? ?/sec     1.00   356.4±12.60µs        ? ?/sec
spawn_commands/6000_entities                    1.00   520.8±28.99µs        ? ?/sec     1.03   535.2±24.06µs        ? ?/sec
spawn_commands/8000_entities                    1.01   742.7±41.84µs        ? ?/sec     1.00   734.1±34.18µs        ? ?/sec
spawn_world/10000_entities                      1.03  898.1±118.55µs        ? ?/sec     1.00   873.4±86.67µs        ? ?/sec
spawn_world/1000_entities                       1.08    93.2±11.56µs        ? ?/sec     1.00     86.4±8.39µs        ? ?/sec
spawn_world/100_entities                        1.09      9.6±1.43µs        ? ?/sec     1.00      8.8±0.89µs        ? ?/sec
spawn_world/10_entities                         1.00  895.9±157.15ns        ? ?/sec     1.01   908.4±87.62ns        ? ?/sec
spawn_world/1_entities                          1.04    93.4±14.79ns        ? ?/sec     1.00    89.7±12.34ns        ? ?/sec
world_entity/50000_entities                     1.05   104.9±12.09µs        ? ?/sec     1.00    100.1±0.19µs        ? ?/sec
world_get/50000_entities_sparse                 1.10   227.0±32.64µs        ? ?/sec     1.00    205.7±1.00µs        ? ?/sec
world_get/50000_entities_table                  1.01    172.8±3.20µs        ? ?/sec     1.00    171.4±1.96µs        ? ?/sec
world_query_for_each/50000_entities_sparse      1.00     53.6±0.83µs        ? ?/sec     1.00     53.6±0.17µs        ? ?/sec
world_query_for_each/50000_entities_table       1.00     27.2±0.22µs        ? ?/sec     1.00     27.2±0.16µs        ? ?/sec
world_query_get/50000_entities_sparse           1.04    100.9±8.07µs        ? ?/sec     1.00     96.8±0.44µs        ? ?/sec
world_query_get/50000_entities_sparse_wide      1.00    194.5±2.70µs        ? ?/sec     1.00    195.0±0.42µs        ? ?/sec
world_query_get/50000_entities_table            1.00    126.6±7.07µs        ? ?/sec     1.00    126.2±1.19µs        ? ?/sec
world_query_get/50000_entities_table_wide       1.03    235.5±3.84µs        ? ?/sec     1.00    229.4±2.03µs        ? ?/sec
world_query_iter/50000_entities_sparse          1.00     54.0±0.47µs        ? ?/sec     1.00     53.8±0.14µs        ? ?/sec
world_query_iter/50000_entities_table           1.00     27.2±0.18µs        ? ?/sec     1.00     27.2±0.08µs        ? ?/sec
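
For reading the table: the leading number in each column appears to be that branch's mean time divided by the faster of the two means, so 1.00 marks the faster branch. This is an assumption inferred from the numbers themselves, not something stated in the thread. For example, world_get/50000_entities_sparse gives 227.0 / 205.7 ≈ 1.10 for main and 1.00 for more-inline. A minimal sketch of that arithmetic, with hypothetical helper names:

```rust
/// Relative ratios as they appear to be reported in the table above:
/// each mean divided by the faster (smaller) of the two means.
/// Both inputs must be in the same time unit.
pub fn relative_ratios(main: f64, more_inline: f64) -> (f64, f64) {
    let best = main.min(more_inline);
    (main / best, more_inline / best)
}

/// Round to two decimal places, matching the precision shown in the table.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

Rows where the displayed means are themselves heavily rounded (e.g. "2.1 ms") will not reproduce the printed ratio exactly, since the ratio is presumably computed from unrounded values.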

@cart
Member

cart commented Apr 12, 2023

Did two builds just to check for major build time regressions.

This PR: 1m 14s
Base branch of this PR: 1m 16s

No significant changes. Maybe slightly faster but probably just within the noise.

@cart cart added this pull request to the merge queue Apr 12, 2023
Merged via the queue into bevyengine:main with commit 2ec38d1 Apr 12, 2023

4 participants