
Wasmtime: refactor the pooling allocator for components #6835

Merged
merged 11 commits into bytecodealliance:main on Aug 18, 2023

Conversation

fitzgen
Member

@fitzgen fitzgen commented Aug 10, 2023

We used to have one index allocator, an index per instance, and give out N
tables and M memories to every instance regardless of how many tables and
memories it actually needed.

Now we have one index allocator for memories and another for tables. An
instance is no longer associated with a single index; instead, each of its
memories and tables has its own index. We allocate exactly as many tables and
memories as the instance actually needs.

Ultimately, this gives us better component support, where a component instance
might have varying numbers of internal tables and memories.

Additionally, you can now limit the number of tables, memories, and core
instances a single component can allocate from the pooling allocator, even if
there is the capacity for that many available. This is to give embedders tools
to limit individual component instances and prevent them from hogging too much
of the pooling allocator's resources.


TODO before landing:

  • Update RELEASES.md with a heads up about the config changes and give a small guide of how to migrate existing set ups
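
Conceptually, each of the new per-resource pools (one for memories, one for tables) hands out slot indices from a free list, so an instance claims only as many slots as it actually uses. The sketch below is illustrative only; the names (`SimpleIndexAllocator`, `alloc`, `dealloc`) are made up for this example and are not Wasmtime's actual internals.

```rust
// Minimal sketch of a per-resource index allocator. One of these would
// exist for the memory pool and another for the table pool.
struct SimpleIndexAllocator {
    free: Vec<u32>, // slot indices currently available
}

impl SimpleIndexAllocator {
    fn new(capacity: u32) -> Self {
        // All slots start out free.
        SimpleIndexAllocator { free: (0..capacity).rev().collect() }
    }

    /// Hand out a free slot index, or `None` if the pool is exhausted.
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }

    /// Return a slot to the pool when its memory/table is deallocated.
    fn dealloc(&mut self, index: u32) {
        self.free.push(index);
    }
}

fn main() {
    let mut memories = SimpleIndexAllocator::new(2);
    let a = memories.alloc().unwrap();
    let _b = memories.alloc().unwrap();
    assert!(memories.alloc().is_none()); // pool exhausted
    memories.dealloc(a);
    assert!(memories.alloc().is_some()); // slot reusable after dealloc
    println!("ok");
}
```

Because memories and tables now draw from separate allocators, a module with one memory and ten tables no longer ties up ten memory slots it never uses.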

@fitzgen fitzgen requested a review from a team as a code owner August 10, 2023 22:54
@fitzgen fitzgen removed the request for review from a team August 10, 2023 22:55
/// The `MemoryAllocationIndex` was given from our `InstanceAllocator` and
/// must be given back to the instance allocator when deallocating each
/// memory.
memories: PrimaryMap<DefinedMemoryIndex, (MemoryAllocationIndex, Memory)>,
Member Author

Unsure whether this is better as-written, or if moving the MemoryAllocationIndex into wasmtime_runtime::Memory is better. Feel free to bike shed.
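
To illustrate the bikeshed above, here is a minimal sketch of the two representations, using simplified stand-in types (a plain `Vec` in place of `PrimaryMap`, toy `Memory` and `MemoryAllocationIndex` structs), not the real `wasmtime_runtime` definitions:

```rust
// Simplified stand-ins for the real types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MemoryAllocationIndex(u32);

struct Memory {
    bytes: Vec<u8>,
}

// Option 1 (as written in this PR): the allocation index rides alongside
// the memory in the map, so deallocation can hand the index back to the
// pool. Every access to the memory itself goes through `.1`.
struct StoreOption1 {
    memories: Vec<(MemoryAllocationIndex, Memory)>,
}

// Option 2 (the alternative floated here): embed the index in the memory
// type itself, trading the tuple accesses for a named field.
struct MemoryWithIndex {
    alloc_index: MemoryAllocationIndex,
    bytes: Vec<u8>,
}

struct StoreOption2 {
    memories: Vec<MemoryWithIndex>,
}

fn main() {
    let s1 = StoreOption1 {
        memories: vec![(MemoryAllocationIndex(7), Memory { bytes: vec![0; 16] })],
    };
    assert_eq!(s1.memories[0].1.bytes.len(), 16); // tuple access
    assert_eq!(s1.memories[0].0, MemoryAllocationIndex(7));

    let s2 = StoreOption2 {
        memories: vec![MemoryWithIndex {
            alloc_index: MemoryAllocationIndex(7),
            bytes: vec![0; 16],
        }],
    };
    assert_eq!(s2.memories[0].alloc_index, MemoryAllocationIndex(7)); // named field
    println!("ok");
}
```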

@github-actions github-actions bot added the wasmtime:api and wasmtime:config labels Aug 11, 2023
@github-actions

Subscribe to Label Action

cc @peterhuene

This issue or pull request has been labeled: "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

  • peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@github-actions

github-actions bot commented Aug 11, 2023

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this checklist:

  • If you added a new Config method, you wrote extensive documentation for
    it.

    Our documentation should be of the following form:

    Short, simple summary sentence.
    
    More details. These details can be multiple paragraphs. There should be
    information about not just the method, but its parameters and results as
    well.
    
    Is this method fallible? If so, when can it return an error?
    
    Can this method panic? If so, when does it panic?
    
    # Example
    
    Optional example here.
    
  • If you added a new Config method, or modified an existing one, you
    ensured that this configuration is exercised by the fuzz targets.

    For example, if you expose a new strategy for allocating the next instance
    slot inside the pooling allocator, you should ensure that at least one of our
    fuzz targets exercises that new strategy.

    Often, all that is required of you is to ensure that there is a knob for this
    configuration option in wasmtime_fuzzing::Config (or one
    of its nested structs).

    Rarely, this may require authoring a new fuzz target to specifically test this
    configuration. See our docs on fuzzing for more details.

  • If you are enabling a configuration option by default, make sure that it
    has been fuzzed for at least two weeks before turning it on by default.


To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.


@fitzgen fitzgen requested review from a team as code owners August 11, 2023 19:09
@github-actions github-actions bot added the fuzzing and wasmtime:docs labels Aug 11, 2023
@github-actions

Subscribe to Label Action

cc @fitzgen

This issue or pull request has been labeled: "fuzzing", "wasmtime:docs"

Thus the following users have been cc'd because of the following labels:

  • fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


@fitzgen fitzgen force-pushed the refactor-pooling-allocator branch 2 times, most recently from a9587b8 to fcc174e Compare August 11, 2023 22:30
@github-actions github-actions bot added the wasi label Aug 11, 2023
@fitzgen fitzgen requested review from cfallin and removed request for alexcrichton August 14, 2023 23:13
@fitzgen
Member Author

fitzgen commented Aug 14, 2023

Alex is out of office till the end of the week; do you think you could take a look at this @cfallin?

@cfallin
Member

cfallin commented Aug 15, 2023

Alex is out of office till the end of the week; do you think you could take a look at this @cfallin?

I can possibly take a look, but I'm dealing with pretty bad wrist RSI right now and trying to learn to use my machine with voice dictation so it might take quite a lot of time. if it can wait until Alex is back maybe that's better...

@jameysharp
Contributor

The first two commits in this PR are tiny enough that I've just reviewed them and would be happy to sign off on them. Unfortunately the third commit is the interesting part and is a little more overwhelming, and I can't say much about it yet.

On a brief skim I can at least say that moving MemoryAllocationIndex into wasmtime_runtime::Memory like you suggest would remove a bunch of changes which just add .1 into various places. (Similarly for table allocations, I assume?) I don't know what other impact that would have so I'm not sure why you didn't go with that option to begin with.

The other thing that jumps out at me is that extracting MemoryPool/TablePool/StackPool to separate modules looks like it might be easy to split out as a separate PR to reduce the amount of churn in this commit.

@cfallin
Member

cfallin commented Aug 15, 2023

I'd be happy to do a live review over Zoom if that would help... I'm just awfully slow at typing right now!

Member

@cfallin cfallin left a comment

Together with the earlier Zoom review and associated comments, this overall looks great to me! High-quality implementation with good attention paid to safety (e.g. index newtypes). A few comments below as well but nothing too major.

crates/wasmtime/src/config.rs (resolved, outdated)
crates/wasmtime/src/config.rs (resolved)
crates/wasmtime/src/config.rs (resolved)
crates/wasmtime/src/store.rs (resolved, outdated)
@fitzgen
Member Author

fitzgen commented Aug 16, 2023

FWIW, 25 instantiation benchmarks "improved", while 9 "regressed". I think this is basically all within the noise.

sequential/default/data_segments.wat
                        time:   [16.207 µs 16.300 µs 16.396 µs]
                        change: [-14.937% -13.832% -12.750%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
sequential/pooling/data_segments.wat
                        time:   [4.1824 µs 4.2111 µs 4.2460 µs]
                        change: [-1.4282% +0.2191% +1.7369%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

parallel/default/data_segments.wat: with 1 thread
                        time:   [16.374 µs 16.470 µs 16.579 µs]
                        change: [-12.172% -10.245% -8.4601%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
parallel/default/data_segments.wat: with 2 threads
                        time:   [22.580 µs 22.767 µs 22.995 µs]
                        change: [-9.2143% -7.3138% -4.8914%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
parallel/default/data_segments.wat: with 3 threads
                        time:   [32.501 µs 32.825 µs 33.211 µs]
                        change: [-5.8662% -3.5161% -0.8508%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe
parallel/default/data_segments.wat: with 4 threads
                        time:   [55.409 µs 56.588 µs 57.990 µs]
                        change: [-6.6418% -3.2057% +0.3236%] (p = 0.08 > 0.05)
                        No change in performance detected.
parallel/pooling/data_segments.wat: with 1 thread
                        time:   [4.2061 µs 4.2405 µs 4.2806 µs]
                        change: [-0.5488% +1.1592% +2.8516%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
parallel/pooling/data_segments.wat: with 2 threads
                        time:   [4.9711 µs 5.0071 µs 5.0478 µs]
                        change: [+0.0367% +1.6271% +3.1874%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
parallel/pooling/data_segments.wat: with 3 threads
                        time:   [5.5409 µs 5.6522 µs 5.8000 µs]
                        change: [+2.4451% +7.1764% +11.976%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe
parallel/pooling/data_segments.wat: with 4 threads
                        time:   [6.0691 µs 6.3499 µs 6.7057 µs]
                        change: [-1.7330% +7.2518% +17.575%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  20 (20.00%) high severe

deserialize/deserialize/data_segments.wat
                        time:   [33.431 µs 33.809 µs 34.249 µs]
                        change: [+1.4039% +3.1953% +5.1095%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
deserialize/deserialize_file/data_segments.wat
                        time:   [31.674 µs 31.989 µs 32.413 µs]
                        change: [-1.6587% +0.5155% +3.0517%] (p = 0.66 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe

sequential/default/empty.wat
                        time:   [2.8884 µs 2.9012 µs 2.9153 µs]
                        change: [-13.844% -8.4599% -3.6245%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
sequential/pooling/empty.wat
                        time:   [2.9106 µs 2.9333 µs 2.9626 µs]
                        change: [-4.1436% -1.5437% +1.6409%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe

parallel/default/empty.wat: with 1 thread
                        time:   [2.9121 µs 2.9299 µs 2.9504 µs]
                        change: [+1.0205% +2.5318% +4.2334%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  10 (10.00%) high mild
  7 (7.00%) high severe
parallel/default/empty.wat: with 2 threads
                        time:   [3.2615 µs 3.3018 µs 3.3505 µs]
                        change: [+1.9848% +3.3641% +4.8919%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  14 (14.00%) high severe
parallel/default/empty.wat: with 3 threads
                        time:   [3.4042 µs 3.4264 µs 3.4529 µs]
                        change: [+1.3432% +3.2608% +5.7222%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe
parallel/default/empty.wat: with 4 threads
                        time:   [3.5997 µs 3.7497 µs 3.9238 µs]
                        change: [-2.5449% +3.1724% +8.8255%] (p = 0.29 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe
parallel/pooling/empty.wat: with 1 thread
                        time:   [2.9411 µs 2.9568 µs 2.9740 µs]
                        change: [-6.8359% -4.3504% -2.0078%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
parallel/pooling/empty.wat: with 2 threads
                        time:   [3.2878 µs 3.3114 µs 3.3471 µs]
                        change: [-12.572% -10.707% -9.0176%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
parallel/pooling/empty.wat: with 3 threads
                        time:   [3.4820 µs 3.5166 µs 3.5603 µs]
                        change: [-11.531% -8.9122% -6.6623%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
parallel/pooling/empty.wat: with 4 threads
                        time:   [3.6409 µs 3.7537 µs 3.8965 µs]
                        change: [-16.527% -11.957% -6.6742%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  6 (6.00%) high mild
  13 (13.00%) high severe

deserialize/deserialize/empty.wat
                        time:   [30.528 µs 30.743 µs 31.004 µs]
                        change: [-2.9185% -0.4947% +1.5731%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
deserialize/deserialize_file/empty.wat
                        time:   [31.130 µs 31.350 µs 31.596 µs]
                        change: [-1.4528% -0.2984% +0.7568%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

sequential/default/spidermonkey.wasm
                        time:   [17.126 µs 17.343 µs 17.598 µs]
                        change: [-24.108% -23.194% -22.072%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
sequential/pooling/spidermonkey.wasm
                        time:   [5.6331 µs 5.6638 µs 5.6966 µs]
                        change: [-4.0009% -2.8480% -1.6732%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

parallel/default/spidermonkey.wasm: with 1 thread
                        time:   [17.304 µs 17.442 µs 17.610 µs]
                        change: [-23.116% -20.805% -18.284%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
parallel/default/spidermonkey.wasm: with 2 threads
                        time:   [31.028 µs 31.269 µs 31.553 µs]
                        change: [-1.1847% +0.3038% +1.7367%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
parallel/default/spidermonkey.wasm: with 3 threads
                        time:   [39.188 µs 39.735 µs 40.395 µs]
                        change: [-14.835% -12.205% -9.7141%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe
parallel/default/spidermonkey.wasm: with 4 threads
                        time:   [59.137 µs 60.393 µs 61.785 µs]
                        change: [-10.352% -6.7148% -2.9688%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
parallel/pooling/spidermonkey.wasm: with 1 thread
                        time:   [5.7202 µs 5.7692 µs 5.8275 µs]
                        change: [-5.0476% -2.2400% +0.5254%] (p = 0.13 > 0.05)
                        No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
  10 (10.00%) high mild
  3 (3.00%) high severe
parallel/pooling/spidermonkey.wasm: with 2 threads
                        time:   [6.9303 µs 6.9849 µs 7.0466 µs]
                        change: [-0.1129% +4.3773% +9.6818%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe
parallel/pooling/spidermonkey.wasm: with 3 threads
                        time:   [8.0897 µs 8.1976 µs 8.3418 µs]
                        change: [-1.6239% +3.7225% +9.3172%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high severe
parallel/pooling/spidermonkey.wasm: with 4 threads
                        time:   [9.6367 µs 10.057 µs 10.593 µs]
                        change: [-4.8839% +3.2482% +12.152%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 20 outliers among 100 measurements (20.00%)
  5 (5.00%) high mild
  15 (15.00%) high severe

deserialize/deserialize/spidermonkey.wasm
                        time:   [11.630 ms 11.684 ms 11.743 ms]
                        change: [-3.1513% -2.3446% -1.5742%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
Benchmarking deserialize/deserialize_file/spidermonkey.wasm: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.7s, enable flat sampling, or reduce sample count to 60.
deserialize/deserialize_file/spidermonkey.wasm
                        time:   [1.1071 ms 1.1189 ms 1.1354 ms]
                        change: [-1.2033% +1.1429% +3.8242%] (p = 0.38 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe

sequential/default/small_memory.wat
                        time:   [11.358 µs 11.437 µs 11.523 µs]
                        change: [-39.079% -38.266% -37.471%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
sequential/pooling/small_memory.wat
                        time:   [3.9798 µs 4.0016 µs 4.0261 µs]
                        change: [+1.0485% +2.5775% +4.5576%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

parallel/default/small_memory.wat: with 1 thread
                        time:   [11.213 µs 11.275 µs 11.353 µs]
                        change: [-32.990% -32.172% -31.378%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
parallel/default/small_memory.wat: with 2 threads
                        time:   [20.071 µs 20.145 µs 20.225 µs]
                        change: [-12.427% -11.448% -10.464%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
parallel/default/small_memory.wat: with 3 threads
                        time:   [25.655 µs 25.978 µs 26.414 µs]
                        change: [-15.242% -12.123% -9.2153%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
parallel/default/small_memory.wat: with 4 threads
                        time:   [37.618 µs 38.369 µs 39.122 µs]
                        change: [-15.044% -12.768% -10.341%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
parallel/pooling/small_memory.wat: with 1 thread
                        time:   [3.9885 µs 4.0232 µs 4.0678 µs]
                        change: [+0.6034% +2.4216% +4.0956%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
parallel/pooling/small_memory.wat: with 2 threads
                        time:   [4.7548 µs 4.7850 µs 4.8202 µs]
                        change: [-0.2723% +0.8170% +1.7389%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
parallel/pooling/small_memory.wat: with 3 threads
                        time:   [4.9112 µs 4.9459 µs 4.9913 µs]
                        change: [-2.0811% +1.1116% +4.6606%] (p = 0.54 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe
parallel/pooling/small_memory.wat: with 4 threads
                        time:   [5.4325 µs 5.7838 µs 6.2023 µs]
                        change: [+1.8127% +10.297% +19.397%] (p = 0.02 < 0.05)
                        Performance has regressed.
Found 21 outliers among 100 measurements (21.00%)
  1 (1.00%) high mild
  20 (20.00%) high severe

deserialize/deserialize/small_memory.wat
                        time:   [30.289 µs 30.493 µs 30.719 µs]
                        change: [-1.3081% +1.0727% +3.6305%] (p = 0.42 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
deserialize/deserialize_file/small_memory.wat
                        time:   [31.315 µs 31.534 µs 31.788 µs]
                        change: [-0.9317% +0.3888% +1.8105%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe

sequential/default/wasi.wasm
                        time:   [16.558 µs 16.649 µs 16.750 µs]
                        change: [-25.611% -24.753% -23.899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
sequential/pooling/wasi.wasm
                        time:   [5.4302 µs 5.4648 µs 5.5051 µs]
                        change: [-1.5963% -0.4439% +0.6725%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

parallel/default/wasi.wasm: with 1 thread
                        time:   [16.963 µs 17.099 µs 17.247 µs]
                        change: [-22.726% -21.819% -20.838%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
parallel/default/wasi.wasm: with 2 threads
                        time:   [29.800 µs 30.070 µs 30.396 µs]
                        change: [-8.0261% -6.9323% -5.8986%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
parallel/default/wasi.wasm: with 3 threads
                        time:   [41.575 µs 42.025 µs 42.543 µs]
                        change: [-14.413% -12.331% -10.346%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
parallel/default/wasi.wasm: with 4 threads
                        time:   [62.538 µs 63.781 µs 65.096 µs]
                        change: [-12.456% -9.5987% -6.6781%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
parallel/pooling/wasi.wasm: with 1 thread
                        time:   [5.4351 µs 5.4636 µs 5.4958 µs]
                        change: [-4.5508% -3.2820% -2.1305%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
parallel/pooling/wasi.wasm: with 2 threads
                        time:   [6.6933 µs 6.7343 µs 6.7819 µs]
                        change: [+4.2012% +6.7091% +9.2872%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) high mild
  9 (9.00%) high severe
parallel/pooling/wasi.wasm: with 3 threads
                        time:   [7.8589 µs 7.9351 µs 8.0393 µs]
                        change: [+2.0956% +3.7618% +5.3506%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
parallel/pooling/wasi.wasm: with 4 threads
                        time:   [9.5130 µs 9.8876 µs 10.356 µs]
                        change: [-6.2590% +0.6151% +7.5760%] (p = 0.86 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

deserialize/deserialize/wasi.wasm
                        time:   [204.86 µs 206.12 µs 207.50 µs]
                        change: [-0.8695% +1.3453% +3.7360%] (p = 0.25 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
deserialize/deserialize_file/wasi.wasm
                        time:   [80.138 µs 80.533 µs 80.982 µs]
                        change: [-4.6080% -1.8013% +1.1720%] (p = 0.24 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

@fitzgen fitzgen added this pull request to the merge queue Aug 17, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 17, 2023
@fitzgen fitzgen added this pull request to the merge queue Aug 17, 2023
The exact `cfg`s that unlock the tests that use these are platform- and feature-
dependent and end up being about five conditions long. It's simpler to just
allow unused for when we are testing on other platforms or don't have the
compile-time features enabled.
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 17, 2023
@fitzgen fitzgen enabled auto-merge August 17, 2023 19:19
@fitzgen fitzgen added this pull request to the merge queue Aug 17, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 17, 2023
Also fix a couple scenarios where we could leak indices if allocating an index
for a memory/table succeeded but then creating the memory/table itself failed.
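
The index-leak fix described in that commit message follows a standard pattern: if constructing the resource fails after its index was already claimed, return the index to the allocator before propagating the error. A hedged sketch with hypothetical names (not the actual Wasmtime code):

```rust
// Sketch of the leak fix: if creating the memory fails after an index was
// claimed, the index must be given back before returning the error, or
// that pool slot leaks forever.
struct IndexAllocator {
    free: Vec<u32>,
}

impl IndexAllocator {
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }
    fn dealloc(&mut self, index: u32) {
        self.free.push(index);
    }
}

struct Memory;

// Stand-in for actually creating/mapping a memory; `fail` simulates an
// OS-level failure such as mmap returning an error.
fn create_memory(fail: bool) -> Result<Memory, String> {
    if fail { Err("mmap failed".to_string()) } else { Ok(Memory) }
}

fn allocate(alloc: &mut IndexAllocator, fail: bool) -> Result<(u32, Memory), String> {
    let index = alloc.alloc().ok_or("pool exhausted")?;
    match create_memory(fail) {
        Ok(memory) => Ok((index, memory)),
        Err(e) => {
            // The fix: give the index back instead of leaking it.
            alloc.dealloc(index);
            Err(e)
        }
    }
}

fn main() {
    let mut a = IndexAllocator { free: vec![0] };
    assert!(allocate(&mut a, true).is_err());
    // Because the error path deallocated, the slot is still usable.
    assert!(allocate(&mut a, false).is_ok());
    println!("ok");
}
```

The debug assertion that the pool is empty on drop (also mentioned above) is what catches this class of bug: any leaked index shows up as a non-empty free-list deficit when the pool is torn down.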
@fitzgen fitzgen enabled auto-merge August 18, 2023 19:52
@fitzgen fitzgen added this pull request to the merge queue Aug 18, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 18, 2023
@fitzgen fitzgen enabled auto-merge August 18, 2023 21:01
@fitzgen fitzgen added this pull request to the merge queue Aug 18, 2023
Merged via the queue into bytecodealliance:main with commit a34427a Aug 18, 2023
18 checks passed
@fitzgen fitzgen deleted the refactor-pooling-allocator branch August 18, 2023 22:04
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Aug 19, 2023
…ance#6835)

* Wasmtime: Rename `IndexAllocator` to `ModuleAffinityIndexAllocator`

We will have multiple kinds of index allocators soon, so clarify which one this
is.

* Wasmtime: Introduce a simple index allocator

This will be used in future commits refactoring the pooling allocator.

* Wasmtime: refactor the pooling allocator for components

We used to have one index allocator, an index per instance, and give out N
tables and M memories to every instance regardless of how many tables and
memories it actually needed.

Now we have one index allocator for memories and another for tables. An
instance is no longer associated with a single index; instead, each of its
memories and tables has its own index. We allocate exactly as many tables and
memories as the instance actually needs.

Ultimately, this gives us better component support, where a component instance
might have varying numbers of internal tables and memories.

Additionally, you can now limit the number of tables, memories, and core
instances a single component can allocate from the pooling allocator, even if
there is the capacity for that many available. This is to give embedders tools
to limit individual component instances and prevent them from hogging too much
of the pooling allocator's resources.
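To make the shape of this change concrete, here is a minimal sketch of a "simple" index allocator: a free list of slot indices, with one allocator per resource kind (memories, tables) rather than one index per instance. The type and method names are illustrative, not Wasmtime's actual internals.

```rust
// A free-list index allocator. The pooling allocator would hold one of
// these per resource kind (one for memories, one for tables).
struct SimpleIndexAllocator {
    free: Vec<u32>,
}

impl SimpleIndexAllocator {
    fn new(capacity: u32) -> Self {
        // Reversed so that `pop` hands out the lowest indices first.
        Self { free: (0..capacity).rev().collect() }
    }

    /// Hand out a free slot index, or `None` if the pool is exhausted.
    fn alloc(&mut self) -> Option<u32> {
        self.free.pop()
    }

    /// Return a slot index to the pool when its resource is deallocated.
    fn dealloc(&mut self, index: u32) {
        debug_assert!(!self.free.contains(&index), "double free of slot index");
        self.free.push(index);
    }
}

fn main() {
    // One allocator per resource kind, instead of one index per instance.
    let mut memories = SimpleIndexAllocator::new(2);
    let mut tables = SimpleIndexAllocator::new(2);
    let m0 = memories.alloc().unwrap();
    let t0 = tables.alloc().unwrap();
    println!("memory slot {m0}, table slot {t0}");
    memories.dealloc(m0);
    tables.dealloc(t0);
}
```

With this split, a component instance that needs three memories and one table takes three memory slots and one table slot, instead of reserving the per-instance maximum of both.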

* Remove unused file

Messed up from rebasing, this code is actually just inline in the index
allocator module.

* Address review feedback

* Fix benchmarks build

* Fix ignoring test under miri

The `async_functions` module is not even compiled-but-ignored with miri; it is
completely `cfg`ed off. Therefore we have to do the same with this test that
imports stuff from that module.

* Fix doc links

* Allow testing utilities to be unused

The exact `cfg`s that unlock the tests that use these utilities are platform-
and feature-dependent and end up being something like five conditions, which is
super long. It's simpler to just allow unused when we are testing on other
platforms or don't have the compile-time features enabled.

* Debug assert that the pool is empty on drop, per Alex's suggestion

Also fix a couple scenarios where we could leak indices if allocating an index
for a memory/table succeeded but then creating the memory/table itself failed.

* Fix windows compile errors
geekbeast pushed a commit to geekbeast/wasmtime that referenced this pull request Aug 21, 2023
…eature/kserve

* 'feature/kserve' of github.com:geekbeast/wasmtime:
  Refactor Wasmtime CLI to support components (bytecodealliance#6836)
  Bump the wasm-tools family of crates (bytecodealliance#6861)
  Wasmtime: refactor the pooling allocator for components (bytecodealliance#6835)
@alexcrichton alexcrichton left a comment
This all looks great to me, thanks again for tackling this!

/// The `TableAllocationIndex` was given from our `InstanceAllocator` and
/// must be given back to the instance allocator when deallocating each
/// table.
tables: PrimaryMap<DefinedTableIndex, (TableAllocationIndex, Table)>,
I mentioned this in person as well, but would it be possible to eschew this index (and the one above) and infer the index from an address in the pooling allocator?
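The suggestion above relies on a property of pooling allocators: all slots live in one contiguous region with a fixed stride, so a slot index can be recomputed from an address instead of being stored alongside each table or memory. A hedged sketch of that arithmetic, with hypothetical `base` and `slot_size` parameters:

```rust
// Recover a slot index from an address inside a pooled region where slot
// `i` starts at `base + i * slot_size`. Purely illustrative; Wasmtime's
// real pool layout and types differ.
fn index_from_addr(base: usize, slot_size: usize, addr: usize) -> usize {
    debug_assert!(addr >= base, "address below the pool's base");
    debug_assert_eq!((addr - base) % slot_size, 0, "address not slot-aligned");
    (addr - base) / slot_size
}

fn main() {
    let base = 0x1000;
    let slot_size = 0x10000; // 64 KiB per slot, for illustration
    let addr = base + 3 * slot_size;
    assert_eq!(index_from_addr(base, slot_size, addr), 3);
}
```

The trade-off is that the address-to-index mapping only works for addresses actually inside the pool, so any path that can see a non-pooled table or memory still needs another way to tell the two apart.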

// Every `InstanceAllocatorImpl` is an `InstanceAllocator` when used
// correctly. Also, no one is allowed to override this trait's methods, they
// must use the defaults. This blanket impl provides both of those things.
impl<T: InstanceAllocatorImpl> InstanceAllocator for T {}
Would it be possible to keep this as one trait? Inferring why this was split into two, it seems like it wants to guarantee that the default implementations of methods are used, but this is purely internal and it's already an unsafe trait, so I think that should be enough to cover the bases? (I don't think we're at risk of duplicating these default trait method implementations anywhere)
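For readers unfamiliar with the pattern being discussed: splitting into two traits with a blanket impl means the "impl" trait holds the overridable hooks, while the public trait's default methods can never be overridden, because the blanket impl is the only impl of it. A minimal sketch with illustrative names (not Wasmtime's actual traits):

```rust
// The inner trait: the only thing implementors are allowed to customize.
trait AllocatorImpl {
    fn raw_alloc(&self) -> u32;
}

// The public trait: default methods that callers use. No type implements
// this directly, so the defaults cannot be overridden.
trait Allocator: AllocatorImpl {
    fn alloc_checked(&self) -> Option<u32> {
        let n = self.raw_alloc();
        if n == u32::MAX { None } else { Some(n) }
    }
}

// Every `AllocatorImpl` is an `Allocator`, using exactly the defaults.
impl<T: AllocatorImpl> Allocator for T {}

struct Pool;
impl AllocatorImpl for Pool {
    fn raw_alloc(&self) -> u32 {
        7
    }
}

fn main() {
    assert_eq!(Pool.alloc_checked(), Some(7));
}
```

The review comment's point is that since the trait is internal and already `unsafe`, a single trait with default methods would likely provide the same practical guarantee with less machinery.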

let table = mem::take(table);
assert!(table.is_static());
fn decrement_core_instance_count(&self) {
self.live_core_instances.fetch_sub(1, Ordering::AcqRel);
Mind throwing in a debug assert here that the return value is not 0? (e.g. this never goes negative)

(and the other decrement methods too)
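The suggested assertion works because `fetch_sub` returns the counter's previous value, so checking that it is non-zero catches the counter wrapping "negative" in debug builds. A hedged sketch with an illustrative struct (field names match the snippet above, but the surrounding type is invented):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Counters {
    live_core_instances: AtomicUsize,
}

impl Counters {
    fn decrement_core_instance_count(&self) {
        // `fetch_sub` returns the value *before* the subtraction, so a
        // previous value of 0 means the counter just wrapped around.
        let prev = self.live_core_instances.fetch_sub(1, Ordering::AcqRel);
        debug_assert!(prev != 0, "live core instance count underflow");
    }
}

fn main() {
    let c = Counters { live_core_instances: AtomicUsize::new(1) };
    c.decrement_core_instance_count();
    assert_eq!(c.live_core_instances.load(Ordering::Acquire), 0);
}
```

The same check applies to each of the other decrement methods (memories, tables), since every counter shares the increment/decrement pairing.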

alexcrichton added a commit to alexcrichton/wasmtime that referenced this pull request Aug 31, 2023
This commit addresses some more fallout from bytecodealliance#6835 by updating some
error messages and adding clauses for new conditions. Namely:

* Module compilation is now allowed to fail when the module may have
  more memories/tables than the pooling allocator allows per-module.
* The error message for the core instance limit being reached has been
  updated.
github-merge-queue bot pushed a commit that referenced this pull request Aug 31, 2023
* Fix some warnings on nightly Rust

* Fix some more fuzz-test cases from pooling changes

This commit addresses some more fallout from #6835 by updating some
error messages and adding clauses for new conditions. Namely:

* Module compilation is now allowed to fail when the module may have
  more memories/tables than the pooling allocator allows per-module.
* The error message for the core instance limit being reached has been
  updated.
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Sep 6, 2023
…6943)

* Fix some warnings on nightly Rust

* Fix some more fuzz-test cases from pooling changes

This commit addresses some more fallout from bytecodealliance#6835 by updating some
error messages and adding clauses for new conditions. Namely:

* Module compilation is now allowed to fail when the module may have
  more memories/tables than the pooling allocator allows per-module.
* The error message for the core instance limit being reached has been
  updated.
Labels
fuzzing Issues related to our fuzzing infrastructure wasi Issues pertaining to WASI wasmtime:api Related to the API of the `wasmtime` crate itself wasmtime:config Issues related to the configuration of Wasmtime wasmtime:docs Issues related to Wasmtime's documentation
4 participants