Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Poor performance using Metal on MacOS #1699

Open
piedoom opened this issue Jun 12, 2019 · 24 comments

Comments

@piedoom
Copy link
Contributor

commented Jun 12, 2019

Description

When running certain Amethyst examples (as well as in games that use Rendy), 3D performance is poor on MacOS. 2D seems to be unaffected.

Reproduction Steps

  1. Use a MacOS
  2. Run any example with --release (preferably one that contains moving elements or an fps counter)
  3. It may take a bit for the performance to slow. Try running several examples.

What You Expected to Happen

Assets should load, display, and refresh at a reasonable rate

What Actually Happened

Assets loaded and displayed, but FPS is poor at ~1-5 fps.

Your Environment

MacOS mojave 10.14.3, Amethyst Master
GPU: Radeon Pro 455 2048 MB (Additionally, Intel HD Graphics 530 1536MB, but I suspect it is using the former)

Additional Context

~~This happens only with select examples, and may be due to a pass(?). For example, Debug Lines example runs great, and so does the animation example. Arc Ball Camera and Rendy example all have poor performance. ~~ this happens with all examples

@piedoom piedoom changed the title [BUG] Poor performance using Metal / Intel on MacOS [BUG] Poor 3D performance using Metal / Intel on MacOS Jun 12, 2019

@piedoom piedoom changed the title [BUG] Poor 3D performance using Metal / Intel on MacOS [BUG] Poor 3D performance using Metal on MacOS Jun 12, 2019

@piedoom piedoom changed the title [BUG] Poor 3D performance using Metal on MacOS [BUG] Poor performance using Metal on MacOS Jun 13, 2019

@piedoom

This comment has been minimized.

Copy link
Contributor Author

commented Jun 13, 2019

Update: after some further testing, I can confirm it seems that this affects all amethyst projects on my computer, including 2D.

Oddly enough, the first few times I ran examples, they worked fine. After a while though, the performance took a hit for all subsequent examples. I'm not sure if this is a local issue or something like a throttling issue, etc. If anyone can test and repro let me know.

@Frizi

This comment has been minimized.

Copy link
Member

commented Jun 14, 2019

We did some further testing, and it seems like the slowdown is likely happening on buffer writes. Single update of view matrix took 100ms on it's own. But this turned out to not be stable actually. Also the issue sometimes randomly disappear and performance looks decent..

@jaynus

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2019

That sounds like it must be an API issue within HAL, or how it is performing buffer writes. 100ms for a uniform buffer write is beyond insane - something has to be happening incorrectly under the hood for that specific set of hardware.

@cgati

This comment has been minimized.

Copy link

commented Jun 16, 2019

Attaching thread_profile_debug.json.gz at the request of @Frizi from our Discord discussion.

Also experienced the same issue on the release build, at no time was the build performant.
thread_profile_release.json.gz

Running on macOS 10.14.5 on tag v0.11.0 (e75dc2c) with a Radeon Pro 560X and Intel UHD Graphics 630

This was the "pong" example.

@Frizi

This comment has been minimized.

Copy link
Member

commented Jun 17, 2019

@cgati those profiles looks very similarily to what i've got from @piedoom before, i.e. most calls that write anything are super slow, but also not every single one of them. It looks really weird and kinda unpredictible. It might be a driver bug, but who knows. I suspect that the root cause is in how HAL translates writes to metal, but to confirm that we would have to reproduce it outside of amethyst. One way to do that would be to try running something on vulkan through hal portability, like DOTA or dolphin emulator for example (those two were tested quite a lot by gfx team, so should otherwise be quite stable). If those are smooth, then we have to find what we specificially do that makes it slow, and other engines are not doing.

Also another approach is to profile it in greater details, see what exact code is executing so slowly/what stalls the GPU if anything. Xcode's "Instruments" can help here, but i don't know how to get meaningful data from it yet. I also don't have a hardware to reproduce that, so can't really experiment on my own beyond just learning the tools on fully performant setup.

@ohnoimdead

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2019

Another thread_profile.json.zip for data collection.

@ohnoimdead

This comment has been minimized.

Copy link
Contributor

commented Jun 18, 2019

And a pong.log.zip with some potentially relevant gfx_backend_metal::window warns

@distransient distransient added this to Needs triage in Bugs Jun 18, 2019

@distransient distransient moved this from Needs triage to Important priority in Bugs Jun 18, 2019

@OliverHofkens

This comment has been minimized.

Copy link

commented Jul 7, 2019

I'm hitting the same issue in my game: the first couple of seconds everything works great, with the frame limiter doing its job just fine. After 2.5-3 seconds, amethyst_rendy::pass::flat2d:write starts taking around 100ms. Every now and then, things speed up again for a couple of frames.

I'm using just a DrawFlat2DTransparentDesc pass, with some 8x8 sprites.

My environment:

OS X 10.14.5
Radeon Pro 555X (or Intel UHD Graphics 630)
rustc 1.36.0 (a53f9df32 2019-07-03)
Amethyst 0.11.0

Profiling:
thread_profile.json.gz

@BrianMWest

This comment has been minimized.

Copy link

commented Jul 14, 2019

I am encountering this issue with a project without any systems activated and also on the pong example in 0.11.1. It runs at probably 1-2 FPS. Here is my thread profile. I tried disabling the frame limiter and still see the issue with or without it.
thread_profile.json.gz

@Frizi

This comment has been minimized.

Copy link
Member

commented Jul 16, 2019

All of the thread profiles shows that gpu memory write operations are super slow. I suspect it is either something about rendy or hal that makes it super slow on those drivers. I need more granular profile to actually pinpoint the exact problem. Can somebody run this through xcode tools to gather function-level timings and see what takes so much time inside hal?

@Frizi

This comment has been minimized.

Copy link
Member

commented Jul 17, 2019

For anyone seeing this still, i'd really appreciate a profile. Here are steps how to gather a flamegraph on recent macos.

  1. clone amethyst and add this to main cargo.toml
[profile.release]
debug = true
  1. install cargo flamegraph
cargo install flamegraph
  1. in project directory, build 'rendy' example
cargo build --release --example rendy --features="animation gltf metal"
  1. copy executable to amethyst directory and run with profiling (sudo required because for macos dtrace special reasons)
cp ./target/release/examples/rendy .
sudo flamegraph ./rendy

keep it running for a short while and press ESC.

There should be a "flamegraph.svg" generated. If it failed with an error, try again.

Please zip and upload that generated file here.

@jaynus

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2019

pinging for visibility @OliverHofkens @piedoom @cgati @ohnoimdead

@OliverHofkens

This comment has been minimized.

Copy link

commented Jul 17, 2019

@Frizi Thanks for the clear steps!

The Rendy example acts really weird. It shows a window with the moving spheres for a second and then crashes with the following error log, complaining about missing assets:
rendy_example.log.zip

I've tried multiple times, the same thing happens every time. You probably can't learn much from the flamegraph in this state:
flamegraph.svg.zip

Other examples seem to work fine.

@OliverHofkens

This comment has been minimized.

Copy link

commented Jul 17, 2019

Not sure if this is representative but this is the flamegraph of my own project (details in my previous comment: #1699 (comment)):

flamegraph.svg.zip

@Frizi

This comment has been minimized.

Copy link
Member

commented Jul 17, 2019

@OliverHofkens Hard to tell from your project's flamegraph really. I can't see anything from your second graph. The errors in rendy example are because you have cloned the repository without git-lfs. Try installing it and doing git lfs fetch.

Alternatively, You can try running animation example instead, but please verify if it's still stuttering by activating an animation (press space or other keys).

@asgeir

This comment has been minimized.

Copy link

commented Jul 22, 2019

The rendy example just crashes so I ran the animation example (which was constantly complaining about "Failed to acquire animated component." but did at-least move the sphere around when I pressed space)
flamegraph.svg.zip

@OliverHofkens

This comment has been minimized.

Copy link

commented Jul 29, 2019

Hey again, sorry for the late reply. I fetched the assets with git lfs fetch, but the rendy example still crashes with similar errors.
The animation example works fine, no stuttering at all. It's constantly complaining about acquiring an animated component though, just like @asgeir says...

@vultix

This comment has been minimized.

Copy link
Contributor

commented Aug 11, 2019

@Frizi Here is my flamegraph for the rendy example as you requested:
flamegraph.svg.zip
The framerate was terrible (as it always is for me) during the run. I see the same low framerate for the evoli project as well, so if you want a flamegraph of that project let me know!

In case you need device information:
2018 15-inch Macbook Pro
Processor: 2.9 GHz Intel Core i9
Memory: 32 GB 2400 MHz DDR4
Graphics: Radeon Pro 560X 4 GB, Intel UHD Graphics 630 1536 MB

@asgeir @OliverHofkens I initially had the same problem, git lfs fetch executed but the rendy example would crash. I had to completely remove the project and run git clone again to fix the issue.

@blakehawkins

This comment has been minimized.

Copy link

commented Aug 12, 2019

Looks like master is not building the rendy example at all:

$ cargo build --release --example rendy --features="animation gltf metal"
   Compiling amethyst v0.12.0 (/Users/blakeh/Documents/blakehawkins/amethyst)
error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:601:77
    |
601 |             PrefabLoaderSystemDesc::<ScenePrefabData>::default().build(&mut world),
    |                                                                             ^^^^^ not found in this scope

error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:606:45
    |
606 |             GltfSceneLoaderSystem::new(&mut world),
    |                                             ^^^^^ not found in this scope

error[E0425]: cannot find value `assets_directory` in this scope
   --> examples/rendy/main.rs:657:22
    |
657 |         assets_dir & assets_directory,
    |                      ^^^^^^^^^^^^^^^^ not found in this scope

error[E0425]: cannot find value `world` in this scope
   --> examples/rendy/main.rs:660:9
    |
660 |         world,
    |         ^^^^^ not found in this scope

warning: unused import: `ReadExpect`
  --> examples/rendy/main.rs:16:13
   |
16 |             ReadExpect, ReadStorage, System, SystemData, World, Write, WriteStorage,
   |             ^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error[E0061]: this function takes 3 parameters but 4 parameters were supplied
   --> examples/rendy/main.rs:656:20
    |
656 |       let mut game = Application::new(
    |  ____________________^
657 | |         assets_dir & assets_directory,
658 | |         Example::new(),
659 | |         game_data,
660 | |         world,
661 | |     )?;
    | |_____^ expected 3 parameters

error[E0369]: binary operation `&` cannot be applied to type `std::path::PathBuf`
   --> examples/rendy/main.rs:657:20
    |
657 |         assets_dir & assets_directory,
    |         ---------- ^ ---------------- _
    |         |
    |         std::path::PathBuf
    |
    = note: an implementation of `std::ops::BitAnd` might be missing for `std::path::PathBuf`

error: aborting due to 6 previous errors

Some errors have detailed explanations: E0061, E0369, E0425.
For more information about an error, try `rustc --explain E0061`.
error: Could not compile `amethyst`.

To learn more, run the command again with --verbose.

@OliverHofkens

This comment has been minimized.

Copy link

commented Aug 15, 2019

I just upgraded my project to Amethyst v0.12.0 and it feels like this problem has become worse (although I can't prove it). Rendy also complains more in the terminal.

I tried to build the rendy example on the v0.12.0 tag, which compiles, but still fails after drawing 2 or 3 frames with: Failed to load sub asset Asset("prefab/rendy_example_scene.ron"). The file definitely exists in the correct location, as do all files imported or mentioned within. I tried to completely remove the project and start over as @vultix suggested, but that didn't seem to help.

Not sure this might help but this is the flamegraph of the first seconds before the crash:
flamegraph.svg.zip

@ekarademir

This comment has been minimized.

Copy link

commented Aug 18, 2019

I'm experiencing a similar issue.

Amethyst v0.12.0
macOS High Sierra v10.13.6
MacBook Pro 15 inch
Intel Core i7 3.1 GHz
Radeon Pro 560 and Intel HD Graphics 630
16 GB Memory

Following the profiling recipe posted by @Frizi above. Rendy example on master has problems so you'd need to checkout the latest stable tag:

git checkout -b v0.12.0

Here is the zipped flamegraph for rendy example
flamegraph.svg.zip

My issue emerged while following the pong tutorial and when I came to the step where we move the paddles. It was very sluggish. I blamed axes emulation but then got sluggish results with renderable example.

I've also tried other libraries that use metal, like gfx-rs (which amethyst relies on) also wgpu-rs, their examples run fine.

Interesting note. If I connect external screens, I get better performance. Unplug the screens, sluggish again.

Also while running CPU load didn't go beyond 2%

@Deedasmi

This comment has been minimized.

Copy link

commented Sep 6, 2019

Same issue.

Rust: rustc 1.37.0 (eae3437df 2019-08-13)
Amethyst: Master
Mac:
Screen Shot 2019-09-06 at 9 22 47 AM

Both examples built in release mode with feature=metal

sudo flamegraph ./animation 2> animation.err > animation.log
animation.tar.gz

sudo flamegraph ./pong_tutorial_05 2> pong_05.err > pong_05.log
pong.tar.gz

Fairly short runs for the sake of clean graphs. Performance degrades rapidly the longer it runs. Rough eye-timing for Pong goes something like 6->0.2->0.05 FPS in 1 minute intervals. Game continues to run, and eventual frame updates show massive jumps in scores. Only render pass seems to fall behind. I stopped it early, but usually "No frame is available" starts getting output to the stdout.

@Frizi

This comment has been minimized.

Copy link
Member

commented Sep 6, 2019

@Deedasmi It would actually be benefitial to let it run a little bit longer, to see what really takes time once the slowdown happens. From your trace it seems like the most cpu time is spent spinlocking inside rayon. Second biggest is eventloop. There is also visible queue submit, but the time spent there is minimal.

This pattern is actually visible in most of the graphs submitted here. I don't think the bug is really about event loop, that contradicts our manual tracing logs that show much time being spent on buffer writes.

I think we need better way to profile this. Those flamegraphs are unfortunately way less useful than I anticipated. I don't know much about profiling tools available on macos though, all i can guess is that there is a right one inside "instruments" app.

@Deedasmi

This comment has been minimized.

Copy link

commented Sep 6, 2019

Ran it again this morning and got same 3-5 FPS, then it completely stopped rendering at 10/10. The log shows the score counting up between numerous "Frame not available" messages, but the screen never updated.
pong_05_2.tar.gz

Looking into the instruments app now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.