Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mysteriously poor performance on macOS #7644

Open
nielsmh opened this issue Jul 8, 2019 · 72 comments

Comments

@nielsmh
Copy link
Contributor

@nielsmh nielsmh commented Jul 8, 2019

Version of OpenTTD

1.9.0 and later, at least

Expected result

All operating systems have comparable performance when running on comparable hardware.

Actual result

Several people are reporting that the macOS version of OpenTTD has unreasonably poor performance, and the framerate window does not reveal any obvious reason.

This bug covers the situation when all of the following occur at the same time:

  • Game is running on macOS
  • Frame rate on a new game (256x256) with no mods is below 30 fps (typically even below 10 fps)
  • Sum of "Game loop", "Drawing" and "Video output" times in the Frame rate window is below 10 ms
  • Toggling "Full animation" does not have a significant impact on performance

Reported on TT-Forums and this comment on bug #7247.

Note that this is not related to #7247, as this bug manifests even when there are zero vehicles in game. The poor performance is also caused by something not being measured, since the sum of the game loop, drawing, and video output times do not add up to the time it should to cause the total frame rate.

Steps to reproduce

Run official OpenTTD builds on a macOS 10.14 system.

@nielsmh nielsmh added bug Mac OS labels Jul 8, 2019
@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Jul 8, 2019

This manifests for me on Mojave (10.14) and did not occur on High Sierra (10.13).

Notably:

  • FFWD was insanely fast on 10.13 following OpenTTD improvements in 2018-2019
  • FFWD barely works on 10.14
  • on a small game (2 AIs, 1500 vehicles total, 512x256 map), performance is dropping to 16 FPS, and one CPU core is pegged at 100% by OpenTTD
@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Jul 8, 2019

Some additional notes:

  • The total frame time budget in OpenTTD is 30 ms. The target frame rate is 33.33 fps, which is reached by having on average 30 ms between each frame.
  • This means that as long as the total of game loop (time taken simulating the game world), drawing (time producing the graphics for display on screen), and video output (time taken copying the produced graphics to the OS display driver) is less than 30 ms, the frame rate should be roughly 33.33 fps, when not in fast-forward mode.
  • Converting frame rate to frame time: ft = 1000 / fps
  • Converting frame time to frame rate: fps = 1000 / ft
  • When not in fast-forward mode, the OpenTTD "video driver" component inserts delays between frames to reach an average of 30 ms frame time.
  • When fast-forward mode is enabled, these delays are not inserted and instead the game is simulated and drawn as fast as the computer can handle.

Example, taken from times measured on my Windows system running 1.9.2 64 bit:

  • Frame rate on the title screen is about 32.96 fps, so the frame time is about 30.33 ms
  • The game loop, drawing, and video output times are 0.50 ms, 2.62 ms, 2.50 ms, the sum is 5.62 ms, so if I could enable fast-forward on the title screen, the expected rate would be at least 177.94 fps.
  • The actual rate in fast-forward mode can be higher, since a recent change allows multiple game ticks to be simulated between each draw, so the total game loop rate might end up closer to 2000 fps.

The frame rate window (and fps console command) also display a line for Sound mixing, this is generally not relevant, since sound mixing is run in a separate thread. It should never affect total frame rate. (It could get relevant if someone reported crackling/popping sound.)

@wousser

This comment has been minimized.

Copy link

@wousser wousser commented Jul 9, 2019

As requested:

ProductName:	Mac OS X
ProductVersion:	10.14.5
BuildVersion:	18F132

1.8.0
No issues
Both with Full Animation Enabled or Disabled the game runs smoothly. (No fps command in 1.8.0)

1.9.0
Full animation Enabled:

Game loop rate: 14.94fps (expected 33.33fps)
Drawing rate: 14.96fps (expected 33.33fps)
Video output rate: 35.99fps (expected 60.00fps)

Full animation Disabled:

Game loop rate: 29.18fps (expected 33.33fps)
Drawing rate: 29.28fps (expected 33.33fps)
Video output rate: 155.36fps (expected 60.00fps)

1.9.2
Full animation Enabled:

Game loop rate: 14.70fps (expected 33.33fps)
Drawing rate: 14.66fps (expected 33.33fps)
Video output rate: 54.38fps (expected 60.00fps)

Full animation Disabled:

Game loop rate: 30.87fps (expected 33.33fps)
Drawing rate: 30.88fps (expected 33.33fps)
Video output rate: 195.84fps (expected 60.00fps)

I'd like to add that this only occurs when playing full-screen or full window. If I reduce the window size to 1/4 of the screen, the FPS increases to ~30.

@Aliaric

This comment has been minimized.

Copy link

@Aliaric Aliaric commented Jul 12, 2019

Confirm
ProductName: Mac OS X
ProductVersion: 10.14.5

MacBook Pro 15 2017

@AbeWJS

This comment has been minimized.

Copy link

@AbeWJS AbeWJS commented Aug 16, 2019

OpenTTD is causing color space / pixel format conversion in the background. Likely due to mismatch of backing surface format chosen by OpenTTD and native surface format of the display/OS.

The slowdown scales with OpenTTD resolution and Mac machine's high resolution exacerbates the problem. This format/color space conversion happens on OpenTTDs main thread after a call to CGContextDrawImageWithOptions.

I don't have time to look at this in the source, but that is where the problem comes from. Perhaps these pointers are enough information?

@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Aug 16, 2019

I don't have any skills in Mac debugging, but I used Apple instructions to set up an Instruments run, long paste of results here: https://paste.openttdcoop.org/pef0g4kgn/mnd7ty/raw

TL;DR

  • 98.5% of time is spent in OTTD_QuartzView drawRect
  • digging down, 29% of the time is spent in RGBAf16_sample_RGBAf_inner

A cursory google search turns up a few other people with similar issue, nothing conclusive

Apple developer guide is here: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/CocoaViewsGuide/Optimizing/Optimizing.html#//apple_ref/doc/uid/TP40002978-CH11-SW4

My results are for macOS 10.14, on a 2.7Ghz i7

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 16, 2019

So if the rendering was changed to use explicit GPU acceleration instead of blitting 2D surfaces, it might resolve the issue? Since the GPU could receive texture data in our native pixel format, and a shader could convert it as necessary.

@SoothedTau

This comment has been minimized.

Copy link

@SoothedTau SoothedTau commented Aug 16, 2019

@nielsmh

You could indeed switch to a MTKView (or other Metal view) and use Metal to do the transfer while having a shader convert the image to the final format. This would work just fine.

Though I don't think you have to do it via Metal as it should be possible to set things up correctly without requiring Metal. This new behaviour likely has to do with changes to Layer Backed Views introduced in 10.14 SDK, I'll see if I can find out what exactly introduces the observed behavior.

In the end it's just a choice but I could imagine not wanting to introduce a new dependency if it's not necessary.

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 16, 2019

I think we definitely want to support macOS versions as far back as reasonably possible, at least not having hard dependencies on newer features.

@faeton

This comment has been minimized.

Copy link

@faeton faeton commented Aug 21, 2019

Just wanted to confirm here that the issue is important and in place. Tested on 10.14 and 10.13 both showing bad results, low fps, mouse pointer lags.

@SoothedTau

This comment has been minimized.

Copy link

@SoothedTau SoothedTau commented Aug 22, 2019

Trying to track this down but having issues getting a build working. Built master repo without any compression libs, build succeeded, but when executing I immediately encounter an error in libdyld: stack_not_16_byte_aligned.

Probably missing some steps getting the build running. Can dig further if someone can help me out here :)

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 23, 2019

Anything I can test?
If anything I can at least confirm the issue on a 5K iMac 2019 - 1.9.2 is unplayable, 1.8.0 seems to work as normal (but slower with 1440p resolution, which might actually be twice the resolution due to Retina)

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

From my thread on Reddit:

I have a 13” MBP w/ TouchBar. When I play in the native resolution, the game is laggy as all hell... but when I connect it to an external monitor (1080p), it’s fine. When I switch to Windows 10 via Bootcamp also with 1080p resolution, it’s perfectly fine.
I don’t have a solution to the issue, but I hope this might shed some light.
I hope you’ll share the solution when you solve it! Good luck!

So is the issue only Retina related?
Anyone with non-retina Mac's that experience problems?
I have a Mac Pro with Mojave and a 1440p monitor (non-retina) where I can fire it up and test.

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 27, 2019

I think it's worth testing with both 32bpp and 8bpp blitters. Probably worth testing with driver debug enabled (level 1) to be sure the expected blitter is selected.

I'm not entirely sure about how to run with a commandline on macOS, it's too many years since, but try this from a terminal:
OpenTTD.app/Contents/MacOS/openttd -b 8bpp -d driver=1

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

I think it's worth testing with both 32bpp and 8bpp blitters. Probably worth testing with driver debug enabled (level 1) to be sure the expected blitter is selected.

I'm not entirely sure about how to run with a commandline on macOS, it's too many years since, but try this from a terminal:
OpenTTD.app/Contents/MacOS/openttd -b 8bpp -d driver=1

Tried your exact command. Got a "pixelated" error. Interesting.
Skjermbilde 2019-08-27 kl  13 14 47

This did however work:
openttd -b 8bpp-optimized -d driver=1

Not sure what 8bpp vs 8bpp-optimized is?

Although I can confirm that
/Applications/OpenTTD/OpenTTD.app/Contents/MacOS/openttd -b 8bpp-optimized -d driver=1
Does indeed seem to make 1.9.2 run as good as 1.8.0. Mouse is a bit laggy on the menu (both in 1.8.0 and 1.9.2). Moving around the map isn't as smooth as on my Windows PC.

I also tested with 32bpp-optimized and this is really interesting!!
/Applications/OpenTTD/OpenTTD.app/Contents/MacOS/openttd -b 32bpp-optimized -d driver=1
I'm using A-base, and it seems to run really smooth! The mouse pointer seems to be "perfect", no lag. Not sure about moving around the map, seems like it's as smooth as it can be.

  • I'll have to test with a better mouse than my Magic Trackpad 2.

Explaination?

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

Tests:
1.9.2 normal. Mouse is laggy, game is laggy, moving around gives noticeable stutter:
1 9 2

1.9.2 with blitter 32bpp-optimized. Mouse is butter smooth in menus and in-game, moving around seems to give a slight stutter, could be better? I feel it's smoother on my PC:
1 9 2-32bpp

1.9.2 with blitter 8bpp-optimized. Mouse is stuttering a bit (maybe) but much better, moving around seems like with 32bpp-optimized - fairly smooth, but seems to have a slight stutter.
1 9 2-8bpp

All in all it is 30FPS vs 8FPS and the game feels playable. But it feels just as "stuttery" in 1.8.0 as well. Shouldn't it be possible to get a lot more than 30FPS?

Changing resolution between 5120x2880 and 2560x1440 makes absolutely no difference, so it seems like it's running at 1440p? Decreasing resolution with 1.9.2 "normal" increases FPS, but never much higher than ~30FPS.

Changing resolution with 1.9.2 8bpp-optimized or 32bpp-optimized makes no difference. It's steady at 30-35FPS no matter the resolution, even down to a tiny window. (although 1440p and 2880p seems to be the same resolution, retina-thing?)

Output when running the command:

dbg: [driver] Successfully loaded blitter '32bpp-optimized'
dbg: [driver] Could not change to foreground application. Error -50
dbg: [driver] Successfully probed video driver 'cocoa'
dbg: [driver] Successfully probed sound driver 'cocoa'
dbg: [driver] Successfully probed music driver 'cocoa'
@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 27, 2019

I'm interested in what the default blitter is when you don't override it on the commandline.

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

I'm interested in what the default blitter is when you don't override it on the commandline.

I would guess 32bpp-anim. At least that gave me the same performance as before.
Any way I can test?
Also - what FPS should I expect?

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 27, 2019

If you run OpenTTD.app/Contents/MacOS/openttd -d driver=1 (i.e. leave out the blitter argument but keep the debug argument) it should write to the terminal which blitter it selects.

The expected framerate (when simulation is not too heavy for the CPU) is 33.33 fps, but anything between 32 and 34 is acceptable.

@SoothedTau

This comment has been minimized.

Copy link

@SoothedTau SoothedTau commented Aug 27, 2019

From my thread on Reddit:

I have a 13” MBP w/ TouchBar. When I play in the native resolution, the game is laggy as all hell... but when I connect it to an external monitor (1080p), it’s fine. When I switch to Windows 10 via Bootcamp also with 1080p resolution, it’s perfectly fine.
I don’t have a solution to the issue, but I hope this might shed some light.
I hope you’ll share the solution when you solve it! Good luck!

So is the issue only Retina related?
Anyone with non-retina Mac's that experience problems?
I have a Mac Pro with Mojave and a 1440p monitor (non-retina) where I can fire it up and test.

As I mentioned earlier; this is a colourspace/format problem. The OS has detected that the pixels it is being given by OpenTTD are not in the exact colourspace/format that the monitor is accepting at that time. An external monitor likely has a different, in this case more compatible, colourspace than the built-in screen, which is why the problem doesn't exist.

I've checked the colourspace code and it seems fine though. Obviously selecting a different blitter changes how the OS handles things, which is great info. It would be useful to know the exact difference between the blitters, in particular what kind of surface they request from the OS and what colourspace they choose for it.

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

If you run OpenTTD.app/Contents/MacOS/openttd -d driver=1 (i.e. leave out the blitter argument but keep the debug argument) it should write to the terminal which blitter it selects.

The expected framerate (when simulation is not too heavy for the CPU) is 33.33 fps, but anything between 32 and 34 is acceptable.

Oh OK. I read something about expecting 800 frames in the thread that led me here, so that put me off.

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 27, 2019

@SoothedTau Ok, sorry. If there’s anything I can test, let me know.

@AbeWJS

This comment has been minimized.

Copy link

@AbeWJS AbeWJS commented Aug 27, 2019

@SoothedTau Ok, sorry. If there’s anything I can test, let me know.

No need to apologise! All data is good data, your comment cemented what I already thought. I've seen this behavior on macOS before where external screens don't have the same issues simply due to being a different type of screen. (Retina vs Non-retina for example, but HDR/Non-HDR as well).

I'm still stuck on not being able to run the game after having built it, I wanted to debug and test some minor changes but as I can't produce working executables it's a little hard :).

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Aug 27, 2019

If you run OpenTTD.app/Contents/MacOS/openttd -d driver=1 (i.e. leave out the blitter argument but keep the debug argument) it should write to the terminal which blitter it selects.
The expected framerate (when simulation is not too heavy for the CPU) is 33.33 fps, but anything between 32 and 34 is acceptable.

Oh OK. I read something about expecting 800 frames in the thread that led me here, so that put me off.

That's in fast-forward. The game needs to run at 33.33 fps in normal mode to run at full speed, but should generally be capable of running much faster in fast-forward mode, depending on CPU. Definitely do test with a new game (no vehicles, 256x256 map) how fast it can run.

@DIYglenn

This comment has been minimized.

Copy link

@DIYglenn DIYglenn commented Aug 28, 2019

Ah. Makes sense. I’ll do some more testing in my systems.
So where’s the problem with building it? Is there anything I can do to test?

@SoothedTau

This comment has been minimized.

Copy link

@SoothedTau SoothedTau commented Sep 4, 2019

Had some time to look at this last evening and the issue is, as expected, due to a mismatch in colourspaces:

OpenTTD queries the system for the colourspace of the monitor. On many monitors (including external ones!) this returns a pretty standard sRGB colourspace which works just fine with how OpenTTD handles colours internally. What happens in this case, and which causes problems for OpenTTD, is that the main display of iMacs and MacBooks actually returns a P3 colourspace as its best fit because these displays are capable of more than what sRGB supports (they are pretty good displays!). Now OpenTTD tries to use P3 even though it really shouldn't as its outputting in the sRGB colourspace which means the pixel data is incompatible and thus won't look right. Additionally, the system introduces a conversion step from an internal image to the image the OS will use to show the window, this step is very costly as the P3 colourspace requires much more data per pixel than the usual sRGB space and the pixel formats involved in P3 are far more complex to convert between.

All relevant code for this problem is in wnd_quartz.mm. Here is a breakdown of it: The NSWindow that OpenTTD uses (created on line 287, WindowQuartzSubdriver::SetVideoMode) is in the P3 colourspace by default when running on the main display of a Mac. The CGContext that is used (created on line 605, WindowQuartzSubdriver::WindowResized) to store OpenTTD's drawing results internally is created with a colourspace from QZ_GetCorrectColorSpace (line 109) which returns whatever CGDisplayCopyColorSpace (line 116) returns which is in the case of running on the main display of a Mac; also P3. These two combined result in very slow calls to CGContextDrawImage (line 191 and 212, (void)drawRect:(NSRect)invalidRect) which is what drags down the performance of the game.

To get around the issue I made changes to 2 lines. Objective is to prevent running with a P3 colourspace. macOS is pretty smart and will make sure everything looks right even when mixing colourspaces as long as we tell it in what space the pixels we give it are. With that in mind I changed line 116 to always return a standard sRGB colourspace:

colorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);

And added a new line after line 291 that sets the colourspace of the window to match the one we use internally:

[this->window setColorSpace:[[[NSColorSpace alloc]initWithCGColorSpace:QZ_GetCorrectColorSpace()] autorelease]];

With these two changes I get much improved performance and everything looks right. I can't make the actual change in the code for reasons so someone else will have to do that but it should be pretty straightforward. I should add that this fix did not go through extensive testing and there are edge-cases I'm likely unaware of but I think it should work for most if not all cases. Probably important to test on an external monitor too to see if that's still alright.

P.S.: If someone is willing to; is anyone else seeing issues with compiling the release version on macOS? I was seeing misaligned stack errors and had to enable debug mode to compile a working binary. Wondering if any of you sees the same error? I was compiling with Xcode 11 beta on Catalina.

@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 12, 2019

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Sep 12, 2019

Sorry the attached screenshot didn't arrive in Github, you'll have to visit the issue webpage to post that I think.

@SoothedTau

This comment has been minimized.

Copy link

@SoothedTau SoothedTau commented Sep 12, 2019

Hi, so this one is even worse than 1.9.2. - check the screen I attached - monitor activity shows it takes a lot of CPU, if I understand well :-) Jan

What hardware are you running on? OS version?

Also, it's worse how? There is expected to be a lot of CPU usage even with the fix though. I would expect it to be better though, so the fact that its not is surprising.

@bedrich-schindler

This comment has been minimized.

Copy link

@bedrich-schindler bedrich-schindler commented Sep 12, 2019

I have the same problem on macOS Mojave 10.14.6 on my MacBook Pro 2017 15". When I use external monitor, it works fine. But if built-in MacBook display is used, it is lagging so much.

@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 12, 2019

Here I took printscreen of monitor activity when playing 1.9.3. - if its helpful
Snímek obrazovky 2019-09-12 v 16 48 04

@Cin316

This comment has been minimized.

Copy link

@Cin316 Cin316 commented Sep 13, 2019

FYI, this is the performance I see on 1.8.0 with a simple world. It uses a good amount of processing power, but it doesn't usually hit 100%.

image

@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Sep 13, 2019

TL;DR: there are multiple Mac performance issues, this patch is only intended to address one.

I can see the significantly increased CPU use in master compared to 1.8, but I haven't bisected to find the rev that introduced it yet. In master, OpenTTD will peg at or near to 100% of the CPU most of the time. It may or may not also be connected to upgraded macOS version.

Co-incidental (maybe related, maybe not ) is that Mac FFWD performance got a huge fix sometime around 1.8. FFWD in master then regressed to be highly inconsistent (anywhere from 1x to 1000x). That hasn't been investigated enough to find a cause yet.

I also made a video of demonstrating an odd Mac performance issue where FPS lags badly, then FPS is restored to expected value by opening a news message window. That has not had further investigation afaik.

The patch in this PR for the colorspace does not, as far as I can tell in testing, affect the CPU performance negatively or positively. What it does do, at least for some of us, is restore the FPS when 'Full Animation' is enabled, which has been broken for years on Mac.

My testing is on the native retina screen on a 13" Macbook Pro, I don't have an external screen to test with.

michicc added a commit to michicc/OpenTTD that referenced this issue Sep 14, 2019
@michicc

This comment has been minimized.

Copy link
Member

@michicc michicc commented Sep 14, 2019

@Honza1987 If you can compile, you could try the alternative patch from PR #7741. The PR is made for master, but should apply to 1.9.3 branch as well.

michicc added a commit to michicc/OpenTTD that referenced this issue Sep 14, 2019
@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 14, 2019

@LordAro LordAro closed this in 994664d Sep 16, 2019
nielsmh added a commit to nielsmh/OpenTTD that referenced this issue Sep 16, 2019
nielsmh added a commit that referenced this issue Sep 16, 2019
@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 17, 2019

@LordAro

This comment has been minimized.

Copy link
Member

@LordAro LordAro commented Sep 17, 2019

An additional fix was added into 1.9.3 (on top of the changes in 1.9.3-RC1), which we believe improve things further. As such, we (currently) consider the issue to be solved

@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 17, 2019

@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Sep 17, 2019

I put an earlier comment, but the ticket is quite busy eh? :)

#7644 (comment)

TL:DR; the color space patches only solve one limited performance issue with full-animation. For me. Others may get different results. There are other performance issues which may or may not be solveable.

@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Sep 17, 2019

@holmesmr

This comment has been minimized.

Copy link

@holmesmr holmesmr commented Nov 6, 2019

I've got pretty horrendus performance on a 2016 MBP pro on 1.9.3 running macOS 10.15, even though this is fixed (perhaps actually a different issue?). This machine was fine running 1.8.0 on 10.14 - unfortunately I can't tell which caused the issue as I upgraded to 1.9.3 in response to 1.8.0 being 32-bit.

Screenshot 2019-11-06 at 23 51 23

As can be seen, performance very bad, yet not even 2ms of work is being done per frame. During this time the game runs at about 80-90% CPU, across 2 cores for about 40-50% load each. There's no other significant load on system CPU or GPU resources and there's plenty of free memory.
@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Nov 7, 2019

That mirrors my results with 1.9.3 (or current master) on macOS 10.14

  • 14-16fps
  • low current and average values for all the measured timings
  • system CPU and other resources highly available

I only get this 14-16fps behaviour intermittently.

It's unrelated to map size, or number of vehicles and stations, it can occurs with just one or two vehicle, or with a complex game. It occurs with full animation on or off.

When this behaviour isn't present, the game runs at 34fps for me on a bare map, and gradually declines to 21-22fps, in line with the complexity of the game, number of vehicles etc.

This will sound odd, but try opening a recent news window (click the news bar at the bottom of the screen)....

....when this odd, no-obvious-cause, 14-16fps behaviour occurs for me, it resolves immediately as the news window scrolls open. I haven't made a video to prove it yet though 😄

Macbook Pro 2019, 2.8Ghz

@holmesmr

This comment has been minimized.

Copy link

@holmesmr holmesmr commented Nov 7, 2019

I did see mention of this quirk, but on my machine opening the recent news window didn’t seem to fix it. Additionally, this isn’t intermittent - it happens every time I load the game, on both the menu and ingame.

@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Nov 7, 2019

I made a very inconclusive video on current self-compiled master. It shows the variability of fps with

  • full animation on / off
  • highly variable ffwd behaviour

https://dev.openttdcoop.org/attachments/download/9528/7644-mac-performance.mov

Doesn't offer any clues. I tried autosave on / off to see if that related to the ffwd problem: it appears to be unrelated.

For comparison I made a video of 1.9.1: https://dev.openttdcoop.org/attachments/download/9529/7644-mac-performance-1-9-1.mov

  • full animation performance on normal game speed is worse, as expected, compared to master
  • FFWD performance is orders of magnitude better in for both full-animation on and off

Finally I made a comparison with 1.9.3:
https://dev.openttdcoop.org/attachments/download/9530/7644-mac-performance-1-9-3.mov

  • full animation performance is about same as 1.9.1
  • FFWD performance is much faster than self-compiled master and much worse than 1.9.1

Still no single obvious conclusion.

I also found the video I made of lolz behaviour with news messages. That predates the colourspace fix in this PR (by one month). It was always intermittent, but I haven't been able to trigger it recently, doesn't mean it's gone, but eh. It's worth watching as it's so bizarre. https://dev.openttdcoop.org/attachments/download/9489/mac_fps.mov

@Honza1987

This comment has been minimized.

Copy link

@Honza1987 Honza1987 commented Nov 7, 2019

@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Nov 7, 2019

Hi there, still same problem even in 1.10.0 I made video of that game and it is still so slow … there is no even different between fast and normal play

Try it with 'full animation' off? Probably still slower than 1.8.0, but worth testing.

@nielsmh nielsmh reopened this Nov 7, 2019
@andythenorth

This comment has been minimized.

Copy link
Contributor

@andythenorth andythenorth commented Nov 8, 2019

This needs one or both of

@nielsmh

This comment has been minimized.

Copy link
Contributor Author

@nielsmh nielsmh commented Nov 8, 2019

I'll see if I can define some more measurement points video drivers can report on, some time soon.

@holmesmr

This comment has been minimized.

Copy link

@holmesmr holmesmr commented Nov 11, 2019

I did some cursory timings and the graph on my machine looks to indicate there's a massive amount of copying and image format conversion in the drawRect routine when Full Animation is on. Timings were taken with Nightly 2019-11-03 build so that debug symbols make these easier to read.

Like the previous run, there's a big trough in CPU time when Full Animation is off:
image

With full animation off, I'm able to get ~33fps as expected. The call graph for a 7s window with this off looks like this:

55.00 ms    9.4%	0 s	 	  -[OTTD_QuartzView drawRect:]
39.00 ms    6.7%	0 s	 	   CGContextDrawImageWithOptions
39.00 ms    6.7%	0 s	 	    backing_store_DrawImage.llvm.7187528833409788268
39.00 ms    6.7%	0 s	 	     backing_store_delegate
37.00 ms    6.3%	0 s	 	      __backing_store_DrawImage_block_invoke
37.00 ms    6.3%	0 s	 	       ripc_DrawImage
35.00 ms    6.0%	0 s	 	        ripc_AcquireRIPImageData
35.00 ms    6.0%	0 s	 	         RIPImageCacheGetRetained
28.00 ms    4.8%	0 s	 	          RIPImageDataInitializeShared
28.00 ms    4.8%	0 s	 	           CGSImageDataLock
24.00 ms    4.1%	1.00 ms	 	            img_data_lock
21.00 ms    3.6%	0 s	 	             img_raw_read
20.00 ms    3.4%	0 s	 	              get_chunks_direct
20.00 ms    3.4%	0 s	 	               CGDataProviderDirectGetBytesAtPositionInternal
19.00 ms    3.2%	0 s	 	                provider_for_destination_get_bytes_at_position_inner
14.00 ms    2.4%	0 s	 	                 CGDataProviderDirectGetBytesAtPositionInternal
13.00 ms    2.2%	13.00 ms	 	                  _platform_memmove$VARIANT$Haswell
1.00 ms    0.1%	0 s	 	                  CGDataProviderRetainBytePtr
5.00 ms    0.8%	1.00 ms	 	                 vImageConverterConvert
4.00 ms    0.6%	0 s	 	                  vImageConverter_convert_internal
1.00 ms    0.1%	0 s	 	                0x7fff2e057120
1.00 ms    0.1%	1.00 ms	 	              0x7fff2e056ba0
1.00 ms    0.1%	0 s	 	             _CFRelease
1.00 ms    0.1%	1.00 ms	 	             0x7fff2e054c61
3.00 ms    0.5%	0 s	 	            img_image
2.00 ms    0.3%	0 s	 	             CGDataProviderCreateForDestinationWithImage
1.00 ms    0.1%	1.00 ms	 	             0x7fff2e051670
1.00 ms    0.1%	1.00 ms	 	            0x7fff2e054b4f
7.00 ms    1.2%	0 s	 	          cache_set_and_retain
7.00 ms    1.2%	0 s	 	           _cache_enforce_limits
1.00 ms    0.1%	0 s	 	        ripc_GetRenderingState
1.00 ms    0.1%	0 s	 	        ripc_RenderImage
1.00 ms    0.1%	0 s	 	         RIPLayerBltImage
1.00 ms    0.1%	0 s	 	          ripl_Mark
1.00 ms    0.1%	0 s	 	           argb32_image
1.00 ms    0.1%	0 s	 	            argb32_image_mark_image
1.00 ms    0.1%	1.00 ms	 	             argb32_image_mark_argb32
1.00 ms    0.1%	0 s	 	      0x7fff2afb6c4b
1.00 ms    0.1%	0 s	 	      -[_NSCGSWindowBackingStore lockBackBuffer]
9.00 ms    1.5%	0 s	 	   CGBitmapContextCreateImage
8.00 ms    1.3%	0 s	 	    CGDataProviderCreateTrustedWithCopyOfData
8.00 ms    1.3%	0 s	 	     CGDataProviderCreateWithCopyOfData
1.00 ms    0.1%	0 s	 	    CGImageCreate
6.00 ms    1.0%	0 s	 	   _CFRelease
1.00 ms    0.1%	0 s	 	   0x7fff2e040e5e

And with Full Animation on:

3.10 s   82.7%	0 s	 	  -[OTTD_QuartzView drawRect:]
3.04 s   81.2%	0 s	 	   CGContextDrawImageWithOptions
3.04 s   81.2%	0 s	 	    backing_store_DrawImage.llvm.7187528833409788268
3.04 s   81.2%	0 s	 	     backing_store_delegate
3.04 s   81.1%	0 s	 	      __backing_store_DrawImage_block_invoke
3.04 s   81.1%	0 s	 	       ripc_DrawImage
1.84 s   49.1%	0 s	 	        ripc_RenderImage
1.84 s   49.1%	0 s	 	         RIPLayerBltImage
1.84 s   49.1%	0 s	 	          ripl_Mark
1.84 s   49.1%	0 s	 	           argb32_image
1.84 s   49.1%	0 s	 	            argb32_image_mark_image
1.84 s   49.0%	1.84 s	 	             argb32_image_mark_argb32
1.00 ms    0.0%	1.00 ms	 	             0x7fff2e0b00c0
1.00 ms    0.0%	1.00 ms	 	             madvise
1.00 ms    0.0%	1.00 ms	 	             0x7fff2e0affb8
1.00 ms    0.0%	1.00 ms	 	             0x7fff2e0b00c7
1.19 s   31.8%	0 s	 	        ripc_AcquireRIPImageData
1.19 s   31.8%	0 s	 	         RIPImageCacheGetRetained
1.10 s   29.3%	0 s	 	          RIPImageDataInitializeShared
1.10 s   29.3%	0 s	 	           CGSImageDataLock
1.10 s   29.3%	0 s	 	            img_data_lock
1.10 s   29.2%	0 s	 	             img_raw_read
1.09 s   29.2%	0 s	 	              get_chunks_direct
1.09 s   29.2%	0 s	 	               CGDataProviderDirectGetBytesAtPositionInternal
1.08 s   28.8%	8.00 ms	 	                provider_for_destination_get_bytes_at_position_inner
604.00 ms   16.1%	1.00 ms	 	                 vImageConverterConvert
598.00 ms   15.9%	0 s	 	                  vImageConverter_convert_internal
2.00 ms    0.0%	2.00 ms	 	                  0x7fff652aadfc
1.00 ms    0.0%	1.00 ms	 	                  0x7fff2e02529b
1.00 ms    0.0%	1.00 ms	 	                  0x7fff652aae10
1.00 ms    0.0%	1.00 ms	 	                  0x7fff2e025b92
454.00 ms   12.1%	0 s	 	                 CGDataProviderDirectGetBytesAtPositionInternal
434.00 ms   11.5%	434.00 ms	 	                  _platform_memmove$VARIANT$Haswell
8.00 ms    0.2%	0 s	 	                  CGDataProviderRetainBytePtr
5.00 ms    0.1%	0 s	 	                  CGDataProviderReleaseBytePtr
1.00 ms    0.0%	1.00 ms	 	                  pthread_mutex_lock
1.00 ms    0.0%	1.00 ms	 	                  0x7fff2e0262e6
1.00 ms    0.0%	1.00 ms	 	                  0x7fff2dcd40e6
1.00 ms    0.0%	1.00 ms	 	                  0x7fff652aaeb0
1.00 ms    0.0%	0 s	 	                  _CFRelease
1.00 ms    0.0%	1.00 ms	 	                  0x7fff652b4637
1.00 ms    0.0%	1.00 ms	 	                  0x7fff2e0260aa
3.00 ms    0.0%	3.00 ms	 	                 _platform_memmove$VARIANT$Haswell
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e056c35
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e056c51
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e025a0a
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e0259cc
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e025980
1.00 ms    0.0%	1.00 ms	 	                 0x7fff652aae85
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e025a27
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e056c2b
1.00 ms    0.0%	0 s	 	                 0x7fff2e0259bc
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e37e83a
1.00 ms    0.0%	0 s	 	                 0x7fff2e0259e6
1.00 ms    0.0%	1.00 ms	 	                 0x7fff2e025b5b
1.00 ms    0.0%	0 s	 	                0x7fff2e056ed8
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e056e2f
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e057033
1.00 ms    0.0%	1.00 ms	 	                CGDataProviderReleaseBytePtr
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e057737
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e056e43
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e0570a3
1.00 ms    0.0%	1.00 ms	 	                _platform_memmove$VARIANT$Haswell
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e057070
1.00 ms    0.0%	1.00 ms	 	                CGBitmapPixelInfoGetBitsPerPixel
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e0263e2
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e056f7f
1.00 ms    0.0%	1.00 ms	 	                0x7fff2e056e61
1.00 ms    0.0%	1.00 ms	 	               0x7fff2e056d93
1.00 ms    0.0%	1.00 ms	 	              0x7fff2e056ba4
1.00 ms    0.0%	0 s	 	             _CFRelease
1.00 ms    0.0%	0 s	 	             CGSImageDataHandleCreate
1.00 ms    0.0%	0 s	 	              malloc_zone_calloc
1.00 ms    0.0%	0 s	 	            img_image
1.00 ms    0.0%	0 s	 	             CGDataProviderCreateForDestinationWithImage
95.00 ms    2.5%	0 s	 	          cache_set_and_retain
95.00 ms    2.5%	0 s	 	           _cache_enforce_limits
2.00 ms    0.0%	0 s	 	        ripc_GetRenderingState
1.00 ms    0.0%	0 s	 	        RIPImageCacheRelease
1.00 ms    0.0%	0 s	 	       0x7fff2e04c687
1.00 ms    0.0%	0 s	 	      -[_NSCGSWindowBackingStore dirtyBackBufferInRegion:]
1.00 ms    0.0%	0 s	 	    0x7fff2b2aa546
45.00 ms    1.2%	0 s	 	   _CFRelease
11.00 ms    0.2%	0 s	 	   CGBitmapContextCreateImage
11.00 ms    0.2%	0 s	 	    CGDataProviderCreateTrustedWithCopyOfData
11.00 ms    0.2%	0 s	 	     CGDataProviderCreateWithCopyOfData
1.00 ms    0.0%	1.00 ms	 	   0x7fff2e04bd68

As you can see, when enabled, the routine takes up ~80% of CPU time vs ~9% when off.

Raw instruments trace (in xcode format)

@Eddi-z

This comment has been minimized.

Copy link
Contributor

@Eddi-z Eddi-z commented Nov 11, 2019

Well, it is somewhat expected that palette animation is slow, because almost no hardware supports the kind of palette animation anymore, that the game was built on. on modern hardware, you'd expect this to be done in a GPU shader, but to my knowledge, noone has actually implemented this

however, that's not a reason why it should be this bad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.