With lots of vehicles, PerformanceAccumulator has a large performance impact itself #7247

Open
PeterN opened this issue Feb 19, 2019 · 15 comments · May be fixed by #7248

Comments

Member

@PeterN PeterN commented Feb 19, 2019

Version of OpenTTD

master-gef7e47a53a

Expected result

The performance meter monitors the performance of vehicle ticks with minimal adverse effect.

Actual result

Because each vehicle is timed individually, the performance meter itself consumes significant CPU time when timing a large number of vehicles.

Steps to reproduce

  • Load the 'wentbourne' save.
  • Open the Frame rate window and observe the Simulation rate.
  • Edit the source to disable the PerformanceAccumulator for vehicles: comment out the PerformanceAccumulator lines in src/roadveh_cmd.cpp, src/aircraft_cmd.cpp, src/ship_cmd.cpp, and src/train_cmd.cpp.
  • Open the Frame rate window again and observe the Simulation rate.

On my particular system, the simulation rate in master is around 8 fps.
After disabling the PerformanceAccumulator, with no other changes, the simulation rate increases by 50% to around 12 fps.
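
To illustrate why timing each vehicle individually can add up, here is a minimal sketch. It is not OpenTTD's PerformanceAccumulator: `ScopedTimer`, `NowCounted`, and both helper functions are hypothetical names, and the "clock" simply counts how often it is read so the comparison is deterministic. It only shows that per-vehicle timing costs two clock reads per vehicle, while timing a whole batch costs two reads in total.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in clock: it counts how often it is read, so the
// number of clock reads is visible without depending on real timings.
std::uint64_t g_clock_reads = 0;
std::uint64_t NowCounted() { return ++g_clock_reads; }

// Hypothetical scoped timer: one clock read on entry, one on exit.
// This is NOT OpenTTD's PerformanceAccumulator, only an illustration.
class ScopedTimer {
public:
    explicit ScopedTimer(std::uint64_t &total) : total_(total), start_(NowCounted()) {}
    ~ScopedTimer() { total_ += NowCounted() - start_; }

private:
    std::uint64_t &total_;
    std::uint64_t start_;
};

// Per-vehicle timing: two clock reads for every vehicle ticked.
std::uint64_t ClockReadsPerVehicleTiming(std::size_t vehicles) {
    g_clock_reads = 0;
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < vehicles; ++i) {
        ScopedTimer timer(total);
        // ... the vehicle's tick would run here ...
    }
    return g_clock_reads;
}

// Batch timing: two clock reads for the whole group of vehicles.
std::uint64_t ClockReadsBatchTiming(std::size_t vehicles) {
    g_clock_reads = 0;
    std::uint64_t total = 0;
    {
        ScopedTimer timer(total);
        for (std::size_t i = 0; i < vehicles; ++i) {
            // ... the vehicle's tick would run here ...
        }
    }
    return g_clock_reads;
}
```

How expensive each clock read is depends on the platform and clock source, which is why the measured impact can differ widely between systems.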

@PeterN PeterN changed the title With lots of vehicles, PerformanceAccumulator large performance impact itself With lots of vehicles, PerformanceAccumulator has a large performance impact itself Feb 19, 2019
Member Author

@PeterN PeterN commented Feb 19, 2019

Contributor

@nielsmh nielsmh commented Feb 19, 2019

Unfortunately, fixing this will probably require a rather large restructuring of the vehicle tick code to separate the vehicle types into their own arrays, so that all road vehicles, all trains, etc. can each be processed as a single group.

It might be possible to somehow compile two versions of the vehicle tick functions, one with and one without measurements, and change the dynamic dispatch depending on a setting, but that's likely also tricky to get right.
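
A minimal sketch of the "two compiled versions" idea, assuming a simplified stand-in `Vehicle` type. `TickAll`, `SelectTickAll`, `g_measured_ns`, and `g_samples` are hypothetical names, not OpenTTD code. A `bool` template parameter produces one instantiation with timing and one without, and a function pointer picks between them based on a setting:

```cpp
#include <chrono>
#include <vector>

// Hypothetical minimal vehicle; OpenTTD's real Vehicle class is far larger.
struct Vehicle {
    int ticks = 0;
    void Tick() { ++ticks; }
};

// Hypothetical accumulators for the sketch.
long long g_measured_ns = 0;  // total measured time in nanoseconds
int g_samples = 0;            // number of individual measurements taken

// One template, two instantiations. Measure is a compile-time constant, so
// in TickAll<false> the timing branches are dead code the compiler can drop.
template <bool Measure>
void TickAll(std::vector<Vehicle> &vehicles) {
    using Clock = std::chrono::steady_clock;
    for (Vehicle &v : vehicles) {
        Clock::time_point start;
        if (Measure) start = Clock::now();
        v.Tick();
        if (Measure) {
            g_measured_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(
                                 Clock::now() - start).count();
            ++g_samples;
        }
    }
}

using TickAllFn = void (*)(std::vector<Vehicle> &);

// Choose the instantiation once, e.g. when the setting changes.
TickAllFn SelectTickAll(bool measurement_enabled) {
    return measurement_enabled ? &TickAll<true> : &TickAll<false>;
}
```

The tricky part mentioned above remains: in the real code the tick handlers are virtual methods spread over several files, so wiring both variants through the existing dispatch is more involved than this sketch suggests.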

Contributor

@James103 James103 commented Feb 19, 2019

Calculating benchmarks from the initial comment in this issue...

(Note: tps = ticks per second (simulation rate) and mspt = milliseconds per tick. tps is distinct from fps = frames per second (graphics rate), even though OpenTTD displays both rates in the same units.)

  • Simulation rate (control): 8 tps (125 mspt from above, 700-800 mspt on a Core 2 Duo T5800 2 GHz)
  • Simulation rate (PerformanceAccumulator disabled): 12 tps (83.33 mspt)
  • Simulation rate improvement: 41.67 mspt
  • Number of vehicles in saved game: 13,899 (NOTE: Road vehicles > 5000 (= 5499) in single company, possibly hacked max vehicles?)

Therefore, the Vehicle PerformanceAccumulator takes ~3 microseconds (on a mid-range CPU) per vehicle per tick. Not a lot, but it adds up over thousands of vehicles to ~42 mspt at 14K vehicles.

Member Author

@PeterN PeterN commented Feb 19, 2019

There are far more vehicles than that: each non-front train vehicle is also counted individually, despite the early exit in its tick handler. In this save most trains are 7 tiles long (14 vehicles each), so 4833 * 14 = 67662 rail vehicles, and that's not counting the longer ones.

Contributor

@James103 James103 commented Feb 19, 2019

Re-calculating based on PeterN's comment:

  • Number of vehicles in saved game: 78,275 (computed by running a script)
  • Simulation rate improvement: 41.67 mspt (from earlier comment)

Therefore, the Vehicle PerformanceAccumulator takes ~532 nanoseconds (on a mid-range CPU) per vehicle per tick. Not a lot, but it adds up over thousands of vehicles to ~42 mspt at 80K total vehicles and railway cars.
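
The arithmetic behind both estimates can be reproduced with a small sketch. The function names are hypothetical, and it assumes the entire difference in tick time comes from the PerformanceAccumulator and is spread evenly over all vehicles:

```cpp
// Milliseconds per tick for a given simulation rate in ticks per second.
constexpr double MsPerTick(double ticks_per_second) {
    return 1000.0 / ticks_per_second;
}

// Estimated measurement overhead per vehicle per tick, in nanoseconds.
// Assumes the whole difference between the two simulation rates is
// measurement overhead, shared equally by every vehicle.
constexpr double OverheadNsPerVehicle(double tps_with_measurement,
                                      double tps_without_measurement,
                                      double vehicle_count) {
    return (MsPerTick(tps_with_measurement) - MsPerTick(tps_without_measurement))
           * 1e6 / vehicle_count;
}
```

With the figures from this thread, `OverheadNsPerVehicle(8, 12, 78275)` gives roughly 532 ns, and `OverheadNsPerVehicle(8, 12, 13899)` gives roughly 2998 ns, i.e. the ~3 microseconds of the earlier estimate.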

The script code is as follows (put this in Start() in a gamescript to run):
NOTE: Line 6 of the following code assumes that GSVehicle.GetNumWagons counts the engine of a train in the number of wagons that a train has. Delete line 6 (vehicle_count -= vehicles.Count();) if that's not the case.

local vehicles = GSVehicleList();
local vehicle_count = vehicles.Count();
GSLog.Info("There are "+vehicle_count+" vehicles in this saved game.");
vehicles.Valuate(GSVehicle.GetVehicleType);
vehicles.KeepValue(GSVehicle.VT_RAIL);
vehicle_count -= vehicles.Count();
for (local x = vehicles.Begin(); !vehicles.IsEnd(); x = vehicles.Next()) {
	vehicle_count += GSVehicle.GetNumWagons(x);
}
GSLog.Info("");
GSLog.Info("If you consider each wagon of each train as a separate vehicle, then...");
GSLog.Info("...there are "+vehicle_count+" vehicles in this saved game.");

Contributor

@JGRennison JGRennison commented Feb 19, 2019

The performance cost of PerformanceAccumulator varies significantly by platform. On my Linux machine, the difference between 'Simulation rate (control)' and 'Simulation rate (PerformanceAccumulator disabled)' as described above, on trunk, is only about 7.7 ms/t vs 8.0 ms/t.

FOR_ALL_VEHICLES_OF_TYPE iterates over the entire Vehicle array and dereferences each pointer to check the type, which is still somewhat expensive to do multiple times for each vehicle type.

With the exception of effect vehicles, the vehicle array changes relatively infrequently.
Per-type arrays of the vehicles whose tick function needs to be called can be prepared in advance (and updated or regenerated when necessary), then used in each call to CallVehicleTicks instead of iterating over the entire vehicle array on each call.
This also has the advantage that non-front vehicles can be excluded, as the tick function would return immediately in these cases.

As the tick function is being called in typed groups, the tick function for the particular vehicle type can be called directly instead of using a virtual method call (e.g. v->T::Tick()).
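
A minimal sketch of that approach, using simplified stand-in types rather than OpenTTD's real classes (`Vehicle`, `Train`, `RoadVehicle`, `BuildFrontList`, and `TickList` here are all hypothetical). The list of front vehicles of one type is built once, and each entry is ticked through a qualified call that bypasses virtual dispatch:

```cpp
#include <vector>

enum class VehicleType { Train, Road };

// Simplified stand-in for the vehicle base class.
struct Vehicle {
    Vehicle(VehicleType type, bool is_front) : type(type), is_front(is_front) {}
    virtual ~Vehicle() = default;
    virtual void Tick() = 0;

    VehicleType type;
    bool is_front;
    int ticks = 0;
};

struct Train final : Vehicle {
    static constexpr VehicleType kType = VehicleType::Train;
    explicit Train(bool is_front) : Vehicle(kType, is_front) {}
    void Tick() override { ++ticks; }
};

struct RoadVehicle final : Vehicle {
    static constexpr VehicleType kType = VehicleType::Road;
    explicit RoadVehicle(bool is_front) : Vehicle(kType, is_front) {}
    void Tick() override { ++ticks; }
};

// Build the list of front vehicles of type T once; it only needs to be
// rebuilt when vehicles of that type are added, removed, or rearranged.
template <typename T>
std::vector<T *> BuildFrontList(const std::vector<Vehicle *> &pool) {
    std::vector<T *> fronts;
    for (Vehicle *v : pool) {
        if (v->type == T::kType && v->is_front) fronts.push_back(static_cast<T *>(v));
    }
    return fronts;
}

// Tick a whole typed group. The qualified call v->T::Tick() names the
// function statically, so no virtual dispatch is needed per vehicle.
template <typename T>
void TickList(const std::vector<T *> &fronts) {
    for (T *v : fronts) v->T::Tick();
}
```

With the vehicles grouped this way, a single measurement can also wrap each `TickList` call instead of each individual vehicle.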

Member Author

@PeterN PeterN commented Feb 19, 2019

Interesting that it doesn't affect performance significantly for you. I am running on Linux as well, but inside a VM under Windows.

And yes, separate lists would be better, but for me even with the extra iterating the improvement is so significant that it's worth doing.

Contributor

@Eddi-z Eddi-z commented Feb 19, 2019

Maybe it depends on /sys/devices/system/clocksource/clocksource0/current_clocksource?
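
For checking this programmatically, a small sketch that reads the first line of that sysfs file (`ReadFirstLine` and `CurrentClocksource` are hypothetical helper names; the path is the one quoted above and exists on Linux only):

```cpp
#include <fstream>
#include <string>

// Returns the first line of the given file, or an empty string if the
// file cannot be opened or read.
std::string ReadFirstLine(const std::string &path) {
    std::ifstream in(path);
    std::string line;
    if (in) std::getline(in, line);
    return line;
}

// Path as quoted in this thread; it is only present on Linux systems.
const char *const kCurrentClocksourcePath =
    "/sys/devices/system/clocksource/clocksource0/current_clocksource";

// Name of the kernel's active clock source, or empty if unavailable.
std::string CurrentClocksource() {
    return ReadFirstLine(kCurrentClocksourcePath);
}
```

On a shell, `cat` on the same path gives the same information.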

Member Author

@PeterN PeterN commented Feb 19, 2019

Hmm, it's hyperv_clocksource_tsc_page, so probably optimized for Hyper-V, but that doesn't mean much.

I guess I will have to test native on both Linux and Windows.

@pirogronian pirogronian commented Apr 11, 2019

I noticed a significant performance drop in 1.9.1 compared to 1.8 on Arch Linux x86_64. I'm not skilled enough to find and disable PerformanceAccumulator, but FPS was indeed affected: it was halved, and CPU load doubled (dropping immediately to nearly zero when paused). Lowering cargo distribution accuracy and lengthening the graph recalculation time didn't help. I noticed it while playing a previously saved game with lots of vehicles.
I also found an OSX user on tt-forums who reported a similar problem.

Member Author

@PeterN PeterN commented Apr 11, 2019

Roughly how many vehicles (including wagons) do you have in your game?

@pirogronian pirogronian commented Apr 11, 2019

225 trains x ~8 wagons + 105 buses x ~ 2 wagons + 27 ships + 64 planes = 1800 + 210 + 64 + 27 = 2101
Maybe it's important: I have an Intel Core 2 Duo CPU inside a Compal FL90 laptop. It's old but has worked fine up to now...
Update: I built the CityMania 1.9.1 client (which has PerformanceAccumulator disabled for vehicles) and ran my savegame without any performance problems.

@wousser wousser commented Jun 24, 2019

Can confirm on macOS 10.14.5.
OpenTTD, >1.9.0, slow fps, unplayable. Also noticeable on the new game screen.
image

OpenTTD 1.8.0 no issues.

Contributor

@nielsmh nielsmh commented Jun 24, 2019

The slowness on macOS is a separate issue, I believe. If the measurement of vehicle ticks processing was an issue in your case, the measurement of Game Loop total would be much higher than it is, but it's not. The issue in this ticket is specifically for very large games that have thousands of individual vehicles running.

The frame rate measured in your specific case (14.97 fps) means there are 66.8 ms between the beginning of each iteration of the game loop, but the sum of the times (0.36 + 3.14 + 0.01) does not add up to that amount at all. Hence something outside of a measurement block must be the cause. (The PerformanceAccumulator time spent on vehicle ticks is not part of the vehicle tick times, but is part of the total game loop time.)
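
The same accounting can be written as a short sketch (function names are hypothetical; the three measured values are the ones quoted above):

```cpp
#include <numeric>
#include <vector>

// Time between the starts of two consecutive game loop iterations, in ms.
double FrameIntervalMs(double frames_per_second) {
    return 1000.0 / frames_per_second;
}

// Part of the frame interval that is not covered by any measured block.
double UnaccountedMs(double frames_per_second, const std::vector<double> &measured_ms) {
    const double measured = std::accumulate(measured_ms.begin(), measured_ms.end(), 0.0);
    return FrameIntervalMs(frames_per_second) - measured;
}
```

For 14.97 fps the interval is about 66.8 ms, while the measured blocks sum to 3.51 ms, so `UnaccountedMs(14.97, {0.36, 3.14, 0.01})` leaves roughly 63.3 ms spent outside any measurement block.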

Contributor

@nielsmh nielsmh commented Jul 8, 2019

@wousser Can I ask you to assist with some details on your situation over in #7644, which covers the bug you're seeing? Mainly just exact OS version, and preferably also which hardware you're running on, screen resolution OpenTTD is running at, and whether Fast Forward has any effect on the frame rate.
