Bad Threading Management? #220

Closed
Max98 opened this Issue Jun 1, 2015 · 4 comments

Contributor

Max98 commented Jun 1, 2015

After the release of TestBuild 3, a few AMD users noticed a huge performance increase on the AMD FX 6300 with a single vehicle spawned. (http://www.rigsofrods.com/threads/119110-Test-Build-Rigs-of-rods-0-4-5-0-dev?p=1381463&viewfull=1#post1381463)

I've been discussing this with one of them, and I asked him to force the game to use only 3 threads (NumThreadsInThreadPool in ror.cfg). His FPS increased even further, from 152 fps to 172 fps. Maybe the threading management isn't doing its job correctly?

Member

only-a-ptr commented Jun 2, 2015

Good work tracing the AMD perf issue.

One suspicious area is the FlexBody/ThreadPool logic. FlexBody is a subclass of IThreadTask, and on every frame each flexbody is pushed into the task queue on the main thread and then processed and removed by ThreadWorkers. https://github.com/RigsOfRods/rigs-of-rods/blob/master/source/main/physics/Beam.cpp#L3845 This is bad because:

  • Popping a task from queue for processing requires locking the queue. Locks are slow.
  • It makes no sense to process flexbodies individually - the number of flexbodies and their computational cost are known in advance and don't change throughout the lifetime of the vehicle. Flexbodies should be pre-sorted into "Flexbody batches" and assigned to threads directly.
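
The batching idea in the second bullet could look roughly like the sketch below. It is only an illustration, not the project's code: `FlexbodyStub`, its `cost` field, `make_batches` and `batch_cost` are hypothetical names, and the greedy "heaviest item into the lightest batch" partition is one assumed way of doing the pre-sorting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flexbody. `cost` is an assumed per-frame cost
// estimate (for example a vertex count) that is fixed when the vehicle spawns.
struct FlexbodyStub {
    int id;
    std::size_t cost;
};

using Batch = std::vector<FlexbodyStub>;

// Sum of the cost estimates in one batch.
std::size_t batch_cost(const Batch& batch) {
    std::size_t total = 0;
    for (const FlexbodyStub& fb : batch) total += fb.cost;
    return total;
}

// Build one batch per worker thread, once, at spawn time.
// Greedy partition: visit flexbodies from most to least expensive and put
// each one into the batch that currently has the smallest total cost.
std::vector<Batch> make_batches(std::vector<FlexbodyStub> bodies,
                                std::size_t num_threads) {
    assert(num_threads > 0);
    std::sort(bodies.begin(), bodies.end(),
              [](const FlexbodyStub& a, const FlexbodyStub& b) {
                  return a.cost > b.cost;
              });

    std::vector<Batch> batches(num_threads);
    std::vector<std::size_t> load(num_threads, 0);
    for (const FlexbodyStub& fb : bodies) {
        const std::size_t lightest = static_cast<std::size_t>(
            std::min_element(load.begin(), load.end()) - load.begin());
        batches[lightest].push_back(fb);
        load[lightest] += fb.cost;
    }
    return batches;
}
```

Each worker would then walk only its own batch every frame, so no shared queue has to be locked while the flexbodies are processed.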

What we need to pinpoint the AMD problem is a real-time performance graph window showing the various stages of simulation processing in different colors. I've already been thinking about this for a while, so I'll take a shot at implementing it - the profiling macros from Profiler.h should be fast enough for the job, the window will be toggleable, and I'll wrap it all in macros so we can compile it out if it's too slow.
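
A minimal sketch of the "wrap it in macros so it can be compiled out" idea follows. It does not use the real macros from Profiler.h; `SKETCH_PROFILING_ENABLED`, `SKETCH_PROFILE_SCOPE`, `ScopedStageTimer` and `stage_times` are hypothetical names, and the sketch assumes timing happens on a single thread.

```cpp
#include <chrono>
#include <map>
#include <string>

// Compile-time switch (hypothetical name). Build with
// -DSKETCH_PROFILING_ENABLED=0 to remove all timing code.
#ifndef SKETCH_PROFILING_ENABLED
#define SKETCH_PROFILING_ENABLED 1
#endif

// Accumulated milliseconds per named simulation stage.
// Not thread-safe: this sketch assumes it is only touched from one thread.
inline std::map<std::string, double>& stage_times() {
    static std::map<std::string, double> times;
    return times;
}

// Measures the lifetime of the enclosing scope and adds it to stage_times().
class ScopedStageTimer {
public:
    explicit ScopedStageTimer(const char* stage)
        : stage_(stage), start_(std::chrono::steady_clock::now()) {}

    ~ScopedStageTimer() {
        const auto end = std::chrono::steady_clock::now();
        stage_times()[stage_] +=
            std::chrono::duration<double, std::milli>(end - start_).count();
    }

private:
    const char* stage_;
    std::chrono::steady_clock::time_point start_;
};

#if SKETCH_PROFILING_ENABLED
// Use at most once per scope.
#define SKETCH_PROFILE_SCOPE(name) ScopedStageTimer sketch_scope_timer(name)
#else
#define SKETCH_PROFILE_SCOPE(name) ((void)0)
#endif

// Placeholder for one simulation stage; the loop is dummy work.
void simulate_flexbodies_stub() {
    SKETCH_PROFILE_SCOPE("flexbodies");
    volatile double sink = 0.0;
    for (int i = 0; i < 1000; ++i) sink = sink + i;
}
```

A graph window could read `stage_times()` once per frame and draw one colored bar per stage; with the switch set to 0 the macro expands to a no-op.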

@only-a-ptr only-a-ptr self-assigned this Jun 2, 2015

Contributor

Hiradur commented Jun 2, 2015

The FX6300 only has 3 FPUs while the OS sees it as 6 discrete cores and thinks it has 6 FPUs. So it may produce more threads than the FX6300 can handle simultaneously.
I had already suspected this to be the cause of the performance problems, but klink told me that Phenom processors (which have as many FPUs as cores shown in the OS) suffered from the performance drop as well, so I dropped that idea.
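
To make the core-count versus FPU-count point concrete, here is a hedged sketch of how a pool size could be derived. `choose_pool_size` and `detected_pool_size` are hypothetical helpers, the `cores_per_fpu` ratio has to be supplied by the caller (standard C++ has no way to query it), and treating a configured value of 0 as "pick automatically" is an assumption, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper. `configured` plays the role of a user override such
// as NumThreadsInThreadPool; 0 is ASSUMED to mean "choose automatically".
// `reported_cores` is what the OS reports, `cores_per_fpu` is how many of
// those cores share one FPU (2 under the shared-FPU hypothesis above).
unsigned choose_pool_size(unsigned configured,
                          unsigned reported_cores,
                          unsigned cores_per_fpu) {
    assert(cores_per_fpu > 0);
    if (configured > 0) return configured;  // explicit override wins
    // reported_cores may be 0 when the value is unknown, so clamp to 1.
    return std::max(1u, reported_cores / cores_per_fpu);
}

// Same calculation using the core count reported by the standard library.
// std::thread::hardware_concurrency() is only a hint and may return 0.
unsigned detected_pool_size(unsigned cores_per_fpu) {
    return choose_pool_size(0, std::thread::hardware_concurrency(),
                            cores_per_fpu);
}
```

With 6 reported cores and 2 cores per FPU this yields 3 worker threads, which matches the manual setting tried earlier in the thread. Since Phenom processors reportedly showed the drop too, this is at best a partial explanation.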

Contributor

Max98 commented Jun 2, 2015

I forgot to say: setting it to 3 or 6 gives slightly better FPS than 0.

@only-a-ptr only-a-ptr added this to the Post-Nextstable milestone Jul 18, 2015

Contributor

Hiradur commented Jul 18, 2015

This could be related to #1 and its fix

@Hiradur Hiradur removed this from the Post-Nextstable milestone Oct 24, 2015

ulteq added a commit to ulteq/rigs-of-rods that referenced this issue Dec 5, 2015

ulteq added a commit to ulteq/rigs-of-rods that referenced this issue Dec 5, 2015

@ulteq ulteq referenced this issue Dec 5, 2015

Merged

Bugfixes #466

ulteq added a commit to ulteq/rigs-of-rods that referenced this issue Dec 8, 2015

@ulteq ulteq closed this Feb 6, 2016
