Automatic CPU-GPU affinity #57

Closed
eile opened this Issue November 28, 2011 · 6 comments

Stefan Eilemann
Owner

On multi-socket systems, performance may vary widely depending on which core a thread executes on:

readback "channel" RGBA/401/101/0 1920x1200: 249MPix/sec (8.80081ms, 113FPS)
readback "channel" RGBA/401/101/1 1920x1200: 243MPix/sec (9.02805ms, 110FPS)
readback "channel" RGBA/401/101/2 1920x1200: 257MPix/sec (8.54776ms, 116FPS)
readback "channel" RGBA/401/101/3 1920x1200: 241MPix/sec (9.10284ms, 109FPS)
readback "channel" RGBA/401/101/4 1920x1200: 263MPix/sec (8.34019ms, 119FPS)
readback "channel" RGBA/401/101/5 1920x1200: 262MPix/sec (8.37638ms, 119FPS)
readback "channel" RGBA/401/101/6 1920x1200: 173MPix/sec (12.6451ms, 79FPS)
readback "channel" RGBA/401/101/7 1920x1200: 173MPix/sec (12.6364ms, 79FPS)
readback "channel" RGBA/401/101/8 1920x1200: 173MPix/sec (12.6432ms, 79FPS)
readback "channel" RGBA/401/101/9 1920x1200: 173MPix/sec (12.6438ms, 79FPS)
readback "channel" RGBA/401/101/a 1920x1200: 175MPix/sec (12.5237ms, 79FPS)
readback "channel" RGBA/401/101/b 1920x1200: 174MPix/sec (12.5634ms, 79FPS)
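
To make the gap in the numbers above easier to see, here is a minimal Python sketch that groups the readback rates per socket. It rests on two assumptions not stated in this thread: that the last path component (`0` to `b`) is a hexadecimal core index, and that the machine has six cores per socket (`cores_per_socket` is a hypothetical parameter).

```python
import re
from statistics import mean

LOG = """\
readback "channel" RGBA/401/101/0 1920x1200: 249MPix/sec (8.80081ms, 113FPS)
readback "channel" RGBA/401/101/1 1920x1200: 243MPix/sec (9.02805ms, 110FPS)
readback "channel" RGBA/401/101/2 1920x1200: 257MPix/sec (8.54776ms, 116FPS)
readback "channel" RGBA/401/101/3 1920x1200: 241MPix/sec (9.10284ms, 109FPS)
readback "channel" RGBA/401/101/4 1920x1200: 263MPix/sec (8.34019ms, 119FPS)
readback "channel" RGBA/401/101/5 1920x1200: 262MPix/sec (8.37638ms, 119FPS)
readback "channel" RGBA/401/101/6 1920x1200: 173MPix/sec (12.6451ms, 79FPS)
readback "channel" RGBA/401/101/7 1920x1200: 173MPix/sec (12.6364ms, 79FPS)
readback "channel" RGBA/401/101/8 1920x1200: 173MPix/sec (12.6432ms, 79FPS)
readback "channel" RGBA/401/101/9 1920x1200: 173MPix/sec (12.6438ms, 79FPS)
readback "channel" RGBA/401/101/a 1920x1200: 175MPix/sec (12.5237ms, 79FPS)
readback "channel" RGBA/401/101/b 1920x1200: 174MPix/sec (12.5634ms, 79FPS)
"""

# Captures the trailing hex digit (assumed core index) and the MPix/sec rate.
LINE = re.compile(r"RGBA/\d+/\d+/([0-9a-f]) \S+: (\d+)MPix/sec")


def throughput_by_core(log):
    """Map core index -> MPix/sec for every readback line in the log."""
    return {int(m.group(1), 16): int(m.group(2)) for m in LINE.finditer(log)}


def mean_by_socket(per_core, cores_per_socket=6):
    """Average the per-core rates per socket (socket size is an assumption)."""
    groups = {}
    for core, rate in per_core.items():
        groups.setdefault(core // cores_per_socket, []).append(rate)
    return {socket: mean(rates) for socket, rates in sorted(groups.items())}


per_core = throughput_by_core(LOG)
per_socket = mean_by_socket(per_core)
print(per_socket)
print(f"socket 0 is {per_socket[0] / per_socket[1]:.2f}x faster than socket 1")
```

Under those assumptions, the first six cores average 252.5 MPix/sec and the remaining six average 173.5 MPix/sec, so readback is roughly 1.46x faster on the first group.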

Provide a way, preferably automatic, to configure CPU affinity. hwloc seems to be the most promising package for this.
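
As an illustration of what automatic affinity could look like, the following Python sketch pins the calling thread to the CPUs that Linux reports as local to a PCI device. It does not use hwloc and is not the Equalizer implementation: it reads the `local_cpulist` file from Linux sysfs and calls `os.sched_setaffinity`, both of which are Linux-only. The helper names and the PCI address in the usage comment are hypothetical.

```python
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-5,12-17' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpus_near_pci_device(pci_address):
    """Return the CPUs local to a PCI device, as reported by Linux sysfs."""
    path = f"/sys/bus/pci/devices/{pci_address}/local_cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_calling_thread(cpus):
    """Restrict the calling thread to `cpus`, ignoring CPUs it may not use.

    On Linux, pid 0 addresses the calling thread. If none of the requested
    CPUs is currently allowed, the affinity is left unchanged.
    """
    allowed = os.sched_getaffinity(0) & set(cpus)
    if allowed:
        os.sched_setaffinity(0, allowed)
    return os.sched_getaffinity(0)


# Hypothetical usage: "0000:03:00.0" is a placeholder PCI address for a GPU,
# not an address taken from this thread.
# pin_calling_thread(cpus_near_pci_device("0000:03:00.0"))
```

A library such as hwloc would be the more portable route, since it abstracts the topology lookup across operating systems. The sysfs approach is only meant to show the idea with no extra dependencies.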

Stefan Eilemann
Owner

It also matters for network IO:

[eilemann@node01 Equalizer]$ numactl --cpunodebind=0 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 1758.73MB/s (1758.73pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1787.89MB/s (1787.89pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1771.19MB/s (1771.19pps) from RDMA#5000000#node01i##4242#default#

[eilemann@node01 Equalizer]$ numactl --cpunodebind=1 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 2028.07MB/s (2028.07pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 2022.14MB/s (2022.14pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1921.8MB/s (1921.8pps) from RDMA#5000000#node01i##4242#default#
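
To put a number on the difference between the two runs above, this small Python sketch averages the reported receive rates per NUMA node. It only parses the `Recv perf` lines quoted here; the function name is hypothetical.

```python
import re
from statistics import mean

NODE0 = """\
Recv perf: 1758.73MB/s (1758.73pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1787.89MB/s (1787.89pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1771.19MB/s (1771.19pps) from RDMA#5000000#node01i##4242#default#
"""

NODE1 = """\
Recv perf: 2028.07MB/s (2028.07pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 2022.14MB/s (2022.14pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1921.8MB/s (1921.8pps) from RDMA#5000000#node01i##4242#default#
"""


def mean_recv_rate(output):
    """Average the MB/s figures of all 'Recv perf' lines in the output."""
    rates = [float(r) for r in re.findall(r"Recv perf: ([\d.]+)MB/s", output)]
    return mean(rates)


node0 = mean_recv_rate(NODE0)
node1 = mean_recv_rate(NODE1)
gain = 100 * (node1 / node0 - 1)
print(f"node 0: {node0:.1f} MB/s, node 1: {node1:.1f} MB/s, gain: {gain:.1f}%")
```

With only three samples per node this is a rough indication, not a benchmark: binding to node 1 averages about 1990.7 MB/s against about 1772.6 MB/s on node 0, a gain of roughly 12%.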

Stefan Eilemann
Owner
eile commented March 21, 2012

FindHWLOC needs version checking. The Ubuntu version is too old: it doesn't have hwloc_bitmap_t, which is deprecated in newer versions. Later we'll also need your new code. Please do this in https://github.com/Eyescale/CMake and then merge to Eq (see doc).

Marwan Abdellah
Collaborator

This issue was resolved in commit 08f4ae5.

Stefan Eilemann eile closed this July 06, 2012
Stefan Eilemann
Owner
eile commented July 06, 2012

Implemented except node thread affinity, as the small gain does not warrant the implementation overhead right now.
