Automatic CPU-GPU affinity #57

Closed
eile opened this Issue · 6 comments

2 participants

Stefan Eilemann Marwan Abdellah
Stefan Eilemann
Owner

On multi-socket systems, performance may vary widely depending on which core a thread executes on:

readback "channel" RGBA/401/101/0 1920x1200: 249MPix/sec (8.80081ms, 113FPS)
readback "channel" RGBA/401/101/1 1920x1200: 243MPix/sec (9.02805ms, 110FPS)
readback "channel" RGBA/401/101/2 1920x1200: 257MPix/sec (8.54776ms, 116FPS)
readback "channel" RGBA/401/101/3 1920x1200: 241MPix/sec (9.10284ms, 109FPS)
readback "channel" RGBA/401/101/4 1920x1200: 263MPix/sec (8.34019ms, 119FPS)
readback "channel" RGBA/401/101/5 1920x1200: 262MPix/sec (8.37638ms, 119FPS)
readback "channel" RGBA/401/101/6 1920x1200: 173MPix/sec (12.6451ms, 79FPS)
readback "channel" RGBA/401/101/7 1920x1200: 173MPix/sec (12.6364ms, 79FPS)
readback "channel" RGBA/401/101/8 1920x1200: 173MPix/sec (12.6432ms, 79FPS)
readback "channel" RGBA/401/101/9 1920x1200: 173MPix/sec (12.6438ms, 79FPS)
readback "channel" RGBA/401/101/a 1920x1200: 175MPix/sec (12.5237ms, 79FPS)
readback "channel" RGBA/401/101/b 1920x1200: 174MPix/sec (12.5634ms, 79FPS)
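
To put a number on the gap, the figures above can be averaged per group. This is a minimal sketch, assuming the last path component (0 to b) is the core index in hex and that cores 0-5 and 6-b sit on different sockets; the log states neither, and the helper names are illustrative only:

```python
import re
from statistics import mean

# Readback log lines copied verbatim from the report above.
LOG = """\
readback "channel" RGBA/401/101/0 1920x1200: 249MPix/sec (8.80081ms, 113FPS)
readback "channel" RGBA/401/101/1 1920x1200: 243MPix/sec (9.02805ms, 110FPS)
readback "channel" RGBA/401/101/2 1920x1200: 257MPix/sec (8.54776ms, 116FPS)
readback "channel" RGBA/401/101/3 1920x1200: 241MPix/sec (9.10284ms, 109FPS)
readback "channel" RGBA/401/101/4 1920x1200: 263MPix/sec (8.34019ms, 119FPS)
readback "channel" RGBA/401/101/5 1920x1200: 262MPix/sec (8.37638ms, 119FPS)
readback "channel" RGBA/401/101/6 1920x1200: 173MPix/sec (12.6451ms, 79FPS)
readback "channel" RGBA/401/101/7 1920x1200: 173MPix/sec (12.6364ms, 79FPS)
readback "channel" RGBA/401/101/8 1920x1200: 173MPix/sec (12.6432ms, 79FPS)
readback "channel" RGBA/401/101/9 1920x1200: 173MPix/sec (12.6438ms, 79FPS)
readback "channel" RGBA/401/101/a 1920x1200: 175MPix/sec (12.5237ms, 79FPS)
readback "channel" RGBA/401/101/b 1920x1200: 174MPix/sec (12.5634ms, 79FPS)
"""

# Assumption: the trailing path component is the core index, written in hex.
LINE_RE = re.compile(r"RGBA/\d+/\d+/([0-9a-f]+) .*?: (\d+)MPix/sec")

def throughput_by_core(log):
    """Map core index -> readback throughput in MPix/sec."""
    return {int(m.group(1), 16): int(m.group(2)) for m in LINE_RE.finditer(log)}

def group_means(by_core, split=6):
    """Mean throughput for cores below `split` and for cores at or above it."""
    low = mean(v for c, v in by_core.items() if c < split)
    high = mean(v for c, v in by_core.items() if c >= split)
    return low, high

by_core = throughput_by_core(LOG)
low, high = group_means(by_core)
print(f"cores 0-5: {low:.1f} MPix/sec, cores 6-b: {high:.1f} MPix/sec, ratio {low / high:.2f}")
```

Under those assumptions the first group averages 252.5 MPix/sec and the second 173.5 MPix/sec, a ratio of about 1.46.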

Provide a way, preferably automatic, to configure CPU affinity. hwloc seems to be the most promising package for this.
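
As a rough illustration of what automatic affinity could look like, the sketch below pins the calling thread to the CPUs that Linux reports as local to a PCI device such as a GPU. It deliberately does not use hwloc: it relies on the Linux sysfs attribute `local_cpulist` and Python's `os.sched_setaffinity`, so it is Linux-only, and the function names are hypothetical rather than anything from Equalizer:

```python
import os

def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-5,12-17" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single id like "3" has no "-", so fall back to `first`.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus

def cpus_near_pci_device(pci_address):
    """CPUs local to a PCI device, as reported by Linux sysfs."""
    path = f"/sys/bus/pci/devices/{pci_address}/local_cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())

def pin_current_thread(cpus):
    """Restrict the calling thread to `cpus`; return the set actually applied."""
    # Keep only CPUs we are currently allowed to use (cgroups, taskset, ...).
    usable = set(cpus) & os.sched_getaffinity(0)
    if not usable:
        raise ValueError("none of the requested CPUs are available")
    # On Linux, pid 0 applies the mask to the calling thread.
    os.sched_setaffinity(0, usable)
    return usable
```

A render thread would then call something like `pin_current_thread(cpus_near_pci_device("0000:03:00.0"))`, where the PCI address is a made-up placeholder for the GPU driving that thread's pipe.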

Stefan Eilemann eile was assigned
Stefan Eilemann
Owner

It also matters for network IO:

[eilemann@node01 Equalizer]$ numactl --cpunodebind=0 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 1758.73MB/s (1758.73pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1787.89MB/s (1787.89pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1771.19MB/s (1771.19pps) from RDMA#5000000#node01i##4242#default#

[eilemann@node01 Equalizer]$ numactl --cpunodebind=1 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 2028.07MB/s (2028.07pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 2022.14MB/s (2022.14pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1921.8MB/s (1921.8pps) from RDMA#5000000#node01i##4242#default#
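
For a quick comparison of the two runs, the mean receive rates can be computed as below. This is only a sketch over the three samples per node shown above, so the result is indicative rather than statistically solid; the function name is illustrative:

```python
from statistics import mean

# Recv perf in MB/s, copied from the two netperf runs above.
recv_mb_per_s = {
    0: [1758.73, 1787.89, 1771.19],  # numactl --cpunodebind=0
    1: [2028.07, 2022.14, 1921.8],   # numactl --cpunodebind=1
}

def relative_gain(samples, baseline, candidate):
    """Fractional increase in mean throughput of `candidate` over `baseline`."""
    return mean(samples[candidate]) / mean(samples[baseline]) - 1.0

gain = relative_gain(recv_mb_per_s, baseline=0, candidate=1)
print(f"node 1 vs node 0: {gain:+.1%}")
```

On these samples, binding to node 1 averages roughly 12% more throughput than binding to node 0.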

Stefan Eilemann
Owner

FindHWLOC needs version checking. The Ubuntu version is too old: it doesn't have hwloc_bitmap_t, which is deprecated in newer versions. We'll also need your new code later. Please do this in https://github.com/Eyescale/CMake and then merge to Eq (see doc).

Marwan Abdellah
Collaborator

This issue was resolved in commit 08f4ae5.

Stefan Eilemann
Owner

Implemented except node thread affinity, as the small gain does not warrant the implementation overhead right now.

Stefan Eilemann eile closed this