Automatic CPU-GPU affinity #57

Closed
eile opened this Issue Nov 28, 2011 · 6 comments

2 participants

@eile
Eyescale Software GmbH member
eile commented Nov 28, 2011

On multi-socket systems, performance may vary widely depending on which core a thread executes on:

readback "channel" RGBA/401/101/0 1920x1200: 249MPix/sec (8.80081ms, 113FPS)
readback "channel" RGBA/401/101/1 1920x1200: 243MPix/sec (9.02805ms, 110FPS)
readback "channel" RGBA/401/101/2 1920x1200: 257MPix/sec (8.54776ms, 116FPS)
readback "channel" RGBA/401/101/3 1920x1200: 241MPix/sec (9.10284ms, 109FPS)
readback "channel" RGBA/401/101/4 1920x1200: 263MPix/sec (8.34019ms, 119FPS)
readback "channel" RGBA/401/101/5 1920x1200: 262MPix/sec (8.37638ms, 119FPS)
readback "channel" RGBA/401/101/6 1920x1200: 173MPix/sec (12.6451ms, 79FPS)
readback "channel" RGBA/401/101/7 1920x1200: 173MPix/sec (12.6364ms, 79FPS)
readback "channel" RGBA/401/101/8 1920x1200: 173MPix/sec (12.6432ms, 79FPS)
readback "channel" RGBA/401/101/9 1920x1200: 173MPix/sec (12.6438ms, 79FPS)
readback "channel" RGBA/401/101/a 1920x1200: 175MPix/sec (12.5237ms, 79FPS)
readback "channel" RGBA/401/101/b 1920x1200: 174MPix/sec (12.5634ms, 79FPS)

Provide a way, preferably automatic, to configure CPU affinity. hwloc seems to be the most promising package for this.
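To make the request concrete, here is a minimal sketch of manual thread pinning. It does not use hwloc; it assumes Linux with glibc and calls `pthread_getaffinity_np`/`pthread_setaffinity_np` directly. The helper names `allowedCpus` and `pinCurrentThread` are hypothetical and not part of Equalizer. Discovering which cores sit next to a given GPU is the part hwloc would automate, so the caller has to supply the core list here.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for the CPU_* macros and *_np calls on glibc
#endif
#include <pthread.h>
#include <sched.h>
#include <vector>

// Hypothetical helper: returns the CPU indices the calling thread may run on.
std::vector<int> allowedCpus()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return cpus; // query failed, report nothing
    for (int i = 0; i < CPU_SETSIZE; ++i)
        if (CPU_ISSET(i, &set))
            cpus.push_back(i);
    return cpus;
}

// Hypothetical helper: restricts the calling thread to the given CPU indices.
// Returns false for an empty or out-of-range list, or if the kernel rejects it.
bool pinCurrentThread(const std::vector<int>& cpus)
{
    if (cpus.empty())
        return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu : cpus)
    {
        if (cpu < 0 || cpu >= CPU_SETSIZE)
            return false;
        CPU_SET(cpu, &set);
    }
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

If the last field of each readback line above is the core index (an assumption on my side), a render thread for that GPU would call something like `pinCurrentThread({0, 1, 2, 3, 4, 5})` before its first readback.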

@eile eile was assigned Nov 28, 2011
@eile
Eyescale Software GmbH member
eile commented Nov 28, 2011

It also matters for network IO:

[eilemann@node01 Equalizer]$ numactl --cpunodebind=0 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 1758.73MB/s (1758.73pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1787.89MB/s (1787.89pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1771.19MB/s (1771.19pps) from RDMA#5000000#node01i##4242#default#

[eilemann@node01 Equalizer]$ numactl --cpunodebind=1 ./release/bin/netperf -s node01i:4242:RDMA
Recv perf: 2028.07MB/s (2028.07pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 2022.14MB/s (2022.14pps) from RDMA#5000000#node01i##4242#default#
Recv perf: 1921.8MB/s (1921.8pps) from RDMA#5000000#node01i##4242#default#
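To put a number on the difference between the two runs, a small sketch that averages the samples and computes the relative gain. The function names `meanThroughput` and `relativeGain` are made up for illustration; the inputs are the six values logged above, and three samples per node is too few for anything more than a rough estimate.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of throughput samples in MB/s; samples must not be empty.
double meanThroughput(const std::vector<double>& samples)
{
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Relative gain of 'candidate' over 'baseline'; 0.10 means 10% faster.
double relativeGain(const std::vector<double>& baseline,
                    const std::vector<double>& candidate)
{
    return meanThroughput(candidate) / meanThroughput(baseline) - 1.0;
}
```

With `{1758.73, 1787.89, 1771.19}` for node 0 and `{2028.07, 2022.14, 1921.8}` for node 1, the gain comes out at roughly 12% in favour of `--cpunodebind=1`.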

@eile
Eyescale Software GmbH member
eile commented Nov 30, 2011
@eile
Eyescale Software GmbH member
eile commented Dec 19, 2011
@eile
Eyescale Software GmbH member
eile commented Mar 21, 2012

FindHWLOC needs version checking. The Ubuntu version is too old; it doesn't have hwloc_bitmap_t, which is deprecated in newer versions. Later we'll also need your new code. Please do this in https://github.com/Eyescale/CMake and then merge to Eq (see doc).

@marwan-abdellah
Eyescale Software GmbH member

This issue was resolved in commit 08f4ae5.

@eile
Eyescale Software GmbH member
eile commented Jul 6, 2012

Implemented except node thread affinity, as the small gain does not warrant the implementation overhead right now.

@eile eile closed this Jul 6, 2012