Releases: ROCm/TransferBench

TransferBench v1.50

03 Apr 16:27
eaf32b4

Added

  • Adding new parallel copy preset benchmark (pcopy)
    • Usage: ./TransferBench pcopy <numBytes=64M> <#CUs=8> <srcGpu=0> <minGpus=1> <maxGpus=#GPU-1>
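
As a rough illustration of the usage line above, the positional arguments could be filled in as follows. The values are hypothetical: the first four are the documented defaults, and `max_gpus=3` assumes a machine with 4 GPUs (so `#GPU-1` is 3). The script only prints the command unless a `TransferBench` binary is present in the current directory.

```shell
# Positional arguments in the order given by the pcopy usage line.
num_bytes=64M   # bytes per Transfer (documented default)
num_cus=8       # number of CUs (documented default)
src_gpu=0       # source GPU index (documented default)
min_gpus=1      # documented default
max_gpus=3      # assumption: 4-GPU machine, so #GPU-1 = 3

cmd="./TransferBench pcopy $num_bytes $num_cus $src_gpu $min_gpus $max_gpus"
echo "$cmd"

# Run only if the binary exists here; otherwise the line above serves as a dry run.
if [ -x ./TransferBench ]; then
  $cmd
fi
```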

Fixed

  • Removed non-copy DMA Transfers (these had previously been using hipMemset)
  • Fixed CPU executor when operating on null destination

TransferBench v1.49

02 Apr 22:38
97fbbbb

Fixes

  • Enumerating previously missed DMA engines used only for CPU traffic in topology display

TransferBench v1.48

02 Feb 22:46
aa801b9

Fixes

  • Various fixes for TransferBenchCuda

Additions

  • Support for targeting specific DMA engines via executor subindex (e.g. D0.1)
  • Printing warnings when executors are overcommitted

Modifications

  • USE_REMOTE_READ supported for rwrite preset benchmark
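
A minimal sketch of combining this option with the rwrite preset, assuming USE_REMOTE_READ is passed as an environment variable and that a value of 1 enables it (neither detail is stated in these notes). The positional arguments reuse the defaults from the rwrite usage line in the v1.44 notes.

```shell
# rwrite arguments: numBytes, #CUs, srcGpu, minGpus, maxGpus (v1.44 defaults).
cmd="./TransferBench rwrite 64M 8 0 1 3"

# Assumption: USE_REMOTE_READ=1 turns the option on for this one invocation.
echo "USE_REMOTE_READ=1 $cmd"

# Run only if the binary exists here; otherwise the line above serves as a dry run.
if [ -x ./TransferBench ]; then
  USE_REMOTE_READ=1 $cmd
fi
```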

TransferBench v1.47

09 Jan 20:52
ceeab46

Fixes

  • Fixing CUDA compilation

TransferBench v1.46

14 Dec 03:54
d5445b9

Fixes

  • Fixing GFX_UNROLL being set to 13 (past the limit of 8) on gfx906 cards

Modifications

  • GFX_SINGLE_TEAM=1 by default
  • Adding field showing summation of individual Transfer bandwidths for Executors

TransferBench v1.45

05 Dec 06:41
f33c7fd

Additions

  • Adding A2A_MODE to a2a preset (0 = copy, 1 = read-only, 2 = write-only)
  • Adding GFX_UNROLL to modify GFX kernel's unroll factor
  • Adding GFX_WAVE_ORDER to modify order in which wavefronts process data
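
The three A2A_MODE values listed above could be swept with a small loop like the one below. This is a sketch that assumes the variable is read from the environment and that the a2a preset falls back to its defaults when no further arguments are given. GFX_UNROLL and GFX_WAVE_ORDER are left unset because these notes do not list their accepted values.

```shell
# A2A_MODE values per the release note: 0 = copy, 1 = read-only, 2 = write-only.
swept=""
for mode in 0 1 2; do
  echo "A2A_MODE=$mode ./TransferBench a2a"
  # Run only if the binary exists here; otherwise the echo serves as a dry run.
  if [ -x ./TransferBench ]; then
    A2A_MODE=$mode ./TransferBench a2a
  fi
  swept="$swept $mode"
done
```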

Modifications

  • Rewrote the GFX reduction kernel to support new wave ordering

TransferBench v1.44

01 Dec 21:00
33a5435

Additions

  • Adding rwrite preset to benchmark remote parallel writes
    • Usage: ./TransferBench rwrite <numBytes=64M> <#CUs=8> <srcGpu=0> <minGpus=1> <maxGpus=3>

TransferBench v1.43

30 Nov 20:33
30f1c58

Changes

  • Modifying a2a to show executor timing, as well as executor min/max bandwidth

TransferBench v1.42

30 Nov 09:07
8b2cd85

Fixes

  • Fixing schmoo maxNumCus optional arg parsing
  • Schmoo output modified to be easier to copy

TransferBench v1.41

30 Nov 07:38
f5e9cf3

Additions

  • Adding schmoo preset config, which benchmarks local/remote reads/writes/copies
    • Usage: ./TransferBench schmoo <numBytes=64M> <localIdx=0> <remoteIdx=1> <maxNumCUs=32>

Fixes

  • Fixing some misreported timings when running with non-fixed number of iterations