I think it would be great to have an example of using multiple GPUs.
Here is what I tried. If that's the right way to do it, you may want to add it as an example.
It seems to scale fine (tested with up to 7 GPUs), and nvidia-smi reports 96% GPU utilization.
```python
import argparse
import time

import numpy as np
import arrayfire as af

af.set_backend('cuda')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('gpus', type=int)
    parser.add_argument('-runs', type=int, default=100)
    args = parser.parse_args()
    GPUS = args.gpus
    N = 5000
    runs = args.runs

    # The simple task we want to solve:
    # we have a huge list of vectors X and want to calculate the distances
    # between all of them. This results in a huge distance matrix M,
    # which is then multiplied by a vector alpha.
    X = np.random.rand(100, N)
    Alpha = np.random.rand(N, 1)

    # Copy the data to every device once:
    xGPU = []
    alphaGPU = []
    for i in range(GPUS):
        af.set_device(i)
        xGPU.append(af.to_array(X))
        alphaGPU.append(af.to_array(Alpha))

    sub = lambda a, b: a - b
    print("init finished")

    for _ in range(runs):
        startTime = time.time()
        splitSize = int(np.ceil(N / GPUS))
        # print("Temp data will occupy at least {:.2f} MB on the gpu.".format(
        #     X.shape[0] * splitSize * X.shape[1] * 8 / 1024 / 1024))
        result = []
        for i in range(GPUS):
            # Each device computes the distances for its own slice of columns.
            af.set_device(i)
            x = xGPU[i]
            alpha = alphaGPU[i]
            start = i * splitSize
            end = min((i + 1) * splitSize, N)
            diff = af.broadcast(sub,
                                af.tile(x[:, start:end], 1, 1, x.shape[1]),
                                af.moddims(x, x.shape[0], 1, x.shape[1]))
            diff = af.sqrt(af.sum(af.pow(diff, 2), 0))
            r = af.matmul(af.moddims(diff, diff.shape[1], diff.shape[2]), alpha)
            result.append(r)

        total = 0
        for i in range(GPUS):
            af.set_device(i)
            # af.sum() over the whole array returns a host scalar,
            # so this also forces evaluation on each device.
            total += af.sum(result[i])
        print("Took {} sec".format(time.time() - startTime))
```