I think it would be great to have an example of using multiple GPUs.
Here is what I tried. If that's the right way to do it, you may want to add it as an example.
It seems to scale fine (tested with up to 7 GPUs), and nvidia-smi reports 96% GPU utilization.
```python
import argparse
import time

import numpy as np
import arrayfire as af

af.set_backend('cuda')

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('gpus', type=int)
    parser.add_argument('-runs', type=int, default=100)
    args = parser.parse_args()
    GPUS = args.gpus
    N = 5000
    runs = args.runs

    # The simple task we want to solve:
    # we have a huge list of vectors X and want to calculate the distances
    # between all of them. This results in a huge distance matrix M,
    # which is then multiplied by a vector alpha.
    X = np.random.rand(100, N)
    Alpha = np.random.rand(N, 1)

    # Copy the data to every device once:
    xGPU = []
    alphaGPU = []
    for i in range(GPUS):
        af.set_device(i)
        xGPU.append(af.to_array(X))
        alphaGPU.append(af.to_array(Alpha))

    sub = lambda a, b: a - b
    print("init finished")

    for _ in range(runs):
        startTime = time.time()
        splitSize = int(np.ceil(N / GPUS))
        # print("Temp data will occupy at least {:.2f} MB on the gpu.".format(
        #     X.shape[0] * splitSize * X.shape[1] * 8 / 1024 / 1024))
        result = []
        for i in range(GPUS):
            # Each device computes the distances for its own slice of columns.
            af.set_device(i)
            x = xGPU[i]
            alpha = alphaGPU[i]
            start = i * splitSize
            end = min((i + 1) * splitSize, N)
            diff = af.broadcast(sub,
                                af.tile(x[:, start:end], 1, 1, x.shape[1]),
                                af.moddims(x, x.shape[0], 1, x.shape[1]))
            diff = af.sqrt(af.sum(af.pow(diff, 2), 0))
            r = af.matmul(af.moddims(diff, diff.shape[1], diff.shape[2]), alpha)
            result.append(r)

        total = 0
        for i in range(GPUS):
            af.set_device(i)
            # af.sum() over the whole array returns a host scalar,
            # so this also forces evaluation on each device.
            total += af.sum(result[i])
        print("Took {} sec".format(time.time() - startTime))
```