# Why Are We Interested in GPU, CUDA, Numba, RAPIDS...?

Let's take a look.

Here we compute the square roots of 1 million numbers in plain Python:

In [1]:
import math

numbers = list(range(1000000))

In [2]:
%%timeit 

s = [math.sqrt(x) for x in numbers]

80.8 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
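`%%timeit` is an IPython magic, so it only works in a notebook or IPython shell. As a rough sketch, the same measurement can be reproduced in a plain script with the standard-library `timeit` module. The helper name `python_sqrt` and the lower `repeat` count are choices made here for illustration, not part of the original notebook.

```python
import math
import timeit

numbers = list(range(1_000_000))

def python_sqrt():
    # Same list comprehension as the notebook cell above.
    return [math.sqrt(x) for x in numbers]

# number=10 matches the "10 loops each" reported by %%timeit;
# repeat is lowered to 3 here only to keep the script quick.
run_times = timeit.repeat(python_sqrt, repeat=3, number=10)

# Each entry is the total time for 10 loops, so divide to get seconds per loop.
per_loop = [t / 10 for t in run_times]
print(f"best: {min(per_loop) * 1e3:.1f} ms per loop")
```

Absolute timings depend on the machine, so expect a different figure from the 80.8 ms shown above.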


Using `numpy` we can both vectorize our operation and leverage a native (C) implementation from Python:

In [3]:
import numpy as np

np_numbers = np.array(numbers)

In [4]:
%%timeit

np_s = np.sqrt(np_numbers)

1.34 ms ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
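Before comparing speeds, it is worth confirming that the two approaches compute the same thing. The following is a minimal check, assuming the same `numbers` list as above; `py_s` is a name introduced here for the pure-Python result.

```python
import math
import numpy as np

numbers = list(range(1_000_000))
np_numbers = np.array(numbers)

py_s = [math.sqrt(x) for x in numbers]  # pure-Python result
np_s = np.sqrt(np_numbers)              # vectorized NumPy result

# np.sqrt returns a floating-point array with the same shape as its input.
print(np_s.dtype, np_s.shape)

# The two results should agree to within floating-point tolerance.
print(np.allclose(np_s, py_s))
```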


That's roughly a 60x speedup (80.8 ms down to 1.34 ms). Of course, plain Python is an easy target.


Let's look at JIT-compiled code, using Numba.

In [5]:
import numba

# Numba compiles root() to machine code the first time it is called.
@numba.jit
def root(n):
    return np.sqrt(n)

In [6]:
%%timeit

numba_s = root(np_numbers)

1 ms ± 63.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


Not bad, though it is only a modest gain over NumPy, since `np.sqrt` already runs as compiled code. But we're here for GPUs: will the GPU help much?

In [7]:
import torch

gpu_numbers = torch.tensor(numbers, dtype=torch.float32).cuda()

In [8]:
%%timeit

gpu_roots = torch.sqrt(gpu_numbers)

54.6 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


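Now things are getting interesting! That is roughly 25x faster than the NumPy version, with two caveats: the GPU tensor is `float32` while the NumPy result is double precision, and CUDA kernels launch asynchronously, so this figure may not fully reflect completed work unless we call `torch.cuda.synchronize()` before reading the clock.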
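Because PyTorch queues CUDA work asynchronously, a more careful GPU measurement waits for the device to finish before stopping the timer. The following is a minimal sketch of that idea, not the notebook's original method: the helper `timed_sqrt` and its `loops` parameter are hypothetical names, and the code falls back to the CPU when no CUDA device is available so that it still runs.

```python
import time

import torch

# Fall back to the CPU so the sketch still runs without a CUDA device.
device = "cuda" if torch.cuda.is_available() else "cpu"
gpu_numbers = torch.arange(1_000_000, dtype=torch.float32, device=device)

def timed_sqrt(x, loops=1000):
    """Return the last result and the average seconds per torch.sqrt call."""
    if x.is_cuda:
        torch.cuda.synchronize()  # finish pending work before starting the clock
    start = time.perf_counter()
    for _ in range(loops):
        result = torch.sqrt(x)
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for all queued kernels to complete
    elapsed = time.perf_counter() - start
    return result, elapsed / loops

gpu_roots, seconds_per_loop = timed_sqrt(gpu_numbers)
print(f"{seconds_per_loop * 1e6:.1f} µs per loop on {device}")
```

Synchronizing once after the whole loop, rather than after every call, keeps the synchronization overhead out of the per-call average while still ensuring that all the work is counted.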