## CPU vs GPU in practice

### Imports

In [4]:
import torch

### CPU

In [9]:
%%timeit
z = torch.randn(1, 1)
result = torch.matmul(z, z)
del z
del result

4.42 µs ± 46.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [10]:
%%timeit
z = torch.randn(1_000, 1_000)
result = torch.matmul(z, z)
del z
del result

20.7 ms ± 708 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [11]:
%%timeit
z = torch.randn(10_000, 10_000)
result = torch.matmul(z, z)
del z
del result

11.2 s ± 201 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### GPU

In [12]:
device = torch.device('cuda')

In [16]:
%%timeit
z = torch.randn(1, 1).to(device)
result = torch.matmul(z, z)
del z
del result

66.5 µs ± 7.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [14]:
%%timeit
z = torch.randn(1_000, 1_000).to(device)
result = torch.matmul(z, z)
del z
del result

5.56 ms ± 48.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [15]:
%%timeit
z = torch.randn(10_000, 10_000).to(device)
result = torch.matmul(z, z)
del z
del result

576 ms ± 63.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
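
The timed cells above include `torch.randn` and, on the GPU, the `.to(device)` copy, so they measure more than the multiplication itself. CUDA kernels are also launched asynchronously with respect to the host. A minimal sketch of how the multiplication could be timed in isolation follows. The helper name `time_matmul` and the `repeats` parameter are hypothetical, and the code falls back to CPU-only timing when no CUDA device is available:

```python
import time

import torch


def time_matmul(n, device, repeats=10):
    """Return mean seconds per n x n matmul on `device`, excluding tensor creation."""
    # Allocate directly on the target device so no host-to-device copy is timed.
    z = torch.randn(n, n, device=device)

    # Warm-up run so one-time initialisation cost is not counted.
    torch.matmul(z, z)
    if device.type == "cuda":
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(z, z)
    # Wait for queued GPU work to finish before stopping the clock.
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda"))

for dev in devices:
    for n in (1, 1_000):
        print(f"{dev.type:>4} {n:>5} x {n:<5} {time_matmul(n, dev):.6f} s per matmul")
```

Numbers from this sketch are not directly comparable to the `%%timeit` results in this notebook, because it deliberately leaves out tensor creation and transfer.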


### Conclusions

- $1 \times 1$: $4.42$ µs (CPU) vs $66.5$ µs (GPU).

CPU is about **15x** faster than GPU.

- $1{,}000 \times 1{,}000$: $20.7$ ms (CPU) vs $5.56$ ms (GPU).

GPU is about **3.7x** faster than CPU.

- $10{,}000 \times 10{,}000$: $11.2$ s (CPU) vs $576$ ms (GPU).

GPU is about **19.4x** faster than CPU.
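
The ratios above come from dividing the slower mean time by the faster one. A short sketch that reproduces them from the measurements reported in this notebook (the `speedup` helper and the `timings` dictionary are illustrative names, not part of the original notebook):

```python
# Mean times from the %%timeit outputs above, converted to seconds: (cpu, gpu).
timings = {
    "1 x 1": (4.42e-6, 66.5e-6),
    "1,000 x 1,000": (20.7e-3, 5.56e-3),
    "10,000 x 10,000": (11.2, 576e-3),
}


def speedup(cpu_s, gpu_s):
    """Return (faster device, how many times faster it is)."""
    if cpu_s <= gpu_s:
        return "CPU", gpu_s / cpu_s
    return "GPU", cpu_s / gpu_s


for size, (cpu_s, gpu_s) in timings.items():
    winner, ratio = speedup(cpu_s, gpu_s)
    print(f"{size}: {winner} is {ratio:.1f}x faster")
```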

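The exact figures depend on your hardware, and each timed cell also includes tensor creation (plus the copy to the device on the GPU). The trend is still clear: for a tiny matrix the CPU wins, most likely because fixed GPU overhead outweighs the work itself, while for large matrix multiplications the GPU is much faster than the CPU.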