### Execution profiling 1
PyTorch offers a comprehensive selection of mathematical operators on tensors. These operations can be executed on a CPU or a GPU, and many of the operations are much faster on a GPU.

Experiment with PyTorch tensor operations as follows:

1. Prepare startup code for allowing both CPU and GPU execution.
2. Create a reasonably large 2D tensor of random numbers in main memory (CPU) and another one of the same size in GPU memory.
3. Perform an eigenvalue decomposition of both tensors and compare the execution times, taking care that each measurement covers the full computation.

Hints: torch.linalg, torch.cuda.Event, torch.cuda.synchronize.

In [1]:
import torch
import time

Determine the device for tensor processing

In [2]:
cpu = torch.device('cpu')
if torch.cuda.is_available():
    print("GPU available")
    gpu = torch.device('cuda')
else:
    print("GPU not available")
    gpu = None

GPU available


Create a CPU tensor with elements sampled from the uniform distribution on [0, 1)

In [3]:
n = 5000
Acpu = torch.rand((n, n), device=cpu)
(Acpu.device, Acpu.shape, Acpu.element_size() * Acpu.nelement())

(device(type='cpu'), torch.Size([5000, 5000]), 100000000)

Create a GPU tensor with elements sampled from the uniform distribution on [0, 1)

In [4]:
Agpu = torch.rand((n, n), device=gpu)
(Agpu.device, Agpu.shape, Agpu.element_size() * Agpu.nelement())

(device(type='cuda', index=0), torch.Size([5000, 5000]), 100000000)
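As a side note on the two cells above, the following is a minimal sketch of how `torch.rand` differs from `torch.randn` and of how the reported byte count arises. The helper name `tensor_bytes` and the smaller size `n = 1000` are assumptions made for illustration, and the sketch relies on PyTorch's default dtype being `float32`.

```python
import torch

n = 1000  # smaller than the notebook's 5000 to keep the sketch quick

uniform = torch.rand((n, n))   # uniform on [0, 1)
normal = torch.randn((n, n))   # standard normal: mean 0, variance 1

# Uniform samples stay inside [0, 1); normal samples include negative values.
print(uniform.min().item() >= 0.0, uniform.max().item() < 1.0)
print(normal.min().item() < 0.0)

def tensor_bytes(t):
    """Memory footprint in bytes: bytes per element times number of elements."""
    return t.element_size() * t.nelement()

print(tensor_bytes(uniform))                    # float32: 4 bytes per element
print(tensor_bytes(uniform.to(torch.float64)))  # float64: 8 bytes per element
```

By the same arithmetic, a 5000 x 5000 `float32` tensor occupies 5000 * 5000 * 4 = 100000000 bytes, which matches the output shown above.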

Perform the eigenvalue decomposition on the CPU and time it

In [5]:
# perf_counter is a monotonic, high-resolution clock suited to interval timing
t_start = time.perf_counter()

L, V = torch.linalg.eig(Acpu)

t_end = time.perf_counter()
et_cpu = t_end - t_start
print(f"Eigenvalue decomposition on CPU: {et_cpu} s")

Eigenvalue decomposition on CPU: 19.98369789123535 s
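Before comparing timings it can be worth confirming that the decomposition is what we expect. The sketch below checks the defining relation A V = V diag(L) on a small matrix. The size `n = 200`, the `float64` dtype, and the variable names `A_c` and `residual` are assumptions chosen to keep the check fast and numerically clean; they are not part of the exercise.

```python
import torch

torch.manual_seed(0)
n = 200
A = torch.rand((n, n), dtype=torch.float64)

# For a general real matrix, eig returns complex eigenvalues and eigenvectors.
L, V = torch.linalg.eig(A)

# Cast A to the complex dtype of V so the matrix product is well defined.
A_c = A.to(V.dtype)

# V * L scales column j of V by L[j] through broadcasting, i.e. V @ diag(L).
residual = torch.linalg.norm(A_c @ V - V * L) / torch.linalg.norm(A_c)

print(L.dtype, V.shape)
print(f"relative residual: {residual.item():.2e}")
```

A relative residual close to machine precision indicates that the columns of `V` are eigenvectors for the corresponding entries of `L`.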


Perform the eigenvalue decomposition on the GPU. Run it once as a warm-up, then time a second run with CUDA events, synchronising before reading the elapsed time so that all GPU work has finished.

In [6]:
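# Warm-up run: absorbs one-time costs such as CUDA and library initialisation
L, V = torch.linalg.eig(Agpu)

start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)
start_event.record()

L, V = torch.linalg.eig(Agpu)

end_event.record()
# Wait for all queued GPU work so that both events have completed
torch.cuda.synchronize()

# elapsed_time returns milliseconds; convert to seconds
et_gpu = start_event.elapsed_time(end_event) / 1000.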
print(f"Eigenvalue decomposition on GPU: {et_gpu} s")

Eigenvalue decomposition on GPU: 10.2351982421875 s
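With the times recorded above, the GPU run is roughly 19.98 / 10.24 ≈ 1.95 times faster than the CPU run, although the CPU cell was timed without a warm-up. One possible way to make the comparison more even is to apply the same procedure (warm-up, several repeats, median) on both devices. The sketch below does this with a hypothetical helper `time_op`. The helper name, its parameters, and the small size `n = 300` are assumptions for illustration, and the code falls back to CPU-only timing when no GPU is present.

```python
import time
import torch

def time_op(fn, device, warmup=1, repeats=3):
    """Return the median run time of fn() in seconds (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    times = []
    for _ in range(repeats):
        if device.type == "cuda":
            # CUDA events are recorded on the GPU stream itself.
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            fn()
            end.record()
            torch.cuda.synchronize()  # make sure both events have completed
            times.append(start.elapsed_time(end) / 1000.0)  # ms -> s
        else:
            t0 = time.perf_counter()
            fn()
            times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]

n = 300  # small on purpose; the notebook uses 5000
devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda"))

results = {}
for dev in devices:
    A = torch.rand((n, n), device=dev)
    results[dev.type] = time_op(lambda: torch.linalg.eig(A), dev)
    print(f"{dev.type}: {results[dev.type]:.4f} s")

if "cuda" in results:
    print(f"speedup (CPU time / GPU time): {results['cpu'] / results['cuda']:.2f}x")
```

The median is used here rather than the mean so that a single slow run has less influence on the reported figure. For small matrices the GPU may well be slower than the CPU, so the ratio should be measured at the problem size of interest.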
