### Speed check with CPU and GPU
When training on batches with PyTorch, running the computation on a GPU usually reduces execution time. However, you first need to estimate how much GPU memory the tensors will occupy to be sure the job fits. For instance, with the code below, creating a 100k x 100k tensor on my GPU is impossible: with PyTorch's default float32 dtype it needs about 40 GB, far more than the 8 GB available on the card. I had to fall back to the CPU, and because my 32 GB of RAM was not enough either, the operation spilled into swap. It is therefore important to check what the GPU can hold before moving data onto it.
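
A quick back-of-the-envelope estimate shows why the 100k x 100k tensor cannot fit. The sketch below assumes PyTorch's default dtype, float32 (4 bytes per element); the helper name `tensor_bytes` is hypothetical and not part of PyTorch.

```python
def tensor_bytes(x_dim, y_dim, bytes_per_element=4):
    """Estimate the storage needed for a dense x_dim x y_dim tensor.

    bytes_per_element=4 assumes float32, PyTorch's default dtype.
    """
    return x_dim * y_dim * bytes_per_element


needed = tensor_bytes(100_000, 100_000)
print(f"{needed / 1e9:.1f} GB ({needed / 2**30:.2f} GiB)")  # prints "40.0 GB (37.25 GiB)"
```

That is more than both the 8 GB of GPU memory and the 32 GB of RAM, which is why the operation ends up in swap. On a CUDA machine, `torch.cuda.mem_get_info()` reports the free and total device memory in bytes for comparison.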

Another point is to determine the size at which the GPU overtakes the CPU. For instance, a 100 x 100 tensor is not created any faster on the GPU than on the CPU; at least on my machine, both took the same amount of time. The GPU pays off when the batch is large and still fits in GPU memory.

In [None]:
from functools import wraps
import time
import torch

def timeit(func):
    @wraps(func)
    def timeit_wrapper(*args, **kwargs):
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        end_time = time.perf_counter()
        total_time = end_time - start_time
        print(f'Function {func.__name__}{args} {kwargs} Took {total_time:.4f} seconds')
        return result
    return timeit_wrapper

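@timeit
def torch_ones(x_dim, y_dim, device="cpu"):
    device = torch.device(device)
    print('Using device:', device)
    total = torch.ones(x_dim, y_dim, device=device)
    # CUDA kernels launch asynchronously; wait for them so the timing is meaningful
    if device.type == "cuda":
        torch.cuda.synchronize()
    return total

# Fall back to the CPU when no CUDA device is available
torch_ones(100, 100, device="cuda" if torch.cuda.is_available() else "cpu")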
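
To look for the size at which the GPU starts to win, one option is to repeat the measurement over several tensor sizes on each device. The following is a minimal sketch rather than a rigorous benchmark: the helper `time_ones`, the chosen sizes, and the repeat count are all illustrative assumptions. It does a warm-up run first, because the first CUDA call also pays for context initialisation, and it synchronises before stopping the clock.

```python
import time

import torch


def time_ones(size, device, repeats=5):
    """Return the best wall-clock time in seconds to build a size x size tensor of ones."""
    device = torch.device(device)
    # Warm-up run so one-off start-up costs are not counted
    torch.ones(size, size, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        torch.ones(size, size, device=device)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
        best = min(best, time.perf_counter() - start)
    return best


# Only include the GPU when one is actually available
devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for size in (100, 1_000, 5_000):
    timings = {d: time_ones(size, d) for d in devices}
    print(size, {d: f"{t:.6f}s" for d, t in timings.items()})
```

The exact crossover depends on the hardware, so the printed timings are only meaningful for the machine they were measured on.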