In [1]:
import torch
from transformers import AutoModelForSequenceClassification, logging
import time

logging.set_verbosity_error() # this just clears our cell output of some clutter

We'll first use the `torch.cuda` module to learn more about the GPUs we have available.

- you can find out whether or not a GPU is available with `torch.cuda.is_available()`
- you can find out *how many* GPUs are available with `torch.cuda.device_count()`
- you can find out the name of a GPU, provided one exists, with `torch.cuda.get_device_name()`

In [6]:
is_cuda_available = torch.cuda.is_available()

count_of_device = torch.cuda.device_count()

if count_of_device:
    name_of_device = torch.cuda.get_device_name()
else:
    name_of_device = "No device is available."

print(f"Is CUDA available: {is_cuda_available}")
print(f"Count of device: {count_of_device}")
print(f"Name of the device: {name_of_device}")

Is CUDA available: False
Count of device: 0
Name of the device: No device is available.
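
On a machine with more than one GPU, `torch.cuda.get_device_name()` with no argument only reports the current default device. A minimal sketch of listing every visible device follows; the helper name `list_cuda_devices` is hypothetical and not part of PyTorch.

```python
import torch


def list_cuda_devices():
    """Return (index, name) pairs for every CUDA device PyTorch can see."""
    return [
        (i, torch.cuda.get_device_name(i))
        for i in range(torch.cuda.device_count())
    ]


devices = list_cuda_devices()
if devices:
    for index, name in devices:
        print(f"cuda:{index} -> {name}")
else:
    print("No CUDA devices found.")
```

With no GPU present, `torch.cuda.device_count()` returns 0, so the list is empty and the fallback message is printed.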


One very common pattern you'll see in machine learning code is encapsulated in the following line:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
It checks whether a CUDA-capable GPU is available and, if so, sets `device` to the GPU. Otherwise, `device` falls back to `"cpu"`. This pattern enables what's called "device-agnostic code," meaning the code itself doesn't know whether or not we've got GPU access, but will work in either case.

Datasets and models are moved to our `device` using the `.to()` method. For instance, to initialize a 3x3 PyTorch tensor with random values and then move it to `device`, we'd write

```python
tensor = torch.rand(3,3).to(device)
```
Moving a model to the GPU uses the same method. For a model `ML_model`, you'd simply write `ML_model.to(device)` (assuming we've defined `device` the way we did above).
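
A model and its inputs need to live on the same device before a forward pass. The sketch below illustrates this with a tiny `nn.Linear` layer standing in for a real model; `toy_model` and `batch` are hypothetical names chosen for the example.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny stand-in model (hypothetical; any nn.Module is moved the same way).
toy_model = nn.Linear(in_features=4, out_features=2).to(device)

# Inputs must be on the same device as the model's parameters.
batch = torch.rand(8, 4, device=device)

with torch.no_grad():  # inference only, no gradients needed
    output = toy_model(batch)

print(output.shape)   # torch.Size([8, 2])
print(output.device)  # matches `device`
```

One detail worth knowing: calling `.to(device)` on a tensor returns a new tensor, so the result has to be assigned, whereas calling it on a module moves the module's parameters in place and returns the same module.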

In [9]:
# device should be 'cuda' if GPU is available, else cpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Let's create a random 2x2 tensor and put it on the device.
tensor = torch.rand(2,2).to(device)

# Let's load a pretrained sequence-classification model and move it to the device.
model = AutoModelForSequenceClassification.from_pretrained("prajjwal1/bert-tiny", num_labels=2)
model.to(device)

print(f"Device name: {device}")
print(f"Following tensor added: {tensor}")
print(f"Model added to: {model.device}")

Device name: cpu
Following tensor added: tensor([[0.1280, 0.2532],
        [0.7394, 0.1501]])
Model added to: cpu


When a GPU is available, the model's device is reported as `"cuda:0"` rather than `"cpu"` (the run above had no GPU, so it printed `cpu`). The `0` is an index: it marks the *first* GPU in what might be an array of many GPUs. We'll use at most one GPU here, but in larger-scale training runs, you might find additional GPUs assigned to `"cuda:1"`, `"cuda:2"`, and so on.
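
The index is part of the `torch.device` object itself. The short sketch below inspects it; building a `torch.device` only describes a device and does not require that hardware to be present, so this runs on a CPU-only machine too.

```python
import torch

first_gpu = torch.device("cuda:0")
second_gpu = torch.device("cuda", 1)  # same as torch.device("cuda:1")

print(first_gpu.type, first_gpu.index)    # cuda 0
print(second_gpu.type, second_gpu.index)  # cuda 1
print(torch.device("cpu").index)          # None
```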

Finally, let's perform a small experiment to see how much the GPU speeds up our computation.

First, assign to the variable `cpu_tensor` a randomly-initialized PyTorch tensor with 10,000 rows and 10,000 columns. You can do so with `torch.rand(num_rows, num_columns)`. We'll then add all the values together using the `.sum()` method.

Then, inside the `if torch.cuda.is_available():` block, assign to the variable `gpu_tensor` another randomized 10,000 x 10,000 tensor, but pass `device=device` as a keyword argument so the tensor is created directly on the GPU.

Execute the code cell to see how much faster the GPU can sum every number in a 10,000 by 10,000 tensor than the CPU.


In [11]:
start_time = time.time()
## YOUR SOLUTION HERE ##
cpu_tensor = torch.rand(1000, 1000)
cpu_sum = cpu_tensor.sum()
cpu_time = time.time() - start_time
print("CPU time: ", cpu_time)

if torch.cuda.is_available():
    start_time = time.time()
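    ## YOUR SOLUTION HERE ##
    gpu_tensor = torch.rand(1000, 1000, device=device)
    gpu_sum = gpu_tensor.sum()
    # CUDA kernels run asynchronously; wait for them to finish before stopping the clock.
    torch.cuda.synchronize()
    gpu_time = time.time() - start_time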
    print("GPU time: ", gpu_time)
    print("Speedup: ", cpu_time / gpu_time)

CPU time:  0.08378267288208008
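
Timing GPU work with a wall clock needs some care, because CUDA kernels are launched asynchronously and the first CUDA call pays one-time setup costs. The following is a sketch of one way to handle both; the helper `time_sum` is a hypothetical name, and the tensor size is kept small so the example runs quickly.

```python
import time

import torch


def time_sum(rows, cols, device):
    """Time creating and summing a random rows x cols tensor on `device`."""
    device = torch.device(device)
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # finish any pending GPU work first
    start = time.perf_counter()
    total = torch.rand(rows, cols, device=device).sum()
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait for the kernels to complete
    elapsed = time.perf_counter() - start
    return total.item(), elapsed


cpu_total, cpu_seconds = time_sum(1000, 1000, "cpu")
print(f"CPU: {cpu_seconds:.4f} s")

if torch.cuda.is_available():
    time_sum(1000, 1000, "cuda")  # warm-up run, result discarded
    gpu_total, gpu_seconds = time_sum(1000, 1000, "cuda")
    print(f"GPU: {gpu_seconds:.4f} s, speedup: {cpu_seconds / gpu_seconds:.1f}x")
```

For very small tensors the GPU may not come out ahead, since launch overhead can dominate the actual computation.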
