**<font color = white size=6 >How to move data to CUDA.</font>**




### Checking GPU status

In [12]:
import torch

# Check if CUDA (GPU support) is available
cuda_available = torch.cuda.is_available()

# Get the number of available GPUs
num_gpus = torch.cuda.device_count()

if cuda_available:
    # Print GPU information
    for gpu_id in range(num_gpus):
        gpu_name = torch.cuda.get_device_name(gpu_id)
        print(f"GPU {gpu_id}: {gpu_name}")
        
else:
    print("No CUDA-enabled GPU found.")


# Print the current GPU being used (if available)
if cuda_available:
    current_gpu = torch.cuda.current_device()
    print("-----GPU can be used-----")
    print(f"Using GPU {current_gpu}: {torch.cuda.get_device_name(current_gpu)}")



GPU 0: NVIDIA GeForce RTX 3090
-----GPU can be used-----
Using GPU 0: NVIDIA GeForce RTX 3090


In [19]:
torch.__version__

'2.1.0+cu121'

In [14]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:30:42_Pacific_Standard_Time_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0


In [13]:
!nvidia-smi

Sun Feb  4 00:29:04 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.23                 Driver Version: 551.23         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3090      WDDM  |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P2            132W /  390W |    1303MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### Selecting the GPU for computation

A torch tensor can be moved with a direct call to `.to(device)`, but `device` must be defined first.

In [28]:
from torchvision import datasets, transforms
import torch


use_cuda = True  # set to False to force CPU execution
device = torch.device("cuda:0" if (torch.cuda.is_available() and use_cuda) else "cpu")
print(device)


transform = transforms.ToTensor()
dataset_MNIST_tensor = datasets.MNIST('../../data/cv', train=True, download=True, transform=transform)
mnistdata_loader = torch.utils.data.DataLoader(dataset_MNIST_tensor, batch_size=2)


for data, target in mnistdata_loader:
    print(target)
    data, target = data.to(device), target.to(device)
    print(target)
    break




cuda:0
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../../data/cv\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:15<00:00, 650385.00it/s] 


Extracting ../../data/cv\MNIST\raw\train-images-idx3-ubyte.gz to ../../data/cv\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ../../data/cv\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 444021.38it/s]


Extracting ../../data/cv\MNIST\raw\train-labels-idx1-ubyte.gz to ../../data/cv\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ../../data/cv\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:04<00:00, 388135.54it/s]


Extracting ../../data/cv\MNIST\raw\t10k-images-idx3-ubyte.gz to ../../data/cv\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ../../data/cv\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<?, ?it/s]

Extracting ../../data/cv\MNIST\raw\t10k-labels-idx1-ubyte.gz to ../../data/cv\MNIST\raw

tensor([5, 0])
tensor([5, 0], device='cuda:0')
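
One detail worth spelling out: for tensors, `.to(device)` returns a tensor on the target device rather than modifying the original in place, which is why the loop above reassigns `data` and `target`. The following is a minimal sketch of that behavior; the variable names `x` and `device_before` are illustrative only, and the CPU fallback is an assumption so the snippet also runs without a GPU.

```python
import torch

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.zeros(3)      # created on the CPU by default
x.to(device)            # return value discarded, so x still refers to the CPU tensor
device_before = x.device
print(device_before)

x = x.to(device)        # rebind the name to the returned tensor
print(x.device)
```

Forgetting the reassignment is a common source of "tensors on different devices" errors when a GPU is present.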





## CPU tensors vs. CUDA tensors

Tensors must reside on the same device before they can be combined in a computation.



In [24]:
import torch

# Create a CPU tensor
tensor_cpu = torch.tensor([1, 2, 3], dtype=torch.float32)

# Create a GPU tensor
tensor_gpu = torch.tensor([4, 5, 6], dtype=torch.float32).cuda()


# Move the CPU tensor to the GPU, then add
result_gpu = tensor_cpu.to('cuda') + tensor_gpu
print("gpu tensor add gpu tensor",result_gpu)

# Move the GPU tensor to the CPU, then add
result_cpu = tensor_cpu + tensor_gpu.to('cpu')
print("cpu tensor add cpu tensor",result_cpu)

# Add tensors that live on different devices directly (raises RuntimeError)
result_cpu_gpu = tensor_cpu + tensor_gpu
print("cpu tensor add gpu tensor",result_cpu_gpu)


gpu tensor add gpu tensor tensor([5., 7., 9.], device='cuda:0')
cpu tensor add cpu tensor tensor([5., 7., 9.])


RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
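
One way to avoid this error is to compare the `.device` attribute of both operands before computing. The sketch below is only an illustration: `add_on_same_device` is a hypothetical helper, not a PyTorch API, and the CPU fallback is an assumption so the code runs without a GPU.

```python
import torch

def add_on_same_device(a, b):
    """Add two tensors, first moving `b` to `a`'s device if they differ (hypothetical helper)."""
    if a.device != b.device:
        b = b.to(a.device)
    return a + b

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

tensor_cpu = torch.tensor([1, 2, 3], dtype=torch.float32)
tensor_dev = torch.tensor([4, 5, 6], dtype=torch.float32).to(device)

# The result lives on the device of the first operand (the CPU here).
result = add_on_same_device(tensor_cpu, tensor_dev)
print(result, result.device)
```

A helper like this hides a transfer inside every call, so it is better suited to debugging than to a training loop. In most code it is cleaner to place every tensor on one device up front.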

### Timing data transfer between CPU and CUDA

In [8]:
import time
def test_time(data_loader):
    start = time.time()
    count=0
    for data, target in data_loader:
        count+=1
    print("a forloop time for whole dataset within CPU: {}s".format(time.time()-start))

    start = time.time()
    count=0
    for data, target in data_loader:
        data, target = data.to(device), target.to(device)
        count+=1
    print("a forloop time for whole dataset with CPU to CUDA: {}s".format(time.time()-start))
    
print("When batch size = 10")
mnistdata_loader = torch.utils.data.DataLoader(dataset_MNIST_tensor, batch_size=10, shuffle=False)
test_time(mnistdata_loader)

print("\nWhen batch size = 2")
mnistdata_loader = torch.utils.data.DataLoader(dataset_MNIST_tensor, batch_size=2, shuffle=False)
test_time(mnistdata_loader)





When batch size = 10
a forloop time for whole dataset within CPU: 2.253023862838745s
a forloop time for whole dataset with CPU to CUDA: 4.19034218788147s

When batch size = 2
a forloop time for whole dataset within CPU: 2.874025583267212s
a forloop time for whole dataset with CPU to CUDA: 8.376002311706543s


**<font color = black size=4 >Note: </font>**<br>

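**<font color = black size=3 >1: Moving data back and forth always consumes bandwidth and slows the program down, so PyTorch code should avoid shuttling data between the CPU and the GPU wherever possible.</font>**<br>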

**<font color = black size=3 >2: Mixing CPU tensors and CUDA tensors in one operation easily causes errors, so check whether each tensor is on the CPU or on CUDA before computing.</font>**
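
To illustrate note 1, the sketch below compares many small per-batch copies with a single copy of the whole dataset followed by slicing on the device. It rests on several assumptions: the random tensors are a synthetic stand-in for MNIST, `sync` is a hypothetical helper, and the one-shot approach only applies when the dataset fits in device memory. The first loop also includes `DataLoader` overhead, so the measured gap is not purely transfer cost, and on a CPU-only machine both paths stay on the CPU.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in for MNIST (assumption): 1,000 images of shape 1x28x28.
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
batch_size = 2

def sync():
    # CUDA calls can return before the work finishes; wait so the timings are fair.
    if device.type == "cuda":
        torch.cuda.synchronize()

# Option A: copy every small batch separately (many small transfers).
loader = DataLoader(TensorDataset(images, labels), batch_size=batch_size)
sync()
start = time.time()
n_batches = 0
for data, target in loader:
    data, target = data.to(device), target.to(device)
    n_batches += 1
sync()
print(f"per-batch copies: {time.time() - start:.4f}s")

# Option B: copy the whole dataset once, then slice batches on the device.
sync()
start = time.time()
images_dev, labels_dev = images.to(device), labels.to(device)
n_slices = 0
for i in range(0, len(images_dev), batch_size):
    data, target = images_dev[i:i + batch_size], labels_dev[i:i + batch_size]
    n_slices += 1
sync()
print(f"single copy + slicing: {time.time() - start:.4f}s")
```

For datasets too large to hold on the GPU, larger batch sizes reduce the number of transfers in the same spirit, as the batch-size comparison above suggests.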
