## 1. Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## 2. Navigate to Project

⚠️ Edit the path if needed

In [None]:
PROJECT_PATH = "/content/drive/MyDrive/AutoencoderGpu"
%cd {PROJECT_PATH}
!ls -la

## 3. Check GPU

In [None]:
!nvidia-smi
!nvcc --version

## 4. Download CIFAR-10

In [None]:
!if [ ! -d "cifar-10-batches-bin" ]; then \
    wget -q https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz; \
    tar -xzf cifar-10-binary.tar.gz; \
    rm cifar-10-binary.tar.gz; \
    echo "Downloaded!"; \
else echo "Already exists!"; fi
!ls cifar-10-batches-bin/
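
An interrupted download can leave truncated files that `ls` will not reveal. The following is a minimal sketch of a size check, assuming the standard CIFAR-10 binary layout (five training batches plus one test batch, each holding 10,000 records of 1 label byte and 3,072 pixel bytes). The helper name `find_bad_batches` is hypothetical and not part of the project.

```python
from pathlib import Path

# Each CIFAR-10 binary record is 1 label byte + 32*32*3 pixel bytes.
RECORD_BYTES = 1 + 32 * 32 * 3
RECORDS_PER_FILE = 10_000
BATCH_FILES = [f"data_batch_{i}.bin" for i in range(1, 6)] + ["test_batch.bin"]


def find_bad_batches(root, records_per_file=RECORDS_PER_FILE):
    """Return the names of batch files that are missing or have an unexpected size."""
    expected = RECORD_BYTES * records_per_file
    bad = []
    for name in BATCH_FILES:
        path = Path(root) / name
        if not path.is_file() or path.stat().st_size != expected:
            bad.append(name)
    return bad


# Only run the check when the dataset directory is present.
if Path("cifar-10-batches-bin").is_dir():
    print("Problem files:", find_bad_batches("cifar-10-batches-bin") or "none")
```

If any file is reported, deleting `cifar-10-batches-bin` and re-running the download cell is probably the simplest fix.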

## 5. Compile

The `-arch` flag must match the GPU that Colab assigns: the T4 uses `sm_75`, the V100 uses `sm_70`, and the A100 uses `sm_80`.

In [None]:
# Detect GPU architecture
import subprocess
result = subprocess.run(['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'], capture_output=True, text=True)
gpu_name = result.stdout.strip()
print(f"GPU: {gpu_name}")

# Set arch based on GPU
if 'T4' in gpu_name:
    arch = 'sm_75'
elif 'V100' in gpu_name:
    arch = 'sm_70'
elif 'A100' in gpu_name:
    arch = 'sm_80'
else:
    arch = 'sm_75'  # default when the GPU name is not recognized

print(f"Using architecture: {arch}")
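
Matching on the GPU name only covers the three models listed above. As an alternative sketch, the architecture can be derived from the compute capability itself. This assumes the installed driver's `nvidia-smi` supports the `compute_cap` query field, which recent drivers do but older ones may not; the function names `arch_from_compute_cap` and `detect_arch` are hypothetical.

```python
import subprocess


def arch_from_compute_cap(cap):
    """Turn a compute capability string such as '7.5' into 'sm_75'."""
    major, minor = cap.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


def detect_arch(default="sm_75"):
    """Query nvidia-smi for the compute capability; return `default` on any failure."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        # With several GPUs there is one line per device; use the first.
        return arch_from_compute_cap(out.splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return default


print(arch_from_compute_cap("7.5"))
```

To use it in place of the name-based lookup, set `arch = detect_arch()` before the compile cell.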

In [None]:
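# Compile with the detected architecture.
# The success message is printed only if nvcc exits without error.
!nvcc -std=c++17 -O3 -arch={arch} -o autoencoder_gpu src/main.cu && echo "✅ Compilation successful!" || echo "❌ Compilation failed"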

## 6. Run Training

**Usage:** `./autoencoder_gpu <data_path> [epochs] [batch_size] [max_samples] [optimizer]`

In [None]:
# Quick test: 100 samples, 3 epochs
!./autoencoder_gpu ./cifar-10-batches-bin 3 32 100 adam

In [None]:
# Larger run: 500 samples, 3 epochs
!./autoencoder_gpu ./cifar-10-batches-bin 3 32 500 adam

In [None]:
# Full dataset (takes longer)
# !./autoencoder_gpu ./cifar-10-batches-bin 5 32 0 adam
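
Because the optional arguments in the usage line are positional, a later one can only be given if all earlier ones are given too. A small sketch of a helper that builds the command under that rule is shown below; `build_command` is a hypothetical name, and the helper deliberately does not guess the binary's own defaults.

```python
import shlex


def build_command(data_path, epochs=None, batch_size=None, max_samples=None,
                  optimizer=None, binary="./autoencoder_gpu"):
    """Build the argument list; stop at the first omitted positional argument."""
    argv = [binary, str(data_path)]
    for value in (epochs, batch_size, max_samples, optimizer):
        if value is None:
            break
        argv.append(str(value))
    return argv


cmd = build_command("./cifar-10-batches-bin", 3, 32, 100, "adam")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises an exception if the program exits with a non-zero status.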

## 7. Compare with CPU Baseline

To measure the speedup, run both versions and compare their Conv2D times.

In [None]:
# Run CPU baseline (if available in Drive)
# CPU_PATH = "/content/drive/MyDrive/AutoencoderCpu"
# !cd {CPU_PATH} && make clean && make && ./autoencoder_cpu ./cifar-10-batches-bin 3 32 100 adam
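
Once both runs have reported their Conv2D times, the speedup is the CPU time divided by the GPU time. A minimal sketch follows; the two numbers are placeholders to be replaced with the values printed by each program, not measurements from this project.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run is than the CPU run."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("timings must be positive")
    return cpu_seconds / gpu_seconds


# Placeholder timings (seconds); substitute the Conv2D times from your own runs.
cpu_conv_s = 12.0
gpu_conv_s = 0.5
print(f"Conv2D speedup: {speedup(cpu_conv_s, gpu_conv_s):.1f}x")
```

For a fair comparison, both runs should use the same epochs, batch size, sample count, and optimizer.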