# 🚀 GPU Setup Guide for Deep Learning

## Why GPU Acceleration Matters
- **Speed**: Training is typically 5-10x faster than CPU-only, though the actual speedup depends on the model, batch size, and hardware
- **Scale**: Handle larger models and datasets
- **Efficiency**: Better performance per watt

## Installation Steps by Operating System

### 🐧 Linux / WSL2
1. **Install NVIDIA Drivers** (native Linux only; on WSL2, install the NVIDIA driver on the Windows host and skip this step)
    ```bash
    sudo apt update && sudo apt install nvidia-driver-535
    ```

2. **Install CUDA Toolkit**
    ```bash
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
    sudo dpkg -i cuda-keyring_1.1-1_all.deb
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-4
    ```

3. **Set Environment Variables**
    ```bash
    echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
    echo 'export PATH=$CUDA_HOME/bin:$PATH' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
    source ~/.bashrc
    ```

4. **Install cuDNN**: Download from [NVIDIA cuDNN](https://developer.nvidia.com/cudnn)
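
The environment variables from step 3 can be sanity-checked from Python. The following is a minimal sketch, assuming the default `/usr/local/cuda` prefix used above; the helper name `check_cuda_env` is hypothetical and not part of any library.

```python
import os


def check_cuda_env(env, cuda_home="/usr/local/cuda"):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    if env.get("CUDA_HOME") != cuda_home:
        problems.append(f"CUDA_HOME is not set to {cuda_home}")
    # PATH and LD_LIBRARY_PATH are colon-separated lists on Linux.
    if f"{cuda_home}/bin" not in env.get("PATH", "").split(":"):
        problems.append(f"{cuda_home}/bin is missing from PATH")
    if f"{cuda_home}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append(f"{cuda_home}/lib64 is missing from LD_LIBRARY_PATH")
    return problems


# Report any problems in the current process environment.
for problem in check_cuda_env(os.environ):
    print(f"WARNING: {problem}")
```

An empty result means all three variables match the values written to `~/.bashrc`. A Jupyter kernel started before `~/.bashrc` was edited will not see the new values until it is restarted from a fresh shell.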

### 🪟 Windows
- **Best Option**: Use WSL2 with Ubuntu (follow Linux instructions above)
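- **Native Windows**: TensorFlow 2.11 and later do not support GPUs on native Windows, so installing CUDA from [NVIDIA Downloads](https://developer.nvidia.com/cuda-downloads) only helps with TensorFlow 2.10 or earlier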

### 🍎 macOS
- Apple Silicon (M1/M2/M3): TensorFlow can use the GPU through Metal once the `tensorflow-metal` plugin is installed
- Intel Mac: CPU-only mode is your only option
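
The per-OS guidance above can be condensed into a small lookup. This is a sketch only: the function name `expected_backend` and its return labels are hypothetical, and it reports what this guide recommends for a platform, not what is actually installed.

```python
import platform


def expected_backend(system, machine):
    """Map an OS name and CPU architecture to the backend this guide recommends."""
    system = system.lower()
    machine = machine.lower()
    if system == "linux":
        return "cuda"            # native Linux or WSL2 with an NVIDIA GPU
    if system == "windows":
        return "cuda-via-wsl2"   # the guide's preferred option on Windows
    if system == "darwin":
        # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
        return "metal" if machine.startswith("arm") else "cpu"
    return "cpu"


print(expected_backend(platform.system(), platform.machine()))
```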

## Verification
1. Run the **"Verify System and GPU is OK"** cell below
2. Verify CUDA installation: `nvcc --version`
3. Check GPU detection: `nvidia-smi`
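
Step 2 can also be scripted. The sketch below assumes `nvcc` prints a line of the form `Cuda compilation tools, release 12.9, V12.9.86`, which is the format shown in this notebook's troubleshooting output; `parse_nvcc_release` and `cuda_release` are hypothetical helper names.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract the CUDA release (e.g. '12.9') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def cuda_release():
    """Return the installed CUDA release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, timeout=10
    )
    return parse_nvcc_release(result.stdout)


print(f"CUDA release: {cuda_release()}")
```

A result of `None` means either that `nvcc` is not on `PATH` or that its output did not contain a recognizable release line.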

## Troubleshooting
If issues persist, run the **GPU Troubleshooting** cell to diagnose and fix common problems.


> **ℹ️ To verify your deep learning environment:**  
> Run the cells below to automatically check if your GPU, CUDA, cuDNN, and TensorFlow are installed and properly detected by your system.  
> These checks will help you confirm that your hardware and software are ready for accelerated training, and provide troubleshooting tips if any issues are found.

In [1]:
# Main imports for Car Classification Project
import os

# Configure TensorFlow environment before importing TF
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Suppress INFO and WARNING messages
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'  # Disable oneDNN custom ops (also silences the oneDNN startup notice)
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'  # Better GPU memory management

import platform
import numpy as np
import tensorflow as tf

# Suppress TensorFlow logging
tf.get_logger().setLevel('ERROR')

print("🚀 Car Classification Environment")
print("=" * 40)
print(f"TensorFlow version: {tf.__version__}")
print(f"Operating System: {platform.system()}")
print(f"Architecture: {platform.machine()}")

# Check for GPU availability (OS-dependent)
system = platform.system().lower()

if system == 'windows':
    print("🪟 Windows detected: Running in CPU-only mode")
    print("💡 For GPU support on Windows, consider using WSL2 or Docker")
    print("🚀 An NVIDIA GPU can still be used via WSL2 if needed")
    
elif system == 'linux':
    print("🐧 Linux detected: Checking for GPU support...")
    gpus = tf.config.list_physical_devices('GPU')
    print(f"GPU devices detected: {len(gpus)}")
    
    if gpus:
        print("🎉 GPU DETECTED!")
        for i, gpu in enumerate(gpus):
            print(f"  🎮 GPU {i}: {gpu}")
        
        # Configure GPU memory growth to prevent allocation errors
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            print("✅ GPU memory growth enabled")
        except RuntimeError as e:
            # Raised when the GPUs were already initialized before this call
            print(f"⚠️ Could not enable GPU memory growth: {e}")
        
        # Enable mixed precision for better performance
        try:
            tf.keras.mixed_precision.set_global_policy('mixed_float16')
            print("✅ Mixed precision enabled for optimal GPU performance")
        except Exception as e:
            print(f"⚠️ Mixed precision setup: {e}")
    else:
        print("⚠️ No GPU detected - using CPU")
        print("💡 For GPU setup, run the troubleshooting cell below")
        
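elif system == 'darwin':
    print("🍎 macOS detected: Optimized for Apple hardware")
    if 'arm' in platform.machine().lower():
        # The Metal GPU is only visible when the tensorflow-metal plugin is installed
        gpus = tf.config.list_physical_devices('GPU')
        if gpus:
            print("🚀 Apple Silicon detected: Using Metal GPU acceleration")
        else:
            print("⚠️ Apple Silicon detected, but no GPU is visible (is tensorflow-metal installed?)")
    else:
        print("💻 Intel Mac: CPU-only mode")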

# CPU optimizations for all platforms
print("⚡ Enabling CPU optimizations...")
tf.config.threading.set_inter_op_parallelism_threads(0)  # 0 lets TensorFlow pick a suitable thread count
tf.config.threading.set_intra_op_parallelism_threads(0)  # 0 lets TensorFlow pick a suitable thread count

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("✅ Environment ready for car classification training!")
print("📊 Performance optimized for your hardware configuration")

2025-07-24 22:54:53.065432: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753368893.138026    2969 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753368893.159265    2969 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1753368893.331272    2969 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1753368893.331294    2969 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1753368893.331295    2969 computation_placer.cc:177] computation placer alr

🚀 Car Classification Environment
TensorFlow version: 2.19.0
Operating System: Linux
Architecture: x86_64
🐧 Linux detected: Checking for GPU support...
GPU devices detected: 1
🎉 GPU DETECTED!
  🎮 GPU 0: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
✅ GPU memory growth enabled
✅ Mixed precision enabled for optimal GPU performance
⚡ Enabling CPU optimizations...
✅ Environment ready for car classification training!
📊 Performance optimized for your hardware configuration
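
The cell above enables `mixed_float16` whenever a GPU is present. According to the TensorFlow mixed-precision guide, float16 speedups are expected mainly on NVIDIA GPUs with compute capability 7.0 or higher, so a guarded variant might look like the sketch below. `pick_precision_policy` and `configure_precision` are hypothetical helpers, and TensorFlow is imported lazily so the policy logic can be exercised without a GPU.

```python
def pick_precision_policy(compute_capability):
    """Return a Keras policy name for a (major, minor) compute capability.

    None means the capability is unknown (for example, no CUDA GPU).
    """
    if compute_capability is not None and tuple(compute_capability) >= (7, 0):
        return "mixed_float16"
    return "float32"


def configure_precision():
    """Set the global Keras policy from the first visible GPU, if any."""
    import tensorflow as tf  # imported here so the helper above stays TF-free

    gpus = tf.config.list_physical_devices("GPU")
    capability = None
    if gpus:
        details = tf.config.experimental.get_device_details(gpus[0])
        capability = details.get("compute_capability")
    policy = pick_precision_policy(capability)
    tf.keras.mixed_precision.set_global_policy(policy)
    return policy
```

The device log at the end of this notebook reports compute capability 8.9 for the RTX 4060, for which `pick_precision_policy((8, 9))` selects `mixed_float16`.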


## Verify System and GPU is OK

In [2]:
# System verification and comprehensive GPU troubleshooting
import os
import platform
import sys

import tensorflow as tf  # re-imported so this cell does not depend on the first cell

print("=== System Information ===")
print(f"Platform: {platform.platform()}")
print(f"System: {platform.system()}")
print(f"Python Version: {sys.version}")
print(f"TensorFlow Version: {tf.__version__}")

# Check GPU availability
physical_devices = tf.config.list_physical_devices('GPU')
print(f"GPU Available: {len(physical_devices) > 0}")
print(f"Number of GPUs: {len(physical_devices)}")

if len(physical_devices) > 0:
    print("✅ GPU Details:")
    for i, device in enumerate(physical_devices):
        print(f"  GPU {i}: {device}")
        
    # Configure GPU memory growth to avoid allocation errors
    try:
        for gpu in physical_devices:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("✅ GPU memory growth configured")
    except RuntimeError as e:
        print(f"⚠️ GPU memory configuration failed: {e}")
else:
    print("⚠️ No GPU detected")
    print("\n🔧 GPU Troubleshooting Steps:")
    print("1. Check if NVIDIA GPU is available:")
    print("   Run: nvidia-smi")
    print("2. Install CUDA Toolkit:")
    print("   Visit: https://developer.nvidia.com/cuda-downloads")
    print("3. Install cuDNN:")
    print("   Visit: https://developer.nvidia.com/cudnn")
    print("4. Verify CUDA installation:")
    print("   Run: nvcc --version")
    print("5. Check TensorFlow GPU installation:")
    print("   Run: python -c 'import tensorflow as tf; print(tf.config.list_physical_devices())'")

print("\n=== CUDA Environment Check ===")
# Check for CUDA installation
cuda_paths = [
    "/usr/local/cuda/bin/nvcc",
    "/usr/bin/nvcc",
    "/opt/cuda/bin/nvcc"
]

cuda_found = False
for path in cuda_paths:
    if os.path.exists(path):
        print(f"✅ CUDA found at: {path}")
        cuda_found = True
        break

if not cuda_found:
    print("❌ CUDA not found in common locations")
    print("💡 Install CUDA from: https://developer.nvidia.com/cuda-downloads")

# Check for cuDNN headers and libraries in common locations
cudnn_paths = [
    "/usr/local/cuda/include/cudnn.h",
    "/usr/include/cudnn.h",
    "/usr/local/cuda/lib64/libcudnn.so",
    "/usr/lib/x86_64-linux-gnu/libcudnn.so",
    "/usr/local/cuda/lib64/libcudnn_ops_infer.so",
    "/usr/local/cuda/lib64/libcudnn_ops_train.so",
    "/usr/local/cuda/lib64/libcudnn_adv_infer.so",
    "/usr/local/cuda/lib64/libcudnn_adv_train.so",
    "/usr/local/cuda/lib64/libcudnn_cnn_infer.so",
    "/usr/local/cuda/lib64/libcudnn_cnn_train.so",
    "/usr/lib/x86_64-linux-gnu/include/cudnn.h",
    "/usr/lib64/include/cudnn.h"
]

cudnn_found = False
for path in cudnn_paths:
    if os.path.exists(path):
        print(f"✅ cuDNN found at: {path}")
        cudnn_found = True
        break

if not cudnn_found:
    print("❌ cuDNN not found")
    print("💡 Install cuDNN from: https://developer.nvidia.com/cudnn")

print("\n=== Performance Optimization ===")
if len(physical_devices) > 0:
    print("🎮 GPU acceleration enabled")
    # Enable mixed precision for better GPU performance
    try:
        tf.keras.mixed_precision.set_global_policy('mixed_float16')
        print("✅ Mixed precision enabled for optimal GPU performance")
    except Exception as e:
        print(f"⚠️ Mixed precision setup failed: {e}")
else:
    print("💻 CPU-only mode - optimizing for CPU performance")
    # CPU optimizations
    tf.config.threading.set_inter_op_parallelism_threads(0)
    tf.config.threading.set_intra_op_parallelism_threads(0)
    print("✅ CPU threading optimized for maximum performance")

print("\n=== Environment Status ===")
if len(physical_devices) > 0:
    print("🎉 GPU setup complete!")
    print("📈 Training will use GPU acceleration")
else:
    print("✅ CPU setup complete!")
    print("📊 Training will use optimized CPU performance")
    print("⏱️ Expected training time: several times longer than on GPU (still manageable)")

print("=" * 50)

=== System Information ===
Platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
System: Linux
Python Version: 3.12.3 (main, Jun 18 2025, 17:59:45) [GCC 13.3.0]
TensorFlow Version: 2.19.0
GPU Available: True
Number of GPUs: 1
✅ GPU Details:
  GPU 0: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
✅ GPU memory growth configured

=== CUDA Environment Check ===
✅ CUDA found at: /usr/local/cuda/bin/nvcc
✅ cuDNN found at: /usr/lib/x86_64-linux-gnu/libcudnn.so

=== Performance Optimization ===
🎮 GPU acceleration enabled
✅ Mixed precision enabled for optimal GPU performance

=== Environment Status ===
🎉 GPU setup complete!
📈 Training will use GPU acceleration


## GPU Troubleshooting: Run the Cell Below if You Encounter GPU Issues

In [3]:
# 🔧 GPU Troubleshooting and Setup Assistant
# Run this cell if you're experiencing GPU issues

import subprocess
import os

def run_command(command, description):
    """Run a shell command and return (success, output); description is a label for readers only."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=10)
        if result.returncode == 0:
            return True, result.stdout.strip()
        else:
            return False, result.stderr.strip()
    except subprocess.TimeoutExpired:
        return False, "Command timed out"
    except Exception as e:
        return False, str(e)

print("🔍 GPU Troubleshooting Assistant")
print("=" * 40)

# 1. Check NVIDIA GPU
print("\n1️⃣ Checking NVIDIA GPU...")
success, output = run_command("nvidia-smi", "NVIDIA GPU check")
if success:
    print("✅ NVIDIA GPU detected:")
    # Extract GPU info from nvidia-smi output
    lines = output.split('\n')
    for line in lines:
        if 'RTX' in line or 'GTX' in line or 'Tesla' in line or 'GeForce' in line:
            print(f"   🎮 {line.strip()}")
else:
    print("❌ NVIDIA GPU not detected or nvidia-smi not available")
    print("💡 Possible solutions:")
    print("   - Install NVIDIA drivers: sudo apt update && sudo apt install nvidia-driver-xxx")
    print("   - Check if GPU is properly connected")
    print("   - Restart your system after driver installation")

# 2. Check CUDA installation
print("\n2️⃣ Checking CUDA installation...")
success, output = run_command("nvcc --version", "CUDA compiler check")
if success:
    print("✅ CUDA installed:")
    for line in output.split('\n'):
        if 'release' in line.lower():
            print(f"   📦 {line.strip()}")
else:
    print("❌ CUDA not installed or not in PATH")
    print("💡 Install CUDA:")
    print("   Ubuntu 24.04: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network")

# 3. Check CUDA environment variables
print("\n3️⃣ Checking CUDA environment...")
cuda_home = os.environ.get('CUDA_HOME', os.environ.get('CUDA_PATH', 'Not set'))
ld_library_path = os.environ.get('LD_LIBRARY_PATH', 'Not set')

print(f"   CUDA_HOME: {cuda_home}")
if 'cuda' not in cuda_home.lower():
    print("   ⚠️ CUDA_HOME not properly set")
    print("   💡 Add to ~/.bashrc: export CUDA_HOME=/usr/local/cuda")

# 4. Check TensorFlow GPU support
print("\n4️⃣ Checking TensorFlow GPU support...")
try:
    import tensorflow as tf
    
    # Suppress warnings for this check
    tf.get_logger().setLevel('ERROR')
    
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        print(f"✅ TensorFlow can see {len(gpus)} GPU(s)")
        for i, gpu in enumerate(gpus):
            print(f"   🎮 GPU {i}: {gpu}")
        
        # Test GPU computation
        try:
            with tf.device('/GPU:0'):
                a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
                b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
                c = tf.matmul(a, b)
            print("✅ GPU computation test successful")
        except Exception as e:
            print(f"❌ GPU computation test failed: {e}")
    else:
        print("❌ TensorFlow cannot detect GPU")
        print("💡 Possible solutions:")
        print("   - Reinstall TensorFlow with GPU support: pip install tensorflow[and-cuda]")
        print("   - Check CUDA compatibility: https://www.tensorflow.org/install/source#gpu")
        
except Exception as e:
    print(f"❌ TensorFlow import failed: {e}")

# 5. Provide installation commands
print("\n5️⃣ Quick Fix Commands (if needed):")
print("   # Install NVIDIA drivers")
print("   sudo apt update && sudo apt install nvidia-driver-535")
print("   ")
print("   # Install CUDA (Ubuntu 24.04)")
print("   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb")
print("   sudo dpkg -i cuda-keyring_1.1-1_all.deb")
print("   sudo apt-get update")
print("   sudo apt-get -y install cuda-toolkit-12-4")
print("   ")
print("   # Add to ~/.bashrc")
print("   echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc")
print("   echo 'export PATH=$CUDA_HOME/bin:$PATH' >> ~/.bashrc")
print("   echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc")
print("   source ~/.bashrc")

print("\n" + "=" * 40)
print("🔄 After installing CUDA, restart your kernel: Kernel → Restart Kernel")

🔍 GPU Troubleshooting Assistant

1️⃣ Checking NVIDIA GPU...
✅ NVIDIA GPU detected:
   🎮 |   0  NVIDIA GeForce RTX 4060        On  |   00000000:01:00.0  On |                  N/A |

2️⃣ Checking CUDA installation...
✅ CUDA installed:
   📦 Cuda compilation tools, release 12.9, V12.9.86

3️⃣ Checking CUDA environment...
   CUDA_HOME: /usr/local/cuda

4️⃣ Checking TensorFlow GPU support...
✅ TensorFlow can see 1 GPU(s)
   🎮 GPU 0: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
✅ GPU computation test successful

5️⃣ Quick Fix Commands (if needed):
   # Install NVIDIA drivers
   sudo apt update && sudo apt install nvidia-driver-535
   
   # Install CUDA (Ubuntu 24.04)
   wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
   sudo dpkg -i cuda-keyring_1.1-1_all.deb
   sudo apt-get update
   sudo apt-get -y install cuda-toolkit-12-4
   
   # Add to ~/.bashrc
   echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
   echo 

I0000 00:00:1753368896.613146    2969 gpu_process_state.cc:208] Using CUDA malloc Async allocator for GPU: 0
I0000 00:00:1753368896.615040    2969 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5563 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 4060, pci bus id: 0000:01:00.0, compute capability: 8.9
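
The GPU check in the troubleshooting cell filters `nvidia-smi` output for a few product names (RTX, GTX, Tesla, GeForce), which can miss other models. `nvidia-smi` also offers a machine-readable query mode (`--query-gpu=... --format=csv,noheader`); the sketch below parses that output. `parse_gpu_csv` and `list_gpus` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory' CSV lines into a list of (name, memory) tuples."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so commas inside a product name are preserved.
        name, _, memory = line.rpartition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return (name, total memory) for each GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, timeout=10,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


for name, memory in list_gpus():
    print(f"{name}: {memory}")
```

Because the query returns one line per device regardless of product family, no name filter is needed.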
