# GPU Support Comparison: Google Colab vs Local Setup

## ✅ Google Colab Results (Working)
```
TensorFlow version: 2.19.0
Built with CUDA: True
Physical GPUs found: 1
  GPU 0: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')
Local devices:
  /device:CPU:0 CPU 268435456
  /device:GPU:0 GPU 14619377664
Running a small matmul on /GPU:0 to test execution...
GPU matmul completed in 0.2495 s
```

## ⚠️ Local Setup Status
- **Hardware**: NVIDIA GeForce RTX 5080 (15.92 GB VRAM) ✅
- **Driver**: 591.44 (CUDA 13.1 compatible) ✅
- **PyTorch GPU**: Working (5.46x speedup over CPU on a 5000x5000 matmul in the run recorded below) ✅
- **TensorFlow GPU**: ❌ Not working yet
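
The hardware and driver entries above can be re-checked from Python. The following is a minimal sketch, assuming `nvidia-smi` (which ships with the NVIDIA driver) is on the PATH; the helper names `parse_gpu_csv` and `query_gpus` are illustrative and not part of any library.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; memory.total is reported in MiB with "nounits".
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        gpus.append(dict(zip(QUERY_FIELDS, parts)))
    return gpus


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    cmd = [exe, "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader,nounits"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=10)
    except (subprocess.SubprocessError, OSError):
        return []
    return parse_gpu_csv(result.stdout)


gpus = query_gpus()
if gpus:
    for gpu in gpus:
        print(gpu)
else:
    print("nvidia-smi not found or reported no GPUs")
```

This only confirms what the driver sees. It says nothing about whether TensorFlow or PyTorch can use the GPU, which is what the test cells check.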

## 🔧 To Fix TensorFlow GPU Locally:

**Current Kernel**: `anaconda3` (Python 3.13.2). The TensorFlow 2.20.0 build installed here is CPU-only (the test cell below reports `Built with CUDA: False`), so it has **no GPU support**.

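**Solution**: Switch to the **"TensorFlow GPU (Python 3.11)"** kernel using the kernel picker in the top-right corner
- This kernel has TensorFlow 2.16.2 with the CUDA packages installed
- **BUT** it still needs a system-level CUDA Toolkit before TensorFlow can detect the GPU
- **Caveat**: the official TensorFlow install guide lists 2.10 as the last release with GPU support on native Windows. If the GPU is still not detected after the steps below, running TensorFlow under WSL2 is the documented alternative.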
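
After switching kernels, it can help to confirm which interpreter and package versions the notebook is actually using. The following is a small sketch using only the standard library; `describe_kernel` is an illustrative helper name, not an existing API.

```python
import sys
from importlib import metadata


def describe_kernel(packages=("tensorflow", "torch")):
    """Summarize the interpreter behind the active kernel and selected package versions."""
    info = {
        "executable": sys.executable,
        "python": ".".join(str(n) for n in sys.version_info[:3]),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # not installed in this environment
    return info


for key, value in describe_kernel().items():
    print(f"{key}: {value}")
```

If the printed Python version or TensorFlow version does not match the kernel you meant to select, the notebook is still attached to the old kernel.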

**Next Steps**:
1. Download [CUDA Toolkit 12.6](https://developer.nvidia.com/cuda-12-6-0-download-archive)
2. Install it (the installer adds the necessary DLL files to the system PATH)
3. Restart VS Code
4. Switch kernel to "TensorFlow GPU (Python 3.11)"
5. Run the TensorFlow test cell below
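
Between steps 3 and 5 it may be worth checking that the toolkit's DLLs are really visible on the PATH. The sketch below is one way to do that. It assumes the Windows installer sets the `CUDA_PATH` environment variable and that the runtime DLLs match the patterns `cudart64_*.dll` and `cudnn*.dll`; both patterns are assumptions to adjust for the versions installed, and cuDNN may need a separate install. `find_on_path` is an illustrative helper.

```python
import os
from pathlib import Path


def find_on_path(pattern, path_value=None):
    """Return files matching a glob pattern in any directory listed on PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    matches = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        directory = Path(entry)
        try:
            if directory.is_dir():
                matches.extend(sorted(directory.glob(pattern)))
        except OSError:
            continue  # unreadable PATH entries are skipped
    return matches


# CUDA_PATH is normally set by the Windows CUDA Toolkit installer; it may be absent otherwise.
print("CUDA_PATH:", os.environ.get("CUDA_PATH", "<not set>"))

# Assumed DLL name patterns for the CUDA runtime and cuDNN.
for pattern in ("cudart64_*.dll", "cudnn*.dll"):
    hits = find_on_path(pattern)
    print(pattern, "->", [str(h) for h in hits] or "not found on PATH")
```

An empty result after installation usually means VS Code was started before the PATH change took effect, which is why the restart in step 3 matters.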

---

```python
# Quick TensorFlow GPU support check.
# Paste this into a new notebook cell and run it.

import time
try:
    import tensorflow as tf
except Exception as e:
    print("Failed to import TensorFlow:", e)
else:
    print("TensorFlow version:", tf.__version__)
    # Whether this TensorFlow build was compiled with CUDA (True/False)
    built_with_cuda = False
    try:
        built_with_cuda = tf.test.is_built_with_cuda()
    except Exception:
        pass
    print("Built with CUDA:", built_with_cuda)

    # Physical GPUs detected by TensorFlow
    try:
        gpus = tf.config.list_physical_devices('GPU')
    except Exception:
        gpus = []
    print("Physical GPUs found:", len(gpus))
    for i, g in enumerate(gpus):
        print(f"  GPU {i}:", g)

    # Fallback: list local devices (gives CPU/GPU info from the TF runtime)
    try:
        from tensorflow.python.client import device_lib
        devices = device_lib.list_local_devices()
        print("Local devices:")
        for d in devices:
            print(" ", d.name, d.device_type, getattr(d, "memory_limit", ""))
    except Exception as e:
        print("Could not list local devices:", e)

    # Quick GPU execution test (matrix multiply) if a GPU is available
    if gpus:
        print("Running a small matmul on /GPU:0 to test execution...")
        a = tf.random.uniform((1024, 1024))
        b = tf.random.uniform((1024, 1024))
        # perf_counter is a monotonic clock intended for measuring intervals
        t0 = time.perf_counter()
        try:
            with tf.device('/GPU:0'):
                c = tf.matmul(a, b)
            # Force evaluation (works in eager mode) so the timing covers the actual compute
            _ = c.numpy()
            t1 = time.perf_counter()
            print(f"GPU matmul completed in {t1 - t0:.4f} s")
        except Exception as e:
            print("GPU test failed:", e)
    else:
        print("No GPU detected, skipping execution test.")
```

```
TensorFlow version: 2.20.0
Built with CUDA: False
Physical GPUs found: 0
Local devices:
  /device:CPU:0 CPU 268435456
No GPU detected, skipping execution test.
```
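
When the check reports `Built with CUDA: False`, the build metadata can show more about what the installed wheel was compiled against. The following is a hedged sketch using `tf.sysconfig.get_build_info()`. Which keys it reports varies by build, so missing keys are printed as `<not reported>`. `describe_tf_build` is an illustrative helper name.

```python
def describe_tf_build():
    """Return TensorFlow's build info as a plain dict, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = dict(tf.sysconfig.get_build_info())
    info["tf_version"] = tf.__version__
    return info


build = describe_tf_build()
if build is None:
    print("TensorFlow is not installed in this kernel.")
else:
    # cuda_version and cudnn_version are typically present only on CUDA builds.
    for key in ("tf_version", "is_cuda_build", "cuda_version", "cudnn_version"):
        print(f"{key}: {build.get(key, '<not reported>')}")
```

If `is_cuda_build` is `False`, installing the CUDA Toolkit alone will not help that kernel. A different TensorFlow build or environment is needed.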


## Test PyTorch GPU Support

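PyTorch is often easier to get running on a Windows GPU than TensorFlow: its CUDA wheels (such as the `+cu130` build used here) bundle the CUDA runtime libraries, so a sufficiently recent NVIDIA driver is usually all that is required.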

```python
import torch
import time

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("Number of GPUs:", torch.cuda.device_count())

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"\nGPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1024**3:.2f} GB")

    # Quick GPU speed test
    print("\n🚀 Running GPU speed test...")
    device = torch.device("cuda:0")
    a = torch.randn(5000, 5000, device=device)
    b = torch.randn(5000, 5000, device=device)

    # CUDA kernels launch asynchronously, so synchronize before reading the clock
    torch.cuda.synchronize()
    start = time.time()
    c = torch.matmul(a, b)
    torch.cuda.synchronize()
    gpu_time = time.time() - start

    print(f"✅ GPU matmul (5000x5000) completed in {gpu_time:.4f} seconds")

    # CPU comparison on copies of the same matrices
    a_cpu = a.cpu()
    b_cpu = b.cpu()
    start = time.time()
    c_cpu = torch.matmul(a_cpu, b_cpu)
    cpu_time = time.time() - start

    print(f"🐌 CPU matmul (5000x5000) completed in {cpu_time:.4f} seconds")
    print(f"⚡ Speedup: {cpu_time/gpu_time:.2f}x faster on GPU!")
else:
    print("\n❌ No CUDA GPU detected by PyTorch")
```

```
PyTorch version: 2.9.1+cu130
CUDA available: True
CUDA version: 13.0
Number of GPUs: 1

GPU 0: NVIDIA GeForce RTX 5080
  Memory: 15.92 GB

🚀 Running GPU speed test...
✅ GPU matmul (5000x5000) completed in 0.0636 seconds
🐌 CPU matmul (5000x5000) completed in 0.3469 seconds
⚡ Speedup: 5.46x faster on GPU!
```
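
The speed test above times a single matmul, so the GPU figure includes one-time initialization cost and can vary noticeably between runs. A steadier measurement could look like the sketch below, which adds warm-up iterations and reports the median of several repeats. `time_op` is an illustrative helper, and the warm-up and repeat counts are arbitrary choices.

```python
import statistics
import time


def time_op(fn, sync=None, warmup=2, repeats=5):
    """Return the median wall-clock time of fn() in seconds.

    sync, if given, is called before each clock read so that asynchronous
    GPU work has finished (for example torch.cuda.synchronize).
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time initialization cost
    samples = []
    for _ in range(repeats):
        if sync:
            sync()
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


try:
    import torch
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    a = torch.randn(5000, 5000, device="cuda:0")
    b = torch.randn(5000, 5000, device="cuda:0")
    gpu_time = time_op(lambda: torch.matmul(a, b), sync=torch.cuda.synchronize)
    a_cpu, b_cpu = a.cpu(), b.cpu()
    cpu_time = time_op(lambda: torch.matmul(a_cpu, b_cpu))
    print(f"GPU median: {gpu_time:.4f} s, CPU median: {cpu_time:.4f} s, "
          f"speedup: {cpu_time / gpu_time:.2f}x")
else:
    print("PyTorch with CUDA not available; skipping benchmark.")
```

The median is used rather than the mean so that a single slow iteration, for example from a background process, does not skew the result.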
