# 🔍 GPU Troubleshooting and Diagnostics

This notebook is designed to diagnose why your NVIDIA GPU isn't accessible through PyTorch and provide step-by-step solutions.

## 1️⃣ System and Installation Diagnostics

First, let's check your current PyTorch installation and system configuration.

In [1]:
import sys
import subprocess
import platform
import os

print("Python version:", sys.version)
print("Platform:", platform.platform())

# Check if PyTorch is installed and which version
try:
    import torch

    print("\nPyTorch version:", torch.__version__)

    # Check CUDA availability
    print("CUDA available:", torch.cuda.is_available())
    if hasattr(torch.version, "cuda"):
        print("CUDA version (PyTorch):", torch.version.cuda)
    else:
        print("CUDA version: Not found in PyTorch build")

    # Display PyTorch build information
    print("\nPyTorch build details:")
    build_details = torch.__config__.show()

    # Check if this is a CPU-only build
    print(
        "\nIs this a CPU-only build?",
        "Yes"
        if not hasattr(torch.version, "cuda") or torch.version.cuda is None
        else "No",
    )

except ImportError:
    print("PyTorch is not installed")

Python version: 3.12.11 (main, Jul 11 2025, 22:43:48) [Clang 20.1.4 ]
Platform: Linux-6.12.48-x86_64-with-glibc2.40

PyTorch version: 2.6.0+cu124
CUDA available: True
CUDA version (PyTorch): 12.4

PyTorch build details:

Is this a CPU-only build? No


## 2️⃣ Check GPU Hardware

Let's check which GPUs are physically present in your system.

In [2]:
# Check for NVIDIA GPUs using Windows tools
def get_gpu_info():
    try:
        if platform.system() == "Windows":
            gpu_info = (
                subprocess.check_output(
                    "wmic path win32_VideoController get name", shell=True
                )
                .decode()
                .strip()
                .split("\n")
            )
            gpu_info = [
                line.strip()
                for line in gpu_info
                if line.strip() and line.strip() != "Name"
            ]
            return gpu_info
        else:
            return ["Non-Windows platform - can't use wmic"]
    except Exception as e:
        return [f"Error detecting GPUs: {e}"]


# Check NVIDIA driver
def get_nvidia_driver_version():
    try:
        if platform.system() == "Windows":
            driver_info = (
                subprocess.check_output(
                    "wmic path win32_VideoController where \"name like '%NVIDIA%'\" get DriverVersion",
                    shell=True,
                )
                .decode()
                .strip()
                .split("\n")
            )
            driver_version = [
                line.strip()
                for line in driver_info
                if line.strip() and line.strip() != "DriverVersion"
            ]
            return driver_version[0] if driver_version else "Not found"
        else:
            return "Non-Windows platform"
    except Exception as e:
        return f"Error detecting NVIDIA driver: {e}"


# Run the diagnostics
print("Detecting GPUs...")
gpus = get_gpu_info()
print("\nGPUs detected:")
for i, gpu in enumerate(gpus):
    print(f"  {i + 1}. {gpu}")

# Filter NVIDIA and AMD GPUs
nvidia_gpus = [
    gpu
    for gpu in gpus
    if "NVIDIA" in gpu or "GeForce" in gpu or "RTX" in gpu or "GTX" in gpu
]
amd_gpus = [gpu for gpu in gpus if "AMD" in gpu or "Radeon" in gpu]

if nvidia_gpus:
    print("\nNVIDIA GPUs found:")
    for i, gpu in enumerate(nvidia_gpus):
        print(f"  {i + 1}. {gpu}")

    # Get NVIDIA driver info
    driver_version = get_nvidia_driver_version()
    print(f"\nNVIDIA Driver Version: {driver_version}")
else:
    print("\nNo NVIDIA GPUs detected")

if amd_gpus:
    print("\nAMD GPUs found:")
    for i, gpu in enumerate(amd_gpus):
        print(f"  {i + 1}. {gpu}")

Detecting GPUs...

GPUs detected:
  1. Non-Windows platform - can't use wmic

No NVIDIA GPUs detected


## 3️⃣ Diagnose PyTorch GPU Issues

Let's determine why PyTorch isn't accessing your GPU.

In [3]:
def check_pytorch_gpu_support():
    try:
        import torch

        issues = []

        # Check 1: Is PyTorch built with CUDA?
        has_cuda_build = (
            hasattr(torch.version, "cuda") and torch.version.cuda is not None
        )
        if not has_cuda_build:
            issues.append("PyTorch installation doesn't include CUDA support")

        # Check 2: Is CUDA available at runtime?
        if not torch.cuda.is_available():
            if has_cuda_build:
                issues.append(
                    "PyTorch has CUDA support, but can't access CUDA at runtime"
                )

        # Check 3: Try to create a CUDA tensor
        if torch.cuda.is_available():
            try:
                x = torch.ones(1, device="cuda")
                print("Successfully created a CUDA tensor!")
            except Exception as e:
                issues.append(f"Failed to create CUDA tensor: {e}")

        return issues
    except ImportError:
        return ["PyTorch is not installed"]


# Check if CUDA is generally available on the system
def check_system_cuda():
    try:
        # Try to run nvidia-smi command
        result = subprocess.run(
            ["nvidia-smi"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            return True, result.stdout
        else:
            return False, result.stderr
    except Exception as e:
        return False, f"Error running nvidia-smi: {e}"


# Run diagnostics
issues = check_pytorch_gpu_support()
if issues:
    print("Issues detected with PyTorch GPU support:")
    for i, issue in enumerate(issues):
        print(f"  {i + 1}. {issue}")
else:
    print("No issues detected with PyTorch GPU support")

# Check system CUDA
print("\nChecking if CUDA is generally available on your system...")
cuda_available, cuda_output = check_system_cuda()
if cuda_available:
    print("CUDA is available on your system! nvidia-smi output:")
    print("\n".join(cuda_output.split("\n")[:10]))  # Show first 10 lines only
else:
    print(f"CUDA is not generally available on your system: {cuda_output}")

Successfully created a CUDA tensor!
No issues detected with PyTorch GPU support

Checking if CUDA is generally available on your system...
CUDA is available on your system! nvidia-smi output:
Thu Sep 25 12:30:59 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.153.02             Driver Version: 570.153.02     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4060 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   61C    P3              8W /   55W |     813MiB /   8188MiB |     16%      Default |


## 4️⃣ Verify PyTorch Installation Source

Let's check how PyTorch was installed to determine if it's the CUDA version.

In [4]:
def check_install_source():
    try:
        # Get the location of PyTorch
        import torch
        import inspect

        torch_location = inspect.getfile(torch)
        print(f"PyTorch is installed at: {torch_location}")

        # Try to get package info
        try:
            import pkg_resources

            torch_pkg = pkg_resources.get_distribution("torch")
            print(f"PyTorch package version: {torch_pkg.version}")
            print(f"Package location: {torch_pkg.location}")
        except Exception as e:
            print(f"Could not get package info: {e}")

        # Try to determine if this was installed with CUDA support
        if hasattr(torch.version, "cuda") and torch.version.cuda:
            print("This appears to be a CUDA-enabled build of PyTorch")
        else:
            print("This appears to be a CPU-only build of PyTorch")

    except ImportError:
        print("PyTorch is not installed")


print("Checking PyTorch installation details...")
check_install_source()

Checking PyTorch installation details...
PyTorch is installed at: /home/yrrrrrf/docs/lab/ai/.venv/lib/python3.12/site-packages/torch/__init__.py
PyTorch package version: 2.6.0+cu124
Package location: /home/yrrrrrf/docs/lab/ai/.venv/lib/python3.12/site-packages
This appears to be a CUDA-enabled build of PyTorch


  import pkg_resources


## 5️⃣ GPU Power Settings and Optimus Diagnosis

Let's check if this might be an issue with laptop hybrid graphics.

In [5]:
def check_hybrid_graphics():
    # Check if this is a laptop with dual graphics
    if platform.system() == "Windows" and len(nvidia_gpus) > 0 and len(amd_gpus) > 0:
        return True
    return False


has_hybrid = check_hybrid_graphics()

if has_hybrid:
    print("Your system has hybrid graphics (AMD + NVIDIA)")
    print(
        "This is common in laptops and can cause issues with GPU recognition unless configured correctly."
    )
    print("\nPossible issues:")
    print("1. Your laptop might be in power-saving mode, which disables the NVIDIA GPU")
    print(
        "2. The NVIDIA GPU might not be set as the preferred graphics processor for Python"
    )
    print("3. NVIDIA Optimus technology might be preventing direct access to the GPU")
else:
    print("Your system doesn't appear to have hybrid graphics")

Your system doesn't appear to have hybrid graphics


## 6️⃣ Path Forward: What To Do Next

Based on the diagnostics, here are the recommended steps to get your GPU working with PyTorch.

In [6]:
def recommend_solution():
    has_nvidia = len(nvidia_gpus) > 0
    has_cuda_build = hasattr(torch.version, "cuda") and torch.version.cuda is not None
    cuda_working = torch.cuda.is_available() if "torch" in sys.modules else False

    print(f"{'=' * 80}")
    print("DIAGNOSIS AND SOLUTION RECOMMENDATIONS".center(80))
    print(f"{'=' * 80}")

    if not has_nvidia:
        print("\nIssue: No NVIDIA GPU detected in the system")
        print(
            "Solution: If you believe you have an NVIDIA GPU, check if it's properly installed and recognized by Windows."
        )
        return

    if not has_cuda_build:
        print(
            "\nIssue: Your PyTorch installation is CPU-only (doesn't include CUDA support)"
        )
        print("\nSolution: Reinstall PyTorch with CUDA support.")
        print("\nUsing uv:")
        print(
            "uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
        )
        print("\nOr using pip:")
        print(
            "pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
        )
        print("\nAfter installation, restart your kernel and rerun this notebook.")
        return

    if has_cuda_build and not cuda_working:
        print("\nIssue: PyTorch has CUDA support but can't access the GPU at runtime")

        if has_hybrid:
            print(
                "\nLikely cause: Hybrid graphics configuration issue (NVIDIA Optimus)"
            )
            print("\nSolutions to try:")
            print(
                "1. Set your laptop to high-performance mode in Windows power settings"
            )
            print(
                "2. Open NVIDIA Control Panel > Manage 3D Settings > Program Settings"
            )
            print("   - Add python.exe and jupyter.exe to the list")
            print("   - Set them to use the 'High-performance NVIDIA processor'")
            print(
                "3. If available, check your laptop's BIOS settings for graphics options"
            )
            print(
                "   - Some laptops allow disabling hybrid graphics or setting a preference"
            )
            print("\nAlternatively, try a different approach:")
            print("1. Uninstall PyTorch: uv pip uninstall torch torchvision torchaudio")
            print(
                "2. Download the wheel files directly from https://download.pytorch.org/whl/cu121/torch/"
            )
            print("3. Install them using: uv pip install <downloaded-wheel-file>")
        else:
            print("\nLikely causes:")
            print(
                "1. Incompatible CUDA version: The CUDA version in PyTorch doesn't match your drivers"
            )
            print("2. Missing CUDA toolkit or incorrect environment variables")
            print("\nSolutions to try:")
            print("1. Update your NVIDIA drivers to the latest version")
            print("2. Reinstall PyTorch with a different CUDA version:")
            print(
                "   - For CUDA 11.8: uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118"
            )
            print(
                "   - For CUDA 12.1: uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
            )
        return

    print(
        "\nYour PyTorch installation appears to have GPU support, but there may be another issue."
    )
    print("Please check the error messages above for more specific information.")


# Run the recommendation engine
try:
    recommend_solution()
except Exception as e:
    print(f"Error generating recommendations: {e}")

                     DIAGNOSIS AND SOLUTION RECOMMENDATIONS                     

Issue: No NVIDIA GPU detected in the system
Solution: If you believe you have an NVIDIA GPU, check if it's properly installed and recognized by Windows.


## 7️⃣ Additional Information for Debugging

If you're still having issues, let's collect some more detailed information.

In [7]:
print("Collecting detailed system information for debugging...")

# Python environment
print(f"\nPython executable: {sys.executable}")
print(f"Python version: {sys.version}")
print(f"Python path: {sys.path}")

# Try to get pip list
try:
    print("\nInstalled packages:")
    pip_list = subprocess.check_output([sys.executable, "-m", "pip", "list"]).decode()
    print(pip_list)
except Exception as e:
    print(f"Error getting pip list: {e}")

# Environment variables related to CUDA
print("\nCUDA-related environment variables:")
for key, value in os.environ.items():
    if "CUDA" in key or "NVIDIA" in key or "GPU" in key or "PATH" in key:
        print(f"{key}: {value}")

Collecting detailed system information for debugging...

Python executable: /home/yrrrrrf/docs/lab/ai/.venv/bin/python
Python version: 3.12.11 (main, Jul 11 2025, 22:43:48) [Clang 20.1.4 ]
Python path: ['/home/yrrrrrf/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python312.zip', '/home/yrrrrrf/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12', '/home/yrrrrrf/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/lib-dynload', '', '/home/yrrrrrf/docs/lab/ai/.venv/lib/python3.12/site-packages', '/home/yrrrrrf/docs/lab/ai/.venv/lib/python3.12/site-packages/setuptools/_vendor']

Installed packages:
Error getting pip list: Command '['/home/yrrrrrf/docs/lab/ai/.venv/bin/python', '-m', 'pip', 'list']' returned non-zero exit status 1.

CUDA-related environment variables:
NIX_LD_LIBRARY_PATH: /run/current-system/sw/share/nix-ld/lib
PKG_CONFIG_PATH: /nix/store/az8d7l0d6qxj63mggb1w8ma37s1zyy7l-systemd-257.9-dev/lib/pkgconfig:/nix/store/284df5f5m

/home/yrrrrrf/docs/lab/ai/.venv/bin/python: No module named pip


## 8️⃣ Test PyTorch CPU Performance

Until we get your GPU working, let's see how your CPU performs with PyTorch.

In [8]:
import torch
import time


def test_cpu_performance():
    print("Testing PyTorch CPU performance...")

    # Test matrix multiplication with different sizes
    sizes = [1000, 2000, 4000]

    for size in sizes:
        print(f"\nMatrix multiplication with size {size}x{size}")

        # Create random matrices
        a = torch.randn(size, size)
        b = torch.randn(size, size)

        # Warm-up run
        _ = torch.matmul(a, b)

        # Timed run
        start_time = time.time()
        _ = torch.matmul(a, b)
        end_time = time.time()

        print(f"Time taken: {end_time - start_time:.4f} seconds")


try:
    test_cpu_performance()
except Exception as e:
    print(f"Error during CPU performance test: {e}")

Testing PyTorch CPU performance...

Matrix multiplication with size 1000x1000
Time taken: 0.0136 seconds

Matrix multiplication with size 2000x2000
Time taken: 0.0499 seconds

Matrix multiplication with size 4000x4000
Time taken: 0.2180 seconds


## 9️⃣ Summary and Next Steps

Here's a summary of what we found and what you should do next.

In [9]:
def print_summary():
    import torch

    print(f"{'=' * 80}")
    print("GPU TROUBLESHOOTING SUMMARY".center(80))
    print(f"{'=' * 80}")

    print("\nSystem:")
    print(f"  Operating System: {platform.system()} {platform.release()}")
    print(f"  Architecture: {platform.machine()}")
    print(f"  Python: {sys.version.split()[0]}")

    print("\nPyTorch:")
    print(f"  Version: {torch.__version__}")
    print(
        f"  CUDA Support: {'Yes' if hasattr(torch.version, 'cuda') and torch.version.cuda else 'No'}"
    )
    if hasattr(torch.version, "cuda") and torch.version.cuda:
        print(f"  CUDA Version: {torch.version.cuda}")
    print(
        f"  CUDA Available at Runtime: {'Yes' if torch.cuda.is_available() else 'No'}"
    )

    print("\nGPU:")
    if nvidia_gpus:
        print(f"  NVIDIA GPU: {nvidia_gpus[0]}")
    else:
        print("  NVIDIA GPU: None detected")

    if amd_gpus:
        print(f"  AMD GPU: {amd_gpus[0]}")

    # Assess the situation
    has_nvidia = len(nvidia_gpus) > 0
    has_cuda_build = hasattr(torch.version, "cuda") and torch.version.cuda is not None
    cuda_working = torch.cuda.is_available()

    print("\nDiagnosis:")
    if not has_nvidia:
        print("  No NVIDIA GPU detected")
    elif not has_cuda_build:
        print("  PyTorch doesn't have CUDA support")
    elif not cuda_working:
        print("  PyTorch has CUDA support but can't access the GPU")
    else:
        print("  Everything appears to be working correctly")

    print("\nNext Steps:")
    if has_nvidia and not has_cuda_build:
        print("  1. Reinstall PyTorch with CUDA support using:")
        print(
            "     uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
        )
    elif has_nvidia and has_cuda_build and not cuda_working:
        print(
            "  1. Check NVIDIA Control Panel settings (make sure Python uses the NVIDIA GPU)"
        )
        print("  2. Make sure your laptop is in high-performance mode")
        print(
            "  3. Try a different CUDA version of PyTorch (cu118 instead of cu121 or vice versa)"
        )
    elif not has_nvidia:
        print(
            "  1. If you believe you have an NVIDIA GPU, check if it's properly installed"
        )
    else:
        print("  Your GPU setup appears to be working correctly with PyTorch")

    print(f"\n{'=' * 80}")


try:
    print_summary()
except Exception as e:
    print(f"Error generating summary: {e}")

                          GPU TROUBLESHOOTING SUMMARY                           

System:
  Operating System: Linux 6.12.48
  Architecture: x86_64
  Python: 3.12.11

PyTorch:
  Version: 2.6.0+cu124
  CUDA Support: Yes
  CUDA Version: 12.4
  CUDA Available at Runtime: Yes

GPU:
  NVIDIA GPU: None detected

Diagnosis:
  No NVIDIA GPU detected

Next Steps:
  1. If you believe you have an NVIDIA GPU, check if it's properly installed

