## GPU JHub Testing Notebook

This notebook is a first-pass test of the environment and GPU access. The available Jetstream2 (JS2) GPU instance sizes are listed under [flavors](https://docs.jetstream-cloud.org/general/instance-flavors/#jetstream2-gpu).

Note: this also tests the PyTorch install. As of November 2024, I hope not to use TensorFlow for work at UCAR / Unidata. The entire notebook should run without errors.

In [1]:
import platform

import psutil
import torch

In [2]:
def get_simple_system_info():
    """Print the Python version plus CPU and RAM totals and usage."""
    # Memory info
    memory = psutil.virtual_memory()
    ram_gb = memory.total / (1024 ** 3)  # Bytes to GB (binary, 1024**3)
    ram_used_gb = memory.used / (1024 ** 3)

    # CPU info
    cpu_cores = psutil.cpu_count()  # Logical cores
    cpu_usage = psutil.cpu_percent(interval=1)  # Sampled over 1 second

    print(f"Python Version: {platform.python_version()}")
    print("\nCPU:")
    print(f"- Cores: {cpu_cores}")
    print(f"- Current Usage: {cpu_usage}%")
    print("\nRAM:")
    print(f"- Total: {ram_gb:.1f} GB")
    print(f"- Used: {ram_used_gb:.1f} GB")
    print(f"- Usage: {memory.percent}%")

In [3]:
get_simple_system_info()

Python Version: 3.10.15

CPU:
- Cores: 8
- Current Usage: 1.0%

RAM:
- Total: 29.4 GB
- Used: 1.4 GB
- Usage: 6.1%


In [4]:
!nvidia-smi

Thu Nov 21 16:36:04 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  GRID A100X-10C                 On  | 00000000:04:00.0 Off |                    0 |
| N/A   N/A    P0              N/A /  N/A |      0MiB / 10240MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
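
The table above is laid out for reading by eye. For a scripted check, the same fields can be requested as CSV through `nvidia-smi --query-gpu`. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports memory in the usual `10240 MiB` form; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical and not part of the original notebook.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total, driver_version` CSV rows into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, memory, driver = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "memory_mib": int(memory.split()[0]),  # "10240 MiB" -> 10240
            "driver": driver,
        })
    return gpus


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    completed = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(completed.stdout)
```

Calling `query_gpus()` in a cell returns a list that can be asserted on, for example to check that at least one GPU is visible before the PyTorch cells run.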
                                                                    

In [5]:
def get_pytorch_info():
    print("PyTorch System Information")
    print("-" * 30)
    
    # PyTorch version
    print(f"PyTorch Version: {torch.__version__}")
    
    # CUDA availability
    print(f"\nCUDA Available: {torch.cuda.is_available()}")
    
    if torch.cuda.is_available():
        # Current device information
        current_device = torch.cuda.current_device()
        print(f"Current CUDA Device: {current_device}")
        
        # Device name
        print(f"Device Name: {torch.cuda.get_device_name(current_device)}")
        
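        # CUDA version PyTorch was built against; it can differ from the
        # driver-supported CUDA version that nvidia-smi reports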
        print(f"CUDA Version: {torch.version.cuda}")
        
        # Number of CUDA devices
        print(f"Device Count: {torch.cuda.device_count()}")
        
        # Memory information
        print("\nGPU Memory Information:")
        print(f"- Total: {torch.cuda.get_device_properties(current_device).total_memory / 1024**3:.2f} GB")
        print(f"- Allocated: {torch.cuda.memory_allocated(current_device) / 1024**3:.2f} GB")
        print(f"- Cached: {torch.cuda.memory_reserved(current_device) / 1024**3:.2f} GB")
        
        # Architecture information
        device_props = torch.cuda.get_device_properties(current_device)
        print("\nGPU Architecture:")
        print(f"- GPU Compute Capability: {device_props.major}.{device_props.minor}")
        print(f"- Multi Processors: {device_props.multi_processor_count}")
    else:
        print("\nNo CUDA GPU available. PyTorch will run on CPU only.")
        print(f"CPU Architecture: {platform.machine()}")
        print(f"CPU Type: {platform.processor()}")

In [6]:
get_pytorch_info()

PyTorch System Information
------------------------------
PyTorch Version: 2.5.1+cu124

CUDA Available: True
Current CUDA Device: 0
Device Name: GRID A100X-10C
CUDA Version: 12.4
Device Count: 1

GPU Memory Information:
- Total: 10.00 GB
- Allocated: 0.00 GB
- Cached: 0.00 GB

GPU Architecture:
- GPU Compute Capability: 8.0
- Multi Processors: 108
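
The report above confirms that PyTorch can see the GPU, but it does not run any computation on it. A small smoke test can close that gap. This is a sketch under stated assumptions, not part of the original notebook: the function name `gpu_smoke_test`, the matrix size, and the tolerances are all illustrative choices. It multiplies two random matrices on the selected device and compares the result with the CPU result, falling back to the CPU when CUDA is unavailable.

```python
import torch


def gpu_smoke_test(size=256):
    """Run a matmul on the GPU (if present) and compare with the CPU result."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    torch.manual_seed(0)  # Reproducible inputs
    a = torch.randn(size, size)
    b = torch.randn(size, size)

    expected = a @ b  # Reference result computed on the CPU
    result = (a.to(device) @ b.to(device)).cpu()  # Same product on the device

    # Loose tolerances (an assumption) to allow for small float32 differences
    return torch.allclose(result, expected, rtol=1e-3, atol=1e-3)


print(f"Smoke test passed: {gpu_smoke_test()}")
```

A `False` result, or an exception raised during the transfer or the multiply, would point to a problem with the CUDA runtime rather than with device detection.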


In [7]:
def get_instance_type():
    """Guess the JS2 GPU flavor from the CPU count, system RAM, and GPU RAM."""
    cpu_count = psutil.cpu_count()
    ram_gb = psutil.virtual_memory().total / (1024**3)
    gpu_ram = 0
    
    if torch.cuda.is_available():
        current_device = torch.cuda.current_device()
        gpu_ram = torch.cuda.get_device_properties(current_device).total_memory / (1024**3)
    
    if cpu_count == 4 and 13 <= ram_gb <= 17 and 7 <= gpu_ram <= 9:
        return "g3.small"
    elif cpu_count == 8 and 28 <= ram_gb <= 32 and 9 <= gpu_ram <= 11:
        return "g3.medium"
    elif cpu_count == 16 and 58 <= ram_gb <= 62 and 19 <= gpu_ram <= 21:
        return "g3.large"
    elif cpu_count == 32 and 123 <= ram_gb <= 127 and 39 <= gpu_ram <= 41:
        return "g3.xl"
    else:
        return "custom"

In [8]:
get_instance_type()

'g3.medium'
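
The `if`/`elif` chain in `get_instance_type` can also be written as a lookup table, which makes it easier to add or adjust a flavor. The sketch below reuses the same CPU counts and RAM ranges as the function above; the names `FLAVORS` and `classify_instance` are hypothetical, and the measurements are passed in as arguments so the logic can be checked without a GPU.

```python
# (flavor, vCPUs, (min, max) system RAM in GB, (min, max) GPU RAM in GB)
FLAVORS = [
    ("g3.small", 4, (13, 17), (7, 9)),
    ("g3.medium", 8, (28, 32), (9, 11)),
    ("g3.large", 16, (58, 62), (19, 21)),
    ("g3.xl", 32, (123, 127), (39, 41)),
]


def classify_instance(cpu_count, ram_gb, gpu_ram_gb):
    """Return the first flavor whose CPU count and RAM ranges all match."""
    for name, cpus, (ram_lo, ram_hi), (gpu_lo, gpu_hi) in FLAVORS:
        if (cpu_count == cpus
                and ram_lo <= ram_gb <= ram_hi
                and gpu_lo <= gpu_ram_gb <= gpu_hi):
            return name
    return "custom"  # Nothing matched, same fallback as get_instance_type


# Values reported earlier in this notebook: 8 cores, 29.4 GB RAM, 10.00 GB GPU
print(classify_instance(8, 29.4, 10.0))
```

With the values reported earlier in this notebook, the lookup gives `g3.medium`, matching the result of `get_instance_type()`.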