# Basic Mamba Installation and Functionality Test (WSL2)

## Environment Setup

This notebook is designed to run in **WSL2 with the `mamba-env` conda environment**.

### Prerequisites

Complete the setup guide at `docs/WSL2_SETUP_GUIDE.md` before running this notebook.

## Verify Installation

Check that `mamba_ssm` and `causal_conv1d` import successfully.

In [1]:
# Test imports
try:
    import mamba_ssm
    print("✓ mamba_ssm imported successfully")
    print(f"  Version: {getattr(mamba_ssm, '__version__', 'Unknown')}")
except ImportError as e:
    print(f"✗ Failed to import mamba_ssm: {e}")

try:
    import causal_conv1d
    print("✓ causal_conv1d imported successfully")
except ImportError as e:
    print(f"✗ Failed to import causal_conv1d: {e}")

✓ mamba_ssm imported successfully
  Version: 2.2.6.post3
✓ causal_conv1d imported successfully
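
The import check above relies on a `__version__` attribute, which not every package defines. As a hedged alternative, the sketch below reads the installed distribution's version through `importlib.metadata`. The helper name `package_status` is hypothetical, and the distribution names `mamba-ssm` and `causal-conv1d` are assumed to match the names the packages were installed under.

```python
import importlib.util
from importlib import metadata


def package_status(module_name, dist_name=None):
    """Return (importable, version) for a top-level module.

    Hypothetical helper: version is None when no matching distribution
    metadata is found.
    """
    if importlib.util.find_spec(module_name) is None:
        return False, None
    try:
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        version = None
    return True, version


# Distribution names are assumptions; adjust them to match `pip list`.
packages = [("mamba_ssm", "mamba-ssm"), ("causal_conv1d", "causal-conv1d")]
for module_name, dist_name in packages:
    importable, version = package_status(module_name, dist_name)
    print(f"{module_name}: importable={importable}, version={version}")
```

Note that `find_spec` only locates the module without executing it, so the `try`/`import` cell above remains the stronger test for problems in compiled extensions.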


## Check PyTorch and CUDA Availability

Verify that PyTorch is installed and CUDA is available.

In [2]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Current device: {torch.cuda.current_device()}")
    print(f"Device name: {torch.cuda.get_device_name(0)}")
    print(f"Device count: {torch.cuda.device_count()}")
else:
    print("⚠️  CUDA not available - mamba_ssm's fused kernels require an NVIDIA GPU")
    print("   The forward-pass and gradient tests below will likely fail on CPU")

PyTorch version: 2.4.1+cu121
CUDA available: True
CUDA version: 12.1
Current device: 0
Device name: NVIDIA GeForce RTX 4060
Device count: 1


## Test Basic Mamba Forward Pass

Create a simple Mamba layer and test a forward pass with random data.

**Test Configuration**:
- Input: (batch=2, sequence_length=64, dimension=128)
- Model: Mamba with d_model=128, d_state=16, d_conv=4, expand=2

In [3]:
from mamba_ssm import Mamba

# Test configuration
batch_size = 2
sequence_length = 64
dim = 128

# Create random input tensor
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(batch_size, sequence_length, dim, device=device)

print(f"Input shape: {x.shape}")
print(f"Device: {device}")
# Only report GPU memory when a GPU exists (avoids printing an empty line on CPU)
if torch.cuda.is_available():
    print(f"Memory allocated: {torch.cuda.memory_allocated() / 1024**2:.2f} MB")

Input shape: torch.Size([2, 64, 128])
Device: cuda
Memory allocated: 0.06 MB
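
The reported 0.06 MB can be sanity-checked by hand, since a float32 tensor takes 4 bytes per element. The sketch below reproduces the figure for the input shape used here, assuming the input tensor is the only live CUDA allocation at that point. The helper name `tensor_megabytes` is hypothetical.

```python
import math


def tensor_megabytes(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor; 4 bytes per element matches float32."""
    return math.prod(shape) * bytes_per_element / 1024**2


# 2 * 64 * 128 elements * 4 bytes = 65,536 bytes
size_mb = tensor_megabytes((2, 64, 128))
print(f"{size_mb:.4f} MB")
```

PyTorch's caching allocator may round allocations up to its block size, so `memory_allocated()` can exceed the raw tensor size for shapes that are not already aligned.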


In [4]:
# Create Mamba model
model = Mamba(
    d_model=dim,      # Model dimension
    d_state=16,       # SSM state dimension
    d_conv=4,         # Local convolution width
    expand=2          # Expansion factor
).to(device)

print("✓ Mamba model created successfully")
print(f"  Model parameters: {sum(p.numel() for p in model.parameters()):,}")
print(f"  Model device: {next(model.parameters()).device}")

✓ Mamba model created successfully
  Model parameters: 116,480
  Model device: cuda:0
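
The 116,480 figure can be reproduced from the layer's hyperparameters. The breakdown below is a sketch that assumes the default `Mamba` layer layout: an inner width of `expand * d_model`, a `dt_rank` of `ceil(d_model / 16)`, a bias on the convolution and on `dt_proj`, and no bias on the other linear projections. These defaults are assumptions about the installed version and are worth checking against `model.named_parameters()`.

```python
import math


def mamba_param_counts(d_model, d_state=16, d_conv=4, expand=2):
    """Per-tensor parameter counts under the assumed default layout."""
    d_inner = expand * d_model          # width of the expanded inner stream
    dt_rank = math.ceil(d_model / 16)   # assumed "auto" rank of the dt projection
    return {
        "in_proj.weight": d_model * 2 * d_inner,              # x and gate branches
        "conv1d.weight": d_inner * d_conv,                    # depthwise convolution
        "conv1d.bias": d_inner,
        "x_proj.weight": d_inner * (dt_rank + 2 * d_state),   # dt, B and C
        "dt_proj.weight": dt_rank * d_inner,
        "dt_proj.bias": d_inner,
        "A_log": d_inner * d_state,
        "D": d_inner,
        "out_proj.weight": d_inner * d_model,
    }


counts = mamba_param_counts(d_model=128)
for name, count in counts.items():
    print(f"{name:<16} {count:>7,}")
print(f"{'total':<16} {sum(counts.values()):>7,}")
```

Under these assumptions the nine tensors sum to 116,480, matching both the parameter total above and the 9/9 count reported in the gradient test.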


In [5]:
# Test forward pass
try:
    with torch.no_grad():
        y = model(x)
    
    print("✓ Forward pass successful!")
    print(f"  Input shape:  {x.shape}")
    print(f"  Output shape: {y.shape}")
    print(f"  Output dtype: {y.dtype}")
    print(f"  Output device: {y.device}")
    
    # Check for NaN or Inf
    if torch.isnan(y).any():
        print("⚠️  Warning: Output contains NaN values")
    elif torch.isinf(y).any():
        print("⚠️  Warning: Output contains Inf values")
    else:
        print("✓ Output is numerically stable (no NaN/Inf)")
    
    # Show output statistics
    print(f"\nOutput statistics:")
    print(f"  Mean: {y.mean().item():.4f}")
    print(f"  Std:  {y.std().item():.4f}")
    print(f"  Min:  {y.min().item():.4f}")
    print(f"  Max:  {y.max().item():.4f}")
    
except Exception as e:
    print(f"✗ Forward pass failed: {e}")
    import traceback
    traceback.print_exc()

✓ Forward pass successful!
  Input shape:  torch.Size([2, 64, 128])
  Output shape: torch.Size([2, 64, 128])
  Output dtype: torch.float32
  Output device: cuda:0
✓ Output is numerically stable (no NaN/Inf)

Output statistics:
  Mean: -0.0002
  Std:  0.0421
  Min:  -0.1791
  Max:  0.1942


## Test Gradient Flow (Training Readiness)

Verify that a backward pass through the layer populates gradients for every parameter, which training depends on.

In [6]:
# Test gradient computation
try:
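    # Forward pass with gradients enabled. Create the input directly on the
    # device so it stays a leaf tensor and x_grad.grad is populated by backward()
    x_grad = torch.randn(batch_size, sequence_length, dim, device=device, requires_grad=True)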
    y_grad = model(x_grad)
    
    # Compute a dummy loss and backpropagate
    loss = y_grad.mean()
    loss.backward()
    
    print("✓ Gradient computation successful!")
    print(f"  Loss value: {loss.item():.4f}")
    
    # Check model parameter gradients
    grad_params = [p for p in model.parameters() if p.grad is not None]
    print(f"  Parameters with gradients: {len(grad_params)}/{len(list(model.parameters()))}")
    
    if len(grad_params) > 0:
        avg_grad = torch.stack([p.grad.abs().mean() for p in grad_params]).mean()
        print(f"  Average gradient magnitude: {avg_grad.item():.6f}")
    
    print("✓ Model is ready for training!")
    
except Exception as e:
    print(f"✗ Gradient computation failed: {e}")
    import traceback
    traceback.print_exc()

✓ Gradient computation successful!
  Loss value: 0.0004
  Parameters with gradients: 9/9
  Average gradient magnitude: 0.000026
✓ Model is ready for training!
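
The small gradient magnitude is expected with this dummy loss. Because the loss is the mean over all output elements, every output element receives the same upstream gradient, which for the shapes used here ($B=2$, $T=64$, $D=128$) works out to:

$$
\mathcal{L} = \frac{1}{BTD}\sum_{b,t,d} y_{b,t,d}
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial y_{b,t,d}} = \frac{1}{BTD} = \frac{1}{16384} \approx 6.1 \times 10^{-5}
$$

Parameter gradients are built from this value as it propagates back through the layer, so magnitudes on the order of $10^{-5}$ are plausible here rather than a sign of vanishing gradients. This accounts for the scale only, not the exact value reported.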


## Environment Summary

Display the complete working environment configuration.

In [7]:
import sys
import subprocess

print("=" * 60)
print("WORKING ENVIRONMENT CONFIGURATION")
print("=" * 60)

# Python info
print(f"\nPython:")
print(f"  Version: {sys.version.split()[0]}")
print(f"  Executable: {sys.executable}")

# PyTorch info
print(f"\nPyTorch:")
print(f"  Version: {torch.__version__}")
print(f"  CUDA available: {torch.cuda.is_available()}")
print(f"  CUDA version: {torch.version.cuda}")
print(f"  cuDNN version: {torch.backends.cudnn.version()}")

# NumPy info
import numpy as np
print(f"\nNumPy:")
print(f"  Version: {np.__version__}")

# Mamba info
import mamba_ssm
print(f"\nMamba-SSM:")
print(f"  Version: {getattr(mamba_ssm, '__version__', 'Unknown')}")

# CUDA toolkit info (from system)
try:
    nvcc_out = subprocess.check_output(['nvcc', '--version'], stderr=subprocess.STDOUT, text=True)
    cuda_version = [line for line in nvcc_out.split('\n') if 'release' in line.lower()][0]
    print(f"\nCUDA Toolkit:")
    print(f"  {cuda_version.strip()}")
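except (OSError, subprocess.CalledProcessError, IndexError):  # nvcc missing, failed, or printed no release line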
    print("\nCUDA Toolkit: Unable to determine")

# GPU info
if torch.cuda.is_available():
    print(f"\nGPU:")
    print(f"  Device: {torch.cuda.get_device_name(0)}")
    print(f"  Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

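print("\n" + "=" * 60)
if torch.cuda.is_available():
    print("✓ All components configured correctly for Mamba training!")
else:
    print("✗ CUDA not available - this environment cannot run Mamba's CUDA kernels")
print("=" * 60)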

WORKING ENVIRONMENT CONFIGURATION

Python:
  Version: 3.10.19
  Executable: /home/jason/miniconda3/envs/mamba-env/bin/python

PyTorch:
  Version: 2.4.1+cu121
  CUDA available: True
  CUDA version: 12.1
  cuDNN version: 90100

NumPy:
  Version: 1.26.4

Mamba-SSM:
  Version: 2.2.6.post3

CUDA Toolkit:
  Cuda compilation tools, release 12.6, V12.6.20

GPU:
  Device: NVIDIA GeForce RTX 4060
  Memory: 8.0 GB

✓ All components configured correctly for Mamba training!
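
The summary shows a system toolkit (nvcc release 12.6) that is newer than the CUDA version PyTorch was built with (12.1). The sketch below extracts the release number from an `nvcc --version` line and compares major versions. The helper names are hypothetical, and treating a matching major version as compatible is a rule of thumb assumed here, not a guarantee from either project.

```python
import re


def parse_nvcc_release(line):
    """Return (major, minor) from an nvcc 'release X.Y' line, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", line)
    return (int(match.group(1)), int(match.group(2))) if match else None


def same_cuda_major(nvcc_line, torch_cuda):
    """Rule-of-thumb check (an assumption): compare CUDA major versions only."""
    toolkit = parse_nvcc_release(nvcc_line)
    if toolkit is None or torch_cuda is None:
        return False
    return toolkit[0] == int(torch_cuda.split(".")[0])


# Values copied from the environment summary output above.
nvcc_line = "Cuda compilation tools, release 12.6, V12.6.20"
print(parse_nvcc_release(nvcc_line))
print(same_cuda_major(nvcc_line, "12.1"))
```

In the notebook, `torch.version.cuda` would be passed as the second argument. It is `None` on CPU-only builds, which the helper treats as a mismatch.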
