In [1]:
!git clone https://github.com/ToelUl/EfficientSVD.git

!cp -r EfficientSVD/universal_svd ./

# Experiencing EfficientSVD in Google Colab

This notebook provides a hands-on demonstration of the `EfficientSVD` class. `EfficientSVD` is a Python module that computes the singular value decomposition (SVD) efficiently by selecting a suitable backend (PyTorch, SciPy, or Scikit-learn) based on the input matrix type, its size, and the requested computation method.

**Goals:**
* Install necessary libraries.
* Import the `EfficientSVD` class from the cloned repository.
* Demonstrate various SVD computation methods:
    * `auto`: Automatic backend selection.
    * `full`: Computes the complete SVD.
    * `truncated`: Computes the top `k` singular values/vectors (using SciPy or Scikit-learn).
    * `randomized`: Computes an approximate truncated SVD using randomized algorithms (Scikit-learn).
    * `values_only`: Computes only the singular values efficiently (using PyTorch if available).
* Show usage with different input types:
    * NumPy arrays
    * SciPy sparse matrices
    * PyTorch tensors (including GPU acceleration if available)

**Running the Notebook:**
* Execute the cells sequentially using Shift+Enter or the "Run" button.
* Ensure the runtime type is set appropriately (CPU, GPU, or TPU) via `Runtime > Change runtime type`. GPU is recommended if you want to test PyTorch GPU acceleration in Example 6.

In [2]:
# @title Setup: Install Libraries
# Install necessary libraries. PyTorch, SciPy, and Scikit-learn provide different SVD backends.
# Use -q for quieter installation in Colab.
print("Installing required libraries...")
!pip install -q numpy scipy scikit-learn torch
print("Installation complete.")

# Import base libraries required immediately
import numpy as np
import warnings
import time

# Check available libraries after installation
try:
    import torch
    _TORCH_AVAILABLE = True
    print(f"PyTorch version: {torch.__version__} (Available: {_TORCH_AVAILABLE})")
    if torch.cuda.is_available():
        print(f"GPU available: {torch.cuda.get_device_name(0)}")
    else:
        print("GPU not available or PyTorch CUDA build not installed.")
except ImportError:
    _TORCH_AVAILABLE = False
    print("PyTorch not found.")

try:
    import scipy
    from scipy.sparse import spmatrix as scipy_sparse_matrix
    from scipy.sparse.linalg import svds as scipy_svds
    from scipy.sparse import random as sparse_random
    _SCIPY_AVAILABLE = True
    print(f"SciPy version: {scipy.__version__} (Available: {_SCIPY_AVAILABLE})")
except ImportError:
    scipy_sparse_matrix = None
    scipy_svds = None
    sparse_random = None
    _SCIPY_AVAILABLE = False
    print("SciPy not found.")

try:
    import sklearn
    from sklearn.decomposition import TruncatedSVD as SklearnTruncatedSVD
    from sklearn.utils.extmath import randomized_svd as sklearn_randomized_svd
    from sklearn.utils.validation import check_random_state
    _SKLEARN_AVAILABLE = True
    print(f"Scikit-learn version: {sklearn.__version__} (Available: {_SKLEARN_AVAILABLE})")
except ImportError:
    SklearnTruncatedSVD = None
    sklearn_randomized_svd = None
    check_random_state = None
    _SKLEARN_AVAILABLE = False
    print("Scikit-learn not found.")

from universal_svd import EfficientSVD
print("EfficientSVD class defined.")

Installing required libraries...
Installation complete.
PyTorch version: 2.5.1+cu124 (Available: True)
GPU available: NVIDIA GeForce RTX 3060 Laptop GPU
SciPy version: 1.15.2 (Available: True)
Scikit-learn version: 1.6.1 (Available: True)
EfficientSVD class defined.


## Demonstration Setup

Let's create some test matrices:
1.  A **dense NumPy array**.
2.  A **sparse SciPy matrix** (if SciPy is available).
3.  A **PyTorch tensor** (if PyTorch is available).

We'll also define some common parameters for the SVD computations.

In [3]:
# @title Prepare Example Data

# --- Configuration ---
M, N = 1000, 500       # Dimensions for larger matrices
M_small, N_small = 100, 50 # Dimensions for full SVD example
K_COMPONENTS = 20      # Number of components for truncated/randomized SVD
SPARSITY = 0.05        # Density of the sparse matrix (fraction of non-zero entries)
RANDOM_SEED = 42

np.random.seed(RANDOM_SEED)

# --- Create Dense NumPy Matrix ---
print(f"\nCreating a dense NumPy matrix (matrix_np) with shape ({M}x{N})")
matrix_np = np.random.rand(M, N).astype(np.float32)
print(f"matrix_np: type={type(matrix_np)}, shape={matrix_np.shape}, dtype={matrix_np.dtype}")

# --- Create Smaller Dense NumPy Matrix (for Full SVD) ---
print(f"\nCreating a smaller dense NumPy matrix (matrix_np_small) with shape ({M_small}x{N_small})")
matrix_np_small = np.random.rand(M_small, N_small).astype(np.float32)
print(f"matrix_np_small: type={type(matrix_np_small)}, shape={matrix_np_small.shape}, dtype={matrix_np_small.dtype}")


# --- Create Sparse SciPy Matrix ---
matrix_sparse = None
if _SCIPY_AVAILABLE and sparse_random is not None:
    print(f"\nCreating a sparse SciPy CSR matrix (matrix_sparse) with shape ({M}x{N}) and sparsity {SPARSITY}")
    matrix_sparse = sparse_random(M, N, density=SPARSITY, format='csr', random_state=RANDOM_SEED).astype(np.float32)
    print(f"matrix_sparse: type={type(matrix_sparse)}, shape={matrix_sparse.shape}, nnz={matrix_sparse.nnz}, dtype={matrix_sparse.dtype}")
else:
    print("\nSkipping sparse matrix creation: SciPy not available or `sparse_random` failed to import.")

# --- Create PyTorch Tensor ---
matrix_torch = None
if _TORCH_AVAILABLE:
    print(f"\nCreating a PyTorch tensor (matrix_torch) with shape ({M}x{N}) from the NumPy matrix")
    matrix_torch = torch.from_numpy(matrix_np.copy()) # Use copy to avoid sharing memory if numpy array is modified later
    print(f"matrix_torch: type={type(matrix_torch)}, shape={matrix_torch.shape}, dtype={matrix_torch.dtype}, device={matrix_torch.device}")
    # Optional: Move to GPU if available
    # if torch.cuda.is_available():
    #     try:
    #         matrix_torch = matrix_torch.cuda()
    #         print(f"Moved matrix_torch to GPU: {matrix_torch.device}")
    #     except Exception as e:
    #         print(f"Failed to move tensor to GPU: {e}")
else:
    print("\nSkipping PyTorch tensor creation: PyTorch not available.")

# --- Instantiate the SVD Computer ---
svd_computer = EfficientSVD(random_state=RANDOM_SEED) # Set default random state for reproducibility
print("\nEfficientSVD instance created.")


Creating a dense NumPy matrix (matrix_np) with shape (1000x500)
matrix_np: type=<class 'numpy.ndarray'>, shape=(1000, 500), dtype=float32

Creating a smaller dense NumPy matrix (matrix_np_small) with shape (100x50)
matrix_np_small: type=<class 'numpy.ndarray'>, shape=(100, 50), dtype=float32

Creating a sparse SciPy CSR matrix (matrix_sparse) with shape (1000x500) and sparsity 0.05
matrix_sparse: type=<class 'scipy.sparse._csr.csr_matrix'>, shape=(1000, 500), nnz=25000, dtype=float32

Creating a PyTorch tensor (matrix_torch) with shape (1000x500) from the NumPy matrix
matrix_torch: type=<class 'torch.Tensor'>, shape=torch.Size([1000, 500]), dtype=torch.float32, device=cpu

EfficientSVD instance created.


## Example 1: Auto Method

Let `EfficientSVD` automatically choose the backend for a truncated SVD of the dense NumPy matrix. Because `k` is specified and small relative to the matrix dimensions, we expect it to pick a randomized or truncated method.
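
To make the idea of automatic selection concrete, here is a purely hypothetical dispatch heuristic. It is not taken from the `EfficientSVD` source: the function name `choose_method_sketch`, its rules, and the 10% threshold are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def choose_method_sketch(
    shape: Tuple[int, int],
    k: Optional[int] = None,
    is_sparse: bool = False,
    compute_uv: bool = True,
) -> str:
    """Hypothetical heuristic for picking an SVD method name.

    Illustration only: the rules and the 10% threshold are assumptions,
    not EfficientSVD's actual logic.
    """
    min_dim = min(shape)
    if k is not None and not 0 < k <= min_dim:
        raise ValueError(f"k must be in [1, {min_dim}], got {k}")
    if k is None:
        if is_sparse:
            raise ValueError("a sparse input needs k; a full SVD would densify it")
        # No rank requested: every singular value (and vector) is needed.
        return "full" if compute_uv else "values_only"
    if is_sparse:
        # Iterative solvers only need matrix-vector products.
        return "truncated"
    # Dense input: randomized methods pay off when k is a small fraction of min_dim.
    return "randomized" if k <= 0.1 * min_dim else "full"


print(choose_method_sketch((1000, 500), k=20))
print(choose_method_sketch((1000, 500), k=20, is_sparse=True))
print(choose_method_sketch((100, 50)))
print(choose_method_sketch((1000, 500), compute_uv=False))
```

Under these assumed rules, the dense 1000x500 case with `k=20` maps to a randomized method, which matches the expectation stated above.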

In [4]:
# @title Example 1: Auto SVD (k=K_COMPONENTS) on NumPy Array

print(f"\n--- Example 1: Auto SVD (compute_uv=True, k={K_COMPONENTS}) on Dense NumPy Array ---")
try:
    # Explicitly pass k, let method='auto' decide the backend
    result = svd_computer.compute(matrix_np, k=K_COMPONENTS, compute_uv=True, method='auto')

    if result is not None and isinstance(result, tuple):
        U, S, Vh = result
        print("SVD computation successful.")
        if U is not None: print(f"U shape: {U.shape}")
        if S is not None: print(f"S shape: {S.shape}")
        if Vh is not None: print(f"Vh shape: {Vh.shape}")
        print(f"Top 5 Singular Values: {S[:5]}")

        # Optional: Check reconstruction error (meaningful for truncated/randomized)
        if U is not None and S is not None and Vh is not None:
            reconstructed_matrix = U @ np.diag(S) @ Vh
            norm_diff = np.linalg.norm(matrix_np - reconstructed_matrix)
            norm_orig = np.linalg.norm(matrix_np)
            if norm_orig > 0:
                print(f"Relative reconstruction error: {norm_diff / norm_orig:.4e}")
            else:
                print("Original matrix norm is zero, cannot compute relative error.")

except Exception as e:
    print(f"Error during Auto SVD: {e}")


--- Example 1: Auto SVD (compute_uv=True, k=20) on Dense NumPy Array ---
SVD computation successful.
U shape: (1000, 20)
S shape: (20,)
Vh shape: (20, 500)
Top 5 Singular Values: [353.92874   15.408077  15.321458  15.289583  15.168637]
Relative reconstruction error: 4.7357e-01
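
A relative error near 0.47 is expected here rather than a sign of failure: a random matrix has little low-rank structure beyond its mean, so 20 components capture only part of its energy. For an exact rank-`k` truncation, the Frobenius error equals the square root of the sum of the squared discarded singular values (the Eckart-Young result). The NumPy-only sketch below checks this identity on a smaller random matrix. The sizes and seed are arbitrary choices, and the code does not call `EfficientSVD`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 100))          # uniform entries in [0, 1), like matrix_np
k = 10

# Exact thin SVD, then keep only the top-k triplets.
U, S, Vh = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * S[:k]) @ Vh[:k, :]

# Measured relative error of the rank-k reconstruction.
measured = np.linalg.norm(A - A_k) / np.linalg.norm(A)

# Predicted error from the discarded singular values alone.
predicted = np.sqrt(np.sum(S[k:] ** 2) / np.sum(S ** 2))

print(f"measured:  {measured:.6f}")
print(f"predicted: {predicted:.6f}")
```

An approximate (randomized) backend can only match or exceed this error, since the exact truncation is the best rank-`k` approximation in the Frobenius norm.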


## Example 2: Full SVD

Compute the full SVD. This can be slow and memory-intensive for large matrices, so we use the smaller dense matrix (`matrix_np_small`). We explicitly request `method='full'`. The backend will be PyTorch (if available) or NumPy.

In [5]:
# @title Example 2: Full SVD on Smaller NumPy Array

print("\n--- Example 2: Full SVD (compute_uv=True) on Smaller Dense NumPy Array ---")
try:
    result_full = svd_computer.compute(matrix_np_small, method='full', compute_uv=True)

    if result_full is not None and isinstance(result_full, tuple):
        U_full, S_full, Vh_full = result_full
        print("Full SVD computation successful.")
        if U_full is not None: print(f"U_full shape: {U_full.shape}")
        if S_full is not None: print(f"S_full shape: {S_full.shape}")
        if Vh_full is not None: print(f"Vh_full shape: {Vh_full.shape}")
        print(f"Top 5 Singular Values: {S_full[:5]}")

        # Optional: Check full reconstruction error (should be close to zero)
        if U_full is not None and S_full is not None and Vh_full is not None:
            # Need to handle shapes carefully for full SVD reconstruction
            # U is (m, m), S is (min(m,n),), Vh is (n, n) -> use U[:, :k] @ diag(S) @ Vh[:k, :] where k=len(S)
            k_full = S_full.size
            reconstructed_full = U_full[:, :k_full] @ np.diag(S_full) @ Vh_full[:k_full, :]
            norm_diff_full = np.linalg.norm(matrix_np_small - reconstructed_full)
            norm_orig_full = np.linalg.norm(matrix_np_small)
            if norm_orig_full > 0:
                print(f"Relative reconstruction error (full): {norm_diff_full / norm_orig_full:.4e}")
            else:
                print("Original matrix norm is zero, cannot compute relative error.")

except Exception as e:
    print(f"Error during Full SVD: {e}")


--- Example 2: Full SVD (compute_uv=True) on Smaller Dense NumPy Array ---
Full SVD computation successful.
U_full shape: (100, 100)
S_full shape: (50,)
Vh_full shape: (50, 50)
Top 5 Singular Values: [35.535755   4.735351   4.539307   4.4508214  4.27382  ]
Relative reconstruction error (full): 1.4032e+00
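
For an exact full SVD the reconstruction error should sit near float32 machine precision. The sketch below is a NumPy-only reference check, independent of `EfficientSVD`. It assumes the `(U, S, Vh)` convention of `numpy.linalg.svd`, in which the rows of `Vh` are the right singular vectors. The matrix size mirrors `matrix_np_small`, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 50)).astype(np.float32)

# full_matrices=True returns U as (m, m) and Vh as (n, n).
U, S, Vh = np.linalg.svd(A, full_matrices=True)
k = S.size                                   # min(m, n) singular values

# Only the first k columns of U and the first k rows of Vh contribute.
A_rec = U[:, :k] @ np.diag(S) @ Vh[:k, :]
rel_err = np.linalg.norm(A - A_rec) / np.linalg.norm(A)

print(U.shape, S.shape, Vh.shape)
print(f"relative reconstruction error: {rel_err:.2e}")
```

If a cell reports an error of order one for a full SVD, it is worth comparing the returned factors against this reference, since the slicing above is only valid when `Vh` holds the right singular vectors as rows.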


## Example 3: Explicitly Randomized SVD

Force the use of `method='randomized'` (requires Scikit-learn). This computes an approximate SVD, often faster for large matrices than exact truncated methods. We also pass an extra backend-specific argument `n_iter`.

In [6]:
# @title Example 3: Randomized SVD (k=K_COMPONENTS) on NumPy Array

print(f"\n--- Example 3: Randomized SVD (compute_uv=True, k={K_COMPONENTS}) on Dense NumPy Array ---")
if _SKLEARN_AVAILABLE:
    try:
        # Pass extra argument n_iter for sklearn.utils.extmath.randomized_svd
        result_rand = svd_computer.compute(matrix_np, method='randomized', k=K_COMPONENTS, compute_uv=True, n_iter=7)

        if result_rand is not None and isinstance(result_rand, tuple):
            U_rand, S_rand, Vh_rand = result_rand
            print("Randomized SVD computation successful (n_iter=7).")
            if U_rand is not None: print(f"U_rand shape: {U_rand.shape}")
            if S_rand is not None: print(f"S_rand shape: {S_rand.shape}")
            if Vh_rand is not None: print(f"Vh_rand shape: {Vh_rand.shape}")
            print(f"Top 5 Singular Values: {S_rand[:5]}")

            # Optional: Check reconstruction error
            if U_rand is not None and S_rand is not None and Vh_rand is not None:
                reconstructed_rand = U_rand @ np.diag(S_rand) @ Vh_rand
                norm_diff_rand = np.linalg.norm(matrix_np - reconstructed_rand)
                norm_orig = np.linalg.norm(matrix_np)
                if norm_orig > 0:
                    print(f"Relative reconstruction error (randomized): {norm_diff_rand / norm_orig:.4e}")
                else:
                    print("Original matrix norm is zero, cannot compute relative error.")

    except Exception as e:
        print(f"Error during Randomized SVD: {e}")
else:
    print("Skipping Randomized SVD example: Scikit-learn not available.")


--- Example 3: Randomized SVD (compute_uv=True, k=20) on Dense NumPy Array ---
Randomized SVD computation successful (n_iter=7).
U_rand shape: (1000, 20)
S_rand shape: (20,)
Vh_rand shape: (20, 500)
Top 5 Singular Values: [353.92874   15.408077  15.321458  15.289583  15.168637]
Relative reconstruction error (randomized): 4.7357e-01
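
To show what a randomized method does and what `n_iter` controls, here is a minimal NumPy sketch of a randomized range finder with power iterations. It is written from the general technique, not copied from Scikit-learn. The name `randomized_svd_sketch` and its default values are illustrative assumptions.

```python
import numpy as np


def randomized_svd_sketch(A, k, n_oversamples=10, n_iter=7, seed=0):
    """Approximate top-k SVD via a randomized range finder (illustrative)."""
    rng = np.random.default_rng(seed)
    # Random test matrix; oversampling improves the captured subspace.
    omega = rng.standard_normal((A.shape[1], k + n_oversamples))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations sharpen the subspace; QR keeps it well conditioned.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Exact SVD of the small projected matrix B = Q^T A.
    Ub, S, Vh = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vh[:k, :]


rng = np.random.default_rng(42)
A = rng.random((400, 200))
U, S, Vh = randomized_svd_sketch(A, k=10)
S_exact = np.linalg.svd(A, compute_uv=False)[:10]

print("approx:", np.round(S[:3], 4))
print("exact: ", np.round(S_exact[:3], 4))
```

Because the approximation works on a projection of `A`, its singular values cannot exceed the exact ones. The well-separated leading value is recovered almost exactly, while values in the tightly clustered bulk may come out slightly low.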


## Example 4: Values Only

Compute only the singular values (`S`) using `method='values_only'`. If PyTorch is available, this should use the optimized `torch.linalg.svdvals`. Otherwise, it will likely fall back to computing the full SVD and discarding `U` and `Vh`.

In [7]:
# @title Example 4: Values Only SVD on NumPy Array

print("\n--- Example 4: Values Only SVD (compute_uv=False) on Dense NumPy Array ---")
try:
    # Request only singular values
    S_vals_only = svd_computer.compute(matrix_np, method='values_only', compute_uv=False)

    if S_vals_only is not None:
        print("Singular Values Only computation successful.")
        print(f"S_vals_only shape: {S_vals_only.shape}")
        print(f"Top 5 Singular Values: {S_vals_only[:5]}")
        print(f"Smallest 5 Singular Values: {S_vals_only[-5:]}") # Should be sorted descending

except Exception as e:
    print(f"Error during Values Only SVD: {e}")


--- Example 4: Values Only SVD (compute_uv=False) on Dense NumPy Array ---
Singular Values Only computation successful.
S_vals_only shape: (500,)
Top 5 Singular Values: [353.92877    15.480578   15.406039   15.3439455  15.256517 ]
Smallest 5 Singular Values: [2.895017  2.8555083 2.8287802 2.7462342 2.705585 ]
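
The shape of this spectrum can be sanity-checked without any vectors. Entries drawn uniformly from [0, 1) have mean 0.5, so the leading singular value should be roughly `0.5 * sqrt(M * N)`, which is about 353.6 for the 1000x500 matrix and close to the value printed above. The squared singular values also sum to the squared Frobenius norm. The sketch below uses the NumPy route (`compute_uv=False`) on a smaller matrix. The sizes and seed are arbitrary, and the estimate for the leading value is a heuristic, not an exact result.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 400, 200
A = rng.random((M, N))                      # uniform in [0, 1), mean 0.5

# NumPy computes the singular values without returning U or Vh.
S = np.linalg.svd(A, compute_uv=False)

# Values come back sorted in descending order.
is_sorted = bool(np.all(np.diff(S) <= 0))

# The mean of the entries drives the leading value.
estimate = 0.5 * np.sqrt(M * N)
print(f"largest value: {S[0]:.2f}, heuristic estimate: {estimate:.2f}")

# The squared values sum to the squared Frobenius norm.
energy_matches = bool(np.isclose(np.sum(S ** 2), np.linalg.norm(A) ** 2))
print(is_sorted, energy_matches)
```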


## Example 5: Sparse Matrix Input

Demonstrate using a SciPy sparse matrix as input. With `method='auto'` and a specified `k`, `EfficientSVD` should select an appropriate sparse SVD backend (likely `scipy.sparse.linalg.svds` or `sklearn.decomposition.TruncatedSVD`/`randomized_svd`).

In [8]:
# @title Example 5: Auto SVD (k=K_COMPONENTS) on Sparse Matrix

print(f"\n--- Example 5: Auto SVD (compute_uv=True, k={K_COMPONENTS}) on Sparse Matrix ---")
if matrix_sparse is not None:
    try:
        result_sparse = svd_computer.compute(matrix_sparse, k=K_COMPONENTS, compute_uv=True, method='auto')

        if result_sparse is not None and isinstance(result_sparse, tuple):
            U_sp, S_sp, Vh_sp = result_sparse
            print("SVD computation successful for sparse input.")
            if U_sp is not None: print(f"U_sp shape: {U_sp.shape}")
            if S_sp is not None: print(f"S_sp shape: {S_sp.shape}")
            if Vh_sp is not None: print(f"Vh_sp shape: {Vh_sp.shape}")
            print(f"Top 5 Singular Values: {S_sp[:5]}")

            # Reconstruction check is more complex/costly for sparse matrices, skipping detailed error calculation.
            print("Reconstruction check skipped for sparse matrix input.")

    except Exception as e:
        print(f"Error during SVD on sparse matrix: {e}")
else:
    print("Skipping Sparse Matrix SVD example: Sparse matrix was not created (SciPy might be missing).")


--- Example 5: Auto SVD (compute_uv=True, k=20) on Sparse Matrix ---
SVD computation successful for sparse input.
U_sp shape: (1000, 20)
S_sp shape: (20,)
Vh_sp shape: (20, 500)
Top 5 Singular Values: [18.386974   6.8904147  6.808587   6.783335   6.7562876]
Reconstruction check skipped for sparse matrix input.
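
A reconstruction check need not be costly for sparse input. For an exact truncated SVD, the squared Frobenius error equals `||A||_F^2` minus the sum of the squared retained singular values, so the error follows from `S` and the stored non-zeros alone. The sketch below calls `scipy.sparse.linalg.svds` directly (one of the candidate backends named above) rather than `EfficientSVD`, on a smaller matrix so that a dense cross-check is affordable. The sizes, density, and seed are arbitrary, and the identity holds only approximately for a randomized backend.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(300, 150, density=0.05, format="csr",
                  random_state=0, dtype=np.float64)
k = 10

# svds does not guarantee the order of the values, so sort descending.
U, S, Vh = svds(A, k=k)
order = np.argsort(S)[::-1]
U, S, Vh = U[:, order], S[order], Vh[order, :]

# Squared Frobenius norm from the stored non-zeros only.
norm_A_sq = float(np.sum(A.data ** 2))

# Error via the identity: no dense reconstruction needed.
rel_err_identity = np.sqrt(max(norm_A_sq - np.sum(S ** 2), 0.0) / norm_A_sq)

# Dense cross-check, affordable only because this matrix is small.
rel_err_dense = np.linalg.norm(A.toarray() - (U * S) @ Vh) / np.sqrt(norm_A_sq)

print(f"identity: {rel_err_identity:.6f}")
print(f"dense:    {rel_err_dense:.6f}")
```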


## Example 6: PyTorch Tensor Input (with GPU option)

Pass a PyTorch tensor directly. If a GPU is available and the tensor is on it (the example cell moves it there automatically; alternatively, uncomment the relevant lines in the data preparation cell), `EfficientSVD` should use PyTorch's GPU-accelerated SVD where applicable, for example with `method='full'` or `method='values_only'`, and possibly for some backends chosen by `auto`. The results (`U`, `S`, `Vh`) are returned as NumPy arrays regardless of the input type.

In [9]:
# @title Example 6: Auto SVD (k=K_COMPONENTS) on PyTorch Tensor

print(f"\n--- Example 6: Auto SVD (compute_uv=True, k={K_COMPONENTS}) on PyTorch Tensor ---")
if matrix_torch is not None:
    # Optional: Try moving tensor to GPU right before compute if not done globally
    if torch.cuda.is_available() and matrix_torch.device.type != 'cuda':
        try:
            matrix_torch_gpu = matrix_torch.cuda()
            print(f"Attempting computation on GPU: {matrix_torch_gpu.device}")
            input_tensor = matrix_torch_gpu
        except Exception as e:
            print(f"Failed to move tensor to GPU for this example: {e}. Using CPU tensor.")
            input_tensor = matrix_torch
    else:
        print(f"Using tensor on device: {matrix_torch.device}")
        input_tensor = matrix_torch  # Use the CPU tensor prepared earlier or already on GPU

    try:
        # Using auto method, PyTorch backend likely chosen if k is large or full/values_only selected
        result_torch = svd_computer.compute(input_tensor, k=K_COMPONENTS, compute_uv=True, method='auto')

        if result_torch is not None and isinstance(result_torch, tuple):
            U_pt, S_pt, Vh_pt = result_torch
            print("SVD computation successful for PyTorch tensor input.")
            # Verify outputs are NumPy arrays
            print(f"Output types: U({type(U_pt)}), S({type(S_pt)}), Vh({type(Vh_pt)})")
            if U_pt is not None: print(f"U_pt shape: {U_pt.shape}")
            if S_pt is not None: print(f"S_pt shape: {S_pt.shape}")
            if Vh_pt is not None: print(f"Vh_pt shape: {Vh_pt.shape}")
            print(f"Top 5 Singular Values: {S_pt[:5]}")

    except Exception as e:
        print(f"Error during SVD on PyTorch tensor: {e}")
else:
    print("Skipping PyTorch Tensor SVD example: PyTorch tensor was not created.")


--- Example 6: Auto SVD (compute_uv=True, k=20) on PyTorch Tensor ---
Attempting computation on GPU: cuda:0
SVD computation successful for PyTorch tensor input.
Output types: U(<class 'numpy.ndarray'>), S(<class 'numpy.ndarray'>), Vh(<class 'numpy.ndarray'>)
U_pt shape: (1000, 20)
S_pt shape: (20,)
Vh_pt shape: (20, 500)
Top 5 Singular Values: [353.92874   15.408077  15.321458  15.289583  15.168637]
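
The round trip from a (possibly GPU-resident) tensor to NumPy results can be sketched with plain PyTorch calls. This is not `EfficientSVD`'s internal code; it only illustrates the `torch.linalg.svd` and `torch.linalg.svdvals` functions mentioned in this notebook and the usual `detach().cpu().numpy()` conversion. The matrix size is arbitrary.

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.rand(200, 100, dtype=torch.float32, device=device)

# Thin SVD on whichever device holds the tensor.
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Singular values only; U and Vh are not returned.
S_only = torch.linalg.svdvals(A)

# Detach, move to host memory, then convert to NumPy.
U_np, S_np, Vh_np = (t.detach().cpu().numpy() for t in (U, S, Vh))

print(type(U_np).__name__, U_np.shape, S_np.shape, Vh_np.shape)
print(np.allclose(S_np, S_only.cpu().numpy(), rtol=1e-4, atol=1e-4))
```

Calling `.cpu()` is a no-op for tensors already in host memory, so the same conversion works with or without a GPU.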


## Conclusion

This notebook demonstrated the `EfficientSVD` class: its computation methods (`auto`, `full`, `truncated`, `randomized`, `values_only`) and its handling of different input types (NumPy arrays, SciPy sparse matrices, PyTorch tensors). The `auto` method selects a backend based on the input and parameters, aiming for efficiency. To experiment further, modify the parameters in the "Prepare Example Data" cell and rerun the examples.