# Exercise - Memory Spaces - Power Iteration

Let's learn about memory spaces and transfers! In this exercise, we'll learn:

- How to explicitly transfer data between host and device.
  - `d = cupy.asarray(h)` to copy from a host array to a device array.
  - `h = cupy.asnumpy(d)` to copy from a device array to a host array.
- What happens if we mix NumPy and CuPy code.
- Some ways in which NumPy and CuPy produce different results.
- Some of the limitations of CuPy.
- How problem size and compute workload impact performance.
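Here is a minimal sketch of the two transfer calls listed above. It copies a host array to the device, does some work there, and copies the result back. The `gpu_available` helper is hypothetical and not part of CuPy. It only exists so the sketch falls back to NumPy on machines that lack a GPU or a CuPy install.

```python
import numpy as np


def gpu_available():
    """Hypothetical helper: True if CuPy imports and reports at least one CUDA device."""
    try:
        import cupy
        return cupy.cuda.runtime.getDeviceCount() > 0
    except Exception:  # ImportError, or a CUDA runtime error when no driver/device exists.
        return False


h = np.arange(6, dtype=np.float64)  # Host (CPU) array.

if gpu_available():
    import cupy
    d = cupy.asarray(h)   # Host -> device copy; `d` is a cupy.ndarray.
    d = d * 2.0           # Computed on the GPU.
    h2 = cupy.asnumpy(d)  # Device -> host copy; `h2` is a numpy.ndarray.
else:
    h2 = h * 2.0          # CPU-only fallback so the sketch still runs.

print(type(h2).__name__, h2)
```

Either branch ends with a NumPy array on the host. The copies are explicit, so they are easy to spot when you reason about transfer costs.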

We're going to estimate the dominant eigenvalue of a matrix with the [power iteration algorithm](https://en.wikipedia.org/wiki/Power_iteration).
First, we'll randomly generate a dense diagonalizable square matrix.
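For reference, the update below is performed at each step, together with the eigenvalue estimate used in this notebook (the Rayleigh quotient) and the residual we check. Here $\lambda_1$ is the dominant eigenvalue and $\lambda_2$ is the next largest in magnitude. Assume $A$ is diagonalizable and the initial guess $x_0$ has a nonzero component along the dominant eigenvector. Then the error shrinks roughly in proportion to $|\lambda_2/\lambda_1|^k$, up to constants that depend on the eigenvectors of $A$.

$$
x_{k+1} = \frac{A x_k}{\lVert A x_k \rVert}, \qquad
\lambda^{(k)} = \frac{x_k^{\mathsf{T}} A x_k}{x_k^{\mathsf{T}} x_k}, \qquad
r_k = \bigl\lVert A x_k - \lambda^{(k)} x_k \bigr\rVert
$$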

In [None]:
import numpy as np

In [None]:
from dataclasses import dataclass

@dataclass
class PowerIterationConfig:
  dim: int = 4096 # Number of rows and columns in the square matrix.

  # Value from 0 to 1 that controls how much greater the dominant eigenvalue is
  # than the rest of the eigenvalues. A higher value means quicker convergence.
  dominance: float = 0.1

  # Maximum number of steps to perform.
  max_steps: int = 400

  # Every `check_frequency` steps we save a checkpoint and compute the residual.
  check_frequency: int = 10

  # Whether the residual should be printed every `check_frequency` steps.
  progress: bool = True

  # If the residual is below `residual_threshold`, terminate early.
  residual_threshold: float = 1e-10
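This estimate connects `dominance` to `max_steps`. By construction, the dominant eigenvalue is exactly 1 and every other eigenvalue is below `1 - dominance`, so $|\lambda_2/\lambda_1|$ is below `1 - dominance`. Assume the residual shrinks roughly like that ratio raised to the number of steps. That ignores constant factors, which depend on the random matrix `P`. Under that assumption, the sketch below gives a back-of-the-envelope step count. `estimated_steps` is a hypothetical helper, not part of the exercise.

```python
import math


def estimated_steps(dominance: float, residual_threshold: float) -> int:
    """Hypothetical helper: smallest k with (1 - dominance)**k < residual_threshold.

    Ignores constant factors, so treat the result as an order-of-magnitude guide.
    """
    ratio = 1.0 - dominance  # Upper bound on |lambda_2 / lambda_1| for this construction.
    return math.ceil(math.log(residual_threshold) / math.log(ratio))


print(estimated_steps(0.1, 1e-10))   # Defaults: about 219 steps, within max_steps=400.
print(estimated_steps(0.01, 1e-10))  # Weaker dominance needs far more steps.
```

Convergence is only detected at a check, so the loop stops at the next multiple of `check_frequency` after the residual drops below the threshold.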

In [None]:
def generate_host(cfg=PowerIterationConfig()):
  np.random.seed(42)

  # Vector with a single 1 & `cfg.dim - 1` values from 0 to `1 - cfg.dominance`.
  weak_lam = np.random.random(cfg.dim - 1) * (1.0 - cfg.dominance)
  lam = np.random.permutation(np.concatenate(([1.0], weak_lam)))

  P = np.random.random((cfg.dim, cfg.dim)) # Random invertible matrix.
  D = np.diag(np.random.permutation(lam))  # Diagonal matrix w/ random eigenvalues.
  A = ((P @ D) @ np.linalg.inv(P))         # Diagonalizable matrix.
  return A

A_host = generate_host()

with np.printoptions(precision=4):
  print(A_host)
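Because `A = P D P^{-1}` is similar to `D`, it has exactly the eigenvalues placed on the diagonal of `D`. The sketch below checks that construction at a small size. It assumes a hypothetical `dim=5`, so `eigvals` returns instantly. It uses NumPy's `default_rng` instead of the legacy `np.random.seed` API so the notebook's global random state is left alone, and it repeats the generation steps inline so it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)  # Separate generator; the notebook's seeded state is untouched.
dim, dominance = 5, 0.1

# Same recipe as `generate_host`: one eigenvalue of 1, the rest below 1 - dominance.
lam = np.concatenate(([1.0], rng.random(dim - 1) * (1.0 - dominance)))
P = rng.random((dim, dim))               # Random matrix; invertible with probability 1.
A = P @ np.diag(lam) @ np.linalg.inv(P)  # Similar to diag(lam), so it has the same eigenvalues.

computed = np.sort(np.linalg.eigvals(A).real)
print(np.allclose(computed, np.sort(lam)))
```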

Next, we perform the power iteration with NumPy, using a vector of 1s as our initial guess.

We'll perform at most `cfg.max_steps` steps. Every `cfg.check_frequency` steps, we'll save a checkpoint, compute the absolute residual, and check whether it's below `cfg.residual_threshold`. If it is, we'll stop early.
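You can try this control flow on a tiny matrix whose dominant eigenvalue is known. The loop checks the residual every `check_frequency` steps and runs plain normalized matrix-vector products in between. This sketch uses a hypothetical 2×2 upper-triangular matrix with eigenvalues 3 and 1, and it leaves out the checkpoint files and printing.

```python
import numpy as np


def power_iteration(A, max_steps=400, check_frequency=10, residual_threshold=1e-10):
    """Minimal sketch of the loop in `estimate_host`, without checkpoints or printing.

    Returns the Rayleigh-quotient estimate and the step index of the last check.
    """
    x = np.ones(A.shape[0], dtype=np.float64)  # Initial guess: a vector of 1s.
    for i in range(0, max_steps, check_frequency):
        y = A @ x
        lam = (x @ y) / (x @ x)               # Rayleigh quotient.
        res = np.linalg.norm(y - lam * x)     # Absolute residual ||A x - lam x||.
        x = y / np.linalg.norm(y)             # Normalize for next step.
        if res < residual_threshold:
            break                             # Converged: stop early.
        for _ in range(check_frequency - 1):  # Unchecked steps between checks.
            y = A @ x
            x = y / np.linalg.norm(y)
    return (x @ (A @ x)) / (x @ x), i


# Upper-triangular, so its eigenvalues are the diagonal entries: 3 and 1.
A_small = np.array([[3.0, 1.0], [0.0, 1.0]])
lam_est, last_check = power_iteration(A_small)
print(lam_est, last_check)
```

Here $|\lambda_2/\lambda_1| = 1/3$, so the loop should exit after only a few checks, long before `max_steps`.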

In [None]:
def estimate_host(A, cfg=PowerIterationConfig()):
  x = np.ones(A.shape[0], dtype=np.float64)

  for i in range(0, cfg.max_steps, cfg.check_frequency):
    y = A @ x
    lam = (x @ y) / (x @ x)            # Rayleigh quotient.
    res = np.linalg.norm(y - lam * x)
    x = y / np.linalg.norm(y)          # Normalize for next step.

    if cfg.progress:
      print(f"step {i}: residual = {res:.3e}")

    np.savetxt(f"host_{i}.txt", x) # Save a checkpoint.

    if res < cfg.residual_threshold:
      break

    for _ in range(cfg.check_frequency - 1):
      y = A @ x
      x = y / np.linalg.norm(y) # Normalize for next step.

  return (x.T @ (A @ x)) / (x.T @ x)

lam_est_host = estimate_host(A_host).item()

print()
print(lam_est_host)

**TODO: In the next cell, port the power iteration function to CuPy. Try leaving some operations as NumPy and see what happens.**
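The checkpoint is one place where mixing NumPy and CuPy can bite. `np.savetxt` works on host memory, and CuPy arrays are not converted to NumPy arrays implicitly. Recent CuPy versions raise a `TypeError` rather than copying silently. The sketch below makes the conversion explicit. `save_checkpoint` is a hypothetical helper that copies with `cupy.asnumpy` only when it receives a CuPy array.

```python
import numpy as np

try:
    import cupy
except ImportError:  # CPU-only environment; the helper then only ever sees NumPy arrays.
    cupy = None


def save_checkpoint(path, x):
    """Hypothetical helper: write `x` with np.savetxt, copying it to the host first if needed."""
    if cupy is not None and isinstance(x, cupy.ndarray):
        x = cupy.asnumpy(x)  # Explicit device -> host copy.
    np.savetxt(path, x)
```

In a ported loop this would be called as `save_checkpoint(f"device_{i}.txt", x)`. Each call transfers only the vector `x`, not the matrix.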

In [None]:
def estimate_device(A, cfg=PowerIterationConfig()):
  x = np.ones(A.shape[0], dtype=np.float64)

  for i in range(0, cfg.max_steps, cfg.check_frequency):
    y = A @ x
    lam = (x @ y) / (x @ x)            # Rayleigh quotient.
    res = np.linalg.norm(y - lam * x)
    x = y / np.linalg.norm(y)          # Normalize for next step.

    if cfg.progress:
      print(f"step {i}: residual = {res:.3e}")

    np.savetxt(f"device_{i}.txt", x) # Save a checkpoint.

    if res < cfg.residual_threshold:
      break

    for _ in range(cfg.check_frequency - 1):
      y = A @ x
      x = y / np.linalg.norm(y) # Normalize for next step.

  return (x.T @ (A @ x)) / (x.T @ x)

lam_est_device = estimate_device(A_host).item()

print()
print(lam_est_device)

**TODO: Now port the matrix generation function to CuPy, and run the power iteration with it. What do you notice about the result?**

In [None]:
def generate_device(cfg=PowerIterationConfig()):
  np.random.seed(42)

  # Vector with a single 1 & `cfg.dim - 1` values from 0 to `1 - cfg.dominance`.
  weak_lam = np.random.random(cfg.dim - 1) * (1.0 - cfg.dominance)
  lam = np.random.permutation(np.concatenate(([1.0], weak_lam)))

  P = np.random.random((cfg.dim, cfg.dim)) # Random invertible matrix.
  D = np.diag(np.random.permutation(lam))  # Diagonal matrix with random eigenvalues.
  A = ((P @ D) @ np.linalg.inv(P))         # Diagonalizable matrix.
  return A

A_device = generate_device()

with np.printoptions(precision=4):
  print("A_host:")
  print(A_host)
  print()
  print("A_device:")
  print(A_device)
  print()

lam_est_device_generation = estimate_device(A_device).item()

print()
print(lam_est_device_generation)

Next, let's compute the eigenvalues of the matrix with `numpy.linalg.eigvals`. This may take a little while.

**TODO: What happens if we port this to CuPy?**

In [None]:
lam_ref = np.linalg.eigvals(A_host).real.max()

Now we can check whether our power iteration estimates are correct.

In [None]:
print("Solution")
print()
print(f"Power iteration (host)   = {lam_est_host:.6e}")
print(f"Power iteration (device) = {lam_est_device:.6e}")
print(f"`eigvals` reference      = {lam_ref:.6e}")

rel_err_host   = abs(lam_est_host - lam_ref) / abs(lam_ref)
rel_err_device = abs(lam_est_device - lam_ref) / abs(lam_ref)
print()
print(f"Relative error (host)    = {rel_err_host:.3e}")
print(f"Relative error (device)  = {rel_err_device:.3e}")

np.testing.assert_allclose(lam_est_host, lam_ref, rtol=1e-4)
np.testing.assert_allclose(lam_est_device, lam_ref, rtol=1e-4)

Finally, let's benchmark all three solutions.

In [None]:
print("Execution Time")
print()

time_host = %timeit -q -o estimate_host(A_host, PowerIterationConfig(progress=False)).item()
print(f"Power iteration (host)   = {time_host}")

time_device = %timeit -q -o estimate_device(A_host, PowerIterationConfig(progress=False)).item()
print(f"Power iteration (device) = {time_device}")

time_ref = %timeit -q -o -r 1 -n 1 np.linalg.eigvals(A_host).real.max()
print(f"`eigvals` reference      = {time_ref}")
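Timing GPU code has a catch: CuPy launches kernels asynchronously, so a timer can stop before the GPU has finished. The `.item()` calls in the benchmark cell copy the final scalar to the host, and that copy waits for the preceding work. If you time device code without such a copy, synchronize explicitly. The hypothetical `best_time` helper below takes an optional synchronization callback. On a GPU, `cupy.cuda.Device().synchronize` could be passed as `sync`.

```python
import time


def best_time(fn, *args, repeats=5, sync=None):
    """Hypothetical helper: best wall-clock time of fn(*args) over `repeats` runs.

    `sync`, if given, is called around each run so that asynchronous
    GPU work (e.g. CuPy kernels) is included in the measurement.
    """
    best = float("inf")
    for _ in range(repeats):
        if sync is not None:
            sync()  # Don't count work queued by earlier calls.
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # Wait for queued GPU work to finish before stopping the timer.
        best = min(best, time.perf_counter() - start)
    return best


print(best_time(sum, range(1_000_000)))
```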

**EXTRA CREDIT: Explore the impact of changing the problem size (`dim`), the compute workload (`max_steps` and `dominance`), and the check frequency (`check_frequency`).**
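As a starting point, here is a rough count of the work per power-iteration step. It assumes `float64` data and a dense matrix-vector product that reads all of `A` once per step. Each step then performs about 2·dim² floating-point operations while reading about 8·dim² bytes, roughly 0.25 FLOP per byte. At that ratio the loop is likely limited by memory bandwidth rather than arithmetic, on both CPU and GPU. `workload` is a hypothetical helper, and `steps` stands for `max_steps`, which is only an upper bound when the loop exits early.

```python
def workload(dim: int, steps: int, itemsize: int = 8):
    """Hypothetical helper: rough FLOPs and bytes read for `steps` matrix-vector products."""
    flops_per_step = 2 * dim * dim         # One multiply and one add per matrix entry.
    bytes_per_step = itemsize * dim * dim  # Every entry of A is read once (x and y ignored).
    return {
        "matrix_MiB": bytes_per_step / 2**20,
        "total_GFLOP": flops_per_step * steps / 1e9,
        "total_GB_read": bytes_per_step * steps / 1e9,
        "flop_per_byte": flops_per_step / bytes_per_step,
        "checkpoint_KiB": itemsize * dim / 2**10,  # Size of one host copy of x.
    }


print(workload(4096, 400))
```

At the default `dim=4096`, the matrix alone is 128 MiB. A checkpoint of `x` is only 32 KiB, so each check's transfer is small next to the matrix reads. File writes and synchronization at each check may still add overhead as `check_frequency` decreases.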