Objective: solve for $X$ in the linear system $AX = B$, where $A$ is the bracketed matrix on the left-hand side and $B$ is the right-hand side of the following equation:

$\begin{bmatrix}
  \beta S^T S + (\sum\limits_{i=1}^N w_i) H^TD^TDH
\end{bmatrix}X = \sum\limits_{i=1}^N (w_iH^TD^TY_i)$
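
As a minimal sketch of how this system could be assembled and solved, the snippet below builds small random stand-ins for $S$, $H$, $D$, $Y_i$ and $w_i$ as SciPy sparse matrices, forms $A$ and $B$ term by term as in the equation, and solves with `scipy.sparse.linalg.spsolve`. The operators, sizes, densities and the value of $\beta$ are all placeholder assumptions chosen so that it runs quickly; they are not the real operators.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)

# Toy sizes (assumed): l^2 high-res pixels, m^2 low-res pixels, n frames.
l2, m2, n = 64, 16, 5
beta = 0.1  # regularisation weight (assumed value)

# Random sparse stand-ins for the real operators (hypothetical placeholders).
S = sp.identity(l2, format="csr") + 0.1 * sp.random(l2, l2, density=0.05, random_state=1, format="csr")
H = sp.identity(l2, format="csr") + 0.1 * sp.random(l2, l2, density=0.05, random_state=2, format="csr")
D = sp.random(m2, l2, density=0.2, random_state=3, format="csr")

Y = rng.standard_normal((m2, n))   # column i is the observation Y_i
w = rng.uniform(0.5, 1.5, size=n)  # per-frame weights w_i

DH = (D @ H).tocsr()

# A = beta * S^T S + (sum_i w_i) * H^T D^T D H
A = (beta * (S.T @ S) + w.sum() * (DH.T @ DH)).tocsc()

# B = sum_i w_i * H^T D^T Y_i, which equals (DH)^T (Y @ w)
B = DH.T @ (Y @ w)

X = spsolve(A, B)
residual = np.linalg.norm(A @ X - B) / np.linalg.norm(B)
print(f"relative residual: {residual:.2e}")
```

A direct sparse solve is only one option here; for the full-size problem an iterative solver may be the more realistic choice.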

Notes to self:
1. Judging by the memory estimates below, it is clear I can't hold everything in RAM at once if I want decent resolution (say 300×300 px or more). So it may make sense to split the problem up: at runtime, produce the a/b matrices, save them to file, free the setup matrices (i.e., y, d, h, s), then reload a/b from file for solving. I don't think I need to keep anything else around once I reach the solving step (i.e., $AX = B$, or in my case `x = solver(a, b)`).
2. A further improvement would be to solve in a distributed manner: instead of requiring everything in memory at once for full-blown solving, I could produce fragments of each setup matrix and fragments of a/b, and build up towards x over time.
    - Update: there appear to be several libraries for this kind of operation, e.g., PyTrilinos and MAGMA.
    - Update: I wonder if it makes more sense to rely on sparse storage formats, like CSR (compressed sparse row), which keep only the non-zero entries, and use a sparse solver?
3. Another improvement would be to reduce/remove matrices where possible. For instance, in the above equation, I could store only the product $DH$ and use its transpose when necessary, thus removing one significantly large matrix. (Subnote: this works because $H^TD^T = (DH)^T$. Note the order: $(HD)^T = D^TH^T$, which is a different product.)
4. One issue is that I haven't accounted for auxiliary memory requirements during intermediate steps...
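
Notes 1 and 3 can be sketched together as follows. This is only an illustration under assumptions: the operators are small random placeholders, an identity matrix stands in for $S^TS$, and the file names `a.npz` and `b.npy` are made up. It checks numerically that $(DH)^T = H^TD^T$, then builds a/b, writes them to disk, drops the setup matrices, and reloads a/b for the solve.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Toy sizes and random stand-ins (assumptions, not the real operators).
l2, m2 = 64, 16
H = sp.identity(l2, format="csr") + 0.1 * sp.random(l2, l2, density=0.05, random_state=0, format="csr")
D = sp.random(m2, l2, density=0.2, random_state=1, format="csr")
y = np.random.default_rng(2).standard_normal(m2)

# Note 3: (DH)^T equals H^T D^T, so only the product DH needs to be stored.
DH = (D @ H).tocsr()
transpose_gap = abs(DH.T - H.T @ D.T).max()
print(f"max |(DH)^T - H^T D^T| = {transpose_gap:.1e}")

# Note 1: build a and b, write them to disk, and drop the setup matrices.
a = (0.1 * sp.identity(l2) + DH.T @ DH).tocsc()  # identity stands in for S^T S
b = DH.T @ y

with tempfile.TemporaryDirectory() as tmp:
    a_path = os.path.join(tmp, "a.npz")  # hypothetical file names
    b_path = os.path.join(tmp, "b.npy")
    sp.save_npz(a_path, a)
    np.save(b_path, b)

    # Release the references; only the files on disk are needed from here on.
    del H, D, DH, y, a, b

    a_loaded = sp.load_npz(a_path)
    b_loaded = np.load(b_path)

x = spsolve(a_loaded, b_loaded)
print(f"solved for x with shape {x.shape}")
```

Deleting the names only releases the references; the peak footprint is still set by the step that forms a/b, which is the auxiliary-memory concern in note 4.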

In [2]:
import numpy as np

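precision = np.dtype(np.float32).itemsize  # bytes per matrix element
l = 800            # high-resolution image side length (pixels)
m = 200            # low-resolution image side length (pixels)
n = 30             # number of low-resolution frames
psf_support = 51   # PSF support width (pixels)

# Dense footprint of each setup matrix, in bytes.
setup_matrices = {
    'y': m**2 * n * precision,       # m^2 x n
    'd': m**2 * l**2 * precision,    # m^2 x l^2
    'h': l**2 * l**2 * precision,    # l^2 x l^2
    's': l**2 * l**2 * precision     # l^2 x l^2
}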

# Dense footprint of the solver inputs/outputs, in bytes.
solver_matrices = {
    'x': l**2 * precision,           # unknown vector X: l^2 entries
    'a': l**2 * l**2 * precision,    # system matrix A: l^2 x l^2
    'b': l**2 * precision            # right-hand side B: l^2 entries
}

print(f"Setup matrices (bytes): {setup_matrices}")
setup_matrices_bytes = sum(setup_matrices.values())
print(f"Memory footprint for setup matrices (bytes/megabytes/gigabytes): {setup_matrices_bytes, setup_matrices_bytes / 10**6, setup_matrices_bytes / 10**9}\n")

print(f"Solver matrices (bytes): {solver_matrices}")
solver_matrices_bytes = sum(solver_matrices.values())
print(f"Memory footprint for solver matrices (bytes/megabytes/gigabytes): {solver_matrices_bytes, solver_matrices_bytes / 10**6, solver_matrices_bytes / 10**9}\n")

super_resolution_possible = l**2 <= np.minimum((psf_support - 2) * m**2, n * m**2)
print(f"Is super-resolution possible with your configuration: {super_resolution_possible}")
print(f"l^2:                      {l**2}")
print(f"(psf_support - 2) * m^2:  {(psf_support - 2) * m**2}")
print(f"n * m^2:                  {n * m**2}")

Setup matrices (bytes): {'y': 4800000, 'd': 102400000000, 'h': 1638400000000, 's': 1638400000000}
Memory footprint for setup matrices (bytes/megabytes/gigabytes): (3379204800000, 3379204.8, 3379.2048)

Solver matrices (bytes): {'x': 2560000, 'a': 1638400000000, 'b': 2560000}
Memory footprint for solver matrices (bytes/megabytes/gigabytes): (1638405120000, 1638405.12, 1638.40512)

Is super-resolution possible with your configuration: True
l^2:                      640000
(psf_support - 2) * m^2:  1960000
n * m^2:                  1200000
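
The figures above assume dense storage, which is the worst case. As a rough estimate of what CSR storage might need instead, the sketch below makes assumptions that are not established anywhere above: each row of h holds at most `psf_support**2` non-zeros (a full PSF footprint), each row of d holds `(l // m)**2` (block averaging), and each row of s holds 5 (e.g., a Laplacian stencil), with 4-byte values and 4-byte indices. `csr_bytes` is a hypothetical helper, not a library function.

```python
import numpy as np

value_bytes = np.dtype(np.float32).itemsize
index_bytes = np.dtype(np.int32).itemsize
l, m, psf_support = 800, 200, 51

def csr_bytes(rows, nnz_per_row):
    """Approximate CSR footprint: data + column indices + row pointers."""
    nnz = rows * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

# Non-zeros per row are assumptions about the operators' structure.
sparse_estimates = {
    'd': csr_bytes(m**2, (l // m)**2),
    'h': csr_bytes(l**2, psf_support**2),
    's': csr_bytes(l**2, 5),
}
for name, size in sparse_estimates.items():
    print(f"{name}: {size / 10**9:.3f} GB")
```

Under these assumptions h drops from about 1638 GB dense to roughly 13 GB in CSR. This does not cover a itself, whose products $S^TS$ and $H^TD^TDH$ will generally be less sparse than their factors.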
