### OMP vs F-G-OLS
We have discussed two greedy selection methods: Orthogonal Matching Pursuit (OMP) and Forward-Greedy Ordinary Least Squares (F-G-OLS).

Let us first consider the one-dimensional example. Assume that we have data $X \in \mathbb{R}^{T \times p}$ and a signal defined as $$Y = X w,$$ where $Y \in \mathbb{R}^T$ and $w \in \mathbb{R}^p$.

Assume we have already selected a set of $k$ atoms (columns of $X$), indexed by $\Gamma^k$. Then the current residual is $$r_k = Y - X_{\Gamma^k} w_{\Gamma^k}.$$ Both methods iteratively add the next "best" atom. They differ only in their definition of "best".
- For OMP, we pick the atom that is most correlated with the residual. In mathematical notation, with $\tilde{x}_i = x_i / \|x_i\|_2$ the normalized $i$-th column of $X$:

$$i_{k+1} = \underset{i}{\arg \max}\, |\tilde{x}_i^\top r_k|$$

- For F-G-OLS, we pick the atom that minimizes the norm of the residual after refitting by least squares. In mathematical notation:

$$i_{k+1} = \underset{i}{\arg \min}\, \|Y - X_{\Gamma^k \cup \{i\}} X_{\Gamma^k \cup \{i\}}^\dagger Y\|_2$$

Both methods have the same goal: find a subset $\Gamma^k$ and an accompanying coefficient vector $\tilde{w}$ such that $X_{\Gamma^k} \tilde{w}$ approximates $Y$ well, that is, with a small residual error. Their selection criteria, however, differ slightly.
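The two criteria above can be sketched side by side in NumPy. This is a minimal illustration rather than the notebook's own implementation: the function names `ls_residual`, `omp_select` and `ols_select`, the seed, and the problem size are assumptions made for the example.

```python
import numpy as np

def ls_residual(X, Y, support):
    """Residual of Y after a least-squares fit on the columns in `support`."""
    if not support:
        return Y.copy()
    coef = np.linalg.pinv(X[:, support]) @ Y
    return Y - X[:, support] @ coef

def omp_select(X, Y, support):
    """OMP rule: atom with the largest normalized correlation with the residual."""
    r = ls_residual(X, Y, support)
    corr = np.abs(X.T @ r) / np.linalg.norm(X, axis=0)
    if support:
        corr[support] = -np.inf  # never reselect an atom
    return int(np.argmax(corr))

def ols_select(X, Y, support):
    """F-G-OLS rule: atom whose addition gives the smallest refitted residual norm."""
    scores = np.full(X.shape[1], np.inf)
    for i in range(X.shape[1]):
        if i not in support:
            scores[i] = np.linalg.norm(ls_residual(X, Y, support + [i]))
    return int(np.argmin(scores))

rng = np.random.default_rng(0)  # illustrative seed
X = rng.random((20, 10))
Y = X @ np.arange(1.0, 11.0)

first_omp = omp_select(X, Y, [])
first_ols = ols_select(X, Y, [])
print(first_omp, first_ols)
```

Both functions take the current support as a list of column indices, so later steps are obtained by appending the selected index and calling them again.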

The question is: when do the methods yield a different choice?
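One way to make the question precise is to rewrite the F-G-OLS criterion in terms of the residual. The following is a short derivation, sketched under the assumption that $w_{\Gamma^k}$ is the least-squares fit on $\Gamma^k$, so that $r_k$ is orthogonal to the selected atoms. The symbols $P_k$ and $x_i^\perp$ are introduced only for this derivation. Let $P_k = X_{\Gamma^k} X_{\Gamma^k}^\dagger$ be the projection onto the span of the selected atoms and $x_i^\perp = (I - P_k) x_i$ the part of atom $i$ that is orthogonal to them (assumed nonzero). Then

$$\|Y - X_{\Gamma^k \cup \{i\}} X_{\Gamma^k \cup \{i\}}^\dagger Y\|_2^2 = \|r_k\|_2^2 - \frac{(x_i^\top r_k)^2}{\|x_i^\perp\|_2^2},$$

so the two rules can be written as

$$i_{k+1}^{\text{OLS}} = \underset{i}{\arg \max}\, \frac{|x_i^\top r_k|}{\|x_i^\perp\|_2}, \qquad i_{k+1}^{\text{OMP}} = \underset{i}{\arg \max}\, \frac{|x_i^\top r_k|}{\|x_i\|_2}.$$

The two rankings agree when $\Gamma^k$ is empty (then $x_i^\perp = x_i$) and, more generally, whenever the ratio $\|x_i^\perp\|_2 / \|x_i\|_2$ is the same for all candidates. They can differ when some candidates are strongly correlated with atoms that have already been selected.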

[This paper](https://eprints.soton.ac.uk/142469/1/BDOMPvsOLS07.pdf) shows that the two methods are not the same and gives a geometric interpretation of the difference. To illustrate this, we consider a three-dimensional example.

In [401]:
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

import sys
sys.path.append("..")
import helper.helper as h

### Generate Data

In [468]:
## Number of samples, number of variables
T, p = 20, 10

## Generate random X
X = np.random.rand(T, p)

# for i in range(p):
#     X[:, i] = normalize(X[:, i])

## Generate true coefficient vector w = (1, 2, ..., p)
w = np.arange(1, p + 1, dtype=float)

## Compute Y
Y = X @ w

### OMP Implementation of $\texttt{sklearn}$.

In [417]:
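# `normalize` is omitted: per the scikit-learn docs it is ignored when
# fit_intercept=False, and it has been removed from recent releases.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=1, fit_intercept=False)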
omp_fit = omp.fit(X, Y)
print(np.round(omp_fit.coef_, 2))

[0.   0.   4.99]


### One step of Orthogonal Matching Pursuit

In [418]:
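# Per-atom scores for the first selection step (empty support)
OLS_gains = OLS_GAINS(X, Y, [])[3]        # residual norms: smaller is better
OMP_gains = OMP_GAINS(X, Y, np.zeros(p))  # residual correlations: larger is better

print(np.round(OLS_gains, 3))
print(np.round(OMP_gains, 3))

# Compare the two rankings from best to worst atom
for i in range(p):
    if np.argmin(OLS_gains) != np.argmax(OMP_gains):
        print("Order is not equivalent")

    OLS_gains = np.delete(OLS_gains, np.argmin(OLS_gains))
    OMP_gains = np.delete(OMP_gains, np.argmax(OMP_gains))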

[5.064 4.538 2.605]
[8.163 8.467 9.246]


In [481]:
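def normalize(x):
    # Assumed helper (not defined in this notebook): scale x to unit l2 norm
    return x / np.linalg.norm(x)

def OMP_GAINS(X, Y, w):
    # Absolute correlation of each normalized column of X with the residual
    r = Y - X @ w
    return [np.abs(np.dot(normalize(x), r)) for x in X.T]

def OLS(X, Y, F):
    # Least-squares refit of Y on the columns in F; all other entries stay zero
    F_copy = sorted(F)
    w = np.zeros(X.shape[1])
    w[F_copy] = np.linalg.pinv(X[:, F_copy]) @ Y

    return w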

In [420]:
### Initialize variables
w, F = np.zeros(p), []

### Get residual correlations
print(f"OMP:\nResidual Correlations: {np.round(OMP_GAINS(X, Y, w), 2)}.")

### Find largest one
F.append(np.argmax(OMP_GAINS(X, Y, w)))
print(f"Index Maximizing Residual Correlation: {np.argmax(OMP_GAINS(X, Y, w))}.")

### Create new vector w
print(f"Resulting vector w: {np.round(OLS(X, Y, F), 2)}.")

OMP:
Residual Correlations: [8.16 8.47 9.25].
Index Maximizing Residual Correlation: 2.
Resulting vector w: [0.   0.   4.99].


### One step of Orthogonal Least Squares

In [482]:
def OLS_GAINS(X, Y, F):
    # Residual norm after adding each atom not yet in F and refitting by least squares
    p = X.shape[1]
    gains = []
    best_score, best_index, best_w = np.inf, None, np.zeros(p)

    for i in range(p):
        if i not in F:
            # candidate support: current support plus atom i (F itself is left untouched)
            F_i = F + [i]

            # least-squares coefficients on the candidate support
            w = np.zeros(p)
            w[F_i] = np.linalg.pinv(X[:, F_i]) @ Y

            # residual norm for this candidate
            score = np.linalg.norm(Y - X @ w)
            gains.append(score)

            # keep the candidate with the smallest residual
            if score < best_score:
                best_w = w.copy()
                best_score = score
                best_index = i

    return best_index, best_w, best_score, gains

best_index, best_w, best_score, OLS_gains = OLS_GAINS(X, Y, [])

print(f"\nBest Index to add: {best_index} with score {round(best_score, 2)}.\nResults in vector: {np.round(best_w, 2)}.")


Best Index to add: 9 with score 44.3.
Results in vector: [ 0.   0.   0.   0.   0.   0.   0.   0.   0.  38.1].


### Full F-G-OLS

In [473]:
F = []

# Three greedy F-G-OLS steps: each adds the atom that gives the smallest residual
for _ in range(3):
    best_index, best_w, best_score, gains = OLS_GAINS(X, Y, F)
    F.append(best_index)
    print(f"\nBest Index to add: {best_index} with score {round(best_score, 2)}.\nResults in vector: {np.round(best_w, 2)}.")


Best Index to add: 9 with score 44.3.
Results in vector: [ 0.   0.   0.   0.   0.   0.   0.   0.   0.  38.1].

Best Index to add: 5 with score 20.16.
Results in vector: [ 0.    0.    0.    0.    0.   31.34  0.    0.    0.   24.33].

Best Index to add: 8 with score 16.74.
Results in vector: [ 0.    0.    0.    0.    0.   25.83  0.    0.    9.53 20.57].


### Full F-OMP

In [483]:
### Initialize variables
w, F = np.zeros(p), []

### Three OMP steps
for step in range(1, 4):
    ### Get residual correlations
    gains = OMP_GAINS(X, Y, w)
    header = "OMP:" if step == 1 else f"\nStep {step}:"
    print(f"{header}\nResidual Correlations: {np.round(gains, 2)}.")

    ### Find largest one
    F.append(np.argmax(gains))
    print(f"Index Maximizing Residual Correlation: {F[-1]}.")

    ### Create new vector w by refitting on the selected atoms
    w = OLS(X, Y, F)
    print(f"Resulting vector w: {np.round(w, 2)}.")

OMP:
Residual Correlations: [102.93 103.91 103.33 107.2  107.09 107.03 107.22  97.57 107.68 111.24].
Index Maximizing Residual Correlation: 9.
Resulting vector w: [ 0.   0.   0.   0.   0.   0.   0.   0.   0.  38.1].

Step 2:
Residual Correlations: [15.22 12.61 12.65 16.21 12.02 27.63 20.81 22.64 17.88  0.  ].
Index Maximizing Residual Correlation: 5.
Resulting vector w: [ 0.    0.    0.    0.    0.   31.34  0.    0.    0.   24.33].

Step 3:
Residual Correlations: [4.22 1.84 5.78 3.12 1.18 0.   5.38 4.32 5.64 0.  ].
Index Maximizing Residual Correlation: 2.
Resulting vector w: [ 0.    0.    6.14  0.    0.   28.72  0.    0.    0.   20.19].


In [487]:
T, p = 20, 10

X = np.random.rand(T, p)
w = np.arange(1, p + 1, dtype=float)
Y = X @ w

In [488]:
def OLS_full(X, Y):
    # Run F-G-OLS until every atom has been added

    p = X.shape[1]

    # coefficients of the latest fit
    best_w = np.zeros(p)

    # initialize importance order
    F = []

    # iteratively add the next atom
    for i in range(p):
        best_index, best_w, best_score, gains = OLS_GAINS(X, Y, F)

        if best_index is not None:
            F.append(best_index)

    return best_w, F

OLS_full(X, Y)

(array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.]),
 [4, 9, 6, 8, 5, 7, 3, 2, 1, 0])

In [None]:
def OMP_full(X, Y):
    # Run OMP until every atom has been added

    # initialize w
    w = np.zeros(X.shape[1])

    # initialize importance order
    F = []

    # iteratively add the atom most correlated with the residual
    for i in range(X.shape[1]):
        F.append(np.argmax(OMP_GAINS(X, Y, w)))
        # refit the coefficients on the selected atoms
        w = OLS(X, Y, F)

    return w, F

OMP_full(X, Y)

### Is there sometimes a difference?

In [None]:
for a in range(100):
    
    print(a * 100)
    
    for _ in range(100):
        ### Generate data
        T, p = 10, 1000

        ## Generate random X
        X = np.random.rand(T, p)

        ## Generate true coefficient vector w
        w = np.random.rand(p)

        ## Compute Y
        Y = X @ w

        ### First atom selected by OMP (empty support)
        OMP_index = np.argmax(OMP_GAINS(X, Y, np.zeros(p)))

        ### First atom selected by F-G-OLS (empty support)
        OLS_index = OLS_GAINS(X, Y, [])[0]

        ### Verify equivalence
        if OMP_index != OLS_index:
            print(X, Y, w)
        
            break

0


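### Conjecture: OMP seems to be equal to OLS for normalized atoms, which is strange.
If we generate random data $X$ and a random coefficient vector $w$, OMP and OLS seem to always pick the same atom, which looks strange because the paper suggests that this is not necessarily the case. Note, however, that the experiment above only compares the first selection, made from an empty support. There, the OLS residual for a single atom satisfies $\|Y - \tilde{x}_i \tilde{x}_i^\top Y\|_2^2 = \|Y\|_2^2 - (\tilde{x}_i^\top Y)^2$, so minimizing it is the same as maximizing the normalized correlation $|\tilde{x}_i^\top Y|$ that OMP uses. Later steps can differ: in the three-step runs above, F-G-OLS adds index 8 at step 3, whereas OMP adds index 2. OLS is also about a factor $k$ slower, since it refits a least-squares problem for every candidate atom, so it is the "smarter" choice only in steps where its selection differs from that of OMP.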
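A small experiment can run both greedy rules for several steps on the same data and record the first step at which their selections differ. The sketch below makes several assumptions: the helper names, seed, sizes, and number of trials are illustrative and not taken from the notebook.

```python
import numpy as np

def residual(X, Y, support):
    """Least-squares residual of Y on the columns of X listed in `support`."""
    if not support:
        return Y.copy()
    return Y - X[:, support] @ (np.linalg.pinv(X[:, support]) @ Y)

def omp_step(X, Y, support):
    """Index maximizing the normalized correlation with the current residual."""
    corr = np.abs(X.T @ residual(X, Y, support)) / np.linalg.norm(X, axis=0)
    if support:
        corr[support] = -np.inf  # never reselect an atom
    return int(np.argmax(corr))

def ols_step(X, Y, support):
    """Index minimizing the residual norm after a least-squares refit."""
    candidates = [i for i in range(X.shape[1]) if i not in support]
    norms = [np.linalg.norm(residual(X, Y, support + [i])) for i in candidates]
    return candidates[int(np.argmin(norms))]

def first_disagreement(X, Y, n_steps):
    """Return the first step (1-based) where the two rules differ, or None."""
    support = []
    for step in range(1, n_steps + 1):
        i_omp, i_ols = omp_step(X, Y, support), ols_step(X, Y, support)
        if i_omp != i_ols:
            return step
        support.append(i_omp)  # both rules agree, so the support is shared
    return None

rng = np.random.default_rng(0)  # illustrative seed
results = []
for _ in range(200):
    X = rng.random((20, 10))
    Y = X @ rng.random(10)
    results.append(first_disagreement(X, Y, n_steps=5))

differing = [s for s in results if s is not None]
print(f"trials with a differing selection: {len(differing)} of {len(results)}")
print(f"earliest differing step: {min(differing) if differing else None}")
```

Since the two rules agree on an empty support, any step returned by `first_disagreement` is at least 2. The counts themselves depend on the seed and the problem size.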