The Lasso estimator penalises the RSS by the $L_1$ norm of the regression coefficients:
\begin{equation}
    \hat\beta^{(L,\lambda)}(\mathcal{T}) = \underset{\hat\beta}{\operatorname{argmin}} \left\{RSS(\hat\beta; \mathcal{T}) + \lambda \sum_{j=1}^p |\hat\beta_j|\right\}.
\end{equation}
To express the Lasso estimator as a quadratic program, we must transform the objective function to remove the non-differentiable $L_1$ norm ($|\hat{\beta}_j|$). This is done by decomposing the regression coefficients into their positive and negative parts.

We want to minimise
\begin{equation}
    \mathcal{L}(\hat{\beta}) = RSS(\hat{\beta}; \mathcal{T}) + \lambda \sum_{j=1}^p |\hat{\beta}_j|.
\end{equation}
Substituting the formula $RSS = \frac{1}{N} (y - x\hat{\beta})^T (y - x\hat{\beta})$ yields
\begin{equation}
    \mathcal{L}(\hat{\beta}) = \frac{1}{N} (y - x\hat{\beta})^T (y - x\hat{\beta}) + \lambda \sum_{j=1}^p |\hat{\beta}_j|.
\end{equation}
Define two non-negative vectors $u$ and $v$ of size $p \times 1$ componentwise by
\begin{align}
    u_j &= \max(0, \hat{\beta}_j), \\
    v_j &= \max(0, -\hat{\beta}_j).
\end{align}
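Then
\begin{align}
    \hat{\beta} &= u - v, \\
    |\hat{\beta}| &= u + v,
\end{align}
where the absolute value is taken componentwise. In the quadratic program, $u$ and $v$ become free optimisation variables subject only to the constraints $u \geq 0$, $v \geq 0$. For $\lambda > 0$ any minimiser satisfies $u_j v_j = 0$, since lowering both $u_j$ and $v_j$ by $\min(u_j, v_j)$ leaves $u - v$ unchanged and reduces the penalty, so $u + v = |\hat{\beta}|$ still holds at the optimum. Substituting back into the objective function, the quadratic RSS term becomes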
\begin{equation}
    RSS = \frac{1}{N} (y - xu + xv)^T (y - xu + xv).
\end{equation}
Expanding this and collecting the term $\frac{1}{N} y^T y$, which does not depend on $u$ or $v$ and so does not affect the $\operatorname{argmin}$, into a constant gives
\begin{equation}
    RSS = \frac{1}{N} \left[ (u-v)^T x^T x (u-v) - 2y^T x (u-v) \right] + \text{const}.
\end{equation}
The penalty term is
\begin{equation}
    \lambda \sum_{j=1}^p |\hat{\beta}_j| = \lambda \sum_{j=1}^p (u_j + v_j) = \lambda \mathbf{1}^T u + \lambda \mathbf{1}^T v,
\end{equation}
where $\mathbf{1}$ denotes the $p \times 1$ vector of ones.
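The decomposition into positive and negative parts can be checked numerically. The snippet below is a minimal sketch; the helper name `split_pos_neg` and the example coefficient vector are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def split_pos_neg(beta):
    """Return (u, v) with u = max(0, beta) and v = max(0, -beta), componentwise."""
    beta = np.asarray(beta, dtype=float)
    u = np.maximum(0.0, beta)
    v = np.maximum(0.0, -beta)
    return u, v

# Illustrative coefficient vector (made up for this check).
beta = np.array([1.5, -0.25, 0.0, -3.0])
u, v = split_pos_neg(beta)

# beta is recovered as u - v, and the L1 norm equals 1^T u + 1^T v.
reconstruction_ok = np.allclose(u - v, beta)
l1_norm_ok = np.isclose(u.sum() + v.sum(), np.abs(beta).sum())
print(reconstruction_ok, l1_norm_ok)
```

Because at most one of $u_j$ and $v_j$ is non-zero for each $j$, the sum $u + v$ reproduces $|\hat{\beta}|$ exactly.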
We stack these into the new optimisation variable $z = \begin{pmatrix} u \\ v \end{pmatrix}$ of size $2p \times 1$ and aim to match the standard QP form
\begin{equation}
    \min_{z}\frac{1}{2} z^T Q z + c^T z \quad \text{subject to } Az \leq b.
\end{equation}
From the RSS expansion, the quadratic term is proportional to $(u-v)^T x^T x (u-v)$. Let $H = x^T x$. Since $H$ is symmetric,
\begin{equation}
    (u-v)^T H (u-v) = u^T H u - 2u^T H v + v^T H v,
\end{equation}
which in terms of $z$ is the quadratic form
\begin{equation}
    z^T \begin{pmatrix} H & -H \\ -H & H \end{pmatrix} z.
\end{equation}
To match the $\frac{1}{2}z^T Q z$ form, and accounting for the $\frac{1}{N}$ factor in the original RSS, we write
\begin{equation}
    Q = \frac{2}{N} \begin{pmatrix} x^Tx & -x^Tx \\ -x^Tx & x^Tx \end{pmatrix}.
\end{equation}
We combine the RSS linear part $-\frac{2}{N}y^Tx(u-v)$ and the penalty $\lambda \mathbf{1}^T (u+v)$,
\begin{equation}
    \text{Linear terms} = \left( \lambda \mathbf{1} - \frac{2}{N}x^Ty \right)^T u + \left( \lambda \mathbf{1} + \frac{2}{N}x^Ty \right)^T v,
\end{equation}
so
\begin{equation}
    c = \begin{pmatrix} \lambda \mathbf{1}_p - \frac{2}{N}x^Ty \\ \lambda \mathbf{1}_p + \frac{2}{N}x^Ty \end{pmatrix}.
\end{equation}
The only constraints are non-negativity,
\begin{equation}
    u \geq 0, \quad v \geq 0 \implies z \geq 0,
\end{equation}
which is the linear inequality $Az \leq b$ with $A = -I_{2p}$ and $b = 0$.

To summarise, the Lasso can be expressed as minimising over $z \in \mathbb{R}^{2p}$:
\begin{equation}
    \min_{z} \frac{1}{2} z^T \left[ \frac{2}{N} \begin{pmatrix} x^Tx & -x^Tx \\ -x^Tx & x^Tx \end{pmatrix} \right] z + \left[ \begin{pmatrix} \lambda \mathbf{1} \\ \lambda \mathbf{1} \end{pmatrix} + \frac{2}{N} \begin{pmatrix} -x^Ty \\ x^Ty \end{pmatrix} \right]^T z,
\end{equation}
subject to $z \geq 0$.
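The summary above translates directly into code. The following is a minimal sketch using NumPy only: `lasso_qp_matrices` builds $Q$ and $c$ as derived, while `solve_nonneg_qp` is a simple projected-gradient loop chosen here for illustration. Any QP solver that accepts the constraint $z \geq 0$ could be substituted. All function names and the synthetic data are assumptions made for this example.

```python
import numpy as np

def lasso_qp_matrices(X, y, lam):
    """Build Q and c of the QP  min 0.5 z^T Q z + c^T z  s.t. z >= 0."""
    N, p = X.shape
    H = X.T @ X
    Q = (2.0 / N) * np.block([[H, -H], [-H, H]])
    Xty = X.T @ y
    ones = np.ones(p)
    c = np.concatenate([lam * ones - (2.0 / N) * Xty,
                        lam * ones + (2.0 / N) * Xty])
    return Q, c

def solve_nonneg_qp(Q, c, n_iter=5000):
    """Projected gradient descent for a convex QP with the constraint z >= 0."""
    lipschitz = np.linalg.eigvalsh(Q).max()   # largest eigenvalue of Q
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    z = np.zeros(len(c))
    for _ in range(n_iter):
        # Gradient step on the quadratic, then projection onto z >= 0.
        z = np.maximum(0.0, z - step * (Q @ z + c))
    return z

def lasso_via_qp(X, y, lam, n_iter=5000):
    """Recover the Lasso coefficients beta = u - v from the QP solution z = (u, v)."""
    Q, c = lasso_qp_matrices(X, y, lam)
    z = solve_nonneg_qp(Q, c, n_iter)
    p = X.shape[1]
    return z[:p] - z[p:]

# Synthetic example (illustrative data, not the prostate data set).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = X_demo @ np.array([1.0, 0.0, -2.0, 0.0, 0.0]) + 0.1 * rng.normal(size=40)
beta_demo = lasso_via_qp(X_demo, y_demo, lam=0.2)
print(np.round(beta_demo, 3))
```

Projected gradient descent is used only because it needs no dependency beyond NumPy. It can converge slowly on ill-conditioned designs, where a dedicated QP solver would be preferable.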

In the Lasso estimator, the degree of sparsity is controlled indirectly via the penalty weight $\lambda$, rather than directly as in earlier methods. For $\lambda = 0$ the full model is employed, whereas increasingly many covariates are deleted from the model as $\lambda \to \infty$. Given an algorithm for solving the quadratic program, we can then use cross-validation to select among any finite set of values $\lambda_1 < \cdots < \lambda_q$ for $\lambda$. For simplicity, we will continue here to perform cross-validation to select model size rather than $\lambda$. To do so, we will rely on the LARS algorithm, which allows us to efficiently compute one Lasso solution for each model size.
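For comparison, selecting $\lambda$ from a finite grid by cross-validation could look roughly as follows. This is a sketch under stated assumptions: the Lasso is solved by cyclic coordinate descent (swapped in here as a simple stand-in for a QP solver, applied to the same objective $\frac{1}{N}\|y - x\hat\beta\|^2 + \lambda\|\hat\beta\|_1$), and the names `lasso_cd` and `cv_select_lambda`, the fold count and the grid are all hypothetical choices.

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding operator sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for (1/N) * ||y - X b||^2 + lam * ||b||_1."""
    N, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    resid = np.asarray(y, dtype=float).copy()   # equals y - X @ beta for beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual excluding j.
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, N * lam / 2.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

def cv_select_lambda(X, y, lambdas, K=5, seed=0):
    """Return the grid value with the smallest K-fold CV error, and all CV errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    cv_err = np.zeros(len(lambdas))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        for i, lam in enumerate(lambdas):
            beta = lasso_cd(X[train_mask], y[train_mask], lam)
            cv_err[i] += np.sum((y[test_idx] - X[test_idx] @ beta) ** 2)
    cv_err /= len(y)   # mean squared held-out error per observation
    return lambdas[int(np.argmin(cv_err))], cv_err
```

A call such as `cv_select_lambda(X_train, y_train, np.array([0.01, 0.1, 1.0]))` would return the chosen $\lambda$ together with the cross-validation error of every grid value. The grid itself has to be supplied by the user.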

In [7]:
DATA_STRING = """
-2.909170 -1.645861 -2.016634 -1.872101 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-2.640906 -1.999313 -0.725759 -0.791989 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-2.640906 -1.587021 -2.200154 1.368234 -1.030029 -0.525657 -0.867655 0.344407 -0.156155
-2.640906 -2.178174 -0.812191 -0.791989 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-2.106823 -0.510513 -0.461218 -0.251933 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.712919 -2.046706 -0.938806 -1.872101 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.712919 -0.522668 -0.364678 0.018095 0.356701 -0.525657 -0.867655 -1.047571 -0.868957
-1.623972 -0.560208 -0.209841 -0.791989 0.995291 -0.525657 -0.867655 -1.047571 -0.868957
-1.431068 -1.813627 -0.209841 -2.277143 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.431068 -0.961052 -0.901927 -0.116919 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.211439 -0.934188 -0.058200 0.153109 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.211439 -2.300218 -0.071004 -0.116919 0.808276 -0.525657 -0.867655 -1.047571 -0.868957
-1.211439 0.224659 -1.422069 -0.116919 -1.030029 -0.525657 -0.300837 0.344407 0.200246
-1.130314 0.108346 -1.479863 0.423137 -1.030029 -0.525657 -0.867655 0.344407 -0.690757
-1.079670 -0.122844 -0.438585 -0.927003 -1.030029 -0.525657 -0.180743 0.344407 -0.690757
-1.031468 0.163023 -1.332460 0.288123 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-1.008211 -1.505735 -0.264970 0.828178 0.792484 -0.525657 -0.300837 0.344407 0.200246
-0.985483 0.800383 0.047904 0.288123 -1.030029 -0.525657 0.396060 -1.047571 -0.868957
-0.920242 -1.630766 -0.847675 -3.087227 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-0.878999 -0.995867 0.460895 0.828178 1.079376 -0.525657 -0.867655 -1.047571 -0.868957
-0.839390 -0.172794 -0.491739 -0.656975 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-0.820159 0.604869 -0.300095 -0.521961 0.952261 -0.525657 1.098068 0.344407 -0.156155
-0.782771 -1.615934 -0.593769 -0.656975 -0.622780 -0.525657 -0.867655 -1.047571 -0.868957
-0.764589 0.368177 -0.416166 -0.116919 0.234114 -0.525657 0.976274 0.344407 1.269449
-0.746731 -0.822788 0.090234 0.693165 1.038608 -0.525657 -0.867655 -1.047571 -0.868957
-0.711945 0.082650 -1.183437 0.558151 0.138397 -0.525657 -0.867655 -1.047571 -0.868957
-0.678329 -0.713997 0.212832 0.153109 -1.030029 -0.525657 -0.445098 0.344407 1.625850
-0.661935 -1.492910 0.556166 0.423137 1.189002 -0.525657 -0.867655 0.344407 -0.156155
-0.629932 -0.264157 -1.173146 0.423137 0.085074 -0.525657 0.164020 0.344407 1.982251
-0.583770 0.903714 -0.593769 0.153109 -1.030029 -0.525657 1.293115 -1.047571 -0.868957
-0.554138 -0.908145 1.082190 0.153109 1.290474 -0.525657 -0.445098 -1.047571 -0.868957
-0.470173 -0.995867 0.411770 0.153109 1.111607 -0.525657 -0.867655 -1.047571 -0.868957
-0.470173 -0.063663 -1.388063 0.963192 0.808276 -0.525657 -0.867655 -1.047571 -0.868957
-0.456839 -1.142875 -0.847675 -1.332045 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
-0.430694 -1.159932 -0.966850 -0.116919 -1.030029 -0.525657 -0.445098 -1.047571 -0.868957
-0.392715 -0.035544 1.151831 0.018095 1.434884 -0.525657 -0.867655 0.344407 -0.690757
-0.320828 0.062343 0.066139 1.233220 -0.471260 -0.525657 1.321037 1.736385 -0.334356
-0.286733 -0.761244 -2.942386 0.018095 -1.030029 -0.525657 -0.867655 0.344407 -0.334356
-0.264633 1.118048 1.070381 0.558151 0.882251 1.902379 1.446379 0.344407 0.378447
-0.201120 -0.471204 -1.445016 -1.062017 0.579043 -0.525657 0.012111 0.344407 -0.690757
-0.180814 -0.622100 -1.142541 -0.521961 -1.030029 -0.525657 -0.867655 3.128363 1.982251
-0.170814 0.078627 0.125921 0.558151 -1.030029 -0.525657 -0.867655 0.344407 -0.512556
-0.151109 -0.654816 0.556166 -0.251933 1.117877 -0.525657 -0.180743 -1.047571 -0.868957
-0.103481 0.359518 0.628738 -0.386947 -1.030029 -0.525657 0.711919 0.344407 -0.655117
0.043334 0.116099 -0.514895 0.288123 1.142406 -0.525657 -0.180743 0.344407 -0.156155
0.074957 0.267725 -0.554001 -0.386947 0.356701 -0.525657 -0.867655 0.344407 -0.334356
0.090401 1.175099 0.859936 2.043304 1.232660 1.902379 2.038875 3.128363 2.695054
0.090401 -0.159363 0.953038 0.558151 1.117877 -0.525657 -0.180743 0.344407 0.556647
0.113130 0.337479 -0.307183 -2.817199 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
0.113130 -0.110171 -0.142703 0.828178 0.882251 -0.525657 -0.445098 -1.047571 -0.868957
0.178370 -0.220110 0.855614 0.558151 -1.030029 -0.525657 -0.867655 0.344407 0.913048
0.199204 0.264488 1.421615 0.018095 1.366871 -0.525657 -0.867655 -1.047571 -0.868957
0.206053 -0.713997 0.011000 0.018095 0.964831 -0.525657 0.164020 0.344407 1.625850
0.212856 0.662694 1.155640 0.558151 1.154352 -0.525657 1.169128 0.344407 0.556647
0.226324 1.538191 -0.264970 -0.656975 -1.030029 -0.525657 -0.867655 0.344407 -0.690757
0.239614 -0.070840 1.527906 0.288123 1.400882 -0.525657 -0.867655 0.344407 -0.334356
0.309706 -0.320204 -1.792336 -2.277143 -1.030029 -0.525657 0.488950 0.344407 -0.726397
0.315841 -0.755864 0.318490 -2.007115 0.916472 -0.525657 -0.867655 -1.047571 -0.868957
0.327999 -0.688838 1.288801 0.828178 0.234114 -0.525657 -0.867655 0.344407 -0.156155
0.334023 -0.246264 0.521515 -0.386947 0.827523 -0.525657 -0.867655 0.344407 0.556647
0.363611 -0.761244 2.101279 1.233220 1.542252 -0.525657 -0.867655 -1.047571 -0.868957
0.375206 0.552145 0.212832 -0.116919 1.052465 1.902379 1.501706 0.344407 0.556647
0.375206 1.215913 -0.244144 1.098206 -1.030029 -0.525657 1.249088 3.128363 2.516853
0.403617 0.583946 0.675904 0.288123 1.321864 1.902379 1.645967 0.344407 1.269449
0.403617 0.616752 -0.013927 0.018095 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
0.409203 0.092625 0.486344 -0.386947 0.846250 -0.525657 -0.180743 0.344407 -0.156155
0.442083 0.573853 0.585465 0.558151 1.166095 -0.525657 1.079149 0.344407 1.625850
0.484306 0.723498 0.990087 1.098206 1.529276 -0.525657 -0.180743 0.344407 -0.512556
0.484306 -1.531979 1.829210 0.693165 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
0.494588 -0.133120 2.701661 1.098206 1.542252 -0.525657 -0.445098 0.344407 -0.690757
0.534694 0.438427 -0.083878 -0.521961 -1.030029 1.902379 1.079149 0.344407 1.269449
0.558967 -0.162033 -0.675391 1.773276 1.142406 -0.525657 -0.867655 0.344407 0.022045
0.577970 -0.115218 0.460895 0.693165 -1.030029 1.902379 0.289362 0.344407 -0.156155
0.596619 0.417004 -0.920294 -0.521961 0.234114 1.902379 0.976274 3.128363 2.338652
0.796869 1.406541 0.516522 0.693165 -1.030029 1.902379 1.501706 0.344407 -0.156155
0.859161 1.527564 -0.856631 0.558151 -0.105070 1.902379 1.868936 0.344407 0.913048
0.914442 0.563639 1.888436 1.098206 1.400882 -0.525657 0.488950 0.344407 1.269449
0.957212 1.012890 1.703065 1.908290 1.542252 -0.525657 -0.867655 0.344407 -0.512556
0.979506 1.107252 -0.109840 0.693165 -1.030029 1.902379 1.986568 0.344407 1.625850
1.034650 1.219095 0.455773 -0.116919 -1.030029 -0.525657 0.396060 0.344407 0.913048
1.037626 0.100521 -1.310583 0.288123 0.318200 -0.525657 0.289362 0.344407 0.556647
1.052376 0.992420 -0.364678 -0.927003 0.234114 -0.525657 1.802014 0.344407 1.269449
1.086912 1.077152 0.609604 1.773276 -0.435103 1.902379 0.531250 0.344407 0.200246
1.092553 1.132233 0.491400 0.153109 0.703097 -0.525657 1.386436 3.128363 1.625850
1.109290 0.181092 0.189969 -0.521961 1.105280 -0.525657 0.711919 0.344407 0.200246
1.152599 1.665487 -0.258009 0.018095 -1.030029 1.902379 1.802014 0.344407 1.269449
1.201704 0.574980 0.241100 -0.791989 1.066051 -0.525657 -0.867655 -1.047571 -0.868957
1.233965 0.325488 -0.609869 -0.251933 -1.030029 1.902379 0.344689 0.344407 0.200246
1.505957 1.243106 2.555412 0.153109 -1.030029 1.902379 1.900197 0.344407 1.269449
1.515216 0.181092 0.155251 1.638262 0.579043 1.902379 0.711919 0.344407 1.804051
1.551419 1.617422 1.109520 0.558151 -1.030029 -0.525657 -0.867655 -1.047571 -0.868957
1.651164 1.008835 0.114086 -0.386947 0.864484 1.902379 -0.867655 0.344407 -0.334356
1.906760 1.262444 0.580608 0.558151 -1.030029 1.902379 1.079149 0.344407 1.269449
2.206057 2.107397 0.628738 -2.682185 -1.030029 1.902379 1.688267 0.344407 0.556647
2.664738 1.328267 -0.546127 -1.602073 -1.030029 1.902379 1.900197 0.344407 -0.512556
2.999122 1.307045 0.340141 0.558151 1.010033 1.902379 1.249088 0.344407 1.982251
3.104545 1.809719 0.811961 0.558151 0.234114 1.902379 2.216735 0.344407 -0.156155
"""

In [8]:
import numpy as np
import io

def load_and_process_data(data_str):
    # Parse string to numpy array
    raw_data = np.loadtxt(io.StringIO(data_str))

    # Extract response and covariates
    y = raw_data[:, 0]
    X_original = raw_data[:, 1:]
    N, p_original = X_original.shape

    # Augment with 4 zero-mean, unit-variance noise covariates
    X_noise = np.random.normal(0, 1, (N, 4))
    X_augmented = np.hstack([X_original, X_noise])

    # Split into training (size 70) and test (size 27)
    n_train = 70
    X_train = X_augmented[:n_train, :]
    y_train = y[:n_train]
    X_test = X_augmented[n_train:, :]
    y_test = y[n_train:]

    return X_train, y_train, X_test, y_test

def bestsubset(X, y):
    import itertools
    N, p = X.shape
    B = np.zeros((p, p))

    for j in range(1, p + 1):
        best_rss = np.inf
        best_indices = None
        best_beta_subset = None

        for indices in itertools.combinations(range(p), j):
            X_sub = X[:, indices]
            # Fit least squares on the subset. lstsq's residual output can be
            # empty (e.g. for rank-deficient fits), so the RSS is computed below.
            beta, _, _, _ = np.linalg.lstsq(X_sub, y, rcond=None)

            # Calculate RSS
            y_pred = X_sub @ beta
            rss = np.sum((y - y_pred)**2)

            if rss < best_rss:
                best_rss = rss
                best_indices = indices
                best_beta_subset = beta

        if best_indices is not None:
            B[list(best_indices), j-1] = best_beta_subset

    return B

def greedysubset(X, y):
    N, p = X.shape
    B = np.zeros((p, p))
    selected_indices = []
    remaining_indices = list(range(p))

    for size in range(1, p + 1):
        best_rss = np.inf
        best_candidate = -1
        best_beta_subset = None

        for candidate in remaining_indices:
            trial_indices = selected_indices + [candidate]
            X_sub = X[:, trial_indices]
            beta, _, _, _ = np.linalg.lstsq(X_sub, y, rcond=None)

            y_pred = X_sub @ beta
            rss = np.sum((y - y_pred)**2)

            if rss < best_rss:
                best_rss = rss
                best_candidate = candidate
                best_beta_subset = beta

        selected_indices.append(best_candidate)
        remaining_indices.remove(best_candidate)
        B[selected_indices, size-1] = best_beta_subset

    return B

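def crossval(X, y, sparse_estimator):
    # 10-fold cross-validation over model size. sparse_estimator(X, y) must
    # return a p x p matrix whose column j-1 holds the coefficients of the
    # size-j model. Returns the selected coefficients, the selected size and
    # the held-out RSS for each size, averaged over the folds.
    N, p = X.shape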
    K = 10
    indices = np.random.permutation(N)
    folds = np.array_split(indices, K)
    rss_matrix = np.zeros((K, p))

    for k in range(K):
        test_idx = folds[k]
        train_idx = np.setdiff1d(np.arange(N), test_idx)

        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        B_matrix = sparse_estimator(X_tr, y_tr)

        for j in range(p):
            beta_j = B_matrix[:, j]
            y_pred = X_te @ beta_j
            rss_matrix[k, j] = np.sum((y_te - y_pred)**2)

    mean_rss = np.mean(rss_matrix, axis=0)
    best_size_idx = np.argmin(mean_rss)

    B_final = sparse_estimator(X, y)
    beta_cv = B_final[:, best_size_idx]

    return beta_cv, best_size_idx + 1, mean_rss

def run_analysis():
    X_tr, y_tr, X_te, y_te = load_and_process_data(DATA_STRING)
    col_names = [
        "lcavol", "lweight", "age", "lbph", "svi", "lcp", "gleason", "pgg45",
        "noise1", "noise2", "noise3", "noise4"
    ]

    beta_best, size_best, rss_best = crossval(X_tr, y_tr, bestsubset)
    y_pred_best = X_te @ beta_best
    test_mse_best = np.mean((y_te - y_pred_best)**2)

    print(f"Best Subset Selected Size: {size_best}")
    print(f"Test MSE: {test_mse_best:.4f}")
    print("Selected Variables:")
    for idx, coef in enumerate(beta_best):
        if abs(coef) > 1e-5:
            print(f"  {col_names[idx]}: {coef:.4f}")

    print("-" * 60)

    beta_greedy, size_greedy, rss_greedy = crossval(X_tr, y_tr, greedysubset)
    y_pred_greedy = X_te @ beta_greedy
    test_mse_greedy = np.mean((y_te - y_pred_greedy)**2)

    print(f"Greedy Subset Selected Size: {size_greedy}")
    print(f"Test MSE: {test_mse_greedy:.4f}")
    print("Selected Variables:")
    for idx, coef in enumerate(beta_greedy):
        if abs(coef) > 1e-5:
            print(f"  {col_names[idx]}: {coef:.4f}")

    print("-" * 60)

    beta_cv, size_cv, rss_cv = crossval(X_tr, y_tr, bestsubset)
    y_pred_cv = X_te @ beta_cv
    test_mse_cv = np.mean((y_te - y_pred_cv)**2)

    print(f"Cross-Validation Selected Size: {size_cv}")
    print(f"Test MSE: {test_mse_cv:.4f}")
    print("Selected Variables:")
    for idx, coef in enumerate(beta_cv):
        if abs(coef) > 1e-5:
            print(f"  {col_names[idx]}: {coef:.4f}")

    print("-" * 60)

run_analysis()

Best Subset Selected Size: 2
Test MSE: 0.9070
Selected Variables:
  lcavol: 0.5693
  lweight: 0.3566
------------------------------------------------------------
Greedy Subset Selected Size: 2
Test MSE: 0.9070
Selected Variables:
  lcavol: 0.5693
  lweight: 0.3566
------------------------------------------------------------
Cross-Validation Selected Size: 2
Test MSE: 0.9070
Selected Variables:
  lcavol: 0.5693
  lweight: 0.3566
------------------------------------------------------------


The variable lcavol (log cancer volume) is typically the strongest predictor in this dataset, often followed by lweight (log prostate weight). All algorithms should select these variables first, as they do in the output above. Since the noise variables are generated independently of the response, any correlation they show with it is spurious, and ideally none of them should be selected.