Multiple linear regression, as used here, refers to the multi-output setting, also called multivariate or multidimensional linear regression: it predicts several dependent variables from a linear combination of several independent variables. This differs from simple linear regression, which predicts a single dependent variable from a single independent variable.

The model is expressed as:
$$
    y = B^* x + \epsilon,
$$
where $y$ is an $m$-dimensional response vector, $x$ is a $p$-dimensional vector of predictors, $B^* \in \mathbb{R}^{m \times p}$ is the sparse coefficient matrix, and $\epsilon$ is an $m$-dimensional random noise vector with zero mean.
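
To make the dimensions concrete, the following minimal sketch draws one observation from the model with NumPy. The sizes, the random seed, the Gaussian noise, and the choice that only the first `k` predictors have non-zero coefficients are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
m, p, k = 2, 5, 3               # responses, predictors, active predictors (assumed)

# Coefficient matrix B* of shape (m, p); only the first k predictors are active
B_star = np.zeros((m, p))
B_star[:, :k] = rng.normal(size=(m, k))

x = rng.normal(size=p)          # one p-dimensional predictor vector
eps = 0.1 * rng.normal(size=m)  # zero-mean noise (Gaussian is an assumption)
y = B_star @ x + eps            # m-dimensional response

print(y.shape)  # (2,)
```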

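Given $n$ independent observations, collected column-wise into the predictor matrix $X \in \mathbb{R}^{p \times n}$ and the response matrix $Y \in \mathbb{R}^{m \times n}$, we can estimate $B^* $ by minimizing the squared loss under a sparsity constraint:
$$
    \arg\min_{B} L(B) := \| Y - B X \|_ F^2, \quad \text{s.t.} \quad \| B \|_ {0,2} \leq s,
$$
where $\| \cdot \|_ F$ is the Frobenius norm, $s$ is the sparsity level, and $\| B \|_ {0, 2}$ is the number of non-zero columns of $B$, i.e., the number of predictors that have a non-zero coefficient for at least one response.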
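
The two quantities in this problem can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the helper names `count_active_predictors` and `squared_loss` are hypothetical, samples are stored as rows (the scikit-learn convention), and the coefficients are stored as a `(p, m)` array with one row per predictor, which matches the `reshape((p, m))` layout used in the example code of this section.

```python
import numpy as np

def count_active_predictors(coef, tol=0.0):
    """Number of predictors whose coefficient vector (one row of coef) is non-zero."""
    row_norms = np.linalg.norm(coef, axis=1)  # Euclidean norm of each row
    return int(np.sum(row_norms > tol))

def squared_loss(coef, X, Y):
    """Sum of squared residuals, with X of shape (n, p) and Y of shape (n, m)."""
    residual = Y - X @ coef
    return float(np.sum(residual ** 2))

# Toy data (assumed values): 4 predictors, 2 responses, 2 active predictors
coef = np.array([[1.5, -2.0],
                 [0.0,  0.0],
                 [0.0,  0.0],
                 [0.5,  3.0]])
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = X @ coef  # noiseless responses, so the loss at coef is exactly zero

print(count_active_predictors(coef))  # 2
print(squared_loss(coef, X, Y))       # 0.0
```

A candidate coefficient matrix is feasible for the constraint when `count_active_predictors(coef) <= s`.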

Here is Python code that solves the sparse multiple linear regression problem:


In [2]:
from scope import ConvexSparseSolver
import jax.numpy as jnp
from sklearn.datasets import make_regression

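# n samples, p predictors, k informative predictors, m response variables
n, p, k, m = 10, 5, 3, 2
x, y, coef = make_regression(n_samples=n, n_features=p, n_informative=k, n_targets=m, coef=True)

def multi_linear_loss(para):
    # para is a flat vector of length p * m; reshaped to (p, m), row i holds
    # the coefficients of predictor i for all m responses
    return jnp.sum(jnp.square(y - jnp.matmul(x, para.reshape((p, m)))))

# group puts the m coefficients of each predictor into one group, so that
# whole predictors are selected or dropped; k is the number of groups to keep
solver = ConvexSparseSolver(p * m, k, group=[i for i in range(p) for _ in range(m)])
solver.solve(multi_linear_loss)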

print("Estimated parameter:\n", solver.get_parameters().reshape((p, m)))
print("True parameter:\n", coef)

Estimated parameter:
 [[35.71403298 88.78090329]
 [ 0.          0.        ]
 [ 0.          0.        ]
 [70.74618577  3.66011248]
 [62.16419429  7.0035477 ]]
True parameter:
 [[35.71403294 88.78090529]
 [ 0.          0.        ]
 [ 0.          0.        ]
 [70.74619082  3.66011168]
 [62.16419561  7.00354776]]
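
The printed matrices can be compared programmatically. The sketch below copies the values from the output above and checks that the estimate selects the same predictors as the true parameter and agrees with it numerically; the helper name `active_rows` and the tolerance `1e-5` are illustrative choices.

```python
import numpy as np

# Values copied from the printed output above
estimated = np.array([[35.71403298, 88.78090329],
                      [0.0, 0.0],
                      [0.0, 0.0],
                      [70.74618577, 3.66011248],
                      [62.16419429, 7.0035477]])
true_coef = np.array([[35.71403294, 88.78090529],
                      [0.0, 0.0],
                      [0.0, 0.0],
                      [70.74619082, 3.66011168],
                      [62.16419561, 7.00354776]])

def active_rows(coef):
    """Indices of rows (predictors) with at least one non-zero coefficient."""
    return np.flatnonzero(np.any(coef != 0, axis=1)).tolist()

print(active_rows(estimated))  # [0, 3, 4]
print(active_rows(true_coef))  # [0, 3, 4]

# Largest absolute difference between estimated and true coefficients
print(np.max(np.abs(estimated - true_coef)) < 1e-5)  # True
```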
