Multiple linear regression, in the multi-response sense used here (also called multi-output, multivariate, or multidimensional linear regression), predicts several dependent variables from a linear combination of several independent variables. This differs from simple linear regression, which predicts a single dependent variable from a single independent variable.

The model is expressed as:
$$
    y = B^* x + \epsilon,
$$
where $y$ is an $m$-dimensional response vector, $x$ is a $p$-dimensional vector of predictors, $B^* \in \mathbb{R}^{m \times p}$ is the sparse coefficient matrix, and $\epsilon$ is an $m$-dimensional random noise vector with zero mean.
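To make the shapes concrete, here is a minimal NumPy sketch that simulates data from this model. The dimensions, the noise scale, and the choice of active predictors are illustrative assumptions, not values from the original example; note that each column of $B^*$ holds the coefficients of one predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, n = 2, 5, 200  # responses, predictors, observations (illustrative values)

# Hypothetical sparse coefficient matrix B* of shape (m, p):
# only predictors 0 and 3 influence the responses.
B_star = np.zeros((m, p))
B_star[:, [0, 3]] = rng.normal(size=(m, 2))

X = rng.normal(size=(p, n))            # one column per observation
noise = 0.1 * rng.normal(size=(m, n))  # zero-mean noise
Y = B_star @ X + noise                 # y = B* x + eps, applied column by column

print(Y.shape)  # (2, 200)
```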

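Given $n$ independent observations, collected as the columns of the predictor matrix $X \in \mathbb{R}^{p \times n}$ and the response matrix $Y \in \mathbb{R}^{m \times n}$, we can estimate $B^*$ by minimizing the loss function under a sparsity constraint:
$$ \arg\min_{B} L(B) := ||Y - B X||_F^2, \quad \text{s.t.} \quad || B ||_ {0,2} \leq s, $$
where $|| B ||_ {0, 2}$ is the number of non-zero columns of $B$, that is, the number of predictors that affect at least one response.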
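As a rough illustration of the two quantities in this problem, the sketch below evaluates the squared loss and counts the active predictors with NumPy. It assumes $B$ has shape $m \times p$ as in the model, so the coefficients of one predictor form one column; the helper names `squared_loss` and `count_active_predictors` are hypothetical and are not part of `scope`.

```python
import numpy as np

def squared_loss(B, X, Y):
    """Squared Frobenius norm of Y - B X, with X of shape (p, n) and Y of shape (m, n)."""
    return float(np.sum((Y - B @ X) ** 2))

def count_active_predictors(B, tol=0.0):
    """Number of predictors (columns of the m x p matrix B) with a non-zero coefficient."""
    return int(np.sum(np.linalg.norm(B, axis=0) > tol))

# Toy example: 2 responses, 3 predictors, predictor 1 is inactive.
B = np.array([[1.0, 0.0, -2.0],
              [0.5, 0.0,  0.0]])
X = np.eye(3)
Y = B @ X

print(squared_loss(B, X, Y))       # 0.0
print(count_active_predictors(B))  # 2
```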

The following Python code solves the sparse multiple linear regression problem:


In [None]:
from scope import ConvexSparseSolver, make_multivariate_glm_data
import jax.numpy as jnp

n, p, k, m = 10, 5, 3, 2
data = make_multivariate_glm_data(n=n, p=p, k=k, M=m, family="multigaussian", standardize=True)
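def multi_linear_loss(para):
    # Reshape the flat parameter vector into a (p, m) coefficient matrix
    # (the transpose of B); reshape returns a new array, so assign the result.
    para = para.reshape((p, m))
    return jnp.sum(jnp.square(data.y - jnp.matmul(data.x, para)))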

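# Give the m coefficients of each predictor the same group label
# so that they enter or leave the model together.
solver = ConvexSparseSolver(p * m, k, group=[i for i in range(p) for j in range(m)])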
solver.solve(multi_linear_loss)

print("Estimated parameter: ", solver.get_parameters(), "True parameter: ", data.coef_)
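The `group` list passed to the solver may be easier to read with a small check. The sketch below uses NumPy only (no `scope` calls) and assumes the default row-major reshape, which NumPy and `jax.numpy` share: it confirms that every entry of the flat parameter vector carrying label `g` falls in row `g` of the reshaped `(p, m)` matrix.

```python
import numpy as np

p, m = 5, 2
group = [i for i in range(p) for j in range(m)]
print(group)  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

# Flat positions 0..p*m-1 arranged exactly as para.reshape((p, m)) arranges them.
positions = np.arange(p * m).reshape((p, m))

# Each flat position must sit in the row named by its group label.
for flat_index, g in enumerate(group):
    assert flat_index in positions[g]
```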