In [1]:
%matplotlib inline


# Generalized Linear Model


## Inverse Gaussian Regression
The inverse Gaussian distribution is a continuous probability distribution with probability density function:
$$
f(y|\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi y^3}}\exp\left(-\frac{\lambda(y-\mu)^2}{2\mu^2y}\right)
$$
where $y>0$, mean $\mu>0$, and dispersion parameter $\lambda>0$. In fact, the inverse Gaussian distribution can also use the variance $\sigma^{2}$ to represent the shape parameter, and the relationship between the two is $\lambda = 1/\sigma^{2}$. We then use $\sigma^{2}$ to reparameterize the probability density function of the inverse Gaussian distribution:
$$
f\left(y ; \mu, \sigma^2\right)=\frac{1}{\sqrt{2 \pi y^3 \sigma^2}} \exp \left\{-\frac{(y-\mu)^2}{2(\mu \sigma)^2 y}\right\}
$$

In inverse Gaussian regression, we assume that the response variable $yi$ follows an inverse Gaussian distribution with mean $\mu_i$ and variance parameter $\sigma^{2}$. In GLM, we model the mean $\mu_i$ as $\frac{1}{2 \mu_i^2}=x_i^T\beta$.

where $\beta$ is a vector of unknown regression coefficients, and $x_i$ is a vector of predictor variables for the $i$th observation.

With $n$ independent data of the explanatory variables $x$ and the response variable $y$, we can estimate $\beta$ by minimizing the negative log-likelihood function under sparsity constraint:
$$
\arg \min _{\beta \in R^p} L(\beta):=-\frac{1}{n} \sum_{i=1}^n \left\{\frac{y_i x_i^T\beta-\sqrt{2 x_i^T\beta}}{-\sigma^2}-\frac{1}{2 y_i \sigma^2}-\frac{1}{2} \ln \left(2 \pi y_i^3 \sigma^2\right)\right\}, \text { s.t. }\|\beta\|_0 \leq s .
$$

Here is Python code for solving sparse Inverse Gaussian Regression problem:

In [2]:
import jax.numpy as jnp
import numpy as np
from scope import ScopeSolver
import numpy as np



In [3]:
np.random.seed(4)

In [4]:
import numpy as np
from scipy.stats import invgauss

def simulate_inverse_gaussian_regression_data(n, p, s, lambda_):
    # Generate predictor variables
    X = np.random.normal(size=(n, p))
    
    beta = np.zeros(p)
    true_support_set = np.random.choice(p, s, replace=False)
    beta[true_support_set] = np.random.normal(0, 1, s)
    
    # Compute mean for each observation
    mu = np.exp(np.dot(X, beta))
    
    # Generate response variables
    y = invgauss.rvs(mu=mu, scale=lambda_/mu)
    
    return X, y, beta, true_support_set

n = 200
p = 10
s = 3
lambda_ = 1
X, y, true_params, true_support_set = simulate_inverse_gaussian_regression_data(n, p, s, lambda_)

In [5]:
#def inverse_gaussian_regression_loss(params):
 #   xbeta = jnp.clip(X @ params, -30, 30)
  #  return jnp.mean(jnp.sqrt(2*xbeta) - y * xbeta)

def inverse_gaussian_regression_loss(params):
    xbeta = jnp.clip(X @ params, -30, 30)
    return jnp.mean(y/(2*jnp.exp(2*xbeta))-jnp.exp(-xbeta))

solver = ScopeSolver(p, s)
solver.solve(inverse_gaussian_regression_loss, jit=True)

print("True support set: ", np.sort(true_support_set))
print("Estimated support set: ", np.sort(solver.support_set))
print("True parameters: ", true_params)
print("True loss value: ", inverse_gaussian_regression_loss(true_params))
print("Estimated parameters: ", solver.params)
print("Estimated loss value: ", inverse_gaussian_regression_loss(solver.params))

True support set:  [2 4 9]
Estimated support set:  [1 2 6]
True parameters:  [ 0.          0.         -0.23149863  0.          0.25948346  0.
  0.          0.          0.          1.36526277]
True loss value:  8.164849
Estimated parameters:  [0.         0.17128891 0.10860098 0.         0.         0.
 0.14299648 0.         0.         0.        ]
Estimated loss value:  -0.453147
