# Log-Loss

In this notebook, we will discuss an approximate closed-form solution for logistic regression when the observed labels/targets lie in the open interval $(0,1)$. For a dataset $D_n = \{(X_i,Y_i)\}_{i=1}^n$ where $X_i\in \mathbb{R}^d$ and $Y_i \in (0,1)$ for all $i \in [n]$, the log-loss with sigmoid link function is defined as

$$
L(\theta) = \sum_{i=1}^n Y_i \log \frac{1}{\sigma(X_i^\top \theta)} + (1-Y_i)\log\frac{1}{1-\sigma(X_i^\top \theta)}
$$
where $\theta \in \mathbb{R}^d$ and $\sigma(x) = 1 / (1 + \exp(-x))$ is the sigmoid activation function.
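For reference, the loss can also be evaluated without forming $\sigma$ explicitly, using $\log\frac{1}{\sigma(a)} = \log(1+e^{-a})$ and $\log\frac{1}{1-\sigma(a)} = \log(1+e^{a})$ with $a = X_i^\top\theta$. The minimal sketch below assumes only NumPy; the function names `log_loss` and `log_loss_naive` and the random data are illustrative, not part of the notebook. It evaluates both terms with `np.logaddexp(0, ·)`, which does not overflow for large $|a|$, and checks the result against a direct transcription of the formula.

```python
import numpy as np

def log_loss(theta, X, Y):
    """Log-loss L(theta) for soft labels Y in (0, 1) (illustrative helper).

    Uses log(1/sigma(a)) = log(1 + exp(-a)) and
    log(1/(1 - sigma(a))) = log(1 + exp(a)), evaluated with
    np.logaddexp(0, .) so that large |a| does not overflow.
    """
    a = X @ theta
    return np.sum(Y * np.logaddexp(0.0, -a) + (1.0 - Y) * np.logaddexp(0.0, a))

def log_loss_naive(theta, X, Y):
    # Direct transcription of the formula; fine for moderate |X_i^T theta|.
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return np.sum(Y * np.log(1.0 / p) + (1.0 - Y) * np.log(1.0 / (1.0 - p)))

# Synthetic data, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.uniform(0.05, 0.95, size=50)
theta = rng.normal(size=3)

print(np.isclose(log_loss(theta, X, Y), log_loss_naive(theta, X, Y)))  # True
```

This is an alternative to the clipping of the logits at $\pm 36$ used in the notebook's `sigmoid`; both keep the loss finite.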

Using $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$, the gradient of $L(\theta)$ with respect to $\theta$ is
$$
\nabla_\theta L(\theta) = \sum_{i=1}^n (\sigma(X_i^\top \theta) - Y_i)X_i\,.
$$
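The gradient formula can be checked numerically. The sketch below compares $\sum_i (\sigma(X_i^\top\theta) - Y_i)X_i$ against central finite differences of $L$ on synthetic data; the helper names (`log_loss`, `log_loss_grad`, `central_difference_grad`), the step size `eps=1e-6`, and the tolerances are our own illustrative choices, not part of the notebook.

```python
import numpy as np

def log_loss(theta, X, Y):
    # L(theta) written with logaddexp: log(1/sigma(a)) = log(1 + e^{-a}), etc.
    a = X @ theta
    return np.sum(Y * np.logaddexp(0.0, -a) + (1.0 - Y) * np.logaddexp(0.0, a))

def log_loss_grad(theta, X, Y):
    # Analytic gradient: sum_i (sigma(X_i^T theta) - Y_i) X_i = X^T (p - Y)
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - Y)

def central_difference_grad(f, theta, eps=1e-6):
    # Numerical gradient by central differences, one coordinate at a time.
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return g

# Synthetic data, chosen only for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
Y = rng.uniform(0.05, 0.95, size=200)
theta = rng.normal(size=4)

analytic = log_loss_grad(theta, X, Y)
numeric = central_difference_grad(lambda t: log_loss(t, X, Y), theta)
print(np.allclose(analytic, numeric, rtol=1e-5, atol=1e-5))  # True
```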
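In order to minimize $L(\theta)$, it suffices to find a $\bar\theta$ such that $\nabla_\theta L(\bar\theta) = 0$, since $L$ is convex in $\theta$. So long as $Y_i \in (0,1)$, each target has a finite logit $\log\frac{Y_i}{1-Y_i}$, which is the inverse of $\sigma$ evaluated at $Y_i$, and for any $i \in [n]$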

$$
\sigma(X_i^\top \theta) - Y_i = 0 \iff \sigma(X_i^\top \theta) = Y_i \iff X_i^\top \theta = \log \frac{Y_i}{1-Y_i} \iff X_i^\top \theta - \log \frac{Y_i}{1-Y_i} = 0\,.
$$
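A quick numerical sanity check of this chain: the logit $\log\frac{y}{1-y}$ inverts $\sigma$ on $(0,1)$, and because $\sigma$ is strictly increasing, $\sigma(a) - y$ and $a - \log\frac{y}{1-y}$ always share a sign, so they vanish together. The helpers `sigmoid` and `logit` and the random values below are illustrative only.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(y):
    # Inverse of the sigmoid on (0, 1): log(y / (1 - y))
    return np.log(y / (1.0 - y))

# Illustrative values: targets strictly inside (0, 1) and arbitrary logits.
rng = np.random.default_rng(2)
y = rng.uniform(0.01, 0.99, size=1000)
a = rng.normal(scale=3.0, size=1000)    # stands in for X_i^T theta

# sigma(logit(y)) recovers y, so sigma(a) = y exactly when a = logit(y).
print(np.allclose(sigmoid(logit(y)), y))  # True

# The two residuals always have the same sign, hence the same zeros.
print(np.all(np.sign(sigmoid(a) - y) == np.sign(a - logit(y))))  # True
```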
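Hence each residual $\sigma(X_i^\top \theta) - Y_i$ vanishes exactly when $X_i^\top \theta - \log \frac{Y_i}{1-Y_i}$ does. Substituting the latter for the former in the stationarity condition gives
$$
0 = \nabla_\theta L(\theta) = \sum_{i=1}^n \left(\sigma(X_i^\top \theta) - Y_i\right)X_i \approx \sum_{i=1}^n \left( X_i^\top \theta - \underbrace{\log \frac{Y_i}{1-Y_i}}_{Z_i}\right)X_i = \sum_{i=1}^n \left( X_i^\top \theta - Z_i\right)X_i\,.
$$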
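Setting the right-hand side to zero gives the normal equations of ordinary least squares on the logit-transformed targets $Z_i$. Writing $X \in \mathbb{R}^{n\times d}$ for the matrix whose $i$-th row is $X_i^\top$ and $Z = (Z_1,\dots,Z_n)^\top$, and assuming $X^\top X$ is invertible, this yields $\bar \theta \approx (X^\top X)^{-1} X^\top Z$. We write $\approx$ because the substitution preserves where each residual vanishes but not its size: by the mean value theorem, $\sigma(X_i^\top\theta) - Y_i = w_i\,(X_i^\top\theta - Z_i)$ with $w_i = \sigma'(\xi_i) = \sigma(\xi_i)(1-\sigma(\xi_i)) \in (0, 1/4]$ for some $\xi_i$ between $X_i^\top\theta$ and $Z_i$. These weights depend on the individual $X_i$'s (and on $\theta$), and the closed form drops them. The approximation is exact when some $\theta$ satisfies $X_i^\top\theta = Z_i$ for every $i$.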
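To make the $\approx$ concrete, the sketch below uses synthetic data of our own (sizes, seed, and noise level are arbitrary assumptions) and an illustrative helper `closed_form`. With noiseless targets $Y_i = \sigma(X_i^\top\theta)$, the closed form recovers $\theta$ and the true gradient of $L$ vanishes there. With perturbed targets, the per-sample ratio $w_i = (\sigma(X_i^\top\bar\theta) - Y_i)/(X_i^\top\bar\theta - Z_i)$, which equals $\sigma'(\xi_i)$ for some $\xi_i$ between $X_i^\top\bar\theta$ and $Z_i$ by the mean value theorem, varies from sample to sample while staying in $(0, 1/4]$; these are the weights the unweighted closed form leaves out.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logit(y):
    return np.log(y / (1.0 - y))

def closed_form(X, Y):
    # theta_bar = (X^T X)^{-1} X^T Z with Z_i = log(Y_i / (1 - Y_i)),
    # computed by solving the normal equations rather than inverting X^T X.
    return np.linalg.solve(X.T @ X, X.T @ logit(Y))

# Synthetic data (illustrative assumption, not the notebook's setup).
rng = np.random.default_rng(3)
n, d = 500, 5
X = rng.normal(size=(n, d))
theta_true = rng.uniform(0.0, 1.0, size=d)

# Realizable case: Y_i = sigma(X_i^T theta_true) exactly, so Z = X theta_true.
Y_exact = sigmoid(X @ theta_true)
theta_hat = closed_form(X, Y_exact)
grad = X.T @ (sigmoid(X @ theta_hat) - Y_exact)
print(np.allclose(theta_hat, theta_true), np.linalg.norm(grad) < 1e-8)  # True True

# Noisy case: perturb the targets, keep them inside (0, 1), and compute the
# implied mean-value weights at the closed-form estimate.
Y_noisy = np.clip(Y_exact + rng.normal(scale=0.02, size=n), 0.01, 0.99)
a = X @ closed_form(X, Y_noisy)
w = (sigmoid(a) - Y_noisy) / (a - logit(Y_noisy))
print(w.min(), w.max())  # by the mean value theorem, all weights lie in (0, 1/4]
```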

Of course, one can verify empirically whether this holds.





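```python
import numpy as np
import scipy as sc
import scipy.optimize  # ensure the sc.optimize submodule is loaded

def sigmoid(x):
    # Clip the logits so np.exp does not overflow; np.clip returns a copy,
    # so the caller's array is not modified in place.
    x = np.clip(x, -36, 36)
    return 1 / (1 + np.exp(-x))

d = 10              # feature dimension
n = 100000          # number of samples
trials = 1500000    # binomial trials per sample; Y_i is an empirical frequency
X = np.random.normal(size=(n, d))
Y = np.zeros(n)
# Redraw theta_star until no frequency is exactly 0 or 1, so log(Y / (1 - Y)) is finite.
while 0 in Y or 1 in Y:
    theta_star = np.random.uniform(low=0, high=1, size=d)
    Y = np.random.binomial(trials, sigmoid(X @ theta_star)) / trials
```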


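```python
class Sigmoid_Regression(object):
    """Fit theta by minimizing the log-loss L(theta) with scipy.optimize.minimize."""

    def __init__(self, features, obs, d, theta_init, minimizer):
        self.features = features      # design matrix X, shape (n, d)
        self.obs = obs                # soft labels Y in (0, 1), shape (n,)
        self.n = len(obs)
        self.d = d
        self.theta_init = theta_init  # starting point for the optimizer
        self.minimizer = minimizer    # method name passed to minimize, e.g. 'bfgs'

    def sigmoid(self, x):
        # Clip to avoid overflow in np.exp without mutating the input array.
        x = np.clip(x, -36, 36)
        return 1 / (1 + np.exp(-x))

    def objective_function(self, theta, X, Y):
        # L(theta) = -sum_i [Y_i log p_i + (1 - Y_i) log(1 - p_i)]
        p = self.sigmoid(X @ theta)
        return -1.0 * np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

    def objective_grad(self, theta, X, Y):
        # grad L(theta) = sum_i (sigma(X_i^T theta) - Y_i) X_i = X^T (p - Y)
        p = self.sigmoid(X @ theta)
        return X.T @ (p - Y)

    def logistic_regression(self):
        self.res = sc.optimize.minimize(
            self.objective_function,
            x0=self.theta_init,
            args=(self.features, self.obs),
            jac=self.objective_grad,
            method=self.minimizer,
            options={"gtol": 1e-8},
        )
        return self.res.x
```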

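```python
# Fit by BFGS starting from theta = 0, then compute the closed-form estimate
# (X^T X)^{-1} X^T Z with Z = log(Y / (1 - Y)) by solving the normal equations.
clf = Sigmoid_Regression(X, Y, d, np.zeros(d), 'bfgs')
theta1 = clf.logistic_regression()

theta_closed_form = np.linalg.solve(X.T @ X, X.T @ np.log(Y / (1 - Y)))
```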

```python
np.linalg.norm(theta_star - theta_closed_form)
```

```
3.0509288544447957e-05
```

```python
np.linalg.norm(theta_star - theta1)
```

```
2.4848336634736298e-05
```

```python
theta1
```

```
array([0.35809734, 0.79947218, 0.92197344, 0.07150805, 0.204389  ,
       0.63713671, 0.06229981, 0.72916141, 0.88325694, 0.30439809])
```

```python
theta_closed_form
```

```
array([0.35808919, 0.79948057, 0.92197332, 0.07151759, 0.20439192,
       0.63713304, 0.06229445, 0.72915478, 0.88326779, 0.30440302])
```

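Note, however, that the two solutions differ noticeably when we evaluate the gradient of $L$ at each of them. BFGS drives the gradient norm close to zero, whereas at the closed-form estimate (and at `theta_star` itself, since the frequencies $Y_i$ carry binomial sampling noise) it remains several orders of magnitude larger. The closed-form estimate is close to the minimizer in parameter space but is not a stationary point of $L$, so use it with caution.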

```python
np.linalg.norm(clf.objective_grad(theta1, X, Y))
```

```
6.7045541285553925e-06
```

```python
np.linalg.norm(clf.objective_grad(theta_closed_form, X, Y))
```

```
0.33248616612509513
```

```python
np.linalg.norm(clf.objective_grad(theta_star, X, Y))
```

```
0.3624837946972413
```
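The gap in gradient norms has a simple source: the closed form solves the least-squares normal equations $\sum_i (X_i^\top\bar\theta - Z_i)X_i = 0$ exactly, whereas stationarity of $L$ requires $\sum_i (\sigma(X_i^\top\bar\theta) - Y_i)X_i = 0$, and the two coincide only when the per-sample residuals vanish. The sketch below illustrates this on a smaller synthetic problem; the sizes, seed, number of trials, and the clipping of frequencies away from 0 and 1 are our own assumptions, not the notebook's settings.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Smaller synthetic problem (illustrative assumption, not the notebook's sizes).
rng = np.random.default_rng(4)
n, d, trials = 2000, 4, 1000
X = rng.normal(size=(n, d))
theta_true = rng.uniform(0.0, 1.0, size=d)

# Empirical frequencies, as in the notebook; clipping keeps them inside (0, 1)
# instead of redrawing theta as the notebook does.
Y = rng.binomial(trials, sigmoid(X @ theta_true)) / trials
Y = np.clip(Y, 0.5 / trials, 1.0 - 0.5 / trials)

Z = np.log(Y / (1.0 - Y))
theta_cf = np.linalg.solve(X.T @ X, X.T @ Z)

ols_residual = X.T @ (X @ theta_cf - Z)             # normal equations: ~0 by construction
logistic_grad = X.T @ (sigmoid(X @ theta_cf) - Y)   # gradient of L at theta_cf: not ~0
print(np.linalg.norm(ols_residual), np.linalg.norm(logistic_grad))
```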