### Numerically Stable Binary Cross-Entropy for Logistic Regression


In [1]:
import numpy as np

In [2]:
def logistic_function(z):
    return 1/(1 + np.exp(-z))

#### Machine limits for floating point types

In [3]:
np.finfo(np.double)

finfo(resolution=1e-15, min=-1.7976931348623157e+308, max=1.7976931348623157e+308, dtype=float64)

We are interested in the smallest positive number which can be represented as a `float64`:

In [4]:
smallest_positive_number = np.nextafter(0, 1)
smallest_positive_number

5e-324

In [5]:
np.log(smallest_positive_number)

-744.4400719213812

In [6]:
np.exp(-745)

5e-324

In [7]:
np.exp(-746)

0.0

Numerically, $\exp(-746)$ evaluates to $0$, and for every argument smaller than $-746$ the exponential
is zero as well.
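The opposite limit matters as well, because `logistic_function` evaluates $\exp(-z)$, which grows without bound for large negative $z$. The following sketch (variable names are illustrative and not part of the notebook) locates the overflow threshold of `float64` next to the underflow threshold found above:

```python
import numpy as np

# Largest finite float64 and the argument at which exp() starts to overflow.
max_float = np.finfo(np.float64).max
overflow_threshold = np.log(max_float)  # roughly 709.78

# Suppress the overflow warning for this demonstration only.
with np.errstate(over="ignore"):
    just_below = np.exp(709.0)  # still finite
    just_above = np.exp(710.0)  # overflows to inf

underflow_edge = np.exp(-745.0)  # smallest positive (subnormal) float64
underflowed = np.exp(-746.0)     # underflows to 0.0

print(overflow_threshold, just_below, just_above, underflow_edge, underflowed)
```

So a naive implementation has two failure modes: $\exp$ underflows to $0$ for arguments below about $-745$ and overflows to `inf` for arguments above about $709.78$.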

In logistic regression the prediction model has the parameters $\theta$. 
The dot product between $\theta$ and the input $x$ computes the logit $z$.
$$
z = \vec x^T \cdot \vec \theta
$$

In [8]:
theta = np.array([-1.])

In [9]:
x = np.array([.1])

In [10]:
z = np.dot(theta, x)
z

-0.1

With the logit $z$ the prediction is computed with the logistic function:
$$p(y \mid x;\theta)= \sigma(z) = \frac{1}{1+\exp(-z)}$$ 

Problem:

1. If $z$ is a large positive number, $\sigma(z)$ saturates at $1$, i.e. numerically $\sigma(z) = 1$.
2. If $z$ is a large negative number, $\sigma(z)$ saturates at $0$, i.e. numerically $\sigma(z) = 0$.

For the cross-entropy calculation this means:

- If in case 1 the true label is $0$, the non-zero term in the cross-entropy is 
$-\log(1-\sigma(z)) \approx -\log(1-1) = - \log(0) = \infty$
- If in case 2 the true label is $1$, the non-zero term in the cross-entropy is 
$-\log(\sigma(z)) \approx -\log(0) = \infty$

NumPy returns `-inf` (and emits a `RuntimeWarning`) if we calculate the log of 0:

In [11]:
np.log(0)

-inf
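Both problem cases can be reproduced directly. The sketch below uses a hypothetical helper `naive_cross_entropy` (not part of the notebook) that evaluates only the non-zero cross-entropy term from the saturated probability:

```python
import numpy as np

def logistic_function(z):
    return 1 / (1 + np.exp(-z))

def naive_cross_entropy(z, y):
    # Hypothetical helper (not part of the notebook): evaluates only the
    # non-zero cross-entropy term directly from the saturated probability.
    p = logistic_function(z)
    return -np.log(p) if y == 1 else -np.log(1 - p)

# Silence the overflow / divide-by-zero warnings for this demonstration.
with np.errstate(over="ignore", divide="ignore"):
    p_large = logistic_function(40.0)              # rounds to exactly 1.0
    loss_case_1 = naive_cross_entropy(40.0, 0)     # inf, exact value is about 40
    loss_case_2 = naive_cross_entropy(-1000.0, 1)  # inf, exact value is about 1000

print(p_large, loss_case_1, loss_case_2)
```

Note that saturation already occurs at a moderate logit such as $z = 40$, long before $\exp$ itself underflows, because $1 + \exp(-40)$ rounds to $1$ in `float64`.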

However, $-\log(\sigma(z))$ can be transformed algebraically
into a numerically stable version of log-sigma:

From
$$
 - \log\left(\sigma(z)\right) = - \log\left(\frac{1}{1+\exp(-z)}\right)
 = \log\left(1+\exp(-z)\right) = \log\left(\exp(0) + \exp(-z)\right)
$$

with $\mu = \max(0, -z)$ we get:

$$
- \log\left(\sigma(z)\right) = \log\left(\exp(0) + \exp(-z)\right) = \\
\log\left(\frac{\exp(\mu)}{\exp(\mu)}\left(\exp(0) + \exp(-z)\right)\right) =\\
\mu + \log\left(\exp(0-\mu) + \exp(-z-\mu)\right)
$$
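Why this form is stable can be made explicit. The short argument below only uses the definition $\mu = \max(0, -z)$:

$$
0 - \mu \le 0 \quad \text{and} \quad -z - \mu \le 0, \quad \text{with one of the two equal to } 0
$$

$$
\Rightarrow \quad 1 \le \exp(0-\mu) + \exp(-z-\mu) \le 2
\quad \Rightarrow \quad 0 \le \log\left(\exp(0-\mu) + \exp(-z-\mu)\right) \le \log(2)
$$

So neither exponential can overflow, and the logarithm never receives $0$ as its argument, even if one of the two exponentials underflows.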

For example, if $z=-20$ then $\mu = -z = 20$

$\Rightarrow$

$- \log\left(\sigma(z)\right) = 20 + \log(\exp(-20) + \exp(20-20))$

In [12]:
# stable -log(sigma(z)) written in terms of x = -z, assuming x >= 0 (so mu = x)
nl = lambda x: x + np.log(np.exp(-x) + np.exp(0))
nl(20)

20.000000002061153

In [13]:
-np.log(logistic_function(-20))

20.000000002061153

In [14]:
# numerically stable even for large -z = 1000
nl(1000)

1000.0

In [15]:
# unstable version
-np.log(logistic_function(-1000))

inf
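The `nl` lambda above hard-codes $\mu = -z$, so it only covers negative logits. A minimal sketch for logits of either sign follows, using the hypothetical name `neg_log_sigmoid` (not part of the notebook). NumPy's `np.logaddexp(a, b)` computes $\log(\exp(a) + \exp(b))$ and serves as a cross-check here:

```python
import numpy as np

def neg_log_sigmoid(z):
    """Stable -log(sigma(z)) for a scalar logit z of either sign."""
    mu = max(0.0, -z)
    # Both exponents are <= 0, so exp() cannot overflow here.
    return mu + np.log(np.exp(0.0 - mu) + np.exp(-z - mu))

logits = (-1000.0, -20.0, 0.0, 20.0, 1000.0)
ours = [neg_log_sigmoid(z) for z in logits]
reference = [np.logaddexp(0.0, -z) for z in logits]

for z, a, b in zip(logits, ours, reference):
    print(z, a, b)
```

Both columns should agree up to rounding, since $-\log(\sigma(z)) = \log(\exp(0) + \exp(-z))$ is exactly the quantity `np.logaddexp(0, -z)` evaluates.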

Analogously for the negative class, i.e. $-\log(1-\sigma(z))$:

with
$$
p(y=0) = 1 - \sigma(z) = \sigma(-z) 
$$

and with 
$\mu = \max(0, z)$

$$
- \log\left(1- \sigma(z)\right) = - \log\left(\sigma(-z)\right)= \log\left(\exp(0) + \exp(z)\right) = \\
\log\left(\frac{\exp(\mu)}{\exp(\mu)}\left(\exp(0) + \exp(z)\right)\right) =\\
\mu + \log\left(\exp(0-\mu) + \exp(z-\mu)\right)
$$

In [16]:
def cross_entropy(z, y):
    """
    Computes the cross-entropy for a single logit value and a given target class.
    
    Parameters
    ----------
    z : float64 or float32
        The logit
    y : int
        The target class (0 or 1)
    
    Returns
    -------
    float
        The cross-entropy value (negative log-likelihood)
    """
    # -log(sigma(z)), weighted by y (contributes only if y == 1)
    mu = max([0, -z])
    r1 = y * (mu + np.log(np.exp(-mu)+np.exp(-z-mu)))

    # -log(1 - sigma(z)), weighted by 1-y (contributes only if y == 0)
    mu = max([0, z])
    r2 = (1-y) * (mu + np.log(np.exp(-mu)+np.exp(z-mu)))
    return r1 + r2


In [17]:
z=np.array([-1000., 10000., 10., 78., -11.])
y=np.array([0, 1, 1, 0, 0])

In [18]:
x_ent=np.zeros_like(z)
for i, (z_, y_) in enumerate(zip(z,y)):
    x_ent[i] = cross_entropy(z_, y_)
x_ent

array([0.00000000e+00, 0.00000000e+00, 4.53988992e-05, 7.80000000e+01,
       1.67015613e-05])

#### Task

Implement the `cross_entropy` function for a (mini-)batch, i.e. 
`z` and `y` are 1d numpy arrays.
Implement it in a vectorized fashion, i.e. don't use Python loops etc.


In [19]:
def batched_cross_entropy(z, y):
    """
    Computes the cross-entropy for a batch of logit values and the given target classes.
    
    Parameters
    ----------
    z : ndarray with dtype float64 or float32
        The logits
    y : ndarray with dtype int
        The target classes
    
    Returns
    -------
    ndarray with floats
        The cross-entropy values (negative log-likelihood)
    """
    pass  # your task

In [27]:
# This test must pass
np.testing.assert_array_almost_equal(batched_cross_entropy(z, y), x_ent)
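For comparison after attempting the task, here is one possible vectorized sketch. It is not necessarily the intended solution, and the name `batched_cross_entropy_sketch` is hypothetical. It applies the same two stable expressions as `cross_entropy`, replacing the scalar `max` with the element-wise `np.maximum`:

```python
import numpy as np

def batched_cross_entropy_sketch(z, y):
    # Hypothetical name; one possible approach, not the notebook's own solution.
    z = np.asarray(z)
    y = np.asarray(y)

    # -log(sigma(z)) for every element, with mu = max(0, -z) element-wise
    mu_pos = np.maximum(0.0, -z)
    nll_pos = mu_pos + np.log(np.exp(-mu_pos) + np.exp(-z - mu_pos))

    # -log(1 - sigma(z)) for every element, with mu = max(0, z) element-wise
    mu_neg = np.maximum(0.0, z)
    nll_neg = mu_neg + np.log(np.exp(-mu_neg) + np.exp(z - mu_neg))

    # Both terms are finite, so multiplying by y or (1 - y) never yields nan.
    return y * nll_pos + (1 - y) * nll_neg

z = np.array([-1000., 10000., 10., 78., -11.])
y = np.array([0, 1, 1, 0, 0])
result = batched_cross_entropy_sketch(z, y)
print(result)
```

Because both stable terms stay finite for every logit, selecting between them by multiplication with `y` and `1 - y` is safe. With the naive formula, a product such as `0 * inf` would produce `nan`.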

