# Mean vs log mean

The purpose of this notebook is to investigate how the mean negative log likelihood (NLL) relates to the mean likelihood. In my specific case the likelihood is a Gaussian probability density, which we aim to maximise by minimising its negative logarithm. In testing the TCN I have noticed that when the model overfits the training data, the probability density on the test data remains fairly constant whilst the loss function, which is based on the negative logarithm of the probability density, increases dramatically. Here I will attempt to explain this outcome, which may at first seem counter-intuitive.

Firstly, it should be noted that this phenomenon could occur regardless of the form of the likelihood function. Therefore, although my particular likelihood function is a Gaussian probability density, we will denote the more general likelihood simply as $z$. The negative log likelihood is then defined as $-\ln(z)$. The two values we wish to compare are $\Sigma_n z$ and $\Sigma_n (-\ln(z))$ (equivalently their means, which differ only by the factor $1/n$), where $n$ is the number of samples in the test dataset.

In particular, we want to identify the circumstances under which $\Sigma_n (-\ln(z))$ can vary whilst $\Sigma_n z$ remains constant. I suspect that this is the case when extreme values of $z$ are encountered. We will therefore start with a uniform set of $z$, in which all elements are identical, and then vary the elements under the strict condition that $\Sigma_n z$ remains unchanged.

## Uniform $z$

In [14]:
import numpy as np

In [22]:
# Let us start with this set for z
z_0 = 0.4*np.ones(100)
z_0

array([0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
       0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4])

In [26]:
# NLL
NLL = -np.log(z_0)
NLL

array([0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629073,
       0.91629073, 0.91629073, 0.91629073, 0.91629073, 0.91629

In [31]:
# average z and average NLL
print(np.mean(z_0))
print(np.mean(NLL))

0.3999999999999999
0.9162907318741549


In [56]:
# Here we have a fairly low value for the average NLL
# furthermore in this case the order of operations is irrelevant (up to floating-point rounding)
print(-np.log(np.mean(z_0)))
print(np.mean(-np.log(z_0)))

0.9162907318741553
0.9162907318741549


## Varying $z$

In [38]:
# Let us now vary z whilst keeping the average value 0.4
z_1 = np.linspace(0.1,0.7,100)
z_1

array([0.1       , 0.10606061, 0.11212121, 0.11818182, 0.12424242,
       0.13030303, 0.13636364, 0.14242424, 0.14848485, 0.15454545,
       0.16060606, 0.16666667, 0.17272727, 0.17878788, 0.18484848,
       0.19090909, 0.1969697 , 0.2030303 , 0.20909091, 0.21515152,
       0.22121212, 0.22727273, 0.23333333, 0.23939394, 0.24545455,
       0.25151515, 0.25757576, 0.26363636, 0.26969697, 0.27575758,
       0.28181818, 0.28787879, 0.29393939, 0.3       , 0.30606061,
       0.31212121, 0.31818182, 0.32424242, 0.33030303, 0.33636364,
       0.34242424, 0.34848485, 0.35454545, 0.36060606, 0.36666667,
       0.37272727, 0.37878788, 0.38484848, 0.39090909, 0.3969697 ,
       0.4030303 , 0.40909091, 0.41515152, 0.42121212, 0.42727273,
       0.43333333, 0.43939394, 0.44545455, 0.45151515, 0.45757576,
       0.46363636, 0.46969697, 0.47575758, 0.48181818, 0.48787879,
       0.49393939, 0.5       , 0.50606061, 0.51212121, 0.51818182,
       0.52424242, 0.53030303, 0.53636364, 0.54242424, 0.54848

In [40]:
# NLL
NLL = -np.log(z_1)
NLL

array([2.30258509, 2.24374459, 2.18817474, 2.13553101, 2.08552059,
       2.03789254, 1.99243016, 1.94894505, 1.90727236, 1.86726702,
       1.82880074, 1.79175947, 1.75604139, 1.72155521, 1.68821879,
       1.65595793, 1.62470538, 1.59440004, 1.56498615, 1.53641278,
       1.50863321, 1.48160454, 1.45528723, 1.4296448 , 1.4046435 ,
       1.38025205, 1.3564414 , 1.33318454, 1.31045628, 1.28823315,
       1.26649316, 1.24521576, 1.22438168, 1.2039728 , 1.18397214,
       1.16436367, 1.1451323 , 1.12626382, 1.10774477, 1.08956245,
       1.07170484, 1.05416053, 1.03691872, 1.01996916, 1.00330211,
       0.9869083 , 0.97077892, 0.95490557, 0.93928025, 0.92389533,
       0.90874353, 0.89381788, 0.87911173, 0.86461872, 0.85033276,
       0.83624802, 0.82235891, 0.80866007, 0.79514635, 0.78181282,
       0.76865473, 0.75566754, 0.74284685, 0.73018845, 0.71768829,
       0.70534245, 0.69314718, 0.68109884, 0.66919394, 0.6574291 ,
       0.64580106, 0.63430668, 0.62294292, 0.61170685, 0.60059

In [42]:
# average z (of z_1, rounded to suppress floating-point noise) and average NLL
print(round(np.mean(z_1), 12))
print(np.mean(NLL))

0.4
1.0353726039740108


In [58]:
# We have achieved a variation in NLL whilst keeping the average z fixed
# Furthermore notice that the order of operations now affects the outcome
print(-np.log(np.mean(z_1)))
print(np.mean(-np.log(z_1)))

0.916290731874155
1.0353726039740108
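
The gap between these two values is an instance of Jensen's inequality, a standard result that is not specific to this notebook. Because $-\ln$ is strictly convex, for any positive $z_1, \dots, z_n$:

$$
-\ln\!\left(\frac{1}{n}\sum_{i=1}^{n} z_i\right) \;\le\; \frac{1}{n}\sum_{i=1}^{n}\bigl(-\ln z_i\bigr)
$$

Equality holds only when all $z_i$ are equal, which is why the two values agree (up to floating-point rounding) for `z_0` and differ for `z_1`. Spreading the $z_i$ further apart at a fixed mean can therefore only increase the average NLL.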


## Extreme values

In [170]:
# Linear variation

factor = 0.006

z_2 = 0.4*np.ones(100)

for indx in range(len(z_2)):
    z_2[indx] = z_2[indx]+indx*factor-factor*(len(z_2)-1)/2
    
print(z_2)
print()
print(-np.log(z_2))
print()
print("Avg z:",np.mean(z_2))
print("Avg NLL:",np.mean(-np.log(z_2)))

[0.103 0.109 0.115 0.121 0.127 0.133 0.139 0.145 0.151 0.157 0.163 0.169
 0.175 0.181 0.187 0.193 0.199 0.205 0.211 0.217 0.223 0.229 0.235 0.241
 0.247 0.253 0.259 0.265 0.271 0.277 0.283 0.289 0.295 0.301 0.307 0.313
 0.319 0.325 0.331 0.337 0.343 0.349 0.355 0.361 0.367 0.373 0.379 0.385
 0.391 0.397 0.403 0.409 0.415 0.421 0.427 0.433 0.439 0.445 0.451 0.457
 0.463 0.469 0.475 0.481 0.487 0.493 0.499 0.505 0.511 0.517 0.523 0.529
 0.535 0.541 0.547 0.553 0.559 0.565 0.571 0.577 0.583 0.589 0.595 0.601
 0.607 0.613 0.619 0.625 0.631 0.637 0.643 0.649 0.655 0.661 0.667 0.673
 0.679 0.685 0.691 0.697]

[2.27302629 2.2164074  2.16282315 2.11196473 2.06356819 2.01740615
 1.97328135 1.93102154 1.89047544 1.85150947 1.81400508 1.77785656
 1.74296931 1.70925825 1.67664666 1.64506509 1.61445045 1.5847453
 1.55589715 1.52785793 1.50058351 1.47403328 1.44816976 1.42295835
 1.39836694 1.37436579 1.35092722 1.32802545 1.30563646 1.28373777
 1.26230838 1.24132859 1.22077992 1.20064501 1.18090753

Making the variation linear limits how far the Avg NLL can rise. In principle, however, we can subtract any amount from one element (provided it stays positive) and, as long as we add the same amount to another element, the average will remain unaffected.

In [174]:
z_3 = 0.4*np.ones(100)

change_num = 20

for indx in range(change_num): 
    z_3[indx] = z_3[indx] - 0.3999
    z_3[-indx-1] = z_3[-indx-1] + 0.3999

print(z_3)
print()
print(-np.log(z_3))
print()
print("Avg z:",np.mean(z_3))
print("Avg NLL:",np.mean(-np.log(z_3)))

[1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04
 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04
 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 1.000e-04 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01 4.000e-01
 4.000e-01 4.000e-01 4.000e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01
 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01
 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01 7.999e-01
 7.999

It seems that we can further increase the NLL by creating extreme values in the array, but note that the values would have to be far more extreme still to achieve an average NLL even above 10...
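
To put a rough number on "how extreme", here is a minimal sketch that mirrors the construction above: `k` elements are pushed down to `eps`, `k` are pushed up to `2*base - eps`, and the rest stay at `base`, so the mean of $z$ is unchanged. The function names `mean_nll_two_level` and `eps_for_target` are hypothetical helpers introduced only for this illustration, and the inversion assumes `eps` is negligible next to `base`.

```python
import numpy as np

def mean_nll_two_level(eps, k, n=100, base=0.4):
    """Mean NLL when k elements equal eps, k equal 2*base - eps, the rest equal base."""
    low = -np.log(eps)
    high = -np.log(2 * base - eps)
    mid = -np.log(base)
    return (k * low + k * high + (n - 2 * k) * mid) / n

def eps_for_target(target, k, n=100, base=0.4):
    """Approximate eps at which the mean NLL reaches `target`.

    Assumes eps << base, so the raised elements are treated as exactly 2*base.
    """
    rest = (k * (-np.log(2 * base)) + (n - 2 * k) * (-np.log(base))) / n
    return np.exp(-(target - rest) * n / k)

for k in (20, 30):
    eps = eps_for_target(10.0, k)
    print("k =", k, "eps =", eps, "mean NLL =", mean_nll_two_level(eps, k))
```

Under these assumptions, pushing 20 of the 100 elements down requires them to fall below roughly 1e-20 before the average NLL reaches 10.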

In [185]:
z_3 = 0.4*np.ones(100)

change_num = 30

for indx in range(change_num): 
    z_3[indx] = z_3[indx] - (0.4-10**-15)
    z_3[-indx-1] = z_3[-indx-1] + (0.4-10**-15)

print(z_3)
print()
print(-np.log(z_3))
print()
print("Avg z:",np.mean(z_3))
print("Avg NLL:",np.mean(-np.log(z_3)))

[9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 9.99200722e-16 9.99200722e-16
 9.99200722e-16 9.99200722e-16 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.00000000e-01 4.00000000e-01 4.00000000e-01
 4.00000000e-01 4.000000

It would seem that we can guarantee a sensible value for the NLL in all but the most extreme scenarios. Therefore I should go back and check the validity of the values for $z$ being taken over the test set.

Whilst extremely small values are certainly possible for a Gaussian, particularly when the variance is low, the numbers I am seeing are very extreme and there may be errors in the evaluation code. Even if there are no errors, I may introduce a minimum value for $z$ in the evaluation function to give more interpretable results.
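
A minimum value for $z$ could be sketched as follows. This is only an illustration, not the actual evaluation code: the helper name `clipped_mean_nll` and the floor `z_min=1e-6` are assumptions chosen for the example.

```python
import numpy as np

def clipped_mean_nll(z, z_min=1e-6):
    """Mean NLL with every likelihood floored at z_min (hypothetical value).

    Each sample's NLL is then capped at -log(z_min), so a handful of
    near-zero likelihoods cannot dominate the average.
    """
    z = np.asarray(z, dtype=float)
    return float(np.mean(-np.log(np.maximum(z, z_min))))

# Example: one vanishingly small likelihood alongside an ordinary one
z_example = np.array([1e-300, 0.4])
print("Unclipped:", np.mean(-np.log(z_example)))
print("Clipped:  ", clipped_mean_nll(z_example))
```

The floor trades exactness for interpretability: the clipped figure is no longer the true NLL, so the chosen `z_min` should be reported alongside it.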

## Outcome

I have found that in these first experimental runs the loss is extremely large simply because the model is terrible. For any decent model the losses will remain reasonable and will not be a concern. Furthermore, the operations of taking logarithms and averages should remain fairly interchangeable, causing only a minor numerical difference.

Having said that, I have decided to limit the variance (so that it cannot shrink indefinitely) in both the training loop and, for consistency, the evaluation loop. In the training loop this should help to counter overfitting by stopping the model from continually reducing the variance as it fits the training data ever more closely, although it may only slow the overfitting down rather than prevent it completely.
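
As a rough illustration of limiting the variance, the sketch below clamps the predicted variance from below before evaluating the Gaussian NLL. It is written in NumPy to match the rest of this notebook rather than in whatever framework the training loop uses, and both the function name `gaussian_nll` and the floor `var_min=1e-4` are assumptions made for the example.

```python
import numpy as np

def gaussian_nll(y, mu, var, var_min=1e-4):
    """Mean Gaussian NLL with the variance clamped below at var_min (hypothetical value)."""
    y, mu, var = (np.asarray(a, dtype=float) for a in (y, mu, var))
    var = np.maximum(var, var_min)  # the variance cannot fall below the floor
    per_sample = 0.5 * np.log(2 * np.pi * var) + (y - mu) ** 2 / (2 * var)
    return float(np.mean(per_sample))

# An overconfident prediction: tiny predicted variance, large error
print("No floor:  ", gaussian_nll(1.0, 0.0, 1e-12, var_min=0.0))
print("With floor:", gaussian_nll(1.0, 0.0, 1e-12, var_min=1e-4))
```

With the floor in place, the penalty for an overconfident miss is bounded by the floor rather than by how small the model has driven the variance.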