
Numerical underflow is a common issue when working with probabilities, especially in Naive Bayes classifiers, where the probabilities of the features given a class are multiplied together. Because every probability lies between 0 and 1, the product of many small probabilities can become too small to represent in floating-point arithmetic and is rounded to zero.

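This notebook demonstrates how to handle numerical underflow by converting probabilities to a logarithmic scale: instead of multiplying the probabilities, we sum their logarithms. Because the logarithm is monotonically increasing, the class with the largest log score is also the class with the largest probability, so the comparison between classes is unchanged.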
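The probabilities in the cell that follows are small but still representable, so it may help to first see a case where the direct product really does underflow. The sketch below is illustrative only: the 400 features and the likelihood value 0.1 are assumptions chosen to push the product below the float64 range, not data from this notebook.

```python
import numpy as np

# Hypothetical setup: 400 features, each with likelihood 0.1 under some class.
likelihoods = np.full(400, 0.1)

# The exact product is 1e-400, which is below the smallest positive float64
# (roughly 4.9e-324), so the floating-point result collapses to 0.0.
direct_product = np.prod(likelihoods)

# The sum of logarithms equals 400 * ln(0.1) and stays well within range.
log_sum = np.sum(np.log(likelihoods))

print("product of probabilities:", direct_product)
print("sum of log-probabilities:", log_sum)
```

Once the product has become 0.0, every class would score the same and no comparison is possible, whereas the log sum still carries the information needed to rank classes.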

In [1]:
import numpy as np

# Per-feature likelihoods P(A_i|good) and the class prior P(good)
p_A_good = np.array([0.000068, 0.00005, 0.00073])
p_good = 0.7

# Direct product of probabilities versus the equivalent sum of logarithms
print("P(A|good)           :", np.prod(p_A_good))
print("P(A|good) * P(good) :", np.prod(p_A_good) * p_good)
print("sum(log(P(A|good))) :", np.sum(np.log(p_A_good)))
print("sum(log(P(A|good))) + log(P(good)) :", np.sum(np.log(p_A_good)) + np.log(p_good))

# Same computation for the other class
p_A_bad = np.array([0.000005, 0.0000075, 0.0000087])
p_bad = 0.3
print("P(A|bad)           :", np.prod(p_A_bad))
print("P(A|bad) * P(bad)  :", np.prod(p_A_bad) * p_bad)
print("sum(log(P(A|bad))) :", np.sum(np.log(p_A_bad)))
print("sum(log(P(A|bad))) + log(P(bad)) :", np.sum(np.log(p_A_bad)) + np.log(p_bad))

P(A|good)           : 2.482e-12
P(A|good) * P(good) : 1.7374e-12
sum(log(P(A|good))) : -26.721956429146132
sum(log(P(A|good))) + log(P(good)) : -27.078631373084864
P(A|bad)           : 3.2625000000000003e-16
P(A|bad) * P(bad)  : 9.7875e-17
sum(log(P(A|bad))) : -35.658867715255916
sum(log(P(A|bad))) + log(P(bad)) : -36.86284051958185
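
The log scores above are enough to pick a class: the larger one wins. If normalized posteriors are also wanted, one common approach is the log-sum-exp trick, which subtracts the maximum score before exponentiating. The following is a minimal sketch under that assumption; the names `log_scores`, `classes`, `log_evidence` and `posteriors` are introduced here for illustration and do not appear in the cell above.

```python
import numpy as np

# Log joint scores log P(A|class) + log P(class), rebuilt from the same
# numbers used in the cell above.
log_scores = np.array([
    np.sum(np.log([0.000068, 0.00005, 0.00073])) + np.log(0.7),      # good
    np.sum(np.log([0.000005, 0.0000075, 0.0000087])) + np.log(0.3),  # bad
])
classes = ["good", "bad"]

# Classification only needs the largest log score.
predicted = classes[int(np.argmax(log_scores))]

# Log-sum-exp: shift by the maximum so the largest exponent is exp(0) = 1,
# which keeps the exponentials away from underflow.
max_score = np.max(log_scores)
log_evidence = max_score + np.log(np.sum(np.exp(log_scores - max_score)))

# Normalized posteriors P(class|A); they sum to 1 up to rounding.
posteriors = np.exp(log_scores - log_evidence)

print("predicted class:", predicted)
print("posteriors     :", dict(zip(classes, posteriors)))
```

With these inputs the "good" class is selected, and its posterior comes out very close to 1 because its log score exceeds the "bad" score by almost 10.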
