**Author:** Zidong Chen

**Introduction:** This notebook explains why non-positive-definite or NaN errors occur during Gaussian process (GP) model training, even though in theory they should not happen. If you are already familiar with floating-point error and the IEEE 754 standard, you can skip this notebook.

Let's start with a simple but mind-blowing example.

```python
a = 0.1 + 0.2
b = 0.3
print("a == b:", a == b)
```

```
a == b: False
```


The result is False because computers store floating-point numbers in binary, and 0.1, 0.2, and 0.3 have no exact finite binary representation, so each is rounded to the nearest representable value. Printing the values and their difference shows the effect:

```python
print("a:", a)
print("b:", b)
print("Difference:", abs(a - b))
```

```
a: 0.30000000000000004
b: 0.3
Difference: 5.551115123125783e-17
```
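Python prints the shortest decimal string that round-trips, which hides the rounding for `b`. A minimal sketch using only the standard library can show the exact stored values and the usual workaround of comparing with a tolerance. `decimal.Decimal(float)` converts the stored double exactly; `math.ulp` needs Python 3.9 or later; `math.isclose` (default relative tolerance 1e-09) is one common choice of tolerance, not the only one.

```python
import math
from decimal import Decimal

a = 0.1 + 0.2
b = 0.3

# Decimal(x) shows the exact value of the double that the literal was rounded to
print("0.1 is stored as:", Decimal(0.1))
print("a   is stored as:", Decimal(a))
print("b   is stored as:", Decimal(b))

# Hexadecimal form: a and b differ only in the last bit of the significand
print("a.hex():", a.hex())
print("b.hex():", b.hex())

# The gap is exactly one unit in the last place (ulp) of 0.3, i.e. 2**-54
print("a - b == math.ulp(b):", a - b == math.ulp(b))

# Compare with a tolerance instead of ==
print("math.isclose(a, b):", math.isclose(a, b))
```

The difference printed earlier, 5.551115123125783e-17, is exactly this one-ulp gap of 2^-54.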


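Floating-point numbers also have a limited range. A value such as 1e-200 is still representable, but 1e-2000 is far below the smallest positive double (about 5e-324), so it underflows to exactly 0:

```python
print("1e-200=0 is:", 1e-200 == 0)
print("1e-2000=0 is:", 1e-2000 == 0)
```

```
1e-200=0 is: False
1e-2000=0 is: True
```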
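The range limits are exposed through `sys.float_info`. The sketch below, assuming Python's `float` is IEEE 754 double precision (true on all mainstream platforms), shows where normal numbers stop, how subnormal numbers extend the range with reduced precision, and where values round to 0 or overflow to infinity. The final loop assumes a squared-exponential (RBF) kernel value exp(-d²/2) with unit lengthscale purely as an illustration of how kernel entries for distant points can underflow to exactly 0; the distances are arbitrary.

```python
import math
import sys

# Smallest positive *normal* double and largest finite double
print("min normal:", sys.float_info.min)   # 2.2250738585072014e-308
print("max finite:", sys.float_info.max)   # 1.7976931348623157e+308

# Subnormals fill the gap below the smallest normal, down to 2**-1074
smallest_subnormal = 2.0 ** -1074
print("smallest subnormal:", smallest_subnormal)  # 5e-324

# Halving it is an exact tie, which rounds to even, i.e. to exactly zero (underflow)
print("smallest_subnormal / 2 == 0:", smallest_subnormal / 2 == 0)

# Overflow goes the other way, to infinity
print("max * 2:", sys.float_info.max * 2)  # inf

# Illustrative only: an RBF-style kernel value exp(-d**2 / 2) for growing distance d
for d in (10.0, 38.0, 39.0, 40.0):
    print(f"d = {d}: exp(-d**2/2) = {math.exp(-d ** 2 / 2)}")
```

At d = 38 the kernel value is already subnormal (fewer significant bits), and from d = 39 on it is exactly 0.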


The IEEE 754 standard is the widely adopted set of rules for binary floating-point arithmetic in computers. It was established by the Institute of Electrical and Electronics Engineers (IEEE) to make numerical computations consistent and portable across computing systems. Here is an overview of its key features:

### Key Features of the IEEE 754 Standard

#### Representation of Floating Point Numbers

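- **Sign Bit**: Determines whether the number is positive (0) or negative (1).
- **Exponent**: The power to which the base (2 for the binary formats discussed here) is raised. It is stored with a bias (127 in single precision, 1023 in double precision) so that negative exponents can be encoded as unsigned integers.
- **Mantissa (or Significand)**: The significant digits of the number. For normal numbers the leading bit is always 1, so it is not stored (the implicit leading 1) and only the fractional part is kept.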
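One compact way to combine the three fields: for a normal number with sign bit $s$, stored exponent field $E$, and fraction field $f$ read as an unsigned integer of $p$ bits, the value is given below. Here $p = 23$ and $\mathrm{bias} = 127$ for single precision, and $p = 52$ and $\mathrm{bias} = 1023$ for double precision. As a worked instance, single-precision 0.1 is stored as $s = 0$, $E = 123$, $f = 5033165$.

$$
x = (-1)^{s} \times \left(1 + \frac{f}{2^{p}}\right) \times 2^{E - \mathrm{bias}}
$$

$$
(-1)^{0} \times \left(1 + \frac{5033165}{2^{23}}\right) \times 2^{123 - 127} = 1.600000023841858\ldots \times 2^{-4} \approx 0.10000000149
$$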

#### Floating Point Formats

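- **Single Precision (32-bit)**
  - 1 bit for the sign
  - 8 bits for the exponent
  - 23 bits for the mantissa
- **Double Precision (64-bit)**, the format used by Python's `float` and NumPy's default `float64`
  - 1 bit for the sign
  - 11 bits for the exponent
  - 52 bits for the mantissa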
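The number of mantissa bits sets the precision. With $p$ stored fraction bits plus the implicit leading 1, the gap between 1.0 and the next representable number (machine epsilon) is $2^{-p}$: about 1.19e-7 for single precision and about 2.22e-16 for double precision, which is what Python's `float` uses. The sketch below checks this with the standard library only; `to_float32` is a hypothetical helper that emulates single precision by round-tripping a value through `struct`.

```python
import struct
import sys


def to_float32(x):
    """Hypothetical helper: round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]


eps64 = sys.float_info.epsilon  # 2**-52 for IEEE 754 double precision
eps32 = 2.0 ** -23              # 23 stored fraction bits in single precision

print("double epsilon:", eps64)
print("single epsilon:", eps32)

# Adding half an epsilon to 1.0 is an exact tie that rounds back to 1.0 (round half to even)
print("1 + eps64/2 == 1:", 1.0 + eps64 / 2 == 1.0)                      # True
print("1 + eps64   == 1:", 1.0 + eps64 == 1.0)                          # False
print("float32(1 + eps32/2) == 1:", to_float32(1.0 + eps32 / 2) == 1.0)  # True
print("float32(1 + eps32)   == 1:", to_float32(1.0 + eps32) == 1.0)      # False

# The single-precision value nearest to 0.1
print("float32(0.1):", to_float32(0.1))  # 0.10000000149011612
```

Relative rounding error is therefore about 1e-7 in single precision and about 1e-16 in double precision, and any subtraction that cancels most of the leading digits magnifies it.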


```python
import struct


def float_to_ieee754_components(f):
    """Split f, rounded to IEEE 754 single precision, into sign, exponent, and significand."""
    # Pack as a big-endian 32-bit float, then reinterpret the 4 bytes as an unsigned int
    binary_rep = struct.unpack('>I', struct.pack('>f', f))[0]

    # Extract sign (1 bit), biased exponent (8 bits), and fraction (23 bits)
    sign = (binary_rep >> 31) & 0x1
    exponent = (binary_rep >> 23) & 0xFF
    significand = binary_rep & 0x7FFFFF

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, and the exponent is fixed at 1 - 127 = -126
        actual_exponent = -126
        normalized_significand = significand / (2 ** 23)
    else:
        # Normal number: subtract the bias (127) and add the implicit leading 1
        # (exponent == 255 encodes inf/NaN, which this function does not handle)
        actual_exponent = exponent - 127
        normalized_significand = 1 + significand / (2 ** 23)

    return sign, actual_exponent, normalized_significand


def print_float_components(f):
    """Print the single-precision components of f and the formula that reassembles them."""
    sign, exponent, significand = float_to_ieee754_components(f)
    print(f"Floating-point number: {f}")
    print(f"Sign: {sign}")
    print(f"Exponent (actual): {exponent}")
    print(f"Significand (normalized): {significand}")
    print(f"Representation: (-1)^{sign} * {significand} * 2^{exponent}")
```

```python
# Example usage
float_number = 0.1
print_float_components(float_number)
```

```
Floating-point number: 0.1
Sign: 0
Exponent (actual): -4
Significand (normalized): 1.600000023841858
Representation: (-1)^0 * 1.600000023841858 * 2^-4
```


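```python
(-1) ** 0 * 1.600000023841858 * 2 ** (-4)
```

```
0.10000000149011612
```

Reassembling the components shows that the value stored for 0.1 in single precision is 0.10000000149011612, not exactly 0.1.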
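The function above first rounds its input to single precision (the `'>f'` format), which is why the significand ends in ...023841858. Python's own `float` is a double, so a sketch of the same decomposition for the 64-bit layout (1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits) is closer to what happens in `0.1 + 0.2`. `float64_components` is a hypothetical name; `math.frexp` serves only as an independent cross-check, since it normalizes the significand to [0.5, 1) rather than [1, 2).

```python
import math
import struct


def float64_components(f):
    """Hypothetical helper: split a Python float (IEEE 754 double) into sign, exponent, significand."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer
    bits = struct.unpack('>Q', struct.pack('>d', f))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)    # 52-bit fraction field
    if exponent == 0:
        # Zero or subnormal: no implicit 1, exponent fixed at 1 - 1023
        return sign, -1022, fraction / 2 ** 52
    # Normal number (exponent == 0x7FF would be inf/NaN, not handled here)
    return sign, exponent - 1023, 1 + fraction / 2 ** 52


for x in (0.1, 0.3, 0.1 + 0.2):
    s, e, m = float64_components(x)
    # Cross-check: frexp returns x == mant * 2**exp with 0.5 <= mant < 1 for positive x
    frexp_mant, frexp_exp = math.frexp(x)
    assert (m, e) == (2 * frexp_mant, frexp_exp - 1)
    print(f"{x!r}: (-1)^{s} * {m!r} * 2^{e}")
```

In double precision, 0.3 and 0.1 + 0.2 share the sign and exponent and differ only in the last bit of the significand (1.2 versus 1.2000000000000002), which is the one-ulp difference seen at the start of the notebook.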

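**Catastrophic Cancellation:** This occurs when two nearly equal numbers are subtracted. Their leading significant digits cancel, leaving only the trailing digits, which are dominated by rounding error, so the relative error of the result can be very large. This happens often in kernel computations. When two training inputs are almost identical, their rows in the kernel matrix are almost identical, and during the Cholesky decomposition used in GP training the pivot for the second point is computed as a difference of nearly equal numbers (for example $1 - k^2$ with $k \approx 1$). That difference can cancel to zero or a tiny negative value, which produces a non-positive-definite error or NaN values. This also explains why removing near-duplicate data points can fix the problem.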
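A sketch of both effects, assuming NumPy is available and a squared-exponential (RBF) kernel with unit lengthscale and unit variance; the helpers `rbf_kernel` and `try_cholesky` and all input values are illustrative assumptions, not taken from any particular GP library. The first part shows cancellation in the expanded squared distance $\|x\|^2 + \|y\|^2 - 2x \cdot y$, a form often used because it vectorizes well: for x = 1e8 + 1 and y = 1e8 it returns 0 instead of 1. The second part uses two distinct but nearly identical inputs. Mathematically, an RBF kernel matrix of distinct inputs is positive definite, but here the kernel value rounds to exactly 1.0, so the Cholesky pivot $1 - k^2$ cancels to 0 and `np.linalg.cholesky` raises `LinAlgError`, while dropping the near-duplicate point lets the factorization succeed.

```python
import numpy as np

# Part 1: cancellation in the expanded squared distance
x, y = 1e8 + 1.0, 1e8
direct = (x - y) ** 2                 # subtract first: exact here, gives 1.0
expanded = x * x + y * y - 2 * x * y  # huge terms cancel, only rounding error survives: gives 0.0
print("direct:", direct, " expanded:", expanded)


# Part 2: a near-duplicate input makes the kernel matrix numerically singular
def rbf_kernel(X, lengthscale=1.0):
    """Illustrative squared-exponential kernel for 1-D inputs (an assumption, not a library API)."""
    diff = X[:, None] - X[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def try_cholesky(K):
    """Hypothetical helper: report whether NumPy's Cholesky factorization succeeds."""
    try:
        np.linalg.cholesky(K)
        return "Cholesky OK"
    except np.linalg.LinAlgError as err:
        return f"Cholesky failed: {err}"


X_dup = np.array([0.0, 1e-9, 1.0])  # the first two inputs differ by only 1e-9
K_dup = rbf_kernel(X_dup)
print("k(0, 1e-9) == 1.0:", K_dup[0, 1] == 1.0)  # exp(-5e-19) rounds to exactly 1.0
print(try_cholesky(K_dup))

X_clean = np.array([0.0, 1.0])  # near-duplicate removed
print(try_cholesky(rbf_kernel(X_clean)))
```

Once the kernel value for the two close inputs rounds to 1.0, the second pivot is computed as 1 - 1 = 0, so the factorization stops there regardless of the remaining entries.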