The following borrows heavily from [python.org](https://docs.python.org/3/tutorial/floatingpoint.html)

# Floating Point Numbers
Computers store floating-point numbers in memory as base 2 (or binary) fractions. Most decimal fractions cannot be represented exactly as binary fractions; therefore, floating-point numbers are only approximations of the actual values they represent. Most users are not aware of the approximation because Python prints only a rounded decimal approximation of the binary value stored by the machine. If Python were to print the true decimal value of that binary approximation, the result might surprise you.

Just as base 10 numbers can be represented as decimal fractions of powers of 10,
$$0.125 = \frac{1}{10} + \frac{2}{100} + \frac{5}{1000}$$
we can represent numbers in base 2 using fractions of powers of 2,
$$0.125 = \frac{0}{2} + \frac{0}{4} + \frac{1}{8}$$
Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. We see the same kind of problem in base 10 when considering the fraction $\frac{1}{3}$,
$$\frac{1}{3} = 0.333333\ldots$$
No matter how many 3's we choose to display, the number on the right-hand side will never be exactly equal to $\frac{1}{3}$.

In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, $\frac{1}{10}$ is the infinitely repeating fraction

0.0001100110011001100110011001100110011001100110011...

Stop at any finite number of bits, and you get an approximation.
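The expansion above can be reproduced with a short sketch. The helper name `binary_fraction_digits` is made up for this illustration; it uses the doubling method (double the fraction, take the integer part as the next bit) and works with `Fraction` so that the digits themselves are not affected by float rounding.

```python
from fractions import Fraction

def binary_fraction_digits(value, n_bits):
    """Return the first n_bits binary digits after the point for 0 <= value < 1."""
    digits = []
    for _ in range(n_bits):
        value *= 2            # shift one binary place to the left
        bit = int(value)      # the integer part is the next bit (0 or 1)
        digits.append(str(bit))
        value -= bit          # keep only the fractional part
    return "".join(digits)

# 1/10 never terminates in base 2, while 1/8 terminates after three bits
print("1/10 = 0." + binary_fraction_digits(Fraction(1, 10), 24) + "...")
print("1/8  = 0." + binary_fraction_digits(Fraction(1, 8), 24))
```

Increasing `n_bits` only extends the repeating `0011` pattern for 1/10; the remainder never reaches zero.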

It’s easy to forget that the stored value is an approximation to the original decimal fraction, because of the way that floats are displayed at the interpreter prompt. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. If Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display:

In [None]:
# 55 decimal places are needed to show the stored 64-bit value of 0.1 exactly
format(0.1, '.55f')

That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead. It’s important to realize that this is, in a real sense, an illusion: the value in the machine is not exactly $\frac{1}{10}$; you’re simply rounding the display of the true machine value. This fact becomes apparent as soon as you try to do arithmetic with these values:

In [None]:
0.1+0.2

Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all languages that support your hardware’s floating-point arithmetic (although some languages may not display the difference by default, or in all output modes).
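One practical consequence is that exact equality tests on floats can fail even when the decimal arithmetic looks obvious. The sketch below is only an illustration; it uses the standard-library function `math.isclose`, and its default relative tolerance is not necessarily the right choice for every application.

```python
import math

total = 0.1 + 0.2

# Exact comparison looks at the stored binary values, which differ
print("total == 0.3:", total == 0.3)

# Comparison within a relative tolerance (default rel_tol is 1e-09)
print("math.isclose(total, 0.3):", math.isclose(total, 0.3))
```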

Other surprises follow from this one.

In [None]:
round(2.675,2)

The documentation for the built-in **round()** function says that it rounds to the nearest value, rounding ties to the even choice. Since the decimal fraction 2.675 is exactly halfway between 2.67 and 2.68, you might expect the result here to be (a binary approximation to) 2.68, the choice with an even final digit. It’s not, because when the decimal string 2.675 is converted to a binary floating-point number, it’s again replaced with a binary approximation, whose exact value is

2.67499999999999982236431605997495353221893310546875

Since this approximation is slightly closer to 2.67 than to 2.68, it’s rounded down.
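To see that the surprise comes from the binary conversion and not from the rounding rule, here is a minimal sketch using the standard `decimal` module. The variable names are illustrative; `ROUND_HALF_UP` is chosen only so that a true tie would visibly round upward.

```python
from decimal import Decimal, ROUND_HALF_UP

as_float = Decimal(2.675)     # exact value of the stored binary float
as_text = Decimal("2.675")    # exact decimal value as written

cent = Decimal("0.01")

# The stored float lies just below the halfway point, so it rounds down
print(as_float.quantize(cent, rounding=ROUND_HALF_UP))

# The true decimal value is an exact tie, so it rounds up
print(as_text.quantize(cent, rounding=ROUND_HALF_UP))
```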

## The Perils of Floating Point
A famous post, [The Perils of Floating Point](http://www.indowsway.com/floatingpoint.htm) by Bruce Bush, enumerates the many pitfalls of floating-point arithmetic (mostly in FORTRAN; hey, it was the gold standard at one time). The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in $2^{53}$ per operation. That's more than adequate for most tasks, but you need to keep in mind that it's not decimal arithmetic and that every float operation can suffer a new rounding error.
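A small sketch of how per-operation rounding errors accumulate: adding 0.1 ten times in a loop does not give exactly 1.0. The list of ten values is an arbitrary illustration. An explicit loop is used because recent Python versions make the built-in `sum()` more accurate for floats; `math.fsum` is the standard-library function that tracks the lost low-order bits.

```python
import math

values = [0.1] * 10

# Naive accumulation: each += can introduce a new rounding error
total = 0.0
for v in values:
    total += v

print("loop total :", repr(total))
print("math.fsum  :", repr(math.fsum(values)))
print("difference :", 1.0 - total)
```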

While pathological cases do exist, for most casual use of floating-point arithmetic you'll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. For cases that require exact decimal representation, try the `decimal` module, which implements decimal arithmetic suitable for accounting applications and high-precision applications.
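As a hedged illustration of the difference, the sketch below builds `Decimal` values from strings (exact decimal inputs) and from floats (which carry the binary approximation along). The precision of 50 digits is an arbitrary choice for the example, set inside `localcontext` so that it does not change the global decimal context.

```python
from decimal import Decimal, localcontext

# Built from strings: exact decimal arithmetic
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

# Built from floats: the binary approximation error is preserved
print(Decimal(0.1) + Decimal(0.2) == Decimal("0.3"))

# Higher precision on request (50 significant digits here)
with localcontext() as ctx:
    ctx.prec = 50
    one_seventh = Decimal(1) / Decimal(7)
print(one_seventh)
```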


In [None]:
# The following code may execute differently in different IDEs
import decimal as dc
x = 0.10
true_x = dc.Decimal(x)
print("64-bit exact: ", true_x) 

In [None]:
import numpy as np
x_32 = np.float32(0.1)
print("32-bit exact:", format(x_32, ".55f"))

In [None]:
dc.Decimal(2.675)

Another form of exact arithmetic is supported by the `fractions` module, which implements arithmetic based on rational numbers.

In [None]:
from fractions import Fraction

# Example floating point number
num = 0.19755

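# Closest fraction with a denominator of at most 1,000,000 (the default limit)
frac = Fraction(num).limit_denominator()

# Closest fraction with denominator 2**9, i.e. a 9-bit binary fraction.
# Note: limit_denominator(2**9) would only cap the denominator at 512;
# it would not force the denominator to be a power of 2.
binary_rep = Fraction(round(num * 2**9), 2**9)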

print(f"Floating point number: {num}")
print(f"Fraction approximation: {frac}")
print(f"Binary approximation (base-2 denominator): {binary_rep}")


Base Python even provides tools, such as the float methods `as_integer_ratio()` and `hex()`, that may help on those rare occasions when you really do want to know the exact value of a float.
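A brief sketch of two such tools, the built-in float methods `as_integer_ratio()` and `hex()`, applied to 0.1 (the choice of 0.1 is just for continuity with the earlier examples):

```python
x = 0.1

# Exact value of the stored float as a ratio of two integers
numerator, denominator = x.as_integer_ratio()
print(numerator, "/", denominator)
print("denominator is 2**55:", denominator == 2**55)

# Exact value in hexadecimal: mantissa digits and a power-of-2 exponent
print(x.hex())

# The hex form round-trips without any loss
print(float.fromhex(x.hex()) == x)
```

Because the ratio and the hexadecimal form are exact, they are useful for moving float values between programs without introducing a new rounding step.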

## Floating-Point Notation
Floating-point numbers are represented in the IEEE-754 standard notation, which consists of three fields:

- sign
- mantissa
- exponent

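Each field contains information about the number, which is represented in the form

$$(-1)^s \times 1.f \times 2^{e - \text{bias}}$$
where $s$ is the sign bit, $f$ is the string of stored mantissa bits, and $e$ is the stored exponent.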

The bias lets the exponent be stored as an unsigned integer, so that numbers can be compared by the same hardware that compares signed integers. The bias takes a value of 127 for 32-bit numbers and 1023 for 64-bit numbers. The leading 1 is part of the "normalization"; it is assumed and not stored, saving a bit...of memory. Depending on whether the float is 32-bit or 64-bit, each field occupies a different amount of space: 1 sign bit, 8 exponent bits, and 23 mantissa bits for 32-bit floats, and 1, 11, and 52 bits respectively for 64-bit floats.
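The field layout can be inspected with the standard `struct` module. This is a minimal sketch; the helper names `float32_fields` and `float64_fields` are made up for this example, and the test value 0.15625 is chosen because it is exactly representable.

```python
import struct

def float32_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = f"{bits:032b}"
    return text[0], text[1:9], text[9:]

def float64_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x stored as a 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    text = f"{bits:064b}"
    return text[0], text[1:12], text[12:]

sign, exponent, mantissa = float32_fields(0.15625)
print(sign, exponent, mantissa)

# Rebuild the value from the formula (-1)^s * 1.f * 2^(e - bias), bias = 127
value = (-1) ** int(sign) * (1 + int(mantissa, 2) / 2**23) * 2.0 ** (int(exponent, 2) - 127)
print(value)
```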

<img src="figures/IEEE_754_Single_Floating_Point_Format.png">
<img src="figures/IEEE_754_Double_Floating_Point_Format.png">


### Manual Algorithm to Transform Numbers to Floating Point
You can convert any nonzero, finite number into floating-point notation (32- or 64-bit) by hand using the following algorithm:
1. Normalize the number into the form $1.M \times 2^E$
2. Determine the sign bit (0 for positive, 1 for negative)
3. Calculate the biased exponent ($E+127$ or $E+1023$)
4. Extract the fractional part $M$ of the mantissa and convert it to binary (there is a quick and seemingly simple way of doing this, such as repeatedly doubling the fraction and recording the integer part as the next bit)
5. Assemble the final 32- or 64-bit representation from the sign bit, the biased exponent, and the mantissa bits.
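The steps above can be sketched directly in Python for the 32-bit case. This is an illustration under simplifying assumptions, not a full IEEE-754 encoder: the function name `manual_float32` is made up, the input must be a nonzero, finite number in the normal range, and the mantissa is truncated rather than rounded, so the result matches the hardware only for values that fit in 23 mantissa bits.

```python
import struct

def manual_float32(x):
    """Encode a nonzero, finite, normal-range number as a 32-bit pattern (truncating)."""
    # Step 2: sign bit
    sign = 0 if x > 0 else 1
    m = abs(x)

    # Step 1: normalize to 1.M x 2**E by halving or doubling
    E = 0
    while m >= 2:
        m /= 2
        E += 1
    while m < 1:
        m *= 2
        E -= 1

    # Step 3: biased exponent
    biased = E + 127

    # Step 4: convert the fractional part M to 23 bits by repeated doubling
    frac = m - 1
    mantissa = ""
    for _ in range(23):
        frac *= 2
        bit = int(frac)
        mantissa += str(bit)
        frac -= bit

    # Step 5: assemble sign | exponent | mantissa
    return f"{sign}{biased:08b}{mantissa}"

def hardware_float32(x):
    """Bit pattern produced by the machine, for comparison."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits:032b}"

for number in (0.15625, -6.5):
    print(number, manual_float32(number), manual_float32(number) == hardware_float32(number))
```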

Of course, this seems like unnecessary work, especially since the notation is strictly used to store data on computers - let the computer do the work.

In [None]:
# The following code requires conversion.py
import conversion as cvn
x = 0.15625
# The functions live in the imported module, so they need the cvn. prefix
x32 = cvn.num2float32(x)
x64 = cvn.num2float64(x)
print(f'{x32}\n{x64}')

## Special Cases
### Zero 
IEEE-754 distinguishes $+0$ from $-0$ using the sign bit alone. The exponent and mantissa bits are all set to zero.
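A quick sketch showing the two zeros in 64-bit form. The variable names are illustrative; `math.copysign` is used because `==` treats the two zeros as equal.

```python
import math
import struct

pos_zero, neg_zero = 0.0, -0.0

# The two zeros compare equal...
print(pos_zero == neg_zero)

# ...but their bit patterns differ only in the sign bit
print(struct.pack(">d", pos_zero).hex())
print(struct.pack(">d", neg_zero).hex())

# copysign exposes the sign that == hides
print(math.copysign(1.0, neg_zero))
```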
### Infinity
IEEE-754 represents $\pm \infty$ using the sign bit. The exponent bits are all set to 1, while the mantissa bits are all set to 0.
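The same kind of inspection works for infinity. In this sketch the helper name `double_fields` is made up; it splits a 64-bit float into its sign, exponent, and mantissa as integers.

```python
import math
import struct

def double_fields(x):
    """Return (sign, exponent, mantissa) of a 64-bit float as integers."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

for value in (math.inf, -math.inf):
    sign, exponent, mantissa = double_fields(value)
    print(f"{value}: sign={sign} exponent={exponent:011b} mantissa={mantissa}")

# Overflowing the largest finite float also produces infinity
print(1e308 * 10 == math.inf)
```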
### NaN
IEEE-754 has two types of NaNs. In both, the sign bit is insignificant and the exponent bits are all set to 1. On most hardware, the "quiet" NaNs have a mantissa whose leading bit is 1 ('10000....'), while the "signaling" NaNs have a mantissa whose leading bit is 0 and at least one nonzero bit elsewhere ('000....(something nonzero)'). The signaling NaN exists to trip an exception when it is used in an operation.
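A short sketch that looks at the NaN Python produces with `float("nan")`. The variable names are illustrative, and the exact mantissa bits are platform dependent, so the code prints them rather than assuming a particular pattern beyond the leading "quiet" bit found on common hardware.

```python
import math
import struct

nan = float("nan")

# A NaN is the only float that is not equal to itself
print(nan == nan, math.isnan(nan))

(bits,) = struct.unpack(">Q", struct.pack(">d", nan))
exponent = (bits >> 52) & 0x7FF
mantissa = bits & ((1 << 52) - 1)
quiet_bit = (mantissa >> 51) & 1   # leading mantissa bit

print(f"exponent={exponent:011b} mantissa={mantissa:052b} quiet_bit={quiet_bit}")
```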

Below is code that `claude` produced that supposedly creates a signaling NaN. I have my doubts. Python creates all NaNs as quiet by default, and constructing a signaling NaN is not straightforward.

In [None]:
import struct

def is_signaling_nan(f):
    # Interpret the Python float as raw bits. Double precision is used because
    # a Python float is 64-bit; going through single precision forces a
    # float32 <-> float64 conversion, which can quiet the NaN on some platforms.
    bits = struct.unpack('!Q', struct.pack('!d', f))[0]
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    # A signaling NaN has all exponent bits set, mantissa != 0, and the most significant bit of the mantissa is 0
    return exponent == 0x7FF and mantissa != 0 and not (mantissa & (1 << 51))

# Example: manually build a signaling NaN from its bit pattern
snan_bits = 0x7FF0000000000001  # IEEE 754 double-precision signaling NaN bit pattern
snan = struct.unpack('!d', struct.pack('!Q', snan_bits))[0]

# Check whether the signaling bit pattern survived (this is platform dependent)
print("Signaling NaN detected:", is_signaling_nan(snan))

