In [None]:
import numpy as np

# Floating-point numbers

Floating-point numbers consist of a significand (or mantissa) and an exponent.

Example with a 5-digit significand (base 10):

$12.345 = \underbrace{12345}_{\text{significand}} \times \underbrace{10}_{\text{base}}{}^{\overbrace{-3}^{\text{exponent}}}$

Some numbers cannot be represented exactly with 5 digits (e.g. $7716/625 = 12.3456$ needs six digits), so they are rounded to the nearest number that can be represented (here $12.346$).
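As a quick check of the 5-digit example, Python's standard-library `decimal` module can be limited to 5 significant digits. This is a minimal sketch: it uses `decimal.localcontext` so the precision change does not leak into later cells, and it relies on the module's default rounding mode (round-half-even), which gives the same answer here as ordinary rounding.

```python
from decimal import Decimal, localcontext

# Temporarily work with a 5-significant-digit decimal significand
with localcontext() as ctx:
    ctx.prec = 5
    rounded = Decimal(7716) / Decimal(625)  # exact value 12.3456 needs 6 digits

print(rounded)  # 12.346: rounded to the nearest 5-digit value
```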

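Python floats are 64-bit, base-2 numbers (IEEE 754 "double precision", the `double` type in C and `double precision` in Fortran). Their 53-bit significand corresponds to about 15 to 16 significant decimal digits, and their exponent covers magnitudes from roughly $10^{-308}$ to $10^{+308}$.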
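These limits can be inspected directly. A short sketch using the standard-library `sys.float_info` and `math.frexp`; note that `frexp` normalizes the base-2 significand to the range [0.5, 1) rather than to an integer as in the base-10 example above.

```python
import math
import sys

info = sys.float_info
print(info.mant_dig)  # bits in the significand: 53
print(info.dig)       # decimal digits that are always represented faithfully: 15
print(info.max)       # largest finite float, about 1.8e308
print(info.min)       # smallest positive normalized float, about 2.2e-308

# math.frexp splits a float into a base-2 significand in [0.5, 1) and an exponent
m, e = math.frexp(12.345)
print(m, e)           # 12.345 == m * 2**e
```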

In [None]:
# Both of these should equal 1, but neither 1/51 nor 1/49 can be
# represented exactly as a float. For 51 the rounding errors happen
# to cancel out; for 49 they do not.
print(1/51 * 51)
print(1/49 * 49)

In [None]:
# Neither 0.1 nor 0.2 can be represented exactly as a float
a = 0.1
b = 0.2
print(a+b)
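To see why, the standard-library `decimal.Decimal` constructor converts a float to the exact binary value it stores, and `float.hex` shows that same value as a base-2 significand and exponent. A small sketch:

```python
from decimal import Decimal

# Decimal(float) reveals the exact value the float actually stores
print(Decimal(0.1))  # slightly more than 0.1
print(Decimal(0.2))  # slightly more than 0.2

# float.hex shows the stored value as a hexadecimal significand times a power of 2
print((0.1).hex())   # 0x1.999999999999ap-4
```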

In [None]:
# Exact comparisons of floats can fail because of roundoff
c = 0.1 + 0.2
d = 0.3
print( c == d )

In [None]:
# Use a tolerance when testing floats for equality
print( np.abs(c-d) < 1e-10 )

# NumPy has a function for testing whether two floats are close:
# np.isclose(a, b) checks |a - b| <= atol + rtol*|b| (default atol=1e-8)
print( np.isclose( c, d, rtol=1e-10 ) )
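The absolute tolerance `1e-10` above works because `c` and `d` are close to 1 in magnitude. For very large or very small numbers a relative tolerance is usually more appropriate. The sketch below uses `np.nextafter` and the standard-library `math.isclose` (relative tolerance `rel_tol`, no absolute tolerance by default); the specific values are chosen only for illustration.

```python
import math
import numpy as np

# Near 1e20, adjacent floats are 16384 apart, so an absolute
# tolerance of 1e-10 can never be met by two distinct values there
x = 1e20
y = np.nextafter(x, np.inf)                # the very next representable float
print(y - x)                               # 16384.0
print(abs(x - y) < 1e-10)                  # False: absolute tolerance too strict
print(math.isclose(x, y, rel_tol=1e-10))   # True: relative tolerance scales with x

# For tiny values the problem reverses: an absolute tolerance is too loose
p, q = 1e-20, 2e-20                        # differ by a factor of 2
print(abs(p - q) < 1e-10)                  # True (misleading)
print(math.isclose(p, q, rel_tol=1e-10))   # False
# np.isclose adds a default atol=1e-8, so it also reports these as close
print(np.isclose(p, q, rtol=1e-10))          # True
print(np.isclose(p, q, rtol=1e-10, atol=0))  # False
```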

### Special float values

***Overflow*** occurs when a result is larger in magnitude than the largest representable float (about $1.8 \times 10^{308}$). The value becomes `inf` (or `-inf`).

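***Underflow*** occurs when a result is too close to zero to represent. Below about $2.2 \times 10^{-308}$ precision is gradually lost, and below about $5 \times 10^{-324}$ the value becomes `0` (or `-0.0` for negative results).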

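***NaN*** (not a number) results from operations with no meaningful value, such as `0/0`, `inf*0`, `inf - inf`, or `inf/inf`. Dividing a nonzero number by zero gives `inf` under IEEE 754 rules (as NumPy does), although plain Python floats raise `ZeroDivisionError` instead. NaN is sometimes used to mark missing data.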
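The two conventions for division by zero can be compared side by side. A minimal sketch, using `np.errstate` to silence the `RuntimeWarning`s that NumPy would otherwise print:

```python
import numpy as np

# Plain Python floats raise an exception on division by zero
python_raised = False
try:
    1.0 / 0.0
except ZeroDivisionError as err:
    python_raised = True
    print("Python float:", err)

# NumPy follows IEEE 754 instead: nonzero/0 gives +/-inf, 0/0 gives nan
with np.errstate(divide="ignore", invalid="ignore"):
    pos = np.float64(1.0) / 0.0
    neg = np.float64(-1.0) / 0.0
    undefined = np.float64(0.0) / 0.0

print(pos, neg, undefined)  # inf -inf nan
```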

In [None]:
# Overflow
print( 1e300 * 1e30, 
      -1e300 * 1e30 )
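Overflow is not reported the same way everywhere: functions in the standard-library `math` module, such as `math.exp`, raise `OverflowError`, while NumPy returns `inf` (with a warning). A short sketch:

```python
import math
import sys
import numpy as np

big = sys.float_info.max              # largest finite float, about 1.8e308
doubled = big * 2                     # too large to represent
print(doubled, math.isinf(doubled))   # inf True

# math.exp raises an exception instead of returning inf
raised = False
try:
    math.exp(1000)
except OverflowError as err:
    raised = True
    print("math.exp:", err)

# NumPy returns inf; errstate silences the overflow RuntimeWarning
with np.errstate(over="ignore"):
    overflowed = np.exp(1000.0)
print(overflowed)                     # inf
```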

In [None]:
# Underflow
print( 1e-300 * 1e-30, 
      -1e-300 * 1e-30 )
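Underflow is gradual: below the smallest *normalized* float (`sys.float_info.min`, about 2.2e-308) there are *subnormal* numbers with fewer significant bits, down to about 5e-324. A minimal sketch of where values finally round to zero:

```python
import sys
import numpy as np

smallest_normal = sys.float_info.min         # about 2.2e-308
smallest_subnormal = np.nextafter(0.0, 1.0)  # about 4.9e-324, printed as 5e-324

# Halving the smallest normal float still gives a (subnormal) nonzero value
print(smallest_normal / 2 > 0)     # True

# Halving the smallest subnormal rounds to zero: nothing smaller exists
print(smallest_subnormal / 2)      # 0.0

# The sign survives underflow: the result is -0.0, which compares equal to 0.0
z = -1e-300 * 1e-30
print(z, z == 0.0, np.signbit(z))  # -0.0 True True
```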

In [None]:
# Specify inf and NaN
print( np.inf, 
      -np.inf, 
      np.nan )

In [None]:
# Operations producing NaN
print( 1e300 * 1e30 * 0,
        np.inf * 0,
        np.inf - np.inf,
        np.inf / np.inf, )

In [None]:
# Arithmetic involving NaN results in NaN
print(  2 * np.nan,
        2 + np.nan, )
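Because NaN propagates, a single NaN can contaminate a whole calculation, and it also breaks equality tests (`nan == nan` is `False`). A short sketch of detecting NaN and of NumPy's NaN-aware reductions such as `np.nanmean`:

```python
import numpy as np

x = np.nan
print(x == x)         # False: NaN is not equal to anything, even itself
print(np.isnan(x))    # True: use isnan to detect NaN

# NaN used as a missing-data marker propagates through ordinary reductions
data = np.array([1.0, np.nan, 3.0])
print(np.mean(data))     # nan
print(np.nanmean(data))  # 2.0: the NaN-aware version skips NaN entries
```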

### Key messages
* Floating-point calculations are *inexact* because of rounding. The effect is usually small, but rounding errors can sometimes accumulate (advanced topic).
* Floating-point numbers should be compared within a tolerance, not for exact equality.
* Integers (Python `int`) are discrete and exact. They have no roundoff error, so it's fine to test two integers for equality (e.g. `i == j`).
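The first and third messages can be illustrated in a few lines. A sketch using the standard-library `math.fsum`, which compensates for rounding accumulated over a sum; note that Python `int`s are exact at any size, but only integers up to $2^{53}$ are guaranteed to be exactly representable as floats.

```python
import math

# Rounding errors accumulate: ten additions of 0.1 do not give exactly 1
total = sum([0.1] * 10)
print(total)                  # 0.9999999999999999
print(total == 1.0)           # False
print(math.fsum([0.1] * 10))  # 1.0: fsum tracks the lost low-order bits

# Integer arithmetic is exact, but converting large ints to float is not:
# above 2**53 not every integer fits in the 53-bit significand
n = 2**53
print(n + 1 == n)                # False: exact integer arithmetic
print(float(n + 1) == float(n))  # True: n + 1 rounds to n as a float
```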