# Floating-point numbers

## Background

1. Computers cannot store every real number exactly. Many real numbers have infinitely many digits ($\pi$, $1/3$), while computers have finite memory.

1. Scientific computing requires representing numbers across many orders of magnitude, from huge (e.g. the mass of a planet) to tiny (e.g. the size of an atom), with good precision. Scientific notation can do that (e.g. $6.02\times 10^{23}$).

Floating-point numbers are a fast, memory-efficient way to represent many (not all) rational numbers.

## Floating-point numbers

A floating-point number consists of a significand (aka mantissa) and an exponent. The decimal point "floats": its position is set by the exponent rather than being fixed.

Example with 5 digits for the significand (base 10):

$12.345 = \underbrace{12345}_\text{significand} \times \underbrace{10}_\text{base}\overbrace{^{-3}}^\text{exponent}$

Some numbers *cannot* be represented exactly with 5 digits (e.g. $7716/625 = 12.3456$) so they are rounded to the nearest number that can be represented (e.g. $12.3456 \Rightarrow 12.346$). 
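The 5-digit rounding described above can be reproduced with Python's standard `decimal` module. This is only a sketch of the toy base-10 format, not of how binary floats are stored; the name `five_digits` is chosen here for illustration.

```python
from decimal import Context

# A decimal context limited to 5 significant digits (the toy format above)
five_digits = Context(prec=5)

# create_decimal rounds its argument to the context's precision
rounded = five_digits.create_decimal("12.3456")
print(rounded)  # 12.346

# Inspect the stored significand digits and the exponent
sign, digits, exponent = rounded.as_tuple()
significand = int("".join(str(d) for d in digits))
print(significand, exponent)  # 12346 -3
```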

Python floats are 64 bits and base 2 ("double precision", i.e. `double` in C and `double precision` in Fortran). This corresponds to roughly 15 to 17 significant decimal digits and to magnitudes from about $10^{-308}$ to $10^{+308}$.
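These limits can be inspected from Python itself through `sys.float_info`. The values in the comments assume the usual IEEE 754 64-bit format, which is what mainstream platforms use.

```python
import sys

info = sys.float_info
print(info.mant_dig)  # bits in the significand: 53
print(info.dig)       # decimal digits that always survive a round trip: 15
print(info.max)       # largest finite float, about 1.8e308
print(info.min)       # smallest positive normal float, about 2.2e-308
print(info.epsilon)   # gap between 1.0 and the next float, about 2.2e-16
```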

In [None]:
# To illustrate float precision issues, 
# we will display numbers with 25 decimal places
precise = "{:.25f}"

In [27]:
# Many rational numbers cannot be represented exactly as floats

# How does Python represent 0.1, 0.2, 0.3 as floats?
a = 0.1
b = 0.2
c = 0.3
print( precise.format(a) )
print( precise.format(b) )
print( precise.format(c) )

0.1000000000000000055511151
0.2000000000000000111022302
0.2999999999999999888977698
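To see why `0.1` is not stored exactly, it can help to look at the fraction that the float actually holds. The sketch below uses only the standard library; the variable name `tenth` is chosen here for illustration.

```python
from decimal import Decimal
from fractions import Fraction

tenth = 0.1

# Every float is an integer divided by a power of two
numerator, denominator = tenth.as_integer_ratio()
print(numerator, denominator)
print(denominator == 2**55)  # True

# Decimal(float) shows the exact stored value, with no rounding
print(Decimal(tenth))

# The stored value is close to, but not equal to, 1/10
print(Fraction(tenth) == Fraction(1, 10))  # False
```

Because 1/10 has no finite expansion in base 2, the nearest representable fraction is stored instead.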


In [30]:
# Rounding of floats introduces small errors in calculations
# Both of these should exactly equal 0.3, but rounding errors make them slightly different
print( precise.format( 0.1 + 0.2 ) )
print( precise.format( 0.3 ) )

0.3000000000000000444089210
0.2999999999999999888977698


In [31]:
# Another example of rounding errors
# Both of these should equal exactly 1.0, but rounding errors make them slightly different
print( precise.format( (1/51.) * 51 ) )
print( precise.format( (1/49.) * 49 ) )

1.0000000000000000000000000
0.9999999999999998889776975
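Rounding errors can also build up over repeated operations. As a small sketch, adding `0.1` ten times in a loop does not give exactly `1.0`, while the standard library's `math.fsum` tracks the lost digits and returns the correctly rounded sum.

```python
import math

values = [0.1] * 10

# Accumulate one addition at a time; each step rounds the running total
running_total = 0.0
for v in values:
    running_total += v

print(running_total)          # 0.9999999999999999
print(running_total == 1.0)   # False

# math.fsum compensates for the intermediate rounding
print(math.fsum(values))      # 1.0
```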


In [None]:
# Testing equality of floats can fail because of roundoff errors
c = 0.3
d = 0.1 + 0.2

print( c == d ) # We expect this to be True because 0.1 + 0.2 should equal 0.3, but it is False

False


In [None]:
import numpy as np

# Instead of testing equality of floats, 
# test if the difference is smaller than a tolerance
# Tolerance can be absolute or relative

# Absolute tolerance
print( np.abs(c-d) < 1e-10 )

# Relative tolerance
print( np.abs(c-d) / np.abs(c) < 1e-10 )

True
True


In [35]:
# Numpy has a function for testing if two floats are close

# Tolerances can be absolute or relative
x = 1e10
y = x + 10

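# np.isclose tests abs(x-y) <= atol + rtol * abs(y); defaults are rtol=1e-5, atol=1e-8
print( np.isclose( x, y, rtol=0, atol=10 ) )     # absolute tolerance only: abs(x-y) <= atol
print( np.isclose( x, y, rtol=1e-9, atol=0 ) )   # relative tolerance only: abs(x-y) <= rtol * abs(y)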

# Usually relative tolerance is preferred

True
True
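The standard library offers a similar check, `math.isclose`, which uses a relative tolerance by default (`rel_tol=1e-9`, `abs_tol=0.0`). The sketch below also shows one case where a purely relative tolerance is not enough: comparing against zero. The value `tiny` is an arbitrary example.

```python
import math

# Relative tolerance handles the 0.1 + 0.2 case
print(math.isclose(0.1 + 0.2, 0.3))           # True

# Near zero, a relative tolerance scales down to almost nothing,
# so an absolute tolerance is needed as well
tiny = 1e-12
print(math.isclose(tiny, 0.0))                # False
print(math.isclose(tiny, 0.0, abs_tol=1e-9))  # True
```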


### Special float values

***Overflow*** occurs when the maximum exponent (+308) is exceeded. The value becomes `inf`.

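***Underflow*** occurs when a result is closer to zero than the smallest representable float (exponent below about -308, or about -324 counting subnormal numbers). The value becomes `0`.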

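***NaN*** (not a number) occurs when an operation has no well-defined result, such as `inf * 0`, `inf - inf`, or `0/0` with NumPy floats. (Plain Python floats raise `ZeroDivisionError` on division by zero instead.) NaN is sometimes used to mark missing data.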

In [6]:
# Overflow
print( 1e300 * 1e30, 
      -1e300 * 1e30 )

inf -inf


In [7]:
# Underflow
print( 1e-300 * 1e-30, 
      -1e-300 * 1e-30 )

0.0 -0.0
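The exact thresholds can be probed directly. This sketch assumes the usual IEEE 754 64-bit floats, where `5e-324` is the smallest positive (subnormal) value. It also shows that some Python operations, such as `**`, raise `OverflowError` rather than returning `inf`.

```python
import sys

# Going past the largest finite float gives inf
largest = sys.float_info.max
print(largest * 2)             # inf

# Going below the smallest positive float gives 0.0
smallest_subnormal = 5e-324
print(smallest_subnormal > 0)  # True
print(smallest_subnormal / 2)  # 0.0

# Float exponentiation raises an exception instead of returning inf
base = 10.0
overflow_raised = False
try:
    base ** 400
except OverflowError as err:
    overflow_raised = True
    print("OverflowError:", err)
```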


In [8]:
# Specify inf and NaN
print( np.inf, 
      -np.inf, 
      np.nan )

inf -inf nan


In [9]:
# Operations producing NaN
print( 1e300 * 1e30 * 0,
        np.inf * 0,
        np.inf - np.inf,
        np.inf / np.inf, )

nan nan nan nan


In [10]:
# Arithmetic with NaN results in NaN
print(  2 * np.nan,
        2 + np.nan, )

nan nan
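One consequence worth knowing is that NaN is not equal to anything, including itself, so `==` cannot be used to detect it. A minimal sketch with the standard library follows; the list `readings` is made-up data standing in for measurements with a missing value.

```python
import math

nan = float("nan")

# NaN compares unequal to everything, even itself
print(nan == nan)        # False

# Use math.isnan (or np.isnan for arrays) to detect it
print(math.isnan(nan))   # True

# Skip NaN entries before averaging data with missing values
readings = [1.5, nan, 2.5]
valid = [r for r in readings if not math.isnan(r)]
print(sum(valid) / len(valid))  # 2.0
```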


### Key messages
* Floating-point calculations are *inexact* because of rounding. The effect is usually small, but rounding errors can sometimes accumulate (advanced topic).
* Floating-point numbers should be compared within a tolerance, not tested for exact equality.
* Integer values are discrete. They don't have roundoff error and it's fine to test if two integers are equal (e.g. `i == j`).
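The contrast between integers and floats in the last point can be sketched as follows. It relies on Python integers having arbitrary precision and on 64-bit floats having a 53-bit significand, so whole numbers above $2^{53}$ are not all representable as floats.

```python
big = 2**53

# Integer arithmetic is exact, so equality tests are reliable
print(big + 1 == big)                # False

# As floats, big + 1 rounds back to big
print(float(big) + 1 == float(big))  # True
```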