# Number Representation and Precision

Real numbers are stored with a finite precision (set by the mantissa) and a finite exponent range. The mantissa contains the significant figures of the number (and thereby its precision). A number like $(9.90625)_{10}$ in the decimal representation is given in the binary representation by

$(1001.11101)_2 = 1\times2^3 + 0\times2^2 + 0\times2^1 + 1\times2^0 + 1\times2^{-1} + 1\times2^{-2} + 1\times2^{-3} + 0\times2^{-4} + 1\times2^{-5}$

and it has an exact machine number representation since we need a finite number of bits to represent this number. This representation is however not very practical. Rather, we prefer to use a scientific notation. In the decimal system we would write a number like 9.90625 in what is called the normalized scientific notation. This means simply that the decimal point is shifted and appropriate powers of 10 are supplied. Our number could then be written as
$9.90625 = 0.990625 \times 10^1$,
and a real non-zero number could be generalized as
$x = \pm r \times 10^n$,
with $r$ a number in the range $1/10 \le r < 1$. In a similar way we can represent a binary number in scientific notation as
$x = \pm q \times 2^m$,
with $q$ a number in the range $1/2 \le q < 1$.
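
As a quick check of the two representations above, the sketch below evaluates the binary expansion of 9.90625 digit by digit and then asks Python for the normalized binary form. It relies on the standard-library function `math.frexp`, which returns a pair `(q, m)` with $1/2 \le |q| < 1$ and $x = q \times 2^m$; the variable names are only illustrative.

```python
import math

# Evaluate (1001.11101)_2 digit by digit: integer part plus negative powers of 2
int_bits, frac_bits = "1001", "11101"
value = int(int_bits, 2) + sum(int(b) * 2.0 ** -(k + 1) for k, b in enumerate(frac_bits))
print(value)   # 9.90625

# Normalized binary scientific notation: x = q * 2**m with 1/2 <= q < 1
q, m = math.frexp(9.90625)
print(q, m)    # 0.619140625 4
```

Here $0.619140625 \times 2^4 = 9.90625$, so the number is recovered exactly.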

In a typical computer, floating-point numbers are represented in the way described above, but with certain restrictions on q and m imposed by the available word length. In the machine, our number x is represented as

$x = (-1)^s \times \text{mantissa} \times 2^{\text{exponent}}$

where $s$ is the sign bit, and the exponent gives the available range. With a single-precision word, 32 bits, 8 bits would typically be reserved for the exponent, 1 bit for the sign and 23 for the mantissa. 
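
To make the bit layout concrete, here is a minimal sketch that splits a single-precision number into its three fields using the standard-library `struct` module. The helper name `float32_fields` is hypothetical. Note one assumption about conventions: the IEEE 754 format normalizes the mantissa to the range $1 \le 1.f < 2$ with an implicit leading 1 and stores the exponent with a bias of 127, so the stored exponent differs from the $m$ of the $1/2 \le q < 1$ convention used in the text.

```python
import struct

def float32_fields(x):
    """Split the IEEE 754 single-precision encoding of x into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 explicitly stored mantissa bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(9.90625)
print(sign, exponent, f"{fraction:023b}")

# Rebuild the value; the leading 1 of the mantissa is implicit
rebuilt = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)
print(rebuilt)   # 9.90625
```

Since $9.90625 = (1.00111101)_2 \times 2^3$, the stored exponent is $3 + 127 = 130$ and the fraction field begins with the bits `00111101`.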

## 32-bit – single precision:

Sign bit: 1 bit

Exponent: 8 bits

Significand precision: 24 bits (23 explicitly stored)

This gives 6–9 significant decimal digits of precision.

## 64-bit – double precision:

Sign bit: 1 bit

Exponent: 11 bits

Significand precision: 53 bits (52 explicitly stored)

This gives 15–17 significant decimal digits of precision.
This is the default for floats in Python.
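
The double-precision parameters listed above can be read off from the running interpreter. A short sketch, assuming a platform where Python floats are IEEE 754 doubles (true of essentially all current systems), using the standard-library `sys.float_info`:

```python
import sys

info = sys.float_info
print(info.mant_dig)        # bits in the significand, including the implicit leading bit
print(info.dig)             # decimal digits that always survive a round trip
print(info.epsilon)         # gap between 1.0 and the next representable float
print(info.max, info.min)   # largest finite and smallest normalized positive values
```

On such a platform `mant_dig` is 53 and `dig` is 15, matching the figures above.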


## 128-bit – quadruple precision:

Sign bit: 1 bit

Exponent: 15 bits

Significand precision: 113 bits (112 explicitly stored)

This gives 33–36 significant decimal digits of precision.


## 256-bit – octuple precision:

Sign bit: 1 bit
    
Exponent: 19 bits
    
Significand precision: 237 bits (236 explicitly stored)

This is rarely implemented.


# Precision effects

One important consequence of rounding error is that you should **NEVER use an if statement to test equality of two floats.** For instance, you should never, in any program, have a statement like:

In [None]:
x = 3 * 1.1
if x == 3.3:
    print(x)

If you need to trigger logic based on the value of a float, test whether it lies within a small tolerance of the target instead:

In [2]:
x = 3 * 1.1
epsilon = 1e-12
if abs(x - 3.3) < epsilon:
    print(x)

3.3000000000000003
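
The same tolerance test can be written with the standard-library function `math.isclose`, which by default compares with a relative tolerance (`rel_tol=1e-09`) and accepts an optional absolute tolerance `abs_tol`. This is a small sketch of that alternative, not a replacement for choosing a tolerance suited to your problem:

```python
import math

x = 3 * 1.1
print(x == 3.3)               # False: the two values differ by a rounding error
print(math.isclose(x, 3.3))   # True with the default rel_tol=1e-09

# A purely relative test cannot succeed against zero, so pass abs_tol there
print(math.isclose(1e-15, 0.0))                  # False
print(math.isclose(1e-15, 0.0, abs_tol=1e-12))   # True
```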

## Which operations are most important in dealing with precision?

__Subtraction__ and __Derivatives__

## Subtraction

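$a = b - c$

We have $fl(a) = fl(b) - fl(c) = a(1+\epsilon_a)$, or

$fl(a) = b(1+\epsilon_b) - c(1+\epsilon_c)$

So, dividing by $a = b - c$,

$fl(a)/a = 1 + \epsilon_b (b/a) - \epsilon_c (c/a)$

If $b \approx c$, then $a$ is small compared with $b$ and $c$, so the factors $b/a$ and $c/a$ are large and the relative error on $fl(a)$ can be greatly amplified.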
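
As a rough worked bound, assume (this is an assumption, though it holds for correctly rounded IEEE arithmetic) that the individual rounding errors satisfy $|\epsilon_b|, |\epsilon_c| \le \epsilon_m$, where $\epsilon_m$ is the machine precision. Comparing the two expressions for $fl(a)$ then gives

$$|\epsilon_a| = \left|\epsilon_b \frac{b}{a} - \epsilon_c \frac{c}{a}\right| \le \epsilon_m \, \frac{|b| + |c|}{|b - c|}$$

For doubles $\epsilon_m \approx 1.1 \times 10^{-16}$. With $b \approx c \approx 1$ and $b - c \approx 10^{-14}$, the factor $(|b|+|c|)/|b-c|$ is about $2 \times 10^{14}$, so the relative error of the difference can reach a couple of percent even though $b$ and $c$ are each stored to 16 figures.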


If we have:

$x = 1000000000000000$

$y = 1000000000000001.2345678901234$

as far as the computer is concerned:
    

In [3]:
x = 1000000000000000
y = 1000000000000001.2345678901234
 
print(y-x) 

1.25


**The true result should be 1.2345678901234!**

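In other words, instead of 16-figure accuracy we now have a result that agrees with the true value to only two figures, and the fractional error is about one percent of the true value. Subtracting two nearly equal numbers has discarded almost all of the significant digits.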
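
To see where the 1.25 comes from, one can inspect the value that is actually stored for $y$. The sketch below uses `decimal.Decimal`, which converts a float to its exact decimal value, and `math.ulp` (available from Python 3.9), which gives the spacing between adjacent doubles near a given value:

```python
import math
from decimal import Decimal

x = 1000000000000000
y = 1000000000000001.2345678901234

print(Decimal(y))        # exact value of the double actually stored for y
print(math.ulp(1e15))    # spacing between adjacent doubles near 1e15
print(y == 1000000000000001.25)
```

Near $10^{15}$ adjacent doubles are 0.125 apart, so $y$ is rounded to the nearest multiple of 0.125, namely 1000000000000001.25, before the subtraction ever happens.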


To see another example of this in practice, consider two numbers:

$x = 1$, and $y = 1+10^{-14}\sqrt 2$

We can see directly that:

$10^{14} (y - x) = \sqrt 2$

Let us try the same calculation in Python:
 

In [12]:
from math import sqrt
x = 1.0
y = 1.0 + (1e-14)*sqrt(2)

print((1e14)*(y-x))
print(sqrt(2))

1.4210854715202004
1.4142135623730951


Again the result is off, this time by about half a percent. We need to be careful in how we code math!
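
The size of this error can be estimated from the subtraction formula $fl(a)/a = 1 + \epsilon_b (b/a) - \epsilon_c (c/a)$. The sketch below is a rough estimate under stated assumptions: $x = 1$ is stored exactly, storing $y$ costs a relative rounding error of at most half of `sys.float_info.epsilon`, and the other rounding errors are negligible by comparison. The names `observed` and `estimate` are illustrative.

```python
import sys
from math import sqrt

x = 1.0
y = 1.0 + 1e-14 * sqrt(2)

computed = 1e14 * (y - x)
observed = abs(computed - sqrt(2)) / sqrt(2)   # actual relative error

# Storing y incurs a relative rounding error of at most epsilon/2;
# the subtraction magnifies it by the factor y / (y - x).
estimate = (sys.float_info.epsilon / 2) * y / (y - x)

print(observed, estimate)
```

The observed error of about half a percent sits below the estimated worst case, which is just under one percent.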

## Example 1:  Summing $1/n$ 

Consider the series:

$$s_1 = \sum_{n=1}^N \frac{1}{n}$$ which is finite when $N$ is finite. Then consider

$$s_2 = \sum_{n=N}^1 \frac{1}{n}$$ which is the same sum taken in reverse order, so analytically $s_2 = s_1$.

In [16]:
s1, s2= 0, 0
for n in range(1,100000001):
    s1 = s1 + 1/n
for m in range (100000000,0,-1):
    s2 = s2 + 1/m
print (s1, s2)

18.997896413852555 18.997896413853447
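
The two results differ because floating-point addition is not associative: in the forward sum the small late terms are added to an already large running total and lose their low-order digits, while the backward sum accumulates the small terms first. As a hedged check, the sketch below repeats the experiment with a smaller $N$ (an arbitrary choice, to keep the run time short) and compares both orderings against the standard-library `math.fsum`, which tracks the rounding errors and returns a correctly rounded sum.

```python
import math

N = 10**6   # smaller than in the cell above, chosen only for speed
terms = [1.0 / n for n in range(1, N + 1)]

forward = sum(terms)              # large terms first
backward = sum(reversed(terms))   # small terms first
reference = math.fsum(terms)      # correctly rounded sum of the same terms

print(forward, backward, reference)
print(abs(forward - reference), abs(backward - reference))
```

One would expect the backward sum to land closer to the reference, which is why summing from the smallest terms upward is the usual recommendation for series like this.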


## Example 2: $e^{-x}$

There are three possible algorithms for $e^{-x}$:

1) Simple: $$e^{-x} = \sum_{n=0}^{\infty} (-1)^n \; \frac{x^n}{n!}$$  

2) Recursion: $$e^{-x} = \sum_{n=0}^{\infty} s_n = \sum_{n=0}^{\infty} (-1)^n \; \frac{x^n}{n!}$$  where $$s_n = -s_{n-1} \frac{x}{n}, \quad s_0 = 1$$

3) Inverse:  $$e^{x} = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$  Then take the inverse:   $$e^{-x} = \frac{1}{e^{x}}$$


In [111]:
import numpy as np
from math import factorial

def e_minusx_simple(x):
    # Direct sum of the alternating series, terms n = 0 .. 100
    total = 0.0
    for n in range(0, 101):
        total += (-1)**n * (x**n / factorial(n))
    return total

def e_minusx_inverse(x):
    # Sum the all-positive series for e^x, then take the reciprocal
    total = 0.0
    for n in range(0, 101):
        total += x**n / factorial(n)
    return 1.0 / total

# main code here: compare both algorithms with NumPy's exp(-x)
x = 2
print(x, e_minusx_simple(x), e_minusx_inverse(x), np.exp(-x))


In [95]:
def e_minusx_recurse(x):
    # Build each term from the previous one: s_0 = 1, s_n = -s_{n-1} * x / n
    term = 1.0
    total = term
    for n in range(1, 101):
        term = -term * x / n
        total += term
    return total

print(e_minusx_recurse(x))
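
The three algorithms agree for small $x$, so the difference between them only shows up as $x$ grows. The sketch below compares the alternating series with the inverse method against `math.exp`. It is an illustration under assumptions, not the notebook's own code: the function names, the choice of 200 terms, and the test values of $x$ are all hypothetical, and both sums build their terms with the recursion $s_n = s_{n-1}\,x/n$ rather than with factorials.

```python
import math

def exp_neg_series(x, nterms=200):
    """Sum the alternating series for e^{-x} term by term."""
    term, total = 1.0, 1.0
    for n in range(1, nterms):
        term *= -x / n
        total += term
    return total

def exp_neg_inverse(x, nterms=200):
    """Sum the all-positive series for e^{x}, then take the reciprocal."""
    term, total = 1.0, 1.0
    for n in range(1, nterms):
        term *= x / n
        total += term
    return 1.0 / total

def rel_err(approx, exact):
    return abs(approx - exact) / exact

# Print x, then the relative error of each method
for x in (1.0, 10.0, 30.0):
    exact = math.exp(-x)
    print(x, rel_err(exp_neg_series(x), exact), rel_err(exp_neg_inverse(x), exact))
```

For large $x$ the alternating series adds and subtracts terms many orders of magnitude larger than the final answer, so the subtraction errors discussed earlier swamp the result, while the inverse method only ever adds positive terms and stays accurate.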

In [96]:
import numpy as np
np.exp(-1) 

0.36787944117144233

In [91]:
x = 2
for n in range(1, 101):
    # n-th term of the alternating series for e^{-x}
    emxsmp = (-1)**n * (x**n / factorial(n))
    print(emxsmp)

-2.0
2.0
-1.3333333333333333
0.6666666666666666
-0.26666666666666666
0.08888888888888889
-0.025396825396825397
0.006349206349206349
-0.0014109347442680777
0.0002821869488536155
-5.130671797338464e-05
8.551119662230774e-06
-1.3155568711124266e-06
1.879366958732038e-07
-2.5058226116427174e-08
3.132278264553397e-09
-3.685033252415761e-10
4.0944813915730674e-11
-4.3099804121821765e-12
4.3099804121821766e-13
-4.104743249697311e-14
3.731584772452101e-15
-3.244856323871392e-16
2.7040469365594935e-17
-2.163237549247595e-18
1.6640288840366114e-19
-1.2326139881752676e-20
8.80438562982334e-22
-6.071990089533338e-23
4.047993393022225e-24
-2.6116086406595e-25
1.6322554004121876e-26
-9.892456972195077e-28
5.819092336585339e-29
-3.325195620905908e-30
1.8473309005032823e-31
-9.985572435152878e-33
5.255564439554146e-34
-2.6951612510534083e-35
1.3475806255267041e-36
-6.573564026959533e-38
3.130268584266444e-39
-1.4559388764029971e-40
6.617903983649987e-42
-2.9412906593999945e-43
1.2788220258260844e-44
-