# Error and Convergence

**ENGSCI233: Computational Techniques and Computer Systems** 

*Department of Engineering Science, University of Auckland*

# 0 What's in this Notebook?

Computer models can be wrong for all sorts of reasons. Sometimes you make some bad assumptions that don't match up with the real world, and so your model departs from reality. Other times, your code is riddled with bugs because you haven't been doing **[quality control](../quality_control/quality_control.ipynb)**. And sometimes it's just wrong because you're working on a computer and computers are imperfect. This notebook addresses the third issue.

On a computer we represent numbers using a **floating point** format. This, together with the approximations made by our algorithms, can lead to several types of error: rounding, representation, getting too close to **computer zero**, error accumulation and truncation. You need to know what causes these errors, when they are likely to occur, and how to avoid them.

So how can we know if a computer is giving us the right answer? One approach is to solve a problem iteratively with a loop, reducing the error on each pass. We then check how much the answer changes each time, and stop once it is no longer changing much. This is called convergence, and the condition to stop is a **convergence test**. 

You need to know:
- Representation error arises because floating point numbers cannot represent every real number exactly, e.g., 0.1 or 1/3.
- Rounding error arises because some calculations produce more precision (digits after the decimal point) than the computer can store.
- Error accumulation occurs when you chain numerous calculations together, each with a tiny error, whose aggregate is a large error.
- Truncation error arises when you get tired of an infinite series or algorithm and have to stop after a finite number of steps.
- Computers cannot distinguish between (sufficiently) tiny numbers and zero, which can lead to problems.
- The uniform convergence test is usually the best choice. 

In [None]:
# imports and environment: this cell must be executed before any other in the notebook
%matplotlib notebook
from error233 import *

## 1 Error in Floating Point Arithmetic

<mark>***Where do these errors arise and how large are they?***</mark>

Consider our inability to write down the decimal representation of 1/3, i.e., 0.333… with an infinite number of recurring 3’s. If we cannot write down a number precisely, it follows that a computer cannot represent the number perfectly either. Indeed, there are a number of issues relating to error, precision and stability that arise from a computer’s shortcomings in representing floating point numbers. 

### 1.1 Computer Storage of Floating Point Numbers

In most computers, real numbers are stored using a floating point representation. In general, floating point representations have a base, $\beta$, and a precision, $p$, and represent a general number as
	
$$x=\pm d.ddd \cdots d\times\beta^e$$

where $e$ is the exponent and $d.ddd\cdots d$ is the significand, which has $p$ digits. More precisely, the floating point representation

$$\pm d_0.d_1 d_2\cdots d_{p-1}\times\beta^e$$

represents the number

$$\pm\beta^e \sum\limits_{i=0}^{p-1} d_i \beta^{-i},\quad 0\leq d_i\lt \beta.$$

For example, consider the floating point representation with $\beta=2$, $e=2$, $p=4$ and a significand of +1.101. This represents the number

$$2^2\left(1\times2^0+1\times2^{-1}+0\times2^{-2}+1\times2^{-3}\right)=4(1+1/2+1/8)=6.5$$
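As a quick check of the formula above, the sketch below evaluates $\pm\beta^e\sum_i d_i\beta^{-i}$ directly from a list of significand digits. The function name `float_value` and its arguments are our own, chosen for illustration; this is not how a computer actually decodes its stored bits.

```python
import math

def float_value(sign, digits, base, exponent):
    """Return sign * base**exponent * sum_i d_i * base**(-i).

    sign     : +1 or -1
    digits   : significand digits [d_0, d_1, ..., d_{p-1}], each 0 <= d_i < base
    base     : the base, beta
    exponent : the exponent, e
    """
    if any(not 0 <= d < base for d in digits):
        raise ValueError("each digit must satisfy 0 <= d_i < base")
    # digit i contributes d_i * base**(-i); fsum adds the terms accurately
    significand = math.fsum(d * base**(-i) for i, d in enumerate(digits))
    return sign * base**exponent * significand

# beta = 2, e = 2, p = 4 and significand +1.101 (the example above)
print(float_value(+1, [1, 1, 0, 1], base=2, exponent=2))    # 6.5
```

Changing `digits` and `exponent` lets you check your answers to the exercise below.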

To store a floating point number in a computer’s memory, its representation must be encoded using a number of binary digits, or bits. In most modern computers, this encoding uses a base of $\beta=2$. The precision $p$ and the range of the exponent $e$ are determined by the type of encoding and the number of bits used to store the number, and hence can vary between computers. However, most computers support two types of floating point numbers: single precision and double precision. Double precision numbers have a larger $p$ (though not necessarily double) and a wider range of $e$ than single precision numbers. The exact form of the encodings used in computers is covered in the Computer Systems section of the course.


In [None]:
# run this cell and then complete the exercise below
fun_with_floats()

***Use the dropdown and checkboxes above to construct a representation of 6.5. What is the exponent and significand for the numbers: 4, 7.5, 1.5625, 0.3125?***

> <mark>*~ your answer here ~*</mark>

#### 1.1.1 Commonly Encountered Floating Point Types



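The table below outlines the three floating point precisions commonly encountered in Python (as NumPy types). The significand and exponent columns give the number of bits stored for each; the leading 1 bit of the significand is implied rather than stored, so the precision $p$ is one more than the number of stored significand bits. Decimal precision is the approximate number of significant decimal digits to which the floating point number is accurate.


|  name   | significand bits | exponent bits | approx. decimal precision (digits) |        type         |
|    -    |        -         |       -       |                 -                  |          -          |
|  single |        23        |       8       |                 6                  | ```numpy.float32``` |
|  double |        52        |       11      |                 15                 | ```numpy.float64``` |
|   half  |        10        |       5       |                 3                  | ```numpy.float16``` |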
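The entries in this table can be checked from within Python using NumPy's `np.finfo`, which reports the properties of each floating point type. A short sketch:

```python
import numpy as np

# np.finfo attributes used here:
#   nmant     - number of stored significand (mantissa) bits
#   nexp      - number of exponent bits
#   precision - approximate number of significant decimal digits
for name, ftype in [("half", np.float16), ("single", np.float32), ("double", np.float64)]:
    info = np.finfo(ftype)
    print("{:6s} significand={:2d} exponent={:2d} decimal digits={:2d}".format(
        name, info.nmant, info.nexp, info.precision))
```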

### 1.2 Representation Error


In [None]:
# more fun with floats - class exercise
fun_with_floats()


Consider using $n$ bits of information to store a floating point number. With $n$ bits, we can form $2^n$ distinct bit patterns, each of which can represent at most one floating point number. However, there are infinitely many real numbers, so representing all of them would require an infinite number of bits. In other words, with a finite number of bits we can only represent a finite subset of the infinitely many real numbers. As a result, not every real number can be represented exactly as a floating point number.

For example, consider representing the real number 0.1 as a single precision IEEE floating point number. The closest base-2 significand is 1.10011001100110011001101, which corresponds to

$$1+\frac{1}{2}+\frac{1}{16}+\frac{1}{32}+\frac{1}{256}+\frac{1}{512}+\frac{1}{4096}+\frac{1}{8192}+\frac{1}{65536}+\frac{1}{131072}+\frac{1}{1048576}+\frac{1}{2097152}+\frac{1}{8388608}=\frac{13421773}{8388608}.$$

With an exponent of $-4$, this significand represents the number $2^{-4}\times\frac{13421773}{8388608}=\frac{13421773}{134217728}=0.100000001490116119384765625$, which is slightly larger than 0.1. 


In [None]:
import numpy as np
# computing the decimal as a fraction
a = np.float32(1./10)
print("{:32.32f}".format(a))

# computing its exact representation
print("{:32.32f}".format(13421773./134217728))

To try to correct for the extra 0.00000001490116119384765625, we could set the last bit of the significand to 0, giving 1.10011001100110011001100. With an exponent of $-4$, this represents the number $\frac{13421772}{134217728}=0.0999999940395355224609375$, which is slightly smaller than 0.1.

In [None]:
# computing the next smaller representation
print("{:32.32f}".format(13421772./134217728))

Increasing the precision does not help: in base 2, 0.1 has the infinitely recurring expansion $0.0\overline{0011}$ (just as 1/3 recurs in base 10), so it cannot be represented exactly with any finite number of significand bits. This error in representing some floating point numbers is known as **representation error**.
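One way to see the representation error exactly, rather than through a long print format, is Python's standard `fractions.Fraction`, which converts a float to the exact rational number it stores. A minimal sketch, assuming NumPy's three float types from the table above:

```python
from fractions import Fraction
import numpy as np

# Fraction(float) is exact, so subtracting Fraction(1, 10) gives the true
# representation error of 0.1 at each precision.
for name, ftype in [("half", np.float16), ("single", np.float32), ("double", np.float64)]:
    stored = Fraction(float(ftype(0.1)))       # exact value held by the float
    error = stored - Fraction(1, 10)           # representation error (exact)
    print("{:6s} stored = {}  error = {:.3e}".format(name, stored, float(error)))
```

The error shrinks as the precision increases, but it never reaches zero.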

***Modify the code above to compute 0.1 using half and double precision floats (you may need to extend the print format length).***

***What is the representation error when computing $1/2^n$ for integer $n$? Explain your answer.***

> <mark>*~ your answer here ~*</mark>


### 1.3 Rounding Error

**A short example:**

Recall the infinite series that defines $e^x$, and suppose we wish to compute the value of $e$ by summing terms for $x=1$.

\begin{equation}
e^x = 1 + x+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots = \sum\limits_{n=0}^{\infty} \frac{x^n}{n!}
\end{equation}

Define: 
 
 - $e_i$, the estimate for $e$ obtained by summing the first $i+1$ terms
 - $\Delta e_i$, the extra term added to $e_{i-1}$ to obtain $e_i$ (i.e., $x^i/i!$, so that $e_i=e_{i-1}+\Delta e_i$)
 
***Execute the cell below to see how rounding error comes to dominate the sequence.***

In [None]:
# this function computes the term Delta e_i = x^i/i! for x = 1
def delta_e(i, prec=np.float32):
    ''' Returns the i^th term in the exponential series for exp(1).
        
        Notes
        -----
        Must return a float of precision PREC
    '''
    i = prec(i)            # set i to appropriate precision
    num = prec(1.)**i      # compute numerator, x^i with x = 1
    den = np.prod(np.arange(1,i+1,dtype=prec))  # denominator i! (an empty product is 1, so 0! = 1)
    de = num/den
    
    return de

exponential_example(delta_e)

***How does rounding error change with precision?***

> <mark>*~ your answer here ~*</mark>

***Why does our estimate of $e$ stop changing after a while?***

> <mark>*~ your answer here ~*</mark>

***Why is there no rounding error when computing the first two terms of $e$?***

> <mark>*~ your answer here ~*</mark>

Rounding error is associated with operations involving floating point numbers. Consider the case of multiplying two floating point numbers that each contain $n$ bits in the significand. The resulting product, if computed exactly, would have up to $2n$ bits in the significand, but it must be stored in a floating point representation that contains only $n$ bits, i.e., the trailing $n$ bits are rounded off. For example, consider squaring the single precision number $a=1+\frac{1}{2^n}$ with $n=8$, for which the exact product is $a^2=1+\frac{1}{2^7}+\frac{1}{2^{16}}$. Both `a` and the product `a*a` are represented exactly (i.e., there is no **representation error**), and the single precision significand (23 stored bits) is long enough to hold the product, so no **rounding error** occurs.

In [None]:
a32=np.float32(1.+1./2**8)          # np.float32 = a single precision number
print("  a = {:18.18f} \na*a = {:18.18f}".format(a32,a32*a32))

Now, imagine we perform the same calculation, this time using half precision numbers.

In [None]:
a16=np.float16(1.+1./2**8)          # np.float16 = a half precision number
print("  a = {:18.18f} \na*a = {:18.18f}".format(a16,a16*a16))

The number `a` is still represented exactly; now, however, the product `a*a` is subject to rounding error, because the half precision significand (10 stored bits) is too short to hold the $\frac{1}{2^{16}}$ term of the exact result.
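To make the size of this rounding error explicit, the sketch below compares each floating point product with the exact product computed using Python's `fractions.Fraction`, which performs exact rational arithmetic. The variable names are illustrative.

```python
from fractions import Fraction
import numpy as np

a_exact = 1 + Fraction(1, 2**8)
product_exact = a_exact * a_exact        # 1 + 2**-7 + 2**-16, needs 16 bits after the leading 1

for name, ftype in [("half", np.float16), ("single", np.float32)]:
    a = ftype(1. + 1./2**8)              # a itself is stored exactly in both types
    product = a * a                      # result is rounded to the type's precision
    error = Fraction(float(product)) - product_exact
    print("{:6s} a*a = {:.18f}  rounding error = {}".format(name, float(product), error))
```

In half precision the $\frac{1}{2^{16}}$ term is rounded away, so the error is exactly $-\frac{1}{2^{16}}$; in single precision it is zero.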

***- - - - CLASS CODING EXERCISE - - - -***

In [None]:
# PART ONE
# --------
# Keep making small_N smaller until adding it to big_N has no effect.
big_N = 1.e3
small_N = 1.e-3

print(big_N + small_N)

# How large is the error in this operation?

In [None]:
# PART TWO
# --------
# An iterative algorithm keeps a running total of small numbers.
# Make small_N small enough that adding it to total has no effect.
total = 1.e3
small_N = 1.e-3

for i in range(10000):
    total += small_N
    
print(total)

# How large is the error in this code?

In [None]:
# OPTIONAL CHALLENGE
# ------------------
# What THREE kinds of errors are present?

### 1.4 Error Accumulation


In general, every floating point operation may incur some representation and rounding error. This can cause problems if the error is allowed to **accumulate**. To demonstrate accumulating representation error, consider the following Python function, which finds the sum of a vector’s components:

In [None]:
def sum(n, u):
    """ Computes the sum of components of length N vector U.
    """
    # initialise the running total with the same precision as U
    total = type(u[0])(0.)

    # add each component to the running total
    for i in range(n):
        total = total + u[i]

    return total

We create a vector with components of the form $1+\frac{1}{2^m}$, where $m$ is a random integer between 1 and 11; if $m$ is large enough, the component cannot be stored exactly at the specified floating point precision. Defining a vector, $\mathbf{u}_j$, with either half, single or double precision components ($j$= 16, 32 or 64):

In [None]:
# length of vector
from numpy.random import randint
n = 100001

# create random vector
u64 = 1.+1/2**randint(1,12,n)  # double precision

u32 = np.float32(u64)          # single precision

u16 = np.float16(u64)          # half precision

Then, summing up the difference between the first $i$ components of $\mathbf{u}_{64}$ and either $\mathbf{u}_{32}$ or $\mathbf{u}_{16}$:

In [None]:
# compute vector sum of increasing number of components
for i in [10,100,1000,10000,100000]:
    # compute vector sums
        # compare single and double precision
    e32 = sum(i,u64[:i]-u32[:i])
        # compare half and double precision
    e16 = sum(i,u64[:i]-u16[:i])
    
    # display to screen
    print(("i={:7d}: e32={:18.18f}  e16={:18.18f}").format(i,e32,e16))

Note that there is no representation error when using single precision (the smallest possible vector component, $1+\frac{1}{2^{11}}$, is represented exactly by a single precision float), whereas there is accumulating error when using half precision (with only 10 stored significand bits, $1+\frac{1}{2^{11}}$ is rounded to 1).

To demonstrate accumulating rounding error, consider the following Python function, which finds the dot product of two vectors by multiplying and adding components:


In [None]:
def dot(n, u, v):
    """ Computes the dot product of length N vectors U and V.
    """
    # initialise the running total with the same precision as U
    total = type(u[0])(0.)

    # accumulate the products of corresponding components
    for i in range(n):
        total = total + u[i]*v[i]

    return total

Computing the dot product of $\mathbf{u}_{32}$ (which we established above is free of representation error) with itself, $\mathbf{u}_{32}\cdot\mathbf{u}_{32}$, and comparing this with $\mathbf{u}_{64}\cdot\mathbf{u}_{64}$:

In [None]:
# compute dot product of increasing numbers of components
for i in [10,100,1000,10000,100000]:
    # compute dot product
        # at high precision
    d64 = dot(i,u64[:i],u64[:i])
        # at low precision
    d32 = dot(i,u32[:i],u32[:i])
    
    # error between low and high precision dot products
    e32 = d32-d64
    
    # display to screen
    print(("i={:7d}: e32={:18.18f}").format(i,e32))

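In this case, we observe accumulating rounding error. For these particular vectors, each product `u[i]*v[i]` is actually exact in single precision (it needs at most 22 bits after the leading 1), so the losses occur in the running-total additions: as the total grows, its significand can no longer hold the small fractional part of each new term, and a little accuracy is lost with every operation.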
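The sketch below checks where the rounding happens, using exact rational arithmetic from `fractions.Fraction`. It assumes the same component form $1+\frac{1}{2^m}$ with $1\le m\le11$, and uses 100000 as a stand-in for the size the running total reaches.

```python
from fractions import Fraction
import numpy as np

# Each product (1 + 2**-m)**2 = 1 + 2**-(m-1) + 2**-(2m) needs at most 22 bits
# after the leading 1 for m <= 11, so it fits in the 23 stored bits of a float32.
for m in range(1, 12):
    u = np.float32(1. + 1./2**m)
    exact = (1 + Fraction(1, 2**m))**2
    assert Fraction(float(u*u)) == exact    # no rounding in the multiplication

# Adding a term of the same kind to a large running total does lose bits:
total = np.float32(100000.)
term = np.float32(1. + 1./2**11)
print(total + term)                          # the 2**-11 part is rounded away
print(Fraction(float(total + term)) - (Fraction(100000) + Fraction(float(term))))
```

Near 100000, neighbouring single precision numbers are $2^{-7}$ apart, so the $2^{-11}$ part of the term cannot survive the addition.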

### 1.5 Truncation Errors


Truncation error is a property of the method or algorithm that is used to compute a value. Truncation error is incurred even if the numbers in a calculation are represented exactly (i.e., with infinite precision) and the floating point operations are performed without rounding error. For example, consider computing $e^x$ with

$$e^x=1+\sum\limits_{i=1}^\infty \frac{x^i}{i!}=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots$$

Because the formula involves an infinite sum, we can only approximate its value using a finite sum, e.g.,

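$$e^x=\underbrace{1+\sum\limits_{i=1}^3 \frac{x^i}{i!}}_{\text{finite sum}}+\underbrace{\sum\limits_{i=4}^\infty \frac{x^i}{i!}}_{\text{truncation error}}=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\text{truncation error}$$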
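A minimal sketch of truncation error for $x=1$, comparing partial sums of the series with `math.exp`. The helper `exp_series` is our own; it uses an explicit loop rather than the built-in `sum`, which this notebook redefines later. In double precision, rounding error (around $10^{-16}$) is negligible here compared with the truncation error.

```python
import math

def exp_series(x, n_terms):
    """Sum the first n_terms terms x**i / i! of the exponential series."""
    total = 0.
    for i in range(n_terms):
        total += x**i / math.factorial(i)
    return total

# Truncation error for e = exp(1) as more terms are kept
for n_terms in [2, 4, 8, 16]:
    approx = exp_series(1., n_terms)
    print("{:2d} terms: approx = {:.15f}  truncation error = {:.2e}".format(
        n_terms, approx, math.exp(1.) - approx))
```

With four terms (the finite sum above), the truncation error is $e-\frac{8}{3}\approx0.0516$.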

### 1.6 Errors in Subtraction


When subtracting two floating point numbers that are roughly the same size, large errors can result. Consider $x-y$ with $x=10.1$ and $y=9.93$, $\beta=10$ and $p=3$. Here $x$ is represented as $1.01\times10^1$ and $y$ as $9.93\times10^0$. For the subtraction to take place, $y$ must be shifted to the same exponent as $x$; with only three digits available (and assuming no extra guard digits are kept during the operation), it becomes $0.99\times10^1$ and its final digit is lost. The result of the floating point computation is hence $x-y=0.02\times10^1=0.2$, whereas the correct answer is 0.17, an error of about 18%.

Consider another case of $x-y$ with $x=1.00000$, $y=0.0000001$, $\beta=10$ and $p=6$. With these parameters, $x$ is represented as $1.00000\times10^0$. Now, to subtract the two numbers, $y$ must be represented as $0.00000\times10^0$, and thus the subtraction gives the result $1.00000\times10^0$. This is the same value as $x$!
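Both effects can also be seen in Python's own double precision floats ($\beta=2$). In IEEE arithmetic, subtracting two nearly equal numbers is itself exact, so in this sketch the large relative error comes from the rounding already present in `1.0 + 1e-15`, which the subtraction then exposes. The variable names are illustrative.

```python
# Absorption: a number too small relative to 1.0 has no effect when added
print(1.0 + 1e-17 == 1.0)                  # True in double precision

# Cancellation: subtracting nearly equal numbers exposes earlier rounding
big = 1.0 + 1e-15                          # 1e-15 is stored only approximately relative to 1.0
diff = big - 1.0                           # the subtraction itself is exact...
rel_error = abs(diff - 1e-15) / 1e-15      # ...but the result is about 11% off
print(diff, rel_error)
```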

The above case suggests that, for any given number, there is a threshold below which adding or subtracting a smaller number leaves the original number unchanged. For the number 1.0, this threshold is known as the machine accuracy or machine precision, $\epsilon$. Numbers sufficiently small compared with $\epsilon$ will appear like “zero”, in the sense that $1.0+\delta=1.0$ even though $\delta\neq0$. This also suggests that “zero” in a computer with finite precision is represented by an interval, as shown in the figure below.

<img src="img/computer_zero.png" alt="Drawing" style="width: 1000px;"/>


An implication of **computer zero** is that if the result of a floating point calculation needs to be tested to see if it is zero, it cannot be compared to an exact value of zero and must be tested to see if it falls within the computer zero range. 

Basically:

**[DON'T](https://i.stack.imgur.com/B7W67.png)**

`a == 0.0`

**DO**

`eps = np.finfo(np.float64).eps     # get computer zero for given precision`

`abs(a) < eps                       # check if size of a is less than computer zero`
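A minimal sketch of this advice, using a value that is mathematically zero but picks up rounding error along the way. Using `eps` as the tolerance assumes the quantities involved are of order 1; for much larger values, a tolerance scaled to their size is more appropriate.

```python
import numpy as np

a = 0.1 + 0.2 - 0.3                    # mathematically zero
eps = np.finfo(np.float64).eps         # machine precision for double precision floats

print(a == 0.0)                        # False: a is about 5.6e-17, not exactly zero
print(abs(a) < eps)                    # True: a lies inside the "computer zero" interval
```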

Consider the operations below (note, in NumPy, `np.finfo(...).eps` is the gap between 1.0 and the next larger representable number; loosely, it is the smallest number that can be added to 1.0 such that the result is different from 1.0):


In [None]:
# a short demonstration of machine zero
def demo_eps(float_type=np.float32):
    ''' Demonstrates machine epsilon: adds eps to a and then subtracts a again.
    '''
    # create a reference number
    a = float_type(1.0)
    
    # get computer zero for given precision 
    eps = np.finfo(float_type).eps
    print("eps = {:e}".format(eps))

    # add eps to a and then subtract a
    # correct result should be eps, but this may be lost to rounding error depending on the size of a
    print("")
    print("for a = {:e}".format(a))
    print("((a + eps) - a) = {:e}".format((a + eps) - a))

interact(demo_eps, float_type={'double':np.float64,'single':np.float32,'half':np.float16});

Adding $\epsilon$ to a number sufficiently larger than 1.0 becomes equivalent to adding 0, because the gap between neighbouring floating point numbers grows with their magnitude.
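NumPy's `np.spacing` reports the gap between a positive number and the next larger representable float, which makes this growth visible. A short sketch, using a few powers of two:

```python
import numpy as np

eps = np.finfo(np.float64).eps
# for positive x, np.spacing(x) is the gap between x and the next larger float
for value in [1.0, 2.0, 1024.0]:
    print("value = {:7.1f}  spacing = {:.3e}  spacing/eps = {:.0f}  (value + eps) - value = {:.3e}".format(
        value, np.spacing(value), np.spacing(value) / eps, (value + eps) - value))
```

Once the spacing is at least $2\epsilon$, adding $\epsilon$ falls at or below the halfway point between neighbours and is rounded away.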

***Change `a` to 10.0 - is the result as you expect?***

> <mark>*~ your answer here ~*</mark>

***How does $\epsilon$ depend on the floating point precision?***

> <mark>*~ your answer here ~*</mark>

***We defined $\epsilon$ as the smallest number that can be added to 1.0 such that the result is different from 1.0. What is the equivalent of $\epsilon$ for 10.0?***

> <mark>*~ your answer here ~*</mark>


***- - - - CLASS CODING EXERCISE - - - -***

In [None]:
# EXECUTE THIS CELL TO DEFINE AND PLOT A FUNCTION
# -----------------------------------------------
# LOCATE where a function has a particular gradient

# An example function (expi = exponential integral)
from scipy.special import expi            
def f(x): 
    return x*expi(x) - np.exp(x)    

# plot f(x)
x = np.linspace(0.,2.,101)[1:]
fig,ax = plt.subplots(1)
ax.plot(x, f(x),'k-')
ax.axhline(0., c='k', ls=':')
ax.set_xlim([0,2])
ax.set_xlabel('x')
ax.set_ylabel('f(x)=x Ei(x)-e^x')
plt.show()

In [None]:
# PART ONE
# --------
# Write a function that computes the FORWARD DIFFERENCE approximation
# of the gradient of f(x)
#
# dfdx(x) ~= (f(x+h)-f(x))/h

def dfdx(x,h):
    return ????

grad = dfdx(x=0.25, h=0.01) 
print(grad)

In [None]:
# PART TWO
# --------
# Write a condition that checks whether the finite difference gradient
# is EXACTLY equal to a search value.
# Write a SECOND condition that checks whether it is SUFFICIENTLY close
# to the search value.

grad_search = 1.

# Trial and error for value X
x = 0.25
h = 0.01

# CHECK gradient
if ????:
    print('found gradient {:2.1f} at x={:3.2f}'.format(dfdx(x,h), x))
else:
    print('gradient not found')


In [None]:
# OPTIONAL CHALLENGE
# ------------------
# Put your gradient test inside a while loop so you don't have to use 
# trial and error. 
# Choose the NEXT value of X based on the previous (hint: Google 'Newton
# method of optimisation')

In [None]:
# If we make h smaller, does the derivative approximation get better or worse?

# What happens if we make h = 1.e-15?

***How should we test for equality between two floating point numbers?***

> <mark>*~ your answer here ~*</mark>

***As the step size `h` gets smaller, does the forward difference approximation get better or worse?***

> <mark>*~ your answer here ~*</mark>

***What happens if we set `h` to `1.e-15`?***

> <mark>*~ your answer here ~*</mark>

## 2 Numerical Tests for Convergence

<mark>***How to measure when a calculation is accurate enough?***</mark>

Consider a sequence of values that is calculated as a result of some numerical algorithm:

$$x_1,x_2,x_3,\cdots x_k,\cdots \rightarrow X$$

As these numbers are stored within and calculated using a computer, they will contain some floating point errors. Given these errors, how can we test for convergence of the sequence of values?

### 2.1 Absolute Value Test


One simple test is the **absolute value test**. It involves the absolute difference between successive values in a series. The series is considered converged when this difference is small, i.e.,

$$ |x_k-x_{k-1}|<\epsilon$$

for some small $\epsilon$, e.g., the machine precision.
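A minimal sketch of the absolute value test, with $\epsilon$ set to machine precision as suggested above (the function name `absolute_converged` is ours). It also shows a weakness: when the values are large, successive estimates cannot differ by less than the spacing between neighbouring floats, so the test may never be satisfied.

```python
import numpy as np

def absolute_converged(x_new, x_old, eps):
    """Absolute value test: |x_k - x_{k-1}| < eps."""
    return abs(x_new - x_old) < eps

eps = np.finfo(np.float64).eps
# Two successive estimates of a large quantity that differ only in their last bit
x_old = 1.e10
x_new = np.nextafter(x_old, np.inf)          # the next representable float after 1e10
print(x_new - x_old)                         # about 1.9e-6, far larger than eps
print(absolute_converged(x_new, x_old, eps)) # False, although no further progress is possible
```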

### 2.2 Relative Test


The **relative test** is satisfied if

$$\frac{|x_k-x_{k-1}|}{|x_k|}<\epsilon.$$

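In this test, the absolute difference between successive values in the series is **normalised** to take into account the magnitude of $x_k$, i.e., our best approximation to $X$. However, it breaks down when $X$ is zero or very close to it, because the denominator $|x_k|$ becomes tiny (or zero).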
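A minimal sketch of the relative test (the function name `relative_converged` and the tolerance `1.e-8` are illustrative choices). It handles large values well, but a sequence converging to $X=0$, here $x_k=2^{-k}$, never passes it.

```python
def relative_converged(x_new, x_old, eps):
    """Relative test: |x_k - x_{k-1}| / |x_k| < eps."""
    return abs(x_new - x_old) / abs(x_new) < eps

eps = 1.e-8
# Large values: the change is judged relative to the size of x_k
print(relative_converged(1.e10 + 1.e-6, 1.e10, eps))   # True
# A sequence converging to X = 0 by halving: the ratio is 1 at every step
print(relative_converged(2.**-40, 2.**-39, eps))       # False
# If x_k reaches exactly 0, the division raises ZeroDivisionError
```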


### 2.3 Uniform Test

The **uniform test** combines the two previous tests. It is satisfied if

$$\frac{|x_k-x_{k-1}|}{1+|x_k|}<\epsilon$$

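When $|X|$ is much smaller than 1, the denominator is close to 1 and this test behaves like the absolute value test; when $|X|$ is much larger than 1, the denominator is close to $|x_k|$ and it behaves like the relative test. You should probably use this test. 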
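A minimal sketch of the uniform test, applied to the same cases that troubled the other two tests (the function name `uniform_converged` and the tolerance `1.e-8` are illustrative):

```python
def uniform_converged(x_new, x_old, eps):
    """Uniform test: |x_k - x_{k-1}| / (1 + |x_k|) < eps."""
    return abs(x_new - x_old) / (1. + abs(x_new)) < eps

eps = 1.e-8
print(uniform_converged(1.e10 + 1.e-6, 1.e10, eps))   # True: large X, acts like the relative test
print(uniform_converged(2.**-40, 2.**-39, eps))       # True: small X, acts like the absolute test
print(uniform_converged(0., 1.e-12, eps))             # True: no division by zero when x_k = 0
print(uniform_converged(1.1, 1.0, eps))               # False: still changing
```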

***- - - - CLASS CODING EXERCISE - - - -***

An infinite series representation of $\pi$ is the Nilakantha series

\begin{equation}
\pi = 3 + \frac{4}{2\times3\times4} - \frac{4}{4\times5\times6} + \frac{4}{6\times7\times8} - \frac{4}{8\times9\times10} + \cdots
\end{equation}


In [None]:
# PART ONE
# --------
# COMPLETE the function below so that it computes the i'th term in the series
# **hints**
#  - if i = 0, then the output is 4/(2*3*4); if i = 1, then the output is -4/(4*5*6)
#  - think about operations like... (-1)**i  2*i  2*i+1

def dpi(i):
    return ????
    
# print the first three terms
print(dpi(0), dpi(1), dpi(2))

# combine the terms to estimate pi
print(3. + dpi(0) + dpi(1) + dpi(2))

In [None]:
# PART TWO
# --------
# The function below computes the series approximation of PI using k terms
import numpy as np
def pik(k):
    return np.sum([dpi(i) for i in range(k)])+3.

print(pik(0), pik(1), pik(2), pik(3), pik(4))

# IMPLEMENT the UNIFORM test for case that k = 8 using the absolute value 
# function abs()

# **your code here**

# How LARGE is the COMPUTED error?
# What about for the OTHER tests?

In [None]:
# OPTIONAL CHALLENGE
# ------------------
# implement the uniform test as part of a while loop
# and keep computing pi until the error is smaller than 1.e-7

### 2.4 Convergence of $e_\infty$



Recall, the exponential function can be written

$$ e^x=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\cdots=\sum\limits_{n=0}^\infty \frac{x^n}{n!}$$

We will use this expansion and your previous expression for $\Delta e_i$ to approximate $e$. 

***Write pseudocode to compute $e$ by iteratively adding terms in the approximation. Stop when some level of accuracy is met. Your pseudocode should include the headings: initialisation, iterations, and stopping criterion.***

> <mark>*~ your answer here (should make use of a loop, an accuracy test, the function `delta_e` defined in **Section 1.3**)~*</mark>

***Complete the code below by implementing the uniform test.***

In [None]:
# compute exp(1) iteratively using the uniform test
eps = np.finfo(np.float32).eps   # threshold for convergence test, machine precision

# compute an approximation to exp(1)
n = 1
previous_estimate = 1. # the zeroth term estimate
keep_looping = True
while keep_looping:
    # compute next term in series
    # **to do**
    current_estimate = ___
    
    # compute error using uniform test
    # **to do**
    uniform_test = ___

    # display to screen, this is complete
    print('{:d} terms:'.format(n+1))
    print(' - ex_n-1= {:12.11f}'.format(previous_estimate))
    print(' -  ex_n = {:12.11f}'.format(current_estimate))
    print(' -   err = {:12.11f}'.format(uniform_test))
    print(' -   eps = {:12.11f}'.format(eps))
    
    # test whether convergence threshold has been reached
    if uniform_test<eps:
        print('series converged, exiting...')
        keep_looping = False
    
    # iterate to compute the next term
    n += 1
    previous_estimate = current_estimate
    if n>20: 
        raise ValueError("Your loop should have stopped by now...")


In [None]:
# answers for exercise 2.4
# --------------------------
#current_estimate = previous_estimate + delta_e(n)
#uniform_test = abs(current_estimate - previous_estimate)/(1+abs(current_estimate))
