# Lab 3: Numerical errors

In this lab we will investigate some of the practical implications of the way numbers are stored in computers. In particular, we will look at problems that can arise for the unwary when using floating-point numbers.

We will check our answers against the "official results" from `sys.float_info` – so before getting any further, let's import this:

In [1]:
from sys import float_info

## Machine precision

**Write a program to find the approximate machine precision** – that is, the largest number $\epsilon$ such that $1 + \epsilon = 1$ within the precision of the calculation.

An appropriate algorithm would be

    Set epsilon to 0.1
    Loop until 1 + epsilon is equal to 1:
        Set epsilon to half its current value

In [3]:
e=0.1
while 1+e!=1:
    e=e/2
e

8.881784197001253e-17

Let's check this against what we know about how numbers are stored in computers. `float_info.mant_dig` will tell you how many bits (i.e., binary digits) are available to store the significand:

In [4]:
float_info.mant_dig

53

Can you use this information to calculate the theoretical epsilon? You shouldn't expect to match exactly with the results of your test program (can you see why?) but you should have the correct order of magnitude.

In [10]:
2**(1-float_info.mant_dig)

2.220446049250313e-16

Check your answer against `float_info.epsilon`. This should match your theoretical value exactly.

In [6]:
float_info.epsilon

2.220446049250313e-16
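One way to see why the test program and `float_info.epsilon` disagree is to change the question slightly. The sketch below is an illustration rather than part of the original exercise. It starts from 1.0 instead of 0.1, so every trial value is an exact power of two, and it keeps the last value that still changes the sum rather than the first one that does not. The function name `last_distinguishable_epsilon` is made up for this example, and the comparison assumes IEEE 754 double precision with round-to-nearest.

```python
from sys import float_info


def last_distinguishable_epsilon():
    """Halve eps from 1.0 for as long as 1 + eps/2 still differs from 1."""
    eps = 1.0
    while 1.0 + eps / 2 != 1.0:
        eps /= 2
    return eps


eps = last_distinguishable_epsilon()
print(eps, eps == float_info.epsilon)
```

The loop in the exercise instead starts from 0.1, which is not a power of two, and reports the first value that is lost in the sum, so it ends up at a somewhat smaller value of the same order of magnitude.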

## Overflow and underflow

**Write a program to find the maximum and minimum positive numbers representable as a Python `float`.**

You can use a very similar algorithm, but this time you will need to test whether the number is equal to $\infty$, which you can represent as `float('inf')`, or zero.

In [76]:
maxN=1.
while maxN < float('inf'):
    y=maxN
    maxN *=2
y

8.98846567431158e+307

In [51]:
minN=0.01
while minN>0:
    x=minN
    minN/=2
x

5e-324

**Check your answers** against the "official" numbers from the following code:

In [49]:
float_info.min, float_info.max

(2.2250738585072014e-308, 1.7976931348623157e+308)
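The doubling loop can only visit powers of two, which is why it stops at roughly half of `float_info.max`. As a hedged sketch, assuming IEEE 754 double precision, the official limits can be rebuilt from the exponent range (`float_info.max_exp`, `float_info.min_exp`) and `float_info.epsilon`. The variable names are chosen for this example only.

```python
from sys import float_info

# Largest power of two below infinity: this is where the doubling loop stops.
largest_power_of_two = 2.0 ** (float_info.max_exp - 1)

# float_info.max has every significand bit set, so it is just under twice that.
largest_float = (2.0 - float_info.epsilon) * largest_power_of_two

# float_info.min is the smallest *normalised* float, again a power of two.
smallest_normal = 2.0 ** (float_info.min_exp - 1)

print(largest_power_of_two)
print(largest_float == float_info.max)
print(smallest_normal == float_info.min)
```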

You will notice something odd: while the greatest possible number should match your results fairly well, the smallest possible number from your results should be several orders of magnitude smaller than is theoretically possible! This is because of a clever process known as *gradual underflow*.

To investigate, **start with a number near the smallest possible, say $10^{-300}$, and divide repeatedly by 300, printing out the result at each step.** What do you notice about the precision of each result? Can you explain what is going on?

In [71]:
z=10**-300
li=[]
for i in range(10):
    z/=300
    li.append(z)
li

[3.333333333333333e-303,
 1.1111111111111111e-305,
 3.7037037037037035e-308,
 1.23456790123455e-310,
 4.11522633745e-313,
 1.371742115e-315,
 4.572474e-318,
 1.524e-320,
 5e-323,
 0.0]

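The number of significant digits falls at every step once the result drops below `float_info.min`. These values are *subnormal*: the exponent is stuck at its minimum, so the leading bits of the significand are zero and fewer bits remain to carry precision. The rounding error made at each step is carried forward, until the value finally underflows to zero.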
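The loss of digits in the list above can be made quantitative. The sketch below is an illustration under the assumption of IEEE 754 doubles with subnormal support. Below `float_info.min` the spacing between neighbouring floats is fixed, so every value is a whole number of steps of the smallest subnormal, and counting the bits in that whole number shows how much precision is left. The variable names are invented for this example.

```python
from sys import float_info

# Smallest positive float: 2**-1022 * 2**-52 under the stated assumption.
smallest_subnormal = float_info.min * float_info.epsilon

print(smallest_subnormal)      # compare with the halving loop's last value
print(smallest_subnormal / 2)  # nothing smaller can be stored

# Each subnormal value is an exact multiple of smallest_subnormal.
for value in (1e-308, 1e-312, 1e-316, 1e-320):
    steps = int(value / smallest_subnormal)
    print(value, steps.bit_length(), "significant bits")
```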

## Integer overflow

As we noted in class, it's a little tricky to demonstrate integer overflow using the base Python library, because it cleverly increases the amount of memory allocated in order to be able to store arbitrarily large integers. The `array` type from the `numpy` library, however, doesn't do this. (This is on balance a good thing: it is designed to process very large amounts of data quickly, and being able to specify exactly how much memory should be reserved for each array entry makes sense if there are millions of entries.)

We will create an `array` with only one component, which we specify as an integer occupying sixteen bits of memory:

In [2]:
from numpy import array
my_integer = array([0], dtype='int16')
print(my_integer)

[0]


Remembering that one of those sixteen bits is reserved to indicate the sign, **what is the largest number that can be represented in this array?**

In [1]:
num=0
for i in range(15):
    num+=2**i
num

32767

**Check your answer by setting `my_integer[0]` to this number, then adding one.** Is the result what you expect?

In [4]:
my_integer[0] = num # insert your number here
# now add 1 and print the result
my_integer+1

array([-32768], dtype=int16)
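The wraparound can be reproduced in plain Python, which makes the arithmetic explicit. This is only an emulation written for illustration, not a description of how `numpy` implements `int16`. The helper `wrap_int16` is a hypothetical name. It also uses the closed form $2^{15} - 1$ for the largest value, which equals the sum $2^0 + 2^1 + \dots + 2^{14}$ computed above.

```python
def wrap_int16(value):
    """Reduce an integer to the range of a 16-bit two's-complement number."""
    return (value + 2**15) % 2**16 - 2**15


largest = 2**15 - 1  # fifteen value bits, one sign bit
print(largest, wrap_int16(largest + 1))
```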

▶ **CHECKPOINT 1**

## Subtractive cancellation

The well-known *quadratic formula* says that the equation $ax^2 + bx + c = 0$ has solutions

$$ x_{1, 2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$ 

However, multiplying numerator and denominator by $-b \mp \sqrt{b^2 - 4ac}$, we find that these can alternatively be expressed as

$$ x'_{1,2} = \frac{-2c}{b\pm\sqrt{b^2 - 4ac}}.$$

**Write a program to calculate all four answers for given $a$, $b$, $c$. Your program should calculate the fractional differences $(x_1 - x_1')/x_1$ and $(x_2 - x_2')/x_2$.** 

(Remember that you can get a square root function from the `numpy` module: `from numpy import sqrt`.)

In [7]:
from math import sqrt

def quadratic(a, b, c):
    "Return both roots of a*x**2 + b*x + c = 0 from each formula, plus their fractional differences."
    root = sqrt(b**2 - 4*a*c)  # square root of the discriminant
    x1 = (-b + root) / (2*a)  # standard formula
    x2 = (-b - root) / (2*a)
    x1prime = -2*c / (b + root)  # alternative formula
    x2prime = -2*c / (b - root)
    x1Difference = (x1 - x1prime) / x1  # fractional difference
    x2Difference = (x2 - x2prime) / x2
    return (x1, x2, x1prime, x2prime, x1Difference, x2Difference)

In [10]:
quadratic(5,6,1) # check -1,-0.2 

(-0.2, -1.0, -0.2, -1.0, -0.0, -0.0)

We will test your program on the equation with $a = b = 1$, $c = 10^{-n}$, $n = 1, 2, 3, \dots$, or in other words

$$ x^2 + x = -10^{-n}.$$

In the limit where the right-hand side tends to zero, of course the solutions are $x_1 = 0$ and $x_2 = -1$. For small but non-zero $c$, we can make a good approximation by noting that, since $x_1$ will be very small, $x_1^2$ will be negligible; thus $x_1\approx -10^{-n}$. Similarly, since the two roots must sum to $-b/a = -1$, $x_2\approx -1 + 10^{-n}$.

**Test your program for $a = b = 1$, $c = 10^{-n}$, $n = 1, 2, 3, \dots$.** Can you explain your results? Where the two formulae differ, which is the more accurate and why?

In [8]:
table=[]
for n in range(1,17):  # the for loop advances n itself
    table.append(quadratic(1,1,10**-n))

In [9]:
print("{:8.10s} {:8.10s} {:8.10s} {:8.10s} {:10.13s} {:10.13s}".format("x1", "x2","x1'","x2'",'x1 Diff','x2 Diff'))
for element in table:
    print("{:.6f} {:7.6f} {:8.6f} {:8.6f} {:8.6f} {:10.6f}".format(element[0], element[1],element[2],element[3],element[4],element[5] ))


x1       x2       x1'      x2'      x1 Diff    x2 Diff   
-0.112702 -0.887298 -0.112702 -0.887298 -0.000000  -0.000000
-0.010102 -0.989898 -0.010102 -0.989898 0.000000   0.000000
-0.001001 -0.998999 -0.001001 -0.998999 -0.000000  -0.000000
-0.000100 -0.999900 -0.000100 -0.999900 -0.000000  -0.000000
-0.000010 -0.999990 -0.000010 -0.999990 0.000000   0.000000
-0.000001 -0.999999 -0.000001 -0.999999 0.000000   0.000000
-0.000000 -1.000000 -0.000000 -1.000000 -0.000000  -0.000000
-0.000000 -1.000000 -0.000000 -1.000000 0.000000   0.000000
-0.000000 -1.000000 -0.000000 -1.000000 0.000000   0.000000
-0.000000 -1.000000 -0.000000 -1.000000 0.000000   0.000000
-0.000000 -1.000000 -0.000000 -1.000000 0.000000   0.000000
-0.000000 -1.000000 -0.000000 -0.999967 0.000033   0.000033
-0.000000 -1.000000 -0.000000 -0.999689 0.000311   0.000311
-0.000000 -1.000000 -0.000000 -1.000800 -0.000800  -0.000800
-0.000000 -1.000000 -0.000000 -1.000800 -0.000800  -0.000800
-0.000000 -1.000000 -0.000000 -0.900

**What would you expect to happen for the case $a = 1$, $b = -1$, $c = 10^{-n}$, $n = 1, 2, 3, \dots$?** Make a prediction then use your program to test it.

In [11]:
table2=[]
for n in range(1,17):  # the for loop advances n itself
    table2.append(quadratic(1,-1,10**-n))
print("{:8.10s} {:8.10s} {:8.10s} {:8.10s} {:10.13s} {:10.13s}".format("x1", "x2","x1'","x2'",'x1 Diff','x2 Diff'))
for element in table2:
    print("{:.6e} {:7.6e} {:8.6e} {:8.6e} {:8.6e} {:10.6e}".format(element[0], element[1],element[2],element[3],element[4],element[5] ))


x1       x2       x1'      x2'      x1 Diff    x2 Diff   
8.872983e-01 1.127017e-01 8.872983e-01 1.127017e-01 -1.251240e-16 -1.231374e-16
9.898979e-01 1.010205e-02 9.898979e-01 1.010205e-02 2.018795e-15 2.060639e-15
9.989990e-01 1.001002e-03 9.989990e-01 1.001002e-03 -2.422711e-14 -2.426182e-14
9.999000e-01 1.000100e-04 9.999000e-01 1.000100e-04 -5.562774e-14 -5.555980e-14
9.999900e-01 1.000010e-05 9.999900e-01 1.000010e-05 1.662465e-12 1.662540e-12
9.999990e-01 1.000001e-06 9.999990e-01 1.000001e-06 4.633965e-12 4.633901e-12
9.999999e-01 1.000000e-07 9.999999e-01 1.000000e-07 -5.119217e-11 -5.119215e-11
1.000000e+00 1.000000e-08 1.000000e+00 1.000000e-08 5.758741e-10 5.758740e-10
1.000000e+00 1.000000e-09 1.000000e+00 1.000000e-09 2.622922e-08 2.622922e-08
1.000000e+00 1.000000e-10 9.999999e-01 1.000000e-10 8.264036e-08 8.264036e-08
1.000000e+00 1.000000e-11 9.999999e-01 1.000000e-11 8.273036e-08 8.273036e-08
1.000000e+00 1.000033e-12 9.999666e-01 1.000000e-12 3.338832e-05 3.338832e-0
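One commonly used way to sidestep the cancellation seen in both tables is to choose, for each root, whichever formula adds two quantities of the same sign. The sketch below is not part of the original exercise. The name `stable_quadratic` is chosen here, and it assumes real roots and $a \neq 0$. It forms $q = -\tfrac{1}{2}\left(b + \operatorname{sign}(b)\sqrt{b^2 - 4ac}\right)$, in which nothing cancels, then takes one root from the standard formula ($q/a$) and the other from the alternative one ($c/q$).

```python
from math import copysign, sqrt


def stable_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0, avoiding cancellation.

    Assumes a != 0 and a non-negative discriminant.
    """
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        raise ValueError("complex roots are not handled in this sketch")
    # b and copysign(...) have the same sign, so this sum never cancels.
    q = -0.5 * (b + copysign(sqrt(discriminant), b))
    if q == 0:
        # Only possible when b == 0 and c == 0: a double root at zero.
        return 0.0, 0.0
    return q / a, c / q


for n in (4, 8, 12, 16):
    print(n, stable_quadratic(1, 1, 10**-n), stable_quadratic(1, -1, 10**-n))
```

For $b > 0$ the root of small magnitude comes from `c / q`, and for $b < 0$ the same expression again gives the small root, so the choice of formula adapts to the sign of $b$ without any extra logic.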

▶ **CHECKPOINT 2**

## Series summation

We will write our own (Python) function to calculate the (mathematical) sine function. One obvious way is to evaluate the Taylor series:

$$
\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=1}^\infty\frac{(-1)^{n-1}x^{2n-1}}{(2n-1)!}
$$

A small trick will come in useful here. As $n$ gets larger (and we will certainly need to add lots of terms to get an accurate result!) it will take longer and longer to calculate both $x^{2n-1}$ and $(2n-1)!$. However, both of these are easy to calculate *given the previous term*. So the smart way to evaluate this series is to keep track of the previous term added, and then use the recurrence relation

$$
t_{n} = t_{n-1} \times \frac{-x^2}{(2n - 1)(2n - 2)}.
$$

**Check that you understand** how this works, then **write a function `sine_sum(x)`** to calculate $\sin(x)$ by this method. You will need to make a sensible choice for when to stop adding terms: discuss this with your demonstrator if you're not sure.
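A quick way to convince yourself that the recurrence is right is to compare it with the terms computed directly from the series. This is a small sketch for checking only, and the function names `term_direct` and `terms_by_recurrence` are invented for the purpose.

```python
from math import factorial, isclose


def term_direct(n, x):
    """n-th term of the sine series, straight from the formula."""
    return (-1) ** (n - 1) * x ** (2 * n - 1) / factorial(2 * n - 1)


def terms_by_recurrence(x, count):
    """First `count` terms, each built from the previous one."""
    terms = [x]  # t_1 = x
    for n in range(2, count + 1):
        terms.append(-terms[-1] * x**2 / ((2 * n - 1) * (2 * n - 2)))
    return terms


x = 1.3
for n, t in enumerate(terms_by_recurrence(x, 8), start=1):
    print(n, t, isclose(t, term_direct(n, x), rel_tol=1e-12))
```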

In [12]:
def sine_sum(x):
    sine_x=x  # running total, starting from the first term t_1 = x
    tn=x      # most recent term added
    n=2
    while abs(tn)>1e-5:  # stop once the terms become negligible
        tn = -tn * (x**2) / ((2*n-1)*(2*n-2))  # t_n from t_{n-1}
        sine_x += tn
        n+=1
    return sine_x

In [15]:
from math import pi
sine_sum(pi/2) #check answer=1

array(0.999999943741051)

With care, it is not difficult to write functions that can cope with `array` arguments. However, there is a "cheat" way to do this automatically, using the `vectorize` function from `numpy`:

In [13]:
from numpy import vectorize
sine_sum = vectorize(sine_sum)

Now you can call `sine_sum()` on an array in the same way as you can on the library `sin()` function. This lets us plot this function very easily. 

(Advanced Python note: if you know in advance you are going to do this, you can use the *decorator* syntax:

    from numpy import vectorize
    
    @vectorize
    def sine_sum(x):
        #... function definition goes here

This does the same thing as calling `sine_sum = vectorize(sine_sum)` after the event as we’ve done here.)

**Complete the following code** to plot both your function and the library one. 

In [22]:
%matplotlib notebook
from pylab import plot, grid, xlim, ylim, sin, linspace

x = linspace(0.1,40,1000) # choose some appropriate values here
y_series = sine_sum(x)
y_library = sin(x)

plot(x, y_series,'k-', x, y_library,'r-') # The plot command can take as many x, y pairs as you like.
                                # Note that we do need to repeat x here since this array represents
                                # the x values of two different lines on the plot.
#xlim(-9*pi,9*pi)
#ylim(-1,1)
# you might like to set the x and y limits using the xlim and ylim functions, or to apply a grid.
# e.g., ylim(-2, 2) will set the y range to run from -2 to 2.

[Plot: `sine_sum(x)` (black line) and the library `sin(x)` (red line) for $0.1 \le x \le 40$.]

When does your function start to diverge noticeably from the library one? Can you explain why it eventually stops behaving as well as the library function? Can you think of a way to fix this problem? (*Hint*: consider the periodicity of the $\sin$ function.)
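One possible reading of the hint is sketched below. Since $\sin(x) = \sin(x - 2\pi k)$ for any integer $k$, the argument can be shifted into $[-\pi, \pi]$ before summing, so that no term of the series ever grows large enough to swamp the final answer. This is an illustrative sketch, not the lab's reference solution: the name `sine_sum_reduced` and the tolerance `tol` are choices made here, and the reduction itself is only as accurate as the floating-point value of $2\pi$.

```python
from math import fmod, pi, sin


def sine_sum_reduced(x, tol=1e-15):
    """Sum the sine series after shifting x into [-pi, pi]."""
    x = fmod(x, 2 * pi)  # now strictly between -2*pi and 2*pi
    if x > pi:
        x -= 2 * pi
    elif x < -pi:
        x += 2 * pi

    term = x   # t_1
    total = x
    n = 2
    while abs(term) > tol:
        term = -term * x**2 / ((2 * n - 1) * (2 * n - 2))
        total += term
        n += 1
    return total


for value in (0.5, 10.0, 40.0, 100.0):
    print(value, sine_sum_reduced(value), sin(value))
```

With the argument confined to $[-\pi, \pi]$ the largest term is of order $\pi^3/3! \approx 5$, so the rounding error in the partial sums stays close to machine precision instead of growing with $x$.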

▶ **CHECKPOINT 3**