## Representing numbers in the computer

Nearly all scientific computations require *storing* and *operating* on numbers. Consider the deceptively simple task of adding two numbers:

$$8 + 4 = 12$$

This requires:

1. storing "8" and "4" in some kind of memory
2. performing the addition operation
3. storing the result "12" in memory

You may have heard that digital computers store numbers in a "binary" representation. What does that mean? 

We are used to a "decimal," or base 10 system. That means that we have 10 different symbols at our disposal. So if we have 3 positions to record those symbols, we can represent integers from 0 through 999:

$$000,001,002,003,...,998,999$$

but in *binary* we only have two symbols, 0 and 1:

$$000, 001, 010,011,...,111$$

Each position in a binary number represents a power of 2 (and each position in a decimal number represents a power of 10). For example (the subscripts indicate in which base the number is represented):

$$101_2 = [1\times2^2] + [0\times2^1] + [1\times2^0] = 4+1 = 5_{10}$$
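The positional rule can be checked with a few lines of Python. `binary_to_decimal` is a hypothetical helper written only for this illustration. `int(s, 2)` is Python's built-in base-2 parser and should agree with it.

```python
def binary_to_decimal(bits: str) -> int:
    """Sum bit * 2**position, where the rightmost position is 2**0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2**position
    return total

print(binary_to_decimal("101"))   # 5
print(int("101", 2))              # built-in parser gives the same: 5
```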

----

**Question:** What is the largest number that can be represented with three binary digits?

----

--> Aha! The range of numbers representable in binary is set by the number of positions (how many zeroes and ones) we set aside in memory to store the number. Each of those digits is called a "bit."
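A short sketch of how the largest value grows with the number of bits, assuming unsigned integers (every bit is a digit, none is reserved for a sign). `largest_unsigned` is a hypothetical helper: with all $n$ bits set to 1 the value is $2^n - 1$.

```python
def largest_unsigned(n_bits: int) -> int:
    """Largest integer representable with n_bits binary digits and no sign bit."""
    return 2**n_bits - 1   # all bits set: 111...1 in binary

for n_bits in (3, 8, 16, 32, 64):
    print(f"{n_bits:2d} bits -> 0 ... {largest_unsigned(n_bits)}")
```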

----

**Question:** What do 32-bit and 64-bit architectures refer to? (Back when I was your age, we had 8-bit video games, and we didn't complain ;0)

----

**Negative numbers.** How do we represent negative numbers? Add another bit for the sign.

**Real numbers.** Now we add some digits for *negative* powers of 2:

$$101001_2 = (-1)^1\times [(0\times2^2) + (1\times2^1) + (0\times2^0) + (0\times2^{(-1)}) + (1\times2^{(-2)})] = -2.25_{10}$$
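The 6-bit toy format in this expansion (1 sign bit, three bits for $2^2, 2^1, 2^0$, then two bits for $2^{-1}, 2^{-2}$) can be decoded mechanically. This is a minimal sketch. `decode_toy6` is a hypothetical name, and the layout is the toy one from this example, not a real hardware format.

```python
def decode_toy6(bits: str) -> float:
    """Decode the toy 6-bit format: 1 sign bit, then bits worth
    2**2, 2**1, 2**0, 2**-1 and 2**-2."""
    if len(bits) != 6 or set(bits) - {"0", "1"}:
        raise ValueError("expected six characters, each '0' or '1'")
    sign = -1.0 if bits[0] == "1" else 1.0
    exponents = [2, 1, 0, -1, -2]   # power of 2 attached to each magnitude bit
    magnitude = sum(int(b) * 2.0**p for b, p in zip(bits[1:], exponents))
    return sign * magnitude

print(decode_toy6("101001"))   # -2.25
print(decode_toy6("011111"))   # 7.75, the largest magnitude this format can hold
```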

----

**Question:** Can we represent -2.3 exactly in our 6-bit binary representation?

----

No! In the computer, it gets mapped to one of the two closest *representable* numbers (here $-2.25$ or $-2.5$). Thus every finite-precision binary number stands in for a whole *interval* of real numbers, containing infinitely many values. That means that in the computer you aren't actually adding numbers, you are adding intervals. (Note that this occurs in any base; see, e.g., 1/3 in base 10.)
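To make the "interval" idea concrete, here is a sketch that rounds a value to the nearest number the toy 6-bit format can hold. Its magnitudes are multiples of $2^{-2} = 0.25$ up to $7.75$. `round_to_toy6` is a hypothetical helper, and clamping out-of-range values to $\pm 7.75$ is our own choice for the illustration. The last line shows the same effect in Python's own (64-bit) floats, just on a much finer grid.

```python
def round_to_toy6(value: float) -> float:
    """Round value to the nearest number the toy 6-bit format can hold:
    a sign and a magnitude that is a multiple of 0.25, at most 7.75."""
    step = 0.25                                   # smallest fraction bit is 2**-2
    # round() sends exact ties to the even multiple; too-large values are clamped
    magnitude = min(round(abs(value) / step) * step, 7.75)
    return -magnitude if value < 0 else magnitude

stored = round_to_toy6(-2.3)
print(stored)                 # -2.25 (its grid neighbours are -2.25 and -2.5)
print(abs(stored - (-2.3)))   # about 0.05; roughly everything within 0.125 of -2.25 is stored as -2.25

print(0.1 + 0.2 == 0.3)       # False: none of these decimals is exact in binary
```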

Various data types are available in the computer (integer, long integer, single precision float, double precision float, etc.). For example, the single precision float has one bit for the sign, 8 bits for the exponent, and 23 bits for the fraction:

![float register](./Float_example.svg)

(Image from https://en.wikipedia.org/wiki/Single-precision_floating-point_format.)

Let's take a look at some of the consequences of finite precision with a few simple lines of Python. First, let's look at how the limited set of representable numbers can manifest. One easy way is to add a big number and a small one:



```python
a = 1.0
b = 1e-16          # 10^-16
tot = a + b        # Python floats are double precision
print("adding 1 and 10^-16 = %18.16f" % tot)
print("did the sum change 1.0?", tot != 1.0)
```
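Why does $10^{-16}$ vanish? The fraction field in the layout pictured above fixes the gap between 1.0 and the next representable number: $2^{-23}$ for single precision and $2^{-52}$ for double precision, which is what Python's `float` uses. A sum that lands less than half a gap above 1.0 rounds back to 1.0. The sketch below also pulls the three fields out of a single precision number. The exponent bias of 127 and the implicit leading 1 in `(1 + fraction/2**23) * 2**(exponent - 127)` are standard IEEE 754 details not stated above, and `float32_fields` is a hypothetical helper.

```python
import struct

import numpy as np

def float32_fields(x):
    """Return the (sign, exponent, fraction) bit fields of x stored as an IEEE 754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))       # raw 32-bit pattern as an int
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF   # 1, 8 and 23 bits

print(float32_fields(1.0))     # (0, 127, 0): exponent stored with a bias of 127
print(float32_fields(-2.25))   # (1, 128, 1048576): -1.125 * 2**1, fraction 0.125 * 2**23

# The gap between 1.0 and the next representable number is 2**-(fraction bits)
print(np.finfo(np.float32).eps == 2.0**-23)   # True
print(np.finfo(np.float64).eps == 2.0**-52)   # True

# 1e-16 is less than half of that gap for doubles, so it rounds away
print(1.0 + 1e-16 == 1.0)   # True
print(1.0 + 2e-16 == 1.0)   # False: 2e-16 is more than half the gap, so the sum rounds up
```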

Now if you were just adding $10^{-16}$ to $1.0$ once, losing the tiny number would hardly matter. But in numerical work we often want to accumulate lots of small increments, which can produce unexpected behavior if you are not familiar with such errors:

```python
N = 10_000_000     # ten million increments
b = 1e-16          # each increment is 10^-16

tot1 = 1.0         # add each tiny increment directly to 1.0
tot2 = 0.0         # accumulate the tiny increments on their own first
for _ in range(N):
    tot1 = tot1 + b   # 1.0 + 1e-16 rounds back to 1.0 every time
    tot2 = tot2 + b   # tot2 grows slowly, so each increment is still resolved

tot3 = 1.0 + tot2     # about 1 + 10^-9
print("adding 10^-16 to 1.0, 10 million times: %18.16f" % tot1)
print("adding 10^-16 to itself, 10 million times, **then** adding the result to 1.0: %18.16f" % tot3)
```

## Sneakier round-off errors

Finite precision arithmetic can also cause problems when evaluating expressions that tend toward a particular limit as their arguments change, but get there by subtracting large, nearly equal numbers to end up with a small one. Consider the function:

$$y = 2x\left[x - \sqrt{x^2 -1} \right] - 1$$

Consider the limits as $x$ approaches $1$ and $\infty$:

--> As $x\rightarrow 1$, $y\rightarrow 1$, without any issues.

--> But as $x\rightarrow \infty$, $y\rightarrow$ ?

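Let's factor $x$ out of the square root and use a Taylor series, $(1-u)^{1/2} \approx 1 - u/2$ for small $u$, to see what the limit should be:

$$\begin{align}
(x^2-1)^{1/2} &= x(1-1/x^2)^{1/2} \\
&\approx x\left(1 - \frac{1}{2x^2}\right) = x - \frac{1}{2x} \\
y & \approx 2x\left[ x - \left( x - \frac {1}{2x} \right) \right] - 1 = 2x \cdot \frac{1}{2x} - 1 = 0
\end{align}$$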

So as $x \rightarrow \infty$, $y\rightarrow 0$.
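Before asking the computer, we can check this limit with Python's standard `decimal` module, which here carries 50 significant digits instead of a double's roughly 16. The comparison column uses the next term of the same expansion: keeping $-u^2/8$ in the square root gives $y \approx 1/(4x^2)$. That term is our addition, not part of the derivation above, and `y_high_precision` is a hypothetical helper.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50   # 50 significant digits (this changes the global decimal context)

def y_high_precision(x: int) -> Decimal:
    """Evaluate y = 2x[x - sqrt(x^2 - 1)] - 1 in 50-digit decimal arithmetic."""
    X = Decimal(x)
    return 2 * X * (X - (X * X - 1).sqrt()) - 1

for x in (10, 10**4, 10**7):
    print(x, y_high_precision(x), 1 / Decimal(4 * x * x))
```

With enough digits, $y$ shrinks toward 0 like $1/(4x^2)$, so any large wiggles in a double precision plot come from round-off, not from the function.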

```python
import matplotlib.pyplot as plt
import numpy as np

# x from 10^7 up to (not including) 10^8 in steps of 10, as floats
# (to look at small x instead, try start, N = 2, 1000)
start, N, step = 10_000_000, 100_000_000, 10
x = np.arange(start, N, step, dtype=float)

# original form: subtracts two nearly equal numbers inside the brackets
y = 2.0 * x * (x - np.sqrt(x**2 - 1.0)) - 1.0

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```

What happened?? Floating point weirdness! Let's try and figure out where the problem is by printing out the intermediate steps of the calculation. In particular, let's see how $x^2$ compares to $x\sqrt{x^2-1}$ for large $x$. I'll also plot the *difference* of those two quantities.



```python
import matplotlib.pyplot as plt
import numpy as np

start, N, step = 10_000_000, 100_000_000, 10
x = np.arange(start, N, step, dtype=float)

x_squared = x * x                   # first large term
x_sqrt = x * np.sqrt(x**2 - 1.0)    # second large term, nearly equal to the first
difference = x_squared - x_sqrt     # exact value is about 1/2 for large x

plt.figure(figsize=(10, 3))
plt.subplots_adjust(wspace=0.5)

plt.subplot(121)                    # left: the two large terms
plt.plot(x, x_squared, label=r"$x^2$")
plt.plot(x, x_sqrt, label=r"$x\sqrt{x^2-1}$")
plt.legend()

plt.subplot(122)                    # right: their difference
plt.plot(x, difference)
plt.ylabel("difference")
plt.show()
```


There are actually two curves on the left plot: $x^2$ and $x\sqrt{x^2-1}$ lie on top of each other. Note the scale of the vertical axis on the left and right plots. The error occurs because as $x$ gets larger, we are subtracting two very big numbers (about $10^{14}$ to $10^{16}$ here) to end up with a much smaller one (about $1/2$). Python floats are double precision, which carry only about 15 to 16 significant decimal digits, so once the true difference shrinks to the last digit or two of the large terms, round-off in those digits dominates the result.
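To put numbers on this, we can ask NumPy for the gap between neighbouring doubles near $x^2$ (`np.spacing`) and compare it with the true difference. The sketch computes the latter with the identity $x^2 - x\sqrt{x^2-1} = x/(x+\sqrt{x^2-1})$, obtained by multiplying and dividing by $x+\sqrt{x^2-1}$, which avoids the cancellation.

```python
import numpy as np

for x in (1e7, 3e7, 1e8):
    big = x * x                                  # magnitude of both large terms
    gap = np.spacing(big)                        # distance to the next representable double
    true_diff = x / (x + np.sqrt(x * x - 1.0))   # equals x^2 - x*sqrt(x^2 - 1), no cancellation
    print(f"x = {x:.0e}: gap near x^2 = {gap:.4g}, true difference = {true_diff:.6f}")
```

At $x = 10^7$ the gap is already a few percent of the answer, and by $x = 10^8$ it is larger than the answer itself.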

To fix the issue, let's rewrite our function in a different way that doesn't involve subtracting two very large numbers to get a tiny result:

$$y = \frac{2x}{x+\sqrt{x^2-1}}-1$$

(To verify that this is the same expression, multiply and divide $x - \sqrt{x^2-1}$ by $x + \sqrt{x^2-1}$; the numerator becomes $x^2 - (x^2-1) = 1$.)

```python
import matplotlib.pyplot as plt
import numpy as np

start, N, step = 10_000_000, 100_000_000, 10
x = np.arange(start, N, step, dtype=float)

# rewritten form: x + sqrt(x^2 - 1) adds two positive numbers, so no cancellation there
y = 2.0 * x / (x + np.sqrt(x**2 - 1.0)) - 1.0

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```
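The rewritten form still ends by subtracting 1 from a number very close to 1, so its output can only move in steps of about $2.2\times10^{-16}$ (the gap above 1.0), and its relative accuracy degrades as the true value, about $1/(4x^2)$, approaches that step. One further algebraic step, which is our addition and not part of the notes above, removes the last subtraction. With $s=\sqrt{x^2-1}$, $\frac{2x}{x+s} - 1 = \frac{x-s}{x+s}$, and multiplying top and bottom by $x+s$ gives $y = 1/(x+s)^2$. A sketch comparing the three forms (the function names are hypothetical):

```python
import numpy as np

def y_naive(x):
    """Original form: subtracts two nearly equal numbers inside the brackets."""
    return 2.0 * x * (x - np.sqrt(x**2 - 1.0)) - 1.0

def y_rewritten(x):
    """Rewritten form from the text: only the final '- 1.0' still cancels."""
    return 2.0 * x / (x + np.sqrt(x**2 - 1.0)) - 1.0

def y_no_cancellation(x):
    """Equivalent form 1/(x + sqrt(x^2 - 1))^2: no subtraction of nearly equal numbers."""
    return 1.0 / (x + np.sqrt(x**2 - 1.0))**2

for x in (1e2, 1e5, 1e7):
    print(f"x = {x:.0e}: naive {y_naive(x):.3e}, rewritten {y_rewritten(x):.3e}, "
          f"no cancellation {y_no_cancellation(x):.3e}, 1/(4x^2) = {1.0 / (4.0 * x * x):.3e}")
```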