# 07 Numbers

* *Computational Physics*: Ch 2.4, 2.5, 3
* Python Tutorial [Floating Point Arithmetic: Issues and Limitations](https://docs.python.org/3/tutorial/floatingpoint.html)

## Binary representation
Computers store information with two-state logic. This can be represented in a binary system with the digits 0 and 1 (i.e. base 2).

Any number can be represented in any base as a polynomial (possibly with infinitely many terms): the digits are $0 \leq x_k < b$ and determine the contribution of the base $b$ raised to the $k$th power.

$$
q_b = \sum_{k=-\infty}^{+\infty} x_k b^k
$$
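
As a minimal sketch of this formula for a finite number of digits, the helper `positional_value` below (a hypothetical name, not part of any library) sums $x_k b^k$ over a mapping from exponent $k$ to digit $x_k$:

```python
def positional_value(digits, base):
    """Evaluate sum(x_k * base**k) for a dict {k: x_k} of digits."""
    for x_k in digits.values():
        if not 0 <= x_k < base:
            raise ValueError("each digit must satisfy 0 <= x_k < base")
    return sum(x_k * base**k for k, x_k in digits.items())

ten_decimal = positional_value({1: 1, 0: 0}, base=10)            # 1*10**1 + 0*10**0
ten_binary = positional_value({3: 1, 2: 0, 1: 1, 0: 0}, base=2)  # 1010 in base 2
eighth = positional_value({-3: 1}, base=2)                       # negative k gives a fraction
print(ten_decimal, ten_binary, eighth)
```

Negative exponents $k$ describe the digits after the point, which is how fractions enter the same picture.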

## Integers 

Convert 10 (base 10, i.e. $1 \times 10^1 + 0\times 10^0$) into binary (Note: `divmod(x, 2)` is `x // 2, x % 2`, i.e. integer division and remainder):

In [52]:
divmod(10, 2)

(5, 0)

In [53]:
divmod(5, 2)

(2, 1)

In [54]:
divmod(2, 2)

(1, 0)

The binary representation of $10_{10}$ is $1010_2$ (keep dividing until there's only 1 left, then collect the 1 and all remainders in reverse order, essentially long division).
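
The repeated division can be sketched as a small function. `to_binary` is a hypothetical helper; unlike the manual steps above it keeps dividing until the quotient is 0, so the leading 1 is collected as the last remainder:

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)          # quotient and remainder
        remainders.append(str(r))
    return "".join(reversed(remainders))  # remainders in reverse order

print(to_binary(10))
```

The result can be compared with Python's built-in `bin(10)`, which returns the same digits with a `0b` prefix.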

Double check by multiplying out $1010_2$:

In [55]:
1*2**3 + 0*2**2 + 1*2**1 + 0*2**0

10

or in Python

In [56]:
int('0b1010', 2)

10

In [57]:
0b1010

10

### Summary: Integers in binary representation

**All integers are exactly representable in base 2 with a finite number of digits**.

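* In the simplest scheme (*sign-magnitude*), the sign (+ or –) is represented by a single bit (0 = +, 1 = –). Fixed-width integers in practice (e.g. NumPy's) use *two's complement* instead, which can represent one additional negative number.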
* The number of available "bits" (digits) determines the largest representable integer. 

For example, with 8 bits available (a "*byte*"), what is the largest and smallest integer?

In [59]:
0b1111111  # 7 bits for number, 1 for sign (not included)

127

In [61]:
-0b1111111

-127

### Sidenote: using numpy to quickly convert integers

If you want to properly sum all terms, use numpy arrays and the element-wise operations:

In [6]:
import numpy as np

In [7]:
nbits = 7
exponents = np.arange(nbits)
bases = 2*np.ones(nbits)  # base 2
digits = np.ones(nbits)   # all 1

In [8]:
np.sum(digits * bases**exponents)

127.0

### Examples: limits of integers

What is the smallest and largest integer that you can represent

1. if you have 4 bits available and only consider non-negative ("unsigned") integers?
2. if you have 32 bits and consider positive and negative integers?
3. if you have 64 bits and consider positive and negative integers?

Smallest and largest 4 bit unsigned integer:

In [1]:
0b0000

0

In [77]:
0b1111

15

Smallest and largest 32-bit signed integer (int32):

1 bit is the sign, so 31 bits are available and the highest number has 31 ones (111...11111). The *next highest* number is 1000...000, a one followed by 31 zeroes (32 bits in total), i.e., $2^{31}$.

Thus, the highest number is $2^{31} - 1$:

In [116]:
2**31 - 1

2147483647

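(In the simple sign-bit picture the smallest number is just $-(2^{31} - 1)$; with the two's complement encoding that fixed-width integers use in practice, the smallest `int32` is $-2^{31}$.)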

And int64 (signed):

In [120]:
max64 = 2**(64-1) - 1
print(-max64, max64)

-9223372036854775807 9223372036854775807
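
The three questions can also be answered with one formula. The sketch below assumes two's complement for signed integers (the encoding that NumPy's fixed-width integers use), so the smallest value is one lower than the sign-magnitude value computed above; `int_range` is a hypothetical helper name:

```python
def int_range(nbits, signed=True):
    """Smallest and largest integer for nbits, assuming two's complement."""
    if signed:
        return -2**(nbits - 1), 2**(nbits - 1) - 1
    return 0, 2**nbits - 1

for nbits, signed in [(4, False), (8, True), (32, True), (64, True)]:
    print(nbits, "signed" if signed else "unsigned", int_range(nbits, signed))
```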


### Python's arbitrary precision integers 

In Python, integers *have arbitrary precision*: integer arithmetic (`+`, `-`, `*`, `//`) is exact and will not overflow. Thus the following code will run forever (until memory is exhausted); if you run it, you can stop the evaluation with the "Kernel / Interrupt" menu command in the notebook and then investigate `n` and `nbits`:

In [95]:
n = 1 
nbits = 0
while True:
    n *= 2
    nbits += 1

KeyboardInterrupt: 

In [98]:
type(n)

int

In [104]:
int.bit_length(n)

630912

In [105]:
nbits

630911
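
A terminating variant of the same experiment (the limit of 1000 doublings is an arbitrary choice) shows why `bit_length` is one larger than `nbits`: after `nbits` doublings, `n` is a one followed by `nbits` zeroes.

```python
n = 1
nbits = 0
for _ in range(1000):       # double n one thousand times instead of forever
    n *= 2
    nbits += 1

print(type(n), nbits, n.bit_length())
print(n == 2**1000)         # still exact, far beyond 64 bits
```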

### NumPy has fixed precision integers
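NumPy data types (dtypes) are fixed precision. Overflows "wrap around" (the output below is from an older NumPy; recent versions raise an `OverflowError` when an out-of-range Python integer is converted on array creation, but array arithmetic still wraps):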

In [121]:
import numpy as np

In [112]:
np.array([2**15-1], dtype=np.int16)

array([32767], dtype=int16)

In [122]:
np.array([2**15], dtype=np.int16)

array([-32768], dtype=int16)

In [123]:
np.array([2**15 + 1], dtype=np.int16)

array([-32767], dtype=int16)
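
The wrap-around follows from two's complement arithmetic modulo $2^{16}$. The pure-Python sketch below mimics it without NumPy; `wrap_int16` is a hypothetical helper, not a NumPy function:

```python
def wrap_int16(n):
    """Map a Python int onto the int16 range the way two's complement wraps."""
    return (n + 2**15) % 2**16 - 2**15

for n in (2**15 - 1, 2**15, 2**15 + 1):
    print(n, "->", wrap_int16(n))
```

The three results agree with the `int16` arrays shown above.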

## Binary fractions
Decimal fractions can be represented as binary fractions:

Convert $0.125_{10}$ to base 2:

In [87]:
0.125 * 2  # 0.0

0.25

In [88]:
_ * 2      # 0.00

0.5

In [89]:
_ * 2      # 0.001

1.0

Thus the binary representation of $0.125_{10}$ is $0.001_2$.

General recipe:
- multiply by 2
- if you get a number < 1, add a digit 0 to the right
- if you get a number ≥ 1, add a digit 1 to the right and then use the remainder in the same fashion

In [91]:
0.3125 * 2    # 0.0

0.625

In [92]:
_ * 2         # 0.01

1.25

In [93]:
(_ - 1) * 2   # 0.010

0.5

In [94]:
_ * 2         # 0.0101

1.0

Thus, 0.3125 is $0.0101_2$.
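
The recipe can be written as a short loop. This is a sketch with a hypothetical helper name `fraction_to_binary`; `max_digits` is an added cut-off so that non-terminating fractions do not loop forever:

```python
def fraction_to_binary(x, max_digits=20):
    """Binary digits of 0 <= x < 1 after the point (at most max_digits)."""
    if not 0 <= x < 1:
        raise ValueError("x must satisfy 0 <= x < 1")
    digits = []
    while x > 0 and len(digits) < max_digits:
        x *= 2
        if x >= 1:
            digits.append("1")
            x -= 1              # continue with the remainder
        else:
            digits.append("0")
    return "0." + ("".join(digits) or "0")

print(fraction_to_binary(0.125))
print(fraction_to_binary(0.3125))
```

Doubling a float and subtracting 1 from a value in $[1, 2)$ are both exact operations, so the loop reports the bits of the number as it is actually stored.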

What is the binary representation of decimal $0.1 = \frac{1}{10}$?

In [124]:
0.1 * 2  # 0

0.2

In [125]:
_ * 2   # 0

0.4

In [126]:
_ * 2  # 0

0.8

In [127]:
_ * 2  # 1

1.6

In [128]:
(_ - 1) * 2  # 1

1.2000000000000002

In [129]:
(_ - 1) * 2  # 0

0.40000000000000036

In [130]:
_ * 2  # 0 

0.8000000000000007

In [131]:
_ * 2  # 1 

1.6000000000000014

... etc: this is an infinitely repeating fraction and the binary representation of $0.1_{10}$ is $0.000 1100 1100 1100 ..._2$.

**Thus, with a finite number of bits, 0.1 is not exactly representable in the computer.**

The number 0.1 is not stored exactly in the computer. `print` only shows you a convenient approximation:

In [132]:
print(0.1)

0.1


In [143]:
print("{0:.55f}".format(0.1))

0.1000000000000000055511151231257827021181583404541015625
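
The standard library can show the stored value exactly, without guessing how many digits to print. A short sketch using `decimal.Decimal`, `fractions.Fraction` and `float.hex`:

```python
from decimal import Decimal
from fractions import Fraction

print(Decimal(0.1))     # exact decimal expansion of the stored double
print(Fraction(0.1))    # the same value as a ratio of two integers
print((0.1).hex())      # hexadecimal mantissa and power-of-two exponent
```

The denominator of the fraction is a power of 2, which is why the stored value cannot equal $\frac{1}{10}$.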


## Problems with floating point arithmetic

Only a subset of all real numbers can be represented with **floating point numbers of finite bit size**. Almost all real numbers, including simple decimal fractions such as 0.1, are stored only approximately:

In [144]:
0.1 + 0.1 + 0.1 == 0.3

False

... which should have yielded `True`! But because the machine representation of 0.1 is not exact, the equality cannot be fulfilled.
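
A common workaround is to compare floats within a tolerance instead of testing for exact equality. A minimal sketch with `math.isclose` (the absolute tolerance `1e-12` in the last line is an arbitrary choice for illustration):

```python
import math

total = 0.1 + 0.1 + 0.1
print(total == 0.3)               # exact comparison fails
print(math.isclose(total, 0.3))   # comparison within a relative tolerance
print(abs(total - 0.3) < 1e-12)   # explicit absolute tolerance
```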

## Representation of floats: IEEE 754

Floating point numbers are stored in "scientific notation": e.g. $c = 2.99792458 \times 10^8$ m/s
  * **mantissa**: $2.99792458$
  * **exponent**: $+8$
  * **sign**: +

Format: 
$$
x = (-1)^s \times 1.f \times 2^{e - \mathrm{bias}}
$$

Note: 
* In IEEE 754, the highest value of $e$ is reserved (it encodes `INF` and `NaN`) and is not used for ordinary numbers. E.g., for a 32-bit *float* (see below) the exponent has $(30 - 23) + 1 = 8$ bits, and hence the highest usable value of $e$ is $(2^8 - 1) - 1 = 255 - 1 = 254$. Taking the *bias* into account (for *float*, *bias* = 127), the largest power of two is $2^{254 - 127} = 2^{127}$.
* The case of $e=0$ is also special. In this case, the format is $$x = (-1)^s \times 0.f \times 2^{1 - \mathrm{bias}}$$ i.e. the "ghost 1" becomes a zero (*subnormal* numbers), which extends the range to smaller magnitudes at reduced precision.

### IEEE float (32 bit)

IEEE *float* uses **32 bits**
  * $\mathrm{bias} = 127_{10}$
  * bits
    <table>
    <tr><td></td><td>s</td><td>e</td><td>f</td></tr>
    <tr><td>bit position</td><td>31</td><td>30–23</td><td>22–0</td></tr>
    </table>
  * **six or seven decimal places of significance** (1 in $2^{23}$)
  * range: $1.4 \times 10^{-45} \leq |x_{(32)}| \leq 3.4 \times 10^{38}$ 

In [149]:
1/2**23

1.1920928955078125e-07
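
To see the three fields of a concrete number, the bits of a 32-bit float can be pulled apart with the `struct` module. This is a sketch using the bit positions from the table above. `decode_float32` and `fields_to_value` are hypothetical helper names, and the subnormal branch assumes the IEEE 754 exponent $1 - \mathrm{bias}$ for $e = 0$:

```python
import struct

def decode_float32(x):
    """Return the (s, e, f) bit fields of x stored as an IEEE 754 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # bit 31
    e = (bits >> 23) & 0xFF     # bits 30-23
    f = bits & 0x7FFFFF         # bits 22-0
    return s, e, f

def fields_to_value(s, e, f, bias=127):
    """Rebuild the number from its fields (e = 255, i.e. INF/NaN, is not handled)."""
    if e == 0:                                        # subnormal: 0.f
        return (-1)**s * (f / 2**23) * 2.0**(1 - bias)
    return (-1)**s * (1 + f / 2**23) * 2.0**(e - bias)  # normal: 1.f

s, e, f = decode_float32(0.15625)
print(s, e, f, fields_to_value(s, e, f))
```

The value 0.15625 is $1.25 \times 2^{-3}$, so the stored exponent is $e = -3 + 127 = 124$ and the fraction bits encode $0.25$.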

### IEEE double (64 bit)
Python floating point numbers are 64-bit doubles. NumPy has dtypes `float32` and `float64`.


IEEE *double* uses **64 bits**
  * $\mathrm{bias} = 1023_{10}$
  * bits
    <table>
    <tr><td></td><td>s</td><td>e</td><td>f</td></tr>
    <tr><td>bit position</td><td>63</td><td>62–52</td><td>51–0</td></tr>
    </table>
  * **about 16 decimal places of significance** (1 in $2^{52}$)
  * range: $4.9 \times 10^{-324} \leq |x_{(64)}| \leq 1.8 \times 10^{308}$ 


In [150]:
1/2**52

2.220446049250313e-16
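
The limits of the double format can be queried directly from Python. A short sketch using `sys.float_info`; the smallest subnormal is not stored there and is computed as $2^{-1074}$:

```python
import sys

info = sys.float_info
print(info.mant_dig)      # significand bits, including the implicit leading 1
print(info.max)           # largest finite double
print(info.min)           # smallest positive *normal* double
print(2.0**-1074)         # smallest positive subnormal double
print(2.0**-1074 / 2)     # anything smaller underflows to zero
```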

For numerical calculations, *doubles* are typically required.

### Special numbers
IEEE 754 also introduces special "numbers" that can result from floating point arithmetic
* `NaN` (not a number)
* `+INF` and `-INF` (infinity)
* `-0` (signed zero)

Python itself raises an exception for division by zero instead of returning an IEEE special number:

In [151]:
1/0

ZeroDivisionError: division by zero

But numpy does:

In [153]:
np.array([1, -1])/np.zeros(2)

array([ inf, -inf])

But beware, you cannot use `INF` to "take limits". It is purely a sign that something bad happened somewhere...

In [155]:
np.zeros(2)/np.zeros(2)

array([ nan,  nan])
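
Although `1/0` raises an exception, Python floats can still hold the special values. A minimal sketch of how they behave, using only built-ins and the `math` module:

```python
import math

inf = float("inf")       # also available as math.inf
nan = inf - inf          # an undefined operation yields NaN

print(inf > 1.79e308, -inf < -1.79e308)
print(math.isnan(nan))
print(nan == nan)        # NaN compares unequal to everything, even itself
```

Because of the last property, use `math.isnan` (or `np.isnan` for arrays) rather than `==` to test for `NaN`.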

### Overflow and underflow

* underflow: typically just set to zero (and that works well most of the time)
* overflow: raises exception or just set to `inf`

In [165]:
big = 1.79e308
big

1.79e+308

In [166]:
2 * big

inf

In [168]:
2 * np.array([big], dtype=np.float64)

array([ inf])

... but, where available, you can use an even bigger data type (`np.float128` is platform dependent and not provided by NumPy on every system):

In [169]:
2 * np.array([big], dtype=np.float128)

array([ 3.58e+308], dtype=float128)
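
Both behaviours listed at the start of this section can be seen with plain Python floats. A small sketch (the numbers are arbitrary choices that leave the double range):

```python
import math

tiny = 1e-200 * 1e-200     # true result 1e-400 is below the double range
huge = 1e200 * 1e200       # true result 1e400 is above the double range
print(tiny, huge)

try:
    math.exp(1000)         # some operations raise instead of returning inf
except OverflowError as err:
    print("OverflowError:", err)
```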

### Insignificant digits

In [78]:
x = 1000.2
A = x - 1000.0
print(A)

0.20000000000004547


In [76]:
A == 0.2

False

... oops

In [82]:
x = 700
y = 1e-14
x - y

700.0

In [83]:
x - y < 700

False

... ooops
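
Both surprises come from the spacing between neighbouring doubles, which grows with the magnitude of the number. The sketch below estimates that spacing with `math.frexp`; `spacing` is a hypothetical helper that assumes a positive, normal double with 53 significand bits:

```python
import math

def spacing(x):
    """Gap from a positive normal double x to the next larger double."""
    _, exponent = math.frexp(x)      # x = m * 2**exponent with 0.5 <= m < 1
    return 2.0**(exponent - 53)      # 53 significand bits in a double

print(spacing(1.0))
print(spacing(700.0))
print(1e-14 < spacing(700.0) / 2)    # y is below half the gap, so x - y rounds back to x
```

Near 700 and 1000 the gap is about $10^{-13}$, so a difference of $10^{-14}$ is lost and 1000.2 is already off by a few $10^{-14}$ before the subtraction exposes it.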

## Machine precision
Only a limited number of floating point numbers can be represented. This *limited precision* affects calculations:


In [172]:
x = 5  + 1e-16
x

5.0

In [173]:
x == 5

True

... oops.

**Machine precision** $\epsilon_m$ is defined as the maximum number that can be added to 1 in the computer without changing that number 1:

$$
1_c + \epsilon_m := 1_c
$$

Thus, the *floating point representation* $x_c$ of an arbitrary number $x$ is "in the vicinity of $x$"

$$
x_c = x(1\pm\epsilon), \quad |\epsilon| \leq \epsilon_m
$$

where we don't know the true value of $\epsilon$.
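
For numbers whose exact value is known, the relative error $\epsilon$ can be computed with exact rational arithmetic. This is a sketch with a hypothetical helper `relative_error`; it takes $\epsilon_m = 2^{-53}$ for doubles, which assumes round-to-nearest:

```python
from fractions import Fraction

def relative_error(x_c, numerator, denominator):
    """Exact relative error of the float x_c against numerator/denominator."""
    x = Fraction(numerator, denominator)
    return abs(Fraction(x_c) - x) / x

eps_m = Fraction(1, 2**53)
for x_c, (p, q) in [(0.1, (1, 10)), (0.3, (3, 10)), (1000.2, (10002, 10))]:
    err = relative_error(x_c, p, q)
    print(x_c, float(err), err <= eps_m)
```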

Thus, except for numbers that are exactly representable (powers of 2 and sums of a few powers of 2, such as 0.3125), **all floating point numbers contain an unknown error in the 6th decimal place (32 bit floats) or 15th decimal (64 bit doubles)**.

This error should be treated as a random error because we don't know its magnitude.

In [195]:
N = 100
eps = 1
for nbits in range(N):
    eps /= 2
    one_plus_eps = 1.0 + eps
    # print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
    if one_plus_eps == 1.0:
        print("machine precision reached for {0} bits".format(nbits))
        print("eps = {0}, 1 + eps = {1}".format(eps, one_plus_eps))
        break


machine precision reached for 52 bits
eps = 1.1102230246251565e-16, 1 + eps = 1.0


Compare to our estimate for the precision of float64:

In [193]:
1/2**52

2.220446049250313e-16
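
The two values differ by a factor of 2 because they answer slightly different questions. As a sketch: `sys.float_info.epsilon` is the spacing between 1.0 and the next larger double, whereas the loop above stops at the first power of 2 that is rounded away, which is half that spacing.

```python
import sys

eps = sys.float_info.epsilon     # 2**-52 for IEEE doubles
print(eps == 1 / 2**52)
print(1.0 + eps > 1.0)           # one full spacing changes 1.0
print(1.0 + eps / 2 == 1.0)      # half a spacing is rounded away
```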

## Appendix

A quick hack to convert a floating point binary representation to a floating point number.

In [16]:
bits = "1010.0001100110011001100110011001100110011001100110011"

In [1]:
import math
def bits2number(bits):
    if '.' in bits:
        integer, fraction = bits.split('.')
    else:
        integer = bits
        fraction = ""
    powers = [int(bit) * 2**n for n, bit in enumerate(reversed(integer))]
    powers.extend([int(bit) * 2**(-n) for n, bit in enumerate(fraction, start=1)])
    return math.fsum(powers)

In [34]:
bits2number(bits)

10.1

In [35]:
bits2number('1111')

15.0

In [36]:
bits2number('0.0001100110011001100110011001100110011001100110011')

0.09999999999999964

In [37]:
bits2number('0.0001100')

0.09375

In [90]:
bits2number('0.0101')

0.3125

In [8]:
bits2number("10.10101")

2.65625

In [14]:
bits2number('0.0111111111111111111111111111111111111111')

0.4999999999990905

In [22]:
bits2number('0.110011001100')

0.7998046875

In [17]:
x = 0.6

In [21]:
x.is_integer()

False

In [23]:
2**8 - 1

255

Python can convert to binary using the `struct` module:

In [26]:
import struct
fpack = struct.pack('f', 6.0e-8)    # pack float into bytes
fint = struct.unpack('i', fpack)[0] # unpack to int
m_bits = bin(fint)[-23:]            # mantissa bits
print(m_bits)

00000001101100101011001


With phantom bit:

In [32]:
mantissa_bits = '1.' + m_bits
print(mantissa_bits)

1.00000001101100101011001


In [30]:
import math
mn, ex = math.frexp(6.0e-8)
print(mn, ex)

0.50331648 -23
