# Number Representations

Computers store each number in a fixed number of bits. This finite storage is the source of round-off errors as well as the range limits of different variable types.  In this notebook we demonstrate these concepts for integers and floats.

## Representation of integer numbers

Here I show some examples of how integers are represented in computers.

The simplest examples are unsigned 8-bit integers. I will take a few values in their range (0 to 255), and then show the bits that express each one.

In [3]:
import numpy as np
for value in [1, 3, 8, 95, 255]:
    print(value,' decimal -> ',np.binary_repr(value,width=8))

1  decimal ->  00000001
3  decimal ->  00000011
8  decimal ->  00001000
95  decimal ->  01011111
255  decimal ->  11111111
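
Each bit position carries a power of two, so a bit string can be turned back into a decimal value by summing the powers selected by its 1 bits. A minimal sketch of that round trip in plain Python (the helper name `bits_to_int` is made up for illustration):

```python
def bits_to_int(bits):
    """Sum the powers of two selected by the '1' bits (most significant bit first)."""
    width = len(bits)
    return sum(2 ** (width - 1 - i) for i, b in enumerate(bits) if b == "1")

for bits in ["00000001", "00000011", "00001000", "01011111", "11111111"]:
    # int(bits, 2) is the built-in equivalent and should agree
    print(bits, "->", bits_to_int(bits), int(bits, 2))
```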


What happens if you increment past 255 for an unsigned 8-bit integer?

In [4]:
value=np.uint8(250)
for i in np.arange(10):
    print(value,' decimal -> ',np.binary_repr(value,width=8))
    value = value+np.uint8(1)

250  decimal ->  11111010
251  decimal ->  11111011
252  decimal ->  11111100
253  decimal ->  11111101
254  decimal ->  11111110
255  decimal ->  11111111
0  decimal ->  00000000
1  decimal ->  00000001
2  decimal ->  00000010
3  decimal ->  00000011


  value = value+np.uint8(1)


It is important to remember that integers have a limited range when expressed this way (though the range is much larger for 32- or 64-bit integers)! Here NumPy issues a warning when the scalar overflows, but this does not happen in all computing environments.
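
The wraparound seen above is arithmetic modulo $2^8 = 256$. The sketch below compares `uint8` array addition, which wraps without the warning that scalars produce, against an explicit modulo done in a wider type. The sample values are arbitrary, and `np.iinfo` is used only to report the representable range.

```python
import numpy as np

info = np.iinfo(np.uint8)
print(info.min, info.max)

start = np.array([250, 251, 254, 255], dtype=np.uint8)
step = np.array([10, 10, 10, 10], dtype=np.uint8)

wrapped = start + step                            # stays uint8, wraps past 255
expected = (start.astype(np.int64) + 10) % 256    # same sums taken modulo 2**8
print(wrapped, expected)
```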

For a signed integer, the upper half of the range is remapped to the negative numbers.

In [5]:
for value in [-128, -127, -2, -1, 0, 1, 127]:
    print(value,' decimal -> ',np.binary_repr(value,width=8))

-128  decimal ->  10000000
-127  decimal ->  10000001
-2  decimal ->  11111110
-1  decimal ->  11111111
0  decimal ->  00000000
1  decimal ->  00000001
127  decimal ->  01111111
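
This remapping is the two's-complement convention: a negative value $n$ is stored as the unsigned bit pattern of $2^8 + n$. A small sketch under that convention, which is also what `np.binary_repr` returns when `width` is given (the two helper names are hypothetical):

```python
import numpy as np

def twos_complement_bits(n, width=8):
    """Bit pattern of a signed integer: n modulo 2**width, zero-padded."""
    return format(n % (2 ** width), "0{}b".format(width))

def bits_to_signed(bits):
    """Inverse mapping: subtract 2**width when the leading (sign) bit is set."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value

for n in [-128, -127, -2, -1, 0, 1, 127]:
    bits = twos_complement_bits(n)
    print(n, bits, bits == np.binary_repr(n, width=8), bits_to_signed(bits))
```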


What happens if you increment past 127 for a signed 8-bit integer?

In [6]:
value=np.int8(124)
for i in np.arange(10):
    print(value,' decimal -> ',np.binary_repr(value,width=8))
    value = value+np.int8(1)

124  decimal ->  01111100
125  decimal ->  01111101
126  decimal ->  01111110
127  decimal ->  01111111
-128  decimal ->  10000000
-127  decimal ->  10000001
-126  decimal ->  10000010
-125  decimal ->  10000011
-124  decimal ->  10000100
-123  decimal ->  10000101


  value = value+np.int8(1)


NumPy's default integer type is 64 bits on most platforms, and Python 3's built-in `int` is not limited to a fixed width at all.  In Python 2 the plain `int` was 32 bits on many platforms, which covered most (but not all!) cases of interest. The maximum value of a signed 32-bit integer is $2^{32-1}-1=2147483647$, which is just over 2 billion. Astronomers and physicists can encounter catalogs with billions of entries, so 64-bit integers are sometimes necessary. Here are some examples for 32-bit signed integers.

In [7]:
for value in [-73654836, -3, 1, 850466]:
    print(value,' decimal -> ',np.binary_repr(value,width=32))

-73654836  decimal ->  11111011100111000001110111001100
-3  decimal ->  11111111111111111111111111111101
1  decimal ->  00000000000000000000000000000001
850466  decimal ->  00000000000011001111101000100010


We do not show 64-bit examples just because the binary representations would take up too much space.
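
The limits for each width can be queried rather than memorized. A short sketch using `np.iinfo`, set next to Python's built-in `int`, which grows as needed instead of wrapping:

```python
import numpy as np

for dtype in [np.int8, np.uint8, np.int32, np.int64]:
    info = np.iinfo(dtype)
    print(dtype.__name__, info.bits, info.min, info.max)

# A signed n-bit integer tops out at 2**(n-1) - 1
max32 = np.iinfo(np.int32).max
max64 = np.iinfo(np.int64).max

# Python's own int is not fixed-width, so going one past the int64 limit is fine
beyond = int(max64) + 1
print(beyond)
```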

## Representation of floating point numbers

To find the binary representation of 32-bit floating point numbers, I have to define a function.

In [8]:
import struct
def binary32(num):
    """Return the 32 bits of num, packed as a big-endian IEEE 754 single, as a string."""
    return ''.join(format(c, '08b') for c in struct.pack('!f', num))

In [11]:
for value in [0., 1., 1.5, 2., 5000., - 5000., 1.4e-45, 3.40055e+38, 1.e+100, 1.e-56]:
    bitlist=binary32(np.float32(value))
    sign = bitlist[0]
    exponent = bitlist[1:9]
    mantissa = bitlist[9:32]
    template = """{value} decimal ->
       sign = {sign} 
       exponent = {exponent} 
       mantissa = {mantissa}"""
    print(template.format(value=value, sign=sign, exponent=exponent, mantissa=mantissa))

0.0 decimal ->
       sign = 0 
       exponent = 00000000 
       mantissa = 00000000000000000000000
1.0 decimal ->
       sign = 0 
       exponent = 01111111 
       mantissa = 00000000000000000000000
1.5 decimal ->
       sign = 0 
       exponent = 01111111 
       mantissa = 10000000000000000000000
2.0 decimal ->
       sign = 0 
       exponent = 10000000 
       mantissa = 00000000000000000000000
5000.0 decimal ->
       sign = 0 
       exponent = 10001011 
       mantissa = 00111000100000000000000
-5000.0 decimal ->
       sign = 1 
       exponent = 10001011 
       mantissa = 00111000100000000000000
1.4e-45 decimal ->
       sign = 0 
       exponent = 00000000 
       mantissa = 00000000000000000000001
3.40055e+38 decimal ->
       sign = 0 
       exponent = 11111110 
       mantissa = 11111111101010000110110
1e+100 decimal ->
       sign = 0 
       exponent = 11111111 
       mantissa = 00000000000000000000000
1e-56 decimal ->
       sign = 0 
       exponent = 00000000 
       mantissa = 00000000000000000000000

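Note that the last two are an overflow and an underflow, respectively: $10^{100}$ is too large for 32 bits and is stored as infinity (exponent bits all 1, mantissa 0), while $10^{-56}$ is too small and is stored as zero. Now we demonstrate what happens when adding two numbers that differ by a large dynamic range.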

In [10]:
value_big = np.float32(1.)
value_small = np.float32(1.e-8)
for value in [value_big, value_small]:
    bitlist = binary32(value)
    # Read the 8-bit exponent field as an unsigned integer, then remove the bias of 127.
    # (np.int no longer exists in recent NumPy releases; the built-in int does the job.)
    exp2 = int(bitlist[1:9], 2) - 127
    print("Exponent of {value}: {exponent} --> {exp2}".format(value=value, exponent=bitlist[1:9], exp2=exp2))
    print("Mantissa of {value}: {mantissa}".format(value=value, mantissa=bitlist[9:]))

Exponent of 1.0: 01111111 --> 0
Mantissa of 1.0: 00000000000000000000000
Exponent of 9.99999993922529e-09: 01100100 --> -27
Mantissa of 9.99999993922529e-09: 01010111100110001110111
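
For normal numbers the three fields combine as $(-1)^{s} \times (1 + m/2^{23}) \times 2^{e-127}$, where $s$ is the sign bit, $e$ is the 8-bit exponent field, and $m$ is the 23-bit mantissa field read as an integer. A minimal sketch that rebuilds $10^{-8}$ from the bit strings printed above; the helper `decode32` is hypothetical, and it treats the all-zero and all-one exponent fields as the usual IEEE 754 special cases:

```python
import struct

def decode32(bits):
    """Rebuild a float from a 32-character IEEE 754 single-precision bit string."""
    sign = -1.0 if bits[0] == "1" else 1.0
    e = int(bits[1:9], 2)
    m = int(bits[9:32], 2)
    if e == 255:                       # all ones: infinity or NaN
        return sign * float("inf") if m == 0 else float("nan")
    if e == 0:                         # all zeros: zero or subnormal, no implicit 1
        return sign * (m / 2**23) * 2.0**(-126)
    return sign * (1 + m / 2**23) * 2.0**(e - 127)

bits = "0" + "01100100" + "01010111100110001110111"   # sign, exponent, mantissa
small = decode32(bits)
print(small)

# Cross-check: let struct reinterpret the same 32 bits as a float
packed = struct.pack("!I", int(bits, 2))
print(struct.unpack("!f", packed)[0])
```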


In order to add these two numbers, the computer needs to make the exponents the same.  To do that, it must shift the bits in the smaller number's mantissa to the right.

The number $10^{-8}$, originally expressed as:
- exponent 01100100 -> (-27)
- mantissa 01010111100110001110111

becomes after one shift
- exponent 01100101 -> (-26)
- mantissa 00101011110011000111011

Raising the exponent by 1 makes all the bits in the mantissa shift to the right.  Note that just shifting the bits doesn't keep the value equal to $10^{-8}$ because of the implicit leading 1 (the "1+") that is not stored in the mantissa.  However, the adding operation keeps track of how many shifts are performed to correct for this.

The main point is that this is a lower-precision way to express $10^{-8}$ because you have lost the last bit, which has a value of $2^{-23}\sim 10^{-7}$ relative to the leading 1.  Thus you can lose precision whenever bits must be shifted to add numbers.

In this case, it's really a problem because $10^{-8}<2^{-23}$, the value of the last mantissa bit of 1.  Another way of saying it is that to add $10^{-8}$ to 1, you have to shift the bits 27 times to match the exponents.  Since the mantissa holds only 23 bits, every bit of the smaller number is shifted out before the exponents match.  Then adding the mantissas results in no change.

In [12]:
value = value_big + value_small  # add 1 and 1.e-8
bitlist = binary32(np.float32(value))
# Exponent field as an unsigned integer, minus the bias of 127 (np.int replaced by int)
exp2 = int(bitlist[1:9], 2) - 127
# Print the sum itself rather than value_big
print("Exponent of {value}: {exponent} --> {exp2}".format(value=value, exponent=bitlist[1:9], exp2=exp2))
print("Mantissa of {value}: {mantissa}".format(value=value, mantissa=bitlist[9:]))

Exponent of 1.0: 01111111 --> 0
Mantissa of 1.0: 00000000000000000000000
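
The threshold at which an addend stops registering can be read off from `np.finfo`. The sketch below checks that the gap between 1 and the next `float32` is $2^{-23}$, that adding a full gap changes the sum, and that $10^{-8}$ is rounded away in `float32` but not in `float64`:

```python
import numpy as np

eps32 = np.finfo(np.float32).eps     # gap between 1.0 and the next float32
one = np.float32(1.0)

print(eps32, 2.0**-23)
print(one + eps32 == one)                            # a full gap: the sum changes
print(one + np.float32(1.e-8) == one)                # far below half a gap: lost
print(np.float64(1.0) + np.float64(1.e-8) == 1.0)    # float64 has room for it
```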


This error, called round-off error, is a consideration in building stable algorithms.  For example, this can be a serious issue when adding lots of small numbers.

In [13]:
# Here is a dumb way to add up a bunch of numbers that are supposed to add to unity.
f = np.float32(0.)
incr = np.float32(1.e-8)
for i in np.arange(100000000):
    f = f + incr
print(f)

0.25
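
The loop stalls at 0.25 rather than reaching 1 because the gap between adjacent `float32` values doubles at every power of two. Just below 0.25 the gap is $2^{-26} \approx 1.5\times10^{-8}$, so adding $10^{-8}$ still rounds up to the next value. At 0.25 the gap becomes $2^{-25} \approx 3\times10^{-8}$, the increment is less than half of it, and every further addition is rounded away. A quick check of that reasoning with `np.spacing` and `np.nextafter`:

```python
import numpy as np

incr = np.float32(1.e-8)
stall = np.float32(0.25)
below = np.nextafter(stall, np.float32(0.0))   # largest float32 below 0.25

print(np.spacing(below), np.spacing(stall))    # gap just below 0.25 and at 0.25
print(below + incr > below)                    # still advances
print(stall + incr == stall)                   # stuck: the increment is rounded away
```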


In [14]:
# This is just as dumb but because the values have 64-bit precision it works out OK.
f = np.float64(0.)
incr = np.float64(1.e-8)
for i in np.arange(100000000):
    f = f + incr
print(f)

1.0000000022898672


In [15]:
# Note that NumPy must do something smarter with its array method sum()!
incrs = np.ones(100000000, dtype=np.float32) * np.float32(1.e-8)
print(incrs.sum() - 1.)

incrs = np.ones(100000000, dtype=np.float64) * np.float64(1.e-8)
print(incrs.sum() - 1.)

-0.00011271238327026367
-2.000621890374532e-13
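
One reason the array `sum()` fares better is that NumPy adds contiguous floating-point data with a partial pairwise scheme instead of one long running total, as its documentation notes. The sketch below is a simplified pure-Python illustration of that idea, not NumPy's actual implementation. The function names, the block size of 128, and the smaller test case (100,000 copies of 0.1 in `float32`) are assumptions chosen so that it runs quickly.

```python
import numpy as np

def naive_sum(values):
    """Running total in float32: each add is rounded relative to the growing sum."""
    total = np.float32(0.0)
    for v in values:
        total = total + v
    return total

def pairwise_sum(values, block=128):
    """Split recursively in halves so that partial sums of similar size are added."""
    n = len(values)
    if n <= block:
        return naive_sum(values)
    mid = n // 2
    return pairwise_sum(values[:mid], block) + pairwise_sum(values[mid:], block)

values = np.full(100000, 0.1, dtype=np.float32)
exact = float(values[0]) * len(values)     # reference computed in float64

err_naive = abs(float(naive_sum(values)) - exact)
err_pairwise = abs(float(pairwise_sum(values)) - exact)
print(err_naive, err_pairwise)
```

Pairwise addition keeps the two operands of each sum comparable in size, so fewer low-order bits are shifted out than when a tiny increment is added to a large running total.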
