# Scalars

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Integers

### Binary representation of integers

In [None]:
format(16, '032b')

### Bit shifting

In [None]:
format(16 >> 2, '032b')

In [None]:
16 >> 2

In [None]:
format(16 << 2, '032b')

In [None]:
16 << 2
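
For non-negative integers, shifting by `k` bits is equivalent to multiplying or floor-dividing by $2^k$. A minimal sketch in plain Python (the variable names are illustrative only):

```python
n = 16
k = 2

# A left shift by k bits multiplies by 2**k.
left = n << k
# A right shift by k bits floor-divides by 2**k.
right = n >> k

print(left, n * 2**k)    # both are 64
print(right, n // 2**k)  # both are 4

# Bits shifted off the right end are discarded: 0b101 becomes 0b10.
odd_shift = 5 >> 1
print(odd_shift)
```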

### Overflow

In general, the computer representation of integers has a limited range, and may overflow. The range depends on whether the integer is signed or unsigned.

For example, with 8 bits, we can represent at most $2^8 = 256$ integers.

- 0 to 255 unsigned
- -128 to 127 signed

Signed integers

In [None]:
np.arange(130, dtype=np.int8)[-5:]

Unsigned integers

In [None]:
np.arange(130, dtype=np.uint8)[-5:]

In [None]:
np.arange(260, dtype=np.uint8)[-5:]
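
The 8-bit ranges and the wraparound behavior can also be checked with the standard library alone. This is a small sketch using `ctypes`, whose fixed-width integer types do no overflow checking and simply keep the low 8 bits:

```python
from ctypes import c_int8, c_uint8

bits = 8

# n bits give 2**n distinct values.
unsigned_range = (0, 2**bits - 1)
signed_range = (-2**(bits - 1), 2**(bits - 1) - 1)
print(unsigned_range)
print(signed_range)

# Out-of-range values wrap around modulo 2**8.
wrapped_signed = c_int8(130).value     # 130 - 256
wrapped_unsigned = c_uint8(260).value  # 260 - 256
print(wrapped_signed, wrapped_unsigned)
```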

### Integer division

In Python 2 and in other languages such as C/C++, be very careful when dividing: the division operator `/` performs integer division when both numerator and denominator are integers. This is rarely what you want. In Python 3, `/` always performs floating point division and `//` performs integer (floor) division, removing a common source of bugs in numerical calculations.

In [None]:
%%python2

import numpy as np

x = np.arange(10)
print(x/10)

Python 3 does the "right" thing.

In [None]:
x = np.arange(10)
x/10
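
One detail worth knowing: Python's `//` rounds toward negative infinity (floor), whereas integer division in C truncates toward zero. A short sketch with plain Python integers:

```python
true_div = 7 / 2       # floating point division
floor_div = 7 // 2     # integer (floor) division
neg_floor = -7 // 2    # floors toward negative infinity
truncated = int(-7 / 2)  # truncation toward zero, as C would do

# divmod returns quotient and remainder with q * 2 + r == -7.
q, r = divmod(-7, 2)

print(true_div, floor_div, neg_floor, truncated, (q, r))
```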

## Real numbers

Real numbers are represented as **floating point** numbers. A floating point number is stored in 3 pieces (sign bit, exponent, mantissa) so that every float is represented as $\pm\,\text{mantissa} \times 2^{\text{exponent}}$. Because of this, the interval between consecutive numbers is smallest (high precision) for numbers close to 0 and largest for numbers close to the lower and upper bounds.

Because exponents have to be signed to represent both small and large numbers, but it is more convenient to store unsigned numbers, the exponent has an offset (also known as the exponent bias). For example, if the exponent is an unsigned 8-bit number, it can represent the range 0 to 255. By using an offset of 127, as IEEE 754 single precision does, it represents the range -127 to 128.

![float1](http://www.dspguide.com/graphics/F_4_2.gif)

**Note**: Intervals between consecutive floating point numbers are not constant. In particular, the precision for numbers of small magnitude is much higher than for numbers of large magnitude. In fact, approximately half of all floating point numbers lie between -1 and 1 when using the `double` type in C/C++ (also the default for `numpy`).

![float2](http://jasss.soc.surrey.ac.uk/9/4/4/fig1.jpg)

Because of this, if you are adding many numbers, it is more accurate to first add the small numbers before the large numbers.
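
The effect of summation order can be seen with a small sketch. It assumes Python 3.9+ for `math.ulp`, and the values `1e16` and `1.0` are chosen only for illustration:

```python
import math

# The gap to the next representable double grows with magnitude.
print(math.ulp(1.0))
print(math.ulp(1e16))

big = 1e16
small = [1.0] * 10

# Large number first: each 1.0 is below the resolution at 1e16 and is lost.
large_first = big
for value in small:
    large_first += value

# Small numbers first: they accumulate to 10.0 before meeting the large one.
small_first = sum(small) + big

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
exact = math.fsum([big] + small)

print(large_first, small_first, exact)
```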

#### IEEE 754 32-bit floating point representation

![img](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Float_example.svg/590px-Float_example.svg.png)

See [Wikipedia](https://en.wikipedia.org/wiki/Single-precision_floating-point_format) for how this binary number is evaluated to 0.15625.

In [None]:
from ctypes import c_int, c_float

In [None]:
s = c_int.from_buffer(c_float(0.15625)).value

In [None]:
s = format(s, '032b')
s

In [None]:
rep = {
    'sign': s[:1],
    'exponent': s[1:9],
    'fraction': s[9:]
}
rep
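
The three fields can be turned back into the value they encode. The following is a sketch for normal (non-zero, non-subnormal, finite) single-precision numbers only; the helper name `decode_float32` is hypothetical and uses `struct` rather than `ctypes` to get at the bits:

```python
import struct

def decode_float32(value):
    """Split a float32 into its fields and rebuild the value (normal numbers only)."""
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits
    # Normal numbers have an implicit leading 1 in the mantissa.
    mantissa = 1 + fraction / 2**23
    rebuilt = (-1)**sign * mantissa * 2.0**(exponent - 127)
    return sign, exponent, fraction, rebuilt

sign, exponent, fraction, rebuilt = decode_float32(0.15625)
print(sign, exponent - 127, 1 + fraction / 2**23, rebuilt)
```

Here 0.15625 decodes to $+1.25 \times 2^{-3}$.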

### Most base 10 real numbers are approximations

This is simply because numbers are stored in a finite-precision binary format, and many decimal fractions (such as 0.1) have no finite binary expansion.

In [None]:
'%.20f' % (0.1 * 0.1 * 100)
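
To see exactly what is stored for a decimal literal, the standard library's `decimal` and `fractions` modules can convert a float without further rounding. A brief sketch:

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) shows the exact binary value stored for the literal 0.1.
stored = Decimal(0.1)
print(stored)

# The same value as an exact ratio; the denominator is a power of 2.
as_ratio = Fraction(0.1)
print(as_ratio)

# The stored value differs from the decimal number 1/10.
differs = stored != Decimal('0.1')
print(differs)
```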

### Never check for equality of floating point numbers

In [None]:
i = 0
loops = 0
while i != 1:
    i += 0.1 * 0.1
    loops += 1
    if loops == 1000000:
        break
i

In [None]:
i = 0
loops = 0
while np.abs(1 - i) > 1e-6:
    i += 0.1 * 0.1
    loops += 1
    if loops == 1000000:
        break
i
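
Instead of a hand-written tolerance check, the standard library offers `math.isclose` (and NumPy offers `np.isclose` for arrays). A short sketch; the tolerance values are illustrative choices, not recommendations:

```python
import math

a = 0.1 + 0.2
b = 0.3

exact_equal = (a == b)      # exact comparison fails
close = math.isclose(a, b)  # relative tolerance, default rel_tol=1e-09

# Near zero a relative tolerance is not useful, so supply abs_tol.
tiny = 1e-12
near_zero_default = math.isclose(tiny, 0.0)
near_zero_abs = math.isclose(tiny, 0.0, abs_tol=1e-9)

print(exact_equal, close, near_zero_default, near_zero_abs)
```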

### Associative law does not necessarily hold

In [None]:
6.022e23 - 6.022e23 + 1

In [None]:
1 + 6.022e23 - 6.022e23

### Distributive law does not hold

In [None]:
a = np.exp(1)
b = np.pi
c = np.sin(1)

In [None]:
a*(b+c)

In [None]:
a*b + a*c

### Catastrophic cancellation

Consider calculating the sample variance

$$
s^2 = \frac{1}{n(n-1)}\left(n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2\right)
$$

Be careful whenever you calculate the difference of potentially big numbers.

In [None]:
def var(x):
    """Returns variance of sample data using sum of squares formula."""
    
    n = len(x)
    return (1.0/(n*(n-1))*(n*np.sum(x**2) - (np.sum(x))**2))

### What is the sample variance for numbers from a normal distribution with variance 1?

In [None]:
np.random.seed(15)
x_ = np.random.normal(0, 1, int(1e6))
x = 1e12 + x_
var(x)

### Numerically stable algorithms

In [None]:
np.var(x)
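
Note that `np.var` computes the population variance by default (`ddof=0`); pass `ddof=1` for the sample variance. One well-known stable alternative to the sum of squares formula is Welford's online algorithm, sketched below in plain Python. This is an illustration of a stable method, not a description of how `np.var` is implemented, and the function names and test data are hypothetical:

```python
def naive_variance(xs):
    """Sample variance via the sum of squares formula (numerically unstable)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x**2) / (n * (n - 1))

def welford_variance(xs):
    """Sample variance via Welford's online algorithm (single pass, stable)."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1)

# Small spread around a large offset; the true sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(naive_variance(data))
print(welford_variance(data))
```

Welford's method only ever subtracts numbers of similar size (a value and the running mean), so it avoids taking the difference of two huge, nearly equal quantities.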

### Underflow

In [None]:
import warnings
warnings.filterwarnings('ignore')  # np.warnings was removed in NumPy 1.24

In [None]:
np.random.seed(4)
xs = np.random.random(1000)
ys = np.random.random(1000)
np.prod(xs)/np.prod(ys)

#### Prevent underflow by staying in log space

In [None]:
x = np.sum(np.log(xs))
y = np.sum(np.log(ys))
np.exp(x - y)
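
The same idea in a deterministic, standard-library sketch (requires Python 3.8+ for `math.prod`). The inputs are made up so that the exact ratio is known to be $2^{-100}$:

```python
import math

small_xs = [1e-5] * 100
small_ys = [2e-5] * 100

# Both direct products underflow to 0.0, so their ratio cannot be computed.
prod_x = math.prod(small_xs)
prod_y = math.prod(small_ys)
print(prod_x, prod_y)

# In log space the products become sums, which stay well within range.
log_ratio = sum(math.log(v) for v in small_xs) - sum(math.log(v) for v in small_ys)
ratio = math.exp(log_ratio)
print(ratio)
```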

### Overflow

Let's calculate

$$
\log(e^{1000} + e^{1000})
$$

Using basic algebra, we get the solution $\log(2) + 1000$.

In [None]:
x = np.array([1000, 1000])
np.log(np.sum(np.exp(x)))

In [None]:
np.logaddexp(*x)

**logsumexp**

This function generalizes `logaddexp` to an arbitrary number of addends and is useful in a variety of statistical contexts.

Suppose we need to calculate a probability distribution $\pi$ parameterized by a vector $x$

$$
\pi_i = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}
$$

Taking logs, we get

$$
\log(\pi_i) = x_i - \log{\sum_{j=1}^n e^{x_j}}
$$

In [None]:
x = 1e6*np.random.random(100)

In [None]:
np.log(np.sum(np.exp(x))) 

In [None]:
from scipy.special import logsumexp

In [None]:
logsumexp(x)
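
The usual way to make this stable is to factor out the largest term, using $\log\sum_j e^{x_j} = m + \log\sum_j e^{x_j - m}$ with $m = \max_j x_j$, so that every exponent is at most 0. The sketch below shows the trick in plain Python, assuming finite inputs; it is an illustration of the technique, not SciPy's actual implementation, and `logsumexp_sketch` is a hypothetical name:

```python
import math

def logsumexp_sketch(xs):
    """log(sum(exp(x))) computed by shifting with the maximum (finite inputs assumed)."""
    m = max(xs)
    # After the shift the largest exponent is exactly 0, so exp cannot overflow.
    return m + math.log(sum(math.exp(x - m) for x in xs))

# math.exp(1000) alone would raise OverflowError; the shifted form is fine.
result = logsumexp_sketch([1000.0, 1000.0])
print(result)

# Log-probabilities as in the formula above: log(pi_i) = x_i - logsumexp(x).
scores = [1000.0, 1001.0, 1002.0]
log_pi = [s - logsumexp_sketch(scores) for s in scores]
total_prob = sum(math.exp(lp) for lp in log_pi)
print(total_prob)
```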

### Other useful numerically stable functions 

**log1p and expm1**

In [None]:
np.exp(np.log(1 + 1e-6)) - 1

In [None]:
np.expm1(np.log1p(1e-6))
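
The difference becomes stark for arguments much smaller than machine epsilon, where `1 + x` rounds to exactly 1. A sketch using the standard library equivalents `math.log1p` and `math.expm1`, with an illustrative value of `x`:

```python
import math

x = 1e-20

# 1 + x rounds to 1.0, so all information about x is lost.
naive_log = math.log(1 + x)
naive_exp = math.exp(x) - 1

# log1p and expm1 work with x directly and keep full precision.
stable_log = math.log1p(x)
stable_exp = math.expm1(x)

print(naive_log, stable_log)
print(naive_exp, stable_exp)
```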

**sinc**

In [None]:
x = 1

In [None]:
np.sin(x)/x

In [None]:
np.sinc(x/np.pi)  # np.sinc is normalized: np.sinc(t) = sin(pi*t)/(pi*t)

In [None]:
x = np.linspace(0.01, 2*np.pi, 100)

In [None]:
plt.plot(x, np.sinc(x/np.pi), label='Library function')  # np.sinc(t) = sin(pi*t)/(pi*t)
plt.plot(x, np.sin(x)/x, label='DIY function')
plt.legend()
pass
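
Two things are worth keeping in mind here. First, `np.sinc` is the normalized sinc, $\sin(\pi x)/(\pi x)$, so it matches $\sin(x)/x$ only after rescaling the argument by $\pi$. Second, the library function handles $x = 0$, where the naive quotient is 0/0. The plain-Python sketch below illustrates both points; the function names are hypothetical:

```python
import math

def sinc_unnormalized(x):
    """sin(x)/x with the removable singularity at x = 0 handled explicitly."""
    if x == 0:
        return 1.0
    return math.sin(x) / x

def sinc_normalized(x):
    """sin(pi*x)/(pi*x), the convention used by np.sinc."""
    return sinc_unnormalized(math.pi * x)

at_zero = sinc_unnormalized(0.0)
print(at_zero)

# The two conventions agree once the argument is rescaled by pi.
t = 1.0
print(sinc_unnormalized(t), sinc_normalized(t / math.pi))
```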