# Effect of changing numerical precision

In this notebook we look at the effect of changing the precision of floating-point numbers from 64-bit to 32-bit and 16-bit.

In [15]:
import numpy as np

from IPython.display import Markdown, display

Define a helper function that renders a string as formatted Markdown instead of plain printed text.

In [16]:
def printmd(string):
    display(Markdown(string))

# Create the arrays
Create the random 64-bit 2D array

In [17]:
np.random.seed(3)
N = 10000
arr64 = np.random.standard_normal((N,N))

Create the 32-bit and 16-bit arrays

In [18]:
arr32 = arr64.astype(np.float32)
arr16 = arr64.astype(np.float16)
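To get a feel for what is lost in each conversion, `np.finfo` reports the properties of each floating-point type. The snippet below is a small sketch that is independent of the large arrays above; the `limits` dictionary is a name introduced here for illustration only.

```python
import numpy as np

# Collect the basic limits of each floating-point type.
# `limits` is a hypothetical helper dictionary, not part of the notebook.
limits = {}
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    limits[info.bits] = {
        "eps": float(info.eps),        # gap between 1.0 and the next larger value
        "max": float(info.max),        # largest finite value
        "digits": info.precision,      # approximate number of reliable decimal digits
    }

for bits, props in limits.items():
    print(f"{bits}-bit: eps={props['eps']:.3e}, max={props['max']:.3e}, "
          f"~{props['digits']} decimal digits")
```

The 16-bit type in particular has a small range as well as low precision, so values outside roughly ±65504 overflow to infinity when cast to `np.float16`.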

Check the size of the arrays in memory

In [19]:
printmd(f"Size of **64-bit** array: {arr64.nbytes/1e6} MB")
printmd(f"Size of **32-bit** array: {arr32.nbytes/1e6} MB")
printmd(f"Size of **16-bit** array: {arr16.nbytes/1e6} MB")

Size of **64-bit** array: 800.0 MB

Size of **32-bit** array: 400.0 MB

Size of **16-bit** array: 200.0 MB
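These sizes follow directly from the number of elements times the bytes per element. As a quick check that does not allocate any large array, the sketch below recomputes them from the dtype's `itemsize`; `expected_mb` is a name introduced here for illustration.

```python
import numpy as np

N = 10000  # same side length as the arrays above

# Expected size in megabytes (1 MB = 1e6 bytes) for an N x N array of each dtype.
expected_mb = {}
for dtype in (np.float64, np.float32, np.float16):
    itemsize = np.dtype(dtype).itemsize  # bytes per element: 8, 4, 2
    expected_mb[np.dtype(dtype).name] = N * N * itemsize / 1e6

print(expected_mb)
```

Halving the number of bits halves the memory footprint, which is often the main reason to consider lower precision.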

# Changes in values with changes in precision

When changing precision we need to understand the effect it has on our results. We will use the `np.testing.assert_allclose` function to do so.

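We first confirm that the 64-bit and 32-bit arrays agree to 6 decimal places, that is, to within an absolute tolerance of `1e-6`. The call passes silently, so every element is within tolerance.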

In [20]:
np.testing.assert_allclose(actual=arr32,desired=arr64,atol=1e-6,rtol=0)

However, the `AssertionError` below shows that the 64-bit and 32-bit arrays are **not** equal to 7 decimal places for every element: about 0.7% of the elements differ by more than `1e-7`.

In [21]:
np.testing.assert_allclose(actual=arr32,desired=arr64,atol=1e-7,rtol=0)

AssertionError: 
Not equal to tolerance rtol=0, atol=1e-07

Mismatched elements: 736084 / 100000000 (0.736%)
Max absolute difference: 2.38294763e-07
Max relative difference: 5.95987588e-08
 x: array([[ 1.788628,  0.43651 ,  0.096497, ...,  0.742021, -0.455719,
         0.422186],
       [-0.041542, -1.826522, -0.844802, ..., -0.381443,  0.552564,...
 y: array([[ 1.788628,  0.43651 ,  0.096497, ...,  0.742021, -0.455719,
         0.422186],
       [-0.041542, -1.826522, -0.844802, ..., -0.381443,  0.552564,...
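The maximum relative difference reported above, about `5.96e-08`, is close to half the machine epsilon of `float32` (`2**-24`). That is the worst-case relative rounding error when a normal-range value is rounded to the nearest 32-bit float, so a relative tolerance is often a more natural test than an absolute one. The sketch below illustrates this on a smaller array; the seed, the array shape, and the names `small64`, `small32`, `rel_err` and `half_eps` are assumptions made for this example and are not part of the notebook.

```python
import numpy as np

# A smaller array than in the notebook so the example runs quickly.
rng = np.random.default_rng(3)
small64 = rng.standard_normal((1000, 1000))
small32 = small64.astype(np.float32)

# Relative rounding error introduced by the cast, computed in 64-bit.
rel_err = np.abs(small32.astype(np.float64) - small64) / np.abs(small64)

# Half of float32 machine epsilon: 2**-24, roughly 5.96e-08.
half_eps = float(np.finfo(np.float32).eps) / 2
print(f"max relative error: {rel_err.max():.3e}, bound: {half_eps:.3e}")

# A relative tolerance of one full epsilon passes for every element.
np.testing.assert_allclose(actual=small32, desired=small64,
                           rtol=float(np.finfo(np.float32).eps), atol=0)
```

With an absolute tolerance, larger values fail first because the spacing between adjacent 32-bit floats grows with magnitude.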

# Timing operations

## Mean along an axis
How long does it take to compute the mean along the first axis for each array?

In [22]:
printmd('**64-bit**')
%timeit -n 1 arr64.mean(axis=0)
printmd('**32-bit**')
%timeit -n 1 arr32.mean(axis=0)
printmd('**16-bit**')
%timeit -n 1 arr16.mean(axis=0)

**64-bit**

101 ms ± 34.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**32-bit**

42 ms ± 3.31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**16-bit**

The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached.
354 ms ± 256 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


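We see that the operation on the 32-bit array is roughly twice as fast as on the 64-bit array, while the 16-bit array is much slower than both. The likely reason is that most CPUs have no native 16-bit floating-point arithmetic, so each value has to be converted to a wider type before it can be processed. Note also that the 16-bit timings vary a lot between runs, so the exact figure should be treated with caution.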
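One possible way to keep the memory savings of 16-bit storage while doing the arithmetic in a wider type is to upcast explicitly before the computation. The sketch below compares the two approaches on a small array using the standard `timeit` module; the array shape, the seed and all variable names are assumptions for this example, and which variant is faster depends on the hardware and NumPy build, so no timings are claimed here.

```python
import timeit

import numpy as np

# Small 16-bit array for illustration (not the notebook's 10000 x 10000 array).
rng = np.random.default_rng(3)
small16 = rng.standard_normal((500, 500)).astype(np.float16)

# Mean computed directly on the 16-bit array: the result is 16-bit.
mean16 = small16.mean(axis=0)

# Mean computed after an explicit upcast: the result is 32-bit.
mean_up = small16.astype(np.float32).mean(axis=0)

# Time both variants; the numbers are machine dependent.
t16 = timeit.timeit(lambda: small16.mean(axis=0), number=10)
t_up = timeit.timeit(lambda: small16.astype(np.float32).mean(axis=0), number=10)
print(f"direct float16 mean: {t16 / 10 * 1e3:.3f} ms per call")
print(f"upcast then mean:    {t_up / 10 * 1e3:.3f} ms per call")
```

The upcast creates a temporary 32-bit copy, so it trades extra memory for arithmetic in a type the CPU handles natively.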

## Element-wise square

Let's try a more demanding computation: squaring every element of the array.

In [14]:
printmd('**64-bit**')
%timeit -n 1 arr64**2
printmd('**32-bit**')
%timeit -n 1 arr32**2
printmd('**16-bit**')
%timeit -n 1 arr16**2

**64-bit**

493 ms ± 24.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**32-bit**

248 ms ± 8.27 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


**16-bit**

840 ms ± 120 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Again we see that the operation on the 32-bit array is about twice as fast as on the 64-bit array, while the 16-bit array is the slowest of the three.
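Note that `arr**2` squares each element individually; it is not a matrix product, which NumPy writes as `arr @ arr`. The tiny example below, using a hypothetical 2x2 array `a`, shows the difference.

```python
import numpy as np

# Hypothetical 2x2 array used only to contrast the two operations.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]], dtype=np.float32)

elementwise = a ** 2   # squares each entry independently
matrix_prod = a @ a    # true matrix multiplication

print(elementwise)
print(matrix_prod)
```

For an N x N array the element-wise square costs on the order of N² operations, whereas a matrix product costs on the order of N³, so timings for the two are not comparable.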