# NumPy Exercises

Tamás Gál (tamas.gal@fau.de)

The latest version of this notebook is available at [https://github.com/Asterics2020-Obelics](https://github.com/Asterics2020-Obelics/School2017/tree/master/numpy)

**Warning**: This notebook contains all the solutions. If you are currently sitting in the `NumPy` lecture, close it immediately ;-) You will work in a blank notebook and don't need anything else!

In [1]:
import numpy as np
import numba as nb
import sys

print("Python version: {0}\n"
      "NumPy version: {1}\n"
      "numba version: {2}"
      .format(sys.version, np.__version__, nb.__version__))

Python version: 3.6.5 (default, Jun  1 2018, 14:48:24) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)]
NumPy version: 1.14.3
numba version: 0.38.1


In [2]:
def describe(np_obj):
    """Print some information about a NumPy object"""
    print("object type: {0}\n"
          "size: {o.size}\n"
          "ndim: {o.ndim}\n"
          "shape: {o.shape}\n"
          "dtype: {o.dtype}"
          .format(type(np_obj), o=np_obj))

In [3]:
from IPython.core.magic import register_line_magic

@register_line_magic
def shorterr(line):
    """Show only the exception message if one is raised."""
    try:
        output = eval(line)
    except Exception as e:
        print("\x1b[31m\x1b[1m{e.__class__.__name__}: {e}\x1b[0m".format(e=e))
    else:
        return output
    
del shorterr

## Exercise 1: Create a 5x5 matrix with 5's on its diagonal

```
5 0 0 0 0
0 5 0 0 0
0 0 5 0 0
0 0 0 5 0
0 0 0 0 5
```

### Solution: `np.eye()`

In [4]:
np.eye(5) * 5

array([[5., 0., 0., 0., 0.],
       [0., 5., 0., 0., 0.],
       [0., 0., 5., 0., 0.],
       [0., 0., 0., 5., 0.],
       [0., 0., 0., 0., 5.]])

### Alternative solutions and further discussions

In [5]:
%timeit np.eye(500) * 5

614 µs ± 35.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [6]:
%%timeit
a = np.eye(500)
a  *= 5

824 µs ± 120 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [7]:
%%timeit
a = np.eye(500)
np.multiply(a, 5, out=a)  # avoid creating a copy 

758 µs ± 33.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
%%timeit
a = np.zeros((500, 500))
# faster on large arrays, no unnecessary multiplications
a[np.diag_indices_from(a)] = 5

514 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [9]:
%timeit np.diag(np.ones(500) * 5)

424 µs ± 9.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
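
Two further variants, sketched here as an addition rather than as part of the original solutions: `np.fill_diagonal` overwrites the main diagonal in place, and `np.diag` builds a diagonal matrix from a 1-D array created with `np.full`. Both should give the same matrix as `np.eye(5) * 5`.

```python
import numpy as np

# In-place variant: start from zeros and overwrite the main diagonal.
a = np.zeros((5, 5))
np.fill_diagonal(a, 5)

# np.diag turns a 1-D array into a diagonal matrix.
b = np.diag(np.full(5, 5.0))

print(np.array_equal(a, np.eye(5) * 5))
print(np.array_equal(b, np.eye(5) * 5))
```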


## Exercise 2: Create a random array with 10 elements and replace its largest value with 0

### Solution:

In [10]:
a = np.random.random(10)
a

array([0.48284918, 0.11250542, 0.19689225, 0.26043274, 0.62960183,
       0.69848169, 0.49587437, 0.86417022, 0.41593701, 0.13620717])

In [11]:
np.argmax(a)  # gives the index of the maximum
a[np.argmax(a)] = 0
a

array([0.48284918, 0.11250542, 0.19689225, 0.26043274, 0.62960183,
       0.69848169, 0.49587437, 0.        , 0.41593701, 0.13620717])
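
`np.argmax` returns the index of the first maximum only. With random floats a tie is very unlikely, but if the largest value can occur more than once, a boolean mask replaces all of them. A small sketch with a hand-made array (the values are chosen for illustration):

```python
import numpy as np

a = np.array([0.2, 0.9, 0.5, 0.9])

# argmax points at the first occurrence of the maximum (index 1).
first_only = a.copy()
first_only[np.argmax(first_only)] = 0

# A boolean mask selects every element equal to the maximum.
all_maxima = a.copy()
all_maxima[all_maxima == all_maxima.max()] = 0

print(first_only)
print(all_maxima)
```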

## Exercise 3: Create the following array

    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5
    1 2 3 4 5


### Solution:

In [12]:
np.ones((5, 5)) * np.arange(1, 6)

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

In [13]:
np.ones(5)[:, np.newaxis] * np.arange(1, 6)

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

### Alternative solutions and further discussions

In [14]:
%timeit np.ones((500, 5)) * np.arange(1, 6)

14.7 µs ± 320 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [15]:
%%timeit
a = np.ones((500, 5))
np.multiply(a, np.arange(1, 6), out=a)

12.9 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [16]:
%timeit np.ones(500)[:, np.newaxis] * np.arange(1, 6)

18.6 µs ± 529 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [17]:
%%timeit
a = np.empty((500, 5))
a[:] = np.arange(1, 6)

6.07 µs ± 269 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [18]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [19]:
np.ones(5)[:, np.newaxis]  # adds a new dimension

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])

In [20]:
np.ones(5)[:, np.newaxis].shape

(5, 1)

In [21]:
np.arange(1, 6).shape

(5,)

In [22]:
# broadcasting will turn (5, 1) and (5,) into (5, 5)
(np.ones(5)[:, np.newaxis] * np.arange(1, 6)).shape

(5, 5)
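
The broadcasting step can also be inspected without doing the multiplication. The sketch below uses `np.broadcast` to ask for the resulting shape, and shows two multiplication-free ways to build the same array (`np.tile` copies the row, `np.broadcast_to` returns a read-only view). The variable names are only illustrative.

```python
import numpy as np

column = np.ones(5)[:, np.newaxis]   # shape (5, 1)
row = np.arange(1, 6)                # shape (5,), treated as (1, 5)

# np.broadcast reports the shape of the broadcast result.
print(np.broadcast(column, row).shape)

# The same array without any multiplication.
tiled = np.tile(row, (5, 1))            # repeats the row five times
viewed = np.broadcast_to(row, (5, 5))   # read-only view on the row
```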

## Exercise 4: Create a checkerboard (8x8, 0s and 1s)

    0 1 0 1 0 1 0 1
    1 0 1 0 1 0 1 0
    0 1 0 1 0 1 0 1
    1 0 1 0 1 0 1 0
    0 1 0 1 0 1 0 1
    1 0 1 0 1 0 1 0
    0 1 0 1 0 1 0 1
    1 0 1 0 1 0 1 0

### Solution:

In [23]:
checkerboard = np.zeros((8, 8), dtype='i')
checkerboard[::2, 1::2] = 1
checkerboard[1::2, ::2] = 1
checkerboard

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1, 0, 1, 0]], dtype=int32)
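
Two alternative sketches for the same board, added here for comparison: the first uses the fact that a square holds a 1 exactly when row index plus column index is odd, the second tiles a 2x2 seed pattern.

```python
import numpy as np

# Row index + column index is odd exactly on the 1-squares.
rows, cols = np.indices((8, 8))
board_a = (rows + cols) % 2

# Repeat a 2x2 seed four times in each direction.
board_b = np.tile(np.array([[0, 1], [1, 0]]), (4, 4))

print(np.array_equal(board_a, board_b))
```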

## Exercise 5: Extract the integer part of a random sample

    np.random.uniform(0, 10, 10)
    
e.g. `[23.5, 42.0, 500.3, 123.9] -> [23, 42, 500, 123]`

### Solution:

In [24]:
a = np.random.uniform(0, 10, 10)
a

array([9.63861756, 7.8662533 , 1.28692909, 0.6762453 , 4.40373238,
       8.44661195, 9.48386577, 4.16654032, 5.59141099, 1.69264134])

In [25]:
a - a%1

array([9., 7., 1., 0., 4., 8., 9., 4., 5., 1.])

In [26]:
np.floor(a)

array([9., 7., 1., 0., 4., 8., 9., 4., 5., 1.])

In [27]:
np.ceil(a) - 1  # off by one if a value is already an integer (42.0 -> 41.0)

array([9., 7., 1., 0., 4., 8., 9., 4., 5., 1.])

In [28]:
np.trunc(a)

array([9., 7., 1., 0., 4., 8., 9., 4., 5., 1.])

In [29]:
a.astype(int)

array([9, 7, 1, 0, 4, 8, 9, 4, 5, 1])

### Further discussions

In [30]:
a = np.random.uniform(0, 10, 10000)

In [31]:
%timeit a - a%1

216 µs ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [32]:
%timeit np.floor(a)

44.2 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [33]:
%timeit np.ceil(a) - 1

48.6 µs ± 1.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [34]:
%timeit np.trunc(a)

47.4 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [35]:
%timeit a.astype(int)  # the winner -> casting

6.37 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
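
All five variants agree on the sample above because every value is positive and none is an exact integer. The following sketch uses a hand-picked array (an assumption for illustration, not part of the exercise) to show where they differ: negative values and values that are already whole numbers.

```python
import numpy as np

a = np.array([-1.5, -0.5, 0.5, 1.5, 42.0])

floored = np.floor(a)        # rounds towards minus infinity
truncated = np.trunc(a)      # rounds towards zero
cast = a.astype(int)         # casting also truncates towards zero
ceil_minus_1 = np.ceil(a) - 1  # wrong for exact integers such as 42.0
modulo = a - a % 1           # behaves like floor, % takes the sign of the divisor

for name, result in [("floor", floored), ("trunc", truncated),
                     ("astype", cast), ("ceil - 1", ceil_minus_1),
                     ("a - a%1", modulo)]:
    print(name, result)
```

So "integer part" in the sense of the example in the exercise text corresponds to `np.trunc` or `astype(int)` once negative numbers are involved.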


## Exercise 6: Create an array with 10 equidistant numbers between 0 and 1, excluding 0 and 1

### Solution:

In [36]:
a = np.linspace(0, 1, 11, endpoint=False)[1:]
a

array([0.09090909, 0.18181818, 0.27272727, 0.36363636, 0.45454545,
       0.54545455, 0.63636364, 0.72727273, 0.81818182, 0.90909091])
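
Assuming "between 0 and 1" means the open interval, another possible sketch asks `np.linspace` for 12 points including both endpoints and then drops the first and the last one. This leaves 10 points with a spacing of 1/11.

```python
import numpy as np

# 12 equidistant points on [0, 1]; slicing removes the endpoints 0 and 1.
a = np.linspace(0, 1, 12)[1:-1]

print(len(a), a.min(), a.max())
```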

## Exercise 7: Find the value closest to a given number in an array

    a = np.random.random(10)
    target = 0.23

### Solution:

In [37]:
a = np.random.random(10)
target = 0.23
a

array([0.22060801, 0.96968938, 0.85229627, 0.34927312, 0.25208736,
       0.20241369, 0.04235003, 0.0491631 , 0.19640968, 0.3720309 ])

In [38]:
a[np.argmin(np.abs(a - target))]

0.22060800963025007
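
If both the position and the value are needed, the one-liner can be wrapped in a small helper. `find_nearest` is a hypothetical name chosen for this sketch, not a NumPy function.

```python
import numpy as np

def find_nearest(values, target):
    """Return (index, value) of the element closest to target."""
    values = np.asarray(values)
    # The smallest absolute difference marks the closest element.
    idx = np.argmin(np.abs(values - target))
    return idx, values[idx]

idx, value = find_nearest([0.1, 0.25, 0.9], 0.23)
print(idx, value)
```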

## Exercise 8: Multiply two arrays elementwise

    a = np.random.random(1234567)
    b = np.random.random(1234567)

### Solution:

In [39]:
a = np.random.random(1234567)
b = np.random.random(1234567)

In [40]:
%time a*b

CPU times: user 4.91 ms, sys: 5.41 ms, total: 10.3 ms
Wall time: 8.31 ms


array([0.48286692, 0.00889187, 0.08523639, ..., 0.39076243, 0.74744327,
       0.22063116])

In [41]:
%time np.multiply(a, b)

CPU times: user 5.48 ms, sys: 6.08 ms, total: 11.6 ms
Wall time: 9.5 ms


array([0.48286692, 0.00889187, 0.08523639, ..., 0.39076243, 0.74744327,
       0.22063116])

In [42]:
%time np.multiply(a, b, out=a)

CPU times: user 5.44 ms, sys: 976 µs, total: 6.41 ms
Wall time: 5.04 ms


array([0.48286692, 0.00889187, 0.08523639, ..., 0.39076243, 0.74744327,
       0.22063116])
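
Keep in mind that `out=a` overwrites `a`: the saved time comes from not allocating a new result array, at the price of losing the original values. A minimal sketch with small arrays (values chosen for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

product = a * b                     # allocates a new array, a is untouched
result = np.multiply(a, b, out=a)   # writes the product into a's memory

print(result is a)   # the returned array is the very same object as a
print(a)
```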

## Exercise 9: Calculate the cosine of 12,345,678 elements

    a = np.random.random(12345678)

### Solution:

In [43]:
a = np.random.random(12345678)

In [44]:
%time np.cos(a)

CPU times: user 161 ms, sys: 38.7 ms, total: 199 ms
Wall time: 198 ms


array([0.97031855, 0.98885946, 0.72625018, ..., 0.90879397, 0.87079935,
       0.99971884])

In [45]:
%time np.cos(a, out=a)

CPU times: user 116 ms, sys: 1.66 ms, total: 118 ms
Wall time: 116 ms


array([0.97031855, 0.98885946, 0.72625018, ..., 0.90879397, 0.87079935,
       0.99971884])

## Exercise 10: Calculate the following for two arrays of length 1,234,567:

    a = np.random.random(1234567)
    b = np.random.random(1234567)
    
$$
c_i = \tan(a_i) \cdot b_i - a_i^{b_i}
$$

for $i \in [0, 1234566]$

### Solution:

In [46]:
a = np.random.random(1234567)
b = np.random.random(1234567)

In [47]:
def f(a, b):
    return np.tan(a) * b - a**b

In [48]:
%time f(a, b)

CPU times: user 59 ms, sys: 7.18 ms, total: 66.2 ms
Wall time: 64.5 ms


array([-0.90930626, -0.08877409, -0.97185353, ..., -0.29577132,
       -0.00388191, -0.14719914])

### What about a Python loop?

    a = np.random.random(1234567)
    b = np.random.random(1234567)
    
$$
c_i = \tan(a_i) \cdot b_i - a_i^{b_i}
$$

for $i \in [0, 1234566]$

In [49]:
def silly_func(a, b):
    c = np.empty_like(a)
    for i in range(len(a)):
        c[i] = np.tan(a[i]) * b[i] - np.power(a[i], b[i])
    return c

In [50]:
%time silly_func(a, b)

CPU times: user 6.04 s, sys: 14.7 ms, total: 6.06 s
Wall time: 6.07 s


array([-0.90930626, -0.08877409, -0.97185353, ..., -0.29577132,
       -0.00388191, -0.14719914])
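
Outside of IPython (where `%time` is not available), the comparison can be sketched with `time.perf_counter`. The snippet below uses much smaller arrays so that the loop finishes quickly; `loop_f` is just an illustrative name, and the measured times will of course depend on the machine.

```python
import time
import numpy as np

def f(a, b):
    # Vectorised: NumPy loops over the elements in compiled code.
    return np.tan(a) * b - a**b

def loop_f(a, b):
    # Explicit Python loop: one interpreter round trip per element.
    c = np.empty_like(a)
    for i in range(len(a)):
        c[i] = np.tan(a[i]) * b[i] - a[i] ** b[i]
    return c

a = np.random.random(10000)
b = np.random.random(10000)

start = time.perf_counter()
vectorised = f(a, b)
t_vec = time.perf_counter() - start

start = time.perf_counter()
looped = loop_f(a, b)
t_loop = time.perf_counter() - start

print("same result:", np.allclose(vectorised, looped))
print("vectorised: {:.6f} s, loop: {:.6f} s".format(t_vec, t_loop))
```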

### Let's JIT it with `numba`!

In [51]:
@nb.jit
def silly_func(a, b):
    c = np.empty_like(a)
    for i in range(len(a)):
        c[i] = np.tan(a[i]) * b[i] - np.power(a[i], b[i])
    return c

In [52]:
%time silly_func(a, b)  # first execution includes the compilation!

CPU times: user 286 ms, sys: 36.2 ms, total: 322 ms
Wall time: 473 ms


array([-0.90930626, -0.08877409, -0.97185353, ..., -0.29577132,
       -0.00388191, -0.14719914])

In [53]:
%time silly_func(a, b)  # the second is pure LLVM optimised code

CPU times: user 53.6 ms, sys: 5.52 ms, total: 59.2 ms
Wall time: 61 ms


array([-0.90930626, -0.08877409, -0.97185353, ..., -0.29577132,
       -0.00388191, -0.14719914])

In [54]:
@nb.jit
def silly_func_mutating_a(a, b):
    for i in range(len(a)):
        a[i] = np.tan(a[i]) * b[i] - np.power(a[i], b[i])

In [55]:
%time silly_func_mutating_a(a, b);  # first execution includes the compilation!

CPU times: user 148 ms, sys: 3.95 ms, total: 152 ms
Wall time: 153 ms


In [56]:
%time silly_func_mutating_a(a, b);  # the second is pure LLVM optimised code

CPU times: user 27.9 ms, sys: 456 µs, total: 28.4 ms
Wall time: 28.9 ms


Summary (each run once on my 2017 MacBook Air with a 2.2 GHz i7; the numbers may differ from the notebook outputs above):
- **~60ms** (numpy)
- **~7000ms** (Python)
- **~160ms** (numba, inc. JIT comp.)
- **~50ms** (numba, JIT)
- **~140ms** (reusing `a`, numba, inc. JIT comp.)
- **~25ms** (reusing `a`, numba, JIT)

## Exercise 11: Given two arrays `a` and `b`, check if they are equal

    a = np.random.random(1234567)
    b = a.copy()

    b[-1] = 23  # artificially make them differ at the very end ;)

In [57]:
a = np.random.random(1234567)
b = a.copy()

b[-1] = 23  # artificially make them differ at the very end ;)

### Solution:

In [58]:
%time np.allclose(a, b)

CPU times: user 16.9 ms, sys: 9.01 ms, total: 25.9 ms
Wall time: 25.7 ms


False
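
Note that `np.allclose` checks equality within a tolerance (`rtol=1e-05` and `atol=1e-08` by default), while `np.array_equal` compares exactly. Which one is the right notion of "equal" depends on the use case. A small sketch with hand-made values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9   # differs from a only by a tiny amount

exact = np.array_equal(a, b)   # element-wise ==, no tolerance
close = np.allclose(a, b)      # |a - b| <= atol + rtol * |b|

print(exact, close)
```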

### Using numba?

In [59]:
@nb.jit
def allclose(a, b):
    for i in range(len(a)):
        if np.abs(a[i] - b[i]) > 0.0001:
            return False
    return True

In [60]:
%time allclose(a, b)

CPU times: user 71.1 ms, sys: 3.18 ms, total: 74.3 ms
Wall time: 74.9 ms


False

In [61]:
%time allclose(a, b)

CPU times: user 2.32 ms, sys: 60 µs, total: 2.38 ms
Wall time: 2.87 ms


False

## Exercise 12: Make a NumPy array immutable

In [62]:
    a = np.ones(23)

In [63]:
a = np.ones(23)

### Solution:

In [64]:
a.flags.writeable = False
a[5] = 0

ValueError: assignment destination is read-only
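
In a plain script the error can be caught like any other exception. The sketch below also shows that a copy of a read-only array is writeable again, since it owns its own data.

```python
import numpy as np

a = np.ones(23)
a.flags.writeable = False

try:
    a[5] = 0
except ValueError as error:
    print("ValueError:", error)

# A copy gets its own, writeable buffer.
b = a.copy()
b[5] = 0
print(a[5], b[5])
```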

## Exercise 13: Calculate the diagonal of a dot product

In [65]:
A = np.random.random((5, 5))
B = np.random.random((5, 5))

### Solution:

In [66]:
%time np.diag(np.dot(A, B))

CPU times: user 357 µs, sys: 298 µs, total: 655 µs
Wall time: 442 µs


array([1.16435715, 1.13775728, 1.89066125, 0.682437  , 1.59440784])

In [67]:
%time np.sum(A * B.T, axis=1)

CPU times: user 59 µs, sys: 8 µs, total: 67 µs
Wall time: 71 µs


array([1.16435715, 1.13775728, 1.89066125, 0.682437  , 1.59440784])

In [68]:
%time np.einsum("ij,ji->i", A, B)

CPU times: user 49 µs, sys: 19 µs, total: 68 µs
Wall time: 75.1 µs


array([1.16435715, 1.13775728, 1.89066125, 0.682437  , 1.59440784])
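
The three variants compute the same quantity, $d_i = \sum_j A_{ij} B_{ji}$. The first one builds the full 5x5 product and throws most of it away, the other two only compute the 5 diagonal entries. A quick consistency check, sketched with fresh random matrices:

```python
import numpy as np

A = np.random.random((5, 5))
B = np.random.random((5, 5))

full = np.diag(np.dot(A, B))           # all 25 entries computed, 5 kept
rowwise = np.sum(A * B.T, axis=1)      # sum over j of A[i, j] * B[j, i]
einsum = np.einsum("ij,ji->i", A, B)   # the same sum in index notation

print(np.allclose(full, rowwise), np.allclose(full, einsum))
```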

## Exercise 14: Find the most frequent value in an array

In [69]:
a = np.random.randint(0, 23, 123456)
a[:42]

array([22,  4, 18, 21, 16,  0, 10, 16,  2,  0,  0, 12,  6, 20, 21, 12,  8,
        6,  3, 19,  6,  4, 18,  4, 18, 21, 12,  0, 18,  8, 21, 21, 13, 10,
       15, 10, 20,  0,  5, 11, 14,  1])

### Solution:

In [70]:
%time np.bincount(a).argmax()

CPU times: user 619 µs, sys: 128 µs, total: 747 µs
Wall time: 503 µs


1
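
`np.bincount` only accepts non-negative integers. For arrays that may contain negative values, `np.unique` with `return_counts=True` is one possible alternative; the sketch below uses a small hand-made array.

```python
import numpy as np

a = np.array([-3, 7, 7, -3, 7, 2])

# values is sorted, counts[i] is the number of occurrences of values[i].
values, counts = np.unique(a, return_counts=True)
most_frequent = values[np.argmax(counts)]

print(most_frequent)
```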

In [71]:
@nb.jit
def most_frequent(values):
    bins = {}
    for i in range(len(values)):
        v = values[i]
        if v not in bins:
            bins[v] = 1
        else:
            bins[v] += 1
    return max(bins, key=bins.get)

In [72]:
%time most_frequent(a)

CPU times: user 242 ms, sys: 6.62 ms, total: 249 ms
Wall time: 253 ms


1

In [73]:
%time most_frequent(a)  # no real gain: numba can't compile the dict and falls back to object mode

CPU times: user 54.8 ms, sys: 339 µs, total: 55.1 ms
Wall time: 54.8 ms


1

In [74]:
def most_frequent_pure_python(values):
    bins = {}
    for i in range(len(values)):
        v = values[i]
        if v not in bins:
            bins[v] = 1
        else:
            bins[v] += 1
    return max(bins, key=bins.get)

In [75]:
%time most_frequent_pure_python(a)  # as fast as the JITted numba function

CPU times: user 56 ms, sys: 2.61 ms, total: 58.6 ms
Wall time: 56.8 ms


1

## Exercise 15: Roll two 6-sided dice 123456 times and count each individual value

### Solution:

I'll show you an ugly version first and then we proceed...

In [76]:
def roll_dice(n=123456):
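    # caution: randint excludes the upper bound, so these dice only show 1..5;
    # a true 6-sided die needs np.random.randint(1, 7, n)
    dice_1 = np.random.randint(1, 6, n)
    dice_2 = np.random.randint(1, 6, n)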
    sums = dice_1 + dice_2
    return np.unique(sums, return_counts=True)

In [77]:
roll_dice()

(array([ 2,  3,  4,  5,  6,  7,  8,  9, 10]),
 array([ 4922,  9774, 14941, 19862, 24669, 19601, 14832,  9937,  4918]))
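
`np.random.randint(low, high)` excludes `high`, so `randint(1, 6, n)` only draws values from 1 to 5, which is why the sums above stop at 10. A sketch of a variant with a true 6-sided die (`roll_two_dice` is a hypothetical name used here to keep it apart from `roll_dice`):

```python
import numpy as np

def roll_two_dice(n=123456):
    """Roll two 6-sided dice n times and count the sums."""
    dice_1 = np.random.randint(1, 7, n)  # upper bound 7 is exclusive
    dice_2 = np.random.randint(1, 7, n)
    return np.unique(dice_1 + dice_2, return_counts=True)

values, counts = roll_two_dice()
print(values)
print(counts)
```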

## Exercise 16: Roll five 12-sided dice 123456 times and count each individual value

If you did it right, you now only need to change 2 parameters of your previous code ;)

### Solution:

In [78]:
def roll_dice(n_rolls, n_sides, n_die):
    rolls = np.sum(np.random.randint(1, n_sides+1, n_rolls*n_die)
                   .reshape(n_die, n_rolls), axis=0)
    return np.unique(rolls, return_counts=True)

In [79]:
roll_dice(123456, 12, 5)

(array([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
        23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
        40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
        57, 58, 59]),
 array([   1,    9,   17,   31,   80,  123,  173,  253,  383,  495,  701,
         913, 1093, 1490, 1877, 2190, 2641, 3132, 3599, 3944, 4477, 4964,
        5442, 5712, 5826, 6087, 6208, 6223, 6063, 5907, 5610, 5301, 5064,
        4513, 4019, 3413, 3151, 2647, 2220, 1790, 1407, 1181,  931,  660,
         521,  346,  235,  166,  112,   56,   34,   16,    8,    1]))
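
Instead of drawing a flat sample and reshaping it, `randint` can also produce the 2-D sample directly through its `size` argument. The following is a sketch of that variant; `roll_dice_2d` is a hypothetical name and the function is meant as an illustration, not a replacement for the solution above.

```python
import numpy as np

def roll_dice_2d(n_rolls, n_sides, n_dice):
    """Roll n_dice dice with n_sides sides n_rolls times and count the sums."""
    # One row per roll, one column per die; n_sides + 1 is exclusive.
    faces = np.random.randint(1, n_sides + 1, size=(n_rolls, n_dice))
    return np.unique(faces.sum(axis=1), return_counts=True)

values, counts = roll_dice_2d(123456, 12, 5)
print(values.min(), values.max(), counts.sum())
```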

## Acknowledgements
![](images/eu_asterics.png)

This tutorial was supported by the H2020-Astronomy ESFRI and Research Infrastructure Cluster (Grant Agreement number: 653477).