# Composability demo
This notebook assumes that MKL-enabled NumPy, the TBB and SMP modules, and their IPython kernels are already installed, for example:
```
conda install -c intel mkl numpy tbb4py smp
jupyter kernelspec install python-tbb/ --sys-prefix
jupyter kernelspec install python-smp/ --sys-prefix
```

Switch kernels from the Jupyter menu (e.g. Kernel->Change kernel->Python 3/TBB), then re-run the first cell below before repeating the timing cell:

In [1]:
import numpy as np
from multiprocessing.pool import ThreadPool
pool = ThreadPool(10)
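
For readers who prefer to run the measurement outside Jupyter, the setup cell above and the timed call used throughout this notebook can be combined into a small script. This is only a sketch: `run_batch` and `time_batch` are hypothetical helper names, and it reports the best of several runs rather than the mean ± std. dev. that `%timeit` prints. Launching it as `python -m tbb script.py` or `python -m smp script.py` to mirror the kernels is an assumption based on the kernel names, so check the module documentation for the exact invocation.

```python
import timeit
from multiprocessing.pool import ThreadPool

import numpy as np


def run_batch(pool, n_tasks=10, size=256):
    """Run QR on n_tasks random size x size matrices through the thread pool."""
    matrices = [np.random.random((size, size)) for _ in range(n_tasks)]
    return pool.map(np.linalg.qr, matrices)


def time_batch(n_threads=10, repeat=7, number=1):
    """Return the best wall-clock time in seconds for one run_batch call.

    Note: %timeit reports mean and standard deviation; this helper
    (a hypothetical stand-in) reports the minimum instead.
    """
    with ThreadPool(n_threads) as pool:
        timings = timeit.repeat(lambda: run_batch(pool), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    print(f"best of 7: {time_batch() * 1e3:.1f} ms")
```

As in the notebook cells, the matrix generation is included in the timed region.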

#### Default Python kernel

In [2]:
%timeit pool.map(np.linalg.qr, [np.random.random((256, 256)) for i in range(10)])

341 ms ± 63.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Python -m SMP kernel

In [2]:
%timeit pool.map(np.linalg.qr, [np.random.random((256, 256)) for i in range(10)])

19.5 ms ± 59.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


#### Python -m TBB kernel

In [2]:
%timeit pool.map(np.linalg.qr, [np.random.random((256, 256)) for i in range(10)])

17.2 ms ± 581 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
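
To put the three results side by side, the mean timings reported above can be turned into speedup factors relative to the default kernel. The snippet below is a small arithmetic sketch; `speedups` is a hypothetical helper, and the numbers are copied by hand from the outputs above.

```python
# Mean timings reported by %timeit above, in milliseconds.
timings_ms = {"default": 341.0, "smp": 19.5, "tbb": 17.2}


def speedups(timings, baseline="default"):
    """Return how many times faster each entry is than the baseline."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items() if name != baseline}


for name, factor in speedups(timings_ms).items():
    print(f"{name}: {factor:.1f}x faster than default")
```

On these figures the SMP and TBB kernels come out roughly 17x and 20x faster than the default kernel. The default-kernel run has a standard deviation of 63.8 ms, so the exact factors should be read as approximate.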


#### Machine specification
The difference is only observable on a machine with enough cores. The timings above were measured on the following 2-socket system with 88 logical CPUs:

In [4]:
!lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                88
On-line CPU(s) list:   0-87
Thread(s) per core:    2
Core(s) per socket:    22
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1441.343
BogoMIPS:              4395.56
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              56320K
NUMA node0 CPU(s):     0-21,44-65
NUMA node1 CPU(s):     22-43,66-87
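
The logical CPU count follows directly from the `lscpu` fields above. The sketch below repeats that arithmetic and prints the count for the current machine for comparison; the variable names are illustrative, and the values are copied by hand from the output.

```python
import os

# Values copied from the lscpu output above.
sockets = 2
cores_per_socket = 22
threads_per_core = 2

physical_cores = sockets * cores_per_socket       # 44 physical cores
logical_cpus = physical_cores * threads_per_core  # 88, matching "CPU(s): 88"

print(f"reference machine: {physical_cores} cores, {logical_cpus} logical CPUs")
# os.cpu_count() returns the number of logical CPUs, or None if it cannot be determined.
print(f"this machine:      {os.cpu_count()} logical CPUs")
```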
