# A Reduction Tale

> Objectives:
> * Compare operations taking place in different data containers
> * Compare sizes for these data containers
> * Help decide when one container is a better choice than another

Let's suppose that we are going to need reductions a lot and we want to choose the best container for performing them.  First, let's start by activating our MemWatcher agent:

In [None]:
from ipython_memwatcher import MemWatcher
mw = MemWatcher()
mw.start_watching_memory()

Next, we try different containers for the data that we want to reduce, starting with a list:

## Regular lists

In [None]:
a = [float(i) for i in range(10*1000*1000)]

Now, proceed with a simple reduction (sum):

In [None]:
t = %timeit -o sum(a)

which, in MFLOPS (Mega-FloatingPointOps-Per-Second) is:

In [None]:
print("MFLOPS:", round((len(a) / t.best / 1e6), 1))
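The figure above counts one floating-point addition per element and divides by the best time measured. Outside IPython, the same calculation can be sketched with the standard `timeit` module. The helper name `mflops` and the smaller list size are assumptions made for illustration.

```python
import timeit

def mflops(n_ops, seconds):
    """Millions of floating-point operations per second."""
    return n_ops / seconds / 1e6

# A smaller list than in the notebook, to keep the example quick
data = [float(i) for i in range(1000 * 1000)]

# Best of 5 single runs of sum(), similar to what %timeit reports as `best`
best = min(timeit.repeat(lambda: sum(data), number=1, repeat=5))
print("MFLOPS:", round(mflops(len(data), best), 1))
```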

Ok, so that seems fast, but we have no other reference to compare it with.  In addition, a list is not the best kind of container in terms of space consumption.  So let's now try a container that should be close to optimal in terms of memory use.
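Before moving on, the space cost of a list can be estimated roughly by adding the size of the list object (an array of references) to the size of each float object it points to. This is a sketch that assumes CPython, where `sys.getsizeof` reports per-object sizes; the exact figures depend on the interpreter and platform.

```python
import sys

n = 1000 * 1000  # smaller than the notebook's list, same per-element cost
a_small = [float(i) for i in range(n)]

# The list itself only stores references; each float is a separate object
list_bytes = sys.getsizeof(a_small)
float_bytes = sum(sys.getsizeof(x) for x in a_small)

per_element = (list_bytes + float_bytes) / n
print("bytes/element (approx.):", round(per_element, 1))
print("a raw 64-bit float needs: 8 bytes/element")
```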

## NumPy

In [None]:
import numpy as np

In [None]:
na = np.array(a, dtype=np.float64)

In [None]:
print("SIZE (MB):", round(na.nbytes / 2**20, 3))

We see that, at 8 bytes per element, a NumPy array is a very space-efficient container.

In [None]:
t = %timeit -o sum(na)

In [None]:
print("MFLOPS:", round(len(a) / t.best / 1e6, 3))

### Exercise

NumPy is several times slower here than the same computation on the list.  Why?

*Hint:* We are using `sum()`, which is a Python built-in function.

### Solution
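
One possible answer, as a sketch: the built-in `sum()` iterates over the array element by element, and each element has to be wrapped in a Python object before it is added. NumPy's own reduction, `ndarray.sum()`, runs the loop in compiled code instead. The array below is smaller than the one in the notebook to keep the comparison quick.

```python
import timeit
import numpy as np

na_small = np.arange(1000 * 1000, dtype=np.float64)

# Built-in sum(): a Python-level loop over NumPy scalars
t_builtin = min(timeit.repeat(lambda: sum(na_small), number=1, repeat=3))
# ndarray.sum(): the reduction happens inside NumPy
t_numpy = min(timeit.repeat(lambda: na_small.sum(), number=1, repeat=3))

print("built-in sum():", round(len(na_small) / t_builtin / 1e6, 1), "MFLOPS")
print("ndarray.sum(): ", round(len(na_small) / t_numpy / 1e6, 1), "MFLOPS")
```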

### Exercise

The speed of the above reduction is limited by memory speed, not CPU speed.  Can you give a rough estimate of the maximum memory bandwidth that your laptop supports?

### Solution
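
One rough way to get such an estimate, sketched here under the assumption that `ndarray.sum()` is memory-bound for an array much larger than the CPU caches: divide the number of bytes read by the best time. The result is a lower bound on the achievable bandwidth rather than the hardware maximum.

```python
import timeit
import numpy as np

# 10 million float64 values (80 million bytes), as in the notebook
arr = np.ones(10 * 1000 * 1000, dtype=np.float64)

best = min(timeit.repeat(lambda: arr.sum(), number=1, repeat=5))

# Each element is read once, so the bytes read equal arr.nbytes
gb_per_s = arr.nbytes / best / 1e9
print("Approx. read bandwidth:", round(gb_per_s, 2), "GB/s")
```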

## Using compressed in-memory containers with bcolz

But let us suppose that we have really big data to process on our laptop and want to see whether we can store it in less space.  Enter compression:

In [None]:
import bcolz
bcolz.print_versions()
bcolz.defaults.cparams['cname'] = 'blosclz'
bcolz.defaults.cparams['clevel'] = 9
bcolz.defaults.cparams['shuffle'] = bcolz.SHUFFLE
bcolz.set_nthreads(4)

In [None]:
ca = bcolz.carray(na)

In [None]:
print("mem_used:", mw.measurements.memory_delta)

Ok, this time the amount of memory used seems much lower.  Also, bcolz containers can provide an estimate of how much memory they take; let's have a look:

In [None]:
ca

In this case we see that the bcolz estimate is reasonably close to the `ipython_memwatcher` measurement.  Let's have a look at the speed of the reduction:

In [None]:
t = %timeit -o ca.sum()
print("MFLOPS:", round(len(a) / t.best / 1e6, 3))

This is around 2x to 5x slower (depending on the machine) than a regular NumPy array, but the array is an impressive 76x smaller.  But is compression solely responsible for the overhead?  Let's investigate a bit further.
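
As for the size reduction itself, part of it comes from the shuffle filter enabled above (`bcolz.SHUFFLE`), which groups together the bytes that occupy the same position in every element before the codec sees them. The sketch below illustrates the idea with the standard library only, using `zlib` as a stand-in for Blosc. This is an assumption for illustration, so the ratios it prints will not match the bcolz figures.

```python
import array
import zlib

n = 100 * 1000
values = array.array("d", (float(i) for i in range(n)))  # 8-byte floats
raw = values.tobytes()
itemsize = values.itemsize

# Byte shuffle: put the 1st byte of every element together, then the 2nd, ...
shuffled = b"".join(raw[k::itemsize] for k in range(itemsize))

plain_size = len(zlib.compress(raw, 9))
shuffled_size = len(zlib.compress(shuffled, 9))

print("ratio without shuffle:", round(len(raw) / plain_size, 1))
print("ratio with shuffle:   ", round(len(raw) / shuffled_size, 1))
```

Slowly varying numeric data tends to benefit the most, because the high-order bytes of neighbouring values are often identical.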

## Using uncompressed containers with bcolz

In order to see if this is because of the compression overhead, let's use an uncompressed array:

In [None]:
cau = bcolz.carray(a, cparams=bcolz.cparams(clevel=0))

In [None]:
cau

In [None]:
t = %timeit -o cau.sum()
print("MFLOPS:", round(len(a) / t.best / 1e6, 3))

As we can see, the reduction on an uncompressed `carray` is between 1.5x and 2x faster than on a compressed one.  That does not close the whole gap with NumPy, so compression is not the only source of the overhead.  The other source is the memory layout of the containers (the layout of bcolz's `carray` is a bit more complex than NumPy's).
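
To illustrate the layout point: a `carray` keeps its data in a sequence of separate chunks rather than in one contiguous buffer, so a reduction has to visit the chunks one at a time and combine the partial results. The sketch below mimics that pattern with plain NumPy slices. The function name `chunked_sum` and the chunk length are assumptions for illustration, not bcolz internals.

```python
import numpy as np

def chunked_sum(arr, chunklen=32 * 1024):
    """Sum `arr` chunk by chunk, accumulating the partial results."""
    total = 0.0
    for start in range(0, len(arr), chunklen):
        # Each chunk is reduced on its own, then folded into the total
        total += arr[start:start + chunklen].sum()
    return total

data = np.arange(1000 * 1000, dtype=np.float64)
print("chunked:", chunked_sum(data), "contiguous:", data.sum())
```

The per-chunk bookkeeping adds a small fixed cost for every chunk, which is plausibly one reason a chunked container trails a contiguous array even with compression disabled.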

So, while bcolz lets you use compressed in-memory data containers, this usually comes at a cost in performance (compared with NumPy).  But sometimes you may prefer to keep more data in memory and accept that the computations are going to be slower.

## Exercise

bcolz uses Blosc, a multithreaded meta-compressor, to do the compression under the hood.  Blosc can use different codecs, and each one behaves differently in terms of performance.  Given the following computation:

In [None]:
bcolz.defaults.cparams['cname'] = 'blosclz'
bcolz.defaults.cparams['clevel'] = 9
bcolz.defaults.cparams['shuffle'] = bcolz.SHUFFLE
bcolz.set_nthreads(4)
ca = bcolz.carray(na)
%timeit ca.sum()

Play with the different parameters and see:

1) Which provides the best compression

2) Which gives the fastest speed

3) The combination that strikes a good balance between compression and performance