# A Tale of a Reduction

> Objectives:
> * Compare operations taking place in different data containers
> * Compare sizes for these data containers
> * Help decide when it is best to use one container rather than another

Let's suppose that we need to perform reductions often and want to choose the best container for them.  First, let's start by activating our MemWatcher agent:

In [None]:
from ipython_memwatcher import MemWatcher
mw = MemWatcher()
mw.start_watching_memory()

and choose a different container for the data that we want to reduce, starting with a list:

## Regular lists

In [None]:
a = [i for i in range(1000*1000)]

Now, proceed with a simple reduction (sum):

In [None]:
t = %timeit -o sum(a)

which, expressed in MIPS (used loosely here to mean millions of elements processed per second), is:

In [None]:
print("MIPS:", round(1e6 / t.best / 1e6, 1))
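
Outside IPython, where the `%timeit` magic is not available, roughly the same measurement can be sketched with the standard `timeit` module. The `repeat` and `number` values are arbitrary choices, and the helper name `throughput_meps` is made up for this example:

```python
import timeit

def throughput_meps(func, n_elements, repeat=3, number=5):
    """Return millions of elements processed per second (best of `repeat` runs)."""
    # Each timing covers `number` calls, so divide to get seconds per call.
    timings = timeit.repeat(func, repeat=repeat, number=number)
    best = min(timings) / number
    return n_elements / best / 1e6

a = [i for i in range(1000 * 1000)]
print("MIPS:", round(throughput_meps(lambda: sum(a), len(a)), 1))
```

Taking the minimum mirrors what `t.best` reports: the run least disturbed by other activity on the machine.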

Ok, so that seems fast, but we don't have other references to compare with.  In addition, a list is not the best kind of container in terms of space consumption.  So let's now try a container that looks close to optimal in terms of memory savings.

## Containers using the array module in Python

In [None]:
# Create an array of 'l'ong integers (8 bytes each on 64-bit Linux and macOS)
import array
arr = array.array('l', a)

7.7 MB vs 31 MB seems like a good deal.  In fact, the array module seems to provide optimal containers from a memory consumption point of view:

In [None]:
# Size per element, in bytes (memory_delta is reported in MiB):
(mw.measurements.memory_delta * 2**20) / 1e6
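
The per-element figure can also be cross-checked without the memory watcher. The sketch below uses `array.itemsize` and `sys.getsizeof`; since `sys.getsizeof` on a list counts only the list's own pointer table, the integer objects are added separately. The exact numbers are specific to CPython and to the platform.

```python
import array
import sys

n = 1000 * 1000
a = [i for i in range(n)]
arr = array.array('l', a)

# Payload of the array: one fixed-size C long per element.
array_bytes = arr.itemsize * len(arr)

# A list stores pointers; the int objects they point to live elsewhere,
# so their sizes are added on top of the list's own size.
list_bytes = sys.getsizeof(a) + sum(sys.getsizeof(i) for i in a)

print("array bytes/element:", array_bytes / n)
print("list  bytes/element:", round(list_bytes / n, 1))
```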

But how does it perform during reductions?

In [None]:
t = %timeit -o sum(arr)

In [None]:
print("MIPS:", round(1e6 / t.best / 1e6, 1))

Well, that's a bit disappointing, as the array object performs up to 2x slower than a regular list.  Not sure about the reasons, but probably the array module is not getting much attention performance-wise, mainly because of NumPy's existence.  Speaking of NumPy: here we go!
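
One plausible contributor, offered here as an assumption rather than a measured cause: an `array` stores raw C values, so each element has to be wrapped in a new Python `int` every time it is read, whereas a list already holds ready-made `int` objects. The sketch below shows this behaviour on CPython using object identity:

```python
import array

a = [i for i in range(1000 * 1000)]
arr = array.array('l', a)

# A list hands back the very same int object on every access.
x, y = a[1000], a[1000]
print("list returns the same object: ", x is y)

# An array builds a new int from the stored C value on every access
# (CPython detail; 1000 lies outside the cache of small integers).
p, q = arr[1000], arr[1000]
print("array returns the same object:", p is q)
```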

## NumPy

In [None]:
import numpy as np

In [None]:
na = np.array(a, dtype=np.int64)

We see that, with 8 bytes/element, NumPy is also an efficient container.

In [None]:
t = %timeit -o sum(na)

In [None]:
print("MIPS:", round(1e6 / t.best / 1e6, 3))

Oops, this is several times slower than the `array` module.  Why so?

**Answer:** The builtin `sum()` iterates over the array, and NumPy has a lot of overhead in producing a Python object for every element in the array.

*Hint:* Use internal methods (ufuncs) when possible.

In [None]:
t = %timeit -o na.sum()

In [None]:
print("MIPS:", round(1e6 / t.best / 1e6, 3))

This is more than 100x the speed of `sum()` on a Python list, and it is also close to optimal in terms of both execution time and space consumed.
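
To make the difference between the two calls concrete, the following sketch checks that both give the same result while taking different routes: the builtin `sum()` iterates and receives one scalar object per element, whereas `na.sum()` runs the whole reduction inside NumPy. It assumes NumPy is installed, and it builds the array with `np.arange` purely for brevity.

```python
import numpy as np

# Same contents as np.array(a, dtype=np.int64) for a = list(range(10**6)).
na = np.arange(1000 * 1000, dtype=np.int64)

# Iterating (which is what the builtin sum() does) yields one NumPy
# scalar object per element.
first = next(iter(na))
print("element type seen by builtin sum():", type(first).__name__)

# Both routes give the same total; only the cost differs.
total_builtin = sum(na)
total_numpy = na.sum()
print("same result:", int(total_builtin) == int(total_numpy))
```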

But let us suppose that we have really big data to process on our laptop and want to see if we can store our data in less space.  Enter compression:

## Using compressed in-memory containers with bcolz

In [None]:
import bcolz

In [None]:
ca = bcolz.carray(na)

Why so much memory consumption?  This is an artifact of the OS memory subsystem and is probably OS dependent.  Let's try again and create a new carray:

In [None]:
ca2 = bcolz.carray(na)

In [None]:
print("mem_used:", mw.measurements.memory_delta)

Ok, this time the amount of memory used seems much lower.  Let's see how much memory the container thinks it has:

In [None]:
ca

In [None]:
t = %timeit -o ca.sum()
print("MIPS:", round(1e6 / t.best / 1e6, 3))

This is around 3~4x slower than a regular NumPy array, but the size of the array is an impressive 20x smaller.  Is compression responsible for the overhead?
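
For a rough feel of why this particular data shrinks so well, the sketch below compresses the same sequence of integers with `zlib` from the standard library. This is a stand-in, not what bcolz does: bcolz relies on Blosc, which works chunk by chunk and applies a shuffle filter, so its ratio and speed will differ.

```python
import array
import zlib

n = 1000 * 1000
# 'q' is a signed 8-byte integer, matching the int64 data used above.
raw = array.array('q', range(n)).tobytes()
packed = zlib.compress(raw)

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(packed))
print("ratio:            %.1fx" % (len(raw) / len(packed)))

# Compression is lossless: decompressing restores the original bytes.
assert zlib.decompress(packed) == raw
```

Consecutive integers are highly regular (most of the high-order bytes are zero), which is why synthetic data like this compresses far better than typical real-world data.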

## Using uncompressed containers with bcolz

In order to see if this is because of the compression overhead, let's use an uncompressed array:

In [None]:
cau = bcolz.carray(a, cparams=bcolz.cparams(clevel=0))

In [None]:
cau2 = bcolz.carray(a, cparams=bcolz.cparams(clevel=0))

In [None]:
cau

In [None]:
t = %timeit -o cau.sum()
print("MIPS:", round(1e6 / t.best / 1e6, 3))

As we can see, the times with an uncompressed `carray` are very close to those with a compressed one, so compression is not the source of the overhead.

So, bcolz lets you use compressed in-memory data containers at the cost of a performance overhead for this case.  But this overhead is not always a problem, and sometimes you prefer to keep more data in memory.  On the other hand, we are going to see that bcolz can be competitive with NumPy performance-wise in other cases.

### Exercise: Using bcolz in real scenarios

bcolz gets good compression ratios not only with synthetic data, but with real data too.  Be sure to check out this URL:

http://nbviewer.ipython.org/gist/alimanfoo/e93a532eb6bde311ea39/genotype_bitshuffle.ipynb

and let's discuss this specific case of bcolz usage in genomics:

* What are the typical compression ratios for this case?

* Is there a difference in speed when accessing data in compressed and non-compressed (clevel=0) states?

* Which compressors achieve the best compression/speed trade-off?