# Introduction to Memory Profiling

> Objectives:
> * Get introduced to memory profiling with several tools
> * Get a brief introduction to time profiling in IPython


## ipython_memwatcher

Our recommended way to profile memory consumption for this tutorial will be [ipython_memwatcher](https://pypi.python.org/pypi/ipython_memwatcher):


In [1]:
from ipython_memwatcher import MemWatcher
mw = MemWatcher()
mw.start_watching_memory()

In [1] used 0.000 MiB RAM in 0.001s, peaked 0.000 MiB above current, total RAM usage 43.398 MiB


In [2]:
# Let's create a big object
a = [i for i in range(1000*1000)]

In [2] used 39.465 MiB RAM in 0.032s, peaked 0.000 MiB above current, total RAM usage 82.863 MiB


In [3]:
# Get some measurements from the last executed cell:
meas = mw.measurements
meas

Measurements(memory_delta=39.46484375, time_delta=0.03173661231994629, memory_peak=0, memory_usage=82.86328125)

In [3] used 0.344 MiB RAM in 0.013s, peaked 0.000 MiB above current, total RAM usage 83.207 MiB


In [4]:
# MemWatcher.measurements is a named tuple.  We can easily get info out of it:
meas.memory_delta

39.46484375

In [4] used 0.000 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 83.207 MiB


In [5]:
# Convert MiB to bytes per element (1e6 elements); here this is about 41 bytes:
meas.memory_delta * (2**20) / 1e6

41.381888

In [5] used 0.000 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 83.207 MiB


In [6]:
# What are these elements made of?
type(a[0])

int

In [6] used 0.000 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 83.207 MiB


In [7]:
# How much memory does an int take?
# Beware: the size below will depend on whether you are using Python 2 or Python 3.
# Here Python 3 is assumed.
import sys
sys.getsizeof(a[1])

28

In [7] used 0.000 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 83.207 MiB


OK.  On a 64-bit platform, this means that the int object uses 4 bytes for the integer value and 24 bytes for Python object metadata.  But 28 is noticeably less than ~40.  Where does the extra overhead come from?

Well, it turns out that the list itself stores a pointer to each element (8 bytes on a 64-bit platform), on top of the int objects that those pointers reference.

![List diagram](http://www.brpreiss.com/books/opus7/html/img579.gif)
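
To make this accounting concrete, here is a minimal sketch that separates the memory of the list container from the memory of the int objects it points to. It relies only on `sys.getsizeof()` and `struct.calcsize()`; the variable names are illustrative, and the exact figures depend on the Python version and platform.

```python
import struct
import sys

n = 1000 * 1000
a = [i for i in range(n)]

# sys.getsizeof() on a list counts only the list header and its array of
# pointers, not the objects that those pointers reference.
container_bytes = sys.getsizeof(a)

# The int objects live elsewhere; add up their individual sizes.
element_bytes = sum(sys.getsizeof(i) for i in a)

pointer_size = struct.calcsize("P")  # size of a C pointer on this platform

print("pointer size:      %d bytes" % pointer_size)
print("list per element:  %.1f bytes" % (container_bytes / n))
print("int per element:   %.1f bytes" % (element_bytes / n))
print("total per element: %.1f bytes" % ((container_bytes + element_bytes) / n))
```

The per-element total obtained this way is usually somewhat below what `MemWatcher` reports, since the process-level figure also includes allocator overhead that `sys.getsizeof()` cannot see.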

## memory_profiler

[memory_profiler](https://pypi.python.org/pypi/memory_profiler) is a basic memory-profiling module that many others build on (such as the `ipython_memwatcher` above). It integrates well with IPython, so it is worth seeing how it works:

In [8]:
%load_ext memory_profiler

In [8] used 0.082 MiB RAM in 0.001s, peaked 0.000 MiB above current, total RAM usage 83.289 MiB


In [9]:
# Use %memit magic command exposed by memory_profiler
%memit b = [i for i in range(1000*1000)]

peak memory: 156.91 MiB, increment: 73.61 MiB
In [9] used 38.984 MiB RAM in 0.202s, peaked 0.000 MiB above current, total RAM usage 122.273 MiB


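Please note that the `peak memory` reported by `%memit` is different from the `peaked` value reported by the ipython_memwatcher package: the former is the absolute peak of the process while the statement runs, whereas the latter is how far memory rose above the usage at the end of the cell.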
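
The difference between a net increase and a peak can be illustrated with the standard-library `tracemalloc` module. This is a different tool from both `%memit` and `MemWatcher` (it tracks only allocations made through Python's allocator, not the process RAM), so take it as a sketch of the concept rather than a reproduction of their numbers.

```python
import tracemalloc

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

# Build a large temporary list, use it, and release it again.
tmp = [i for i in range(1000 * 1000)]
total = sum(tmp)
del tmp

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

net_increase = current - baseline   # what is still allocated afterwards
peak_increase = peak - baseline     # highest point reached, relative to the start
peak_above_current = peak - current # how far the peak sits above the final usage

print("net increase:       %.2f MiB" % (net_increase / 2**20))
print("peak increase:      %.2f MiB" % (peak_increase / 2**20))
print("peak above current: %.2f MiB" % (peak_above_current / 2**20))
```

Because the temporary list is freed before the final measurement, the net increase is small while the peak is large.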

## %time and %timeit

In [10]:
# IPython provides a magic command to see how much time a command takes to run
%time asum = sum(a)

CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 10.2 ms
In [10] used 0.008 MiB RAM in 0.011s, peaked 0.000 MiB above current, total RAM usage 122.281 MiB


Note that `%time` offers quite detailed statistics on the time spent.

Also, the time reported by `MemWatcher` typically carries an overhead of 1 to 5 ms over the time reported by `%time`, so when the times being measured are of that order it is better to rely on the `%time` (or `%timeit`, below) values.
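
Outside IPython, the distinction between CPU time and wall time shown by `%time` can be approximated with the standard `time` module. This is only a sketch of the idea and not necessarily how `%time` is implemented; the variable names are illustrative.

```python
import time

a = list(range(1000 * 1000))

wall_start = time.perf_counter()  # wall-clock time
cpu_start = time.process_time()   # user + system CPU time of this process

asum = sum(a)

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print("CPU time:  %.1f ms" % (cpu_elapsed * 1e3))
print("Wall time: %.1f ms" % (wall_elapsed * 1e3))
```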

In [11]:
# %timeit is another way to measure timings: it runs several loops and reports the best time per loop
%timeit bsum = sum(a)
# Note, however, that %timeit does not keep results: `bsum` is not defined afterwards

100 loops, best of 3: 6.57 ms per loop
In [11] used 0.020 MiB RAM in 2.706s, peaked 0.000 MiB above current, total RAM usage 122.301 MiB


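Interestingly, `%timeit` lets you retrieve the measured times with the `-o` flag.  Note that `all_runs` holds the total time of each repetition (all loops together), whereas `best` is a per-loop time: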

In [12]:
t = %timeit -o sum(a)
print(t.all_runs)
print(t.best)

100 loops, best of 3: 6.57 ms per loop
[0.6601571469800547, 0.659801764995791, 0.6567317120498046]
0.006567317120498046
In [12] used 0.078 MiB RAM in 2.715s, peaked 0.000 MiB above current, total RAM usage 122.379 MiB


And one can specify the number of loops (-n) and the number of repetitions (-r):

In [13]:
t = %timeit -r1 -n1 -o sum(a)
print(t.all_runs)
print(t.best)

1 loop, best of 1: 7.28 ms per loop
[0.007284534978680313]
0.007284534978680313
In [13] used 0.000 MiB RAM in 0.009s, peaked 0.000 MiB above current, total RAM usage 122.379 MiB
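
For scripts that run outside IPython, a similar measurement can be sketched with the standard-library `timeit` module. The values of `loops` and `repeats` below are arbitrary choices for illustration; the relation between the per-repetition totals and the best per-loop time mirrors what `all_runs` and `best` show above.

```python
import timeit

a = list(range(1000 * 1000))

loops = 10    # comparable to -n
repeats = 3   # comparable to -r

# Each entry is the total time for `loops` executions of the statement.
all_runs = timeit.repeat("sum(a)", globals={"a": a}, repeat=repeats, number=loops)

# The best per-loop time is the fastest repetition divided by the loop count.
best = min(all_runs) / loops

print(all_runs)
print("best: %.2f ms per loop" % (best * 1e3))
```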


### Exercise 1

Given a dictionary like:

```
d = dict(("key: %i"%i, i*2) for i in a)
```

Try to guess how much RAM it uses using the techniques introduced above.

Why do you think it takes more space than a list?

*Hint*: Every entry in a dictionary has pointers to two objects: key and value. 

### Solution

In [14]:
d = dict(("key: %i"%i, i*2) for i in a)

In [14] used 151.723 MiB RAM in 0.428s, peaked 0.000 MiB above current, total RAM usage 274.102 MiB


In [15]:
# Compute the size of key + value
sys.getsizeof("key: 100000") + sys.getsizeof(1)

88

In [15] used 0.195 MiB RAM in 0.002s, peaked 0.000 MiB above current, total RAM usage 274.297 MiB


So, `sys.getsizeof()` is telling us that the keys + values take ~90 MB (88 bytes for each of the 1e6 entries).  However, our `MemWatcher` instance is reporting ~150 MiB, which means that the dict structure itself takes around ~60 MB.  The takeaway: do not assume that memory is consumed only by the elements; the data container can also take quite a bit of memory!
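
The same split between container and contents can be checked directly with `sys.getsizeof()`. The sketch below uses a smaller dictionary than the exercise (an assumption made only to keep it quick), and the exact sizes vary with the Python version.

```python
import struct
import sys

n = 100 * 1000  # smaller than in the exercise, to keep the sketch cheap
d = dict(("key: %i" % i, i * 2) for i in range(n))

# Size of the dict structure alone (hash table and entry slots).
container_bytes = sys.getsizeof(d)

# Sizes of the objects the dict refers to.
key_bytes = sum(sys.getsizeof(k) for k in d)
value_bytes = sum(sys.getsizeof(v) for v in d.values())

pointer_size = struct.calcsize("P")

print("dict per entry:   %.1f bytes" % (container_bytes / n))
print("keys per entry:   %.1f bytes" % (key_bytes / n))
print("values per entry: %.1f bytes" % (value_bytes / n))
```

Each entry needs at least two pointers (key and value), which is why the container cost per entry is well above that of a list.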

### Exercise 2

Given the following operations:

In [16]:
import numpy as np

In [16] used 11.656 MiB RAM in 0.074s, peaked 0.000 MiB above current, total RAM usage 285.953 MiB


In [17]:
x = np.ones(int(1e8)); y = np.zeros(int(1e8)); z = np.arange(int(1e8))

In [17] used 1525.914 MiB RAM in 0.164s, peaked 0.000 MiB above current, total RAM usage 1811.867 MiB


In [18]:
w = x * y + z

In [18] used 763.129 MiB RAM in 0.316s, peaked 727.883 MiB above current, total RAM usage 2574.996 MiB


Explain why the peak memory consumption is so high.

### Solution

In this case NumPy requires a temporary array to hold the result of the `x * y` computation before `z` is added to it.  With 1e8 float64 values, that temporary takes about 763 MiB, which roughly matches the peak reported above.
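
As a sketch of what happens, the expression can be written out step by step and compared with a variant that reuses a single buffer. The array size here is much smaller than in the exercise, and the names `tmp` and `w2` are illustrative. Some NumPy versions may already avoid part of this temporary on their own, so measured peaks can differ.

```python
import numpy as np

n = 1000 * 1000  # much smaller than in the exercise, to keep the sketch cheap
x = np.ones(n)
y = np.zeros(n)
z = np.arange(n)

# What `w = x * y + z` does conceptually:
tmp = x * y   # temporary array of n float64 values
w = tmp + z   # a second new array holds the final result
del tmp       # the temporary is released only after w exists

# Variant that reuses one buffer for the intermediate and the final result:
w2 = np.multiply(x, y)  # allocates the result array once
w2 += z                 # adds z in place, without another full-size array

print(np.array_equal(w, w2))
```

In the first form both the temporary and `w` are alive at the same moment, which is the source of the extra peak.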