# Using less RAM


### Objects for Primitives Are Expensive
A list with 100,000,000 items consumes approximately 760 MB of RAM, if the items are the same object. If we store 100,000,000 different items (e.g., unique integers), then we can expect to use gigabytes of RAM! Each unique object has a memory cost.

In [1]:
%load_ext memory_profiler
%memit [0]*int(1e8)

peak memory: 802.64 MiB, increment: 763.28 MiB
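
The increment of about 763 MiB is what simple arithmetic predicts: the list holds one pointer per slot, and every slot points at the same integer object. A minimal sketch of that calculation, assuming CPython (where `[0] * n` allocates exactly `n` slots); the variable names are illustrative only:

```python
import struct
import sys

# Size of a C pointer in bytes: 8 on a 64-bit build, 4 on a 32-bit build.
pointer_size = struct.calcsize("P")

n_items = 10**8
expected_mib = n_items * pointer_size / float(2**20)
print("expected list storage: %.2f MiB" % expected_mib)

# Cross-check on a small list: each extra slot costs one pointer (CPython).
small = [0] * 1000
per_slot = (sys.getsizeof(small) - sys.getsizeof([])) / 1000.0
print("bytes per slot: %.1f" % per_slot)
```

On a 64-bit build this works out to roughly 763 MiB for the slots alone, with no per-item object cost because all slots share one object.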


In [2]:
%memit # show how much RAM this process is consuming right now

peak memory: 39.80 MiB, increment: 0.05 MiB


In [3]:
%memit [n for n in xrange(int(1e8))]

peak memory: 3231.78 MiB, increment: 3191.97 MiB


In [4]:
%memit

peak memory: 172.89 MiB, increment: 0.02 MiB


In [5]:
%memit [n for n in xrange(int(1e8))] # run again: slightly smaller increment, as the process kept some RAM from the previous run

peak memory: 3135.86 MiB, increment: 2962.94 MiB
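
The jump to roughly 3 GB is consistent with paying for one integer object per item on top of the pointer held by the list. A rough estimate, assuming CPython and ignoring allocator overhead and the small integers that CPython caches; `estimate_unique_int_list_bytes` is a hypothetical helper, not part of any library:

```python
import struct
import sys

def estimate_unique_int_list_bytes(n_items):
    """Rough cost of a list of n_items distinct ints: one pointer plus one int object each."""
    pointer_size = struct.calcsize("P")   # one list slot
    int_size = sys.getsizeof(10**6)       # one ordinary (uncached) int object
    return n_items * (pointer_size + int_size)

estimate = estimate_unique_int_list_bytes(10**8)
print("estimated: %.0f MiB" % (estimate / float(2**20)))
```

With a 24-byte integer and an 8-byte pointer (the 64-bit Python 2.7 figures), this gives about 3,052 MiB, in line with the increments measured above. Python 3 integers are slightly larger, so the estimate there is higher.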


### The Array Module Stores Many Primitive Objects Cheaply
The array module efficiently stores primitive types like integers, floats, and characters, but not complex numbers or classes. It creates a contiguous block of RAM to hold the underlying data.

In [7]:
import array
%memit array.array('l', xrange(int(1e8))) # much less than list

peak memory: 804.93 MiB, increment: 752.61 MiB
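
An array stores raw C values rather than pointers to Python objects, so its payload is simply item size times length. A small sketch under that assumption; note that the size of `'l'` is platform dependent (commonly 8 bytes on 64-bit Linux and macOS, 4 bytes on Windows):

```python
import array

values = array.array('l', range(1000))

bytes_per_item = values.itemsize              # size of one C long on this platform
payload_bytes = bytes_per_item * len(values)  # size of the contiguous data block
address, length = values.buffer_info()        # memory address and element count

print("itemsize=%d, payload=%d bytes, elements=%d" % (bytes_per_item, payload_bytes, length))

# Scale the same arithmetic up to the 1e8 items measured above.
n_items = 10**8
print("%.2f MiB for %d items" % (n_items * bytes_per_item / float(2**20), n_items))
```

With an 8-byte `'l'`, 1e8 items need about 763 MiB, close to the measured increment.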


In [8]:
array?

The array module works with a limited set of datatypes with varying precisions; the type codes are listed in the array? output below. Choose the smallest precision that meets your needs, so that you allocate only as much RAM as you need.

Type:        module
String form: <module 'array' from '/Users/nguyenkh/anaconda2/lib/python2.7/lib-dynload/array.so'>
File:        ~/anaconda2/lib/python2.7/lib-dynload/array.so
Docstring:  
This module defines an object type which can efficiently represent
an array of basic values: characters, integers, floating point
numbers.  Arrays are sequence types and behave very much like lists,
except that the type of objects stored in them is constrained.  The
type is specified at object creation time by using a type code, which
is a single character.  The following type codes are defined:

    Type code   C Type             Minimum size in bytes 
    'c'         character          1 
    'b'         signed integer     1 
    'B'         unsigned integer   1 
    'u'         Unicode character  2 
    'h'         signed integer     2 
    'H'         unsigned integer   2 
    'i'         signed integer     2 
    'I'         unsigned integer   2 
    'l'         signed integer     4 
    'L'         unsigned integer   4 
    'f'         floating point     4 
    'd'         floating point     8 

The constructor is:

array(typecode [, initializer]) -- create a new array
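
The table gives minimum sizes; the actual size of each type code on a given platform can be checked with `itemsize`. The sketch below also shows the trade-off of choosing a small type: values outside its range are rejected. It assumes CPython; the chosen type codes and values are only examples:

```python
import array

# Report the real per-item size of a few type codes on this platform.
for typecode in ('b', 'h', 'i', 'l', 'f', 'd'):
    print("%s: %d bytes per item" % (typecode, array.array(typecode).itemsize))

# A signed 1-byte array only accepts values in -128..127.
small = array.array('b', [1, 2, 3])
try:
    small.append(200)
    overflowed = False
except OverflowError:
    overflowed = True
print("out-of-range value rejected: %s" % overflowed)
```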

**numpy** has arrays that can hold a wider range of datatypes—you have more control over the number of bytes per item, and you can use complex numbers and datetime objects. A complex128 object takes 16 bytes per item: each item is a pair of 8-byte floating-point numbers. You can’t store complex objects in a Python array, but they come for free with numpy.

In [10]:
import numpy as np
%memit arr=np.zeros(int(1e8), np.complex128)

peak memory: 58.13 MiB, increment: 0.08 MiB


In [14]:
arr.size # same as len(arr)

100000000

In [12]:
arr.nbytes

1600000000

In [13]:
arr.nbytes/arr.size # bytes per item

16

In [15]:
arr.itemsize # another way of checking

16
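
The same checks make it easy to compare dtypes before committing to one. A short sketch, assuming numpy is installed; the dtypes listed are just examples, and `sizes` is an illustrative name:

```python
import numpy as np

n_items = 1000
dtypes = (np.int8, np.float32, np.float64, np.complex128)

# nbytes is always itemsize * size, so a smaller dtype shrinks the array proportionally.
for dtype in dtypes:
    sample = np.zeros(n_items, dtype=dtype)
    print("%-10s itemsize=%2d nbytes=%d" % (sample.dtype.name, sample.itemsize, sample.nbytes))

# The item size can also be read from the dtype without allocating an array.
sizes = {np.dtype(d).name: np.dtype(d).itemsize for d in dtypes}
print(sizes)
```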

sys.getsizeof reports the size of a single object in bytes. The figures below are for 64-bit Python 2.7:

- A regular integer takes 24 bytes (the object has a lot of overhead), and a long integer consumes 36 bytes.
- We can do the same check for byte strings. An empty string costs 37 bytes, and each additional character adds 1 byte to the cost.
- A list behaves differently: getsizeof counts only the cost of the list itself, not the cost of its contents. An empty list costs 72 bytes, and each item in the list takes another 8 bytes on a 64-bit laptop.
- getsizeof only reports some of the cost, often just for the parent object. It also isn't implemented for every type, so it can have limited usefulness.
- For a realistic measurement of memory usage, use %memit.
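
The gap between what `sys.getsizeof` reports for a list and what the list really costs can be seen by adding up the elements by hand. A minimal sketch, assuming CPython; `shallow_and_deep_size` is a hypothetical helper that only looks one level deep and is not a general-purpose memory profiler:

```python
import sys

def shallow_and_deep_size(items):
    """Return (size of the list alone, size including each distinct element once)."""
    shallow = sys.getsizeof(items)
    distinct = {id(obj): obj for obj in items}   # count shared objects only once
    deep = shallow + sum(sys.getsizeof(obj) for obj in distinct.values())
    return shallow, deep

shared = ["x" * 100] * 1000                          # 1,000 references to one string
unique = ["x" * 100 + str(i) for i in range(1000)]   # 1,000 distinct strings

shared_sizes = shallow_and_deep_size(shared)
unique_sizes = shallow_and_deep_size(unique)
print("shared: list=%d bytes, with contents=%d bytes" % shared_sizes)
print("unique: list=%d bytes, with contents=%d bytes" % unique_sizes)
```

The list-only figure is about the same in both cases, while the total differs by more than 100 KB, which is why a process-level measurement is more trustworthy.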

## Bytes vs Unicode
One of the compelling reasons to switch to Python 3.3+ is that Unicode object storage is significantly cheaper than it is in Python 2.7. If you mainly handle lots of strings and they eat a lot of RAM, definitely consider a move to Python 3.3+. You get this RAM saving absolutely for free.

In [16]:
# Python 2
%memit b"a" * int(1e8)

peak memory: 152.47 MiB, increment: 94.28 MiB


In [17]:
%memit u"a" * int(1e8)

peak memory: 334.71 MiB, increment: 182.23 MiB


In Python 3.3+, the costs of the byte and Unicode versions of an ASCII character are the same, and with a non-ASCII character (sigma) the memory usage only doubles. This is still better than the Python 2.7 situation.

In [None]:
# Python 3
%memit b"a" * int(1e8)

peak memory: 91.77 MiB, increment: 71.41 MiB

In [None]:
%memit u"a" * int(1e8)

peak memory: 91.54 MiB, increment: 70.98 MiB

In [None]:
%memit u"Σ" * int(1e8)

peak memory: 174.72 MiB, increment: 153.76 MiB
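
The per-character cost behind these measurements can be checked on small strings. A sketch assuming CPython 3.3 or later, where strings use the most compact fixed-width encoding that fits their contents; `bytes_per_char` is a hypothetical helper:

```python
import sys

def bytes_per_char(char, n=1000):
    """Marginal storage per character, measured between two string lengths."""
    return (sys.getsizeof(char * (2 * n)) - sys.getsizeof(char * n)) / float(n)

ascii_bytes = bytes_per_char(b"a")        # byte string
ascii_text = bytes_per_char(u"a")         # Unicode string, ASCII only
sigma_text = bytes_per_char(u"\u03a3")    # Unicode string, Greek capital sigma

print("b'a': %.1f, u'a': %.1f, sigma: %.1f bytes per character"
      % (ascii_bytes, ascii_text, sigma_text))
```

Under that assumption the ASCII cases cost 1 byte per character and sigma costs 2, matching the doubling in the measurements above.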

## Suggestions for Using Less RAM
- Generally, if you can avoid putting data into RAM, do. Everything you load costs you RAM. You might be able to load just a part of your data, for example using a memory-mapped file; alternatively, you might be able to use generators to load only the part of the data that you need for partial computations rather than loading it all at once.

- If you are working with numeric data, then you’ll almost certainly want to switch to using numpy arrays—the package offers many fast algorithms that work directly on the underlying primitive objects. The RAM savings compared to using lists of numbers can be huge, and the time savings can be similarly amazing.

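- In Python 3, range is a lazy sequence that produces its values on demand instead of building a list; in Python 2, use xrange instead of range to get the same behavior.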

- If you’re working with lots of bit strings, investigate numpy and the bitarray package; they both have efficient representations of bits packed into bytes. You might also benefit from looking at Redis, which offers efficient storage of bit patterns.
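
The advice about generators and lazy ranges can be illustrated with a small example. This is a sketch assuming Python 3 (on Python 2, substitute xrange for range); `squares` is a hypothetical generator used only for illustration:

```python
import sys

def squares(n):
    """Yield i*i for i in 0..n-1 one at a time, never building a list."""
    for i in range(n):   # range is lazy on Python 3
        yield i * i

n = 10**6
total = sum(squares(n))   # values are consumed as they are produced

lazy_size = sys.getsizeof(range(10**8))        # small, fixed-size object on Python 3
list_size = sys.getsizeof(list(range(1000)))   # grows with the number of items
print("sum=%d, range object=%d bytes, 1000-item list=%d bytes"
      % (total, lazy_size, list_size))
```

Only one value exists at a time inside `sum`, so peak memory stays flat no matter how large `n` becomes.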