# NumPy Arrays

NumPy (module ``numpy``) provides an array data type with vectorized operations, similar to MATLAB or IDL.

In [1]:
import numpy as np

Create two NumPy arrays containing 5 elements each. The ``numpy`` module contains a number of functions for generating common arrays:

In [2]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [6]:
x[0:4]

array([0, 1, 2, 3])

In [7]:
y = np.zeros(5)
y

array([ 0.,  0.,  0.,  0.,  0.])

In [10]:
y.dtype

dtype('float64')

In [9]:
np.array([2,4,5.1,8,1])

array([ 2. ,  4. ,  5.1,  8. ,  1. ])
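A few other constructors are worth knowing alongside ``np.arange``, ``np.zeros`` and ``np.array``. The sketch below is illustrative only; the variable names (``ones``, ``grid``, ``evens``) are made up for this example and do not appear elsewhere in the notebook.

```python
import numpy as np

# Five 1.0 values; like np.zeros, the default dtype is float64.
ones = np.ones(5)

# Five evenly spaced points from 0.0 to 1.0, both endpoints included.
grid = np.linspace(0.0, 1.0, 5)

# np.arange also accepts start, stop (exclusive) and step.
evens = np.arange(0, 10, 2)

print(ones)
print(grid)
print(evens)
```

``np.linspace`` is usually the safer choice for floating-point ranges, because you state the number of points directly instead of relying on a step size.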

Operations are vectorized, so we can do arithmetic with arrays (as long as the dimensions match!) as we would with scalar variables.

In [12]:
(x - y) * 3

array([  0.,   3.,   6.,   9.,  12.])

In [14]:
x[0:2] - y

ValueError: operands could not be broadcast together with shapes (2,) (5,) 
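The error above comes from mismatched shapes. As a minimal sketch (the names ``same_shape``, ``shifted`` and ``mismatch_raised`` are hypothetical, chosen for this example), the following shows two cases that do work and catches the one that does not:

```python
import numpy as np

x = np.arange(5)
y = np.zeros(5)

# Slicing both operands to the same length makes the shapes match.
same_shape = x[0:2] - y[0:2]

# A scalar is broadcast against every element of the array.
shifted = x - 1

# Arrays of length 2 and 5 cannot be combined; NumPy raises ValueError.
try:
    x[0:2] - y
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_shape, shifted, mismatch_raised)
```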

NumPy arrays support the same kinds of operations as ordinary Python lists:

In [15]:
sorted(x - y * 3)

[0.0, 1.0, 2.0, 3.0, 4.0]

...except the data type must match! A NumPy array only holds values of a single data type.

* This allows the values to be packed efficiently in memory, like C arrays.

In [16]:
x.dtype

dtype('int64')
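To illustrate the single-data-type rule, here is a small sketch of how NumPy picks and converts dtypes. The array names are hypothetical, and the default integer dtype can differ by platform, so the example only checks that it is some integer type.

```python
import numpy as np

# All-integer input gives an integer array (int64 on most platforms).
ints = np.array([1, 2, 3])

# A single float in the input promotes every element to float64.
mixed = np.array([1, 2, 3.5])

# astype returns a new array converted to the requested dtype.
as_float = ints.astype(np.float64)

print(ints.dtype, mixed.dtype, as_float.dtype)
```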

## Speed comparison

Math with NumPy arrays is typically much faster, and often easier to read, than the equivalent pure-Python loops.

Consider the function $y = 1.324\cdot a - 12.99\cdot b + 1$

In pure Python we would define:

In [17]:
def py_add(a, b):
    c = []
    for i in range(len(a)):
        c.append(1.324 * a[i] - 12.99*b[i] + 1)
    return c

Using NumPy we could instead define:

In [18]:
def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1
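Before timing the two versions, it is worth confirming that they compute the same values. The sketch below is self-contained, so it redefines both functions; ``py_add`` is written here as a list comprehension for brevity, which is an assumption of this example and not the notebook's original loop. The small test arrays are also made up.

```python
import numpy as np

def py_add(a, b):
    # Pure-Python version: one element at a time.
    return [1.324 * a[i] - 12.99 * b[i] + 1 for i in range(len(a))]

def np_add(a, b):
    # Vectorized version: whole-array arithmetic.
    return 1.324 * a - 12.99 * b + 1

# Small inputs are enough to check that the results agree.
a_small = np.arange(5)
b_small = np.ones(5)

agree = np.allclose(py_add(a_small, b_small), np_add(a_small, b_small))
print(agree)
```

``np.allclose`` compares with a small tolerance, which is preferable to ``==`` for floating-point results.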

Now let's create a couple of very large arrays to work with:

In [21]:
a = np.arange(10**6)
b = np.random.randn(10**6)
len(a)

1000000

Use the IPython magic function ``%timeit`` to test the performance of both approaches. ``%lsmagic`` lists the magics that are available:

In [22]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %install_default_config  %install_ext  %install_profiles  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %popd  %pprint  %precision  %profile  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%latex  %%

In [23]:
%timeit py_add(a,b)

1 loops, best of 3: 833 ms per loop


In [24]:
%timeit np_add(a,b)

100 loops, best of 3: 10.5 ms per loop
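``%timeit`` is only available inside IPython and Jupyter. Outside a notebook, a roughly comparable measurement can be sketched with the standard-library ``timeit`` module. The array size, repeat counts and variable names below are assumptions chosen to keep the example quick, and the timings will vary by machine.

```python
import timeit
import numpy as np

def py_add(a, b):
    # Pure-Python loop, one element at a time.
    c = []
    for i in range(len(a)):
        c.append(1.324 * a[i] - 12.99 * b[i] + 1)
    return c

def np_add(a, b):
    # Vectorized NumPy expression.
    return 1.324 * a - 12.99 * b + 1

n = 10**5  # smaller than the notebook's 10**6 so the script runs quickly
a = np.arange(n)
b = np.random.randn(n)

# Best of 3 runs, reported as seconds per call.
py_time = min(timeit.repeat(lambda: py_add(a, b), number=1, repeat=3))
np_time = min(timeit.repeat(lambda: np_add(a, b), number=10, repeat=3)) / 10

print("pure Python: %.4f s per call" % py_time)
print("NumPy:       %.4f s per call" % np_time)
```

Taking the minimum of several repeats mirrors the "best of 3" figure that ``%timeit`` reports above.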
