# NumPy Arrays

NumPy (module ``numpy``) provides an array datatype with vectorized operations, similar to MATLAB or IDL.

In [1]:
import numpy as np

Create two NumPy arrays containing 5 elements each. The ``numpy`` module contains a number of functions for generating common arrays:

In [2]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [3]:
y = np.ones(5)
y - 99

array([-98., -98., -98., -98., -98.])

Operations are vectorized, so we can do arithmetic on whole arrays as we would with scalar variables, as long as the shapes match or can be broadcast to a common shape.

In [4]:
x - (y+0.005) * 3

array([-3.015, -2.015, -1.015, -0.015,  0.985])
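As a minimal sketch of what "the dimensions must match" means in practice, the snippet below combines arrays of equal length, an array with a scalar, and two arrays of different lengths. The names `w`, `same_shape`, `scaled`, and `mismatch_raised` are illustrative and not part of the original notebook.

```python
import numpy as np

x = np.arange(5)   # shape (5,)
y = np.ones(5)     # shape (5,)
w = np.ones(4)     # shape (4,), incompatible with x

# Arrays of the same shape are combined element by element.
same_shape = x + y

# A scalar is applied to every element of the array.
scaled = x * 2

# Arrays of different lengths raise ValueError instead of being truncated.
try:
    x + w
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_shape)
print(scaled)
print(mismatch_raised)
```

The scalar case is the simplest form of NumPy broadcasting; the length-4 versus length-5 case fails because neither length is 1 and the two lengths differ.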

In [5]:
np.array([3,3,"string",5,5])

array(['3', '3', 'string', '5', '5'], 
      dtype='<U21')

In [6]:
_[3] * 5

'55555'
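The two cells above show that mixed input is converted to one common type: the integers became strings, so multiplying an element by 5 repeats the text instead of doing arithmetic. The sketch below, using made-up variable names, inspects the resulting ``dtype`` for a few inputs.

```python
import numpy as np

ints = np.array([1, 2, 3])              # all integers
mixed_num = np.array([1, 2.5, 3])       # integers and a float
mixed_str = np.array([3, "string", 5])  # integers and a string

print(ints.dtype.kind)       # 'i' (signed integer; exact width is platform-dependent)
print(mixed_num.dtype)       # float64
print(mixed_str.dtype.kind)  # 'U' (unicode string)

# Elements of the string array must be converted back before doing arithmetic.
first_as_int = int(mixed_str[0])
print(first_as_int + 1)
```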

NumPy arrays support many of the same operations as ordinary Python lists, such as indexing, iteration, and sorting:

In [7]:
sorted(x - y * 3)

[-3.0, -2.0, -1.0, 0.0, 1.0]

...except the data type must match! A NumPy array only holds values of a single data type.

* This allows the values to be packed efficiently in memory, like C arrays.

In [8]:
y.dtype

dtype('float64')

In [9]:
# np.float128 is only available on some platforms; np.longdouble is the portable name
z = np.array([5,6.66666666666,7,8,9], dtype=np.float128)
z

array([ 5.0,  6.6666667,  7.0,  8.0,  9.0], dtype=float128)
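Because every element has the same type, each one occupies a fixed number of bytes. The following sketch illustrates this with the standard ``itemsize``, ``nbytes``, and ``flags`` attributes; the array ``small`` is an assumption added for comparison.

```python
import numpy as np

y = np.ones(5)                        # float64 by default
small = np.ones(5, dtype=np.float32)  # same values, half the storage

print(y.itemsize, y.nbytes)           # 8 bytes per element, 40 bytes in total
print(small.itemsize, small.nbytes)   # 4 bytes per element, 20 bytes in total

# A freshly created array is stored as one contiguous block, like a C array.
print(y.flags["C_CONTIGUOUS"])        # True
```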

## Speed comparison

Math with NumPy arrays is typically much faster, and often easier to read, than the equivalent native Python operations.

Consider the function $y = 1.324\cdot a - 12.99\cdot b + 1$.

In pure Python we would define:

In [10]:
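def py_add(a, b):
    # Evaluate the function one element at a time in a Python loop.
    c = []
    # range replaces Python 2's xrange, which raises NameError on Python 3
    for i in range(len(a)):
        c.append(1.324 * a[i] - 12.99 * b[i] + 1)
    return c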

Using NumPy we could instead define:

In [11]:
def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1

Now let's create a couple of very large arrays to work with:

In [12]:
a = np.arange(1000000)
b = np.random.randn(1000000)
len(a)

1000000

In [13]:
b[0:20]

array([ 1.25413855,  0.82177968,  0.98364116,  1.58641217,  0.31157737,
       -0.7148944 , -1.27339215,  0.44504308,  1.58770747,  0.36002879,
       -0.01865808, -0.31815458, -0.20956559, -1.19350768, -0.32606531,
       -0.31753751, -0.58692554,  0.4906943 , -0.11437693,  0.90268728])

Use the IPython magic function ``%timeit`` to compare the performance of the two approaches.

In [14]:
%timeit py_add(a,b)

In [None]:
%timeit np_add(a,b)
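Outside IPython, where ``%timeit`` is unavailable, a similar comparison can be sketched with the standard-library ``timeit`` module. The version below restates both functions so that it runs on its own, checks that they agree, and uses a smaller array (``n = 100_000``, an arbitrary choice) to keep the run short. No timings are shown because they depend on the machine.

```python
import timeit

import numpy as np


def py_add(a, b):
    # Pure-Python version: one element at a time.
    return [1.324 * a[i] - 12.99 * b[i] + 1 for i in range(len(a))]


def np_add(a, b):
    # Vectorized version: the whole array in one expression.
    return 1.324 * a - 12.99 * b + 1


n = 100_000  # smaller than in the notebook, to keep this sketch quick
a = np.arange(n)
b = np.random.randn(n)

# Both implementations should produce the same values.
same = np.allclose(py_add(a, b), np_add(a, b))

# Total time for 3 calls of each function, in seconds.
t_py = timeit.timeit(lambda: py_add(a, b), number=3)
t_np = timeit.timeit(lambda: np_add(a, b), number=3)

print(f"results agree: {same}")
print(f"pure Python: {t_py:.4f} s for 3 calls")
print(f"NumPy:       {t_np:.4f} s for 3 calls")
```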