## NumPy 

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:
* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

### NumPy Arrays

Python has lists, which you've used already. NumPy provides its own array type. Reasons to use NumPy arrays:
* All elements have the same datatype (all integers, all floats, etc.). Python lists can mix different types, which makes them slower to process.
* Arrays can be multidimensional.
* There are many ways to create arrays and many operations on them.
* NumPy array operations are *fast* because they are written in a compiled language. AVOID LOOPS over elements whenever a NumPy operation can do the job.

In general, the specialized nature of NumPy arrays allows for them to be optimized, and therefore fast. This is important if you're doing scientific computing and working with large data.
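
As a small illustration of the single-datatype point, the sketch below builds an array from a list that mixes integers and a float. The variable names are made up for this example.

```python
import numpy as np

mixed_list = [1, 2.5, 3]      # a list can hold ints and floats side by side
arr = np.array(mixed_list)    # NumPy picks one dtype able to hold every element

print(arr.dtype)              # a floating-point dtype, since one element is a float
print(arr.itemsize)           # bytes per element, the same for every element
```

Because every element occupies the same number of bytes, NumPy can store the data in one contiguous block and process it in compiled code.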

In [2]:
import numpy as np

Note: you can also write `from numpy import *`, which means you don't have to put `np.` in front of every command. This is shorter to type, but it is risky: it pulls a large number of names into the main namespace and replaces built-ins such as `sum`, `min`, and `max` with the NumPy versions. The examples here use the `np.` prefix.
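
To make the namespace risk concrete, here is a minimal sketch comparing the built-in `sum` with `np.sum`, which is the function the name `sum` would refer to after a star import. The two interpret their second argument differently, so the same call can silently change its result.

```python
import numpy as np

values = [0, 1, 2, 3, 4]

# Built-in sum: the second argument is a start value added to the total.
builtin_result = sum(values, -1)

# np.sum: the second argument is the axis to sum over (-1 is the last axis).
numpy_result = np.sum(values, -1)

print(builtin_result, numpy_result)
```

Keeping the `np.` prefix makes it obvious which of the two functions a line of code is calling.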

#### Making arrays

In [4]:
a = np.arange(20).reshape(4,5)

In [5]:
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [6]:
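print(a.ndim)   # number of dimensions
print(a.shape)  # size along each dimension
print(a.size)   # total number of elements
print(a.dtype)  # datatype of the elements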

2
(4, 5)
20
int64


Turning a list into an array

Creating a 2d array via a list of lists:

In [7]:
a = [[2.,3,4],[5,6,7]]

In [8]:
a

[[2.0, 3, 4], [5, 6, 7]]

In [9]:
anp = np.array(a)

In [10]:
anp

array([[ 2.,  3.,  4.],
       [ 5.,  6.,  7.]])

What is the difference? You can do element-by-element math with NumPy arrays, but not with ordinary Python lists.

In [11]:
print(a**2)

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [12]:
print(anp**2)

[[  4.   9.  16.]
 [ 25.  36.  49.]]
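
A related pitfall is that some operators are defined for lists but mean something else. The sketch below, using made-up variable names, contrasts `*` on a list with `*` on an array and shows the nested comprehension a plain list would need for squaring.

```python
import numpy as np

nested = [[2., 3, 4], [5, 6, 7]]
arr = np.array(nested)

repeated = nested * 2     # list repetition: the two rows appear twice
doubled = arr * 2         # array arithmetic: every element is multiplied by 2

# Squaring a list of lists needs an explicit nested comprehension.
squared_list = [[x**2 for x in row] for row in nested]

print(len(repeated))
print(doubled)
print(squared_list)
```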


More ways to make arrays:

In [13]:
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [14]:
np.ones((2,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [15]:
np.empty((2,3)) # careful: uninitialized, the contents are arbitrary, so don't rely on them

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
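
A few other constructors are often used alongside `zeros` and `ones`. This is a short sketch rather than a complete list, and the variable names are only for illustration.

```python
import numpy as np

filled = np.full((2, 3), 7.0)      # every element set to 7.0
grid = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced points, both endpoints included
ident = np.eye(3)                  # 3x3 identity matrix

print(filled)
print(grid)
print(ident)
```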

#### Array arithmetic

In [16]:
a =  np.arange(6).reshape(2,3)

In [17]:
a

array([[0, 1, 2],
       [3, 4, 5]])

In [18]:
b = np.array([[4.,5,6], [7,6,9]] )

In [19]:
b

array([[ 4.,  5.,  6.],
       [ 7.,  6.,  9.]])

In [20]:
a * b #elementwise product

array([[  0.,   5.,  12.],
       [ 21.,  24.,  45.]])

In [21]:
a.dot(np.transpose(b)) #matrix dot product

array([[ 17.,  24.],
       [ 62.,  90.]])
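
The same matrix product can also be written with the `@` operator (Python 3.5 and later) and the `.T` shorthand for the transpose. A minimal sketch using the same `a` and `b`:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.array([[4., 5, 6], [7, 6, 9]])

via_dot = a.dot(np.transpose(b))   # the form used above
via_matmul = a @ b.T               # @ is matrix multiplication, b.T is the transpose

print(via_matmul)
```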

In [22]:
a *= 3 #multiply a by 3, in place

In [23]:
a

array([[ 0,  3,  6],
       [ 9, 12, 15]])

In [24]:
a + b #matrix addition

array([[  4.,   8.,  12.],
       [ 16.,  18.,  24.]])

In [25]:
a + 1 #scalar addition

array([[ 1,  4,  7],
       [10, 13, 16]])
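
Scalar addition is the simplest case of broadcasting, where NumPy stretches the smaller operand to match the larger one. The sketch below assumes two hypothetical offset arrays, `row` and `col`, to show the same rule applied along each axis.

```python
import numpy as np

a = np.arange(6).reshape(2, 3) * 3     # same values as `a` at this point: [[0, 3, 6], [9, 12, 15]]
row = np.array([10, 20, 30])           # shape (3,): one offset per column (hypothetical values)
col = np.array([[100], [200]])         # shape (2, 1): one offset per row (hypothetical values)

print(a + row)    # row is applied to every row of a
print(a + col)    # col is applied to every column of a
```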

In [26]:
a.sum()

45

In [27]:
a.sum(axis = 0)

array([ 9, 15, 21])

In [28]:
a.sum(axis = 1)

array([ 9, 36])
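
The `axis` argument names the dimension that is collapsed by the reduction. The following sketch shows the resulting shapes, and assumes the same `a` as above; other reductions such as `mean` accept `axis` in the same way.

```python
import numpy as np

a = np.arange(6).reshape(2, 3) * 3     # [[0, 3, 6], [9, 12, 15]]

col_totals = a.sum(axis=0)             # collapses the 2 rows, leaving shape (3,)
row_totals = a.sum(axis=1)             # collapses the 3 columns, leaving shape (2,)
row_means = a.mean(axis=1)             # mean of each row
kept = a.sum(axis=1, keepdims=True)    # same totals, but keeps a length-1 axis

print(col_totals.shape, row_totals.shape, kept.shape)
print(row_means)
```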

#### Slicing / indexing

In [29]:
a =  np.arange(6).reshape(2,3)

In [30]:
a[0,:]

array([0, 1, 2])

In [31]:
a[1:, 1:]

array([[4, 5]])

In [32]:
a.flatten()

array([0, 1, 2, 3, 4, 5])
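
A few more indexing patterns follow the same rules as Python list slicing, applied per dimension. This is a brief sketch with illustrative variable names:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]

element = a[1, 2]        # single element: row 1, column 2
second_col = a[:, 1]     # all rows, column 1
last_row = a[-1]         # negative indices count from the end
every_other = a[:, ::2]  # all rows, every second column

print(element, second_col, last_row)
print(every_other)
```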

In [33]:
a?

In [None]:
#can also do help(a)

#### Copying arrays

In [34]:
a =  np.arange(6).reshape(2,3)

In [35]:
a

array([[0, 1, 2],
       [3, 4, 5]])

In [36]:
b = a

In [37]:
b *= 2

In [38]:
a

array([[ 0,  2,  4],
       [ 6,  8, 10]])

**Watch out:** `b = a` does not copy anything. `b` and `a` are two names for the same object in memory.

In [39]:
b is a

True

To make an independent copy, use `copy()`:

In [40]:
a =  np.arange(6).reshape(2,3)

In [41]:
c = a.copy()

In [42]:
c[0, 1] = -1

In [43]:
c

array([[ 0, -1,  2],
       [ 3,  4,  5]])

In [44]:
a

array([[0, 1, 2],
       [3, 4, 5]])

You can also create a "shallow" copy, or "view": a new array object that can have its own shape but shares the underlying data with the original instead of copying it. This is useful if you are short on memory.

In [45]:
d = a[:]

In [46]:
d.shape = (6,)  # one-element tuple; only d is reshaped, a keeps its (2, 3) shape

In [47]:
d

array([0, 1, 2, 3, 4, 5])

Now if we modify `d`, we also modify `a`:

In [48]:
d[2] = -1

In [49]:
a

array([[ 0,  1, -1],
       [ 3,  4,  5]])
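
When it is unclear whether two arrays share data, `np.shares_memory` can check. A minimal sketch contrasting a view with a copy (the names `view` and `independent` are only for this example):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
view = a[:]               # new array object, same underlying data
independent = a.copy()    # new array object with its own data

print(np.shares_memory(a, view))          # the view shares data with a
print(np.shares_memory(a, independent))   # the copy does not

view[0, 0] = 99           # writing through the view changes a as well
print(a[0, 0], independent[0, 0])
```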

#### Stacking arrays

In [50]:
a =  np.arange(6).reshape(2,3)

In [51]:
np.vstack((a,a))

array([[0, 1, 2],
       [3, 4, 5],
       [0, 1, 2],
       [3, 4, 5]])

In [52]:
np.hstack((a,a))

array([[0, 1, 2, 0, 1, 2],
       [3, 4, 5, 3, 4, 5]])
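
For 2D arrays like this one, both calls can also be expressed with `np.concatenate` and an explicit `axis`. A short sketch of the equivalence:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

stacked_rows = np.concatenate((a, a), axis=0)   # join along rows, like vstack here
stacked_cols = np.concatenate((a, a), axis=1)   # join along columns, like hstack here

print(stacked_rows.shape, stacked_cols.shape)
```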

#### Boolean and `where` operations

In [53]:
a =  np.arange(6).reshape(2,3)

Let's set all values above 3 to zero.

In [54]:
a > 3

array([[False, False, False],
       [False,  True,  True]], dtype=bool)

In [55]:
a[a > 3] = 0

In [56]:
a

array([[0, 1, 2],
       [3, 0, 0]])

Success. Next, extract only the elements that match some condition, which gives a shorter 1D array:

In [57]:
a =  np.random.random((2,3))

In [58]:
a

array([[ 0.82188953,  0.60211425,  0.73756468],
       [ 0.10959709,  0.4841543 ,  0.71186541]])

In [59]:
np.where(a > .5)

(array([0, 0, 0, 1]), array([0, 1, 2, 2]))

In [67]:
a[np.where((a > .5) & (a < .9))]


array([ 0.82188953,  0.60211425,  0.73756468,  0.71186541])
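
Two variations may be useful here. A boolean mask can index the array directly, without `np.where`, and the three-argument form of `np.where` chooses between two values per element. The sketch below uses fixed values rounded from the random array above so that the result is repeatable.

```python
import numpy as np

# Fixed stand-in for the random array above (values rounded for readability).
a = np.array([[0.82, 0.60, 0.74],
              [0.11, 0.48, 0.71]])

mask = (a > .5) & (a < .9)
selected = a[mask]                   # same elements as a[np.where(mask)]

# Three-argument where: keep a where the condition holds, otherwise use 0.0.
clipped = np.where(a > .5, a, 0.0)

print(selected)
print(clipped)
```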

#### Speed

In [68]:
a = np.random.random(1000)

Let's use NumPy's `sum` function:

In [69]:
%timeit np.sum(a)

The slowest run took 19.19 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 6.98 µs per loop


In [70]:
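def mysum(values):
    # Add the elements one at a time in a Python-level loop.
    # (The argument is named `values` to avoid shadowing the built-in `input`.)
    output = 0.
    for i in range(len(values)):
        output += values[i]
    return output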

In [71]:
%timeit mysum(a)


1000 loops, best of 3: 246 µs per loop


**Lesson: avoid `for` loops over arrays of numbers as much as possible.**
In the timings above, the Python loop is roughly 35 times slower than `np.sum`. Do operations with the entire array at once whenever you can.
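
The same pattern applies beyond plain sums. As a sketch, the two hypothetical helpers below compute a sum of squares, once with an explicit loop and once as a single whole-array expression, and time both with the standard `timeit` module (the plain-Python counterpart of `%timeit`). Actual timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)      # seeded so the run is repeatable (seed is arbitrary)
a = rng.random(1000)

def sum_of_squares_loop(values):
    """Hypothetical helper: sum of squares with an explicit Python loop."""
    total = 0.0
    for x in values:
        total += x * x
    return total

def sum_of_squares_vectorized(values):
    """Hypothetical helper: the same quantity as one whole-array expression."""
    return np.sum(values ** 2)

loop_time = timeit.timeit(lambda: sum_of_squares_loop(a), number=100)
vec_time = timeit.timeit(lambda: sum_of_squares_vectorized(a), number=100)

print("loop:       %.6f s for 100 calls" % loop_time)
print("vectorized: %.6f s for 100 calls" % vec_time)
```

Both helpers return the same value up to floating-point rounding; only the time taken differs.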