# Scientific Programming with numpy 

AKA: I miss MATLAB

AKA: Why is Python so slow?

numpy is a Python library that provides efficient implementations of many functions commonly used in scientific programming.

The numpy array object is numpy's workhorse.  Unlike a Python list, whose elements can be of mixed types, a numpy array has a single, specific data type (e.g. float32, uint8).  Most numpy operations are implemented as optimized C functions, so they are typically much faster than the equivalent pure-Python loops.

In [1]:
import numpy as np

# python list of integers
nums = [ 1, 2, 3, 4, 5 ]

# numpy array of integers
arr1 = np.array( nums )

print("arr1", arr1)

arr1 [1 2 3 4 5]
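
To make the speed claim concrete, here is a rough sketch that computes the same sum of squares with a pure-Python loop and with a numpy expression, timing both with the standard `timeit` module. The array size and the names `py_list`, `np_arr`, `loop_time` and `vec_time` are arbitrary choices for illustration, and the actual timings depend on your machine, so no numbers are shown.

```python
import timeit

import numpy as np

n = 100000
py_list = list(range(n))
# int64 is requested explicitly so the squares cannot overflow a 32-bit default
np_arr = np.arange(n, dtype=np.int64)

# both approaches compute the same value
loop_result = sum(x * x for x in py_list)
vec_result = int((np_arr ** 2).sum())
print(loop_result == vec_result)

# time 10 repetitions of each approach
loop_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
vec_time = timeit.timeit(lambda: (np_arr ** 2).sum(), number=10)
print("python loop: %.4f s" % loop_time)
print("numpy:       %.4f s" % vec_time)
```

On most machines the numpy version should be noticeably faster, but treat the measurement as indicative rather than a benchmark.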


## np.array operators

numpy arrays support MATLAB-like element-wise operators:

In [2]:
# add a scalar to every element
arr2 = arr1 + 2
print("arr2", arr2)

# element-wise addition
arr3 = arr1 + arr2
print("arr3", arr3)

# element-wise multiplication (not a matrix product)
arr4 = arr1 * arr2
print("arr4", arr4)

arr2 [3 4 5 6 7]
arr3 [ 4  6  8 10 12]
arr4 [ 3  8 15 24 35]
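
For contrast, the sketch below applies the same operators to a plain Python list and to a numpy array. On lists `+` concatenates and `*` repeats, while on arrays both work element by element. The variable names are only for illustration.

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

# on a list, + concatenates and * repeats
list_plus = nums + nums
list_times = nums * 2
print(list_plus)    # [1, 2, 3, 1, 2, 3]
print(list_times)   # [1, 2, 3, 1, 2, 3]

# on an array, both operators act on each element
arr_plus = arr + arr
arr_times = arr * 2
print(arr_plus)     # [2 4 6]
print(arr_times)    # [2 4 6]
```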


## np.array creation functions

In [3]:
# arange: create a sequence of integers in [0, 10); the stop value is excluded
arr5 = np.arange(0, 10, 1)
print("arr5", arr5)

# one, two, skip a few: same range with a step of 2
arr6 = np.arange(0, 10, 2)
print("arr6", arr6)

# linspace: 5 evenly spaced values from 0.0 to 1.0, both endpoints included
arr7 = np.linspace(0.0, 1.0, 5)
print("arr7", arr7)

arr5 [0 1 2 3 4 5 6 7 8 9]
arr6 [0 2 4 6 8]
arr7 [ 0.    0.25  0.5   0.75  1.  ]
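
The main difference between the two functions is how they treat the end of the range: `arange` takes a step and excludes the stop value, while `linspace` takes a count and includes it by default. A small sketch, using the `endpoint` keyword of `np.linspace` to show how the two can be made to agree:

```python
import numpy as np

# arange with a float step: stop value 1.0 is excluded
half_open = np.arange(0.0, 1.0, 0.25)
print(len(half_open))       # 4

# linspace: stop value 1.0 is included
closed = np.linspace(0.0, 1.0, 5)
print(closed[-1])           # 1.0

# endpoint=False drops the stop value, matching arange above
open_ended = np.linspace(0.0, 1.0, 4, endpoint=False)
print(np.allclose(half_open, open_ended))   # True
```

When the step is not an integer, `linspace` is usually the safer choice, because the number of elements is stated explicitly instead of being derived from floating-point arithmetic.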


In [4]:
# zeros, ones: the argument is the shape as a (rows, cols) tuple
arr8 = np.zeros((4,4))
print("arr8", arr8)

arr9 = np.ones((3,4)) * 2.0
print("arr9", arr9)

# identity matrix
arr10 = np.eye(3)
print("arr10", arr10)

# uniform random values in [0, 1)
arr11 = np.random.rand(4)
print("arr11", arr11)

arr8 [[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
arr9 [[ 2.  2.  2.  2.]
 [ 2.  2.  2.  2.]
 [ 2.  2.  2.  2.]]
arr10 [[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
arr11 [ 0.0990703   0.6391112   0.04176239  0.92494733]
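
Two small variations on the cell above, offered as a sketch: `np.full` builds a constant array directly instead of scaling `np.ones`, and `np.random.seed` makes the random values repeatable between runs. The seed value 42 is arbitrary.

```python
import numpy as np

# np.full fills a new array with a given value
twos = np.full((3, 4), 2.0)
same = np.array_equal(twos, np.ones((3, 4)) * 2.0)
print(same)         # True

# seeding the generator makes the "random" values repeatable
np.random.seed(42)
first = np.random.rand(4)
np.random.seed(42)
second = np.random.rand(4)
print(np.array_equal(first, second))    # True
```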


## numpy data types

With numpy you control the data type (dtype) of your arrays.

In [5]:
# numpy will infer the data type for you
arr = np.array([0.0, 1.0, 2.0])
print(arr.dtype, arr)

# or you can specify it explicitly; the values are converted to that type
arr = np.array([0.0, 1.0, 2.0], dtype=np.uint8)
print(arr.dtype, arr)

float64 [ 0.  1.  2.]
uint8 [0 1 2]
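
Choosing a dtype has consequences worth seeing once. The sketch below shows an existing array being converted with `astype`, the fractional part being dropped when floats become integers, and a small integer type wrapping around when a result does not fit. The example values are made up for illustration.

```python
import numpy as np

# astype returns a converted copy; the original array is unchanged
floats = np.array([0.9, 1.5, 2.7])
truncated = floats.astype(np.uint8)
print(truncated)    # [0 1 2]  (fractional part dropped, not rounded)

# uint8 holds 0..255, so results outside that range wrap around
small = np.array([250, 251], dtype=np.uint8)
wrapped = small + np.uint8(10)
print(wrapped)      # [4 5]

# converting to a wider type first avoids the wrap-around
safe = small.astype(np.int32) + 10
print(safe)         # [260 261]
```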


## common functions

In [6]:
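# reshape: same 10 elements, arranged as 2 rows by 5 columns
arr = np.arange(0,10,1)
print(arr)
arr = arr.reshape((2,5))
print("reshaped", arr)

# mean: over all elements, or along axis 1 (one value per row)
print("mean", arr.mean())
print("mean by axis", arr.mean(axis=1))

# sum: over all elements, or along axis 1 (one value per row)
print("sum", arr.sum())
print("sum by axis", arr.sum(axis=1))

# dot product of two vectors: 1*4 + 2*5 + 3*6
print("dot", np.dot([1,2,3],[4,5,6]))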

[0 1 2 3 4 5 6 7 8 9]
reshaped [[0 1 2 3 4]
 [5 6 7 8 9]]
mean 4.5
mean by axis [ 2.  7.]
sum 45
sum by axis [10 35]
dot 32
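
The cell above reduces along axis 1, which gives one value per row. As a complementary sketch, reducing along axis 0 gives one value per column, and `np.dot` applied to two 2-D arrays performs a matrix product:

```python
import numpy as np

arr = np.arange(0, 10, 1).reshape((2, 5))

# axis=0 collapses the rows: one result per column
col_sums = arr.sum(axis=0)
col_means = arr.mean(axis=0)
print(col_sums)     # [ 5  7  9 11 13]
print(col_means)    # [2.5 3.5 4.5 5.5 6.5]

# with 2-D inputs, np.dot is a matrix product: (2x5) times (5x2) gives (2x2)
gram = np.dot(arr, arr.T)
print(gram.shape)   # (2, 2)
```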


## array indexing

In [7]:
arr = np.arange(0,10,1).reshape((2,5))
print("arr", arr)

# remember, indexing starts at zero: arr[row, col]
print("[0,2]", arr[0,2])

# grab an entire row or column with ":"
print("row 1", arr[1,:])
print("col 2", arr[:,2])

arr [[0 1 2 3 4]
 [5 6 7 8 9]]
[0,2] 2
row 1 [5 6 7 8 9]
col 2 [2 7]
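
A few more indexing patterns, sketched on the same array: slices select sub-blocks, negative indices count from the end, and a slice is a view onto the original data rather than a copy. The value 99 is just a marker for illustration.

```python
import numpy as np

arr = np.arange(0, 10, 1).reshape((2, 5))

# slice a block: all rows, columns 1 and 2
block = arr[:, 1:3]
print(block.tolist())   # [[1, 2], [6, 7]]

# negative indices count from the end
last = arr[-1, -1]
print(last)             # 9

# a slice is a view: writing to it changes the original array
row_view = arr[0, :]
row_view[0] = 99
print(arr[0, 0])        # 99

# use copy() when you need independent data
row_copy = arr[1, :].copy()
row_copy[0] = -1
print(arr[1, 0])        # 5
```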


## array masking

In [8]:
arr = np.arange(0,10,1).reshape((2,5))
print(arr)

# generate a boolean mask with the same shape as the array
mask = arr > 4
print(mask)

# now use it to modify only the elements where the mask is True
arr[mask] += 100
print(arr)

[[0 1 2 3 4]
 [5 6 7 8 9]]
[[False False False False False]
 [ True  True  True  True  True]]
[[  0   1   2   3   4]
 [105 106 107 108 109]]
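
Masks can also be combined, counted, and used to select values. The following sketch uses `&` for element-wise "and" (the parentheses are required because `&` binds more tightly than the comparisons) and `np.where` to pick between two values per element:

```python
import numpy as np

arr = np.arange(0, 10, 1).reshape((2, 5))

# combine conditions: greater than 2 AND even
mask = (arr > 2) & (arr % 2 == 0)

# indexing with a mask returns the matching elements as a 1-D array
selected = arr[mask]
print(selected)     # [4 6 8]

# True counts as 1, so summing a mask counts the matches
count = int(mask.sum())
print(count)        # 3

# np.where builds a new array: 0 where the condition holds, arr elsewhere
clipped = np.where(arr > 4, 0, arr)
print(clipped.tolist())     # [[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]]
```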
