# Intro to Numpy

Today

* IPython Slides

* Numpy!

* Vectorized numerical code

IPython Slides!!!

* `ipython nbconvert --to slides numpy.ipynb`

* Consider following along in the notebook so you can mess around with the code.

# Numpy

* Much of artificial intelligence amounts to working with points in $\mathbb{R}^n$ (points in $n$-dimensional space).

    * We need fast, convenient ways to work with $\mathbb{R}^n$.

In [3]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [4]:
from sklearn import datasets, neighbors
(easy_x, easy_y) = datasets.make_classification(n_samples=400, n_features = 2, n_informative = 2,
                             n_redundant = 0, n_repeated = 0, n_clusters_per_class=1, class_sep=2)
scatter(easy_x[:,0], easy_x[:,1], c = easy_y, cmap = 'cool')

ImportError: No module named scipy

In [None]:
# These lines create a classifier and train it.
# weights='distance' makes closer neighbors count more than distant ones.
n_neighbors = 10
nn_classifier = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
nn_classifier.fit(easy_x, easy_y)

# The lines below show how points in a grid would be classified
h = 0.05 # mesh size
x_min, x_max = easy_x[:, 0].min() - 1, easy_x[:,0].max() + 1
y_min, y_max = easy_x[:, 1].min() - 1, easy_x[:,1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = nn_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
figure()
pcolormesh(xx,yy, Z, cmap='cool')

# this line plots the training data
scatter(easy_x[:,0], easy_x[:,1], c=easy_y, cmap='cool')

How do we work with $\mathbb{R}^n$?

* We need *fast* $n$-dimensional vectors.

* We need *fast* $n \times m$ matrices.

* We need fast methods for doing I/O with vectors and matrices.

Numpy provides these three things, and more!
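As a rough illustration of the I/O point, here is a minimal sketch that round-trips a matrix through numpy's binary `.npy` format with `np.save` and `np.load`. Writing to an in-memory `io.BytesIO` buffer instead of a file is an assumption made only so the example leaves nothing on disk; the variable names are made up for this example.

```python
import io
import numpy as np

# A small 2 x 3 matrix of floats to save.
matrix = np.arange(6, dtype=np.float64).reshape((2, 3))

# np.save accepts a filename or any file-like object opened for binary writing.
buf = io.BytesIO()
np.save(buf, matrix)

# Rewind the buffer and read the array back; shape and dtype are preserved.
buf.seek(0)
restored = np.load(buf)
```

With a real file you would pass a path such as `"matrix.npy"` (a hypothetical name) to both calls instead of the buffer.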

In [None]:
# What's wrong with this code?
v1 = range(10000)
v2 = range(10000, 20001)
v3 = []

for x,y in zip(v1,v2):
    v3.append(x+y)
    
print len(v3), v3[0:5]

What's in numpy?

* A type for multidimensional arrays (`ndarray`)

* Fast operations on `ndarray`s

    * Implemented in C and/or Fortran for efficiency
    
* Code is easily "vectorized."
   
    * Functions operating on entire `ndarray`s.

# Importing numpy

In [None]:
import numpy as np

In [None]:
# Is this code better?
v1 = np.arange(10000)
v2 = np.arange(10000, 20000)
v3 = v1 + v2
print len(v3), v3[0:5]
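One way to compare the two approaches is to time them. The sketch below is only a rough benchmark under assumed settings (10000 elements, 100 repetitions); the helper names `add_lists` and `add_arrays` are invented for this example, and the actual timings will vary by machine.

```python
import timeit
import numpy as np

n = 10000
list_a, list_b = list(range(n)), list(range(n, 2 * n))
arr_a, arr_b = np.arange(n), np.arange(n, 2 * n)

def add_lists():
    # Element-by-element addition in pure Python.
    return [x + y for x, y in zip(list_a, list_b)]

def add_arrays():
    # One vectorized addition; the loop runs inside numpy's compiled code.
    return arr_a + arr_b

loop_seconds = timeit.timeit(add_lists, number=100)
numpy_seconds = timeit.timeit(add_arrays, number=100)
print("python loop: %.4f s, numpy: %.4f s" % (loop_seconds, numpy_seconds))
```

Both functions compute the same values; only the place where the loop runs differs.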

We can convert array-like objects (such as Python lists) to numpy arrays

In [None]:
x = [1,2,3,4,5]
type(x)

In [None]:
npx = np.asarray(x)
type(npx)
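A detail worth knowing about the conversion: `np.asarray` only copies when it has to, while `np.array` copies by default. The sketch below illustrates the difference; the variable names are made up for this example.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

from_list = np.asarray(values)      # a list is converted into a new array
same_array = np.asarray(from_list)  # already an ndarray, so it is returned as-is
new_copy = np.array(from_list)      # np.array makes a copy by default

new_copy[0] = 100     # changes only the copy
same_array[1] = 200   # same object as from_list, so from_list changes too
```

After this runs, `from_list` holds `[1, 200, 3, 4, 5]` and the original Python list is unchanged.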

We can create ranges, as in regular Python

In [None]:
xs = np.arange(0, 100, 2)
xs[0:10]

In [None]:
xs[-6:]

In [None]:
# Potentially problematic...
ys = np.arange(0.0, 1, 0.1)
ys

What's wrong with the above?

* Hint - do we always know how many elements we'll get back?

* Can we easily ensure that we'll always get the endpoints?

In [None]:
# Use linspace instead to specify the number of elements.
# Both endpoints are included, so these 10 points are spaced 1/9 apart, not 0.1.
ys = np.linspace(0, 1, 10)
ys
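To make the difference concrete, here is a small sketch comparing the two functions on an assumed interval from 1 to 1.3. With a floating-point step, the length of the `arange` result depends on rounding (on typical platforms this call is known to return four elements, including 1.3), whereas `linspace` always returns exactly the count you ask for.

```python
import numpy as np

# Length depends on how (1.3 - 1) / 0.1 rounds in floating point.
risky = np.arange(1, 1.3, 0.1)

# Exactly 4 points, with both endpoints included.
safe = np.linspace(1, 1.3, 4)

# endpoint=False drops the stop value, which mimics arange's half-open interval.
without_end = np.linspace(0, 1, 10, endpoint=False)

print("arange gave %d elements, linspace gave %d" % (len(risky), len(safe)))
```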

There's also a `logspace`, which returns values evenly spaced on a log scale. Its `start` and `stop` arguments are exponents (base 10 by default).
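A short sketch of `logspace`, with argument values chosen only for illustration. Note that the first two arguments are exponents, not the endpoint values themselves.

```python
import numpy as np

# 4 points from 10**0 to 10**3: 1, 10, 100, 1000
powers_of_ten = np.logspace(0, 3, 4)

# The base keyword changes the base: 2**0 .. 2**4 = 1, 2, 4, 8, 16
powers_of_two = np.logspace(0, 4, 5, base=2)
```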

# Random Data

If I told you I was giving you random data, how could you check?

In [None]:
# uniformly random data
unifs = np.random.rand(10000)
hist(unifs)

In [None]:
# normally distributed data (standard normal: mean 0, variance 1)
nunif = np.random.randn(10000)
hist(nunif)
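Besides looking at a histogram, one simple check is to compare summary statistics against their theoretical values. This is only a sanity check under assumptions, not a proper statistical test; the fixed seed and the variable names are choices made for this example.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed (an assumption, for reproducibility)
uniform_samples = rng.rand(10000)
normal_samples = rng.randn(10000)

# Uniform on [0, 1): mean should be near 0.5, std near sqrt(1/12), about 0.289.
uniform_mean, uniform_std = uniform_samples.mean(), uniform_samples.std()

# Standard normal: mean should be near 0, std near 1.
normal_mean, normal_std = normal_samples.mean(), normal_samples.std()

print("uniform: mean %.3f, std %.3f" % (uniform_mean, uniform_std))
print("normal:  mean %.3f, std %.3f" % (normal_mean, normal_std))
```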

# "Default" Arrays

In [None]:
zero_vector = np.zeros(5)
print zero_vector

print

# note the extra parens!!!
zero_matrix = np.zeros((5,5))
print zero_matrix

In [None]:
ones_vector = np.ones(5)
print ones_vector

print 

ones_tensor = np.ones((3,2,2))
print ones_tensor

# Numpy arrays have a few important properties

In [None]:
# what does this imply about the nature of ndarrays?
print "element type, aka dtype: ", zero_matrix.dtype
print
print "number of dimensions: ", zero_matrix.ndim
print
print "vector shape: ", xs.shape
print "array shape: ", zero_matrix.shape
print
print "bytes per element: ", zero_matrix.itemsize
print
print "total bytes: ", zero_matrix.nbytes
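One reading of these properties is that an `ndarray` is a fixed-size block of memory in which every element has the same type. The sketch below checks that idea on a few small arrays; the names are made up for this example.

```python
import numpy as np

m = np.zeros((5, 5))

# Every element has the same dtype, so total size is just count * element size.
total_bytes = m.size * m.itemsize   # 25 elements * 8 bytes each

# Mixing ints and floats does not give a mixed array: everything is upcast.
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])

print("zeros dtype: %s, ints dtype: %s, mixed dtype: %s"
      % (m.dtype, ints.dtype, mixed.dtype))
```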


# Indexing

In [None]:
# individual element access
zero_matrix[2,2] = 42
zero_matrix[1,2] = 13
print zero_matrix
print 
ones_tensor[1,1,1] = 5
print ones_tensor

In [None]:
# column access
print zero_matrix[:, 2]

# row access
print zero_matrix[1, :]

# we can modify rows and columns too
zero_matrix[:, 0] = 4
print
print zero_matrix
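Row and column slices like the ones above are views onto the original array, not copies, which is why assigning through them modifies the matrix. A minimal sketch, using a hypothetical 3 x 3 array named `grid`:

```python
import numpy as np

grid = np.zeros((3, 3))

# Basic slicing returns a view that shares memory with grid.
row_view = grid[1, :]
row_view[0] = 7           # writes through to grid[1, 0]

# Call .copy() when you want an independent array.
row_copy = grid[1, :].copy()
row_copy[1] = 9           # grid is not affected
```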

# Arithmetic

* We should try to avoid doing things element by element

* Numpy makes this pretty easy

In [None]:
zero_matrix + zero_matrix

In [None]:
ones_vector + ones_vector

What should happen if the arguments are different sizes?

In [None]:
ones_vector + 100

In [None]:
print zero_matrix
print
print zero_matrix + ones_vector

In [None]:
ones_tensor + ones_vector
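The behavior in these cells is called broadcasting: numpy lines the shapes up from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below rebuilds arrays with the same shapes as above under new names (invented for this example) and records which combinations work.

```python
import numpy as np

matrix = np.zeros((5, 5))
vector = np.ones(5)
tensor = np.ones((3, 2, 2))

row_added = matrix + vector                   # (5, 5) + (5,)   -> (5, 5)
col_added = matrix + vector.reshape((5, 1))   # (5, 5) + (5, 1) -> (5, 5)
pair_added = tensor + np.ones(2)              # (3, 2, 2) + (2,) -> (3, 2, 2)

# (3, 2, 2) + (5,) fails: the trailing dimensions 2 and 5 do not match.
try:
    tensor + vector
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
```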

How should multiplication work?

In [None]:
small_vec = np.array([1,2,3,4], dtype=np.float64)
print small_vec.shape
small_square = small_vec.reshape((2,2))
print small_square.shape
print small_square * small_square

What!?

# Linear Algebra

In [None]:
# Actual matrix-vector multiplication. Or vector-vector multiplication...
print np.dot(small_square, [1,0])
print np.dot(small_square, [0,1])

In [None]:
# the result is a 1-D array of shape (2,), neither a row nor a column vector
print (np.dot(small_square, [1,0])).shape
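To put the two kinds of multiplication side by side, here is a small sketch that rebuilds the same 2 x 2 matrix under a new name and compares `*` with `np.dot`. It also shows one way to get an explicit column vector, by reshaping to `(2, 1)`; the names are made up for this example.

```python
import numpy as np

square = np.array([1, 2, 3, 4], dtype=np.float64).reshape((2, 2))

elementwise = square * square            # [[1, 4], [9, 16]]
matrix_product = np.dot(square, square)  # [[7, 10], [15, 22]]

# Multiplying by a plain list gives a 1-D result of shape (2,).
flat_result = np.dot(square, [1, 0])

# Multiplying by a (2, 1) array gives a (2, 1) column vector.
column = np.array([1, 0]).reshape((2, 1))
column_result = np.dot(square, column)
```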

# Logic

In [None]:
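# 10 random integers from 0 to 10 inclusive (randint's upper bound is exclusive);
# random_integers is deprecated in newer numpy versions
rands = np.random.randint(0, 11, 10)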
rands

In [None]:
rands > 5

In [None]:
rands[rands > 5]

In [None]:
(rands > 5).any()

In [None]:
(rands > 0).all()

In [None]:
(rands >= 0).all()
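Boolean arrays can also be counted, combined, and used to choose values. The sketch below uses a fixed array instead of random data so that the results are predictable; the names are invented for this example.

```python
import numpy as np

values = np.array([3, 8, 0, 10, 5, 6])

mask = values > 5
count_big = mask.sum()        # True counts as 1, so this counts the matches

# Combine conditions with & and |; the parentheses are required.
in_range = (values > 2) & (values < 9)

# np.where picks from the second argument where mask is True, else the third.
capped = np.where(mask, 5, values)
```

Here `count_big` is 3, `values[in_range]` is `[3, 8, 5, 6]`, and `capped` is `[3, 5, 0, 5, 5, 5]`.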

# Vectorized Functions

In [None]:
# inputs and outputs are defined on the next subslide. How would you define them?
plot(inputs, outputs)

In [None]:
# what should be the type of x?
def sigm(x):
    return (1.0 / (1.0 + np.exp(-x)) )

inputs = np.linspace(-10, 10, 100)
outputs = sigm(inputs)
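Because `np.exp` and the arithmetic operators work on whole arrays, `sigm` accepts a scalar or an `ndarray` without any changes. The sketch below repeats the definition so that it runs on its own and compares it against an explicit loop using `math.exp`; the loop is written only for comparison.

```python
import math
import numpy as np

def sigm(x):
    # Works for scalars and arrays alike, since np.exp is applied elementwise.
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.linspace(-10, 10, 100)

# Vectorized: one call handles all 100 inputs.
vectorized = sigm(inputs)

# Scalar loop: the same computation, one element at a time.
looped = np.array([1.0 / (1.0 + math.exp(-v)) for v in inputs])

print("max difference: %g" % np.abs(vectorized - looped).max())
```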