# CME 193 - Scientific Python
### Lecture 3 (4/12)
Spring 2016, Stanford University

## Administrative Details
* The room...sorry :(
* No class next Tuesday!
* Sign up for Piazza! http://piazza.com/stanford/spring2016/cme193
* We have a website! http://icme.github.io/cme193
* Thank you for doing the survey! 

## Last time

* Functions, docstrings, abstraction
* Lists


## Today
* Primitive Data Structures
* Intro to Numpy!

# Data Structures

## Lists

* Group variables together
* Specific order
* Access items using square brackets: [ ]

**However, do not confuse a list with the mathematical notion of a vector.**
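
To make the distinction concrete, here is a minimal sketch: on plain lists, `+` concatenates and `*` repeats, so neither behaves like vector arithmetic. The names `u` and `v` are made up for illustration, and `print(...)` is called with a single argument so the snippet runs under both Python 2 and Python 3.

```python
# illustrative values, not taken from the lecture
u = [1, 2, 3]
v = [10, 20, 30]

concatenated = u + v   # joins the two lists end to end
repeated = 2 * u       # repeats the list, it does not scale it

# elementwise addition has to be spelled out explicitly
elementwise = [a + b for a, b in zip(u, v)]

print(concatenated)
print(repeated)
print(elementwise)
```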

Quick recap...

In [6]:

data = [14.0, 16.4, 33.2, 11.5, 9.01] # measurements in inches
print 'data[0] = {}, data[-1] = {}'.format(data[0], data[-1])
# extremely pythonic!
data_feet = [x / 12.0 for x in data]
data_gt_12 = [x for x in data if x > 12.0]

data[0] = 14.0, data[-1] = 9.01


Another example: quicksort!

A classic example of a **divide and conquer** algorithm.

In [71]:
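import random

def quicksort(A):
    # base case: an empty list is already sorted
    if len(A) < 1:
        return []
    P = random.choice(A)          # random pivot
    L = [a for a in A if a < P]   # strictly smaller than the pivot
    M = [a for a in A if a == P]  # the pivot and any duplicates of it
    R = [a for a in A if a > P]   # strictly larger than the pivot
    return quicksort(L) + M + quicksort(R)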

In [72]:
x = [4, 3, 6, 7, 1.2, 77, 32, 1]
quicksort(x)

[1, 1.2, 3, 4, 6, 7, 32, 77]

## Tuples

Tuples are identical to lists except for one thing -- mutability. Let's look at some code!

In [25]:
tup = (1, 2, 3, 'luke')

# tuples support indexing in the same way as lists!

print 'tup[0] =', tup[0]
print 'tup[-2:] =', tup[-2:]

tup[0] = 1
tup[-2:] = (3, 'luke')


Let's try to modify something!

```python
tup[1] = 9.8#??
```

In [26]:
tup[1] = 9.8

TypeError: 'tuple' object does not support item assignment

Voilà! Tuples cannot be changed through their own interface, and are therefore *immutable*.

Why would we care about this, and why could this be useful?

There is one surprising behavior to be aware of, though: a tuple only fixes *which* objects it holds, not the contents of those objects. Let's look at a tuple of lists...

In [27]:
# let's make a list here
a = [1, 3, 4]

# and make a tuple of lists...
tup = ([1, 4], a)
print tup

([1, 4], [1, 3, 4])


In [28]:
# let's modify the constituent list...
a[0] = 16
print a

[16, 3, 4]


What happens to the tuple?

In [29]:
tup

([1, 4], [16, 3, 4])
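
One way to see what happened: the tuple does not hold a copy of the list, it holds a reference to the same list object. A small sketch using the `is` operator (the names `inner` and `frozen` are illustrative):

```python
inner = [1, 3, 4]
frozen = ([1, 4], inner)

# the tuple's second slot and `inner` are the very same object
same_object = frozen[1] is inner

# mutating the list is visible through the tuple...
inner.append(99)

# ...but rebinding a slot of the tuple itself is still forbidden
try:
    frozen[1] = [0]
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

print(same_object)
print(frozen)
print(rebinding_failed)
```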

Tuples/lists allow for a very convenient thing called *unpacking*

In [30]:
tup = (3.3, 4.0, 7.1)

# -- unpack the values into variables (containers!)
x, y, z = tup

# this also works for functions!

def fancy_func(x):
    return x, x**2

X, Xsq = fancy_func(9)

print 'X = {}, X ** 2 = {}'.format(X, Xsq)

X = 9, X ** 2 = 81
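
Two other places where unpacking tends to show up are swapping variables and looping over pairs. The sketch below uses made-up names (`lo`, `hi`, `pairs`) purely for illustration.

```python
lo, hi = 5, 2

# the right-hand side is packed into a tuple first, then unpacked
if lo > hi:
    lo, hi = hi, lo

# unpacking also works directly in a for loop
pairs = [('x', 1.5), ('y', 2.5)]
labels = []
total = 0.0
for name, value in pairs:
    labels.append(name)
    total += value

print('lo = {}, hi = {}'.format(lo, hi))
print('labels = {}, total = {}'.format(labels, total))
```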


## Dictionaries

A dictionary is a collection of key-value pairs, also known as an *associative array*.

An example: the keys are all words in the English language, and their corresponding values are the meanings.

Lists + Dictionaries = $$$

There are two main methods of creating dictionaries...

In [45]:
# method 1
d = {}
d[1] = "one"
d[2] = "two"

print 'd =', d

d = {1: 'one', 2: 'two'}


In [75]:
# method 2
e = {1: 'one', 'hello': True}
print 'e =', e

e = {1: 'one', 'hello': True}


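Note that we can have *mixed types*. We have only one restriction on dictionaries -- the keys of a dictionary must be *hashable*, which for the built-in types means **immutable** (numbers, strings, and tuples of hashable items work; lists do not).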

In [76]:
l = [1, 2, 3]

# -- error
d = {l : 4}

TypeError: unhashable type: 'list'
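
If a sequence is needed as a key, converting it to a tuple is one option, since a tuple of hashable items is itself hashable. A small sketch (the dictionary `grid` and its contents are made up for illustration):

```python
coords = [1, 2, 3]

# a list cannot be a key, but the equivalent tuple can
grid = {tuple(coords): 4}
grid[(0, 0, 0)] = 7

try:
    bad = {coords: 4}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(grid[(1, 2, 3)])
print(list_key_ok)
```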

Keys to dictionaries must be unique! Old values get overwritten...

In [77]:
d = {'luke' : 'instructor', 'joe' : 'student'}
print 'd =', d 

d['luke'] = 'grad student'
print 'd =', d

d = {'luke': 'instructor', 'joe': 'student'}
d = {'luke': 'grad student', 'joe': 'student'}


There is a conceptually important note about dictionaries -- you can look up a value by its key, but *not* the other way around! Values need not be unique (or even hashable), so a value does not identify a single key.
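
A sketch of what a reverse lookup has to do instead: scan every pair and collect all keys whose value matches. The helper `keys_for_value` and the sample `roles` dictionary are hypothetical, not part of the lecture code.

```python
def keys_for_value(d, target):
    """Return a sorted list of all keys in d whose value equals target."""
    return sorted(k for k, v in d.items() if v == target)

roles = {'luke': 'instructor', 'joe': 'student', 'amy': 'student'}

students = keys_for_value(roles, 'student')   # two keys share this value
nobody = keys_for_value(roles, 'dean')        # no key has this value

print(students)
print(nobody)
```

Note that the scan visits every entry, whereas a lookup by key does not need to.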

Let's look at a dictionary with some more interesting data...

In [46]:
laptop = {
    'make' : 'apple', 
    'model' : 'MacBook Pro',
    'screen_size' : (15, 'in'),
    'age' : (3, 'yrs'),
    'memory' : (16, 'GB'),
    'storage' : (2, 'TB')
}

In [47]:
print 'keys:'
print laptop.keys()
print 'values:'
print laptop.values()

if 'model' in laptop:
    print 'We know the laptop model'

keys:
['age', 'screen_size', 'storage', 'memory', 'model', 'make']
values:
[(3, 'yrs'), (15, 'in'), (2, 'TB'), (16, 'GB'), 'MacBook Pro', 'apple']
We know the laptop model


In [48]:
# let's combine enumeration, looping, and unpacking!

for i, (k, v) in enumerate(laptop.iteritems()):
    print 'Field #{}: {} = {}'.format(i + 1, k, v)

Field #1: age = (3, 'yrs')
Field #2: screen_size = (15, 'in')
Field #3: storage = (2, 'TB')
Field #4: memory = (16, 'GB')
Field #5: model = MacBook Pro
Field #6: make = apple


Dictionaries also offer comprehension!

A dictionary comprehension builds a dictionary from any iterable of key-value pairs in a single expression.

In [7]:
people = ['luke', 'rob', 'sally', 'jen']
ages = [23, 44, 32, 25]
d = {name: age for (name, age) in zip(people, ages)}

print d

{'luke': 23, 'rob': 44, 'sally': 32, 'jen': 25}


## Sets

Our final data structure is the *Set*. Think of it exactly like the mathematical definition: an *unordered* collection of unique elements.

In [8]:
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana', 'apple'] # notice the duplicates!
fruit = set(basket)
print fruit

set(['orange', 'pear', 'apple', 'banana'])


Sets have very fast membership tests (constant time on average, versus a linear scan for a list)...

In [9]:
if 'orange' in fruit:
    print 'We have oranges'

We have oranges


You can think of this like a dictionary with only keys!

We also have set comprehension!

In [32]:
fruit = {item for item in basket if item != 'pear'}
print fruit

set(['orange', 'apple', 'banana'])
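
Because sets follow the mathematical definition, the usual set operations are available as operators. A short sketch with two made-up baskets (`mine` and `yours` are illustrative names):

```python
mine = set(['apple', 'orange', 'pear'])
yours = set(['orange', 'banana'])

union = mine | yours        # in either set
common = mine & yours       # in both sets
only_mine = mine - yours    # in mine but not in yours

# sets are unordered, so sort before printing for a stable display
print(sorted(union))
print(sorted(common))
print(sorted(only_mine))
```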


# Numpy

Ok, let's shift gears! Let's talk numpy. 

* Fundamental package for scientific computing with Python
* N-dimensional array object
* Linear algebra, Fourier transform, random number capabilities
* Building block for other packages (e.g. Scipy, scikit-learn)
* Open source, huge dev community!

## Installation

If you installed Python with `anaconda`, you should already have numpy installed. To test if you have numpy already, go to your terminal or command prompt and type:

```bash
python -c 'import numpy'
```

If this does nothing, congrats! You have numpy. 

If the output looks something like this:

```bash
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named numpy
```

Then you don't...

To install numpy, simply go to your terminal and type 

```bash
pip install numpy
```

You may need to type 

```bash
sudo pip install numpy
```

and type your computer password if you get an error that says "blah blah permission denied blah blah".

## Why numpy?

A very common question people ask is "why can't I just use lists for math?"

Any ideas?

Here are a few reasons why not:

* Real vectors can be big!
* How to handle $n$ dimensions? Nested lists carry no restriction: nothing forces every row to have the same length.
* How about very sparse data?
* *abstraction*! Something like $A = U\Sigma V^T$ is common enough that we want to encapsulate that.
* Speed
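
A rough sketch of the abstraction and speed points: the same computation written as a Python loop over a list and as a single numpy expression. It assumes numpy is installed and imported as `np`; the array size is arbitrary, and timings vary by machine, so none are quoted here.

```python
import timeit
import numpy as np

n = 100000
values = [0.5 * i for i in range(n)]
arr = np.array(values)

def loop_sum_of_squares():
    total = 0.0
    for x in values:
        total += x * x
    return total

def numpy_sum_of_squares():
    return float(np.dot(arr, arr))

# both compute the same number, up to floating point rounding
same = np.isclose(loop_sum_of_squares(), numpy_sum_of_squares())

# time a few repetitions of each version
t_loop = timeit.timeit(loop_sum_of_squares, number=5)
t_numpy = timeit.timeit(numpy_sum_of_squares, number=5)
print('same result: {}'.format(same))
print('loop: {:.4f} s, numpy: {:.4f} s'.format(t_loop, t_numpy))
```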

## A quick lesson on `import`ing in Python

There are 3 basic ways to import a package in Python.

* `from numpy import linspace`
* `import numpy as np`
* `import numpy`

Let's say you know that numpy has the function `linspace`. Here is how you access that function in each scenario:

* `linspace(...)`
* `np.linspace(...)`
* `numpy.linspace(...)`

Hurray! 

In [33]:
# -- let's jump in! The first thing to do is import numpy.
import numpy as np

In [34]:
A = np.array([[1, 2, 3], [4, 5, 6]]) 
print 'A =\n', A

Af = np.array([[1, 2, 3], [4, 5, 6]], float)
print '\nAf =\n', Af

A =
[[1 2 3]
 [4 5 6]]

Af =
[[ 1.  2.  3.]
 [ 4.  5.  6.]]


In [35]:
# -- numpy provides many ways to create arrays subject to mathematical constraints
print 'arange example =', np.arange(0, 1, 0.2)

print '\nlinspace example =', np.linspace(0, 2*np.pi, 4)

# -- a matrix of zeros
A = np.zeros((2,3))
print '\nzeros example =\n', A

print '\nA.shape =', A.shape ## a tuple!

arange example = [ 0.   0.2  0.4  0.6  0.8]

linspace example = [ 0.          2.0943951   4.1887902   6.28318531]

zeros example =
[[ 0.  0.  0.]
 [ 0.  0.  0.]]

A.shape = (2, 3)


In [36]:
# -- numpy provides routines for random array creation
print np.random.random((2,3))

[[ 0.47891651  0.73126373  0.96360067]
 [ 0.42954122  0.70769899  0.0043117 ]]


In [37]:
a = np.random.normal(loc=1.0, scale=2.0, size=(2,2))
print a

[[ 0.98219584 -2.09226372]
 [ 4.9280081  -1.95649594]]


In [38]:
# -- we can serialize!
np.savetxt("a_out.txt", a)
b = np.loadtxt("a_out.txt")

In [39]:
print 'a = \n', a
print 'b = \n', b

a = 
[[ 0.98219584 -2.09226372]
 [ 4.9280081  -1.95649594]]
b = 
[[ 0.98219584 -2.09226372]
 [ 4.9280081  -1.95649594]]


**NUMPY ARRAYS ARE MUTABLE**

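Assignment does *not* copy: `C = A` only gives the same array a second name. Use `A.copy()` when an independent array is needed.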

In [40]:
A = np.zeros((2, 2))
C = A
C[0, 0] = 1
# what happens to A?

In [41]:
print A 

[[ 1.  0.]
 [ 0.  0.]]
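
A minimal sketch contrasting plain assignment with an explicit `copy()` (the names `alias` and `independent` are illustrative):

```python
import numpy as np

A = np.zeros((2, 2))
alias = A                # second name for the same array
independent = A.copy()   # new array with its own data

alias[0, 0] = 1          # also changes A
independent[1, 1] = 5    # leaves A untouched

print(A)
print(alias is A)
print(independent is A)
```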


In [42]:
# -- arrays are extremely flexible...
a = np.arange(10)
print 'a =', a

a = [0 1 2 3 4 5 6 7 8 9]


In [43]:
a = a.reshape((2,5))
print '\nafter reshape, a =\n', a


after reshape, a =
[[0 1 2 3 4]
 [5 6 7 8 9]]


In [44]:
print '\na.ndim =', a.ndim
print '\na.shape =', a.shape
print '\na.size =', a.size
print '\na.T =\n', a.T
print '\na.dtype =', a.dtype


a.ndim = 2

a.shape = (2, 5)

a.size = 10

a.T =
[[0 5]
 [1 6]
 [2 7]
 [3 8]
 [4 9]]

a.dtype = int64


Numpy overloads the math operators so that they act elementwise on arrays

In [45]:
a = np.arange(4)
print 'a = ', a

a =  [0 1 2 3]


In [46]:
b = np.array([2, 3, 2, 4])
print 'b =', b

b = [2 3 2 4]


In [47]:
print 'a * b =', a * b 
print 'b - a =', b - a  
c = [2, 3, 4, 5]
print 'c =', c
print 'a * c =', a * c 
# if we want, we can also use +=, -=, *=, etc

a * b = [ 0  3  4 12]
b - a = [2 2 0 1]
c = [2, 3, 4, 5]
a * c = [ 0  3  8 15]


## Array Broadcasting

When operating on two arrays, numpy compares their shapes one dimension at a time, starting from the trailing (last) dimension and working backwards. Two dimensions are compatible when:

* They are of equal size
* One of them is 1

What does this look like in a picture?

![bc](./nb-assets/img/broadcasting.png)

Array broadcasting also works with scalars, which lets us add a constant to a matrix or multiply a matrix by a constant

In [61]:
A = np.ones((3,3))
print 3 * A - 1

[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]
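
A sketch of the compatibility rule in action, with shapes chosen for illustration: a `(3, 1)` column and a `(4,)` row broadcast to `(3, 4)`, while `(3,)` and `(4,)` are incompatible and raise a `ValueError`.

```python
import numpy as np

col = np.arange(3).reshape((3, 1))   # shape (3, 1)
row = np.arange(4)                   # shape (4,)

# trailing dimensions: 1 vs 4, compatible because one of them is 1;
# the missing leading dimension of `row` is treated as 1
table = col * 10 + row
print(table.shape)
print(table)

# trailing dimensions: 3 vs 4, neither equal nor 1, so this fails
try:
    np.arange(3) + np.arange(4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```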


Numpy gives us vector ops!

In [62]:
u = [1, 2, 3]
v = [1, 1, 1]

In [63]:
print 'np.inner(u, v) =', np.inner(u, v)

print 'np.outer(u, v) =\n', np.outer(u, v)

print 'np.dot(u, v) =', np.dot(u, v)

np.inner(u, v) = 6
np.outer(u, v) =
[[1 1 1]
 [2 2 2]
 [3 3 3]]
np.dot(u, v) = 6


More matrix operations

In [64]:
# first, some matrices
A = np.ones((3, 2))
print 'A.T =\n', A.T
B = np.ones((2, 3))
print 'B =\n', B

A.T =
[[ 1.  1.  1.]
 [ 1.  1.  1.]]
B =
[[ 1.  1.  1.]
 [ 1.  1.  1.]]


Which of these are valid?
```python
print 'np.dot(A, B) =\n', np.dot(A, B)

print 'np.dot(B, A) =\n', np.dot(B, A)

print 'np.dot(B.T, A.T) =\n', np.dot(B.T, A.T)

print 'np.dot(A, B.T) =\n', np.dot(A, B.T)
```

In [65]:
print 'np.dot(A, B) =\n', np.dot(A, B)

np.dot(A, B) =
[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]


In [66]:
print 'np.dot(B, A) =\n', np.dot(B, A)

np.dot(B, A) =
[[ 3.  3.]
 [ 3.  3.]]


In [67]:
print 'np.dot(B.T, A.T) =\n', np.dot(B.T, A.T)

np.dot(B.T, A.T) =
[[ 2.  2.  2.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]


In [68]:
print 'np.dot(A, B.T) =\n', np.dot(A, B.T)

np.dot(A, B.T) =


ValueError: shapes (3,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)

In [69]:
# -- let's see what operations we can do across the axes of a matrix
a = np.random.random((2,3))
print 'a =\n', a

print '\na.sum() =', a.sum()

print '\na.sum(axis=0) =', a.sum(axis=0)

print '\na.cumsum() =', a.cumsum()

print '\na.cumsum(axis=1) =', a.cumsum(axis=1)

print '\na.min() =', a.min()

print '\na.max(axis=0) =', a.max(axis=0)


a =
[[ 0.43511615  0.11172763  0.83416375]
 [ 0.90216561  0.46259987  0.3366538 ]]

a.sum() = 3.08242682853

a.sum(axis=0) = [ 1.33728177  0.57432751  1.17081755]

a.cumsum() = [ 0.43511615  0.54684379  1.38100754  2.28317316  2.74577303  3.08242683]

a.cumsum(axis=1) = [[ 0.43511615  0.54684379  1.38100754]
 [ 0.90216561  1.36476549  1.70141929]]

a.min() = 0.111727634922

a.max(axis=0) = [ 0.90216561  0.46259987  0.83416375]
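
The `axis` argument is easier to check on a small array with known entries. In this sketch, `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row); the array `m` is made up for illustration.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)     # one value per column
row_sums = m.sum(axis=1)     # one value per row
running = m.cumsum(axis=1)   # running total along each row

print(col_sums)
print(row_sums)
print(running)
```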


In [70]:
# -- arrays are like lists, they can be sliced!

a = np.random.random((4,5))
print 'a =\n', a
print '\na[2, :] =', a[2, :]
# third row, all columns
print '\na[1:3] =', a[1:3]
# 2nd, 3rd row, all columns
print '\na[:, 2:4] =', a[:, 2:4]
# all rows, columns 3 and 4

a =
[[ 0.25174034  0.70838223  0.38973372  0.87026975  0.09404772]
 [ 0.2076283   0.74603753  0.5586514   0.51783667  0.52946439]
 [ 0.25519206  0.06955869  0.50325101  0.96163732  0.81916982]
 [ 0.50184566  0.65783193  0.20502724  0.68126187  0.29254343]]

a[2, :] = [ 0.25519206  0.06955869  0.50325101  0.96163732  0.81916982]

a[1:3] = [[ 0.2076283   0.74603753  0.5586514   0.51783667  0.52946439]
 [ 0.25519206  0.06955869  0.50325101  0.96163732  0.81916982]]

a[:, 2:4] = [[ 0.38973372  0.87026975]
 [ 0.5586514   0.51783667]
 [ 0.50325101  0.96163732]
 [ 0.20502724  0.68126187]]
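
One difference from lists is worth a sketch: slicing a list makes a copy, whereas basic slicing of a numpy array returns a *view* onto the same data, so writing through the slice changes the original. The names below are illustrative.

```python
import numpy as np

lst = [0, 1, 2, 3]
lst_slice = lst[1:3]
lst_slice[0] = 99        # the original list is unchanged

arr = np.arange(4)
arr_slice = arr[1:3]
arr_slice[0] = 99        # the original array changes too

print(lst)
print(arr)
```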


## Iterating

Iterating over multidimensional arrays is done with respect to the first axis: `for row in A`

One can loop over all elements with `for element in A.flat`
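
A short sketch of both iteration styles on a small made-up array:

```python
import numpy as np

A = np.arange(6).reshape((2, 3))

# iterating over a 2-d array yields its rows, each a 1-d array
row_totals = []
for row in A:
    row_totals.append(int(row.sum()))

# A.flat visits every element in row-major order
elements = [int(x) for x in A.flat]

print(row_totals)
print(elements)
```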

## Reshaping

Reshape using `reshape`. Total size must remain the same. For example, 
```python
a = np.arange(10).reshape((2,5))
```
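
A sketch of the size constraint: shapes whose product matches the number of elements work, one dimension may be given as `-1` to let numpy infer it, and a mismatched shape raises a `ValueError`.

```python
import numpy as np

a = np.arange(10)

b = a.reshape((5, 2))     # 5 * 2 == 10, fine
c = a.reshape((2, -1))    # -1 asks numpy to infer the dimension (here 5)

# 3 * 4 != 10, so this raises a ValueError
try:
    a.reshape((3, 4))
    bad_shape_raised = False
except ValueError:
    bad_shape_raised = True

print(b.shape)
print(c.shape)
print(bad_shape_raised)
```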

## Linear Algebra

Start with `import numpy as np` and `import numpy.linalg as la`. The first four helpers below live in the top-level `numpy` namespace, not in `numpy.linalg`.

* `np.eye(3)`, Identity matrix
* `np.trace(A)`, Trace
* `np.column_stack((A,B))`, Stack column wise
* `np.vstack((A,B,A))`, Stack row wise
* `la.qr(A)`, Computes the QR decomposition
* `la.cholesky(A)`, Computes the Cholesky decomposition
* `la.inv(A)`, Inverse
* `la.solve(A,b)`, Solves $Ax = b$ for $A$ square and full rank
* `la.lstsq(A,b)`, Solves $\arg\min_x \|Ax-b\|_2$
* `la.eig(A)`, Eigenvalue decomposition
* `la.eigh(A)`, Eigenvalue decomposition for symmetric or Hermitian matrices
* `la.eigvals(A)`, Computes eigenvalues.
* `la.svd(A, full_matrices)`, Singular value decomposition
* `la.pinv(A)`, Computes pseudo-inverse of A
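
A small sketch that exercises a few of these routines on a made-up symmetric positive definite matrix and checks each result against its definition. It uses `la.eigh`, numpy's routine for the symmetric or Hermitian case, and `np.eye` from the top-level namespace.

```python
import numpy as np
import numpy.linalg as la

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = la.solve(A, b)                            # solves A x = b
solve_ok = np.allclose(np.dot(A, x), b)

w, V = la.eigh(A)                             # eigenvalues, eigenvectors in columns
eig_ok = np.allclose(np.dot(A, V), V * w)     # A V = V diag(w)

L = la.cholesky(A)                            # lower triangular, A = L L^T
chol_ok = np.allclose(np.dot(L, L.T), A)

U, s, Vt = la.svd(A)                          # A = U diag(s) Vt
svd_ok = np.allclose(np.dot(U * s, Vt), A)

inv_ok = np.allclose(np.dot(A, la.inv(A)), np.eye(2))

print('solve: {}, eigh: {}, cholesky: {}'.format(solve_ok, eig_ok, chol_ok))
print('svd: {}, inv: {}'.format(svd_ok, inv_ok))
```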

## Random Numbers

Start with `import numpy.random as rng`

* `rng.rand(d0,d1,...,dn)`, Random values in a given shape
* `rng.randn(d0, d1, ...,dn)`, Random standard normal
* `rng.randint(lo, hi, size)`, Random integers `[lo, hi)`
* `rng.choice(a, size, replace, p)`, Sample from `a`
* `rng.shuffle(a)`, Permutation (in-place)
* `rng.permutation(a)`, Permutation (new array)
* Parameterized distributions are also available: `beta`, `binomial`, `chisquare`, `exponential`, `dirichlet`, `gamma`, `laplace`, `lognormal`, `pareto`, `poisson`, `power`...
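
A sketch of several of these calls with a fixed seed, so repeated runs give the same draws. The seed value and the sample inputs are arbitrary choices for illustration; no specific random values are quoted because they depend on the numpy version.

```python
import numpy as np
import numpy.random as rng

rng.seed(0)   # fix the seed for reproducibility

u = rng.rand(2, 3)                  # uniform on [0, 1), shape (2, 3)
z = rng.randn(4)                    # standard normal, shape (4,)
k = rng.randint(0, 10, size=5)      # integers in [0, 10)
pick = rng.choice(['a', 'b', 'c'], size=2, replace=False)

deck = np.arange(5)
shuffled = rng.permutation(deck)    # new array, deck is unchanged
deck_before = deck.tolist()
rng.shuffle(deck)                   # shuffles deck in place

print(u.shape)
print(z.shape)
print(k)
print(pick)
print(shuffled)
print(deck)
```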

## Next time...

...advanced Numpy and Scipy!