# PNI Biomath Bootcamp 2016 -- Programming module -- Day 4


### Items to be covered

1. Brief revisit of the sieve of Eratosthenes.
    * Setting debugging break points?
2. Matrix problems from yesterday -- adding, multiplying scalars, adding matrices, matrix multiplication
3. Slicing and indexing 
    * Colon notation slicing.
    * Indexing with boolean statements
4. Dictionaries; saving and loading files with numpy.savez
    * finding out what variables there are in a file
5. Displaying an image; Subplots
    * Arbitrary placing of axes?
6. beerncode problem from yesterday

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as img
import matplotlib.colors as colors

def printvar(var, str):
    """shorthand for printing out a string (typically the name of a variable) and a variable
    
    PARAMETERS:
        var      an expression
        str       a string
        
    RETURNS:  none        
    """
    print("\n", str, " = \n", var, "\n")

---

# (Involuntary but useful) example of debugging: sieve of Eratosthenes

* Use the keystrokes Ctrl-M then L to toggle line numbers in cells. This is very useful for seeing where in the code you are when the debugger says you are at line 30, for example!
* Use the line

    `import pdb; pdb.set_trace()`

    to set a breakpoint. There's a "break linenumber" command in the debugger but for some reason it didn't work for me; the above line did.
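A breakpoint is often most useful when it only fires under a particular condition, as in the commented-out lines at the end of the sieve loop below. Here is a minimal sketch of that pattern; the function `first_multiple_beyond` and the `DEBUG` flag are made-up names for illustration, not part of the class code.

```python
import pdb

DEBUG = False   # hypothetical switch: set to True to actually stop in the debugger


def first_multiple_beyond(n, limit):
    """Return the first multiple of n (starting from 2*n) that is larger than limit."""
    mult = 2 * n
    while mult <= limit:
        mult = mult + n
    if DEBUG:
        # Execution pauses here; type "p mult" to print mult, "c" to continue
        pdb.set_trace()
    return mult


print(first_multiple_beyond(7, 50))
```

With `DEBUG = False` the function just runs; flipping it to `True` drops you into the debugger right before the `return`.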


In [None]:
# %%debug

import numpy as np

Ncols = 10
Nrows = 5  # this means we'll go up to 50 in checking for primes 

M = np.array(range(1, Ncols*Nrows+1))
M = np.reshape(M, [Nrows, Ncols])


# The bug that we had in class was that I was using the variable "myrow" to index the row of the current
# number being considered as a prime; but then I was using the same variable name "myrow" to find the row
# of the multiple that was going to be crossed off!  That confused the two things completely!  
#      The bug has been left in below. See if you can fix it!

verbose = False
for myrow in range(0, Nrows):
    for mycolumn in range(0, Ncols):
        # The two loops above mean that we'll go through all the rows and all the columns,
        # going first through all the columns of the first row; then all cols of the second row; etc.

        mynumber = M[myrow, mycolumn]
        if verbose:
            print("row = ", myrow, " col = ", mycolumn, " value = ", mynumber)
        if mynumber!=1 and mynumber>0:   # Don't look at multiples of 1; and only look at
                                         # non-negatives, that means it hasn't been crossed off                
            if verbose:
                print("mynumber=", mynumber)
            mult = 2*mynumber  # we'll start by crossing off twice the number
            while mult <= np.size(M):  # only cross multiples off if they're within the matrix
                if verbose:
                    print("   mynumber=", mynumber, "  crossing off mult=", mult)
                myrow = int(np.floor((mult-1)/Ncols))
                mults_col = (mult-1)%Ncols
                if M[myrow,mults_col] > 0:   # only cross it off if it hadn't yet been crossed off
                    M[myrow,mults_col] = -M[myrow,mults_col]  # that means "crossed off"

                mult = mult + mynumber  # then go to the next multiple
                # if mult > np.size(M):
                    # import pdb; pdb.set_trace()
               
#
print(M)
print("primes = ", M[M>0])
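Spoiler: one possible fix, assuming the reused variable name is the only problem, is to give the row of the multiple its own name. In the sketch below `mults_row` is a name introduced here (by analogy with `mults_col`), and the final line selects values greater than 1, since 1 never gets crossed off but is not a prime.

```python
import numpy as np

Ncols = 10
Nrows = 5   # we'll go up to 50 in checking for primes

M = np.reshape(np.arange(1, Ncols*Nrows + 1), [Nrows, Ncols])

for myrow in range(Nrows):
    for mycolumn in range(Ncols):
        mynumber = M[myrow, mycolumn]
        if mynumber != 1 and mynumber > 0:   # skip 1 and anything already crossed off
            mult = 2*mynumber
            while mult <= np.size(M):
                # Row and column of the multiple get their OWN names,
                # so the loop variable myrow is never overwritten
                mults_row = (mult - 1) // Ncols
                mults_col = (mult - 1) % Ncols
                if M[mults_row, mults_col] > 0:
                    M[mults_row, mults_col] = -M[mults_row, mults_col]
                mult = mult + mynumber

primes = M[M > 1]
print("primes = ", primes)
```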

In [None]:
nrows = 3
ncols = 4
a = np.reshape(np.array(range(1, nrows*ncols+1)), [nrows, ncols])
b = np.random.randn(nrows, ncols)
printvar(a, "a")
printvar(b, "b")

In [None]:
printvar(a+1000, "a+1000")

In [None]:
printvar(a, "a")
printvar(a*2, "a*2")
printvar(np.sin(a), "np.sin(a)")

In [None]:
printvar(a, "a")
printvar(b, "b")
printvar(a+b, "a+b")

a.reshape([2,6]) + b   # raises a ValueError: a 2x6 matrix can't be added to a 3x4 one

In [None]:
printvar(a, "a")
printvar(b, "b")
printvar(a*b, "a*b")

### From the problem set: Linear algebra rules for vector and matrix multiplication and addition.
**Addition and subtraction:** Addition and subtraction of two matrices are just element-by-element
addition and subtraction. For this to make sense, the two matrices must
be the same size, i.e., have the same number of rows and the same number of
columns.

**Multiplication:** Under linear algebra rules, a row vector (arranged horizontally) times
a column vector (arranged vertically) means: “multiply element-by-element; then
sum it all”. Thus, it produces a single number (i.e., a *scalar*). For example,

$$\begin{bmatrix} 1 & 3 & 4 \end{bmatrix} *
\begin{bmatrix}
   2 \\
   3 \\
   2
\end{bmatrix}
= 1*2 + 3*3 + 4*2 = 19$$

When matrix $A$ has $A_r$ rows and $A_c$ columns, and matrix $B$ has $B_r$ rows and $B_c$
columns, you can multiply them *if and only if* the number of columns of $A$ equals the
number of rows of $B$ (that is, $A_c = B_r$). Let's call their product the matrix $C$:

$$C = A * B$$

The rule is that the element in the $r^{th}$ row and $c^{th}$ column of $C$ equals the
$r^{th}$ row of $A$ times the $c^{th}$ column of $B$ (a row vector times a column vector, as above). The matrix $C$ will have $A_r$ rows and $B_c$ columns.
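The row-times-column rule can be spelled out with two loops and then checked against NumPy's `@` operator. This is only a sketch to make the rule concrete; the small matrices `A` and `B` are made up for the example.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 rows, 3 columns
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # 3 rows, 2 columns (rows of B == columns of A)

# C has as many rows as A and as many columns as B
C = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for r in range(A.shape[0]):
    for c in range(B.shape[1]):
        # row r of A times column c of B: multiply element-by-element, then sum
        C[r, c] = np.sum(A[r, :] * B[:, c])

print(C)
print(np.array_equal(C, A @ B))   # the loops agree with the @ operator
```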

---

**2)** Let 

$$a = \begin{bmatrix}1 & 4 & 5 & 2 \end{bmatrix} \quad\quad 
  b = \begin{bmatrix}2 & 3 & 2 & 1 \end{bmatrix} \quad\quad
  c = \begin{bmatrix}1 & 1 \end{bmatrix}$$

Remember that `.T` means *transpose*: turns rows into columns, and vice-versa. First compute by hand, and then confirm using NumPy, the value of $a*b^T$, $b*a^T$, $a^T*b$, and $a*c^T$.

(Do all of these operations make sense? If not, which ones don't, and why?)

In [None]:
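a = np.array([[1, 4, 5, 2]])
b = np.array([[2, 3, 2, 1]])
c = np.array([[1, 1]])

# Use @ for linear-algebra multiplication; * is element-by-element (with broadcasting)
print(a @ b.T)   # 1x4 times 4x1 -> 1x1, the scalar 26
print(b @ a.T)   # also 26
print(a.T @ b)   # 4x1 times 1x4 -> a 4x4 matrix
# a @ c.T        # does not make sense: a has 4 columns but c.T has only 2 rows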

**3)** Add the following sets of matrices by hand, then confirm your answers
using NumPy.

**(a)**
$$
A=
  \begin{bmatrix}
    2 & 4 \\
    1 & 5
  \end{bmatrix} \quad
B=
  \begin{bmatrix}
    1 & 8 \\
    1 & 4
  \end{bmatrix}
$$

**(b)**
$$
A=
  \begin{bmatrix}
    5 & 1 & 3 & 1 \\
    6 & 0 & 8 & 5
  \end{bmatrix} \quad
B=
  \begin{bmatrix}
    4 & 2 & 9 & 0 \\
    3 & 8 & 9 & 2
  \end{bmatrix}
$$

**5)** Multiply the following sets of matrices by hand, then confirm your answers using
NumPy. Is matrix multiplication commutative (i.e. does $AB=BA$)?

**(a)**
$$
A=
  \begin{bmatrix}
    5 & 3 \\
    8 & 9
  \end{bmatrix} \quad
B=
  \begin{bmatrix}
    1 & 9 \\
    4 & 4
  \end{bmatrix}
$$

In [None]:
A3 = np.array([[5,3],[8,9]])
B3 = np.array([[1,9],[4,4]])

In [None]:
print(A3 @ B3)
print(B3 @ A3)
print("No, matrix multiplication is not commutative")

**(b)**
$$
A=
  \begin{bmatrix}
    2 & 4 \\
    0 & 1 \\
    3 & 2 \\
    2 & 7
  \end{bmatrix} \quad
B=
  \begin{bmatrix}
    9 & 5 & 1 \\
    8 & 4 & 8
  \end{bmatrix}
$$

In [None]:
#<answer>
A4 = np.array([[2,4],[0,1],[3,2],[2,7]])
B4 = np.array([[9,5,1],[8,4,8]])
print(A4 @ B4)
#</answer>

---
# Looking into matrix subparts: Indexing and slicing

## Colon notation

For each dimension of a matrix, the indices to be used can be specified using colon notation,

    [start_index default=0] : [stop_index (not included) default=last] [: [step default=1]]
    
all of which are optional (that's what the square brackets mean). Thus

    :-3
    
means start at 0, go up to but not including the third from the end, in steps of 1. (Remember, the minus means backwards from the end)

In [None]:
a = np.array(range(1, 12))
print(a)

print(a[:-3])

    :-3:2
    
means the same as above, but now in steps of 2

In [None]:
print(a[:-3:2])

and

    ::2
    
means go in steps of 2 all the way from the beginning to the end

In [None]:
printvar(a, "a")
printvar(a[::2], "a[::2]")

    4:-3:2
    
means start at index 4 (the fifth element), go up to but not including the third from the end, in steps of 2

In [None]:
print(a[4:-3:2])

And we can do lots of other examples.
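A few more slices, as a quick sketch using the same 11-element vector as above (it is rebuilt here so the block runs on its own):

```python
import numpy as np

a = np.arange(1, 12)     # the numbers 1 through 11

middle = a[2:5]          # indices 2, 3, 4 (stop index 5 is not included)
last_three = a[-3:]      # from the third from the end, all the way to the end
every_third = a[1::3]    # start at index 1, go to the end in steps of 3
backwards = a[::-1]      # a negative step walks from the end to the beginning

print(middle)
print(last_three)
print(every_third)
print(backwards)
```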

We can also do it with 2-dimensional matrices, using colon notation for each dimension.

In [None]:
nrows = 3
ncols = 5
a = np.reshape(np.array(range(1, nrows*ncols+1)), [nrows, ncols])
print(a)

For example, we could look at all rows, but only every other column of a matrix (colon statement before the comma is for rows, and after the comma for the columns)

In [None]:
print(a[:, ::2])

The colon notation serves to read out the contents of parts of a matrix, but can also be used to SET the contents of the same parts:

In [None]:
nrows = 3
ncols = 5
a = np.reshape(np.array(range(1, nrows*ncols+1)), [nrows, ncols])

printvar(a, "a")
a[1:,::2] = 100

printvar(a, "new a")

a[1:,::2] = a[1:,::2]*-1

printvar(a, "newest a")


## Boolean indexing

Another very useful way to look at subparts of a matrix is to use boolean expressions:

In [None]:
a = np.array(range(1, 12))
print(a)

printvar(a%2==0, "a%2==0")

That boolean vector we got out can be used directly to index into a: when used as an index, only those shelves for which the boolean was true will be accessed:

In [None]:
I = a%2==0

printvar(a[I], "a[I]")

It is often used directly, without storing it in an intermediate variable I:

In [None]:
printvar(a[a%2==0], "a[a%2==0]")
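Boolean vectors can also be combined before they are used as an index. With NumPy arrays the element-by-element "and" is written `&` and "or" is written `|`, and each condition needs its own parentheses. A small sketch on the same vector (rebuilt here so the block runs on its own):

```python
import numpy as np

a = np.arange(1, 12)     # the numbers 1 through 11

even_and_big = a[(a % 2 == 0) & (a > 4)]    # even AND greater than 4
small_or_big = a[(a < 3) | (a > 9)]         # less than 3 OR greater than 9

print(even_and_big)
print(small_or_big)
```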

In [None]:
# Multiply every even number by 3:
a = np.array(range(1, 12))
printvar(a, "a")

I = a%2==0
printvar(I, "I")
a[a%2==0] = 3*a[a%2==0]

printvar(a, "a")

We can use it in matrices, too:

In [None]:
nrows = 3
ncols = 5
a = np.reshape(np.array(range(1, nrows*ncols+1)), [nrows, ncols])
print(a)

In [None]:
printvar(a%2==0, "a%2==0")

a[a%2==0] = 3*a[a%2==0]

printvar(a, "a")

# Whitby asks --unanswered-- what if we want to do it only for selected rows (i.e., with colon notation)?
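One possible answer to the question above, offered as a sketch rather than the only way to do it: first take the rows you want with colon notation, then apply the boolean index to that sub-matrix. This relies on the fact that a colon slice of a NumPy array is a *view* onto the original data, so changing the slice changes the original. The name `sub` is introduced here for illustration.

```python
import numpy as np

nrows = 3
ncols = 5
a = np.reshape(np.arange(1, nrows*ncols + 1), [nrows, ncols])

sub = a[1:, :]            # rows 1 and 2 only; a view, not a copy
sub[sub % 2 == 0] *= 3    # triple the even numbers, but only within those rows

print(a)                  # the first row is untouched
```

An alternative with the same effect is to build the boolean matrix for all of `a` and then set the rows you want to skip to `False` before indexing with it.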

## Dictionaries

We've gotten used to storing things in lists (or arrays), and accessing them with the square bracket notation, by asking for the item in a particular position.

Dictionaries allow us to give the different "shelves" names rather than positions: the position no longer matters, and what you do is access a particular shelf through its name (which is a string).

The names of the shelves are called `keys` and what is stored on them are called `values`; a (key, value) pair taken together is called an `item`.

In [None]:
# Let's create a dictionary in variable b, which will have a shelf called "key1" with 
# the string "value1" stored in it, and a shelf called "key2" with the number 300 stored in it.

# Note curly braces and the colon linking each key to its value.
b = {'key1' : "value1", "key2" : 300}

# You access a shelf with the usual square bracket notation, but now you put the string for a key in it.
print(b['key2'])

# if you ask for a key that doesn't exist you get an error
# For example:
# >> b['gg']
#   KeyError: 'gg'
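If you are not sure whether a key exists, you can check first with `in`, or use the dictionary's `.get()` method, which returns a fallback value instead of raising an error. A short sketch, using the same dictionary `b` as above:

```python
b = {'key1': "value1", "key2": 300}

has_gg = 'gg' in b              # False: there is no shelf called 'gg'
has_key2 = 'key2' in b          # True

missing = b.get('gg')           # no error; gives None when the key is absent
fallback = b.get('gg', 0)       # or gives the fallback value you supply
present = b.get('key2', 0)      # existing keys behave just like b['key2']

print(has_gg, has_key2, missing, fallback, present)
```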

The values can be anything -- even other dictionaries

In [None]:
d = {'my' : "this", 'and' : 2, 'furthermore' : [1, 2, 3]}
print(d['my'])

d['my']= b
print(d['my'])



In [None]:
# You can find what keys a dictionary has with the .keys() function. This returns an iterable, 
# that you can put in a for loop:

print(d.keys(), "d.keys()")
for i in d.keys():
    printvar(i, "key")

# Or, you can cast it into a list and see it directly:    
print("\nList of keys is ", list(d.keys()), "\n")    
    
# The .items() function returns (key, value) pairs that you can iterate over:
for k,v in d.items():
    print("key is =", k, ";\t  the value = ", v)


### Using .keys() to find out what variables are stored in a .npz file


In [None]:
g = np.load('beerncode_data.npz')
print(list(g.keys()))
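The saving side works the same way in reverse: `np.savez` takes keyword arguments, and each keyword becomes a key in the file. The sketch below uses made-up toy arrays and a temporary file called `example_data.npz` (both are assumptions for illustration, not the class data), then loads the file back and lists what is in it.

```python
import os
import tempfile
import numpy as np

# Toy data, invented for this example
names = np.array(["ana", "bo"])
beers = np.array([3, 1])

# Save into a temporary directory so we don't clutter the working directory
path = os.path.join(tempfile.mkdtemp(), "example_data.npz")
np.savez(path, names=names, beers=beers)   # keyword names become the keys

g = np.load(path)
stored = list(g.keys())      # which variables are in the file?
loaded_beers = g['beers']    # pull one of them out, dictionary-style
g.close()                    # close the file once the data has been extracted

print(stored)
print(loaded_beers)
```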

---

# Displaying an image and using subplots

In [None]:
%matplotlib notebook
import matplotlib.image as img
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import numpy as np

In [None]:
A = img.imread('salgado_bombay.jpg')
np.shape(A)

In [None]:
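# rgb_to_hsv expects values in 0-1 (recent Matplotlib raises an error otherwise); rescale so v is 0-255 again
A = 255 * colors.rgb_to_hsv(A / 255.0)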


In [None]:
# Pick out just the v in the h,s,v triplets -- that's index 2 in the last dimension
A1 = A[:,:,2]
printvar(np.shape(A1), "shape of A1")

# squeeze says drop any dimensions that are equal to 1
A2 = np.squeeze(A1)
printvar(np.shape(A2), "shape of A2")

A = A2

In [None]:
fig1 = plt.figure(1, [11,4])  # first param is figure number, second is size in w x h inches
fig1.clf()  # clear figure 1
plt.subplot(1,3,1)  # this says make a 1-by-3 matrix of plotting axes, and choose the first one

plt.set_cmap('gray')  # use the gray colormap, not the default one
plt.imshow(A)

# plt.set_cmap('hot')

# plt.subplot(1,3,2)
# p = plt.hist(A.flatten(), bins=range(0, 255), log=False) 
# plt.xlabel('value of pixel in A')
# plt.ylabel('count of pixels')

plt.subplot(1,3,3)
B = A.copy()
# B[B<128] = 128
B[:,0::10] = 255  # make every tenth column white (max value)
B[::10,:] = 0     # make every tenth row black (min value)
# B[0,0] = 0
plt.imshow(B)

plt.subplot(1,3,2)
p = plt.hist(B.flatten(), bins=range(0, 257), log=True)  # bin edges 0..256, so the value 255 is counted too
plt.xlabel('value of pixel in B')
plt.ylabel('count of pixels')


The following example of using multiple axes is taken from the [Matplotlib documentation](http://matplotlib.org/examples/pylab_examples/axes_demo.html):

In [None]:
# Example of using multiple axes, from the Matplotlib documentation (axes_demo)
import matplotlib.pyplot as plt
import numpy as np

fig2 = plt.figure(2)
fig2.clf()

# create some data to use for the plot
dt = 0.001
t = np.arange(0.0, 10.0, dt)
r = np.exp(-t[:1000]/0.05)               # impulse response
x = np.random.randn(len(t))
s = np.convolve(x, r)[:len(x)]*dt  # colored noise

# the main axes is subplot(111) by default
plt.plot(t, s)
plt.axis([0, 1, 1.1*np.amin(s), 2*np.amax(s)])
plt.xlabel('time (s)')
plt.ylabel('current (nA)')
plt.title('Gaussian colored noise')

# this is an inset axes over the main axes
a = plt.axes([.65, .6, .2, .2], facecolor='y')   # 'axisbg' was renamed 'facecolor' in newer Matplotlib
n, bins, patches = plt.hist(s, 400, density=True)  # 'normed' was replaced by 'density'
plt.title('Probability')
plt.xticks([])
plt.yticks([])

# this is another inset axes over the main axes
a = plt.axes([0.2, 0.6, .2, .2], facecolor='y')
plt.plot(t[:len(r)], r)
plt.title('Impulse response')
plt.xlim(0, 0.2)
plt.xticks([])
plt.yticks([])

plt.show()


In-class exercise:  plot two vertical lines on the same axis: one from (1,1) to (1,2)  and another from (3,1) to (3,2)

If we call plt.plot(x, y) and x and y are 2-d matrices (should be same size), one line per column will be plotted

In [None]:
fig = plt.figure(3, [3, 3])
fig.clf()
x = np.array([[1, 3],[1, 3]])
y = np.array([[1, 1], [2, 2]])

# alternatively
x = np.array([[1,1], [3,3]]).T  # This way we write it out as "each line is a row" then 
                                 # we take the transpose to turn them into the columns that plot expects
y = np.array([[1,2], [1,2]]).T
h = plt.plot(x, y, 'b-')  # this returns a python list of line objects
h0 = h[0]           # this is the first line
h0.axes.set_xlim([0, 4])   # and we can ask what axes that line is in and set the xlimits for it
h0.axes.set_ylim([0, 3])

h1 = h[1]
h1.set_linewidth(3)
print(h1.get_xdata())

---

# Example problem from yesterday: beer and code

**7)** Load `beerncode_data.npz` into a variable called `beerncode`.

As you’ll see, this `npz` file contains four items that can be accessed like a dictionary:
* `names`: an array of names
* `beers`: an array of number of beers drunk last week
* `heights`: an array of heights
* `errors`: an array of number of code errors made so far ;)

Create a variable for each of these four arrays, by accessing the data in the `npz` file.

Remember to close the `npz` file when you finish extracting the data!

**a)** Make a scatterplot of code errors versus beers. Does it look like there is a relationship?

**b)** Go back and edit your answer from `a` to include the name of each person beside their data point.

For the next question, we will use a 2-dimensional “rotation” matrix:


$$
R=
  \begin{bmatrix}
    \cos(\theta)  & \sin(\theta) \\
    -\sin(\theta) & \cos(\theta)
  \end{bmatrix}
$$

where $\theta$ is in radians. When this matrix is multiplied, using linear algebra rules,
by a column vector $\begin{bmatrix} x \\ y \end{bmatrix}$ that represents a point in space, it produces a new point that is the original $\begin{bmatrix} x \\ y \end{bmatrix}$ rotated about the origin by the angle $\theta$ (clockwise for positive $\theta$, with the signs as written above). In other words, let’s say

<center> $\text{new_pos} = R*\begin{bmatrix} x \\ y \end{bmatrix}$ </center>

Then $\text{new_pos}$ will be a column vector with two entries, the first of which is the x-coordinate of the new point, and the second is the y-coordinate of the new point. That
new point will be at the same distance from the origin as the original $\begin{bmatrix} x \\ y \end{bmatrix}$ point,
but its angle with respect to the horizontal will have changed by $\theta$.
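As a quick sanity check of the description above, here is a sketch that rotates the single point $(1, 0)$ by a quarter turn ($\theta = \pi/2$) using the matrix $R$ exactly as written. The variable names `point` and `new_pos` are chosen for this example. With this sign convention the point ends up (to within rounding error) at $(0, -1)$, and its distance from the origin is unchanged.

```python
import numpy as np

theta = np.pi / 2     # a quarter turn, in radians

R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

point = np.array([[1.0],
                  [0.0]])          # column vector for the point (1, 0)

new_pos = R @ point                # linear-algebra multiplication

print(np.round(new_pos, 3))
print(np.linalg.norm(point), np.linalg.norm(new_pos))   # distances from the origin
```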

---

**c)**  Build an array to represent `R`, and use it to rotate all the points in your scatterplot by $\theta$ (`theta`) degrees. Then plot *both* the original data and the rotated data in different colors.

(Try running your code with different `theta` values, to confirm that the rotation works as intended.)

**d)** (Bonus) Loop over that code to make a movie of a rotating data plot. Starting with the original data, rotate in increments of `theta` and update the display.


In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
def printvar(var, str):
    """shorthand for printing out a string (typically the name of a variable) and a variable
    
    PARAMETERS:
        var      an expression
        str       a string
        
    RETURNS:  none        
    """
    print("\n", str, " = \n", var, "\n")
    
    
g = np.load('beerncode_data.npz')
print(list(g.keys()))

beers = g['beers']
heights = g['heights']
names   = g['names']
errors  = g['errors']
g.close()

# d = {'key1' : 30, 'key2': 40}
# print(d.keys())

packemin = np.vstack((beers, errors))
fig = plt.figure(10, [8, 8])
plt.plot(packemin[0,:], packemin[1,:], 'o', markersize=12)
plt.xlabel('beers')
plt.ylabel('errors')

# npeople = len(errors)
# for i in range(0, npeople):
#     plt.text(beers[i]+0.2, errors[i], names[i],
#             horizontalalignment='left')


theta = 5   # in degrees
theta = theta*np.pi/180   # now converted to radians
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])

printvar(np.shape(beers), "np.shape(beers)")
current_positions = np.vstack((beers, errors))

printvar(np.shape(current_positions), "np.shape(current_positions)")
new_positions = R @ current_positions

# First row of new_positions are the new horizontal coords;
# Second row are the vertical coords
h = plt.plot(new_positions[0,:], new_positions[1,:], 'ro', markersize=12)
h0 = h[0]
h0.axes.set_xlim([-20, 20])
h0.axes.set_ylim([-20, 20])

for k in range(0, 1000):
    new_positions = R @ new_positions # for each point, compute a new position for it
    h0.set_xdata(new_positions[0,:])  # set the new horizontal positions
    h0.set_ydata(new_positions[1,:])  # set the new vertical positions
    fig.canvas.draw()   # this is the draw now command