<img src="support_files/cropped-SummerWorkshop_Header.png">  

<h1 align="center">Python Bootcamp</h1> 
<h3 align="center">August 20-21, 2016</h3> 

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<center><h1>Numpy</h1></center>

<p>Numpy is a Python library that provides efficient implementations of a large number of functions commonly used in scientific programming.  

<p>Consider trying to add two lists together in Python:
</div>

In [None]:
A = [1.0, 3, "hello, "]
B = [3.4, 3.2, "world"]

print([ a+B[i] for i,a in enumerate(A)])     # why can't we type A+B?

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p> In order to add these two lists, Python has to first figure out which type of object is in each place in each list (note that there are *three* different types above).  It then has to figure out which method is appropriate for adding the two lists together.  In one case, above, it also must cast the type to another type so that the two can be added.
All of that takes computation time and in some cases space.  In addition, the objects themselves, being Python objects, are relatively complex.  Lists are designed to allow *flexibility*, not *efficiency*.

<p>Numpy is built around the numpy `array` object.  This object solves each of these problems. 

<ol>
<li>  Numpy arrays are collections of elements of a single, fixed type.
<li>  The types used in (most) numpy arrays are the simple types that the hardware itself uses, so they do not carry the usual overhead of Python objects.
</ol>

<p> In addition, the numpy array helps us to manipulate large data sets in important and convenient ways, by allowing us to index portions of the data in a manner appropriate to that data.

</div>

In [None]:
from __future__ import print_function  # must come first; makes print a function on Python 2
import numpy as np
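
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>To make the contrast with lists concrete, here is a minimal sketch (the variable names are illustrative and not part of the original notebook).  It shows that `+` concatenates lists but adds arrays elementwise, and that an array built from mixed numbers ends up with one common element type.
</div>

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]

# "+" on lists concatenates them
concatenated = list_a + list_b
print(concatenated)

# "+" on arrays adds element by element
arr_sum = np.array(list_a) + np.array(list_b)
print(arr_sum)

# mixing an int with floats still gives a single element type
mixed = np.array([1, 2.5, 3.0])
print(mixed.dtype)
```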

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>A simple way to create a numpy array is to convert a list with ```np.array```.

</div>

In [None]:
# python list of integers
nums = [ 1, 2, 3, 4, 5 ]

# numpy array of integers
arr1 = np.array( nums )

print("arr1:",arr1)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<p> From the output, you can't tell that anything is different.  The important features of numpy arrays are under the hood.

<left><h2>numpy array operators</h2></left>

<p> Scalar addition:  This adds a constant to each element of the array.
</div>

In [None]:
arr2 = arr1 + 2

print("arr2:", arr2)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">

<p>In contrast to lists, most common operations for numpy arrays act in an elementwise fashion.
<p>Arrays of the same shape can be added together in an elementwise manner.

<p>Elementwise addition:
</div>

In [None]:
arr3 = arr1 + arr2

print("arr3:", arr3)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Elementwise multiplication:

</div>

In [None]:
arr4 = arr1 * arr2

print("arr4:", arr4)

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.1:**  Create two arrays with floating point values instead of integers.
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.2:**  Create an array that represents the elementwise difference between two arrays.
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.3:**  Divide one array by a second array in an elementwise fashion and save the result as a new array.  What happens if the second array contains a zero?
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.4:**  Make the two arrays different sizes.  How does that affect the operations above?
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<left><h1>numpy array creation functions</h1></left>

<p>Use ```arange``` to create a sequence of values between two limits with a defined step size.

<p>Note that the end value is *excluded*.

</div>

In [None]:
arr5 = np.arange(start=0, stop=10, step=1)

print("arr5:", arr5)

In [None]:
#use a larger step size
arr6 = np.arange(0, 10, 2)

print("arr6:", arr6)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Use ```linspace``` to create a sequence of values between two limits with a defined number of points.

<p>Note that the end value is *included*.

</div>

In [None]:
arr7 = np.linspace(start=0.0, stop=1.0, num=5)

print("arr7:", arr7)
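
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The difference between the two functions can be seen side by side.  This is a small sketch with illustrative variable names: both calls cover the range from 0 to 1 in steps of 0.25, but only ```linspace``` returns the end value.
</div>

```python
import numpy as np

# arange takes a step size and stops *before* the end value
stepped = np.arange(0.0, 1.0, 0.25)
print(stepped)

# linspace takes a number of points and *includes* the end value
spaced = np.linspace(0.0, 1.0, 5)
print(spaced)
```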

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The arrays we have looked at so far have been 1-dimensional.  Quite often we work with data that is most meaningfully arranged in multiple dimensions.  Examples include images, which are naturally described in terms of width and height, and sets of recordings from multiple neurons.

<p>Multi-dimensional arrays can be created by using "lists of lists".
</div>

In [None]:
A = [[1,2],[3,4]]

array_A = np.array(A)
print(array_A)

In [None]:
B = [[[1,2],[3,4],[5,6]],[[3,4],[5,6],[7,8]]]

array_B = np.array(B)
print(array_B)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>numpy arrays have an attribute called "shape" that indicates how "long" the array is in each dimension.
</div>

In [None]:
print(array_A.shape)
print(array_B.shape)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>For real use cases, lists of lists are quite clumsy.

<p>One can use the functions ```zeros``` and ```ones``` to create an array of a defined shape that is initially filled with zeros or ones, respectively.

</div>

In [None]:
# zeros, ones
zero_array = np.zeros((4,4))

print("zero_array:\n", zero_array) #\n is a newline character

In [None]:
one_array = np.ones((3,4,2))

print("one_array:\n", one_array)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Use ```zeros_like``` and ```ones_like``` to create an array of zeros or ones that matches the shape of an existing array.

</div>

In [None]:
another_one_array = np.ones_like(zero_array)

print("another_one_array:\n", another_one_array)
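
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>As a hedged aside, ```zeros_like``` and ```ones_like``` copy the data type of the existing array by default as well as its shape.  The sketch below uses a hypothetical integer array, `int_template`, to show the difference from calling ```zeros``` with only a shape.
</div>

```python
import numpy as np

int_template = np.array([[1, 2, 3], [4, 5, 6]])

# zeros_like copies the shape and, by default, the dtype of the template
int_zeros = np.zeros_like(int_template)
print(int_zeros.shape, int_zeros.dtype)

# np.zeros with only a shape gives floating point values
float_zeros = np.zeros(int_template.shape)
print(float_zeros.dtype)
```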

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The `size` of an array is the total number of elements.  It will be the product of the elements of the `shape`.
</div>

In [None]:
A = np.array([[1,2,3],[4,5,6]])
print(A.size)
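
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The relationship between `shape` and `size` can be checked directly.  This short sketch uses an illustrative 3-dimensional array called `cube` and also prints `ndim`, the number of dimensions.
</div>

```python
import numpy as np

cube = np.ones((3, 4, 2))

print(cube.shape)   # length along each dimension
print(cube.ndim)    # number of dimensions
print(cube.size)    # total number of elements

# size is the product of the entries of shape
assert cube.size == 3 * 4 * 2
```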

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Use ```eye``` to create an array whose elements are those of the identity matrix.
</div>

In [None]:
arr10 = np.eye(3)
print("arr10:\n", arr10)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The ```random``` module contains functions for creating arrays with randomly distributed values. For instance, ```np.random.rand()``` gives uniformly distributed values between 0 (included) and 1 (excluded).
</div>

In [None]:
# random values in [0, 1)
arr11 = np.random.rand(4,3)
print("arr11:\n", arr11)
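
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Random values differ from run to run unless the generator is seeded.  The sketch below assumes an arbitrary seed of 0 (any integer would do) and shows that reseeding reproduces the same array.
</div>

```python
import numpy as np

# seeding the generator makes the "random" values repeatable
np.random.seed(0)
first_draw = np.random.rand(4, 3)

np.random.seed(0)
second_draw = np.random.rand(4, 3)

print(np.array_equal(first_draw, second_draw))

# rand draws from the half-open interval [0, 1)
print(first_draw.min() >= 0.0, first_draw.max() < 1.0)
```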

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.5:**  Create a 100x60 array of normally distributed values.  (Hint:  look at the functions available in np.random.)
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.6:**  Perform elementwise operations on the array generated above such that the mean is 5 and the standard deviation is 2.
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.7:**  `array`s have the methods ```mean``` and ```std```.  Use these methods to check your answer from the previous exercise.  These methods also have a keyword argument `axis` that allows one to compute the mean only along specific axes.  Try computing the `mean` and `std` over the first axis.  (The numpy module has functions called `mean` and `std`.  We recommend using the array methods.)
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<left><h1>numpy data types</h1></left>

<p>As mentioned above, numpy gains efficiency by using consistent, simple data types for every element of the array.  Your machine probably defaults to `np.float64`, which is an 8-byte double precision floating point number.  Other common data types are `np.int8` for 8-bit integers and `np.uint8` for 8-bit unsigned integers. 

<p>See here for a full list of available data types: http://docs.scipy.org/doc/numpy/user/basics.types.html

<p>The data type of an `array` is given by the `dtype` attribute.  Numpy will make assumptions about your desired dtype based on the values passed in to define the array.  Alternatively, you can use the `dtype` keyword argument to specify the dtype upon creation.

</div>

In [None]:
# numpy will guess data types for you
arr12 = np.array([0.0, 1.0, 2.0])
print(arr12.dtype, arr12)

# or you can specify them 
arr13 = np.array([0.0, 1.0, 2.0], dtype=np.uint8)
print(arr13.dtype, arr13)

arr14 = np.array([0,1,2])
print(arr14.dtype, arr14)
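
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The choice of data type affects both memory use and the range of values an array can hold.  The following sketch (array names are illustrative) compares the storage used by `np.float64` and `np.uint8`, and shows that arithmetic on `np.uint8` arrays wraps around past 255.
</div>

```python
import numpy as np

doubles = np.zeros(1000, dtype=np.float64)
small = np.zeros(1000, dtype=np.uint8)

# itemsize is the number of bytes per element, nbytes the total
print(doubles.itemsize, doubles.nbytes)
print(small.itemsize, small.nbytes)

# a uint8 can only hold 0..255, so array arithmetic wraps around
near_limit = np.array([250], dtype=np.uint8)
step = np.array([10], dtype=np.uint8)
wrapped = near_limit + step
print(wrapped)
```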

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<left><h1>Changing shape</h1></left>

<p>Reshape:  return a *new* array with a different shape.  The new shape must contain the same total number of elements (`size`) as the original.  A related and convenient operation is the transpose, which can be accessed with `.T`.

<p>Resize:  change the shape of an array *in place*.  This does *not* return a new array.
</div>

In [None]:
# reshape
arr17 = np.arange(0,10,1)
print(arr17)

arr18 = arr17.reshape((2,5))            
print("\nreshaped:\n", arr18)

arr19 = arr18.T 
print("\ntransposed:\n", arr19)

print(arr17.shape)
arr17.resize((2,5))
print(arr17.shape)
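
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Two details of ```reshape``` are worth a quick sketch (names are illustrative).  One dimension may be given as -1, in which case numpy infers it from the total size.  Also, although ```reshape``` returns a new array object, for a simple contiguous array like the one below that object shares its data with the original.
</div>

```python
import numpy as np

flat = np.arange(6)

# -1 asks numpy to work out that dimension from the total size
grid = flat.reshape((2, -1))
print(grid.shape)

# for a contiguous array like this one, the reshaped array is a view:
# writing through it also changes the original
grid[0, 0] = 99
print(flat)
```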

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<p>  In numpy, matrix multiplication is accomplished with the "dot" function (or array method), and not with the "*" operator, which multiplies elementwise.
</div>

In [None]:
# dot product
I = np.eye(3,3)
Z = np.ones((3,3))
print(np.dot(I,Z),"\n")
print(I*Z)
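
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The same comparison with two small, hypothetical matrices `M` and `N` makes the difference between the two kinds of product easier to see by hand.
</div>

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[0, 1], [1, 0]])

# "*" multiplies matching elements
elementwise = M * N
print(elementwise)

# dot performs matrix multiplication (rows of M times columns of N)
matrix_product = np.dot(M, N)
print(matrix_product)

# the array method gives the same result as the function
same_product = M.dot(N)
```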

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<p>  Some array operations do not require that the shapes of the arrays be the same.  Shapes are compared starting from the rightmost dimension; if those dimensions agree (or one of them is 1), then any remaining dimensions will be "broadcast" across.  The documentation can help you if you are confused: http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
</div>

In [None]:
# This works:
A = np.ones((3,4,4))
B =   np.ones((4,4))
print((A+B).shape)

# This doesn't (the trailing dimensions, 4 and 3, differ, so a ValueError is raised):
A = np.ones((3,4,4))
B =   np.ones((4,3))
print((A+B).shape)
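
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>A smaller sketch of broadcasting, using illustrative names: a row and a column are each added to a 3x4 table, and an incompatible shape is caught with `try`/`except` so that the error message can be inspected without stopping the cell.
</div>

```python
import numpy as np

table = np.zeros((3, 4))
row = np.arange(4)                      # shape (4,)
column = np.arange(3).reshape((3, 1))   # shape (3, 1)

# the row is repeated down the rows of the table
print(table + row)

# the length-1 dimension of the column is stretched across the columns
print(table + column)

# shapes (3, 4) and (3,) disagree in the last dimension
try:
    table + np.arange(3)
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("broadcast error:", err)
```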


<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.8:**  Create a small array of random numbers.  Use a method of the array to compute the sum of those numbers.  
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.9:**  Create a small array of random numbers.  Replace one element in that array with ```np.nan``` (which stands for 'not a number').  Compute the sum of this array.
</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.10:**  Check out the list of mathematical operations here: http://docs.scipy.org/doc/numpy/reference/routines.math.html. Find a function that will allow you to sum over various dimensions while ignoring the ```np.nan```.
</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<left><h1>array indexing</h1></left>

<p>Indexing can be used to identify and extract specific values from an array.  Remember that **Python indexes from 0!**

</div>

In [None]:
arr20 = np.arange(0,10,1).reshape((2,5))
print("arr20:\n", arr20)

# remember, indexing is from zero!
print("\nvalue at location[0,2]\n", arr20[0,2])

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">
<p>  As with lists, you can also use slicing to retrieve portions of an array.  Unlike slicing a list, slicing an array gives you a new view into the same data; it does *not* produce a copy.

<p>  You can also access elements of an array with "index arrays".  These can be arbitrary arrays that contain indices for each dimension of the array.  Each index array must be the same shape, and the returned array will have that shape.
</div>

In [None]:
# grab an entire row or column
print("arr20:\n", arr20)
print("\nrow 1:\n", arr20[1,:])
print("\ncol 2:\n", arr20[:,2])
print("\n")
# index a 1-dimensional array with an index array (the result takes the index array's shape)
arr21 = np.arange(10)
index_array = np.array([[3,4,5],[7,9,8]])
print(arr21[index_array])

# index a 2-dimensional array with one index sequence per dimension
print("\n(row0 col2), (row0, col4) and (row1, col3):\n", arr20[(0, 0, 1),(2, 4, 3)])
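
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>The view behaviour of slices is easy to miss, so here is a minimal sketch with illustrative names.  Writing to a slice changes the original array, whereas the result of indexing with an index array is a copy.
</div>

```python
import numpy as np

data = np.arange(10)

# a slice is a view: writing to it changes the original array
window = data[2:5]
window[0] = 100
print(data[2])

# indexing with an index array returns a copy instead
picked = data[np.array([2, 3])]
picked[0] = -1
print(data[2])

# call copy() on a slice when an independent array is needed
independent = data[2:5].copy()
independent[0] = 0
print(data[2])
```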

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">

<h2>Boolean indexing</h2>

<p>Boolean comparisons with scalars will test the array elementwise against the scalar.  You can use this to generate a boolean array indicating which elements of an array meet a condition.

<p>Furthermore, a boolean array can be used to index into another array of the same shape.  The return value will be a 1-dimensional array containing the elements at the positions marked `True`.

</div>

In [None]:
print("arr20:\n", arr20)

print("\nboolean array for values greater than 3:\n", arr20>3)

print("\napply the boolean array to the original array to return values > 3:\n", arr20[arr20>3])

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">

<p>The numpy ```where``` function will return the indices where a given condition is true.
<p>Numpy has a set of logical functions that allow for elementwise comparisons between different arrays.  The function for `and` is  ```logical_and```.  This can be used to combine comparisons.

</div>

In [None]:
print("arr20:\n", arr20)

print("\nindex array for values greater than 3 AND less than or equal to 5:\n", 
      np.where(np.logical_and(arr20>3,arr20<=5)))

print("\napply the index array to the original array to return values > 3 and <=5:\n",
      arr20[np.where(np.logical_and(arr20>3,arr20<=5))])
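
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>For boolean arrays, the `&` operator gives the same result as ```logical_and```.  The sketch below (with illustrative names) also unpacks the result of ```where```, which is a tuple holding one index array per dimension.
</div>

```python
import numpy as np

values = np.arange(10).reshape((2, 5))

# "&" combines boolean arrays elementwise, like np.logical_and;
# the parentheses are needed because "&" binds tighter than ">" and "<="
mask = (values > 3) & (values <= 5)

# where returns one index array per dimension
rows, cols = np.where(mask)
print(rows, cols)

# indexing with the mask or with the indices selects the same elements
print(values[mask], values[rows, cols])
```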

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.11:**

<p>Return all of the values of ```arr20``` that are less than 2 or greater than 7.

</div>

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">

<p>Array masking:  a boolean mask can also be used to modify, in place, the elements that it selects.

</div>

In [None]:
arr21 = np.arange(0,10,1).reshape((2,5))
print("\noriginal array:\n", arr21)

# generate a logical mask of your array
mask = arr21 > 4
print("\nmask:\n", mask)

# now use it to modify the elements that match
arr21[mask] += 100
print("\nmasked array:\n", arr21)

<div style="border-left: 3px solid #000; padding: 1px; padding-left: 8px; background: #F0FAFF; ">

<h2>MATLAB gotchas</h2>

<p>Transitioning from MATLAB to Python/numpy can be fun, but tricky. Here is a list of things to keep in mind:

<ul>
<LI> MATLAB indices start at "1"; Python/numpy indices start at "0".
<LI> As in MATLAB, `for` loops are very slow in Python; you should vectorize when possible.
<LI> In numpy, matrix multiplication is done with "dot", not "*".
<LI> In Python 2, integer division is not automatically typecast to float for you.  (In Python 3, "/" always performs true division.)
</ul>

<p> Check out http://mathesaurus.sourceforge.net/matlab-numpy.html for code examples that help ease the transition.

</div>
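
<div style="border-left: 3px solid #000; padding: 1px; padding-left: 10px; background: #F0FAFF; ">
<p>Two of these gotchas can be sketched in a few lines.  The names are illustrative, and the behaviour of "/" shown here assumes Python 3; "//" performs floor division in both Python 2 and Python 3.
</div>

```python
import numpy as np

samples = np.array([10, 20, 30, 40])

# the first element is at index 0, the last at index -1 (or len - 1)
print(samples[0], samples[-1])

# "//" is floor division and keeps the integer dtype
halves_floor = np.array([1, 2, 3]) // 2
print(halves_floor)

# under Python 3 (an assumption here), "/" is true division and gives floats
halves_true = np.array([1, 2, 3]) / 2
print(halves_true)
```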

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<left><h1>Exercises</h1></left>

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.12:**

<p>Create a 1-d numpy array of the first 10,000,000 integers, and call it x. Compute x+x^2+x^3 (in Python, powers are written with ```**```) in two ways: 1) by using "vectorized computations" on a single line, and 2) by using a for loop.  Which takes longer?

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.13:**

<p>Create a 1-d numpy array (call it ```ts``` for timestamps), with increasing integers from 3 to 300 (including 3 but not 300) with an increment of 3, and print the array.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.14:**

<p>Change the dtype of ```ts``` to 32-bit floating point and print ```ts```.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.15:**

<p>Add 0.5 to every element in ```ts``` that is larger than 200 and print ```ts```.

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.16:**

<p>Calculate the mean difference between adjacent elements of ```ts``` and print it.
<p>Hint: http://docs.scipy.org/doc/numpy/reference/generated/numpy.diff.html

</div>

<div style="background: #DFF0D8; border-radius: 3px; padding: 10px;">
<p>**Exercise 4.17:**

<p>Find the index of the element of ```ts``` closest to the number 212.3, and print both that index and the element.

</div>