# Numpy, Scipy, Matplotlib: essentials of scientific programming in Python

In this session we will take a look at three indispensable modules for Python programming in our business.


## Numpy
Numpy is *the* core numerical library for technical programming in Python. In particular it implements fast routines for multidimensional arrays (think matrices or tensors). Much of our work for the remainder of this class will depend on numpy.

### Numpy Arrays
Numpy arrays are sort of like lists, in that you can index into them to find elements, but they are much, much more efficient. An array can only hold elements of a single data type (all `int64`, for instance), and it can have an arbitrary shape. For instance:

In [44]:
import numpy as np

x = np.array([1,3,4])
print(type(x),x.dtype)

print(x[0],x[2])

print(x.shape)

<class 'numpy.ndarray'> int64
1 4
(3,)
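
Because an array holds a single data type, numpy converts mixed inputs to a common type when it builds the array. The following is a small sketch of that behavior; the variable names are made up for illustration, and the exact integer dtype can differ between platforms.

```python
import numpy as np

ints = np.array([1, 3, 4])                       # all integers -> an integer dtype
mixed = np.array([1, 3.5, 4])                    # one float -> every element becomes a float
forced = np.array([1, 3, 4], dtype=np.float64)   # dtype requested explicitly

print(ints.dtype)    # an integer dtype (int64 on most platforms)
print(mixed.dtype)   # float64
print(forced)        # [1. 3. 4.]
```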


In [51]:
## input the matrix
##
## 1 2 3
## 4 5 6
#
y = np.array([[1,2,3],[4,5,6]])
print(y.shape)
print(y[0,0],y[1,1])


(2, 3)
1 5


Initializing numpy arrays is easy and there are many functions available to give common kinds of matrices.

In [52]:

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"
print("--------")

b = np.ones((1,3))    # Create an array of all ones
print(b)              # Prints "[[1. 1. 1.]]"
print("--------")
c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"
print("--------")
d = np.eye(4)         # Create a 4x4 identity matrix
print(d)              # Prints "[[1. 0. 0. 0.]
                      #          [0. 1. 0. 0.]
                      #          [0. 0. 1. 0.]
                      #          [0. 0. 0. 1.]]"
print("--------")
e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[0.91940167 0.08143941]
                             #               [0.68744134 0.87236687]]"

[[0. 0.]
 [0. 0.]]
--------
[[1. 1. 1.]]
--------
[[7 7]
 [7 7]]
--------
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
--------
[[0.29169385 0.3680325 ]
 [0.82343321 0.84380624]]
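
A few other constructors come up often alongside the ones above. The sketch below uses `np.arange`, `np.linspace`, and `np.zeros_like`; the variable names `r`, `s`, and `t` are just for illustration.

```python
import numpy as np

r = np.arange(0, 10, 2)        # evenly spaced integers: start, stop (excluded), step
print(r)                       # [0 2 4 6 8]

s = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, both endpoints included
print(s)                       # [0.   0.25 0.5  0.75 1.  ]

t = np.zeros_like(r)           # zeros with the same shape and dtype as r
print(t)                       # [0 0 0 0 0]
```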


In [62]:
print(b)
b2 = np.ones(3)   # shape (3,): a rank 1 array, unlike b with shape (1, 3)
b[0,0]
b2[0]
b2.shape

[[1. 1. 1.]]


(3,)

## Indexing Numpy arrays
There are multiple ways of indexing into numpy arrays and I'd encourage you to delve more deeply into this in the [documentation](https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.indexing.html). Here we will cover just the basics.


In [66]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)
print("----------")
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]
print(b)

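# A slice is a view into the original data, so writing to b also changes a
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a)         # a[0, 1] is now 77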

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
----------
[[2 3]
 [6 7]]
2
[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


Notice that last bit of behavior: a slice of an array is a view into the same data, so modifying it will modify the original array.
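
If you want an independent array instead of a view, one option is to call `.copy()` on the slice. The sketch below contrasts the two and uses `np.shares_memory` to check which one still points at the original data; the names `view` and `copy` are just for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]          # a slice: shares memory with a
copy = a[:2, 1:3].copy()   # an explicit copy: independent data

copy[0, 0] = 77            # changes only the copy
print(a[0, 1])             # 2

view[0, 0] = 77            # changes a as well
print(a[0, 1])             # 77

print(np.shares_memory(a, view), np.shares_memory(a, copy))  # True False
```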

Numpy also allows us to mix integer indices with slices:

In [68]:
# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
print(a)
print(a[1])
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
#print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
#print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"

[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[5 6 7 8]
[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)


### Integer array indexing
When you index into numpy arrays using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array.

In [72]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)
# The returned array will have shape (3,)
#print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

#another way to do the same thing
#print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[[1 2]
 [3 4]
 [5 6]]
[2 2]
[2 2]
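
One consequence worth knowing: unlike a slice, the result of integer array indexing is a new array, not a view. A minimal sketch (the name `picked` is made up for illustration):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

picked = a[[0, 1, 2], [0, 1, 0]]   # elements (0,0), (1,1), (2,0)
print(picked)                      # [1 4 5]

picked[0] = 100                    # modify the result
print(a[0, 0])                     # 1, the original array is untouched
```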


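One thing that comes up again and again is selecting or operating on individual elements of numpy arrays. The `np.arange` function, which returns evenly spaced integers, is handy for building the row indices: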

In [75]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [78]:
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
print(a)
print("--------")
# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  
print("--------")

# change one element from each row of a using the indices in b
a[np.arange(4), b] = 0

print(a)


[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
--------
[ 1  6  7 11]
--------
[[ 0  2  3]
 [ 4  5  0]
 [ 0  8  9]
 [10  0 12]]


In [80]:
i = np.eye(3)
print(i)
a = np.array([i[0,0],i[1,1],i[2,2]])
print(a)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[1. 1. 1.]


**Exercise:** use the `eye()` function shown above to create a 3x3 identity matrix, then use integer indexing to pull out the diagonal elements, creating an array of length 3 that contains only the value 1.

### Boolean indexing
We can do something similar using Booleans. Boolean indexing lets us select elements based on a condition.


In [81]:
a = np.array([[1,2], [3, 4], [5, 6]])
print(a)

bool_idx = (a > 2)  # Boolean array with the same shape as a: True where the element is > 2

print(bool_idx)

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print(a[bool_idx])  

# We can do all of the above in a single concise statement:
print(a[a > 2])     


[[1 2]
 [3 4]
 [5 6]]
[[False False]
 [ True  True]
 [ True  True]]
[3 4 5 6]
[3 4 5 6]
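
Conditions can also be combined, and a mask can appear on the left-hand side of an assignment. The sketch below assumes the same array `a` as above; `middle` and `b` are illustrative names.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) and | (or); the parentheses are required
middle = a[(a > 2) & (a < 6)]
print(middle)            # [3 4 5]

# A boolean mask can also select the elements to overwrite
b = a.copy()
b[b % 2 == 0] = 0        # zero out the even entries
print(b)
# [[1 0]
#  [3 0]
#  [5 0]]
```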


## Array math
Most basic arithmetic operators work elementwise on numpy arrays and do so in an optimized fashion.


In [82]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both forms produce the same array
print(x + y)
print("--------")
print(np.add(x, y))
print("--------")
# Elementwise difference; both forms produce the same array
print(x - y)
print("--------")
print(np.subtract(x, y))
print("--------")
# Elementwise product; both forms produce the same array
print(x * y)
print("--------")
print(np.multiply(x, y))
print("--------")

[[ 6.  8.]
 [10. 12.]]
--------
[[ 6.  8.]
 [10. 12.]]
--------
[[-4. -4.]
 [-4. -4.]]
--------
[[-4. -4.]
 [-4. -4.]]
--------
[[ 5. 12.]
 [21. 32.]]
--------
[[ 5. 12.]
 [21. 32.]]
--------
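
The same pattern extends to division, to numpy's mathematical functions, and to arithmetic with plain scalars. A brief sketch using the same `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

print(x / y)            # elementwise division, same as np.divide(x, y)
print(np.sqrt(x))       # elementwise square root
print(x * 2 + 1)        # a scalar is applied to every element
# [[3. 5.]
#  [7. 9.]]
```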


### Speed check of Numpy vs pure Python
One major reason we use numpy is the speed advantage it offers over pure Python.

In [83]:
import numpy as np
from timeit import Timer

size_of_vec = 1000
X_list = list(range(size_of_vec))  # real lists, so the comparison is list vs. array
Y_list = list(range(size_of_vec))
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

def pure_python_version():
    Z = [X_list[i] + Y_list[i] for i in range(len(X_list)) ]

def numpy_version():
    Z = X + Y

timer_obj1 = Timer("pure_python_version()", 
                   "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", 
                   "from __main__ import numpy_version")

print(timer_obj1.timeit(10))
print(timer_obj2.timeit(10))  # Runs Faster!

print(timer_obj1.repeat(repeat=3, number=10))
print(timer_obj2.repeat(repeat=3, number=10)) # repeat to prove it!


0.009101293981075287
0.00015891995280981064
[0.003866925952024758, 0.005266298016067594, 0.0033348440192639828]
[0.0004287969786673784, 3.102299524471164e-05, 2.9800983611494303e-05]
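
As a variation on the comparison above, here is a sketch that passes the functions to `timeit` directly and uses larger inputs. The size `n` and the function names are arbitrary choices for illustration, and the timings will differ from machine to machine, so no expected numbers are shown.

```python
import numpy as np
from timeit import timeit

n = 100_000
x_list = list(range(n))
y_list = list(range(n))
x_arr = np.arange(n)
y_arr = np.arange(n)

def add_lists():
    # pure Python: add the two lists element by element
    return [a + b for a, b in zip(x_list, y_list)]

def add_arrays():
    # numpy: one vectorized addition
    return x_arr + y_arr

# timeit accepts a callable directly, so no setup string is needed
t_list = timeit(add_lists, number=10)
t_arr = timeit(add_arrays, number=10)
print(f"lists:  {t_list:.5f} s")
print(f"arrays: {t_arr:.5f} s")
```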


### Matrix / Vector operations
The operations above were all elementwise. Numpy is also well optimized for matrix operations (i.e. linear algebra). By way of example, let's consider the `dot` function, which computes dot products of vectors and matrices.

For two vectors $X$ and $Y$ of length $n$, the dot product is defined as
$ X \cdot Y = \sum_{i=1}^{n} X_i Y_i $

In [29]:
v = np.array([9,10])
w = np.array([11, 12])
#dot product
print(np.dot(v, w))

#matrix times vector
x = np.array([[1,2],[3,4]])
print(np.dot(x, v))

#matrix times matrix
y = np.array([[2,3],[4,5]])
print(np.dot(x, y))

219
[29 67]
[[10 13]
 [22 29]]
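
To connect the formula to the code, the sketch below writes the sum out explicitly and also uses Python's `@` operator, which gives the same results as `np.dot` for 1-D and 2-D arrays like these.

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[1, 2], [3, 4]])

# The definition written out: multiply elementwise, then sum
print(np.sum(v * w))     # 219

# The @ operator as an alternative spelling
print(v @ w)             # 219
print(x @ v)             # [29 67]
```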


## Built-in numpy functions
Some of the most useful parts of numpy are its built-in functions. We can leverage these for all kinds of things, saving us from writing code and speeding things up as we go along.

Classics that you will use again and again are functions like `sum` and `mean`. For instance:

In [31]:
x = np.array([11, 12])
print(x.sum())
print(x.mean())

23
11.5
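
For arrays with more than one dimension, these functions also take an `axis` argument that controls which dimension is collapsed. A short sketch (the array `m` is made up for illustration):

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.sum())          # 21, the sum of every element
print(m.sum(axis=0))    # [5 7 9], one sum per column
print(m.sum(axis=1))    # [ 6 15], one sum per row
print(m.mean(axis=0))   # [2.5 3.5 4.5]
```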


In [40]:
# make array of 1000 normally distributed random
# numbers with mean -0.5, and stdev 1.
# note we are using numpy's random number library
x = np.random.normal(-0.5,1,1000)
print(x.mean())
print(x.std())

-0.465089847498351
1.003542222740998


**Exercise:** Write a function to compute the sum of a numpy array. Now compare how long your function takes to compute the sum with how long numpy's built-in `sum` takes. Try it out on an array of random numbers as above.

### More with numpy

Numpy has a boatload of built-in functions available. I'd encourage you to peruse them [here](https://docs.scipy.org/doc/numpy/reference/routines.html). As a decent example of something a bit more complex, we will quickly use numpy to solve a system of linear equations:

In [42]:
#Solve 3 * x0 + x1 = 9 and x0 + 2 * x1 = 8

a = np.array([[3,1], [1,2]])
b = np.array([9,8])

print(np.linalg.solve(a, b))

[2. 3.]
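
A quick way to sanity-check a solution like this is to substitute it back into the system. The sketch below does that with a matrix product and `np.allclose`, which compares floating-point values within a small tolerance.

```python
import numpy as np

a = np.array([[3, 1], [1, 2]])
b = np.array([9, 8])

x = np.linalg.solve(a, b)

# Substituting the solution back into the system should reproduce b
print(a @ x)
print(np.allclose(a @ x, b))    # True
```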


## Scipy
While numpy provides all of our core numerical support, Scipy provides layers of more specialized routines for scientific computing. These include routines for numerical integration, optimization, interpolation, and statistics. 
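
As a small taste of two of those areas, the sketch below integrates a function numerically with `scipy.integrate.quad` and minimizes a one-dimensional function with `scipy.optimize.minimize_scalar`. The functions being integrated and minimized are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the integral of sin(x) from 0 to pi is exactly 2
value, abs_err = integrate.quad(np.sin, 0, np.pi)
print(value)             # approximately 2.0

# Optimization: find the minimum of (x - 2)^2, which lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
print(result.x)          # approximately 2.0
```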

