# Numpy, Scipy, Matplotlib: essentials of scientific programming in Python

In this session we will take a look at three indispensable modules for Python programming in our business.


## Numpy
Numpy is *the* core numerical library for technical programming in Python. In particular it implements fast routines for multidimensional arrays (think matrices or tensors). Much of our work for the remainder of this class will depend on numpy.

### Numpy Arrays
Numpy arrays are somewhat like lists, in that you can index into them to find elements, but they are much more efficient. An array can only hold elements of a single data type (all `int64`, for instance), and it can have an arbitrary shape. For instance:

In [3]:
import numpy as np

x = np.array([1,3,4])
print(type(x),x.dtype)

print(x[0],x[2])

print(x.shape)

<class 'numpy.ndarray'> int64
1 4
(3,)
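
The single-dtype rule described above can be seen directly. The following is a minimal sketch (the variable names are illustrative, not part of the session's code) of how numpy chooses one data type for mixed input and how a dtype can be requested or converted explicitly.

```python
import numpy as np

# Mixing ints and floats: numpy picks one dtype that can hold every element.
mixed = np.array([1, 2.5, 4])
print(mixed.dtype)        # float64

# The dtype can also be requested explicitly when the array is created.
small = np.array([1, 3, 4], dtype=np.int32)
print(small.dtype)        # int32

# astype returns a new array; converting floats to ints drops the fraction.
truncated = mixed.astype(np.int64)
print(truncated)          # [1 2 4]
```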


In [8]:
## input the matrix
##
## 1 2 3
## 4 5 6
#
y = np.array([[1,2,3],[4,5,6]])
print(y.shape)
print(y[0,0],y[1,1])


(2, 3)
1 5


Initializing numpy arrays is easy and there are many functions available to give common kinds of matrices.

In [12]:

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[ 0.  0.]
print("--------")     #          [ 0.  0.]]"

b = np.ones((1,3))    # Create an array of all ones
print(b)              # Prints "[[ 1.  1. 1.]]"
print("--------")
c = np.full((2,2), 7)  # Create a constant array
print(c)               # Prints "[[7 7]
                       #          [7 7]]"
print("--------")
d = np.eye(4)         # Create a 4x4 identity matrix
print(d)              # Prints "[[ 1.  0. 0. 0.]
                      #          [ 0.  1. 0. 0.]
                      #          [ 0.  0. 1. 0.]
                      #          [ 0.  0. 0. 1.]]"
print("--------")
e = np.random.random((2,2))  # Create an array filled with random values
print(e)                     # Might print "[[ 0.91940167  0.08143941]
                             #               [ 0.68744134  0.87236687]]"

[[0. 0.]
 [0. 0.]]
--------
[[1. 1. 1.]]
--------
[[7 7]
 [7 7]]
--------
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
--------
[[0.92484933 0.35516302]
 [0.49867608 0.09451313]]
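
A few other constructors come up often enough to be worth a quick look. This is a small sketch, not an exhaustive list, using `np.arange`, `np.linspace`, and `np.zeros_like`; the variable names are only illustrative.

```python
import numpy as np

# arange works like range: start, stop (exclusive), step.
steps = np.arange(0, 10, 2)
print(steps)                      # [0 2 4 6 8]

# linspace gives a fixed number of evenly spaced points, endpoints included.
grid = np.linspace(0.0, 1.0, 5)   # values 0.0, 0.25, 0.5, 0.75, 1.0
print(grid)

# zeros_like copies the shape and dtype of an existing array.
template = np.array([[1, 2, 3], [4, 5, 6]])
blank = np.zeros_like(template)
print(blank.shape, blank.dtype == template.dtype)   # (2, 3) True
```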


### Indexing Numpy arrays
There are multiple ways of indexing into numpy arrays and I'd encourage you to delve more deeply into this in the [documentation](https://docs.scipy.org/doc/numpy-1.15.1/reference/arrays.indexing.html). Here we will cover just the basics.


In [None]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# Modifying the slice b also modifies the original array a
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

Notice that last bit of behavior: a slice of an array is a view into the same data, so modifying it will modify the original array.
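
When the original array must stay untouched, one option is to take an explicit copy of the slice. The sketch below rebuilds the same array and uses `np.shares_memory` to check which objects share data; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                 # a slice: shares memory with a
independent = a[:2, 1:3].copy()   # an explicit copy: owns its own data

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False

independent[0, 0] = 77            # changes only the copy
print(a[0, 1])                    # 2

view[0, 0] = 77                   # changes a as well
print(a[0, 1])                    # 77
```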

Numpy also allows us to mix integer indices with slices.

In [None]:
# Two ways of accessing the data in the middle row of the array.
# Mixing integer indexing with slices yields an array of lower rank,
# while using only slices yields an array of the same rank as the
# original array:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)  # Prints "[5 6 7 8] (4,)"
print(row_r2, row_r2.shape)  # Prints "[[5 6 7 8]] (1, 4)"

# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print(col_r1, col_r1.shape)  # Prints "[ 2  6 10] (3,)"
print(col_r2, col_r2.shape)  # Prints "[[ 2]
                             #          [ 6]
                             #          [10]] (3, 1)"
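
The rank difference shown above matters whenever a function expects a particular shape, so it helps to know how to move between the two forms. This is a minimal sketch using `np.newaxis`, `reshape`, and `ravel`; the variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col = a[:, 1]                        # rank 1, shape (3,)

# Two equivalent ways to turn the rank 1 result into a (3, 1) column.
col_newaxis = col[:, np.newaxis]
col_reshaped = col.reshape(-1, 1)    # -1 lets numpy infer the row count
print(col_newaxis.shape, col_reshaped.shape)      # (3, 1) (3, 1)

# Both match what slicing alone would have produced.
print(np.array_equal(col_newaxis, a[:, 1:2]))     # True

# Going the other way: flatten a (3, 1) column back to shape (3,).
flat = a[:, 1:2].ravel()
print(flat.shape)                                 # (3,)
```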

### Speed check of Numpy vs pure Python
One major reason we use numpy is the speed advantage it offers over pure Python.

In [13]:
import numpy as np
from timeit import Timer

size_of_vec = 1000
X_list = list(range(size_of_vec))  # real lists, not lazy range objects
Y_list = list(range(size_of_vec))
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

def pure_python_version():
    Z = [X_list[i] + Y_list[i] for i in range(len(X_list)) ]

def numpy_version():
    Z = X + Y

timer_obj1 = Timer("pure_python_version()", 
                   "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", 
                   "from __main__ import numpy_version")

print(timer_obj1.timeit(10))
print(timer_obj2.timeit(10))  # Usually faster, though a first run can include one-off overhead

print(timer_obj1.repeat(repeat=3, number=10))
print(timer_obj2.repeat(repeat=3, number=10)) # Repeat for a more reliable comparison


0.007411368016619235
0.006730540015269071
[0.0037990520359016955, 0.003515981021337211, 0.0027394599746912718]
[0.0001420890330336988, 9.819702245295048e-05, 7.205700967460871e-05]
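
To see how the gap changes with problem size, the comparison can be repeated for several vector lengths. The following is a rough sketch rather than a rigorous benchmark: the helper names `add_lists` and `add_arrays` and the chosen sizes are assumptions for illustration, and the timings will vary from machine to machine.

```python
import timeit
import numpy as np

def add_lists(xs, ys):
    """Element-wise sum of two Python lists."""
    return [x + y for x, y in zip(xs, ys)]

def add_arrays(xs, ys):
    """Element-wise sum of two numpy arrays."""
    return xs + ys

for n in (1_000, 100_000):
    xs_list, ys_list = list(range(n)), list(range(n))
    xs_arr, ys_arr = np.arange(n), np.arange(n)

    # Both versions must agree before their timings are worth comparing.
    assert add_lists(xs_list, ys_list) == add_arrays(xs_arr, ys_arr).tolist()

    # Take the best of three repeats to reduce noise from other processes.
    t_list = min(timeit.repeat(lambda: add_lists(xs_list, ys_list),
                               repeat=3, number=10))
    t_arr = min(timeit.repeat(lambda: add_arrays(xs_arr, ys_arr),
                              repeat=3, number=10))
    print(f"n={n}: lists {t_list:.6f} s, numpy {t_arr:.6f} s")
```

Taking the minimum over repeats is a common convention with `timeit`, since slower runs mostly reflect interference from the rest of the system rather than the code being measured.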
