# NumPy

"NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more"  
-https://docs.scipy.org/doc/numpy-1.10.1/user/whatisnumpy.html.

In [55]:
# In Python, importing a package makes its functions available for you to use

import numpy as np

Let's run through an example showing how powerful NumPy is. Suppose we have two lists a and b, each consisting of the first 100,000 non-negative integers, and we want to create a new list c whose *i*th element is a[i] + 2 * b[i].  

Without NumPy:

In [56]:
a = list(range(100000))
b = list(range(100000))

In [57]:
c = []
for i in range(len(a)):
    c.append(a[i] + 2 * b[i])
print(a[0:10], b[0:10], c[0:10])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]


What we would like to do is just get c = a + 2*b. What happens if we try this with a list?

In [58]:
c = a+2*b
print(a[0:10], b[0:10], c[0:10])
# Adding lists just concatenates them, and multiplying a list by 2 just repeats it twice
print(len(c), len(a), len(b))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
300000 100000 100000


With NumPy:

In [59]:

a = np.arange(100000)
b = np.arange(100000)

In [60]:
c = a + 2 * b

The result is what we intuitively want, and it is also much faster: the loop over elements runs inside NumPy's compiled code rather than in the Python interpreter.

The process we used above is **vectorization**. Vectorization refers to applying operations to whole arrays instead of to individual elements (i.e. no explicit Python loops).

Why vectorize?
1. Much faster
2. Easier to read and fewer lines of code
3. More closely resembles mathematical notation

Vectorization is one of the main reasons why NumPy is so powerful.
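
To make the speed claim concrete, here is a minimal timing sketch using the standard-library `timeit` module. The function names `loop_version` and `vectorized_version` are made up for this illustration, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

n = 100_000
a_list = list(range(n))
b_list = list(range(n))
a_arr = np.arange(n)
b_arr = np.arange(n)

def loop_version():
    # Explicit Python loop over list elements
    c = []
    for i in range(len(a_list)):
        c.append(a_list[i] + 2 * b_list[i])
    return c

def vectorized_version():
    # Same computation expressed on whole arrays
    return a_arr + 2 * b_arr

# Run each version 10 times and report the total time
loop_time = timeit.timeit(loop_version, number=10)
vec_time = timeit.timeit(vectorized_version, number=10)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")

# Both approaches produce the same values
print(np.array_equal(loop_version(), vectorized_version()))  # True
```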

## ndarray

ndarrays, n-dimensional arrays of a homogeneous data type, are the fundamental datatype used in NumPy. Because all elements share one type and the array's size is fixed at creation, ndarrays offer less flexibility than Python lists, but they can be substantially more efficient in both runtime and memory. (Python lists are arrays of pointers to objects, adding a layer of indirection.)
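
As a small illustration of the homogeneous-type rule (the variable names `ints` and `mixed` are chosen just for this sketch), mixing integers and floats in one array makes NumPy store every element as a float:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # the integers are upcast so all elements share one dtype

print(ints.dtype)      # an integer dtype; the exact width depends on the platform
print(mixed.dtype)     # float64
print(mixed.itemsize)  # 8 bytes per element
print(mixed.nbytes)    # 24 bytes in total (3 elements * 8 bytes)
```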

The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
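
A quick sketch of these definitions, using the `ndim`, `shape`, and `size` attributes (the arrays `v`, `m`, and `t` are arbitrary examples):

```python
import numpy as np

v = np.array([1, 2, 3])   # rank 1
m = np.zeros((2, 3))      # rank 2
t = np.zeros((3, 2, 2))   # rank 3

for arr in (v, m, t):
    # ndim is the number of dimensions, shape the size along each one,
    # and size the total number of elements
    print(arr.ndim, arr.shape, arr.size)
# 1 (3,) 3
# 2 (2, 3) 6
# 3 (3, 2, 2) 12
```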

In [78]:
# Can initialize ndarrays with Python lists, for example:
a = np.array([1, 2, 3])   # Create a rank 1 array
print('type:', type(a))            # Prints "<class 'numpy.ndarray'>"
print('shape:', a.shape)            # Prints "(3,)" 
print('a:', a)   # Prints "[1 2 3]"

a_cpy = a.copy()          # Independent copy; later changes to a do not affect it
a[0] = 5                  # Change an element of the array
print('a modified:', a)                  # Prints "[5 2 3]"
print('a copy:', a_cpy)                  # Prints "[1 2 3]"

# A 2-D array is essentially a list of lists - we can build a 2x3 array from a list of two length-3 lists
b = np.array([[1, 2, 3],
              [4, 5, 6]])    # Create a rank 2 array
print('shape:', b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

type: <class 'numpy.ndarray'>
shape: (3,)
a: [1 2 3]
a modified: [5 2 3]
a copy: [1 2 3]
shape: (2, 3)
1 2 4


Note that in the above we used a.shape, which is equivalent to np.shape(a); you will see this pattern in NumPy a lot. NumPy supports an object-oriented paradigm: ndarray has a number of methods and attributes that mirror functions in the top-level NumPy namespace. For example, we can write either np.mean(x) or x.mean(), and either np.min(x) or x.min().
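
A short sketch of this equivalence, using a small example array `x` that is not part of the cells above:

```python
import numpy as np

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(x.shape == np.shape(x))   # True
print(x.mean(), np.mean(x))     # 2.5 2.5
print(x.min(), np.min(x))       # 1.0 1.0

# Both forms accept the same keyword arguments, e.g. axis
print(x.sum(axis=0))            # Column sums: [4. 6.]
print(np.sum(x, axis=0))        # Column sums: [4. 6.]
```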

There are many other initializations that NumPy provides:

In [62]:
# Recall that range(), which we used in our loops, does not actually return a list of numbers
# np.arange() is the NumPy analogue: here it generates a 1-D array of the integers 0 through 9

a = np.arange(10)
print(a)

a = np.zeros((2, 2))   # Create an array of all zeros
print(a)               # Prints "[[ 0.  0.]
                       #          [ 0.  0.]]"

b = np.full((2, 2), 7)  # Create a constant array (integer fill value gives an integer array)
print(b)                # Prints "[[7 7]
                        #          [7 7]]"

c = np.eye(2)         # Create a 2 x 2 identity matrix
print(c)              # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

d = np.random.random((2, 2))  # Create an array filled with random values
print(d)                      # Might print "[[ 0.91940167  0.08143941]
                              #               [ 0.68744134  0.87236687]]"
    
a = np.ones((2, 2))    # Create an array of all ones
print(a)               # Prints "[[ 1.  1.]
                       #          [ 1.  1.]]"
    

[0 1 2 3 4 5 6 7 8 9]
[[0. 0.]
 [0. 0.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.0563363  0.45613252]
 [0.86228115 0.73333724]]
[[1. 1.]
 [1. 1.]]


How would you find the mean of the sum of two vectors x1 and x2, each with 10 uniform random numbers? You can do this in three lines of code: two to initialize the vectors and one to calculate the mean of the sum. 

In [77]:
x1 = np.random.random((10,1))
x2 = np.random.random((10,1))
result = (x1+x2).mean()
print(result)

1.1471467200626402


How do we create a 10 by 10 matrix of all values = 0.5?

In [63]:
a = np.ones((10, 10)) * 0.5    # Scale an array of all ones by 0.5
print(a)               # Prints a 10 x 10 array in which every entry is 0.5

[[0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]
 [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5]]
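
As an alternative sketch, `np.full`, introduced earlier, builds the same matrix directly without the intermediate array of ones:

```python
import numpy as np

a = np.full((10, 10), 0.5)       # Fill a 10 x 10 array with 0.5 directly
b = np.ones((10, 10)) * 0.5      # The approach used above

print(a.shape)                   # (10, 10)
print(a.dtype)                   # float64, inferred from the fill value
print(np.array_equal(a, b))      # True
```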


What if we want to rearrange an array into a different shape? For example, let's say we have a sequence of three 2x2 matrices we would like to store. We can either store them as a 3x2x2 array or reshape this into a 3x4 matrix, with one flattened matrix per row, for easier storage. 

In [64]:
mats = np.random.random((3,2,2))
print(mats)
print(mats.shape)

mats_reshape = mats.reshape((3, 4))
print('Reshaped:\n', mats_reshape)
print(mats_reshape.shape)


[[[0.56405837 0.8505574 ]
  [0.82851894 0.83534297]]

 [[0.08494956 0.71559965]
  [0.74853249 0.77904526]]

 [[0.39620456 0.47170181]
  [0.13885256 0.80982635]]]
(3, 2, 2)
Reshaped:
 [[0.56405837 0.8505574  0.82851894 0.83534297]
 [0.08494956 0.71559965 0.74853249 0.77904526]
 [0.39620456 0.47170181 0.13885256 0.80982635]]
(3, 4)


What if we don't know the number of datapoints in the matrix? NumPy can infer it, as long as only one dimension is left unspecified. For example, to turn our (3,2,2) array into a (3,4) matrix we did not need to specify the 4. We could have used reshape(3, -1), where -1 tells NumPy to work out that dimension from the total number of elements. 

In [65]:
print(mats.reshape(3,-1))

[[0.56405837 0.8505574  0.82851894 0.83534297]
 [0.08494956 0.71559965 0.74853249 0.77904526]
 [0.39620456 0.47170181 0.13885256 0.80982635]]
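
The sketch below shows a few more uses of -1, along with what happens when no valid size exists. It uses `np.arange(12)` rather than random values so that the result is reproducible.

```python
import numpy as np

mats = np.arange(12).reshape(3, 2, 2)   # 12 elements in total

print(mats.reshape(3, -1).shape)    # (3, 4): 12 / 3 = 4
print(mats.reshape(-1, 6).shape)    # (2, 6): 12 / 6 = 2
print(mats.reshape(-1).shape)       # (12,): flatten to 1-D

# If the known dimensions do not divide the total size, NumPy raises an error
try:
    mats.reshape(5, -1)
except ValueError as err:
    print('ValueError:', err)
```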


In [66]:
nums = np.arange(8)
print(nums.min())     # Prints 0
print(np.min(nums))   # Prints 0

0
0


## Basic Array Operations/Math

NumPy supports many useful elementwise operations:

In [67]:
x = np.array([[1, 2],
              [3, 4]], dtype=np.float64)
y = np.array([[5, 6],
              [7, 8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(np.array_equal(x + y, np.add(x, y)))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(np.array_equal(x - y, np.subtract(x, y)))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(np.array_equal(x * y, np.multiply(x, y)))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

True
True
True
[[1.         1.41421356]
 [1.73205081 2.        ]]


How do we elementwise divide between two arrays?

In [68]:
x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6],
              [7, 8]], dtype=np.float64)

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


Note that * is elementwise multiplication, not matrix multiplication. We will come back to matrix multiplication later, but for those of you who already know about it, NumPy uses the @ operator for matrix multiplication:



In [69]:
x @ y

array([[19., 22.],
       [43., 50.]])
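
To make the difference explicit, here is a small sketch comparing the two operators on the same x and y. It also checks that, for 2-D arrays, `np.matmul` and `np.dot` give the same result as `@`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Elementwise product: each entry is x[i, j] * y[i, j]
print((x * y).tolist())   # [[5.0, 12.0], [21.0, 32.0]]

# Matrix product: entry (i, j) is the dot product of row i of x and column j of y
print((x @ y).tolist())   # [[19.0, 22.0], [43.0, 50.0]]

print(np.array_equal(x @ y, np.matmul(x, y)))   # True
print(np.array_equal(x @ y, np.dot(x, y)))      # True for 2-D arrays
```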

There are many more useful functions built into NumPy, but we will encounter these as we go through the data analysis exercises that follow.

## Indexing

NumPy also provides powerful indexing schemes, which are necessary when dealing with high-dimensional data.

In [70]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
print('Original:\n', a)

# Can select an element as you would in a 2 dimensional Python list
print('Element (0, 0) (a[0][0]):\n', a[0][0])   # Prints 1
# or as follows
print('Element (0, 0) (a[0, 0]) :\n', a[0, 0])  # Prints 1

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; the result is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
print('Sliced (a[:2, 1:3]):\n', a[:2, 1:3])

# Steps are also supported in indexing. The following reverses the first row:
print('Reversing the first row (a[0, ::-1]) :\n', a[0, ::-1]) # Prints [4 3 2 1]

# Index along the first dimension only; the ellipsis (...) stands for all remaining
# dimensions, so this works for any n-dimensional array with n >= 1
print('Select the first row with the [...] operator: \n', a[0, ...])  # Prints [1 2 3 4]

Original:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Element (0, 0) (a[0][0]):
 1
Element (0, 0) (a[0, 0]) :
 1
Sliced (a[:2, 1:3]):
 [[2 3]
 [6 7]]
Reversing the first row (a[0, ::-1]) :
 [4 3 2 1]
slice the first row by the [...] operator: 
 [1 2 3 4]
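
One point worth knowing about slicing: a basic slice is a view onto the original array's data, not a copy, so writing through the slice changes the original. The sketch below demonstrates this; the names `view` and `independent` are chosen for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

view = a[:2, 1:3]         # Basic slicing returns a view
view[0, 0] = 100          # Writing through the view...
print(a[0, 1])            # ...changes the original: prints 100

independent = a[:2, 1:3].copy()   # An explicit copy owns its own data
independent[0, 0] = -1
print(a[0, 1])            # Still 100; the original is unaffected

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False
```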


Often, it's useful to select or modify one element from each row of a matrix. The following example employs **fancy indexing**, where we index into our array using an array of indices (say an array of integers or booleans):

In [71]:
# Create a new array from which we will select elements
a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

print(a)  # Prints "[[ 1  2  3]
          #          [ 4  5  6]
          #          [ 7  8  9]
          #          [10 11 12]]"

# Create an array of indices
b = np.array([0, 2, 0, 1])

# Select one element from each row of a using the indices in b
print(a[np.arange(4), b])  # Prints "[ 1  6  7 11]"





[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
[ 1  6  7 11]


Just like we did earlier with variables, we can also do in-place operations on NumPy arrays. This means the array's values are changed directly, without *assigning* the result to another variable.

In [72]:
# Mutate one element from each row of a using the indices in b
a[np.arange(4), b] += 10

print(a)  # Prints "[[11  2  3]
          #          [ 4  5 16]
          #          [17  8  9]
          #          [10 21 12]]"


[[11  2  3]
 [ 4  5 16]
 [17  8  9]
 [10 21 12]]
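
The difference between an in-place update and a reassignment is easiest to see when a second name refers to the same array. In this sketch the name `alias` is hypothetical and only there to make the effect observable.

```python
import numpy as np

a = np.array([1, 2, 3])
alias = a            # A second name for the same array object

a += 10              # In place: the existing array is modified
print(alias)         # [11 12 13], so the alias sees the change

a = a + 10           # Not in place: a new array is created and bound to a
print(a)             # [21 22 23]
print(alias)         # [11 12 13], because the old array is untouched
print(a is alias)    # False
```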


We can also use boolean indexing/masks. Suppose we want to set all elements greater than MAX to MAX:

In [73]:
MAX = 5
nums = np.array([1, 4, 10, -1, 15, 0, 5])
print(nums > MAX)            # Prints [False False  True False  True False False]

nums[nums > MAX] = MAX       # Overwrite only the elements where the mask is True
print(nums)                  # Prints [ 1  4  5 -1  5  0  5]

[False False  True False  True False False]
[ 1  4  5 -1  5  0  5]


In [74]:
nums = np.array([1, 4, 10, -1, 15, 0, 5])
nums > 5

array([False, False,  True, False,  True, False, False])
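
Capping values at an upper bound can also be done without modifying the original array. As a sketch, `np.minimum` and `np.where` both return a new array with every element greater than MAX replaced by MAX:

```python
import numpy as np

MAX = 5
nums = np.array([1, 4, 10, -1, 15, 0, 5])

# Elementwise minimum of each entry and MAX
capped_min = np.minimum(nums, MAX)

# Where the mask is True take MAX, otherwise keep the original value
capped_where = np.where(nums > MAX, MAX, nums)

print(capped_min.tolist())     # [1, 4, 5, -1, 5, 0, 5]
print(capped_where.tolist())   # [1, 4, 5, -1, 5, 0, 5]
print(nums.tolist())           # [1, 4, 10, -1, 15, 0, 5] (unchanged)
```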

## Summary

1. NumPy is an incredibly powerful library for computation, providing both large efficiency gains and convenience.
2. Vectorize! Vectorized code is often orders of magnitude faster than explicit Python loops.
3. Keeping track of the shape of your arrays is often useful.
4. Many useful math functions and operations are built into NumPy.
5. Select and manipulate arbitrary pieces of data with powerful indexing schemes.
