# Introduction to Numpy
 * The fundamental python library for manipulating matrices and arrays

 * Much faster than base Python for computation 

 * Official Docs: https://docs.scipy.org/doc/numpy/reference/index.html.

 
 * If you are coming from Matlab, [here](http://scipy.github.io/old-wiki/pages/NumPy_for_Matlab_Users) is a handy reference comparing the similarities and differences between Matlab and numpy. 

## Contents
### [Introduction to `ndarray`s](#ndarray)
####  [Construction](#constructors)
#### [Reshaping](#reshaping)
#### [Subsetting](#subsetting)
### [A note on copying](#A_Note_on_Copying)
#### [Aggregation](#aggregation)
#### [Array math](#array_math)
#### [Comparisons](#comparisons)
#### [Combining Arrays](#combining)
### [Broadcasting and Vectorization](#broadcasting)


The official documentation for the numpy library can be found at: https://docs.scipy.org/doc/numpy/reference/index.html. You will see that there are many things that you can do with numpy arrays. This module will aim to cover some of the basics as well as provide general guidelines for how to make sure you utilize numpy correctly to take advantage of the underlying optimizations.

Some of you may be much more familiar with Matlab and its associated array data structures. [Here](http://scipy.github.io/old-wiki/pages/NumPy_for_Matlab_Users) is a handy reference comparing the similarities and differences between Matlab and numpy.

In this module, we will cover various methods for creating arrays, indexing and slicing them, typing, and performing basic linear algebra computations. In addition, we will introduce the concept of *broadcasting*, which is an important tool for avoiding unnecessarily slow `for` loops in python.

### Importing Numpy

In [None]:
import numpy as np

<a id=ndarray></a>
## The N-Dimensional Array (`ndarray`) 


The `ndarray` is numpy's fundamental data structure. As the name suggests, it is a single or multidimensional array that contains elements that are of fixed type and size.

In [None]:
example_ndarray = np.array([[1,2,3], [4,5,6], [7,8,9]], dtype = np.float32)
print(example_ndarray)

Two of the most important attributes of the `ndarray` class are its `shape` and its `dtype`.

You can access a numpy array's `shape` and `dtype` through its `.shape` and `.dtype` attributes.

In [None]:
print('the type of this array is {}'.format( type(example_ndarray )))
print('the shape of this array is {}'.format( example_ndarray.shape ))
print('the dtype of this array is {}'.format( example_ndarray.dtype ))

In the example above, we created the array using a list of lists. However, there are many other ways to create numpy arrays. Here are some of the more useful constructors

<a id='constructors'></a>
### How to construct an `ndarray` 

In [None]:
np.ones(shape = (3,3), dtype = np.float64)

In [None]:
np.zeros(shape = (3,3))

In [None]:
# arange takes start and stop argument, similar to python's range()
np.arange(0,9)

In [None]:
# linspace creates evenly spaced arrays
np.linspace(0, 9, 20)

In [None]:
np.linspace?
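To make the difference between `arange` and `linspace` concrete, here is a small sketch (the variable names `stepped` and `spaced` are only illustrative). `arange` is driven by a step size and excludes the stop value, while `linspace` is driven by the number of points and includes the stop value by default.

```python
import numpy as np

# arange: half-open interval [start, stop), like python's range()
stepped = np.arange(0, 10, 2)
print(stepped)    # [0 2 4 6 8]

# linspace: closed interval [start, stop] split into `num` evenly spaced points
spaced = np.linspace(0, 1, 5)
print(spaced)     # [0.   0.25 0.5  0.75 1.  ]
```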

In [None]:
# Draws from a uniform distribution with support [0.0, 1.0)
np.random.random((3,3))

In [None]:
np.full((3,3), 17, dtype = np.int16)

In [None]:
# You can also change the dtype of a numpy array by calling astype() or passing the array to np.array and specifying a new dtype
typed_array = np.ones((3,3), dtype = np.int16)

typed_array.astype(np.float32)

In [None]:
np.array(typed_array, np.uint8)
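One detail worth checking for yourself is that `astype()` returns a new array rather than changing the original in place. A minimal sketch, with illustrative variable names:

```python
import numpy as np

counts = np.ones((2, 2), dtype=np.int16)
as_float = counts.astype(np.float32)   # returns a new array with the requested dtype

print(counts.dtype)     # int16  (the original is unchanged)
print(as_float.dtype)   # float32
```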

<a id='reshaping'></a>
### Reshaping

Let's say you have an array of size (3,3), but you'd like it to actually be (9,1). You can simply call `reshape` with the dimensions that you wish to resize the array to.

In [None]:
np.ones((3,3)).reshape(9,1)

In [None]:
# Or even into 3 dimensions
np.ones((4,4)).reshape(2,4,2)

One useful tool is the `flatten()` method, which returns a 1-d copy of the array it is called on.

In [None]:
random_array = np.arange(0,9).reshape(3,3)
random_array

In [None]:
random_array.flatten()

In [None]:
random_array.flatten('F')

In [None]:
random_array.flatten('F').shape #(9,) is just python's way of representing a tuple with 1 element

You can flatten by going through each row first (default behavior), or by using `order = 'F'` or just `'F'` as an argument to specify column-first ('F' here stands for FORTRAN-style). For more information on this behavior, you can see the docs here:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.html
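The two orderings are easiest to tell apart on a small non-square array. The sketch below uses an illustrative 2x3 array named `grid`:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

row_major = grid.flatten()          # default order='C': read row by row
col_major = grid.flatten('F')       # order='F': read column by column

print(row_major)   # [0 1 2 3 4 5]
print(col_major)   # [0 3 1 4 2 5]
```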

<a id='subsetting'></a>
### Subsetting 

Now that you know how to create `ndarray`s, you may want to access particular elements within an array. To do that, `numpy` offers various ways to slice arrays to your heart's content

In [None]:
a = np.array([1,2,3,4,5,6])
a

In [None]:
a[2:-1]

In [None]:
a[3] # Access the element at index 3, i.e. the 4th element (remember, python is 0-indexed)

In [None]:
a[:3] # Elements up to, but not including, index 3

In [None]:
a[3:] # Element at index 3 onwards

In [None]:
a[:-1] # All elements except the last one

In [None]:
a[2:-1] # Combine the two

In [None]:
# 2-d Arrays
b = np.arange(0,9).reshape(3,3)
b

In [None]:
b[1,2] # Row index 1, column index 2 (i.e. 2nd row, 3rd column)

In [None]:
b[:, 2]  # Get all rows, and grab the value at column index 2 from each (i.e. the 3rd column)

In [None]:
b[b < 5] # Boolean indexing
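Boolean indexing works in two steps, which the following sketch separates out (the names `values`, `mask`, and `selected` are illustrative). The comparison first produces a boolean array of the same shape, and indexing with that array then returns the matching elements as a 1-d array.

```python
import numpy as np

values = np.arange(9).reshape(3, 3)

mask = values < 5            # boolean array with the same shape as `values`
selected = values[mask]      # 1-d array of the elements where mask is True

print(mask.shape)    # (3, 3)
print(selected)      # [0 1 2 3 4]
```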

<a id='A_Note_on_Copying'></a>
## A Note on Copying

Be careful when copying numpy arrays. Assigning an existing numpy array to a new variable does not copy the array; it only creates a second name that refers to the same underlying object. Here's a quick demonstration illustrating this.

In [None]:
# Let's create an array:

c = np.arange(0,9).reshape(3,3)
c

In [None]:
id(c) # here is the id() of the object c, which uniquely identifies that object for as long as it exists

In [None]:
d = c

In [None]:
d

In [None]:
id(d) # This is the same id!!! 

If I make a change to the array `c`, those changes will be propagated to `d`.

In [None]:
c.shape = (9,1)
c

In [None]:
d

To truly copy a numpy array, use the array's `copy()` method or the equivalent `np.copy()` function.

In [None]:
e = np.arange(0,9).reshape(3,3)
f = e.copy() # Alternatively, f = np.copy(e)
e.shape = (9,1)
e


In [None]:
f # These are now different
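As a rough summary, there are three situations worth telling apart: plain assignment (a second name for the same object), slicing (a new array object that still shares memory with the original, usually called a view), and `copy()` (fully independent data). The sketch below uses illustrative names and `np.shares_memory` to check each case.

```python
import numpy as np

original = np.arange(9).reshape(3, 3)

alias = original                 # same object, just a second name
view = original[0]               # new array object that shares the same memory
independent = original.copy()    # new array object with its own memory

original[0, 0] = 100             # modify the original in place

print(alias is original)                          # True
print(view[0])                                    # 100 (the view sees the change)
print(independent[0, 0])                          # 0   (the copy does not)
print(np.shares_memory(original, view))           # True
print(np.shares_memory(original, independent))    # False
```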

<a id='aggregation'></a>
### Aggregation Functions
Numpy supports means, medians, standard deviations, etc across the entire array or just along a particular axis.

In [None]:
g = np.array(range(0, 100)).reshape(10,10)
g

In [None]:
g.mean()

In [None]:
g.mean(axis = 0) # Average down the rows: one mean per column

In [None]:
g.mean(axis = 1) # Average across the columns: one mean per row

In [None]:
g.cumsum(axis = 0) # Cumulative sum down the rows, computed separately for each column
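The `axis` argument names the dimension that gets collapsed, which is easy to mix up. A small sketch on an illustrative 2x3 array makes the direction visible in the output shapes:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])      # shape (2, 3)

col_means = table.mean(axis=0)     # collapse the rows -> one mean per column
row_means = table.mean(axis=1)     # collapse the columns -> one mean per row

print(col_means)   # [2.5 3.5 4.5]
print(row_means)   # [2. 5.]
```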

<a id='array_math'></a>
## Array Mathematics

As you might expect, numpy has built-in functions for many array operations, such as matrix multiplication, transposes, eigenvalues, etc. In general, you can use python's built-in operators such as `+`, `-`, `*`, `/`. Alternatively, you can call `np.add`, `np.multiply`, etc. However, note that, unlike Matlab, the `*` operator does not perform matrix multiplication. Instead, it performs element-wise multiplication. To do matrix multiplication, use the `np.dot` function.

In [None]:
b = np.arange(0, 9).reshape(3,3)

In [None]:
h = np.arange(10, 19).reshape(3,3)
h

In [None]:
b + h

In [None]:
# Or you can use np.add()
np.add(b, h)

In [None]:
b / h # element-wise division

In [None]:
b * h # !!!!! element-wise multiplication

In [None]:
np.dot(b, h) # Matrix multiplication
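The difference between `*` and matrix multiplication is easy to verify by hand on 2x2 inputs. The sketch below uses illustrative arrays and also shows python's `@` operator, which is not used elsewhere in this module but gives the same result as `np.dot` for 2-d arrays.

```python
import numpy as np

left = np.array([[1, 2],
                 [3, 4]])
right = np.array([[5, 6],
                  [7, 8]])

elementwise = left * right       # multiplies matching entries
matrix_product = left @ right    # same result as np.dot(left, right) for 2-d arrays

print(elementwise)       # [[ 5 12]
                         #  [21 32]]
print(matrix_product)    # [[19 22]
                         #  [43 50]]
```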

In [None]:
b.T # Transpose

In [None]:
np.sin(b)

`numpy` also has a linear algebra module for performing common linear algebra operations. A complete list of commands can be found in the docs: https://docs.scipy.org/doc/numpy/reference/routines.linalg.html

In [None]:
np.linalg.eig(h) # Returns a tuple of 2 arrays. The first is the eigenvalues, and the second is the eigenvectors
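A quick way to sanity-check the output of `np.linalg.eig` is to confirm the defining relation that multiplying the matrix by an eigenvector gives the same vector scaled by its eigenvalue. The sketch below does this on an illustrative diagonal matrix, where the eigenvalues are simply the diagonal entries.

```python
import numpy as np

matrix = np.array([[2.0, 0.0],
                   [0.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Column k of `eigenvectors` pairs with eigenvalues[k], so
# matrix @ eigenvectors should equal eigenvectors scaled column by column.
print(np.allclose(matrix @ eigenvectors, eigenvectors * eigenvalues))   # True
```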

<a id='comparisons'></a>
## Comparisons

In [None]:
b

In [None]:
i = np.ones((3,3))
i

In [None]:
b == i # Element-wise comparison -- returns array of the same shape

In [None]:
b < i 

In [None]:
b < 5

In [None]:
np.array_equal(b, i)

In [None]:
i = np.arange(0,9).reshape(3,3)
i

#### Exercise: 

Look at the documentation for the following function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

We would like to use this to construct the following array:

An array of length 100, where the first 50 entries are consecutive odd numbers starting at 1 and the next 50 entries are even numbers. Follow the instructions below to create this array:


 1. Create an array of length 100 of consecutive odd numbers starting at 1. (i.e. [1, 3, 5 ...])
 2. Create an array of length 100 of consecutive even numbers starting at 2. (i.e., [2, 4, 6, ...])
 3. Using the above documentation, create the final array described above. 

In [None]:
even_array = np.arange(2, 201, 2)
odd_array1 = np.arange(1, 200, 2)

In [None]:
np.where(odd_array1 < 100, odd_array1, even_array)

In [None]:
np.concatenate( [odd_array1[odd_array1 < 100] , even_array[even_array > 100]], axis=0  )

<a id='combining'></a>
## Combining Arrays

One common array operation is to concatenate or stack arrays. Here are the most common ways to combine arrays together:

In [None]:
b

In [None]:
np.concatenate((b,b), axis = 0) # stacked b twice along the rows (axis = 0)

In [None]:
np.concatenate((b, b), axis = 1)

In [None]:
# alternatively, you can use hstack, which stands for Horizontal Stacking
np.hstack((b,b))

In [None]:
# Or vstack, which stands for Vertical Stacking
np.vstack((b,b))

<a id='broadcasting'></a>
## Broadcasting and Vectorization

Broadcasting is a method that numpy and other numerical computation libraries in python -- such as `tensorflow`, `pytorch`, and others -- use to allow array functions to be applied to arrays with different shapes/sizes. Although broadcasting can seem initially confusing, it can often significantly decrease the amount of time needed to execute certain matrix operations. We'll see a few examples here, but this is by no means an exhaustive guide to broadcasting. To learn more about broadcasting, you can see the official documentation here: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

Generally, you can apply matrix arithmetic operations elementwise if the two arrays in question have the same `shape`

In [None]:
j = np.linspace(0, 8, 9).reshape(3,3)
j

In [None]:
k = np.arange(0, 9).reshape(3,3)
k

In [None]:
# Array (3x3) + Array(3x3) = Array(3x3)
j + k

In [None]:
j * k

Broadcasting allows arrays of *different* sizes to undergo these operations as well. The simplest case of this is an operation involving a scalar value.

In [None]:
l = 10

In [None]:
j + l

In [None]:
j * l

In general, you can determine what the shape after an operation will be and whether or not a broadcasting operation is even possible according to the following rules:

1. If the arrays have the same rank (number of dimensions), each pair of corresponding dimensions must either match, or one of the two must be of size 1
2. If the arrays do not have the same rank, prepend 1s to the shape of the lower-rank array (i.e. pad it on the left) until both shapes have the same length
3. Where the dimensions match, apply the operation element-wise. Where one of them is 1, that array is repeated (broadcast) along that dimension to match the other

Let's illustrate this with an example:

In [None]:
j

In [None]:
j.shape

In [None]:
m = np.linspace(0, 2, 3)

In [None]:
m

In [None]:
m.shape

The array `j` has a shape of (3,3) and `m` has a shape of (3,). According to the rule, broadcasting an operation will pad `m` to be of size (1,3). The operation will then be broadcast as follows:

`j` (3x3)

`m` (1x3)

`result` (3x3)

In [None]:
j + m 

In [None]:
j * m

Here's a slightly more complicated example:

In [None]:
n = np.random.random((4,2,3,1))

In [None]:
o = np.arange(0, 12).reshape(3, 4)
o

The array `n` has a shape of (4,2,3,1) and `o` has a shape of (3,4). According to the rule, broadcasting an operation will pad `o` to be of size (1,1, 3, 4). The operation will then be broadcast as follows:

`n` (4x2x3x1)

`o` (1x1x3x4)

`result` (4x2x3x4)

In [None]:
(n + o).shape

In [None]:
n + o
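The broadcasting rules above can be sketched as a small function that predicts the result shape. Note that `broadcast_shape` is a hypothetical helper written for this illustration, not part of numpy; the last lines cross-check it against the shape numpy actually produces.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: predict the shape of broadcasting two shapes."""
    # Rule 2: left-pad the shorter shape with 1s so both have the same rank
    rank = max(len(shape_a), len(shape_b))
    padded_a = (1,) * (rank - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (rank - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        # Rule 1: dimensions must match, or one of them must be 1
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # Rule 3: a dimension of size 1 stretches to match the other one
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))            # (3, 3)
print(broadcast_shape((4, 2, 3, 1), (3, 4)))    # (4, 2, 3, 4)

# Cross-check against what numpy actually produces
actual = (np.ones((4, 2, 3, 1)) + np.ones((3, 4))).shape
print(actual == broadcast_shape((4, 2, 3, 1), (3, 4)))   # True
```

Shapes that break rule 1, such as (3,) and (4,), make the helper raise a `ValueError`, which mirrors the error numpy raises when two arrays cannot be broadcast together.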

For a list of more examples, see the docs: https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

## Vectorization

Numpy is fast mainly because of 2 things: 
 - Each `ndarray` has a known type, so no type checking has to occur
 - Operations can be vectorized

In [None]:
import math

test_vector = np.arange(0, 1000000)

In [None]:
test_vector

In [None]:
%timeit [math.sqrt(x) for x in test_vector]

%timeit np.sqrt(test_vector)
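`%timeit` is an IPython magic, so the cell above only runs inside a notebook or IPython shell. As a rough equivalent for a plain python script, the sketch below uses the standard library's `timeit` module on a smaller, illustrative array. It first checks that both approaches agree on the values, then prints timings, which will vary from machine to machine.

```python
import math
import timeit

import numpy as np

sample = np.arange(0, 100_000)

def loop_sqrt():
    # One python-level function call per element
    return [math.sqrt(x) for x in sample]

def vectorized_sqrt():
    # A single call; the loop runs inside numpy's compiled code
    return np.sqrt(sample)

# Both approaches should agree on the values
print(np.allclose(loop_sqrt(), vectorized_sqrt()))   # True

# Timings depend on the machine, so no expected numbers are shown
print("loop:      ", timeit.timeit(loop_sqrt, number=5))
print("vectorized:", timeit.timeit(vectorized_sqrt, number=5))
```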