In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

![](https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/775px-NumPy_logo.svg.png)

## The Scientific Python ecosystem is built on NumPy
![](https://gcpy.readthedocs.io/en/latest/_images/state_of_the_stack_2015.png)

## Arrays

The basic object of NumPy is the ndarray.

An ndarray is a multidimensional container of items of the same type and size. The number of dimensions and the number of items along each dimension are defined by its shape.

In [None]:
x = np.array([1,2,3,4])
x, x.shape, x.dtype

In [None]:
x = np.array([1.,2,3,4])
x, x.shape, x.dtype

In [None]:
# Mixed-type arrays are coerced to the most general type (here, a string dtype) or raise an error
x = np.array([1,2,3,'oops'])
x, x.dtype

In [None]:
x = np.array([[1,1],[2,2]])
x, x.shape

In [None]:
x = np.array([[[1,1],[2,2]],[[3,3],[4,4]]])
x, x.shape

In [None]:
x[0,1]

In [None]:
x[0][1]
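
The two cells above select the same data in different ways. As a minimal sketch (the names `row_a` and `row_b` are only illustrative), `x[0, 1]` indexes both dimensions in one operation, while `x[0][1]` first builds the intermediate array `x[0]` and then indexes into it:

```python
import numpy as np

x = np.array([[[1, 1], [2, 2]], [[3, 3], [4, 4]]])   # shape (2, 2, 2)

row_a = x[0, 1]     # one indexing operation over two dimensions
row_b = x[0][1]     # two steps: x[0] first, then [1] on the result

print(row_a, row_b)   # [2 2] [2 2]
print(x[1, 0, 1])     # 3, one index per dimension gives a single element
```

For plain integer indices like these the results match; the comma form is the usual NumPy idiom.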

## Vectorization
Arrays are important because they let you express batch operations on data without writing any for loops. This is usually called vectorization. Any arithmetic operation between equal-size arrays is applied elementwise.
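
As a small illustration of elementwise arithmetic (the values are made up for the example), the operators below act on each pair of elements without an explicit loop, and the result matches the looped version:

```python
import numpy as np

# Two equal-size arrays (values chosen only for illustration)
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Arithmetic operators act element by element, with no explicit loop
total = a + b      # 11., 22., 33.
product = a * b    # 10., 40., 90.

# The equivalent pure-Python loop, for comparison
looped = np.array([a[i] * b[i] for i in range(len(a))])

print(total.tolist(), product.tolist(), np.array_equal(product, looped))
```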

Vectorization is more than just a concise way of writing computations; it is also **fast**.

For loops are executed by the Python interpreter, while NumPy's vectorized calculations are offloaded to compiled C or Fortran code. This has two benefits:
1. Compiled C and Fortran code is much faster than interpreted Python.
2. It can free up the Python GIL to do other work.

In [None]:
# Initialize two 1-D arrays of length 1,000
a = np.random.rand(1000)
b = np.random.rand(1000)

In [None]:
%%timeit 
a*b

In [None]:
%%timeit 
for i in range(1000):
    a[i]*b[i]

The vectorized calculation in this example is on the order of hundreds of times faster than the non-vectorized loop; the exact ratio depends on your machine.

**When working with numerical computation on an array, you should always look for a vectorized way of executing.**

## Universal Functions (ufunc)

NumPy refers to its vectorized functions as universal functions (ufuncs). From the docs: a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs. There are a [ton of them](https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html), many of which have shorthand arithmetic notation. Many array operations are also callable as a method (see sum).
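
To make the link between operators and ufuncs concrete, here is a minimal sketch with made-up values. It shows that `+` and `*` give the same results as `np.add` and `np.multiply`, and that `sum` can be written as a reduction of the `np.add` ufunc:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Arithmetic operators are shorthand for the corresponding ufuncs
same_add = np.array_equal(a + b, np.add(a, b))
same_mul = np.array_equal(a * b, np.multiply(a, b))
print(same_add, same_mul)          # True True

# np.add and np.sin are both instances of np.ufunc
print(isinstance(np.add, np.ufunc), isinstance(np.sin, np.ufunc))   # True True

# Summing is a reduction of the add ufunc
total_method = a.sum()
total_reduce = np.add.reduce(a)
print(total_method, total_reduce)  # 6.0 6.0
```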

In [None]:
%%timeit
np.sin(a).sum()

In [None]:
%%timeit 
tot = 0.
for i in range(1000):
    tot += np.sin(a[i]) 

In [None]:
import textwrap

In [None]:
# what can we do with an array?
textwrap.wrap(' '.join([i for i in dir(a) if i[0] != '_']), 100)

#### Exercise 3.1: What is the coefficient of variation (standard deviation/mean) of x?

In [None]:
x = np.arange(100)

In [None]:
%load ./answers/03.01.py


## Slicing
Often you will want to home in on certain data in the array. If you know the position of the elements in the array, you can use slicing notation, which uses brackets of the form `[start:stop:step]` for every dimension of the array. Each parameter of the slice is optional.
1. If you omit start, it will start from the first element
2. If you omit stop, it will end at the last element
3. If you omit step, it will grab every element between start and stop
4. If you set step to -1, it will reverse the order of elements

**Slicing is a Python construct, not unique to NumPy. These rules also apply to Python lists and pandas objects, in addition to NumPy arrays.**
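
As a short sketch of that point (the names `values`, `arr` and `view` are only illustrative), the same `[start:stop:step]` notation works on a list and on an array. One difference worth knowing: a NumPy slice is a view onto the original data, while a list slice is a copy.

```python
import numpy as np

values = list(range(1, 11))      # plain Python list: 1..10
arr = np.array(values)

# The same [start:stop:step] notation works on both
print(values[1::2])              # [2, 4, 6, 8, 10]
print(arr[1::2].tolist())        # [2, 4, 6, 8, 10]

# A step of -1 reverses the sequence in both cases
reversed_match = values[::-1] == arr[::-1].tolist()
print(reversed_match)            # True

# A NumPy slice is a view: writing to it changes the original array
view = arr[:3]
view[0] = 99
print(arr[0])                    # 99

# A list slice is a copy: the original list is untouched
copy = values[:3]
copy[0] = 99
print(values[0])                 # 1
```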

In [None]:
x = np.array([1,2,3,4,5,6,7,8,9,10])
print(x[3:6])
print(x[:])
print(x[1::2])
print(x[::-2])

In [None]:
x = np.array([[(y+1)*(x+1) for y in range(10)] for x in range(10)])
print(x)
print(x[5:8,0:3])
print(x[::-1,::-1])

#### Exercise 3.2: Extract the values at indices 3 through 6 of x in reverse order.

In [None]:
x = np.arange(10)

In [None]:
# Your answer here:

#%load ./answers/03.2.py

## Filtering with boolean arrays
A boolean array is an array whose elements take on True or False values. You can filter a multidimensional array with a boolean array. You can also create boolean arrays by applying boolean expressions to non-boolean arrays. To filter with a boolean array, it must have the same shape as the array being filtered.
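
A minimal sketch of the pattern, using made-up values: a comparison produces the boolean array (often called a mask), and indexing with the mask keeps only the elements where it is True. Combining conditions with `&` is shown as an extra; each condition needs its own parentheses.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

mask = x > 3                      # boolean array with the same shape as x
print(mask.tolist())              # [False, False, False, True, True, True]
print(x[mask].tolist())           # [4, 5, 6]

# Combine conditions with & (and) or | (or), wrapping each in parentheses
both = (x > 2) & (x < 6)
print(x[both].tolist())           # [3, 4, 5]

# On a 2-D array, a same-shape mask returns the selected elements as a 1-D array
grid = np.array([[1, 2], [3, 4]])
selected = grid[grid > 1]
print(selected.tolist())          # [2, 3, 4]
```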

In [None]:
x = np.array([1,2,3,4,5,6])
y = np.array([True,False,False,True,True,False])
x[y]

In [None]:
x[x>3]

#### Exercise 3.3: Create an array of values 1 to 100 and filter out all elements that are divisible by 4 using the np.mod ufunc.

In [None]:
x = np.arange(100)+1

In [None]:
x[np.mod(x, 5) != 0]

In [None]:
# Your answer here:

#%load ./answers/03.3.py

## Aggregates along an Axis (dimension)
There are a variety of aggregate functions you can apply along a dimension: sum, min, max, cumsum, var, prod, and percentile are all valid aggregate functions. (NumPy has no plain count function; use count_nonzero or the array's size instead.) To aggregate over a specific axis, use the axis argument. Each of these functions also has a version designed to ignore NaN values, such as nansum.
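
As a small sketch on a 2 x 3 array with one missing value (the data is made up), `axis=0` collapses the rows and gives one result per column, while `axis=1` collapses the columns and gives one result per row. The `nan` variants skip the NaN instead of propagating it:

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 6.0]])

col_sum = np.sum(x, axis=0)         # NaN propagates into the middle column
col_nansum = np.nansum(x, axis=0)   # NaN is ignored
row_nansum = np.nansum(x, axis=1)

print(col_sum.tolist())             # [5.0, nan, 9.0]
print(col_nansum.tolist())          # [5.0, 2.0, 9.0]
print(row_nansum.tolist())          # [6.0, 10.0]
print(np.nanmax(x))                 # 6.0
```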


In [None]:
x = np.array([1,2,3,4,5,6])
print(np.sum(x))
x = np.array([1,2,np.nan,4,5,6])
print(np.sum(x))

In [None]:
x = np.array([1,2,np.nan,4,5,6])
print(np.nansum(x))

In [None]:
x = np.array([[(y+1)*(2*x+1) for y in range(10)] for x in range(10)])
print(x)
print(np.sum(x))
print(x.sum())
print(np.sum(x,axis=0))
print(np.sum(x,axis=1))
print(x.sum(1))

#### Exercise 3.4: What is the mean of the sum of each column in x?

In [None]:
x = np.array([[(y+1)*(x+1) for y in range(-5,5)] for x in range(10)])
x.sum(0)

In [None]:
# Your answer here:

#%load ./answers/03.4.py

## Modifying shape
You will often need to append additional dimensions, elements within dimensions, or generally change the shape of a multi-dimensional array.

reshape, expand_dims, and concatenate are your friends for this.
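
A quick way to see what each function does is to track the shapes. The sketch below uses a made-up six-element array; the variable names are only illustrative.

```python
import numpy as np

x = np.arange(6)                                  # shape (6,)

grid = x.reshape(2, 3)                            # same data, shape (2, 3)
auto = x.reshape(-1, 2)                           # -1 lets NumPy infer that dimension: (3, 2)
col = np.expand_dims(x, axis=1)                   # adds a new axis: (6, 1)

# concatenate joins arrays along an existing axis
wide = np.concatenate((col, col * 2), axis=1)     # side by side: (6, 2)
tall = np.concatenate((col, col * 2), axis=0)     # stacked: (12, 1)

for name, arr in [("grid", grid), ("auto", auto), ("col", col),
                  ("wide", wide), ("tall", tall)]:
    print(name, arr.shape)
```

The total number of elements must stay the same under reshape, so a six-element array can become (2, 3) or (3, 2) but not (4, 2).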

In [None]:
x = np.arange(12)+1
print(x)
print(np.reshape(x, (4,3)))
print(np.reshape(x, (3,2,2)))

In [None]:
x = np.arange(10)
print(x)
x = np.expand_dims(x, axis=1)
print(x)
y = 2*x
print(np.concatenate((x,y),axis=1))

In [None]:
print(np.hstack((x,y)))
print(np.vstack((x,y)))

#### Exercise 3.5: Transpose x to a 100 x 1 matrix using reshape.

In [None]:
x = np.arange(100)
print(x.reshape((10,10)))
print(x.reshape((10,10)).T)

In [None]:
# Your answer here:

#%load ./answers/03.5.py