# Mini Intro to Numpy
by Liang Jin

Part of AcF701 Python Sessions:
- [github.com/drliangjin/mini-python-book](https://github.com/drliangjin/mini-python-book)

Official NumPy Doc:
- [numpy.org](http://www.numpy.org/)

## NumPy -- Numerical Python
One of the most important foundational packages for numerical computing in Python.
- ndarray: multi-dimensional array
- mathematical functions
- linear algebra, random number generation, and so on

NumPy-based algorithms are generally **10** to **100** times faster (or more) than their pure Python equivalents.

### Getting started with NumPy

In [None]:
import numpy

In [None]:
# conventional alias: almost all NumPy code imports the package as np
import numpy as np

#### NumPy ndarray Object

In [None]:
# Generate random data
# np ==> the alias for NumPy imported above
# random ==> a sub-module of NumPy
# randn ==> a function in the random sub-module; draws from the standard normal distribution
arr1 = np.random.randn(2, 3) # ==> 2 rows, 3 columns

In [None]:
arr1

In [None]:
# basic attributes on the array
arr1.ndim, arr1.shape, arr1.dtype

#### Creating ndarrays

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

In [None]:
# we can specify a data type
arr3 = np.array(data2, dtype=np.float64)
arr3

In [None]:
# or cast an array using astype method
arr4 = arr3.astype('int64')
arr3.dtype, arr4.dtype
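
One detail worth knowing when casting: converting floats to integers with `astype` drops the fractional part instead of rounding. The sketch below uses made-up values (the name `floats` is only for illustration) to show the difference between casting directly and rounding first.

```python
import numpy as np

# hypothetical example values, not taken from the session data
floats = np.array([1.7, -1.7, 2.5, 3.0])

# astype returns a new array; the fractional part is dropped (truncation toward zero)
as_ints = floats.astype(np.int64)

# rounding first gives the nearest integer (np.rint rounds halves to the even neighbour)
rounded = np.rint(floats).astype(np.int64)

print(as_ints)   # values 1, -1, 2, 3
print(rounded)   # values 2, -2, 2, 3
```

Note that `floats` itself is unchanged: `astype` always builds a new array.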

#### Arithmetic operations

In [None]:
# arithmetic with a scalar: the operation is applied to every element
arr3 * 2

In [None]:
# arithmetic between two equal-shaped arrays: applied element by element
arr3 - arr3
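
To see why this matters, it may help to compare an array with the plain nested list it was built from. This is a small sketch rather than part of the original session; the names `doubled`, `squared` and `repeated` are introduced here for illustration.

```python
import numpy as np

data = [[1, 2, 3, 4], [5, 6, 7, 8]]   # same nested list as data2 above
arr = np.array(data, dtype=np.float64)

doubled = arr * 2      # every element is multiplied by 2
squared = arr * arr    # element-by-element product, not a matrix product
repeated = data * 2    # a plain list is repeated instead, giving 4 rows

print(doubled)
print(squared)
print(len(repeated))   # 4
```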

#### Indexing and Slicing

In [None]:
# create an array using arange, similar to python's built-in range
arr = np.arange(10) # again, 10 elements from 0 to 9
arr

In [None]:
# retrieve element(s)
arr[5], arr[5:]

In [None]:
# update element(s).
arr[5:] = -99
arr

In [None]:
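# WARNING: a slice is a view on the original array, not a copy,
# so changing the slice also changes arr; use arr[5:].copy() for an independent copy
arr_slice = arr[5:]
arr_slice[1] = -100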
arr
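
The difference between a view and a copy can be checked directly. The following is a minimal sketch, assuming a fresh array; `view` and `independent` are illustrative names, and `np.shares_memory` is used only to confirm which of them is tied to the original.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:]                 # a view: shares memory with arr
independent = arr[5:].copy()   # an independent copy

view[0] = -1          # also changes arr[5]
independent[1] = -2   # leaves arr untouched

print(arr)   # only position 5 has changed
print(np.shares_memory(arr, view), np.shares_memory(arr, independent))
```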

#### Boolean Indexing

In [None]:
# generate random data
data = np.random.randn(2, 5)
data

In [None]:
# create a new boolean array: True where the condition holds
cond = data <= 0
cond

In [None]:
# filter data using conditions
data_cond = data[cond]
data_cond
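
Boolean masks can also be combined and used for assignment. The sketch below uses fixed, made-up values instead of random data so the result is easy to follow; the names `negative`, `small` and `clipped` are illustrative only.

```python
import numpy as np

# hypothetical values, chosen so the outcome is easy to verify by eye
values = np.array([[-1.5, 0.3, 2.0],
                   [0.0, -0.2, 1.1]])

negative = values < 0
# combine conditions with & (and) or | (or); Python's `and`/`or` do not work on arrays
small = (values > -1) & (values < 1)

clipped = values.copy()
clipped[negative] = 0    # set every negative entry to zero

print(values[small])     # the entries 0.3, 0.0 and -0.2, as a 1-D array
print(clipped)
print(negative.sum())    # True counts as 1, so this is the number of negatives: 2
```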

#### NumPy functions (1) 

In [None]:
# create an array
arr = np.arange(5)
arr

In [None]:
# universal functions (ufuncs) apply a function to every element of an array
arr_sqrt = np.sqrt(arr) # <= fast element-wise operation
arr_sqrt

More unary functions: `abs`, `square`, `exp`, `log`, and so on.
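
A few of these in action, as a short sketch on a small integer array (the variable names are illustrative):

```python
import numpy as np

arr = np.arange(1, 6)          # 1, 2, 3, 4, 5

squares = np.square(arr)       # 1, 4, 9, 16, 25
logs = np.log(arr)             # natural logarithm of each element
restored = np.exp(logs)        # exp undoes log, up to floating-point rounding
magnitudes = np.abs(arr - 3)   # 2, 1, 0, 1, 2

print(squares)
print(np.allclose(restored, arr))
print(magnitudes)
```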

#### NumPy functions (2)

In [None]:
x = np.random.randn(5)
x

In [None]:
y = np.random.randn(5)
y

In [None]:
# binary functions take two arrays and work element by element
# obtain the element-wise maximum of two arrays
np.maximum(x, y)

More binary functions: `add`, `subtract`, `multiply`, and so on.
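
These binary functions are what the arithmetic operators use under the hood, and `np.maximum` is easy to confuse with `np.max`. A small sketch with made-up values:

```python
import numpy as np

x = np.array([1.0, 5.0, -2.0])   # hypothetical values
y = np.array([3.0, 0.5, -4.0])

total = np.add(x, y)               # same result as x + y
pairwise_max = np.maximum(x, y)    # element-wise maximum of two arrays
overall_max = np.max(x)            # single largest value within one array

print(total)          # values 4.0, 5.5, -6.0
print(pairwise_max)   # values 3.0, 5.0, -2.0
print(overall_max)    # 5.0
```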

#### Mathematical and Statistical Methods

In [None]:
arr = np.random.randn(2, 5) # <= 2 rows, 5 columns

In [None]:
# obtain the mean of elements in the array
arr.mean(), np.mean(arr)

In [None]:
# what if we want row-wise mean instead of whole array?
arr.mean(axis=1), np.mean(arr, axis=1)

In [None]:
# what if column-wise?
arr.mean(axis=0), np.mean(arr, axis=0)
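
The `axis` argument is easier to follow with fixed numbers than with random ones. The sketch below uses `np.arange(10).reshape(2, 5)` as a stand-in for the random array; the axis named is the one that gets collapsed.

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)   # 2 rows, 5 columns: [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

row_means = arr.mean(axis=1)   # collapse the columns: one value per row
col_means = arr.mean(axis=0)   # collapse the rows: one value per column

print(row_means)   # values 2.0 and 7.0
print(col_means)   # values 2.5, 3.5, 4.5, 5.5, 6.5
```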

#### Why do we need vectorized computation? Why all this array "nonsense"?
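
One way to get a feel for the answer is to time the same calculation done both ways. The sketch below is a rough comparison only: the array size is an arbitrary choice, and the actual timings depend on the machine and the NumPy version.

```python
import time
import numpy as np

n = 1_000_000             # arbitrary size, large enough to make the gap visible
values = list(range(n))
arr = np.arange(n)

start = time.perf_counter()
loop_result = [v * 2 for v in values]   # pure Python: one element at a time
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = arr * 2                    # NumPy: a single vectorized operation
vec_seconds = time.perf_counter() - start

print(f"list comprehension: {loop_seconds:.4f} s")
print(f"NumPy array:        {vec_seconds:.4f} s")
```

Both versions produce the same numbers; the NumPy version avoids running the Python interpreter once per element.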


#### An Example: Random Walks

In [None]:
# Python built-in "loop" style
%matplotlib inline
import random
import matplotlib.pyplot as plt
position = 0 # <== starting point
walk = [position] # <== a list with the starting point
steps = 100 # <= 100 steps
for i in range(steps):
    step = 1 if random.randint(0,1) else -1 # 0 is False, 1 is True
    position += step # <== incremental operations: position = position + step
    walk.append(position) # append new position to the list
# plot data
plt.plot(walk);

#### NumPy approach using arrays

In [None]:
nsteps = 1000

In [None]:
draws = np.random.randint(0, 2, size=nsteps) # <= random draws of 0 or 1 (the upper bound, 2, is excluded)

In [None]:
np_steps = np.where(draws > 0, 1, -1) # if draw = 1 then step 1, otherwise, -1

In [None]:
np_walk = np.cumsum(np_steps) # or np_steps.cumsum(), NumPy method, cumulative sum

In [None]:
plt.plot(np_walk);

#### Arrays make it easy to simulate many random walks at once!

In [None]:
nwalks = 10000 # 10,000 simulations
nsteps = 1000

draws = np.random.randint(0, 2, size=(nwalks, nsteps)) # <== again 0 or 1
steps = np.where(draws > 0, 1, -1) # change 0 to -1
walks = steps.cumsum(axis=1) # cumulative sum along each row: one walk per row

walks # 10000 simulations... all in an array....
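
With all walks in one array, questions about the whole set of simulations become one-liners. The following is a sketch under stated assumptions: the seed, the smaller number of walks, and the threshold of 30 are arbitrary choices for illustration, not values from the session.

```python
import numpy as np

np.random.seed(12345)          # arbitrary seed, only to make the sketch repeatable
nwalks, nsteps = 1000, 1000    # fewer walks than above to keep it quick

draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)

threshold = 30   # hypothetical distance of interest
# True for each walk that is ever at least `threshold` away from the origin
hits = (np.abs(walks) >= threshold).any(axis=1)
# argmax returns the index of the first True along each row
first_hit = (np.abs(walks[hits]) >= threshold).argmax(axis=1)

print(walks.max(), walks.min())   # furthest positions reached by any walk
print(hits.sum(), "of", nwalks, "walks reach the threshold")
print(first_hit.mean())           # average step index of the first crossing
```

The same questions answered with Python loops would need nested iteration over walks and steps.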