# NumPy

NumPy is a powerful package for fast array/matrix computing. It is the foundation for other scientific computing and data analytics packages in Python. Its main features are:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transformation, and random number capabilities

We only cover NumPy topics frequently used in analytics. See <a href="https://www.numpy.org/devdocs/user/quickstart.html">the official NumPy tutorial at numpy.org</a> for a complete guide.

## Basics

### Import the NumPy package
To use NumPy, you need to first import the NumPy package. It is a well-accepted convention to use "np" as an alias for this package:

In [1]:
import numpy as np

### Create ndarray object
At the core of NumPy is the `ndarray` object, an N-dimensional array object of homogeneous data types. `ndarray` is also called `array` as the latter is an alias of the former in NumPy.

We can convert Python lists to ndarray objects using `numpy.array()`:

In [2]:
# create ndarray from list
x = [1, 2, 3, 4]
arr1 = np.array(x)

In [3]:
type(arr1)

numpy.ndarray

In [4]:
type(x)

list
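To illustrate what "homogeneous data types" means in practice, here is a minimal sketch (the variable names `mixed` and `items` are made up for this example). When a list mixes integers and floats, `np.array()` converts every element to one common type, whereas the list keeps each element's own type.

```python
import numpy as np

# Mixed int/float input is upcast so that all elements share one dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
print(mixed.tolist())  # [1.0, 2.5, 3.0]

# A Python list, by contrast, keeps each element's own type.
items = [1, 2.5, 3]
print([type(v).__name__ for v in items])  # ['int', 'float', 'int']
```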

Exercise: Create an ndarray from the list [[1,2,3,4], [5, 6, 7, 8]], and print it.

In [6]:
# np.array() is needed here; assigning the list alone would leave it a plain Python list
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr2)

[[1 2 3 4]
 [5 6 7 8]]


We can also create ndarray objects using built-in NumPy functions: `arange`, `ones`, `zeros`, `empty`, `eye`

In [7]:
# arange() for ndarray is similar to range() for list
np.arange(2, 10, 2)

array([2, 4, 6, 8])

In [8]:
np.ones(4)

array([1., 1., 1., 1.])

In [9]:
np.ones([2, 3])
# This can also be written as np.ones((2, 3))

array([[1., 1., 1.],
       [1., 1., 1.]])
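The creation functions listed above share a similar calling pattern. The sketch below is only an illustration (the variable names are arbitrary): it shows `eye`, which the cells here do not demonstrate, and the optional `dtype` argument that `ones` and `zeros` accept.

```python
import numpy as np

identity = np.eye(3)                  # 3x3 identity matrix (floats by default)
int_ones = np.ones((2, 3), dtype=int) # request integers instead of floats
evens = np.arange(0, 10, 2)           # start, stop (exclusive), step

print(identity.tolist())  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(int_ones.tolist())  # [[1, 1, 1], [1, 1, 1]]
print(evens.tolist())     # [0, 2, 4, 6, 8]
```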

### Exercises

Create a 3x4 array of all ones

In [10]:
np.ones([3,4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

Create a 4x4 array of all zeros

In [11]:
np.zeros([4,4])

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

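Try `np.empty(5)` and `np.empty(10)`, and observe the outputs. `np.empty` allocates an array without initializing it, so the values you see are arbitrary and will likely differ from the ones shown here. It is a good habit to look up any function that is not obvious to you when trying it.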
+ Or, use the Tab key for quick help.

In [12]:
np.empty(5)

array([1.15611963e-311, 4.32673299e-307, 1.33360363e+241, 9.39106209e+093,
       3.51074257e-312])

In [13]:
np.empty(10)

array([1.15611820e-311, 1.77863633e-322, 0.00000000e+000, 0.00000000e+000,
       1.15436571e-311, 5.10873596e-066, 3.27042117e+179, 4.71081280e-090,
       4.76441576e-038, 1.15482334e-311])
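The odd-looking numbers above are whatever happened to be in memory, because `np.empty` skips initialization. A minimal sketch of the usual pattern, assuming you intend to overwrite every element before reading it (`buf` is just an illustrative name):

```python
import numpy as np

buf = np.empty(5)            # contents are arbitrary until written
print(buf.shape, buf.dtype)  # (5,) float64

buf[:] = 0.0                 # assign every element before using the array
print(buf.tolist())          # [0.0, 0.0, 0.0, 0.0, 0.0]
```

If you need a known starting value, `np.zeros` or `np.ones` is the safer choice.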

### The properties of ndarray object
Check the array **properties** `.ndim`, `.shape`, and `.dtype`.

In [18]:
x = np.array([[ 0,  1,  2,  3,  4],
              [ 5,  6,  7,  8,  9],
              [10, 11, 12, 13, 14]])
print(x)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


Find out the number of dimensions of x

In [15]:
x.ndim

2

Find out the shape of x

In [16]:
x.shape

(3, 5)

Find out the data type of x

In [20]:
type(x)

numpy.ndarray

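Notice that `type(x)` and `x.dtype` serve different purposes! `type(x)` returns the class of the object itself (`numpy.ndarray`), while `x.dtype` returns the data type of the elements stored in the array, which is what this exercise asks for.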
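A small sketch of that difference, using two illustrative arrays `a` and `b`: both are `ndarray` objects, yet they store elements of different types. The exact integer dtype depends on the platform, so the example does not print it.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])   # integer elements
b = a.astype(float)                    # same values stored as floats

print(type(a) is type(b))   # True: both are numpy.ndarray
print(a.dtype == b.dtype)   # False: the element types differ
print(b.dtype)              # float64
```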

### Change data types for ndarrays

In [21]:
x = np.array([1.3, 2.4, 3.5])

In [22]:
x.dtype

dtype('float64')

There are two equivalent ways to change the data type of an ndarray object. Both return a new array and leave `x` unchanged:

In [23]:
# option 1
y = np.array(x, dtype='int')
y.dtype

dtype('int32')

In [24]:
# option 2
y = x.astype(int)
y

array([1, 2, 3])
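Note that converting floats to integers drops the fractional part instead of rounding, as the output above shows for 2.4 and 3.5. The sketch below makes this explicit with a negative value and shows one possible way to round first, using `np.rint` (an addition for illustration, not part of the original cells).

```python
import numpy as np

x = np.array([1.3, 2.7, -3.9])

truncated = x.astype(int)            # fractional part is discarded (toward zero)
rounded = np.rint(x).astype(int)     # round to the nearest integer first

print(truncated.tolist())  # [1, 2, -3]
print(rounded.tolist())    # [1, 3, -4]
print(x.dtype)             # float64: the original array is unchanged
```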

## Operations between arrays

### Two arrays of the same shape
Any arithmetic operation between equal-size arrays is applied *element-wise*.
- (Not required) This is often called **vectorization**. Modern CPUs/GPUs are often designed for fast vectorized operations.

In [25]:
x = np.ones([4, 5])
x

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [26]:
# ndarray.reshape() is a commonly used method to change the shape of an array.
y = np.arange(20).reshape((4, 5))
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [27]:
x + y

array([[ 1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10.],
       [11., 12., 13., 14., 15.],
       [16., 17., 18., 19., 20.]])

In [28]:
x * y

array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]])
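To make "element-wise" concrete, the following sketch compares `a + b` with an explicit double loop that adds one pair of elements at a time. The arrays `a` and `b` are small stand-ins chosen for this example; the two results should match, and the vectorized form is both shorter and typically much faster.

```python
import numpy as np

a = np.arange(6).reshape((2, 3))
b = np.ones((2, 3))

# Explicit Python loops over every (row, column) position
looped = np.zeros((2, 3))
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] + b[i, j]

# Vectorized form: one expression, same result
vectorized = a + b

print(np.array_equal(looped, vectorized))  # True
print(vectorized.tolist())                 # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```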

### Array and Scalar
Arithmetic operations with scalars propagate the value to each element.

In [29]:
# recall what y is:
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

Exercise: What is the result of y + 1?

In [30]:
y+1

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

The above is a simple example of an important NumPy concept called **broadcasting**. Simply put, broadcasting automatically stretches a smaller array (here, the scalar 1) to match the shape of a larger array, so that an element-wise arithmetic operation makes sense.
- Used with caution, broadcasting can significantly simplify our code.
- (Not required) If you are interested in learning more, see (https://numpy.org/devdocs/user/theory.broadcasting.html) for details.
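Broadcasting is not limited to scalars. As a rough sketch, the example below adds a one-dimensional row and a single-column array to a 4x5 array; the names `row` and `col` are illustrative. It also shows that shapes which cannot be matched raise a `ValueError`, which is one reason to use broadcasting with caution.

```python
import numpy as np

y = np.arange(20).reshape((4, 5))
row = np.array([10, 20, 30, 40, 50])          # shape (5,): added to every row
col = np.array([[100], [200], [300], [400]])  # shape (4, 1): added to every column

print((y + row).shape)            # (4, 5)
print((y + row)[0].tolist())      # [10, 21, 32, 43, 54]
print((y + col)[:, 0].tolist())   # [100, 205, 310, 415]

# Shapes (4, 5) and (3,) cannot be matched, so NumPy refuses to guess.
try:
    y + np.array([1, 2, 3])
except ValueError as err:
    print("cannot broadcast:", err)
```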

## Indexing and Slicing
Indexing and slicing work much like slicing Python lists and extend naturally to n dimensions.

In [31]:
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [32]:
y[0, 0]

0

In [33]:
y[3, 4]

19

In [34]:
y[-1, -1]

19

In [35]:
y[0, :]

array([0, 1, 2, 3, 4])

In [36]:
y[:, -1]

array([ 4,  9, 14, 19])

In [37]:
y[0:2, 0:2]

array([[0, 1],
       [5, 6]])

Exercise: Select the first three rows of y

In [40]:
y[0:3, :]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

Exercise: Select the first and third rows of y

In [43]:
# indices are zero-based, so the first and third rows are 0 and 2
y[[0,2],:]

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14]])

**An important difference between NumPy `array` and Python `list`**: array slices are views on the original array. Any modification to the view will be reflected in the source array.

In [44]:
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [45]:
y[0:2, :] = 0

In [46]:
y

array([[ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
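The contrast with Python lists can be checked directly. In this sketch (all variable names are illustrative), a list slice is an independent copy, while an array slice shares memory with its source. It also shows that selecting with a list of indices, as in the row-selection exercise above, produces a copy and not a view.

```python
import numpy as np

lst = [0, 1, 2, 3]
lst_slice = lst[0:2]        # slicing a list makes a copy
lst_slice[0] = 99
print(lst)                  # [0, 1, 2, 3]: the list is unchanged

arr = np.arange(4)
arr_slice = arr[0:2]        # slicing an array makes a view
arr_slice[0] = 99
print(arr.tolist())         # [99, 1, 2, 3]: the source array changed
print(np.shares_memory(arr, arr_slice))  # True

picked = arr[[0, 1]]        # indexing with a list of positions makes a copy
print(np.shares_memory(arr, picked))     # False
```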

**If you want to avoid changing the original array, make a copy.** Any change to the copy won't affect the original array.

In [47]:
y = np.arange(20).reshape((4, 5))
x = y.copy()
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [48]:
x[0:2, :] = 1

In [49]:
x

array([[ 1,  1,  1,  1,  1],
       [ 1,  1,  1,  1,  1],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [50]:
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

### Boolean Indexing

In [51]:
z = x - 12
z

array([[-11, -11, -11, -11, -11],
       [-11, -11, -11, -11, -11],
       [ -2,  -1,   0,   1,   2],
       [  3,   4,   5,   6,   7]])

Sometimes we want to select or modify elements of an array based on a logical condition. Below are two common examples.

First, suppose we want to extract all nonnegative elements from z:

In [52]:
nonnegative_elements_of_z = z[z>=0]
nonnegative_elements_of_z

array([0, 1, 2, 3, 4, 5, 6, 7])

Second, suppose we want to set all negative elements in z to 0. To do so, we first create an array of boolean values:

In [53]:
check_if_below_0 = z < 0
check_if_below_0

array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True, False, False, False],
       [False, False, False, False, False]])

Then use it to set all negative elements in z to 0:

In [54]:
z[check_if_below_0] = 0
z

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2],
       [3, 4, 5, 6, 7]])

In [55]:
# the above can also be done with the following simpler code:
z[z<0] = 0
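Two related patterns are worth sketching here, with the caveat that they go slightly beyond the cells above. Conditions can be combined with `&` (and) and `|` (or), each wrapped in parentheses, and `np.where` builds a modified array without changing the original. The small array `z` below is defined just for this example.

```python
import numpy as np

z = np.array([[-2, -1, 0], [1, 2, 3]])

# Combine conditions with & / |; the parentheses are required.
in_range = (z >= 0) & (z < 3)
print(z[in_range].tolist())     # [0, 1, 2]

# np.where(condition, value_if_true, value_if_false) returns a new array.
clipped = np.where(z < 0, 0, z)
print(clipped.tolist())         # [[0, 0, 0], [1, 2, 3]]
print(z.tolist())               # [[-2, -1, 0], [1, 2, 3]]: z is untouched
```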

## How to represent missing values and infinity?
Missing values can be represented using the `np.nan` object, while `np.inf` represents infinity. Any arithmetic involving `np.nan` yields `np.nan`, as the example below shows.

In [56]:
x = np.ones((3, 4))
x

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [57]:
x[0, 0] = np.nan
x

array([[nan,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [58]:
x + 1

array([[nan,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.],
       [ 2.,  2.,  2.,  2.]])
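Because `np.nan` is not equal to anything, including itself, missing values cannot be found with `==`. A minimal sketch of the usual tools follows: `np.isnan`, `np.isinf`, `np.isfinite`, and the NaN-aware `np.nanmean`. The sample values are made up for illustration.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.inf])

print(np.nan == np.nan)        # False: use np.isnan instead of ==
print(np.isnan(x).tolist())    # [False, True, False, False]
print(np.isinf(x).tolist())    # [False, False, False, True]

finite = x[np.isfinite(x)]     # drop both nan and inf
print(finite.tolist())         # [1.0, 3.0]

# Ordinary mean would return nan; nanmean skips the missing value.
print(np.nanmean(np.array([1.0, np.nan, 3.0])))  # 2.0
```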

## Functions and Statistics

### Universal Functions `ufuncs`: Fast Element-wise Array Functions

Many commonly used math functions, e.g., `sqrt()` and `maximum()`, are supported by NumPy through universal functions (a.k.a. ufuncs). NumPy applies these functions *element-wise*, broadcasting the inputs where needed. Let's look at a few examples. For more details, see (https://docs.scipy.org/doc/numpy/reference/ufuncs.html).

In [59]:
x = np.random.randint(0, 11, (4, 3))
x

array([[10,  6,  6],
       [ 6,  0,  4],
       [ 3, 10,  2],
       [ 2,  6,  0]])

Unary ufuncs, e.g. `abs()` and `sqrt()`, take one argument and perform element-wise transformations.

In [60]:
np.sqrt(x)

array([[3.16227766, 2.44948974, 2.44948974],
       [2.44948974, 0.        , 2.        ],
       [1.73205081, 3.16227766, 1.41421356],
       [1.41421356, 2.44948974, 0.        ]])

Binary ufuncs, e.g. `subtract()`, take two arguments.

In [61]:
np.subtract(x, np.ones((4, 3)))

array([[ 9.,  5.,  5.],
       [ 5., -1.,  3.],
       [ 2.,  9.,  1.],
       [ 1.,  5., -1.]])

In [62]:
# alternatively, we can write
x - np.ones((4, 3))

array([[ 9.,  5.,  5.],
       [ 5., -1.,  3.],
       [ 2.,  9.,  1.],
       [ 1.,  5., -1.]])

In [63]:
# alternatively, we can write
x-1.

array([[ 9.,  5.,  5.],
       [ 5., -1.,  3.],
       [ 2.,  9.,  1.],
       [ 1.,  5., -1.]])

One more example:

In [64]:
np.maximum(x, np.ones((4, 3))*5)

array([[10.,  6.,  6.],
       [ 6.,  5.,  5.],
       [ 5., 10.,  5.],
       [ 5.,  6.,  5.]])

### Use a `lambda` function to perform complex element-wise array operations

In [65]:
# suppose we want to normalize x and center it around 0
f = lambda e: (e-5)/10

In [66]:
f(x)

array([[ 0.5,  0.1,  0.1],
       [ 0.1, -0.5, -0.1],
       [-0.2,  0.5, -0.3],
       [-0.3,  0.1, -0.5]])
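The lambda above works on a whole array because its body uses only arithmetic operators, which NumPy already applies element-wise. A function that needs a per-element branch cannot use a plain Python `if` on an array; one possible approach, sketched here with a hypothetical function `g`, is to express the branch with `np.where`.

```python
import numpy as np

x = np.array([[10, 6], [0, 4]])

# Pure arithmetic: applied to every element automatically.
f = lambda e: (e - 5) / 10
print(f(x).tolist())    # [[0.5, 0.1], [-0.5, -0.1]]

# Per-element branch: keep values above 5, replace the rest with 0.
g = lambda e: np.where(e > 5, e, 0)
print(g(x).tolist())    # [[10, 6], [0, 0]]
```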

### Statistics
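NumPy statistics functions include `median`, `average`, `mean`, `amax`, `amin`, and many others. The `axis` argument selects the dimension to aggregate over: `axis=0` gives one result per column, and `axis=1` gives one result per row.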

<a href="https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html"> Reference </a>

In [67]:
x = np.random.randint(0, 11, (4, 5))
x

array([[ 9,  4,  0,  2,  9],
       [10,  7,  0,  4,  5],
       [ 1,  4,  4, 10,  9],
       [ 8,  6, 10,  1,  9]])

In [68]:
np.median(x, axis=0)

array([8.5, 5. , 2. , 3. , 9. ])

In [69]:
np.median(x, axis=1)

array([4., 5., 4., 8.])

In [70]:
np.mean(x, axis=0)

array([7.  , 5.25, 3.5 , 4.25, 8.  ])

In [71]:
np.mean(x, axis=1)

array([4.8, 5.2, 5.6, 6.8])

In [72]:
np.min(x, axis=0)

array([1, 4, 0, 1, 5])
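The `axis` argument is easy to mix up, so here is a small sketch on a 2x3 array whose sums can be verified by hand. `axis=0` collapses the rows and returns one value per column, `axis=1` collapses the columns and returns one value per row, and omitting `axis` aggregates over all elements.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])             # shape (2, 3)

print(np.sum(x, axis=0).tolist())    # [5, 7, 9]: one value per column
print(np.sum(x, axis=1).tolist())    # [6, 15]: one value per row
print(int(np.sum(x)))                # 21: all elements
print(np.mean(x, axis=0).shape)      # (3,)
```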

### Exercises

Create a random integer array `x` of shape (4, 3) with values between 0 and 10 (inclusive of both 0 and 10).

In [73]:
# the upper bound of randint() is exclusive, so use 11 to include 10
x = np.random.randint(0, 11, (4, 3))
x

array([[1, 9, 9],
       [5, 3, 6],
       [0, 3, 4],
       [1, 8, 1]])

Calculate the mean of each column

In [75]:
# axis=0 aggregates over the rows, giving one mean per column
np.mean(x, axis=0)

array([1.75, 5.75, 5.  ])

Subtract the mean of each column from the matrix
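One possible solution, sketched with the sample values of `x` shown above hard-coded so the result is reproducible: compute the column means with `axis=0`, then rely on broadcasting to subtract that length-3 array from every row. The names `col_means` and `centered` are illustrative.

```python
import numpy as np

x = np.array([[1, 9, 9],
              [5, 3, 6],
              [0, 3, 4],
              [1, 8, 1]])

col_means = x.mean(axis=0)     # shape (3,): one mean per column
centered = x - col_means       # broadcast across the 4 rows

print(col_means.tolist())      # [1.75, 5.75, 5.0]
print(centered[0].tolist())    # [-0.75, 3.25, 4.0]
print(np.allclose(centered.mean(axis=0), 0))  # True: each column now averages 0
```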

Change the elements of x that are less than 5 to 0

In [77]:
x[x<5]=0

In [78]:
x

array([[0, 9, 9],
       [5, 0, 6],
       [0, 0, 0],
       [0, 8, 0]])