# Introduction

NumPy is a core component of numerical computing in Python. This Notebook aims to provide a foundational understanding of NumPy to help you analyse and make sense of large datasets. In particular, this Notebook teaches core NumPy functionalities, including:
- fast vectorised array operations for data cleaning, filtering, and transformation;
- common array algorithms such as sorting, unique, and set operations;
- descriptive statistics and aggregating data; and 
- data manipulation for merging and joining together datasets.

# What is NumPy?

Numerical Python, or NumPy, is one of the most widely used open-source numerical computing packages for Python. Many scientific computing packages, such as scikit-learn, rely on NumPy's array objects as the format for data exchange. 

Here are some of the tools and functionalities that NumPy offers:  
- Powerful N-dimensional arrays: an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities;
- Mathematical functions for fast operations on large arrays of data without writing loop structures;
- Numerical computing tools: comprehensive mathematical functions such as random number generators, linear algebra routines, and Fourier transforms; and
- A C API for connecting NumPy with libraries written in C, C++, and Fortran.

## What is the NumPy N-dimensional array?

An array is a collection of items stored at contiguous memory locations. The idea is to store multiple items of the same type together. One of the key characteristics of arrays is that they allow you to perform mathematical operations on whole blocks of data. 

One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python. An ndarray is a multidimensional container of items of the same type and size. Every array has a `shape`, a tuple indicating the size of each dimension, and a `dtype`, an object describing the data type of the array.

This section will teach you the basics of using NumPy arrays, including creating ndarrays, the various data types of ndarrays, converting an array from one dtype to another, and more.

## Creating ndarrays

The easiest way to create an array is to use the `array` function. This function accepts any sequence-like object and produces a NumPy array containing the passed data.

In [1]:
# first you need to import the numpy package
# this needs to be done only once in a notebook
import numpy as np

myData = [1, 2, 3, 4, 5]
myArray = np.array(myData)
myArray

array([1, 2, 3, 4, 5])

In [12]:
myData2 = [15.94, 95.42, 49.03, 16.87, 77.09]
myArray2 = np.array(myData2)
myArray2

array([15.94, 95.42, 49.03, 16.87, 77.09])

In [13]:
# you can even create an ndarray from a list of lists
myData3 = [[1,38,41,78,14], [96,81,84,73,22]]
myArray3 = np.array(myData3)
myArray3

array([[ 1, 38, 41, 78, 14],
       [96, 81, 84, 73, 22]])

Unless explicitly specified, the `array` function automatically infers a data type for the array it creates. The data type is stored in a special `dtype` metadata object and can be accessed as such:

In [11]:
myArray3.dtype

dtype('int64')
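To make the inference rule concrete, here is a small sketch. The variable names and sample values are made up for illustration, and the integer width you see may differ by platform.

```python
import numpy as np

# Hypothetical sample data, chosen only to show how dtypes are inferred.
ints_only = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])      # a single float makes the whole array float
flags = np.array([True, False])

print(ints_only.dtype)  # an integer dtype (the exact width depends on the platform)
print(mixed.dtype)      # float64
print(flags.dtype)      # bool
```

Because an ndarray holds items of a single type, NumPy picks a dtype that can represent every value in the input.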

## Data types of ndarrays

The data type or dtype is a special object containing the information the `ndarray` needs to interpret a chunk of memory as a particular type of data. The dtypes are named as such: a type name, like `float` or `int`, followed by a number indicating the number of bits per element. For example, the following code creates an ndarray of `int8`:

In [15]:
myArray = np.array([96, 81, 84, 73, 22], dtype=np.int8)
myArray

array([96, 81, 84, 73, 22], dtype=int8)

### NumPy supports the following data types:

**int8** and **uint8** – signed and unsigned 8-bit (1 byte) integer types

**int16** and **uint16** – signed and unsigned 16-bit integer types

**int32** and **uint32** – signed and unsigned 32-bit integer types

**int64** and **uint64** – signed and unsigned 64-bit integer types

**float16** – half-precision floating point

**float32** – standard single-precision floating point

**float64** – standard double-precision floating point

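**float128** – extended-precision floating point (only available on some platforms)

**complex64**, **complex128**, **complex256** – complex numbers represented by two 32-bit, 64-bit, or 128-bit floats, respectively

**bool** – boolean type storing `True` and `False` values

**string_** – fixed-length ASCII string type (1 byte per character); this alias was removed in NumPy 2.0, where the same type is spelled **bytes_**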
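The bit width in a dtype name determines both the memory used per element and the range of values it can hold. The following sketch inspects a few of the types listed above using `np.iinfo` and the `itemsize` attribute; the selection of types is arbitrary.

```python
import numpy as np

# Inspect the size and value range of a few integer dtypes.
for dtype in (np.int8, np.uint8, np.int16):
    info = np.iinfo(dtype)
    print(np.dtype(dtype).name, "bytes:", np.dtype(dtype).itemsize,
          "range:", info.min, "to", info.max)

# Floating point dtypes: the number in the name is the size in bits.
print(np.dtype(np.float32).itemsize)  # 4 bytes
print(np.dtype(np.float64).itemsize)  # 8 bytes
```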


### Converting an array from one dtype to another

It is possible to explicitly convert or cast an array from one dtype to another using ndarray’s `astype` method. In the following example, we cast integer values to floating points.

In [21]:
myArray = np.array([96, 81, 84, 73, 22]) #int64
myArray.dtype

float_myArray = myArray.astype(np.float64) #float64
float_myArray.dtype

dtype('float64')

The reverse can also be done: casting floating point values to integers. However, when a floating point value is cast to an integer dtype, the fractional part is truncated.

In [22]:
myArray = np.array([15.94,95.42,49.03])

int_myarray = myArray.astype(np.int64)
int_myarray

array([15, 95, 49])
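Truncation is not the same as rounding, which matters most for negative numbers. The sketch below uses made-up values and shows one possible way to round before casting, if that is what you need.

```python
import numpy as np

values = np.array([-1.7, -0.2, 0.9, 1.7])    # hypothetical sample values

truncated = values.astype(np.int64)           # fractional part dropped (toward zero)
rounded = np.round(values).astype(np.int64)   # round to nearest first, then cast

print(truncated)  # values: -1, 0, 0, 1
print(rounded)    # values: -2, 0, 1, 2
```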

If you have an array of strings that represent numbers, `astype` can be used to convert them to numeric form:

In [30]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_) # original array (np.bytes_ was spelled np.string_ before NumPy 2.0)
numeric_strings.astype(np.float64) # cast to floats

array([ 1.25, -9.6 , 42.  ])

It is also possible to use another array's dtype attribute:

In [59]:
myArray_float = np.array([15.94,95.42,49.03], dtype=np.float64) #float64 array

myArray_int = np.array([82,9,69,66,89]) #int64 array

myArray_int.astype(myArray_float.dtype).dtype #cast to myArray_float dtype

dtype('float64')

## ndarray attributes

### ndim

The `ndim` attribute returns the number of array dimensions.

In [61]:
myArray = np.array([[96, 81, 84, 73, 22], [82, 9, 69, 66, 89]], dtype=np.int8)
myArray.ndim

2

### shape

The `shape` attribute is used to get the current shape of an array, but may also be used to reshape the array in-place by assigning a tuple of array dimensions to it.

In [62]:
myArray.shape

(2, 5)
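The two attributes are closely related, as the following sketch illustrates. The array `grid` is a made-up example, and `size` is an additional ndarray attribute shown here only for comparison.

```python
import numpy as np

grid = np.zeros((2, 5), dtype=np.int8)   # hypothetical 2x5 array

print(grid.ndim)         # 2, the number of axes
print(grid.shape)        # (2, 5)
print(len(grid.shape))   # 2, always equal to ndim
print(grid.size)         # 10, the product of the entries in shape
```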

## ndarray functions

### np.zeros and np.ones

The `zeros` and `ones` functions create arrays of 0s or 1s, respectively, with a given length or shape.

In [68]:
np.zeros(10) #create a one-dimensional array of 10 zeros

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [67]:
np.ones((3,4)) #create a 3x4 array of ones

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### np.empty

The `empty` function returns a new array of given shape and type, without initializing entries.

In [70]:
np.empty((2,3)) #create an uninitialized 2x3 array

array([[0., 0., 0.],
       [0., 0., 0.]])

It’s not safe to assume that `np.empty` will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.

### np.arange

The `arange` function returns evenly spaced values within a given interval.

In [3]:
np.arange(10) #create an array of evenly spaced values from 0 to 9

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
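Like Python's built-in `range`, `arange` also accepts a start, stop, and step. A brief sketch with arbitrary values:

```python
import numpy as np

evens = np.arange(2, 10, 2)         # start at 2, stop before 10, step 2
countdown = np.arange(5, 0, -1)     # a negative step counts down
quarters = np.arange(0, 1, 0.25)    # unlike range, non-integer steps are allowed

print(evens)      # values: 2, 4, 6, 8
print(countdown)  # values: 5, 4, 3, 2, 1
print(quarters)   # values: 0.0, 0.25, 0.5, 0.75
```

In each case the stop value itself is excluded from the result.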

### np.reshape

The `reshape` function gives an array a new shape and, where possible, returns a view without copying any data. The `reshape` function accepts a tuple indicating the new shape of the array.

In [22]:
myArray = np.arange(10) #create an array of evenly spaced values from 0 to 9

myArray.reshape((5,2)) #reshape to 5x2

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
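The sketch below illustrates two details of `reshape`, using hypothetical names: passing `-1` lets NumPy infer one dimension, and for a simple case like this the reshaped array shares memory with the original.

```python
import numpy as np

flat = np.arange(10)
table = flat.reshape((5, -1))   # -1 asks NumPy to infer this dimension: 10 / 5 = 2

print(table.shape)                      # (5, 2)
print(np.shares_memory(flat, table))    # True: no data was copied here

table[0, 0] = 99                        # writing through the reshaped array...
print(flat[0])                          # ...changes the original too: 99
```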

## Arithmetic operations with NumPy

NumPy arrays are important because they allow you to perform batch computations on large data without writing explicit loops. This is called _vectorisation_.
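As a rough illustration of the idea, the sketch below doubles every element of a small, made-up array twice: once with an explicit Python loop and once with a single vectorised expression.

```python
import numpy as np

prices = np.arange(1, 6)   # hypothetical data: 1, 2, 3, 4, 5

# Loop version: build the result one element at a time.
doubled_loop = np.array([price * 2 for price in prices])

# Vectorised version: one expression applied to the whole array.
doubled_vectorised = prices * 2

print(np.array_equal(doubled_loop, doubled_vectorised))  # True
```

Both give the same values; the vectorised form is shorter and, on large arrays, typically much faster because the loop runs inside NumPy's compiled code.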

### +, -, *, /

Arithmetic operations between equal-size arrays apply the operation element-wise.

In [6]:
arr1 = np.array([[4,6,1,3,8],[5,10,2,9,7]])
arr1

array([[ 4,  6,  1,  3,  8],
       [ 5, 10,  2,  9,  7]])

In [8]:
arr1 * arr1

array([[ 16,  36,   1,   9,  64],
       [ 25, 100,   4,  81,  49]])

In [9]:
arr1 + arr1

array([[ 8, 12,  2,  6, 16],
       [10, 20,  4, 18, 14]])

Arithmetic operations with a scalar apply the operation between the scalar and each element in the array.

In [13]:
arr1/2 # each element in the array divided by 2

array([[2. , 3. , 0.5, 1.5, 4. ],
       [2.5, 5. , 1. , 4.5, 3.5]])

In [12]:
arr1 ** 2 #each element in the array to the power of two

array([[ 16,  36,   1,   9,  64],
       [ 25, 100,   4,  81,  49]])

### Comparisons (>, ==, <)

It is also possible to compare two arrays. Comparisons between arrays of the same size produce an array of boolean values.

In [14]:
arr2 = np.array([[4,63,11,3,8],[5,10,24,9,76]])

In [15]:
arr1 < arr2

array([[False,  True,  True, False, False],
       [False, False,  True, False,  True]])

In [18]:
arr1 == arr2

array([[ True, False, False,  True,  True],
       [ True,  True, False,  True, False]])
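Boolean arrays like these can be summarised directly. The sketch below reuses the values of `arr1` and `arr2` from above; the variable names for the results are illustrative.

```python
import numpy as np

arr1 = np.array([[4, 6, 1, 3, 8], [5, 10, 2, 9, 7]])
arr2 = np.array([[4, 63, 11, 3, 8], [5, 10, 24, 9, 76]])

smaller_count = (arr1 < arr2).sum()    # True counts as 1, False as 0
equal_count = (arr1 == arr2).sum()
all_equal = (arr1 == arr2).all()       # True only if every element matches
any_equal = (arr1 == arr2).any()       # True if at least one element matches

print(smaller_count, equal_count)  # 4 6
print(all_equal, any_equal)        # False True
```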

## Broadcasting

Broadcasting describes how arithmetic operations work between arrays of different shapes. For instance, in the following example, we _broadcast_ the scalar value 4 to all of the elements in `myArray` in the multiplication process.

In [24]:
myArray = np.arange(5)
myArray

array([0, 1, 2, 3, 4])

In [26]:
myArray * 4 #broadcast 4 to all elements in the multiplication process.

array([ 0,  4,  8, 12, 16])

In the following example, we demean each column of myArray2 by subtracting the column means.

In [28]:
myArray2 = np.random.randn(5,2)
myArray2

array([[-0.05545734,  0.16474914],
       [ 0.47784372, -0.80437864],
       [-0.53399124,  0.41830123],
       [-1.36253664,  0.3157035 ],
       [-0.77860918, -0.35351775]])

In [34]:
myMean = np.mean(myArray2, axis=0) #find mean of each column
demeaned = myArray2 - myMean
demeaned

array([[ 0.3950928 ,  0.21657765],
       [ 0.92839386, -0.75255013],
       [-0.08344111,  0.47012973],
       [-0.9119865 ,  0.367532  ],
       [-0.32805905, -0.30168925]])

In [36]:
np.mean(demeaned, axis=0) #the new column means are 0, up to floating-point rounding error

array([5.55111512e-17, 0.00000000e+00])

### Broadcasting Rule  
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when:
- they are equal, or
- one of them is 1

If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes.

Arrays do not need to have the same number of dimensions. For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. 
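The RGB case can be sketched as follows. The image here is a stand-in array of ones rather than real pixel data, and the scale factors are arbitrary.

```python
import numpy as np

image = np.ones((256, 256, 3))               # stand-in for real RGB data
channel_scale = np.array([0.5, 1.0, 2.0])    # hypothetical factors for R, G, B

# Shapes (256, 256, 3) and (3,) are compatible: the trailing dimensions match.
scaled = image * channel_scale

print(scaled.shape)   # (256, 256, 3)
print(scaled[0, 0])   # the first pixel: 0.5, 1.0, 2.0
```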

In [47]:
arr1 = np.arange(3) # shape = (3,)
arr2 = np.arange(4) # shape = (4,)

arr1 + arr2 # raises an error: the trailing dimensions (3 and 4) differ and neither is 1

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

In [46]:
arr3 = np.random.randn(3,2)
arr4 = np.arange(2)

arr3 + arr4 # this works because the trailing dimensions match (both are 2); the missing leading dimension of arr4 is treated as 1

array([[-1.18175879,  0.91772141],
       [-0.61557157,  1.52025603],
       [ 0.85810953,  1.19720677]])
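A common place where the rule bites is subtracting row means rather than column means. The sketch below uses a small made-up array: the direct subtraction fails because the trailing dimensions (2 and 5) are incompatible, while reshaping the means to a column of shape `(5, 1)` satisfies the rule.

```python
import numpy as np

data = np.arange(10, dtype=np.float64).reshape(5, 2)   # hypothetical 5x2 data
row_means = data.mean(axis=1)                          # shape (5,)

try:
    data - row_means                  # shapes (5, 2) and (5,): 2 vs 5 is incompatible
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Reshape the means to (5, 1): now 2 vs 1 and 5 vs 5 are both compatible.
row_demeaned = data - row_means.reshape(5, 1)

print(broadcast_failed)            # True
print(row_demeaned.mean(axis=1))   # every row mean is now 0
```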

## Indexing and slicing

NumPy array indexing and slicing allow you to select a subset of your data or individual elements. For example:

In [48]:
arr1 = np.arange(10)
arr1

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [50]:
arr1[5] # select element at index 5

5

In [52]:
arr1[5:8] #create a slice of arr1, selecting elements from index 5 up to (but not including) index 8

array([5, 6, 7])

In [54]:
arr1[:] #this is called the bare slice [:], which selects all values in the array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### Views

NumPy arrays may act similarly to Python lists. However, one important distinction between NumPy arrays and Python lists is that NumPy array slices are _views_ on the original array. In other words, the data is not copied in the array slices, and any modification to the view will be reflected in the source array.

In [56]:
myArray = np.arange(15)
myArray

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [57]:
arr_slice = myArray[5:12]
arr_slice

array([ 5,  6,  7,  8,  9, 10, 11])

Below is a demonstration of _views_: when one of the values of `arr_slice` is changed, the change is also reflected in the original array `myArray`.

In [59]:
arr_slice[5] = 2312312
arr_slice

array([      5,       6,       7,       8,       9, 2312312,      11])

In [60]:
myArray

array([      0,       1,       2,       3,       4,       5,       6,
             7,       8,       9, 2312312,      11,      12,      13,
            14])

### .copy()

It may be surprising that data is not copied when an array is sliced. As NumPy is designed to work with very large datasets, copying data during every slicing operation would cause performance and memory problems.

If you would like a copy of a slice of an `ndarray` instead of a view, you need to call the `copy()` method explicitly. 

In [62]:
copied_slice = myArray[5:10].copy()
copied_slice[2] = 9877987
myArray, copied_slice

(array([      0,       1,       2,       3,       4,       5,       6,
              7,       8,       9, 2312312,      11,      12,      13,
             14]),
 array([      5,       6, 9877987,       8,       9]))
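One way to check whether you are holding a view or a copy is `np.shares_memory`. The sketch below, with made-up names, also contrasts this behaviour with Python lists, which copy on slicing.

```python
import numpy as np

numbers_list = list(range(10))
list_slice = numbers_list[2:5]    # Python lists copy on slicing
list_slice[0] = -1
print(numbers_list[2])            # 2, the original list is unchanged

numbers_array = np.arange(10)
array_view = numbers_array[2:5]           # a view onto the same memory
array_copy = numbers_array[2:5].copy()    # an independent copy

print(np.shares_memory(numbers_array, array_view))   # True
print(np.shares_memory(numbers_array, array_copy))   # False
```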

### 2d slicing and indexing

In two-dimensional arrays, the elements at each index are no longer scalars but one-dimensional arrays. As such, individual elements can be accessed recursively. 

In [66]:
my2dArray = np.random.randn(6,4)
my2dArray

array([[ 0.43678679, -0.22112239,  0.81351036, -1.37334885],
       [-0.84242751,  0.02469704, -2.41419131, -1.21071366],
       [-0.21504034,  0.15573722, -2.58627308,  1.15331323],
       [ 0.32880068, -1.05812349,  0.33430366,  0.89271259],
       [-1.98610341, -0.96459486,  0.39905663,  0.23054855],
       [ 1.41547439, -0.19628165,  0.60225916, -0.52652606]])

In [82]:
my2dArray[2][3] #access the element at row index 2 (third row) and column index 3 (fourth column) of the array

1.1533132274031528

Typing out each individual index with square brackets `[` can be tedious. You can also pass a comma-separated list of indices to select individual elements, as such:

In [83]:
my2dArray[2,3] #equivalent to the above: access the element at row index 2 (third row) and column index 3 (fourth column)

1.1533132274031528

Slicing two-dimensional arrays can be done with a similar syntax to slicing one-dimensional arrays. The following code slices `my2dArray` along axis 0, which is the first axis.

In [85]:
my2dArray[:2] # select the first two rows of my2dArray.

array([[ 0.43678679, -0.22112239,  0.81351036, -1.37334885],
       [-0.84242751,  0.02469704, -2.41419131, -1.21071366]])

It is also possible to pass multiple slices to an array:

In [88]:
my2dArray[:2, 1:] # select the first two rows of my2dArray but only from column 1 onwards

array([[-0.22112239,  0.81351036, -1.37334885],
       [ 0.02469704, -2.41419131, -1.21071366]])

The following diagram illustrates slicing on two-dimensional arrays.

![Screen Shot 2021-08-02 at 2.16.41 pm.png](attachment:4a1a81ca-9634-443a-a943-24eee2c1b3af.png)
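Because the array above is random, the effect of each slice can be hard to follow. The sketch below repeats the same indexing patterns on a hypothetical array `grid` whose values are simply 0 to 23, so each result is easy to verify by eye.

```python
import numpy as np

grid = np.arange(24).reshape(6, 4)   # rows: [0..3], [4..7], [8..11], ...

print(grid[2, 3])      # 11: row index 2, column index 3
print(grid[:2])        # the first two rows
print(grid[:2, 1:])    # first two rows, columns 1 onwards: [[1, 2, 3], [5, 6, 7]]
print(grid[:, 0])      # every row, column 0 only: 0, 4, 8, 12, 16, 20
```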

### 3d slicing and indexing

In multi-dimensional arrays, if you omit later indices, the returned object will be a lower dimensional ndarray consisting of all the data along the higher dimensions.

In [72]:
my3dArray = np.random.randn(2,2,3)
my3dArray

array([[[ 0.5216942 ,  1.13859917,  1.23127342],
        [-1.08511161,  0.063279  ,  1.30882231]],

       [[ 0.51857236,  0.27466862,  0.54066515],
        [ 0.1598828 , -0.54177412, -0.94675146]]])

In [75]:
my3dArray[0] #this returns a 2x3 array of all the data at index 0 along the first axis.

array([[ 0.5216942 ,  1.13859917,  1.23127342],
       [-1.08511161,  0.063279  ,  1.30882231]])

Suppose we want to fetch this 1-d array: `[ 0.51857236,  0.27466862,  0.54066515]`.

This 1d array is located at index 1 along the first axis (axis 0) and index 0 along the second axis (axis 1) of the array. In NumPy this can be expressed in two steps as such:

In [79]:
x = my3dArray[1]
x

array([[ 0.51857236,  0.27466862,  0.54066515],
       [ 0.1598828 , -0.54177412, -0.94675146]])

In [80]:
x[0]

array([0.51857236, 0.27466862, 0.54066515])

The above two steps can also be expressed as a single operation as such:

In [93]:
my3dArray[1,0] # this outputs all the values whose indices start with (1,0) forming a 1-dimensional array

array([0.51857236, 0.27466862, 0.54066515])
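The following sketch shows how each additional index removes one dimension, using a hypothetical array `cube` filled with the values 0 to 11 so the results are predictable.

```python
import numpy as np

cube = np.arange(12).reshape(2, 2, 3)

print(cube[1].shape)      # (2, 3): one index leaves a 2-d array
print(cube[1, 0].shape)   # (3,): two indices leave a 1-d array
print(cube[1, 0])         # values: 6, 7, 8
print(cube[1, 0, 2])      # 8: three indices select a single element
```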

## Transposing arrays

Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything. Arrays have the `transpose` method and also the special `T` attribute:

In [94]:
myArray = np.arange(18).reshape((6,3))
myArray

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

In [97]:
myArray.T #transpose the matrix (i.e., the rows become the columns, and the columns become the rows.)

array([[ 0,  3,  6,  9, 12, 15],
       [ 1,  4,  7, 10, 13, 16],
       [ 2,  5,  8, 11, 14, 17]])

Simple transposing with `.T` or `.transpose()` is a special case of swapping axes. ndarray has the method `swapaxes`, which takes a pair of axis numbers and switches the indicated axes to rearrange the data:

In [99]:
myArray = np.arange(16).reshape((2,2,4))
myArray

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [101]:
myArray.swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
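To see how these operations relate, the sketch below compares the resulting shapes and checks that `swapaxes(1, 2)` matches `transpose` with an explicit axis order. The names `swapped` and `reversed_axes` are illustrative.

```python
import numpy as np

arr = np.arange(16).reshape(2, 2, 4)

swapped = arr.swapaxes(1, 2)       # exchange axes 1 and 2
reversed_axes = arr.T              # for 3-d input, .T reverses all axes

print(swapped.shape)         # (2, 4, 2)
print(reversed_axes.shape)   # (4, 2, 2)

# swapaxes(1, 2) is the same as transposing with the axis order (0, 2, 1).
print(np.array_equal(swapped, arr.transpose(0, 2, 1)))   # True
print(np.shares_memory(arr, swapped))                    # True: a view, not a copy
```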

## Conditional logic as array operations – `np.where`

The `np.where` function is a vectorized version of the ternary expression `x if condition else y`.

Suppose you had a boolean array, `bool_arr`, and two arrays of values, `x_arr` and `y_arr`, and you wanted to take a value from `x_arr` whenever the corresponding value in `bool_arr` is `True`, otherwise take the value from `y_arr`.

With `np.where` this operation can be done very concisely:

In [103]:
bool_arr = np.array([True, False, True, False, True])
x_arr = np.array([2,3,6,1,4])
y_arr = np.array([20,21,27,23,30])

In [105]:
result = np.where(bool_arr, x_arr, y_arr)
result

array([ 2, 21,  6, 23,  4])
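For comparison, here is a sketch of the same selection written as a pure-Python list comprehension, one ternary expression per element. It reuses the arrays from the cells above; `loop_result` and `vectorised_result` are illustrative names.

```python
import numpy as np

bool_arr = np.array([True, False, True, False, True])
x_arr = np.array([2, 3, 6, 1, 4])
y_arr = np.array([20, 21, 27, 23, 30])

# Pure-Python version: evaluate `x if c else y` for each position.
loop_result = [int(x) if c else int(y) for c, x, y in zip(bool_arr, x_arr, y_arr)]

# Vectorised version: the same logic in a single call.
vectorised_result = np.where(bool_arr, x_arr, y_arr)

print(loop_result)                                  # [2, 21, 6, 23, 4]
print(vectorised_result.tolist() == loop_result)    # True
```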

A typical use of the `np.where` function is to produce a new array of values based on another array. For example, suppose you have a matrix of positive and negative numbers and you would like to replace the negative values with -99. This can easily be achieved with `np.where`.

In [106]:
myArray = np.random.randn(5,4)
myArray

array([[-0.63714309,  0.5897583 ,  1.08682264, -1.51438043],
       [-0.09187642, -0.00690983,  2.83318313,  0.15125999],
       [ 0.97025296, -0.80912916, -2.08533181, -0.75718448],
       [ 0.25890482,  0.36545951, -0.76907453, -1.90568891],
       [ 0.17024234, -0.45196682, -0.27309375,  1.36456298]])

In [110]:
np.where(myArray > 0, myArray, -99)

array([[-99.        ,   0.5897583 ,   1.08682264, -99.        ],
       [-99.        , -99.        ,   2.83318313,   0.15125999],
       [  0.97025296, -99.        , -99.        , -99.        ],
       [  0.25890482,   0.36545951, -99.        , -99.        ],
       [  0.17024234, -99.        , -99.        ,   1.36456298]])

## Statistical computations in NumPy

NumPy offers mathematical functions that allow you to compute statistics about entire arrays. Let's look at some of these functions:

In [115]:
myArray = np.random.randn(6,5)
myArray

array([[ 1.31413099, -0.54970673,  0.30489557, -0.71253493, -1.30262104],
       [ 0.31957366, -0.06861005, -0.25896071,  0.09350543, -0.51762873],
       [-1.78470456, -0.66276046,  1.46721209, -0.21629359, -2.25332719],
       [ 0.92380584, -1.7767122 ,  0.04177097,  0.13231884, -0.47499546],
       [-0.75776528,  0.92589469, -0.59270723, -0.82423622,  0.55758733],
       [-0.79664458, -0.44390774,  0.5358653 , -1.17071379, -0.87238822]])

### sum

The `.sum()` method returns the sum of the array elements over a given axis.

In [120]:
myArray.sum(axis=1) #sum across the columns, giving one value per row (i.e., horizontal sum of matrix)

array([-0.94583616, -0.4321204 , -3.44987371, -1.15381201, -0.69122672,
       -2.74778902])

In [121]:
myArray.sum(axis=0) #sum down the rows, giving one value per column (i.e., vertical sum of matrix)

array([-0.78160393, -2.57580249,  1.49807599, -2.69795427, -4.86337331])
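The `axis` argument is easier to follow on a small array with known values. In the sketch below, `small` is a made-up 2x3 array; the axis you pass is the one that gets collapsed.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(small.sum())         # 21: no axis given, so every element is summed
print(small.sum(axis=0))   # values 5, 7, 9: axis 0 (rows) is collapsed, one sum per column
print(small.sum(axis=1))   # values 6, 15: axis 1 (columns) is collapsed, one sum per row
```

The same convention applies to the other statistical methods in this section.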

### mean

The `.mean()` method returns the mean of the array elements over a given axis.

In [123]:
myArray.mean(axis=1) #mean across the columns, giving one value per row

array([-0.18916723, -0.08642408, -0.68997474, -0.2307624 , -0.13824534,
       -0.5495578 ])

In [122]:
myArray.mean(axis=0) #mean down the rows, giving one value per column

array([-0.13026732, -0.42930042,  0.24967933, -0.44965905, -0.81056222])

### std, var

The `.std()` method returns the standard deviation of elements over a given axis.

The `.var()` method returns the variance of elements over a given axis.

In [128]:
myArray.std(axis=1) #standard deviation across the columns, giving one value per row

array([0.91078927, 0.28748303, 1.30545576, 0.89316749, 0.73177881,
       0.58998498])

In [127]:
myArray.var(axis=0) #variance down the rows, giving one value per column

array([1.16244809, 0.64189741, 0.42981549, 0.23625285, 0.73296056])
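The sketch below checks the relationship between the two on a small made-up sample: the variance is the square of the standard deviation. By default both divide by the number of elements (the population formula); passing `ddof=1` divides by one less, which gives the sample estimate.

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # mean is 5

population_var = sample.var()      # sum of squared deviations (32) divided by 8
population_std = sample.std()      # square root of the variance
sample_var = sample.var(ddof=1)    # the same sum divided by 8 - 1 = 7

print(population_var)   # 4.0
print(population_std)   # 2.0
print(np.isclose(sample_var, 32 / 7))   # True
```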

### min, max

The `.min()` method returns the minimum value over a given axis.

The `.max()` method returns the maximum value over a given axis.

In [129]:
myArray.min(axis=1) #minimum value across the columns, giving one value per row

array([-1.30262104, -0.51762873, -2.25332719, -1.7767122 , -0.82423622,
       -1.17071379])

In [131]:
myArray.max(axis=0) #maximum value down the rows, giving one value per column

array([1.31413099, 0.92589469, 1.46721209, 0.13231884, 0.55758733])

### cumsum

In multidimensional arrays, accumulation functions such as `.cumsum()` return an array of the same size with the cumulative sum of the elements along a given axis.

In [133]:
myArray = np.array([[1,0,6],[5,3,4],[8,2,9]])
myArray

array([[1, 0, 6],
       [5, 3, 4],
       [8, 2, 9]])

In [134]:
myArray.cumsum(axis=0)

array([[ 1,  0,  6],
       [ 6,  3, 10],
       [14,  5, 19]])

### cumprod

The `.cumprod()` method returns the cumulative product of elements along a given axis.

In [136]:
myArray = np.array([[1,0,6],[5,3,4],[8,2,9]])
myArray

array([[1, 0, 6],
       [5, 3, 4],
       [8, 2, 9]])

In [137]:
myArray.cumprod(axis=0)

array([[  1,   0,   6],
       [  5,   0,  24],
       [ 40,   0, 216]])
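The examples above accumulate down the rows (`axis=0`). As a complement, the sketch below accumulates across the columns (`axis=1`) and also shows what happens when no axis is given, using the same 3x3 values under the illustrative name `matrix`.

```python
import numpy as np

matrix = np.array([[1, 0, 6], [5, 3, 4], [8, 2, 9]])

row_cumsum = matrix.cumsum(axis=1)    # running total within each row
flat_cumsum = matrix.cumsum()         # no axis: the array is flattened first
row_cumprod = matrix.cumprod(axis=1)  # running product within each row

print(row_cumsum)    # [[1, 1, 7], [5, 8, 12], [8, 10, 19]]
print(flat_cumsum)   # values: 1, 1, 7, 12, 15, 19, 27, 29, 38
print(row_cumprod)   # [[1, 0, 0], [5, 15, 60], [8, 16, 144]]
```

The last entry of each running total equals the plain sum along that axis.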

## Other NumPy Functions

### .sort()

NumPy allows you to sort the elements of an array in place with the `.sort()` method:

In [139]:
myArray = np.array([1,0,6,5,3,4,8,2,9])
myArray

array([1, 0, 6, 5, 3, 4, 8, 2, 9])

In [142]:
myArray.sort()
myArray

array([0, 1, 2, 3, 4, 5, 6, 8, 9])
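If you want to keep the original order intact, the top-level function `np.sort` returns a sorted copy instead. The sketch below contrasts the two on a small made-up array.

```python
import numpy as np

scores = np.array([3, 1, 2])

sorted_copy = np.sort(scores)     # returns a new, sorted array
print(scores.tolist())            # [3, 1, 2]: the original is unchanged
print(sorted_copy.tolist())       # [1, 2, 3]

return_value = scores.sort()      # sorts in place and returns None
print(return_value)               # None
print(scores.tolist())            # [1, 2, 3]

descending = np.sort(scores)[::-1]   # one way to get descending order
print(descending.tolist())           # [3, 2, 1]
```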

### unique(x)

The `np.unique(x)` function returns the sorted unique values in an array.

In [143]:
myArray = np.array([1,8,1,6,3,1,9,1,4])
myArray

array([1, 8, 1, 6, 3, 1, 9, 1, 4])

In [145]:
np.unique(myArray)

array([1, 3, 4, 6, 8, 9])
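`np.unique` can also report how often each value occurs through its `return_counts` parameter. A short sketch on the same data, with illustrative variable names:

```python
import numpy as np

readings = np.array([1, 8, 1, 6, 3, 1, 9, 1, 4])

values, counts = np.unique(readings, return_counts=True)

print(values.tolist())   # [1, 3, 4, 6, 8, 9]
print(counts.tolist())   # [4, 1, 1, 1, 1, 1]: the value 1 appears four times
```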

### intersect1d(x, y)

The `np.intersect1d(x, y)` function computes the sorted, common elements in `x` and `y`.

In [147]:
x = np.array([1,8,1,6,3,1,9,1,4])
y = np.array([6,6,0,5,1,6,9,8,7])

In [148]:
np.intersect1d(x, y)

array([1, 6, 8, 9])

### union1d(x, y)

The `np.union1d(x, y)` function computes the sorted union of elements.

In [147]:
x = np.array([1,8,1,6,3,1,9,1,4])
y = np.array([6,6,0,5,1,6,9,8,7])

In [149]:
np.union1d(x, y)

array([0, 1, 3, 4, 5, 6, 7, 8, 9])

### in1d(x, y)

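The `np.in1d(x, y)` function computes a boolean array indicating whether each element of `x` is contained in `y`. In NumPy 2.0 and later, `np.in1d` is deprecated in favour of `np.isin`, which performs the same test.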

In [150]:
x = np.array([1,8,1,6,3,1,9,1,4])
y = np.array([6,6,0,5,1,6,9,8,7])

In [151]:
np.in1d(x, y)

array([ True,  True,  True,  True, False,  True,  True,  True, False])

### setdiff1d(x, y)

The `np.setdiff1d(x, y)` function computes the set difference: the elements in `x` that are not in `y`.

In [152]:
x = np.array([1,8,1,6,3,1,9,1,4])
y = np.array([6,6,0,5,1,6,9,8,7])

In [153]:
np.setdiff1d(x, y)

array([3, 4])

### setxor1d(x, y)

The `np.setxor1d(x, y)` function computes the symmetric difference of the two sets: the elements that are in either of the arrays, but not both.

In [155]:
x = np.array([1,8,1,6,3,1,9,1,4])
y = np.array([6,6,0,5,1,6,9,8,7])

In [156]:
np.setxor1d(x, y)

array([0, 3, 4, 5, 7])
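These set functions fit together in the same way as Python's built-in set operations. The sketch below checks one such relationship on the arrays used above: the symmetric difference is the union of the two one-sided differences. The names `only_in_x` and `only_in_y` are illustrative.

```python
import numpy as np

x = np.array([1, 8, 1, 6, 3, 1, 9, 1, 4])
y = np.array([6, 6, 0, 5, 1, 6, 9, 8, 7])

only_in_x = np.setdiff1d(x, y)    # values: 3, 4
only_in_y = np.setdiff1d(y, x)    # values: 0, 5, 7
either_not_both = np.union1d(only_in_x, only_in_y)

print(np.array_equal(either_not_both, np.setxor1d(x, y)))   # True

# The same result using Python sets.
print(sorted(set(x.tolist()) ^ set(y.tolist())))            # [0, 3, 4, 5, 7]
```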

## Linear Algebra

Linear algebra operations, such as matrix multiplication, decompositions, determinants, and other square matrix math, are an important part of any array library. This section discusses some common linear algebra functions that NumPy offers.

### .dot()

The `np.dot(x, y)` function computes the dot product of two arrays.

In [161]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
y = np.array([[6,1],[5,8],[0,9]]) #3x2

In [162]:
np.dot(x, y) #outputs a 3x2 array

array([[ 30,  84],
       [ 26, 114],
       [ 66,  99]])
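For two-dimensional arrays, the `@` operator gives the same matrix product as `np.dot`. The sketch below also shows that the inner dimensions must agree: a 3x3 array times a 3x2 array works, but reversing the order does not. The flag `mismatch_raised` is just an illustrative helper.

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])   # 3x3
y = np.array([[6, 1], [5, 8], [0, 9]])            # 3x2

product = x @ y                                   # (3x3) @ (3x2) -> 3x2
print(np.array_equal(product, np.dot(x, y)))      # True
print(product.shape)                              # (3, 2)

try:
    y @ x                                         # (3x2) @ (3x3): inner sizes 2 and 3 differ
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)                            # True
```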

### diag

The `np.diag` function returns the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or converts a 1D array into a square matrix with zeros on the off-diagonal.

In [165]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [168]:
np.diag(x)

array([0, 4, 5])
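The other two behaviours described above can be sketched as follows: the `k` parameter selects a diagonal above or below the main one, and a 1D input produces a square matrix. The names here are illustrative.

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])

above = np.diag(x, k=1)     # the diagonal just above the main one
below = np.diag(x, k=-1)    # the diagonal just below the main one
print(above.tolist())       # [6, 9]
print(below.tolist())       # [1, 6]

built = np.diag(np.array([1, 2, 3]))   # 1D input: build a 3x3 diagonal matrix
print(built.tolist())                  # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
```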

### trace

The `np.trace` function computes the sum of the diagonal elements.

In [166]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [169]:
np.trace(x)

9

### eig

The `np.linalg.eig` function computes the eigenvalues and eigenvectors of a square matrix.

In [174]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [175]:
np.linalg.eig(x)

(array([14.47910038+0.j        , -2.73955019+2.79773284j,
        -2.73955019-2.79773284j]),
 array([[ 0.4356405 +0.j        , -0.16534649-0.54483739j,
         -0.16534649+0.54483739j],
        [ 0.61002814+0.j        ,  0.63361813+0.j        ,
          0.63361813-0.j        ],
        [ 0.6618784 +0.j        , -0.45610608+0.25750352j,
         -0.45610608-0.25750352j]]))
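The output is a pair: an array of eigenvalues and a matrix whose columns are the corresponding eigenvectors. A reasonable sanity check, sketched below with illustrative names, is that multiplying the matrix by each eigenvector gives the same vector scaled by its eigenvalue.

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])

eigenvalues, eigenvectors = np.linalg.eig(x)

# Column i of `eigenvectors` pairs with eigenvalues[i], so x @ v should equal lambda * v.
# Multiplying by `eigenvalues` scales each column by its own eigenvalue.
print(np.allclose(x @ eigenvectors, eigenvectors * eigenvalues))   # True

# The eigenvalues of a square matrix sum to its trace.
print(np.isclose(eigenvalues.sum(), np.trace(x)))                  # True
```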

### inv

The `np.linalg.inv` function computes the inverse of a square matrix.

In [179]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [180]:
np.linalg.inv(x)

array([[-0.15315315, -0.02702703,  0.17117117],
       [ 0.22072072, -0.10810811,  0.01801802],
       [-0.08108108,  0.16216216, -0.02702703]])
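A quick way to check an inverse is to multiply it by the original matrix, which should give the identity matrix up to floating-point rounding. A minimal sketch:

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])
x_inverse = np.linalg.inv(x)

# Compare with np.allclose rather than ==, because of rounding error.
print(np.allclose(x @ x_inverse, np.eye(3)))   # True
print(np.allclose(x_inverse @ x, np.eye(3)))   # True
```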

### pinv

The `np.linalg.pinv` function computes the Moore-Penrose pseudo-inverse of a matrix.

In [181]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [182]:
np.linalg.pinv(x)

array([[-0.15315315, -0.02702703,  0.17117117],
       [ 0.22072072, -0.10810811,  0.01801802],
       [-0.08108108,  0.16216216, -0.02702703]])
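For an invertible square matrix such as `x`, the pseudo-inverse matches the ordinary inverse, which is why the output above is the same as in the previous section. The pseudo-inverse is also defined for non-square matrices, as the sketch below illustrates with a 3x2 array `y` (chosen here only as an example).

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])   # 3x3, invertible
y = np.array([[6, 1], [5, 8], [0, 9]])            # 3x2, no ordinary inverse exists

print(np.allclose(np.linalg.pinv(x), np.linalg.inv(x)))   # True

y_pinv = np.linalg.pinv(y)
print(y_pinv.shape)                          # (2, 3)
# The columns of y are linearly independent, so pinv(y) @ y is the 2x2 identity.
print(np.allclose(y_pinv @ y, np.eye(2)))    # True
```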

### qr

The `np.linalg.qr` function computes the QR decomposition of a matrix.

In [183]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [184]:
np.linalg.qr(x)

(array([[ 0.        , -0.89685441, -0.44232587],
        [-0.16439899, -0.43630755,  0.88465174],
        [-0.98639392,  0.07271792, -0.14744196]]),
 array([[-6.08276253, -6.57595949, -6.4115605 ],
        [ 0.        , -6.69004908, -7.15059594],
        [ 0.        ,  0.        ,  5.45535238]]))
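The two returned arrays are usually called Q and R. The sketch below checks the defining properties: Q has orthonormal columns, R is upper triangular, and their product reproduces the original matrix.

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])
q, r = np.linalg.qr(x)

print(np.allclose(q @ r, x))             # True: the factors reproduce x
print(np.allclose(q.T @ q, np.eye(3)))   # True: the columns of q are orthonormal
print(np.allclose(r, np.triu(r)))        # True: r is upper triangular
```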

### svd

The `np.linalg.svd` function computes the singular value decomposition (SVD) of a matrix.

In [185]:
x = np.array([[0,6,4],[1,4,9],[6,6,5]]) #3x3
x

array([[0, 6, 4],
       [1, 4, 9],
       [6, 6, 5]])

In [186]:
np.linalg.svd(x)

(array([[-0.45173888, -0.15811365, -0.87802737],
        [-0.63717696, -0.63167825,  0.44157457],
        [-0.62444976,  0.75893521,  0.18460726]]),
 array([14.63010865,  4.78633961,  3.17031133]),
 array([[-0.2996475 , -0.61556889, -0.7288939 ],
        [ 0.81940133,  0.22526951, -0.52710067],
        [ 0.48866434, -0.75520103,  0.43689652]]))
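The three returned arrays are the left singular vectors, the singular values (as a 1D array, largest first), and the transposed right singular vectors. The sketch below, with illustrative names, rebuilds the original matrix from them.

```python
import numpy as np

x = np.array([[0, 6, 4], [1, 4, 9], [6, 6, 5]])
u, singular_values, vt = np.linalg.svd(x)

# The singular values come back as a 1D array, so rebuild the diagonal matrix first.
reconstructed = u @ np.diag(singular_values) @ vt

print(np.allclose(reconstructed, x))                              # True
print(np.all(singular_values[:-1] >= singular_values[1:]))        # True: sorted, largest first
```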