# Introduction
NumPy is a Python package for scientific computing. It provides an array type for large multidimensional arrays and matrices, along with a large library of functions that operate on them.

Python comes with the built-in `list` data structure, which can be used for array operations. However, NumPy arrays are heavily optimized for numerical computing and come with useful functions that Python itself does not include.

In [None]:
import numpy as np
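As a small illustration of the difference, the sketch below doubles every element of a sequence, first with a plain list and then with an `ndarray`. The variable names are illustrative only.

```python
import numpy as np

values_list = [1, 2, 3, 4]
values_arr = np.array(values_list)

# A list needs an explicit loop (or comprehension) for element-wise math.
doubled_list = [2 * x for x in values_list]

# An ndarray applies the operation to every element at once.
doubled_arr = 2 * values_arr

# Multiplying a list by an integer repeats the list instead of scaling it.
repeated_list = 2 * values_list

print(doubled_list)   # [2, 4, 6, 8]
print(doubled_arr)    # [2 4 6 8]
print(repeated_list)  # [1, 2, 3, 4, 1, 2, 3, 4]
```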

# Terminology refresh



## Definitions
- Vector: An element of a vector space. The dimension of a vector conventionally refers to the dimension of the space to which it belongs, that is, $n$ for a real vector $\vec{v}\in\mathbb{R}^n$.
  - However, it is not entirely wrong to say that vectors are one-dimensional, in the sense that a vector can be represented in a single row or column (i.e. the definition of dimension for a matrix).
  - Example: $\vec{v}=\begin{bmatrix}1\\2\\3\end{bmatrix}$. The vector exists in 3-space.
- Matrix: An $m$ by $n$ matrix is a function of two variables, the first of which has domain $\{1,\ldots,m\}$ and the second of which has domain $\{1,\ldots,n\}$.
  - Matrices are two-dimensional by definition.
  - Example: $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$. The size of the matrix is $2\times2$.
- Array: A linear data structure of a fixed number of sequentially arranged elements of uniform type.
- List: A linear data structure of sequentially arranged elements. Lists are essentially arrays that can have elements of different types and elements can be added/removed.

## Notes
- A column vector can be thought of as a matrix of size $m\times1$ and a row vector can be thought of as a matrix of size $1\times n$.
- Arrays and lists are computer representations of structures of elements. "Array" and "list", as defined above, are conventional computer science terms rather than terms with strict mathematical definitions.
- Arrays and lists can represent structures with higher dimensions than that of a matrix. (Very) Roughly speaking, vectors are a subset of matrices, which are a subset of arrays/lists.
- The closest mathematical counterpart of an N-dimensional array is arguably a "tensor", in the sense that it generalizes a matrix to more dimensions.
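The notes above can be checked directly in NumPy. The following is a minimal sketch (variable names are illustrative) showing how row vectors, column vectors, flat vectors, and higher-dimensional arrays differ in `shape` and `ndim`.

```python
import numpy as np

row_vector = np.array([[1, 2, 3]])          # a 1 x n matrix
column_vector = np.array([[1], [2], [3]])   # an m x 1 matrix
flat_vector = np.array([1, 2, 3])           # a 1-D array with no row/column orientation
cube = np.zeros((2, 3, 4))                  # a 3-D array, beyond what a matrix can hold

print(row_vector.shape, row_vector.ndim)        # (1, 3) 2
print(column_vector.shape, column_vector.ndim)  # (3, 1) 2
print(flat_vector.shape, flat_vector.ndim)      # (3,) 1
print(cube.shape, cube.ndim)                    # (2, 3, 4) 3
```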

## Convention
Moving forward, we will refer to 2D objects as either arrays or matrices. Objects of higher dimensions will be referred to as arrays.

# Array Creation

## Vector
Given the row vector $\vec{v}=\begin{bmatrix}1&2&3&4\end{bmatrix}$, store it in a variable.

The example using a Python list is provided only for reference. In practice, use `ndarray`.

In [None]:
# as Python list
v_list = [1, 2, 3, 4]

# as numpy ndarray
## use np.array array creation routine, which takes an array_like
## (in this case, a Python list) and converts it to an `ndarray`.
v_arr = np.array([1, 2, 3, 4])

In [None]:
print(v_list, type(v_list))
print()
print(v_arr, type(v_arr))

[1, 2, 3, 4] <class 'list'>

[1 2 3 4] <class 'numpy.ndarray'>


## Matrix
Given the matrix $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, store it in a variable.

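Note that `np.matrix` is strictly two-dimensional while `np.ndarray` is N-dimensional. Matrix objects are a subclass of `ndarray`. The NumPy documentation no longer recommends `np.matrix`, even for linear algebra, so it is shown here only for comparison.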

The example using a Python list is provided only for reference. In practice, use `ndarray`.

In [None]:
# as Python list
A_list = [[1, 2],
          [3, 4]]

# as numpy ndarray
A_arr = np.array([[1, 2],
              [3, 4]])
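# or, as a numpy matrix
## np.mat was an alias of np.asmatrix and was removed in NumPy 2.0,
## so the np.matrix constructor is used here instead.
A_mat = np.matrix("1, 2; 3, 4")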

In [None]:
print(A_list, type(A_list))
print()
print(A_arr, type(A_arr))
print()
print(A_mat, type(A_mat))

[[1, 2], [3, 4]] <class 'list'>

[[1 2]
 [3 4]] <class 'numpy.ndarray'>

[[1 2]
 [3 4]] <class 'numpy.matrix'>
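One practical consequence of `np.matrix` being strictly two-dimensional is that indexing a single row still returns a 2D object, whereas an `ndarray` returns a 1D row. A minimal sketch, with illustrative names:

```python
import numpy as np

A_arr = np.array([[1, 2], [3, 4]])
A_mat = np.matrix([[1, 2], [3, 4]])

# Indexing one row of an ndarray drops a dimension.
print(A_arr[0].shape)  # (2,)

# Indexing one row of a matrix keeps it two-dimensional.
print(A_mat[0].shape)  # (1, 2)

# A matrix is still an ndarray subclass.
print(isinstance(A_mat, np.ndarray))  # True
```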


## Zeros and Ones
Create an array, $A$, with shape $3\times4$ of all zeros and an array, $B$, with shape $3\times4$ of all ones.

In [None]:
A = np.zeros([3, 4])
B = np.ones([3, 4])

In [None]:
print(A)
print()
print(B)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
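Both routines return `float64` arrays by default, as the trailing dots in the output suggest. The sketch below shows a few closely related options: the `dtype` argument, `np.full` for an arbitrary fill value, and `np.ones_like` for copying the shape and dtype of an existing array. The names are illustrative.

```python
import numpy as np

default_zeros = np.zeros((3, 4))              # float64 unless told otherwise
zeros_int = np.zeros((3, 4), dtype=int)       # integer zeros
sevens = np.full((2, 3), 7)                   # every element set to 7
ones_like_zeros = np.ones_like(zeros_int)     # same shape and dtype as zeros_int

print(default_zeros.dtype)    # float64
print(sevens)
print(ones_like_zeros.shape)  # (3, 4)
```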


## From a `DataFrame`

In [None]:
import pandas as pd

my_own_dataset = pd.DataFrame({'Col1': range(5),
                               'Col2': 1.0,
                               'Col3': 1.0,
                               'Col4': range(5, 0, -1)})

my_own_dataset

   Col1  Col2  Col3  Col4
0     0   1.0   1.0     5
1     1   1.0   1.0     4
2     2   1.0   1.0     3
3     3   1.0   1.0     2
4     4   1.0   1.0     1


In [None]:
np.array(my_own_dataset)

array([[0., 1., 1., 5.],
       [1., 1., 1., 4.],
       [2., 1., 1., 3.],
       [3., 1., 1., 2.],
       [4., 1., 1., 1.]])

## Identity
Create an $n\times n$ identity matrix, $I_n$, where $n=3$.

`np.eye` can be used to create non-square arrays, as well.

In [None]:
I_n = np.eye(3)

In [None]:
print(I_n)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
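To illustrate the remark about non-square arrays, here is a short sketch that passes a second dimension to `np.eye` and, separately, uses its `k` argument to shift the diagonal.

```python
import numpy as np

# 2 x 3 array with ones on the main diagonal
wide_eye = np.eye(2, 3)
print(wide_eye)
# [[1. 0. 0.]
#  [0. 1. 0.]]

# 3 x 3 array with ones on the first superdiagonal (k=1)
shifted_eye = np.eye(3, k=1)
print(shifted_eye)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
```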


## Other
For the full list of array creation routines, see https://numpy.org/doc/stable/reference/routines.array-creation.html.

The output of array creation routines that do not take a shape argument (for example, `np.arange`) can be reshaped into the desired shape, as described in the [Reshape](#Reshape) section.
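As a quick sketch of that pattern, `np.arange` produces a 1D range that can then be given a 2D shape with `reshape`. Passing `-1` for one dimension lets NumPy infer it.

```python
import numpy as np

# np.arange only takes start/stop/step, so it always returns a 1-D array.
flat = np.arange(6)
print(flat)  # [0 1 2 3 4 5]

# Reshape the result into 2 rows and 3 columns.
grid = flat.reshape(2, 3)
print(grid)
# [[0 1 2]
#  [3 4 5]]

# Let NumPy infer the number of columns.
inferred = np.arange(1, 9).reshape(2, -1)
print(inferred.shape)  # (2, 4)
```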

# Array Attributes
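Given $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, find the number of dimensions, shape, number of elements, datatype, and memory usage of $A$. Note that NumPy's number of dimensions (`ndim`) is the number of axes, not the rank of the matrix in the linear algebra sense.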

In [None]:
A = np.array([[1, 2],
              [3, 4]])

n = A.ndim
shape = A.shape
size = A.size
dtype = A.dtype
memory = A.nbytes # 8 bytes/element (int64) * 4 elements = 32 bytes

In [None]:
print(n)
print()
print(shape)
print()
print(size)
print()
print(dtype)
print()
print(memory)

2

(2, 2)

4

int64

32
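The memory figure follows from `size * itemsize`. The sketch below makes that relationship explicit and shows how a smaller dtype changes it. The dtype is fixed to `int64` here so the numbers do not depend on the platform default.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]], dtype=np.int64)

print(A.itemsize)                       # 8 (bytes per element)
print(A.size * A.itemsize == A.nbytes)  # True

# Converting to a 1-byte integer type shrinks the memory footprint.
A_small = A.astype(np.int8)
print(A_small.nbytes)                   # 4
```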


# Array Operations
Arithmetic and comparison operators applied to `ndarray` objects are element-wise by default. The exception is `@`, which performs matrix multiplication.

## Addition, Subtraction, and Division
Given $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, compute $A+A$, $A+1$, $A-A$, $A-1$, $A/A$ and $A/2$.

These are all element-wise operations. The second operand can be either an array of the same shape or a scalar, which is broadcast to an array of the same shape.

In [None]:
A = np.array([[1, 2],
              [3, 4]])

# add
add_AA = A + A
add_A1 = A + 1

# sub
sub_AA = A - A
sub_A1 = A - 1

# div
div_AA = A / A
div_A2 = A / 2

In [None]:
print(add_AA)
print(add_A1)
print()
print(sub_AA)
print(sub_A1)
print()
print(div_AA)
print(div_A2)

[[2 4]
 [6 8]]
[[2 3]
 [4 5]]

[[0 0]
 [0 0]]
[[0 1]
 [2 3]]

[[1. 1.]
 [1. 1.]]
[[0.5 1. ]
 [1.5 2. ]]
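Broadcasting is not limited to scalars. As a minimal sketch, a 1D array or a column can also be stretched across a 2D array when the shapes are compatible. The names `row` and `col` are illustrative.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
row = np.array([10, 20])      # shape (2,): added to every row of A
col = np.array([[10], [20]])  # shape (2, 1): added to every column of A

add_row = A + row
add_col = A + col

print(add_row)
# [[11 22]
#  [13 24]]
print(add_col)
# [[11 12]
#  [23 24]]
```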


## Multiplication
Given the matrices $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$ and $B=\begin{bmatrix}4&3\\2&1\end{bmatrix}$, compute their standard product ($AB$) and element-wise (Hadamard) product ($A\odot B$).

In [None]:
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[4, 3],
              [2, 1]])

standard_product_arr = A @ B
hadamard_product_arr = A * B

# using matrices (np.mat was removed in NumPy 2.0; use np.matrix)
A = np.matrix("1, 2; 3, 4")
B = np.matrix("4, 3; 2, 1")

standard_product_mat = A @ B # for np.matrix, A * B is also the matrix product
hadamard_product_mat = np.multiply(A, B)

In [None]:
print(standard_product_arr, type(standard_product_arr))
print(hadamard_product_arr, type(hadamard_product_arr))
print()
print(standard_product_mat, type(standard_product_mat))
print(hadamard_product_mat, type(hadamard_product_mat))

[[ 8  5]
 [20 13]] <class 'numpy.ndarray'>
[[4 6]
 [6 4]] <class 'numpy.ndarray'>

[[ 8  5]
 [20 13]] <class 'numpy.matrix'>
[[4 6]
 [6 4]] <class 'numpy.matrix'>
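A few properties of the two products are worth checking by hand. The sketch below, using the same $A$ and $B$, shows that the standard product is not commutative while the Hadamard product is, that `np.matmul` is the function form of `@`, and that `@` also handles a matrix-vector product. The vector `v` is an illustrative addition.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[4, 3],
              [2, 1]])
v = np.array([1, 1])

AB = A @ B
BA = B @ A
print(AB)
# [[ 8  5]
#  [20 13]]
print(BA)
# [[13 20]
#  [ 5  8]]

# The element-wise product does not depend on operand order.
print(np.array_equal(A * B, B * A))         # True

# np.matmul is the function behind the @ operator.
print(np.array_equal(AB, np.matmul(A, B)))  # True

# Matrix-vector product: each entry is a row of A dotted with v.
Av = A @ v
print(Av)  # [3 7]
```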


## Boolean Operations
Given $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, find $A > 2.5$, $A < 2.5$, $A==1$, and $A != 1$.

In [None]:
A = np.array([[1, 2],
              [3, 4]])

A_gt = A > 2.5
A_lt = A < 2.5
A_eq = A == 1
A_neq = A != 1

In [None]:
print(A_gt)
print()
print(A_lt)
print()
print(A_eq)
print()
print(A_neq)

[[False False]
 [ True  True]]

[[ True  True]
 [False False]]

[[ True False]
 [False False]]

[[False  True]
 [ True  True]]
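Boolean arrays like these are most useful as masks. A minimal sketch of common follow-up operations, with illustrative names: selecting elements, counting matches, combining conditions, and replacing values with `np.where`.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
mask = A > 2.5

# Select only the elements where the mask is True (result is 1-D).
selected = A[mask]
print(selected)  # [3 4]

# True counts as 1, so summing the mask counts the matches.
count = mask.sum()
print(count)     # 2

# Combine conditions with & (and) or | (or); parentheses are required.
between = (A > 1) & (A < 4)
print(between)
# [[False  True]
#  [ True False]]

# Keep values where the mask is True, use 0 elsewhere.
masked = np.where(mask, A, 0)
print(masked)
# [[0 0]
#  [3 4]]
```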


# Array Indexing
A single integer in a subscript selects the element at that position. Indices start at zero.

A `:` in a subscript means that you are taking a slice of the object. `A[a:b]` takes a slice of `A` from `a` (inclusive) to `b` (exclusive). Leaving out one of the fields extends the slice to the end in the corresponding direction (i.e. `A[:b]==A[0:b] and A[a:]==A[a:len(A)]`).

Commas in the subscript separate the indices for each dimension. This is a feature provided by `numpy` and will not work for Python lists.
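The slicing rules above can be sketched on a short 1D array, together with a check that comma-separated indices fail on a nested Python list. The variable names are illustrative.

```python
import numpy as np

v_arr = np.array([10, 20, 30, 40])

print(v_arr[1])    # 20 (zero-indexed, so this is the second element)
print(v_arr[1:3])  # [20 30] (start inclusive, stop exclusive)
print(v_arr[:2])   # [10 20] (same as v_arr[0:2])
print(v_arr[2:])   # [30 40] (same as v_arr[2:len(v_arr)])

# Comma-separated indices work on an ndarray...
A_arr = np.array([[1, 2], [3, 4]])
print(A_arr[0, 1])  # 2

# ...but raise a TypeError on a nested list.
A_list = [[1, 2], [3, 4]]
try:
    A_list[0, 1]
    list_error = None
except TypeError as err:
    list_error = type(err).__name__
print(list_error)   # TypeError
```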

## Array Elements
Given $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, get $A_{12}$.

In [None]:
A = np.array([[1, 2],
              [3, 4]])

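# get A_12 (row 1, column 2 in one-based math notation)
A_12 = A[0, 1] # equivalent to A[0][1], without creating an intermediate row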

In [None]:
print(A_12)

2


## Array Row(s)
Given $A=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$, get the first row, last two rows, and rows 1 and 3 of $A$.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# get A_1* (first row of A)
A_r_1 = A[0,:]

# get last two rows of A
A_r_23 = A[1:3,:]

# get rows 1 and 3 of A
A_r_13 = A[[0, 2],:]

In [None]:
print(A_r_1)
print()
print(A_r_23)
print()
print(A_r_13)

[1 2 3]

[[4 5 6]
 [7 8 9]]

[[1 2 3]
 [7 8 9]]


## Array Column(s)
Given $A=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$, get the first column, last two columns, and columns 1 and 3 of $A$.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# get A_*1 (first column of A)
A_c_1 = A[:,0]

# get last two columns of A
A_c_23 = A[:,1:3]

# get columns 1 and 3 of A
A_c_13 = A[:,[0,2]]

In [None]:
print(A_c_1)
print()
print(A_c_23)
print()
print(A_c_13)

[1 4 7]

[[2 3]
 [5 6]
 [8 9]]

[[1 3]
 [4 6]
 [7 9]]


## Array Rows and Columns
Given $A=\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix}$, get rows 1 and 3 and columns 2 and 3 of $A$.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# get specified rows and columns

## using subscript operator
### select rows
result1 = A[[0, 2],:]
### select columns
result1 = result1[:,[1, 2]]

## using np.ix_ (recommended)
result2 = A[np.ix_([0, 2], [1, 2])]

In [None]:
print(result1)
print()
print(result2)

[[2 3]
 [8 9]]

[[2 3]
 [8 9]]
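One reason `np.ix_` is worth knowing: passing two index lists directly pairs them up element by element instead of selecting a sub-block. The following sketch contrasts the two on the same $A$.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Two lists are paired: this picks A[0, 1] and A[2, 2] only.
paired = A[[0, 2], [1, 2]]
print(paired)  # [2 9]

# np.ix_ builds an open mesh, selecting every row/column combination.
block = A[np.ix_([0, 2], [1, 2])]
print(block)
# [[2 3]
#  [8 9]]

# The same block can be written by making the row index a column.
block_manual = A[[[0], [2]], [1, 2]]
print(np.array_equal(block, block_manual))  # True
```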


# Matrix Shape Manipulation

## Transpose
Given $A=\begin{bmatrix}1&2\\3&4\\5&6\end{bmatrix}$, find $A^T$.

In [None]:
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

A_T = A.T

In [None]:
print(A_T)

[[1 3 5]
 [2 4 6]]
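Two details about `.T` are easy to miss, sketched below under the same $A$: the transpose is a view that shares memory with the original, and transposing a 1D array does nothing because it has only one axis. The vector `v` is an illustrative addition.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
A_T = A.T

# The transpose is a view: no data is copied.
print(np.shares_memory(A, A_T))  # True

# Writing through the view changes the original array.
A_T[0, 1] = 30
print(A[1, 0])                   # 30

# A 1-D array has no second axis to swap, so .T leaves it unchanged.
v = np.array([1, 2, 3])
print(v.T.shape)                 # (3,)

# To get an explicit column, add an axis instead.
v_col = v[:, np.newaxis]
print(v_col.shape)               # (3, 1)
```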


## Flatten
Given $A=\begin{bmatrix}1&2\\3&4\end{bmatrix}$, flatten $A$. In other words, change its shape from $m\times n$ to a one-dimensional array of length $mn$.

In [None]:
A = np.array([[1, 2],
              [3, 4]])

A_flattened = np.ravel(A) # returns a view when possible; A.flatten() always returns a copy

In [None]:
print(A_flattened)

[1 2 3 4]
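NumPy offers two ways to flatten, and they differ in whether the result shares memory with the original. A minimal sketch, assuming a contiguous array such as the one above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# order="F" flattens column by column instead of row by row.
by_columns = A.ravel(order="F")
print(by_columns)  # [1 3 2 4]

raveled = A.ravel()      # a view for a contiguous array like this one
flattened = A.flatten()  # always an independent copy

# Modifying the raveled view changes A.
raveled[0] = 100
print(A[0, 0])           # 100

# Modifying the flattened copy leaves A untouched.
flattened[1] = -1
print(A[0, 1])           # 2
```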


## Reshape
Given $A=\begin{bmatrix}1&2&3&4\\5&6&7&8\end{bmatrix}$, reshape $A$ from $2\times4$ to $4\times2$, to $2\times2\times2$, and to a one-dimensional array whose length is inferred by passing `-1`.

Be careful with `ndarray.resize`: unlike `reshape`, it can drop or add elements when the new shape has a different total size.

In [None]:
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# reshape out-of-place
A_reshaped_2d = A.reshape([4, 2])
A_reshaped_3d = A.reshape([2, 2, 2])
A_reshaped_infer = A.reshape([-1])

# reshape in-place
A_resized_2d = A.copy()
A_resized_2d.resize([4, 2])
A_resized_3d = A.copy()
A_resized_3d.resize([2, 2, 2])
# not possible to infer dimension with resize

In [None]:
print(A_reshaped_2d)
print()
print(A_reshaped_3d)
print()
print(A_reshaped_infer)
print()
print(A_resized_2d)
print()
print(A_resized_3d)

[[1 2]
 [3 4]
 [5 6]
 [7 8]]

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

[1 2 3 4 5 6 7 8]

[[1 2]
 [3 4]
 [5 6]
 [7 8]]

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
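To make the warning about `ndarray.resize` concrete, the sketch below asks for shapes whose total size differs from the original 8 elements. `reshape` refuses, the in-place `ndarray.resize` truncates or pads with zeros, and the separate function `np.resize` pads by repeating the data. `refcheck=False` is passed here only so the example behaves the same in interactive sessions.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# reshape raises an error if the element count would change.
try:
    A.reshape((3, 3))
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True

# In-place resize to a smaller shape silently drops trailing elements.
shrunk = A.copy()
shrunk.resize((2, 2), refcheck=False)
print(shrunk)
# [[1 2]
#  [3 4]]

# In-place resize to a larger shape pads with zeros.
grown = A.copy()
grown.resize((3, 3), refcheck=False)
print(grown)
# [[1 2 3]
#  [4 5 6]
#  [7 8 0]]

# The np.resize function returns a new array padded with repeated data.
repeated = np.resize(A, (3, 3))
print(repeated)
# [[1 2 3]
#  [4 5 6]
#  [7 8 1]]
```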

