# NumPy

[NumPy](https://docs.scipy.org/doc/numpy-1.10.0/index.html) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The ancestor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers.

Python's built-in data structures include (but are not limited to) lists, tuples, sets, and dictionaries.

Lists:

* are enclosed in square brackets `[one, two, three]`
* are ordered (and use an index for access)
* are mutable (you can modify a list after creation)
* can contain duplicate items
* can be composed of different data types (strings, integers, etc.)

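NumPy arrays share several of these characteristics (they are ordered, mutable, and can contain duplicates), but arrays:

* must be created with a specific function (e.g. `numpy.array()`)
* hold elements of a single data type (mixed inputs are coerced to a common type)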
* can store data more efficiently than lists
* support a large collection of numerical operations
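
As a rough illustration of the storage claim above, the sketch below compares the bytes used by an array's data buffer with an estimate of the bytes used by an equivalent list. The names `values`, `arr`, and `list_bytes` are made up for this example, and the list estimate relies on `sys.getsizeof`, whose results are specific to the Python implementation and platform.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# The array stores 1000 fixed-size integers in one contiguous buffer.
print(arr.itemsize)  # bytes per element: 8
print(arr.nbytes)    # bytes for all elements: 8000

# Rough estimate for the list: the list object itself (an array of
# pointers) plus every separate integer object it points to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes > arr.nbytes)
```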

Data analytics relies heavily on NumPy arrays.

Use `import` to load NumPy.

In [1]:
import numpy as np

Create a NumPy array using `np.array()`.

In [2]:
number = np.array(list(range(1, 11)))
print(type(number))

<class 'numpy.ndarray'>


`number` is an array containing ten items.

In [3]:
number

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

Arithmetic operations are applied element-wise.

In [4]:
print(number + number)
print(number * 10)

[ 2  4  6  8 10 12 14 16 18 20]
[ 10  20  30  40  50  60  70  80  90 100]
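For contrast, the same operators behave differently on a plain Python list. A minimal sketch, using the illustrative names `py_list` and `np_arr`:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# On lists, + concatenates and * repeats.
list_sum = py_list + py_list   # [1, 2, 3, 1, 2, 3]
list_rep = py_list * 2         # [1, 2, 3, 1, 2, 3]

# On arrays, + and * work element by element.
arr_sum = np_arr + np_arr      # array([2, 4, 6])
arr_scaled = np_arr * 2        # array([2, 4, 6])

print(list_sum, list_rep)
print(arr_sum, arr_scaled)
```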


Access the fifth item (indexing starts at 0).

In [5]:
print(number[4])

5


Slice the array (the end index is exclusive).

In [6]:
print(number[1:3])

[2 3]
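A few more slicing patterns, sketched on a fresh copy of the same ten-item array. One difference from lists is that a basic slice of an array is a view, so writing to the slice changes the original array. The names `every_other`, `last_three`, and `view` are illustrative.

```python
import numpy as np

number = np.arange(1, 11)

every_other = number[::2]   # step of 2: [1 3 5 7 9]
last_three = number[-3:]    # negative index counts from the end: [ 8  9 10]
print(every_other, last_three)

# A basic slice is a view on the same data, not a copy.
view = number[1:3]
view[0] = 99
print(number[1])  # 99

# Use .copy() when an independent array is needed.
independent = number[1:3].copy()
independent[0] = 0
print(number[1])  # still 99
```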


A comparison returns an array of booleans.

In [7]:
number > 5

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

Subset using an array of booleans.

In [8]:
print(number[number > 5])

[ 6  7  8  9 10]


Using AND to combine two boolean conditions.

In [9]:
print(np.logical_and(number > 2, number < 8))

[False False  True  True  True  True  True False False False]


Using OR.

In [10]:
print(np.logical_or(number == 7, number < 5))

[ True  True  True  True False False  True False False False]


Subset using the condition above.

In [11]:
print(number[np.logical_or(number == 7, number < 5)])

[1 2 3 4 7]
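The same conditions can also be written with the `&` and `|` operators, which act element-wise on boolean arrays. A minimal sketch; note that each comparison needs its own parentheses because `&` and `|` bind more tightly than `>`, `<`, and `==`.

```python
import numpy as np

number = np.arange(1, 11)

# Function form and operator form produce the same masks.
and_func = np.logical_and(number > 2, number < 8)
and_op = (number > 2) & (number < 8)

or_func = np.logical_or(number == 7, number < 5)
or_op = (number == 7) | (number < 5)

print(np.array_equal(and_func, and_op))  # True
print(np.array_equal(or_func, or_op))    # True
print(number[and_op])                    # [3 4 5 6 7]
```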


A NumPy array holds a single data type, so elements of different types are coerced to a common type. The example below shows that a boolean, a string, and an integer are all coerced into strings.

In [12]:
test = np.array([True, "string", 1])
print(test)
print(type(test[0]))

['True' 'string' '1']
<class 'numpy.str_'>


Booleans and integers are coerced into integers.

In [13]:
test2 = np.array([True, False, 1])
print(test2)
print(type(test2[0]))

[1 0 1]
<class 'numpy.int64'>
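The resulting type can be inspected through the `dtype` attribute and controlled explicitly. A short sketch with illustrative names; the default integer type is platform dependent (commonly `int64`), so the example only checks that it is some integer type.

```python
import numpy as np

mixed = np.array([True, False, 1])
print(mixed.dtype)                            # an integer dtype, e.g. int64
print(np.issubdtype(mixed.dtype, np.integer)) # True

# Convert an existing array to another type.
as_float = mixed.astype(float)
print(as_float)                               # [1. 0. 1.]

# Or request a type when the array is created.
as_bool = np.array([True, False, 1], dtype=bool)
print(as_bool)                                # [ True False  True]
```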


A two-dimensional array.

In [14]:
two_d = np.array(
    [
        [1, 2, 3],
        [4, 5, 6]
    ]
)

print(two_d)
print(type(two_d))

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>


Shape of the array, i.e. a tuple of integers giving the number of elements along each axis (here 2 rows and 3 columns).

In [15]:
print(two_d.shape)

(2, 3)
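The shape can be changed with `reshape` as long as the total number of elements stays the same. A minimal sketch on the same 2 by 3 array:

```python
import numpy as np

two_d = np.array([[1, 2, 3], [4, 5, 6]])

print(two_d.size)            # total number of elements: 6

# Rearrange the six elements into 3 rows and 2 columns.
reshaped = two_d.reshape(3, 2)
print(reshaped.shape)        # (3, 2)
print(reshaped)

# Flatten to one dimension.
flat = two_d.reshape(-1)     # -1 lets NumPy infer the length
print(flat)                  # [1 2 3 4 5 6]
```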


Access the cell in the second row and second column.

In [16]:
print(two_d[1, 1])
print(two_d[1][1])

5
5


Access all items in the first row.

In [17]:
print(two_d[0,:])

[1 2 3]


Access all items in the second column.

In [18]:
print(two_d[:,1])

[2 5]


A three-dimensional array.

In [19]:
three_d = np.array(
    [
        [
            [5, 78, 2, 34, 0],
            [6, 79, 3, 35, 1],
            [7, 80, 4, 36, 2]
        ],
        [
            [5, 78, 2, 34, 0],
            [6, 79, 3, 35, 1],
            [7, 80, 4, 36, 2]
        ],
        [
            [5, 78, 2, 34, 0],
            [6, 79, 3, 35, 1],
            [7, 80, 4, 36, 2]
        ]
    ]
)

print(three_d.shape)
print(three_d.ndim)

(3, 3, 5)
3


Looping over a two-dimensional array; `np.nditer` visits every element one at a time.

In [20]:
print(np.nditer(two_d))
for x in np.nditer(two_d):
    print(x)

<numpy.nditer object at 0x7f5ec806bdf0>
1
2
3
4
5
6
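Two other ways to loop are sketched below: a plain `for` loop, which yields one row at a time, and `np.ndenumerate`, which yields each element together with its index. The list names `rows` and `pairs` are only for illustration.

```python
import numpy as np

two_d = np.array([[1, 2, 3], [4, 5, 6]])

# A plain for loop iterates over the first axis, i.e. the rows.
rows = []
for row in two_d:
    rows.append(row.tolist())
print(rows)  # [[1, 2, 3], [4, 5, 6]]

# ndenumerate yields ((row, column), value) pairs.
pairs = []
for index, value in np.ndenumerate(two_d):
    pairs.append((index, int(value)))
print(pairs[0])   # ((0, 0), 1)
print(pairs[-1])  # ((1, 2), 6)
```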


## Iris

Import iris data using [load_iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html).

In [21]:
from sklearn import datasets
iris = datasets.load_iris()
print(type(iris))

<class 'sklearn.utils.Bunch'>


Feature names.

In [22]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

Check out the first six rows.

In [23]:
print(iris.data[0:6,])

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]]


Print the shape and the number of dimensions.

In [24]:
print(iris.data.shape)
print(iris.data.ndim)

(150, 4)
2


Column type.

In [25]:
print(type(iris.data[:,0]))

<class 'numpy.ndarray'>


Calculate simple statistics on the first column (sepal length).

In [26]:
print(np.mean(iris.data[:, 0]))
print(np.median(iris.data[:,0]))
print(np.sum(iris.data[:,0]))

5.843333333333334
5.8
876.5
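Statistics can also be computed for every column or every row at once with the `axis` argument. To keep the sketch self-contained, it uses a small array named `sample` that holds only the first three rows of the iris data printed earlier, not the full dataset.

```python
import numpy as np

# First three rows of iris.data, copied from the output above.
sample = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
])

# axis=0 collapses the rows: one mean per column.
col_means = np.mean(sample, axis=0)
print(col_means.shape)  # (4,)

# axis=1 collapses the columns: one mean per row.
row_means = np.mean(sample, axis=1)
print(row_means.shape)  # (3,)
```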


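Correlation of sepal length and petal length. `np.corrcoef` returns a correlation matrix; the off-diagonal entries are the correlation between the two variables.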

In [27]:
print(np.corrcoef(iris.data[:,0], iris.data[:,2]))

[[1.         0.87175378]
 [0.87175378 1.        ]]
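To show what the off-diagonal value represents, the sketch below computes the Pearson correlation coefficient directly from its definition and compares it with `np.corrcoef`. The arrays `x` and `y` are made-up values, not iris data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Centre both variables on their means.
x_c = x - x.mean()
y_c = y - y.mean()

# Pearson r: sum of products divided by the root of the product of sums of squares.
r_manual = (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

r_numpy = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_manual, r_numpy))  # True
```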


## Matrix operations

Many machine learning algorithms rely on finding a vector of coefficients that, when multiplied by the matrix of features, approximates the vector of response values. In such models, the formulation is:

$$ y = Xb + c $$

where $y$ is the response vector, $X$ is the feature matrix, $b$ is the vector of coefficients, and $c$ is a scalar constant. The cell below computes only the product $Xb$ (the constant $c$ is omitted).

In [28]:
X = np.array([
    [4, 5],
    [2, 4],
    [3, 3]
])
b = [3, -2]

print(X @ b)

[ 2 -2  3]
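A sketch of the full formulation, including the constant, and of going in the other direction: recovering the coefficients from $X$ and $y$. The value `c = 1.5` is an arbitrary choice for illustration, and `np.linalg.lstsq` is used here as one possible way to find the coefficients, not necessarily what any particular learning algorithm does.

```python
import numpy as np

X = np.array([
    [4, 5],
    [2, 4],
    [3, 3]
])
b = np.array([3, -2])
c = 1.5  # arbitrary constant, chosen for this example

# The scalar c is broadcast and added to every element of X @ b.
y = X @ b + c
print(y)  # [ 3.5 -0.5  4.5]

# Recover b and c: append a column of ones so that c becomes
# one more coefficient, then solve by least squares.
X1 = np.column_stack([X, np.ones(len(X))])
coef, residuals, rank, singular_values = np.linalg.lstsq(X1, y, rcond=None)
print(np.allclose(coef, [3, -2, 1.5]))  # True
```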


Transposition turns an $n{\times}m$ matrix into an $m{\times}n$ matrix by swapping its rows and columns.

In [34]:
A = np.array([
    [1, 2, 8],
    [4, 9, 6],
    [7, 1, 9]
])

print(A.T)

[[1 4 7]
 [2 9 1]
 [8 6 9]]
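The change of shape is easier to see on a non-square matrix. A small sketch with an illustrative 2 by 3 matrix `M`:

```python
import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print(M.shape)    # (2, 3)
print(M.T.shape)  # (3, 2)

# The element at row i, column j of M sits at row j, column i of M.T.
print(M[0, 2] == M.T[2, 0])  # True

# Transposing twice gives back the original matrix.
print(np.array_equal(M.T.T, M))  # True
```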


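Matrix inversion is defined for square matrices, i.e. an $m{\times}m$ matrix, that are non-singular (non-zero determinant), and it allows equations involving matrix multiplication to be solved directly. Multiplying a matrix by its inverse gives the identity matrix $I$. In the output below, the off-diagonal entries are tiny floating-point rounding errors rather than exact zeros.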

In [43]:
A = np.array([
    [1, 2, 8],
    [4, 9, 6],
    [7, 1, 9]
])

A_inv = np.linalg.inv(A)

print(A @ A_inv)

[[ 1.00000000e+00 -5.55111512e-17 -6.93889390e-18]
 [ 5.55111512e-17  1.00000000e+00 -6.76542156e-17]
 [ 8.32667268e-17  0.00000000e+00  1.00000000e+00]]
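Because of rounding errors, an exact comparison with the identity matrix can fail. A minimal sketch of a tolerance-based check using `np.allclose` and `np.eye`, together with the determinant that confirms the matrix is invertible:

```python
import numpy as np

A = np.array([
    [1, 2, 8],
    [4, 9, 6],
    [7, 1, 9]
])

# A non-zero determinant means the inverse exists.
det = np.linalg.det(A)
print(np.isclose(det, -385))  # True

A_inv = np.linalg.inv(A)

# Compare with the 3x3 identity matrix within floating-point tolerance.
is_identity = np.allclose(A @ A_inv, np.eye(3))
print(is_identity)  # True
```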


An [example](https://www.mathsisfun.com/algebra/matrix-inverse.html) of how the inverse is useful, where $x_1$ is the number of children and $x_2$ the number of adults:

<blockquote>
    A group took a trip on a bus, at \$3 per child and \$3.20 per adult, and paid \$118.40. They took the train back, at \$3.50 per child and \$3.60 per adult, and paid \$135.20. How many children and adults were in this group?
</blockquote>

$$
\begin{bmatrix}
x_1 & x_2
\end{bmatrix}
\begin{bmatrix}
3 & 3.5 \\
3.2 & 3.6
\end{bmatrix}
=
\begin{bmatrix}
118.4 & 135.2
\end{bmatrix}
$$

In [48]:
A = np.array([
    [3, 3.5],
    [3.2, 3.6]
])

B = np.array([
    [118.4, 135.2]
])

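# Solve X A = B for X by multiplying B with the inverse of A
A_inv = np.linalg.inv(A)

X = B @ A_inv

# Check: multiplying the solution by A recovers B
print(X @ A)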

[[118.4 135.2]]
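The cell above only verifies the solution. The sketch below prints the solution itself and shows an alternative that avoids forming the inverse: `np.linalg.solve` handles systems of the form $Mx = v$, so the row-vector equation is transposed first. The variable names `x_inv` and `x_solve` are illustrative.

```python
import numpy as np

A = np.array([
    [3, 3.5],
    [3.2, 3.6]
])
B = np.array([118.4, 135.2])

# Via the inverse, as in the cell above.
x_inv = B @ np.linalg.inv(A)

# Via solve: x A = B is equivalent to A.T x = B for a 1-D x.
x_solve = np.linalg.solve(A.T, B)

print(np.round(x_inv))    # [16. 22.] -> 16 children and 22 adults
print(np.round(x_solve))  # [16. 22.]
```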


## Etc.

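Generate random numbers using [numpy.random](https://docs.scipy.org/doc/numpy/reference/routines.random.html). Setting a seed makes the results reproducible, and the upper bound of `randint` is exclusive.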

In [29]:
np.random.seed(1984)
print(np.random.rand())

# simulate a die
print(np.random.randint(1,7))

0.00875572949604042
4
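
NumPy also provides a newer interface based on `np.random.default_rng`, which returns a `Generator` object with its own state instead of relying on the global seed. A minimal sketch; the numbers it produces differ from those of `np.random.seed` above, so no specific values are shown.

```python
import numpy as np

rng = np.random.default_rng(1984)

u = rng.random()          # float in [0, 1)
die = rng.integers(1, 7)  # integer from 1 to 6 (upper bound exclusive)
print(u, die)

# Simulate 1000 rolls of a die in one call.
rolls = rng.integers(1, 7, size=1000)
print(rolls.min(), rolls.max())

# A generator created with the same seed repeats the same sequence.
rng_again = np.random.default_rng(1984)
print(rng_again.random() == u)  # True
```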
