<h1>CS4618: Artificial Intelligence I</h1>
<h1>Vectors and Matrices</h1>
<h2>
    Derek Bridge<br>
    School of Computer Science and Information Technology<br>
    University College Cork
</h2>

<h1>Initialization</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

In [11]:
%load_ext autoreload
%autoreload 2
%matplotlib inline



In [12]:
import numpy as np

import numpy.linalg as npla

from math import sqrt

<h1>Doing Things with Data</h1>
<ul>
    <li>All of the following are about doing things with data:
        <ul>
            <li>data science, data analytics, machine learning, statistics, statistical machine learning, statistical inference,
            data mining, knowledge discovery, pattern recognition, &hellip;
            </li>
        </ul>
    </li>
    <li>These fields have been given impetus by:
        <ul>
            <li>availability of lots of data (sometimes 'big data'), partly due to sensors, the Internet, &hellip;</li>
            <li>availability of hardware for high volume storage and processing, including GPUs and TPUs, cloud computing, &hellip;
                <!-- The next generation of Google's Pixel phones (Pixel 6) uses TPUs: 
                     https://blog.google/products/pixel/google-tensor-debuts-new-pixel-6-fall/ -->
            </li>
        </ul>
    </li>
    <li>We use techniques discovered by these fields for tasks in AI such as prediction (regression, classification),
        clustering, speech recognition, machine translation, &hellip;
    </li>
    <li>But, first, some background maths!</li>
</ul> 

<h1>Matrices</h1>
<ul>
    <li>A <b>matrix</b> is a rectangular array, in our case of real numbers.</li>
    <li>
        In general, we use bold capital letters, e.g. $\v{A}$, for matrices, e.g.
        $$\v{A} = \begin{bmatrix}
                      2 & 4 & 0 \\
                      1 & 3 & 2
                  \end{bmatrix}
        $$
    </li>
    <li>
        A matrix with $m$ rows and $n$ columns is an <b>$m \times n$ matrix</b>.
        <ul>
            <li>
                What are $m$ and $n$ for $\v{A}$?
            </li>
        </ul>
        $m$ and $n$ are sometimes called its <b>dimensions</b>.
    </li>
    <li>
        We refer to an <b>element</b> of a matrix either using subscripts or indexes:
        <ul>
            <li>
                $\v{A}_{i,j}$ or $\v{A}[i,j]$ is the element in the $i$th row and $j$th column.
            </li>
            <li>
                We will index from 1.
                <ul>
                    <li>
                        However, we will sometimes use position 0 for 'technical' purposes.
                    </li>
                    <li>
                        And we must be aware that Python numpy arrays and matrices are 0-indexed.
                    </li>
                </ul>
             </li>
             <li>
                 So what are $\v{A}_{2,1}$, $\v{A}_{1,2}$, $\v{A}_{0,0}$ and $\v{A}_{3, 2}$?
             </li>
        </ul>
    </li>
</ul>

<h1>Vectors</h1>
<ul>
    <li>A <b>vector</b> is a matrix that has only one column, i.e. a $m \times 1$ matrix.</li>
    <li>
        A vector with $m$ rows is called a <b>$m$-dimensional</b> vector.
    </li>
    <li>
        In general, we use bold lowercase letters for vectors, e.g.
        $$\v{x} = \cv{2\\4\\3}$$
    </li>
    <li>
        Sometimes this is called a <b>column vector</b>.
    </li>
    <li>
        Then, by contrast, a <b>row vector</b> is a matrix that has only one row, i.e. a $1 \times n$ matrix, e.g.
        $$\rv{2, 4, 3}$$
    </li>
    <li>
        Unless stated otherwise, a vector should be assumed to be a column vector.
    </li>
    <li>
        We can refer to an element using a single subscript, again most of the time indexed from 1.
        <ul>
            <li>
                So what is $\v{x}_1$?
            </li>
        </ul>
    </li>
</ul>

<h1>Vectors and Matrices in Python</h1>
<ul>
    <li>We won't use lists!</li>
</ul>

In [13]:
# We won't be doing it this way

x = [2, 4, 3]

A = [[2, 4, 0], [1, 3, 2]]

<ul>
    <li>Of the many ways of representing vectors and matrices in Python, we will use two:
        <ul>
            <li>
                pandas library:
                <ul>
                    <li>for vectors: <code>Series</code>, a kind of one-dimensional array;</li>
                    <li>for matrices: <code>DataFrame</code>, a tabular data structure of rows and (named) columns.</li>
                </ul>
            </li>
            <li>
                numpy library:
                <ul>
                    <li>numpy arrays, which can be one-dimensional, two-dimensional, or have more dimensions.</li>
                </ul>
                The scikit-learn library expects its data to arrive as numpy arrays.
            </li>
        </ul>
    </li>
</ul> 

<h1>Using numpy arrays</h1>

In [14]:
# Vectors
# We will use a numpy 1d array, which we can create from a list
# Note that, done this way, there is no way for us to distinguish between column- and row-vectors
x = np.array([2, 4, 3])

# Matrices
# We will use a numpy 2d array, which we can create from a list of lists
A = np.array([[2, 4, 0], [1, 3, 2]])

<p>
    We can see their dimensions:
</p>

In [15]:
x.ndim

1

In [16]:
x.shape

(3,)

In [17]:
A.ndim

2

In [18]:
A.shape

(2, 3)

<p>
    Note that the shape is always a tuple. Hence, x.shape is (3,), not 3.
</p>
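
<p>
    The earlier warning that numpy is 0-indexed can be made concrete with a small sketch. It assumes the same
    <code>x</code> and <code>A</code> as above; the variable names <code>a_13</code> and <code>x_3</code> are
    only illustrative. The element written $\v{A}_{i,j}$ in the notes lives at <code>A[i - 1, j - 1]</code> in numpy.
</p>

```python
import numpy as np

x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

# Maths notation in these notes is 1-indexed; numpy is 0-indexed.
a_13 = A[0, 2]   # A_{1,3}: first row, third column
x_3 = x[2]       # x_3: third element of the vector

print(a_13, x_3)
```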
<!--
<p>
    We can make it into a nested list using the reshape method, and then its shape is (3,1):
</p>
X = x.reshape((3,1))
X
X.shape
<p>
    Reshaping it to $(3, 1)$ makes it more clearly a column vector: 3 rows, 1 column.
</p>
<p>
    If we had reshaped it to $(1, 3)$, then it would a be more like a row vector: 1 row, 3 columns:
</p>
X = x.reshape((1, 3))
X
X.shape
<p>
    In general, we won't reshape unless necessary. For vectors, we'll just work with 1d numpy arrays.
</p>
-->

We can use the reshape method to give us the same data but with a different shape.

In [19]:
B = A.reshape((3,2))
B

array([[2, 4],
       [0, 1],
       [3, 2]])

In [20]:
B.shape

(3, 2)
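
<p>
    The same method can, if needed, turn a 1d array into something that looks more like a column vector or a row vector.
    This is only a sketch (the names <code>col</code> and <code>row</code> are illustrative); in general, 1d arrays are
    enough for vectors and reshaping is only done when necessary.
</p>

```python
import numpy as np

x = np.array([2, 4, 3])

col = x.reshape((3, 1))   # 3 rows, 1 column: like a column vector
row = x.reshape((1, 3))   # 1 row, 3 columns: like a row vector

print(col.shape, row.shape)
```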

<h1>Transpose</h1>
<ul>
    <li>The <b>transpose</b> of $m \times n$ matrix $\v{A}$, written $\v{A}^T$, is the $n \times m$ matrix in 
        which the first row of $\v{A}$ becomes the first column of $\v{A}^T$, the second row of $\v{A}$ becomes 
        the second column of $\v{A}^T$, and so on:
        <ul>
            <li>
                $\v{A}_{i,j}^T = \v{A}_{j,i}$ for all $i,j$
            </li>
        </ul>
    </li>
    <li>
        E.g.
        $$\v{A} = \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
              \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{A}^T = \begin{bmatrix}
                     2 & 1 \\
                     4 & 3 \\
                     0 & 2
                    \end{bmatrix}
        $$
    </li>
    <li>
        As a special case, if $\v{x}$ is a $m$-dimensional column vector ($m \times 1$), then $\v{x}^T$ is a 
        $m$-dimensional row vector ($1 \times m$), e.g.
        $$\v{x} = \cv{2\\4\\3}\,\,\,\,\,\,\,\,\,\, \v{x}^T = \rv{2, 4, 3}$$
    </li>
</ul>

<h2>Transpose in numpy</h2>
<ul>
    <li>numpy arrays offer easy ways to compute their transpose: either the <code>transpose</code> method or 
       the <code>T</code> attribute:
    </li>
</ul>

In [21]:
A = np.array([[2, 4, 0], [1, 3, 2]])

In [22]:
# Transpose as a method
A.transpose()

array([[2, 1],
       [4, 3],
       [0, 2]])

In [23]:
# Transpose as an attribute
A.T

array([[2, 1],
       [4, 3],
       [0, 2]])
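
<p>
    Two points are easy to trip over, so here is a hedged sketch of both. First, transposing is not the same as
    reshaping to the transposed shape: <code>reshape</code> keeps the elements in row-by-row order, whereas the transpose
    swaps rows and columns. Second, a 1d numpy array has only one axis, so its transpose is unchanged. The names
    <code>same</code> and <code>x_t_shape</code> are illustrative.
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])
x = np.array([2, 4, 3])

# Both results have shape (3, 2), but the contents differ.
same = np.array_equal(A.T, A.reshape((3, 2)))

# Transposing a 1d array has no effect on its shape or contents.
x_t_shape = x.T.shape

print(same, x_t_shape)
```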

<h1>Tensors</h1>
<ul>
    <li>A quantity (a number), often referred to in this context as a <b>scalar</b>, has no dimensions.</li>
    <li>A vector has one dimension, $m$.</li>
    <li>A matrix has two dimensions, $m$ and $n$.</li>
    <li>We can also have objects that have three or more dimensions.</li>
    <li>We refer to all of these objects as <b>tensors</b> and we refer to the number of dimensions as the <b>rank</b> of the tensor.
        <ul>
            <li>A scalar is a rank 0 tensor.</li>
            <li>A vector is a rank 1 tensor.</li>
            <li>A matrix is a rank 2 tensor.</li>
            <li>And we can have rank 3 tensors, rank 4 tensors, and so on.</li>
        </ul>
    </li>
    <li>Be warned that there are lots of different definitions of 'scalar', 'vector', 'dimension' and 'rank' 
        that you may find if you read around the subject. They may not all agree with my usage. My usage,
        I believe, is consistent with the way we use these words in AI.
    </li>
    <li>The rest of this lecture continues to work only with scalars, vectors and matrices.</li>
</ul>
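
<p>
    In numpy, the rank in the sense used above is what the <code>ndim</code> attribute reports. The following sketch
    builds one array of each rank from 0 to 3; the variable names are illustrative and the rank 3 array is just filled
    with zeros.
</p>

```python
import numpy as np

scalar = np.array(5)                        # rank 0
vector = np.array([2, 4, 3])                # rank 1
matrix = np.array([[2, 4, 0], [1, 3, 2]])   # rank 2
cube = np.zeros((2, 3, 4))                  # rank 3

ranks = [t.ndim for t in (scalar, vector, matrix, cube)]
print(ranks)
```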

<h1>Scalar Addition and Scalar Multiplication</h1>
<ul>
    <li>Scalar addition and multiplication both work <b>elementwise</b>, i.e.:
        <ul>
            <li>in scalar addition, we add the number to each element in the matrix;</li>
            <li>in scalar multiplication, we multiply each element in the matrix by the number.</li>
        </ul>
     </li>
     <li>E.g.
        $$\v{A} = 
            \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          2 + \v{A} = 
            \begin{bmatrix}
                4 & 6 & 2 \\
                3 & 5 & 4
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          2\v{A} = 
            \begin{bmatrix}
                4 & 8 & 0 \\
                2 & 6 & 4
            \end{bmatrix}
        $$
    </li>
</ul>

<h2>Scalar Addition and Scalar Multiplication in numpy</h2>
<ul>
    <li>numpy arrays enable operations like these using the normal addition, subtraction, multiplication and division
        operators and without writing for loops.
    </li>
</ul>

In [24]:
A = np.array([[2, 4, 0], [1, 3, 2]])

In [25]:
2 + A

array([[4, 6, 2],
       [3, 5, 4]])

In [26]:
2 * A

array([[4, 8, 0],
       [2, 6, 4]])

<ul>
    <li>Other Python operators also work:</li>
</ul>

In [27]:
A**2

array([[ 4, 16,  0],
       [ 1,  9,  4]])
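
<p>
    Subtraction and division, mentioned above, behave the same way. A small sketch (the names <code>minus</code> and
    <code>half</code> are illustrative); note that <code>/</code> produces an array of floats even when the input
    contains integers.
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])

minus = A - 1   # subtract 1 from every element
half = A / 2    # divide every element by 2 (the result is a float array)

print(minus)
print(half)
```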

<h1>Matrix Addition and Hadamard Product</h1>
<ul>
    <li>Matrix addition and Hadamard product require two matrices that have <em>the same dimensions</em>.</li>
    <li>They are also defined elementwise: by adding or multiplying <em>corresponding</em> elements.</li>
     <li>E.g.
    $$
        \v{A} = \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{B} = \begin{bmatrix}
                1 & 0 & 5 \\
                2 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{A}+\v{B} = \begin{bmatrix}
                3 & 4 & 5 \\
                3 & 6 & 4
              \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{A}*\v{B} = \begin{bmatrix}
                2 & 0 & 0 \\
                2 & 9 & 4
              \end{bmatrix}
    $$
    </li>
    <li>In maths, the Hadamard product is more often written $\odot$ or $\circ$, but we will use $\ast$.</li>
</ul>

<h2>Matrix Addition and Hadamard Product in numpy</h2>
<ul>
    <li>We don't need to write any loops, just use <code>+</code> and <code>&ast;</code>:</li>
</ul>

In [28]:
A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[1, 0, 5], [2, 3, 2]])

In [29]:
A + B

array([[3, 4, 5],
       [3, 6, 4]])

In [30]:
A * B

array([[2, 0, 0],
       [2, 9, 4]])
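
<p>
    To see what 'corresponding elements' means, here is a sketch of the Hadamard product written with explicit loops and
    compared against <code>A * B</code>. The loop version is for illustration only (the name <code>H</code> is
    illustrative); in practice we would just use the operator.
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[1, 0, 5], [2, 3, 2]])

# Multiply corresponding elements one pair at a time.
m, n = A.shape
H = np.zeros((m, n), dtype=A.dtype)
for i in range(m):
    for j in range(n):
        H[i, j] = A[i, j] * B[i, j]

print(np.array_equal(H, A * B))
```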

<h1>Matrix Multiplication</h1>
<ul>
    <li>We can compute $\v{A}\v{B}$, the result of multiplying matrices $\v{A}$ and $\v{B}$, provided the number of columns
        of $\v{A}$ equals the number of rows of $\v{B}$.
        <ul>
            <li>
                If $\v{A}$ is a $m \times p$ matrix and $\v{B}$ is a $p \times n$ matrix, then we can compute $\v{C} = \v{A}\v{B}$.
            </li>
            <li>
                $\v{C}$ will be a $m \times n$ matrix.
            </li>
        </ul>
    </li>
    <li>
        $\v{C}_{i,j}$ is obtained by multiplying elements of the $i$th row of $\v{A}$ by corresponding elements
        of the $j$th column of $\v{B}$ and summing:
        $$\v{C}_{i,j} = \sum_{k=1}^p\v{A}_{i,k}\v{B}_{k,j}$$
    </li>
    <li>E.g.
        $$\v{A} = \begin{bmatrix}
                    2 & 4 & 0 \\
                    1 & 3 & 2
                  \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{B} = \begin{bmatrix}
                    3 & 1 & 2\\
                    2 & 3 & 1\\
                    1 & 3 & 3
                    \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
         \v{A}\v{B} = \begin{bmatrix}
                         14 & 14 & 8\\
                         11 & 16 & 11
                      \end{bmatrix}
         $$
    </li>
    <li>Since vectors are just one-column matrices, matrix multiplication can apply, provided the dimensions are OK, e.g.
        $$\v{A} = \begin{bmatrix}
                  2 & 4 & 0 \\
                  1 & 3 & 2
                  \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{x} = \cv{2\\3\\1}\,\,\,\,\,\,\,\,\,\,
          \v{y} = \cv{2\\3}\,\,\,\,\,\,\,\,\,\,
          \v{A}\v{x} = \cv{16\\13}\,\,\,\,\,\,\,\,\,\,
          \v{A}\v{y} \mbox{ is undefined}
        $$
    </li>
    <li>What about carrying out this operation if both are vectors? Well, they'd need to have the same dimension. For
        example, here they are both 3-dimensional:
        $$\v{x} = \cv{2\\3\\1}\,\,\,\,\,\,\,\,\,\,\v{y} = \cv{-1\\6\\4}$$
        But, even if they are the same dimension, we cannot compute $\v{x}\v{y}$ because we need the number of columns
        of $\v{x}$ (in this case, 1) to equal the number of rows of $\v{y}$ (in this case, 3). 
    </li>
    <li>To get this to work, we need to use the transpose of $\v{x}$:
        $$\v{x}^T = \rv{2,3,1}\,\,\,\,\,\,\,\,\,\,\v{y} = \cv{-1\\6\\4}\,\,\,\,\,\,\,\,\,\,\v{x}^T\v{y} = 20$$
        The number of columns
        of $\v{x}^T$ (3) is equal to the number of rows of $\v{y}$ (also 3). 
        Note how the result is a scalar. This operation (multiplying two vectors) is so common that it goes by several
        other names, including the <b>dot product</b>, the scalar product and even the inner product (although,
        technically, the inner product is a more general concept).
    </li>
</ul> 
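
<p>
    The summation formula for $\v{C}_{i,j}$ can be sketched directly as three nested loops. The function name
    <code>matmul_loops</code> is hypothetical and the code is for illustration only: it is slow and is not how you would
    multiply matrices in practice.
</p>

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply A (m x p) by B (p x n) using the summation formula."""
    m, p = A.shape
    p2, n = B.shape
    if p != p2:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(n):
            # C[i, j] is the sum over k of A[i, k] * B[k, j]
            for k in range(p):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]])

C = matmul_loops(A, B)
print(C)
```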

<h2>Matrix Multiplication in numpy</h2>
<ul>
    <li>numpy offers <code>dot</code> as a function or method for matrix multiplication:</li>
</ul>

In [31]:
A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]])

In [32]:
# Multiplication as a function
np.dot(A, B)

array([[14, 14,  8],
       [11, 16, 11]])

In [33]:
# Multiplication as a method
A.dot(B)

array([[14, 14,  8],
       [11, 16, 11]])

<ul>
    <li>Remember, matrix multiplication in numpy is done with <code>dot</code>, not &ast;.</li>
    <!--<li>Broadcasting does not apply to matrix multiplication, since it's not an elementwise operation</li>-->
</ul>
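
<p>
    The vector examples from the previous section can be reproduced in the same way. This sketch assumes vectors are
    held as 1d arrays, in which case no explicit transpose is needed for the dot product. It also shows Python's
    <code>@</code> operator, which is not used in these notes but which numpy arrays support as an alternative way of
    writing matrix multiplication. The variable names are illustrative.
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])
x = np.array([2, 3, 1])
y = np.array([-1, 6, 4])

Ax = A.dot(x)    # matrix times vector: a 1d array with 2 elements
xy = x.dot(y)    # dot product of two 1d arrays: a scalar
Ax_at = A @ x    # the same product as A.dot(x), written with @

print(Ax, xy, Ax_at)
```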

<h1>Identity Matrices</h1>
<ul>
    <li>The $n \times n$ <b>identity matrix</b>, $\v{I}_n$, contains ones on the main diagonal
        (from top left to bottom right) and zeros everywhere else:
        <ul>
            <li>
                $\v{I}_n[i,i] = 1$ for $i = 1,\ldots,n$ and $\v{I}_n[i,j] = 0$ for $i \neq j$
            </li>
        </ul>
    </li>
    <li>E.g.:
        $$\v{I}_3 = \begin{bmatrix}
                    1 & 0 & 0 \\
                    0 & 1 & 0 \\
                    0 & 0 & 1
                    \end{bmatrix}
        $$
    </li>
    <li>
        If $\v{A}$ is an $m \times n$ matrix, then $\v{A}\v{I}_n = \v{I}_m\v{A} = \v{A}$.
    </li>
</ul>

<h2>Identity Matrices in numpy</h2>
<ul>
    <li>Create identity matrices using the <code>identity</code> function:</li>
</ul>

In [34]:
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
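
<p>
    As a quick check of the property $\v{A}\v{I}_n = \v{I}_m\v{A} = \v{A}$, here is a sketch using the $2 \times 3$
    matrix from earlier. The names <code>right</code> and <code>left</code> are illustrative. Because
    <code>identity</code> returns floats, the products are float arrays, but their values match those of
    <code>A</code>.
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])   # a 2 x 3 matrix
m, n = A.shape

right = A.dot(np.identity(n))   # A times the 3 x 3 identity
left = np.identity(m).dot(A)    # the 2 x 2 identity times A

print(np.array_equal(right, A), np.array_equal(left, A))
```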

<h1>Inverses</h1>
<ul>
    <li>If $\v{A}$ is a $n \times n$ matrix, then its <b>inverse</b>, $\v{A}^{-1}$ (<em>if it has one</em>) is also 
        a $n \times n$ matrix such that $\v{A}\v{A}^{-1} = \v{A}^{-1}\v{A} = \v{I}_n$.
    </li>
    <li>E.g.
        $$\v{A} = \begin{bmatrix}
                    1 &  0 & 2 \\
                    2 & -1 & 3 \\
                    4 &  1 & 8
                 \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
         \v{A}^{-1} = \begin{bmatrix}
                     -11 &  2 &  2 \\
                      -4 &  0 &  1 \\
                       6 & -1 & -1
                      \end{bmatrix}
        $$
    </li>
    <li>
        Some $n \times n$ matrices do not have inverses, e.g.
        $$\begin{bmatrix}
            1 & 1 & 1 \\
            1 & 1 & 1 \\
            1 & 1 & 1
           \end{bmatrix}$$
        In these cases, you can compute a <b>pseudo-inverse</b> (which is also defined for matrices that are not square),
        which you can use for <em>some</em> of the same purposes instead.
    </li>
</ul>

<h2>Inverses in numpy</h2>
<ul>
    <li>numpy.linalg offers function <code>inv</code> for computing inverses, but also function 
        <code>pinv</code> for computing the Moore-Penrose pseudo-inverse:
    </li>
</ul>

In [35]:
A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]])

npla.inv(A)

array([[-11.,   2.,   2.],
       [ -4.,  -0.,   1.],
       [  6.,  -1.,  -1.]])

In [36]:
npla.pinv(A)

array([[-1.10000000e+01,  2.00000000e+00,  2.00000000e+00],
       [-4.00000000e+00,  1.17413148e-14,  1.00000000e+00],
       [ 6.00000000e+00, -1.00000000e+00, -1.00000000e+00]])

In [37]:
B = np.ones((3,3))

npla.inv(B) # raises an exception

LinAlgError: Singular matrix

In [38]:
npla.pinv(B)

array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
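
<p>
    It can be reassuring to check these results numerically. The sketch below uses <code>np.allclose</code> rather than
    exact equality, because the computations are done in floating point. It confirms that $\v{A}\v{A}^{-1}$ is
    (approximately) the identity, and that the pseudo-inverse of the all-ones matrix is <em>not</em> a true inverse but
    does satisfy the weaker Moore-Penrose property $\v{B}\v{B}^{+}\v{B} = \v{B}$, where $\v{B}^{+}$ (notation not used
    elsewhere in these notes) stands for the pseudo-inverse. The variable names are illustrative.
</p>

```python
import numpy as np
import numpy.linalg as npla

A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]])
A_inv = npla.inv(A)
ok_inv = np.allclose(A.dot(A_inv), np.identity(3))

B = np.ones((3, 3))
B_pinv = npla.pinv(B)
# B times its pseudo-inverse is not the identity...
is_identity = np.allclose(B.dot(B_pinv), np.identity(3))
# ...but multiplying by B again does recover B.
ok_pinv = np.allclose(B.dot(B_pinv).dot(B), B)

print(ok_inv, is_identity, ok_pinv)
```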

<h1>Some numpy Methods</h1>
<ul>
    <li>numpy offers methods for calculations that, in other languages, would require you to write loops.</li>
    <li>E.g. <code>sum</code>, <code>mean</code>, <code>min</code>, <code>max</code>, <code>argmin</code>, <code>argmax</code>,
        &hellip;
    </li>
</ul>

In [39]:
x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

In [40]:
x.sum()

9

In [41]:
A.sum()

12
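
<p>
    The other methods listed above work in the same way. The sketch below also uses the optional <code>axis</code>
    argument, which goes slightly beyond what these notes cover: <code>axis=0</code> collapses the rows (giving one
    result per column) and <code>axis=1</code> collapses the columns (giving one result per row). Remember that
    <code>argmin</code> and <code>argmax</code> return 0-indexed positions.
</p>

```python
import numpy as np

x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

stats = (x.mean(), x.min(), x.max())
positions = (x.argmin(), x.argmax())   # 0-indexed positions of the min and max

col_sums = A.sum(axis=0)   # one sum per column
row_sums = A.sum(axis=1)   # one sum per row

print(stats, positions, col_sums, row_sums)
```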

<h1>Some numpy Universal Functions</h1>
<ul>
    <li>Consider a function such as <code>sqrt</code>.</li>
    <li>In Python, <code>sqrt</code> (from the <code>math</code> module) takes a number but cannot take a
        list of numbers:
    </li>
</ul>

In [42]:
sqrt(9)

3.0

In [43]:
sqrt([1, 4, 9]) # Raises an exception

TypeError: must be real number, not list

<ul>
    <li>But, the corresponding numpy function can apply to arrays:</li>
</ul>

In [44]:
np.sqrt(9)

3.0

In [45]:
x = np.array([1, 4, 9])

np.sqrt(x)

array([1., 2., 3.])

In [46]:
A = np.array([[1, 4, 9], [16, 25, 36], [49, 64, 81]])

np.sqrt(A)

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

<ul>
    <li>The function is applied elementwise</li>
    <li>In numpy, these are called 'universal functions' (or 'ufuncs')</li>
    <li>Others include: <code>abs</code>, <code>exp</code>, <code>log10</code>, &hellip;</li>
</ul>
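
<p>
    Here is a short sketch of the other universal functions just mentioned, each applied elementwise to a whole array.
    The input values and variable names are chosen only for illustration.
</p>

```python
import numpy as np

x = np.array([-1, 10, -100])

magnitudes = np.abs(x)             # absolute value of each element
logs = np.log10(magnitudes)        # base-10 logarithm of each element
exps = np.exp(np.array([0.0, 1.0]))   # e raised to each element

print(magnitudes, logs, exps)
```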

<h1>Vectorization</h1>
<ul>
    <li>Algorithms that might otherwise need for-loops and indexing can often be written much more succinctly by expressing them
            in terms of operators, methods and functions that work on entire arrays.
    </li>
    <li>More than this, if your programming language has efficient implementations of
        these operators, methods and functions, the resulting programs can run much faster too.
        <ul>
            <li>numpy's operators, methods and functions, for example, are typically one or more orders of magnitude faster 
                than their pure Python equivalents (written using loops and indexing).
            </li>
        </ul>
        So, avoid loops and indexing!
    </li>
    <li>Using fast array operators, methods and functions in this way is known as <b>vectorization</b>.
    </li>
</ul>

In [47]:
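# A pure-Python sum that uses an explicit loop.
# Note: this definition shadows Python's built-in sum for the rest of the notebook.
def sum(L):
    total = 0.0  # float accumulator, so the result is a float
    for x in L:
        total += x
    return total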

In [48]:
x = list(range(1, 201))

In [49]:
%timeit sum(x)

4.08 μs ± 23.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [50]:
x = np.arange(1, 201)

In [51]:
%timeit np.sum(x)

1.24 μs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


<p>
    By default, <code>timeit</code> runs your code in a loop enough times to get sufficient accuracy (100,000 times for the first example above and 1,000,000 times for the second) and computes the average run time. It then does this again,
    and then again, until it has done it seven times, and reports the mean and standard deviation of these seven average run times. It does this to try to make its measurements
    robust when other things are happening on your machine at the same time.
</p>
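
<p>
    Outside a notebook, where the <code>%timeit</code> magic is not available, a similar comparison can be sketched with
    Python's standard <code>timeit</code> module. This is only a rough imitation of what <code>%timeit</code> does: the
    choice of 10,000 calls per run is an assumption made to keep the example quick, and the helper name
    <code>loop_sum</code> is hypothetical. Timings will vary from machine to machine.
</p>

```python
import timeit
import numpy as np

def loop_sum(values):
    """Sum the values with an explicit pure-Python loop."""
    total = 0.0
    for v in values:
        total += v
    return total

x_list = list(range(1, 201))
x_arr = np.arange(1, 201)

calls = 10_000   # calls per run (an arbitrary choice for this sketch)

# Seven runs each; every entry is the total time for `calls` calls.
runs_loop = timeit.repeat(lambda: loop_sum(x_list), number=calls, repeat=7)
runs_np = timeit.repeat(lambda: np.sum(x_arr), number=calls, repeat=7)

# Convert to mean time per call, in microseconds.
mean_loop_us = np.mean(runs_loop) / calls * 1e6
mean_np_us = np.mean(runs_np) / calls * 1e6

print(f"loop:   {mean_loop_us:.2f} microseconds per call")
print(f"np.sum: {mean_np_us:.2f} microseconds per call")
```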