<h1>CS4618: Artificial Intelligence I</h1>
<h1>Vectors and Matrices</h1>
<h2>
    Derek Bridge<br>
    School of Computer Science and Information Technology<br>
    University College Cork
</h2>

<h1>Initialization</h1>
$\newcommand{\Set}[1]{\{#1\}}$ 
$\newcommand{\Tuple}[1]{\langle#1\rangle}$ 
$\newcommand{\v}[1]{\pmb{#1}}$ 
$\newcommand{\cv}[1]{\begin{bmatrix}#1\end{bmatrix}}$ 
$\newcommand{\rv}[1]{[#1]}$ 
$\DeclareMathOperator{\argmax}{arg\,max}$ 
$\DeclareMathOperator{\argmin}{arg\,min}$ 
$\DeclareMathOperator{\dist}{dist}$
$\DeclareMathOperator{\abs}{abs}$

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
import numpy.linalg as npla

from math import sqrt

<h1>Doing Things with Data</h1>
<ul>
    <li>All of these are about doing things with data:
        <ul>
            <li>data science, data analytics, machine learning, statistics, statistical machine learning, statistical inference,
            data mining, knowledge discovery, pattern recognition, &hellip;
            </li>
        </ul>
    </li>
    <li>These fields have been given impetus by:
        <ul>
            <li>availability of lots of data (sometimes 'big data'), partly due to sensors, the Internet, &hellip;</li>
            <li>availability of hardware for high volume storage and processing, including GPUs, cloud computing, &hellip;</li>
        </ul>
    </li>
    <li>We use techniques discovered by these fields for tasks in AI such as prediction (regression, classification),
        clustering, speech recognition, machine translation, &hellip;
    </li>
    <li>But, first, some background maths!</li>
</ul> 

<h1>Matrices</h1>
<ul>
    <li>A <b>matrix</b> is a rectangular array, in our case of real numbers</li>
    <li>
        In general, we use bold capital letters for matrices, e.g.
        $$\v{A} = \begin{bmatrix}
                      2 & 4 & 0 \\
                      1 & 3 & 2
                  \end{bmatrix}
        $$
    </li>
    <li>
        <b>Dimension</b>: A matrix with $m$ rows and $n$ columns is an <b>$m \times n$ matrix</b>
        <ul>
            <li>
                What are $m$ and $n$ for $\v{A}$?
            </li>
        </ul>
    </li>
    <li>
        We refer to an <b>element</b> of a matrix either using subscripts or indexes
        <ul>
            <li>
                $\v{A}_{i,j}$ or $\v{A}[i,j]$ is the element in the $i$th row and $j$th column
            </li>
            <li>
                We will index from 1
                <ul>
                    <li>
                        However, we will sometimes use position 0 for 'technical' purposes
                    </li>
                    <li>
                        And we must be aware that Python numpy arrays and matrices are 0-indexed
                    </li>
                </ul>
             </li>
             <li>
                 So what are $\v{A}_{2,1}$, $\v{A}_{1,2}$, $\v{A}_{0,0}$ and $\v{A}_{3, 2}$?
             </li>
        </ul>
    </li>
</ul>

<h1>Vectors</h1>
<ul>
    <li>A <b>vector</b> is a matrix that has only one column, i.e. an $m \times 1$ matrix</li>
    <li>
        A vector with $m$ rows is called an <b>$m$-dimensional</b> vector
    </li>
    <li>
        In general, we use bold lowercase letters for vectors, e.g.
        $$\v{x} = \cv{2\\4\\3}$$
    </li>
    <li>
        Sometimes this is called a <b>column vector</b>
    </li>
    <li>
        Then, by contrast, a <b>row vector</b> is a matrix that has only one row, i.e. a $1 \times n$ matrix, e.g.
        $$\rv{2, 4, 3}$$
    </li>
    <li>
        Unless stated otherwise, a vector should be assumed to be a column vector.
    </li>
    <li>
        We can refer to an element using a single subscript, again most of the time indexed from 1
        <ul>
            <li>
                So what is $\v{x}_1$?
            </li>
        </ul>
    </li>
</ul>

<h1>Vectors and Matrices in Python</h1>
<ul>
    <li>Of the many ways of representing vectors and matrices in Python, we will use two:
        <ul>
            <li>
                pandas library:
                <ul>
                    <li>for vectors: <code>Series</code>, a kind of one-dimensional array</li>
                    <li>for matrices: <code>DataFrame</code>s, which are tabular data structures of rows and (named) columns</li>
                </ul>
            </li>
            <li>
                numpy library
                <ul>
                    <li>numpy arrays, which can be one-dimensional, two-dimensional, or have more dimensions</li>
                </ul>
                The scikit-learn library expects its data to arrive as numpy arrays
            </li>
        </ul>
    </li>
</ul> 
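<p>
    As a minimal sketch of how the two libraries relate, a pandas <code>Series</code> or <code>DataFrame</code> can be
    converted into a numpy array with its <code>to_numpy</code> method. This is one way to get data into the form that
    scikit-learn expects. The column names used here are hypothetical and chosen only for illustration:
</p>

```python
import numpy as np
import pandas as pd

# A vector as a pandas Series (a one-dimensional array)
s = pd.Series([2, 4, 3])

# A matrix as a pandas DataFrame; the column names "a", "b", "c" are hypothetical
df = pd.DataFrame([[2, 4, 0], [1, 3, 2]], columns=["a", "b", "c"])

# Convert both into numpy arrays
x = s.to_numpy()
A = df.to_numpy()

print(x.shape)  # (3,)
print(A.shape)  # (2, 3)
```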

<h1>Using numpy arrays</h1>

In [4]:
# Vectors
# We will use a numpy 1d array, which we can create from a list
# But, done this way, there is no way for us to distinguish between column- and row-vectors
x = np.array([2, 4, 3])

# Matrices
# We will use a numpy 2d array, which we can create from a list of lists
A = np.array([[2, 4, 0], [1, 3, 2]])

<p>
    We can see their dimensions:
</p>

In [5]:
x.shape

(3,)

In [6]:
A.shape

(2, 3)

<p>
    The shape (3,) says that <code>x</code> is a one-dimensional array: it has a single axis, of length 3.
</p>
<p>
    Using the <code>reshape</code> method, we can turn it into a two-dimensional array with 3 rows and 1 column, i.e. an explicit column vector. Its shape is then (3, 1):
</p>

In [7]:
X = x.reshape((3,1))
X

array([[2],
       [4],
       [3]])

In [8]:
X.shape

(3, 1)
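<p>
    numpy arrays are 0-indexed, so the element written $\v{A}_{i,j}$ in the maths (indexed from 1) is
    <code>A[i-1, j-1]</code> in numpy. The short sketch below shows this. It also shows that reshaping to (1, 3) gives an
    explicit row vector, and that slices select a whole row or column:
</p>

```python
import numpy as np

x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

# A_{2,3} in the maths (row 2, column 3) is A[1, 2] in numpy
print(A[1, 2])   # 2

# x_3 in the maths is x[2] in numpy
print(x[2])      # 3

# One row and three columns: an explicit row vector
R = x.reshape((1, 3))
print(R.shape)   # (1, 3)

# Slices select a whole row or a whole column
print(A[0, :])   # [2 4 0]
print(A[:, 1])   # [4 3]
```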

<h1>Scalar Addition and Scalar Multiplication</h1>
<ul>
    <li>In this context, 'scalar' simply means a number</li>
    <li>Scalar addition and multiplication both work <b>elementwise</b>, i.e.:
        <ul>
            <li>In scalar addition, we add the number to each element in the matrix</li>
            <li>In scalar multiplication, we multiply each element in the matrix by the number</li>
        </ul>
     </li>
     <li>E.g.
        $$\v{A} = 
            \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          2 + \v{A} = 
            \begin{bmatrix}
                4 & 6 & 2 \\
                3 & 5 & 4
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          2\v{A} = 
            \begin{bmatrix}
                4 & 8 & 0 \\
                2 & 6 & 4
            \end{bmatrix}
        $$
    </li>
</ul>

<h1>Scalar Addition and Scalar Multiplication in numpy</h1>
<ul>
    <li>numpy arrays enable operations like these using the normal addition, subtraction, multiplication and division
        operators and without writing for loops
    </li>
</ul>

In [9]:
A = np.array([[2, 4, 0], [1, 3, 2]])

In [10]:
2 + A

array([[4, 6, 2],
       [3, 5, 4]])

In [11]:
2 * A

array([[4, 8, 0],
       [2, 6, 4]])

<ul>
    <li>Other Python arithmetic operators, such as <code>&ast;&ast;</code> (exponentiation), also work elementwise</li>
</ul>

In [12]:
A**2

array([[ 4, 16,  0],
       [ 1,  9,  4]])
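<p>
    To make 'elementwise' concrete, here is a sketch of what <code>2 * A</code> computes, written as a loop over a list of
    lists. The function <code>scalar_multiply</code> is a hypothetical helper for illustration only. numpy does the same
    job without any loop in your code, and scalar subtraction and division work in the same elementwise way:
</p>

```python
import numpy as np

def scalar_multiply(c, rows):
    """Multiply every element of a list-of-lists matrix by the scalar c."""
    return [[c * element for element in row] for row in rows]

A_list = [[2, 4, 0], [1, 3, 2]]
A = np.array(A_list)

print(scalar_multiply(2, A_list))  # [[4, 8, 0], [2, 6, 4]]
print((2 * A).tolist())            # [[4, 8, 0], [2, 6, 4]]

# Scalar subtraction and division are also elementwise
print((A - 1).tolist())            # [[1, 3, -1], [0, 2, 1]]
print((A / 2).tolist())            # [[1.0, 2.0, 0.0], [0.5, 1.5, 1.0]]
```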

<h1>Matrix Addition and Hadamard Product</h1>
<ul>
    <li>Matrix addition and the Hadamard product require two matrices that have <em>the same dimensions</em></li>
    <li>They are also defined elementwise: by adding or multiplying <em>corresponding</em> elements</li>
     <li>E.g.
    $$
        \v{A} = \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{B} = \begin{bmatrix}
                1 & 0 & 5 \\
                2 & 3 & 2
            \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{A}+\v{B} = \begin{bmatrix}
                3 & 4 & 5 \\
                3 & 6 & 4
              \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
        \v{A}*\v{B} = \begin{bmatrix}
                2 & 0 & 0 \\
                2 & 9 & 4
              \end{bmatrix}
    $$
    </li>
    <li>In maths, the Hadamard product is more often written with a circle ($\circ$ or $\odot$), but we will use $\ast$</li>
</ul>

<h1>Matrix Addition and Hadamard Product in numpy</h1>
<ul>
    <li>We don't need to write any loops, just use <code>+</code> and <code>&ast;</code></li>
</ul>

In [13]:
A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[1, 0, 5], [2, 3, 2]])

In [14]:
A + B

array([[3, 4, 5],
       [3, 6, 4]])

In [15]:
A * B

array([[2, 0, 0],
       [2, 9, 4]])
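<p>
    The definition can also be written out with explicit loops over the indices. Doing so makes the same-dimensions
    requirement visible. The function <code>hadamard</code> is a hypothetical helper for illustration, checked here against
    numpy's <code>&ast;</code>:
</p>

```python
import numpy as np

def hadamard(A, B):
    """Elementwise product of two 2d arrays that have the same shape."""
    if A.shape != B.shape:
        raise ValueError("Hadamard product needs matrices of the same dimensions")
    m, n = A.shape
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, j] * B[i, j]  # multiply corresponding elements
    return C

A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[1, 0, 5], [2, 3, 2]])

print(hadamard(A, B).tolist())                # [[2, 0, 0], [2, 9, 4]]
print(np.array_equal(hadamard(A, B), A * B))  # True
```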

<h1>Broadcasting in numpy</h1>
<ul>
    <li>In maths, matrix addition and the Hadamard product require the matrices to have the same dimensions</li>
    <li>But, in numpy, things are less rigid, e.g.:</li>
</ul>

In [16]:
x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

A + x

array([[4, 8, 3],
       [3, 7, 5]])

<ul>
    <li>Conceptually, the smaller array is copied enough times to make its dimensions compatible with the larger array
        <ul>
            <li>But the data isn't <em>literally</em> copied and, in many cases, the result is computed substantially faster
                than by writing your own loops
            </li>
            <li>This is called <b>broadcasting</b></li>
        </ul>
    </li>
</ul>
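<p>
    The 'conceptual copying' can be sketched as follows. <code>np.tile</code> really does build the copies.
    <code>np.broadcast_to</code> instead gives a read-only view of <code>x</code> with the larger shape, without copying
    its data. Adding either of them to <code>A</code> gives the same result as <code>A + x</code>:
</p>

```python
import numpy as np

x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

# Literally copy x into two rows: [[2, 4, 3], [2, 4, 3]]
X_copied = np.tile(x, (2, 1))

# A read-only view of x with shape (2, 3); the data is not copied
X_view = np.broadcast_to(x, A.shape)

print(np.array_equal(A + X_copied, A + x))  # True
print(np.array_equal(A + X_view, A + x))    # True
print((A + x).tolist())                     # [[4, 8, 3], [3, 7, 5]]
```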

<h1>numpy's Rules for Broadcasting</h1>
<ul>
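    <li>The rules for broadcasting are: the shapes of the two arrays are compared dimension by dimension, starting from the
        trailing (rightmost) dimensions. An array with fewer dimensions is treated as if it had extra leading dimensions
        of size 1. Two dimensions are compatible when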
        <ul>
            <li>they are equal, or</li>
            <li>one of them is 1</li>
        </ul>
    </li>
    <li>In the example above, $\v{A}$ had shape (2, 3) and $\v{x}$ had shape (3,): the trailing dimensions are both 3, so they are compatible</li>
    <li>Hence, which of these will work, and which will give errors?
    </li>
</ul>

In [None]:
A = np.ones((5, 4))
x = np.ones(4)

A + x

In [None]:
A = np.ones((5,4))
B = np.ones((5,1))

A + B

In [None]:
A = np.ones((5,4))
x = np.ones(5)

A + x

In [None]:
A = np.ones((5, 4))
x = np.ones(5)
B = x.reshape((5,1))

A + B

In [None]:
A = np.ones((2, 1))
B = np.ones((4, 3))

A * B
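<p>
    The rules can be sketched in a few lines of Python. <code>broadcast_shape</code> is a hypothetical helper, not part of
    numpy. It pads the shorter shape with leading 1s, checks each pair of dimensions, and returns either the resulting
    shape or <code>None</code> if the shapes are incompatible. Recent versions of numpy (1.20 onwards) provide
    <code>np.broadcast_shapes</code> for the same job; it raises a <code>ValueError</code> rather than returning
    <code>None</code>. So as not to give away the answers above, the checks here use different shapes:
</p>

```python
import numpy as np

def broadcast_shape(shape1, shape2):
    """Return the broadcast shape of two array shapes, or None if they are incompatible."""
    # Pad the shorter shape on the left with 1s so both shapes have the same length
    n = max(len(shape1), len(shape2))
    s1 = (1,) * (n - len(shape1)) + tuple(shape1)
    s2 = (1,) * (n - len(shape2)) + tuple(shape2)
    result = []
    # After padding, each pair of dimensions can be checked independently
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            return None  # neither equal nor 1: incompatible
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))             # (2, 3)
print(broadcast_shape((3, 1), (1, 4)))           # (3, 4)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
print(broadcast_shape((3,), (4,)))               # None

# Cross-check against numpy (np.broadcast_shapes needs numpy 1.20 or later)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)
```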

<h1>Matrix Multiplication</h1>
<ul>
    <li>We can compute $\v{A}\v{B}$, the result of multiplying matrices $\v{A}$ and $\v{B}$, provided the number of columns
        of $\v{A}$ equals the number of rows of $\v{B}$
        <ul>
            <li>
                If $\v{A}$ is an $m \times p$ matrix and $\v{B}$ is a $p \times n$ matrix, then we can compute $\v{C} = \v{A}\v{B}$
            </li>
            <li>
                $\v{C}$ will be an $m \times n$ matrix
            </li>
        </ul>
    </li>
    <li>
        $\v{C}_{i,j}$ is obtained by multiplying elements of the $i$th row of $\v{A}$ by corresponding elements
        of the $j$th column of $\v{B}$ and summing:
        $$\v{C}_{i,j} = \sum_{k=1}^p\v{A}_{i,k}\v{B}_{k,j}$$
    </li>
    <li>E.g.
        $$\v{A} = \begin{bmatrix}
                    2 & 4 & 0 \\
                    1 & 3 & 2
                  \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{B} = \begin{bmatrix}
                    3 & 1 & 2\\
                    2 & 3 & 1\\
                    1 & 3 & 3
                    \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
         \v{A}\v{B} = \begin{bmatrix}
                         14 & 14 & 8\\
                         11 & 16 & 11
                      \end{bmatrix}
         $$
    </li>
    <li>Since vectors are just one-column matrices, matrix multiplication can apply to them too, provided the dimensions are OK, e.g.
        $$\v{A} = \begin{bmatrix}
                  2 & 4 & 0 \\
                  1 & 3 & 2
                  \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{x} = \cv{2\\3\\1}\,\,\,\,\,\,\,\,\,\,
          \v{y} = \cv{2\\3}\,\,\,\,\,\,\,\,\,\,
          \v{A}\v{x} = \cv{16\\13}\,\,\,\,\,\,\,\,\,\,
          \v{A}\v{y} \mbox{ is undefined}
        $$
    </li>
</ul> 
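<p>
    The formula for $\v{C}_{i,j}$ translates directly into three nested loops. Below is a minimal sketch, checked against
    the examples above. The function name <code>matmul_loops</code> is hypothetical, for illustration only:
</p>

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply an m x p matrix A by a p x n matrix B using the definition."""
    m, p = A.shape
    p2, n = B.shape
    if p != p2:
        raise ValueError("number of columns of A must equal number of rows of B")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(n):
            # C[i, j] is the sum over k of A[i, k] * B[k, j]
            for k in range(p):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]])
x = np.array([[2], [3], [1]])   # a 3 x 1 column vector
y = np.array([[2], [3]])        # a 2 x 1 column vector

print(matmul_loops(A, B).tolist())  # [[14, 14, 8], [11, 16, 11]]
print(matmul_loops(A, x).tolist())  # [[16], [13]]
# matmul_loops(A, y) raises ValueError: A has 3 columns but y has 2 rows
```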

<h1>Matrix Multiplication in numpy</h1>
<ul>
    <li>numpy offers <code>dot</code> as a function or method for matrix multiplication:</li>
</ul>

In [20]:
A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]])

# Multiplication as a function
# np.dot(A, B)

# Multiplication as a method
A.dot(B)

array([[14, 14,  8],
       [11, 16, 11]])

<ul>
    <li>Remember, matrix multiplication in numpy is done with <code>dot</code>, not &ast;</li>
    <li>Broadcasting does not apply to matrix multiplication, since it's not an elementwise operation</li>
</ul>
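<p>
    From Python 3.5 onwards, numpy arrays also support the <code>@</code> operator for matrix multiplication. The sketch
    below checks that it agrees with <code>dot</code>, then reproduces the matrix-vector example from the maths. With a
    1d array for $\v{x}$, numpy treats it as a column vector in this product and returns a 1d result. Mismatched
    dimensions raise a <code>ValueError</code>:
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])
B = np.array([[3, 1, 2], [2, 3, 1], [1, 3, 3]])
x = np.array([2, 3, 1])
y = np.array([2, 3])

print(np.array_equal(A @ B, A.dot(B)))  # True
print(A.dot(x))                         # [16 13]

# A is 2 x 3 but y has only 2 elements, so Ay is undefined
try:
    A.dot(y)
except ValueError as err:
    print("ValueError:", err)
```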

<h1>Transpose</h1>
<ul>
    <li>The <b>transpose</b> of $m \times n$ matrix $\v{A}$, written $\v{A}^T$, is the $n \times m$ matrix in 
        which the first row of $\v{A}$ becomes the first column of $\v{A}^T$, the second row of $\v{A}$ becomes 
        the second column of $\v{A}^T$, and so on:
        <ul>
            <li>
                $(\v{A}^T)_{i,j} = \v{A}_{j,i}$
            </li>
        </ul>
    </li>
    <li>
        E.g.
        $$\v{A} = \begin{bmatrix}
                2 & 4 & 0 \\
                1 & 3 & 2
              \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
          \v{A}^T = \begin{bmatrix}
                     2 & 1 \\
                     4 & 3 \\
                     0 & 2
                    \end{bmatrix}
        $$
    </li>
    <li>
        As a special case, if $\v{x}$ is an $m$-dimensional column vector ($m \times 1$), then $\v{x}^T$ is an 
        $m$-dimensional row vector ($1 \times m$), e.g.
        $$\v{x} = \cv{2\\4\\3}\,\,\,\,\,\,\,\,\,\, \v{x}^T = \rv{2, 4, 3}$$
    </li>
</ul>

<h1>Transpose in numpy</h1>
<ul>
    <li>numpy arrays offer easy ways to compute their transpose: either the <code>transpose</code> method or 
       the <code>T</code> attribute:
    </li>
</ul>

In [21]:
A = np.array([[2, 4, 0], [1, 3, 2]])

# Transpose as a method
# A.transpose()

# Transpose as an attribute
A.T

array([[2, 1],
       [4, 3],
       [0, 2]])
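<p>
    The sketch below checks a few properties. Transposing twice gives back the original matrix. Transposing a 2d column
    vector of shape (3, 1) gives a row vector of shape (1, 3). Transposing a 1d array, however, leaves its shape
    unchanged, which is another consequence of 1d arrays not distinguishing row vectors from column vectors:
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])
x = np.array([2, 4, 3])
X = x.reshape((3, 1))   # an explicit column vector

print(A.T.shape)                 # (3, 2)
print(np.array_equal(A.T.T, A))  # True
print(A.T[2, 1] == A[1, 2])      # True, since (A^T)_{i,j} = A_{j,i}

print(X.T.shape)   # (1, 3)
print(x.T.shape)   # (3,): transposing a 1d array changes nothing
```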

<h1>Identity Matrices</h1>
<ul>
    <li>The $n \times n$ <b>identity matrix</b>, $\v{I}_n$, contains ones on the main diagonal (from top left to bottom right)
        and zeros everywhere else:
        <ul>
            <li>
                $\v{I}_n[i,i] = 1$ for $i = 1,\ldots,n$ and $\v{I}_n[i,j] = 0$ for $i \neq j$
            </li>
        </ul>
    </li>
    <li>E.g.:
        $$\v{I}_3 = \begin{bmatrix}
                    1 & 0 & 0 \\
                    0 & 1 & 0 \\
                    0 & 0 & 1
                    \end{bmatrix}
        $$
    </li>
    <li>
        If $\v{A}$ is an $m \times n$ matrix, then $\v{A}\v{I}_n = \v{I}_m\v{A} = \v{A}$
    </li>
</ul>

<h1>Identity Matrices in numpy</h1>
<ul>
    <li>Create identity matrices using the <code>identity</code> function:</li>
</ul>

In [22]:
np.identity(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
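<p>
    Here is a quick check of $\v{A}\v{I}_n = \v{I}_m\v{A} = \v{A}$ for the $2 \times 3$ matrix used earlier. The identity
    matrix on the right has to be $3 \times 3$, and the one on the left has to be $2 \times 2$:
</p>

```python
import numpy as np

A = np.array([[2, 4, 0], [1, 3, 2]])   # m = 2, n = 3

I_n = np.identity(3)
I_m = np.identity(2)

print(np.array_equal(A.dot(I_n), A))  # True
print(np.array_equal(I_m.dot(A), A))  # True
```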

<h1>Inverses</h1>
<ul>
    <li>If $\v{A}$ is an $n \times n$ matrix, then its <b>inverse</b>, $\v{A}^{-1}$ (<em>if it has one</em>), is also 
        an $n \times n$ matrix such that $\v{A}\v{A}^{-1} = \v{A}^{-1}\v{A} = \v{I}_n$.
    </li>
    <li>E.g.
        $$\v{A} = \begin{bmatrix}
                    1 &  0 & 2 \\
                    2 & -1 & 3 \\
                    4 &  1 & 8
                 \end{bmatrix}\,\,\,\,\,\,\,\,\,\,
         \v{A}^{-1} = \begin{bmatrix}
                     -11 &  2 &  2 \\
                      -4 &  0 &  1 \\
                       6 & -1 & -1
                      \end{bmatrix}
        $$
    </li>
    <li>
        Some $n \times n$ matrices do not have inverses, e.g.
        $$\begin{bmatrix}
            1 & 1 & 1 \\
            1 & 1 & 1 \\
            1 & 1 & 1
           \end{bmatrix}$$
        In these cases, you can compute a <b>pseudo-inverse</b> (which exists for every matrix, even a non-square one), which
        you can use for <em>some</em> of the same purposes instead
    </li>
</ul>

<h1>Inverses in numpy</h1>
<ul>
    <li>numpy.linalg offers the function <code>inv</code> for computing inverses, and also the function 
        <code>pinv</code> for computing the Moore-Penrose pseudo-inverse:
    </li>
</ul>

In [23]:
A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]])

npla.inv(A)

array([[-11.,   2.,   2.],
       [ -4.,  -0.,   1.],
       [  6.,  -1.,  -1.]])

In [24]:
npla.pinv(A)

array([[ -1.10000000e+01,   2.00000000e+00,   2.00000000e+00],
       [ -4.00000000e+00,   1.42108547e-14,   1.00000000e+00],
       [  6.00000000e+00,  -1.00000000e+00,  -1.00000000e+00]])

In [25]:
B = np.ones((3,3))

npla.inv(B) # raises an exception

LinAlgError: Singular matrix

In [26]:
npla.pinv(B)

array([[ 0.11111111,  0.11111111,  0.11111111],
       [ 0.11111111,  0.11111111,  0.11111111],
       [ 0.11111111,  0.11111111,  0.11111111]])
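<p>
    These computations use floating-point arithmetic, so results such as the <code>1.42108547e-14</code> above are only
    approximately equal to the exact answers. The sketch below therefore checks the defining properties with
    <code>np.allclose</code> rather than exact equality. For the singular matrix $\v{B}$, the pseudo-inverse $\v{B}^+$ cannot
    satisfy $\v{B}\v{B}^+ = \v{I}_3$. It does, however, satisfy the Moore-Penrose condition $\v{B}\v{B}^+\v{B} = \v{B}$.
    The last lines show that <code>pinv</code> also accepts a non-square matrix:
</p>

```python
import numpy as np
import numpy.linalg as npla

A = np.array([[1, 0, 2], [2, -1, 3], [4, 1, 8]])
A_inv = npla.inv(A)

# The inverse works on both sides: A A^-1 = A^-1 A = I
print(np.allclose(A.dot(A_inv), np.identity(3)))  # True
print(np.allclose(A_inv.dot(A), np.identity(3)))  # True

B = np.ones((3, 3))
B_pinv = npla.pinv(B)

print(np.allclose(B.dot(B_pinv), np.identity(3)))  # False
print(np.allclose(B.dot(B_pinv).dot(B), B))        # True

# A 2 x 3 matrix has a 3 x 2 pseudo-inverse
C = np.array([[2, 4, 0], [1, 3, 2]])
print(npla.pinv(C).shape)  # (3, 2)
```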

<h1>Some numpy Methods</h1>
<ul>
    <li>numpy offers methods for calculations that, in other languages, would require you to write loops</li>
    <li>E.g. <code>sum</code>, <code>mean</code>, <code>min</code>, <code>max</code>, <code>argmin</code>, <code>argmax</code>,
        &hellip;
    </li>
</ul>

In [27]:
x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

In [28]:
x.sum()

9

In [29]:
A.sum()

12
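<p>
    The other methods work in the same way. The sketch below also uses the optional <code>axis</code> argument, which gives
    one total per column (<code>axis=0</code>) or one per row (<code>axis=1</code>) instead of a single total. Note that,
    without an <code>axis</code>, <code>argmax</code> on a 2d array returns a position in the flattened array:
</p>

```python
import numpy as np

x = np.array([2, 4, 3])
A = np.array([[2, 4, 0], [1, 3, 2]])

print(x.mean())                # 3.0
print(x.min(), x.max())        # 2 4
print(x.argmin(), x.argmax())  # 0 1

print(A.sum(axis=0))  # [3 7 2]: one total per column
print(A.sum(axis=1))  # [6 6]: one total per row
print(A.argmax())     # 1: the position of 4 in the flattened array [2, 4, 0, 1, 3, 2]
```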

<h1>Some numpy Universal Functions</h1>
<ul>
    <li>Consider a function such as <code>sqrt</code></li>
    <li>In Python, <code>sqrt</code> (from the <code>math</code> library) takes a single number; it can't take a
        list of numbers
    </li>
</ul>

In [30]:
sqrt(9)

3.0

In [31]:
sqrt([1, 4, 9])

TypeError: a float is required

<ul>
    <li>But, the corresponding numpy function can apply to arrays</li>
</ul>

In [32]:
np.sqrt(9)

3.0

In [33]:
x = np.array([1, 4, 9])

np.sqrt(x)

array([ 1.,  2.,  3.])

In [34]:
A = np.array([[1, 4, 9], [16, 25, 36], [49, 64, 81]])

np.sqrt(A)

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])

<ul>
    <li>The function is applied elementwise</li>
    <li>In numpy, these are called 'universal functions' (or 'ufuncs')</li>
    <li>Others include: <code>abs</code>, <code>exp</code>, <code>log10</code>, &hellip;</li>
</ul>
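<p>
    The sketch below contrasts the loop you would otherwise need for <code>math.sqrt</code> (here, a list comprehension)
    with the corresponding ufunc. It then applies two of the other ufuncs named above, each of which also works
    elementwise:
</p>

```python
import numpy as np
from math import sqrt

values = [1, 4, 9]

# math.sqrt takes one number, so a list needs an explicit loop
print([sqrt(v) for v in values])          # [1.0, 2.0, 3.0]

# np.sqrt, a ufunc, is applied to every element of the array
print(np.sqrt(np.array(values)))          # [1. 2. 3.]

# Other ufuncs are also applied elementwise
print(np.abs(np.array([-2, 0, 3])))       # [2 0 3]
print(np.log10(np.array([1, 10, 100])))   # [0. 1. 2.]
```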

<h1>Vectorization</h1>
<ul>
    <li>Algorithms that might otherwise need for-loops and indexing can often be written much more succinctly by expressing them
            in terms of operators, methods and functions that work on entire arrays
    </li>
    <li>More than this, if your programming language has efficient implementations of
        these operators, methods and functions, the resulting programs can run much faster too
        <ul>
            <li>numpy's operators, methods and functions, for example, are typically one or more orders of magnitude faster 
                than their pure Python equivalents, written using loops and indexing
            </li>
        </ul>
        So, wherever you can, avoid loops and indexing!
    </li>
    <li>Using fast array operators, methods and functions in this way is known as <b>vectorization</b>
    </li>
</ul>
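<p>
    As a sketch of vectorization, consider the Euclidean distance between two vectors, computed once with a loop and once
    with whole-array operations. Both function names are hypothetical, for illustration only. <code>np.linalg.norm</code>
    gives the same answer:
</p>

```python
import numpy as np
from math import sqrt

def euclidean_loop(u, v):
    """Euclidean distance between two equal-length sequences, using a loop and indexing."""
    total = 0.0
    for i in range(len(u)):
        total += (u[i] - v[i]) ** 2
    return sqrt(total)

def euclidean_vectorized(u, v):
    """The same calculation using elementwise array operations, with no loop."""
    return np.sqrt(np.sum((u - v) ** 2))

u = np.array([2.0, 4.0, 3.0])
v = np.array([1.0, 2.0, 1.0])

print(euclidean_loop(u, v))        # 3.0
print(euclidean_vectorized(u, v))  # 3.0
print(np.linalg.norm(u - v))       # 3.0
```

The vectorized version is shorter. For long arrays, it should also be much faster, because the loop runs inside numpy
rather than in Python.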

In [None]:
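def loop_sum(L):
    # Add up the numbers in L one at a time (named loop_sum so as not to shadow Python's built-in sum)
    total = 0.0
    for x in L:
        total += x
    return total

In [None]:
x = list(range(1, 101))

%timeit loop_sum(x)

In [None]:
x = np.arange(1, 101)

%timeit np.sum(x)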

<p>
    (By default, <code>%timeit</code> chooses how many times to run your code in a loop, so that each timing lasts long
    enough to measure reliably. It then repeats this loop several times and reports a summary: older versions of IPython
    report the best of three repeats, and newer versions report the mean and standard deviation of seven. It does this to
    try to make its measurements robust when other things are happening on your machine at the same time.)
</p>