# Linear Algebra

**(C) 2018 by [Damir Cavar](http://damir.cavar.me/)**

**Version:** 1.0, January 2018

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

This is a tutorial related to the L665 course on Machine Learning for NLP, Fall 2018 at Indiana University.

The following material is based on *Linear Algebra Review and Reference* by Zico Kolter (updated by Chuong Do) from September 30, 2015. See also James E. Gentle (2017) [Matrix Algebra: Theory, Computations and Applications in Statistics](http://www.springer.com/us/book/9780387708720). Second edition. Springer. Another good resource is Philip N. Klein (2013) [Coding the Matrix: Linear Algebra through Applications to Computer Science](http://codingthematrix.com/), Newtonian Press.

# Basic Concepts and Notation

Consider the following system of equations:

$\begin{equation}
\begin{split}
4 x_1 - 5 x_2 & = -13 \\
 -2x_1 + 3 x_2 & = 9
\end{split}
\end{equation}$

We are looking for a unique solution for the two variables $x_1$ and $x_2$. The system can be written compactly as:

$Ax = b$

with the matrix $A$ and the vector $b$ defined as:

$A = \begin{bmatrix}
       4  & -5 \\[0.3em]
       -2 &  3 
     \end{bmatrix},\ 
 b = \begin{bmatrix}
       -13 \\[0.3em]
       9 
     \end{bmatrix}$ .
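
As a minimal sketch, the system above can be solved numerically with *numpy*. The variable name `solution` is chosen here only for illustration; `np.linalg.solve` expects a square, invertible coefficient matrix, which holds for this $A$:

```python
import numpy as np

# Coefficient matrix A and right-hand side b of the system above
A = np.array([[4.0, -5.0],
              [-2.0, 3.0]])
b = np.array([-13.0, 9.0])

# np.linalg.solve finds the vector that satisfies A @ solution == b
solution = np.linalg.solve(A, b)
print(solution)  # approximately [3. 5.], i.e. x_1 = 3 and x_2 = 5

# Sanity check: plugging the solution back in reproduces b
print(np.allclose(A @ solution, b))
```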

A **scalar** is an element in a vector, containing a real number **value**. In a vector space model or a vector mapping of (symbolic, qualitative, or quantitative) properties the scalar holds the concrete value or property of a variable.

A **vector** is an array, tuple, or ordered list of scalars (or elements) of size $n$, with $n$ a positive integer. The **length** of the vector, that is the number of scalars in the vector, is also called the **order** of the vector.

**Vectorization** is the process of creating a vector from some data using some process.
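
One simple way to vectorize text, sketched here under the assumption of a tiny hand-picked vocabulary, is to count how often each vocabulary word occurs in a sentence. The vocabulary, the sentence, and the name `bow_vector` are hypothetical and serve only as an illustration:

```python
from collections import Counter

# Hypothetical toy vocabulary and sentence, chosen only for illustration
vocabulary = ["cat", "dog", "sat", "the"]
tokens = "the cat sat on the mat".split()

# Count each token, then read the counts off in vocabulary order
counts = Counter(tokens)
bow_vector = [counts[word] for word in vocabulary]
print(bow_vector)  # [1, 0, 1, 2]
```

Words outside the vocabulary ("on", "mat") are simply ignored in this sketch.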

Vectors of the length $n$ could be treated like points in $n$-dimensional space. One can calculate the distance between such points using measures like [Euclidean Distance](https://en.wikipedia.org/wiki/Euclidean_distance). The similarity of vectors could also be calculated using [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
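
Both measures can be sketched in a few lines of *numpy*. The vectors `u` and `v` are example values, not data from the course:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([5.0, 6.0, 7.0, 8.0])

# Euclidean distance: square root of the summed squared differences
euclidean = np.sqrt(np.sum((u - v) ** 2))

# Cosine similarity: dot product divided by the product of the vector norms
cosine = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(euclidean)  # 8.0
print(cosine)     # close to 1, since u and v point in similar directions
```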

## Notation

A **matrix** is a list of vectors that all are of the same length. $A$ is a matrix with $m$ rows and $n$ columns; the entries of $A$ are real numbers:

$A \in \mathbb{R}^{m \times n}$

A vector $x$ with $n$ entries of real numbers could also be thought of as a matrix with $n$ rows and $1$ column, also known as a **column vector**.

$x = \begin{bmatrix}
       x_1 \\[0.3em]
       x_2 \\[0.3em]
       \vdots \\[0.3em]
       x_n
     \end{bmatrix}$

To represent a **row vector**, that is a matrix with $1$ row and $n$ columns, we write $x^T$ (this denotes the transpose of $x$).

$x^T = \begin{bmatrix}
       x_1 & x_2 & \cdots & x_n
     \end{bmatrix}$

We use the notation $a_{ij}$ (or $A_{ij}$, $A_{i,j}$, etc.) to denote the entry of $A$ in the $i$th row and
$j$th column:

$A = \begin{bmatrix}
       a_{11} & a_{12} & \cdots & a_{1n} \\[0.3em]
       a_{21} & a_{22} & \cdots & a_{2n} \\[0.3em]
       \vdots & \vdots & \ddots & \vdots \\[0.3em]
       a_{m1} & a_{m2} & \cdots & a_{mn} 
     \end{bmatrix}$

We denote the $j$th column of $A$ by $a_j$ or $A_{:,j}$:

$A = \begin{bmatrix}
       \big| & \big| &  & \big| \\[0.3em]
       a_{1} & a_{2} & \cdots & a_{n} \\[0.3em]
       \big| & \big| &  & \big|  
     \end{bmatrix}$

We denote the $i$th row of $A$ by $a_i^T$ or $A_{i,:}$:

$A = \begin{bmatrix}
      -- & a_1^T  & -- \\[0.3em]
       -- & a_2^T  & -- \\[0.3em]
          & \vdots &  \\[0.3em]
       -- & a_m^T  & -- 
     \end{bmatrix}$

An $n \times m$ matrix is a two-dimensional array with $n$ rows and $m$ columns.
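
The indexing notation maps closely onto *numpy* slicing, with the caveat that *numpy* counts rows and columns from $0$ rather than $1$. A small sketch with an example matrix `M`:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 rows, 3 columns

print(M.shape)   # (2, 3)
print(M[0, 1])   # entry in row 1, column 2 of the math notation -> 2
print(M[:, 1])   # second column -> [2 5]
print(M[1, :])   # second row -> [4 5 6]
```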

## Matrix Multiplication

The result of the multiplication of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix:

$C = AB \in \mathbb{R}^{m \times p}$

That is, we are multiplying the rows of $A$ with the columns of $B$:

$C_{ij}=\sum_{k=1}^n{A_{ik}B_{kj}}$

The number of columns in $A$ must be equal to the number of rows in $B$.
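
To make the summation concrete, here is a sketch that computes each entry of the product as the sum over $k$ of entry $(i, k)$ of $A$ times entry $(k, j)$ of $B$, and compares the result with the built-in `@` operator. The helper name `matmul_loops` and the example matrices are made up for this illustration:

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply A (m x n) by B (n x p) with explicit loops."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("columns of A must equal rows of B")
    C = np.zeros((m, p))
    for i in range(m):          # rows of A
        for j in range(p):      # columns of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # shape (3, 2)

C = matmul_loops(A, B)
print(C)                         # a 2 x 2 matrix
print(np.array_equal(C, A @ B))  # True
```

In practice one would use `A @ B` or `np.dot(A, B)` directly; the loops are only meant to mirror the formula.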

### Vector-Vector Products

#### Inner or Dot Product of Two Vectors

For two vectors $x, y \in \mathbb{R}^n$, the **inner product** or **dot product** $x^T y$ is a real number:

$x^T y \in \mathbb{R} = \begin{bmatrix}
       x_1 & x_2 & \cdots & x_n
     \end{bmatrix} \begin{bmatrix}
       y_1 \\[0.3em]
       y_2 \\[0.3em]
       \vdots \\[0.3em]
       y_n
     \end{bmatrix} = \sum_{i=1}^{n}{x_i y_i}$

The **inner product** is a special case of matrix multiplication.

It is always the case that $x^T y = y^T x$.

##### Example

To calculate the inner product of two vectors $x = [1 2 3 4]$ and $y = [5 6 7 8]$, we can loop through the vector and multiply and sum the scalars (this is simplified code):

In [7]:
x = (1, 2, 3, 4)
y = (5, 6, 7, 8)
n = len(x)
if n == len(y):
    result = 0
    for i in range(n):
        result += x[i] * y[i]
    print(result)

70


It is clear that in the code above we could change line 7 to `result += y[i] * x[i]` without affecting the result.

We can use the *numpy* module to apply the same operation, to calculate the **inner product**. We import the *numpy* module and assign it a name *np* for the following code:

In [3]:
import numpy as np

We define the vectors $x$ and $y$ using *numpy*:

In [37]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
print("x:", x)
print("y:", y)

x: [1 2 3 4]
y: [5 6 7 8]


We can now calculate the **dot** or **inner product** using the *dot* function of *numpy*:

In [28]:
np.dot(x, y)

70

The order of the arguments is irrelevant:

In [29]:
np.dot(y, x)

70

Note that we think of both vectors as **row vectors** in the above code. We can turn them into column vectors by assigning a new value to the *shape* attribute:

In [38]:
print("x:", x)
x.shape = (4, 1)
print("xT:", x)
print("y:", y)
y.shape = (4, 1)
print("yT:", y)

x: [1 2 3 4]
xT: [[1]
 [2]
 [3]
 [4]]
y: [5 6 7 8]
yT: [[5]
 [6]
 [7]
 [8]]


In fact, in our understanding of Linear Algebra, we take the arrays above to represent **row vectors**. *Numpy* treats them differently: a one-dimensional array has no row or column orientation.

We see the issues when we try to transform the array objects. Usually, we can transform a row vector into a column vector in *numpy* by using the *T* attribute of vector or matrix objects:

In [45]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
print("x:", x)
print("y:", y)
print("xT:", x.T)
print("yT:", y.T)

x: [1 2 3 4]
y: [5 6 7 8]
xT: [1 2 3 4]
yT: [5 6 7 8]


The problem here is that this does not do what we expect it to do. It only works if we declare the variables not as one-dimensional arrays of numbers, but as matrices, that is, as two-dimensional arrays:

In [46]:
x = np.array([[1, 2, 3, 4]])
y = np.array([[5, 6, 7, 8]])
print("x:", x)
print("y:", y)
print("xT:", x.T)
print("yT:", y.T)


x: [[1 2 3 4]]
y: [[5 6 7 8]]
xT: [[1]
 [2]
 [3]
 [4]]
yT: [[5]
 [6]
 [7]
 [8]]


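Note that the *numpy* function *outer* is not affected by this distinction, since it flattens its arguments. The *dot* function, however, performs matrix multiplication on two-dimensional arrays, so the shapes have to match. We can compute the dot product as in the mathematical equation above, using the new $x$ and $y$ row vectors and transposing $y$: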

In [52]:
print("x:", x)
print("y:", y.T)
np.dot(x, y.T)

x: [[1 2 3 4]]
y: [[5]
 [6]
 [7]
 [8]]


array([[70]])

Or by reversing the order of the arguments:

In [58]:
print("x:", x.T)
print("y:", y)
np.dot(y, x.T)

x: [[1]
 [2]
 [3]
 [4]]
y: [[5 6 7 8]]


array([[70]])

To read the result from this array of arrays, we would need to access the value this way:

In [59]:
np.dot(y, x.T)[0][0]

70
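
The following sketch summarizes how the result of `np.dot` depends on the shapes of its arguments. The names `a1`, `b1`, `a2`, and `b2` are chosen here to avoid overwriting the vectors used above:

```python
import numpy as np

a1 = np.array([1, 2, 3, 4])        # 1-D, shape (4,)
b1 = np.array([5, 6, 7, 8])
a2 = a1.reshape(1, 4)              # 2-D row vector, shape (1, 4)
b2 = b1.reshape(1, 4)

print(np.dot(a1, b1))              # 70, a plain scalar
print(np.dot(a2, b2.T).shape)      # (1, 1)
print(np.dot(a2, b2.T).item())     # 70, extracted as a Python int

try:
    np.dot(a2, b2)                 # (1, 4) times (1, 4): inner dimensions differ
except ValueError as err:
    print("shape mismatch:", err)
```

The `item()` method is an alternative to indexing with `[0][0]` when the array holds exactly one element.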

#### Outer Product of Two Vectors

For two vectors $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$, where $n$ and $m$ do not have to be equal, the **outer product** of $x$ and $y$ is:

$xy^T \in \mathbb{R}^{m\times n}$

The **outer product** results in a matrix with $m$ rows and $n$ columns by $(xy^T)_{ij} = x_i y_j$:

$xy^T \in \mathbb{R}^{m\times n} = \begin{bmatrix}
       x_1 \\[0.3em]
       x_2 \\[0.3em]
       \vdots \\[0.3em]
       x_m
     \end{bmatrix} \begin{bmatrix}
       y_1 & y_2 & \cdots & y_n
     \end{bmatrix} = \begin{bmatrix}
       x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\[0.3em]
       x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\[0.3em]
       \vdots  & \vdots  & \ddots & \vdots \\[0.3em]
       x_m y_1 & x_m y_2 & \cdots & x_m y_n \\[0.3em]
     \end{bmatrix}$

A useful property of the outer product: assume $\mathbf{1} \in \mathbb{R}^n$ is an $n$-dimensional vector of scalars with the value $1$. Given a matrix $A \in \mathbb{R}^{m\times n}$ with all columns equal to some vector $x \in \mathbb{R}^m$, using the outer product $A$ can be represented as:

$A = \begin{bmatrix}
       \big| & \big| &  & \big| \\[0.3em]
       x & x & \cdots & x \\[0.3em]
       \big| & \big| &  & \big|  
     \end{bmatrix} = \begin{bmatrix}
       x_1    & x_1    & \cdots & x_1    \\[0.3em]
       x_2    & x_2    & \cdots & x_2    \\[0.3em]
       \vdots & \vdots & \ddots & \vdots \\[0.3em]
       x_m    &x_m     & \cdots & x_m
     \end{bmatrix} = \begin{bmatrix}
       x_1 \\[0.3em]
       x_2 \\[0.3em]
       \vdots \\[0.3em]
       x_m
     \end{bmatrix} \begin{bmatrix}
       1 & 1 & \cdots & 1
     \end{bmatrix} = x \mathbf{1}^T$
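
This property can be checked with a short sketch, assuming $m = 3$ and $n = 4$ as example sizes. The names `col`, `ones`, and `repeated` are illustrative only:

```python
import numpy as np

col = np.array([1, 2, 3])          # x in R^3 (m = 3)
ones = np.ones(4, dtype=int)       # the all-ones vector in R^4 (n = 4)

repeated = np.outer(col, ones)     # x 1^T, shape (3, 4)
print(repeated)

# Every column of the result equals the original vector
print(all((repeated[:, j] == col).all() for j in range(4)))
```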

##### Example

If we want to compute the outer product of two vectors $x$ and $y$ that are stored as row vectors, we need to transpose $x$ to a column vector first. This can be achieved with the *reshape* function in *numpy*, the *T* attribute, or the *transpose()* method. The *reshape* function takes a tuple that gives the number of rows and columns of the resulting array:

In [66]:
x = np.array([[1, 2, 3, 4]])
print("x:", x)
print("xT:", np.reshape(x, (4, 1)))
print("xT:", x.T)
print("xT:", x.transpose())

x: [[1 2 3 4]]
xT: [[1]
 [2]
 [3]
 [4]]
xT: [[1]
 [2]
 [3]
 [4]]
xT: [[1]
 [2]
 [3]
 [4]]


We can now compute the **outer product** by multiplying the column vector $x$ with the row vector $y$:

In [69]:
x = np.array([[1, 2, 3, 4]])
y = np.array([[5, 6, 7, 8]])
x.T * y

array([[ 5,  6,  7,  8],
       [10, 12, 14, 16],
       [15, 18, 21, 24],
       [20, 24, 28, 32]])

*Numpy* provides an *outer* function that does all that:

In [70]:
np.outer(x, y)

array([[ 5,  6,  7,  8],
       [10, 12, 14, 16],
       [15, 18, 21, 24],
       [20, 24, 28, 32]])

Note that using plain one-dimensional arrays for the vectors does not affect the result of the *outer* function:

In [73]:
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
np.outer(x, y)

array([[ 5,  6,  7,  8],
       [10, 12, 14, 16],
       [15, 18, 21, 24],
       [20, 24, 28, 32]])
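
A note on the `*` operator used above: in *numpy* it multiplies elementwise and relies on broadcasting, which for a $(4, 1)$ and a $(1, 4)$ array happens to give the outer product. The matrix product itself is written with `@`. The sketch below, with the illustrative names `x_row` and `y_row`, compares the two:

```python
import numpy as np

x_row = np.array([[1, 2, 3, 4]])   # shape (1, 4)
y_row = np.array([[5, 6, 7, 8]])   # shape (1, 4)

via_broadcast = x_row.T * y_row    # elementwise product, broadcast to (4, 4)
via_matmul = x_row.T @ y_row       # matrix product of (4, 1) and (1, 4)

print(np.array_equal(via_broadcast, via_matmul))  # True
```

For other shapes the two operators generally differ, so `@` or `np.outer` states the intent more clearly.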

### Matrix-Vector Products

Given a matrix $A \in \mathbb{R}^{m\times n}$ and a vector $x \in \mathbb{R}^n$, the product is a vector $y = Ax \in \mathbb{R}^m$.

Each entry $y_i$ of $Ax$ is the dot product of row $i$ of matrix $A$ with the vector $x$. Let us first consider the multiplication of a matrix with a scalar:

$A = \begin{bmatrix}
       1 & 2 \\[0.3em]
       3 & 4
     \end{bmatrix}$

We can compute the product of $A$ with a scalar $n = 2$ as:

$nA = \begin{bmatrix}
       1 * n & 2 * n \\[0.3em]
       3 * n & 4 * n
     \end{bmatrix} = \begin{bmatrix}
       1 * 2 & 2 * 2 \\[0.3em]
       3 * 2 & 4 * 2
     \end{bmatrix} = \begin{bmatrix}
       2 & 4 \\[0.3em]
       6 & 8
     \end{bmatrix} $
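
In *numpy*, multiplying an array by a scalar scales every entry. A minimal sketch, using the names `S` and `scale` to avoid clashing with other variables:

```python
import numpy as np

S = np.array([[1, 2],
              [3, 4]])
scale = 2

# Scalar multiplication is applied to each entry of the matrix
print(scale * S)
```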

Assume that we have a column vector $x$:

$x = \begin{bmatrix}
       1 \\[0.3em]
       2 \\[0.3em]
       3 
     \end{bmatrix}$

To be able to multiply this vector with a matrix, the number of columns in the matrix must correspond to the number of rows in the column vector. The matrix $A$ must have $3$ columns, as for example: 

$A = \begin{bmatrix}
       4 & 5 & 6\\[0.3em]
       7 & 8 & 9
     \end{bmatrix}$

To compute $Ax$, we multiply row $1$ of the matrix with column $1$ of $x$:

$\begin{bmatrix}
  4 & 5 & 6
 \end{bmatrix}
 \begin{bmatrix}
 1 \\[0.3em]
 2 \\[0.3em]
 3 
\end{bmatrix} = 4 * 1 + 5 * 2 + 6 * 3 = 32 $

We then compute the dot product of row $2$ of $A$ and column $1$ of $x$:

$\begin{bmatrix}
  7 & 8 & 9
 \end{bmatrix}
 \begin{bmatrix}
 1 \\[0.3em]
 2 \\[0.3em]
 3 
\end{bmatrix} = 7 * 1 + 8 * 2 + 9 * 3 = 50 $

The resulting column vector $Ax$ is:

$Ax = \begin{bmatrix}
       32 \\[0.3em]
       50 
     \end{bmatrix}$

Using *numpy* we can compute $Ax$:

In [91]:
A = np.array([[4, 5, 6],
             [7, 8, 9]])
x = np.array([1, 2, 3])
A.dot(x)

array([32, 50])

We can thus describe the product writing $A$ by rows as:

$y = Ax = \begin{bmatrix}
 -- & a_1^T  & -- \\[0.3em]
 -- & a_2^T  & -- \\[0.3em]
    & \vdots &  \\[0.3em]
 -- & a_m^T  & -- 
\end{bmatrix} x = \begin{bmatrix}
 a_1^T x \\[0.3em]
 a_2^T x \\[0.3em]
 \vdots \\[0.3em]
 a_m^T x 
\end{bmatrix}$

This means that the $i$th scalar of $y$ is the inner product of the $i$th row of $A$ and $x$, that is $y_i = a_i^T x$.

If we write $A$ in column form, then:

$y = Ax =
\begin{bmatrix}
 \big| & \big| &  & \big| \\[0.3em]
 a_1 & a_2 & \cdots & a_n \\[0.3em]
 \big| & \big| &  & \big|  
\end{bmatrix}
\begin{bmatrix}
 x_1 \\[0.3em]
 x_2 \\[0.3em]
 \vdots \\[0.3em]
 x_n
\end{bmatrix} =
\begin{bmatrix}
 a_1
\end{bmatrix} x_1 + 
\begin{bmatrix}
 a_2
\end{bmatrix} x_2 + \dots +
\begin{bmatrix}
 a_n
\end{bmatrix} x_n
$

In this case $y$ is a **[linear combination](https://en.wikipedia.org/wiki/Linear_combination)** of the *columns* of $A$, the coefficients taken from $x$.
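
Both readings of $Ax$, by rows and by columns, can be verified with a short sketch that reuses the example matrix and vector from above. The names `y_rows` and `y_cols` are illustrative:

```python
import numpy as np

A = np.array([[4, 5, 6],
              [7, 8, 9]])
x = np.array([1, 2, 3])

# Row view: entry i of y is the inner product of row i of A with x
y_rows = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])

# Column view: y is the sum of the columns of A, each weighted by x_j
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(y_rows)   # [32 50]
print(y_cols)   # [32 50]
```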

## Tensors

A [**tensor**](https://en.wikipedia.org/wiki/Tensor) could be thought of as an organized multidimensional array of numerical values. A vector could be regarded as a special case of a tensor. Rows of tensors extend along the y-axis, columns along the x-axis. The **rank** of a scalar is 0, the rank of a **vector** is 1, the rank of a **matrix** is 2, and a **tensor** in the narrower sense has rank 3 or higher.
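
In *numpy*, the rank in this sense corresponds to the number of axes of an array, available as the `ndim` attribute. Note that this is different from the linear-algebra rank of a matrix. A small sketch with example arrays:

```python
import numpy as np

scalar = np.array(5)                 # rank 0
vector = np.array([1, 2, 3])         # rank 1
matrix = np.array([[1, 2], [3, 4]])  # rank 2
tensor = np.zeros((2, 3, 4))         # rank 3

# Print the number of axes and the shape of each array
for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, arr.ndim, arr.shape)
```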