# Linear Algebra

**(C) 2018 by [Damir Cavar](http://damir.cavar.me/)**

**Version:** 1.0, January 2018

**License:** [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/) ([CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/))

This is a tutorial related to the L665 course on Machine Learning for NLP, Fall 2018 at Indiana University.

The following material is based on *Linear Algebra Review and Reference* by Zico Kolter (updated by Chuong Do) from September 30, 2015. See also James E. Gentle (2017) [Matrix Algebra: Theory, Computations and Applications in Statistics](http://www.springer.com/us/book/9780387708720). Second edition. Springer. Another good resource is Philip N. Klein (2013) [Coding the Matrix: Linear Algebra through Applications to Computer Science](http://codingthematrix.com/), Newtonian Press.

# Basic Concepts and Notation

Consider the following system of equations:

$\begin{equation}
\begin{split}
4 x_1 - 5 x_2 & = -13 \\
 -2x_1 + 3 x_2 & = 9
\end{split}
\end{equation}$

We are looking for a unique solution for the two variables $x_1$ and $x_2$. The system can be written compactly in matrix form as:

$Ax = b$

where $x$ is the column vector of the unknowns $x_1$ and $x_2$, and $A$ and $b$ are:

$A = \begin{bmatrix}
       4  & -5 \\[0.3em]
       -2 &  3 
     \end{bmatrix},\ 
 b = \begin{bmatrix}
       -13 \\[0.3em]
       9 
     \end{bmatrix}$ .
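To make the example concrete, the following minimal sketch solves this particular system in plain Python. It uses Cramer's rule for the $2 \times 2$ case, which is a choice made here for illustration and not a method prescribed by the sources above. The helper name `solve_2x2` is hypothetical, and the code assumes integer entries so that `Fraction` can keep the arithmetic exact.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve A x = b for a 2x2 integer matrix A using Cramer's rule."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    det = a11 * a22 - a12 * a21
    if det == 0:
        # A zero determinant means there is no unique solution.
        raise ValueError("A is singular: no unique solution exists")
    x1 = Fraction(b1 * a22 - a12 * b2, det)
    x2 = Fraction(a11 * b2 - a21 * b1, det)
    return [x1, x2]

A = [[4, -5],
     [-2, 3]]
b = [-13, 9]

x = solve_2x2(A, b)
print(*x)  # 3 5
```

Substituting $x_1 = 3$ and $x_2 = 5$ back into both equations confirms the result. For larger systems one would typically reach for a numerical library routine such as `numpy.linalg.solve` instead.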

A **scalar** is a single real number **value**, for example an element of a vector. In a vector space model, or in a vector mapping of (symbolic, qualitative, or quantitative) properties, the scalar holds the concrete value or property of a variable.

A **vector** is an array, tuple, or ordered list of scalars (or elements) of size $n$, with $n$ a positive integer. The **length** of the vector, that is the number of scalars in the vector, is also called the **order** of the vector.

**Vectorization** is the process of mapping some data to a vector by a well-defined procedure, for example representing a text by the counts of selected words in it.
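As one possible illustration of vectorization, the sketch below turns a short text into a vector of word counts. The function name `vectorize`, the whitespace tokenization, and the small vocabulary are assumptions made for this example only; real NLP pipelines usually tokenize more carefully.

```python
from collections import Counter

def vectorize(text, vocabulary):
    """Map a text to a vector of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary: its order fixes the meaning of each vector position.
vocabulary = ["the", "cat", "dog", "sat"]
vector = vectorize("The cat sat near the other cat", vocabulary)
print(vector)  # [2, 2, 0, 1]
```

The resulting vector has the same length as the vocabulary, so texts of different sizes are mapped to vectors of the same order.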

Vectors of length $n$ can be treated as points in $n$-dimensional space. One can calculate the distance between such points using measures like [Euclidean Distance](https://en.wikipedia.org/wiki/Euclidean_distance). The similarity of vectors can also be calculated using [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity).
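Both measures follow directly from their standard definitions. The sketch below is a plain Python version; the function names are chosen here for illustration, and the code assumes two vectors of equal length, with non-zero vectors for the cosine.

```python
import math

def euclidean_distance(u, v):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # undefined if either vector is all zeros

u = [1.0, 2.0, 2.0]
v = [4.0, 6.0, 2.0]
print(euclidean_distance(u, v))  # 5.0
print(cosine_similarity(u, v))
```

Note that the Euclidean distance depends on the magnitude of the vectors, while the cosine similarity depends only on the angle between them: scaling `u` by a positive factor changes the former but not the latter.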

## Notation

A **matrix** is a list of vectors that are all of the same length. $A$ is a matrix with $m$ rows and $n$ columns whose entries are real numbers:

$A \in \mathbb{R}^{m \times n}$

A vector $x \in \mathbb{R}^n$ with $n$ real-valued entries can also be thought of as a matrix with $n$ rows and $1$ column, also known as a **column vector**. To represent a **row vector**, that is a matrix with $1$ row and $n$ columns, we write $x^T$ (this denotes the transpose of $x$, see below).

$x = \begin{bmatrix}
       x_1 \\[0.3em]
       x_2 \\[0.3em]
       \vdots \\[0.3em]
       x_n
     \end{bmatrix}$
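The relation between a column vector and its transpose can be sketched with nested lists, where each inner list is one row of the matrix. The helper `transpose` is a hypothetical name introduced for this example.

```python
def transpose(M):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

x = [[1], [2], [3]]   # column vector: 3 rows, 1 column
x_T = transpose(x)    # row vector: 1 row, 3 columns
print(x_T)  # [[1, 2, 3]]
```

Transposing twice returns the original column vector.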

We use the notation $a_{ij}$ (or $A_{ij}$, $A_{i,j}$, etc.) to denote the entry of $A$ in the $i$th row and $j$th column:

$A = \begin{bmatrix}
       a_{11} & a_{12} & \cdots & a_{1n} \\[0.3em]
       a_{21} & a_{22} & \cdots & a_{2n} \\[0.3em]
       \vdots & \vdots & \ddots & \vdots \\[0.3em]
       a_{m1} & a_{m2} & \cdots & a_{mn} 
     \end{bmatrix}$
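One detail worth keeping in mind when moving from this notation to code is that mathematical indices start at $1$, while Python indices start at $0$. The sketch below stores a small example matrix as a list of rows; the helper `entry` is a hypothetical name, and the values are chosen so that each entry spells out its own indices.

```python
# A 2 x 3 example matrix, stored as a list of rows.
A = [[11, 12, 13],
     [21, 22, 23]]

m, n = len(A), len(A[0])  # number of rows, number of columns

def entry(A, i, j):
    """Return a_ij using the 1-based indices of the mathematical notation."""
    return A[i - 1][j - 1]

print((m, n))          # (2, 3)
print(entry(A, 2, 3))  # 23
```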

We denote the $j$th column of $A$ by $a_j$ or $A_{:,j}$:

$A = \begin{bmatrix}
       \big| & \big| &  & \big| \\[0.3em]
       a_{1} & a_{2} & \cdots & a_{n} \\[0.3em]
       \big| & \big| &  & \big|  
     \end{bmatrix}$

We denote the $i$th row of $A$ by $a_i^T$ or $A_{i,:}$:

$A = \begin{bmatrix}
      -- & a_1^T  & -- \\[0.3em]
       -- & a_2^T  & -- \\[0.3em]
          & \vdots &  \\[0.3em]
       -- & a_m^T  & -- 
     \end{bmatrix}$
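The column and row notation above can be mirrored in code as follows. This is a minimal sketch with nested lists; `get_column` and `get_row` are hypothetical helper names, and both take the 1-based indices used in the text.

```python
A = [[11, 12, 13],
     [21, 22, 23]]

def get_column(A, j):
    """Return the j-th column a_j (1-based) as a list."""
    return [r[j - 1] for r in A]

def get_row(A, i):
    """Return the i-th row a_i^T (1-based) as a list."""
    return list(A[i - 1])

print(get_column(A, 2))  # [12, 22]
print(get_row(A, 1))     # [11, 12, 13]
```

With a NumPy array, the same selections would be written with slices that closely resemble the $A_{:,j}$ and $A_{i,:}$ notation, again shifted to 0-based indices.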