*This notebook is part of the course materials for CS 345: Machine Learning Foundations and Practice at Colorado State University.
Original versions were created by Asa Ben-Hur.
The content is available [on GitHub](https://github.com/asabenhur/CS345).*

*The text is released under the [CC BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/), and code is released under the [MIT license](https://opensource.org/licenses/MIT).*

<img style="padding: 10px; float:right;" alt="CC-BY-SA icon.svg in public domain" src="https://upload.wikimedia.org/wikipedia/commons/d/d0/CC-BY-SA_icon.svg" width="125">


<a href="https://colab.research.google.com/github//asabenhur/CS345/blob/master/notebooks/module1_02_labeled_data.ipynb">
  <img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [1]:
import numpy as np
%autosave 0

Autosave disabled


# Matrices and matrix operations

In this notebook we will introduce matrices as an efficient way to store and manipulate datasets.  We will introduce them from a mathematical perspective and show how they are implemented in Python using Numpy.


A **matrix** is a rectangular array of scalar values (i.e. numbers), which are the **elements** of the matrix.
For example:

$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
$$

Like vectors, matrices are ubiquitous in machine learning.
In fact, it's going to be very useful to think of a matrix as a list of vectors.
Those vectors make up either the rows or the columns of the matrix.

Matrices provide a convenient and very efficient way to perform operations on many vectors at a time.


### Matrices in Python

As was the case for vectors, we can represent matrices using Python lists, except that here we will use lists of lists:

In [5]:
A = [
    [1, 2, 3],
    [4, 5, 6]
]

Numpy provides a much more efficient way to represent and manipulate matrices.  It also supports optimized implementations of many matrix operations.  For example:



In [3]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
A

array([[1, 2, 3],
       [4, 5, 6]])

By convention, we will use uppercase names to refer to matrices.
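
To make the contrast with lists of lists concrete, here is a small sketch that doubles every element of the matrix, first with nested Python lists and then with a Numpy array. The names `A_list`, `doubled_list`, and `doubled_array` are ours and chosen only for illustration:

```python
import numpy as np

# Matrix stored as a list of lists: elementwise work needs explicit loops
A_list = [
    [1, 2, 3],
    [4, 5, 6]
]
doubled_list = [[2 * value for value in row] for row in A_list]

# The same matrix as a Numpy array: the operation applies to every element at once
A = np.array(A_list)
doubled_array = 2 * A

print(doubled_list)
print(doubled_array)
```

Both versions produce the same values. The Numpy version avoids the explicit Python loops, which is where most of the efficiency gain comes from.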


### The shape of a matrix

The shape of a matrix is its number of rows and columns.
By convention, an $n \times m$ matrix has $n$ rows and $m$ columns.
For example, the matrix $A$ defined above is a $2 \times 3$ matrix.

In Numpy, the shape attribute provides that information:

In [None]:
A.shape

**Note**: the size attribute of a Numpy array is its total number of elements.
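
As a quick illustration of the difference between the two attributes, the following sketch prints `shape` and `size` for the matrix `A` defined earlier, along with `ndim`, the number of array dimensions (the variable names `n_rows` and `n_cols` are ours):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# shape is a tuple, so it can be unpacked into rows and columns
n_rows, n_cols = A.shape

print(A.shape)  # (2, 3)
print(A.size)   # 6
print(A.ndim)   # 2
```

For a matrix, `size` is always the product of the two entries of `shape`.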

### Indexing

The element in the $i^{th}$ row and $j^{th}$ column of a matrix $X$ is sometimes denoted as $X_{i,j}$ or $X_{ij}$.
Using this notation we can express a matrix $X$ from its elements as:

$$
X = \begin{pmatrix}
  X_{1,1} & X_{1,2} & X_{1,3} &\ldots & X_{1,m}\\
  X_{2,1} & X_{2,2} & X_{2,3} & \ldots & X_{2,m}\\
  \vdots & \vdots & \vdots & \ddots & \vdots \\
  X_{n,1} & X_{n,2} & X_{n,3} & \cdots & X_{n,m}\\
\end{pmatrix}
$$

In math, indices generally start at 1, but in programming they usually start at 0 (this is called 0-based indexing). So to access $A_{2,3}$ programmatically, we need to write this:

In [None]:
A[1, 2]  # 2nd row, 3rd column
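
The same indexing syntax can also pull out whole rows or columns, which fits the view of a matrix as a collection of vectors. A minimal sketch, using the matrix `A` from above (the variable names are ours):

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

element = A[1, 2]        # A_{2,3} in math notation
first_row = A[0]         # equivalent to A[0, :]
last_column = A[:, -1]   # negative indices count from the end

print(element)
print(first_row)
print(last_column)
```

Each extracted row or column is a one-dimensional array, i.e. a vector.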


### Matrices for representing machine learning data

As mentioned above, a matrix can be formed from a collection of vectors.

Let $\mathbf{x}_i$ where $i=1,\ldots,N$ be vectors of size $d$.
Let $X$ be the matrix whose rows are the vectors $\mathbf{x}_i$, i.e.

$$
X = \begin{pmatrix}
  - & \mathbf{x}_1^\top & - \\
  - & \mathbf{x}_2^\top & - \\
   & \vdots &  \\
  - & \mathbf{x}_N^\top & - \\
\end{pmatrix}
$$


This matrix $X$ has dimensions $N \times d$.
In this way we can represent the feature vectors of a dataset as a matrix.
Recall that we defined a labeled dataset as a collection of vectors and their associated labels:
$
\mathcal{D} = \{ \;(\mathbf{x}_i, y_i) \; \}_{i=1}^N,
$
where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i$ is the label associated with $\mathbf{x}_i$.
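
The construction above can be sketched in Numpy as follows. The feature values and labels are made up for illustration, and stacking with `np.vstack` is just one possible way to build the matrix:

```python
import numpy as np

# Three made-up feature vectors, each of dimension d = 2
x1 = np.array([5.1, 3.5])
x2 = np.array([4.9, 3.0])
x3 = np.array([6.2, 2.9])

# Stack the vectors as the rows of the feature matrix X
X = np.vstack([x1, x2, x3])

# One label per row of X (values are made up)
y = np.array([0, 0, 1])

N, d = X.shape
print(X.shape)  # (3, 2)
print(X[1])     # the feature vector of the second example
```

Row `i` of `X` and entry `i` of `y` together form one labeled example $(\mathbf{x}_i, y_i)$.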


### Operations on matrices

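Like vectors, matrices support multiplication by a scalar and addition of two matrices of the same shape, among other operations.  Both of these are performed element by element.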
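
A short sketch of scalar multiplication and matrix addition in Numpy. The second matrix `B` is a made-up example, not something defined elsewhere in this notebook:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
# A second matrix with the same shape as A (illustrative values)
B = np.array([
    [10, 20, 30],
    [40, 50, 60]
])

scaled = 3 * A   # every element of A is multiplied by 3
total = A + B    # elements in matching positions are added

print(scaled)
print(total)
```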


#### Matrix-vector multiplication

Given a matrix $X$ of size $N \times d$ and a $d$-dimensional vector $\mathbf{w}$, we can define the matrix-vector product $X \mathbf{w}$ to be the $N$-dimensional vector

$$
X \mathbf{w} = ( \mathbf{x}_1^\top \mathbf{w}, \ldots, \mathbf{x}_N^\top \mathbf{w}).
$$

In Numpy this operation can be performed by the same function used for dot-products between vectors:


In [6]:
X = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
w = np.array([1, 1, 1])
np.dot(X, w)

array([ 6, 15])

An equivalent way of expressing the matrix-vector product uses the `@` operator:

In [13]:
X @ w

array([ 6, 15])

Note that the same result can be obtained by stacking dot products row by row (but with a loss of efficiency):

In [15]:
np.array([np.dot(X[0], w), np.dot(X[1], w)])

array([ 6, 15])

This operation is very useful for expressing the workings of a linear model on an entire dataset.
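
As an illustrative sketch, assume a linear model that scores a single example $\mathbf{x}$ as $\mathbf{w}^\top \mathbf{x} + b$. The weight vector `w` and the bias `b` below are made-up values, not parameters taken from this notebook. The matrix-vector product then computes the scores of all examples at once:

```python
import numpy as np

# Each row of X is one example with d = 3 features
X = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
w = np.array([0.5, -1.0, 2.0])  # hypothetical weight vector
b = 1.0                          # hypothetical bias term

# Scores for the whole dataset in a single operation
scores = X @ w + b

# The same computation, one example at a time
scores_loop = np.array([np.dot(x, w) + b for x in X])

print(scores)
print(scores_loop)
```

The two results agree. The first form is shorter and lets Numpy do the looping internally.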

The matrix-vector product requires that the number of columns of the matrix matches the length of the vector.  It is therefore sensitive to the order of its arguments:

In [17]:
try:
    print(np.dot(w, X))
except ValueError:
    # w has 3 elements but X has only 2 rows, so the shapes do not align
    print("the order of arguments matters in np.dot")

print(np.dot(w, X.T))
print(w @ X.T)

the order of arguments matters in np.dot
[ 6 15]
[ 6 15]
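
Inspecting the shapes makes it clearer why only some of these products are defined. The sketch below uses the same `X` and `w` as above, and the variable `same` is ours:

```python
import numpy as np

X = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
w = np.array([1, 1, 1])

print(X.shape)    # (2, 3)
print(w.shape)    # (3,)
print(X.T.shape)  # (3, 2)

# X @ w:   the 3 columns of X match the 3 elements of w
# w @ X.T: the 3 elements of w match the 3 rows of X.T
same = np.array_equal(X @ w, w @ X.T)
print(same)
```

With `w` on the left, Numpy treats it as a row vector, so it has to be paired with the transposed matrix `X.T` for the inner dimensions to match.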



The matrix-vector product discussed above is a special case of the general matrix multiplication operation.  We won't need the general version of this product in the first part of the course, so we won't introduce it (yet).

The `@` operator was introduced in version 3.5 of Python (the following document discusses the official [proposal](https://www.python.org/dev/peps/pep-0465/) for adding it to the language).




### Exercise

* As discussed above, the `@` operator performs matrix multiplication.  Explore the standard multiplication operator `*` between two two-dimensional arrays, and between a two-dimensional array and a one-dimensional array or a scalar.