# CSCI632 Homework 2: More Linear Algebra Review for Deep Learning

### Instructions
This assignment is designed to review key linear algebra concepts used throughout *Deep Learning* (Goodfellow et al., Ch. 2).
It expands on homework 1.  It emphasizes manipulation of matrices, vector spaces, dependence/independence, and norms.
Show all steps and justify each answer.   You do not need to use LaTeX for these answers.  


## Deliverables
- A Jupyter notebook, scanned image, or PDF of your typed, LaTeX, or handwritten solutions.
- For conceptual questions, write concise but clear explanations.
- For computational problems, show all steps.


## Part A. Watch videos

I believe 3Blue1Brown has some of the best videos for visualizing linear algebra, so much so that I want you to watch them.  If you already know the material, watch at 2x speed, or just watch enough to answer the questions below.

For each of the following videos provide a few sentences describing the content of the video.

### Linear combinations, span, and basis vectors | Chapter 2, Essence of linear algebra

https://www.youtube.com/watch?v=k7RM-ot2NWY&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=2


**Answer**: This video provides visualizations of basis vectors and how we can represent vectors as linear
combinations of these basis vectors.  The set of all vectors created as a linear combination 
of the basis vectors is the *span* of the basis vectors.  The video then introduces the notion 
of linear independence between vectors.

**Grading note**: Any answer that mentions that the video presents vectors, basis
vectors, span, and linear independence is sufficient.


### Linear transformations and matrices | Chapter 3

https://www.youtube.com/watch?v=kYB8IZa5AuE&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=3



**Possible Answer**

A transformation is a function that maps between an input and an output vector 
space. The video invites the viewer to think of a *transformation* as motion — 
taking in an input vector and moving it to an output vector. The video illustrates 
cases such as rotation, scaling, shearing, and reflection. These illustrate 
*linear transformations*.  

Consider the case where the input space is two-dimensional. Then the set of all 
input vectors can be represented as an infinite plane, which we can visualize as 
a uniform grid. A linear transformation maps the grid so that:  
 * the origin remains fixed,  
 * straight lines remain straight,  
 * parallel lines remain parallel, and  
 * points evenly spaced along a line remain evenly spaced (though the distances may be stretched or squashed).  

The video then shows how we can read the columns of a matrix as the
coordinates where the basis vectors land.  The transformation maps $\hat{i}$ 
to column 1, maps $\hat{j}$ to column 2, and so on.  Recall that we
represent $\hat{i}$ with the column vector

$$
\begin{bmatrix}
1 \\ 
0
\end{bmatrix}
$$

For example:

$$
\begin{bmatrix}
  1 & 3 \\
 -2 & 0 
\end{bmatrix} 
\begin{bmatrix}
  1 \\ 
  0
\end{bmatrix} =
\begin{bmatrix}
  1 \\ 
 -2
\end{bmatrix}
$$

which is exactly the first column of the matrix. Similarly, multiplying by 
$\hat{j}$ selects the second column, showing where $\hat{j}$ lands under the 
transformation.  
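As a quick check of the column-selection idea, here is a minimal pure-Python sketch. `mat_vec` is a hypothetical helper that stores matrices as lists of rows. It does not come from the video.

```python
def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by column vector v (a list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

A = [[1, 3],
     [-2, 0]]
i_hat = [1, 0]
j_hat = [0, 1]

print(mat_vec(A, i_hat))  # [1, -2], the first column of A
print(mat_vec(A, j_hat))  # [3, 0], the second column of A
```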

**Grading note:** Any answer that discusses the definition and properties of linear transformations is acceptable. 

### Matrix multiplication as composition | Chapter 4, Essence of linear algebra

https://www.youtube.com/watch?v=XkY2DOUCWMU&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=4



**Possible Answer** The video interprets multiplying two matrices as composing the two linear transformations they represent: applying one transformation and then the other.

To find the effect of $AB$, where $A$ and $B$ are $2\times 2$ matrices, we 
track how $B$ and then $A$ move the basis vectors
$\hat{i}$ and $\hat{j}$.  Let's start with how $\hat{i}$ is affected: $AB\hat{i}$.
I first compute $B\hat{i}$, which is the first column of $B$ and tells me where $\hat{i}$ goes.  
Let $\hat{i}'$ denote this transformed unit vector. I then apply $A$ to find 
where $\hat{i}'$ moves, giving $\hat{i}''$.  I do 
the same for $AB\hat{j}$ to get $\hat{j}''$.  The resulting vector 
$\hat{i}''$ gives column 1 and $\hat{j}''$ gives column 2 of the combined matrix
$C = AB$. 

After showing this example, the video replaces the integers in the $2 \times 2$ matrices with letters.

$$
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}
\begin{bmatrix}
e & f \\
g & h
\end{bmatrix} =
\begin{bmatrix}
a e + b g & a f + b h \\
c e + d g & c f + d h
\end{bmatrix}
$$ 
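The column-by-column recipe above can be sketched in plain Python. The helpers `mat_mul`, `mat_vec`, and `columns_to_matrix` are hypothetical. The matrices $A$ and $B$ are illustrative values chosen here (they are the same $A$ and $B$ used in Part B), not the video's example. Tracking $\hat{i}''$ and $\hat{j}''$ reproduces the row-by-column formula:

```python
def mat_mul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(M, v):
    """Matrix (list of rows) times column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def columns_to_matrix(cols):
    """Build a matrix (list of rows) whose j-th column is cols[j]."""
    return [list(r) for r in zip(*cols)]

# Illustrative matrices, chosen for this example.
A = [[1, 2], [3, 4]]
B = [[0, -1], [1, 0]]

# Track where i-hat and j-hat go: first B, then A.
i_dd = mat_vec(A, mat_vec(B, [1, 0]))  # i''
j_dd = mat_vec(A, mat_vec(B, [0, 1]))  # j''
C = columns_to_matrix([i_dd, j_dd])

print(C)              # [[2, -1], [4, -3]]
print(mat_mul(A, B))  # the same matrix, via the ae + bg formula
```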

The video then demonstrates visually why matrix multiplication is not commutative, comparing a shear
followed by a rotation with the same rotation followed by the same shear.  It is easy to come up
with examples where commutativity fails.  Here is one not used in the video.
Use your right hand.  Turn it palm upward. Use your thumb as the $x$ axis and extend your
index finger in the $y$ direction.  It is now your $y$ axis.
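The order dependence can also be checked numerically. This sketch assumes a $90^\circ$ counterclockwise rotation and a shear that sends $\hat{j}$ to $(1,1)$. The helper `mat_mul` is hypothetical. Applying the two transformations in different orders gives different matrices:

```python
def mat_mul(A, B):
    """Row-by-column product; the result applies B first, then A."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

rotation = [[0, -1], [1, 0]]  # 90-degree counterclockwise rotation
shear = [[1, 1], [0, 1]]      # shear that tilts the y-axis

shear_after_rotation = mat_mul(shear, rotation)  # rotate first, then shear
rotation_after_shear = mat_mul(rotation, shear)  # shear first, then rotate

print(shear_after_rotation)  # [[1, -1], [1, 0]]
print(rotation_after_shear)  # [[0, -1], [1, 1]]
```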

**Grading note:** Any answer that discusses matrix 
multiplication involving more than one matrix as a composition of transformations is sufficient. 

### The determinant | Chapter 6, Essence of linear algebra

https://www.youtube.com/watch?v=Ip3X9LOh2dk&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=6

**Possible Answer** 

To visualize the determinant let's start in two dimensions.  Consider the unit vectors
$\hat{i}$ and $\hat{j}$.  Visualize these as the edges of a square.  The area of this square
is equal to the base times the height.  Since the unit vectors both have length 1, the area
of the square bordered by $\hat{i}$ and $\hat{j}$ is also 1.

The determinant can be viewed as how much the transformation scales this area.
If my transformation scales $\hat{i}$ to a length of 2 but keeps $\hat{j}$ the
same, I now have column vectors

$$
2\hat{i} = 
  \begin{bmatrix}
    2 \\
    0 
  \end{bmatrix}, 
\hat{j} = 
  \begin{bmatrix}
    0 \\
    1
  \end{bmatrix}
$$

Now consider the rectangle with $2\hat{i}$ along the base and $\hat{j}$ along the left edge.
The resulting area is $2 \times 1 = 2$.  If we represent the transformation by how it changes
the $\hat{i}$ and $\hat{j}$ basis vectors as was shown in previous videos, we 
put the result of the transformation of $\hat{i}$ in the left column and $\hat{j}$ in the right.

$$
A = \begin{bmatrix}
  2 & 0 \\
  0 & 1
\end{bmatrix}
$$

Because this transformation scales the area of the square with edges $\hat{i}$ and $\hat{j}$
by a factor of 2, we say that matrix $A$ has a determinant of 2.

The video also shows a transformation that introduces a shear:

$$
B = \begin{bmatrix}
  1 & 1 \\
  0 & 1 \\
\end{bmatrix}
$$

As mentioned already, from the prior videos we can think of the left column
as the location where $\hat{i}$ would end up and the right column of where
$\hat{j}$ would end up.  Thus $\hat{i}$ remains at $(1, 0)$ and is 
unaffected, but the second basis vector moves from $(0, 1)$ to $(1, 1)$.
This tilts the vertical grid lines of the output space 45 degrees to the right without affecting
the x direction.  This is an example of a *shear*. In general, a shear
can tilt the y-axis by any amount, as set by the number in the upper-right-hand
corner of the $2 \times 2$ matrix.  Alternatively, we could shear by tilting only
the x-axis, leaving the upper-right-hand corner zero while changing the 
lower-left-hand corner of the matrix.

The region with edges given by the vectors from the origin to $(1,0)$
and $(1,1)$ is a parallelogram.  The area of a parallelogram is still the
base times the height, so the area after the linear transformation $B$
remains 1.  Thus $B$ has a determinant of 1.

The video generalizes this concept to any arbitrary area.  If a linear
transformation would scale a unit square by a factor of 2 then it would scale
ANY region's area by a factor of 2, because linear transformations always
keep the grid lines on the input space parallel and evenly spaced in the
output space.

There is an important detail that I glossed over: determinants can be
negative.  A negative area does not make sense, but for 
determinants the sign signifies orientation.  A positive determinant
means the linear transformation preserves orientation: the image of $\hat{j}$
lies counterclockwise from the image of $\hat{i}$, just as $\hat{j}$ lies
counterclockwise from $\hat{i}$.  A negative determinant
means the orientation is flipped: the image of $\hat{j}$ ends up clockwise from
the image of $\hat{i}$.  The absolute value of the determinant still gives the
factor by which areas are scaled.

If a linear transformation maps $\hat{i}$ and $\hat{j}$ onto a line
then the area of the resulting parallelogram is 0.  For
example, if matrix $C$ maps $\hat{i}$ to $(2,1)$ and $\hat{j}$ to 
$(4,2)$ then we have the matrix

$$
C = \begin{bmatrix}
  2 & 4 \\
  1 & 2
\end{bmatrix}
$$

If we view the left column as where $\hat{i}$ lands and the right column
as where $\hat{j}$ lands, we see both point in the same direction, since $(4,2) = 2\,(2,1)$.
The span of these vectors is a line.

All matrices with a determinant of zero map
the unit square to a region with zero area, but not all
such matrices squash space down to the same 
dimensionality.  $C$ has a determinant of 0 and maps the plane onto a line.
If both column vectors are the zero vector then the matrix has a 
zero determinant but it maps the entire input space onto the origin, 
i.e., a single point.

For a $2 \times 2$ matrix, the determinant can be computed mechanically as follows:

$$
\textrm{det}\left( 
  \begin{bmatrix}
    a & b \\
    c & d
  \end{bmatrix}
\right) = ad - bc
$$
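Here is a minimal sketch of the $ad - bc$ formula applied to the matrices discussed above. The helper `det2` is hypothetical. The swap matrix `F` is an extra example added here to show a negative, orientation-flipping determinant:

```python
def det2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]], i.e. ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

examples = {
    "A": [[2, 0], [0, 1]],  # stretches i-hat by a factor of 2
    "B": [[1, 1], [0, 1]],  # shear: area unchanged
    "C": [[2, 4], [1, 2]],  # both columns on one line: area collapses
    "F": [[0, 1], [1, 0]],  # swaps i-hat and j-hat: orientation flips
}
print({name: det2(M) for name, M in examples.items()})
# {'A': 2, 'B': 1, 'C': 0, 'F': -1}
```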

The video ends with a geometric illustration of how this formula gives
the area of the parallelogram whose two edges are the columns of the matrix 
drawn from the origin.  It starts with the rectangle with edges of length $a+b$ and $c+d$
and subtracts the regions outside the parallelogram:

$$
(a+b)(c+d) - ac - bd - 2bc = (ac + ad + bc + bd) - ac - bd - 2bc = ad-bc
$$



**Grading note:** Any answer that discusses how the determinant 
measures the area of the region with the transformed basis vectors
as edges, and what it means to have a negative 
determinant, is sufficient.

### Inverse matrices, column space and null space | Chapter 7, Essence of linear algebra


https://www.youtube.com/watch?v=uQhTuRlWMxw&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=7

**Possible Answer** 

This video starts by describing the representation of systems of linear equations
using matrices.

$$A \mathbf{x} = \mathbf{v}$$

where $A$ is a matrix, $\mathbf{x}$ is a column vector representing our unknowns
and $\mathbf{v}$ represents where $A$ maps $\mathbf{x}$.  Given $\mathbf{v}$ and 
$A$, we want to find $\mathbf{x}$.

The video then describes the identity matrix, which is
all zeros except for ones along the diagonal.

$$\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
$$

The identity matrix leaves a square matrix unchanged:

$$AI = IA = A$$

It is one case where commutativity holds even though matrix
multiplication is not commutative in general.

For some square matrices, we can find a matrix $A^{-1}$ such 
that $A^{-1}A = AA^{-1} = I$.  We call $A^{-1}$
the inverse of $A$.

If we know the inverse matrix of $A$, finding $\mathbf{x}$ is trivial.
We multiply both sides by $A^{-1}$.

$$A^{-1} A \mathbf{x} = A^{-1} \mathbf{v}$$

which can be rewritten as 

$$A^{-1} A \mathbf{x} = I \mathbf{x} = \mathbf{x} = A^{-1} \mathbf{v}$$
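As an illustration, the sketch below assumes a $2 \times 2$ invertible matrix. It forms $A^{-1}$ with the standard $ad - bc$ formula (the same one used in Problem 6) and recovers $\mathbf{x}$. The helper names and the right-hand side $\mathbf{v} = (5, 11)$ are made up for the example. Exact `Fraction` arithmetic avoids rounding. In numerical practice one usually solves the system directly rather than forming the inverse.

```python
from fractions import Fraction

def inverse2(M):
    """Inverse of a 2x2 matrix via the ad - bc formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant 0); no inverse")
    s = Fraction(1, det)
    return [[d * s, -b * s], [-c * s, a * s]]

def mat_vec(M, v):
    """Matrix (list of rows) times column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2], [3, 4]]
v = [5, 11]                    # hypothetical right-hand side
x = mat_vec(inverse2(A), v)    # x = A^{-1} v

print(x)              # [Fraction(1, 1), Fraction(2, 1)]
print(mat_vec(A, x))  # [Fraction(5, 1), Fraction(11, 1)], i.e. back to v
```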

When the determinant of a matrix is zero, it maps the input space
onto a lower-dimensional space.  In two dimensions, a matrix with a determinant
of zero maps the input space onto a line or a point (the origin).  Because many
different inputs land on the same output, no transformation can map that line or point
back onto the whole plane.  Thus,
a matrix with a determinant of zero has no inverse.

Just because $A$ has no inverse does not mean that 
there is no solution to 

$$A \mathbf{x} = \mathbf{v}$$

In two dimensions, it means there is a solution only if 
$\mathbf{v}$ lies on the line spanned by the column 
vectors of $A$ (or, if $A$ is the zero matrix, only if $\mathbf{v}$ is the origin).  In higher dimensions,
$\mathbf{v}$ must lie in the point, line, plane, or hyperplane spanned 
by the column vectors; otherwise there is no solution.  When a solution does exist,
there are infinitely many.
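For the singular $2 \times 2$ case, one way to test whether $\mathbf{v}$ lies on the line spanned by the columns is to check that the determinant of a nonzero column paired with $\mathbf{v}$ is zero. `in_column_space_2x2` is a hypothetical helper written for this example only. It uses the matrix $C$ from the determinant section:

```python
def in_column_space_2x2(M, v):
    """For a singular 2x2 M, return True if M x = v has a solution."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        raise ValueError("sketch only handles singular (det == 0) matrices")
    nonzero_cols = [col for col in [(a, c), (b, d)] if col != (0, 0)]
    if not nonzero_cols:
        # Zero matrix: every x maps to the origin.
        return tuple(v) == (0, 0)
    p, q = nonzero_cols[0]
    # v is on the line through the origin spanned by (p, q)
    # exactly when det([[p, v0], [q, v1]]) = p*v1 - q*v0 is zero.
    return p * v[1] - q * v[0] == 0

C = [[2, 4], [1, 2]]
print(in_column_space_2x2(C, [6, 3]))  # True: (6, 3) = 3 * (2, 1)
print(in_column_space_2x2(C, [1, 0]))  # False: (1, 0) is off the line
```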

The video ends by exploring full rank vs. non-full rank matrices
and how this relates to the nullspace of a matrix.
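The nullspace is the set of all solutions to $A\mathbf{x} = \mathbf{0}$, and its 
dimension tells us how far $A$ is from being full rank: for a matrix with $n$ columns,
the rank plus the dimension of the nullspace equals $n$.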

### Change of basis | Chapter 13, Essence of linear algebra

https://www.youtube.com/watch?v=P2LTAUO1TdA&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=13

**Possible Answer** 

A basis is just a set of vectors we choose to describe space.  When we say basis 
vectors, we often think of $\hat{i}$, $\hat{j}$, and $\hat{k}$ (in three
dimensions), but space has no intrinsic coordinate system.  In linear algebra,
the origin is fixed; when we change the basis, we’re only re-scaling and
re-orienting the coordinate axes.

Although not mentioned in the video, it is worth pointing out that in physics
and engineering, we also often translate the origin, but this is referred to
as a change in the *frame of reference*. When discussing a *change of basis*, 
we keep the origin fixed while scaling and/or reorienting our basis vectors.


In the video, Grant Sanderson introduces an alternate basis used by a fictional Jennifer:

$$
\mathbf{b_1} = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\qquad
\mathbf{b_2} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}
$$

If a vector has Jennifer’s coordinates $\mathbf{u} = (-1, 2)$, its 
arrow in our coordinate system is

$$
\mathbf{x} = -1 \cdot \mathbf{b_1} + 2 \mathbf{b_2} = 
  -1 \begin{bmatrix} 2 \\ 1 \end{bmatrix}
  + 2 \begin{bmatrix} -1 \\ 1 \end{bmatrix} 
  = \begin{bmatrix} -1 \cdot 2 + 2 \cdot (-1) \\ (-1) \cdot 1 + 2 \cdot 1 \end{bmatrix}
  = \begin{bmatrix} -4 \\ 1 \end{bmatrix}
$$

Writing the basis vectors as columns of a matrix,

$$
B = \begin{bmatrix}
      2 & -1 \\
      1 & 1
    \end{bmatrix}
$$

this is simply

$$
B \mathbf{u} = \begin{bmatrix}
  2 & -1 \\
  1 & 1 
\end{bmatrix} 
\begin{bmatrix}
  -1 \\
  2
\end{bmatrix} = 
\begin{bmatrix}
  -2 - 2 \\
  -1 + 2
\end{bmatrix} =
\begin{bmatrix} 
  -4 \\
   1 
\end{bmatrix} 
$$

What if we want to go the other direction?  How do we transform a coordinate
in our coordinate system to Jennifer's coordinate system?  We use the inverse:

$$B^{-1} \mathbf{x} = B^{-1} B \mathbf{u} = I \mathbf{u} = \mathbf{u}$$

A valid basis matrix for $\mathbb{R}^n$ 1) is square ($n \times n$), 2) has columns that span $\mathbb{R}^n$,
3) has a non-zero determinant, and 4) is invertible.  For a square matrix, conditions 2 through 4 are equivalent.

Since $B$ is invertible, we can always convert from our basis $\hat{i}$, $\hat{j}$ 
to the basis represented by $B$ and back.
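Both directions of the conversion can be sketched with Jennifer's basis from the video. The helpers `mat_vec` and `inverse2` are hypothetical. The inverse uses the $2 \times 2$ formula with exact fractions:

```python
from fractions import Fraction

def mat_vec(M, v):
    """Matrix (list of rows) times column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse2(M):
    """2x2 inverse; assumes det != 0, i.e. the columns form a valid basis."""
    (a, b), (c, d) = M
    s = Fraction(1, a * d - b * c)
    return [[d * s, -b * s], [-c * s, a * s]]

# Jennifer's basis vectors b1 = (2, 1) and b2 = (-1, 1) as columns
B = [[2, -1],
     [1, 1]]

u = [-1, 2]                       # Jennifer's coordinates
x = mat_vec(B, u)                 # our coordinates
u_back = mat_vec(inverse2(B), x)  # back to Jennifer's coordinates

print(x)       # [-4, 1]
print(u_back)  # [Fraction(-1, 1), Fraction(2, 1)]
```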

**Grading note:** Any answer that discusses how the column vectors can be interpreted as basis vectors, and that multiplying a vector of coordinates by
the matrix translates those coordinates from the basis given by the matrix's column vectors into our standard coordinates, is sufficient.

### Eigenvectors and eigenvalues | Chapter 14, Essence of linear algebra

https://www.youtube.com/watch?v=PFDu9oVAE-g&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=14

Eigenvectors $\mathbf{v}$ of a matrix $A$ are the nonzero solutions to the equation

$$
A\mathbf{v} = \lambda \mathbf{v}
$$

Here $\lambda$ is a scalar called an eigenvalue.

This video presents a particularly intuitive example of eigenvectors.
In 3D, consider a linear transformation that performs a rotation.
Vectors along the axis of rotation do not change as the 
space is rotated, not even in length, so they are eigenvectors with $\lambda=1$.

In 2-space, consider the following matrix, which stretches and shears the plane.

$$
A = \begin{bmatrix}
  3 & 1 \\
  0 & 2 
\end{bmatrix}
$$

Now consider any vector on the x-axis.

$$
\mathbf{v} = \begin{bmatrix}
 a \\
 0
\end{bmatrix}
$$

Multiplying $A$ by $\mathbf{v}$ yields

$$
\begin{bmatrix}
  3 & 1 \\
  0 & 2
\end{bmatrix}
\begin{bmatrix}
 a \\
 0
 \end{bmatrix} =
 a \begin{bmatrix}
    3 \\
    0
   \end{bmatrix}
+ 0 \begin{bmatrix}
    1 \\
    2
   \end{bmatrix}
   = 3 \begin{bmatrix}
       a \\
       0
       \end{bmatrix}
$$

Thus all vectors $(a,0)$ are eigenvectors with the eigenvalue 3.

Another family of eigenvectors are multiples of $(-1,1)$.

$$
\begin{bmatrix}
  3 & 1 \\
  0 & 2
\end{bmatrix}
\begin{bmatrix}
 -b \\
 b
 \end{bmatrix} =
- b \begin{bmatrix}
      3 \\
      0
    \end{bmatrix}
+ b \begin{bmatrix}
      1 \\
      2
   \end{bmatrix}
   = \begin{bmatrix}
       -2b \\
       +2b
     \end{bmatrix}
   = 2 \begin{bmatrix}
        -b \\
        b
      \end{bmatrix}
$$

which has eigenvalue 2.



We can obtain an intuition into how to find eigenvectors by 
rearranging the equation.

$$
  A\mathbf{v} = \lambda \mathbf{v} = \lambda I \mathbf{v}
$$

$$
  (A - \lambda I) \mathbf{v} = \mathbf{0}
$$

Let $M = (A - \lambda I)$ then we can rewrite the above as

$$
  M \mathbf{v} = \mathbf{0}
$$

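Because an eigenvector is nonzero, $M$ sends a nonzero vector to $\mathbf{0}$.  The eigenvectors
for a given $\lambda$ (together with $\mathbf{0}$) form the nullspace of 
$M$, so $M$ squashes space into a lower dimension and $\det(M) = 0$.  Thus,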

$$
  \det(A - \lambda I) = 0. \tag{1}
$$

The video uses an example matrix $A$

$$
  A = \begin{bmatrix}
    2 & 2 \\
    1 & 3 
    \end{bmatrix}
$$

Thus we rewrite (1) as   

$$
  \det \left(\begin{bmatrix}
    2-\lambda & 2 \\
    1         & 3-\lambda 
  \end{bmatrix} \right) = 0. \tag{2}
$$

Taking the determinant in (2) yields

$$
  (2-\lambda)(3-\lambda) - 2 \cdot 1 = 0
$$

$$
  6 - 5 \lambda + \lambda^2 - 2 = \lambda^2 - 5\lambda + 4 = 0
$$

$$
  (\lambda-4)(\lambda-1) = 0
$$

This has two eigenvalues: $\lambda = 4$ and $\lambda = 1$.

Now let's find the eigenvectors for each eigenvalue.  For the eigenvalue 1,

$$
  \begin{bmatrix}
    1 & 2 \\
    1 & 2 
  \end{bmatrix} \mathbf{v} = \mathbf{0}. \tag{3}
$$

Because this matrix is singular (both rows give the same equation), the solution is not a single vector
but a line: $x + 2y = 0$.  Any multiple of $(1, -\tfrac{1}{2})$ satisfies this.

We can confirm this

$$
    \begin{bmatrix}
    2 & 2 \\
    1 & 3 
    \end{bmatrix} 
    \begin{bmatrix}
    1 \\
    -\tfrac{1}{2}
    \end{bmatrix} = 
    \begin{bmatrix}
    2 \\
    1
    \end{bmatrix}
    - \tfrac{1}{2}\begin{bmatrix}
    2 \\
    3
    \end{bmatrix} =
    1 \cdot
    \begin{bmatrix}
    1 \\
    -\tfrac{1}{2}
    \end{bmatrix}
$$

For the eigenvalue 4,

$$
  \begin{bmatrix}
    2-4 & 2 \\
    1   & 3-4
  \end{bmatrix} \mathbf{v} = 0. 
$$

$$
  \begin{bmatrix}
    -2 & 2 \\
    1  & -1
  \end{bmatrix} \mathbf{v} = 0. 
$$

$$-2 x + 2 y = 0$$

So any multiple of $(1,1)$ satisfies this equation.

So we have two eigenvectors (up to scaling): $(1,1)$ with eigenvalue 4 and $(1, -\tfrac{1}{2})$ with eigenvalue 1.
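The characteristic-polynomial method above can be sketched for the $2 \times 2$ case, where $\det(A - \lambda I) = \lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A)$. The helpers are hypothetical, and the code returns only real eigenvalues. $(2, -1)$ is used as a scaled copy of $(1, -\tfrac{1}{2})$ so that the check stays in integers:

```python
import math

def mat_vec(M, v):
    """Matrix (list of rows) times column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def eigenvalues2(M):
    """Real roots of lambda^2 - tr(M) * lambda + det(M) = 0, ascending."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        return []  # complex-conjugate pair only, e.g. a 90-degree rotation
    root = math.sqrt(disc)
    return [(tr - root) / 2, (tr + root) / 2]

def is_eigenpair(M, v, lam):
    """True if M v equals lam * v exactly."""
    return mat_vec(M, v) == [lam * x for x in v]

A = [[2, 2], [1, 3]]
print(eigenvalues2(A))                  # [1.0, 4.0]
print(is_eigenpair(A, [1, 1], 4))       # True
print(is_eigenpair(A, [2, -1], 1))      # True
print(eigenvalues2([[3, 1], [0, 2]]))   # [2.0, 3.0], the earlier example
print(eigenvalues2([[0, -1], [1, 0]]))  # [] for a 90-degree rotation
```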

The video then discusses how some transformations
have no (real) eigenvectors.  For example, in 2-d a $90^\circ$ rotation moves every
nonzero vector off its own span, and if we try to compute the eigenvalues we get only
imaginary roots when solving $\det(A-\lambda I) = 0$.

It also shows that if your basis vectors are eigenvectors, then 
the matrix in that basis has the eigenvalues along the diagonal and all
off-diagonal entries equal to zero.  When only the diagonal entries
can be non-zero, we have a *diagonal matrix*.

If we multiply a diagonal matrix by itself, we end up with a matrix
that has the same diagonal values, but squared.  If we multiply
a diagonal matrix by itself $k$ times, then the diagonal entries are just
the original values raised to the $k$th power.  For example, with
$D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$,

$$
   D^4 = D D D D = \begin{bmatrix}
                 2 & 0 \\
                 0 & 3
              \end{bmatrix}
              \begin{bmatrix}
                 2 & 0 \\
                 0 & 3
              \end{bmatrix}
              \begin{bmatrix}
                 2 & 0 \\
                 0 & 3
              \end{bmatrix}
              \begin{bmatrix}
                 2 & 0 \\
                 0 & 3
              \end{bmatrix}
              = 
              \begin{bmatrix}
                 2^4 & 0 \\
                 0 & 3^4
              \end{bmatrix}
$$

When the basis vectors are eigenvectors, we call this an eigenbasis.  If we can 
find an eigenbasis that spans the space of the original matrix, we can
perform a change of basis to the eigenbasis, where the matrix is diagonal, and then operations like
multiplying a matrix by itself become inexpensive. 

In fact, any matrix function that can be expressed as a power series, such as the matrix
exponential, becomes cheap: in the eigenbasis, we apply the function to each diagonal entry.
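To see why an eigenbasis makes powers cheap, this sketch compares $A^5$ computed by repeated multiplication with $P D^5 P^{-1}$. The columns of $P$ are the eigenvectors $(1,1)$ and $(2,-1)$ found above, and $D = \operatorname{diag}(4, 1)$. $P^{-1}$ was worked out by hand with the $2 \times 2$ inverse formula, and the helper names are hypothetical:

```python
from fractions import Fraction

def mat_mul(A, B):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(M, k):
    """M^k by repeated multiplication (k >= 1)."""
    result = M
    for _ in range(k - 1):
        result = mat_mul(result, M)
    return result

A = [[2, 2], [1, 3]]
# Eigenvectors (1, 1) for lambda = 4 and (2, -1) for lambda = 1, as columns.
P = [[1, 2], [1, -1]]
P_inv = [[Fraction(1, 3), Fraction(2, 3)],
         [Fraction(1, 3), Fraction(-1, 3)]]

def pow_via_eigenbasis(k):
    # In the eigenbasis A is diag(4, 1), so its k-th power is diag(4**k, 1**k).
    D_k = [[4 ** k, 0], [0, 1 ** k]]
    return mat_mul(mat_mul(P, D_k), P_inv)

print(mat_pow(A, 5))                             # [[342, 682], [341, 683]]
print(pow_via_eigenbasis(5) == mat_pow(A, 5))    # True
```

Once $P$ and $P^{-1}$ are known, the eigenbasis route needs only the scalar powers $4^k$ and $1^k$. Finding $P$ and $P^{-1}$ is the up-front cost.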


**Grading note:** Any answer that discusses the definition
of eigenvectors and eigenvalues and how to compute them is 
satisfactory.

## Part B. Basic Properties of Matrices

1. **Transpose Properties**  
Let  

$$
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad 
B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
$$  

Verify each of the following:

a) $(A+B)^T = A^T + B^T$

*A:* 

$$
\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \, +
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\right)^T = 
\begin{bmatrix} 1 & 1 \\ 4 & 4 \end{bmatrix}^T =
\begin{bmatrix} 1 & 4 \\ 1 & 4 \end{bmatrix}
$$

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^T \, +
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}^T =
\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \, +
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} =
\begin{bmatrix} 1 & 4 \\ 1 & 4 \end{bmatrix}
$$


b) $(AB)^T = B^T A^T$

$$
\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} 
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\right)^T =
\begin{bmatrix} 2 & -1 \\ 4 & -3 \end{bmatrix}^T = 
\begin{bmatrix} 2 & 4 \\ -1 & -3 \end{bmatrix}
$$  

$$
\begin{bmatrix} 0 & -1 \\
                1 & 0 \end{bmatrix}^T
\begin{bmatrix} 1 & 2 \\
                3 & 4 \end{bmatrix}^T =
\begin{bmatrix} 0 & 1 \\
                -1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 3 \\
                2 & 4 \end{bmatrix} =
\begin{bmatrix} 2 & 4 \\ -1 & -3 \end{bmatrix}
$$

c) $(A^T)^T = A$

$$
\left(\begin{bmatrix} 1 & 2 \\
                      3 & 4 \end{bmatrix}^T\right)^T =
      \begin{bmatrix} 1 & 3 \\
                      2 & 4 \end{bmatrix}^T =                      
      \begin{bmatrix} 1 & 2 \\
                      3 & 4 \end{bmatrix}
$$
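The three identities can be spot-checked in a few lines of plain Python, using hypothetical helpers with matrices stored as lists of rows. The last line shows that the order reversal in $(AB)^T = B^T A^T$ matters:

```python
def transpose(M):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, -1], [1, 0]]
T = transpose

print(T(mat_add(A, B)) == mat_add(T(A), T(B)))  # True
print(T(mat_mul(A, B)) == mat_mul(T(B), T(A)))  # True
print(T(T(A)) == A)                             # True
print(T(mat_mul(A, B)) == mat_mul(T(A), T(B)))  # False: order matters
```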


2. **Trace Identities**  
For the same $A, B$, compute $\mathrm{tr}(A)$, $\mathrm{tr}(B)$, $\mathrm{tr}(AB)$, and $\mathrm{tr}(BA)$.  
Show that $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.

$$
A = \begin{bmatrix}
  1 & 2 \\
  3 & 4 
\end{bmatrix}\quad 
B = \begin{bmatrix}
  0 & -1 \\
  1 & 0 
\end{bmatrix}
$$

$$
\operatorname{tr}(A) = \sum_{i=1}^n a_{ii}
  = \operatorname{tr}\left(
  \begin{bmatrix}
    1 & 2 \\
   3 & 4 
  \end{bmatrix}
  \right) = 1 + 4 = 5
$$

$$
\operatorname{tr}(B) = \sum_{i=1}^n b_{ii}
  = \operatorname{tr}\left(
  \begin{bmatrix}
    0 & -1 \\
    1 & 0 
  \end{bmatrix}
  \right) = 0 + 0 = 0
$$

$$
\operatorname{tr}(AB)
  = \operatorname{tr}\left(
    \begin{bmatrix}
    1 & 2 \\
    3 & 4 
  \end{bmatrix}
  \begin{bmatrix}
    0 & -1 \\
    1 & 0 
  \end{bmatrix}
  \right) 
  = \operatorname{tr}\left(
    \begin{bmatrix}
    2 & -1 \\
    4 & -3 
  \end{bmatrix}
  \right) 
= 2-3 = -1
$$

$$
\operatorname{tr}(BA)
  = \operatorname{tr}\left(
  \begin{bmatrix}
    0 & -1 \\
    1 & 0 
  \end{bmatrix}
  \begin{bmatrix}
    1 & 2 \\
    3 & 4 
  \end{bmatrix}
  \right) 
  = \operatorname{tr}\left(
    \begin{bmatrix}
    -3 & -4 \\
    1 & 2 
  \end{bmatrix}
  \right) 
= -3+2 = -1
$$

The above two answers show that $\operatorname{tr}(AB) = 
\operatorname{tr}(BA)$, even though $AB \ne BA$.
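Here is a minimal check of the trace results, using the same kind of hypothetical `mat_mul` helper as before. Note that $AB \ne BA$ even though their traces agree:

```python
def mat_mul(A, B):
    """Row-by-column matrix product; matrices are lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, -1], [1, 0]]
AB = mat_mul(A, B)  # [[2, -1], [4, -3]]
BA = mat_mul(B, A)  # [[-3, -4], [1, 2]]

print(trace(A), trace(B))    # 5 0
print(trace(AB), trace(BA))  # -1 -1
print(AB == BA)              # False
```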

3. **Inner Product**  
Let $\mathbf{x} = [1,2,3]^T$, $\mathbf{y} = [4,0,-1]^T$.  
a) Compute $\mathbf{x}^T \mathbf{y}$.  
    

**Answer:**

$$
\mathbf{x} =
\begin{bmatrix}
  1 \\
  2 \\
  3 \\
\end{bmatrix}
$$
 
Thus,

$$
\mathbf{x}^T \mathbf{y} =
\begin{bmatrix}
1 & 2 & 3 
\end{bmatrix}
\begin{bmatrix}
4 \\
0 \\
-1
\end{bmatrix} =
1 \cdot 4 + 2 \cdot 0 + 3 \cdot (-1) = 1
$$

b) Interpret the inner product in terms of vector similarity.


When two vectors point in the same direction, the inner product is the product of their lengths.  When the vectors are at right angles, the inner product is zero.  One can think of it as a measure of how much the vectors point in the same direction, scaled by the length of each vector.  More formally,

$$ \mathbf{x} \cdot \mathbf{y} = \|\mathbf{x}\| \|\mathbf{y}\| \cos \theta
$$

where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}.$

"Similarity" between vectors is often used loosely, but a common precise
measure is *cosine similarity*, the cosine of the angle between 
$\mathbf{x}$ and $\mathbf{y}$:

$$\cos \theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \|\mathbf{y}\|}$$

In this case, 

$$\theta = \cos^{-1} \left( \tfrac{\mathbf{x} \cdot \mathbf{y}}{\| \mathbf{x} \| \| \mathbf{y} \|} \right) 
         = \cos^{-1} \left( \tfrac{1}{\sqrt{14} \sqrt{17}} \right)$$ 

$$\theta \approx 86.3^\circ$$

Since $\theta$ is close to $90^\circ$, $\mathbf{x}$ and $\mathbf{y}$ are nearly orthogonal, so by this measure they are not very similar.
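The numbers above can be reproduced with the standard `math` module. This is a sketch, not a required part of the answer:

```python
import math

x = [1, 2, 3]
y = [4, 0, -1]

dot = sum(a * b for a, b in zip(x, y))     # x^T y
norm_x = math.sqrt(sum(a * a for a in x))  # sqrt(14)
norm_y = math.sqrt(sum(b * b for b in y))  # sqrt(17)
cos_theta = dot / (norm_x * norm_y)
theta_deg = math.degrees(math.acos(cos_theta))

print(dot)                  # 1
print(round(theta_deg, 1))  # 86.3
```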

**Grading note:** Any answer that relates the dot product to the cosine
of the angle between the vectors is adequate.


4. **Outer Product**  
Using the same vectors $\mathbf{x}, \mathbf{y}$:  


a) Compute $\mathbf{x}\mathbf{y}^T$.  


**Answer:**

$$
\mathbf{x} \mathbf{y}^T =
\begin{bmatrix}
  1 \\ 2 \\ 3 
\end{bmatrix}
\begin{bmatrix}
4 & 0 & -1
\end{bmatrix} =
\begin{bmatrix}
4  & 0 & -1 \\
8  & 0 & -2 \\
12 & 0 & -3
\end{bmatrix} 
$$


b) What is the shape of the resulting matrix?  


**Answer:**

Unlike an inner (dot) product, which results in a scalar, an outer product of an $m$-dimensional
vector and an $n$-dimensional vector produces an $m \times n$ matrix.  In the case of $\mathbf{x} \mathbf{y}^T$,
both vectors are 3-dimensional, so the resulting matrix is $3 \times 3$.
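Here is a small sketch of the outer product as a nested comprehension. A second pair of vectors, chosen here for illustration, shows that the shape is $m \times n$ in general:

```python
def outer(x, y):
    """Outer product x y^T: entry (i, j) is x[i] * y[j]."""
    return [[xi * yj for yj in y] for xi in x]

def shape(M):
    """(rows, columns) of a matrix given as a list of rows."""
    return (len(M), len(M[0]))

x = [1, 2, 3]
y = [4, 0, -1]
P = outer(x, y)
print(P)         # [[4, 0, -1], [8, 0, -2], [12, 0, -3]]
print(shape(P))  # (3, 3)

# Vectors of different lengths give a non-square result.
print(shape(outer([1, 2], y)))  # (2, 3)
```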

5. **Hadamard Product**  
Let  

$$
U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad 
V = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}.
$$



a) Compute the Hadamard product $U \odot V$.
    


**Answer:**

$$
U \odot V =
  \begin{bmatrix}
    1 & 2 \\
    3 & 4 
  \end{bmatrix}
  \odot
  \begin{bmatrix}
    5 & 6 \\
    7 & 8
    \end{bmatrix}
  =
  \begin{bmatrix}
    1 \cdot 5 & 2 \cdot 6 \\
    3 \cdot 7 & 4 \cdot 8
  \end{bmatrix}
  =
  \begin{bmatrix}
    5 & 12 \\
    21 & 32
  \end{bmatrix}
$$


b) Compare this elementwise product to the matrix product $UV$.  

**Answer:**

$$
U V =
  \begin{bmatrix}
    1 & 2 \\
    3 & 4 
  \end{bmatrix}
  \begin{bmatrix}
    5 & 6 \\
    7 & 8
  \end{bmatrix}
  =
  \begin{bmatrix}
    19 & 22 \\
    43 & 50
  \end{bmatrix}
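$$

The two results differ.  The Hadamard product multiplies matching entries, whereas each entry of $UV$
is the dot product of a row of $U$ with a column of $V$; for example, the upper-left entry is
$1 \cdot 5 + 2 \cdot 7 = 19$.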
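Both products can be computed side by side in plain Python. This sketch uses nested comprehensions rather than any library routine:

```python
U = [[1, 2], [3, 4]]
V = [[5, 6], [7, 8]]

# Hadamard product: multiply matching entries.
hadamard = [[u * v for u, v in zip(row_u, row_v)] for row_u, row_v in zip(U, V)]

# Matrix product: entry (i, j) is row i of U dotted with column j of V.
matmul = [[sum(U[i][k] * V[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]

print(hadamard)  # [[5, 12], [21, 32]]
print(matmul)    # [[19, 22], [43, 50]]
```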

c) Where does the Hadamard product appear in machine learning (give one example)?


**Answer**:  *IGNORE*

This is an unfair question given that we haven't used it yet.

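It does appear in backpropagation, for example when the gradient flowing back into a layer is multiplied elementwise by the derivative of that layer's activation function.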

**Grading note**: ignore this problem.


6. **Determinant and Inverse**

a) Compute $\det(A)$.

**Answer**

$$
\det(A) = 
  \begin{vmatrix}
    1 & 2 \\ 
    3 & 4
  \end{vmatrix} =
  4 - 6 = -2
$$

b) Determine if $A$ is invertible, and if so, find $A^{-1}$.

**Answer:**

Any square matrix with a non-zero determinant is invertible. Thus $A$ is invertible.

The inverse matrix $A^{-1}$ satisfies $A A^{-1} = I.$

For a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ with $\det(A) \ne 0$, there is a well-known solution

$$
A^{-1} = \frac{1}{\det(A)} 
  \begin{bmatrix}
    d  & -b \\
    -c & a
  \end{bmatrix}
$$ 

Substituting the determinant from part (a) and the values for the matrix,

$$
A^{-1} = -\tfrac12
  \begin{bmatrix}
    4  & -2 \\
    -3 & 1
  \end{bmatrix}
  = \begin{bmatrix}
    -2  & 1 \\
    \tfrac{3}{2} & -\tfrac{1}{2}
  \end{bmatrix}
$$ 

We can confirm the inverse is correct by multiplying $A$ with $A^{-1}$ as follows

$$
A A^{-1} = 
  \begin{bmatrix}
    1 & 2 \\ 
    3 & 4
  \end{bmatrix} 
  \begin{bmatrix}
    -2  & 1 \\
    \tfrac{3}{2} & -\tfrac{1}{2}
  \end{bmatrix}
  =
  \begin{bmatrix}
    1 \cdot (-2) + 2 \cdot \tfrac{3}{2} & 1 \cdot 1 + 2 \cdot (-\tfrac{1}{2}) \\
    3 \cdot (-2) + 4 \cdot \tfrac{3}{2} & 3 \cdot 1 + 4 \cdot (-\tfrac{1}{2}) 
  \end{bmatrix}
  = \begin{bmatrix}
    1 & 0 \\
    0 & 1
  \end{bmatrix}
  $$



c) Verify that $\det(A^{-1}) = 1/\det(A)$.

**Answer:**

From (a) we know $\det(A) = -2$.

$$\det(A^{-1}) = -2 \cdot (-\tfrac12 ) - 1 \cdot \tfrac{3}{2} = - \tfrac{1}{2}$$

$- \tfrac{1}{2}$ is the reciprocal of $-2$.
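Parts (b) and (c) can be double-checked with exact fractions. `det2` and `inverse2` are hypothetical helpers that implement the $2 \times 2$ formulas above:

```python
from fractions import Fraction

def det2(M):
    """ad - bc for M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    return a * d - b * c

def inverse2(M):
    """(1 / det) * [[d, -b], [-c, a]]; Fraction(1, 0) raises ZeroDivisionError."""
    (a, b), (c, d) = M
    s = Fraction(1, det2(M))
    return [[d * s, -b * s], [-c * s, a * s]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
A_inv = inverse2(A)

print(A_inv == [[-2, 1], [Fraction(3, 2), Fraction(-1, 2)]])  # True
print(mat_mul(A, A_inv) == [[1, 0], [0, 1]])                  # True
print(det2(A), det2(A_inv))                                   # -2 -1/2
```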

## Part C. Linear Dependence and Span

7. Consider the vectors  
$\mathbf{v}_1 = [1,2,3]^T$, $\mathbf{v}_2 = [2,4,6]^T$, $\mathbf{v}_3 = [1,0,1]^T$.

a) Determine whether $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ are linearly independent.

**Answer:**

For the vectors to be linearly independent, none can be expressed as a linear
combination of the others.  In this problem, $\mathbf{v}_2 = 2 \mathbf{v}_1$.

Thus the vectors are not linearly independent.

b) Find a maximal independent subset of these vectors.

**Answer:**

$\mathbf{v}_1$ and $\mathbf{v}_3$ are nonzero and not collinear, and neither are
$\mathbf{v}_2$ and $\mathbf{v}_3$, so 
either $\{\mathbf{v}_1, \mathbf{v}_3\}$ or $\{\mathbf{v}_2, \mathbf{v}_3\}$ is a
maximal independent subset.  Adding the remaining vector to either one creates a dependence, because $\mathbf{v}_2 = 2\mathbf{v}_1$.


c) Describe the span of this set in $\mathbb{R}^3$.

**Answer:**

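Even though the vectors live in $\mathbb{R}^3$, they are not independent, so they do not
span $\mathbb{R}^3$.  Both independent subsets span the same plane through the origin, because $\mathbf{v}_2 = 2\mathbf{v}_1$.
The vectors span

$$\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} = \{ a\mathbf{v}_1 + b\mathbf{v}_3 : a, b \in \mathbb{R} \},$$

which is the plane $x + y - z = 0$.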
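One way to check these claims numerically in $\mathbb{R}^3$ is with the cross product. A zero cross product means two vectors are parallel. The dot product of $\mathbf{n} = \mathbf{v}_1 \times \mathbf{v}_3$ with $\mathbf{v}_2$ equals $\det[\mathbf{v}_1\ \mathbf{v}_2\ \mathbf{v}_3]$ up to sign, so a zero result means the three vectors are dependent. The helpers are hypothetical:

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

v1, v2, v3 = [1, 2, 3], [2, 4, 6], [1, 0, 1]

print(cross(v1, v2))  # [0, 0, 0]: v1 and v2 are parallel (v2 = 2 v1)
n = cross(v1, v3)
print(n)              # [2, 2, -2]: nonzero, so {v1, v3} is independent
print(dot(n, v2))     # 0: v2 lies in the plane spanned by v1 and v3
```

Dividing $\mathbf{n}$ by 2 gives the normal $(1, 1, -1)$, which matches the plane $x + y - z = 0$.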


8. Let  
$$
M = \begin{bmatrix} 
1 & 2 & 3 \\ 
2 & 4 & 6 \\ 
1 & 0 & 1 
\end{bmatrix}.
$$

a) Find $\mathrm{rank}(M)$.

**Answer:** 

The rank is equal to the maximum number of independent column vectors or row vectors.

When considering column vectors, the first two columns sum to the third:

$$
\begin{bmatrix}
  1 \\ 
  2 \\
  1
\end{bmatrix}
+ \begin{bmatrix}
  2 \\
  4 \\
  0
\end{bmatrix}
= \begin{bmatrix}
  3 \\
  6 \\
  1
\end{bmatrix}
$$

We can express any one column vector as a linear combination of the other two,
but no column is a multiple of another, so exactly two columns are independent.
Thus, the rank is 2. 

We can also see that the second row is twice the first row, but the 
third row cannot be expressed as a linear combination of the first 2, 
so the rank as determined by row vectors is also 2.

As an aside, the rank is always the same whether determined from row
vectors or column vectors.
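Here is a minimal rank computation by Gauss-Jordan elimination, written from scratch for this example. `numpy.linalg.matrix_rank` is the usual library route, but this sketch avoids dependencies. It confirms that the row and column ranks of $M$ agree:

```python
from fractions import Fraction

def rank(rows):
    """Rank via Gauss-Jordan elimination on exact fractions (no rounding)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

M = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
M_T = [list(col) for col in zip(*M)]

print(rank(M))    # 2
print(rank(M_T))  # 2: row rank equals column rank
```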


b) Explain how the rank relates to the results from Problem 4.

put answer here (or on scanned paper).

## Part D. Norms and Distances

9. Compute the following norms of $\mathbf{x} = [3, -4]^T$:

a) $\|\mathbf{x}\|_1$

**Answer:** 

Often referred to as the $L^1$ (pronounced "ell-one") norm, it
is the sum of the absolute values of the vector's components.

$$\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|.$$

So, for $\mathbf{x} = [3, -4]^T$,

$$\|\mathbf{x}\|_1 = |3| + |-4| = 7$$

b) $\|\mathbf{x}\|_2$

**Answer:** 

Often referred to as the $L^2$ (pronounced "ell-two") norm, it
refers to the square root of the sum of the squares of the 
vector's components.  This is the same as the *Euclidean norm*.

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.$$

So, for $\mathbf{x} = [3, -4]^T$,

$$\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2} = 5$$

c) $\|\mathbf{x}\|_\infty$

**Answer:** The $L^p$ norm, $\|\mathbf{x}\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$, generalizes to $p = \infty$ by taking a limit:

$$\|\mathbf{x}\|_\infty = \lim_{p \to \infty} \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$$

What this means may not be obvious.  As we increase the exponent
on the absolute values, the largest value races ahead of all the smaller values,
and the $1/p$ root undoes the exponent on it.  Thus the norm equals the maximum
absolute value among all of the components in the vector.

$$\|\mathbf{x}\|_\infty = \max_i|x_i|$$

For $\mathbf{x} = [3, -4]^T$,

$$\|\mathbf{x}\|_\infty = \max(|3|, |-4|) = 4$$
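Here is a short sketch of the three norms, plus a loop that illustrates the limit definition: as $p$ grows, $\|\mathbf{x}\|_p$ approaches $\max_i |x_i|$. The function `norm_p` is a hypothetical helper:

```python
def norm_p(x, p):
    """The L^p norm (sum |x_i|^p)^(1/p) for finite p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

x = [3, -4]
l1 = sum(abs(xi) for xi in x)         # 7
l2 = sum(xi * xi for xi in x) ** 0.5  # 5.0
linf = max(abs(xi) for xi in x)       # 4

# As p grows, the L^p norm approaches max |x_i| = 4.
for p in (1, 2, 4, 16, 64):
    print(p, round(norm_p(x, p), 4))
```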

10. Prove or verify numerically that for any vector 
$\mathbf{x} \in \mathbb{R}^n$,

$$
   \|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1.
$$  

Use $\mathbf{x} = [3,-4]^T$ as an example.


**Answer:**

Numerically, the check is easy.

$$\|\mathbf{x}\|_\infty = 4$$

$$\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2} = 5$$

$$\|\mathbf{x}\|_1 = |3| + |-4| = 7$$

Since $4 \le 5 \le 7$, the inequality holds for this example.

Proving this is a little more involved for arbitrary $n$.

$$(x_1 + x_2 + \cdots + x_n)^2 = \sum_{i=1}^n x_i^2 + 2 \sum_{1 \le i < j \le n} x_i x_j$$

Similarly,

$$(|x_1| + |x_2| + \cdots + |x_n|)^2 = \sum_{i=1}^n |x_i|^2 + 2 \sum_{1 \le i < j \le n} |x_i| |x_j|$$

Taking the square root of both sides yields

$$\sqrt{(|x_1| + |x_2| + \cdots + |x_n|)^2} = \sqrt{\sum_{i=1}^n |x_i|^2 + 2 \sum_{1 \le i < j \le n} |x_i| |x_j|}$$

which, because $|x_1| + |x_2| + \cdots + |x_n| \ge 0$, simplifies to

$$|x_1| + |x_2| + \cdots + |x_n| = \sqrt{\sum_{i=1}^n |x_i|^2 + 2 \sum_{1 \le i < j \le n} |x_i| |x_j|}.  \tag{10.1}$$

Because $|x_i| |x_j|$ is guaranteed to be nonnegative,

$$2 \sum_{1 \le i < j \le n} |x_i| |x_j| \ge 0$$

So, because the square root is an increasing function, from (10.1),

$$\sqrt{\sum_{i=1}^n |x_i|^2 + 2 \sum_{1 \le i < j \le n} |x_i| |x_j|} \ge \sqrt{\sum_{i=1}^n |x_i|^2}$$

Thus,

$$|x_1| + |x_2| + \cdots + |x_n| \ge \sqrt{\sum_{i=1}^n |x_i|^2}.$$

So,

$$\|\mathbf{x}\|_1 \ge \|\mathbf{x}\|_2. \tag{10.2}$$



To show $\|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2$, let $j = \operatorname{argmax}_i |x_i|$.  Then,

$$\sqrt{x_j^2} = |x_j| = \max_i |x_i| = \|\mathbf{x}\|_\infty. \tag{10.3}$$

Because $\forall i: x_i^2 \ge 0$,

$$\sum_i x_i^2 = x_1^2 + \cdots + x_j^2 + \cdots + x_n^2 \ge x_j^2$$

Because the square root is an increasing function, taking the square root of both sides does not change the direction of the inequality.

$$\sqrt{\sum_i x_i^2} \ge \sqrt{x_j^2}. \tag{10.4}$$

Substituting $(10.3)$ into $(10.4)$ yields

$$\sqrt{\sum_i x_i^2} \ge \|\mathbf{x}\|_\infty.$$

The left-hand side is the definition of $\|\mathbf{x}\|_2$, so

$$\|\mathbf{x}\|_2 \ge \|\mathbf{x}\|_\infty. \tag{10.5}$$

Combining $(10.2)$ and $(10.5)$ yields

$$ \|\mathbf{x}\|_\infty \le \|\mathbf{x}\|_2 \le \|\mathbf{x}\|_1.$$

QED
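The proof can be complemented by a randomized spot check. The sketch below uses our own choices of 10,000 trials, dimensions 1 to 20, entries in $[-10, 10]$, and a tolerance of `1e-12` to absorb floating-point rounding. It counts any draw that violates the chain of inequalities. A passing check is only evidence; the proof above is what covers every $\mathbf{x}$.

```python
import math
import random

def norms(x):
    """Return (L-infinity, L2, L1) norms of a list of numbers."""
    linf = max(abs(xi) for xi in x)
    l2 = math.sqrt(sum(xi * xi for xi in x))
    l1 = sum(abs(xi) for xi in x)
    return linf, l2, l1

random.seed(0)  # fixed seed so the check is reproducible
violations = 0
for _ in range(10_000):
    n = random.randint(1, 20)
    x = [random.uniform(-10, 10) for _ in range(n)]
    linf, l2, l1 = norms(x)
    # The tolerance guards against rounding when n == 1, where all three
    # norms are mathematically equal.
    if not (linf <= l2 + 1e-12 and l2 <= l1 + 1e-12):
        violations += 1

print("violations:", violations)
```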

11. For vectors $\mathbf{u} = [1,1]^T$ and $\mathbf{v} = [2,0]^T$, compute:  
   

a) Euclidean distance $\|\mathbf{u} - \mathbf{v}\|_2$

**Answer:**

$$\|\mathbf{u} - \mathbf{v}\|_2 = \|[-1,1]^T\|_2 = \sqrt{(-1)^2 + 1^2} = \sqrt{2}$$

b) Cosine similarity $\frac{\mathbf{u}^T \mathbf{v}}{\|\mathbf{u}\|_2 \|\mathbf{v}\|_2}$.

**Answer:**

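$$\cos \theta = \frac{1 \cdot 2 + 1 \cdot 0}{\sqrt{2} \cdot 2} = \frac{2}{2\sqrt{2}}$$

Simplifying yields

$$\cos \theta = \frac{1}{\sqrt{2}} \approx 0.707,$$

so the angle between $\mathbf{u}$ and $\mathbf{v}$ is $45^\circ$.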
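Both quantities from Problem 11 can be checked in a few lines of plain Python. The variable names below are our own. The distance is the $L^2$ norm of $\mathbf{u} - \mathbf{v}$, and the cosine similarity is the dot product divided by the product of the two $L^2$ norms, exactly as in the formulas above.

```python
import math

u = [1, 1]
v = [2, 0]

# Euclidean distance: L2 norm of the difference u - v = [-1, 1].
diff = [ui - vi for ui, vi in zip(u, v)]
distance = math.sqrt(sum(d * d for d in diff))

# Cosine similarity: dot product over the product of the L2 norms.
dot = sum(ui * vi for ui, vi in zip(u, v))
norm_u = math.sqrt(sum(ui * ui for ui in u))
norm_v = math.sqrt(sum(vi * vi for vi in v))
cos_sim = dot / (norm_u * norm_v)

print(distance, math.sqrt(2))            # both ~1.41421
print(cos_sim, 1 / math.sqrt(2))         # both ~0.70711
print(math.degrees(math.acos(cos_sim)))  # angle between u and v, ~45 degrees
```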

## Part D. Trace Properties

12. **Linearity of Trace**
Let
$$
C = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \quad 
D = \begin{bmatrix} 2 & 0 \\ 3 & 1 \end{bmatrix}.
$$
a) $\mathrm{tr}(C+D) = \mathrm{tr}(C) + \mathrm{tr}(D)$.


**Answer:**

$$
\operatorname{tr}\left(
\begin{bmatrix}
  1 & 2 \\
  0 & 1
\end{bmatrix}
+ \begin{bmatrix}
  2 & 0 \\
  3 & 1
\end{bmatrix}
\right)
=
\operatorname{tr}\left(
\begin{bmatrix}
  3 & 2 \\
  3 & 2
\end{bmatrix}
\right) = 5
$$

$$
\operatorname{tr}\left(
\begin{bmatrix}
  1 & 2 \\
  0 & 1
\end{bmatrix}
\right) = 2
$$

$$
\operatorname{tr}\left(
\begin{bmatrix}
  2 & 0 \\
  3 & 1
\end{bmatrix}
\right) = 3
$$

$$\operatorname{tr}(C) + \operatorname{tr}(D) = 2 + 3 = 5 = \operatorname{tr}(C + D)$$



b) $\mathrm{tr}(\alpha C) = \alpha \,\mathrm{tr}(C)$ for scalar $\alpha = 3$.


**Answer:**

$$\operatorname{tr}\left(3\cdot
\begin{bmatrix}
  1 & 2 \\
  0 & 1
\end{bmatrix}
\right) = \operatorname{tr}\left(
\begin{bmatrix}
  3 & 6 \\
  0 & 3
\end{bmatrix}
\right) = 6
$$

$$3 \cdot \operatorname{tr}\left(
\begin{bmatrix}
  1 & 2 \\
  0 & 1
\end{bmatrix}
\right) = 3 \cdot 2 = 6
$$
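Both trace properties from Problem 12 can be spot-checked with plain Python. The helpers `trace`, `add`, and `scale` are our own illustrative names, with matrices stored as lists of rows. With NumPy arrays, `np.trace(C + D)` and `np.trace(3 * C)` perform the same computations.

```python
def trace(A):
    """Sum of the diagonal entries of a square matrix (list of rows)."""
    return sum(A[i][i] for i in range(len(A)))

def add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(alpha, A):
    """Multiply every entry of A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]

C = [[1, 2], [0, 1]]
D = [[2, 0], [3, 1]]

print(trace(add(C, D)), trace(C) + trace(D))  # 5 5
print(trace(scale(3, C)), 3 * trace(C))       # 6 6
```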


13. **Cyclic Property of Trace**
Show that for compatible matrices $X, Y$:
$\mathrm{tr}(XY) = \mathrm{tr}(YX)$.
Verify with explicit computation using
$$
X = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \quad 
Y = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$


**Answer:**

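Let $X$ and $Y$ be $n \times n$ matrices.  (The same argument works for $X \in \mathbb{R}^{m \times n}$ and $Y \in \mathbb{R}^{n \times m}$; only the ranges of the sums change.)

Let $C = XY$.  By the definition of matrix multiplication,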

$$C_{ij} = \sum_{k=1}^n X_{ik} Y_{kj} \tag{13.1}$$

$$\operatorname{tr}(XY) = \operatorname{tr}(C) = \sum_{i=1}^n C_{ii} \tag{13.2}$$

Substituting $(13.1)$ into $(13.2)$ yields

$$\operatorname{tr}(XY) = \sum_{i=1}^n \sum_{k=1}^n X_{ik} Y_{ki} \tag{13.3}$$

Similarly

$$\operatorname{tr}(YX) = \sum_{i=1}^n \sum_{k=1}^n Y_{ik} X_{ki}$$

Because addition is commutative and associative, we can swap the order of the two finite sums:

$$\operatorname{tr}(YX) = \sum_{k=1}^n \sum_{i=1}^n Y_{ik} X_{ki}$$

Because $i$ and $k$ are bound variables, we can just swap labels to get

$$\operatorname{tr}(YX) = \sum_{i=1}^n \sum_{k=1}^n Y_{ki} X_{ik}$$

Because scalar multiplication is commutative, the above becomes

$$\operatorname{tr}(YX) = \sum_{i=1}^n \sum_{k=1}^n X_{ik} Y_{ki} \tag{13.4}$$

Combining $(13.3)$ and $(13.4)$ yields

$$\operatorname{tr}(XY) = \operatorname{tr}(YX)$$

QED.

Now we verify with $X$ and $Y$ as defined in the Problem 13 description.

$$
X = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \quad 
Y = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
$$

$$\operatorname{tr}(XY) 
  = \operatorname{tr}
      \begin{bmatrix}
       2 & 1 \\
       1 & 0
      \end{bmatrix} = 2
$$

$$\operatorname{tr}(YX) 
  = \operatorname{tr}
      \begin{bmatrix}
       0 & 1 \\
       1 & 2
      \end{bmatrix} = 2
$$
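A short plain-Python check reproduces both products above. The `matmul` and `trace` helpers are our own. The second half goes one step beyond the problem: it uses random integer matrices of shapes $2 \times 3$ and $3 \times 2$ (sizes and seed chosen arbitrarily) to illustrate that "compatible" does not require square matrices, even though $AB$ is $2 \times 2$ while $BA$ is $3 \times 3$.

```python
import random

def matmul(A, B):
    """Product of an m x n and an n x p matrix, both stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

X = [[1, 2], [0, 1]]
Y = [[0, 1], [1, 0]]
print(matmul(X, Y), matmul(Y, X))                # [[2, 1], [1, 0]] [[0, 1], [1, 2]]
print(trace(matmul(X, Y)), trace(matmul(Y, X)))  # 2 2

# Rectangular case: A is m x n and B is n x m.
random.seed(1)
m, n = 2, 3
A = [[random.randint(-5, 5) for _ in range(n)] for _ in range(m)]
B = [[random.randint(-5, 5) for _ in range(m)] for _ in range(n)]
print(trace(matmul(A, B)) == trace(matmul(B, A)))  # True
```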


14. **Trace of Outer Product**

Let $\mathbf{a} = [1,2]^T$.
Compute $\mathrm{tr}(\mathbf{a}\mathbf{a}^T)$ and explain why this equals $\|\mathbf{a}\|_2^2$.


**Answer:**

$$\operatorname{tr}\left(
  \begin{bmatrix}
    1 \\
    2
  \end{bmatrix}
  \begin{bmatrix}
    1 & 2
  \end{bmatrix}
\right) = 
\operatorname{tr}\left(
  \begin{bmatrix}
    1 & 2 \\
    2 & 4
  \end{bmatrix}
\right)
= 5
$$

We can explain why this is $\|\mathbf{a}\|_2^2$ by writing
a general 2-dimensional vector as $\mathbf{a} = [x, y]^T$.

$$\operatorname{tr}\left(
  \begin{bmatrix}
    x \\
    y
  \end{bmatrix}
  \begin{bmatrix}
    x & y
  \end{bmatrix}
\right) = 
\operatorname{tr}\left(
  \begin{bmatrix}
    x^2 & xy \\
    yx  & y^2
  \end{bmatrix}
\right) = x^2 + y^2 = \|\mathbf{a}\|_2^2
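$$

More generally, the $i$-th diagonal entry of $\mathbf{a}\mathbf{a}^T$ is $a_i a_i = a_i^2$, so $\operatorname{tr}(\mathbf{a}\mathbf{a}^T) = \sum_i a_i^2 = \|\mathbf{a}\|_2^2$ for a vector of any dimension.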
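A small plain-Python check of the outer-product identity. The `outer` and `trace` helpers are our own names. It first reproduces $\mathbf{a}\mathbf{a}^T$ for $\mathbf{a} = [1, 2]^T$, then repeats the comparison for a random 6-dimensional integer vector (size and seed chosen arbitrarily) to illustrate that the identity does not depend on the dimension.

```python
import random

def outer(a, b):
    """Outer product a b^T of two vectors, as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

a = [1, 2]
print(outer(a, a))                                   # [[1, 2], [2, 4]]
print(trace(outer(a, a)), sum(ai * ai for ai in a))  # 5 5

# Any dimension: the i-th diagonal entry of b b^T is b_i * b_i.
random.seed(2)
b = [random.randint(-9, 9) for _ in range(6)]
print(trace(outer(b, b)) == sum(bi * bi for bi in b))  # True
```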

## Part E. Eigenvalues, PSD

15. **Eigenvalues and Eigenvectors**

Let
$$
E = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
$$

a) Find the eigenvalues and eigenvectors of $E$.


**Answer:**

By definition, an eigenvector $\mathbf{v} \ne \mathbf{0}$ and its eigenvalue $\lambda$ satisfy

$$E\mathbf{v} = \lambda\mathbf{v}$$

which can be rewritten and rearranged as

$$E\mathbf{v} = \lambda I \mathbf{v}$$

$$(E- \lambda I)\mathbf{v} = 0 \tag{15.1}$$

A nonzero $\mathbf{v}$ can satisfy $(15.1)$ only if $E - \lambda I$ is singular, that is, only if

$$\det(E - \lambda I) = 0$$

$$
\det\left(\begin{bmatrix}
  2-\lambda & 1 \\
  1         & 2-\lambda
\end{bmatrix}\right) = (2-\lambda)^2 - 1 = (\lambda - 3)(\lambda -1) = 0
$$

This gives us 2 eigenvalues: 1 and 3.

We then solve $(15.1)$ for the first eigenvalue, $\lambda=1$.

$$(E - I)\mathbf{v} = 0 \tag{15.2}$$

$$
\begin{bmatrix}
  1 & 1 \\
  1 & 1
\end{bmatrix}
\mathbf{v} = 0
$$

Writing $\mathbf{v} = [x, y]^T$, both rows reduce to $x + y = 0$, i.e., $x = -y$.  All nonzero multiples of $(1,-1)$ satisfy this.

For $\lambda=3$, 

$$(E - 3I)\mathbf{v} = 0$$

which expands to

$$
\begin{bmatrix}
  -1 & 1 \\
  1 & -1
\end{bmatrix}
\mathbf{v} = 0
$$

Writing $\mathbf{v} = [x, y]^T$, both rows reduce to $x = y$, so the solutions are
the nonzero multiples of $(1,1)$.  Thus $E$ has eigenvectors $(1,-1)$ and $(1,1)$
with eigenvalues $\lambda = 1$ and $\lambda = 3$, respectively.
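The hand computation can be checked in plain Python. The sketch below relies on the fact that for a $2 \times 2$ matrix, $\det(E - \lambda I) = \lambda^2 - \operatorname{tr}(E)\,\lambda + \det(E)$. It solves that quadratic with the quadratic formula and then confirms $E\mathbf{v} = \lambda\mathbf{v}$ for each eigenpair found above. The helper `matvec` is our own. NumPy's `np.linalg.eigh(E)` returns the same eigenvalues in ascending order, together with unit-length eigenvectors.

```python
import math

def matvec(A, v):
    """Matrix-vector product with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

E = [[2, 1], [1, 2]]

# Characteristic polynomial of a 2 x 2 matrix: lambda^2 - tr(E) lambda + det(E).
tr = E[0][0] + E[1][1]
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
disc = tr * tr - 4 * det
roots = sorted([(tr - math.sqrt(disc)) / 2, (tr + math.sqrt(disc)) / 2])
print(roots)  # [1.0, 3.0]

# Check E v = lambda v for the eigenpairs found by hand.
eigenpairs = [(1, [1, -1]), (3, [1, 1])]
for lam, v in eigenpairs:
    print(lam, matvec(E, v), [lam * x for x in v])
```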

b) Verify that $E$ is symmetric and explain why its eigenvalues are guaranteed to be real.


**Answer:**

Matrix $E$ is symmetric if and only if it is square and $E = E^T$, i.e., $E_{ij} = E_{ji}$ for all $i, j$.
Because $E$ is small, this is easy to show exhaustively.  It is trivially true along the 
diagonal where $i = j$.  There are only 2 off-diagonal entries:

$E_{12} = E_{21} = 1$.

$E$ is symmetric.

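Every real symmetric matrix has only real eigenvalues, and it has a set of eigenvectors that forms an orthonormal basis, meaning
the basis vectors are mutually orthogonal and each has length 1 (this is the spectral theorem).

To see why the eigenvalues are real, suppose $E\mathbf{v} = \lambda\mathbf{v}$ with $\mathbf{v} \ne \mathbf{0}$, allowing $\lambda$ and the
entries of $\mathbf{v}$ to be complex.  Multiplying on the left by the conjugate transpose $\bar{\mathbf{v}}^T$ gives
$\bar{\mathbf{v}}^T E \mathbf{v} = \lambda \bar{\mathbf{v}}^T \mathbf{v}$.  Because $E$ is real and symmetric, the scalar
$\bar{\mathbf{v}}^T E \mathbf{v}$ equals its own complex conjugate, so it is real, and $\bar{\mathbf{v}}^T \mathbf{v} = \sum_i |v_i|^2 > 0$
is real as well.  Hence $\lambda$ is real.  Part (a) confirms this for $E$: its eigenvalues are 1 and 3.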

This does not mean that the orthonormal basis vectors are the only
eigenvectors of $E$: every nonzero scalar multiple of an eigenvector is also
an eigenvector.  For $E$, the orthonormal eigenvectors are $\frac{1}{\sqrt{2}}(1,-1)$
and $\frac{1}{\sqrt{2}}(1,1)$, but $(1,1)$ is also an eigenvector, as are all vectors $(a,a)$
for nonzero $a \in \mathbb{R}$.
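A plain-Python sketch that ties part (b) together. It checks $E = E^T$, normalizes the eigenvectors $(1,-1)$ and $(1,1)$ to unit length, and confirms they are orthonormal. As an extra illustration beyond the problem, it rebuilds $E$ from the eigenpairs as $1 \cdot \mathbf{q}_1\mathbf{q}_1^T + 3 \cdot \mathbf{q}_2\mathbf{q}_2^T$, the spectral decomposition of a symmetric matrix. The names `q1`, `q2`, and `recon` are our own.

```python
import math

E = [[2, 1], [1, 2]]

# Symmetry: E equals its transpose.
E_T = [list(col) for col in zip(*E)]
print(E == E_T)  # True

# Normalize the eigenvectors (1, -1) and (1, 1) to unit length.
s = 1 / math.sqrt(2)
q1 = [s, -s]  # eigenvalue 1
q2 = [s, s]   # eigenvalue 3

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

print(dot(q1, q2))               # 0.0, so the vectors are orthogonal
print(dot(q1, q1), dot(q2, q2))  # both ~1.0, so each has unit length

# Spectral decomposition: E = 1 * q1 q1^T + 3 * q2 q2^T.
recon = [[1 * q1[i] * q1[j] + 3 * q2[i] * q2[j] for j in range(2)] for i in range(2)]
print(recon)  # approximately [[2, 1], [1, 2]]
```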

16. **Positive Semidefinite Matrices**


a) Show that $E$ from Problem 15 is positive semidefinite by checking $\mathbf{x}^T E \mathbf{x} \geq 0$ for arbitrary $\mathbf{x}$.


**Answer:**

Because $E$ is a $2 \times 2$ matrix, $\mathbf{x}$ must be 2-dimensional; write $\mathbf{x} = [x, y]^T$.

$$
\begin{bmatrix}
 x & y
\end{bmatrix}
\begin{bmatrix}
2 & 1 \\ 1 & 2
\end{bmatrix}
\begin{bmatrix}
 x \\ y
\end{bmatrix}
=
\begin{bmatrix}
2x + y & x + 2y
\end{bmatrix}
\begin{bmatrix}
 x \\ y
\end{bmatrix}
= 
(2x + y)x + (x + 2y)y 
$$

To demonstrate that this is greater than or equal to zero for all $x$ and $y$,

$$
(2x + y)x + (x + 2y)y = 2(x^2 + xy + y^2) = 2(x^2 + xy + \frac{y^2}{4} + \frac{3}{4}y^2)
  = 2(x + \frac{y}{2})^2 + 2\cdot\frac{3}{4}y^2
$$ 

Both terms are nonnegative multiples of squares, so the result is guaranteed
to be greater than or equal to 0 for every $\mathbf{x}$.

QED.
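A numerical sketch of the same argument. It samples random points in $[-10, 10]^2$ (range, sample count, and seed are our own choices), checks that $\mathbf{x}^T E \mathbf{x}$ agrees with the completed-square form $2(x + \frac{y}{2})^2 + \frac{3}{2}y^2$, and tracks the smallest value seen. Sampling cannot prove PSD on its own. An equivalent test is that a symmetric matrix is PSD exactly when all its eigenvalues are nonnegative, and $E$'s eigenvalues from Problem 15 are 1 and 3.

```python
import random

def quad_form(x, y):
    """x^T E x for E = [[2, 1], [1, 2]] and x = [x, y]^T."""
    return (2 * x + y) * x + (x + 2 * y) * y

def completed_square(x, y):
    """The completed-square form 2(x + y/2)^2 + (3/2) y^2."""
    return 2 * (x + y / 2) ** 2 + 1.5 * y ** 2

random.seed(3)
worst = float("inf")
for _ in range(100_000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    q = quad_form(x, y)
    # The two expressions agree up to floating-point rounding.
    assert abs(q - completed_square(x, y)) <= 1e-9 * (1 + abs(q))
    worst = min(worst, q)

print(worst >= 0)  # True
```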


b) Why are PSD matrices important in optimization for machine learning?


**Possible Answer:** There are multiple valid answers.  

For one, a twice-differentiable function is convex if and only if its Hessian
(the matrix of second derivatives) is PSD everywhere.  For a convex function,
any local minimum is also a global minimum.

So when gradient descent (with a suitably small step size) steps down the
gradient of such a function, it cannot get trapped in a local minimum that
is not also a global minimum.

**Grading note:** Ignore this problem.  We didn't cover where it appears in ML yet.

17. **Orthogonal Matrices**
a) Show that
$$
Q = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
$$
is an orthogonal matrix.


**Possible Answer:**

A real square matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal if:
$Q^T Q = I$.

$$
Q^T Q =
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= I
$$

QED.



b) Explain what it means geometrically when a matrix is orthogonal.


**Possible Answer:**

The columns of the matrix (the images of the standard basis vectors) are
mutually orthogonal unit vectors.  In other words, they form an orthonormal basis.
As a linear transformation, an orthogonal matrix preserves lengths
and angles.  Orthogonal matrices can perform rotations and 
reflections, but they cannot scale or shear.  For example, the $Q$ above
rotates every vector $90^\circ$ clockwise.
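A plain-Python sketch covering both parts of Problem 17. It checks $Q^T Q = I$ (and $Q Q^T = I$), shows where $Q$ sends the standard basis vectors, and confirms that lengths and dot products, and therefore angles, are unchanged for two sample vectors. The test vectors `u` and `w` and the helper names are our own choices.

```python
import math

def matmul(A, B):
    """Matrix product with matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

Q = [[0, 1], [-1, 0]]
Q_T = [list(col) for col in zip(*Q)]
print(matmul(Q_T, Q))  # [[1, 0], [0, 1]]

# Q sends e1 = (1, 0) to (0, -1) and e2 = (0, 1) to (1, 0): a clockwise quarter turn.
print(matvec(Q, [1, 0]), matvec(Q, [0, 1]))  # [0, -1] [1, 0]

# Lengths and dot products (hence angles) are preserved.
u, w = [3, 4], [-2, 5]
Qu, Qw = matvec(Q, u), matvec(Q, w)
print(norm(u), norm(Qu))       # 5.0 5.0
print(dot(u, w), dot(Qu, Qw))  # 14 14
```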