In [2]:
import sympy as sp

<hr style="border-width:4px; border-color:coral"></hr>

# Orthogonal vectors and unitary matrices 

<hr style="border-width:4px; border-color:coral"></hr>

Unitary matrix decompositions play a crucial role in numerical linear algebra.  In this notebook, we will explore some of the basic properties of orthogonality and unitary matrices. 

We start with a discussion of the matrix *transpose* (for real-valued matrices) and the *adjoint* (for complex-valued matrices). 

<hr style="border-width:2px; border-color:black"></hr>

###  Matrix transpose

For matrices with entries in $\mathbb R$ (i.e. real entries), the matrix transpose converts rows to columns.  The transpose of an  $m \times n$ matrix is an $n \times m$ matrix.  

The transpose of a matrix $A$ is indicated with a superscript $T$ and written $A^T$.  

A square matrix for which $A = A^T$ is *symmetric*. 

In [2]:
A = sp.Matrix(2, 4, lambda i,j:sp.var(f'a_{i+1}{j+1}'))
print("A")
display(A)
print("")
 
print('A^T ("A transpose")')
display(A.T)

A


Matrix([
[a_11, a_12, a_13, a_14],
[a_21, a_22, a_23, a_24]])


A^T ("A transpose")


Matrix([
[a_11, a_21],
[a_12, a_22],
[a_13, a_23],
[a_14, a_24]])

<hr style="border-width:2px; border-color:black"></hr>

###  Matrix adjoint

If a matrix has entries in $\mathbb C$ (i.e. complex entries), a second related idea is the *Hermitian conjugate* or the *adjoint*. In this case, not only are rows and columns switched, but the entries are also conjugated.

Recall that the *complex conjugate* of a complex number $z = x + iy$ is given by $\overline{z} = x - iy$.   For real $z \in \mathbb R$, $\overline{z} = z$.  

The adjoint of a matrix $A$ is indicated with a superscript $*$ and written $A^*$.   

A square matrix for which $A = A^*$ is *Hermitian*. 

In [3]:
A = sp.Matrix(2, 4, lambda i,j:sp.var(f'a_{i+1}{j+1}'))
print("A")
display(A)
print("")
 
print('A.adjoint()') 
display(A.adjoint()) 


A


Matrix([
[a_11, a_12, a_13, a_14],
[a_21, a_22, a_23, a_24]])


A.adjoint()


Matrix([
[conjugate(a_11), conjugate(a_21)],
[conjugate(a_12), conjugate(a_22)],
[conjugate(a_13), conjugate(a_23)],
[conjugate(a_14), conjugate(a_24)]])

#### Example : Adjoint

In [4]:
I = sp.I   # imaginary unit

A = sp.Matrix(3,1,[1 + I, 2 + I, 3 + I])
display(A)
print("")

print('A.adjoint()')
display(A.adjoint())

Matrix([
[1 + I],
[2 + I],
[3 + I]])


A.adjoint()


Matrix([[1 - I, 2 - I, 3 - I]])

In this course, we will be working mainly with matrices with real entries, but to stick with the textbook notation, we will use $A^*$.  Golub and Van Loan also make use of the more general "adjoint" notation.  

We will use terminology "transpose" for matrices with real entries and "adjoint" for matrices with complex entries.

Fill in the properties of the transpose and adjoint below.  

#### Review : Properties of the transpose and adjoint

Below is a review of the rules for manipulating the matrix transpose and the adjoint.  

**Real matrices** 

In the following, assume that $A, B \in \mathbb R^{m \times m}$.

1. $(A^T)^T = A$

2.  $(AB)^T = B^T A^T$. 

3. $(A + B)^T = A^T + B^T$. 

4. $(A^{-1})^T = (A^{T})^{-1}$

5.  For $x \in \mathbb R$, $x^T = x$. 

6. For $c \in \mathbb R$, $(cA)^T = c A^T$

7.  For $\mathbf x \in \mathbb R^{m \times 1}$,   $(A \mathbf x)^T = \mathbf x^T A^T$

8.  For $\mathbf x \in \mathbb R^{m \times 1}$, $(\mathbf x^T A \mathbf x)^T = \mathbf x^T A^T \mathbf x  \overset{?}{=} \mathbf x^T A \mathbf x$
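The rules above can be spot-checked in SymPy. The following is a minimal sketch, not a proof: the matrices `A`, `B` and the vector `x` are arbitrary values chosen only for illustration, and each check confirms a rule for this one example.

```python
import sympy as sp

# Arbitrary real 2x2 matrices and a vector, chosen only for illustration
A = sp.Matrix([[1, 2], [3, 4]])
B = sp.Matrix([[0, 1], [5, -2]])
x = sp.Matrix([1, 2])
c = sp.Rational(7, 3)

# x^T A x is a 1x1 matrix, so it equals its own transpose
quad = x.T * A * x

checks = {
    "(A^T)^T = A": A.T.T == A,
    "(AB)^T = B^T A^T": (A * B).T == B.T * A.T,
    "(A + B)^T = A^T + B^T": (A + B).T == A.T + B.T,
    "(A^{-1})^T = (A^T)^{-1}": A.inv().T == A.T.inv(),
    "(cA)^T = c A^T": (c * A).T == c * A.T,
    "(Ax)^T = x^T A^T": (A * x).T == x.T * A.T,
    "(x^T A x)^T = x^T A^T x": quad.T == x.T * A.T * x,
}

for rule, holds in checks.items():
    print(f"{rule:28s} {holds}")
```

Note that rule 2 reverses the order of the factors; for this `A` and `B`, `A.T * B.T` is a different matrix from `(A * B).T`.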

**Complex matrices**

In the following, assume that $A, B \in \mathbb C^{m \times m}$ 

1. $A^* = \bar{A}^T$

2. $(A^*)^* = A$

3. $(AB)^*= B^* A^*$. 

4. $(A + B)^* = \dots$. 

5. $(A^{-1})^* = \dots$ 

6.  For $x \in \mathbb C$, $x^* = \overline{x}$. 

7. For $c \in \mathbb C$, $(cA)^* = \bar{c} A^*$

8.  For $\mathbf x \in \mathbb C^{m \times 1}$,  $(A \mathbf x)^* = \dots$

9.  For $\mathbf x \in \mathbb C^{m \times 1}$, $(\mathbf x^* A \mathbf x)^* = \dots$
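The rules that are already stated above (1, 2, 3 and 7) can be spot-checked in SymPy. This is a minimal sketch under stated assumptions: `A`, `B` and `c` are arbitrary illustrative values, and the helper `same` is a hypothetical name introduced here. The rules left as `\dots` are not checked, so that they remain an exercise.

```python
import sympy as sp

I = sp.I

# Arbitrary complex 2x2 matrices and a complex scalar, for illustration only
A = sp.Matrix([[1 + I, 2], [3, 4 - I]])
B = sp.Matrix([[I, 1], [0, 2 + I]])
c = 2 - 3*I

def same(M, N):
    """Return True if M and N agree entrywise after expanding complex products."""
    return (M - N).applyfunc(sp.expand) == sp.zeros(*M.shape)

checks = {
    "A^* = conj(A)^T": same(A.H, A.conjugate().T),
    "(A^*)^* = A": same(A.H.H, A),
    "(AB)^* = B^* A^*": same((A * B).H, B.H * A.H),
    "(cA)^* = conj(c) A^*": same((c * A).H, sp.conjugate(c) * A.H),
}

for rule, holds in checks.items():
    print(f"{rule:24s} {holds}")
```

Here `A.H` is SymPy's shorthand for `A.adjoint()`. The explicit `sp.expand` is needed because SymPy does not automatically multiply out products such as `I*(1 + I)` before comparing.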

<hr style="border-width:4px; border-color:coral"></hr>

## Inner product and orthogonality

<hr style="border-width:4px; border-color:coral"></hr>

The inner product of two vectors $\mathbf x, \mathbf y \in \mathbb C^{m \times 1}$ is expressed using the adjoint as 

\begin{equation}
\mathbf x^* \mathbf y = \sum_{i=1}^{m} \overline{x}_i y_i
\end{equation}

As a generalization of the Pythagorean Theorem, we can express the length or "Euclidean norm" of a vector using the inner product. 

\begin{equation}
\Vert \mathbf x\Vert =  \sqrt{\mathbf x^* \mathbf x} = \sqrt{\sum_{i=1}^m |x_i|^2}
\end{equation}
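As a small illustration of the inner product and the Euclidean norm, the sketch below computes $\mathbf x^* \mathbf y$ with the adjoint (which conjugates the entries of $\mathbf x$) and compares $\sqrt{\mathbf x^* \mathbf x}$ with SymPy's built-in `norm()`. The vectors are arbitrary values chosen for illustration.

```python
import sympy as sp

I = sp.I

# Arbitrary complex vectors in C^2, for illustration only
x = sp.Matrix([1 + I, 2])
y = sp.Matrix([I, 1 - I])

# Inner product x^* y: the adjoint conjugates the entries of x
inner = sp.expand((x.H * y)[0, 0])

# Euclidean norm from the inner product, and from SymPy's norm()
norm_from_inner = sp.sqrt(sp.expand((x.H * x)[0, 0]))
norm_builtin = x.norm()

print("x^* y          =", inner)
print("sqrt(x^* x)    =", norm_from_inner)
print("x.norm()       =", norm_builtin)
```

For these vectors, omitting the conjugate (that is, using `x.T * y`) gives a different value, which is why the adjoint rather than the transpose appears in the complex inner product.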

For vectors $\mathbf a, \mathbf b \in \mathbb R^2$, we are familiar with the definition of a dot product as 

\begin{equation}
\mathbf a \cdot \mathbf b = ab \cos(\theta)
\end{equation}

where $a = \Vert \mathbf a \Vert$ and $b = \Vert \mathbf b \Vert$, and $\theta$ is the angle between $\mathbf a$ and $\mathbf b$.   If $\theta = \pi/2$, then the two vectors are at right angles to each other.  

#### Orthogonal vectors 

This idea can be generalized to vectors in $\mathbb R^m$.  While it is difficult to picture vectors in $\mathbb R^m$ at "right angles" to each other, we can nonetheless use the inner product to define 

\begin{equation}
\cos \theta = \frac{\mathbf x^* \mathbf y}{\Vert \mathbf x \Vert \Vert \mathbf y \Vert}
\end{equation}

where now, $\theta$ is the *angle* between vectors in $\mathbb R^m$.  Vectors for which $\mathbf x^* \mathbf y = 0$ are said to be *orthogonal*.  

* A set of vectors is said to be *orthogonal* if all vectors are pairwise orthogonal.  

* A set of vectors is said to be *orthonormal* if the set is orthogonal, and in addition, each vector has length 1, i.e. $\Vert \mathbf x \Vert = 1$ for each vector $\mathbf x$ in the set.
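The definitions above can be tried out on a few real vectors. In this sketch the helper name `cos_angle` and the vectors are assumptions made for illustration; since the vectors are real, the adjoint reduces to the transpose.

```python
import sympy as sp

def cos_angle(u, v):
    """Cosine of the angle between two nonzero real column vectors."""
    return (u.T * v)[0, 0] / (u.norm() * v.norm())

# Arbitrary vectors in R^3, for illustration only
x = sp.Matrix([1, 2, 2])
y = sp.Matrix([2, 1, -2])
e1 = sp.Matrix([1, 0, 0])

print("cos(angle(x, y))  =", cos_angle(x, y))    # zero means orthogonal
print("cos(angle(x, e1)) =", cos_angle(x, e1))

# Scaling each vector by 1/norm turns the orthogonal pair into an orthonormal pair
qx = x / x.norm()
qy = y / y.norm()
print("norms after scaling:", qx.norm(), qy.norm())
```

Scaling does not change the angle between the vectors, so `qx` and `qy` remain orthogonal.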


### Theorem

The vectors in an orthogonal set $S$ of nonzero vectors are linearly independent. 

### Proof

Let $S = \{\mathbf u_1, \mathbf u_2, \dots, \mathbf u_n\}$. We need to show that if

\begin{equation}
c_1 \mathbf u_1 + c_2 \mathbf u_2 + \dots + c_n\mathbf u_n = 0, 
\end{equation}

then $c_1 = c_2 = \dots = c_n = 0$.  Since $S$ is an orthogonal set, we have $\mathbf u_i^* \mathbf u_j = 0$, for $i \ne j$.  From this, we can conclude that 


\begin{equation}
\mathbf u_i^*\left(c_1 \mathbf u_1 + c_2 \mathbf u_2 + \dots + c_n\mathbf u_n\right) =  c_i \mathbf u_i^* \mathbf u_i =  c_i \Vert \mathbf u_i \Vert^2 = 0, \qquad i = 1,2,\dots, n.
\end{equation}

By assumption, $\Vert \mathbf u_i\Vert \ne 0$, so we must have $c_i = 0$, for $i = 1,2,\dots, n$. $\blacksquare$
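The theorem can be illustrated on a concrete orthogonal set. This is a sketch for one example, not a substitute for the proof; the vectors below are arbitrary illustrative choices. Placing them as the columns of a matrix $U$, linear independence is equivalent to $U$ having full column rank, i.e. $U \mathbf c = 0$ only for $\mathbf c = 0$.

```python
import sympy as sp

# Three pairwise orthogonal, nonzero vectors in R^4 (illustrative choice)
u1 = sp.Matrix([1, 1, 1, 1])
u2 = sp.Matrix([1, -1, 1, -1])
u3 = sp.Matrix([1, 1, -1, -1])

U = sp.Matrix.hstack(u1, u2, u3)

# Entry (i, j) of U^* U is u_i^* u_j, so an orthogonal set gives a diagonal matrix
gram = U.H * U
print("U^* U =", gram)

# Full column rank means the only solution of U c = 0 is c = 0
print("rank(U) =", U.rank())
print("nullspace(U) =", U.nullspace())
```

The diagonal entries of `gram` are the squared norms $\Vert \mathbf u_i \Vert^2$, which are exactly the nonzero factors used in the proof.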

<hr style="border-width:2px; border-color:black"></hr>

### Decomposing vectors in an orthonormal basis

Suppose we have an orthonormal set of vectors $S = \{ \mathbf q_i\}_{i=1}^n$, where $\mathbf q_i \in \mathbb C^m$.   Under what conditions can we express an arbitrary vector $\mathbf v$ in terms of vectors $\mathbf q_i$? 

#### Case 1 : $n = m$

In this case, the vectors in $S$ form a basis for $\mathbb C^m$, so we can write 

\begin{equation}
\mathbf v = c_1 \mathbf q_1 + c_2 \mathbf q_2 + \dots + c_m \mathbf q_m
\end{equation}

for scalars $c_i \in \mathbb C$.  

#### Case 2 : $n < m$

In this case, we can only express $\mathbf v$ in terms of vectors in $S$ if $\mathbf v$ is in the span of $S$.  More generally, we have 

\begin{equation}
\mathbf v = c_1 \mathbf q_1 + c_2 \mathbf q_2 + \dots + c_n \mathbf q_n + \mathbf r
\end{equation}

where $\mathbf r$ is orthogonal to vectors in $S$.  

#### Components of $\mathbf v$ in $S$

Because the vectors in $S$ form an orthonormal set,  it is especially easy to compute the coefficients $c_i$.  

We have

\begin{equation}
\mathbf q_i^* \mathbf v = \sum_{k=1}^{n} c_k \mathbf q_i^* \mathbf q_k = c_i
\end{equation}

#### Question 

Why is the above true?

If $n = m$, we have 

\begin{equation}
\mathbf v = \sum_{k=1}^{n} (\mathbf q_k^* \mathbf v) \mathbf q_k
\end{equation}

If $n < m$, then 

\begin{equation}
\mathbf v = \sum_{k=1}^{n} (\mathbf q_k^* \mathbf v) \mathbf q_k + \mathbf r = Q\mathbf c + \mathbf r
\end{equation}

where $Q \in \mathbb C^{m \times n}$ is a matrix whose columns are the vectors in $S$ and $\mathbf c$ is a vector whose components are the coefficients $c_i$.   The vector $Q\mathbf c = \mathbf v - \mathbf r$ can be thought of as a *projection* of $\mathbf v$ onto the subspace spanned by the columns of $Q$. 
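The case $n < m$ can be worked through on a small example. The sketch below assumes an orthonormal set of $n = 2$ vectors in $\mathbb R^3$ and an arbitrary vector `v`, both chosen for illustration. It computes $\mathbf c = Q^* \mathbf v$, the projection $Q \mathbf c$, and the remainder $\mathbf r = \mathbf v - Q\mathbf c$.

```python
import sympy as sp

# Orthonormal set of n = 2 vectors in R^3 (illustrative choice)
q1 = sp.Matrix([1, 0, 0])
q2 = sp.Matrix([0, sp.Rational(3, 5), sp.Rational(4, 5)])
Q = sp.Matrix.hstack(q1, q2)

# Arbitrary vector to decompose
v = sp.Matrix([2, 1, 3])

# Coefficients c_i = q_i^* v, collected as c = Q^* v
c = Q.H * v

# Projection onto span(S) and the remainder
proj = Q * c
r = v - proj

print("c     =", list(c))
print("Q c   =", list(proj))
print("r     =", list(r))
print("Q^* r =", list(Q.H * r))   # r is orthogonal to every q_i
```

Because the columns of `Q` are orthonormal, no linear system has to be solved to find `c`; a single product with $Q^*$ is enough.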

<br>

<center>
<img width=600px src="./images/ortho_01.png"></img>   
</center>

<hr style="border-width:4px; border-color:coral"></hr>

## Unitary matrices

<hr style="border-width:4px; border-color:coral"></hr>

Suppose $S$ is an orthonormal set of $m$ vectors in $\mathbb C^{m \times 1}$.  The square matrix $Q$ whose columns are the vectors in $S$ is called a *unitary* matrix.  If the entries of $Q$ are all real, we say that $Q$ is an *orthogonal* matrix.

#### Inverse of a unitary matrix

* Are all unitary matrices non-singular?  Yes

* What is the inverse of a unitary matrix? The inverse of a unitary matrix is just its adjoint, i.e. $Q^{-1} = Q^*$. 
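Both answers can be checked on a small example. The matrix below is an assumption made for illustration: a $2 \times 2$ complex matrix whose columns are orthonormal. The sketch verifies $Q^* Q = I$ and compares the computed inverse with the adjoint.

```python
import sympy as sp

I = sp.I

# A 2x2 matrix with orthonormal columns (illustrative choice)
Q = sp.Matrix([[sp.Rational(3, 5), sp.Rational(4, 5) * I],
               [sp.Rational(4, 5) * I, sp.Rational(3, 5)]])

# Entry (i, j) of Q^* Q is q_i^* q_j, so orthonormal columns give the identity
gram = (Q.H * Q).applyfunc(sp.simplify)
print("Q^* Q =", gram)

# The inverse computed by elimination agrees with the adjoint
diff = (Q.inv() - Q.H).applyfunc(sp.simplify)
print("Q^{-1} - Q^* =", diff)
```

In practice this means that a linear system $Q \mathbf x = \mathbf b$ with unitary $Q$ is solved by $\mathbf x = Q^* \mathbf b$, with no elimination required.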

#### Multiplication by a unitary matrix.  

*  Show that if $Q$ is unitary, then $\Vert Q \mathbf x \Vert = \Vert \mathbf x \Vert$. 

\begin{equation}
\Vert Q \mathbf x \Vert^2 = (Q \mathbf x)^*(Q \mathbf x) =  \mathbf x^* Q^*Q \mathbf x = \mathbf x^* \mathbf x = \Vert \mathbf x\Vert^2
\end{equation}

Since the norm is non-negative, we conclude $\Vert Q \mathbf x\Vert = \Vert \mathbf x \Vert$. $\blacksquare$
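The derivation above can be followed numerically. In this sketch the unitary matrix `Q`, the vector `x` and the helper name `norm_sq` are assumptions chosen for illustration; the helper evaluates $\mathbf v^* \mathbf v$ exactly as in the first step of the derivation.

```python
import sympy as sp

I = sp.I

# A 2x2 unitary matrix and an arbitrary complex vector (illustrative choices)
Q = sp.Matrix([[sp.Rational(3, 5), sp.Rational(4, 5) * I],
               [sp.Rational(4, 5) * I, sp.Rational(3, 5)]])
x = sp.Matrix([1 + I, 2 - I])

def norm_sq(v):
    """Squared Euclidean norm v^* v of a column vector."""
    return sp.expand((v.H * v)[0, 0])

lhs = norm_sq(Q * x)   # ||Q x||^2
rhs = norm_sq(x)       # ||x||^2

print("||Q x||^2 =", lhs)
print("||x||^2   =", rhs)
```

The entries of `Q * x` differ from those of `x`, but the length is unchanged: multiplication by a unitary matrix acts like a rotation or reflection.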

<hr style="border-width:2px; border-color:black"></hr>

#### Homework : Unitary matrices

1.  TB, Lecture 2, Exercises 2.1-2.6 (pages 15-16).  

