# Project 2

This assessment is marked out of 100 and comprises 15% of the final course mark.

Due by 23:59 on Wednesday December 2nd, to be submitted on Learn.

### Academic misconduct

The assessment is summative in nature. You are expected to be aware of and abide by University policies on academic misconduct.

- [School of Mathematics academic misconduct advice and policies](https://teaching.maths.ed.ac.uk/main/undergraduate/studies/assessment/academic-misconduct)
- [Academic Services academic misconduct information](https://www.ed.ac.uk/academic-services/students/conduct/academic-misconduct)

**This is an individual assignment - do not share your work with another student.**

If you use any resources (e.g. textbooks or websites), then include appropriate references in your solutions. Course materials do not need to be referenced, but you should clearly state which results you are using.


### Code commentary

Your code should be extensively commented, with the functionality of each line of code explained with a comment. This is to test your understanding of the code you have written. Up to half of the marks associated with the coding part of a question may be deducted for a missing, incomplete, or inaccurate code commentary.

Your comments should explain what the code does, as well as why it does it.

The following provides an example of the expected level of commenting.

In [1]:
def is_prime(n):
    """
    Return whether an input positive integer is prime.
    """
    
    if n == 1:        # If n is 1 ...
        return False  # ... then n is not prime
    
    for i in range(2, n):  # Test integers i from 2 to n - 1 inclusive
        if n % i == 0:     # If n is divisible by i ...
            return False   # ... then n is not prime
    # If n is not divisible by any integers from 2 to n - 1 inclusive then n is
    # prime
    return True

### Code efficiency

To obtain full marks, your code and pseudo-code should be *efficient* in the sense of avoiding unnecessary arithmetic operations. 

### Output 

Your code must generate and display all relevant output when run. Rerun your code cells after editing your code, to make sure that the output is updated.

### Re-using code

You may re-use your own code from previous workshops. You may also use the model solutions for workshop exercises posted on Learn, and Jupyter notebooks from lecture material posted on Learn. Any code you re-use should be commented to the level explained above, even if you are using the model solutions. The only exception to the latter is the function randsvd, introduced in the Week3.2 notebook in lectures, which may be used without further commenting.

### Markdown cells

You can enter your answers to theoretical questions in the Markdown cells provided in this notebook. To start editing the cell, press shift+enter or double click on it. You can use basic LaTeX. To render the cell, press shift+enter or run.

Alternatively, you can submit your hand-written and scanned in answers to the theoretical questions on Learn, alongside this notebook. 

# Question 1 [40 marks]

Consider the following version of Algorithm PI.

**Algorithm PI:**

*Input*: $\mathbf{A}\in\mathbb{R}^{n\times n}$ symmetric, $\mathbf z^{(0)} \in \mathbb R^n \setminus \{\mathbf 0 \}$, $\varepsilon > 0$

*Output*: $\mathbf z^{(k)} \in \mathbb R^n$, $\lambda^{(k)} \in \mathbb R$, with $\mathbf z^{(k)} \approx \mathbf  x_\mathrm{max}$ and $\lambda^{(k)} \approx \lambda_\mathrm{max}$ 

1. $k=0$ 

2. $\lambda^{(0)} = r_\mathbf{A}(\mathbf{z}^{(0)})$

3. While $\|\mathbf A \mathbf z^{(k)} - \lambda^{(k)} \mathbf z^{(k)}\|_2 > \varepsilon$

4. $\quad$ $k = k+1$

5. $\quad$ $\mathbf z^{(k)} = \mathbf A \mathbf z^{(k-1)}$

6. $\quad$ $\mathbf z^{(k)} = \frac{\mathbf z^{(k)}}{\|\mathbf z^{(k)}\|_2}$

7. $\quad$ $\lambda^{(k)} = r_\mathbf{A}(\mathbf{z}^{(k)})$

8. End While

### Question 1.1

In the code cell below, write a function *PI* that implements Algorithm PI following the pseudo-code above.

Test your code on the example given at the bottom of the code, for which the answer should converge to $\lambda_1 = 11$ and $\mathbf x_1 = \frac{1}{\sqrt{10}} \begin{bmatrix} -1 \\ 3 \end{bmatrix}$.

**[10 marks]**

In [2]:
# Add your code for Question 1.1 here
import numpy as np

def Rayleigh(x, A):
    """
    Calculates the Rayleigh quotient of A for a given x, i.e. (x^T A x)/(x^T x).
    
    Input: x (non-zero 1-D np.array of length n), A (n by n np.array)
    Output: float (Rayleigh quotient of A at x)
    """
    return (x.T@A@x)/(x.T@x) # Calculates and returns the Rayleigh quotient of A for the given x.
   
def PI(A,z0,eps):
    """
    Runs Algorithm PI (power iteration) for a symmetric matrix A, starting guess z0 and tolerance eps.
    Returns the estimates lk, zk of the dominant eigenvalue and a corresponding eigenvector.
    """
    
    zk = z0             # Let zk be the starting guess
    lk = Rayleigh(z0,A) # Calculate the first value of lambda from the initial guess z0
    
    # Start the main 'while' loop. It runs until the norm of the residual A@zk - lk*zk is at most eps, at which point
    # (lk, zk) approximately satisfies the eigenvalue equation, with zk the eigenvector estimate and lk the eigenvalue estimate.
    while np.linalg.norm(A@zk - lk*zk) > eps:
        
        zk = A@zk                    # Multiply zk by A, which amplifies the component of zk along the dominant eigenvector
        zk = zk / np.linalg.norm(zk) # Normalize zk to unit 2-norm so that its entries neither overflow nor underflow as k grows
        lk = Rayleigh(zk, A)         # Calculate the new lk from the updated zk using the Rayleigh function above
        
    return lk, zk                    # Return lk and zk once the condition of the 'while' loop is False

# Small test example
A = np.array([[2, -3], [-3,10]], dtype=float)
z0 = np.array([1,1], dtype=float)

eps = 10**(-3)
lk, zk = PI(A,z0,eps)

print(f"Value of the eigenvalue calculated using algorithm PI: {lk}")
print(f"Value of the z^(k) calculated using algorithm PI: {zk}")

Value of the eigenvalue calculated using algorithm PI: 10.999999998457827
Value of the z^(k) calculated using algorithm PI: [-0.31621598  0.94868723]
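
The loop in `PI` above forms the product `A@zk` in the loop test, again in the update, and once more inside `Rayleigh`. In view of the efficiency requirement, the sketch below shows one way to reuse a single matrix-vector product per iteration. The name `PI_reuse` and the `max_iter` safeguard are assumptions added for illustration; they are not part of Algorithm PI as stated.

```python
import numpy as np

def PI_reuse(A, z0, eps, max_iter=10_000):
    """Algorithm PI with one product A@z per iteration (illustrative sketch)."""
    z = np.asarray(z0, dtype=float)
    w = A @ z                        # A z^(k), computed once and reused below
    lam = (z @ w) / (z @ z)          # Rayleigh quotient; z0 need not have unit norm
    k = 0
    while np.linalg.norm(w - lam * z) > eps and k < max_iter:
        k += 1
        z = w / np.linalg.norm(w)    # steps 5 and 6: z^(k) = A z^(k-1), then normalise
        w = A @ z                    # the single matrix-vector product of this iteration
        lam = z @ w                  # step 7: z has unit norm, so no division is needed
    return lam, z

A = np.array([[2, -3], [-3, 10]], dtype=float)
lam, z = PI_reuse(A, np.array([1.0, 1.0]), 1e-3)
print(lam, z)
```

The iterates are the same as those of `PI`; only the number of products with `A` per iteration changes, from three to one.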


### Question 1.2

For each of the examples given below, apply your Algorithm PI from Question 1.1. Do the outputs $\lambda^{(k)}$ and $\mathbf z^{(k)}$ converge, and if so, what do they converge to? Is this as you expected? Discuss your observations in relation to Theorem 6.5. 

If the convergence behaviour is not as expected, why do you think this is happening? *You do not need to provide full proofs. Your argument should contain a similar amount of detail to the discussion at the beginning of Section 6.1 and the accompanying lecture video.*

*You should use the function np.linalg.eig to support your investigation and discussion. You do not need to comment on the speed of convergence.*

**[30 marks]**

In [3]:
# Example 1
A = np.array([[3/2, 1/2], [1/2, 3/2]], dtype=float)
z0 = np.array([1,1]/np.sqrt(2), dtype=float)

# Run your numerical tests here
eps = 10**(-3)
lk, zk = PI(A,z0,eps)
print(f"Value of the eigenvalue calculated using algorithm PI: {lk}")
print(f"Value of the z^(k) calculated using algorithm PI: {zk} \n")

evals, evecs = np.linalg.eig(A)
idx = np.argmax(np.abs(evals))  # np.linalg.eig does not sort its output, so locate the eigenvalue of largest magnitude
max_eval = evals[idx]
max_evec = evecs[:,idx]
print(f"Eigenvalues calculated with np.linalg.eig(A): {evals}")
print(f"Eigenvectors calculated with np.linalg.eig(A): {evecs} \n")

print(f"Value of the maximum eigenvalue calculated using np.linalg.eig(A): {max_eval}")
print(f"Value of the corresponding eigenvector calculated using np.linalg.eig(A): {max_evec} \n")

print(f"Value of (x_1)^T*z^(0): {max_evec.T@z0}")

Value of the eigenvalue calculated using algorithm PI: 2.0
Value of the z^(k) calculated using algorithm PI: [0.70710678 0.70710678] 

Eigenvalues calculated with np.linalg.eig(A): [2. 1.]
Eigenvectors calculated with np.linalg.eig(A): [[ 0.70710678 -0.70710678]
 [ 0.70710678  0.70710678]] 

Value of the maximum eigenvalue calculated using np.linalg.eig(A): 2.0
Value of the corresponding eigenvector calculated using np.linalg.eig(A): [0.70710678 0.70710678] 

Value of (x_1)^T*z^(0): 0.9999999999999998


*Discuss Example 1 here.*

In this case $\lambda^{(k)}$ and $\mathbf z^{(k)}$ converge to the dominant eigenpair of $\mathbf A$: the eigenvalue of largest magnitude, $\lambda^{(k)} = 2$, and a corresponding eigenvector, $\mathbf z^{(k)} = [0.70710678, 0.70710678] = [1/\sqrt{2}, 1/\sqrt{2}]$.

This is as expected. $np.linalg.eig(A)$ gives the eigenvalues 2 and 1, so $|\lambda_1| > |\lambda_2|$ and the first condition of Theorem 6.5 is satisfied.

We can also note that $\mathbf x_1^T\mathbf z^{(0)} \ne 0$, since it is equal to 1 (up to rounding). All the conditions of Theorem 6.5 are therefore satisfied, so the algorithm is guaranteed to converge to $\lambda_1 = 2$. In fact $\mathbf z^{(0)}$ is already equal to $\mathbf x_1$ here, so the initial residual is zero up to rounding and the while loop stops without iterating.
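
The two conditions checked by hand above can also be tested numerically. The following is a minimal sketch, assuming the conditions are exactly the ones used in this discussion ($|\lambda_1| > |\lambda_2|$ and $\mathbf x_1^T\mathbf z^{(0)} \ne 0$). The function name `check_conditions` and the threshold `tol` are hypothetical choices, and `np.linalg.eigh` is used in place of `np.linalg.eig` because $\mathbf A$ is symmetric.

```python
import numpy as np

def check_conditions(A, z0, tol=1e-12):
    """Return (gap_ok, overlap_ok) for the two conditions discussed above."""
    evals, evecs = np.linalg.eigh(A)        # eigh assumes A is symmetric
    order = np.argsort(-np.abs(evals))      # indices sorted by decreasing |eigenvalue|
    evals, evecs = evals[order], evecs[:, order]
    gap_ok = abs(evals[0]) - abs(evals[1]) > tol      # |lambda_1| > |lambda_2|
    overlap_ok = abs(evecs[:, 0] @ z0) > tol          # x_1^T z0 != 0, up to tol
    return bool(gap_ok), bool(overlap_ok)

A = np.array([[3/2, 1/2], [1/2, 3/2]], dtype=float)
z0 = np.array([1.0, 1.0]) / np.sqrt(2)
print(check_conditions(A, z0))
```

For Example 1 this prints `(True, True)`. A floating-point test against zero needs some threshold, so the value of `tol` is a judgment call rather than part of the theorem.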

In [4]:
# Example 2
A = np.array([[3/2, -1/2], [-1/2, 3/2]], dtype=float)
z0 = np.array([1,1]/np.sqrt(2), dtype=float)

# Run your numerical tests here
eps = 10**(-3)
lk, zk = PI(A,z0,eps)
print(f"Value of the eigenvalue calculated using algorithm PI: {lk}")
print(f"Value of the z^(k) calculated using algorithm PI: {zk} \n")

evals, evecs = np.linalg.eig(A)
idx = np.argmax(np.abs(evals))  # np.linalg.eig does not sort its output, so locate the eigenvalue of largest magnitude
max_eval = evals[idx]
max_evec = evecs[:,idx]
print(f"Eigenvalues calculated with np.linalg.eig(A): {evals}")
print(f"Eigenvectors calculated with np.linalg.eig(A): {evecs} \n")

print(f"Value of the maximum eigenvalue calculated using np.linalg.eig(A): {max_eval}")
print(f"Value of the corresponding eigenvector calculated using np.linalg.eig(A): {max_evec} \n")

print(f"Value of (x_1)^T*z^(0): {max_evec.T@z0}")

Value of the eigenvalue calculated using algorithm PI: 1.0
Value of the z^(k) calculated using algorithm PI: [0.70710678 0.70710678] 

Eigenvalues calculated with np.linalg.eig(A): [2. 1.]
Eigenvectors calculated with np.linalg.eig(A): [[ 0.70710678  0.70710678]
 [-0.70710678  0.70710678]] 

Value of the maximum eigenvalue calculated using np.linalg.eig(A): 2.0
Value of the corresponding eigenvector calculated using np.linalg.eig(A): [ 0.70710678 -0.70710678] 

Value of (x_1)^T*z^(0): 1.6653345369377348e-16


*Discuss Example 2 here.*

In this case $\lambda^{(k)}$ and $\mathbf z^{(k)}$ do not converge to the dominant eigenpair. Instead they converge to $\lambda^{(k)} = 1$ and $\mathbf z^{(k)} = [0.70710678, 0.70710678] = [1/\sqrt{2}, 1/\sqrt{2}]$, which are the smallest eigenvalue of $\mathbf A$ and a corresponding eigenvector, as seen from the output above.

Although this is not what we would expect at first, on examination we can see why it happens. From $np.linalg.eig(A)$ the eigenvalues are 2 and 1, so $|\lambda_1| > |\lambda_2|$. However, $\mathbf x_1^T\mathbf z^{(0)} = 0$, since $\mathbf z^{(0)} = [1/\sqrt{2}, 1/\sqrt{2}]$ and $\mathbf x_1 = [1/\sqrt{2}, -1/\sqrt{2}]$ are orthogonal (the computed value of about $1.7 \times 10^{-16}$ is rounding error). The conditions of Theorem 6.5 are therefore not satisfied, so the algorithm is not guaranteed to converge to the dominant eigenvalue of $\mathbf A$.

Since $\mathbf x_1^T\mathbf z^{(0)} = 0$, we can evaluate the coefficients in the expansion $\mathbf z^{(0)} = \sum_{i=1}^{n}c_i\mathbf x_i$ of $\mathbf z^{(0)}$ in the orthonormal eigenvectors of $\mathbf A$. Orthonormality means that $\mathbf x_i^T \mathbf x_j = 0$ if $i \ne j$ and $\mathbf x_i^T \mathbf x_i = 1$. We start by multiplying both sides by $\mathbf x_1^T$ to get:

$$ 
\mathbf x_1^T\mathbf z^{(0)} = 
\mathbf x_1^T\sum_{i=1}^{n}c_i\mathbf x_i = 
\sum_{i=1}^{n}c_i\mathbf x_1^T\mathbf x_i
$$

Since the eigenvectors are orthonormal, $\mathbf x_1^T\mathbf x_i = 0$ for all $i \ne 1$, so the sum reduces to $\mathbf x_1^T\mathbf z^{(0)} = c_1\mathbf x_1^T\mathbf x_1$. Since $\mathbf x_1^T\mathbf x_1 = 1$, we therefore get the expression:

$$\mathbf x_1^T\mathbf z^{(0)} = c_1\mathbf x_1^T\mathbf x_1 = c_1 $$ 

So $\mathbf x_1^T\mathbf z^{(0)} = c_1$, but $\mathbf x_1^T\mathbf z^{(0)} = 0$, and therefore $c_1 = 0$ in this case.

We can now find the direction that $\mathbf A^k \mathbf z^{(0)}$ converges to.

$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=1}^{n} \mathbf A^{k}c_i\mathbf x_i = 
\sum_{i=1}^{n} c_i\mathbf A^{k}\mathbf x_i = 
\sum_{i=1}^{n} c_i\lambda_{i}^{k}\mathbf x_i $$

But since $c_1 = 0$, we can start the sum from $i = 2$, so we get:

$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=2}^{n} c_i\lambda_{i}^{k}\mathbf x_i
$$

Therefore, for large $k$ this sum is dominated by the term corresponding to the second largest eigenvalue of $\mathbf A$ (in magnitude). This is indeed what we observe: the second largest eigenvalue of $\mathbf A$ is 1, and Algorithm PI returns this value together with the corresponding eigenvector $\mathbf x_2 = [1/\sqrt{2}, 1/\sqrt{2}]$. In fact, for this $2 \times 2$ example the sum contains only the single term $c_2\lambda_2^k\mathbf x_2$, so $\mathbf z^{(0)}$ is already an eigenvector.
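
The expansion used in this argument can be checked numerically for Example 2. The sketch below computes the coefficients $c_i = \mathbf x_i^T\mathbf z^{(0)}$ and compares $\mathbf A^k\mathbf z^{(0)}$, computed directly, with $\sum_i c_i\lambda_i^k\mathbf x_i$. The choice $k = 10$ and the use of `np.linalg.eigh` (which returns orthonormal eigenvectors for a symmetric matrix) are assumptions made for illustration.

```python
import numpy as np

A = np.array([[3/2, -1/2], [-1/2, 3/2]], dtype=float)
z0 = np.array([1.0, 1.0]) / np.sqrt(2)

evals, X = np.linalg.eigh(A)                 # orthonormal eigenvectors in the columns of X
order = np.argsort(-np.abs(evals))           # put the dominant eigenvalue first
evals, X = evals[order], X[:, order]

c = X.T @ z0                                 # c_i = x_i^T z0
k = 10
direct = np.linalg.matrix_power(A, k) @ z0   # A^k z0 computed directly
expansion = X @ (c * evals**k)               # sum_i c_i lambda_i^k x_i

print("c =", c)
print("largest difference:", np.max(np.abs(direct - expansion)))
```

The first coefficient should be zero up to rounding, and the second should be $\pm 1$ (the sign depends on the orientation of the eigenvector returned by the solver).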

In [5]:
# Example 3
A = np.array([[2, 0], [0,2]], dtype=float)
z0 = np.array([1,1]/np.sqrt(2), dtype=float)

# Run your numerical tests here
eps = 10**(-3)
lk, zk = PI(A,z0,eps)
print(f"z^(0): {z0} \n")
print(f"Norm of the z^(0): {np.linalg.norm(z0)} \n")
print(f"Value of the eigenvalue calculated using algorithm PI: {lk}")
print(f"Value of the z^(k) calculated using algorithm PI: {zk} \n")

evals, evecs = np.linalg.eig(A)
idx = np.argmax(np.abs(evals))  # np.linalg.eig does not sort its output, so locate the eigenvalue of largest magnitude
max_eval = evals[idx]
max_evec = evecs[:,idx]
print(f"Eigenvalues calculated with np.linalg.eig(A): {evals}")
print(f"Eigenvectors calculated with np.linalg.eig(A): {evecs} \n")

print(f"Value of the maximum eigenvalue calculated using np.linalg.eig(A): {max_eval}")
print(f"Value of the corresponding eigenvector calculated using np.linalg.eig(A): {max_evec} \n")

print(f"Value of (x_1)^T*z^(0): {max_evec.T@z0}")
print(f"Value of (x_2)^T*z^(0): {evecs[:,1].T@z0}")

z^(0): [0.70710678 0.70710678] 

Norm of the z^(0): 0.9999999999999999 

Value of the eigenvalue calculated using algorithm PI: 2.0
Value of the z^(k) calculated using algorithm PI: [0.70710678 0.70710678] 

Eigenvalues calculated with np.linalg.eig(A): [2. 2.]
Eigenvectors calculated with np.linalg.eig(A): [[1. 0.]
 [0. 1.]] 

Value of the maximum eigenvalue calculated using np.linalg.eig(A): 2.0
Value of the corresponding eigenvector calculated using np.linalg.eig(A): [1. 0.] 

Value of (x_1)^T*z^(0): 0.7071067811865475
Value of (x_2)^T*z^(0): 0.7071067811865475


*Discuss Example 3 here.*

In this case $\lambda^{(k)}$ does converge to the largest eigenvalue, but simply because both eigenvalues of $\mathbf A$ are equal to 2, as $np.linalg.eig(A)$ shows. However, $\mathbf z^{(k)}$ does not converge to either of the eigenvectors returned by $np.linalg.eig(A)$. In particular, the algorithm returns $\lambda^{(k)} = 2$ and $\mathbf z^{(k)} = [0.70710678, 0.70710678] = [1/\sqrt{2}, 1/\sqrt{2}]$. (Since $\mathbf A = 2\mathbf I$, every non-zero vector is an eigenvector of $\mathbf A$, so this $\mathbf z^{(k)}$ is still a valid eigenvector for the eigenvalue 2.)

Although this is not what we would expect at first, on examination we can see why it happens. From $np.linalg.eig(A)$ we have $|\lambda_1| = |\lambda_2|$, so one of the conditions of Theorem 6.5, namely $|\lambda_1| > |\lambda_2|$, is not satisfied, and the algorithm is not guaranteed to converge to the eigenvector $\mathbf x_1$.

In our case $\mathbf x_1^T\mathbf z^{(0)} = 1/\sqrt{2}$, since $\mathbf x_1 = [1, 0]$, and likewise $\mathbf x_2^T\mathbf z^{(0)} = 1/\sqrt{2}$. This means that $c_1 = c_2 = 1/\sqrt{2}$, since for each $j$ we have $\mathbf x_j^T\mathbf z^{(0)} = \mathbf x_j^T\sum_{i=1}^{n}c_i\mathbf x_i = \sum_{i=1}^{n}c_i\mathbf x_j^T\mathbf x_i = c_j$ by orthonormality, and evaluating this for $j = 1$ and $j = 2$ gives $c_j = 1/\sqrt{2}$.

In our case $n = 2$, so we can analyse $\mathbf A^k \mathbf z^{(0)} = \sum_{i=1}^{2} \mathbf A^{k}c_i\mathbf x_i$ directly.


$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=1}^{2} \mathbf A^{k}c_i\mathbf x_i = 
\sum_{i=1}^{2} c_i\mathbf A^{k}\mathbf x_i = 
\sum_{i=1}^{2} c_i\lambda_{i}^{k}\mathbf x_i =
c_1\lambda_{1}^{k}\mathbf x_1 + c_2\lambda_{2}^{k}\mathbf x_2
$$

But in this case we know that $\lambda_{1} = \lambda_{2}$. Writing $\lambda$ for this common value, we therefore have:

$$\mathbf A^k \mathbf z^{(0)} = 
c_1\lambda_{1}^{k}\mathbf x_1 + c_2\lambda_{2}^{k}\mathbf x_2 = 
\lambda^{k}(c_1\mathbf x_1 + c_2\mathbf x_2)
$$

Now we can substitute in $c_1$ and $c_2$ to get:

$$\mathbf A^k \mathbf z^{(0)} = 
\lambda^{k}(c_1\mathbf x_1 + c_2\mathbf x_2) = 
\lambda^{k}(1/\sqrt{2}\mathbf x_1 + 1/\sqrt{2}\mathbf x_2)
$$

Since $\mathbf x_1$ and $\mathbf x_2$ are the standard unit vectors, $1/\sqrt{2}\mathbf x_1 + 1/\sqrt{2}\mathbf x_2$ is exactly $\mathbf z^{(0)}$, and so we get that:

$$\mathbf A^k \mathbf z^{(0)} = 
\lambda^{k}(1/\sqrt{2}\mathbf x_1 + 1/\sqrt{2}\mathbf x_2) = 
\lambda^{k}\mathbf z^{(0)}
$$

As a result, $\mathbf A^k \mathbf z^{(0)}$ is a multiple of $\mathbf z^{(0)}$ for every $k$. Since we normalize in Algorithm PI, $\mathbf z^{(k)}$ remains the same at every step, which is why the eigenvector returned by the algorithm is $\mathbf z^{(0)}$ itself. Likewise, the eigenvalue returned is 2, the common eigenvalue of both eigenvectors.

Another way to see this is that for general $n$, if $|\lambda_1| = |\lambda_2| > |\lambda_3|$, then for large $k$ we have 
$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=1}^{n} \mathbf A^{k}c_i\mathbf x_i = 
\sum_{i=1}^{n} c_i\mathbf A^{k}\mathbf x_i = 
\sum_{i=1}^{n} c_i\lambda_{i}^{k}\mathbf x_i \approx
c_1\lambda_{1}^{k}\mathbf x_1 + c_2\lambda_{2}^{k}\mathbf x_2
$$
This means that as $k$ tends to infinity neither of the terms $\lambda_{1}^{k}$ and $\lambda_{2}^{k}$ dominates the other, since $\lambda_{1} = \lambda_{2}$. As a result, $\mathbf z^{(k)}$ converges to neither $\mathbf x_1$ nor $\mathbf x_2$, but stays in the direction of the combination $c_1\mathbf x_1 + c_2\mathbf x_2$.
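
To illustrate that the outcome in Example 3 depends only on the starting vector, the sketch below applies $\mathbf A^k$ to a few random unit vectors and normalizes the result. The helper name `normalised_power`, the seed and the power $k = 20$ are arbitrary choices made for this illustration.

```python
import numpy as np

def normalised_power(A, z0, k):
    """Return A^k z0 scaled to unit 2-norm."""
    v = np.linalg.matrix_power(A, k) @ z0
    return v / np.linalg.norm(v)

A = np.array([[2, 0], [0, 2]], dtype=float)
rng = np.random.default_rng(0)                 # arbitrary fixed seed
for _ in range(3):
    z0 = rng.standard_normal(2)
    z0 = z0 / np.linalg.norm(z0)               # random unit starting vector
    zk = normalised_power(A, z0, 20)
    residual = np.linalg.norm(A @ zk - 2.0 * zk)
    print(z0, zk, residual)
```

In each case the normalized vector should coincide with the starting vector and the residual should be zero, which is consistent with every non-zero vector being an eigenvector of $2\mathbf I$.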


In [6]:
# Example 4
A = np.array([[2, 0], [0,-2]], dtype=float)
z0 = np.array([1,1]/np.sqrt(2), dtype=float)

# Run your numerical tests here
evals, evecs = np.linalg.eig(A)
idx = np.argmax(np.abs(evals))  # np.linalg.eig does not sort its output, so locate an eigenvalue of largest magnitude
max_eval = evals[idx]
max_evec = evecs[:,idx]
print(f"Eigenvalues calculated with np.linalg.eig(A): {evals}")
print(f"Eigenvectors calculated with np.linalg.eig(A): {evecs} \n")

print(f"Value of the maximum eigenvalue calculated using np.linalg.eig(A): {max_eval}")
print(f"Value of the corresponding eigenvector calculated using np.linalg.eig(A): {max_evec} \n")

print(f"Value of the Rayleigh quotient of z0: {Rayleigh(z0,A)} \n")
print(f"Norm of z0: {np.linalg.norm(z0)} \n")

print(f"Norm of Az0: {np.linalg.norm(A@z0)}")

Eigenvalues calculated with np.linalg.eig(A): [ 2. -2.]
Eigenvectors calculated with np.linalg.eig(A): [[1. 0.]
 [0. 1.]] 

Value of the maximum eigenvalue calculated using np.linalg.eig(A): 2.0
Value of the corresponding eigenvector calculated using np.linalg.eig(A): [1. 0.] 

Value of the Rayleigh quotient of z0: 0.0 

Norm of z0: 0.9999999999999999 

Norm of Az0: 1.9999999999999998


*Discuss Example 4 here.*

What is important to see in this case is that Algorithm PI does not converge and runs indefinitely. (I have decided to omit the Python cell calling PI, since it never returns a result.)

This means that the condition on line 3, "While $\|\mathbf A \mathbf z^{(k)} - \lambda^{(k)} \mathbf z^{(k)}\|_2 > \varepsilon$", is always true.

We can write $\mathbf z^{(0)}$ as a linear combination of the eigenvectors of $\mathbf A$, i.e. $\mathbf z^{(0)} = \sum_{i=1}^{n}c_i\mathbf x_i$. From $np.linalg.eig(A)$ the eigenvalues are 2 and -2.

In our case $\mathbf x_1^T\mathbf z^{(0)} = 1/\sqrt{2}$, since $\mathbf x_1 = [1, 0]$, and likewise $\mathbf x_2^T\mathbf z^{(0)} = 1/\sqrt{2}$. This means that $c_1 = c_2 = 1/\sqrt{2}$, since for each $j$ we have $\mathbf x_j^T\mathbf z^{(0)} = \mathbf x_j^T\sum_{i=1}^{n}c_i\mathbf x_i = \sum_{i=1}^{n}c_i\mathbf x_j^T\mathbf x_i = c_j$ by orthonormality, and evaluating this for $j = 1$ and $j = 2$ gives $c_j = 1/\sqrt{2}$.

We start by inspecting $\mathbf A \mathbf z^{(k)}$. In Algorithm PI, $\mathbf z^{(k)}$ is obtained by multiplying $\mathbf z^{(0)}$ by $\mathbf A$ a total of $k$ times, normalizing after each multiplication, so up to a positive scaling we can simply evaluate the expression:

$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=1}^{n} \mathbf A^{k}c_i\mathbf x_i = 
\sum_{i=1}^{n} c_i\mathbf A^{k}\mathbf x_i = 
\sum_{i=1}^{n} c_i\lambda_{i}^{k}\mathbf x_i =
c_1\lambda_{1}^{k}\mathbf x_1 + c_2\lambda_{2}^{k}\mathbf x_2
$$

Now we can subsitute in $c_1$ and $c_2$ to get:

$$\mathbf A^k \mathbf z^{(0)} = 
c_1\lambda_{1}^{k}\mathbf x_1 + c_2\lambda_{2}^{k}\mathbf x_2 = 
1/\sqrt{2}\lambda_{1}^{k}\mathbf x_1 + 1/\sqrt{2}\lambda_{2}^{k}\mathbf x_2
$$

When $k$ is even, since $\lambda_{1} = 2$ and $\lambda_{2} = -2$, we have $\lambda_{1}^{k} = \lambda_{2}^{k}$. So the expression reduces to:

$$\mathbf A^k \mathbf z^{(0)} =  
1/\sqrt{2}\lambda_{1}^{k}\mathbf x_1 + 1/\sqrt{2}\lambda_{2}^{k}\mathbf x_2 = 
\lambda_1^{k}(1/\sqrt{2}\mathbf x_1 + 1/\sqrt{2}\mathbf x_2) = 
\lambda_1^{k}\mathbf z^{(0)}
$$

This is because $\mathbf z^{(0)} = [1/\sqrt{2},1/\sqrt{2}]$. When we normalize $\lambda_1^{k}\mathbf z^{(0)}$ we simply get $\mathbf z^{(0)}$, since $\lambda_1^{k} > 0$ and $\|\mathbf z^{(0)}\|_2 = 1$.

Similarly, when $k$ is odd, $\lambda_{1}^{k} = -\lambda_{2}^{k}$. So the expression reduces to:

$$\mathbf A^k \mathbf z^{(0)} =  
1/\sqrt{2}\lambda_{1}^{k}\mathbf x_1 + 1/\sqrt{2}\lambda_{2}^{k}\mathbf x_2 = 
\lambda_1^{k}(1/\sqrt{2}\mathbf x_1 - 1/\sqrt{2}\mathbf x_2) = 
\lambda_1^{k}[1/\sqrt{2},-1/\sqrt{2}]
$$

When we normalize this we get $[1/\sqrt{2},-1/\sqrt{2}]$, since $\lambda_1^{k} > 0$ and $[1/\sqrt{2},-1/\sqrt{2}]$ has norm 1. 

This means that as $k$ increases the iterate $\mathbf z^{(k)}$ simply alternates between $[1/\sqrt{2},1/\sqrt{2}]$ and $[1/\sqrt{2},-1/\sqrt{2}]$.

Now let us focus on $\lambda^{(k)} \mathbf z^{(k)}$, the other term in the norm. In Algorithm PI, $\lambda^{(k)}$ is calculated as the Rayleigh quotient of $\mathbf z^{(k)}$. Since the algorithm repeatedly multiplies by $\mathbf A$ and normalizes, $\mathbf z^{(k)}$ is a positive multiple of $\mathbf A^{k}\mathbf z^{(0)}$, and the Rayleigh quotient is unchanged when its argument is rescaled, so:

$$ 
\lambda^{(k)} =
r_{\mathbf A}( \mathbf z^{(k)}) = 
r_{\mathbf A}( \mathbf A^{k}\mathbf z^{(0)}) = 
\frac{ (\mathbf A^{k}\mathbf z^{(0)})^T \mathbf A  \mathbf A^{k}\mathbf z^{(0)}}{(\mathbf A^{k}\mathbf z^{(0)})^T \mathbf A^{k}\mathbf z^{(0)}} =
\frac{(\mathbf z^{(0)})^T (\mathbf A^{k})^T \mathbf A^{k+1}\mathbf z^{(0)}}{(\mathbf z^{(0)})^T (\mathbf A^{k})^T \mathbf A^{k}\mathbf z^{(0)}}
$$

Since $\mathbf A$ is symmetric we know that $\mathbf A^T = \mathbf A$, and therefore $(\mathbf A^{k})^T = \mathbf A^{k}$. So we get that:

$$ 
\lambda^{(k)} =
\frac{(\mathbf z^{(0)})^T \mathbf A^{2k+1}\mathbf z^{(0)}}{(\mathbf z^{(0)})^T \mathbf A^{2k}\mathbf z^{(0)}}
$$

Again we can write $\mathbf z^{(0)}$ as a sum of the orthonormal eigenvectors of $\mathbf A$. Using $\mathbf x_i^T \mathbf x_j = 0$ for $i \ne j$ and $\mathbf x_i^T \mathbf x_i = 1$, for any power $m$ and for $n = 2$ we get:

$$ 
(\mathbf z^{(0)})^T\mathbf A^{m}\mathbf z^{(0)} = 
\Big(\sum_{i=1}^{n}c_i\mathbf x_i^T \Big)\Big(\sum_{j=1}^{n}c_j\lambda_j^{m} \mathbf x_j\Big) = 
\sum_{i=1}^{n}c_i^2\lambda_i^{m} = 
c_1^2\lambda_1^{m} + c_2^2\lambda_2^{m}
$$

We know that $c_1 = c_2 = 1/\sqrt{2}$, $\lambda_1 = 2$ and $\lambda_2 = -2$. For the denominator, $m = 2k$ is even, so it equals $\frac{1}{2}(2^{2k} + 2^{2k}) = 2^{2k} > 0$. For the numerator, $m = 2k+1$ is odd for all $k$, so $\lambda_1^{2k+1} = 2^{2k+1}$ and $\lambda_2^{2k+1} = (-2)^{2k+1} = -2^{2k+1}$, which gives $\lambda_1^{2k+1} + \lambda_2^{2k+1} = 0$. So:

$$ 
\lambda^{(k)} =
\frac{c_1^2\lambda_1^{2k+1} + c_2^2\lambda_2^{2k+1}}{c_1^2\lambda_1^{2k} + c_2^2\lambda_2^{2k}} = 
\frac{(1/\sqrt{2})^2(\lambda_1^{2k+1} + \lambda_2^{2k+1})}{2^{2k}} = 
\frac{(1/\sqrt{2})^2(0)}{2^{2k}} = 
0
$$

Therefore $\lambda^{(k)} = 0$ for all $k$ (consistent with the Rayleigh quotient of $\mathbf z^{(0)}$ printed above), and so $\lambda^{(k)} \mathbf z^{(k)} = \mathbf 0$. The norm in the while loop therefore reduces to $\|\mathbf A \mathbf z^{(k)}\|_2$. Since $\mathbf z^{(k)}$ just alternates between $[1/\sqrt{2},1/\sqrt{2}]$ and $[1/\sqrt{2},-1/\sqrt{2}]$, $\|\mathbf A \mathbf z^{(k)}\|_2$ is always equal to 2, which is greater than $\varepsilon$ for any tolerance $\varepsilon < 2$, and so Algorithm PI never converges.

So when we apply Algorithm PI to this particular $\mathbf A$ and $\mathbf z^{(0)}$, the algorithm never returns a result, since the condition of the while loop is always true.
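
The behaviour described above can be observed without an infinite loop by capping the number of iterations. The following is a sketch under that assumption: `PI_capped` and its `max_iter` argument are hypothetical additions and not part of Algorithm PI as stated. It records $\lambda^{(k)}$, $\mathbf z^{(k)}$ and the residual norm at each step.

```python
import numpy as np

def PI_capped(A, z0, eps, max_iter):
    """Algorithm PI with an iteration cap; returns the per-iteration history."""
    z = z0
    lam = (z @ A @ z) / (z @ z)                 # Rayleigh quotient of the starting guess
    history = []
    while np.linalg.norm(A @ z - lam * z) > eps and len(history) < max_iter:
        z = A @ z
        z = z / np.linalg.norm(z)               # normalise the new iterate
        lam = (z @ A @ z) / (z @ z)
        history.append((lam, z, np.linalg.norm(A @ z - lam * z)))
    return history

A = np.array([[2, 0], [0, -2]], dtype=float)
z0 = np.array([1.0, 1.0]) / np.sqrt(2)
history = PI_capped(A, z0, 1e-3, max_iter=6)
for k, (lam_k, z_k, res_k) in enumerate(history, start=1):
    print(k, lam_k, z_k, res_k)
```

The cap is reached, the recorded eigenvalue estimates stay at zero, the residual stays at about 2, and the iterates alternate between the two vectors identified above.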


# Question 2 [25 marks]

Let $\mathbf A \in \mathbb R^{n \times n}$ be symmetric, with eigenvalues $|\lambda_1| > |\lambda_2| > |\lambda_3| \geq \dots \geq |\lambda_n|$. Denote by $\mathbf x_1, \dots, \mathbf x_n$ a corresponding set of eigenvectors, chosen as an orthonormal set.

Suppose we have already computed the dominating eigenpair $\lambda_1$ and $\mathbf x_1$, and we now want to compute $\lambda_2$ and $\mathbf x_2$. This can be done with one of the following approaches.

**Approach 1** Apply Algorithm PI to $\mathbf A$, with starting guess $\mathbf z^{(0)} = (\mathbf A - \lambda_1 \mathbf I) \mathbf u$, for some $\mathbf u \in \mathbb R^{n} \setminus \{\mathbf 0\}$ and tolerance $\varepsilon$.

**Approach 2** Apply Algorithm PI to $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$, with starting guess $\mathbf z^{(0)} \in \mathbb R^{n} \setminus \{\mathbf 0\}$ and tolerance $\varepsilon$.

### Question 2.1

Explain why Approach 1 converges to $\lambda_2$ and $\mathbf x_2$. What condition do you need on $\mathbf u$ to guarantee convergence as $k \rightarrow \infty$?

*You do not need to provide a full proof. Your argument should contain a similar amount of detail to the discussion at the beginning of Section 6.1 and the accompanying lecture video.*

**[5 marks]**

**Answer to Q2.1**

We know that our starting guess is $\mathbf z^{(0)} = (\mathbf A - \lambda_1 \mathbf I) \mathbf u$, for some $\mathbf u \in \mathbb R^{n} \setminus \{\mathbf 0\}$ and tolerance $\varepsilon$.

We can write $\mathbf u$ as a linear combination of the eigenvectors of $\mathbf A$. Let $\mathbf u = \sum_{i=1}^{n}c_i\mathbf x_i$. Multiplying $\mathbf z^{(0)}$ repeatedly by $\mathbf A$ gives:

$$\mathbf A^k \mathbf z^{(0)} = \mathbf A^k(\mathbf A - \lambda_1 \mathbf I) \mathbf u = (\mathbf A^k\mathbf A -  \mathbf A^k\lambda_1)\mathbf u  = (\mathbf A^{k+1} -  \mathbf A^k\lambda_1)\mathbf u $$

Now we substitute the linear combination of eigenvectors of $\mathbf A$, $\sum_{i=1}^{n}c_i\mathbf x_i$, for $\mathbf u$ and we get:

$$\mathbf A^k \mathbf z^{(0)} = 
\sum_{i=1}^{n} (\mathbf A^{k+1} -  \mathbf A^k\lambda_1)c_i\mathbf x_i = 
\sum_{i=1}^{n} \left(\mathbf A^{k+1}c_i\mathbf x_i - \mathbf A^k\lambda_1c_i\mathbf x_i\right)  = 
\sum_{i=1}^{n} \left(\lambda_{i}^{k+1}c_i\mathbf x_i - \lambda_i^k\lambda_1c_i\mathbf x_i\right) = 
\sum_{i=1}^{n} \left(\lambda_{i}^{k}\lambda_{i}c_i\mathbf x_i - \lambda_i^k\lambda_1c_i\mathbf x_i\right) = 
\sum_{i=1}^{n} (\lambda_{i} - \lambda_{1})\lambda_{i}^{k}c_i\mathbf x_i $$


What is now important to see is that $\lambda_{i} - \lambda_{1} = 0$ when $i = 1$, so the first term drops out of the summation and $\mathbf A^k \mathbf z^{(0)} = \sum_{i=2}^{n} (\lambda_{i} - \lambda_{1})\lambda_{i}^{k}c_i\mathbf x_i$. Since $|\lambda_1| > |\lambda_2| > |\lambda_3| \geq \dots \geq |\lambda_n|$, $\lambda_2$ is the eigenvalue of largest magnitude remaining in this summation, and $\lambda_2 - \lambda_1 \ne 0$. So as $k \rightarrow \infty$ the sum is dominated by the term in $\lambda_2^k\mathbf x_2$, and as a result Approach 1 converges to $\lambda_2$ and $\mathbf x_2$.

The condition on $\mathbf u$ needed to guarantee convergence is that $\mathbf u$ is not orthogonal to $\mathbf x_2$, i.e. $\mathbf x_2^T\mathbf u \ne 0$. This ensures that $c_2$ is non-zero, so that the $\mathbf x_2$ term is present in the expression for $\mathbf A^k \mathbf z^{(0)}$.

To see that $\mathbf x_2^T\mathbf u \ne 0$ ensures $c_2 \ne 0$, note that $\mathbf x_2^T\mathbf u = \mathbf x_2^T\sum_{i=1}^{n}c_i\mathbf x_i = \sum_{i=1}^{n}c_i \mathbf x_2^T\mathbf x_i = c_2$ by the orthonormality of the eigenvectors. Hence if $\mathbf x_2^T\mathbf u \ne 0$ then $c_2 \ne 0$, so the term $(\lambda_2 - \lambda_1)\lambda_2^k c_2\mathbf x_2$ is present in $\mathbf A^k \mathbf z^{(0)}$ and dominates the sum for large $k$, which gives convergence.
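
Approach 1 can be tried on a small example. The sketch below is illustrative only: the $3 \times 3$ test matrix (with eigenvalues $3 + \sqrt{3}$, $3$ and $3 - \sqrt{3}$), the choice $\mathbf u = [1, 1, 1]$, the tolerance and the iteration cap `max_iter` are all assumptions, and the "already computed" dominant eigenpair is taken from `np.linalg.eigh` for convenience.

```python
import numpy as np

def power_iteration(A, z0, eps, max_iter=1000):
    """Algorithm PI with an added iteration cap (the cap is an assumption)."""
    z = z0
    lam = (z @ A @ z) / (z @ z)
    for _ in range(max_iter):
        if np.linalg.norm(A @ z - lam * z) <= eps:
            break
        z = A @ z
        z = z / np.linalg.norm(z)
        lam = z @ A @ z                          # z has unit norm here
    return lam, z

# Hypothetical test matrix with eigenvalues 3 + sqrt(3), 3 and 3 - sqrt(3)
A = np.array([[4, 1, 0], [1, 3, 1], [0, 1, 2]], dtype=float)
evals, X = np.linalg.eigh(A)                     # ascending eigenvalues, all positive here
lam1, x1 = evals[-1], X[:, -1]                   # dominant eigenpair, assumed already known

u = np.ones(3)
z0 = (A - lam1 * np.eye(3)) @ u                  # starting guess of Approach 1
lam, z = power_iteration(A, z0, 1e-6)
print("x_1^T z0 =", x1 @ z0)
print("lambda   =", lam)
```

In floating-point arithmetic the component of $\mathbf z^{(0)}$ along $\mathbf x_1$ is only zero up to rounding, and it is amplified by a factor $|\lambda_1/\lambda_2|$ at each step. With a very small tolerance or many iterations the iterates may therefore drift back towards $\mathbf x_1$, so this sketch uses a moderate tolerance.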

### Question 2.2

What are the eigenvalues and corresponding eigenvectors of the matrix $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$?

**[5 marks]**

**Answer to Q2.2**

We have the matrix $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$. To find its eigenvalues and eigenvectors, we look for vectors $\mathbf x$ and scalars $\lambda$ that satisfy $(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x = \lambda \mathbf x$, and we try the eigenvectors of $\mathbf A$ as candidates.

We start with $\mathbf x_1$, using that $\mathbf A\mathbf x_1 = \lambda_1 \mathbf x_1$ and that $\mathbf x_1^\mathrm{T}\mathbf x_1 = 1$ since the eigenvectors are orthonormal.

$$(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_1 = 
\mathbf A \mathbf x_1 - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_1 = 
 \lambda_1 \mathbf x_1 -  \lambda_1 \mathbf x_1 = 0.
$$

We can see that $(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_1 = \mathbf 0 = 0 \cdot \mathbf x_1$. Since $\mathbf x_1$ is not the zero vector, $\mathbf x_1$ is an eigenvector of this matrix with eigenvalue 0.

Let us now look at the other eigenvectors, $\mathbf x_i$ where $i > 1$. Then we have that:

$$(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_i = 
\mathbf A \mathbf x_i - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_i = 
 \lambda_i \mathbf x_i -  \lambda_1 \mathbf x_1(0) = \lambda_i \mathbf x_i.
$$

Here we used that $\mathbf x_1^\mathrm{T}\mathbf x_i = 0$ for all $i > 1$, since the eigenvectors are orthogonal. So $(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_i = \lambda_i \mathbf x_i$, which is the eigenvalue equation. This means that each $\mathbf x_i$ with $i > 1$ is an eigenvector of $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$ with eigenvalue $\lambda_i$, exactly as for $\mathbf A$. The only change is that the eigenvalue associated with $\mathbf x_1$ is now 0 instead of $\lambda_1$.


So the eigenvectors and eigenvalues of $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$ are: $\{\mathbf x_1, \mathbf x_2,..., \mathbf x_n\}$ with corresponding eigenvalues: $\{0,\lambda_2,...,\lambda_n\}$.

We can confirm this using the Rayleigh quotient of this matrix at any eigenvector $\mathbf x_i$:

$$ 
\lambda =
r_{(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})}( \mathbf x_i) = 
\frac{ \mathbf x_i^T(\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_i}{\mathbf x_i^T \mathbf x_i} = 
\frac{ \mathbf x_i^T\mathbf A\mathbf x_i - \lambda_1 \mathbf x_i^T\mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_i}{\mathbf x_i^T \mathbf x_i} =
\frac{ \lambda_i\mathbf x_i^T\mathbf x_i - \lambda_1 \mathbf x_i^T\mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_i}{\mathbf x_i^T \mathbf x_i} = 
\lambda_i- \lambda_1 \mathbf x_i^T\mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_i
$$

CASE 1: We can see that if $ \mathbf x_i = \mathbf x_1$ then the expression evaluates to: $\lambda_1- \lambda_1 \mathbf x_1^T\mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_1 =\lambda_1- \lambda_1 (1)(1) = \lambda_1- \lambda_1 = 0$ using  the fact that $\mathbf x_1^\mathrm{T}\mathbf x_1$  = 1. This is exactly the eigenvalue of $\mathbf x_1$ stated before.

CASE 2: If $\mathbf x_i \ne \mathbf x_1$, i.e. $i \ne 1$, then $\mathbf x_1^\mathrm{T}\mathbf x_i = 0$ and the expression evaluates to: $\lambda_i- \lambda_1 \mathbf x_i^T\mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_i = \lambda_i- \lambda_1 (0)(0) = \lambda_i- 0 = \lambda_i$, which are the eigenvalues for all the other eigenvectors $\mathbf x_i \ne \mathbf x_1$.
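
The result of Question 2.2 can be sanity-checked numerically. The sketch below is only an illustration: the $3 \times 3$ symmetric matrix is a hypothetical example (not from the assignment), and `np.linalg.eigh` is used as a reference eigensolver.

```python
import numpy as np

# Hypothetical symmetric test matrix (illustration only)
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns the eigenvalues in ascending order together with
# orthonormal eigenvectors stored as the columns of the second output
eigvals, eigvecs = np.linalg.eigh(A)

# Reorder so that |lambda_1| >= |lambda_2| >= ..., as in the question
order = np.argsort(-np.abs(eigvals))
lam, X = eigvals[order], eigvecs[:, order]

# Deflated matrix B = A - lambda_1 x_1 x_1^T
x1 = X[:, 0]
B = A - lam[0] * np.outer(x1, x1)

# Claimed eigenvalues of B: 0 for x_1, and lambda_i unchanged for i > 1
mu = np.concatenate(([0.0], lam[1:]))

# B x_i - mu_i x_i for all i at once (X * mu scales column i by mu_i)
residual = np.linalg.norm(B @ X - X * mu)
print(f"Residual of B X = X diag(mu): {residual:.2e}")
```

A residual at the level of rounding error is consistent with the eigenpairs $\{(0, \mathbf x_1), (\lambda_2, \mathbf x_2), \dots, (\lambda_n, \mathbf x_n)\}$ derived above.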

### Question 2.3

Using Question 2.2 or otherwise, explain why Approach 2 converges to $\lambda_2$ and $\mathbf x_2$. What condition do you need on $\mathbf z^{(0)}$ to guarantee convergence as $k \rightarrow \infty$?

**[3 marks]**

**Answer to Q2.3**

Let $\mathbf B = \mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$, and let $\mathbf z^{(0)} = \sum_{i=1}^{n}c_i\mathbf x_i$, that is, we write $\mathbf z^{(0)}$ as a linear combination of the orthonormal eigenvectors of $\mathbf A$.

Then repeatedly multiplying $\mathbf z^{(0)}$ by $\textbf B$ gives us:

$$\mathbf B^k \mathbf z^{(0)} = 
\sum_{i=1}^{n}c_i\mathbf B^k\mathbf x_i = 
\sum_{i=2}^{n}c_i(\lambda_i)^k\mathbf x_i
$$

Here we used that, by Question 2.2, $\mathbf B\mathbf x_1 = \mathbf 0$ and $\mathbf B\mathbf x_i = \lambda_i\mathbf x_i$ for $i > 1$. The $i = 1$ term therefore vanishes, so the dominant term is not the first one but the second one: since $|\lambda_2| > |\lambda_3| \geq \dots \geq |\lambda_n|$, the dominant eigenvalue of $\mathbf B$ is $\lambda_2$, whose eigenvector is $\mathbf x_2$ by Question 2.2.

As a result using the PI algorithm on this new matrix does indeed converge to $\lambda_2$ and $\mathbf x_2$.

The condition we need on $\mathbf z^{(0)}$ to guarantee convergence as $k \rightarrow \infty$ is that $\mathbf z^{(0)}$ is not orthogonal to $\mathbf x_2$, i.e. $\mathbf x_2^T\mathbf z^{(0)} \ne 0$. This ensures that $c_2$ is non-zero, which means the $\mathbf x_2$ term is present in the expression for $\mathbf B^k \mathbf z^{(0)}$.

To see that $\mathbf x_2^T\mathbf z^{(0)} \ne 0$ ensures $c_2 \ne 0$, note that $\mathbf x_2^T\mathbf z^{(0)} = \mathbf x_2^T\sum_{i=1}^{n}c_i\mathbf x_i = \sum_{i=1}^{n}c_i \mathbf x_2^T\mathbf x_i = c_2$ by orthonormality of the eigenvectors. So if $\mathbf x_2^T\mathbf z^{(0)} \ne 0$ then $c_2 \ne 0$, which means that the term $c_2(\lambda_2)^k\mathbf x_2$ is present in $\mathbf B^k \mathbf z^{(0)}$. Since $\lambda_2$ is the dominant eigenvalue of $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$ by Question 2.2, this guarantees convergence.
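
The argument above can be illustrated with a short experiment. Algorithm PI is not reproduced in this section, so `power_iteration` below is a hypothetical stand-in (a plain normalised power iteration with a fixed number of steps instead of a tolerance), and the test matrix is an assumed example.

```python
import numpy as np

def power_iteration(M, z0, num_iter=500):
    """Hypothetical stand-in for Algorithm PI (fixed number of iterations)."""
    z = z0 / np.linalg.norm(z0)      # normalise the starting guess
    for _ in range(num_iter):
        w = M @ z                    # multiply by the matrix ...
        z = w / np.linalg.norm(w)    # ... and renormalise to avoid overflow
    return z @ M @ z, z              # Rayleigh quotient (z has unit norm) and vector

# Hypothetical symmetric positive definite test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Reference eigenpairs; eigh sorts ascending, so the last column belongs to
# lambda_1 and the second to last to lambda_2 (all eigenvalues are positive)
eigvals, eigvecs = np.linalg.eigh(A)
lam1, x1 = eigvals[-1], eigvecs[:, -1]
lam2, x2 = eigvals[-2], eigvecs[:, -2]

# Deflated matrix of Approach 2
B = A - lam1 * np.outer(x1, x1)

# Starting guess; c_2 = x_2^T z0 must be non-zero for convergence to x_2
z0 = np.array([1.0, 1.0, 1.0])
c2 = x2 @ z0

lam_est, z = power_iteration(B, z0)
print(f"c_2 = {c2:.3f}, estimate = {lam_est:.12f}, lambda_2 = {lam2:.12f}")
```

If `z0` is replaced by a vector orthogonal to `x2`, then in exact arithmetic the iteration could not pick up the $\mathbf x_2$ direction, which is the condition derived above.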


### Question 2.4

Suppose $|\lambda_1| > |\lambda_2| > |\lambda_3| > \dots > |\lambda_n|$, and suppose we want to compute all eigenvalues, starting with $\lambda_1$.

Describe how you can do this with Algorithm PI, by generalising the idea in 

(i) Approach 1,

(ii) Approach 2.

**[8 marks]**

**Answer to Q2.4**

*(i) Approach 1: Apply Algorithm PI to $\mathbf A$, with starting guess $\mathbf z^{(0)} = (\mathbf A - \lambda_1 \mathbf I) \mathbf u$, for some $\mathbf u \in \mathbb R^{n} \setminus \{\mathbf 0\}$ and tolerance $\varepsilon$.*

To generalise this, we multiply the vector $\mathbf u$ by $(\mathbf A - \lambda_i \mathbf I)$ for every $i$ up to, but not including, the index of the eigenvalue we want. For example, if we want $\lambda_3$ then the initial guess is $\mathbf z^{(0)} = (\mathbf A - \lambda_1 \mathbf I)(\mathbf A - \lambda_2 \mathbf I) \mathbf u$. I will show that this works for $\mathbf z^{(0)} = (\mathbf A - \lambda_1 \mathbf I)(\mathbf A - \lambda_2 \mathbf I) \mathbf u$.

We can write $\mathbf u$ as a linear combination of the eigenvectors of $\mathbf A$: let $\mathbf u = \sum_{i=1}^{n}c_i\mathbf x_i$. Multiplying $\mathbf z^{(0)}$ repeatedly by $\mathbf A$ gives:

$$\mathbf A^k \mathbf z^{(0)} = 
\mathbf A^k(\mathbf A - \lambda_1 \mathbf I)(\mathbf A - \lambda_2 \mathbf I) \mathbf u = 
\mathbf A^k(\mathbf A^2 - \lambda_1\mathbf A - \lambda_2\mathbf A  + \lambda_1\lambda_2\mathbf I)\mathbf u=
(\mathbf A^{k+2} - \lambda_1\mathbf A^{k+1} - \lambda_2\mathbf A^{k+1} + \lambda_1\lambda_2\mathbf A^{k})\mathbf u $$

Now we substitute the linear combination of eigenvectors of $\mathbf A$, $\sum_{i=1}^{n}c_i\mathbf x_i$, for $\mathbf u$ and we get:
$$\sum_{i=1}^{n} (\mathbf A^{k+2} - \lambda_1\mathbf A^{k+1} - \lambda_2\mathbf A^{k+1} + \lambda_1\lambda_2\mathbf A^{k})c_i\mathbf x_i$$

Here we multiply in $c_i\mathbf x_i$ to get:
$$\sum_{i=1}^{n} \mathbf A^{k+2}c_i\mathbf x_i - \lambda_1\mathbf A^{k+1}c_i\mathbf x_i - \lambda_2\mathbf A^{k+1}c_i\mathbf x_i + \lambda_1\lambda_2\mathbf A^{k}c_i\mathbf x_i$$ 

Here we can use the property of eigenvalues since we know that $\mathbf A^{k}\mathbf x_i = \lambda_i^{k}\mathbf x_i$ to get:
$$\sum_{i=1}^{n} c_i\lambda_i^{k+2}\mathbf x_i - c_i\lambda_1\lambda_i^{k+1}\mathbf x_i - c_i\lambda_2\lambda_i^{k+1}\mathbf x_i + c_i\lambda_1\lambda_2\lambda_i^{k}\mathbf x_i$$ 
$$\sum_{i=1}^{n} c_i\lambda_i^{k}\lambda_i^{2}\mathbf x_i - c_i\lambda_1\lambda_i^{k}\lambda_i^{1}\mathbf x_i - c_i\lambda_2\lambda_i^{k}\lambda_i^{1}\mathbf x_i + c_i\lambda_1\lambda_2\lambda_i^{k}\mathbf x_i$$ 

Factorize out  $ c_i\mathbf x_i\lambda_i^{k}$ to get:
$$\sum_{i=1}^{n} c_i\mathbf x_i\lambda_i^{k}(\lambda_i^{2} -\lambda_1\lambda_i - \lambda_2\lambda_i+ \lambda_1\lambda_2)$$ 
$$\sum_{i=1}^{n} c_i\mathbf x_i\lambda_i^{k}(\lambda_i -\lambda_1)(\lambda_i - \lambda_2)$$ 

Therefore, as before, the terms for $i = 1$ and $i = 2$ vanish, since the factor $(\lambda_i -\lambda_1)(\lambda_i - \lambda_2)$ is zero for them. The dominant remaining term is the one with $\lambda_3^k$ (provided $c_3 \neq 0$), since $|\lambda_3|$ is greater than the absolute value of all the other remaining eigenvalues.

We can generalise this: if we want the $m$th eigenvalue, we let $\mathbf z^{(0)} = \prod_{j=1}^{m-1}(\mathbf A - \lambda_j \mathbf I)\mathbf u$. Evaluating $\mathbf A^k\mathbf z^{(0)}$ as above gives $\mathbf A^k \mathbf z^{(0)} =\sum_{i=1}^{n} c_i\mathbf x_i\lambda_i^{k}\prod_{j=1}^{m-1}(\lambda_i -\lambda_j)$, so the terms for the first $m-1$ eigenvalues vanish and the first non-zero term is that of the $m$th eigenvalue, which is the dominant one in this case. Starting with $\lambda_1$, each eigenvalue is computed in turn using the ones found before it.
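
As a small check of the filtering effect, the sketch below expands the filtered starting vector in the eigenbasis and compares the coefficients with the formula $c_i(\lambda_i - \lambda_1)(\lambda_i - \lambda_2)$. The matrix and the vector `u` are assumed examples, and the exact eigenpairs come from `np.linalg.eigh` rather than from Algorithm PI.

```python
import numpy as np

# Hypothetical symmetric test matrix and starting vector u
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
u = np.array([1.0, 2.0, 3.0])

# Reference eigenpairs, reordered so that |lambda_1| > |lambda_2| > |lambda_3|
eigvals, eigvecs = np.linalg.eigh(A)
order = np.argsort(-np.abs(eigvals))
lam, X = eigvals[order], eigvecs[:, order]

# Filtered starting guess z0 = (A - lambda_1 I)(A - lambda_2 I) u
I = np.eye(3)
z0 = (A - lam[0] * I) @ ((A - lam[1] * I) @ u)

# Coefficients of z0 in the orthonormal eigenbasis are x_i^T z0
coeffs = X.T @ z0

# Formula from the derivation: c_i (lambda_i - lambda_1)(lambda_i - lambda_2)
c = X.T @ u
expected = c * (lam - lam[0]) * (lam - lam[1])

print("coefficients of z0:", coeffs)
print("formula           :", expected)
```

The first two coefficients are zero up to rounding error. In floating point arithmetic they are typically not exactly zero, which is relevant to Question 2.5.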

*(ii) Approach 2: Apply Algorithm PI to $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$, with starting guess $\mathbf z^{(0)} \in \mathbb R^{n} \setminus \{\mathbf 0\}$ and tolerance $\varepsilon$.*

For approach 2, if we want the $(m+1)$th eigenvalue we can subtract $\sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}$ from $\mathbf A$. This is because, if we do the same analysis as before for the matrix $\mathbf A - \sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}$, we can see that for all $\mathbf x_j$ with $1 \leq j \leq m$ we have that:

$$(\mathbf A - \sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T})\mathbf x_j = 
\mathbf A\mathbf x_j - \sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}\mathbf x_j = 
\lambda_j \mathbf x_j - \lambda_j \mathbf x_j = 0
$$

We are using the fact that $\mathbf x_i^T\mathbf x_j = 1$ if $i=j$ and 0 otherwise, because the eigenvectors of $\mathbf A$ are orthonormal.

So the eigenvectors $\mathbf x_1, \dots, \mathbf x_m$ all have eigenvalue 0, since $(\mathbf A - \sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T})\mathbf x_j= 0\mathbf x_j$ for $j \leq m$, while for $j > m$ every term of the sum vanishes and the eigenvalue remains $\lambda_j$. The dominant eigenvalue in the expression $\mathbf B^k \mathbf z^{(0)}$, where $\mathbf B = \mathbf A - \sum_{i=1}^m\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}$, is therefore the first eigenvalue which is not zero, namely $\lambda_{m+1}$ with eigenvector $\mathbf x_{m+1}$.

So to get the first eigenvalue $\lambda_1$ we start by using the PI algorithm on $\mathbf A$. For the second eigenvalue $\lambda_2$ we use the PI algorithm on $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T}$, and for the third eigenvalue we use $\mathbf A - \lambda_1 \mathbf x_1 \mathbf x_1^\mathrm{T} - \lambda_2 \mathbf x_2 \mathbf x_2^\mathrm{T}$. This can be generalised to any eigenvalue:


If we want the $(j+1)$th eigenvalue then we can use Algorithm PI on the matrix $\mathbf A - \sum_{i=1}^j\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}$. This always computes the $(j+1)$th eigenvalue since, by construction of the new matrix, the first $j$ eigenvalues are zero, so the dominant one is the $(j+1)$th, which is the eigenvalue we want.
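
The generalised Approach 2 can be sketched as a loop that deflates the matrix after each eigenpair is found. This is a minimal illustration under assumptions: `power_iteration` is a hypothetical stand-in for Algorithm PI with a fixed iteration count, `all_eigenpairs_by_deflation` is an illustrative name, and the test matrix is an assumed example with distinct $|\lambda_i|$.

```python
import numpy as np

def power_iteration(M, z0, num_iter=2000):
    """Hypothetical stand-in for Algorithm PI (fixed number of iterations)."""
    z = z0 / np.linalg.norm(z0)      # normalise the starting guess
    for _ in range(num_iter):
        w = M @ z                    # multiply by the matrix ...
        z = w / np.linalg.norm(w)    # ... and renormalise
    return z @ M @ z, z              # Rayleigh quotient and unit eigenvector

def all_eigenpairs_by_deflation(A, seed=0):
    """Compute all eigenpairs of symmetric A by repeated deflation."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)   # random starting guesses
    B = A.astype(float).copy()          # working matrix, deflated in place
    lams, vecs = [], []
    for _ in range(n):
        # The dominant eigenpair of the current deflated matrix
        lam, x = power_iteration(B, rng.standard_normal(n))
        lams.append(lam)
        vecs.append(x)
        # Subtract lambda x x^T so this eigenvalue becomes (approximately) 0
        B = B - lam * np.outer(x, x)
    return np.array(lams), np.column_stack(vecs)

# Hypothetical symmetric test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lams, vecs = all_eigenpairs_by_deflation(A)
reference = np.sort(np.linalg.eigvalsh(A))[::-1]   # descending order
print("deflation:", lams)
print("reference:", reference)
```

Updating `B` incrementally costs one rank-one update per eigenvalue, rather than rebuilding the full sum $\sum_{i=1}^j\lambda_i \mathbf x_i \mathbf x_i^\mathrm{T}$ each time.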

### Question 2.5

In practice, we will only be able to compute approximate values $\lambda^{(k)} \approx \lambda_1$ and $\mathbf z^{(k)} \approx \sigma_k \mathbf x_1$, where $\sigma_k \in \{-1,1\}$.

What influence could this have on the convergence of Approaches 1 and 2 for computing $\lambda_2$ and $\mathbf x_2$?

**[4 marks]**

**Answer to Q2.5**

For *approach 1*, write $\lambda_1' = \lambda^{(k)} \approx \lambda_1$ for the computed approximation, so that the starting guess is $\mathbf z^{(0)} = (\mathbf A - \lambda_1' \mathbf I)\mathbf u$ and $\mathbf A^k \mathbf z^{(0)} = \sum_{i=1}^{n} (\lambda_{i} - \lambda_{1}')\lambda_{i}^{k}c_i\mathbf x_i$. The issue arises for $i = 1$: the factor $(\lambda_{1} - \lambda_{1}')$ is not necessarily equal to 0, since $\lambda_1'$ is just an approximation. The $\mathbf x_1$ term is then still present in the sum, and because $|\lambda_1| > |\lambda_2| > |\lambda_3| \geq \dots \geq |\lambda_n|$ the factor $\lambda_1^k$ eventually dominates, however small $(\lambda_1 - \lambda_1')c_1$ is. As a result, this initial guess does not allow us to calculate $\lambda_2$: as $k \rightarrow \infty$ Algorithm PI converges to $\lambda_1$ and $\mathbf x_1$ instead of $\lambda_2$ and $\mathbf x_2$.

For *approach 2*, when we calculate $(\mathbf A - \lambda_1' \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_1$ using the approximation $\lambda_1' \approx \lambda_1$ we get that:
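
The behaviour of Approach 1 with an inexact $\lambda_1$ can be illustrated as follows. This is a sketch under assumptions: `power_iteration` is a hypothetical stand-in for Algorithm PI with a fixed iteration count, the perturbation of size $10^{-6}$ is an arbitrary choice, and the test matrix is an assumed example.

```python
import numpy as np

def power_iteration(M, z0, num_iter):
    """Hypothetical stand-in for Algorithm PI (fixed number of iterations)."""
    z = z0 / np.linalg.norm(z0)      # normalise the starting guess
    for _ in range(num_iter):
        w = M @ z                    # multiply by the matrix ...
        z = w / np.linalg.norm(w)    # ... and renormalise
    return z @ M @ z, z              # Rayleigh quotient and unit vector

# Hypothetical symmetric positive definite test matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Reference values of lambda_1 and lambda_2 (eigvalsh sorts ascending)
eigvals = np.linalg.eigvalsh(A)
lam1, lam2 = eigvals[-1], eigvals[-2]

# Approximate lambda_1 with an assumed error of 1e-6
lam1_approx = lam1 + 1e-6

# Approach 1 starting guess built from the approximate eigenvalue
u = np.array([1.0, 2.0, 3.0])
z0 = (A - lam1_approx * np.eye(3)) @ u

# After a few iterations the estimate is close to lambda_2 ...
est_few, _ = power_iteration(A, z0, num_iter=10)
# ... but the small x_1 component keeps growing and eventually takes over
est_many, _ = power_iteration(A, z0, num_iter=200)

print(f"lambda_1 = {lam1:.6f}, lambda_2 = {lam2:.6f}")
print(f"estimate after  10 iterations: {est_few:.6f}")
print(f"estimate after 200 iterations: {est_many:.6f}")
```

In this experiment the iteration first appears to settle near $\lambda_2$, then drifts to $\lambda_1$ as $k$ grows, in line with the analysis above.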

$$(\mathbf A - \lambda_1' \mathbf x_1 \mathbf x_1^\mathrm{T})\mathbf x_1 = 
\mathbf A \mathbf x_1 - \lambda_1' \mathbf x_1 \mathbf x_1^\mathrm{T}\mathbf x_1 = 
 \lambda_1 \mathbf x_1 -  \lambda_1' \mathbf x_1 \ne 0.
$$

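This is because $\lambda_1 \mathbf x_1 -  \lambda_1' \mathbf x_1 = (\lambda_1 -\lambda_1')\mathbf x_1$, and in general $\lambda_1 -\lambda_1' \ne 0$. Therefore the eigenvalue of $\mathbf A - \lambda_1' \mathbf x_1 \mathbf x_1^\mathrm{T}$ for the first eigenvector is not 0 but $\lambda_1 - \lambda_1'$. Unlike in approach 1, however, this eigenvalue is small: as long as $|\lambda_1 - \lambda_1'| < |\lambda_2|$, the dominant eigenvalue of the deflated matrix is still $\lambda_2$, so the PI algorithm still converges to $\lambda_2$ and $\mathbf x_2$. Using $\mathbf z^{(k)} \approx \sigma_k \mathbf x_1$ in place of $\mathbf x_1$ (the sign $\sigma_k$ cancels in $\mathbf z^{(k)} (\mathbf z^{(k)})^\mathrm{T}$) perturbs the deflated matrix slightly further, so in practice we converge to approximations of $\lambda_2$ and $\mathbf x_2$ whose accuracy is limited by that of $\lambda^{(k)}$ and $\mathbf z^{(k)}$.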



# Question 3 [35 marks]

Let $\mathbf A \in \mathbb R^{n \times n}$ be symmetric, and suppose $\mathbf A$ diagonalises as $\mathbf A = \mathbf Q \mathbf D \mathbf Q^{-1}$, where the diagonal matrix $\mathbf{D} \in \mathbb R^{n \times n}$ has the eigenvalues of $\mathbf A$ as diagonal entries, and the columns of the orthogonal matrix $\mathbf{Q} \in \mathbb R^{n \times n}$ are the corresponding eigenvectors of $\mathbf{A}$, in the same order. 

Consider the matrix $\mathbf{A} + \mathbf{\Delta A} \in \mathbb R^{n \times n}$, and assume for simplicity that the matrix $\mathbf{\Delta A}$ is such that all eigenvalues of $\mathbf{A} + \mathbf{\Delta A}$ are real. We have the following theorem.

**Theorem 1** If $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda| \leq \|\mathbf{\Delta A}\|_2$.

The theorem is useful for example when computing eigenvalues in floating point arithmetic, ensuring that a small rounding error in storing the matrix $\mathbf A$ leads to small errors in the corresponding eigenvalues.

### Question 3.1

*You may use, without proof, that the eigenvalues of a diagonal matrix are its diagonal entries, and that the inverse of a diagonal matrix is the diagonal matrix with reciprocal diagonal elements.*

Prove Theorem 1 by performing the following steps:	

(i) Show that $\|\mathbf B\|_2 = \max_{i=1, \dots, n} |b_{ii}|$, for any diagonal matrix $\mathbf B \in \mathbb R^{n \times n}$. **[3 marks]**

(ii) Using part (i), show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$, with $\Delta \lambda$ either such that $\Delta \lambda = 0$ or such that $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$. **[6 marks]**

(iii) Show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \|\mathbf{\Delta A}\|_2^{-1}$, with the convention that $\|\mathbf B^{-1}\|_2 = \infty$ if $\mathbf B$ is singular. **[6 marks]**

(iv) Show that $\|\mathbf B\|_2 = 1$, for any orthogonal matrix $\mathbf B \in \mathbb R^{n \times n}$. **[2 marks]**

(v) Using parts (ii) - (iv), show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda| \leq \|\mathbf \Delta A\|_2$. **[8 marks]**

**Answer to Q3.1**

*(i) Show that $\|\mathbf B\|_2 = \max_{i=1, \dots, n} |b_{ii}|$, for any diagonal matrix $\mathbf B \in \mathbb R^{n \times n}$.*

By definition, $\|\mathbf B\|_2 = \sqrt{\rho(\mathbf{B^TB})}$, where $\rho(\mathbf{B^TB}) = \max\{|\lambda|: \lambda \text{ is an eigenvalue of } \mathbf{B^TB}\}$. Since $\mathbf{B}$ is diagonal, $\mathbf{B} = \mathbf{B^T}$, which means that $\mathbf{B^TB} = \mathbf{BB} = \mathbf{B^2}$.

The matrix $\mathbf{B^2}$ is also a diagonal matrix, with diagonal entries $b_{ii}^2$ for $i = 1, \dots, n$. Since $\mathbf{B^2}$ is diagonal, its eigenvalues are its diagonal entries, which in this case are $b_{ii}^2$.

Therefore $\|\mathbf B\|_2 = \sqrt{\rho(\mathbf{B^TB})} = \sqrt{\rho(\mathbf{B^2})} =  \sqrt{\max_{i=1, \dots, n} |b_{ii}^2|} = \sqrt{\max_{i=1, \dots, n} |b_{ii}|^2}$.

Since the square root is an increasing function, $\sqrt{\max_{i=1, \dots, n} |b_{ii}|^2} = \max_{i=1, \dots, n} \sqrt{|b_{ii}|^2} =  \max_{i=1, \dots, n} |b_{ii}|$.

Therefore we have shown that $\|\mathbf B\|_2 = \max_{i=1, \dots, n} |b_{ii}|$ and so we are done.
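
A quick numerical illustration of part (i), using `np.linalg.norm(B, 2)` (the largest singular value) as the reference 2-norm. The diagonal entries are arbitrary assumed values.

```python
import numpy as np

# Hypothetical diagonal entries, including a negative one
b = np.array([3.0, -7.0, 0.5, 2.0])

# Build the diagonal matrix B
B = np.diag(b)

# 2-norm of B computed by NumPy (largest singular value)
norm_2 = np.linalg.norm(B, 2)

# Value predicted by part (i): the largest |b_ii|
max_abs_diag = np.max(np.abs(b))

print(norm_2, max_abs_diag)
```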

*(ii) Using part (i), show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$, with $\Delta \lambda$ either such that $\Delta \lambda = 0$ or such that $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$.*

First consider the case where $\lambda$ is also an eigenvalue of $\mathbf{A}$, so that $\lambda \mathbf I - \mathbf D$ has a zero on its diagonal and is singular. Then $\mathbf A$ has the eigenvalue $\lambda = \lambda + 0$, so the statement holds with $\Delta \lambda = 0$. Note that this does not require $\mathbf{\Delta A} = \mathbf 0$: the matrices $\mathbf{A} + \mathbf{\Delta A}$ and $\mathbf{A}$ only need to share the eigenvalue $\lambda$, not the corresponding eigenvector.

Now we can look at $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$. We can rewrite this expression as $|\Delta \lambda| = \frac{1}{\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2}$.

Looking at the expression $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2$, we can see that $\lambda \mathbf I - \mathbf D$ is a diagonal matrix with entries of the form $\lambda - \mu_{ii}$, where $\mu_{ii}$ is the $i$th eigenvalue of $\mathbf{A}$. In this second case $\lambda$ is not an eigenvalue of $\mathbf A$, so every $\lambda - \mu_{ii}$ is non-zero and the matrix is invertible. Its inverse $(\lambda \mathbf I - \mathbf D)^{-1}$ is the diagonal matrix with entries $\frac{1}{\lambda - \mu_{ii}}$, since the inverse of a diagonal matrix is the diagonal matrix with reciprocal diagonal entries.

Now we can evaluate $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2$, since by part (i) it is equal to the maximum absolute value of the diagonal entries. Therefore in our case we get that $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2 = \max_{i=1, \dots, n} |\frac{1}{\lambda - \mu_{ii}}|$. This value is largest when $|\lambda - \mu_{ii}|$ is smallest, that is when $\mu_{ii}$ is closest to $\lambda$. Therefore let $\mu_{ii}'$ be the $\mu_{ii}$ closest to $\lambda$.

Now we have that $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2 = |\frac{1}{\lambda - \mu_{ii}'}|$, now we can evaluate the expression since:

$$\frac{1}{\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2} =
\frac{1}{|\frac{1}{\lambda - \mu_{ii}'}|} =
|{\lambda - \mu_{ii}'}|
$$

So we have that $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} = |{\lambda - \mu_{ii}'}| = |{\mu_{ii}' - \lambda}|$. Defining $\Delta \lambda = \mu_{ii}'- \lambda$, we have $\mu_{ii}' =\lambda +\Delta \lambda$ and $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$. Since $\mu_{ii}'$ is an eigenvalue of $\mathbf A$ (the one closest to $\lambda$), we have shown that $\mathbf A$ does indeed have the eigenvalue $\lambda +\Delta \lambda$.
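
The identity in part (ii) says that $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$ is the distance from $\lambda$ to the nearest eigenvalue of $\mathbf A$. The sketch below checks this for assumed values of the eigenvalues `mu` and of `lam`.

```python
import numpy as np

# Hypothetical eigenvalues of A (the diagonal of D)
mu = np.array([5.0, 2.0, -1.0])

# Hypothetical eigenvalue of A + Delta A (not an eigenvalue of A)
lam = 2.3

# Form lambda I - D and the 2-norm of its inverse
D = np.diag(mu)
inv_norm = np.linalg.norm(np.linalg.inv(lam * np.eye(3) - D), 2)

# |Delta lambda| as given in part (ii)
abs_delta = 1.0 / inv_norm

# Eigenvalue of A closest to lam, and the signed Delta lambda
nearest = mu[np.argmin(np.abs(lam - mu))]
delta = nearest - lam

print(f"|Delta lambda| = {abs_delta:.6f}, nearest eigenvalue = {nearest}")
```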

*(iii) Show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \|\mathbf{\Delta A}\|_2^{-1}$, with the convention that $\|\mathbf B^{-1}\|_2 = \infty$ if $\mathbf B$ is singular.*

I will show this directly. Since we know that $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$ then we know that $(\mathbf{A} + \mathbf{\Delta A})\textbf x =\lambda  \textbf x$ for some eigenvector $\textbf x$.

$$(\mathbf{A} + \mathbf{\Delta A})\textbf x =\lambda  \textbf x$$
$$ \mathbf{A}\textbf x + \mathbf{\Delta A}\textbf x =\lambda  \textbf x$$
$$ \mathbf{\Delta A}\textbf x =\lambda  \textbf x - \mathbf{A}\textbf x$$
$$ \mathbf{\Delta A}\textbf x =( \lambda \textbf I   - \mathbf{A})\textbf x$$
$$ ( \lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A}\textbf x =\textbf I\textbf x$$

Here we assume that $(\lambda \textbf I   - \mathbf{A})$ is non-singular, which is what allows us to multiply by its inverse in the last step. If it is singular then, by the stated convention, $\|(\lambda \textbf I   - \mathbf{A})^{-1}\|_2 = \infty$ and the inequality holds trivially. Note that the last step only tells us how $( \lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A}$ acts on the single vector $\textbf x$, so we cannot conclude that this matrix is the identity. Instead we take the 2-norm of both sides and use the definition of the induced matrix norm:

$$ \|\textbf x\|_2 = \|( \lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A}\textbf x\|_2 \leq \|(\lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A} \|_2 \|\textbf x\|_2$$

Since $\textbf x$ is an eigenvector it is non-zero, so we can divide by $\|\textbf x\|_2 > 0$ to get:

$$  \|(\lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A} \|_2 \geq 1 $$

Now we need to split the norm. We can do this by using Exercise 2 (ii) from Workshop 2, which states that $\|\mathbf{AB}\|_2 \leq \|\mathbf{A}\|_2 \|\mathbf{B}\|_2$. Using this we get that:

$$  \|(\lambda \textbf I   - \mathbf{A})^{-1}\mathbf{\Delta A} \|_2 \leq \|(\lambda \textbf I   - \mathbf{A})^{-1} \|_2 \|\mathbf{\Delta A} \|_2$$ 

Combining the last two inequalities we get that:

$$  \|(\lambda \textbf I   - \mathbf{A})^{-1} \|_2 \|\mathbf{\Delta A} \|_2 \geq  1$$ 

Simplifying this further, since $ \|\mathbf{\Delta A} \|_2$ is a positive number (it cannot be zero by the inequality above), we get:
$$  \|(\lambda \textbf I   - \mathbf{A})^{-1} \|_2 \geq  \frac{1}{ \|\mathbf{\Delta A} \|_2}$$ 
$$  \|(\lambda \textbf I   - \mathbf{A})^{-1} \|_2 \geq  \|\mathbf{\Delta A} \|_2^{-1}$$ 

And so we have shown that if  $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \|\mathbf{\Delta A}\|_2^{-1}$.
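
The inequality of part (iii) can be checked numerically for a concrete pair of matrices. The symmetric matrices `A` and `dA` below are assumed examples chosen so that no eigenvalue of `A + dA` coincides with one of `A`.

```python
import numpy as np

# Hypothetical symmetric matrix A and symmetric perturbation dA
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
dA = np.array([[0.05, 0.02, 0.00],
               [0.02, -0.03, 0.01],
               [0.00, 0.01, 0.04]])

I = np.eye(3)

# Right-hand side of (iii): 1 / ||dA||_2
rhs = 1.0 / np.linalg.norm(dA, 2)

# Left-hand side of (iii) for every eigenvalue lam of A + dA
lhs = np.array([np.linalg.norm(np.linalg.inv(lam * I - A), 2)
                for lam in np.linalg.eigvalsh(A + dA)])

print("||(lam I - A)^-1||_2:", lhs)
print("1 / ||dA||_2        :", rhs)
```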

*(iv) Show that $\|\mathbf B\|_2 = 1$, for any orthogonal matrix $\mathbf B \in \mathbb R^{n \times n}$.*

To see this we can use properties of orthogonal matrices, in particular: if $\textbf B$ is orthogonal then $\textbf B^T = \textbf B^{-1}$, which means that $\textbf B^T\textbf B = \textbf B^{-1}\textbf B = \textbf I$.

So when we use the definition of the 2-norm for matrices we get that $\|\mathbf B\|_2 = \sqrt{\rho(\mathbf{B^TB})} = \sqrt{\rho(\mathbf{B^{-1}B})} = \sqrt{\rho(\mathbf{I})} = \sqrt{\max\{|\lambda|: \lambda \text{ is an eigenvalue of } \mathbf{I}\}}$.

However, the eigenvalues of the identity matrix are all 1, since they are just the entries on the diagonal, so in this case $\lambda = 1$. Therefore we get that:

$$\|\mathbf B\|_2 = \sqrt{\max\{|\lambda|: \lambda \text{ is an eigenvalue of } \mathbf{I}\}} = \sqrt{\max\{1\}} = \sqrt{1} = 1$$

As a result we get that:
$$\|\mathbf B\|_2 =  1$$
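
As an illustration of part (iv), the sketch below builds an orthogonal matrix from the QR factorisation of a random matrix (an assumed way of generating an example) and computes its 2-norm.

```python
import numpy as np

# Random 4 x 4 matrix with a fixed seed for reproducibility
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))

# The Q factor of a QR factorisation is an orthogonal matrix
Q, _ = np.linalg.qr(M)

# Check orthogonality: Q^T Q should be the identity up to rounding
orth_error = np.linalg.norm(Q.T @ Q - np.eye(4))

# 2-norm of Q, expected to be 1 by part (iv)
norm_2 = np.linalg.norm(Q, 2)

print(f"||Q^T Q - I|| = {orth_error:.2e}, ||Q||_2 = {norm_2:.15f}")
```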

*(v) Using parts (ii) - (iv), show that if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda| \leq \|\mathbf \Delta A\|_2$.*

From part (ii) we have that either $\Delta \lambda = 0$, in which case the bound holds trivially, or $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1}$. From part (iii) we have that $ \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \|\mathbf{\Delta A}\|_2^{-1}$. We can rearrange the inequality in part (iii) to get the following expression:

$$
\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \|\mathbf{\Delta A}\|_2^{-1} \iff
\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \geq \frac{1}{\|\mathbf{\Delta A}\|_2} \iff
\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2\|\mathbf{\Delta A}\|_2 \geq 1 \iff 
\|\mathbf{\Delta A}\|_2 \geq \frac{1}{\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2}$$ 

$$\iff$$ 

$$\|\mathbf{\Delta A}\|_2 \geq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1}$$

And so we get that $\|\mathbf{\Delta A}\|_2 \geq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1} $. The chain we want is $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} \leq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1} \leq \|\mathbf{\Delta A}\|_2$, so if we show that $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} \leq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1}$ then we know that $|\Delta \lambda| \leq \|\mathbf{\Delta A}\|_2$ by transitivity. 

What is important to note is that showing $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} \leq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1}$  is equivalent to showing $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2 \leq \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2 $ 

To do this we can focus on $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2$. Since $\mathbf A$ diagonalises as $\mathbf A = \mathbf Q \mathbf D \mathbf Q^{-1}$, where the diagonal matrix $\mathbf{D} \in \mathbb R^{n \times n}$ has the eigenvalues of $\mathbf A$ as diagonal entries, and the columns of the orthogonal matrix $\mathbf{Q} \in \mathbb R^{n \times n}$ are the corresponding eigenvectors of $\mathbf{A}$, we can rewrite $\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2$ as $\|(\lambda \mathbf I - \mathbf Q \mathbf D \mathbf Q^{-1})^{-1}\|_2$.

We can also write $\textbf I = \mathbf{Q}\mathbf{Q^{-1}}$; since $\mathbf{Q}$ is orthogonal, its inverse is simply $\mathbf{Q^{-1}} = \mathbf{Q^{T}} $. Substituting this in we get:
$$\|(\lambda \mathbf{Q}\mathbf{Q^{-1}}- \mathbf Q \mathbf D \mathbf Q^{-1})^{-1}\|_2$$

Now we can factorize $\mathbf{Q}$ on the left hand side and $\mathbf{Q}^{-1}$ on the right hand side to get:
$$\|(\mathbf{Q}(\lambda \mathbf{Q^{-1}}- \mathbf D \mathbf Q^{-1}))^{-1}\|_2$$
$$\|(\mathbf{Q}(\lambda \mathbf I - \mathbf D)\mathbf{Q^{-1}})^{-1}\|_2$$

Now we take the inverse of the product inside the brackets. Since $(\mathbf X\mathbf Y\mathbf Z)^{-1} = \mathbf Z^{-1}\mathbf Y^{-1}\mathbf X^{-1}$, the order of the factors is reversed and we get:
$$\|\mathbf{Q}(\lambda \mathbf I - \mathbf D)^{-1}\mathbf{Q^{-1}}\|_2$$

Here we can use Exercise 2 (ii) from Workshop 2 to split the norm, which states that $\|\mathbf{AB}\|_2 \leq \|\mathbf{A}\|_2 \|\mathbf{B}\|_2$. Splitting twice gives:

$$\|\mathbf{Q}(\lambda \mathbf I - \mathbf D)^{-1}\mathbf{Q^{-1}}\|_2\leq 
\|\mathbf{Q}\|_2 \|(\lambda \mathbf I - \mathbf D)^{-1}\mathbf{Q^{-1}}\|_2 \leq 
\|\mathbf{Q}\|_2 \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2\|\mathbf{Q^{-1}}\|_2$$

Using (iv), both $\|\mathbf{Q}\|_2$ and $\|\mathbf{Q^{-1}}\|_2$ are equal to 1, since both are orthogonal matrices: $\mathbf{Q^{-1}} = \mathbf{Q^{T}}$, and the transpose of an orthogonal matrix is still orthogonal. Therefore we have that: 

$$\|\mathbf{Q}(\lambda \mathbf I - \mathbf D)^{-1}\mathbf{Q^{-1}}\|_2\leq 
(1)\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2(1) = 
\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2 $$

However, we know that $\|\mathbf{Q}(\lambda \mathbf I - \mathbf D)^{-1}\mathbf{Q^{-1}}\|_2 = \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2$ by construction, therefore we have that:

$$\|(\lambda \mathbf I - \mathbf A)^{-1}\|_2\leq 
\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2 $$

Since we have shown the above, the equivalent inequality $\|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} \leq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1}$ also holds, which makes the following chain of inequalities true: $|\Delta \lambda| = \|(\lambda \mathbf I - \mathbf D)^{-1}\|_2^{-1} \leq \|(\lambda \mathbf I - \mathbf A)^{-1}\|_2^{-1} \leq \|\mathbf{\Delta A}\|_2$. Therefore, if $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda|\leq \|\mathbf{\Delta A}\|_2$.
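
Theorem 1 as a whole can be checked on a random example. In the sketch below the matrices are generated from a seeded random number generator (an assumption for illustration), and $\mathbf{\Delta A}$ is taken symmetric so that all eigenvalues of $\mathbf A + \mathbf{\Delta A}$ are real, as required.

```python
import numpy as np

# Seeded generator so the example is reproducible
rng = np.random.default_rng(1)
n = 5

# Random symmetric A
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Small random symmetric perturbation dA
E = rng.standard_normal((n, n))
dA = 1e-3 * (E + E.T) / 2

# Eigenvalues of A and of A + dA (eigvalsh is for symmetric matrices)
mu = np.linalg.eigvalsh(A)
lams = np.linalg.eigvalsh(A + dA)

# The bound in Theorem 1
bound = np.linalg.norm(dA, 2)

# For each lam, |Delta lambda| is the distance to the nearest eigenvalue of A
abs_delta = np.min(np.abs(mu[None, :] - lams[:, None]), axis=1)

print("|Delta lambda|:", abs_delta)
print("||dA||_2      :", bound)
```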

### Question 3.2

Show that the estimate in Theorem 1 is not always sharp, by finding a particular example of matrices $\mathbf A$ and $\mathbf{\Delta A}$ that satisfy the assumptions given in the beginning of this question, but for which $|\Delta \lambda| < \|\mathbf {\Delta A}\|_2$. Justify your example.

**[5 marks]**

**Answer to Q3.2**

We know that: If $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda| \leq \|\mathbf{\Delta A}\|_2$. We can interpret $\mathbf{A} + \mathbf{\Delta A}$ as the "computed" version of $\mathbf{A}$ resulting from storing $\mathbf{A}$ in floating point arithmetic.

In this case we will let $\mathbf{A}$ be a 2 by 2 matrix which is symmetric and diagonalises as $\mathbf{A} = \mathbf{IAI^{-1}}$, that is with $\mathbf Q = \mathbf I$ and $\mathbf D = \mathbf A$:

$$\mathbf{A} =
			\begin{bmatrix}
				1*10^{32}+1*10^{16}	&	0	\\
				0	&	1 + 1*10^{-16}		\\
			\end{bmatrix}.
		$$
        
When we store $\mathbf{A}$ in floating point arithmetic it will be:

$$\mathbf{A} + \mathbf{\Delta A} =
			\begin{bmatrix}
				1*10^{32}	&	0	\\
				0	&	1 	\\
			\end{bmatrix}.
$$
And so in this case $\mathbf{\Delta A}$ will be:
$$\mathbf{\Delta A} =
			\begin{bmatrix}
				-1*10^{16}	&	0	\\
				0	&	-1*10^{-16}	 	\\
			\end{bmatrix}.
$$

Therefore the eigenvalues $\lambda$ of $\mathbf{A} + \mathbf{\Delta A}$ are $1*10^{32}$ and 1, and the corresponding eigenvalues $\lambda + \Delta \lambda$ of $\mathbf A$ are $1*10^{32}+1*10^{16}$ and $1 + 1*10^{-16}$. This is because both are diagonal matrices, and the eigenvalues are just the entries on the diagonal.

Firstly, we can see that $\|\mathbf{\Delta A}\|_2 = \max\{|-1*10^{16}|, |-1*10^{-16}|\} = 1*10^{16}$, since by Question 3.1 (i) the norm of the diagonal matrix $\mathbf{\Delta A}$ is the maximum absolute value of its diagonal entries.


Since $|(\lambda + \Delta \lambda) - \lambda| = |\Delta \lambda| $, in our case we can split this into two:

(1) $|\Delta \lambda| = |1*10^{32}+1*10^{16} - 1*10^{32}| = |1*10^{16}| = 1*10^{16}$.

(2) $|\Delta \lambda| = |1+1*10^{-16} - 1| = |1*10^{-16}| = 1*10^{-16}$.


Now we can evaluate $|\Delta \lambda|  \leq \|\mathbf{\Delta A}\|_2$. In the first case we have equality, $|\Delta \lambda|  = \|\mathbf{\Delta A}\|_2$, since both are equal to $1*10^{16}$. But in the second case we have the strict inequality $|\Delta \lambda| = 1*10^{-16} < 1*10^{16} = \|\mathbf{\Delta A}\|_2$. And so in this case $\|\mathbf{\Delta A}\|_2$ is not a sharp upper bound on $|\Delta \lambda|$.


In [7]:
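# Build A from the example; dtype=float stores it in double precision, so
# each entry is rounded to the nearest representable floating point number
A = np.array([[1*10**(32)+1*10**(16), 0], 
              [0,1+1*10**(-16)]], dtype=float)

# Print the stored matrix to see the effect of rounding on the diagonal
print(f"This is how A is stored: \n{A}")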

This is how A is stored: 
[[1.e+32 0.e+00]
 [0.e+00 1.e+00]]
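
The printed output only shows the stored entries to a few digits. As a rough cross-check, the sketch below uses Python's exact `fractions.Fraction` arithmetic to compute the rounding error in each diagonal entry. It assumes IEEE double precision, and the variable names are illustrative. Note that the stored value of $10^{32}+10^{16}$ is the nearest double, which is not exactly $10^{32}$, so the entries of $\mathbf{\Delta A}$ quoted above are idealised; the strict inequality for the second eigenvalue holds either way.

```python
from fractions import Fraction

# Exact diagonal entries of A (its eigenvalues), held as exact rationals
exact = [Fraction(10**32 + 10**16), 1 + Fraction(1, 10**16)]

# Entries as stored in double precision (eigenvalues of A + Delta A);
# Fraction(float) converts a stored double back to an exact rational
stored = [Fraction(float(10**32 + 10**16)), Fraction(1.0 + 1e-16)]

# Diagonal of Delta A = (A + Delta A) - A
delta_A = [s - e for s, e in zip(stored, exact)]

# By Question 3.1 (i), the 2-norm of a diagonal matrix is its largest |entry|
norm_delta_A = max(abs(d) for d in delta_A)

# |Delta lambda| for each pair of eigenvalues
abs_delta_lambda = [abs(e - s) for s, e in zip(stored, exact)]

# Report whether each |Delta lambda| is strictly below the bound or equal to it
for value in abs_delta_lambda:
    relation = "<" if value < norm_delta_A else "="
    print(f"{float(value):.3e} {relation} {float(norm_delta_A):.3e}")
```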


### Question 3.3

Show that the estimate in Theorem 1 can be sharp, by finding a particular example of matrices $\mathbf A$ and $\mathbf{\Delta A}$ that satisfy the assumptions given in the beginning of this question and for which $|\Delta \lambda| = \|\mathbf {\Delta A}\|_2$. Justify your example.

**[5 marks]**

**Answer to Q3.3**

We know that: If $\lambda$ is an eigenvalue of $\mathbf{A} + \mathbf{\Delta A}$, then $\mathbf A$ has an eigenvalue $\lambda + \Delta \lambda$ with $|\Delta \lambda| \leq \|\mathbf{\Delta A}\|_2$. We can interpret $\mathbf{A} + \mathbf{\Delta A}$ as the "computed" version of $\mathbf{A}$ resulting from storing $\mathbf{A}$ in floating point arithmetic.

In this case we will let $\mathbf{A}$ be a 2 by 2 matrix which is symmetric and which diagonalises as $\mathbf{A} = \mathbf{IAI^{-1}}$, that is with $\mathbf Q = \mathbf I$ and $\mathbf D = \mathbf A$:

$$\mathbf{A} =
			\begin{bmatrix}
				1+1*10^{16}	&	0	\\
				0	&	1 + 1*10^{16}		\\
			\end{bmatrix}.
		$$
        
When we store $\mathbf{A}$ in floating point arithmetic we will have:

$$\mathbf{A} + \mathbf{\Delta A} =
			\begin{bmatrix}
				1*10^{16}	&	0	\\
				0	&	1*10^{16} 	\\
			\end{bmatrix}.
$$
And so in this case $\mathbf{\Delta A}$ will be:
$$\mathbf{\Delta A} =
			\begin{bmatrix}
				-1	&	0	\\
				0	&	-1	 	\\
			\end{bmatrix}.
$$

Therefore the eigenvalues $\lambda$ of $\mathbf{A} + \mathbf{\Delta A}$ are both $1*10^{16}$, and the eigenvalues $\lambda + \Delta \lambda$ of $\mathbf A$ are both $1+1*10^{16}$.

Since $|(\lambda + \Delta \lambda) - \lambda| = |\Delta \lambda| $, in our case this is $|\Delta \lambda| = |1+1*10^{16} - 1*10^{16}| = |1| = 1$. Therefore we have for all eigenvalues in this question: $|\Delta \lambda| = 1$.

We can also see that $\|\mathbf{\Delta A}\|_2 = |-1| = 1$, since by Question 3.1 (i) the norm of the diagonal matrix $\mathbf{\Delta A}$ is the maximum absolute value of its diagonal entries. Therefore we have that $\|\mathbf{\Delta A}\|_2 = 1$.

Since $|\Delta \lambda| = 1$ and $\|\mathbf{\Delta A}\|_2 = 1$ for all the eigenvalues of this pair of matrices, we have $|\Delta \lambda|  = \|\mathbf{\Delta A}\|_2$, and so in this case the estimate in Theorem 1 is sharp.

In [8]:
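# Build A from the example; dtype=float stores it in double precision, so
# each entry is rounded to the nearest representable floating point number
A = np.array([[1+1*10**(16), 0], 
              [0,1+1*10**(16)]], dtype=float)

# Print the stored matrix to see the effect of rounding on the diagonal
print(f"This is how A is stored: \n{A}")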

This is how A is stored: 
[[1.e+16 0.e+00]
 [0.e+00 1.e+16]]
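
The rounding error in this example can be confirmed with exact integer arithmetic. The sketch below assumes IEEE double precision with round-to-nearest-even, and the variable names are illustrative.

```python
# Exact diagonal entry of A, held as an arbitrary-precision Python integer
exact_entry = 10**16 + 1

# The same entry after storage in double precision, converted back to an
# exact integer (doubles of this size are whole numbers)
stored_entry = int(float(exact_entry))

# Diagonal entry of Delta A = (A + Delta A) - A
delta_entry = stored_entry - exact_entry

# ||Delta A||_2 for Delta A = delta_entry * I, by Question 3.1 (i)
norm_delta_A = abs(delta_entry)

# |Delta lambda| = |eigenvalue of A - eigenvalue of A + Delta A|
abs_delta_lambda = abs(exact_entry - stored_entry)

print(delta_entry, norm_delta_A, abs_delta_lambda)
```

Both $|\Delta \lambda|$ and $\|\mathbf{\Delta A}\|_2$ are computed exactly here, so their equality does not depend on floating point comparisons.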
