Quite often in basic machine learning applications -- say with linear regression -- we gather $n$ samples of data and look to fit a model to it.  Note: we often have *a lot* of data, and in fact $n$ can be any natural number.  For illustrative purposes, and without loss of generality, this posting will use $n = 5$. 

Note that we typically also have multiple different features in our data, but **the goal of this posting is to strip down ideas to their very core**, so we consider the one feature case.  Also note that in machine learning we may use notation like $\mathbf {Xw} = \mathbf y$, where we solve for the weights in $\mathbf w$.  However, this posting uses the typical Linear Algebra setup of $\mathbf{Ax} = \mathbf b$, where we are interested in solving for $\mathbf x$.  

So initially we may just have the equation

$\mathbf{Ax} = \begin{bmatrix}
a_1\\ 
a_2\\ 
a_3\\ 
a_4\\ 
a_5
\end{bmatrix} \begin{bmatrix}
x_1\\ 
\end{bmatrix} = \mathbf b$

**This original 'data' matrix will also be written as**

$\mathbf a = \begin{bmatrix}
a_1\\ 
a_2\\ 
a_3\\ 
a_4\\ 
a_5
\end{bmatrix}$

Note that when we gather real world data there is noise in the data, so we would be *extremely* surprised if any of the entries in $\mathbf a$ are duplicates.  So, unless otherwise noted, assume that each entry $a_i$ in $\mathbf a$ is unique. Since there is only one column (and it is not the zero vector), the column rank of $\mathbf A$ is one, and since column rank = row rank, we know that the row rank = 1. 

Then we decide to insert a bias /affine translation piece (in index position zero -- to use notation from Caltech's "Learning From Data").  

Thus we end up with the following equation

$\mathbf{Ax} = \begin{bmatrix}
1 & a_1\\ 
1 & a_2\\ 
1 & a_3\\ 
1 & a_4\\ 
1 & a_5
\end{bmatrix} \begin{bmatrix}
x_0\\
x_1\\ 
\end{bmatrix} = x_0 \mathbf 1 + x_1 \mathbf a = \mathbf b$

Column 0 of $\mathbf A$ is the ones vector, also denoted as $\mathbf 1$.  

At this point we know that $\mathbf A$ still has full column rank (i.e. rank = 2) -- if this wasn't the case, this would imply that we could scale column 0 to get column 1 (i.e. everything in column 1 would have to be identical).   

From here we may simply decide to do least squares and solve (which we always can do when we have full column rank, and $\mathbf A $ has m rows and n columns, where $m \geq n$).  
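
As a minimal sketch of this least squares step (the sample values below are made up for illustration, and NumPy is an assumption since the posting itself names no library), we can build the two-column $\mathbf A$, confirm it has rank 2, and solve for $x_0$ and $x_1$:

```python
import numpy as np

# Hypothetical data: five distinct sample points and made-up targets.
a = np.array([0.5, 1.0, 2.0, 3.5, 4.0])
b = np.array([1.1, 1.9, 4.2, 6.8, 8.1])

# Design matrix: the ones vector in column 0, the data in column 1.
A = np.column_stack([np.ones_like(a), a])

# Full column rank (here 2) is what makes the least squares solution unique.
rank = np.linalg.matrix_rank(A)

# Least squares solution: x[0] is the bias x_0, x[1] is the weight x_1.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

# Cross-check against the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(rank, x, x_normal)
```

The normal equations are included only as a cross-check; in practice `lstsq` is usually the better conditioned route.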

Or we may decide to map this to a higher dimensional space that has a quadratic term.  

$\mathbf{Ax} = \begin{bmatrix}
1 & a_1 & a_1^2\\ 
1 & a_2 & a_2^2\\ 
1 & a_3 & a_3^2\\ 
1 & a_4 & a_4^2\\ 
1 & a_5 & a_5^2
\end{bmatrix} \begin{bmatrix}
x_0\\
x_1\\ 
x_2\\
\end{bmatrix} = \mathbf b$


At this point we may just do least squares and solve.  But that requires $\mathbf A$ to have full column rank.  How do we know that $\mathbf A$ has full column rank?  An intuitive way to think about it is that squaring each $a_i$ to get column 2 is not a linear transformation, so we would not expect it to be a linear combination of prior columns.  

$\mathbf a \circ \mathbf a \neq \gamma_0 \mathbf 1 + \gamma_1 \mathbf a$

where $\circ$ denotes the Hadamard product.  And by the earlier argument, we know $\mathbf a \neq \gamma_0 \mathbf 1$, hence the columns are linearly independent.  There is another (more mathematically exact) way to verify linear independence of these columns -- which comes from the Vandermonde Matrix, and we will address this shortly.  

We may however decide we want an even higher dimensional space for our data, so we add a cubic term:

$\mathbf{Ax} = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3\\ 
1 & a_2 & a_2^2 & a_2^3\\ 
1 & a_3 & a_3^2 & a_3^3\\ 
1 & a_4 & a_4^2 & a_4^3\\ 
1 & a_5 & a_5^2 & a_5^3
\end{bmatrix} \begin{bmatrix}
x_0\\
x_1\\ 
x_2\\
x_3\\
\end{bmatrix} = \mathbf b$

Again we may be confident that the columns are linearly independent because our new column -- cubing $\mathbf a$ -- is not a linear transformation (or alternatively, using the Hadamard product is not a linear transformation), so we write: 

$\mathbf a \circ \mathbf a \circ \mathbf a \neq \gamma_0 \mathbf 1 + \gamma_1 \mathbf a + \gamma_2 \big(\mathbf a \circ \mathbf a\big)$

And if the above is *still* not enough, we may add a term to the fourth power:

$\mathbf{Ax} = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3 & a_1^4\\ 
1 & a_2 & a_2^2 & a_2^3 & a_2^4\\ 
1 & a_3 & a_3^2 & a_3^3 & a_3^4\\ 
1 & a_4 & a_4^2 & a_4^3 & a_4^4\\ 
1 & a_5 & a_5^2 & a_5^3 & a_5^4
\end{bmatrix} \begin{bmatrix}
x_0\\
x_1\\ 
x_2\\
x_3\\
x_4\\
\end{bmatrix} = \mathbf b$

Again we are quite confident that the above has full column rank because 

$\mathbf a \circ \mathbf a \circ \mathbf a \circ \mathbf a \neq \gamma_0 \mathbf 1 + \gamma_1 \mathbf a + \gamma_2 \big(\mathbf a \circ \mathbf a\big) + \gamma_3 \big(\mathbf a \circ \mathbf a \circ \mathbf a \big)$
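
These inequalities can be spot-checked numerically. The sketch below (hypothetical sample values, NumPy assumed) tries to write each new column as a least squares combination of the prior columns and records how far off the best attempt is; a strictly positive residual means no exact combination exists for this particular $\mathbf a$:

```python
import numpy as np

# Hypothetical distinct sample values.
a = np.array([0.5, 1.0, 2.0, 3.5, 4.0])

# Columns: 1, a, a∘a, a∘a∘a, a∘a∘a∘a (increasing powers).
V = np.vander(a, increasing=True)

residual_norms = []
for k in range(1, V.shape[1]):
    prior, new_col = V[:, :k], V[:, k]
    # Best coefficients gamma for new_col ≈ prior @ gamma.
    gamma, *_ = np.linalg.lstsq(prior, new_col, rcond=None)
    residual_norms.append(np.linalg.norm(prior @ gamma - new_col))

print(residual_norms)
```

This is only evidence for one data set, not a proof; the determinant argument is what covers every set of distinct values.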

We may be tempted to go to an even higher dimensional space at this point, but this requires considerable justification.  Notice that $\mathbf A$ is a square matrix now, and as we've argued, it has full column rank -- which means it also has full row rank.  Thus we can be sure to solve the above equation for a single, exact solution, where $\mathbf x = \mathbf A^{-1}\mathbf b$.  If we were to go to a higher dimensional space we would be entering the world of an underdetermined system of equations -- see postings titled "Underdetermined_System_of_Equations.ipynb" for the L2 norm oriented solution, and "underdetermined_regression_minimize_L1_norm.ipynb" for the L1 norm oriented solution.  Since we can already be certain of solving for a single exact solution in this problem, we will stop mapping to higher dimensions here.  

In the above equation of $\mathbf{Ax} = \mathbf b$, the square $\mathbf A$ is a Vandermonde matrix.  Technical note: some texts say that $\mathbf A$ is the Vandermonde matrix, while others say $\mathbf A^T$ is the Vandermonde matrix.  The calculation of the determinant is identical, and for other properties, a mere small book-keeping adjustment is required.
  
Note that the Vandermonde matrix is well studied, has special fast matrix vector multiplication (i.e. $\lt O(n^2)$) algorithms associated with it -- and a very special type of Vandermonde matrix is the Discrete Fourier Transform matrix.  The Vandermonde matrix  also has some very interesting properties for thinking about eigenvalues. 
- - - -
As a slight digression, consider the case where $a_1 = a_2$.  If this were true, the maximal row rank of $\mathbf A$ would be 4, and hence the maximal column rank would also be 4, and thus $\mathbf A$ would not be full rank aka $det\big(\mathbf A\big) = 0$.  
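
A small numerical illustration of this digression, assuming NumPy and made-up values with $a_1 = a_2$:

```python
import numpy as np

# Hypothetical data in which the first two samples coincide (a_1 == a_2).
a = np.array([1.0, 1.0, 2.0, 3.0, 4.0])
A = np.vander(a, increasing=True)   # columns: 1, a, a^2, a^3, a^4

rank = np.linalg.matrix_rank(A)     # two identical rows cost one unit of rank
det = np.linalg.det(A)

print(rank, det)
```
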
- - - -
There is another, more exacting way to verify that $\mathbf A$ is full rank.  Let's look at the determinant of $\mathbf A^T$.  There are a few different ways to derive its formula.  Sergei Winitzki had an interesting proof using wedge products -- that I may revisit at some point in the future.  For now, I'll just notice that there is a rather obvious 'pattern' to these Vandermonde matrices, so we'll do the proof using mathematical induction, which takes advantage of this pattern / progression in polynomial terms.  

**claim**: 

for natural number $n \geq 2$ where $\mathbf A \in \mathbb R^{n \times n}$, and $\mathbf A$ is a Vandermonde matrix, 

$det \big(\mathbf A \big) = det \big(\mathbf A^T \big) = \prod_{1 \leq i \lt j \leq n} (a_j - a_i)$

*Base Case:* 

$n = 2$

$\mathbf A^T = \begin{bmatrix}
1 & 1\\ 
a_1 & a_2
\end{bmatrix}$

$det \big(\mathbf A^T \big) = (a_2 - a_1) = \prod_{1 \leq i \lt j \leq n} (a_j - a_i)$

*sneak peek:*  
if we follow the row operation procedure used during the inductive case, what we'd have is:

$det \big(\mathbf A^T \big) = det\Big(\begin{bmatrix}
1 & 1\\ 
0 & (a_2 - a_1)
\end{bmatrix}\Big) = 1*(a_2 - a_1)$


*Inductive Case:*

For $n \gt 2$, assume the formula is true for any $(n-1) \times (n-1)$ Vandermonde matrix $\mathbf C$ (built from any $n-1$ values)

i.e. assume true where 

$\mathbf C = \begin{bmatrix}
1 & 1 & 1 & \dots & 1\\ 
a_1 & a_2 & a_3 & \dots & a_{n-1}\\ 
a_1^2 & a_2^2 & a_3^2 & \dots & a_{n-1}^2\\ 
\vdots & \vdots & \vdots & \ddots & \vdots\\ 
a_{1}^{n-2} & a_{2}^{n-2} & a_{3}^{n-2} & \dots & a_{n-1}^{n-2}
\end{bmatrix}$

Note that we call this submatrix $\mathbf C$ -- a matrix of exactly this form, built from the values $a_2, a_3, ..., a_n$, will make a reappearance shortly!


We need to show that the formula holds true where dimension of $\mathbf A$ is $n$ x $n$. Thus consider the case where:

$\mathbf A^T  = \begin{bmatrix}
1 & 1 & 1 & \dots & 1 & 1\\ 
a_1 & a_2 & a_3 & \dots & a_{n-1} & a_n \\ 
a_1^2 & a_2^2 & a_3^2 & \dots & a_{n-1}^2 & a_{n}^2\\ 
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 
a_{1}^{n-2} & a_{2}^{n-2} & a_{3}^{n-2} & \dots & a_{n-1}^{n-2} & a_{n}^{n-2}\\
a_{1}^{n-1} & a_{2}^{n-1} & a_{3}^{n-1} & \dots & a_{n-1}^{n-1} & a_{n}^{n-1}
\end{bmatrix} $

**Procedure:**
Subtract $a_1$ times row $i - 1$ from row $i$, for $0 \lt i \leq n - 1$ (rows are indexed from zero), **starting from the bottom of the matrix and working our way up** (i.e. each operation uses a row that has not yet been modified, so the operations / subproblems do not overlap in this regard).  

- - - - - 
**Justification:**

First, the reason we'd like to do this is because we see an obvious pattern in the polynomial progression in each column of $\mathbf A^T$.  Thus by following this procedure, we can zero out all entries in the zeroth column of $\mathbf A^T$ except the 1 located in the top left (i.e. in position $(0, 0)$).  This will allow us to, in effect, reduce our problem to the $(n - 1) \times (n - 1)$ dimensional case.  

Also recall that the determinant of $\mathbf A^T$ is equivalent to the determinant of $\mathbf A$. Thus the above procedure is equivalent to subtracting a scaled version of column 0 of the original $\mathbf A$ from column 1, and a scaled version of column 1 in the original $\mathbf A$ from column 2, and so on.  These are standard operations that are well understood to not change the calculated determinant over any legal field. 

Since your author particularly likes Gram-Schmidt and orthogonality, there is an additional more visual interpretation that can be used over inner product spaces (i.e. real or complex fields).  Consider that $\mathbf A = \mathbf{QR}$, thus $det \big(\mathbf A \big) = det \big(\mathbf{QR} \big) = det \big(\mathbf{Q} \big)det \big(\mathbf{R} \big)$.  Notice that these column operations will have no impact on $\mathbf Q$, and will only change the value of entries above the diagonal in $\mathbf R$, thus there is no change in $det \big(\mathbf{Q} \big)$ or $det \big(\mathbf{R} \big)$ (which is given by the product of its diagonal entries).  This means there is no change in $det \big(\mathbf{A} \big)$.  


- - - - - 

$ = \begin{bmatrix}
1 & 1 & 1 & \dots & 1 & 1\\ 
0 & a_2 - a_1 & a_3 - a_1 & \dots & a_{n-1} - a_1 & a_n - a_1 \\ 
0 & a_2^2 - a_1 a_2 & a_3^2 - a_1 a_3 & \dots & a_{n-1}^2 - a_1 a_{n-1} & a_{n}^2 - a_1 a_{n}\\ 
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 
0 & a_{2}^{n-2} - a_1 a_{2}^{n-3} & a_{3}^{n-2} - a_1 a_{3}^{n-3} & \dots & a_{n-1}^{n-2} - a_1 a_{n-1}^{n-3} & a_{n}^{n-2} - a_1 a_{n}^{n-3}\\
0 & a_{2}^{n-1} - a_1 a_2^{n-2} & a_{3}^{n-1} - a_1 a_3^{n-2}& \dots & a_{n-1}^{n-1} -  a_1 a_{n-1}^{n-2}& a_{n}^{n-1} - a_1 a_{n}^{n-2}
\end{bmatrix} $

$ = \begin{bmatrix}
1 & 1 & 1 & \dots & 1 & 1\\ 
0 & (a_2 - a_1) 1 & (a_3 - a_1)1 & \dots & (a_{n-1} - a_1) 1 & (a_n - a_1) 1 \\ 
0 & (a_2 - a_1) a_2 & (a_3 - a_1) a_3 & \dots & (a_{n-1} - a_1) a_{n-1} & (a_n - a_1) a_{n}\\ 
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 
0 & (a_2 - a_1)a_{2}^{n-3} & (a_3 - a_1)a_{3}^{n-3} & \dots & (a_{n-1} - a_1)a_{n-1}^{n-3} & (a_n - a_1)a_{n}^{n-3}\\
0 & (a_2 - a_1)a_{2}^{n-2} & (a_3 - a_1)a_{3}^{n-2} & \dots & (a_{n-1} - a_1)a_{n-1}^{n-2} & (a_n - a_1)a_{n}^{n-2} 
\end{bmatrix}  $

we can rewrite this as 

$= \begin{bmatrix}
1 & \mathbf 1^T\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix}$


where 

$\mathbf D = Diag\Big(\begin{bmatrix}
a_2 & a_3 & a_4 & \dots & a_n
\end{bmatrix}^T \Big) - a_1 \mathbf I =    \begin{bmatrix}
(a_2-a_1) & 0 &  0& \dots & 0\\ 
0 & (a_3 - a_1) &0  &\dots  &0 \\ 
0 & 0 & (a_4 - a_1) & \dots & 0\\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
0 & 0 & 0 & \dots & (a_n - a_1)
\end{bmatrix}$

Note that $\begin{bmatrix}
1 & \mathbf 1^T\\ \mathbf 0 & \mathbf{CD} \end{bmatrix} - \lambda \begin{bmatrix}
1 & \mathbf 0^T\\ \mathbf 0 & \mathbf{I} \end{bmatrix} = \begin{bmatrix}
1 - \lambda & \mathbf 1^T\\ \mathbf 0 & \mathbf{CD} - \lambda \mathbf I \end{bmatrix}$, which is not invertible when $\lambda = 1$ (because the leftmost column is all zeros).  

Hence we know that there is an eigenvalue of 1, given by the top left diagonal entry, associated with $\begin{bmatrix}
1 & \mathbf 1^T\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix}$. We'll call this $\lambda_1$ -- for the first eigenvalue of the "MatrixAfterRowOperations".  

Thus the determinant can be written as 

$det\big(\mathbf A^T \big) = det\big(\text{MatrixAfterRowOperations}\big) = (\lambda_1) * \big(\lambda_2  * \lambda_3 * ... * \lambda_n\big) = (1) * det\big(\mathbf{CD}\big) = det\big(\mathbf{C}\big) det\big(\mathbf{D}\big)$
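
The row operation procedure and the resulting block structure can be checked numerically. The following is a sketch under stated assumptions: the values in `a` are made up, NumPy is assumed, and `C` is taken to be the Vandermonde matrix (transposed layout) built from $a_2, ..., a_n$, which is what the lower right block works out to:

```python
import numpy as np

# Hypothetical distinct values a_1..a_n.
a = np.array([0.5, -1.0, 2.0, 3.0, 1.5])
n = len(a)

# A^T: row i holds the i-th powers of a_1..a_n.
At = np.vander(a, increasing=True).T

# Apply the procedure from the bottom row upward, so each step subtracts
# a_1 times a row that has not been modified yet.
M = At.copy()
for i in range(n - 1, 0, -1):
    M[i] -= a[0] * M[i - 1]

# The claimed lower right block: C built from a_2..a_n, D = Diag(a_j - a_1).
C = np.vander(a[1:], increasing=True).T
D = np.diag(a[1:] - a[0])

print(np.allclose(M[1:, 1:], C @ D))
print(np.linalg.det(At), np.linalg.det(C) * np.linalg.det(D))
```
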



- - - - -

**begin interlude** 

The fact that 
$det\big(\begin{bmatrix}
1 & \mathbf *\\ 
\mathbf 0 & \mathbf{Z}
\end{bmatrix}\big) = 1 * det\big(\mathbf{Z}\big)$

is well understood via properties of block matrices over many fields.  However, as is often the case, there is an additional interpretation over inner product spaces, that makes use of orthogonality.  **This interlude is a bit overkill and may safely be skipped**.

Another way to think about this is to borrow from the Schur Decomposition $\mathbf X = \mathbf V \mathbf R \mathbf V^{H}$ where $\mathbf V$ is unitary and $\mathbf R$ is upper triangular.  Equivalently, $ \mathbf V^H \mathbf X  \mathbf V = \mathbf R$.  Also, we know the eigenvector associated with $\lambda_1$ (which is $\begin{bmatrix}1 \\ \mathbf 0 \\ \end{bmatrix}$) can be chosen to be the left most column of $\mathbf V$.  Since all columns in $\mathbf V$ are mutually orthonormal, all other columns must have a zero in the upper-most position.  Writing this out, and working through the blocked multiplication we get the following:

(note that $^H$ denotes conjugate transpose -- and of course if the values are real, then this acts like a regular transpose operation)

$\mathbf V^{H} \begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix} \mathbf V = \mathbf R$

$\mathbf V^{H} \begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix} \mathbf V = \begin{bmatrix}1 & \mathbf 0^H \\ 
\mathbf 0 & \mathbf{Q}
\end{bmatrix}^H \begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD} \end{bmatrix} \begin{bmatrix}
1 & \mathbf 0^H \\ 
\mathbf 0 & \mathbf{Q}
\end{bmatrix}= \begin{bmatrix}1 & \mathbf 0^H \\ 
\mathbf 0 & \mathbf Q^H
\end{bmatrix} \Big(\begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD} \end{bmatrix} \begin{bmatrix}
1 & \mathbf 0^H \\ 
\mathbf 0 & \mathbf{Q}
\end{bmatrix}\Big) $


$\mathbf V^{H} \begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix} \mathbf V =  \begin{bmatrix} 1 & \mathbf 0^H \\ 
\mathbf 0 & \mathbf Q^H
\end{bmatrix} \Big(\begin{bmatrix}1 & \mathbf 1^H \mathbf Q \\ 
\mathbf 0 & \mathbf{CDQ}
\end{bmatrix}\Big) = \begin{bmatrix}1 & \mathbf 1^H \mathbf Q \\ 
\mathbf 0 & \mathbf{Q}^H\mathbf{ CDQ}
\end{bmatrix} = \begin{bmatrix}
1 & \mathbf 1^H \mathbf Q \\ 
\mathbf 0 & \mathbf{T}
\end{bmatrix}= \mathbf R$

Thus we know that the determinant we want comes from a similar matrix $\mathbf R$, whose determinant is the product of its eigenvalues (which are along its diagonal).  We further know that this is equal to $1 * det\big(\mathbf T\big) = 1 *det\big(\mathbf Q^H \mathbf{CDQ}\big) = det\big(\mathbf{Q}^H\big) det\big(\mathbf C\big) det\big(\mathbf D\big) det\big(\mathbf Q\big) = det\big(\mathbf C\big)det\big(\mathbf D\big)$, via the fact that upper triangular matrix $\mathbf T = \mathbf Q^H \mathbf{CDQ}$, then applying multiplicative properties of determinants (and perhaps noticing that $\mathbf{CD}$ is similar to $\mathbf T$).  

Thus $det\Big(\begin{bmatrix}
1 & \mathbf 1^H\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix}\Big) = det\Big(\begin{bmatrix}
1 & \mathbf *\\ 
\mathbf 0 & \mathbf{CD}
\end{bmatrix}\Big) = det\big(\mathbf{CD}\big) = det\big(\mathbf{C}\big) det\big(\mathbf{D}\big)$

**end interlude**
- - - - -
We know that 

$\det\big(\mathbf{D}\big) = (a_2-a_1) * (a_3 - a_1) * ... * (a_n - a_1)$

because the determinant of a diagonal matrix is the product of its diagonal entries (i.e. its eigenvalues)  

and 

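$det \big(\mathbf C \big) = \prod_{2 \leq i \lt j \leq n} (a_j - a_i)$ 

by inductive hypothesis, applied to the $n - 1$ values $a_2, a_3, ..., a_n$ (the $\mathbf C$ appearing in $\mathbf{CD}$ is the $(n-1) \times (n-1)$ Vandermonde matrix built from these values, i.e. the matrix from the inductive hypothesis with its indices shifted by one).  
Thus we can say 

$ det\big(\mathbf A^T \big) = \big(\prod_{2 \leq i \lt j \leq n} (a_j - a_i)\big) \big((a_2-a_1) * (a_3 - a_1) * ... * (a_n - a_1)\big) = \prod_{1 \leq i \lt j \leq n} (a_j - a_i)$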

And the induction is proved.  

Finally, we note that $det \big(\mathbf A \big) = det \big(\mathbf A^T \big)$ because $\mathbf A$ and $\mathbf A^T$ have the same characteristic polynomials (or equivalently, they have the same eigenvalues). We have thus proved the determinant formula for $\mathbf A$.  
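
As a numerical sanity check of the formula (not a substitute for the proof), the sketch below compares the product of pairwise differences with a determinant computed by NumPy. The helper name `vandermonde_det_formula` and the sample values are hypothetical:

```python
import numpy as np
from itertools import combinations


def vandermonde_det_formula(a):
    """Product of (a_j - a_i) over all index pairs i < j."""
    return np.prod([a[j] - a[i] for i, j in combinations(range(len(a)), 2)])


# Hypothetical distinct values.
a = np.array([0.5, -1.0, 2.0, 3.0, 1.5])
A = np.vander(a, increasing=True)   # rows: 1, a_i, a_i^2, ...

det_numeric = np.linalg.det(A)
det_transpose = np.linalg.det(A.T)
det_formula = vandermonde_det_formula(a)

print(det_numeric, det_transpose, det_formula)
```
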

(Technical note: if $\mathbf A \in \mathbb C^{n \times n}$ then the above results still hold with respect to the magnitude of the determinant of $\mathbf A$.  This includes the very important special case of whether or not $\big\vert det\big(\mathbf A\big)\big\vert = 0$ -- i.e. whether or not $\mathbf A^{-1}$ exists.  However, with respect to the exact determinant, it would be more proper to state that $det\big(\mathbf A\big) = conjugate\Big(det\big(\mathbf A^H\big)\Big)$.) 
- - - -

This gives us another way to confirm that our Vandermonde Matrix is full rank.  We know that a square, finite dimensional matrix is singular iff it has a determinant of 0.  We then see that 

$\det \big(\mathbf A\big) = \big(\prod_{1 \leq i \lt j \leq n} (a_j - a_i)\big) = 0$ iff there is some $a_j = a_i$ where $i \neq j$.  

This of course is another way of saying that our Vandermonde Matrix is not full rank if some entry in our 'original' matrix of 

$\mathbf a = \begin{bmatrix}
a_1\\ 
a_2\\ 
a_3\\ 
a_4\\ 
a_5
\end{bmatrix}$

was not unique.  


- - - -
It is worth highlighting that if for some reason we did not like to explicitly use determinants, we could instead just repeatedly, and recursively, apply the above procedure as a type of Gaussian Elimination, and in the end we would have transformed $\mathbf{A}^T$ into the below Row Echelon form: 

$\begin{bmatrix}
1 & 1 & 1 &  \dots & 1 & 1\\
0 &(a_2-a_1) & * &   \dots & * & *\\ 
0& 0 & (a_3 - a_1)(a_3 - a_2)  &\dots & * & * \\ 
\vdots &\vdots & \vdots & \ddots & \vdots & \vdots \\ 
0&0 & 0 & \dots & \big(\prod_{1 \leq i \lt n-1} (a_{n-1} - a_i)\big) & *\\ 
0& 0 & 0 & \dots & 0 & \big(\prod_{1 \leq i \lt n} (a_{n} - a_i)\big)
\end{bmatrix}\mathbf x = \tilde{\mathbf b}$

where $*$ denotes an entry whose exact value is not needed here, and $\tilde{\mathbf b}$ is the right hand side after the same row operations have been applied to it.

(Of course, we can immediately notice that the determinant formula can be recovered by multiplying the diagonal elements of the above matrix.)

It is instructive to realize that we can solve for an exact $\mathbf x$ so long as we don't have any zeros on the diagonal of our above upper triangular / row echelon matrix.  We notice that this is the case if and only if all $a_i$ are unique.
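
The recursive elimination can be sketched in a few lines. This is an illustration under stated assumptions (made-up values, NumPy), where stage $k$ repeats the bottom-up procedure on the trailing block using the next $a$ value as the multiplier:

```python
import numpy as np

# Hypothetical distinct values a_1..a_n.
a = np.array([0.5, -1.0, 2.0, 3.0, 1.5])
n = len(a)

A = np.vander(a, increasing=True)
U = A.T.copy()   # start from A^T: row i holds the i-th powers

# Stage k zeroes column k below the diagonal; within a stage we work
# from the bottom row upward so each step uses an unmodified row.
for k in range(n - 1):
    for i in range(n - 1, k, -1):
        U[i] -= a[k] * U[i - 1]

# Expected diagonal: entry i is the product of (a_i - a_m) over m < i
# (zero-indexed here; the empty product for i = 0 is 1).
expected_diag = np.array([np.prod(a[i] - a[:i]) for i in range(n)])

print(np.diag(U))
print(expected_diag)
```

Multiplying the diagonal entries recovers the determinant, since each row operation used here leaves the determinant unchanged.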

- - - -



Furthermore, notice that this determinant formula gives us a proof that we have full column rank in any thinner (i.e. more rows than columns) version of our Vandermonde matrix.  E.g. consider the case of 


$\mathbf{A} = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3\\ 
1 & a_2 & a_2^2 & a_2^3\\ 
1 & a_3 & a_3^2 & a_3^3\\ 
1 & a_4 & a_4^2 & a_4^3\\ 
1 & a_5 & a_5^2 & a_5^3
\end{bmatrix}$


These columns must be linearly independent, so long as each $a_i \neq a_j$ where $i \neq j$.  If that was not the case, then appending additional columns until square (i.e. append $\mathbf a \circ \mathbf a \circ \mathbf a \circ \mathbf a$) would mean that 

$\mathbf A = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3 & a_1^4\\ 
1 & a_2 & a_2^2 & a_2^3 & a_2^4\\ 
1 & a_3 & a_3^2 & a_3^3 & a_3^4\\ 
1 & a_4 & a_4^2 & a_4^3 & a_4^4\\ 
1 & a_5 & a_5^2 & a_5^3 & a_5^4
\end{bmatrix} $

could not have full column rank either.  Yet we know this matrix is full rank via our determinant formula (again so long as each $a_i$ is unique) thus we know that the columns of any smaller  "long and skinny" version of this matrix must also be linearly independent.

Also, when each $a_i$ is unique, since we know that our Vandermonde matrix is full rank, we know that each of its rows is linearly independent.  If for some reason we had a 'short and fat' version of the above matrix, like:

$\mathbf A = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3 & a_1^4\\ 
1 & a_2 & a_2^2 & a_2^3 & a_2^4\\ 
1 & a_3 & a_3^2 & a_3^3 & a_3^4\\ 
\end{bmatrix} $

we would know that it is full row rank -- i.e. its rows are linearly independent.
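
Both the 'long and skinny' and the 'short and fat' cases can be spot-checked with `np.vander`, whose `N` argument sets the number of columns. The values are hypothetical and NumPy is an assumption:

```python
import numpy as np

# Hypothetical distinct values.
a = np.array([0.5, 1.0, 2.0, 3.5, 4.0])

thin = np.vander(a, N=4, increasing=True)      # 5 x 4: powers 0..3
fat = np.vander(a[:3], N=5, increasing=True)   # 3 x 5: powers 0..4

thin_rank = np.linalg.matrix_rank(thin)   # expect full column rank
fat_rank = np.linalg.matrix_rank(fat)     # expect full row rank

print(thin.shape, thin_rank, fat.shape, fat_rank)
```
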


It is also worth noting, that if each $a_i \neq 0$, then we can say that the square Vandermonde Matrix can also be viewed, with no loss of generality, as 

$\mathbf{Diag}\Big(\begin{bmatrix}
a_1 & a_2 & a_3 & a_4 & a_5
\end{bmatrix}^T \Big) \mathbf A = \begin{bmatrix}
a_1 & a_1^2 & a_1^3 & a_1^4 & a_1^5\\ 
a_2 & a_2^2 & a_2^3 & a_2^4 & a_2^5\\ 
a_3 & a_3^2 & a_3^3 & a_3^4 & a_3^5\\ 
a_4 & a_4^2 & a_4^3 & a_4^4 & a_4^5\\ 
a_5 & a_5^2 & a_5^3 & a_5^4 & a_5^5
\end{bmatrix} $


**Implication: A degree n-1 polynomial is completely given by n unique data points**

Assuming there is no noise in the data -- or numeric precision issues -- the Vandermonde matrix, $\mathbf A$, allows you to solve for the unique coefficients $x_0, x_1, ..., x_4$ of the polynomial 


$x_0 * 1 + x_1 a + x_2 a^2 + x_3 a^3 + x_4 a^4 = b$


- - - - -
$\mathbf{Ax} = \begin{bmatrix}
1 & a_1 & a_1^2 & a_1^3 & a_1^4\\ 
1 & a_2 & a_2^2 & a_2^3 & a_2^4\\ 
1 & a_3 & a_3^2 & a_3^3 & a_3^4\\ 
1 & a_4 & a_4^2 & a_4^3 & a_4^4\\ 
1 & a_5 & a_5^2 & a_5^3 & a_5^4
\end{bmatrix} \begin{bmatrix}
x_0\\
x_1\\ 
x_2\\
x_3\\
x_4\\
\end{bmatrix} = \mathbf b$
- - - - -
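
To make the implication concrete, here is a sketch (hypothetical coefficients and sample points, NumPy assumed) that evaluates a known degree 4 polynomial at five distinct points and then recovers its coefficients by solving $\mathbf{Ax} = \mathbf b$:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical "true" coefficients x_0..x_4 (lowest degree first).
x_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

# Five distinct sample points and the noise-free values b_i = p(a_i).
a = np.array([-1.0, -0.5, 0.0, 1.0, 2.0])
b = P.polyval(a, x_true)

# Square Vandermonde matrix with columns 1, a, a^2, a^3, a^4.
A = np.vander(a, increasing=True)

# Solving the linear system avoids forming A^{-1} explicitly.
x_solved = np.linalg.solve(A, b)

print(x_solved)
```

With noisy data or many more points, conditioning becomes a real concern; this example deliberately stays small and exact.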

The next extension is perhaps a bit more interesting.

**Extension: two ways to think about polynomials** 

Knowing that we can exactly specify a degree $n-1$ polynomial with $n$ distinct data points leads us to wonder:

is it 'better' to think about polynomials with respect to the coefficients or the data points?  In the above vector form -- the question becomes is it better to think about the polynomial in terms of $\mathbf x$ or $\mathbf b$? 

The answer is-- it depends.  To directly evaluate a function is much quicker when we know $\mathbf x$.  But as it turns out, when we want to multiply or convolve polynomials, it is considerably faster to know their point values contained in $\mathbf b$.  
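
A small sketch of the two views, under stated assumptions (made-up polynomials, NumPy): in coefficient form the product requires a convolution, while in point-value form it is just an elementwise product, after which the coefficients can be recovered through the Vandermonde matrix if needed.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Two hypothetical polynomials, coefficients lowest degree first.
p = np.array([1.0, 2.0, 3.0])    # 1 + 2a + 3a^2
q = np.array([-1.0, 0.0, 4.0])   # -1 + 4a^2

# Coefficient view: multiplying polynomials convolves their coefficients.
pq_coeffs = np.convolve(p, q)

# Point-value view: the product has degree 4, so 5 distinct points pin it
# down, and multiplication is elementwise on the sampled values.
a = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
b_prod = P.polyval(a, p) * P.polyval(a, q)

# Map the point values back to coefficients via the Vandermonde system.
A = np.vander(a, increasing=True)
pq_from_points = np.linalg.solve(A, b_prod)

print(pq_coeffs)
print(pq_from_points)
```
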

And since the Vandermonde matrix is so helpful for encapsulating all of our knowledge about a polynomial, a natural question is -- what if we wanted to make multiplying $\mathbf A^{-1} \mathbf b$ to get $\mathbf x$ at least as easy as just multiplying $\mathbf{Ax}$ to get $\mathbf b$?  The clear answer would mean finding a way so that you don't have to explicitly invert $\mathbf A$.  This can be done most easily if $\mathbf A$ is unitary (i.e. orthogonal albeit in a complex inner product space), hence $\mathbf A^H = \mathbf A^{-1}$.  If $\mathbf A$ is unitary, this directly leads us to the Discrete Fourier Transform.  (And from there to the Fast Fourier Transform which is widely regarded as one of the top 10 algorithms of the last 100 years.)
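
As a preview, the sketch below (NumPy assumed; the sign convention for `omega` is chosen to match `np.fft.fft`) builds a Vandermonde matrix whose points are the $n$-th roots of unity, and checks that after scaling by $1/\sqrt{n}$ it is unitary:

```python
import numpy as np

n = 5
# Primitive n-th root of unity, with the sign convention used by np.fft.fft.
omega = np.exp(-2j * np.pi / n)
nodes = omega ** np.arange(n)

# Vandermonde matrix on the roots of unity: F[j, k] = omega^(j * k).
F = np.vander(nodes, increasing=True)

# Scaling by 1/sqrt(n) makes the matrix unitary, so its inverse is U^H.
U = F / np.sqrt(n)

x = np.arange(n, dtype=float)
b = U @ x
x_back = U.conj().T @ b   # "inversion" via the conjugate transpose

print(np.allclose(U.conj().T @ U, np.eye(n)), np.allclose(x_back, x))
```
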

But first, let's work through a couple of important related ideas where we can apply Vandermonde matrices: (a) square matrices that have unique eigenvalues must be diagonalizable and (b) some interesting cyclic and structural properties underlying Permutation matrices. 



**Application of Vandermonde Matrices: Proof of Linear Independence of Eigenvectors associated with Unique Eigenvalues**

This proves that if a square (finite dimensional) matrix --aka an operator --has all eigenvalues that are unique, then the eigenvectors must be linearly independent.  Put differently, this proves that such an operator is diagonalizable.  

The typical argument for linear independence is in fact a touch shorter than this and does not need Vandermonde matrices -- however it relies on a contradiction that is not particularly satisfying.  The following proof -- adapted from Winitzki's *Linear Algebra via Exterior Products* -- is direct and, to your author, very intuitive.   

Consider a matrix $\mathbf B \in \mathbb C^{n \times n}$ which has $n$ unique eigenvalues -- i.e. $\lambda_i \neq \lambda_k$ whenever $i \neq k$.  

When testing for linear independence, we look at 

$\gamma_1 \mathbf v_1 + \gamma_2 \mathbf v_2 + ... + \gamma_n \mathbf v_n = \mathbf 0$  

and we can say that **the eigenvectors are linearly independent iff** the only solution is $\gamma_1 = \gamma_2 = ... = \gamma_n = 0$

Further, for $k \in \{1, 2, ..., n\}$, we know that  
$\mathbf B^0 \mathbf v_k  = \mathbf v_k$  
$\mathbf B \mathbf v_k = \lambda_k \mathbf v_k$  
$\mathbf B \mathbf B \mathbf v_k = \mathbf B^2 \mathbf v_k = \lambda_k^2 \mathbf v_k$  
$\vdots $  

$\mathbf B^{n-1} \mathbf v_k = \lambda_k^{n-1} \mathbf v_k$  


Thus we can take our original linear independence test,

$\gamma_1 \mathbf v_1 + \gamma_2 \mathbf v_2 + ... + \gamma_n \mathbf v_n = \mathbf 0$  

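and, by left multiplying both sides by $\mathbf B^r$ and using $\mathbf B^r \mathbf v_k = \lambda_k^r \mathbf v_k$, further generalize it to also include:

$ \lambda_1^r \gamma_1 \mathbf v_1 + \lambda_2^r  \gamma_2 \mathbf v_2 + ... + \lambda_n^r \gamma_n \mathbf v_n = \mathbf 0$  

for $r \in \{1, 2, ..., n-1\}$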
- - - -

Now let's collect these $n$ relationships in a system of equations:


$\bigg[\begin{array}{c|c|c|c}
\gamma_1 \mathbf v_1 & \gamma_2 \mathbf v_2 &\cdots & \gamma_n \mathbf v_n
\end{array}\bigg] \mathbf W = \bigg[\begin{array}{c|c|c|c}
  \mathbf 0 & \mathbf 0 &\cdots & \mathbf 0
\end{array}\bigg]$


where 

$\mathbf W = \begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \dots  & \lambda_1^{n-1}\\ 
1 & \lambda_2 & \lambda_2^2 & \dots &  \lambda_2^{n-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{n} & \lambda_{n}^{2} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix}$


Notice that $\mathbf W$ is a Vandermonde matrix. Since $\lambda_i \neq \lambda_k$ if $i \neq k$, we know that $det \big(\mathbf W\big) \neq 0$, and thus $\mathbf W^{-1}$ exists as a unique operator.  We multiply each term on the right by $\mathbf W^{-1}$.  

$\bigg[\begin{array}{c|c|c|c}
\gamma_1 \mathbf v_1 & \gamma_2 \mathbf v_2 &\cdots & \gamma_n \mathbf v_n
\end{array}\bigg]
 \mathbf W \mathbf W^{-1}= \bigg[\begin{array}{c|c|c|c}
\gamma_1 \mathbf v_1 & \gamma_2 \mathbf v_2 &\cdots & \gamma_n \mathbf v_n
\end{array}\bigg]\mathbf I = \bigg[\begin{array}{c|c|c|c}
  \mathbf 0 & \mathbf 0 &\cdots & \mathbf 0
\end{array}\bigg] \mathbf W^{-1} = \bigg[\begin{array}{c|c|c|c}
  \mathbf 0 & \mathbf 0 &\cdots & \mathbf 0
\end{array}\bigg]$  


Thus we know that 

$\bigg[\begin{array}{c|c|c|c}
\gamma_1 \mathbf v_1 & \gamma_2 \mathbf v_2 &\cdots & \gamma_n \mathbf v_n
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c}
  \mathbf 0 & \mathbf 0 &\cdots & \mathbf 0
\end{array}\bigg]$

By definition each eigenvector $\mathbf v_k \neq \mathbf 0$.  This means that each scalar $\gamma_k = 0$.  The eigenvectors have thus been proven to be linearly independent.  
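
The key identity in this proof can be illustrated numerically. The sketch below uses a made-up upper triangular $\mathbf B$ with distinct eigenvalues 1, 2, 3, 4 and arbitrary weights `gamma` (both hypothetical, with NumPy assumed), and checks that column $r$ of $\big[\gamma_1 \mathbf v_1 \vert \cdots \vert \gamma_n \mathbf v_n\big]\mathbf W$ equals $\mathbf B^r$ applied to $\sum_k \gamma_k \mathbf v_k$:

```python
import numpy as np

# Hypothetical matrix with distinct eigenvalues 1, 2, 3, 4 on its diagonal.
B = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0, 0.0],
              [0.0, 0.0, 3.0, 5.0],
              [0.0, 0.0, 0.0, 4.0]])
n = B.shape[0]

eigvals, eigvecs = np.linalg.eig(B)

# W is the Vandermonde matrix built from the eigenvalues.
W = np.vander(eigvals, increasing=True)

# Arbitrary weights; column k of `scaled` is gamma_k * v_k.
gamma = np.array([0.5, -1.0, 2.0, 3.0])
scaled = eigvecs * gamma

# Column r of scaled @ W is sum_k lambda_k^r gamma_k v_k, which should
# match B^r applied to sum_k gamma_k v_k.
lhs = scaled @ W
total = scaled.sum(axis=1)
rhs = np.column_stack([np.linalg.matrix_power(B, r) @ total for r in range(n)])

print(np.allclose(lhs, rhs), np.linalg.det(W), np.linalg.matrix_rank(eigvecs))
```

Because $det(\mathbf W) \neq 0$, the only way `lhs` can be the zero matrix is for every `gamma` entry to be zero, which is the conclusion of the proof.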


# Permutation Matrices and Periodic Behavior

Consider an $n$ x $n$ permutation matrix $\mathbf P$.  Note that this matrix is real valued, where each column (and each row) is all zeros except for a single 1.  

We'll use the $^H$ to denote conjugate transpose (even though the matrix is entirely real valued), as some imaginary numbers will creep in later.


It is easy to verify that the permutation matrix is unitary (in fact a special case, which is orthogonal), i.e. that 

$\mathbf P^H \mathbf P = \mathbf I$

because, by construction, each column in a permutation matrix has all zeros, except a single 1, and no two columns have their 1 in the same row -- hence the columns are mutually orthonormal.  

Further, as mentioned in "Schurs_Inequality.ipynb", such a matrix can be diagonalized where   

$\mathbf P = \mathbf{VDV}^H$

where each eigenvalue $\lambda_i$ is contained along the diagonal of $\mathbf D$ and is on the unit circle.  

Notice that for any permutation matrix: 

$\mathbf {P1} = \mathbf 1$

Hence such a permutation matrix has $\lambda_1 = 1$ (i.e. a permutation matrix is stochastic -- in fact doubly so).  
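
These properties are easy to spot-check. The sketch below builds a permutation matrix from a made-up index list `perm` (hypothetical, with NumPy assumed) and verifies orthogonality, the eigenvalue magnitudes, and $\mathbf{P1} = \mathbf 1$:

```python
import numpy as np

# Hypothetical permutation of 5 items (one 3-cycle and one 2-cycle).
perm = [2, 0, 1, 4, 3]

# Column j of P is the standard basis vector e_{perm[j]}.
P = np.eye(5)[:, perm]

ones = np.ones(5)
eig_magnitudes = np.abs(np.linalg.eigvals(P))

print(np.allclose(P.T @ P, np.eye(5)))   # orthogonal, hence unitary
print(eig_magnitudes)                    # all on the unit circle
print(P @ ones, ones @ P)                # row sums and column sums are 1
```
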

Note that $\mathbf P$ has all zeros, except a single 1 in each column (or equivalently, in each row), hence it can be interpreted as a special kind of Adjacency Matrix for a directed graph. 
- - - - 
Of particular interest is **the permutation matrix that relates to a connected graph** (i.e. where each node is reachable from each other node) with $n$ nodes.  One example, where $n = 6$ is the below:

$\mathbf P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 1\\ 
1 & 0 & 0 & 0 & 0 & 0\\ 
0 & 1 & 0 & 0 & 0 & 0\\ 
0 & 0 & 1 & 0 & 0 & 0\\ 
0 & 0 & 0 &  1&  0& 0\\ 
0 & 0 &  0& 0 & 1 &0 
\end{bmatrix}$

*Claim:*  
For a permutation matrix associated with a connected graph (i.e. where each node may be visited from each other node), the time it takes to repeat a visit to a node is $n$ iterations. 

*Proof:*  
since each node in the directed graph has an outbound connection to only one other node, and there are $n$ nodes total, if a cycle can occur in $\leq n - 1$ iterations, then the number of nodes you can reach from the starting node (including itself) is $\leq n - 1$ nodes, and hence the graph is not connected -- a contradiction. 


Thus we can say that $\mathbf P^0 = \mathbf P^n = \mathbf I$. So $trace\big(\mathbf P^0\big) = trace\big(\mathbf P^n\big) = n$.  

However, since no node can return to itself in fewer than $n$ iterations, the diagonal entries of $\mathbf P^k$ are all zero for $k \in \{1, 2, ..., n-2, n-1\}$. Thus we have:

$trace\big(\mathbf P^k\big) = 0$
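
For the $n = 6$ example above, the periodicity and trace claims can be confirmed directly. This is a sketch assuming NumPy; `np.roll` is just one convenient way to build that particular matrix:

```python
import numpy as np

n = 6
# Shift the rows of the identity down by one: ones on the subdiagonal
# plus a single 1 in the top right corner, matching the example P.
P = np.roll(np.eye(n), 1, axis=0)

# Traces of P^0, P^1, ..., P^n.
traces = [np.trace(np.linalg.matrix_power(P, k)) for k in range(n + 1)]

print(traces)
print(np.allclose(np.linalg.matrix_power(P, n), np.eye(n)))
```
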


Now consider the standard basis vectors $\mathbf e_j$ $j \in \{1, 2, ..., n-1, n\}$ -- i.e. column slices of the identity matrix, shown below:

$\bigg[\begin{array}{c|c|c|c}
\mathbf e_1 & \mathbf  e_2 &\cdots &\mathbf  e_n
\end{array}\bigg] = \mathbf I$

Each one of these vectors is a valid starting position for having a 'presence' on exactly one node of the graph. With no loss of generality, we could choose just one of the standard basis vectors to be our starting point -- e.g. set $\mathbf e_j := \mathbf e_1$.  However, we'll keep the notation $\mathbf e_j$, though the reader may decide to select a specific standard basis vector if helpful.

Since we have a connected graph and can only be on one position at a time as we iterate through, we know that $\mathbf P^k \mathbf e_j \perp \mathbf P^r \mathbf e_j$,

for natural number $r$, where $ 0 \leq r \lt k$, $k \in \{1, 2, ..., n-2, n-1\}$

Thus we can collect each location in the graph in an $n$ x $n$ matrix as below:

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c|c}
\mathbf V \mathbf D^0 \mathbf V^H\mathbf e_j & \mathbf V \mathbf D^1 \mathbf V^H\mathbf e_j & \mathbf V \mathbf D^2 \mathbf V^H\mathbf e_j &\cdots & \mathbf V \mathbf D^{n-1} \mathbf V^H\mathbf e_j
\end{array}\bigg]$



$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \mathbf V \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf V^H\mathbf e_j & \mathbf D^1 \mathbf V^H\mathbf e_j & \mathbf D^2 \mathbf V^H\mathbf e_j &\cdots & \mathbf D^{n-1} \mathbf V^H\mathbf e_j
\end{array}\bigg]$

Now left multiply each side by full rank, unitary matrix $\mathbf V^H$, and for notational simplicity, let $\mathbf y := \mathbf V^H\mathbf e_j$


$\mathbf V^H \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf y & \mathbf D^1 \mathbf y & \mathbf D^2 \mathbf y &\cdots & \mathbf D^{n-1} \mathbf y
\end{array}\bigg]$

For each column vector on the right hand side, we have $\mathbf D^m \mathbf y$.  Since diagonal matrices commute, this can be written in various forms as 

$\mathbf D^m \mathbf y =  \mathbf D^m \big(\mathbf{Diag}\big(\mathbf y\big)\mathbf 1\big) = \mathbf{Diag}\big(\mathbf y\big) \mathbf D^m \mathbf 1 =
\mathbf{Diag}\big(\mathbf y\big)\big(\mathbf D^m \mathbf 1\big)$
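The identity above only relies on diagonal matrices commuting with each other, so it can be spot-checked with stand-in values. The sketch below is a minimal check, assuming a random unit-modulus diagonal for $\mathbf D$ and a random complex $\mathbf y$ (both are illustrative choices, not the actual eigen-decomposition of $\mathbf P$).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility
n, m = 5, 3                     # assumed size and power, chosen arbitrarily

# Stand-ins: any diagonal D and any vector y satisfy this identity.
d = np.exp(2j * np.pi * rng.random(n))
D = np.diag(d)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
ones = np.ones(n)

Dm = np.linalg.matrix_power(D, m)
lhs = Dm @ y                      # D^m y
rhs = np.diag(y) @ (Dm @ ones)    # Diag(y) (D^m 1)

identity_holds = np.allclose(lhs, rhs)
print(identity_holds)
```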

Thus we can say: 

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf y & \mathbf D^1 \mathbf y & \mathbf D^2 \mathbf y &\cdots & \mathbf D^{n-1} \mathbf y
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c|c}
\mathbf{Diag}\big(\mathbf y\big)\mathbf D^0 \mathbf 1 & \mathbf{Diag}\big(\mathbf y\big)\mathbf D^1 \mathbf 1 & \mathbf{Diag}\big(\mathbf y\big)\mathbf D^2 \mathbf 1 &\cdots & \mathbf{Diag}\big(\mathbf y\big)\mathbf D^{n-1} \mathbf 1
\end{array}\bigg] $



$\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf y & \mathbf D^1 \mathbf y & \mathbf D^2 \mathbf y &\cdots & \mathbf D^{n-1} \mathbf y
\end{array}\bigg] = \mathbf{Diag}\big(\mathbf y\big)\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg] $


We make this substitution and see: 

$\mathbf V^H \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \mathbf{Diag}\big(\mathbf y\big)\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg] $

From here we may notice that since the left hand side is full rank, the right hand side must be full rank as well. 

Actually, we know even more than this -- i.e. we know that the left hand side is unitary. 

where we have $\mathbf X = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg]$

earlier we noted that:

$\mathbf P^k \mathbf e_j \perp \mathbf P^r \mathbf e_j$


and of course 

$\big \Vert \mathbf P^m \mathbf e_j\big \Vert_2^2 = \big(\mathbf P^m \mathbf e_j\big)^H\big(\mathbf P^m \mathbf e_j\big) = \mathbf e_j^H \big(\mathbf P^m\big)^H  \mathbf P^m \mathbf e_j = \mathbf e_j^H \mathbf I \mathbf e_j = \mathbf e_j^H \mathbf e_j = 1$

Thus the columns of $\mathbf X$ are mutually orthonormal -- and $\mathbf X$ is n x n so it is a (real valued) unitary matrix. 
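As a small sanity check on the orthonormality argument, we can build $\mathbf X$ for a concrete cycle and look at its Gram matrix. This is a sketch under assumptions: $n = 6$, the cyclic shift is built with `np.roll`, and the starting node `j` is picked arbitrarily (0-indexed in the code).

```python
import numpy as np

n = 6  # assumed size for illustration
# Cyclic shift: P e_j = e_{j+1}, wrapping around at the end.
P = np.roll(np.eye(n), 1, axis=0)

j = 2  # arbitrary starting node (0-indexed here)
e_j = np.eye(n)[:, j]

# Columns are P^0 e_j, P^1 e_j, ..., P^{n-1} e_j.
X = np.column_stack([np.linalg.matrix_power(P, m) @ e_j for m in range(n)])

# X is real, so the conjugate transpose is just the transpose.
gram = X.T @ X
print(np.allclose(gram, np.eye(n)))
```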

--

From here we see

$\big(\mathbf V^H \mathbf X\big)^H \mathbf V^H \mathbf X = \mathbf X^H \big(\mathbf V \mathbf V^H\big) \mathbf X = \mathbf X^H \mathbf X = \mathbf I$

So we know that the left hand side is unitary.  This means that the right hand side must be unitary as well. 

Since the right hand side is unitary, that means it must be non-singular.  

Note that with respect to determinants, we could say:

$ \Big \vert Det\Big(\mathbf{Diag} \big(\mathbf y\big)\Big)\Big \vert*\Big \vert Det\Big( \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]\Big) \Big \vert = 1$

Thus 

$Det\Big( \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]\Big) \neq 0$

Finally, we 'unpack' this matrix and see that

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]=\begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \dots  & \lambda_1^{n-1}\\ 
1 & \lambda_2 & \lambda_2^2 & \dots &  \lambda_2^{n-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{n} & \lambda_{n}^{2} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix}$

This is the Vandermonde matrix, which is non-singular **iff**  each $\lambda_i$ is unique.  Thus we conclude that each $\lambda_i$ for our Permutation matrix of a connected graph must be unique.
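To see the conclusion on a concrete case, the sketch below computes the eigenvalues of a cyclic shift matrix, measures the smallest gap between any two of them, and evaluates the determinant of the associated Vandermonde matrix. The size $n = 6$ and the `np.roll` construction of $\mathbf P$ are assumptions made for illustration.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)  # permutation matrix of a single n-cycle
lam = np.linalg.eigvals(P)

# Rows are [1, lam_i, lam_i^2, ..., lam_i^(n-1)].
W = np.vander(lam, N=n, increasing=True)

# Smallest distance between two different eigenvalues.
diffs = np.abs(lam[:, None] - lam[None, :])
min_gap = diffs[~np.eye(n, dtype=bool)].min()
abs_det = abs(np.linalg.det(W))

print("smallest gap between eigenvalues:", min_gap)
print("|det W|:", abs_det)
```

A gap well away from zero together with a determinant well away from zero is what the argument above predicts.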



**claim:**

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]^H \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg] = n \mathbf I$

That is, each column in the above matrix is mutually orthogonal (aka has an inner product of zero), and subject to some normalizing scalar constant, we know that the matrix is unitary. 

**proof:**  
First notice that each column has a squared L2 norm of $n$

for $m \in \{0, 1, 2, ..., n-1\}$

$\big(\mathbf D^m \mathbf 1\big)^H \mathbf D^m \mathbf 1 = \mathbf 1^H \big(\mathbf D^m\big)^H \mathbf D^m \mathbf 1 = \mathbf 1^H \big(\mathbf I\big) \mathbf 1 = trace \big(\mathbf I\big)  = n$ 

note that when we say $\mathbf 1^H \big(\mathbf I\big) \mathbf 1 = trace \big(\mathbf I\big)$, we notice first that $\mathbf 1^H \big(\mathbf Z\big) \mathbf 1$, means to sum up all entries in some operator $\mathbf Z$, and if $\mathbf Z$ is a diagonal matrix, then this is equivalent to just summing the entries along the diagonal of $\mathbf Z$ which is equal to the trace of $\mathbf Z$.

Next we want to prove that the inner product of any column $j$ with some other column $\neq j$, is zero.

Thus we are interested in the cases of

$\big(\mathbf D^r \mathbf 1\big)^H \mathbf D^k \mathbf 1$ 

for all natural numbers $k$, $r$, *first* where $0\leq r \lt k \leq n-1$ and *second* where $0\leq  k \lt r \leq n-1$.

First we observe the $r \lt k$ case:

$\big(\mathbf D^r \mathbf 1\big)^H \mathbf D^k \mathbf 1 = \mathbf 1^H \Big(\big(\mathbf D^r\big)^H \big(\mathbf D^r \mathbf D^{k-r}\big)\Big) \mathbf 1 = \mathbf 1^H \Big(\big(\big(\mathbf D^r\big)^H \mathbf D^r\big) \mathbf D^{k-r}\Big) \mathbf 1 = \mathbf 1^H \Big(\big(\mathbf I\big) \mathbf D^{k - r}\Big) \mathbf 1 = \mathbf 1^H \big(\mathbf D^{k - r}\big) \mathbf 1 = trace\big(\mathbf D^{k - r}\big) $ 

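since $k \gt r$, we know $0 \lt k - r \leq n-1$.  Put differently $(k - r) \%n \neq 0$, so $\mathbf P^{k-r}$ moves every node to a different node and therefore has only zeros on its diagonal.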

Thus $\big(\mathbf D^r \mathbf 1\big)^H \mathbf D^k \mathbf 1 = trace\big(\mathbf D^{k - r}\big) = trace\big(\mathbf Q^H \mathbf P^{k - r} \mathbf Q\big) = trace\big(\mathbf Q \mathbf Q^H \mathbf P^{k - r}\big) = trace\big(\mathbf P^{k - r}\big) = 0$

note that inner products are symmetric -- except for complex conjugation-- so in the case of an inner product equal to zero, we have 

$\Big(\big(\mathbf D^r \mathbf 1\big)^H \mathbf D^k \mathbf 1\Big)^H = trace\big(\mathbf D^{k - r}\big)^H  = 0^H = 0$

which covers the second case.
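The step that carries the proof is that $trace\big(\mathbf P^{m}\big) = 0$ whenever $m$ is not a multiple of $n$. A minimal check, assuming $n = 6$ and the cyclic shift built with `np.roll`:

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)  # single n-cycle

# trace(P^m) for m = 0, 1, ..., n
traces = [float(np.trace(np.linalg.matrix_power(P, m))) for m in range(n + 1)]
print(traces)
```

Only the powers that return every node to its start (here $m = 0$ and $m = n$) should contribute a nonzero trace.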


Thus all columns in

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]$

have a squared length of $n$ and are mutually orthogonal.

Hence we can say:

$\frac{1}{\sqrt n} \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]= \frac{1}{\sqrt n} \begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \dots  & \lambda_1^{n-1}\\ 
1 & \lambda_2 & \lambda_2^2 & \dots &  \lambda_2^{n-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{n} & \lambda_{n}^{2} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix}$

is a unitary matrix.
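The claim can also be checked directly from eigenvalues computed numerically, without assuming anything about their order. The sketch below assumes $n = 6$ and uses `np.linalg.eigvals` on a cyclic shift matrix; the names `W` and `U` are illustrative only.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)
lam = np.linalg.eigvals(P)

W = np.vander(lam, N=n, increasing=True)  # columns are D^0 1, D^1 1, ..., D^{n-1} 1
U = W / np.sqrt(n)

gram_is_nI = np.allclose(W.conj().T @ W, n * np.eye(n))
U_is_unitary = np.allclose(U.conj().T @ U, np.eye(n)) and np.allclose(U @ U.conj().T, np.eye(n))
print(gram_is_nI, U_is_unitary)
```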


**Side note:** An interesting (though unneeded) consequence of this is that since 

$ \mathbf{Diag}\big(\mathbf y\big)\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]\Big(\mathbf{Diag}\big(\mathbf y\big)\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]\Big)^H = \mathbf I$

we can simplify this to

$ \mathbf{Diag}\big(\mathbf y\big)\Big(\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]^H \Big)\mathbf{Diag}\big(\mathbf y\big)^H = \mathbf{Diag}\big(\mathbf y\big) \big(n \mathbf I \big)\mathbf{Diag}\big(\mathbf y\big)^H = n \mathbf{Diag}\big(\mathbf y\big) \mathbf{Diag}\big(\mathbf y\big)^H = n \mathbf{Diag}\big(\mathbf y\big)^H \mathbf{Diag}\big(\mathbf y\big) = \mathbf I$

hence

$\mathbf{Diag}\big(\mathbf y\big)^H \mathbf{Diag}\big(\mathbf y\big) = \mathbf{Diag}\Big(\big(\mathbf V^H \mathbf e_j\big)^H \circ \big(\mathbf V^H \mathbf e_j\big)\Big) = \frac{1}{n}\mathbf I$ 

where $\circ$ denotes the Hadamard product
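In other words, every entry of $\mathbf y = \mathbf V^H \mathbf e_j$ should have squared magnitude $\frac{1}{n}$. A hedged numerical check, assuming $n = 6$, an arbitrary `j`, and that `np.linalg.eig` returns unit-norm eigenvectors (which it does by convention):

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)
lam, V = np.linalg.eig(P)  # columns of V are unit-norm eigenvectors

j = 0  # arbitrary standard basis vector (0-indexed)
y = V.conj().T @ np.eye(n)[:, j]

squared_magnitudes = np.abs(y) ** 2
print(np.allclose(squared_magnitudes, 1.0 / n))
```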


**Permutation matrix conclusion:**


Finally, when we consider that (a) each $\lambda_i$ is distinct, and also that (b) each $\lambda_i^n = 1$

as a reminder: this is because (a) the associated Vandermonde matrix is non-singular, and (b) $\mathbf D^n = \mathbf Q^H \mathbf P^n \mathbf Q = \mathbf Q^H \mathbf I \mathbf Q = \mathbf I $

We know that $\lambda_1 = 1$, because $\mathbf {P1} = \mathbf 1$.  From here we can say that $\lambda_1$ is the first $n$th root of unity, namely 1.  

Put differently $\lambda_1$ has polar coordinate (1, $2\pi \frac{(1 - 1) }{n}$) which is to say it has magnitude 1 and an angle of $0$, i.e. it is real valued and equal to 1.  

$\lambda_2$ has polar coordinate of (1 , $2\pi\frac{(2-1)}{n} $)  
$\lambda_3$ has polar coordinate of (1,  $2\pi\frac{(3-1)}{n} $)  
$\vdots$  
$\lambda_{n-1}$ has polar coordinate of (1, $2\pi\frac{(n-1 -1)}{n} $)  
$\lambda_n$ has polar coordinate of (1, $2\pi\frac{(n-1)}{n}$).  

There is a variant of the Pigeonhole principle here: we have $n$ $\lambda_j$'s, each of which must be unique, and there are only $n$ unique nth roots of unity -- hence each nth root has one and only one $\lambda_j$ "in" it.  

Thus **the Vandermonde matrix in the following form is unitary**:

$\frac{1}{\sqrt n} \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]= \frac{1}{\sqrt n} \begin{bmatrix}
1 & \lambda_1 & \lambda_1^2 & \dots  & \lambda_1^{n-1}\\ 
1 & \lambda_2 & \lambda_2^2 & \dots &  \lambda_2^{n-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{n} & \lambda_{n}^{2} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix}= \frac{1}{\sqrt n} \begin{bmatrix}
1 & 1 & 1 & \dots  & 1\\ 
1 & \lambda_2 & \lambda_2^2 & \dots &  \lambda_2^{n-1} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{n} & \lambda_{n}^{2} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix} = \mathbf F$

when each $\lambda_j$ has polar coordinate of (1, $2\pi\frac{(j-1)}{n} $)

**This is the Discrete Fourier Transform matrix**
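For readers who want to compare against a library, the sketch below builds $\mathbf F$ from the polar coordinates given above and compares it with the matrix obtained by applying NumPy's FFT to the identity. One caveat, stated as a known convention rather than a result of this posting: NumPy's forward transform uses $e^{-2\pi i jk/n}$, so with the angles chosen here $\mathbf F$ matches the complex conjugate of NumPy's normalized DFT matrix. The size $n = 6$ is an assumption.

```python
import numpy as np

n = 6  # assumed size for illustration
# lam[j-1] has polar coordinate (1, 2*pi*(j-1)/n)
lam = np.exp(2j * np.pi * np.arange(n) / n)

# Rows are [1, lam_j, lam_j^2, ..., lam_j^(n-1)], scaled by 1/sqrt(n).
F = np.vander(lam, N=n, increasing=True) / np.sqrt(n)

# NumPy's forward DFT matrix (negative exponent), with the same scaling.
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)

matches_conjugate = np.allclose(F, dft.conj())
is_symmetric = np.allclose(F, F.T)
print(matches_conjugate, is_symmetric)
```

Either sign convention gives a unitary matrix; the two differ only in the direction of rotation around the unit circle.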



**Additional Note:**

When we look at 

$\mathbf V^H \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \mathbf{Diag}\big(\mathbf y\big)\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg] $

we can recognize that this is a form of the singular value decomposition (so long as we relax the constraint that the diagonal matrix is real-valued, non-negative) on our matrix $\mathbf X$, that is we have 

$\mathbf X = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \mathbf V \Big(\mathbf{Diag} \big(\mathbf y\big)\Big) \bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg] $

Where we have $\mathbf X$ being decomposed into a unitary matrix (with apologies that we are calling it $\mathbf V$ and not $\mathbf U$) times a diagonal matrix times another unitary matrix.  It is also interesting that in the case of 

$\mathbf X = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \mathbf P^2\mathbf e_j &\cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\mathbf s & \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]$

at least if $\mathbf s = \mathbf e_j$, our $\mathbf X$ is unitary and per Schur Inequality writeup, that means that $\mathbf X$ is normal.  Since $\mathbf X$ is normal, that means that (setting aside minor scaling adjustments)

$\mathbf V = \frac{1}{\sqrt n}\bigg[\begin{array}{c|c|c|c|c}
\mathbf D^0 \mathbf 1 & \mathbf D^1 \mathbf 1 & \mathbf D^2 \mathbf 1 &\cdots & \mathbf D^{n-1} \mathbf 1
\end{array}\bigg]^H = \mathbf F$ 

Which is another way of saying that our unitary Vandermonde matrix $\mathbf F$ is the collection of eigenvectors for $\mathbf X$.  This immediately motivates the question -- what if $\mathbf X$ was a function of permuting some different $\mathbf s$ -- would $\mathbf X$ still be normal?  The answer is yes, though it takes a little bit more work to show it.

From here we can notice things like 

$f(\mathbf e_1) + f(\mathbf e_2) = f(\mathbf e_1 + \mathbf e_2)$

by linearity.  Written in terms of a matrix, this is:

$\bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_1 & \mathbf P^1\mathbf e_1 & \cdots & \mathbf P^{n-1}\mathbf e_1
\end{array}\bigg] + \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_2 & \mathbf P^1\mathbf e_2 & \cdots & \mathbf P^{n-1}\mathbf e_2
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\big(\mathbf e_1 +  \mathbf e_2\big) & \mathbf P^1 \big(\mathbf e_1 + \mathbf e_2\big) & \cdots & \mathbf P^{n-1}\big(\mathbf e_1 + \mathbf e_2 \big)
\end{array}\bigg]$
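The linearity of $f$ is easy to confirm numerically. In the sketch below, `f` is a hypothetical helper that stacks $\mathbf P^0 \mathbf s, \mathbf P^1 \mathbf s, ..., \mathbf P^{n-1}\mathbf s$ as columns; the size $n = 5$ and the weights in `alpha` are arbitrary assumptions. It checks both the two-vector case and a general weighted combination of standard basis vectors.

```python
import numpy as np

def f(s, P):
    """Hypothetical helper: columns are P^0 s, P^1 s, ..., P^{n-1} s."""
    n = P.shape[0]
    return np.column_stack([np.linalg.matrix_power(P, m) @ s for m in range(n)])

n = 5  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)
I = np.eye(n)
e1, e2 = I[:, 0], I[:, 1]

# f(e_1) + f(e_2) = f(e_1 + e_2)
additive = np.allclose(f(e1, P) + f(e2, P), f(e1 + e2, P))

# A weighted combination: sum_j alpha_j f(e_j) = f(sum_j alpha_j e_j)
alpha = np.array([2.0, -1.0, 0.5, 3.0, 1.5])  # arbitrary weights
combo = sum(alpha[j] * f(I[:, j], P) for j in range(n))
general = np.allclose(combo, f(alpha, P))  # sum_j alpha_j e_j is just alpha

print(additive, general)
```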


again, if we introduce $\mathbf F$, what we get is


$\mathbf F \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_1 & \mathbf P^1\mathbf e_1 & \cdots & \mathbf P^{n-1}\mathbf e_1
\end{array}\bigg] + \mathbf F \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_2 & \mathbf P^1\mathbf e_2 &\cdots & \mathbf P^{n-1}\mathbf e_2
\end{array}\bigg]= \mathbf\Lambda_1\mathbf F + \mathbf\Lambda_2\mathbf F = \big(\mathbf \Lambda_1 + \mathbf \Lambda_2\big) \mathbf F = \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\big(\mathbf e_1 +  \mathbf e_2\big) & \mathbf P^1 \big(\mathbf e_1 + \mathbf e_2\big) &\cdots & \mathbf P^{n-1}\big(\mathbf e_1 + \mathbf e_2 \big)
\end{array}\bigg]$

and perhaps if we added all of the standard basis vectors, we'd get the ones vector (which we know is actually an eigenvector of our permutation matrix, and turns the whole matrix into the ones matrix).  Where $\mathbf \Lambda_j$ is a diagonal matrix associated with the permutations in $f(\mathbf e_j)$.  (We will not introduce the scalar components, as I'm afraid this posting may have overloaded $\lambda$ one too many times.)  

We can write this as:  

$\mathbf {11}^H = \Sigma_{j=1}^{n}\Big(\bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg]\Big) = \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n} \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \mathbf e_j \big)
\end{array}\bigg]$  

and if we left multiply by $\mathbf F$, we get

$\mathbf F \big(\mathbf {11}^H \big) = \Sigma_{j=1}^{n}\Big(\mathbf F\bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \cdots & \mathbf P^{n-1}\mathbf e_j
\end{array}\bigg]\Big) = \Sigma_{j=1}^{n} \mathbf \Lambda_j \mathbf F = \big(\Sigma_{j=1}^{n}\mathbf \Lambda_j\big) \mathbf F$  

we could also scale each standard basis vector by some arbitrary amount, $\alpha_j$ getting us  

$\Sigma_{j=1}^{n}\Big(\alpha_j \bigg[\begin{array}{c|c|cc}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \cdots & \mathbf P^{n-1} \mathbf e_j
\end{array}\bigg]\Big) = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg]$

again, left multiply this expression by $\mathbf F$ and we see

$\mathbf F\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg] = \Sigma_{j=1}^{n}\Big(\alpha_j \mathbf F\bigg[\begin{array}{c|c|cc}
\mathbf P^0\mathbf e_j & \mathbf P^1\mathbf e_j & \cdots & \mathbf P^{n-1} \mathbf e_j
\end{array}\bigg]\Big)$

$\mathbf F\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg] = \Sigma_{j=1}^{n} \mathbf \alpha_j \mathbf \Lambda_j \mathbf F = \big(\Sigma_{j=1}^{n}\mathbf \alpha_j \mathbf\Lambda_j\big) \mathbf F$

We right multiply each side by $\mathbf F^H$ and see 

$\mathbf F\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg] \mathbf F^H = \big(\Sigma_{j=1}^{n}\mathbf \alpha_j \mathbf \Lambda_j\big) $

which is to say that our matrix is unitarily similar to a diagonal matrix, and the mutually orthonormal eigenvectors are contained in $\mathbf F$.  

Now consider the more general case where $\mathbf X = f(\mathbf s)$.  In coordinates this looks quite formidable, and the circulant matrix is given by: 

$\mathbf X = \bigg[\begin{array}{c|c|c|c}
\mathbf P^0\mathbf s & \mathbf P^1\mathbf s & \cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]  = \begin{bmatrix}
s_0 & s_{n-1} & s_{n-2} & \dots & s_2 & s_1 \\ 
s_1 & s_0 & s_{n-1} & \dots & s_3 & s_2 \\ 
s_2 & s_1 & s_0 & \dots & s_4 & s_3 \\
\vdots & \vdots  & \vdots &\ddots & \dots & \vdots\\ 
s_{n-2} & s_{n-3} & s_{n-4} & \dots & s_0  & s_{n-1} \\ 
s_{n-1} & s_{n-2}  & s_{n-3} & \dots & s_1 &  s_0
\end{bmatrix}$

But, we simply need to recall that the standard basis vectors are in fact a basis, so we can write $\mathbf s$ in terms of them.  That is,

$\mathbf s = \alpha_1 \mathbf e_1 + \alpha_2 \mathbf e_2 + ... + \alpha_n \mathbf e_n  = \Sigma_{j=1}^{n} \alpha_j \mathbf e_j$

Thus we have 


$\mathbf X = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg]$

left multiply each side by $\mathbf F$ and right multiply each side by $\mathbf F^H$, and we get 

$\mathbf F \big(\mathbf X\big) \mathbf F^H = \mathbf F \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big) & \mathbf P^1 \big(\Sigma_{j=1}^{n}\alpha_j \mathbf e_j \big) & \cdots & \mathbf P^{n-1}\big(\Sigma_{j=1}^{n} \alpha_j \mathbf e_j \big)
\end{array}\bigg] \mathbf F^H = \big(\Sigma_{j=1}^{n}\mathbf \alpha_j \mathbf \Lambda_j\big)$

which proves that $\mathbf X = f(\mathbf s)$ is unitarily similar to a diagonal matrix, with $\mathbf F$ forming the basis of mutually orthonormal eigenvectors, which completes the proof that $\mathbf X$ is unitarily diagonalizable. 
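A numerical illustration of this result, as a sketch under assumptions: $n = 6$, a random real $\mathbf s$, and $\mathbf F$ built with eigenvalue angles $2\pi\frac{(j-1)}{n}$ as above. With that sign convention the diagonal of $\mathbf F \mathbf X \mathbf F^H$ works out to $n$ times NumPy's inverse FFT of $\mathbf s$; with the opposite convention it would be the forward FFT instead.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)

rng = np.random.default_rng(1)  # fixed seed, purely for reproducibility
s = rng.standard_normal(n)      # arbitrary vector to be permuted

# Circulant matrix: columns are P^0 s, P^1 s, ..., P^{n-1} s.
X = np.column_stack([np.linalg.matrix_power(P, m) @ s for m in range(n)])

lam = np.exp(2j * np.pi * np.arange(n) / n)
F = np.vander(lam, N=n, increasing=True) / np.sqrt(n)

M = F @ X @ F.conj().T
off_diagonal = M - np.diag(np.diag(M))

is_diagonal = np.allclose(off_diagonal, 0)
diag_matches_ifft = np.allclose(np.diag(M), n * np.fft.ifft(s))
print(is_diagonal, diag_matches_ifft)
```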



# below is some misc. junk

Probably delete the stuff in this cell

now consider the two matrices given by $\big( \mathbf X^H \mathbf X\big)$ and $\big( \mathbf{XX}^H\big)$.  The former gives the L2 norm, squared, of the vector $\mathbf s$ (one column at a time) in each diagonal element, and the latter gives the L2 norm, squared, of the vector $\mathbf s$ (one row at a time) in each diagonal element.  

Thus, when i = j, we have 

$\big( \mathbf X^H \mathbf X\big)_{i,j}$ and $\big( \mathbf{XX}^H\big)_{i,j}$

Now, for the off diagonal case. 

$\mathbf X =  \begin{bmatrix}
s_0 & s_{n-1} & s_{n-2} & \dots & s_2 & s_1 \\ 
s_1 & s_0 & s_{n-1} & \dots & s_3 & s_2 \\ 
s_2 & s_1 & s_0 & \dots & s_4 & s_3 \\
\vdots & \vdots  & \vdots &\ddots & \dots & \vdots\\ 
s_{n-2} & s_{n-3} & s_{n-4} & \dots & s_0  & s_{n-1} \\ 
s_{n-1} & s_{n-2}  & s_{n-3} & \dots & s_1 &  s_0
\end{bmatrix}$

and 

$\mathbf X^H = \begin{bmatrix}
\bar{s_0} & \bar{s_{1}} & \bar{s_{2}} & \dots & \bar{s_{n-2}} & \bar{s_{n-1}} \\ 
\bar{s_{n-1}} & \bar{s_0} & \bar{s_{1}} & \dots & \bar{s_{n-3}} & \bar{s_{n-2}} \\ 
\bar{s_{n-2}} & \bar{s_{n-1}} & \bar{s_0} & \dots & \bar{s_{n-4}} & \bar{s_{n-3}} \\
\vdots & \vdots  & \vdots &\ddots & \dots & \vdots\\ 
\bar{s_{2}} & \bar{s_{3}} & \bar{s_{4}} & \dots & \bar{s_0}  & \bar{s_{1}} \\ 
\bar{s_{1}} & \bar{s_{2}}  & \bar{s_{3}} & \dots & \bar{s_{n-1}} & \bar{s_0}
\end{bmatrix}$  


$\big(\mathbf X^H \mathbf X\big)_{i,j} = ?$



In [None]:
# I'll need to do stuff from 


# Almost everything on the below is irrelevant
# it gets us to an SVD decomposition, but I did that much earlier on already,
# so I need to think about this some more.... what's the deal here? 
# if there is an easy-ish way to show that the circulant matrix is normal... that is really what I want here... not sure how to do that....?

# Alternatively just do some calculations to prove that each column in F is an eigenvector, not sure really of the best way... I badly want something more elegant but I am thinking that I may not find it, which is really too bad.


**more from Permutation matrices**

*technical note:* this section is a bit loose with normalizing constants -- as it is expedient to do so.  The little nits required so that eigenvectors have an L2 norm = 1 (instead of square root of n) will be cleaned up at the end.



Because our permutation matrix is related to a connected graph, we know that given any start position $\mathbf e_j$, we will have visited every location once after $n$ iterations.

Thus we say

$\big(\mathbf P^0  + \mathbf P^1 + \mathbf P^2 ... + \mathbf P^{n-1}\big)\mathbf e_j = \mathbf 1$

for $j \in \{1, 2, ..., n-1, n\}$

which is to say that this holds over all $\mathbf e_j$.

Now consider the standard basis vectors $\mathbf e_j$ $j \in \{1, 2, ..., n-1, n\}$ -- i.e. column slices of the identity matrix, shown below:

$\big(\mathbf P^0  + \mathbf P^1 + \mathbf P^2 ... + \mathbf P^{n-1}\big)\bigg[\begin{array}{c|c|c|c}
\mathbf e_1 & \mathbf  e_2 &\cdots &\mathbf  e_n
\end{array}\bigg] = \big(\mathbf P^0  + \mathbf P^1 + \mathbf P^2 ... + \mathbf P^{n-1}\big)\mathbf I = \bigg[\begin{array}{c|c|c|c}
\mathbf 1 & \mathbf 1 &\cdots &\mathbf  1
\end{array}\bigg]= \mathbf{11}^H$

hence we say that:

$\big(\mathbf P^0  + \mathbf P^1 + \mathbf P^2 ... + \mathbf P^{n-1}\big) = \mathbf{11}^H = \mathbf V \big(\mathbf D^0  + \mathbf D^1 + \mathbf D^2 ... + \mathbf D^{n-1}\big) \mathbf V^H$

The first remarkable thing is that $\mathbf{11}^H $ is a rank one matrix, and we know that $\mathbf 1$ is an eigenvector (or we may say that the first eigenvector is $\propto \mathbf 1$ if we are worried about scaling its L2 norm).

additionally, 

$trace\Big(\mathbf P^0  + \mathbf P^1 + \mathbf P^2 ... + \mathbf P^{n-1}\Big) = trace\big(\mathbf{11}^H\big) = trace\Big(\big(\mathbf D^0  + \mathbf D^1 + \mathbf D^2 ... + \mathbf D^{n-1}\big)\Big) = n$

Thus the sum of these permutation matrices is equal to a rank one matrix with one nonzero eigenvalue = $n$.  

From here we may easily see that for each $k \geq 2$ the corresponding diagonal entry of $\big(\mathbf D^0 + \mathbf D^1 + ... + \mathbf D^{n-1}\big)$, i.e. $\Sigma_{i=0}^{n-1}\lambda_k^i$, is $=0$: we have $\Sigma_{i=0}^{n-1}\lambda_1^i = \Sigma_{i=0}^{n-1} 1  = n $, and all other summations must be equal to zero, or else this would not be a rank one matrix.
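Both facts can be checked in a few lines: the sum of the powers of $\mathbf P$ is the all-ones matrix (rank one, trace $n$), and the geometric sum of powers of each $n$th root of unity is $n$ for the root $1$ and zero for every other root. This is a sketch assuming $n = 6$ and eigenvalues ordered by angle.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)

# P^0 + P^1 + ... + P^{n-1}
S = sum(np.linalg.matrix_power(P, m) for m in range(n))
is_ones = np.allclose(S, np.ones((n, n)))
rank = np.linalg.matrix_rank(S, tol=1e-8)

# Geometric sums sum_{i=0}^{n-1} lam_k^i, one per nth root of unity.
lam = np.exp(2j * np.pi * np.arange(n) / n)
geo_sums = np.array([np.sum(lam_k ** np.arange(n)) for lam_k in lam])

print(is_ones, rank, np.trace(S))
print(np.round(geo_sums, 10))
```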

Hence we re-examine our earlier equation and see:

$\mathbf{11}^H = \mathbf V \big(\mathbf D^0  + \mathbf D^1 + \mathbf D^2 ... + \mathbf D^{n-1}\big) \mathbf V^H = n \mathbf V \big(\mathbf e_1 \mathbf e_1^H\big) \mathbf V^H = n*\lambda_1*\mathbf{v_1 v_1}^H = n*\mathbf{v_1 v_1}^H$
 
where $\mathbf v_1$ is the first eigenvector for $\mathbf P$, and $\mathbf v_1 \propto \mathbf 1$.
 

another way of representing $\mathbf D$, using polar coordinates is the following:


$\mathbf D= \mathbf{Diag}\big(\lambda_1, \lambda_2, \lambda_3, ... , \lambda_{n-1}, \lambda_{n}\big) = \mathbf{Diag}\big((1 , 2\pi\frac{(0)}{n}), (1 , 2\pi\frac{(1)}{n}), (1,  2\pi\frac{(2)}{n}), ..., (1, 2\pi\frac{(n-2)}{n}), (1, 2\pi\frac{(n-1)}{n})\big)$  


If we multiplied 

**TBC: using Eulers formula... am I doing the rotations correct? or are they counter clockwise / the other way?**




What I want to say here is: 

$\lambda_1 \mathbf D = 1*\mathbf D_1 = \mathbf{Diag}\big(\lambda_1, \lambda_2, \lambda_3, ... , \lambda_{n-1}, \lambda_{n}\big) = \mathbf{Diag}\big((1 , 2\pi\frac{(0)}{n}), (1 , 2\pi\frac{(1)}{n}), (1,  2\pi\frac{(2)}{n}), ..., (1, 2\pi\frac{(n-2)}{n}), (1, 2\pi\frac{(n-1)}{n})\big)$  

and further that:

# it may "rotate" the other way though... to be confirmed....

-----------------------------------------
$\lambda_2 \mathbf D = \mathbf D_2 = \mathbf{Diag}\big(\lambda_2, \lambda_3, ... , \lambda_{n-1}, \lambda_{n}, \lambda_1\big) = \mathbf{Diag}\big((1 , 2\pi\frac{(1)}{n}), (1,  2\pi\frac{(2)}{n}), ..., (1, 2\pi\frac{(n-2)}{n}),(1, 2\pi\frac{(n-1)}{n}),(1 , 2\pi\frac{(0)}{n}), \big)$  

$\lambda_2^2 \mathbf D = \mathbf D_3 = \mathbf{Diag}\big(\lambda_3, ... , \lambda_{n-1}, \lambda_{n}, \lambda_1, \lambda_2\big) = \mathbf{Diag}\big((1,  2\pi\frac{(2)}{n}), ..., (1, 2\pi\frac{(n-2)}{n}), (1, 2\pi\frac{(n-1)}{n}),(1 , 2\pi\frac{(0)}{n}), (1 , 2\pi\frac{(1)}{n})\big) = \lambda_3 \mathbf D$    

-----------------------------------------

and so on


notice that $\mathbf D_2$ is just $\mathbf D_1$ with each of the eigenvalues permuted once, $\mathbf D_3$ is just $\mathbf D_1$ with each of the eigenvalues permuted twice, and so on.

my claim is that: 

$\mathbf V \big(\mathbf \lambda_2^0 \mathbf D^0  + \lambda_2^1 \mathbf D^1 + \lambda_2^2 \mathbf D^2 ... + \lambda_2^{n-1} \mathbf D^{n-1}\big) \mathbf V^H = n \mathbf V \big(\mathbf e_2 \mathbf e_2^H\big) \mathbf V^H = n*\mathbf{v_2 v_2}^H$

$\mathbf V \big(\mathbf \lambda_3^0 \mathbf  D^0  + \lambda_3^1 \mathbf D^1 + \lambda_3^2 \mathbf D^2 ... + \lambda_3^{n-1} \mathbf D^{n-1}\big) \mathbf V^H = n \mathbf V \big(\mathbf e_3 \mathbf e_3^H\big) \mathbf V^H = n*\mathbf{v_3 v_3}^H$

and in general:

$\mathbf V \big(\mathbf \lambda_k^0 \mathbf D^0  + \lambda_k^1 \mathbf D^1 + \lambda_k^2 \mathbf D^2 ... + \lambda_k^{n-1} \mathbf D^{n-1}\big) \mathbf V^H = n \mathbf V \big(\mathbf e_k \mathbf e_k^H\big) \mathbf V^H = n*\mathbf{v_k v_k}^H$
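A quick numerical experiment on this claim, offered as a hedged sketch rather than a proof. It assumes $n = 6$ and uses `np.linalg.eig`, whose eigenvalue ordering is arbitrary. Because $\Sigma_{m}\big(\lambda_k \lambda_i\big)^m$ is nonzero only when $\lambda_i = \bar{\lambda_k}$, the check compares the sum against the eigenvector whose eigenvalue is the complex conjugate of $\lambda_k$ (which is $\mathbf v_k$ itself when $\lambda_k$ is real). That is in line with the caveat in this section about conjugation and the direction of rotation.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)
lam, V = np.linalg.eig(P)  # unit-norm eigenvectors in the columns of V

results = []
for k in range(n):
    # sum_{m=0}^{n-1} (lam_k P)^m
    S = sum(np.linalg.matrix_power(lam[k] * P, m) for m in range(n))

    # index of the eigenvalue equal to conj(lam_k)
    partner = int(np.argmin(np.abs(lam - np.conj(lam[k]))))
    v = V[:, partner]

    rank_one = np.linalg.matrix_rank(S, tol=1e-8) == 1
    matches = np.allclose(S, n * np.outer(v, v.conj()))
    results.append((rank_one, matches))

print(results)
```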


for our permutation matrix $\mathbf P$ for each permutation over some arbitrary vector $\mathbf s$, what we have is:  

$\mathbf V \big(\mathbf \lambda_k^0 \mathbf D^0  + \lambda_k^1 \mathbf D^1 + \lambda_k^2 \mathbf D^2 ... + \lambda_k^{n-1} \mathbf D^{n-1}\big) \mathbf V^H = n \mathbf V \big(\mathbf e_k \mathbf e_k^H\big) \mathbf V^H = n*\mathbf{v_k v_k}^H$



$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s & \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]\mathbf 1 = n * \mathbf v_1 \big(\mathbf v_1^H \mathbf s \big)$

Since $\mathbf v_1 \propto \mathbf 1$, this tells us that $\mathbf 1$ is an eigenvector for the above matrix.  

now consider the case where we do:

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s & \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]\begin{bmatrix}
\lambda_2^0 \\ 
\lambda_2^1 \\ 
\lambda_2^2 \\ 
\vdots \\ 
\lambda_2^{n-1}
\end{bmatrix} = \mathbf V \big( \lambda_2^0 \mathbf D^0 + \lambda_2^1 \mathbf D^1  + \lambda_2^2 \mathbf D^2  + ... + \lambda_2^{n-1}\mathbf D^{n-1}\big) \mathbf V^H \mathbf s = n * \mathbf v_2 \big(\mathbf v_2^H \mathbf s \big)$

and in general

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]\begin{bmatrix}
\lambda_k^0 \\ 
\lambda_k^1 \\ 
\lambda_k^2 \\ 
\vdots \\ 
\lambda_k^{n-1}
\end{bmatrix} = \mathbf V \big( \lambda_k^0 \mathbf D^0 + \lambda_k^1 \mathbf D^1  + \lambda_k^2 \mathbf D^2  + ... + \lambda_k^{n-1}\mathbf D^{n-1}\big) \mathbf V^H \mathbf s = n * \mathbf v_k \big(\mathbf v_k^H \mathbf s \big)$



In vectorized form, this becomes


$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]
\begin{bmatrix}
1 & 1 & 1 & \dots  & 1\\ 
1 & \lambda_2 & \lambda_3 & \dots &  \lambda_{n} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{2}^{n-1} & \lambda_{3}^{n-1} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix}  = \bigg[\begin{array}{c|c|c|c}
\mathbf v_1 & \mathbf v_2 & \cdots & \mathbf v_n
\end{array}\bigg] \begin{bmatrix}
n \mathbf v_1^H \mathbf s & 0 & \dots  &0 \\ 
0 & n \mathbf v_2^H \mathbf s & \dots &0 \\ 
0 & \vdots &\ddots & 0\\ 
0 & 0 & \dots & n \mathbf v_n^H \mathbf s
\end{bmatrix}$

At this point, we, at a minimum, have the components for the SVD of this matrix on the left -- the circulant matrix (**clean this up and clearly define a circulant matrix earlier on...**).

is there an easy way to prove that the circulant matrix is in fact normal?

note that 

$\Big \Vert \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg] \Big \Vert_F^2 = n \big \Vert \mathbf s \big \Vert_2^2 $
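The Frobenius norm identity follows because every column is a permutation of $\mathbf s$ and so has the same L2 norm. A minimal check, assuming $n = 6$ and a random complex $\mathbf s$:

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)

rng = np.random.default_rng(2)  # fixed seed, purely for reproducibility
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Columns are P^0 s, P^1 s, ..., P^{n-1} s.
X = np.column_stack([np.linalg.matrix_power(P, m) @ s for m in range(n)])

frob_sq = np.linalg.norm(X, "fro") ** 2
expected = n * np.linalg.norm(s) ** 2
print(np.isclose(frob_sq, expected))
```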

I need to either get the eigs for 
$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]$

or get the eigs for 
$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]$


or....

perhaps I could prove that 

$\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]^H\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg] = \bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]\bigg[\begin{array}{c|c|c|c|c}
\mathbf P^0 \mathbf s& \mathbf P^1\mathbf s & \mathbf P^2\mathbf s &\cdots & \mathbf P^{n-1}\mathbf s
\end{array}\bigg]^H$

which would make it normal... not sure how to prove that though.... I also haven't derived this relationship via Schur's Inequality as meaning normal matrix just yet.... 
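Short of a proof, the normality condition can at least be tested numerically. The sketch below assumes $n = 6$ and a random complex $\mathbf s$, and compares $\mathbf X^H \mathbf X$ with $\mathbf X \mathbf X^H$. Agreement on one example is only evidence, not a derivation.

```python
import numpy as np

n = 6  # assumed size for illustration
P = np.roll(np.eye(n), 1, axis=0)

rng = np.random.default_rng(3)  # fixed seed, purely for reproducibility
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Circulant matrix built from s.
X = np.column_stack([np.linalg.matrix_power(P, m) @ s for m in range(n)])

XhX = X.conj().T @ X
XXh = X @ X.conj().T

is_normal = np.allclose(XhX, XXh)
commutes_with_P = np.allclose(X @ P, P @ X)
print(is_normal, commutes_with_P)
```

The second check, that $\mathbf X$ commutes with $\mathbf P$, hints at why this works: $\mathbf X$ is a polynomial in $\mathbf P$, so it shares the eigenvectors of $\mathbf P$.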

this final bit of my analysis seems to be getting quite tortured....  maybe I can find a cool hidden inequality / bound to use... otherwise I may need to just do direct verification?

- - - - -

an alternative approach at this juncture would be to note that $\mathbf s$ can be written in the coordinates of the standard basis vector, and do a decomposition....
- - - - -


$\begin{bmatrix}
n \mathbf v_1^H \mathbf s & 0 & \dots  &0 \\ 
0 & n \mathbf v_2^H \mathbf s & \dots &0 \\ 
0 & \vdots &\ddots & 0\\ 
0 & 0 & \dots & n \mathbf v_n^H \mathbf s
\end{bmatrix}$

Ignoring conjugation signs for the moment... see that I could do something like this instead:


$\frac{1}{\sqrt n} \begin{bmatrix}
1 & 1 & 1 & \dots  & 1\\ 
1 & \lambda_2 & \lambda_3 & \dots &  \lambda_{n} \\ 
\vdots & \vdots & \vdots & \ddots & \vdots \\ 
1 & \lambda_{2}^{n-1} & \lambda_{3}^{n-1} & \dots  & \lambda_{n}^{n-1}
\end{bmatrix} = \mathbf F^H \text{ or } \mathbf F^T \text{ or ?}$

In [33]:
import numpy as np

In [34]:
P = np.zeros((6,6))
P[1,0] = 1
P[2,1] = 1
P[3,2] = 1
P[4,3] = 1
P[5,4] = 1


P[0,-1] = 1
P

array([[ 0.,  0.,  0.,  0.,  0.,  1.],
       [ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.]])

In [35]:
c = np.array([4,3,2,1])
c

array([4, 3, 2, 1])

In [37]:
eigval, eigvect = np.linalg.eig(P)

In [39]:
mylam = eigval[1]
mylam

(-0.50000000000000011+0.8660254037844386j)

In [40]:
eigvect.conj().T @ eigvect

array([[  1.00000000e+00 +0.00000000e+00j,
          4.30211422e-16 -1.94289029e-16j,
          4.30211422e-16 +1.94289029e-16j,
         -1.80411242e-16 +3.05311332e-16j,
         -1.80411242e-16 -3.05311332e-16j,
          6.66133815e-16 +0.00000000e+00j],
       [  4.30211422e-16 +1.94289029e-16j,
          1.00000000e+00 +0.00000000e+00j,
          3.33066907e-16 -2.77555756e-17j,
          2.08166817e-16 -1.94289029e-16j,
          4.16333634e-17 -1.66533454e-16j,
          1.38777878e-17 +1.38777878e-16j],
       [  4.30211422e-16 -1.94289029e-16j,
          3.33066907e-16 +2.77555756e-17j,
          1.00000000e+00 +0.00000000e+00j,
          4.16333634e-17 +1.66533454e-16j,
          2.08166817e-16 +1.94289029e-16j,
          1.38777878e-17 -1.38777878e-16j],
       [ -1.80411242e-16 -3.05311332e-16j,
          2.08166817e-16 +1.94289029e-16j,
          4.16333634e-17 -1.66533454e-16j,
          1.00000000e+00 +0.00000000e+00j,
         -5.55111512e-17 +3.05311332e-16j,
        

In [69]:
n = P.shape[0]

for idx in range(eigval.shape[0]):
    print("\n - - - - \n ")
    mylam = eigval[idx]
    # accumulate sum_{m=0}^{n-1} (mylam * P)^m for this eigenvalue
    my_collector = np.zeros((n,n), dtype = np.complex128)
    for m_iteration in range(n):      
#         newP = eigvect @ (mylam*np.diag(eigval))**m_iteration @ eigvect.conj().T
        newP = np.linalg.matrix_power(P*mylam, m_iteration)
        my_collector += newP
    for row in my_collector:
        print(np.round((row),2))
    # singular values only (U and V are not computed)
    print(np.linalg.svd(my_collector, full_matrices=False, compute_uv=False))


 - - - - 
 
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]
[  6.00000000e+00   2.28119570e-15   1.89842863e-15   1.13761465e-15
   1.04622581e-15   9.73925254e-16]

 - - - - 
 
[ 1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j]
[-0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j]
[-0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j  ]
[ 1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j]
[-0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j]
[-0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j  ]
[  6.00000000e+00   1.26309442e-15   8.52805019e-16   7.48204388e-16
   5.10438514e-16   4.10477550e-16]

 - - - - 
 
[ 1.0+0.j   -0.5+0.87j -0.5-0.87j  1.0-0.j 

In [65]:
n = P.shape[0]

for idx in range(eigval.shape[0]):
    print("\n - - - - \n ")
    mylam = eigval[idx]
#     mylam = 1
    my_collector = np.zeros((n,n), dtype = np.complex128)
    for m_iteration in range(n):
        
#         newP = eigvect @ (mylam*np.diag(eigval))**m_iteration @ eigvect.conj().T
        newP = np.linalg.matrix_power(P*mylam, m_iteration)
        my_collector += newP
    for row in my_collector:
        print(np.round((row),2))



 - - - - 
 
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]
[ 1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j]
[-1.+0.j  1.+0.j -1.+0.j  1.+0.j -1.+0.j  1.+0.j]

 - - - - 
 
[ 1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j]
[-0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j]
[-0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j  ]
[ 1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j]
[-0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j]
[-0.5-0.87j -0.5+0.87j  1.0+0.j   -0.5-0.87j -0.5+0.87j  1.0+0.j  ]

 - - - - 
 
[ 1.0+0.j   -0.5+0.87j -0.5-0.87j  1.0-0.j   -0.5+0.87j -0.5-0.87j]
[-0.5-0.87j  1.0+0.j   -0.5+0.87j -0.5-0.87j  1.0-0.j   -0.5+0.87j]
[-0.5+0.87j -0.5-0.87j  1.0+0.j   -0.5+0.87j -0.5-0.87j  1.0-0.j  ]
[ 1.0-0.j   -0.5+0.87j -0.5-0.87j  1.0+0.j   -0.5