# Applications (10)
## Graphs and Networks (10.1)

[Lecture here](https://youtu.be/2IdtqGM6KWU)

A **graph** consists of **nodes** connected by **edges**.

The **incidence matrix** of a graph tells us how $n$ nodes are connected by $m$ edges.

By focusing on incidence matrices, the laws of linear algebra become [Kirchhoff's laws](https://en.wikipedia.org/wiki/Kirchhoff%27s_circuit_laws).

Each entry of an incidence matrix is 0 or 1 or -1.  This continues to apply after elimination.  All four subspaces in fact will use these simple components.

### The Incidence Matrix

If node one has an arrow to node two, the corresponding row for that edge will have a -1 in column 1 and a 1 in column 2.  The negative number means the arrow is going out from node 1, and the positive number means it's going into node 2.

![image.png](images/incidence-matrix-graph.png)

This graph is **complete**-- every pair of nodes is connected by an edge.

A graph with no closed loops is called a **tree**.

The maximum number of edges is $\frac{1}{2}n(n-1)$ and the minimum to connect all nodes in some way is $n-1$.

Elimination reduces every graph to a tree.  Loops produce dependent rows in $A$ and zero rows in echelon forms $U$ and $R$.

When $x$ is a vector of voltages at the nodes, $Ax$ gives the voltage differences.

**Kirchhoff's Voltage Law**: The components of $Ax=b$ add to zero around every loop.

**Kirchhoff's Current Law**: $A^Ty = 0$.  Flow in equals flow out at each node.
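The Voltage Law can be checked on a tiny example. This is a minimal sketch, assuming a triangle graph with edges 0→1, 1→2, and 0→2 (0-indexed nodes) and made-up node voltages; neither comes from the figure above.

```python
# Assumed triangle loop: edge 1 is 0->1, edge 2 is 1->2, edge 3 is 0->2.
A = [[-1, 1, 0],
     [0, -1, 1],
     [-1, 0, 1]]
x = [5.0, 2.0, 7.0]   # made-up node voltages

# b = Ax holds the voltage difference across each edge.
b = [sum(A[e][j] * x[j] for j in range(3)) for e in range(3)]

# Walk the loop 0 -> 1 -> 2 -> 0: edges 1 and 2 forward, edge 3 backward.
loop_sum = b[0] + b[1] - b[2]
print(b, loop_sum)
```

Whatever voltages are chosen for `x`, the differences cancel around the loop, which is the statement that $b = Ax$ lies in the column space.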

The incidence matrix $A$ comes from a connected graph with $n$ nodes and $m$ edges. The row space and column space have dimensions $r = n - 1$.  The nullspaces of $A$ and $A^T$ have dimensions 1 and $m-n + 1$.

* $N(A)$ - The constant vectors $(c, c, \dots, c)$ make up the nullspace of A.  dim = 1.
* $C(A^T)$ - The edges of any tree give $r$ independent rows of $A$: $r = n - 1$.
* $C(A)$ - Voltage Law: The components of $Ax$ add to zero around all loops: dim = $n - 1$.
* $N(A^T)$ - Current Law: $A^Ty = \text{flow in - flow out} = 0$ is solved by loop currents.  There are $m - r = m - n + 1$ independent small loops in the graph.

For every graph in a plane, linear algebra yields **Euler's formula**:

$$\text{(number of nodes) - (number of edges) + (number of small loops)} = 1$$

This is saying $(n) - (m) + (m - n + 1) = 1$
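The dimension counts can be verified by elimination on a small graph. The sketch below assumes a complete graph on 4 nodes with an arbitrary choice of arrow directions (it may not match the orientation in the figure), and `incidence_matrix` and `rank` are illustrative helper names.

```python
from fractions import Fraction

def incidence_matrix(n, edges):
    """One row per edge (i, j): -1 in column i (arrow leaves), +1 in column j (arrow enters)."""
    rows = []
    for i, j in edges:
        row = [0] * n
        row[i] = -1
        row[j] = 1
        rows.append(row)
    return rows

def rank(matrix):
    """Count pivots found by Gaussian elimination with exact fractions."""
    M = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Assumed example: complete graph on n = 4 nodes (0-indexed), m = 6 edges.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
A = incidence_matrix(n, edges)
m = len(edges)
r = rank(A)

print("rank r =", r)                 # should equal n - 1
print("dim N(A) =", n - r)           # constant vectors
print("dim N(A^T) =", m - r)         # independent loops, m - n + 1
print("Euler:", n - m + (m - r))     # nodes - edges + loops
```

Exact fractions are used so that the zero rows produced by loops are recognized as exactly zero rather than as small rounding errors.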

When we have a current source, Kirchhoff's Current Law changes from $A^Ty = 0$ to $A^Ty = f$, to balance the source $f$ from outside.  Flow into each node still equals flow out.

### Voltages and Currents and $A^TAx = f$

[this is rather complex, will see if he covers it in lecture]

### Networks and $A^TCA$

**Conductance** is the inverse of **resistance**, and measures how easily flow gets through.

A **network** is a graph that has a conductance at each edge.  These numbers go into the conductance matrix $C$, which is diagonal.

**Ohm's Law**: $\text{Current along edge = conductance times voltage difference.}$

Ohm's Law for all $m$ currents is $y = -CAx$.  The vector $Ax$ gives the potential differences, and $C$ multiplies by the conductances.

Combining Ohm's Law with Kirchhoff's Current Law we get $A^TCAx = 0$.
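To see what $A^TCA$ looks like, here is a small sketch for an assumed triangle network. The edge directions and the conductances 1, 2, 3 are hypothetical values chosen only for illustration.

```python
# Assumed network: triangle with edges 0->1, 1->2, 0->2 and made-up conductances.
A = [[-1, 1, 0],
     [0, -1, 1],
     [-1, 0, 1]]
c = [1.0, 2.0, 3.0]   # diagonal of C (hypothetical values)

m, n = len(A), len(A[0])

# K = A^T C A, entry by entry: K[i][j] = sum over edges e of A[e][i] * c[e] * A[e][j]
K = [[sum(A[e][i] * c[e] * A[e][j] for e in range(m)) for j in range(n)]
     for i in range(n)]

for row in K:
    print(row)

# Equal voltages produce no current: K times (1, 1, 1) is the zero vector.
ones_image = [sum(row) for row in K]
print(ones_image)
```

The matrix is symmetric, and every row adds to zero because the constant vectors are in the nullspace of $A$.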

## Markov Matrices, Population, and Economics (10.3)

Covered in [this lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-24-markov-matrices-fourier-series-1/).

This section is about **positive matrices**: every $a_{ij} > 0$.  The key fact is quick to state: The largest eigenvalue is real and positive and so is its eigenvector.  In economics and ecology and population dynamics, this fact leads a long way.  The max lambda value $\lambda_{\text{max}}$ controls the powers of $A$.  We will see this first for $\lambda_{\text{max}} = 1$.

### Markov Matrices (10.3)

Multiply a positive vector $u_0$ again and again by this matrix $A$:

$$
\begin{aligned}
& \text { Markov } \\
& \text { matrix }
\end{aligned} \quad A=\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right] \quad u_1=A u_0 \quad u_2=A u_1=A^2 u_0
$$

After $k$ steps we have $A^ku_0$.  The vectors $u_1, u_2, u_3, \dots$ will approach a "steady state" $u_\infty = (.6, .4)$.  This final outcome does not depend on the starting vector $u_0$.  For every $u_0 = (a, 1-a)$ we converge to the same $u_\infty$.  The question is why.

The steady state equation $Au_\infty = u_\infty$ makes $u_\infty$ an eigenvector with eigenvalue 1:

$$
\text { Steady state } \quad\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right]\left[\begin{array}{l}
.6 \\
.4
\end{array}\right]=\left[\begin{array}{l}
.6 \\
.4
\end{array}\right]=u_{\infty}
$$
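The convergence is easy to observe numerically. The sketch below repeats $u_{k+1} = Au_k$ for a few starting vectors $(a, 1-a)$; the particular values of $a$ and the 50-step cutoff are arbitrary choices.

```python
def step(A, u):
    """One step u -> A u for a 2x2 matrix."""
    return [A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1]]

A = [[0.8, 0.3],
     [0.2, 0.7]]

# Starting vectors (a, 1 - a), chosen arbitrarily for illustration.
limits = []
for a in (0.0, 0.5, 1.0):
    u = [a, 1.0 - a]
    for _ in range(50):
        u = step(A, u)
    limits.append(u)
    print(a, [round(x, 6) for x in u])
```

Every starting vector ends up at the same place, which is the behavior the rest of this section explains with eigenvalues.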

Multiplying by $A$ does not change $u_\infty$.  But this does not explain why so many vectors $u_0$ lead to $u_\infty$.  Other examples might have a steady state, but it is not necessarily attractive:

$$
\text { Not Markov } B=\left[\begin{array}{ll}
1 & 0 \\
0 & 2
\end{array}\right] \text { has the unattractive steady state } B\left[\begin{array}{l}
1 \\
0
\end{array}\right]=\left[\begin{array}{l}
1 \\
0
\end{array}\right] \text {. }
$$

In this case, the starting vector $u_0 = (0,1)$ will give $u_1 = (0,2)$ and $u_2 = (0,4)$.  The second components are doubled.  In the language of eigenvalues, $B$ has $\lambda = 1$ but also $\lambda = 2$-- this produces instability.  The component of $u$ along that unstable eigenvector is multiplied by $\lambda$, and $|\lambda| > 1$ means blowup.

This section is about two special properties of $A$ that guarantee a stable steady state.  These properties define a positive **Markov matrix**, and $A$ above is one particular example.

For a Markov matrix:
1. Every entry of $A$ is positive: $A_{ij} > 0$
2. Every column of $A$ adds to $1$.

Column 2 of $B$ adds to 2, not 1.  When $A$ is a Markov matrix, two facts are immediate:

- Because of #1: Multiplying $u_0 \ge 0$ by $A$ produces a nonnegative $u_1 = Au_0 \ge 0$.
- Because of #2: If the components of $u_0$ add to $1$, so do the components of $u_1 = Au_0$.

Reason: The components of $u_0$ add to 1 when $\begin{bmatrix}1 & \dots & 1\end{bmatrix}u_0 = 1$.  This is true for each column of $A$ by Property 2.  Then by matrix multiplication $\begin{bmatrix}1 & \dots & 1\end{bmatrix}A = \begin{bmatrix}1 & \dots & 1\end{bmatrix}$:

$$
\text { Components of } A u_0 \text { add to } 1 \quad\left[\begin{array}{lll}
1 & \ldots & 1
\end{array}\right] A u_0=\left[\begin{array}{lll}
1 & \ldots & 1
\end{array}\right] u_0=1 .
$$

The same facts apply to $u_2 = Au_1$ and $u_3 = Au_2$.  Every vector $A^ku_0$ is nonnegative with components adding to 1.  These are **probability vectors**.  The limit $u_\infty$ is also a probability vector-- but we have to prove that there is a limit.  We will show that $\lambda_{\text{max}}=1$ for a positive Markov matrix.

Let's look at an example.  The fraction of rental cars in Denver starts at $\frac{1}{50} = .02$.  The fraction outside Denver is .98.  Every month, 80% of the Denver cars stay in Denver (and 20% leave).  Also 5% of the outside cars come in (95% stay outside).  This means that the fractions $u_0 = (.02, .98)$ are multiplied by $A$:

$$
\text { First month } \quad A=\left[\begin{array}{ll}
.80 & .05 \\
.20 & .95
\end{array}\right] \quad \text { leads to } \quad u_1=A u_0=A\left[\begin{array}{l}
.02 \\
.98
\end{array}\right]=\left[\begin{array}{l}
.065 \\
.935
\end{array}\right]
$$

Notice that .065 + .935 = 1.  All cars are accounted for.  Each step multiplies by $A$:

$$
\text { Next month } \quad \boldsymbol{u}_2=A \boldsymbol{u}_1=(.09875, .90125) \text {. This is } A^2 \boldsymbol{u}_0 \text {. }
$$

All these vectors are positive because $A$ is positive.  Each vector $u_k$ will have its components adding to 1.  The first component has grown from .02 and cars are moving towards Denver.  What happens in the long run?

This section involves powers of matrices.  The understanding of $A^k$ was our first and best application of diagonalization.  Where $A^k$ can be complicated, the diagonal matrix $\Lambda^k$ is simple.  The eigenvector matrix $X$ connects them: $A^k$ equals $X\Lambda^kX^{-1}$.  The new application to Markov matrices uses the eigenvalues (in $\Lambda$) and the eigenvectors (in $X$).  We will show that $u_\infty$ is an eigenvector of $A$ corresponding to $\lambda = 1$.

Since every column of $A$ adds to 1, nothing is lost or gained.  We are moving rental cars or populations, and no cars or people suddenly appear (or disappear).  The fractions add to 1 and the matrix $A$ keeps them that way.  The question is how they are distributed after $k$ time periods-- which leads us to $A^k$.

Solution: $A^ku_0$ gives the fractions in and out of Denver after $k$ steps.  We diagonalize $A$ to understand $A^k$.  The eigenvalues are $\lambda = 1$ and .75.

$$
A x=\lambda x \quad A\left[\begin{array}{l}
.2 \\
.8
\end{array}\right]=1\left[\begin{array}{l}
.2 \\
.8
\end{array}\right] \quad \text { and } \quad A\left[\begin{array}{r}
-1 \\
1
\end{array}\right]=.75\left[\begin{array}{r}
-1 \\
1
\end{array}\right]
$$

The starting vector $u_0$ combines $x_1$ and $x_2$, in this case with coefficients 1 and .18:

$$
\text { Combination of eigenvectors } \quad \boldsymbol{u}_0=\left[\begin{array}{l}
.02 \\
.98
\end{array}\right]=\left[\begin{array}{l}
.2 \\
.8
\end{array}\right]+.18\left[\begin{array}{r}
-1 \\
1
\end{array}\right] \text {. }
$$

Now multiply by $A$ to find $u_1$.  The eigenvectors are multiplied by $\lambda_1 = 1$ and $\lambda_2 = .75$:

$$
\text { Each } x \text { is multiplied by } \lambda \quad \boldsymbol{u}_1=1\left[\begin{array}{l}
.2 \\
.8
\end{array}\right]+(.75)(.18)\left[\begin{array}{r}
-1 \\
1
\end{array}\right] \text {. }
$$

Every month, another $\lambda = .75$ multiplies the vector $x_2$. The eigenvector $x_1$ is unchanged:

$$
\text { After } k \text { steps } \quad \boldsymbol{u}_k=A^k \boldsymbol{u}_0=1^k\left[\begin{array}{l}
.2 \\
.8
\end{array}\right]+(.75)^k(.18)\left[\begin{array}{r}
-1 \\
1
\end{array}\right] \text {. }
$$

This equation reveals what happens.  The eigenvector $x_1$ with $\lambda = 1$ is the steady state.  The other eigenvector $x_2$ disappears because $|\lambda| < 1$.  The more steps we take, the closer we come to $u_\infty = (.2, .8)$.  In the limit, $\frac{2}{10}$ of the cars are in Denver, and $\frac{8}{10}$ are outside.  This is the pattern for Markov chains, even starting from $u_0 = (0,1)$:

If $A$ is a positive Markov matrix (entries $a_{ij} > 0$, each column adds to 1), then $\lambda_1 = 1$ is larger than any other eigenvalue.  The eigenvector $x_1$ is the steady state-- $u_\infty = x_1$.
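For the Denver example, the eigenvector formula $u_k = (.2, .8) + (.75)^k(.18)(-1, 1)$ can be compared directly with repeated multiplication by $A$. This is a minimal sketch; the helper names `multiply` and `closed_form`, the 40-step cutoff, and the tolerance are illustrative choices.

```python
A = [[0.80, 0.05],
     [0.20, 0.95]]

def multiply(A, u):
    """Matrix-vector product for a 2x2 matrix."""
    return [A[0][0] * u[0] + A[0][1] * u[1],
            A[1][0] * u[0] + A[1][1] * u[1]]

def closed_form(k):
    """u_k = 1^k (.2, .8) + (.75)^k (.18) (-1, 1)."""
    decay = 0.18 * 0.75 ** k
    return [0.2 - decay, 0.8 + decay]

u = [0.02, 0.98]
for k in range(1, 41):
    u = multiply(A, u)
    formula = closed_form(k)
    # The two computations should agree up to rounding error.
    assert abs(u[0] - formula[0]) < 1e-12 and abs(u[1] - formula[1]) < 1e-12
    if k in (1, 2, 10, 40):
        print(k, [round(x, 5) for x in u])
```

The first two printed vectors reproduce $u_1 = (.065, .935)$ and $u_2 = (.09875, .90125)$, and the later ones show the $(.75)^k$ term fading away.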

The first point is to see that $\lambda = 1$ is an eigenvalue of $A$.  Reason: Every column of $A - I$ adds to $1 - 1 = 0$.  The rows of $A - I$ add up to the zero row.  Those rows are linearly dependent, so $A - I$ is singular.  Its determinant is zero and $\lambda = 1$ is an eigenvalue.

The second point is that no eigenvalue can have $|\lambda| > 1$.  With such an eigenvalue, the powers $A^k$ would grow.  But $A^k$ is also a Markov matrix!  $A^k$ has positive entries, with each column still adding to 1-- and that leaves no room to get large.

A lot of attention is paid to the possibility that another eigenvalue has $|\lambda| = 1$.

$A = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}$ has no steady state because $\lambda_2 = -1$.  The second eigenvector $x_2 = (-1, 1)$ will be multiplied by $\lambda_2 = -1$ at every step-- and does not become smaller: No steady state.
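A few steps with this exchange matrix show the oscillation. The starting vector below is an arbitrary choice.

```python
P = [[0, 1],
     [1, 0]]
u = [0.3, 0.7]   # arbitrary starting probabilities
history = [u]
for _ in range(4):
    u = [P[0][0] * u[0] + P[0][1] * u[1],
         P[1][0] * u[0] + P[1][1] * u[1]]
    history.append(u)
print(history)
```

The two components swap at every step, so the sequence never settles unless it starts at $(.5, .5)$.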

Suppose the entries of $A$ or any power of $A$ are all positive--zero not allowed.  In this "regular" or "primitive" case, $\lambda = 1$ is strictly larger than any other eigenvalue.  Then it will reach steady state.

### Perron-Frobenius Theorem

One matrix theorem dominates this subject.  The Perron-Frobenius Theorem applies when all $a_{ij} \ge 0$.  There is no requirement that columns add to 1.

The theorem: For $A > 0$, all numbers in $Ax = \lambda_{\text{max}}x$ are strictly positive.

The proof is covered in the book; to be honest, it doesn't make a huge amount of sense to me at this time.

### Population Growth

Divide the population into three age groups: age < 20, age 20 to 39, and age 40 to 59.  At year $T$ the sizes of those groups are $n_1,n_2,n_3$.  Twenty years later, the sizes have changed for three reasons: births, deaths, and getting older.

1. Reproduction: $n_1^{\text{new}} = F_1n_1 + F_2n_2 + F_3n_3$ gives a new generation
2. Survival: $n_2^{\text{new}} = P_1n_1$ and $n_3^{\text{new}} = P_2n_2$ gives the older generations

The fertility rates are $F_1, F_2, F_3$ ($F_2$ largest).  The _Leslie Matrix_ $A$ might look like this:

$$
\left[\begin{array}{c}
n_1 \\
n_2 \\
n_3
\end{array}\right]^{\text {new }}=\left[\begin{array}{ccc}
F_1 & F_2 & F_3 \\
P_1 & 0 & 0 \\
0 & P_2 & 0
\end{array}\right]\left[\begin{array}{l}
n_1 \\
n_2 \\
n_3
\end{array}\right]=\left[\begin{array}{ccc}
.04 & 1.1 & .01 \\
.98 & 0 & 0 \\
0 & .92 & 0
\end{array}\right]\left[\begin{array}{c}
n_1 \\
n_2 \\
n_3
\end{array}\right] .
$$

This is population projection in its simplest form, the same matrix $A$ at every step.  In a realistic model, $A$ will change with time (from the environment or internal factors).

The matrix has $A \ge 0$ but not $A > 0$.  The Perron-Frobenius theorem still applies because $A^3 > 0$.  The largest eigenvalue is $\lambda_{\text{max}} \approx 1.06$.  You can watch the generations move, starting from $n_2 = 1$ in the middle generation.

A fast start would come from $u_0 = (0,1,0)$.  That middle group will reproduce 1.1 and also survive .92.
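Both claims about this Leslie matrix can be checked with a few lines of plain Python. The sketch below uses power iteration, which is a technique chosen here for illustration and not necessarily how the book computes $\lambda_{\text{max}}$; the 500-step count is arbitrary.

```python
A = [[0.04, 1.1, 0.01],
     [0.98, 0.0, 0.0],
     [0.0, 0.92, 0.0]]

def matmul(X, Y):
    """Product of two 3x3 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(X, v):
    """Product of a 3x3 matrix and a vector."""
    return [sum(X[i][k] * v[k] for k in range(3)) for i in range(3)]

# A has zero entries, but its cube is strictly positive.
A3 = matmul(A, matmul(A, A))
cube_is_positive = all(entry > 0 for row in A3 for entry in row)
print(cube_is_positive)

# Power iteration from u0 = (0, 1, 0): the growth per step approaches lambda_max.
u = [0.0, 1.0, 0.0]
for _ in range(500):
    v = matvec(A, u)
    growth = sum(v) / sum(u)       # total population ratio for this step
    total = sum(v)
    u = [x / total for x in v]     # rescale so the components add to 1
print(round(growth, 2))
```

The printed growth factor should be close to 1.06, and the final `u` approximates the stable age distribution (the positive eigenvector).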

###  Linear Algebra in Economics: The Consumption Matrix

The **consumption matrix** tells how much of each input goes into a unit of output.  This describes the manufacturing side of the economy.

Say we have $n$ industries like chemicals, food, and oil.  To produce a unit of chemicals may require .2 units of chemicals, .3 units of food, and .4 units of oil.  Those numbers go into row 1 of the consumption matrix $A$:

$$
\left[\begin{array}{c}
\text { chemical output } \\
\text { food output } \\
\text { oil output }
\end{array}\right]=\left[\begin{array}{ccc}
.2 & .3 & .4 \\
.4 & .4 & .1 \\
.5 & .1 & .3
\end{array}\right]\left[\begin{array}{c}
\text { chemical input } \\
\text { food input } \\
\text { oil input }
\end{array}\right]
$$

Row 2 shows the inputs to produce food-- a heavy use of chemicals and food, not so much of oil.  Row 3 of $A$ shows the inputs consumed to refine a unit of oil.  The real consumption matrix for the United States in 1958 contained 83 industries.  The models in the 1990's are much larger and more precise.  We chose a consumption matrix that has a convenient eigenvector.

Now comes the question: Can this economy meet demands $y_1, y_2, y_3$ for chemicals, food, and oil?  To do that, the inputs $p_1,p_2,p_3$ will have to be higher-- because part of $p$ is consumed in producing $y$.  The input is $p$ and the consumption is $Ap$, which leaves the output $p - Ap$.  This net production is what meets the demand $y$:

$$
\textbf{Problem:}\ \text { Find a vector } \boldsymbol{p} \text { such that } \quad \boldsymbol{p}-A \boldsymbol{p}=\boldsymbol{y} \quad \text { or } \quad \boldsymbol{p}=(I-A)^{-1} \boldsymbol{y} \text {. }
$$

Apparently the linear algebra question is whether $I - A$ is invertible.  But there is more to the problem.  The vector $y$ of required outputs is nonnegative, and so is $A$.  The production levels in $p = (I-A)^{-1}y$ must also be nonnegative.  The real question is: When is $(I - A)^{-1}$ a nonnegative matrix?

This is the test on $(I-A)^{-1}$ for a productive economy, which can meet any demand.  If $A$ is small compared to $I$, then $Ap$ is small compared to $p$.  There is plenty of output.  If $A$ is too large, then production consumes too much and the demand $y$ cannot be met.  

"Small" or "large" is decided by the largest eigenvalue $\lambda_1$ of $A$ (which is positive):

- If $\lambda_1 > 1\quad\text{then}\quad(I-A)^{-1}$ has negative entries
- If $\lambda_1 = 1\quad\text{then}\quad(I-A)^{-1}$ fails to exist
- If $\lambda_1 < 1\quad\text{then}\quad(I-A)^{-1}$ is nonnegative as desired

The main point is that last one.  The reasoning uses a nice formula for $(I-A)^{-1}$, which we give now.  The most important infinite series in mathematics is the **geometric series** $1 + x + x^2 + \cdots$.  This series adds up to $1/(1-x)$ provided $x$ lies between -1 and 1.  When $x = 1$ the series is $1 + 1 + 1 + \cdots = \infty$.  When $|x| \ge 1$ the terms $x^n$ don't go to zero and the series has no chance to converge.

The nice formula for $(I -A)^{-1}$ is the **geometric series of matrices**:

$$
(I-A)^{-1}=I+A+A^2+A^3+\cdots .
$$

If you multiply the series $S = I + A + A^2 + \cdots$ by $A$, you get the same series except for $I$.  Therefore $S - AS = I$, which is $(I - A)S = I$.  This series adds to $S = (I - A)^{-1}$ if it converges.  And it converges if all eigenvalues of $A$ have $|\lambda| < 1$.

In our case $A \ge 0$.  All terms of the series are nonnegative.  Its sum is $(I - A)^{-1} \ge 0$.

$$
A=\left[\begin{array}{ccc}
.2 & .3 & .4 \\
.4 & .4 & .1 \\
.5 & .1 & .3
\end{array}\right] \text { has } \lambda_{\max }=.9 \text { and }(I-A)^{-1}=\frac{10}{93}\left[\begin{array}{ccc}
41 & 25 & 27 \\
33 & 36 & 24 \\
34 & 23 & 36
\end{array}\right]
$$

This economy is productive.  $A$ is small compared to $I$, because $\lambda_{\text{max}}$ is .9.  To meet the demand $y$, start from $p = (I - A)^{-1}y$.  Then $Ap$ is consumed in production, leaving $p - Ap$.  This is $(I-A)p = y$, and the demand is met.
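The geometric series can be summed numerically for this consumption matrix. This is a minimal sketch: the 400-term cutoff and the demand vector `y = (1, 1, 1)` are assumptions made for illustration.

```python
A = [[0.2, 0.3, 0.4],
     [0.4, 0.4, 0.1],
     [0.5, 0.1, 0.3]]
n = 3
I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product of two n x n matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Partial sums S = I + A + A^2 + ... ; the terms shrink like (.9)^k.
S = [row[:] for row in I]
term = [row[:] for row in I]
for _ in range(400):
    term = matmul(term, A)
    S = [[S[i][j] + term[i][j] for j in range(n)] for i in range(n)]

# Check that (I - A) S is (numerically) the identity and that S >= 0.
I_minus_A = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
product = matmul(I_minus_A, S)
error = max(abs(product[i][j] - I[i][j]) for i in range(n) for j in range(n))
print(error < 1e-9, all(x >= 0 for row in S for x in row))

# Meet a hypothetical demand y: production p = S y.
y = [1.0, 1.0, 1.0]
p = [sum(S[i][j] * y[j] for j in range(n)) for i in range(n)]
print([round(x, 6) for x in p])
```

Since every row of $A$ adds to .9, the vector $(1,1,1)$ is an eigenvector with $\lambda = .9$, so the production needed for this demand should come out close to $\frac{1}{1 - .9}(1,1,1) = (10, 10, 10)$.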

## Fourier Series: Linear Algebra for Functions (10.5)

Covered in [this lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-24-markov-matrices-fourier-series-1/).

This section goes from finite dimensions to infinite dimensions.  We want to explain linear algebra in infinite-dimensional space, and show that it still works.  First step: look back.  We began with vectors and dot products and linear combinations.  We now convert those basic ideas to the infinite case-- then the rest will follow.

What does it mean for a vector to have infinitely many components?  There are two different answers, both good:

1. The vector is infinitely long: $v = (v_1, v_2, v_3, \dots)$.  It could be $(1,\frac{1}{2},\frac{1}{4},\dots)$.
2. The vector is a function $f(x)$.  It could be $v = \sin{x}$.

We will go both ways.  The idea of a Fourier series will connect them.

After vectors come dot products.  The natural dot product of two infinite vectors $(v_1,v_2,\dots)$ and $(w_1,w_2,\dots)$ is an infinite series:

$$v \cdot w = v_1w_1 + v_2w_2 + \cdots$$

This brings up a new question, which never occurred to us for vectors in $R^n$.  Does this infinite sum add up to a finite number?  Does the series converge?  Here is the first and biggest difference between finite and infinite.

When $v = w = (1,1,1,\dots)$, the sum certainly does not converge.  In that case $v \cdot w = 1 + 1 + 1 + \cdots$ is infinite.  Since $v$ equals $w$, we are really computing $v \cdot v = ||v||^2$, the length squared.  The vector $(1,1,1,\dots)$ has infinite length.  We don't want that vector.  Since we are making the rules, we don't have to include it.  The only vectors to be allowed are those with finite length:

Definition -- The vector $v = (v_1, v_2, \dots)$ and the function $f(x)$ are in our infinite-dimensional **Hilbert spaces** if and only if their lengths $||v||$ and $||f||$ are finite:

$$
\begin{aligned}
& \|\boldsymbol{v}\|^2=\boldsymbol{v} \cdot \boldsymbol{v}=v_1^2+v_2^2+v_3^2+\cdots \quad \text { must add to a finite number. } \\
& \|f\|^2=(f, f)=\int_0^{2 \pi}|f(x)|^2 d x \quad \text { must be a finite integral. } \\
&
\end{aligned}
$$

If $v$ and $w$ have finite length, how large can their dot product be?  The sum $v \cdot w = v_1w_1 + v_2w_2 + \cdots$ also adds to a finite number.  We can safely take dot products.  The Schwarz inequality is still true: 

$$|v \cdot w| \le ||v|| ||w||$$

The ratio of $v \cdot w$ to $||v|| ||w||$ is still the cosine of $\theta$ (the angle between $v$ and $w$).  Even in infinite-dimensional space, $|\cos \theta|$ is not greater than 1.

Now change over to functions.  Those are the "vectors."  The space of functions $f(x), g(x), h(x), \dots$ defined for $0 \le x \le 2\pi$ must be somehow bigger than $R^n$.  What is the dot product of $f(x)$ and $g(x)$?  What is the length of $f(x)$?

Key point in the continuous case: Sums are replaced by integrals.  Instead of a sum of $v_j$ times $w_j$, the dot product is an integral of $f(x)$ times $g(x)$.  Change the "dot" to parentheses with a comma, and change the words "dot product" to inner product.  The **inner product** of $f(x)$ and $g(x)$, and the **length squared** of $f(x)$ are:

$$
(f, g)=\int_0^{2 \pi} f(x) g(x) d x \quad \text { and } \quad\|f\|^2=\int_0^{2 \pi}(f(x))^2 d x
$$

The interval $[0, 2\pi]$ where the functions are defined could change to a different interval like $[0, 1]$ or $(-\infty, \infty)$.  We chose $2\pi$ because our first examples are $\sin{x}$ and $\cos{x}$.

The length of $f(x) = \sin{x}$ comes from its inner product with itself:

$$
(f, f)=\int_0^{2 \pi}(\sin x)^2 d x=\pi . \quad \text { The length of } \sin x \text { is } \sqrt{\pi} \text {. }
$$

More important: $\sin{x}$ and $\cos{x}$ are orthogonal in function space: $(f, g) = 0$.

That zero is no accident.  It is highly important to science.  The orthogonality goes beyond the two functions $\sin{x}$ and $\cos{x}$, to an infinite list of sines and cosines.  The list contains $\cos{0x}$ (which is 1), $\sin{x}, \cos{x}, \sin{2x}, \cos{2x}, \sin{3x}, \cos{3x}, \dots$

Every function in that list is orthogonal to every other function in the list.
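The orthogonality can be confirmed numerically by approximating the inner-product integrals. The sketch below uses a midpoint rule with 2000 steps, which is an illustrative choice of quadrature, and only checks the first five functions in the list.

```python
import math

def inner(f, g, steps=2000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps)) * h

basis = {
    "1": lambda x: 1.0,
    "sin x": math.sin,
    "cos x": math.cos,
    "sin 2x": lambda x: math.sin(2 * x),
    "cos 2x": lambda x: math.cos(2 * x),
}

names = list(basis)
for i, a in enumerate(names):
    for b in names[i:]:
        print(f"({a}, {b}) = {inner(basis[a], basis[b]):.6f}")
```

Inner products of different functions come out as zero up to rounding, while each function paired with itself gives its length squared: $2\pi$ for the constant 1 and $\pi$ for the sines and cosines.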

### Fourier Series

The **Fourier series** of a function $f(x)$ is its expansion into sines and cosines:

$$
f(x)=a_0+a_1 \cos x+b_1 \sin x+a_2 \cos 2 x+b_2 \sin 2 x+\cdots
$$

We have an orthogonal basis!  The vectors in "function space" are combinations of the sines and cosines.  On the interval from $x = 2\pi$ to $x = 4\pi$, all our functions repeat what they did from $0$ to $2\pi$.  They are "periodic".  The distance between repetitions is the period $2\pi$.

Remember: The list is infinite.  The Fourier series is an infinite series.  We avoided the vector $v = (1,1,1,\dots)$ because its length is infinite; now we avoid a function like $\frac{1}{2} + \cos{x} + \cos{2x} + \cos{3x} + \cdots$.  (_Note_: This is $\pi$ times the famous delta function $\delta(x)$.  It is an infinite "spike" above a single point.  At $x=0$ its height $\frac{1}{2} + 1 + 1 + \cdots$ is infinite.  At all points inside $0 < x < 2\pi$ the series adds in some average way to zero.)  The integral of $\delta(x)$ is 1.  But $\int\delta^2(x)\,dx = \infty$, so delta functions are not allowed into Hilbert space.

Compute the length of a typical sum $f(x)$:

$$
\begin{aligned}
(f, f) & =\int_0^{2\pi}\left(a_0+a_1 \cos x+b_1 \sin x+a_2 \cos 2 x+\cdots\right)^2 d x \\
& =\int_0^{2\pi}\left(a_0^2+a_1^2 \cos ^2 x+b_1^2 \sin ^2 x+a_2^2 \cos ^2 2 x+\cdots\right) d x \\
\|\boldsymbol{f}\|^2 & =2 \pi \boldsymbol{a}_0^2+\boldsymbol{\pi}\left(\boldsymbol{a}_{\mathbf{1}}^2+\boldsymbol{b}_{\mathbf{1}}^2+\boldsymbol{a}_{\mathbf{2}}^2+\cdots\right) .
\end{aligned}
$$

The step from line 1 to line 2 used orthogonality. All products like $\cos{x}\cos{2x}$ integrate to give zero. Line 2 contains what is left-- the integrals of each sine and cosine squared.  Line 3 evaluates those integrals.  (The integral of $1^2$ is $2\pi$, while all other integrals give $\pi$.)  If we divide by their lengths, our functions become orthonormal:

$$
\frac{1}{\sqrt{2 \pi}}, \frac{\cos x}{\sqrt{\pi}}, \frac{\sin x}{\sqrt{\pi}}, \frac{\cos 2 x}{\sqrt{\pi}}, \ldots \text { is an orthonormal basis for our function space. }
$$

These are unit vectors.  We could combine them with coefficients $A_0, A_1, B_1, A_2, \dots$ to yield a function $F(x)$.  Then the $2\pi$ and the $\pi$'s drop out of the formula for length.

$$
\text { Function length }=\text { vector length } \quad\|F\|^2=(F, F)=A_0^2+A_1^2+B_1^2+A_2^2+\cdots
$$

Here is the important point, for $f(x)$ as well as $F(x)$.  The function has finite length exactly when the vector of coefficients has finite length.  Fourier series give us a perfect match between the Hilbert spaces for functions and for vectors.  The function is in $L^2$, its Fourier coefficients are in $l^2$.

The function space contains $f(x)$ exactly when the Hilbert space contains the vector $v = (a_0, a_1, b_1, \dots)$ of Fourier coefficients of $f(x)$.  Both must have finite length.
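The length formula $\|f\|^2 = 2\pi a_0^2 + \pi(a_1^2 + b_1^2 + a_2^2 + \cdots)$ can be tested on a short trigonometric sum. The coefficients below are hypothetical, and the integral is approximated with a midpoint rule.

```python
import math

def integrate(f, steps=2000):
    """Midpoint-rule approximation of the integral of f over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

# Hypothetical coefficients: a0 = 1, a1 = 2, b2 = 3, all others zero.
a0, a1, b2 = 1.0, 2.0, 3.0

def f(x):
    return a0 + a1 * math.cos(x) + b2 * math.sin(2 * x)

length_squared = integrate(lambda x: f(x) ** 2)
formula = 2 * math.pi * a0 ** 2 + math.pi * (a1 ** 2 + b2 ** 2)
print(length_squared, formula)
```

Both numbers should agree with $15\pi$: the function's length squared is determined entirely by its coefficient vector.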

### The Fourier Coefficients

How do we find the $a$'s and $b$'s which multiply the cosines and sines?  For a given function $f(x)$, we are asking for its Fourier coefficients $a_k$ and $b_k$:

$$
\text { Fourier series } \quad f(x)=a_0+a_1 \cos x+b_1 \sin x+a_2 \cos 2 x+\cdots \text {. }
$$

Here is the way to find $a_1$.  Multiply both sides by $\cos{x}$.  Then integrate from 0 to $2\pi$.  The key is orthogonality! All integrals on the right side are zero except for $\cos^2{x}$:

$$
\text { For coefficient } a_1 \quad \int_0^{2 \pi} f(x) \cos x d x=\int_0^{2 \pi} a_1 \cos ^2 x d x=\pi a_1 \text {. }
$$

Divide by $\pi$ and you have $a_1$.  To find any other $a_k$, multiply the Fourier series by $\cos{kx}$.  Integrate from 0 to $2\pi$.  Use orthogonality, so only the integral of $a_k\cos^2{kx}$ is left.  That integral is $\pi a_k$, and divide by $\pi$:
$$
\boldsymbol{a}_k=\frac{1}{\pi} \int_0^{2 \pi} f(x) \cos k x d x \quad \text { and similarly } \quad b_k=\frac{1}{\pi} \int_0^{2 \pi} f(x) \sin k x\, d x \text {. }
$$

The exception is $a_0$.  This time we multiply by $\cos{0x} = 1$.  The integral of 1 is $2\pi$:

$$
\text { Constant term } \quad a_0=\frac{1}{2 \pi} \int_0^{2 \pi} f(x) \cdot 1 d x=\text { average value of } f(x) \text {. }
$$
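These integral formulas translate directly into code. The sketch below applies them to an assumed test function, a square wave equal to $+1$ on $(0, \pi)$ and $-1$ on $(\pi, 2\pi)$; the helper names and the midpoint-rule quadrature are illustrative choices.

```python
import math

def integrate(f, steps=20000):
    """Midpoint-rule approximation of the integral of f over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

def fourier_coefficients(f, k):
    """Return (a_k, b_k) from the integral formulas; a_0 is the average value."""
    if k == 0:
        return integrate(f) / (2 * math.pi), 0.0
    a_k = integrate(lambda x: f(x) * math.cos(k * x)) / math.pi
    b_k = integrate(lambda x: f(x) * math.sin(k * x)) / math.pi
    return a_k, b_k

# Assumed test function: a square wave, +1 on (0, pi) and -1 on (pi, 2*pi).
def square(x):
    return 1.0 if x < math.pi else -1.0

coefficients = [fourier_coefficients(square, k) for k in range(4)]
for k, (a_k, b_k) in enumerate(coefficients):
    print(k, round(a_k, 4), round(b_k, 4))
```

For this odd function the cosine coefficients vanish and only the odd sine terms survive, with $b_k = \frac{4}{\pi k}$ for odd $k$.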

### Compare Linear Algebra in $R^n$

Infinite-dimensional Hilbert space is very much like $n$-dimensional space $R^n$.  Suppose the nonzero vectors $v_1,\dots,v_n$ are orthogonal in $R^n$.  We want to write the vector $b$ (instead of the function $f(x)$) as a combination of those $v$'s:

$$
\text { Finite orthogonal series } \quad \boldsymbol{b}=c_1 \boldsymbol{v}_1+c_2 \boldsymbol{v}_2+\cdots+c_n \boldsymbol{v}_n \text {. }
$$

Multiply both sides by $v_1^T$.  Use orthogonality, so $v_1^Tv_2 = 0$.  Only the $c_1$ term is left:

$$
\text { Coefficient } c_1 \quad \boldsymbol{v}_1^{\mathrm{T}} \boldsymbol{b}=c_1 \boldsymbol{v}_1^{\mathrm{T}} \boldsymbol{v}_1+0+\cdots+0 \text {. Therefore } c_1=\boldsymbol{v}_1^{\mathrm{T}} \boldsymbol{b} / \boldsymbol{v}_1^{\mathrm{T}} \boldsymbol{v}_1 \text {. }
$$

The denominator $v_1^Tv_1$ is the length squared, like the $\pi$ we saw earlier.  The numerator $v_1^Tb$ is the inner product, like $\int f(x) \cos{kx}\, dx$.  Coefficients are easy to find when the basis vectors are orthogonal.  We are just doing one-dimensional projections, to find the components along each basis vector.
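The finite-dimensional version is short enough to run by hand. In this sketch the orthogonal vectors and the vector $b$ are made-up values in $R^3$.

```python
# Assumed orthogonal (not orthonormal) vectors in R^3 and a vector b to expand.
v = [[1, 1, 0],
     [1, -1, 0],
     [0, 0, 2]]
b = [3, 1, 4]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

# c_i = (v_i^T b) / (v_i^T v_i): one one-dimensional projection per basis vector.
c = [dot(vi, b) / dot(vi, vi) for vi in v]

# Rebuild b from the combination c_1 v_1 + c_2 v_2 + c_3 v_3.
rebuilt = [sum(c[i] * v[i][j] for i in range(3)) for j in range(3)]
print(c, rebuilt)
```

No system of equations has to be solved: each coefficient is found independently, exactly as each Fourier coefficient is found from its own integral.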

The formulas are even better when the vectors are orthonormal.  Then we have unit vectors in $Q$.