# Diagonalization of matrices

Matrix diagonalization is a fundamental concept in linear algebra. It simplifies many matrix computations and exposes the structure of a linear transformation. Studying diagonalization also ties together **eigenvectors**, **eigenvalues**, and **characteristic polynomials**.

Diagonalization also plays a crucial role across many disciplines, including mathematics, physics, data science, and engineering. It enables efficient computations, clearer geometric interpretations, and powerful theoretical results.

Prepare to unlock the true potential of matrices.

## Diagonal matrices

After the identity matrix, diagonal matrices are the easiest to manipulate. Because every entry off the main diagonal is zero, adding, subtracting, or multiplying two diagonal matrices only involves the corresponding diagonal entries. This greatly reduces the computational cost of working with them.

Take an $n \times n$ diagonal matrix $A$ with diagonal entries
$$
a_1, a_2, \dots, a_n.
$$
Transforming a vector
$$
x = (x_1, x_2, \dots, x_n)^T
$$
is especially simple:

$$
Ax =
\begin{pmatrix}
a_1 & 0   & \cdots & 0 \\
0   & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0   & 0   & \cdots & a_n
\end{pmatrix}
\begin{pmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{pmatrix}
=
\begin{pmatrix}
a_1 x_1 \\
a_2 x_2 \\
\vdots \\
a_n x_n
\end{pmatrix}.
$$

Computing repeated matrix products can be costly, especially when the dimensions are large. For diagonal matrices, however, taking powers is exceptionally straightforward. For example,

$$
A^2 = A A =
\begin{pmatrix}
a_1 & 0   & \cdots & 0 \\
0   & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0   & 0   & \cdots & a_n
\end{pmatrix}
\begin{pmatrix}
a_1 & 0   & \cdots & 0 \\
0   & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0   & 0   & \cdots & a_n
\end{pmatrix}
=
\begin{pmatrix}
a_1^2 & 0     & \cdots & 0 \\
0     & a_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0     & 0     & \cdots & a_n^2
\end{pmatrix}.
$$

In general, for any positive integer $k$,

$$
A^k =
\begin{pmatrix}
a_1^k & 0     & \cdots & 0 \\
0     & a_2^k & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0     & 0     & \cdots & a_n^k
\end{pmatrix}.
$$
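
The two properties above (coordinate-wise scaling and entry-wise powers) can be checked numerically. The following is a minimal NumPy sketch; the diagonal entries `a`, the vector `x`, and the exponent `k` are arbitrary values chosen for illustration, not taken from the text.

```python
import numpy as np

# Illustrative values (assumptions for this sketch only).
a = np.array([2.0, -1.0, 0.5])   # diagonal entries a_1, a_2, a_3
x = np.array([3.0, 4.0, 8.0])    # vector to transform
A = np.diag(a)                   # 3x3 diagonal matrix with a on the diagonal

# Multiplying by a diagonal matrix scales each coordinate x_i by a_i.
Ax = A @ x
print(np.allclose(Ax, a * x))

# The k-th power of a diagonal matrix raises each diagonal entry to the k-th power.
k = 5
A_k_direct = np.linalg.matrix_power(A, k)
A_k_entrywise = np.diag(a ** k)
print(np.allclose(A_k_direct, A_k_entrywise))
```

Both checks should print `True`; the entry-wise version needs only $n$ scalar powers instead of full matrix products.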

Manipulating diagonal matrices is easy, but in practice, most matrices are not diagonal. So what is the point of knowing these properties? This is where the concept of **diagonalization** comes into play.

## Diagonalizable matrices

Diagonalizing a matrix means decomposing it as a product of simpler matrices. Formally, an $n \times n$ matrix $A$ is **diagonalizable** if there exist a diagonal matrix $D$ and an invertible matrix $P$ (both of size $n \times n$) such that

$$
A = P D P^{-1}.
$$

At first glance, this definition may not seem very intuitive. However, the key idea is that a diagonalizable matrix behaves very much like a diagonal one. This becomes clear when we compute powers of $A$.

Consider the square of $A$:

$$
\begin{aligned}
A^2
&= (P D P^{-1})(P D P^{-1}) \\
&= P D (P^{-1} P) D P^{-1} \\
&= P D^2 P^{-1}.
\end{aligned}
$$

More generally, for any positive integer $k$,

$$
A^k = P D^k P^{-1}.
$$
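
The general formula follows from the same cancellation, by induction on $k$. Assuming $A^{k-1} = P D^{k-1} P^{-1}$, multiplying by one more factor of $A$ gives

$$
\begin{aligned}
A^k
&= A^{k-1} A \\
&= (P D^{k-1} P^{-1})(P D P^{-1}) \\
&= P D^{k-1} (P^{-1} P) D P^{-1} \\
&= P D^k P^{-1}.
\end{aligned}
$$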

Without diagonalization, computing $A^k$ by repeated multiplication requires $k-1$ matrix multiplications. With diagonalization, the task becomes much simpler: raise the diagonal entries of $D$ to the $k$-th power, then perform just **two** matrix multiplications, $P D^k$ and $(P D^k) P^{-1}$.
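
To make the comparison concrete, here is a small NumPy sketch. The $2 \times 2$ matrix `A`, its factor `P`, and the diagonal entries `d` are a hypothetical example chosen for this illustration (they do not come from the text); `A` has eigenvalues $5$ and $2$.

```python
import numpy as np

# Hypothetical example: A = P D P^{-1} with D = diag(5, 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, -2.0]])
d = np.array([5.0, 2.0])          # diagonal entries of D
P_inv = np.linalg.inv(P)

# Sanity check that the factorization really reproduces A.
assert np.allclose(A, P @ np.diag(d) @ P_inv)

k = 10

# Naive approach: k - 1 successive matrix multiplications.
A_k_naive = A.copy()
for _ in range(k - 1):
    A_k_naive = A_k_naive @ A

# Diagonalization: power the diagonal entries, then two matrix multiplications.
A_k_diag = P @ np.diag(d ** k) @ P_inv

print(np.allclose(A_k_naive, A_k_diag))
```

The two results agree up to floating-point error. The saving grows with $k$, since the diagonalized form always costs two matrix products once $P^{-1}$ is known.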

This dramatic simplification is one of the main reasons diagonalization is so powerful and widely used.

Naturally, this leads to two important questions:
- How can we tell whether a matrix is diagonalizable?
- If it is diagonalizable, how do we find the matrices $D$ and $P$?

These questions will be answered in the next topic. For now, the key takeaway is the usefulness of diagonalization and how it turns complicated matrix computations into manageable ones.

## The Importance of Diagonalization

One of the key advantages of diagonalization is its ability to simplify complex matrices. While most matrices are not diagonal in their original form, a diagonalizable matrix can be transformed into a diagonal one through a suitable change of basis. This transformation not only makes computations significantly easier but also provides deeper insight into the matrix's structure, revealing its intrinsic properties and relationships.

### Diagonalization in Data Science

In data science, diagonalization plays a crucial role in **dimensionality reduction** techniques. By diagonalizing a covariance matrix, one can identify the **principal components** that capture the most significant variation in a dataset. These principal components provide a lower-dimensional representation of the data, enabling more efficient analysis, visualization, and modeling.

A typical example is reducing high-dimensional data (such as 3D data) into a lower-dimensional space (such as 2D) while preserving as much information as possible.
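
The 3D-to-2D reduction described above can be sketched as follows. This is only an illustration under stated assumptions: the dataset is synthetic (generated here with a fixed seed), and the names `latent`, `mixing`, and `top2` are made up for the example. It uses `np.linalg.eigh`, which applies because a covariance matrix is symmetric.

```python
import numpy as np

# Synthetic 3D data (an assumption for this sketch): most of the variation
# lies along two directions, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([3.0, 1.0])
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.3]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 3))

# Center the data and form the 3x3 covariance matrix.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors
# as the columns of V, so C = V diag(eigvals) V^T (here P = V, P^{-1} = V^T).
eigvals, V = np.linalg.eigh(C)
assert np.allclose(C, V @ np.diag(eigvals) @ V.T)

# Keep the two directions of largest variance and project the data onto them.
top2 = V[:, ::-1][:, :2]
X_2d = Xc @ top2

print(X_2d.shape)
print(eigvals[::-1] / eigvals.sum())   # fraction of variance per component
```

The printed fractions show how much of the total variance each principal component carries, which is one common way to judge how much information the 2D projection preserves.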

### Diagonalization in Programming and Network Analysis

In programming, diagonalization appears in several applications, particularly in **graph theory** and **network analysis**. Diagonalizing the adjacency matrix of a graph can reveal important information about:
- connectivity patterns,
- centrality measures,
- community structures.

Such insights are essential for analyzing social networks, building recommendation systems, and detecting anomalies or fraud in large-scale networks.
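
As one small illustration of the centrality idea, the sketch below computes eigenvector centrality for a hypothetical four-node undirected graph (the edge list is invented for this example). Eigenvector centrality is only one of several spectral measures; it is used here because it reads directly off the eigenvector for the largest eigenvalue.

```python
import numpy as np

# Hypothetical undirected graph on 4 nodes with edges 0-1, 0-2, 0-3, 1-2.
Adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [1, 0, 0, 0]], dtype=float)

# The adjacency matrix of an undirected graph is symmetric, so it is
# diagonalizable with real eigenvalues; eigh returns them in ascending order.
eigvals, V = np.linalg.eigh(Adj)

# Eigenvector centrality: entries of the eigenvector for the largest
# eigenvalue, normalized here to be nonnegative and to sum to 1.
leading = np.abs(V[:, -1])
centrality = leading / leading.sum()

print(centrality)
print(int(np.argmax(centrality)))   # index of the most central node
```

In this graph node 0 is connected to every other node, so it receives the highest score.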

### Diagonalization and Eigenvalue Problems

Diagonalization is also closely tied to **eigenvalue problems**, which arise across many scientific and engineering disciplines. Solving eigenvalue problems via diagonalization allows us to:
- analyze vibrations and stability in mechanical systems,
- simulate physical processes,
- optimize numerical algorithms,
- predict the behavior of dynamic systems.

In summary, diagonalization is a fundamental tool that bridges theory and application, enabling both conceptual understanding and practical computation across mathematics, science, and engineering.

## Diagonalization in Action: A Weather Forecast Markov Chain

Imagine you live in a small city with simple weather patterns. The city has three types of weather: sunny, cloudy, and snowy. Tomorrow's weather is random, but its probabilities depend only on today's weather. A random system like this, in which the future depends solely on the present state, is known as a **Markov chain**.

Let the states be:

- $1 = \text{sunny}$
- $2 = \text{cloudy}$
- $3 = \text{snowy}$

We represent transition probabilities using a **transition matrix** $A$, where each entry $a_{ij}$ is the probability of moving from state $i$ to state $j$. Our matrix is:

$$
A=
\begin{bmatrix}
0.7 & 0.2 & 0.1\\
0.4 & 0.5 & 0.1\\
0.2 & 0.3 & 0.5
\end{bmatrix}
$$

So, for example, the probability of going from a sunny day to a cloudy day is $0.2$. Similarly, if it is snowing today, the probability that it will be snowing tomorrow is $0.5$.

Note that each row of $A$ sums to $1$, because from any current weather state, the next day must be in *some* state.
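
A short NumPy sketch of these two observations: it checks that every row of $A$ sums to $1$ and reads off tomorrow's forecast for a cloudy day. The variable names `today` and `tomorrow` are introduced here for illustration. Note that NumPy indexes from zero, so state $2$ (cloudy) is index `1`.

```python
import numpy as np

# Transition matrix from the text: rows are today's state, columns tomorrow's.
A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.3, 0.5]])

# Each row is a probability distribution over tomorrow's weather.
print(np.allclose(A.sum(axis=1), 1.0))

# Today is cloudy with certainty (state 2, zero-based index 1).
today = np.array([0.0, 1.0, 0.0])

# One step of the chain: a row vector times A gives tomorrow's distribution,
# which here is simply the second row of A.
tomorrow = today @ A
print(tomorrow)
```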

### Multi-step Forecasting via Matrix Powers

If today is cloudy, the matrix tells you directly the probability that tomorrow will be sunny: $a_{21} = 0.4$. But how can you determine:

- the probability that it will snow **the day after tomorrow**, or
- the probability that in **a week** it will be cloudy again?

In general, you need the probability of transition from state $i$ to state $j$ in $k$ steps. It turns out that this probability is precisely the $(i,j)$ entry of the matrix power $A^k$.
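
To see why, consider two steps. Going from state $i$ to state $j$ in two days means passing through some intermediate state $m$ tomorrow, and summing over the three possibilities gives exactly the rule for matrix multiplication:

$$
(A^2)_{ij} = \sum_{m=1}^{3} a_{im}\, a_{mj}.
$$

For instance, if today is cloudy, the probability that it snows the day after tomorrow is

$$
(A^2)_{2,3} = a_{21} a_{13} + a_{22} a_{23} + a_{23} a_{33}
= 0.4 \cdot 0.1 + 0.5 \cdot 0.1 + 0.1 \cdot 0.5
= 0.14.
$$

Repeating the same argument one step at a time extends this from $A^2$ to $A^k$.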

### Using Diagonalization to Compute $A^k$

Computing powers of $A$ directly can be tedious. This is where diagonalization helps.

We will not go through the full derivation here; take it as given that the matrices

$$
D=
\begin{bmatrix}
1 & 0 & 0\\
0 & 0.3 & 0\\
0 & 0 & 0.4
\end{bmatrix},
\qquad
P=
\begin{bmatrix}
1 & 0.125 & -0.2\\
1 & -0.75 & -0.2\\
1 & 1 & 1
\end{bmatrix}
$$

satisfy

$$
A = P D P^{-1}.
$$
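
Although the derivation is skipped, the claim is easy to check numerically. The sketch below uses the matrices $A$, $D$, and $P$ exactly as given above; the only added idea is that $A = P D P^{-1}$ is equivalent to $AP = PD$, which says that each column of $P$ is an eigenvector of $A$ with the matching diagonal entry of $D$ as its eigenvalue.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.3, 0.5]])
D = np.diag([1.0, 0.3, 0.4])
P = np.array([[1.0, 0.125, -0.2],
              [1.0, -0.75, -0.2],
              [1.0, 1.0, 1.0]])

# A P = P D: column j of P is scaled by the j-th diagonal entry of D.
print(np.allclose(A @ P, P @ D))

# The factorization itself, using a numerically computed inverse of P.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))

# Independent cross-check: the eigenvalues NumPy finds for A.
eigvals = np.linalg.eigvals(A)
print(np.sort(eigvals.real))
```

Both checks should print `True`, and the sorted eigenvalues should match the diagonal of $D$ up to floating-point error.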

Then

$$
A^7 = P D^7 P^{-1}.
$$

Since $D$ is diagonal, taking its 7th power is easy:

$$
D^7=
\begin{bmatrix}
1 & 0 & 0\\
0 & 0.3^7 & 0\\
0 & 0 & 0.4^7
\end{bmatrix}.
$$

Thus,

$$
A^7 = P
\begin{bmatrix}
1 & 0 & 0\\
0 & 0.3^7 & 0\\
0 & 0 & 0.4^7
\end{bmatrix}
P^{-1}
\approx
\begin{bmatrix}
0.5244 & 0.3092 & 0.1664\\
0.5242 & 0.3094 & 0.1664\\
0.5213 & 0.3106 & 0.1680
\end{bmatrix}.
$$
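
The same computation can be reproduced in a few lines of NumPy. This is a sketch using the $A$, $D$, and $P$ given above; it also compares the result against direct repeated multiplication via `np.linalg.matrix_power` as a sanity check. The name `cloudy_to_cloudy` is introduced here for illustration.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.3, 0.5]])
P = np.array([[1.0, 0.125, -0.2],
              [1.0, -0.75, -0.2],
              [1.0, 1.0, 1.0]])
d = np.array([1.0, 0.3, 0.4])     # diagonal entries of D

k = 7

# A^k = P D^k P^{-1}, where D^k only needs entry-wise powers of d.
A_k = P @ np.diag(d ** k) @ np.linalg.inv(P)

# Cross-check against plain repeated multiplication.
assert np.allclose(A_k, np.linalg.matrix_power(A, k))

print(np.round(A_k, 4))

# Entry (2, 2) in the text's numbering is index [1, 1] in zero-based NumPy.
cloudy_to_cloudy = A_k[1, 1]
print(round(cloudy_to_cloudy, 4))
```

The printed entries should agree with the matrix above, up to rounding in the last digit.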

### Example Interpretation

If it is cloudy today, then the probability that it will be cloudy again in a week is the entry $(2,2)$ of $A^7$, namely:

$$
(A^7)_{2,2} \approx 0.3094.
$$

### Bonus: Equilibrium Behavior

Notice that the rows of $A^7$ are already very similar. This suggests the Markov chain is approaching an **equilibrium (stationary distribution)**, where the long-run weather probabilities become nearly independent of the starting condition.
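
Diagonalization explains this behavior. Since $0.3^k$ and $0.4^k$ shrink to $0$ as $k$ grows, $D^k$ approaches the diagonal matrix with entries $1, 0, 0$, and $A^k$ approaches $P \,\mathrm{diag}(1, 0, 0)\, P^{-1}$. The sketch below computes that limit with the $A$ and $P$ given earlier; the names `limit` and `pi` are introduced here for illustration.

```python
import numpy as np

A = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.5, 0.1],
              [0.2, 0.3, 0.5]])
P = np.array([[1.0, 0.125, -0.2],
              [1.0, -0.75, -0.2],
              [1.0, 1.0, 1.0]])

# Limit of D^k as k grows: the entries 0.3^k and 0.4^k vanish.
D_limit = np.diag([1.0, 0.0, 0.0])
limit = P @ D_limit @ np.linalg.inv(P)

# Every row of the limit is the same long-run distribution.
pi = limit[0]
print(np.round(pi, 4))
print(np.allclose(limit, np.tile(pi, (3, 1))))

# A stationary distribution is unchanged by one more step of the chain.
print(np.allclose(pi @ A, pi))
```

The rows of $A^7$ differ from this limit only in the third decimal place, which is why they already look so similar.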

## Conclusion

- **Diagonal matrices** are the easiest to manipulate. They have simple algebraic properties, but they are uncommon in practice.

- A square matrix $A$ is **diagonalizable** if there exist a diagonal matrix $D$ and an invertible matrix $P$ such that
  $$
  A = P D P^{-1}.
  $$

- **Diagonalizable matrices** behave similarly to diagonal matrices, which makes many computations much simpler.

- If $A$ is diagonalizable, then its powers can be computed efficiently as
  $$
  A^k = P D^k P^{-1}.
  $$

- **Diagonalization** has important applications across many fields, including data science, programming, physics, and optimization.