To connect the dots between solving the objective function and concluding that the vector we were looking for is an **eigenvector**, we need to follow a series of logical steps. Let's walk through the process carefully:

### Step 1: **PCA Objective**

The goal of Principal Component Analysis (PCA) is to find the **direction (or vector)** in the feature space along which the projected data points have the largest variance. In mathematical terms, we want the vector $( \mathbf{w} )$ (also called the principal component direction) for which the projections of the mean-centered data onto $( \mathbf{w} )$ vary the most.

#### Objective:
Maximize the variance along a direction $( \mathbf{w} )$ subject to the constraint that $( \mathbf{w} )$ is a unit vector. Without this constraint the objective could be made arbitrarily large simply by scaling $( \mathbf{w} )$. This can be framed as the following optimization problem:

$[
\max_{\mathbf{w}} \quad \mathbf{w}^\top S \mathbf{w} \quad \text{subject to} \quad \mathbf{w}^\top \mathbf{w} = 1
]$

Where:
- $( S )$ is the covariance matrix of the mean-centered dataset (a symmetric, positive semidefinite matrix).
- $( \mathbf{w} )$ is the direction we are looking for (the principal component).
- $( \mathbf{w}^\top S \mathbf{w} )$ represents the variance along the direction $( \mathbf{w} )$.
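The last bullet can be checked numerically. The sketch below is only an illustration: the synthetic data, the random seed, and the helper name `projected_variance` are assumptions, not part of the derivation. It compares $( \mathbf{w}^\top S \mathbf{w} )$ with the sample variance of the data projected onto a unit vector $( \mathbf{w} )$.

```python
import numpy as np

# Synthetic 2-D data (an assumption, used for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])

# Center the data; the covariance matrix S is defined on zero-mean features.
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (Xc.shape[0] - 1)  # equivalent to np.cov(X, rowvar=False)

def projected_variance(Xc, w):
    """Hypothetical helper: sample variance of centered data projected onto w."""
    z = Xc @ w
    return z.var(ddof=1)

# An arbitrary unit vector (3-4-5 triangle, so its norm is exactly 1).
w = np.array([3.0, 4.0]) / 5.0

quadratic_form = w @ S @ w
direct = projected_variance(Xc, w)
print(quadratic_form, direct)  # the two numbers should agree up to rounding
```

The two values match because the projections of centered data have zero mean, so their sample variance reduces to the quadratic form in $( S )$.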

### Step 2: **Using Lagrange Multipliers**

To solve this constrained optimization problem, we use **Lagrange multipliers**. The Lagrangian function for this optimization is:

$[
\mathcal{L}(\mathbf{w}, \lambda) = \mathbf{w}^\top S \mathbf{w} - \lambda (\mathbf{w}^\top \mathbf{w} - 1)
]$

Where $( \lambda )$ is the Lagrange multiplier associated with the constraint $( \mathbf{w}^\top \mathbf{w} = 1 )$.

### Step 3: **Taking the Derivative**

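Next, we take the derivative of the Lagrangian function with respect to $( \mathbf{w} )$ and set it equal to zero to find the stationary points. Because $( S )$ is symmetric, the derivative of $( \mathbf{w}^\top S \mathbf{w} )$ is $( 2S\mathbf{w} )$, and the derivative of $( \mathbf{w}^\top \mathbf{w} )$ is $( 2\mathbf{w} )$: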

$[
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2S\mathbf{w} - 2\lambda \mathbf{w} = 0
]$

Dividing by 2 and rearranging:

$[
S\mathbf{w} = \lambda \mathbf{w}
]$

This is an **eigenvalue equation**.
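As a sanity check on the gradient used in this step, here is a minimal sketch that compares the analytic expression $( 2S\mathbf{w} - 2\lambda \mathbf{w} )$ against a central finite-difference approximation. The matrix `S`, the test point `w`, the value of `lam`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
S = A @ A.T  # symmetric positive semidefinite; stands in for a covariance matrix

def lagrangian(w, lam):
    """L(w, lam) = w^T S w - lam * (w^T w - 1)."""
    return w @ S @ w - lam * (w @ w - 1.0)

def analytic_grad(w, lam):
    """Gradient from the derivation; relies on S being symmetric."""
    return 2.0 * S @ w - 2.0 * lam * w

def numeric_grad(w, lam, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (lagrangian(w + e, lam) - lagrangian(w - e, lam)) / (2 * eps)
    return g

w = rng.normal(size=3)  # arbitrary test point, not necessarily unit length
lam = 0.7               # arbitrary multiplier value

grad_gap = np.max(np.abs(analytic_grad(w, lam) - numeric_grad(w, lam)))
print(grad_gap)  # expected to be tiny (rounding error only)
```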

### Step 4: **Eigenvalue Equation**

The equation $( S\mathbf{w} = \lambda \mathbf{w} )$ is the **standard eigenvalue equation**:

$[
A \mathbf{v} = \lambda \mathbf{v}
]$

Where:
- $( A = S )$ (the covariance matrix).
- $( \mathbf{v} = \mathbf{w} )$ (the vector we are solving for).
- $( \lambda )$ is the eigenvalue (which corresponds to the variance along that direction).

### Step 5: **Connecting to Eigenvectors and Eigenvalues**

From the eigenvalue equation $( S\mathbf{w} = \lambda \mathbf{w} )$, we can immediately recognize the following:

- The vector $( \mathbf{w} )$ is an **eigenvector** of the covariance matrix $( S )$, because it satisfies the equation $( S\mathbf{w} = \lambda \mathbf{w} )$.
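- The scalar $( \lambda )$ is the corresponding **eigenvalue**. Left-multiplying the eigenvalue equation by $( \mathbf{w}^\top )$ and using the constraint $( \mathbf{w}^\top \mathbf{w} = 1 )$ gives $( \mathbf{w}^\top S \mathbf{w} = \lambda \mathbf{w}^\top \mathbf{w} = \lambda )$, so in the context of PCA the eigenvalue is exactly the variance captured by that eigenvector (or principal component).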
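Both bullets can be verified numerically. The following sketch assumes synthetic data and uses `np.linalg.eigh`, NumPy's eigensolver for symmetric matrices, to check that each returned vector satisfies $( S\mathbf{w} = \lambda \mathbf{w} )$ and that $( \mathbf{w}^\top S \mathbf{w} )$ equals $( \lambda )$.

```python
import numpy as np

# Synthetic data with different spreads per feature (an assumption).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3)) * np.array([3.0, 1.0, 0.3])
S = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order; the columns of `eigvecs`
# are the matching unit-norm eigenvectors.
eigvals, eigvecs = np.linalg.eigh(S)

for lam, w in zip(eigvals, eigvecs.T):
    residual = np.linalg.norm(S @ w - lam * w)  # should be close to 0
    variance_along_w = w @ S @ w                # should equal lam
    print(f"lambda={lam:.4f}  w^T S w={variance_along_w:.4f}  residual={residual:.2e}")
```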

### Step 6: **Conclusion**

Thus, we have derived that:
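- Every unit-length eigenvector of $( S )$ is a stationary point of the constrained problem, and the objective value at that point is its eigenvalue, since $( \mathbf{w}^\top S \mathbf{w} = \lambda )$ when $( \mathbf{w}^\top \mathbf{w} = 1 )$.
- The vector $( \mathbf{w} )$ that maximizes the variance in the data (the first principal component direction) is therefore the **eigenvector** of the covariance matrix $( S )$ with the **largest eigenvalue**.
- The associated eigenvalue $( \lambda )$ is the **variance** of the data captured along that direction.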
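To illustrate the conclusion, the sketch below (synthetic data and sample sizes are assumptions) evaluates the objective $( \mathbf{w}^\top S \mathbf{w} )$ on many random unit vectors and compares the best value found with the largest eigenvalue of $( S )$. No random direction should beat the eigenvector belonging to the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))  # correlated features
S = np.cov(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(S)  # ascending eigenvalues
lam_max = eigvals[-1]
w_top = eigvecs[:, -1]

# Draw random unit vectors and evaluate w^T S w for each row of W.
W = rng.normal(size=(10_000, 4))
W /= np.linalg.norm(W, axis=1, keepdims=True)
objective = np.einsum("ij,jk,ik->i", W, S, W)

print("largest eigenvalue:     ", lam_max)
print("objective at w_top:     ", w_top @ S @ w_top)
print("best random unit vector:", objective.max())
```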

### Final Thoughts

In summary, the process works like this:
- The optimization problem for PCA seeks the direction in the feature space that maximizes the variance of the data.
- By solving the optimization problem with Lagrange multipliers, we arrive at the **eigenvalue equation**.
- The unit-length solutions to this equation (i.e., the vectors $( \mathbf{w} )$) are the **eigenvectors** of the covariance matrix, and the corresponding values $( \lambda )$ are the **eigenvalues**.
- These eigenvectors, ordered from largest to smallest eigenvalue, correspond to the **principal components** of the data, and each eigenvalue is the variance that its principal component captures.
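The whole procedure summarized above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `pca_eig`, its return values, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, k):
    """Hypothetical helper: top-k principal directions via eigendecomposition.

    Returns (variances, components, explained_variance_ratio), where the
    columns of `components` are unit-norm eigenvectors of the covariance matrix.
    """
    Xc = X - X.mean(axis=0)                  # center the data
    S = Xc.T @ Xc / (Xc.shape[0] - 1)        # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)     # ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals[:k], eigvecs[:, :k], eigvals[:k] / eigvals.sum()

# Synthetic data with very different spreads per feature (an assumption).
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])

variances, components, ratio = pca_eig(X, k=2)

# Project the centered data onto the principal directions.
scores = (X - X.mean(axis=0)) @ components

print("eigenvalues (variances):", variances)
print("explained variance ratio:", ratio)
print("variance of projected data:", scores.var(axis=0, ddof=1))
```

The variance of each column of `scores` should match the corresponding eigenvalue, which is the numerical counterpart of the statement that each eigenvalue measures the variance captured by its principal component.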

