# Principal Component Analysis (PCA) in Machine Learning

Principal Component Analysis (PCA) is a dimensionality reduction technique used in machine learning to simplify complex datasets. It is primarily used to reduce the number of features while retaining as much of the information in the data as possible. PCA achieves this by transforming the data into a new set of variables called **Principal Components**, which are uncorrelated linear combinations of the original features, ordered so that each one captures as much of the remaining variance as possible.

Because the first few components usually retain most of the sample’s information, the later ones can be dropped. The result is a set of variables smaller than the original one, which can then be used as input for regression or classification.
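
One way to make "capture the maximum variance" precise is the sketch below. The symbols are introduced here for illustration and are not part of the original text: $Z$ is the centered (or standardized) data matrix with one row per sample, $C$ is its sample covariance matrix, and $w_j$ is the direction of the $j$-th principal component.

$$
w_1 = \arg\max_{\|w\| = 1} \operatorname{Var}(Zw) = \arg\max_{\|w\| = 1} w^\top C\, w
$$

Each later direction $w_j$ solves the same problem with the extra constraint that it is orthogonal to $w_1, \dots, w_{j-1}$. The solutions are the eigenvectors of $C$:

$$
C\, w_j = \lambda_j w_j, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge 0,
$$

where $\lambda_j$ is the variance of the data along $w_j$, so the share of variance explained by component $j$ is $\lambda_j / \sum_i \lambda_i$.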


## Key Points:
- **Purpose**: Reduce dimensionality and suppress noise while retaining the important patterns in the data.
- **Usage**: Helps with visualization, speeds up machine learning algorithms, and can reduce the risk of overfitting.

## How PCA Works:
1. **Standardization**: Each feature is rescaled to have a mean of 0 and a standard deviation of 1, so that features measured in different units contribute equally.
2. **Covariance Matrix Computation**: The covariance matrix of the standardized features is computed to measure how each pair of features varies together.
3. **Eigen Decomposition**: Eigenvalues and eigenvectors are calculated from the covariance matrix to identify the principal components.
   - **Eigenvalues** determine the amount of variance each principal component explains.
   - **Eigenvectors** point in the direction of the principal components.
4. **Feature Vector Formation**: Select the top `k` eigenvectors that correspond to the largest eigenvalues.
5. **Data Projection**: The original data is projected onto the new feature space formed by the selected principal components.
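
The five steps above can be sketched directly in NumPy. This is a minimal illustration, not a replacement for a library implementation: the function name `pca_from_scratch` and the synthetic `X_demo` data are made up for this example, and the standardization uses the population standard deviation (the NumPy default).

```python
import numpy as np

def pca_from_scratch(X, k):
    """Illustrative PCA via eigen decomposition (hypothetical helper).

    Returns the projected data, all eigenvalues (descending),
    and the matrix of the top-k eigenvectors.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: standardize each feature to mean 0 and standard deviation 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    # Step 2: covariance matrix of the standardized features (columns = features)
    C = np.cov(Z, rowvar=False)

    # Step 3: eigen decomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order, so reorder to descending
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Step 4: feature vector = top-k eigenvectors, stored as columns
    W = eigvecs[:, :k]

    # Step 5: project the standardized data onto the selected components
    return Z @ W, eigvals, W

# Synthetic demo data (assumption): two strongly correlated features plus noise
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
X_demo = np.hstack([
    base,
    2 * base + 0.1 * rng.normal(size=(50, 1)),
    rng.normal(size=(50, 1)),
])

Y, eigvals, W = pca_from_scratch(X_demo, k=2)
print("Explained variance ratio:", eigvals / eigvals.sum())
print("Projected shape:", Y.shape)
```

The covariance matrix of the projected data `Y` is diagonal, with the selected eigenvalues on the diagonal. This is the sense in which the principal components are uncorrelated and the eigenvalues measure the variance each one explains.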



## PCA Function in Python (using `scikit-learn`):

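```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

# Sample dataset: 10 samples, 2 features
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])

# Standardize each feature to mean 0 and standard deviation 1
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA. With two input features, n_components=2 keeps every component,
# so this rotates the data rather than reducing it.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Share of the total variance explained by each component
print("Explained Variance Ratio:", pca.explained_variance_ratio_)
# X_pca holds the data projected onto the components (the scores);
# the component directions themselves are in pca.components_
print("Principal Components:\n", X_pca)
```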

Output:

```
Explained Variance Ratio: [0.96296464 0.03703536]
Principal Components:
 [[-1.08643242 -0.22352364]
 [ 2.3089372   0.17808082]
 [-1.24191895  0.501509  ]
 [-0.34078247  0.16991864]
 [-2.18429003 -0.26475825]
 [-1.16073946  0.23048082]
 [ 0.09260467 -0.45331721]
 [ 1.48210777  0.05566672]
 [ 0.56722643  0.02130455]
 [ 1.56328726 -0.21536146]]
```
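
Since the first component explains about 96% of the variance in this output, the data can reasonably be reduced to a single dimension. The sketch below shows one common heuristic for choosing `k`: keep the smallest number of components whose cumulative explained variance reaches a threshold. The 95% threshold and the variable names are assumptions made for this example, not a rule from the original text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same sample dataset as in the example above
X = np.array([
    [2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
    [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9],
])
X_scaled = StandardScaler().fit_transform(X)

# Fit with all components to inspect the cumulative explained variance
pca_full = PCA().fit(X_scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest k whose cumulative ratio reaches the (assumed) threshold
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)

# Reduce to k components, then map back to the standardized feature space
pca_k = PCA(n_components=k)
X_reduced = pca_k.fit_transform(X_scaled)
X_restored = pca_k.inverse_transform(X_reduced)

# Mean squared reconstruction error: the information lost by dropping components
mse = np.mean((X_scaled - X_restored) ** 2)

print("Chosen k:", k)
print("Reduced shape:", X_reduced.shape)
print("Reconstruction MSE:", mse)
```

Here `k` comes out as 1, so each sample is described by a single number instead of two. The reconstruction error gives a rough sense of what was discarded: on standardized data it equals the share of variance in the dropped component.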
