Principal Component Analysis (PCA) Implementation
π Overview
This project implements Principal Component Analysis (PCA) from scratch using NumPy and Matplotlib. PCA is a dimensionality reduction technique used in machine learning and data visualization to transform high-dimensional data into lower dimensions while preserving variance.
β¨ Features
β Computes mean row vectors for dataset normalization.
β Calculates covariance matrix to understand data relationships.
β Extracts eigenvalues and eigenvectors to determine principal components.
β Projects high-dimensional data onto a lower-dimensional subspace.
β Visualizes the transformed data using Matplotlib.
π οΈ Requirements
Python 3.8+
Required Python packages:
pip install numpy matplotlib
π Usage
1οΈβ£ Run the script:
python pca_script.py
2οΈβ£ Outputs:
Prints step-by-step calculations:
Mean row vectors
Covariance matrix
Eigenvalues and eigenvectors
Projection matrix (W)
Transformed dataset
Plots and saves a scatter plot of the transformed data as PCA_tutorial.png.
π How It Works
Compute Mean Row Vectors: Calculate the mean of each feature.
Subtract Mean: Center the data by subtracting the mean row vector.
Compute Covariance Matrix: Measures relationships between different features.
Eigen Decomposition: Find eigenvalues and eigenvectors of the covariance matrix.
Select Top k Components: Choose the top k eigenvectors.
Project Data: Transform the original dataset to a lower-dimensional space.
Visualize Data: Plot the projected data if k=2.
π Example Output
compute covariance matrix [[ 2.1875 4.1875 -0.5 ] [ 4.1875 9.6875 -1.25 ] [-0.5 -1.25 0.875 ]]
β Notes
k=2 is used for 2D visualization; you can modify it for different dimensionality reductions.
PCA is unsupervised, meaning it doesnβt require labels.
Eigenvalues indicate the importance of each principal component.
π‘ Future Enhancements
β Extend PCA to work with larger datasets.
β Implement PCA using Scikit-learn for comparison.
β Add 3D visualization for k=3.
π License
This project is open-source and available under the MIT License.
π Contributions
Feel free to contribute! Submit issues, suggest features, or fork and improve the project.