Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a dimensionality reduction and data analysis technique commonly used in machine learning and statistics. PCA transforms high-dimensional data into a lower-dimensional representation by identifying a new set of orthogonal axes, the principal components, and projecting the data onto them. PCA is useful for dimensionality reduction, feature transformation, data visualization, noise reduction, and lossy compression.

Steps

Data Preparation: Start from a dataset of 'n' data points, each represented by 'm' features (dimensions). The data is typically standardized or normalized so that every feature has a similar scale; otherwise features with large numeric ranges dominate the variance.
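A minimal NumPy sketch of this step. The array `X` below is a made-up placeholder dataset (100 points, 3 features), not data from this project:

```python
import numpy as np

# Placeholder dataset: n = 100 points, m = 3 features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 10.0, 100.0])

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```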

Compute Mean: Calculate the mean vector of the dataset (one mean per feature). This vector is the center of the data, and subtracting it from every point centers the data at the origin.
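Continuing the same placeholder setup, centering the data in NumPy might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # placeholder data

# Mean vector: one entry per feature, the center of the point cloud
mean_vec = X.mean(axis=0)

# Subtract it so the cloud is centered at the origin
X_centered = X - mean_vec
```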

Calculate Covariance Matrix: Compute the covariance matrix of the centered data. This m-by-m matrix captures the relationships between pairs of features and describes the data's variability.
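A sketch of the covariance computation on the same placeholder data; `np.cov` with `rowvar=False` treats each row as one observation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # placeholder data
X_centered = X - X.mean(axis=0)

# m x m covariance matrix; entry (i, j) measures how features i and j co-vary
cov = np.cov(X_centered, rowvar=False)

# Equivalent manual computation (sample covariance, divisor n - 1)
cov_manual = X_centered.T @ X_centered / (X_centered.shape[0] - 1)
```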

Calculate Eigenvectors and Eigenvalues: Compute the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors are the directions along which the data varies, and each eigenvalue quantifies the variance along its eigenvector.
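A sketch of the eigendecomposition using `np.linalg.eigh`, which is intended for symmetric matrices (a covariance matrix is always symmetric) and returns eigenvalues in ascending order:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # placeholder data
cov = np.cov(X - X.mean(axis=0), rowvar=False)

# Eigenvalues come back in ascending order; eigenvectors[:, i]
# is the unit direction whose variance is eigenvalues[i]
eigenvalues, eigenvectors = np.linalg.eigh(cov)
```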

Select Principal Components: Choose the 'k' eigenvectors with the largest eigenvalues. These are the principal components of the data.
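A sketch of the selection step; `k = 2` is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # placeholder data
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

k = 2  # arbitrary number of components for illustration
order = np.argsort(eigenvalues)[::-1]        # indices, largest eigenvalue first
components = eigenvectors[:, order[:k]]      # m x k matrix of principal components

# Fraction of total variance each kept component explains
explained_ratio = eigenvalues[order[:k]] / eigenvalues.sum()
```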

Project Data: Project the centered data onto the selected 'k' principal components to obtain a new n-by-k matrix. This reduces the dimensionality while preserving as much of the variance as possible.
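Putting the steps together, a minimal end-to-end sketch on placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # placeholder data
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]

k = 2  # arbitrary number of components for illustration
components = eigenvectors[:, order[:k]]

# n x k matrix: each row is the original point expressed in
# principal-component coordinates
X_reduced = X_centered @ components
```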