Write a Python function that performs Principal Component Analysis (PCA) from scratch. The function should take a 2D NumPy array as input, where each row represents a data sample and each column represents a feature. The function should standardize the dataset, compute the covariance matrix, find the eigenvalues and eigenvectors, and return the principal components (the eigenvectors corresponding to the largest eigenvalues). The function should also take an integer k as input, representing the number of principal components to return.


In [21]:
import numpy as np
def principal_component(data,k):

  # Standardize the data: zero mean and unit standard deviation per feature
  data_mean=np.mean(data,axis=0)
  standardized_data=(data-data_mean) / np.std(data, axis=0) # broadcasts over rows

  # Covariance matrix of the features (rowvar=False: each column is a variable)
  cov_matrix=np.cov(standardized_data,rowvar=False)

  # Eigenvalues and eigenvectors of the covariance matrix
  # (eigenvectors are returned as the columns of the second array)
  eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

  # Sort the eigenvectors by decreasing eigenvalue
  idx = np.argsort(eigenvalues)[::-1] # argsort sorts ascending; [::-1] reverses the order
  eigenvalues_sorted = eigenvalues[idx] # integer-array ("fancy") indexing reorders the values
  eigenvectors_sorted = eigenvectors[:,idx] # reorder the columns to match

  # Select the top k eigenvectors (dimensionality reduction)
  principal_components=eigenvectors_sorted[:,:k]

  return np.round(principal_components,4)

In [22]:
data = np.array([[1, 2], [3, 4], [5, 6]])
k = 1
print(principal_component(data,k))

[[0.7071]
 [0.7071]]


After standardizing the data and computing the covariance matrix, the eigenvalues and eigenvectors are calculated. The largest eigenvalue's corresponding eigenvector is returned as the principal component, rounded to four decimal places.

Understanding Eigenvalues in PCA
Principal Component Analysis (PCA) utilizes the concept of eigenvalues and eigenvectors to identify the principal components of a dataset. Here's how eigenvalues fit into the PCA process:
Eigenvalues and Eigenvectors: The Foundation of PCA
For a given square matrix $A$, representing the covariance matrix in PCA, eigenvalues $\lambda$ and their corresponding eigenvectors $v$ satisfy:
$$Av = \lambda v$$
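The defining relation can be checked numerically. The sketch below assumes the covariance matrix of the standardized example data `[[1, 2], [3, 4], [5, 6]]`, which works out to all entries equal to 1.5, and uses `np.linalg.eigh` (NumPy's routine for symmetric matrices) rather than the `np.linalg.eig` call in the function above.

```python
import numpy as np

# Covariance matrix of the standardized example data [[1, 2], [3, 4], [5, 6]]
A = np.array([[1.5, 1.5],
              [1.5, 1.5]])

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each column of `eigenvectors` is one eigenvector; check A v = lambda v for every pair
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```

Eigenvectors are only defined up to sign, so the printed columns may differ in sign between routines while still satisfying the relation.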
Calculating Eigenvalues
The eigenvalues of matrix $A$ are found by solving the characteristic equation:
$$\det(A - \lambda I) = 0$$
where $I$ is the identity matrix of the same dimension as $A$. Each root $\lambda$ of this equation is an eigenvalue, and its eigenvectors are the non-zero solutions $v$ of $(A - \lambda I)v = 0$.
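As a worked example, take the covariance matrix of the standardized sample data `[[1, 2], [3, 4], [5, 6]]`, in which every entry equals 1.5 (a value computed here for illustration, not stated in the notebook). The characteristic equation then factors by hand:

```latex
\det(A - \lambda I)
  = \det\begin{pmatrix} 1.5 - \lambda & 1.5 \\ 1.5 & 1.5 - \lambda \end{pmatrix}
  = (1.5 - \lambda)^2 - 1.5^2
  = \lambda(\lambda - 3) = 0
  \quad\Longrightarrow\quad \lambda_1 = 3,\ \lambda_2 = 0
```

For $\lambda_1 = 3$, solving $(A - 3I)v = 0$ gives $v \propto (1, 1)$, which normalizes to approximately $(0.7071, 0.7071)$, matching the printed output of the example cell.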
Role in PCA
In PCA, each eigenvalue of the covariance matrix equals the variance of the data projected onto the corresponding eigenvector. Selecting the eigenvectors associated with the largest eigenvalues therefore amounts to choosing the principal components that retain the most data variance.
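A quick numerical check of this claim, as a minimal sketch: the random data, the seed, and the induced correlation below are illustrative assumptions, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative random data, not from the text
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]                  # make two features correlated

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
cov = np.cov(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Variance of the data projected onto each eigenvector (ddof=1 matches np.cov)
projected_var = (Z @ eigenvectors).var(axis=0, ddof=1)

print(np.allclose(projected_var, eigenvalues))
```

The comparison holds because the variance of `Z @ v` is `v.T @ cov @ v`, which equals the eigenvalue when `v` is a unit eigenvector.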
Eigenvalues and Dimensionality Reduction
The size of an eigenvalue measures the importance of its eigenvector (principal component) in representing the dataset's variability: the ratio $\lambda_i / \sum_j \lambda_j$ is the fraction of total variance that component explains. By keeping only the eigenvectors with the largest eigenvalues, PCA achieves dimensionality reduction while preserving as much of the dataset's variability as possible.
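One common way to pick the number of components is to keep the smallest k whose eigenvalues together cover a chosen share of the total variance. The sketch below assumes a hypothetical helper `explained_variance_ratio`, made-up eigenvalues, and a 90% threshold, none of which come from the original notebook.

```python
import numpy as np

def explained_variance_ratio(eigenvalues):
    """Fraction of total variance carried by each component, largest first."""
    ordered = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return ordered / ordered.sum()

# Hypothetical eigenvalues for a 4-feature standardized dataset
ratios = explained_variance_ratio([2.4, 1.0, 0.4, 0.2])
cumulative = np.cumsum(ratios)

# Smallest k whose components together explain at least 90% of the variance
k = int(np.searchsorted(cumulative, 0.90) + 1)
print(ratios, k)
```

Here the first two components cover 85% of the variance, so three are needed to pass the 90% threshold.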
Practical Application
1.	Standardize the Dataset: Ensure that each feature has a mean of 0 and a standard deviation of 1.
2.	Compute the Covariance Matrix: Reflects how features vary together.
3.	Find Eigenvalues and Eigenvectors: Solve the characteristic equation for the covariance matrix.
4.	Select Principal Components: Choose eigenvectors (components) with the highest eigenvalues for dimensionality reduction.
Through this process, PCA transforms the original features into a new set of uncorrelated features (principal components), ordered by the amount of original variance they explain.
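The four steps can be combined into a projection, and the claim that the resulting features are uncorrelated can be checked directly. This is a minimal sketch: `pca_transform` is a hypothetical helper, the data is randomly generated for illustration, and `np.linalg.eigh` is used in place of `np.linalg.eig` because the covariance matrix is symmetric.

```python
import numpy as np

def pca_transform(data, k):
    """Project standardized data onto its top-k principal components."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0)       # step 1: standardize
    cov = np.cov(z, rowvar=False)                           # step 2: covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)         # step 3: eigen-decomposition
    order = np.argsort(eigenvalues)[::-1][:k]               # step 4: k largest eigenvalues
    return z @ eigenvectors[:, order]

rng = np.random.default_rng(1)            # illustrative data, not from the text
X = rng.normal(size=(300, 4))
X[:, 2] += X[:, 0] - 0.5 * X[:, 1]        # introduce correlation between features

scores = pca_transform(X, k=3)
score_cov = np.cov(scores, rowvar=False)

# Off-diagonal covariances vanish: the new features are uncorrelated
off_diagonal = score_cov - np.diag(np.diag(score_cov))
print(np.allclose(off_diagonal, 0.0, atol=1e-10))
```

The diagonal of `score_cov` holds the retained eigenvalues in decreasing order, which is the ordering by explained variance described above.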
