# PCA — Key Formulas

**1. Mean (feature-wise)**, for a data matrix $X \in \mathbb{R}^{n \times p}$ whose rows $x_i$ are the $n$ samples:

$$
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i
$$

**2. Mean-centering the data:**

$$
X_{\text{meaned}} = X - \mathbf{1}_n \mu^{T} \quad \text{(subtract } \mu \text{ from every row)}
$$
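
As a quick illustration of steps 1 and 2, the sketch below centers a small hand-made matrix and checks that every column mean becomes zero. The toy values are an assumption chosen for easy arithmetic, not data from this notebook.

```python
import numpy as np

# Hypothetical toy data: n = 5 samples (rows), p = 2 features (columns).
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 5.0],
              [8.0, 7.0],
              [10.0, 9.0]])

mu = X.mean(axis=0)   # feature-wise mean, shape (p,)
X_meaned = X - mu     # broadcasting subtracts mu from every row

print(mu.tolist())                          # [6.0, 5.0]
print(np.allclose(X_meaned.mean(axis=0), 0.0))  # True
```

NumPy broadcasting plays the role of the $\mathbf{1}_n \mu^{T}$ term: the length-$p$ vector is subtracted from each of the $n$ rows.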

**3. Covariance matrix:**

$$
\Sigma = \frac{1}{n-1}\, X_{\text{meaned}}^{T} X_{\text{meaned}}
$$
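
The covariance formula can be checked numerically against `np.cov`. This is a minimal sketch on randomly generated data (the seed and shape are arbitrary assumptions); `rowvar=False` tells NumPy that columns, not rows, are the variables.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
X = rng.normal(size=(50, 3))     # hypothetical data: n = 50, p = 3
n = X.shape[0]

X_meaned = X - X.mean(axis=0)
cov_manual = X_meaned.T @ X_meaned / (n - 1)   # formula from step 3
cov_numpy = np.cov(X, rowvar=False)            # np.cov also divides by n - 1

print(cov_manual.shape)                    # (3, 3)
print(np.allclose(cov_manual, cov_numpy))  # True
```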

**4. Eigendecomposition (principal axes & variances):**

$$
\Sigma v_i = \lambda_i v_i \quad \text{for } i = 1, \dots, p
$$

Matrix form, with $V = [v_1, \dots, v_p]$ and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$:

$$
\Sigma V = V \Lambda
$$
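
The matrix identity can be verified directly. The sketch below uses `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix; the random data is an assumption made only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
X = rng.normal(size=(50, 3))     # hypothetical data
Sigma = np.cov(X, rowvar=False)

# eigh returns eigenvalues in ascending order and eigenvectors as columns of V.
lam, V = np.linalg.eigh(Sigma)

print(np.allclose(Sigma @ V, V @ np.diag(lam)))  # True: Sigma V = V Lambda
print(np.allclose(V.T @ V, np.eye(3)))           # True: columns are orthonormal
```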

**5. Sort eigenvalues in descending order and reorder the eigenvectors (columns of $V$) to match:**

$$
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p
$$
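
Sorting has to permute the eigenvector columns with the same index array as the eigenvalues, otherwise the pairs get mismatched. A minimal sketch with made-up eigenpairs (the values are assumptions for illustration):

```python
import numpy as np

# Hypothetical eigenpairs in ascending order, as np.linalg.eigh would return them.
lam = np.array([0.5, 1.5, 3.0])
V = np.eye(3)                    # column i is the eigenvector paired with lam[i]

order = np.argsort(lam)[::-1]    # indices that sort lam in descending order
lam_sorted = lam[order]
V_sorted = V[:, order]           # permute columns, not rows

print(lam_sorted.tolist())       # [3.0, 1.5, 0.5]
print(V_sorted[:, 0].tolist())   # [0.0, 0.0, 1.0]
```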

**6. Project onto the top $k$ principal components** ($V_k \in \mathbb{R}^{p \times k}$ holds the first $k$ columns of the sorted $V$):

$$
Y = X_{\text{meaned}} V_k
$$
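
One way to sanity-check the projection: the covariance of $Y$ should be diagonal, with the top $k$ eigenvalues on the diagonal. The sketch below assumes random data and $k = 2$, both chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
X = rng.normal(size=(100, 4))    # hypothetical data: n = 100, p = 4
X_meaned = X - X.mean(axis=0)

lam, V = np.linalg.eigh(np.cov(X_meaned, rowvar=False))
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

k = 2
V_k = V[:, :k]                   # first k columns of the sorted V
Y = X_meaned @ V_k               # projected data, shape (n, k)

print(Y.shape)                                                 # (100, 2)
print(np.allclose(np.cov(Y, rowvar=False), np.diag(lam[:k])))  # True
```

The projected coordinates are uncorrelated, and the variance along component $i$ equals $\lambda_i$.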

**7. Explained variance ratio:**

$$
\text{EVR}_i = \frac{\lambda_i}{\sum_{j=1}^{p}\lambda_j}, \quad
\text{CumVar}_k = \sum_{i=1}^{k}\text{EVR}_i
$$
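
A small worked example of the two ratios, using hypothetical eigenvalues picked so that the fractions are exact in floating point:

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in descending order.
lam = np.array([4.0, 2.0, 1.0, 1.0])

evr = lam / lam.sum()        # explained variance ratio per component
cum_var = np.cumsum(evr)     # cumulative explained variance for k = 1..p

print(evr.tolist())          # [0.5, 0.25, 0.125, 0.125]
print(cum_var.tolist())      # [0.5, 0.75, 0.875, 1.0]
```

Here the first two components together account for 75% of the total variance.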


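```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 10 samples, 3 features
np.random.seed(42)
X = np.random.randn(10, 3)

# Steps 1-2: center each feature at zero
X_meaned = X - np.mean(X, axis=0)

# Step 3: covariance matrix (columns are variables; np.cov divides by n - 1)
cov_mat = np.cov(X_meaned, rowvar=False)

# Step 4: eigendecomposition; eigh is meant for symmetric matrices and
# returns eigenvalues in ascending order
eigen_values, eigen_vectors = np.linalg.eigh(cov_mat)

# Step 5: sort eigenpairs by descending eigenvalue
sorted_idx = np.argsort(eigen_values)[::-1]
eigen_values = eigen_values[sorted_idx]
eigen_vectors = eigen_vectors[:, sorted_idx]

# Step 6: project onto the top k principal axes
k = 2
eigen_vectors_subset = eigen_vectors[:, :k]
X_reduced_manual = np.dot(X_meaned, eigen_vectors_subset)

# Reference result: scikit-learn's PCA (it centers the data internally)
pca = PCA(n_components=k)
X_reduced_sklearn = pca.fit_transform(X)

print("pca manual")
print(np.round(X_reduced_manual, 4))

print("pca sklearn")
print(np.round(X_reduced_sklearn, 4))

print("\npca difference(manual - sklearn):")
print(np.round(X_reduced_manual - X_reduced_sklearn, 4))
```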


```text
pca manual
[[ 0.3399  1.0104]
 [-0.7863  0.7987]
 [-1.4578  1.1021]
 [-0.264   0.0493]
 [-0.2215 -1.6968]
 [ 1.1519 -0.1451]
 [ 2.3036  0.3724]
 [-0.711  -0.8003]
 [-0.368  -0.7284]
 [ 0.0131  0.0376]]
pca sklearn
[[ 0.3399  1.0104]
 [-0.7863  0.7987]
 [-1.4578  1.1021]
 [-0.264   0.0493]
 [-0.2215 -1.6968]
 [ 1.1519 -0.1451]
 [ 2.3036  0.3724]
 [-0.711  -0.8003]
 [-0.368  -0.7284]
 [ 0.0131  0.0376]]

pca difference(manual - sklearn):
[[ 0.  0.]
 [ 0. -0.]
 [ 0.  0.]
 [ 0. -0.]
 [ 0. -0.]
 [-0.  0.]
 [-0. -0.]
 [-0.  0.]
 [-0.  0.]
 [ 0. -0.]]
```
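
An eigenvector is only determined up to sign, so a manual projection and scikit-learn's can differ by a factor of -1 in any column while both are correct. In the run shown here the signs happen to agree. A hedged sketch of a sign-insensitive comparison follows; the alignment step and the names `Y_manual`, `Y_sklearn`, and `Y_aligned` are illustrative assumptions, not part of the original cell.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)   # arbitrary seed
X = rng.normal(size=(10, 3))      # hypothetical data, same shape as in the cell

# Manual PCA via eigendecomposition of the covariance matrix.
X_meaned = X - X.mean(axis=0)
lam, V = np.linalg.eigh(np.cov(X_meaned, rowvar=False))
order = np.argsort(lam)[::-1]
Y_manual = X_meaned @ V[:, order][:, :2]

Y_sklearn = PCA(n_components=2).fit_transform(X)

# Flip each manual column so that it correlates positively with sklearn's column.
signs = np.sign(np.sum(Y_manual * Y_sklearn, axis=0))
Y_aligned = Y_manual * signs

print(np.allclose(Y_aligned, Y_sklearn))   # True
```

Comparing absolute values, `np.allclose(np.abs(Y_manual), np.abs(Y_sklearn))`, is a simpler alternative when only agreement up to sign matters.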


```text
array([[-0.57561026,  0.48111412],
       [-0.48303245,  0.45240071],
       [ 0.65981246,  0.75090798]])
```