## Q1. What is a projection and how is it used in PCA?
## Answer 

#### In geometry and linear algebra, a projection maps points from a high-dimensional space onto a lower-dimensional subspace. Imagine shining a flashlight on a 3D object to cast a 2D shadow: the shadow is the projection.
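
As a minimal sketch of the flashlight analogy (assuming NumPy; the points and the choice of the x-y plane as the target subspace are made up for illustration), projecting onto a subspace with an orthonormal basis is a single matrix product:

```python
import numpy as np

# Three points in 3D space (one point per row); values are illustrative.
points = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 0.0, -1.0],
    [0.5, 0.5, 2.0],
])

# Orthonormal basis of the target 2D subspace, one basis vector per column.
# Here it is simply the x-y plane.
basis = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
])

# Coordinates of each point inside the subspace (the 2D "shadow").
coords_2d = points @ basis

# The same shadow expressed back in 3D: the z component becomes 0.
shadow_3d = coords_2d @ basis.T

print(coords_2d)
print(shadow_3d)
```

PCA does the same thing, except that the basis vectors are chosen from the data rather than fixed in advance.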
## How is it used in PCA?
#### PCA uses projection to reduce dimensionality while preserving as much variance (information) as possible:
#### 1. Find Principal Components (PCs)
- These are the directions (vectors) in which the data varies the most.
- First PC captures maximum variance, second PC captures the next highest (orthogonal to the first), and so on.
#### 2. Project the Data
- Each data point is projected onto these principal components.
- Instead of expressing the point in terms of its original features, it’s now represented using a reduced number of PCs.
#### 3. Dimensionality Reduction
- If you keep only the top k principal components, you reduce the dataset from d original features to k dimensions (k < d), while retaining most of the "shape" or structure.
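
The three steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the random toy data, the sizes, and `k = 2` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples, 5 correlated features (sizes are arbitrary).
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data so every feature has mean 0.
X_centered = X - X.mean(axis=0)

# Step 1: principal components = eigenvectors of the covariance matrix.
S = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(S)   # eigh returns ascending order
order = np.argsort(eigenvalues)[::-1]           # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Steps 2 and 3: project onto the top k components only.
k = 2
W = eigenvectors[:, :k]        # shape (5, k), one component per column
X_reduced = X_centered @ W     # shape (200, k)

print(X_reduced.shape)  # (200, 2)
```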

## 

## Q2. How does the optimization problem in PCA work, and what is it trying to achieve?
## Answer 

#### Goal of PCA’s Optimization:
- PCA wants to answer this: “What are the best directions to project the data so that we preserve the maximum variance?”
- Why variance? Because variance captures information, structure, and spread in the dataset. High variance means the data points are well spread out in that direction — so PCA prioritizes keeping that.
#### How it works:
#### Input:
##### A dataset $X \in \mathbb{R}^{n \times d}$, where:
- $n$ = number of samples
- $d$ = number of original features
##### Assume the data is centered, i.e. mean = 0
####
#### Objective:
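##### Find a vector $w \in \mathbb{R}^d$ (a direction) such that the variance of the data projected onto $w$ is maximized:
$$\max_{w} \; w^\top S w \quad \text{subject to} \quad \lVert w \rVert = 1$$
##### Here $S = \frac{1}{n-1} X^\top X$ is the covariance matrix of the centered data.
##### The constraint $\lVert w \rVert = 1$ ensures we’re only choosing a direction, not a magnitude.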
####
#### Solution:
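##### This is solved via eigendecomposition: setting the gradient of the Lagrangian $w^\top S w - \lambda (w^\top w - 1)$ to zero gives $S w = \lambda w$, so any optimum is an eigenvector of $S$, and its projected variance is $w^\top S w = \lambda$.
##### The optimal $w$ is therefore the eigenvector of $S$ corresponding to the largest eigenvalue.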
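
A quick numerical check of this result, as a sketch assuming NumPy and random toy data: no random unit vector should give a larger projected variance than the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy centered data: 300 samples, 4 correlated features.
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
X = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

# eigh sorts eigenvalues in ascending order, so the last column
# is the eigenvector with the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(S)
w_best = eigenvectors[:, -1]
best_variance = w_best @ S @ w_best

# Compare against 1000 random unit-length directions.
random_dirs = rng.normal(size=(1000, 4))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
# Quadratic form w^T S w evaluated for every row w at once.
random_variances = np.einsum("ij,jk,ik->i", random_dirs, S, random_dirs)

print(best_variance, random_variances.max())
```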


## 

## Q3. What is the relationship between covariance matrices and PCA?
## Answer 

#### Covariance Matrix: The DNA of PCA
- At the heart of PCA lies the covariance matrix, a square matrix that reveals how features in your dataset vary with respect to each other.
## PCA’s Job: Decode That Matrix
#### PCA transforms your dataset into new axes by diagonalizing the covariance matrix:
#### 1. Compute the Covariance Matrix 𝑆
#### 2. Diagonalize 𝑆
- Find the eigenvectors and eigenvalues of 𝑆.
- Eigenvectors = principal components (directions of maximum variance)
- Eigenvalues = amount of variance each component captures
#### 3. Construct the PCA Basis
- Arrange the eigenvectors in descending order of eigenvalues.
- These vectors become the new axes to project your data.
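
The sketch below (NumPy, toy data, all names illustrative) follows the three steps and then checks what "diagonalize" means in practice: expressed in the eigenvector basis, the covariance matrix has the eigenvalues on its diagonal and zeros elsewhere, so the new axes are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 500 samples, 3 correlated features.
mixing = np.array([
    [3.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.5, 0.2],
])
X = rng.normal(size=(500, 3)) @ mixing
X = X - X.mean(axis=0)

# 1. Covariance matrix of the centered data.
S = X.T @ X / (X.shape[0] - 1)

# 2. Diagonalize S (eigh is meant for symmetric matrices).
eigenvalues, eigenvectors = np.linalg.eigh(S)

# 3. Order the eigenvectors by descending eigenvalue to form the PCA basis.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
W = eigenvectors[:, order]

# Covariance expressed in the new basis: diagonal, with the eigenvalues on it.
S_new_basis = W.T @ S @ W
print(np.round(S_new_basis, 6))
```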


## 

## Q4. How does the choice of number of principal components impact the performance of PCA?
## Answer 

## 1. Information Retention vs Compression
#### More PCs:
- Capture more variance → richer representation of data
#### Fewer PCs:
- Leaner data → better generalization, but risk of losing meaningful signal
#### Trade-off:
- Retaining ~95% variance is often a sweet spot, but context matters
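
One common way to pick the number of components is to keep the smallest k whose cumulative explained variance reaches a target. The sketch below assumes NumPy, toy data, and a 95% threshold; the threshold is a convention, not a rule.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data whose variance decays quickly across features.
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
X = rng.normal(size=(1000, 6)) * scales
X = X - X.mean(axis=0)

# Eigenvalues of the covariance matrix, largest first.
eigenvalues = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Fraction of total variance captured by each component, and the running total.
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest k whose cumulative explained variance reaches the threshold.
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3), k)
```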
####
## 2. Model Accuracy
- In distance-based models like KNN, a well-chosen number of PCs can improve accuracy by focusing on the most informative directions.
- Too few PCs → underfitting (important patterns lost)
- Too many PCs → overfitting (model picks up on noise)
####
## 3. Computational Efficiency
- Fewer PCs mean faster training and inference.
- Useful in real-time systems or when working with large datasets.
#### 
## 4. Noise Reduction
- Discarding the trailing components filters out low-variance directions, which are often (though not always) noise.
- Proper selection helps clarify signal from background clutter.
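
A small sketch of the noise-reduction effect, assuming NumPy and synthetic data in which the true signal lies along a single direction: reconstructing the data from the top component alone brings it closer to the clean signal than the noisy input was.

```python
import numpy as np

rng = np.random.default_rng(4)

# Clean signal: 500 points lying along one direction in 10 dimensions.
direction = np.ones(10) / np.sqrt(10)
clean = rng.normal(scale=5.0, size=(500, 1)) * direction

# Observed data = clean signal + small isotropic noise.
noisy = clean + rng.normal(scale=0.5, size=clean.shape)

# Fit PCA on the noisy data and keep only the top component.
mean = noisy.mean(axis=0)
centered = noisy - mean
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
W = eigenvectors[:, -1:]   # eigh is ascending, so the last column is the top PC

# Project onto the top component, then map back to the original space.
denoised = centered @ W @ W.T + mean

error_noisy = np.mean((noisy - clean) ** 2)
error_denoised = np.mean((denoised - clean) ** 2)
print(error_noisy, error_denoised)
```

The improvement depends on the assumption that noise is spread across all directions while the signal is concentrated in a few; if a low-variance direction carries real signal, dropping it loses information.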

## 

## Q5. How can PCA be used in feature selection, and what are the benefits of using it for this purpose?
## Answer 

#### PCA transforms your original features into principal components (PCs) — linear combinations of the original features ordered by variance captured.

#### Dimensionality Reduction:
- PCA lets you represent your data with fewer synthetic features (PCs) that capture most of the variance.
- This helps reduce noise and redundancy from highly correlated or low-impact original features.
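#### Variance Prioritization:
- Features with large absolute weights (loadings) on the top PCs are considered more informative.
- You can inspect the component weights (`pca.components_` in scikit-learn) to decide which original features to keep. Note that `pca.explained_variance_ratio_` reports how much variance each PC captures, not which original features matter.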
#### Unsupervised Preprocessing
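- PCA doesn’t depend on target labels, so it works as an unsupervised preprocessing step. Strictly speaking it performs feature extraction rather than selection, because each PC mixes all original features.
- It gives a “clean slate” view of which directions in the data are most active.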
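
To make the idea of ranking original features by their loadings concrete, here is a sketch assuming NumPy and three made-up features (the names in `feature_names` are hypothetical). The loadings matrix has one row per original feature and one column per PC; scikit-learn's `components_` stores the same weights transposed.

```python
import numpy as np

rng = np.random.default_rng(5)
feature_names = ["height_cm", "arm_span_cm", "sensor_noise"]  # hypothetical names

# Two strongly correlated, high-variance features plus one low-variance feature.
base = rng.normal(scale=10.0, size=500)
X = np.column_stack([
    base + rng.normal(scale=1.0, size=500),
    base + rng.normal(scale=1.0, size=500),
    rng.normal(scale=0.5, size=500),
])
X = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigenvalues)[::-1]

# Variance share per PC: says how important each PC is, not each feature.
explained_ratio = eigenvalues[order] / eigenvalues.sum()

# Loadings: rows = original features, columns = PCs (largest variance first).
loadings = eigenvectors[:, order]

# Rank original features by the size of their weight on the first PC.
pc1_importance = np.abs(loadings[:, 0])
ranking = [feature_names[i] for i in np.argsort(pc1_importance)[::-1]]

print(np.round(explained_ratio, 3))
print(ranking)
```

Ranking by the first PC alone is a simplification; when several PCs carry substantial variance, their loadings would need to be considered together.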

## 

## Q6. What are some common applications of PCA in data science and machine learning?
## Answer 

#### 1. Dimensionality Reduction for Modeling
#### 2. Data Visualization (high dimension to lower dimension)
#### 3. Noise Filtering - Dropping low-variance components often reduces noise in sensor data, text embeddings, or biological data
#### 4. Anomaly Detection - Points that are poorly reconstructed from the top PCs (large reconstruction error) stand out as potential outliers.
#### 5. Image Compression & Recognition - PCA reduces image size while preserving essential structural patterns (e.g., Eigenfaces concept).
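
As one illustration of the anomaly-detection use case, the sketch below scores each point by its reconstruction error after projecting onto the top component. It assumes NumPy and synthetic data with a single planted outlier; real pipelines would also need a rule for choosing the error threshold.

```python
import numpy as np

rng = np.random.default_rng(6)

# 200 inliers scattered closely around a line in 3D.
t = rng.normal(scale=3.0, size=(200, 1))
line_dir = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
inliers = t * line_dir + rng.normal(scale=0.1, size=(200, 3))

# One planted outlier, well away from that line (index 200).
outlier = np.array([[2.0, -2.0, 2.0]])
X = np.vstack([inliers, outlier])

# Fit PCA and keep the top component.
mean = X.mean(axis=0)
centered = X - mean
_, eigenvectors = np.linalg.eigh(np.cov(centered, rowvar=False))
W = eigenvectors[:, -1:]

# Reconstruction error: distance between each point and its projection.
reconstructed = centered @ W @ W.T
errors = np.linalg.norm(centered - reconstructed, axis=1)

most_anomalous = int(np.argmax(errors))
print(most_anomalous, errors[most_anomalous])
```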

## 

## Q7. What is the relationship between spread and variance in PCA?
## Answer 

#### "Spread" refers to how data points are distributed along a direction or axis. A greater spread means that points are more scattered, and that axis captures more variation in the dataset.
#### PCA’s Objective: Maximize Spread = Maximize Variance
#### In PCA:
- The goal is to find axes (principal components) where the data shows the maximum spread.
- PCA does not test every possible direction one by one: the variance along a unit direction 𝑤 is 𝑤ᵀ𝑆𝑤, and the eigenvectors of the covariance matrix 𝑆 give the directions with the highest variance directly.
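
For intuition only, the sketch below does a brute-force sweep over directions in 2D and confirms that the direction with the largest spread matches the top eigenvector of the covariance matrix. It assumes NumPy and toy data; the sweep is a teaching device and is not how PCA is computed.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy centered 2D data with an elongated, tilted cloud.
X = rng.normal(size=(400, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])
X = X - X.mean(axis=0)
S = np.cov(X, rowvar=False)

# Candidate unit directions, in 0.1 degree steps over half a circle.
angles = np.linspace(0.0, np.pi, 1801)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

# Spread along each direction = sample variance of the projected points.
projected_variance = (X @ directions.T).var(axis=0, ddof=1)
best_dir = directions[np.argmax(projected_variance)]

# The eigenvector route gives the same direction without any sweep.
eigenvalues, eigenvectors = np.linalg.eigh(S)
top_eigenvector = eigenvectors[:, -1]

# |cosine| between the two directions; the sign of an eigenvector is arbitrary.
alignment = abs(best_dir @ top_eigenvector)
print(alignment)
```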

## 

## Q8. How does PCA use the spread and variance of the data to identify principal components?
## Answer

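#### 1. Compute the Covariance Matrix 𝑆 of the centered data
#### 2. Eigen-Decomposition of 𝑆:
- Extract eigenvectors (directions of variance)
- Extract eigenvalues (amount of variance along those directions)
#### 3. Rank the Directions
- Sort the eigenvectors by descending eigenvalue; the direction with the largest spread becomes the first principal component, and so on.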

## 

## Q9. How does PCA handle data with high variance in some dimensions but low variance in others?
## Answer 

#### The covariance matrix 𝑆 captures how features vary together:
- High-variance features → large diagonal entries in 𝑆
- Low-variance features → small diagonal entries and little influence on the leading eigenvectors
##### So eigenvectors aligned with high-variance directions get larger eigenvalues and are prioritized as principal components.

- High-variance dimensions: preserved in early components
- Low-variance dimensions: relegated to later components or discarded
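
One practical consequence is that variance depends on units, so a feature recorded on a large scale can dominate the first component. The sketch below assumes NumPy and two made-up correlated features that differ only in scale, and compares the first PC before and after standardizing each feature to unit variance. Whether to standardize is a modeling choice that depends on the data.

```python
import numpy as np

rng = np.random.default_rng(8)

# Two correlated features carrying similar information on very different scales.
z = rng.normal(size=1000)
X = np.column_stack([
    1000.0 * (z + 0.5 * rng.normal(size=1000)),  # large-scale feature
    z + 0.5 * rng.normal(size=1000),             # small-scale feature
])
X = X - X.mean(axis=0)


def top_component(data):
    """Return the covariance eigenvector with the largest eigenvalue."""
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
    return eigenvectors[:, -1]


# Raw data: the first PC points almost entirely along the large-scale feature.
pc1_raw = top_component(X)

# Standardized data: both features get equal weight in the first PC.
X_standardized = X / X.std(axis=0, ddof=1)
pc1_standardized = top_component(X_standardized)

print(np.round(np.abs(pc1_raw), 4))
print(np.round(np.abs(pc1_standardized), 4))
```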