# Quiz: Dimensionality Reduction Assessment
---

## Q1. What is the key difference between PCA and LDA? 
1. PCA maximizes class separability, while LDA maximizes variance
2. PCA is unsupervised, while LDA is supervised
3. PCA works only for binary classification, while LDA works for multi-class classification 
4. PCA is a non-linear method, while LDA is linear

The correct answer is:

**2. PCA is unsupervised, while LDA is supervised**

**Explanation:**

* **PCA (Principal Component Analysis)** is an **unsupervised** dimensionality reduction technique that finds directions (principal components) that maximize the variance in the data, without using class labels.

* **LDA (Linear Discriminant Analysis)** is a **supervised** technique that finds the linear combinations of features that best separate the classes by maximizing the ratio of between-class variance to within-class variance.

Other options:

* Option 1 is wrong: it reverses the roles. PCA maximizes variance; LDA maximizes class separability.
* Option 3 is wrong: PCA is not a classification method at all, and LDA handles multi-class as well as binary problems.
* Option 4 is wrong: both PCA and LDA are linear methods.
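
The contrast between the two objectives can be sketched numerically. The example below is a minimal illustration on synthetic two-class data (the data, the seed, and all variable names are assumptions made for this sketch): PCA picks the direction of largest overall spread, while the two-class Fisher discriminant, which uses the labels, picks the direction that separates the class means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (an assumption for illustration): the classes differ along x,
# but most of the overall spread lies along y.
X = np.vstack([
    rng.normal(loc=[-1.0, 0.0], scale=[0.3, 5.0], size=(n, 2)),
    rng.normal(loc=[+1.0, 0.0], scale=[0.3, 5.0], size=(n, 2)),
])
y = np.repeat([0, 1], n)

# PCA: top eigenvector of the covariance matrix. The labels y are never used.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))  # ascending eigenvalues
pca_dir = eigvecs[:, -1]

# Two-class LDA (Fisher discriminant): w is proportional to Sw^{-1} (m1 - m0).
X0, X1 = X[y == 0], X[y == 1]
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
lda_dir = np.linalg.solve(Sw, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 3))
print("LDA direction:", np.round(lda_dir, 3))
```

On this data the PCA direction lies almost entirely along y (high variance, no class information), while the LDA direction lies almost entirely along x (low variance, but it separates the classes).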



## Q2. What happens to the separability of classes in LDA if the within-class scatter matrix is singular or nearly singular? 
1. The separability is maximized 
2. The LDA solution is not unique, leading to poor class separability 
3. The separability is unaffected 
4. The classes become linearly inseparable

The correct answer is:

**2. The LDA solution is not unique, leading to poor class separability**

**Explanation:**

* LDA involves calculating the **within-class scatter matrix (Sw)** and the **between-class scatter matrix (Sb)**.
* To find the optimal projection, LDA solves the generalized eigenvalue problem $S_b w = \lambda S_w w$, which is usually done by forming $S_w^{-1} S_b$ and therefore requires the within-class scatter matrix to be invertible.
* If **Sw is singular or nearly singular** (i.e., not invertible or ill-conditioned), the solution becomes unstable or **not unique**.
* This instability means the projection directions are not well-defined, which can degrade the ability to separate classes effectively.

Other options:

* Option 1 is incorrect: a singular Sw does not maximize separability.
* Option 3 is incorrect: separability is affected by a singular Sw.
* Option 4 is incorrect: the classes might still be linearly separable, but LDA cannot find the solution reliably.
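
A common way to end up with a singular Sw is to have fewer samples than features. The sketch below illustrates this on synthetic data; the helper `within_class_scatter`, the sizes, and the ridge value are all assumptions chosen for illustration, and adding a small ridge term is only one of several possible remedies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, n_features = 5, 20            # 10 samples in total, 20 features

X0 = rng.normal(0.0, 1.0, size=(n_per_class, n_features))
X1 = rng.normal(0.5, 1.0, size=(n_per_class, n_features))

def within_class_scatter(*groups):
    """Sw = sum over classes of sum_i (x_i - m_k)(x_i - m_k)^T."""
    d = groups[0].shape[1]
    Sw = np.zeros((d, d))
    for g in groups:
        centered = g - g.mean(axis=0)
        Sw += centered.T @ centered
    return Sw

Sw = within_class_scatter(X0, X1)
# Rank is at most (n_samples - n_classes) = 8, so the 20x20 matrix is singular.
rank_sw = np.linalg.matrix_rank(Sw, tol=1e-8)

# Hypothetical remedy: add a small ridge term so that Sw becomes invertible.
ridge = 1e-3
Sw_reg = Sw + ridge * np.eye(n_features)
rank_reg = np.linalg.matrix_rank(Sw_reg, tol=1e-8)

print("rank(Sw) =", rank_sw, "of", n_features)
print("rank(Sw + ridge * I) =", rank_reg)
```

Because Sw has a null space here, many different projection vectors give the same within-class scatter of zero, which is the non-uniqueness the explanation refers to.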


## Q3. Consider a dataset with 10,000 features. After applying PCA, how would you determine the optimal number of principal components to retain? 
1. Retain components that account for at least 90% of the total variance 
2. Retain the first 100 Principal components 
3. Retain components based on their eigenvalues being greater than one 
4. Retain components until the explained variance begins to decrease

The best answer is:

**1. Retain components that account for at least 90% of the total variance**

**Explanation:**

* When applying PCA, a common strategy to choose the number of principal components is to select enough components to explain a large percentage (commonly 90% or 95%) of the total variance in the data.
* This balances dimensionality reduction with retaining most of the important information.

Other options:

2. Arbitrary fixed number (100) may not suit every dataset and may retain too many or too few components.
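3. The "eigenvalue greater than one" rule is the **Kaiser criterion**, which comes from **Factor Analysis**. It presumes standardized features (a correlation matrix) and is less standard for PCA with very high dimensions.
4. There is no point at which the explained variance "begins to decrease". Components are ordered by decreasing variance, so the variance explained per component is non-increasing from the first component onward, while the cumulative explained variance never decreases as components are added.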
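
The variance-threshold rule can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `components_for_variance` and the synthetic data (5 latent factors mixed into 50 features) are assumptions, and 50 features stand in for the 10,000 of the question to keep the example fast.

```python
import numpy as np

def components_for_variance(X, threshold=0.90):
    """Return (k, cumulative): smallest k whose leading components reach `threshold`."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variances along the principal components (already sorted descending).
    s = np.linalg.svd(Xc, compute_uv=False)
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Synthetic data: 5 latent factors mixed into 50 observed features, plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

k, cumulative = components_for_variance(X, threshold=0.90)
print("components kept:", k)
print("cumulative explained variance at k:", round(float(cumulative[k - 1]), 4))
```

The cumulative curve is non-decreasing by construction, which is why a threshold on it always yields a well-defined cut-off.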

## Q4. Which of the following statements best describes the relationship between the Curse of Dimensionality and the intrinsic dimensionality of data? 
1. The Curse of Dimensionality only affects data with low intrinsic dimensionality 
2. The Curse of Dimensionality arises when the number of features exceeds the intrinsic dimensionality of the data 
3. The intrinsic dimensionality of data increases as the number of features increases 
4. The Curse of Dimensionality does not depend on intrinsic dimensionality

The correct answer is:

**2. The Curse of Dimensionality arises when the number of features exceeds the intrinsic dimensionality of the data**

**Explanation:**

* **Intrinsic dimensionality** refers to the minimum number of variables needed to represent the data without significant information loss.
* The **Curse of Dimensionality** occurs when the **ambient dimensionality** (actual number of features) is much higher than the intrinsic dimensionality, causing problems like data sparsity, overfitting, and difficulty in distance-based measures.
* When the number of features exceeds the true intrinsic dimensionality, the data becomes sparse in the high-dimensional space, which negatively impacts many algorithms.

Other options:

* Option 1 is incorrect: the curse appears whenever the ambient dimensionality exceeds the intrinsic dimensionality, not only for data with low intrinsic dimensionality.
* Option 3 is incorrect: intrinsic dimensionality is a property of the data's structure and does not grow just because irrelevant features are added.
* Option 4 is incorrect: the Curse of Dimensionality is closely tied to intrinsic dimensionality.
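
The gap between ambient and intrinsic dimensionality can be made concrete for the simplest case, a linear embedding. In the sketch below (synthetic data; the sizes and the relative threshold `1e-8` are assumptions), 50 observed features are generated from only 3 underlying variables, and the covariance spectrum reveals that only 3 directions carry variance. Real data with curved structure would need non-linear estimators, which this sketch does not cover.

```python
import numpy as np

rng = np.random.default_rng(1)
intrinsic_dim, ambient_dim, n_samples = 3, 50, 400

latent = rng.normal(size=(n_samples, intrinsic_dim))        # true degrees of freedom
embedding = rng.normal(size=(intrinsic_dim, ambient_dim))   # hypothetical linear map
X = latent @ embedding                                      # 50 observed features

# Eigenvalues of the covariance matrix, sorted in descending order.
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

# Count the directions whose variance is not numerically zero.
n_significant = int(np.sum(eigvals > 1e-8 * eigvals[0]))

print("ambient dimensionality:", ambient_dim)
print("directions with non-negligible variance:", n_significant)
```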


## Q5. Which of the following statements is true about PCA and LDA?
1. Both PCA and LDA require labelled data for training
2. PCA maximizes variance, while LDA maximizes class separability
3. LDA is always superior to PCA for all datasets 
4. PCA is a supervised technique, while LDA is unsupervised

The correct answer is:

**2. PCA maximizes variance, while LDA maximizes class separability**

**Explanation:**

* **PCA** is an **unsupervised** technique that finds directions (principal components) maximizing the overall variance in the data, without using class labels.
* **LDA** is a **supervised** technique that aims to find a projection maximizing the separability between different classes.

Other options:

* Option 1 is incorrect: PCA does **not** require labeled data, but LDA does.
* Option 3 is incorrect: LDA is not always superior; it depends on the dataset and task.
* Option 4 is incorrect: it reverses the two. PCA is unsupervised, LDA is supervised.


## Q6. What is the primary purpose of applying dimensionality reduction before using a machine learning model? 
1. To improve the performance of distance-based algorithms 
2. To increase the number of features in the dataset
3. To remove noise from the dataset
4. To make the dataset easier to visualize

The best answer is:

**1. To improve the performance of distance-based algorithms**

**Explanation:**

* Dimensionality reduction helps **reduce the number of features**, which often improves the performance of machine learning models, especially **distance-based algorithms** like k-NN or clustering, because these algorithms suffer from the Curse of Dimensionality.
* By reducing dimensions, data becomes less sparse, distances become more meaningful, and models often perform better.

Other options:

2. Incorrect — Dimensionality reduction **decreases** the number of features, not increases.
3. Partially true — Dimensionality reduction can help reduce noise indirectly, but it's not the **primary** purpose.
4. Also true — Dimensionality reduction aids visualization (e.g., reducing to 2D or 3D), but this is more of a side benefit than the main purpose.
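
Why distance-based algorithms suffer in high dimensions can be sketched with a small experiment on uniformly random points. The function name `relative_contrast`, the number of points, and the chosen dimensions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=500):
    """(max - min) / min distance from one random query to uniformly random points."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

contrast = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}

for dim, value in contrast.items():
    print(f"dim={dim:5d}  relative contrast={value:.3f}")
```

As the dimension grows, the nearest and farthest neighbours end up at nearly the same distance, so "nearest" carries less information. Reducing the dimensionality first is one way to restore that contrast.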



## Q7. Which technique would you use to visualize the separation between different classes in a dataset? 
1. PCA 
2. LDA 
3. t-SNE 
4. Autoencoder

The best answer is:

**3. t-SNE**

**Explanation:**

* **t-SNE (t-Distributed Stochastic Neighbor Embedding)** is specifically designed for visualizing high-dimensional data by preserving local structures and revealing clusters, making it excellent for visualizing separation between different classes.
* While **PCA** and **LDA** can also be used for visualization:

  * **PCA** focuses on variance, not class separation.
  * **LDA** is supervised and tries to maximize class separability but may not capture complex structures.
* **Autoencoders** are primarily for nonlinear dimensionality reduction or feature learning, not primarily visualization.


## Q8. When applying PCA, why is it important to standardize the data? 
1. To ensure that all variables have equal variance
2. To reduce the number of principal components
3. To maximize the explained variance 
4. To increase the number of features

The correct answer is:

**1. To ensure that all variables have equal variance**

**Explanation:**

* PCA is sensitive to the scale of the features because it relies on the covariance matrix.
* If features have different units or scales, those with larger variance will dominate the principal components.
* **Standardizing** (usually mean=0, variance=1) puts all features on the same scale, so PCA treats each feature equally.

Other options:

2. Incorrect — Standardization doesn’t reduce the number of components directly.
3. Incorrect — Standardization doesn’t maximize explained variance; it balances variance across features.
4. Incorrect — Standardization does not increase the number of features.
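
The effect of scale can be sketched with two independent synthetic features measured in very different units. The feature names `height_m` and `income`, their distributions, and the helper `first_pc` are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

height_m = rng.normal(1.7, 0.1, size=n)          # metres: tiny numeric variance
income = rng.normal(50_000, 15_000, size=n)      # currency units: huge numeric variance
X = np.column_stack([height_m, income])

def first_pc(data):
    """Unit-length eigenvector for the largest eigenvalue of the covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))  # ascending order
    return eigvecs[:, -1]

pc_raw = first_pc(X)

# Standardize: zero mean and unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
pc_std = first_pc(X_std)

print("first PC, raw data:         ", np.round(pc_raw, 4))
print("first PC, standardized data:", np.round(pc_std, 4))
```

On the raw data the first component is essentially the income axis, purely because of its units. After standardization both features carry equal weight in the first component.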


## Q9. Which of the following is a disadvantage of LDA? 
1. It requires more computation compared to PCA 
2. It works only for binary classification problems 
3. It assumes that the data is normally distributed within each class 
4. It cannot be used with high-dimensional data

The correct answer is:

**3. It assumes that the data is normally distributed within each class**

**Explanation:**

* LDA assumes that the features for each class follow a **multivariate normal (Gaussian) distribution** with the same covariance matrix across classes.
* This assumption can be violated in real-world data, reducing LDA’s effectiveness.

Other options:

* Option 1 is incorrect: LDA and PCA both reduce to an eigendecomposition, so their computational costs are comparable.
* Option 2 is incorrect: LDA can handle multi-class classification, not just binary.
* Option 4 is incorrect: LDA can be used with high-dimensional data, though it may run into singular covariance matrices if the dimensionality is too high relative to the sample size.


## Q10. What is a common use of PCA in image processing? 
1. Image segmentation 
2. Image compression 
3. Edge detection 
4. Color correction

The correct answer is:

**2. Image compression**

**Explanation:**

* PCA reduces the dimensionality of image data by projecting it onto principal components that capture the most variance.
* This allows images to be stored or transmitted using fewer components while preserving important information, effectively compressing the image.

Other options:

* Option 1, image segmentation: usually done with clustering or other methods, not PCA.
* Option 3, edge detection: relies on filters like Sobel or Canny, not PCA.
* Option 4, color correction: involves color space transformations, not PCA.
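
The compression idea can be sketched on a synthetic 64x64 grayscale array that stands in for a real image. The image itself, the helper names `compress` and `relative_error`, and the chosen values of `k` are assumptions for illustration; a practical pipeline would also have to store the components and the mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "image": two smooth patterns plus a little noise.
t = np.linspace(0.0, 1.0, 64)
image = np.outer(np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)) + 0.5 * np.outer(t, t)
image += 0.01 * rng.normal(size=image.shape)

def compress(img, k):
    """Treat rows as samples, keep the top-k principal components, reconstruct."""
    mean = img.mean(axis=0)
    U, s, Vt = np.linalg.svd(img - mean, full_matrices=False)
    scores = U[:, :k] * s[:k]           # k numbers per row instead of 64
    return scores @ Vt[:k] + mean

def relative_error(img, approx):
    return np.linalg.norm(img - approx) / np.linalg.norm(img)

errors = {k: relative_error(image, compress(image, k)) for k in (1, 2, 8, 64)}

for k, err in errors.items():
    print(f"k={k:2d}  relative reconstruction error={err:.4f}")
```

Keeping `k` components stores `k` scores per row plus `k` component vectors and one mean vector, instead of all 64x64 pixel values, at the cost of the reconstruction error printed above.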


## Q11. In PCA, what do eigenvectors represent? 
1. The amount of variance explained by each principal component 
2. The directions of maximum variance in the data 
3. The correlation between variables 
4. The covariance between principal components

The correct answer is:

**2. The directions of maximum variance in the data**

**Explanation:**

* In PCA, **eigenvectors** represent the directions (principal components) along which the data varies the most.
* Each eigenvector defines a new axis in the transformed feature space.
* The corresponding **eigenvalues** indicate how much variance is explained by each eigenvector (principal component).

Other options:

* Option 1 is incorrect: that describes eigenvalues, not eigenvectors.
* Option 3 is incorrect: correlation is a relationship between variables, not what eigenvectors represent.
* Option 4 is incorrect: the covariance between principal components is zero (they are orthogonal).
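
The roles of eigenvectors and eigenvalues can be checked numerically. The sketch below uses synthetic correlated data built from a hypothetical mixing matrix `A`; all names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 3-feature data built from a hypothetical mixing matrix.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.2, 0.3]])
X = rng.normal(size=(2000, 3)) @ A.T
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]              # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the data onto the eigenvectors (the principal axes).
scores = Xc @ eigvecs
score_cov = np.cov(scores, rowvar=False)

print("eigenvalues:            ", np.round(eigvals, 4))
print("variance of each score: ", np.round(np.diag(score_cov), 4))
```

The columns of `eigvecs` are orthonormal directions, the variance of the data along each direction equals the matching eigenvalue, and the off-diagonal entries of `score_cov` are zero up to rounding.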


## Q12. What is the primary limitation of PCA? 
1. It can only be applied to categorical data
2. It assumes linear relationships between variables
3. It always increases model complexity
4. It cannot be used for data visualization

The correct answer is:

**2. It assumes linear relationships between variables**

**Explanation:**

* PCA is a **linear** dimensionality reduction technique that projects data onto linear combinations of original features.
* It cannot capture **non-linear** relationships between variables effectively.

Other options:

* Option 1 is incorrect: PCA works with continuous numeric data, not categorical data directly.
* Option 3 is incorrect: PCA generally reduces dimensionality and often decreases model complexity.
* Option 4 is incorrect: PCA is commonly used for data visualization (e.g., 2D or 3D plots).


## Q13. Which of the following scenarios is LDA particularly useful for? 
1. Reducing dimensionality in a regression problem 
2. Reducing dimensionality in a multi-class classification problem 
3. Visualizing data in 3D space 
4. Handling missing data

The correct answer is:

**2. Reducing dimensionality in a multi-class classification problem**

**Explanation:**

* LDA is a **supervised** dimensionality reduction technique designed to maximize class separability.
* It works well in multi-class classification problems by projecting data onto a lower-dimensional space while preserving class discriminability.

Other options:

* Option 1 is incorrect: LDA is not used for regression problems.
* Option 3: LDA can reduce data to 3D, but its main purpose is class separation, not just visualization.
* Option 4 is incorrect: LDA does not inherently handle missing data.


## Q14. What is the "Curse of Dimensionality"? 
1. The phenomenon where the amount of data needed to generalize accurately increases exponentially with the number of features 
2. The process of reducing the number of dimensions in a dataset 
3. The problem of having too few data points in a dataset 
4. The situation where data becomes easier to visualize with more dimensions

The correct answer is:

**1. The phenomenon where the amount of data needed to generalize accurately increases exponentially with the number of features**

**Explanation:**

* The **Curse of Dimensionality** refers to various problems that arise when dealing with high-dimensional data.
* One key aspect is that as the number of features (dimensions) increases, the volume of the feature space grows exponentially, making data points sparse.
* This sparsity means that you need exponentially more data to reliably learn patterns and generalize well.

Other options:

2. Incorrect — That describes dimensionality reduction, not the curse.
3. Incorrect — Having too few data points is related but not the definition of the curse.
4. Incorrect — More dimensions generally make visualization harder, not easier.
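
The exponential growth can be sketched with two small calculations for data spread uniformly over a unit hypercube. Both function names are hypothetical helpers introduced for this illustration.

```python
def edge_length_for_fraction(fraction, dim):
    """Side of a sub-cube of the unit hypercube that contains `fraction` of uniform data."""
    return fraction ** (1.0 / dim)

def points_for_grid(points_per_axis, dim):
    """Samples needed to keep `points_per_axis` samples along every axis."""
    return points_per_axis ** dim

for dim in (1, 2, 10, 100):
    edge = edge_length_for_fraction(0.10, dim)
    print(f"dim={dim:3d}  edge to capture 10% of the data={edge:.3f}  "
          f"samples for a 10-per-axis grid={points_for_grid(10, dim)}")
```

To capture 10% of the data, a neighbourhood needs an edge of about 0.32 in 2 dimensions and about 0.79 in 10 dimensions, so "local" neighbourhoods stop being local. Likewise, keeping 10 samples per axis needs 10 raised to the number of dimensions samples in total.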


## Q15. Which of the following best describes LDA (Linear Discriminant Analysis)? 
1. A method for clustering data into groups 
2. A dimensionality reduction technique that also performs classification 
3. A technique used solely for feature selection
4. A non-linear dimensionality reduction technique

The correct answer is:

**2. A dimensionality reduction technique that also performs classification**

**Explanation:**

* LDA is primarily a **supervised dimensionality reduction** method that projects data onto a lower-dimensional space to maximize class separability.
* It can also be used as a **classifier** by assigning new samples to classes based on the learned discriminant functions.

Other options:

* Option 1 is incorrect: clustering is unsupervised, whereas LDA is supervised.
* Option 3 is incorrect: LDA reduces dimensionality by combining features; it is not a feature selection method.
* Option 4 is incorrect: LDA is a **linear** technique, not non-linear.


## Q16. What is the main advantage of using PCA for dimensionality reduction? 
1. It always increases model accuracy 
2. It reduces the number of dimensions while retaining most of the data's variance 
3. It simplifies data without any loss of information 
4. It increases the interpretability of the dataset

The correct answer is:

**2. It reduces the number of dimensions while retaining most of the data's variance**

**Explanation:**

* PCA transforms the data into a smaller set of uncorrelated variables (principal components) that capture most of the original variance.
* This helps simplify the dataset while preserving the most important information.

Other options:

* Option 1 is incorrect: PCA doesn't always increase accuracy; it depends on the model and data.
* Option 3 is incorrect: PCA involves some loss of information because it discards the low-variance components.
* Option 4 is incorrect: PCA components are linear combinations of features and are often less interpretable than the original features.


## Q17. Which of the following is NOT a type of dimensionality reduction technique? 
1. Feature Selection 
2. Feature Extraction 
3. Feature Engineering 
4. Feature Elimination

The correct answer is:

**3. Feature Engineering**

**Explanation:**

* **Dimensionality reduction** involves reducing the number of features either by selecting a subset (**Feature Selection**, **Feature Elimination**) or by transforming features (**Feature Extraction**).
* **Feature Engineering** is a broader process of creating new features or modifying existing ones but is not specifically a dimensionality reduction technique.

Other options are types of dimensionality reduction methods:

* **Feature Selection:** Choosing a subset of original features.
* **Feature Extraction:** Creating new features from original features (e.g., PCA).
* **Feature Elimination:** Removing irrelevant or redundant features.

## Q18. In PCA, what does the first principal component represent? 
1. The component with the least variance 
2. The direction with the most variance in the dataset 
3. The direction with the least correlation with the dataset 
4. The direction of the smallest eigenvalue

The correct answer is:

**2. The direction with the most variance in the dataset**

**Explanation:**

* The **first principal component** is the linear combination of original features that captures the **maximum variance** in the data.
* It represents the direction along which the data varies the most.

Other options:

* Option 1 is incorrect: it represents the *most* variance, not the least.
* Option 3 is incorrect: the first component is defined by variance, not by low correlation with the dataset.
* Option 4 is incorrect: it corresponds to the largest eigenvalue, not the smallest.


## Q19. Which technique is a linear method for dimensionality reduction? 
1. PCA (Principal Component Analysis) 
2. t-SNE (t-Distributed Stochastic Neighbor Embedding)
3. UMAP (Uniform Manifold Approximation and Projection) 
4. Autoencoder

The correct answer is:

**1. PCA (Principal Component Analysis)**

**Explanation:**

* **PCA** is a **linear** dimensionality reduction technique that projects data onto orthogonal directions of maximum variance.
* The others (**t-SNE, UMAP, Autoencoder**) are **non-linear** methods designed to capture complex structures in data.


## Q20. What is the primary goal of dimensionality reduction? 
1. To increase the number of features in a dataset 
2. To retain the most important information in data while reducing the number of features 
3. To remove outliers from the dataset 
4. To eliminate noise from the dataset

The correct answer is:

**2. To retain the most important information in data while reducing the number of features**

**Explanation:**

* The main goal of dimensionality reduction is to simplify data by reducing its features, while preserving the key information or structure.
* This helps improve model performance, reduce computation, and sometimes aids visualization.

Other options:

* Option 1 is incorrect: dimensionality reduction decreases the number of features rather than increasing it.
* Options 3 and 4 are incorrect: noise reduction and outlier handling can be side effects, but they are not the primary goal.



## Q21. Which of the following is NOT an effect of high dimensionality? 
1. Data sparsity 
2. Increased computation 
3. Reduced model complexity 
4. Performance degradation of distance-based algorithms

The correct answer is:

**3. Reduced model complexity**

**Explanation:**

* High dimensionality usually **increases** model complexity because there are more features to consider.
* Effects of high dimensionality include:

  * **Data sparsity** — points become spread out in space.
  * **Increased computation** — more features mean more calculations.
  * **Performance degradation of distance-based algorithms** — distances become less meaningful.


## Q22. In the context of LDA, what does the term "discriminant" refer to? 
1. A feature that is discarded during dimensionality reduction 
2. A linear combination of features that separates classes 
3. A type of noise in the dataset 
4. A technique used for clustering data points

The correct answer is:

**2. A linear combination of features that separates classes**

**Explanation:**

* In LDA, a **discriminant** is a linear combination of the original features that best separates different classes.
* These discriminants form the new axes in the reduced-dimensional space to maximize class separability.

Other options:

* Option 1 is incorrect: discriminants are not discarded features.
* Option 3 is incorrect: a discriminant is not noise.
* Option 4 is incorrect: LDA is supervised, not a clustering technique.


## Q23. When performing PCA on a dataset, why might you choose to perform a whitening transformation on the principal components? 
1. To increase the dimensionality of the data 
2. To make the components uncorrelated with unit variance 
3. To reduce the noise in the data 
4. To enforce a particular order in the components

The correct answer is:

**2. To make the components uncorrelated with unit variance**

**Explanation:**

* **Whitening** rescales the principal components, which PCA already makes **uncorrelated**, so that each one also has **unit variance**.
* This standardizes the scale of components, which can be useful for certain machine learning algorithms that assume features have similar scales.

Other options:

* Option 1 is incorrect: whitening does not increase dimensionality.
* Option 3 is incorrect: whitening is not primarily for noise reduction.
* Option 4 is incorrect: the order of components is based on explained variance, not whitening.
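
Whitening can be sketched in a few lines on synthetic correlated data. The mixing matrix `A` and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[3.0, 0.0],
              [1.5, 0.5]])                      # hypothetical mixing matrix
X = rng.normal(size=(5000, 2)) @ A.T
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

scores = Xc @ eigvecs                  # PCA scores: uncorrelated, unequal variances
whitened = scores / np.sqrt(eigvals)   # divide each component by its standard deviation

cov_scores = np.cov(scores, rowvar=False)
cov_whitened = np.cov(whitened, rowvar=False)

print("covariance of PCA scores:\n", np.round(cov_scores, 4))
print("covariance after whitening:\n", np.round(cov_whitened, 4))
```

The covariance of the scores is diagonal with very different entries; after whitening it is the identity matrix up to rounding.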


## Q24. In PCA, if the data is highly correlated, what can be expected regarding the number of principal components required to capture most of the variance?
1. A large number of principal components will be required
2. Only one principal component will be sufficient
3. A small number of principal components will capture most of the variance
4. The correlation among features has no impact on the number of principal components

The correct answer is:

**3. A small number of principal components will capture most of the variance**

**Explanation:**

* When features are highly correlated, much of the variance lies along fewer directions.
* Therefore, a **small number** of principal components can capture most of the variance.
* This is why PCA is effective for reducing dimensionality in correlated datasets.

Other options:

* Option 1 is incorrect: highly correlated data usually requires fewer components, not more.
* Option 2 is incorrect: usually more than one component is needed unless the data is perfectly correlated.
* Option 4 is incorrect: correlation directly affects how many components are needed to explain the variance.
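
The link between correlation and the number of components can be sketched with an idealized case: 10 standardized features where every pair has the same correlation `rho`. The function name `top_component_share` and the equal-correlation setup are assumptions made to keep the example simple.

```python
import numpy as np

def top_component_share(rho, n_features=10):
    """Share of total variance on the first PC when all pairwise correlations equal rho."""
    corr = np.full((n_features, n_features), rho)
    np.fill_diagonal(corr, 1.0)
    eigvals = np.linalg.eigvalsh(corr)          # ascending order
    return eigvals[-1] / eigvals.sum()

shares = {rho: top_component_share(rho) for rho in (0.0, 0.5, 0.9, 0.99)}

for rho, share in shares.items():
    print(f"rho={rho:4.2f}  first component explains {share:.1%} of the variance")
```

For this matrix the largest eigenvalue is $1 + (p - 1)\rho$ with $p$ features, so with uncorrelated features the first component explains only $1/p$ of the variance, while at `rho = 0.9` it already explains 91%.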


## Q25. In LDA, how does the dimensionality of the projected space relate to the number of classes $c$ in the dataset?
1. The dimensionality of the projected space is equal to the number of classes $c$
2. The dimensionality of the projected space is $c - 1$
3. The dimensionality of the projected space is $c + 1$
4. The dimensionality of the projected space is always less than $c - 1$

The correct answer is:

**2. The dimensionality of the projected space is $c - 1$**

**Explanation:**

* In LDA, if you have $c$ classes, the maximum number of discriminant vectors (dimensions) you can get is $c - 1$ (and never more than the number of original features).
* This is because the between-class scatter matrix is built from the $c$ class means, whose deviations from the overall mean are linearly dependent, so its rank is at most $c - 1$.

Other options:

* Option 1 is incorrect: it is not equal to the number of classes, but one less.
* Option 3 is incorrect: it is not $c + 1$.
* Option 4 is incorrect: the projected space has $c - 1$ dimensions, not always fewer.
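
The limit on the number of discriminant directions can be checked numerically through the rank of the between-class scatter matrix. The sketch below uses synthetic data with 3 classes and 6 features; the class means, sizes, and the rank tolerance are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features, n_per_class = 3, 6, 50

# Hypothetical class means; each class is an isotropic Gaussian around its mean.
means = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([rng.normal(loc=m, size=(n_per_class, n_features)) for m in means])
y = np.repeat(np.arange(n_classes), n_per_class)

# Between-class scatter: Sb = sum_k n_k (m_k - m)(m_k - m)^T.
overall_mean = X.mean(axis=0)
Sb = np.zeros((n_features, n_features))
for k in range(n_classes):
    diff = (X[y == k].mean(axis=0) - overall_mean).reshape(-1, 1)
    Sb += n_per_class * (diff @ diff.T)

rank_sb = np.linalg.matrix_rank(Sb, tol=1e-8)

print("number of classes:", n_classes)
print("rank of Sb:", rank_sb)
```

Although the data has 6 features, Sb has rank 2, so only 2 directions can carry between-class separation for 3 classes.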
