

---

### Q1. **What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.**

**Min-Max scaling** is a normalization technique that transforms data to a specified range, typically [0,1], by scaling each feature based on the minimum and maximum values of the feature. This helps ensure that all features contribute equally to the model, particularly when features have different units or magnitudes.

**Formula:**
\[
X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}
\]
Where \(X_{\text{min}}\) and \(X_{\text{max}}\) are the minimum and maximum values of the feature.

**Example:**
Suppose you have the following data: `[10, 20, 30, 40, 50]`. To apply Min-Max scaling to a range of [0,1]:

- \(X_{\text{min}} = 10\), \(X_{\text{max}} = 50\)
- For \(X = 10\): \(X_{\text{scaled}} = \frac{10 - 10}{50 - 10} = 0\)
- For \(X = 30\): \(X_{\text{scaled}} = \frac{30 - 10}{50 - 10} = 0.5\)
- For \(X = 50\): \(X_{\text{scaled}} = \frac{50 - 10}{50 - 10} = 1\)

Resulting scaled data: `[0, 0.25, 0.5, 0.75, 1]`.
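
The calculation above can be reproduced in a few lines of plain Python. This is a minimal sketch, not a library implementation: the function name `min_max_scale` is made up for illustration, and returning zeros for a constant feature is an assumed convention, since the formula itself is undefined when \(X_{\text{max}} = X_{\text{min}}\).

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1] with min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros is an assumed convention for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10, 20, 30, 40, 50])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```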

---

### Q2. **What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.**

The **Unit Vector** technique scales each sample's feature vector (each row of the dataset) so that its Euclidean (L2) norm is 1. The transformed points lie on the unit sphere, which is useful for models that depend on the direction of the data rather than its magnitude, such as those based on cosine similarity.

**Formula:**
\[
X_{\text{unit}} = \frac{X}{\|X\|}
\]
Where \(\|X\|\) is the L2 norm of the vector.

**Difference from Min-Max Scaling:**
- **Min-Max scaling** normalizes data to a specific range (e.g., [0, 1]), whereas **Unit Vector scaling** ensures that the magnitude (length) of the vector is 1.
- Unit Vector scaling operates on each sample (row) and preserves its direction, whereas Min-Max scaling operates on each feature (column) and fixes its range.

**Example:**
For a vector \([3, 4]\):
- L2 norm: \(\|X\| = \sqrt{3^2 + 4^2} = 5\)
- Unit Vector scaling: \([3/5, 4/5] = [0.6, 0.8]\)
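
The same example, sketched in plain Python. The helper name `to_unit_vector` is hypothetical, and raising an error for the zero vector is an assumption of this sketch, because a vector of length 0 has no direction to preserve.

```python
import math


def to_unit_vector(vector):
    """Divide a vector by its L2 norm so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        # Assumed behavior: the zero vector cannot be normalized.
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]


unit = to_unit_vector([3, 4])
print(unit)  # [0.6, 0.8]
```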

---

### Q3. **What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.**

**Principal Component Analysis (PCA)** is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space by projecting the data onto a set of orthogonal axes called principal components. The projections onto these axes are uncorrelated. The first principal component points in the direction of maximum variance, and each subsequent component captures the most remaining variance while staying orthogonal to the earlier ones.

**How it works:**
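1. Standardize the data (zero mean and unit variance for each feature).
2. Compute the covariance matrix of the standardized data.
3. Find the eigenvectors (principal components) and eigenvalues (variance captured by each component) of that matrix.
4. Sort the components by eigenvalue, keep the top \(k\), and project the data onto them.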

**Example:**
Consider a dataset with two correlated features, \(X_1\) and \(X_2\). After applying PCA, you might find that the first principal component explains most of the variance in the data. By projecting onto this component, you reduce the dimensionality from two to one while retaining most of the variance.
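
To make the two-feature case concrete, here is a rough sketch in plain Python that follows the steps listed above. The data values are invented for illustration, the function names are hypothetical, and the eigen-decomposition uses the closed form for a symmetric 2×2 matrix, so it only applies to exactly two features.

```python
import math


def standardize(column):
    """Return the column rescaled to zero mean and unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


def covariance(a, b):
    """Sample covariance of two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def pca_2d(x1, x2):
    """First principal component of two features (closed-form 2x2 case)."""
    z1, z2 = standardize(x1), standardize(x2)            # step 1
    a, b, c = covariance(z1, z1), covariance(z1, z2), covariance(z2, z2)  # step 2
    # Step 3: eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + radius, mid - radius]
    # Eigenvector belonging to the largest eigenvalue.
    if b != 0:
        vx, vy = b, eigenvalues[0] - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    length = math.hypot(vx, vy)
    pc1 = (vx / length, vy / length)
    # Step 4: project each standardized sample onto the first component.
    scores = [p * pc1[0] + q * pc1[1] for p, q in zip(z1, z2)]
    explained = eigenvalues[0] / sum(eigenvalues)
    return pc1, scores, explained


# Invented, strongly correlated toy data.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
pc1, scores, explained = pca_2d(x1, x2)
print(pc1, explained)  # explained is close to 1 for this toy data
```

Here `scores` is the one-dimensional representation of the five samples, and `explained` is the fraction of total variance carried by the first component.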

---

### Q4. **What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.**

**PCA as Feature Extraction**: PCA transforms the original feature space into a set of new features (principal components), which are linear combinations of the original features. These new features are uncorrelated and ranked by the amount of variance they explain in the data.

**Relationship**: Feature extraction using PCA is about finding the most informative combinations of features (principal components) that explain the variability in the data while reducing redundancy and noise.

**Example:**
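In a dataset with 10 correlated features, applying PCA may show that 3 principal components explain 95% of the variance. These 3 components replace the original 10 features, making the data more compact and less redundant. Because each component is a linear combination of all 10 original features, the new features can be harder to interpret than the originals.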
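
One compact way to write this example down is shown below. The notation is an assumption of this sketch and does not come from the text above: \(X_c\) is the standardized \(n \times 10\) data matrix, \(W_3\) is the \(10 \times 3\) matrix whose columns are the three eigenvectors with the largest eigenvalues, and \(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{10}\) are the eigenvalues of the covariance matrix.

\[
Z = X_c W_3, \qquad \frac{\lambda_1 + \lambda_2 + \lambda_3}{\sum_{j=1}^{10} \lambda_j} \approx 0.95
\]

Each of the three columns of \(Z\) is one extracted feature, computed as a weighted sum of the 10 original features with weights given by the corresponding column of \(W_3\).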

---

### Q5. **You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.**

To preprocess the data for the recommendation system using **Min-Max scaling**:
1. Identify the features: price, rating, and delivery time. These features may have different units and ranges.
2. Apply Min-Max scaling to each feature independently to bring them into the same range (e.g., [0,1]):
   - **Price**: Normalize the range of prices from minimum to maximum.
   - **Rating**: Normalize the range of ratings (e.g., from 1 to 5).
   - **Delivery time**: Normalize delivery time (e.g., from 10 minutes to 60 minutes).
3. This keeps any single feature from dominating distance or similarity calculations merely because of its units, which matters for scale-sensitive algorithms (e.g., k-NN, SVM).
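
The steps above might look like this in plain Python. Everything specific here is an assumption for illustration: the three restaurant records are invented, the key names (`price`, `rating`, `delivery_time`) are hypothetical, and the function is a sketch rather than a production preprocessing step.

```python
def min_max_scale_columns(rows, columns):
    """Min-max scale the named columns of a list of dict records to [0, 1]."""
    scaled = [dict(row) for row in rows]  # copy so the input is not modified
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in scaled:
            # A constant column carries no information; map it to 0.0 (assumed).
            row[col] = (row[col] - lo) / span if span else 0.0
    return scaled


# Invented example records.
restaurants = [
    {"price": 8.0, "rating": 4.5, "delivery_time": 25},
    {"price": 15.0, "rating": 3.0, "delivery_time": 60},
    {"price": 22.0, "rating": 5.0, "delivery_time": 10},
]
scaled = min_max_scale_columns(restaurants, ["price", "rating", "delivery_time"])
for row in scaled:
    print(row)
```

In practice the minimum and maximum of each feature would normally be computed on the training data and then reused to scale new records, so that every record is mapped with the same transformation.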

---

### Q6. **You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.**

To use PCA for dimensionality reduction in predicting stock prices:
1. **Standardize the features**: Since PCA is affected by the scale of the data, standardize the financial data and market trends to have mean 0 and variance 1.
2. **Compute the covariance matrix**: This captures how each pair of standardized features varies together.
3. **Find principal components**: Determine the eigenvectors (principal components) and eigenvalues (variance explained) of the covariance matrix.
4. **Select the top components**: Choose a number of principal components that explain a significant portion of the variance (e.g., 95% of the variance).
5. **Transform the data**: Project the original dataset onto the selected principal components to create a lower-dimensional dataset that retains most of the information.
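
Step 4 above is the part that most often needs a concrete rule. A minimal sketch follows, assuming the eigenvalues have already been computed; the function name `components_for_variance` and the eigenvalues themselves are invented for illustration.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of components whose cumulative explained
    variance ratio reaches the given threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Invented eigenvalues for six standardized features.
eigenvalues = [5.0, 2.0, 0.7, 0.2, 0.06, 0.04]
k = components_for_variance(eigenvalues)
print(k)  # 3
```

With these made-up values the first two components explain 87.5% of the variance and the first three explain about 96%, so three components are kept at the 95% threshold.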

---

### Q7. **For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.**

**Min-Max Scaling Formula** to transform to [-1, 1]:
\[
X_{\text{scaled}} = 2 \times \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} - 1
\]
Where \(X_{\text{min}} = 1\) and \(X_{\text{max}} = 20\).

For each value:
- \(X = 1\): \(X_{\text{scaled}} = 2 \times \frac{1 - 1}{20 - 1} - 1 = -1\)
- \(X = 5\): \(X_{\text{scaled}} = 2 \times \frac{5 - 1}{20 - 1} - 1 = \frac{8}{19} - 1 \approx -0.579\)
- \(X = 10\): \(X_{\text{scaled}} = 2 \times \frac{10 - 1}{20 - 1} - 1 = \frac{18}{19} - 1 \approx -0.053\)
- \(X = 15\): \(X_{\text{scaled}} = 2 \times \frac{15 - 1}{20 - 1} - 1 = \frac{28}{19} - 1 \approx 0.474\)
- \(X = 20\): \(X_{\text{scaled}} = 2 \times \frac{20 - 1}{20 - 1} - 1 = 1\)

Result: `[-1, -0.579, -0.053, 0.474, 1]`.
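
The arithmetic can be checked with a short script that generalizes the formula to any target range \([a, b]\). This is a sketch: the function name `min_max_scale_to_range` and its parameter names are made up, and raising an error for a constant feature is an assumed choice.

```python
def min_max_scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Min-max scale values into the range [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed behavior: a constant feature cannot be spread over a range.
        raise ValueError("all values are identical")
    width = new_max - new_min
    return [new_min + width * (v - lo) / (hi - lo) for v in values]


scaled = min_max_scale_to_range([1, 5, 10, 15, 20])
print([round(v, 3) for v in scaled])  # [-1.0, -0.579, -0.053, 0.474, 1.0]
```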

---

### Q8. **For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?**

**Steps to perform Feature Extraction using PCA:**
1. **Standardize the features**: Standardize the dataset (excluding gender, as it is categorical) to have a mean of 0 and standard deviation of 1.
2. **Compute principal components**: Apply PCA to compute the principal components based on the covariance matrix.
3. **Choose the number of components**: Use the explained variance ratio to decide how many principal components to retain. For example, if the first 2 or 3 components explain 95% of the variance, retain those components.

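**Why retain 2-3 components?**: With gender excluded, four numeric features remain, so PCA yields at most four components. Features such as height and weight are usually correlated, so 2 or 3 components often capture most of the variance. The exact number should be read from the cumulative explained variance of the actual data rather than fixed in advance.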
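
Step 1 of the procedure above, standardizing only the numeric features and leaving gender out, can be sketched as follows. The four patient records are invented, the key names are hypothetical, and the population standard deviation (dividing by \(n\)) is an assumed choice.

```python
import math


def standardize_columns(rows, numeric_columns):
    """Z-score the named numeric columns; other columns are left unchanged."""
    n = len(rows)
    result = [dict(row) for row in rows]  # copy so the input is not modified
    for col in numeric_columns:
        values = [row[col] for row in rows]
        mean = sum(values) / n
        # Population standard deviation (divide by n), an assumed choice.
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
        if std == 0:
            raise ValueError(f"column {col!r} is constant")
        for row in result:
            row[col] = (row[col] - mean) / std
    return result


# Invented example records.
patients = [
    {"height": 160, "weight": 55, "age": 25, "gender": "F", "blood_pressure": 110},
    {"height": 175, "weight": 80, "age": 40, "gender": "M", "blood_pressure": 130},
    {"height": 168, "weight": 65, "age": 33, "gender": "F", "blood_pressure": 118},
    {"height": 182, "weight": 90, "age": 52, "gender": "M", "blood_pressure": 140},
]
numeric = ["height", "weight", "age", "blood_pressure"]
standardized = standardize_columns(patients, numeric)
for row in standardized:
    print(row)
```

PCA would then be run on the four standardized columns only, which is why at most four components are available to choose from.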