## Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

*Min-Max Scaling*:
Min-Max scaling is a normalization technique used to scale the features of a dataset to a fixed range, typically [0, 1] or [-1, 1]. This is done by subtracting the minimum value of each feature and then dividing by the range (max - min) of the feature.

*Formula*:
\[ X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \]

For scaling to a range [a, b]:
\[ X' = a + \frac{(X - X_{\min})(b - a)}{X_{\max} - X_{\min}} \]

*Example*:
Suppose we have a feature with values [2, 4, 6, 8, 10] and we want to scale it to the range [0, 1].

1. Minimum value \(X_{\min} = 2\)
2. Maximum value \(X_{\max} = 10\)

Scaled values:
\[ X' = \frac{X - 2}{10 - 2} = \frac{X - 2}{8} \]

So,
- 2 -> 0
- 4 -> 0.25
- 6 -> 0.5
- 8 -> 0.75
- 10 -> 1
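
The worked example above can be checked with a few lines of plain Python. This is a minimal sketch: the function name `min_max_scale` is illustrative and not taken from any library, and the error raised for a constant feature is an assumption about how to handle a zero range.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature has zero range, so the formula is undefined.
        raise ValueError("cannot scale a constant feature (max equals min)")
    return [(x - lo) / (hi - lo) for x in values]


scaled = min_max_scale([2, 4, 6, 8, 10])
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```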

## Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

*Unit Vector Technique*:
The Unit Vector technique (also called normalization to unit norm) rescales a feature vector so that its Euclidean length (L2 norm) is 1. Each data point (row) is scaled on its own, independently of the other data points in the dataset.

*Formula*:
\[ X' = \frac{X}{\|X\|_2} \]
where \(\|X\|_2\) is the L2 norm of the vector \(X\).

*Difference from Min-Max Scaling*:
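- Min-Max scaling works per feature (column): it uses that feature's minimum and maximum across all samples to map its values into a fixed range such as [0, 1].
- Unit Vector scaling works per sample (row): it divides the sample's feature vector by its own norm, so the direction is preserved and the length becomes 1.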

*Example*:
Consider a feature vector \(X = [1, 2, 2]\).

1. Calculate the L2 norm:
\[ \|X\|_2 = \sqrt{1^2 + 2^2 + 2^2} = \sqrt{9} = 3 \]

2. Normalize each component:
\[ X' = \left[\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right] \]
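
The same calculation can be sketched in plain Python. The helper name `to_unit_vector` is hypothetical, and rejecting the zero vector is an assumption, since a vector of length 0 cannot be rescaled to length 1.

```python
import math


def to_unit_vector(vector):
    """Divide each component by the vector's L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        # Assumption: the zero vector has no direction, so we refuse to normalize it.
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]


unit = to_unit_vector([1, 2, 2])
length = math.sqrt(sum(x * x for x in unit))
print(unit)    # approximately [0.333, 0.667, 0.667]
print(length)  # approximately 1.0
```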

## Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

*Principal Component Analysis (PCA)*:
PCA is a statistical technique used to emphasize variation and bring out strong patterns in a dataset. It transforms the data into a set of linearly uncorrelated variables called principal components. These components are ordered so that the first few retain most of the variation present in the original dataset.

*Steps*:
1. Standardize the data.
2. Compute the covariance matrix.
3. Calculate the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvalues in descending order and select the k eigenvectors with the largest eigenvalues.
5. Transform the original data.

*Example*:
Suppose we have a 2D dataset with points \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\). After centering the data, PCA might find that most of the variance lies along a single line through the origin. Projecting the data onto this line (the first principal component) reduces the dataset from two dimensions to one while retaining most of the variance.
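
The five steps can be sketched with NumPy on a small synthetic 2D dataset. The data here is made up for illustration (`y` is roughly `2 * x` plus a little noise), and the variable names are assumptions, not part of the original example.

```python
import numpy as np

# Hypothetical 2D data: y is roughly 2 * x, so most variance lies along one direction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)
X = np.column_stack([x, y])

# 1. Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Compute the covariance matrix (columns are features).
cov = np.cov(X_std, rowvar=False)

# 3. Eigen decomposition; eigh is meant for symmetric matrices
#    and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort in descending order and keep the top k eigenvectors.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 1

# 5. Project the standardized data onto the selected eigenvectors.
X_reduced = X_std @ eigvecs[:, :k]

explained = eigvals / eigvals.sum()
print(X_reduced.shape)  # (200, 1)
print(explained)        # share of variance per component, largest first
```

Because the two features are almost perfectly correlated, the first component should carry nearly all of the variance in this toy dataset.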

## Q4: What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

*Relationship*:
PCA can be used for feature extraction by transforming the original features into a new set of features (principal components) that are linear combinations of the original features. These new features capture the most variance in the data with the fewest components, thus simplifying the dataset.

*Using PCA for Feature Extraction*:
1. Compute the principal components.
2. Select the k principal components with the largest eigenvalues.
3. Project the data onto these components and use the resulting values as new features for your model.

*Example*:
Suppose we have a dataset with 100 features. PCA is applied and it turns out that 95% of the variance can be captured with 10 principal components. These 10 components can now be used as new features, reducing dimensionality from 100 to 10 while preserving most of the information.
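
In matrix form, and using notation introduced here only for illustration (\(X_c\) for the centered \(n \times 100\) data matrix and \(W_k\) for the \(100 \times 10\) matrix whose columns are the top 10 eigenvectors), the extracted features can be written as:

\[ Z = X_c W_k, \qquad z_{ij} = \sum_{m=1}^{100} (X_c)_{im} \, (W_k)_{mj} \]

Each column of \(Z\) is one new feature, and the sum makes explicit that it is a linear combination of all 100 original features, weighted by the entries of the corresponding eigenvector. \(Z\) has shape \(n \times 10\).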

## Q5: You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

To preprocess the data using Min-Max scaling:
1. Identify the range (minimum and maximum values) for each feature (price, rating, delivery time).
2. Apply Min-Max scaling to each feature so that all of them share a common scale, typically [0, 1] or [-1, 1].

*Example*:
- Suppose the price range is \$5 to \$50.
- The rating range is 1 to 5.
- The delivery time range is 10 to 60 minutes.

For price:
\[ \text{Scaled Price} = \frac{\text{Price} - 5}{50 - 5} \]

For rating:
\[ \text{Scaled Rating} = \frac{\text{Rating} - 1}{5 - 1} \]

For delivery time:
\[ \text{Scaled Delivery Time} = \frac{\text{Delivery Time} - 10}{60 - 10} \]
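
A minimal sketch of these three formulas applied to one record follows. The ranges are the ones from the example above; the record itself, the dictionary keys, and the function name `scale_record` are hypothetical. In a real project the minimum and maximum would be computed from the dataset rather than hard-coded.

```python
# (min, max) per feature, taken from the example ranges above.
FEATURE_RANGES = {
    "price": (5.0, 50.0),
    "rating": (1.0, 5.0),
    "delivery_time": (10.0, 60.0),
}


def scale_record(record):
    """Min-Max scale each feature of one record to [0, 1]."""
    scaled = {}
    for name, (lo, hi) in FEATURE_RANGES.items():
        scaled[name] = (record[name] - lo) / (hi - lo)
    return scaled


# Hypothetical restaurant record, for illustration only.
restaurant = {"price": 27.5, "rating": 4.0, "delivery_time": 35.0}
scaled = scale_record(restaurant)
print(scaled)  # {'price': 0.5, 'rating': 0.75, 'delivery_time': 0.5}
```

After scaling, a \$45 spread in price and a 4-point spread in rating both map to the same [0, 1] interval, so neither feature dominates a distance or similarity computation simply because of its units.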

## Q6: You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

To reduce the dimensionality of the dataset using PCA:
1. Standardize the dataset to have zero mean and unit variance.
2. Compute the covariance matrix of the features.
3. Calculate the eigenvalues and eigenvectors of the covariance matrix.
4. Sort the eigenvalues in descending order and select the top k eigenvectors corresponding to the largest eigenvalues.
5. Project the original data onto the selected eigenvectors to obtain the reduced feature set.

*Example*:
If the original dataset has 100 features, and after applying PCA, it is determined that 10 principal components capture 95% of the variance, you can reduce the dataset to these 10 components.
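
One way to pick k automatically is to keep the smallest number of components whose cumulative explained variance reaches a threshold. The sketch below follows the five steps with NumPy. The function name `pca_reduce`, the 0.95 default, and the synthetic data (20 features driven by 3 hidden factors, standing in for real financial features) are all assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, variance_threshold=0.95):
    """Standardize X, then keep the fewest components that reach the threshold."""
    # Step 1: zero mean and unit variance per feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Steps 2-3: covariance matrix and its eigen decomposition.
    eigvals, eigvecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
    # Step 4: sort by eigenvalue, largest first, and choose k.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cumulative, variance_threshold)) + 1, len(eigvals))
    # Step 5: project onto the top k eigenvectors.
    return X_std @ eigvecs[:, :k], k, cumulative


# Synthetic stand-in data: 500 samples, 20 correlated features.
rng = np.random.default_rng(42)
factors = rng.normal(size=(500, 3))
mixing = rng.uniform(0.5, 1.5, size=(3, 20))
X = factors @ mixing + 0.05 * rng.normal(size=(500, 20))

X_reduced, k, cumulative = pca_reduce(X)
print(X.shape, "->", X_reduced.shape)
print("components kept:", k)
```

Since only three hidden factors generate the data, a handful of components is enough to pass the threshold here; on real market data the number would have to be read off the cumulative variance curve.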

## Q7: For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

To scale the values [1, 5, 10, 15, 20] to a range of -1 to 1:
1. Minimum value \(X_{\min} = 1\)
2. Maximum value \(X_{\max} = 20\)

\[ X' = -1 + \frac{(X - 1)(1 - (-1))}{20 - 1} \]

\[ X' = -1 + \frac{(X - 1) \cdot 2}{19} \]

So,
- 1 -> -1
- 5 -> \(-1 + \frac{(5 - 1) \cdot 2}{19} = -1 + \frac{8}{19} \approx -0.58\)
- 10 -> \(-1 + \frac{(10 - 1) \cdot 2}{19} = -1 + \frac{18}{19} \approx -0.05\)
- 15 -> \(-1 + \frac{(15 - 1) \cdot 2}{19} = -1 + \frac{28}{19} \approx 0.47\)
- 20 -> 1
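
These values can be verified with a short Python sketch of the general-range formula. The function name `min_max_scale_to_range` is illustrative, not a library function.

```python
def min_max_scale_to_range(values, a, b):
    """Scale values to [a, b] using a + (x - min) * (b - a) / (max - min)."""
    lo, hi = min(values), max(values)
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


scaled = min_max_scale_to_range([1, 5, 10, 15, 20], -1, 1)
print([round(v, 2) for v in scaled])  # [-1.0, -0.58, -0.05, 0.47, 1.0]
```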

## Q8: For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

To determine how many principal components to retain:
1. Encode categorical features such as gender numerically, then standardize the dataset.
2. Apply PCA to compute the principal components.
3. Analyze the explained variance ratio of each principal component.

*Selection Criteria*:
- Typically, choose enough components to explain a high percentage of variance, often 95% or more.

*Example*:
Suppose the explained variance ratios are:
- PC1: 40%
- PC2: 30%
- PC3: 15%
- PC4: 10%
- PC5: 5%

Cumulative explained variance for the first three components is 40% + 30% + 15% = 85%. If 85% is considered sufficient, you might choose to retain the first three components. However, if you require 95% of the variance to be explained, you would retain the first four components (85% + 10% = 95%).
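
The selection rule can be sketched in plain Python using the illustrative ratios above. The function name `components_needed` and the small floating-point tolerance are assumptions of this sketch.

```python
from itertools import accumulate


def components_needed(explained_ratios, threshold):
    """Return the smallest k whose cumulative explained variance reaches threshold."""
    for k, total in enumerate(accumulate(explained_ratios), start=1):
        # Small tolerance so that sums like 0.4 + 0.3 + 0.15 compare cleanly to 0.85.
        if total >= threshold - 1e-9:
            return k
    return len(explained_ratios)


ratios = [0.40, 0.30, 0.15, 0.10, 0.05]
print(components_needed(ratios, 0.85))  # 3
print(components_needed(ratios, 0.95))  # 4
```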