Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

ans. **Min-Max Scaling** is a data preprocessing technique used to normalize the range of features or variables in a dataset. It rescales the data to a fixed range—usually **\[0, 1]**—by transforming the original values linearly.



### What is Min-Max Scaling?

The formula for Min-Max scaling is:

$$
X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$$

* $X$ = original value
* $X_{\min}$ = minimum value in the dataset (feature-wise)
* $X_{\max}$ = maximum value in the dataset (feature-wise)
* $X_{\text{scaled}}$ = scaled value, which will be between 0 and 1



### Why Use Min-Max Scaling?

* **Normalizes data:** Brings all features to the same scale, preventing features with large ranges from dominating the learning process.
* **Improves algorithm performance:** Many ML algorithms (like KNN, neural networks) perform better with normalized data.
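* **Preserves the shape of the distribution:** Because the transformation is linear, the ordering of values and their relative spacing are unchanged. (Standardization is also linear and preserves shape as well; the difference is that Min-Max maps values into a fixed, bounded range, whereas standardization centers the data at mean 0 with unit variance and leaves the range unbounded.)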



### Example:

Suppose you have a dataset with one feature (height in cm) for 5 people:

| Person | Height (cm) |
| ------ | ----------- |
| A      | 150         |
| B      | 160         |
| C      | 170         |
| D      | 180         |
| E      | 190         |

* Minimum height, $X_{\min} = 150$
* Maximum height, $X_{\max} = 190$

Apply Min-Max scaling for person C (170 cm):

$$
X_{\text{scaled}} = \frac{170 - 150}{190 - 150} = \frac{20}{40} = 0.5
$$

Similarly, scaled values for all:

| Person | Height (cm) | Calculation      | Scaled Height |
| ------ | ----------- | ---------------- | ------------- |
| A      | 150         | (150 - 150) / 40 | 0.00          |
| B      | 160         | (160 - 150) / 40 | 0.25          |
| C      | 170         | (170 - 150) / 40 | 0.50          |
| D      | 180         | (180 - 150) / 40 | 0.75          |
| E      | 190         | (190 - 150) / 40 | 1.00          |
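The same computation can be sketched in a few lines of plain Python. This is a minimal, hand-rolled illustration; the function name `min_max_scale` is hypothetical, and in practice a library scaler (for example scikit-learn's `MinMaxScaler`) is typically used instead.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # All values are identical: the formula would divide by zero,
        # so map every value to 0 instead.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


heights = [150, 160, 170, 180, 190]
scaled = min_max_scale(heights)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The guard for a constant feature is a design choice of this sketch: the textbook formula is undefined when $X_{\max} = X_{\min}$.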



Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.


ans.

## What is Unit Vector Scaling?

**Unit Vector scaling** rescales a feature vector so that its length (or magnitude) is 1. This is often done by dividing each component of the vector by its **Euclidean norm** (L2 norm).

Mathematically, for a feature vector $\mathbf{x} = [x_1, x_2, ..., x_n]$, the unit vector scaled form $\mathbf{x}_{\text{scaled}}$ is:

$$
\mathbf{x}_{\text{scaled}} = \frac{\mathbf{x}}{\|\mathbf{x}\|_2} = \frac{\mathbf{x}}{\sqrt{x_1^2 + x_2^2 + ... + x_n^2}}
$$



## How is it used in Feature Scaling?

* It scales the whole vector (all features together), not feature-wise independently.
* Common in text mining, image processing, and other cases where the direction of the data vector matters more than its magnitude.
* Used when we want each sample to have unit length, focusing on the *direction* rather than scale.



## How is Unit Vector different from Min-Max Scaling?

| Aspect    | Min-Max Scaling                                                    | Unit Vector Scaling                                        |
| --------- | ------------------------------------------------------------------ | ---------------------------------------------------------- |
| Purpose   | Rescales each feature independently to a range (e.g., 0 to 1)      | Scales the whole vector to have length 1                   |
| Operation | Feature-wise: $\frac{x - \min}{\max - \min}$                       | Vector-wise: divide by vector magnitude (L2 norm)          |
| Result    | Each feature lies in a fixed range \[0,1]                          | Vector magnitude = 1; features keep relative ratios        |
| Use cases | When features have different ranges and you want to normalize each | When direction of vector matters (e.g., cosine similarity) |



## Example:

Suppose a sample vector with three features:

$$
\mathbf{x} = [3, 4, 0]
$$

* Compute Euclidean norm:

$$
\|\mathbf{x}\|_2 = \sqrt{3^2 + 4^2 + 0^2} = \sqrt{9 + 16 + 0} = \sqrt{25} = 5
$$

* Unit Vector scaled:

$$
\mathbf{x}_{\text{scaled}} = \frac{1}{5}[3, 4, 0] = [0.6, 0.8, 0]
$$

Check length of scaled vector:

$$
\sqrt{0.6^2 + 0.8^2 + 0^2} = \sqrt{0.36 + 0.64} = \sqrt{1} = 1
$$
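The worked example can be checked with a short Python sketch. The function name `unit_vector` is hypothetical; scikit-learn's `normalize` function (default `norm="l2"`) applies the same idea to each row of a matrix.

```python
import math


def unit_vector(x):
    """Divide every component of x by its Euclidean (L2) norm."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0:
        # A zero vector has no direction, so it cannot be scaled to length 1.
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


x = [3, 4, 0]
x_scaled = unit_vector(x)
print(x_scaled)  # [0.6, 0.8, 0.0]

# Length of the scaled vector: 1, up to floating-point rounding.
print(math.sqrt(sum(v * v for v in x_scaled)))
```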



### In contrast, Min-Max scaling applied feature-wise (assuming, for illustration, min = 0 and max = 5 for every feature):

* For feature 1: $\frac{3 - 0}{5 - 0} = 0.6$
* For feature 2: $\frac{4 - 0}{5 - 0} = 0.8$
* For feature 3: $\frac{0 - 0}{5 - 0} = 0$

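The results coincide here only because every feature was assumed to have min 0 and max 5, and 5 happens to equal this vector's norm. In general the two methods differ: Min-Max scales each feature (column) independently using that feature's minimum and maximum across all samples, whereas unit vector scaling divides every component of one sample (row) by the same norm so that the whole vector has length 1.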
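To see the two methods diverge, here is a small NumPy sketch on a hypothetical 3 × 2 matrix whose features have very different ranges (the data is made up for illustration). Min-Max works down each column; unit vector scaling works across each row.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 2 features (columns), very different ranges.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [4.0, 200.0]])

# Min-Max: column by column, using each feature's own min and max across samples.
col_min, col_max = X.min(axis=0), X.max(axis=0)
minmax = (X - col_min) / (col_max - col_min)

# Unit vector: row by row, dividing each sample by its own L2 norm.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)

print(minmax)
print(unit)
print(np.linalg.norm(unit, axis=1))  # each row now has length 1
```

In the Min-Max result both columns span exactly [0, 1]. In the unit-vector result the large second feature dominates every row, which is why unit vector scaling does not equalize feature ranges: it only fixes each sample's length.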


Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

ans.
## What is PCA (Principal Component Analysis)?

**Principal Component Analysis (PCA)** is a statistical technique used to **reduce the dimensionality** of a dataset while preserving as much of the original variance (information) as possible.

* PCA transforms the original features into a new set of variables called **principal components**.
* These components are linear combinations of the original variables.
* The components are ordered by the amount of variance they capture from the data:

  * The 1st principal component captures the most variance.
  * The 2nd principal component captures the next most, and so on.
* By keeping only the top $k$ components, you reduce the number of features while retaining most of the data’s important information.



## How does PCA work?

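1. **Center the data** by subtracting each feature's mean; when features are on different scales, also **standardize** them (mean=0, variance=1 for each feature).
2. **Calculate the covariance matrix** of the features.
3. **Compute eigenvalues and eigenvectors** of the covariance matrix.
4. **Sort eigenvectors** by their eigenvalues in descending order (each eigenvalue is the variance of the data along its eigenvector).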
5. **Project the original data** onto the top $k$ eigenvectors to get a reduced representation.
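The five steps can be sketched with NumPy as below. This is a minimal illustration, not a production implementation: the function name `pca_eig` and the random demo data are hypothetical, and libraries such as scikit-learn's `PCA` typically compute the same components through an SVD rather than an explicit covariance eigendecomposition.

```python
import numpy as np


def pca_eig(X, k):
    """PCA via eigendecomposition of the covariance matrix.

    X: (n_samples, n_features) array. Returns the projected data (n_samples, k),
    the top-k eigenvectors (n_features, k) and all eigenvalues, largest first.
    """
    # 1. Standardize: zero mean and unit variance for each feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the features (rowvar=False: columns are variables).
    C = np.cov(Z, rowvar=False)
    # 3. eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. Re-order so the largest eigenvalue (most variance) comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Project the standardized data onto the top-k eigenvectors.
    W = eigvecs[:, :k]
    return Z @ W, W, eigvals


# Hypothetical demo data: 4 features, where feature 1 nearly duplicates feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=100)

scores, W, eigvals = pca_eig(X, k=2)
print(scores.shape)             # (100, 2)
print(eigvals / eigvals.sum())  # share of total variance per component
```

Because the eigenvectors are orthonormal, the projected columns come out uncorrelated, with variances equal to the kept eigenvalues.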



## Why use PCA for dimensionality reduction?

* To reduce computational cost.
* To remove noise and redundancy in data.
* To visualize high-dimensional data in 2D or 3D.
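* To potentially improve model performance by discarding low-variance directions, which often carry mostly noise (although low variance does not guarantee that a direction is irrelevant to the target).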



## Example:

Suppose you have a dataset with 2 features:

| Sample | Feature 1 (X) | Feature 2 (Y) |
| ------ | ------------- | ------------- |
| 1      | 2.5           | 2.4           |
| 2      | 0.5           | 0.7           |
| 3      | 2.2           | 2.9           |
| 4      | 1.9           | 2.2           |
| 5      | 3.1           | 3.0           |

* After standardizing the data and calculating covariance, PCA finds the directions (principal components) along which the data varies most.
* Because X and Y are strongly positively correlated, the first principal component points along the direction $Y = X$ (in standardized coordinates), capturing the main variance.
* Instead of representing data in terms of (X, Y), we can use just the first principal component (a single dimension) which explains most variance.
* This reduces the 2D data into 1D, simplifying the dataset while keeping most information.
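This reduction can be sketched on the five samples from the table. This version uses NumPy's SVD of the standardized data, which is an equivalent alternative to eigendecomposing the covariance matrix; the variable names are illustrative only.

```python
import numpy as np

# The five samples from the table: columns are Feature 1 (X) and Feature 2 (Y).
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

# Standardize each column (zero mean, unit variance).
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

# SVD of the standardized data: rows of Vt are the principal directions, and
# squared singular values are proportional to the variance along each direction.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

pc1 = Vt[0]          # direction of the first principal component
scores_1d = Z @ pc1  # each sample reduced to a single coordinate

print(explained_ratio)
print(pc1)
print(scores_1d)
```

For two standardized features with positive correlation, `pc1` comes out as $(1, 1)/\sqrt{2}$ up to sign, i.e. along $Y = X$. On this data it carries roughly 97% of the variance, so the 1D `scores_1d` keeps most of the information.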



### Visual intuition:

If points lie roughly along a line in 2D, PCA finds that line and projects all points onto it, reducing 2D data to 1D.



Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

ans.

## What is Feature Extraction?

* **Feature Extraction** is the process of transforming raw data into a set of new features (or variables) that are more informative, non-redundant, and useful for modeling.
* Instead of using the original features directly, you create new features that capture the essential information.
* It helps in reducing dimensionality, improving model performance, and handling noisy or correlated data.



## Relationship Between PCA and Feature Extraction

* **PCA is a popular feature extraction technique.**
* It **extracts new features** called **principal components** by combining the original features.
* These components capture the **most important patterns/variance** in the data.
* PCA reduces dimensionality by keeping only a few principal components, which act as new, **informative features** for further use in machine learning or analysis.



## How PCA is used for Feature Extraction?

* PCA computes linear combinations of original features to form principal components.
* Each principal component is a new feature.
* These new features are **uncorrelated** and ordered by their importance (variance explained).
* By selecting the top $k$ components, PCA extracts the most relevant features from the original set.



## Example:

Suppose you have a dataset with 3 correlated features: $X_1, X_2, X_3$.

| Sample | $X_1$ | $X_2$ | $X_3$ |
| ------ | ----- | ----- | ----- |
| 1      | 2.0   | 2.1   | 1.9   |
| 2      | 3.5   | 3.6   | 3.7   |
| 3      | 1.2   | 1.1   | 1.3   |

* These features are clearly correlated: all three rise and fall together across the samples.
* PCA analyzes the covariance among $X_1, X_2, X_3$ and creates 3 principal components $PC_1, PC_2, PC_3$.
* With only 3 samples, at most two components can have non-zero variance, and because the columns move almost in lockstep, $PC_1$ alone captures nearly all of it.
* You can therefore **extract** $PC_1$ (or $PC_1$ and $PC_2$) as new features, reducing dimensionality from 3 to 1 (or 2).
* These extracted features capture most of the meaningful variation in the data, while reducing redundancy.
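A minimal NumPy sketch of extracting principal components from the three-sample table follows. For brevity it only centers the data (no scaling), which is a simplifying assumption; standardize as well if the features are on different scales.

```python
import numpy as np

# The three samples from the table: columns X1, X2, X3.
X = np.array([[2.0, 2.1, 1.9],
              [3.5, 3.6, 3.7],
              [1.2, 1.1, 1.3]])

Xc = X - X.mean(axis=0)               # PCA works on centered data
C = np.cov(Xc, rowvar=False)          # 3 x 3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # ascending order for symmetric matrices
order = np.argsort(eigvals)[::-1]     # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()
new_features = Xc @ eigvecs[:, :2]    # extracted PC1 and PC2, one row per sample

print(np.round(explained_ratio, 4))
print(new_features)
```

On this tiny table the ratios show $PC_1$ carrying more than 99% of the variance and $PC_3$ essentially none, since three centered samples span at most two dimensions.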



Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.


ans.


### Why Use Min-Max Scaling Here?

* These features are on **different scales**:

  * Price could range from ₹50 to ₹1000
  * Rating might be from 1 to 5
  * Delivery time could be from 10 to 90 minutes
* If you feed these raw features into many machine learning algorithms, features with larger scales (like price) can **dominate** the learning process.
* Min-Max scaling rescales all features to the **same range \[0, 1]**, making them comparable and ensuring fair influence on the model.



### How to Use Min-Max Scaling:

For each feature $X$, apply:

$$
X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$$



### Step-by-Step Example:

Suppose for the **price** feature:

* $X_{\min} = 50$
* $X_{\max} = 1000$

A price of ₹200 would be scaled as:

$$
\frac{200 - 50}{1000 - 50} = \frac{150}{950} \approx 0.158
$$

Similarly, for **rating** (min=1, max=5):

* Rating 4 scaled as:

$$
\frac{4 - 1}{5 - 1} = \frac{3}{4} = 0.75
$$

For **delivery time** (min=10 min, max=90 min):

* Delivery time 30 minutes scaled as:

$$
\frac{30 - 10}{90 - 10} = \frac{20}{80} = 0.25
$$
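The three calculations can be reproduced with a short Python sketch. The feature ranges come from the example above; the names `FEATURE_RANGES` and `scale_item` are hypothetical. In a real system the minimum and maximum would be computed from the training data and then reused unchanged for new items.

```python
# Feature ranges from the example above (in practice: computed on training data).
FEATURE_RANGES = {
    "price": (50.0, 1000.0),        # rupees
    "rating": (1.0, 5.0),           # stars
    "delivery_time": (10.0, 90.0),  # minutes
}


def scale_item(item):
    """Min-Max scale each known feature of one item using the stored ranges."""
    scaled = {}
    for name, (lo, hi) in FEATURE_RANGES.items():
        scaled[name] = (item[name] - lo) / (hi - lo)
    return scaled


item = {"price": 200.0, "rating": 4.0, "delivery_time": 30.0}
print({k: round(v, 3) for k, v in scale_item(item).items()})
# {'price': 0.158, 'rating': 0.75, 'delivery_time': 0.25}
```

A value outside the stored range (for example a ₹1200 dish) would map above 1; clipping to [0, 1] or refitting the ranges are common ways to handle that.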



### Benefits:

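* Helps gradient-based algorithms converge faster and keeps distance or similarity computations between items balanced across features.
* Prevents bias toward features with larger numeric ranges (here, price).
* Puts every feature on a common, interpretable scale where 0 is the smallest and 1 the largest observed value.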



Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

ans.


### Problem Context:

* You have many features like company financials (e.g., revenue, profit, debt ratios) and market trends (e.g., indices, volumes, sentiment scores).
* These features might be **high-dimensional**, and many could be **correlated or redundant**.
* Too many features can cause:

  * Increased computational cost.
  * Overfitting.
  * Difficulty in interpreting the model.



### How to Use PCA for Dimensionality Reduction:

1. **Standardize the Data:**
   Scale features so each has zero mean and unit variance. PCA is sensitive to the scale of data.

2. **Calculate Covariance Matrix:**
   Understand how features vary with each other.

3. **Compute Eigenvalues & Eigenvectors:**
   Identify directions (principal components) where variance is maximized.

4. **Select Principal Components:**

   * Sort components by explained variance.
   * Choose top $k$ components that capture most variance (e.g., 90-95%).

5. **Transform Data:**
   Project the original dataset onto the selected principal components.
   This creates a **new feature set** with reduced dimensions but most of the important information retained.
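The five steps, including choosing $k$ from a cumulative-variance threshold, can be sketched with NumPy as follows. Everything here is illustrative: `fit_pca` and `transform` are hypothetical helper names, and the data is a synthetic stand-in (5 hidden drivers plus noise), not real financial or market data. Because stock data is time-ordered, the sketch fits on earlier rows only and reuses those statistics on later rows to avoid look-ahead leakage.

```python
import numpy as np


def fit_pca(X_train, threshold=0.95):
    """Fit PCA on training rows; keep the fewest components whose
    cumulative explained variance reaches `threshold`."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0, ddof=1)
    Z = (X_train - mu) / sigma                    # step 1: standardize
    C = np.cov(Z, rowvar=False)                   # step 2: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)          # step 3: eigendecomposition
    order = np.argsort(eigvals)[::-1]             # step 4: largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where cumulative >= threshold, converted to a count.
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    return {"mu": mu, "sigma": sigma, "W": eigvecs[:, :k], "cumulative": cumulative}


def transform(model, X):
    """Step 5: project rows onto the kept components using training statistics."""
    return ((X - model["mu"]) / model["sigma"]) @ model["W"]


# Synthetic stand-in for 50 correlated features (not real market data).
rng = np.random.default_rng(0)
drivers = rng.normal(size=(600, 5))
X = drivers @ rng.normal(size=(5, 50)) + 0.3 * rng.normal(size=(600, 50))

# Time-ordered split: fit on the earlier rows, apply to the later rows.
X_train, X_later = X[:500], X[500:]
model = fit_pca(X_train)
k = model["W"].shape[1]
print(k, model["cumulative"][k - 1])
print(transform(model, X_later).shape)
```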



### Benefits of PCA in Stock Price Prediction:

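* **Reduces Noise:** Discards low-variance components, which often carry mostly noise or redundant information.
* **Improves Speed:** Fewer features mean faster training and prediction.
* **Mitigates Multicollinearity:** PCA components are uncorrelated.
* **Simplifies Model:** Fewer inputs make the model less prone to overfitting, although each component mixes many original features and is harder to interpret than the raw features.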



### Example:

* Suppose you start with 50 financial and market features.
* PCA reveals the first 10 components capture 92% of the total variance.
* You keep these 10 principal components as your new features instead of all 50.
* These 10 features summarize the original data efficiently, making your stock price model faster and possibly more accurate.



Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.


ans.


### Min-Max Scaling formula for custom range $[a, b]$:

$$
X_{\text{scaled}} = a + \frac{(X - X_{\min})(b - a)}{X_{\max} - X_{\min}}
$$

where:

* $X$ = original value
* $X_{\min} = 1$ (minimum value in the dataset)
* $X_{\max} = 20$ (maximum value in the dataset)
* $a = -1$ (new min)
* $b = 1$ (new max)


### Step-by-step calculation for each value:

$$
X_{\text{scaled}} = -1 + \frac{(X - 1)(1 - (-1))}{20 - 1} = -1 + \frac{(X - 1) \times 2}{19}
$$

* For $X = 1$:

$$
-1 + \frac{(1 - 1) \times 2}{19} = -1 + 0 = -1
$$

* For $X = 5$:

$$
-1 + \frac{(5 - 1) \times 2}{19} = -1 + \frac{4 \times 2}{19} = -1 + \frac{8}{19} \approx -1 + 0.421 = -0.579
$$

* For $X = 10$:

$$
-1 + \frac{(10 - 1) \times 2}{19} = -1 + \frac{9 \times 2}{19} = -1 + \frac{18}{19} \approx -1 + 0.947 = -0.053
$$

* For $X = 15$:

$$
-1 + \frac{(15 - 1) \times 2}{19} = -1 + \frac{14 \times 2}{19} = -1 + \frac{28}{19} \approx -1 + 1.474 = 0.474
$$

* For $X = 20$:

$$
-1 + \frac{(20 - 1) \times 2}{19} = -1 + \frac{19 \times 2}{19} = -1 + 2 = 1
$$


### Final scaled values in $[-1,1]$ range:

$$
[-1.0, -0.579, -0.053, 0.474, 1.0]
$$
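The same arithmetic in Python, as a quick check of the hand calculation. The function name `min_max_to_range` is hypothetical; scikit-learn's `MinMaxScaler` offers the same behavior through its `feature_range` parameter.

```python
def min_max_to_range(values, a=-1.0, b=1.0):
    """Linearly rescale values to [a, b] with the custom-range Min-Max formula."""
    x_min, x_max = min(values), max(values)
    return [a + (x - x_min) * (b - a) / (x_max - x_min) for x in values]


data = [1, 5, 10, 15, 20]
scaled = min_max_to_range(data)
print([round(v, 3) for v in scaled])  # [-1.0, -0.579, -0.053, 0.474, 1.0]
```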




Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

ans.

Features: **height, weight, age, gender, blood pressure**



### Step 1: Consider the features

* **height, weight, age, blood pressure** are **numerical** features.
* **gender** is typically a **categorical** feature (e.g., male/female), which should be encoded (e.g., 0/1) before PCA.



### Step 2: Preprocess the data

* Encode **gender** numerically.
* **Standardize** all features (mean=0, variance=1), because PCA is sensitive to feature scales.



### Step 3: Apply PCA

* PCA will analyze the covariance among these 5 features and generate 5 principal components.
* Each principal component is a linear combination of the original features.



### Step 4: How many principal components to retain?

* Usually, choose the number of components that **explain a high percentage of the variance** in the data, commonly **90-95%**.
* For example:

  * If the first 2 principal components explain 92% variance, keep 2 components.
  * If you need more detailed representation, keep 3 or more.
* This choice balances **dimensionality reduction** with **information preservation**.



### Why reduce dimensions?

* To **remove redundancy** or correlated information.
* To **simplify** the model and reduce computational cost.
* To potentially **improve model performance** by reducing noise.



### Practical note:

* After fitting PCA, look at the **explained variance ratio** (for example, the `explained_variance_ratio_` attribute of scikit-learn's `PCA`).
* Plot a **scree plot** or cumulative variance plot to decide the ideal number of components.
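The whole procedure (encode gender, standardize, inspect cumulative explained variance, pick $k$) can be sketched with NumPy. The patient records below are synthetic and purely hypothetical, so the number of components they yield says nothing about real data; the 90% threshold is likewise just one choice from the 90-95% range mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical patient records (synthetic, for illustration only).
gender = rng.integers(0, 2, size=n).astype(float)         # already encoded as 0/1
height = 160 + 12 * gender + rng.normal(0, 7, size=n)     # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, size=n)  # kg
age = rng.uniform(20, 70, size=n)                         # years
blood_pressure = 90 + 0.5 * age + 0.2 * weight + rng.normal(0, 8, size=n)  # mmHg

X = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize so no feature dominates merely because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvalues of the covariance of standardized data, largest first.
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(ratio)

threshold = 0.90
k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(ratio))
print(np.round(cumulative, 3))
print("components to keep:", k)
```

Printing `cumulative` gives the same information a cumulative-variance plot would show, just as numbers.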



### So, for this dataset:

* Start by computing explained variance.
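* The exact number depends on the data; since height, weight and blood pressure tend to be correlated, **2 or 3 principal components** would plausibly retain most of the information while reducing the dimension from 5 to 2 or 3.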

