Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Answer: Min-Max scaling, often called normalization, is a data preprocessing technique that linearly rescales each numeric feature to a fixed range, usually 0 to 1. It puts all features on a common scale while preserving the shape of each feature's distribution and the relative spacing between its values.

The Min-Max scaling formula is given as:

\[ X_{\text{new}} = \frac{{X - X_{\text{min}}}}{{X_{\text{max}} - X_{\text{min}}}} \]

Where:
- \(X\) is the original value of a feature.
- \(X_{\text{min}}\) is the minimum value of that feature in the dataset.
- \(X_{\text{max}}\) is the maximum value of that feature in the dataset.
- \(X_{\text{new}}\) is the rescaled value of the feature within the range of 0 to 1.

By applying this formula, the minimum value of each feature becomes 0, the maximum becomes 1, and all other values are scaled proportionally in between. This is particularly useful when feature ranges differ widely and the algorithm is sensitive to input scale, such as models trained with gradient descent or distance-based methods like k-nearest neighbors. Because the result depends only on the minimum and maximum, a single outlier can compress the remaining values into a narrow band.
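
The same idea extends to an arbitrary target range. As a sketch, with \(a\) and \(b\) denoting the lower and upper bounds of the desired range (symbols introduced here for illustration only):

\[ X_{\text{new}} = a + \frac{(X - X_{\text{min}})(b - a)}{X_{\text{max}} - X_{\text{min}}} \]

With \(a = 0\) and \(b = 1\) this reduces to the formula given earlier; with \(a = -1\) and \(b = 1\) it maps the feature to the range of -1 to 1.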

Here's an example to illustrate its application:

Let's say we have a dataset of house prices with two features: "Area" and "Number of Bedrooms." The "Area" feature has values ranging from 500 sq. ft. to 2000 sq. ft., while the "Number of Bedrooms" feature has values ranging from 1 to 4.

Original data:
```
|   Area   | Bedrooms |
|----------|----------|
|   500    |    1     |
|  1000    |    2     |
|  1500    |    3     |
|  2000    |    4     |
```

To apply Min-Max scaling, we calculate the minimum and maximum values of each feature:

- For the "Area" feature: \(X_{\text{min}} = 500\) and \(X_{\text{max}} = 2000\).
- For the "Number of Bedrooms" feature: \(X_{\text{min}} = 1\) and \(X_{\text{max}} = 4\).

Applying the Min-Max scaling formula, we can rescale the features to the range of 0 to 1:

```
|   Area   | Bedrooms |
|----------|----------|
|   0.0    |   0.0    |
|   0.333  |   0.333  |
|   0.667  |   0.667  |
|   1.0    |   1.0    |
```

After Min-Max scaling, both features are now within the range of 0 to 1, making them more comparable and suitable for feeding into machine learning algorithms that rely on scaled inputs.
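
The table above can be reproduced with a few lines of plain Python. This is a minimal sketch: the helper name `min_max_scale` is made up for this example, and mapping a constant feature to 0.0 is one possible convention rather than a fixed rule.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no range; returning 0.0 is an assumed convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


area = [500, 1000, 1500, 2000]
bedrooms = [1, 2, 3, 4]

scaled_area = min_max_scale(area)
scaled_bedrooms = min_max_scale(bedrooms)

# Print the two scaled columns side by side, rounded to three decimals
for a, b in zip(scaled_area, scaled_bedrooms):
    print(f"{a:.3f}  {b:.3f}")
```

Each feature is scaled with its own minimum and maximum, which is why both columns end up with identical values here even though their original units differ.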

Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

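Answer: The Unit Vector technique rescales each sample (row) so that its feature vector has a norm of 1, by dividing every component by the vector's norm, typically the L2 (Euclidean) norm. It differs from Min-Max scaling in two ways: Min-Max scaling works per feature (column) and maps values into a fixed range, whereas unit vector scaling works per sample and keeps only the direction of the vector, discarding its magnitude. It is useful when direction matters more than length, as in cosine-similarity comparisons of text vectors. For example, the vector (3, 4) has an L2 norm of 5, so it becomes (0.6, 0.8).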
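
A small sketch of unit vector scaling in plain Python may help here. The helper name `to_unit_vector` is hypothetical, the L2 norm is assumed, and leaving an all-zero vector unchanged is a choice made for this example.

```python
import math


def to_unit_vector(vector):
    """Divide each component by the vector's L2 norm so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        # The zero vector has no direction; it is returned unchanged (assumed convention).
        return list(vector)
    return [x / norm for x in vector]


short = to_unit_vector([3.0, 4.0])   # L2 norm is 5
long = to_unit_vector([6.0, 8.0])    # same direction, L2 norm is 10

print(short)
print(long)
```

Both inputs point in the same direction, so they map to the same unit vector. Min-Max scaling would instead keep them apart, since it rescales each feature across samples rather than each sample across features.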

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

Answer: PCA (Principal Component Analysis) is a statistical technique used for dimensionality reduction. It transforms high-dimensional data into a lower-dimensional representation while preserving as much of the variance as possible. It identifies the principal components (orthogonal directions of maximum variance) and projects the data onto the first few of them. For example, a dataset of 64x64 grayscale images has 4,096 pixel features per image; PCA can often compress each image to a few dozen component scores while retaining most of the variance.
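
The steps behind PCA can be sketched directly with NumPy. The data below is synthetic and made up for illustration (3-D points that vary mostly along one direction), and the choice of `k = 1` is an assumption, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 points in 3-D that lie close to a single line
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))

# 1. Center each feature at zero
X_centered = X - X.mean(axis=0)

# 2. Eigen-decompose the covariance matrix (eigh is meant for symmetric matrices)
cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 3. Sort components by decreasing variance (eigh returns ascending order)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 4. Project onto the top k principal components
k = 1
X_reduced = X_centered @ eigenvectors[:, :k]

explained_ratio = eigenvalues / eigenvalues.sum()
print(X_reduced.shape)          # (200, 1)
print(explained_ratio.round(3))
```

Because the points were generated along one direction plus a little noise, the first component should account for nearly all of the variance, which is what makes dropping the other two dimensions reasonable in this toy case.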

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

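Answer: Feature extraction means deriving new features from the original ones, and PCA is one way to do it. Each principal component is a linear combination of the original features, and the components are ordered by the amount of variance they explain. Keeping the first few components and discarding the rest yields a smaller set of new, uncorrelated features that capture most of the variance in the data. For example, applying PCA to a set of face images produces components (often called eigenfaces), and each image can then be described by a handful of component scores instead of thousands of pixel values. These components are weighted pixel patterns, not literal parts such as eyes, nose, or mouth.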
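
To make the "new features" idea concrete, the sketch below uses scikit-learn's `PCA` on random, made-up data and checks that each extracted feature is just a weighted sum of the centered original features. The data and the choice of two components are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))   # hypothetical dataset: 100 samples, 5 original features

pca = PCA(n_components=2)
X_new = pca.fit_transform(X)    # 2 extracted features per sample

# Recompute the extracted features by hand: center the data, then apply
# the component weights stored in pca.components_ (one row per component).
manual = (X - pca.mean_) @ pca.components_.T

print(X_new.shape)                   # (100, 2)
print(np.allclose(X_new, manual))    # True
```

The rows of `pca.components_` are the weights, so inspecting them shows how much each original feature contributes to each extracted feature.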

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

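Answer: For each numerical feature (price, rating, and delivery time), I would compute the minimum and maximum on the training data and apply the Min-Max formula, mapping every feature to a common range (e.g., 0 to 1) while preserving the relative ordering and spacing of its values. Without this step, a feature with a large numeric range, such as delivery time in minutes, would outweigh rating (1 to 5) in any distance or similarity computation. The same minimum and maximum would then be reused to scale new data, so that no single feature dominates the recommendation process.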
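
A minimal sketch of this preprocessing with scikit-learn's `MinMaxScaler` follows. The rows, units, and variable names are invented for illustration and do not come from a real dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [price in dollars, rating on a 1-5 scale, delivery time in minutes]
train = np.array([[8.0, 4.5, 30.0],
                  [15.0, 3.8, 45.0],
                  [22.0, 4.9, 20.0],
                  [12.0, 4.1, 60.0]])
new_items = np.array([[18.0, 4.0, 35.0]])

scaler = MinMaxScaler()                      # default feature_range is (0, 1)
train_scaled = scaler.fit_transform(train)   # learns per-column min and max
new_scaled = scaler.transform(new_items)     # reuses the min and max learned above

print(train_scaled.round(3))
print(new_scaled.round(3))
```

Calling `transform` rather than `fit_transform` on new items keeps them on the same scale as the training data. A new value outside the training range would fall outside 0 to 1, which is expected behavior rather than an error.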

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

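Answer: I would first standardize the features, since financial figures and market indicators are measured in very different units and PCA is sensitive to scale. I would then fit PCA and inspect the explained variance ratio of each component. Each principal component is a linear combination of the original features rather than a subset of them, so I would keep the smallest number of components whose cumulative explained variance reaches a chosen threshold (for example, 95%) and use the resulting component scores as model inputs. This reduces the number of features while retaining most of the variance in the data.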
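
One way this could look in scikit-learn is sketched below. The data is synthetic (20 made-up features driven by 3 hidden factors), and the 95% threshold is an assumed choice rather than a rule.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical data: 250 trading days x 20 features built from 3 latent factors plus noise
factors = rng.normal(size=(250, 3))
loadings = rng.normal(size=(3, 20))
X = factors @ loadings + 0.1 * rng.normal(size=(250, 20))

# Standardize, then keep enough components to explain 95% of the variance.
# A float n_components in (0, 1) is interpreted as a variance threshold.
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.95, svd_solver="full"))
X_reduced = reducer.fit_transform(X)

pca = reducer.named_steps["pca"]
print(X.shape, "->", X_reduced.shape)
print(pca.explained_variance_ratio_.cumsum().round(3))
```

For time-ordered data such as stock prices, the pipeline would normally be fit on the training period only and then applied to later data, to avoid leaking future information into the transformation.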

Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

```python
data = [1, 5, 10, 15, 20]
min_val = min(data)
max_val = max(data)

# Scale to [0, 1] first, then stretch to width 2 and shift down by 1 to land in [-1, 1]
scaled_data = [(value - min_val) / (max_val - min_val) * 2 - 1 for value in data]

print(scaled_data)
```

```
[-1.0, -0.5789473684210527, -0.052631578947368474, 0.4736842105263157, 1.0]
```
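
As a cross-check, the same scaling can be done with scikit-learn's `MinMaxScaler` by setting its `feature_range` parameter. This is a sketch of an alternative, not a replacement for the manual calculation.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# scikit-learn expects a 2-D array: one column per feature
data = np.array([1, 5, 10, 15, 20], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data).ravel()   # flatten back to 1-D for printing

print(scaled)
```

The values should agree with the manual result up to floating-point rounding.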


Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

```python
import numpy as np
from sklearn.decomposition import PCA

# Sample dataset; columns are height, weight, age, gender, blood pressure
data = np.array([[170, 65, 30, 1, 120],
                 [165, 60, 35, 0, 130],
                 [180, 75, 40, 1, 140],
                 [160, 55, 28, 0, 110]])

# Fit PCA with all components so that every explained variance ratio is available
pca = PCA()
pca.fit(data)

# Explained variance ratio of each principal component and its running total
explained_variance_ratio = pca.explained_variance_ratio_
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)

# Retain the smallest number of components whose cumulative explained variance reaches 95%
n_components = np.sum(cumulative_variance_ratio < 0.95) + 1

print("Number of principal components to retain:", n_components)
```

```
Number of principal components to retain: 2
```

Answer: I would retain 2 principal components, because that is the smallest number whose cumulative explained variance reaches the 95% threshold used in the code. Note that the features are not standardized here, so the columns with the largest numeric spread (such as blood pressure) contribute most to the components.
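
Since PCA is sensitive to feature scale, a variant that standardizes the columns first is sketched below. It reuses the same sample data and the same 95% threshold; the variable names ending in `_std` are introduced for this example, and the resulting component count may differ from the unscaled run.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same sample data; columns are height, weight, age, gender, blood pressure
data = np.array([[170, 65, 30, 1, 120],
                 [165, 60, 35, 0, 130],
                 [180, 75, 40, 1, 140],
                 [160, 55, 28, 0, 110]], dtype=float)

# Give every column zero mean and unit variance before fitting PCA
data_std = StandardScaler().fit_transform(data)

pca_std = PCA().fit(data_std)
cumulative_std = np.cumsum(pca_std.explained_variance_ratio_)

# Index of the first component at which the running total reaches 95%, plus one
n_components_std = int(np.argmax(cumulative_std >= 0.95)) + 1

print(cumulative_std.round(3))
print("Components to retain after standardizing:", n_components_std)
```

With only four samples, any component count here is illustrative at best; on a real dataset the same procedure would be applied to far more rows, and a binary column such as gender may warrant separate handling.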
