**Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.**

**Ans :**
Min-Max scaling is a data preprocessing technique that rescales each numeric feature to a fixed range, typically 0 to 1. Each value x is transformed as x' = (x - min) / (max - min), where min and max are the smallest and largest values of that feature. It is helpful when features have very different scales, because it keeps features with large numeric ranges from dominating the model simply because of their units.

In [4]:
from sklearn.preprocessing import MinMaxScaler

data = [[1], [5], [10], [15], [20]]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

print("Scaled data using Min-Max scaling:")
print(scaled_data)

Scaled data using Min-Max scaling:
[[0.        ]
 [0.21052632]
 [0.47368421]
 [0.73684211]
 [1.        ]]
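
The same result can be reproduced by applying the formula directly. This is a minimal NumPy sketch, not how scikit-learn is implemented internally; the variable names are chosen here for illustration.

```python
import numpy as np

# Same values as the example above
data = np.array([1, 5, 10, 15, 20], dtype=float)

# x' = (x - min) / (max - min)
data_min = data.min()
data_max = data.max()
scaled = (data - data_min) / (data_max - data_min)

print(scaled)
```

The smallest value maps to 0, the largest to 1, and everything else falls in between in proportion to its distance from the minimum.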


---
**Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.**

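**Ans :** The Unit Vector technique rescales a vector so that its norm (magnitude) becomes 1, by dividing every component by the vector's norm (the L2 norm by default). It differs from Min-Max scaling in that it does not use a feature's minimum and maximum to map values into a fixed range; it keeps the direction of each vector and discards its magnitude. In scikit-learn, `Normalizer` applies this to each sample (row), not to each feature (column). Unit Vector scaling is useful when the direction of the data matters more than its magnitude.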


In [5]:
from sklearn.preprocessing import Normalizer

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
scaler = Normalizer()
scaled_data = scaler.fit_transform(data)

print("Scaled data using Unit Vector scaling:")
print(scaled_data)

Scaled data using Unit Vector scaling:
[[0.26726124 0.53452248 0.80178373]
 [0.45584231 0.56980288 0.68376346]
 [0.50257071 0.57436653 0.64616234]]
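
To make the row-wise behaviour explicit, the normalization can be written out by hand. This is a small NumPy sketch under the assumption that the L2 norm is used, which is the default for `Normalizer`.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# L2 norm of each row, kept as a column so it broadcasts across the row
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_rows = data / row_norms

print(unit_rows)
print(np.linalg.norm(unit_rows, axis=1))  # length of each row after scaling
```

Every row ends up with length 1, while the ratios between the values inside a row stay the same.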


---
**Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.**

**Ans :**
PCA is a dimensionality reduction technique used to reduce the number of features in a dataset while preserving as much of the variance in the data as possible. It achieves this by identifying orthogonal directions (principal components), ordered by the amount of variance they capture, and projecting the data onto the first few of them. PCA is useful for visualizing high-dimensional data and for compressing correlated, redundant features into fewer ones.

In [6]:
from sklearn.decomposition import PCA

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
pca = PCA(n_components=2)

transformed_data = pca.fit_transform(data)

print("Transformed data after PCA:")
print(transformed_data)

Transformed data after PCA:
[[-5.19615242e+00  2.56395025e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  2.56395025e-16]]
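
The steps behind this result can be sketched with plain NumPy: center the data, take a singular value decomposition, and project. This is an illustrative sketch, not scikit-learn's exact implementation, and the signs of the projected values may be flipped relative to the output above, since a principal direction is only defined up to sign.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Center each feature, since PCA works on deviations from the mean
centered = data - data.mean(axis=0)

# Rows of vt are the principal directions; s holds the singular values
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance captured by each direction
explained_ratio = s**2 / np.sum(s**2)

# Coordinates of each sample along the first two directions
projected = centered @ vt[:2].T

print(explained_ratio)
print(projected)
```

The three rows of this toy dataset lie on a single straight line, so the first direction carries all of the variance. That is why the second column of the transformed data is zero up to floating-point error.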


---
**Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.**

**Ans :**
PCA can be used for feature extraction by transforming the original features into a new set of orthogonal features (principal components) that capture the most significant variations in the data. These principal components can then be used as the new features for modeling. Feature extraction using PCA can help reduce the dimensionality of the dataset while retaining most of the information.

In [7]:
from sklearn.decomposition import PCA

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

pca = PCA(n_components=2)

transformed_data = pca.fit_transform(data)

print("Transformed data after PCA (used for feature extraction):")
print(transformed_data)

Transformed data after PCA (used for feature extraction):
[[-5.19615242e+00  2.56395025e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  2.56395025e-16]]
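
Each extracted feature is a weighted combination of the original features. The sketch below inspects those weights through the `components_` and `mean_` attributes of the fitted `PCA` object and recomputes the extracted features by hand; it assumes the default setting without whitening.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

pca = PCA(n_components=2)
extracted = pca.fit_transform(data)

# Each row of components_ holds the weights of one principal component
print(pca.components_)

# Recompute the extracted features by hand: center, then project
manual = (data - pca.mean_) @ pca.components_.T
print(np.allclose(manual, extracted))
```

The extracted columns can then be passed to a model in place of the original three features.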


---
**Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.**

**Ans :**
To use Min-Max scaling for preprocessing:
- Calculate the minimum and maximum values for each feature (e.g., price, rating, delivery time).
- For each feature, subtract its minimum from every value and divide by the difference between its maximum and minimum.
- This maps each feature to the range 0 to 1, so that price, rating, and delivery time become comparable and no feature dominates because of its units.

In [None]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

price = [10, 20, 30, 40]
rating = [3.5, 4.2, 4.8, 3.9]
delivery_time = [25, 30, 20, 35]

# One row per sample, one column per feature
features = np.column_stack([price, rating, delivery_time])

# MinMaxScaler rescales each column independently to the range 0 to 1,
# so a single fit handles all three features
scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)

print("Scaled price:", scaled[:, 0])
print("Scaled rating:", scaled[:, 1])
print("Scaled delivery time:", scaled[:, 2])
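
In a recommendation system, new items keep arriving after the scaler has been fitted. The sketch below shows one way to handle that: fit once, then reuse the stored minimum and maximum with `transform`. The new item's values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same values as the example above, one column per feature
features = np.column_stack([
    [10, 20, 30, 40],      # price
    [3.5, 4.2, 4.8, 3.9],  # rating
    [25, 30, 20, 35],      # delivery time
])

# Fit once so the per-feature minimum and maximum are stored
scaler = MinMaxScaler()
scaler.fit(features)

# Hypothetical new item: price 25, rating 4.0, delivery time 30
new_item = np.array([[25, 4.0, 30]])
new_item_scaled = scaler.transform(new_item)

print(scaler.data_min_, scaler.data_max_)
print(new_item_scaled)
```

Reusing the fitted scaler keeps new items on the same scale as the data the model was built on. Values outside the original minimum and maximum would fall outside the 0 to 1 range.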

---
**Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.**

**Ans :**
To use PCA for dimensionality reduction:
- Standardize the features to have zero mean and unit variance.
- Apply PCA to the standardized dataset to find the principal components that capture the most significant variations in the data.
- Select a reduced number of principal components that explain a high percentage of the variance in the data, effectively reducing the dimensionality while retaining most of the information.

In [None]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the stock dataset: 200 samples, 10 features
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10))

# Step 1: standardize to zero mean and unit variance
standardized = StandardScaler().fit_transform(features)

# Steps 2-3: a float n_components keeps the fewest components
# whose cumulative explained variance reaches that fraction
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(standardized)

print("Original number of features:", features.shape[1])
print("Number of components retained:", pca.n_components_)

---
**Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.**

**Ans :**
Min-Max scaling can map the values of a dataset to any specified range [a, b]. We first subtract the minimum value of the dataset and divide by the difference between the maximum and minimum values, which gives values between 0 and 1. We then multiply by the width of the target range (b - a) and add its lower bound a. For the range -1 to 1, the width is 2 and the lower bound is -1, so x' = -1 + 2 * (x - min) / (max - min).

In [9]:
import numpy as np

data = np.array([1, 5, 10, 15, 20])

min_val = np.min(data)
max_val = np.max(data)
scaled_data = -1 + (data - min_val) * (2 / (max_val - min_val))

print("Scaled data using Min-Max scaling to range of -1 to 1:")
print(scaled_data)

Scaled data using Min-Max scaling to range of -1 to 1:
[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
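
The same idea generalizes to any target range. The sketch below wraps it in a small helper (`min_max_scale` is a name invented here, not a library function) and cross-checks the result against scikit-learn's `MinMaxScaler` with its `feature_range` parameter.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_scale(values, new_min, new_max):
    """Rescale values linearly so they span [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    unit = (values - values.min()) / (values.max() - values.min())
    return new_min + unit * (new_max - new_min)

data = [1, 5, 10, 15, 20]
by_hand = min_max_scale(data, -1, 1)

# scikit-learn expects a 2-D array: one column for the single feature
scaler = MinMaxScaler(feature_range=(-1, 1))
by_sklearn = scaler.fit_transform(np.array(data, dtype=float).reshape(-1, 1)).ravel()

print(by_hand)
print(by_sklearn)
```

Both approaches should agree; the helper assumes the input contains at least two distinct values, otherwise the denominator is zero.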


---
**Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?**

**Ans :**
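PCA can be used for feature extraction by finding the principal components that capture the most significant variations in the data. Before applying it, a categorical feature such as gender needs a numeric encoding, and all features should be standardized so that differences in units do not distort the components. To determine the number of principal components to retain, we typically look at the cumulative explained variance ratio and keep the smallest number of components that explains a high percentage (e.g., 95%) of the total variance. The right number therefore depends on the data: the more correlated the features are, the fewer components are needed. The code below uses random placeholder values because the actual measurements are not given; random features are essentially uncorrelated, which is why all 5 components are needed to reach 95% there.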

In [15]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = np.random.rand(100, 5)  # Assuming 100 samples and 5 features

scaler = StandardScaler()
standardized_features = scaler.fit_transform(features)

pca = PCA()
pca.fit(standardized_features)

cumulative_variance_ratio = np.cumsum(pca.explained_variance_ratio_)
num_components = np.argmax(cumulative_variance_ratio >= 0.95) + 1

print("Number of principal components to retain:", num_components)


Number of principal components to retain: 5
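
For contrast, here is a sketch with correlated features, where fewer components should suffice. The data is synthetic and made up for illustration: five features are generated from two hidden factors plus a little noise, so roughly two components should be enough to reach 95%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption, not the real dataset):
# 5 features driven by 2 hidden factors plus a little noise
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.2, 0.0, 0.5],
                   [0.0, 0.5, 1.0, 1.0, -0.5]])
features = factors @ mixing + 0.05 * rng.normal(size=(200, 5))

standardized = StandardScaler().fit_transform(features)

pca = PCA().fit(standardized)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# First index where the cumulative ratio reaches 95%, converted to a count
num_components = int(np.argmax(cumulative >= 0.95) + 1)

print(cumulative)
print("Number of principal components to retain:", num_components)
```

With real height, weight, age, gender, and blood pressure measurements, the cumulative ratios would need to be inspected in the same way before settling on a number.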
