
## Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

**Answer:**
**Min-Max Scaling** is a feature scaling technique that linearly rescales each feature to a fixed range, typically 0 to 1, or any other chosen range. It’s used in data preprocessing to put features on a common scale while preserving the relative spacing between values within each feature.

The formula for Min-Max scaling is:
\[
X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}
\]
Where:
- \(X\) is the original value,
- \(X_{min}\) and \(X_{max}\) are the minimum and maximum values of the feature.

**Example:**
```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[1], [5], [10], [15], [20]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)
print(scaled_data)
```
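
As a quick check, the formula above can also be applied by hand with NumPy and compared with scikit-learn's output. This is a minimal sketch; the variable names (`x_min`, `x_max`, `manual_scaled`) are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1], [5], [10], [15], [20]], dtype=float)

# Apply X_scaled = (X - X_min) / (X_max - X_min) column by column
x_min = data.min(axis=0)
x_max = data.max(axis=0)
manual_scaled = (data - x_min) / (x_max - x_min)

# Compare against scikit-learn's result for the same data
sklearn_scaled = MinMaxScaler().fit_transform(data)
print(manual_scaled.ravel())
print(np.allclose(manual_scaled, sklearn_scaled))
```

The smallest value maps to 0 and the largest to 1; the values in between keep their relative spacing.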

---

## Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

**Answer:**
The **Unit Vector technique** scales each data point (row) so that its feature vector has unit norm (a length of 1). It is useful when the direction of a sample's feature vector matters more than its magnitude, for example when samples are compared by cosine similarity.

The formula is:
\[
X_{scaled} = \frac{X}{\|X\|}
\]
Where:
- \( \|X\| \) is the Euclidean norm (length) of the vector.

**Difference from Min-Max Scaling:**
- Min-Max scaling works per feature (column): it maps each feature's values into a specific range (e.g., 0 to 1).
- Unit vector scaling works per data point (row): it rescales each sample so that its vector has unit length (1).

**Example:**
```python
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([[4, 1], [2, 2], [1, 3]])
normalized_data = normalize(data, norm='l2')
print(normalized_data)
```
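
The same result can be reproduced directly from the formula by dividing each row by its Euclidean norm. The sketch below assumes the same three data points as the example; `row_norms` and `manual_unit` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([[4, 1], [2, 2], [1, 3]], dtype=float)

# Euclidean (L2) norm of each row, kept as a column for broadcasting
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
manual_unit = data / row_norms

print(manual_unit)
print(np.linalg.norm(manual_unit, axis=1))  # length of each scaled row
print(np.allclose(manual_unit, normalize(data, norm="l2")))
```

Each row is scaled independently of the others, which is the key contrast with Min-Max scaling, where each column is scaled using statistics from all rows.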

---

## Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

**Answer:**
**Principal Component Analysis (PCA)** is a statistical technique used to reduce the dimensionality of datasets by transforming the original features into a set of new, uncorrelated features called **principal components**. These components capture the most significant variance in the data.

PCA works by:
1. Standardizing the data.
2. Calculating the covariance matrix.
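3. Finding the eigenvectors of the covariance matrix (the directions of the principal components) and their eigenvalues (the variance explained along each direction).
4. Selecting the components with the largest eigenvalues, which capture most of the variance, and projecting the data onto them.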

**Example:**
```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

pca = PCA(n_components=1)
pca_data = pca.fit_transform(scaled_data)
print(pca_data)
```
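
The four steps listed above can also be sketched by hand with NumPy, which may help show what `PCA` does internally. This is an illustrative sketch rather than scikit-learn's actual implementation (which uses different numerical routines), and the sign of the projected values may be flipped relative to `PCA`'s output, because an eigenvector's direction is only defined up to sign.

```python
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: standardize (population std, ddof=0, as StandardScaler does)
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(standardized, rowvar=False)

# Step 3: eigen-decomposition; eigh suits symmetric matrices,
# and we sort so the largest eigenvalue comes first
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 4: keep the top component and project the data onto it
top_component = eigenvectors[:, :1]
projected = standardized @ top_component

explained_ratio = eigenvalues / eigenvalues.sum()
print(explained_ratio)     # share of variance along each component
print(projected.ravel())   # the data expressed in 1 dimension
```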

---

## Q4: What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

**Answer:**
**Relationship between PCA and Feature Extraction:**
PCA can be viewed as a feature extraction technique because it transforms the original features into a new set of features (principal components) that capture the most important information (variance) in the data. These new features are linear combinations of the original ones but have reduced dimensionality and are uncorrelated.

**How PCA is used for Feature Extraction:**
- PCA projects the data onto a lower-dimensional subspace by selecting a subset of the principal components.
- These components are new features that capture the maximum variance in the data.

**Example:**
```python
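import numpy as np
from sklearn.decomposition import PCA

# Placeholder data with 5 features (random, for illustration only); standardize real data first
scaled_data = np.random.default_rng(0).standard_normal((10, 5))
pca = PCA(n_components=2)  # Extract 2 principal components
pca_data = pca.fit_transform(scaled_data)
print(pca_data)  # Reduced dataset with 2 extracted features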
```
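
To make the "linear combinations" point concrete, the sketch below inspects the weights stored in `components_` and rebuilds the extracted features by hand. The data is synthetic and generated only for illustration; `manual` and `extracted` are illustrative names.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with 5 features, generated only for illustration
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 5))
scaled = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
extracted = pca.fit_transform(scaled)

# Each row of components_ holds the weights of one principal component
# over the 5 original features
print(pca.components_.shape)  # (2, 5)

# The extracted features are the centered data projected onto those weights
manual = (scaled - pca.mean_) @ pca.components_.T
print(np.allclose(manual, extracted))

# The two extracted features are uncorrelated with each other
print(np.corrcoef(extracted, rowvar=False)[0, 1])
```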

---

## Q5: You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

**Answer:**
In the recommendation system project, Min-Max scaling can be used to normalize features like price, rating, and delivery time so that they are on the same scale. This helps the model treat these features equally during training, preventing features with larger ranges (e.g., price) from dominating others (e.g., rating).

**Steps:**
1. **Apply Min-Max scaling** to normalize all features to a range of 0 to 1 (or any desired range).
2. **Fit the model** using the scaled data, and apply the same fitted scaler to any new data before making recommendations.

**Example:**
```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Preprocessing the dataset with price, rating, and delivery time features
data = np.array([[100, 4.5, 30], [200, 4.0, 25], [150, 4.2, 35]])
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)
```
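
In practice the scaler is usually fitted once and then reused on data that arrives later. The sketch below assumes a hypothetical new restaurant row (`new_row`, not from the original dataset) and shows that values outside the fitted range fall outside [0, 1] unless clipping is requested.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns: price, rating, delivery time
train = np.array([[100, 4.5, 30], [200, 4.0, 25], [150, 4.2, 35]])

scaler = MinMaxScaler()
scaler.fit(train)  # learns the per-column minimum and maximum

print(scaler.data_min_)  # per-feature minimum seen during fit
print(scaler.data_max_)  # per-feature maximum seen during fit

# Hypothetical new restaurant, scaled with the already-fitted scaler
new_row = np.array([[120, 4.8, 28]])
new_scaled = scaler.transform(new_row)
print(new_scaled)
```

Here the new rating of 4.8 exceeds the maximum of 4.5 seen during fitting, so its scaled value is greater than 1. Whether that matters depends on the downstream model; `MinMaxScaler(clip=True)` is one way to cap such values.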

---

## Q6: You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

**Answer:**
For stock price prediction, PCA can reduce the dimensionality of the dataset by keeping only the principal components that capture most of the variance. This simplifies the model and can shorten training time and reduce overfitting, although it does not guarantee better predictive accuracy.

**Steps:**
1. **Standardize the data** to ensure each feature has zero mean and unit variance.
2. **Apply PCA** to reduce the number of features by selecting the top principal components that capture the most variance.
3. **Train the model** using the reduced set of features, which typically shortens training time and can help reduce overfitting.

**Example:**
```python
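import numpy as np
from sklearn.decomposition import PCA
scaled_data = np.random.default_rng(0).standard_normal((50, 20))  # placeholder for standardized stock features
pca = PCA(n_components=5)  # Retain 5 principal components
pca_data = pca.fit_transform(scaled_data)
print(pca_data.shape)  # 5 columns remain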
```
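
The steps above can be chained in a scikit-learn pipeline so that standardization and PCA are fitted on the training rows only and then reused on held-out rows. This is a sketch under stated assumptions: the feature matrix is a random placeholder (not real financial data), and the 150/50 chronological split is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder for a feature matrix: 200 rows, 20 features
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 20))

# Chronological split: earlier rows for fitting, later rows held out
train, test = features[:150], features[150:]

reducer = make_pipeline(StandardScaler(), PCA(n_components=5))
train_reduced = reducer.fit_transform(train)  # fit on training rows only
test_reduced = reducer.transform(test)        # reuse the fitted scaler and PCA

print(train_reduced.shape, test_reduced.shape)  # (150, 5) (50, 5)
print(reducer.named_steps["pca"].explained_variance_ratio_.sum())
```

Fitting on the training rows only keeps information from the held-out period from leaking into the preprocessing step.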

---

## Q7: For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

**Answer:**
We can apply Min-Max scaling to transform the given values into the range [-1, 1]. The formula is:
\[
X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}} \times (new\_max - new\_min) + new\_min
\]
Where:
- \(new\_min = -1\)
- \(new\_max = 1\)
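
As a hand check, substituting \(X_{min} = 1\), \(X_{max} = 20\), \(new\_min = -1\), and \(new\_max = 1\) into the formula gives:
\[
X_{scaled} = \frac{X - 1}{20 - 1} \times (1 - (-1)) + (-1) = \frac{2(X - 1)}{19} - 1
\]
Applying this to each value (rounded to three decimals):
\[
\begin{aligned}
X = 1 &: \quad \tfrac{0}{19} - 1 = -1 \\
X = 5 &: \quad \tfrac{8}{19} - 1 \approx -0.579 \\
X = 10 &: \quad \tfrac{18}{19} - 1 \approx -0.053 \\
X = 15 &: \quad \tfrac{28}{19} - 1 \approx 0.474 \\
X = 20 &: \quad \tfrac{38}{19} - 1 = 1
\end{aligned}
\]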

**Example:**
```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1], [5], [10], [15], [20]])
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)
print(scaled_data)
```

---

## Q8: For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

**Answer:**
When performing PCA on features like height, weight, age, gender, and blood pressure, the number of principal components to retain depends on the explained variance ratio. Typically, we choose the number of components that capture at least 95% of the variance to ensure that the most important information in the data is retained.

**Steps:**
1. **Standardize the features** (except for gender, which is categorical and should be encoded).
2. **Apply PCA** and analyze the explained variance ratio.
3. **Retain components** that capture 95% or more of the variance.

**Example:**
```python
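import numpy as np
from sklearn.decomposition import PCA
scaled_data = np.random.default_rng(0).standard_normal((100, 5))  # placeholder for the 5 preprocessed features
pca = PCA(n_components=0.95)  # Retain components that explain 95% variance
pca_data = pca.fit_transform(scaled_data)
print(pca.n_components_, pca_data.shape)  # number of components actually kept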
```
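
The choice of component count can also be made explicitly by inspecting the cumulative explained variance. The sketch below uses randomly generated stand-in values for the five features (they are not real measurements, and the correlations built into them are assumptions made for illustration), then picks the smallest number of components whose cumulative variance exceeds 95%.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for [height, weight, age, gender, blood pressure];
# all values are randomly generated for illustration only
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 8, n)  # correlated with height
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n)                         # categorical, encoded as 0/1
blood_pressure = 0.4 * age + rng.normal(100, 10, n)    # loosely tied to age

# Standardize the numeric features, then append the encoded gender column
numeric = np.column_stack([height, weight, age, blood_pressure])
scaled = np.column_stack([StandardScaler().fit_transform(numeric), gender])

# Fit with all components, then inspect the cumulative explained variance
pca_full = PCA().fit(scaled)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
print(cumulative)

# Smallest number of components whose cumulative variance exceeds 95%
k = int(np.searchsorted(cumulative, 0.95, side="right") + 1)
print(k)
```

On real data, the printed cumulative curve is what justifies the final choice of components.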

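The exact number of components depends on the cumulative explained variance of the actual data. With only five features, retaining 2-3 components is plausible if correlated features such as height and weight share most of their variance, but this should be confirmed from the explained variance ratio rather than assumed.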

---
