# Feature Engineering-3

## Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling, also known as normalization, is a data preprocessing technique that rescales numerical features to a specific range, typically [0, 1]. Putting all features on the same scale makes them comparable and prevents features with large numeric ranges from dominating those with small ranges when training machine learning models. Min-Max scaling is defined by the following formula for each feature:


$$X_{scaled} = \frac{X - X_{min}} {X_{max} - X_{min}} $$

Where:

- $X_{scaled}$ is the scaled value of the feature.
- $X$ is the original value of the feature.
- $X_{min}$ is the minimum value of the feature in the dataset.
- $X_{max}$ is the maximum value of the feature in the dataset.

After applying Min-Max scaling, every value of the feature lies in the range [0, 1]: the minimum value maps to 0 and the maximum value maps to 1. This holds for the data used to compute $X_{min}$ and $X_{max}$; new values outside that original range will fall outside [0, 1].
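The same idea extends to a target range other than [0, 1]. A common generalization, sketched here with $a$ and $b$ as assumed symbols for the lower and upper bounds of the target range (they do not appear in the text above), is:

$$X_{scaled} = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$$

Setting $a = 0$ and $b = 1$ recovers the formula above. In either form the expression is undefined when $X_{max} = X_{min}$, that is, when the feature is constant.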

Here's an example to illustrate Min-Max scaling:

Suppose you have a dataset containing the ages of people and their corresponding incomes, and you want to scale both features using Min-Max scaling.

Original Data:

- Age (in years): [25, 40, 30, 35, 50]
- Income (in thousands): [30, 60, 40, 50, 80]

For Age:
- $X_{min}$ = 25 (minimum age)
- $X_{max}$ = 50 (maximum age)

For Income:
- $X_{min}$ = 30 (minimum income)
- $X_{max}$ = 80 (maximum income)

Now, let's scale the data:

Scaled Age:
- $X_{scaled} = \frac{X - 25}{50 - 25}$ for each value in the Age feature, which gives [0, 0.6, 0.2, 0.4, 1].

Scaled Income:
- $X_{scaled} = \frac{X - 30}{80 - 30}$ for each value in the Income feature, which gives [0, 0.6, 0.2, 0.4, 1].

The scaled values will be between 0 and 1 for both Age and Income. This scaling allows you to compare and use these features in machine learning algorithms without one feature dominating the other due to differences in their original scales.
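The Age and Income calculation above can be reproduced in a few lines. This is a minimal sketch that assumes NumPy is available; the variable names (`data`, `col_min`, `col_max`, `scaled`) are illustrative choices, not part of the original example.

```python
import numpy as np

# Age (years) and Income (thousands) from the example, one row per person.
data = np.array([
    [25, 30],
    [40, 60],
    [30, 40],
    [35, 50],
    [50, 80],
], dtype=float)

# Column-wise minimum and maximum, i.e. X_min and X_max for each feature.
col_min = data.min(axis=0)  # Age: 25, Income: 30
col_max = data.max(axis=0)  # Age: 50, Income: 80

# Apply (X - X_min) / (X_max - X_min) to every column at once via broadcasting.
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

Both columns scale to the same values here because each income happens to sit at the same relative position in its range as the corresponding age.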

## Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

The Unit Vector technique, also known as Unit Vector scaling or Vector normalization, is a feature scaling method that scales numerical features to have a unit norm. It differs from Min-Max scaling in that it doesn't scale the data to a specific range like [0, 1] but instead scales the data such that the resulting vector has a Euclidean norm (L2 norm) of 1. The L2 norm of a vector is the square root of the sum of the squares of its components.

The formula for scaling a feature using Unit Vector scaling is as follows:

$$X_{scaled} = \frac{X}{\|X\|_{2}}$$

Where:

- $X_{scaled}$ is the scaled feature vector.
- $X$ is the original feature vector (one data point).
- $\|X\|_{2}$ is the L2 norm of the feature vector $X$.

Unit Vector scaling is often used in machine learning when the direction of the data vectors is more important than their actual magnitude. It is useful in cases where the magnitude of the features is not crucial, such as in some clustering or dimensionality reduction algorithms.

Here's an example to illustrate Unit Vector scaling:

Suppose you have a dataset with two features, representing the length and width of different objects. You want to scale these features using Unit Vector scaling.

Original Data:

- Length (in centimeters): [5, 8, 3, 10, 6]
- Width (in centimeters): [2, 4, 1, 5, 3]

To scale the data using Unit Vector scaling:

1. Calculate the L2 norm of each data point $X = (Length, Width)$: $\|X\|_{2} = \sqrt{Length^{2} + Width^{2}}$

2. Scale each feature of that data point by dividing by the data point's L2 norm:

Scaled Length: $\frac{Length}{\|X\|_{2}}$

Scaled Width: $\frac{Width}{\|X\|_{2}}$

For example, for the first data point:

- Length: 5
- Width: 2

Calculate the L2 norm: $\|X\|_{2} = \sqrt{5^{2} + 2^{2}} = \sqrt{29} \approx 5.385$

Scaled Length: $\frac{5}{\sqrt{29}} \approx 0.928$

Scaled Width: $\frac{2}{\sqrt{29}} \approx 0.371$

Each scaled data point has a Euclidean norm of 1, so it lies on the unit circle in two-dimensional space. This scaling technique emphasizes the direction of the data vectors and preserves the ratio between the features within each data point. Unlike Min-Max scaling, it does not map each feature onto a fixed range such as [0, 1].
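The length and width example can be checked numerically. The following is a minimal sketch assuming NumPy; the names `data`, `row_norms`, and `unit_vectors` are illustrative.

```python
import numpy as np

# Length and Width (cm) from the example, one row per object.
data = np.array([
    [5, 2],
    [8, 4],
    [3, 1],
    [10, 5],
    [6, 3],
], dtype=float)

# L2 norm of each row (each data point), kept as a column so it broadcasts.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm so that each row becomes a unit vector.
unit_vectors = data / row_norms

print(unit_vectors[0])                       # first object: 5/sqrt(29), 2/sqrt(29)
print(np.linalg.norm(unit_vectors, axis=1))  # each norm equals 1 up to rounding
```

The objects (8, 4), (10, 5), and (6, 3) all have a 2:1 length-to-width ratio, so they map to the same unit vector: only the direction survives the scaling, not the size.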


## Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used in machine learning and statistics. Its goal is to transform high-dimensional data into a new coordinate system, capturing the most important information while minimizing information loss. PCA achieves this by identifying the principal components, which are the directions in the data that have the maximum variance.

The steps involved in PCA are as follows:

1. **Standardize the Data**: Ensure that the data is centered (subtract the mean) and standardized (divide by the standard deviation) to give all features equal importance.

2. **Compute the Covariance Matrix**: Calculate the covariance matrix for the standardized data.

3. **Compute Eigenvectors and Eigenvalues**: Find the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the principal components, and the eigenvalues indicate the amount of variance captured by each principal component.

4. **Sort and Select Principal Components**: Sort the eigenvectors by their corresponding eigenvalues in descending order. Choose the top $ k $ eigenvectors to form the new feature subspace (where $k$ is the desired dimensionality of the reduced data).

5. **Project the Data**: Multiply the original standardized data by the selected eigenvectors to obtain the new reduced-dimensional data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Generate sample data
np.random.seed(42)
data = np.random.rand(5, 3)  # 5 samples, 3 features

# Step 1: Standardize the data (zero mean, unit variance per feature)
mean = np.mean(data, axis=0)
std_dev = np.std(data, axis=0)
standardized_data = (data - mean) / std_dev

# Steps 2-4 are written out with NumPy to show what PCA does internally
# Step 2: Compute the Covariance Matrix
cov_matrix = np.cov(standardized_data, rowvar=False)

# Step 3: Compute Eigenvectors and Eigenvalues
# (eigh is intended for symmetric matrices such as a covariance matrix)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Step 4: Sort and Select Principal Components
sorted_indices = np.argsort(eigenvalues)[::-1]
top_k_indices = sorted_indices[:2]  # Select the top 2 principal components

# Step 5: Project the Data
# scikit-learn's PCA repeats steps 2-4 internally; its result matches
# standardized_data @ eigenvectors[:, top_k_indices] up to the sign of each column
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(standardized_data)

# Print the standardized and reduced data
print("Standardized Data:\n", standardized_data)
print("\nReduced Data:\n", reduced_data)
```

```
Standardized Data:
 [[-0.51154143  1.31491696  0.64425338]
 [ 0.30841528 -0.73584035 -1.17636352]
 [-1.66932533  1.09676143  0.23057173]
 [ 0.70871634 -1.08533586  1.39625714]
 [ 1.16373513 -0.59050218 -1.09471874]]

Reduced Data:
 [[ 1.42951997  0.13831095]
 [-1.09204359 -0.83910405]
 [ 1.92069255 -0.42795863]
 [-0.71671805  1.755521  ]
 [-1.54145089 -0.62676928]]
```
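For comparison, the five steps listed above can also be carried out end to end with NumPy alone, so that the projection uses the selected eigenvectors directly. This is a sketch under stated assumptions: the random sample data, the seed, and the names `components`, `reduced`, and `explained_ratio` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sample data: 5 samples, 3 features.
rng = np.random.default_rng(0)
data = rng.random((5, 3))

# Step 1: standardize each feature to zero mean and unit variance.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Step 2: covariance matrix of the features (features are in columns).
cov_matrix = np.cov(standardized, rowvar=False)

# Step 3: eigendecomposition; eigh suits symmetric matrices and returns
# real eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Step 4: sort by eigenvalue, largest first, and keep the top k eigenvectors.
order = np.argsort(eigenvalues)[::-1]
k = 2
components = eigenvectors[:, order[:k]]  # shape (3, k)

# Step 5: project the standardized data onto the selected components.
reduced = standardized @ components      # shape (5, k)

# Share of the total variance captured by each component, largest first.
explained_ratio = eigenvalues[order] / eigenvalues.sum()

print(reduced.shape)
print(explained_ratio)
```

Eigenvectors are only defined up to sign, so individual columns of `reduced` may be flipped relative to what a library such as scikit-learn returns; the variance captured by each column is unaffected.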


## Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

Principal Component Analysis (PCA) and Feature Extraction are closely related concepts used in data analysis and machine learning.

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. It is a dimensionality reduction technique that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. The main goal of PCA is to reduce the dimensionality of a dataset while preserving the most important patterns or relationships between the variables, without using any knowledge of the target variable.

Feature Extraction, on the other hand, is a dimensionality reduction process in which an initial set of raw data is reduced to more manageable groups for processing. The term covers methods that select and/or combine variables into features, reducing the amount of data that must be processed while still describing the original dataset accurately.

PCA is therefore one type of feature extraction technique. It reduces the dimensionality of a dataset by finding a new set of variables, smaller than the original set, that retains most of the sample's information and can be used for regression and classification.

For example, consider a dataset with a large number of features. Applying PCA produces a smaller set of new variables that retains most of the sample's information. These new features, called principal components, are ordered by decreasing importance: Principal Component 1 (PC1) captures the most variance in the original dataset, followed by PC2, then PC3, and so on. In this way, PCA reduces the dimensionality of the dataset while preserving the most important patterns or relationships between the variables.
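To make the idea concrete, the sketch below builds three strongly correlated features and extracts principal components from them. Everything in it is an assumption made for illustration: the synthetic data, the noise level, the seed, and the variable names. It centers the data without standardizing it and uses NumPy's SVD rather than any particular PCA library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples of 3 features. The second and third features
# are noisy linear functions of the first, so all three are strongly correlated.
base = rng.normal(size=200)
features = np.column_stack([
    base,
    2.0 * base + rng.normal(scale=0.1, size=200),
    -base + rng.normal(scale=0.1, size=200),
])

# Center each feature; PCA directions are defined for mean-centered data.
centered = features - features.mean(axis=0)

# SVD of the centered data: the rows of vt are the principal directions,
# ordered from largest to smallest singular value.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Extracted features: coordinates of each sample along the principal directions.
extracted = centered @ vt.T

# Fraction of the total variance carried by each extracted feature.
explained_ratio = singular_values**2 / np.sum(singular_values**2)

print(np.round(explained_ratio, 4))
print(np.round(np.cov(extracted, rowvar=False), 6))
```

In this synthetic case nearly all of the variance lands in the first extracted feature, and the covariance matrix of the extracted features is diagonal up to rounding, which is the "correlated in, uncorrelated out" behaviour described above. Keeping only the first column of `extracted` would be the feature-extraction step.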

## Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

## Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

## Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

## Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?