# Feature Engineering 3

**Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.**

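Min-Max scaling is a data preprocessing technique that linearly rescales each numerical feature to a fixed range, typically 0 to 1, using that feature's minimum and maximum values. It puts features measured in different units on a comparable scale, which matters for distance-based and gradient-based algorithms. Because it relies on the minimum and maximum, it is sensitive to outliers.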

Example:

Consider a dataset with the following feature values: [2, 5, 8, 11, 14]. The minimum value is 2, and the maximum value is 14. To scale these values to the range of 0 to 1, we can use the following formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Here $x_{\max} - x_{\min} = 14 - 2 = 12$, so applying this formula to the dataset gives the scaled values [0, 0.25, 0.5, 0.75, 1].


```python
data = [2, 5, 8, 11, 14]

# Min-Max scaling to the range [0, 1]
min_val = min(data)
max_val = max(data)
scaled_data = [(x - min_val) / (max_val - min_val) for x in data]

print("scaled data", scaled_data)
```

```
scaled data [0.0, 0.25, 0.5, 0.75, 1.0]
```
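The cell above hard-codes the target range [0, 1]. As a minimal sketch, the hypothetical helper `min_max_scale` below generalizes the same formula to any target range `[new_min, new_max]` and guards against a constant feature, where `max == min` would otherwise cause a division by zero. Mapping a constant feature to `new_min` is a design choice made here, not the only option.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero, so map everything to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (x - lo) * span / (hi - lo) for x in values]


print(min_max_scale([2, 5, 8, 11, 14]))               # -> [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max_scale([1, 5, 10, 15, 20], -1.0, 1.0))   # target range [-1, 1]
print(min_max_scale([7, 7, 7]))                       # constant feature -> all new_min
```

With `new_min=-1.0` and `new_max=1.0` this is algebraically the same as the scaled-to-[-1, 1] formula used later in Q7.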


**Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.**

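The Unit Vector technique is a data preprocessing technique that scales a vector of values so that its Euclidean (L2) length is 1: every value is divided by the vector's norm. It is typically applied to each sample (row), so that only the direction of the vector matters, not its magnitude.

It differs from Min-Max scaling in two ways. Min-Max scaling works on each feature separately and maps its minimum and maximum to fixed bounds, whereas unit vector scaling divides all values in a vector by the same norm, preserving the ratios between them. As a result, Min-Max output always spans the target range, while unit vector output does not (for non-negative data it lies between 0 and 1, but the largest value is generally below 1).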

Example:

Consider a vector with the values [3, 4, 5]. To scale it to a unit vector, divide each value by the vector's Euclidean norm:

$$x_i' = \frac{x_i}{\sqrt{\sum_j x_j^2}}$$

Here $\sqrt{3^2 + 4^2 + 5^2} = \sqrt{50} \approx 7.071$, so the scaled values are approximately [0.424, 0.566, 0.707].



```python
import math

data = [3, 4, 5]

# Euclidean (L2) norm of the whole vector, computed once
norm = math.sqrt(sum(x**2 for x in data))
scaled_value = [x / norm for x in data]

print("scaled data", scaled_value)
```

```
scaled data [0.4242640687119285, 0.565685424949238, 0.7071067811865475]
```


```python
import math

data = [3, 4, 5]

def unit_vector(data):
    """Divide every element by the vector's Euclidean (L2) norm."""
    sum_squares = sum(x**2 for x in data)
    norm = math.sqrt(sum_squares)
    return [x / norm for x in data]

scaled_value = unit_vector(data)
print("scaled value", scaled_value)
```

```
scaled value [0.4242640687119285, 0.565685424949238, 0.7071067811865475]
```
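To make the difference from Min-Max scaling concrete, the sketch below applies both techniques to the same small dataset of three samples with two features each (toy values). The helper names `unit_vector_rows` and `min_max_columns` are hypothetical. Unit vector scaling normalizes each row by its own length, while Min-Max scaling rescales each column using that column's minimum and maximum; this mirrors how scikit-learn's `Normalizer` (per sample) and `MinMaxScaler` (per feature) are applied.

```python
import math


def unit_vector_rows(rows):
    """Divide each sample (row) by its own Euclidean (L2) norm."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # An all-zero row has no direction, so it is left unchanged.
        out.append([x / norm for x in row] if norm else list(row))
    return out


def min_max_columns(rows):
    """Rescale each feature (column) to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        scaled_cols.append([(x - lo) / (hi - lo) if hi != lo else 0.0 for x in col])
    return [list(r) for r in zip(*scaled_cols)]


# Three samples, two features each (illustrative values only).
rows = [[3, 4], [1, 0], [0, 2]]

print(unit_vector_rows(rows))   # every row now has length 1
print(min_max_columns(rows))    # every column now spans [0, 1]
```

Note that the first row `[3, 4]` keeps its 3:4 ratio under unit vector scaling (`[0.6, 0.8]`), whereas Min-Max maps it to `[1.0, 1.0]` because both values happen to be their column maxima.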


**Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.**

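PCA (Principal Component Analysis) is a dimensionality reduction technique that transforms a dataset with many features into a smaller set of uncorrelated features called principal components. Each principal component is a linear combination of the original (centered) features, and the components are ordered by how much of the data's variance they explain, so keeping only the first few retains as much variance as possible for that number of dimensions.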

Example:

Consider a dataset with 10 features. PCA can reduce it to 2 principal components. If the original features are strongly correlated, these 2 components may capture most of the variance in the original dataset; the explained variance ratio of each component shows exactly how much is retained.
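The steps behind this example can be sketched with NumPy: center the data, compute the covariance matrix, take its eigenvectors sorted by eigenvalue, and project onto the top ones. The helper `pca` and the synthetic 10-feature data (built from 2 hidden factors so that 2 components should suffice) are assumptions for illustration; in practice, scikit-learn's `sklearn.decomposition.PCA` class (with its `n_components` parameter and fitted `explained_variance_ratio_` attribute) is the usual choice.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)                # PCA works on zero-mean features
    cov = np.cov(X_centered, rowvar=False)         # n_features x n_features covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric matrix: eigenvalues ascending
    order = np.argsort(eigvals)[::-1]              # sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]         # each column is one principal direction
    explained_ratio = eigvals / eigvals.sum()
    return X_centered @ components, explained_ratio[:n_components]


# Synthetic 10-feature data driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = factors @ mixing + 0.05 * rng.normal(size=(200, 10))

Z, ratio = pca(X, n_components=2)
print(Z.shape)                 # (200, 2)
print(ratio, ratio.sum())      # share of variance kept by each of the 2 components
```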

**Q4: What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.**

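PCA is a feature extraction technique. Unlike feature selection, which keeps a subset of the original features, feature extraction builds new features from combinations of the existing ones. PCA extracts new features that are uncorrelated and capture most of the variance in the data.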

Example:

Consider a dataset with 10 features. PCA can be used to extract 2 new features that are uncorrelated and capture most of the variance in the data. These 2 new features can be used as input to a machine learning model instead of the original 10 features.
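A minimal sketch of PCA as feature extraction, assuming synthetic data and a hypothetical train/test split: the principal directions are learned from the training data (here via NumPy's SVD of the centered matrix, an equivalent route to the eigenvectors of the covariance matrix), and the same mean and directions are reused to extract the 2 new features from unseen data. The check at the end shows that the extracted features are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 10-feature data; train and test share the same correlation structure.
mixing = rng.normal(size=(10, 10))
X_train = rng.normal(size=(150, 10)) @ mixing
X_test = rng.normal(size=(50, 10)) @ mixing

# Learn the principal directions from the training data only.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
W = Vt[:2].T   # 10 x 2: top-2 principal directions (rows of Vt are sorted by singular value)

# The extracted features: each is a weighted sum of the original 10 features.
F_train = (X_train - mean) @ W
F_test = (X_test - mean) @ W      # same mean and directions reused on unseen data

print(F_train.shape, F_test.shape)    # (150, 2) (50, 2)
print(np.cov(F_train, rowvar=False))  # off-diagonal entries are ~0: the features are uncorrelated
```

`F_train` and `F_test` are what would be passed to a downstream model in place of the original 10 columns.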

**Q5: You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.**

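To preprocess the data for a food delivery recommendation system, apply Min-Max scaling to price, rating, and delivery time so that each lies in the range 0 to 1. These features are on very different scales (a currency amount, a rating on a small fixed scale, and a time in minutes), so without scaling the feature with the largest numeric range would dominate any distance- or similarity-based computation. Compute each feature's minimum and maximum on the training data only, then reuse those values to scale new data, since unseen items may fall outside the training range.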
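A minimal sketch of this fit-then-transform workflow in plain Python. The helpers `fit_min_max` and `transform_min_max` are hypothetical, and the restaurant rows (price, rating, delivery time in minutes) are made-up illustrative values. Clipping out-of-range values to [0, 1] is a design choice; leaving them slightly outside the range is also reasonable.

```python
def fit_min_max(rows):
    """Record each column's min and max from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def transform_min_max(rows, mins, maxs):
    """Scale rows with stored statistics; clip so unseen extremes stay in [0, 1]."""
    out = []
    for row in rows:
        scaled = []
        for x, lo, hi in zip(row, mins, maxs):
            v = (x - lo) / (hi - lo) if hi != lo else 0.0
            scaled.append(min(max(v, 0.0), 1.0))
        out.append(scaled)
    return out


# Hypothetical training rows: [price, rating, delivery time in minutes].
train = [
    [8.0, 4.5, 25],
    [15.0, 3.8, 40],
    [22.0, 4.9, 55],
    [12.0, 4.1, 30],
]
mins, maxs = fit_min_max(train)

new_orders = [[18.0, 4.0, 35], [30.0, 4.2, 20]]  # second row is outside the training range
scaled = transform_min_max(new_orders, mins, maxs)
print(scaled)
```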

**Q6: You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.**

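To reduce the dimensionality of the stock price dataset, first standardize the features (financial figures and market indicators are on very different scales), then fit PCA and keep the smallest number of principal components that capture most of the variance, for example 95%. Because the data is a time series, fit both the scaler and PCA on the training period only and apply them unchanged to later periods, so that no information from the future leaks into training. Fewer, uncorrelated inputs can reduce training time and the risk of overfitting, although PCA keeps the directions of highest variance, which are not guaranteed to be the most predictive of price.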
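A minimal NumPy sketch of this procedure, assuming a synthetic, time-ordered matrix of 30 correlated indicators as a stand-in for real financial data. It splits chronologically, standardizes with training-period statistics, runs PCA via SVD on the training period, picks the smallest `k` reaching 95% cumulative explained variance, and projects both periods with the same components. The 400/100 split and the 0.95 threshold are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for a (days x features) matrix ordered by time:
# 30 correlated features driven by 5 hidden factors plus noise.
n_days, n_features = 500, 30
X = rng.normal(size=(n_days, 5)) @ rng.normal(size=(5, n_features))
X += 0.1 * rng.normal(size=(n_days, n_features))

# Chronological split: earlier days for fitting, later days for evaluation.
split = 400
X_train, X_test = X[:split], X[split:]

# Standardize with training-period statistics only.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mu) / sigma
Z_test = (X_test - mu) / sigma

# PCA on the training period via SVD of the standardized (zero-mean) matrix.
_, S, Vt = np.linalg.svd(Z_train, full_matrices=False)
explained = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained)
k = min(int(np.searchsorted(cumulative, 0.95)) + 1, len(S))  # smallest k reaching 95%

P_train = Z_train @ Vt[:k].T
P_test = Z_test @ Vt[:k].T
print(k, P_train.shape, P_test.shape)
```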

**Q7: For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.**

To scale the values to a range of -1 to 1, we can use the following formula:

$$x' = 2 \cdot \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1$$

Here $x_{\min} = 1$ and $x_{\max} = 20$, so $x_{\max} - x_{\min} = 19$. Applying this formula to the dataset gives approximately [-1, -0.579, -0.053, 0.474, 1].



```python
data = [1, 5, 10, 15, 20]

# Min-Max scaling to the range [-1, 1]
min_val = min(data)
max_val = max(data)
scaled_value = [2 * ((x - min_val) / (max_val - min_val)) - 1 for x in data]

print("scaled data", scaled_value)
```

```
scaled data [-1.0, -0.5789473684210527, -0.052631578947368474, 0.4736842105263157, 1.0]
```
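The mapping onto any target range $[a, b]$ is $x' = a + (x - x_{\min})(b - a)/(x_{\max} - x_{\min})$; with $a = -1$ and $b = 1$ it reduces to the formula above. Because it is linear, it can also be inverted, which is useful when a model outputs values in scaled units. The helpers `to_range` and `from_range` below are hypothetical names for a minimal sketch of that round trip.

```python
data = [1, 5, 10, 15, 20]
min_val, max_val = min(data), max(data)


def to_range(x, lo, hi, a=-1.0, b=1.0):
    """Min-Max map x from [lo, hi] onto [a, b]."""
    return a + (x - lo) * (b - a) / (hi - lo)


def from_range(y, lo, hi, a=-1.0, b=1.0):
    """Invert to_range: map y from [a, b] back onto [lo, hi]."""
    return lo + (y - a) * (hi - lo) / (b - a)


scaled = [to_range(x, min_val, max_val) for x in data]
restored = [from_range(y, min_val, max_val) for y in scaled]
print(scaled)
print(restored)   # equal to the original data up to floating-point rounding
```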


**Q8: For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?**

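To perform feature extraction with PCA on this dataset, first encode the categorical `gender` feature numerically (for example as 0/1), since PCA only works on numbers. Then standardize all features so that differences in units (for example, centimetres versus years) do not dominate. Next, compute the covariance matrix of the standardized features: its eigenvectors are the principal directions, and each eigenvalue is the variance explained along that direction.

The number of principal components to retain depends on the desired level of dimensionality reduction and on the amount of variance explained by each component. A common rule is to keep the smallest number of components whose cumulative explained variance reaches a chosen threshold, such as 95%. With five features there are at most five components, and the exact number cannot be fixed without the data: if height and weight are strongly correlated, and age and blood pressure are as well, fewer components will reach the threshold.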
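A minimal NumPy sketch of this procedure, assuming entirely synthetic records generated by made-up rules (height and weight linked, blood pressure rising with age, gender as a 0/1 indicator). It standardizes, takes the eigenvalues of the covariance matrix, and chooses the smallest `k` whose cumulative explained variance reaches 95%. Because the data is invented, the resulting `k` says nothing about real patient data; it only shows how the choice is made.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300

# Synthetic, illustrative records only; gender is encoded as 0/1 because PCA needs numbers.
gender = rng.integers(0, 2, size=n)                       # 0/1 indicator
height = 162 + 13 * gender + rng.normal(0, 7, n)          # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)       # kg, tied to height
age = rng.uniform(20, 80, n)                              # years
blood_pressure = 95 + 0.5 * age + rng.normal(0, 10, n)    # rises with age

X = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize so that no feature dominates just because of its units.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvalues of the covariance matrix of standardized data, largest first.
eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
cumulative = np.cumsum(eigvals) / eigvals.sum()

threshold = 0.95
k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
print(np.round(cumulative, 3))
print("components to retain:", k)
```

Lowering `threshold` trades retained variance for fewer components.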
