Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

<u>Answer</u> -

Min-Max scaling, also known as normalization, is a technique used in data preprocessing to rescale numerical features to a range between 0 and 1. This is done by subtracting the minimum value of the feature and then dividing by the range of the feature, which is the difference between the maximum and minimum values.

The formula for Min-Max scaling is:

X_scaled = (X - X_min) / (X_max - X_min)

where X is a feature value, X_min is the minimum value of that feature, X_max is the maximum value of that feature, and X_scaled is the rescaled value of that feature.

Min-Max scaling is used to normalize features that have different scales so that they can be more easily compared and interpreted. It is particularly useful in machine learning algorithms that use distance calculations, such as k-nearest neighbors and clustering algorithms, where features with larger scales can have a disproportionate impact on the results.

Here is an example to illustrate how Min-Max scaling is used:

Suppose we have a dataset of customer orders that includes two numerical features: order total and number of items. The order total ranges from $50 to $500, and the number of items ranges from 1 to 10. We want to use these features to predict whether a customer will return for another order.

Before using these features in a machine learning algorithm, we apply Min-Max scaling to normalize them:

Order total: X_scaled = (X - $50) / ($500 - $50)

Number of items: X_scaled = (X - 1) / (10 - 1)

For example, if a customer's order total is $200 and they ordered 5 items, the normalized values would be:

Order total: X_scaled = ($200 - $50) / ($500 - $50) = 150 / 450 ≈ 0.3333

Number of items: X_scaled = (5 - 1) / (10 - 1) = 4 / 9 ≈ 0.4444

Now the two features have been rescaled to a range between 0 and 1, and they can be more easily compared and interpreted.
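
The worked example can be checked with a few lines of NumPy. This is a minimal sketch: the helper name `min_max_scale` is hypothetical, and the minimum and maximum are passed in explicitly because the example states the feature ranges rather than listing the data.

```python
import numpy as np

def min_max_scale(values, feature_min, feature_max):
    """Rescale values to [0, 1] given the feature's known minimum and maximum."""
    values = np.asarray(values, dtype=float)
    return (values - feature_min) / (feature_max - feature_min)

# Ranges from the example: order total $50 to $500, number of items 1 to 10.
order_total_scaled = min_max_scale(200, 50, 500)
num_items_scaled = min_max_scale(5, 1, 10)

print(round(float(order_total_scaled), 4))  # 0.3333
print(round(float(num_items_scaled), 4))    # 0.4444
```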


Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

<u>Answer</u> -

Unit Vector scaling, sometimes called vector normalization or L2 normalization, is another technique used in feature scaling to rescale numerical features. Unlike Min-Max scaling, which works on one feature (column) at a time and maps its values to a range between 0 and 1, Unit Vector scaling works on the feature vector of each sample and rescales it to have a unit norm.

A unit norm means that the vector length of the feature values is equal to 1. This is done by dividing each feature value by the Euclidean norm, which is the square root of the sum of the squares of the feature values. This ensures that all feature vectors have the same magnitude, regardless of their original scale.

The formula for Unit Vector scaling is:

X_scaled = X / ||X||

where X is a feature vector, X_scaled is the rescaled feature vector, and ||X|| is the Euclidean norm of X.

Unit Vector scaling is useful when the magnitude of the feature values is not important, but the direction of the feature vectors is important. It is commonly used in text classification and natural language processing, where the magnitude of the word frequencies is not important, but the direction of the word vectors is important.

Here is an example to illustrate how Unit Vector scaling is used:

Suppose we have a dataset of customer reviews for a restaurant that includes three features: the frequency of the words "food", "service", and "atmosphere" in the reviews. We want to use these features to classify the reviews as positive or negative.

Before using these features in a machine learning algorithm, we apply Unit Vector scaling to normalize them:

Word frequencies: X_scaled = X / ||X||

For example, if a review has the following word frequencies:

Food: 10    
Service: 5    
Atmosphere: 3    

The normalized values would be:

Word frequencies: X_scaled = [10, 5, 3] / sqrt(10^2 + 5^2 + 3^2) = [10, 5, 3] / sqrt(134) ≈ [0.8639, 0.4319, 0.2592]

Now the three features have been rescaled to have a unit norm, and they can be used as a feature vector to classify the reviews as positive or negative, based on their direction in the three-dimensional space.
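
The same calculation can be sketched in NumPy, using the word counts from the example. The second vector, `longer_review`, is a hypothetical addition. It shows that a review with the same proportions but twice the counts maps to the same unit vector, which is why only direction matters.

```python
import numpy as np

# Word counts for "food", "service", "atmosphere" from the example review.
review = np.array([10.0, 5.0, 3.0])

# Euclidean (L2) norm: sqrt(10^2 + 5^2 + 3^2) = sqrt(134)
norm = np.linalg.norm(review)
unit_vector = review / norm

print([round(v, 4) for v in unit_vector.tolist()])  # [0.8639, 0.4319, 0.2592]

# Hypothetical longer review with the same word proportions.
longer_review = np.array([20.0, 10.0, 6.0])
longer_unit = longer_review / np.linalg.norm(longer_review)

print(np.allclose(unit_vector, longer_unit))  # True
```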


Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

<u>Answer</u> -

Principal Component Analysis (PCA) is a technique used in data analysis and machine learning to reduce the dimensionality of a dataset. It does this by identifying the most important patterns in the data and transforming the original variables into a new set of variables, known as principal components, that capture most of the variance in the data.

PCA works by finding linear combinations of the original variables that maximize the variance in the data. These linear combinations are called principal components. Their directions are mutually orthogonal, and the resulting component scores are uncorrelated with one another. The first principal component captures the most variance in the data, and each subsequent principal component captures the next highest variance, subject to the constraint that it is orthogonal to all previous principal components.

PCA is used in dimensionality reduction because it allows us to reduce the number of variables in a dataset while retaining most of the important information. This can help improve the efficiency and accuracy of machine learning algorithms that are sensitive to the curse of dimensionality, where the performance decreases as the number of variables increases.

Here is an example to illustrate how PCA is used in dimensionality reduction:

Suppose we have a dataset of customer reviews for a restaurant that includes ten features, such as the frequency of words like "food", "service", "atmosphere", and so on. We want to use these features to predict whether a customer will return for another visit.

Before using these features in a machine learning algorithm, we apply PCA to reduce the dimensionality:

1. Standardize the data: We first standardize the data by subtracting the mean of each feature and dividing by its standard deviation. This ensures that all features have the same scale and prevents features with larger variances from dominating the principal components.

2. Compute the principal components: We then compute the principal components of the standardized data using the singular value decomposition (SVD) algorithm. This gives us a new set of variables, where the first principal component captures the most variance in the data, the second principal component captures the next highest variance, and so on.

3. Select the number of principal components: We select the number of principal components to retain based on the amount of variance they capture. A common rule of thumb is to retain enough principal components to capture at least 80% of the variance in the data.

For example, suppose that we find that the first three principal components capture 85% of the variance in the data. We can then use these three principal components as our new features in a machine learning algorithm, instead of the original ten features. This reduces the dimensionality of the dataset while retaining most of the important information.
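
The three steps above can be sketched with NumPy's SVD. The data here is synthetic and purely illustrative (200 hypothetical reviews generated from three hidden factors), so the explained-variance figures it prints are not the 85% quoted in the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 reviews x 10 word-frequency features, generated from
# 3 hidden factors plus a little noise so that a few components dominate.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# Step 1: standardize each feature (zero mean, unit variance).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: SVD of the standardized data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Step 3: keep the first three components and project the data onto them.
k = 3
X_reduced = X_std @ Vt[:k].T

print(X_reduced.shape)  # (200, 3)
print(np.round(explained_ratio[:k], 3))
```

The squared singular values are proportional to the variance captured by each component, so `explained_ratio` is the quantity to inspect when deciding how many components to keep.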

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

<u>Answer</u> -

PCA and feature extraction are closely related concepts, as PCA can be used for feature extraction in data analysis and machine learning.

Feature extraction involves transforming the raw data into a new set of features that capture the most important information in the data. These new features can then be used as inputs to a machine learning algorithm. PCA is one technique used for feature extraction, as it identifies the most important patterns in the data and transforms the original variables into a new set of variables that capture most of the variance in the data.

To use PCA for feature extraction, we follow these steps:

1. Standardize the data: We first standardize the data by subtracting the mean of each feature and dividing by its standard deviation. This ensures that all features have the same scale and prevents features with larger variances from dominating the principal components.

2. Compute the principal components: We then compute the principal components of the standardized data using the singular value decomposition (SVD) algorithm. This gives us a new set of variables, where the first principal component captures the most variance in the data, the second principal component captures the next highest variance, and so on.

3. Select the number of principal components: We select the number of principal components to retain based on the amount of variance they capture. A common rule of thumb is to retain enough principal components to capture at least 80% of the variance in the data.

4. Use the principal components as features: We can then use the selected principal components as our new features in a machine learning algorithm.

Here is an example to illustrate how PCA can be used for feature extraction:

Suppose we have a dataset of customer reviews for a restaurant that includes ten features, such as the frequency of words like "food", "service", "atmosphere", and so on. We want to use these features to predict whether a customer will return for another visit.

Before using these features in a machine learning algorithm, we apply PCA for feature extraction:

1. Standardize the data: We first standardize the data by subtracting the mean of each feature and dividing by its standard deviation.

2. Compute the principal components: We then compute the principal components of the standardized data using the singular value decomposition (SVD) algorithm.

3. Select the number of principal components: We select the number of principal components to retain based on the amount of variance they capture. Suppose we find that the first three principal components capture 85% of the variance in the data.

4. Use the principal components as features: We can then use these three principal components as our new features in a machine learning algorithm, instead of the original ten features. This reduces the dimensionality of the dataset while retaining most of the important information, and it may improve the efficiency and accuracy of the machine learning algorithm.

In this example, PCA is used for feature extraction by identifying the most important patterns in the data and transforming the original variables into a new set of variables that capture most of the variance in the data. This new set of variables can then be used as features in a machine learning algorithm.
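
When PCA is used for feature extraction, the same transformation has to be applied to any new data that reaches the model. The sketch below illustrates this with two hypothetical helpers, `fit_pca` and `extract_features`, and random placeholder data. It is one possible arrangement, not a prescribed implementation.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Learn the standardization statistics and principal directions."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    X_std = (X_train - mean) / std
    _, _, Vt = np.linalg.svd(X_std, full_matrices=False)
    return mean, std, Vt[:n_components]

def extract_features(X, mean, std, components):
    """Standardize with the training statistics, then project onto the components."""
    return ((X - mean) / std) @ components.T

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 10))  # hypothetical training reviews
X_new = rng.normal(size=(5, 10))      # hypothetical unseen reviews

mean, std, components = fit_pca(X_train, n_components=3)
train_features = extract_features(X_train, mean, std, components)
new_features = extract_features(X_new, mean, std, components)

print(train_features.shape, new_features.shape)  # (100, 3) (5, 3)
```

The mean, standard deviation, and components are all learned from the training data only, and the unseen reviews are transformed with those stored values.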

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

<u>Answer</u> -

In this project, we can use Min-Max scaling to preprocess the data in order to ensure that all features are on the same scale and prevent features with larger ranges from dominating the recommendation system.

Min-Max scaling rescales each feature to a range between 0 and 1, where the minimum value of the feature is mapped to 0 and the maximum value of the feature is mapped to 1, while the other values are mapped proportionally in between. This ensures that all features have the same range and prevents features with larger values from having a disproportionate impact on the recommendation system.

To apply Min-Max scaling to the features in our food delivery service dataset, we would follow these steps:

1. Identify the features that need to be scaled: In this case, we want to scale features such as price, rating, and delivery time.

2. Compute the minimum and maximum values for each feature: We compute the minimum and maximum values for each feature in the dataset.

3. Rescale the features using the Min-Max scaling formula: For each feature, we rescale the values using the Min-Max scaling formula:

scaled_value = (original_value - minimum_value) / (maximum_value - minimum_value)

where "original_value" is the original value of the feature, "minimum_value" is the minimum value of the feature, "maximum_value" is the maximum value of the feature, and "scaled_value" is the rescaled value of the feature.

4. Replace the original values with the rescaled values: We replace the original values of each feature with the corresponding rescaled values.

After applying Min-Max scaling, price, rating, and delivery time all lie in the range 0 to 1, so no single feature dominates similarity or distance calculations simply because of its units. The minimum and maximum values should be computed on the training data and reused when scaling new records, so that new data is mapped consistently.
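
A minimal sketch of steps 2 and 3, assuming the three features are held as columns of a NumPy array. The restaurant values are made up for illustration.

```python
import numpy as np

# Hypothetical rows: [price in dollars, rating out of 5, delivery time in minutes]
restaurants = np.array([
    [12.0, 4.5, 30.0],
    [25.0, 3.8, 45.0],
    [8.0,  4.9, 20.0],
    [40.0, 4.2, 60.0],
])

# Step 2: per-feature minimum and maximum (one value per column).
col_min = restaurants.min(axis=0)
col_max = restaurants.max(axis=0)

# Step 3: apply the formula column-wise through broadcasting.
scaled = (restaurants - col_min) / (col_max - col_min)

print(np.round(scaled, 3))
```

This sketch assumes every column has at least two distinct values. A constant column would make `col_max - col_min` zero and would need separate handling.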

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

<u>Answer</u> -

In the context of building a model to predict stock prices, PCA can be used to reduce the dimensionality of the dataset, which can help to improve the performance and efficiency of the model. Here are the steps to use PCA for dimensionality reduction:

1. Standardize the data: Before applying PCA, it is recommended to standardize the data to ensure that all features have the same scale. This step can help to prevent features with larger variances from dominating the PCA process. Standardization (also called standard scaling or z-score scaling) subtracts each feature's mean and divides by its standard deviation.

2. Compute the covariance matrix: Once the data is standardized, the next step is to compute the covariance matrix. The covariance matrix is a square matrix that represents the relationship between the different features in the dataset. It is computed by multiplying the transpose of the standardized data matrix by the standardized data matrix itself and dividing by n - 1, where n is the number of observations.

3. Compute the eigenvectors and eigenvalues of the covariance matrix: The eigenvectors and eigenvalues of the covariance matrix represent the directions and magnitudes of the most significant variance in the dataset, respectively. The eigenvectors are computed by solving the eigenvalue equation for the covariance matrix.

4. Select the principal components: The principal components are the eigenvectors with the highest eigenvalues. They represent the directions of the most significant variance in the dataset. We can select the principal components that capture a sufficient amount of variance in the data, usually based on a certain threshold or percentage of variance explained.

5. Transform the data: The final step is to transform the data using the selected principal components. This step involves multiplying the standardized data matrix by the matrix whose columns are the selected eigenvectors (equivalently, by the transpose of that matrix if the eigenvectors are stored as rows).

After applying PCA to the dataset, we can obtain a new dataset with reduced dimensionality that contains only the selected principal components. This reduced dataset can be used as input to build the predictive model for stock prices. The advantage of using PCA is that it helps to reduce the dimensionality of the dataset while retaining most of the variance in the data, thus improving the performance and efficiency of the model.
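
The five steps can be sketched with NumPy as follows. The data is a random placeholder rather than real financial data, and the choice of `k = 3` is arbitrary. In practice k would be chosen from the explained variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the stock data: 250 trading days x 6 features.
X = rng.normal(size=(250, 6))
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]  # make two features strongly correlated

# Step 1: standardize.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix, X^T X / (n - 1).
n = X_std.shape[0]
cov = X_std.T @ X_std / (n - 1)

# Step 3: eigendecomposition. eigh is intended for symmetric matrices and
# returns eigenvalues in ascending order, so sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: keep the top-k eigenvectors (stored here as columns).
k = 3
W = eigvecs[:, :k]

# Step 5: project the standardized data onto the selected components.
X_reduced = X_std @ W

print(X_reduced.shape)  # (250, 3)
print(np.round(eigvals / eigvals.sum(), 3))
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues can be read as the variance explained by each component.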

Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

<u>Answer</u> -

To perform Min-Max scaling on the given dataset and transform it to a range of -1 to 1, we need to follow the steps below:

Find the minimum and maximum values of the dataset.
Minimum value = 1
Maximum value = 20

Apply the Min-Max scaling formula to each value in the dataset.
scaled_value = (original_value - minimum_value) / (maximum_value - minimum_value) * 2 - 1

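For the given dataset, the range is 20 - 1 = 19, so the scaled values will be:

scaled_value of 1 = (1 - 1) / 19 * 2 - 1 = -1
scaled_value of 5 = (5 - 1) / 19 * 2 - 1 ≈ -0.5789
scaled_value of 10 = (10 - 1) / 19 * 2 - 1 ≈ -0.0526
scaled_value of 15 = (15 - 1) / 19 * 2 - 1 ≈ 0.4737
scaled_value of 20 = (20 - 1) / 19 * 2 - 1 = 1

So, after performing Min-Max scaling, the dataset will be transformed to the following values: [-1, -0.5789, -0.0526, 0.4737, 1]. The scaled values are not evenly spaced because the original values are not: the gap between 1 and 5 is 4, while the other gaps are 5.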

In [2]:
import numpy as np

# Define the dataset
dataset = np.array([1, 5, 10, 15, 20])

# Find the minimum and maximum values
min_val = np.min(dataset)
max_val = np.max(dataset)

# Apply Min-Max scaling formula to each value in the dataset
scaled_dataset = (dataset - min_val) / (max_val - min_val) * 2 - 1

# Print the scaled dataset
print(scaled_dataset)

[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
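
The factor of 2 and the offset of -1 are a special case of rescaling to an arbitrary target range. A small sketch of the general form follows. The function name `rescale_to_range` and its parameters are hypothetical.

```python
import numpy as np

def rescale_to_range(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    unit = (values - old_min) / (old_max - old_min)  # first map to [0, 1]
    return unit * (new_max - new_min) + new_min      # then stretch and shift

scaled = rescale_to_range([1, 5, 10, 15, 20], new_min=-1, new_max=1)
print([round(v, 4) for v in scaled.tolist()])  # [-1.0, -0.5789, -0.0526, 0.4737, 1.0]
```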


Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

<u>Answer</u> -

The number of principal components to be retained in PCA depends on several factors, such as the amount of variance explained by each principal component, the specific problem at hand, and the desired level of accuracy in the model.

To determine the number of principal components to be retained in this case, we need to follow these steps:

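1. Standardize the data: Before applying PCA, we need to standardize the data to ensure that all features have the same scale. Standardization (standard scaling) subtracts each feature's mean and divides by its standard deviation. A categorical feature such as gender must first be encoded numerically (for example, as 0 and 1) before it can be included.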

2. Compute the covariance matrix: Once the data is standardized, the next step is to compute the covariance matrix.

3. Compute the eigenvectors and eigenvalues of the covariance matrix: The eigenvectors and eigenvalues of the covariance matrix represent the directions and magnitudes of the most significant variance in the dataset, respectively. The eigenvectors are computed by solving the eigenvalue equation for the covariance matrix.

4. Determine the number of principal components to be retained: The number of principal components to be retained can be determined by examining the eigenvalues and their corresponding explained variance. We can plot the eigenvalues in descending order and choose the number of principal components that explain a significant amount of variance in the data, such as 80% or 90%.

Because the dataset has only five features, PCA produces at most five principal components. Height and weight tend to be correlated, as do age and blood pressure, so it is plausible that 2 or 3 principal components would explain most of the variance. The specific number must be confirmed from the explained variance of the actual data and depends on the desired level of accuracy in the model.

For example, if the first two principal components explain 90% of the variance in the data, then we can choose to retain those two principal components for feature extraction. If the third principal component adds little additional variance, we may choose to discard it.

It is important to note that the choice of the number of principal components to be retained can have an impact on the performance and efficiency of the predictive model. Choosing too few principal components can result in loss of information, while choosing too many can result in overfitting. Therefore, it is important to strike a balance between the amount of variance explained and the desired level of accuracy in the model.
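
The selection rule can be sketched as follows. The data is synthetic: the five columns are hypothetical stand-ins for the listed features, with correlations built in by assumption, so the printed result says nothing about a real dataset. The 90% threshold is likewise only an example.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: 300 people x 5 numeric features standing in for
# [height, weight, age, gender (encoded 0/1), blood pressure].
n = 300
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 8, n)  # correlated with height
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n).astype(float)
blood_pressure = 0.5 * age + rng.normal(100, 8, n)    # correlated with age
X = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize, then take the eigenvalues of the covariance matrix in descending order.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))[::-1]

# Cumulative share of variance explained by the first 1, 2, ... components.
cumulative = np.cumsum(eigvals) / eigvals.sum()

# Smallest k whose cumulative share reaches the threshold.
threshold = 0.90
k = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print("components to keep:", k)
```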