ASSIGNMENT: FE-3

1.  What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its 
application.

Min-Max scaling, also known as normalization, is a data preprocessing technique used to transform the values of a numerical dataset into a specified range. It involves scaling the values of the dataset so that they fall within a minimum and maximum value, typically between 0 and 1.

To apply Min-Max scaling, we use the following formula for each value in the dataset:

X_norm = (X - X_min) / (X_max - X_min)

where X is the original value, X_min is the minimum value in the dataset, X_max is the maximum value in the dataset, and X_norm is the scaled value.

Min-Max scaling is often used in machine learning to improve the performance of algorithms that are sensitive to the scale of the input features. By scaling the features to a common range, we can avoid features with larger scales dominating the model and potentially leading to biased results.

Here's an example to illustrate the application of Min-Max scaling:

Suppose we have a dataset containing the heights of people in centimeters. The minimum height in the dataset is 150 cm, and the maximum height is 200 cm. To apply Min-Max scaling to this dataset, we can use the formula above to scale each value:

X_norm = (X - 150) / (200 - 150)

For example, if a person's height is 170 cm, the scaled value would be:

X_norm = (170 - 150) / (200 - 150) = 20 / 50 = 0.4

After scaling all the values in the dataset, they will all fall within the range of 0 to 1, which can be helpful for training machine learning models that are sensitive to the scale of the input features.
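The height example can be sketched in a few lines of NumPy. This is a minimal illustration, not a definitive implementation: the function name `min_max_scale` and the sample heights (other than the 150 cm minimum and 200 cm maximum) are assumptions made for the sketch.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array to [0, 1] using its own minimum and maximum."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # A constant feature has no spread; map it to 0 to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)

# Hypothetical heights in cm; 150 and 200 are the minimum and maximum from the example.
heights = np.array([150, 160, 170, 185, 200])
scaled = min_max_scale(heights)
print(scaled)  # 170 cm maps to (170 - 150) / (200 - 150) = 0.4
```

The guard for a constant feature is a design choice of this sketch; the formula itself is undefined when X_max equals X_min.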

2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? 
Provide an example to illustrate its application.

The Unit Vector technique, also known as vector normalization, is another feature scaling technique. It rescales a vector of feature values so that it has a length (Euclidean norm) of 1 while preserving its direction. It is commonly used when only the relative proportions of the values should matter to the analysis or machine learning model, not their overall magnitude.

To apply the Unit Vector technique to a feature, we first calculate the magnitude of the feature vector, which is the square root of the sum of the squared feature values. Then, we divide each feature value by the magnitude to obtain a new set of values that form a vector with a length of 1.

Here's the formula for the Unit Vector technique:

X_unit = X / ||X||

where X is the original feature vector, ||X|| is the magnitude of the feature vector, and X_unit is the new scaled feature vector.

Min-Max scaling works on each feature independently and maps its values onto a fixed range such as 0 to 1, based on that feature's minimum and maximum. The Unit Vector technique instead divides a whole vector by its own length, so the result depends on all values in the vector together and has a norm of exactly 1 rather than a fixed minimum and maximum. This is useful when direction matters more than magnitude, for example when measuring the similarity between data points with cosine similarity.

Here's an example to illustrate the application of the Unit Vector technique:

Suppose we have a dataset containing the weight and height of people, and we want to scale one person's feature vector using the Unit Vector technique. This person has a weight of 70 kg and a height of 180 cm, so the original feature vector is [70, 180].

To apply the Unit Vector technique, we first calculate the magnitude of the feature vector:

||X|| = sqrt(70^2 + 180^2) = sqrt(37300) = 193.1

Then, we divide each feature value by the magnitude to obtain the new scaled feature vector:

X_unit = [70 / 193.1, 180 / 193.1] = [0.362, 0.932]

After scaling with the Unit Vector technique, the vector has a length of 1 and points in the same direction as the original vector: the ratio of weight to height is unchanged.
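The arithmetic above can be checked with a short NumPy sketch. The variable names are illustrative; `np.linalg.norm` computes the Euclidean norm by default.

```python
import numpy as np

# The sample from the example: weight in kg, height in cm.
x = np.array([70.0, 180.0])

# Euclidean (L2) norm: sqrt(70^2 + 180^2) = sqrt(37300)
norm = np.linalg.norm(x)
x_unit = x / norm

print(round(norm, 1))          # 193.1
print(np.round(x_unit, 3))     # [0.362 0.932]
print(np.linalg.norm(x_unit))  # length of the scaled vector, 1 up to rounding
```

For a whole dataset, the same operation is usually applied to each row (each sample) separately; scikit-learn's `Normalizer` does this.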

3.  What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an 
example to illustrate its application.

PCA, or Principal Component Analysis, is a technique used in data analysis and machine learning to reduce the dimensionality of a dataset. It transforms a set of potentially correlated variables into a set of uncorrelated variables, called principal components, ordered by how much of the variance in the original dataset each one explains. Dimensionality is reduced by keeping only the first few components rather than by deleting original features.

PCA works by identifying the directions of maximum variance in the dataset and projecting the data onto these directions to create a new set of features that capture the most important information in the dataset. The first principal component captures the most variance in the data, followed by the second principal component, and so on.

PCA can be useful in reducing the dimensionality of a dataset while retaining most of the information in the data. By reducing the number of features, we can simplify the analysis, speed up the computation, and potentially improve the performance of machine learning models.

Here's an example to illustrate the application of PCA:

Suppose we have a dataset containing the height, weight, and shoe size of people. We want to reduce the dimensionality of the dataset using PCA. First, we standardize the data to have a mean of zero and a standard deviation of one to ensure that all variables have equal importance in the PCA.

Next, we apply PCA to the standardized data to identify the principal components that capture the maximum amount of variance in the data. Let's say we find that the first principal component captures 60% of the variance and is weighted most heavily on the height and weight variables, while the second principal component captures 30% of the variance and is weighted most heavily on the height and shoe size variables. Every principal component is a linear combination of all three variables; some variables simply receive larger weights than others.

We can use the first two principal components as new features to represent the original data in a lower-dimensional space. These principal components are uncorrelated and capture the most important information in the data. We can then use these new features for further analysis, such as clustering or classification.

By reducing the dimensionality of the dataset from three to two, we have simplified the analysis and potentially improved the performance of machine learning models. PCA can be a powerful tool for handling high-dimensional data and extracting meaningful information from complex datasets.
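A minimal sketch of this example follows. The measurements are synthetic and made up for illustration (weight and shoe size are generated so that they loosely follow height), so the explained variance it prints will not match the 60% and 30% figures assumed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, made-up measurements: weight and shoe size loosely follow height.
n = 200
height = rng.normal(170, 10, n)                     # cm
weight = 0.9 * height - 85 + rng.normal(0, 6, n)    # kg
shoe = 0.25 * height - 1 + rng.normal(0, 1.5, n)    # shoe size (illustrative)
X = np.column_stack([height, weight, shoe])

# Standardize each column to mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigendecomposition of the covariance matrix. eigh is meant for symmetric
# matrices and returns eigenvalues in ascending order, so sort descending.
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

explained = eig_vals / eig_vals.sum()
X_2d = X_std @ eig_vecs[:, :2]   # keep the first two principal components

print("Explained variance ratio:", np.round(explained, 3))
print("Reduced shape:", X_2d.shape)  # (200, 2)
```

The two columns of `X_2d` are uncorrelated with each other, which is the property the text attributes to principal components.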

4.  What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature 
Extraction? Provide an example to illustrate this concept.

PCA can be used for feature extraction, which is a technique that involves creating new features from the existing features of a dataset. The goal of feature extraction is to create a smaller set of features that capture the most important information in the data, while reducing the dimensionality of the dataset.

PCA is a powerful technique for feature extraction because it identifies the directions of maximum variance in the dataset and projects the data onto these directions to create a new set of features. These new features are called principal components and are uncorrelated and ordered by the amount of variance they capture in the original dataset.

The first principal component captures the most variance in the data, followed by the second principal component, and so on. By selecting the top k principal components, we can create a smaller set of features that capture the most important information in the data, while reducing the dimensionality of the dataset from n to k.

Here's an example to illustrate the concept of using PCA for feature extraction:

Suppose we have a dataset containing images of handwritten digits, and each image is represented as a 28x28 pixel array. Each pixel is a feature, so the original dataset has a dimensionality of 784 (28x28).

We want to extract the most important features from the dataset to reduce the dimensionality and improve the performance of a machine learning model for classifying the digits. We apply PCA to the dataset and find that the first 50 principal components capture 80% of the variance in the data.

We can use these 50 principal components as new features to represent the original images in a lower-dimensional space. These features are uncorrelated and capture the most important information in the data, such as the orientation and curvature of the digits. We can then use these new features to train a machine learning model to classify the digits.

By reducing the dimensionality of the dataset from 784 to 50, we have simplified the analysis and potentially improved the performance of the machine learning model. PCA can be a powerful tool for feature extraction in image processing, signal processing, and other applications where high-dimensional data needs to be analyzed.
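The selection of the top k components by a variance target can be sketched as follows. This is an illustration under stated assumptions: the data is a synthetic stand-in for flattened 28x28 images (generated from 20 hidden factors plus noise), not real digit images, so the number of components it keeps will differ from the 50 used in the example above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for flattened 28x28 images: 500 samples, 784 features,
# generated from 20 hidden factors plus a little noise (an assumption).
n_samples, n_features, n_factors = 500, 784, 20
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_features))
X = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Center the data, then take the SVD; rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values are proportional to the variance along each direction.
explained = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained)

# Smallest k whose components together explain at least 80% of the variance.
k = int(np.searchsorted(cumulative, 0.80) + 1)

# The extracted features: each sample's coordinates along the top k directions.
features = X_centered @ Vt[:k].T

print("Components kept:", k)
print("Feature matrix shape:", features.shape)
```

The SVD route avoids forming the 784x784 covariance matrix explicitly, which is one reason it is often preferred for high-dimensional data.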

5. You are working on a project to build a recommendation system for a food delivery service. The dataset 
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to 
preprocess the data

To preprocess the data for a recommendation system for a food delivery service, we would apply Min-Max scaling to each feature independently. No prior standardization is needed: Min-Max scaling gives the same result whether or not a feature has first been shifted and rescaled, so standardizing beforehand would add a step without changing the output.

Rescaling matters because some features, such as price, may have a larger range than other features, such as rating. Bringing the features to a common range between 0 and 1 keeps the feature with the largest raw range from dominating distance or similarity calculations in the recommendation system.

Here's how we would use Min-Max scaling to preprocess the data:

Find the range of each feature: We would compute the minimum and maximum of price, rating, and delivery time separately.

Apply Min-Max scaling: We would apply the following formula to rescale each feature to a range between 0 and 1:

x' = (x - min(x)) / (max(x) - min(x))

where x is the original feature value, min(x) is the minimum value of the feature, max(x) is the maximum value of the feature, and x' is the rescaled value.

For example, if the original price range is $5 to $50 and the rating range is 1 to 5, we would use Min-Max scaling to rescale the price feature to a range between 0 and 1 and the rating feature to a range between 0 and 1. This step ensures that both features have equal importance in the recommendation system.

By using Min-Max scaling to preprocess the data, we can ensure that all the features are treated equally in the recommendation system and that the range of each feature is consistent. This step can help improve the performance of the recommendation system and ensure that the recommendations are based on a fair comparison of all the features.

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Create the dataset
data = {'A': [20, 4.5, 30],
        'B': [10, 3.8, 20],
        'C': [15, 4.2, 25],
        'D': [30, 4.8, 35],
        'E': [25, 3.5, 40],
        'F': [12, 4.1, 22],
        'G': [18, 3.9, 28],
        'H': [22, 4.6, 32]}

# Create a dataframe from the dataset
df = pd.DataFrame.from_dict(data, orient='index', columns=['Price', 'Rating', 'Delivery Time'])

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Fit and transform the data
df_scaled = scaler.fit_transform(df)

# Create a new dataframe with the scaled data
df_scaled = pd.DataFrame(df_scaled, columns=df.columns, index=df.index)

# Display the original and scaled data
print('Original Data:\n', df)
print('\nScaled Data:\n', df_scaled)


Original Data:
    Price  Rating  Delivery Time
A     20     4.5             30
B     10     3.8             20
C     15     4.2             25
D     30     4.8             35
E     25     3.5             40
F     12     4.1             22
G     18     3.9             28
H     22     4.6             32

Scaled Data:
    Price    Rating  Delivery Time
A   0.50  0.769231           0.50
B   0.00  0.230769           0.00
C   0.25  0.538462           0.25
D   1.00  1.000000           0.75
E   0.75  0.000000           1.00
F   0.10  0.461538           0.10
G   0.40  0.307692           0.40
H   0.60  0.846154           0.60
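In practice the scaler is usually fitted once and then reused on new records. The sketch below assumes a hypothetical new restaurant (`new_item`) that was not in the fitted data; it reuses the Price, Rating, and Delivery Time rows from the cell above as plain NumPy arrays.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Training data: the same Price, Rating, Delivery Time rows as in the cell above.
train = np.array([
    [20, 4.5, 30], [10, 3.8, 20], [15, 4.2, 25], [30, 4.8, 35],
    [25, 3.5, 40], [12, 4.1, 22], [18, 3.9, 28], [22, 4.6, 32],
])

scaler = MinMaxScaler()
scaler.fit(train)  # learns the per-column min and max from these rows only

# A hypothetical new restaurant that is pricier than anything seen so far.
new_item = np.array([[40, 4.0, 30]])
new_scaled = scaler.transform(new_item)

print(new_scaled)  # price maps to (40 - 10) / (30 - 10) = 1.5, outside [0, 1]
```

Values outside the fitted minimum and maximum fall outside the 0 to 1 range, which is worth keeping in mind when new data can exceed what the scaler has seen.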


6.  You are working on a project to build a model to predict stock prices. The dataset contains many 
features, such as company financial data and market trends. Explain how you would use PCA to reduce the 
dimensionality of the dataset.

PCA (Principal Component Analysis) can be used to reduce the dimensionality of a dataset with many features, such as the one in the stock price prediction project. The idea is to find the directions along which the data varies the most and to project the data onto them, forming a smaller set of new features called principal components. Each principal component is a linear combination of the original features.

Here are the steps to use PCA for dimensionality reduction in the stock price prediction project:

Standardize the data: Before applying PCA, it's important to put the features on a comparable scale, because PCA is driven by variance and features measured in large units would otherwise dominate the components. Standard scaling (zero mean, unit variance) is the usual choice.

Calculate the covariance matrix: The next step is to calculate the covariance matrix of the standardized data. The covariance matrix is a square matrix that shows how the features are related to each other.

Compute the eigenvectors and eigenvalues: The eigenvectors of the covariance matrix give the directions of the principal components, and the eigenvalues give the variance captured along each direction. They are obtained by an eigendecomposition of the covariance matrix or, equivalently, from a Singular Value Decomposition (SVD) of the centered data matrix.

Select the principal components: The next step is to select the top k eigenvectors with the largest eigenvalues. These eigenvectors are the directions that account for the most variation in the data.

Transform the data: The final step is to transform the original data using the selected eigenvectors to form a new set of features, called principal components. Each principal component is a linear combination of the original features, with the coefficients given by the corresponding eigenvector.

By reducing the number of features to a smaller set of principal components, the dimensionality of the dataset can be effectively reduced, which can help to improve the performance of machine learning models by reducing the risk of overfitting and increasing the speed of training.

In the context of the stock price prediction project, PCA would combine the correlated financial and market trend features into a smaller set of uncorrelated components. For example, we might keep the top 5 principal components, or however many are needed to reach a target share of explained variance, and use them as input to a machine learning model for stock price prediction.

In [2]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Create a sample stock market dataset with 10 features and 1000 data points
data = np.random.rand(1000, 10)

# Create a DataFrame to hold the data and set column names
df = pd.DataFrame(data, columns=['Price', 'Earnings', 'Revenue', 'Profit Margin', 'Debt/Equity Ratio',
                                 'Market Cap', 'Volume', 'PE Ratio', 'PEG Ratio', 'Dividend Yield'])

# Standardize the data
df_std = (df - df.mean()) / df.std()

# Calculate the covariance matrix
cov_mat = np.cov(df_std.T)

# Compute the eigenvalues and eigenvectors (eigh is meant for symmetric matrices)
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)

# Sort by eigenvalue in descending order, then select the top 3 eigenvectors
order = np.argsort(eig_vals)[::-1]
top_eig_vecs = eig_vecs[:, order[:3]]

# Transform the data using the selected eigenvectors
df_pca = pd.DataFrame(np.dot(df_std, top_eig_vecs), columns=['PC1', 'PC2', 'PC3'])

# Print the original data shape and the transformed data shape
print('Original data shape:', df.shape)
print('Transformed data shape:', df_pca.shape)


Original data shape: (1000, 10)
Transformed data shape: (1000, 3)
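The same reduction can also be sketched with scikit-learn's `PCA` class, which handles the decomposition and ordering internally. As in the cell above, the data is random placeholder data rather than real market data; the seed and variable names are assumptions of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1000, 10))  # random placeholder data, 1000 rows and 10 features

# Standardize, then keep the 3 components with the largest variance.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_std)

print("Transformed data shape:", X_pca.shape)  # (1000, 3)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

Because the placeholder features are independent random numbers, each component explains only a small share of the variance here. PCA pays off when the features are correlated, which is typically the case for financial indicators.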


7.  For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the 
values to a range of -1 to 1

In [5]:
import numpy as np

# Define the dataset
X = np.array([1, 5, 10, 15, 20])

# Calculate the minimum and maximum values
X_min = np.min(X)
X_max = np.max(X)

# Perform Min-Max scaling
X_scaled = (X - X_min) / (X_max - X_min) * 2 - 1

# Print the scaled values
print(X_scaled)


[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
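The `* 2 - 1` step above is a special case of mapping to an arbitrary range [low, high]: X_scaled = low + (X - X_min) * (high - low) / (X_max - X_min). A small sketch of that general form follows; the function name `min_max_scale_to_range` is made up for illustration.

```python
import numpy as np

def min_max_scale_to_range(x, low=0.0, high=1.0):
    """Linearly map x so that min(x) -> low and max(x) -> high."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("Cannot scale a constant array: max equals min.")
    return low + (x - x_min) * (high - low) / (x_max - x_min)

X = np.array([1, 5, 10, 15, 20])
X_scaled = min_max_scale_to_range(X, low=-1.0, high=1.0)
print(X_scaled)
```

With scikit-learn, the equivalent is `MinMaxScaler(feature_range=(-1, 1))` applied to a 2-D array.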


8.  For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform 
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Performing PCA on the given dataset [height, weight, age, gender, blood pressure] can help to extract the most important features and reduce the dimensionality of the dataset.

The number of principal components to retain depends on the amount of variance we want to preserve. The goal is to retain as much variance as possible while reducing the dimensionality of the dataset.

In [6]:
import numpy as np

# Define the dataset
data = np.array([
    [170, 65, 25, 0, 120],
    [165, 55, 32, 1, 130],
    [180, 75, 40, 0, 140],
    [160, 45, 18, 1, 110],
    [175, 70, 35, 0, 125],
    [185, 80, 45, 0, 135]
])

# Standardize the dataset
data_std = (data - np.mean(data, axis=0)) / np.std(data, axis=0)

# Calculate the covariance matrix
covariance_matrix = np.cov(data_std.T)

# Calculate the eigenvalues and eigenvectors of the covariance matrix
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)

# Sort the eigenvalues and eigenvectors in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Choose the number of principal components to retain (in this case, 3)
n_components = 3

# Project the data onto the chosen principal components
principal_components = eigenvectors[:, :n_components]
data_transformed = np.dot(data_std, principal_components)

# Print the transformed data
print(data_transformed)


[[ 0.52284174 -1.16779276  0.31213104]
 [ 1.2678436   1.39815728  0.13913134]
 [-2.05088996  0.35409515  0.38872529]
 [ 3.51125526 -0.13446876 -0.24434151]
 [-0.67822786 -0.59192415 -0.11448658]
 [-2.57282279  0.14193324 -0.48115957]]


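In this example, we define the dataset as a NumPy array and standardize it using NumPy's mean and std functions. We compute the covariance matrix of the standardized data with the cov function, obtain its eigenvalues and eigenvectors with the eig function, and sort them in descending order of eigenvalue so that the leading columns are the principal components. Finally, we project the data onto the first 3 components using matrix multiplication and print the result.

The choice of 3 components here is illustrative. A principled choice keeps the smallest number of components whose cumulative explained variance (each eigenvalue divided by the sum of all eigenvalues) reaches a chosen target, commonly around 90 to 95%. Because height, weight, age, and blood pressure tend to be correlated, a few components are often sufficient. Note also that gender is a binary categorical variable, so its contribution to the components should be interpreted with care.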
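To decide how many components to retain, one option is to look at the cumulative explained variance. The sketch below reuses the illustrative dataset from the cell above; the 95% target is an assumption, and a different target would give a different answer.

```python
import numpy as np

# Same illustrative dataset as above: height, weight, age, gender, blood pressure.
data = np.array([
    [170, 65, 25, 0, 120],
    [165, 55, 32, 1, 130],
    [180, 75, 40, 0, 140],
    [160, 45, 18, 1, 110],
    [175, 70, 35, 0, 125],
    [185, 80, 45, 0, 135],
], dtype=float)

data_std = (data - data.mean(axis=0)) / data.std(axis=0)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(np.cov(data_std, rowvar=False))[::-1]
eigenvalues = np.clip(eigenvalues, 0, None)  # guard against tiny negative rounding errors

explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

threshold = 0.95  # assumed target share of variance to preserve
n_components = int(np.argmax(cumulative >= threshold) + 1)

for i, (r, c) in enumerate(zip(explained_ratio, cumulative), start=1):
    print(f"PC{i}: {r:.3f} of variance, cumulative {c:.3f}")
print("Components needed for 95%:", n_components)
```

With only six samples, these ratios are a rough guide at best; on a real dataset the same procedure would be run on many more rows.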