# **ASSIGNMENT**

**Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.**

Min-Max scaling, also known as normalization, is a data preprocessing technique used to transform numeric features into a common range. It rescales the values of a feature to a fixed range, typically between 0 and 1. The purpose of Min-Max scaling is to ensure that all features have the same scale, preventing one feature from dominating or biasing the learning algorithm due to its larger values.

The formula to perform Min-Max scaling on a feature x is as follows:

x_scaled = (x - min(x)) / (max(x) - min(x))

Here, x_scaled represents the rescaled value of x, min(x) is the minimum value of x in the dataset, and max(x) is the maximum value of x in the dataset.

Let's consider an example to illustrate the application of Min-Max scaling. Suppose we have a dataset with a feature representing the ages of a group of individuals. The ages range from 20 to 60 years. We want to apply Min-Max scaling to this feature.

Original ages: [20, 25, 30, 35, 40, 45, 50, 55, 60]

To scale these ages using Min-Max scaling, we need to calculate the minimum and maximum values:

min(age) = 20<br>
max(age) = 60

Now, we can apply the formula to obtain the scaled values:

Scaled ages:
[(20 - 20) / (60 - 20),
 (25 - 20) / (60 - 20),
 (30 - 20) / (60 - 20),
 (35 - 20) / (60 - 20),
 (40 - 20) / (60 - 20),
 (45 - 20) / (60 - 20),
 (50 - 20) / (60 - 20),
 (55 - 20) / (60 - 20),
 (60 - 20) / (60 - 20)]

Simplified:
[0,
 0.125,
 0.25,
 0.375,
 0.5,
 0.625,
 0.75,
 0.875,
 1]

After applying Min-Max scaling, the ages are now transformed into a common range between 0 and 1. This normalization ensures that the age feature is comparable to other features in the dataset, and the values no longer dominate the analysis solely based on their magnitude.
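
The calculation above can be reproduced with a few lines of plain Python. This is a minimal sketch rather than a reference implementation; the function name `min_max_scale` and its handling of a constant feature (returning zeros) are assumptions made for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Returning zeros is one possible convention (an assumption here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 25, 30, 35, 40, 45, 50, 55, 60]
scaled_ages = min_max_scale(ages)
print(scaled_ages)
# [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
```

The minimum and maximum should be taken from the training data only and then reused for any new data, so that both are scaled consistently.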

**Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.**

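The Unit Vector technique, also known as vector normalization, is a data preprocessing technique that rescales each data point (row) to have a unit norm. In other words, it transforms each data point into a vector of length 1 while preserving the direction of the vector. This technique is particularly useful when the direction of the data points matters more than their magnitude, for example when samples are compared using cosine similarity. It differs from Min-Max scaling in what it operates on: Min-Max scaling rescales each feature (column) independently using that feature's minimum and maximum, whereas the Unit Vector technique rescales each data point using its own norm, so the result is not confined to a fixed range per feature.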

To apply the Unit Vector technique, each data point is divided by its Euclidean norm, which is calculated as the square root of the sum of squares of its individual values.

The formula to perform Unit Vector scaling on a data point x is as follows:

x_scaled = x / ||x||

Here, x_scaled represents the rescaled data point, x represents the original data point (the vector of its feature values), and ||x|| represents the Euclidean norm of x.

Let's consider an example to illustrate the application of the Unit Vector technique. Suppose we have a dataset with two features: height (in centimeters) and weight (in kilograms) of individuals. We want to apply Unit Vector scaling to these features.

Original data:
Height: [165, 170, 175, 180, 185]<br>
Weight: [60, 70, 80, 90, 100]<br>

To scale these features using the Unit Vector technique, we need to calculate the Euclidean norm for each data point:

||[165, 60]|| = sqrt(165^2 + 60^2) = sqrt(27225 + 3600) = sqrt(30825) ≈ 175.57<br>
||[170, 70]|| = sqrt(170^2 + 70^2) = sqrt(28900 + 4900) = sqrt(33800) ≈ 183.85<br>
||[175, 80]|| = sqrt(175^2 + 80^2) = sqrt(30625 + 6400) = sqrt(37025) ≈ 192.42<br>
||[180, 90]|| = sqrt(180^2 + 90^2) = sqrt(32400 + 8100) = sqrt(40500) ≈ 201.25<br>
||[185, 100]|| = sqrt(185^2 + 100^2) = sqrt(34225 + 10000) = sqrt(44225) ≈ 210.30<br>

Now, we can apply the formula to obtain the scaled values:

Scaled height:
[165 / 175.57,
 170 / 183.85,
 175 / 192.42,
 180 / 201.25,
 185 / 210.30]

Scaled weight:
[60 / 175.57,
 70 / 183.85,
 80 / 192.42,
 90 / 201.25,
 100 / 210.30]

Simplified:
Height: [0.94, 0.92, 0.91, 0.89, 0.88]<br>
Weight: [0.34, 0.38, 0.42, 0.45, 0.48]

After applying the Unit Vector technique, both the height and weight features are scaled such that each data point becomes a unit vector with a length of 1. This scaling preserves the direction of the data points while eliminating the effect of their magnitudes.
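
The row-wise normalization can be sketched with NumPy as follows. The array layout (one row per individual, columns for height and weight) and the variable names are assumptions made for this illustration.

```python
import numpy as np

# One row per individual: [height_cm, weight_kg]
data = np.array([
    [165, 60],
    [170, 70],
    [175, 80],
    [180, 90],
    [185, 100],
], dtype=float)

# Euclidean (L2) norm of each row; keepdims=True keeps shape (5, 1) for broadcasting
norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm
unit_rows = data / norms

print(np.round(unit_rows, 2))
# Each scaled row should have length 1 (up to floating-point error)
print(np.linalg.norm(unit_rows, axis=1))
```

Because each row is divided by its own norm, the ratio of height to weight within a row is unchanged; only the overall magnitude of the row is removed.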

**Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.**

PCA (Principal Component Analysis) is a widely used technique in data analysis and machine learning for dimensionality reduction. It aims to transform a high-dimensional dataset into a lower-dimensional space while preserving the most important information or patterns present in the original data.

The goal of PCA is to find a new set of orthogonal variables, known as principal components, that capture the maximum variance in the data. These principal components are linear combinations of the original features, and each subsequent component captures as much of the remaining variance as possible. The first principal component accounts for the most significant amount of variation in the data, followed by the second, third, and so on.

The steps involved in performing PCA are as follows:

1. Standardize the data: It is essential to standardize the data by subtracting the mean and dividing by the standard deviation of each feature. This ensures that all features are on the same scale and prevents dominance by high-variance features.

2. Compute the covariance matrix: Calculate the covariance matrix of the standardized data, which represents the relationships between different features. The covariance matrix provides information about the variance and correlation between pairs of features.

3. Compute eigenvectors and eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the principal components, while eigenvalues indicate the amount of variance explained by each principal component.

4. Select the principal components: Sort the eigenvalues in descending order and choose the top-k eigenvectors corresponding to the largest eigenvalues. These selected eigenvectors are the principal components that capture the most significant variance in the data.

5. Project the data onto the new feature space: Multiply the standardized data by the selected principal components to obtain the transformed dataset in the lower-dimensional space.

Let's consider an example to illustrate the application of PCA. Suppose we have a dataset with three features: height, weight, and age of individuals. We want to perform PCA to reduce the dimensionality of the dataset.

Original data:
Height: [165, 170, 175, 180, 185]<br>
Weight: [60, 70, 80, 90, 100]<br>
Age: [25, 30, 35, 40, 45]<br>

1. Standardize the data: Subtract the mean and divide by the standard deviation of each feature.

Standardized data:
Height: [-1.41, -0.71, 0, 0.71, 1.41]<br>
Weight: [-1.41, -0.71, 0, 0.71, 1.41]<br>
Age: [-1.41, -0.71, 0, 0.71, 1.41]<br>

2. Compute the covariance matrix:

Covariance matrix:
[[ 1.0, 1.0, 1.0],
 [ 1.0, 1.0, 1.0],
 [ 1.0, 1.0, 1.0]]

3. Compute eigenvectors and eigenvalues:

Eigenvector for the non-zero eigenvalue (eigenvectors with eigenvalue 0 capture no variance and are omitted):
[1/sqrt(3), 1/sqrt(3), 1/sqrt(3)]

Eigenvalues: 
[3.0, 0, 0]

4. Select the principal components: Since there is only one non-zero eigenvalue, we choose the corresponding eigenvector as the principal component.

Selected principal component:
[1/sqrt(3), 1/sqrt(3), 1/sqrt(3)]

5. Project the data onto the new feature space: Multiply the standardized data by the principal component.

Projected data:
[-2.45, -1.22, 0, 1.22, 2.45]

After performing PCA, the dataset is reduced to a single dimension represented by the principal component. Because the three features are perfectly correlated in this example, projecting the original three-dimensional data onto this one-dimensional space retains all of the variance in the data.
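
The five PCA steps can be sketched with NumPy on the same height, weight, and age data. This is an illustrative sketch, not a production implementation; it assumes the population standard deviation and population covariance (division by n), which is the convention that yields standardized values of about ±1.41 and a covariance matrix of ones.

```python
import numpy as np

# One row per individual: [height, weight, age]
X = np.array([
    [165, 60, 25],
    [170, 70, 30],
    [175, 80, 35],
    [180, 90, 40],
    [185, 100, 45],
], dtype=float)

# 1. Standardize each feature (np.std uses the population formula by default)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (bias=True divides by n)
cov = np.cov(Z, rowvar=False, bias=True)

# 3. Eigen-decomposition; eigh is intended for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by decreasing eigenvalue and keep the top k
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 1
W = eigvecs[:, :k]

# 5. Project the standardized data onto the selected component(s)
scores = Z @ W

print(np.round(eigvals, 4))
print(np.round(scores.ravel(), 2))
```

The sign of an eigenvector is arbitrary, so the projected values may come out with all signs flipped; both versions describe the same component.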

**Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.**

PCA (Principal Component Analysis) can be used for feature extraction, which involves transforming the original features into a new set of features that capture the most important information or patterns in the data. In this context, PCA helps identify the most relevant features or combinations of features that contribute significantly to the variability of the data.

The process of using PCA for feature extraction involves the following steps:

1. Standardize the data: It is essential to standardize the data by subtracting the mean and dividing by the standard deviation of each feature. This step ensures that all features are on the same scale and prevents dominance by high-variance features.

2. Compute the covariance matrix: Calculate the covariance matrix of the standardized data, which represents the relationships between different features. The covariance matrix provides information about the variance and correlation between pairs of features.

3. Compute eigenvectors and eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the principal components, while eigenvalues indicate the amount of variance explained by each principal component.

4. Select the principal components: Sort the eigenvalues in descending order and choose the top-k eigenvectors corresponding to the largest eigenvalues. These selected eigenvectors are the principal components that capture the most significant variance in the data.

5. Project the data onto the new feature space: Multiply the standardized data by the selected principal components to obtain the transformed dataset in the lower-dimensional space. These transformed features, known as the principal component scores, can be used as the extracted features representing the original data.

The relationship between PCA and feature extraction lies in the fact that PCA extracts new features (principal components) that are linear combinations of the original features. These principal components are chosen to capture the maximum variance in the data. By selecting a subset of the principal components, we can effectively reduce the dimensionality of the dataset while retaining the most important information.

Let's consider an example to illustrate the concept of using PCA for feature extraction. Suppose we have a dataset with five features: A, B, C, D, and E. We want to use PCA to extract the most relevant features from this dataset.

Original data:
A: [1, 2, 3, 4, 5]<br>
B: [5, 4, 3, 2, 1]<br>
C: [1, 1, 1, 1, 1]<br>
D: [0, 0, 0, 0, 0]<br>
E: [2, 2, 2, 2, 2]<br>

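1. Standardize the data: Subtract the mean and divide by the standard deviation of each feature. Features C, D, and E are constant, so their standard deviation is 0 and they cannot be standardized; since they carry no variance, they are dropped before PCA.

Standardized data:
A: [-1.41, -0.71, 0, 0.71, 1.41]<br>
B: [1.41, 0.71, 0, -0.71, -1.41]<br>

2. Compute the covariance matrix:

Covariance matrix (A, B):
[[ 1.0, -1.0],
 [-1.0,  1.0]]

3. Compute eigenvectors and eigenvalues:

Eigenvalues:
[2.0, 0]

Eigenvector for the eigenvalue 2.0 (up to sign):
[1/sqrt(2), -1/sqrt(2)]

4. Select the principal components: Only one eigenvalue is non-zero, so the first principal component explains all of the variance and is the only one retained.

5. Project the data onto the new feature space: Multiply the standardized data by the principal component.

Projected data:
[-2, -1, 0, 1, 2]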

The transformed dataset obtained after projecting the original data onto the principal components will serve as the extracted features, representing the most important information in the data. These extracted features can be used for subsequent analysis or machine learning tasks, potentially reducing the dimensionality of the dataset and improving computational efficiency.
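
A minimal NumPy sketch of this example is shown below. One detail needs care: features C, D, and E are constant, so their standard deviation is 0 and dividing by it is undefined. Dropping zero-variance features before PCA, as done here, is an assumption of this sketch; other conventions (such as only centering the data) are possible.

```python
import numpy as np

names = np.array(["A", "B", "C", "D", "E"])
# One row per sample, one column per feature (A..E)
X = np.array([
    [1, 5, 1, 0, 2],
    [2, 4, 1, 0, 2],
    [3, 3, 1, 0, 2],
    [4, 2, 1, 0, 2],
    [5, 1, 1, 0, 2],
], dtype=float)

# Constant features have zero variance and cannot be standardized; drop them
std = X.std(axis=0)
keep = std > 0
Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]

# Population covariance of the standardized features
cov = np.cov(Z, rowvar=False, bias=True)

# Eigen-decomposition, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of the total variance explained by each component
explained = eigvals / eigvals.sum()

# Extracted feature: scores on the first principal component
scores = Z @ eigvecs[:, :1]

print(list(names[keep]))
print(np.round(explained, 4))
print(np.round(scores.ravel(), 2))
```

A and B are perfectly negatively correlated, so a single extracted feature carries all of the variance of the five original features.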

**Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.**

To preprocess the data for building a recommendation system for a food delivery service, we can use Min-Max scaling on certain features such as price, rating, and delivery time. Here's how Min-Max scaling can be applied:

1. Identify the features: In this case, the features of interest are price, rating, and delivery time.

2. Determine the range: Decide on the desired range for the scaled values. It is common to scale features between 0 and 1 using Min-Max scaling.

3. Calculate the minimum and maximum values: Find the minimum and maximum values for each feature in the dataset. For example, for the price feature, find the minimum and maximum prices across all food items or restaurants in the dataset.

4. Apply Min-Max scaling formula: Use the Min-Max scaling formula to scale the values of each feature:

   x_scaled = (x - min(x)) / (max(x) - min(x))

   Here, x represents the original value of the feature, min(x) is the minimum value of the feature, and max(x) is the maximum value of the feature.

5. Perform Min-Max scaling: Apply the Min-Max scaling formula to each value of the feature in the dataset, ensuring that the scaling is performed independently for each feature.

The purpose of using Min-Max scaling in this context is to bring all the features within the same range (0 to 1) so that they contribute equally to the recommendation algorithm. By scaling the features, we prevent any single feature (e.g., price) from dominating the recommendation process solely based on its larger values. This normalization ensures that all features are on a comparable scale and have equal importance in the recommendation system.

For example, let's say we have a dataset of food items with the following features:

Price: [10, 20, 15, 25, 30]<br>
Rating: [3.5, 4.2, 3.9, 4.8, 4.0]<br>
Delivery Time: [30, 25, 20, 35, 40]

To use Min-Max scaling, we calculate the minimum and maximum values for each feature:

Price: min = 10, max = 30<br>
Rating: min = 3.5, max = 4.8<br>
Delivery Time: min = 20, max = 40

Then, we apply the Min-Max scaling formula to each feature:

Scaled Price: [(10 - 10) / (30 - 10), (20 - 10) / (30 - 10), (15 - 10) / (30 - 10), (25 - 10) / (30 - 10), (30 - 10) / (30 - 10)]<br>
Scaled Rating: [(3.5 - 3.5) / (4.8 - 3.5), (4.2 - 3.5) / (4.8 - 3.5), (3.9 - 3.5) / (4.8 - 3.5), (4.8 - 3.5) / (4.8 - 3.5), (4.0 - 3.5) / (4.8 - 3.5)]<br>
Scaled Delivery Time: [(30 - 20) / (40 - 20), (25 - 20) / (40 - 20), (20 - 20) / (40 - 20), (35 - 20) / (40 - 20), (40 - 20) / (40 - 20)]

Simplified:
Price: [0, 0.5, 0.25, 0.75, 1]<br>
Rating: [0, 0.538, 0.308, 1, 0.385]<br>
Delivery Time: [0.5, 0.25, 0, 0.75, 1]

The resulting scaled values are in the range of 0 to 1, representing the normalized values of each feature. These scaled features can then be used as inputs for building the recommendation system, ensuring that they are on a consistent and comparable scale.
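
The per-feature scaling described above can be sketched with NumPy, where each column is scaled independently using its own minimum and maximum. The array layout (one row per food item, columns for price, rating, and delivery time) is an assumption made for this illustration.

```python
import numpy as np

# One row per food item: [price, rating, delivery_time]
features = np.array([
    [10, 3.5, 30],
    [20, 4.2, 25],
    [15, 3.9, 20],
    [25, 4.8, 35],
    [30, 4.0, 40],
])

# Column-wise minimum and maximum
col_min = features.min(axis=0)
col_max = features.max(axis=0)

# Broadcasting applies the formula to every column independently
scaled = (features - col_min) / (col_max - col_min)

print(np.round(scaled, 3))
```

In a real pipeline, `col_min` and `col_max` would be computed on the training data and stored, so that new items are scaled with the same values.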

**Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.**

When working on a project to predict stock prices with a dataset containing numerous features, PCA (Principal Component Analysis) can be utilized to reduce the dimensionality of the dataset. Reducing the dimensionality can be beneficial in stock price prediction as it helps to mitigate the curse of dimensionality, enhance computational efficiency, and identify the most informative features.

Here's an overview of how PCA can be used to reduce the dimensionality of the dataset in the context of stock price prediction:

1. Identify the features: Identify the features in the dataset that include company financial data and market trends. For example, features could include stock volume, earnings per share, price-to-earnings ratio, interest rates, market indices, etc.

2. Standardize the data: It is essential to standardize the data by subtracting the mean and dividing by the standard deviation of each feature. This step ensures that all features are on the same scale and prevents dominance by high-variance features.

3. Compute the covariance matrix: Calculate the covariance matrix of the standardized data. The covariance matrix represents the relationships and dependencies between different features, providing insights into their pairwise variations.

4. Compute eigenvectors and eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the principal components, while eigenvalues indicate the amount of variance explained by each principal component.

5. Select the principal components: Sort the eigenvalues in descending order and choose the top-k eigenvectors corresponding to the largest eigenvalues. These selected eigenvectors are the principal components that capture the most significant variance in the data.

6. Project the data onto the new feature space: Multiply the standardized data by the selected principal components to obtain the transformed dataset in the lower-dimensional space. These transformed features, known as the principal component scores, can be used as the reduced-dimensional representation of the original data.

By performing PCA, the high-dimensional dataset is transformed into a lower-dimensional space, where each principal component represents a linear combination of the original features. The selected principal components are those that capture the most variance in the dataset, providing a compressed representation of the data while retaining the most important information.

The reduced-dimensional dataset obtained through PCA can then be used as input for building a stock price prediction model. The dimensionality reduction helps to focus on the most influential features, reduce noise, and improve model efficiency and performance.
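
The workflow above might look as follows in NumPy. Everything about the data here is a placeholder: the feature matrix is synthetic (random values driven by three hidden factors), and the 95% variance threshold and the 150/50 chronological split are assumptions chosen for illustration. The sketch fits the scaling and the components on the earlier period only and then applies them to the later period, which avoids leaking future information into the transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data: 200 days x 10 features driven by 3 hidden factors
n_days, n_features, n_factors = 200, 10, 3
factors = rng.normal(size=(n_days, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_days, n_features))

# Chronological split: fit on the earlier period, apply to the later one
split = 150
X_train, X_test = X[:split], X[split:]

# Standardize using training statistics only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train = (X_train - mean) / std
Z_test = (X_test - mean) / std

# PCA via SVD of the standardized training data;
# squared singular values are proportional to the covariance eigenvalues
_, singular_values, Vt = np.linalg.svd(Z_train, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
cumulative = np.cumsum(explained_ratio)

# Smallest k whose cumulative explained variance reaches the threshold
threshold = 0.95
k = int(np.searchsorted(cumulative, threshold) + 1)

# Project both periods onto the first k principal components
components = Vt[:k].T
train_scores = Z_train @ components
test_scores = Z_test @ components

print(k, train_scores.shape, test_scores.shape)
```

The resulting `train_scores` and `test_scores` would replace the original features as model inputs.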

**Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.**

To perform Min-Max scaling on the dataset [1, 5, 10, 15, 20] and transform the values to a range of -1 to 1, we can follow these steps:

1. **Find the minimum and maximum values**: Determine the minimum and maximum values in the dataset. In this case, the minimum value is 1, and the maximum value is 20.

2. **Apply the Min-Max scaling formula**: Apply the Min-Max scaling formula to each value in the dataset using the minimum and maximum values:

   scaled_value = (original_value - minimum_value) / (maximum_value - minimum_value)

3. **Calculate scaled values**: Calculate the scaled values for each element in the dataset using the Min-Max scaling formula:

   For 1:  scaled_value = (1 - 1) / (20 - 1) = 0 / 19 = 0<br>
   For 5:  scaled_value = (5 - 1) / (20 - 1) = 4 / 19 ≈ 0.2105<br>
   For 10: scaled_value = (10 - 1) / (20 - 1) = 9 / 19 ≈ 0.4737<br>
   For 15: scaled_value = (15 - 1) / (20 - 1) = 14 / 19 ≈ 0.7368<br>
   For 20: scaled_value = (20 - 1) / (20 - 1) = 19 / 19 = 1<br>

4. **Transform scaled values to the range of -1 to 1**: The scaled values obtained in the previous step range from 0 to 1. To transform them to the range of -1 to 1, apply the following formula:

   scaled_value_in_range = (scaled_value * 2) - 1

5. **Calculate transformed values**: Calculate the transformed values for each scaled value using the formula mentioned above:

   For 0:        transformed_value = (0 * 2) - 1 = -1<br>
   For 0.2105:   transformed_value = (0.2105 * 2) - 1 ≈ -0.5789<br>
   For 0.4737:   transformed_value = (0.4737 * 2) - 1 ≈ -0.0526<br>
   For 0.7368:   transformed_value = (0.7368 * 2) - 1 ≈ 0.4737<br>
   For 1:        transformed_value = (1 * 2) - 1 = 1

The transformed values for the dataset [1, 5, 10, 15, 20] after Min-Max scaling to a range of -1 to 1 are approximately [-1, -0.5789, -0.0526, 0.4737, 1].

```python
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))

# Reshape the dataset to a 2D array as required by scikit-learn
data = [[1], [5], [10], [15], [20]]
scaled_data = scaler.fit_transform(data)

for scaled_value in scaled_data:
    print(scaled_value[0])
```

```
-0.9999999999999999
-0.5789473684210525
-0.05263157894736836
0.47368421052631593
1.0
```
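
The two manual steps (scaling to 0..1, then stretching to -1..1) can also be combined into a single formula, x_scaled = new_min + (x - min) * (new_max - new_min) / (max - min). The following plain-Python sketch illustrates this; the function name `min_max_scale_to_range` and its parameter names are hypothetical.

```python
def min_max_scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Min-Max scale a list of numbers directly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    # Assumes the values are not all identical (hi != lo)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


scaled = min_max_scale_to_range([1, 5, 10, 15, 20])
print([round(v, 4) for v in scaled])
# [-1.0, -0.5789, -0.0526, 0.4737, 1.0]
```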


**Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?**

To perform feature extraction using PCA on a dataset with features [height, weight, age, gender, blood pressure], we need to follow these steps:

1. Standardize the data: Subtract the mean and divide by the standard deviation of each feature to ensure all features are on the same scale.

2. Compute the covariance matrix: Calculate the covariance matrix of the standardized data to understand the relationships between different features.

3. Compute eigenvectors and eigenvalues: Find the eigenvectors and eigenvalues of the covariance matrix. Eigenvectors represent the principal components, and eigenvalues indicate the amount of variance explained by each principal component.

4. Select the principal components: Sort the eigenvalues in descending order and choose the top-k eigenvectors corresponding to the largest eigenvalues. These eigenvectors are the principal components that capture the most significant variance in the data.

The number of principal components to retain depends on the desired level of variance explained and the trade-off between dimensionality reduction and information loss. In practice, a common approach is to consider the cumulative explained variance.

To determine the number of principal components to retain, we can analyze the explained variance ratio, which represents the proportion of the total variance explained by each principal component.

Let's assume that after performing PCA on the given dataset, we obtain the following eigenvalues and their explained variance ratio:

Eigenvalues: [3.5, 2.8, 1.9, 1.2, 0.6]<br>
Explained Variance Ratio: [0.35, 0.28, 0.19, 0.12, 0.06]

To decide the number of principal components to retain, we can calculate the cumulative explained variance ratio by summing the explained variance ratios from the first principal component onwards:

Cumulative Explained Variance Ratio: [0.35, 0.63, 0.82, 0.94, 1.00]

In this case, the cumulative explained variance ratio reaches 1.00 after considering all five principal components. It indicates that all the variance in the dataset can be explained by these five components.

To determine how many principal components to retain, we typically choose a threshold for the cumulative explained variance ratio and keep the minimum number of principal components that collectively reach it. With the values above, a threshold of 0.95 would require all five components, because the first four explain only 94% of the variance, so no dimensionality reduction would be achieved. A threshold of 0.90 would retain four components (94%), and a threshold of 0.80 would retain three (82%). For this hypothetical dataset, retaining the first three principal components is a reasonable balance between dimensionality reduction and retaining significant variance, at the cost of discarding 18% of the variance.

However, the final decision on the number of principal components to retain also depends on the specific requirements, domain knowledge, and the performance of the subsequent analysis or model using the reduced feature set.
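
The threshold rule can be sketched in plain Python using the assumed eigenvalues from this answer. The function name `components_needed` is hypothetical, and the eigenvalues are the illustrative ones above rather than results computed from real data.

```python
# Assumed eigenvalues from the example (illustrative, not computed from data)
eigenvalues = [3.5, 2.8, 1.9, 1.2, 0.6]

total = sum(eigenvalues)
ratios = [ev / total for ev in eigenvalues]


def components_needed(ratios, threshold):
    """Return the smallest k whose cumulative explained variance reaches threshold."""
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        # Small tolerance guards against floating-point rounding
        if cumulative >= threshold - 1e-12:
            return k
    return len(ratios)


for threshold in (0.80, 0.90, 0.95):
    print(threshold, components_needed(ratios, threshold))
# 0.8 3
# 0.9 4
# 0.95 5
```

With these values, a 95% threshold keeps all five components, while an 80% threshold keeps three.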

------------------------