# Qo 01

### What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling, also known as normalization, is a popular data preprocessing technique used to transform numerical features into a common range. It rescales the values of a feature to fit within a specified range, typically between 0 and 1.

The formula for Min-Max scaling is as follows:

scaled_value = (x - min_value) / (max_value - min_value)

where x is the original value of the feature, min_value is the minimum value of the feature in the dataset, and max_value is the maximum value of the feature in the dataset.

Min-Max scaling is particularly useful when the range of values in different features varies significantly. Bringing all features into the same range keeps any one feature from dominating the learning algorithm merely because its values are numerically larger, which matters most for methods that rely on distances or gradients.

Here's an example to illustrate the application of Min-Max scaling:

Suppose we have a dataset with a feature called "Age" that ranges from 20 to 60. We want to scale this feature using Min-Max scaling.

Original Age values: [20, 25, 30, 35, 40, 45, 50, 55, 60]

To apply Min-Max scaling, we need to calculate the minimum and maximum values of the Age feature:

min_value = 20
max_value = 60

Next, we use the formula to scale each value within the range of 0 to 1:

scaled_value = (x - min_value) / (max_value - min_value)

Scaled Age values: [0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]

As a result, the Age values are mapped onto the range 0 to 1: the minimum (20) becomes 0, the maximum (60) becomes 1, and the relative spacing between values is preserved because the transformation is linear.
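The calculation above can be reproduced in a few lines of Python. This is a minimal sketch rather than a library implementation: the function name `min_max_scale` and its `new_min`/`new_max` parameters are illustrative choices, and it assumes the feature has at least two distinct values so the denominator is non-zero.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # (max - min) would be zero, so the formula is undefined.
        raise ValueError("Min-Max scaling is undefined for a constant feature")
    span = hi - lo
    return [new_min + (x - lo) * (new_max - new_min) / span for x in values]


ages = [20, 25, 30, 35, 40, 45, 50, 55, 60]
scaled_ages = min_max_scale(ages)
print(scaled_ages)
```

Passing different `new_min` and `new_max` values maps the same data onto another target range.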

Min-Max scaling can be applied to multiple features in a dataset to normalize the entire dataset. It is commonly used in machine learning algorithms, such as support vector machines (SVMs) and artificial neural networks, to improve their performance and convergence.

# Qo 02

### What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

The Unit Vector technique, also known as normalization or vector normalization, is another data preprocessing technique used to scale numerical features. Unlike Min-Max scaling, which scales the values to a specific range, Unit Vector scaling transforms the values to have a magnitude of 1 while preserving their direction.

The formula for Unit Vector scaling is as follows:

scaled_value = x / ||x||

where x is the original feature vector, ||x|| is its Euclidean norm (magnitude), and each element of x is divided by that norm.

Unit Vector scaling is useful when the direction of the feature vectors is more important than their actual values. It ensures that all feature vectors have equal length, which can be beneficial in certain machine learning algorithms that rely on distances or similarity measures between vectors.

Here's an example to illustrate the application of the Unit Vector technique:

Suppose we have a dataset with two numerical features, "Height" and "Weight," and we want to apply Unit Vector scaling.

Original Height values: [150, 160, 170, 180]
Original Weight values: [50, 60, 70, 80]

To apply Unit Vector scaling, we need to calculate the Euclidean norm or magnitude of each feature vector:

||Height|| = sqrt(150^2 + 160^2 + 170^2 + 180^2)
           = sqrt(22500 + 25600 + 28900 + 32400)
           = sqrt(109400)
           ≈ 330.76

||Weight|| = sqrt(50^2 + 60^2 + 70^2 + 80^2)
           = sqrt(2500 + 3600 + 4900 + 6400)
           = sqrt(17400)
           ≈ 131.91

Next, we divide each feature value by its respective Euclidean norm:

scaled_height = [150/330.76, 160/330.76, 170/330.76, 180/330.76]
              ≈ [0.454, 0.484, 0.514, 0.544]

scaled_weight = [50/131.91, 60/131.91, 70/131.91, 80/131.91]
              ≈ [0.379, 0.455, 0.531, 0.606]

As a result, the Height and Weight values are scaled such that their vectors have a magnitude of 1. This scaling preserves the direction of the feature vectors while eliminating the influence of their original magnitudes.
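A short Python sketch of the same computation follows. The helper name `to_unit_vector` is an illustrative choice, not a library function. It normalizes each feature column, as in the example; the same function can be applied unchanged to a single sample's feature vector, which is how unit-vector scaling is often used in practice.

```python
import math


def to_unit_vector(vector):
    """Divide a vector by its Euclidean (L2) norm so its length becomes 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("The zero vector has no direction and cannot be normalized")
    return [x / norm for x in vector]


height = [150, 160, 170, 180]
weight = [50, 60, 70, 80]

scaled_height = to_unit_vector(height)
scaled_weight = to_unit_vector(weight)

# Each scaled vector has length 1 (up to floating-point rounding).
print(math.sqrt(sum(x * x for x in scaled_height)))
print(math.sqrt(sum(x * x for x in scaled_weight)))
```

Because every element is divided by the same positive number, the ratios between elements, and therefore the direction of the vector, are unchanged.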

Unit Vector scaling is commonly used in text classification, recommendation systems, and clustering algorithms. It allows for a meaningful comparison of vectors based on their directions and helps in cases where the absolute magnitudes of the features are not as important as their relative orientations.

# Qo 03

### What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

PCA, which stands for Principal Component Analysis, is a statistical technique used for dimensionality reduction. It finds the directions of greatest variance in a dataset, known as principal components, and transforms the data into a new coordinate system based on these directions. Each principal component is a linear combination of the original variables rather than one of the original variables itself. PCA is commonly used to simplify complex datasets with a high number of variables into a smaller set of variables while retaining most of the information.

Here's an overview of how PCA works:

1. Standardize the data: Before applying PCA, it is essential to standardize the data by subtracting the mean and dividing by the standard deviation. This step ensures that all variables have the same scale and prevents any single variable from dominating the analysis.

2. Calculate the covariance matrix: The covariance matrix is computed based on the standardized data. It represents the relationships between variables, showing how they vary together.

3. Compute the eigenvectors and eigenvalues: The eigenvectors and eigenvalues of the covariance matrix are calculated. Eigenvectors represent the principal components, and eigenvalues indicate the amount of variance explained by each component. The eigenvectors are sorted in descending order of their corresponding eigenvalues.

4. Select the number of principal components: Based on the eigenvalues, the number of principal components to retain is determined. Typically, components with high eigenvalues are chosen to capture the most significant variation in the data.

5. Transform the data: The original data is transformed into the new coordinate system defined by the selected principal components. This transformation yields a set of new variables, known as the principal component scores, which are linear combinations of the original variables.
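The five steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `pca` and the toy matrix `X` are made up for the example, the sample standard deviation (`ddof=1`) is used for standardization, and every column is assumed to have non-zero variance.

```python
import numpy as np


def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    # 1. Standardize each column (sample standard deviation, ddof=1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data (features x features).
    cov = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition; eigh suits symmetric matrices and returns
    #    eigenvalues in ascending order, so reorder them to descending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 4. Keep the leading components (columns of the eigenvector matrix).
    components = eigenvectors[:, :n_components]
    # 5. Principal component scores: standardized data times the components.
    scores = Z @ components
    return scores, eigenvalues


# Hypothetical toy data: 6 samples, 3 features (values made up for illustration).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 3.1],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.5],
    [3.1, 3.0, 0.4],
    [2.3, 2.7, 1.1],
])
scores, eigenvalues = pca(X, n_components=2)
print(scores.shape)                     # (6, 2)
print(eigenvalues / eigenvalues.sum())  # share of variance per component
```

The sign of each eigenvector is arbitrary, so the scores may come out negated on a different platform or library version without changing their meaning.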

Here's an example to illustrate the application of PCA:

Suppose we have a dataset with three variables: "Height," "Weight," and "Age." We want to apply PCA to reduce the dimensionality of the dataset.

Original data:
Height: [150, 160, 170, 180]
Weight: [50, 60, 70, 80]
Age: [25, 30, 35, 40]

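1. Standardize the data: Subtract the mean and divide by the (sample) standard deviation for each variable. In this toy dataset Weight and Age are exact linear functions of Height, so all three standardized columns are identical:

Standardized values (each variable): [-1.162, -0.387, 0.387, 1.162]

2. Calculate the covariance matrix of the standardized data (this equals the correlation matrix):

          Height   Weight   Age
Height   1.00     1.00     1.00
Weight   1.00     1.00     1.00
Age      1.00     1.00     1.00

3. Compute the eigenvectors and eigenvalues:

Eigenvalues:
[3.0, 0.0, 0.0]

Eigenvectors (one per row, in the same order as the eigenvalues; the two belonging to the zero eigenvalues are not unique):
[0.577, 0.577, 0.577]
[0.707, -0.707, 0.0]
[0.408, 0.408, -0.816]

4. Select the number of principal components: In this case, we retain one principal component, since it is the only one with a non-zero eigenvalue and it explains all of the variance.

5. Transform the data: Multiply the standardized data by the selected eigenvector to obtain the principal component scores.

Transformed data (using the standardized values):
PC1: [0.577*Height + 0.577*Weight + 0.577*Age]
   = [-2.012, -0.671, 0.671, 2.012]

The transformed data now consists of a single principal component, PC1, which is a linear combination of the original variables. It captures all of the variance here because the three variables are perfectly correlated. In real data the correlations are weaker, and more than one component is usually needed to retain most of the variance.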

PCA is widely used in various fields, including data analysis, pattern recognition, image processing, and feature extraction. It helps to uncover underlying patterns, reduce data complexity, and improve computational efficiency in machine learning tasks.

# Qo 04

### What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

PCA and feature extraction are closely related concepts. In fact, PCA is a commonly used technique for feature extraction. Feature extraction refers to the process of transforming the original set of features into a new set of features that capture the most relevant information in the data while reducing its dimensionality.

PCA achieves feature extraction by identifying the principal components of the data, which are linear combinations of the original features. These principal components are ordered based on the amount of variance they explain in the data. By selecting a subset of the principal components, PCA allows for the creation of a reduced-dimensional representation of the data that retains most of its essential information.

Here's an example to illustrate how PCA can be used for feature extraction:

Suppose we have a dataset with four numerical features: "Height," "Weight," "Age," and "Income." We want to extract the most informative features using PCA.

Original data:
Height: [150, 160, 170, 180]
Weight: [50, 60, 70, 80]
Age: [25, 30, 35, 40]
Income: [25000, 35000, 45000, 55000]

1. Standardize the data: Subtract the mean and divide by the (sample) standard deviation for each feature. In this toy dataset every feature is an exact linear function of Height, so all four standardized columns are identical:

Standardized values (each feature): [-1.162, -0.387, 0.387, 1.162]

2. Calculate the covariance matrix of the standardized data (this equals the correlation matrix):

          Height   Weight   Age      Income
Height   1.00     1.00     1.00     1.00
Weight   1.00     1.00     1.00     1.00
Age      1.00     1.00     1.00     1.00
Income   1.00     1.00     1.00     1.00

3. Compute the eigenvectors and eigenvalues:

Eigenvalues:
[4.0, 0.0, 0.0, 0.0]

Leading eigenvector:
[0.5, 0.5, 0.5, 0.5]

The remaining three eigenvectors belong to the zero eigenvalues; they are not unique and carry no variance.

4. Select the number of principal components: Based on the eigenvalues, we retain only the first principal component, since it has the only non-zero eigenvalue and explains all of the variance.

5. Transform the data: Multiply the standardized data by the selected eigenvector to obtain the principal component scores.

Transformed data (using the standardized values):
PC1: [0.5*Height + 0.5*Weight + 0.5*Age + 0.5*Income]
   = [-2.324, -0.775, 0.775, 2.324]

The transformed data consists of one principal component, PC1, which is a linear combination of the original features. Because the four features are perfectly correlated, this single component captures all of the variance in the original data and provides a reduced-dimensional representation.

In this example, PCA was used for feature extraction by replacing four redundant features with a single new feature (PC1) that captures the underlying pattern in the data. With real data the features are rarely perfectly correlated, so several components are typically retained. The retained components can be used as new features in subsequent analysis or modeling tasks, reducing the dimensionality of the data while preserving its essential characteristics.

Feature extraction with PCA can be particularly useful in scenarios where the original feature space is high-dimensional, noisy, or redundant. It helps to reduce the computational complexity of subsequent tasks and can improve the performance of machine learning algorithms, although each extracted feature is a mixture of the original features and can be harder to interpret directly.
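As a numerical check on the four columns listed in this example, the sketch below standardizes them, eigen-decomposes the covariance matrix, and prints the share of variance carried by each component together with the scores of the first component. It assumes NumPy is available and uses the sample standard deviation (`ddof=1`); the variable names are illustrative.

```python
import numpy as np

# The four columns from the example, one row per person:
# Height, Weight, Age, Income.
X = np.array([
    [150, 50, 25, 25000],
    [160, 60, 30, 35000],
    [170, 70, 35, 45000],
    [180, 80, 40, 55000],
], dtype=float)

# Standardize, then eigen-decompose the covariance of the standardized data.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained_ratio = eigenvalues / eigenvalues.sum()
pc1_scores = Z @ eigenvectors[:, 0]  # the single extracted feature

print(np.round(explained_ratio, 3))
print(np.round(pc1_scores, 3))
```

Every column here is an exact linear function of Height, so the first component carries essentially all of the variance. The sign of the scores is arbitrary and may be flipped.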

# Qo 05

### You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

To preprocess the data for building a recommendation system for a food delivery service, you can use Min-Max scaling to normalize the features such as price, rating, and delivery time. Here's how you can apply Min-Max scaling:

1. Understand the range of each feature: Examine the minimum and maximum values of each feature (price, rating, and delivery time) in the dataset.

2. Apply Min-Max scaling: For each feature, use the Min-Max scaling formula to transform the values into a common range between 0 and 1:

   scaled_value = (x - min_value) / (max_value - min_value)

   where x represents the original value of the feature, min_value is the minimum value of the feature in the dataset, and max_value is the maximum value of the feature in the dataset.

3. Normalize each feature: Apply the Min-Max scaling formula to normalize the values of each feature individually. This ensures that all the features are on the same scale and prevents any particular feature from dominating the recommendation algorithm due to its larger value range.

4. Update the dataset: Replace the original values of each feature with their corresponding scaled values after applying Min-Max scaling.

By using Min-Max scaling, you will transform the features such as price, rating, and delivery time into a common range between 0 and 1. This scaling ensures that all the features have equal importance during the recommendation process, regardless of their original value ranges.

For example, let's say you have the following values for each feature in the dataset:

Price: [10, 20, 30, 40]
Rating: [2.5, 3.7, 4.2, 4.8]
Delivery Time: [20, 30, 40, 50]

To apply Min-Max scaling, you calculate the minimum and maximum values for each feature:

Price: min_value = 10, max_value = 40
Rating: min_value = 2.5, max_value = 4.8
Delivery Time: min_value = 20, max_value = 50

Then, you use the Min-Max scaling formula to normalize the values of each feature:

Scaled Price: [(10-10)/(40-10), (20-10)/(40-10), (30-10)/(40-10), (40-10)/(40-10)]
            = [0, 0.333, 0.667, 1]

Scaled Rating: [(2.5-2.5)/(4.8-2.5), (3.7-2.5)/(4.8-2.5), (4.2-2.5)/(4.8-2.5), (4.8-2.5)/(4.8-2.5)]
             = [0, 0.522, 0.739, 1]

Scaled Delivery Time: [(20-20)/(50-20), (30-20)/(50-20), (40-20)/(50-20), (50-20)/(50-20)]
                    = [0, 0.333, 0.667, 1]

After applying Min-Max scaling, the dataset will have the normalized values for each feature, which can be used for building the recommendation system. The scaled values ensure that all the features have equal importance, and the algorithm can effectively consider their relative magnitudes when making recommendations.
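The per-feature scaling described above can be sketched in plain Python. The dictionary layout and the helper name `min_max_scale` are illustrative assumptions; a real project would more likely hold the data in a DataFrame.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


# Feature columns from the example.
features = {
    "price": [10, 20, 30, 40],
    "rating": [2.5, 3.7, 4.2, 4.8],
    "delivery_time": [20, 30, 40, 50],
}

# Scale each feature independently, using its own minimum and maximum.
scaled = {name: min_max_scale(column) for name, column in features.items()}

for name, column in scaled.items():
    print(name, [round(v, 3) for v in column])
```

When new restaurants or orders arrive later, a common practice is to reuse the minimum and maximum computed from the original data rather than recomputing them, so that old and new values stay on the same scale.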

# Qo 06

### You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

When building a model to predict stock prices with a dataset containing numerous features, PCA can be employed to reduce the dimensionality and extract the most important information. Here's how you can use PCA for dimensionality reduction in this context:

1. Data preparation: Preprocess the dataset by standardizing the features. Subtract the mean and divide by the standard deviation for each feature to ensure they are on a similar scale.

2. Covariance matrix: Calculate the covariance matrix using the standardized dataset. The covariance matrix represents the relationships between the different features, indicating how they vary together.

3. Eigenvalue decomposition: Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvectors and eigenvalues. Eigenvectors represent the principal components, and eigenvalues denote the amount of variance explained by each component.

4. Sorting eigenvalues: Sort the eigenvalues in descending order. This ranking reflects the significance of each principal component in capturing the variance in the dataset.

5. Selecting principal components: Determine the number of principal components to retain based on the cumulative explained variance. A common approach is to choose the number of components that capture a significant portion of the total variance, such as 80% or 90%.

6. Projection: Project the original dataset onto the selected principal components to obtain the reduced-dimensional representation. Multiply the standardized dataset by the eigenvectors corresponding to the retained principal components.

By following these steps, you can reduce the dimensionality of the dataset while preserving most of the information. The retained principal components capture the underlying patterns and variability in the data, allowing you to work with a smaller set of features in the subsequent modeling process.

Dimensionality reduction with PCA is particularly useful when dealing with datasets that have a large number of features, as it helps to eliminate redundant or less informative variation. By focusing on the principal components that explain the most variance, you can simplify the dataset and potentially improve the model's performance and reduce overfitting. The trade-off is that each component is a mixture of the original features, so the reduced features can be harder to interpret than the originals.

After applying PCA and obtaining the reduced-dimensional representation of the dataset, you can proceed to train a prediction model using the transformed features.
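The selection rule in step 5 can be written as a small helper. This is a sketch under stated assumptions: the function name `components_for_variance` and the eigenvalues in the usage example are hypothetical, chosen only to show how the threshold works.

```python
def components_for_variance(eigenvalues, threshold=0.90):
    """Return the smallest k whose top-k eigenvalues reach the variance threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical eigenvalues for a dataset with six standardized features.
example_eigenvalues = [3.0, 1.5, 0.75, 0.5, 0.125, 0.125]
print(components_for_variance(example_eigenvalues, threshold=0.80))  # 3
print(components_for_variance(example_eigenvalues, threshold=0.90))  # 4
```

With these made-up eigenvalues, the first three components explain 87.5% of the variance and the first four explain about 95.8%, which is why the two thresholds give different answers.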

# Qo 07

### For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

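To map values to a range [a, b] instead of [0, 1], the formula becomes a + (x - min) * (b - a) / (max - min). With a = -1 and b = 1 this is -1 + 2 * (x - min) / (max - min).

In [4]:
import pandas as pd
df = pd.DataFrame([1, 5, 10, 15, 20], columns=["values"])

In [10]:
lo, hi = df["values"].min(), df["values"].max()
-1 + 2 * (df["values"] - lo) / (hi - lo)

0   -1.000000
1   -0.578947
2   -0.052632
3    0.473684
4    1.000000
Name: values, dtype: float64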

# Qo 08

### For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

To perform feature extraction using PCA on the given dataset with features [height, weight, age, gender, blood pressure], we can follow these steps:

1. Preprocess the data: Standardize the numerical features (height, weight, age, blood pressure) by subtracting the mean and dividing by the standard deviation. Categorical features like gender may need to be encoded as numeric values.

2. Compute the covariance matrix: Calculate the covariance matrix based on the standardized data. The covariance matrix represents the relationships between the features and provides insights into their variations.

3. Compute eigenvectors and eigenvalues: Perform eigenvalue decomposition on the covariance matrix to obtain the eigenvectors and eigenvalues. The eigenvectors represent the principal components, and the eigenvalues indicate the amount of variance explained by each principal component.

4. Sort eigenvalues: Sort the eigenvalues in descending order. This step helps determine the most significant principal components that explain the most variance in the data.

5. Retain principal components: Decide on the number of principal components to retain based on the cumulative explained variance. Retaining a subset of the principal components that explain a significant portion of the total variance allows for dimensionality reduction while retaining most of the information.

The number of principal components to retain can be determined by considering the cumulative explained variance. This is the running sum of each component's share of the total variance, with components taken in descending order of eigenvalue. It shows how much of the variance is retained when the first k principal components are kept.

To determine the number of principal components to retain, you can examine the cumulative explained variance plot and choose a threshold. A commonly used threshold is to retain principal components that explain a cumulative variance of 80% or 90%.

The decision on the number of principal components to retain may also depend on the specific requirements of the problem and the trade-off between dimensionality reduction and information retention.

Without the actual data or specific details about the dataset, it is challenging to provide an exact number of principal components to retain. However, you can follow the steps outlined above, compute the cumulative explained variance, and choose the number of principal components that retain a satisfactory amount of variance based on the specific problem and constraints.