**Q1.** What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

**Answer**: 
Min-Max scaling, often called normalization, is a data preprocessing technique that linearly rescales the values of a numeric feature into a fixed range, typically 0 to 1. Its purpose is to put features on a comparable scale so that a feature with large raw values does not dominate scale-sensitive methods (for example, distance-based algorithms) simply because of its units.

The formula to perform Min-Max scaling on a feature is as follows:

scaled_value = (value - min_value) / (max_value - min_value)

where:

"value" is the original value of a data point.

"min_value" is the minimum value of the feature in the dataset.

"max_value" is the maximum value of the feature in the dataset.

Let's consider an example to illustrate the application of Min-Max scaling:

Suppose we have a dataset containing the heights of individuals in centimeters. The original heights range from 150 cm to 190 cm. We want to apply Min-Max scaling to transform these heights to a range between 0 and 1.

Original heights:

Person A: 160 cm

Person B: 180 cm

Person C: 150 cm

Person D: 190 cm

To scale these values using Min-Max scaling, we calculate the minimum and maximum values:

min_value = 150 cm

max_value = 190 cm

Now, we apply the Min-Max scaling formula to each height:

Scaled heights:

Person A: (160 - 150) / (190 - 150) = 0.25

Person B: (180 - 150) / (190 - 150) = 0.75

Person C: (150 - 150) / (190 - 150) = 0.00

Person D: (190 - 150) / (190 - 150) = 1.00

After applying Min-Max scaling, the heights lie between 0 and 1, with the shortest person at 0 and the tallest at 1. The heights are now on the same scale as any other features in the dataset that have been scaled the same way.
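
The calculation above can be reproduced in a few lines of NumPy. This is a minimal sketch rather than a library implementation; the helper name `min_max_scale` and its handling of a constant feature are assumptions made for illustration.

```python
import numpy as np

# Heights from the example above, in centimeters (Persons A, B, C, D)
heights = np.array([160.0, 180.0, 150.0, 190.0])


def min_max_scale(values):
    """Rescale a 1-D array to [0, 1] using its own minimum and maximum."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant feature has no spread; return zeros to avoid dividing by zero
        return np.zeros_like(values, dtype=float)
    return (values - lo) / (hi - lo)


scaled_heights = min_max_scale(heights)
print(scaled_heights)  # values: 0.25, 0.75, 0.0, 1.0
```

The result matches the hand calculation for Persons A through D.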

**Q2.** What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

**Answer**:
The Unit Vector technique, also known as vector normalization, is a data preprocessing method that rescales a feature vector to have unit norm. Each vector (typically one data point, that is, one row of the dataset) is divided by its magnitude, resulting in a vector with a length of 1. The purpose is to remove the influence of a vector's overall magnitude, so that analyses based on it depend only on its direction.

The formula to perform Unit Vector scaling on a feature is as follows:

scaled_vector = vector / ||vector||

where:

"vector" represents the original feature vector.

"||vector||" denotes the magnitude or Euclidean norm of the vector, which is calculated as the square root of the sum of squared elements.

Let's consider an example to illustrate the application of Unit Vector scaling:

Suppose we have a dataset containing the following feature vectors:

Vector A: [2, 4, 6]

Vector B: [1, 3, 5]

Vector C: [3, 6, 9]

To scale these vectors using the Unit Vector technique, we calculate their magnitudes:

||Vector A|| = sqrt(2^2 + 4^2 + 6^2) = sqrt(56) = 7.48

||Vector B|| = sqrt(1^2 + 3^2 + 5^2) = sqrt(35) = 5.92

||Vector C|| = sqrt(3^2 + 6^2 + 9^2) = sqrt(126) = 11.22

Now, we apply the Unit Vector scaling formula to each vector:

Scaled vectors:

Vector A: [2/7.48, 4/7.48, 6/7.48] = [0.27, 0.53, 0.80]

Vector B: [1/5.92, 3/5.92, 5/5.92] = [0.17, 0.51, 0.85]

Vector C: [3/11.22, 6/11.22, 9/11.22] = [0.27, 0.53, 0.80]

After applying Unit Vector scaling, all the feature vectors have a magnitude of 1, so they are now unit vectors. The direction of each vector is preserved while its magnitude is discarded: Vector A and Vector C point in the same direction (C is 1.5 times A), so they map to the same unit vector. Unlike Min-Max scaling, which rescales each feature using that feature's minimum and maximum across the dataset, Unit Vector scaling rescales each vector using only its own norm, without reference to the range of values in the dataset.
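
As a rough sketch, the same normalization can be written with NumPy's `np.linalg.norm`, treating each row as one vector. It assumes no row is the zero vector, since that would mean dividing by zero.

```python
import numpy as np

vectors = np.array([
    [2.0, 4.0, 6.0],   # Vector A
    [1.0, 3.0, 5.0],   # Vector B
    [3.0, 6.0, 9.0],   # Vector C
])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts row-wise
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
unit_vectors = vectors / norms

print(np.round(norms.ravel(), 2))   # magnitudes of A, B, C
print(np.round(unit_vectors, 2))    # one unit vector per row
```

Every row of `unit_vectors` has length 1, and the rows for A and C coincide because the two vectors are parallel.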

**Q3**. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

**Answer**:
PCA (Principal Component Analysis) is a widely used technique in data analysis and dimensionality reduction. It is used to transform a dataset with a high-dimensional feature space into a lower-dimensional space while preserving the most important patterns or variations in the data. PCA achieves this by identifying the principal components, which are orthogonal linear combinations of the original features.

The steps involved in PCA are as follows:

**(I) Standardize the data:** If the features in the dataset have different scales, it is essential to standardize them to have zero mean and unit variance. This step ensures that all the features contribute equally to the PCA.

**(II) Compute the covariance matrix:** The covariance matrix is computed based on the standardized features. It represents the relationships and dependencies between the features in the dataset.

**(III) Perform eigen decomposition**: The eigen decomposition is performed on the covariance matrix to obtain the eigenvalues and eigenvectors. The eigenvalues represent the amount of variance explained by each eigenvector, and the eigenvectors represent the directions or axes of maximum variance in the data.

**(IV) Select the principal components**: The eigenvectors associated with the highest eigenvalues are selected as the principal components. These components capture the most significant patterns or variations in the data.

**(V) Project the data onto the principal components**: The original data is projected onto the selected principal components to obtain the transformed lower-dimensional representation. Each data point is represented by its coordinates along the principal components.

Let's consider an example to illustrate the application of PCA:

Suppose we have a dataset with two features, "X" and "Y," representing the coordinates of points in a 2D space. We want to apply PCA to reduce the dimensionality of the data from 2D to 1D.

Original dataset:

Point 1: (2, 4)

Point 2: (4, 2)

Point 3: (6, 6)

Steps:

Standardize the data: We calculate the mean and standard deviation of the features "X" and "Y" and standardize the data.

Compute the covariance matrix: Based on the standardized features, we calculate the covariance matrix.

Perform eigen decomposition: We perform eigen decomposition on the covariance matrix to obtain the eigenvalues and eigenvectors.

Select the principal components: We select the eigenvector with the highest eigenvalue as the principal component.

Project the data onto the principal component: We project each data point onto the principal component to obtain the transformed 1D representation.

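Carrying out these steps with the population standard deviation (dividing by n), both features have a mean of 4 and a standard deviation of about 1.63. The covariance matrix of the standardized data is [[1, 0.5], [0.5, 1]], and its eigenvalues are 1.5 and 0.5. The eigenvector for the largest eigenvalue is (1, 1)/sqrt(2), which explains 1.5 / 2 = 75% of the variance. Projecting the standardized points onto it gives:

Transformed dataset:

Point 1: -0.87

Point 2: -0.87

Point 3: 1.73

The transformed dataset represents the original data projected onto the principal component (the overall sign is arbitrary, since an eigenvector can point either way). The dimensionality has been reduced from 2D to 1D while retaining 75% of the variance in the data. Points 1 and 2 land on the same value, which shows the information that is lost along the discarded direction.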
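
The five steps can be sketched directly in NumPy for the three points above. This is an illustrative walk-through, not a production implementation. It assumes the population standard deviation (`ddof=0`); other conventions rescale the projected values, and the sign of the result is arbitrary (the code flips it so the first entry of the component is positive).

```python
import numpy as np

points = np.array([[2.0, 4.0], [4.0, 2.0], [6.0, 6.0]])

# Step 1: standardize each feature (population standard deviation, ddof=0)
standardized = (points - points.mean(axis=0)) / points.std(axis=0)

# Step 2: covariance matrix of the standardized data (bias=True divides by n)
cov = np.cov(standardized, rowvar=False, bias=True)

# Step 3: eigen decomposition; eigh is meant for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Step 4: keep the eigenvector with the largest eigenvalue
first_pc = eigenvectors[:, np.argmax(eigenvalues)]
if first_pc[0] < 0:
    first_pc = -first_pc  # sign is arbitrary; fix it for readability

# Step 5: project the standardized points onto that direction
projected = standardized @ first_pc

print(np.round(eigenvalues, 2))  # variances along the two principal axes
print(np.round(projected, 2))    # 1-D coordinates of the three points
```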

**Q4.** What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

**Answer**: 
PCA and feature extraction are closely related concepts. PCA can be used as a technique for feature extraction, where it transforms the original high-dimensional feature space into a lower-dimensional space by identifying the most important patterns or variations in the data.

In feature extraction, the goal is to find a smaller set of representative features that capture the essential information from the original features. These representative features are often referred to as "latent variables" or "new features." PCA helps in achieving this by finding linear combinations of the original features, called principal components, which explain the maximum variance in the data.

By using PCA for feature extraction, we can reduce the dimensionality of the dataset while retaining the most significant information. The new set of features, derived from the principal components, can often capture the key characteristics of the data and be used in subsequent analysis or modeling tasks.

Here's an example to illustrate how PCA can be used for feature extraction:

Suppose we have a dataset with five features, "A," "B," "C," "D," and "E." Each feature represents a different aspect of a product. We want to extract a reduced set of features that capture the most important information.

Using PCA for feature extraction:

Standardize the data: We standardize the original features to have zero mean and unit variance.

Perform PCA: We apply PCA to the standardized data. PCA calculates the eigenvalues and eigenvectors of the covariance matrix.

Select principal components: We select a subset of principal components based on their corresponding eigenvalues. The principal components with higher eigenvalues capture more variance in the data.

Project the data onto the selected principal components: We project the original data onto the selected principal components to obtain the transformed features.

For example, if PCA showed that the first three principal components captured most of the variance in the data, we would keep those three as the extracted features. In the toy dataset below, however, the three data points lie on a straight line (each point is the previous one plus 1 in every feature), so a single principal component captures all of the variance.

The original dataset:

Data point 1: [1, 2, 3, 4, 5]

Data point 2: [2, 3, 4, 5, 6]

Data point 3: [3, 4, 5, 6, 7]

Transformed dataset (extracted feature, after standardizing with the population standard deviation; the overall sign is arbitrary):

Data point 1: [-2.74]

Data point 2: [0.00]

Data point 3: [2.74]

In this example, PCA has extracted a reduced set of features (one feature) from the original dataset (five features) without losing any variance, because the original features were perfectly correlated. On real data the reduction is rarely this clean, and the number of components kept is a trade-off between compactness and retained variance. The extracted features can be used in subsequent analysis, such as clustering or classification tasks, with reduced dimensionality and improved computational efficiency.
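
To check how the variance is actually distributed in the toy dataset above, here is a small sketch built on NumPy's SVD. The helper name `extract_features` is hypothetical. It standardizes with the population standard deviation and assumes no column is constant. For these three collinear rows it reports that the first component carries essentially all of the variance.

```python
import numpy as np


def extract_features(X, k):
    """Project standardized X onto its first k principal components.

    Returns (features, explained_variance_ratio). Assumes no column of X
    is constant, since a zero standard deviation would divide by zero.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    _, singular_values, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = singular_values**2 / np.sum(singular_values**2)
    return Z @ Vt[:k].T, ratio


X = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 4.0, 5.0, 6.0, 7.0],
])

features, ratio = extract_features(X, k=1)
print(np.round(ratio, 4))     # share of variance per component
print(np.round(features, 2))  # one extracted feature per data point
```

The sign of the extracted feature depends on the SVD routine, so only its magnitude is meaningful.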

**Q5.** You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

**Answer**:
To preprocess the dataset for a food delivery recommendation system, you can use Min-Max scaling to bring features such as price, rating, and delivery time onto a common scale. The steps are as follows:

**(I) Understand the range of each feature:** Analyze the range and distribution of each feature in the dataset, including price, rating, and delivery time. Determine the minimum and maximum values for each feature.

**(II) Apply Min-Max scaling**: Once you have identified the minimum and maximum values for each feature, apply Min-Max scaling individually to normalize the values within a common range, typically between 0 and 1.

The formula for Min-Max scaling is:
scaled_value = (value - min_value) / (max_value - min_value)

Let's consider each feature and apply Min-Max scaling:

Price: If the original prices range from $5 to $20, apply the formula to each price value. For example, a $12.50 item becomes (12.5 - 5) / (20 - 5) = 0.5.

Rating: If the ratings range from 1 to 5, apply the formula to each rating value. For example, a rating of 4 becomes (4 - 1) / (5 - 1) = 0.75.

Delivery Time: If the delivery times range from 30 minutes to 60 minutes, apply the formula to each delivery time value. For example, 36 minutes becomes (36 - 30) / (60 - 30) = 0.2.

By applying Min-Max scaling to each feature, you ensure that all the features are on the same scale, with values between 0 and 1.

**(III) Normalize new data points:** If you receive new data points during the recommendation system's operation, make sure to apply the same Min-Max scaling technique used during preprocessing. Use the previously calculated minimum and maximum values for each feature to scale the new data points consistently.

Min-Max scaling in this context allows you to normalize the features such as price, rating, and delivery time to a common range, ensuring that no single feature dominates the recommendation process due to its original scale. It helps to make the features comparable and contributes to building an effective recommendation system that considers multiple factors in a balanced manner.
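
The fit-once, reuse-later pattern from step (III) might look like the sketch below, using scikit-learn's `MinMaxScaler`. The rows are hypothetical values chosen to match the ranges mentioned above, not real delivery data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training rows: [price in $, rating 1-5, delivery time in minutes]
train = np.array([
    [5.0, 1.0, 30.0],
    [12.5, 3.0, 45.0],
    [20.0, 5.0, 60.0],
])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
train_scaled = scaler.fit_transform(train)  # learns per-column min and max

# A new order reuses the min and max learned from the training data
new_order = np.array([[8.0, 4.0, 36.0]])
new_scaled = scaler.transform(new_order)

print(scaler.data_min_, scaler.data_max_)  # per-feature min and max
print(new_scaled)                          # values: 0.2, 0.75, 0.2
```

A new value outside the training range (say, a $25 item) would scale to a number outside 0 to 1, so it is worth deciding up front whether such values should be clipped or left as they are.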

**Q6.** You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

**Answer**:
When building a model to predict stock prices with a dataset containing numerous features, such as company financial data and market trends, PCA (Principal Component Analysis) can be employed to reduce the dimensionality of the dataset. Here's an overview of the steps involved in using PCA for dimensionality reduction in this scenario:

**(I) Data preprocessing:** Prior to applying PCA, it is essential to preprocess the data. This typically involves handling missing values, performing feature scaling, and addressing any other necessary data cleaning steps. Standardizing the features to have zero mean and unit variance is particularly important for PCA.

**(II) Feature selection:** If there are irrelevant or redundant features in the dataset, it is beneficial to perform feature selection before applying PCA. This step helps in removing features that do not contribute much to the overall variation in the data or are highly correlated with other features.

**(III) Apply PCA:** Once the data is preprocessed and feature selection (if applicable) is performed, PCA can be applied to the remaining features. PCA calculates the principal components, which are linear combinations of the original features that capture the maximum variance in the data.

**(IV) Determine the number of components**: The number of principal components to retain depends on the desired level of dimensionality reduction. One common approach is to select a number of components that explain a significant portion of the total variance in the dataset, such as retaining components that capture, for example, 90% or 95% of the variance.

**(V) Dimensionality reduction**: After determining the number of components, the dataset is transformed by projecting the original features onto the selected principal components. This results in a reduced-dimensional representation of the data, with each data point represented by its coordinates along the retained principal components.

**(VI) Model building**: The reduced-dimensional dataset obtained from PCA can then be used as input for training a predictive model to forecast stock prices. The reduced dimensionality helps in mitigating the curse of dimensionality, improving model training efficiency, and potentially reducing the risk of overfitting.

By using PCA for dimensionality reduction in the context of predicting stock prices, you can effectively capture the most significant patterns or variations in the dataset while reducing the number of features. This enables the model to focus on the most informative aspects of the data, leading to potentially improved prediction accuracy and computational efficiency.
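
Steps (I), (III), (IV), and (V) can be combined in a short scikit-learn pipeline, sketched below. The data is a synthetic stand-in generated from three hidden factors, not real financial data, and the 95% threshold is only one possible choice. For time-ordered data such as stock prices, it is usually safer to fit the pipeline on the training period only and then apply it to later data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a stock dataset: 200 rows, 10 correlated features
# driven by 3 hidden factors plus a little noise (purely illustrative)
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 10))
X = factors @ loadings + 0.1 * rng.normal(size=(200, 10))

# Standardize, then keep the fewest components explaining >= 95% of variance
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_reduced = pipeline.fit_transform(X)

pca = pipeline.named_steps["pca"]
print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())
```

`X_reduced` can then be passed to whichever predictive model is being trained in step (VI).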

**Q7.** For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

**Answer**: 

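In [29]:
from sklearn.preprocessing import MinMaxScaler

In [30]:
# feature_range sets the target interval; the default is (0, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))

In [31]:
lst = [1, 5, 10, 15, 20]

In [32]:
# MinMaxScaler expects a 2-D input, so each value becomes its own row
scaler.fit_transform([[i] for i in lst])

array([[-1.        ],
       [-0.57894737],
       [-0.05263158],
       [ 0.47368421],
       [ 1.        ]])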
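
For a target range [a, b], the Min-Max formula generalizes to scaled_value = a + (value - min_value) * (b - a) / (max_value - min_value). As a cross-check, here is a minimal NumPy sketch of that formula for the range -1 to 1; the variable names are chosen for illustration.

```python
import numpy as np

values = np.array([1.0, 5.0, 10.0, 15.0, 20.0])
new_min, new_max = -1.0, 1.0

# First map to [0, 1], then stretch and shift to [new_min, new_max]
unit_scaled = (values - values.min()) / (values.max() - values.min())
scaled = new_min + unit_scaled * (new_max - new_min)

print(np.round(scaled, 4))  # values: -1.0, -0.5789, -0.0526, 0.4737, 1.0
```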

**Q8**. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

**Answer**:
To determine the number of principal components to retain during PCA (Principal Component Analysis), we need to consider the explained variance ratio of each principal component. The explained variance ratio tells us how much information or variance is explained by each principal component.

Here's an example of how you can perform feature extraction using PCA in Python and analyze the explained variance ratio to decide how many principal components to retain:

In [33]:
import numpy as np
from sklearn.decomposition import PCA

# Define the dataset
data = np.array([
    [170, 65, 30, 0, 120],
    [165, 60, 35, 1, 130],
    [180, 75, 40, 1, 140],
    [160, 55, 28, 0, 115],
    [175, 70, 32, 1, 125]
])

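# Select the feature columns. Note: [:, :4] keeps only the first four
# columns (height, weight, age, gender) and leaves out blood pressure;
# use data itself to run PCA on all five features.
features = data[:, :4]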

# Perform PCA
pca = PCA()
pca.fit(features)

# Get the explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Calculate the cumulative explained variance
cumulative_variance = np.cumsum(explained_variance_ratio)

# Determine the number of principal components to retain
n_components = np.argmax(cumulative_variance >= 0.95) + 1

# Print the explained variance ratio and cumulative explained variance
print("Explained Variance Ratio:")
print(explained_variance_ratio)
print("\nCumulative Explained Variance:")
print(cumulative_variance)
print("\nNumber of Principal Components to Retain:", n_components)


Explained Variance Ratio:
[9.30760167e-01 6.84463380e-02 7.93495226e-04 1.82435065e-32]

Cumulative Explained Variance:
[0.93076017 0.9992065  1.         1.        ]

Number of Principal Components to Retain: 2


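In the above code, we use the PCA class from the sklearn.decomposition module to perform PCA. We fit the PCA model to the first four columns of the dataset (the slice data[:, :4] leaves out the blood pressure column) and then obtain the explained variance ratio using the explained_variance_ratio_ attribute. Note that PCA centers the data but does not rescale it, so features with larger numeric ranges carry more weight in this run.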

We also calculate the cumulative explained variance by taking the cumulative sum of the explained variance ratio. The cumulative explained variance tells us how much variance is explained by including each additional principal component.

In this example, we retain enough principal components to explain at least 95% of the variance. We find that number by locating the first index at which the cumulative explained variance reaches or exceeds 0.95 and adding 1, since indices start at 0.

Based on the output, the first two principal components explain approximately 93.08% and 6.84% of the variance, respectively. The first component alone falls short of 95%, while the first two together reach about 99.92%. Therefore, we retain the first two principal components for feature extraction.
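
To see how the choice depends on the threshold, the selection rule can be wrapped in a small helper and applied to the ratios printed above. The function name `components_for` is hypothetical, and the fallback for a threshold that is never reached is an added assumption.

```python
import numpy as np

# Explained variance ratios copied from the output above
ratios = np.array([9.30760167e-01, 6.84463380e-02, 7.93495226e-04, 1.82435065e-32])


def components_for(ratios, threshold):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    reached = cumulative >= threshold
    if not reached.any():
        # Threshold never reached (e.g. rounding); fall back to all components
        return len(ratios)
    # argmax returns the first index where the condition is True
    return int(np.argmax(reached)) + 1


print(components_for(ratios, 0.95))  # 2
print(components_for(ratios, 0.90))  # 1
```

With a 90% target, the first component alone would be enough, which shows that the number retained is a modeling choice rather than a fixed property of the data.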