Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

ans->Min-Max scaling is a data preprocessing technique used to scale numerical features in a dataset to a specific range, typically between 0 and 1. It's useful when the features in your dataset have varying scales, and you want to ensure that they all have the same scale for machine learning algorithms that are sensitive to the magnitude of features.

x'= (x - x_min) / (x_max - x_min)

Suppose you have a dataset of ages as follows:

Person 1: Age = 25 (min)
Person 2: Age = 45 (max)
Person 3: Age = 30

After Min-Max scaling, your dataset will look like this:

Person 1: Age = 0
Person 2: Age = 1
Person 3: Age = 0.25 (since (30 - 25) / (45 - 25) = 0.25)
Now, all the ages are within the range of 0 to 1, making it easier to compare and analyze these values, especially when using machine learning algorithms that are sensitive to feature scales.
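
The age example can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `min_max_scale` is hypothetical, and it assumes the values are not all identical (otherwise the denominator would be zero).

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


ages = [25, 45, 30]
scaled_ages = min_max_scale(ages)
print(scaled_ages)  # [0.0, 1.0, 0.25]
```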

Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

ans->The Unit Vector technique, also known as "Normalization" or "L2 Normalization," is a feature scaling method that rescales each data point (each row of the dataset) so that its feature vector has a norm, or magnitude, of 1. Min-Max scaling works on each feature (each column) separately and maps its values to a specific range (e.g., 0 to 1). Unit Vector scaling instead divides every data point by its own length, so it keeps the direction of the data point in multidimensional space and discards its magnitude.

Example:

Suppose you have a dataset with two numerical features, 'Feature1' and 'Feature2,' and you want to normalize them using Unit Vector scaling:

Data point 1: (Feature1 = 3, Feature2 = 4)
Data point 2: (Feature1 = 1, Feature2 = 2)
Calculate the L2 norm for each data point:

For Data point 1: √(9+16) = 5
For Data point 2: √(1+4) = √5

Normalize each data point by dividing by its L2 norm:

Normalized Data point 1:  ( (3/5),(4/5) )

Normalized Data point 2:  ( (1/√5),(2/√5) ) ≈ (0.447, 0.894)

In this case, both normalized data points have a magnitude (L2 norm) of 1. This scaling method is useful when the direction of a data point matters more than its magnitude, for example when samples are compared by cosine similarity. After L2 normalization, the Euclidean distance between two points depends only on the angle between them, so the method is often paired with distance-based algorithms such as k-nearest neighbors (KNN) and with support vector machines (SVM).
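
The two data points above can be normalized with the standard library alone. This is a small sketch; the helper name `l2_normalize` is hypothetical, and it assumes no data point is the all-zero vector.

```python
import math


def l2_normalize(point):
    """Divide a data point by its L2 norm so the result has length 1."""
    norm = math.sqrt(sum(x * x for x in point))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in point]


points = [[3, 4], [1, 2]]
unit_points = [l2_normalize(p) for p in points]

for p in unit_points:
    # Print the normalized point and its length, which should be very close to 1.
    print(p, math.sqrt(sum(x * x for x in p)))
```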

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

ans->PCA (Principal Component Analysis) is a dimensionality reduction technique used in data analysis and machine learning to transform high-dimensional data into a lower-dimensional representation while retaining as much of the original variance as possible. It achieves this by identifying and selecting a set of new orthogonal axes called principal components, which capture the most significant information in the data.

Here's how PCA works:

Standardize the data: If the features have different scales, it's essential to standardize them (mean = 0, standard deviation = 1) to give each feature equal importance in the analysis.

Compute the covariance matrix: Calculate the covariance between each pair of standardized features. The covariance matrix describes the relationships between features.

Calculate the eigenvalues and eigenvectors of the covariance matrix: The eigenvectors represent the principal components, and the eigenvalues indicate their importance. The principal components are sorted in descending order of their corresponding eigenvalues.

Select a subset of the principal components: Choose the top k principal components that capture most of the variance in the data. Typically, you decide on the number of components based on how much variance you want to retain (e.g., 95% of the total variance).

Project the data onto the selected principal components: Transform the original data into the lower-dimensional space defined by the selected principal components.
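
The five steps above can be sketched with NumPy. This is an illustration under stated assumptions, not a reference implementation: the function name `pca_fit_transform` is hypothetical, the input data is synthetic, and no feature is assumed to be constant (a constant feature would have zero standard deviation).

```python
import numpy as np


def pca_fit_transform(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize each feature (column) to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition. eigh is meant for symmetric matrices and
    #    returns eigenvalues in ascending order, so re-sort them descending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # 4. Keep the k eigenvectors with the largest eigenvalues.
    components = eigenvectors[:, :k]
    # 5. Project the standardized data onto those components.
    return Z @ components, eigenvalues


# Synthetic data: 100 samples, 3 features, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

projected, eigenvalues = pca_fit_transform(X, k=2)
print(projected.shape)                    # (100, 2)
print(eigenvalues / eigenvalues.sum())    # share of variance per component
```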

Example:

Suppose you have a dataset with three features: 'Height,' 'Weight,' and 'Age.' You want to reduce the dimensionality of this dataset using PCA. After standardizing the data and computing the covariance matrix, you find the following eigenvalues and eigenvectors:

Eigenvalues: λ1 = 2.5, λ2 = 1.2, λ3 = 0.3

Eigenvectors: PC1 = [0.6, 0.7, 0.4], PC2 = [−0.7, 0.6, 0.1], PC3 = [0.3, 0.3, −0.9]

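You decide to retain the top two principal components, which capture 2.5 + 1.2 = 3.7 of the 4.0 total units of variance, or 92.5%. (The eigenvalues here are illustrative; for three standardized features they would sum to approximately 3.) You then standardize each data point and project it onto these two components, which gives two new coordinates per point. The projected values below are also illustrative:

Data point 1: [2, 150, 30] -> Projected to [2.1, 1.7]
Data point 2: [1.8, 160, 28] -> Projected to [2.0, 1.8]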
...
Now, your data is in a lower-dimensional space with reduced complexity, making it easier to analyze and visualize while preserving the most important information.
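
The share of variance captured by the retained components can be computed directly from the eigenvalues in this example. The sketch below uses a hypothetical helper, `components_for_threshold`, to find the smallest number of components that reaches a chosen threshold.

```python
def components_for_threshold(eigenvalues, threshold):
    """Smallest k whose top-k eigenvalues explain at least `threshold` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


eigenvalues = [2.5, 1.2, 0.3]
total = sum(eigenvalues)
ratios = [value / total for value in eigenvalues]

print(ratios)                                        # share of variance per component
print(components_for_threshold(eigenvalues, 0.90))   # 2
print(components_for_threshold(eigenvalues, 0.95))   # 3
```

With these eigenvalues, two components reach a 90% threshold, while a 95% threshold would require all three.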






Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

ans->PCA is a feature extraction technique: rather than selecting a subset of the original features, it builds new features, the principal components, each of which is a linear combination of the original features. The components are ranked by how much of the variance in the original data they capture, and the top few can be used as the new, lower-dimensional features in data analysis and machine learning.

Example:

Suppose you have a dataset with 10 numerical features, and you want to perform feature extraction using PCA. After standardizing the data, PCA reveals that the first three principal components capture the most significant variance.

You decide to retain these three components. Each data point in your dataset is then projected onto this lower-dimensional space defined by the three principal components, effectively reducing the data from 10 features to 3. These three components can be considered as the new features extracted from the original dataset.

By using PCA for feature extraction, you've reduced the dimensionality of your data while retaining the most important information, which can be helpful for improving the efficiency and interpretability of machine learning models or reducing the impact of the curse of dimensionality.
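
The 10-to-3 reduction described above might look like the following sketch. It computes the components with a singular value decomposition (SVD) of the standardized data, which is an alternative route to the same principal directions as the covariance-matrix approach. The dataset is synthetic, generated from three hidden factors plus noise, so all numbers it prints are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 200 samples whose 10 features are mixtures of 3 hidden factors.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Standardize each feature to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the standardized data: the rows of Vt are the principal directions,
# and the squared singular values are proportional to the explained variance.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)

# Keep the first 3 components as the extracted features.
k = 3
X_extracted = Z @ Vt[:k].T

print(X_extracted.shape)            # (200, 3)
print(explained_ratio[:k].sum())    # share of variance kept by 3 components
```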

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

ans->i) Identify the range for each feature:

Price: Determine the minimum and maximum possible prices for food items.

Rating: Typically, ratings range from 1 to 5.

Delivery time: Determine the minimum and maximum expected delivery times.

ii) Apply Min-Max scaling individually to each feature:

For each feature, use the Min-Max scaling formula to transform the values into a common range of 0 to 1.

This ensures that all features have the same scale and do not dominate each other during recommendation calculations.

iii) After scaling, the data for each feature will be in the 0 to 1 range, making it suitable for building a recommendation system.
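
A minimal sketch of these steps in plain Python follows. The sample orders, the field names, and the helper `min_max_scale_columns` are all hypothetical. The helper takes each feature's minimum and maximum from the data itself; for ratings, the known bounds of 1 and 5 could be used instead.

```python
# Hypothetical sample rows: price in dollars, rating on a 1-5 scale,
# delivery time in minutes.
orders = [
    {"price": 8.0, "rating": 4.5, "delivery_time": 30},
    {"price": 20.0, "rating": 3.0, "delivery_time": 60},
    {"price": 14.0, "rating": 5.0, "delivery_time": 45},
]


def min_max_scale_columns(rows, columns):
    """Scale each named column to [0, 1] independently of the others."""
    # Step i: find the observed minimum and maximum of every feature.
    bounds = {c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns}
    scaled = []
    for row in rows:
        new_row = dict(row)
        for c in columns:
            lo, hi = bounds[c]
            # Step ii: apply the Min-Max formula; a constant column maps to 0.0.
            new_row[c] = (row[c] - lo) / (hi - lo) if hi > lo else 0.0
        scaled.append(new_row)
    return scaled, bounds


scaled_orders, bounds = min_max_scale_columns(orders, ["price", "rating", "delivery_time"])
for row in scaled_orders:
    print(row)
```

The returned `bounds` can be stored and reused so that new orders are scaled with the same minimum and maximum as the original data.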

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

ans->To use PCA to reduce the dimensionality of the dataset for predicting stock prices:

1. Standardize the features: Ensure all features have the same scale.

2. Compute the covariance matrix of the standardized features.

3. Calculate the eigenvectors and eigenvalues of the covariance matrix.

4. Select the top principal components based on the explained variance or a chosen threshold.

5. Project the data onto these selected principal components.

This reduces the dataset's dimensionality while retaining the most relevant information for stock price prediction, potentially improving model efficiency and performance.
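
One possible way to apply these steps to time-ordered data is sketched below. It assumes the rows are in chronological order and fits the scaling and the components on the earlier rows only, then reuses them for later rows, so that no information from the later period leaks into the transformation. The data is synthetic, and the names `fit_pca` and `transform_pca` are hypothetical.

```python
import numpy as np


def fit_pca(X_train, variance_to_keep=0.95):
    """Learn the mean, standard deviation, and top components from training rows."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z, rowvar=False))
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Smallest k whose cumulative explained variance reaches the threshold.
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = min(int(np.searchsorted(cumulative, variance_to_keep)) + 1, len(eigenvalues))
    return mean, std, eigenvectors[:, :k]


def transform_pca(X, mean, std, components):
    """Standardize X with the training statistics, then project it."""
    return ((X - mean) / std) @ components


# Synthetic stand-in for 20 financial features driven by 4 hidden factors.
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 4))
X = factors @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(300, 20))

# Chronological split: the first 240 rows for fitting, the last 60 held out.
X_train, X_test = X[:240], X[240:]
mean, std, components = fit_pca(X_train)
X_train_reduced = transform_pca(X_train, mean, std, components)
X_test_reduced = transform_pca(X_test, mean, std, components)

print(X.shape[1], "->", components.shape[1])
```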

Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

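In [3]:
# Min-Max scaling to a target range [a, b]:
#   x' = a + (x - x_min) * (b - a) / (x_max - x_min)
l1 = [1, 5, 10, 15, 20]
a, b = -1, 1  # target range requested in the question
x_min, x_max = min(l1), max(l1)
l2 = []
for i in l1:
    x = a + (i - x_min) * (b - a) / (x_max - x_min)
    l2.append(round(x, 4))  # rounded to 4 decimals for readability

In [4]:
l2

[-1.0, -0.5789, -0.0526, 0.4737, 1.0]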

Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

ans->With five features, PCA yields at most five principal components, so the choice is between 1 and 5. Gender is categorical and must be encoded numerically (for example as 0/1) before PCA, and all features should be standardized because height, weight, age, and blood pressure are measured in different units. The number of components to retain cannot be fixed without the data. A common approach is to keep the smallest number of components whose cumulative explained variance reaches a chosen threshold (e.g., 95% or 99%), adjusted using domain knowledge, the project's requirements, and computational constraints.