Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling, also known as normalization, is a data preprocessing technique used to scale the features of a dataset to a specific range. The purpose of Min-Max scaling is to transform the data so that it falls within a predefined range, typically between 0 and 1.

The formula to perform Min-Max scaling on a feature is as follows:

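scaled_value = (value - min_value) / (max_value - min_value)

where min_value and max_value are the smallest and largest values of that feature, so the feature's minimum maps to 0 and its maximum maps to 1. For the ages below (20 to 60), the value 30 becomes (30 - 20) / (60 - 20) = 0.25.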

In [1]:
# Example: Min-Max scaling of an "age" column to the range [0, 1]
import pandas as pd

In [3]:
df=pd.DataFrame([20, 30, 40, 50, 60],columns=["age"])

In [5]:
from sklearn.preprocessing import MinMaxScaler
min_max=MinMaxScaler()

In [8]:
min_max.fit_transform(df[["age"]])

array([[0.  ],
       [0.25],
       [0.5 ],
       [0.75],
       [1.  ]])
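
To make the formula concrete, the sketch below applies it by hand with NumPy to the same ages and checks that the result agrees with MinMaxScaler. The variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

ages = np.array([20, 30, 40, 50, 60], dtype=float)

# scaled_value = (value - min_value) / (max_value - min_value)
min_value, max_value = ages.min(), ages.max()
manual = (ages - min_value) / (max_value - min_value)

# MinMaxScaler expects a 2-D array (samples x features), hence reshape(-1, 1)
library = MinMaxScaler().fit_transform(ages.reshape(-1, 1)).ravel()

print(manual)                        # expected values: 0, 0.25, 0.5, 0.75, 1
print(np.allclose(manual, library))  # the hand computation matches the scaler
```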

Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

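The Unit Vector technique, also known as vector normalization, is a feature scaling method that rescales each data point (a row of the dataset) so that its feature vector has a length of 1 while keeping its original direction. This is the key difference from Min-Max scaling: Min-Max scaling works column by column, mapping each feature to a fixed range using that feature's minimum and maximum, whereas unit vector scaling works row by row, dividing every value in a sample by that sample's norm. Unit vector scaling is useful when the direction of a vector matters more than its magnitude (for example, with cosine similarity), and it is also applied before algorithms that rely on distance or dot-product calculations, such as k-nearest neighbors (KNN) and support vector machines (SVM).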

The formula to calculate the unit vector of a feature vector is as follows:

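unit_vector = vector / ||vector||

where ||vector|| is the Euclidean (L2) norm, the square root of the sum of the squared components. For the first row of the example below, (2, 1), the norm is sqrt(2^2 + 1^2) = sqrt(5) ≈ 2.236, so the unit vector is (0.894, 0.447).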

In [37]:
import pandas as pd

data={"x":[2, 4, 6, 8],"y":[1, 3, 5, 7]}
df1=pd.DataFrame(data,columns=["x","y"])

In [39]:
from sklearn.preprocessing import normalize

In [41]:
pd.DataFrame(normalize(df1[["x","y"]]))

          0         1
0  0.894427  0.447214
1  0.800000  0.600000
2  0.768221  0.640184
3  0.752577  0.658505
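
The sketch below makes the difference from Min-Max scaling visible on the same x and y data. It computes the unit vectors by hand with NumPy (dividing each row by its L2 norm), checks them against sklearn's normalize, and applies MinMaxScaler to the same array for comparison.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, normalize

df1 = pd.DataFrame({"x": [2, 4, 6, 8], "y": [1, 3, 5, 7]})
X = df1.to_numpy(dtype=float)

# Unit vector scaling: divide each ROW by its own Euclidean (L2) norm
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
unit_manual = X / row_norms

# Min-Max scaling: rescale each COLUMN using that column's min and max
min_max = MinMaxScaler().fit_transform(X)

print(np.allclose(unit_manual, normalize(X)))    # hand computation matches normalize
print(np.linalg.norm(unit_manual, axis=1))       # every row now has length 1
print(min_max.min(axis=0), min_max.max(axis=0))  # every column now spans 0 to 1
```

After unit vector scaling the rows have length 1 but the columns do not span a fixed range; after Min-Max scaling the columns span 0 to 1 but the rows do not have length 1.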


Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

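Principal Component Analysis (PCA) is a widely used technique in data analysis and machine learning for dimensionality reduction. It transforms a dataset with many (often correlated) variables into a new set of uncorrelated variables called principal components. Each component is a linear combination of the original variables: the first captures the largest possible share of the variance in the data, and each subsequent component captures the largest remaining variance while being orthogonal to the previous ones. Keeping only the first k components therefore reduces the number of dimensions while retaining as much of the variance as possible.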
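
A minimal example with scikit-learn's PCA, reusing the x and y values from Q2. These points lie exactly on the line y = x - 1, so a single principal component should capture essentially all of the variance, and the 2-D data can be reduced to 1-D without losing information.

```python
import numpy as np
from sklearn.decomposition import PCA

# Two features that lie exactly on the line y = x - 1
X = np.array([[2, 1], [4, 3], [6, 5], [8, 7]], dtype=float)

# Keep only the first principal component (2 features -> 1)
pca = PCA(n_components=1, svd_solver="full")
Z = pca.fit_transform(X)              # shape (4, 1): the reduced dataset

print(pca.components_)                # direction along the line (sign may flip)
print(pca.explained_variance_ratio_)  # close to 1.0: almost all variance is kept
print(Z.ravel())                      # coordinate of each point along the component

# Mapping back to 2-D recovers the original points, since nothing was lost
X_back = pca.inverse_transform(Z)
print(np.allclose(X_back, X))
```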

Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

PCA is a feature extraction technique: instead of selecting a subset of the original features, it constructs new features, the principal components, as linear combinations of the original ones, ordered by how much of the variance in the data they explain. By keeping only the first few components, PCA reduces the dimensionality of the dataset while preserving most of its variance, and the extracted components can then be used as input features in subsequent analysis or modeling tasks. For example, in facial recognition (the "eigenfaces" approach), PCA is applied to a collection of face images, and each image is then represented by its coordinates along the top principal components; these coordinates serve as a compact feature vector for recognition. The components themselves are whole-face patterns rather than separate parts such as the eyes, nose, or mouth.
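
A minimal sketch of PCA as a feature extractor. As a stand-in for face images it uses scikit-learn's bundled handwritten-digits dataset (8x8 images, 64 pixel features); the choice of 10 components and of a logistic regression classifier is arbitrary and only illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 8x8 digit images flattened to 64 pixel features (ships with scikit-learn)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: PCA extracts 10 new features (principal components) from the 64 pixels
# Step 2: the classifier is trained on those 10 extracted features, not the raw pixels
model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

extracted = model[0].transform(X_test)  # extracted features for the test images
print(X_test.shape, "->", extracted.shape)
print("test accuracy:", model.score(X_test, y_test))
```

Wrapping PCA and the classifier in one pipeline ensures the components are learned from the training images only and then reused unchanged on the test images.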

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

Price: If the price feature has a wide range of values, Min-Max scaling can be applied to normalize the values between 0 and 1. This ensures that the price values are on a similar scale and prevents one feature from dominating the recommendation process.

Rating: The rating feature may have a range of values such as 1 to 5 or 0 to 10. Min-Max scaling maps these values to 0 to 1 as well, so that ratings can be compared and combined with the other features on an equal footing.

Delivery time: The delivery time feature might have varying ranges, such as 10 minutes to 60 minutes or 20 minutes to 120 minutes. Min-Max scaling can be employed to scale these values between 0 and 1, making them comparable and ensuring that delivery time does not disproportionately influence the recommendation system.

In [73]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Synthetic data for 100 restaurants (reproducible via the seed)
np.random.seed(34)
data = {
    "price": np.random.randint(1, 50, size=100),           # 1 to 49
    "rating": np.random.randint(1, 6, size=100),           # 1 to 5 (upper bound is exclusive)
    "delivery_time": np.random.randint(10, 61, size=100),  # 10 to 60 minutes
}

df = pd.DataFrame(data)

# Learn each column's min and max, then rescale every column to [0, 1]
min_max = MinMaxScaler()
scaled_data = min_max.fit_transform(df)
scaled_data_df = pd.DataFrame(scaled_data, columns=df.columns)

print("original data:")
print(df.head())
print("\nscaled data:")
print(scaled_data_df.head())
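
In a recommendation system, new restaurants arrive after the scaler has been fitted, and they should be scaled with the same minimum and maximum learned from the existing data. The sketch below shows this on a tiny hypothetical dataset (the numbers are made up so that the results can be checked by hand).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: columns are price, rating, delivery_time (minutes)
train = np.array([
    [10, 2, 15],
    [20, 4, 30],
    [40, 5, 60],
], dtype=float)

scaler = MinMaxScaler()
scaler.fit(train)  # learns the per-column min and max
print(scaler.data_min_, scaler.data_max_)

# A new restaurant is scaled with the SAME min/max learned from the training data:
# price (25-10)/30 = 0.5, rating (3-2)/3 = 1/3, delivery (45-15)/45 = 2/3
new_restaurant = np.array([[25, 3, 45]], dtype=float)
print(scaler.transform(new_restaurant))

# Values outside the training range fall outside [0, 1]: price (55-10)/30 = 1.5
print(scaler.transform(np.array([[55, 5, 60]], dtype=float)))
```

If out-of-range values are a concern, recent scikit-learn versions accept `MinMaxScaler(clip=True)`, which caps transformed values to the target range.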


Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

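Standardize each feature (subtract its mean and divide by its standard deviation). Company financials and market indicators are typically on very different scales, and because PCA is driven by variance, unscaled features with large numeric ranges would otherwise dominate the components.

Compute the covariance matrix of the standardized dataset.

Perform an eigenvalue decomposition of the covariance matrix to obtain its eigenvectors (the principal component directions) and eigenvalues (the variance along each direction).

Sort the eigenvectors by eigenvalue and keep the top k, choosing k, for example, as the smallest number of components whose cumulative explained variance reaches a threshold such as 90% or 95%.

Project the standardized dataset onto the selected eigenvectors to obtain a reduced-dimensional representation of the data, and use these k components as the model's input features.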
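
The steps above can be sketched directly in NumPy. The data here is a random, hypothetical stand-in for the stock features (200 rows, 6 correlated columns), and k = 2 is an arbitrary illustrative choice. The result is cross-checked against scikit-learn's PCA, whose components may differ from the hand-computed ones only in sign.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for the stock dataset: 200 rows x 6 correlated features
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

# 1. Standardize: zero mean and unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (features are columns)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen-decomposition; eigh suits symmetric matrices and returns ascending eigenvalues
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort by descending eigenvalue and keep the top k eigenvectors
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
k = 2
W = eigenvectors[:, :k]

# 5. Project the standardized data onto the selected eigenvectors
X_reduced = X_std @ W
print(X_reduced.shape)  # (200, 2)
print("variance explained:", eigenvalues[:k].sum() / eigenvalues.sum())

# Cross-check against scikit-learn (each component may differ in sign)
X_sklearn = PCA(n_components=k, svd_solver="full").fit_transform(X_std)
print(np.allclose(np.abs(X_reduced), np.abs(X_sklearn)))
```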


Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

In [76]:
df2=pd.DataFrame([1, 5, 10, 15, 20],columns=["x"])

In [77]:
df2

    x
0   1
1   5
2  10
3  15
4  20


In [79]:
scaler1=MinMaxScaler(feature_range=(-1,1))


In [80]:
scaler1.fit_transform(df2[["x"]])

array([[-1.        ],
       [-0.57894737],
       [-0.05263158],
       [ 0.47368421],
       [ 1.        ]])
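
For a general target range [a, b], Min-Max scaling becomes scaled_value = a + (value - min_value) * (b - a) / (max_value - min_value). With a = -1, b = 1, min_value = 1 and max_value = 20, the value 5 becomes -1 + 2 * 4 / 19 = -11/19 ≈ -0.5789, matching the output above. The sketch below applies this formula with Python's fractions module to show the exact values; the function name min_max_scale is only illustrative.

```python
from fractions import Fraction

def min_max_scale(values, a, b):
    """Min-Max scale values to the range [a, b] using exact fractions."""
    lo, hi = min(values), max(values)
    return [a + Fraction(v - lo) * (b - a) / (hi - lo) for v in values]

scaled = min_max_scale([1, 5, 10, 15, 20], a=-1, b=1)
print([str(s) for s in scaled])              # ['-1', '-11/19', '-1/19', '9/19', '1']
print([round(float(s), 8) for s in scaled])  # same values as the MinMaxScaler output
```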

Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

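Encode gender numerically (for example, as 0 and 1) and standardize all five features, since they are measured in different units (centimeters, kilograms, years, mmHg) and PCA is sensitive to scale.

Calculate the covariance matrix of the standardized data, which measures the relationships between the features.

Compute the eigenvalues and eigenvectors of the covariance matrix.

Analyze the explained variance ratio associated with each principal component. This ratio indicates the fraction of the total variance in the data captured by each component.

Determine the number of principal components to retain by selecting a threshold for the cumulative explained variance ratio, such as 90%, and keeping the smallest number of components whose cumulative ratio reaches it. The exact number cannot be fixed in advance because it depends on how strongly the features are correlated in the actual data: strongly related features (such as height and weight) let fewer components reach the threshold, while nearly independent features require close to all five.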
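
A minimal sketch of this procedure with scikit-learn, on randomly generated, hypothetical data for the five features (gender already encoded as 0/1, with weight and blood pressure deliberately correlated with height and age). Passing a float between 0 and 1 as n_components tells PCA to keep the smallest number of components whose cumulative explained variance exceeds that fraction; the number it picks depends entirely on the data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300

# Hypothetical synthetic data for the five features (not real measurements)
height = rng.normal(170, 10, n)                           # cm
weight = 70 + 0.9 * (height - 170) + rng.normal(0, 8, n)  # kg, correlated with height
age = rng.uniform(20, 70, n)                              # years
gender = rng.integers(0, 2, n)                            # already encoded as 0/1
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n)    # mmHg, correlated with age

X = np.column_stack([height, weight, age, gender, blood_pressure])
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components to inspect the explained variance ratios
full = PCA().fit(X_std)
cumulative = np.cumsum(full.explained_variance_ratio_)
print("explained variance ratio:", np.round(full.explained_variance_ratio_, 3))
print("cumulative:", np.round(cumulative, 3))

# A float n_components keeps just enough components for the
# cumulative explained variance to exceed that fraction
pca90 = PCA(n_components=0.90, svd_solver="full").fit(X_std)
print("components kept for 90%:", pca90.n_components_)
```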