In [None]:
Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.


In [None]:
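Min-Max scaling is a data normalization technique that rescales each feature of a dataset to a fixed range, by
default 0 to 1, using x' = (x - min) / (max - min), where min and max are that feature's smallest and largest values.
It is used in data preprocessing to bring all features to a common scale, so that no single feature dominates the 
model's learning process simply because it is measured in larger units.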
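
To make the transformation concrete, the Min-Max formula can be applied by hand. The sketch below uses NumPy and a made-up income column (the values are illustrative assumptions, not taken from a real dataset).

```python
import numpy as np

# Hypothetical income values, used only for illustration
income = np.array([25000, 30000, 40000, 50000, 60000], dtype=float)

# Min-Max formula: x' = (x - min) / (max - min)
income_scaled = (income - income.min()) / (income.max() - income.min())

print(income_scaled)
```

The smallest value maps to 0, the largest to 1, and everything else keeps its relative position in between.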

Min-Max scaling can be implemented with the scikit-learn library in Python, using the MinMaxScaler class from its
preprocessing module.

Here's an example of how to use Min-Max scaling in scikit-learn:

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Create a sample dataframe
data = {'age': [20, 25, 30, 35, 40],
        'income': [25000, 30000, 40000, 50000, 60000]}
df = pd.DataFrame(data)

# Instantiate the MinMaxScaler object
scaler = MinMaxScaler()

# Apply Min-Max scaling to the dataframe
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Print the scaled dataframe
print(df_scaled)


    age    income
0  0.00  0.000000
1  0.25  0.142857
2  0.50  0.428571
3  0.75  0.714286
4  1.00  1.000000


In [None]:
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.


In [None]:
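The Unit Vector technique in feature scaling, also known as "Normalization", rescales each sample's feature vector
to have a length of 1. Every value in a row is divided by the L2-norm of that row. This brings all feature vectors
to unit length and removes the effect of magnitude, so only the direction of each vector remains.

It differs from Min-Max scaling in what it operates on: Min-Max scaling works column by column and maps each feature
to a fixed range such as 0 to 1 using that feature's minimum and maximum, whereas the Unit Vector technique works
row by row and uses only the values within a single sample.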
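
As a rough illustration of the row-wise division by the L2-norm, here is a minimal NumPy sketch. The two samples are made-up values chosen only for the example.

```python
import numpy as np

# Hypothetical samples: each row is one observation (age, income)
X = np.array([[20.0, 25000.0],
              [40.0, 60000.0]])

# L2 norm of each row, kept as a column so it broadcasts across the row
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every value by the norm of its own row
X_unit = X / row_norms

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))  # length of each scaled row
```

Because income is far larger than age in every row, the scaled income ends up close to 1 and the scaled age close to 0.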

The Unit Vector technique can be easily implemented using the scikit-learn library in Python. We can use the
Normalizer class from the preprocessing module of scikit-learn to perform Unit Vector scaling.

In [2]:
import pandas as pd
from sklearn.preprocessing import Normalizer

# Create a sample dataframe
data = {'age': [20, 25, 30, 35, 40],
        'income': [25000, 30000, 40000, 50000, 60000]}
df = pd.DataFrame(data)

# Instantiate the Normalizer object
normalizer = Normalizer(norm='l2')

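# Apply Unit Vector scaling to each row of the dataframe
# (Normalizer is stateless; fit_transform is used so that parameters are validated)
df_scaled = pd.DataFrame(normalizer.fit_transform(df), columns=df.columns)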

# Print the scaled dataframe
print(df_scaled)


        age  income
0  0.000800     1.0
1  0.000833     1.0
2  0.000750     1.0
3  0.000700     1.0
4  0.000667     1.0




In [None]:
Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.


In [None]:
PCA, or Principal Component Analysis, is a statistical technique used for reducing the dimensionality of 
high-dimensional data while retaining most of its original variability. It works by transforming the original 
data into a lower-dimensional space, where the new dimensions are called principal components. These principal 
components are linear combinations of the original features, and each successive principal component captures
as much of the remaining variability in the data as possible. The first principal component is the linear
combination of the original features that captures the most variability in the data, the second principal
component captures the next most variability, and so on.

PCA is used for dimensionality reduction because it can reduce the number of features in a dataset while 
retaining most of the information present in the data. This can be useful in many applications, such as in 
image processing, where a large number of pixels can be compressed into a smaller number of principal components
without losing much information. PCA is also commonly used in data visualization and exploratory data analysis to 
identify patterns in high-dimensional data.
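
The description above can be sketched directly with NumPy: center the data, compute the covariance matrix, and use its eigenvectors as the principal components. The data here is synthetic and the variable names are assumptions made for this illustration; it is a sketch of the idea, not how scikit-learn implements PCA internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 features,
# the first two of which are strongly correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)),
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# 1. Center each feature at zero
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. Eigen-decomposition; eigh returns eigenvalues in ascending order,
#    so reorder them from largest to smallest
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Share of the total variance captured by each component
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# 5. Project the data onto the first two principal components
X_reduced = X_centered @ eigenvectors[:, :2]

print(explained_variance_ratio)
print(X_reduced.shape)
```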

In [3]:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load the iris dataset
iris = load_iris()

# Instantiate the PCA object
pca = PCA(n_components=2)

# Apply PCA to the dataset
X_pca = pca.fit_transform(iris.data)

# Print the explained variance ratio of the principal components
print(pca.explained_variance_ratio_)

# Print the transformed dataset
print(X_pca)


[0.92461872 0.05306648]
[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]
 [-2.28085963  0.74133045]
 [-2.82053775 -0.08946138]
 [-2.62614497  0.16338496]
 [-2.88638273 -0.57831175]
 [-2.6727558  -0.11377425]
 [-2.50694709  0.6450689 ]
 [-2.61275523  0.01472994]
 [-2.78610927 -0.235112  ]
 [-3.22380374 -0.51139459]
 [-2.64475039  1.17876464]
 [-2.38603903  1.33806233]
 [-2.62352788  0.81067951]
 [-2.64829671  0.31184914]
 [-2.19982032  0.87283904]
 [-2.5879864   0.51356031]
 [-2.31025622  0.39134594]
 [-2.54370523  0.43299606]
 [-3.21593942  0.13346807]
 [-2.30273318  0.09870885]
 [-2.35575405 -0.03728186]
 [-2.50666891 -0.14601688]
 [-2.46882007  0.13095149]
 [-2.56231991  0.36771886]
 [-2.63953472  0.31203998]
 [-2.63198939 -0.19696122]
 [-2.58739848 -0.20431849]
 [-2.4099325   0.41092426]
 [-2.64886233  0.81336382]
 [-2.59873675  1.09314576]
 [-2.63692688 -0.12132235]
 [-2.86624165  0.06936447]
 [-2

In [None]:
Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.


In [None]:
PCA and feature extraction are closely related concepts. In fact, PCA can be used as a feature extraction technique.

Feature extraction is the process of transforming raw data into a set of features that are more informative and
relevant for a particular task. Feature extraction is commonly used in machine learning and computer vision 
applications to reduce the dimensionality of the data and extract meaningful features that can be used for 
classification, clustering, or other tasks.

PCA is a technique for dimensionality reduction that can be used for feature extraction. By identifying the
principal components of a dataset, PCA can extract a set of features that capture most of the variability in 
the data. These features can be used for further analysis or modeling.

In [None]:
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Load the digits dataset
digits = load_digits()

# Instantiate the PCA object
pca = PCA(n_components=10)

# Apply PCA to the dataset
X_pca = pca.fit_transform(digits.data)

# Print the shape of the transformed dataset
print(X_pca.shape)
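

To show the extracted features being used for a downstream task, the sketch below feeds the 10 principal components into a classifier. The choice of a k-nearest-neighbors classifier, the 75/25 split, and the random seed are assumptions made for this illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# PCA extracts 10 features from the 64 pixel values;
# the classifier is trained on those 10 features only
model = make_pipeline(PCA(n_components=10), KNeighborsClassifier())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 10 PCA features: {accuracy:.3f}")
```

Putting PCA inside the pipeline means the components are learned from the training split only and then reused on the test split.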


In [None]:
Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.


In [None]:
In a recommendation system for a food delivery service, it's important to preprocess the data before using it for 
modeling. One common technique for preprocessing numerical features is Min-Max scaling.

Min-Max scaling transforms the features so that they have a minimum value of 0 and a maximum value of 1. 
This technique can help to normalize the data and make it easier to compare and analyze the different features.

To use Min-Max scaling to preprocess the features in the food delivery service dataset, you would follow these steps:

1. Identify the numerical features that need to be scaled. In this case, the features might include price, rating,
   and delivery time.

2. Calculate the minimum and maximum values for each feature in the dataset.

3. Use the formula (x - min) / (max - min) to scale each feature, where x is the original value, min is the minimum 
   value for the feature, and max is the maximum value for the feature.

4. Replace the original values with the scaled values in the dataset.

In [None]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Load the food delivery service dataset
df = pd.read_csv('food_delivery_service.csv')

# Identify the numerical features that need to be scaled
features_to_scale = ['price', 'rating', 'delivery_time']

# Instantiate the MinMaxScaler object
scaler = MinMaxScaler()

# Apply Min-Max scaling to the numerical features
df[features_to_scale] = scaler.fit_transform(df[features_to_scale])

# Print the preprocessed dataset
print(df.head())
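

When the scaled data feeds a model that is evaluated on held-out rows, a common precaution is to learn the minimum and maximum from the training rows only. The sketch below illustrates this with a small made-up table; the values, the split size, and the random seed are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up rows standing in for the real dataset (illustrative values only)
df = pd.DataFrame({
    'price': [5.0, 12.5, 8.0, 20.0, 15.0, 9.5],
    'rating': [3.5, 4.8, 4.0, 4.9, 2.9, 4.2],
    'delivery_time': [45, 30, 25, 60, 35, 40],
})

train_df, test_df = train_test_split(df, test_size=0.33, random_state=0)

scaler = MinMaxScaler()

# Learn min and max from the training rows only
train_scaled = pd.DataFrame(scaler.fit_transform(train_df),
                            columns=df.columns, index=train_df.index)

# Reuse the training min and max; test values may fall outside [0, 1]
test_scaled = pd.DataFrame(scaler.transform(test_df),
                           columns=df.columns, index=test_df.index)

print(train_scaled)
print(test_scaled)
```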


In [None]:
Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.


In [None]:
When working with a large dataset containing many features, such as in the case of predicting stock prices, it can 
be helpful to use dimensionality reduction techniques like PCA to reduce the number of features and simplify the data.

PCA (Principal Component Analysis) is a technique that can be used to identify patterns in data and reduce the 
dimensionality of the dataset by transforming it into a new set of variables called principal components. 
These principal components are linear combinations of the original features, and each component captures a certain amount of the variation in the data.

Here's how you could use PCA to reduce the dimensionality of a dataset for predicting stock prices:

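First, preprocess the data by standardizing the features so that they have a mean of 0 and a standard deviation of 1.
This is important for PCA because it looks for directions of maximum variance: without standardization, features
measured on large scales would dominate the principal components regardless of how informative they are.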

Use PCA to transform the data into a new set of principal components. The number of principal components you choose 
will depend on the amount of variance you want to preserve in the data. You can use the scikit-learn library in Python to perform PCA on the data:

The code cell for this answer (shown after this explanation) starts from a DataFrame df holding the dataset and
separates the features from the target variable. It then standardizes the features using the StandardScaler class
from scikit-learn and applies PCA with two components using the PCA class. The fit_transform method fits the PCA
model to the standardized features and returns the transformed data.

After obtaining the new set of principal components, you can use them as inputs to train your prediction model.
The components are ordered by the amount of variation they explain, so you keep the leading ones and drop the rest.
PCA reduces the dimensionality of the dataset by merging correlated, partly redundant features into fewer components.
Note that PCA ignores the target variable, so the components with the most variance are not guaranteed to be the
most useful for predicting stock prices; whether the reduced data helps the model should be checked on held-out data.

In [None]:
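import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the dataset ('stock_data.csv' is a hypothetical file name; a 'target' column is assumed)
df = pd.read_csv('stock_data.csv')

# Separate the features from the target variable
X = df.drop('target', axis=1)
y = df['target']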

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA on the standardized features
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)


In [None]:
Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.


In [5]:
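import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = [1, 5, 10, 15, 20]

# MinMaxScaler scales each column, so the values must form a single column
# (5 samples, 1 feature) rather than a single row of 5 features
X = np.array(data, dtype=float).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(X)

print(scaled_data)


[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]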
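

The target range of -1 to 1 can also be reached by applying the generalized Min-Max formula by hand. This is a minimal NumPy sketch; the names a and b for the bounds of the target range are introduced here for illustration.

```python
import numpy as np

data = np.array([1, 5, 10, 15, 20], dtype=float)
a, b = -1.0, 1.0  # lower and upper bound of the target range

# Generalized formula: x' = a + (x - min) * (b - a) / (max - min)
scaled = a + (data - data.min()) * (b - a) / (data.max() - data.min())

print(scaled)
```

The minimum value 1 maps to -1, the maximum value 20 maps to 1, and the values in between are spread linearly across the range.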


In [None]:
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [None]:
The number of principal components to retain in PCA depends on the amount of variance we want to preserve in the
original data. We want to retain as much variance as possible while reducing the dimensionality of the data. 
A commonly used criterion is to retain enough principal components to explain a certain percentage of the variance 
in the data.

To determine how many principal components to retain in this example, we would need to calculate the percentage of 
variance explained by each principal component. This can be done by looking at the eigenvalues of the covariance 
matrix generated by the data.

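Without knowing the specific dataset, it's difficult to give an exact answer. Before running PCA, gender would have
to be encoded numerically (for example as 0/1), and all features should be standardized, because height, weight, age,
and blood pressure are measured in different units. We could then look at the scree plot, which shows the proportion
of variance explained by each principal component, and choose the smallest number of components whose cumulative
explained variance reaches a chosen threshold (commonly around 90-95%), while also keeping in mind the practical
implications of using a lower-dimensional representation of the data.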
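
The selection procedure can be sketched as follows. Since no real dataset is given, the data is synthetic and invented purely for illustration, gender is assumed to be already encoded as 0/1, and the 95% threshold is an assumption; the number printed here says nothing about what a real dataset would yield.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data, invented purely for illustration
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)
age = rng.integers(20, 70, n)
df = pd.DataFrame({
    'height': height,
    'weight': weight,
    'age': age,
    'gender': rng.integers(0, 2, n),  # assumed to be already encoded as 0/1
    'blood_pressure': 90 + 0.5 * age + rng.normal(0, 8, n),
})

# Standardize so that no feature dominates because of its units
X_scaled = StandardScaler().fit_transform(df)

# Fit with all components to inspect how the variance is distributed
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches 95%
n_keep = int(np.argmax(cumulative >= 0.95) + 1)

print(cumulative)
print(n_keep)
```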