Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.



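Min-Max scaling is a common data preprocessing technique used to normalize the numeric features of a dataset. It rescales each feature independently to a fixed range, usually 0 to 1, using that feature's minimum and maximum values. This puts features measured in different units on a comparable scale, which matters for distance-based and gradient-based methods. Because only the minimum and maximum are used, a single extreme outlier can squeeze all other values into a narrow part of the range.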

The formula for Min-Max scaling is:

scaled_value = (value - min_value) / (max_value - min_value)



In [1]:

from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales
data = [[500, 1],
        [1000, 2],
        [1500, 3],
        [2000, 4]]

# Learn each column's min and max, then rescale every column to [0, 1]
min_max = MinMaxScaler()
scaled_data = min_max.fit_transform(data)




In [2]:
print(scaled_data)

[[0.         0.        ]
 [0.33333333 0.33333333]
 [0.66666667 0.66666667]
 [1.         1.        ]]
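MinMaxScaler applies the formula above to each column. The minimal NumPy sketch below computes the same result by hand on the same data. It assumes every column has max_value > min_value; a constant column would need special handling to avoid division by zero.

```python
import numpy as np

data = np.array([[500, 1],
                 [1000, 2],
                 [1500, 3],
                 [2000, 4]], dtype=float)

# Per-column minimum and maximum (axis=0 works down each feature column)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# scaled_value = (value - min_value) / (max_value - min_value), column-wise
manual_scaled = (data - col_min) / (col_max - col_min)
print(manual_scaled)
```

Both columns come out as 0, 1/3, 2/3, 1, matching the scaler's output above.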


Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.



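The Unit Vector technique, also known as normalization or scaling by vector norm, rescales each sample (row) of the dataset so that its feature vector has a norm, or length, of 1. Each vector keeps its direction but loses its magnitude.

It differs from Min-Max scaling in where the scaling is computed:

- Min-Max scaling works column by column. It maps each feature to a fixed range using that feature's minimum and maximum.
- Unit vector scaling works row by row. It divides every sample by its own norm, which is the L2 norm by default in scikit-learn's normalize function.

As a result, no column is guaranteed to span [0, 1]; instead, every row ends up with length 1. This is useful when the direction of a sample matters more than its magnitude, for example with cosine similarity.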

In [3]:
from sklearn.preprocessing import normalize

# Example dataset
data = [[2, 3],
        [1, 4],
        [3, 5]]

# Applying Unit Vector scaling to the data
scaled_data = normalize(data, norm='l2')

# Printing the scaled values
print("Scaled data:\n", scaled_data)


Scaled data:
 [[0.5547002  0.83205029]
 [0.24253563 0.9701425 ]
 [0.51449576 0.85749293]]
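The difference from Min-Max scaling is easiest to see by computing both by hand on the same data. Unit vector scaling divides each row by its own L2 norm, while Min-Max scaling works down each column. This is a minimal NumPy sketch, not how scikit-learn implements either method internally.

```python
import numpy as np

data = np.array([[2, 3],
                 [1, 4],
                 [3, 5]], dtype=float)

# Unit vector scaling: divide each row by its Euclidean (L2) length
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_scaled = data / row_norms

# Min-Max scaling: rescale each column to [0, 1] using that column's min and max
minmax_scaled = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

print("Unit vector scaled:\n", unit_scaled)
print("Row lengths after unit scaling:", np.linalg.norm(unit_scaled, axis=1))
print("Min-Max scaled:\n", minmax_scaled)
```

Every row of unit_scaled has length 1 and matches the normalize output above. In minmax_scaled, by contrast, each column spans exactly [0, 1] while the row lengths vary.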


Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.



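PCA, which stands for Principal Component Analysis, is a widely used dimensionality reduction technique in machine learning and data analysis. It finds a new set of orthogonal axes, the principal components, ordered by how much of the data's variance each one captures. Projecting the data onto only the first k components reduces the number of dimensions while keeping as much of the variance (the main patterns in the data) as possible. The explained variance ratio reports the share of the total variance captured by each component.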

In [4]:
import numpy as np
from sklearn.decomposition import PCA

# Example dataset
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])

# Creating an instance of PCA with 2 components
pca = PCA(n_components=2)

# Fitting the PCA model to the data and transforming it
reduced_data = pca.fit_transform(data)

# Printing the reduced data and explained variance ratio
print("Reduced data:\n", reduced_data)
print("Explained variance ratio:", pca.explained_variance_ratio_)


Reduced data:
 [[-7.79422863  0.        ]
 [-2.59807621  0.        ]
 [ 2.59807621  0.        ]
 [ 7.79422863 -0.        ]]
Explained variance ratio: [1. 0.]
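To show what PCA computes, here is a minimal from-scratch sketch using NumPy's SVD:

1. Center the data.
2. Take the right singular vectors as the principal directions.
3. Project the centered data onto the first two directions.

This follows the standard SVD formulation of PCA. Solvers and sign conventions can differ, so a component may come out with its sign flipped relative to scikit-learn.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]], dtype=float)

# 1. Center each feature at zero mean
centered = data - data.mean(axis=0)

# 2. SVD of the centered data: rows of Vt are the principal directions
_, S, Vt = np.linalg.svd(centered, full_matrices=False)

# 3. Variance along each direction is S**2 / (n_samples - 1)
explained_variance = S**2 / (data.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# 4. Project onto the first 2 directions to get the reduced data
reduced = centered @ Vt[:2].T

print("Reduced data:\n", reduced)
print("Explained variance ratio:", explained_variance_ratio[:2])
```

Up to sign, the first column of reduced matches the scikit-learn output above. All four rows lie on one straight line, so the first direction carries all the variance and the second carries none. That is why the ratio is [1. 0.].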


Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.



PCA can be used as a feature extraction technique as well as a dimensionality reduction method. In feature extraction, PCA builds new features rather than selecting a subset of the original ones. It transforms the original features into a new set of features, called principal components, each of which is a linear combination of all the original features. The components capture the most important patterns (the directions of greatest variance) in the data, so keeping the first few gives a compressed representation of the original features.

In [5]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Target

# Standardize the features (zero mean, unit variance per column);
# equivalent to (X - X.mean(axis=0)) / X.std(axis=0)
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Apply PCA for feature extraction
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)

# Print the transformed features
print("Transformed features:\n", X_pca)


Transformed features:
 [[-2.26470281  0.4800266 ]
 [-2.08096115 -0.67413356]
 [-2.36422905 -0.34190802]
 [-2.29938422 -0.59739451]
 [-2.38984217  0.64683538]
 [-2.07563095  1.48917752]
 [-2.44402884  0.0476442 ]
 [-2.23284716  0.22314807]
 [-2.33464048 -1.11532768]
 [-2.18432817 -0.46901356]
 [-2.1663101   1.04369065]
 [-2.32613087  0.13307834]
 [-2.2184509  -0.72867617]
 [-2.6331007  -0.96150673]
 [-2.1987406   1.86005711]
 [-2.26221453  2.68628449]
 [-2.2075877   1.48360936]
 [-2.19034951  0.48883832]
 [-1.898572    1.40501879]
 [-2.34336905  1.12784938]
 [-1.914323    0.40885571]
 [-2.20701284  0.92412143]
 [-2.7743447   0.45834367]
 [-1.81866953  0.08555853]
 [-2.22716331  0.13725446]
 [-1.95184633 -0.62561859]
 [-2.05115137  0.24216355]
 [-2.16857717  0.52714953]
 [-2.13956345  0.31321781]
 [-2.26526149 -0.3377319 ]
 [-2.14012214 -0.50454069]
 [-1.83159477  0.42369507]
 [-2.61494794  1.79357586]
 [-2.44617739  2.15072788]
 [-2.10997488 -0.46020184]
 [-2.2078089  -0.2061074 ]
 ...
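Each extracted feature is a linear combination of the four standardized Iris measurements. The weights are stored in pca.components_. The sketch below repeats the setup so it runs on its own. It prints those weights and checks that projecting the centered data onto them reproduces X_pca.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_standardized = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_standardized)

# Weights (loadings) of each original feature in each principal component
for i, component in enumerate(pca.components_):
    weights = ", ".join(f"{name}: {w:+.3f}" for name, w in zip(iris.feature_names, component))
    print(f"PC{i + 1} = {weights}")

# Share of the total variance captured by each extracted feature
print("Explained variance ratio:", pca.explained_variance_ratio_)

# The new features are the centered data projected onto the components
X_manual = (X_standardized - pca.mean_) @ pca.components_.T
print("Matches fit_transform:", np.allclose(X_manual, X_pca))
```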

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.
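
Price, rating, and delivery time are on very different numeric scales. Without scaling, the features with larger values would dominate any distance- or similarity-based recommendation. Min-Max scaling removes this imbalance in two steps. First, fit a MinMaxScaler on the data so that it learns each column's minimum and maximum. Then transform every column to the range [0, 1] with scaled_value = (value - min_value) / (max_value - min_value). In the result, 0 marks the lowest price, rating, or delivery time in the data, and 1 marks the highest.
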
In [6]:
from sklearn.preprocessing import MinMaxScaler

# Example dataset; columns: price, rating, delivery time
data = [[10, 4.5, 30],
        [20, 3.8, 40],
        [15, 4.0, 35],
        [25, 4.2, 45]]

# Create an instance of MinMaxScaler
scaler = MinMaxScaler()

# Fit the scaler to the data and transform it
scaled_data = scaler.fit_transform(data)

# Print the scaled data
print("Scaled data:\n", scaled_data)


Scaled data:
 [[0.         1.         0.        ]
 [0.66666667 0.         0.66666667]
 [0.33333333 0.28571429 0.33333333]
 [1.         0.57142857 1.        ]]
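In a recommendation setting, the scaler should be fitted once on the existing items and then reused, so that new restaurants or dishes are mapped onto the same scale. The sketch below assumes the same three columns (price, rating, delivery time), and the new item is hypothetical. Values outside the fitted range land outside [0, 1], because MinMaxScaler does not clip by default.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Existing items: columns are price, rating, delivery time
catalogue = np.array([[10, 4.5, 30],
                      [20, 3.8, 40],
                      [15, 4.0, 35],
                      [25, 4.2, 45]])

scaler = MinMaxScaler()
scaler.fit(catalogue)  # learns per-column min and max from the existing items only

print("Learned minimums:", scaler.data_min_)
print("Learned maximums:", scaler.data_max_)

# Hypothetical new item, scaled with the already-fitted min/max (no refitting)
new_item = np.array([[30, 4.1, 25]])
scaled_new = scaler.transform(new_item)
print("Scaled new item:", scaled_new)

# inverse_transform maps scaled values back to the original units
print("Recovered:", scaler.inverse_transform(scaled_new))
```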


Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.
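
Stock-prediction datasets often contain many correlated features, such as several financial ratios or overlapping trend indicators. This adds noise and raises the risk of overfitting. To reduce the features with PCA:

1. Standardize the features so that large-valued ones do not dominate the variance.
2. Fit PCA on the standardized training data.
3. Use the explained variance ratio to decide how many components to keep.
4. Transform the data and train the prediction model on the principal components instead of the raw features.

In the toy example below, 3 components explain all of the variance. With only 4 samples, the centered data has at most 3 directions of nonzero variance.
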
In [7]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example dataset
dataset = np.array([[100, 50, 200, 0.05, 0.01],
                    [150, 40, 180, 0.04, 0.02],
                    [120, 45, 220, 0.06, 0.03],
                    [130, 55, 210, 0.07, 0.01]])

# Preprocess the data by standardizing the features
scaler = StandardScaler()
scaled_dataset = scaler.fit_transform(dataset)

# Create an instance of PCA with the desired number of components
pca = PCA(n_components=3)

# Fit PCA to the preprocessed data
pca.fit(scaled_dataset)

# Analyze explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_
print("Explained variance ratio:", explained_variance_ratio)

# Transform the data using PCA
reduced_dataset = pca.transform(scaled_dataset)

# Print the reduced dataset
print("Reduced dataset:\n", reduced_dataset)


Explained variance ratio: [0.56703854 0.27809174 0.15486973]
Reduced dataset:
 [[-0.67124338 -1.0825524   1.24392235]
 [ 2.76094766 -0.3810689  -0.40028267]
 [-0.30866636  1.99149472  0.29719392]
 [-1.78103793 -0.52787342 -1.14083359]]
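For a stock-price model, the scaler and PCA should be fitted on the training period only and then applied unchanged to later data. Otherwise, information from the future leaks into the preprocessing. Below is a minimal sketch using scikit-learn's Pipeline. It makes three assumptions:

- Randomly generated placeholder data stands in for real financial features.
- The 95% variance target is an illustrative choice.
- svd_solver="full" is set so that a fractional n_components selects the number of components by explained variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Placeholder data: 200 days x 10 features built from 3 hidden factors plus noise
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
features = factors @ mixing + 0.1 * rng.normal(size=(200, 10))

# Chronological split: fit on the earlier period, apply to the later one
train, test = features[:150], features[150:]

reducer = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95, svd_solver="full")),  # keep 95% of variance
])
train_reduced = reducer.fit_transform(train)
test_reduced = reducer.transform(test)  # reuses training mean, std and components

pca = reducer.named_steps["pca"]
print("Components kept:", pca.n_components_)
print("Cumulative variance:", np.cumsum(pca.explained_variance_ratio_))
print("Train/test shapes:", train_reduced.shape, test_reduced.shape)
```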


Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.


In [8]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Given dataset
data = np.array([1, 5, 10, 15, 20])

# Create an instance of MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))

# Reshape the data to 2D as MinMaxScaler expects a 2D array
data_reshaped = data.reshape(-1, 1)

# Fit and transform the data using MinMaxScaler
min_max_scaled_data = scaler.fit_transform(data_reshaped)

# Flatten the scaled data to 1D array
min_max_scaled_data = min_max_scaled_data.flatten()

# Print the Min-Max scaled data
print("Min-Max scaled data:\n", min_max_scaled_data)


Min-Max scaled data:
 [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
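For a target range [a, b], the scaled value is a + (value - min_value) * (b - a) / (max_value - min_value), which is equivalent to what MinMaxScaler computes with feature_range=(a, b). Checking that by hand with a = -1 and b = 1:

```python
import numpy as np

data = np.array([1, 5, 10, 15, 20], dtype=float)
a, b = -1.0, 1.0  # target range

# Map [min, max] linearly onto [a, b]
scaled = a + (data - data.min()) * (b - a) / (data.max() - data.min())
print(scaled)
```

For example, 5 maps to -1 + (5 - 1) * 2 / 19 = -11/19 ≈ -0.5789, matching the scaler's output above.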


Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [9]:
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example dataset; columns: height, weight, age, gender (0/1), blood pressure
dataset = np.array([[170, 65, 30, 0, 120],
                    [160, 55, 40, 1, 130],
                    [175, 70, 35, 1, 125],
                    [180, 75, 45, 0, 140]])

# Preprocess the data by standardizing the features
scaler = StandardScaler()
scaled_dataset = scaler.fit_transform(dataset)

# Create an instance of PCA
pca = PCA()

# Fit PCA to the preprocessed data
pca.fit(scaled_dataset)

# Analyze explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Print the explained variance ratio
print("Explained variance ratio:", explained_variance_ratio)

# Determine the number of principal components to retain
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)
num_components_to_retain = np.argmax(cumulative_variance_ratio >= 0.95) + 1

# Print the number of principal components to retain
print("Number of principal components to retain:", num_components_to_retain)

Explained variance ratio: [5.61656653e-01 3.17805122e-01 1.20538225e-01 4.30805467e-33]
Number of principal components to retain: 3


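I would retain 3 principal components, choosing the number from the explained variance ratio. The rule is to keep the smallest number of components whose cumulative explained variance reaches a chosen threshold, here 95%:

- The first two components explain about 56.2% and 31.8%, together about 87.9%, which is below the threshold.
- Adding the third (about 12.1%) brings the total to essentially 100%.
- The fourth component explains practically nothing (about 4e-33, zero up to rounding), because 4 samples leave at most 3 directions of nonzero variance after centering.

With a realistic number of patients the ratios would differ, so the same threshold rule should be rerun on the full dataset.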
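scikit-learn can apply the same 95% rule directly. Passing a float between 0 and 1 as n_components, with svd_solver="full", keeps the smallest number of components whose cumulative explained variance exceeds that fraction. A sketch on the same toy dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Columns: height, weight, age, gender (0/1), blood pressure
dataset = np.array([[170, 65, 30, 0, 120],
                    [160, 55, 40, 1, 130],
                    [175, 70, 35, 1, 125],
                    [180, 75, 45, 0, 140]])

scaled_dataset = StandardScaler().fit_transform(dataset)

# Keep enough components to explain more than 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
reduced = pca.fit_transform(scaled_dataset)

print("Components kept:", pca.n_components_)
print("Reduced shape:", reduced.shape)
```

This should select the same 3 components as the manual cumulative-sum check above, so the rule does not need to be coded by hand.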