In [None]:
Min-Max scaling is a data preprocessing technique that rescales each numerical feature to a fixed range,
usually [0, 1]. For each value, subtract the feature's minimum and divide by the feature's range
(maximum minus minimum): x_scaled = (x - x_min) / (x_max - x_min).
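
As a quick check of the arithmetic, here is a minimal NumPy sketch that applies the same steps by hand. The array reuses the house ages from the cell below; the variable names `ages` and `scaled_ages` are only illustrative.

```python
import numpy as np

# House ages (same values as the age_of_houses column in the next cell)
ages = np.array([20, 50, 80], dtype=float)

# Subtract the minimum, then divide by the range (max - min)
scaled_ages = (ages - ages.min()) / (ages.max() - ages.min())

print(scaled_ages)
```

The smallest value maps to 0, the largest to 1, and 50 lands halfway at 0.5. A constant feature would have a range of zero, which this sketch does not handle.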

In [1]:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Sample dataset
data = {
    'age_of_houses': [20, 50, 80],
    'size_of_houses': [1000, 3000, 5000]
}
df = pd.DataFrame(data)

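# Fit the scaler on the data and rescale each column to [0, 1]
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)

# fit_transform returns a NumPy array, so rebuild a DataFrame with the original column names
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)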

print("Original Data:")
print(df)
print("\nScaled Data:")
print(scaled_df)


Original Data:
   age_of_houses  size_of_houses
0             20            1000
1             50            3000
2             80            5000

Scaled Data:
   age_of_houses  size_of_houses
0            0.0             0.0
1            0.5             0.5
2            1.0             1.0


In [None]:
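The Unit Vector technique in feature scaling, also known as normalization, rescales each sample (row) so that
its vector has a magnitude (norm) of 1. Unlike Min-Max scaling, it operates across the features of one
sample rather than down a single feature column.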
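
To see what unit-vector scaling does, here is a minimal NumPy sketch that divides each row by its own L2 (Euclidean) norm. The values match the x/y example in the next cell; the names `X`, `row_norms`, and `X_unit` are only illustrative.

```python
import numpy as np

# Rows are samples; values match the x/y example in the next cell
X = np.array([[2.0, 1.0],
              [4.0, 3.0],
              [6.0, 5.0]])

# L2 norm of each row, kept as a column so it broadcasts across the row
row_norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide every row by its own norm
X_unit = X / row_norms

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))  # length of each rescaled row
```

For the second row, the norm of (4, 3) is 5, so the row becomes (0.8, 0.6).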

In [2]:
from sklearn.preprocessing import Normalizer
import pandas as pd


data = {
    'x': [2, 4, 6],
    'y': [1, 3, 5]
}
df = pd.DataFrame(data)


# norm='l2' rescales each row (sample) to unit Euclidean length
normalizer = Normalizer(norm='l2')


normalized_data = normalizer.fit_transform(df)


normalized_df = pd.DataFrame(normalized_data, columns=df.columns)

print("Original Data:")
print(df)
print("\nNormalized Data:")
print(normalized_df)


Original Data:
   x  y
0  2  1
1  4  3
2  6  5

Normalized Data:
          x         y
0  0.894427  0.447214
1  0.800000  0.600000
2  0.768221  0.640184


In [None]:
PCA (Principal Component Analysis) is a dimensionality reduction technique used to reduce the number of 
features (or dimensions) in a dataset while retaining as much variance as possible. It does this by 
transforming the original features into a new set of orthogonal (uncorrelated) features called principal 
components. These principal components are ordered by the amount of variance they explain in the data, 
with the first component explaining the most variance.

PCA is used in dimensionality reduction to:

Reduce Overfitting: With fewer input dimensions, a model has less room to fit noise, which can reduce
overfitting.
Improve Model Performance: Concentrating most of the variance in a few components can remove redundancy
and speed up training, although the effect on accuracy depends on the dataset.
Visualize High-dimensional Data: Projecting onto the first two or three components lets high-dimensional
data be plotted in 2D or 3D.
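
To make "orthogonal components ordered by variance" concrete, here is a minimal NumPy sketch of PCA computed through an SVD of the centered data. The toy matrix `X` is made up for illustration, and this is only one way to compute the components by hand.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 3 features
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 1.6]])

# Center each feature; PCA is defined on mean-centered data
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance explained by each component and its share of the total
explained_variance = S**2 / (X.shape[0] - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Project the centered data onto the principal directions
scores = X_centered @ Vt.T

print("Explained variance ratio:", explained_variance_ratio)
print("Covariance of scores:\n", np.cov(scores, rowvar=False))
```

The off-diagonal entries of the printed covariance matrix are zero up to floating-point error, which is what "uncorrelated components" means in practice.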

In [4]:
from sklearn.decomposition import PCA
import pandas as pd


data = {
    'x1': [1.0, 4.0, 7.0],
    'x2': [2.0, 5.0, 8.0],
    'x3': [3.0, 6.0, 9.0]
}
df = pd.DataFrame(data)


# Keep the first two principal components
pca = PCA(n_components=2)

# PCA centers the data internally before projecting it
pca_data = pca.fit_transform(df)

pca_df = pd.DataFrame(data=pca_data, columns=['PC1', 'PC2'])

print("Original Data:")
print(df)
print("\nPCA-transformed Data:")
print(pca_df)
print("\nExplained Variance Ratio:")
print(pca.explained_variance_ratio_)


Original Data:
    x1   x2   x3
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  7.0  8.0  9.0

PCA-transformed Data:
        PC1           PC2
0 -5.196152  2.563950e-16
1  0.000000  0.000000e+00
2  5.196152  2.563950e-16

Explained Variance Ratio:
[1.00000000e+00 2.43475588e-33]


In [None]:
PCA can be used for feature extraction by transforming the original features into a new set of features 
(principal components) that capture the most important information in the data. This reduces the 
dimensionality of the dataset while retaining as much variance as possible, making it useful for 
reducing overfitting and improving model performance.

In [5]:
from sklearn.decomposition import PCA
import pandas as pd


data = {
    'feature1': [1, 2, 3, 4],
    'feature2': [4, 3, 2, 1],
    'feature3': [1, 3, 2, 4],
    'feature4': [2, 4, 1, 3]
}
df = pd.DataFrame(data)


pca = PCA(n_components=2)

extracted_features = pca.fit_transform(df)

extracted_df = pd.DataFrame(data=extracted_features, columns=['PC1', 'PC2'])

print("Original Data:")
print(df)
print("\nExtracted Features (PC1 and PC2):")
print(extracted_df)


Original Data:
   feature1  feature2  feature3  feature4
0         1         4         1         2
1         2         3         3         4
2         3         2         2         1
3         4         1         4         3

Extracted Features (PC1 and PC2):
            PC1       PC2
0  2.645751e+00 -0.000000
1 -5.192593e-16  1.732051
2  3.115556e-16 -1.732051
3 -2.645751e+00 -0.000000


In [None]:
Understand the Dataset: Start by understanding the dataset and the features it
contains, such as price, rating, and delivery time.

Apply Min-Max Scaling: For each feature (price, rating, delivery time), apply Min-Max scaling to 
scale the values to a range between 0 and 1. This ensures that all features are on a similar scale,
which is important for many machine learning algorithms.

Interpretation: After scaling the data, the features (price, rating, delivery time) will be transformed
to a range between 0 and 1. This makes it easier to compare and analyze the features in the context of the
recommendation system.
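
The steps above can be sketched as follows. The dataset is hypothetical: the column names `price`, `rating`, and `delivery_time` and all values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical recommendation data; the values are made up for illustration
df = pd.DataFrame({
    'price': [10.0, 25.0, 40.0],
    'rating': [3.0, 4.0, 5.0],
    'delivery_time': [15.0, 30.0, 60.0],
})

# Scale every column independently to the [0, 1] range
scaler = MinMaxScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(scaled_df)
```

After scaling, a price of 25 and a rating of 4 both map to 0.5, so the two features can be compared directly even though their original units differ.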

In [None]:
Standardize the Data: Ensure that each feature has a mean of 0 and a standard deviation of 1 to make all 
features contribute equally to the principal components.

Apply PCA: Transform the standardized data into its principal components, which identify the directions 
along which the data varies the most.

Select the Number of Components: Choose the number of principal components to retain based on the explained
variance ratio, indicating the proportion of variance in the original data explained by each component.

Project the Data: Project the standardized data onto the selected principal components to obtain the
reduced-dimensional dataset.
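
The four steps above can be sketched end to end as follows. The data is synthetic and hypothetical (four features built from two underlying signals), and the 95% threshold is an assumed choice, not a fixed rule. Passing a float to `n_components` with `svd_solver='full'` asks scikit-learn to keep just enough components to exceed that share of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 6 samples, 4 features built from 2 underlying signals
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 2))
X = np.column_stack([
    base[:, 0],
    2.0 * base[:, 0] + 0.01 * rng.normal(size=6),
    base[:, 1],
    -base[:, 1] + 0.01 * rng.normal(size=6),
])

# Step 1: standardize each feature to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)

# Steps 2 and 3: fit PCA and keep enough components for 95% of the variance
pca = PCA(n_components=0.95, svd_solver='full')

# Step 4: project the standardized data onto the retained components
X_reduced = pca.fit_transform(X_std)

print("Components kept:", pca.n_components_)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reduced shape:", X_reduced.shape)
```

Because the four features carry only two independent signals, at most two components are needed here.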

In [6]:
import numpy as np


data = np.array([1, 5, 10, 15, 20])

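# Target range for the scaled values
min_val = -1
max_val = 1

# Observed range of the data
min_data = np.min(data)
max_data = np.max(data)

# Map to [0, 1] first, then stretch and shift to [min_val, max_val]
scaled_data = ((data - min_data) / (max_data - min_data)) * (max_val - min_val) + min_val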

print("Original Data:", data)
print("Scaled Data:", scaled_data)


Original Data: [ 1  5 10 15 20]
Scaled Data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]


In [9]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import pandas as pd


data = {
    'height': [170, 180, 165, 175, 160],
    'weight': [70, 80, 60, 75, 55],
    'age': [30, 35, 25, 40, 22],
    'gender': ['male', 'male', 'female', 'female', 'female'],
    'blood_pressure': [120, 130, 110, 125, 105]
}
df = pd.DataFrame(data)


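# Encode the categorical gender column as 0/1 so it can be used numerically
df['gender'] = df['gender'].map({'male': 0, 'female': 1})

# Standardize every feature to mean 0 and standard deviation 1
scaler = StandardScaler()
standardized_data = scaler.fit_transform(df[['height', 'weight', 'age', 'gender', 'blood_pressure']])

# Fit PCA with all components to inspect the full variance spectrum
pca = PCA()
pca.fit(standardized_data)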

# Smallest number of components whose cumulative explained variance reaches 95%.
# With all components kept, the ratios sum to 1, so the threshold is simply 0.95.
cumulative_variance = pca.explained_variance_ratio_.cumsum()
n_components = int((cumulative_variance >= 0.95).argmax()) + 1

print("Number of Principal Components to Retain:", n_components)


Number of Principal Components to Retain: 2
