In [50]:
# Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
# application.

In [51]:
# Min-Max scaling, often called normalization, is a preprocessing technique that linearly rescales each feature to a fixed
# range, typically [0, 1], using x_scaled = (x - min) / (max - min). It matters when features have very different scales,
# because otherwise large-valued features dominate algorithms that rely on distance metrics (e.g. k-NN, k-means).

In [52]:
# Example:
# For a feature Age with values ranging from 20 to 60, Min-Max scaling maps 20 to 0, 60 to 1, and 40 to
# (40 - 20) / (60 - 20) = 0.5.

In [53]:
# Application:
# Min-Max scaling is available in Python as sklearn's MinMaxScaler. It is commonly applied before training models that
# are sensitive to feature scale, where it can improve performance and help gradient-based training converge faster.

# Overall, Min-Max scaling puts features on a common scale, making them suitable for algorithms that are sensitive to
# feature scales. Because it depends on the minimum and maximum, it is sensitive to outliers.
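
# A minimal sketch of the Age example, assuming five illustrative ages between 20 and 60 (the values and the names
# ages, age_scaler, and ages_scaled are made up for illustration, not taken from a real dataset).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical ages; MinMaxScaler expects a 2-D array of shape (n_samples, n_features)
ages = np.array([[20], [30], [40], [50], [60]], dtype=float)

age_scaler = MinMaxScaler()  # default feature_range is (0, 1)
ages_scaled = age_scaler.fit_transform(ages)

print(ages_scaled.ravel())  # [0.   0.25 0.5  0.75 1.  ]
```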

In [54]:
# Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
# Provide an example to illustrate its application.

In [55]:
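# Unit Vector scaling divides each sample (row) vector by its norm, usually the L2 norm, so that it has a magnitude
# of 1. The direction of the vector is preserved; only its length changes.

# Min-Max scaling, by contrast, rescales each feature (column) independently to a specific range, focusing on the spread
# of that feature's values rather than the overall magnitude of a sample.

# Example: the vector [3, 4] has L2 norm 5, so Unit Vector scaling turns it into [0.6, 0.8].

# Application: Unit Vector scaling is useful when the directionality of data points is important, such as in certain text
# mining and clustering tasks.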
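
# A minimal sketch of Unit Vector scaling using sklearn's Normalizer, which rescales each row to unit norm. The two
# sample points are hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import numpy as np
from sklearn.preprocessing import Normalizer

# Two hypothetical samples (rows) with two features each
points = np.array([[3.0, 4.0],
                   [1.0, 0.0]])

# Normalizer divides each row by its L2 norm: x / ||x||
unit_points = Normalizer(norm="l2").fit_transform(points)

print(unit_points)                          # first row becomes [0.6, 0.8]
print(np.linalg.norm(unit_points, axis=1))  # every row now has length 1
```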

In [56]:
# Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
# example to illustrate its application.

In [57]:
# PCA is a technique for reducing the dimensionality of a dataset by transforming it into a set of uncorrelated variables
# called principal components.

# How It Works: PCA identifies the directions (principal components) that capture the most variance in the data and projects 
# the data onto these components.

# Application: PCA is used to reduce the number of features in a dataset while retaining as much variance (information) as
# possible, which is helpful in reducing computation time, simplifying models, and visualizing data.

In [58]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, StandardScaler

In [59]:
data = np.array([[2.5, 2.4, 3.0, 4.0],
                 [0.5, 0.7, 1.0, 2.0],
                 [2.2, 2.9, 2.8, 3.9],
                 [1.9, 2.2, 3.1, 3.7],
                 [3.1, 3.0, 3.4, 4.1]])

In [60]:
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)

In [61]:
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data_standardized)

In [62]:
print("Original Data Shape:", data.shape)
print("PCA Transformed Data Shape:", data_pca.shape)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

Original Data Shape: (5, 4)
PCA Transformed Data Shape: (5, 2)
Explained Variance Ratio: [0.96067338 0.01999825]


In [63]:
# Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
# Extraction? Provide an example to illustrate this concept.

In [64]:
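# PCA as Feature Extraction: PCA transforms the original features into new features (principal components) that are linear
# combinations of the original features, ordered by how much of the data's variance each one captures.

# Relationship: feature extraction builds new features from the existing ones, unlike feature selection, which keeps a
# subset of the original columns. PCA is a feature extraction method: keeping only the first few components reduces the
# dimensionality of the data while retaining most of its variance, making it easier to analyze and model.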

In [65]:
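# Unseeded random data: the printed ratios differ between runs. The 10 features are independent, so the variance is
# spread almost evenly and three components explain only a modest share of it.
data = np.random.rand(100, 10)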

In [66]:
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)

In [67]:
pca = PCA(n_components=3)
data_pca = pca.fit_transform(data_standardized)

In [68]:
print("Original Data Shape:", data.shape)
print("PCA Transformed Data Shape:", data_pca.shape)
print("Explained Variance Ratio:", pca.explained_variance_ratio_)

Original Data Shape: (100, 10)
PCA Transformed Data Shape: (100, 3)
Explained Variance Ratio: [0.15092141 0.13029773 0.12525747]
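
# To make the "linear combinations" point concrete, the sketch below checks that the PCA output equals the centered
# data multiplied by the component weights. The seed and the names X_demo, pca_demo, scores, and manual_scores are
# assumptions introduced for illustration; the data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
X_demo = rng.random((100, 10))      # synthetic stand-in data

pca_demo = PCA(n_components=3)
scores = pca_demo.fit_transform(X_demo)

# Each row of components_ holds the weights of one principal component
# over the 10 original features.
print(pca_demo.components_.shape)   # (3, 10)

# Projecting the centered data onto those weights reproduces the PCA output.
manual_scores = (X_demo - pca_demo.mean_) @ pca_demo.components_.T
print(np.allclose(scores, manual_scores))
```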


In [69]:
# Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
# contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
# preprocess the data.

In [70]:
# Price, rating, and delivery time are measured in different units and ranges, so a similarity- or distance-based
# recommender would otherwise be dominated by the feature with the largest numeric range. Fitting MinMaxScaler on these
# columns maps each one to [0, 1] via (x - min) / (max - min), so they contribute on a comparable scale. The same fitted
# scaler should then be reused to transform any new data.

In [71]:
data = {
    'price': [5, 10, 15, 20, 25],
    'rating': [2, 3, 5, 4, 5],
    'delivery_time': [30, 20, 40, 50, 25]
}

In [72]:
df = pd.DataFrame(data)

In [73]:
scaler = MinMaxScaler()

In [74]:
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

In [75]:
print("\nScaled Data:")
print(df_scaled)


Scaled Data:
   price    rating  delivery_time
0   0.00  0.000000       0.333333
1   0.25  0.333333       0.000000
2   0.50  1.000000       0.666667
3   0.75  0.666667       1.000000
4   1.00  1.000000       0.166667
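
# A sketch of reusing a fitted scaler on unseen data, assuming a hypothetical new restaurant (new_item) that was not in
# the data the scaler was fitted on. Values outside the fitted range map outside [0, 1], which is expected behaviour.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

known = pd.DataFrame({
    'price': [5, 10, 15, 20, 25],
    'rating': [2, 3, 5, 4, 5],
    'delivery_time': [30, 20, 40, 50, 25],
})
fitted_scaler = MinMaxScaler().fit(known)  # learns per-column min and max

# Hypothetical new restaurant; its price of 30 lies above the fitted maximum of 25
new_item = pd.DataFrame({'price': [30], 'rating': [4], 'delivery_time': [35]})
new_scaled = fitted_scaler.transform(new_item)

print(new_scaled)  # price scales to 1.25 because it exceeds the fitted range
```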


In [76]:
# Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
# features, such as company financial data and market trends. Explain how you would use PCA to reduce the
# dimensionality of the dataset.

In [77]:
# PCA can reduce the dimensionality of a dataset for a stock price prediction model, where financial and market
# features are often highly correlated. The usual steps are to standardize the features, fit PCA, and keep the smallest
# number of principal components that retains most of the data’s variance. This simplifies the model and can make
# training more efficient; whether accuracy improves has to be checked on held-out data.

In [78]:
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [5, 4, 3, 2, 1],
    'feature3': [2, 3, 4, 5, 6],
    'feature4': [7, 8, 9, 10, 11]
}

In [79]:
df = pd.DataFrame(data)

In [80]:
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

In [81]:
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled_data)

In [82]:
print("Reduced Data:")
print(pca_data)

Reduced Data:
[[ 2.82842712e+00  3.64856517e-16]
 [ 1.41421356e+00 -1.21618839e-16]
 [-0.00000000e+00  0.00000000e+00]
 [-1.41421356e+00  1.21618839e-16]
 [-2.82842712e+00  2.43237678e-16]]
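
# In the toy data above all four features are perfectly correlated, so the second component is numerically zero. The
# sketch below uses a larger synthetic stand-in instead (no real stock data) and assumes a chronological split, so the
# scaler and PCA are fitted on earlier rows only. All names and sizes here are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in: 200 time steps, 20 features driven by 3 hidden factors plus noise
factors = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 20))
X_stock = factors @ loadings + 0.1 * rng.normal(size=(200, 20))

# Chronological split: earlier rows for fitting, later rows held out
X_train, X_test = X_stock[:150], X_stock[150:]

reducer = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=3)),
])
X_train_reduced = reducer.fit_transform(X_train)  # fit on training rows only
X_test_reduced = reducer.transform(X_test)        # reuse the fitted transform

print(X_train_reduced.shape, X_test_reduced.shape)  # (150, 3) (50, 3)
print(reducer.named_steps["pca"].explained_variance_ratio_.sum())
```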


In [83]:
# Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
# values to a range of -1 to 1.

In [84]:
data = np.array([1, 5, 10, 15, 20])

In [85]:
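def min_max_scale(data, new_min=-1, new_max=1):
    """Linearly rescale data to [new_min, new_max]; assumes data is not constant."""
    min_val = data.min()
    max_val = data.max()
    # Map [min_val, max_val] onto [new_min, new_max]
    scaled_data = new_min + (data - min_val) * (new_max - new_min) / (max_val - min_val)
    return scaled_data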

In [86]:
scaled_data = min_max_scale(data)

In [87]:
print("Scaled Data:", scaled_data)

Scaled Data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
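
# As a cross-check, the same result can be obtained with sklearn's MinMaxScaler and its feature_range parameter. The
# names values, range_scaler, and values_scaled are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler expects a 2-D array, so the values become a single column
values = np.array([1, 5, 10, 15, 20], dtype=float).reshape(-1, 1)

range_scaler = MinMaxScaler(feature_range=(-1, 1))
values_scaled = range_scaler.fit_transform(values).ravel()

print(values_scaled)  # [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
```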


In [88]:
# Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
# Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [89]:
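# To perform Feature Extraction using PCA on a dataset with features [height, weight, age, gender, blood pressure]:

# Encode gender numerically (e.g. 0/1), since PCA requires numeric inputs.
# Standardize the data, because the features are measured in different units.
# Compute the covariance matrix.
# Perform eigendecomposition and sort the components by eigenvalue (explained variance).
# Select the smallest number of principal components that together explain at least 95% of the total variance.
# Transform the data onto the selected components.

# The number to retain cannot be fixed in advance: it depends on how correlated the features are. Correlated features
# such as height and weight share variance, so fewer than five components may be enough to reach the 95% threshold.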
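
# The steps above can be sketched with NumPy as follows. No dataset is given in the question, so the data below is
# synthetic and every value, coefficient, and variable name is an assumption made for illustration; the number of
# components kept will differ on real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in data; all values are invented for illustration
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 8, n)
age = rng.uniform(20, 70, n)
gender = rng.integers(0, 2, n).astype(float)   # categorical, encoded as 0/1
blood_pressure = 0.5 * age + 0.3 * weight + rng.normal(80, 5, n)
X_health = np.column_stack([height, weight, age, gender, blood_pressure])

# Standardize each column
Z = (X_health - X_health.mean(axis=0)) / X_health.std(axis=0)

# Covariance matrix of the standardized data
cov = np.cov(Z, rowvar=False)

# Eigendecomposition (eigh suits symmetric matrices), sorted by descending eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest k whose cumulative explained variance reaches 95%
cumulative = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Project onto the first k principal components
X_health_pca = Z @ eigvecs[:, :k]

print("Cumulative explained variance:", cumulative)
print("Components kept:", k)
```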