Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.


* Min-Max scaling is a normalization technique that scales the features of a dataset to a fixed range, typically [0, 1] or [-1, 1].
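* It is used in data preprocessing so that features measured on larger numeric scales do not dominate those on smaller scales, which matters for distance-based and gradient-based models. Because it depends on the minimum and maximum of each feature, it is sensitive to outliers.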


In [1]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[1], [5], [10], [15], [20]])
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)


[[0.        ]
 [0.21052632]
 [0.47368421]
 [0.73684211]
 [1.        ]]
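
The output above can be reproduced directly from the Min-Max formula, x' = (x - min) / (max - min), applied per feature. A minimal NumPy sketch of that formula (the names `x_min`, `x_max`, and `x_scaled` are illustrative and not part of scikit-learn):

```python
import numpy as np

# Same single-feature data as in the cell above (one column).
x = np.array([[1.0], [5.0], [10.0], [15.0], [20.0]])

# Min-Max formula applied per column: (x - min) / (max - min).
x_min = x.min(axis=0)
x_max = x.max(axis=0)
x_scaled = (x - x_min) / (x_max - x_min)

print(x_scaled.ravel())
```

For example, the value 5 maps to (5 - 1) / (20 - 1) = 4/19, which matches the second row of the scaler's output.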


Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

* Unit Vector scaling rescales each sample (row) so that its feature vector has unit norm (length 1). This technique is useful when the direction of the data is more important than its magnitude.
* It differs from Min-Max scaling, which rescales each feature (column) independently to a fixed range such as [0, 1].

In [2]:
from sklearn.preprocessing import normalize
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
normalized_data = normalize(data, norm='l2')
print(normalized_data)


[[0.26726124 0.53452248 0.80178373]
 [0.45584231 0.56980288 0.68376346]
 [0.50257071 0.57436653 0.64616234]]
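
To make the row-wise behaviour explicit, the same result can be obtained by dividing each row by its own L2 norm. This is a small sketch with plain NumPy; `row_norms` and `unit_rows` are names chosen here for illustration:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# L2 norm of each row (sample), kept as a column so it broadcasts.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
unit_rows = data / row_norms

# Every row should now have length 1.
print(np.linalg.norm(unit_rows, axis=1))
```

Note that the columns are not mapped to any particular range, which is the key difference from Min-Max scaling.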


Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

* PCA (Principal Component Analysis) is a statistical technique used to reduce the dimensionality of a dataset by transforming the data into a new set of orthogonal axes (principal components) that capture the maximum variance.
* It helps in reducing the number of features while retaining most of the original variance.

In [3]:
from sklearn.decomposition import PCA
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
pca = PCA(n_components=1)
reduced_data = pca.fit_transform(data)
print(reduced_data)


[[-0.44362444]
 [ 2.17719404]
 [-0.57071239]
 [ 0.12902465]
 [-1.29188186]]
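
A useful follow-up check is how much variance the single retained component captures and how much information is lost. The sketch below reuses the same data and relies on the `explained_variance_ratio_` attribute and `inverse_transform` method of scikit-learn's `PCA`; the variable names are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

pca = PCA(n_components=1)
reduced = pca.fit_transform(data)

# Fraction of the total variance captured by the retained component.
explained = pca.explained_variance_ratio_
print(explained)

# Map the 1-D representation back to 2-D to see what was lost.
reconstructed = pca.inverse_transform(reduced)
print(np.abs(data - reconstructed).max())
```

For this small dataset the first component accounts for roughly 97% of the variance, so little is lost by dropping the second dimension.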


Q4: What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.


* Relationship: PCA is a technique for Feature Extraction that transforms the original features into a new set of features (principal components) that capture the most variance in the data.
* Usage: PCA can be used for Feature Extraction by selecting the top principal components as the new features.

In [4]:
from sklearn.decomposition import PCA
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
pca = PCA(n_components=2)
pca_features = pca.fit_transform(data)
print(pca_features)


[[-0.44362444 -0.20099093]
 [ 2.17719404 -0.05500992]
 [-0.57071239  0.36808609]
 [ 0.12902465  0.06747325]
 [-1.29188186 -0.17955849]]
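
One way to see why principal components work as extracted features is that they are linearly uncorrelated and each is a weighted combination of the original features. A minimal sketch on the same data, assuming only NumPy and scikit-learn's `PCA` (the name `cov` is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

pca = PCA(n_components=2)
pca_features = pca.fit_transform(data)

# Covariance matrix of the extracted features: the off-diagonal entries
# are approximately 0, so the new features are linearly uncorrelated.
cov = np.cov(pca_features, rowvar=False)
print(np.round(cov, 6))

# Each row of components_ holds the weights of the original features
# in the corresponding principal component.
print(pca.components_)
```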


Q5: You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.


**Steps to use Min-Max scaling:**
1. Import the necessary library: from sklearn.preprocessing import MinMaxScaler.
2. Load the data: Assume features are price, rating, and delivery_time.
3. Initialize the scaler: scaler = MinMaxScaler().
4. Fit and transform the data:

In [None]:
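# price, rating, delivery_time: 1-D arrays of equal length (one entry per record)
data = np.column_stack([price, rating, delivery_time])  # rows = records, columns = features
scaled_data = scaler.fit_transform(data)  # each column is scaled to [0, 1] independently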
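
The steps above can be sketched end to end as follows. The numbers are hypothetical values made up for illustration (they do not come from a real food delivery dataset), and `new_item` is likewise an assumed example record:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values for four records (not from a real dataset).
price = np.array([8.5, 12.0, 25.0, 15.5])            # order price
rating = np.array([4.2, 3.8, 4.9, 4.0])              # 1-5 stars
delivery_time = np.array([30.0, 45.0, 20.0, 60.0])   # minutes

# One row per record, one column per feature.
data = np.column_stack([price, rating, delivery_time])

scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)
print(scaled_data)

# Reuse the fitted scaler for new records instead of refitting.
new_item = np.array([[10.0, 4.5, 35.0]])
print(scaler.transform(new_item))
```

After scaling, price, rating, and delivery time all lie in [0, 1], so no single feature dominates a similarity computation simply because of its units.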


Q6: You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.


**Steps to use PCA:**
1. Import the necessary library: from sklearn.decomposition import PCA.
2. Load the data: Assume data contains financial and market trend features.
3. Standardize the data: PCA is sensitive to feature scale, so standardize each feature to zero mean and unit variance first.

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)


4. Initialize PCA: Decide on the number of components.


In [None]:
# desired_components is a placeholder: an integer number of components to keep
pca = PCA(n_components=desired_components)


5. Fit and transform the data:


In [8]:
reduced_data = pca.fit_transform(data_standardized)
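
The steps above can be combined into a single pipeline. The sketch below uses synthetic random data as a stand-in for the financial and market features (an assumption, since no real dataset is given) and passes a float `n_components` so that PCA keeps enough components to explain 95% of the variance. Fitting on the training portion only is a design choice made here to avoid leaking information from later data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (200 rows, 10 correlated columns).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# shuffle=False keeps the row order, which matters for time-ordered data.
X_train, X_test = train_test_split(X, test_size=0.25, shuffle=False)

# A float n_components keeps enough components to explain that share of variance.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_train_reduced = pipeline.fit_transform(X_train)  # fit on training data only
X_test_reduced = pipeline.transform(X_test)        # reuse the fitted transform

print(X_train_reduced.shape, X_test_reduced.shape)
```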


Q7: For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.


**Steps to perform Min-Max scaling:**
1. Import the necessary library: from sklearn.preprocessing import MinMaxScaler.
2. Load the data:

In [9]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
data = np.array([[1], [5], [10], [15], [20]])


In [10]:
# 3. Initialize the scaler with the range -1 to 1:
scaler = MinMaxScaler(feature_range=(-1, 1))



In [11]:
# 4. Fit and transform the data:
scaled_data = scaler.fit_transform(data)
print(scaled_data)


[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]
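
These values follow from the general Min-Max formula for a target range [a, b]: x' = a + (x - min)(b - a) / (max - min). A short NumPy sketch of that formula, with `a` and `b` as illustrative names for the range bounds:

```python
import numpy as np

x = np.array([1.0, 5.0, 10.0, 15.0, 20.0])
a, b = -1.0, 1.0  # target range

# General Min-Max formula for an arbitrary target range [a, b].
x_scaled = a + (x - x.min()) * (b - a) / (x.max() - x.min())
print(x_scaled)
```

For instance, 10 maps to -1 + 9 * 2 / 19 = -1/19, which agrees with the third row of the output above.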


Q8: For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?


**Steps to use PCA:**

1. Import the necessary library: from sklearn.decomposition import PCA.
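2. Load the data: Assume data contains the five features, with gender encoded numerically (e.g., 0/1), since PCA requires numeric input.
3. Standardize the data: PCA is sensitive to feature scale, so standardize each feature to zero mean and unit variance first.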


In [12]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)


In [None]:
# 4. Initialize PCA with all 5 components to inspect the explained variance of each.
pca = PCA(n_components=5)
pca.fit(data_standardized)
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)


In [None]:
# 5. Choose the number of components: keep the smallest number whose cumulative explained variance reaches a chosen threshold (e.g., 95%).
num_components = np.where(cumulative_variance >= 0.95)[0][0] + 1


In [None]:
# 6. Fit and transform the data.
pca = PCA(n_components=num_components)
reduced_data = pca.fit_transform(data_standardized)
print(reduced_data)
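
Since no actual values are given for this dataset, the procedure can only be illustrated on made-up data. The sketch below generates synthetic records (all numbers, distributions, and the 0/1 gender encoding are assumptions for illustration) and applies the 95% cumulative-variance rule; the number of components it selects is specific to this synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up records. Columns: height (cm), weight (kg), age (years),
# gender (encoded 0/1), blood pressure (mmHg).
rng = np.random.default_rng(42)
n = 100
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)      # loosely tied to height
age = rng.integers(20, 70, n).astype(float)
gender = rng.integers(0, 2, n).astype(float)
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n)   # loosely tied to age

data = np.column_stack([height, weight, age, gender, blood_pressure])
data_standardized = StandardScaler().fit_transform(data)

# Keep all 5 components so the full variance profile can be inspected.
pca = PCA().fit(data_standardized)
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
num_components = int(np.argmax(cumulative_variance >= 0.95) + 1)
print(cumulative_variance, num_components)
```

The more strongly the features are correlated (for example height with weight), the fewer components are needed to reach the threshold.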
