In [1]:
# Ans 1: Min-Max scaling (one form of feature scaling, often simply called normalization) is a data
# preprocessing technique that linearly rescales each numeric feature of a dataset to a fixed range,
# typically [0, 1], using that feature's own minimum and maximum:
#     x_scaled = (x - x_min) / (x_max - x_min)
# The smallest value of each feature becomes 0, the largest becomes 1, and every other value keeps its relative
# position in between. Bringing all features to a common scale helps algorithms that are sensitive to feature
# magnitudes, such as gradient-based optimizers and distance-based methods like k-nearest neighbors or k-means.
# Because it relies on the minimum and maximum, Min-Max scaling is sensitive to outliers.

# Here's an example in Python to illustrate Min-Max scaling using the popular scikit-learn library:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Sample dataset
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Instantiate the MinMaxScaler
scaler = MinMaxScaler()

# Fit the scaler on the data and transform it
scaled_data = scaler.fit_transform(data)

print("Original Data:")
print(data)
print("\nScaled Data:")
print(scaled_data)

Original Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

Scaled Data:
[[0.  0.  0. ]
 [0.5 0.5 0.5]
 [1.  1.  1. ]]
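MinMaxScaler applies x_scaled = (x - x_min) / (x_max - x_min) column by column, using each feature's own minimum and maximum. A minimal NumPy sketch of the same formula, checked against scikit-learn, might look like this (the helper name min_max_scale is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_scale(X):
    """Hypothetical helper: scale each column of X to [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)               # per-feature minimum
    col_range = X.max(axis=0) - col_min   # per-feature range (assumed non-zero here)
    return (X - col_min) / col_range

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

manual = min_max_scale(data)
sklearn_scaled = MinMaxScaler().fit_transform(data)
print(manual)
print(np.allclose(manual, sklearn_scaled))  # True
```

Note that this sketch divides by zero for a constant column; MinMaxScaler guards against that case internally.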


In [3]:
# Ans 2: The Unit Vector technique (also called unit vector scaling or vector normalization) rescales each
# data point (row) so that its length is 1, by dividing the point by its Euclidean (L2) norm:
#     x_scaled = x / ||x||
# Afterwards every data point lies on the unit hypersphere, so only its direction is kept. This is useful when
# the direction of the data points matters more than their magnitude, e.g. with cosine similarity on text vectors.
# It differs from Min-Max scaling in two ways: Min-Max scaling works per feature (column) and maps each feature
# to a fixed range such as [0, 1], while unit vector scaling works per sample (row), keeps the direction of each
# sample, and fixes its magnitude to 1.
from sklearn.preprocessing import normalize
import numpy as np

# Sample dataset
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Apply Unit Vector Scaling
unit_vector_scaled_data = normalize(data, axis=1, norm='l2')

print("Original Data:")
print(data)
print("\nUnit Vector Scaled Data:")
print(unit_vector_scaled_data)

Original Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

Unit Vector Scaled Data:
[[0.26726124 0.53452248 0.80178373]
 [0.45584231 0.56980288 0.68376346]
 [0.50257071 0.57436653 0.64616234]]
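Unit vector scaling works on rows (samples), while Min-Max scaling works on columns (features). The sketch below reproduces normalize(..., norm='l2') by dividing each row by its Euclidean norm and checks that every resulting row has length 1. It assumes no row is all zeros, since such a row has no direction to preserve.

```python
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Euclidean (L2) norm of each row; keepdims=True keeps shape (3, 1) so it broadcasts across columns
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
manual_unit = data / row_norms

print(np.allclose(manual_unit, normalize(data, norm='l2')))  # True
print(np.linalg.norm(manual_unit, axis=1))  # each row now has length 1 (up to rounding)
```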


In [4]:
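# Ans 3: PCA (Principal Component Analysis) is a technique for dimensionality reduction. After centering the data,
# it finds the principal components: orthogonal directions that are linear combinations of the original features,
# ordered by how much of the data's variance each one captures. Projecting the data onto the first few components
# reduces the number of dimensions while retaining most of the variance (information) in the data.
# In the small example below the three points lie on a straight line, so the first component captures all of the
# variance and the second column of the output is zero up to floating-point error.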

from sklearn.decomposition import PCA
import numpy as np

# Sample dataset
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Instantiate PCA with the desired number of components
pca = PCA(n_components=2)

# Fit PCA and transform the data
pca_transformed_data = pca.fit_transform(data)

print("Original Data:")
print(data)
print("\nPCA Transformed Data:")
print(pca_transformed_data)


Original Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

PCA Transformed Data:
[[-5.19615242e+00  2.56395025e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  2.56395025e-16]]
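To make the description of PCA concrete, here is a sketch of what it computes, using an eigendecomposition of the covariance matrix of the centered data. It illustrates the idea and is not scikit-learn's implementation (which uses an SVD). Components are only defined up to sign, so the comparison uses absolute values, and only the first component is compared because in this collinear dataset the remaining directions carry zero variance and are not uniquely determined.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# 1. Center each feature at zero mean
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the features (3 x 3)
cov = np.cov(centered, rowvar=False)

# 3. Eigenvectors of the covariance matrix are the principal directions;
#    eigh returns eigenvalues in ascending order, so reverse to rank by variance
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Project the centered data onto the top principal direction
first_component_scores = centered @ eigenvectors[:, 0]

# Fraction of total variance captured by each direction
explained_ratio = eigenvalues / eigenvalues.sum()

sk_scores = PCA(n_components=1).fit_transform(data)[:, 0]
print(explained_ratio)
print(np.allclose(np.abs(first_component_scores), np.abs(sk_scores)))  # True
```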


In [None]:
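# Ans 4: PCA is a form of feature extraction. Feature extraction transforms the original features into a new,
# usually smaller, set of features that retains the essential information in the data; this differs from
# feature selection, which keeps a subset of the original columns unchanged. PCA does this by computing the
# principal components, each a linear combination of the original features, and using the projections of the
# data onto the top components as the new features. Here the three original features are reduced to one.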

from sklearn.decomposition import PCA
import numpy as np

# Sample dataset
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Instantiate PCA with the desired number of components
pca = PCA(n_components=1)

# Fit PCA and transform the data
feature_extracted_data = pca.fit_transform(data)

print("Original Data:")
print(data)
print("\nFeature Extracted Data:")
print(feature_extracted_data)
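Because PCA extracts new features rather than selecting existing columns, each extracted value is a weighted sum of all the original features. The sketch below inspects the weights in pca.components_, recomputes the extracted feature by hand as (x - mean) · weights, and uses inverse_transform to map the single extracted feature back to the original three. For this collinear toy dataset the reconstruction is essentially exact; on real data it would generally be approximate.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

pca = PCA(n_components=1)
feature_extracted_data = pca.fit_transform(data)

# Each row of components_ holds the weights of one new feature over the original features
weights = pca.components_[0]
print("Weights of the extracted feature:", weights)

# The extracted feature is the centered data projected onto those weights
by_hand = (data - pca.mean_) @ weights
print(np.allclose(by_hand, feature_extracted_data[:, 0]))  # True

# Map the 1-D representation back to the original 3-D space
reconstructed = pca.inverse_transform(feature_extracted_data)
print(np.allclose(reconstructed, data))  # True for this collinear dataset
```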


In [None]:
# Ans 5: In a recommendation system for a food delivery service, Min-Max scaling can be applied to features
# such as price, rating, and delivery time. These features live on very different ranges (price in currency
# units, rating on a small star scale, delivery time in minutes), so without scaling the feature with the
# largest range would dominate any distance or similarity computation. Min-Max scaling maps each feature to
# [0, 1] using that feature's own minimum and maximum:
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Sample dataset
data = pd.DataFrame({
    'price': [10, 20, 30, 15, 25],
    'rating': [4.5, 3.8, 4.2, 4.8, 3.5],
    'delivery_time': [25, 30, 20, 35, 28]
})

features = ['price', 'rating', 'delivery_time']

# Fit one scaler on all three columns; each column is scaled with its own min and max
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(data[features]), columns=features)

print("Original Data:")
print(data[features])
print("\nScaled Data:")
print(scaled)
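Once fitted, a MinMaxScaler stores each feature's minimum and maximum (data_min_ and data_max_) and reuses them for new items, which is what a recommender needs when a new dish or restaurant is added. The sketch below assumes a hypothetical new item whose price falls outside the training range: by default its scaled price exceeds 1, while clip=True (available in scikit-learn 0.24 and later) keeps it inside [0, 1].

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

features = ['price', 'rating', 'delivery_time']
data = pd.DataFrame({
    'price': [10, 20, 30, 15, 25],
    'rating': [4.5, 3.8, 4.2, 4.8, 3.5],
    'delivery_time': [25, 30, 20, 35, 28]
})

# Hypothetical new menu item, not part of the original dataset
new_item = pd.DataFrame({'price': [40], 'rating': [4.0], 'delivery_time': [20]})

scaler = MinMaxScaler().fit(data[features])
print("Per-feature min:", scaler.data_min_)
print("Per-feature max:", scaler.data_max_)
print(scaler.transform(new_item[features]))   # price -> (40 - 10) / (30 - 10) = 1.5

clipped = MinMaxScaler(clip=True).fit(data[features])
print(clipped.transform(new_item[features]))  # price clipped to 1.0
```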


In [None]:
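# Ans 6: In a stock price prediction project with a dataset containing many features (company financials,
# market trends, etc.), PCA can be used to reduce dimensionality. It replaces the original, often highly
# correlated, features with a smaller set of uncorrelated principal components that capture most of the
# variance. This simplifies the model and can reduce the risk of overfitting. Because the raw features are
# on very different scales, they should be standardized before PCA.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# Placeholder: stock_data = pd.DataFrame(...) would hold your real financial and market-trend features.
# Random synthetic values (250 rows x 20 columns) are used here only so the cell runs.
rng = np.random.default_rng(0)
stock_data = pd.DataFrame(rng.normal(size=(250, 20)),
                          columns=[f"feature_{i}" for i in range(20)])

# Standardize each feature to zero mean and unit variance so no single feature dominates the components
scaled_stock_data = StandardScaler().fit_transform(stock_data)

# Apply PCA for dimensionality reduction
pca = PCA(n_components=10)  # Choose an appropriate number of components
reduced_data = pca.fit_transform(scaled_stock_data)

print("Original Data Shape:", stock_data.shape)
print("Reduced Data Shape:", reduced_data.shape)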
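In a prediction model, the scaler and PCA should be fitted on training data only and then applied unchanged to later data, so that statistics from the test period do not leak into training. A sketch using scikit-learn's make_pipeline is below; the synthetic data and target, the 200/50 chronological split, and the choice of LinearRegression as the model are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real features and a price-like target (assumption, illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(250, 20))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=250)

# Chronological split: earlier rows for training, later rows for testing (no shuffling)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# StandardScaler and PCA are fitted on the training rows only, inside the pipeline
model = make_pipeline(StandardScaler(), PCA(n_components=10), LinearRegression())
model.fit(X_train, y_train)

print("Components kept:", model.named_steps['pca'].n_components_)
print("Test R^2:", model.score(X_test, y_test))
```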


In [None]:
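# Ans 7: Min-Max scaling to an arbitrary range [a, b] uses
#     x_scaled = a + (x - x_min) * (b - a) / (x_max - x_min)
# For the range [-1, 1] this simplifies to 2 * (x - x_min) / (x_max - x_min) - 1.
import numpy as np

data = np.array([1, 5, 10, 15, 20])

# Min-Max scaling to a range of -1 to 1
scaled_data = (2 * (data - np.min(data)) / (np.max(data) - np.min(data))) - 1

print("Original Data:", data)
print("Scaled Data:", scaled_data)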
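The same result can be obtained with MinMaxScaler by setting feature_range=(-1, 1), which applies x_scaled = a + (x - x_min)(b - a) / (x_max - x_min) with a = -1 and b = 1. scikit-learn expects a 2-D array, so the 1-D data is reshaped into a single column first.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20])

scaler = MinMaxScaler(feature_range=(-1, 1))
# reshape(-1, 1) turns the 5 values into one feature column of shape (5, 1)
scaled = scaler.fit_transform(data.reshape(-1, 1)).ravel()

# Exact values: -1, -11/19, -1/19, 9/19, 1
print(scaled)
print(np.allclose(scaled, np.array([-19, -11, -1, 9, 19]) / 19))  # True
```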


In [None]:
# Ans 8: Choosing how many principal components to retain depends on how much of the original variance you need
# to keep. A common approach is to keep the smallest number of components whose cumulative explained variance
# reaches a high threshold, e.g. 95% or 99%. PCA's explained_variance_ratio_ attribute gives the fraction of
# variance captured by each component, which makes this decision straightforward.
from sklearn.decomposition import PCA
import numpy as np
import pandas as pd

# Placeholder: data = pd.DataFrame(...) would hold your real features.
# Synthetic correlated values (100 rows x 6 columns driven by 2 hidden factors) are used so the cell runs.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
data = pd.DataFrame(latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6)))

# Instantiate PCA (keeping all components)
pca = PCA()

# Fit PCA and obtain the explained variance ratio
pca.fit(data)
explained_variance_ratio = pca.explained_variance_ratio_

# Smallest number of components whose cumulative explained variance reaches 95%
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)
num_components = int(np.argmax(cumulative_variance_ratio >= 0.95) + 1)

# Fit PCA with the selected number of components
pca = PCA(n_components=num_components)
feature_extracted_data = pca.fit_transform(data)

print("Number of Components Retained:", num_components)
print("Feature Extracted Data Shape:", feature_extracted_data.shape)
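scikit-learn can also pick the number of components directly: passing a float between 0 and 1 as n_components (with svd_solver='full') keeps the smallest number of components whose explained variance exceeds that fraction. The sketch compares this shortcut with the cumulative-sum approach on synthetic correlated data, which is an assumption; substitute your own features.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 6 features driven by 2 hidden factors plus a little noise (illustrative only)
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(100, 6))

# Cumulative-sum approach
full_pca = PCA().fit(X)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
manual_k = int(np.argmax(cumulative >= 0.95) + 1)

# Shortcut: let PCA choose the number of components for 95% of the variance
auto_pca = PCA(n_components=0.95, svd_solver='full').fit(X)

print("Manual choice:", manual_k)
print("PCA's choice:", auto_pca.n_components_)
print("Variance explained:", auto_pca.explained_variance_ratio_.sum())
```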
