### Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling is a data preprocessing technique used to transform numerical features to a specific range, typically between 0 and 1. The transformation is done by subtracting the minimum value of the feature and then dividing by the range (the difference between the maximum and minimum values). The formula is as follows:

$$
\text{Scaled Value} = \frac{\text{Original Value} - \text{Min Value}}{\text{Max Value} - \text{Min Value}}
$$

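This scaling method is useful when features have different ranges: it puts them on a common scale so that features with large numeric ranges do not dominate distance-based or gradient-based models. Because the result depends on the minimum and maximum values, it is sensitive to outliers.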

In [1]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Original data
data = np.array([[1.0, 5.0],
                 [10.0, 15.0],
                 [20.0, 25.0]])

# Apply Min-Max scaling
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(data)

print("Original Data:")
print(data)
print("\nScaled Data:")
print(scaled_data)


Original Data:
[[ 1.  5.]
 [10. 15.]
 [20. 25.]]

Scaled Data:
[[0.         0.        ]
 [0.47368421 0.5       ]
 [1.         1.        ]]
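
To make the arithmetic behind the formula explicit, here is a minimal sketch that applies it by hand with NumPy to the same data, column by column. The variable names (`col_min`, `col_max`, `manual_scaled`) are illustrative only, and the sketch assumes no column is constant (otherwise the denominator would be zero).

```python
import numpy as np

# Same data as in the example above
data = np.array([[1.0, 5.0],
                 [10.0, 15.0],
                 [20.0, 25.0]])

# Minimum and maximum of each column (one value per feature)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# (x - min) / (max - min), broadcast across all rows
manual_scaled = (data - col_min) / (col_max - col_min)

print(manual_scaled)
```

For the middle row of the first column this works out to (10 - 1) / (20 - 1) = 9/19, which matches the 0.47368421 printed by `MinMaxScaler`.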


### Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

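The Unit Vector technique in feature scaling rescales each vector by dividing it by its magnitude (Euclidean norm). In scikit-learn's `Normalizer`, used in the example below, the vector is each sample (row), not each feature column. The formula for unit vector scaling is: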

$$
\text{Unit Vector} = \frac{\text{Original Value}}{\lVert \text{Original Value} \rVert}
$$

This technique ensures that every scaled vector has length 1, so it lies on the unit circle (or, in more than two dimensions, on the unit hypersphere), while the direction of the vector remains unchanged.

Difference from Min-Max Scaling:

- Min-Max scaling rescales each feature (column) based on its range, bringing the values within a specific interval (e.g., 0 to 1).
- Unit vector scaling rescales each vector to length 1. It preserves the direction of each vector and discards its magnitude, so all scaled vectors have the same magnitude but generally different directions.

In [2]:
from sklearn.preprocessing import Normalizer

# Original data
data = np.array([[1.0, 5.0],
                 [10.0, 15.0],
                 [20.0, 25.0]])

# Apply Unit Vector scaling
scaler = Normalizer()
unit_vector_scaled_data = scaler.fit_transform(data)

print("Original Data:")
print(data)
print("\nUnit Vector Scaled Data:")
print(unit_vector_scaled_data)


Original Data:
[[ 1.  5.]
 [10. 15.]
 [20. 25.]]

Unit Vector Scaled Data:
[[0.19611614 0.98058068]
 [0.5547002  0.83205029]
 [0.62469505 0.78086881]]
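
As a cross-check, the same result can be sketched by hand with NumPy: divide each row by its Euclidean norm and confirm that every scaled row has length 1. The names `row_norms` and `unit_rows` are illustrative, and the sketch assumes no row is all zeros.

```python
import numpy as np

# Same data as in the example above
data = np.array([[1.0, 5.0],
                 [10.0, 15.0],
                 [20.0, 25.0]])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm
unit_rows = data / row_norms

print(unit_rows)
# Length of each scaled row
print(np.linalg.norm(unit_rows, axis=1))
```

For the first row the norm is sqrt(1² + 5²) = sqrt(26), so the scaled row is [1/sqrt(26), 5/sqrt(26)], which agrees with the first row of the `Normalizer` output.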



### Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

PCA (Principal Component Analysis) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while retaining as much variance as possible. It achieves this by identifying the principal components, which are mutually orthogonal linear combinations of the original features, ordered by the amount of variance they capture.

In [3]:
from sklearn.decomposition import PCA

# Original data
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
transformed_data = pca.fit_transform(data)

print("Original Data:")
print(data)
print("\nTransformed Data (After PCA):")
print(transformed_data)


Original Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

Transformed Data (After PCA):
[[ 5.19615242  0.        ]
 [-0.          0.        ]
 [-5.19615242  0.        ]]
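
The second column of the output is zero because the three rows of this toy dataset differ only by a constant offset, so after centering they all lie on a single line. The following sketch reproduces the projection with a plain SVD of the centered data to make that visible. It is a simplified illustration, not scikit-learn's exact implementation, and the sign of each component may differ from the output above.

```python
import numpy as np

# Same data as in the example above
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

# PCA works on mean-centered data
centered = data - data.mean(axis=0)

# Rows of vt are the principal directions; s holds the singular values
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Variance captured by each direction, as a fraction of the total
explained_variance = s**2 / (len(data) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project onto the first two principal directions
projected = centered @ vt[:2].T

print(np.round(explained_ratio, 6))
print(np.round(projected, 6))
```

Since the first direction carries essentially all of the variance, a single component would already describe this dataset without loss.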


### Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

Relationship with Feature Extraction:

- PCA is a technique for both dimensionality reduction and feature extraction.
- In the context of feature extraction, PCA identifies a new set of features (principal components) that capture the maximum variance in the data.

In [4]:
# Assume 'data' is the original dataset
pca = PCA(n_components=2)
transformed_data = pca.fit_transform(data)

print("Original Data:")
print(data)
print("\nTransformed Data (Principal Components):")
print(transformed_data)


Original Data:
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

Transformed Data (Principal Components):
[[ 5.19615242  0.        ]
 [-0.          0.        ]
 [-5.19615242  0.        ]]
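
To see in what sense the principal components are new features built from the original ones, the sketch below inspects the fitted weights in `pca.components_` and recomputes the extracted features as weighted sums of the centered original columns. It reuses the toy data from above; the variable names `manual` and `reconstructed` are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same data as in the example above
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])

pca = PCA(n_components=2)
transformed = pca.fit_transform(data)

# Each row of components_ holds the weights of one extracted feature
print("Weights (components_):")
print(pca.components_)

# Extracted features = centered data projected onto those weights
manual = (data - pca.mean_) @ pca.components_.T
print("Manual projection matches fit_transform:", np.allclose(manual, transformed))

# Map the extracted features back to the original feature space
reconstructed = pca.inverse_transform(transformed)
print("Reconstruction matches original:", np.allclose(reconstructed, data))
```

In this dataset the first component weights all three original features equally (up to sign), which reflects that the columns move together.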


### Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data. 

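In a recommendation system for a food delivery service, price, rating, and delivery time are measured in different units and ranges (for example, currency amounts, a rating on a fixed scale, and minutes). I would fit a Min-Max scaler on the training data, transform each of these features to a common range (typically 0 to 1), and reuse the same fitted scaler for any new data. This keeps a feature from dominating similarity or distance computations simply because of its units, so the model can weigh price, rating, and delivery time on comparable terms.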
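
A minimal sketch of that workflow follows. The restaurant values are hypothetical and invented purely for illustration; only the `MinMaxScaler` calls come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training data: [price, rating, delivery time in minutes]
train = np.array([[8.0, 4.5, 25.0],
                  [15.0, 3.8, 40.0],
                  [30.0, 4.9, 60.0],
                  [12.0, 4.1, 35.0]])

# Learn the per-feature minimum and maximum from the training data only
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)

# A new (hypothetical) restaurant is scaled with the training min/max
new_item = np.array([[20.0, 4.2, 45.0]])
new_scaled = scaler.transform(new_item)

print("Scaled training data:")
print(train_scaled)
print("\nScaled new item:")
print(new_scaled)
```

Note that a new value outside the training range would map outside [0, 1], which is one reason to fit the scaler on data that is representative of what the system will see.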

### Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.
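
One reasonable approach, sketched here under stated assumptions rather than as a definitive recipe: standardize the features so that financial figures in different units are comparable, fit PCA, and keep the smallest number of components whose cumulative explained variance reaches a chosen threshold. The data below is synthetic (20 hypothetical features driven by 3 hidden factors plus noise), and the 95% threshold is an assumed choice, not a rule.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a stock dataset (assumption, not real data):
# 200 samples, 20 correlated features generated from 3 hidden factors
rng = np.random.default_rng(0)
n_samples, n_features, n_factors = 200, 20, 3
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_features))
X = factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_features))

# Standardize, then keep enough components to explain 95% of the variance
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
X_reduced = pipeline.fit_transform(X)

pca = pipeline.named_steps["pca"]
print("Shape before:", X.shape, "after:", X_reduced.shape)
print("Cumulative explained variance:", pca.explained_variance_ratio_.cumsum())
```

With real price data, the scaler and PCA would normally be fitted on the training period only and then applied to later data, so that information from the future does not leak into the model.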

### Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.


In [5]:
from sklearn.preprocessing import MinMaxScaler

# Original dataset
values = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)

# Apply Min-Max scaling to transform values to a range of -1 to 1
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_values = scaler.fit_transform(values)

print("Original Values:")
print(values)
print("\nScaled Values (Min-Max Scaling -1 to 1):")
print(scaled_values)


Original Values:
[[ 1]
 [ 5]
 [10]
 [15]
 [20]]

Scaled Values (Min-Max Scaling -1 to 1):
[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]
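
For a target range [low, high], the Min-Max formula generalizes to low + (x - min) * (high - low) / (max - min). The sketch below applies that by hand to the same values; `min_max_scale` is a hypothetical helper written for this illustration, not a library function, and it assumes the values are not all equal.

```python
import numpy as np

def min_max_scale(x, low=0.0, high=1.0):
    """Linearly map x so that min(x) -> low and max(x) -> high."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return low + (x - x_min) * (high - low) / (x_max - x_min)

values = [1, 5, 10, 15, 20]
scaled = min_max_scale(values, low=-1.0, high=1.0)

print(scaled)
```

For example, the value 5 maps to -1 + (5 - 1) * 2 / 19 = -11/19, which agrees with the -0.57894737 printed by `MinMaxScaler`.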


### Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [6]:
from sklearn.decomposition import PCA

# Original dataset
features = np.array([[170, 65, 30, 1, 120],
                     [160, 55, 28, 0, 110],
                     [180, 75, 35, 1, 130]])

# Apply PCA for feature extraction
pca = PCA(n_components=3)
principal_components = pca.fit_transform(features)

print("Original Features:")
print(features)
print("\nPrincipal Components after PCA:")
print(principal_components)
print("\nExplained Variance Ratio:")
print(pca.explained_variance_ratio_)


Original Features:
[[170  65  30   1 120]
 [160  55  28   0 110]
 [180  75  35   1 130]]

Principal Components after PCA:
[[-1.89049746e-01  1.03700111e+00  8.29611047e-16]
 [-1.75831427e+01 -5.26818033e-01  8.29611047e-16]
 [ 1.77721924e+01 -5.10183078e-01  8.29611047e-16]]

Explained Variance Ratio:
[9.97425752e-01 2.57424785e-03 3.29483532e-33]
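
In the output above, the first component explains about 99.7% of the variance, so on this data a single component would be enough. Two caveats apply. With only three samples, centered data can have at most two non-trivial components, which is why the third column is numerically zero. And because the features were not scaled, the features with the largest numeric spread (here height, weight, and blood pressure) dominate the result. A common way to choose the number of components is to standardize first and keep the smallest number whose cumulative explained variance reaches a target. The sketch below does this with an assumed 95% target; treating the binary gender column like a numeric feature is a simplification.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same toy dataset as above: [height, weight, age, gender, blood pressure]
features = np.array([[170, 65, 30, 1, 120],
                     [160, 55, 28, 0, 110],
                     [180, 75, 35, 1, 130]], dtype=float)

# Put all features on a comparable scale before PCA
X_std = StandardScaler().fit_transform(features)

# Fit PCA with all available components and inspect the variance they explain
pca = PCA()
pca.fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components reaching the (assumed) 95% target
threshold = 0.95
n_keep = int(np.argmax(cumulative >= threshold) + 1)

print("Cumulative explained variance:", cumulative)
print("Components to retain:", n_keep)
```

On a realistic dataset with many more samples, the same rule applies, and the threshold can be tuned to the trade-off between compression and information loss that the project can tolerate.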
