### Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

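Min-Max scaling is a feature-scaling technique that linearly rescales each feature to a fixed range, typically [0, 1], using the feature's observed minimum and maximum: $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$. In data preprocessing it puts features measured in different units on a comparable scale, which helps distance-based and gradient-based models. It is sensitive to outliers, because a single extreme value sets the minimum or maximum.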
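As a minimal sketch of what the scaler does internally, the formula x' = (x - min) / (max - min) can be applied column by column with plain NumPy. The variable names `col_min`, `col_max`, and `scaled` are illustrative and not part of any library API.

```python
import numpy as np

# Same toy data as the scikit-learn example in this answer
data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Per-feature (column) minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Apply x' = (x - min) / (max - min) to every column via broadcasting
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
```

Each column is mapped so that its smallest value becomes 0 and its largest becomes 1. A constant column would make the denominator zero, so real code needs to guard against that case.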

In [1]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Sample data
data = np.array([[1, 2], [3, 4], [5, 6]])

# Instantiate MinMaxScaler
scaler = MinMaxScaler()

# Fit and transform data
scaled_data = scaler.fit_transform(data)

print(scaled_data)


[[0.  0. ]
 [0.5 0.5]
 [1.  1. ]]


### Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

The Unit Vector technique, also known as normalization, rescales each individual sample (row) so that its norm, typically the L2 norm, equals 1:

$$X_{\text{normalized}} = \frac{X}{\lVert X \rVert}$$

Each sample is scaled independently, which preserves the direction of the sample vector while discarding its magnitude. Unlike Min-Max scaling, which operates per feature (column) and maps each feature to a predefined range, normalization operates per sample (row) and only guarantees that each sample has a length of 1.

In [3]:
from sklearn.preprocessing import Normalizer
import numpy as np

# Sample data
data = np.array([[1, 2], [3, 4], [5, 6]])

# Instantiate Normalizer
normalizer = Normalizer()

# Fit and transform data
normalized_data = normalizer.fit_transform(data)

print(normalized_data)


[[0.4472136  0.89442719]
 [0.6        0.8       ]
 [0.6401844  0.76822128]]
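To make the row-wise nature of this technique concrete, here is a small sketch that reproduces the normalization by hand, assuming the L2 norm (the default used by `Normalizer`). The names `row_norms` and `unit_rows` are illustrative.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# L2 norm of each sample (row); keepdims allows broadcasting in the division
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every row by its own norm
unit_rows = data / row_norms

# Every row now has length 1, while columns are not confined to a fixed range
print(np.linalg.norm(unit_rows, axis=1))
```

The statistics here are computed per row, whereas Min-Max scaling computes its minimum and maximum per column. That is the key practical difference between the two techniques.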


### Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

PCA (Principal Component Analysis) is a dimensionality reduction technique that transforms the original features into a new set of orthogonal (uncorrelated) features called principal components. These principal components are ordered by the amount of variance they explain in the data, with the first component explaining the most variance.

PCA works by identifying the directions (principal axes) in which the data varies the most and projecting the data onto these axes. This reduces the dimensionality of the data while preserving most of its variance, allowing for simpler and more efficient models.
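The projection step described above can be sketched with NumPy alone: center the data, take a singular value decomposition, and project onto the leading right singular vectors. This is one common way to compute PCA, shown under the assumption of a small synthetic dataset. The data and all variable names are made up for illustration.

```python
import numpy as np

# Synthetic correlated data (illustrative only): 50 samples, 3 features
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.1]])
X = rng.normal(size=(50, 3)) @ mixing

# 1. Center each feature at zero
X_centered = X - X.mean(axis=0)

# 2. SVD: rows of Vt are the principal axes, ordered by singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# 3. Keep the first two axes and project the data onto them
components = Vt[:2]
projected = X_centered @ components.T

# Variance explained by each axis, as a fraction of the total
explained_variance = S**2 / (len(X) - 1)
explained_ratio = explained_variance / explained_variance.sum()

print(projected.shape)
print(explained_ratio)
```

The explained-variance ratios come out in descending order, and the projected columns are uncorrelated with each other, which matches the properties of principal components stated above.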

In [4]:
from sklearn.decomposition import PCA
import numpy as np

# Sample data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Instantiate PCA with 2 components
pca = PCA(n_components=2)

# Fit and transform data
transformed_data = pca.fit_transform(data)

print(transformed_data)


[[-5.19615242e+00  3.62353582e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  3.62353582e-16]]


### Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

PCA is itself a feature extraction method: rather than selecting a subset of the original features, it builds new ones. It transforms the original features into principal components, which are linear combinations of the original features along the directions in feature space where the data varies the most. By keeping only the leading principal components, we extract a smaller set of features that retains most of the variance in the original dataset.

In [5]:
from sklearn.decomposition import PCA
import numpy as np

# Sample data
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Instantiate PCA with 2 components
pca = PCA(n_components=2)

# Fit and transform data
transformed_data = pca.fit_transform(data)

print(transformed_data)


[[-5.19615242e+00  3.62353582e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  3.62353582e-16]]
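To show that each extracted feature really is a linear combination of the original ones, the sketch below recomputes the PCA output from the fitted `mean_` and `components_` attributes. The dataset `X` is a made-up example chosen so that the rows do not all lie on a single line.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 5 samples, 3 original features
X = np.array([[2.0, 0.5, 1.0],
              [1.0, 1.5, 0.0],
              [3.0, 2.5, 2.0],
              [4.0, 3.0, 1.5],
              [5.0, 4.5, 3.0]])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# Each row of components_ holds the weights of one extracted feature
print(pca.components_)

# Extracted features = centered data times the component weights
Z_manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(Z, Z_manual))
```

Reading the rows of `components_` shows how strongly each original feature contributes to each extracted feature.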


### Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.
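
One possible approach, sketched under assumptions: price, rating, and delivery time are on different scales, so each column can be rescaled to [0, 1] with `MinMaxScaler`, fitted on the training data only and then reused for new items. The values below are invented purely for illustration and do not come from a real dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [price, rating (1-5), delivery time in minutes]
train = np.array([[12.5, 4.2, 30.0],
                  [8.0, 3.8, 45.0],
                  [25.0, 4.9, 20.0],
                  [15.0, 4.0, 60.0]])

# A new item seen after training (also hypothetical)
new_items = np.array([[10.0, 4.5, 25.0]])

# Learn each column's min and max from the training data only
scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)

# Reuse the same min and max for new data; do not refit
new_scaled = scaler.transform(new_items)

print(train_scaled)
print(new_scaled)
```

After scaling, no single feature dominates a distance or similarity computation just because of its units. New values outside the training range would fall outside [0, 1], which is worth checking in practice.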

### Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

In the context of predicting stock prices, PCA can reduce the dimensionality of the dataset by replacing many correlated input features with a small number of components that capture most of the variance in those features. The steps are:

1. Data Preparation: Gather the dataset containing features related to company financial data and market trends, such as earnings per share, price-to-earnings ratio, and trading volume.
2. Data Preprocessing: Standardize the features to have zero mean and unit variance, so that features with larger magnitudes do not dominate the principal components. Compute the scaling statistics on the training period only, to avoid leaking future information into the model.
3. Apply PCA: Fit PCA on the standardized training data. PCA identifies linear combinations of the original features that capture the maximum variance in the data.
4. Select Number of Components: Choose how many principal components to retain based on the cumulative explained variance ratio, keeping enough components to reach a chosen threshold (90 to 95% is a common choice).
5. Feature Reduction: Transform the dataset using the selected principal components. This reduces the dimensionality of the dataset while preserving most of its variance.
6. Model Training: Use the reduced-dimensional dataset as input to train the stock price prediction model.

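PCA simplifies the dataset by replacing many correlated features with a few uncorrelated components that retain most of the variance. This can reduce noise and multicollinearity, which may help the predictive model generalize. Note that the components with the most variance are not guaranteed to be the most predictive of price, so the reduced model should still be validated.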
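
The steps above can be sketched as a small scikit-learn pipeline. This is an illustration rather than a recommended trading setup: the data is synthetic, and the sizes, the 95% variance threshold, and the chronological split point are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for financial features: 10 correlated columns
# driven by 3 hidden factors plus a little noise (illustrative only)
rng = np.random.default_rng(42)
n_days, n_features = 300, 10
factors = rng.normal(size=(n_days, 3))
loadings = rng.normal(size=(3, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_days, n_features))

# Chronological split: earlier rows for training, later rows for testing
split = 240
X_train, X_test = X[:split], X[split:]

# Standardize, then keep enough components to explain 95% of the variance
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)

# Fit on the training period only, then apply the same transform to test data
X_train_reduced = reducer.fit_transform(X_train)
X_test_reduced = reducer.transform(X_test)

pca = reducer.named_steps["pca"]
print(X_train_reduced.shape, X_test_reduced.shape)
print(np.cumsum(pca.explained_variance_ratio_))
```

Passing a float between 0 and 1 as `n_components` asks PCA to pick the smallest number of components whose cumulative explained variance reaches that fraction. The reduced arrays would then be the inputs to whatever regression model is used for the price prediction.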

### Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

In [7]:
import numpy as np

# Given dataset
data = np.array([1, 5, 10, 15, 20])

# Compute min and max values
min_val = np.min(data)
max_val = np.max(data)

# Perform Min-Max scaling
scaled_data = (data - min_val) / (max_val - min_val) * 2 - 1

print(scaled_data)


[-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
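
As a cross-check, the same result can be obtained with scikit-learn's `MinMaxScaler` by setting `feature_range=(-1, 1)`. The sketch below compares it against the general formula a + (x - min) * (b - a) / (max - min) with a = -1 and b = 1. The variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20], dtype=float)

# MinMaxScaler expects a 2D array, so treat the values as a single column
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1)).ravel()

# General formula for a target range [a, b]
a, b = -1.0, 1.0
manual = a + (data - data.min()) * (b - a) / (data.max() - data.min())

print(scaled)
print(np.allclose(scaled, manual))
```

Using the scaler object is convenient when the same minimum and maximum must later be applied to new data through `transform`.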
