Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its 
application.



Min-Max scaling is a normalization technique that linearly rescales each feature to a fixed range, typically 0 to 1, using X_scaled = (X - X_min) / (X_max - X_min). It is used in data preprocessing to put all features on a common scale, which can improve the performance of machine learning algorithms that are sensitive to feature magnitudes, such as k-nearest neighbors and models trained with gradient descent.

Example:
Suppose a dataset has a feature "Income" that ranges from $20,000 to $100,000. After Min-Max scaling, $20,000 maps to 0, $100,000 maps to 1, and an income of $60,000 maps to (60,000 - 20,000) / (100,000 - 20,000) = 0.5. The relative spacing between values is preserved while the feature is brought onto a 0-to-1 scale.
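The income example can be sketched in a few lines of NumPy. The four income values below are hypothetical; only the $20,000 to $100,000 range comes from the example.

```python
import numpy as np

# Hypothetical incomes in dollars; only the 20,000-100,000 range is from the example.
income = np.array([20_000, 40_000, 60_000, 100_000], dtype=float)

# Min-Max formula: (x - min) / (max - min)
income_scaled = (income - income.min()) / (income.max() - income.min())

print(income_scaled)  # the smallest income maps to 0 and the largest to 1
```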

Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? 
Provide an example to illustrate its application.



The Unit Vector technique rescales each sample's feature vector so that it has a norm (length) of 1, by dividing the vector by its magnitude. Unlike Min-Max scaling, which rescales each feature (column) independently to a fixed range, Unit Vector scaling works on each data point (row): it keeps the direction of the vector and discards its magnitude.

Example:
Consider a dataset with two features, "Height" and "Weight." After applying Unit Vector scaling, each data point's feature vector is divided by its magnitude (Euclidean norm), ensuring that all feature vectors have a length of 1. This normalization technique is particularly useful in scenarios where the direction of the feature vectors is more important than their magnitude, such as in text classification using word embeddings.
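A minimal sketch of the Height/Weight example, assuming made-up sample values and the Euclidean (L2) norm:

```python
import numpy as np

# Hypothetical samples: each row is one person as (height in cm, weight in kg).
samples = np.array([[170.0, 65.0],
                    [180.0, 80.0],
                    [160.0, 50.0]])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts per row.
norms = np.linalg.norm(samples, axis=1, keepdims=True)

# Divide every row by its own length.
unit_vectors = samples / norms

print(unit_vectors)
print(np.linalg.norm(unit_vectors, axis=1))  # each row now has length 1, up to rounding
```

Note that the division is per row, not per column, which is the main practical difference from Min-Max scaling.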

Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an 
example to illustrate its application.



PCA (Principal Component Analysis) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while preserving most of the variance in the data. It achieves this by finding the principal components, which are orthogonal vectors that represent the directions of maximum variance in the original data.

Example:
Suppose we have a dataset with several features, such as age, income, education level, and spending habits. PCA identifies the principal components (linear combinations of the original features) that capture the most variance in the data. The components are ranked by the share of variance each one explains, so we can keep only the top components that together explain a chosen portion of the variance (e.g., 95%). The reduced dataset, with fewer dimensions, can then be used for further analysis or modeling, which lowers computational cost and the risk of overfitting while retaining most of the information in the original data.
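As a rough illustration of the idea, the sketch below runs PCA with NumPy's SVD on synthetic data. The data, the random seed, and the 95% threshold are assumptions made for the example, not values from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (age, income, education level, spending): 200 rows, 4 columns
# generated from 2 hidden factors plus a little noise, so the columns are correlated.
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X = factors @ mixing + 0.05 * rng.normal(size=(200, 4))

# Standardize each column (zero mean, unit variance) before PCA.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions, ordered by variance.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative variance reaches 95%.
k = int(np.argmax(cumulative >= 0.95)) + 1

# Project the standardized data onto the first k principal directions.
X_reduced = X_std @ Vt[:k].T
print("explained variance ratio:", explained_ratio.round(3))
print("components kept:", k, "reduced shape:", X_reduced.shape)
```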








Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature 
Extraction? Provide an example to illustrate this concept.



PCA is a method often used for feature extraction, where high-dimensional data is transformed into a lower-dimensional representation while preserving essential information. By identifying principal components, PCA effectively creates new features that capture the most significant variance in the original data.

Example:
Consider a dataset with numerous customer behavior features. Applying PCA, we derive principal components, which are combinations of the original features that best capture variance. These components serve as new, condensed features, aiding tasks like classification or visualization, while reducing computational complexity and noise.
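To make "new features as combinations of the original ones" concrete, here is a small sketch using an eigen-decomposition of the covariance matrix. The customer data is synthetic, and the choice of two components is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic customer-behavior matrix: 300 customers, 6 correlated features
# (hypothetical stand-ins for things like visits or basket size).
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
X_centered = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix; eigh returns ascending eigenvalues,
# so reorder them from largest to smallest.
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top 2 components; each column of `loadings` holds the weights
# that combine the 6 original features into one new feature.
loadings = eigvecs[:, :2]
new_features = X_centered @ loadings

# The extracted features are uncorrelated: their covariance matrix is diagonal.
print(np.cov(new_features, rowvar=False).round(6))
```

The `loadings` matrix is what distinguishes feature extraction from feature selection: no original column is kept as-is, and every new feature mixes all six inputs.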

Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset 
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to 
preprocess the data.



### To preprocess the data for building a recommendation system for a food delivery service using Min-Max scaling, follow these steps:

#### Understand the Data: 
Take a look at the dataset to understand the range and distribution of features such as price, rating, and delivery time.

#### Min-Max Scaling:
Min-Max scaling involves scaling features to a specific range, commonly between 0 and 1. Here's how you would apply it to each feature:

Price: Let's say prices range from $5 to $50. You would use the Min-Max scaling formula: X_scaled = (X_i - X_min) / (X_max - X_min). For example, a $20 item maps to (20 - 5) / (50 - 5) ≈ 0.33.

Rating: Ratings may be on a scale such as 1 to 5. Apply the same Min-Max scaling formula to scale the ratings between 0 and 1.

Delivery Time: If delivery times range from 10 minutes to 60 minutes, again, use Min-Max scaling to scale these values.

#### Apply to Dataset:
Implement the Min-Max scaling transformation to all relevant features in your dataset, ensuring they are scaled appropriately within the desired range.
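The steps above can be sketched with NumPy as follows. The four restaurant rows are hypothetical and were chosen so that each column spans the ranges mentioned above.

```python
import numpy as np

# Hypothetical restaurants: columns are price ($), rating (1-5), delivery time (minutes).
data = np.array([
    [5.0, 3.5, 60.0],
    [20.0, 4.8, 25.0],
    [50.0, 1.0, 10.0],
    [12.5, 5.0, 40.0],
])

# Column-wise minimum and maximum, one value per feature.
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Apply the Min-Max formula to every column at once via broadcasting.
scaled = (data - col_min) / (col_max - col_min)
print(scaled.round(3))
```

In practice the minimum and maximum would usually be learned from the training split only and reused for new data. scikit-learn's `MinMaxScaler` performs the same column-wise operation.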

Q6. You are working on a project to build a model to predict stock prices. The dataset contains many 
features, such as company financial data and market trends. Explain how you would use PCA to reduce the 
dimensionality of the dataset.



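Prepare Data: Collect the company financial and market trend features into a single numeric table.

Standardize Data: Scale every feature to zero mean and unit variance, since PCA is driven by variance and would otherwise be dominated by large-valued features.

Apply PCA: Fit PCA on the standardized features to find the directions of greatest variance.

Select Components: Keep the leading principal components that together explain most of the variance (for example, 95%).

Transform Data: Project the data onto the retained components to obtain the reduced-dimensional dataset.

Train Model: Use the transformed data to train the stock price prediction model.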
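A minimal end-to-end sketch of these steps is shown below. Everything in it is an assumption for illustration: the data is synthetic rather than real market data, the 95% threshold and the 400/100 chronological split are arbitrary, and plain least squares stands in for whatever model the project would actually use.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the real dataset: 500 time-ordered rows, 20 correlated
# features driven by 3 hidden factors, and a target that depends on those factors.
n_rows, n_features = 500, 20
hidden = rng.normal(size=(n_rows, 3))
X = hidden @ rng.normal(size=(3, n_features)) + 0.1 * rng.normal(size=(n_rows, n_features))
y = hidden @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=n_rows)

# Chronological split: earlier rows train, later rows test (no shuffling).
split = 400
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Standardize using training statistics only, so the test set does not leak into the fit.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std

# PCA fitted on the training set via SVD; keep enough components for 95% variance.
_, S, Vt = np.linalg.svd(X_train_std, full_matrices=False)
cumulative = np.cumsum(S**2) / np.sum(S**2)
k = int(np.argmax(cumulative >= 0.95)) + 1
components = Vt[:k]

# Project both splits onto the same components.
Z_train = X_train_std @ components.T
Z_test = X_test_std @ components.T

# Ordinary least squares on the reduced features, with an intercept column.
A_train = np.column_stack([np.ones(len(Z_train)), Z_train])
A_test = np.column_stack([np.ones(len(Z_test)), Z_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
y_pred = A_test @ coef

print("components kept:", k, "of", n_features)
print("test RMSE:", float(np.sqrt(np.mean((y_pred - y_test) ** 2))))
```

Fitting the scaler and PCA on the training rows only matters for time-ordered data, because statistics computed over the full history would let future information influence the model.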








Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the 
values to a range of -1 to 1.



import numpy as np

# Define the dataset
data = np.array([1, 5, 10, 15, 20])

# Define the Min and Max values
min_val = np.min(data)
max_val = np.max(data)

# Define the range for scaling
new_min = -1
new_max = 1

# Perform Min-Max scaling
scaled_data = ((data - min_val) * (new_max - new_min) / (max_val - min_val)) + new_min

# Print the scaled dataset
print("Original Dataset:", data)
print("Scaled Dataset [-1 to 1]:", scaled_data)


Original Dataset: [ 1  5 10 15 20]
Scaled Dataset [-1 to 1]: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
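The same calculation can be wrapped in a small reusable helper. The function name `min_max_scale` and its parameters are hypothetical, introduced here only for illustration; the guard for constant input is an added assumption about how that edge case should be handled.

```python
import numpy as np

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale `values` so its minimum maps to new_min and its maximum to new_max."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        # All values are identical, so the formula would divide by zero.
        raise ValueError("cannot scale a constant array")
    return (values - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

scaled = min_max_scale([1, 5, 10, 15, 20], new_min=-1, new_max=1)
print(scaled)
```

As a check by hand, the value 5 maps to (5 - 1) * 2 / 19 - 1 = -11/19 ≈ -0.579, which agrees with the output above.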


Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform 
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

Standardize Data:
Standardize the numerical features (height, weight, age, blood pressure) to have zero mean and unit variance. Gender is categorical, so it is left out of the standardization and PCA steps.

Apply PCA:
Use PCA on the standardized features to find principal components.

Explained Variance:
Check the explained variance ratio for each component.

Cumulative Variance:
Calculate the cumulative explained variance.

Select Components:
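Choose the smallest number of principal components whose cumulative explained variance reaches a threshold such as 90% or 95%. With four numeric features there are at most four components, so the exact number retained depends on how strongly the features are correlated in the actual data.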








# from sklearn.decomposition import PCA
# from sklearn.preprocessing import StandardScaler
# import numpy as np

# # Define the dataset (features)
# features = np.array([[height1, weight1, age1, gender1, bp1],
#                      [height2, weight2, age2, gender2, bp2],
#                      ...,
#                      [heightn, weightn, agen, gendern, bpn]])

# # Standardize the numeric features (gender, at column index 3, is excluded)
# scaler = StandardScaler()
# features_scaled = scaler.fit_transform(features[:, [0, 1, 2, 4]])  # height, weight, age, bp

# # Apply PCA
# pca = PCA()
# pca.fit(features_scaled)

# # Calculate explained variance ratio and cumulative explained variance
# explained_variance_ratio = pca.explained_variance_ratio_
# cumulative_variance = np.cumsum(explained_variance_ratio)

# # Determine number of components to retain (e.g., 95% of variance)
# n_components = np.argmax(cumulative_variance >= 0.95) + 1
