### Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

Min-Max scaling is a data preprocessing technique that rescales a numerical feature to a fixed range, usually [0, 1]. Each value has the feature's minimum subtracted from it and is then divided by the range (the difference between the maximum and minimum values). The formula for Min-Max scaling is:

Scaled Value = (Original Value − Min Value) / (Max Value − Min Value)
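
To make the formula concrete, here is a minimal NumPy sketch that applies it by hand to the values 2, 5, 10, 15, 20 (the same illustrative values used in the example cell). The helper name `min_max_scale` is an assumption made for illustration; it is not a library function.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D sequence to [0, 1] using (x - min) / (max - min)."""
    x = np.asarray(values, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        # A constant feature has no range; return zeros to avoid dividing by zero.
        return np.zeros_like(x)
    return (x - x.min()) / value_range

scaled = min_max_scale([2, 5, 10, 15, 20])
print(scaled)
```

The smallest value maps to 0 and the largest to 1; every other value lands in between in proportion to its distance from the minimum.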

In [None]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example DataFrame
data = {'Feature1': [2, 5, 10, 15, 20]}
df = pd.DataFrame(data)

# Apply Min-Max scaling
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print("Original DataFrame:")
print(df)
print("\nDataFrame after Min-Max scaling:")
print(df_scaled)

### Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

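Unit Vector scaling divides a vector by its magnitude (the L2 or Euclidean norm) so that the scaled vector has a length of 1. In scikit-learn, `Normalizer` applies this to each sample (row) across its features, not to each feature column. Unlike Min-Max scaling, which maps each feature to a fixed range using its minimum and maximum, Unit Vector scaling keeps a vector's direction and changes only its length. The formula for Unit Vector scaling is:

Scaled Value = Original Value / sqrt(Sum of Squares of All Values in the Vector)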
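
As a small illustration of the formula, the sketch below divides one vector by its L2 norm using NumPy only. The helper name `to_unit_vector` and the input `[3, 4]` are assumptions chosen for illustration.

```python
import numpy as np

def to_unit_vector(vector):
    """Divide a vector by its L2 norm so that the result has length 1."""
    v = np.asarray(vector, dtype=float)
    norm = np.linalg.norm(v)  # square root of the sum of squares
    if norm == 0:
        # The zero vector has no direction, so it is returned unchanged.
        return v
    return v / norm

unit = to_unit_vector([3, 4])
print(unit)                  # [0.6 0.8]
print(np.linalg.norm(unit))  # length of the scaled vector
```

The norm of `[3, 4]` is 5, so the components become 3/5 and 4/5. Their ratio is unchanged, which is the sense in which the direction is preserved.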

In [None]:
import pandas as pd
from sklearn.preprocessing import Normalizer

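# Example DataFrame with two features. Normalizer rescales each row to unit
# length, so with a single feature every row would simply become 1.0.
data = {'Feature1': [2, 5, 10, 15, 20], 'Feature2': [1, 3, 7, 12, 18]}
df = pd.DataFrame(data)

# Apply Unit Vector scaling (each row is divided by its own L2 norm)
scaler = Normalizer(norm='l2')  # 'l2' for Euclidean norm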
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print("Original DataFrame:")
print(df)
print("\nDataFrame after Unit Vector scaling:")
print(df_scaled)

### Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms the original features into a new set of uncorrelated features called principal components. These components are ordered by the amount of variance they capture, so keeping only the first few reduces the number of dimensions while retaining most of the variance in the data.
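
To show what happens under the hood, here is a rough NumPy sketch of PCA via the eigendecomposition of the covariance matrix. It is one common way to compute the components, not necessarily how any particular library implements it, and it uses the same two-feature values as the example DataFrame.

```python
import numpy as np

# Illustrative two-feature data (rows are samples, columns are features).
X = np.array([[2, 1], [5, 3], [10, 7], [15, 12], [20, 18]], dtype=float)

# 1. Center each feature at zero.
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features.
cov = np.cov(X_centered, rowvar=False)

# 3. Eigenvectors give the principal directions; eigenvalues give the
#    variance along each direction.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order, so put the largest first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Share of the total variance captured by each component.
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# 5. Project onto the first principal component (2 features -> 1).
X_reduced = X_centered @ eigenvectors[:, :1]

print("Explained variance ratio:", explained_variance_ratio)
print("Reduced data:", X_reduced.ravel())
```

Because the two features rise together almost linearly, the first component captures nearly all of the variance, which is why dropping the second loses little information here.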

In [None]:
from sklearn.decomposition import PCA
import pandas as pd

# Example DataFrame
data = {'Feature1': [2, 5, 10, 15, 20], 'Feature2': [1, 3, 7, 12, 18]}
df = pd.DataFrame(data)

# Apply PCA for dimensionality reduction to 1 component
pca = PCA(n_components=1)
df_pca = pd.DataFrame(pca.fit_transform(df), columns=['Principal Component'])

print("Original DataFrame:")
print(df)
print("\nDataFrame after PCA:")
print(df_pca)

### Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

PCA is a technique for Feature Extraction, where it transforms the original features into a set of principal components. These principal components represent linear combinations of the original features and are ordered by the amount of variance they capture. By selecting a subset of principal components, you effectively perform feature extraction while retaining the most critical information in the data.
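
The "linear combination" idea can be checked directly. The sketch below reads the weights from the fitted `PCA` object's `components_` attribute and rebuilds the extracted feature by hand; the variable names `weights` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

df = pd.DataFrame({'Feature1': [2, 5, 10, 15, 20],
                   'Feature2': [1, 3, 7, 12, 18]})

pca = PCA(n_components=1)
extracted = pca.fit_transform(df)

# Each row of components_ holds the weights of one principal component
# on the original features.
weights = pca.components_[0]
print("Weights on Feature1 and Feature2:", weights)

# The extracted feature is the weighted sum of the centered original features.
manual = (df.to_numpy() - pca.mean_) @ weights
print("Matches fit_transform output:", np.allclose(manual, extracted.ravel()))
```

The new feature is therefore not one of the original columns but a blend of both, which is what distinguishes feature extraction from feature selection.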

In [None]:
from sklearn.decomposition import PCA
import pandas as pd

# Example DataFrame
data = {'Feature1': [2, 5, 10, 15, 20], 'Feature2': [1, 3, 7, 12, 18]}
df = pd.DataFrame(data)

# Apply PCA for feature extraction to 1 component
pca = PCA(n_components=1)
df_pca = pd.DataFrame(pca.fit_transform(df), columns=['Principal Component'])

print("Original DataFrame:")
print(df)
print("\nDataFrame after PCA for Feature Extraction:")
print(df_pca)

### Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

In the context of a recommendation system for a food delivery service, Min-Max scaling can be applied to features such as price, rating, and delivery time so that they all share the same scale. Here's a step-by-step explanation:

Understand the Features:
Identify the numerical features in your dataset that need scaling. In this case, it might be features like price, rating, and delivery time.

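Apply Min-Max Scaling:
Use Min-Max scaling to transform each feature to the range [0, 1]. This ensures that no single feature dominates simply because it is measured on a larger scale (for example, price in currency units versus a rating out of 5).
Use a Min-Max scaler from a library such as scikit-learn (`MinMaxScaler`), and reuse the fitted scaler on new data so that the same minimum and maximum are applied.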
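
The steps above can be sketched as follows. The column names and all values are hypothetical stand-ins, since the actual dataset is not shown.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical sample rows; real values would come from the delivery dataset.
orders = pd.DataFrame({
    'price': [120.0, 250.0, 480.0, 90.0],
    'rating': [4.5, 3.8, 4.9, 4.1],
    'delivery_time': [25, 40, 55, 30],  # minutes (assumed unit)
})

scaler = MinMaxScaler()  # default feature_range is (0, 1)
orders_scaled = pd.DataFrame(scaler.fit_transform(orders), columns=orders.columns)
print(orders_scaled)

# The fitted scaler can be reused on new rows with the same min and max.
new_order = pd.DataFrame({'price': [300.0], 'rating': [4.0], 'delivery_time': [35]})
print(scaler.transform(new_order))
```

After scaling, each column spans 0 to 1, so a distance or similarity measure in the recommender treats price, rating, and delivery time on comparable terms.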

### Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

In the context of predicting stock prices with a dataset containing many features, PCA can be used for dimensionality reduction. Here's how you would apply PCA:

Standardize the Data:
Before applying PCA, it's essential to standardize the data, ensuring that all features have a mean of 0 and a standard deviation of 1. This is crucial as PCA is sensitive to the scale of the features.

Apply PCA:
Use PCA to transform the standardized dataset into principal components. These components represent linear combinations of the original features, capturing the maximum variance in the data.

Determine the Number of Components:
Analyze the cumulative explained variance ratio to determine the optimal number of principal components to retain. You may choose a threshold (e.g., 95% variance explained) to decide how many components to keep.

Reduce Dimensionality:
Retain the selected number of principal components and discard the rest. This reduces the dimensionality of the dataset while preserving the most critical information.

In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example DataFrame with financial and market trend features
# (placeholder: replace each [...] and the trailing ... with real columns before running)
data = {'Feature1': [...], 'Feature2': [...], 'Feature3': [...], ...}
df = pd.DataFrame(data)

# Step 1: Standardize the data
scaler = StandardScaler()
df_standardized = scaler.fit_transform(df)

# Step 2: Apply PCA
pca = PCA()
df_pca = pca.fit_transform(df_standardized)

# Step 3: Determine the number of components to retain
cumulative_variance_ratio = np.cumsum(pca.explained_variance_ratio_)
# Choose the number of components that explain a desired amount of variance

# Step 4: Reduce dimensionality
selected_components = 3  # Example: Choose based on the analysis in Step 3
df_reduced = df_pca[:, :selected_components]
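
Since the cell above uses placeholder data, here is a runnable sketch of the same four steps on synthetic data. The generated features, the column names, and the 95% threshold are assumptions for illustration only; they are not real financial data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 200 rows, 10 correlated features driven by 3 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
features = factors @ mixing + 0.1 * rng.normal(size=(200, 10))
df = pd.DataFrame(features, columns=[f'feature_{i}' for i in range(10)])

# Step 1: standardize so every feature has mean 0 and standard deviation 1.
X_std = StandardScaler().fit_transform(df)

# Steps 2-3: fit PCA with all components and find the smallest number
# whose cumulative explained variance reaches 95%.
pca_full = PCA().fit(X_std)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95) + 1)

# Step 4: keep only that many components.
X_reduced = PCA(n_components=n_components).fit_transform(X_std)

print("Components kept:", n_components)
print("Reduced shape:", X_reduced.shape)
```

Computing the count from the cumulative curve avoids hard-coding a number such as 3; the threshold itself remains a judgment call for the project.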

### Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.

Min-Max scaling to a target range [a, b] is applied using the formula:
Scaled Value = a + (Original Value − Min Value) × (b − a) / (Max Value − Min Value)

For the given dataset, [1, 5, 10, 15, 20], with a = -1 and b = 1, the scaling is performed as follows:
Identify the Min and Max values: Min = 1, Max = 20, so the range is 19.
Apply the formula to each value: Scaled Value = 2 × (Original Value − 1) / 19 − 1.

In [None]:
import numpy as np

data = [1, 5, 10, 15, 20]

# Step 1: Identify Min and Max values
min_value = np.min(data)
max_value = np.max(data)

# Step 2: Scale to [0, 1], then multiply by 2 and subtract 1 to reach [-1, 1]
scaled_values = [float((x - min_value) / (max_value - min_value) * 2 - 1) for x in data]

print("Original values:", data)
print("Min-Max scaled values (-1 to 1):", scaled_values)
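
As a cross-check, the same result can be obtained with scikit-learn's `MinMaxScaler` by setting its `feature_range` parameter. This is a minimal sketch; the variable names are illustrative. Worked by hand, the five values map to -1, -11/19, -1/19, 9/19, and 1.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# One feature with five samples; scikit-learn expects a 2-D array.
values = np.array([1, 5, 10, 15, 20], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(values).ravel()

print(np.round(scaled, 4))
```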

### Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In this scenario, PCA can be applied to extract principal components from the dataset. The number of principal components to retain depends on the variance explained by these components. Here's how you might proceed:

Standardize the Data:
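Standardize the features to have zero mean and unit variance. Gender is categorical, so it must first be encoded numerically (for example as 0/1) or left out, because PCA operates on numeric data.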

Apply PCA:
Use PCA to transform the standardized dataset into principal components.

Analyze Explained Variance:
Examine the explained variance ratio of each principal component, and the cumulative ratio, to see how much of the total variance the first k components capture together.

Choose the Number of Components:
Select the number of principal components that explain a sufficiently high percentage of the total variance. A common threshold is to retain components that collectively explain at least 95% of the variance.

In [None]:
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example DataFrame with features
# (placeholder: replace each [...] with real values; gender must be numerically encoded)
data = {'height': [...], 'weight': [...], 'age': [...], 'gender': [...], 'blood_pressure': [...]}

df = pd.DataFrame(data)

# Step 1: Standardize the data
scaler = StandardScaler()
df_standardized = scaler.fit_transform(df)

# Step 2: Apply PCA
pca = PCA()
df_pca = pca.fit_transform(df_standardized)

# Step 3: Analyze explained variance
cumulative_variance_ratio = np.cumsum(pca.explained_variance_ratio_)

# Step 4: Choose the number of components based on explained variance
selected_components = np.argmax(cumulative_variance_ratio >= 0.95) + 1

print("Number of components to retain:", selected_components)
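
Because the cell above relies on placeholder data, the following sketch runs the same procedure on synthetic values. Every number here is generated for illustration, the relationships between columns are assumptions, and gender is assumed to be encoded as 0/1, so the resulting component count says nothing about a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not real measurements).
rng = np.random.default_rng(42)
n = 100
height = rng.normal(170, 10, n)                         # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, n)     # kg, loosely tied to height
age = rng.integers(18, 80, n)                           # years
gender = rng.integers(0, 2, n)                          # encoded as 0/1
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n)  # loosely tied to age

df = pd.DataFrame({'height': height, 'weight': weight, 'age': age,
                   'gender': gender, 'blood_pressure': blood_pressure})

# Standardize, fit PCA with all five components, and inspect the variance curve.
X_std = StandardScaler().fit_transform(df)
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

for k, share in enumerate(cumulative, start=1):
    print(f"First {k} component(s): {share:.1%} of variance")

selected = int(np.argmax(cumulative >= 0.95) + 1)
print("Number of components to retain:", selected)
```

With only five features, the answer depends heavily on how correlated they are: strongly related pairs such as height and weight let fewer components reach the threshold, while nearly independent features may require keeping most of them.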