In [None]:
Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.


Min-Max scaling, also known as normalization, is a data preprocessing technique that linearly rescales a numerical 
feature to a fixed range, typically [0, 1], using x_scaled = (x - min) / (max - min). It puts all features on the same 
scale, so that features with large magnitudes do not dominate the others.

Example:
    Suppose you have a feature "age" with values ranging from 0 to 100. 
    After Min-Max scaling, these values are mapped to the range [0, 1] while preserving their relative 
    spacing: an age of 0 becomes 0, 25 becomes 0.25, and 100 becomes 1.
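
The scaling described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; 
the ages are made-up values and the helper name `min_max_scale` is hypothetical, not a library function.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D array to [0, 1] (hypothetical helper for illustration)."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        # A constant feature has no spread; return zeros to avoid dividing by zero.
        return np.zeros_like(values)
    return (values - v_min) / (v_max - v_min)

ages = [0, 25, 50, 100]  # illustrative values only
scaled_ages = min_max_scale(ages)
print(scaled_ages.tolist())  # [0.0, 0.25, 0.5, 1.0]
```

The guard for a constant feature is a design choice of this sketch; other conventions for that case are possible.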

In [None]:
Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.


The unit vector technique, also known as vector normalization, scales each vector (typically each sample) to a length 
(L2 norm) of 1 while preserving its direction. It is often used with methods that depend on the direction of vectors 
rather than their magnitude, such as cosine similarity.

Example:-
    If you have a vector [3, 4], its length is sqrt(3^2 + 4^2) = 5, so its unit vector is [3/5, 4/5] = [0.6, 0.8]. 
    The values are scaled, but the direction remains the same.

Differences:-
    Min-Max scaling maps each feature to a fixed range, usually [0, 1], while the unit vector technique rescales each vector to unit length.
    Min-Max scaling works per feature (column) using that feature's minimum and maximum, while the unit vector technique works 
    per vector (typically per sample, i.e. row) using that vector's norm.
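
A short sketch of the [3, 4] example, assuming NumPy. It also applies Min-Max scaling to the same two numbers 
(treated here, purely for contrast, as two values of a single feature) to show that the two techniques give different results.

```python
import numpy as np

v = np.array([3.0, 4.0])

# Unit vector scaling: divide by the Euclidean (L2) norm, here sqrt(3^2 + 4^2) = 5.
unit_v = v / np.linalg.norm(v)

# Min-Max scaling of the same two numbers, treated as values of one feature.
min_max_v = (v - v.min()) / (v.max() - v.min())

print(unit_v.tolist())     # [0.6, 0.8]
print(min_max_v.tolist())  # [0.0, 1.0]
```

The unit vector keeps the 3:4 ratio between the entries, whereas Min-Max scaling pins the smallest value to 0 and the largest to 1.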

In [None]:
Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.


PCA is a dimensionality reduction technique that finds orthogonal linear combinations of the original features 
(the principal components), ordered by how much variance each one captures. Keeping only the first few components 
reduces the number of features while preserving as much variance as possible.

Example: 
    In a dataset with many correlated features (e.g., height, weight, age), PCA can create new features that capture the most 
    significant variance.

In [None]:
Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.


PCA is a feature extraction technique that transforms the original features into a set of orthogonal principal components. 
These components can be used as new features that often capture the most important information in the data.

Example: 
    In a dataset with features like height, weight, and age, PCA can be applied to extract principal components. 
    These components can represent combinations of the original features (e.g., "body size" component)
    and be used for further analysis or modeling.
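
The idea can be sketched with scikit-learn's `PCA`. The data below is synthetic and is an assumption made only for 
illustration: weight is constructed to track height, and age is independent of both.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): weight follows height, age is independent.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)
weight = 0.9 * (height - 100) + rng.normal(0, 3, size=200)
age = rng.uniform(20, 60, size=200)
X = np.column_stack([height, weight, age])

# Standardize so that no feature dominates because of its units.
X_std = StandardScaler().fit_transform(X)

# Extract two new features (principal components) from the three originals.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)                    # (200, 2)
print(pca.components_)                # loadings: one row per component
print(pca.explained_variance_ratio_)  # share of variance per component
```

If the first row of `components_` shows large weights of the same sign on height and weight and a small weight on age, 
that component can be read as a "body size" feature. The columns of `X_pca` are uncorrelated and can be passed to a model 
in place of the original features.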

In [None]:
Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.


For each feature:-

    (i). Calculate the minimum (min_val) and maximum (max_val) values within the dataset for that feature.

    (ii). For each data point, apply the Min-Max scaling formula:-
                Scaled_Value = (Value - min_val) / (max_val - min_val)

    (iii). Repeat this process for all features, ensuring that they are all scaled to the [0, 1] range.
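
The three steps above can be sketched with pandas. The column names and values below are hypothetical stand-ins 
for the food delivery dataset, which is not shown in the question.

```python
import pandas as pd

# Hypothetical sample of the food delivery data; names and values are made up.
df = pd.DataFrame({
    "price": [120.0, 250.0, 80.0, 400.0],
    "rating": [4.2, 3.8, 4.9, 4.5],
    "delivery_time": [30.0, 45.0, 20.0, 60.0],
})

# Steps (i)-(iii): per-column minimum and maximum, then the scaling formula,
# applied to every column at once through pandas broadcasting.
col_min = df.min()
col_max = df.max()
scaled_df = (df - col_min) / (col_max - col_min)

print(scaled_df.round(3))
```

In a real pipeline, `col_min` and `col_max` would usually be computed on the training data only and then reused to scale 
new data, so that the test set does not influence the scaling.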

In [None]:
Q6. You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.


To reduce the dimensionality of the dataset for stock price prediction, you can use PCA as follows:-

(i). Standardize the features:-
    Ensure that all features have a mean of 0 and standard deviation of 1.

(ii). Calculate the covariance matrix:-
    Compute the covariance matrix of the standardized features.

(iii). Compute the principal components:-
    Calculate the eigenvectors and eigenvalues of the covariance matrix, and sort them by decreasing eigenvalue.

(iv). Select the number of components:-
    Decide how many principal components to retain based on the cumulative explained variance (for example, enough to reach 95%).

(v). Transform the data:-
    Project the standardized data onto the selected principal components (the eigenvectors with the largest eigenvalues) 
    to create a new dataset with reduced dimensionality.
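
Steps (i) to (v) can be sketched directly with NumPy. The array below is random data standing in for the stock dataset 
(an assumption, since the real features are not given), and the 95% threshold is one common choice rather than a fixed rule.

```python
import numpy as np

# Random stand-in for the stock dataset (assumption): 100 rows, 6 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # make two features correlated

# (i) Standardize each feature to mean 0 and standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# (ii) Covariance matrix of the standardized features (columns are variables).
cov = np.cov(X_std, rowvar=False)

# (iii) Eigendecomposition; eigh suits symmetric matrices and returns
# eigenvalues in ascending order, so reorder them to descending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# (iv) Smallest number of components whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)

# (v) Project the standardized data onto the first k principal components.
X_reduced = X_std @ eigenvectors[:, :k]
print(k, X_reduced.shape)
```

Because two of the six columns are nearly duplicates, fewer than six components are needed to reach the threshold.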

In [None]:
Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

In [4]:
import numpy as np

data = np.array([1, 5, 10, 15, 20])

min_range = -1
max_range = 1

min_val = data.min()
max_val = data.max()

# Perform Min-Max scaling
scaled_data = (data - min_val) / (max_val - min_val) * (max_range - min_range) + min_range

print(["{:.2f}".format(i) for i in scaled_data])

['-1.00', '-0.58', '-0.05', '0.47', '1.00']
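
As a cross-check, the same transformation can be done with scikit-learn's `MinMaxScaler` and its `feature_range` 
parameter. This is a sketch assuming scikit-learn is installed; the manual formula above does not depend on it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20], dtype=float)

# MinMaxScaler expects a 2-D array of shape (n_samples, n_features),
# so the five values are reshaped into a single column.
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(data.reshape(-1, 1)).ravel()

print(np.round(scaled, 2).tolist())  # [-1.0, -0.58, -0.05, 0.47, 1.0]
```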


In [None]:
Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

In [1]:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

# Sample dataset
data = {
    'height': [165, 175, 160, 180, 170],
    'weight': [60, 70, 55, 75, 65],
    'age': [30, 25, 35, 40, 28],
    'gender': ['Male', 'Male', 'Female', 'Male', 'Female'],
    'blood_pressure': ['120/80', '130/85', '115/75', '140/90', '125/78']
}

df = pd.DataFrame(data)

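# One-hot encode gender; OneHotEncoder sorts categories alphabetically,
# so the output columns are ordered ['Female', 'Male'].
encoder = OneHotEncoder()
gender_encoded = encoder.fit_transform(df[['gender']]).toarray()

# Standardize the numerical features (blood_pressure is a string column and is left out here)
scaler = StandardScaler()
numerical_features = scaler.fit_transform(df[['height', 'weight', 'age']])

X = pd.DataFrame(data=numerical_features, columns=['height', 'weight', 'age'])
X['gender_Female'] = gender_encoded[:, 0]
X['gender_Male'] = gender_encoded[:, 1]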

# Perform PCA
pca = PCA()
pca.fit(X)

# Calculate explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Calculate cumulative explained variance
cumulative_explained_variance = explained_variance_ratio.cumsum()

# Smallest number of components whose cumulative explained variance reaches 95%:
# count the components that fall short of the threshold, then add the one that crosses it
n_components = (cumulative_explained_variance < 0.95).sum() + 1
print(f"Number of components to retain: {n_components}")

Number of components to retain: 3


In [None]:
The number of principal components to retain depends on your goals and on the variance explained by each 
component. Generally, you keep the smallest number of components that captures a high percentage of the total 
variance (95% is the threshold used above). For the sample data, three components reach that threshold, so 
three components would be retained.