### Q1: What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.

A1. **Min-Max Scaling** is a feature scaling technique that rescales each feature (column) independently to a given range, usually [0, 1], using that feature's minimum and maximum values. For the [0, 1] range, the formula is:
$$
X' = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}}
$$

Where:
- $X$ is the original value.
- $X_{\text{min}}$ is the minimum value of the feature.
- $X_{\text{max}}$ is the maximum value of the feature.
- $X'$ is the scaled value.



For example, scaling the values 1, 5, 10, 15, 20 with scikit-learn's `MinMaxScaler`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1], [5], [10], [15], [20]])
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)

print("Original Data:\n", data)
print("Scaled Data:\n", scaled_data)
```

```
Original Data:
 [[ 1]
 [ 5]
 [10]
 [15]
 [20]]
Scaled Data:
 [[0.        ]
 [0.21052632]
 [0.47368421]
 [0.73684211]
 [1.        ]]
```
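To make the formula concrete, here is a minimal NumPy sketch of it that also supports an arbitrary target range $[a, b]$ through $X' = a + \frac{(X - X_{\text{min}})(b - a)}{X_{\text{max}} - X_{\text{min}}}$, and checks the result against `MinMaxScaler`. The helper name `min_max_scale` is hypothetical, and dividing a constant column by 1 instead of 0 (so it maps to $a$) is an assumption made here to avoid division by zero.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def min_max_scale(x, feature_range=(0.0, 1.0)):
    """Scale each column of x to feature_range using its own min and max (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    a, b = feature_range
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    # Assumption: a constant column (zero range) is divided by 1 instead of 0, so it maps to a
    span = np.where(x_max - x_min == 0, 1.0, x_max - x_min)
    return a + (x - x_min) / span * (b - a)


data = np.array([[1], [5], [10], [15], [20]])

manual = min_max_scale(data)
library = MinMaxScaler(feature_range=(0, 1)).fit_transform(data)
print(np.allclose(manual, library))  # True

# The same values mapped to [-1, 1] instead of [0, 1]
print(min_max_scale(data, feature_range=(-1, 1)).ravel())
```

Here $X_{\text{min}} = 1$ and $X_{\text{max}} = 20$, so for instance 5 maps to $(5 - 1)/19 \approx 0.2105$, matching the scaler's output above.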



### Q2: What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.

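A2. **Unit Vector Scaling** (also called normalization) rescales each sample, i.e., each row of the data matrix, so that the vector of its feature values has unit norm, most commonly the L2 (Euclidean) norm. The formula for L2 normalization is:

$$
X' = \frac{X}{\|X\|_2}
$$

Where $\|X\|_2 = \sqrt{\sum_i x_i^2}$ is the L2 norm of the vector $X$ with components $x_i$.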


For example, normalizing three 2-D samples with scikit-learn's `Normalizer`:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

data = np.array([[1, 2], [3, 4], [5, 6]])
normalizer = Normalizer(norm='l2')
normalized_data = normalizer.fit_transform(data)

print("Original Data:\n", data)
print("Normalized Data:\n", normalized_data)
```

```
Original Data:
 [[1 2]
 [3 4]
 [5 6]]
Normalized Data:
 [[0.4472136  0.89442719]
 [0.6        0.8       ]
 [0.6401844  0.76822128]]
```



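**Difference:**
- **Min-Max Scaling** works column-wise: each feature is rescaled to a fixed range, usually [0, 1], using that feature's minimum and maximum across all samples.
- **Unit Vector Scaling** works row-wise: each sample is divided by its own norm, so every row ends up with length 1 while its feature values keep their relative proportions. The results are not tied to a fixed per-feature range (with negative inputs, components can lie anywhere in [-1, 1]). It is useful when only the direction of a sample vector matters, for example when comparing text vectors such as TF-IDF with cosine similarity.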
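The row-versus-column distinction can be checked directly. This sketch, using the same 3 x 2 data, computes L2 normalization by hand with `np.linalg.norm(..., axis=1)`, confirms it matches `Normalizer`, and contrasts the result with `MinMaxScaler` applied to the same array:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer

data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# Unit vector scaling by hand: divide each row (sample) by its own L2 norm
row_norms = np.linalg.norm(data, axis=1, keepdims=True)
manual_unit = data / row_norms

unit = Normalizer(norm="l2").fit_transform(data)
minmax = MinMaxScaler().fit_transform(data)

print(np.allclose(manual_unit, unit))  # True

# Normalizer: every row has length 1
print("Row norms after Normalizer:", np.linalg.norm(unit, axis=1))

# MinMaxScaler: every column spans exactly [0, 1]
print("Column min/max after MinMaxScaler:", minmax.min(axis=0), minmax.max(axis=0))
```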

### Q3: What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.

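A3. **Principal Component Analysis (PCA)** is a technique used to reduce the dimensionality of a dataset while preserving as much variance as possible. It centers the data and projects it onto a new set of orthogonal axes, the principal components, along which the transformed features are linearly uncorrelated. The first principal component is the direction of greatest variance, the second captures the most remaining variance while being orthogonal to the first, and so on; keeping only the first few components gives a lower-dimensional representation of the data.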


For example, projecting a small 2-D dataset onto its first principal component:

```python
from sklearn.decomposition import PCA
import numpy as np

# Sample data
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0],
                 [2.3, 2.7],
                 [2, 1.6],
                 [1, 1.1],
                 [1.5, 1.6],
                 [1.1, 0.9]])

pca = PCA(n_components=1)
principal_components = pca.fit_transform(data)

print("Principal Components:\n", principal_components)
```

```
Principal Components:
 [[ 0.82797019]
 [-1.77758033]
 [ 0.99219749]
 [ 0.27421042]
 [ 1.67580142]
 [ 0.9129491 ]
 [-0.09910944]
 [-1.14457216]
 [-0.43804614]
 [-1.22382056]]
```
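To show what `PCA` computes under the hood, the sketch below reproduces the first principal component with plain NumPy by eigen-decomposing the covariance matrix of the centered data. This is one standard way to compute PCA; scikit-learn itself uses a singular value decomposition, and because eigenvector signs are arbitrary, the two results are compared up to a sign flip.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# 1. Center each feature at zero mean
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the features (columns)
cov = np.cov(centered, rowvar=False)

# 3. Eigen-decomposition; eigh returns eigenvalues in ascending order, so reverse it
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centered data onto the first eigenvector (first principal component)
manual_pc1 = centered @ eigvecs[:, :1]

sk_pc1 = PCA(n_components=1).fit_transform(data)

# Eigenvector signs are arbitrary, so compare up to a sign flip
same = np.allclose(manual_pc1, sk_pc1) or np.allclose(manual_pc1, -sk_pc1)
print(same)  # True

# Fraction of the total variance captured by each component
print(eigvals / eigvals.sum())
```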


### Q4: What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.

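A4. Feature extraction creates new features from the original ones, as opposed to feature selection, which keeps a subset of the original columns. PCA is a linear feature-extraction method: each principal component is a linear combination of all original features, the components are mutually uncorrelated, and they are ordered by the amount of variance they capture. Keeping only the first few components yields a smaller set of new features that can be fed to a machine learning model in place of the original ones.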


Using the same data as in Q3, the single principal component becomes the one extracted feature that replaces the two original columns:

```python
from sklearn.decomposition import PCA
import numpy as np


data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0],
                 [2.3, 2.7],
                 [2, 1.6],
                 [1, 1.1],
                 [1.5, 1.6],
                 [1.1, 0.9]])

pca = PCA(n_components=1)
principal_components = pca.fit_transform(data)

print("Principal Components:\n", principal_components)
```

```
Principal Components:
 [[ 0.82797019]
 [-1.77758033]
 [ 0.99219749]
 [ 0.27421042]
 [ 1.67580142]
 [ 0.9129491 ]
 [-0.09910944]
 [-1.14457216]
 [-0.43804614]
 [-1.22382056]]
```
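What makes a principal component a *feature* is that it is built from the original columns with fixed weights. The sketch below, reusing the same data, prints those weights (`components_`), confirms that `transform` is the centered data multiplied by them, and uses `inverse_transform` to map the single extracted feature back to the two original columns so the information loss can be measured. Comparing against the error of simply predicting each column's mean is an illustrative baseline, not a standard metric.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

pca = PCA(n_components=1).fit(data)

# Each row of components_ holds the weights (loadings) that define one extracted feature
print("PC1 loadings:", pca.components_[0])

# The extracted feature is a weighted sum of the mean-centered original features
extracted = (data - pca.mean_) @ pca.components_.T
print(np.allclose(extracted, pca.transform(data)))  # True

# Map the 1-D feature back to 2-D to see how much information the single feature keeps
reconstructed = pca.inverse_transform(extracted)
pca_error = np.mean((data - reconstructed) ** 2)
baseline_error = np.mean((data - data.mean(axis=0)) ** 2)  # error of predicting the column means
print("Reconstruction MSE with 1 PC:", pca_error)
print("MSE of predicting the column means:", baseline_error)
```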




### Q5: You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.

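A5. Price, rating, and delivery time are measured in different units and ranges (currency, a star rating, minutes), so without scaling, a distance- or similarity-based recommender would be dominated by whichever feature has the largest numeric range. Min-Max scaling maps each feature to [0, 1] using its own minimum and maximum, so all three contribute on a comparable scale: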


```python
#1. Import necessary libraries
from sklearn.preprocessing import MinMaxScaler
import pandas as pd


#2. Prepare the dataset

data = pd.DataFrame({
    'Price': [10, 15, 20, 25, 30],
    'Rating': [3, 4, 5, 2, 1],
    'DeliveryTime': [30, 45, 20, 50, 40]
})


#3. Apply Min-Max scaling

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(data)
scaled_df = pd.DataFrame(scaled_data, columns=data.columns)
print(scaled_df)
```

```
   Price  Rating  DeliveryTime
0   0.00    0.50      0.333333
1   0.25    0.75      0.833333
2   0.50    1.00      0.000000
3   0.75    0.25      1.000000
4   1.00    0.00      0.666667
```
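In a deployed recommender, new restaurants or menu items arrive after the scaler has been fitted. A minimal sketch, assuming the five rows above are the training data and using one hypothetical new listing: fit `MinMaxScaler` once, reuse it with `transform`, and note that values outside the training range land outside [0, 1] unless `clip=True` is set (available in scikit-learn 0.24 and later).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The five rows from the example above, treated as training data
train = pd.DataFrame({
    'Price': [10, 15, 20, 25, 30],
    'Rating': [3, 4, 5, 2, 1],
    'DeliveryTime': [30, 45, 20, 50, 40]
})

# Hypothetical new listing that arrives after the scaler was fitted
new_items = pd.DataFrame({'Price': [35], 'Rating': [4], 'DeliveryTime': [25]})

# Learn each column's min and max from the training data only
scaler = MinMaxScaler().fit(train)
print("Learned minimums:", scaler.data_min_)
print("Learned maximums:", scaler.data_max_)

# Reuse the fitted scaler; a Price above the training maximum maps above 1
scaled_new = pd.DataFrame(scaler.transform(new_items), columns=train.columns)
print(scaled_new)

# clip=True forces transformed values back into feature_range
clipped = MinMaxScaler(clip=True).fit(train).transform(new_items)
print(clipped)
```

With a Price of 35 against a training range of 10 to 30, the unclipped value is (35 - 10) / 20 = 1.25; clipping caps it at 1.0.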



### Q6: You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.

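A6. To use PCA for dimensionality reduction when predicting stock prices:

1. Standardize the features, because financial figures and market indicators are on very different scales and PCA is driven by variance.
2. Fit PCA and inspect the cumulative explained variance to decide how many components to keep.
3. Train the prediction model on the retained components instead of the original features, fitting the scaler and PCA on the training period only so that no future information leaks into the transformation.

The cell below illustrates the PCA step on randomly generated placeholder features. Because these features all share the same [0, 1) range, it skips the scaling step, and since no random seed is set, the numbers change on every run.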


```python
#1. Import necessary libraries

import numpy as np
from sklearn.decomposition import PCA
import pandas as pd


#2. Prepare the dataset

data = pd.DataFrame({
    'Feature1': np.random.rand(100),
    'Feature2': np.random.rand(100),
    'Feature3': np.random.rand(100),
    'Feature4': np.random.rand(100),
    'Feature5': np.random.rand(100)
})


#3. Apply PCA

pca = PCA(n_components=2)  # Reduce to 2 principal components
principal_components = pca.fit_transform(data)
pca_df = pd.DataFrame(principal_components, columns=['PC1', 'PC2'])
print(pca_df)
```

```
         PC1       PC2
0   0.145171  0.154351
1   0.140354  0.473175
2   0.327887  0.342398
3   0.334707  0.593629
4  -0.410961  0.265656
..       ...       ...
95 -0.326036 -0.650610
96 -0.029662  0.458845
97 -0.363441 -0.077084
98  0.202653  0.308292
99  0.282327  0.279806

[100 rows x 2 columns]
```
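A fuller sketch of the three steps, using synthetic data as a hypothetical stand-in for financial and market features (20 correlated columns driven by 3 hidden factors). `StandardScaler` and `PCA` are chained with a `LinearRegression` model via `make_pipeline`, `PCA(n_components=0.95)` keeps as many components as needed to explain 95% of the variance, and the data is split chronologically so the scaler and PCA are fitted only on the earlier period. The model and the threshold are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 trading days x 20 features driven by 3 hidden factors
n_days, n_features, n_factors = 200, 20, 3
factors = rng.normal(size=(n_days, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_days, n_features))
# Placeholder target (e.g. a next-day return) that depends on the first hidden factor
y = factors[:, 0] + 0.05 * rng.normal(size=n_days)

# Scale, keep enough components to explain 95% of the variance, then regress
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), LinearRegression())

# Chronological split: fit on the earlier days, evaluate on the most recent ones
split = 150
model.fit(X[:split], y[:split])

pca = model.named_steps["pca"]
print("Components kept:", pca.n_components_)
print("Variance explained:", pca.explained_variance_ratio_.sum())
print("Test R^2:", model.score(X[split:], y[split:]))
```

Because the pipeline is fitted only on `X[:split]`, the scaling statistics and principal axes never see the later test period.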







### Q7: For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.


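A7. Using `MinMaxScaler` with `feature_range=(-1, 1)`:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# reshape(-1, 1) turns the 1-D list into a single-feature column, as scikit-learn expects
data = np.array([1, 5, 10, 15, 20]).reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)

print(scaled_data)
```

```
[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]
```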
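The same numbers can be checked by hand with the general form of the formula for a target range $[a, b]$, here with $a = -1$, $b = 1$, $X_{\text{min}} = 1$ and $X_{\text{max}} = 20$:

$$
\begin{aligned}
X' &= a + \frac{(X - X_{\text{min}})(b - a)}{X_{\text{max}} - X_{\text{min}}} \\
X = 5:\quad X' &= -1 + \frac{(5 - 1) \cdot 2}{19} = -1 + \frac{8}{19} \approx -0.5789 \\
X = 15:\quad X' &= -1 + \frac{(15 - 1) \cdot 2}{19} = -1 + \frac{28}{19} \approx 0.4737
\end{aligned}
$$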



### Q8: For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?

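A8. Since PCA finds directions of maximum variance, the features should first be put on a common scale; otherwise a column with a larger spread dominates the first component regardless of how informative it is. To decide how many components to keep, fit PCA with all components, compute the cumulative explained variance ratio, and retain the smallest number of components whose cumulative ratio reaches a chosen threshold (commonly 90% to 95%), or stop where adding another component brings little extra variance.

In the notebook run shown below, the columns are left unscaled, so the 0/1 `Gender` column (variance about 0.25, versus about 0.083 for the uniform random columns) drives the first component. The cumulative ratios are about 0.45, 0.64, 0.77, 0.89 and 1.00, so the 2 components the code keeps retain only about 64% of the variance, and a 90% threshold would require all 5. That is expected here: the placeholder columns are independent random numbers, so there is no redundancy for PCA to remove. With real, correlated measurements (height and weight, for example), fewer components would typically be needed.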
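A minimal sketch of this selection rule on synthetic data with the same five columns. All values are made up, and `Weight` is generated from `Height` only so that the example contains some redundancy for PCA to exploit; the columns are standardized with `StandardScaler` first, and the 95% threshold is an illustrative choice.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic placeholder data with the question's columns (values are made up)
n = 100
height = rng.normal(170, 10, n)
data = pd.DataFrame({
    "Height": height,
    "Weight": 0.9 * (height - 100) + rng.normal(0, 5, n),  # assumed to correlate with height
    "Age": rng.integers(18, 80, n),
    "Gender": rng.integers(0, 2, n),
    "BloodPressure": rng.normal(120, 15, n),
})

# Standardize so that no column dominates just because of its units
X = StandardScaler().fit_transform(data)

# Fit all components, then keep the smallest k whose cumulative ratio reaches the threshold
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
threshold = 0.95
k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(cumulative))
print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components to keep:", k)

features = PCA(n_components=k).fit_transform(X)
print(features.shape)
```

scikit-learn can also apply such a threshold directly with `PCA(n_components=0.95)`, which chooses the number of components from the cumulative explained variance ratio.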

```python
#Import necessary libraries

from sklearn.decomposition import PCA
import pandas as pd
import numpy as np

# Prepare the dataset
data = pd.DataFrame({
    'Height': np.random.rand(100),
    'Weight': np.random.rand(100),
    'Age': np.random.rand(100),
    'Gender': np.random.randint(0, 2, 100),
    'BloodPressure': np.random.rand(100)
})

# Determine the number of principal components
pca = PCA().fit(data)
explained_variance = np.cumsum(pca.explained_variance_ratio_)
print(explained_variance)

# Choose the number of components

pca = PCA(n_components=2)
principal_components = pca.fit_transform(data)
print(principal_components)
```

[0.44637188 0.64108974 0.7663485  0.8877167  1.        ]
[[-0.47630511  0.05251609]
 [ 0.55898603  0.35178451]
 [ 0.5815246  -0.05431049]
 [ 0.53586332 -0.11176342]
 [-0.48307167 -0.00769832]
 [-0.40071401  0.49961394]
 [ 0.52552511  0.20260653]
 [ 0.51901375 -0.29069652]
 [-0.44914701  0.10796691]
 [-0.43513098  0.20559456]
 [ 0.49021828 -0.41375791]
 [-0.4189681   0.36698166]
 [ 0.51002035  0.1570661 ]
 [ 0.48433065 -0.28651128]
 [ 0.48056675 -0.36764779]
 [-0.50797246  0.07354827]
 [ 0.5123688   0.22143492]
 [-0.4636189  -0.06440498]
 [ 0.54600764 -0.21587076]
 [ 0.48885654 -0.31497601]
 [-0.50023497 -0.55212064]
 [-0.46928775  0.05182924]
 [-0.46425341 -0.00667797]
 [-0.50498423 -0.06111888]
 [ 0.47482471 -0.62505295]
 [ 0.52553084  0.17752567]
 [ 0.5138526  -0.18716527]
 [ 0.55927582  0.24016118]
 [-0.49641444 -0.58037852]
 [ 0.54973331  0.40883433]
 [-0.46101069  0.12182765]
 [-0.5142002  -0.4960013 ]
 [ 0.56308889  0.20530628]
 [ 0.4621888  -0.96919999]
 [ 0.55810281  0.22363077