# ANSWER 1
Min-Max scaling, also known as normalization, is a data preprocessing technique that rescales a numerical feature to a fixed range, typically [0, 1]. Each value x is mapped to (x - min) / (max - min), so the feature's minimum becomes 0 and its maximum becomes 1, while the relative spacing between data points is preserved. Min-Max scaling is useful when features are measured on different scales and need to be brought to a common one.

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

data = {
    'Area': [800, 1200, 1600, 2000, 2500]
}

df = pd.DataFrame(data)

# Apply Min-Max scaling
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)
print(scaled_data)

[[0.        ]
 [0.23529412]
 [0.47058824]
 [0.70588235]
 [1.        ]]
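
The scaled values above can be reproduced by hand from the formula (x - min) / (max - min). The following is a minimal sketch in plain Python; the helper name `min_max_scale` is chosen here for illustration and is not part of scikit-learn, and the handling of a constant feature is an assumption of this sketch.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; this sketch maps it to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

areas = [800, 1200, 1600, 2000, 2500]
scaled_areas = min_max_scale(areas)
print([round(v, 8) for v in scaled_areas])
```

For example, 1200 maps to (1200 - 800) / (2500 - 800) = 400 / 1700, which matches the second row of the scaler output.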


# ANSWER 2
The Unit Vector technique, also called normalization to unit length, is a feature scaling method that rescales each sample so that its feature vector has a magnitude (L2 norm) of 1. It divides every value in a sample by the magnitude of that sample's feature vector, so only the direction of the vector is kept. This method is commonly used with machine learning algorithms that rely on distances or angles between data points, such as clustering or nearest neighbors.
## How the Unit Vector technique differs from Min-Max scaling
The primary difference is that Min-Max scaling rescales each feature (column) to a specific range (e.g., 0 to 1), while the Unit Vector technique rescales each sample (row) so that its feature vector has a magnitude of 1.

In [2]:
import pandas as pd
from sklearn.preprocessing import Normalizer
data = {
    'Math Score': [85, 90, 78, 92, 88],
    'English Score': [80, 88, 84, 90, 86]
}

df = pd.DataFrame(data)

# Apply Unit Vector technique (each row is scaled to unit L2 norm)
normalizer = Normalizer()
normalized_data = normalizer.fit_transform(df)
print(normalized_data)

[[0.72819999 0.6853647 ]
 [0.71500667 0.69911763]
 [0.6804511  0.73279349]
 [0.71483403 0.69929416]
 [0.7151872  0.69893295]]
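
To make the row-wise behavior explicit, the same result can be computed by hand: each row is divided by its own Euclidean length. This is a minimal sketch in plain Python; `to_unit_vector` is a hypothetical helper name, and leaving an all-zero row unchanged is an assumption of the sketch.

```python
import math

def to_unit_vector(row):
    """Divide a row by its L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(v * v for v in row))
    if norm == 0:
        return list(row)  # an all-zero row is left unchanged in this sketch
    return [v / norm for v in row]

# (Math Score, English Score) pairs from the example above
scores = [(85, 80), (90, 88), (78, 84), (92, 90), (88, 86)]
unit_rows = [to_unit_vector(r) for r in scores]
for r in unit_rows:
    print([round(v, 8) for v in r])
```

Every resulting row has length 1, so two students with the same ratio of Math to English scores would end up with identical vectors regardless of their absolute scores.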


# ANSWER 3
PCA is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while retaining as much variance as possible. It achieves this by identifying the principal components, which are orthogonal directions in the original feature space, along which the data varies the most. These principal components are used to represent the data in a lower-dimensional space.

In [3]:
import pandas as pd
from sklearn.decomposition import PCA
data = {
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [5, 4, 3, 2, 1],
    'Feature3': [10, 8, 6, 4, 2]
}

df = pd.DataFrame(data)

# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(df)
print(reduced_data)

[[ 4.89897949e+00  3.84592537e-16]
 [ 2.44948974e+00 -1.28197512e-16]
 [-0.00000000e+00 -0.00000000e+00]
 [-2.44948974e+00  1.28197512e-16]
 [-4.89897949e+00  2.56395025e-16]]
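
The second column of the output is zero up to floating-point noise because the three features are perfectly linearly related (Feature2 = 6 - Feature1 and Feature3 = 2 × Feature2), so one direction carries all the variance. A minimal sketch using NumPy's SVD on the centered data, as one possible way to check this without scikit-learn:

```python
import numpy as np

# Same rows as the DataFrame above: (Feature1, Feature2, Feature3)
X = np.array([[1, 5, 10], [2, 4, 8], [3, 3, 6], [4, 2, 4], [5, 1, 2]], dtype=float)
X_centered = X - X.mean(axis=0)

# Squared singular values of the centered data are proportional to
# the variance captured by each principal component
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
print(np.round(explained_ratio, 6))
```

In a case like this, a single component would be enough to represent the data without loss.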


# ANSWER 4
PCA can be used for feature extraction by identifying the most important patterns (principal components) in the data. Instead of using the original features, we can represent the data with these principal components, each of which is a linear combination of the original features. This reduces the dimensionality of the data while preserving most of the variance.

In [4]:
data = {
    'X': [1, 2, 3, 4, 5],
    'Y': [2, 4, 6, 8, 10]
}

df = pd.DataFrame(data)

# Apply PCA for feature extraction
pca = PCA(n_components=1)
extracted_feature = pca.fit_transform(df)

print(extracted_feature)


[[ 4.47213595]
 [ 2.23606798]
 [-0.        ]
 [-2.23606798]
 [-4.47213595]]
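
The "linear combination" can be made concrete by looking at the weights of the extracted feature. The sketch below uses NumPy's SVD rather than scikit-learn, as an illustration under that assumption; note that the sign of a principal direction is arbitrary, so only absolute values are printed.

```python
import numpy as np

# Same rows as the DataFrame above: (X, Y)
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]], dtype=float)
mean = X.mean(axis=0)
X_centered = X - mean

# Rows of Vt are the principal directions (their sign is arbitrary)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
direction = Vt[0]                    # weights of X and Y in the extracted feature
scores = X_centered @ direction      # the single extracted feature
reconstructed = np.outer(scores, direction) + mean

print(np.round(np.abs(direction), 6))
print(np.allclose(reconstructed, X))
```

Because Y is exactly 2 × X here, the single extracted feature is enough to rebuild both original columns.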


# ANSWER 5
In the context of building a recommendation system for a food delivery service, Min-Max scaling can be used to preprocess features like "price," "rating," and "delivery time" to ensure that they are on a common scale for more effective recommendations.

In [5]:
data = {
    'price': [10, 20, 15, 25, 30],
    'rating': [4.2, 3.8, 4.5, 3.7, 4.0],
    'delivery_time': [20, 30, 25, 40, 35]
}

df = pd.DataFrame(data)

# Apply Min-Max scaling
scaler = MinMaxScaler()
scaled_data = scaler.fit_transform(df)

print(scaled_data)

[[0.    0.625 0.   ]
 [0.5   0.125 0.5  ]
 [0.25  1.    0.25 ]
 [0.75  0.    1.   ]
 [1.    0.375 0.75 ]]
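
In a running recommendation system, new restaurants or dishes arrive after the scaler has been fitted. A common pattern, sketched below in plain Python, is to store the minimum and maximum learned from the existing data and reuse them for new items (scikit-learn's `MinMaxScaler` follows the same pattern through `fit` and `transform`). The helper `scale_item` and the values in `new_item` are made up for illustration.

```python
train = {
    'price': [10, 20, 15, 25, 30],
    'rating': [4.2, 3.8, 4.5, 3.7, 4.0],
    'delivery_time': [20, 30, 25, 40, 35],
}

# Learn the min and max of each feature from the existing data only
bounds = {name: (min(col), max(col)) for name, col in train.items()}

def scale_item(item):
    """Scale one item with the stored bounds (hypothetical helper)."""
    return {name: (item[name] - lo) / (hi - lo) for name, (lo, hi) in bounds.items()}

new_item = {'price': 35, 'rating': 4.1, 'delivery_time': 30}  # made-up example
scaled_item = scale_item(new_item)
print(scaled_item)
```

A value outside the fitted range, such as the price of 35 here, lands outside [0, 1]; whether to clip such values is a design choice that depends on the downstream model.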


# ANSWER 6
In the context of predicting stock prices, where the dataset contains numerous features related to company financial data and market trends, PCA can be used to reduce the dimensionality of the dataset. This can simplify the model and may improve prediction performance.

In [6]:
data = {
    'financial_feature1': [1000, 2000, 1500, 2500, 3000],
    'financial_feature2': [500, 800, 700, 900, 1000],
    'market_trend1': [10, 15, 12, 20, 18],
    'market_trend2': [5, 8, 7, 9, 10]
}

df = pd.DataFrame(data)

# Apply PCA for dimensionality reduction
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(df)

print(reduced_data)


[[ 1.03776078e+03 -3.85408150e+01]
 [-4.67454928e+00  1.94468133e+01]
 [ 5.04859079e+02  3.90680170e+01]
 [-5.14217515e+02 -1.66328920e-01]
 [-1.02372780e+03 -1.98076864e+01]]
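
PCA is sensitive to the scale of the features. In the example above, `financial_feature1` has by far the largest variance, so it dominates the first component when PCA is applied to the raw values. A minimal sketch comparing raw and standardized (z-scored) inputs, using NumPy's SVD in place of scikit-learn; the helper name `explained_variance_ratio` is an assumption of this sketch:

```python
import numpy as np

# Same rows as the DataFrame above:
# (financial_feature1, financial_feature2, market_trend1, market_trend2)
X = np.array([
    [1000, 500, 10, 5],
    [2000, 800, 15, 8],
    [1500, 700, 12, 7],
    [2500, 900, 20, 9],
    [3000, 1000, 18, 10],
], dtype=float)

def explained_variance_ratio(matrix):
    """Share of total variance per principal component, via SVD of centered data."""
    centered = matrix - matrix.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s**2 / np.sum(s**2)

raw_ratio = explained_variance_ratio(X)

# Z-score each column so every feature has mean 0 and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
std_ratio = explained_variance_ratio(X_std)

print("raw:         ", np.round(raw_ratio, 4))
print("standardized:", np.round(std_ratio, 4))
```

On the raw data the first component explains more than 99% of the variance, largely because of the units of `financial_feature1`; after standardization each feature contributes on an equal footing.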


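# ANSWER 7
Min-Max scaling can target any range, not only [0, 1]. Here the values [1, 5, 10, 15, 20] are first scaled to [0, 1], then multiplied by 2 and shifted by -1 so that they land in [-1, 1].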

In [7]:
import numpy as np

data = [1, 5, 10, 15, 20]

# Calculate min and max values
min_val = np.min(data)
max_val = np.max(data)

# Apply Min-Max scaling to [0, 1], then stretch and shift to [-1, 1]
scaled_data = [(x - min_val) / (max_val - min_val) * 2 - 1 for x in data]

print(scaled_data)

[-1.0, -0.5789473684210527, -0.052631578947368474, 0.4736842105263157, 1.0]
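
The same idea generalizes to any target range [a, b] with the formula a + (x - min) / (max - min) × (b - a). A minimal sketch in plain Python; the function name `min_max_scale_to` and its parameters are chosen here for illustration:

```python
def min_max_scale_to(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant feature")
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]

scaled = min_max_scale_to([1, 5, 10, 15, 20], -1, 1)
print(scaled)
```

In scikit-learn the equivalent is to pass the target range to the scaler, as in `MinMaxScaler(feature_range=(-1, 1))`.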


# ANSWER 8
The number of principal components to retain depends on the desired level of dimensionality reduction and the percentage of variance explained by the components. In practice, we aim to retain principal components that collectively explain a significant portion of the data's variance, while still reducing the dimensionality.

A common approach is to choose the smallest number of principal components that together explain a sufficiently high percentage of the total variance, such as 95% or 99%. To find that number, we can inspect the cumulative explained variance, either as a plot or numerically.

In [8]:
data = {
    'height': [170, 165, 180, 175, 160],
    'weight': [70, 65, 80, 75, 60],
    'age': [30, 25, 35, 28, 32],
    'gender': [0, 1, 0, 1, 1],
    'blood_pressure': [120, 110, 130, 125, 115]}
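df = pd.DataFrame(data)

# Fit PCA with all components to see how much variance each one explains
pca = PCA()
pca.fit(df)

# Cumulative share of variance explained by the first k components
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# np.argmax returns the first index where the condition holds; +1 turns it into a count
num_components_95_percent = np.argmax(cumulative_variance >= 0.95) + 1
num_components_99_percent = np.argmax(cumulative_variance >= 0.99) + 1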

print("Number of components to retain 95% of variance:", num_components_95_percent)
print("Number of components to retain 99% of variance:", num_components_99_percent)

Number of components to retain 95% of variance: 2
Number of components to retain 99% of variance: 3
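
The threshold rule can be wrapped in a small reusable function. The sketch below is one possible version; `components_for_variance` is a hypothetical helper, and the `ratios` list contains illustrative values rather than results from the dataset above. It also guards against a case the one-liner does not handle: `np.argmax` returns 0 when no entry satisfies the condition, which would wrongly suggest a single component.

```python
import numpy as np

def components_for_variance(explained_variance_ratio, threshold):
    """Smallest k whose first k components reach `threshold` (hypothetical helper)."""
    cumulative = np.cumsum(explained_variance_ratio)
    reached = cumulative >= threshold
    if not reached.any():
        # Threshold never reached (e.g. due to rounding): keep every component
        return len(cumulative)
    return int(np.argmax(reached)) + 1

ratios = [0.70, 0.20, 0.07, 0.03]  # illustrative values, not from the dataset above
print(components_for_variance(ratios, 0.95))
print(components_for_variance(ratios, 0.99))
```

scikit-learn's `PCA` can also make this selection itself when `n_components` is given as a float between 0 and 1, for example `PCA(n_components=0.95, svd_solver='full')`.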
