In [1]:
# Ans 1

# Min-Max scaling is a normalization technique that rescales each feature to a specific range using that feature’s minimum and maximum value.
# In data preprocessing the target range is usually 0 to 1. Putting every feature on a common scale keeps features with large raw values from dominating the others. The transformation is given by:

# X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# X_scaled = X_std * (max - min) + min

# where min, max = feature_range. This transformation is often used as an alternative to zero mean, unit variance scaling.

# For example, suppose we have a dataset with values ranging from 0 to 100. We can use Min-Max scaling to scale the data between 0 and 1. The formula for Min-Max scaling is:

# x_scaled = (x - x_min) / (x_max - x_min)

# where x is the original value, x_min is the minimum value in the dataset, and x_max is the maximum value in the dataset.
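
# A minimal sketch of the formula above in plain Python. The helper name min_max_scale
# and the sample values are illustrative assumptions, not part of any library.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale a list of numbers into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero;
        # mapping everything to new_min is one simple convention.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


values = [0, 25, 50, 100]          # toy data spanning 0 to 100
scaled = min_max_scale(values)
print(scaled)                      # [0.0, 0.25, 0.5, 1.0]
```

# The smallest value maps to 0 and the largest to 1; everything else keeps its relative position.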

In [2]:
# Ans 2

# The Unit Vector technique in feature scaling is a normalization technique that rescales each sample's feature vector (each row of the dataset) so that it has a length of 1.
# This is done by dividing each feature vector by its magnitude. The technique is often suggested for features with hard boundaries.
# For example, when dealing with image data, the color values can range only from 0 to 255.

# The difference between Min-Max scaling and Unit Vector scaling is that Min-Max scaling scales the data between 0 and 1 using each feature’s minimum and maximum value. 
# On the other hand, Unit Vector scaling scales the data so that each feature vector has a length of 1.

# For example, suppose we have a dataset with values ranging from 0 to 100. We can use Min-Max scaling to scale the data between 0 and 1. 
# The formula for Min-Max scaling is:

# x_scaled = (x - x_min) / (x_max - x_min)

# where x is the original value, x_min is the minimum value in the dataset, and x_max is the maximum value in the dataset.

# On the other hand, suppose we have a dataset with two features: age and income. We can use Unit Vector scaling to scale the data so that each feature vector has a length of 1. 
# The formula for Unit Vector scaling is:

# x_scaled = x / ||x||

# where x is the original feature vector and ||x|| is its magnitude (Euclidean norm).
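
# A small sketch of unit vector scaling with NumPy. The (age, income) rows are made-up toy numbers,
# with income assumed to be in thousands, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical samples: each row is (age, income in thousands).
X = np.array([
    [30.0, 40.0],
    [60.0, 80.0],
    [20.0, 21.0],
])

# Euclidean length of each row, kept as a column so it broadcasts across features.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

print(X_unit)
print(np.linalg.norm(X_unit, axis=1))  # every row now has length 1
```

# The first two rows point in the same direction, so they become identical after scaling:
# unit vector scaling keeps direction and discards magnitude, unlike Min-Max scaling.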

In [3]:
# Ans 3

# Principal Component Analysis (PCA) is a statistical technique that is used for dimensionality reduction. 
# It is an unsupervised learning algorithm that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. 
# The principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance.

# PCA is used to reduce the dimensionality of a dataset while retaining as much of the original information as possible. 
# It is useful when dealing with high-dimensional data, where the amount of data required to obtain a statistically reliable result can grow exponentially with the number of features (the curse of dimensionality).
# High dimensionality can lead to issues such as overfitting, increased computation time, and reduced accuracy of machine learning models.

# For example, suppose we have a dataset with 10 features. We can use PCA to reduce the number of features while retaining as much of the original information as possible. 
# The result might be a new dataset with only 3 features that capture most of the variance in the original dataset.
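
# A hedged sketch of the 10-to-3 reduction described above, using scikit-learn's PCA.
# The data is synthetic (an assumption for illustration): 10 features built as noisy mixtures of 3 hidden signals.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 10 observed features driven by 3 latent signals.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Standardize first so no feature dominates because of its scale.
X_std = StandardScaler().fit_transform(X)

# Keep only the top 3 principal components.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                      # (200, 3)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by 3 components
```

# Because the 10 features were generated from only 3 signals, 3 components keep nearly all of the variance here.
# Real datasets rarely compress this cleanly.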

In [4]:
# Ans 4

# PCA can be used for feature extraction. Feature extraction is a technique that involves transforming the original features of a dataset into a new set of features that are more informative and easier to work with.
# PCA is one of the most popular feature extraction techniques that is used to reduce the dimensionality of a dataset while retaining as much of the original information as possible.

# PCA works by identifying a set of orthogonal axes, called principal components, that capture the maximum variance in the data. 
# These principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance. 
# By selecting only the top k principal components, we can reduce the dimensionality of the dataset while retaining most of the original information.

# For example, suppose we have a dataset with 10 features. We can use PCA to extract the top 3 principal components from the dataset.
# These principal components will be linear combinations of the original features and will capture most of the variance in the dataset. 
# We can then use these principal components as new features in our machine learning model.

In [5]:
# Ans 5

# To preprocess the data for a recommendation system for a food delivery service, we can use Min-Max scaling to scale the data between 0 and 1. 
# This will ensure that all the features are on the same scale and will prevent features with larger values from dominating the model.

# For example, suppose we have a dataset with features such as price, rating, and delivery time. 
# We can use Min-Max scaling to scale each feature between 0 and 1. The formula for Min-Max scaling is:

# x_scaled = (x - x_min) / (x_max - x_min)

# where x is the original value, x_min is the minimum value of that feature, and x_max is the maximum value of that feature.

# After scaling each feature between 0 and 1, we can then use these features to build our recommendation system.
# We can use techniques such as collaborative filtering or content-based filtering to recommend food items to users based on their preferences.
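
# A minimal sketch of scaling price, rating, and delivery time column by column with NumPy.
# The item values below are hypothetical; the point is that each feature uses its own minimum and maximum.

```python
import numpy as np

# Hypothetical items: columns are price, rating (1 to 5), delivery time in minutes.
items = np.array([
    [120.0, 4.5, 30.0],
    [300.0, 3.8, 45.0],
    [ 80.0, 4.9, 20.0],
    [210.0, 4.1, 60.0],
])

# Per-feature (per-column) minimum and maximum.
col_min = items.min(axis=0)
col_max = items.max(axis=0)

# Apply x_scaled = (x - x_min) / (x_max - x_min) to every column at once.
items_scaled = (items - col_min) / (col_max - col_min)

print(items_scaled.min(axis=0))  # [0. 0. 0.]
print(items_scaled.max(axis=0))  # [1. 1. 1.]
```

# After scaling, a price difference of 100 no longer outweighs a rating difference of 0.5 simply because of units.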

In [6]:
# Ans 6

# To reduce the dimensionality of the dataset for a model to predict stock prices, we can use PCA. 
# PCA is a technique that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. 
# These principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance.

# To use PCA to reduce the dimensionality of the dataset, we would first standardize the data by subtracting the mean and dividing by the standard deviation. 
# We would then compute the covariance matrix of the standardized data. We would then compute the eigenvectors and eigenvalues of the covariance matrix. 
# The eigenvectors represent the principal components, and the eigenvalues represent the amount of variance explained by each principal component.

# We would then select only the top k principal components that capture most of the variance in the data. 
# By selecting only these top k principal components, we can reduce the dimensionality of the dataset while retaining most of the original information.

# For example, suppose we have a dataset with many features such as company financial data and market trends. 
# We can use PCA to extract only the top k principal components from the dataset. 
# These principal components will be linear combinations of the original features and will capture most of the variance in the dataset. 
# We can then use these principal components as new features in our machine learning model to predict stock prices.
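
# The steps above (standardize, covariance matrix, eigen decomposition, keep the top k) can be sketched with NumPy.
# The data is a synthetic stand-in for a table of financial and market features, and the 90% threshold is an assumed choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (assumption): 100 rows, 6 features, two of them
# deliberately built from the others so the features are correlated.
X = rng.normal(size=(100, 6))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)
X[:, 4] = X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=100)

# 1. Standardize: subtract the mean and divide by the standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features (columns are variables).
cov = np.cov(X_std, rowvar=False)

# 3. Eigen decomposition. eigh is meant for symmetric matrices and returns
#    eigenvalues in ascending order, so reorder them to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Keep the smallest k whose cumulative explained variance reaches 90%.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
k = int(np.argmax(cumulative >= 0.90)) + 1

# 5. Project the standardized data onto the top k eigenvectors.
X_proj = X_std @ eigvecs[:, :k]

print("Components kept:", k)
print("Projected shape:", X_proj.shape)
```

# Each column of X_proj is a linear combination of the original features, and its variance equals the matching eigenvalue.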

In [7]:
# Ans 7

from sklearn.preprocessing import MinMaxScaler
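data = [1, 5, 10, 15, 20]

# MinMaxScaler expects a 2D array of shape (n_samples, n_features), so each value goes in its own row.
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform([[x] for x in data])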

print(scaled_data)

[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]


In [8]:
# Ans 8

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Sample data 
data = np.array([
    [170, 65, 30, 1, 120],
    [160, 55, 25, 0, 130],
    [175, 70, 35, 1, 140],
    # ... more data points ...
])

# Separate features (X) and target (y) if applicable
X = data[:, :-1]  # Exclude the last column (blood pressure)
y = data[:, -1]   # Target variable (blood pressure)

# Standardize the features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Perform PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)


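# Fraction of the total variance captured by each principal component
explained_variance_ratio = pca.explained_variance_ratio_

# Running total of explained variance across the components
cumulative_explained_variance = np.cumsum(explained_variance_ratio)

# Smallest number of components whose cumulative explained variance reaches 95%
num_components_to_retain = np.argmax(cumulative_explained_variance >= 0.95) + 1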

print("Explained Variance Ratio:", explained_variance_ratio)
print("Cumulative Explained Variance:", cumulative_explained_variance)
print("Number of Principal Components to Retain:", num_components_to_retain)


Explained Variance Ratio: [9.65314794e-01 3.46852064e-02 6.50151632e-34]
Cumulative Explained Variance: [0.96531479 1.         1.        ]
Number of Principal Components to Retain: 1


In [None]:
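# A common rule of thumb is to retain enough principal components to explain a chosen share of the variance in the data, commonly between 80% and 95%.
# That threshold applies to the cumulative explained variance, not to the number of features, so it does not mean keeping 80% of the 5 features.
# In the output above, the first principal component alone explains about 96.5% of the variance, so retaining 1 component satisfies both the 80% and the 95% threshold.
# This count comes from a 3-row sample, so it should be recomputed on the full dataset before fixing the number of components.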