Q1

Min-Max scaling is a data preprocessing technique that linearly rescales the values of a feature to a fixed range, usually [0, 1]. It is useful when features have very different ranges, because it stops features with large numeric values from dominating distance-based or gradient-based algorithms. The formula for Min-Max scaling is:
$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature, respectively.
For example, suppose we have a dataset with a feature that ranges from 0 to 100. We can apply Min-Max scaling to this feature to scale its values to the range [0, 1]. If we want to scale the feature values to a different range, say [a, b], we can use the following formula instead:
$$x_{\text{scaled}} = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}$$
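The general [a, b] formula can be checked with a short NumPy sketch. The helper `min_max_scale` below is hypothetical (it is not part of scikit-learn); it simply applies the formula above to a 1-D array, and for the default range [0, 1] it should agree with `MinMaxScaler`.

```python
import numpy as np


def min_max_scale(x, a=0.0, b=1.0):
    """Hypothetical helper: linearly map the values of x to the range [a, b]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A constant feature has no range; map every value to the lower bound a.
        return np.full_like(x, a)
    # a + (x - x_min) * (b - a) / (x_max - x_min), applied element-wise
    return a + (x - x_min) * (b - a) / (x_max - x_min)


print(min_max_scale([10, 20, 30, 40, 50]))           # target range [0, 1]
print(min_max_scale([1, 5, 10, 15, 20], a=-1, b=1))  # target range [-1, 1]
```

Setting a = 0 and b = 1 reduces the general formula to the first one, which is why a single helper covers both cases.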
Here’s an example to illustrate how Min-Max scaling works. Suppose we have a feature whose values range from 10 to 50, and we want to scale it to the range [0, 1]. We can use the MinMaxScaler class from the sklearn.preprocessing module to do this. Here’s how we can use it:

In [1]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Create a dataset
data = np.array([[10], [20], [30], [40], [50]])

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Scale the data
scaled_data = scaler.fit_transform(data)

print(scaled_data)

[[0.  ]
 [0.25]
 [0.5 ]
 [0.75]
 [1.  ]]


Q2

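Unit Vector scaling, also known as normalization, is a data preprocessing technique that rescales each sample (row) so that its feature vector has a length (Euclidean, or L2, norm) of 1. It is useful when the magnitude of a sample's feature vector is not important but its direction is, for example when samples are compared with cosine similarity. Unlike Min-Max scaling, which works on each feature (column) independently and maps it to a chosen range, unit vector scaling works on each sample, and the individual scaled values lie in [-1, 1] (or [0, 1] for non-negative data) rather than in a range you choose. The formula for Unit Vector scaling is:
$$x_{\text{scaled}} = \frac{x}{\sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}} = \frac{x}{\lVert \mathbf{x} \rVert_2}$$
where $x$ is one of the sample's original values, and $x_1, x_2, \ldots, x_n$ are the values of all $n$ features of that same sample (including $x$ itself).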

In [2]:
from sklearn.preprocessing import normalize
import numpy as np

# Create a dataset
data = np.array([[1, 2], [3, 4], [5, 6]])

# Normalize the data
normalized_data = normalize(data)

print(normalized_data)

[[0.4472136  0.89442719]
 [0.6        0.8       ]
 [0.6401844  0.76822128]]
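To see that `normalize` applies the formula above row by row, here is a minimal NumPy sketch that computes the same result by hand. It assumes the default L2 norm (`normalize` also accepts `norm="l1"` and `norm="max"`) and that no row is all zeros.

```python
import numpy as np

# Same toy dataset as above: each row is one sample, each column one feature.
data = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)

# L2 norm of each row, sqrt(x1^2 + ... + xn^2), kept as a column so it broadcasts.
row_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Divide every value by the norm of its own row (assumes no all-zero rows).
unit_vectors = data / row_norms

print(unit_vectors)
print(np.linalg.norm(unit_vectors, axis=1))  # every row now has length 1
```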


Q3

Principal Component Analysis (PCA) is a popular technique for dimensionality reduction, which is the process of reducing the number of variables in a dataset. By reducing the number of variables, PCA simplifies data analysis, can make downstream models faster to train, and makes it easier to visualize data (for example, by plotting the first two components).

PCA works by finding the principal components of the data, which are the directions in which the data varies the most. These principal components are orthogonal to each other and are ranked by the amount of variance they explain in the data. The first principal component explains the most variance, the second principal component explains the second most variance, and so on.

PCA can be used for a variety of tasks, such as data compression, feature extraction, and visualization. One of the most common applications of PCA is dimensionality reduction. In this context, PCA is used to reduce the number of variables in a dataset while retaining as much of the original information as possible.

In [4]:
from sklearn.decomposition import PCA
import numpy as np

# Create a dataset
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create a PCA object
pca = PCA(n_components=2)

# Fit the data
pca.fit(data)

# Transform the data
transformed_data = pca.transform(data)

print(transformed_data)

[[-5.19615242e+00 -1.33226763e-15]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  1.33226763e-15]]
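The second column of the output above is essentially zero (about 1e-15, which is floating-point noise). The reason is that the three sample rows lie on a single straight line, so all of the variance falls along one direction. A short NumPy sketch of the covariance eigenvalues, assuming the same toy data, makes this visible:

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Covariance matrix of the features (columns), with n - 1 in the denominator.
cov = np.cov(data, rowvar=False)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order;
# reverse them so the largest (the first principal component) comes first.
eigenvalues = np.linalg.eigvalsh(cov)[::-1]
explained_ratio = eigenvalues / eigenvalues.sum()

print(eigenvalues)
print(explained_ratio)
```

The first eigenvalue is 27 and the other two are numerically zero, so the first component explains essentially all of the variance. The scores ±5.196 in the first output column are ±√27, the distance of the outer points from the mean along that direction.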


Q4

Principal Component Analysis (PCA) is a popular technique used for dimensionality reduction, which is the process of reducing the number of variables in a dataset. By reducing the number of variables, PCA simplifies data analysis, improves performance, and makes it easier to visualize data.
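PCA can also be used for feature extraction, which is the process of deriving a smaller set of informative features from a dataset. In this context, PCA does not select a subset of the original features; instead, it creates new features (the principal components), each a linear combination of the original variables, that capture the strongest patterns of variation and the correlations between the variables. Keeping only the first few components gives a compact set of extracted features.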

In [5]:
from sklearn.decomposition import PCA
import numpy as np

# Create a dataset
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Create a PCA object
pca = PCA(n_components=2)

# Fit the data
pca.fit(data)

# Transform the data
transformed_data = pca.transform(data)

print(transformed_data)

[[-5.19615242e+00 -1.33226763e-15]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  1.33226763e-15]]
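To make the feature-extraction view concrete, the sketch below shows that each extracted feature is a fixed linear combination of the original columns, with the weights stored in `components_`. It uses hypothetical random data (two underlying signals plus two noisy copies), since the 3×3 example above is degenerate.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples with 4 features, where features 2 and 3 are
# noisy copies of features 0 and 1, so there are about 2 "real" directions.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base + 0.1 * rng.normal(size=(100, 2))])

pca = PCA(n_components=2)
X_new = pca.fit_transform(X)

# Each row of components_ holds the weights that define one extracted feature
# as a linear combination of the 4 original features.
print(pca.components_)
print(pca.explained_variance_ratio_)

# The extracted features are the centered data projected onto those weights.
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(manual, X_new))
```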


Q5

In the context of a food delivery service recommendation system, Min-Max scaling can be used to preprocess the data by scaling the values of the features to a fixed range, usually between 0 and 1. This is useful when the range of the feature values varies widely, such as in the case of price, rating, and delivery time. By scaling the values of these features to a fixed range, we can ensure that they are all on the same scale and can be compared with each other.

Here’s an example of how Min-Max scaling can be applied to the features of a food delivery service dataset. Suppose we have a dataset with three features: price, rating, and delivery time. We want to scale the values of these features to the range [0, 1]. We can use the MinMaxScaler class from the sklearn.preprocessing module to do this. Here’s how we can use it:

In [6]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# Create a dataset
data = np.array([[10, 4.5, 30], [20, 3.5, 45], [30, 4.0, 60]])

# Create a MinMaxScaler object
scaler = MinMaxScaler()

# Scale the data
scaled_data = scaler.fit_transform(data)

print(scaled_data)

[[0.  1.  0. ]
 [0.5 0.  0.5]
 [1.  0.5 1. ]]
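In a live recommendation system the scaler is usually fitted once on existing data and then reused for new restaurants or orders. A minimal sketch, using a hypothetical new record, shows that `transform` applies the stored per-column minimum and maximum, so values outside the original range map outside [0, 1]. The `clip` option shown at the end exists in scikit-learn 0.24 and later.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Same columns as above: price, rating, delivery time.
train = np.array([[10, 4.5, 30], [20, 3.5, 45], [30, 4.0, 60]])

scaler = MinMaxScaler()
scaler.fit(train)  # learns the per-column min and max from the existing data only

# Hypothetical new restaurant: pricier, better rated and faster than any seen so far.
new_item = np.array([[40, 5.0, 15]])
scaled_new = scaler.transform(new_item)
print(scaled_new)  # values can fall outside [0, 1]

# clip=True clamps the transformed values to feature_range.
clipped_scaler = MinMaxScaler(clip=True).fit(train)
clipped_new = clipped_scaler.transform(new_item)
print(clipped_new)
```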


Q6

In the context of a stock price prediction model, Principal Component Analysis (PCA) can be used to reduce the dimensionality of the dataset by replacing the original, often highly correlated features (such as company financial data and market trends) with a smaller number of new features, the principal components, that capture most of the variation in the data.

PCA works by finding the principal components of the data, which are the directions in which the data varies the most. These principal components are orthogonal to each other and are ranked by the amount of variance they explain in the data. The first principal component explains the most variance, the second principal component explains the second most variance, and so on.

To use PCA for dimensionality reduction in a stock price prediction model, we can follow these steps:

Standardize the data: PCA is sensitive to the scale of the features, so before applying it we should put all the features on the same scale, typically by standardizing each one to zero mean and unit variance.

Compute the covariance matrix: We can compute the covariance matrix of the standardized data to identify the relationships between the features.

Compute the eigenvectors and eigenvalues: We can compute the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors are the principal components, and each eigenvalue is the variance of the data along its eigenvector.

Select the principal components: We keep the components with the largest eigenvalues, since they explain the most variance. To decide how many, we can look for an "elbow" in a plot of explained variance per component (a scree plot), or keep enough components to reach a cumulative explained-variance threshold such as 95%.

Transform the data: We can transform the data using the selected principal components to create a new dataset with reduced dimensionality.
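The five steps above can be sketched directly in NumPy. This is a minimal illustration on hypothetical random data, not a production pipeline; the 95% threshold in step 4 is an assumed choice, used here in place of reading an elbow off a plot.

```python
import numpy as np

# Hypothetical feature matrix: 200 samples (e.g. trading days) x 6 correlated features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

# 1. Standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features (6 x 6).
cov = np.cov(Z, rowvar=False)

# 3. Eigen-decomposition; eigh is for symmetric matrices and returns
#    eigenvalues in ascending order, so re-sort them in descending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Keep the smallest number of components reaching 95% cumulative explained variance.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.argmax(cumulative >= 0.95) + 1)

# 5. Project the standardized data onto the selected eigenvectors.
X_reduced = Z @ eigenvectors[:, :k]

print(k, X_reduced.shape)
```

Because the columns of `Z` have zero mean, the sample variance of each projected column equals the corresponding eigenvalue, which is a quick sanity check on the result.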

Here’s an example of how PCA can be used to reduce the dimensionality of a stock price prediction dataset. Suppose we have a dataset with many features, such as company financial data and market trends. We want to reduce the number of features to a smaller set of principal components. We can use the PCA class from the sklearn.decomposition module to do this. Here’s how we can use it:

In [9]:
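from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
import numpy as np

# Create a dataset
data = np.array([[1,2,3], [4,5,6], [7,8,9]])

# Normalize the data: normalize() rescales each row (sample) to unit L2 norm.
# Note that this does not put the features (columns) on a common scale;
# standardizing each feature (e.g. with StandardScaler) is the usual choice for that.
normalized_data = normalize(data)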

# Create a PCA object
pca = PCA(n_components=2)

# Fit the data
pca.fit(normalized_data)

# Transform the data
transformed_data = pca.transform(normalized_data)

print(transformed_data)


[[ 0.17003271 -0.00085088]
 [-0.05516097  0.00405992]
 [-0.11487174 -0.00320903]]
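Because `normalize` works per sample, a common alternative for the first step is to standardize each feature and let PCA choose the number of components from a variance threshold. The sketch below assumes hypothetical random data in place of real stock features and an assumed 95% threshold; a float `n_components` between 0 and 1 tells scikit-learn's `PCA` to keep enough components to explain that fraction of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 250 samples x 8 features driven by 3 latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(250, 3)) @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(250, 8))

# Standardize each feature, then keep as many components as needed
# to explain at least 95% of the variance.
pipe = make_pipeline(StandardScaler(), PCA(n_components=0.95, svd_solver="full"))
X_reduced = pipe.fit_transform(X)

pca = pipe.named_steps["pca"]
print(X_reduced.shape, pca.n_components_, pca.explained_variance_ratio_.sum())
```

Wrapping both steps in a pipeline means the same scaling and projection learned on training data are reapplied unchanged to new data.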


Q7

In [12]:
from sklearn.preprocessing import MinMaxScaler

data = [[1], [5], [10], [15], [20]]

scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_data = scaler.fit_transform(data)

print(scaled_data)

[[-1.        ]
 [-0.57894737]
 [-0.05263158]
 [ 0.47368421]
 [ 1.        ]]
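As a check on the output above, the general [a, b] Min-Max formula from Q1 with $a = -1$, $b = 1$, $x_{\min} = 1$ and $x_{\max} = 20$ gives, for $x = 5$:

```latex
x_{\text{scaled}} = -1 + \frac{(5 - 1)\,\bigl(1 - (-1)\bigr)}{20 - 1} = -1 + \frac{8}{19} \approx -0.5789
```

which matches the second row of the printed result.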


Q8

To perform feature extraction using PCA on the given dataset, you can use the PCA class from the sklearn.decomposition module in Python. Here’s how you can do it:

In [13]:
from sklearn.decomposition import PCA

data = [[170, 70, 25, 0, 120], [165, 65, 30, 1, 130], [180, 80, 35, 0, 110], [175, 75, 40, 1, 140], [190, 90, 45, 0, 100]]

pca = PCA(n_components=5)
pca.fit(data)

print(pca.explained_variance_ratio_)

[8.06958015e-01 1.84764006e-01 8.27797890e-03 2.38121583e-33
 2.00161360e-36]


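As you can see, the first principal component explains about 80.70% of the variance in the data, the second about 18.48%, and the third about 0.83%. The remaining two components explain essentially none: in this dataset weight is exactly height minus 100, and with only five samples the centered data can have at most four non-zero components.

Therefore, it would be reasonable to retain the first two principal components, which together capture about 99.17% of the variance; keeping only the first would discard almost a fifth of it. Retaining two components still simplifies the dataset and reduces the computational cost of any subsequent machine learning models that use it. Note that this PCA was run on unscaled features, so the features with the largest variances (blood pressure, height, weight and age) dominate the components while the binary gender feature contributes almost nothing; standardizing the features first could lead to a different choice.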
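A cumulative-variance rule of thumb can be applied to this dataset in a few lines. In the sketch below, the 95% threshold and the helper name `n_components_for` are assumptions for illustration. It computes the choice on the raw features, as above, and again after standardizing them, since the standardized data may call for a different number of components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same dataset as above: height, weight, age, gender, blood pressure.
data = np.array([[170, 70, 25, 0, 120], [165, 65, 30, 1, 130], [180, 80, 35, 0, 110],
                 [175, 75, 40, 1, 140], [190, 90, 45, 0, 100]], dtype=float)


def n_components_for(ratios, threshold=0.95):
    """Hypothetical helper: smallest number of components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    return int(np.argmax(cumulative >= threshold) + 1), cumulative


# PCA on the raw (unscaled) features, as in the cell above.
raw_ratios = PCA().fit(data).explained_variance_ratio_
k_raw, cum_raw = n_components_for(raw_ratios)

# PCA after standardizing each feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(data)
std_ratios = PCA().fit(scaled).explained_variance_ratio_
k_std, cum_std = n_components_for(std_ratios)

print("raw:", np.round(cum_raw, 4), "->", k_raw, "components")
print("standardized:", np.round(cum_std, 4), "->", k_std, "components")
```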