### Question 1

In [None]:
'''
Min-Max scaling, also known as normalization, is a data preprocessing technique used to rescale the features of a dataset to a specific range, typically between 0 and 1. 
It ensures that all features have the same scale and prevents any particular feature from dominating the learning process due to its larger magnitude.

The formula to perform Min-Max scaling is as follows:
scaled_value = (value - min_value) / (max_value - min_value)

data = [[10], [5], [3], [2], [8]]
scaled_data = [[1.0], [0.375], [0.125], [0.0], [0.75]]   (min_value = 2, max_value = 10)

'''
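
The formula above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name `min_max_scale` is made up for illustration, and the sample values are the ones from the answer, flattened to a simple list.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero,
        # so map everything to 0.0 (an arbitrary choice for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


data = [10, 5, 3, 2, 8]
scaled = min_max_scale(data)
print(scaled)  # [1.0, 0.375, 0.125, 0.0, 0.75]
```

The smallest value (2) maps to 0 and the largest (10) maps to 1; everything else falls proportionally in between.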

### Question 2

In [None]:
'''
The Unit Vector technique, also known as L2 normalization, is a feature scaling method that rescales each feature vector (each sample) to have a Euclidean norm of 1. 
It preserves the direction of every vector and discards its magnitude, so samples are compared by their orientation rather than by their length.

The formula to perform Unit Vector scaling is as follows:
scaled_vector = vector / ||vector||

The main difference between Unit Vector scaling and Min-Max scaling is what gets rescaled. Min-Max scaling works per feature (column) and maps its values to a specific range (e.g., between 0 and 1),
while Unit Vector scaling works per sample (row) and divides each vector by its length, keeping its direction but discarding its magnitude. Unit Vector scaling is commonly used when the magnitude of the feature values
is not as important as their relative orientations or angles.

Unit Vector scaling is often applied in text classification, document clustering, and recommendation systems, where the direction of feature vectors is significant in determining similarity or relevance.

'''
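
As a rough illustration of the formula above, the following sketch normalizes each row of a small array with NumPy. The two vectors are hypothetical and were picked so that they point in the same direction but have different lengths.

```python
import numpy as np

# Two hypothetical feature vectors that point the same way but differ in length.
X = np.array([[3.0, 4.0],
              [6.0, 8.0]])

# Euclidean (L2) norm of each row, kept as a column so it broadcasts.
norms = np.linalg.norm(X, axis=1, keepdims=True)

# Divide each row by its own norm; this assumes no row is all zeros.
X_unit = X / norms

print(X_unit)                          # both rows become [0.6, 0.8]
print(np.linalg.norm(X_unit, axis=1))  # each row now has length 1
```

Both rows end up identical after scaling, which shows that only the direction survives, not the original magnitude.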

### Question 3

In [None]:
'''
Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional 
representation while preserving the most important information. It achieves this by identifying the principal components, which are linear 
combinations of the original features that capture the maximum variance in the data.
'''

In [2]:
from sklearn.decomposition import PCA
import numpy as np

# Create a sample dataset
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

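# Keep the first two principal components
pca = PCA(n_components=2)

# Center the data, fit the components, and project the rows onto them
reduced_data = pca.fit_transform(data)

# The three rows lie on one straight line, so the second column is only floating-point noise
print(reduced_data)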


[[-5.19615242e+00  2.56395025e-16]
 [ 0.00000000e+00  0.00000000e+00]
 [ 5.19615242e+00  2.56395025e-16]]


### Question 4

In [None]:
'''
PCA (Principal Component Analysis) can be used for feature extraction, which involves transforming the original features into a new set of derived features 
(principal components) that capture the most important information in the data.

The relationship between PCA and feature extraction lies in the fact that PCA identifies the directions (principal components) along which the data exhibits 
the maximum variance. These principal components can be seen as new features that are linear combinations of the original features. By selecting a subset of 
the principal components, we can effectively extract the most informative features from the dataset.

'''

In [3]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Create an instance of PCA
pca = PCA(n_components=2)

# Fit and transform the data using PCA
X_transformed = pca.fit_transform(X)

print(X_transformed.shape)
print(X_transformed[:5])


(150, 2)
[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]]


### Question 5

In [None]:
'''
To preprocess the data for a recommendation system using Min-Max scaling:

1. Identify the range of each feature (price, rating, delivery time).
2. Apply Min-Max scaling to rescale the values of each feature to a range between 0 and 1.
3. Use the formula (X - X_min) / (X_max - X_min) to perform the scaling operation.
4. Implement Min-Max scaling using a library like scikit-learn.

The scaled data ensures that all features are on a similar scale, preventing any one feature from dominating the others and enabling fair comparison in the recommendation system.

'''
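
The steps above could be sketched with scikit-learn's `MinMaxScaler`. The numbers below are invented purely for illustration, and the column order (price, rating, delivery time) is an assumption, not something taken from a real dataset.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [price, rating (1-5), delivery time in minutes]
X = np.array([[250.0, 4.5, 30.0],
              [100.0, 3.0, 45.0],
              [400.0, 5.0, 60.0],
              [175.0, 4.0, 20.0]])

# Scale every column independently to the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

print(scaler.data_min_)  # per-feature minimum found during fit
print(scaler.data_max_)  # per-feature maximum found during fit
print(X_scaled)
```

In practice the scaler would normally be fitted on the training data only, and the same fitted `scaler` reused with `scaler.transform(...)` on new data so that both share one scale.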

### Question 6

In [None]:
'''
To reduce the dimensionality of the stock price dataset using PCA:

1. Standardize the dataset by scaling each feature to have zero mean and unit variance.
2. Compute the covariance matrix of the standardized dataset.
3. Perform PCA by eigendecomposition of the covariance matrix to obtain the principal components (the eigenvectors).
4. Select a subset of the principal components based on their corresponding eigenvalues, which represent the variance explained by each component.
5. Project the standardized dataset onto the selected principal components, resulting in a lower-dimensional representation that captures the most significant information while reducing noise and redundancy.

'''
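
The steps listed above can be walked through by hand with NumPy. This is a sketch under stated assumptions: the data is synthetic random noise standing in for a real table of stock features, and keeping `k = 2` components is an arbitrary choice for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a (days x features) table; not real stock data.
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # make two features correlated

# 1. Standardize: zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# 3. Eigendecomposition (eigh is intended for symmetric matrices).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort components by descending eigenvalue and keep the top k.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2  # arbitrary for this sketch
components = eigvecs[:, :k]

# 5. Project the standardized data onto the selected components.
X_reduced = X_std @ components

print(X_reduced.shape)          # (200, 2)
print(eigvals / eigvals.sum())  # share of variance per component
```

Library implementations such as scikit-learn's `PCA` typically use a singular value decomposition instead of an explicit covariance matrix, but the resulting projection is the same up to the sign of each component.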

### Question 7

In [None]:
'''
values = [1, 5, 10, 15, 20]
scaled_value = (value - min_value) / (max_value - min_value) * 2 - 1
scaled_values = [-1.0, -0.5789, -0.0526, 0.4737, 1.0]   (min_value = 1, max_value = 20, rounded to 4 decimal places)

'''
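
The calculation above can be reproduced with a short helper. This is a minimal sketch; the function name `scale_to_range` and its `new_min`/`new_max` parameters are made up for illustration.

```python
def scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Min-Max scale a list of numbers to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    # Assumes hi > lo; identical values would cause a division by zero.
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


values = [1, 5, 10, 15, 20]
scaled_values = scale_to_range(values)
print([round(v, 4) for v in scaled_values])
# [-1.0, -0.5789, -0.0526, 0.4737, 1.0]
```

The results are not evenly spaced at -0.5, 0, and 0.5 because the input values themselves are not evenly spaced: the gap between 1 and 5 is smaller than the other gaps.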

### Question 8

In [None]:
'''
To determine the number of principal components to retain in PCA (Principal Component Analysis), you typically consider the explained variance ratio. 
The explained variance ratio tells us the proportion of the dataset's variance that is explained by each principal component. We aim to retain enough
principal components that capture a significant amount of the variance while discarding components that contribute very little.

One approach to deciding the number of principal components to retain:

1. Compute the covariance matrix of the dataset, or the correlation matrix if the features are on different scales.
2. Perform PCA on the dataset and obtain the eigenvalues and eigenvectors.
3. Calculate the explained variance ratio for each principal component by dividing its eigenvalue by the sum of all eigenvalues.
4. Plot the cumulative sum of the explained variance ratios.
5. Choose the smallest number of principal components whose cumulative explained variance reaches the desired level. Commonly, a threshold of around 80% to 95% explained variance is chosen.

'''
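
The selection rule described above might look like this with scikit-learn. The Iris dataset is used here only as a convenient example, and the 95% threshold is an assumed target rather than a fixed rule.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Fit PCA with all components so every explained-variance ratio is available.
pca = PCA().fit(X)

ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)

threshold = 0.95  # assumed target; the 80%-95% band is a judgment call
# Smallest number of components whose cumulative ratio reaches the threshold.
n_keep = int(np.argmax(cumulative >= threshold) + 1)

print(ratios)
print(cumulative)
print(n_keep)
```

scikit-learn's `PCA` also accepts a fraction between 0 and 1 for `n_components` (for example `PCA(n_components=0.95)`), in which case it picks the number of components needed to explain that share of the variance.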