In [1]:
#19 March Assignment Solution

In [2]:
#ANS 1:
'''
Min-Max scaling, also known as normalization, is a data preprocessing technique that rescales a feature to a fixed range,
usually 0 to 1.
Each value has the feature's minimum subtracted and is then divided by the range (maximum minus minimum):
    x_scaled = (x - min) / (max - min)
'''

import numpy as np

# Sample data
data = np.array([1, 5, 10, 15, 20])

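# Min-Max scaling
def min_max_scaling(data):
    """Scale a 1-D array to the range [0, 1]."""
    min_val = np.min(data)
    max_val = np.max(data)
    # Guard against division by zero when all values are identical
    if max_val == min_val:
        return np.zeros_like(data, dtype=float)
    scaled_data = (data - min_val) / (max_val - min_val)
    return scaled_data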

scaled_data = min_max_scaling(data)
print("Original data:", data)
print("Scaled data:", scaled_data)

Original data: [ 1  5 10 15 20]
Scaled data: [0.         0.21052632 0.47368421 0.73684211 1.        ]
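
The manual formula can be cross-checked against scikit-learn's `MinMaxScaler`, which works column-wise and expects a 2-D array of shape (n_samples, n_features). This is a minimal sketch, assuming scikit-learn is installed; the variable names `scaled_sklearn` and `scaled_manual` are illustrative and not part of the original solution.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20], dtype=float)

# MinMaxScaler scales each column, so reshape the 1-D array into a single column
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_sklearn = scaler.fit_transform(data.reshape(-1, 1)).ravel()

# Manual formula for comparison: (x - min) / (max - min)
scaled_manual = (data - data.min()) / (data.max() - data.min())

print("sklearn:", scaled_sklearn)
print("manual: ", scaled_manual)
print("Match:", np.allclose(scaled_sklearn, scaled_manual))
```

Fitting the scaler once and reusing it with `scaler.transform` on new data keeps the training minimum and maximum fixed, which is usually what is wanted when scaling a test set.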




In [1]:
#ANS 2:
'''
The Unit Vector technique, also known as unit normalization or unit scaling, is another feature-scaling method.
Unlike Min-Max scaling, which maps each feature to a fixed range (usually 0 to 1), unit vector scaling rescales
each sample (each row of the data) so that its magnitude becomes 1.
This is achieved by dividing each row vector by its Euclidean (L2) norm.

Unit vector scaling is useful when the direction of a vector matters but its magnitude does not, for example with
cosine similarity, or when working with algorithms that are sensitive to the magnitude of the input vectors.
'''

import numpy as np

# Sample data
data = np.array([[1, 2],
                 [3, 4],
                 [5, 6]])

# Unit vector scaling
def unit_vector_scaling(data):
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    scaled_data = data / norms
    return scaled_data

scaled_data = unit_vector_scaling(data)
print("Original data:")
print(data)
print("\nScaled data:")
print(scaled_data)
print("\nNorms of scaled data:")
print(np.linalg.norm(scaled_data, axis=1))


Original data:
[[1 2]
 [3 4]
 [5 6]]

Scaled data:
[[0.4472136  0.89442719]
 [0.6        0.8       ]
 [0.6401844  0.76822128]]

Norms of scaled data:
[1. 1. 1.]
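
To illustrate that unit vector scaling keeps direction and discards magnitude, the sketch below uses scikit-learn's `normalize` function and checks that multiplying the input by a constant leaves the result unchanged. It assumes scikit-learn is installed; the variable names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize

data = np.array([[1, 2],
                 [3, 4],
                 [5, 6]], dtype=float)

# L2-normalize each row (sample) so that its Euclidean norm is 1
unit_rows = normalize(data, norm="l2", axis=1)

# Rescaling the rows changes their magnitude but not their direction,
# so the normalized result should be identical
unit_rows_scaled_input = normalize(10 * data, norm="l2", axis=1)

print(unit_rows)
print("Row norms:", np.linalg.norm(unit_rows, axis=1))
print("Invariant to magnitude:", np.allclose(unit_rows, unit_rows_scaled_input))
```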



In [2]:
#ANS 3:
'''
Principal Component Analysis (PCA) is a dimensionality reduction technique used to transform high-dimensional data into
a lower-dimensional space while preserving most of the variability in the original data. 
It does this by identifying the directions (principal components) in which the data varies 
the most and projecting the data onto these directions.

PCA works by computing the eigenvectors (principal components) of the covariance matrix of the mean-centered data
and then projecting the data onto these eigenvectors. The principal components are sorted in descending order
of their corresponding eigenvalues, which measure the variance explained by each component.
By retaining only the top principal components, which capture the most variance,
PCA reduces the dimensionality of the data.
'''
import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])

# Perform PCA with 2 components
pca = PCA(n_components=2)
pca.fit(data)

# Transform the data to the lower-dimensional space
transformed_data = pca.transform(data)

print("Original data shape:", data.shape)
print("Transformed data shape:", transformed_data.shape)
print("\nTransformed data:")
print(transformed_data)


Original data shape: (4, 3)
Transformed data shape: (4, 2)

Transformed data:
[[-7.79422863e+00 -1.66533454e-15]
 [-2.59807621e+00 -5.55111512e-16]
 [ 2.59807621e+00  5.55111512e-16]
 [ 7.79422863e+00  1.66533454e-15]]
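
The eigenvector description of PCA can be reproduced with NumPy alone. The sketch below is an illustration of the steps, not scikit-learn's internal implementation (which relies on SVD); the signs of the resulting components may differ from those returned by `sklearn.decomposition.PCA`, since an eigenvector is only defined up to sign.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]], dtype=float)

# 1. Center each feature (PCA operates on mean-centered data)
centered = data - data.mean(axis=0)

# 2. Covariance matrix of the features (rowvar=False: columns are features)
cov = np.cov(centered, rowvar=False)

# 3. Eigendecomposition; eigh is intended for symmetric matrices
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort components by descending eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# 5. Project the centered data onto the top k components
k = 2
projected = centered @ eigenvectors[:, :k]

print("Eigenvalues:", eigenvalues)
print("Projected shape:", projected.shape)
print("First component scores:", projected[:, 0])
```

Up to sign, the first column of `projected` matches the first column of the scikit-learn output for the same data.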


In [3]:
#ANS 4:
'''

PCA is closely related to feature extraction because it can be used to extract a smaller set of features (principal components) 
from a larger set of features in the original dataset. Feature extraction aims to reduce the dimensionality of the data by transforming 
it into a lower-dimensional space while preserving important information.

PCA achieves feature extraction by identifying the directions (principal components) in which the data varies the most and
projecting the data onto these directions. These principal components capture the maximum variance in the data, effectively 
summarizing the information contained in the original features. By retaining only a subset of the top principal components,
which explain the most variance, PCA reduces the dimensionality of the data while retaining most of its important information.

'''
import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]])

# Perform PCA for feature extraction
pca = PCA(n_components=2)
pca.fit(data)

# Rows of components_ are the top two principal axes (directions) in the original feature space
principal_components = pca.components_

print("Original data shape:", data.shape)
print("Principal components shape:", principal_components.shape)
print("\nPrincipal components:")
print(principal_components)


Original data shape: (4, 3)
Principal components shape: (2, 3)

Principal components:
[[ 0.57735027  0.57735027  0.57735027]
 [ 0.         -0.70710678  0.70710678]]
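
One way to see how much information the extracted features keep is to map them back to the original space with `inverse_transform` and compare. In this sample the three columns are perfectly correlated (each column is the previous one plus 1), so a single component should carry all of the variance. This is a minimal sketch under that observation; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9],
                 [10, 11, 12]], dtype=float)

# Keep a single component: three original features become one extracted feature
pca = PCA(n_components=1)
extracted = pca.fit_transform(data)

# Map the extracted feature back to the original 3-D space
reconstructed = pca.inverse_transform(extracted)

print("Extracted feature shape:", extracted.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Reconstruction matches original:", np.allclose(reconstructed, data))
```

On real datasets the reconstruction is only approximate, and the reconstruction error indicates how much information the dropped components carried.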


In [None]:
#ANS 5:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the dataset (assumes a CSV file with this name exists in the working directory)
data = pd.read_csv('food_delivery_dataset.csv')

# Select relevant features for scaling
features_to_scale = ['price', 'rating', 'delivery_time']

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Fit scaler on the data and transform the selected features
data[features_to_scale] = scaler.fit_transform(data[features_to_scale])

# Display the scaled data
print(data.head())
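
Because the CSV file is not included, the same steps can be tried on a small in-memory DataFrame. The rows below are hypothetical values invented purely for illustration; only the column names come from the cell above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows standing in for food_delivery_dataset.csv
sample = pd.DataFrame({
    "price": [120.0, 250.0, 90.0, 400.0],
    "rating": [4.2, 3.8, 4.9, 4.0],
    "delivery_time": [30.0, 45.0, 20.0, 60.0],
})

features_to_scale = ["price", "rating", "delivery_time"]

# Scale a copy so the original values remain available for comparison
scaler = MinMaxScaler()
scaled = sample.copy()
scaled[features_to_scale] = scaler.fit_transform(sample[features_to_scale])

print(scaled)
print(scaled[features_to_scale].agg(["min", "max"]))
```

After scaling, every selected column spans 0 to 1, so price (in the hundreds) no longer dominates rating (between 1 and 5) in distance-based computations.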


In [None]:
#ANS 6:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the dataset (assumes a CSV file with this name exists in the working directory)
data = pd.read_csv('stock_price_dataset.csv')

# Select features for PCA: drop the date and the target column (column names are assumed)
features_for_pca = data.drop(columns=['Date', 'Stock_Price'])

# Standardize the features
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features_for_pca)

# Apply PCA
pca = PCA(n_components=0.8)  # Retain 80% of the variance
pca.fit(scaled_features)

# Transform the data
transformed_data = pca.transform(scaled_features)

# Display the transformed data
print("Original data shape:", scaled_features.shape)
print("Transformed data shape:", transformed_data.shape)
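
Since the stock dataset is not available here, the behavior of `PCA(n_components=0.8)` can be checked on synthetic data. The sketch below generates six correlated columns from two hidden factors; the data, shapes, and noise level are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the stock features: 200 rows, 6 columns driven by 2 hidden factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
features = factors @ mixing + 0.1 * rng.normal(size=(200, 6))

# Standardize so that no column dominates because of its scale
scaled_features = StandardScaler().fit_transform(features)

# A float between 0 and 1 asks PCA to keep the fewest components
# whose cumulative explained variance exceeds that fraction
pca = PCA(n_components=0.8)
transformed = pca.fit_transform(scaled_features)

print("Components kept:", pca.n_components_)
print("Variance retained:", pca.explained_variance_ratio_.sum())
print("Transformed shape:", transformed.shape)
```

The fitted attribute `n_components_` reports how many components were actually kept, which is worth printing because it is not known until after fitting.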


In [6]:
#ANS 7:
import numpy as np

# Sample data
data = np.array([1, 5, 10, 15, 20])

# Calculate min and max
min_val = np.min(data)
max_val = np.max(data)

# Perform Min-Max scaling
scaled_data = 2 * ((data - min_val) / (max_val - min_val)) - 1

print("Original data:", data)
print("Scaled data (-1 to 1 range):", scaled_data)


Original data: [ 1  5 10 15 20]
Scaled data (-1 to 1 range): [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
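
The factor `2 * ... - 1` is a special case of rescaling to an arbitrary range [a, b] via `x_scaled = a + (x - min) * (b - a) / (max - min)`. The sketch below wraps this in a hypothetical helper, `min_max_scale`, and compares it with scikit-learn's `MinMaxScaler(feature_range=(-1, 1))`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly to [new_min, new_max] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    unit = (values - values.min()) / (values.max() - values.min())
    return unit * (new_max - new_min) + new_min

data = np.array([1, 5, 10, 15, 20])

manual = min_max_scale(data, -1.0, 1.0)

# MinMaxScaler expects a 2-D float array, hence the reshape into one column
with_sklearn = MinMaxScaler(feature_range=(-1, 1)).fit_transform(
    data.reshape(-1, 1).astype(float)
).ravel()

print("Manual: ", manual)
print("sklearn:", with_sklearn)
print("Match:", np.allclose(manual, with_sklearn))
```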


In [9]:
#ANS 8:
import numpy as np
from sklearn.decomposition import PCA

# Sample data
data = np.array([
    [170, 70, 30, 1, 120],
    [165, 65, 35, 0, 130],
    [180, 80, 40, 1, 140],
    [155, 55, 25, 0, 110],
    [175, 75, 45, 1, 125]
])

# Perform PCA
pca = PCA()
pca.fit(data)

# Calculate explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Calculate cumulative explained variance ratio
cumulative_explained_variance_ratio = np.cumsum(explained_variance_ratio)

# Determine the number of principal components to retain
num_components_to_retain = np.argmax(cumulative_explained_variance_ratio >= 0.8) + 1

print("Explained Variance Ratio:", explained_variance_ratio)
print("Cumulative Explained Variance Ratio:", cumulative_explained_variance_ratio)
print("Number of Principal Components to Retain:", num_components_to_retain)

'''
The explained variance ratio shows that the first principal component explains approximately 87.6% of the variance in the data,
the second about 8.0%, and the third about 4.3%. The last two components contribute essentially nothing
(their ratios are on the order of 1e-32 or smaller).
The cumulative explained variance ratio is therefore about 87.6% after one component, 95.7% after two, and 100% after three.

With the 80% threshold used in the code, a single principal component is enough, which is why the code reports 1.
If a stricter threshold such as 95% were required, two components would suffice, and three components
account for all of the variance in this sample.

Therefore, I would retain one principal component at the 80% threshold, or two if about 95% of the variance should be preserved.
'''

Explained Variance Ratio: [8.76483680e-01 8.04469829e-02 4.30693374e-02 1.80719804e-32
 1.37998414e-35]
Cumulative Explained Variance Ratio: [0.87648368 0.95693066 1.         1.         1.        ]
Number of Principal Components to Retain: 1
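
The columns of this sample have very different numeric ranges (the fourth column is 0 or 1, while the first is between 155 and 180), and PCA on unscaled data tends to be dominated by the columns with the largest variance. A common alternative is to standardize first. The sketch below repeats the analysis on standardized data and reports the component count for several thresholds; the helper `components_for` and the chosen thresholds are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = np.array([
    [170, 70, 30, 1, 120],
    [165, 65, 35, 0, 130],
    [180, 80, 40, 1, 140],
    [155, 55, 25, 0, 110],
    [175, 75, 45, 1, 125],
], dtype=float)

# Standardize so every feature has zero mean and unit variance before PCA
standardized = StandardScaler().fit_transform(data)

pca_std = PCA().fit(standardized)
cumulative = np.cumsum(pca_std.explained_variance_ratio_)

def components_for(threshold):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    return int(np.argmax(cumulative >= threshold) + 1)

print("Explained variance ratio (standardized):", pca_std.explained_variance_ratio_)
for threshold in (0.80, 0.90, 0.95):
    print(f"{threshold:.0%} variance -> {components_for(threshold)} component(s)")
```

Comparing these counts with the unscaled result shows how sensitive the choice of components is to feature scale.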

