# Q1.

Min-Max scaling is a data preprocessing technique used to scale the numerical features of a dataset to a specific range, typically between 0 and 1. It works by transforming each feature's values proportionally so that the minimum value becomes 0, and the maximum value becomes 1. The formula for Min-Max scaling is:

Scaled_value = (X - X_min) / (X_max - X_min)

where X is the original value, X_min is the minimum value of the feature, and X_max is the maximum value of the feature.

Example: Let's say we have a dataset of house prices, and the minimum and maximum prices in the dataset are Rs 10,00,000 and Rs 50,00,000, respectively. We want to scale the prices using Min-Max scaling. If a particular house price is Rs 20,00,000, the scaled value would be:

Scaled_value = (2000000 - 1000000 ) / (5000000 - 1000000) = 0.25

So, the scaled value for 2000000 would be 0.25.
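The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function name `min_max_scale` is hypothetical and chosen only for this example.

```python
def min_max_scale(x, x_min, x_max):
    """Map x from the range [x_min, x_max] onto [0, 1]."""
    if x_max == x_min:
        # A constant feature has no spread, so the formula would divide by zero.
        raise ValueError("x_max and x_min must differ")
    return (x - x_min) / (x_max - x_min)


# House price example from the text (prices in rupees).
scaled_price = min_max_scale(2_000_000, 1_000_000, 5_000_000)
print(scaled_price)  # 0.25
```

The minimum price maps to 0 and the maximum price maps to 1, so every other price lands proportionally in between.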

# Q2.

The Unit Vector technique, also known as L2 normalization, is a feature scaling method that rescales each data point (feature vector) so that it has a magnitude of 1, i.e., it lies on the surface of a unit circle in two dimensions or a unit hypersphere in general. Unlike Min-Max scaling, which rescales each feature (column) independently using that feature's minimum and maximum, Unit Vector scaling rescales each data point (row) by its own magnitude.

The formula for Unit Vector scaling is:

Scaled_value = X / ||X||

where X is the original feature vector, and ||X|| is the Euclidean norm (magnitude) of the feature vector.

Example: Consider a dataset with two numerical features: age and income. Let's say a particular data point has age = 40 and income = 60,000 Rs. To perform Unit Vector scaling, we first calculate the magnitude of the feature vector:

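||X|| = √(40^2 + 60000^2) = √3600001600 ≈ 60000.01

Then, we calculate the scaled values for age and income:

Scaled_age = 40 / 60000.01 ≈ 0.00066667
Scaled_income = 60000 / 60000.01 ≈ 0.99999978

So, the scaled values for age and income would be approximately 0.00067 and 0.9999998, respectively. Because income is so much larger than age, it dominates the norm, and the scaled vector points almost entirely along the income axis.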
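The same steps can be checked with a short Python sketch. The helper name `unit_vector` is an assumption made for this example; it simply divides each component by the Euclidean norm of the vector.

```python
import math


def unit_vector(values):
    """Scale a feature vector so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0:
        # The zero vector has no direction and cannot be normalized.
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in values]


# Data point from the text: age = 40, income = 60,000.
scaled_age, scaled_income = unit_vector([40, 60_000])
print(scaled_age, scaled_income)

# The scaled vector should have magnitude 1 (up to floating-point error).
print(math.sqrt(scaled_age**2 + scaled_income**2))
```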

# Q3. 

PCA (Principal Component Analysis) is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while preserving the most important information in the data. It does this by identifying the principal components, which are orthogonal vectors that represent the directions of maximum variance in the data.

In PCA, the first principal component captures the most variance, followed by the second principal component, and so on. By projecting the data onto a subset of these principal components, we can reduce the dimensionality of the data while retaining most of its variance.

Example: Let's consider a dataset with three features: height, weight, and age. PCA would analyze the data to find the first principal component, which would be a linear combination of the original features. The first principal component might represent overall body size. The second principal component could represent age-related variations, and the third principal component could capture any remaining variance.
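As a rough illustration of this example, the sketch below runs PCA with NumPy on synthetic data. The height, weight, and age values are randomly generated for demonstration only (weight is deliberately made to depend on height), so the resulting numbers say nothing about real populations.

```python
import numpy as np

# Synthetic data (an assumption for illustration): weight correlates with height.
rng = np.random.default_rng(0)
n = 200
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 5, n)
age = rng.normal(40, 12, n)
X = np.column_stack([height, weight, age])

# Standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Principal components are the eigenvectors of the covariance matrix.
cov = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; sort them in descending order.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance captured by each component.
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)

# Each column of eigvecs gives one component as weights on (height, weight, age).
print(eigvecs[:, 0])
```

In this synthetic setup, the first component is expected to load mainly on height and weight, which matches the "overall body size" interpretation in the example.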

# Q4. 

PCA can be used for Feature Extraction, which involves transforming the original features into a new set of features (principal components) that represent the data's variability in a more efficient way. This process reduces the number of features while retaining most of the important information.

Example: Suppose we have a dataset of images represented by pixel values. Each image is 100x100 pixels, giving us 10,000 features per image. Applying PCA to this dataset would identify the principal components that represent the most significant patterns and variations in the images. We could then choose to retain only the top few principal components (e.g., 100 components) instead of the original 10,000 features. These selected principal components can then be used as the new feature set for further analysis or machine learning tasks.
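A minimal NumPy sketch of this reduction is shown below. Random pixel values stand in for real images (an assumption made so the example is self-contained), and the sample count of 120 is arbitrary.

```python
import numpy as np

# Stand-in data: 120 "images" of 100x100 pixels, flattened to 10,000 features each.
rng = np.random.default_rng(0)
images = rng.random((120, 100 * 100))

# Center each pixel (feature) on its mean across images.
mean_image = images.mean(axis=0)
centered = images - mean_image

# The rows of Vt are the principal components, ordered by decreasing variance.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Keep the top 100 components and project the images onto them.
n_components = 100
components = Vt[:n_components]
reduced = centered @ components.T
print(reduced.shape)  # (120, 100)

# Approximate reconstruction from the reduced representation.
reconstructed = reduced @ components + mean_image
print(reconstructed.shape)  # (120, 10000)
```

Each image is now described by 100 numbers instead of 10,000, and the reconstruction step shows how much of the original image those 100 numbers can recover.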


# Q5. 

For the food delivery service recommendation system project, Min-Max scaling can be used to preprocess the numerical features such as price, rating, and delivery time.

1. Identify the minimum and maximum values for each numerical feature (e.g., price, rating, delivery time) in the dataset.
2. For each feature value X, apply the Min-Max scaling formula:

Scaled_value = (X - X_min) / (X_max - X_min)

3. Replace the original feature values with the scaled values in the dataset.
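The three steps above can be sketched as follows. The sample records and the helper name `min_max_scale_columns` are hypothetical and exist only to illustrate the procedure on a small table.

```python
def min_max_scale_columns(rows, columns):
    """Return a copy of rows with each listed column scaled to [0, 1]."""
    scaled = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)  # step 1: find min and max
        span = hi - lo
        for scaled_row, value in zip(scaled, values):
            # Steps 2 and 3: apply the formula and store the scaled value.
            # A constant column is mapped to 0.0 to avoid division by zero.
            scaled_row[col] = 0.0 if span == 0 else (value - lo) / span
    return scaled


# Hypothetical food delivery records.
orders = [
    {"price": 250.0, "rating": 4.2, "delivery_time": 30.0},
    {"price": 400.0, "rating": 3.8, "delivery_time": 45.0},
    {"price": 150.0, "rating": 4.9, "delivery_time": 20.0},
]

scaled_orders = min_max_scale_columns(orders, ["price", "rating", "delivery_time"])
for row in scaled_orders:
    print(row)
```

After scaling, price, rating, and delivery time all lie in the same range, so no single feature dominates a distance or similarity calculation purely because of its units.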


# Q6. 

For the stock price prediction project, PCA can be used to reduce the dimensionality of the dataset containing multiple features related to company financial data and market trends.

1. Standardize the numerical features in the dataset (e.g., mean centering and scaling to unit variance) to ensure all features have comparable scales.
2. Apply PCA to the standardized dataset to calculate the principal components and their corresponding variances.
3. Sort the principal components in descending order of variance to identify the components that explain the most variance in the data.
4. Choose the desired number of principal components (retained features) based on the explained variance ratio or a predefined threshold.

For example, if the dataset has 20 features, you might decide to retain the top 10 principal components, which represent the most significant patterns and trends in the stock market data. These 10 principal components will serve as the new, lower-dimensional feature set for building the prediction model.
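The steps above can be sketched with NumPy as shown below. The function name `pca_reduce` is hypothetical, random numbers stand in for the 20 financial and market features, and the sketch assumes no feature is constant (otherwise standardization would divide by zero).

```python
import numpy as np


def pca_reduce(X, n_components):
    """Standardize X, then project it onto its top principal components."""
    # Step 1: standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Steps 2 and 3: SVD yields the components already sorted by variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)
    # Step 4: keep the first n_components and project the data onto them.
    projected = Z @ Vt[:n_components].T
    return projected, explained_ratio[:n_components]


# Stand-in data: 500 observations of 20 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))

X_reduced, ratio = pca_reduce(X, n_components=10)
print(X_reduced.shape)  # (500, 10)
print(ratio.sum())      # fraction of total variance kept by the 10 components
```

With independent random features, the variance is spread almost evenly across components; real financial features are usually correlated, in which case the first few components capture a much larger share.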


# Q7. 

To perform Min-Max scaling on the dataset [1, 5, 10, 15, 20] and transform the values to a range of -1 to 1, follow these steps:

1. Identify the minimum and maximum values in the dataset: Min = 1, Max = 20.
2. Apply the generalized Min-Max scaling formula for a target range [a, b], here with a = -1 and b = 1:

Scaled_value = a + (X - X_min) * (b - a) / (X_max - X_min) = -1 + 2 * (X - 1) / (20 - 1)

3. Replace the original values with the scaled values.

Scaled_values = [-1 + 2 * (1 - 1) / 19, -1 + 2 * (5 - 1) / 19, -1 + 2 * (10 - 1) / 19, -1 + 2 * (15 - 1) / 19, -1 + 2 * (20 - 1) / 19]
Scaled_values ≈ [-1.000, -0.579, -0.053, 0.474, 1.000]

The Min-Max scaled values in the range of -1 to 1 would be approximately [-1.000, -0.579, -0.053, 0.474, 1.000]. The minimum (1) maps to -1 and the maximum (20) maps to 1, as required.
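Scaling to a range [a, b] other than [0, 1] uses the generalized formula a + (X - X_min) * (b - a) / (X_max - X_min). The sketch below applies it to the dataset with a = -1 and b = 1; the function name `min_max_scale_to_range` is an assumption made for this example.

```python
def min_max_scale_to_range(values, new_min=-1.0, new_max=1.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is undefined")
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


scaled = min_max_scale_to_range([1, 5, 10, 15, 20])
print([round(s, 3) for s in scaled])  # [-1.0, -0.579, -0.053, 0.474, 1.0]
```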


# Q8. 

To perform Feature Extraction using PCA on the dataset [height, weight, age, gender, blood pressure], follow these steps:

1. Encode gender as a number (e.g., 0/1), since PCA requires numerical input, then standardize the features (e.g., mean centering and scaling to unit variance) to ensure all features have comparable scales.
2. Apply PCA to the standardized dataset to calculate the principal components and their corresponding variances.
3. Sort the principal components in descending order of variance to identify the components that explain the most variance in the data.
4. Choose the desired number of principal components to retain based on the explained variance ratio or a predefined threshold.

The number of principal components to retain depends on the desired level of dimensionality reduction. A common approach is to retain enough principal components to explain a significant portion of the total variance in the data. For example, you may choose to retain principal components that explain at least 95% of the total variance.

After selecting the number of principal components, you would project the data onto these components to obtain the reduced feature set, which would be used for further analysis or modeling.
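One way to apply the 95% rule is to compute the cumulative explained variance and pick the smallest number of components that reaches the threshold. The sketch below does this with NumPy on synthetic data; the generated values and the function name `components_for_variance` are assumptions for illustration, and gender is encoded as 0/1.

```python
import numpy as np


def components_for_variance(X, threshold=0.95):
    """Return the smallest k whose top-k components explain >= threshold of variance."""
    # Standardize (assumes no feature is constant).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular values only; squared values are proportional to component variances.
    S = np.linalg.svd(Z, compute_uv=False)
    cumulative = np.cumsum(S**2 / np.sum(S**2))
    # First position where the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative


# Synthetic data for [height, weight, age, gender, blood pressure].
rng = np.random.default_rng(1)
n = 300
height = rng.normal(170, 10, n)
weight = 0.9 * (height - 170) + rng.normal(70, 5, n)
age = rng.normal(45, 12, n)
gender = rng.integers(0, 2, n)  # encoded as 0/1
blood_pressure = 100 + 0.5 * age + rng.normal(0, 8, n)
X = np.column_stack([height, weight, age, gender, blood_pressure])

k, cumulative = components_for_variance(X, threshold=0.95)
print(cumulative)
print(k)
```

The value of `k` depends entirely on how correlated the features are, so the number of components to retain should be read from the data rather than fixed in advance.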