### Q1]

Min-Max scaling is a data preprocessing technique used to rescale features to a fixed range, typically between 0 and 1. This normalization is achieved by applying the following formula to each feature X:

\[ X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} \]

where:
- \( X \) is an individual data point in the feature.
- \( X_{\text{min}} \) is the minimum value of the feature.
- \( X_{\text{max}} \) is the maximum value of the feature.

**Example:**

Consider a dataset with feature values ranging from 50 to 200. Min-Max scaling maps 50 to 0 and 200 to 1, and places every other value proportionally in between (for example, 125 maps to 0.5), so the relative spacing between the original data points is preserved.
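
A minimal NumPy sketch of the formula, assuming a handful of made-up feature values between 50 and 200 (the specific numbers are illustrative, not from a real dataset):

```python
import numpy as np

# Hypothetical feature values spanning 50 to 200 (illustrative only)
x = np.array([50.0, 80.0, 125.0, 200.0])

x_min, x_max = x.min(), x.max()

# Min-Max scaling; assumes x_max > x_min, otherwise the denominator is zero
x_scaled = (x - x_min) / (x_max - x_min)

print(x_scaled)
```

The smallest value lands on 0, the largest on 1, and the midpoint 125 on 0.5.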

### Q2]

The Unit Vector technique, also known as normalization, is a feature scaling method that rescales a feature vector so that its length is 1 while keeping its direction unchanged. This is achieved by dividing each component of the vector by the vector's magnitude (or length).

**Mathematical Representation:**

\[ X_{\text{normalized}} = \frac{X}{||X||} \]

Where:
- \( X \) is the original feature vector.
- \( ||X|| \) is the Euclidean norm (magnitude) of \( X \).

**Example:**

Consider a feature vector \( X = [3, 4] \). The Euclidean norm of \( X \) is \( ||X|| = \sqrt{3^2 + 4^2} = 5 \). Therefore, the normalized vector would be \( X_{\text{normalized}} = [\frac{3}{5}, \frac{4}{5}] = [0.6, 0.8] \).
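
The same calculation can be checked with a short NumPy sketch. It assumes the vector is non-zero, since a zero vector has no direction to preserve:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Euclidean (L2) norm: sqrt(3^2 + 4^2)
norm = np.linalg.norm(x)

# Divide every component by the norm; assumes norm != 0
x_normalized = x / norm

print(norm)
print(x_normalized)
print(np.linalg.norm(x_normalized))  # length of the result
```

The last line confirms that the normalized vector has unit length.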


### Q3]

PCA (Principal Component Analysis) is a statistical technique used for dimensionality reduction in machine learning and data analysis. It transforms a large set of potentially correlated variables into a smaller set of uncorrelated variables called principal components. These principal components capture the maximum variance in the original data while being orthogonal to each other.

**How PCA works:**

1. **Standardization:** The original data is standardized to have zero mean and unit variance. This ensures that all features contribute equally to the analysis, regardless of their original scales.

2. **Covariance Matrix Calculation:** The covariance matrix of the standardized data is computed. This matrix captures the relationships between all pairs of features.

3. **Eigen Decomposition:** The eigenvectors and eigenvalues of the covariance matrix are calculated. Eigenvectors represent the directions of the principal components, and eigenvalues represent the amount of variance explained by each component.

4. **Component Selection:** The eigenvectors are sorted based on their corresponding eigenvalues in descending order. The top \( k \) eigenvectors, corresponding to the \( k \) largest eigenvalues, are selected as the principal components.

5. **Projection:** The standardized data is projected onto the selected principal components, resulting in a lower-dimensional representation of the data.

**Example:**

Consider a dataset with two features, height and weight. These features might be correlated, meaning that taller people tend to weigh more. PCA can be used to reduce these two features into a single principal component that captures most of the variance in the data. This principal component would represent a combination of height and weight that best explains the overall variability in the dataset.
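
The five steps can be sketched in NumPy for the height and weight example. The data here is synthetic and invented purely for illustration (the means, spreads, and the 0.9 slope are assumptions), and the variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, correlated height (cm) and weight (kg) values; not real measurements
height = rng.normal(170.0, 10.0, size=200)
weight = 0.9 * (height - 170.0) + 70.0 + rng.normal(0.0, 5.0, size=200)
X = np.column_stack([height, weight])

# 1. Standardization: zero mean and unit (sample) variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix (features are in columns)
cov = np.cov(X_std, rowvar=False)

# 3. Eigen decomposition; eigh suits symmetric matrices and returns
#    eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Component selection: sort descending and keep the top k
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
k = 1
components = eigenvectors[:, :k]

# 5. Projection onto the selected component(s)
X_reduced = X_std @ components

explained_ratio = eigenvalues / eigenvalues.sum()
print(X_reduced.shape)
print(explained_ratio)
```

Because the two synthetic features are strongly correlated, the first entry of `explained_ratio` should be much larger than the second, which is what justifies keeping a single component.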


### Q4]

Feature extraction derives a smaller set of new features from the original variables while retaining the most important information. PCA (Principal Component Analysis) is a popular technique for feature extraction because it summarizes many correlated variables with a few components that capture most of the variance.

**How PCA is used for Feature Extraction:**

1. **Dimensionality Reduction:** PCA creates new features (principal components) that are linear combinations of the original features. These principal components are ordered by the amount of variance they explain in the data. By selecting the top few principal components that capture most of the variance, PCA effectively reduces the dimensionality of the data.

2. **Decorrelation:** The principal components generated by PCA are uncorrelated with one another, so each one carries information that is not linearly duplicated by the others. (Uncorrelated is a weaker property than statistically independent.) This property is useful for removing redundancy and multicollinearity in the data.

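3. **Feature Selection:** Strictly speaking, PCA performs feature extraction rather than feature selection: each principal component mixes all of the original features instead of picking a subset of them. What is selected is the set of components, by keeping those with the highest eigenvalues (variance explained). These components summarize the directions along which the data varies most.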

**Example:**

Consider a dataset of customer purchase history with hundreds of features, such as items purchased, purchase amount, time of purchase, and customer demographics. PCA can be applied to extract the most relevant features that capture key patterns in customer buying behavior. The top few principal components might represent underlying factors like customer preferences for certain product categories, price sensitivity, or purchase frequency. These extracted features can then be used to build more efficient and interpretable predictive models.
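
To make the "linear combination" and "decorrelation" points concrete, here is a rough sketch on synthetic data. The three features are invented (two of them are nearly duplicates of each other) and merely stand in for real purchase-history columns:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: 3 features, the first two strongly correlated (invented values)
n = 500
f1 = rng.normal(size=n)
f2 = f1 + 0.1 * rng.normal(size=n)
f3 = rng.normal(size=n)
X = np.column_stack([f1, f2, f3])

# Standardize, then eigen-decompose the covariance matrix
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each column of `eigenvectors` holds the weights (loadings) that combine
# the original features into one principal component
scores = X_std @ eigenvectors

# Decorrelation check: the covariance of the component scores should be
# diagonal, with the eigenvalues on the diagonal
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print(np.round(eigenvectors[:, 0], 3))  # loadings of the first component
print(np.abs(off_diagonal).max())       # close to zero up to rounding error
```

Inspecting the loadings is also how the extracted components are usually interpreted: a component that loads heavily on a group of related features can be read as a summary of that group.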


### Q5]

In a food delivery recommendation system, features like price, rating, and delivery time often have different ranges and units. To ensure these features contribute equally to the recommendation algorithm, Min-Max scaling can be applied to normalize them to a common scale, typically between 0 and 1.

**Steps to Apply Min-Max Scaling:**

1. **Identify the features to be scaled:** In this case, price, rating, and delivery time are the features that need scaling.

2. **Calculate the minimum and maximum values:** For each feature, determine the minimum and maximum values present in the dataset.

3. **Apply the Min-Max scaling formula:** Transform each feature value using the following formula:
   \[ X_{\text{scaled}} = \frac{X - X_{\text{min}}}{X_{\text{max}} - X_{\text{min}}} \]
   Where:
   - \( X_{\text{scaled}} \) is the scaled value.
   - \( X \) is the original value.
   - \( X_{\text{min}} \) is the minimum value of the feature.
   - \( X_{\text{max}} \) is the maximum value of the feature.

4. **Replace the original features with scaled features:** Use the scaled features in the recommendation algorithm instead of the original features.

**Example:**

Suppose the price of a dish is $25, the minimum price in the dataset is $10, and the maximum price is $50. The scaled price would be calculated as:
\[ X_{\text{scaled}} = \frac{25 - 10}{50 - 10} = \frac{15}{40} = 0.375 \]
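
The same steps can be applied to all three features at once. The sketch below assumes a tiny, made-up table of dishes; only the price range of $10 to $50 and the $25 dish come from the example, and the ratings and delivery times are hypothetical:

```python
import numpy as np

# Hypothetical dishes; columns: price ($), rating (1-5), delivery time (minutes)
features = np.array([
    [10.0, 3.5, 45.0],
    [25.0, 4.2, 30.0],
    [50.0, 4.8, 20.0],
])

# Per-feature (column-wise) minimum and maximum
col_min = features.min(axis=0)
col_max = features.max(axis=0)

# Min-Max scaling per column; assumes no column is constant
scaled = (features - col_min) / (col_max - col_min)

print(scaled)
```

The middle row of the price column reproduces the 0.375 computed by hand. Note that a short delivery time scales toward 0, so an algorithm that treats larger values as better would need that column inverted (for instance `1 - scaled[:, 2]`).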

**Benefits of using Min-Max scaling in this scenario:**

- **Prevents feature dominance:** Ensures that no single feature dominates the recommendation algorithm due to its scale.
- **Improves algorithm convergence:** Can help gradient-based optimization algorithms converge faster.
- **Enhances model performance:** May lead to better recommendation accuracy by providing normalized input to the algorithm.


```python
import numpy as np

# Original dataset
data = np.array([1, 5, 10, 15, 20])

# Define the desired range
min_range = -1
max_range = 1

# Calculate Min-Max scaling
X_min = np.min(data)
X_max = np.max(data)
scaled_data = (data - X_min) / (X_max - X_min) * (max_range - min_range) + min_range

print("Original data:", data)
print("Scaled data (-1 to 1):", scaled_data)
```

Output:

```
Original data: [ 1  5 10 15 20]
Scaled data (-1 to 1): [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
```


### Q8]

In a stock price prediction project, dealing with a high-dimensional dataset containing various financial indicators and market trends can pose challenges like overfitting, increased computational complexity, and difficulty in interpreting the model. PCA (Principal Component Analysis) can effectively address these issues by reducing the dimensionality of the dataset while preserving the most important information.

**Steps to Apply PCA for Dimensionality Reduction:**

1. **Data Preparation:**
   - Collect and clean the dataset, handling missing values and outliers.
   - Normalize the data to ensure that features with different scales contribute equally.

2. **PCA Implementation:**
   - Apply PCA to the normalized dataset.
   - Calculate the covariance matrix to understand the relationships between features.
   - Perform eigen decomposition to obtain the eigenvectors (principal components) and their corresponding eigenvalues (explained variance).

3. **Component Selection:**
   - Sort the principal components based on their eigenvalues in descending order.
   - Select the top 'k' principal components that capture a significant portion of the total variance (e.g., 95%). This reduces the dimensionality of the data from potentially hundreds of features to a smaller set of 'k' components.

4. **Feature Transformation:**
   - Project the normalized data onto the selected principal components. This transforms the data into a new feature space with reduced dimensions.

5. **Model Building:**
   - Use the transformed dataset with reduced features to train your stock price prediction model. This can be any suitable machine learning algorithm, such as linear regression, decision trees, or neural networks.
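
A rough sketch of steps 1 to 4, choosing the smallest `k` that reaches a 95% cumulative variance threshold. The data is synthetic: 20 invented "indicators" generated from 3 hidden factors plus noise, standing in for real financial features, and all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic stand-in for financial indicators: 20 observed features
# driven by 3 latent factors plus a little noise (all values invented)
n_samples, n_features, n_factors = 300, 20, 3
factors = rng.normal(size=(n_samples, n_factors))
mixing = rng.normal(size=(n_factors, n_features))
X = factors @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))

# Step 1: normalize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix and eigen decomposition, sorted descending
cov = np.cov(X_std, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Step 3: smallest k whose cumulative explained variance reaches 95%
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)

# Step 4: project the normalized data onto the top k components
X_reduced = X_std @ eigenvectors[:, :k]

print(k, X_reduced.shape)
```

`X_reduced` is what would be passed to the prediction model in step 5. The 95% threshold is a common rule of thumb rather than a fixed requirement, and the right value depends on the data.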

**Benefits of using PCA in this context:**
- **Reduced Overfitting:** Lowering the number of features helps prevent the model from fitting noise in the original data.
- **Improved Computational Efficiency:** Training and testing the model becomes faster with fewer features.
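- **Enhanced Model Interpretability:** A model built on a few components is simpler to examine, and the loadings of the leading components can hint at the key drivers of stock prices. Because each component blends many indicators, interpreting it requires inspecting those loadings.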
