***Q1.*** What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its
application.

Min-Max scaling is a data preprocessing technique used in machine learning to transform features to a specific range, typically between 0 and 1. It is also known as Min-Max normalization. The formula to perform Min-Max scaling on a feature x is:

\[
x_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)}
\]

Min-Max scaling is useful when the features in a dataset have different scales. Some machine learning algorithms, such as neural networks and distance-based methods, tend to perform better when input features lie in a similar range. Scaling every feature to the same range helps prevent a feature with large numeric values from dominating the others during training. Because the formula depends on the minimum and maximum, the result is sensitive to outliers.


***Example*** 

Let's say you have a dataset containing the following ages: 20, 25, 30, and 35 years. Here min(x) = 20 and max(x) = 35, so the denominator is 35 - 20 = 15. Applying the formula to each age:

- 20 → (20 - 20)/15 = 0
- 25 → (25 - 20)/15 ≈ 0.333
- 30 → (30 - 20)/15 ≈ 0.667
- 35 → (35 - 20)/15 = 1

So, after Min-Max scaling, the ages 20, 25, 30, and 35 are transformed to 0, 0.333, 0.667, and 1 respectively, and they now fall within the range of 0 to 1.
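The calculation can be checked with a few lines of Python. This is a minimal sketch: the helper name `min_max_scale` and its `new_min`/`new_max` parameters are illustrative choices, not part of any library.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the minimum maps to new_min and the maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]

ages = [20, 25, 30, 35]
scaled_ages = min_max_scale(ages)
print([round(v, 3) for v in scaled_ages])  # [0.0, 0.333, 0.667, 1.0]
```

The guard for a zero range matters in practice: a constant feature has max(x) = min(x), and the formula would divide by zero.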


***Q2.*** What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling?
Provide an example to illustrate its application.

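The unit vector technique, also known as Unit Vector Scaling or Normalization, is a feature scaling method in which a vector (typically the feature vector of one sample) is divided by its norm so that it has unit length (length 1). This technique is often used in machine learning when the direction of the data is more important than its magnitude. It is particularly useful in algorithms that involve measuring distances, like clustering algorithms, or when using techniques such as cosine similarity.

It differs from Min-Max scaling in what it rescales: Min-Max scaling rescales each feature independently using that feature's minimum and maximum, whereas unit vector scaling rescales a whole vector by its own norm, so the vector's direction is preserved and no minimum or maximum is involved.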

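The formula to calculate the unit vector for a feature vector is:

\[
X_{\text{normalized}} = \frac{X}{\lVert X \rVert}
\]

where \(\lVert X \rVert\) is usually the Euclidean (L2) norm, the square root of the sum of the squared components.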


Let's consider a dataset with two samples represented by the vectors X1 = [3, 4] and X2 = [1, 2]. Their norms are ||X1|| = √(3² + 4²) = 5 and ||X2|| = √(1² + 2²) = √5 ≈ 2.236.
After unit vector scaling, the vectors X1 and X2 are transformed into the unit vectors [0.6, 0.8] and [0.45, 0.89] respectively, meaning they both have a length of 1.
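The same two vectors can be normalized in Python. This is a small sketch using only the standard library; the function name `to_unit_vector` is a hypothetical helper, and the L2 norm is assumed.

```python
import math

def to_unit_vector(vector):
    """Divide a vector by its L2 (Euclidean) norm so that its length becomes 1."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0:
        raise ValueError("the zero vector has no direction to normalize")
    return [c / norm for c in vector]

x1_unit = to_unit_vector([3, 4])
x2_unit = to_unit_vector([1, 2])
print([round(c, 2) for c in x1_unit])  # [0.6, 0.8]
print([round(c, 2) for c in x2_unit])  # [0.45, 0.89]
```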

***Q3.*** What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an
example to illustrate its application.

***Principal Component Analysis (PCA)*** is a dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional form. It does so by identifying the principal components in the data, which are the directions in which the data varies the most.

Here's how PCA works step by step:

1. **Standardize the data**: If the features in the dataset are measured in different units or have different scales, it's important to standardize the data (subtract the mean and divide by the standard deviation for each feature) so that they all have a similar scale.

2. **Compute the covariance matrix**: Calculate the covariance matrix of the standardized data. The covariance matrix shows how different features vary with respect to each other.

3. **Compute eigenvalues and eigenvectors**: Calculate the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors are the principal components (directions), and each eigenvalue is the amount of variance along its component.

4. **Sort eigenvalues and select principal components**: Sort the eigenvalues in descending order. The eigenvectors corresponding to the largest eigenvalues are the leading principal components. You can choose the top k eigenvectors to form a k-dimensional subspace (where k is the number of dimensions you want to reduce the data to).

5. **Project the data onto the new subspace**: Multiply the standardized data by the selected eigenvectors to obtain the new lower-dimensional representation of the data.
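The five steps can be sketched with NumPy as follows. This is an illustrative implementation under stated assumptions, not a replacement for a library routine: the function name `pca_project` is hypothetical and the small dataset is made up for the demonstration.

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize: zero mean and unit variance per feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition; eigh is intended for symmetric matrices.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. eigh returns ascending eigenvalues, so reorder to descending variance.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # 5. Project the standardized data onto the top-k eigenvectors.
    return Z @ eigenvectors[:, :k], eigenvalues

# Made-up data: five samples, two strongly correlated features.
X = np.array([[2.0, 1.9], [4.0, 4.2], [6.0, 5.8], [8.0, 8.1], [10.0, 9.9]])
projected, eigenvalues = pca_project(X, k=1)
print(projected.shape)                   # (5, 1)
print(eigenvalues / eigenvalues.sum())   # share of variance per component
```

Because the two features are almost perfectly correlated, nearly all of the variance ends up on the first component, which is why one dimension is enough here.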


Here's an example to illustrate PCA:

Let's say you have a dataset with two features: height in inches and weight in pounds for a group of individuals. You want to reduce this data to one dimension using PCA.

1. **Standardize the data**: Assume the mean height is 68 inches, and the mean weight is 150 pounds. Standardize the data by subtracting the mean and dividing by the standard deviation for each feature.

2. **Compute the covariance matrix**: Calculate the covariance matrix of the standardized data.

3. **Compute eigenvalues and eigenvectors**: Calculate the eigenvalues and eigenvectors of the covariance matrix.

4. **Sort eigenvalues and select principal components**: Suppose the eigenvalue corresponding to the first principal component is 1.5 and the eigenvalue corresponding to the second principal component is 0.5 (the eigenvalues of the covariance matrix of two standardized features sum to about 2). Since the first eigenvalue is larger, you select the corresponding eigenvector as the principal component; it accounts for 1.5/2 = 75% of the total variance.

5. **Project the data onto the new subspace**: Multiply the standardized data by the selected eigenvector to obtain the new one-dimensional representation of the data.

In this simplified example, PCA has reduced the two-dimensional data (height and weight) to one dimension, capturing the most significant variation in the data.

***Q4.*** What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature
Extraction? Provide an example to illustrate this concept.

PCA (Principal Component Analysis) is a technique used for both dimensionality reduction and feature extraction. The relationship between PCA and feature extraction lies in the fact that PCA identifies new features (principal components) that are linear combinations of the original features. These new features are orthogonal and capture the most significant patterns in the data. By selecting a subset of these principal components, you can perform feature extraction, transforming the original features into a smaller set of uncorrelated features that retain most of the essential information of the dataset.

Here's how PCA can be used for feature extraction:

### Steps for PCA-based Feature Extraction:

1. **Standardize the Data**: Standardize the original features to have zero mean and unit variance.

2. **Compute Covariance Matrix**: Calculate the covariance matrix of the standardized data.

3. **Compute Eigenvalues and Eigenvectors**: Compute the eigenvalues and eigenvectors of the covariance matrix.

4. **Sort Eigenvalues and Select Principal Components**: Sort the eigenvalues in descending order. Select the top \(k\) eigenvectors corresponding to the \(k\) largest eigenvalues to form the transformation matrix.

5. **Transform the Original Features**: Multiply the standardized feature matrix by the selected \(k\) eigenvectors to obtain the new feature matrix with reduced dimensions.

Here's an example to illustrate PCA-based feature extraction:

Let's consider a dataset with three features: age, income, and education level. We want to perform feature extraction using PCA to reduce the dimensionality to two features.

1. **Standardize the Data**: Assume the features are standardized (mean=0, variance=1) for simplicity.

2. **Compute Covariance Matrix**: Calculate the covariance matrix of the standardized data.

3. **Compute Eigenvalues and Eigenvectors**: Compute the eigenvalues and eigenvectors of the covariance matrix.

4. **Sort Eigenvalues and Select Principal Components**: Suppose the eigenvalues are [2.5, 0.8, 0.3] in descending order. We want to reduce the dimensionality to two features, so we select the top two eigenvectors corresponding to the largest eigenvalues.

   Transformation Matrix:
   \[
   \begin{bmatrix}
   0.6 & 0.7 \\
   0.4 & -0.5 \\
   0.7 & 0.4 \\
   \end{bmatrix}
   \]

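5. **Transform the Original Features**: Multiply the standardized feature matrix by the selected eigenvectors. The raw values are standardized first; otherwise income, which is in the tens of thousands, would swamp age and education level. The transformation matrix above is illustrative rather than computed from these three records.

   Original Feature Matrix (age, income, education level):
   \[
   \begin{bmatrix}
   25 & 50000 & 16 \\
   30 & 60000 & 18 \\
   35 & 75000 & 20 \\
   \end{bmatrix}
   \]

   Standardized Feature Matrix (each column has mean 0 and standard deviation 1, using the population standard deviation):
   \[
   \begin{bmatrix}
   -1.22 & -1.14 & -1.22 \\
   0.00 & -0.16 & 0.00 \\
   1.22 & 1.30 & 1.22 \\
   \end{bmatrix}
   \]

   New Feature Matrix (standardized matrix multiplied by the transformation matrix, computed from the unrounded standardized values and rounded to two decimals):
   \[
   \begin{bmatrix}
   -2.05 & -0.78 \\
   -0.06 & 0.08 \\
   2.11 & 0.70 \\
   \end{bmatrix}
   \]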

In this example, PCA has extracted two features that capture the most significant patterns in the original data. The first new feature combines age, income, and education level in a specific way, and the second feature captures additional variation. These two features can now be used for further analysis or modeling, reducing the dimensionality of the data while preserving essential information.
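The projection step of this example can be reproduced with NumPy. The sketch below standardizes the three records (assuming the population standard deviation) and multiplies by the illustrative transformation matrix from step 4; in a real analysis that matrix would come from the eigen-decomposition of the covariance matrix rather than being typed in.

```python
import numpy as np

# Records from the example: age, income, education level.
X = np.array([[25, 50000, 16],
              [30, 60000, 18],
              [35, 75000, 20]], dtype=float)

# Illustrative transformation matrix from the example (not a fitted result).
W = np.array([[0.6, 0.7],
              [0.4, -0.5],
              [0.7, 0.4]])

# Standardize each column (population standard deviation, ddof=0).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Each row of Z @ W holds the two extracted features for one record.
extracted = Z @ W
print(np.round(extracted, 2))
```

Each extracted feature is a weighted sum of the standardized age, income, and education level, with the weights given by one column of the transformation matrix.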

***Q5.*** You are working on a project to build a recommendation system for a food delivery service. The dataset
contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to
preprocess the data.

In the context of building a recommendation system for a food delivery service, Min-Max scaling can be a valuable preprocessing technique for bringing features like price, rating, and delivery time onto a common scale. Here's how you can use Min-Max scaling for this purpose:

1. **Understanding the Features**:
   - **Price**: It might be represented in a numerical format, such as dollars. Prices can vary significantly, and Min-Max scaling can help in bringing them to a similar scale.
   - **Rating**: Ratings are typically on a fixed scale, say from 1 to 5. Although this range is already bounded, mapping it to 0 to 1 puts ratings on the same footing as the other scaled features.
   - **Delivery Time**: Delivery time might be represented in minutes. Like price, delivery times can vary widely and need to be scaled for uniformity.

2. **Min-Max Scaling**:
   - **Identify the Range for Each Feature**:
     - For **price**, suppose the prices range from $5 to $50.
     - For **rating**, the scale is 1 to 5.
     - For **delivery time**, it might range from 15 minutes to 90 minutes.
   - **Apply Min-Max Scaling**:
     - For each feature, apply the Min-Max scaling formula to transform the values into a range between 0 and 1:

       \[
       x_{\text{normalized}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}
       \]

     - For instance, a food item priced at $20 scales to:

       \[
       x_{\text{normalized}} = \frac{20 - 5}{50 - 5} = \frac{15}{45} \approx 0.3333
       \]

   - **Repeat the Process for Other Features**: Apply the same scaling procedure for the rating and delivery time.

3. **Result**:
   - After Min-Max scaling, all the features (price, rating, and delivery time) are transformed into a common scale between 0 and 1. Now, they are ready to be used as input for your recommendation system algorithm.

By employing Min-Max scaling, you ensure that all the features contribute on a comparable scale in your recommendation system, regardless of their original units. This can be crucial if you're using algorithms that rely on distance or similarity calculations, such as neighbourhood-based collaborative filtering, where an unscaled feature with a large numeric range (for example, delivery time in minutes) would otherwise dominate the result.
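As a rough sketch, the three features can be scaled with the ranges assumed in the example. The dictionary `FEATURE_RANGES`, the function `scale_item`, and the sample menu item are all hypothetical; in a real project the minimum and maximum would be computed from the training data.

```python
# Assumed feature ranges, taken from the example above.
FEATURE_RANGES = {
    "price": (5.0, 50.0),           # dollars
    "rating": (1.0, 5.0),           # stars
    "delivery_time": (15.0, 90.0),  # minutes
}

def scale_item(item):
    """Min-Max scale each feature of one menu item to the range [0, 1]."""
    scaled = {}
    for name, (lo, hi) in FEATURE_RANGES.items():
        scaled[name] = (item[name] - lo) / (hi - lo)
    return scaled

# Hypothetical menu item.
item = {"price": 20.0, "rating": 4.0, "delivery_time": 30.0}
scaled_item = scale_item(item)
print({k: round(v, 4) for k, v in scaled_item.items()})
# {'price': 0.3333, 'rating': 0.75, 'delivery_time': 0.2}
```

A value outside the assumed range (say, a $60 item) would scale to a number above 1, so the ranges need revisiting when new data arrives.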

***Q6.*** You are working on a project to build a model to predict stock prices. The dataset contains many
features, such as company financial data and market trends. Explain how you would use PCA to reduce the
dimensionality of the dataset.

Using PCA (Principal Component Analysis) to reduce the dimensionality of a dataset for predicting stock prices can be beneficial, especially when dealing with a large number of features. Here's how you can apply PCA in the context of your stock price prediction project:

### Step 1: Data Preprocessing

Before applying PCA, it's crucial to preprocess the data:

1. **Handling Missing Values**: Address any missing or null values in the dataset using techniques like imputation or removal of incomplete data points.

2. **Standardization**: Standardize the features to give them a mean of 0 and a standard deviation of 1. Standardization ensures that all features are on a similar scale, which is a prerequisite for PCA.

### Step 2: Applying PCA

1. **Calculate Covariance Matrix**: Compute the covariance matrix of the standardized feature matrix. The covariance matrix represents the relationships between different features.

2. **Compute Eigenvalues and Eigenvectors**: Calculate the eigenvalues and eigenvectors of the covariance matrix. These eigenvectors represent the principal components, and each eigenvalue gives the amount of variance captured along its component.

3. **Sort Eigenvalues**: Sort the eigenvalues in descending order. The higher the eigenvalue, the more variance the corresponding principal component explains.

4. **Select Principal Components**: Choose the top \(k\) eigenvectors corresponding to the \(k\) largest eigenvalues to form the transformation matrix. The value of \(k\) can be determined based on the cumulative explained variance ratio. For instance, you might select \(k\) such that 95% of the variance is retained.

5. **Transform the Data**: Multiply the original standardized feature matrix by the selected \(k\) eigenvectors to obtain the reduced-dimensional feature matrix.

### Step 3: Model Training and Prediction

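1. **Split Data**: Split the reduced-dimensional feature matrix and corresponding target values into training and testing sets. For time-ordered data such as stock prices, split chronologically rather than at random, and ideally fit the standardization and PCA on the training portion only, then apply the same transformation to the test portion, so that no information from the test period leaks into the model.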

2. **Train Model**: Train your stock price prediction model (such as regression or a time-series model) using the reduced-dimensional feature matrix in the training set.

3. **Evaluate Model**: Evaluate the model's performance using the testing set. Metrics like mean squared error (MSE) or root mean squared error (RMSE) can be used to assess the prediction accuracy.
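The whole workflow can be sketched end to end with NumPy. Everything here is an assumption for illustration: the data is synthetic (random numbers, not real financial data), the split is a simple chronological 80/20 cut, the standardization and PCA are fitted on the training rows only, and the model is ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: 200 time-ordered rows, 10 correlated
# features driven by 3 hidden factors, plus a noisy target. All made up.
factors = rng.normal(size=(200, 3))
X = factors @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
y = factors @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

# Chronological split: the first 80% of rows train, the rest test.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Fit the standardization and PCA on the training rows only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Z_train, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Smallest k whose cumulative explained variance reaches 95%.
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
k = int(np.searchsorted(cumulative, 0.95) + 1)
P_train, P_test = Z_train @ eigenvectors[:, :k], Z_test @ eigenvectors[:, :k]

# Ordinary least squares on the reduced features (with an intercept column).
A_train = np.column_stack([np.ones(len(P_train)), P_train])
A_test = np.column_stack([np.ones(len(P_test)), P_test])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
rmse = float(np.sqrt(np.mean((A_test @ coef - y_test) ** 2)))
print(f"kept {k} of {X.shape[1]} components, test RMSE = {rmse:.3f}")
```

Reusing the training mean, standard deviation, and eigenvectors on the test rows mirrors what would happen in production, where future data is not available when the transformation is fitted.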

### Benefits of PCA in Stock Price Prediction:

- **Dimensionality Reduction**: PCA reduces the number of features, which can help mitigate the curse of dimensionality and improve the model's generalization.
  
- **Noise Reduction**: By focusing on the principal components with the highest variance, PCA helps in filtering out noise and retaining the essential patterns in the data.

- **Visualization**: Reduced dimensionality allows for easier visualization of the data, which can aid in understanding the underlying relationships.

- **Speed Up Training**: With fewer dimensions, the model training process is faster, enabling quicker experimentation with different algorithms and hyperparameters.

Remember that while PCA can be a powerful tool, it's essential to strike a balance. Reducing dimensionality too aggressively might lead to loss of important information, impacting the predictive power of your model. Experimentation and careful evaluation of the model's performance at different dimensions are crucial in determining the optimal number of principal components to retain.

***Q7.*** For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the
values to a range of -1 to 1.

```python
import numpy as np

data = np.array([1, 5, 10, 15, 20])
mini, maxi = data.min(), data.max()

# Scale to [0, 1] first, then stretch to [-1, 1]: multiply by 2 and subtract 1.
min_max_data = ((data - mini) / (maxi - mini)) * 2 - 1
min_max_data.tolist()
```

```
[-1.0, -0.5789473684210527, -0.052631578947368474, 0.4736842105263157, 1.0]
```
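The "multiply by 2 and subtract 1" step is a special case of Min-Max scaling to an arbitrary target range \([a, b]\). Writing the target bounds as \(a\) and \(b\) (symbols introduced here for illustration), the general formula is:

\[
x' = a + \frac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}
\]

With \(a = -1\), \(b = 1\), \(x_{\min} = 1\), and \(x_{\max} = 20\), this becomes \(x' = -1 + \frac{2(x - 1)}{19}\); for example, \(x = 10\) gives \(-1 + \frac{18}{19} \approx -0.0526\), matching the third value in the output.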

***Q8.*** For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform
Feature Extraction using PCA. How many principal components would you choose to retain, and why?

The decision of how many principal components to retain in PCA involves balancing the need for dimensionality reduction with the preservation of as much variance as possible. A common approach to decide the number of principal components to keep is by examining the explained variance ratio.

The explained variance ratio tells us the proportion of the dataset's variance that lies along the axes of each principal component. Retaining a higher percentage of explained variance ensures that the retained components capture most of the dataset's variability. A common threshold is to retain principal components that collectively explain, for example, 95% or 99% of the total variance.

Let's consider the steps for PCA and then discuss how to decide the number of principal components to retain for your dataset:

### Steps for PCA:

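1. **Standardize the Data**: Standardize the numeric features (height, weight, age, blood pressure) to have zero mean and unit variance. Gender is categorical, so it must first be encoded numerically (for example as 0/1) if it is to be included in PCA; alternatively, it can be left out of PCA and kept as a separate feature.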

2. **Calculate Covariance Matrix**: Compute the covariance matrix of the standardized feature matrix.

3. **Compute Eigenvalues and Eigenvectors**: Calculate the eigenvalues and eigenvectors of the covariance matrix.

4. **Sort Eigenvalues**: Sort the eigenvalues in descending order.

5. **Decide Number of Principal Components to Retain**:
   - Calculate the explained variance ratio for each principal component: \( \text{explained variance ratio} = \frac{\text{eigenvalue}}{\sum \text{all eigenvalues}} \).
   - Accumulate the explained variance ratios.
   - Decide the number of principal components (\(k\)) to retain based on a threshold (e.g., 95% or 99% total variance explained).

6. **Select Principal Components and Transform Data**: Select the top \(k\) eigenvectors corresponding to the \(k\) largest eigenvalues. Multiply the original standardized feature matrix by these \(k\) eigenvectors to obtain the reduced-dimensional feature matrix.

### Decision on Number of Principal Components:

For example, if you find that the first three principal components explain 98% of the total variance, you might decide to retain these three components. This means you are reducing the dimensionality from 5 features to 3, capturing 98% of the dataset's variability.
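The threshold rule can be sketched in plain Python. The eigenvalues below are hypothetical, chosen so that the first three components explain 98% of the variance as in the example; the function name `components_for_threshold` is also an illustrative choice.

```python
def components_for_threshold(eigenvalues, threshold=0.95):
    """Return the smallest k whose cumulative explained variance ratio
    reaches the threshold, together with the cumulative ratios."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative, running = [], 0.0
    for value in ordered:
        running += value / total
        cumulative.append(running)
    for k, ratio in enumerate(cumulative, start=1):
        if ratio >= threshold:
            return k, cumulative
    return len(ordered), cumulative  # guard against floating-point shortfall

# Hypothetical eigenvalues for five standardized features (they sum to 5).
eigenvalues = [2.6, 1.4, 0.9, 0.07, 0.03]
k, cumulative = components_for_threshold(eigenvalues, threshold=0.95)
print(k)  # 3
print([round(c, 3) for c in cumulative])
```

Raising the threshold to 99% with the same eigenvalues would require a fourth component, which shows how the choice of threshold directly sets the amount of reduction.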

The choice of the threshold (95%, 99%, etc.) depends on your specific use case and the trade-off between dimensionality reduction and information preservation. Retaining more principal components preserves more information but might lead to higher computational complexity. Conversely, retaining fewer components might result in information loss.

In practice, it's common to start with a threshold like 95% and analyze the results. If the model's performance is not satisfactory, you can experiment with retaining more components to capture additional variance.