<h1 style = 'color:red'><b>Week-13, Feature Engineering-2 Assignment</b></h1>

Name - Gorachanda Dash <br>
Date - 19-Mar-2023<br>

<p style=" color : #4233FF"><b>Q1. What is Min-Max scaling, and how is it used in data preprocessing? Provide an example to illustrate its application.</b></p>

**`Min-Max scaling`**, also known as min-max normalization, is a feature scaling technique used in statistics and machine learning to rescale numerical features into a specific range, typically between 0 and 1. The purpose of this scaling method is to bring every feature into a consistent, interpretable range, which helps scale-sensitive machine learning algorithms converge and can improve their performance.

The formula for Min-Max scaling is:

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

Where:
- $X_{scaled}$ is the scaled value of the original data point $X$.
- $X$ is the original data point.
- $X_{min}$ is the minimum value in the dataset.
- $X_{max}$ is the maximum value in the dataset.

Here's an example to illustrate how Min-Max scaling works:

Suppose we have a dataset of ages of people, and we want to scale these ages using Min-Max scaling. The ages in our dataset range from 25 to 60 years.

Original data (ages):
- Person 1: 25 years
- Person 2: 40 years
- Person 3: 55 years
- Person 4: 60 years

Now, let's apply Min-Max scaling:

1. Find the minimum and maximum values in the dataset:
   - $X_{min} = 25$ (minimum age)
   - $X_{max} = 60$ (maximum age)

2. Apply the Min-Max scaling formula to each data point:

   - Person 1: $X_{scaled} = \frac{25 - 25}{60 - 25} = 0.000000$
   - Person 2: $X_{scaled} = \frac{40 - 25}{60 - 25} = 0.428571$
   - Person 3: $X_{scaled} = \frac{55 - 25}{60 - 25} = 0.857143$
   - Person 4: $X_{scaled} = \frac{60 - 25}{60 - 25} = 1.0$

After Min-Max scaling, the ages of people in the dataset are transformed to the range [0, 1]:

- Person 1: 0.000000
- Person 2: 0.428571
- Person 3: 0.857143
- Person 4: 1.0

Min-Max scaling is beneficial for machine learning algorithms that are sensitive to the scale of features, such as gradient-based optimization algorithms used in neural networks or support vector machines. It ensures that all features have similar scales, preventing one feature from dominating others during training. However, it's essential to note that Min-Max scaling may not be appropriate for all datasets, especially when dealing with outliers, as extreme values can significantly affect the scaling. In such cases, robust scaling methods may be preferred.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Ages from the worked example above, indexed by person
df = pd.DataFrame({'Age': {'Person1': 25, 'Person2': 40, 'Person3': 55, 'Person4': 60}})

# Fit the scaler on the Age column and keep the original row labels
scaler = MinMaxScaler()
pd.DataFrame(data=scaler.fit_transform(df[['Age']]), index=df.index, columns=['Age'])
```

|         | Age      |
|---------|----------|
| Person1 | 0.000000 |
| Person2 | 0.428571 |
| Person3 | 0.857143 |
| Person4 | 1.000000 |
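To make the outlier caveat mentioned above concrete, here is a minimal sketch. It assumes a fifth, hypothetical age of 600 (for instance a data-entry error, which is not part of the original dataset) and compares `MinMaxScaler` with scikit-learn's `RobustScaler`, which centers on the median and divides by the interquartile range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler

# Ages from the example plus one hypothetical outlier (600), added only for illustration
ages = np.array([[25.0], [40.0], [55.0], [60.0], [600.0]])

# Min-Max scaling: the outlier becomes 1.0 and squeezes the genuine ages towards 0
minmax_scaled = MinMaxScaler().fit_transform(ages).ravel()

# Robust scaling: (x - median) / IQR, so the outlier barely affects the other values
robust_scaled = RobustScaler().fit_transform(ages).ravel()

print("Min-Max:", np.round(minmax_scaled, 3))
print("Robust: ", np.round(robust_scaled, 3))
```

With the outlier present, the four genuine ages all land below 0.1 after Min-Max scaling, while the robust version keeps them spread around the median.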


<p style=" color : #4233FF"><b>Q2. What is the Unit Vector technique in feature scaling, and how does it differ from Min-Max scaling? Provide an example to illustrate its application.</b>
</p>

**Unit Vector Scaling**, also known as "Normalization," is a feature scaling technique used to transform numerical features so that they have a unit norm, meaning they are scaled to have a length or magnitude of 1. This technique is commonly used in machine learning when the direction or relative relationships between data points are more important than their absolute values.

The formula for Unit Vector Scaling (Normalization) is as follows:

$$X_{normalized} = \frac{X}{\lVert X \rVert}$$

Where:
- $X_{normalized}$ is the normalized value of the original data point $X$.
- $X$ is the original data point (the vector of feature values for one sample).
- $\lVert X \rVert$ is the Euclidean norm or L2 norm of the data point, which is calculated as the square root of the sum of squared values of the data point's components.

Unit Vector Scaling is primarily used in scenarios where the scale of features varies significantly, and we want to emphasize the direction or similarity between data points rather than their absolute magnitudes. This can be particularly useful in machine learning algorithms like clustering or nearest neighbor classifiers.

Here's an example to illustrate Unit Vector Scaling:

Suppose we have a dataset of two numerical features, "Age" and "Income." We want to apply Unit Vector Scaling to these features.

Original data:
- Person 1: Age = 30, Income = $60,000
- Person 2: Age = 40, Income = $80,000
- Person 3: Age = 35, Income = $70,000

To apply Unit Vector Scaling:

1. Calculate the Euclidean norm ($\lVert X \rVert$) for each data point:

   - Person 1: $\lVert X_1 \rVert = \sqrt{30^2 + 60{,}000^2} \approx 60{,}000.0075$
   - Person 2: $\lVert X_2 \rVert = \sqrt{40^2 + 80{,}000^2} \approx 80{,}000.0100$
   - Person 3: $\lVert X_3 \rVert = \sqrt{35^2 + 70{,}000^2} \approx 70{,}000.0088$

2. Apply Unit Vector Scaling for each feature:

   - Person 1: 
     - Age: $\text{Age}_{normalized} = \frac{30}{60{,}000.0075} \approx 0.0005$
     - Income: $\text{Income}_{normalized} = \frac{60{,}000}{60{,}000.0075} \approx 0.9999999$
   - Person 2: 
     - Age: $\text{Age}_{normalized} = \frac{40}{80{,}000.0100} \approx 0.0005$
     - Income: $\text{Income}_{normalized} = \frac{80{,}000}{80{,}000.0100} \approx 0.9999999$
   - Person 3: 
     - Age: $\text{Age}_{normalized} = \frac{35}{70{,}000.0088} \approx 0.0005$
     - Income: $\text{Income}_{normalized} = \frac{70{,}000}{70{,}000.0088} \approx 0.9999999$

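In this example, after Unit Vector Scaling every data point has a magnitude (Euclidean norm) of 1. Because Income is several orders of magnitude larger than Age, it dominates the norm, so the normalized Income is almost exactly 1 and the normalized Age is close to 0. All three people also share the same Age-to-Income ratio, so they point in the same direction and map to the same unit vector. This scaling method keeps the direction of each data point in the feature space and discards its magnitude.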

Key differences between Unit Vector Scaling (Normalization) and Min-Max scaling are:

1. **`Magnitude Preservation`**:
   - Unit Vector Scaling works row by row: each sample is divided by its own norm. It preserves the direction of each data point but not its magnitude, which becomes 1 for all data points.
   - Min-Max scaling works column by column: each feature is rescaled linearly into a specified range (typically between 0 and 1), which preserves the relative spacing of the values within that feature.

2. **`Use Cases`**:
   - Unit Vector Scaling is often used in scenarios where the absolute magnitude of features is not important, such as in clustering algorithms or cosine similarity calculations.
   - Min-Max scaling is used when we want to bring all features to a common scale and are working with algorithms sensitive to feature scales, like gradient-based optimization algorithms in machine learning.

```python
import pandas as pd
from sklearn.preprocessing import Normalizer

# Create a DataFrame with sample data
data = {
    'Age': [30, 40, 35],
    'Income': [60000, 80000, 70000]
}

df = pd.DataFrame(data)

# Initialize the Normalizer
normalizer = Normalizer(norm='l2')  # 'l2' norm corresponds to Euclidean norm

# Apply normalization to the DataFrame (each row is scaled to unit norm)
normalized_df = pd.DataFrame(normalizer.fit_transform(df), columns=df.columns)

# Display the normalized DataFrame
print("Original DataFrame:")
print(df)

print("\nNormalized DataFrame:")
normalized_df
```

```
Original DataFrame:
   Age  Income
0   30   60000
1   40   80000
2   35   70000

Normalized DataFrame:
```

|   | Age    | Income |
|---|--------|--------|
| 0 | 0.0005 | 1.0    |
| 1 | 0.0005 | 1.0    |
| 2 | 0.0005 | 1.0    |
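The row-wise nature of Unit Vector Scaling, as opposed to the column-wise nature of Min-Max scaling, can be checked by hand with NumPy. This is only a sketch on the same three rows; the variable names are illustrative and not part of any library.

```python
import numpy as np

# Same Age/Income rows as in the example above
X = np.array([[30.0, 60000.0],
              [40.0, 80000.0],
              [35.0, 70000.0]])

# Unit vector scaling: divide each ROW by its own Euclidean norm
row_norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / row_norms

# Min-Max scaling: rescale each COLUMN to [0, 1] using that column's min and max
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

print("Row norms after unit scaling:", np.linalg.norm(X_unit, axis=1))
print("Unit-scaled rows:\n", X_unit)
print("Min-Max-scaled columns:\n", X_minmax)
```

Every row ends up with norm 1, and because the three people share the same Age-to-Income ratio their unit vectors coincide, whereas Min-Max scaling still separates them within each column.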


<p style=" color : #4233FF"><b>Q3. What is PCA (Principal Component Analysis), and how is it used in dimensionality reduction? Provide an example to illustrate its application.</b></p>

**Principal Component Analysis (PCA)** is a dimensionality reduction technique widely used in statistics and machine learning. Its primary goal is to reduce the dimensionality of a dataset while preserving as much of the original variation (information) as possible. PCA achieves this by transforming the data into a new coordinate system, where the axes (principal components) are orthogonal (uncorrelated) and ranked by the amount of variance they explain.

Here's a simplified explanation of how PCA works:

1. **Centering the Data**: PCA begins by centering the data, which means subtracting the mean of each feature from every data point. This step ensures that the data is centered around the origin.

2. **Covariance Matrix**: PCA then computes the covariance matrix of the centered data. The covariance matrix describes the relationships between the features and quantifies how they vary together.

3. **Eigendecomposition**: Next, PCA performs eigendecomposition on the covariance matrix to find its eigenvalues and corresponding eigenvectors. These eigenvectors are the principal components.

4. **Selecting Principal Components**: PCA ranks the principal components in descending order of their corresponding eigenvalues. The principal components with the largest eigenvalues capture the most variance in the data.

5. **Reducing Dimensionality**: To reduce dimensionality, we can choose a subset of the top principal components while retaining most of the data's variance. This subset of components can be used to transform the original data into a lower-dimensional space.
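The five steps above can be sketched directly in NumPy. This is a minimal illustration on synthetic data invented for the purpose; the helper name `pca_via_eig` is hypothetical. As far as I know, scikit-learn's `PCA` uses a singular value decomposition rather than an explicit eigendecomposition, so component signs may differ from this sketch.

```python
import numpy as np

def pca_via_eig(X, n_components):
    """Project X onto its top principal components using the covariance matrix."""
    # Step 1: center each feature at zero
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (columns)
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigendecomposition (eigh is intended for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 4: sort components by descending eigenvalue (variance explained)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 5: keep the top components and project the centered data onto them
    projected = X_centered @ eigenvectors[:, :n_components]
    explained_ratio = eigenvalues / eigenvalues.sum()
    return projected, explained_ratio

# Synthetic data, made up for illustration: 200 samples, 3 features, two of them correlated
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

X_reduced, ratio = pca_via_eig(X, n_components=2)
print(X_reduced.shape)
print(np.round(ratio, 3))
```

The projected columns are uncorrelated with each other, which is the orthogonality property described earlier.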

The example below applies PCA to the Iris dataset, reducing its four numerical features to two principal components while preserving most of the variance. The reduced data (`reduced_df`) has only two features, "Principal Component 1" and "Principal Component 2," which capture the most significant patterns in the original data. This reduced representation is often used in further analysis or visualization.

```python
import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

# Load Iris; every column except the last (species) is a numerical feature
df = sns.load_dataset('iris')
X = df.iloc[:, :-1]

# Apply PCA for dimensionality reduction to 2 components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Create a DataFrame for the reduced data
reduced_df = pd.DataFrame(data=X_reduced, columns=["Principal Component 1", "Principal Component 2"])
pd.concat([reduced_df, df.iloc[:, -1]], axis=1).head()
```

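|   | Principal Component 1 | Principal Component 2 | species |
|---|-----------------------|-----------------------|---------|
| 0 | -2.684126             | 0.319397              | setosa  |
| 1 | -2.714142             | -0.177001             | setosa  |
| 2 | -2.888991             | -0.144949             | setosa  |
| 3 | -2.745343             | -0.318299             | setosa  |
| 4 | -2.728717             | 0.326755              | setosa  |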


<p style=" color : #4233FF"><b>Q4. What is the relationship between PCA and Feature Extraction, and how can PCA be used for Feature Extraction? Provide an example to illustrate this concept.</b>
</p>

**Principal Component Analysis (PCA)** is closely related to feature extraction in machine learning. PCA can be used as a feature extraction technique to reduce the dimensionality of a dataset while retaining the most important information, making it easier to work with and improving the performance of machine learning models.

Here's the relationship between PCA and feature extraction, along with an example:

1. **Dimensionality Reduction**: PCA is primarily used for dimensionality reduction. It takes a dataset with a high number of features and transforms it into a new dataset with fewer features (principal components) while preserving as much of the original information as possible.

2. **Orthogonal Features**: The principal components produced by PCA are orthogonal to each other, meaning they are uncorrelated. This removes redundancy between the extracted features, although each component is a linear combination of the original features and can be harder to interpret directly.

3. **Variance Retention**: PCA ranks the principal components by the amount of variance they explain in the data. The first few components typically capture most of the variance, while the later components capture less. By selecting a subset of the top components, we can retain most of the important information in the data.

4. **Feature Extraction**: In the context of feature extraction, PCA can be used to transform the original features into a lower-dimensional space represented by the principal components. These transformed components can be treated as new features, which can be used in machine learning models. This is particularly useful when dealing with high-dimensional data or when we suspect that some features are redundant or noisy.

The example below uses PCA for feature extraction on the Iris dataset: the four original measurements are replaced by two extracted features that together explain about 97.8% of the variance (roughly 92.5% and 5.3%). These extracted features can then be used in machine learning models for tasks like classification or clustering.

```python
import pandas as pd
import seaborn as sns
from sklearn.decomposition import PCA

# Load Iris; every column except the last (species) is a numerical feature
df = sns.load_dataset('iris')
X = df.iloc[:, :-1]

# Apply PCA to extract 2 new features from the 4 original ones
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)

# Create a DataFrame for the extracted features
reduced_df = pd.DataFrame(data=X_extracted, columns=["Feature 1 (PCA)", "Feature 2 (PCA)"])

# Fraction of variance explained by each selected component
explained_variance_ratio = pca.explained_variance_ratio_

print("Explained Variance Ratio:", explained_variance_ratio)
pd.concat([reduced_df, df.iloc[:, -1]], axis=1).head()
```

```
Explained Variance Ratio: [0.92461872 0.05306648]
```

|   | Feature 1 (PCA) | Feature 2 (PCA) | species |
|---|-----------------|-----------------|---------|
| 0 | -2.684126       | 0.319397        | setosa  |
| 1 | -2.714142       | -0.177001       | setosa  |
| 2 | -2.888991       | -0.144949       | setosa  |
| 3 | -2.745343       | -0.318299       | setosa  |
| 4 | -2.728717       | 0.326755        | setosa  |
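To illustrate the point that the extracted components can be treated as new features for a model, here is a small sketch that feeds two PCA features into a classifier. It assumes scikit-learn's bundled copy of Iris (`load_iris`) instead of the seaborn one, and logistic regression is an arbitrary choice of model for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Iris from scikit-learn (assumed equivalent to the seaborn copy used above)
X, y = load_iris(return_X_y=True)

# PCA extracts 2 features; the classifier only ever sees those 2 columns
model = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))

# 5-fold cross-validation; PCA is re-fitted inside each training fold
scores = cross_val_score(model, X, y, cv=5)
print("Mean accuracy on 2 PCA features:", scores.mean())
```

Wrapping PCA and the classifier in one pipeline keeps the extraction step inside cross-validation, so the components are never learned from the held-out fold.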


<p style=" color : #4233FF"><b>Q5. You are working on a project to build a recommendation system for a food delivery service. The dataset contains features such as price, rating, and delivery time. Explain how you would use Min-Max scaling to preprocess the data.</b>
</p>

When building a recommendation system for a food delivery service, it's essential to preprocess the data before using it in machine learning algorithms. Min-Max scaling can be a useful preprocessing technique to ensure that the numerical features, such as price, rating, and delivery time, are on a common scale. Here's how we would use Min-Max scaling to preprocess the data:

1. **Understand the Data**: First, we should have a good understanding of our dataset. Identify the numerical features that need scaling. In our case, it's likely that "price," "rating," and "delivery time" are numerical features that need to be scaled.

2. **Import Libraries**: Import the necessary libraries, including the one for Min-Max scaling. In Python, we can use scikit-learn's `MinMaxScaler` for this purpose.

   ```python
   from sklearn.preprocessing import MinMaxScaler
   ```

3. **Load and Prepare the Data**: Load our dataset into a DataFrame and prepare it for scaling. Ensure that we have extracted the relevant columns.

   ```python
   import pandas as pd

   # Load our dataset (replace 'data.csv' with our actual dataset file)
   df = pd.read_csv('data.csv')

   # Extract the columns that need scaling (e.g., 'price', 'rating', 'delivery_time')
   features_to_scale = ['price', 'rating', 'delivery_time']
   ```

4. **Apply Min-Max Scaling**: Initialize the `MinMaxScaler` and fit it to our data to compute the scaling parameters (minimum and maximum values). Then, use the scaler to transform our data.

   ```python
   # Initialize the Min-Max scaler
   scaler = MinMaxScaler()

   # Fit the scaler to our data and transform it
   df[features_to_scale] = scaler.fit_transform(df[features_to_scale])
   ```

5. **Scaled Data**: After applying Min-Max scaling, our numerical features will be scaled to the range [0, 1]. This ensures that all these features have the same scale, making them suitable for use in machine learning algorithms without any feature dominating the others due to its original scale.

   ```python
   # Scaled data ready for further processing
   print(df.head())
   ```

6. **Further Processing**: With the scaled data, we can now proceed with building our recommendation system. We can use various recommendation algorithms, including collaborative filtering, content-based filtering, or hybrid methods, depending on our project's requirements.

Min-Max scaling is especially valuable when working with machine learning models that are sensitive to feature scales, as it ensures that the features are on a common scale and contributes to better model performance. In our food delivery recommendation system, this preprocessing step helps in making meaningful recommendations based on the scaled features like price, rating, and delivery time.
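Since `data.csv` is only a placeholder, the steps above can also be tried on a tiny in-memory table. The restaurant values below are invented for illustration, and the sketch additionally shows how a new record could be scaled with the already fitted scaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical restaurant records; all values are invented for illustration
df = pd.DataFrame({
    'price': [120.0, 350.0, 800.0, 95.0],
    'rating': [4.2, 3.8, 4.9, 3.1],
    'delivery_time': [25.0, 40.0, 60.0, 15.0],
})
features_to_scale = ['price', 'rating', 'delivery_time']

# Fit on the existing data and rescale every column to [0, 1]
scaler = MinMaxScaler()
scaled = pd.DataFrame(scaler.fit_transform(df[features_to_scale]),
                      columns=features_to_scale, index=df.index)

# The fitted scaler remembers each column's min and max, so a new record
# is scaled consistently with transform() rather than fit_transform()
new_item = pd.DataFrame({'price': [200.0], 'rating': [4.5], 'delivery_time': [30.0]})
new_scaled = scaler.transform(new_item)

print(scaled)
print(np.round(new_scaled, 3))
```

Reusing the fitted scaler matters in a recommendation setting, because restaurants added later should be placed on the same scale as the ones the model was built on.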

<p style=" color : #4233FF"><b>Q6. You are working on a project to build a model to predict stock prices. The dataset contains many features, such as company financial data and market trends. Explain how you would use PCA to reduce the dimensionality of the dataset.</b>
</p>

Using **Principal Component Analysis (PCA)** to reduce the dimensionality of a dataset when building a stock price prediction model can be a valuable technique, especially when dealing with a large number of features. Here's how we would use PCA in this context:

**Step 1: Data Preparation**

1. **Understand the Data**: Gain a thorough understanding of our dataset, including the features related to company financial data and market trends. Identify the target variable, which in this case would likely be the stock price or some derivative thereof.

2. **Feature Selection**: Carefully select the relevant features that we believe might have an impact on stock prices. This can involve domain knowledge and exploratory data analysis.

3. **Data Cleaning**: Perform data cleaning to handle missing values and ensure that our dataset is ready for analysis.

**Step 2: Standardization**

PCA is sensitive to the scale of the features, so it's essential to standardize the data before applying it. Z-score standardization (scikit-learn's `StandardScaler`) rescales every feature to a mean of 0 and a standard deviation of 1. Min-Max scaling also brings features into a common range, but it does not give them unit variance, so Z-score standardization is used here.

```python
from sklearn.preprocessing import StandardScaler

# Standardize the feature data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)
```

**Step 3: Applying PCA**

Now we can apply PCA to reduce the dimensionality of the standardized feature data.

```python
from sklearn.decomposition import PCA

# Initialize PCA with the number of components we want to retain
pca = PCA(n_components=K)

# Fit PCA to the standardized data and transform it
X_pca = pca.fit_transform(X_standardized)
```

In the code above:
- `K` is the number of principal components we want to retain. We can choose this based on the amount of variance we want to explain (e.g., retaining enough components to explain 95% of the variance).

**Step 4: Variance Explained**

PCA provides information about the variance explained by each principal component. We can access this information to decide how many components to retain.

```python
explained_variance = pca.explained_variance_ratio_
```

**Step 5: Model Building**

With the reduced-dimensional data (X_pca) and the target variable, we can proceed to build our stock price prediction model. We can use regression techniques, time series models, or any other appropriate modeling approach for this task.

**Step 6: Inverse Transform (Optional)**

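If we want to see how much information the retained components preserve, we can map the reduced data back to the original feature space with an inverse transform. The result is only an approximation of the original data, because the discarded components are lost. To understand which original features contribute most to each component, inspect the loadings in `pca.components_` instead.

```python
# Approximate reconstruction in the standardized space (discarded components are lost)
X_reconstructed = pca.inverse_transform(X_pca)
# Undo the standardization to get back to the original units
X_original = scaler.inverse_transform(X_reconstructed)
```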

**Step 7: Model Evaluation and Refinement**

After building our initial model, evaluate its performance using appropriate metrics. We can also experiment with different numbers of retained principal components and fine-tune our model based on the results.

By applying PCA, we can reduce the dimensionality of our dataset, mitigate the curse of dimensionality, and potentially improve the efficiency and interpretability of our stock price prediction model. It's essential to strike a balance between dimensionality reduction and preserving the information necessary for accurate predictions.
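The steps above can be tied together in one scikit-learn `Pipeline`. The sketch below runs on synthetic data standing in for a stock dataset, uses `LinearRegression` purely as a placeholder model, and assumes an 80/20 chronological split. Fitting the scaler and PCA on the training rows only is a deliberate choice, so that no information from later rows leaks into the preprocessing.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a stock dataset: 300 time-ordered rows, 20 features
# driven by 3 hidden factors. All values are made up for illustration.
rng = np.random.default_rng(42)
factors = rng.normal(size=(300, 3))
X = factors @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(300, 20))
y = factors @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=300)

# Chronological split: train on the first 80% of rows, test on the rest (no shuffling)
split = int(len(X) * 0.8)
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# A float n_components asks PCA to keep enough components for 95% of the variance
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('pca', PCA(n_components=0.95)),
    ('regressor', LinearRegression()),
])
pipeline.fit(X_train, y_train)

n_kept = pipeline.named_steps['pca'].n_components_
print("Components kept:", n_kept, "of", X.shape[1])
print("R^2 on held-out rows:", pipeline.score(X_test, y_test))
```

Real market data is far noisier than this toy example, so the number of components kept and the score here say nothing about what to expect in practice.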

<p style=" color : #4233FF"><b>Q7. For a dataset containing the following values: [1, 5, 10, 15, 20], perform Min-Max scaling to transform the values to a range of -1 to 1.</b>
</p>

```python
import numpy as np

# Given dataset
data = np.array([1, 5, 10, 15, 20])

# Define the desired range (-1 to 1)
min_range = -1
max_range = 1

# Calculate min and max of the original data
data_min = data.min()
data_max = data.max()

# Apply Min-Max scaling: map to [0, 1] first, then stretch and shift to [min_range, max_range]
scaled_data = ((data - data_min) / (data_max - data_min)) * (max_range - min_range) + min_range

# Display the scaled data
print("Original Data:", data)
print("Scaled Data:", scaled_data)
```

```
Original Data: [ 1  5 10 15 20]
Scaled Data: [-1.         -0.57894737 -0.05263158  0.47368421  1.        ]
```
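The manual computation above applies the general form of the Min-Max formula for a target range $[a, b]$, here with $a = -1$ and $b = 1$: $X_{scaled} = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$. As a cross-check, the same result can be obtained with scikit-learn's `MinMaxScaler` and its `feature_range` parameter; the variable names below are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 5, 10, 15, 20], dtype=float)

# scikit-learn expects a 2D array: one column, five rows
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled_sklearn = scaler.fit_transform(data.reshape(-1, 1)).ravel()

# Same formula as the manual computation above, for comparison
scaled_manual = (data - data.min()) / (data.max() - data.min()) * 2 - 1

print(scaled_sklearn)
print(np.allclose(scaled_sklearn, scaled_manual))
```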


<p style=" color : #4233FF"><b>Q8. For a dataset containing the following features: [height, weight, age, gender, blood pressure], perform Feature Extraction using PCA. How many principal components would you choose to retain, and why?</b>
</p>

When performing Feature Extraction using PCA (Principal Component Analysis), the decision of how many principal components to retain depends on several factors, including the variance explained by each component and the specific goals of our analysis. Here are the steps to decide on the number of principal components to retain for our dataset containing the features [height, weight, age, gender, blood pressure]:

**Step 1: Standardize the Data**

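Before applying PCA, encode `gender` numerically (for example as 0/1), since PCA only works on numeric data, and then standardize the data to ensure that all features are on the same scale. This step is important because PCA is sensitive to the scale of the features: without it, the features with the largest numeric spread would dominate the components.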

**Step 2: Apply PCA**

Apply PCA to the standardized data. The PCA algorithm will provide us with the explained variance ratio for each principal component. The explained variance ratio tells us how much of the total variance in the dataset is explained by each component.

**Step 3: Decide on the Number of Components**

Deciding on the number of principal components to retain involves a trade-off between dimensionality reduction and information preservation. Here are some common methods to make this decision:

1. **Explained Variance**: Plot the cumulative explained variance as a function of the number of components. We can then choose the number of components that collectively explain a sufficiently high percentage of the total variance. A common threshold might be to retain enough components to explain 95% or 99% of the variance. This ensures that we retain most of the information in the data.

2. **Scree Plot**: Plot the explained variance for each component and look for an "elbow" point where the explained variance starts to level off. The number of components just before this point is often chosen as it represents a good balance between dimensionality reduction and information preservation.

3. **Domain Knowledge**: Sometimes, domain knowledge or the specific goals of our analysis can guide us in selecting the number of components. For example, if we expect the features to be driven by only a few underlying factors, we might retain that many components.

4. **Cross-Validation**: If we're building a predictive model, we can use cross-validation to assess the performance of our model with different numbers of retained components. Choose the number that results in the best model performance.

5. **Use Case**: Consider the specific use case. If we are interested in visualization, we might choose a small number of components that can be easily visualized. If we need to reduce dimensionality for a machine learning task, we might choose a larger number of components that still capture most of the variance.

Ultimately, the choice of the number of components should align with our specific objectives and the amount of information we can afford to lose. There is no one-size-fits-all answer, and it often involves some experimentation and analysis.
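The explained-variance criterion can be sketched as follows. Because no actual measurements are given in the question, the records below are synthetic and invented purely for illustration (including the assumed relationships between the features), so the number of components printed applies to this toy data only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic records, invented for illustration only (not real measurements)
rng = np.random.default_rng(7)
n = 500
gender = rng.integers(0, 2, size=n)                        # encoded as 0/1
age = rng.uniform(20, 70, size=n)                          # years
height = 162 + 13 * gender + rng.normal(0, 6, size=n)      # cm
weight = 0.9 * (height - 100) + rng.normal(0, 8, size=n)   # kg
blood_pressure = 100 + 0.5 * age + 0.2 * weight + rng.normal(0, 8, size=n)
df = pd.DataFrame({'height': height, 'weight': weight, 'age': age,
                   'gender': gender, 'blood_pressure': blood_pressure})

# Standardize, then fit PCA with all components to inspect the variance profile
X_std = StandardScaler().fit_transform(df)
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 95%
k = int(np.argmax(cumulative >= 0.95) + 1)
print("Cumulative explained variance:", np.round(cumulative, 3))
print("Components needed for 95%:", k)
```

On a real dataset the same few lines would give the concrete answer to "how many components", and the 95% threshold can be swapped for whatever trade-off the analysis calls for.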

<h1 style = 'color:orange'>
    <b><div>🙏🙏🙏🙏🙏       THANK YOU        🙏🙏🙏🙏🙏</div></b>
</h1>
