# Feature Engineering

In this lesson, you will learn how to create and select features for machine learning models. Feature engineering is a critical step that can significantly enhance model performance by providing the model with the most relevant information.

## Learning Objectives
- Define feature engineering and its importance in machine learning.
- Create new features from existing data to improve model performance.
- Select important features that contribute to the model's predictive power.
- Understand dimensionality reduction techniques and their applications.
- Explore practical examples of feature engineering in real-world scenarios.

## Why This Matters

A model can only learn from the features it is given, so feature engineering directly affects model performance. Well-chosen features can improve accuracy, dropping irrelevant ones can reduce overfitting, and a smaller feature set is easier to interpret. Feature engineering is also the point where domain knowledge enters the pipeline: it lets you encode what you know about the problem into the data.

### Concept 1: Feature Creation

Feature creation means deriving new features from existing data so that the model can pick up relationships it might not capture on its own. Two common kinds are interaction features (for example, the product of two columns) and polynomial features (powers of a column).

```python
# Example of creating interaction features
import pandas as pd

# Sample dataset
data = {'feature1': [1, 2, 3], 'feature2': [4, 5, 6]}
df = pd.DataFrame(data)

# Creating an interaction feature
# Interaction feature is the product of feature1 and feature2
df['interaction_feature'] = df['feature1'] * df['feature2']
print(df)
```

#### Micro-Exercise 1

Explain what feature engineering is and why it is important.

```python
# Feature engineering explanation
# Feature engineering is the process of using domain knowledge to create features that make machine learning algorithms work.
```

```python
# Micro-Exercise 1 Starter Code
# Let's define a function to explain feature engineering

def explain_feature_engineering():
    explanation = "Feature engineering is the process of using domain knowledge to create features that make machine learning algorithms work."
    return explanation

print(explain_feature_engineering())
```

### Concept 2: Feature Selection

Feature selection is the process of identifying and selecting a subset of relevant features for model training. This helps in reducing overfitting and improving model interpretability.

```python
# Example of feature selection using filter methods
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Select the top 2 features based on ANOVA F-value
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
print(X_new)
```
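The transformed array on its own does not say which columns survived. As a rough sketch, the fitted selector's `scores_` attribute and `get_support()` method can be used to map the result back to feature names; the variable names `scores` and `selected` are only illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

# Fit the same filter-based selector as in the example
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# ANOVA F-score per feature, highest first
scores = pd.Series(selector.scores_, index=iris.feature_names).sort_values(ascending=False)
print(scores)

# get_support() returns a boolean mask over the original columns
selected = [name for name, keep in zip(iris.feature_names, selector.get_support()) if keep]
print(selected)
```

On the iris data the two petal measurements have the highest F-scores, so they are the columns that are kept.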

#### Micro-Exercise 2

List and describe two techniques for feature selection.

```python
# Feature selection techniques
# 1. Filter methods: Evaluate features based on statistical tests.
# 2. Wrapper methods: Use a predictive model to evaluate feature subsets.
```

```python
# Micro-Exercise 2 Starter Code
# Let's define a function to list feature selection techniques

def feature_selection_techniques():
    techniques = {
        'Filter methods': 'Evaluate features based on statistical tests.',
        'Wrapper methods': 'Use a predictive model to evaluate feature subsets.'
    }
    return techniques

print(feature_selection_techniques())
```
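The wrapper methods described above can be sketched with recursive feature elimination (`RFE`), which repeatedly fits a model and drops the weakest feature. This is one possible illustration rather than the only wrapper technique; the choice of `LogisticRegression` as the estimator and of two features to keep are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# The estimator is refit on progressively smaller feature sets
estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator=estimator, n_features_to_select=2)
rfe.fit(X, y)

# support_ marks the kept features; ranking_ is 1 for kept features,
# higher numbers were eliminated earlier
for name, keep, rank in zip(iris.feature_names, rfe.support_, rfe.ranking_):
    print(f"{name}: selected={keep}, rank={rank}")
```

Unlike the filter example, the result here depends on the model used, which is the defining trade-off of wrapper methods: they are tailored to the estimator but cost more to compute.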

## Examples

### Example 1: Creating Interaction Features for Marketing Analysis
This example demonstrates how to create interaction features to analyze the effectiveness of a marketing campaign.

```python
# Code to create interaction features
# df['interaction_feature'] = df['feature1'] * df['feature2']
```
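A runnable version of this idea might look like the sketch below. The campaign data and the column names `ad_spend`, `discount_pct`, and `spend_x_discount` are invented for illustration and do not come from a real dataset.

```python
import pandas as pd

# Made-up campaign data, for illustration only
campaigns = pd.DataFrame({
    'ad_spend': [100.0, 250.0, 400.0],   # hypothetical spend per campaign
    'discount_pct': [0, 10, 20],         # hypothetical discount offered
})

# The interaction lets a model capture that spend may matter more
# (or less) when a discount is running at the same time
campaigns['spend_x_discount'] = campaigns['ad_spend'] * campaigns['discount_pct']
print(campaigns)
```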

### Example 2: Using PCA for Dimensionality Reduction in Image Classification
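This example shows how to apply PCA to reduce the number of features in an image dataset while retaining as much variance as the chosen number of components allows.

```python
# Code to apply PCA (scikit-learn's 8x8 digits images are used as a stand-in dataset)
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
```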
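Two components are convenient for plotting but may discard a lot of information. A minimal sketch of an alternative, assuming scikit-learn's digits images as a small stand-in dataset and a 95% variance target chosen only for illustration, is to pass a fraction as `n_components` and let PCA pick the number of components.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 grayscale images of 8x8 pixels, flattened to 64 features each
X, _ = load_digits(return_X_y=True)

# A float between 0 and 1 asks PCA to keep just enough components
# to explain at least that fraction of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, '->', X_reduced.shape)
print('explained variance retained:', pca.explained_variance_ratio_.sum())
```

Inspecting `explained_variance_ratio_` is a quick way to check how much information a given number of components actually keeps.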

## Main Exercise: Feature Engineering Project
In this exercise, you will load a dataset, create new features, and select the most important features to improve model performance.

```python
# Load dataset (replace 'data.csv' with the path to your own file)
import pandas as pd
df = pd.read_csv('data.csv')

# Create new features
# Example of creating a new feature
# df['new_feature'] = df['feature1'] * df['feature2']

# Implement feature selection techniques
# X is the feature matrix and y the target column, both taken from df
# from sklearn.feature_selection import SelectKBest, f_classif
# selector = SelectKBest(score_func=f_classif, k=2)
# X_selected = selector.fit_transform(X, y)
```

```python
# Main Exercise Starter Code
# Let's implement a simple feature engineering process
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load iris dataset
iris = load_iris()
X, y = iris.data, iris.target

df = pd.DataFrame(X, columns=iris.feature_names)

# Create a new feature
# Example: Adding a new feature as the sum of two existing features
df['sum_feature'] = df['sepal length (cm)'] + df['sepal width (cm)']

# Feature selection
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(df, y)
print(X_selected)
```

## Common Mistakes
- Adding many features without validating them on held-out data, which invites overfitting.
- Ignoring feature importance, which can leave uninformative features in the model and add unnecessary complexity.
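
One way to guard against both mistakes is to compare feature sets with cross-validation. The sketch below is an assumption about how such a check could look, not a prescribed workflow: it puts `SelectKBest` inside a pipeline so that selection is refit on each training fold, and the labels and the `results` dictionary are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Baseline: the model sees every feature
all_features = LogisticRegression(max_iter=1000)

# Candidate: selection happens inside the pipeline, so each fold
# chooses its features from training data only
top_two = make_pipeline(
    SelectKBest(score_func=f_classif, k=2),
    LogisticRegression(max_iter=1000),
)

results = {}
for label, model in [('all 4 features', all_features), ('top 2 features', top_two)]:
    scores = cross_val_score(model, X, y, cv=5)
    results[label] = scores.mean()
    print(f"{label}: mean accuracy = {results[label]:.3f}")
```

If a larger feature set does not score better on held-out folds, the extra features are adding complexity without adding predictive value.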

## Recap & Next Steps
In this lesson, we covered the importance of feature engineering, techniques for creating and selecting features, and practical examples. Next, we will explore model training and evaluation in AWS SageMaker.