#### 🧠 What is Feature Engineering?

**Feature engineering** is the process of:

* Selecting the most relevant variables (features)
* Creating new features from existing ones
* Transforming features to be more suitable for machine learning models

It’s the art of **making your data more understandable to the model**.

---

#### 📘 What You’ll Learn in This Course

#### 1. **Introduction to Feature Engineering**

* What is a **feature**?
* Why good features often matter more than the choice of model
* Introduction to techniques for:

  * Creating features
  * Transforming features
  * Selecting features

---

#### 2. **Baseline Model**

* Create a **baseline model** using raw features to compare improvements later.
* Example: Use a simple **Random Forest** without feature transformations.

```python
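from sklearn.ensemble import RandomForestRegressor

# X_train and y_train are assumed to hold the raw, untransformed training data
model = RandomForestRegressor(random_state=0)  # fixed seed so later comparisons are repeatable
model.fit(X_train, y_train)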
```
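
A baseline is only useful if it comes with a score to beat. The following is a minimal sketch of that idea, not the course's own code: the synthetic data, the 75/25 split, and the choice of mean absolute error as the metric are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 3 raw numeric features, noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: raw features only, fixed seed for repeatability.
baseline = RandomForestRegressor(n_estimators=100, random_state=0)
baseline.fit(X_train, y_train)

# Record the validation error; every later feature change is judged against it.
baseline_mae = mean_absolute_error(y_val, baseline.predict(X_val))
print(f"Baseline validation MAE: {baseline_mae:.3f}")
```

Keeping the split and the seed fixed means any later change in the score can be attributed to the features rather than to randomness.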

---

#### 3. **Better Features for Tree Models**

* Tree-based models (like Random Forests, XGBoost) don’t need feature scaling, because their splits depend only on the ordering of values, not their magnitude.
* You’ll learn:

  * How to handle **ordinal** vs **nominal** data
  * Creating **interaction features** (e.g. multiplying, dividing, or otherwise combining two columns)

Example:

```python
data['price_per_sqft'] = data['price'] / data['square_feet']
```
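
The ordinal vs nominal distinction can be sketched as follows. The toy columns `quality` and `neighborhood` and the order `low < medium < high` are hypothetical, chosen only to illustrate the two encodings.

```python
import pandas as pd

# Hypothetical toy data: 'quality' has a natural order, 'neighborhood' does not.
df = pd.DataFrame({
    "quality": ["low", "high", "medium", "low"],
    "neighborhood": ["north", "south", "east", "north"],
})

# Ordinal: map to integers that respect the stated order (low=0, medium=1, high=2).
quality_order = ["low", "medium", "high"]
df["quality_code"] = pd.Categorical(
    df["quality"], categories=quality_order, ordered=True
).codes

# Nominal: one indicator column per category, with no implied order.
df = pd.get_dummies(df, columns=["neighborhood"])
print(df)
```

Giving a nominal column integer codes would suggest an order that does not exist, which is why the two kinds of column are treated differently here.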

---

#### 4. **Numeric Transformations**

* Transform numeric features to improve model performance:

  * **Log transforms** to reduce skew
  * **Scaling** (though less useful for trees)
  * Handling outliers
  * **Clipping** extreme values

```python
import numpy as np

# log1p(x) computes log(1 + x); it is defined at zero and more accurate for small values
data['log_income'] = np.log1p(data['income'])
```

---

#### 5. **Categorical Variables**

* Categorical variables can be tricky. You’ll explore:

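  * One-hot encoding (works well when a column has only a few categories)
  * Label encoding (integer codes; most meaningful when the categories are ordered)
  * **Target encoding** (advanced; replace each category with a statistic of the target, such as its mean)
  * Cardinality issues (too many unique categories)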

```python
import pandas as pd

# One-hot encoding: one indicator column per neighborhood
data = pd.get_dummies(data, columns=['neighborhood'])
```
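
Target encoding can be sketched as a smoothed per-category mean. Everything below is illustrative: the data is invented, the smoothing weight `m = 2` is an arbitrary assumption, and the column name `neighborhood_te` is hypothetical. To limit target leakage, the encoding is usually computed on training data only, often out-of-fold.

```python
import pandas as pd

# Hypothetical training data: one categorical column plus a numeric target.
train = pd.DataFrame({
    "neighborhood": ["a", "a", "a", "b", "b", "c"],
    "price": [100.0, 120.0, 110.0, 200.0, 220.0, 400.0],
})

global_mean = train["price"].mean()
stats = train.groupby("neighborhood")["price"].agg(["mean", "count"])

# Smoothing weight m (an assumed value): categories with few rows are pulled
# toward the global mean, which limits overfitting to rare categories.
m = 2
smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)

train["neighborhood_te"] = train["neighborhood"].map(smoothed)

# Categories unseen during training fall back to the global mean.
new = pd.DataFrame({"neighborhood": ["a", "z"]})
new["neighborhood_te"] = new["neighborhood"].map(smoothed).fillna(global_mean)
print(new)
```

Because the result is a single numeric column, this approach also sidesteps the column explosion that one-hot encoding causes on high-cardinality features.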

---

#### 6. **Feature Generation**

* How to **create new features** from dates, text, and domain knowledge.
* Examples:

  * Extracting **day of week** from a date
  * Counting the **number of words** in a text
  * Binning numeric variables into categories (e.g. age groups)

```python
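data['date'] = pd.to_datetime(data['date'])  # .dt accessors require a datetime column
data['year'] = data['date'].dt.year
data['name_length'] = data['name'].str.len()  # length in characters, not words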
```
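
The three examples in the list above (day of week, word count, binning) might look like this. The column names, sample rows, and age boundaries are hypothetical and exist only to make the sketch runnable.

```python
import pandas as pd

# Hypothetical records; column names are illustrative only.
records = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-06"],
    "description": ["cozy two bedroom flat", "large house"],
    "age": [23, 67],
})

# Dates: parse first, then extract calendar parts (Monday=0 ... Sunday=6).
records["date"] = pd.to_datetime(records["date"])
records["day_of_week"] = records["date"].dt.dayofweek

# Text: count whitespace-separated words.
records["word_count"] = records["description"].str.split().str.len()

# Binning: right-closed intervals (0, 30], (30, 60], (60, 120].
records["age_group"] = pd.cut(
    records["age"], bins=[0, 30, 60, 120], labels=["young", "middle", "senior"]
)
print(records)
```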

---

#### 7. **Feature Selection**

* Not all features help; some add noise and hurt performance.
* You’ll learn to:

  * Use **correlation** and **importance scores** to drop useless features
  * Use **permutation importance** to measure how much model performance drops when a feature’s values are shuffled

```python
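from sklearn.inspection import permutation_importance

# model must already be fitted; X_val and y_val are held-out validation data
results = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)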
```
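
To show how the output is read, here is a small end-to-end sketch on synthetic data where the answer is known in advance. The data and the feature names `signal` and `noise` are assumptions made for the demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): column 0 drives the target, column 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each column n_repeats times and record the drop in the validation score.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

for name, mean, std in zip(
    ["signal", "noise"], result.importances_mean, result.importances_std
):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

A feature whose mean importance is close to zero, like the noise column here, is a candidate for removal. Retraining without it and comparing against the baseline score is a sensible check before dropping it for good.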

---

#### ✅ Skills You'll Gain

* Create new, meaningful features from raw data
* Transform features for better model learning
* Use feature importance to prune or prioritize features
* Improve model accuracy by using better input variables