# Session 35: Introduction to Feature Engineering

**Unit 3: Data Collection and Cleaning**
**Hour: 35**
**Mode: Practical Lab**

---

### 1. Objective

This lab introduces **Feature Engineering**, which is often described as the most creative part of the machine learning workflow. Our goal is to create new, more informative features from the existing data in our Telco dataset.

**What is Feature Engineering?** It is the process of using domain knowledge to create new input variables (features) for your machine learning model. Better features often improve model performance more than switching to a more sophisticated algorithm does.

### 2. Setup

Let's load our Telco dataset.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

### 3. Feature Engineering Techniques

#### 3.1. Creating Interaction Features

Sometimes, a combination of two variables is more predictive than either variable on its own.

**Hypothesis:** The ratio of `MonthlyCharges` to `tenure` might indicate a customer's "value sensitivity". A new customer who is already paying a high monthly charge might be at high risk of churning.

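Let's create a `ChargePerTenure` feature. We need to be careful about `tenure` being 0: pandas does not raise an error when dividing by zero, it silently produces `inf`, which most models cannot handle. To avoid this, we'll add 1 to the denominator.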

In [None]:
df['ChargePerTenure'] = df['MonthlyCharges'] / (df['tenure'] + 1)

# Check the new feature for a few customers
df[['tenure', 'MonthlyCharges', 'ChargePerTenure']].head()
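
To see why the `+ 1` matters, here is a minimal sketch on a small toy DataFrame. The values are made up for illustration and are not taken from the Telco dataset; `NaiveRatio` is a hypothetical column name used only for the comparison.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): same monthly charge, increasing tenure
toy = pd.DataFrame({
    'tenure': [0, 1, 11, 71],
    'MonthlyCharges': [80.0, 80.0, 80.0, 80.0],
})

# Naive ratio: the row with tenure == 0 becomes inf instead of raising an error
toy['NaiveRatio'] = toy['MonthlyCharges'] / toy['tenure']

# Shifted ratio, as used in this lab: always finite
toy['ChargePerTenure'] = toy['MonthlyCharges'] / (toy['tenure'] + 1)

print(toy)
print("Any infinite values in NaiveRatio:", np.isinf(toy['NaiveRatio']).any())
```

Note that the shift changes the meaning slightly: the feature is the charge divided by `tenure + 1`, not by `tenure` itself. For a fixed charge it still decreases as tenure grows, which is the behaviour the hypothesis relies on.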

#### 3.2. Binning Numerical Data

Sometimes it's useful to group a continuous numerical variable into discrete bins. For example, we can convert `tenure` into categories like "New Customer", "Medium-Term Customer", and "Long-Term Customer".

We can use the `pd.cut()` function for this.

In [None]:
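# Define the bin edges. pd.cut uses right-closed intervals by default, i.e. (a, b],
# so the first edge is -1 to make sure tenure == 0 falls into the first bin.
bins = [-1, 12, 48, 73]  # (0-12 months), (13-48 months), (49-72 months)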
labels = ['New Customer', 'Medium-Term Customer', 'Long-Term Customer']

df['TenureGroup'] = pd.cut(df['tenure'], bins=bins, labels=labels)

df[['tenure', 'TenureGroup']].head()

In [None]:
# Let's check the distribution of our new categorical feature
df['TenureGroup'].value_counts()
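
It can help to check how `pd.cut()` treats values that sit exactly on a bin edge. The sketch below reuses the same `bins` and `labels` on a handful of hand-picked boundary values (toy data, not drawn from the dataset) and also shows what happens to a value that falls outside every bin.

```python
import pandas as pd

# Boundary values chosen by hand to probe each bin edge
tenure = pd.Series([0, 12, 13, 48, 49, 72], name='tenure')

bins = [-1, 12, 48, 73]
labels = ['New Customer', 'Medium-Term Customer', 'Long-Term Customer']

# Default right=True means each bin is (lower, upper]
groups = pd.cut(tenure, bins=bins, labels=labels)
print(pd.DataFrame({'tenure': tenure, 'TenureGroup': groups}))

# A value outside all bins becomes NaN rather than raising an error
outside = pd.cut(pd.Series([80]), bins=bins, labels=labels)
print("Out-of-range value is NaN:", outside.isna().all())
```

Because out-of-range values turn into `NaN` silently, it is worth confirming that `df['TenureGroup'].isna().sum()` is 0 after binning the real data.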

#### 3.3. Combining Categorical Features

We can create a new feature that represents the total number of additional services a customer has. This could be a proxy for how "invested" they are in the ecosystem.

Let's count how many of the following services each customer has: `OnlineSecurity`, `OnlineBackup`, `DeviceProtection`, `TechSupport`, `StreamingTV`, `StreamingMovies`.

In [None]:
service_cols = [
    'OnlineSecurity',
    'OnlineBackup',
    'DeviceProtection',
    'TechSupport',
    'StreamingTV',
    'StreamingMovies'
]

# We can count the 'Yes' values across the rows for these columns
# (df[service_cols] == 'Yes') creates a boolean DataFrame
# .sum(axis=1) sums up the True values (which equal 1) across each row

df['AdditionalServices'] = (df[service_cols] == 'Yes').sum(axis=1)

df[['AdditionalServices'] + service_cols].head()
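
The counting trick can be illustrated on a tiny toy DataFrame. The rows below are invented for illustration and use only three of the six service columns. In the Telco data these columns can also hold the value `'No internet service'`; since only exact matches to `'Yes'` are counted, such rows contribute 0.

```python
import pandas as pd

# Toy rows (hypothetical customers) covering the three possible values
toy = pd.DataFrame({
    'OnlineSecurity': ['Yes', 'No', 'No internet service'],
    'OnlineBackup': ['Yes', 'No', 'No internet service'],
    'StreamingTV': ['No', 'Yes', 'No internet service'],
})

# Step 1: element-wise comparison gives a boolean DataFrame
is_yes = toy == 'Yes'
print(is_yes)

# Step 2: summing across columns (axis=1) counts the True values per row
toy['AdditionalServices'] = is_yes.sum(axis=1)
print(toy)
```

Computing the boolean frame before adding the new column keeps the count column itself out of the comparison.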

### 4. Conclusion

In this lab, you explored the creative process of feature engineering:
1.  **Interaction Features:** Creating new features by combining existing numerical ones (e.g., ratios).
2.  **Binning:** Converting a numerical feature into a categorical one to capture broader trends.
3.  **Combining Categories:** Creating a numerical feature by counting occurrences across several categorical columns.

Good feature engineering is often the key to building a high-performing and interpretable machine learning model. It bridges the gap between raw data and business insight.

This session concludes our deep dive into data cleaning and transformation.

**Next Session:** We will begin Unit 4 by discussing the core principles of effective data visualization.