# Feature Construction in Data Science

## What is Feature Construction?
- Feature construction means **creating new features** from existing data
- These new features help models **learn better patterns**
- Good features can substantially improve **model accuracy**

---

## Why Feature Construction is Important
- Raw data is often **not ready** for machine learning
- Well-made features:
  - Make patterns clearer
  - Reduce noise
  - Improve model performance
- Sometimes **better features matter more than the choice of model**

---

## Common Types of Feature Construction

### 1. Mathematical Features
- Add, subtract, multiply, or divide existing columns
- Examples:
  - Total price = price × quantity
  - Age = current year − birth year
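
As a minimal sketch of the age example, the snippet below subtracts a birth-year column from a fixed reference year. The `people` frame, its values, and `reference_year` are assumptions made up for illustration; a fixed year is used instead of the system clock so the result is reproducible.

```python
import pandas as pd

# Hypothetical data: only the birth year is known for each person
people = pd.DataFrame({"birth_year": [1990, 2001, 1985]})

# Assumed reference year (fixed so the example gives the same result every run)
reference_year = 2024

# Arithmetic feature: age = reference year - birth year
people["age"] = reference_year - people["birth_year"]

print(people)
```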

---

### 2. Date and Time Features
- Extract useful parts from date columns
- Examples:
  - Year, month, day
  - Day of week
  - Is weekend (yes/no)
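
The date parts listed above can be sketched with the pandas `.dt` accessor. The `sales` frame is hypothetical, and the last date is an assumed extra row added only so that one weekend day appears.

```python
import pandas as pd

# Hypothetical purchase dates; the final one falls on a Saturday
sales = pd.DataFrame({
    "purchase_date": ["2023-01-10", "2023-02-15", "2023-03-20",
                      "2023-04-25", "2023-04-29"],
})

# Date parts can only be extracted from real datetime values, not strings
sales["purchase_date"] = pd.to_datetime(sales["purchase_date"])

sales["year"] = sales["purchase_date"].dt.year
sales["month"] = sales["purchase_date"].dt.month
sales["day"] = sales["purchase_date"].dt.day

# dayofweek counts from Monday = 0 to Sunday = 6
sales["day_of_week"] = sales["purchase_date"].dt.dayofweek

# Saturday (5) and Sunday (6) count as the weekend
sales["is_weekend"] = sales["day_of_week"] >= 5

print(sales)
```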

---

### 3. Text-Based Features
- Create features from text data
- Examples:
  - Text length
  - Word count
  - Presence of keywords
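
A rough sketch of these three text features using pandas string methods follows. The `reviews` frame and the keyword `"refund"` are assumptions chosen for illustration; word count here simply splits on whitespace.

```python
import pandas as pd

# Hypothetical free-text column
reviews = pd.DataFrame({
    "text": ["Great product, fast delivery", "Refund please", "ok"],
})

# Text length: number of characters
reviews["text_length"] = reviews["text"].str.len()

# Word count: number of whitespace-separated tokens
reviews["word_count"] = reviews["text"].str.split().str.len()

# Keyword presence: case-insensitive plain substring match
reviews["mentions_refund"] = (
    reviews["text"].str.lower().str.contains("refund", regex=False)
)

print(reviews)
```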

---

### 4. Categorical Feature Construction
- Combine or regroup categories to make new features
- Examples:
  - Country + city → location
  - Job title → job level (junior, senior)
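
Both examples can be sketched in a few lines. The `staff` frame is hypothetical, and the rule "a title starting with 'senior' is senior, anything else is junior" is a deliberately simple assumption; real job titles usually need a richer mapping.

```python
import pandas as pd

# Hypothetical employee records
staff = pd.DataFrame({
    "country": ["France", "France", "Japan"],
    "city": ["Paris", "Lyon", "Tokyo"],
    "job_title": ["Junior Analyst", "Senior Engineer", "Senior Analyst"],
})

# Combine two categories into one: country + city -> location
staff["location"] = staff["country"] + "_" + staff["city"]

# Regroup a detailed category into a coarser one: job title -> job level
# (assumed rule: titles starting with "senior" are senior, the rest junior)
is_senior = staff["job_title"].str.lower().str.startswith("senior")
staff["job_level"] = is_senior.map({True: "senior", False: "junior"})

print(staff)
```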

---

### 5. Boolean (Yes/No) Features
- Turn conditions into true/false values
- Examples:
  - Is customer new?
  - Is price above average?
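
In pandas, a comparison already returns a true/false column, so both examples are one line each. The `customers` frame and the definition of "new" as zero previous orders are assumptions for illustration.

```python
import pandas as pd

# Hypothetical customer purchases
customers = pd.DataFrame({
    "price": [20, 56, 42, 63],
    "previous_orders": [0, 3, 0, 7],
})

# Assumed definition: a customer with no previous orders is new
customers["is_new_customer"] = customers["previous_orders"] == 0

# Compare each price with the column mean (45.25 for this data)
customers["is_above_avg_price"] = customers["price"] > customers["price"].mean()

print(customers)
```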

---

### 6. Aggregated Features
- Summarize data using groups
- Examples:
  - Average sales per customer
  - Total orders per month
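
A possible sketch of both aggregations with `groupby` is shown below. The `orders` frame is hypothetical. `transform("mean")` is used for the per-customer average because it returns one value per original row, so the result can be attached directly as a new column.

```python
import pandas as pd

# Hypothetical order history
orders = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "b"],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-02-10", "2023-01-20", "2023-01-25", "2023-02-01",
    ]),
    "sales": [100.0, 50.0, 30.0, 60.0, 90.0],
})

# Average sales per customer, broadcast back onto every row of that customer
orders["avg_sales_per_customer"] = (
    orders.groupby("customer_id")["sales"].transform("mean")
)

# Total orders per calendar month (one row per month)
orders_per_month = orders.groupby(orders["order_date"].dt.to_period("M")).size()

print(orders)
print(orders_per_month)
```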

---

## Feature Construction vs Feature Selection
- Feature construction: **create new features**
- Feature selection: **choose the best features**
- Both are important and often used together

---

## Best Practices
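- Start simple and add complexity gradually
- Use domain knowledge when possible
- Avoid creating too many features; each extra one can add noise and raise the risk of overfitting
- Check for data leakage: a feature should use only information available at prediction time
- Measure each feature's impact by comparing model results with and without it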

---

## Common Mistakes
- Creating features that use future data
- Adding too many noisy features
- Ignoring what a feature actually means
- Not validating features on new, unseen data
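
One way to avoid the future-data mistake with a feature such as "is price above average" is to compute the average on the training rows only and reuse it for later rows. The sketch below assumes a hypothetical `history` frame and an arbitrary cutoff date for the split.

```python
import pandas as pd

# Hypothetical purchase history, ordered in time
history = pd.DataFrame({
    "purchase_date": pd.to_datetime(
        ["2023-01-10", "2023-02-15", "2023-03-20", "2023-04-25"]
    ),
    "price": [20, 56, 42, 63],
})

# Assumed cutoff: rows before it are training data, later rows are "new" data
cutoff = pd.Timestamp("2023-04-01")
train = history[history["purchase_date"] < cutoff].copy()
test = history[history["purchase_date"] >= cutoff].copy()

# The statistic is computed from training rows only
train_mean = train["price"].mean()

# The same training-time value is applied to both splits,
# so the test rows never influence their own feature
train["is_above_avg_price"] = train["price"] > train_mean
test["is_above_avg_price"] = test["price"] > train_mean

print(train)
print(test)
```

Computing the mean over the full `history` frame instead would let the April price influence features for the earlier rows, which is exactly the leakage the list warns about.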

---

## Final Thoughts
- Feature construction is both **art and science**
- Small feature changes can lead to big improvements
- Always think: *Does this feature help the model understand the data better?*

---


In [2]:
import pandas as pd

In [4]:
# Create a small example dataset of purchases
data = {
    "price": [20, 56, 42, 63],
    "quantity": [10, 6, 8, 9],
    "purchase_date": ["2023-01-10", "2023-02-15", "2023-03-20", "2023-04-25"],
}

df = pd.DataFrame(data)

In [5]:
df

| | price | quantity | purchase_date |
|---|---|---|---|
| 0 | 20 | 10 | 2023-01-10 |
| 1 | 56 | 6 | 2023-02-15 |
| 2 | 42 | 8 | 2023-03-20 |
| 3 | 63 | 9 | 2023-04-25 |


In [6]:
# Convert the date strings to datetime values so date parts can be extracted later
df['purchase_date'] = pd.to_datetime(df['purchase_date'])

In [7]:
df

| | price | quantity | purchase_date |
|---|---|---|---|
| 0 | 20 | 10 | 2023-01-10 |
| 1 | 56 | 6 | 2023-02-15 |
| 2 | 42 | 8 | 2023-03-20 |
| 3 | 63 | 9 | 2023-04-25 |


In [8]:
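# Total cost is price multiplied by quantity
df['total_cost'] = df['price'] * df['quantity']

In [9]:
df

| | price | quantity | purchase_date | total_cost |
|---|---|---|---|---|
| 0 | 20 | 10 | 2023-01-10 | 200 |
| 1 | 56 | 6 | 2023-02-15 | 336 |
| 2 | 42 | 8 | 2023-03-20 | 336 |
| 3 | 63 | 9 | 2023-04-25 | 567 |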
