# Feature Construction in Machine Learning 
Feature construction is the process of creating new features from existing data to improve model performance. It draws on domain knowledge, data transformation, and creativity to expose patterns that the raw columns do not show directly.

### What is Feature Construction?
- Feature Construction is the process of deriving new, meaningful features (also called attributes or variables) from existing raw data. These new features can make patterns easier for a model to detect, for example by turning a non-linear relationship between raw columns into a single, more linear one.
- It is a subset of feature engineering focused only on creating new features.

### Why Feature Construction?
| Benefit                    | Description                                   |
| -------------------------- | --------------------------------------------- |
| 📈 Improves accuracy       | Captures hidden relationships                 |
| 🧠 Adds domain insight     | Encodes expert knowledge                      |
| 📊 Makes patterns explicit | Exposes non-obvious correlations              |
| 🧪 Prepares for models     | Converts raw data into model-friendly formats |


## Types of Feature Construction Techniques

## 1. Mathematical Transformations
Use arithmetic operations to create new features.

🔧 Examples:
- Total_Amount = Quantity * Price
- Speed = Distance / Time
- BMI = Weight / Height^2 (weight in kg, height in m)
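
The three formulas above can be sketched in pandas as follows. The DataFrame, its column names, and the units (km, hours, kg, m) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names and units are assumptions for this sketch.
df = pd.DataFrame({
    "quantity": [2, 5, 1],
    "price": [10.0, 3.5, 99.0],
    "distance_km": [120.0, 45.0, 300.0],
    "time_h": [2.0, 0.5, 4.0],
    "weight_kg": [70.0, 85.0, 54.0],
    "height_m": [1.75, 1.80, 1.60],
})

# Each new feature is an element-wise arithmetic combination of existing columns.
df["total_amount"] = df["quantity"] * df["price"]
df["speed_kmh"] = df["distance_km"] / df["time_h"]
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

print(df[["total_amount", "speed_kmh", "bmi"]])
```

Because pandas applies these operations column-wise, no explicit loop over rows is needed.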

## 2. Date/Time Feature Extraction
Extract valuable parts from datetime objects.

🔧 Examples:
- From the date 2025-06-25:
  - year = 2025
  - month = 6
  - day = 25
  - weekday = Wednesday
  - is_weekend = False
- hour (from timestamps)
- time_since_event = current_date - event_date


In [None]:
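# Example in Python (assumes a DataFrame `df` with a 'date' column)
import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # parse once so the .dt accessor works below
df['year'] = df['date'].dt.year
df['is_weekend'] = df['date'].dt.dayofweek >= 5  # Monday=0, so 5 and 6 are Saturday and Sunday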

## 3. Text Feature Construction (for NLP)
Convert raw text into quantitative features.

🔧 Examples:
- Text Length
- Number of words
- Sentiment Score
- TF-IDF / Bag of Words
- Presence of keywords
- N-grams

In [None]:
# Example in Python
df['text_length'] = df['review'].str.len()  # number of characters
df['word_count'] = df['review'].str.split().str.len()  # number of whitespace-separated words
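
A few of the other text features listed above (keyword presence and n-grams) can be sketched with plain pandas and the standard library. The reviews, the keyword `"delivery"`, and the helper `bigrams` are hypothetical; a real pipeline would normally use a proper tokenizer and a dedicated vectorizer for TF-IDF or Bag of Words.

```python
import pandas as pd

# Hypothetical reviews, used only for illustration.
df = pd.DataFrame({
    "review": [
        "Great product, fast delivery",
        "Terrible. Would not buy again",
        "ok",
    ]
})

# Keyword presence: a case-insensitive literal substring match.
df["mentions_delivery"] = df["review"].str.contains("delivery", case=False, regex=False)


def bigrams(text):
    """Return adjacent word pairs; naive whitespace tokenization keeps punctuation attached."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


# N-grams with n=2: one list of word pairs per review.
df["bigrams"] = df["review"].apply(bigrams)

print(df)
```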

## 4. Categorical Feature Combination
Combine multiple categorical variables into one.

🔧 Examples:
- Combine City and Product_Category → City_Product
- Married + Gender → Married_Male, Married_Female, etc.

In [None]:
# Example in Python
df['location_type'] = df['city'] + "_" + df['shop_type']

## 5. Boolean or Flag Features
Create binary flags to capture presence/absence of a condition.

🔧 Examples:
- is_high_income = income > 100000
- has_discount = price < original_price
- is_night = hour >= 18 or hour < 6 (the night window wraps past midnight)
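
As a rough sketch, flags like these are plain boolean comparisons on columns. The data and column names below are hypothetical, and the night window (18:00 to 06:00) is an assumption made for this example; pick cut-offs that match your own definition.

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "income": [45000, 150000, 100000],
    "price": [80.0, 120.0, 50.0],
    "original_price": [100.0, 120.0, 60.0],
    "hour": [9, 23, 3],
})

df["is_high_income"] = df["income"] > 100000
df["has_discount"] = df["price"] < df["original_price"]
# Assumed night window: 18:00 up to (but not including) 06:00, so it wraps past midnight.
df["is_night"] = (df["hour"] >= 18) | (df["hour"] < 6)

print(df[["is_high_income", "has_discount", "is_night"]])
```

If a model or library expects numeric input, the flags can be converted with `.astype(int)`.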

## 6. Aggregated/Grouped Features
Generate features from group statistics (mean, max, count).

🔧 Examples:
- Customer_avg_purchase = total_spent / num_orders
- Store_monthly_sales = sum(sales) per store per month

In [None]:
# Example in Python
df['avg_purchase'] = df.groupby('customer_id')['amount'].transform('mean')
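
The `Store_monthly_sales` example can be sketched the same way by grouping on two keys. The table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records, one row per transaction.
sales = pd.DataFrame({
    "store_id": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2025-01-05", "2025-01-20", "2025-02-03", "2025-01-11", "2025-02-14"]
    ),
    "sales": [100.0, 50.0, 70.0, 30.0, 40.0],
})

# Reduce each date to its calendar month so it can serve as a grouping key.
sales["month"] = sales["date"].dt.to_period("M")

# transform("sum") broadcasts each group's total back onto every row of that group.
sales["store_monthly_sales"] = (
    sales.groupby(["store_id", "month"])["sales"].transform("sum")
)

print(sales)
```

Using `transform` keeps the original row count, so the aggregate sits next to each transaction; `agg` would instead return one row per store and month.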

## 7. Window/Rolling Features (Time Series)
Create lag-based and rolling statistics over a time window.

🔧 Examples:
- Rolling mean of last 3 days
- Lag values: sales_t-1, sales_t-2
- Cumulative sum or difference

In [None]:
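# Example in Python (assumes `df` is sorted by date with one row per day)
df['rolling_mean'] = df['sales'].rolling(window=3).mean()  # mean of the current row and the 2 rows before it
df['lag_1'] = df['sales'].shift(1)  # previous row's sales (sales_t-1)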
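
The cumulative sum and difference features from the list can be sketched as below, together with a rolling mean that uses only past rows. The data is hypothetical, and shifting before rolling is one possible way to keep the current value out of its own feature when that value is also the prediction target.

```python
import pandas as pd

# Hypothetical daily sales, deliberately out of order to show the sorting step.
ts = pd.DataFrame({
    "date": pd.to_datetime(["2025-06-03", "2025-06-01", "2025-06-02", "2025-06-04"]),
    "sales": [30.0, 10.0, 20.0, 40.0],
})

# Lag and rolling features depend on row order, so sort chronologically first.
ts = ts.sort_values("date").reset_index(drop=True)

ts["cumulative_sales"] = ts["sales"].cumsum()  # running total
ts["diff_1"] = ts["sales"].diff()              # change versus the previous row
# Mean of the 3 rows before the current one (the current row is excluded by shift).
ts["prev_3_mean"] = ts["sales"].shift(1).rolling(window=3).mean()

print(ts)
```

The first rows of `diff_1` and `prev_3_mean` are NaN because there is not enough history to compute them.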

## 8. Binning / Bucketing Features
Convert continuous data into categorical bins.

🔧 Examples:
- Age → Child, Adult, Senior
- Income → Low, Medium, High

In [None]:
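# Example in Python
# right=False gives bins [0, 18), [18, 60), [60, inf), so age 0 and ages above 100 are not turned into NaN
df['age_group'] = pd.cut(df['age'], bins=[0, 18, 60, float('inf')], right=False, labels=["Child", "Adult", "Senior"])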

## 9. Ratio / Percentage Features
Create proportions to normalize values.

🔧 Examples:
- CTR = clicks / impressions
- Discount_Percent = (Original_Price - Sale_Price) / Original_Price * 100
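
A minimal sketch of both ratios follows, with hypothetical data and column names. Masking zero denominators to NaN is one possible convention for rows with no impressions; filling with 0 is another.

```python
import pandas as pd

# Hypothetical campaign and pricing data for illustration.
df = pd.DataFrame({
    "clicks": [5, 0, 3],
    "impressions": [100, 0, 60],
    "original_price": [200.0, 50.0, 80.0],
    "sale_price": [150.0, 50.0, 60.0],
})

# Keep only positive denominators; rows with 0 impressions become NaN instead of dividing by zero.
safe_impressions = df["impressions"].where(df["impressions"] > 0)
df["ctr"] = df["clicks"] / safe_impressions

# Multiply by 100 so the value is a percentage rather than a fraction.
df["discount_percent"] = (
    (df["original_price"] - df["sale_price"]) / df["original_price"] * 100
)

print(df[["ctr", "discount_percent"]])
```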


## 10. Domain-specific Feature Construction
Build features from knowledge of the domain you are working in.

🔧 Examples:
- In healthcare: BMI, Heart_Rate_Category
- In finance: Debt-to-Income Ratio, Net Profit
- In eCommerce: Conversion Rate, Avg Basket Size
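
As an illustration, the finance and eCommerce examples reduce to simple ratios once the right raw columns exist. Both tables and all column names below are hypothetical, and the exact definitions (for instance, monthly debt over monthly income) are assumptions that vary between organizations.

```python
import pandas as pd

# Hypothetical finance data: one row per customer.
customers = pd.DataFrame({
    "monthly_debt": [500.0, 1200.0],
    "monthly_income": [4000.0, 3000.0],
})
customers["debt_to_income"] = customers["monthly_debt"] / customers["monthly_income"]

# Hypothetical eCommerce data: one row per shop.
shops = pd.DataFrame({
    "sessions": [1000, 400],
    "orders": [50, 10],
    "items_sold": [150, 25],
})
shops["conversion_rate"] = shops["orders"] / shops["sessions"]    # share of sessions that end in an order
shops["avg_basket_size"] = shops["items_sold"] / shops["orders"]  # items per order

print(customers)
print(shops)
```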