# Feature Splitting in Data Science

## What is Feature Splitting?
- Feature splitting means **breaking one feature into multiple features**
- It lets a model use each part of a value **separately**, for example the month of a date
- Often used when a single column **packs several pieces of information** into one value

---

## Why Feature Splitting is Important
- Some features contain **hidden patterns**, e.g., a raw date string hides seasonal or weekday effects that a model cannot read directly
- Splitting makes data:
  - Easier to learn
  - More meaningful
  - Better for machine learning models

---

## Common Types of Feature Splitting

### 1. Date Feature Splitting
- Split a date into smaller parts
- Examples:
  - Date → year, month, day
  - Date → day of week
  - Date → is weekend (yes/no)
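A minimal pandas sketch of these date splits, using hypothetical sample dates in ISO `YYYY-MM-DD` format (an assumption). In pandas, `.dt.dayofweek` counts Monday as 0 and Sunday as 6, so a weekend check is `>= 5`.

```python
import pandas as pd

# Hypothetical sample dates (not from the notebook below)
df = pd.DataFrame({"order_date": ["2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09"]})
df["order_date"] = pd.to_datetime(df["order_date"])

# Date -> year, month, day
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["day"] = df["order_date"].dt.day

# Date -> day of week as a number (Monday=0 ... Sunday=6) and as a name
df["day_of_week"] = df["order_date"].dt.dayofweek
df["day_name"] = df["order_date"].dt.day_name()

# Date -> is weekend (Saturday or Sunday)
df["is_weekend"] = df["day_of_week"] >= 5

print(df)
```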

---

### 2. Text Feature Splitting
- Break text into parts
- Examples:
  - Full name → first name, last name
  - Address → city, state, country
  - Sentence → individual words
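A sketch of the three text splits with pandas string methods. The sample records are hypothetical, and the code assumes names are written "First Last" (everything after the first space goes to `last_name`) and addresses are "City, State, Country" separated by `", "`.

```python
import pandas as pd

# Hypothetical sample records
df = pd.DataFrame({
    "full_name": ["Maria Garcia", "John Smith"],
    "address": ["Austin, Texas, USA", "Toronto, Ontario, Canada"],
    "review": ["great product", "fast delivery and good price"],
})

# Full name -> first name, last name (split on the first space only)
df[["first_name", "last_name"]] = df["full_name"].str.split(" ", n=1, expand=True)

# Address -> city, state, country
df[["city", "state", "country"]] = df["address"].str.split(", ", expand=True)

# Sentence -> list of individual words (simple whitespace tokenization)
df["words"] = df["review"].str.split()

print(df[["first_name", "last_name", "city", "state", "country", "words"]])
```

Real text is messier (middle names, missing address parts, punctuation), so a split like this usually needs validation before it is used as a feature.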

---

### 3. Categorical Feature Splitting
- Split composite category values into their component parts
- Examples:
  - Product code → product type, size
  - Email → username, domain
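A sketch of both categorical splits. The product-code format `TYPE-SIZE` (e.g. `SHIRT-M`) is a hypothetical assumption; real codes may use a different separator or fixed-width positions. The email is split on its last `@` to separate the username from the domain.

```python
import pandas as pd

# Hypothetical product codes (assumed "TYPE-SIZE" format) and emails
df = pd.DataFrame({
    "product_code": ["SHIRT-M", "SHIRT-L", "HAT-S"],
    "email": ["anna@example.com", "bob@mail.org", "cara@example.com"],
})

# Product code -> product type, size
df[["product_type", "size"]] = df["product_code"].str.split("-", n=1, expand=True)

# Email -> username, domain (split once, on the last "@")
df[["username", "domain"]] = df["email"].str.rsplit("@", n=1, expand=True)

print(df)
```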

---

### 4. Numerical Feature Splitting (Binning)
- Divide numbers into ranges
- Examples:
  - Age → child, adult, senior
  - Salary → low, medium, high
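A binning sketch with `pd.cut` (fixed edges) and `pd.qcut` (quantile edges). The age cutoffs (child up to 17, adult 18 to 64, senior 65+) and the sample values are assumptions for illustration; splitting salary into equal-sized thirds is one possible choice, and fixed thresholds with `pd.cut` would work as well.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "age": [5, 17, 30, 64, 65, 90],
    "salary": [30000, 45000, 52000, 61000, 75000, 90000],
})

# Age -> child / adult / senior using assumed fixed cutoffs.
# Intervals are right-closed: [0, 17], (17, 64], (64, inf)
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 17, 64, float("inf")],
    labels=["child", "adult", "senior"],
    include_lowest=True,
)

# Salary -> low / medium / high using three equal-frequency (quantile) bins
df["salary_level"] = pd.qcut(df["salary"], q=3, labels=["low", "medium", "high"])

print(df)
```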

---

### 5. Boolean Feature Splitting
- Turn one feature into multiple yes/no features (also known as one-hot encoding)
- Examples:
  - Payment method → is credit card, is cash
  - Device type → is mobile, is desktop
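In pandas this is usually done with `pd.get_dummies`, which creates one 0/1 column per category. The payment values below are hypothetical; `prefix="is"` produces names like `is_cash`, and `dtype=int` gives 0/1 instead of True/False.

```python
import pandas as pd

# Hypothetical payment methods
df = pd.DataFrame({"payment_method": ["credit_card", "cash", "credit_card", "paypal"]})

# One yes/no column per category, named is_<category>
dummies = pd.get_dummies(df["payment_method"], prefix="is", dtype=int)
df = pd.concat([df, dummies], axis=1)

print(df)
```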

---

## Feature Splitting vs Feature Construction
- Feature splitting: **break one feature into many**
- Feature construction: **combine or transform existing features into new ones** (e.g., total cost = price × quantity)
- Both aim to improve model performance

---

## Best Practices
- Split only when it adds value
- Keep features simple and meaningful
- Avoid creating too many features
- Compare model performance before and after splitting
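One way to check whether a split helps is to compare cross-validated scores with and without it. This is a toy sketch built on assumptions: synthetic daily data where the target is simply "is it a weekend", and scikit-learn's `LogisticRegression` as a stand-in model. A linear model cannot find a weekly cycle in a raw day count, but it can from the extracted day of week, so the split version should score higher.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumed for illustration): two years of daily records,
# target is 1 on weekends and 0 on weekdays
dates = pd.date_range("2023-01-01", periods=730, freq="D")
y = (dates.dayofweek >= 5).astype(int)

# Before splitting: the date as a single number (days since the first date)
X_raw = pd.DataFrame({"days": (dates - dates[0]).days})

# After splitting: the day of week extracted from the date
X_split = pd.DataFrame({"day_of_week": dates.dayofweek})

model = LogisticRegression(max_iter=1000)
score_raw = cross_val_score(model, X_raw, y, cv=5).mean()
score_split = cross_val_score(model, X_split, y, cv=5).mean()

print(f"accuracy before split: {score_raw:.3f}")
print(f"accuracy after split:  {score_split:.3f}")
```

On real data the gap is rarely this clean, which is exactly why the comparison is worth running.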

---

## Common Mistakes
- Over-splitting features
- Creating features with very little information
- Losing original meaning of the data
- Not testing model performance
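A quick guard against low-information features is to count unique values per column after splitting. In the worked example at the end of this notebook, every purchase is from 2023, so the split `Year` column holds a single value and tells the model nothing. The sketch below reuses those `Year`/`Month`/`Day` values; the threshold of one unique value is the simplest possible rule, not the only one.

```python
import pandas as pd

# Year/Month/Day values produced by the date split in the example below
df = pd.DataFrame({
    "Year": [2023, 2023, 2023, 2023],
    "Month": [1, 2, 3, 4],
    "Day": [10, 15, 20, 25],
})

# A column with only one unique value carries no information for a model
n_unique = df.nunique()
uninformative = n_unique[n_unique <= 1].index.tolist()
print(uninformative)  # ['Year']

df = df.drop(columns=uninformative)
```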

---

## Final Thoughts
- Feature splitting helps models **see patterns clearly**
- Simple splits can make a big difference
- Always ask: *Does this split help the model learn better?*

---


## Example: Splitting a Purchase Date with pandas

```python
import pandas as pd
```

```python
# Create a small sample dataset
data = {
    'price': [20, 56, 42, 63],
    'quantity': [10, 6, 8, 9],
    'purchase_date': ['2023-01-10', '2023-02-15', '2023-03-20', '2023-04-25'],
}

df = pd.DataFrame(data)

# Feature construction: total cost is price multiplied by quantity
df['total_cost'] = df['price'] * df['quantity']
```

```python
df
```

|   | price | quantity | purchase_date | total_cost |
|---|-------|----------|---------------|------------|
| 0 | 20    | 10       | 2023-01-10    | 200        |
| 1 | 56    | 6        | 2023-02-15    | 336        |
| 2 | 42    | 8        | 2023-03-20    | 336        |
| 3 | 63    | 9        | 2023-04-25    | 567        |

```python
# Feature splitting: break purchase_date into year, month, and day
df['purchase_date'] = pd.to_datetime(df['purchase_date'])

df['Year'] = df['purchase_date'].dt.year
df['Month'] = df['purchase_date'].dt.month
df['Day'] = df['purchase_date'].dt.day

# The original date column is no longer needed once it has been split
df = df.drop(columns='purchase_date')
df
```

|   | price | quantity | total_cost | Year | Month | Day |
|---|-------|----------|------------|------|-------|-----|
| 0 | 20    | 10       | 200        | 2023 | 1     | 10  |
| 1 | 56    | 6        | 336        | 2023 | 2     | 15  |
| 2 | 42    | 8        | 336        | 2023 | 3     | 20  |
| 3 | 63    | 9        | 567        | 2023 | 4     | 25  |
