## Feature Engineering


Feature engineering is the process of **creating or transforming features** to improve how well a machine learning model can learn from data.  

Raw datasets often don’t contain the most informative representations directly — they may need transformation, grouping, or encoding before they can be used effectively.  

In this notebook, we will demonstrate **how new features can be created from existing ones** in the Adult Income dataset to make it more suitable for predictive modeling.  


Before engineering new features, let's load the dataset and take a quick look at it. We drop the `fnlwgt` column (a census sampling weight) and remove duplicate rows.

In [None]:
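from sklearn.datasets import fetch_openml
import pandas as pd

# Download the Adult Income dataset (version 2) from OpenML as a pandas DataFrame
adult = fetch_openml(name="adult", version=2, as_frame=True)

df_adult = adult.frame
# Drop the sampling-weight column, then remove exact duplicate rows
df_adult = df_adult.drop(columns=['fnlwgt'])
df_adult = df_adult.drop_duplicates()
print(df_adult.head())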
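The order of these two steps matters: `drop_duplicates()` compares entire rows, so removing `fnlwgt` first lets rows that differ only in that column collapse into one. A toy illustration on hypothetical rows (the values below are made up, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy rows: the first two differ only in fnlwgt
toy = pd.DataFrame({
    'age': [39, 39, 50],
    'education': ['Bachelors', 'Bachelors', 'HS-grad'],
    'fnlwgt': [77516, 83311, 215646],
})

# With fnlwgt present, all three rows are distinct
print(len(toy.drop_duplicates()))  # 3
# After dropping fnlwgt, the first two rows are identical and one is removed
print(len(toy.drop(columns=['fnlwgt']).drop_duplicates()))  # 2
```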

Let's create a feature called **age_group** that buckets ages into life-stage categories (teens, young adults, and so on). First, check the range of ages in the data:

In [None]:
print(df_adult['age'].min(), df_adult['age'].max())

Based on that range, I'll use the following age groups:
- Teens: 17-19 
- Young Adults: 20-29 
- Adults: 30-49 
- Middle-Aged Adults: 50-64 
- Seniors: 65+

In [None]:
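import pandas as pd

# Bin edges: right=True makes each bin include its upper edge, and include_lowest=True
# makes the first bin include 17, giving 17-19, 20-29, 30-49, 50-64 and 65+ for integer ages
bins = [17, 19, 29, 49, 64, float('inf')]
labels = ['Teens', 'Young Adults', 'Adults', 'Middle-Aged Adults', 'Seniors']

df_adult['age_group'] = pd.cut(df_adult['age'], bins=bins, labels=labels,
                               right=True, include_lowest=True)
print(df_adult.head())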
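A quick way to confirm that these edges assign boundary ages as intended (for example, that 19 lands in Teens and 20 in Young Adults) is to run the same `pd.cut` call on a few hypothetical ages rather than the full dataset:

```python
import pandas as pd

# Same bin edges and labels as the notebook cell
bins = [17, 19, 29, 49, 64, float('inf')]
labels = ['Teens', 'Young Adults', 'Adults', 'Middle-Aged Adults', 'Seniors']

# Hypothetical boundary ages, chosen to sit on either side of each edge
ages = pd.Series([17, 19, 20, 29, 30, 49, 50, 64, 65, 90])
groups = pd.cut(ages, bins=bins, labels=labels, right=True, include_lowest=True)

# Show each age next to the group it was assigned
print(pd.DataFrame({'age': ages, 'age_group': groups}))
```

Each pair of adjacent boundary ages (19/20, 29/30, 49/50, 64/65) should fall into different groups; without `include_lowest=True`, age 17 would not be assigned to any bin and would come back as `NaN`.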

Next, let's create a binary feature called `capital_flag` that is 1 for people with a positive `capital-gain` and 0 otherwise.

In [None]:
# 1 if capital-gain is positive, 0 otherwise (vectorized comparison cast to int)
df_adult['capital_flag'] = (df_adult['capital-gain'] > 0).astype(int)
print(df_adult.head())
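The flag is simply a boolean comparison cast to integers. A minimal sketch on a few hypothetical `capital-gain` values, checking that the vectorized comparison gives the same result as a row-wise `apply` with a lambda:

```python
import pandas as pd

# Hypothetical capital-gain values, not taken from the dataset
gains = pd.Series([0, 2174, 0, 14084, 0])

# Vectorized: compare the whole column at once, then cast True/False to 1/0
vectorized = (gains > 0).astype(int)
# Row-wise: call a Python lambda on each value
row_wise = gains.apply(lambda x: 1 if x > 0 else 0)

print(vectorized.tolist())                       # [0, 1, 0, 1, 0]
print(vectorized.tolist() == row_wise.tolist())  # True
```

Both produce the same flags; the vectorized form avoids calling a Python function once per row, which adds up on larger datasets.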

Let's simplify `marital-status` into two categories, married vs. not married, instead of its many original categories. First, look at the existing values:

In [None]:
df_adult['marital-status'].value_counts()

In [None]:
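import numpy as np

# Map every category containing 'Married' (e.g. 'Married-civ-spouse') to 'Married'.
# str.contains is case-sensitive, so 'Never-married' falls into 'Not-Married';
# na=False treats a missing value as no match instead of returning NaN.
df_adult['marital-status'] = np.where(
    df_adult['marital-status'].str.contains('Married', na=False),
    'Married',
    'Not-Married'
)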

In [None]:
print(df_adult.head())
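To see how the `str.contains('Married')` rule sorts individual categories, here is a sketch on a hypothetical handful of `marital-status` values. Note that `'Married-spouse-absent'` also counts as married under this rule, a modeling choice worth keeping in mind, while `'Never-married'` does not match because of its lower-case "m".

```python
import numpy as np
import pandas as pd

# Hypothetical sample of marital-status values (category names as they appear in the Adult data)
status = pd.Series(['Married-civ-spouse', 'Never-married', 'Divorced',
                    'Married-spouse-absent', 'Widowed', 'Separated'])

# Same rule as the notebook: case-sensitive substring match on 'Married'
simplified = np.where(status.str.contains('Married'), 'Married', 'Not-Married')

# Print each original value next to its simplified category
for original, new in zip(status, simplified):
    print(f'{original:>22} -> {new}')
```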