**What is Feature Engineering?**
 
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques like feature extraction, transformation, encoding, and scaling to make data more useful for predictions.

**Why Do We Need Feature Engineering?**

1. **Improves Model Performance** – Features that expose the underlying signal help models make better predictions.
2. **Reduces Overfitting** – Removing noisy or irrelevant features leaves the model less spurious signal to fit.
3. **Handles Missing Data** – Creates meaningful replacements for missing values.
4. **Enables Better Interpretability** – Features that map to domain concepts are easier to understand and explain.
5. **Reduces Dimensionality** – Helps remove redundant or unnecessary features, making the model more efficient.
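
As an illustration of point 3, here is a minimal sketch of one common way to handle missing values: fill the gaps with the column median and keep an indicator column so the model can still see which rows were originally empty. The `df_missing` data and the `AmountWasMissing` and `TransactionAmountFilled` column names are made up for this example, and median imputation is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with two missing transaction amounts
df_missing = pd.DataFrame({
    'TransactionAmount': [500.0, np.nan, 700.0, 600.0, np.nan]
})

# Flag the rows that were missing before filling them in
df_missing['AmountWasMissing'] = df_missing['TransactionAmount'].isna().astype(int)

# Replace missing values with the median of the observed values
median_amount = df_missing['TransactionAmount'].median()
df_missing['TransactionAmountFilled'] = df_missing['TransactionAmount'].fillna(median_amount)

print(df_missing)
```

The median is less sensitive to extreme values than the mean, which is why it is often a safer default for amounts.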

In [5]:
# Extract date and time features
import pandas as pd
# Sample dataset
df = pd.DataFrame({'TransactionDate': pd.to_datetime(['2025-02-05 14:30:00', '2025-02-05 18:45:00'])})

# Extract date-related features
df['DayOfWeek'] = df['TransactionDate'].dt.dayofweek  # Monday=0, Sunday=6
df['Hour'] = df['TransactionDate'].dt.hour
df['IsWeekend'] = (df['TransactionDate'].dt.dayofweek >= 5).astype(int)  # Weekend flag: 1 for Saturday/Sunday, else 0

# Display the DataFrame
print(df)

# Why? Helps capture behavioral trends (e.g., shopping habits on weekends vs. weekdays)

      TransactionDate  DayOfWeek  Hour  IsWeekend
0 2025-02-05 14:30:00          2    14          0
1 2025-02-05 18:45:00          2    18          0
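
One possible refinement of the `Hour` feature is a cyclical encoding. A raw hour treats 23:00 and 00:00 as 23 units apart even though they are neighbours on the clock, while a sine/cosine pair keeps them close. The sketch below assumes two extra, hypothetical weekend rows, and the `HourSin` and `HourCos` column names are illustrative.

```python
import numpy as np
import pandas as pd

df_time = pd.DataFrame({
    'TransactionDate': pd.to_datetime([
        '2025-02-05 14:30:00',
        '2025-02-05 18:45:00',
        '2025-02-08 23:10:00',  # hypothetical extra row: Saturday, late evening
        '2025-02-09 00:05:00',  # hypothetical extra row: Sunday, just after midnight
    ])
})

hour = df_time['TransactionDate'].dt.hour
df_time['Hour'] = hour
df_time['IsWeekend'] = (df_time['TransactionDate'].dt.dayofweek >= 5).astype(int)

# Map the hour onto a circle so that 23 and 0 end up next to each other
df_time['HourSin'] = np.sin(2 * np.pi * hour / 24)
df_time['HourCos'] = np.cos(2 * np.pi * hour / 24)

print(df_time)
```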


In [4]:
# Aggregated features
# Find the average transaction amount per user:
import pandas as pd

df_transaction = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 103],
    'TransactionAmount': [500, 300, 700, 600, 500, 400, 700]
})

# Aggregate: find the average transaction amount per user
df_avg_transaction = df_transaction.groupby('UserID')['TransactionAmount'].mean().reset_index()

# Rename columns for clarity
df_avg_transaction.columns = ['UserID', 'AvgTransactionAmount']

# Display the result
print(df_avg_transaction)


   UserID  AvgTransactionAmount
0     101            533.333333
1     102            400.000000
2     103            650.000000
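
The per-user average above is one row per user. To use it as a feature, it usually has to sit next to each transaction. A minimal sketch, assuming the same sample data: `groupby(...).transform('mean')` returns a value for every original row, so it can be assigned directly as a new column. The `AmountVsUserAvg` ratio is an illustrative derived feature, not part of the original cell.

```python
import pandas as pd

df_transaction = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 103],
    'TransactionAmount': [500, 300, 700, 600, 500, 400, 700]
})

# transform('mean') broadcasts each user's average back onto that user's rows
df_transaction['AvgTransactionAmount'] = (
    df_transaction.groupby('UserID')['TransactionAmount'].transform('mean')
)

# Illustrative derived feature: how large is this transaction relative to the user's average?
df_transaction['AmountVsUserAvg'] = (
    df_transaction['TransactionAmount'] / df_transaction['AvgTransactionAmount']
)

print(df_transaction)
```

A ratio above 1 marks a transaction that is larger than usual for that user.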


In [7]:
# Encoding Categorical Variables
# Convert ProductCategory (Electronics, Clothing, Grocery) into numerical form:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

df = pd.DataFrame({'ProductCategory': ['Electronics', 'Clothing', 'Clothing', 'Grocery']})

# sparse_output requires scikit-learn >= 1.2 (older versions use sparse=False)
encoder = OneHotEncoder(sparse_output=False)
encoded_features = encoder.fit_transform(df[['ProductCategory']])

df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out(['ProductCategory']))
df_encoded

   ProductCategory_Clothing  ProductCategory_Electronics  ProductCategory_Grocery
0                       0.0                          1.0                      0.0
1                       1.0                          0.0                      0.0
2                       1.0                          0.0                      0.0
3                       0.0                          0.0                      1.0
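
For quick experiments, the same encoding can be produced with pandas alone. The sketch below uses `pd.get_dummies` and then aligns new data to the training columns with `reindex`, because `get_dummies` does not remember which categories it saw earlier. The `train` and `new` frames and the `'Toys'` category are hypothetical.

```python
import pandas as pd

train = pd.DataFrame({'ProductCategory': ['Electronics', 'Clothing', 'Clothing', 'Grocery']})
new = pd.DataFrame({'ProductCategory': ['Grocery', 'Toys']})  # 'Toys' is a hypothetical unseen category

# One-hot encode the training data
train_encoded = pd.get_dummies(train, columns=['ProductCategory'], dtype=float)

# Encode new data separately, then force it into the training column layout:
# columns missing from the new data are filled with 0, unseen categories are dropped
new_encoded = pd.get_dummies(new, columns=['ProductCategory'], dtype=float)
new_encoded = new_encoded.reindex(columns=train_encoded.columns, fill_value=0.0)

print(train_encoded)
print(new_encoded)
```

With this approach an unseen category becomes an all-zero row, so the model receives the same set of columns at training and prediction time.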


In [15]:
# Log Transformation for Skewed Data
# If TransactionAmount is right-skewed (a few very large values), a log transformation compresses the range:
import numpy as np
import pandas as pd

df = pd.DataFrame({'TransactionAmount': [100, 200, 500, 10000, 20000]})
df['LogTransactionAmount'] = np.log1p(df['TransactionAmount'])  # log1p(x) = log(1 + x), so a zero amount maps to 0 instead of -inf
print(df)


   TransactionAmount  LogTransactionAmount
0                100              4.615121
1                200              5.303305
2                500              6.216606
3              10000              9.210440
4              20000              9.903538
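
If a model is trained on the log-transformed values, its outputs usually need to be mapped back to the original scale. A minimal sketch, assuming the same amounts plus one hypothetical zero-valued row: `np.expm1` is the inverse of `np.log1p`, so the round trip recovers the original numbers. The `df_log` frame and the `Recovered` column are illustrative names.

```python
import numpy as np
import pandas as pd

# Same amounts as above, plus a hypothetical 0 to show that log1p handles it
df_log = pd.DataFrame({'TransactionAmount': [0, 100, 200, 500, 10000, 20000]})

# Forward transform: log(1 + x)
df_log['LogTransactionAmount'] = np.log1p(df_log['TransactionAmount'])

# Inverse transform: exp(y) - 1 maps the values back to the original scale
df_log['Recovered'] = np.expm1(df_log['LogTransactionAmount'])

print(df_log)
```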


In [16]:
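# Feature Scaling (reuses df from the log-transformation cell above)
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Min-max normalization: rescale values to the [0, 1] range
scaler = MinMaxScaler()
df['NormalizedTransactionAmount'] = scaler.fit_transform(df[['TransactionAmount']])

# Standardization: shift to zero mean and scale to unit variance
standard_scaler = StandardScaler()
df['StandardizedTransactionAmount'] = standard_scaler.fit_transform(df[['TransactionAmount']])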

df

   TransactionAmount  LogTransactionAmount  NormalizedTransactionAmount  StandardizedTransactionAmount
0                100              4.615121                     0.000000                      -0.768912
1                200              5.303305                     0.005025                      -0.756223
2                500              6.216606                     0.020101                      -0.718158
3              10000              9.210440                     0.497487                       0.487231
4              20000              9.903538                     1.000000                       1.756062
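
To see where these numbers come from, the two scalings can be reproduced by hand with NumPy. This is a sketch of the underlying formulas rather than of the scikit-learn internals: min-max scaling is `(x - min) / (max - min)`, and standardization is `(x - mean) / std`, where `StandardScaler` uses the population standard deviation (`ddof=0`, which is also NumPy's default).

```python
import numpy as np

amounts = np.array([100, 200, 500, 10000, 20000], dtype=float)

# Min-max normalization: smallest value becomes 0, largest becomes 1
min_max = (amounts - amounts.min()) / (amounts.max() - amounts.min())

# Standardization: subtract the mean, divide by the population standard deviation
z_score = (amounts - amounts.mean()) / amounts.std()

print(np.round(min_max, 6))
print(np.round(z_score, 6))
```

Both versions are dominated by the two large amounts, which is one reason a log transformation is often applied before scaling skewed data.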
