**What is Feature Engineering?**

Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques like feature extraction, transformation, encoding, and scaling to make data more useful for predictions.

**Why Do We Need Feature Engineering?**

1. **Improves Model Performance** – Informative features help models make better predictions.
2. **Reduces Overfitting** – Removing noisy and irrelevant features leaves the model less to memorize.
3. **Handles Missing Data** – Missing values are replaced with meaningful estimates (or flagged) instead of being dropped.
4. **Enables Better Interpretability** – Well-constructed features are easier to understand and explain.
5. **Reduces Dimensionality** – Removing redundant or uninformative features (columns) makes the model faster to train and simpler.
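
The cells below demonstrate extraction, aggregation, encoding, transformation, and scaling, but not points 3 and 5. Here is a minimal sketch of both, assuming a small made-up DataFrame. The gaps in `TransactionAmount` are filled with the column median, and a flag column records which rows were missing. A column that holds the same value in every row is then dropped, which is the simplest form of dimensionality reduction. The `Currency` and `AmountWasMissing` columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two missing amounts and a column that never changes
df = pd.DataFrame({
    'TransactionAmount': [500.0, np.nan, 700.0, 1000.0, np.nan],
    'Currency': ['USD'] * 5,
})

# Missing data: record which rows were missing, then fill the gaps with the
# column median, which is less sensitive to extreme values than the mean
df['AmountWasMissing'] = df['TransactionAmount'].isna().astype(int)
df['TransactionAmount'] = df['TransactionAmount'].fillna(df['TransactionAmount'].median())

# Dimensionality: drop columns with at most one distinct value, since they
# carry no information that could help a model
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
df = df.drop(columns=constant_cols)
print(df)
```

Real datasets usually call for more selective techniques (correlation filters, model-based selection, PCA), but the idea is the same: keep only columns that carry information.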

In [2]:
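import pandas as pd

# Two sample transactions with full timestamps
df = pd.DataFrame({'Transaction_Date': pd.to_datetime(['2025-02-05 14:30:00', '2025-02-06 18:45:00'])})
# Extract date/time components with the .dt accessor (dayofweek: Monday=0, Sunday=6)
df['DayofWeek'] = df['Transaction_Date'].dt.dayofweek
df['Hour'] = df['Transaction_Date'].dt.hour
# Saturday (5) and Sunday (6) count as the weekend
df['IsWeekend'] = (df['DayofWeek'] >= 5).astype(int)
print(df)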

     Transaction_Date  DayofWeek  Hour  IsWeekend
0 2025-02-05 14:30:00          2    14          0
1 2025-02-06 18:45:00          3    18          0
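
`Hour` is stored as a plain integer, so a model sees 23:00 and 00:00 as far apart even though they are one hour apart. One common remedy, not part of the original notebook, is cyclical encoding: map the hour onto a circle using sine and cosine. Here is a sketch, assuming a 24-hour cycle and made-up hour values:

```python
import numpy as np
import pandas as pd

# Made-up hours, including both ends of the day
df = pd.DataFrame({'Hour': [0, 6, 12, 18, 23]})

# Convert each hour to an angle on a 24-hour circle, then take its sine and
# cosine so that 23:00 lands right next to 00:00
angle = 2 * np.pi * df['Hour'] / 24
df['Hour_sin'] = np.sin(angle)
df['Hour_cos'] = np.cos(angle)
print(df.round(3))
```

The same idea applies to `DayofWeek` with a period of 7.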


In [4]:
# Each row is one transaction; a user can appear several times
df_transactions = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102],
    'TransactionAmount': [500, 300, 700, 1000, 400]
})
# Aggregate feature: average transaction amount per user
df_user_avg = (
    df_transactions.groupby('UserID')['TransactionAmount']
    .mean()
    .reset_index()
    .rename(columns={'TransactionAmount': 'AvgTransactionAmount'})
)
print(df_user_avg)

   UserID  AvgTransactionAmount
0     101                 600.0
1     102                 350.0
2     103                1000.0
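
`df_user_avg` holds one row per user. To use the average as a feature for individual transactions, it has to be attached back to each row. Here is a sketch using `groupby(...).transform('mean')`, which returns a result aligned with the original rows. The ratio column `AmountVsUserAvg` is a hypothetical example of a derived feature:

```python
import pandas as pd

df_transactions = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102],
    'TransactionAmount': [500, 300, 700, 1000, 400],
})

# transform('mean') returns one value per original row, so every transaction
# gets its user's average next to it
df_transactions['AvgTransactionAmount'] = (
    df_transactions.groupby('UserID')['TransactionAmount'].transform('mean')
)
# Hypothetical derived feature: how this transaction compares to the user's usual spend
df_transactions['AmountVsUserAvg'] = (
    df_transactions['TransactionAmount'] / df_transactions['AvgTransactionAmount']
)
print(df_transactions)
```

Merging `df_user_avg` back on `UserID` with `pd.merge` gives the same averages; `transform` just skips the intermediate table.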


In [11]:
# OneHotEncoder's sparse_output parameter (used below) requires scikit-learn 1.2 or later
%pip install --upgrade scikit-learn

In [13]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

df = pd.DataFrame({'ProductCategory': ['Electronics', 'Clothing', 'Clothing', 'Grocery']})
# sparse_output=False returns a dense NumPy array instead of a SciPy sparse matrix
encoder = OneHotEncoder(sparse_output=False)
# One 0/1 column per category (categories are sorted alphabetically)
encoded_features = encoder.fit_transform(df[['ProductCategory']])
df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out(['ProductCategory']))
print(df_encoded)

   ProductCategory_Clothing  ProductCategory_Electronics  \
0                       0.0                          1.0   
1                       1.0                          0.0   
2                       1.0                          0.0   
3                       0.0                          0.0   

   ProductCategory_Grocery  
0                      0.0  
1                      0.0  
2                      0.0  
3                      1.0  


In [15]:
# Log transformation for skewed data: log1p computes log(1 + x), compressing large values and staying defined at 0
import numpy as np
df=pd.DataFrame({'TransactionAmount':[100,200,5000,10000,20000]})
df['LogTransactionAmount']=np.log1p(df['TransactionAmount'])
print(df)

   TransactionAmount  LogTransactionAmount
0                100              4.615121
1                200              5.303305
2               5000              8.517393
3              10000              9.210440
4              20000              9.903538
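
To check that the log transform actually reduced the skew, compare the skewness before and after. Here is a quick sketch with pandas `Series.skew()`. With only five rows the numbers are rough. `np.expm1`, the inverse of `log1p`, is also shown for turning log-scale values back into amounts:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'TransactionAmount': [100, 200, 5000, 10000, 20000]})
df['LogTransactionAmount'] = np.log1p(df['TransactionAmount'])

# Sample skewness: values closer to 0 mean a more symmetric distribution
skew_before = df['TransactionAmount'].skew()
skew_after = df['LogTransactionAmount'].skew()
print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")

# expm1 undoes log1p, e.g. to convert log-scale predictions back to amounts
recovered = np.expm1(df['LogTransactionAmount'])
```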


In [16]:
# Feature scaling
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Normalization (min-max scaling): rescales values to the range [0, 1]
scaler = MinMaxScaler()
df['NormalizedTransactionAmount'] = scaler.fit_transform(df[['TransactionAmount']])
# Standardization: rescales values to mean 0 and standard deviation 1
standard_scaler = StandardScaler()
df['StandardizedTransactionAmount'] = standard_scaler.fit_transform(df[['TransactionAmount']])
print(df)

   TransactionAmount  LogTransactionAmount  NormalizedTransactionAmount  \
0                100              4.615121                     0.000000   
1                200              5.303305                     0.005025   
2               5000              8.517393                     0.246231   
3              10000              9.210440                     0.497487   
4              20000              9.903538                     1.000000   

   StandardizedTransactionAmount  
0                      -0.937070  
1                      -0.923606  
2                      -0.277351  
3                       0.395831  
4                       1.742196  
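
In the cell above, both scalers are fit on every row, which is fine for a demonstration. When the data is split for training and evaluation, the usual practice is to fit the scaler on the training rows only and reuse it on the test rows. This keeps test-set statistics from leaking into training. Here is a sketch, assuming made-up amounts and an arbitrary 75/25 split via `train_test_split`:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up amounts for illustration
df = pd.DataFrame({'TransactionAmount': [100, 200, 5000, 10000, 20000, 350, 1200, 800]})
train_df, test_df = train_test_split(df, test_size=0.25, random_state=42)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only
train_scaled = scaler.fit_transform(train_df[['TransactionAmount']])
# Apply the training statistics to the test rows without refitting
test_scaled = scaler.transform(test_df[['TransactionAmount']])
print(scaler.mean_, scaler.scale_)
```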
