**What is Feature Engineering?**
 
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques such as feature extraction, transformation, encoding, and scaling to make data more useful for predictions.
 
**Why Do We Need Feature Engineering?**
 
1. **Improves Model Performance** – Good features help models make better predictions.
 
2. **Reduces Overfitting** – Removing noisy and irrelevant features leaves the model less to memorize.
 
3. **Handles Missing Data** – Creates meaningful replacements for missing values.
 
4. **Enables Better Interpretability** – Makes features more understandable and useful.

5. **Reduces Dimensionality** – Helps remove unnecessary features, making the model more efficient.
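
The missing-data point above can be sketched as follows. This is a minimal illustration, assuming median imputation is appropriate for the column; the data and the names `df_missing`, `transamt_missing`, and `transamt_filled` are hypothetical and do not come from the cells below.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two of the five amounts are missing
df_missing = pd.DataFrame({'transamt': [100.0, np.nan, 300.0, np.nan, 1000.0]})

# Keep a flag so a model can still see which rows were imputed
df_missing['transamt_missing'] = df_missing['transamt'].isna().astype(int)

# Replace missing amounts with the median of the observed values
median_amt = df_missing['transamt'].median()
df_missing['transamt_filled'] = df_missing['transamt'].fillna(median_amt)
df_missing
```

The median is used here rather than the mean because it is less affected by a single large amount; other strategies may suit other data better.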
 

In [2]:
import pandas as pd
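# Two sample timestamps (a Wednesday and a Thursday)
df = pd.DataFrame({'transactiondate': pd.to_datetime(['2025-02-05 14:30:00', '2025-02-06 18:45:00'])})
df['day'] = df['transactiondate'].dt.dayofweek  # Monday=0 ... Sunday=6
df['hour'] = df['transactiondate'].dt.hour
# Saturday is 5 and Sunday is 6, so the weekend test must be >= 5 (x > 5 would miss Saturday)
df['isweekend'] = (df['day'] >= 5).astype(int)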
df

Unnamed: 0,transactiondate,day,hour,isweekend
0,2025-02-05 14:30:00,2,14,0
1,2025-02-06 18:45:00,3,18,0


In [5]:
df_transactions=pd.DataFrame({
    'user id':[101,102,101,103,102],
    'transactionamount':[500,300,700,1000,400]
})
df_user_avg=df_transactions.groupby('user id')['transactionamount'].mean().reset_index()
df_user_avg.rename(columns={'transactionamount':'avgtransaction'},inplace=True)
df_user_avg

Unnamed: 0,user id,avgtransaction
0,101,600.0
1,102,350.0
2,103,1000.0
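
The per-user average above has one row per user. To use it as a feature, it usually needs to sit next to each individual transaction. A minimal sketch, assuming the same sample data as the cell above: `groupby(...).transform('mean')` returns one value per original row. The `diff_from_avg` column is a hypothetical extra feature added only for illustration.

```python
import pandas as pd

df_transactions = pd.DataFrame({
    'user id': [101, 102, 101, 103, 102],
    'transactionamount': [500, 300, 700, 1000, 400]
})

# transform('mean') broadcasts each user's mean back onto that user's rows
df_transactions['avgtransaction'] = (
    df_transactions.groupby('user id')['transactionamount'].transform('mean')
)

# Hypothetical feature: how far each transaction is from the user's average
df_transactions['diff_from_avg'] = (
    df_transactions['transactionamount'] - df_transactions['avgtransaction']
)
df_transactions
```

Merging `df_user_avg` back with `DataFrame.merge` on `'user id'` would give the same `avgtransaction` column; `transform` just avoids the intermediate table.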


In [18]:
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# Creating the DataFrame
df = pd.DataFrame({'productcategory': ['electronic', 'clothing', 'clothing', 'grocery']})

# Return a dense array instead of a sparse matrix
# (the sparse_output parameter requires scikit-learn 1.2 or newer)
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the data
encoded_features = encoder.fit_transform(df[['productcategory']])

# Convert to DataFrame with correct feature names
df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out())

# Display encoded DataFrame
df_encoded


Unnamed: 0,productcategory_clothing,productcategory_electronic,productcategory_grocery
0,0.0,1.0,0.0
1,1.0,0.0,0.0
2,1.0,0.0,0.0
3,0.0,0.0,1.0


In [8]:
import numpy as np
df = pd.DataFrame({'transamt': [100, 200, 5000, 10000, 20000]})
# log1p computes log(1 + x), which compresses large values and is defined at x = 0
df['logtransamt'] = np.log1p(df['transamt'])
df

Unnamed: 0,transamt,logtransamt
0,100,4.615121
1,200,5.303305
2,5000,8.517393
3,10000,9.21044
4,20000,9.903538


In [17]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({'transamt': [100, 500, 1000, 1500, 2000]})

# Apply MinMaxScaler (rescales values to the [0, 1] range)
scaler = MinMaxScaler()
df['normalizedtransamt'] = scaler.fit_transform(df[['transamt']])

# Apply StandardScaler (rescales to zero mean and unit variance)
std_scaler = StandardScaler()
df['stdtransamt'] = std_scaler.fit_transform(df[['transamt']])

# Display the transformed DataFrame
df


Unnamed: 0,transamt,normalizedtransamt,stdtransamt
0,100,0.0,-1.354113
1,500,0.210526,-0.765368
2,1000,0.473684,-0.029437
3,1500,0.736842,0.706494
4,2000,1.0,1.442425
