In [2]:
import pandas as pd

# Create a sample DataFrame with a datetime column
df = pd.DataFrame({
    'transactiondate': pd.to_datetime(['2025-02-05 14:30:00', '2025-02-06 18:45:00'])
})

# Extract the day of the week and assign it to the 'dayofweek' column
df['dayofweek'] = df['transactiondate'].dt.dayofweek  # 0=Monday, 6=Sunday

# Create the 'isweekend' column: 1 if it's the weekend (Saturday or Sunday), 0 otherwise
df['isweekend'] = (df['dayofweek'] >= 5).astype(int)

# Display the DataFrame
df


       transactiondate  dayofweek  isweekend
0  2025-02-05 14:30:00          2          0
1  2025-02-06 18:45:00          3          0
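
The same `.dt` accessor exposes other calendar fields, such as the hour of the day. A minimal sketch, assuming hypothetical sample timestamps; the `hour` and `month` column names are illustrative, and the weekend flag uses a vectorized comparison rather than `apply`:

```python
import pandas as pd

# Hypothetical sample timestamps: a Wednesday, a Saturday and a Sunday in Feb 2025
df = pd.DataFrame({
    'transactiondate': pd.to_datetime([
        '2025-02-05 14:30:00',
        '2025-02-08 09:15:00',
        '2025-02-09 23:05:00',
    ])
})

# The .dt accessor returns calendar fields as vectorized integer Series
df['hour'] = df['transactiondate'].dt.hour            # 0-23
df['month'] = df['transactiondate'].dt.month          # 1-12
df['dayofweek'] = df['transactiondate'].dt.dayofweek  # 0=Monday, 6=Sunday

# Vectorized weekend flag: compare once, then cast the boolean Series to 0/1
df['isweekend'] = (df['dayofweek'] >= 5).astype(int)

print(df)
```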


In [3]:
import pandas as pd

# Example DataFrame with user_id and transaction_amount columns
df = pd.DataFrame({
    'user_id': [101, 102, 101, 103, 102],
    'transaction_amount': [500, 300, 700, 1000, 400]
})

# Calculate the average transaction amount per user using groupby
transaction_per_user = (
    df.groupby('user_id')['transaction_amount']
    .mean()
    .reset_index()
    .rename(columns={'transaction_amount': 'avgtransaction'})
)

# Display the result
transaction_per_user


   user_id  avgtransaction
0      101           600.0
1      102           350.0
2      103          1000.0
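
A per-user average is usually attached back to each transaction so that a model sees it row by row. One way to do this, sketched below on the same sample data, is `groupby(...).transform('mean')`, which returns a Series aligned with the original index. `amount_minus_avg` is a hypothetical derived feature, not part of the original example:

```python
import pandas as pd

# Same sample transactions as in the groupby example
df = pd.DataFrame({
    'user_id': [101, 102, 101, 103, 102],
    'transaction_amount': [500, 300, 700, 1000, 400]
})

# transform('mean') returns one value per original row (aligned on the index),
# so the per-user average can be stored directly as a new column
df['avgtransaction'] = df.groupby('user_id')['transaction_amount'].transform('mean')

# Hypothetical derived feature: how far each transaction is from that user's average
df['amount_minus_avg'] = df['transaction_amount'] - df['avgtransaction']

print(df)
```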


In [4]:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Original DataFrame
df = pd.DataFrame({'productcategory': ['electronics', 'clothing', 'clothing', 'grocery']})

# Initialize OneHotEncoder (sparse_output=False returns a dense array;
# the parameter was added in scikit-learn 1.2, older versions use sparse=False)
encoder = OneHotEncoder(sparse_output=False)

# Fit and transform the 'productcategory' column to get the one-hot encoded features
encoded_features = encoder.fit_transform(df[['productcategory']])

# Create a DataFrame from the encoded features with proper column names
df_encoded = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out())

# Print the result
print(df_encoded)


   productcategory_clothing  productcategory_electronics  \
0                       0.0                          1.0   
1                       1.0                          0.0   
2                       1.0                          0.0   
3                       0.0                          0.0   

   productcategory_grocery  
0                      0.0  
1                      0.0  
2                      0.0  
3                      1.0  


**What is Feature Engineering?**
 
Feature engineering is the process of creating new features or modifying existing ones to improve the performance of machine learning models. It involves techniques like feature extraction, transformation, encoding, and scaling to make data more useful for predictions.
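
As a minimal sketch of the scaling step, the cell below applies scikit-learn's `StandardScaler` (zero mean and unit variance per column) and `MinMaxScaler` (each column mapped linearly to [0, 1]) to hypothetical `transaction_amount` and `hour` columns:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numeric features on very different scales
df = pd.DataFrame({
    'transaction_amount': [100.0, 200.0, 5000.0, 10000.0, 20000.0],
    'hour': [9.0, 14.0, 18.0, 21.0, 23.0],
})

# Standardization: subtract the column mean, divide by the column standard deviation
standardized = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)

# Min-max normalization: rescale each column linearly to the [0, 1] range
normalized = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)

print(standardized.round(3))
print(normalized.round(3))
```

In a real pipeline the scaler would be fitted on the training split only and then reused to transform validation and test data, so that no statistics leak from held-out rows.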



**Why Do We Need Feature Engineering?**


1. **Improves Model Performance** – Good features help models make better predictions.

2. **Reduces Overfitting** – Helps eliminate noise and irrelevant data.

3. **Handles Missing Data** – Creates meaningful replacements for missing values.

4. **Enables Better Interpretability** – Makes features more understandable and useful.

5. **Reduces Dimensionality** – Helps remove redundant or irrelevant features, making the model more efficient.
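
Point 3 can be illustrated with one common, simple approach: fill the gaps with the column median and keep a binary flag recording which values were missing, since missingness itself can carry information. A sketch on hypothetical data; `amount_missing` is an illustrative column name:

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing values
df = pd.DataFrame({'transaction_amount': [500.0, np.nan, 700.0, 1000.0, np.nan]})

# Keep the "was missing" information as its own binary feature
df['amount_missing'] = df['transaction_amount'].isna().astype(int)

# Fill the gaps with the median (NaNs are skipped), which is less outlier-sensitive than the mean
median_amount = df['transaction_amount'].median()
df['transaction_amount'] = df['transaction_amount'].fillna(median_amount)

print(df)
```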


In [5]:
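import pandas as pd
import numpy as np

# Right-skewed sample transaction amounts
df = pd.DataFrame({'transaction_amount': [100, 200, 5000, 10000, 20000]})

# log1p computes log(1 + x): it compresses large values and is defined at x = 0
df['LogTransactionAmount'] = np.log1p(df['transaction_amount'])
print(df)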

   transaction_amount  LogTransactionAmount
0                 100              4.615121
1                 200              5.303305
2                5000              8.517393
3               10000              9.210440
4               20000              9.903538
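
To check that the log transform actually reduced skewness, one option is to compare pandas' `Series.skew()` before and after; on these five values the magnitude of the skewness drops. `np.expm1`, the exact inverse of `log1p`, maps transformed values (or predictions) back to the original units:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'transaction_amount': [100, 200, 5000, 10000, 20000]})
df['LogTransactionAmount'] = np.log1p(df['transaction_amount'])

# Series.skew() returns the sample skewness; values near 0 mean a more symmetric distribution
skew_before = df['transaction_amount'].skew()
skew_after = df['LogTransactionAmount'].skew()
print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")

# expm1 undoes log1p, recovering the original amounts (up to floating-point error)
recovered = np.expm1(df['LogTransactionAmount'])
print(recovered.round(6).tolist())
```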


**Final Summary of Feature Engineering & Imbalanced Data Handling**

**Feature Extraction**: Derive new features from raw data (e.g., Hour, DayOfWeek)

**Aggregated Features**: Compute per-group statistics (e.g., AvgTransactionAmountPerUser)

**Encoding**: Convert categorical variables into numerical ones (e.g., One-Hot Encoding)

**Log Transformation**: Reduce right skew in a feature's distribution

**Feature Scaling**: Standardize or normalize numerical features so they are on comparable scales

**Downsampling**: Reduce the size of the majority class (e.g., by randomly removing samples)

**Upsampling**: Increase the size of the minority class (e.g., by randomly duplicating samples)

**SMOTE (Synthetic Minority Over-sampling Technique)**: Generate synthetic minority-class samples by interpolating between existing minority samples and their nearest minority-class neighbors
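
The three resampling strategies can be sketched on a hypothetical fraud dataset. Downsampling and upsampling use `DataFrame.sample`; the SMOTE part is a simplified NumPy illustration of the core idea (interpolating between a minority sample and one of its nearest minority neighbors), not the imbalanced-learn implementation, which in practice is usually reached through `imblearn.over_sampling.SMOTE` and its `fit_resample` method. The dataset, column names, and the `smote_like` helper are all hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical imbalanced dataset: 8 legitimate (0) vs 2 fraudulent (1) transactions
df = pd.DataFrame({
    'amount': [120, 80, 95, 300, 60, 150, 210, 75, 5000, 7200],
    'hour': [10, 14, 9, 18, 12, 16, 11, 13, 3, 2],
    'is_fraud': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})
majority = df[df['is_fraud'] == 0]
minority = df[df['is_fraud'] == 1]

# Downsampling: randomly keep only as many majority rows as there are minority rows
down = pd.concat([majority.sample(n=len(minority), random_state=42), minority])

# Upsampling: randomly duplicate minority rows (with replacement) up to the majority size
up = pd.concat([majority, minority.sample(n=len(majority), replace=True, random_state=42)])


def smote_like(X, n_new, k, rng):
    """Hypothetical helper: n_new synthetic rows interpolated between minority samples."""
    # Pairwise Euclidean distances between all minority samples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    # Indices of the k nearest minority neighbors of each sample
    neighbors = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, X.shape[1]))
    for row in range(n_new):
        i = rng.integers(len(X))           # pick a random minority sample
        j = neighbors[i, rng.integers(k)]  # pick one of its k nearest neighbors
        gap = rng.random()                 # random position along the segment, in [0, 1)
        synthetic[row] = X[i] + gap * (X[j] - X[i])
    return synthetic


X_min = minority[['amount', 'hour']].to_numpy(dtype=float)
# Only 2 minority samples exist here, so k=1 is the largest usable neighborhood
X_syn = smote_like(X_min, n_new=len(majority) - len(minority), k=1, rng=rng)
smote_df = pd.DataFrame(X_syn, columns=['amount', 'hour']).assign(is_fraud=1)
balanced = pd.concat([df, smote_df], ignore_index=True)

print(down['is_fraud'].value_counts())
print(up['is_fraud'].value_counts())
print(balanced['is_fraud'].value_counts())
```

Resampling is normally applied to the training split only; resampling before the train/test split lets duplicated or synthetic copies of minority rows leak into the evaluation data. Because SMOTE relies on distances, features on very different scales (like `amount` and `hour` here) are usually scaled first.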