## Transaction Date Features

We extract several components from the `Transaction Date` column to enable time-based analysis and grouping.


In [42]:
import pandas as pd
df = pd.read_csv("../data/prep/clean_dataset.csv", parse_dates=["Transaction Date"])

In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9540 entries, 0 to 9539
Data columns (total 8 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Transaction ID    9540 non-null   object        
 1   Item              9540 non-null   object        
 2   Quantity          9540 non-null   float64       
 3   Price Per Unit    9540 non-null   float64       
 4   Total Spent       9540 non-null   float64       
 5   Payment Method    9540 non-null   object        
 6   Location          9540 non-null   object        
 7   Transaction Date  9540 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(4)
memory usage: 596.4+ KB


In [44]:
df['Transaction Year'] = df['Transaction Date'].dt.year
df['Transaction Month'] = df['Transaction Date'].dt.month
df['Transaction Day'] = df['Transaction Date'].dt.day
df['Transaction Weekday'] = df['Transaction Date'].dt.weekday  # 0 = Monday, 6 = Sunday
df['Is Weekend'] = df['Transaction Weekday'].isin([5, 6])  # Saturday or Sunday
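
As a quick sanity check of the weekday encoding, the sketch below applies the same accessors to a few hand-picked dates. The dates and the `sample` frame are illustrative assumptions, not rows from the dataset.

```python
import pandas as pd

# Hypothetical sample dates (Friday through Monday), not taken from the dataset
sample = pd.DataFrame({
    "Transaction Date": pd.to_datetime(
        ["2023-01-06", "2023-01-07", "2023-01-08", "2023-01-09"]
    )
})

# Same accessors as in the cell above
sample["Transaction Weekday"] = sample["Transaction Date"].dt.weekday
sample["Is Weekend"] = sample["Transaction Weekday"].isin([5, 6])

# day_name() gives a human-readable label to compare against the integer code
sample["Weekday Name"] = sample["Transaction Date"].dt.day_name()

print(sample)
```

Only the Saturday and Sunday rows should come out with `Is Weekend` set to `True`.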

## Price and Item-Based Features

We recompute the unit price from `Total Spent` and `Quantity`, bin `Total Spent` into spend levels, and map each item to a broader `Food` or `Drink` category. These features support customer segmentation and item preference analysis.

In [None]:
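# Recompute the unit price from the totals, as a cross-check against "Price Per Unit".
# Note: any row with Quantity == 0 would produce inf or NaN here.
df["Calculated Price per Unit"] = df["Total Spent"] / df["Quantity"]

# Bin Total Spent into spend levels. Bins are right-inclusive: (0, 8], (8, 10], (10, 20], (20, 40].
# Values equal to 0 or above 40 fall outside the bins and become NaN.
df["Spend Level"] = pd.cut(df["Total Spent"], bins=[0, 8, 10, 20, 40], labels=["Low", "Medium", "High", "Very High"])

# Map each item to a category; items missing from the mapping become NaN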
df["Item Category"] = df["Item"].str.lower().map({
    "coffee": "Drink",
    "juice": "Drink",
    "tea": "Drink",
    "smoothie": "Drink",

    "cake": "Food",
    "cookie": "Food",
    "salad": "Food",
    "sandwich": "Food",
})
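
Both `pd.cut` and `.map` silently produce `NaN` for values they do not cover, so it can be worth checking the edge cases. The sketch below uses made-up totals and item names (`totals`, `items`, and "Muffin" are assumptions for illustration) to show which values fall through.

```python
import pandas as pd

# Hypothetical totals chosen to hit the bin edges; not taken from the dataset
totals = pd.Series([0.0, 8.0, 8.5, 10.0, 20.0, 40.0, 45.0])
levels = pd.cut(
    totals,
    bins=[0, 8, 10, 20, 40],
    labels=["Low", "Medium", "High", "Very High"],
)
# With the default right=True, each bin includes its right edge only,
# so 0.0 (left edge of the first bin) and 45.0 (above the last bin) are NaN.
print(pd.DataFrame({"total": totals, "level": levels}))

# Hypothetical items; "Muffin" is deliberately absent from the mapping
items = pd.Series(["Coffee", "Cake", "Muffin"])
category = items.str.lower().map({"coffee": "Drink", "cake": "Food"})

# List the original item names that did not receive a category
unmapped = items[category.isna()].unique()
print(unmapped)
```

If the real data contains totals outside the bin range, passing `include_lowest=True` or widening the outer bin edges is one option; whether that is appropriate depends on the data.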

## Save Prep Dataset

In [51]:
import os

# Create the output directory if it does not already exist
target_path = "../data/features"
os.makedirs(target_path, exist_ok=True)

df.to_csv(os.path.join(target_path, "prep_dataset.csv"), index=False)
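
One thing to keep in mind when reloading this file: CSV stores plain text, so the datetime and categorical dtypes are not preserved. The sketch below illustrates this with a tiny hypothetical frame written to an in-memory buffer rather than the real file; the `frame` contents are assumptions for illustration.

```python
import io

import pandas as pd

spend_levels = ["Low", "Medium", "High", "Very High"]

# Hypothetical one-row frame with a datetime column and an ordered categorical
frame = pd.DataFrame({
    "Transaction Date": pd.to_datetime(["2023-01-06"]),
    "Spend Level": pd.Categorical(["Low"], categories=spend_levels, ordered=True),
})

# Write to an in-memory buffer instead of disk
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# A plain read_csv brings both columns back as text
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.dtypes)

# Restore the dtypes explicitly when reading the file back
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["Transaction Date"])
restored["Spend Level"] = pd.Categorical(
    restored["Spend Level"], categories=spend_levels, ordered=True
)
print(restored.dtypes)
```

A binary format such as Parquet would keep these dtypes, at the cost of an extra dependency; that is a design choice outside what this notebook does.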