# Data Tutorial

This notebook demonstrates how to use the `src/meli_ads/data` module to load and transform the dataset.

In [1]:
import sys
from pathlib import Path

# Add the project root (the parent of the notebook's working directory)
# to sys.path so that the `src` package can be imported.
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

from src.meli_ads.data import MeliChallengeDataset
from src.meli_ads.data.transforms import HistoryFeatureExtractor
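
The path setup above assumes the notebook runs exactly one level below the project root. As a hedged alternative, the sketch below walks upward until it finds a directory containing `src`. The helper `find_project_root` and the `notebooks` folder name are hypothetical and not part of `meli_ads`.

```python
import sys
import tempfile
from pathlib import Path


def find_project_root(start, marker="src"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"no parent of {start} contains {marker!r}")


# Demonstrate on a throwaway layout: <tmp>/src and <tmp>/notebooks.
with tempfile.TemporaryDirectory() as tmp:
    expected = Path(tmp).resolve()
    (expected / "src").mkdir()
    (expected / "notebooks").mkdir()
    root = find_project_root(expected / "notebooks")

# Register the root once, mirroring the check used in the cell above.
if str(root) not in sys.path:
    sys.path.append(str(root))
```

In a real notebook the call would be `find_project_root(Path.cwd())`, which keeps working if the notebook is moved into a deeper subfolder.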

## 1. Initialize Dataset

We initialize the dataset by pointing it at the raw data directory and passing a `HistoryFeatureExtractor` as its transform.

In [2]:
# Initialize dataset with the transformer
dataset = MeliChallengeDataset(
    data_dir='../data/raw',
    transform=HistoryFeatureExtractor()
)

## 2. Load Data

Load the training data. We read only the first 1000 rows to keep the example fast.

In [3]:
df_train = dataset.load_train(nrows=1000)
df_train.head()

Loading train data from ../data/raw/train_dataset.jl...


| | user_history | item_bought |
|---|---|---|
| 0 | [{'event_info': 1786148, 'event_timestamp': '2... | 1748830 |
| 1 | [{'event_info': 643652, 'event_timestamp': '20... | 228737 |
| 2 | [{'event_info': 248595, 'event_timestamp': '20... | 1909110 |
| 3 | [{'event_info': 'RADIOBOSS', 'event_timestamp'... | 1197370 |
| 4 | [{'event_info': 'AMAZFIT BIP', 'event_timestam... | 2049207 |
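
The `.jl` extension in the log line suggests a JSON Lines file, with one JSON object per line. The notebook does not show how `load_train` applies `nrows`, so the following is only a sketch of one way to read the first few records without parsing the whole file. The helper `read_jsonl_head` and the sample records are hypothetical.

```python
import json
import tempfile
from itertools import islice
from pathlib import Path


def read_jsonl_head(path, nrows=None):
    """Parse at most `nrows` lines from a JSON Lines file (all lines if None)."""
    with open(path, encoding="utf-8") as fh:
        # islice stops after `nrows` lines, so later lines are never parsed.
        return [json.loads(line) for line in islice(fh, nrows) if line.strip()]


# Hypothetical stand-in records; the real train_dataset.jl is not reproduced here.
sample = [
    {"user_history": [{"event_info": 101, "event_timestamp": "t0"}], "item_bought": 10},
    {"user_history": [], "item_bought": 20},
    {"user_history": [], "item_bought": 30},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.jl"
    path.write_text("\n".join(json.dumps(r) for r in sample), encoding="utf-8")
    head = read_jsonl_head(path, nrows=2)

print(len(head))  # 2
```

If pandas is doing the loading, `pandas.read_json(path, lines=True, nrows=1000)` offers the same row limit and returns a DataFrame directly.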


## 3. Apply Transforms

`load_train` returns the raw pandas DataFrame without applying the transform.
To get a **transformed** example (with features extracted), use `get_example(index)`.

In [4]:
# Get the 0-th example, transformed
example = dataset.get_example(0, dataset='train')

print("Transformed Keys:", example.keys())
print("Num Events:", example['num_events'])
print("Item Bought:", example['item_bought'])

Transformed Keys: dict_keys(['user_history', 'item_bought', 'num_events', 'num_views', 'num_searches', 'last_viewed_item'])
Num Events: 19
Item Bought: 1748830
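
To make the transformed keys above more concrete, here is a minimal sketch of how such features could be derived and applied on access. It is not the actual `HistoryFeatureExtractor`. It assumes that an integer `event_info` is a viewed item id, that a string `event_info` is a search query, and that the history is in chronological order. The names `extract_history_features` and `TransformedRows` are hypothetical.

```python
def extract_history_features(row):
    """Return a copy of `row` extended with summary features of `user_history`."""
    history = row["user_history"]
    # Assumption: integer event_info = viewed item id, string = search query.
    views = [e["event_info"] for e in history if isinstance(e["event_info"], int)]
    searches = [e["event_info"] for e in history if isinstance(e["event_info"], str)]
    return {
        **row,
        "num_events": len(history),
        "num_views": len(views),
        "num_searches": len(searches),
        # Assumption: events are ordered oldest to newest.
        "last_viewed_item": views[-1] if views else None,
    }


class TransformedRows:
    """Hypothetical stand-in for a dataset that applies `transform` on access."""

    def __init__(self, rows, transform=None):
        self.rows = rows
        self.transform = transform

    def get_example(self, index):
        row = self.rows[index]
        return self.transform(row) if self.transform else row


# Made-up rows for illustration only.
rows = [
    {
        "user_history": [
            {"event_info": 101, "event_timestamp": "t0"},
            {"event_info": "radio", "event_timestamp": "t1"},
            {"event_info": 202, "event_timestamp": "t2"},
        ],
        "item_bought": 303,
    }
]

example = TransformedRows(rows, transform=extract_history_features).get_example(0)
print(example["num_events"], example["num_views"], example["num_searches"])  # 3 2 1
print(example["last_viewed_item"])  # 202
```

Keeping the transform as a plain callable means the raw rows stay untouched and different feature extractors can be swapped in without changing the loading code.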
