### Set Up Google Colab

In [None]:
import os

if "COLAB_" in "".join(os.environ.keys()):
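    # Pin NumPy to the preinstalled version so the install below does not change it
    try:
        import numpy

        get_numpy = f"numpy=={numpy.__version__}"
    except ImportError:
        get_numpy = "numpy"
    # T4 GPUs need pinned vLLM/Triton versions
    try:
        import subprocess

        is_t4 = "Tesla T4" in str(subprocess.check_output(["nvidia-smi"]))
    except Exception:  # nvidia-smi missing or failed: assume no T4
        is_t4 = False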
    get_vllm, get_triton = (
        ("vllm==0.9.2", "triton==3.2.0") if is_t4 else ("vllm", "triton")
    )
    !uv pip install --upgrade \
        "openpipe-art[backend]==0.4.8" langchain-core tenacity datasets "litellm[proxy]" "gql<4" "protobuf==5.29.5" {get_vllm} {get_numpy} --prerelease allow --no-cache-dir
    !uv pip install -qqq {get_triton}

Using Python 3.12.11 environment at: /usr
Resolved 241 packages in 7.47s
Audited 241 packages in 0.23ms
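
The version-pinning decision in the setup cell can be pulled into a small pure function so it can be checked without a GPU. This is a minimal sketch: `pick_vllm_triton_pins` is a hypothetical helper name, and the pinned versions are simply the ones the setup cell uses for T4 GPUs.

```python
def pick_vllm_triton_pins(nvidia_smi_output: str) -> tuple[str, str]:
    """Return (vllm, triton) requirement strings for the detected GPU.

    Hypothetical helper mirroring the setup cell: T4 GPUs get pinned
    versions, every other GPU gets the latest releases.
    """
    if "Tesla T4" in nvidia_smi_output:
        return ("vllm==0.9.2", "triton==3.2.0")
    return ("vllm", "triton")


# In the notebook the string would come from subprocess.check_output(["nvidia-smi"])
print(pick_vllm_triton_pins("| 0  Tesla T4  Off |"))  # ('vllm==0.9.2', 'triton==3.2.0')
print(pick_vllm_triton_pins("| 0  NVIDIA A100-SXM4-40GB  Off |"))  # ('vllm', 'triton')
```
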


### Task Configuration


In [None]:
# Required: used for RULER evaluation (training inputs are read from fake_people.csv)
OPENAI_API_KEY = ""
BASE_URL = "https://pd67dqn1bd.execute-api.eu-west-1.amazonaws.com"  # API base for RULER judge calls

# Optional - Enables metric logging
WANDB_API_KEY = ""

# Choose the base model to train
BASE_MODEL = "Qwen/Qwen3-0.6B"

# Model configuration
MODEL_NAME = "promo-matcher-90-v1"  # Name for your trained model
PROJECT_NAME = "prosus-assignment-colab"  # Project name for tracking

# Training configuration
TRAINING_CONFIG = {
    "num_training_inputs": 100,  # Number of user profiles (from fake_people.csv) used for training
    "num_testing_inputs": 10,  # Number of user profiles held out for testing
    "groups_per_step": 2,  # Inputs to process per training step
    "num_epochs": 3,  # Number of passes over the training data
    "rollouts_per_group": 5,  # Responses sampled per input for RULER to compare; with 10 candidate promos, 5 likely gives more variety than 3
    "learning_rate": 1e-5,  # Learning rate
    "max_training_steps": None,  # Maximum training steps (None for no limit)
}
TEMPERATURE = 0.2  # Low sampling temperature: mostly deterministic, but rollouts in a group can still differ

NUM_TEST_INPUTS = 5  # Number of test inputs (separate from TRAINING_CONFIG["num_testing_inputs"], which sets the held-out split)
RULER_MODEL = "openai/gpt-4.1"  # Model for RULER evaluation
# SYSTEM_PROMPT_GENERATION_MODEL = "openrouter/moonshotai/kimi-k2"
# INPUT_GENERATION_MODEL = "openrouter/moonshotai/kimi-k2"

# GPU configuration (applied only on pre-Ampere GPUs such as the T4; keep these as-is unless you have a reason to change them)
MAX_SEQ_LENGTH = 4096  # Maximum sequence length
GPU_MEMORY_UTILIZATION = 0.7  # GPU memory usage (0.0-1.0)
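
These settings fix the size of the run. Assuming `iterate_dataset` yields `groups_per_step` inputs per step and counts a trailing partial batch as a step (irrelevant here, since 100 divides evenly by 2), the totals work out as below; they match the `0/150` and `0/10` progress-bar totals in the training log. The judge-call count additionally assumes RULER sends one request per group.

```python
import math

# Values copied from TRAINING_CONFIG above
num_training_inputs = 100
groups_per_step = 2
num_epochs = 3
rollouts_per_group = 5

# Assumption: a final partial batch (if any) still counts as one step
steps_per_epoch = math.ceil(num_training_inputs / groups_per_step)
total_steps = steps_per_epoch * num_epochs
rollouts_per_step = groups_per_step * rollouts_per_group
total_rollouts = total_steps * rollouts_per_step
judge_calls = total_steps * groups_per_step  # Assumption: one RULER request per group

print(steps_per_epoch, total_steps, rollouts_per_step, total_rollouts, judge_calls)
# 50 150 10 1500 300
```
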

### Task Description

In [5]:
MATCH_PROMOTION_TASK_DESCRIPTION = """
You are selecting exactly ONE restaurant promotion that maximizes expected user engagement (click/claim/order) for the provided user profile.

Use ONLY these user features for each user in your matching decision:
eat_habit, eat_time_pattern, diet, app_behavior, meal_frequency, snack_frequency,
cooking_skill, activity_level, breakfast_time, lunch_time, dinner_time,
app_usage_hours, notification_preference, goal_oriented, social_features_user.

There are 10 candidate promotions to choose from (IDs are the strings "1".."10" matching the list below):

1. Flash Feast: "Get 50% off all orders between 2 PM and 5 PM on weekdays. This promotion is designed to increase orders during typically slow periods."
2. Mystery Meal Monday: "Let us pick your dinner! Get a surprise main course from a top-rated local restaurant for a fixed price of $10."
3. Snap & Share Sunday: "Post a photo of your meal ordered through our app on social media, tag us, and use our special hashtag for a chance to win a $50 gift card. This encourages user-generated content and social media engagement."
4. Two-for-One Tuesdays: "Buy one main course and get a second one of equal or lesser value for free from participating restaurants."
5. Weekday Lunch Deal: "To attract the weekday lunch crowd, we're offering a special combo: a main dish, a side, and a drink for a discounted price."
6. Refer-a-Friend Rewards: "Invite a friend to the app and you both get $10 off your next order."
7. Themed Dinner Experience: "This week, explore the tastes of Italy with our 'Taste of Tuscany' promotion, featuring exclusive dishes and discounts from local Italian restaurants."
8. Loyalty Program Launch: "Earn points for every dollar you spend. Accumulate 500 points and receive a $15 credit. This encourages repeat business."
9. Late-Night Bites: "Craving a midnight snack? Get free delivery on all orders placed after 10 PM."
10. Family Meal Bundle: "Perfect for a family night in, get two large pizzas, a side of garlic bread, and a 2-liter soda for a special bundled price."

Decision rules:
1) Primary objective: Choose the promo that best fits the user’s habits/preferences and is most likely to be acted on.
2) Time alignment: Respect time/day windows implied by the descriptions.
3) Behavior fit: Map behavioral traits (e.g., social features, referral propensity, routine lunch orders, family needs, cuisine curiosity) to appropriate promos (e.g., #3/#6 social; #5 lunch routine; #10 family; #7 themed cuisine; #8 loyalty for repeat buyers).
4) Diet compatibility: Do not choose promotions that likely conflict with diet.
5) Avoid obviously misaligned picks (e.g., late-night offer for an early-to-bed user).

Scoring guideline (internal): time_alignment (0–3) + behavior_fit (0–3) + diet_compat (0 or 3) + social_fit (0–2 if social_features_user strong) + household_fit (0–2 for family needs). Break ties in that order.

Output requirements (STRICT):
- Return a single JSON object ONLY — no extra text, no markdown, no explanations.
- Format EXACTLY: {"promotion_id":"<1-10>","promotion_name":"<full name above>","reason":"<one sentence, ≤30 words>"}
- "promotion_id" must be the string "1".."10" corresponding to the list above.
- "promotion_name" must match the full promotion name above exactly.
- The "reason" must be concise and reference the most important aligning factors.

If multiple promotions fit, apply the scoring guideline and choose one. If none perfectly fit, select the best non-conflicting option by the same guideline.
"""
SYSTEM_PROMPT = MATCH_PROMOTION_TASK_DESCRIPTION
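
The prompt's internal scoring guideline sums five component scores and breaks ties by comparing the components in the listed order. A minimal sketch of that ranking rule; `PromoScore` and `pick_promo` are hypothetical names, and the component scores below are made up for illustration rather than derived from any user in the dataset.

```python
from typing import NamedTuple


class PromoScore(NamedTuple):
    """Component scores for one candidate promo, per the prompt's scoring guideline."""

    promo_id: str
    time_alignment: int  # 0-3
    behavior_fit: int  # 0-3
    diet_compat: int  # 0 or 3
    social_fit: int  # 0-2, only if social_features_user is strong
    household_fit: int  # 0-2, for family needs

    def total(self) -> int:
        return (
            self.time_alignment
            + self.behavior_fit
            + self.diet_compat
            + self.social_fit
            + self.household_fit
        )


def pick_promo(scores: list[PromoScore]) -> str:
    """Highest total wins; ties are broken component by component in guideline order."""
    best = max(
        scores,
        key=lambda s: (
            s.total(),
            s.time_alignment,
            s.behavior_fit,
            s.diet_compat,
            s.social_fit,
            s.household_fit,
        ),
    )
    return best.promo_id


# Illustrative scores for a hypothetical weekday-lunch user (not from the dataset)
candidates = [
    PromoScore("5", 3, 2, 3, 0, 0),  # Weekday Lunch Deal: total 8
    PromoScore("8", 2, 3, 3, 0, 0),  # Loyalty Program Launch: total 8, weaker time fit
    PromoScore("9", 0, 1, 3, 0, 0),  # Late-Night Bites: total 4
]
print(pick_promo(candidates))  # 5
```

Promos "5" and "8" tie on total, so the first tie-breaker, `time_alignment`, decides in favor of "5".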

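Because the output format is strict, a programmatic check is a cheap complement to RULER's judgment, for example to log format-compliance rates during training. A minimal sketch: `PROMOTIONS` copies the names from the prompt, and `validate_promo_output` is a hypothetical helper, not part of ART. If the model emits reasoning text before the JSON (Qwen3 models can produce a `<think>` block), that text would need stripping first; this sketch does not handle it.

```python
import json

# Promotion names copied from the task description, keyed by ID string
PROMOTIONS = {
    "1": "Flash Feast",
    "2": "Mystery Meal Monday",
    "3": "Snap & Share Sunday",
    "4": "Two-for-One Tuesdays",
    "5": "Weekday Lunch Deal",
    "6": "Refer-a-Friend Rewards",
    "7": "Themed Dinner Experience",
    "8": "Loyalty Program Launch",
    "9": "Late-Night Bites",
    "10": "Family Meal Bundle",
}
REQUIRED_KEYS = {"promotion_id", "promotion_name", "reason"}


def validate_promo_output(text: str) -> list[str]:
    """Return a list of format violations; an empty list means the output is valid."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(obj, dict):
        return ["not a JSON object"]

    errors = []
    if set(obj) != REQUIRED_KEYS:
        errors.append(f"unexpected keys: {sorted(obj)}")

    pid = obj.get("promotion_id")
    if not isinstance(pid, str) or pid not in PROMOTIONS:
        errors.append(f"invalid promotion_id: {pid!r}")
    elif obj.get("promotion_name") != PROMOTIONS[pid]:
        errors.append("promotion_name does not match promotion_id")

    reason = obj.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        errors.append("missing reason")
    elif len(reason.split()) > 30:
        errors.append("reason longer than 30 words")
    return errors


good = '{"promotion_id":"5","promotion_name":"Weekday Lunch Deal","reason":"Orders lunch on weekdays at a regular time."}'
print(validate_promo_output(good))  # []
print(validate_promo_output('{"promotion_id":"11"}'))  # three violations: keys, ID, missing reason
```
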

### Training Data Preparation

In [6]:
import pandas as pd, json

df = pd.read_csv("./fake_people.csv")

PROFILE_COLS = ["eat_habit","eat_time_pattern","diet","app_behavior","meal_frequency",
                "snack_frequency","cooking_skill","activity_level","breakfast_time",
                "lunch_time","dinner_time","app_usage_hours","notification_preference",
                "goal_oriented","social_features_user"]

records = []
training_inputs = []

for _, row in df.iterrows():
    user_id = str(row["id"])
    profile = {k: row[k] for k in PROFILE_COLS}
    inp = json.dumps({"user_profile": profile}, ensure_ascii=False, separators=(",",":"))
    training_inputs.append(inp)
    records.append({"user_id": user_id, "input": inp})

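N_TRAIN = TRAINING_CONFIG["num_training_inputs"]
N_TEST = TRAINING_CONFIG["num_testing_inputs"]
train_records = records[:N_TRAIN]  # First N_TRAIN profiles for training
test_records = records[-N_TEST:]  # Last N_TEST profiles held out for testing

# The two slices are disjoint only if the CSV has at least N_TRAIN + N_TEST rows
if len(records) < N_TRAIN + N_TEST:
    print(f"Warning: only {len(records)} records, so train and test users overlap")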

print(train_records[0])

{'user_id': 'user_001', 'input': '{"user_profile":{"eat_habit":"Emotional eater","eat_time_pattern":"Brunch lover (10 AM-12 PM)","diet":"DASH","app_behavior":"Weekend warrior","meal_frequency":"3 times/day","snack_frequency":"Rarely","cooking_skill":"Expert","activity_level":"Moderately active","breakfast_time":"07:30","lunch_time":"14:00","dinner_time":"18:15","app_usage_hours":"Throughout day","notification_preference":"Off","goal_oriented":true,"social_features_user":false}}'}
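
`records[:N_TRAIN]` and `records[-N_TEST:]` are disjoint only when the CSV has at least `N_TRAIN + N_TEST` rows; with exactly 100 rows, the ten test users would also be training users. (A related Python pitfall: `records[-0:]` returns the whole list, not an empty one.) A minimal sketch of a split that fails loudly instead; `split_records` is a hypothetical helper, and the dummy records below only mimic the `user_id` format.

```python
def split_records(records: list, n_train: int, n_test: int) -> tuple[list, list]:
    """Split into disjoint train/test lists, keeping the last n_test records for testing."""
    if len(records) < n_train + n_test:
        raise ValueError(
            f"need {n_train + n_test} records for disjoint splits, got {len(records)}"
        )
    # Slice from an explicit start index so n_test == 0 yields an empty test set
    return records[:n_train], records[len(records) - n_test:]


# 110 dummy records: user_001 .. user_110
records = [{"user_id": f"user_{i:03d}"} for i in range(1, 111)]
train, test = split_records(records, 100, 10)
print(len(train), len(test), test[0]["user_id"])  # 100 10 user_101
```
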


In [10]:
# Export list of input strings from train_records
import json, os

OUTPUT_DIR = "results"
os.makedirs(OUTPUT_DIR, exist_ok=True)
output_path = os.path.join(OUTPUT_DIR, "train_records.json")

# Convert each stored JSON string into an object to avoid escaped quotes
inputs = [json.loads(r["input"]) for r in train_records]

with open(output_path, "w", encoding="utf-8") as f:
    json.dump(inputs, f, ensure_ascii=False, indent=2)

print(f"Wrote {len(inputs)} inputs to {output_path}")
output_path

Wrote 100 inputs to results\train_records.json


'results\\train_records.json'
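
The export cell stores parsed objects rather than escaped strings. Because `json.loads` preserves key order, re-serializing with the same `ensure_ascii=False, separators=(",", ":")` arguments used in the data-preparation cell should reproduce the original input strings for string and boolean fields like these, so the file can stand in for the CSV later. A small round-trip check using two illustrative, shortened profiles (not rows from `fake_people.csv`) and a temporary file:

```python
import json
import os
import tempfile

# Illustrative inputs in the same compact format as the notebook's training inputs
inputs_as_strings = [
    '{"user_profile":{"diet":"DASH","lunch_time":"14:00","goal_oriented":true}}',
    '{"user_profile":{"diet":"Vegan","lunch_time":"12:30","goal_oriented":false}}',
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train_records.json")
    # Write parsed objects, as the export cell does
    with open(path, "w", encoding="utf-8") as f:
        json.dump([json.loads(s) for s in inputs_as_strings], f, ensure_ascii=False, indent=2)
    # Read back and re-serialize compactly
    with open(path, encoding="utf-8") as f:
        restored = [
            json.dumps(obj, ensure_ascii=False, separators=(",", ":"))
            for obj in json.load(f)
        ]

print(restored == inputs_as_strings)  # True
```
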

### Training the Model

In [None]:
import os
import random
from typing import List

import torch
import weave
from dotenv import load_dotenv
from litellm import acompletion
from pydantic import BaseModel, Field

import art
from art.local import LocalBackend
from art.rewards import ruler_score_group
from art.utils import iterate_dataset
from art.utils.litellm import convert_litellm_choice_to_openai

load_dotenv()

# Silence noisy LiteLLM logging and a Field() 'deprecated' attribute warning
import logging, warnings
logging.getLogger("LiteLLM").setLevel(logging.CRITICAL)
os.environ["LITELLM_LOGGING"] = "False"
os.environ["LITELLM_LOG"] = "False"
warnings.filterwarnings("ignore", message=".*'deprecated' attribute.*Field\\(\\).*")

# Required
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
else:
    raise ValueError("OPENAI_API_KEY is required for RULER evaluation.")

# Optional
if WANDB_API_KEY:
    os.environ["WANDB_API_KEY"] = WANDB_API_KEY
else:
    print("WANDB_API_KEY is not set. We'll skip logging metrics to Weights & Biases.")

# =========== Model Creation Code ===========

random.seed(42)

# Declare the model
model = art.TrainableModel(
    name=MODEL_NAME,
    project=PROJECT_NAME,
    base_model=BASE_MODEL,
)

# GPUs older than Ampere (compute capability < 8, e.g. the T4) need some config overrides.
IS_PRE_AMPERE = torch.cuda.get_device_properties(0).major < 8

if IS_PRE_AMPERE:
    model._internal_config = art.dev.InternalModelConfig(
        init_args=art.dev.InitArgs(
            max_seq_length=MAX_SEQ_LENGTH,
        ),
        engine_args=art.dev.EngineArgs(
            enforce_eager=True,
            gpu_memory_utilization=GPU_MEMORY_UTILIZATION,
        ),
    )

# Initialize the local backend
if IS_PRE_AMPERE:
    backend = LocalBackend(
        in_process=True,
        path="./.art",
    )
else:
    backend = LocalBackend()

# Register the model with the local Backend
await model.register(backend)

print("Model created!")
print("Base model:", BASE_MODEL)
print("Model name:", MODEL_NAME)
print("Project name:", PROJECT_NAME)

# ============ Rollout Function Code =============

if os.getenv("WANDB_API_KEY", ""):
    weave.init(PROJECT_NAME, settings={"print_call_link": False})


class TaskInput(BaseModel):
    step: int
    input_text: str
    user_id: str  # to keep track of user


@weave.op
async def rollout(model: art.Model, task_input: TaskInput) -> art.Trajectory:
    """Execute a single rollout for the custom task"""

    traj = art.Trajectory(
        reward=0.0,
        messages_and_choices=[],
        metadata={
            "step": task_input.step,
            "input": task_input.input_text,
            "user_id": task_input.user_id,
        },
    )

    # Build the conversation
    traj.messages_and_choices = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task_input.input_text},
    ]

    # Get model response
    if model.trainable:
        litellm_model_name = f"hosted_vllm/{model.name}"
    else:
        litellm_model_name = model.name

    response = await acompletion(
        model=litellm_model_name,
        base_url=model.inference_base_url,
        api_key=model.inference_api_key,
        temperature=TEMPERATURE,
        max_tokens=4096,
        messages=traj.messages(),
        caching=False,
    )

    # Add the model's response to the trajectory
    traj.messages_and_choices.append(
        convert_litellm_choice_to_openai(response.choices[0])
    )

    return traj


print("\nRollout function defined!")


# Sanity-check RULER on a toy text-formalization task (unrelated to promo matching) before training
test_input = "hey can u send me the report asap? thx"

base_messages = [
    {"role": "system", "content": "Convert informal text to formal business language."},
    {"role": "user", "content": test_input},
]

good_trajectory = art.Trajectory(
    messages_and_choices=[
        *base_messages,
        {
            "role": "assistant",
            "content": "Could you please send me the report at your earliest convenience? Thank you.",
        },
    ],
    reward=0,
)

mediocre_trajectory = art.Trajectory(
    messages_and_choices=[
        *base_messages,
        {"role": "assistant", "content": "Can you send me the report soon? Thanks."},
    ],
    reward=0,
)

bad_trajectory = art.Trajectory(
    messages_and_choices=[
        *base_messages,
        {"role": "assistant", "content": "hey send report quick thx"},
    ],
    reward=0,
)

sample_group = art.TrajectoryGroup(
    trajectories=[good_trajectory, mediocre_trajectory, bad_trajectory]
)

# RULER will score these based on how well they accomplish the task
# Allow up to ten attempts in case of API rate limiting
judged_group = None
for i in range(10):
    try:
        judged_group = await ruler_score_group(
            sample_group,
            RULER_MODEL,
            debug=True,
            extra_litellm_params={"api_base": BASE_URL},
        )
        break
    except Exception as e:
        print(f"Error scoring group: {e}")
        continue

assert judged_group is not None, "RULER scoring failed after 10 attempts"

# Display rankings
sorted_trajectories = sorted(
    judged_group.trajectories, key=lambda t: t.reward, reverse=True
)
for rank, traj in enumerate(sorted_trajectories, 1):
    messages = traj.messages()
    print(f"\nRank {rank}: Score {traj.reward:.3f}")
    print(f"  Response: {messages[-1]['content']}")


# ============ Training Loop =============

# Convert training inputs to TaskInput objects
training_task_inputs = [
    TaskInput(step=0, input_text=r["input"], user_id=r["user_id"])
    for r in train_records
]

# Create training iterator
training_iterator = iterate_dataset(
    training_task_inputs,
    groups_per_step=TRAINING_CONFIG["groups_per_step"],
    num_epochs=TRAINING_CONFIG["num_epochs"],
    initial_step=await model.get_step(),
)

print(f"Starting training with {len(training_task_inputs)} inputs...")
print(f"Training for {TRAINING_CONFIG['num_epochs']} epoch(s)")
print(
    f"Generating {TRAINING_CONFIG['rollouts_per_group']} responses per input for RULER to compare"
)
print(
    "\nWhy multiple responses? RULER needs to compare different attempts to learn what's good!"
)

for batch in training_iterator:
    print(
        f"\nTraining step {batch.step}, epoch {batch.epoch}, epoch step {batch.epoch_step}"
    )
    print(f"Batch contains {len(batch.items)} inputs")

    # Create trajectory groups for this batch
    groups = []
    for task_input in batch.items:
        # Update step number
        task_input.step = batch.step

        # Generate multiple responses for each input (RULER will compare these)
        groups.append(
            art.TrajectoryGroup(
                (
                    rollout(model, task_input)
                    for _ in range(TRAINING_CONFIG["rollouts_per_group"])
                )
            )
        )

    # Gather all trajectory groups
    finished_groups = await art.gather_trajectory_groups(
        groups,
        pbar_desc="Generating responses",
        max_exceptions=TRAINING_CONFIG["rollouts_per_group"] * len(batch.items),
    )

    # Use RULER to score each group
    judged_groups = []
    for group in finished_groups:
        # Allow up to ten attempts in case of API rate limiting
        judged_group = None
        for i in range(10):
            try:
                judged_group = await ruler_score_group(
                    group,
                    RULER_MODEL,
                    debug=False,
                    extra_litellm_params={"api_base": BASE_URL},
                )
                break
            except Exception as e:
                print(f"Error scoring group: {e}")
                continue
        assert judged_group is not None
        judged_groups.append(judged_group)

    # Train on the scored trajectories
    await model.delete_checkpoints()
    await model.train(
        judged_groups,
        config=art.TrainConfig(learning_rate=TRAINING_CONFIG["learning_rate"]),
        _config={"logprob_calculation_chunk_size": 8},
    )

    print(f"Completed training step {batch.step}")

    # Stop after configured steps (if limit is set)
    if (
        TRAINING_CONFIG["max_training_steps"]
        and batch.step >= TRAINING_CONFIG["max_training_steps"]
    ):
        print(
            f"Reached maximum training steps ({TRAINING_CONFIG['max_training_steps']})"
        )
        break

print("\n✅ Training completed!")
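
Both RULER retry loops above retry immediately, so a rate-limited endpoint is hit repeatedly in quick succession. A minimal sketch of an async retry helper with capped exponential backoff and jitter; `retry_async` is a hypothetical helper and the delay defaults are arbitrary. The `tenacity` package installed in the setup cell offers the same pattern ready-made.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_async(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await make_call(), retrying on any exception with capped exponential backoff.

    make_call must create a fresh awaitable on every invocation (e.g. a lambda),
    because a coroutine object can only be awaited once.
    """
    for attempt in range(attempts):
        try:
            return await make_call()
        except Exception as e:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            # base_delay * 1, 2, 4, ... capped at max_delay, with jitter to spread out retries
            delay = min(max_delay, base_delay * 2**attempt) * random.uniform(0.5, 1.0)
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

In the training loop, the scoring block could then become `judged_group = await retry_async(lambda: ruler_score_group(group, RULER_MODEL, debug=False, extra_litellm_params={"api_base": BASE_URL}))`. The lambda matters: passing an already-created coroutine would fail on the second attempt.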

| Metric | Value |
|---|---|
| train/completion_tokens | 885.7 |
| train/entropy | 0.66331 |
| train/exception_rate | 0.0 |
| train/grad_norm | 0.95362 |
| train/independent_reward | 0.0 |
| train/loss | -0.27243 |
| train/num_groups_submitted | 2.0 |
| train/num_groups_trainable | 2.0 |
| train/policy_loss | -0.27245 |
| train/reward | 0.683 |


Model created!
Base model: Qwen/Qwen3-0.6B
Model name: promo-matcher-90-v1
Project name: prosus-assignment-colab

Rollout function defined!



Rank 1: Score 1.000
  Response: Could you please send me the report at your earliest convenience? Thank you.

Rank 2: Score 0.700
  Response: Can you send me the report soon? Thanks.

Rank 3: Score 0.000
  Response: hey send report quick thx
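
The RULER ranking above is relative: as we understand ART's RULER, the judge scores all trajectories in a group together, which is why each input gets several rollouts. GRPO-style training then typically turns rewards into group-relative advantages by subtracting the group mean and often dividing by the standard deviation. A minimal sketch of that general technique applied to the three scores above; `group_advantages` is a hypothetical helper, and ART's exact normalization may differ.

```python
import statistics


def group_advantages(rewards: list[float], scale: bool = True) -> list[float]:
    """Group-relative advantages: reward minus group mean, optionally divided by the std."""
    mean = statistics.fmean(rewards)
    centered = [r - mean for r in rewards]
    if not scale:
        return centered
    std = statistics.pstdev(rewards)
    if std == 0:
        # Identical rewards: this group carries no learning signal
        return [0.0 for _ in rewards]
    return [c / std for c in centered]


# RULER scores from the sanity check: good, mediocre, bad
advantages = group_advantages([1.0, 0.7, 0.0])
print([round(a, 2) for a in advantages])  # [1.03, 0.32, -1.35]
```

Under this scheme a low `TEMPERATURE` has a cost: if all rollouts for an input pick the same promo and receive the same score, the group's advantages are all zero and it contributes nothing to the update.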
Starting training with 100 inputs...
Training for 3 epoch(s)
Generating 5 responses per input for RULER to compare

Why multiple responses? RULER needs to compare different attempts to learn what's good!


Iterating dataset:   0%|          | 0/150 [00:00<?, ?batch/s]


Training step 0, epoch 0, epoch step 0
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

"./.art/prosus-assignment-colab/models/promo-matcher-90-v1/history.jsonl" not found



Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]



Completed training step 0

Training step 1, epoch 0, epoch step 1
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0000
Packed 10 trajectories into 7 sequences of length 6144


train:   0%|          | 0/7 [00:00<?, ?it/s]

Completed training step 1

Training step 2, epoch 0, epoch step 2
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0001
Packed 9 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 2

Training step 3, epoch 0, epoch step 3
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0002
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 3

Training step 4, epoch 0, epoch step 4
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0003
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 4

Training step 5, epoch 0, epoch step 5
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0004
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 5

Training step 6, epoch 0, epoch step 6
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]



No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0005
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 6

Training step 7, epoch 0, epoch step 7
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0006
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 7

Training step 8, epoch 0, epoch step 8
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0007
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 8

Training step 9, epoch 0, epoch step 9
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0008
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 9

Training step 10, epoch 0, epoch step 10
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0009
Packed 9 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 10

Training step 11, epoch 0, epoch step 11
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0010
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 11

Training step 12, epoch 0, epoch step 12
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0011
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 12

Training step 13, epoch 0, epoch step 13
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0012
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 13

Training step 14, epoch 0, epoch step 14
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0013
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 14

Training step 15, epoch 0, epoch step 15
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0014
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 15

Training step 16, epoch 0, epoch step 16
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0015
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 16

Training step 17, epoch 0, epoch step 17
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0016
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 17

Training step 18, epoch 0, epoch step 18
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0017
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 18

Training step 19, epoch 0, epoch step 19
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0018
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 19

Training step 20, epoch 0, epoch step 20
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0019
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 20

Training step 21, epoch 0, epoch step 21
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0020
Packed 9 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 21

Training step 22, epoch 0, epoch step 22
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0021
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 22

Training step 23, epoch 0, epoch step 23
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0022
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 23

Training step 24, epoch 0, epoch step 24
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0023
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 24

Training step 25, epoch 0, epoch step 25
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0024
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 25

Training step 26, epoch 0, epoch step 26
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0025
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 26

Training step 27, epoch 0, epoch step 27
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0026
Packed 10 trajectories into 5 sequences of length 4096


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 27

Training step 28, epoch 0, epoch step 28
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0027
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 28

Training step 29, epoch 0, epoch step 29
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0028
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 29

Training step 30, epoch 0, epoch step 30
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0029
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 30

Training step 31, epoch 0, epoch step 31
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0030
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 31

Training step 32, epoch 0, epoch step 32
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0031
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 32

Training step 33, epoch 0, epoch step 33
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0032
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 33

Training step 34, epoch 0, epoch step 34
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0033
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 34

Training step 35, epoch 0, epoch step 35
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0034
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 35

Training step 36, epoch 0, epoch step 36
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0035
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 36

Training step 37, epoch 0, epoch step 37
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0036
Packed 10 trajectories into 7 sequences of length 6144
Completed training step 37

Training step 38, epoch 0, epoch step 38
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0037
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 38

Training step 39, epoch 0, epoch step 39
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0038
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 39

Training step 40, epoch 0, epoch step 40
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0039
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 40

Training step 41, epoch 0, epoch step 41
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0040
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 41

Training step 42, epoch 0, epoch step 42
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0041
Packed 10 trajectories into 3 sequences of length 6144
Completed training step 42

Training step 43, epoch 0, epoch step 43
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0042
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 43

Training step 44, epoch 0, epoch step 44
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0043
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 44

Training step 45, epoch 0, epoch step 45
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0044
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 45

Training step 46, epoch 0, epoch step 46
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0045
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 46

Training step 47, epoch 0, epoch step 47
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0046
Packed 10 trajectories into 4 sequences of length 4096
Completed training step 47

Training step 48, epoch 0, epoch step 48
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0047
Packed 10 trajectories into 7 sequences of length 2048
Completed training step 48

Training step 49, epoch 0, epoch step 49
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0048
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 49

Training step 50, epoch 1, epoch step 0
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0049
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 50

Training step 51, epoch 1, epoch step 1
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0050
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 51

Training step 52, epoch 1, epoch step 2
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0051
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 52

Training step 53, epoch 1, epoch step 3
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0052
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 53

Training step 54, epoch 1, epoch step 4
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0053
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 54

Training step 55, epoch 1, epoch step 5
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0054
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 55

Training step 56, epoch 1, epoch step 6
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0055
Packed 10 trajectories into 4 sequences of length 4096
Completed training step 56

Training step 57, epoch 1, epoch step 7
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0056
Packed 9 trajectories into 5 sequences of length 6144
Completed training step 57

Training step 58, epoch 1, epoch step 8
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0057
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 58

Training step 59, epoch 1, epoch step 9
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0058
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 59

Training step 60, epoch 1, epoch step 10
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0059
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 60

Training step 61, epoch 1, epoch step 11
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0060
Packed 9 trajectories into 5 sequences of length 6144
Completed training step 61

Training step 62, epoch 1, epoch step 12
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0061
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 62

Training step 63, epoch 1, epoch step 13
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0062
Packed 9 trajectories into 6 sequences of length 6144
Completed training step 63

Training step 64, epoch 1, epoch step 14
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0063
Packed 9 trajectories into 6 sequences of length 6144
Completed training step 64

Training step 65, epoch 1, epoch step 15
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0064
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 65

Training step 66, epoch 1, epoch step 16
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0065
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 66

Training step 67, epoch 1, epoch step 17
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0066
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 67

Training step 68, epoch 1, epoch step 18
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0067
Packed 9 trajectories into 5 sequences of length 6144
Completed training step 68

Training step 69, epoch 1, epoch step 19
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0068
Packed 10 trajectories into 7 sequences of length 6144
Completed training step 69

Training step 70, epoch 1, epoch step 20
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0069
Packed 10 trajectories into 9 sequences of length 6144
Completed training step 70

Training step 71, epoch 1, epoch step 21
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0070
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 71

Training step 72, epoch 1, epoch step 22
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0071
Packed 9 trajectories into 7 sequences of length 6144
Completed training step 72

Training step 73, epoch 1, epoch step 23
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0072
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 73

Training step 74, epoch 1, epoch step 24
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0073
Packed 10 trajectories into 10 sequences of length 6144
Completed training step 74

Training step 75, epoch 1, epoch step 25
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0074
Packed 10 trajectories into 9 sequences of length 6144
Completed training step 75

Training step 76, epoch 1, epoch step 26
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0075
Packed 10 trajectories into 10 sequences of length 6144
Completed training step 76

Training step 77, epoch 1, epoch step 27
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0076
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 77

Training step 78, epoch 1, epoch step 28
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0077
Packed 10 trajectories into 9 sequences of length 6144
Completed training step 78

Training step 79, epoch 1, epoch step 29
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0078
Packed 10 trajectories into 9 sequences of length 6144
Completed training step 79

Training step 80, epoch 1, epoch step 30
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0079
Packed 10 trajectories into 10 sequences of length 6144
Completed training step 80

Training step 81, epoch 1, epoch step 31
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0080
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 81

Training step 82, epoch 1, epoch step 32
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0081
Packed 9 trajectories into 7 sequences of length 6144
Completed training step 82

Training step 83, epoch 1, epoch step 33
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0082
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 83

Training step 84, epoch 1, epoch step 34
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0083
Packed 10 trajectories into 9 sequences of length 6144
Completed training step 84

Training step 85, epoch 1, epoch step 35
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0084
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 85

Training step 86, epoch 1, epoch step 36
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0085
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 86

Training step 87, epoch 1, epoch step 37
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0086
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 87

Training step 88, epoch 1, epoch step 38
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0087
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 88

Training step 89, epoch 1, epoch step 39
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0088
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 89

Training step 90, epoch 1, epoch step 40
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0089
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 90

Training step 91, epoch 1, epoch step 41
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0090
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 91

Training step 92, epoch 1, epoch step 42
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0091
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 92

Training step 93, epoch 1, epoch step 43
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0092
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 93

Training step 94, epoch 1, epoch step 44
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0093
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 94

Training step 95, epoch 1, epoch step 45
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0094
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 95

Training step 96, epoch 1, epoch step 46
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0095
Packed 10 trajectories into 6 sequences of length 6144
Completed training step 96

Training step 97, epoch 1, epoch step 47
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0096
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 97

Training step 98, epoch 1, epoch step 48
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0097
Packed 10 trajectories into 7 sequences of length 4096
Completed training step 98

Training step 99, epoch 1, epoch step 49
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0098
Packed 10 trajectories into 8 sequences of length 6144
Completed training step 99

Training step 100, epoch 2, epoch step 0
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0099
Packed 10 trajectories into 4 sequences of length 4096
Completed training step 100

Training step 101, epoch 2, epoch step 1
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0100
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 101

Training step 102, epoch 2, epoch step 2
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0101
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 102

Training step 103, epoch 2, epoch step 3
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0102
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 103

Training step 104, epoch 2, epoch step 4
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0103
Packed 10 trajectories into 3 sequences of length 6144
Completed training step 104

Training step 105, epoch 2, epoch step 5
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0104
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 105

Training step 106, epoch 2, epoch step 6
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0105
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 106

Training step 107, epoch 2, epoch step 7
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0106
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 107

Training step 108, epoch 2, epoch step 8
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0107
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 108

Training step 109, epoch 2, epoch step 9
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0108
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 109

Training step 110, epoch 2, epoch step 10
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0109
Packed 10 trajectories into 4 sequences of length 4096
Completed training step 110

Training step 111, epoch 2, epoch step 11
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0110
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 111

Training step 112, epoch 2, epoch step 12
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0111
Packed 10 trajectories into 5 sequences of length 4096
Completed training step 112

Training step 113, epoch 2, epoch step 13
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0112
Packed 10 trajectories into 5 sequences of length 6144
Completed training step 113

Training step 114, epoch 2, epoch step 14
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0113
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 114

Training step 115, epoch 2, epoch step 15
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0114
Packed 10 trajectories into 6 sequences of length 4096
Completed training step 115

Training step 116, epoch 2, epoch step 16
Batch contains 2 inputs
No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0115
Packed 10 trajectories into 4 sequences of length 6144
Completed training step 116

Training step 117, epoch 2, epoch step 17
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0116
Packed 10 trajectories into 5 sequences of length 4096


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 117

Training step 118, epoch 2, epoch step 18
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0117
Packed 10 trajectories into 7 sequences of length 6144


train:   0%|          | 0/7 [00:00<?, ?it/s]

Completed training step 118

Training step 119, epoch 2, epoch step 19
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0118
Packed 10 trajectories into 5 sequences of length 4096


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 119

Training step 120, epoch 2, epoch step 20
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0119
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 120

Training step 121, epoch 2, epoch step 21
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0120
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 121

Training step 122, epoch 2, epoch step 22
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0121
Packed 10 trajectories into 7 sequences of length 4096


train:   0%|          | 0/7 [00:00<?, ?it/s]

Completed training step 122

Training step 123, epoch 2, epoch step 23
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0122
Packed 10 trajectories into 6 sequences of length 4096


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 123

Training step 124, epoch 2, epoch step 24
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0123
Packed 10 trajectories into 6 sequences of length 4096


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 124

Training step 125, epoch 2, epoch step 25
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0124
Packed 10 trajectories into 5 sequences of length 4096


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 125

Training step 126, epoch 2, epoch step 26
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0125
Packed 10 trajectories into 8 sequences of length 4096


train:   0%|          | 0/8 [00:00<?, ?it/s]

Completed training step 126

Training step 127, epoch 2, epoch step 27
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0126
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 127

Training step 128, epoch 2, epoch step 28
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0127
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 128

Training step 129, epoch 2, epoch step 29
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0128
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 129

Training step 130, epoch 2, epoch step 30
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0129
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 130

Training step 131, epoch 2, epoch step 31
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0130
Packed 10 trajectories into 10 sequences of length 2048


train:   0%|          | 0/10 [00:00<?, ?it/s]

Completed training step 131

Training step 132, epoch 2, epoch step 32
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0131
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 132

Training step 133, epoch 2, epoch step 33
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0132
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 133

Training step 134, epoch 2, epoch step 34
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0133
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 134

Training step 135, epoch 2, epoch step 35
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0134
Packed 10 trajectories into 4 sequences of length 4096


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 135

Training step 136, epoch 2, epoch step 36
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0135
Packed 10 trajectories into 6 sequences of length 4096


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 136

Training step 137, epoch 2, epoch step 37
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0136
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 137

Training step 138, epoch 2, epoch step 38
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0137
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 138

Training step 139, epoch 2, epoch step 39
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0138
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 139

Training step 140, epoch 2, epoch step 40
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0139
Packed 10 trajectories into 10 sequences of length 2048


train:   0%|          | 0/10 [00:00<?, ?it/s]

Completed training step 140

Training step 141, epoch 2, epoch step 41
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0140
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 141

Training step 142, epoch 2, epoch step 42
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0141
Packed 10 trajectories into 7 sequences of length 4096


train:   0%|          | 0/7 [00:00<?, ?it/s]

Completed training step 142

Training step 143, epoch 2, epoch step 43
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0142
Packed 10 trajectories into 5 sequences of length 6144


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 143

Training step 144, epoch 2, epoch step 44
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0143
Packed 10 trajectories into 4 sequences of length 6144


train:   0%|          | 0/4 [00:00<?, ?it/s]

Completed training step 144

Training step 145, epoch 2, epoch step 45
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0144
Packed 10 trajectories into 6 sequences of length 6144


train:   0%|          | 0/6 [00:00<?, ?it/s]

Completed training step 145

Training step 146, epoch 2, epoch step 46
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0145
Packed 10 trajectories into 5 sequences of length 4096


train:   0%|          | 0/5 [00:00<?, ?it/s]

Completed training step 146

Training step 147, epoch 2, epoch step 47
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0146
Packed 10 trajectories into 3 sequences of length 6144


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 147

Training step 148, epoch 2, epoch step 48
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0147
Packed 10 trajectories into 3 sequences of length 4096


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 148

Training step 149, epoch 2, epoch step 49
Batch contains 2 inputs


Generating responses:   0%|          | 0/10 [00:00<?, ?it/s]

No "val/reward" metric found in history
Deleted checkpoint ./.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0148
Packed 5 trajectories into 3 sequences of length 4096


train:   0%|          | 0/3 [00:00<?, ?it/s]

Completed training step 149

✅ Training completed!
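
As a sanity check, the step counts in the log follow from `TRAINING_CONFIG`:

- 100 training inputs in groups of 2 give 50 steps per epoch.
- 3 epochs give 150 steps, numbered 0 to 149. The last log entry reads "Training step 149, epoch 2, epoch step 49".
- Each step samples `groups_per_step × rollouts_per_group` = 10 trajectories, which matches "Packed 10 trajectories".

The sketch below reproduces that arithmetic. `schedule` and `global_step` are hypothetical helpers. Rounding a partial final batch up to a full step is an assumption, and it does not matter here because 100 divides evenly by 2.

```python
import math


def schedule(cfg: dict) -> dict:
    """Derive step counts from a TRAINING_CONFIG-style dict (hypothetical helper)."""
    # Assumption: a partial final batch still counts as one step.
    steps_per_epoch = math.ceil(cfg["num_training_inputs"] / cfg["groups_per_step"])
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * cfg["num_epochs"],
        # Each group is one input sampled rollouts_per_group times.
        "trajectories_per_step": cfg["groups_per_step"] * cfg["rollouts_per_group"],
    }


def global_step(sched: dict, epoch: int, epoch_step: int) -> int:
    """Map the log's (epoch, epoch step) pair back to the 0-based global step."""
    return epoch * sched["steps_per_epoch"] + epoch_step


if __name__ == "__main__":
    cfg = {"num_training_inputs": 100, "groups_per_step": 2, "num_epochs": 3, "rollouts_per_group": 5}
    s = schedule(cfg)
    print(s)  # {'steps_per_epoch': 50, 'total_steps': 150, 'trajectories_per_step': 10}
    print(global_step(s, 2, 49))  # 149
```

In the notebook, `schedule(TRAINING_CONFIG)` works directly, since the extra keys are ignored.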


### Download the LoRA checkpoint

In [None]:
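# Path to the folder to archive (adjust if needed).
# NOTE: this path is the cached 4-bit base model in the Hugging Face cache, not the
# trained LoRA adapter; the evaluation below loads the adapter from
# /content/.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0150
CKPT_DIR = "/root/.cache/huggingface/hub/models--unsloth--qwen3-0.6b-bnb-4bit"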

# Create a compressed archive
!tar -czf /content/promo-matcher-base-model.tar.gz -C "$(dirname "$CKPT_DIR")" "$(basename "$CKPT_DIR")"

# Download to your machine
from google.colab import files
files.download("/content/promo-matcher-base-model.tar.gz")
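
To download the trained adapter rather than the base model, the archive should point at the checkpoint directory instead. Below is a minimal sketch using the standard-library `tarfile` module. `archive_dir` is a hypothetical helper. Like the `tar -C` call above, it stores only the folder's own name inside the archive.

```python
import tarfile
from pathlib import Path


def archive_dir(src: str, dest: str) -> Path:
    """Pack the directory `src` into a gzip-compressed tarball at `dest`."""
    src_path = Path(src)
    if not src_path.is_dir():
        # Fail early instead of producing an empty or misleading archive.
        raise FileNotFoundError(f"No checkpoint directory at {src_path}")
    with tarfile.open(str(dest), "w:gz") as tar:
        # arcname keeps only the last path component, like `tar -C <parent> <name>`.
        tar.add(str(src_path), arcname=src_path.name)
    return Path(dest)
```

In Colab it could be called with the adapter path that the evaluation cell uses, `/content/.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0150`. The output name is up to you; `/content/promo-matcher-lora.tar.gz` is a hypothetical example. Then pass the result to `files.download` as above.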

### Model Evaluation

In [None]:
import pandas as pd, json

df = pd.read_csv("./fake_people_shuffled.csv")

PROFILE_COLS = ["eat_habit","eat_time_pattern","diet","app_behavior","meal_frequency",
                "snack_frequency","cooking_skill","activity_level","breakfast_time",
                "lunch_time","dinner_time","app_usage_hours","notification_preference",
                "goal_oriented","social_features_user"]

records = []
training_inputs = []

for _, row in df.iterrows():
    user_id = str(row["id"])
    profile = {k: row[k] for k in PROFILE_COLS}
    inp = json.dumps({"user_profile": profile}, ensure_ascii=False, separators=(",",":"))
    training_inputs.append(inp)
    records.append({"user_id": user_id, "input": inp})

N_TRAIN = TRAINING_CONFIG["num_training_inputs"]
test_records = records[:N_TRAIN]  # first N_TRAIN profiles, i.e. the same rows collected in training_inputs
print(test_records[0])

{'user_id': 'user_001', 'input': '{"user_profile":{"eat_habit":"Mindful eater","eat_time_pattern":"Standard (7-9 AM breakfast)","diet":"Paleo","app_behavior":"Quick checker","meal_frequency":"3 times/day","snack_frequency":"Rarely","cooking_skill":"Expert","activity_level":"Lightly active","breakfast_time":"07:00","lunch_time":"14:15","dinner_time":"19:30","app_usage_hours":"Night","notification_preference":"High","goal_oriented":false,"social_features_user":true}}'}
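
`TRAINING_CONFIG` also defines `num_testing_inputs`, which this cell does not use, so `test_records` covers the first `N_TRAIN` profiles. For a held-out comparison, a minimal sketch is to take the rows after the training slice. This assumes the training inputs were exactly the first `num_training_inputs` rows of `fake_people_shuffled.csv`, which this section does not confirm. `to_input` and `split_records` are hypothetical helpers that serialize profiles the same way as the cell above.

```python
import json

import pandas as pd


def to_input(row: pd.Series, cols: list[str]) -> str:
    """Serialize one profile row to the compact JSON string used as the user message."""
    # .item() turns NumPy scalars (e.g. numpy.bool_) into plain Python values json can encode.
    profile = {k: (row[k].item() if hasattr(row[k], "item") else row[k]) for k in cols}
    return json.dumps({"user_profile": profile}, ensure_ascii=False, separators=(",", ":"))


def split_records(df: pd.DataFrame, cols: list[str], n_train: int, n_test: int):
    """Return (train_slice, held_out_slice), assuming the first n_train rows were used for training."""
    records = [{"user_id": str(r["id"]), "input": to_input(r, cols)} for _, r in df.iterrows()]
    return records[:n_train], records[n_train:n_train + n_test]
```

In the notebook this would be `split_records(df, PROFILE_COLS, TRAINING_CONFIG["num_training_inputs"], TRAINING_CONFIG["num_testing_inputs"])`.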

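
The comparison cell's outputs show that Qwen3 replies start with a `<think>` reasoning block. A reply that hits `MAX_TOKENS` can also stop before that block closes. Comparing the two models is easier on the final answer alone. A small helper can separate the reasoning from the answer and flag unterminated blocks. This sketch assumes the `<think>...</think>` tag format visible in the outputs; `split_think` is a hypothetical name.

```python
import re

# Non-greedy match across newlines for the first <think>...</think> block.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> tuple[str, str, bool]:
    """Return (reasoning, answer, complete).

    complete is False when a <think> block was opened but never closed,
    e.g. because generation stopped at the token limit.
    """
    match = _THINK_RE.search(text)
    if match:
        return match.group(1).strip(), text[match.end():].strip(), True
    stripped = text.lstrip()
    if stripped.startswith("<think>"):
        return stripped[len("<think>"):].strip(), "", False
    # No reasoning block at all: the whole text is the answer.
    return "", text.strip(), True
```

After the merge it could be applied column-wise, e.g. `out["ft_answer"] = out["ft_response"].map(lambda s: split_think(s)[1])`.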

In [None]:
import pandas as pd
import art
from art.local import LocalBackend

# --- config ---
BASE_MODEL_NAME = BASE_MODEL
OUTPUT_CSV = "/content/promo_comparison_v0_all_shuffled.csv"
MAX_TOKENS = 2048
TEMPERATURE = 0.2
TIMEOUT = 60

def make_ask(client):
    async def ask(model_name: str, system_text: str, user_text: str) -> str:
        resp = await client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_text},
                {"role": "user", "content": user_text},
            ],
            max_tokens=MAX_TOKENS,
            temperature=TEMPERATURE,
            timeout=TIMEOUT,
        )
        return (resp.choices[0].message.content or "").strip()
    return ask

# ===== PASS 1: BASE MODEL =====
base_rows = []
with LocalBackend() as backend:                         # starts server; auto-shutdown on exit
    base_model = art.TrainableModel(
        name="qwen-base",
        project="promo-matcher-comparison",
        base_model=BASE_MODEL_NAME,
    )
    # must register before using the OpenAI-compatible client
    await base_model.register(backend)                  # sets up routing to this backend
    base_client = base_model.openai_client()            # OpenAI-style client bound to this backend
    ask_base = make_ask(base_client)

    for i, r in enumerate(test_records):
        user_text = r["input"]
        user_id = r.get("user_id", "")
        base_resp = await ask_base(base_model.name, SYSTEM_PROMPT, user_text)
        print(f"Testing base model {i}")
        print(base_resp)
        base_rows.append({
            "id": i,
            "user_id": user_id,
            "input": user_text,
            "base_response": base_resp,
        })

base_df = pd.DataFrame(base_rows)

# ===== PASS 2: FINE-TUNED MODEL =====
ft_rows = []
with LocalBackend() as backend:                         # new clean backend for the FT pass
    ft_model = art.TrainableModel(
        name="promo-matcher-90-v0",
        project="promo-matcher-comparison",
        base_model="/content/.art/prosus-assignment-colab/models/promo-matcher-90-v1/checkpoints/0150",     # your local LoRA adapter dir
    )
    await ft_model.register(backend)
    ft_client = ft_model.openai_client()
    ask_ft = make_ask(ft_client)

    for i, r in enumerate(test_records):
        ft_resp = await ask_ft(ft_model.name, SYSTEM_PROMPT, r["input"])
        print(f"Testing ft model {i}")
        print(ft_resp)
        ft_rows.append({"id": i, "ft_response": ft_resp})

ft_df = pd.DataFrame(ft_rows)

# ===== MERGE & SAVE =====
out = base_df.merge(ft_df, on="id", how="left")
out.to_csv(OUTPUT_CSV, index=False)
print(f"Saved: {OUTPUT_CSV}")

Testing base model 0
<think>
Okay, let's see. The user has a specific profile, and I need to pick the best promotion. Let me start by checking the given features.

First, the user's habits: they have a mindful eater, eat_time_pattern is standard (7-9 AM breakfast), diet is Paleo, app_behavior is quick checker, meal_frequency is 3 times/day, snack_frequency is rare, cooking_skill is expert, activity_level is light, breakfast_time is 7:00, lunch 14:15, dinner 19:30, app_usage_hours is night, notification_preference is high, goal_oriented is false, social_features_user is true.

Now, looking at the promotions:

1. Flash Feast: "Get 50% off all orders between 2 PM and 5 PM on weekdays. This promotion is designed to increase orders during typically slow periods." Time window is 2 PM to 5 PM, but the user's time is 7-9 AM. Not matching.

2. Mystery Meal Monday: "Let us pick your dinner! Get a surprise main course from a top-rated local restaurant for a fixed price of $10." Themed dinner, but

ERROR:asyncio:Task was destroyed but it is pending!
task: <Task cancelling name='Task-7192' coro=<Event.wait() running at /usr/lib/python3.12/asyncio/locks.py:212> wait_for=<Future cancelled>>


ERROR:asyncio:Exception in callback LocalBackend._prepare_backend_for_training.<locals>.done_callback(<Task finishe...> is closed')>) at /usr/local/lib/python3.12/dist-packages/art/local/backend.py:278
handle: <Handle LocalBackend._prepare_backend_for_training.<locals>.done_callback(<Task finishe...> is closed')>) at /usr/local/lib/python3.12/dist-packages/art/local/backend.py:278>
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/dist-packages/art/local/backend.py", line 279, in done_callback
    close_proxy(self._services.pop(model.name))
  File "/usr/local/lib/python3.12/dist-packages/mp_actors/move.py", line 60, in close_proxy
    getattr(proxy, "close", lambda: None)()
  File "/usr/local/lib/python3.12/dist-packages/mp_actors/move.py", line 209, in close
    self._responses.put_nowait(Response(_SHUTDOWN_ID, None, None))
  File "/usr/lib/python3.12/

Testing ft model 0
<think>
Okay, let's see. The user has a specific profile with certain habits and preferences. The goal is to select the promotion that maximizes expected user engagement. Let me check the user's data again.

The user's eat_habit is a mindful eater, eat_time_pattern is standard (7-9 AM breakfast), diet is Paleo, app_behavior is quick checker, meal_frequency is 3 times/day, snack_frequency is rarely, cooking_skill is expert, activity_level is lightly active, breakfast_time is 07:00, lunch_time is 14:15, dinner_time is 19:30, app_usage_hours is night, notification_preference is high, goal_oriented is false, social_features_user is true.

Now, looking at the promotions. Let's check each one. 

Promotion 1: Flash Feast is about getting 50% off between 2 PM and 5 PM. The user's time is 07:00, so that's late, maybe not. 

Promotion 2: Mystery Meal Monday is a dinner promotion. The user's diet is Paleo, so maybe that's a fit. 

Promotion 3: Snap & Share Sunday is posting pho