Quick sanity-check for prompt construction and gold JSON formatting.

- Loads a few examples from the NYT Connections dataset (Hugging Face `datasets`).
- Builds the teacher/student prompt using `src/prompts.py`.
- Builds the gold answer JSON (categories + words).

> If the dataset name/config differs on your machine, change `DATASET_NAME` and/or `DATASET_CONFIG` below.

In [1]:
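# Add ./src to sys.path. This assumes the notebook is started from the repo root.
# If it is started inside src/ (as in the output below), SRC resolves to src/src,
# which is probably not the intended directory.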
import sys
from pathlib import Path

ROOT = Path.cwd()
SRC = ROOT / "src"
if str(SRC) not in sys.path:
    sys.path.insert(0, str(SRC))

print("Repo root:", ROOT)
print("Using src:", SRC)


Repo root: c:\Users\cola0\Desktop\nlp.project-colangelo-2526\src
Using src: c:\Users\cola0\Desktop\nlp.project-colangelo-2526\src\src
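
The output above shows the notebook being started inside `src/`, so `SRC` ends up as `src\src`. A minimal sketch of a more tolerant lookup follows, assuming `prompts.py` sits directly in `src/` (as the intro suggests). `find_src_dir` and its `marker` parameter are hypothetical names and are not part of the repo.

```python
from pathlib import Path


def find_src_dir(start: Path, marker: str = "prompts.py") -> Path:
    """Return the directory that holds `marker`.

    Checks `start` itself first (notebook started inside src/),
    then `start / "src"` (notebook started from the repo root).
    """
    for candidate in (start, start / "src"):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(
        f"{marker} not found in {start} or {start / 'src'}"
    )
```

With this helper, the cell could use `SRC = find_src_dir(Path.cwd())` before inserting `str(SRC)` into `sys.path`, and it would fail loudly if neither location contains the module.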


In [2]:
from prompts import (
    ConnectionsPromptConfig,
    build_teacher_prompt,
    gold_answer_json,
)


In [3]:
# Dataset loading
from datasets import load_dataset

# TODO: set these to the dataset id/config you are using.
# The schema includes fields such as: words, answers, difficulty
DATASET_NAME = "tm21cy/NYT-Connections"  # <-- change if needed
DATASET_CONFIG = None  # e.g., "default" or similar; keep None if not required

split = "train"  # change if needed

if DATASET_CONFIG is None:
    ds = load_dataset(DATASET_NAME, split=split)
else:
    ds = load_dataset(DATASET_NAME, DATASET_CONFIG, split=split)

print(ds)
print("Columns:", ds.column_names)
print("Example 0 keys:", ds[0].keys())


Dataset({
    features: ['date', 'contest', 'words', 'answers', 'difficulty'],
    num_rows: 652
})
Columns: ['date', 'contest', 'words', 'answers', 'difficulty']
Example 0 keys: dict_keys(['date', 'contest', 'words', 'answers', 'difficulty'])


In [4]:
# Inspect one example
ex = ds[0]
words = ex["words"]
answers = ex["answers"]  # list of dicts with answerDescription + words
categories = [a["answerDescription"] for a in answers]

print("Words (16):", words)
print("Categories (4):", categories)
print("Difficulty:", ex.get("difficulty"))


Words (16): ['LASER', 'PLUCK', 'THREAD', 'WAX', 'COIL', 'SPOOL', 'WIND', 'WRAP', 'HONEYCOMB', 'ORGANISM', 'SOLAR PANEL', 'SPREADSHEET', 'BALL', 'MOVIE', 'SCHOOL', 'VITAMIN']
Categories (4): ['REMOVE, AS BODY HAIR', 'TWIST AROUND', 'THINGS MADE OF CELLS', 'B-___']
Difficulty: 3.3
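
Since the prompt relies on 16 words split into 4 groups of 4, it may be worth checking that shape on each row before building prompts. The sketch below assumes every entry of `answers` is a dict with a `words` list, as in the example above. `check_example` is a hypothetical helper, not something provided by the dataset or by `src/prompts.py`.

```python
from collections import Counter


def check_example(ex: dict) -> list[str]:
    """Return a list of problems found in one dataset row (empty if none)."""
    problems = []
    words = ex.get("words", [])
    answers = ex.get("answers", [])

    if len(words) != 16:
        problems.append(f"expected 16 words, got {len(words)}")
    if len(answers) != 4:
        problems.append(f"expected 4 answer groups, got {len(answers)}")

    grouped = []
    for i, answer in enumerate(answers):
        group_words = answer.get("words", [])
        if len(group_words) != 4:
            problems.append(f"group {i} has {len(group_words)} words")
        grouped.extend(group_words)

    # Compare as multisets so duplicates and omissions are both caught.
    if Counter(grouped) != Counter(words):
        problems.append("grouped words do not match the puzzle's word list")
    return problems
```

Running it over the split, for instance `[i for i, ex in enumerate(ds) if check_example(ex)]`, would list the indices of rows that do not fit the expected shape.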


In [5]:
# Build prompt
cfg = ConnectionsPromptConfig(
    max_reasoning_words=120,
    forbid_extra_words=True,
    output_json_only=True,
    seed=0,
    shuffle_words=False,
    shuffle_categories=False,
)

prompt = build_teacher_prompt(words=words, categories=categories, cfg=cfg)
print(prompt)


You must assign all 16 words to exactly one of the 4 given categories. Each category must contain exactly 4 words. Do not invent new words. Do not repeat words across categories. Use only the exact words provided (case-insensitive match is allowed). Output must be valid JSON and nothing else. Include a brief explanation (<= 120 words) as field "reasoning" inside the JSON.

NYT Connections (categories are provided).

Words (16):
01. LASER
02. PLUCK
03. THREAD
04. WAX
05. COIL
06. SPOOL
07. WIND
08. WRAP
09. HONEYCOMB
10. ORGANISM
11. SOLAR PANEL
12. SPREADSHEET
13. BALL
14. MOVIE
15. SCHOOL
16. VITAMIN

Categories (4):
1. REMOVE, AS BODY HAIR
2. TWIST AROUND
3. THINGS MADE OF CELLS
4. B-___

Think step-by-step internally, but DO NOT output your internal steps.
Before finalizing, double-check that every word appears exactly once.

Required JSON schema:
{"reasoning": "string (brief explanation)", "groups": [{"category": "string", "words": ["w1", "w2", "w3", "w4"]}, {"category": "string", 

In [8]:
# Build gold JSON (no reasoning)
gold = gold_answer_json(answers)
print(gold)


{"groups": [{"category": "REMOVE, AS BODY HAIR", "words": ["LASER", "PLUCK", "THREAD", "WAX"]}, {"category": "TWIST AROUND", "words": ["COIL", "SPOOL", "WIND", "WRAP"]}, {"category": "THINGS MADE OF CELLS", "words": ["HONEYCOMB", "ORGANISM", "SOLAR PANEL", "SPREADSHEET"]}, {"category": "B-___", "words": ["BALL", "MOVIE", "SCHOOL", "VITAMIN"]}]}
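
For readers without `src/prompts.py` at hand, the gold string above can be reproduced with a few lines. This is only a sketch of what `gold_answer_json` appears to do, judging from its output; the actual implementation may differ (for example in key order or serialization options). `gold_json_sketch` is a hypothetical name.

```python
import json


def gold_json_sketch(answers: list[dict]) -> str:
    """Serialize answer groups as {"groups": [{"category", "words"}, ...]}.

    Assumes each answer dict has "answerDescription" and "words" keys,
    as in the dataset example inspected earlier.
    """
    groups = [
        {"category": a["answerDescription"], "words": list(a["words"])}
        for a in answers
    ]
    # Default json.dumps separators give the ", " and ": " spacing seen above.
    return json.dumps({"groups": groups})
```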


In [9]:
# Optional: quick check that gold JSON is parseable and has the expected structure
import json

obj = json.loads(gold)
assert "groups" in obj and len(obj["groups"]) == 4
for g in obj["groups"]:
    assert "category" in g and "words" in g and len(g["words"]) == 4
print("Gold JSON structure OK")


Gold JSON structure OK
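
The same kind of check can be applied to a model reply, using the rules stated in the prompt: valid JSON, 4 groups of 4 words, every word used exactly once, case-insensitive matching. The sketch below is one possible way to do that and to compare a reply with the gold JSON. Both function names are hypothetical, and category labels are deliberately ignored here.

```python
import json
from collections import Counter


def validate_response(text: str, words: list[str]) -> list[str]:
    """Return rule violations in a model reply (empty list if none)."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return ["reply is not valid JSON"]

    groups = obj.get("groups") if isinstance(obj, dict) else None
    if not isinstance(groups, list) or len(groups) != 4:
        return ["expected a 'groups' list with 4 entries"]

    errors = []
    used = Counter()
    for i, group in enumerate(groups):
        group_words = group.get("words", []) if isinstance(group, dict) else []
        if len(group_words) != 4:
            errors.append(f"group {i} does not have exactly 4 words")
        used.update(str(w).upper() for w in group_words)

    # Case-insensitive multiset comparison against the puzzle's words.
    if used != Counter(w.upper() for w in words):
        errors.append("words are missing, repeated, or not from the puzzle")
    return errors


def same_grouping(pred_text: str, gold_text: str) -> bool:
    """True if both JSON strings partition the words identically.

    Group order, word order, and letter case are ignored.
    """
    def as_sets(text: str) -> set:
        return {
            frozenset(str(w).upper() for w in g["words"])
            for g in json.loads(text)["groups"]
        }

    return as_sets(pred_text) == as_sets(gold_text)
```

A reply would typically be passed through `validate_response(reply, words)` first, and `same_grouping(reply, gold)` called only when no violations are reported, since the latter assumes a well-formed `groups` list.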
