## Context and assignment

This write‑up covers my exploration of tokenization with two LLMs (GPT‑5 and Sonnet 4.5) and gives feedback on a peer’s prompting strategy. It analyzes how effective the prompts were, whether outputs were cross‑checked across models, what could be improved, and why the outcomes were expected or unexpected given class concepts.

- Evidence base: two chats exploring Discrete Fourier Transform (DFT), tokenizer design, and "better than BPE" ideas.
- Deliverable: A direct URL to my feedback on the issues page (added below when posted).



## What the interactions showed (summary)

- DFT knowledge: The model stated the standard DFT/IDFT formulas, their properties, the relation to the FFT, and practical notes, as expected for a strong general model.
- Tokenizer idea critique: It argued that a plain DFT isn’t a practical tokenizer for text (it is lossy, length‑rigid, and not semantically aligned), but suggested using Fourier ideas inside models and as feature augmentations. This aligns with the class emphasis on data‑model match.
- Novel tokenizer directions: Proposed learned/VQ tokenizers, morphology‑aware hybrids, adaptive tokenizers, information‑theoretic/MDL objectives, and graph/semantic clustering — consistent with "optimize for predictability, not just compression".
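
The losslessness point in the critique can be checked with a small sketch. This is not code from the chats: the `quantize` helper and the `step` value are hypothetical stand‑ins for whatever finite codebook a DFT‑based tokenizer would need, and the step is deliberately coarse so the loss is visible. It uses a naive pure‑Python DFT on the bytes of a short string.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+2*pi*i*k*n/N)."""
    n_pts = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]

def quantize(X, step):
    """Hypothetical codebook: snap real and imaginary parts to a grid of width `step`."""
    return [complex(round(c.real / step) * step, round(c.imag / step) * step) for c in X]

def reconstruct_bytes(X):
    """Invert the transform, round to integers, and clamp to the byte range."""
    return bytes(min(255, max(0, round(v.real))) for v in idft(X))

text = b"tokenizer"
coeffs = dft(list(text))

exact = reconstruct_bytes(coeffs)                   # unquantized coefficients
coarse = reconstruct_bytes(quantize(coeffs, 64.0))  # coarse grid (assumed step)

print(exact == text)   # round trip from full-precision coefficients
print(coarse == text)  # round trip after quantization
```

With full‑precision coefficients the bytes come back exactly; once the coefficients are snapped to a coarse grid, the reconstruction no longer matches. The transform itself is invertible, so the loss enters at the step that turns continuous coefficients into discrete tokens.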



## Did they prompt different models and check multiple outputs?

- Yes: two separate model families were engaged (GPT‑5 and Sonnet 4.5). This is good practice for triangulation.
- Multiple outputs/checks: The threads iterate across related questions (DFT → tokenizer feasibility → alternatives), eliciting deeper reasoning rather than one‑shot answers. This aligns with our class guidance to sample more than once and compare.
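
One way to make the cross‑model comparison less impressionistic is a rough overlap score between answers. The sketch below is an assumption on my part, not something used in the chats: `jaccard` and `pairwise_agreement` are hypothetical helpers, word‑set overlap is a crude proxy for agreement, and the two answer strings are made‑up placeholders rather than real model outputs.

```python
from itertools import combinations
from typing import Dict, Tuple

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two answers: |A & B| / |A | B|."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # two empty answers count as identical
    return len(words_a & words_b) / len(words_a | words_b)

def pairwise_agreement(answers: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Score every pair of models; low scores flag answers worth reading side by side."""
    return {
        (m1, m2): jaccard(answers[m1], answers[m2])
        for m1, m2 in combinations(sorted(answers), 2)
    }

# Placeholder strings for illustration only, not actual model outputs.
answers = {
    "model_a": "dft is lossy after quantization",
    "model_b": "dft is invertible before quantization",
}
scores = pairwise_agreement(answers)
print(scores)
```

A low score does not say which answer is right; it only points to where a manual comparison is most useful.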



## What I asked (conversation excerpts)

- DFT basics: “do you know discrete Fourier transform?” → requested definitions and properties.
- Tokenizer feasibility: “imagine we want to use this as tokenizer to LLMs… is that possible?”
- Beyond BPE: “what are some plausible ways to tokenize better than BPE… novel ideas?”
- Tailored tokenizer: “make a tokenizer that can be tailored to your data.”

> Key thread: DFT as tokenizer → critique → alternatives (learned/VQ, morphology, adaptive/MDL) → concrete design for a data‑tailorable tokenizer.



## Prompting effectiveness: what worked

- Multi‑model triangulation: Comparing GPT‑5 and Sonnet 4.5 surfaced consensus (DFT basics) and nuance (tokenizer feasibility).
- Iterative deepening: Follow‑ups moved from definition → feasibility → design alternatives → concrete architecture. This reduced surface‑level answers.
- Specific, evaluable asks: Requests for “novel ideas” and “tailorable tokenizer” elicited structured proposals (VQ, morphology, adaptive/MDL).
- Tying to data: Pushing for a tokenizer “tailored to your data” grounded the discussion in invertibility, lossless paths, and MDL.



## Expected vs unexpected (class concepts)

- Losslessness and invertibility (expected): The DFT itself is invertible, but mapping its continuous coefficients onto a finite vocabulary requires quantization, so a DFT‑based tokenizer fails exact round‑trip. This conflicts with our requirement that tokenization be reversible.
- Information‑theoretic framing (expected): MDL/predictability is a better objective than raw frequency‑based compression, which matches the lessons on optimizing for downstream loss.
- Model overconfidence (expected): Definitions were stated confidently; the feasibility analysis improved when we asked for constraints and trade‑offs.
- Creative yet grounded proposals (unexpectedly thorough): The jump from critique to concrete VQ‑Text with byte fallback reflects strong prior knowledge and aligns with class emphasis on data‑driven design.
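
To make the MDL framing concrete, here is a toy two‑part description length, offered as a sketch under stated assumptions rather than the objective discussed in the chats. The function name `description_length` is hypothetical, the data cost uses a unigram model estimated from the token sequence itself, and the vocabulary cost is assumed to be 8 bits per character of each vocabulary entry.

```python
import math
from collections import Counter

def description_length(tokens, bits_per_vocab_char=8):
    """Two-part code length in bits: vocabulary cost plus data cost.

    Assumptions: unigram probabilities come from the sequence itself, and
    each vocabulary entry is spelled out at `bits_per_vocab_char` bits per character.
    """
    counts = Counter(tokens)
    total = len(tokens)
    # Data cost: sum over tokens of -log2 p(token).
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    # Model cost: bits needed to write down the vocabulary.
    model_bits = sum(len(tok) * bits_per_vocab_char for tok in counts)
    return model_bits + data_bits

corpus = "abababab"
char_tokens = list(corpus)                                       # 'a', 'b', 'a', ...
pair_tokens = [corpus[i:i + 2] for i in range(0, len(corpus), 2)]  # 'ab', 'ab', ...

char_dl = description_length(char_tokens)
pair_dl = description_length(pair_tokens)
print(char_dl, pair_dl)
```

On this toy corpus the merged `ab` token gives the shorter total description because the sequence becomes fully predictable while the vocabulary cost stays the same. Real corpora need a stronger predictive model than a unigram for the comparison to mean much.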



## How to improve prompting next time

- Ask for contrastive outputs: “Show two alternative designs and when each wins.”
- Require citations or links: “Cite at least 2 papers or docs per proposal.”
- Force failure modes: “List 3 ways this approach can fail and mitigations.”
- Demand concrete tests: “Give a 10‑line PyTorch sketch and a toy eval plan.”
- Cross‑check models explicitly: “Answer with GPT‑5; then critique that answer as Sonnet 4.5.”
- Quantify trade‑offs: “Estimate token savings vs perplexity change under MDL assumptions.”



## Concrete outputs sampled (snippets)

> DFT feasibility as tokenizer: “Using a plain DFT as the tokenizer for text LLMs isn’t practical… it’s lossy, length‑rigid, and not semantically aligned. Use Fourier ideas inside the model or as features instead.”

> Novel tokenization ideas: “Learned VQ tokenizers, morphology‑aware hybrids, adaptive/MDL tokenization, semantic/graph clustering, multi‑resolution tokens.”

> Data‑tailorable tokenizer: “VQ‑Text with byte fallback: learned codes for common spans; exact byte runs for everything else; train with reconstruction + commitment + predictability (MDL‑style).”
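
The byte‑fallback part of that proposal is what guarantees exact round‑trip, and it can be sketched without any learning. The code below is a minimal illustration, not the VQ‑Text design itself: the codebook is a hand‑picked list of spans standing in for learned codes, there is no vector quantization or training loss, and `build_codebook`, `encode`, and `decode` are hypothetical names.

```python
def build_codebook(spans):
    """Assign each span an id above the 256 ids reserved for raw bytes."""
    return {span: 256 + i for i, span in enumerate(spans)}

def encode(data, codebook):
    """Greedy longest-match over the codebook; unmatched bytes fall back to ids 0..255."""
    max_len = max((len(s) for s in codebook), default=0)
    ids, pos = [], 0
    while pos < len(data):
        for length in range(min(max_len, len(data) - pos), 0, -1):
            span = data[pos:pos + length]
            if span in codebook:
                ids.append(codebook[span])
                pos += length
                break
        else:
            ids.append(data[pos])  # byte fallback: the id is the byte value itself
            pos += 1
    return ids

def decode(ids, codebook):
    """Invert encode(): span ids map back to their bytes, ids below 256 are raw bytes."""
    inverse = {v: k for k, v in codebook.items()}
    return b"".join(inverse[i] if i >= 256 else bytes([i]) for i in ids)

# Hand-picked spans standing in for learned codes (an assumption for illustration).
codebook = build_codebook([b"token", b"izer", b" the "])
text = "the tokenizer na\u00efve".encode("utf-8")

ids = encode(text, codebook)
print(ids)
print(decode(ids, codebook) == text)
```

Common spans collapse into single ids, while anything outside the codebook, including the multi‑byte UTF‑8 character, passes through as raw bytes, so decoding always recovers the input exactly.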

