## Context

I explored tokenization with two models (GPT‑5 and Sonnet 4.5) and reflected on what made the prompts effective. The discussion moved from DFT basics to whether a Fourier transform could serve as a tokenizer, then broadened into ways to improve on BPE and how to tailor a tokenizer to a specific corpus. A direct URL to my feedback will be added when it’s posted on the issues page.



## What the interactions showed

Both models handled the DFT prompt comfortably, giving standard forward/inverse formulas and practical notes. When I asked if a DFT could be used as a tokenizer, the answers pushed back: a plain Fourier representation is lossy, fixed‑length, and not aligned to discrete linguistic structure; it’s better used inside the model or as auxiliary features. On “better than BPE,” the conversation shifted to learned tokenizers (e.g., VQ with byte fallback), morphology‑aware segmentation, adaptive/MDL‑motivated tokenization, and semantic or graph‑based grouping. These approaches target predictability rather than just compression.



## Models and outputs

I used two different model families (GPT‑5 and Sonnet 4.5) and iterated across related prompts rather than stopping at one answer. That back‑and‑forth surfaced consensus on basics, clearer reasoning about feasibility, and a pathway from critique to concrete designs.


## Conversation excerpts

I began with a simple check: “do you know discrete Fourier transform?” The reply included the standard definition, “X[k] = Σ x[n] e^(−j2πkn/N), with an inverse x[n] = (1/N) Σ X[k] e^(+j2πkn/N)”, along with reminders about linearity, modulation, and the FFT. From there I asked: “imagine we want to use this as tokenizer to LLMs… is that possible?” The answer argued a plain DFT is a poor tokenizer because it’s lossy, tied to fixed lengths, and not aligned to discrete symbols; it suggested using Fourier features inside models instead. Finally, I asked for alternatives “better than BPE” and a tokenizer that can be tailored to data; the response proposed learned VQ schemes with a byte fallback for exact reversibility, morphology‑aware segmentation, adaptive tokenization optimized for predictability, and semantic/graph‑based grouping.
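
To make the quoted formulas concrete, here is a minimal sketch (my own illustration, not something either model produced) that applies the forward and inverse DFT to the UTF‑8 bytes of a short string. It assumes NumPy is installed; the function names `dft` and `idft` and the sample string are hypothetical choices. One thing it suggests is that the untruncated transform round‑trips the bytes up to floating‑point error, so the “lossy” objection presumably bites once coefficients are dropped or quantized into a finite vocabulary.

```python
import numpy as np

def dft(x):
    """Forward DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / N) @ x

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(+j*2*pi*k*n/N)."""
    X = np.asarray(X, dtype=complex)
    N = len(X)
    k = np.arange(N)
    n = k.reshape(-1, 1)
    return (np.exp(2j * np.pi * n * k / N) @ X) / N

# Treat the UTF-8 bytes of a string as the input signal (an assumption
# made for illustration; real text has no natural "sample rate").
text = "tokenizer"
x = np.frombuffer(text.encode("utf-8"), dtype=np.uint8).astype(float)
X = dft(x)

# Full round trip: rounding removes floating-point noise.
recovered = np.rint(idft(X).real).astype(np.uint8).tobytes().decode("utf-8")

# Keep only k = 0, 1 and N-1 (the conjugate partner of k = 1), zero the rest.
X_trunc = X.copy()
X_trunc[2:-1] = 0
approx = np.rint(idft(X_trunc).real)

print("round trip ok:", recovered == text)
print("truncated reconstruction exact:", bool(np.array_equal(approx, x)))
```

The fixed‑length point is also visible here: the transform yields one complex coefficient per input byte, so the block size N has to be chosen before anything can be encoded.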


## What worked in the prompting

Using two different models and pushing beyond the first answer helped. Starting with definitions created a shared foundation; then asking “is this actually usable as a tokenizer?” forced a feasibility check and concrete trade‑offs. Requesting “novel ideas” and a “tailorable tokenizer” nudged the models to move from critique to constructive designs (e.g., VQ with byte‑fallback) rather than staying abstract. Framing the problem in terms of reversibility and predictability kept the discussion grounded in properties that matter for real systems.


## What I expected vs what surprised me

I expected the definitions to be solid and the DFT idea to be challenged on reversibility and fit to discrete text. What surprised me was how quickly the conversation moved from critique to a concrete, workable direction: a learned vector‑quantized tokenizer with exact byte fallback, paired with a predictability‑oriented objective. That made the idea immediately testable rather than purely conceptual.


## How I’d prompt next time

Next time I’d ask for two contrasting designs and when each is preferable, require a couple of citations, and force the model to spell out likely failure modes with mitigations. I’d also request a tiny code sketch and a toy evaluation plan so the proposal is actionable. Finally, I’d have one model critique the other’s answer and estimate simple trade‑offs like token savings versus predicted perplexity change.


## Concrete outputs sampled (snippets)

> DFT feasibility as tokenizer: “Using a plain DFT as the tokenizer for text LLMs isn’t practical… it’s lossy, length‑rigid, and not semantically aligned. Use Fourier ideas inside the model or as features instead.”

> Novel tokenization ideas: “Learned VQ tokenizers, morphology‑aware hybrids, adaptive/MDL tokenization, semantic/graph clustering, multi‑resolution tokens.”

> Data‑tailorable tokenizer: “VQ‑Text with byte fallback: learned codes for common spans; exact byte runs for everything else; train with reconstruction + commitment + predictability (MDL‑style).”
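
The byte‑fallback part of that last snippet can be sketched in a few lines. This is a toy under stated assumptions, not the design the models described: the codebook is hand‑written rather than learned (there is no VQ training, reconstruction loss, or commitment term), matching is greedy longest‑first, and unmatched input is emitted one byte per token rather than as byte runs. The names `encode`, `decode`, and `codebook` are hypothetical.

```python
def encode(data: bytes, codebook: dict[bytes, int]) -> list[tuple[str, int]]:
    """Greedy longest match: ("code", id) for known spans, ("byte", value) otherwise."""
    max_len = max((len(span) for span in codebook), default=1)
    tokens = []
    i = 0
    while i < len(data):
        # Try the longest candidate span first, then shrink.
        for length in range(min(max_len, len(data) - i), 0, -1):
            span = data[i:i + length]
            if span in codebook:
                tokens.append(("code", codebook[span]))
                i += length
                break
        else:
            # No span matched: fall back to the raw byte so nothing is lost.
            tokens.append(("byte", data[i]))
            i += 1
    return tokens

def decode(tokens: list[tuple[str, int]], codebook: dict[bytes, int]) -> bytes:
    """Invert encode(): expand codes to their spans, copy fallback bytes as-is."""
    inverse = {idx: span for span, idx in codebook.items()}
    out = bytearray()
    for kind, value in tokens:
        if kind == "code":
            out += inverse[value]
        else:
            out.append(value)
    return bytes(out)

# Hand-written stand-in for a learned codebook (hypothetical entries).
codebook = {b"token": 0, b"izer": 1}
data = "the tokenizer naïve".encode("utf-8")

tokens = encode(data, codebook)
restored = decode(tokens, codebook)

print("bytes:", len(data), "tokens:", len(tokens))
print("exact round trip:", restored == data)
```

Because every byte value is representable as a fallback token, decoding is exact for any input, including multi‑byte characters the codebook has never seen. Comparing `len(tokens)` with `len(data)` gives a crude token‑savings number of the kind a toy evaluation could track.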

