v0.2.0 — ChatGPT Adapter MVP

@louzt louzt released this 11 Jun 02:23
b3abf6c

ChatGPT export adapter (F2)

The first production adapter. You can now feed a real ChatGPT
conversations.json export into the library and get back a clean linear
thread of NormalizedMessages.

What works

  • ✅ Detects ChatGPT exports by sniffing "mapping" or "conversations" in the file header
  • ✅ Reconstructs the active linear thread from the non-linear mapping tree
    (walks current_node → parent chain → root, then yields root-first)
  • ✅ Role mapping: user / assistant / system / tool. Unknown roles
    (e.g. custom GPT name) are dropped.
  • ✅ Skips whitespace-only and empty messages
  • ✅ Extracts text parts only — drops structured payloads (image_url,
    code interpreter, tool calls)
  • ✅ FNV-1a 64-bit content hash for downstream dedup
  • ✅ Slugifies conversation title into a project_hint
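
The thread reconstruction above (walk current_node up the parent chain, then yield root-first) can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the Node struct, its fields, and the linear_thread function are hypothetical simplifications, and the real export nodes carry more data than a parent id and a text string.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified node. Assumed shape for illustration only.
struct Node {
    parent: Option<String>,
    text: String,
}

/// Walk from `current_node` up the parent chain, then reverse so the
/// result is ordered root-first. Branches that are not ancestors of
/// `current_node` are never visited.
fn linear_thread<'a>(mapping: &'a HashMap<String, Node>, current_node: &str) -> Vec<&'a str> {
    let mut texts = Vec::new();
    let mut seen = HashSet::new();
    let mut cursor = Some(current_node);
    while let Some(id) = cursor {
        // Guard against malformed input: stop on a cycle.
        if !seen.insert(id) {
            break;
        }
        // Stop on a dangling id that is missing from the mapping.
        let Some(node) = mapping.get(id) else { break };
        texts.push(node.text.as_str());
        cursor = node.parent.as_deref();
    }
    texts.reverse();
    texts
}

fn main() {
    let mut mapping = HashMap::new();
    let mut add = |id: &str, parent: Option<&str>, text: &str| {
        mapping.insert(
            id.to_string(),
            Node { parent: parent.map(str::to_string), text: text.to_string() },
        );
    };
    add("root", None, "sys");
    add("a", Some("root"), "hello");
    add("b", Some("a"), "answer v1"); // abandoned branch
    add("c", Some("a"), "answer v2"); // active branch

    let thread = linear_thread(&mapping, "c");
    assert_eq!(thread, vec!["sys", "hello", "answer v2"]);
    for text in thread {
        println!("{text}");
    }
}
```

The cycle and dangling-id guards are a defensive choice for untrusted files; whether the adapter handles malformed trees the same way is not stated in these notes.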

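For reference, the FNV-1a 64-bit hash named above is small enough to sketch in full. The constants are the standard FNV offset basis and prime. The function names and the choice to hash the raw UTF-8 bytes of the message text are assumptions made for illustration, since these notes do not say what the library feeds into the hash.

```rust
use std::collections::HashSet;

const FNV_OFFSET_BASIS: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the hash first, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical downstream use: keep the first message for each distinct hash.
fn dedup_by_hash<'a>(messages: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    messages
        .iter()
        .copied()
        .filter(|m| seen.insert(fnv1a_64(m.as_bytes())))
        .collect()
}

fn main() {
    // Published FNV-1a 64-bit test vectors.
    assert_eq!(fnv1a_64(b""), 0xcbf29ce484222325);
    assert_eq!(fnv1a_64(b"a"), 0xaf63dc4c8601ec8c);

    let unique = dedup_by_hash(&["hi", "hello", "hi"]);
    assert_eq!(unique, ["hi", "hello"]);
}
```

FNV-1a is fast and dependency-free but not collision-resistant, so a sketch like this suits exact-duplicate detection on trusted input rather than anything security-sensitive.
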
Library API

use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::adapter::Adapter;

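// `?` assumes an enclosing function that returns a compatible Result.
let adapter = ChatGptAdapter;
let reader = Box::new(std::fs::File::open("conversations.json")?);
// Collect the reconstructed linear thread into a Vec.
let messages: Vec<_> = adapter.stream_messages(reader)?.collect();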

Validation

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test — 16/16 pass (7 new ChatGPT tests + 9 library tests)
  • cargo build --release — OK (~10s clean build)

Limitations (F2 MVP)

  • The full export is materialized in memory (~2× JSON size). A 50 MB
    export therefore peaks at ~100 MB, which is fine for a developer
    laptop. True streaming is tracked for F3+ if real exports need it.
  • The CLI surface is still a placeholder (F4). The library is usable
    from Rust code.

Next

  • F3: pipeline core (Rule E secret scrubber → Jaccard dedup → signal scoring
    → JSONL + Markdown writer)
  • F4: clap CLI with --input, --output, --format, --stats
  • F5: CI + smoke E2E