v0.3.0 — Pipeline Core

louzt released this 11 Jun 03:07 · 3 commits to main since this release · 1b9fa86

Pipeline core (F3)

The pipeline is now end-to-end. Feed an adapter a raw export, get back
clean, scored, ready-to-ingest output.

// Assumes an enclosing function that returns a Result, since `?` is used.
use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::pipeline::{parser, writer};
use loust_llm_mempipe::{OutputFormat, Pipeline};
use chrono::Utc;

// Parse the raw ChatGPT export into normalized messages.
let adapter = ChatGptAdapter;
let messages = parser::parse(
    &adapter,
    Box::new(std::fs::File::open("conversations.json")?),
    "chatgpt",
)?;

// Run the pipeline stages with the safe defaults, then write both formats.
let output = Pipeline::with_safe_defaults().run(messages, Utc::now());
let written = writer::write_all("./out/", &output, OutputFormat::Both)?;
// written = ["out/memory.jsonl", "out/<project>/<thread>.md", ...]

Stages (in order)

  1. Scrub (Rule E): redacts AWS, GitHub, Anthropic, and OpenAI
    keys, email addresses, private IPs, and absolute user paths.
    Captures original_length (pre-scrub byte count) for stats.
  2. Normalize: defensive original_length backfill + trailing
    whitespace trim.
  3. Dedup: pass 1 exact by FNV-1a content_hash, pass 2 Jaccard
    token similarity (threshold 0.85). Duplicates fold into the
    survivor's hits counter.
  4. Age filter: drop messages older than max_thread_age_days (1095
    by default).
  5. Signal score: 0.4·hits + 0.3·recency + 0.3·type_weight, where
    recency = exp(-age_days/365) and type_weight is 1.0 for assistant,
    0.8 for user, 0.5 for tool, 0.3 for system. The hits term
    saturates at 10 hits.
  6. Filter + sort: drop signal_score < signal_min (0.2 default),
    sort survivors DESC by score.
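
The two dedup passes in stage 3 could be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the `Msg` struct, the whitespace tokenization, and the function signatures are assumptions made for the example. Only the FNV-1a constants are the standard 64-bit ones.

```rust
use std::collections::{HashMap, HashSet};

// Standard 64-bit FNV-1a offset basis and prime.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a(bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(FNV_OFFSET, |h, &b| (h ^ u64::from(b)).wrapping_mul(FNV_PRIME))
}

/// Jaccard similarity of the two token sets: |A ∩ B| / |A ∪ B|.
/// Tokenizing on whitespace is an assumption of this sketch.
fn jaccard(a: &str, b: &str) -> f64 {
    let a: HashSet<&str> = a.split_whitespace().collect();
    let b: HashSet<&str> = b.split_whitespace().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let inter = a.intersection(&b).count() as f64;
    let union = a.union(&b).count() as f64;
    inter / union
}

/// Hypothetical stand-in for NormalizedMessage, reduced to two fields.
#[derive(Debug, Clone)]
struct Msg {
    content: String,
    hits: u32,
}

/// Pass 1 folds exact duplicates (same content hash) into the first
/// occurrence; pass 2 folds near duplicates (Jaccard >= threshold).
fn dedup(messages: Vec<Msg>, threshold: f64) -> Vec<Msg> {
    // Pass 1: exact match by hash.
    let mut index: HashMap<u64, usize> = HashMap::new();
    let mut exact: Vec<Msg> = Vec::new();
    for m in messages {
        let h = fnv1a(m.content.as_bytes());
        let found = index.get(&h).copied();
        match found {
            Some(i) => exact[i].hits += m.hits,
            None => {
                index.insert(h, exact.len());
                exact.push(m);
            }
        }
    }

    // Pass 2: pairwise Jaccard against the survivors so far (quadratic).
    let mut survivors: Vec<Msg> = Vec::new();
    for m in exact {
        let pos = survivors
            .iter()
            .position(|s| jaccard(&s.content, &m.content) >= threshold);
        match pos {
            Some(i) => survivors[i].hits += m.hits,
            None => survivors.push(m),
        }
    }
    survivors
}
```

Called as `dedup(messages, 0.85)`, two identical messages plus one that differs by a single extra token would collapse into one survivor with `hits` of 3, under the assumptions above.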

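To make the scoring and filtering stages concrete, here is a small sketch of the formula. The `Role` enum, the function names, and the `(String, f32)` pairs are hypothetical. Normalizing the hits term as `min(hits, 10) / 10` is one plausible reading of "saturates at 10 hits", not something these notes confirm.

```rust
/// Hypothetical role enum mirroring the four weights listed in stage 5.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Assistant,
    User,
    Tool,
    System,
}

fn type_weight(role: Role) -> f32 {
    match role {
        Role::Assistant => 1.0,
        Role::User => 0.8,
        Role::Tool => 0.5,
        Role::System => 0.3,
    }
}

/// 0.4·hits + 0.3·recency + 0.3·type_weight.
/// Assumption: the hits term is min(hits, 10) / 10, so it lies in [0, 1].
fn signal_score(hits: u32, age_days: f32, role: Role) -> f32 {
    let hits_term = hits.min(10) as f32 / 10.0;
    let recency = (-age_days / 365.0).exp();
    0.4 * hits_term + 0.3 * recency + 0.3 * type_weight(role)
}

/// Drop entries scoring below `signal_min`, then sort descending by score.
fn filter_and_sort(mut scored: Vec<(String, f32)>, signal_min: f32) -> Vec<(String, f32)> {
    scored.retain(|(_, s)| *s >= signal_min);
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Under this reading, a fresh single-hit assistant message scores 0.4·0.1 + 0.3·1 + 0.3·1 = 0.64. A single-hit system message at the 1095-day age limit scores roughly 0.04 + 0.3·exp(-3) + 0.09 ≈ 0.145, which falls below the default signal_min of 0.2.
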
Output formats

  • JSONL (out/memory.jsonl) — one NormalizedMessage per line.
    Ready for claude-code --context ./out/memory.jsonl.
  • Markdown (out/<project_slug>/<thread_slug>.md) — one file per
    project/thread, with metadata frontmatter and ## role sections.
    Ready for Claude Projects.
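
As a rough picture of the Markdown layout, the sketch below builds an `out/<project_slug>/<thread_slug>.md` path and renders frontmatter followed by `## role` sections. Everything here is illustrative: `slugify`, `thread_path`, and `render_thread` are hypothetical names, and the slug rules and frontmatter keys (`project`, `thread`) are assumptions, since these notes do not specify them.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical slug rule: lowercase ASCII alphanumerics, with every
/// other run of characters collapsed into a single '-'.
fn slugify(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        if c.is_ascii_alphanumeric() {
            out.push(c.to_ascii_lowercase());
        } else if !out.is_empty() && !out.ends_with('-') {
            out.push('-');
        }
    }
    out.trim_end_matches('-').to_string()
}

/// Builds out_dir/<project_slug>/<thread_slug>.md.
fn thread_path(out_dir: &str, project: &str, thread: &str) -> PathBuf {
    Path::new(out_dir)
        .join(slugify(project))
        .join(format!("{}.md", slugify(thread)))
}

/// Renders frontmatter plus one `## role` section per (role, content) pair.
/// The frontmatter keys are assumed for illustration.
fn render_thread(project: &str, thread: &str, messages: &[(&str, &str)]) -> String {
    let mut md = String::new();
    md.push_str("---\n");
    md.push_str(&format!("project: {project}\n"));
    md.push_str(&format!("thread: {thread}\n"));
    md.push_str("---\n");
    for (role, content) in messages {
        md.push_str(&format!("\n## {role}\n\n{content}\n"));
    }
    md
}
```

The rendered string would then be written to the path from `thread_path`, for instance with `std::fs::create_dir_all` on the parent directory followed by `std::fs::write`.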

New public API

  • loust_llm_mempipe::Pipeline (orchestrator)
  • loust_llm_mempipe::PipelineOutput, PipelineStats (results)
  • loust_llm_mempipe::pipeline::scrubber::scrub (Rule E)
  • loust_llm_mempipe::pipeline::dedup::dedup
  • loust_llm_mempipe::pipeline::signals::score
  • loust_llm_mempipe::pipeline::writer::{write_jsonl, write_markdown, write_all}
  • loust_llm_mempipe::pipeline::parser::parse

NormalizedMessage gained two fields: hits: u32 and signal_score: f32.
The F2 ChatGPT adapter has been updated accordingly; any future adapter
must set hits: 1 when it constructs a message.
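
For adapter authors, the requirement might look like the sketch below. The struct is a hypothetical, cut-down stand-in: only `hits` and `signal_score` come from these notes, while the other fields, the constructor name, and the initial `signal_score` of 0.0 are assumptions. Starting at 1 presumably lets the dedup stage count the message itself when it folds duplicates into the survivor.

```rust
/// Hypothetical, reduced stand-in for the crate's NormalizedMessage.
#[derive(Debug, Clone, PartialEq)]
struct NormalizedMessageSketch {
    source: String,    // assumed field, e.g. "chatgpt"
    role: String,      // assumed field
    content: String,   // assumed field
    hits: u32,         // from the release notes
    signal_score: f32, // from the release notes
}

impl NormalizedMessageSketch {
    /// Every freshly parsed message counts as one occurrence of itself.
    fn new(source: &str, role: &str, content: &str) -> Self {
        Self {
            source: source.to_owned(),
            role: role.to_owned(),
            content: content.to_owned(),
            hits: 1,
            // Assumed placeholder until the scoring stage runs.
            signal_score: 0.0,
        }
    }
}
```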

Validation

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test: 48/48 pass (9 lib + 6 chatgpt + 9 scrubber +
    5 dedup + 6 signals + 4 writer + 3 normalizer + 1 parser + 4 e2e)
  • cargo build --release — 21s

What's next

  • F4: clap CLI surface (--input, --output, --format, --stats,
    --dry-run, --dedup-threshold, --signal-min)
  • F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
  • F7: Reddit post cancelled