v0.3.0 — Pipeline Core
Pipeline core (F3)
The pipeline is now end-to-end. Feed an adapter a raw export, get back
clean, scored, ready-to-ingest output.
use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::pipeline::{parser, writer};
use loust_llm_mempipe::{Pipeline, PipelineConfig, OutputFormat};
use chrono::Utc;
let adapter = ChatGptAdapter;
let messages = parser::parse(
    &adapter,
    Box::new(std::fs::File::open("conversations.json")?),
    "chatgpt",
)?;
let output = Pipeline::with_safe_defaults().run(messages, Utc::now());
let written = writer::write_all("./out/", &output, OutputFormat::Both)?;
// written = ["out/memory.jsonl", "out/<project>/<thread>.md", ...]
Stages (in order)
- Scrub (Rule E): redacts AWS, GitHub, Anthropic, and OpenAI keys, email
  addresses, private IPs, and absolute user paths. Captures
  original_length (pre-scrub byte count) for stats.
- Normalize: defensive original_length backfill + trailing whitespace
  trim.
- Dedup: pass 1 exact by FNV-1a content_hash, pass 2 Jaccard token
  similarity (threshold 0.85). Duplicates fold into the survivor's hits
  counter.
- Age filter: drop messages older than max_thread_age_days (1095 by
  default).
- Signal score: 0.4·hits + 0.3·recency + 0.3·type_weight, where
  recency = exp(-age_days/365) and type_weight is assistant=1.0,
  user=0.8, tool=0.5, system=0.3. Saturates at 10 hits.
- Filter + sort: drop signal_score < signal_min (0.2 default), sort
  survivors DESC by score.
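For readers who want to see the arithmetic behind the dedup and scoring stages, here is a minimal std-only sketch. It is not the crate's code. The function names are hypothetical, and several details are assumptions: the 64-bit FNV-1a variant, whitespace tokenization for Jaccard, an inclusive 0.85 threshold, and a hits term computed as min(hits, 10) / 10.

```rust
use std::collections::HashSet;

/// FNV-1a over raw bytes. Assumption: the 64-bit variant with the
/// standard offset basis and prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Jaccard similarity |A ∩ B| / |A ∪ B| over whitespace-separated tokens.
/// Assumption: the real tokenizer may differ.
fn jaccard(a: &str, b: &str) -> f32 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 1.0; // two empty messages are treated as identical
    }
    ta.intersection(&tb).count() as f32 / union as f32
}

/// Pass 2 check. Assumption: the 0.85 threshold is inclusive.
fn is_near_duplicate(a: &str, b: &str) -> bool {
    jaccard(a, b) >= 0.85
}

/// Role weights as listed in the stage description.
fn type_weight(role: &str) -> f32 {
    match role {
        "assistant" => 1.0,
        "user" => 0.8,
        "tool" => 0.5,
        "system" => 0.3,
        _ => 0.0, // assumption: unknown roles contribute nothing
    }
}

/// 0.4*hits + 0.3*recency + 0.3*type_weight.
/// Assumption: "saturates at 10 hits" means hits_term = min(hits, 10) / 10.
fn signal_score(hits: u32, age_days: f32, role: &str) -> f32 {
    let hits_term = hits.min(10) as f32 / 10.0;
    let recency = (-age_days / 365.0).exp();
    0.4 * hits_term + 0.3 * recency + 0.3 * type_weight(role)
}

/// Drop scores below signal_min, then sort the survivors descending.
fn filter_and_sort(mut scores: Vec<f32>, signal_min: f32) -> Vec<f32> {
    scores.retain(|s| *s >= signal_min);
    scores.sort_by(|a, b| b.total_cmp(a));
    scores
}
```

Under these assumptions, a fresh assistant message with 10 or more hits scores 1.0, and a fresh user message seen once scores 0.4·0.1 + 0.3·1.0 + 0.3·0.8 = 0.58.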
Output formats
- JSONL (out/memory.jsonl): one NormalizedMessage per line. Ready for
  claude-code --context ./out/memory.jsonl.
- Markdown (out/<project_slug>/<thread_slug>.md): one file per
  project/thread, with metadata frontmatter and ## role sections. Ready
  for Claude Projects.
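To make the Markdown layout concrete, the following is a rough sketch of how one thread file could be rendered and where it would land. It only illustrates the shape described above. The Msg struct, the function names, and the frontmatter keys (project, thread, messages) are all hypothetical and are not taken from the crate.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, pared-down message; the real NormalizedMessage
/// carries more fields than this.
struct Msg<'a> {
    role: &'a str,
    content: &'a str,
}

/// Render one thread: a frontmatter block, then one `## role` section
/// per message. The frontmatter keys are assumptions for illustration.
fn render_thread(project: &str, thread: &str, msgs: &[Msg<'_>]) -> String {
    let mut out = String::new();
    out.push_str("---\n");
    out.push_str(&format!("project: {}\n", project));
    out.push_str(&format!("thread: {}\n", thread));
    out.push_str(&format!("messages: {}\n", msgs.len()));
    out.push_str("---\n");
    for m in msgs {
        out.push_str(&format!("\n## {}\n\n{}\n", m.role, m.content));
    }
    out
}

/// Build out/<project_slug>/<thread_slug>.md under the given directory.
fn thread_path(out_dir: &str, project_slug: &str, thread_slug: &str) -> PathBuf {
    Path::new(out_dir)
        .join(project_slug)
        .join(format!("{}.md", thread_slug))
}
```

A caller would pair the two, for example writing render_thread("demo", "t1", &msgs) to thread_path("out", "demo", "t1") after creating the parent directory.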
New public API
- loust_llm_mempipe::Pipeline (orchestrator)
- loust_llm_mempipe::PipelineOutput, PipelineStats (results)
- loust_llm_mempipe::pipeline::scrubber::scrub (Rule E)
- loust_llm_mempipe::pipeline::dedup::dedup
- loust_llm_mempipe::pipeline::signals::score
- loust_llm_mempipe::pipeline::writer::{write_jsonl, write_markdown, write_all}
- loust_llm_mempipe::pipeline::parser::parse
NormalizedMessage gained hits: u32 and signal_score: f32. F2
ChatGPT adapter updated; any future adapter must set hits: 1 in its
initializer.
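For adapter authors, a sketch of what setting hits: 1 looks like and why it matters once duplicates are folded. The struct here is a trimmed-down stand-in: only hits: u32 and signal_score: f32 come from these notes, while role, content, from_raw, and fold_duplicate are hypothetical. Summing the duplicate's hits into the survivor is also an assumption about how folding works.

```rust
/// Stand-in for the real type. Only `hits` and `signal_score` are taken
/// from the release notes; the other fields are placeholders.
#[derive(Debug, Clone, PartialEq)]
struct NormalizedMessage {
    role: String,
    content: String,
    hits: u32,
    signal_score: f32,
}

/// What a new adapter's initializer might look like (hypothetical helper).
fn from_raw(role: &str, content: &str) -> NormalizedMessage {
    NormalizedMessage {
        role: role.to_string(),
        content: content.to_string(),
        hits: 1,           // every freshly parsed message counts as one occurrence
        signal_score: 0.0, // assumption: stays zero until the signal stage runs
    }
}

/// Fold a duplicate into the survivor.
/// Assumption: the survivor absorbs the duplicate's hit count.
fn fold_duplicate(survivor: &mut NormalizedMessage, duplicate: &NormalizedMessage) {
    survivor.hits = survivor.hits.saturating_add(duplicate.hits);
}
```

If an adapter left hits at 0, a message seen once would contribute nothing to the hits term of the score, and folded duplicates would be undercounted.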
Validation
- cargo fmt --check: clean
- cargo clippy --all-targets -- -D warnings: clean
- cargo test: 48/48 pass (9 lib + 6 chatgpt + 9 scrubber + 5 dedup +
  6 signals + 4 writer + 3 normalizer + 1 parser + 4 e2e)
- cargo build --release: 21s
What's next
- F4: clap CLI surface (--input, --output, --format, --stats,
  --dry-run, --dedup-threshold, --signal-min)
- F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
- F7: Reddit post (cancelled)