Releases: LOUST-PRO/LLMmempipe
v0.5.0 — CI Validation
CI validation (F5)
The MVP is now self-validating. Every push to main and every pull
request runs cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test --all-targets, and cargo build --release
on both stable and beta toolchains.
CI runs
- Run #1 (commit d5dbd53): 1m 3s, both stable and beta ✅
- Run #2 (commit 1007398): 1m 4s, both stable and beta ✅
  - Added FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true to opt into Node 24 early, before the 2026-06-16 forced default and 2026-09-16 removal. Runner is on Node 24 now; actions/checkout@v4 works.

Workflow: https://github.com/LOUST-PRO/loust-llm-mempipe/actions/workflows/ci.yml
Workflow design
- Matrix: stable + beta, fail-fast: false (so a beta breakage is visible even if stable passes).
- Cache: Swatinem/rust-cache@v2 (registry + target/).
- Permissions: contents: read only. No secrets, no write tokens, no packages: scope.
- Security: zero ${{ github.event.* }} interpolations in run: blocks. The only ${{ }} reference is matrix.rust, which resolves to the static "stable" / "beta" values declared in the workflow file, not external input. No command-injection vectors.
- Supply chain: actions pinned by major tag (@v4, @v2), with dtolnay/rust-toolchain tracked at @master.
What CI catches
- Formatting drift (cargo fmt --check)
- Lint regressions, even in tests (cargo clippy --all-targets)
- Test failures on either toolchain (catches features that stable has but beta doesn't, or vice versa)
- Release build breakages (the path cargo install exercises)
Final MVP status
| Phase | Scope | Status |
|---|---|---|
| F0.1 | Pre-publish audit | ✅ done |
| F0.2 | Org hardening | ✅ done |
| F1 | Skeleton + contracts | ✅ v0.1.0 |
| F2 | ChatGPT adapter MVP | ✅ v0.2.0 |
| F3 | Pipeline core | ✅ v0.3.0 |
| F4 | CLI ergonomics | ✅ v0.4.0 |
| F5 | CI validation | ✅ v0.5.0 |
| F7 | Reddit post | ❌ cancelled |
cargo install loust-llm-mempipe ships you the working MVP.
v0.4.0 — CLI Ergonomics
CLI ergonomics (F4)
The CLI is feature-complete. cargo install loust-llm-mempipe and
loust-llm-mempipe --help give you a working pipeline in one
command.
loust-llm-mempipe \
--input conversations.json \
--output ./claude-memory/ \
--format both \
--stats
Stderr transcript on a real run
$ loust-llm-mempipe --input tests/fixtures/chatgpt-tiny.json \
--output /tmp/lmp-smoke --format both --stats
detected adapter: chatgpt
parsed 7 messages
stats: in=7 out=6 scrubbed=0 redactions=0 dedup_exact=0 dedup_fuzzy=0 age_drop=0 signal_drop=1
wrote: /tmp/lmp-smoke/memory.jsonl
wrote: /tmp/lmp-smoke/multi-turn-with-system-message/multi-turn-with-system-message.md
wrote: /tmp/lmp-smoke/rust-fnv-1a-hashing/rust-fnv-1a-hashing.md
done: 3 files written
In this run, the system message from conv-003 got dropped by the
signal_min filter (assistant=1.0 > user=0.8 > tool=0.5 > system=0.3,
and the message was a year old).
Flags
| Flag | Default | Notes |
|---|---|---|
| -i, --input | required | path to the raw export file |
| -o, --output | required | output dir, created if missing |
| -f, --format | jsonl | jsonl, markdown/md, or both |
| --adapter | auto-detect | chatgpt, claude_web, gemini, claude_code |
| --dedup-threshold | 0.85 | Jaccard sim, range [0.0, 1.0] |
| --signal-min | 0.2 | drop signal_score < min, range [0.0, 1.0] |
| --max-age-days | 1095 | drop messages older than N days |
| --stats | off | print one-line stats to stderr |
| --dry-run | off | compute but don't write |
| --info | off | print build metadata and exit |
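The [0.0, 1.0] range on --dedup-threshold and --signal-min can be enforced at parse time. The following is a minimal sketch of such a check, assuming a plain function of the shape clap accepts as a value_parser; the name parse_unit_interval and the error messages are hypothetical and not taken from the crate.

```rust
/// Parse a CLI value that must lie in the closed interval [0.0, 1.0].
/// Hypothetical helper: the name and error text are illustrative only.
fn parse_unit_interval(s: &str) -> Result<f32, String> {
    let value: f32 = s
        .trim()
        .parse()
        .map_err(|e| format!("`{}` is not a number: {}", s, e))?;
    // NaN also fails this check, because every comparison with NaN is false.
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("`{}` is outside the range [0.0, 1.0]", s))
    }
}
```

Rejecting out-of-range values before the pipeline starts keeps a typo such as --signal-min 2 from silently dropping every message.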
Library additions
- loust_llm_mempipe::adapter::registry(): ordered list of all known adapters for auto-detection.
- loust_llm_mempipe::adapter::pick_adapter(kind, header): explicit override or first-detect.
- OutputFormat::from_cli(s) and AdapterKind::from_cli(s): kebab-case parsers used by clap's value_parser.
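The "explicit override or first-detect" behaviour can be pictured with a small stand-alone sketch. Everything below is an assumption made for illustration: the trait shape, the string-based kind, the return type, and the Claude Code heuristic are hypothetical, and only the ChatGPT sniffing rule ("mapping" or "conversations" in the header) comes from these notes.

```rust
/// Hypothetical stand-in for the crate's adapter trait.
trait Adapter {
    fn name(&self) -> &'static str;
    fn detect(&self, header: &str) -> bool;
}

struct ChatGpt;
struct ClaudeCode;

impl Adapter for ChatGpt {
    fn name(&self) -> &'static str {
        "chatgpt"
    }
    fn detect(&self, header: &str) -> bool {
        header.contains("\"mapping\"") || header.contains("\"conversations\"")
    }
}

impl Adapter for ClaudeCode {
    fn name(&self) -> &'static str {
        "claude_code"
    }
    // Placeholder heuristic, not the crate's real detection rule.
    fn detect(&self, header: &str) -> bool {
        header.contains("\"sessionId\"")
    }
}

/// Ordered list of adapters; earlier entries win during auto-detection.
fn registry() -> Vec<Box<dyn Adapter>> {
    vec![Box::new(ChatGpt), Box::new(ClaudeCode)]
}

/// With `Some(kind)`, select by name; with `None`, take the first adapter
/// whose `detect` accepts the header.
fn pick_adapter(kind: Option<&str>, header: &str) -> Option<Box<dyn Adapter>> {
    registry().into_iter().find(|adapter| match kind {
        Some(k) => adapter.name() == k,
        None => adapter.detect(header),
    })
}
```

Because the registry is ordered, an ambiguous header resolves to whichever adapter is listed first, which is why an explicit --adapter override is useful.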
Validation
- cargo fmt --check: clean
- cargo clippy --all-targets -- -D warnings: clean
- cargo test: 61/61 pass (44 lib + 4 main clap + 9 cli_e2e + 4 e2e)
- cargo build --release: 12s
- Smoke test against real fixture produces memory.jsonl + 2 Markdown files as expected.
What's next
- F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
- F7: Reddit post (cancelled)
v0.3.0 — Pipeline Core
Pipeline core (F3)
The pipeline is now end-to-end. Feed an adapter a raw export, get back
clean, scored, ready-to-ingest output.
use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::pipeline::{parser, writer};
use loust_llm_mempipe::{Pipeline, PipelineConfig, OutputFormat};
use chrono::Utc;
let adapter = ChatGptAdapter;
let messages = parser::parse(
&adapter,
Box::new(std::fs::File::open("conversations.json")?),
"chatgpt",
)?;
let output = Pipeline::with_safe_defaults().run(messages, Utc::now());
let written = writer::write_all("./out/", &output, OutputFormat::Both)?;
// written = ["out/memory.jsonl", "out/<project>/<thread>.md", ...]
Stages (in order)
- Scrub (Rule E): redactions of AWS, GitHub, Anthropic, OpenAI keys, email, private IPs, absolute user paths. Captures original_length (pre-scrub byte count) for stats.
- Normalize: defensive original_length backfill + trailing whitespace trim.
- Dedup: pass 1 exact by FNV-1a content_hash, pass 2 Jaccard token similarity (threshold 0.85). Duplicates fold into the survivor's hits counter.
- Age filter: drop messages older than max_thread_age_days (1095 by default).
- Signal score: 0.4·hits + 0.3·recency + 0.3·type_weight, where recency = exp(-age_days/365) and type_weight is assistant=1.0, user=0.8, tool=0.5, system=0.3. Saturates at 10 hits.
- Filter + sort: drop signal_score < signal_min (0.2 default), sort survivors DESC by score.
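The signal score formula can be written out as a short function. This is a sketch under one stated assumption: the notes say the score "saturates at 10 hits" but not how hits are normalised, so the hits term here is taken as min(hits, 10) / 10. The Role enum and function names are illustrative, not the crate's own definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Assistant,
    User,
    Tool,
    System,
}

/// Per-role weights as listed in the stage description.
fn type_weight(role: Role) -> f32 {
    match role {
        Role::Assistant => 1.0,
        Role::User => 0.8,
        Role::Tool => 0.5,
        Role::System => 0.3,
    }
}

/// 0.4 * hits + 0.3 * recency + 0.3 * type_weight.
/// ASSUMPTION: the hits term is min(hits, 10) / 10; the real normalisation
/// may differ.
fn signal_score(hits: u32, age_days: f32, role: Role) -> f32 {
    let hits_term = hits.min(10) as f32 / 10.0;
    // Exponential decay with a 365-day time constant.
    let recency = (-age_days / 365.0).exp();
    0.4 * hits_term + 0.3 * recency + 0.3 * type_weight(role)
}
```

Under this reading the score stays within [0.0, 1.0], which is consistent with signal_min being specified on the same range.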
Output formats
- JSONL (out/memory.jsonl): one NormalizedMessage per line. Ready for claude-code --context ./out/memory.jsonl.
- Markdown (out/<project_slug>/<thread_slug>.md): one file per project/thread, with metadata frontmatter and ## role sections. Ready for Claude Projects.
New public API
- loust_llm_mempipe::Pipeline (orchestrator)
- loust_llm_mempipe::PipelineOutput, PipelineStats (results)
- loust_llm_mempipe::pipeline::scrubber::scrub (Rule E)
- loust_llm_mempipe::pipeline::dedup::dedup
- loust_llm_mempipe::pipeline::signals::score
- loust_llm_mempipe::pipeline::writer::{write_jsonl, write_markdown, write_all}
- loust_llm_mempipe::pipeline::parser::parse
NormalizedMessage gained hits: u32 and signal_score: f32. F2
ChatGPT adapter updated; any future adapter must set hits: 1 in its
initializer.
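The reason every adapter starts at hits: 1 becomes clearer when looking at how duplicates fold into a survivor. Below is a rough sketch of the fuzzy pass only (Jaccard similarity over token sets), assuming whitespace tokenisation and lowercasing; the Msg struct and function names are hypothetical, and the exact-hash pass is omitted.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down message: just the text and the hit counter.
#[derive(Debug, Clone)]
struct Msg {
    text: String,
    hits: u32,
}

/// ASSUMPTION: tokens are lowercased, whitespace-separated words.
fn tokens(text: &str) -> HashSet<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

/// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let shared = a.intersection(b).count() as f32;
    let total = a.union(b).count() as f32;
    shared / total
}

/// Keep the first message of each near-duplicate group and add the hits of
/// later duplicates to that survivor.
fn dedup_fuzzy(messages: Vec<Msg>, threshold: f32) -> Vec<Msg> {
    let mut kept: Vec<(Msg, HashSet<String>)> = Vec::new();
    for msg in messages {
        let toks = tokens(&msg.text);
        let hit = kept
            .iter()
            .position(|(_, seen)| jaccard(seen, &toks) >= threshold);
        match hit {
            Some(i) => kept[i].0.hits += msg.hits,
            None => kept.push((msg, toks)),
        }
    }
    kept.into_iter().map(|(msg, _)| msg).collect()
}
```

If an adapter left hits at 0, a message repeated several times would still enter scoring with an undercounted total, so the initial value of 1 matters for the signal score.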
Validation
- cargo fmt --check: clean
- cargo clippy --all-targets -- -D warnings: clean
- cargo test: 48/48 pass (9 lib + 6 chatgpt + 9 scrubber + 5 dedup + 6 signals + 4 writer + 3 normalizer + 1 parser + 4 e2e)
- cargo build --release: 21s
What's next
- F4: clap CLI surface (--input, --output, --format, --stats, --dry-run, --dedup-threshold, --signal-min)
- F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
- F7: Reddit post (cancelled)
v0.2.0 — ChatGPT Adapter MVP
ChatGPT export adapter (F2)
The first production adapter. You can now feed a real ChatGPT
conversations.json export into the library and get back a clean linear
thread of NormalizedMessages.
What works
- ✅ Detects ChatGPT exports by sniffing "mapping" or "conversations" in the file header
- ✅ Reconstructs the active linear thread from the non-linear mapping tree (walks current_node → parent chain → root, then yields root-first)
- ✅ Role mapping: user / assistant / system / tool. Unknown roles (e.g. custom GPT name) are dropped.
- ✅ Skips whitespace-only and empty messages
- ✅ Extracts text parts only, dropping structured payloads (image_url, code interpreter, tool calls)
- ✅ FNV-1a 64-bit content hash for downstream dedup
- ✅ Slugifies conversation title into a project_hint
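The thread reconstruction step (current_node → parent chain → root, then root-first) can be sketched as follows. The Node struct is a hypothetical simplification of an export's mapping entry, keeping only a parent id and optional text; the loop bound is an added safeguard and is not something the notes describe.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified mapping entry: a parent id plus optional text.
struct Node {
    parent: Option<String>,
    text: Option<String>,
}

/// Walk from `current_node` up to the root, then return the texts root-first.
/// Sibling branches that are not ancestors of `current_node` are never visited.
fn active_thread(mapping: &HashMap<String, Node>, current_node: &str) -> Vec<String> {
    let mut leaf_first = Vec::new();
    let mut cursor = mapping.get(current_node);
    // Bound the walk by the node count so a malformed parent cycle cannot
    // loop forever.
    for _ in 0..mapping.len() {
        let node = match cursor {
            Some(node) => node,
            None => break,
        };
        // Skip structural nodes and whitespace-only messages.
        if let Some(text) = &node.text {
            if !text.trim().is_empty() {
                leaf_first.push(text.clone());
            }
        }
        cursor = node.parent.as_deref().and_then(|id| mapping.get(id));
    }
    leaf_first.reverse();
    leaf_first
}
```

Walking upward from the leaf avoids having to decide which child to follow at each fork, since every node has at most one parent.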
Library API
use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::adapter::Adapter;
let adapter = ChatGptAdapter;
let reader = Box::new(std::fs::File::open("conversations.json")?);
let messages: Vec<_> = adapter.stream_messages(reader)?.collect();
Validation
- cargo fmt --check: clean
- cargo clippy --all-targets -- -D warnings: clean
- cargo test: 16/16 pass (7 new ChatGPT tests + 9 library tests)
- cargo build --release: OK (~10s clean build)
Limitations (F2 MVP)
- The full export is materialized in memory (~2× JSON size). For 50 MB exports that's ~100 MB peak, which is fine for a developer laptop. True streaming optimization is tracked for F3+ if real exports need it.
- CLI surface still placeholder (F4). Library is usable from Rust code.
Next
- F3: pipeline core (Rule E secret scrubber → Jaccard dedup → signal scoring → JSONL + Markdown writer)
- F4: clap CLI with --input, --output, --format, --stats
- F5: CI + smoke E2E
v0.1.0 — Initial Skeleton
What's in v0.1.0
F1 of loust-llm-mempipe: the skeleton that makes the rest of the project possible.
Added
- Cargo.toml with full SEO metadata (description, keywords, categories, license MIT/Apache-2.0, repo URL)
- Public library surface: re-exports of Adapter, AdapterKind, OutputFormat, PipelineConfig, SecretKind, NormalizedMessage, Role
- Adapter trait with detect() and stream_messages() contracts
- 4 adapter stubs: ChatGPT, Claude Web, Gemini, Claude Code JSONL
- PipelineConfig with safe defaults: dedup threshold 0.85, signal_min 0.2, max thread age 1095 days, 7 secret pattern slots
- 6 pipeline module stubs: parser, scrubber, normalizer, dedup, signals, writer
- NormalizedMessage::compute_content_hash (FNV-1a via seahash) + slugify helper
- 9 unit tests covering hash determinism, slugify edge cases, role serialization, config defaults, and secret pattern coverage
- Makefile with build / test / clippy / fmt / release / info targets
- README.md with SEO tagline, project status table, build instructions
- CHANGELOG.md following Keep a Changelog format
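For readers unfamiliar with the two helpers named above, here is a stand-alone sketch of a 64-bit FNV-1a hash and a simple slugifier. The FNV constants are the standard published ones; the slug rules (ASCII alphanumerics kept and lowercased, everything else collapsed to a single hyphen) are an assumption, and neither function is claimed to match the crate's own implementation.

```rust
/// Standard 64-bit FNV parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// FNV-1a: XOR each byte into the hash, then multiply by the prime.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    bytes.iter().fold(FNV_OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME)
    })
}

/// ASSUMED slug rules: keep ASCII letters and digits in lowercase, turn any
/// run of other characters into one hyphen, and trim hyphens at both ends.
fn slugify(title: &str) -> String {
    let mut slug = String::new();
    for ch in title.chars() {
        if ch.is_ascii_alphanumeric() {
            slug.push(ch.to_ascii_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            slug.push('-');
        }
    }
    slug.trim_end_matches('-').to_string()
}
```

The hash is deterministic across runs and platforms because it depends only on the input bytes, which is the property a content hash used for dedup needs.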
Validation
- cargo build --release: 40.24s clean, binary 703 KB
- cargo clippy --all-targets -- -D warnings: 0 errors
- cargo test: 9/9 pass
- cargo fmt --check: clean
- Smoke: --version, --info, --help all work
Roadmap
- F2: ChatGPT adapter MVP (streaming JSON deserializer, thread reconstruction)
- F3: Pipeline core (scrubber + dedup + signal_score + writer)
- F4: CLI ergonomics (--input, --output, --format, --stats)
- F5: GitHub Actions CI + smoke E2E
- F7: Public release announcement (r/Anthropic, etc.)