
Releases: LOUST-PRO/LLMmempipe

v0.5.0 — CI Validation

11 Jun 04:56
v0.5.0
d5dbd53


CI validation (F5)

The MVP is now self-validating. Every push to main and every pull
request runs cargo fmt --check, cargo clippy --all-targets -- -D warnings, cargo test --all-targets, and cargo build --release
on both stable and beta toolchains.

CI runs

  • Run #1 (commit d5dbd53): 1m 3s — both stable and beta
  • Run #2 (commit 1007398): 1m 4s — both stable and beta
    • Added FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true to opt into Node
      24 early, ahead of the 2026-06-16 forced default and the
      2026-09-16 removal. The runner now uses Node 24, and
      actions/checkout@v4 works on it.

Workflow: https://github.com/LOUST-PRO/loust-llm-mempipe/actions/workflows/ci.yml

Workflow design

  • Matrix: stable + beta, fail-fast: false (so a beta breakage
    is visible even if stable passes).
  • Cache: Swatinem/rust-cache@v2 (registry + target/).
  • Permissions: contents: read only. No secrets, no write tokens,
    no packages: scope.
  • Security: zero ${{ github.event.* }} interpolations in run:
    blocks. The only ${{ }} reference is matrix.rust which resolves
    to static "stable" / "beta" values declared in the workflow
    file, not external input. No command-injection vectors.
  • Supply chain: actions pinned by major tag (@v4, @v2), except
    dtolnay/rust-toolchain, which tracks the @master branch.

What CI catches

  • Formatting drift (cargo fmt --check)
  • Lint regressions, even in tests (cargo clippy --all-targets)
  • Test failures on either toolchain (catches code that works on
    stable but breaks on the upcoming beta, or vice versa)
  • Release build breakages (the path cargo install exercises)

Final MVP status

Phase   Scope                  Status
F0.1    Pre-publish audit      ✅ done
F0.2    Org hardening          ✅ done
F1      Skeleton + contracts   v0.1.0
F2      ChatGPT adapter MVP    v0.2.0
F3      Pipeline core          v0.3.0
F4      CLI ergonomics         v0.4.0
F5      CI validation          v0.5.0
F7      Reddit post            ❌ cancelled

cargo install loust-llm-mempipe ships you the working MVP.

v0.4.0 — CLI Ergonomics

11 Jun 04:18
c2decd4


CLI ergonomics (F4)

The CLI is feature-complete. After cargo install loust-llm-mempipe, a
single command runs the whole pipeline (loust-llm-mempipe --help lists
every option):

loust-llm-mempipe \
  --input conversations.json \
  --output ./claude-memory/ \
  --format both \
  --stats

Stderr transcript on a real run

$ loust-llm-mempipe --input tests/fixtures/chatgpt-tiny.json \
                     --output /tmp/lmp-smoke --format both --stats
detected adapter: chatgpt
parsed 7 messages
stats: in=7 out=6 scrubbed=0 redactions=0 dedup_exact=0 dedup_fuzzy=0 age_drop=0 signal_drop=1
wrote: /tmp/lmp-smoke/memory.jsonl
wrote: /tmp/lmp-smoke/multi-turn-with-system-message/multi-turn-with-system-message.md
wrote: /tmp/lmp-smoke/rust-fnv-1a-hashing/rust-fnv-1a-hashing.md
done: 3 files written

In this run, the signal_min filter dropped the system message from
conv-003: system carries the lowest type weight (assistant=1.0 >
user=0.8 > tool=0.5 > system=0.3), and the message was a year old,
which also lowered its recency term.

Flags

Flag                 Default       Notes
-i, --input          required      path to the raw export file
-o, --output         required      output dir, created if missing
-f, --format         jsonl         jsonl, markdown/md, or both
--adapter            auto-detect   chatgpt, claude_web, gemini, claude_code
--dedup-threshold    0.85          Jaccard similarity, range [0.0, 1.0]
--signal-min         0.2           drop messages with signal_score < min, range [0.0, 1.0]
--max-age-days       1095          drop messages older than N days
--stats              off           print one-line stats to stderr
--dry-run            off           compute but don't write
--info               off           print build metadata and exit
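The --format and range-checked float flags above are the kind of input a clap value parser validates. A minimal sketch, assuming exact lowercase matching and a closed [0.0, 1.0] interval; the names Format, parse_format, and parse_unit_interval are hypothetical and are not the crate's API. In clap 4, a plain function of this shape (fn(&str) -> Result<T, String>) should be usable with value_parser.

```rust
/// Hypothetical stand-in for the output formats accepted by --format.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Jsonl,
    Markdown,
    Both,
}

/// Parses a --format value; "md" is accepted as an alias for "markdown".
pub fn parse_format(s: &str) -> Result<Format, String> {
    match s {
        "jsonl" => Ok(Format::Jsonl),
        "markdown" | "md" => Ok(Format::Markdown),
        "both" => Ok(Format::Both),
        other => Err(format!(
            "unknown format `{other}` (expected jsonl, markdown/md, or both)"
        )),
    }
}

/// Parses a float flag such as --dedup-threshold or --signal-min and
/// rejects anything outside [0.0, 1.0]. NaN parses as a float but fails
/// the range check, so it is rejected too.
pub fn parse_unit_interval(s: &str) -> Result<f32, String> {
    let v: f32 = s
        .trim()
        .parse()
        .map_err(|e| format!("`{s}` is not a number: {e}"))?;
    if (0.0..=1.0).contains(&v) {
        Ok(v)
    } else {
        Err(format!("`{s}` is outside [0.0, 1.0]"))
    }
}
```

Rejecting bad values at parse time means the pipeline itself never has to re-check thresholds.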

Library additions

  • loust_llm_mempipe::adapter::registry() — ordered list of all
    known adapters for auto-detection.
  • loust_llm_mempipe::adapter::pick_adapter(kind, header) — explicit
    override or first-detect.
  • OutputFormat::from_cli(s) and AdapterKind::from_cli(s): kebab-case
    parsers used by clap's value_parser.
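The pick_adapter behavior described above (an explicit override wins, otherwise the first adapter in registry order whose detection matches) can be sketched as follows. This is an illustration under assumptions: Kind, Detect, and Sniffer are hypothetical simplifications, and Sniffer's substring check is a toy heuristic, not the crate's detection logic.

```rust
/// Hypothetical, simplified stand-in for the crate's adapter kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    ChatGpt,
    ClaudeWeb,
}

/// Hypothetical detection contract: inspect the file header, say yes or no.
pub trait Detect {
    fn kind(&self) -> Kind;
    fn detect(&self, header: &str) -> bool;
}

/// Toy detector that matches when `needle` appears in the header.
pub struct Sniffer {
    pub kind: Kind,
    pub needle: &'static str,
}

impl Detect for Sniffer {
    fn kind(&self) -> Kind {
        self.kind
    }
    fn detect(&self, header: &str) -> bool {
        header.contains(self.needle)
    }
}

/// Explicit override, else first-detect in registry order.
pub fn pick_adapter<'a>(
    explicit: Option<Kind>,
    header: &str,
    registry: &'a [Box<dyn Detect>],
) -> Option<&'a dyn Detect> {
    match explicit {
        // --adapter given: look it up by kind without sniffing the header.
        Some(kind) => registry.iter().find(|a| a.kind() == kind).map(|a| &**a),
        // Auto-detect: registry order decides ties between matching adapters.
        None => registry.iter().find(|a| a.detect(header)).map(|a| &**a),
    }
}
```

Because the first match wins, the order of the registry list matters whenever two formats could both match the same header.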

Validation

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test: 61/61 pass (44 lib + 4 main clap + 9 cli_e2e + 4 e2e)
  • cargo build --release — 12s
  • Smoke test against real fixture produces memory.jsonl + 2 Markdown
    files as expected.

What's next

  • F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
  • F7: Reddit post cancelled

v0.3.0 — Pipeline Core

11 Jun 03:07
1b9fa86


Pipeline core (F3)

The pipeline is now end-to-end. Feed an adapter a raw export, get back
clean, scored, ready-to-ingest output.

use chrono::Utc;
use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::pipeline::{parser, writer};
use loust_llm_mempipe::{OutputFormat, Pipeline};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let adapter = ChatGptAdapter;
    let messages = parser::parse(
        &adapter,
        Box::new(std::fs::File::open("conversations.json")?),
        "chatgpt",
    )?;

    // Scrub, normalize, dedup, age-filter, score, and sort in one call.
    let output = Pipeline::with_safe_defaults().run(messages, Utc::now());
    let written = writer::write_all("./out/", &output, OutputFormat::Both)?;
    // written = ["out/memory.jsonl", "out/<project>/<thread>.md", ...]
    Ok(())
}

Stages (in order)

  1. Scrub (Rule E): redacts AWS, GitHub, Anthropic, and OpenAI keys,
    email addresses, private IPs, and absolute user paths. Captures
    original_length (pre-scrub byte count) for stats.
  2. Normalize: defensive original_length backfill + trailing
    whitespace trim.
  3. Dedup: pass 1 exact by FNV-1a content_hash, pass 2 Jaccard
    token similarity (threshold 0.85). Duplicates fold into the
    survivor's hits counter.
  4. Age filter: drop messages older than max_thread_age_days (1095
    by default).
  5. Signal score: 0.4·hits + 0.3·recency + 0.3·type_weight, where
    recency = exp(-age_days/365) and type_weight is assistant=1.0,
    user=0.8, tool=0.5, system=0.3. The hits term saturates at 10 hits.
  6. Filter + sort: drop signal_score < signal_min (0.2 default),
    sort survivors DESC by score.
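Stages 5 and 6 can be sketched in a few lines. This assumes the hits term is normalized as min(hits, 10) / 10, which is one way to make it saturate at 10 hits; the notes do not spell out the exact normalization. Role, type_weight, signal_score, and filter_and_sort are hypothetical names, not the crate's API.

```rust
/// Hypothetical stand-in for the message roles.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Assistant,
    User,
    Tool,
    System,
}

/// Type weights as listed in the release notes.
pub fn type_weight(role: Role) -> f32 {
    match role {
        Role::Assistant => 1.0,
        Role::User => 0.8,
        Role::Tool => 0.5,
        Role::System => 0.3,
    }
}

/// 0.4*hits + 0.3*recency + 0.3*type_weight.
/// ASSUMPTION: hits is normalized to min(hits, 10) / 10 so the term caps at 0.4.
pub fn signal_score(hits: u32, age_days: f32, role: Role) -> f32 {
    let hits_term = hits.min(10) as f32 / 10.0;
    let recency = (-age_days / 365.0).exp();
    0.4 * hits_term + 0.3 * recency + 0.3 * type_weight(role)
}

/// Stage 6: drop scores below signal_min, then sort survivors highest first.
pub fn filter_and_sort(mut scored: Vec<(String, f32)>, signal_min: f32) -> Vec<(String, f32)> {
    scored.retain(|(_, score)| *score >= signal_min);
    // total_cmp gives a total order on f32, so sorting never panics on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Under this assumed normalization a brand-new assistant message with one hit scores 0.04 + 0.3 + 0.3 = 0.64, comfortably above the 0.2 default.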

Output formats

  • JSONL (out/memory.jsonl) — one NormalizedMessage per line.
    Ready for claude-code --context ./out/memory.jsonl.
  • Markdown (out/<project_slug>/<thread_slug>.md) — one file per
    project/thread, with metadata frontmatter and ## role sections.
    Ready for Claude Projects.
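The Markdown layout above (one file per project/thread, frontmatter, then ## role sections) can be sketched with the standard library alone. The path shape follows the notes; the frontmatter keys project, thread, and messages are hypothetical, since the notes only say the file carries metadata frontmatter. markdown_path and render_thread are illustrative names, not the crate's writer API.

```rust
use std::path::{Path, PathBuf};

/// Builds <out>/<project_slug>/<thread_slug>.md.
pub fn markdown_path(out_dir: &Path, project_slug: &str, thread_slug: &str) -> PathBuf {
    out_dir.join(project_slug).join(format!("{thread_slug}.md"))
}

/// Renders one thread as Markdown.
/// HYPOTHETICAL frontmatter keys: project, thread, messages.
pub fn render_thread(project: &str, thread: &str, messages: &[(&str, &str)]) -> String {
    let mut md = String::new();
    md.push_str("---\n");
    md.push_str(&format!(
        "project: {project}\nthread: {thread}\nmessages: {}\n",
        messages.len()
    ));
    md.push_str("---\n");
    for (role, text) in messages {
        // One "## role" section per message, in thread order.
        md.push_str(&format!("\n## {role}\n\n{text}\n"));
    }
    md
}
```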

New public API

  • loust_llm_mempipe::Pipeline (orchestrator)
  • loust_llm_mempipe::PipelineOutput, PipelineStats (results)
  • loust_llm_mempipe::pipeline::scrubber::scrub (Rule E)
  • loust_llm_mempipe::pipeline::dedup::dedup
  • loust_llm_mempipe::pipeline::signals::score
  • loust_llm_mempipe::pipeline::writer::{write_jsonl, write_markdown, write_all}
  • loust_llm_mempipe::pipeline::parser::parse

NormalizedMessage gained hits: u32 and signal_score: f32. The F2
ChatGPT adapter was updated accordingly; any future adapter must set
hits: 1 in its initializer.
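The hits: 1 rule makes sense once you see how dedup folds duplicates: each dropped duplicate adds its hits to the survivor, so starting every message at 1 makes hits count occurrences. A minimal sketch of the two-pass dedup from stage 3, assuming lowercase whitespace tokenization for Jaccard, first-occurrence survivors, and an O(n²) near-duplicate pass; Msg, fnv1a64, jaccard, and dedup are hypothetical names. The FNV-1a constants are the standard 64-bit offset basis and prime.

```rust
use std::collections::{HashMap, HashSet};

/// FNV-1a, 64-bit: XOR each byte in, then multiply by the FNV prime.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Jaccard similarity of lowercase whitespace-token sets (ASSUMED tokenizer).
pub fn jaccard(a: &str, b: &str) -> f32 {
    let ta: HashSet<String> = a.split_whitespace().map(|t| t.to_lowercase()).collect();
    let tb: HashSet<String> = b.split_whitespace().map(|t| t.to_lowercase()).collect();
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    ta.intersection(&tb).count() as f32 / ta.union(&tb).count() as f32
}

/// Hypothetical, trimmed-down message.
#[derive(Debug)]
pub struct Msg {
    pub content: String,
    pub hits: u32,
}

pub fn dedup(messages: Vec<Msg>, threshold: f32) -> Vec<Msg> {
    // Pass 1: exact duplicates by content hash; first occurrence survives.
    let mut first_by_hash: HashMap<u64, usize> = HashMap::new();
    let mut exact: Vec<Msg> = Vec::new();
    for m in messages {
        let h = fnv1a64(m.content.as_bytes());
        match first_by_hash.get(&h).copied() {
            Some(i) => exact[i].hits += m.hits,
            None => {
                first_by_hash.insert(h, exact.len());
                exact.push(m);
            }
        }
    }
    // Pass 2: near-duplicates by Jaccard similarity against earlier survivors.
    let mut out: Vec<Msg> = Vec::new();
    for m in exact {
        let survivor = out.iter().position(|s| jaccard(&s.content, &m.content) >= threshold);
        match survivor {
            Some(i) => out[i].hits += m.hits,
            None => out.push(m),
        }
    }
    out
}
```

An adapter that initialized hits to 0 would make a message seen twice look like it was seen once, which would skew the hits term of the signal score.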

Validation

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test: 48/48 pass (9 lib + 6 chatgpt + 9 scrubber +
    5 dedup + 6 signals + 4 writer + 3 normalizer + 1 parser + 4 e2e)
  • cargo build --release — 21s

What's next

  • F4: clap CLI surface (--input, --output, --format, --stats,
    --dry-run, --dedup-threshold, --signal-min)
  • F5: CI workflow (.github/workflows/ci.yml with fmt/clippy/test)
  • F7: Reddit post cancelled

v0.2.0 — ChatGPT Adapter MVP

11 Jun 02:23
b3abf6c


ChatGPT export adapter (F2)

The first production adapter. You can now feed a real ChatGPT
conversations.json export into the library and get back a clean linear
thread of NormalizedMessages.

What works

  • ✅ Detects ChatGPT exports by sniffing "mapping" or "conversations" in the file header
  • ✅ Reconstructs the active linear thread from the non-linear mapping tree
    (walks current_node → parent chain → root, then yields root-first)
  • ✅ Role mapping: user / assistant / system / tool. Unknown roles
    (e.g. a custom GPT name) are dropped.
  • ✅ Skips whitespace-only and empty messages
  • ✅ Extracts text parts only — drops structured payloads (image_url,
    code interpreter, tool calls)
  • ✅ FNV-1a 64-bit content hash for downstream dedup
  • ✅ Slugifies conversation title into a project_hint

Library API

use loust_llm_mempipe::adapter::chatgpt::ChatGptAdapter;
use loust_llm_mempipe::adapter::Adapter;

let adapter = ChatGptAdapter;
let reader = Box::new(std::fs::File::open("conversations.json")?);
let messages: Vec<_> = adapter.stream_messages(reader)?.collect();

Validation

  • cargo fmt --check — clean
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test — 16/16 pass (7 new ChatGPT tests + 9 library tests)
  • cargo build --release — OK (~10s clean build)

Limitations (F2 MVP)

  • The full export is materialized in memory (~2× JSON size). For 50 MB
    exports that's ~100 MB peak — fine for a developer laptop. True
    streaming optimization is tracked for F3+ if real exports need it.
  • CLI surface still placeholder (F4). Library is usable from Rust code.

Next

  • F3: pipeline core (Rule E secret scrubber → Jaccard dedup → signal scoring
    → JSONL + Markdown writer)
  • F4: clap CLI with --input, --output, --format, --stats
  • F5: CI + smoke E2E

v0.1.0 — Initial Skeleton

11 Jun 02:13
ea57baa


What's in v0.1.0

F1 of loust-llm-mempipe: the skeleton that makes the rest of the project possible.

Added

  • Cargo.toml with full SEO metadata (description, keywords, categories, license MIT/Apache-2.0, repo URL)
  • Public library surface: re-exports of Adapter, AdapterKind, OutputFormat, PipelineConfig, SecretKind, NormalizedMessage, Role
  • Adapter trait with detect() and stream_messages() contracts
  • 4 adapter stubs: ChatGPT, Claude Web, Gemini, Claude Code JSONL
  • PipelineConfig with safe defaults: dedup threshold 0.85, signal_min 0.2, max thread age 1095 days, 7 secret pattern slots
  • 6 pipeline module stubs: parser, scrubber, normalizer, dedup, signals, writer
  • NormalizedMessage::compute_content_hash (FNV-1a via seahash) + slugify helper
  • 9 unit tests covering hash determinism, slugify edge cases, role serialization, config defaults, and secret pattern coverage
  • Makefile with build / test / clippy / fmt / release / info targets
  • README.md with SEO tagline, project status table, build instructions
  • CHANGELOG.md following Keep a Changelog format
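The notes name a slugify helper without specifying its rules. One plausible rule, shown as a sketch under that assumption rather than as the crate's code: keep ASCII alphanumerics lowercased and collapse every other run of characters into a single dash. It reproduces the slugs visible in the v0.4.0 transcript, such as rust-fnv-1a-hashing. The function name is the one from the notes, but this body is hypothetical.

```rust
/// Lowercases ASCII letters, keeps ASCII alphanumerics, and collapses each
/// run of other characters into one '-', with no leading or trailing dash.
pub fn slugify(title: &str) -> String {
    let mut slug = String::with_capacity(title.len());
    let mut pending_dash = false;
    for c in title.chars() {
        if c.is_ascii_alphanumeric() {
            // Emit a separator only between two kept runs.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.push(c.to_ascii_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}
```

Dropping non-ASCII characters keeps slugs safe as directory names on every platform, at the cost of losing accents (a title like "Café" becomes "caf").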

Validation

  • cargo build --release: 40.24s clean, binary 703 KB
  • cargo clippy --all-targets -- -D warnings: 0 errors
  • cargo test: 9/9 pass
  • cargo fmt --check: clean
  • Smoke: --version, --info, --help all work

Roadmap

  • F2: ChatGPT adapter MVP (streaming JSON deserializer, thread reconstruction)
  • F3: Pipeline core (scrubber + dedup + signal_score + writer)
  • F4: CLI ergonomics (--input, --output, --format, --stats)
  • F5: GitHub Actions CI + smoke E2E
  • F7: Public release announcement (r/Anthropic, etc.)