Your RAG chatbot cannot answer questions about diagrams because it never indexed them.
ragimg is a small Go CLI that reads Markdown and HTML documentation, finds useful images, describes them once with a
vision model, and writes ordinary text chunks for your existing embedding pipeline.
Index diagrams, screenshots, charts, and tables from your docs. Skip junk. Avoid duplicate API calls. Export JSONL.
The demo above is generated from the checked-in fixture in testdata/demo-docs. Regenerate it with scripts/demo.tape, or use the fallback renderer when VHS is not installed:
go run scripts/render-demo-gif.go

Most RAG setups treat documentation as text and quietly lose whatever is stored in images: architecture diagrams, database screenshots, charts, product states, sequence flows, or deployment dashboards. Query-time multimodal retrieval can help, but it usually means higher latency, higher cost, and a different serving stack.
ragimg moves that work to indexing time. Each useful image becomes a text record with source metadata, cache state,
model details, and enough context to load into Pinecone, Qdrant, Chroma, LangChain, LlamaIndex, or anything else that
already accepts documents.
| Step | What happens |
|---|---|
| Scan | Reads local .md, .mdx, .html, and .htm files under --docs. |
| Filter | Drops tiny files, missing images, unsupported formats, obvious logos, badges, icons, trackers, and unsafe symlinks. |
| Deduplicate | Hashes image bytes or canonical remote URLs plus nearby context, provider, model, and detail. |
| Caption | Sends only new work to OpenAI or a local Ollama server. |
| Export | Writes stable JSONL by default, with JSON, CSV, and Markdown exports for review workflows. |
| Audit | Builds a static HTML report with image previews and generated captions. |
Download a release binary from GitHub Releases, or install from source with Go 1.25 or newer:
go install github.com/balyakin/ragimg@latest

Docker works well in CI or on machines where you do not want to install Go:
docker run --rm \
  -v "$PWD:/work" \
  ghcr.io/balyakin/ragimg:latest \
  index --docs /work/docs --output /work/chunks.jsonl

Start with a dry run. It scans, filters, checks the cache, estimates the remaining work, and never asks for an API key:
ragimg index --docs ./docs --output chunks.jsonl --dry-run

Then run the actual caption job:
export OPENAI_API_KEY=...
ragimg index --docs ./docs --output chunks.jsonl

Review what came out:
ragimg stats chunks.jsonl
ragimg report --input chunks.jsonl --output ragimg-report.html

Or try the fixture without touching your own repository:
ragimg index --docs testdata/demo-docs --output chunks.jsonl --dry-run --verbose

JSONL is the default because it is easy to stream, diff, upload, and inspect. Each line is one image caption chunk. The example below is expanded for readability:
{
  "id": "img_demo_architecture",
  "text": "OAuth2 architecture diagram showing a Browser PKCE client sending authorize requests through an API Gateway to an Auth Service. The Auth Service issues tokens, stores encrypted refresh tokens in Token Store, and sends login and consent events to Audit Log.",
  "metadata": {
    "chunk_type": "image_caption",
    "source_file": "README.md",
    "source_type": "markdown",
    "image_path": "images/architecture.svg",
    "original_path": "images/architecture.svg",
    "is_remote": false,
    "alt_text": "OAuth2 architecture",
    "title": "OAuth2 architecture",
    "section_heading": "OAuth2 Flow",
    "heading_path": ["OAuth2 Flow"],
    "provider": "openai",
    "model": "gpt-5.4-mini",
    "detail": "low",
    "cached": false,
    "indexed_at": "2026-06-05T08:00:00Z"
  }
}

The same scan can be exported in other formats:
ragimg index --docs ./docs --format json --output chunks.json
ragimg index --docs ./docs --format csv --output chunks.csv
ragimg index --docs ./docs --format md --output chunks.md

The report is a single static HTML file. It has no external CSS, no external JavaScript, and it does not fetch remote images. Small local non-SVG previews are embedded by default; large files and SVGs are referenced from disk.
ragimg report --input chunks.jsonl --output ragimg-report.html

Use --docs-root when the report is generated away from the original documentation tree:
ragimg report --input chunks.jsonl --docs-root ./docs --output ragimg-report.html

ragimg is designed for reruns. The default cache file is .ragimg-cache.json; the default progress file is .ragimg-progress.json.
The cache key includes:
- image bytes for local files, or a canonical URL for remote images
- surrounding documentation context
- provider, model, and detail
- prompt-affecting metadata such as alt text and headings
That means unchanged image/context pairs are not sent to the provider again. If a run is interrupted, the progress file lets the next run reuse completed work before rebuilding the output.
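To make the cache-key idea concrete, here is a minimal Python sketch of how such a key could be derived from the inputs listed above. It is not ragimg's actual implementation: the function name `cache_key`, the choice of SHA-256, and the JSON field layout are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(image_bytes: bytes, context: str, provider: str, model: str,
              detail: str, alt_text: str = "", heading: str = "") -> str:
    """Hypothetical cache key: one digest over the image and every prompt-affecting input."""
    digest = hashlib.sha256()
    # Hash the image content itself, so renaming a file does not invalidate the key.
    digest.update(hashlib.sha256(image_bytes).digest())
    # Serialize the remaining inputs deterministically (sorted keys) before hashing.
    fields = {
        "context": context,
        "provider": provider,
        "model": model,
        "detail": detail,
        "alt_text": alt_text,
        "heading": heading,
    }
    digest.update(json.dumps(fields, sort_keys=True, ensure_ascii=False).encode("utf-8"))
    return digest.hexdigest()
```

With a key built this way, changing the model, the detail level, or the surrounding text produces a different key and therefore a fresh caption, while an untouched image in untouched context maps to the same key on every run.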
Useful controls:
ragimg index --docs ./docs --max-images 25
ragimg index --docs ./docs --include "**/*.md" --exclude "**/assets/logo*"
ragimg index --docs ./docs --no-cache
ragimg index --docs ./docs --no-resume

OpenAI is the default provider:
export OPENAI_API_KEY=...
ragimg index --docs ./docs --provider openai --model gpt-5.4-mini

For a local path, run Ollama and pick a vision-capable model:
ragimg index --docs ./docs --provider ollama --model llava

OpenAI can receive local images and remote image URLs. Ollama in v0.1 supports local images only. In both cases, --dry-run is the safe way to see what would be processed before sending anything to a model.
- uses: balyakin/ragimg@v0.1.0
  with:
    docs: ./docs
    output: chunks.jsonl
    provider: openai
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

A common CI pattern is to run a dry run on pull requests and reserve paid captioning for a scheduled job or a protected branch:
- uses: balyakin/ragimg@v0.1.0
  with:
    docs: ./docs
    output: chunks.jsonl
    dry-run: "true"

Full examples live in examples. The important part is simple: embed text, keep metadata, and use id as the document key.
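Before handing chunks to a vector store, it can help to check that every line carries the three top-level fields shown in the JSONL example and that no id repeats. The sketch below is one way to do that; `iter_chunks` is a hypothetical helper, not part of ragimg, and treating a repeated id as an error is an assumption based on using id as the document key.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed chunk per non-empty JSONL line, failing fast on bad records."""
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        chunk = json.loads(line)
        for key in ("id", "text", "metadata"):
            if key not in chunk:
                raise ValueError(f"line {number}: missing {key!r}")
        if chunk["id"] in seen_ids:
            raise ValueError(f"line {number}: duplicate id {chunk['id']!r}")
        seen_ids.add(chunk["id"])
        yield chunk
```

Because the function accepts any iterable of strings, it works on an open file handle (`for chunk in iter_chunks(open_file)`) as well as on an in-memory list in tests.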
Pinecone:
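import json

from pinecone import Pinecone

# Pinecone() reads PINECONE_API_KEY from the environment.
pc = Pinecone()
index = pc.Index("docs")

# upsert_records expects an index with integrated embedding on the "text" field.
with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        # One record per caption chunk, keyed by the chunk id.
        index.upsert_records(
            "default",
            [{"_id": chunk["id"], "text": chunk["text"], **chunk["metadata"]}],
        )

LangChain: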
import json

from langchain_core.documents import Document

docs = []
with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        # The caption becomes the page content; source details stay in metadata.
        docs.append(Document(page_content=chunk["text"], metadata=chunk["metadata"]))

LlamaIndex:
import json

from llama_index.core import Document

documents = []
with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        # The caption becomes the document text; source details stay in metadata.
        documents.append(Document(text=chunk["text"], metadata=chunk["metadata"]))

ragimg reads .ragimg.yaml when present. RAGIMG_CONFIG or --config can point to another file. Paths inside the config file are resolved relative to that file, which makes checked-in configs easier to move between machines.
docs: ./docs
output: chunks.jsonl
format: jsonl
provider: openai
model: gpt-5.4-mini
detail: low
workers: 4
timeout: 30s
cache: .ragimg-cache.json
resume: .ragimg-progress.json
dedup: true
max_images: 0
include:
- "**/*.md"
- "**/*.mdx"
- "**/*.html"
- "**/*.htm"
exclude:
- "**/node_modules/**"

Provider secrets are never read from config files. Use OPENAI_API_KEY or --api-key. RAGIMG_CACHE can override the cache path, and NO_COLOR, RAGIMG_NO_COLOR=1, or --no-color disable colored output.
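The rule that paths in the config file are resolved relative to that file can be illustrated with a short Python sketch. This is an assumption about the general approach, not ragimg's code; `resolve_config_path` is a hypothetical name, and the example assumes absolute paths are left untouched.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path found in a config file against the config file's own directory."""
    base = Path(config_file).absolute().parent
    candidate = Path(value)
    # Assumption: absolute paths pass through; relative ones are anchored at the config directory.
    return candidate if candidate.is_absolute() else base / candidate
```

Under this rule, a config stored at /repo/ci/.ragimg.yaml with `docs: ./docs` would point at /repo/ci/docs regardless of the directory the command is started from.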
| Command | Purpose |
|---|---|
| ragimg index | Scan docs and write caption chunks. |
| ragimg preview --image path/to/image.png --dry-run | Show the resolved prompt and image metadata for one image. |
| ragimg report --input chunks.jsonl | Build the static review report. |
| ragimg stats chunks.jsonl | Print totals, model usage, date range, and top source files. |
| ragimg completion bash | Generate shell completion. |
| ragimg version | Print build version, commit, and date. |
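For pipelines that need similar numbers inside a script or notebook, the kind of summary that ragimg stats prints can be approximated from the metadata fields in the JSONL output. The sketch below is not the CLI's implementation; `summarize` and its return shape are hypothetical, and it assumes every indexed_at value uses the same UTC ISO 8601 format as the example record, so that string order matches time order.

```python
from collections import Counter
from typing import Iterable


def summarize(chunks: Iterable[dict], top: int = 5) -> dict:
    """Compute totals, model usage, date range, and top source files from parsed chunks."""
    chunks = list(chunks)
    models = Counter(c["metadata"]["model"] for c in chunks)
    sources = Counter(c["metadata"]["source_file"] for c in chunks)
    # Same-format ISO 8601 UTC timestamps sort correctly as plain strings.
    dates = sorted(c["metadata"]["indexed_at"] for c in chunks)
    return {
        "total": len(chunks),
        "models": dict(models),
        "date_range": (dates[0], dates[-1]) if dates else None,
        "top_sources": sources.most_common(top),
    }
```

Each input is a dictionary parsed from one line of chunks.jsonl, so the function composes with any JSONL reader.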
The default filters are intentionally conservative. ragimg skips missing local files, unsupported formats, Git LFS
pointer files, very small or very large files, tiny dimensions, extreme aspect ratios, symlinks that leave the docs
root, and filenames that look like logos, icons, badges, avatars, spacers, tracking pixels, or social sharing assets.
Supported image extensions are .png, .jpg, .jpeg, .gif, .webp, and .svg.
Supported document extensions are .md, .mdx, .html, and .htm.
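As a rough illustration of the filename and extension checks described above, the Python sketch below classifies an image path. It is a simplified stand-in, not ragimg's filter: the keyword pattern and the `skip_reason` helper are assumptions, and the size, dimension, aspect-ratio, Git LFS, and symlink checks are left out because their thresholds are not stated here.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the supported list above.
SUPPORTED_IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}

# Hypothetical keyword pattern; matches whole name segments such as "logo-dark" or "tracking-pixel".
JUNK_NAME = re.compile(
    r"(?:^|[-_ .])(logo|icon|badge|avatar|spacer|pixel|share)s?(?:$|[-_ .\d])",
    re.IGNORECASE,
)


def skip_reason(image_path: str) -> Optional[str]:
    """Return why an image would be skipped, or None if it passes these two checks."""
    path = PurePosixPath(image_path)
    if path.suffix.lower() not in SUPPORTED_IMAGE_EXTENSIONS:
        return "unsupported format"
    if JUNK_NAME.search(path.stem):
        return "junk filename"
    return None
```

Matching on name segments rather than raw substrings keeps a file such as silicon-layout.png from being mistaken for an icon, at the cost of missing names that run the keyword into other words.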
This README does not invent benchmark numbers. Generate current numbers from the checked-in fixture:
scripts/benchmark.sh

When publishing benchmark claims, include the command, date, commit, fixture or repository snapshot, and output summary.
ragimg v0.1 is a local indexing tool. It does not crawl websites, parse PDFs, run OCR, upload directly to vector
databases, or host a review UI. Those are good future features, but the first release keeps the contract narrow: scan
local docs, caption useful images, write portable chunks, and make reruns cheap.
go test ./...
go vet ./...

The repository includes CI for Go 1.25.x and 1.26.x, release binaries for Linux, macOS, and Windows, a Docker image, and a root action.yml for GitHub Actions.
Apache-2.0. See LICENSE.
