ragimg

Your RAG chatbot cannot answer questions about diagrams because it never indexed them.

ragimg is a small Go CLI that reads Markdown and HTML documentation, finds useful images, describes them once with a vision model, and writes ordinary text chunks for your existing embedding pipeline.

Index diagrams, screenshots, charts, and tables from your docs. Skip junk. Avoid duplicate API calls. Export JSONL.

The demo above is generated from the checked-in fixture in testdata/demo-docs. Regenerate it with scripts/demo.tape, or use the fallback renderer when VHS is not installed:

go run scripts/render-demo-gif.go

Why ragimg exists

Most RAG setups treat documentation as text and quietly lose whatever is stored in images: architecture diagrams, database screenshots, charts, product states, sequence flows, or deployment dashboards. Query-time multimodal retrieval can help, but it usually means higher latency, higher cost, and a different serving stack.

ragimg moves that work to indexing time. Each useful image becomes a text record with source metadata, cache state, model details, and enough context to load into Pinecone, Qdrant, Chroma, LangChain, LlamaIndex, or anything else that already accepts documents.

What it does

| Step | What happens |
| --- | --- |
| Scan | Reads local `.md`, `.mdx`, `.html`, and `.htm` files under `--docs`. |
| Filter | Drops tiny files, missing images, unsupported formats, obvious logos, badges, icons, trackers, and unsafe symlinks. |
| Deduplicate | Hashes image bytes or canonical remote URLs plus nearby context, provider, model, and detail. |
| Caption | Sends only new work to OpenAI or a local Ollama server. |
| Export | Writes stable JSONL by default, with JSON, CSV, and Markdown exports for review workflows. |
| Audit | Builds a static HTML report with image previews and generated captions. |

Install

Download a release binary from GitHub Releases, or install from source with Go 1.25 or newer:

go install github.com/balyakin/ragimg@latest

Docker works well in CI or on machines where you do not want to install Go:

docker run --rm \
  -v "$PWD:/work" \
  ghcr.io/balyakin/ragimg:latest \
  index --docs /work/docs --output /work/chunks.jsonl

Quickstart

Start with a dry run. It scans, filters, checks the cache, estimates the remaining work, and never asks for an API key:

ragimg index --docs ./docs --output chunks.jsonl --dry-run

Then run the actual caption job:

export OPENAI_API_KEY=...
ragimg index --docs ./docs --output chunks.jsonl

Review what came out:

ragimg stats chunks.jsonl
ragimg report --input chunks.jsonl --output ragimg-report.html

Or try the fixture without touching your own repository:

ragimg index --docs testdata/demo-docs --output chunks.jsonl --dry-run --verbose

Output

JSONL is the default because it is easy to stream, diff, upload, and inspect. Each line is one image caption chunk. The example below is expanded for readability:

{
  "id": "img_demo_architecture",
  "text": "OAuth2 architecture diagram showing a Browser PKCE client sending authorize requests through an API Gateway to an Auth Service. The Auth Service issues tokens, stores encrypted refresh tokens in Token Store, and sends login and consent events to Audit Log.",
  "metadata": {
    "chunk_type": "image_caption",
    "source_file": "README.md",
    "source_type": "markdown",
    "image_path": "images/architecture.svg",
    "original_path": "images/architecture.svg",
    "is_remote": false,
    "alt_text": "OAuth2 architecture",
    "title": "OAuth2 architecture",
    "section_heading": "OAuth2 Flow",
    "heading_path": ["OAuth2 Flow"],
    "provider": "openai",
    "model": "gpt-5.4-mini",
    "detail": "low",
    "cached": false,
    "indexed_at": "2026-06-05T08:00:00Z"
  }
}
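Because every line carries the same `id` / `text` / `metadata` shape, a chunks file can be sanity-checked with a few lines of Python before it is loaded anywhere. The sketch below is illustrative only: `summarize_chunks` is a hypothetical helper, not part of ragimg, and it assumes every record has the `model` and `source_file` metadata fields shown in the example above.

```python
import json
from collections import Counter


def summarize_chunks(lines):
    """Count chunks per model and per source file from an iterable of JSONL lines."""
    by_model = Counter()
    by_source = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        chunk = json.loads(line)
        meta = chunk["metadata"]
        by_model[meta["model"]] += 1
        by_source[meta["source_file"]] += 1
        total += 1
    return {
        "total": total,
        "by_model": dict(by_model),
        "by_source": dict(by_source),
    }
```

Passing an open file handle for `chunks.jsonl` as `lines` gives a quick per-model and per-file count. For the full summary, `ragimg stats` remains the supported route.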

The same scan can be exported in other formats:

ragimg index --docs ./docs --format json --output chunks.json
ragimg index --docs ./docs --format csv --output chunks.csv
ragimg index --docs ./docs --format md --output chunks.md

HTML report

The report is a single static HTML file. It has no external CSS, no external JavaScript, and it does not fetch remote images. Small local non-SVG previews are embedded by default; large files and SVGs are referenced from disk.

ragimg report --input chunks.jsonl --output ragimg-report.html

Use --docs-root when the report is generated away from the original documentation tree:

ragimg report --input chunks.jsonl --docs-root ./docs --output ragimg-report.html
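The preview rule described in this section (small local raster images are embedded, large files and SVGs are referenced from disk) can be sketched with the standard library. This is a rough illustration, not ragimg's code: `preview_src` and the 256 KiB `MAX_INLINE_BYTES` threshold are assumptions made for the example.

```python
import base64
import mimetypes
from pathlib import Path

# Hypothetical threshold for illustration; ragimg's real limit may differ.
MAX_INLINE_BYTES = 256 * 1024


def preview_src(path, max_inline_bytes=MAX_INLINE_BYTES):
    """Return a data: URI for small non-SVG images, otherwise the file path."""
    p = Path(path)
    mime, _ = mimetypes.guess_type(p.name)
    is_svg = p.suffix.lower() == ".svg"
    if is_svg or mime is None or p.stat().st_size > max_inline_bytes:
        return p.as_posix()  # referenced from disk, not embedded
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Embedding previews as `data:` URIs is what lets a report stay a single file with no external requests. The size cap keeps that file from growing with every large screenshot.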

Cache and resume

ragimg is designed for reruns. The default cache file is .ragimg-cache.json; the default progress file is .ragimg-progress.json.

The cache key includes:

- image bytes for local files, or a canonical URL for remote images
- surrounding documentation context
- provider, model, and detail
- prompt-affecting metadata such as alt text and headings

That means unchanged image/context pairs are not sent to the provider again. If a run is interrupted, the progress file lets the next run reuse completed work before rebuilding the output.
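As a way to picture the cache key, the sketch below hashes the listed inputs into one stable digest. It is not ragimg's actual key derivation: the `cache_key` function, the choice of SHA-256, and the length-prefixed encoding are all assumptions for illustration. For a remote image, the canonical URL bytes would stand in for `image_bytes`.

```python
import hashlib


def cache_key(image_bytes, context, provider, model, detail, alt_text="", headings=()):
    """Derive a stable hex key from every input that can change a caption."""
    fields = [image_bytes]
    fields += [s.encode("utf-8") for s in (context, provider, model, detail, alt_text)]
    fields += [h.encode("utf-8") for h in headings]
    digest = hashlib.sha256()
    for field in fields:
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(field).to_bytes(8, "big"))
        digest.update(field)
    return digest.hexdigest()
```

With a key like this, changing the model or editing the surrounding text produces a new key and therefore a new caption request, while an untouched image in untouched context maps to an existing cache entry.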

Useful controls:

ragimg index --docs ./docs --max-images 25
ragimg index --docs ./docs --include "**/*.md" --exclude "**/assets/logo*"
ragimg index --docs ./docs --no-cache
ragimg index --docs ./docs --no-resume

Providers

OpenAI is the default provider:

export OPENAI_API_KEY=...
ragimg index --docs ./docs --provider openai --model gpt-5.4-mini

To keep captioning on your own machine, run Ollama and pick a vision-capable model:

ragimg index --docs ./docs --provider ollama --model llava

OpenAI can receive local images and remote image URLs. Ollama in v0.1 supports local images only. In both cases, --dry-run is the safe way to see what would be processed before sending anything to a model.

GitHub Action

- uses: balyakin/ragimg@v0.1.0
  with:
    docs: ./docs
    output: chunks.jsonl
    provider: openai
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

A common CI pattern is to run a dry run on pull requests and reserve paid captioning for a scheduled job or a protected branch:

- uses: balyakin/ragimg@v0.1.0
  with:
    docs: ./docs
    output: chunks.jsonl
    dry-run: "true"

Loading chunks

Full examples live in the examples directory. The important part is simple: embed text, keep metadata, and use id as the document key.

Pinecone:

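import json
from pinecone import Pinecone

# Pinecone() reads the API key from the PINECONE_API_KEY environment variable.
pc = Pinecone()
index = pc.Index("docs")

with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        # One record per chunk; the chunk id doubles as the record id.
        index.upsert_records(
            "default",
            [{"_id": chunk["id"], "text": chunk["text"], **chunk["metadata"]}],
        )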

LangChain:

import json
from langchain_core.documents import Document

docs = []
# Read one JSON object per line and close the file when done.
with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        docs.append(Document(page_content=chunk["text"], metadata=chunk["metadata"]))

LlamaIndex:

import json
from llama_index.core import Document

documents = []
# Read one JSON object per line and close the file when done.
with open("chunks.jsonl", encoding="utf-8") as f:
    for line in f:
        chunk = json.loads(line)
        documents.append(Document(text=chunk["text"], metadata=chunk["metadata"]))
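For larger chunk files it is usually cheaper to send records in groups than one at a time. The helper below is a minimal, store-agnostic sketch. `iter_chunks`, `to_record`, and `batched` are hypothetical names introduced here, and the right batch size depends on your vector store's limits.

```python
import json
from itertools import islice


def iter_chunks(lines):
    """Yield parsed chunks from JSONL lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def to_record(chunk):
    """Flatten a chunk into one record, using id as the document key."""
    return {"_id": chunk["id"], "text": chunk["text"], **chunk["metadata"]}


def batched(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch
```

Each batch from `batched(map(to_record, iter_chunks(f)), size)` can then go into a single upsert call, such as the list argument in the Pinecone example above.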

Configuration

ragimg reads .ragimg.yaml when present. RAGIMG_CONFIG or --config can point to another file. Paths inside the config file are resolved relative to that file, which makes checked-in configs easier to move between machines.

docs: ./docs
output: chunks.jsonl
format: jsonl
provider: openai
model: gpt-5.4-mini
detail: low
workers: 4
timeout: 30s
cache: .ragimg-cache.json
resume: .ragimg-progress.json
dedup: true
max_images: 0
include:
  - "**/*.md"
  - "**/*.mdx"
  - "**/*.html"
  - "**/*.htm"
exclude:
  - "**/node_modules/**"

Provider secrets are never read from config files. Use OPENAI_API_KEY or --api-key. RAGIMG_CACHE can override the cache path, and NO_COLOR, RAGIMG_NO_COLOR=1, or --no-color disable colored output.
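The rule that config paths resolve relative to the config file, not the current working directory, can be illustrated as follows. This is a sketch under assumptions: `resolve_config_path` is a hypothetical helper, and leaving absolute paths unchanged is a guess about the behavior, not something this README states.

```python
import os
from pathlib import Path


def resolve_config_path(config_file, value):
    """Resolve `value` against the directory that contains `config_file`."""
    p = Path(value)
    if p.is_absolute():
        return p  # assumption: absolute paths are used as written
    # normpath collapses "./" and "../" without touching the filesystem.
    return Path(os.path.normpath(Path(config_file).parent / p))
```

Under this rule, `docs: ./docs` in `/repo/ci/.ragimg.yaml` points at `/repo/ci/docs` no matter where the command is run from.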

Commands

| Command | Purpose |
| --- | --- |
| `ragimg index` | Scan docs and write caption chunks. |
| `ragimg preview --image path/to/image.png --dry-run` | Show the resolved prompt and image metadata for one image. |
| `ragimg report --input chunks.jsonl` | Build the static review report. |
| `ragimg stats chunks.jsonl` | Print totals, model usage, date range, and top source files. |
| `ragimg completion bash` | Generate shell completion. |
| `ragimg version` | Print build version, commit, and date. |

What gets skipped

The default filters are intentionally conservative. ragimg skips missing local files, unsupported formats, Git LFS pointer files, very small or very large files, tiny dimensions, extreme aspect ratios, symlinks that leave the docs root, and filenames that look like logos, icons, badges, avatars, spacers, tracking pixels, or social sharing assets.

Supported image extensions are .png, .jpg, .jpeg, .gif, .webp, and .svg.

Supported document extensions are .md, .mdx, .html, and .htm.
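To give a feel for the extension and filename checks, here is a simplified sketch. It is not ragimg's filter: `skip_reason` and the `JUNK_NAME` pattern are hypothetical, and the size, dimension, aspect-ratio, symlink, and Git LFS checks are left out entirely.

```python
import re
from pathlib import PurePosixPath

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}

# Hypothetical pattern for illustration; ragimg's real rules may differ.
JUNK_NAME = re.compile(r"(logo|icon|badge|avatar|spacer|pixel|share)", re.IGNORECASE)


def skip_reason(image_path):
    """Return a short reason if the image should be skipped, else None."""
    p = PurePosixPath(image_path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        return "unsupported format"
    if JUNK_NAME.search(p.stem):
        return "filename looks decorative"
    return None
```

Plain substring matching like this also flags names such as `silicon-layout.png`, which contains "icon". A production filter would need word-boundary or token-based matching to avoid dropping useful images.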

Benchmarking

This README does not invent benchmark numbers. Generate current numbers from the checked-in fixture:

scripts/benchmark.sh

When publishing benchmark claims, include the command, date, commit, fixture or repository snapshot, and output summary.

Current limits

ragimg v0.1 is a local indexing tool. It does not crawl websites, parse PDFs, run OCR, upload directly to vector databases, or host a review UI. Those are good future features, but the first release keeps the contract narrow: scan local docs, caption useful images, write portable chunks, and make reruns cheap.

Development

go test ./...
go vet ./...

The repository includes CI for Go 1.25.x and 1.26.x, release binaries for Linux, macOS, and Windows, a Docker image, and a root action.yml for GitHub Actions.

License

Apache-2.0. See LICENSE.
