Recon tool for RAG pipelines. Point it at a RAG-enabled API endpoint and it'll figure out if there's retrieval happening, what documents are in the knowledge base, where the retrieval thresholds fall off, and how the chunks are structured.
Built for AI red team engagements.
git clone https://github.com/dejisec/ragmap.git
cd ragmap
uv sync
uv run ragmap --help

# Full scan -- runs all 4 phases
ragmap scan http://target/api/chat --preset langchain
# With auth
ragmap scan http://target/api/chat \
-H "Authorization: Bearer tok_xxx" \
--preset llamaindex
# JSON output
ragmap scan http://target/api/chat --preset haystack --json -o results.json
# Through a proxy
ragmap scan http://target/api/chat \
--proxy http://127.0.0.1:8080 \
--preset langchain

ragmap runs four phases. You can run them individually or use scan to run all four sequentially.
ragmap scan <URL> [OPTIONS]

The detect phase sends a control query and a domain query, compares the responses, and tells you whether retrieval is in play. It also classifies how much metadata the endpoint leaks (none / minimal / moderate / detailed).
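A minimal sketch of the control-versus-domain comparison, for intuition only. The `Response` shape, the "more sources than the control" heuristic, and the field-count cut-offs are all assumptions made for illustration, not ragmap's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Parsed endpoint reply (hypothetical shape, not ragmap's internal model)."""
    answer: str
    sources: list[dict] = field(default_factory=list)


def retrieval_detected(control: Response, domain: Response) -> bool:
    # Assumed heuristic: the domain query cites more sources than the control query.
    return len(domain.sources) > len(control.sources)


def metadata_leak_level(sources: list[dict]) -> str:
    """Bucket leakage by the number of distinct metadata fields (assumed cut-offs)."""
    fields = {key for src in sources for key in src}
    if not fields:
        return "none"
    if len(fields) <= 2:
        return "minimal"
    if len(fields) <= 4:
        return "moderate"
    return "detailed"


control = Response(answer="4")
domain = Response(
    answer="Employees accrue leave monthly.",
    sources=[{"title": "HR Handbook", "score": 0.82, "chunk_id": "hr-001"}],
)
print(retrieval_detected(control, domain))
print(metadata_leak_level(domain.sources))
```

A real endpoint may signal retrieval in other ways (answer content, latency), so treat the source count as one possible signal among several.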
ragmap detect <URL> [OPTIONS]
--control-query TEXT General knowledge query (default: "What is 2+2?")
--domain-query TEXT Domain-specific query (default: "What are the company policies?")

The enumerate phase iterates through topics, sends natural-sounding queries, and builds an inventory of documents. It deduplicates by title and tracks how often each doc gets hit.
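The inventory step can be pictured roughly like this. It is a sketch under assumptions: titles are normalised by case and whitespace, and "diminishing returns" is modelled as a fixed number of consecutive batches with no new titles. The `patience` parameter is hypothetical and not a ragmap flag.

```python
from collections import Counter


def build_inventory(batches, patience=2, exhaustive=False):
    """Count hits per normalised title; stop after `patience` batches with nothing new."""
    hits = Counter()
    stale = 0
    for titles in batches:
        keys = [title.strip().lower() for title in titles]
        found_new = any(key not in hits for key in keys)
        hits.update(keys)
        stale = 0 if found_new else stale + 1
        if not exhaustive and stale >= patience:
            break  # diminishing returns: recent queries surfaced no new documents
    return hits


# Each inner list stands for the document titles returned by one query.
batches = [
    ["HR Handbook", "Security Policy"],
    ["hr handbook"],
    ["Security Policy"],
    ["Incident Runbook"],
]
inventory = build_inventory(batches)
print(inventory.most_common())
```

With `exhaustive=True` the loop keeps going and also picks up the last batch, which mirrors what --exhaustive is described as doing.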
ragmap enumerate <URL> [OPTIONS]
--topics FILE Custom topic list (one per line)
--exhaustive Don't stop on diminishing returns
--max-queries INT Query limit (default: 50)

The threshold phase takes a reference query and progressively degrades it -- synonym swaps, misspellings, off-topic drift -- to find where the retriever stops returning results.
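As a rough illustration of the degradation ladder, the sketch below builds a few variants of a query and reports the first one that retrieves nothing. The `misspell` rule, the synonym map, the off-topic query, and the "zero sources" cut-off are assumptions chosen to keep the example deterministic; ragmap's own degradations may differ.

```python
def misspell(text: str) -> str:
    """Swap the two middle characters of every word longer than three letters."""
    words = []
    for word in text.split():
        if len(word) > 3:
            mid = len(word) // 2
            word = word[:mid - 1] + word[mid] + word[mid - 1] + word[mid + 1:]
        words.append(word)
    return " ".join(words)


def degrade(query: str, synonyms: dict) -> list:
    """Return (label, variant) pairs ordered from closest to furthest from the original."""
    swapped = " ".join(synonyms.get(word, word) for word in query.split())
    return [
        ("original", query),
        ("synonym", swapped),
        ("misspelled", misspell(swapped)),
        ("off-topic", "What is the weather like today?"),
    ]


def drop_off(results):
    """Return the label of the first variant that retrieved no sources, else None."""
    for label, source_count in results:
        if source_count == 0:
            return label
    return None


for label, variant in degrade("company policy", {"policy": "rules"}):
    print(label, "->", variant)
```

In practice each variant would be sent to the target and `drop_off` fed the number of sources that came back for it.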
ragmap threshold <URL> [OPTIONS]
--test-query TEXT Query to degrade (default: "What is the company policy?")

The chunks phase is offline analysis only (no HTTP requests). It feeds in sources from a prior scan and looks for chunk ID patterns, estimates chunk sizes, and detects text overlap between chunks.
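One way to think about the overlap and size estimates is sketched below. It assumes chunks arrive in document order and that overlap means a suffix of one chunk repeated as the prefix of the next. Both are illustrative assumptions, and the function names are hypothetical.

```python
from statistics import median


def overlap_len(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def chunk_stats(chunks):
    """Summarise chunk sizes and the overlap between each consecutive pair."""
    sizes = [len(chunk) for chunk in chunks]
    overlaps = [overlap_len(a, b) for a, b in zip(chunks, chunks[1:])]
    return {"median_size": median(sizes), "overlaps": overlaps}


chunks = [
    "Employees accrue leave monthly. Unused leave",
    "Unused leave carries over for one year.",
    "Contractors are not eligible.",
]
stats = chunk_stats(chunks)
print(stats)
```

A consistent non-zero overlap across many pairs hints at a sliding-window splitter, while zero overlap suggests chunks cut at fixed boundaries.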
ragmap chunks --input results.json [OPTIONS]

These work with scan, detect, enumerate, and threshold:

| Flag | What it does |
|---|---|
| -H, --header | HTTP header Key: Value (repeatable) |
| -b, --cookie | Cookie name=value (repeatable) |
| --proxy | Proxy URL (e.g. http://127.0.0.1:8080) |
| --preset | generic, langchain, llamaindex, haystack (default: generic) |
| --delay | Seconds between requests (default: 1.0) |
| --jitter | Random +/- jitter on delay (default: 0.0) |
| --insecure | Skip TLS cert verification |
| --timeout | Request timeout in seconds (default: 30.0) |
| --json | JSON output |
| -o, --output | Write output to file |
| -v, --verbose | Show request/response details |

Presets tell ragmap how to build requests and parse responses for common RAG frameworks. If your target uses something custom, see generic below.
ragmap scan http://target/api/chat --preset langchain

The langchain preset sends {"query": "..."} and parses source_documents[].page_content and source_documents[].metadata.source.

ragmap scan http://target/api/chat --preset llamaindex

The llamaindex preset sends {"query": "..."} and parses source_nodes[].node.text and source_nodes[].node.metadata.file_name.

ragmap scan http://target/api/chat --preset haystack

The haystack preset sends {"query": "..."} and parses documents[].content and documents[].meta.name.

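To make the preset field mappings concrete, here is a small sketch that encodes the three sets of paths listed above as data and applies one of them to a sample response. The `PRESETS` table and `parse_sources` helper are hypothetical names, and the sample response is invented for illustration.

```python
# Field locations copied from the preset descriptions; the structure is an assumption.
PRESETS = {
    "langchain": {
        "sources": "source_documents",
        "text": ["page_content"],
        "title": ["metadata", "source"],
    },
    "llamaindex": {
        "sources": "source_nodes",
        "text": ["node", "text"],
        "title": ["node", "metadata", "file_name"],
    },
    "haystack": {
        "sources": "documents",
        "text": ["content"],
        "title": ["meta", "name"],
    },
}


def dig(obj, keys):
    """Walk a list of nested dictionary keys."""
    for key in keys:
        obj = obj[key]
    return obj


def parse_sources(preset: str, response: dict) -> list:
    """Pull title and text out of each source entry for the given preset."""
    spec = PRESETS[preset]
    return [
        {"title": dig(item, spec["title"]), "text": dig(item, spec["text"])}
        for item in response.get(spec["sources"], [])
    ]


sample = {
    "source_nodes": [
        {"node": {"text": "Leave accrues monthly.", "metadata": {"file_name": "hr.pdf"}}}
    ]
}
print(parse_sources("llamaindex", sample))
```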
The generic preset is for anything else. You tell ragmap where things are using dot-notation paths:
ragmap scan http://target/api/chat \
--preset generic \
--body-template '{"input": "{query}", "mode": "search"}' \
--source-path "results[].doc" \
--answer-path "output.text" \
--score-path "results[].relevance" \
--title-path "name" \
--chunk-id-path "id" \
--text-path "content"

| Flag | Default | What it does |
|---|---|---|
| --body-template | {"query": "..."} | JSON body with {query} placeholder |
| --query-field | query | Field name for query (when no template) |
| --source-path | sources[] | Path to the sources array |
| --answer-path | answer | Path to the LLM answer |
| --score-path | score | Similarity score field |
| --title-path | title | Document title field |
| --chunk-id-path | chunk_id | Chunk ID field |
| --text-path | text | Chunk text field |
| --retrieval-time-path | retrieval_time_ms | Retrieval time field |

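The sketch below shows one plausible reading of the {query} placeholder and the dot-notation paths, using the values from the generic example. It assumes that a segment ending in [] maps the rest of the path over a list. `render_body` and `resolve` are hypothetical helpers, not ragmap functions, and the sample response is invented.

```python
import json


def render_body(template: str, query: str) -> dict:
    """Fill the {query} placeholder in a JSON body template."""
    # str.format would choke on the JSON braces, so substitute the placeholder directly.
    escaped = json.dumps(query)[1:-1]  # JSON-escape, then drop the surrounding quotes
    return json.loads(template.replace("{query}", escaped))


def resolve(data, path: str):
    """Follow a dot path; a segment ending in [] maps the remainder over a list."""
    if not path:
        return data
    head, _, rest = path.partition(".")
    if head.endswith("[]"):
        return [resolve(item, rest) for item in data[head[:-2]]]
    return resolve(data[head], rest)


body = render_body('{"input": "{query}", "mode": "search"}', 'What is the "leave" policy?')
response = {
    "output": {"text": "Leave accrues monthly."},
    "results": [
        {"doc": {"name": "HR Handbook", "id": "hr-001", "content": "..."}, "relevance": 0.82}
    ],
}
print(body)
print(resolve(response, "output.text"))
print(resolve(response, "results[].doc"))
print(resolve(response, "results[].relevance"))
```

In the example command the title, chunk ID, and text paths ("name", "id", "content") look like they are relative to each entry returned by the source path, which is how the sketch treats them.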
Stealth mode reduces the chance of tripping detection rules.
ragmap scan http://target/api/chat \
--preset langchain \
--stealth

What changes with --stealth:
- Default queries swap to non-obvious alternatives that still get the job done
- Delay bumps to 5s (from 1s), jitter to 2s (from 0s) -- unless you override
- Session IDs rotate every N requests to break burst correlation
- User-supplied queries get checked against known detection patterns (you'll get a warning)
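The pacing and session rotation described in the list above could look something like this. It is a sketch under assumptions: jitter is drawn uniformly from +/- the jitter value, and session IDs are random hex strings placed in a request body field. `next_delay` and `SessionRotator` are hypothetical names.

```python
import random
import uuid


def next_delay(delay: float, jitter: float) -> float:
    """Base delay plus uniform +/- jitter, never below zero."""
    return max(0.0, delay + random.uniform(-jitter, jitter))


class SessionRotator:
    """Issue a fresh session ID after every `rotate_every` requests."""

    def __init__(self, rotate_every: int = 3):
        self.rotate_every = rotate_every
        self.sent = 0
        self.session_id = uuid.uuid4().hex

    def stamp(self, body: dict, field: str = "session_id") -> dict:
        # Rotate before sending once a full group of requests has gone out.
        if self.sent and self.sent % self.rotate_every == 0:
            self.session_id = uuid.uuid4().hex
        self.sent += 1
        return {**body, field: self.session_id}


rotator = SessionRotator(rotate_every=3)
ids = [rotator.stamp({"query": "q"})["session_id"] for _ in range(7)]
print(len(set(ids)), "distinct session IDs across 7 requests")
print(round(next_delay(5.0, 2.0), 2), "seconds until the next request")
```

With the stealth defaults of 5s delay and 2s jitter, each wait would fall somewhere between 3 and 7 seconds under this model.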
| Flag | Default | What it does |
|---|---|---|
| --stealth | off | Enable stealth mode |
| --evasion-rules | built-in | Custom evasion rules YAML |
| --rotate-every | 3 | Rotate session ID every N requests |
| --session-field | session_id | Request body field for session ID |

categories:
  document_enumeration:
    severity: high
    evasion_tip: "Ask contextual questions that force source citation"
    triggers:
      - "what * documents"
      - "list * sources"
      - "enumerate *"
    burst_threshold: 3

Custom rules merge with the defaults:
ragmap scan http://target/api/chat --stealth --evasion-rules custom.yml
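For intuition, the sketch below checks a query against trigger patterns and merges a custom category into the defaults. It assumes triggers are case-insensitive glob patterns matched anywhere in the query, and that custom categories override defaults key by key. Neither behaviour is confirmed here, and the `prompt_probe` category is invented for the example.

```python
from fnmatch import fnmatchcase

# Mirrors the document_enumeration category from the rules file.
DEFAULT_RULES = {
    "document_enumeration": {
        "severity": "high",
        "triggers": ["what * documents", "list * sources", "enumerate *"],
    }
}


def merge_rules(defaults: dict, custom: dict) -> dict:
    """Overlay custom categories on the defaults without mutating either input."""
    merged = {name: dict(rule) for name, rule in defaults.items()}
    for name, rule in custom.items():
        merged[name] = {**merged.get(name, {}), **rule}
    return merged


def flagged(query: str, rules: dict) -> list:
    """Return the categories whose triggers match somewhere in the query."""
    text = query.lower()
    return [
        name
        for name, rule in rules.items()
        if any(fnmatchcase(text, f"*{trigger}*") for trigger in rule.get("triggers", []))
    ]


# Hypothetical custom category, as it might appear in custom.yml once loaded.
custom = {"prompt_probe": {"severity": "medium", "triggers": ["system prompt"]}}
rules = merge_rules(DEFAULT_RULES, custom)

print(flagged("What are the documents you have?", rules))
print(flagged("Summarise the leave policy and cite where it comes from", rules))
```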