LLM-powered visualizer for the argumentative skeleton of any text.
Paste or upload a document. Discourse X-Ray segments it into clauses, sends them to a large language model for Rhetorical Structure Theory (RST) parsing, and renders the resulting argument tree alongside structural metrics, a composite quality score, and actionable writing recommendations.
Essays, op-eds, policy papers, and research abstracts all rest on a hidden scaffolding: which claims are main points, which are evidence, which concede a counterargument. This scaffolding is invisible in the prose but decisive for whether an argument lands.
Discourse X-Ray makes that scaffolding visible:
- Writers see which claims lack support (orphans) and where structure is muddled.
- Teachers / tutors get an objective lens on student essays.
- Analysts compare the rhetorical fingerprint of multiple documents.
- Debate / policy teams audit the balance between assertion, evidence, and concession.
- EDU segmentation — splits text into Elementary Discourse Units (clause-level spans) using OpenNLP.
- RST parsing via Anthropic, OpenAI, or Google Gemini — swap providers without code changes.
- Rhetorical tree — zoomable, pannable D3 visualization with relation-colored edges (EVIDENCE, CAUSE, CONTRAST, CONCESSION, etc.) and NUCLEUS/SATELLITE nuclearity markers.
- Structural metrics — claim-to-evidence ratio, orphan claims, relation distribution, coherence distance.
- Quality score — composite 0–100 rating with sub-scores for Evidence, Structure, Balance, and Coherence, plus human-readable insights.
- Actionable recommendations — severity-tagged suggestions (ACTION / WARNING / TIP) that link directly to the affected nodes.
- Node inspector — click any tree node to see full text, nuclearity, relation, span, depth, children list, and the path-to-root breadcrumb.
- Orphan-to-source highlighting — click an orphan claim or recommendation to auto-scroll the textarea and select the exact span.
- File upload — drag-and-drop or picker for `.txt`, `.md`, `.html`, `.json`, `.pdf`, `.docx` (client-side extraction via pdf.js + mammoth).
- History drawer — every analysis is persisted; the sidebar lists recent runs with relative timestamps.
- Shareable links — every analysis gets a `?a={id}` URL that rehydrates the full state.
- Provider switcher — choose the LLM per-request from the dropdown.
- Result caching — Caffeine (local) or Redis (distributed) cache keyed on `(provider, text hash)`; identical re-analyses skip the LLM call. Switch backends via `CACHE_TYPE=redis`.
- Persistence — analyses stored in PostgreSQL (prod) or in-memory H2 (dev profile).
- CORS, validation, and structured error responses built-in.
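The `(provider, text hash)` cache key can be sketched as a provider name plus a content digest. A minimal TypeScript illustration — the `cacheKey` function and its exact format are hypothetical; the real backend computes this in Java:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of the (provider, text hash) cache key described above.
// The actual Java implementation may normalize and format differently.
export function cacheKey(provider: string, text: string): string {
  // Trim so trivially different whitespace doesn't miss the cache.
  const normalized = text.trim();
  const digest = createHash("sha256").update(normalized, "utf8").digest("hex");
  return `${provider}:${digest}`;
}
```

Identical re-analyses map to the same key, so the expensive LLM call is skipped.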
┌──────────────┐ POST /api/analyze ┌─────────────────────────────────┐
│ React SPA │ ────────────────────► │ Spring Boot (port 8080) │
│ (Vite) │ │ │
│ port 5173 │ ◄──────────────────── │ ┌───────────────────────────┐ │
└──────────────┘ JSON + tree │ │ EduSegmenter (OpenNLP) │ │
│ ├───────────────────────────┤ │
│ │ RstParser → LlmClient │ │
│ │ └ Anthropic / OpenAI │ │
│ │ └ Gemini │ │
│ ├───────────────────────────┤ │
│ │ TreeAssembler (JGraphT) │ │
│ ├───────────────────────────┤ │
│ │ MetricsService │ │
│ │ QualityScore │ │
│ │ Recommendation engine │ │
│ ├───────────────────────────┤ │
│ │ AnalysisService (JPA) │ │
│ └───────────────────────────┘ │
└──────────────┬──────────────────┘
│
┌────────▼────────┐
│ PostgreSQL / │
│ H2 (dev) │
└─────────────────┘
| Package | Purpose |
|---|---|
| `parser` | EDU segmentation, LLM clients, JSON response parser, tree assembly |
| `domain` | `DiscourseNode`, `DiscourseTree`, `RhetoricalRelation`, `Nuclearity` |
| `metrics` | `MetricsService`, `QualityScore`, `Recommendation` |
| `persistence` | `AnalysisEntity`, `AnalysisService`, `AnalysisRepository` |
| `api` | `AnalyzeController`, DTOs, global exception handler |
| `config` | `DiscurseProperties`, CORS, cache enablement |
```
App.tsx                      orchestrates state, routing-by-query-param
api.ts                       typed client wrappers
components/
  TreeView.tsx               D3 rendering, zoom/pan, click-to-inspect
  MetricsPanel.tsx           metric cards + relation bars
  QualityScoreCard.tsx       ring gauge + sub-score bars
  RecommendationsPanel.tsx   severity-coded recs with jump-to-node
  NodeInspector.tsx          per-node detail view + actions
  HistoryDrawer.tsx          recent analyses sidebar
lib/
  fileExtract.ts             pdf.js + mammoth wrappers
```
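The `api.ts` wrappers are thin typed fetch calls. A hedged sketch of what an analyze wrapper might look like — every name and signature here is illustrative, not the actual source:

```typescript
// Hypothetical sketch of an api.ts-style typed wrapper.
export interface AnalyzeRequest {
  text: string;
  provider?: "anthropic" | "openai" | "gemini";
}

// provider is optional; the server falls back to its configured default.
export function buildAnalyzeRequest(
  text: string,
  provider?: AnalyzeRequest["provider"]
): AnalyzeRequest {
  return provider ? { text, provider } : { text };
}

// A fetch call might then be (base is e.g. VITE_API_BASE or "" for /api proxying):
export async function analyze(base: string, req: AnalyzeRequest): Promise<unknown> {
  const res = await fetch(`${base}/api/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`analyze failed: ${res.status}`);
  return res.json();
}
```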
The fastest path — one command brings up PostgreSQL, Redis, backend, and frontend.
```sh
cp .env.example .env
# edit .env and set at least ANTHROPIC_API_KEY (or OPENAI / GEMINI)
docker compose up --build
```

- Frontend: http://localhost:5173
- Backend: http://localhost:8080
- Postgres: localhost:5432 (inside the network: `postgres:5432`)
- Redis: internal only (used as the RST parse cache)
Nginx in the frontend container proxies /api/* to the backend service, so no CORS config needed in Docker mode.
Override ports via env:

```sh
BACKEND_PORT=9090 FRONTEND_PORT=3000 docker compose up
```

Stop and wipe volumes:

```sh
docker compose down -v
```

- Java 21+ recommended (the project compiles to Java 17; JDK 17 is the minimum)
- Node 18+
- Maven 3.9+
- One LLM API key: Anthropic, OpenAI, or Google Gemini
```sh
git clone <repo-url>
cd discurse
cp .env.example .env   # optional — or export variables directly
```

Set at least one API key:

```sh
# Windows / PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
$env:LLM_PROVIDER = "anthropic"
$env:SPRING_PROFILES_ACTIVE = "dev"   # uses in-memory H2 — no DB setup

# macOS / Linux
export ANTHROPIC_API_KEY="sk-ant-..."
export LLM_PROVIDER="anthropic"
export SPRING_PROFILES_ACTIVE="dev"
```

Run the backend:

```sh
mvn spring-boot:run
```

It listens on http://localhost:8080. Verify:

```
GET http://localhost:8080/api/health
```

Run the frontend:

```sh
cd frontend
npm install
npm run dev
```

Open http://localhost:5173.
```powershell
$env:SERVER_PORT = "9090"
mvn spring-boot:run
```

Then create `frontend/.env.local`:

```
VITE_API_BASE=http://localhost:9090
```

Restart `npm run dev`.

Drop the dev profile and supply connection vars:

```powershell
$env:DB_URL = "jdbc:postgresql://localhost:5432/discurse"
$env:DB_USER = "discurse"
$env:DB_PASSWORD = "discurse"
```

Create the user and database once:

```sql
CREATE USER discurse WITH PASSWORD 'discurse';
CREATE DATABASE discurse OWNER discurse;
```

All endpoints live under /api.
```jsonc
{
  "text": "Remote work boosts productivity. Studies show...",
  "provider": "anthropic" // optional; defaults to server config
}
```

Response:
```json
{
  "id": 12,
  "inputText": "…",
  "tree": {
    "rootId": "n0",
    "nodes": [{ "id": "e0", "text": "…", "nuclearity": "NUCLEUS", "start": 0, "end": 30 }],
    "edges": [{ "source": "n0", "target": "e0", "relation": "EVIDENCE" }]
  },
  "metrics": {
    "claimToEvidenceRatio": 0.67,
    "nucleusCount": 2,
    "satelliteCount": 3,
    "orphanClaims": [],
    "relationDistribution": { "EVIDENCE": 0.4, "CONTRAST": 0.2 },
    "coherence": { "averageDistance": 2.1, "maxDistance": 3, "outliers": [] }
  },
  "quality": {
    "overall": 76, "evidence": 80, "structure": 74, "balance": 60, "coherence": 90,
    "insights": ["Strong evidentiary support throughout."]
  },
  "recommendations": [
    {
      "id": "no-counterpoint",
      "severity": "WARNING",
      "category": "Balance",
      "title": "Add a counterpoint",
      "suggestion": "No CONTRAST or CONCESSION detected...",
      "nodeIds": []
    }
  ]
}
```

Returns `{ "available": ["anthropic", "openai", "gemini"] }` — reflects which keys are configured.
`{ "status": "ok" | "degraded", "providers": [...] }`.
Recent analyses as summaries (id, provider, createdAt, preview).
Full AnalyzeResponse for a stored analysis.
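As an illustration of consuming this response shape, here is a hedged sketch that recomputes an evidence-share figure from the tree's edges. The function and formula are illustrative only — the backend's actual `MetricsService` may define `claimToEvidenceRatio` differently:

```typescript
// Mirrors the edge shape in the sample response above.
interface Edge {
  source: string;
  target: string;
  relation: string;
}

// Hedged sketch: fraction of edges carrying evidentiary support.
// Not the backend's actual claim-to-evidence formula.
export function evidenceShare(edges: Edge[]): number {
  if (edges.length === 0) return 0;
  const evidential = edges.filter((e) => e.relation === "EVIDENCE").length;
  return evidential / edges.length;
}
```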
Edit src/main/resources/application.yml or override via environment variables.
| Property | Default | Description |
|---|---|---|
| `server.port` | `8080` | Backend HTTP port |
| `spring.cache.type` | `caffeine` | Cache backend. Set `redis` to use Redis |
| `spring.data.redis.host` | `localhost` | Redis host (only used when cache type is `redis`) |
| `spring.data.redis.port` | `6379` | Redis port |
| `discurse.llm.provider` | `anthropic` | Default provider when the request omits one |
| `discurse.llm.max-tokens` | `32000` | Upper bound on LLM output |
| `discurse.llm.timeout-seconds` | `120` | LLM request timeout |
| `discurse.llm.anthropic-model` | `claude-sonnet-4-6` | Anthropic model id |
| `discurse.llm.openai-model` | `gpt-4o` | OpenAI model id |
| `discurse.llm.gemini-model` | `gemini-2.5-pro` | Gemini model id |
| `discurse.parser.max-edus` | `400` | Hard cap on discourse units per doc |
| `discurse.parser.max-chars` | `60000` | Hard cap on input length |
| `discurse.metrics.coherence-distance-threshold` | `5` | Depth at which nodes count as outliers |
Environment-variable equivalents: uppercase, dot → underscore (e.g. `DISCURSE_LLM_MAX_TOKENS=16000`).
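The mapping is mechanical and can be sketched in a few lines — `toEnvVar` is a hypothetical helper for illustration; Spring Boot's real relaxed binding accepts additional forms:

```typescript
// Sketch of the property-name → environment-variable mapping described above:
// uppercase everything, turn dots and hyphens into underscores.
export function toEnvVar(property: string): string {
  return property.toUpperCase().replace(/[.\-]/g, "_");
}
```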
Backend
- Java 17 / Spring Boot 3.3
- Spring Data JPA + PostgreSQL / H2
- JGraphT for graph analytics
- OpenNLP 2.3 for sentence detection
- Caffeine cache
- Anthropic Java SDK / OpenAI REST / Google Gemini REST
Frontend
- React 18 + TypeScript
- Vite 5
- D3 7 (hierarchy, zoom, selection)
- pdfjs-dist (client-side PDF extraction)
- mammoth (client-side .docx extraction)
```sh
mvn test
```

```sh
mvn clean package              # backend → target/*.jar
cd frontend && npm run build   # frontend → dist/
```

```sh
cd frontend && npx tsc -b --noEmit
```

- Scanned PDFs — pdf.js extracts the text layer only; no OCR.
- Non-English — EDU segmentation uses an English OpenNLP model. LLMs will still parse other languages, but segmentation quality drops.
- Long documents — capped at 60 000 chars / 400 EDUs by default. Split into sections for book-length inputs.
- Gemini free tier — `gemini-2.5-pro` has a 0-request/day free quota; switch to `gemini-2.5-flash` or upgrade the plan.
- LLM variance — the same text may produce slightly different trees across runs; the cache mitigates this for identical inputs.
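Splitting a book-length input into sections that fit under the character cap can be done on paragraph boundaries. A hedged sketch — `splitSections` is not part of the app, and 60 000 is just the default `discurse.parser.max-chars`:

```typescript
// Hypothetical helper: split a long document into sections under maxChars,
// preferring paragraph (blank-line) boundaries. A single paragraph longer
// than maxChars is kept whole rather than cut mid-sentence.
export function splitSections(text: string, maxChars = 60000): string[] {
  const paragraphs = text.split(/\n\n+/);
  const sections: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current ? `${current}\n\n${p}` : p;
    if (candidate.length > maxChars && current) {
      sections.push(current);
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) sections.push(current);
  return sections;
}
```

Each resulting section can then be submitted to `/api/analyze` separately.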
High-leverage features on the backlog — contributions welcome:
- Draft comparison view (v1 vs v2 structural diff)
- Export (SVG / PNG / Markdown report)
- Browser extension for right-click analysis
- Streaming tree rendering as the LLM produces JSON
- Multi-model consensus view (disagreement surfaced as uncertainty)
- Local open-weights RST parser for offline mode
- Inline counterargument generator per nucleus
TBD.