AKforCodes/discourse

Discourse X-Ray

LLM-powered visualizer for the argumentative skeleton of any text.

Paste or upload a document. Discourse X-Ray segments it into clauses, sends them to a large language model for Rhetorical Structure Theory (RST) parsing, and renders the resulting argument tree alongside structural metrics, a composite quality score, and actionable writing recommendations.


Why it exists

Essays, op-eds, policy papers, and research abstracts all rest on a hidden scaffolding: which claims are main points, which are evidence, which concede a counterargument. This scaffolding is invisible in the prose but decisive for whether an argument lands.

Discourse X-Ray makes that scaffolding visible:

  • Writers see which claims lack support (orphans) and where structure is muddled.
  • Teachers / tutors get an objective lens on student essays.
  • Analysts compare the rhetorical fingerprint of multiple documents.
  • Debate / policy teams audit the balance between assertion, evidence, and concession.

Features

Analysis

  • EDU segmentation — splits text into Elementary Discourse Units (clause-level spans) using OpenNLP.
  • RST parsing via Anthropic, OpenAI, or Google Gemini — swap providers without code changes.
  • Rhetorical tree — zoomable, pannable D3 visualization with relation-colored edges (EVIDENCE, CAUSE, CONTRAST, CONCESSION, etc.) and NUCLEUS/SATELLITE nuclearity markers.
  • Structural metrics — claim-to-evidence ratio, orphan claims, relation distribution, coherence distance.
  • Quality score — composite 0–100 rating with sub-scores for Evidence, Structure, Balance, and Coherence, plus human-readable insights.
  • Actionable recommendations — severity-tagged suggestions (ACTION / WARNING / TIP) that link directly to the affected nodes.
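As an illustration of the metrics above, here is a minimal sketch of how a claim-to-evidence ratio and an orphan-claim list could be derived from the tree shape returned by the API. All names here (`computeMetrics`, `TreeNode`, `TreeEdge`) are hypothetical, not the actual `MetricsService` API; it only shows the idea.

```typescript
interface TreeNode { id: string; nuclearity: "NUCLEUS" | "SATELLITE" }
interface TreeEdge { source: string; target: string; relation: string }

function computeMetrics(nodes: TreeNode[], edges: TreeEdge[]) {
  const nuclei = nodes.filter((n) => n.nuclearity === "NUCLEUS");
  const satellites = nodes.filter((n) => n.nuclearity === "SATELLITE");

  // Ratio of claims (nuclei) to supporting material (satellites).
  const claimToEvidenceRatio =
    satellites.length === 0 ? Infinity : nuclei.length / satellites.length;

  // One plausible orphan definition: a nucleus with no EVIDENCE edge attached.
  const supported = new Set(
    edges.filter((e) => e.relation === "EVIDENCE").map((e) => e.source),
  );
  const orphanClaims = nuclei
    .filter((n) => !supported.has(n.id))
    .map((n) => n.id);

  return {
    claimToEvidenceRatio,
    nucleusCount: nuclei.length,
    satelliteCount: satellites.length,
    orphanClaims,
  };
}
```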

Interaction

  • Node inspector — click any tree node to see full text, nuclearity, relation, span, depth, children list, and the path-to-root breadcrumb.
  • Orphan-to-source highlighting — click an orphan claim or recommendation to auto-scroll the textarea and select the exact span.
  • File upload — drag-and-drop or picker for .txt, .md, .html, .json, .pdf, .docx (client-side extraction via pdf.js + mammoth).
  • History drawer — every analysis is persisted; sidebar lists recent runs with relative timestamps.
  • Shareable links — every analysis gets a ?a={id} URL that rehydrates the full state.
  • Provider switcher — choose LLM per-request from the dropdown.
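The shareable-link behavior above boils down to parsing the `?a={id}` query parameter on load and fetching the stored analysis. A sketch (the helper name is hypothetical; the real logic lives in `App.tsx`):

```typescript
// Extract the analysis id from a shareable ?a={id} URL, or null if absent/invalid.
function analysisIdFromUrl(href: string): number | null {
  const raw = new URL(href).searchParams.get("a");
  const id = raw === null ? NaN : Number(raw);
  return Number.isInteger(id) && id > 0 ? id : null;
}

// If an id is present, GET /api/analyses/{id} and rehydrate the full state.
```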

Platform

  • Result caching — Caffeine (local) or Redis (distributed) cache keyed on (provider, text hash); identical re-analyses skip the LLM call. Switch backends via CACHE_TYPE=redis.
  • Persistence — analyses stored in PostgreSQL (prod) or in-memory H2 (dev profile).
  • CORS, validation, structured error responses built-in.
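The `(provider, text hash)` cache key works because identical inputs hash to the same digest, so a re-analysis hits the cache instead of the LLM. A sketch of the idea (the backend's actual key derivation may differ):

```typescript
import { createHash } from "node:crypto";

// Build a cache key from the provider name and a SHA-256 digest of the text.
// Same (provider, text) pair → same key → cached result is reused.
function cacheKey(provider: string, text: string): string {
  const digest = createHash("sha256").update(text, "utf8").digest("hex");
  return `${provider}:${digest}`;
}
```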

Architecture

┌──────────────┐   POST /api/analyze   ┌─────────────────────────────────┐
│  React SPA   │ ────────────────────► │  Spring Boot  (port 8080)       │
│  (Vite)      │                       │                                 │
│  port 5173   │ ◄──────────────────── │  ┌───────────────────────────┐  │
└──────────────┘     JSON + tree       │  │  EduSegmenter (OpenNLP)   │  │
                                       │  ├───────────────────────────┤  │
                                       │  │  RstParser  →  LlmClient  │  │
                                       │  │   └ Anthropic / OpenAI    │  │
                                       │  │   └ Gemini                │  │
                                       │  ├───────────────────────────┤  │
                                       │  │  TreeAssembler (JGraphT)  │  │
                                       │  ├───────────────────────────┤  │
                                       │  │  MetricsService           │  │
                                       │  │  QualityScore             │  │
                                       │  │  Recommendation engine    │  │
                                       │  ├───────────────────────────┤  │
                                       │  │  AnalysisService (JPA)    │  │
                                       │  └───────────────────────────┘  │
                                       └──────────────┬──────────────────┘
                                                      │
                                             ┌────────▼────────┐
                                             │  PostgreSQL /   │
                                             │  H2 (dev)       │
                                             └─────────────────┘

Key modules (backend)

Package Purpose
parser EDU segmentation, LLM clients, JSON response parser, tree assembly
domain DiscourseNode, DiscourseTree, RhetoricalRelation, Nuclearity
metrics MetricsService, QualityScore, Recommendation
persistence AnalysisEntity, AnalysisService, AnalysisRepository
api AnalyzeController, DTOs, global exception handler
config DiscurseProperties, CORS, cache enablement

Frontend layout (frontend/src/)

App.tsx                  orchestrates state, routing-by-query-param
api.ts                   typed client wrappers
components/
  TreeView.tsx           D3 rendering, zoom/pan, click-to-inspect
  MetricsPanel.tsx       metric cards + relation bars
  QualityScoreCard.tsx   ring gauge + sub-score bars
  RecommendationsPanel.tsx  severity-coded recs with jump-to-node
  NodeInspector.tsx      per-node detail view + actions
  HistoryDrawer.tsx      recent analyses sidebar
lib/
  fileExtract.ts         pdf.js + mammoth wrappers

Quick start with Docker

The fastest path — one command brings up PostgreSQL, Redis, backend, and frontend.

cp .env.example .env
# edit .env and set at least ANTHROPIC_API_KEY (or OPENAI / GEMINI)
docker compose up --build

Nginx in the frontend container proxies /api/* to the backend service, so no CORS config needed in Docker mode.

Override ports via env:

BACKEND_PORT=9090 FRONTEND_PORT=3000 docker compose up

Stop and wipe volumes:

docker compose down -v

Quick start (local, without Docker)

Prerequisites

  • Java 17+ (the project targets Java 17; newer JDKs such as 21 also work)
  • Node 18+
  • Maven 3.9+
  • One LLM API key: Anthropic, OpenAI, or Google Gemini

1. Clone & configure

git clone <repo-url>
cd discurse
cp .env.example .env    # optional — or export variables directly

Set at least one API key:

# Windows / PowerShell
$env:ANTHROPIC_API_KEY = "sk-ant-..."
$env:LLM_PROVIDER = "anthropic"
$env:SPRING_PROFILES_ACTIVE = "dev"   # uses in-memory H2 — no DB setup
# macOS / Linux
export ANTHROPIC_API_KEY="sk-ant-..."
export LLM_PROVIDER="anthropic"
export SPRING_PROFILES_ACTIVE="dev"

2. Run the backend

mvn spring-boot:run

Listens on http://localhost:8080. Verify:

GET  http://localhost:8080/api/health

3. Run the frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173.


Using a different backend port

$env:SERVER_PORT = "9090"
mvn spring-boot:run

Then create frontend/.env.local:

VITE_API_BASE=http://localhost:9090

Restart npm run dev.
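In the Vite app, `VITE_API_BASE` would be read from `import.meta.env` inside `api.ts`. A sketch of the resolution logic (the helper name is hypothetical, and the real `api.ts` may differ):

```typescript
// Resolve the backend base URL. An unset/empty value means same-origin
// requests (the Nginx-proxy case in Docker mode); trailing slashes are
// stripped so paths like `${base}/api/analyze` stay well-formed.
function resolveApiBase(viteApiBase: string | undefined): string {
  return (viteApiBase ?? "").replace(/\/+$/, "");
}
```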


Production database (PostgreSQL)

Drop the dev profile and supply connection vars:

$env:DB_URL = "jdbc:postgresql://localhost:5432/discurse"
$env:DB_USER = "discurse"
$env:DB_PASSWORD = "discurse"

Create the user and database once:

CREATE USER discurse WITH PASSWORD 'discurse';
CREATE DATABASE discurse OWNER discurse;

HTTP API

All endpoints live under /api.

POST /api/analyze

{
  "text": "Remote work boosts productivity. Studies show...",
  "provider": "anthropic"          // optional; defaults to server config
}

Response:

{
  "id": 12,
  "inputText": "",
  "tree": {
    "rootId": "n0",
    "nodes": [{ "id": "e0", "text": "", "nuclearity": "NUCLEUS", "start": 0, "end": 30 }],
    "edges": [{ "source": "n0", "target": "e0", "relation": "EVIDENCE" }]
  },
  "metrics": {
    "claimToEvidenceRatio": 0.67,
    "nucleusCount": 2,
    "satelliteCount": 3,
    "orphanClaims": [],
    "relationDistribution": { "EVIDENCE": 0.4, "CONTRAST": 0.2 },
    "coherence": { "averageDistance": 2.1, "maxDistance": 3, "outliers": [] }
  },
  "quality": {
    "overall": 76, "evidence": 80, "structure": 74, "balance": 60, "coherence": 90,
    "insights": ["Strong evidentiary support throughout."]
  },
  "recommendations": [
    {
      "id": "no-counterpoint",
      "severity": "WARNING",
      "category": "Balance",
      "title": "Add a counterpoint",
      "suggestion": "No CONTRAST or CONCESSION detected...",
      "nodeIds": []
    }
  ]
}
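A minimal client sketch for this endpoint, using the request/response field names shown above (the typed wrappers in `api.ts` presumably look similar, but these helper names are illustrative):

```typescript
interface AnalyzeRequest { text: string; provider?: string }

// Serialize the request body; omit "provider" entirely when unset so the
// server-side default applies.
function analyzeBody(text: string, provider?: string): string {
  return JSON.stringify(provider ? { text, provider } : { text });
}

// POST the text to /api/analyze and return the parsed response
// (requires a running backend).
async function analyze(base: string, req: AnalyzeRequest): Promise<unknown> {
  const res = await fetch(`${base}/api/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: analyzeBody(req.text, req.provider),
  });
  if (!res.ok) throw new Error(`analyze failed: ${res.status}`);
  return res.json();
}
```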

GET /api/providers

Returns { "available": ["anthropic", "openai", "gemini"] } — reflects which keys are configured.

GET /api/health

{ "status": "ok" | "degraded", "providers": [...] }.

GET /api/analyses?limit=20

Recent analyses as summaries (id, provider, createdAt, preview).

GET /api/analyses/{id}

Full AnalyzeResponse for a stored analysis.


Configuration reference

Edit src/main/resources/application.yml or override via environment variables.

Property Default Description
server.port 8080 Backend HTTP port
spring.cache.type caffeine Cache backend. Set redis to use Redis
spring.data.redis.host localhost Redis host (only used when cache type = redis)
spring.data.redis.port 6379 Redis port
discurse.llm.provider anthropic Default provider when request omits one
discurse.llm.max-tokens 32000 Upper bound on LLM output
discurse.llm.timeout-seconds 120 LLM request timeout
discurse.llm.anthropic-model claude-sonnet-4-6 Anthropic model id
discurse.llm.openai-model gpt-4o OpenAI model id
discurse.llm.gemini-model gemini-2.5-pro Gemini model id
discurse.parser.max-edus 400 Hard cap on discourse units per doc
discurse.parser.max-chars 60000 Hard cap on input length
discurse.metrics.coherence-distance-threshold 5 Depth at which nodes count as outliers

Environment-variable equivalents: uppercase, with dots and dashes replaced by underscores (e.g. discurse.llm.max-tokens → DISCURSE_LLM_MAX_TOKENS=16000).
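The property-to-variable convention can be expressed as a one-liner; this sketch just mirrors the rule stated above (the function name is illustrative):

```typescript
// Map a config property name to its environment-variable equivalent:
// uppercase, with dots and dashes replaced by underscores.
function toEnvVar(property: string): string {
  return property.replace(/[.\-]/g, "_").toUpperCase();
}
```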


Tech stack

Backend

  • Java 17 / Spring Boot 3.3
  • Spring Data JPA + PostgreSQL / H2
  • JGraphT for graph analytics
  • OpenNLP 2.3 for sentence detection
  • Caffeine cache
  • Anthropic Java SDK / OpenAI REST / Google Gemini REST

Frontend

  • React 18 + TypeScript
  • Vite 5
  • D3 7 (hierarchy, zoom, selection)
  • pdfjs-dist (client-side PDF extraction)
  • mammoth (client-side .docx extraction)

Development

Run tests

mvn test

Build for production

mvn clean package              # backend → target/*.jar
cd frontend && npm run build   # frontend → dist/

Linting / types

cd frontend && npx tsc -b --noEmit

Known limitations

  • Scanned PDFs — pdf.js extracts text layer only; no OCR.
  • Non-English — EDU segmentation uses an English OpenNLP model. LLMs will still parse other languages, but segmentation quality drops.
  • Long documents — capped at 60 000 chars / 400 EDUs by default. Split into sections for book-length inputs.
  • Gemini free tier — gemini-2.5-pro has a 0-request/day free quota; switch to gemini-2.5-flash or upgrade the plan.
  • LLM variance — same text may produce slightly different trees across runs; the cache mitigates this for identical inputs.

Roadmap

High-leverage features on the backlog — contributions welcome:

  • Draft comparison view (v1 vs v2 structural diff)
  • Export (SVG / PNG / Markdown report)
  • Browser extension for right-click analysis
  • Streaming tree rendering as the LLM produces JSON
  • Multi-model consensus view (disagreement surfaced as uncertainty)
  • Local open-weights RST parser for offline mode
  • Inline counterargument generator per nucleus

License

TBD.
