A miniature watsonx-style platform — built end-to-end from scratch
Spring Boot · Ollama · Parquet · 768-dim embeddings · RAG (chunking + reranking) · multimodal (vision + OCR) · tabular text-to-SQL (DuckDB) · multi-tenant · PII governance
정형 표(CSV/XLSX)는 DuckDB로 text-to-SQL — 집계는 SQL, 텍스트는 RAG.
MiniWatson is a learning project that recreates IBM watsonx's 3-layer architecture (data · ai · governance) at a small scale. The goal: understand how enterprise GenAI platforms work by building one — not by reading about it.
답변마다 출처(grounding)가 붙고, 모든 LLM 호출이 감사 로그에 남는다 — RAG와 거버넌스를 한 화면에서.
Three layers, each mapping to a watsonx component:
┌─────────────────────────────────────────────────────────┐
│ Frontend (Plain HTML + JS · IBM Carbon style) │
│ http://localhost:8080 │
└─────────────────────────┬───────────────────────────────┘
│ REST · JSON
┌─────────────────────────▼───────────────────────────────┐
│ Backend: Spring Boot 4 · Java 21 (Temurin/HotSpot) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ AI Layer (watsonx.ai analog) │ │
│ │ • Chat: multi-LLM, per-request (gemma/granite/..) │ │
│ │ • Embeddings: 768-dim (granite-embedding:278m) │ │
│ │ • Vision: image Q&A (llava / granite-vision) │ │
│ │ • OCR grounding: Tesseract (exact text/numbers) │ │
│ │ • RAG: chunk → embed → hybrid search → rerank │ │
│ │ • Hybrid: vector + BM25 keyword (RRF fusion) │ │
│ │ • Reranking: none/llm/mmr/cross (pluggable) │ │
│ │ • Tabular: CSV/XLSX → text-to-SQL (DuckDB) │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Data Layer (watsonx.data analog) │ │
│ │ • Ingest: Wikipedia / image (vision+OCR) / file │ │
│ │ • Multi-format: PDF/DOCX/PPTX/XLSX/HTML + HWP/HWPX│ │
│ │ • Chunking: fixed / recursive / semantic │ │
│ │ • Multi-tenant namespaces + dedup + CRUD │ │
│ │ • Tiered: hot JSON → cold Parquet (compaction) │ │
│ │ • Catalog: H2 doc metadata (catalog/data split) │ │
│ │ • Parquet (Avro schema, SNAPPY) — 7× < JSON │ │
│ └────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Governance Layer (watsonx.governance analog) │ │
│ │ • Auto audit log every LLM call in H2 │ │
│ │ • Tracks model, latency, timestamp │ │
│ │ • PII detection & redaction before persist │ │
│ │ • Provenance: source chunks logged per answer │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
| Layer | Choice | Why |
|---|---|---|
| Language | Java 21 (Temurin/HotSpot) | OpenJ9는 요청 처리 중 walkStackFrames 크래시로 회피. HOTSPOT-RUNTIME.md |
| Framework | Spring Boot 4.0 | Enterprise standard, fast bootstrap |
| LLM | Ollama (local) 기본, 제공자 교체 가능 | 자체호스팅(키 불필요), LlmClient 추상화로 watsonx/Bedrock 등 설정 교체. LLM-ABSTRACTION.md |
| Chat model | ibm/granite4:latest (default) · multi-LLM | per-request model, whitelist-validated |
| Embedding model | granite-embedding:278m (default) · nomic / 30m / mxbai compared | 768-dim 다국어 승자 (recall 97%, 한국어 11/11). 4종 비교는 EMBEDDINGS.md |
| Vision model | llava / granite-vision | image Q&A + caption (multimodal) |
| OCR | Tesseract (CLI) | exact text/number extraction for grounding |
| Data format | Apache Parquet | Columnar + SNAPPY = 7× smaller than JSON |
| Schema | Avro | Schema-first, evolution-safe |
| Storage | Tiered (JSON hot → Parquet cold) | cheap appends + columnar compaction |
| Catalog | H2 document_catalog (mirror) | SQL-queryable KB metadata; catalog/data split |
| Retrieval | In-memory vector index (brute-force default, LSH opt-in) | exact cosine by default; LSH for sub-linear approximate kNN |
| Vector store | In-memory ↔ pgvector (vector.store 스위치) |
영속·확장은 pgvector(HNSW), 차원실험은 인메모리. 인메모리 패리티 35/35. PGVECTOR.md |
| Hybrid search | Vector + BM25 keyword, RRF fusion | lexical recall for exact tokens (IDs, codes) |
| Chunking | fixed / recursive / semantic (pluggable) | recursive default; balance quality vs cost |
| Reranking | none / llm / mmr / cross (pluggable) | two-stage: fetch top-N → rerank → top-K |
| Cross-encoder | DJL + PyTorch + BGE-reranker | dedicated reranker model (Linux/Apple Silicon) |
| Tabular SQL | DuckDB (embedded, in-memory) | text-to-SQL over CSV/XLSX; aggregation RAG can't do |
| Database | H2 (in-memory) | Zero config for governance audit |
| Security | API key / JWT 인증 + 테넌트 격리 강제 | namespace를 코드로 강제(authN/authZ 분리, A/B/C 3안). SECURITY.md |
| CI/CD | GitHub Actions + GitLab CI + Docker | 양쪽 ./mvnw test 게이트(이식성). 이미지 빌드·푸시(GHCR)는 GitHub만, GitLab은 테스트 게이트 |
| Build | Maven | pom.xml + spring-boot-maven-plugin |
| Frontend | Plain HTML + JS | No framework lock-in, instant load |
| 문서 | 내용 |
|---|---|
| ARCHITECTURE.md | 컴포넌트·데이터 흐름 |
| SECURITY.md | 위협모델 · 인증 A/B/C · 테넌트 격리 · 설계결정 |
| RAG-LANDSCAPE.md | RAG 종류(Naive/Advanced/Modular, RAPTOR/CRAG/GraphRAG 등) · 로컬 구현 가능성 · MiniWatson 현황 · 로드맵 |
| PGVECTOR.md | pgvector 이관 · HNSW · 인메모리 패리티 |
| EMBEDDINGS.md | 임베딩 4종 비교 (승자 granite-278m) |
| CHUNKING.md | 청킹 전략 + 약어 확장 |
| EVALUATION.md | 골든셋 recall + text-to-SQL |
| DEBUGGING.md | 실전 트러블슈팅 |
| DECISIONS.md | 기술 선택 결정 가이드 |
| OPERATIONS.md | 배포 · 재임베딩 · 가용성 · 장애 런북 · 프로덕션 체크리스트 |
| CLOUD-DEPLOYMENT.md | 벤더 중립 배포(추론 제공자/호스트 교체), Phase 0 |
| HOTSPOT-RUNTIME.md | OpenJ9 크래시에서 HotSpot 전환 근거와 절차 |
| LLM-ABSTRACTION.md | LlmClient/EmbeddingClient 추상화, 거버넌스 분리 |
| CICD.md | CI/CD 파이프라인 현황과 갭 |
# 1. Java 21 (Temurin/HotSpot recommended)
java --version # → openjdk 21+
# 2. Ollama
brew install ollama
ollama pull ibm/granite4:latest # chat (default)
ollama pull granite-embedding:278m # 768-dim embeddings (default, 다국어)
ollama pull llava # vision (multimodal Q&A / image ingest)
# 3. OCR (for image grounding — exact text/numbers)
brew install tesseract
# 4. Start Ollama server (separate terminal)
ollama serve# Clone
git clone https://github.com/dea980/miniwatson.git
cd miniwatson
# Run Spring Boot
./mvnw spring-boot:run
# Open browser
open http://localhost:8080- Ingest a Wikipedia article
→ TypeRetrieval-augmented_generation, click Ingest - Ask a question
→ TypeWhat is RAG?, click Submit - Audit Trail tab
→ See every Q&A logged with model, latency, timestamp
curl -X POST "http://localhost:8080/api/data/ingest?title=Vector_database"Returns the stored article (with id, title, summary, url, ingestedAt).
Embedding is generated and stored in Parquet but hidden from the response.
Optional &namespace=acme scopes the article to a tenant (default: default).
curl -X POST http://localhost:8080/api/rag/ask \
-H "Content-Type: application/json" \
-d '{"question": "What is RAG?", "namespace": "default", "model": "ibm/granite4:latest"}'namespace and model are optional. Returns the answer plus the top-K source
articles used for grounding.
curl http://localhost:8080/api/rag/models # { default, available[] }# Ask about an image (vision + OCR grounding)
curl -X POST http://localhost:8080/api/multimodal/ask \
-F "image=@invoice.png" -F "question=What is the total?"
# Ingest an image into the knowledge base (searchable by later text queries)
curl -X POST http://localhost:8080/api/multimodal/ingest \
-F "image=@invoice.png" -F "namespace=demo"curl -X POST http://localhost:8080/api/data/ingest-file \
-F "file=@report.pdf" -F "namespace=demo"The file is text-extracted, split into chunks, and each chunk stored as an
Article (title #1, #2, ...). Extraction branches by extension: Tika
(PDF/DOCX/PPTX/XLSX/HTML/txt/md/csv) and HWP/HWPX via hwplib/hwpxlib (see
docs/INGESTION-FORMATS.md). Returns the list of
created chunks. Chunk strategy/size via chunking.* config.
curl -X POST http://localhost:8080/api/data/summarize/5 # any chunk id of the docAggregates all chunks of the document (by base title) and returns a summary.
This is separate from RAG ask — summarization needs the whole document, not
retrieved fragments.
curl http://localhost:8080/api/data/articles # all (or ?namespace=demo)
curl -X DELETE http://localhost:8080/api/data/articles/5 # remove by id (+index resync)
curl http://localhost:8080/api/data/index/stats # mode (brute-force default), vectors, buckets# List documents (chunks grouped by namespace + title)
curl "http://localhost:8080/api/data/documents"
# Delete a whole document (all its chunks at once)
curl -X DELETE "http://localhost:8080/api/data/documents?title=report.pdf&namespace=demo"A long file is stored as many chunks; these endpoints present and manage it as one
document. The same metadata is mirrored to the H2 document_catalog for SQL queries.
curl http://localhost:8080/api/governance/logs # every LLM call (model, latency, PII, sources)
curl http://localhost:8080/api/governance/stats # aggregates: per-model, per-source-type, KPI totalsEvery LLM call is logged: question, answer, model, latency (ms), timestamp, and PII count (sensitive data is masked before persisting).
All settings are externalized via Spring profiles and environment variables.
# application.yaml
spring:
profiles:
active: dev # dev | demo | prod
ollama:
url: ${OLLAMA_URL:http://localhost:11434}
chat-model: ${OLLAMA_CHAT_MODEL:ibm/granite4:latest}
chat-models: ${OLLAMA_CHAT_MODELS:ibm/granite4:latest,gemma4} # multi-LLM whitelist
embed-model: ${OLLAMA_EMBED_MODEL:granite-embedding:278m} # 비교 승자 (EMBEDDINGS.md 7절)
vision-model: ${OLLAMA_VISION_MODEL:llava:latest} # multimodal
num-predict: ${OLLAMA_NUM_PREDICT:256}
retrieval:
hybrid:
enabled: true # vector + BM25 (false = vector-only)
storage:
tier:
threshold: ${STORAGE_TIER_THRESHOLD:3} # hot(JSON) count before compaction → Parquet
vector:
index:
lsh:
enabled: ${VECTOR_LSH_ENABLED:false} # brute-force default; true = LSH approximate kNN
hyperplanes: ${VECTOR_LSH_HYPERPLANES:16}
chunking:
strategy: recursive # fixed | recursive | semantic
max-size: 1000 # chars per chunk
rerank:
strategy: mmr # none | llm | mmr | cross
eval:
overrides:
enabled: ${EVAL_OVERRIDES:true} # dev/demo = true, prod = false| Profile | Storage | When |
|---|---|---|
dev (default) |
H2 in-memory | Fast iteration |
demo |
H2 file-backed | Persistent demos |
prod |
Externalized via env vars | Real deployment |
Switch model without code change:
OLLAMA_CHAT_MODEL=gemma4 ./mvnw spring-boot:runminiwatson/
├── src/main/java/com/miniwatson/
│ ├── MiniwatsonApplication.java
│ ├── controller/
│ │ ├── RagController.java # POST /api/rag/ask · GET /api/rag/models
│ │ ├── DataController.java # /api/data/* (ingest, file, delete, stats)
│ │ ├── MultimodalController.java # /api/multimodal/ask · /ingest (vision)
│ │ ├── GovernanceController.java # /api/governance/logs · /stats · POST /feedback
│ │ └── TabularController.java # POST /api/tabular/load · /ask (DuckDB text-to-SQL)
│ ├── service/
│ │ ├── OllamaService.java # Chat (multi-LLM) + vision (images)
│ │ ├── EmbeddingService.java # Embed: 768-dim
│ │ ├── OcrService.java # Tesseract CLI → text
│ │ ├── IngestionService.java # Wikipedia / image / file → chunk → Article
│ │ ├── IndexingService.java # one place to update all indexes (vector + keyword)
│ │ ├── HybridRetriever.java # vector + BM25 candidates, RRF fusion
│ │ ├── Chunker.java # interface: fixed / recursive / semantic
│ │ ├── FixedChunker.java # N-char + overlap (baseline)
│ │ ├── RecursiveChunker.java # separator-priority split (default)
│ │ ├── SemanticChunker.java # sentence-embedding boundary detection
│ │ ├── Reranker.java # interface: none / llm / mmr / cross
│ │ ├── NoopReranker.java # 1st-stage top-K passthrough (baseline)
│ │ ├── LlmReranker.java # listwise LLM rerank
│ │ ├── MmrReranker.java # relevance + diversity (MMR)
│ │ ├── CrossEncoderReranker.java # DJL cross-encoder (graceful fallback)
│ │ └── RagService.java # Embed → vector search (top-N) → rerank → top-K
│ ├── data/
│ │ ├── Article.java # POJO + namespace + embedding (write-only)
│ │ ├── WikipediaResponse.java # External API DTO
│ │ ├── ArticleRepository.java # storage interface
│ │ ├── ArticleStore.java # JSON store (hot tier)
│ │ ├── ArticleParquetStore.java # Parquet store (cold tier)
│ │ ├── TieredArticleStore.java # hot→cold compaction (@Primary)
│ │ ├── VectorIndex.java # in-memory LSH index (semantic)
│ │ └── KeywordIndex.java # in-memory BM25 index (lexical)
│ ├── governance/
│ │ ├── QueryLog.java # JPA entity (+ piiCount, sources/provenance)
│ │ ├── QueryLogRepository.java # Spring Data JPA
│ │ ├── DocumentCatalog.java # KB metadata mirror (H2, catalog/data split)
│ │ ├── DocumentCatalogRepository.java
│ │ └── PiiRedactionService.java # regex PII masking
│ └── dto/
│ ├── AskRequest.java
│ ├── OllamaRequest.java # Includes think:false
│ ├── OllamaResponse.java
│ ├── EmbeddingRequest.java
│ └── EmbeddingResponse.java
├── src/main/resources/
│ ├── application.yaml # Common config + active profile
│ ├── application-dev.yaml # H2 in-memory
│ ├── application-demo.yaml # H2 file-backed
│ ├── application-prod.yaml # Env vars (PostgreSQL ready)
│ ├── article.avsc # Avro schema for Parquet
│ └── static/
│ ├── index.html # Dashboard
│ ├── css/styles.css # IBM Carbon-inspired
│ └── js/app.js # fetch + DOM
├── data/ # runtime state (gitignored)
│ ├── articles.json # hot tier (recent appends)
│ └── articles.parquet # cold tier (compacted)
├── docs/ # API, ARCHITECTURE, GOVERNANCE, MULTIMODAL, ...
├── sample/ # demo fixtures (invoice, chart, text)
└── pom.xml
Migrating embedding storage from JSON to Parquet:
| Format | Size | Compression |
|---|---|---|
| JSON (with 768-dim float arrays) | 54 KB | baseline |
| Parquet (SNAPPY) | 7.8 KB | 7× |
Parquet's columnar layout means embedding columns compress aggressively while still allowing per-row reads. This is exactly why watsonx.data uses Parquet as its native format.
Notes from building this:
-
Java 21 + Hadoop SecurityManager — Hadoop's
UserGroupInformationcallsSubject.getSubject()which Java 17+ deprecated. Fix:-Djava.security.manager=allowin JVM arguments. -
Gemma "thinking" tokens — gemma3/gemma4 uses internal reasoning tokens that drain the
num_predictbudget.think: falsedisables this; cut latency by ~3×. -
Wikipedia User-Agent policy — REST API requires
User-Agent: AppName/version (URL; email)or returns 403. Standard enterprise API hygiene. -
Anti-corruption layer — Keeping
WikipediaResponseseparate from internalArticlelets the external API change without touching the rest of the codebase. -
@JsonProperty(WRITE_ONLY)— Hide 768-dim embedding from the API response while keeping it in storage. Trims 50KB+ off every response. -
Spring profiles for sovereignty — dev/demo/prod with environment variables means the same code ships to different environments with different credentials. This is the technical implementation of "sovereignty at the core."
-
Vision models hallucinate numbers; OCR doesn't —
llavainvented an invoice total that wasn't on the page. The fix wasn't a bigger model — it was splitting roles: OCR (Tesseract) for exact text, vision for layout/context, and a prompt that tells the LLM to trust OCR over vision on conflicts. Combining sources isn't enough; you must declare which one is authoritative. (Seedocs/MULTIMODAL.mdfor the full before/after and limitations.) -
OCR has its own failure modes — it nails row-structured tables but mangles low-contrast/inverted text and loses the 2-D mapping in charts (reads
$28Mbut not that it belongs to Q4). The hard part is the pipeline, not the model. -
LSH for sub-linear retrieval — random-hyperplane signatures bucket similar vectors so a query only scores a small candidate set, with an exact-cosine fallback for correctness. Dimension-agnostic (384/768/1024).
-
Chunking is the real fix for long-doc retrieval — a 90k-char PDF stored as one embedding broke retrieval: the embedder truncates past ~8k tokens, and one vector can't match a specific passage. Splitting into per-chunk Articles fixed it (101 chunks). Compared fixed/recursive/semantic — recursive wins on quality-vs-cost; semantic is best but pays a per-sentence embedding cost. (See
docs/CHUNKING.md.) -
Reranking helps most when first-stage search is weak — fetch top-N (20) then rerank to top-K (2). On a strong embedder + good chunks, easy questions already rank right and rerank barely changes them; the gain shows on vocabulary-mismatch questions. Built none/llm/mmr/cross to compare. (See
docs/RERANKING.md.) -
Hybrid search fixes vector's blind spot for exact tokens — embeddings can't match "INV-2026-0042" or a model name; BM25 (lexical) can. Fused vector + BM25 with RRF (rank-based, no score normalization). Same caveat as rerank: on a small clean corpus the win is small (top-N already covers everything); it pays off on large/noisy corpora with rare-token queries. Indexing was split into one
IndexingServiceso adding the keyword index touched only that one class. (Seedocs/HYBRID-SEARCH.md.) -
Pin the error to the real cause, then design a fallback — the DJL cross-encoder failed to load on Intel macOS. Suspected the OpenJ9 (Semeru) JVM first, but switching to HotSpot reproduced it — the real cause was a missing osx-x86_64 native (PyTorch dropped Intel-mac wheels). The reranker falls back to top-K instead of crashing (graceful degradation); it runs on Linux/Apple Silicon. Library APIs also differ by version — confirmed the 0.30.0 javadoc instead of trusting an example (no
CrossEncoderTranslatorFactory; input isStringPair). -
Tiered storage = lakehouse in miniature — cheap row-oriented appends (JSON hot tier) compacted into columnar Parquet (cold tier) past a threshold. Avoids rewriting the whole Parquet file on every single ingest.
-
Governance must redact PII — the audit log is the leak risk. Mask emails/phones/SSNs/cards before persisting, return the original to the user. Function preserved, record protected.
-
Provenance makes answers auditable — logging the rerank-final source chunks per answer means you can later check "was this grounded, and in what?" — and tell a retrieval error (wrong chunk) apart from a generation error (right chunk, wrong answer). One subtle bug: set the field before
save(), or it never persists. -
Catalog/data split = lakehouse in miniature — vectors and text live in Parquet (the data); lightweight document metadata is mirrored to H2 (the catalog), so the knowledge base itself becomes SQL-queryable for governance. Parquet is the source of truth; the H2 catalog is rebuilt from it on startup (
@PostConstruct), same philosophy as the vector index hydrate. -
Spring Boot 4 ignores
javax.annotation—@PostConstructsilently never ran because it was imported fromjavax, notjakarta. On Jakarta EE, callbacks must usejakarta.annotation. When a lifecycle hook quietly doesn't fire, suspect the javax/jakarta namespace first.
- 1 — Spring Boot setup
- 2 — Ollama integration (watsonx.ai analog)
- 3 — H2 audit log (watsonx.governance analog)
- 4 — Wikipedia → Parquet (watsonx.data analog)
- 5 — RAG with embeddings + cosine similarity
- 6 — Frontend dashboard (IBM Carbon-style)
- 7 — Multi-tenant article namespacing
- 8 — Vector index (random-hyperplane LSH) for sub-linear retrieval
- 9 — Multi-LLM chat model selection (per-request, whitelist)
- 10 — Multimodal vision Q&A + image ingest (Ollama vision)
- 11 — OCR grounding (Tesseract) + OCR/Vision fusion
- 12 — PII detection & redaction in audit log (governance)
- 13 — Tiered storage (hot JSON → cold Parquet compaction)
- 14 — Knowledge-base CRUD (delete, dedup, file upload)
- 15 — Universal file ingest (Apache Tika) + document chunking (fixed/recursive/semantic)
- 16 — Two-stage retrieval with pluggable reranking (none/llm/mmr/cross)
- 17 — Provenance: source chunks logged per answer (governance)
- 18 — Document catalog in H2 (catalog/data split, SQL-queryable KB)
- 19 — Governance stats dashboard (per-model, per-source-type, KPIs)
- 20 — Hybrid search (vector + BM25, RRF) with indexing split
- 21 — Eval harness (recall + LLM-as-judge), unit tests, user feedback loop
- 22 — PostgreSQL + pgvector container via Podman (prod profile, persistent governance storage)
- 23 — Korean HWP/HWPX ingest (hwplib/hwpxlib + PrvText fallback); extractText extension dispatch
- 24 — Embedding model comparison (384/768/1024-dim, 4종; 승자 granite-embedding:278m, recall 97% / 한국어 11/11)
- 25 — PgVectorStore — pgvector(HNSW) 영속 vector store, 인메모리 패리티 35/35 (RRF id 붕괴 버그 해결)
- 26 — 청킹 개선: 약어 확장(CAIO→Chief AI Officer)으로 구조적 miss 회복 → 35/35
- 27 — 멀티테넌트 보안: API key/JWT 인증(A/B/C 3안) + 테넌트 격리 강제 (authN/authZ 분리)
- 28 — CI/CD: GitHub Actions + GitLab CI 양쪽 테스트 게이트(./mvnw test, 이식성). 이미지 빌드·푸시(멀티아치 GHCR)는 GitHub Actions
- 29 — 운영 하드닝: 감사 fail-open · Ollama 타임아웃 · rerank fallback · OPERATIONS.md
핵심 플랫폼(data/ai/governance) + 보안 + CI는 완성. 아래는 운영·심화 영역으로, 배포 산출물(docker-compose.prod, 멀티아치 CI, Oracle 가이드)은 준비됐으나 라이브 비용/시간 대비 우선순위를 미뤘다.
- 30 — 라이브 배포 (VPS docker-compose, 또는 IBM Cloud Code Engine + watsonx.ai 스왑)
- 31 — 보안 Tier 2: 프롬프트 인젝션 방어, PII 커버리지 확대, TLS/레이트리밋
- 32 — 평가 심화(RAGAS류 답변품질), 관측성(metrics/health/tracing)
Reading about watsonx is one thing. Recreating its data-ai-governance loop end-to-end is another. The point was to find out where the hard parts actually are.
Verdict: they aren't in the model. They're in the pipeline, the storage format, and the audit trail. That matches IBV CEO Study's "6 capabilities for 5.4× adoption" — change management, AI governance, data governance, real-time integration, system integration, financial integration. Model selection is the easy part.
MIT
| Doc | What's inside |
|---|---|
| docs/API.md | REST API reference — every endpoint with curl + schemas |
| docs/ARCHITECTURE.md | Component diagram, request flows, watsonx mapping |
| docs/DATA-MODEL.md | Article schema, Avro + Parquet, anti-corruption layer |
| docs/GOVERNANCE.md | Audit log schema + PII redaction, watsonx.governance parity |
| docs/MULTIMODAL.md | Vision Q&A + OCR grounding, image ingest, findings & limits |
| docs/CHUNKING.md | Chunking strategies (fixed/recursive/semantic), measured comparison |
| docs/CHUNKING-TEST.md | Step-by-step guide to reproduce the chunking comparison |
| docs/RERANKING.md | Two-stage retrieval, reranker strategies, before/after + platform findings |
| docs/HYBRID-SEARCH.md | Vector + BM25 hybrid retrieval, RRF fusion, indexing split, measured limits |
| docs/EMBEDDINGS.md | Embedding model comparison (384/768/1024-dim), prefix convention, measurement harness |
| docs/INGESTION-FORMATS.md | Multi-format ingest — Tika + Korean HWP/HWPX (PrvText fallback), extension dispatch |
| docs/TABULAR-SQL.md | Tabular text-to-SQL over CSV/XLSX with DuckDB — aggregation path RAG can't do, SELECT-only guard |
| docs/EVALUATION.md | Retrieval eval harness, rerank/hybrid sweep, findings (llm rerank can hurt) |
| docs/TESTING.md | JUnit unit tests; how a test caught a Korean-phone PII gap |
| docs/VERIFICATION.md | How each feature was verified — unit / offline eval / curl / UI |
| docs/DEPLOYMENT.md | Postgres + pgvector via Podman, profiles, and 3 real deployment gotchas |
| docs/H2-CONSOLE.md | H2 web console — enable, login, SQL cookbook, prod warning |
Live (interactive) API docs: run the app, then open http://localhost:8080/swagger-ui.html. See SWAGGER-SETUP.md to enable.
Daeyeop Kim — github.com/dea980 · kdea989@gmail.com
