MiniWatson

A miniature watsonx-style platform — built end-to-end from scratch
Spring Boot · Ollama · Parquet · 768-dim embeddings · RAG (chunking + reranking) · multimodal (vision + OCR) · tabular text-to-SQL (DuckDB) · multi-tenant · PII governance

Structured tables (CSV/XLSX) go through text-to-SQL on DuckDB: aggregation is handled by SQL, free text by RAG.

MiniWatson is a learning project that recreates IBM watsonx's 3-layer architecture (data · ai · governance) at a small scale. The goal: understand how enterprise GenAI platforms work by building one — not by reading about it.

Every answer carries its sources (grounding), and every LLM call is written to the audit log: RAG and governance on one screen.

Architecture

Three layers, each mapping to a watsonx component:

┌─────────────────────────────────────────────────────────┐
│  Frontend (Plain HTML + JS · IBM Carbon style)          │
│  http://localhost:8080                                  │
└─────────────────────────┬───────────────────────────────┘
                          │ REST · JSON
┌─────────────────────────▼───────────────────────────────┐
│  Backend: Spring Boot 4 · Java 21 (Temurin/HotSpot)     │
│                                                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │  AI Layer (watsonx.ai analog)                      │ │
│  │  • Chat: multi-LLM, per-request (gemma/granite/..) │ │
│  │  • Embeddings: 768-dim (granite-embedding:278m)    │ │
│  │  • Vision: image Q&A (llava / granite-vision)      │ │
│  │  • OCR grounding: Tesseract (exact text/numbers)   │ │
│  │  • RAG: chunk → embed → hybrid search → rerank     │ │
│  │  • Hybrid: vector + BM25 keyword (RRF fusion)      │ │
│  │  • Reranking: none/llm/mmr/cross (pluggable)       │ │
│  │  • Tabular: CSV/XLSX → text-to-SQL (DuckDB)        │ │
│  └────────────────────────────────────────────────────┘ │
│                                                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Data Layer (watsonx.data analog)                  │ │
│  │  • Ingest: Wikipedia / image (vision+OCR) / file   │ │
│  │  • Multi-format: PDF/DOCX/PPTX/XLSX/HTML + HWP/HWPX│ │
│  │  • Chunking: fixed / recursive / semantic          │ │
│  │  • Multi-tenant namespaces + dedup + CRUD          │ │
│  │  • Tiered: hot JSON → cold Parquet (compaction)    │ │
│  │  • Catalog: H2 doc metadata (catalog/data split)   │ │
│  │  • Parquet (Avro schema, SNAPPY) — 7× < JSON       │ │
│  └────────────────────────────────────────────────────┘ │
│                                                         │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Governance Layer (watsonx.governance analog)      │ │
│  │  • Auto audit log every LLM call in H2             │ │
│  │  • Tracks model, latency, timestamp                │ │
│  │  • PII detection & redaction before persist        │ │
│  │  • Provenance: source chunks logged per answer     │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘

Tech Stack

Layer	Choice	Why
Language	Java 21 (Temurin/HotSpot)	OpenJ9 is avoided because of a walkStackFrames crash during request handling. See HOTSPOT-RUNTIME.md
Framework	Spring Boot 4.0	Enterprise standard, fast bootstrap
LLM	Ollama (local) by default, provider swappable	Self-hosted (no API key needed); the LlmClient abstraction lets watsonx, Bedrock, etc. be swapped in via configuration. See LLM-ABSTRACTION.md
Chat model	ibm/granite4:latest (default) · multi-LLM	per-request model, whitelist-validated
Embedding model	granite-embedding:278m (default) · nomic / 30m / mxbai compared	768-dim multilingual winner (recall 97%, Korean 11/11). Four-model comparison in EMBEDDINGS.md
Vision model	llava / granite-vision	image Q&A + caption (multimodal)
OCR	Tesseract (CLI)	exact text/number extraction for grounding
Data format	Apache Parquet	Columnar + SNAPPY = 7× smaller than JSON
Schema	Avro	Schema-first, evolution-safe
Storage	Tiered (JSON hot → Parquet cold)	cheap appends + columnar compaction
Catalog	H2 document_catalog (mirror)	SQL-queryable KB metadata; catalog/data split
Retrieval	In-memory vector index (brute-force default, LSH opt-in)	exact cosine by default; LSH for sub-linear approximate kNN
Vector store	In-memory ↔ pgvector (`vector.store` switch)	pgvector (HNSW) for persistence and scale, in-memory for dimension experiments. Parity with in-memory: 35/35. See PGVECTOR.md
Hybrid search	Vector + BM25 keyword, RRF fusion	lexical recall for exact tokens (IDs, codes)
Chunking	fixed / recursive / semantic (pluggable)	recursive default; balance quality vs cost
Reranking	none / llm / mmr / cross (pluggable)	two-stage: fetch top-N → rerank → top-K
Cross-encoder	DJL + PyTorch + BGE-reranker	dedicated reranker model (Linux/Apple Silicon)
Tabular SQL	DuckDB (embedded, in-memory)	text-to-SQL over CSV/XLSX; aggregation RAG can't do
Database	H2 (in-memory)	Zero config for governance audit
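Security	API key / JWT authentication + enforced tenant isolation	Namespace is enforced in code (authN/authZ separated, three options A/B/C). See SECURITY.md
CI/CD	GitHub Actions + GitLab CI + Docker	Both run a `./mvnw test` gate (portability). Image build and push (GHCR) happens on GitHub only; GitLab is a test gate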
Build	Maven	pom.xml + spring-boot-maven-plugin
Frontend	Plain HTML + JS	No framework lock-in, instant load
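The Hybrid search row above relies on RRF, which fuses rankings using ranks alone, so vector and BM25 scores never need to be normalized against each other. A minimal sketch of that idea follows; the class name, the document IDs, and the constant k = 60 (a commonly used default) are illustrative assumptions and are not taken from HybridRetriever.java.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (K + rank(d)). */
class RrfFusionSketch {

    /** Assumed smoothing constant; MiniWatson's actual value is not stated in this README. */
    static final int K = 60;

    /** Fuses ranked ID lists (best first) into a single ranking, using ranks only. */
    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (K + rank), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // List.sort is stable, so tied IDs keep their first-seen order.
        fused.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return fused;
    }
}
```

With vector hits [doc-a, doc-b, doc-c] and BM25 hits [doc-b, doc-d, doc-a], fuse returns [doc-b, doc-a, doc-d, doc-c]: documents found by both retrievers rise to the top, and a BM25-only hit such as doc-d still makes the list.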

Docs

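Doc	What's inside
ARCHITECTURE.md	Components and data flow
SECURITY.md	Threat model · authentication options A/B/C · tenant isolation · design decisions
RAG-LANDSCAPE.md	RAG variants (Naive/Advanced/Modular, RAPTOR/CRAG/GraphRAG, etc.) · local implementation feasibility · MiniWatson status · roadmap
PGVECTOR.md	pgvector migration · HNSW · parity with the in-memory index
EMBEDDINGS.md	Comparison of four embedding models (winner: granite-278m)
CHUNKING.md	Chunking strategies + abbreviation expansion
EVALUATION.md	Golden-set recall + text-to-SQL
DEBUGGING.md	Hands-on troubleshooting
DECISIONS.md	Decision guide for technology choices
OPERATIONS.md	Deployment · re-embedding · availability · incident runbook · production checklist
CLOUD-DEPLOYMENT.md	Vendor-neutral deployment (swappable inference provider/host), Phase 0
HOTSPOT-RUNTIME.md	Rationale and procedure for switching from OpenJ9 to HotSpot after the crash
LLM-ABSTRACTION.md	LlmClient/EmbeddingClient abstraction, governance separation
CICD.md	Current state and gaps of the CI/CD pipeline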

Quick Start

Prerequisites

# 1. Java 21 (Temurin/HotSpot recommended)
java --version    # → openjdk 21+

# 2. Ollama
brew install ollama
ollama pull ibm/granite4:latest  # chat (default)
ollama pull granite-embedding:278m   # 768-dim embeddings (default, multilingual)
ollama pull llava              # vision (multimodal Q&A / image ingest)

# 3. OCR (for image grounding — exact text/numbers)
brew install tesseract

# 4. Start Ollama server (separate terminal)
ollama serve

Run

# Clone
git clone https://github.com/dea980/miniwatson.git
cd miniwatson

# Run Spring Boot
./mvnw spring-boot:run

# Open browser
open http://localhost:8080

Try it

Ingest a Wikipedia article
→ Type Retrieval-augmented_generation, click Ingest
Ask a question
→ Type What is RAG?, click Submit
Audit Trail tab
→ See every Q&A logged with model, latency, timestamp

API

Ingest Wikipedia article

curl -X POST "http://localhost:8080/api/data/ingest?title=Vector_database"

Returns the stored article (with id, title, summary, url, ingestedAt). Embedding is generated and stored in Parquet but hidden from the response.

Optional &namespace=acme scopes the article to a tenant (default: default).

Ask a RAG question

curl -X POST http://localhost:8080/api/rag/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is RAG?", "namespace": "default", "model": "ibm/granite4:latest"}'

namespace and model are optional. Returns the answer plus the top-K source articles used for grounding.
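For callers on the JVM, the same request can be built with the JDK's built-in HTTP client. This is a sketch under assumptions: the class and method names are hypothetical, the JSON is assembled by hand to stay dependency-free (only quotes and backslashes are escaped), and the optional model field is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client helper for POST /api/rag/ask. */
class AskClientSketch {

    /** Builds the request without sending it, so it can be inspected or reused. */
    static HttpRequest buildAskRequest(String baseUrl, String question, String namespace) {
        String body = "{\"question\": \"" + escape(question)
                + "\", \"namespace\": \"" + escape(namespace) + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/rag/ask"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Minimal JSON string escaping: backslashes first, then double quotes. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Sends the question and returns the raw JSON response body. */
    static String ask(String baseUrl, String question, String namespace) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildAskRequest(baseUrl, question, namespace),
                        HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling AskClientSketch.ask("http://localhost:8080", "What is RAG?", "default") requires the application and Ollama to be running.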

Multi-LLM — list selectable chat models

curl http://localhost:8080/api/rag/models      # { default, available[] }

Multimodal — image Q&A and image ingest

# Ask about an image (vision + OCR grounding)
curl -X POST http://localhost:8080/api/multimodal/ask \
  -F "image=@invoice.png" -F "question=What is the total?"

# Ingest an image into the knowledge base (searchable by later text queries)
curl -X POST http://localhost:8080/api/multimodal/ingest \
  -F "image=@invoice.png" -F "namespace=demo"

Upload a text/document file (Tika + Korean HWP/HWPX)

curl -X POST http://localhost:8080/api/data/ingest-file \
  -F "file=@report.pdf" -F "namespace=demo"

The file is text-extracted, split into chunks, and each chunk stored as an Article (title #1, #2, ...). Extraction branches by extension: Tika (PDF/DOCX/PPTX/XLSX/HTML/txt/md/csv) and HWP/HWPX via hwplib/hwpxlib (see docs/INGESTION-FORMATS.md). Returns the list of created chunks. Chunk strategy/size via chunking.* config.
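The recursive strategy splits on the coarsest separator first and only falls back to finer ones when a piece is still too long. The sketch below illustrates that separator-priority idea; the separator order, the class name, and the absence of overlap are assumptions and may differ from RecursiveChunker.java.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Separator-priority splitting: paragraphs, then lines, then sentences, then words. */
class RecursiveChunkerSketch {

    // Coarse-to-fine separators; this order is an assumption.
    private static final String[] SEPARATORS = {"\n\n", "\n", ". ", " "};

    static List<String> chunk(String text, int maxSize) {
        List<String> out = new ArrayList<>();
        split(text, 0, maxSize, out);
        return out;
    }

    private static void split(String text, int level, int maxSize, List<String> out) {
        if (text.length() <= maxSize) {
            emit(text, out);
            return;
        }
        if (level == SEPARATORS.length) {
            // No separator left: hard cut at maxSize characters.
            for (int i = 0; i < text.length(); i += maxSize) {
                emit(text.substring(i, Math.min(text.length(), i + maxSize)), out);
            }
            return;
        }
        String sep = SEPARATORS[level];
        StringBuilder current = new StringBuilder();
        for (String piece : text.split(Pattern.quote(sep))) {
            String candidate = current.length() == 0 ? piece : current + sep + piece;
            if (candidate.length() <= maxSize) {
                // Greedily pack neighbouring pieces while they still fit.
                current.setLength(0);
                current.append(candidate);
            } else {
                emit(current.toString(), out);
                current.setLength(0);
                if (piece.length() <= maxSize) {
                    current.append(piece);
                } else {
                    split(piece, level + 1, maxSize, out); // too long: try a finer separator
                }
            }
        }
        emit(current.toString(), out);
    }

    private static void emit(String s, List<String> out) {
        if (!s.isBlank()) {
            out.add(s.strip());
        }
    }
}
```

For example, chunk("aaa bbb ccc ddd", 7) yields ["aaa bbb", "ccc ddd"]; with the configured max-size of 1000, most paragraphs would stay intact and only oversized ones would be split further.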

Summarize an uploaded document

curl -X POST http://localhost:8080/api/data/summarize/5   # any chunk id of the doc

Aggregates all chunks of the document (by base title) and returns a summary. This is separate from RAG ask — summarization needs the whole document, not retrieved fragments.

List / delete articles, index stats

curl  http://localhost:8080/api/data/articles            # all (or ?namespace=demo)
curl -X DELETE http://localhost:8080/api/data/articles/5 # remove by id (+index resync)
curl  http://localhost:8080/api/data/index/stats         # mode (brute-force default), vectors, buckets

Documents (document-level view over chunks)

# List documents (chunks grouped by namespace + title)
curl "http://localhost:8080/api/data/documents"

# Delete a whole document (all its chunks at once)
curl -X DELETE "http://localhost:8080/api/data/documents?title=report.pdf&namespace=demo"

A long file is stored as many chunks; these endpoints present and manage it as one document. The same metadata is mirrored to the H2 document_catalog for SQL queries.
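The document view can be thought of as a group-by over chunks. A minimal sketch, assuming chunk titles carry a trailing " #<n>" suffix as described for file ingest (the Chunk record, the method names, and the exact suffix format are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Groups chunks back into documents by namespace and base title. */
class DocumentViewSketch {

    /** Minimal stand-in for a stored chunk; the real Article has more fields. */
    record Chunk(String namespace, String title) {}

    /** Strips a trailing " #<n>" chunk suffix; the exact format is an assumption. */
    static String baseTitle(String chunkTitle) {
        return chunkTitle.replaceFirst("\\s*#\\d+$", "");
    }

    /** Counts chunks per "namespace/baseTitle" key, preserving first-seen order. */
    static Map<String, Integer> chunkCounts(List<Chunk> chunks) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Chunk c : chunks) {
            counts.merge(c.namespace() + "/" + baseTitle(c.title()), 1, Integer::sum);
        }
        return counts;
    }
}
```

Keying on namespace plus base title keeps two tenants' copies of report.pdf apart, which is the behavior the delete endpoint's title and namespace parameters suggest.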

Audit trail & governance stats

curl http://localhost:8080/api/governance/logs    # every LLM call (model, latency, PII, sources)
curl http://localhost:8080/api/governance/stats   # aggregates: per-model, per-source-type, KPI totals

Every LLM call is logged: question, answer, model, latency (ms), timestamp, and PII count (sensitive data is masked before persisting).
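Masking before persistence can be done with plain regular expressions. The sketch below shows the idea for emails, phone numbers, SSNs, and card numbers; the patterns are simplified US-style assumptions, and the class, record, and placeholder names are hypothetical rather than those of PiiRedactionService.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Regex-based PII masking that also reports how many items were masked. */
class PiiRedactionSketch {

    record Result(String redacted, int piiCount) {}

    // Placeholder -> pattern, applied in insertion order (longest numeric pattern first).
    private static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("[CARD]", Pattern.compile("\\b(?:\\d{4}[- ]?){3}\\d{4}\\b"));
        PATTERNS.put("[SSN]", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
        PATTERNS.put("[PHONE]", Pattern.compile("\\b\\d{3}-\\d{3}-\\d{4}\\b"));
        PATTERNS.put("[EMAIL]", Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"));
    }

    /** Replaces every match with its placeholder and counts the replacements. */
    static Result redact(String text) {
        int count = 0;
        String current = text;
        for (Map.Entry<String, Pattern> entry : PATTERNS.entrySet()) {
            Matcher m = entry.getValue().matcher(current);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                count++;
                m.appendReplacement(sb, Matcher.quoteReplacement(entry.getKey()));
            }
            m.appendTail(sb);
            current = sb.toString();
        }
        return new Result(current, count);
    }
}
```

Redacting "Mail jane@example.com or call 555-123-4567, SSN 123-45-6789" gives "Mail [EMAIL] or call [PHONE], SSN [SSN]" with a count of 3. Only the redacted string would go to the audit log; the caller still receives the original answer.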

Configuration

All settings are externalized via Spring profiles and environment variables.

# application.yaml
spring:
  profiles:
    active: dev      # dev | demo | prod

ollama:
  url: ${OLLAMA_URL:http://localhost:11434}
  chat-model: ${OLLAMA_CHAT_MODEL:ibm/granite4:latest}
  chat-models: ${OLLAMA_CHAT_MODELS:ibm/granite4:latest,gemma4}  # multi-LLM whitelist
  embed-model: ${OLLAMA_EMBED_MODEL:granite-embedding:278m}      # comparison winner (EMBEDDINGS.md, section 7)
  vision-model: ${OLLAMA_VISION_MODEL:llava:latest}              # multimodal
  num-predict: ${OLLAMA_NUM_PREDICT:256}

retrieval:
  hybrid:
    enabled: true          # vector + BM25 (false = vector-only)

storage:
  tier:
    threshold: ${STORAGE_TIER_THRESHOLD:3}     # hot(JSON) count before compaction → Parquet

vector:
  index:
    lsh:
      enabled: ${VECTOR_LSH_ENABLED:false}       # brute-force default; true = LSH approximate kNN
      hyperplanes: ${VECTOR_LSH_HYPERPLANES:16}

chunking:
  strategy: recursive   # fixed | recursive | semantic
  max-size: 1000        # chars per chunk

rerank:
  strategy: mmr         # none | llm | mmr | cross

eval:
  overrides:
    enabled: ${EVAL_OVERRIDES:true}    # dev/demo = true, prod = false
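The rerank.strategy value mmr refers to Maximal Marginal Relevance, which trades relevance to the query against redundancy with chunks already picked. A minimal sketch of the selection loop, assuming cosine similarity and a tunable lambda (the class name, the signature, and the lambda values are illustrative assumptions, not MmrReranker.java):

```java
import java.util.ArrayList;
import java.util.List;

/** Maximal Marginal Relevance: lambda * relevance - (1 - lambda) * redundancy. */
class MmrSketch {

    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of up to k candidates in selection order. lambda = 1 means pure relevance. */
    static List<Integer> select(double[] query, List<double[]> candidates, int k, double lambda) {
        List<Integer> selected = new ArrayList<>();
        int limit = Math.min(k, candidates.size());
        while (selected.size() < limit) {
            int best = -1;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < candidates.size(); i++) {
                if (selected.contains(i)) {
                    continue;
                }
                // Redundancy = highest similarity to anything already selected (0 if nothing is).
                double redundancy = selected.isEmpty() ? 0.0 : Double.NEGATIVE_INFINITY;
                for (int j : selected) {
                    redundancy = Math.max(redundancy, cosine(candidates.get(i), candidates.get(j)));
                }
                double score = lambda * cosine(query, candidates.get(i)) - (1 - lambda) * redundancy;
                if (score > bestScore) {
                    bestScore = score;
                    best = i;
                }
            }
            selected.add(best);
        }
        return selected;
    }
}
```

With two near-duplicate candidates and one distinct candidate, lambda = 1.0 picks the two near-duplicates, while lambda = 0.5 swaps the second one for the distinct candidate. That is the diversity effect MMR is meant to provide when the first-stage top-N contains overlapping chunks.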

Profile overrides

Profile	Storage	When
`dev` (default)	H2 in-memory	Fast iteration
`demo`	H2 file-backed	Persistent demos
`prod`	Externalized via env vars	Real deployment

Switch model without code change:

OLLAMA_CHAT_MODEL=gemma4 ./mvnw spring-boot:run

Project Structure

miniwatson/
├── src/main/java/com/miniwatson/
│   ├── MiniwatsonApplication.java
│   ├── controller/
│   │   ├── RagController.java            # POST /api/rag/ask · GET /api/rag/models
│   │   ├── DataController.java           # /api/data/* (ingest, file, delete, stats)
│   │   ├── MultimodalController.java     # /api/multimodal/ask · /ingest (vision)
│   │   ├── GovernanceController.java     # /api/governance/logs · /stats · POST /feedback
│   │   └── TabularController.java        # POST /api/tabular/load · /ask (DuckDB text-to-SQL)
│   ├── service/
│   │   ├── OllamaService.java            # Chat (multi-LLM) + vision (images)
│   │   ├── EmbeddingService.java         # Embed: 768-dim
│   │   ├── OcrService.java               # Tesseract CLI → text
│   │   ├── IngestionService.java         # Wikipedia / image / file → chunk → Article
│   │   ├── IndexingService.java          # one place to update all indexes (vector + keyword)
│   │   ├── HybridRetriever.java          # vector + BM25 candidates, RRF fusion
│   │   ├── Chunker.java                  # interface: fixed / recursive / semantic
│   │   ├── FixedChunker.java             # N-char + overlap (baseline)
│   │   ├── RecursiveChunker.java         # separator-priority split (default)
│   │   ├── SemanticChunker.java          # sentence-embedding boundary detection
│   │   ├── Reranker.java                 # interface: none / llm / mmr / cross
│   │   ├── NoopReranker.java             # 1st-stage top-K passthrough (baseline)
│   │   ├── LlmReranker.java              # listwise LLM rerank
│   │   ├── MmrReranker.java              # relevance + diversity (MMR)
│   │   ├── CrossEncoderReranker.java     # DJL cross-encoder (graceful fallback)
│   │   └── RagService.java               # Embed → vector search (top-N) → rerank → top-K
│   ├── data/
│   │   ├── Article.java                  # POJO + namespace + embedding (write-only)
│   │   ├── WikipediaResponse.java        # External API DTO
│   │   ├── ArticleRepository.java        # storage interface
│   │   ├── ArticleStore.java             # JSON store (hot tier)
│   │   ├── ArticleParquetStore.java      # Parquet store (cold tier)
│   │   ├── TieredArticleStore.java       # hot→cold compaction (@Primary)
│   │   ├── VectorIndex.java              # in-memory vector index (brute-force default, LSH opt-in)
│   │   └── KeywordIndex.java             # in-memory BM25 index (lexical)
│   ├── governance/
│   │   ├── QueryLog.java                 # JPA entity (+ piiCount, sources/provenance)
│   │   ├── QueryLogRepository.java       # Spring Data JPA
│   │   ├── DocumentCatalog.java          # KB metadata mirror (H2, catalog/data split)
│   │   ├── DocumentCatalogRepository.java
│   │   └── PiiRedactionService.java      # regex PII masking
│   └── dto/
│       ├── AskRequest.java
│       ├── OllamaRequest.java            # Includes think:false
│       ├── OllamaResponse.java
│       ├── EmbeddingRequest.java
│       └── EmbeddingResponse.java
├── src/main/resources/
│   ├── application.yaml                  # Common config + active profile
│   ├── application-dev.yaml              # H2 in-memory
│   ├── application-demo.yaml             # H2 file-backed
│   ├── application-prod.yaml             # Env vars (PostgreSQL ready)
│   ├── article.avsc                      # Avro schema for Parquet
│   └── static/
│       ├── index.html                    # Dashboard
│       ├── css/styles.css                # IBM Carbon-inspired
│       └── js/app.js                     # fetch + DOM
├── data/                                 # runtime state (gitignored)
│   ├── articles.json                     # hot tier (recent appends)
│   └── articles.parquet                  # cold tier (compacted)
├── docs/                                 # API, ARCHITECTURE, GOVERNANCE, MULTIMODAL, ...
├── sample/                               # demo fixtures (invoice, chart, text)
└── pom.xml

Storage Efficiency

Migrating embedding storage from JSON to Parquet:

Format	Size	Compression
JSON (with 768-dim float arrays)	54 KB	baseline
Parquet (SNAPPY)	7.8 KB	7×

Parquet's columnar layout means embedding columns compress aggressively while still allowing per-row reads. This is a large part of why lakehouse platforms such as watsonx.data build on open columnar formats like Parquet.

What I Learned

Notes from building this:

Java 21 + Hadoop SecurityManager — Hadoop's UserGroupInformation calls Subject.getSubject() which Java 17+ deprecated. Fix: -Djava.security.manager=allow in JVM arguments.
Gemma "thinking" tokens: gemma3/gemma4 use internal reasoning tokens that drain the num_predict budget. Setting think: false disables them and cut latency by ~3×.
Wikipedia User-Agent policy — REST API requires User-Agent: AppName/version (URL; email) or returns 403. Standard enterprise API hygiene.
Anti-corruption layer — Keeping WikipediaResponse separate from internal Article lets the external API change without touching the rest of the codebase.
@JsonProperty(WRITE_ONLY) — Hide 768-dim embedding from the API response while keeping it in storage. Trims 50KB+ off every response.
Spring profiles for sovereignty — dev/demo/prod with environment variables means the same code ships to different environments with different credentials. This is the technical implementation of "sovereignty at the core."
Vision models hallucinate numbers; OCR doesn't — llava invented an invoice total that wasn't on the page. The fix wasn't a bigger model — it was splitting roles: OCR (Tesseract) for exact text, vision for layout/context, and a prompt that tells the LLM to trust OCR over vision on conflicts. Combining sources isn't enough; you must declare which one is authoritative. (See docs/MULTIMODAL.md for the full before/after and limitations.)
OCR has its own failure modes — it nails row-structured tables but mangles low-contrast/inverted text and loses the 2-D mapping in charts (reads $28M but not that it belongs to Q4). The hard part is the pipeline, not the model.
LSH for sub-linear retrieval — random-hyperplane signatures bucket similar vectors so a query only scores a small candidate set, with an exact-cosine fallback for correctness. Dimension-agnostic (384/768/1024).
Chunking is the real fix for long-doc retrieval — a 90k-char PDF stored as one embedding broke retrieval: the embedder truncates past ~8k tokens, and one vector can't match a specific passage. Splitting into per-chunk Articles fixed it (101 chunks). Compared fixed/recursive/semantic — recursive wins on quality-vs-cost; semantic is best but pays a per-sentence embedding cost. (See docs/CHUNKING.md.)
Reranking helps most when first-stage search is weak — fetch top-N (20) then rerank to top-K (2). On a strong embedder + good chunks, easy questions already rank right and rerank barely changes them; the gain shows on vocabulary-mismatch questions. Built none/llm/mmr/cross to compare. (See docs/RERANKING.md.)
Hybrid search fixes vector's blind spot for exact tokens — embeddings can't match "INV-2026-0042" or a model name; BM25 (lexical) can. Fused vector + BM25 with RRF (rank-based, no score normalization). Same caveat as rerank: on a small clean corpus the win is small (top-N already covers everything); it pays off on large/noisy corpora with rare-token queries. Indexing was split into one IndexingService so adding the keyword index touched only that one class. (See docs/HYBRID-SEARCH.md.)
Pin the error to the real cause, then design a fallback — the DJL cross-encoder failed to load on Intel macOS. Suspected the OpenJ9 (Semeru) JVM first, but switching to HotSpot reproduced it — the real cause was a missing osx-x86_64 native (PyTorch dropped Intel-mac wheels). The reranker falls back to top-K instead of crashing (graceful degradation); it runs on Linux/Apple Silicon. Library APIs also differ by version — confirmed the 0.30.0 javadoc instead of trusting an example (no CrossEncoderTranslatorFactory; input is StringPair).
Tiered storage = lakehouse in miniature — cheap row-oriented appends (JSON hot tier) compacted into columnar Parquet (cold tier) past a threshold. Avoids rewriting the whole Parquet file on every single ingest.
Governance must redact PII — the audit log is the leak risk. Mask emails/phones/SSNs/cards before persisting, return the original to the user. Function preserved, record protected.
Provenance makes answers auditable — logging the rerank-final source chunks per answer means you can later check "was this grounded, and in what?" — and tell a retrieval error (wrong chunk) apart from a generation error (right chunk, wrong answer). One subtle bug: set the field before save(), or it never persists.
Catalog/data split = lakehouse in miniature — vectors and text live in Parquet (the data); lightweight document metadata is mirrored to H2 (the catalog), so the knowledge base itself becomes SQL-queryable for governance. Parquet is the source of truth; the H2 catalog is rebuilt from it on startup (@PostConstruct), same philosophy as the vector index hydrate.
Spring Boot 4 ignores javax.annotation — @PostConstruct silently never ran because it was imported from javax, not jakarta. On Jakarta EE, callbacks must use jakarta.annotation. When a lifecycle hook quietly doesn't fire, suspect the javax/jakarta namespace first.
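The LSH note above can be made concrete with random-hyperplane signatures: each hyperplane contributes one bit, and vectors that point in similar directions tend to share bits and therefore buckets. A minimal sketch, assuming Gaussian hyperplanes and an int-packed signature (the class name, the seed parameter, and the 31-bit limit are assumptions; the exact-cosine fallback is not shown):

```java
import java.util.Random;

/** Random-hyperplane LSH: one signature bit per hyperplane. */
class LshSignatureSketch {

    private final double[][] hyperplanes;

    LshSignatureSketch(int numHyperplanes, int dim, long seed) {
        if (numHyperplanes < 1 || numHyperplanes > 31) {
            throw new IllegalArgumentException("signature is packed into an int: use 1..31 hyperplanes");
        }
        Random rnd = new Random(seed);
        hyperplanes = new double[numHyperplanes][dim];
        for (double[] h : hyperplanes) {
            for (int i = 0; i < dim; i++) {
                h[i] = rnd.nextGaussian(); // random direction; works for any embedding dimension
            }
        }
    }

    /** Bit b is set when the vector lies on the non-negative side of hyperplane b. */
    int signature(double[] v) {
        int sig = 0;
        for (int b = 0; b < hyperplanes.length; b++) {
            double dot = 0;
            for (int i = 0; i < v.length; i++) {
                dot += hyperplanes[b][i] * v[i];
            }
            if (dot >= 0) {
                sig |= 1 << b;
            }
        }
        return sig;
    }
}
```

Because only the sign of each dot product matters, scaling a vector leaves its signature unchanged, which matches cosine similarity ignoring magnitude. With 16 hyperplanes there are at most 65,536 buckets, and a query scores only the vectors whose signature equals (or is close to) its own.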

Roadmap

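Later (Backlog, on hold)

The core platform (data/ai/governance), security, and CI are complete. The items below cover operations and deeper work. The deployment artifacts (docker-compose.prod, multi-arch CI, Oracle guide) are ready, but going live was deprioritized given its cost and time.

30: Live deployment (docker-compose on a VPS, or IBM Cloud Code Engine with a watsonx.ai swap)
31: Security Tier 2 (prompt-injection defense, broader PII coverage, TLS and rate limiting)
32: Deeper evaluation (RAGAS-style answer quality) and observability (metrics/health/tracing)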

Why This Project

Reading about watsonx is one thing. Recreating its data-ai-governance loop end-to-end is another. The point was to find out where the hard parts actually are.

Verdict: they aren't in the model. They're in the pipeline, the storage format, and the audit trail. That matches IBV CEO Study's "6 capabilities for 5.4× adoption" — change management, AI governance, data governance, real-time integration, system integration, financial integration. Model selection is the easy part.

License

MIT

Documentation

Doc	What's inside
docs/API.md	REST API reference — every endpoint with curl + schemas
docs/ARCHITECTURE.md	Component diagram, request flows, watsonx mapping
docs/DATA-MODEL.md	Article schema, Avro + Parquet, anti-corruption layer
docs/GOVERNANCE.md	Audit log schema + PII redaction, watsonx.governance parity
docs/MULTIMODAL.md	Vision Q&A + OCR grounding, image ingest, findings & limits
docs/CHUNKING.md	Chunking strategies (fixed/recursive/semantic), measured comparison
docs/CHUNKING-TEST.md	Step-by-step guide to reproduce the chunking comparison
docs/RERANKING.md	Two-stage retrieval, reranker strategies, before/after + platform findings
docs/HYBRID-SEARCH.md	Vector + BM25 hybrid retrieval, RRF fusion, indexing split, measured limits
docs/EMBEDDINGS.md	Embedding model comparison (384/768/1024-dim), prefix convention, measurement harness
docs/INGESTION-FORMATS.md	Multi-format ingest — Tika + Korean HWP/HWPX (PrvText fallback), extension dispatch
docs/TABULAR-SQL.md	Tabular text-to-SQL over CSV/XLSX with DuckDB — aggregation path RAG can't do, SELECT-only guard
docs/EVALUATION.md	Retrieval eval harness, rerank/hybrid sweep, findings (llm rerank can hurt)
docs/TESTING.md	JUnit unit tests; how a test caught a Korean-phone PII gap
docs/VERIFICATION.md	How each feature was verified — unit / offline eval / curl / UI
docs/DEPLOYMENT.md	Postgres + pgvector via Podman, profiles, and 3 real deployment gotchas
docs/H2-CONSOLE.md	H2 web console — enable, login, SQL cookbook, prod warning

Live (interactive) API docs: run the app, then open http://localhost:8080/swagger-ui.html. See SWAGGER-SETUP.md to enable.

Author

Daeyeop Kim — github.com/dea980 · kdea989@gmail.com
