Documentation RAG

Sanitized case study of a documentation assistant built on retrieval-augmented generation (RAG).

This repository is a public portfolio version of a private automation concept. It explains the architecture and engineering decisions without publishing a usable n8n export, live endpoints, server details, provider configuration, credentials, prompts, account identifiers, vector table names, or production payloads.

Case Study

Problem

Documentation grows faster than teams can read it. Search alone often returns pages, not answers. A useful assistant needs regularly refreshed source content, content split into retrievable chunks, a retrieval step, and answer generation that receives enough retrieved context to stay grounded.

Solution

The private system indexes selected documentation pages, cleans and chunks the content, stores embeddings, and exposes a chat interface that retrieves relevant context before answering. This public case study keeps the architecture visible without publishing the importable graph.

flowchart LR
  A["Documentation Source"] --> B["Content Collector"]
  B --> C["Cleaner"]
  C --> D["Chunker"]
  D --> E["Embedding Model"]
  E --> F["Knowledge Store"]
  G["User Question"] --> H["Retriever"]
  F --> H
  H --> I["Answer Agent"]
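The query-time half of the diagram (Knowledge Store, Retriever, Answer Agent) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory `store`, the hand-written three-dimensional embeddings, and the helper names `cosineSimilarity`, `retrieveTopK`, and `buildContext` are hypothetical and are not taken from the private system, which would use a real embedding model and vector store.

```javascript
// Hypothetical in-memory stand-in for the private knowledge store.
// The embeddings are tiny hand-written vectors; a real system would
// obtain them from an embedding model.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query embedding.
function retrieveTopK(queryEmbedding, store, k = 2) {
  return store
    .map((entry) => ({
      ...entry,
      score: cosineSimilarity(queryEmbedding, entry.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Assemble the retrieved chunks into the context handed to the answer step.
function buildContext(chunks) {
  return chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
}

const store = [
  { id: "install", text: "Install the package with the package manager.", embedding: [1, 0, 0] },
  { id: "auth", text: "Authenticate with a token before calling the service.", embedding: [0, 1, 0] },
  { id: "deploy", text: "Deploy after the validation step passes.", embedding: [0, 0, 1] },
];

// Pretend this vector is the embedding of the user's question.
const top = retrieveTopK([0.9, 0.1, 0], store, 2);
console.log(buildContext(top));
```

The answer step would receive the numbered context together with the question. Retrieving before generating is what keeps the answer tied to indexed documentation rather than to the model's general knowledge.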

Engineering Decisions

Split ingestion from query-time answering.
Deduplicate pages before indexing to avoid repeated context.
Keep chunks small enough for retrieval while preserving useful sections.
Refresh the knowledge store on a schedule instead of relying on stale exports.
Keep provider settings and storage schema outside public documentation.
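Two of the decisions above, deduplicating pages and keeping chunks small while preserving sections, can be sketched as follows. This is an illustration under stated assumptions, not the private implementation: the helper names (`normalize`, `dedupePages`, `chunkByParagraph`), the whitespace-and-case normalization, the paragraph-based splitting, and the `maxChars` limit are all hypothetical choices.

```javascript
// Collapse whitespace and case so trivially different copies compare equal.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Keep the first page seen for each distinct normalized body.
function dedupePages(pages) {
  const seen = new Set();
  return pages.filter((page) => {
    const key = normalize(page.body);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Pack whole paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars becomes its own oversized chunk.
function chunkByParagraph(body, maxChars = 200) {
  const paragraphs = body
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);
  const chunks = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length > maxChars && current) {
      chunks.push(current);
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const pages = [
  { id: "a", body: "Setup guide.\n\nRun the installer." },
  { id: "b", body: "setup guide.\n\nRun  the installer. " }, // near-duplicate of "a"
  { id: "c", body: "Usage notes.\n\nCall the client." },
];

const unique = dedupePages(pages);
const chunks = unique.flatMap((page) =>
  chunkByParagraph(page.body, 20).map((text) => ({ pageId: page.id, text }))
);
console.log(unique.map((p) => p.id), chunks.length);
```

Splitting on paragraph boundaries rather than at a fixed character offset is one way to keep a chunk from cutting a section mid-sentence. The small `maxChars` value is only there to make the sample data split.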

Outcome

The private workflow turns a documentation set into a practical support assistant for implementation questions. The public version demonstrates RAG architecture, indexing strategy, and operational hygiene without exposing a reproducible setup.

What This Shows

Documentation ingestion and cleaning
Scheduled re-indexing
Chunking and embedding strategy
Vector retrieval before answer generation
Chat assistant architecture for technical documentation

What Is Intentionally Missing

No importable n8n workflow
No node parameters
No server URL or source URL list
No vector database schema
No prompts, credentials, or provider configuration
No production examples or execution payloads

Repository Structure

docs/architecture.md                     Sanitized architecture notes
scripts/validate-public-case-study.mjs   Leak and export-shape validation
SECURITY.md                              Public handling policy
package.json                             Validation command

Validate

npm test

The validator blocks common secret formats, URLs, webhook paths, credential blocks, local paths, n8n export shapes, and private terms supplied through SHOWCASE_PRIVATE_TERMS.
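A leak check of this kind can be sketched in a few lines. The code below is not the contents of scripts/validate-public-case-study.mjs: the patterns, the rule names, the helpers `parsePrivateTerms` and `findLeaks`, and the assumption that SHOWCASE_PRIVATE_TERMS holds a comma-separated list are all hypothetical and chosen only to show the idea.

```javascript
// Illustrative patterns only; the repository's real rule set is not shown here.
const LEAK_PATTERNS = [
  { name: "url", regex: /\bhttps?:\/\/\S+/i },
  { name: "webhook-path", regex: /\/webhook(-test)?\/[\w-]+/i },
  { name: "bearer-token", regex: /\bBearer\s+[A-Za-z0-9._-]{20,}/ },
  { name: "local-path", regex: /(\/home\/|\/Users\/|[A-Za-z]:\\)/ },
];

// Assumption: private terms arrive as a comma-separated string.
function parsePrivateTerms(raw = "") {
  return raw
    .split(",")
    .map((term) => term.trim().toLowerCase())
    .filter(Boolean);
}

// Scan text line by line and report every rule that matches.
function findLeaks(text, privateTerms = []) {
  const findings = [];
  text.split("\n").forEach((line, index) => {
    for (const { name, regex } of LEAK_PATTERNS) {
      if (regex.test(line)) findings.push({ line: index + 1, rule: name });
    }
    const lowered = line.toLowerCase();
    for (const term of privateTerms) {
      if (lowered.includes(term)) {
        findings.push({ line: index + 1, rule: "private-term" });
      }
    }
  });
  return findings;
}

// "acme-internal" is a made-up fallback term for the demo.
const terms = parsePrivateTerms(process.env.SHOWCASE_PRIVATE_TERMS || "acme-internal");
const sample = [
  "Architecture notes.",
  "Posted to /webhook/abc123 on acme-internal.",
].join("\n");

const findings = findLeaks(sample, terms);
console.log(findings);
```

A test script built this way would exit with a non-zero status when `findings` is non-empty, so that a leak fails `npm test` before anything is published.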

Note

This is a case study, not a template. It is designed to show retrieval architecture and automation judgment without giving away a working implementation.
