aronshamash/knowledge-extractor

knowledge-extractor

Interviews engineers and tech leads to extract design intent, business logic, and operational knowledge from code. Works with any content (docs, specs, runbooks) but is built code-first.

IP: Killawot Limited | License: MIT


The Problem

Business knowledge lives in people's heads. When someone asks "why does X work this way?", the answer requires:

  1. Reading code to understand what happens
  2. Asking the SME to understand why it was designed that way
  3. Documenting the answer so it doesn't have to be asked again

This tool automates steps 1 and 3, and structures step 2 into an efficient interview.

How It Works

  1. Point it at a folder (code, docs, or both)
  2. It scans the content and generates a topic-by-topic interview plan
  3. You answer questions in Claude Code (voice-optimized via Handy/Dragon)
  4. It drafts structured markdown documentation in real time
  5. You review and approve each doc

Tools (MCP)

Tool                     Description
prepare_interview        Scan folder → generate interview plan + session ID
conduct_interview        Run structured Q&A for each topic
generate_documentation   Produce final markdown docs from transcript
review_documentation     Approve or request changes per doc

Installation

npm install
npm run build

Add to your Claude Code MCP config (~/.claude/settings.json):

{
  "mcpServers": {
    "knowledge-extractor": {
      "command": "node",
      "args": ["/path/to/knowledge-extractor/dist/index.js"],
      "env": {
        "ANTHROPIC_API_KEY": "your-key-here",
        "KNOWLEDGE_EXTRACTOR_IDE": "cursor"
      }
    }
  }
}

KNOWLEDGE_EXTRACTOR_IDE controls the editor link format in interview questions. Valid values: cursor (default), vscode, none.
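As a rough illustration of what "editor link format" can mean, here is a minimal sketch that resolves the variable and formats a file reference. The `cursor://file…` and `vscode://file…` URL shapes, the helper names `resolveIdeMode` and `fileLink`, and the silent fallback for unrecognised values are all assumptions, not knowledge-extractor's actual code.

```typescript
// Hypothetical sketch: turning KNOWLEDGE_EXTRACTOR_IDE into a file reference.
// The cursor:// and vscode:// URL shapes are assumptions, not the project's code.

type IdeMode = "cursor" | "vscode" | "none";

const IDE_MODES: readonly IdeMode[] = ["cursor", "vscode", "none"];

// Unset or unrecognised values fall back to "cursor", the documented default.
// (Falling back silently instead of failing is an assumption.)
function resolveIdeMode(env: Record<string, string | undefined>): IdeMode {
  const raw = (env["KNOWLEDGE_EXTRACTOR_IDE"] ?? "").trim().toLowerCase();
  const match = IDE_MODES.find((m) => m === raw);
  return match ?? "cursor";
}

// Format a reference to `line` of the absolute path `file`.
function fileLink(mode: IdeMode, file: string, line: number): string {
  switch (mode) {
    case "cursor":
      return `cursor://file${file}:${line}`;
    case "vscode":
      return `vscode://file${file}:${line}`;
    case "none":
      return `${file}:${line}`; // plain text, no clickable editor link
  }
}

const mode = resolveIdeMode({ KNOWLEDGE_EXTRACTOR_IDE: "vscode" });
console.log(fileLink(mode, "/path/to/my-api/src/sync.ts", 42));
// prints vscode://file/path/to/my-api/src/sync.ts:42
```

Passing the environment in as a plain record (rather than reading `process.env` directly) keeps the sketch free of Node type dependencies and easy to test.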

Usage

1. Prepare interview

prepare_interview(
  source_path: "/path/to/my-api/src",
  source_type: "code",
  output_path: "/path/to/output/docs",
  focus_areas: ["versioning", "filtering", "sync"]
)

Returns a session ID and an interview plan with an estimated duration for each topic.
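The call notation above is shorthand for what Claude Code sends on your behalf. Under MCP, a client invokes a tool with a JSON-RPC `tools/call` request whose `params.arguments` carries the same fields. This sketch builds that request by hand; `toolCall` is a hypothetical helper and the `id` value is arbitrary.

```typescript
// Sketch: the prepare_interview call above as an MCP JSON-RPC "tools/call"
// request. Argument names come from the example; toolCall is a hypothetical helper.

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = toolCall(1, "prepare_interview", {
  source_path: "/path/to/my-api/src",
  source_type: "code",
  output_path: "/path/to/output/docs",
  focus_areas: ["versioning", "filtering", "sync"],
});

// One JSON message, as a client would write it to the server's stdio transport.
console.log(JSON.stringify(request));
```

You never need to write this yourself when working in Claude Code, but it can help when debugging the server with another MCP client.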

2. Conduct interview

conduct_interview(
  session_id: "<session-id>",
  auto_draft: true,
  read_back: true
)

The agent asks the prepared questions one at a time. Answer each one, or say skip to skip a question or done to finish the topic.
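A minimal sketch of the skip/done handling for a single topic is shown below. `Turn`, `asCommand`, and `runTopic` are illustrative names, not knowledge-extractor's internals. Stripping trailing punctuation is an assumption motivated by dictation tools, which often transcribe "skip" as "Skip.".

```typescript
// Hypothetical sketch of one topic's question loop.

interface Turn {
  question: string;
  answer: string;
}

// Dictation often adds capitals or punctuation ("Skip."), so normalise first.
function asCommand(reply: string): string {
  return reply.trim().toLowerCase().replace(/[.!?]+$/, "");
}

function runTopic(questions: string[], replies: string[]): Turn[] {
  const transcript: Turn[] = [];
  const n = Math.min(questions.length, replies.length);
  for (let i = 0; i < n; i++) {
    const command = asCommand(replies[i]);
    if (command === "done") break; // finish the topic early
    if (command === "skip") continue; // move on without recording an answer
    transcript.push({ question: questions[i], answer: replies[i].trim() });
  }
  return transcript;
}

const turns = runTopic(
  ["Why was this designed this way?", "What breaks if it changes?", "Who owns it?"],
  ["A recorded answer.", "Skip.", "done"],
);
console.log(turns.length); // 1
```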

3. Generate docs

generate_documentation(
  session_id: "<session-id>"
)

Writes markdown files for each topic to the output_path given to prepare_interview.

4. Review

review_documentation(
  session_id: "<session-id>",
  doc_path: "/path/to/output/docs/TOPIC_WHY.md",
  feedback: "Add that REVIEWED→PUBLISHED requires final approval"
)

Omit feedback to approve as-is.
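The approve-or-revise rule can be pictured as a small decision function. This is a sketch only: the status names, the `decideReview` helper, and treating whitespace-only feedback as approval are assumptions, not the tool's documented behaviour.

```typescript
// Hypothetical sketch of review_documentation's approve-or-revise decision.

interface ReviewArgs {
  session_id: string;
  doc_path: string;
  feedback?: string; // omit to approve as-is
}

type ReviewOutcome =
  | { status: "approved"; doc_path: string }
  | { status: "changes_requested"; doc_path: string; feedback: string };

function decideReview(args: ReviewArgs): ReviewOutcome {
  const feedback = args.feedback?.trim() ?? "";
  if (feedback === "") {
    return { status: "approved", doc_path: args.doc_path };
  }
  return { status: "changes_requested", doc_path: args.doc_path, feedback };
}

console.log(
  decideReview({ session_id: "<session-id>", doc_path: "/path/to/output/docs/TOPIC_WHY.md" }).status,
);
// prints approved
```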

Voice Input

Interviews are designed for voice: speak your answers naturally using any of these tools:

  • Handy (Mac): real-time transcription into the Claude Code chat
  • Dragon: professional speech-to-text
  • macOS dictation: built in; press Fn twice to activate

The voice layer is external: knowledge-extractor works with anything that types into a text input.

Why voice: a 60–90 minute typed interview becomes a 15–20 minute spoken one, with more detail and a more natural flow.

Document Templates

Template   Use For
WHY        Design decisions, architecture rationale
HOW        Step-by-step processes, algorithms
RULES      Business rule catalogs
FLOW       User journeys, workflows, sequence diagrams

Templates are in src/templates/ and can be customised.
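The review example uses a path ending in TOPIC_WHY.md, which suggests one file per topic and template. The sketch below assumes that `<TOPIC>_<TEMPLATE>.md` pattern. The slug rules and the `docFileName`/`docPath` helpers are hypothetical, so check the files the tool actually writes before relying on these names.

```typescript
// Hypothetical sketch of naming output files as <TOPIC>_<TEMPLATE>.md.

type Template = "WHY" | "HOW" | "RULES" | "FLOW";

function docFileName(topic: string, template: Template): string {
  const slug = topic
    .trim()
    .replace(/[^A-Za-z0-9]+/g, "_") // collapse spaces and punctuation to "_"
    .replace(/^_+|_+$/g, "") // drop leading/trailing underscores
    .toUpperCase();
  return `${slug}_${template}.md`;
}

function docPath(outputPath: string, topic: string, template: Template): string {
  // Strip trailing slashes so "/docs/" and "/docs" give the same result.
  return `${outputPath.replace(/\/+$/, "")}/${docFileName(topic, template)}`;
}

console.log(docPath("/path/to/output/docs/", "versioning", "WHY"));
// prints /path/to/output/docs/VERSIONING_WHY.md
```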

Local Dev

# Run test harness against this repo's own source
node --import tsx test-agent.ts

# Run against a specific folder
node --import tsx test-agent.ts /path/to/source /path/to/output

Roadmap

  • Phase 2: Claude-powered question generation from file contents
  • Phase 2: Claude-powered documentation synthesis from transcript
  • Phase 3: Markdown/PDF scanner improvements
  • Phase 4: CI + public release
  • Phase 5: cloud.killawot.ai hosted version

About

MCP server that interviews engineers to extract and document code intent, design rationale, and operational knowledge
