97115104/endor-teach

Endor Teach

learn deeply · remember always

A local-first AI learning companion that adapts to your knowledge level, builds a personal knowledge base, and generates high-quality LLM training data as a side effect of learning. Everything runs on your machine via Ollama.


Quick Start

git clone https://github.com/your-org/endor-teach
cd endor-teach
./endor.sh

endor.sh handles everything automatically and explains every sudo command before running it:

  • Creates a Python virtual environment (via uv, falling back to python3 -m venv)
  • Installs all dependencies into .venv/ — never touches system Python
  • Installs Docker Engine if missing (APT/DNF/pacman, OS-detected)
  • Starts SearXNG in Docker for private, rate-limit-free web search
  • Checks Ollama and pulls the chat + embedding models
  • Launches the FastAPI server

Re-run ./endor.sh at any time. It is self-healing: it restarts stopped containers, repairs broken venvs, and reinstalls missing packages.


System Requirements

Component  Minimum   Notes
---------  --------  -----
Python     3.10+     Managed in .venv/; system Python untouched
RAM        8 GB      4 GB for model + 4 GB headroom
GPU        Optional  Faster inference + transcription
Docker     Optional  Auto-installed if missing; DDG fallback if unavailable
Ollama     Required  curl -fsSL https://ollama.ai/install.sh | sh

Features

1. Discovery-First Search Interface

The home screen is a search-first discovery interface:

  1. Type anything — vLLM, Attention mechanism, Roman history
  2. Focus modes: 🌐 Web · 📖 Wikipedia · 📄 Official Docs · 📰 News · 🔬 Papers
  3. Category shortcuts: AI/ML · Technical · Science · History · Business · Medicine · Philosophy · Law · Math · Art & Culture
  4. Results grouped by type — Wikipedia always appears first
  5. Inline [+ Add] buttons on each result — select exactly what to learn from
  6. Knowledge Map (right panel): SVG graph generated by the LLM showing how topics relate
    • Grey dashed = prerequisites (learn first)
    • Cyan lines = concurrent (learn alongside)
    • Green lines = advanced (explore after mastery)
    • Click any node to search that topic
  7. Create Topic from N selected sources — research uses only your confirmed sources

2. Topic Disambiguation

Before research begins, the app:

  • Queries the Wikipedia API directly (reliable, no rate limits, no site: operator)
  • Uses your description ("for model inference") to bias search queries
  • Shows a confirmation modal so you always verify what you're learning
  • Passes the confirmed URL(s) directly to research_and_embed() as source anchors
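
As a rough illustration of the direct Wikipedia lookup described above, the sketch below builds a MediaWiki opensearch request and parses its response into candidate articles. The function names are hypothetical, and how the app itself performs this lookup is not shown in this README, so treat it as one plausible shape and not as the actual implementation.

```python
import json
from urllib.parse import urlencode

WIKI_API = "https://en.wikipedia.org/w/api.php"


def build_opensearch_url(term: str, limit: int = 5) -> str:
    """Build a MediaWiki opensearch URL that returns candidate article titles."""
    params = {
        "action": "opensearch",
        "search": term,
        "limit": limit,
        "namespace": 0,  # main article namespace only
        "format": "json",
    }
    return f"{WIKI_API}?{urlencode(params)}"


def parse_opensearch(body: str) -> list[dict]:
    """Parse an opensearch response: [query, titles, descriptions, urls]."""
    _query, titles, _descriptions, urls = json.loads(body)
    return [{"title": t, "url": u} for t, u in zip(titles, urls)]
```

Each returned candidate carries a URL, which is the kind of value the confirmation modal could hand on as a source anchor.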

3. Adaptive Learning (Dreyfus + Maslow Pedagogy Framework)

Every session is guided by tools/pedagogy.py — a silent teacher that adapts to your level:

Dreyfus Skill Acquisition Model (based on mastery score):

Stage              Mastery  LLM Teaching Style
-----------------  -------  ------------------
Novice             0–20%    Clear rules, analogies, one concept at a time
Advanced Beginner  20–40%   Patterns, real-world context, common mistakes
Competent          40–60%   Tradeoffs, multi-step problems, Socratic guidance
Proficient         60–80%   Architecture discussions, open-ended evaluation
Expert             80–100%  Frontier challenges, cross-domain synthesis

Maslow Hierarchy of Learning Needs (maps to stage):

  1. Foundation (Safety) — What is this?
  2. Connection (Belonging) — How does it relate to what I know?
  3. Confidence (Esteem) — Can I apply and explain it?
  4. Fluency (Self-actualization) — Can I create and extend it?

The pedagogy hint bar in each topic shows your Dreyfus stage badge and recommends the most valuable next action (e.g. "Generate flashcards — lock in the core definitions").

Quiz questions are Bloom-leveled per stage: Novice → Remember/Understand; Expert → Evaluate/Create.
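
The Dreyfus stage table in this section amounts to a lookup from mastery score to stage. A minimal sketch follows, assuming mastery is stored as a fraction between 0 and 1 and that a score exactly on a boundary (for example 0.20) belongs to the higher stage; the table does not say which side wins, and `dreyfus_stage` is a hypothetical name, not necessarily what tools/pedagogy.py exposes.

```python
from bisect import bisect_right

# Lower bound of each stage, taken from the Dreyfus table in this section.
_STAGE_BOUNDS = [0.0, 0.2, 0.4, 0.6, 0.8]
_STAGE_NAMES = ["Novice", "Advanced Beginner", "Competent", "Proficient", "Expert"]


def dreyfus_stage(mastery: float) -> str:
    """Map a mastery score in [0, 1] to a Dreyfus stage name.

    Assumption: boundary values go to the higher stage.
    """
    if not 0.0 <= mastery <= 1.0:
        raise ValueError("mastery must be between 0 and 1")
    # bisect_right counts how many lower bounds are <= mastery.
    return _STAGE_NAMES[bisect_right(_STAGE_BOUNDS, mastery) - 1]
```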

4. Learning Modes

Tab         Description
----------  -----------
💬 Chat     RAG-powered conversation; sources from your selected pages; pedagogy-adapted system prompt
🃏 Cards    Spaced-repetition flashcards with SM-2 scheduling (Again/Hard/Good/Easy)
📝 Quiz     Bloom's taxonomy-leveled questions; scores 0–5; auto-calculates mastery
🔗 Connect  AI-generated conceptual connections between two of your topics
🌐 Sources  All indexed pages for this topic; each source's embeddings are stored for RAG
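
For readers unfamiliar with SM-2, the sketch below shows one review step as the algorithm is commonly described. The mapping from the four buttons to SM-2 quality grades is an assumption made for illustration, as are the `CardState` and `sm2_review` names; the app's own scheduler may differ in these details.

```python
from dataclasses import dataclass

# Hypothetical mapping from the review buttons to SM-2 quality grades (0-5).
BUTTON_QUALITY = {"Again": 0, "Hard": 3, "Good": 4, "Easy": 5}


@dataclass
class CardState:
    repetitions: int = 0    # consecutive successful reviews
    interval_days: int = 0  # days until the next review
    ease: float = 2.5       # SM-2 easiness factor, never below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with a quality grade from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be in 0..5")
    if quality < 3:
        # Failed recall: restart the sequence and leave the easiness factor as is.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)
    # Easiness update from the SM-2 formula, clamped at 1.3.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease + delta)
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        # Later intervals grow by the (already updated) easiness factor.
        interval = round(state.interval_days * ease)
    return CardState(state.repetitions + 1, interval, ease)
```

A card answered "Easy" three times in a row would, under these assumptions, be scheduled after 1, 6, and then 17 days.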

5. Word Lookup Popover (double-click any word in chat)

Double-click any word or phrase while reading chat responses:

  • Instant Wikipedia summary in a floating popover (anchored near selection)
  • "+ Learn this" — adds the Wikipedia article as a live source to your current topic and opens the discovery search for a full topic if desired
  • "Open Wikipedia ↗" — full article in new tab
  • Dismisses on click outside; non-blocking

6. Daily Quiz

Unlocks when ≥1 topic reaches 80% mastery. Tests consolidated knowledge — prevents quizzing on topics you haven't actually learned yet. Maintains a daily streak counter.

7. Search Result Caching & Offline Retrieval

Every fetched web page is stored in search_cache with vector embeddings:

  • No re-fetching previously seen pages
  • Cross-topic vector search over all cached content
  • Semantic cache hits before any live request
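
A semantic cache hit can be pictured as a nearest-neighbour check over stored embeddings before any network call is made. The sketch below uses plain cosine similarity; the 0.85 threshold, the in-memory list standing in for search_cache, and the function names are all illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cache_lookup(query_vec, cache, threshold=0.85):
    """Return the URL of the most similar cached page, or None on a cache miss.

    `cache` is a list of (url, embedding) pairs; the threshold is arbitrary.
    """
    best_url, best_score = None, threshold
    for url, vec in cache:
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_url, best_score = url, score
    return best_url
```

A `None` result would be the signal to fall through to a live search request.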

8. Automatic Training Data Generation ("Sleep Consolidation")

Every quiz answer scoring 4–5/5 is automatically harvested into training_pairs as a gold fine-tuning pair. Over many sessions this builds a high-quality, Bloom-labeled, source-attributed dataset: the long-term layer of the knowledge architecture (see Cognitive Architecture Mapping).
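
The harvesting rule is simple enough to state in code. In this sketch the record's field names are illustrative and are not the actual training_pairs columns; only the score cut-off of 4 comes from the text.

```python
GOLD_MIN_SCORE = 4  # quiz scores run 0-5; answers scoring 4 or 5 are harvested


def harvest_pair(question, answer, score, bloom_level, source_url):
    """Return a training-pair record for a gold answer, or None otherwise.

    Field names here are hypothetical.
    """
    if not 0 <= score <= 5:
        raise ValueError("score must be in 0..5")
    if score < GOLD_MIN_SCORE:
        return None
    return {
        "prompt": question,
        "completion": answer,
        "score": score,
        "bloom_level": bloom_level,
        "source_url": source_url,
    }
```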

Export all data: Settings → Export Training Data


Architecture

endor-teach/
├── endor.sh              # Self-healing launch script
├── requirements.txt      # Python deps (installed into .venv/)
├── .venv/                # Python virtual environment
│
├── tools/
│   ├── app.py            # FastAPI server — all endpoints
│   ├── database.py       # SQLite ORM — all persistence
│   ├── rag.py            # RAG pipeline (search, fetch, embed, generate)
│   ├── pedagogy.py       # Dreyfus/Maslow framework — teaching orchestrator
│   ├── transcriber.py    # faster-whisper voice transcription
│   └── static/
│       ├── index.html
│       ├── app.js
│       └── style.css
│
└── data/
    ├── endor_teach.db    # SQLite (WAL mode, foreign keys)
    ├── config.json       # Runtime config
    └── sessions/         # FLAC voice recordings
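
The tree notes that the SQLite file runs in WAL mode with foreign keys enabled. With Python's standard sqlite3 module both are set through PRAGMA statements, roughly as below; `open_db` is a hypothetical helper name, and tools/database.py may organise this differently.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database with WAL journaling and foreign-key enforcement."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a write is in progress; the mode persists in the file.
    conn.execute("PRAGMA journal_mode=WAL")
    # Foreign-key enforcement is off by default and must be enabled on every connection.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```

Because the foreign-key setting is per connection, any code path that opens the database on its own needs the same PRAGMA.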

Cognitive Architecture Mapping

The database mirrors human memory architecture:

Human Memory    Database Layer   Description
--------------  ---------------  -----------
Working memory  source_chunks    Active RAG context (12-message window)
Hippocampus     search_cache     Fast-write episodic store with embeddings
Neocortex       knowledge_nodes  Consolidated, SM-2 scheduled concepts
Long-term       training_pairs   Gold fine-tuning data (score ≥ 4/5)

Key API Endpoints

Endpoint                                   Purpose
-----------------------------------------  -------
GET /api/search                            Discovery search, grouped by wikipedia/docs/papers/news/web
GET /api/related-topics                    LLM knowledge map suggestions
GET /api/wiki-peek                         Fast Wikipedia summary for word lookup
GET /api/disambiguate                      Disambiguation candidates
POST /api/topics                           Create topic (accepts source_urls[])
DELETE /api/topics/{id}                    Delete topic and all data
GET /api/topics/{id}/pedagogy              Dreyfus stage + next recommended action
POST /api/topics/{id}/sources/add          Add URL to topic knowledge base (async)
POST /api/topics/{id}/flashcards/generate  Generate flashcards
POST /api/topics/{id}/quiz/generate        Generate Bloom-leveled quiz
POST /api/quiz/answer                      Score answer + consolidate training data
POST /api/sessions/{id}/chat               SSE streaming chat
GET /api/daily-quiz/status                 Quiz availability (requires mastery ≥ 80%)
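
The chat endpoint streams its reply as server-sent events (SSE). A client reading that stream has to reassemble events from `data:` lines separated by blank lines. The parser below is a minimal sketch of that framing only; what Endor Teach puts inside each payload (plain text, JSON, or something else) is not documented here, so the payloads are yielded as raw strings.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event in a line stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[5:]
            # A single space after the colon is part of the framing, not the payload.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not terminated by a blank line is dropped, as in the SSE spec.
```

Any iterable of text lines works as input, so the same function can be fed from an HTTP response object or from a list in a test.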

Configuration

./endor.sh --model   # re-select chat model

Model        VRAM    Notes
-----------  ------  -----
mistral      4 GB    Fast, excellent for teaching (default)
llama3.2:3b  2 GB    Low-resource machines
gemma2:9b    5.5 GB  Strong reasoning
gemma2:27b   15 GB   Best quality

Web Search Priority

  1. SearXNG (local Docker) — private, no rate limits
  2. ddgs package — real web results
  3. DDG HTML scrape — requests + bs4 only
  4. DDG Instant Answer API — last resort
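
The priority list above is a fallback chain: try each backend in order and stop at the first one that returns results. A generic sketch follows, with the backends passed in as plain callables; the function name and the provider labels used in the example are hypothetical stand-ins, not the app's real search functions.

```python
from typing import Callable

SearchFn = Callable[[str], list[dict]]


def search_with_fallback(query: str, providers: list[tuple[str, SearchFn]]):
    """Try providers in priority order; return (name, results) from the first success.

    Returns (None, []) when every provider fails or comes back empty.
    """
    for name, search in providers:
        try:
            results = search(query)
        except Exception:
            # A failing provider should not abort the chain; move to the next one.
            continue
        if results:
            return name, results
    return None, []
```

Treating an empty result list the same as an error is a design choice in this sketch: it lets a lower-priority backend answer when a higher one is reachable but finds nothing.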

Troubleshooting

"No results found" in search → Re-run ./endor.sh to ensure SearXNG is running. DDG fallback activates automatically.

Flashcards not generating → Ollama must be running (curl http://localhost:11434/api/tags). Generation takes 30–90 seconds. Check terminal for [flashcards] log lines.

Wrong topic content (e.g. law degree vs ML framework) → Delete the topic (✕ on the card), recreate via the search interface, confirm the correct Wikipedia article in the disambiguation modal.

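Docker permission denied → Log out and back in after being added to the docker group, or run newgrp docker and then ./endor.sh from the shell it opens. (newgrp starts a new shell, so chaining the two commands with && would not run the script until that shell exits.)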

About

An iterative, local AI learning tool with a self-hosted internet-search RAG pipeline, search caching, and DPO pairs for structured AI dataset aggregation.
