It's like having a perfect memory for your roleplay conversations. VectHarePlus brings intelligent context retrieval to SillyTavern with temporal decay, conditional activation, and multiple vector backends.
Branched from the original VectHare project, VectHarePlus is an advanced Retrieval-Augmented Generation (RAG) system for SillyTavern, now featuring newly added, optimized support for Japanese, Traditional Chinese, and Simplified Chinese.
I branched the original VectHare to handle the massive scale of my personal MVU Game Maker projects, which feature:
- Non-English language support (Japanese, Traditional/Simplified Chinese); English is supported out of the box.
- Extreme scale: 2,000+ replies per story, with 1,000+ words per reply.
- Heavy use of MVU Game Maker functional tags embedded in the story text.
Ordinary SillyTavern memory extensions buckle completely under this load, especially when the story is littered with MVU Game Maker's functional tags, which are useless for memory lookup. I needed something that could strip out those tags while maintaining high-speed vectorization at extreme scale.
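As a rough illustration, the pre-storage cleanup amounts to deleting tag-like spans before text is embedded. This is a hypothetical sketch, not the extension's actual code, and the tag patterns below are assumptions about the MVU Game Maker syntax:

```javascript
// Hypothetical sketch of functional-tag cleanup before vectorization.
// The real MVU Game Maker tag syntax may differ; these patterns are
// illustrative assumptions.
function stripFunctionalTags(text) {
  return text
    .replace(/\{\{[^}]*\}\}/g, "") // drop {{macro}}-style functional tags
    .replace(/<[^>]+>/g, "")       // drop angle-bracket tags
    .replace(/[ \t]{2,}/g, " ")    // collapse leftover whitespace
    .trim();
}

console.log(
  stripFunctionalTags("She smiled. {{setvar::mood::happy}} <status>HP:10</status>")
); // → "She smiled. HP:10"
```

Only the cleaned prose reaches the embedding model, so the vectors are not polluted by machine-readable markup.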
Technical Requirement: Because of the high data throughput required, this system relies on a separate Qdrant vector database running via Docker.
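A minimal way to run that server is with the standard `qdrant/qdrant` Docker image; the compose file below is an illustrative example (ports 6333/6334 and the volume path are Qdrant's conventional defaults, adjust them for your setup):

```yaml
# Illustrative docker-compose service for a local Qdrant instance.
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC
    volumes:
      - ./qdrant_storage:/qdrant/storage
```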
- Strips out all functional tags used by MVU Game Maker before memory storage.
- Adds story-based memory on top of MVU Game Maker's character-based memory.
- Long conversations choke your token budget with irrelevant history.
- You manually edit context to remind characters of key events.
VectHarePlus Solution: Automatically extract relevant memories from your entire chat history using semantic search, with smart temporal decay that lets older memories fade naturally, and conditional rules to control exactly when memories activate.
- Japanese mode with TinySegmenter-aware extraction behavior
- Traditional Chinese mode with Jieba WASM + Traditional dictionary lazy loading
- Simplified Chinese mode with Jieba WASM support
- Language-aware keyword filtering with cleaner CJK token handling
- Optional summarization before vector storage to reduce noise and improve retrieval density
- Supports OpenRouter and local vLLM-compatible endpoints
- Configurable prompt template so you can tune summary style for your RP format
- Stop button in progress flow to halt long-running vectorization tasks
- Pause/continue style control for vector content processing workflows
- Improved control over long chat ingestion sessions without restarting everything
- Better single-character filtering defaults for CJK keywords
- Mode-specific exceptions for high-signal 1-character RPG/SoL/school terms
- Better signal-to-noise for multilingual retrieval
- Semantic search through your entire chat history
- Find relevant messages even from hundreds of messages ago
- Replace manual memory management with automatic retrieval
- Works with any embedding model (local or cloud-based)
- Memories naturally fade over time, just like humans
- Exponential or linear decay modes
- Set custom half-life for how quickly memories decay
- Protect important scenes from fading (temporally blind)
- Optional feature: disable it if you want permanent memory
- Activate memory chunks based on character emotions (happy, sad, angry, etc.)
- Trigger on conversation topics or keywords
- Smart recency checks (activate only for recent events)
- Character Expressions integration for sprite-based emotion detection
- Fallback to keyword-based emotions if no expressions extension
- Mark scenes in your chat to group related messages
- Scene chunks are treated as single units for retrieval
- Perfect for story arcs, major events, or important character moments
- Standard (Vectra): ST's built-in file-based storage (great for getting started)
- LanceDB: Disk-based, handles millions of vectors, production-ready
- Milvus: Legacy backend option kept for compatibility workflows
- Qdrant: Enterprise-grade with HNSW indexing, cloud support, advanced filtering
⚠️ VectHarePlus backend status: Only Qdrant is actively tested in VectHarePlus. Vectra, LanceDB, and Milvus are kept for backward compatibility with VectorHare and are not guaranteed to work in every setup.
- Chat conversations (with automatic chunking strategies)
- Lorebook entries (preserve structure with per-entry chunks)
- Character definitions and personality
- Custom content types
- Per Message: Each message = one chunk (best for chat recall)
- Conversation Turns: Group by speaker turns
- Message Batch: Process in configurable batches
- Per Scene: Scene-marked groups become chunks
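As an example of how a strategy like Message Batch works, messages are grouped into fixed-size batches and each batch becomes one chunk. This is an illustrative sketch under assumed names, not the extension's API:

```javascript
// Illustrative "Message Batch" chunking: fixed-size groups of messages
// joined into one chunk before embedding. Function and parameter names
// are assumptions for this sketch.
function batchChunks(messages, batchSize = 5) {
  const chunks = [];
  for (let i = 0; i < messages.length; i += batchSize) {
    // Each batch of up to `batchSize` messages becomes a single chunk.
    chunks.push(messages.slice(i, i + batchSize).join("\n"));
  }
  return chunks;
}

console.log(batchChunks(["a", "b", "c"], 2)); // → ["a\nb", "c"]
```

Larger batches mean fewer, broader chunks (better for themes); Per Message is the same idea with a batch size of one (better for precise recall).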
- Browse all vector collections (chat, lorebook, character)
- View chunk counts and metadata
- Enable/disable collections on the fly
- Export and import collections for backup/sharing
- View all chunks in a collection
- Edit chunk text and metadata
- Mark chunks as temporally blind (immune to decay)
- Search and filter chunks
Built-in diagnostic tool that checks everything and offers auto-fixes for common issues.
```
┌──────────────────────────────────────────────────────────────┐
│ 1. VECTORIZATION                                             │
│    Chat messages are chunked and embedded into vectors.      │
│    Each chunk stores: text, metadata, keywords, source.      │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 2. SEARCH & RETRIEVAL                                        │
│    When generating a response, recent messages are queried   │
│    against the vector database to find semantically similar  │
│    chunks from your chat history.                            │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 3. FILTERING & SCORING                                       │
│    • Apply temporal decay (older = lower score)              │
│    • Evaluate conditional activation rules                   │
│    • Boost by keywords                                       │
│    • Re-rank by relevance                                    │
└──────────────────────────────────────────────────────────────┘
                               ↓
┌──────────────────────────────────────────────────────────────┐
│ 4. CONTEXT INJECTION                                         │
│    Top-scoring chunks are formatted and injected into the    │
│    prompt before generation.                                 │
└──────────────────────────────────────────────────────────────┘
```
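Steps 3 and 4 of the pipeline above can be sketched as a single scoring-and-selection pass. This is purely illustrative (the 10% keyword boost factor and field names are assumptions, not the extension's real code):

```javascript
// Minimal sketch of pipeline steps 3-4: score, filter, re-rank, select.
// Boost factor and chunk field names are illustrative assumptions.
function selectChunks(chunks, queryKeywords, insertCount, threshold) {
  return chunks
    .map(c => {
      let score = c.similarity;
      for (const k of queryKeywords) {
        if (c.keywords.includes(k)) score *= 1.1; // keyword boost
      }
      return { ...c, score };
    })
    .filter(c => c.score >= threshold)   // drop chunks below the threshold
    .sort((a, b) => b.score - a.score)   // re-rank by relevance
    .slice(0, insertCount);              // top-N chunks get injected
}
```

In the real pipeline, temporal decay and conditional activation rules would also adjust or veto each chunk's score before this selection.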
| Backend | Best For | Pros | Cons |
|---|---|---|---|
| Standard (Vectra) | Legacy compatibility / small datasets | No dependencies, works out of box | Legacy compatibility path in VectHarePlus; behavior not guaranteed |
| LanceDB | Legacy compatibility / medium datasets | Fast local vector DB | Legacy compatibility path in VectHarePlus; behavior not guaranteed |
| Milvus | Legacy compatibility / existing deployments | Familiar ecosystem for older setups | Legacy compatibility path in VectHarePlus; behavior not guaranteed |
| Qdrant | Production, cloud deployments | Enterprise-grade, advanced filtering | Requires running Qdrant server |
💡 Need help choosing? Use Qdrant for VectHarePlus if you want the tested path.
Memories don't stick around forever. VectHarePlus implements intelligent temporal decay that makes memories naturally fade over time.
Exponential Decay (default):
relevance = original_score × (0.5 ^ (message_age / half_life))
For example, with half-life = 50:
| Messages Ago | Relevance |
|---|---|
| 0 | 100% |
| 50 | 50% |
| 100 | 25% |
| 150 | 12.5% |
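The formula and table above can be sketched in a few lines. This is an illustrative implementation, not the extension's source; here the floor is applied as a minimum relevance multiplier, which matches the configuration described below:

```javascript
// Exponential decay sketch: relevance halves every `halfLife` messages,
// but never drops below `floor` × the original score.
function decayedScore(originalScore, messageAge, halfLife = 50, floor = 0.3) {
  const decayed = originalScore * Math.pow(0.5, messageAge / halfLife);
  return Math.max(decayed, floor * originalScore);
}

console.log(decayedScore(1.0, 50));  // → 0.5
console.log(decayedScore(1.0, 100)); // → 0.3 (raw decay is 0.25, lifted to the floor)
```

A chunk marked temporally blind would simply skip this function and keep its original score.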
- Enabled: Toggle decay on/off (default: OFF)
- Mode: Exponential or Linear
- Half-life: Messages until 50% relevance (default: 50)
- Floor: Minimum relevance, prevents complete forgetting (default: 0.3)
- Temporally Blind: Mark important chunks to be immune to decay
💡 Pro Tip: Set a high floor (0.5+) to keep important memories accessible even when old. Mark character introductions as temporally blind!
Control precisely when chunks activate using intelligent rules.
| Type | Description | Example |
|---|---|---|
| Emotion | Activate when character feels a specific emotion | Activate sad memories when character is sad |
| Keyword | Activate when keywords appear in chat | Activate "treasure" memories when discussing treasure |
| Recency | Activate only for recent messages | Only use memories from last 10 messages |
| Combined | Mix multiple conditions with AND/OR | Emotion=happy AND keyword contains "party" |
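The rule types in the table compose naturally as a small recursive evaluator. The shape of the rule objects below is a hypothetical sketch, not the extension's actual rule format:

```javascript
// Hypothetical evaluator for the rule types above. Rule object shapes
// are illustrative assumptions, not the extension's real API.
function ruleMatches(rule, ctx) {
  switch (rule.type) {
    case "emotion": return ctx.emotion === rule.emotion;
    case "keyword": return rule.keywords.some(k => ctx.text.includes(k));
    case "recency": return ctx.messageAge <= rule.maxAge;
    case "combined":
      // AND requires every sub-rule to match; OR requires any.
      return rule.op === "AND"
        ? rule.rules.every(r => ruleMatches(r, ctx))
        : rule.rules.some(r => ruleMatches(r, ctx));
  }
}

// The "Emotion=happy AND keyword contains 'party'" example from the table:
const rule = {
  type: "combined", op: "AND",
  rules: [
    { type: "emotion", emotion: "happy" },
    { type: "keyword", keywords: ["party"] },
  ],
};
console.log(ruleMatches(rule, { emotion: "happy", text: "let's throw a party", messageAge: 2 })); // → true
```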
Supports 28 emotion types with Character Expressions integration!
- Open SillyTavern in your browser
- Go to Extensions panel (puzzle piece icon)
- Click "Install Extension"
- Paste this URL: `https://github.com/kritblade/VectHare`
- Click Install
That's it! VectHarePlus will be downloaded and enabled automatically.
- Open VectHarePlus Settings (🐰 icon in the extensions panel)
- Select your embedding provider (Transformers, OpenAI, Ollama, BananaBread, etc.)
- Configure API keys if using cloud providers
```shell
cd SillyTavern/plugins
git clone -b Similharity-Plugin https://github.com/kritblade/VectHare.git similharity
cd similharity
npm install
```

Add to config.yaml:

```yaml
enableServerPlugins: true
```

Restart SillyTavern.
VectHarePlus has auto_update: true in its manifest. If you installed via git clone, SillyTavern will automatically check for and apply updates!
Look for the update notification in the Extensions panel, or manually check with the "Check for Updates" button.
| Setting | Description |
|---|---|
| Vector Backend | Standard (Vectra), LanceDB, Milvus, or Qdrant |
| Embedding Provider | 15+ providers supported |
| Summary Provider | 2 providers supported (OpenRouter, local vLLM-compatible endpoints) |
| API URL | Custom endpoint for local providers |
| Setting | Description |
|---|---|
| Enable Auto-Sync | Automatically vectorize new messages |
| Chunking Strategy | Per Message, Conversation Turns, Message Batch, Per Scene |
| Score Threshold | Minimum similarity to include chunk (0.0-1.0) |
| Query Depth | How many chunks to retrieve |
| Insert Count | How many chunks to inject into prompt |
| Setting | Description |
|---|---|
| Enabled | Toggle decay system |
| Mode | Exponential or Linear |
| Half-life | Messages until 50% relevance |
| Floor | Minimum relevance multiplier |
- Per Message chunks work best for dialogue-heavy chats
- Mark scenes for major events to keep them cohesive
- Set temporally blind on character intros so your AI never forgets who people are
- Start with Standard backend - upgrade to LanceDB when needed
- Large chats (10k+ messages)? LanceDB handles it smoothly
- Lower score threshold if memories aren't being retrieved (try 0.3)
- Pair emotions with Character Expressions for sprite-based detection
- Add topic keywords to make memories context-aware
- Use recency rules for time-sensitive information
- Export collections regularly as backups
- Run diagnostics if something feels off
- Check the Database Browser to see what's actually stored
- Enable Vectors extension in main ST settings
- Select embedding provider in VectHarePlus settings
- Add API key if using cloud provider
- Run Diagnostics to verify connectivity
- Click "Vectorize" button to index current chat
- Lower score threshold (try 0.3)
- Check Chunk Visualizer to verify chunks exist
- Run Diagnostics for detailed health check
- Run Diagnostics to see which backend failed
- LanceDB: Ensure Similharity plugin is installed
- Qdrant: Ensure Qdrant server is running
- Fallback: Switch to Standard backend
- Switch to LanceDB backend for large datasets
- Increase chunk size (fewer, larger chunks)
- Reduce query depth and insert count
- Mark important chunks as temporally blind
- Increase the decay floor value
- Lower score threshold
- Add conditional activation rules for topic-specific recall
Detailed docs available in the /docs folder:
- ARCHITECTURE.md: System design
- PLUGGABLE_BACKENDS.md: Backend implementation
- METADATA_ARCHITECTURE.md: Chunk metadata system
- TEMPORAL_DECAY.md: Decay formulas and tuning
- SillyTavern (latest version)
- Embedding Provider (one of 15+ supported)
- Similharity Plugin - For LanceDB/Qdrant backends
- Character Expressions - For sprite-based emotion detection
Found a bug? Have an idea? Contributions welcome!
- 🐛 Issues: Report bugs on GitHub
- 💡 Features: Open a discussion first
- 🔧 PRs: Follow the code standards in CLAUDE.md
MIT License - See LICENSE file for details.
VectHarePlus is branched from VectHare, created by Coneja Chibi. Special thanks to the SillyTavern community for feedback and testing!
If VectHarePlus helps your roleplay:
- ⭐ Star the repo on GitHub
- 💬 Share your experience
- 🐛 Report bugs to help improve it
- 📖 Contribute docs or examples
"It's like having a memory that actually works." π°β¨