Capture, cluster, and reflect on your thoughts in real-time.
A full-stack NLP application that transforms scattered thoughts into organized, semantically meaningful clusters with AI-powered insights. Built with Streamlit, LangGraph agents, and Neo4j.
- π Brain Dump Capture: Write down thoughts, ideas, and questions instantly
- π Semantic Clustering: Automatically group related thoughts using embeddings (sentence-transformers) and HDBSCAN
- π Interactive Knowledge Graph: Visualize all thoughts as an interactive Plotly graph with color-coded clusters
- π·οΈ AI-Generated Labels: Auto-label clusters using Google Gemini LLM (e.g., "Personal Growth", "Technical Concepts")
- π€ Multi-Agent Intelligence:
- SearchAgent: Generate contextualized summaries with optional web search (Tavily)
- QuestionAgent: Generate Socratic questions for deeper reflection
- GenerationAgent: Synthesize insights across related thoughts
- FeedAgent: Curate a blog-style feed of your 5 most recent thoughts with agent insights
- π Persistent Storage: All thoughts stored in Neo4j graph database with full history
- π Real-time Updates: Refresh embeddings and recalculate clusters on demand
- Home Tab π : Semantic cluster map + comprehensive table of all thoughts
- Feed Tab π°: Blog-style cards with summaries and reflection questions
Frontend: Streamlit (single-page Python web app)
Backend: Python 3.10+ with LangChain/LangGraph
Database: Neo4j Aura (graph database)
NLP Pipeline:
- Embeddings: sentence-transformers (all-MiniLM-L6-v2)
- Clustering: HDBSCAN + UMAP dimensionality reduction
- LLM: Google Gemini 2.5 Flash (via LangChain)
Search: Tavily API (optional, with mock fallback)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β STREAMLIT UI β
β (app.py - Home Tab | Feed Tab | Sidebar Controls) β
βββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββββββ
β
βββββββββββ΄ββββββββββ
β β
βββββΌβββββ ββββββΌβββββββ
β AGENTS β β EMBEDDINGS β
β (Day2) β β & CLUSTERS β
βββββ¬βββββ β (Day 1) β
β ββββββ¬ββββββββ
β β
βββββ΄ββββββββββββββββββββΌβββββ
β BRAINDUMP_CORE.PY β
β - BrainDumpDB (Neo4j) β
β - EmbeddingEngine β
β - ClusterEngine β
β - ClusterLabelEngine β
βββββ¬βββββββββββββββββββββββββ
β
βββββΌβββββββββββββββββββ
β NEO4J GRAPH DB β
β (Aura Cloud) β
β - Dump Nodes β
β - Cluster Nodes β
β - Relationships β
ββββββββββββββββββββββββ
Input β Processing β Storage β Visualization
- User enters brain dump β
app.pysidebar - Generate embedding β
EmbeddingEngine(sentence-transformer) - Store in Neo4j β
BrainDumpDB.add_dump() - Re-cluster on demand β
ClusterEngine.cluster()+ClusterLabelEngine.label_clusters() - Visualize clusters β Plotly interactive graph (Home tab)
- Enrich with AI β Agents analyze for Feed tab
Node: Dump
βββ id: UUID
βββ text: String (the brain dump)
βββ embedding: Vector[384] (from sentence-transformer)
βββ created_at: DateTime
βββ cluster_id: Reference to Cluster
Node: Cluster
βββ id: UUID
βββ label: String (e.g., "Personal Growth")
βββ description: String (optional)
βββ created_at: DateTime
Relationships:
βββ Dump -[:IN_CLUSTER]-> Cluster
βββ Dump -[:SIMILAR_TO {weight: 0.0-1.0}]-> Dump| File | Purpose | Key Classes |
|---|---|---|
app.py |
Streamlit UI & session management | Main app logic, page layout |
braindump_core.py |
NLP pipeline & database | BrainDumpDB, EmbeddingEngine, ClusterEngine, ClusterLabelEngine |
agents.py |
AI agents for insights | QuestionAgent, SearchAgent, GenerationAgent, FeedAgent |
neo4j_maintenance.py |
Database utilities | Neo4j query helpers, debugging |
cleanup_db.py |
Reset database | Wipe all data (for testing) |
- Python 3.10+
- Neo4j Aura account (free tier available)
- API keys for Google Gemini (free) and optionally Tavily
Create a free Neo4j Aura instance:
- Go to https://console.neo4j.io/
- Sign up for a free account
- Create a new "Free" Aura instance
- Copy your connection details:
- URI:
neo4j+ssc://xxxxx.databases.neo4j.io - Username:
neo4j - Password: (set during creation)
- URI:
# Clone repository
git clone <repo-url>
cd braindump-sanctuary
# Install dependencies
pip install -r requirements.txt
# Create environment file
cp .env.example .env # or create manually
# Edit .env with your credentials.env Template:
# Neo4j (Required)
NEO4J_URI=neo4j+ssc://your-instance-id.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password-here
# Google Gemini (Required for AI features)
GOOGLE_API_KEY=your_google_api_key_here
# Web Search (Optional - uses mock data if not provided)
TAVILY_API_KEY=your_tavily_api_key_here
# Alternative: Perplexity (Optional)
PERPLEXITY_API_KEY=your_perplexity_key_hereGet API Keys:
- π Google Gemini (free): https://aistudio.google.com/app/apikey
- π Tavily Search (optional): https://app.tavily.com
- π§ Perplexity (optional): https://www.perplexity.ai/
streamlit run app.pyApp opens at http://localhost:8501
Left Sidebar - Main interaction hub:
- πΊοΈ Navigation: Radio buttons to toggle between Home and Feed tabs
- π Text Area: Input field for new brain dumps
- β Add to Sanctuary: Save the thought to Neo4j
- ποΈ Clear: Empty the input field
- π Refresh Clusters: Recalculate all embeddings and clusters (computationally intensive, 10-30 seconds)
β οΈ Clear All Dumps: Permanently delete all thoughts from the database- π Total Brain Dumps: Counter showing total stored thoughts
Top Section: Interactive Cluster Map
- Visualization: 2D interactive Plotly graph
- Nodes: Each point = one brain dump
- Colors: Each color = a semantic cluster (similar ideas grouped together)
- Layout: UMAP dimensionality reduction for spatial meaning
- Interactions: Hover to see full text, zoom/pan to explore
- Relationships: Edges show semantic similarity between thoughts
Bottom Section: Comprehensive Table
- Columns: Brain Dump | Cluster Label | Created At
- Sorting: Most recent first (newest at top)
- Format: Plain text, easy to copy
Actions:
- Click "π Refresh Clusters" to recalculate (when you add many new thoughts)
- Spinner animations show progress of embedding generation and labeling
What You See:
- Up to 5 Recent Cards in reverse chronological order
- Each Card Contains:
- π Title: Your full brain dump text
- π·οΈ Cluster Badge: Category label (generated by LLM)
- π Summary: AI-generated context
- If Tavily API set: Real web search + synthesis
- If not: Mock contextual data (demo mode)
- β Reflection Questions: 5 Socratic questions (from QuestionAgent)
Why Use Feed?
- Quick morning review without table scrolling
- AI-generated insights help you think deeper
- Guided reflection via questions
- Patterns become visible across multiple cards
- Perfect for journaling workflow
Morning:
1. Open app at http://localhost:8501
2. Sidebar: Type 3-5 quick thoughts
3. Click "Add to Sanctuary" each time
4. Switch to Feed tab
5. Read summaries and reflect on questions
6. (Takes 5-10 minutes)
Evening:
1. Go to Home tab
2. Click "Refresh Clusters"
3. Observe how new thoughts clustered with existing ones
4. Spot emerging themes and connections
1. Accumulate 20+ thoughts over several days
2. Go to Home tab
3. Click "Refresh Clusters"
4. Examine the visualization:
- Which clusters are densest?
- What themes emerge?
- Any surprising connections?
5. Switch to Feed tab
6. Read the agent-generated insights
7. Use for next creative session
1. Add 10+ thoughts about a specific topic
2. They should cluster together automatically
3. Hover over the cluster in Home tab
4. Read the AI-generated cluster label
5. Check Feed tab for synthesis across related thoughts
6. Use the questions to drill deeper
Role: Socratic questioning for deeper reflection
Input: Brain dump text Output: 5 open-ended questions
Example:
Brain Dump: "Why do I procrastinate on important tasks?"
Generated Questions:
1. What task are you procrastinating on right now?
2. What emotion arises when you think about starting it?
3. What would you do instead if you didn't procrastinate?
4. What is the smallest first step you could take?
5. What would happen if you started today?
Role: Web search + synthesis or mock data generation
Input: Brain dump text Output: Relevant context or summary
With Tavily API:
- Searches the web for related information
- Synthesizes findings with LLM
- Provides citations and context
Without Tavily API (Demo Mode):
- Generates plausible contextual information
- Maintains consistency with the thought
- Allows testing without API key
Role: Cross-thought synthesis
Input: Multiple related brain dumps (from same cluster) Output: Synthesized insight combining themes
Example:
Related Dumps:
- "Neural networks learn patterns"
- "Our brains also learn from patterns"
- "What if consciousness is pattern recognition?"
Synthesis: These thoughts suggest that consciousness might emerge
from the brain's pattern recognition capabilities...
Role: Orchestrates other agents for feed generation
Process:
- Selects 5 most recent thoughts
- For each thought:
- Gets cluster assignment
- Calls SearchAgent for summary
- Calls QuestionAgent for reflection questions
- Formats as blog-style cards
Create .env file in project root:
# ============ REQUIRED ============
# Neo4j Aura Connection
NEO4J_URI=neo4j+ssc://your-instance-id.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-super-secure-password
# Google Gemini LLM API
GOOGLE_API_KEY=AIzaSyDxxxxxxxxxxxxxxxxxxxxx
# ============ OPTIONAL ============
# Tavily Web Search (for better summaries)
TAVILY_API_KEY=tvly-xxxxxxxxxxxxxxxxx
# Perplexity API (alternative search)
PERPLEXITY_API_KEY=pplx-xxxxxxxxxxxxxxxxxCreate .streamlit/config.toml:
[theme]
primaryColor = "#4ECDC4"
backgroundColor = "#0d1117"
secondaryBackgroundColor = "#161b22"
textColor = "#c9d1d9"
[client]
showErrorDetails = true
[logger]
level = "info"
[server]
port = 8501streamlit run app.pypython cleanup_db.pypython neo4j_maintenance.py# Terminal shows Streamlit logs
# Check sidebar for spinner status during refreshIssue: "Failed to connect to Neo4j"
- Verify
.envhas correct URI, username, password - Check Neo4j Aura instance is running (console.neo4j.io)
- Firewall: Neo4j needs outbound HTTPS (port 7687)
Issue: "GOOGLE_API_KEY not found"
- Ensure key is in
.envfile - Restart streamlit:
streamlit run app.py
Issue: Clusters not showing
- Click "Refresh Clusters" in sidebar
- Wait for spinner to finish (10-30 seconds)
- Needs at least 2-3 thoughts for clustering
Issue: Tavily search not working
- Mock data is used if API key missing (expected)
- Optional feature; not required for core functionality
- Adding 1 thought: ~1 second
- Clustering N thoughts: ~O(N) to O(N log N) depending on N
- 10 thoughts: ~2 seconds
- 100 thoughts: ~5-10 seconds
- 1000 thoughts: ~30-60 seconds
- LLM labeling: ~2-5 seconds per cluster (depends on Gemini API latency)
- Neo4j Free Tier: ~5 GB storage
- Typical thought: ~500 bytes
- Can store ~10 million thoughts theoretically
- Local testing: β Recommended
- Shared team use: Consider dedicated Neo4j instance
- Large scale (10k+ thoughts): May need performance tuning (indexing, batching)
Semantic Embeddings
- sentence-transformers model converts text to 384-dimensional vectors
- Similar texts β similar vectors β can cluster together
- Distance in vector space β semantic similarity
HDBSCAN Clustering
- Density-based clustering (unlike K-means which needs K)
- Automatically finds clusters of any shape
- Robust to outliers (marks them as "noise")
Neo4j Graph Database
- Stores relationships as first-class citizens
- Fast for relationship queries (unlike SQL)
- Perfect for knowledge graphs and recommendations
LLMs for Labeling
- Google Gemini generates cluster labels from examples
- LangChain chains handle prompt + LLM + parsing
- Enables semantic understanding of cluster themes
See LICENSE file for details.
Found a bug or have a feature idea?
- Check existing issues
- Create new issue with clear description
- (Optional) Submit PR with fix
Q: Can I use this with a local Neo4j instance?
A: Yes! Update NEO4J_URI to neo4j://localhost:7687 in .env
Q: What if I don't have a Tavily API key? A: That's fine! SearchAgent will generate mock data (demo mode)
Q: Can I export all my thoughts? A: Use the table in Home tab or query Neo4j directly. Export feature coming soon.
Q: How often should I click "Refresh Clusters"? A: After adding multiple new thoughts (5+), or when you want to see updated organization.
Q: Is my data secure? A: Only you have your Neo4j password. Data is encrypted in transit and at rest on Neo4j Aura.
- Export functionality (CSV, JSON, markdown)
- Thought search & filtering
- Collaborative mode (multiple users)
- Email digest of weekly summaries
- Browser extension for quick capture
- Mobile app
- Advanced analytics dashboard
[client] showErrorDetails = true
---
## π§ Database Management
### Re-calculate Clusters
1. Go to Home tab
2. Click π **Refresh** button
3. Wait for spinner to complete
### Clear Everything and Start Fresh
1. Click ποΈ **Clear All Dumps** in sidebar
2. Confirm action
3. Add new thoughts
### Neo4j Maintenance
Use the `neo4j_maintenance.py` script:
```bash
# Show database statistics
python neo4j_maintenance.py stats
# Remove duplicate brain dumps
python neo4j_maintenance.py dedup
# Clear all data (WARNING: Irreversible)
python neo4j_maintenance.py clear
- Fast: 3-20 dumps (< 5 seconds)
- Moderate: 20-100 dumps (5-30 seconds)
- Slow: 100+ dumps (> 30 seconds)
- Use caching to avoid recomputing
- Google Gemini: Free tier allows ~60 requests/min
- Tavily Search: Free tier allows ~100 searches/month
- Set environment variables to enable features
- Dump and cluster lookups are indexed for fast retrieval
- Embedding vectors cached in graph nodes
- Knowledge graph relationships enable fast similarity searches
β Verify your Aura instance is running: https://console.neo4j.io/
β Check NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD in .env
β Ensure your IP is allowlisted in Aura instance settings
β Make sure .env file exists with GOOGLE_API_KEY
β Restart streamlit: streamlit run app.py
β You might have 100+ dumps β Only happens on first run or "Refresh" β Consider batching cluster operations for large datasets
β Your TAVILY_API_KEY isn't set
β Add it to .env and restart
β Missing dependencies: pip install -r requirements.txt
β Wrong Python version? Try python3 -m pip install ...
braindump_core.py
βββ BrainDumpDB (Neo4j Graph Database)
βββ EmbeddingEngine (sentence-transformers)
βββ ClusterEngine (HDBSCAN + UMAP)
βββ Visualization (Plotly)
agents.py
βββ QuestionAgent (Socratic questions)
βββ SearchAgent (Web search + synthesis)
app.py
βββ Sidebar (Input + Navigation)
βββ Home Tab (Knowledge Graph + Table)
βββ Feed Tab (Blog-style cards)
βββ Helper Functions (Rendering)
neo4j_maintenance.py
βββ Database statistics
βββ Deduplication
βββ Data cleanup
- Set up Neo4j Aura instance at https://console.neo4j.io/
- Add your first thought via sidebar
- Switch between Home and Feed to see different views
- Check back daily to build your idea garden
- Monitor performance using
neo4j_maintenance.py stats
- Best for: Capturing fleeting thoughts, finding patterns, exploring ideas
- Start small: 5-10 thoughts to see clustering in action
- Use specificity: "Why do dreams fade?" works better than "dreams"
- Review regularly: Feed tab is great for morning/evening review
- Share clusters: Screenshot Home tab to share your thinking
- All data stored locally in
braindump.db(no cloud sync) - Embeddings cached in database (fast retrieval)
- Cluster labels generated once and reused
- Each brain dump timestamped automatically
Ready to catch your thoughts before they slip away! π§ β¨
For issues or questions, check REFACTOR_SUMMARY.md or ARCHITECTURE.md