Docs: Key Concepts · Installation · User Guide · Recipes · Architecture · Use Cases · Development
uv tool install task-mind-cli # Install Task-Mind
task-mind init # Initialize environment
task-mind server start # Start web service
# Open http://127.0.0.1:8093 in your browser

New to `uv` or setting up a fresh system? See the Installation Guide for prerequisites.
AI should free people from repetitive labor, not become a new instrument of extraction.
Three beliefs that guide Task-Mind:
Chatting with AI produces nothing. The ICQ era of AI — endless conversation, zero delivery — wastes your time and money.
Task-Mind exists for results: recipes that run, scripts that execute, data that's extracted. If AI can't hand you a deliverable, it hasn't done its job.
We reject the narrative that you must wait for some company to build AGI before automation serves you.
Task-Mind is open source. Your recipes, your skills, your Git repo. You accumulate capability, not subscription fees. The tools you build are yours — portable, version-controlled, independent.
Many "AI products" are token vending machines wrapped in pretty UIs. You pay per conversation, per generation, per retry — and get nothing persistent in return.
Task-Mind attacks this directly:
- First exploration: ~150k tokens
- Every run after: ~2k tokens (98.7% saved)
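The 98.7% figure follows directly from the two numbers above; a quick sanity check:

```python
# Token cost of the first exploration vs. every cached-recipe run after it.
first_run_tokens = 150_000   # ~150k tokens to explore from scratch
recipe_run_tokens = 2_000    # ~2k tokens to replay the saved recipe

savings = 1 - recipe_run_tokens / first_run_tokens
print(f"{savings:.1%}")  # → 98.7%
```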
The savings compound. The recipes stay. Your time returns to family, hobbies, creation — not to feeding another revenue stream.
| Version | Highlights |
|---|---|
| v0.26.0 | Workspace file browser; task-mind view media support (video, image, audio, 3D models) |
| v0.24.0 | Community recipes infrastructure; recipe install/uninstall/update/search/share commands |
| v0.23.0 | WebSocket real-time sync; YouTube recipes (download, subtitles, transcript) |
| v0.22.0 | Cross-platform autostart; task-mind autostart command; integrated into init workflow |
| v0.21.0 | i18n support; user language preference for AI title generation |
Multi-runtime automation infrastructure designed for AI agents, providing persistent context management and a reusable Recipe system.
Given only a prompt, AI can "talk" but not "do": it answers once and never follows through from start to finish. Think of ChatGPT in 2023. So people designed Agents, which call tools through standardized interfaces.
But reality is: tasks are infinite, while tools are finite.
You ask AI to extract YouTube subtitles. It spends 5 minutes exploring, succeeds. The next day, same request—it starts from scratch again. It completely forgot what it did yesterday.
Even an Agent like Claude Code looks clumsy when it meets each person's unique task requirements: it must explore every time, burning tokens and dragging the LLM through the whole process. Slow and unstable: out of 10 attempts, maybe 5 take the right path, while the other 5 are filled with strange, painful trial and error.
Agents lack context—that's a fact. But what kind of context do they lack?
People tried RAG, fragmenting information so Agents could retrieve it and "find methods." This is theoretically correct but practically misguided, and a massive pitfall. The key issue: each person's task requirements are local and bounded. They don't need a heavyweight RAG system; RAG over-complicates how individuals solve problems.
Research from Anthropic and Google both points to the same approach: directly consulting documentation. The author of this project proposed the same view in 2024. But this approach requires an Agent with sufficient capability, and Claude Code is exactly such an Agent.
Claude Code puts this philosophy into practice with a documentation architecture of commands and skills. Task-Mind builds on that foundation and implements the author's design philosophy in depth: every piece of methodological knowledge must be tied to concrete, executable tools.
In Task-Mind's framework, skills are collections of methodologies, and recipes are collections of executable tools.
The author's vision: through Task-Mind's Claude Code slash commands (/task-mind.run and the other core commands), establish an Agent specification that lets the Agent explore unfamiliar problems, standardize the results into structured information, and, through self-awareness, proactively build the association between skills and recipes.
Ultimately, your Agent can fully understand your descriptions of work and task requirements, leverage existing skills to find and properly use relevant recipes, achieving "driving automated execution with minimal token cost."
Task-Mind is not the Agent itself, but the Agent's "skeleton."
Agents are smart enough, but not yet resourceful. Task-Mind teaches them to remember how to get things done.
Task-Mind integrates with Claude Code through four slash commands, forming a complete "explore → solidify → execute" loop.
/task-mind.run Explore and research, accumulate experience
↓
/task-mind.recipe Solidify experience into reusable recipes
/task-mind.test Validate recipes (while context is fresh)
↓
/task-mind.exec Execute quickly with skill guidance
In Claude Code, type:
/task-mind.run Research how to extract YouTube video subtitles
The Agent will:
- Create a project to store this run instance
- Use Task-Mind's basic tools (navigate, click, exec-js, etc.) to explore
- Automatically record `execution.jsonl` and key findings
- Persist all screenshots, scripts, and output files
projects/youtube-transcript-research/
├── logs/execution.jsonl # Structured execution logs
├── screenshots/ # Screenshot archive
├── scripts/ # Validated scripts
└── outputs/ # Output files
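The log format above is JSON Lines: one self-contained JSON object per line, which is why the Agent can append entries mid-run and replay them later without parsing a whole file. A rough sketch of the pattern (the field names here are illustrative, not the actual Task-Mind schema):

```python
import json
from pathlib import Path

LOG = Path("logs/execution.jsonl")
LOG.parent.mkdir(parents=True, exist_ok=True)
LOG.unlink(missing_ok=True)  # start fresh for this illustration

def log_step(action: str, **details) -> None:
    """Append one structured entry as a single JSON line."""
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"action": action, **details}, ensure_ascii=False) + "\n")

log_step("navigate", url="https://example.com")
log_step("click", selector="#expand")

# Replay: one json.loads() per line; no whole-file parse needed.
steps = [json.loads(line) for line in LOG.read_text(encoding="utf-8").splitlines()]
print([s["action"] for s in steps])  # → ['navigate', 'click']
```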
After exploration, type:
/task-mind.recipe
The Agent will:
- Analyze the experience accumulated during exploration
- Auto-generate necessary recipes for this task
- Create corresponding skills (coming soon)
- Associate skills with recipes
Generated recipe example:
---
name: youtube_extract_video_transcript
type: atomic
runtime: chrome-js
description: "Extract complete transcript text from YouTube videos"
use_cases:
- "Batch extract video subtitle content for text analysis"
- "Create indexes or summaries for videos"
---

While the session context is still fresh, test immediately:
/task-mind.test youtube_extract_video_transcript
Validation failed? Fix it on the spot, no need to re-explore. This is why recipe and test should be parallel—debugging costs more after context is lost.
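The recipe file shown above is plain YAML frontmatter, so reading its metadata needs nothing heavyweight. A minimal sketch (the real Task-Mind loader may differ; this assumes flat `key: value` fields only):

```python
# Minimal frontmatter reader for recipe files (illustrative sketch only;
# the actual Task-Mind loader may differ).
def read_frontmatter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing fence ends the metadata block
            break
        if ":" in line and not line.lstrip().startswith("-"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    return meta

recipe = """---
name: youtube_extract_video_transcript
type: atomic
runtime: chrome-js
---
"""
meta = read_frontmatter(recipe)
print(meta["runtime"])  # → chrome-js
```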
Next time you have a similar need, type:
/task-mind.exec video-production Create a short video about AI
The Agent will:
- Load the specified skill (video-production)
- Follow the methodology in the skill to invoke relevant recipes
- Complete the task quickly, no repeated exploration
This is the value of the "skeleton": 5 minutes to explore the first time, seconds to execute thereafter.
The above workflow relies on Task-Mind's underlying capabilities:
| Capability | Description |
|---|---|
| Native CDP | Direct Chrome DevTools Protocol connection, ~2MB lightweight, no Node.js deps |
| Run System | Persistent task context, JSONL structured logs |
| Recipe System | Metadata-driven, three-tier priority (Project > User > Example) |
| Web Service | FastAPI backend + React frontend, browser-based GUI on port 8093 |
| Multi-Runtime | Chrome JS, Python, Shell runtime support |
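The three-tier priority in the table is a straightforward first-match lookup. Conceptually (the paths and file layout here are assumptions for illustration, not the actual implementation):

```python
from pathlib import Path
from typing import Optional

# Documented priority order: Project > User > Example.
SEARCH_PATHS = [
    Path(".task-mind/recipes"),          # project-local (highest priority)
    Path.home() / ".task-mind/recipes",  # user-level
    Path("examples/recipes"),            # bundled examples (lowest)
]

def resolve_recipe(name: str, search_paths=SEARCH_PATHS) -> Optional[Path]:
    """Return the first matching recipe file, honoring tier priority."""
    for base in search_paths:
        candidate = base / f"{name}.md"
        if candidate.exists():
            return candidate
    return None
```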
Architecture Comparison:
Playwright: Python → Node.js relay → CDP → Chrome (~100MB)
Task-Mind: Python → CDP → Chrome (~2MB)
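Under the hood, CDP is just JSON frames over a WebSocket, and dropping the Node.js relay means producing those frames directly. A sketch of the frame shape (`Page.navigate` is a standard CDP method; the WebSocket transport and session handling are omitted):

```python
import itertools
import json

_ids = itertools.count(1)  # every CDP command carries a unique id

def cdp_command(method: str, **params) -> str:
    """Serialize one Chrome DevTools Protocol command frame."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# A navigation is a single JSON frame on the wire:
frame = cdp_command("Page.navigate", url="https://example.com")
print(frame)  # {"id": 1, "method": "Page.navigate", "params": {"url": "https://example.com"}}
```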
Playwright and Selenium are testing tools—launch browser, run tests, close browser. Every run starts fresh.
Task-Mind is the skeleton for AI—connect to an existing browser, explore, learn, remember. Experience accumulates.
| You need... | Choose |
|---|---|
| Quality assurance, regression testing, CI/CD | Playwright/Selenium |
| Data collection, workflow automation, AI-assisted tasks | Task-Mind |
| One-off scripts, run and discard | Playwright/Selenium |
| Accumulate experience, faster next time | Task-Mind |
Technical differences (lightweight, direct CDP, no Node.js dependency) are outcomes, not goals.
The core difference is design philosophy: testing tools assume you know what to do; Task-Mind assumes you're exploring, and helps you remember what you discovered.
Dify, Coze, and n8n are workflow orchestration tools.
Traditional usage: manually drag nodes, connect lines, configure parameters. n8n launched AI Workflow Builder that can generate workflow nodes from natural language (Dify and Coze don't have similar features yet).
But whether manual or AI-assisted, what do you end up with? A flowchart.
Then what?
- You still need to enter the platform, understand the diagram
- Run, error, go back and modify node config
- Run again, another error, modify again
- After debugging passes, the flowchart runs
AI drew the diagram for you, but debugging, modifying, maintaining—still your job.
Using Task-Mind:
/task-mind.run Scrape data from this website
No flowchart. AI goes to work directly—opens browser, clicks, extracts data, handles errors. You just wait.
When done:
/task-mind.recipe
Recipe auto-generated. Next time:
/task-mind.exec Scrape similar website
You don't need to enter any platform, don't need to look at any flowchart.
| | Orchestration Tools (incl. AI-assisted) | Task-Mind |
|---|---|---|
| What AI does | Draws flowcharts for you | Does the work directly |
| What you do | Enter platform, read diagrams, debug, modify config | State needs, wait for results |
| Output | A flowchart that needs maintenance | Reusable recipe |
Orchestration tools' AI is your "diagram assistant"; Task-Mind's AI is your "executor".
Of course, if you need scheduled triggers, visual monitoring, team collaboration approvals—orchestration tools are better fits. But if you just want to get things done—Task-Mind lets you solve problems by talking, no platform to learn.
Task-Mind is open-source—anyone can install it via PyPI. But the skeleton is universal, while the brain is personal.
Each person has:
- Their own application scenarios
- Personalized knowledge (skills)
- Custom automation scripts (recipes)
These personalized resources shouldn't live in the public package. They belong to you.
Task-Mind's philosophy: cross-environment consistency. Your resources should be available wherever you work—different machines, fresh installations, or new projects. The tool comes from PyPI; your brain comes from your private repository.
Task-Mind doesn't provide community-level cloud sync services (yet). Instead, it gives you commands to manage sync with your own Git repository.
┌─────────────┐ publish ┌─────────────┐ sync ┌─────────────┐
│ Project │ ──────────→ │ System │ ─────────→ │ Remote │
│ .claude/ │ │ ~/.claude/ │ │ Git Repo │
│ examples/ │ │ ~/.task-mind/ │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
↑ │ │
│ dev-load │ deploy │
└──────────────────────────┴──────────────────────────┘
| Command | Direction | Purpose |
|---|---|---|
| `publish` | Project → System | Push project resources to system directories |
| `sync` | System → Remote | Push system resources to your private Git repo |
| `deploy` | Remote → System | Pull from your private repo to system directories |
| `dev-load` | System → Project | Load system resources into current project (dev only) |
Developer Flow (local changes → cloud):
# After editing recipes in your project
task-mind publish # Project → System
task-mind sync # System → Remote Git

New Machine Flow (cloud → local):
# First time setup on a new machine
task-mind sync --set-repo git@github.com:you/my-task-mind-resources.git
task-mind deploy # Remote Git → System
task-mind dev-load # System → Project (if developing Task-Mind)

Regular User (just uses Task-Mind):
task-mind deploy # Get latest resources from your repo
# Resources are now in ~/.claude/ and ~/.task-mind/, ready to use

Only Task-Mind-specific resources are synced:

- `task-mind.*.md` commands (not your other Claude commands)
- `task-mind-*` skills (not your other skills)
- All recipes in `~/.task-mind/recipes/`
Your personal, non-Task-Mind Claude commands and skills are never touched.
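Those selection rules are ordinary glob patterns, so it is easy to reason about what a sync would pick up. An illustrative filter (the function name and pattern list are a sketch of the rules above, not Task-Mind's actual code):

```python
from fnmatch import fnmatch

# Only files matching these patterns are synced; everything else is untouched.
SYNC_PATTERNS = ["task-mind.*.md", "task-mind-*"]

def is_synced(filename: str) -> bool:
    return any(fnmatch(filename, p) for p in SYNC_PATTERNS)

print(is_synced("task-mind.run.md"))   # → True  (a Task-Mind command)
print(is_synced("task-mind-browser"))  # → True  (a Task-Mind skill)
print(is_synced("my-notes.md"))        # → False (your own file, left alone)
```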
- Key Concepts - Skill, Recipe, Run definitions and relationships
- Use Cases - Complete workflow from Recipe creation to Workflow orchestration
- Architecture - Core differences, technology choices, system design
- Installation - Installation methods, dependencies, optional features
- User Guide - CDP commands, Recipe management, Run system
- Recipe System - AI-First design, metadata-driven, Workflow orchestration
- Development - Project structure, development standards, testing methods
- Roadmap - Completed features, todos, version planning
Personal thoughts on AI automation, Agent design, and lessons learned.
📍 Current Stage: Full-featured workspace with media preview support
Latest Features (v0.17.0 - v0.26.0):
- ✅ Workspace file browser - Browse run instance directories in Web UI
- ✅ Media viewer - `task-mind view` supports video, image, audio, 3D models (glTF/GLB)
- ✅ Community recipes - `recipe install/uninstall/update/search/share` for community contributions
- ✅ WebSocket real-time sync - Server push updates, reduced polling
- ✅ YouTube recipes - Download videos, extract subtitles and transcripts
- ✅ Cross-platform autostart - `task-mind autostart` manages server boot startup (macOS/Linux/Windows)
- ✅ i18n support - UI internationalization with user language preferences
- ✅ Web service mode - `task-mind server` launches browser-based GUI on port 8093
Core Infrastructure:
- ✅ Native CDP protocol layer (direct Chrome control, ~2MB lightweight)
- ✅ Recipe metadata-driven architecture (chrome-js/python/shell runtime)
- ✅ Run command system (topic-based task management, JSONL structured logs)
- ✅ Web service backend (FastAPI + React frontend)
- ✅ CLI tools and grouped command system
See Roadmap for details
AGPL-3.0 License - see LICENSE file
Issues and Pull Requests are welcome!
- Project issues: Submit Issue
- Technical discussion: Discussions
Created with Claude Code | 2025-11