A floating desktop AI assistant built with Java 17, Spring Boot, and JavaFX. Think Jarvis for your PC — a swirling orb sits on your desktop, expanding into a full AI command center with voice, vision, proactive actions, browser automation, and connections to 10 messaging platforms.
- Floating window — always-on-top, draggable orb with custom title bar
- Tabbed interface — Chat, Browser, Agents, Integrations, Setup, Skills, Schedules, Todo, Directives, Personality, Knowledge, Voice, Calibration, Workflows, Templates, Marketplace, Dashboard, Multi-Agent, Automations
- Command palette — `Ctrl+K` for quick access to all commands and tabs
- Chat search — `Ctrl+F` to search through message history
- Keyboard shortcuts — `Ctrl+/` to view all shortcuts, `Ctrl+L` to clear chat
- Smooth transitions — animated tab switching and message appearances
- Sound effects — subtle audio feedback for sent/received/notification/error (toggleable)
- Styled tooltips — hover over any toolbar icon for a descriptive tooltip
- System tray — minimize to tray for background operation
- Multi-model support — OpenAI (GPT-5.1, GPT-4o), Google Gemini (2.5 Pro, 3 Flash), Anthropic Claude (Opus, Sonnet), and local models via Ollama
- 100+ built-in tools — the AI can invoke tools across files, browser, system, media, health, finance, GitHub, and more
- Dynamic tool routing — AI classifier selects relevant tools per message (respects 128-tool API limit)
- Task planning — numbered checklist before executing complex multi-step tasks
- Autonomous mode — works on directives independently when you're idle
- Chat memory — persistent transcript history across restarts
- Proactive Action Mode (lightning bolt icon) — continuously monitors your screen, pending tasks, and directives, then takes action automatically
- Screen check every 15s — detects dialogs, forms, errors, notifications and acts on them
- Task check every 30s — completes pending to-do items proactively
- Directive check every 60s — executes standing directives
- Safety: skips when you're actively working, 60s cooldown per action, speaks actions aloud via TTS
- Jarvis Watch Mode (eye icon) — AI actively comments on your screen like a real assistant
- `[COMMENT]` — conversational tips, warnings, and observations appear as chat messages
- `[REACT]` — auto-types into forms, quizzes, and prompts
- `[SILENT]` — stays quiet when nothing interesting is happening
- 10-second cooldown between comments, semantic deduplication to avoid repetition
- Auto-pilot (brain icon) — proactive screen help suggestions
- Keyboard & mouse control (keyboard icon) — allow the bot to click and type on your behalf
- Voice input — speech-to-text via Web Speech API and native microphone capture
- Text-to-speech — ElevenLabs, Fish Audio, OpenAI TTS, or Windows native voice
- Gemini Live — real-time bidirectional audio streaming with language translation
- Screen analysis — live screen capture with Gemini vision + OCR before every AI response
- Webcam — capture and analyze webcam feed
- Audio listening — background audio capture and transcription with model selection
- Chrome DevTools Protocol — control your real Chrome browser (navigate, click, extract data, fill forms)
- JavaScript injection — `browserFillForm` and `browserExecuteJs` for instant form filling via CDP (~10ms)
- Playwright — headless browser automation for web scraping
- System control — execute system commands, manage processes, control applications
- Window manager — arrange windows, app switching, snap to edges (Win+Arrow)
- Bot window control — tell the bot to move itself ("move yourself to the left")
- Automations — custom trigger/action rules ("when message contains X, do Y")
- Clipboard history — tracks last 200 clipboard entries, searchable by AI
- Life profile — 11 sections: Routines, Preferences, Relationships, Goals, Health, Finance, Locations, Vehicles, Pets, Important Dates, Notes
- Episodic memory — stores life events as searchable JSON episodes with type, tags, people, mood, importance
- Auto-memory extraction — automatically detects life facts from conversations and saves them
- Personal config — name, birthdate, family, work info loaded into every AI response
- Knowledge base — upload documents (PDF, Word, Excel, code, etc.) for AI reference
- Log water, meals, exercise, weight, mood, sleep, medications
- Daily health summaries and multi-day trend analysis
- Set and track health goals
- Log expenses and income with categories
- Monthly budgets with real-time tracking
- Bill tracking with due date alerts
- Debt overview and financial goal tracking
- Monthly reports by category
- Morning briefings, break reminders, hydration reminders
- Meeting prep, bill reminders, relationship nudges
- Weekly goal check-ins, weather alerts
- Custom rules with quiet hours support
- List repos, branches, README content, search repos
- Create/list/comment on issues and pull requests
- View notifications, activity feed, gists
- Monitor CI/CD workflow runs
- Scaffold and manage Remotion projects
- Create custom React video compositions
- Quick text videos with animated typography
- Slideshow videos from images with crossfade transitions
- Render to MP4 via CLI
- Music control — play/pause/skip, volume control via media keys (Spotify, Windows Media Player, any player)
- Video downloader — yt-dlp wrapper for YouTube, TikTok, and 1000+ sites (auto-installs via winget)
- Image tools — resize, crop, filter
- Screen recording — capture and export MP4
- QR codes — generate and scan
- Social monitor — track contacts, birthdays, posts, and mentions across platforms
- Trend scout — monitor YouTube/web for topics you're interested in, surfaces what's trending
- Habit detection — learns your routines from time/day patterns, persisted to `habit_events.json`
- Feedback loop — rates suggestions, improves over time
- Skill auto-creation — save named workflows with trigger phrases, auto-executes on match
- Email — send/read via SMTP/IMAP + Gmail API
- Calendar — Google Calendar integration
- Web search — Serper, SerpAPI, or DuckDuckGo
- Web monitoring — track website changes
- Code audit — clone repos, scan for SQL injection, hardcoded secrets, unused imports
- File operations — read, write, search, download, export, Excel, Word, PDF
- Utilities — calculator, unit conversion, hash, timers
- Software management — install/uninstall via winget
- Network diagnostics — ping, traceroute, port scan
- Printer control — list printers, print documents
Each skill is a self-contained sub-package (com.minsbot.skills.<name>) with its own Config + Service + Controller,
exposed at /api/skills/<name>/*, and also available to the chat LLM as @Tool methods via 8 themed wrapper classes.
All disabled by default — enable per-skill via app.skills.<name>.enabled=true in application.properties.
| Category | Skill count | Examples |
|---|---|---|
| Dev & Data | 17 | encoder, hashcalc, jsontools, yamltools, csvtools, sqlformatter, difftool, regextester, regexinferrer, markdowntools, dockerfilelint, loganalyzer, httptester, cronvalidator, fakedatagen, randomgen, unitconvert |
| SEO | 6 | metaanalyzer, keywordextractor, sitemapchecker, robotschecker, readability, sluggenerator |
| Marketing | 5 | utmbuilder, subjectanalyzer, charcounter, abtestcalc, hashtagsuggest |
| Security Analyst | 10 | passwordstrength, hibpcheck, jwtinspector, certinspector, headeraudit, dnslookup, cvelookup, hashidentifier, secretsscan, emailvalidator |
| Privacy/Encryption | 3 | piiredactor, exifstripper, encryptionaes |
| Science/Math | 7 | probabilitycalc, matrixops, physicscalc, geodistance, statsbasics, geometrycalc, langdetector |
| Finance | 7 | financecalc, taxcalc, realestatecalc, stockindicators, breakevencalc, depreciationcalc, cashflowforecast |
| Health/Fitness | 5 | bmicalc, macrocalc, pacecalc, heartratezones, medicalunits |
| Productivity | 11 | notes, reminders, timer, clipboardhistory, okrtracker, timezoneconvert, meetingcost, slacalc, pomodoroplanner, flashcardmaker, netinfo |
| Content/Writing | 7 | writingtools, headlineanalyzer, citationformatter, markdownhtml, numberwords, colortools, imagemeta |
| Culinary | 1 | recipescaler |
| Education | 1 | gradecalc |
| File system | 1 | diskscan |
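The per-skill switch described above the table (`app.skills.<name>.enabled`, off unless set to `true`) can be sketched in plain Java. This is only an illustration under stated assumptions: `SkillFlags`, `parse`, and `isEnabled` are hypothetical names, not the project's actual Config classes. Only the property key and the disabled-by-default behaviour come from this README.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not part of mins-bot): reads the per-skill flag
// app.skills.<name>.enabled and treats a missing key as "disabled".
final class SkillFlags {
    private final Properties props;

    SkillFlags(Properties props) {
        this.props = props;
    }

    // Parses properties text, e.g. the contents of application.properties.
    static SkillFlags parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory text
        }
        return new SkillFlags(p);
    }

    // True only when the key exists and its value is "true" (case-insensitive).
    boolean isEnabled(String skill) {
        String key = "app.skills." + skill + ".enabled";
        return Boolean.parseBoolean(props.getProperty(key, "false").trim());
    }

    public static void main(String[] args) {
        SkillFlags flags = parse("app.skills.hashcalc.enabled=true\n");
        System.out.println(flags.isEnabled("hashcalc"));  // true
        System.out.println(flags.isEnabled("jsontools")); // false (no key set)
    }
}
```

In the real application the check is presumably handled by Spring's conditional bean loading rather than a hand-written lookup; the sketch only shows the default-off semantics.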
LLM wrapper classes in com.minsbot.agent.tools.* (each registered as a ToolRouter category):
- `SkillDevTools` → `dev_skills`
- `SkillProductivityTools` → `productivity_skills`
- `SkillSeoMarketingTools` → `seo_marketing_skills`
- `SkillSecurityTools` → `security_skills`
- `SkillProfessionTools` → `profession_skills`
- `SkillDataToolsExtra` → `data_skills_extra`
- `SkillCalcTools` → `calc_skills`
- `SkillExtrasTools` → `extras_skills`
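The category registration listed above, together with the 128-tool API limit noted in the feature list, suggests a selection step roughly like the following. This is a sketch, not the actual `ToolRouter` code: `ToolSelection`, `select`, and the tool names placed in each category are assumptions made for illustration. Only the category names and the 128 cap come from this README.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collects tools from the categories a classifier picked,
// stopping once the per-request tool cap is reached.
final class ToolSelection {
    static final int MAX_TOOLS = 128; // API limit mentioned in this README

    static List<String> select(Map<String, List<String>> toolsByCategory,
                               List<String> chosenCategories,
                               int maxTools) {
        List<String> selected = new ArrayList<>();
        for (String category : chosenCategories) {
            for (String tool : toolsByCategory.getOrDefault(category, List.of())) {
                if (selected.size() >= maxTools) {
                    return selected; // cap reached: remaining tools are dropped
                }
                if (!selected.contains(tool)) {
                    selected.add(tool); // skip tools shared between categories
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Placeholder registry; the tool-to-category assignment is assumed.
        Map<String, List<String>> registry = new LinkedHashMap<>();
        registry.put("security_skills", List.of("passwordstrength", "hibpcheck"));
        registry.put("calc_skills", List.of("financecalc", "taxcalc"));
        // Pretend the classifier chose security_skills for "is my password strong?"
        System.out.println(select(registry, List.of("security_skills"), MAX_TOOLS));
    }
}
```

Categories are visited in the order the classifier returned them, so when the cap is hit the lowest-priority tools are the ones left out.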
The AI classifier routes user queries to the right category automatically (e.g. "is my password strong?" → security_skills,
"what's 5% compound interest over 10 years?" → profession_skills / calc_skills).
- Parallel agents — launch up to 24 concurrent AI agents on isolated missions
- Per-agent model selection — pick GPT-5.4, GPT-4o-mini, Claude, or Gemini per agent
- Agent dashboard — live progress bars, log stream, plan view, status badges
- Download results — export agent output as a Markdown file
- Cancel / remove — cancel running agents or clear finished ones
- Usage metrics — token counts, response times, tool invocation counts
- Module stats — which vision/audio models are active and how many calls they've handled
- Status bar — 2-row bar shows vision engine, audio module, and live counts
Connect the same AI to any combination — all share the same reply logic:
| Platform | Webhook Endpoint | Config Prefix |
|---|---|---|
| Viber | POST /api/viber/webhook | `app.viber.*` |
| Telegram | POST /api/telegram/webhook | `app.telegram.*` |
| Discord | POST /api/discord/interactions | `app.discord.*` |
| Slack | POST /api/slack/events | `app.slack.*` |
| WhatsApp | POST /api/whatsapp/webhook | `app.whatsapp.*` |
| Messenger | POST /api/messenger/webhook | `app.messenger.*` |
| LINE | POST /api/line/webhook | `app.line.*` |
| Teams | POST /api/teams/messages | `app.teams.*` |
| WeChat | POST /api/wechat/webhook | `app.wechat.*` |
| Signal | POST /api/signal/webhook | `app.signal.*` |
All integrations are disabled by default and conditionally loaded — disabled platforms don't consume memory.
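The idea that every platform shares one reply path, and that disabled platforms are never loaded, can be illustrated with a small stand-alone sketch. `MessagingHub`, `register`, and `handle` are hypothetical names; the real project presumably relies on Spring's conditional loading and per-platform controllers rather than a map like this.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.UnaryOperator;

// Hypothetical sketch: every enabled platform forwards incoming text to the
// same reply function; disabled platforms are never registered at all.
final class MessagingHub {
    private final Map<String, UnaryOperator<String>> handlers = new LinkedHashMap<>();
    private final UnaryOperator<String> replyLogic;

    MessagingHub(UnaryOperator<String> replyLogic) {
        this.replyLogic = replyLogic;
    }

    // Registers a platform only when its (assumed) enabled flag is true.
    void register(String platform, boolean enabled) {
        if (enabled) {
            handlers.put(platform, replyLogic);
        }
    }

    // Returns the reply, or empty when the platform was not loaded.
    Optional<String> handle(String platform, String text) {
        UnaryOperator<String> handler = handlers.get(platform);
        return handler == null ? Optional.empty() : Optional.of(handler.apply(text));
    }

    public static void main(String[] args) {
        // The lambda stands in for the shared AI reply logic.
        MessagingHub hub = new MessagingHub(text -> "echo: " + text);
        hub.register("telegram", true);
        hub.register("slack", false);
        System.out.println(hub.handle("telegram", "hi")); // Optional[echo: hi]
        System.out.println(hub.handle("slack", "hi"));    // Optional.empty
    }
}
```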
- Java 17 (JDK 17 or later)
- Maven 3.6+
- Windows, macOS, or Linux
- Node.js 18+ (optional — for Remotion video creation)
- API keys for AI services you want to use (see Configuration)
    git clone https://github.com/ceasis/mins-bot.git
    cd mins-bot

Create a file called application-secrets.properties in the project root (this file is gitignored):

    # Required — at least one AI provider
    spring.ai.openai.api-key=YOUR_OPENAI_API_KEY
    # Optional — Gemini
    gemini.api.key=YOUR_GEMINI_API_KEY
    # Optional — Claude
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY
    # Optional — GitHub
    GITHUB_TOKEN=YOUR_GITHUB_TOKEN
    # Optional — ElevenLabs TTS
    app.elevenlabs.api-key=YOUR_ELEVENLABS_API_KEY
    app.elevenlabs.voice-id=YOUR_VOICE_ID
    # Optional — Email
    spring.mail.host=smtp.gmail.com
    spring.mail.username=YOUR_EMAIL
    spring.mail.password=YOUR_APP_PASSWORD

Build the project:

    mvn clean package -DskipTests

Option A — Maven (recommended)

    mvn spring-boot:run

Option B — Batch script (Windows)

    run.bat

Option C — JAR

    java --add-modules javafx.controls,javafx.web,javafx.fxml \
      --add-opens java.base/java.lang=ALL-UNNAMED \
      -jar target/mins-bot-1.0.0-SNAPSHOT.jar

Option D — Windows Installer

    build-installer.bat

Creates an MSI installer in target/installer/ (requires JDK 17+ and WiX Toolset).
- The chat panel appears on your desktop
- Type a message and press Enter
- Click the microphone for voice input
- Ctrl+K to open command palette
- Toolbar icons: eye (watch), keyboard (control), headphones (listen), brain (autopilot), lightning (proactive)
All configuration lives in src/main/resources/application.properties. Sensitive values go in application-secrets.properties (gitignored).
| Property | Description | Default |
|---|---|---|
| `server.port` | HTTP port | 8765 |
| `app.window.expanded.width` | Chat panel width (px) | 456 |
| `app.window.expanded.height` | Chat panel height (px) | 520 |
| `app.window.always-on-top` | Keep window above all others | true |
| Property | Description | Default |
|---|---|---|
| `spring.ai.openai.chat.options.model` | OpenAI chat model | gpt-5.1 |
| `app.gemini.vision-model` | Gemini vision model | gemini-3-flash-preview |
| `app.gemini.reasoning-model` | Gemini reasoning model | gemini-2.5-pro |
| `app.claude.model` | Claude model | claude-opus-4-6 |
| `app.tool-classifier.model` | Tool routing classifier | gpt-4o-mini |
| Property | Description | Default |
|---|---|---|
| `app.planning.enabled` | Task planning before execution | true |
| `app.autonomous.enabled` | Autonomous mode when idle | true |
| `app.chat.live-screen-on-message` | Auto-capture screen before replies | true |
| `app.proactive.enabled` | Proactive engine (briefings, reminders) | false |
| `app.proactive-action.enabled` | Proactive action mode (auto-act) | false |
| `app.cdp.enabled` | Chrome DevTools Protocol | true |
| `app.tray.enabled` | System tray icon | true |
| Property | Description | Default |
|---|---|---|
| `app.proactive-action.screen-check-seconds` | Screen check interval | 15 |
| `app.proactive-action.task-check-seconds` | Task check interval | 30 |
| `app.proactive-action.directive-check-seconds` | Directive check interval | 60 |
| `app.proactive.check-interval-ms` | Proactive engine check interval | 300000 |
| `app.proactive.quiet-hours-start` | Quiet hours start | 22 |
| `app.proactive.quiet-hours-end` | Quiet hours end | 7 |
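The default quiet-hours window (start 22, end 7) wraps past midnight, so a plain `start <= hour && hour < end` test would never match. The sketch below shows one way to handle the wrap. `QuietHours` is a hypothetical class, and two details are assumptions this README does not state: that the values are hours of the day (0 to 23), and that the start hour is inclusive while the end hour is exclusive.

```java
// Hypothetical helper: decides whether an hour of day falls inside a
// quiet-hours window such as 22 -> 7, which wraps past midnight.
final class QuietHours {
    private final int startHour; // inclusive, 0-23 (assumed semantics)
    private final int endHour;   // exclusive, 0-23 (assumed semantics)

    QuietHours(int startHour, int endHour) {
        if (startHour < 0 || startHour > 23 || endHour < 0 || endHour > 23) {
            throw new IllegalArgumentException("hours must be in 0..23");
        }
        this.startHour = startHour;
        this.endHour = endHour;
    }

    boolean contains(int hour) {
        if (startHour == endHour) {
            return false; // treat equal bounds as "no quiet hours"
        }
        if (startHour < endHour) {
            return hour >= startHour && hour < endHour; // same-day window
        }
        return hour >= startHour || hour < endHour; // window wraps midnight
    }

    public static void main(String[] args) {
        QuietHours quiet = new QuietHours(22, 7); // defaults from the table
        System.out.println(quiet.contains(23)); // true
        System.out.println(quiet.contains(6));  // true
        System.out.println(quiet.contains(7));  // false
        System.out.println(quiet.contains(12)); // false
    }
}
```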
| Shortcut | Action |
|---|---|
| `Ctrl+K` | Command palette |
| `Ctrl+/` | Shortcuts help |
| `Ctrl+F` | Search chat messages |
| `Ctrl+L` | Clear chat |
| `Enter` | Send message |
| `Arrow Up/Down` | Input history |
| `Esc` | Close overlay/palette |
| Document | Description |
|---|---|
| docs/SETUP.md | Complete setup guide — prerequisites, env vars, platform setup, troubleshooting |
| docs/TOOLS.md | Full tool reference — 100+ tools organized by category |
| CLAUDE.md | Developer context for AI assistants |
mins-bot/
├── pom.xml
├── LICENSE
├── CLAUDE.md # AI developer context
├── run.bat / build-installer.bat # Windows launch scripts
├── docs/
│ ├── SETUP.md # Full setup guide
│ └── TOOLS.md # Tool reference (100+ tools)
│
├── src/main/java/com/minsbot/
│ ├── FloatingAppLauncher.java # JavaFX entry point
│ ├── MinsbotApplication.java # Spring Boot entry point
│ ├── ChatService.java # Core agent loop & AI orchestration
│ ├── ChatController.java # REST API endpoints
│ │
│ ├── agent/
│ │ ├── ProactiveActionService.java # Jarvis-like auto-action engine
│ │ ├── ProactiveEngineService.java # Briefings, reminders, nudges
│ │ ├── EpisodicMemoryService.java # Life event memory system
│ │ ├── AutoMemoryExtractor.java # Auto-detect life facts from chat
│ │ ├── ScreenStateService.java # Screen capture + AI analysis
│ │ ├── SystemContextProvider.java # System prompt builder
│ │ └── tools/ # 100+ tool implementations
│ │     ├── ToolRouter.java # Dynamic tool selection
│ │     ├── ToolClassifierService.java # AI-powered tool categorization
│ │     ├── HealthTrackerTools.java # Health logging & trends
│ │     ├── FinanceTrackerTools.java # Expense/budget tracking
│ │     ├── GitHubTools.java # GitHub API (18 tools)
│ │     ├── RemotionVideoTools.java # Programmatic video creation
│ │     ├── LifeProfileTools.java # Personal life profile
│ │     ├── EpisodicMemoryTools.java # Memory recall & search
│ │     └── ...
│ │
│ ├── skills/ # 81 pluggable skill packages (Config/Service/Controller each)
│ │ ├── encoder/ hashcalc/ jsontools/ regextester/ randomgen/
│ │ ├── unitconvert/ notes/ reminders/ timer/ clipboardhistory/
│ │ ├── netinfo/ metaanalyzer/ keywordextractor/ sitemapchecker/
│ │ ├── robotschecker/ readability/ sluggenerator/ utmbuilder/
│ │ ├── subjectanalyzer/ charcounter/ abtestcalc/ hashtagsuggest/
│ │ ├── passwordstrength/ hibpcheck/ jwtinspector/ certinspector/
│ │ ├── headeraudit/ dnslookup/ cvelookup/ hashidentifier/
│ │ ├── secretsscan/ emailvalidator/ colortools/ imagemeta/
│ │ ├── writingtools/ citationformatter/ statsbasics/ financecalc/
│ │ ├── taxcalc/ realestatecalc/ stockindicators/ meetingcost/
│ │ ├── timezoneconvert/ slacalc/ bmicalc/ medicalunits/
│ │ ├── recipescaler/ geometrycalc/ langdetector/ gradecalc/
│ │ ├── okrtracker/ cronvalidator/ csvtools/ difftool/
│ │ ├── yamltools/ sqlformatter/ markdowntools/ dockerfilelint/
│ │ ├── probabilitycalc/ matrixops/ physicscalc/ geodistance/
│ │ ├── breakevencalc/ depreciationcalc/ cashflowforecast/
│ │ ├── macrocalc/ pacecalc/ heartratezones/ headlineanalyzer/
│ │ ├── markdownhtml/ numberwords/ piiredactor/ exifstripper/
│ │ ├── encryptionaes/ fakedatagen/ flashcardmaker/ loganalyzer/
│ │ ├── pomodoroplanner/ httptester/ regexinferrer/ diskscan/
│ │ └── ...
│ └── [Platform]*.java # 10 messaging integrations
│
├── src/main/resources/
│ ├── application.properties
│ └── static/ # Frontend (HTML/CSS/JS)
│
├── src/test/java/ # 51 unit tests
│ └── com/minsbot/agent/
│     ├── EpisodicMemoryServiceTest.java
│     └── tools/
│         ├── HealthTrackerToolsTest.java
│         ├── FinanceTrackerToolsTest.java
│         └── LifeProfileToolsTest.java
│
└── ~/mins_bot_data/ # Persistent data (user home)
    ├── personal_config.txt
    ├── life_profile.txt
    ├── directives.txt
    ├── health/ # Health logs
    ├── finance/ # Finance logs
    ├── episodic_memory/ # Life event JSONs
    ├── knowledge_base/ # Uploaded documents
    ├── remotion/ # Video project
    └── videos/ # Rendered videos
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/chat` | Send a message, get a reply |
| `GET` | `/api/chat/history` | Load recent chat history |
| `GET` | `/api/chat/status` | Poll tool execution updates |
| `POST` | `/api/chat/clear` | Clear memory and transcript |
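A request to the chat endpoint can be built with the JDK's `java.net.http` client. The sketch below is hedged heavily: this README does not document the request body, so the JSON shape `{"message": "..."}` is an assumption, as are the `ChatRequests` class and its method names. The port 8765 is the documented default for `server.port`.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical client-side helper; the JSON body shape is an assumption,
// since this README does not specify what /api/chat expects.
final class ChatRequests {

    // Builds (but does not send) a POST request to <baseUrl>/api/chat.
    static HttpRequest chat(String baseUrl, String message) {
        String json = "{\"message\":\"" + escape(message) + "\"}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/chat"))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Minimal JSON string escaping: backslashes and double quotes only.
    // Control characters such as newlines are not handled here.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        HttpRequest request = chat("http://localhost:8765", "hello");
        System.out.println(request.method() + " " + request.uri());
        // To actually send it (requires the app to be running):
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

For anything beyond a quick experiment, a JSON library would be a safer way to build the body than string concatenation.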
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/proactive-action/toggle` | Toggle proactive action mode |
| `GET` | `/api/status/proactive-action` | Proactive action status |
| `POST` | `/api/autopilot/toggle` | Toggle auto-pilot |
| `GET` | `/api/status/autopilot` | Auto-pilot status |
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/health` | System health check |
| `GET` | `/api/version` | App version |
| `POST` | `/api/briefing` | Generate daily briefing |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/kb/upload` | Upload document |
| `GET` | `/api/kb/list` | List documents |
| `GET` | `/api/kb/read/{name}` | Read document |
| `DELETE` | `/api/kb/{name}` | Delete document |
- Music control — play/pause/skip, volume via media keys (any player)
- Video downloader — yt-dlp wrapper (YouTube, TikTok, 1000+ sites)
- Clipboard history — 200-entry rolling history, AI-searchable
- Auto-pilot mode — proactive screen-watching suggestions with TTS
- Habit & learning layer — habit detection, feedback loop, skill auto-creation
- Social monitor — contacts, birthdays, posts, mentions
- Trend scout — YouTube/web interest monitoring
- Mobile access — server binds to 0.0.0.0, responsive CSS
- Auto-memory extraction — life facts auto-saved from conversations
- JavaScript injection — CDP-based instant form fill (~10ms)
- Bot window self-control — "move yourself to the left"
- Per-agent model selection — dropdown per background agent
- Agent output download — export agent result as Markdown
- 2-row status bar — vision/audio module info always visible
- Dashboard tab — token usage, tool counts, uptime
- Multi-agent chat — multiple AI personas collaborating
- Automations tab — trigger/action rule engine
- Code audit tools — git clone + vulnerability scan
- Smart home integration — Home Assistant / MQTT for lights, thermostat, locks, cameras
- Contact CRM — relationship tracking with last interaction, birthday alerts, gift ideas
- Daily briefing dashboard — visual home screen with weather, calendar, tasks, health, budget
- Sidebar navigation — collapsible icon sidebar replacing the horizontal tab bar
- Rich message cards — structured cards for health logs, finance, weather, bills
- Wake word detection — always-listening "Hey Mins" trigger
- Workflow builder — visual drag-and-drop automation chains
- Location awareness — GPS/IP-based triggers ("you're near the grocery store")
- Subscription tracker — track all subscriptions, total cost, renewal reminders
- Docker management — list/start/stop containers, view logs
- Plugin marketplace — community-created skills and tools
- Multi-device sync — seamless handoff between desktop, phone, and wearables
- Voice cloning — custom Jarvis voice via voice training
- AR/camera integration — point phone camera for real-world AI assistance
- Offline mode — local LLM fallback when internet is unavailable
| Problem | Solution |
|---|---|
| `no jfxwebkit in java.library.path` | Use mvn spring-boot:run or add -Djava.library.path=target/javafx-natives |
| `JavaFX runtime components are missing` | Add --add-modules javafx.controls,javafx.web,javafx.fxml to VM options |
| Window doesn't appear | Check port 8765 is free; try http://localhost:8765/ in a browser |
| Circular dependency on startup | Check for @Lazy annotations on ToolRouter injections |
| Voice not working | Try opening http://localhost:8765/ in Chrome instead of the JavaFX window |
| Messaging webhook not receiving | Ensure HTTPS (ngrok) and correct webhook URL |
| Ollama models not loading | Install from ollama.com and pull your model |
| GitHub tools not working | Set GITHUB_TOKEN environment variable with a Personal Access Token |
| Remotion render fails | Ensure Node.js 18+ is installed; run setupRemotion first |
This project is licensed under the GNU Affero General Public License v3.0 with the Commons Clause license condition.
You may:
- Use, view, and modify the source code
- Distribute copies under the same license terms
- Use it for personal and internal purposes
You may not:
- Sell the software or offer it as a paid hosted service
- Distribute modified versions without sharing the source code
- Remove license or copyright notices
See LICENSE for the full text.