🤖 J.A.R.V.I.S

Just A Rather Very Intelligent System

A real-time AI voice assistant powered by Google Gemini Live — with 30+ tools, forever memory, autonomous AI agents, self-evolving skills, and an Iron Man HUD.

Talk to your computer like Tony Stark talks to JARVIS.

JARVIS listens through your microphone, thinks with Gemini, controls your PC with 30+ tools, remembers everything forever, spawns autonomous AI agents, evolves new skills on the fly — and responds with Paul Bettany's calm British wit — all in a single python main.py.

What is JARVIS?

JARVIS is a voice-controlled AI assistant inspired by Tony Stark's companion from the Iron Man films. It uses Google's Gemini 2.5 Flash Native Audio for real-time bidirectional voice conversation over WebSocket — you talk, JARVIS listens, thinks, acts, and responds instantly with voice.

Key Features

Feature	Description
Real-Time Voice	Gemini 2.5 Flash native audio — talk naturally, JARVIS responds instantly in the "Charon" voice
Bilingual	English and Hebrew — JARVIS auto-detects and responds in the language you speak
30+ Tools	Launch apps, browse the web, control mouse/keyboard, manage files, run code, set reminders, and much more
Forever Memory	SQLite + ChromaDB vector store — JARVIS remembers your name, preferences, and past conversations across sessions
Autonomous Agents	Spawn AI-powered agents with their own Gemini brain — they see the screen, think, act, and report back
Self-Evolution	JARVIS can create new tools for itself at runtime, or install skills from GitHub repos
Iron Man HUD	Tkinter GUI with animated waveform, system gauges, conversation log, Mission Control panel
Purchase Protection	Financial transactions require explicit user approval — JARVIS will never spend money without asking
Pure Python	Single command to run — no Node.js, no Docker, no complex infrastructure

Quick Start

1. Clone the repository

git clone https://github.com/adirangel/JARVIS.git
cd JARVIS

2. Run setup (installs all dependencies + Playwright browsers)

python setup.py

3. Launch JARVIS

python main.py

On first launch, JARVIS will display an API key entry screen. Enter your Gemini API key (get one free at ai.google.dev). The key is saved locally in config/api_keys.json and never leaves your machine.

That's it — start talking!

How It Works

JARVIS runs entirely on your local machine, with Google Gemini handling the AI processing in the cloud. Here's the complete flow from your voice to JARVIS's response:

                         YOUR COMPUTER                              CLOUD
                    ┌─────────────────────────────┐          ┌──────────────┐
                    │                             │          │              │
  🎤 You speak ──► │  PyAudio captures mic       │          │              │
                    │  (16kHz audio stream)       │          │              │
                    │         │                   │          │              │
                    │         ▼                   │          │              │
                    │  Send audio via WebSocket ──┼────────► │  Gemini 2.5  │
                    │                             │          │  Flash Live  │
                    │                             │          │  (Native     │
                    │  Receive response ◄─────────┼──────────│   Audio)     │
                    │    ├── Audio (24kHz voice)   │          │              │
                    │    ├── Text transcription    │          │              │
                    │    └── Tool calls            │          │              │
                    │         │                   │          └──────────────┘
                    │         ▼                   │
                    │  ┌─ Audio → Speakers 🔊     │
                    │  ├─ Text  → GUI log         │
                    │  └─ Tools → Local execution │
                    │         │                   │
                    │         ▼                   │
                    │  Save to Memory             │
                    │  (SQLite + ChromaDB)         │
                    └─────────────────────────────┘

Step by Step

Capture — PyAudio captures your microphone input at 16kHz in real-time
Stream — Audio is streamed to Gemini 2.5 Flash Live API via WebSocket (with echo suppression to prevent JARVIS from hearing itself)
Process — Gemini processes your speech and returns three things simultaneously:
- Audio response (24kHz) — JARVIS's voice reply
- Text transcription — what you said and what JARVIS says (displayed in GUI)
- Tool calls — if JARVIS needs to take action (open an app, search the web, etc.)
Execute tools — Tool calls are dispatched to local Python functions that control your PC
Play audio — JARVIS's voice response plays through your speakers
Save memory — The conversation is saved to SQLite + ChromaDB for long-term recall

Threading Model

JARVIS uses two threads:

Main thread — Tkinter GUI (Iron Man HUD)
Daemon thread — GeminiLive engine with 4 concurrent async tasks:
- _listen_audio — captures mic input
- _send_realtime — streams audio to Gemini
- _receive_audio — handles Gemini's audio/text/tool responses
- _play_audio — plays JARVIS's voice through speakers

The Iron Man HUD

JARVIS comes with a full Tkinter-based Iron Man HUD with a 3-column layout:

Panel	Contents
Left — System Specs	CPU, RAM, Disk, GPU, Network gauges with color coding (green → yellow → red) + uptime counter
Center — Main View	Animated waveform/audio visualizer, JARVIS status indicator (ONLINE / LISTENING / SPEAKING / PROCESSING), scrolling conversation log with Hebrew BiDi support
Right — Mission Control	Activity log, currently active tools, tasks & reminders list, sub-agent status with live progress
Bottom	Text input bar for typed commands (when you can't talk)

Color scheme: dark background (#060a13) with cyan/electric blue accents (#00e5ff), matching the Iron Man aesthetic.

On first run, an animated arc-reactor API key entry screen appears.

Tools (30+)

JARVIS can control your computer through 30+ built-in tools. When you ask JARVIS to do something, Gemini automatically decides which tool to call.

Computer & Apps (6)

Tool	What It Does
`open_app`	Launch any application — Chrome, Spotify, VS Code, Notepad, etc.
`computer_settings`	Volume, brightness, minimize/maximize windows, scroll, zoom, screenshot, lock, restart, shutdown, hotkeys, dark mode, WiFi toggle
`computer_control`	Direct mouse & keyboard control — click, type, drag, press keys, hotkeys, scroll
`window_manager`	List, switch, snap, minimize, or close windows
`process_manager`	List running processes, kill processes, check if an app is running
`clipboard_manager`	Read from or write to the system clipboard

Web & Information (4)

Tool	What It Does
`web_search`	Search the web via DuckDuckGo — returns titles + snippets
`browser_control`	Full browser automation using your real Chrome/Edge — navigate, click, type, fill forms, take screenshots with AI vision analysis, press keys, use hotkeys, manage tabs
`weather_report`	Current weather for any city (Open-Meteo API) — temperature, humidity, wind
`news`	Get latest news headlines by topic

Productivity (6)

Tool	What It Does
`reminder`	Set, list, and complete persistent reminders (saved to SQLite)
`task_manager`	Add, list, complete, and delete tasks (saved to SQLite)
`notes`	Save, list, search, and delete persistent notes
`timer`	Set countdown timers with notification
`daily_briefing`	Morning report — time, weather, system stats, tasks, reminders, news
`get_current_time`	Current date/time in any timezone (maps city names automatically)

Files & Code (3)

Tool	What It Does
`file_controller`	List, create, delete, move, copy, rename, read, write, search files + create directories
`code_helper`	Write, edit, read, explain, and run code files (with visible terminal output)
`cmd_control`	Execute shell commands (CMD/PowerShell), optionally in a visible terminal

Media & Utilities (3)

Tool	What It Does
`youtube_video`	Play YouTube videos by search query or show trending videos
`screen_process`	Take a screenshot and analyze what's on screen using vision
`translate`	Multi-language translation (Hebrew, English, and more)

System (1)

Tool	What It Does
`system_status`	CPU, RAM, and disk usage — formatted for speech

Agent Management (6)

Tool	What It Does
`spawn_agent`	Create a background agent (types: command, script, monitor, autonomous, research, tool, multi-step)
`agent_status`	Check the status of one or all running agents
`agent_result`	Get the full result from a completed agent
`stop_agent`	Stop a running agent
`agent_message`	Send messages between agents or broadcast to all
`remove_agent`	Remove a stopped or completed agent from the registry

Self-Evolution (2)

Tool	What It Does
`skill_manager`	Create new tools at runtime, install skills from GitHub repos or URLs, list/run/remove skills
`purchase_approval`	Request/confirm user approval before any financial transaction

Memory System

JARVIS has a forever memory — it remembers your name, preferences, and past conversations across sessions. Memory is stored locally on your machine and never shared.

How Memory Works

Conversation happens
        │
        ▼
┌─────────────────────┐
│  Every 3rd turn:    │
│  LLM Fact Extraction│
│                     │
│  Stage 1 (cheap):   │
│  "Does this contain │
│   personal info?"   │
│   → YES / NO        │
│                     │
│  Stage 2 (if YES):  │
│  Extract facts as   │
│  {key, value,       │
│   confidence} JSON  │
└─────────┬───────────┘
          │
     ┌────┴────┐
     ▼         ▼
  SQLite    ChromaDB
  (facts,   (vector
   tasks,    embeddings
   profile)  for semantic
             search)

Storage Layers

Layer	Technology	What It Stores
Facts	SQLite	Key-value facts about you (name, preferences, etc.) with confidence scoring
Messages	SQLite	Full conversation history with timestamps
User Profile	SQLite	Personal info with confidence boosting — repeated mentions increase confidence
Tasks & Reminders	SQLite	Persistent to-do items and timed reminders
Vector Search	ChromaDB	Embedded conversation chunks for semantic search — find relevant past conversations by meaning

Relevance Scoring

When JARVIS recalls memories, results are ranked using:

$$\text{score} = \text{cosine similarity} \times (1 + 0.3 \times \text{recency}) \times (1 + 0.1 \times \text{access count})$$

Where recency = $e^{-0.01 \times \text{days old}}$ (half-life ~70 days). Recent and frequently accessed memories rank higher.

Memory Consolidation

Old conversations are automatically consolidated:

Daily summaries — group messages by day, LLM summarizes key facts
Weekly/Monthly — further consolidation over time
Summaries are embedded into ChromaDB for future semantic search

Agent System

JARVIS can spawn autonomous background agents that run in separate threads. Agents range from simple command runners to fully autonomous AI-powered browser agents with their own Gemini brain.

Agent Types

Type	Description	Example
`command`	Run a single shell command	"Run a disk cleanup"
`script`	Execute a Python/batch script	"Run my backup script"
`monitor`	Periodically check something (supports JARVIS tools like screenshot)	"Monitor CPU and alert if > 90%"
`autonomous`	AI-powered agent with its own Gemini brain — sees the screen, thinks, acts, reports	"Chat with Grok about AI safety"
`research`	Multi-step web research	"Research the latest Python 3.13 features"
`tool`	Call a JARVIS tool	"Download this file in the background"
`multi_step`	Execute a sequence of steps	"Create a project folder, init git, and add a README"

Autonomous Agents

The most powerful agent type. Each autonomous agent:

Screenshots the screen using PIL
Thinks by sending the screenshot to its own Gemini 2.5 Flash API call
Acts by executing browser actions (click, type, press keys, scroll, navigate)
Reports interim updates to JARVIS after every iteration
Loops until the goal is achieved or max iterations reached

This means JARVIS can send an agent to have a full conversation with another AI (Grok, ChatGPT, Claude), fill out complex forms, or perform any multi-step browser task — all running autonomously in the background while JARVIS continues talking to you.

Agents report results back to JARVIS, which can relay them to you. Agent state is persisted in SQLite, so they survive restarts.

Self-Evolution (Skills)

JARVIS can create new tools for itself at runtime — no restart needed. This is the self-evolution system.

How It Works

Create: Tell JARVIS to create a new skill. It writes a Python module with the tool contract (TOOL_NAME, TOOL_DESC, TOOL_PARAMS, execute()).
Install from GitHub: Give JARVIS a GitHub repo URL. It clones the repo, scans for valid skill files (.py or SKILL.md), and installs them.
Markdown Skills: Supports SKILL.md files with YAML frontmatter — auto-converted into callable tools that return workflow instructions.
Persistence: Skills created mid-session are available immediately. After restart, they become first-class tools.

Example

"JARVIS, create a skill that converts temperatures between Celsius and Fahrenheit"

JARVIS writes the Python code, registers the tool, and it's immediately available for use.

Security

JARVIS includes several security measures to protect against misuse:

Protection	Description
Purchase Protection	Financial transactions require explicit user approval via `purchase_approval` tool
Command Blocklist	Dangerous shell commands (format, mass delete, registry tampering, credential theft, privilege escalation) are blocked
PowerShell Injection Prevention	All values interpolated into PowerShell commands are sanitized
File Path Validation	File operations block access to system-protected paths (System32, etc.)
Input Sanitization	App launcher blocks shell metacharacters; window manager sanitizes targets
Skill Security Checks	Skills installed from the internet are scanned for dangerous patterns (`eval`, `exec`, `shell=True`) with warnings
API Keys Local Only	All credentials stored locally in gitignored files — never committed or transmitted
No Network Exposure	JARVIS runs as a local desktop app with no open ports or web server

Project Structure

JARVIS/
├── main.py               # Entry point — GUI + Gemini Live on daemon thread
├── gemini_live.py         # Core engine — 4 async tasks over WebSocket
├── gui.py                 # Tkinter Iron Man HUD (3-column layout)
├── tool_registry.py       # 30+ tool declarations + dispatcher
├── setup.py               # One-command installer
├── requirements.txt       # Python dependencies
│
├── agent/
│   ├── agent_manager.py   # AgentManager — spawn/manage background + autonomous agents
│   └── personality.py     # JARVIS personality prompt (Paul Bettany style)
│
├── memory/
│   ├── long_term.py       # Long-term memory API
│   ├── manager.py         # Memory manager (2-stage fact extraction + consolidation)
│   ├── sqlite_store.py    # SQLite storage (facts, messages, profile, tasks, reminders)
│   └── vector_store.py    # ChromaDB vector search (cosine + recency + access boosting)
│
├── tools/
│   ├── browser_control.py # Browser automation (real Chrome/Edge via keyboard/mouse + AI vision)
│   ├── computer_control.py# Mouse & keyboard control (pyautogui)
│   └── system_monitor.py  # System stats (psutil)
│
├── skills/                # Self-evolution skill framework
│   ├── base.py            # Skill contract definition
│   ├── loader.py          # Dynamic skill loader (importlib)
│   └── manager.py         # Skill create/install/list/run/remove operations
│
├── config/                # API keys (gitignored)
├── data/                  # SQLite + ChromaDB data (gitignored)
└── log/                   # Daily logs with 7-day rotation (gitignored)

Configuration

API Key

On first launch, JARVIS prompts for your Gemini API key through the GUI. The key is stored locally in config/api_keys.json (gitignored — never committed).

You can get a free Gemini API key at ai.google.dev.

Requirements

Requirement	Details
Python	3.10 or higher
OS	Windows (primary target — macOS/Linux may need tweaks for some tools)
API Key	Gemini API key (free tier available at ai.google.dev)
Hardware	Microphone + Speakers (or headset)

Python Dependencies

All installed automatically by python setup.py:

google-genai    pyaudio       pyautogui     pynput        keyboard
mouse           chromadb      numpy         loguru
psutil          httpx         Pillow        python-bidi

Contributing

Contributions are welcome! Here's how to get started:

Fork the repository
Create a feature branch (git checkout -b feature/my-feature)
Make your changes
Test locally with python main.py
Commit and push (git push origin feature/my-feature)
Open a Pull Request

Ideas for Contributions

New tools (Spotify control, calendar integration, email, etc.)
macOS / Linux compatibility fixes
New GUI themes or layouts
Skills framework plugins
Test coverage for the current architecture

Sponsor

If you find JARVIS useful, consider sponsoring the project to support development:

Your support helps keep JARVIS actively maintained and growing with new features.

License

MIT — see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
.github		.github
agent		agent
memory		memory
skills		skills
tests		tests
tools		tools
.gitignore		.gitignore
README.md		README.md
gemini_live.py		gemini_live.py
gui.py		gui.py
main.py		main.py
requirements.txt		requirements.txt
setup.py		setup.py
tool_registry.py		tool_registry.py

Folders and files

Latest commit

History

Repository files navigation

🤖 J.A.R.V.I.S

Table of Contents

What is JARVIS?

Key Features

Quick Start

1. Clone the repository

2. Run setup (installs all dependencies + Playwright browsers)

3. Launch JARVIS

How It Works

Step by Step

Threading Model

The Iron Man HUD

Tools (30+)

Computer & Apps (6)

Web & Information (4)

Productivity (6)

Files & Code (3)

Media & Utilities (3)

System (1)

Agent Management (6)

Self-Evolution (2)

Memory System

How Memory Works

Storage Layers

Relevance Scoring

Memory Consolidation

Agent System

Agent Types

Autonomous Agents

Self-Evolution (Skills)

How It Works

Example

Security

Project Structure

Configuration

API Key

Requirements

Python Dependencies

Contributing

Ideas for Contributions

Sponsor

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages