LocalMind

A private AI research agent that runs entirely inside your browser. Tool calling, persistent memory, web search, multimodal input — all on-device via WebGPU. No server, no API keys required, no data leaving your device.

Try it live

What it does

LocalMind runs Google's Gemma models directly in your browser tab using WebGPU. Models download once, are cached locally, and run offline from that point on. Your conversations, reasoning, and memories never leave your device. Only web search queries touch the network — and only when you explicitly choose to.

Three models, your choice:

| Model | Size | Capabilities | Best for |
|---|---|---|---|
| Gemma 3 1B (default) | ~760 MB | Text chat | Quick everyday chat |
| Gemma 4 E2B | ~1.5 GB | Text + image + audio + agent | Multimodal on any device |
| Gemma 4 E4B | ~4.9 GB | Text + image + audio + agent | Best quality, needs more VRAM |
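Choosing between the models above comes down to device memory and whether you need multimodal input. A hypothetical selection heuristic (not LocalMind's actual logic) might look like:

```javascript
// Hypothetical heuristic, not LocalMind's actual logic: pick a default model
// from the approximate download sizes in the table above.
function pickDefaultModel(deviceMemoryGB, wantsMultimodal) {
  if (!wantsMultimodal) return "Gemma 3 1B";     // ~760 MB, text only
  if (deviceMemoryGB >= 8) return "Gemma 4 E4B"; // ~4.9 GB, best quality
  return "Gemma 4 E2B";                          // ~1.5 GB, multimodal on any device
}
```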

Agent tools (Gemma 4 models)

Gemma 4 models have built-in tool calling. The model decides when to use tools based on your question.

| Tool | What it does |
|---|---|
| `calculate` | Arithmetic, percentages, unit conversions |
| `get_current_time` | Date and time with timezone support |
| `store_memory` | Save facts to persistent memory (IndexedDB + embeddings) |
| `search_memory` | Semantic search over stored memories |
| `list_memories` | Show what's stored, grouped by category |
| `delete_memory` | Forget specific memories by query |
| `set_reminder` | Browser notification after N minutes |
| `web_search` | Search the web via Brave, Tavily, or SearXNG (BYOK) |
| `fetch_page` | Fetch and read a URL's content with Readability.js extraction |

Translation: Gemma 4 supports 140+ languages natively — just ask it to translate. No separate model needed.
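When the model decides to call a tool, the app has to route the call to a handler and feed the result back. A minimal dispatch sketch is below; the `{ tool, args }` call shape and the registry are illustrative assumptions, not LocalMind's actual wire format.

```javascript
// Illustrative tool registry and dispatcher; tool names mirror the table above,
// but the { tool, args } call shape is an assumption, not LocalMind's schema.
const tools = {
  calculate: ({ expression }) => {
    // Regex-gated arithmetic: only digits, whitespace, + - * / ( ) . allowed,
    // so the Function constructor never sees arbitrary JavaScript.
    if (!/^[\d\s+\-*/().]+$/.test(expression)) throw new Error("unsupported expression");
    return Function(`"use strict"; return (${expression});`)();
  },
  get_current_time: ({ timezone }) =>
    new Date().toLocaleString("en-US", { timeZone: timezone }),
};

function dispatch(call) {
  const fn = tools[call.tool];
  if (!fn) return { error: `unknown tool: ${call.tool}` };
  try {
    return { result: fn(call.args) };
  } catch (e) {
    return { error: e.message };
  }
}
```

The returned `{ result }` or `{ error }` object would then be appended to the conversation so the model can compose its final answer.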

Persistent memory (RAG)

LocalMind remembers across sessions. Powered by a local RAG pipeline:

  • MiniLM embeddings (~23 MB, runs on CPU alongside the main model)
  • IndexedDB vector store with cosine similarity search
  • Document upload — PDF, DOCX, .txt, .md, .json, .csv — text extracted, chunked, and stored as searchable knowledge
  • Folder ingestion — click "Folder" to open a local directory via the File System Access API; all .md, .txt, .pdf, .docx files are recursively ingested. Re-open the same folder to sync only changed files (fingerprint-based, size + last-modified)
  • Auto-summarize on upload — documents are summarized on ingestion for quick retrieval
  • Post-session summarization — conversations are summarized and stored when you start a new chat
  • Memory browser — click "Memory" to open the browser panel:
    • Filter by category (fact, preference, finding, document, doc summary, conversation) with live chunk counts per pill
    • Document chunks grouped by source file with a bulk "Delete all per source" button — essential after folder ingestion
    • Relative timestamps ("2h ago", "3d ago") and coloured category badges
  • Memory audit — "Audit" button in the memory panel flags three issue types:
    • Stale — chunks older than 60 days
    • Near-duplicate — pairs with cosine similarity ≥ 0.92 within the same category (keeps one, flags the other)
    • Outlier — chunks whose average similarity to category peers is < 0.20 (requires ≥ 5 members in category)
    • Each group has a "Delete all" button; individual deletes rerun the audit automatically; green pass when nothing is flagged
  • Export / Import — download all data (memories, conversations, profile) as JSON, or import from a previous export
  • Auto-backup — optional setting to auto-download a backup on every New Chat
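The near-duplicate check in the audit can be sketched as a pairwise pass over embedding vectors using the thresholds listed above. This is a simplified illustration, not the app's exact implementation:

```javascript
// Simplified near-duplicate audit: flag same-category chunk pairs whose
// embedding cosine similarity is >= 0.92 (the threshold from the list above).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function findNearDuplicates(chunks, threshold = 0.92) {
  const flagged = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      if (chunks[i].category !== chunks[j].category) continue;
      if (cosine(chunks[i].vec, chunks[j].vec) >= threshold) {
        flagged.push([chunks[i].id, chunks[j].id]); // keep one, flag the other
      }
    }
  }
  return flagged;
}
```

The stale and outlier checks work the same way: compare each chunk's age against the 60-day cutoff, or its mean similarity to category peers against the 0.20 floor.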

Every search result and fetched page is cached in the RAG index. The context window stays fixed. The accessible knowledge grows without limit.
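The fingerprint-based folder sync described above only re-ingests files whose size or last-modified time changed. A sketch of that change detection, with an assumed fingerprint format rather than LocalMind's exact scheme:

```javascript
// Change detection for folder re-sync: a file is re-ingested only when its
// fingerprint (size + last-modified) differs from the stored one.
// The "size:lastModified" string format is an assumption for illustration.
function fingerprint(file) {
  return `${file.size}:${file.lastModified}`;
}

function filesToIngest(files, storedFingerprints) {
  return files.filter((f) => storedFingerprints.get(f.name) !== fingerprint(f));
}
```

On a real sync the `files` array would come from recursively walking the directory handle returned by the File System Access API.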

Conversation history

  • New Chat — archives the current conversation to History, then starts fresh
  • Clear — deletes the current conversation without saving
  • History sidebar — slides in from the left showing past conversations sorted by date. Click any to resume, or delete individual entries.

Conversations are automatically summarized and embedded into the RAG index when archived, so the model can recall past discussions.

Web search (BYOK)

Open Settings, pick a provider, enter your API key. The key stays in your browser's localStorage and is sent directly from your device to the search provider.

| Provider | Free tier | Best for |
|---|---|---|
| Tavily | 1,000 credits/month, no card | Lowest barrier, AI-optimized results |
| Brave Search | $5/month credit | Privacy-first, independent index |
| SearXNG | Free (self-hosted) | Maximum privacy, no corporate entity |

Two send buttons: Send (offline, no network) and Search+Send (globe icon, web-enriched). The globe button only appears when a provider is configured.

Every response shows a transparency badge: On-device (pure local), Agent (tools used), or Web-enriched (search results with clickable source links).
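Because the key is read from localStorage and sent straight from the browser, the search call is just a `fetch` to the provider. The sketch below uses an injectable fetch and a placeholder endpoint shape; it is not any provider's real API:

```javascript
// Generic BYOK search call. The endpoint and request body shape below are
// placeholders for illustration, not a real provider's API. fetchImpl is
// injectable so the function can be exercised without a network.
async function webSearch(query, { endpoint, apiKey }, fetchImpl = fetch) {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return res.json();
}
```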

Multimodal input (Gemma 4 models)

  • Attach — images, audio, MP4 video, or documents (PDF, DOCX, .txt, .md, .json, .csv)
  • Camera — snap a photo with your webcam
  • Mic — record a voice clip
  • Paste — Ctrl/Cmd+V an image from clipboard
  • Drag and drop — drop files onto the chat

Documents are extracted, chunked, embedded, and auto-summarized on upload. Video is experimental — keyframes and audio are extracted separately.
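The "extracted, chunked, embedded" step can be illustrated with a naive fixed-size chunker with overlap; the sizes here are arbitrary, not LocalMind's actual parameters:

```javascript
// Naive fixed-size chunker with overlap, illustrating the chunking step.
// 500 chars with 50-char overlap are arbitrary example values.
function chunkText(text, size = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

Each chunk would then be embedded with MiniLM and written to the IndexedDB vector store.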

Batch prompts

Click Batch in the toolbar to open the batch panel. Enter one prompt per line and click Run — each prompt is sent sequentially through the full agent loop (including tool calls and web search if configured), with results appearing in the main chat as normal messages.

Chaining — two modes, combinable:

| Mode | How |
|---|---|
| Explicit `{{previous}}` | Write `{{previous}}` anywhere in a prompt — it's substituted with the full text of the previous response before sending |
| Auto-inject | Checkbox (on by default) — if a prompt has no `{{previous}}`, the previous response is appended as `[Previous response for context: …]` automatically |

Stop halts the run after the current generation completes (never mid-stream). Progress is shown live (2 / 5).

Example pipeline:

Summarise the history of the Suez Canal
Now extract the 5 most important dates from this: {{previous}}
Translate that list to Hindi: {{previous}}
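The two chaining modes reduce to a small substitution rule, sketched below as an assumed simplification of the app's behavior:

```javascript
// Prompt chaining per the table above: explicit {{previous}} substitution,
// with auto-inject as a fallback when the checkbox is enabled.
function buildPrompt(template, previous, autoInject = true) {
  if (template.includes("{{previous}}")) {
    return template.replaceAll("{{previous}}", previous ?? "");
  }
  if (autoInject && previous) {
    return `${template}\n\n[Previous response for context: ${previous}]`;
  }
  return template;
}
```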

Sharing conversations

Click Share in the toolbar to generate a shareable link for the current conversation:

  • Plain link — conversation is base64-encoded into the URL fragment (#lm:…). No server involved.
  • Encrypted link — AES-256-GCM with PBKDF2 key derivation (200k rounds). Encoded as #lme:<salt>.<iv>.<ciphertext>. Only someone with the passphrase can read it.

The recipient opens the URL, sees an import banner, and clicks "Load conversation" (entering the passphrase if encrypted). No account, no server, no data in transit beyond the URL itself.

Image and audio attachments are stripped — text content only.

Output artifacts

  • Save as Markdown — download any assistant response as a .md file. If a folder is open via Folder ingestion, the file is written directly into that folder instead of downloading.
  • Code download — hover over code blocks for a download button (saves with correct extension)
  • Model cache management — view cached model sizes and clear cache in Settings

Things to try

Math & Conversions

  • "What is 15% of 2450?"
  • "Convert 72 Fahrenheit to Celsius"
  • "If I invest $10,000 at 7% annual return, how much after 5 years with compound interest?"

Time & Reminders

  • "What time is it in Tokyo?"
  • "Remind me in 5 minutes to check the oven"

Memory

  • "Remember that I'm a software engineer working on a React project called Dashboard Pro"
  • "What do you know about me and my projects?"
  • "Forget everything about my preferences"

Translation (140+ languages)

  • "Translate 'Good morning, how are you?' to Japanese, French, and Hindi"
  • "How do you say 'Where is the nearest train station?' in Spanish and German?"

Writing & Analysis

  • "Write a professional email declining a meeting invitation politely"
  • "Summarize the pros and cons of microservices vs monolithic architecture"
  • "Explain the concept of WebGPU to a non-technical person in 3 sentences"

Documents (attach a PDF, DOCX, or text file)

  • "Summarize the key points from the document I just uploaded"
  • "What are the main conclusions or recommendations in my document?"

Multimodal (attach an image first)

  • "Describe this image in detail"
  • "What text can you see in this image? Transcribe it."

Web Research (requires API key in Settings)

  • "What are the top tech news stories today?"
  • "Search for the latest WebGPU browser support status and summarize"
  • "Find recent articles about AI running locally in the browser and give me a summary with sources"

Coding

  • "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes"
  • "Explain the difference between async/await and Promises in JavaScript with examples"

Context engineering

The raw context window is 12–16K tokens depending on the model; the knowledge the agent can draw on is effectively unlimited:

  • Sliding window — last 3 turn-pairs verbatim, older turns compressed to rolling summary
  • RAG retrieval — top relevant memories auto-injected into system prompt
  • Semantic pre-filtering — fetched web pages split into paragraphs, embedded, ranked by relevance to your query
  • Multi-hop reasoning — agentic loop chains up to 3 tool calls per message
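The sliding-window step above can be sketched as follows; summarization itself is elided, and the message shapes are assumptions for illustration:

```javascript
// Sliding-window sketch: keep the last 3 user/assistant turn-pairs verbatim
// and represent older turns with a rolling summary string. The `summary`
// argument is assumed to be maintained elsewhere by a summarization pass.
function buildContext(turnPairs, summary, keep = 3) {
  const recent = turnPairs.slice(-keep);
  const messages = [];
  if (summary && turnPairs.length > keep) {
    messages.push({ role: "system", content: `Summary of earlier conversation: ${summary}` });
  }
  for (const [user, assistant] of recent) {
    messages.push({ role: "user", content: user });
    messages.push({ role: "assistant", content: assistant });
  }
  return messages;
}
```

Retrieved memories and tool results would be injected alongside this window, keeping the prompt within the fixed token budget.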

How to run

Serve from any static host — GitHub Pages, Netlify, or a local server:

python3 -m http.server 8080
# open http://localhost:8080

No build step. No dependencies. No backend.

Note: Requires a browser with WebGPU support (Chrome 113+, Edge 113+, Firefox 130+). Will not work from file:// — needs an HTTP server.

Tech

  • Transformers.js v4 — runs Hugging Face models in the browser via WebGPU
  • Gemma 3 1B — text-only, q4f16 quantized
  • Gemma 4 E2B — multimodal, 2.3B effective params, q4f16
  • Gemma 4 E4B — multimodal, 4.5B effective params, q4f16
  • MiniLM — 384-dim embeddings for RAG (~23 MB, WASM)
  • Readability.js — article extraction from fetched pages (lazy-loaded)
  • PDF.js — PDF text extraction (lazy-loaded on first PDF upload)
  • mammoth.js — DOCX text extraction (lazy-loaded on first DOCX upload)
  • Web Workers for off-main-thread inference (LLM on WebGPU, embeddings on WASM)
  • IndexedDB for persistent vector store + user profile
  • Zero build tooling. One HTML file.

Browser support

| Browser | Status |
|---|---|
| Chrome 113+ | Supported |
| Edge 113+ | Supported |
| Firefox 130+ | Supported |
| Safari | Not yet (WebGPU incomplete) |

Part of the NakliTechie series

A collection of browser-native tools that run entirely on your device.

| Project | Description |
|---|---|
| BabelLocal | Universal translator — 55 languages, NLLB model |
| VoiceVault | Audio transcription — Whisper, offline-first |
| SnipLocal | Background remover — RMBG-1.4, passport mode |
| StripLocal | EXIF metadata stripper — drag-and-drop |
| GambitLocal | Chess vs Stockfish — correspondence mode |
| KingMe | English draughts vs minimax AI |
| KoLocal | Go (Baduk) vs MCTS AI |
| PredictionMarket | Educational prediction market simulator |
| LocalMind | Private AI agent — Gemma, multimodal, WebGPU |

Built by Chirag Patnaik


Built with Claude Code.
