3-channel multimodal video analysis. Insights flow into vector memory. Q-learning ranks what closes deals.
Built at the Multimodal Frontier Hackathon (YC x Google DeepMind, March 2026).
Sales teams lose critical context between calls. Hours of video recordings pile up unwatched. The intelligence trapped in body language, tone shifts, and verbal cues never reaches the next call prep.
CallMind analyzes sales call recordings through 3 parallel AI channels, extracts structured insights, and stores them in a vector knowledge base with Q-learning feedback.
The key: insights get smarter over time. Record deal outcomes, and Q-learning updates the value of every insight. Over multiple deals, the system learns which signals actually predict success.
```
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Video/Audio │────▶│ CallMind Engine  │────▶│    API / MCP    │
│ (Upload/URL)│     │                  │     │                 │
└─────────────┘     │ Channel 1: Text  │     │  REST + Unkey   │
                    │ Channel 2: Visual│     │ Connect to any  │
                    │ Channel 3: Fusion│     │ AI agent / CRM  │
                    │                  │     └─────────────────┘
                    │ Qdrant + Q-Learn │
                    └──────────────────┘
                             ▲
                             │
                    Deal Outcomes (reward)
```
CallMind is one service/block in a larger AI stack. Qdrant with Q-learning sits at the core. CRM, email, documents — all connect through API or MCP. Claude Code (or any AI agent) orchestrates.
| Channel | What it does | Model |
|---|---|---|
| Transcript | Full speech-to-text, speaker ID, structured insight extraction | Gemini 2.5 Flash |
| Visual | Body language, facial expressions, engagement, micro-expressions | Gemini 2.5 Flash |
| Fusion | Cross-modal correlations (e.g., "said yes but leaned back") | Gemini 2.5 Flash |
All 3 channels run in parallel on the same uploaded video via Gemini's native multimodal capabilities.
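As a rough sketch of how that fan-out can look with the `google-genai` Python SDK (the per-channel prompts here are illustrative placeholders, not CallMind's actual ones):

```python
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Illustrative per-channel prompts -- placeholders, not CallMind's real prompt text.
PROMPTS = {
    "transcript": "Transcribe with speaker labels; extract objections, commitments, next steps.",
    "visual": "Describe body language, facial expressions, and engagement over time.",
    "fusion": "Flag moments where the spoken words and the body language disagree.",
}

async def analyze_call(video_path: str) -> dict[str, str]:
    # Upload once via the Files API, then wait until the file is ready for inference.
    video = client.files.upload(file=video_path)
    while video.state.name == "PROCESSING":
        await asyncio.sleep(2)
        video = client.files.get(name=video.name)

    # Query the same uploaded video 3x in parallel, one request per channel.
    responses = await asyncio.gather(*[
        client.aio.models.generate_content(
            model="gemini-2.5-flash", contents=[video, prompt]
        )
        for prompt in PROMPTS.values()
    ])
    return {name: r.text for name, r in zip(PROMPTS, responses)}

print(asyncio.run(analyze_call("recording.mp4")))
```

For the structured-insight extraction mentioned below, the SDK's `response_mime_type="application/json"` generation config option is the natural fit.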
Every insight gets a Q-value. When you record deal outcomes:
```
Q_new = clamp(Q_old + 0.25 × reward, -0.5, 1.0)
```
- Deal won → +1.0 reward → insights move up
- Deal lost → -1.0 reward → insights move down
- Deal progressed → +0.5 reward → moderate boost
Over time, high-Q insights float to the top of your pre-call briefing. The system learns what matters empirically.
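As a minimal sketch of that update rule (constant and function names are mine; only the numbers come from the formula above):

```python
ALPHA = 0.25              # learning rate from the update rule above
Q_MIN, Q_MAX = -0.5, 1.0  # clamp range

def update_q(q_old: float, reward: float) -> float:
    """Nudge an insight's Q-value toward the recorded deal outcome."""
    return max(Q_MIN, min(Q_MAX, q_old + ALPHA * reward))

# Assuming a neutral starting Q of 0.0: one won deal (+1.0) lifts an insight
# to 0.25; repeated wins saturate at the 1.0 cap, repeated losses at -0.5.
assert update_q(0.0, 1.0) == 0.25
assert update_q(1.0, 1.0) == 1.0
```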
- Gemini 2.5 Flash multimodal — video uploaded via Files API, then queried 3× in parallel
- Native video understanding (not frame extraction) — sees motion, gestures, expressions
- Structured JSON output for reliable insight extraction
- API key management — users register via web UI, get API keys instantly
- Key verification on every API request via the Unkey v2 REST API (a sketch follows this list)
- Enables programmatic access: upload videos, fetch insights, integrate with any tool
- Production deployment target (Qdrant + FastAPI app)
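For the verification step, a hedged sketch of what a FastAPI dependency might look like. It assumes Unkey's v2 `keys.verifyKey` endpoint and response shape; the exact URL and payload should be checked against the Unkey docs, and all names here are illustrative:

```python
import os
import httpx
from fastapi import Header, HTTPException

UNKEY_VERIFY_URL = "https://api.unkey.com/v2/keys.verifyKey"  # assumed v2 endpoint

async def require_api_key(authorization: str = Header(...)) -> None:
    """FastAPI dependency: verify the caller's key with Unkey on every request."""
    key = authorization.removeprefix("Bearer ").strip()
    async with httpx.AsyncClient() as http:
        resp = await http.post(
            UNKEY_VERIFY_URL,
            headers={"Authorization": f"Bearer {os.environ['UNKEY_ROOT_KEY']}"},
            json={"key": key, "apiId": os.environ["UNKEY_API_ID"]},
        )
    # Response shape is assumed; v2 nests the result under "data".
    if resp.status_code != 200 or not resp.json().get("data", {}).get("valid"):
        raise HTTPException(status_code=401, detail="invalid API key")
```

Protected routes would then take `Depends(require_api_key)`.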
| Component | Technology |
|---|---|
| Video Understanding | Gemini 2.5 Flash (3-channel multimodal) |
| Embeddings | FastEmbed (BAAI/bge-small-en-v1.5, 384-dim, local) |
| Vector Store | Qdrant (hybrid: 60% vector + 40% Q-value) |
| Q-Learning | Custom (alpha=0.25, range [-0.5, 1.0]) |
| API Auth | Unkey (v2 REST API) |
| Web Framework | FastAPI + Jinja2 |
| Video Download | yt-dlp |
| Language | Python 3.11+ |
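The 60/40 hybrid row above can be read as a re-scoring pass over Qdrant's vector hits. A minimal sketch under that assumption, using `qdrant-client` and `fastembed` (the weights come from the table; the over-fetch size and the `q_value` payload field name are my guesses):

```python
from fastembed import TextEmbedding
from qdrant_client import QdrantClient

embedder = TextEmbedding("BAAI/bge-small-en-v1.5")  # 384-dim, runs locally
qdrant = QdrantClient("localhost", port=6333)

def ranked_insights(query: str, collection: str, limit: int = 10):
    """Blend vector similarity (60%) with the learned Q-value (40%)."""
    query_vec = next(iter(embedder.embed([query]))).tolist()
    hits = qdrant.query_points(collection, query=query_vec, limit=50).points
    scored = [
        (0.6 * hit.score + 0.4 * hit.payload.get("q_value", 0.0), hit)
        for hit in hits
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hit for _, hit in scored[:limit]]
```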
```bash
# 1. Clone and install
git clone https://github.com/anthroos/callmind.git
cd callmind
python -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Start Qdrant
docker run -p 6333:6333 qdrant/qdrant

# 3. Configure
cp .env.example .env
# Edit .env with your GEMINI_API_KEY, UNKEY_API_ID, UNKEY_ROOT_KEY

# 4. Run
python -m callmind.app
# → http://localhost:8000
```

```bash
# Register for an API key (or use the web UI)
curl -X POST http://localhost:8000/register \
  -d "username=your_name"

# Upload a video
curl -X POST http://localhost:8000/api/upload \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "client_name=Acme Corp" \
  -F "video_file=@recording.mp4"

# Get insights (ranked by Q-value)
curl http://localhost:8000/api/client/acme_corp/insights \
  -H "Authorization: Bearer YOUR_API_KEY"
```

OpenExp — open-source Q-learning memory for AI agents. Same core: Qdrant + FastEmbed + Q-values. CallMind extends it with multimodal video intelligence.
MIT
Built by Ivan Pasichnyk at the Multimodal Frontier Hackathon (YC x Google DeepMind, March 2026).