🎬 vision-agent-hack: Joey Tribbiani - Full-Stack Frontend Expert

A fun and engaging multimodal vision agent powered by Joey Tribbiani's persona! The agent brings charm, humor, and technical expertise to web development: it analyzes Figma designs and guides users through full-stack frontend implementation step by step.

✨ Features

Vision-Powered Design Analysis: Analyzes Figma designs and interprets visual layouts using Google's Gemini Vision AI
Interactive Voice Communication: Speaks with engaging personality via ElevenLabs text-to-speech
Speech Recognition: Understands user input via Deepgram speech-to-text
Real-time Streaming Voice Calls: Powered by GetStream for seamless multimodal interaction
Built-in MCP Tooling: Local Filesystem + Fetch MCP servers for file operations and URL/doc retrieval
Knowledge-Aware Responses: Supports Gemini File Search over local markdown docs in knowledge/
Optional GitHub MCP Integration: Automatically enabled when GITHUB_PAT is set
Custom LLM Helper Functions: Includes timestamping, React boilerplate generation, package suggestions, and safe workspace command execution
Step-by-Step Guidance: Breaks down web development projects into manageable, progressively-built features
Joey's Personality: Enthusiastic, charming communication style that keeps users engaged and motivated

🛠️ Technical Stack

Vision Language Model: Google Gemini Vision AI
Speech-to-Text: Deepgram
Text-to-Speech: ElevenLabs
Real-time Communication: GetStream
MCP: @modelcontextprotocol/server-filesystem, @modelcontextprotocol/server-fetch
ML/Transformers: HuggingFace Transformers, NVIDIA/Ultralytics support
Core: Python 3.12+, Vision Agents Framework

📋 Prerequisites

Python 3.12 or higher
Node.js 18+ (required for MCP servers launched through npx)
API keys for:
- Google Gemini Vision
- ElevenLabs
- Deepgram
- Stream / GetStream

Optional:

GitHub Personal Access Token (GITHUB_PAT) to enable remote GitHub MCP

🚀 Installation

Clone the repository:

git clone <repository-url>
cd vision-agent-hack

Create a Python virtual environment:

python -m venv venv
# Windows (PowerShell)
venv\Scripts\Activate.ps1
# Windows (cmd)
venv\Scripts\activate.bat
# macOS/Linux
source venv/bin/activate

Install dependencies:

pip install -e .

Or install from requirements.txt instead:

pip install -r requirements.txt

⚡ Quickstart

# 1) copy env template
cp .env.example .env

# 2) fill in API keys inside .env

# 3) run agent
python full-stack-joey.py

🔑 Configuration

Create a .env file in the project root with your API keys:

GOOGLE_API_KEY=your_gemini_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
GITHUB_PAT=your_github_pat_optional

Note: If your local setup expects GETSTREAM_API_KEY / GETSTREAM_API_SECRET, keep both pairs in .env.
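A quick way to catch a missing key before starting a call is to check the environment up front. The sketch below is not taken from full-stack-joey.py; missing_keys is a hypothetical helper, and treating the GETSTREAM_* names as acceptable stand-ins for the STREAM_* names is an assumption based on the note above. It reads from any mapping, so it works on os.environ after the .env file has been loaded (for example by python-dotenv).

```python
import os
from typing import Mapping

# Keys listed in the Configuration section of this README.
REQUIRED_KEYS = ["GOOGLE_API_KEY", "ELEVENLABS_API_KEY", "DEEPGRAM_API_KEY"]

# Assumption: either spelling of the Stream credentials is acceptable.
STREAM_ALIASES = {
    "STREAM_API_KEY": "GETSTREAM_API_KEY",
    "STREAM_API_SECRET": "GETSTREAM_API_SECRET",
}


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the names of required settings that are absent or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    for primary, alias in STREAM_ALIASES.items():
        if not (env.get(primary) or env.get(alias)):
            missing.append(primary)
    return missing


if __name__ == "__main__":
    problems = missing_keys(os.environ)
    print("Missing:", ", ".join(problems) if problems else "none")
```

GITHUB_PAT is left out of the check on purpose because it is optional.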

Workspace behavior

On Windows, Joey uses C:/joey-workspace
On macOS/Linux, Joey uses /tmp/joey-workspace
MCP launcher scripts (_mcp_*.py) and generated component boilerplates are written there

📚 Knowledge & Gemini Search

Joey supports a local knowledge base backed by Gemini File Search.

Where knowledge lives

Add markdown files under knowledge/ (for example: knowledge/nextjs.md, knowledge/shadcn.md)
The loader scans *.md files from that folder

How Gemini search works in this project

create_rag_from_directory() builds a Gemini file-search store from knowledge/
The store is attached to Gemini VLM via gemini.tools.FileSearch(...)
During conversations, Joey can retrieve relevant snippets from indexed docs and ground responses in project knowledge

Important note

If knowledge/ is missing, startup logs a warning and no local knowledge is indexed
To enable retrieval, ensure RAG initialization runs before agent creation in your startup flow

💬 Joey's Expertise

Joey specializes in:

Frontend Frameworks

React.js - Functional components, hooks, state management, performance optimization
Next.js - Full-stack development, SSR, SSG, API routes
TypeScript - Type-safe, maintainable code with strict checking
Tailwind CSS - Utility-first responsive design and styling

Development Approach

Analyzes Figma designs with enthusiasm
Breaks projects into logical, manageable steps
Builds incrementally (HTML → styling → interactivity)
Follows best practices: semantic HTML, responsive design, accessibility
Maintains Joey's engaging personality throughout!

Key Phrases

"How you doin'?"
"Could I BE any more excited about this code?"
"Oh my God!"
"That's so...not good!"
"Yeah, baby!"

📞 Usage

Run the agent:

python full-stack-joey.py

The agent will:

Initialize a real-time multimodal connection
Wait for users to join the call
Analyze uploaded Figma designs
Guide users through frontend implementation with Joey's personality
Build features step-by-step with explanations

📁 Project Structure

.
├── full-stack-joey.py      # Main agent implementation
├── knowledge/              # Local knowledge base markdown docs for Gemini File Search
│   ├── nextjs.md
│   └── shadcn.md
├── main.py                 # Entry point
├── pyproject.toml          # Project configuration and dependencies
├── requirements.txt        # Direct dependencies
├── README.md               # This file
└── .env                    # Configuration (not in repo)

🎯 How It Works

Knowledge Indexing (Optional): Builds Gemini file-search store from knowledge/*.md
Agent Creation: Initializes Full-Stack Joey with Gemini VLM (max_output_tokens=3000) and FileSearch tool, plus ElevenLabs TTS and Deepgram STT
MCP Server Bootstrapping: Generates Python launcher scripts and starts local Filesystem + Fetch MCP servers
Remote MCP (Optional): Adds GitHub MCP automatically when GITHUB_PAT is present
Function Registration: Registers custom helpers (get_timestamp, generate_component_boilerplate, suggest_packages, run_workspace_command)
Call Lifecycle: Joins a GetStream call and handles participant-join events before finishing the session
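The "local servers always, GitHub only when GITHUB_PAT is set" rule from the steps above can be sketched as follows. The function name plan_mcp_servers and the short labels are hypothetical; the package names are the ones listed under Technical Stack, and nothing here starts a server or uses the Vision Agents API.

```python
import os
from typing import Mapping

# Package names as listed in this README's Technical Stack section.
LOCAL_MCP_PACKAGES = {
    "filesystem": "@modelcontextprotocol/server-filesystem",
    "fetch": "@modelcontextprotocol/server-fetch",
}


def plan_mcp_servers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return labels of the MCP servers that would be enabled for this environment."""
    servers = list(LOCAL_MCP_PACKAGES)  # local servers are always included
    if env.get("GITHUB_PAT", "").strip():
        servers.append("github")  # remote server, only when a token is present
    return servers


if __name__ == "__main__":
    print(plan_mcp_servers())
```

A blank or whitespace-only GITHUB_PAT is treated as unset in this sketch.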

🧠 Custom Helper Functions

Joey registers domain-specific functions directly on the VLM:

get_timestamp() - returns the current datetime as a string
generate_component_boilerplate(component_name, props_json, use_typescript) - writes a React/Next.js component file to the workspace
suggest_packages(use_case) - recommends npm packages by use-case category
run_workspace_command(command) - runs bounded shell commands in the Joey workspace, with a timeout and captured output
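To illustrate what "bounded, with a timeout and captured output" can look like, here is a small stand-in built on the standard library. It is not the project's run_workspace_command: the name run_bounded_command, the extra workspace and timeout parameters, the 30-second default, and the returned dictionary shape are all assumptions made for this sketch.

```python
import shlex
import subprocess


def _failure(message: str) -> dict:
    """Build the result used when the command could not run to completion."""
    return {"ok": False, "returncode": None, "stdout": "", "stderr": message}


def run_bounded_command(command: str, workspace: str, timeout: float = 30.0) -> dict:
    """Run a command inside the workspace without a shell, capturing its output."""
    try:
        args = shlex.split(command)  # POSIX-style splitting; no shell expansion
    except ValueError as exc:  # e.g. unbalanced quotes
        return _failure(f"could not parse command: {exc}")
    if not args:
        return _failure("empty command")
    try:
        result = subprocess.run(
            args,
            cwd=workspace,        # keep the process inside the workspace folder
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,
            timeout=timeout,      # the child is killed when the limit is exceeded
        )
    except subprocess.TimeoutExpired:
        return _failure(f"timed out after {timeout}s")
    except OSError as exc:  # missing executable, missing workspace, etc.
        return _failure(str(exc))
    return {
        "ok": result.returncode == 0,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```

Splitting the command and skipping the shell is a design choice for this sketch: it avoids shell injection at the cost of pipes and redirects.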

🧯 Troubleshooting

npx not found: install Node.js 18+ and reopen the terminal.
MCP server startup fails: ensure the workspace directory is writable (C:/joey-workspace on Windows, /tmp/joey-workspace on macOS/Linux).
Missing credentials errors: verify keys in .env and restart the process.
Voice/call connection issues: confirm Stream/GetStream API key + secret are valid for your app.
Knowledge search returns nothing: verify markdown docs exist under knowledge/ and RAG initialization is executed before agent startup.

📦 Dependencies

python-dotenv>=1.2.1 - Environment variable management
transformers>=4.57.6 - ML transformers for enhanced capabilities
vision-agents[deepgram,elevenlabs,gemini,getstream,huggingface,nvidia,openai,ultralytics]>=0.3.8 - Core vision agent framework with all plugins

👥 Meet the Team: status200


Caleb Chandrasekar @calebjubal
S.Tharundhatri @Tharun-10Dragneel
Rishav @Rishav23av
Kushagra Chandok @mengyokyu

Made with ❤️ by bringing Joey Tribbiani to the web development world!
