Gemini Live Agent Challenge Hackathon Submission
Built with Next.js 15, Gemini Live API, React Three Fiber, and Leaflet
JARVIS is a real-time multimodal AI agent that transforms voice and video input into live visual prototypes. Speak naturally and watch as Jarvis builds 3D hardware models, generates web applications, plots navigation routes on interactive maps, or teaches you about anything visible on your screen, all in real time using the Gemini Live API.
- Voice-first interaction with real-time streaming via Gemini Live API
- Multimodal input: Audio-only, Video-only, or Both simultaneously
- Screen sharing: Share your screen and Jarvis explains what it sees
- 3D Hardware Prototypes: Build cars, planes, gadgets with Three.js
- Software Prototypes: Generate web apps rendered live in a sandbox
- Conversational App Scaffolding: Jarvis can actually build the real software project locally (Next.js, Vite React, Vue) after prototyping it for you!
- Navigation Maps: Plot routes with markers on interactive Leaflet maps
- Visual Teacher: Screen-share mode for learning anything on screen
- Barge-in Interruptions: Interrupt naturally like a real conversation
- Stunning UI: Red-black-white theme with smooth animations
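Barge-in generally comes down to discarding audio that is queued but not yet played as soon as the server reports that the user interrupted. The sketch below illustrates that idea only. `PlaybackQueue`, `ServerEvent`, and `handleServerEvent` are hypothetical names and are not taken from this repository; the Gemini Live API does flag interruptions on its server messages, but how `useGeminiLive.ts` and `audioPlayer.ts` react to them may differ.

```typescript
// Hypothetical playback queue illustrating barge-in; not the project's audioPlayer.ts.
class PlaybackQueue {
  private chunks: Int16Array[] = [];

  /** Queue a chunk of model audio for playback. */
  enqueue(chunk: Int16Array): void {
    this.chunks.push(chunk);
  }

  /** Take the next chunk to play, or undefined when the queue is empty. */
  next(): Int16Array | undefined {
    return this.chunks.shift();
  }

  /** Drop everything not yet played; returns how many chunks were discarded. */
  interrupt(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  /** Number of chunks still waiting to be played. */
  get pending(): number {
    return this.chunks.length;
  }
}

// Assumed, simplified message shape: only the fields this sketch needs.
interface ServerEvent {
  interrupted?: boolean;
  audio?: Int16Array;
}

function handleServerEvent(queue: PlaybackQueue, event: ServerEvent): void {
  if (event.interrupted) {
    // The user spoke over Jarvis: stale audio must not keep playing.
    queue.interrupt();
    return;
  }
  if (event.audio) {
    queue.enqueue(event.audio);
  }
}
```

Clearing the queue rather than pausing it keeps the response to an interruption immediate, because nothing from the abandoned turn can play afterwards.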
- Node.js 18+ installed
- A free Google Gemini API key
- Visit Google AI Studio
- Click "Create API Key"
- Copy your key
git clone https://github.com/your-username/jarvis.git
cd jarvis
npm install
Create a .env.local file:
GEMINI_API_KEY=your_api_key_here
npm run dev
Open http://localhost:3000 in Chrome.
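Next.js loads `.env.local` into `process.env` on the server, so a missing or placeholder key can be caught early instead of surfacing later as a failed connection. The helper below is a minimal sketch; `requireEnv` is a hypothetical name, and the project may read the key differently.

```typescript
// Hypothetical helper; not part of this repository.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name]?.trim();
  // Reject both an absent variable and the untouched placeholder from the setup steps.
  if (!value || value === "your_api_key_here") {
    throw new Error(`${name} is missing or still set to the placeholder. Add it to .env.local.`);
  }
  return value;
}

// In server-side Next.js code this could be called as:
//   const apiKey = requireEnv(process.env, "GEMINI_API_KEY");
```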
| Mode | Description |
|---|---|
| Audio | Voice-only interaction |
| Video | Webcam or screen share only |
| Both | Voice + visual input |

| Source | Description |
|---|---|
| Webcam | Camera feed for showing physical objects |
| Screen Share | Share your screen for Visual Teacher mode |
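The two tables combine into a single capture decision: whether the microphone is needed, and which browser API supplies the video track. The sketch below assumes hypothetical type and function names (`InputMode`, `VideoSource`, `planCapture`). The browser calls it refers to, `navigator.mediaDevices.getUserMedia` and `navigator.mediaDevices.getDisplayMedia`, are the standard ones for camera and screen capture.

```typescript
// Hypothetical names; the project's own types live in src/lib/types.ts and may differ.
type InputMode = "audio" | "video" | "both";
type VideoSource = "webcam" | "screen";

interface CapturePlan {
  /** Whether the microphone should be opened. */
  microphone: boolean;
  /** Which mediaDevices method would supply the video track, if any. */
  videoApi: "getUserMedia" | "getDisplayMedia" | null;
}

function planCapture(mode: InputMode, source: VideoSource): CapturePlan {
  const wantsVideo = mode !== "audio";
  return {
    microphone: mode !== "video",
    videoApi: wantsVideo
      ? source === "screen"
        ? "getDisplayMedia" // screen share
        : "getUserMedia" // webcam
      : null,
  };
}
```

For example, `planCapture("both", "screen")` describes the Visual Teacher setup: microphone on, video from screen sharing.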
- "Build a red sports car with spinning wheels"
- "Turn this into a flying airplane with wings"
- "Create an interactive dashboard web app for a fitness tracker"
- "Navigate from Ibadan to Lagos airport"
- "Share my screen and explain this webpage"
- "Teach me what this code does"
- "Explain this math diagram I'm looking at"
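Each of the commands above targets one of the four engines: hardware, software, navigation, or teaching. One way to model that routing is a discriminated union, sketched below. `PrototypeRequest` and `engineFor` are hypothetical; the returned names match the engine component files in the project structure, but how `PrototypeViewer.tsx` actually switches between them is an assumption.

```typescript
// Hypothetical request shape; field names are illustrative only.
type PrototypeRequest =
  | { kind: "hardware"; description: string }
  | { kind: "software"; description: string }
  | { kind: "navigation"; from: string; to: string }
  | { kind: "teaching"; question: string };

type EngineName =
  | "HardwareEngine"
  | "SoftwareEngine"
  | "NavigationEngine"
  | "TeachingEngine";

function engineFor(request: PrototypeRequest): EngineName {
  // The switch is exhaustive, so adding a new `kind` without a case is a compile error.
  switch (request.kind) {
    case "hardware":
      return "HardwareEngine";
    case "software":
      return "SoftwareEngine";
    case "navigation":
      return "NavigationEngine";
    case "teaching":
      return "TeachingEngine";
  }
}
```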
Once Jarvis generates a Software Prototype for you, it will verbally ask if you want to build it as a real project.
- Answer "Yes, let's use Next.js and call it my-new-app."
- Jarvis will execute the CLI scaffolding tool behind the scenes!
- Check the parent directory of your jarvis/ folder; you will find your newly generated Next.js, Vite React, or Vue application ready to go.
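The README does not show the scaffolding tool itself, so the following is only a sketch of how a spoken framework choice and project name could be turned into a command. `scaffoldCommand` and `SAFE_NAME` are hypothetical. The commands are the public scaffolders for each framework, and whether Jarvis invokes these exact commands or adds non-interactive flags is an assumption.

```typescript
type Framework = "nextjs" | "vite-react" | "vue";

// Conservative name check so a transcribed phrase cannot inject shell syntax.
const SAFE_NAME = /^[a-z0-9][a-z0-9._-]*$/;

/** Returns the command as an argv array, suitable for spawning without a shell. */
function scaffoldCommand(framework: Framework, projectName: string): string[] {
  if (!SAFE_NAME.test(projectName)) {
    throw new Error(`Unsafe project name: ${projectName}`);
  }
  switch (framework) {
    case "nextjs":
      return ["npx", "create-next-app@latest", projectName];
    case "vite-react":
      return ["npm", "create", "vite@latest", projectName, "--", "--template", "react"];
    case "vue":
      return ["npm", "create", "vue@latest", projectName];
  }
}
```

Returning an argument array instead of a single string avoids shell interpolation of a project name that came from speech recognition.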
The Visual Teacher mode is a unique feature that lets Jarvis teach you about anything visible on your screen:
- Switch to Screen Share mode using the input source toggle
- Select "Both" input mode so you can speak AND share your screen
- Click Connect to Jarvis
- Grant screen sharing permission when prompted
- Navigate to any content (code, docs, diagrams, charts, textbooks)
- Ask Jarvis to explain what it sees:
- "What is this code doing?"
- "Explain this diagram to me"
- "Break down the concepts on this page"
- "Help me understand this error message"
Jarvis will provide structured explanations with key points, maintaining full context for follow-up questions.
jarvis/
├── public/
│   └── audio/
│       ├── recorder-processor.js      # Mic AudioWorklet
│       └── player-processor.js        # Playback AudioWorklet
├── src/
│   ├── app/
│   │   ├── globals.css                # Theme & Leaflet CSS
│   │   ├── layout.tsx                 # Root layout
│   │   └── page.tsx                   # Main page
│   ├── components/
│   │   ├── engines/
│   │   │   ├── HardwareEngine.tsx     # 3D (React Three Fiber)
│   │   │   ├── SoftwareEngine.tsx     # Web app (iframe)
│   │   │   ├── NavigationEngine.tsx   # Maps (Leaflet)
│   │   │   └── TeachingEngine.tsx     # Visual Teacher
│   │   ├── PrototypeViewer.tsx        # Engine switcher
│   │   ├── Sidebar.tsx                # Controls & transcript
│   │   ├── StatusIndicator.tsx        # Connection status
│   │   ├── TranscriptPanel.tsx        # Chat history
│   │   └── WaveformVisualizer.tsx     # Audio visualizer
│   ├── hooks/
│   │   └── useGeminiLive.ts           # Gemini Live API hook
│   └── lib/
│       ├── audioPlayer.ts             # Audio playback class
│       ├── audioRecorder.ts           # Audio recording class
│       └── types.ts                   # TypeScript types
├── Dockerfile                         # Cloud Run container
├── cloud-run.yaml                     # Cloud Run config
├── firebase.json                      # Firebase Hosting
├── LICENSE                            # MIT License
└── README.md                          # This file
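The recorder AudioWorklet receives microphone samples as 32-bit floats in the range [-1, 1], while the Gemini Live API takes raw 16-bit PCM audio as input. A conversion step therefore sits somewhere between the two. The function below is a sketch of that step; `floatTo16BitPCM` is a hypothetical name, and the actual code in `recorder-processor.js` or `audioRecorder.ts` may be structured differently or also resample.

```typescript
/** Convert Web Audio float samples to signed 16-bit PCM, clamping out-of-range values. */
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  return Int16Array.from(samples, (sample) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    return clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  });
}
```

Clamping before scaling matters because samples slightly outside [-1, 1] would otherwise wrap around when stored in an `Int16Array` and produce audible clicks.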
Proof of Deployment: Watch Video (hosted in repo)
# Build the container
docker build -t gcr.io/YOUR_PROJECT/jarvis \
--build-arg GEMINI_API_KEY=your_key .
# Push to Container Registry
docker push gcr.io/YOUR_PROJECT/jarvis
# Deploy to Cloud Run
gcloud run deploy jarvis \
--image gcr.io/YOUR_PROJECT/jarvis \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--memory 512Mi \
--cpu 1
# Build and deploy to Firebase Hosting
npm run build
npx -y firebase-tools deploy --only hosting

| Technology | Purpose |
|---|---|
| Next.js 15 | App framework (App Router) |
| TypeScript | Type safety |
| Tailwind CSS | Styling |
| @google/genai | Gemini Live API SDK |
| React Three Fiber | 3D rendering |
| Three.js | 3D engine |
| @react-three/drei | R3F helpers |
| Leaflet | Interactive maps |
| react-leaflet | React bindings for Leaflet |
This project is open source under the MIT License.
Built for the Gemini Live Agent Challenge hackathon.