Okediya/Jarvis

🤖 JARVIS: Multimodal Prototype Builder & Visual Teacher

Gemini Live Agent Challenge Hackathon Submission
Built with Next.js 15, Gemini Live API, React Three Fiber, and Leaflet


🌟 Overview

JARVIS is a real-time multimodal AI agent that turns voice and video input into live visual prototypes. Speak naturally and Jarvis builds 3D hardware models, generates web applications, plots navigation routes on interactive maps, or explains whatever is visible on your screen, all in real time through the Gemini Live API.

✨ Key Features

  • 🎤 Voice-first interaction with real-time streaming via the Gemini Live API
  • 📹 Multimodal input: audio only, video only, or both simultaneously
  • 🖥️ Screen sharing: share your screen and Jarvis explains what it sees
  • 🔧 3D Hardware Prototypes: build cars, planes, and gadgets with Three.js
  • 💻 Software Prototypes: generate web apps rendered live in a sandbox
  • 🏗️ Conversational App Scaffolding: after prototyping an app, Jarvis can scaffold the real project locally (Next.js, Vite React, or Vue)
  • 🗺️ Navigation Maps: plot routes with markers on interactive Leaflet maps
  • 📖 Visual Teacher: screen-share mode for learning about anything on screen
  • ⚡ Barge-in Interruptions: interrupt Jarvis mid-response, as in a real conversation
  • 🎨 Stunning UI: red-black-white theme with smooth animations
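
The barge-in feature above depends on the client dropping any model audio that is still queued once the user starts talking over it. The Gemini Live API signals this with an `interrupted` flag on `serverContent`. The sketch below shows one way a client could react. `PlaybackQueue`, `AudioChunk`, and `handleServerMessage` are hypothetical names for illustration and are not taken from this repository's `audioPlayer.ts`.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the repo's actual code.
type AudioChunk = { id: number; samples: Int16Array };

class PlaybackQueue {
  private chunks: AudioChunk[] = [];

  // Queue a chunk of model audio for playback.
  enqueue(chunk: AudioChunk): void {
    this.chunks.push(chunk);
  }

  // Take the next chunk to play, or undefined when nothing is queued.
  next(): AudioChunk | undefined {
    return this.chunks.shift();
  }

  // Discard everything still waiting to be played; returns the number dropped.
  flush(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  // Number of chunks currently queued.
  size(): number {
    return this.chunks.length;
  }
}

// Only the part of a Live API server message that this sketch inspects.
type ServerMessage = { serverContent?: { interrupted?: boolean } };

// Returns true when the message was an interruption and the queue was flushed.
function handleServerMessage(msg: ServerMessage, queue: PlaybackQueue): boolean {
  if (msg.serverContent?.interrupted) {
    queue.flush();
    return true;
  }
  return false;
}
```

In a real player the flush would also need to stop the chunk that is currently sounding, for example by sending a "clear" message to the playback AudioWorklet.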

🚀 Quick Start (Under 5 Minutes)

Prerequisites

  • Node.js 18.18+ installed (the minimum version Next.js 15 supports)
  • A free Google Gemini API key

1. Get a Free Gemini API Key

  1. Visit Google AI Studio
  2. Click "Create API Key"
  3. Copy your key

2. Clone & Setup

git clone https://github.com/your-username/jarvis.git
cd jarvis
npm install

3. Configure API Key

Create a .env.local file:

GEMINI_API_KEY=your_api_key_here
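
Next.js loads `.env.local` into the server-side environment, and variables without the `NEXT_PUBLIC_` prefix are not exposed to browser code. A missing or placeholder key otherwise tends to surface only as a failed connection, so a small guard can fail fast with a clearer message. This is a minimal sketch: `requireGeminiKey` is a hypothetical helper, not part of this repository, and it takes the environment as a parameter so it can run anywhere.

```typescript
// Hypothetical helper: validates that the key from .env.local is present.
type Env = Record<string, string | undefined>;

function requireGeminiKey(env: Env): string {
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  // Reject both a missing value and the placeholder from the setup instructions.
  if (key === "" || key === "your_api_key_here") {
    throw new Error("GEMINI_API_KEY is missing or still a placeholder. Set it in .env.local.");
  }
  return key;
}
```

On the server this would be called as `requireGeminiKey(process.env)`.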

4. Run Locally

npm run dev

Open http://localhost:3000 in Chrome.


🎯 How to Use

Input Modes

| Mode | Description |
|------|-------------|
| 🎤 Audio | Voice-only interaction |
| 📹 Video | Webcam or screen share only |
| 🎤📹 Both | Voice + visual input |

Input Sources

| Source | Description |
|--------|-------------|
| 📷 Webcam | Camera feed for showing physical objects |
| 🖥️ Screen Share | Share your screen for Visual Teacher mode |
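
The input mode and the input source combine to decide which browser capture APIs are needed. In browsers, microphone and webcam streams come from `navigator.mediaDevices.getUserMedia`, while screen capture uses `navigator.mediaDevices.getDisplayMedia`. The sketch below expresses that mapping as a pure function. `InputMode`, `InputSource`, `CapturePlan`, and `planCapture` are assumed names for illustration and may not match the repository's `types.ts`.

```typescript
// Hypothetical types: the real definitions in src/lib/types.ts may differ.
type InputMode = "audio" | "video" | "both";
type InputSource = "webcam" | "screen";

interface CapturePlan {
  // Whether to request microphone audio (via getUserMedia).
  microphone: boolean;
  // Which browser API supplies the visual stream, or null for audio-only.
  visual: "getUserMedia" | "getDisplayMedia" | null;
}

function planCapture(mode: InputMode, source: InputSource): CapturePlan {
  const wantsAudio = mode === "audio" || mode === "both";
  const wantsVideo = mode === "video" || mode === "both";
  let visual: CapturePlan["visual"] = null;
  if (wantsVideo) {
    visual = source === "screen" ? "getDisplayMedia" : "getUserMedia";
  }
  return { microphone: wantsAudio, visual };
}
```

For example, Visual Teacher mode ("Both" with Screen Share) maps to `{ microphone: true, visual: "getDisplayMedia" }`.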

Try These Prompts

  • "Build a red sports car with spinning wheels"
  • "Turn this into a flying airplane with wings"
  • "Create an interactive dashboard web app for a fitness tracker"
  • "Navigate from Ibadan to Lagos airport"
  • "Share my screen and explain this webpage"
  • "Teach me what this code does"
  • "Explain this math diagram I'm looking at"

🏗️ How to Scaffold Real Projects

Once Jarvis generates a Software Prototype for you, it will verbally ask if you want to build it as a real project.

  1. Answer "Yes, let's use Next.js and call it my-new-app."
  2. Jarvis then runs the CLI scaffolding tool in the background.
  3. Check the parent directory of your jarvis/ folder; you will find your newly generated Next.js, Vite React, or Vue application ready to go.
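
The README does not show which commands the scaffolding tool runs, so the following is only a sketch of how a spoken framework choice and app name could be mapped to the standard project generators (`create-next-app`, `create-vite`, `create-vue`). `Framework` and `scaffoldCommand` are hypothetical names, and the exact flags Jarvis passes are an assumption.

```typescript
// Hypothetical mapping from a framework choice to a generator command.
type Framework = "nextjs" | "vite-react" | "vue";

function scaffoldCommand(framework: Framework, appName: string): string[] {
  // Restrict names to a safe, npm-style subset before handing them to a CLI.
  if (!/^[a-z0-9][a-z0-9._-]*$/.test(appName)) {
    throw new Error("Invalid app name: " + appName);
  }
  switch (framework) {
    case "nextjs":
      return ["npx", "create-next-app@latest", appName];
    case "vite-react":
      return ["npm", "create", "vite@latest", appName, "--", "--template", "react-ts"];
    case "vue":
      return ["npm", "create", "vue@latest", appName];
  }
}
```

Returning an argument array rather than a single string lets the caller spawn the process without a shell, which avoids quoting problems with transcribed names. To match step 3, the process would be started with its working directory set to the parent of `jarvis/`.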

📖 Visual Teacher Usage Guide

The Visual Teacher mode is a unique feature that lets Jarvis teach you about anything visible on your screen:

  1. Switch to Screen Share mode using the input source toggle
  2. Select "Both" input mode so you can speak AND share your screen
  3. Click Connect to Jarvis
  4. Grant screen sharing permission when prompted
  5. Navigate to any content (code, docs, diagrams, charts, textbooks)
  6. Ask Jarvis to explain what it sees:
    • "What is this code doing?"
    • "Explain this diagram to me"
    • "Break down the concepts on this page"
    • "Help me understand this error message"

Jarvis will provide structured explanations with key points, maintaining full context for follow-up questions.


🏗️ Architecture

jarvis/
├── public/
│   └── audio/
│       ├── recorder-processor.js    # Mic AudioWorklet
│       └── player-processor.js      # Playback AudioWorklet
├── src/
│   ├── app/
│   │   ├── globals.css              # Theme & Leaflet CSS
│   │   ├── layout.tsx               # Root layout
│   │   └── page.tsx                 # Main page
│   ├── components/
│   │   ├── engines/
│   │   │   ├── HardwareEngine.tsx   # 3D (React Three Fiber)
│   │   │   ├── SoftwareEngine.tsx   # Web app (iframe)
│   │   │   ├── NavigationEngine.tsx # Maps (Leaflet)
│   │   │   └── TeachingEngine.tsx   # Visual Teacher
│   │   ├── PrototypeViewer.tsx      # Engine switcher
│   │   ├── Sidebar.tsx              # Controls & transcript
│   │   ├── StatusIndicator.tsx      # Connection status
│   │   ├── TranscriptPanel.tsx      # Chat history
│   │   └── WaveformVisualizer.tsx   # Audio visualizer
│   ├── hooks/
│   │   └── useGeminiLive.ts         # Gemini Live API hook
│   └── lib/
│       ├── audioPlayer.ts           # Audio playback class
│       ├── audioRecorder.ts         # Audio recording class
│       └── types.ts                 # TypeScript types
├── Dockerfile                       # Cloud Run container
├── cloud-run.yaml                   # Cloud Run config
├── firebase.json                    # Firebase Hosting
├── LICENSE                          # MIT License
└── README.md                        # This file
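
`PrototypeViewer.tsx` is described above as the engine switcher. One common way to model that role is a discriminated union of prototype payloads, with a switch that picks the matching engine. The sketch below illustrates the pattern only. The `Prototype` shape and its field names are assumptions and may not match the repository's `types.ts`. Only the four engine names come from the file tree.

```typescript
// Hypothetical payload shapes: one variant per engine listed in the tree.
type Prototype =
  | { kind: "hardware"; sceneCode: string }
  | { kind: "software"; html: string }
  | { kind: "navigation"; waypoints: Array<{ lat: number; lng: number; label: string }> }
  | { kind: "teaching"; keyPoints: string[] };

type EngineName =
  | "HardwareEngine"
  | "SoftwareEngine"
  | "NavigationEngine"
  | "TeachingEngine";

// Pick the engine component responsible for rendering a given prototype.
function selectEngine(prototype: Prototype): EngineName {
  switch (prototype.kind) {
    case "hardware":
      return "HardwareEngine";
    case "software":
      return "SoftwareEngine";
    case "navigation":
      return "NavigationEngine";
    case "teaching":
      return "TeachingEngine";
  }
}
```

Because the switch is exhaustive over `kind`, adding a fifth prototype variant produces a compile-time error until `selectEngine` handles it.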

☁️ Deployment

Google Cloud Run (Recommended, Free Tier)

Proof of Deployment: Watch Video (hosted in repo)

# Build the container
docker build -t gcr.io/YOUR_PROJECT/jarvis \
  --build-arg GEMINI_API_KEY=your_key .

# Push to Container Registry
docker push gcr.io/YOUR_PROJECT/jarvis

# Deploy to Cloud Run
gcloud run deploy jarvis \
  --image gcr.io/YOUR_PROJECT/jarvis \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 512Mi \
  --cpu 1

Firebase Hosting (Alternative)

npm run build
npx -y firebase-tools deploy --only hosting

🛠️ Tech Stack

| Technology | Purpose |
|------------|---------|
| Next.js 15 | App framework (App Router) |
| TypeScript | Type safety |
| Tailwind CSS | Styling |
| @google/genai | Gemini Live API SDK |
| React Three Fiber | 3D rendering |
| Three.js | 3D engine |
| @react-three/drei | R3F helpers |
| Leaflet | Interactive maps |
| react-leaflet | React bindings for Leaflet |

📄 License

This project is open source under the MIT License.

Built for the Gemini Live Agent Challenge hackathon.

About

JARVIS: A true AI companion and future operating system. Seamlessly blending voice, vision, and execution to manifest ideas into reality in real-time. #GeminiLiveAgentChallenge
