An interactive AI tutoring system powered by Google's Gemini Multimodal Live API, featuring real-time voice conversation and visual demonstrations with LaTeX math rendering.
- 🎙️ Real-time Voice Interaction: Speak naturally with an AI tutor using bidirectional audio streaming
- 📊 Visual Demonstrations: AI draws shapes, diagrams, and formulas on a digital whiteboard
- 🧮 LaTeX Math Rendering: Beautiful mathematical formula display using KaTeX
- 🔄 Step-by-Step Explanations: AI breaks down problems into clear, visual steps
- 🎨 Modern UI: Premium dark theme with glassmorphism effects
- Node.js (v16 or higher)
- npm (comes with Node.js)
- Google Gemini API Key (Get one from Google AI Studio)
cd studyaid
cd server
npm install
cd ../client
npm install
Create a .env file in the server directory:
cd ../server
# On Windows
echo GEMINI_API_KEY=YOUR_API_KEY_HERE > .env
# On Mac/Linux
echo "GEMINI_API_KEY=YOUR_API_KEY_HERE" > .env
Replace YOUR_API_KEY_HERE with your actual Gemini API key from Google AI Studio.
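To confirm the file was written correctly, a small check like the one below can help. It is a minimal sketch and not part of the project: `parseEnv` and `checkGeminiKey` are hypothetical helper names, and the parser only handles simple KEY=VALUE lines. The server itself may load the file differently (for example with the dotenv package).

```javascript
// Minimal sketch: parse simple KEY=VALUE lines from .env text.
// parseEnv and checkGeminiKey are hypothetical helpers, not project code.
function parseEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
    const eq = line.indexOf('=');
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of surrounding quotes, if present.
    if (/^(".*"|'.*')$/.test(value)) value = value.slice(1, -1);
    vars[key] = value;
  }
  return vars;
}

// Classify the state of GEMINI_API_KEY in the given .env text.
function checkGeminiKey(envText) {
  const key = parseEnv(envText).GEMINI_API_KEY;
  if (!key) return 'missing';
  if (key === 'YOUR_API_KEY_HERE') return 'placeholder';
  return 'ok';
}
```

From the server directory, calling `checkGeminiKey(require('fs').readFileSync('.env', 'utf8'))` in a Node.js REPL should return `'ok'` once a real key is in place.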
You'll need two terminal windows - one for the server and one for the client.
cd server
npm start
You should see:
Server running on port 3001
cd client
npm run dev
You should see:
VITE v... ready in ...ms
Local: http://localhost:5173/
Navigate to http://localhost:5173 in your web browser (Chrome or Edge recommended for best audio support).
- Allow Microphone Access: The browser will ask for microphone permission - click "Allow"
- Wait for Connection: The status should show "Gemini Live Ready"
- Click "Start Mic": The orb will turn red and pulse
- Speak Your Question: Try asking:
  - "Draw a triangle"
  - "Solve 2x + 5 = 15"
  - "Show me the area of a circle formula"
  - "Explain the Pythagorean theorem"
- AI Response: The AI will:
  - Respond with natural voice
  - Draw relevant shapes/formulas on the whiteboard
  - Use LaTeX for mathematical expressions
- Check browser volume/permissions
- Ensure "Start Mic" was clicked (this initializes audio playback)
- Try refreshing the page
- Verify .env file exists in server directory
- Check that GEMINI_API_KEY= has your actual key
- Restart the server after adding the key
- Ensure both server (port 3001) and client (port 5173) are running
- Check your internet connection (required for Gemini API)
- Look for error messages in server terminal
- Check browser console (F12) for errors
- Verify canvas is visible (right panel)
- Try asking for a simple shape first: "Draw a circle"
- Close other browser tabs to free up resources
- Check your internet speed
- The default buffer size (2048 samples) corresponds to roughly 128ms of latency at 16kHz
studyaid/
├── client/ # React frontend
│ ├── src/
│ │ ├── components/
│ │ │ └── CanvasBoard.jsx # Visual whiteboard
│ │ ├── hooks/
│ │ │ └── useAudioStream.js # Audio capture
│ │ ├── App.jsx # Main app component
│ │ └── index.css # Styling
│ └── package.json
│
├── server/ # Node.js backend
│ ├── services/
│ │ └── GeminiLiveBridge.js # Gemini API integration
│ ├── index.js # Express server
│ ├── .env # API key (create this!)
│ └── package.json
│
└── README.md
Frontend:
- React 19
- Vite
- Socket.IO Client
- KaTeX (math rendering)
- Lucide React (icons)
Backend:
- Node.js
- Express
- Socket.IO
- WebSocket (ws)
- Gemini Multimodal Live API
- Audio Capture: Browser captures microphone input, converts to 16kHz PCM, and streams via WebSocket
- Gemini Processing: Server forwards audio to Gemini Live API, which generates:
  - Audio responses (24kHz PCM)
  - Tool calls (e.g., draw_on_canvas)
- Visual Rendering: Frontend receives drawing commands and renders them using:
  - HTML5 Canvas for shapes
  - KaTeX overlay for LaTeX formulas
- Audio Playback: Frontend plays Gemini's audio response in real-time
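The audio steps above rely on converting between the browser's Float32 samples and 16-bit PCM. The sketch below shows one way this could be done. The function names and the naive decimation are illustrative assumptions, not the code in useAudioStream.js.

```javascript
// Sketch: downsample Float32 audio and convert it to/from 16-bit PCM.
// Function names and the naive decimation approach are illustrative assumptions.
function downsample(samples, fromRate, toRate) {
  if (toRate >= fromRate) return samples; // nothing to do
  const ratio = fromRate / toRate;
  const out = new Float32Array(Math.floor(samples.length / ratio));
  for (let i = 0; i < out.length; i++) {
    // Nearest-sample pick; real code should low-pass filter first to limit aliasing.
    out[i] = samples[Math.floor(i * ratio)];
  }
  return out;
}

// Capture direction: Float32 in [-1, 1] to signed 16-bit integers.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the Int16 range
  }
  return pcm;
}

// Playback direction: signed 16-bit integers back to Float32 in [-1, 1).
function pcm16ToFloat(pcm) {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 0x8000;
  return out;
}
```

For capture, a 48kHz microphone buffer would pass through `downsample(buffer, 48000, 16000)` and then `floatTo16BitPCM` before being sent. For playback, the 24kHz PCM from Gemini would go through `pcm16ToFloat` before being handed to the Web Audio API.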
Server logs are already verbose. For client-side debugging, open the browser console (F12).
To change the AI's system instruction, edit server/services/GeminiLiveBridge.js → sendSetupMessage() → systemInstruction.parts[0].text
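For orientation, a setup message carrying the instruction text might look roughly like the sketch below. `buildSetupMessage` is a hypothetical helper, the model name is a placeholder, and the audio response modality is an assumption. The project's actual sendSetupMessage() may include other fields, such as the declaration of the draw_on_canvas tool.

```javascript
// Hypothetical sketch of a Gemini Live setup message; the project's actual
// sendSetupMessage() may differ (model name and config below are assumptions).
function buildSetupMessage(instructionText) {
  return {
    setup: {
      model: 'models/YOUR_MODEL_NAME', // placeholder, not the project's model
      generationConfig: { responseModalities: ['AUDIO'] }, // assumed: voice replies
      systemInstruction: {
        parts: [{ text: instructionText }], // the text this README says to edit
      },
    },
  };
}

const message = buildSetupMessage(
  'You are a patient math tutor. Explain each step before drawing it.'
);
const payload = JSON.stringify(message); // what would be sent over the WebSocket
```

Restart the server after changing the instruction so that new sessions pick it up.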
To adjust the audio buffer size, edit client/src/hooks/useAudioStream.js → createScriptProcessor(2048, 1, 1)
- Lower value (e.g., 1024) = lower latency, higher CPU
- Higher value (e.g., 4096) = smoother, more lag
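The trade-off can be estimated with simple arithmetic: one buffer covers bufferSize / sampleRate seconds of audio. The sketch below assumes a 16kHz sample rate, taken from the capture rate mentioned in this README. `bufferLatencyMs` is a hypothetical helper, and the actual AudioContext rate may differ.

```javascript
// Sketch: time covered by one audio buffer, in milliseconds.
// bufferLatencyMs is a hypothetical helper; 16000 Hz is assumed from the
// capture rate mentioned in this README and may not match the AudioContext rate.
function bufferLatencyMs(bufferSize, sampleRate) {
  return (bufferSize * 1000) / sampleRate;
}

for (const size of [1024, 2048, 4096]) {
  console.log(`${size} samples @ 16 kHz: ${bufferLatencyMs(size, 16000)} ms`);
}
// prints 64 ms, 128 ms, and 256 ms respectively
```

Note that createScriptProcessor only accepts power-of-two buffer sizes between 256 and 16384.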
This project is for educational purposes.
Built with:
Need Help? Check the troubleshooting section or open an issue.