Sumit231292/Gemini_AI_Tutor

🎓 EduNova: AI Tutor That Sees & Speaks

Real-time, vision-enabled AI tutor powered by Google Gemini Live API.
Talk naturally, show your homework, and get step-by-step guidance.

Built for the Gemini Live Agent Challenge hackathon. Category: Live Agents 🗣️


✨ Features

| Feature | Description |
| --- | --- |
| 🗣️ Real-time Voice | Natural conversation with your tutor: speak and get spoken responses via the Gemini Live API (native audio) |
| 📸 Vision-Enabled | Show your homework via camera or upload an image; it is analyzed by the Gemini 2.5 Flash vision model |
| 🧠 Balanced Teaching | Guides you to understand concepts, then provides complete solutions with step-by-step explanations |
| ⚡ Interruptible | Break in at any time; the tutor handles interruptions gracefully |
| 🌍 Language Selection | Choose from 20+ languages (English, Hindi, Spanish, French, etc.); the tutor responds in your preferred language |
| 📚 Multi-Subject | Mathematics, Physics, Chemistry, Biology, CS, Language Arts, History |
| 🛠️ ADK Agent Tools | Structured tools for practice problems, concept explanations, study plans |
| 🔐 User Authentication | Sign up / log in with username & password; passwords hashed with SHA-256 + salt |
| 👤 Student Profiles | Name, grade, gender, age, language; persisted to Google Cloud Firestore |
| ☁️ Google Cloud | Deployed on Cloud Run with Terraform IaC, user data stored in Firestore |

๐Ÿ—๏ธ Architecture

┌────────────────────────────────────────────────────────────────┐
│                       Frontend (Browser)                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────────────────┐  │
│  │ Mic/Audio│  │  Camera  │  │      Chat UI / Controls      │  │
│  └────┬─────┘  └────┬─────┘  └──────────────┬───────────────┘  │
│       └─────────────┴──────┬────────────────┘                  │
│                            │ WebSocket (wss://)                │
└────────────────────────────┼───────────────────────────────────┘
                             │
┌────────────────────────────┼───────────────────────────────────┐
│                 Google Cloud Run                               │
│  ┌─────────────────────────┴──────────────────────────────┐    │
│  │              FastAPI + WebSocket Server                │    │
│  │  ┌──────────────────┐    ┌───────────────────────────┐ │    │
│  │  │   Session Mgr    │    │      ADK Tutor Agent      │ │    │
│  │  │  (Live API       │    │  ┌─────────────────────┐  │ │    │
│  │  │   sessions)      │    │  │ Tools:              │  │ │    │
│  │  │                  │    │  │ • Practice Probs    │  │ │    │
│  │  │                  │    │  │ • Concept Explain   │  │ │    │
│  │  │                  │    │  │ • Check Solution    │  │ │    │
│  │  │                  │    │  │ • Study Plan        │  │ │    │
│  │  │                  │    │  │ • Step-by-Step      │  │ │    │
│  │  └────────┬─────────┘    │  └─────────────────────┘  │ │    │
│  │           │              └───────────────────────────┘ │    │
│  └───────────┼───────────────────────────────┬────────────┘    │
│              │                               │                 │
└──────────────┼───────────────────────────────┼─────────────────┘
               │ Bidirectional Audio           │ User Auth &
               │ Streaming                     │ Profile Storage
               ▼                               ▼
┌──────────────────────────────┐  ┌────────────────────────────┐
│      Google AI Platform      │  │  Google Cloud Firestore    │
│  ┌──────────────────────┐    │  │                            │
│  │  Gemini 2.5 Flash    │    │  │  • User profiles           │
│  │  Native Audio        │    │  │  • Username / password     │
│  │  (Live API - voice)  │    │  │  • Grade, age, gender      │
│  │  • Real-time audio   │    │  │  • Language preference     │
│  │  • Interruptions     │    │  │  • Subject preference      │
│  │  • Transcription     │    │  │                            │
│  └──────────────────────┘    │  │  (Fallback: local JSON)    │
│  ┌──────────────────────┐    │  └────────────────────────────┘
│  │  Gemini 2.5 Flash    │    │
│  │  Vision Model        │    │
│  │  (image analysis)    │    │
│  │  • Homework photos   │    │
│  │  • Image upload      │    │
│  └──────────────────────┘    │
└──────────────────────────────┘
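
The diagram shows the server relaying traffic in both directions between the browser WebSocket and the Live API session. A minimal sketch of that relay pattern with plain `asyncio` follows. The `pump` and `bridge` helpers and the four callables are hypothetical stand-ins, not the repository's actual code. In a real server they would wrap the FastAPI WebSocket and the Live API session, and the framework's own disconnect exception would need the same treatment as `ConnectionError` here.

```python
import asyncio
from typing import Awaitable, Callable

Receive = Callable[[], Awaitable[bytes]]
Send = Callable[[bytes], Awaitable[None]]


async def pump(receive: Receive, send: Send) -> None:
    """Forward messages one way until the source raises or the task is cancelled."""
    while True:
        await send(await receive())


async def bridge(browser_recv: Receive, browser_send: Send,
                 model_recv: Receive, model_send: Send) -> None:
    """Run both directions concurrently and stop both when either side ends."""
    tasks = [
        asyncio.create_task(pump(browser_recv, model_send)),  # mic audio -> model
        asyncio.create_task(pump(model_recv, browser_send)),  # tutor audio -> browser
    ]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    finally:
        # Cancelling a finished task is a no-op, so this only stops the survivor.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    for task in done:
        # A closed connection is a normal way to end; anything else is re-raised.
        exc = task.exception()
        if exc is not None and not isinstance(exc, ConnectionError):
            raise exc
```

Stopping both directions as soon as one ends avoids a half-open bridge where the server keeps streaming to a client that has already left.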

๐Ÿ› ๏ธ Tech Stack

| Layer | Technology |
| --- | --- |
| AI Models | Gemini 2.5 Flash Native Audio (Live API, voice) + Gemini 2.5 Flash (vision/image analysis) |
| Agent Framework | Google ADK (Agent Development Kit) |
| SDK | Google GenAI SDK (google-genai v1.x) |
| Backend | Python 3.12, FastAPI, uvicorn, WebSockets |
| Database | Google Cloud Firestore (user profiles & session data) |
| Frontend | Vanilla HTML/CSS/JS (no build step) |
| Google Cloud | Cloud Run, Vertex AI, Cloud Build, Firestore |
| IaC | Terraform + Cloud Build YAML |
| Containerization | Docker (multi-stage build) |
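
ADK tools are ordinary Python functions with type hints and a docstring that the agent can call. The sketch below shows what a practice-problem tool might look like. The function `make_practice_problems`, its parameters, and its return shape are hypothetical and not copied from `tutor_agent.py`. The toy generator only handles addition so that the example stays self-contained.

```python
import random
from typing import Dict, List


def make_practice_problems(topic: str, count: int = 3, seed: int = 0) -> Dict[str, object]:
    """Generate simple practice problems for a topic (only "addition" in this sketch)."""
    if topic != "addition":
        # Returning a status field lets the model explain the failure to the student.
        return {"status": "error", "message": f"unsupported topic: {topic}"}
    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    problems: List[Dict[str, object]] = []
    for _ in range(count):
        a, b = rng.randint(1, 20), rng.randint(1, 20)
        problems.append({"question": f"{a} + {b} = ?", "answer": a + b})
    return {"status": "success", "topic": topic, "problems": problems}


# With google-adk installed, a function like this could be registered roughly as:
#   from google.adk.agents import Agent
#   tutor = Agent(name="tutor", model="gemini-2.5-flash",
#                 instruction="You are a patient tutor.",
#                 tools=[make_practice_problems])
```

Returning a plain dictionary keeps the tool output easy for the model to read back to the student.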

🚀 Quick Start (Local Development)

Prerequisites

  • Python 3.11+
  • A Google AI API Key or a Google Cloud project with Vertex AI enabled
  • Microphone + Camera (for full experience)

1. Clone the Repository

git clone https://github.com/Sumit231292/Gemini_AI_Tutor.git
cd Gemini_AI_Tutor

2. Set Up Environment

# Create and activate virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

# Install dependencies
pip install -r backend/requirements.txt

3. Configure API Key

# Copy the example env file
cp .env.example .env

# Edit .env and add your API key
GOOGLE_API_KEY=your-api-key-here

# Set your GCP project for Firestore (user storage)
GOOGLE_CLOUD_PROJECT=your-gcp-project-id
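
For orientation, here is a minimal sketch of how a backend could read these two variables at startup. It assumes API-key mode and that the variables are already present in the process environment. The `Settings` class and `load_settings` function are hypothetical and may differ from what `config.py` does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    google_cloud_project: Optional[str]  # None means no Firestore project is configured


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment and fail early if the API key is missing."""
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key or api_key == "your-api-key-here":
        raise RuntimeError("GOOGLE_API_KEY is not set; edit .env before starting the server")
    project = env.get("GOOGLE_CLOUD_PROJECT", "").strip() or None
    return Settings(google_api_key=api_key, google_cloud_project=project)
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first tutoring session.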

3b. Set Up Firestore (for persistent user storage)

# Authenticate with GCP
gcloud auth application-default login --project YOUR_PROJECT_ID

# Create Firestore database (one-time)
gcloud firestore databases create --project=YOUR_PROJECT_ID --location=nam5 --type=firestore-native

Without Firestore, the app automatically falls back to a local JSON file (backend/data/users.json).
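
The fallback behaviour can be pictured with the sketch below. It is an illustration under assumptions, not the code in `user_store.py`: `JsonUserStore`, `make_store`, and the `firestore_factory` parameter are hypothetical names, and the Firestore-backed store is left as an injected callable because its real interface is not shown here.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional


class JsonUserStore:
    """Keeps all users in one JSON file shaped as {username: profile_dict}."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> Dict[str, dict]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def get(self, username: str) -> Optional[dict]:
        return self._load().get(username)

    def put(self, username: str, profile: dict) -> None:
        users = self._load()
        users[username] = profile
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(users, indent=2), encoding="utf-8")


def make_store(firestore_factory: Callable[[], object], json_path: Path):
    """Prefer the Firestore-backed store; fall back to JSON if it cannot be created."""
    try:
        return firestore_factory()
    except Exception as exc:  # e.g. missing credentials or no database
        print(f"Firestore unavailable ({exc}); using {json_path}")
        return JsonUserStore(json_path)
```

A file-based store like this is only suitable for local development, since Cloud Run instances do not share or persist a local filesystem.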

4. Run the Server

cd backend
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

5. Open in Browser

Navigate to http://localhost:8000, then select a subject and start learning!

Note: For microphone/camera access, use localhost or HTTPS. Browsers block media APIs on non-secure origins.


โ˜๏ธ Deploy to Google Cloud

Option A: One-Command Deploy Script

# Authenticate with Google Cloud
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Deploy
chmod +x deploy/deploy.sh
./deploy/deploy.sh YOUR_PROJECT_ID us-central1

Option B: Cloud Build (CI/CD)

# Trigger a build from the repo root
gcloud builds submit \
    --config deploy/cloudbuild.yaml \
    --project YOUR_PROJECT_ID \
    --substitutions _REGION=us-central1

Option C: Terraform (IaC, Bonus Points!)

cd deploy

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var="project_id=YOUR_PROJECT_ID"

# Apply
terraform apply -var="project_id=YOUR_PROJECT_ID"

After deployment, set your API key:

gcloud run services update edunova \
    --region us-central1 \
    --set-env-vars GOOGLE_API_KEY=your-api-key

๐Ÿ“ Project Structure

Gemini_AI_Tutor/
โ”œโ”€โ”€ backend/
โ”‚   โ”œโ”€โ”€ app/
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py          # Package init
โ”‚   โ”‚   โ”œโ”€โ”€ config.py            # Environment configuration
โ”‚   โ”‚   โ”œโ”€โ”€ main.py              # FastAPI server + WebSocket endpoints
โ”‚   โ”‚   โ”œโ”€โ”€ live_api.py          # Gemini Live API session manager
โ”‚   โ”‚   โ”œโ”€โ”€ tutor_agent.py       # ADK agent with tutoring tools
โ”‚   โ”‚   โ””โ”€โ”€ user_store.py        # User auth & Firestore/JSON storage
โ”‚   โ”œโ”€โ”€ data/
โ”‚   โ”‚   โ””โ”€โ”€ users.json           # Local fallback user storage
โ”‚   โ”œโ”€โ”€ requirements.txt         # Python dependencies
โ”‚   โ””โ”€โ”€ Dockerfile               # Container for Cloud Run
โ”œโ”€โ”€ frontend/
โ”‚   โ”œโ”€โ”€ index.html               # Main UI (sign-in, landing, session)
โ”‚   โ”œโ”€โ”€ style.css                # Styling (glass-morphism design)
โ”‚   โ””โ”€โ”€ app.js                   # Client-side logic & WebSocket
โ”œโ”€โ”€ deploy/
โ”‚   โ”œโ”€โ”€ deploy.sh                # One-click deploy script
โ”‚   โ”œโ”€โ”€ cloudbuild.yaml          # Cloud Build CI/CD config
โ”‚   โ””โ”€โ”€ main.tf                  # Terraform IaC config
โ”œโ”€โ”€ .env.example                 # Environment template
โ”œโ”€โ”€ .gitignore
โ””โ”€โ”€ README.md                    # This file

🎮 How to Use

  1. Create an account: sign up with a username, password, name, grade, and preferred language
  2. Log in: returning users log in with username and password
  3. Select a subject from the landing page
  4. Talk to your tutor: click the mic button and speak naturally
  5. Show your homework: use the camera button to capture a photo, or upload an image
  6. Type messages: use the text input for typed questions
  7. Change language: switch language anytime from the landing page
  8. Get guided: the tutor explains concepts step-by-step and gives you the final answer
  9. Interrupt anytime: the tutor handles interruptions gracefully
  10. Log out: click the logout button to switch accounts

🔑 Key Hackathon Requirements Checklist

| Requirement | Status | Details |
| --- | --- | --- |
| ✅ Gemini model | ✔️ | Gemini 2.5 Flash Native Audio (Live API) + Gemini 2.5 Flash (vision) |
| ✅ Google GenAI SDK or ADK | ✔️ | Both: GenAI SDK for Live API + ADK for agent tools |
| ✅ Google Cloud service | ✔️ | Cloud Run, Vertex AI, Cloud Build, Firestore |
| ✅ Multimodal input | ✔️ | Voice (audio) + Vision (camera/image) + Text |
| ✅ Beyond text-in/text-out | ✔️ | Real-time voice conversation + image understanding |
| ✅ Multi-language | ✔️ | 20+ languages with per-session language selection |
| ✅ User persistence | ✔️ | Student profiles stored in Google Cloud Firestore |
| ✅ Live Agent category | ✔️ | Real-time interruptible voice + vision tutor |
| ✅ Public code repo | ✔️ | This repository |
| ✅ Spin-up instructions | ✔️ | See Quick Start above |
| ✅ Architecture diagram | ✔️ | See Architecture section |
| ✅ IaC deployment (bonus) | ✔️ | Terraform + Cloud Build |

๐Ÿ“ Findings & Learnings

What Worked Well

  • Gemini Live API with native audio model provides remarkably natural real-time conversations with low latency
  • Hybrid vision approach: using Gemini 2.5 Flash for image analysis and feeding results into the native audio session creates a seamless "sees and speaks" experience
  • Vision + Voice combo creates a powerful tutoring experience: students can literally "show" their homework
  • Balanced teaching approach combines guided learning with actual answers: the tutor explains step-by-step AND gives the final answer
  • ADK tools provide structured capabilities (practice problems, study plans) beyond free-form conversation
  • Language selection ensures the tutor always responds in the student's preferred language
  • Firestore integration provides durable, serverless storage for student profiles across sessions
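
The hybrid vision approach in this list boils down to two steps: a vision model turns the image into text, and that text is pushed into the voice session as context. The sketch below shows the flow with injected callables. `share_image_with_tutor`, `describe_image`, `send_text`, and the prompt wording are all hypothetical. In the real app the two callables would wrap a Gemini 2.5 Flash request and the Live API session respectively.

```python
import asyncio
from typing import Awaitable, Callable

DescribeImage = Callable[[bytes, str], Awaitable[str]]  # (image bytes, prompt) -> description
SendText = Callable[[str], Awaitable[None]]  # pushes a text turn into the live session

VISION_PROMPT = (
    "Transcribe and describe this homework image, including any equations and diagrams."
)


async def share_image_with_tutor(image: bytes, describe_image: DescribeImage,
                                 send_text: SendText) -> str:
    """Describe the image with the vision model, then hand the text to the voice session."""
    description = await describe_image(image, VISION_PROMPT)
    turn = (
        "The student just showed an image. A vision model described it as follows:\n"
        f"{description}\n"
        "Use this description to help the student."
    )
    await send_text(turn)
    return turn
```

Keeping the two model calls behind small callables also makes the flow easy to exercise with fakes, without network access.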

Challenges

  • Native audio model limitations: the gemini-2.5-flash-native-audio-latest model doesn't support direct image input, requiring a hybrid approach with a separate vision model call
  • Audio format handling: bridging browser MediaRecorder PCM format to Gemini's expected input required careful sample rate and encoding management
  • WebSocket lifecycle: managing the bidirectional bridge between client WebSocket and Gemini Live API session required careful async handling
  • Interruption handling: ensuring smooth interruption UX when the student speaks while the tutor is responding
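
To make the audio format challenge concrete, here is a small sketch of the kind of conversion involved. It assumes the browser delivers float samples in the range [-1, 1] and that the target is 16 kHz, 16-bit, little-endian mono PCM. The helper `float_to_pcm16` is hypothetical, and the project may do this step in the browser rather than in Python.

```python
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float], src_rate: int, dst_rate: int = 16_000) -> bytes:
    """Resample by linear interpolation and pack as little-endian signed 16-bit PCM."""
    if not samples:
        return b""
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples consumed per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), len(samples) - 1)
        right = min(left + 1, len(samples) - 1)
        frac = pos - int(pos)
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        value = max(-1.0, min(1.0, value))  # clip before scaling to avoid overflow
        out.append(int(value * 32767))
    return struct.pack(f"<{len(out)}h", *out)
```

Linear interpolation without a low-pass filter can introduce aliasing when downsampling, so a dedicated resampler would be a better fit where audio quality matters.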

Future Ideas

  • Real-time whiteboard/drawing for working through math problems visually
  • Progress tracking across sessions with Firestore persistence
  • Integration with curriculum standards (Common Core, etc.)
  • Google OAuth as alternative login method
  • Support for more Gemini model variants as they become available

📄 License

MIT License. See LICENSE for details.


Built with ❤️ using Google Gemini Live API · ADK · Google Cloud
#GeminiLiveAgentChallenge
