Cauliform

Turn any Google Form into a phone call

Live Demo · About

Problem • Solution • Tech Stack • Getting Started • Deployment


Cauliform is an AI-powered voice agent that transforms Google Forms into natural phone conversations. Simply paste a Google Form link, receive a call, and complete the form hands-free while walking, driving, or multitasking.

Demo

Cauliform Demo

Architecture

Cauliform Architecture Diagram

The system connects the user's browser to the Gemini Live API over a WebSocket for real-time voice conversation. Cloud Run hosts the Next.js backend, whose API routes handle form parsing, profile management, and form submission. Firebase Firestore stores two collections:

  • user_profiles: Keyed by phone number, stores common responses (name, email, company, job title) extracted from past form submissions. On future calls, these are injected into Gemini's system prompt so the agent can confirm known answers instead of re-asking.
  • call_sessions: Logs every form interaction (form URL, title, answers, status, timestamp), enabling cross-form intelligence and call history tracking.

When a user fills out Form A and provides their name and email, those fields are automatically saved. When they later fill out Form B (a completely different form), the agent already knows their info and says "I have your name as Chinat Yu. Is that still correct?"
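As a rough illustration of how saved answers could reach the agent, the sketch below builds a system prompt that lists known values for confirmation. The function name, the KnownAnswers shape, and the prompt wording are all assumptions made for this example, not the project's actual prompt.

```typescript
// Hypothetical shape: profile field name -> last value the caller confirmed.
type KnownAnswers = Record<string, string>;

// Builds the instruction text for one call. Known values are listed so the
// agent confirms them with the caller instead of asking from scratch.
function buildSystemPrompt(formTitle: string, known: KnownAnswers): string {
  const lines: string[] = [
    `You are a voice agent helping a caller complete the form "${formTitle}".`,
    "Ask one question at a time.",
  ];

  // Skip empty values so the agent never "confirms" a blank answer.
  const entries = Object.entries(known).filter(([, value]) => value.trim() !== "");
  if (entries.length > 0) {
    lines.push("Confirm these saved answers instead of re-asking:");
    for (const [field, value] of entries) {
      lines.push(`- ${field}: ${value}`);
    }
  }
  return lines.join("\n");
}

const examplePrompt = buildSystemPrompt("Event RSVP", {
  fullName: "Chinat Yu",
  email: "",
});
console.log(examplePrompt);
```

Keeping the builder a pure function of the form title and the profile makes it easy to test without a live call.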

Confirmation step: Before any form is submitted, the agent summarizes all collected responses and explicitly asks "Should I submit this form?" The form is submitted only when the user confirms with "yes", "submit", or "go ahead." This is enforced at two levels: the Gemini system prompt requires confirmation before calling the submit_form tool, and the frontend triggers the submission API only when that tool call is received. No form is submitted without the user's explicit verbal approval.
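The frontend half of that check might look something like the sketch below. Only the submit_form tool name comes from this README; the ToolCall shape, the SubmissionGate class, the save_answer tool name, and the duplicate-call guard are assumptions added for illustration.

```typescript
// Hypothetical shape of a tool call surfaced to the frontend.
interface ToolCall {
  name: string;
  args: Record<string, string>;
}

// Frontend-side gate: the only path to submission is a submit_form tool call.
// The spoken confirmation itself is enforced by the system prompt, not here.
class SubmissionGate {
  private submitted = false;

  // Returns the answers to send to the submission API, or null to do nothing.
  handle(call: ToolCall): Record<string, string> | null {
    if (call.name !== "submit_form") return null; // other tools never submit
    if (this.submitted) return null;              // assumed guard: ignore duplicates
    this.submitted = true;
    return call.args;
  }
}

const gate = new SubmissionGate();
console.log(gate.handle({ name: "save_answer", args: { q1: "Alice" } })); // null
console.log(gate.handle({ name: "submit_form", args: { q1: "Alice" } })); // { q1: 'Alice' }
console.log(gate.handle({ name: "submit_form", args: { q1: "Alice" } })); // null
```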

TinyFish (AI browser agent) handles the actual Google Form submission by automating the browser to fill and submit the form, keeping the form owner's workflow untouched.

The Problem

Google Forms are everywhere, from surveys and event registrations to feedback forms and applications. Yet filling them out requires your full attention: you need to stop what you're doing, pull out your device, and manually type responses. This friction leads to abandoned forms, incomplete responses, and missed opportunities.

For users with disabilities, limited mobility, or those constantly on the move, traditional form-filling is even more challenging.

Solution

Cauliform leverages Google's Gemini Live API to create a real-time voice agent that:

  1. Parses any Google Form link you provide
  2. Calls you directly on your phone
  3. Asks each question conversationally
  4. Confirms your responses before submission
  5. Submits the completed form automatically
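Step 1 starts from a pasted link, so a first check is whether the link looks like a Google Form at all. The sketch below recognizes the two common link shapes (docs.google.com/forms/d/... and forms.gle/...). The function and type names are hypothetical, and the project's real parser may accept more variants.

```typescript
// Result of recognizing a Google Forms link.
type FormLink =
  | { kind: "full"; formId: string } // docs.google.com/forms/d/<id>/... or /d/e/<id>/...
  | { kind: "short"; code: string }; // forms.gle/<code>

function parseFormLink(input: string): FormLink | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a URL at all
  }
  if (url.protocol !== "https:") return null;

  const parts = url.pathname.split("/").filter((p) => p !== "");

  if (url.hostname === "forms.gle") {
    const code = parts[0];
    return parts.length === 1 && code ? { kind: "short", code } : null;
  }
  if (url.hostname === "docs.google.com" && parts[0] === "forms" && parts[1] === "d") {
    // Published links carry an extra "e" segment before the ID.
    const formId = parts[2] === "e" ? parts[3] : parts[2];
    return formId ? { kind: "full", formId } : null;
  }
  return null;
}

console.log(parseFormLink("https://forms.gle/AbC123"));
```

A short link would still need to be resolved (by following its redirect) before the form can be parsed; that step is left out here.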

Fill out forms while walking to your car, during your commute, or while cooking dinner. No screens required.

Cauliform App Screenshot

Key Features

  • Voice-First Experience: Natural conversational interface powered by Gemini Live API
  • Real-Time Interaction: Handles interruptions gracefully, just like talking to a human
  • Smart Profile Memory: Remembers common responses (name, email, etc.) across forms
  • Multi-Format Support: Handles text responses, multiple choice, checkboxes, and long-form paragraphs
  • Attachment Handling: For file uploads, receive a text/email prompt to submit attachments
  • Confirmation Flow: Reviews all responses before final submission
  • Accessibility-First: Designed for users who prefer or require voice interaction
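To make the multi-format point concrete, here is one way a transcribed answer could be mapped onto the question types listed above. This is a sketch under assumptions: the FormQuestion type and normalizeAnswer function are invented for this example, and the naive substring matching stands in for whatever the Gemini agent actually does.

```typescript
// Question kinds named in the feature list; field names are illustrative.
type FormQuestion =
  | { kind: "text"; title: string }
  | { kind: "paragraph"; title: string }
  | { kind: "multiple_choice"; title: string; options: string[] }
  | { kind: "checkboxes"; title: string; options: string[] };

// Maps a transcribed answer onto the value(s) the form expects.
// Returns null when nothing usable was heard, so the agent can ask again.
function normalizeAnswer(q: FormQuestion, spoken: string): string | string[] | null {
  const said = spoken.trim();
  if (q.kind === "text" || q.kind === "paragraph") {
    return said === "" ? null : said;
  }
  // Naive matching: an option counts if its label appears in what was said.
  const heard = said.toLowerCase();
  const matches = q.options.filter((opt) => heard.includes(opt.toLowerCase()));
  if (q.kind === "multiple_choice") {
    return matches.length === 1 ? matches[0] ?? null : null; // none or ambiguous
  }
  return matches.length > 0 ? matches : null;
}

const days: FormQuestion = {
  kind: "checkboxes",
  title: "Which days work?",
  options: ["Monday", "Tuesday", "Friday"],
};
console.log(normalizeAnswer(days, "monday and friday work"));
```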

Technology Stack

| Component | Technology |
| --- | --- |
| AI/ML | Gemini Live API, Google GenAI SDK |
| Voice/Telephony | Twilio Voice API (WIP: phone call flow under development) |
| Frontend | Next.js 14, React, TypeScript, Tailwind CSS |
| Backend | Next.js API Routes |
| Cloud Infrastructure | Google Cloud Run |
| Database | Firebase Firestore |
| Email | Resend / SendGrid |
| Authentication | Google OAuth (optional) |

Agent Pipeline

The voice agent follows a structured pipeline: identify user → parse form → conduct call → confirm → submit → notify.

flowchart LR
    A[📱 User Input] --> B{Known User?}
    B -->|Yes| C[Load Profile]
    B -->|No| D[Create Profile]
    C --> E[Parse Form]
    D --> E
    E --> F[📞 Initiate Call]
    F --> G[🤖 Gemini Agent]
    G --> H[Ask Questions]
    H --> I[Confirm Answers]
    I --> J[✅ Submit Form]
    J --> K[📧 Send Email]
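
One property of this pipeline worth spelling out is that a stage runs only if every earlier stage succeeded, so a declined confirmation never reaches submit. The sketch below models that ordering. The stage identifiers and the runPipeline function are assumptions for illustration, and the handlers are synchronous only to keep the example short (real ones would be async).

```typescript
// Stage names mirror the pipeline described in this section.
const STAGES = [
  "identify_user",
  "parse_form",
  "conduct_call",
  "confirm",
  "submit",
  "notify",
] as const;
type Stage = (typeof STAGES)[number];

// One handler per stage; each returns true on success.
type StageHandlers = Record<Stage, () => boolean>;

function runPipeline(handlers: StageHandlers): { completed: Stage[]; stoppedAt: Stage | null } {
  const completed: Stage[] = [];
  for (const stage of STAGES) {
    if (!handlers[stage]()) {
      return { completed, stoppedAt: stage }; // later stages never run
    }
    completed.push(stage);
  }
  return { completed, stoppedAt: null };
}

// Example: the caller declines at the confirm step, so submit is never reached.
const succeed = () => true;
const declinedRun = runPipeline({
  identify_user: succeed,
  parse_form: succeed,
  conduct_call: succeed,
  confirm: () => false,
  submit: succeed,
  notify: succeed,
});
console.log(declinedRun.stoppedAt); // confirm
```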

Call Flow

sequenceDiagram
    participant U as User
    participant C as Cauliform
    participant G as Gemini Live
    participant F as Google Forms

    U->>C: Paste form URL + phone
    C->>C: Lookup/create user profile
    C->>F: Parse form questions
    C->>U: 📞 Phone call

    loop Each Question
        G->>U: Ask question (pre-fill if known)
        U->>G: Speak answer
        G->>C: Store response
    end

    G->>U: "Confirm your answers..."
    U->>G: "Yes, submit"
    C->>F: Submit responses
    C->>U: 📧 Confirmation email
    G->>U: "Done! Goodbye!"
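
The question loop and the final confirmation in this diagram can be read as a small state machine. The following is a minimal sketch of that reading, not the project's implementation: the CallState shape and all function names are hypothetical, and a declined confirmation simply leaves the call in the confirming phase because the README does not describe how corrections are handled.

```typescript
type Phase = "asking" | "confirming" | "submitted";

interface CallState {
  questions: string[];
  answers: string[]; // answers[i] is the reply to questions[i]
  phase: Phase;
}

function startCall(questions: string[]): CallState {
  return { questions, answers: [], phase: questions.length > 0 ? "asking" : "confirming" };
}

// What the agent should say next in the current state.
function nextPrompt(state: CallState): string {
  if (state.phase === "asking") return state.questions[state.answers.length] ?? "";
  if (state.phase === "confirming") return "Should I submit this form?";
  return "Done! Goodbye!";
}

// Stores one spoken answer; moves to confirmation after the last question.
function onAnswer(state: CallState, spoken: string): CallState {
  if (state.phase !== "asking") return state;
  const answers = [...state.answers, spoken.trim()];
  const phase: Phase = answers.length >= state.questions.length ? "confirming" : "asking";
  return { ...state, answers, phase };
}

// Submission happens only from the confirming phase and only on approval.
function onConfirmation(state: CallState, approved: boolean): CallState {
  if (state.phase !== "confirming" || !approved) return state;
  return { ...state, phase: "submitted" };
}

let demoCall = startCall(["What's your name?", "What's your email?"]);
demoCall = onAnswer(demoCall, "John Smith");
demoCall = onAnswer(demoCall, "john@example.com");
console.log(nextPrompt(demoCall)); // Should I submit this form?
```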

User Profile System

Phone number is the primary identifier. The agent learns and remembers common responses:

| Field Type | Example Question | Saved As |
| --- | --- | --- |
| email | "What's your email?" | john@example.com |
| fullName | "What's your name?" | John Smith |
| company | "Where do you work?" | Acme Corp |
| jobTitle | "What's your role?" | Software Engineer |
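
To decide which field type a form question belongs to, some kind of matching on the question text is needed. The sketch below uses a handful of regular expressions as a stand-in. The patterns and the classifyQuestion name are assumptions for this example, and the actual project may rely on Gemini for this instead.

```typescript
type ProfileField = "email" | "fullName" | "company" | "jobTitle";

// Illustrative patterns; order matters because the first match wins
// (e.g. "Company name?" should map to company, not fullName).
const FIELD_PATTERNS: Array<[ProfileField, RegExp]> = [
  ["email", /\be-?mail\b/i],
  ["jobTitle", /\b(job title|role|position)\b/i],
  ["company", /\b(company|employer|organi[sz]ation|where do you work)\b/i],
  ["fullName", /\bname\b/i],
];

// Returns the profile field a question maps to, or null for form-specific questions.
function classifyQuestion(question: string): ProfileField | null {
  for (const [field, pattern] of FIELD_PATTERNS) {
    if (pattern.test(question)) return field;
  }
  return null;
}

console.log(classifyQuestion("Where do you work?")); // company
```

Questions that return null would be asked every time and never written to the profile.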

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     FRONTEND (Next.js PWA)                      │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Landing Page   │  │  Call Status    │  │  Success Page   │  │
│  │  URL + Phone    │  │  Live Updates   │  │  Confirmation   │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                   BACKEND (Google Cloud Run)                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ /parse-form  │  │ /start-call  │  │  /webhook    │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
                                │
        ┌───────────────────────┼───────────────────────┐
        ▼                       ▼                       ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│  Firebase        │  │  Gemini Live API │  │     Twilio       │
│  Firestore       │  │  Voice AI Agent  │  │  Phone Calls     │
│  ─────────────   │  │  ─────────────   │  │  ─────────────   │
│  • User Profiles │  │  • Real-time STT │  │  • Outbound call │
│  • Known Answers │  │  • Real-time TTS │  │  • Audio stream  │
│  • Call Sessions │  │  • Conversation  │  │  • Webhooks      │
└──────────────────┘  └──────────────────┘  └──────────────────┘
                                │
              ┌─────────────────┴─────────────────┐
              ▼                                   ▼
┌──────────────────────┐            ┌──────────────────────┐
│    Google Forms      │            │    Email Service     │
│    Parse & Submit    │            │    Confirmation      │
└──────────────────────┘            └──────────────────────┘

Getting Started

Prerequisites

  • Node.js 18+
  • Google Cloud account with billing enabled
  • Twilio account with Voice capabilities
  • Google AI Studio API key (Gemini)

Installation

# Clone the repository
git clone https://github.com/ShadowEsu/Cauliform-AI.git
cd Cauliform-AI

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env.local

Environment Variables

Edit .env.local with your API keys:

# Google AI (Gemini)
GOOGLE_AI_API_KEY=your_gemini_api_key_here
GOOGLE_CLOUD_PROJECT=your_gcp_project_id

# Twilio Voice
TWILIO_ACCOUNT_SID=your_twilio_account_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# Firebase (optional)
NEXT_PUBLIC_FIREBASE_API_KEY=your_firebase_api_key
NEXT_PUBLIC_FIREBASE_PROJECT_ID=your_project_id

# App Configuration
NEXT_PUBLIC_APP_URL=http://localhost:3000
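
A quick startup check can catch placeholder values that were never replaced. The sketch below is an assumption-laden example: it treats every variable not marked optional above as required, and missingEnv is a hypothetical helper, not part of the repository.

```typescript
// Variables listed above without an "optional" note (an assumption about what is required).
const REQUIRED_ENV = [
  "GOOGLE_AI_API_KEY",
  "GOOGLE_CLOUD_PROJECT",
  "TWILIO_ACCOUNT_SID",
  "TWILIO_AUTH_TOKEN",
  "TWILIO_PHONE_NUMBER",
  "NEXT_PUBLIC_APP_URL",
];

// Returns the names that are unset, blank, or still hold a "your_..." placeholder.
function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "" || value.startsWith("your_");
  });
}

// Example with one placeholder left in place.
console.log(
  missingEnv({
    GOOGLE_AI_API_KEY: "your_gemini_api_key_here",
    GOOGLE_CLOUD_PROJECT: "my-project",
    TWILIO_ACCOUNT_SID: "ACxxxx",
    TWILIO_AUTH_TOKEN: "secret",
    TWILIO_PHONE_NUMBER: "+1234567890",
    NEXT_PUBLIC_APP_URL: "http://localhost:3000",
  }),
); // [ 'GOOGLE_AI_API_KEY' ]
```

In a Node.js process this could be called as missingEnv(process.env) before the server starts.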

Running Locally

# Start the development server
npm run dev

# Open http://localhost:3000

Deployment

Google Cloud Run (Recommended)

We provide a one-click deployment script for Google Cloud Run:

# Make the script executable
chmod +x deploy.sh

# Deploy to Cloud Run
./deploy.sh YOUR_PROJECT_ID us-central1

The script will:

  1. Enable required GCP APIs
  2. Build and push the Docker image
  3. Deploy to Cloud Run
  4. Output your service URL

Manual Deployment

# Build the Docker image
docker build -t gcr.io/YOUR_PROJECT/cauliform .

# Push to Container Registry
docker push gcr.io/YOUR_PROJECT/cauliform

# Deploy to Cloud Run
gcloud run deploy cauliform \
  --image gcr.io/YOUR_PROJECT/cauliform \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Post-Deployment

  1. Set environment variables in Cloud Run console
  2. Update NEXT_PUBLIC_APP_URL to your Cloud Run URL
  3. Configure Twilio webhook URL to: https://YOUR_URL/api/webhook

Usage

  1. Open Cauliform in your browser
  2. Paste a Google Form link
  3. Enter your phone number
  4. Answer the call and complete the form conversationally
  5. Confirm your responses when prompted
  6. Done! The form is submitted automatically

Use Cases

  • Students: Register for events, complete course surveys, and submit feedback, all while walking to class
  • Professionals: Fill out expense reports, HR forms, or client surveys during the commute
  • Accessibility: Voice-first interface for users with visual impairments or motor difficulties
  • Busy Parents: Complete school forms, medical questionnaires, or community surveys hands-free
  • Field Workers: Submit reports and checklists without stopping work

Project Structure

src/
├── app/
│   ├── page.tsx                 # Landing page
│   └── api/
│       ├── parse-form/          # Google Form parser
│       ├── start-call/          # Twilio call initiation
│       └── webhook/             # Twilio callbacks
├── lib/
│   ├── types.ts                 # TypeScript definitions
│   ├── gemini.ts                # Gemini API wrapper
│   ├── firebase.ts              # Firebase configuration
│   └── form-parser.ts           # Form parsing logic
└── components/                  # React components

TODO: User Profile Storage (Firebase)

Track users by phone number and remember their data over time using Firestore.

Firebase Config:

  • Project ID: cauliform-ai-d836f
  • API Key: AIzaSyBovx3wV8lTZrNzcg4rCb1qcvxljoUhjuA
  • Auth Domain: cauliform-ai-d836f.firebaseapp.com

What to store per user (keyed by phone number):

  • phoneNumber: primary identifier
  • name, email, age, etc.: learned from past form responses
  • knownResponses: map of field patterns to saved values (auto-fill future forms)
  • sessions[]: history of completed form sessions (form title, answers, timestamp)
  • formsCompleted: count
  • lastActiveAt: timestamp

How it connects:

  1. When a call/conversation starts, look up user by phone number in Firestore
  2. Pre-fill known answers (e.g., name, email) and confirm with user
  3. After form submission, save new responses to user profile
  4. Next time the same user fills a form, agent says: "I have your name as Alice. Should I use that?"
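
Step 3 of this flow could be sketched as a pure update function like the one below. The field names follow the list in this section, but the SessionRecord type, both function names, and the "newer values win" merge rule are assumptions. Reading and writing the document in Firestore is deliberately left out.

```typescript
interface SessionRecord {
  formTitle: string;
  answers: Record<string, string>;
  timestamp: number; // milliseconds since epoch
}

interface UserProfile {
  phoneNumber: string; // primary identifier
  knownResponses: Record<string, string>;
  sessions: SessionRecord[];
  formsCompleted: number;
  lastActiveAt: number;
}

function emptyProfile(phoneNumber: string, now: number): UserProfile {
  return { phoneNumber, knownResponses: {}, sessions: [], formsCompleted: 0, lastActiveAt: now };
}

// Returns an updated copy after a submitted form. `learned` holds the reusable
// fields (e.g. name, email) picked out of this form's answers; newer values win.
function recordSubmission(
  profile: UserProfile,
  formTitle: string,
  answers: Record<string, string>,
  learned: Record<string, string>,
  now: number,
): UserProfile {
  return {
    ...profile,
    knownResponses: { ...profile.knownResponses, ...learned },
    sessions: [...profile.sessions, { formTitle, answers, timestamp: now }],
    formsCompleted: profile.formsCompleted + 1,
    lastActiveAt: now,
  };
}

const alice = recordSubmission(
  emptyProfile("+1234567890", 1000),
  "Form A",
  { "What's your name?": "Alice" },
  { fullName: "Alice" },
  2000,
);
console.log(alice.knownResponses.fullName); // Alice
```

Returning a new object rather than mutating the profile keeps the function easy to test; the caller decides when to persist the result.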

Files to implement:

  • src/lib/user-profile.ts: CRUD operations for user profiles
  • Update src/hooks/useGeminiLive.ts: pass known answers to system prompt
  • Update src/app/api/submit-form/route.ts: save responses after submission

Documentation

Hackathon

Category: Live Agents - Real-time voice interaction using Gemini Live API

This project is built for the Gemini Live Agent Challenge hackathon, focusing on breaking the "text box" paradigm with immersive, real-time voice experiences.

Judging Criteria

| Criteria | Weight |
| --- | --- |
| Innovation & Multimodal UX | 40% |
| Technical Implementation | 30% |
| Demo & Presentation | 30% |

Team

| Name | Role | Background |
| --- | --- | --- |
| Preston | Full Stack Developer | Web apps, front-end, minimal design |
| Chinat Yu | Full Stack Developer | Hackathon winner (TreeHacks), experienced builder |

License

MIT License - see LICENSE for details.


Built with Gemini Live API for the Gemini Live Agent Challenge 2026

