CookWithMe is a Gemini-powered, multimodal, multi-agent grocery assistant for Indian quick-commerce. It understands user intent in plain language, sees what is on screen, and executes shopping actions end-to-end.
Supported platforms:
- Blinkit
- Zepto
Online grocery buying is still high-friction, especially for busy users and families:
- Recipe ideas are disconnected from actual cart building.
- Users manually search each ingredient and decide among many pack sizes and variants.
- Price and availability vary by platform, so users repeat the same work across apps.
- Building a complete cart takes time, attention, and frequent corrections.
This causes three real costs:
- Time cost: repetitive effort for every meal plan.
- Money cost: missing better platform choices and offers.
- Cognitive cost: constant micro-decisions (quantity, substitutes, brands, budget).
CookWithMe converts this into one guided flow: ask once, review once, and let the agent complete the rest.
Example prompts:
- Make paneer butter masala for 4
- Buy 2L milk, 500g paneer, and eggs
CookWithMe then:
- Parses user intent and extracts structured requirements.
- Expands recipe requests into a practical shopping list.
- Uses Gemini multimodal reasoning on screenshots to decide the next executable actions.
- Adds items, handles unavailable products with meaningful substitutes, and tracks progress live.
- Compares outcomes across platforms and recommends where to buy for better value.
- Returns a clear cart and cost summary.
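
To make the first step above concrete, here is a minimal sketch of what "structured requirements" could look like for the list-style example prompt. It is a rule-based stand-in written for illustration only: the project uses a Gemini-backed Intent Parser Agent, and `ShoppingItem`, `parse_list_prompt`, and the supported units are hypothetical names and choices, not taken from the codebase.

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical structured item; the field names are illustrative only.
@dataclass(frozen=True)
class ShoppingItem:
    name: str
    quantity: Optional[float] = None  # None when the user gave no amount
    unit: Optional[str] = None


# Matches "<number><unit> <name>", e.g. "2L milk" or "500g paneer".
_QTY = re.compile(
    r"^(?P<qty>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|ml|l)\s+(?P<name>.+)$",
    re.IGNORECASE,
)


def parse_list_prompt(prompt: str) -> list[ShoppingItem]:
    """Rule-based stand-in for intent parsing on simple 'Buy X, Y and Z' prompts."""
    # Drop a leading "buy", then split on commas and the word "and".
    body = re.sub(r"^\s*buy\s+", "", prompt.strip(), flags=re.IGNORECASE)
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", body)

    items: list[ShoppingItem] = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        match = _QTY.match(part)
        if match:
            items.append(
                ShoppingItem(
                    name=match["name"].strip().lower(),
                    quantity=float(match["qty"]),
                    unit=match["unit"].lower(),
                )
            )
        else:
            # No recognisable quantity: keep the text as the item name.
            items.append(ShoppingItem(name=part.lower()))
    return items
```

Calling `parse_list_prompt("Buy 2L milk, 500g paneer, and eggs")` yields three items, with `eggs` left without a quantity so that a later step (or the user's saved defaults) can decide the pack size.
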
- Saves effort: recipe-to-cart in one conversation.
- Saves time: no repeated manual searching for each item.
- Saves money: platform comparison helps users choose better total value.
- Improves confidence: live progress and a transparent summary before checkout.
- Works around uncertainty: substitution and quantity handling keep flows moving.
- Multimodal agentic shopping using Gemini screenshot understanding.
- Personalized experience with one-time preference setup and reuse.
- Multi-agent orchestration with specialized agents per responsibility.
- Recipe expansion to structured shopping items with quantities.
- Substitution planning for unavailable products.
- Cross-platform execution and comparison (Blinkit + Zepto) to help users decide where to buy.
- Session continuity for smoother repeat usage.
- Real-time frontend updates through SSE/WebSocket.
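
For the real-time updates mentioned in the last item, the sketch below shows how progress messages could be serialized in the Server-Sent Events wire format (an `event:` line, a `data:` line, and a blank line). The event names `progress` and `done` and the payload fields are assumptions made for illustration; the actual event schema is not described here.

```python
import json
from typing import Iterator


def format_sse(event: str, data: dict) -> str:
    """Serialize one SSE message: event name, single-line JSON data, blank line."""
    payload = json.dumps(data, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"


def progress_events(items: list[str]) -> Iterator[str]:
    """Yield one hypothetical 'progress' event per item, then a final 'done' event."""
    total = len(items)
    for index, name in enumerate(items, start=1):
        yield format_sse("progress", {"item": name, "step": index, "total": total})
    yield format_sse("done", {"total": total})
```

In a FastAPI backend, a generator like this could be passed to `StreamingResponse` with `media_type="text/event-stream"`, though whether the project wires it that way is not stated.
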
CookWithMe stores user preferences after initial setup and applies them in future sessions:
- Dietary style and restrictions.
- Budget level.
- Preferred platform.
- Preferred brands.
- Pack-size behavior and household defaults.
Result: users do not need to restate the same context each time.
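
A minimal sketch of how such preferences might be persisted and reloaded between sessions, assuming a JSON file on disk. The `UserPreferences` fields, their defaults, and the helper names are hypothetical and only mirror the categories listed above; the project's real models are Pydantic v2 and may look different.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


# Hypothetical preference record mirroring the categories listed above.
@dataclass
class UserPreferences:
    diet: str = "any"
    restrictions: list[str] = field(default_factory=list)
    budget_level: str = "medium"
    preferred_platform: str = "blinkit"
    preferred_brands: list[str] = field(default_factory=list)
    household_size: int = 2


def save_preferences(prefs: UserPreferences, path: Path) -> None:
    """Write preferences as JSON so a later session can reuse them."""
    path.write_text(json.dumps(asdict(prefs), indent=2), encoding="utf-8")


def load_preferences(path: Path) -> UserPreferences:
    """Return stored preferences, or defaults if nothing has been saved yet."""
    if not path.exists():
        return UserPreferences()
    return UserPreferences(**json.loads(path.read_text(encoding="utf-8")))
```

Falling back to defaults when no file exists keeps the first session working before setup is complete.
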
CookWithMe is built as a coordinated agent platform, not a single prompt chain.
Main agents:
- Intent Parser Agent: converts open-ended user text to structured shopping intent.
- Recipe Expander Agent: turns recipe goals into complete grocery items.
- Fused Vision Agent: reads screenshots, reasons about UI state, and proposes precise next actions.
- Substitution Agent: finds practical, recipe-compatible alternatives.
- Core Loop Orchestrator: executes and verifies actions, tracks step status, and manages progress.
This separation improves reliability, debuggability, and extensibility for production use.
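
The division of labour between the orchestrator and the Substitution Agent can be sketched as follows. This is a simplified illustration, not the project's loop: the agents are replaced by plain callables, the screenshot-driven action planning is collapsed into a single `add_to_cart` call, and `StepResult` and `run_core_loop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    requested: str
    added: Optional[str]  # product actually placed in the cart, or None
    status: str           # "added", "substituted", or "skipped"


def run_core_loop(
    items: list[str],
    add_to_cart: Callable[[str], bool],
    suggest_substitute: Callable[[str], Optional[str]],
) -> list[StepResult]:
    """Try each item; on failure, ask for a substitute and try that once."""
    results: list[StepResult] = []
    for item in items:
        if add_to_cart(item):
            results.append(StepResult(item, item, "added"))
            continue
        alternative = suggest_substitute(item)
        if alternative is not None and add_to_cart(alternative):
            results.append(StepResult(item, alternative, "substituted"))
        else:
            results.append(StepResult(item, None, "skipped"))
    return results


# Stub "agents" for a dry run without a browser or a model.
in_stock = {"milk", "cottage cheese"}
substitutes = {"paneer": "cottage cheese"}
demo_results = run_core_loop(
    ["milk", "paneer", "saffron"],
    add_to_cart=lambda name: name in in_stock,
    suggest_substitute=substitutes.get,
)
```

Because each responsibility is an injected callable, any one of them can be swapped for a stub in tests, which is the kind of debuggability the separation is meant to provide.
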
flowchart TD
U[User] --> FE[Frontend UI]
FE --> API[FastAPI Backend on Google Cloud]
API --> CS[Chat Session + State]
CS --> IA[Intent Parser Agent]
CS --> RA[Recipe Expander Agent]
CS --> ORCH[Core Loop Orchestrator]
ORCH --> VA[Fused Vision Agent \n Gemini Multimodal]
ORCH --> SA[Substitution Agent]
ORCH --> EXE[Action Executor]
EXE --> BR[Playwright Browser]
BR --> BL[Blinkit]
BR --> ZP[Zepto]
ORCH --> COMP[Platform Comparison + Recommendation]
COMP --> SUM[Cart + Cost Summary]
SUM --> FE
CS --> PREF[(User Preferences Store)]
- Python 3.12
- FastAPI backend
- Gemini API via google-genai (multimodal reasoning)
- Playwright (Chromium)
- Pydantic v2 models
- HTML/CSS/JS frontend served by backend
- Docker for containerised deployment
- Google Cloud Run deployment flow
- gemini/server.py: API layer, chat/events endpoints, UI serving.
- gemini/agents/: multimodal and text agents.
- gemini/core/: orchestration loop, browser manager, session logic, models.
- gemini/frontend/: web UI assets.
- utils/: helper scripts for sessions and profile handling.
- scripts/: container startup scripts (including noVNC runtime).
Runtime-created directories:
- sessions/: local session artifacts.
- screenshots/: runtime captures for debugging and evaluation.
- Create and activate virtual environment.
    python -m venv venv
    source venv/bin/activate
- Install dependencies.
    pip install -r requirements.txt
    playwright install chromium
- Configure environment.
    cp .env.example .env
- Run backend.
    venv/bin/python -m gemini.server --port 8000
- Open app.
Use this flow to test the full app in a local container, including visible browser UI via noVNC.
- Build image.
    docker build -f Dockerfile.novnc -t cook-with-me-novnc:local .
- Create runtime env file (do not commit this file).
    nano runtime.env
  Add:
    GOOGLE_API_KEY=YOUR_KEY
    DEMO_TOKEN=
- Run container.
    docker run -d --restart unless-stopped \
      --name cook-with-me-novnc \
      --env-file runtime.env \
      -p 8080:8080 \
      -p 6080:6080 \
      -e BROWSER_HEADLESS=false \
      cook-with-me-novnc:local
- Verify container health.
    docker ps
    docker logs --tail 120 cook-with-me-novnc
- Open URLs.
Users must authenticate themselves in the real browser UI.
- Open the app UI.
- Click Accounts.
- Choose Blinkit or Zepto.
- In the noVNC browser, complete login manually.
- Select the delivery location/address manually inside the platform UI.
- Return to the app chat and type done.
- The app stores the session and uses it for subsequent automation.
Recommended run order for first-time setup:
- Open the noVNC tab first.
- Open the app tab second.
- Connect a platform and log in.
- Type done in the app chat after login and location selection are complete.
- Start shopping prompts.
This project includes a ready-to-use Cloud Run deployment script.
Deploy steps:
- Authenticate and set project.
    gcloud auth login
    gcloud config set project YOUR_PROJECT_ID
- Export Gemini key.
    export GOOGLE_API_KEY="YOUR_KEY"
- Deploy.
    bash deploy.sh
The script builds the container with Cloud Build and deploys to Cloud Run.
CookWithMe helps users decide where to buy, not just what to buy.
It compares platform outcomes to recommend the better option based on:
- Item coverage.
- Effective total value.
- Delivery and fee impact.
- Substitution burden.
This directly improves user outcomes on time and money.
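
One simple way to combine the four criteria above is a lexicographic ranking: prefer higher item coverage, then fewer substitutions, then the lower payable total including fees. The sketch below illustrates that idea only; the ordering of criteria, the `PlatformOutcome` fields, and `recommend` are assumptions, not the project's actual scoring.

```python
from dataclasses import dataclass


# Hypothetical per-platform outcome after a cart-building run.
@dataclass
class PlatformOutcome:
    platform: str
    items_requested: int
    items_found: int
    item_total: float    # sum of item prices in the cart
    delivery_fee: float  # delivery and handling fees
    substitutions: int   # how many items were replaced

    @property
    def coverage(self) -> float:
        """Fraction of requested items that ended up in the cart."""
        if self.items_requested == 0:
            return 0.0
        return self.items_found / self.items_requested

    @property
    def payable(self) -> float:
        """Effective total the user would pay, fees included."""
        return self.item_total + self.delivery_fee


def recommend(outcomes: list[PlatformOutcome]) -> PlatformOutcome:
    """Rank by coverage (high), then substitutions (low), then payable total (low)."""
    if not outcomes:
        raise ValueError("at least one platform outcome is required")
    return min(outcomes, key=lambda o: (-o.coverage, o.substitutions, o.payable))
```

With this ordering, a cheaper cart that is missing items never beats a complete one; a weighted score would be an alternative if users want to trade coverage against price.
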
- A real multimodal agent that acts on screen context in real time.
- A true multi-agent platform with clear role boundaries.
- Personalization that users configure once and reuse.
- Cross-platform recommendation focus for practical cost/time savings.
- End-to-end flow from intent to completed cart with transparent status.
- Multimodal action planning significantly improves robustness in dynamic UI flows.
- Specialized agents outperform monolithic prompts for complex workflows.
- Preference memory materially improves user experience in repeated sessions.
- Platform-to-platform differences make comparison logic highly valuable to users.
- Stronger recommendation scoring using richer price and delivery signals.
- Expanded platform coverage.
- Optional voice-first interactions.
MIT