dheeraj-codingdesk/Ai-Vision
Robotics Spatial Understanding (AI Vision)

Interactive React app for visual object understanding powered by Google GenAI. Upload an image, choose a detection type, and get structured results (2D boxes, segmentation masks, or points) along with the exact API request/response used.

Overview

  • Three detection modes: 2D bounding boxes, segmentation masks, and points.
  • Shows the exact JSON request and response for transparency.
  • Supports model selection and optional "thinking" with a configurable temperature.
  • Uses @google/genai on the client; the API key is injected at build time.
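For the 2D-boxes mode, Gemini-style detection responses are typically a JSON array of normalized boxes. A minimal sketch of converting them to pixel coordinates for overlay drawing (the exact response schema is an assumption based on Gemini's documented detection format, not taken from this repo):

```typescript
// Assumed response item: box_2d is [ymin, xmin, ymax, xmax],
// normalized to 0-1000 as in Gemini's documented detection output.
interface Box2D {
  box_2d: [number, number, number, number];
  label: string;
}

// Convert a normalized box to pixel coordinates for the given image size.
function toPixels(box: Box2D, width: number, height: number) {
  const [ymin, xmin, ymax, xmax] = box.box_2d;
  return {
    x: (xmin / 1000) * width,
    y: (ymin / 1000) * height,
    w: ((xmax - xmin) / 1000) * width,
    h: ((ymax - ymin) / 1000) * height,
    label: box.label,
  };
}
```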

Tech Stack

  • Vite + React + TypeScript
  • State: jotai
  • Styling: Tailwind (browser build)
  • Drawing: perfect-freehand

Prerequisites

  • Node.js 18+ and npm
  • A Google Generative AI API key

Setup

  1. Install dependencies:
    • npm install
  2. Create a local env file with your API key:
    • ./.env.local
    • Add GEMINI_API_KEY=YOUR_API_KEY
    • The Vite config maps this to process.env.API_KEY used in the app (vite.config.ts:14-16).
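The env mapping described above (and the dev-server defaults mentioned under Development) might look roughly like this in vite.config.ts; this is a sketch, and the repository's actual file may differ:

```typescript
import { defineConfig, loadEnv } from 'vite';

export default defineConfig(({ mode }) => {
  // Read GEMINI_API_KEY from .env.local (and other .env files).
  const env = loadEnv(mode, process.cwd(), '');
  return {
    server: {
      host: '0.0.0.0',
      port: 3000,
    },
    define: {
      // Exposed to client code as process.env.API_KEY.
      'process.env.API_KEY': JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```

Because the key is inlined at build time, restart the dev server after editing .env.local so the new value is picked up.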

Development

  • Start the dev server:
    • npm run dev
  • Default host/port: 0.0.0.0:3000 (vite.config.ts:8-11).

Build & Preview

  • Production build:
    • npm run build
  • Local preview of the build:
    • npm run preview

Usage

  1. Upload an image via the button in the left panel (SideControls.tsx:42-63).
  2. Choose a detection type (DetectTypeSelector.tsx:26-35).
  3. Optionally enable "Reveal on hover" for boxes (TopBar.tsx:47-64).
  4. Pick a model and optionally enable "thinking" and set the temperature (Prompt.tsx:270-305, Prompt.tsx:397-409).
  5. Click Send to run the request (Prompt.tsx:368-396).
  6. View the rendered results on the image (Content.tsx:243-297) and the JSON request/response (App.tsx:32-61).
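The request assembled in step 5 can be sketched as a plain object that would be passed to @google/genai's `ai.models.generateContent`. The helper name and prompt wording below are illustrative, not taken from the repo:

```typescript
type DetectType = "2D bounding boxes" | "Segmentation masks" | "Points";

// Illustrative prompts per detection mode (the app's real prompts differ).
const PROMPTS: Record<DetectType, string> = {
  "2D bounding boxes": "Detect all objects; return 2D bounding boxes as JSON.",
  "Segmentation masks": "Detect all objects; return segmentation masks as JSON.",
  "Points": "Point to all objects; return their coordinates as JSON.",
};

// Build the payload for ai.models.generateContent: the uploaded image
// as inline base64 data, plus the mode-specific text prompt.
function buildRequest(imageBase64: string, detectType: DetectType, temperature: number) {
  return {
    model: "gemini-robotics-er-1.5-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: imageBase64 } },
      { text: PROMPTS[detectType] },
    ],
    config: { temperature },
  };
}
```

With the SDK, this would be sent along the lines of `await new GoogleGenAI({ apiKey }).models.generateContent(buildRequest(...))`.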

Models

  • Default: gemini-robotics-er-1.5-preview
  • Alternative: gemini-2.5-flash
  • Switch in the UI (Prompt.tsx:274-285).

Environment & Security

  • API key is read from process.env.API_KEY in the client (Prompt.tsx:45).
  • Vite defines process.env.API_KEY from GEMINI_API_KEY (vite.config.ts:14-16).
  • Do not commit real keys. Keep them in .env.local only.

Troubleshooting

  • No results or API errors:
    • Ensure .env.local contains GEMINI_API_KEY and restart the dev server.
    • Confirm network access and that your key has permissions.
  • The Send button is disabled:
    • Upload an image first; the button is disabled without an image (Prompt.tsx:369-372).
  • Boxes/masks not visible:
    • Try toggling "Reveal on hover" or switch detection type.
