dheeraj-codingdesk/Ai-Vision
Robotics Spatial Understanding (AI Vision)

Interactive React app for visual object understanding powered by Google GenAI. Upload an image, choose a detection type, and get structured results (2D boxes, segmentation masks, or points) along with the exact API request/response used.

Overview

  • Three detection modes: 2D bounding boxes, segmentation masks, and points.
  • Shows the exact JSON request and response for transparency.
  • Supports model selection and optional "thinking" with a configurable temperature.
  • Uses @google/genai on the client; the API key is injected at build time.
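For the 2D-boxes mode, Gemini-style detection responses are typically a JSON array of normalized boxes. A minimal sketch of converting them to pixel coordinates for overlay drawing (the exact response schema is an assumption based on Gemini's documented detection format, not taken from this repo):

```typescript
// Assumed response item: box_2d is [ymin, xmin, ymax, xmax],
// normalized to 0-1000 as in Gemini's documented detection output.
interface Box2D {
  box_2d: [number, number, number, number];
  label: string;
}

// Convert a normalized box to pixel coordinates for the given image size.
function toPixels(box: Box2D, width: number, height: number) {
  const [ymin, xmin, ymax, xmax] = box.box_2d;
  return {
    x: (xmin / 1000) * width,
    y: (ymin / 1000) * height,
    w: ((xmax - xmin) / 1000) * width,
    h: ((ymax - ymin) / 1000) * height,
    label: box.label,
  };
}
```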

Tech Stack

  • Vite + React + TypeScript
  • State: jotai
  • Styling: Tailwind (browser build)
  • Drawing: perfect-freehand

Prerequisites

  • Node.js 18+ and npm
  • A Google Generative AI API key

Setup

  1. Install dependencies:
    • npm install
  2. Create a local env file with your API key:
    • ./.env.local
    • Add GEMINI_API_KEY=YOUR_API_KEY
    • The Vite config maps this to process.env.API_KEY used in the app (vite.config.ts:14-16).
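The env mapping described above (and the dev-server defaults mentioned under Development) might look roughly like this in vite.config.ts; this is a sketch, and the repository's actual file may differ:

```typescript
import { defineConfig, loadEnv } from 'vite';

export default defineConfig(({ mode }) => {
  // Read GEMINI_API_KEY from .env.local (and other .env files).
  const env = loadEnv(mode, process.cwd(), '');
  return {
    server: {
      host: '0.0.0.0',
      port: 3000,
    },
    define: {
      // Exposed to client code as process.env.API_KEY.
      'process.env.API_KEY': JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```

Because the key is inlined at build time, restart the dev server after editing .env.local so the new value is picked up.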

Development

  • Start the dev server:
    • npm run dev
  • Default host/port: 0.0.0.0:3000 (vite.config.ts:8-11).

Build & Preview

  • Production build:
    • npm run build
  • Local preview of the build:
    • npm run preview

Usage

  1. Upload an image via the button in the left panel (SideControls.tsx:42-63).
  2. Choose a detection type (DetectTypeSelector.tsx:26-35).
  3. Optionally enable "Reveal on hover" for boxes (TopBar.tsx:47-64).
  4. Pick a model and optionally enable "thinking" and set the temperature (Prompt.tsx:270-305, Prompt.tsx:397-409).
  5. Click Send to run the request (Prompt.tsx:368-396).
  6. View the rendered results on the image (Content.tsx:243-297) and the JSON request/response (App.tsx:32-61).
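The request assembled in step 5 can be sketched as a plain object that would be passed to @google/genai's `ai.models.generateContent`. The helper name and prompt wording below are illustrative, not taken from the repo:

```typescript
type DetectType = "2D bounding boxes" | "Segmentation masks" | "Points";

// Illustrative prompts per detection mode (the app's real prompts differ).
const PROMPTS: Record<DetectType, string> = {
  "2D bounding boxes": "Detect all objects; return 2D bounding boxes as JSON.",
  "Segmentation masks": "Detect all objects; return segmentation masks as JSON.",
  "Points": "Point to all objects; return their coordinates as JSON.",
};

// Build the payload for ai.models.generateContent: the uploaded image
// as inline base64 data, plus the mode-specific text prompt.
function buildRequest(imageBase64: string, detectType: DetectType, temperature: number) {
  return {
    model: "gemini-robotics-er-1.5-preview",
    contents: [
      { inlineData: { mimeType: "image/png", data: imageBase64 } },
      { text: PROMPTS[detectType] },
    ],
    config: { temperature },
  };
}
```

With the SDK, this would be sent along the lines of `await new GoogleGenAI({ apiKey }).models.generateContent(buildRequest(...))`.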

Models

  • Default: gemini-robotics-er-1.5-preview
  • Alternative: gemini-2.5-flash
  • Switch in the UI (Prompt.tsx:274-285).

Environment & Security

  • API key is read from process.env.API_KEY in the client (Prompt.tsx:45).
  • Vite defines process.env.API_KEY from GEMINI_API_KEY (vite.config.ts:14-16).
  • Do not commit real keys. Keep them in .env.local only.

Troubleshooting

  • No results or API errors:
    • Ensure .env.local contains GEMINI_API_KEY and restart the dev server.
    • Confirm network access and that your key has permissions.
  • The Send button is disabled:
    • Upload an image first; the button is disabled without an image (Prompt.tsx:369-372).
  • Boxes/masks not visible:
    • Try toggling "Reveal on hover" or switch detection type.
