A real-time American Sign Language (ASL) detection app. The frontend streams live webcam frames over a WebSocket to an ML backend, which runs inference on each frame and returns the detected hand sign. Detected signs are assembled into text and cleaned up with AI-powered autocorrect.
This is a collaborative project: I built the entire frontend and both AI integrations. The ML model (Random Forest + MediaPipe + ONNX) and the FastAPI backend were built by my teammate (see the backend repo).
How it works
- The user opens the app and grants webcam access
- The frontend captures a webcam frame every 100 ms and encodes it as a binary WebP blob
- The blob is sent directly over a WebSocket connection to the FastAPI backend
- The backend runs MediaPipe hand landmark extraction → ONNX model inference and returns the predicted ASL character
- Characters accumulate into words, and a Gemini-powered autocorrect step turns recognition errors into readable sentences
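The README does not say how per-frame predictions become words. Purely as an illustration, here is a minimal TypeScript sketch of one common approach, built around a hypothetical `SignAccumulator` class that debounces predictions. At a 100 ms capture interval the backend can return roughly 10 predictions per second, so the sketch appends a character only after it has been predicted for `minStableFrames` consecutive frames (an assumed threshold), and holding the same sign does not append it again.

```typescript
// Hypothetical helper (not from this repo): debounces per-frame predictions
// into text. A character is appended once it has been predicted for
// `minStableFrames` consecutive frames; holding the sign does not repeat it.
class SignAccumulator {
  private text = "";
  private candidate: string | null = null; // prediction of the current run
  private runLength = 0; // consecutive frames with the same prediction
  private committed = false; // whether this run's character was appended
  private readonly minStableFrames: number;

  // With one prediction per 100 ms frame, 5 frames is about half a second.
  constructor(minStableFrames = 5) {
    this.minStableFrames = minStableFrames;
  }

  // Feeds one per-frame prediction; null means no sign was detected.
  // Returns the accumulated text so far.
  push(prediction: string | null): string {
    if (prediction !== this.candidate) {
      // The prediction changed, so a new run starts.
      this.candidate = prediction;
      this.runLength = 0;
      this.committed = false;
    }
    this.runLength += 1;
    if (this.candidate !== null && !this.committed && this.runLength >= this.minStableFrames) {
      this.text += this.candidate;
      this.committed = true;
    }
    return this.text;
  }

  get value(): string {
    return this.text;
  }
}
```

With this design, a double letter (for example the "LL" in "HELLO") needs a break between the two signs, such as a frame with no detected hand (`null`) or a different prediction.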
Frontend & AI integration (this repo)
- Built the real-time WebSocket client: binary frame capture, connection lifecycle management (open/error/close), and automatic cleanup on component unmount
- Implemented the frame capture pipeline: draws the webcam feed onto a canvas, encodes each frame as a WebP blob at 100 ms intervals, and streams the binary data over the WebSocket
- Integrated the Gemini API for autocorrect: detected sign text is sent to Gemini, which fixes recognition errors and returns grammatically correct output
- Dockerized the frontend application
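The capture-and-stream loop described in the list above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the code in this repo: `captureFrame` is a hypothetical helper that, in the browser, would draw the `<video>` element onto a canvas and resolve with the blob from `canvas.toBlob(callback, "image/webp")`, and `FrameSocket` is a small structural type that a browser `WebSocket` satisfies. The returned stop function is what a React `useEffect` cleanup would call on unmount.

```typescript
// Structural type for the socket; a browser WebSocket satisfies
// FrameSocket<Blob>, and the generic frame type keeps the sketch DOM-free.
interface FrameSocket<T> {
  readyState: number;
  send(data: T): void;
}

const WS_OPEN = 1; // numeric value of WebSocket.OPEN

// Sends one frame if the socket is open; returns true when it was sent.
function sendFrameIfOpen<T>(socket: FrameSocket<T>, frame: T | null): boolean {
  if (frame === null || socket.readyState !== WS_OPEN) return false;
  socket.send(frame);
  return true;
}

// Captures a frame every `intervalMs` and streams it over `socket`.
// `captureFrame` is hypothetical: in the browser it would draw the <video>
// onto a canvas and resolve with canvas.toBlob(cb, "image/webp").
// Returns a stop function, e.g. for a React useEffect cleanup.
function startFrameStreaming<T>(
  socket: FrameSocket<T>,
  captureFrame: () => Promise<T | null>,
  intervalMs = 100,
): () => void {
  let inFlight = false; // true while a capture/encode is still pending
  let stopped = false;

  const timer = setInterval(() => {
    if (inFlight || stopped) return; // skip this tick instead of queuing frames
    inFlight = true;
    captureFrame()
      .then((frame) => {
        if (!stopped) sendFrameIfOpen(socket, frame);
      })
      .catch(() => {
        // Drop this frame on capture or send errors; the next tick retries.
      })
      .then(() => {
        inFlight = false;
      });
  }, intervalMs);

  return () => {
    stopped = true;
    clearInterval(timer);
  };
}
```

Skipping a tick while the previous capture is still pending keeps frames from piling up if WebP encoding ever takes longer than the interval, and checking `readyState` before each send avoids errors while the socket is still connecting or already closed.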
Tech stack
React · TypeScript · Vite · Tailwind CSS · WebSocket API · Gemini API · Docker
Getting started
```bash
git clone https://github.com/Smiky0/Sign-Language-Detection.git
cd Sign-Language-Detection
npm install
npm run dev
```
Create a `.env` file:
```env
VITE_BACKEND_WS_URL=your_backend_websocket_url
VITE_GEMINI_MODEL=your_gemini_preferred_model
VITE_GEMINI_API_KEY=your_gemini_api_key
VITE_GEMINI_PROMPT=your_prompt_to_autocorrect_text
```
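For illustration, here is a hedged TypeScript sketch of how the `VITE_GEMINI_*` variables above could drive the autocorrect call. It assumes the Gemini REST `generateContent` endpoint is called directly with `fetch` (the repo may use an SDK instead), and it assumes the prompt and the detected text are joined into a single text part. `GeminiConfig`, `buildAutocorrectRequest`, `extractCandidateText`, and `autocorrect` are hypothetical names; in a Vite app the config values would be read from `import.meta.env`. On any failure the sketch falls back to the raw detected text.

```typescript
// Mirrors the VITE_GEMINI_* variables from .env (hypothetical shape).
interface GeminiConfig {
  model: string; // VITE_GEMINI_MODEL
  apiKey: string; // VITE_GEMINI_API_KEY
  prompt: string; // VITE_GEMINI_PROMPT
}

interface AutocorrectRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal fetch-like signature so the sketch does not depend on DOM typings;
// the browser's global fetch can be passed in directly.
type FetchLike = (
  url: string,
  init: AutocorrectRequest["init"],
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Builds a POST to the Gemini REST generateContent endpoint (v1beta).
// Assumption: the prompt and the detected text are sent as one text part.
function buildAutocorrectRequest(config: GeminiConfig, detectedText: string): AutocorrectRequest {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${encodeURIComponent(config.model)}:generateContent`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": config.apiKey,
      },
      body: JSON.stringify({
        contents: [{ parts: [{ text: `${config.prompt}\n\n${detectedText}` }] }],
      }),
    },
  };
}

// The part of a generateContent response this sketch reads.
interface GenerateContentResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Joins the text parts of the first candidate; returns null if there are none.
function extractCandidateText(json: unknown): string | null {
  const parts = (json as GenerateContentResponse | null)?.candidates?.[0]?.content?.parts;
  if (!parts) return null;
  const text = parts.map((part) => part.text ?? "").join("").trim();
  return text.length > 0 ? text : null;
}

// Sends the detected text to Gemini; falls back to the raw text on any error.
async function autocorrect(
  config: GeminiConfig,
  detectedText: string,
  fetchFn: FetchLike,
): Promise<string> {
  try {
    const { url, init } = buildAutocorrectRequest(config, detectedText);
    const response = await fetchFn(url, init);
    if (!response.ok) return detectedText;
    return extractCandidateText(await response.json()) ?? detectedText;
  } catch {
    return detectedText;
  }
}
```

In the browser this would be called as `autocorrect(config, text, fetch)`. Keep in mind that Vite inlines `VITE_*` variables into the client bundle, so an API key configured this way is visible to anyone who loads the app.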
The backend must be running for sign detection to work. See the backend repo for setup instructions.
Run with Docker
```bash
docker build -t asl-detection .
docker run -p 5173:5173 asl-detection
```