A real-time sign language recognition and text-to-speech system that converts sign language gestures into spoken audio with emotion-based voice synthesis. Built for presentations and accessibility.
- Real-time Sign Language Recognition: Recognizes both static and dynamic sign language gestures using MediaPipe hand tracking
- Emotion-Based Text-to-Speech: Converts recognized text to speech with emotion-aware voice synthesis using Google Cloud Text-to-Speech (Gemini-TTS)
- Facial Emotion Detection: Detects facial emotions in real-time using MediaPipe Face Landmarker to enhance TTS with appropriate emotional tone
- Stream Cleaning: Intelligent text cleaning pipeline that removes duplicates and trailing noise from recognized gestures
- Presentation Mode: Upload and display PDF presentations while signing
- Speaker Profiles: Customize the voice, speaking rate, pitch, and volume
- Live Captions: Real-time word-by-word caption display as you sign
Backend:
- FastAPI WebSocket server for real-time communication
- Gesture Recognition: SVM classifier for static signs, DTW (Dynamic Time Warping) for dynamic signs (see the sketch after this list)
- Stream Cleaner: Removes consecutive duplicates, near-duplicates, and trailing noise
- TTS Service: Google Cloud Text-to-Speech with Gemini-TTS models for emotion-based synthesis
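
The dynamic-sign matching named above can be pictured as a standard DTW distance between two sequences of per-frame hand features. The sketch below is a minimal NumPy illustration, not the code in `backend/recognizer.py`, which may normalize features and constrain the search differently:

```python
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """Classic O(n*m) dynamic time warping between two gesture
    recordings, where each row is a per-frame feature vector."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame in seq_a
                                 cost[i, j - 1],      # skip a frame in seq_b
                                 cost[i - 1, j - 1])  # match the two frames
    return cost[n, m]
```

A dynamic sign is then recognized by comparing the live sequence against each saved template and picking the smallest distance.
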
Frontend:
- React + Vite application
- MediaPipe for hand tracking and facial emotion detection
- WebSocket client for real-time backend communication
- PDF.js for presentation viewing
- Python 3.8+
- Node.js 18+ and npm
- Google Cloud Account with Text-to-Speech API enabled
- Webcam for sign language recognition and emotion detection
- Chrome browser (recommended for best webcam support)
```bash
git clone <repository-url>
cd Cheesehacks2026
cd backend
pip install -r requirements.txt
```
Next, set up Google Cloud Text-to-Speech:

1. Create a Google Cloud project (if you don't have one):
   - Go to the Google Cloud Console
   - Create a new project or select an existing one
2. Enable the Text-to-Speech API:
   - Navigate to "APIs & Services" > "Library"
   - Search for "Cloud Text-to-Speech API"
   - Click "Enable"
3. Create a service account:
   - Go to "APIs & Services" > "Credentials"
   - Click "Create Credentials" > "Service Account"
   - Fill in the service account details
   - Grant the "Cloud Text-to-Speech API User" role
   - Click "Done"
4. Download a service account key:
   - Click on the created service account
   - Go to the "Keys" tab
   - Click "Add Key" > "Create new key"
   - Choose "JSON" format
   - Download the JSON file
5. Place the credentials file:
   - Rename the downloaded JSON file to `signspeak-488902-b29067f64881.json`
   - Place it in the project root directory (same level as `backend/` and `frontend/`)
   - Alternatively, set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable:

     ```bash
     export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/credentials.json"
     ```
```bash
cd backend
python main.py
```

The backend server will start on http://localhost:8000.
```bash
cd frontend
npm install
npm run dev
```

The frontend will start on http://localhost:5173.
- Open your browser and navigate to http://localhost:5173
- Allow camera access when prompted
- The application should connect to the backend automatically (check the connection status in the header)
```
Cheesehacks2026/
├── backend/
│   ├── main.py               # FastAPI WebSocket server
│   ├── recognizer.py         # Gesture recognition (SVM + DTW)
│   ├── cleaner.py            # Text stream cleaning pipeline
│   ├── features.py           # Feature extraction for gestures
│   ├── tts_service.py        # Google Cloud TTS service
│   ├── templates.json        # Saved sign templates
│   └── requirements.txt      # Python dependencies
├── frontend/
│   ├── src/
│   │   ├── App.jsx           # Main application component
│   │   ├── components/       # React components
│   │   │   ├── Calibration.jsx
│   │   │   ├── DashboardHome.jsx
│   │   │   ├── FacialEmotionDetection.jsx
│   │   │   ├── PresentationsPage.jsx
│   │   │   └── SpeakerProfilesPage.jsx
│   │   └── context/          # React contexts
│   │       └── FacialEmotionContext.jsx
│   ├── package.json
│   └── vite.config.js
├── emotion-app/              # Standalone emotion detection app
├── signspeak-488902-b29067f64881.json  # Google Cloud credentials (not in git)
└── README.md                 # This file
```
- FastAPI: Modern Python web framework
- MediaPipe: Hand tracking and gesture recognition
- scikit-learn: Machine learning (SVM classifier)
- Google Cloud Text-to-Speech: Emotion-based voice synthesis
- NumPy: Numerical computations
- WebSockets: Real-time bidirectional communication
- React 19: UI framework
- Vite: Build tool and dev server
- MediaPipe Tasks Vision: Client-side hand and face tracking
- PDF.js: PDF rendering for presentations
To record custom signs:

1. Static Signs (single gesture):
   - Click the "Saved Signs" button
   - Enter a name for your sign
   - Click "Record Static"
   - Perform the gesture and hold it for a moment
   - The system will record multiple samples for better recognition
2. Dynamic Signs (motion-based):
   - Enter a name for your sign
   - Click "Record Dynamic"
   - Perform the motion gesture
   - Click "Stop Recording" when done
- Click the "Present" tab
- Load a PDF presentation (optional) using "Load PDF"
- Start signing - your gestures will be recognized in real-time
- Press the `E` key to flush the current sentence and trigger TTS
- The system will:
  - Clean the recognized text (removing duplicates and noise; see the sketch after this list)
  - Detect your facial emotion
  - Synthesize speech with the appropriate emotional tone
  - Play the audio through your speakers
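
The cleaning step can be pictured as collapsing exact and near-duplicate tokens before synthesis. This is a minimal sketch; the similarity metric and the 0.85 threshold are illustrative assumptions, not the values in `backend/cleaner.py`, and trailing-noise removal is omitted:

```python
from difflib import SequenceMatcher

def clean_stream(tokens, near_dup_ratio=0.85):
    """Drop tokens that exactly or nearly repeat the previous kept token.
    near_dup_ratio is a hypothetical threshold for this sketch."""
    cleaned = []
    for tok in tokens:
        if cleaned and SequenceMatcher(None, tok.lower(), cleaned[-1].lower()).ratio() >= near_dup_ratio:
            continue  # consecutive duplicate or near-duplicate
        cleaned.append(tok)
    return cleaned

print(clean_stream(["hello", "hello", "helo", "world"]))  # ['hello', 'world']
```
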
- Go to the "Calibration" tab
- Follow the on-screen instructions
- The system will calibrate facial emotion detection for your face
In `backend/main.py`:
- `VOTE_WINDOW`: number of frames for gesture voting (default: 10)
- `VOTE_THRESHOLD`: agreement threshold for recognition (default: 0.60)
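
A sketch of how these two parameters could drive per-frame voting (illustrative only; the actual loop in `backend/main.py` may differ):

```python
from collections import Counter, deque
from typing import Optional

VOTE_WINDOW = 10       # frames considered per vote
VOTE_THRESHOLD = 0.60  # fraction of the window that must agree

recent = deque(maxlen=VOTE_WINDOW)

def vote(frame_label: str) -> Optional[str]:
    """Return a gesture only once it dominates a full window of frames."""
    recent.append(frame_label)
    if len(recent) < VOTE_WINDOW:
        return None  # not enough evidence yet
    winner, count = Counter(recent).most_common(1)[0]
    return winner if count / VOTE_WINDOW >= VOTE_THRESHOLD else None
```

A larger window smooths out misclassified frames at the cost of recognition latency.
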
The system uses LINEAR16 (uncompressed WAV) for the highest audio quality. This can be changed in `backend/tts_service.py`:
- `LINEAR16`: best quality, larger files
- `OGG_OPUS`: good quality, smaller files
- `MP3`: standard quality, smaller files
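
For reference, the encoding is selected through the `AudioConfig` of the `google-cloud-texttospeech` client. This standalone snippet shows the shape of that call (the text and output path are placeholders; it is not a copy of `tts_service.py`):

```python
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(text="Hello from SignSpeak"),
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16  # or OGG_OPUS / MP3
    ),
)
with open("output.wav", "wb") as f:
    f.write(response.audio_content)  # LINEAR16 responses include a WAV header
```
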
Emotion thresholds can be customized in `frontend/src/components/FacialEmotionDetection.jsx`:
- `STABILITY_FRAMES`: frames required before an emotion change is accepted (default: 10)
- Individual emotion thresholds for: confident, warm, excited, emphatic, serious, passionate, reflective
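
The stability check amounts to debouncing the raw per-frame prediction. Sketched here in Python for brevity; the real logic lives in the JSX component above, and the "neutral" starting state is an assumption of this sketch:

```python
STABILITY_FRAMES = 10  # frames a new emotion must persist before it is accepted

class EmotionStabilizer:
    def __init__(self):
        self.current = "neutral"  # hypothetical initial state
        self.candidate = None
        self.streak = 0

    def update(self, raw_emotion: str) -> str:
        if raw_emotion == self.current:
            self.candidate, self.streak = None, 0   # nothing to debounce
        elif raw_emotion == self.candidate:
            self.streak += 1
            if self.streak >= STABILITY_FRAMES:     # held long enough: switch
                self.current, self.candidate, self.streak = raw_emotion, None, 0
        else:
            self.candidate, self.streak = raw_emotion, 1  # new contender
        return self.current
```
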
TTS Service Not Initializing:
- Verify the `GOOGLE_APPLICATION_CREDENTIALS` environment variable is set
- Check that the credentials file exists and is valid
- Ensure Text-to-Speech API is enabled in Google Cloud Console
WebSocket Connection Failed:
- Verify backend is running on port 8000
- Check firewall settings
- Ensure CORS is properly configured
Camera Not Working:
- Use Chrome browser (best support)
- Ensure camera permissions are granted
- Check that no other application is using the camera
WebSocket Disconnected:
- Verify backend server is running
- Check browser console for connection errors
- Ensure the backend URL in `App.jsx` matches your backend server
No Audio Playing:
- Check browser audio permissions
- Verify TTS service is initialized in backend
- Check browser console for audio errors
- Ensure audio format is supported by your browser
To run the servers during development:

Backend:

```bash
cd backend
python main.py
```

Frontend:

```bash
cd frontend
npm run dev
```

To build the frontend for production:

```bash
cd frontend
npm run build
```

The production build will be in `frontend/dist/`.
[Add your license here]
[Add contribution guidelines here]
- MediaPipe for hand and face tracking
- Google Cloud Text-to-Speech for voice synthesis
- FastAPI for the backend framework
- React and Vite for the frontend