Elevate Your Live Stream Sales with Multimodal AI Audits.
StreamCoach AI is a powerful web application designed to automatically analyze the performance quality of live streams. Powered by Google Gemini 3 Flash Preview, it acts as a virtual consultant, auditing both the visual presentation and audio delivery of your stream to provide actionable, timestamped feedback that helps boost engagement and sales conversion.
- Multimodal Analysis: "Sees" your product presentation (lighting, clarity, gestures) and "hears" your sales pitch (tone, enthusiasm, pacing) simultaneously.
- Internationalization (i18n): Fully localized UI and AI responses in English, Indonesian, Spanish, Chinese, and Japanese.
- Smart Async Queue: Protects the server from overload. Implements a robust Job System with Polling to provide real-time status updates (Queued, Processing, Completed) to the user.
- Scalable Concurrency: Supports Local (In-Memory) and Production (Redis) queuing modes to strictly limit concurrent heavy tasks.
- Large File Support: Optimized handling for video uploads up to 1GB.
- Resource Efficiency: Automatically cancels heavy processing (FFmpeg & AI) if the user disconnects, saving server resources and API quota.
- Timeline Analysis: Identifies specific moments (timestamped flags) where issues occurred.
- Privacy-First & Secure:
  - BYOK (Bring Your Own Key): Your Google Gemini API Key is stored safely in your browser's LocalStorage, never on our servers.
  - Auto-Cleanup: Video files and extracted assets are deleted immediately after processing.
- PDF Reports: Export your audit results into a professional PDF format.
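
The Resource Efficiency feature above depends on cancellation reaching the heavy work. Here is a minimal sketch of that idea, assuming each processing step receives a `context.Context`. In a `net/http` handler this would typically be `r.Context()`, which is cancelled when the client connection closes, and FFmpeg could be started with `exec.CommandContext`. The names `analyze` and `simulateDisconnect` are hypothetical, and the timed steps only stand in for real processing.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// analyze stands in for a long job made of several steps (hypothetical).
// It checks ctx between steps and stops as soon as ctx is cancelled.
func analyze(ctx context.Context, steps int, stepDuration time.Duration) (int, error) {
	for done := 0; done < steps; done++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err()
		case <-time.After(stepDuration):
			// one unit of work finished
		}
	}
	return steps, nil
}

// simulateDisconnect cancels the context after afterMs milliseconds,
// imitating a user who closes the page while a job is running.
func simulateDisconnect(afterMs int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	time.AfterFunc(time.Duration(afterMs)*time.Millisecond, cancel)
	return analyze(ctx, 100, 10*time.Millisecond)
}

func main() {
	done, err := simulateDisconnect(35)
	fmt.Printf("stopped after %d of 100 steps, cancelled=%v\n",
		done, errors.Is(err, context.Canceled))
}
```

The exact number of completed steps depends on timing. What matters is that the job returns early with a cancellation error instead of running to completion.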
- Backend: Go (Golang)
- AI Engine: Google GenAI SDK (Gemini 3 Flash Preview)
- Video Processing: FFmpeg (Frame extraction & Audio separation)
- Concurrency:
  - Queue: Native Channels (Local) / Redis (Production)
  - Job Management: Async polling architecture (UUID based)
- Frontend: Vue.js 3 (Composition API)
- Styling: Tailwind CSS
- Alerts: SweetAlert2
- PDF Generation: jsPDF & AutoTable
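
To illustrate the UUID-based async polling mentioned under Job Management, here is a rough in-memory sketch. It assumes jobs move through the Queued, Processing, and Completed states named earlier. The `JobStore` type, its methods, and the lowercase status strings are hypothetical, and the ID is generated with the standard library rather than whatever UUID package the project actually uses.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"sync"
)

// JobStatus mirrors the states shown to the user while polling.
type JobStatus string

const (
	StatusQueued     JobStatus = "queued"
	StatusProcessing JobStatus = "processing"
	StatusCompleted  JobStatus = "completed"
)

// JobStore is a hypothetical in-memory registry of job states.
type JobStore struct {
	mu   sync.RWMutex
	jobs map[string]JobStatus
}

func NewJobStore() *JobStore {
	return &JobStore{jobs: make(map[string]JobStatus)}
}

// newJobID returns a random version 4 UUID string.
func newJobID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// Enqueue registers a new job and returns the ID the client will poll with.
func (s *JobStore) Enqueue() (string, error) {
	id, err := newJobID()
	if err != nil {
		return "", err
	}
	s.mu.Lock()
	s.jobs[id] = StatusQueued
	s.mu.Unlock()
	return id, nil
}

// SetStatus is called by the worker as the job progresses.
func (s *JobStore) SetStatus(id string, status JobStatus) {
	s.mu.Lock()
	s.jobs[id] = status
	s.mu.Unlock()
}

// Status is what a polling endpoint would return for a given job ID.
func (s *JobStore) Status(id string) (JobStatus, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	status, ok := s.jobs[id]
	return status, ok
}

func main() {
	store := NewJobStore()
	id, err := store.Enqueue()
	if err != nil {
		panic(err)
	}
	store.SetStatus(id, StatusProcessing)
	status, _ := store.Status(id)
	fmt.Println(id, status)
}
```

In production mode the same interface could be backed by Redis, so that several server instances see the same job states.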
The system follows a modern, decoupled architecture designed for speed and privacy. It leverages **Golang** for high-performance backend processing and **Google Gemini 3 Flash Preview** for multimodal AI analysis.
- Tech Stack: Vue.js (Framework) & Tailwind CSS (Styling).
- Function:
  - Provides a responsive dashboard for users to upload live stream recordings.
  - Privacy-First Security: The user's Gemini API Key is stored securely in the browser's Local Storage. It is never saved to our database, ensuring a "Bring Your Own Key" (BYOK) architecture.
  - Visualizes the JSON analysis data into interactive charts, timelines, and scorecards.
- Tech Stack: Golang (Go).
- Function:
  - Acts as the central orchestrator, handling API requests from the frontend.
  - Manages Temporary Storage to briefly hold uploaded video files during the processing stage.
  - Utilizes Go’s concurrency model (Goroutines) to handle multiple analysis requests efficiently without blocking the server.
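
One common way to combine goroutines with a strict cap on heavy tasks in local mode is a buffered channel used as a semaphore. The sketch below assumes that approach; it is not taken from the project's source. `runLimited` and `demoPeak` are hypothetical names, and the sleeping tasks are placeholders for real analysis jobs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited starts one goroutine per task but lets at most `limit` of them
// execute at the same time. It returns the highest concurrency it observed.
func runLimited(limit int, tasks []func()) int {
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var wg sync.WaitGroup
	var running, peak int64

	for _, task := range tasks {
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot (blocks when all are busy)
			defer func() { <-sem }() // release the slot

			n := atomic.AddInt64(&running, 1)
			for { // record a new peak if this is the busiest moment so far
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			task()
			atomic.AddInt64(&running, -1)
		}(task)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&peak))
}

// demoPeak runs n short placeholder tasks under the given limit.
func demoPeak(limit, n int) int {
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() { time.Sleep(5 * time.Millisecond) }
	}
	return runLimited(limit, tasks)
}

func main() {
	fmt.Println("peak concurrency:", demoPeak(2, 8))
}
```

With a limit of 2 (the default for `MAX_CONCURRENT_TASKS`), the observed peak never exceeds 2, however many uploads arrive at once.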
- Tool: FFmpeg.
- Workflow:
  - Once the video reaches the backend, Golang executes FFmpeg commands to split the media into two modalities:
    - Visual Sampling: Extracts image frames at specific intervals (e.g., every 5-10 seconds) to reduce payload size while retaining visual context (lighting, gestures, product focus).
    - Audio Extraction: Separates the full audio track to ensure the AI can analyze vocal intonation, pitch, and energy continuity without interruption.
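
The two extraction steps above can be sketched as FFmpeg argument lists built in Go. This is an illustration under assumptions, not the project's actual invocation. The helpers `frameArgs` and `audioArgs`, the output file names, the WAV format, and the mono 16 kHz settings are all hypothetical choices. The flags themselves (`-vf fps=...`, `-vn`, `-ac`, `-ar`, `-y`) are standard FFmpeg options.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// frameArgs builds FFmpeg arguments that save one frame every intervalSec
// seconds as numbered JPEG files inside outDir.
func frameArgs(input, outDir string, intervalSec int) []string {
	return []string{
		"-y", "-i", input,
		"-vf", fmt.Sprintf("fps=1/%d", intervalSec), // one frame per interval
		filepath.Join(outDir, "frame_%04d.jpg"),
	}
}

// audioArgs builds FFmpeg arguments that drop the video stream (-vn) and
// write the full audio track as a mono 16 kHz file (assumed settings).
func audioArgs(input, outPath string) []string {
	return []string{"-y", "-i", input, "-vn", "-ac", "1", "-ar", "16000", outPath}
}

func main() {
	// The real backend would pass these to exec.CommandContext(ctx, "ffmpeg", args...).
	fmt.Println("ffmpeg", strings.Join(frameArgs("upload.mp4", "frames", 5), " "))
	fmt.Println("ffmpeg", strings.Join(audioArgs("upload.mp4", "audio.wav"), " "))
}
```

Sampling one frame every 5 seconds turns a one-hour recording into roughly 720 images, which is why the interval has a large effect on payload size.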
- Model: Google Gemini 3 Flash Preview.
- Workflow:
  - The Backend constructs a Multimodal Prompt containing the sampled image frames, the full audio file, and the user-selected context (e.g., "Jewelry Sales" or "Fashion").
  - This payload is sent to the Gemini API.
  - Gemini processes the inputs simultaneously ("watching" the frames and "listening" to the audio) to generate a holistic audit.
- Result: Gemini returns a structured JSON response containing:
  - Overall Performance Score (0-100).
  - Timestamped flags for issues (e.g., "Blurry product at 02:15").
  - Actionable coaching tips.
- The Golang backend forwards this JSON to the Frontend, which renders it into the user-friendly "Stream Health Report."
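
For a sense of what the backend might do with that response before forwarding it, here is a small decoding sketch. The JSON field names (`overall_score`, `flags`, `timestamp`, `issue`, `tips`) and the `Report` and `Flag` types are assumptions made for illustration, since the actual schema is not documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Flag is one timestamped issue (field names are assumed).
type Flag struct {
	Timestamp string `json:"timestamp"`
	Issue     string `json:"issue"`
}

// Report is a hypothetical shape for the audit result.
type Report struct {
	Score int      `json:"overall_score"`
	Flags []Flag   `json:"flags"`
	Tips  []string `json:"tips"`
}

// sampleReport is made-up data used only to exercise the decoder.
const sampleReport = `{
  "overall_score": 82,
  "flags": [{"timestamp": "02:15", "issue": "Blurry product"}],
  "tips": ["Move the key light closer to the product."]
}`

// parseReport decodes the model output and rejects scores outside 0-100.
func parseReport(raw []byte) (Report, error) {
	var r Report
	if err := json.Unmarshal(raw, &r); err != nil {
		return Report{}, fmt.Errorf("decode report: %w", err)
	}
	if r.Score < 0 || r.Score > 100 {
		return Report{}, fmt.Errorf("score %d is outside 0-100", r.Score)
	}
	return r, nil
}

func main() {
	report, err := parseReport([]byte(sampleReport))
	if err != nil {
		panic(err)
	}
	fmt.Println("score:", report.Score)
	for _, f := range report.Flags {
		fmt.Printf("%s  %s\n", f.Timestamp, f.Issue)
	}
}
```

Validating the score range on the server keeps a malformed model response from reaching the charts and scorecards in the frontend.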
Before running the application, ensure you have the following installed:
- Go (version 1.21 or higher) - Download Go
- FFmpeg - Download FFmpeg
  - Crucial: Ensure `ffmpeg` is added to your system's PATH variable.
- Google Gemini API Key - Get a free key
- (Optional) Redis: Required only if running in `production` mode for distributed queuing.
- Clone the Repository
  `git clone https://github.com/yourusername/streamcoach-ai.git`
  `cd streamcoach-ai`
- Install Dependencies
  `go mod tidy`
- Configure Environment: copy the example environment file.
  `cp .env.example .env`
  - By default, `APP_ENV=local` uses in-memory queuing (no Redis required).
  - To use Redis, set `APP_ENV=production` and configure `REDIS_ADDR`.
- Verify FFmpeg
  `ffmpeg -version`
- Build and Run
  `go build -o streamcoach.exe`
  `./streamcoach.exe`
- Access the App: open http://localhost:8080
| Variable | Default | Description |
|---|---|---|
| `APP_ENV` | `local` | Set to `production` to enable Redis queue. |
| `MAX_CONCURRENT_TASKS` | `2` | Maximum number of simultaneous analysis tasks. |
| `REDIS_ADDR` | `localhost:6379` | Address of your Redis server (Prod only). |
| `REDIS_PASSWORD` | - | Redis password (if any). |
| `REDIS_DB` | `0` | Redis Database index. |
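
As an illustration of how these variables and defaults could be read at startup, here is a small sketch. The `Config` struct, `LoadConfig`, and the helper names are hypothetical. The lookup function is passed in so the same code can read the real environment via `os.LookupEnv`; whether the project also loads `.env` through a helper library is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds the settings from the table above (struct is hypothetical).
type Config struct {
	AppEnv             string
	MaxConcurrentTasks int
	RedisAddr          string
	RedisPassword      string
	RedisDB            int
}

// lookupFunc matches the signature of os.LookupEnv.
type lookupFunc func(key string) (string, bool)

// stringOr returns the variable's value, or fallback when unset or empty.
func stringOr(lookup lookupFunc, key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// intOr is like stringOr but parses the value as an integer.
func intOr(lookup lookupFunc, key string, fallback int) (int, error) {
	v, ok := lookup(key)
	if !ok || v == "" {
		return fallback, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("%s must be an integer: %w", key, err)
	}
	return n, nil
}

// LoadConfig applies the documented defaults to whatever lookup provides.
func LoadConfig(lookup lookupFunc) (Config, error) {
	cfg := Config{
		AppEnv:        stringOr(lookup, "APP_ENV", "local"),
		RedisAddr:     stringOr(lookup, "REDIS_ADDR", "localhost:6379"),
		RedisPassword: stringOr(lookup, "REDIS_PASSWORD", ""),
	}
	var err error
	if cfg.MaxConcurrentTasks, err = intOr(lookup, "MAX_CONCURRENT_TASKS", 2); err != nil {
		return Config{}, err
	}
	if cfg.RedisDB, err = intOr(lookup, "REDIS_DB", 0); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("env=%s tasks=%d redis=%s db=%d\n",
		cfg.AppEnv, cfg.MaxConcurrentTasks, cfg.RedisAddr, cfg.RedisDB)
}
```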
- No Persistent Storage: We do not store your API keys or video files on the backend.
- Ephemeral Processing: Videos are uploaded to a temporary folder, processed, and immediately deleted.
- Client-Side Keys: API keys remain in your browser's LocalStorage.
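
The ephemeral-processing guarantee above can be approximated with a per-job temporary directory that is removed when processing returns, whether it succeeds or fails. This is a minimal sketch; `withTempWorkspace`, `exists`, and the `streamcoach-*` directory pattern are hypothetical names rather than the project's own.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// exists reports whether a path is still present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// withTempWorkspace creates a private temp directory, runs process inside
// it, and always removes the directory afterwards, even if process fails.
func withTempWorkspace(process func(dir string) error) (string, error) {
	dir, err := os.MkdirTemp("", "streamcoach-*")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir) // deletes the upload and any extracted assets
	return dir, process(dir)
}

func main() {
	dir, err := withTempWorkspace(func(dir string) error {
		// Placeholder for saving the upload and running FFmpeg on it.
		return os.WriteFile(filepath.Join(dir, "upload.mp4"), []byte("placeholder"), 0o600)
	})
	fmt.Println("workspace still exists:", exists(dir), "error:", err)
}
```

Putting the cleanup in a `defer` means that an early return or a cancelled request still leaves nothing behind in the temp folder.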
This project was built for the Gemini Hackathon. Feedback and contributions are welcome!
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for more information.
