A Chrome extension that provides real-time meeting transcription with AI-powered answer recommendations using Google's Gemini AI. Now with multi-speaker conversation support and flexible transcription modes.
- Real-time Transcription: Captures audio from your browser tab and transcribes it live
- See interim results as you speak (gray italic text)
- Final transcription appears when confirmed
- Character counter shows transcription length
- AI-Powered Recommendations: Get intelligent answer suggestions based on meeting context using Gemini AI
- One-click generation from your transcription
- Copy recommendations to clipboard
- Dual Display Modes: Choose how MeetScribe appears
- Side Panel (default): Persistent sidebar that doesn't block your meeting content
- Popup: Traditional extension popup for quick access
- Customizable Prompts: Configure custom system prompts or use the built-in defaults
- Multiple AI Models: Choose between Gemini 2.0 Flash, 2.5 Flash, or 2.5 Pro
- Privacy-First: Transcription happens locally, only the text is sent to Gemini API
- Speaker Identification: Track multiple speakers in conversations
- Manual speaker tagging with Web Speech API (free mode)
- Automatic speaker detection with Deepgram (paid mode)
- Speaker Management:
- Rename speakers for easy identification
- Color-coded speaker bubbles
- Track turn count per speaker
- Multiple View Modes:
- Turn-by-Turn: See each conversation turn with speaker attribution
- Grouped by Speaker: View all text from each speaker together
- Flat: Simple speaker-prefixed text format
- Export Functionality: Save transcriptions as JSON or TXT with speaker data
- Web Speech API (Free)
- Uses browser's built-in speech recognition
- Manual speaker assignment during transcription
- No API costs
- Works offline after initial load
- Deepgram (Paid ~$0.24/hour)
- Professional-grade transcription
- Automatic speaker detection (diarization)
- Higher accuracy, especially for technical terms
- Supports up to 10 speakers simultaneously
- Fallback to Web Speech API if connection fails
- Dual AI Modes:
- Manual: Click to generate AI responses
- Auto: Automatic generation based on triggers
- Auto-Trigger Options:
- Time interval (e.g., every 60 seconds)
- Turn count (e.g., every 5 conversation turns)
- Speech pause (e.g., after 5 seconds of silence)
- Multiple Recommendation Types:
- Answer suggestions
- Meeting summary
- Follow-up questions
- Analysis
- Custom Prompts per Type: Configure different prompts for each recommendation type
- Google Chrome browser (version 88 or higher)
- A Gemini API key from Google AI Studio
- (Optional) Deepgram API key for automatic speaker detection
- Microphone access permission
- Visit Google AI Studio
- Sign in with your Google account
- Click "Create API Key"
- Copy your API key (it starts with
AIza)
Note: Gemini API offers free tier usage. Check Google's pricing page for current limits.
- Visit Deepgram Console
- Sign up for a free account (includes $200 in free credit)
- Create an API key
- Copy your API key (it starts with
dg_)
Cost: Approximately $0.24/hour for nova-2 model with speaker diarization.
- Download or clone this repository
- Open Chrome and navigate to
chrome://extensions/ - Enable "Developer mode" in the top-right corner
- Click "Load unpacked"
- Select the
meetscribefolder containing the extension files - The extension icon will appear in your browser toolbar
- Click the extension icon in your browser toolbar
- Go to the Settings tab
- Choose your Interface Mode (Side Panel or Popup)
- Paste your Gemini API key
- (Optional) Select Transcription Mode:
- Web Speech API (free, manual speaker tagging)
- Deepgram (paid, automatic speaker detection) - requires Deepgram API key
- (Optional) Configure AI Mode:
- Manual: Click "Generate" to get AI suggestions
- Auto: AI generates automatically based on triggers
- Optionally, add custom prompts for different recommendation types
- Select your preferred AI model (default: Gemini 2.0 Flash)
- Click Save Settings
- Open a meeting or any webpage with audio
- Click the extension icon
- Select the current speaker from the dropdown (for Web Speech API mode)
- Click Start Recording
- Grant microphone permission if prompted
- Speak or play audio - transcription will appear in real-time
- Use the "Current Speaker" dropdown to select who's speaking
- Click "+" to add a new speaker
- Click "Manage Speakers" to rename or delete speakers
- Speaker names and colors persist across sessions
- Speakers are automatically detected and labeled (Speaker 1, Speaker 2, etc.)
- Go to "Manage Speakers" to rename detected speakers
- Colors are automatically assigned for visual distinction
Switch between three different views:
- Turn-by-Turn (default): Each conversation turn with speaker bubble and color
- Grouped by Speaker: All text from each speaker grouped together
- Flat: Simple
[Speaker]: textformat
- Select a recommendation type (Answer, Summary, Follow-up, Analysis)
- Click Generate Answer button
- The AI analyzes the transcription and provides suggestions
- Copy the recommendation as needed
When auto mode is enabled, AI generates recommendations automatically based on your trigger settings:
- Interval: Every X seconds (configurable, minimum 30s)
- Turn: Every N conversation turns
- Pause: When speech pauses for 5+ seconds
Auto-AI status is shown above the recommendation section.
- Click the Export button
- Choose format:
- Plain Text (.txt): Human-readable text file
- JSON (.json): Structured data with speakers and metadata
- For TXT, choose text format (flat, grouped, or turn-by-turn with timestamps)
- Click Export to download
- Stop Recording: Click the Stop button to pause transcription
- Clear: Click the Clear button to remove the current transcription
- Copy: Click the copy icon to copy the transcription to clipboard
You can customize prompts for each recommendation type:
- Go to Settings tab
- In the "Custom Prompts" section:
- Default System Prompt (Answer): For general answer suggestions
- Auto-AI Prompt: For automatic AI generations
- Summary Prompt: For meeting summaries
- Follow-up Questions Prompt: For generating follow-up questions
- Use
{transcription}as a placeholder for the meeting text
Example Custom Summary Prompt:
Based on the following meeting transcription, provide:
1. A brief summary (2-3 sentences)
2. Key decisions made
3. Action items with owners (if mentioned)
Meeting Transcription:
{transcription}
Summary:
- Gemini 2.0 Flash (default): Fast responses, good for real-time use
- Gemini 2.5 Flash: Latest Flash model, improved performance
- Gemini 2.5 Pro: Highest quality responses
- gemini-flash-latest: Always uses the latest Flash model
- Ensure you're using Chrome browser
- Update Chrome to the latest version
- Click the lock icon in your browser's address bar
- Allow microphone access for the current site
- Refresh the page and try again
- Check your Deepgram API key is valid
- Verify you have credit in your Deepgram account
- The extension will automatically fall back to Web Speech API
- Verify your API key is correct
- Ensure the key starts with
AIza - Check that the key hasn't expired or been revoked
- Speak clearly and at a moderate pace
- Minimize background noise
- Ensure your microphone is working properly
- Try Deepgram mode for higher accuracy
- Wait a few moments before trying again
- Increase the auto-AI interval to reduce API calls
- Consider using manual mode instead of auto
- Ensure speakers are speaking clearly and distinctly
- Adjust the "Maximum Speakers" setting in configuration
- Check that your Deepgram account has speaker diarization enabled
meetscribe/
├── manifest.json # Extension configuration
├── popup.html # Popup UI (compact interface)
├── popup.css # Popup styling
├── popup.js # Popup logic and API calls
├── sidepanel.html # Side panel UI (persistent sidebar)
├── sidepanel.css # Side panel styling
├── sidepanel.js # Side panel logic
├── background.js # Service worker (handles display mode)
├── content.js # Audio capture and transcription
├── speakerManager.js # Speaker detection and metadata
├── conversationManager.js # Session state management
├── transcriptionProviders.js # Pluggable transcription backends
├── icons/ # Extension icons
│ ├── icon16.png
│ ├── icon48.png
│ ├── icon128.png
│ └── generate-icons.html # Icon generator tool
├── openspec/ # Specification documentation
│ └── changes/
│ └── add-meeting-transcriber-extension/
└── README.md # This file
The extension uses a modular architecture with the following core modules:
- speakerManager.js: Handles speaker detection, identification, and metadata
- conversationManager.js: Manages session state, turn tracking, and view formats
- transcriptionProviders.js: Pluggable transcription backends (Web Speech API, Deepgram)
- Make your changes to the source files
- Go to
chrome://extensions/ - Click the refresh icon on the MeetScribe extension card
- Test your changes
- Local Processing: Audio transcription happens locally (Web Speech API) or securely (Deepgram)
- No Audio Storage: Raw audio is never stored
- Secure Storage: API keys stored using Chrome's storage API
- HTTPS Only: All API calls use HTTPS encryption
- Minimal Data: Only transcription text is sent to AI APIs
This extension uses Google's Gemini API:
- Free Tier: Generous limits for development and personal use
- Pay-as-you-go: For heavier usage
Check the Gemini API pricing for current rates.
- Cost: ~$0.24/hour for nova-2 model with speaker diarization
- Free Tier: $200 in free credits for new accounts
- No minimum commitment: Pay only for what you use
Cost-Saving Tips:
- Use Web Speech API mode (free) for manual speaker tagging
- Use Deepgram only for important meetings
- Use Gemini 2.0 Flash for faster, cheaper AI responses
- Increase auto-AI intervals to reduce API calls
Existing users are automatically migrated to the new multi-speaker system:
- Old settings are preserved
- New features use sensible defaults
- Both old and new message formats are handled
- Data migration happens automatically on first load
This project is provided as-is for personal and educational use.
- NEW: Multi-speaker conversation support
- NEW: Speaker identification and management
- NEW: Flexible transcription modes (Web Speech API / Deepgram)
- NEW: Automatic speaker detection with Deepgram
- NEW: Multiple view modes (turn-by-turn, grouped, flat)
- NEW: Export functionality (JSON, TXT)
- NEW: Dual AI modes (manual / auto)
- NEW: Multiple recommendation types (answer, summary, follow-up, analysis)
- NEW: Custom prompts per recommendation type
- NEW: Auto-trigger options for AI generation
- Enhanced: Fallback from Deepgram to Web Speech API
- Enhanced: Backward compatibility with existing users
- Initial release
- Real-time transcription using Web Speech API with interim results
- Gemini AI integration for answer recommendations
- Dual display modes: Side Panel (default) and Popup
- Custom prompt support with fallback to default
- Multiple AI model selection (2.0 Flash, 2.5 Flash, 2.5 Pro, Flash Latest)
- Settings persistence with chrome.storage
- Character counter for transcription
- Enhanced error messages with actionable suggestions
- Copy buttons for transcription and AI recommendations
Web Speech API (Free, Manual):
- ✅ No additional costs
- ✅ Works offline after initial load
- ✅ Good for 1-2 speakers
⚠️ Requires manual speaker assignment
Deepgram (Paid, Auto):
- ✅ Automatic speaker detection
- ✅ Higher accuracy
- ✅ Supports up to 10 speakers
- ✅ Better for technical terms
⚠️ Costs ~$0.24/hour
Manual Mode:
- ✅ Full control over when AI generates
- ✅ Fewer API calls, lower costs
- ✅ Good for intermittent meetings
Auto Mode:
- ✅ Automatic insights during meetings
- ✅ No need to remember to click generate
⚠️ More API calls⚠️ Set minimum 30s intervals to avoid rate limits
Yes! Go to Settings → Select "Transcription Mode" → Save. The extension will use the selected mode for the next recording.
Yes! Speaker names, colors, and metadata are saved to Chrome storage and persist across sessions.
The extension automatically falls back to Web Speech API if Deepgram connection fails, ensuring you never lose transcription capability.
Made with ❤️ for better meetings