This is a vanilla JavaScript prototype that showcases how the Web Speech API can power voice-first prototypes.
Open the folder, serve it locally, and you have a demo-ready build laid out on an iPhone 17 Pro-sized artboard.
- How to wire the Web Speech API without frameworks
- How to present voice-specific UI states (idle, listening, speaking) with CSS and DOM hooks
- How to keep `SpeechRecognition` resilient with watchdogs, timeouts, and auto language detection
- How to experiment with `SpeechSynthesis` voices from the same codebase
- The code in `speech-recognition-ios26/`
- Any static web server (`python3 -m http.server`, `npx serve`, etc.)
- Chrome 115+, Edge 115+, or Safari 17+ (desktop or mobile) with microphone permissions enabled
The Web Speech API exposes two building blocks: SpeechRecognition (turn voice into text) and SpeechSynthesis (turn text into voice). Together they unlock conversational UI prototypes that feel close to production apps like Google Assistant, Apple's Siri, or Amazon Alexa.
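Before wiring anything up, it can help to confirm both building blocks exist in the current browser. The check below is a minimal sketch (the variable names are illustrative); the `webkit` prefix matters because Chromium still ships recognition behind it.

```js
// Recognition is still vendor-prefixed in Chromium-based browsers.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;

// Synthesis ships unprefixed wherever it is supported.
const hasSynthesis = "speechSynthesis" in window;

if (!Recognition) console.warn("SpeechRecognition is not available in this browser.");
if (!hasSynthesis) console.warn("SpeechSynthesis is not available in this browser.");
```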
- Clone or download this repo.
- `cd speech-recognition-ios26`
- Run a static server:
  - Python: `python3 -m http.server 8090`
  - Node: `npx serve -p 8090`
- Visit http://127.0.0.1:8090 in a supported browser and allow microphone access when prompted.
- Tap the mic to start listening, tap ✕ to stop, or use the gear to toggle languages.
Because the project is plain HTML/CSS/JS there is no build pipeline—swap assets, tweak app.js, refresh the browser, and repeat.
- `speech-recognition-ios26/` – iPhone 17 Pro-sized vanilla JS prototype, the version showcased above.
- `speech-recognition/` – legacy layout that uses the same JavaScript but older artboard sizing.
- `captures/` & `_img/` – marketing captures you can drop into decks or portfolio pieces.
The iOS 26 build is a comprehensive playground for testing voice UX. Highlights from speech-recognition-ios26/app.js:
- Stateful UI model – `idle`, `listening`, and `speaking` states drive CSS classes for the animated waveform, card headers, and button availability (see the sketch after this list).
- Low-latency prompts – transcript text switches between “Speak now”, “Start talking…”, and live transcripts with timeout helpers so the UI never feels frozen.
- Self-healing recognition – watchdogs restart the recognizer if Chrome drops audio, while inactivity timers reset the session after long pauses.
- Bilingual support – a visible language toggle and an automatic detector (English ↔︎ Chinese) adjust recognizer locales and update copy in a couple of taps.
- Helpful error copy – microphone, permission, and network errors replace the transcript area with guidance instead of failing silently.
- Local persistence – the last selected language is stored in `localStorage` so the next run feels personal.
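The real logic lives in `app.js`; the sketch below only illustrates the general shape of the state-driven UI and the self-healing restart. The selector, `setState()`, and `wantListening` are made-up names for this example, not the repo's actual identifiers.

```js
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new Recognition();

// Hypothetical root element whose class drives the CSS animations.
const shell = document.querySelector("[data-role='device-shell']");
const STATES = ["idle", "listening", "speaking"];
let wantListening = false;

function setState(next) {
  // Exactly one state class is active at a time; .listening/.speaking hooks animate off it.
  STATES.forEach((state) => shell.classList.toggle(state, state === next));
}

recognizer.onstart = () => setState("listening");

recognizer.onend = () => {
  setState("idle");
  // Watchdog-style restart: Chrome can end a session on its own,
  // so start again if the user never tapped stop.
  if (wantListening) recognizer.start();
};

function startListening() {
  wantListening = true;
  recognizer.start();
}

function stopListening() {
  wantListening = false;
  recognizer.stop();
}
```

In the prototype, the mic and ✕ buttons would map onto functions like these.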
Use this as a template for your own demos: swap the textBox copy, add real-time fetches to your assistant stack, or bolt on SpeechSynthesis for responses.
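Bolting on SpeechSynthesis for responses can be as small as echoing the final transcript back. The sketch below is a self-contained guess at that wiring, with the canned reply standing in for a call to your own backend.

```js
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new Recognition();
recognizer.lang = "en-US";

recognizer.onresult = (event) => {
  const result = event.results[event.resultIndex];
  if (!result || !result.isFinal) return;

  const transcript = result[0].transcript;

  // Swap this canned reply for a fetch to your assistant stack.
  const reply = `You said: ${transcript}`;
  speechSynthesis.speak(new SpeechSynthesisUtterance(reply));
};

recognizer.start();
```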
- `index.html` wires up the shell (title bar, cards, mic/settings buttons) and loads `app.js`.
- `style.css` handles the faux-device layout, animation hooks (`.listening`, `.speaking`), and typography.
- `app.js` creates the recognizer, coordinates UI state, and encapsulates the timeout, watchdog, and auto language switching logic. Look at `startListening()`, `restartRecognizer()`, and `checkAutoLanguageSwitch()` to understand the full flow.
To try other locales, update the `activeLanguage` defaults and tweak `inferLanguageFromTranscript()`. To prototype different prompts or commands, customize the `recognizer.onresult` handler to branch on transcripts or call external APIs.
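For the English ↔︎ Chinese case, one workable heuristic is a Unicode-range check on the transcript. The sketch below shows that general idea; it is not a copy of what `inferLanguageFromTranscript()` actually does in `app.js`.

```js
// Hypothetical detector: if the transcript contains CJK ideographs, prefer zh-CN.
function detectLanguage(transcript) {
  return /[\u4e00-\u9fff]/.test(transcript) ? "zh-CN" : "en-US";
}

const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognizer = new Recognition();
recognizer.lang = "en-US";
recognizer.interimResults = true;

recognizer.onresult = (event) => {
  const result = event.results[event.resultIndex];
  if (!result || !result[0]) return;

  const detected = detectLanguage(result[0].transcript);

  // A new lang only takes effect on the next session, so stop and let a
  // restart handler (like the watchdog sketched earlier) call start() again.
  if (detected !== recognizer.lang) {
    recognizer.lang = detected;
    recognizer.stop();
  }
};
```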
The SpeechRecognition interface lets us recognize speech and respond accordingly. PromptWorks' piece on Speech Recognition in the Browser provided the snippet below.
Your browser may request permission to use the microphone.
```js
// This API is currently prefixed in Chromium browsers
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

// Create a new recognizer
const recognizer = new SpeechRecognition();

// Start producing results before the person has finished speaking
recognizer.interimResults = true;

// Set the language of the recognizer
recognizer.lang = "en-US";

// Define a callback to process results
recognizer.onresult = (event) => {
  const result = event.results[event.resultIndex];
  if (!result || !result[0]) return;
  console.log(result[0].transcript);
};

// Start listening...
recognizer.start();
```

Once the transcript is a string you can map it to DOM updates, send it to a service, or run local logic. For example, the snippet below mirrors the prototype’s live transcript area:
```js
const textBox = document.querySelector("[data-role='transcript']");

recognizer.onresult = (event) => {
  const result = event.results[event.resultIndex];
  if (!result || !result[0]) return;
  textBox.textContent = result[0].transcript;
};
```

The SpeechSynthesis interface provides controls and methods for the synthesis voices available on the device. Browser compatibility is stronger than recognition, spanning Safari and several mobile browsers.
Snippets from PromptWorks:
```js
speechSynthesis.speak(new SpeechSynthesisUtterance("Hello world."));
```

Incrementing the index in `utterance.voice = voices[1]` lets you cycle through the device voices:
```js
const voices = speechSynthesis.getVoices();
const utterance = new SpeechSynthesisUtterance("Hello world.");
utterance.voice = voices[1];
speechSynthesis.speak(utterance);
```
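One caveat with this snippet: in Chromium browsers `getVoices()` can return an empty array until the voice list has loaded, so it is safer to wait for the `voiceschanged` event before picking a voice. A minimal sketch (the helper name is illustrative):

```js
// getVoices() may be empty on first call; wait for voiceschanged if needed.
function speakWithVoice(text, voiceIndex = 1) {
  const voices = speechSynthesis.getVoices();
  const utterance = new SpeechSynthesisUtterance(text);
  if (voices[voiceIndex]) utterance.voice = voices[voiceIndex];
  speechSynthesis.speak(utterance);
}

if (speechSynthesis.getVoices().length > 0) {
  speakWithVoice("Hello world.");
} else {
  speechSynthesis.addEventListener(
    "voiceschanged",
    () => speakWithVoice("Hello world."),
    { once: true }
  );
}
```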
- MDN - Web Speech API
- MDN - SpeechRecognition Interface
- MDN - SpeechSynthesis Interface
