CrowCrow main UI (screenshot)
CrowCrow is an experimental, browser-based parametric articulatory speech synthesizer.
This repo is not under active development, but is preserved as an artifact for reference, inspiration, or further research.
Note:
Like many experimental projects, some paths lead nowhere but still inform future work.
CrowCrow began as an attempt to build a fully parametric TTS synth that could be AI-driven for future embodiment, as an alternative to current dataset/diffusion-based TTS.
However, the synth itself was never the main focus, and it lacks essential features such as a phonemizer, coarticulation, and prosody (which are better served by a different architecture), so a full rewrite would be needed for production use.
Companion Post:
Embodied speech for AIs — A short introduction to the ideas behind this project.
- Real-time articulatory speech synthesis in the browser (Web Audio API + AudioWorklet)
- Modular, extensible audio engine (Glottis, Tract, Nasal, Click, Tap, Trill, etc.)
- Partial IPA phoneme coverage (vowels, plosives, fricatives, nasals, clicks, trills, taps, etc.)
- Interactive UI: phoneme buttons, tract visualizer, oscilloscope, parameter sliders
- Phoneme text input and sequencer for custom utterances
- Designed for research, education, and creative coding
👉 Try CrowCrow on GitHub Pages
Just visit: https://kenoleon.github.io/crowcrow/
git clone https://github.com/kenoleon/crowcrow.git
cd crowcrow
npm install
npm run build
npm start
# Then open http://localhost:8080/
- Click "Start Synth" first, or you won't hear anything.
- Click phoneme buttons to hear individual sounds.
- Move the master Gain to hear a continuous waveform.
- Adjust sliders for pitch, tenseness, tract shape, etc.
- Enter a sequence in the phoneme text input and press "Play" to synthesize custom utterances.
- Watch the tract visualizer and oscilloscope update in real time.
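The phoneme text input above accepts a sequence of symbols to synthesize. A minimal sketch of how such an input could be tokenized into playable phonemes (the actual CrowCrow parser may differ; the function name and greedy longest-match strategy are assumptions):

```javascript
// Tokenize a phoneme string into known symbols, matching multi-character
// IPA symbols (e.g. affricates) greedily before single characters.
function tokenizePhonemes(input, knownPhonemes) {
  const sorted = [...knownPhonemes].sort((a, b) => b.length - a.length); // longest first
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    if (input[i] === ' ') { i++; continue; } // spaces separate phonemes
    const match = sorted.find(p => input.startsWith(p, i));
    if (!match) throw new Error(`Unknown phoneme at index ${i}: "${input[i]}"`);
    tokens.push(match);
    i += match.length;
  }
  return tokens;
}

// tokenizePhonemes('tʃa', ['t', 'tʃ', 'a']) → ['tʃ', 'a']
```

Each token can then be looked up in the phoneme map and scheduled on the audio graph.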
[subglottalNode]
│
[noiseNode]─────┐
│ │
[glottisNode] │
│ │
[chestinessNode]│
│ │
[transientNode] │
│ │ │
│ └─────────────┐
│ │
[tractNode] [nasalNode]
│ │
└─────┬─────┬───────┘
│ │
[clickNode] │
[tapNode] │
[trillNode] │
│ │
[summingNode]
│
[gainNode]
│
[masterGainNode]
│
[analyserNode]
│
[audioContext.destination]
Legend:
- subglottalNode: Simulates subglottal pressure (lungs)
- noiseNode: Generates noise for fricatives, aspiration, etc.
- glottisNode: Simulates vocal fold vibration and voicing
- chestinessNode: Adds chest resonance
- transientNode: Handles plosive bursts and transients
- tractNode: Main vocal tract filter (8 zones, 44 segments)
- nasalNode: Simulates nasal tract coupling
- clickNode, tapNode, trillNode: Special consonant bursts
- summingNode: Sums oral, nasal, click, tap, and trill outputs
- gainNode, masterGainNode: Output gain control
- analyserNode: For oscilloscope/visualization
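The graph above can be sketched in code. In the real app each stage is an AudioWorkletNode; here a tiny stub stands in for illustration so only the topology is shown (the wiring follows the diagram above; the stub class and helper names are assumptions, not CrowCrow's actual API):

```javascript
// Minimal stand-in for an audio node: records outgoing connections.
class StubNode {
  constructor(name) { this.name = name; this.outputs = []; }
  connect(dest) { this.outputs.push(dest); return dest; }
}

// In the browser this would instead be something like:
//   const ctx = new AudioContext();
//   await ctx.audioWorklet.addModule('audio/processors/glottis.js'); // path assumed
//   const glottisNode = new AudioWorkletNode(ctx, 'glottis');
const nodes = Object.fromEntries(
  ['subglottal', 'noise', 'glottis', 'chestiness', 'transient', 'tract', 'nasal',
   'click', 'tap', 'trill', 'summing', 'gain', 'masterGain', 'analyser', 'destination']
    .map(n => [n, new StubNode(n)])
);

// Source chain: lungs -> noise -> vocal folds -> chest resonance -> transients -> tract
nodes.subglottal.connect(nodes.noise);
nodes.noise.connect(nodes.glottis);
nodes.glottis.connect(nodes.chestiness);
nodes.chestiness.connect(nodes.transient);
nodes.transient.connect(nodes.tract);

// Noise also feeds the nasal branch directly.
nodes.noise.connect(nodes.nasal);

// Oral, nasal, and special-consonant generators are summed together.
[nodes.tract, nodes.nasal, nodes.click, nodes.tap, nodes.trill]
  .forEach(src => src.connect(nodes.summing));

// Output chain: summing -> gain -> master gain -> analyser -> speakers.
nodes.summing.connect(nodes.gain);
nodes.gain.connect(nodes.masterGain);
nodes.masterGain.connect(nodes.analyser);
nodes.analyser.connect(nodes.destination);
```

Keeping the analyser as the last stage before the destination is what lets the oscilloscope visualize exactly the signal being played.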
Processors:
- public/audio/processors/: AudioWorkletProcessors for each stage
- src/main.js: Main controller, node creation, UI, parameter flow
- src/state/phonemeMap.js: Phoneme definitions and parameters
- All phonemes are defined in src/state/phonemeMap.js using articulator parameters.
- You can add or modify phonemes by editing this file and the UI.
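As a rough illustration of what articulator-parameter entries could look like (the field names and values below are hypothetical; the real src/state/phonemeMap.js will differ):

```javascript
// Hypothetical phoneme map: each symbol maps to articulator targets.
const phonemeMap = {
  // Open vowel: voiced, open tract, no constriction noise.
  a: { voiced: true, tenseness: 0.6, tract: { tongueIndex: 14, tongueDiameter: 2.9 } },
  // Voiceless fricative: noise source at a narrow constriction.
  s: { voiced: false, frication: 0.9, tract: { constrictionIndex: 36, constrictionDiameter: 0.4 } },
  // Nasal: voiced, oral closure with the nasal port open.
  m: { voiced: true, tenseness: 0.7, nasal: true, tract: { lipClosure: true } },
};

// Looking up a phoneme's parameters before sending them to the audio graph.
function getPhonemeParams(symbol) {
  const entry = phonemeMap[symbol];
  if (!entry) throw new Error(`Unknown phoneme: ${symbol}`);
  return entry;
}
```

Adding a new phoneme then amounts to adding an entry here and wiring a UI button to it.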
- Build: npm run build
- Dev server: npm start
- Deploy: push to the main branch; GitHub Actions will auto-deploy to Pages.
MIT License (see LICENSE)
- Created by Keno Leon
- Inspired by Pink Trombone and other open-source speech synthesis projects
This repo is experimental and will not be developed further.
It is preserved as a reference artifact for the community.
- Coarticulation and expressive prosody
- Voice customization and presets
- AI-driven parameterization
- Improved UI/UX and accessibility
Feel free to fork or reference for your own projects!

