GitHub

Real-time Speech-to-Text with AssemblyAI and PyAudio This project is a real-time speech-to-text transcription system that streams audio from your microphone to AssemblyAI’s Streaming API using WebSockets. The program captures audio through the microphone, streams it to AssemblyAI, receives live transcription, and saves the recorded audio to a .wav file after the session ends.

Features

Real-time speech recognition using AssemblyAI Streaming API
Captures audio through the microphone with PyAudio
Streams audio in 50ms chunks over WebSocket
Handles live formatted and unformatted transcripts
Records audio locally and saves as a .wav file with timestamp
Clean resource management and error handling

How This Was Built

Audio Capture Used pyaudio to capture microphone input. Configured sample rate at 16kHz, mono channel, 16-bit PCM format.
WebSocket Streaming Established connection with AssemblyAI Streaming endpoint. Sent binary audio frames in real-time to the API.
Transcription Handling Processed incoming JSON responses. Displayed interim transcripts and formatted text when available.
Recording and Saving Audio Stored raw audio frames in memory while streaming. Saved all captured audio into a .wav file when the session ended.
Requirements Python 3.8 or above A valid AssemblyAI API Key
Installation Clone this repository or copy the code into a Python file (e.g., realtime_stt.py). Install required dependencies: pip install pyaudio websocket-client Replace the placeholder API key in the code: API_KEY = "your_api_key_here"

Usage Run the script in your terminal:

python realtime_stt.py

Speak into your microphone. Live transcripts will appear in the terminal. Audio will continue recording until you stop the program. Stop the program with Ctrl + C. A termination message is sent to the server. The recorded audio will be saved locally as a .wav file with a timestamp.

Implementation Details

Audio Configuration Frames per buffer: 800 samples (~50ms at 16kHz) Sample rate: 16000 Hz Format: 16-bit PCM (pyaudio.paInt16) Channels: Mono (1)
Concurrency Audio streaming is handled in a dedicated thread. The main thread listens for WebSocket events and manages graceful termination.
Error Handling Robust error messages for audio device issues, WebSocket disconnects, and JSON decoding errors. Ensures all audio streams and threads are cleaned up on exit.

Notes Ensure your microphone is accessible and not in use by other applications. Network stability is important for uninterrupted streaming. The saved .wav file is raw audio; it does not contain transcripts.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
Stt.py		Stt.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

About

Uh oh!

Releases

Packages

Languages

MyMindSpace/STT

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages