This project records audio until silence is detected, transcribes it with a speech-to-text API, and generates a response as synthesized speech. The application is designed to run on a Jetson device with ALSA and Flask set up.
Before you start, ensure the following dependencies are installed on your system:
- Operating System: Linux-based system (e.g., Ubuntu) on Jetson devices.
- Python: Python 3.8 or higher installed.
- Packages:
  - Flask
  - Flask-SocketIO
  - pyaudio
  - requests
  - pydub
- ALSA Utilities:
  - alsa-utils
  - arecord

  Install ALSA utilities using:

      sudo apt update
      sudo apt install alsa-utils

- Microphone and Speaker: Ensure your microphone and speaker are properly connected and recognized by the system.
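A quick way to confirm the prerequisites above is to check them from Python before starting the server. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the import names (`flask_socketio` for Flask-SocketIO, and so on) are the usual ones for these packages.

```python
import importlib.util
import shutil
import sys

# Import names for the packages listed above (pip name in the comment).
REQUIRED_MODULES = [
    "flask",           # Flask
    "flask_socketio",  # Flask-SocketIO
    "pyaudio",         # pyaudio
    "requests",        # requests
    "pydub",           # pydub
]
REQUIRED_TOOLS = ["arecord", "aplay"]  # both ship with alsa-utils


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the command names that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_is_supported(minimum=(3, 8)):
    """True when the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print("Missing Python packages:", missing_modules(REQUIRED_MODULES) or "none")
    print("Missing ALSA tools:", missing_tools(REQUIRED_TOOLS) or "none")
```

Run it inside the virtual environment you intend to use, since package availability is checked for the active interpreter only.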
- Clone the repository:

      git clone https://github.com/dextboy/treehacks.git
      cd treehacks/backend

- Set up a Python virtual environment (recommended):

      python3 -m venv venv
      source venv/bin/activate

- Install Python dependencies:

      pip install -r requirements.txt

- Install additional required libraries: `pyaudio` may require the `portaudio` library. Install it using:

      sudo apt install libportaudio2 libportaudiocpp0 portaudio19-dev
- Configure ALSA to use the correct audio device:
  - Edit the `~/.asoundrc` file:

        pcm.!default {
            type hw
            card 2
            device 0
        }
        ctl.!default {
            type hw
            card 2
        }

    Replace `card` and `device` with the appropriate values for your system. Use `arecord -l` to identify them.
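Picking the card and device numbers out of `arecord -l` can be scripted. The sketch below is illustrative only: the function names are hypothetical, and it assumes the English-locale listing format (`card N: ..., device M: ...`). It parses sample text rather than running `arecord`, and prints the resulting configuration instead of writing `~/.asoundrc`.

```python
import re

# Matches lines such as:
#   card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?, device (\d+):", re.MULTILINE)


def parse_capture_devices(listing):
    """Return (card, device) pairs found in `arecord -l` output."""
    return [(int(card), int(device)) for card, device in CARD_LINE.findall(listing)]


def render_asoundrc(card, device):
    """Build ~/.asoundrc text that makes the given hardware device the default."""
    return (
        "pcm.!default {\n"
        f"    type hw\n    card {card}\n    device {device}\n"
        "}\n"
        "ctl.!default {\n"
        f"    type hw\n    card {card}\n"
        "}\n"
    )


if __name__ == "__main__":
    # Sample text; on a real system this would come from running `arecord -l`.
    sample = (
        "**** List of CAPTURE Hardware Devices ****\n"
        "card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]\n"
        "  Subdevices: 1/1\n"
        "  Subdevice #0: subdevice #0\n"
    )
    card, device = parse_capture_devices(sample)[0]
    print(render_asoundrc(card, device))
```

If several capture devices are listed, choose the pair that corresponds to your microphone rather than simply taking the first entry.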
- Set Up Flask Application:
  - Run the application by navigating to the project directory and starting the Flask server:

        python app.py
- ALSA Setup:
  - Restart ALSA or reload its configuration:

        sudo alsactl init
- Start Recording Until Silence: Run the application, and it will start recording audio until silence is detected. The recorded file will be saved as `output.wav`.
- Process Recorded Audio: The `output.wav` file will automatically be sent to the `/process/speech` API endpoint for transcription and speech synthesis.
- Play Synthesized Speech: The synthesized audio file will be played on the device.
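The "record until silence" step can be pictured as a loop that measures the loudness of each audio chunk and stops after a run of quiet chunks. The following is a sketch of that idea under stated assumptions, not the project's actual code: the function names, the RMS threshold of 500, the 30-chunk silence window, and the 16 kHz mono 16-bit format are all illustrative. It works on any iterable of PCM byte chunks, so it can be tried without a microphone.

```python
import array
import math
import wave


def rms(chunk):
    """Root-mean-square amplitude of 16-bit signed mono PCM bytes."""
    samples = array.array("h", chunk)  # native byte order
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def record_until_silence(chunks, threshold=500.0, max_silent_chunks=30):
    """Collect chunks until `max_silent_chunks` quiet chunks arrive in a row.

    Leading silence before the first loud chunk is skipped, so the
    recording starts when speech starts.
    """
    recorded = []
    silent_run = 0
    started = False
    for chunk in chunks:
        loud = rms(chunk) >= threshold
        if not started:
            if not loud:
                continue  # still waiting for speech
            started = True
        recorded.append(chunk)
        silent_run = 0 if loud else silent_run + 1
        if silent_run >= max_silent_chunks:
            break
    return b"".join(recorded)


def save_wav(target, pcm, rate=16000):
    """Write mono 16-bit PCM to a path or file-like object as WAV."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm)


if __name__ == "__main__":
    # Synthetic input: 5 quiet chunks, 10 loud chunks, then 50 quiet chunks.
    quiet = array.array("h", [0] * 1024).tobytes()
    loud = array.array("h", [4000, -4000] * 512).tobytes()
    pcm = record_until_silence([quiet] * 5 + [loud] * 10 + [quiet] * 50)
    print("bytes kept:", len(pcm))
```

With a real microphone, `chunks` would be successive reads from the input stream, and `save_wav("output.wav", pcm)` would produce the file described above. The threshold usually needs tuning for the microphone and room noise.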
- ALSA Errors:
  - If you encounter errors like `Unknown PCM` or `Cannot open device`, verify your `.asoundrc` configuration and check connected audio devices using:

        aplay -l
        arecord -l
- PyAudio Installation:
  - If `pyaudio` fails to install, ensure the `portaudio` library is installed:

        sudo apt install libportaudio2 libportaudiocpp0 portaudio19-dev
- Permissions Issues:
  - Ensure you have the required permissions to access audio devices:

        sudo usermod -aG audio $USER

    Log out and back in for the new group membership to take effect.
- Device Not Recognized:
  - Check the microphone and speaker connection. Restart ALSA with:

        sudo alsactl init
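When chasing permission problems, it can help to distinguish between being listed in the `audio` group and having that group active in the current session. The sketch below uses only the Python standard library on Linux; the helper names are hypothetical and the script is not part of the project.

```python
import grp
import os
import pwd


def user_in_group(user, group):
    """True if `user` has `group` as a primary or supplementary group."""
    try:
        group_entry = grp.getgrnam(group)
        user_entry = pwd.getpwnam(user)
    except KeyError:
        return False  # unknown user or group
    return user in group_entry.gr_mem or user_entry.pw_gid == group_entry.gr_gid


def process_has_group(group):
    """True if the current process already runs with `group`."""
    try:
        gid = grp.getgrnam(group).gr_gid
    except KeyError:
        return False
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    user = os.environ.get("USER", "")
    print("listed in audio group:", user_in_group(user, "audio"))
    print("active in this session:", process_has_group("audio"))
```

If the first check is true and the second is false, the group was added but the session predates the change; starting a new login session should resolve it.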