This is a simple prototype of a Node.js-based speech-to-text -> GPT -> text-to-speech pipeline. It uses the Google Cloud Speech-to-Text and Google Cloud Text-to-Speech APIs to convert speech to text and text to speech, respectively. The OpenAI GPT-3.5 model generates a reply from the speech-to-text output.
- Clone this repository
- Create a `.env` file in the root directory of the project and add the following environment variables:
  - `GOOGLE_APPLICATION_CREDENTIALS`: Path to your Google Cloud Platform service account key file
  - `OPENAI_API_KEY`: Your OpenAI API key
  - `INIT_PROMPT`: The initial prompt to use for the GPT model. Optional; a default prompt is used if it is not set.
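As an illustration of how these variables fit together, the sketch below checks that the two required values are present and falls back to a default when `INIT_PROMPT` is missing. It is not taken from this repository: `loadConfig`, `PipelineConfig`, and `FALLBACK_PROMPT` are hypothetical names, and the fallback text is a placeholder rather than the project's actual default prompt.

```typescript
// Hypothetical helper, not part of this repository.
interface PipelineConfig {
  googleCredentialsPath: string;
  openaiApiKey: string;
  initPrompt: string;
}

// Placeholder only: the project's real default prompt is not shown in this README.
const FALLBACK_PROMPT = "You are a helpful assistant.";

function loadConfig(env: Record<string, string | undefined>): PipelineConfig {
  // These two variables are required by the setup steps above.
  const required = ["GOOGLE_APPLICATION_CREDENTIALS", "OPENAI_API_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    googleCredentialsPath: env["GOOGLE_APPLICATION_CREDENTIALS"] as string,
    openaiApiKey: env["OPENAI_API_KEY"] as string,
    // INIT_PROMPT is optional, so an unset or empty value uses the fallback.
    initPrompt: env["INIT_PROMPT"] || FALLBACK_PROMPT,
  };
}
```

Calling `loadConfig(process.env)` at startup would fail fast with a readable message, assuming the `.env` file has already been loaded into `process.env` (for example with the `dotenv` package).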
You can use the Eleven Labs text-to-speech service instead of Google Cloud Text-to-Speech. To do so, create an account on the Eleven Labs website and get an API key. Then add the following environment variable to your `.env` file:
- `ELEVENLABS_API_KEY`: Your Eleven Labs API key

Then swap the import in the `server.ts` file from `text-to-speech.ts` to `text-to-speech-elevenlabs.ts`.
- Run `npm install` to install the dependencies
- Run `npm start` to start the server
- Wait until it says `Press the space bar to start recording.`
- Press the space bar to start recording and wait for the answer. Processing the audio and generating the answer takes a few seconds; the pipeline is relatively slow because it has not been optimized with streaming APIs.
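The flow behind a single recording can be sketched roughly as three stages run one after another, which is also why the delay adds up: each stage waits for the previous one to finish completely. This is a minimal sketch under assumptions, not the code in `server.ts`; `Stages`, `runTurn`, and the stub functions are hypothetical names, and the stubs stand in for the real Google Cloud and OpenAI calls.

```typescript
// Each stage is injected so the sketch does not depend on any real SDK.
interface Stages {
  transcribe: (audio: Uint8Array) => Promise<string>; // speech-to-text
  complete: (prompt: string) => Promise<string>; // GPT reply
  synthesize: (text: string) => Promise<Uint8Array>; // text-to-speech
}

interface TurnResult {
  transcript: string;
  reply: string;
  speech: Uint8Array;
}

// Runs the three stages strictly in sequence, without streaming.
async function runTurn(audio: Uint8Array, stages: Stages): Promise<TurnResult> {
  const transcript = await stages.transcribe(audio);
  const reply = await stages.complete(transcript);
  const speech = await stages.synthesize(reply);
  return { transcript, reply, speech };
}

// Stub stages for illustration only; they do no real recognition or synthesis.
const stubStages: Stages = {
  transcribe: async (audio) => `${audio.length} bytes of audio`,
  complete: async (prompt) => `You said: ${prompt}`,
  synthesize: async (text) => new Uint8Array(text.length),
};

runTurn(new Uint8Array(4), stubStages).then((result) => {
  console.log(result.reply);
});
```

A streaming variant would start the next stage before the previous one has finished, which is the optimization this prototype leaves out.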
This is a prototype and is not intended for production use. It is not optimized for performance. As with everything where you use your own API keys, you are responsible for the costs incurred by using this software.