This project converts written text into speech, using Google AI (Gemini) for text generation or internet searches.
The workflow is simple and is based on the example in test/app.ts. First, it captures and transcribes your voice. Next, it calls a function that sends a request to the Google Gemini API and receives the AI's answer. If needed, it can also automatically play the generated text as speech (TTS).
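The flow above can be sketched as a small pipeline. This is a minimal illustration, not the code in test/app.ts. The three steps are passed in as plain async functions with hypothetical names (`listen`, `ask`, `speak`), so the sketch makes no assumptions about the real signatures of this package's classes.

```typescript
// Hypothetical step signatures. The package's real classes
// (VoiceRecognition, GoogleGemini, TextToSpeech) may look different.
type Listen = () => Promise<string>;            // capture audio, return a transcript
type Ask = (prompt: string) => Promise<string>; // send the prompt to Gemini, return the answer
type Speak = (text: string) => Promise<void>;   // play the answer as speech

interface PipelineSteps {
  listen: Listen;
  ask: Ask;
  speak?: Speak; // optional: TTS playback can be skipped
}

// Runs one voice -> Gemini -> TTS round and returns the generated answer.
async function runVoicePipeline(steps: PipelineSteps): Promise<string> {
  const transcript = await steps.listen();
  const answer = await steps.ask(transcript);
  if (steps.speak) {
    await steps.speak(answer);
  }
  return answer;
}
```

Keeping `speak` optional reflects the optional TTS playback. In a real setup, `ask` could wrap the `google.chat` call from the Gemini example further down.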
This project has been tested on Linux (Ubuntu 24.04 LTS, x86_64). Windows users can install SoX from SourceForge. I have not tested macOS because I don't use it, but any method of getting SoX to run there should work.
Task | Priority | Complete | Status |
---|---|---|---|
Implement Gemini Chat | High | ✓ | Completed |
Develop Voice Recognition | High | ✓ | Completed |
Implement Audio Language Detection | High | ✓ | Completed |
Implement Text Language Detection | Medium | ✓ | Completed |
Implement an Audio Player | Low | ✓ | Completed |
Define Enums | Low | ✓ | Completed |
Integrate Debugging | Low | ✓ | Completed |
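Before using this repository, make sure the following dependencies are installed. The commands are for Ubuntu/Debian, and Windows alternatives are noted where available.

- SoX
  ```bash
  sudo apt-get install sox
  ```
  - Windows users: download SoX from SourceForge.
- libsox-fmt-all
  ```bash
  sudo apt-get install libsox-fmt-all
  ```
- FFmpeg (optional for Windows)
  ```bash
  # Windows (Chocolatey)
  choco install ffmpeg
  # Linux
  sudo apt install ffmpeg
  ```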
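Before running the examples, it can help to confirm that the binaries above are actually on your PATH. Here is a minimal Node.js sketch using the built-in `child_process` module. The helper `isCommandAvailable` is hypothetical and not part of this package. This check cannot confirm `libsox-fmt-all`, because that package provides format handlers rather than a separate command.

```typescript
import { spawnSync } from "node:child_process";

// Returns true if `command` could be started (i.e. it was found on PATH).
// The exit code is deliberately ignored: we only care that the binary exists.
// A spawn failure (e.g. ENOENT) or hitting the timeout counts as "missing".
function isCommandAvailable(command: string, args: string[] = ["--version"]): boolean {
  const result = spawnSync(command, args, { stdio: "ignore", timeout: 5000 });
  return result.error === undefined;
}

// `sox --version` and `ffmpeg -version` only print version information.
const checks: Array<[name: string, args: string[]]> = [
  ["sox", ["--version"]],
  ["ffmpeg", ["-version"]],
];

for (const [name, args] of checks) {
  console.log(`${name}: ${isCommandAvailable(name, args) ? "found" : "missing"}`);
}
```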
After installing the dependencies, install the package with either npm or Bun:
```bash
# npm
$ npm install git+https://github.com/Stawa/GTTS.git
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git
```
Each class needs its own credential before it can run successfully:
- Google Gemini API Key (`lib.GoogleGemini`): obtain this key from Google Cloud.
- TikTok SessionID (`lib.TextToSpeech`): obtain this SessionID from your TikTok cookies.
- Google Speech API Key (`lib.VoiceRecognition.fetchTranscriptGoogle`): obtain this key from the Chromium API keys.
- Deepgram API Key (`lib.VoiceRecognition.fetchTrascriptDeepgram`): obtain this key from Deepgram.
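To avoid hardcoding these secrets, one option is to read them from environment variables and check up front which ones are missing. This is a sketch under assumptions: the variable names (`GEMINI_API_KEY`, etc.) and the helper `findMissingCredentials` are hypothetical. This package does not read them itself, so you would pass the values to each class yourself.

```typescript
// Hypothetical environment variable names, mapped to the class that needs each one.
const REQUIRED_CREDENTIALS = {
  GEMINI_API_KEY: "lib.GoogleGemini",
  TIKTOK_SESSION_ID: "lib.TextToSpeech",
  GOOGLE_SPEECH_API_KEY: "lib.VoiceRecognition.fetchTranscriptGoogle",
  DEEPGRAM_API_KEY: "lib.VoiceRecognition.fetchTrascriptDeepgram",
} as const;

type CredentialName = keyof typeof REQUIRED_CREDENTIALS;

// Returns the credentials from `needed` that are unset or empty in `env`.
function findMissingCredentials(
  env: Record<string, string | undefined>,
  needed: CredentialName[] = Object.keys(REQUIRED_CREDENTIALS) as CredentialName[],
): CredentialName[] {
  return needed.filter((name) => !env[name]);
}

// Example: check only what the Gemini chat example needs.
for (const name of findMissingCredentials(process.env, ["GEMINI_API_KEY"])) {
  console.warn(`${name} is not set (needed by ${REQUIRED_CREDENTIALS[name]})`);
}
```

With this in place, you could pass `process.env.GEMINI_API_KEY` as `apiKey` instead of a literal string.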
The following example gets a generated response from the Google Gemini API with a single function call:
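```typescript
import { GoogleGemini } from "@stawa/gtts";

// Replace "XXXXX" with your Google Gemini API key.
const google = new GoogleGemini({
  apiKey: "XXXXX",
  debugLog: true, // enable debug logging
});

async function app() {
  // Send a prompt to Gemini and wait for the generated answer.
  const res = await google.chat("When was Facebook launched?");
  console.log(res);
}

app().catch(console.error);
```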