GitHub - Stawa/GTTS: This project converts written material into speech by using Google AI (Gemini) for text creation or internet searches.

Gemini Text-To-Speech

This project converts written material into speech by using Google AI (Gemini) for text creation or internet searches.

📜 Table of Contents

How It Works
Project Note
Project Installation
Project Examples
Contributors

❓ How It Works

You may be wondering how this project works; it's actually simple. The flow is based on the example in test/app.ts. First it captures your voice, then it calls a function that sends a request to the Google Gemini API and returns the AI's answer. If needed, it can also automatically play the generated text as speech (TTS).
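The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in test/app.ts: `runPipeline`, `captureVoice`, `ask`, and `speak` are hypothetical names chosen for this sketch, and the stages are stubbed in memory so it runs without API keys or audio hardware.

```typescript
// Hypothetical stage signatures; none of these names come from @stawa/gtts.
interface PipelineStages {
  captureVoice: () => Promise<string>; // returns the transcribed prompt
  ask: (prompt: string) => Promise<string>; // e.g. a wrapper around the Gemini call
  speak?: (text: string) => Promise<void>; // optional TTS playback
}

// Runs the stages in order: voice -> AI answer -> optional playback.
async function runPipeline(stages: PipelineStages): Promise<string> {
  const prompt = await stages.captureVoice();
  const answer = await stages.ask(prompt);
  if (stages.speak) {
    await stages.speak(answer);
  }
  return answer;
}

// Demo with in-memory stubs standing in for the real services.
runPipeline({
  captureVoice: async () => "When was Facebook launched?",
  ask: async (prompt) => `You asked: ${prompt}`,
  speak: async (text) => console.log(`[speaking] ${text}`),
}).then((answer) => console.log(answer));
```

Passing the stages in as functions keeps playback optional, which matches the "if needed" behaviour described above.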

📌 Project Note

This project is tested on Linux (Ubuntu 24.04 LTS, x86_64). Windows users can install SoX from SourceForge. I don't use macOS, so I have no tested instructions for it, but any method of getting SoX running there is fine.

Task	Priority	Complete	Status
Implement Gemini Chat	High	✓	Completed
Develop Voice Recognition	High	✓	Completed
Implement Audio Language Detection	High	✓	Completed
Implement Text Language Detection	Medium	✓	Completed
Implement an Audio Player	Low	✓	Completed
Define Enums	Low	✓	Completed
Integrate Debugging	Low	✓	Completed

📦 Project Installation

Before you use this repository, make sure the following tools are installed (the apt commands are for Linux):

SoX
- Linux: sudo apt-get install sox
- Windows: download from SourceForge
libsox-fmt-all
- Linux: sudo apt-get install libsox-fmt-all
- Windows: optional
FFmpeg
- Windows: choco install ffmpeg
- Linux: sudo apt install ffmpeg
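To confirm the tools are reachable before installing the package, a small check like the one below may help. It is a sketch that assumes a Node.js-compatible runtime; `isInstalled` is a hypothetical helper, not part of this project, and it only tests whether each binary starts and exits cleanly.

```typescript
import { spawnSync } from "node:child_process";

// Returns true when `command` can be spawned and exits with status 0.
function isInstalled(command: string, args: string[]): boolean {
  const result = spawnSync(command, args, { stdio: "ignore" });
  // `error` is set (for example ENOENT) when the binary is not on PATH.
  return result.error === undefined && result.status === 0;
}

// Each tool with the flag that prints its version.
const tools: Array<[string, string[]]> = [
  ["sox", ["--version"]],
  ["ffmpeg", ["-version"]],
];

for (const [name, args] of tools) {
  console.log(`${name}: ${isInstalled(name, args) ? "found" : "missing"}`);
}
```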

After installing the necessary libraries, proceed to install the repository by using the following commands:

# npm
$ npm install git+https://github.com/Stawa/GTTS.git
# Bun
$ bun install git+https://github.com/Stawa/GTTS.git

📄 Project Examples

Each class needs certain credentials before it can run successfully:

Google Gemini API Key (lib.GoogleGemini)
- This key can be obtained from Google Cloud.
TikTok SessionID (lib.TextToSpeech)
- This SessionID can be obtained from TikTok cookies.
Google Speech API Key (lib.VoiceRecognition.fetchTranscriptGoogle)
- This key can be obtained from the Chromium API Keys page.
Deepgram API Key (lib.VoiceRecognition.fetchTrascriptDeepgram)
- This key can be obtained from Deepgram.
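One common way to keep these credentials out of source code is to read them from environment variables and fail early when one is missing. The sketch below assumes that approach; `requireKeys` and the variable name `GEMINI_API_KEY` are hypothetical and are not defined by this project.

```typescript
// Returns the requested variables, or throws listing every one that is missing.
function requireKeys<K extends string>(
  env: Record<string, string | undefined>,
  names: readonly K[],
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}

// Demo with a plain object; the variable name here is an assumption.
const demoEnv = { GEMINI_API_KEY: "XXXXX" };
const keys = requireKeys(demoEnv, ["GEMINI_API_KEY"]);
console.log(keys.GEMINI_API_KEY); // prints "XXXXX"
```

In a real script you would pass `process.env` instead of `demoEnv` and request only the keys for the classes you actually use.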

The following example gets a generated response from the Google Gemini API; it takes a single function call:

import { GoogleGemini } from "@stawa/gtts";

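// Create the client; replace "XXXXX" with your Gemini API key.
const google = new GoogleGemini({
   apiKey: "XXXXX",
   debugLog: true,
});

async function app() {
   // Send one prompt and print the generated answer.
   const res = await google.chat("When was Facebook launched?");
   console.log(res);
}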

app();
