VIS (Versatile Image Speech) is a Java tool that uses OCR (optical character recognition) and TTS (text-to-speech) services to read the text in images aloud.
At the push of a button, the text in an image on the clipboard is played back as speech.
- Converts images from clipboard to speech
- Plays back the converted speech
- (optional) Translates the detected text into the voice's language before the speech is generated and played
- Configurable voice settings (language, voice, speed, pitch)
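The first step of the feature list above, getting an image off the clipboard, can be sketched with the standard AWT clipboard API. This is only an illustration under assumptions: the class `ClipboardImageSketch` and its methods are hypothetical names and are not taken from the VIS sources, which may handle the clipboard differently.

```java
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the VIS code base.
class ClipboardImageSketch {

    /** Returns the image carried by the given clipboard content, if there is one. */
    static Optional<BufferedImage> extractImage(Transferable content) {
        if (content == null || !content.isDataFlavorSupported(DataFlavor.imageFlavor)) {
            return Optional.empty();
        }
        try {
            Image image = (Image) content.getTransferData(DataFlavor.imageFlavor);
            if (image instanceof BufferedImage) {
                return Optional.of((BufferedImage) image);
            }
            int width = image.getWidth(null);
            int height = image.getHeight(null);
            if (width <= 0 || height <= 0) {
                return Optional.empty(); // size not known (image not fully loaded)
            }
            // Copy any other Image type into a BufferedImage so it can be encoded later.
            BufferedImage copy = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
            Graphics2D g = copy.createGraphics();
            g.drawImage(image, 0, 0, null);
            g.dispose();
            return Optional.of(copy);
        } catch (UnsupportedFlavorException | IOException e) {
            return Optional.empty();
        }
    }

    /** Wraps an image in a Transferable, e.g. to try extractImage without a real clipboard. */
    static Transferable imageContent(BufferedImage image) {
        return new Transferable() {
            @Override public DataFlavor[] getTransferDataFlavors() {
                return new DataFlavor[] { DataFlavor.imageFlavor };
            }
            @Override public boolean isDataFlavorSupported(DataFlavor flavor) {
                return DataFlavor.imageFlavor.equals(flavor);
            }
            @Override public Object getTransferData(DataFlavor flavor) throws UnsupportedFlavorException {
                if (!isDataFlavorSupported(flavor)) {
                    throw new UnsupportedFlavorException(flavor);
                }
                return image;
            }
        };
    }

    public static void main(String[] args) {
        // Needs a graphical environment; throws HeadlessException otherwise.
        Transferable content = Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null);
        Optional<BufferedImage> image = extractImage(content);
        System.out.println(image.isPresent()
                ? "Clipboard image: " + image.get().getWidth() + "x" + image.get().getHeight()
                : "No image on the clipboard");
    }
}
```

Returning an `Optional` keeps the "nothing to read" case (text or an empty clipboard) separate from real errors, so a caller could simply skip the OCR step when no image is present.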
- Google Cloud Vision API - https://cloud.google.com/vision
- Google Cloud Text-To-Speech API - https://cloud.google.com/text-to-speech
- Google Cloud Translation API - https://cloud.google.com/translate
- Java Development Kit (JDK) (17 or higher recommended)
- Internet connection for using the OCR and TTS services
- Service Account of a Google Cloud Project with the Vision API and Text-To-Speech API enabled: https://console.cloud.google.com/
- (optional) The Google Cloud Translation API must also be enabled if you want to use the translation functionality.
- Clone the repository:
git clone https://github.com/cech12/VIS.git
- Navigate to the project directory:
cd VIS
- Create a Google Cloud Project
https://cloud.google.com/resource-manager/docs/creating-managing-projects
Enable the following APIs:
- Google Cloud Vision API
- Google Cloud Text-To-Speech API
- (optional) Google Cloud Translation API
- Set up a Service Account for the Google Cloud Project
https://cloud.google.com/iam/docs/service-account-overview
- Create and download the key file of the created Service Account and save it as "credentials.json"
- Put the file into the config directory (located directly in the project directory)
- Run the application
./gradlew run
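If the application fails to start, a missing or wrong key file is a likely cause. The following sketch checks that "config/credentials.json" exists and looks like a service account key. It relies on the fact that Google service account key files contain a `"type": "service_account"` entry plus `private_key` and `client_email` fields. The class `CredentialsCheckSketch` is a hypothetical helper, not part of VIS, and the check is a rough text match rather than real JSON parsing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical pre-flight check for illustration; not part of the VIS code base.
class CredentialsCheckSketch {

    private static final Pattern SERVICE_ACCOUNT_TYPE =
            Pattern.compile("\"type\"\\s*:\\s*\"service_account\"");

    /** Rough plausibility check on the key file content (no full JSON parsing). */
    static boolean looksLikeServiceAccountKey(String json) {
        return json != null
                && SERVICE_ACCOUNT_TYPE.matcher(json).find()
                && json.contains("\"private_key\"")
                && json.contains("\"client_email\"");
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the setup steps: config directory inside the project.
        Path key = Path.of("config", "credentials.json");
        if (!Files.isRegularFile(key)) {
            System.err.println("Missing key file: " + key.toAbsolutePath());
            return;
        }
        String json = Files.readString(key);
        System.out.println(looksLikeServiceAccountKey(json)
                ? "Key file looks like a service account key"
                : "Key file found, but it does not look like a service account key");
    }
}
```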
- Once the application is running, you can configure your preferred language and voice
- (optional) Enter your Google Cloud Project ID in the project ID field to enable the translation functionality
- Copy an image to the clipboard. (You can use tools like the Windows Snipping Tool to capture part of your screen)
- Hit the "Read Image from Clipboard" button to hear the detected text read aloud
- You can hit the "Stop Speech" button to stop the voice output
- You can change the voice settings (language, voice, speed, pitch, project ID) in the UI (changes apply to the next voice output)
- or by changing the values in the "config/vis.config" file (requires an application restart)
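The format of "config/vis.config" is not described here, so the following is only a sketch of how such settings could be read, assuming a `key=value` properties layout. The key names (`language`, `voice`, `speed`, `pitch`), the defaults, and the class `VoiceSettingsSketch` are all hypothetical. The clamping limits are the ranges the Google Cloud Text-To-Speech API documents for speaking rate (0.25 to 4.0) and pitch (-20.0 to 20.0); whether VIS passes the values through unchanged is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical settings reader for illustration; key names and defaults are assumptions.
class VoiceSettingsSketch {
    final String language;
    final String voice;
    final double speed;
    final double pitch;

    private VoiceSettingsSketch(String language, String voice, double speed, double pitch) {
        this.language = language;
        this.voice = voice;
        this.speed = speed;
        this.pitch = pitch;
    }

    /** Parses key=value text, falling back to defaults and clamping numeric values. */
    static VoiceSettingsSketch parse(String text) {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new VoiceSettingsSketch(
                props.getProperty("language", "en-US").trim(),
                props.getProperty("voice", "").trim(),
                clamp(number(props, "speed", 1.0), 0.25, 4.0),   // TTS speaking rate range
                clamp(number(props, "pitch", 0.0), -20.0, 20.0)  // TTS pitch range
        );
    }

    /** Reads a numeric property; a missing or malformed value yields the fallback. */
    private static double number(Properties props, String key, double fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    private static double clamp(double value, double min, double max) {
        return Math.max(min, Math.min(max, value));
    }

    public static void main(String[] args) throws IOException {
        VoiceSettingsSketch s = parse(Files.readString(Path.of("config", "vis.config")));
        System.out.println(s.language + " / " + s.voice + " / speed " + s.speed + " / pitch " + s.pitch);
    }
}
```

Falling back to defaults instead of failing on a malformed value means a hand-edited file with a typo still lets the application start with usable settings.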
Contributions to the project are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.