VIS (Versatile Image Speech)

VIS (Versatile Image Speech) is a Java tool that uses OCR (optical character recognition) and TTS (text-to-speech) services to read images aloud.

At the push of a button, it reads aloud the text found in an image on the clipboard.
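
As an illustration of the clipboard step, the following minimal sketch reads an image from the system clipboard with the standard AWT API and encodes it as PNG bytes, a form that could be sent to an OCR service. It is not taken from the VIS sources: the class name `ClipboardImageSketch` and its methods are hypothetical, and the sketch assumes the clipboard image is already fully loaded.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.Image;
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Optional;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not part of the VIS code base.
final class ClipboardImageSketch {

    /** Returns the image currently on the system clipboard, or empty if there is none. */
    static Optional<BufferedImage> readClipboardImage() {
        if (GraphicsEnvironment.isHeadless()) {
            return Optional.empty(); // no system clipboard without a display
        }
        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        try {
            if (!clipboard.isDataFlavorAvailable(DataFlavor.imageFlavor)) {
                return Optional.empty();
            }
            Image image = (Image) clipboard.getData(DataFlavor.imageFlavor);
            return Optional.of(toBufferedImage(image));
        } catch (UnsupportedFlavorException | IOException | IllegalStateException e) {
            // The clipboard content changed or is temporarily unavailable.
            return Optional.empty();
        }
    }

    /** Copies any AWT image into a BufferedImage (assumes the image is fully loaded). */
    static BufferedImage toBufferedImage(Image image) {
        if (image instanceof BufferedImage) {
            return (BufferedImage) image;
        }
        BufferedImage copy = new BufferedImage(
                image.getWidth(null), image.getHeight(null), BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = copy.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();
        return copy;
    }

    /** Encodes the image as PNG bytes. */
    static byte[] toPng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            ImageIO.write(image, "png", out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readClipboardImage()
                .map(img -> "Image found, " + toPng(img).length + " PNG bytes")
                .orElse("No image on the clipboard"));
    }
}
```

Returning an empty `Optional` instead of throwing keeps the "nothing to read" case cheap to handle in a button handler.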

Features

Converts images from clipboard to speech
Plays back the converted speech
(optional) Translates the detected text into the voice's language before the speech is generated and played
Configurable voice settings (language, voice, speed, pitch)
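
The features above form a small pipeline: OCR, then optional translation, then speech synthesis. The sketch below shows one way such a pipeline could be wired together. It is an assumption for illustration, not the actual VIS implementation: `SpeechPipelineSketch` and `imageToSpeech` are hypothetical names, and the three services are stand-in functions rather than real Google Cloud clients.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical pipeline for illustration only; the real services are replaced by stubs.
final class SpeechPipelineSketch {

    /**
     * Runs image bytes through OCR, an optional translation step, and TTS.
     * Pass UnaryOperator.identity() as the translator to skip translation.
     */
    static byte[] imageToSpeech(byte[] image,
                                Function<byte[], String> ocr,
                                UnaryOperator<String> translator,
                                Function<String, byte[]> tts) {
        String detectedText = ocr.apply(image);
        String textToSpeak = translator.apply(detectedText);
        return tts.apply(textToSpeak);
    }

    public static void main(String[] args) {
        // Stub services: a real setup would call the OCR and TTS APIs here.
        Function<byte[], String> fakeOcr = img -> "hello";
        Function<String, byte[]> fakeTts = text -> text.getBytes(StandardCharsets.UTF_8);

        byte[] audio = imageToSpeech(new byte[0], fakeOcr, UnaryOperator.identity(), fakeTts);
        System.out.println("Generated " + audio.length + " bytes of (fake) audio");
    }
}
```

Modelling the translator as a `UnaryOperator<String>` makes the optional step a no-op by default, so the rest of the pipeline does not need a special case.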

Requirements

Java Development Kit (JDK), version 17 or higher recommended
Internet connection, needed for the OCR and TTS services
Service Account of a Google Cloud Project with the Vision API and Text-to-Speech API enabled: https://console.cloud.google.com/
(optional) Google Cloud Translation API enabled as well, if you want to use the translation functionality

Installation

Clone the repository:

git clone https://github.com/cech12/VIS.git

Navigate to the project directory:

cd VIS

Create a Google Cloud Project

https://cloud.google.com/resource-manager/docs/creating-managing-projects

Enable the following APIs:

Google Cloud Vision API
Google Cloud Text-To-Speech API
(optional) Google Cloud Translation API

Set up a Service Account for the Google Cloud Project

https://cloud.google.com/iam/docs/service-account-overview

Create and download a key file for the Service Account and save it as "credentials.json".
Put the file into the config directory (located directly in the project directory).
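
To verify that the key file ended up in the expected place, a small check like the following can help. This is only a sketch: `CredentialsCheckSketch` is a hypothetical class that is not part of VIS, and it checks only that "config/credentials.json" exists and is readable, not that its content is a valid key.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of the VIS code base.
final class CredentialsCheckSketch {

    /** Returns true if <projectDir>/config/credentials.json is a readable regular file. */
    static boolean hasCredentials(Path projectDir) {
        Path keyFile = projectDir.resolve("config").resolve("credentials.json");
        return Files.isRegularFile(keyFile) && Files.isReadable(keyFile);
    }

    public static void main(String[] args) {
        // Uses the current directory unless a project directory is passed as an argument.
        Path projectDir = Path.of(args.length > 0 ? args[0] : ".");
        System.out.println(hasCredentials(projectDir)
                ? "Key file found"
                : "Missing config/credentials.json");
    }
}
```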

Usage

Run the application:

./gradlew run

Once the application is running, you can configure your preferred language and voice.
(optional) Enter your Google Cloud Project ID into the corresponding field to use the translation functionality.
Copy an image to the clipboard. (You can use a tool such as the Windows Snipping Tool to capture part of your screen.)
Click the "Read Image from Clipboard" button to hear the text read aloud.
Click the "Stop Speech" button to stop the voice output.

Configuration

You can configure the voice settings (language, voice, speed, pitch) and the project ID either by editing the parameters in the UI (changes apply to the next voice output)
or by changing the values in the "config/vis.config" file (requires an application restart).
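
The format of "config/vis.config" is not described here. Purely as an illustration of why file edits need a restart (the file is typically read once at startup), the sketch below assumes a Java properties style file. The keys `language`, `voice`, `speed`, and `pitch`, their default values, and the class `VoiceConfigSketch` are all hypothetical and may not match the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical config reader for illustration only; key names and defaults are assumptions.
final class VoiceConfigSketch {

    /** Hypothetical holder for the voice settings named in the README. */
    record VoiceSettings(String language, String voice, double speed, double pitch) {}

    /** Parses key=value text into settings, falling back to assumed defaults. */
    static VoiceSettings parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new VoiceSettings(
                props.getProperty("language", "en-US"),
                props.getProperty("voice", ""),
                Double.parseDouble(props.getProperty("speed", "1.0")),
                Double.parseDouble(props.getProperty("pitch", "0.0")));
    }

    public static void main(String[] args) {
        // In a real application the text would be read from the config file at startup.
        VoiceSettings settings = parse("language=de-DE\nspeed=1.25\n");
        System.out.println(settings);
    }
}
```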

Contributing

Contributions to the project are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
