This project is a Visual and Audio Reading Assistant developed as an undergraduate project at the Department of Electrical & Computer Engineering, University of Peloponnese, for the course "Digital Sound and Image Processing." The project was carried out by Tsikelis G. under the supervision of Associate Prof. Athanasios Koutras.
Speakify is an accessible tool designed to assist users in reading and understanding text through visual and audio formats. The application allows users to open and read various file types, describe images using AI, convert text to speech, and customize audio playback settings. It provides support for multiple languages and gender-specific voice options.
- File Reading: Reads text files and PDFs; EPUB support is planned.
- Image Description: Utilizes a machine learning model for image captioning, with translation to the user's preferred language.
- Text-to-Speech Conversion: Converts text to speech using Google Cloud's Text-to-Speech API, allowing customization of voice settings such as gender, speed, pitch, and volume.
- Audio Playback: Includes audio playback controls with play, pause, resume, and reset functionality.
- Language Support: Offers multilingual support for English, Greek, German, French, and Spanish.
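As an illustration of how the voice settings listed above could map onto a Google Cloud Text-to-Speech request, here is a minimal sketch. The `VoiceSettings` class, the `LANGUAGE_CODES` mapping, and the clamping bounds are assumptions made for this example and are not taken from the Speakify source; check the `AudioConfig` reference for the exact limits the API accepts. The `synthesize` function needs valid credentials and network access, so only the settings logic runs offline.

```python
from dataclasses import dataclass

# Assumed mapping from the languages named in the feature list to BCP-47 codes.
LANGUAGE_CODES = {
    "English": "en-US",
    "Greek": "el-GR",
    "German": "de-DE",
    "French": "fr-FR",
    "Spanish": "es-ES",
}


def clamp(value, low, high):
    """Keep value inside the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class VoiceSettings:  # hypothetical container, not part of Speakify
    language: str = "English"
    gender: str = "FEMALE"   # "MALE", "FEMALE" or "NEUTRAL"
    speed: float = 1.0       # 1.0 is normal speaking rate
    pitch: float = 0.0       # semitones relative to the default voice
    volume_db: float = 0.0   # gain in dB relative to the default volume

    def normalized(self):
        """Return a copy with values clamped to conservative bounds (assumed)."""
        return VoiceSettings(
            language=self.language,
            gender=self.gender.upper(),
            speed=clamp(self.speed, 0.25, 2.0),
            pitch=clamp(self.pitch, -20.0, 20.0),
            volume_db=clamp(self.volume_db, -96.0, 16.0),
        )


def synthesize(text, settings, out_path="speech.mp3"):
    """Send text to Cloud Text-to-Speech and save the returned MP3 bytes."""
    from google.cloud import texttospeech  # imported lazily; needs credentials

    s = settings.normalized()
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=LANGUAGE_CODES[s.language],
            ssml_gender=texttospeech.SsmlVoiceGender[s.gender],
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
            speaking_rate=s.speed,
            pitch=s.pitch,
            volume_gain_db=s.volume_db,
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)
    return out_path
```

Keeping the clamping in a separate `normalized` step means out-of-range slider values can be corrected before any network call is made.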
- Multi-format support for reading text files and PDFs
- AI-powered image description with real-time translation
- Customizable text-to-speech settings
- User-friendly graphical interface with tkinter
- Context menu for easy text manipulation (cut, copy, paste)
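The multi-format reading mentioned above could be handled by dispatching on the file extension. The following is a minimal sketch, not the project's actual code: `read_document` is a hypothetical helper, and the PDF branch assumes a PyPDF2 release that provides `PdfReader` (2.0 or later).

```python
from pathlib import Path


def read_document(path):
    """Return the text of a .txt or .pdf file as a single string."""
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix == ".txt":
        # errors="replace" avoids a crash on unexpected bytes in the file
        return path.read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # imported lazily; PyPDF2 >= 2.0 assumed

        reader = PdfReader(str(path))
        # extract_text() can yield nothing for scanned (image-only) pages
        pages = [page.extract_text() or "" for page in reader.pages]
        return "\n".join(pages)

    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Under this layout, a later EPUB branch would only need one more `if` block, and unsupported formats fail with a clear error instead of being read as raw bytes.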
To run this project, the following dependencies must be installed:
- Python 3.7 or higher
- tkinter for the graphical user interface (usually included with Python)
- PyPDF2 for reading PDF files
- google-cloud-texttospeech and google-cloud-vision for cloud-based text-to-speech and image processing
- transformers for the AI image description model
- pygame for audio playback
- pillow for image handling
You will also need to set up Google Cloud credentials for the Text-to-Speech and Vision APIs.
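A quick way to confirm that the dependencies listed above are importable before launching the application is sketched below. The `missing_modules` helper is hypothetical and uses only the standard library; the names in `REQUIRED_MODULES` are import names, which differ from the pip package names in some cases (for example, pillow is imported as `PIL`).

```python
import importlib.util

# Import names (not pip package names) for the dependencies listed above.
REQUIRED_MODULES = [
    "tkinter",
    "PyPDF2",
    "google.cloud.texttospeech",
    "google.cloud.vision",
    "transformers",
    "pygame",
    "PIL",  # provided by the pillow package
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of names that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules were found.")
```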
- Install Python Dependencies:
  pip install PyPDF2 google-cloud-texttospeech google-cloud-vision transformers pygame pillow
  (tkinter is part of the Python standard library and cannot be installed with pip; on some Linux distributions it comes from a separate system package such as python3-tk.)
- Configure Google Cloud Credentials:
  - Replace the placeholder GOOGLE_APPLICATION_CREDENTIALS with the path to your Google Cloud JSON credentials file.
- Run the Application:
  python visualreader.py
  - The graphical user interface will open, allowing you to interact with the application's features.
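For the credentials step, one option is to validate the JSON file and set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable before any Google client is created. This is a sketch under that assumption; `configure_credentials` is a hypothetical helper, and how `visualreader.py` itself reads the path is not shown in this README.

```python
import json
import os
from pathlib import Path


def configure_credentials(key_path):
    """Check a credentials JSON file and expose it to the Google client libraries."""
    path = Path(key_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")

    # Fail early if the file is not a JSON object (json.load raises ValueError
    # subclasses for malformed input).
    with path.open(encoding="utf-8") as f:
        info = json.load(f)
    if not isinstance(info, dict):
        raise ValueError("Credentials file must contain a JSON object")

    # The Google client libraries read this variable when a client is created.
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return path
```

Calling this once at startup, before constructing the Text-to-Speech or Vision clients, turns a misconfigured path into an immediate and readable error.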
- Make sure to configure the Google Cloud credentials correctly.
- The image description feature requires an internet connection for AI processing.
- Audio playback options (play, pause, resume) will be enabled after text-to-speech conversion.
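The rule that playback controls become available only after text-to-speech conversion can be modeled as a small state machine. The sketch below is illustrative only: `PlaybackController` and its state names are hypothetical, and the backend is expected to offer `load`, `play`, `pause`, `unpause`, and `stop`, which matches the functions in `pygame.mixer.music`.

```python
# Which controls are usable in each state (assumed design for this sketch).
ENABLED = {
    "empty": set(),                  # nothing converted yet
    "ready": {"play"},               # audio loaded, not playing
    "playing": {"pause", "reset"},
    "paused": {"resume", "reset"},
}


class PlaybackController:  # hypothetical, not part of Speakify
    def __init__(self, music):
        self.music = music           # backend, e.g. pygame.mixer.music
        self.state = "empty"

    def load(self, path):
        """Call after text-to-speech conversion has produced an audio file."""
        self.music.load(path)
        self.state = "ready"

    def play(self):
        if self.state == "ready":
            self.music.play()
            self.state = "playing"

    def pause(self):
        if self.state == "playing":
            self.music.pause()
            self.state = "paused"

    def resume(self):
        if self.state == "paused":
            self.music.unpause()
            self.state = "playing"

    def reset(self):
        """Stop playback so the next play() starts from the beginning."""
        if self.state in ("playing", "paused"):
            self.music.stop()
            self.state = "ready"

    def enabled_buttons(self):
        return ENABLED[self.state]


def make_pygame_controller():
    """Build a controller backed by pygame (requires an audio device)."""
    import pygame  # imported lazily so the class can be used without pygame

    pygame.mixer.init()
    return PlaybackController(pygame.mixer.music)
```

This sketch does not detect when a track finishes on its own; a real interface would also need to poll the backend and return to the "ready" state at that point.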
This project is intended for educational purposes and may not be used for commercial applications without proper licensing.