Rajkiumar/PDF_to_Audio_Python

PDF to Speech

A tiny Python script (main.py) that opens a file picker, lets you choose a PDF file, and reads the text out loud using your system TTS engine.

This repository contains a single script that demonstrates a simple PDF-to-speech utility using PyPDF2 to extract text and pyttsx3 for text-to-speech.

What this does

  • Opens a file dialog to select a PDF file.
  • Extracts text from each page using PyPDF2.
  • Speaks each page's extracted text using pyttsx3.

The core logic lives in main.py:

  • Uses tkinter.filedialog.askopenfilename() to let you pick a PDF.
  • Uses PyPDF2.PdfReader to read pages and page.extract_text() to get text.
  • Uses pyttsx3 to speak the extracted text.

Requirements

  • Python 3.7+ (tested on Windows)
  • The following Python packages:
    • pyttsx3
    • PyPDF2

Note: tkinter is part of the standard library on most official Python builds for Windows; if you used a minimal or custom build that lacks tkinter, install a Python distribution that includes it.

Install

Open PowerShell and run:

pip install pyttsx3 PyPDF2

(If you use a virtual environment, activate it first.)

Usage

Run the script from the project folder:

python main.py

A file selection dialog will open. Choose a PDF file. The script will extract text page-by-page and speak it aloud.

Stopping the script:

  • You can stop the reading by focusing the terminal and pressing Ctrl+C.
  • Closing the terminal or stopping Python will halt the speech.

Limitations & Notes

  • Text extraction depends on the PDF contents. If a PDF contains scanned pages (images), PyPDF2's extract_text() will typically return an empty string for those pages, and the script skips them silently. To read such pages you need OCR (for example, pytesseract) to extract text from the page images.

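  • The script does not currently handle errors such as selecting a non-PDF file, missing read permissions, or a canceled file dialog. For example, canceling the dialog makes askopenfilename() return an empty value, and PdfReader then raises an exception when it tries to open it. Consider adding try/except blocks and basic validation.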

  • pyttsx3 uses your platform's TTS backend (SAPI5 on Windows). If you want different voices or to adjust rate/volume, you can modify the script to set properties on the engine (see pyttsx3 docs). Example:

engine = pyttsx3.init()
engine.setProperty('rate', 150)  # words per minute
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)  # choose a voice
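
The validation gaps noted in this section can be closed with a small check before the selected path reaches PdfReader. The sketch below is one possible approach and is not part of main.py: the helper name validate_pdf_path and its error messages are hypothetical, and it uses only the standard library.

```python
import os
from pathlib import Path


def validate_pdf_path(path_str):
    """Return a Path to a readable .pdf file, or raise ValueError saying why not.

    Hypothetical helper for illustration; not part of the current main.py.
    """
    # askopenfilename() returns an empty value when the dialog is canceled.
    if not path_str:
        raise ValueError("No file selected.")

    path = Path(path_str)

    # Cheap extension check; it does not prove the file is a valid PDF.
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Not a PDF file: {path.name}")

    if not path.is_file():
        raise ValueError(f"File not found: {path}")

    if not os.access(path, os.R_OK):
        raise ValueError(f"No read permission: {path}")

    return path
```

In main.py this could wrap the dialog result, for example book = validate_pdf_path(askopenfilename()), inside a try/except ValueError that prints the message and exits. A file that passes these checks can still be corrupt, so the PdfReader call itself is also worth guarding.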

Suggested Improvements

  • Add CLI options to pass the PDF path on the command line (for automation) and to adjust the TTS voice, rate, and volume.
  • Add error handling for file selection failures and PDF read errors.
  • Support OCR fallback (using pytesseract) for scanned PDFs.
  • Provide a simple GUI with play/pause/stop controls instead of blocking the terminal.
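
As a rough illustration of the first suggestion, the sketch below parses a PDF path and TTS settings with argparse and applies them to an engine. Everything in it is an assumption rather than existing project code: the option names (--rate, --volume, --voice-index), their defaults, and the helpers build_parser and apply_tts_options are hypothetical. The engine argument is expected to be the object returned by pyttsx3.init(); only its setProperty and getProperty methods are used.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface for the script."""
    parser = argparse.ArgumentParser(description="Read a PDF file aloud.")
    # Optional positional argument: when omitted, the caller could fall
    # back to the existing file dialog.
    parser.add_argument("pdf", nargs="?", default=None,
                        help="path to the PDF file")
    parser.add_argument("--rate", type=int, default=150,
                        help="speech rate in words per minute")
    parser.add_argument("--volume", type=float, default=1.0,
                        help="volume between 0.0 and 1.0")
    parser.add_argument("--voice-index", type=int, default=0,
                        help="index into the engine's list of voices")
    return parser


def apply_tts_options(engine, args):
    """Apply parsed options to a pyttsx3-style engine object."""
    engine.setProperty("rate", args.rate)
    # Clamp volume into the 0.0 to 1.0 range that pyttsx3 documents.
    engine.setProperty("volume", max(0.0, min(1.0, args.volume)))
    voices = engine.getProperty("voices")
    # Ignore an out-of-range index and keep the engine's default voice.
    if 0 <= args.voice_index < len(voices):
        engine.setProperty("voice", voices[args.voice_index].id)
```

A script using this could call args = build_parser().parse_args(), open the file dialog only when args.pdf is None, and call apply_tts_options(player, args) right after pyttsx3.init().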

Example main.py (current)

The current main.py is small and looks like this:

import pyttsx3
from PyPDF2 import PdfReader
from tkinter.filedialog import askopenfilename

book = askopenfilename()

pdfreader = PdfReader(book)
pages = len(pdfreader.pages)

player = pyttsx3.init()

for num in range(pages):
    page = pdfreader.pages[num]
    text = page.extract_text()
    if text:
        player.say(text)
        player.runAndWait()
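
The if text: check in the loop above skips pages with no extractable text without telling the user. A small helper like the one below could report which pages were skipped, which is often a hint that they are scanned images. This is a sketch only: the function name split_pages_by_text is hypothetical, and a page with no text may simply be blank rather than in need of OCR.

```python
def split_pages_by_text(page_texts):
    """Split 1-based page numbers into (speakable, no_text) lists.

    page_texts is any iterable of per-page extraction results, in page order.
    """
    speakable, no_text = [], []
    for number, text in enumerate(page_texts, start=1):
        # Treat None, empty, and whitespace-only results as "no text".
        if text and text.strip():
            speakable.append(number)
        else:
            no_text.append(number)
    return speakable, no_text
```

With the reader from the script, this could be called as split_pages_by_text(page.extract_text() for page in pdfreader.pages), and the second list printed before reading starts.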

License

This README and any added small scripts are provided under the MIT License.

Contact / Next steps

Possible follow-up work:

  • Add argument parsing so files can be passed on the command line.
  • Add error handling and a small requirements.txt.
  • Add an example showing how to change voice settings or save speech to an audio file.
