In [2]:
from PyPDF2 import PdfReader
import pyttsx3
import os

In [2]:
chapter_dict = {
    "Basics of Derivatives" : (11,18),
    "Understanding the Index" : (18,27),
    "Introduction to Forwards and Futures" : (27,55),
    "Introduction to Options" : (55,87),
    "Strategies using Equity futures and Equity options" : (87,124),
    "Trading Mechanism" : (124,140),
    "Introduction to Clearing and Settlement System" : (140,161),
    "Legal and Regulatory Environment" : (161,170),
    "Accounting and Taxation" : (170,182),
    "Sales Practices and Investors Protection Services" : (182,195)
    
}


def convert_to_audio(txt: str, file_name: str):
    """Convert the given text into speech and save it to file_name."""
    engine = pyttsx3.init()
    # The audio encoding is chosen by the platform's speech driver, so the
    # file may not contain true MP3 data even though its name ends in .mp3.
    engine.save_to_file(txt, file_name)
    engine.runAndWait()
    engine.stop()

    
def generate_audio_text(pdf_object: PdfReader, chapter_name: str, start_page: int, end_page: int):
    """Extract text from specified pages of a PDF and convert it to audio."""
    print(f"Chapter Name: {chapter_name}.mp3, starts_at={start_page}, ends_at={end_page}")
    text = ""
    
    for x in range(start_page, end_page):
        page_text = pdf_object.pages[x].extract_text()
        if page_text:  # Ensure there is text extracted from the page
            text += page_text
    
    # Build the file name by replacing spaces with underscores
    # (other special characters are left unchanged)
    file_name = f"{chapter_name.replace(' ', '_')}.mp3"
    
    convert_to_audio(text, file_name)

    
def main():
    """Main function to read the PDF and convert chapters to audio."""
    reader = PdfReader("NISM Series VIII- Equity Derivatives Certification Examination Workbook - May 2023.pdf")
    
    for chapter_name, page_numbers in chapter_dict.items():
        generate_audio_text(
            pdf_object=reader,
            chapter_name=chapter_name, 
            start_page=page_numbers[0],
            end_page=page_numbers[1]
        )

if __name__ == "__main__":
    main()

Chapter Name: Basics of Derivatives.mp3, starts_at=11, ends_at=18


KeyboardInterrupt: 

In [None]:
PDF to Audio Converter V4

This project is a Python script that converts chapters from a PDF workbook into audio files using text-to-speech (TTS) technology. The audio is generated for each chapter and saved as an MP3 file.

Prerequisites:

Before running the script, you will need to install the required libraries:

pyttsx3 - For text-to-speech conversion.
PyPDF2 - For reading PDF files and extracting text.

You can install these libraries using pip:

pip install pyttsx3 PyPDF2

How It Works:

Chapter Data:
The script processes a predefined set of chapters listed in chapter_dict. Each entry maps a chapter name to a (start_page, end_page) tuple. These values are used directly as indices into the reader's zero-based page list, and the end page is exclusive, which is why each chapter's end value equals the next chapter's start value.

Text Extraction:
PdfReader from the PyPDF2 library reads the PDF. The text of each page in the chapter's range is extracted and concatenated into a single string; pages that yield no text are skipped.

Text-to-Speech Conversion:
The extracted text is converted into speech using the pyttsx3 library. The resulting audio is saved to a file named after the chapter, with an .mp3 extension.
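
The chapter-range and extraction steps above can be sketched in isolation as follows. This is a minimal sketch, not the script's actual code: check_chapter_ranges and extract_chapter_text are hypothetical helper names, and the reader argument is only assumed to expose pages[i].extract_text(), as PyPDF2's PdfReader does.

```python
from typing import Dict, List, Tuple


def check_chapter_ranges(chapters: Dict[str, Tuple[int, int]], page_count: int) -> List[str]:
    """Return a list of human-readable problems found in the page ranges."""
    problems = []
    for name, (start, end) in chapters.items():
        if not 0 <= start < end:
            problems.append(f"{name}: start {start} must be >= 0 and below end {end}")
        elif end > page_count:
            problems.append(f"{name}: end {end} exceeds the {page_count} pages available")
    return problems


def extract_chapter_text(reader, start_page: int, end_page: int) -> str:
    """Join the text of pages start_page..end_page-1 (end is exclusive)."""
    parts = []
    for index in range(start_page, end_page):
        page_text = reader.pages[index].extract_text()
        if page_text:  # skip pages with no extractable text
            parts.append(page_text)
    # A newline between pages keeps the last word of one page from fusing
    # with the first word of the next; the original script concatenates directly.
    return "\n".join(parts)
```

With the real library, check_chapter_ranges(chapter_dict, len(reader.pages)) could be called once before any audio is generated, so that a mistyped page number is reported up front rather than partway through a long conversion.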

Features:

Converts PDF chapters to MP3 audio files.
Supports multiple chapters with defined page ranges.
Saves each chapter as a separate MP3 file named after the chapter.

How to Run:

Download the Python script and the required PDF file: NISM Series VIII- Equity Derivatives Certification Examination Workbook - May 2023.pdf.
Make sure the required libraries are installed (pip install pyttsx3 PyPDF2).
Run the script from the directory that contains the PDF, because the script opens it by a relative file name:
python script_name.py
The script writes one MP3 file per chapter into the current working directory.
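
If the audio files should be collected in a dedicated folder instead, the file name could be routed through a small helper before it is passed to convert_to_audio. This is only a sketch: output_path is a hypothetical helper, and the default folder name "audio" is an assumption, not something the script defines.

```python
from pathlib import Path


def output_path(file_name: str, out_dir: str = "audio") -> Path:
    """Return out_dir/file_name, creating out_dir if it does not exist."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / file_name
```

Inside generate_audio_text, the call would then become convert_to_audio(text, str(output_path(file_name))).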

File Naming Convention:

Each audio file will be named based on the chapter name, with spaces replaced by underscores. For example, the "Basics of Derivatives" chapter will be saved as Basics_of_Derivatives.mp3.
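
The script itself only replaces spaces. For chapter titles that contain characters such as slashes or colons, a stricter variant might look like the sketch below. safe_file_name is a hypothetical helper, not part of the script; for the chapter titles in chapter_dict it produces the same names as the current code.

```python
import re


def safe_file_name(chapter_name: str, extension: str = ".mp3") -> str:
    """Build a file name from a chapter title.

    Runs of characters other than letters, digits, underscores and hyphens
    are collapsed into a single underscore.
    """
    stem = re.sub(r"[^\w\-]+", "_", chapter_name).strip("_")
    return f"{stem}{extension}"


print(safe_file_name("Basics of Derivatives"))  # Basics_of_Derivatives.mp3
```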

Sample Output:

Basics_of_Derivatives.mp3
Understanding_the_Index.mp3
Introduction_to_Forwards_and_Futures.mp3

Troubleshooting:

If text extraction produces little or no text, check whether the PDF is encrypted or consists of scanned page images. Scanned PDFs have no text layer, so run them through an OCR (Optical Character Recognition) tool first.
If no audio is produced, check that a TTS engine is available to pyttsx3 on your system. The voice and speech rate can be adjusted through the engine's properties.
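
Both checks can be sketched as small helpers. The function names pages_without_text and configure_engine are hypothetical, and the default rate of 150 is an arbitrary example value. The helpers rely only on pages[i].extract_text() from PyPDF2 and on pyttsx3's getProperty and setProperty calls with the "rate", "voices" and "voice" properties.

```python
def pages_without_text(reader, start_page: int, end_page: int) -> list:
    """Return indices in [start_page, end_page) whose pages yield no text."""
    empty = []
    for index in range(start_page, end_page):
        text = reader.pages[index].extract_text()
        if not text or not text.strip():
            empty.append(index)
    return empty


def configure_engine(engine, rate: int = 150, voice_index: int = 0):
    """Set the speech rate and pick a voice by position in the voice list."""
    engine.setProperty("rate", rate)  # words per minute
    voices = engine.getProperty("voices")
    if voices:
        # Clamp the index so an out-of-range request falls back to the last voice.
        chosen = voices[min(voice_index, len(voices) - 1)]
        engine.setProperty("voice", chosen.id)
    return engine


# Usage with the real libraries (not executed here):
#   engine = configure_engine(pyttsx3.init(), rate=150)
#   print(pages_without_text(reader, 11, 18))
```

A chapter in which most pages are reported as empty is a strong hint that the PDF is scanned. PdfReader also exposes an is_encrypted attribute that can be checked before extraction.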