### Installing Required Libraries

This cell handles the installation of libraries necessary for the PDF to audio conversion process:

- `pdfplumber`: A Python library for reading and extracting data from PDFs, crucial for accessing the text content within the PDF file.
- `openai`: The OpenAI Python library, used here for its text-to-speech capabilities, allowing us to convert extracted text into audio.
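- `pydub`: A Python library for audio manipulation. It offers an alternative way to combine audio files, but it needs FFmpeg to read and write MP3; the cells below combine audio with `moviepy` instead.
- `moviepy`: A video editing library that can also load and concatenate audio. This notebook uses it to combine the audio chunks and, optionally, to create an MP4 from a cover image. The code relies on the `moviepy.editor` module, which is only available in `moviepy` 1.x.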

In [None]:
# Install the pdfplumber library for extracting text from PDF files
!pip install pdfplumber

# Install the openai library to use OpenAI's text-to-speech API
!pip install openai

# Install pydub for audio file manipulation (like combining audio files)
!pip install pydub

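# Install moviepy (used to combine the audio chunks and for the optional MP4 step).
# Pin to 1.x: the cells below import moviepy.editor, which moviepy 2.0 removed.
!pip install "moviepy<2.0"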
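
Since the later cells depend on the `moviepy` 1.x API, it can help to confirm which package versions were actually installed. The sketch below uses only the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name, not part of any of the libraries above.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None

# Report the version of each library this notebook installs
for package in ["pdfplumber", "openai", "pydub", "moviepy"]:
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

A `moviepy` version starting with `2.` means the `moviepy.editor` imports below will fail.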


### Extracting Text from PDF and Formatting as Markdown

This cell defines the `pdf_to_markdown` function, which is used to extract text from a PDF file and format it in Markdown. The process involves:

1. Opening the PDF file using `pdfplumber`, which allows us to access its content.
2. Iterating through each page of the PDF.
3. Extracting text from each page.
4. Applying basic Markdown formatting: every line break is replaced with a double newline, so each extracted line becomes its own Markdown paragraph, and a `---` separator line is added after each page.

The function takes the path of the PDF file as input and returns the formatted text. The next cell shows an example usage: it calls the function with a PDF file path and prints the output.

In [None]:
# Importing the pdfplumber library for PDF text extraction
import pdfplumber

def pdf_to_markdown(pdf_path):
    # Open the PDF file at the given path
    with pdfplumber.open(pdf_path) as pdf:
        markdown_content = ""
        # Loop through each page in the PDF
        for page in pdf.pages:
            # Extract text from each page
            text = page.extract_text()
            if text:
                # Format the text with basic Markdown: double newline for new paragraphs
                markdown_page = text.replace('\n', '\n\n')
                # Add a separator line between pages
                markdown_content += markdown_page + '\n\n---\n\n'

        return markdown_content

In [None]:
# Function Usage
pdf_path = 'Kant_What_is_Enlightenment.pdf'  # Replace with the actual PDF file path
markdown_text = pdf_to_markdown(pdf_path)
print(markdown_text)  # Print the extracted and formatted text
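
Because `pdf_to_markdown` replaces every line break, lines that the PDF merely wrapped also become separate paragraphs. The per-page formatting can be checked without a PDF by applying the same string operations to a sample string; `format_page_text` below is a hypothetical helper that reproduces the loop body, not part of `pdfplumber`.

```python
def format_page_text(text):
    """Apply the same per-page formatting used by pdf_to_markdown (hypothetical helper)."""
    if not text:
        return ""  # extract_text() may return None or "" for pages without text
    # Every single line break becomes a blank-line paragraph break
    markdown_page = text.replace('\n', '\n\n')
    # Append the page separator
    return markdown_page + '\n\n---\n\n'

page_text = "What is Enlightenment?\nEnlightenment is man's emergence\nfrom his self-incurred immaturity."
print(format_page_text(page_text))
```

In the printed result, the wrapped sentence "Enlightenment is man's emergence / from his self-incurred immaturity." ends up split across two paragraphs.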

### Converting Markdown to Plain Text

This cell introduces the `markdown_to_plain_text` function, which is designed to convert Markdown-formatted text into plain text. The function utilizes regular expressions (regex) to achieve this, working as follows:

1. Identify and remove Markdown URL syntax, leaving only the link text.
2. Strip out bold (`**`) and italic (`*` or `_`) Markdown formatting.
3. Remove Markdown headers, list item markers, blockquote markers, and horizontal rules (such as the `---` page separators).

The purpose of this function is to strip Markdown formatting from the extracted text so that it is suitable for speech synthesis in the next steps. The next cell converts the previously formatted Markdown text into plain text and prints the result.


In [None]:
# Importing the 're' module for regular expression operations
import re

def markdown_to_plain_text(markdown_text):
    # Line-level syntax first. Each pattern is anchored to the start of a line
    # (re.MULTILINE), so hyphens, '#' and '>' inside ordinary text survive
    # (e.g. "self-incurred" keeps its hyphen)
    text = re.sub(r'^[ \t]*-{3,}[ \t]*$', '', markdown_text, flags=re.MULTILINE)  # Horizontal rules
    text = re.sub(r'^[ \t]*#+[ \t]*', '', text, flags=re.MULTILINE)      # Headers
    text = re.sub(r'^[ \t]*[-*+][ \t]+', '', text, flags=re.MULTILINE)   # List items
    text = re.sub(r'^[ \t]*>[ \t]?', '', text, flags=re.MULTILINE)       # Blockquotes

    # Remove Markdown URL syntax ([text](link)) and keep only the text
    text = re.sub(r'\[([^\]]+)\]\([^\)]+\)', r'\1', text)

    # Remove Markdown formatting for bold and italic text
    text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)  # Bold with **
    text = re.sub(r'\*([^*]+)\*', r'\1', text)      # Italic with *
    text = re.sub(r'__([^_]+)__', r'\1', text)      # Bold with __
    text = re.sub(r'_([^_]+)_', r'\1', text)        # Italic with _

    return text

In [None]:
# Function Usage
plain_text = markdown_to_plain_text(markdown_text)
print(plain_text)  # Printing the converted plain text
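
An unanchored pattern such as `-\s?` matches every hyphen in the text, not only list markers at the start of a line. The comparison below, using only the standard `re` module and a made-up sample string, contrasts it with a pattern anchored by `^` under `re.MULTILINE`.

```python
import re

sample = "- First item\n- Second item\nEnlightenment is man's emergence from his self-incurred immaturity."

# Unanchored: removes every hyphen, including the one inside "self-incurred"
unanchored = re.sub(r'-\s?', '', sample)

# Anchored: '^' with re.MULTILINE only matches at the start of each line,
# so only the list markers are removed
anchored = re.sub(r'^[ \t]*[-*+][ \t]+', '', sample, flags=re.MULTILINE)

print(unanchored)
print(anchored)
```

The first result contains "selfincurred", while the second keeps "self-incurred" and still drops both list markers.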

### Final Text Cleaning

This cell is dedicated to further cleaning the plain text extracted from the PDF. In some cases, the text conversion process may leave behind unwanted artifacts or specific words that are not relevant or desirable for the final audio output. This step allows for the removal of such elements. 

The cell shows how to remove a specific word or artifact from the text: here, every occurrence of the string "artifact" is removed. The same approach can target any other word or symbol that remains after the earlier conversion and formatting steps.


In [None]:
# Further cleaning of the plain text
# Here, we are removing a specific unwanted word or artifact from the text
# Replace "artifact" with any specific word or symbol you need to remove
cleaned_text = plain_text.replace("artifact", "")

# Printing the cleaned text to verify the changes
print(cleaned_text)
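
`str.replace` is case-sensitive and also matches inside longer words, so removing "artifact" turns "artifacts" into "s" and leaves double spaces behind. When several unwanted words need to go, a regular expression with word boundaries is one option. `remove_words` below is a hypothetical helper and the word list is only illustrative; word boundaries (`\b`) behave as expected only for tokens that start and end with letters or digits, so plain `str.replace` remains simpler for standalone symbols.

```python
import re

def remove_words(text, unwanted_words, ignore_case=True):
    """Remove whole-word occurrences of each unwanted word, then tidy spacing."""
    if not unwanted_words:
        return text
    # \b anchors each alternative to word boundaries; re.escape protects special characters
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in unwanted_words) + r')\b'
    flags = re.IGNORECASE if ignore_case else 0
    text = re.sub(pattern, '', text, flags=flags)
    # Collapse runs of spaces/tabs left behind (newlines are kept)
    text = re.sub(r'[ \t]{2,}', ' ', text)
    # Remove spaces left in front of punctuation
    text = re.sub(r'[ \t]+([.,;:!?])', r'\1', text)
    return text

print(remove_words("An artifact here, and Artifact there. Artifacts stay.", ["artifact"]))
```

In the notebook, `cleaned_text = remove_words(plain_text, ["artifact"])` would take the place of the `replace` call.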


### Splitting Text into Manageable Chunks for Text-to-Speech

The OpenAI text-to-speech API accepts at most 4096 characters of input per request, so this cell defines a `split_text` function that divides the cleaned text into smaller chunks, each within that limit. The process is as follows:

1. The function splits the text into sentences.
2. It adds sentences to the current chunk one at a time while the chunk stays within the character limit.
3. When the next sentence would push the chunk past the limit, the current chunk is saved and a new chunk starts with that sentence.
4. This continues until every sentence has been assigned to a chunk.

The function takes the whole text and an optional `max_chunk_size` parameter (default: 4096 characters) and returns a list of text chunks, each within that limit. The next cell splits the text and prints each chunk with its number for easy identification.


In [None]:
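import re

def split_text(text, max_chunk_size=4096):
    chunks = []  # List to hold the chunks of text
    current_chunk = ""  # String to build the current chunk

    # Split the text into sentences after '.', '!' or '?' followed by whitespace,
    # keeping the punctuation attached to each sentence
    for sentence in re.split(r'(?<=[.!?])\s+', text):
        sentence = sentence.strip()  # Remove leading/trailing whitespace
        if not sentence:
            continue  # Skip empty sentences

        # Join sentences with a single space
        candidate = f"{current_chunk} {sentence}" if current_chunk else sentence

        # Keep adding sentences while the chunk stays within the limit
        if len(candidate) <= max_chunk_size:
            current_chunk = candidate
            continue

        # The sentence does not fit: save the current chunk (never an empty one)
        if current_chunk:
            chunks.append(current_chunk)

        # Hard-split a single sentence that is longer than the limit on its own
        while len(sentence) > max_chunk_size:
            chunks.append(sentence[:max_chunk_size])
            sentence = sentence[max_chunk_size:]
        current_chunk = sentence  # Start a new chunk

    # Add the last chunk if it's not empty
    if current_chunk:
        chunks.append(current_chunk)

    return chunks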

In [None]:
# Function Usage
# chunks = split_text(cleaned_text) # use this if you have cleaned the text, else use the next line.
chunks = split_text(plain_text)

# Printing each chunk with its number
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:\n{chunk}\n---\n")
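
Before sending anything to the API, it can be worth checking the chunk list: an empty chunk or a chunk longer than the limit would likely be rejected by the request. `check_chunks` below is a hypothetical helper that reports such problems; it only inspects the list and calls no API.

```python
def check_chunks(chunks, max_chunk_size=4096):
    """Return a list of problems found in the chunk list (an empty list means OK)."""
    problems = []
    for i, chunk in enumerate(chunks, 1):
        if not chunk.strip():
            problems.append(f"Chunk {i} is empty")
        elif len(chunk) > max_chunk_size:
            problems.append(f"Chunk {i} has {len(chunk)} characters (limit {max_chunk_size})")
    return problems

example_chunks = ["First sentence.", "", "x" * 5000]
for problem in check_chunks(example_chunks):
    print(problem)
```

In the notebook, `check_chunks(chunks)` can be run right after the cell above; an empty result means every chunk can be sent as-is.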


### Text-to-Speech Conversion

This cell contains the `text_to_speech` function, which utilizes OpenAI's text-to-speech API to convert text into audio. The function performs the following steps:

1. Initializes an OpenAI client to interact with the API.
2. Sends a request to the Audio API with the specified text, model, and voice. The `model` parameter selects the speech model (`tts-1` is optimized for low latency, `tts-1-hd` for quality), and the `voice` parameter selects the voice, such as `nova` or `alloy`.
3. Receives the audio response from the API and streams it to a specified output file.

The function takes `input_text` (the text to be converted), `output_file` (the path to save the audio file), and optional parameters for `model` and `voice`. After executing, it saves the audio file to the specified path and prints a confirmation message.

The next section calls this function once per text chunk, producing one audio file per chunk.


In [None]:
# Importing necessary modules
from pathlib import Path
import openai

def text_to_speech(input_text, output_file, model="tts-1-hd", voice="nova"):
    # Initialize the OpenAI client (it reads the OPENAI_API_KEY environment variable)
    client = openai.OpenAI()

    # Define the path for the output audio file
    speech_file_path = Path(output_file)

    # Request speech from OpenAI's Audio API and stream the audio to the file.
    # with_streaming_response writes the audio as it arrives; calling
    # stream_to_file on a plain create() response is deprecated in recent
    # versions of the openai library
    with client.audio.speech.with_streaming_response.create(
        model=model,      # Speech model, e.g. "tts-1" (faster) or "tts-1-hd" (higher quality)
        voice=voice,      # Voice type, e.g. "nova" or "alloy"
        input=input_text  # The text to be converted into speech (at most 4096 characters)
    ) as response:
        response.stream_to_file(speech_file_path)

    # Print confirmation message after saving the audio file
    print(f"Audio saved to {speech_file_path}")
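
`openai.OpenAI()` reads the API key from the `OPENAI_API_KEY` environment variable by default. Checking for it once, before looping over many chunks, surfaces a missing key with a clear message up front. `require_env_var` is a hypothetical helper that uses only the standard library.

```python
import os

def require_env_var(name="OPENAI_API_KEY"):
    """Return the value of an environment variable or raise a descriptive error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Set it in your environment (or in the notebook with "
            f"os.environ['{name}'] = '...') before running the text-to-speech cells."
        )
    return value
```

For example, calling `require_env_var()` at the top of the conversion cell stops the run before any chunk is processed.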


### Converting Text Chunks to Audio Files

In this cell, we define the `convert_chunks_to_audio` function, which processes each text chunk through the `text_to_speech` function and saves the resulting audio files. The steps are as follows:

1. Iterate over the chunks of text.
2. For each chunk, create a filename for the output audio file, ensuring it is saved in the specified output folder.
3. Convert each text chunk to an audio file using the `text_to_speech` function defined earlier.
4. Store the path of each generated audio file in a list.

The function takes two arguments: `chunks`, which is a list of text segments, and `output_folder`, the directory where the audio files should be saved. After processing, it returns a list of file paths for the generated audio files. This is particularly useful for handling multiple audio files corresponding to different sections of the original PDF.

The next cell shows an example usage: the function is called with the text chunks and an output folder, producing a separate audio file for each chunk.


In [None]:
# Importing necessary modules
import os

def convert_chunks_to_audio(chunks, output_folder):
    audio_files = []  # List to store the paths of generated audio files

    # Create the output folder if it does not exist yet
    os.makedirs(output_folder, exist_ok=True)

    # Iterate over each chunk of text
    for i, chunk in enumerate(chunks):
        # Zero-padded names (chunk_001.mp3, chunk_002.mp3, ...) keep the files
        # in the right order when they are later sorted by name
        output_file = os.path.join(output_folder, f"chunk_{i+1:03d}.mp3")

        # Convert the text chunk to speech and save as an audio file
        text_to_speech(chunk, output_file)

        # Append the path of the created audio file to the list
        audio_files.append(output_file)

    return audio_files  # Return the list of audio file paths

In [None]:
# Function Usage
output_folder = "chunks"  # Define the folder to save audio chunks
audio_files = convert_chunks_to_audio(chunks, output_folder)  # Convert chunks to audio files
print(audio_files) # print list of all the audio files generated
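
If the conversion stops partway (for example, on a network error), rerunning `convert_chunks_to_audio` regenerates every file and repeats the API calls. A minimal sketch of a resumable variant skips files that already exist. `convert_chunks_resumable` is hypothetical; it takes the TTS function as a parameter so it can be tried with a stand-in before spending API calls.

```python
import os

def convert_chunks_resumable(chunks, output_folder, tts_func):
    """Convert chunks to audio, skipping chunks whose output file already exists."""
    os.makedirs(output_folder, exist_ok=True)  # Create the folder if needed
    audio_files = []
    for i, chunk in enumerate(chunks, 1):
        # Zero-padded names keep files in order when sorted by name (up to 999 chunks)
        output_file = os.path.join(output_folder, f"chunk_{i:03d}.mp3")
        if os.path.exists(output_file) and os.path.getsize(output_file) > 0:
            print(f"Skipping existing file {output_file}")
        else:
            tts_func(chunk, output_file)  # e.g. text_to_speech(chunk, output_file)
        audio_files.append(output_file)
    return audio_files
```

In the notebook this would be called as `convert_chunks_resumable(chunks, "chunks", text_to_speech)`. Existing files are reused without checking their contents, so delete the folder whenever the text or the chunking changes.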

### Combining Individual Audio Clips into a Single File

This cell contains the `combine_audio_with_moviepy` function, which combines multiple audio clips into a single audio file using the `moviepy` library. The function follows these steps:

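1. Iterate through the `.mp3` files in the specified folder in sorted order. The filenames must therefore sort in the order the chunks were generated (for example, `chunk_2.mp3` before `chunk_10.mp3`).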
2. For each audio file, create an `AudioFileClip` object and add it to a list.
3. Once all audio clips are collected, use `concatenate_audioclips` to merge them into a single continuous audio clip.
4. Write the combined clip to an output file.

This function is particularly useful for creating a single audio file from multiple smaller files, which is needed whenever the original text is split into chunks for text-to-speech conversion. If a file cannot be loaded, the error is printed and that file is skipped.

The next cell combines the audio files in the `chunks` folder into a single output file.


In [None]:
# Importing necessary modules from moviepy (1.x API)
from moviepy.editor import concatenate_audioclips, AudioFileClip
import os
import re

def natural_sort_key(name):
    # Compare digit runs as numbers so chunk_2.mp3 sorts before chunk_10.mp3
    return [int(part) if i % 2 else part for i, part in enumerate(re.split(r'(\d+)', name))]

def combine_audio_with_moviepy(folder_path, output_file):
    audio_clips = []  # List to store the audio clips

    # Iterate through each file in the given folder in natural (numeric) order
    for file_name in sorted(os.listdir(folder_path), key=natural_sort_key):
        if file_name.endswith('.mp3'):
            # Construct the full path of the audio file
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")

            try:
                # Create an AudioFileClip object for each audio file
                clip = AudioFileClip(file_path)
                audio_clips.append(clip)  # Add the clip to the list
            except Exception as e:
                # Print any errors encountered while processing the file and skip it
                print(f"Error processing file {file_path}: {e}")

    # Check if there are any audio clips to combine
    if audio_clips:
        # Concatenate all the audio clips into a single clip
        final_clip = concatenate_audioclips(audio_clips)
        # Write the combined clip to the specified output file
        final_clip.write_audiofile(output_file)
        print(f"Combined audio saved to {output_file}")
        # Release the file handles held by the source clips
        for clip in audio_clips:
            clip.close()
    else:
        print("No audio clips to combine.")

In [None]:
# Function Usage
combine_audio_with_moviepy('chunks', 'combined_audio.mp3')  # Combine audio files in 'chunks' folder


### (Optional) Creating an MP4 File with Image and Audio

This cell introduces the `create_mp4_with_image_and_audio` function, which combines an image and an audio file to create an MP4 video. This can be particularly useful for presentations or other scenarios where an audio track needs to be accompanied by a static image. The function performs the following steps:

1. Load the audio file as an `AudioFileClip`.
2. Create a video clip from the specified image using `ImageClip`, setting its duration to match the length of the audio.
3. Set the frames per second (fps) for the video clip.
4. Assign the audio clip as the audio track of the video clip.
5. Write the final video clip to an output file, specifying the video and audio codecs.

The function takes paths for an image file, an audio file, and an output file, along with an optional `fps` parameter (default: 24). The example usage at the end of the cell generates an MP4 file from a cover image and the combined audio file.


In [None]:
# moviepy 1.x API: moviepy 2.x imports from `moviepy` directly and renames
# set_fps/set_audio to with_fps/with_audio
from moviepy.editor import AudioFileClip, ImageClip

def create_mp4_with_image_and_audio(image_file, audio_file, output_file, fps=24):
    # Load the audio file
    audio_clip = AudioFileClip(audio_file)

    # Create a video clip from an image, lasting as long as the audio
    video_clip = ImageClip(image_file, duration=audio_clip.duration)

    # Set the fps for the video clip
    video_clip = video_clip.set_fps(fps)

    # Set the audio of the video clip as the audio clip
    video_clip = video_clip.set_audio(audio_clip)

    # Write the result to a file (H.264 video, AAC audio)
    video_clip.write_videofile(output_file, codec='libx264', audio_codec='aac')

    # Release the file handles held by the clips
    audio_clip.close()
    video_clip.close()

# Example usage
image_file = 'cover_image.png'  # Replace with the path to your image
audio_file = 'combined_audio.mp3'      # The combined audio file
output_file = 'output_video.mp4'       # Output MP4 file
create_mp4_with_image_and_audio(image_file, audio_file, output_file)
