# Create an Audiobook from a PDF
## This task tests your ability to apply text-to-speech conversion and PDF text extraction to create an audiobook from a PDF file

### Steps
- Extract text from PDF file
- Clean the text
- Convert the text into speech
- Save the speech
- Play the speech

## 1. Extract text from PDF
- Use PyPDF2

### Install the library

In [1]:
# Uncomment to install the PDF libraries used below
# !pip install PyPDF2
# !pip install pymupdf

### Import the library

In [2]:
import PyPDF2

### Extract the text

In [3]:
def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, concatenated in page order."""
    text = ""
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)

        for page in pdf_reader.pages:
            # Fall back to an empty string if a page yields no text (e.g. a scanned image)
            text += page.extract_text() or ""

    return text

### Print the extracted text

In [4]:
# Replace 'kopper-justiceandglobalorder2018draft.pdf' with the actual path to your PDF file
pdf_path = 'kopper-justiceandglobalorder2018draft.pdf'

# Call the function to extract text
extracted_text = extract_text_from_pdf(pdf_path)

# Print the extracted text
print("Extracted Text:")
print(extracted_text)

Extracted Text:
 
1 
 
PIRATES,  JUSTICE AND GLOBAL ORDER  
IN THE ANIME ‘ONE PIECE  ‘ 
 
"Remove justice, and what are kingdoms 
but gangs of criminals on a large scale? 
(St. Augustine City of God, Book 4, Ch. 4).  
 
DRAFT, work in progress …do not quote without permission   
 
I. INTRODUCTION  
 
The manga /anime ( Japanese cartoon ) titled  ONE PIECE telling  the story of the pirate 
Monkey D. Luffy and his crew is one of the most successful Japanese cultural products of 
all time . The first volume of One Piece was published in 1997 and it has been published 
weekly ever since. It has  sold over 43 0.000.000 million copies wor ldwide  (70.000.000 
outside Japan) and it has set already years ago the world record  for "The most copies 
published for the same comic book series , by a single author."1 Although One Piece is the 
most successful Japanese manga series there are many others -like Dragon  Ball or Full 
Metal Alchemist2- with fans around the World, making Japanese manga no
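
The steps list includes cleaning the text, and the raw output above shows why: it contains doubled spaces, trailing whitespace, hard line breaks in the middle of sentences, and a stray page number. A minimal sketch of a cleaning step follows. `clean_text` is a hypothetical helper (not part of PyPDF2), and treating digit-only lines as page numbers is an assumption that could remove legitimate content in other PDFs.

```python
import re

def clean_text(raw_text):
    """Normalize whitespace in text extracted from a PDF (illustrative helper)."""
    lines = []
    for line in raw_text.splitlines():
        stripped = line.strip()
        # Assumption: a line holding only digits is a page number, so drop it.
        if stripped.isdigit():
            continue
        lines.append(stripped)
    text = "\n".join(lines)

    # Treat one or more blank lines as a paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    # Inside each paragraph, collapse every whitespace run (including
    # hard line breaks) into a single space.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]

    # Rejoin the non-empty paragraphs with a blank line between them.
    return "\n\n".join(p for p in cleaned if p)
```

Calling `clean_text(extracted_text)` before the text-to-speech step should give the speech engine whole sentences rather than line fragments. It does not repair words that the extractor split apart, such as `wor ldwide` in the output above.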

In [5]:
import fitz  # PyMuPDF

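def extract_specific_text_from_pdf(pdf_path, start_marker, end_marker):
    """Return the text from start_marker through end_marker (inclusive)."""
    text = ""
    # fitz.open() takes the path directly; the context manager closes the document
    with fitz.open(pdf_path) as pdf_document:
        for page in pdf_document:
            text += page.get_text()

    # Find the start and end indices of the desired portion.
    # str.find() returns -1 when the marker is missing, so fail loudly
    # instead of slicing with a meaningless index.
    start_index = text.find(start_marker)
    if start_index == -1:
        raise ValueError(f"Start marker not found: {start_marker!r}")
    end_index = text.find(end_marker, start_index)
    if end_index == -1:
        raise ValueError(f"End marker not found: {end_marker!r}")

    # Extract the specific portion, including the end marker itself
    return text[start_index:end_index + len(end_marker)]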

# Replace 'kopper-justiceandglobalorder2018draft.pdf' with the actual filename
pdf_path = 'kopper-justiceandglobalorder2018draft.pdf'

# Define the markers for the desired portion
start_marker = "PIRATES, JUSTICE AND GLOBAL ORDER"
end_marker = "(St. Augustine City of God, Book 4, Ch. 4)."

# Call the function to extract the specific portion
specific_text = extract_specific_text_from_pdf(pdf_path, start_marker, end_marker)

# Print the extracted specific portion
print("Extracted Specific Text:")
print(specific_text)

Extracted Specific Text:
PIRATES, JUSTICE AND GLOBAL ORDER  
IN THE ANIME ‘ONE PIECE ‘ 
 
"Remove justice, and what are kingdoms 
but gangs of criminals on a large scale? 
(St. Augustine City of God, Book 4, Ch. 4).
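
Marker lookup with `str.find` needs an exact character-for-character match. The PyPDF2 output earlier prints `PIRATES,  JUSTICE` with two spaces, so the single-space marker used here would not match that text. One possible workaround, sketched below, is to match markers with a regular expression that accepts any run of whitespace between words. `marker_pattern` and `extract_between` are hypothetical names introduced for illustration; returning `None` for a missing marker is a design assumption.

```python
import re

def marker_pattern(marker):
    """Build a regex that matches the marker with any whitespace between words."""
    return r"\s+".join(re.escape(word) for word in marker.split())

def extract_between(text, start_marker, end_marker):
    """Return text from start_marker through end_marker, or None if either is missing."""
    start = re.search(marker_pattern(start_marker), text)
    if start is None:
        return None
    # Search for the end marker only after the start marker ends.
    end = re.compile(marker_pattern(end_marker)).search(text, start.end())
    if end is None:
        return None
    return text[start.start():end.end()]
```

This works on any already-extracted string, so it can be applied to the output of either extraction function.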


## 2. Convert the Text into Speech
- Use **pyttsx3** OR **gTTS**

### Install the library

In [6]:
# Uncomment to install; pyttsx3 is imported below, so it is listed here as well
# !pip install PyMuPDF gtts pyttsx3 pygame

In [7]:
# !pip install pygame

In [8]:
# !pip install comtypes

### Import the library

In [9]:
from gtts import gTTS
import os
import pyttsx3

### Initialize a Speaker object

In [10]:
# Create a gTTS object
speech = gTTS(text=specific_text, lang='en', slow=False)

### Convert the text and save the audio

In [11]:
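# gTTS produces MP3 audio, so save the speech with an .mp3 extension
speech.save('output.mp3')

### Play the audio

In [12]:
from IPython.display import Audio, display

# Show an inline audio player for the saved file
display(Audio('output.mp3'))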
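
This section names pyttsx3 as an alternative to gTTS, but only the gTTS path is shown. The sketch below outlines how the offline pyttsx3 route could look. `save_speech_offline` is a hypothetical helper, the output filename and speaking rate are assumed values, and the audio format written by `save_to_file` depends on the platform's speech driver.

```python
def save_speech_offline(text, out_path="output_offline.wav", rate=150):
    """Synthesize `text` to an audio file with pyttsx3 (no internet needed)."""
    # Imported inside the function so this sketch loads even when
    # pyttsx3 is not installed.
    import pyttsx3

    engine = pyttsx3.init()              # uses the platform's default speech driver
    engine.setProperty("rate", rate)     # speaking rate in words per minute
    engine.save_to_file(text, out_path)  # queue the synthesis command
    engine.runAndWait()                  # block until the queued command finishes
    return out_path

# Example call (commented out because it needs a working speech driver):
# save_speech_offline(specific_text)
```

To speak the text aloud instead of saving it, `engine.say(text)` followed by `engine.runAndWait()` can be used in place of `save_to_file`.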