# **Book Summary Narrator**

This notebook demonstrates how to use **Sarvam's Text-to-Speech API** effectively. It builds a book summary narrator, which can serve as inspiration for **innovative and impactful applications**, such as:

- **Audiobook generators**
- **Accessible content for visually impaired users**
- **Automated narration systems for summaries and reports**

---

## **What This Notebook Covers**

This notebook explains how to use **Sarvam's Text-to-Speech API** with Python to create useful applications. It walks you through the entire process:

1. **Extracting Text from PDFs**  
   Using the `PyPDF2` library to get text from PDF files.

2. **Summarizing the Text**  
   Using Google's Gemini API to create a concise summary of the extracted text.

3. **Converting Text to Speech**  
   Using **Sarvam's API** to turn the summary into clear and natural-sounding audio.

---

## **Key Features**

- **Simple Code Examples**: Easy-to-follow Python code for each step.
- **End-to-End Process**: From text extraction to audio generation.
- **Practical Applications**: Build tools like audiobooks, automated narrators, or accessible content.

---

## **Why Use This Notebook?**

- **Learn API Integration**: Understand how to work with APIs like Sarvam's Text-to-Speech.
- **Hands-On Experience**: Get practical experience with text processing and audio generation.
- **Build Useful Tools**: Create applications that can help people, like audiobooks for visually impaired users.

---



# **Installation Commands**
### Run these commands to install the necessary libraries before executing the notebook.

In [1]:
!pip install PyPDF2 google-generativeai requests
# pathlib and textwrap ship with Python's standard library, so they need no install


# **Import Libraries**
### Import all the required libraries for the process.

In [2]:
import os
import PyPDF2
import google.generativeai as genai
import requests
import logging
from pathlib import Path
import json
from textwrap import wrap
import time
import base64


# **Configure Logging**
### Set up logging to track the execution of the script.

In [3]:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('book_narrator.log'),
        logging.StreamHandler()
    ]
)

# **Set Up API Keys and Configuration**

### This notebook calls two services, Sarvam's Text-to-Speech API and Google's Gemini API, and each needs its own key. Follow these steps to set them up:

### **1. Obtain your API keys**: If you don't have a Sarvam API key, sign up on the Sarvam AI Dashboard to get one. You also need a Gemini API key from Google AI Studio.
### **2. Replace the placeholder keys:** In the code below, replace "YOUR_SARVAM_AI_API_KEY" and "YOUR_GEMINI_API_KEY" with your actual keys.


In [4]:

SARVAM_API_KEY = "YOUR_SARVAM_AI_API_KEY"
GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
MAX_CHUNK_LENGTH = 500  # Maximum characters per chunk for Text-to-Speech (TTS)
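Hardcoding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch, assuming you export the keys as environment variables named `SARVAM_API_KEY` and `GEMINI_API_KEY` (names chosen for this example), reads them from the environment and falls back to the placeholders otherwise. `load_api_key` is a hypothetical helper, not part of either SDK.

```python
import os


def load_api_key(env_var, placeholder):
    """Return the key stored in env_var, or the placeholder if it is unset or blank.

    Hypothetical helper: the environment variable names are a convention
    chosen for this notebook, not something the Sarvam or Gemini services require.
    """
    value = os.environ.get(env_var, "").strip()
    return value or placeholder


SARVAM_API_KEY = load_api_key("SARVAM_API_KEY", "YOUR_SARVAM_AI_API_KEY")
GEMINI_API_KEY = load_api_key("GEMINI_API_KEY", "YOUR_GEMINI_API_KEY")
```

Because the fallback is the original placeholder, the rest of the notebook behaves exactly as before when no environment variables are set.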

# **Extract Text from PDF**
### Function to extract text from a PDF file.

In [5]:
def extract_text_from_pdf(pdf_path):
    """Extract text from PDF file."""
    text = ""
    try:
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
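            for page in pdf_reader.pages:
                # extract_text() yields little or nothing for image-only pages;
                # the newline keeps words on adjacent pages from running together
                text += (page.extract_text() or "") + "\n"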
        logging.info(f"Successfully extracted text from {pdf_path}")
        return text
    except Exception as e:
        logging.error(f"Error extracting text from PDF: {str(e)}")
        raise
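Raw PDF text often contains words hyphenated across line breaks and hard line wraps inside paragraphs, which can make both the summary and the narration less fluent. The following is a heuristic sketch of a cleanup pass you could run on `book_text` before summarizing; `clean_extracted_text` is a hypothetical helper, not a PyPDF2 feature.

```python
import re


def clean_extracted_text(text):
    """Tidy raw PDF text before summarization (heuristic sketch)."""
    # Re-join words hyphenated across line breaks: "narra-\ntion" -> "narration"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single newlines inside a paragraph into spaces; keep blank-line paragraph breaks
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs left over from the PDF layout
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

One trade-off: genuinely hyphenated compounds that happen to break at a line end (for example "well-\nknown") also lose their hyphen.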

# **Generate Summary Using Gemini API**
### Function to generate a concise summary of text using the Gemini API.



In [6]:
def generate_summary(text):
    """Generate summary using Google's Gemini API."""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel('gemini-pro')
        prompt = (
            "Please provide a concise summary of the following text. Focus on the main ideas "
            f"and key points, keeping the summary clear and engaging:\n\n{text}"
        )
        response = model.generate_content(prompt)
        logging.info("Successfully generated summary using Gemini API")
        return response.text
    except Exception as e:
        logging.error(f"Error generating summary: {str(e)}")
        raise
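`generate_summary` sends the whole book in a single prompt, and a long book may exceed the model's input limit. One common workaround, sketched here as a map-reduce pass, is to summarize fixed-size slices and then summarize the partial summaries. `summarize_long_text` is hypothetical, and the default `max_chars` is an assumed budget, not a documented Gemini limit; check the limit for the model you use.

```python
def summarize_long_text(text, summarize_fn, max_chars=20000):
    """Map-reduce summary: summarize slices, then summarize the combined partial summaries.

    summarize_fn is any callable str -> str, for example generate_summary.
    """
    if len(text) <= max_chars:
        return summarize_fn(text)

    # Map step: summarize each fixed-size slice independently
    slices = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    partial_summaries = [summarize_fn(s) for s in slices]
    combined = "\n\n".join(partial_summaries)

    if len(combined) >= len(text):
        # summarize_fn is not shrinking its input; stop instead of recursing forever
        return summarize_fn(combined[:max_chars])

    # Reduce step: recurse until the combined summaries fit in one request
    return summarize_long_text(combined, summarize_fn, max_chars)
```

Usage: `summary = summarize_long_text(book_text, generate_summary)`. Slicing at fixed character offsets can cut a sentence in half; splitting on paragraph boundaries instead would be a natural refinement.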


# **Split Text into Chunks**
### Function to split text into manageable chunks for TTS processing.

In [7]:
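def split_text_into_chunks(text, max_length=MAX_CHUNK_LENGTH):
    """Split text into chunks of at most max_length characters, preferring sentence endings."""
    sentences = text.replace('\n', ' ').split('. ')
    chunks = []
    current_chunk = ''

    for idx, sentence in enumerate(sentences):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Restore the period removed by split(); compare by index, not value,
        # so a repeated sentence does not lose its period
        if idx < len(sentences) - 1:
            sentence += '.'

        # Hard-wrap any single sentence longer than the limit so no chunk exceeds it
        pieces = wrap(sentence, max_length) if len(sentence) > max_length else [sentence]

        for piece in pieces:
            candidate = f"{current_chunk} {piece}".strip()
            if len(candidate) > max_length and current_chunk:
                chunks.append(current_chunk)
                current_chunk = piece
            else:
                current_chunk = candidate

    if current_chunk:
        chunks.append(current_chunk)

    logging.info(f"Split text into {len(chunks)} chunks")
    return chunks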
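`split_text_into_chunks` only treats ". " as a sentence boundary, so questions and exclamations never become break points. A lightweight regex splitter, sketched below as a hypothetical drop-in for the `split('. ')` step, also breaks after "!" and "?" and keeps the punctuation attached to each sentence.

```python
import re


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace, keeping the punctuation.

    Abbreviations such as 'Dr.' will still cause false breaks.
    """
    text = " ".join(text.split())  # normalize newlines and repeated spaces
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
```

For example, `split_sentences("Hello there! How are you?\nFine. Thanks")` returns `["Hello there!", "How are you?", "Fine.", "Thanks"]`.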

# **Convert Text to Speech Using Sarvam API**
### Function to convert text to speech using the Sarvam API.

In [8]:
def text_to_speech(text, output_path, language_code="en-IN"):
    """Convert text to speech using Sarvam AI API."""
    url = "https://api.sarvam.ai/text-to-speech"

    payload = {
        "inputs": [text],
        "target_language_code": language_code,
        "speaker": "amartya",  # Male voice
        "pitch": 0,
        "pace": 1.0,
        "loudness": 1.2,
        "speech_sample_rate": 22050,
        "enable_preprocessing": True,
        "model": "bulbul:v1"
    }

    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "api-subscription-key": SARVAM_API_KEY
    }

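    try:
        logging.info(f"Sending request to Sarvam API for chunk of length {len(text)}")
        # A timeout keeps the notebook from hanging on a stalled connection
        response = requests.post(url, json=payload, headers=headers, timeout=60)

        logging.info(f"Sarvam API Response Status Code: {response.status_code}")

        if not response.ok:
            # Log the response body to help diagnose invalid keys or payloads
            logging.error(f"Sarvam API error response: {response.text}")
        response.raise_for_status()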

        audio_data = response.json()

        if "audios" in audio_data:
            base64_audio = audio_data["audios"][0]
            binary_audio = base64.b64decode(base64_audio)

            with open(output_path, 'wb') as f:
                f.write(binary_audio)
            logging.info(f"Successfully saved audio file to {output_path}")
            return True
        else:
            logging.error("No audio data found in response.")
            return False

    except Exception as e:
        logging.error(f"Error converting text to speech: {str(e)}")
        raise
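`text_to_speech` re-raises on any failure, so a single transient network error aborts the whole run even though `process_book` already pauses between chunks. A generic retry wrapper with exponential backoff is one way to tolerate such hiccups; `call_with_retries` is a hypothetical helper, and the attempt count and delays are illustrative defaults.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage inside `process_book`: `call_with_retries(text_to_speech, chunk, audio_path, retry_on=(requests.exceptions.RequestException,))`. Note that `HTTPError` is a subclass of `RequestException`, so permanent errors such as an invalid key are retried too before failing.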

# **Process Book Workflow**
### Main function to extract text, generate a summary, and create an audio narration.

In [9]:
def process_book(pdf_path, output_dir="output"):
    """Process book: Extract text, generate summary, and create audio."""
    try:
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        logging.info(f"Processing book from {pdf_path}")

        logging.info("Starting text extraction...")
        book_text = extract_text_from_pdf(pdf_path)

        text_path = os.path.join(output_dir, "extracted_text.txt")
        with open(text_path, 'w', encoding='utf-8') as f:
            f.write(book_text)
        logging.info(f"Text extracted and saved to {text_path}")

        logging.info("Starting summary generation...")
        summary = generate_summary(book_text)

        summary_path = os.path.join(output_dir, "summary.txt")
        with open(summary_path, 'w', encoding='utf-8') as f:
            f.write(summary)
        logging.info(f"Summary generated and saved to {summary_path}")

        chunks = split_text_into_chunks(summary)
        audio_files = []

        for i, chunk in enumerate(chunks, 1):
            logging.info(f"Processing chunk {i} of {len(chunks)}")
            audio_path = os.path.join(output_dir, f"summary_narration_part_{i}.wav")
            if text_to_speech(chunk, audio_path):
                audio_files.append(audio_path)
            else:
                logging.warning(f"No audio returned for chunk {i}; skipping {audio_path}")
            if i < len(chunks):
                time.sleep(1)

        return {
            "text_path": text_path,
            "summary_path": summary_path,
            "audio_files": audio_files
        }

    except Exception as e:
        logging.error(f"Error processing book: {str(e)}")
        raise
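`process_book` returns one WAV file per chunk. To get a single audiobook-style file, the parts can be concatenated with the standard-library `wave` module. This sketch assumes the API returns uncompressed PCM WAV (which `wave` requires) and that all parts share one format, which should hold because every request uses the same payload settings. `merge_wav_files` is a hypothetical helper.

```python
import wave


def merge_wav_files(input_paths, output_path):
    """Concatenate WAV files that share channels, sample width, and sample rate."""
    if not input_paths:
        raise ValueError("No input files to merge")

    with wave.open(input_paths[0], "rb") as first:
        params = first.getparams()

    with wave.open(output_path, "wb") as out:
        out.setparams(params)  # frame count in the header is corrected on close
        for path in input_paths:
            with wave.open(path, "rb") as part:
                # Refuse to mix formats; raw frames would otherwise play back garbled
                if part.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(part.readframes(part.getnframes()))
    return output_path
```

Usage: `merge_wav_files(results["audio_files"], "output/summary_narration_full.wav")`.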

# **Execute the Workflow**
### Provide the path to the PDF file you want to process.

In [11]:
if __name__ == "__main__":
    pdf_path = "cheese.pdf"  # Replace with the actual file path
    logging.info(f"Starting processing of PDF: {pdf_path}")
    results = process_book(pdf_path)
    logging.info("\nProcessing completed successfully!")
    logging.info(f"Text file: {results['text_path']}")
    logging.info(f"Summary file: {results['summary_path']}")
    logging.info("Audio files:")
    for audio_file in results['audio_files']:
        logging.info(f"- {audio_file}")
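Because `process_book` always writes `summary_narration_part_N.wav` into the output directory, running it on several books with the default `output_dir` would overwrite earlier narrations. A minimal batch sketch gives each PDF in a folder its own sub-folder named after the file. `process_folder` is hypothetical; pass `process_book` as `process_fn`.

```python
from pathlib import Path


def process_folder(input_dir, process_fn, output_root="output"):
    """Run process_fn(pdf_path, output_dir=...) on every PDF in input_dir.

    Each book gets its own sub-folder named after the file stem, so audio
    parts from different books do not overwrite each other.
    """
    results = {}
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        output_dir = Path(output_root) / pdf_path.stem
        results[pdf_path.name] = process_fn(str(pdf_path), output_dir=str(output_dir))
    return results
```

Usage: `all_results = process_folder("books", process_book)`. As written, an exception for one book stops the loop; wrap the call in `try`/`except` if you prefer to skip failures and continue.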



### **Additional Resources**

For more details, refer to the official **Sarvam AI API documentation** and join the community for support:

- **Documentation**: [docs.sarvam.ai](https://docs.sarvam.ai/)
- **Community**: [Join the Discord Community](https://discord.gg/hTuVuPNF)

### **Notes:**

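**Input and Output Formats:** The input must be a text-based PDF; scanned, image-only pages yield no text with PyPDF2. The narration is saved as .wav files at the 22,050 Hz sample rate set in the `text_to_speech` payload.

**API Keys:** Double-check that both `SARVAM_API_KEY` and `GEMINI_API_KEY` are correctly set.

**Error Handling:** If text extraction, summarization, or speech synthesis fails, the error message is logged to the console and to `book_narrator.log`, and the exception is re-raised for debugging.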
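The API key check can also be done in code, so the notebook fails fast instead of after text extraction and summarization. A small sketch, assuming placeholders keep the `YOUR_` prefix used in this notebook; `check_api_keys` is a hypothetical helper.

```python
def check_api_keys(keys):
    """Raise early if any key is empty or still a 'YOUR_...' placeholder."""
    missing = [name for name, value in keys.items()
               if not value or value.startswith("YOUR_")]
    if missing:
        raise ValueError(f"Set these API keys before running: {', '.join(missing)}")
```

Usage, before calling `process_book`: `check_api_keys({"SARVAM_API_KEY": SARVAM_API_KEY, "GEMINI_API_KEY": GEMINI_API_KEY})`.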

**Keep Building!** 🚀
