# PDF to Blog Post Generator using Multi-Agent AI

This project automates the creation of blog posts from PDF documents using the OpenAI GPT-4 API. The system consists of multiple agents working together to extract text from a PDF, summarize the content, generate keywords, create an SEO-friendly title, and format the final blog post.

## Workflow

1. **PDF Extraction**: Extracts text from an uploaded PDF document.
2. **Summarisation**: Breaks down the text into smaller chunks and generates summaries using GPT-4.
3. **Keyword Extraction**: Extracts keywords and key topics from the text.
4. **Title Generation**: Creates an engaging and SEO-optimized title for the blog post.
5. **Blog Formatting**: Combines the title, summary, and keywords into a well-structured blog post.

## Technology Stack

- **OpenAI GPT-4**: Used for generating summaries, extracting keywords, and creating titles.
- **PyPDF2**: Python library used to extract text from PDF files.
- **CrewAI**: Coordinates the multi-agent system, allowing agents to work together efficiently.
- **python-dotenv**: Loads the OpenAI API key from a `.env` file.
- **python-docx**: Writes the generated blog post to a `.docx` file.


In [None]:
from dotenv import load_dotenv
import os

# Load the environment variables from .env file
load_dotenv()

# Retrieve the API key from environment variables
api_key = os.getenv("OPENAI_API_KEY")
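
`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later as an authentication error. The sketch below shows one way to fail early; the helper name `require_env` is hypothetical and not part of the notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of the environment variable `name`.

    Hypothetical helper: raises RuntimeError if the variable is unset or blank,
    so a missing key is reported before any API call is attempted.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

With this in place, the cell above could end with `api_key = require_env("OPENAI_API_KEY")`.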

## 1. PDF Extraction Agent

This agent extracts the text from the PDF document. 

In [None]:
import PyPDF2

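class PDFExtractionAgent:
    def extract_text(self, pdf_path):
        """Return the text of every page in the PDF, one page after another."""
        with open(pdf_path, 'rb') as file:
            reader = PyPDF2.PdfReader(file)
            text = ""
            for page in reader.pages:
                # Image-only pages may yield no text; fall back to an empty string.
                # The newline keeps words from fusing across page boundaries.
                text += (page.extract_text() or "") + "\n"
        return text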
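
PyPDF2 performs no OCR, so scanned or image-only pages usually contribute no text. The sketch below is a minimal illustration of joining per-page text while recording which pages came back empty. The helper `join_page_texts` is hypothetical and works on plain strings, so it can be tried without a PDF.

```python
from typing import Iterable, List, Optional, Tuple


def join_page_texts(page_texts: Iterable[Optional[str]]) -> Tuple[str, List[int]]:
    """Join per-page text with newlines and report pages that had no text.

    Hypothetical helper: returns (joined_text, empty_page_numbers),
    where page numbers start at 1.
    """
    parts: List[str] = []
    empty_pages: List[int] = []
    for number, text in enumerate(page_texts, start=1):
        cleaned = (text or "").strip()
        if cleaned:
            parts.append(cleaned)
        else:
            empty_pages.append(number)
    return "\n".join(parts), empty_pages


text, empty = join_page_texts(["Hello ", None, "", "World"])
print(empty)  # [2, 3]
```

Inside the agent, the input could be the generator `(page.extract_text() for page in reader.pages)`.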


## 2. Summarisation Agent

This agent splits the extracted text into smaller chunks and generates a summary for each chunk using GPT-4.

In [None]:
import openai

# Set up OpenAI API key
openai.api_key = api_key

TOKEN_LIMIT = 2000  # Approximate chunk size; split_text counts whitespace-separated words, not model tokens

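def split_text(text, max_tokens):
    """Split text into chunks of roughly max_tokens words, breaking at periods."""
    sentences = text.split('.')
    chunks = []
    current_chunk = ''
    
    for sentence in sentences:
        # Word counts stand in for token counts here
        if len(current_chunk.split()) + len(sentence.split()) <= max_tokens:
            current_chunk += sentence + '.'
        else:
            # Avoid emitting an empty chunk when the very first sentence exceeds the limit
            if current_chunk:
                chunks.append(current_chunk)
            current_chunk = sentence + '.'
    
    if current_chunk:
        chunks.append(current_chunk)
    
    return chunks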

class SummarizationAgent:
    def generate_summary(self, text, chunk_size=TOKEN_LIMIT):
        text_chunks = split_text(text, chunk_size)
        summary_chunks = []

        for chunk in text_chunks:
            prompt = f"Summarize the following text for a blog post:\n\n{chunk}"
            response = openai.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.7,
                max_tokens=2000,
            )
            summary = response.choices[0].message.content
            summary_chunks.append(summary)
        
        final_summary = ' '.join(summary_chunks)
        
        return final_summary
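
To make the chunking step concrete, here is a small stand-alone sketch of sentence-based chunking under a word budget. It is an illustrative variant, not the notebook's `split_text`: the name `chunk_by_words` is hypothetical, and it splits after `.`, `!` or `?` followed by whitespace rather than on every period.

```python
import re


def chunk_by_words(text, max_words):
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five. Six seven eight nine. Ten."
print(chunk_by_words(sample, max_words=5))
# ['One two three. Four five.', 'Six seven eight nine. Ten.']
```

Word counts only approximate model tokens, so the budget should leave headroom below the model's context window.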


## 3. Keyword Agent

This agent extracts keywords from the text using GPT-4.

In [None]:
class KeywordAgent:
    def extract_keywords(self, text, chunk_size=TOKEN_LIMIT):
        text_chunks = split_text(text, chunk_size)
        keyword_chunks = []

        for chunk in text_chunks:
            prompt = f"Extract keywords from the following text:\n\n{chunk}"
            response = openai.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.5,
                max_tokens=200,
            )
            keywords = response.choices[0].message.content
            keyword_chunks.append(keywords)
        
        final_keywords = ', '.join(keyword_chunks)
        return final_keywords
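
Because keywords are requested chunk by chunk and then joined, the same keyword can appear several times, and the reply format is not fixed by the prompt. The sketch below shows one possible clean-up step, assuming replies arrive as comma-separated, semicolon-separated, or bulleted/numbered lines. The helper `merge_keywords` is hypothetical.

```python
import re


def merge_keywords(keyword_chunks):
    """Merge per-chunk keyword strings into one de-duplicated, comma-separated string.

    Duplicates are detected case-insensitively; the first spelling seen is kept.
    """
    seen = set()
    merged = []
    for chunk in keyword_chunks:
        for raw in re.split(r"[,\n;]", chunk):
            # Drop a leading bullet ("-", "*", "•") or list number ("1.", "2)")
            keyword = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", raw).strip()
            key = keyword.lower()
            if keyword and key not in seen:
                seen.add(key)
                merged.append(keyword)
    return ", ".join(merged)


print(merge_keywords(["AI, LLMs, GPT-4", "1. llms\n2. Prompting", "- AI; Tokens"]))
# AI, LLMs, GPT-4, Prompting, Tokens
```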


## 4. Title Agent

This agent generates a title for the blog post.

In [None]:
class TitleAgent:
    def generate_title(self, text, chunk_size=TOKEN_LIMIT):
        # Use only the first chunk for title generation
        text_chunk = split_text(text, chunk_size)[0]
        prompt = f"Generate an engaging title for a blog post based on this content:\n\n{text_chunk}"
        response = openai.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=60,
        )
        title = response.choices[0].message.content
        return title
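
The returned title is later placed after `# ` as a Markdown heading, so stray quotation marks or extra lines in the model's reply would show up in the post. The sketch below is one possible normalisation step, assuming such wrapping can occur. `clean_title` is a hypothetical helper, not part of the notebook.

```python
def clean_title(raw):
    """Return the first non-empty line of raw, without heading marks or wrapping quotes.

    Note: a title that legitimately ends in an apostrophe would lose it.
    """
    for line in raw.splitlines():
        candidate = line.strip().lstrip("#").strip()
        # Remove straight or curly quotes wrapped around the whole title
        candidate = candidate.strip("\"'“”‘’").strip()
        if candidate:
            return candidate
    return ""


print(clean_title('\n"Unlocking LLMs: A Practical Guide"\n'))
# Unlocking LLMs: A Practical Guide
```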


## 5. Blog Formatting Agent

This agent combines the title, summary, and keywords into a formatted blog post.

In [None]:
class FormattingAgent:
    def format_blog_post(self, title, summary, keywords):
        blog_post = f"# {title}\n\n{summary}\n\n**Keywords**: {keywords}\n"
        return blog_post
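
To show the resulting layout, the sketch below applies the same template to placeholder strings (the title, summary, and keywords are invented for illustration) and then separates the keyword footer again with `str.rpartition`.

```python
def format_blog_post(title, summary, keywords):
    """Stand-alone copy of the template used by FormattingAgent, for illustration."""
    return f"# {title}\n\n{summary}\n\n**Keywords**: {keywords}\n"


post = format_blog_post("Getting Started with LLMs", "LLMs predict text.", "LLMs, GPT-4")
print(post)
# # Getting Started with LLMs
#
# LLMs predict text.
#
# **Keywords**: LLMs, GPT-4

# rpartition splits at the last occurrence of the footer marker
body, _, keywords = post.rpartition("\n\n**Keywords**: ")
print(keywords.strip())  # LLMs, GPT-4
```

Splitting at the last occurrence of the marker keeps the body intact even if the summary itself happens to contain the word "Keywords" in bold.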


## Coordinator

The coordinator class manages all the agents, orchestrating the workflow from PDF extraction to blog post generation. 

In [None]:
class BlogPostGeneratorCoordinator:
    def __init__(self):
        self.pdf_agent = PDFExtractionAgent()
        self.summary_agent = SummarizationAgent()
        self.keyword_agent = KeywordAgent()
        self.title_agent = TitleAgent()
        self.formatting_agent = FormattingAgent()

    def generate_blog_post_from_pdf(self, pdf_path):
        # Agent 1: Extract text from PDF
        pdf_text = self.pdf_agent.extract_text(pdf_path)

        # Agent 2: Generate a summary for the blog post
        summary = self.summary_agent.generate_summary(pdf_text)

        # Agent 3: Extract keywords
        keywords = self.keyword_agent.extract_keywords(pdf_text)

        # Agent 4: Generate a title for the blog post
        title = self.title_agent.generate_title(pdf_text)

        # Agent 5: Format the blog post
        blog_post = self.formatting_agent.format_blog_post(title, summary, keywords)

        return blog_post
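
The orchestration order can be checked without a PDF or any API calls by swapping the agents for stand-in functions. The sketch below is a simplified illustration of that idea. `run_pipeline` and the lambda stubs are hypothetical and only mirror the sequence of steps in the coordinator.

```python
def run_pipeline(text, summarize, extract_keywords, make_title, format_post):
    """Run the summary, keyword, title, and formatting steps on already-extracted text.

    Each step is passed in as a callable so real agents can be replaced by stubs.
    """
    summary = summarize(text)
    keywords = extract_keywords(text)
    title = make_title(text)
    return format_post(title, summary, keywords)


# Stub steps standing in for the GPT-4 backed agents
draft = run_pipeline(
    "Sample document text.",
    summarize=lambda t: f"Summary of {len(t.split())} words.",
    extract_keywords=lambda t: "sample, document",
    make_title=lambda t: "Sample Title",
    format_post=lambda title, summary, keywords: (
        f"# {title}\n\n{summary}\n\n**Keywords**: {keywords}\n"
    ),
)
print(draft)
```

The real agents' bound methods, such as `SummarizationAgent().generate_summary`, could be passed in the same way.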


In [None]:
from docx import Document

if __name__ == "__main__":
    pdf_path = "C:/Users/DanielGodden/Downloads/Generative-AI-and-LLMs-for-Dummies.pdf"
    coordinator = BlogPostGeneratorCoordinator()
    blog_post = coordinator.generate_blog_post_from_pdf(pdf_path)

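    # Save blog post as text file (excluding keywords)
    blog_post_without_keywords = blog_post.split("**Keywords**")[0]  # Exclude keywords section
    # Write as UTF-8 so non-ASCII characters in the generated text do not fail on Windows
    with open("generated_blog_post.txt", "w", encoding="utf-8") as text_file:
        text_file.write(blog_post_without_keywords)
    print(blog_post_without_keywords)

    # Save the same post as a Word document
    doc = Document()

    # The first line is the Markdown heading; everything after it is the body
    lines = blog_post_without_keywords.split('\n')
    title = lines[0].strip("# ").strip()
    body = '\n'.join(lines[1:]).strip()

    doc.add_heading(title, 0)
    doc.add_paragraph(body)  # Body only, so the title is not repeated below the heading
    doc.save("blog_post.docx")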

