# Automated Document Summarization System

## 📚 Learning Objectives
This notebook demonstrates:
- **Custom LLM Configuration**: Tailoring different LLMs for specific roles
- **Hierarchical Process**: Manager-led workflow coordination
- **File Operations**: Using CrewAI tools to read and process documents
- **Quality Control**: Multi-agent validation and approval workflows

## 🎯 Business Case
Create an automated system that:
1. Reads documents from files
2. Generates concise, coherent summaries
3. Validates summary quality through managerial review
4. Ensures output meets professional standards

In [None]:
# ============================================
# IMPORTS AND SETUP
# ============================================
from crewai import Crew, Agent, Task, LLM, Process  # Core CrewAI components
from crewai_tools import FileReadTool  # Tool for reading files
import os

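# Set up the API key. Prefer exporting OPENAI_API_KEY in your shell;
# setdefault only fills in the placeholder when the variable is not already set.
os.environ.setdefault('OPENAI_API_KEY', "YOUR_OPENAI_API_KEY")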

### Custom LLM Configuration

**Why Configure Different LLMs for Different Agents?**
- **Task-Specific Optimization**: Different tasks require different model behaviors
- **Cost Management**: Use appropriate models/settings based on complexity
- **Quality Control**: Fine-tune creativity vs. accuracy based on role

**Key Parameters:**
- `temperature`: Controls sampling randomness (0 = near-deterministic; higher values give more varied output)
- `max_tokens`: Caps the response length, measured in tokens rather than words
- `model`: Selects the underlying model (here, `gpt-4`)
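
To make the effect of `temperature` concrete, here is a minimal sketch of temperature-scaled softmax over a handful of made-up token scores. It is a simplified illustration of the idea, not the provider's actual sampling code; the function name `softmax_with_temperature` and the example scores are assumptions introduced for this sketch.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Real APIs typically treat 0 as greedy decoding; this sketch just rejects it.
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, 0.3)   # manager-style setting
high = softmax_with_temperature(logits, 0.7)  # summarizer-style setting

print("temperature=0.3:", [round(p, 3) for p in low])
print("temperature=0.7:", [round(p, 3) for p in high])
```

At the lower temperature almost all of the probability mass sits on the top-scoring token, which is why low settings tend to give more repeatable answers.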

In [None]:
# ============================================
# CUSTOM LLM CONFIGURATIONS
# ============================================
# Create specialized LLM configurations for different agent roles

# LLM FOR SUMMARIZER: Balanced creativity for summarization
summarizer_llm = LLM(
    model="gpt-4",        # High-quality model for nuanced understanding
    temperature=0.7,      # Moderate creativity (0.5-0.8 is good for summaries)
    max_tokens=500        # Cap the summary at 500 tokens (tokens, not words)
)
# Use case: This agent needs to be creative enough to rephrase content
# while maintaining accuracy

# LLM FOR MANAGER: Factual and deterministic evaluation
manager_llm = LLM(
    model="gpt-4",        # Same model, different configuration
    temperature=0.3,      # Low temperature = more deterministic/factual
    max_tokens=200        # Shorter responses (approval/feedback only)
)
# Use case: Manager gives an approve/reject verdict with brief notes
# Lower temperature encourages more consistent evaluation

# KEY INSIGHT: Same model, different configurations = different behaviors!

### Tools Setup

**FileReadTool:**
- Reads text content from files
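- Works with plain-text formats such as `.txt`, `.csv`, and `.json`; binary formats like PDF or DOCX generally need a dedicated tool or a conversion step
- Accepts the file path at call time, as the agent does in this notebook
- Can alternatively be initialized with a fixed `file_path` so it always reads the same file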
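
Because the tool reads whatever path the agent passes, it can help to validate the input file before starting the crew. The following is a standard-library sketch of such a pre-flight check; `check_input_file` and `TEXT_EXTENSIONS` are hypothetical names introduced here and are not part of CrewAI.

```python
from pathlib import Path

# Assumption: the plain-text extensions this notebook cares about.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".json"}

def check_input_file(path_to_file):
    """Return the resolved path if the file exists and looks like readable plain text."""
    path = Path(path_to_file)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    if path.suffix.lower() not in TEXT_EXTENSIONS:
        raise ValueError(f"Unsupported extension for a plain-text read: {path.suffix}")
    # Decode as UTF-8 so encoding problems surface before any LLM call is made.
    path.read_text(encoding="utf-8")
    return path.resolve()
```

Calling `check_input_file("docs/sample_text.txt")` before running the crew would fail fast on a wrong path instead of leaving the agent to report a tool error mid-run.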

In [None]:
# ============================================
# TOOL INITIALIZATION
# ============================================

# Initialize the file reading tool
# This tool will be provided to the summarizer agent
file_read_tool = FileReadTool()
# The agent will use this to read documents specified in task inputs

# EXAMPLE USAGE (by agent):
# file_read_tool.run(file_path="docs/sample_text.txt")

### Agent Definitions

**Two-Agent Quality Control System:**
1. **Summarizer**: Performs the actual summarization work
2. **Manager**: Reviews and validates the summary quality

**Hierarchical Relationship:**
- The manager oversees the summarizer
- Can request revisions if quality is insufficient
- Ensures consistent output standards

In [None]:
# ============================================
# AGENT DEFINITIONS WITH CUSTOM LLMs
# ============================================

# AGENT 1: Text Summarizer
summarizer = Agent(
    role="Text Summarizer",
    
    # DYNAMIC GOAL: Uses {path_to_file} placeholder
    # This will be replaced at runtime with actual file path
    goal="Summarize the text in the file {path_to_file} into a concise and coherent summary.",
    
    backstory="A highly skilled summarizer specialized in condensing long texts.",
    
    # KEY FEATURES:
    tools=[file_read_tool],      # Can read files
    llm=summarizer_llm,          # Uses creative, balanced LLM configuration
    verbose=True,                # Show detailed thinking process
)

# AGENT 2: Quality Manager
manager = Agent(
    role="Content Quality Manager",
    goal="Ensure that the summary provided is realistic and meets quality standards.",
    backstory="An experienced manager skilled in evaluating text summaries.",
    
    # KEY FEATURES:
    llm=manager_llm,             # Uses deterministic, factual LLM configuration
    verbose=True,                # Show evaluation process
    # Note: No tools needed - manager only evaluates text output
)

# WORKFLOW: Summarizer creates → Manager evaluates → (Revision loop if needed)

### Task Definitions

**Task Design Pattern:**
- Tasks define the **what**, agents provide the **how**
- Clear expected outputs enable better validation
- Task names help with debugging and logging

**Quality Control Loop:**
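The `validate_task` describes the review step, in which the manager can ask the summarizer for a revision if standards aren't met. Note that the crew assembled later in this notebook lists only `summarize_task`, so the review happens through the manager agent's coordination rather than as a separately scheduled task.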

In [None]:
# ============================================
# TASK DEFINITIONS
# ============================================

# TASK 1: Summarization Task
summarize_task = Task(
    name="Summarize Text Task",         # Descriptive name for logging
    agent=summarizer,                    # Assigned to the summarizer agent
    
    description="Read the given text file and generate a concise summary.",
    # The agent will:
    # 1. Use file_read_tool to read the file
    # 2. Analyze the content
    # 3. Create a summary using its LLM
    
    expected_output="A brief and coherent summary of the text."
    # Clear success criteria for validation
)

# TASK 2: Validation Task (Manager)
validate_task = Task(
    name="Validate Summary Task",       # Descriptive name for logging
    agent=manager,                       # Assigned to the manager agent
    
    description=(
        "Evaluate the summary for realism and coherence. "
        "If it lacks realism or clarity, request a revision from the Summarizer."
    ),
    # The manager will:
    # 1. Receive the summary from Task 1
    # 2. Evaluate quality against standards
    # 3. Either approve OR request revision
    
    expected_output="Approval or specific feedback for improvement."
    # Binary outcome: pass/fail with justification
)

# NOTE: In hierarchical process, the manager coordinates these tasks
# The manager can create a feedback loop for quality improvement

### Crew Assembly with Hierarchical Process

**Hierarchical vs Sequential Process:**

| Feature | Sequential | Hierarchical |
|---------|-----------|--------------|
| Task Order | Fixed order | Manager decides |
| Coordination | Agent-to-agent | Manager-led |
| Revisions | Not supported | Manager can request |
| Best For | Linear workflows | Quality control, complex logic |

**In this example:**
- Manager coordinates the summarizer
- Can enforce quality standards
- Can request revisions before final approval
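
The manager-led review loop described above can be sketched in plain Python, with no API calls. This is only a conceptual model of the control flow, not how CrewAI implements hierarchical delegation internally; `run_with_review`, `toy_summarize`, and `toy_review` are hypothetical names used for illustration.

```python
def run_with_review(summarize, review, max_revisions=2):
    """Request a summary, then loop until the reviewer approves or attempts run out."""
    feedback = None
    summary = ""
    total_attempts = max_revisions + 1  # first draft plus allowed revisions
    for attempt in range(1, total_attempts + 1):
        summary = summarize(feedback)
        approved, feedback = review(summary)
        if approved:
            return {"summary": summary, "approved": True, "attempts": attempt}
    return {"summary": summary, "approved": False, "attempts": total_attempts}

# Stand-in worker and reviewer so the sketch runs offline
def toy_summarize(feedback):
    if feedback:
        return "Short summary."
    return "A rambling first draft that goes on and on and on."

def toy_review(summary):
    if len(summary.split()) <= 5:
        return True, None
    return False, "Too long; tighten it."

outcome = run_with_review(toy_summarize, toy_review)
print(outcome)
```

Capping the number of revisions is a deliberate choice in this sketch: without a limit, a reviewer that never approves would keep the loop (and the token spend) going indefinitely.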

In [None]:
# ============================================
# CREW WITH HIERARCHICAL MANAGEMENT
# ============================================

crew = Crew(
    # WORKER AGENTS: Only include agents that perform work
    # Do NOT include the manager in this list
    agents=[summarizer],
    
    # TASKS: The tasks this crew will run
    # Note: validate_task is defined above but not listed here, so it is not
    # scheduled as its own task. In hierarchical mode the manager agent
    # delegates the listed task and reviews the result itself.
    tasks=[summarize_task],
    
    # HIERARCHICAL PROCESS: Manager-led coordination
    # Key differences from Sequential:
    # - Manager plans the workflow
    # - Manager can request revisions
    # - Manager validates outputs
    process=Process.hierarchical,
    
    # MANAGER AGENT: The agent that coordinates the workflow
    # This agent oversees the summarizer and ensures quality
    manager_agent=manager
)

# EXECUTION FLOW:
# 1. Manager receives the overall goal
# 2. Manager assigns summarize_task to summarizer
# 3. Summarizer produces summary
# 4. Manager evaluates the summary (acting as quality control)
# 5. Manager either approves OR requests revision
# 6. If revision needed, loop back to step 2
# 7. Final approved summary is returned

### Execution

**Input System:**
- Use the `inputs` parameter of `kickoff()` to pass dynamic values
- Keys in the dictionary must match the `{placeholder}` names used in agent and task text (here, the summarizer's goal)
- Example: the `{path_to_file}` placeholder is replaced with the actual path
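
As a rough analogy for the substitution described above, Python's built-in `str.format` fills named placeholders from a dictionary in much the same way. CrewAI performs its own interpolation, so treat this as an illustration of the idea rather than the library's code path; the misspelled key `file_path` is a deliberate mistake for demonstration.

```python
goal_template = (
    "Summarize the text in the file {path_to_file} "
    "into a concise and coherent summary."
)
inputs = {"path_to_file": "docs/sample_text.txt"}

# Each {name} in the template is replaced by the value stored under that key.
resolved_goal = goal_template.format(**inputs)
print(resolved_goal)

# A key that does not match the placeholder name leaves it unresolved;
# with str.format this surfaces as a KeyError.
try:
    goal_template.format(**{"file_path": "docs/sample_text.txt"})
except KeyError as missing:
    print(f"Missing input for placeholder: {missing}")
```

The practical takeaway is to keep the dictionary keys and the placeholder names identical, character for character.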

**Expected Flow:**
1. Crew receives file path through inputs
2. Summarizer reads the file and creates summary
3. Manager evaluates the summary quality
4. If approved, workflow completes
5. If not approved, manager requests revision and loop continues

In [None]:
# ============================================
# EXECUTION
# ============================================
# Run the document summarization workflow

# Execute the crew with input file path
# The {path_to_file} placeholder in the agent's goal will be replaced with this value
result = crew.kickoff(inputs={"path_to_file": "docs/sample_text.txt"})

# Note: Update the path to point to an actual file in your workspace
# Suggested locations:
# - "docs/sample_text.txt" (create a docs folder with a sample file)
# - Relative path from notebook location
# - Absolute path if needed

# Display the final approved summary
print("Final Output:")
print(result)

# WHAT TO OBSERVE:
# 1. Summarizer reading the file (using FileReadTool)
# 2. Summarizer generating the initial summary (creative LLM with temp=0.7)
# 3. Manager evaluating the summary (lower-temperature LLM with temp=0.3)
# 4. Manager's approval or feedback
# 5. Final output (approved summary)

# QUALITY INDICATORS:
# - Summary should be concise (< 500 tokens as configured)
# - Should maintain key points from original text
# - Should be coherent and professional
# - Manager's evaluation ensures these standards are met

[1m[95m# Agent:[00m [1m[92mContent Quality Manager[00m
[95m## Task:[00m [92mRead the given text file and generate a concise summary.[00m
