## Code for Chapter 10 of the book LangChain for Life Science and Healthcare, by Dr. Ivan Reznikov

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/14v5RQcUCUfJYI4zqQshfW8eXZhQXSyYU?usp=sharing)

## CrewAI Tutorial - Multi-Agent System for Scientific Document Analysis

This notebook demonstrates how to build a multi-agent system using CrewAI to analyze scientific documents. The system consists of three specialized agents that collaborate to extract data from a PDF, analyze it, and generate comprehensive reports.

**Key Components:**
- **Scientist Agent**: Extracts and analyzes data from scientific papers
- **Engineer Agent**: Creates visualizations and handles code execution
- **Writer Agent**: Generates HTML reports combining insights and visualizations

## 1. Environment Setup and Package Installation

First, we'll install the required packages for our multi-agent system:

In [1]:
!pip install -q crewai[tools] langchain_google_genai PyPDF2
!pip install -q -U duckduckgo-search

**What these packages do:**
- `crewai[tools]`: Core framework for multi-agent collaboration with built-in tools
- `langchain_google_genai`: Integration with Google's Gemini AI models
- `PyPDF2`: PDF processing library for extracting text from documents
- `duckduckgo-search`: Web search capabilities for agents

In [2]:
!pip freeze | grep "crew\|lang"

crewai==0.150.0
crewai-tools==0.58.0
google-ai-generativelanguage==0.6.18
google-cloud-language==2.17.2
langchain==0.3.26
langchain-cohere==0.3.5
langchain-community==0.3.27
langchain-core==0.3.71
langchain-experimental==0.3.4
langchain-google-genai==2.1.8
langchain-openai==0.2.14
langchain-text-splitters==0.3.8
langcodes==3.5.0
langsmith==0.3.45
language_data==1.3.0
libclang==18.1.1
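
The same version check can be done from Python, which also works outside Colab where `grep` may not be available. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names taken from the install cell above.
for name, ver in installed_versions(
    ["crewai", "crewai-tools", "langchain-google-genai", "PyPDF2"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

A `None` entry means the package is missing from the current environment, which is a quick way to spot a failed install before building any agents.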


## 2. Import Required Libraries and Setup Directories

In [3]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from crewai import Agent, Task, Crew, Process

**Directory Structure:**
- `./data/`: Stores downloaded PDF documents
- `./reports/`: Contains generated HTML reports and outputs

In [4]:
os.makedirs("./data", exist_ok=True)
os.makedirs("./reports", exist_ok=True)

## 3. Download Sample Scientific Article

We'll download a sample scientific article to demonstrate the system's capabilities.

**Important Notes:**
- The headers prevent the request from being blocked by the server
- This downloads a medical research paper that will be analyzed by our agents
- The PDF is saved locally for processing

In [5]:
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
    "Referer": "https://github.com/IvanReznikov/LangChain4LifeScience",
}

response = requests.get(
    "https://raw.githubusercontent.com/IvanReznikov/LangChain4LifeScience/refs/heads/main/data/articles/0021-9681(87)90171-8.pdf",
    headers=headers,
    timeout=60,
)
# Fail early on HTTP errors instead of saving an error page as a PDF
response.raise_for_status()

pdf_path = "./data/article.pdf"
with open(pdf_path, "wb") as f:
    f.write(response.content)
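
If a server blocks the request, it may return an HTML page with a success status, and that page would be saved under a `.pdf` name. A small guard, sketched below, checks the payload for the `%PDF-` header before writing it. The helper names `looks_like_pdf` and `save_pdf` are hypothetical, and the check is deliberately simple: it only inspects the first bytes.

```python
from pathlib import Path


def looks_like_pdf(content: bytes) -> bool:
    """Return True if the payload starts with the PDF header bytes."""
    return content.startswith(b"%PDF-")


def save_pdf(content: bytes, path: str) -> Path:
    """Write the payload to disk only if it looks like a PDF."""
    if not looks_like_pdf(content):
        raise ValueError("Downloaded content does not look like a PDF")
    target = Path(path)
    # Create the parent folder if it is missing, mirroring os.makedirs above.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

In the notebook this would be called as `save_pdf(response.content, "./data/article.pdf")` in place of the plain `open(...).write(...)` step.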

## 4. Configure AI Models and API Keys

**Model Configuration:**
- **Temperature 0.1**: Keeps responses more consistent from run to run, which helps with scientific analysis (set only in the Gemini configuration below)
- **Gemini-1.5-pro**: Google's advanced model with good reasoning capabilities
- **GPT-4o-mini**: Cost-effective alternative with strong performance

In [6]:
from google.colab import userdata
import os

os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")
os.environ["OPENAI_API_KEY"] = userdata.get("LC4LS_OPENAI_API_KEY")

In [7]:
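# Option A: Gemini 1.5 Pro through LangChain
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    verbose=True,
    temperature=0.1,
    google_api_key=os.environ["GEMINI_API_KEY"],
)

# Option B: an OpenAI model referenced by name.
# Running the cell as-is leaves `llm` set to this value, because the second
# assignment overrides the first. Comment out the option you don't need.
llm = "gpt-4o-mini"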
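
Because the cell above assigns `llm` twice, the choice of model depends on which line runs last. One way to make the choice explicit is to select a model name from whichever API key is available. The sketch below is an assumption-laden illustration: `choose_llm_name` is a hypothetical helper, and the `gemini/gemini-1.5-pro` string assumes the provider-prefixed naming that CrewAI accepts for string model names, which you should verify against your installed version.

```python
import os


def choose_llm_name(env=None):
    """Pick a model identifier string based on which API key is present."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "gpt-4o-mini"
    if env.get("GEMINI_API_KEY"):
        # Assumed provider/model naming; not taken from the original notebook.
        return "gemini/gemini-1.5-pro"
    raise RuntimeError("Set OPENAI_API_KEY or GEMINI_API_KEY before building agents")
```

Passing a plain dict as `env` makes the function easy to try without touching real credentials, for example `choose_llm_name({"GEMINI_API_KEY": "placeholder"})`.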

## 5. Create Custom PDF Reading Tool

CrewAI agents need tools to interact with files. We'll create a custom PDF reader:


In [8]:
from PyPDF2 import PdfReader
from crewai.tools import tool


def read_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in reader.pages]


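# The docstring below is the description the agent sees. "Backup file reader"
# is used here to check whether this tool is only run after the default reader.
# For normal use, change the description to "PDF Reader".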
@tool("PDF reader")
def read_pdf_tool(pdf_path: str) -> list:
    """Backup file reader"""
    return read_pdf(pdf_path)

**Tool Design:**
- The `@tool` decorator makes the function available to agents
- Returns a list of strings, one for each page
- Essential for the scientist agent to analyze the document
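
Since the tool returns one string per page, a long paper can produce more text than the model's context comfortably holds. A possible mitigation, sketched here with a hypothetical helper `cap_pages`, is to keep whole pages until a character budget is reached. The budget value is an arbitrary assumption, and characters are only a rough proxy for tokens.

```python
def cap_pages(pages, max_chars=20000):
    """Keep whole pages in order until the character budget is used up."""
    kept, used = [], 0
    for text in pages:
        text = text or ""  # treat a missing page text as an empty string
        if used + len(text) > max_chars:
            break  # stop at the first page that would exceed the budget
        kept.append(text)
        used += len(text)
    return kept
```

Inside the tool this could wrap the existing call, for example `cap_pages(read_pdf(pdf_path))`. Note the trade-off: later pages are dropped entirely, so a table near the end of the paper may be lost if the budget is too small.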

## 6. Import Additional Tools

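**Tool Purposes:**
- **DirectoryReadTool**: Lists the files in a directory (here `./reports`) so agents can locate them
- **FileReadTool**: Reads the contents of a specific file
- **FileWriterTool**: Writes content to a file (used by the writer to save the HTML report)
- **CodeInterpreterTool**: Executes Python code for data analysis and visualization
- **unsafe_mode=True**: Runs generated code directly in the current environment rather than in an isolated sandbox (use with caution)

`WebsiteSearchTool` is imported below but not used in this notebook.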

In [9]:
from crewai import Agent
from crewai_tools import (
    FileReadTool,
    FileWriterTool,
    DirectoryReadTool,
    WebsiteSearchTool,
    CodeInterpreterTool,
)

# from langchain.tools import DuckDuckGoSearchRun
# search_tool = DuckDuckGoSearchRun()

docs_tool = DirectoryReadTool(directory="./reports")
file_tool = FileReadTool()
file_writer_tool = FileWriterTool()
code_interpreter = CodeInterpreterTool(unsafe_mode=True)



## 7. Define Specialized Agents

**Agent Design Principles:**
- **Specialized Roles**: Each agent has a specific expertise area
- **Tool Access**: Agents only get tools relevant to their role
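- **Delegation Strategy**: Only the engineer is created with `allow_delegation=True`; the scientist and writer have it disabled. Task 2 asks the writer to call the engineer if required, so enable delegation on the writer if you want that handoff to be available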
- **Verbose Mode**: Provides detailed output for debugging and understanding


### 7.1 Engineer Agent - Code Development Specialist

In [10]:
engineer = Agent(
    role="Senior Python Developer",
    goal="Craft well-designed and thought-out code",
    backstory="""You are a senior Python developer with extensive
      experience in software architecture and best practices.""",
    tools=[code_interpreter],
    allow_delegation=True,
    llm=llm,
)

### 7.2 Writer Agent - Report Generation Specialist

In [11]:
writer = Agent(
    role="HTML Creator",
    goal="Craft HTML reports",
    backstory="A skilled HTML creator.",
    tools=[docs_tool, file_tool, file_writer_tool],
    verbose=True,
    allow_delegation=False,
    llm=llm,
)

### 7.3 Scientist Agent - Research and Analysis Specialist

In [12]:
scientist = Agent(
    role="Scientist",
    goal="""To collaborate with your colleagues to gather their views
        and opinions and get a final answer to complex questions""",
    verbose=True,
    allow_delegation=False,
    llm=llm,
    tools=[file_tool, read_pdf_tool],
    backstory="""You are an experienced scientist.
        You deeply read through given materials.
        You collaborate with expert colleagues to get their views.
        You synthesize information in a simple and understandable way.
        """,
)

## 8. Define Collaborative Tasks

### 8.1 Task 1: Data Extraction and Analysis

**Task Objectives:**
- Extract specific table data from the PDF
- Analyze the data for meaningful insights
- Compare findings with the authors' stated opinion
- Demonstrates the scientist's analytical capabilities

In [13]:
# Create tasks for your agents
task1 = Task(
    description="""
  Get the data and description of Table 6 of the document (./data/article.pdf).
  Provide possible conclusions that can be made from the table.
  How does this correlate with the authors' opinion?""",
    agent=scientist,
    expected_output="Table data, your conclusions regarding the data, and the authors' opinion",
)

### 8.2 Task 2: Visualization and Report Generation

**Task Objectives:**
- Create data visualizations using Python
- Generate comprehensive HTML report
- Combine multiple elements (table, plot, insights)
- Demonstrates inter-agent collaboration

In [14]:
task2 = Task(
    description="""Using the insights provided, create an area plot in Python
  (call engineer if required) covering the table data, and create an HTML report
  including the table itself, the generated plot, and the insights provided.
  Save the report as crew_report.html in the ./reports folder""",
    agent=writer,
    expected_output="Saved HTML report as ./reports/crew_report.html",
)

## 9. Assemble the Multi-Agent Crew

**Crew Configuration:**
- **Sequential Execution**: Tasks run in order (task1 → task2)
- **Agent Collaboration**: Agents can communicate and share results
- **Verbose Logging**: Shows detailed decision-making process
- **Task Dependencies**: Task 2 uses results from Task 1
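
To make the idea of sequential execution with task dependencies concrete, here is a toy model in plain Python. It is not CrewAI's implementation, only an illustration of the pattern: each step receives the previous step's output as its context. The function names and the strings they return are hypothetical stand-ins for the scientist's and writer's work.

```python
def run_sequential(steps, initial_context=""):
    """Run steps in order, feeding each step the previous step's output."""
    context = initial_context
    outputs = []
    for name, step in steps:
        context = step(context)
        outputs.append((name, context))
    return outputs


def extract_table(_context):
    # Stand-in for the scientist's task: returns a hypothetical summary string.
    return "Table 6 summary"


def write_report(context):
    # Stand-in for the writer's task: wraps whatever the previous step produced.
    return f"<h1>Report</h1><p>{context}</p>"


results = run_sequential([("task1", extract_table), ("task2", write_report)])
print(results[-1][1])  # <h1>Report</h1><p>Table 6 summary</p>
```

The order of the list is what defines the dependency here: swapping the two entries would hand the writer an empty context.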

In [15]:
crew = Crew(agents=[engineer, scientist, writer], tasks=[task1, task2], verbose=True)

In [16]:
crew

Crew(id=607ca298-3d5f-4fbb-a0b9-950ace36b1db, process=Process.sequential, number_of_agents=3, number_of_tasks=2)

## 10. Execute the Multi-Agent Workflow - let the magic happen

**Execution Process:**
1. **Scientist Agent** reads the PDF and extracts Table 6 data
2. **Scientist Agent** analyzes the data and provides insights
3. **Writer Agent** requests visualization from **Engineer Agent**
4. **Engineer Agent** creates Python code for area plot
5. **Writer Agent** combines all elements into HTML report
6. **Final Output**: Comprehensive report saved as `./reports/crew_report.html`


## Expected Results

The system will produce:

1. **Data Extraction**: Table 6 data from the scientific paper
2. **Statistical Analysis**: Conclusions drawn from the data
3. **Visual Representation**: Area plot showing survival data trends
4. **Comprehensive Report**: HTML document combining:
   - Original table data
   - Generated visualization
   - Scientific insights and analysis
   - Comparison with the authors' interpretation
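
The kind of check behind the "Statistical Analysis" item can be sketched in plain Python. The rows below are copied from the Table 6 values that appear in the printed report later in this notebook; they come from the model's output and have not been verified against the paper here. The helper names are hypothetical.

```python
# (risk score, number of patients, actual 10-yr survival %, predicted 10-yr survival %)
TABLE_6 = [
    (0, 213, 99, 99),
    (1, 156, 97, 96),
    (2, 136, 87, 90),
    (3, 109, 79, 77),
    (4, 42, 47, 53),
    (5, 29, 34, 21),
]


def survival_gaps(rows):
    """Return {score: actual - predicted} in percentage points."""
    return {score: actual - predicted for score, _n, actual, predicted in rows}


def weighted_mean_abs_gap(rows):
    """Mean absolute gap, weighted by the number of patients per row."""
    total = sum(n for _s, n, _a, _p in rows)
    return sum(n * abs(a - p) for _s, n, a, p in rows) / total


gaps = survival_gaps(TABLE_6)
print(gaps)  # {0: 0, 1: 1, 2: -3, 3: 2, 4: -6, 5: 13}
print(round(weighted_mean_abs_gap(TABLE_6), 2))  # 2.06
```

On these numbers the largest gap is at risk score 5, where actual survival exceeds the prediction by 13 percentage points, while the patient-weighted average gap stays near 2 points because the low-score groups are much larger.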

In [17]:
# Table 6. Ten-year actual and predicted survival according to age-comorbidity in the testing population

In [18]:
result = crew.kickoff()

In [19]:
print(result)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Crew Report</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
</head>
<body>
    <h1>Crew Report</h1>
    <h2>Table 6 Data</h2>
    <table border="1">
        <tr>
            <th>Comorbidity-age combined risk score</th>
            <th>Number of patients</th>
            <th>Actual 10-yr survival (%)</th>
            <th>Predicted 10-yr survival (%)</th>
        </tr>
        <tr><td>0</td><td>213</td><td>99</td><td>99</td></tr>
        <tr><td>1</td><td>156</td><td>97</td><td>96</td></tr>
        <tr><td>2</td><td>136</td><td>87</td><td>90</td></tr>
        <tr><td>3</td><td>109</td><td>79</td><td>77</td></tr>
        <tr><td>4</td><td>42</td><td>47</td><td>53</td></tr>
        <tr><td>5</td><td>29</td><td>34</td><td>21</td></tr>
    </table>
    <h2>Area Plot</h2>
    <canvas id="survivalChart" width="400" he

## Key Learning Points

### Multi-Agent Benefits:
- **Specialization**: Each agent focuses on their strength
- **Collaboration**: Agents share information and delegate tasks
- **Quality Control**: Multiple perspectives improve output quality
- **Scalability**: Easy to add new agents with different specialties

### CrewAI Features:
- **Tool Integration**: Seamless integration of external tools
- **Task Management**: Clear task definition and dependency handling
- **Agent Communication**: Built-in collaboration mechanisms
- **Flexible LLM Support**: Works with multiple AI models

### Practical Applications:
- **Document Analysis**: Automated extraction and analysis of scientific papers
- **Report Generation**: Standardized reporting workflows
- **Data Visualization**: Automated chart and graph creation
- **Quality Assurance**: Multi-agent validation of results

---

## Troubleshooting Tips

1. **API Key Issues**: Ensure all API keys are properly set in environment variables
2. **PDF Reading Problems**: Check PDF format compatibility with PyPDF2
3. **Tool Errors**: Verify tool permissions and file paths
4. **Memory Issues**: Monitor token usage with large documents
5. **Agent Conflicts**: Review task descriptions for clarity and avoid overlapping responsibilities
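
Several of the tips above can be checked before calling `crew.kickoff()`. The following is a minimal pre-flight sketch, assuming the paths and key names used earlier in this notebook; `preflight` is a hypothetical helper, not part of CrewAI.

```python
import os
from pathlib import Path


def preflight(pdf_path, report_dir, required_keys, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    for key in required_keys:
        if not env.get(key):
            problems.append(f"missing environment variable: {key}")
    pdf = Path(pdf_path)
    if not pdf.is_file():
        problems.append(f"PDF not found: {pdf}")
    elif pdf.stat().st_size == 0:
        problems.append(f"PDF is empty: {pdf}")
    if not Path(report_dir).is_dir():
        problems.append(f"report directory not found: {report_dir}")
    return problems
```

A call such as `preflight("./data/article.pdf", "./reports", ["OPENAI_API_KEY"])` returns an empty list when everything is in place, so the result can be printed or asserted on before any tokens are spent.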

This multi-agent system demonstrates the power of collaborative AI for complex document analysis tasks, combining the strengths of specialized agents to produce comprehensive, high-quality outputs.