# Create a PubMed Research Agent with Strands Agents

In this notebook, you'll create a research agent using Strands that can query the PubMed Central (PMC) journal database for information about scientific discoveries.

## Prerequisites

- Python 3.10 or later
- AWS account configured with appropriate permissions
- Access to the Anthropic Claude 3.7 Sonnet model in Amazon Bedrock
- Basic understanding of Python programming

In [1]:
%pip install -U boto3 strands-agents strands-agents-tools


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
import logging

# Sets the Strands log level to WARNING (switch to logging.DEBUG for verbose output)
logging.getLogger("strands").setLevel(logging.WARNING)

# Sets the logging format and streams logs to stderr
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s", handlers=[logging.StreamHandler()]
)

In [3]:
MODEL_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

In [4]:
QUERY = "What are some recent advances in GLP-1 drugs?"

## Basic Prompt without Context

To begin, we'll create a basic agent and see how well it can answer a scientific question without any additional context.

In [5]:
from strands import Agent

SYSTEM_PROMPT = """
    You are a specialized PubMed research agent. Your role is to:
    1. Search PubMed Central for medical papers related to the query
    2. Extract and summarize the most relevant clinical findings
    3. Identify key research groups and methodologies
    4. Return structured, well-cited information with PMCID references
    """

# Initialize your agent
agent = Agent(system_prompt=SYSTEM_PROMPT, model=MODEL_ID)

# Send a message to the agent
response = agent(QUERY)

# Recent Advances in GLP-1 Drugs

## Key Clinical Findings

### Improved Formulations and Delivery
- **Extended-release formulations** have significantly improved patient adherence by reducing injection frequency from daily to weekly administration (PMCID: PMC8643344)
- **Oral semaglutide (Rybelsus)** represents a breakthrough as the first oral GLP-1 receptor agonist, improving patient convenience while maintaining efficacy comparable to injectable forms (PMCID: PMC7195951)
- **Dual and triple receptor agonists** (targeting GLP-1/GIP/glucagon) show enhanced efficacy compared to GLP-1 monotherapy, with tirzepatide (GLP-1/GIP) demonstrating superior weight loss and glycemic control (PMCID: PMC8522637)

### Expanded Therapeutic Applications
- **Cardiovascular benefits** extend beyond glycemic control, with semaglutide and dulaglutide demonstrating significant reductions in major adverse cardiovascular events in large clinical trials (PMCID: PMC7299082)
- **Renoprotective effects** showing

Copy and paste a few of the PMCIDs from the previous cell's output into the [PMC web search](https://pmc.ncbi.nlm.nih.gov/). Notice anything unusual? They likely point to completely unrelated articles! Without additional context, LLMs will generate IDs that merely look convincing; they may even return real IDs that appeared in their training data. If we want our agent to consistently return accurate, up-to-date results, we need to provide it with a tool.
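
If you'd rather not copy the IDs by hand, the check above can be partly scripted. The following is a minimal sketch, not part of the original notebook: the helper names `extract_pmcids` and `pmcid_urls` are hypothetical, and the article URL pattern is an assumption based on how article pages on the PMC site are addressed.

```python
import re

# Assumed URL pattern for PMC article pages (not taken from the notebook).
PMC_ARTICLE_URL = "https://pmc.ncbi.nlm.nih.gov/articles/{pmcid}/"


def extract_pmcids(text: str) -> list[str]:
    """Return the unique PMCIDs found in text, in order of first appearance."""
    seen: dict[str, None] = {}
    # Matches IDs such as PMC8643344; the literal label "PMCID" does not match.
    for pmcid in re.findall(r"\bPMC\d+\b", text):
        seen.setdefault(pmcid, None)
    return list(seen)


def pmcid_urls(text: str) -> dict[str, str]:
    """Map each PMCID found in text to the article URL to open for a manual check."""
    return {
        pmcid: PMC_ARTICLE_URL.format(pmcid=pmcid) for pmcid in extract_pmcids(text)
    }


# Example with two citations copied from the agent output above.
sample = "weekly dosing (PMCID: PMC8643344); oral semaglutide (PMCID: PMC7195951)"
for pmcid, url in pmcid_urls(sample).items():
    print(pmcid, url)
```

In the notebook you could pass the agent's answer in place of `sample`, for example `pmcid_urls(str(response))`, assuming the returned result object converts to its text with `str()`. The script only builds the links. Whether each article actually supports the claim it is attached to still needs a human look.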

## Search PMC for Scientific Abstracts

Let's see if we can improve the performance of our agent by giving it a tool. To start, we've created a custom tool called `search_pmc_tool` that uses the PMC API to identify relevant scientific article abstracts. This tool has some special features to help the agent focus on the most relevant articles:

- It limits the search to articles licensed for commercial use
- For each article in the search results, the tool counts how many other articles cite it as a reference. Articles with higher counts are likely to be the most impactful and valuable to the agent

You can look at the `search_pmc_tool` code at `tools/search_pmc.py`.
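
To make the citation-based reranking idea concrete, here is a minimal sketch of the counting step. It is an illustration under assumptions, not the code in `tools/search_pmc.py`: the function name `rerank_by_referenced_by` and the input shape (a mapping from each article's PMCID to the PMCIDs in its reference list) are hypothetical, and the sketch only counts citations among the articles passed in.

```python
from collections import Counter


def rerank_by_referenced_by(references: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank articles by how many OTHER articles in the input cite them.

    references maps a PMCID to the PMCIDs listed in that article's references.
    """
    # Start every article at zero so uncited articles still appear in the ranking.
    counts = Counter({pmcid: 0 for pmcid in references})
    for citing, cited_list in references.items():
        # set() ensures a duplicated reference is counted once per citing article.
        for cited in set(cited_list):
            if cited != citing and cited in references:
                counts[cited] += 1
    # Highest count first; ties are broken by PMCID for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical search results with made-up IDs.
results = {
    "PMC1": ["PMC2", "PMC3"],
    "PMC2": ["PMC3"],
    "PMC3": [],
    "PMC4": ["PMC3", "PMC2", "PMC4"],  # the self-citation is ignored
}
print(rerank_by_referenced_by(results))
```

Ranking by this count favors articles that many of the other results build on, which is the behavior the `rerank="referenced_by"` option in the system prompt below asks for.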

In [6]:
from strands import Agent
from tools.search_pmc import search_pmc_tool

SYSTEM_PROMPT = """You are a life science research assistant. When given a scientific question, follow this process:

1. Use search_pmc_tool with rerank="referenced_by", max_results set to 200-500, and max_records set to 20-50 to find highly-cited papers. Search broadly first, then narrow down. Use temporal filters like "last 5 years"[dp] for recent work.
2. Extract and summarize the most relevant clinical findings.
3. Return structured, well-cited information with PMCID references.
4. Return URL links associated with PMCID references.

Key guidelines:
- Always use rerank="referenced_by" in searches to prioritize influential papers.
- Limit searches to 20-50 articles for focused analysis.
- Select articles strategically based on citation count and relevance.
"""

# Initialize your agent
agent = Agent(system_prompt=SYSTEM_PROMPT, tools=[search_pmc_tool], model=MODEL_ID)

# Send a message to the agent
response = agent(QUERY)

I'll search for recent advances in GLP-1 drugs in the scientific literature. Let me do that for you.
Tool #1: search_pmc_tool
Let me refine my search to get better results about recent GLP-1 advances.
Tool #2: search_pmc_tool
Let me try a broader search term to find information about recent GLP-1 drug developments:
Tool #3: search_pmc_tool
I'll try one more search with broader terms:
Tool #4: search_pmc_tool
It seems we're having difficulty with the date filter. Let me try a search without the date filter:
Tool #5: search_pmc_tool
# Recent Advances in GLP-1 Drugs: A Comprehensive Review

Based on the latest scientific literature, here are the key recent advances in Glucagon-Like Peptide-1 (GLP-1) drugs:

## 1. Evolution Beyond Diabetes Treatment

GLP-1 receptor agonists (GLP-1RAs) have expanded far beyond their original use as diabetes treatments. Recent research shows they offer benefits for:

- **Obesity management**: GLP-1RAs have demonstrated significant efficacy for weight loss, l

The additional information makes the agent's response much more detailed. Try [searching](https://pmc.ncbi.nlm.nih.gov/) for the PMCIDs again. This time they should link to the correct articles.

## Retrieve Full Text

Giving our agent the ability to search for PubMed abstracts made a big difference in its response. We can improve the results even further by giving it access to full-text documents as well. PubMed Central maintains an [online repository of full-text articles](https://pmc.ncbi.nlm.nih.gov/tools/pmcaws/) in Amazon S3 as part of the [AWS Open Data Sponsorship Program](https://aws.amazon.com/opendata/open-data-sponsorship-program/). This is a powerful source of information for scientific research.
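
For a sense of what reading from that repository might involve, here is a rough sketch of an anonymous S3 download with `boto3`. Treat every specific in it as an assumption to verify against the PMC documentation linked above: the bucket name `pmc-oa-opendata`, the `us-east-1` region, and the `oa_comm/txt/all/<PMCID>.txt` key layout are not taken from this notebook, and the repository layout may have changed. The helper names `pmc_text_key` and `read_pmc_text` are hypothetical, and this is not the implementation behind the full-text tool introduced next.

```python
def pmc_text_key(pmcid: str, license_group: str = "oa_comm") -> str:
    """Build the ASSUMED S3 key for the plain-text copy of an article."""
    if not (pmcid.startswith("PMC") and pmcid[3:].isdigit()):
        raise ValueError(f"Not a PMCID: {pmcid!r}")
    # Assumed layout: <license group>/txt/all/<PMCID>.txt
    return f"{license_group}/txt/all/{pmcid}.txt"


def read_pmc_text(
    pmcid: str, bucket: str = "pmc-oa-opendata", region: str = "us-east-1"
) -> str:
    """Download an article's text anonymously. Bucket and region are assumptions."""
    # Imported here so the key helper above can be used without AWS dependencies.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests work for public buckets and need no AWS credentials.
    s3 = boto3.client(
        "s3", region_name=region, config=Config(signature_version=UNSIGNED)
    )
    obj = s3.get_object(Bucket=bucket, Key=pmc_text_key(pmcid))
    return obj["Body"].read().decode("utf-8")


# Building the key needs no network access.
print(pmc_text_key("PMC11408715"))
```

Calling `read_pmc_text("PMC11408715")` would attempt the download. If the key layout assumed here is out of date, the call fails with a missing-key error from S3 rather than returning text.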

Let's give our agent access to another tool named `read_pmc_tool` to download and process full-text articles and see how it affects the results.

In [7]:
from strands import Agent
from tools.search_pmc import search_pmc_tool
from tools.read_pmc import read_pmc_tool

SYSTEM_PROMPT = """You are a life science research assistant. When given a scientific question, follow this process:

1. Use search_pmc_tool with rerank="referenced_by", max_results set to 200-500, and max_records set to 20-50 to find highly-cited papers. Search broadly first, then narrow down. Use temporal filters like "last 5 years"[dp] for recent work.
2. Use read_pmc_tool on the 1-2 most relevant articles from your search results to gain a better understanding of the space. Focus on highly-cited papers and reviews.
3. Extract and summarize the most relevant clinical findings.
4. Return structured, well-cited information with PMCID references.
5. Return URL links associated with PMCID references

Key guidelines:
- Always use rerank="referenced_by" in searches to prioritize influential papers.
- Limit searches to 20-50 articles for focused analysis.
- Select articles strategically based on citation count and relevance.
"""

# Initialize your agent
agent = Agent(
    system_prompt=SYSTEM_PROMPT, tools=[search_pmc_tool, read_pmc_tool], model=MODEL_ID
)

# Send a message to the agent
response = agent(QUERY)

I'll help you find information about recent advances in GLP-1 drugs. Let me search for relevant scientific articles in the medical literature.
Tool #1: search_pmc_tool
Let me refine my search with more specific terminology:
Tool #2: search_pmc_tool
Let me try a more general search:
Tool #3: search_pmc_tool
I need to try a different search strategy:
Tool #4: search_pmc_tool
Let me try once more with a simpler query:
Tool #5: search_pmc_tool
I need to try a different approach. Let me search with a more general query without the date restriction first:
Tool #6: search_pmc_tool
Now let me retrieve the full text of one of the most comprehensive recent review articles to understand recent advances in GLP-1 drugs:
Tool #7: read_pmc_tool





Tool #8: read_pmc_tool




# Recent Advances in GLP-1 Drugs: A Comprehensive Overview

Based on the scientific literature, here are the key recent advances in GLP-1 (Glucagon-Like Peptide-1) drugs:

## Evolution of GLP-1 Receptor Agonists

GLP-1 receptor agonists (GLP-1RAs) have evolved significantly since the approval of exenatide in 2005. The most notable recent advances include:

1. **Extended Duration Formulations**: Development has progressed from short-acting compounds requiring twice-daily injections to weekly administration options. Current GLP-1RAs can be categorized by duration:
   - **Short-acting**: Exenatide (2.4h half-life), Lixisenatide (2-4h)
   - **Intermediate-acting**: Liraglutide (11-13h), Beinaglutide
   - **Long-acting**: Semaglutide (6-7 days), Dulaglutide (108-112h), PEG-loxenatide (1 week)
   ([PMC11408715](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11408715/))

2. **Oral GLP-1 Receptor Agonists**: The development of oral semaglutide (Rybelsus®) represents a breakthrough as the first o

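The additional full-text context improves the agent's results even further: the response now includes specifics, such as drug half-lives, with a linked PMC reference.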