# Use Tools to Search PubMed Central for Scientific Articles

![Agent with scientific knowledge tool](../../../static/agent-tool-use.png)

This lab introduces participants to building their first research agent using the Strands Agents SDK and Amazon Bedrock AgentCore. The lab emphasizes the importance of grounding AI agents with reliable data sources and demonstrates how proper tooling transforms unreliable responses into accurate, well-cited scientific information. It serves as a foundation for the more advanced retrieval and planning techniques covered in subsequent labs.

The lab starts by creating a basic agent using Claude Sonnet 4 and asking it about GLP-1 drug safety and effectiveness. This demonstrates a critical limitation: without access to real data, the agent generates plausible-sounding but potentially inaccurate PMC IDs that don't link to actual relevant articles.

The lab then introduces a custom `search_pmc_tool` tool that connects to the PubMed Central API. This tool includes features such as:

- Filtering for articles licensed for commercial use
- Ranking results by citation count to prioritize high-impact research
- Providing accurate PMC IDs that link to real, relevant scientific papers

The lab guides participants through deploying their agent to Amazon Bedrock AgentCore runtime, transforming a local prototype into a scalable, production-ready system. This includes configuring the runtime environment, launching the agent, and testing it in the cloud.

## 1. Prerequisites

- Python 3.10 or later
- AWS account configured with appropriate permissions
- Access to the Anthropic Claude Sonnet 4 model on Amazon Bedrock
- Basic understanding of Python programming

In [None]:
%pip install -qU boto3 strands-agents strands-agents-tools defusedxml httpx bedrock_agentcore_starter_toolkit

## 2. Basic Prompt without Context

To begin, create a basic agent and see how well it can answer a scientific question without any additional context.

In [None]:
from strands import Agent

MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"
QUERY = "How safe and effective are GLP-1 drugs for long term use?"

SYSTEM_PROMPT = """
    You are a specialized life science research agent. Your role is to:
    1. Search PubMed Central for medical papers related to the query
    2. Extract and summarize the most relevant clinical findings
    3. Identify key research groups and methodologies
    4. Return structured, well-cited information with PMCID references
    Please provide specific PMC ID references that support your answer.
    """

# Initialize your agent
agent = Agent(system_prompt=SYSTEM_PROMPT, model=MODEL_ID)

# Send a message to the agent
response = agent(QUERY)

The model may not be able to produce references. If it does, copy a few of the identified PMCIDs and paste them into the [PMC web search](https://pmc.ncbi.nlm.nih.gov/). Notice anything unusual? They likely point to completely unrelated resources! Without additional context, LLMs do their best to generate IDs that seem convincing, and they may even return real IDs that appeared in their training data. However, if we want our agent to consistently return accurate, up-to-date results, we need to provide it with a tool.
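
To make the manual check quicker, the PMCIDs can be pulled out of the response text and turned into article links. The following is a minimal sketch, not part of the lab code: the helper name `extract_pmcid_links` is hypothetical, and it assumes that `str(response)` yields the agent's reply as plain text.

```python
import re

PMC_ARTICLE_URL = "https://pmc.ncbi.nlm.nih.gov/articles/{pmcid}/"


def extract_pmcid_links(text: str) -> dict[str, str]:
    """Map each distinct PMCID found in `text` to its PMC article URL."""
    # PMCIDs are written as "PMC" followed by digits, e.g. PMC1234567.
    pmcids = re.findall(r"\bPMC\d+\b", text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {
        pmcid: PMC_ARTICLE_URL.format(pmcid=pmcid)
        for pmcid in dict.fromkeys(pmcids)
    }


# In the notebook (assumption): links = extract_pmcid_links(str(response))
sample = "See PMC0000001 and PMC0000002 (also PMC0000001)."  # made-up IDs
for pmcid, url in extract_pmcid_links(sample).items():
    print(pmcid, url)
```

Opening each link shows whether the ID resolves to an article on the topic of the query. The helper only matches the ID pattern; it does not verify that an ID exists.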

## 3. Search PMC for Scientific Abstracts

See if you can improve the performance of your agent by giving it access to a custom tool called `search_pmc_tool`. This tool uses the PMC API to identify relevant scientific article abstracts. It has some special features to help the agent focus on the most relevant articles:

- It limits the search to only articles licensed for commercial use
- For each article in the search results, the tool calculates how many OTHER articles include it as a reference. Frequently cited articles are likely to be the most impactful and the most valuable to the agent.

You can look at the `search_pmc_tool` code in `search_pmc.py`.
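
The citation-based ranking described above can be illustrated with a small, self-contained sketch. This is not the code in `search_pmc.py`: the function name and input shape are hypothetical, and it assumes citations are counted only among the articles in the result set, which may differ from what the real tool does.

```python
from collections import Counter


def rank_by_in_set_citations(
    references: dict[str, list[str]],
) -> list[tuple[str, int]]:
    """Rank articles by how many OTHER articles in the result set cite them.

    `references` maps each article ID to the IDs it lists as references.
    """
    # Start every article at zero so uncited articles still appear.
    counts = Counter({article_id: 0 for article_id in references})
    for citing_id, cited_ids in references.items():
        for cited_id in set(cited_ids):  # count each citing article once
            if cited_id in references and cited_id != citing_id:
                counts[cited_id] += 1
    # Sort by citation count (descending), then by ID for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up IDs for illustration only.
results = {
    "PMC_A": ["PMC_B", "PMC_C"],
    "PMC_B": ["PMC_C"],
    "PMC_C": [],
}
print(rank_by_in_set_citations(results))
# → [('PMC_C', 2), ('PMC_B', 1), ('PMC_A', 0)]
```

Self-citations and references to articles outside the result set are ignored in this sketch, so the score reflects only how often the other returned articles cite each one.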

In [None]:
from strands import Agent
from search_pmc import search_pmc_tool

MODEL_ID = "global.anthropic.claude-sonnet-4-20250514-v1:0"
QUERY = "How safe and effective are GLP-1 drugs for long term use?"

SYSTEM_PROMPT = """You are a life science research assistant. When given a scientific question, follow this process:

1. Use search_pmc_tool to find highly-cited papers. Search broadly first, then narrow down. Use temporal filters like "last 2 years"[dp] for recent work.
2. Extract and summarize the most relevant clinical findings.
3. Return structured, well-cited information with PMC ID references.
4. Return URL links associated with the PMC ID references.
"""

# Initialize your agent
agent = Agent(system_prompt=SYSTEM_PROMPT, tools=[search_pmc_tool], model=MODEL_ID)

# Send a message to the agent
response = agent(QUERY)

The additional information makes the agent response much more detailed. Try [searching](https://pmc.ncbi.nlm.nih.gov/) for the PMCIDs again. This time they should link to the correct articles.

## 4. Deploy to Amazon Bedrock AgentCore Runtime

Next, deploy the agent and tool to Amazon Bedrock AgentCore Runtime. 

Amazon Bedrock AgentCore Runtime provides a secure, serverless environment designed for hosting AI agents at enterprise scale. It removes the need to manage complex infrastructure, offers session isolation, and supports any agent framework. The platform automatically handles operational requirements such as container orchestration, session management, scaling, and security isolation, so developers can focus on the agent itself. AgentCore Runtime also supports execution times of up to 8 hours, accepts payloads of up to 100 MB for multimodal workloads, and uses consumption-based pricing that charges only for the resources actually used. These properties make it a good fit for research agents that need long-running sessions and heavy data processing.

Start by looking at the agent code. Notice the `@app.entrypoint` decorator on the `strands_agent_bedrock` function? This tells AgentCore Runtime how to run the agent.

In [None]:
%pycat agent.py

Run the following cell to configure and launch the agent on AgentCore Runtime.

In [None]:
import boto3
from bedrock_agentcore_starter_toolkit import Runtime

ssm = boto3.client("ssm")

agentcore_runtime = Runtime()
agentcore_runtime.configure(
    agent_name="pmc_abstract_agent",
    auto_create_ecr=True,
    execution_role=ssm.get_parameter(
        Name="/deep-research-workshop/agentcore-runtime-role-arn"
    )["Parameter"]["Value"],
    entrypoint="agent.py",
    memory_mode="NO_MEMORY",
    requirements_file="requirements.txt",
)

agentcore_runtime.launch(auto_update_on_conflict=True)

When the deployment is finished, generate a new session ID value and use it to invoke the agent.

In [None]:
from invoke_agentcore import invoke_agentcore
import uuid

session_id = str(uuid.uuid4())

invoke_agentcore(
    agent_runtime_name="pmc_abstract_agent",
    prompt="How safe and effective are GLP-1 drugs for long term use?",
    session_id=session_id,
)

## 5. (Optional) Interact with the Agent Using a Chat Application

To experiment with this agent in an instructor-led workshop, follow the steps in the Getting Started section of the Workshop Studio page for this event to access a Streamlit chat application. Then, select `pmc_abstract_agent` from the **Agent Name** list.


## 6. (Optional) Clean Up

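Run the next notebook cell to delete the AgentCore runtime environment. The cell reuses the `agentcore_runtime` object created in section 4, so run it in the same notebook session.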

In [None]:
import boto3

agentcore_client = boto3.client("bedrock-agentcore-control")
agent_status = agentcore_runtime.status()

agentcore_client.delete_agent_runtime(agentRuntimeId=agent_status.config.agent_id)