# AI Investment Research Assistant

The instructions below guide you through the process of creating a supervisor agent and subagents for an investment research assistant. Each section explains the purpose of the code cells that follow.

## Prerequisites

Follow the instructions in README.md to set up the environment and deploy the web search and stock data stacks. Enable the required foundation models and attach the necessary permissions.

#### Ensure that this cell is run before running any other cells in the notebook

In [None]:
!pip install -q -r /home/sagemaker-user/amazon-bedrock-agent-samples/src/requirements.txt
!pip install --upgrade boto3 botocore

### Importing helper functions

The following cell adds the repository root to the Python path so that the helper modules `bedrock_agent` and `knowledge_base_helper` under `src/utils` can be imported.

All interactions with Amazon Bedrock are handled by the classes in these modules.

This notebook calls the following methods:

From `bedrock_agent`:
- `Agent.create` and `SupervisorAgent.create`: Create a new agent and its IAM roles
- `invoke`: Execute the agent

From `knowledge_base_helper`:
- `create_or_retrieve_knowledge_base`: Create a knowledge base on Amazon Bedrock if it doesn't exist, or retrieve information about an existing one.
- `synchronize_data`: Read files from S3, convert the text into vectors, and add them to the vector database.
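
The import cell relies on `Path.cwd().parents[2]` to locate the repository root. As a quick illustration of how `parents` indexing maps to directory levels, here is a minimal sketch; the directory layout is an assumption based on the install path used earlier, and your checkout may differ.

```python
from pathlib import PurePosixPath

# Hypothetical notebook directory, assumed for illustration only.
notebook_dir = PurePosixPath(
    "/home/sagemaker-user/amazon-bedrock-agent-samples"
    "/examples/multi_agent_collaboration/investment_research_agent"
)

# parents[0] is the immediate parent, so parents[2] is three levels up.
repo_root = notebook_dir.parents[2]
print(repo_root)
```

If the notebook lives at a different depth, the index passed to `parents` has to change accordingly, otherwise the `src.utils` imports will fail.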

In [None]:
import sys
import argparse
from pathlib import Path

import boto3
import botocore
from botocore.exceptions import ClientError

import os
from IPython.display import JSON, IFrame, Video, display, clear_output
from datetime import datetime
import time
from time import sleep

# Resolve the repository root: parents[2] is three levels above the current working directory
ROOT_PATH = Path.cwd().parents[2]
sys.path.insert(0, str(ROOT_PATH))  # Insert at the beginning of sys.path

# Importing custom modules
from src.utils.bedrock_agent import (
    Agent,
    SupervisorAgent,
    Task,
    Guardrail,
    region,
    account_id,
    agents_helper,
)
from src.utils.knowledge_base_helper import KnowledgeBasesForAmazonBedrock

# Initialize the Knowledge Base helper
kb_helper = KnowledgeBasesForAmazonBedrock()

# Initialize boto3 client
bedrock_client = boto3.client("bedrock")
# LLM = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"
LLM = "us.anthropic.claude-3-5-sonnet-20241022-v2:0"
# LLM = "amazon.nova-lite-v1:0"

## Setup Bedrock Data Automation Project (BDA)
### Download multimodal data to be ingested into our knowledge base
This may take ~5 minutes due to large file sizes.

In [None]:
!mkdir -p files/
!curl -L -o files/Amazon-Quarterly-Earnings-Report-Q4-2024-Full-Call.wav https://s2.q4cdn.com/299287126/files/doc_financials/2024/q4/Amazon-Quarterly-Earnings-Report-Q4-2024-Full-Call-v1.wav
!curl -L -o files/amazon2024_10k.pdf https://s2.q4cdn.com/299287126/files/doc_financials/2024/q4/e42c2068-bad5-4ab6-ae57-36ff8b2aeffd.pdf
!curl -L -o files/amazon2024_q2_10Q.pdf https://s2.q4cdn.com/299287126/files/doc_financials/2024/q2/AMAZON-10Q-20240208.pdf
!curl -L -o files/amazon2024_q3_10Q.pdf https://s2.q4cdn.com/299287126/files/doc_financials/2024/q3/76ba648c-eba4-4ec1-b571-4f5993feed2e.pdf
!curl -L -o files/amazon2024_q1_10Q.pdf https://s2.q4cdn.com/299287126/files/doc_financials/2024/q1/04741924-2b6f-4934-91ab-1ccae56f0f9b.pdf
!curl -L -o files/amazon2023_q1_10Q.pdf https://s2.q4cdn.com/299287126/files/doc_financials/2023/q1/394e4b27-bf11-4bcf-b650-2ac1b9fe2a14.pdf

## Creating BDA Project

To start a BDA job, you need a BDA project, which organizes both standard and custom output configurations. This project is reusable, allowing you to apply the same configuration to process multiple video/audio/image files that share the same settings.
### Set up an S3 bucket for BDA and for the knowledge base

In [None]:
bda_client = boto3.client("bedrock-data-automation", region_name=region)
bda_runtime_client = boto3.client("bedrock-data-automation-runtime", region_name=region)
s3_client = boto3.client("s3", region_name=region)

kb_bucket_name = f"multimodal-fsi-data-{region}-{account_id}"

try:
    s3_client.create_bucket(
        Bucket=kb_bucket_name, 
        CreateBucketConfiguration={'LocationConstraint': region} # Comment this out if you are in us-east-1
    )
except ClientError as e:
    # Check if bucket already exists
    if e.response["Error"]["Code"] in ["BucketAlreadyOwnedByYou"]:
        print(
            f"Bucket {kb_bucket_name} already exists and is owned by you. No action taken."
        )
    else:
        # For any other errors, raise the exception
        raise


# Create bucket for BDA input/output
bda_bucket_name = f"bda-processing-{region}-{account_id}"


try:
    s3_client.create_bucket(
        Bucket=bda_bucket_name,
        CreateBucketConfiguration={'LocationConstraint': region} # Comment this out if you are in us-east-1
    )
except ClientError as e:
    # Check if bucket already exists
    if e.response["Error"]["Code"] in ["BucketAlreadyOwnedByYou"]:
        print(
            f"Bucket {bda_bucket_name} already exists and is owned by you. No action taken."
        )
    else:
        # For any other errors, raise the exception
        raise

bda_bucket_name_input = f"s3://{bda_bucket_name}/input/"  # BDA input path
bda_bucket_name_output = f"s3://{bda_bucket_name}/output/"  # BDA output path
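

The cell above asks you to comment out `CreateBucketConfiguration` when running in us-east-1. One way to avoid the manual step is to build the arguments conditionally. This is a minimal sketch; `create_bucket_kwargs` is a hypothetical helper name, not part of the notebook's helper modules.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build keyword arguments for s3_client.create_bucket()."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and rejects an explicit LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage with the client defined in the cell above (not executed here):
# s3_client.create_bucket(**create_bucket_kwargs(kb_bucket_name, region))
print(create_bucket_kwargs("example-bucket", "us-west-2"))
```

The same call then works unchanged in every region, including us-east-1.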


Define a name for the BDA project. If a project with that name already exists, it is deleted first.

In [None]:
project_name = f"fsi-bda-{region}-{account_id}"

# delete project if it already exists
projects_existing = [
    project
    for project in bda_client.list_data_automation_projects(projectStageFilter="ALL")[
        "projects"
    ]
    if project["projectName"] == project_name
]
if len(projects_existing) > 0:
    print(f"Deleting existing project: {projects_existing[0]}")
    bda_client.delete_data_automation_project(
        projectArn=projects_existing[0]["projectArn"]
    )


### Configure the BDA project
For more information on creating/configuring BDA projects, take a look at our [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/bda-using-api.html). 

In [None]:
response = bda_client.create_data_automation_project(
    projectName=project_name,
    projectDescription="BDA video, audio, and document processing project",
    projectStage="DEVELOPMENT",
    standardOutputConfiguration={
        "video": {
            "extraction": {
                "category": {
                    "state": "ENABLED",
                    "types": ["TEXT_DETECTION", "TRANSCRIPT"],
                },
                "boundingBox": {"state": "ENABLED"},
            },
            "generativeField": {
                "state": "ENABLED",
                "types": ["VIDEO_SUMMARY", "CHAPTER_SUMMARY", "IAB"],
            },
        },
        "audio": {
            "extraction": {"category": {"state": "ENABLED", "types": ["TRANSCRIPT"]}},
            "generativeField": {
                "state": "ENABLED",
                "types": ["AUDIO_SUMMARY", "TOPIC_SUMMARY", "IAB"],
            },
        },
        "document": {
            "extraction": {
                "granularity": {"types": ["DOCUMENT", "PAGE", "ELEMENT"]},
                "boundingBox": {"state": "ENABLED"},
            },
            "generativeField": {"state": "ENABLED"},
            "outputFormat": {
                "textFormat": {"types": ["PLAIN_TEXT"]},
                "additionalFileFormat": {"state": "ENABLED"},
            },
        },
    },
)

Retrieve and print out the ARN of the BDA project.

In [None]:
kb_project_arn = response.get("projectArn")
print("BDA kb project ARN:", kb_project_arn)

## Process files by invoking BDA

This may take ~10-15 minutes because the files are large.
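
The processing cell polls the job status until it reaches a terminal state. A bounded version of that loop could look like the sketch below. The terminal status names are taken from the notebook's own loop; `wait_for_status`, its timeout default, and the injectable `sleep` and `clock` parameters are assumptions added for illustration.

```python
import time
from typing import Callable

# Terminal states, as used by the polling loop in this notebook.
TERMINAL_STATUSES = {"Success", "ServiceError", "ClientError"}


def wait_for_status(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it returns a terminal status or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Demo with a fake status source, so no AWS call is made.
fake_statuses = iter(["Created", "InProgress", "Success"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
```

In the notebook, `get_status` would wrap `bda_runtime_client.get_data_automation_status(invocationArn=...)` and return its `status` field.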

In [None]:
blueprint_arn = None
# If you want to use a blueprint, for example:
# blueprint_arn = f"arn:aws:bedrock:{region}:{account_id}:blueprint/bedrock-data-automation-public"

# Process files in your local folder ("files/")
for root, dirs, files in os.walk("files/"):
    for file in files:
        local_file_path = os.path.join(root, file)
        s3_key_input = f"input/{file}"
        s3_uri_input = f"s3://{bda_bucket_name}/{s3_key_input}"
        # Use your output bucket URI as defined (make sure it ends with a slash if needed)
        s3_uri_output = bda_bucket_name_output

        print(f"Uploading file {file} to {s3_uri_input}")
        s3_client.upload_file(local_file_path, bda_bucket_name, s3_key_input)

        # Optionally attach a blueprint; when blueprint_arn is None the argument is omitted.
        kwargs = {}
        if blueprint_arn is not None:
            kwargs["blueprints"] = [{"blueprintArn": blueprint_arn}]

        # Invoke the asynchronous BDA task.
        # dataAutomationProfileArn is supplied at the top level; the project ARN goes inside dataAutomationConfiguration.
        bda_response = bda_runtime_client.invoke_data_automation_async(
            dataAutomationConfiguration={
                "dataAutomationProjectArn": kb_project_arn,  # project ARN from create_data_automation_project()
                "stage": "DEVELOPMENT",  # or "LIVE" if that's what you're using
            },
            inputConfiguration={"s3Uri": s3_uri_input},
            outputConfiguration={"s3Uri": s3_uri_output},
            dataAutomationProfileArn=f"arn:aws:bedrock:{region}:{account_id}:data-automation-profile/us.data-automation-v1",
            **kwargs,
        )

        invocation_arn = bda_response.get("invocationArn")
        print("BDA task started:", invocation_arn)

        # Poll for task completion
        statusBDA = None
        while statusBDA not in ["Success", "ServiceError", "ClientError"]:
            status_response = bda_runtime_client.get_data_automation_status(
                invocationArn=invocation_arn
            )
            statusBDA = status_response.get("status")
            clear_output(wait=True)
            print(
                f"{datetime.now().strftime('%H:%M:%S')} : BDA task status: {statusBDA}"
            )
            time.sleep(5)

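        # Stop early on failure instead of failing later on a missing output URI
        if statusBDA != "Success":
            raise RuntimeError(f"BDA task for {file} ended with status {statusBDA}")

        output_config_uri = status_response.get("outputConfiguration", {}).get("s3Uri")
        print("Output configuration file:", output_config_uri)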

        # Prepare to download the result JSON from the output S3 URI.
        # (Adjust the parsing as needed for your output structure)
        out_result_key = output_config_uri.split("/job_metadata.json", 1)[0].split(
            f"{bda_bucket_name}/"
        )[1]
        out_result_key += "/0/standard_output/0/result.json"
        local_result_file = f"result_{file}.json"
        print("Downloading result file from key:", out_result_key)
        s3_client.download_file(bda_bucket_name, out_result_key, local_result_file)

        # Finally, upload the result to your knowledge base bucket
        kb_file = f"data/result_{file}_kb.json"
        print(f"Uploading file {local_result_file} to KB bucket")
        s3_client.upload_file(local_result_file, kb_bucket_name, kb_file)
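

The cell above derives the `result.json` key by splitting the metadata URI on the bucket name. A slightly more defensive variant can be sketched with `urllib.parse`. The `0/standard_output/0/result.json` layout is copied from the cell above and is assumed to hold for your jobs; `result_location` and the example URI are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse

METADATA_NAME = "job_metadata.json"
RESULT_SUFFIX = "0/standard_output/0/result.json"  # layout assumed from the cell above


def result_location(metadata_uri: str) -> Tuple[str, str]:
    """Return (bucket, key) of result.json given the job_metadata.json S3 URI."""
    parsed = urlparse(metadata_uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not key.endswith(METADATA_NAME):
        raise ValueError(f"Unexpected BDA metadata URI: {metadata_uri}")
    # Drop the file name but keep the trailing slash of the job prefix.
    job_prefix = key[: -len(METADATA_NAME)]
    return parsed.netloc, f"{job_prefix}{RESULT_SUFFIX}"


bucket, key = result_location("s3://example-bucket/output/abc123/job_metadata.json")
print(bucket, key)
```

The returned pair can be passed directly to `s3_client.download_file(bucket, key, local_result_file)`.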


### Set up functions to clean up agents and create a guardrail
Function to delete agents and guardrails if they are already created.

In [None]:
def clean_up_agents():
    agents_helper.delete_agent(
        agent_name="investment_research_assistant", delete_role_flag=True, verbose=True
    )
    agents_helper.delete_agent(
        agent_name="news_agent", delete_role_flag=True, verbose=True
    )
    agents_helper.delete_agent(
        agent_name="quantitative_analysis_agent", delete_role_flag=True, verbose=True
    )
    agents_helper.delete_agent(
        agent_name="smart_summarizer_agent", delete_role_flag=True, verbose=True
    )
    response = bedrock_client.list_guardrails()
    for _gr in response["guardrails"]:
        if _gr["name"] == "no_bitcoin_guardrail":
            print(f"Found guardrail: {_gr['id']}")
            guardrail_identifier = _gr["id"]
            bedrock_client.delete_guardrail(guardrailIdentifier=guardrail_identifier)


#### Define Amazon Bedrock Guardrail
[Amazon Bedrock Guardrails](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html) can implement safeguards for your generative AI applications based on your use cases and responsible AI policies. You can use guardrails for both user inputs and model responses with natural language. In this case, we are creating a guardrail to prevent cryptocurrencies from being discussed by our agents.
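
For orientation, a denied-topic guardrail can be expressed as arguments to the boto3 `create_guardrail` call on the `bedrock` client. The sketch below only builds the argument dictionary. How the notebook's `Guardrail` helper maps its parameters onto this API is an assumption, and `denied_topic_guardrail_kwargs` and the example phrases are hypothetical.

```python
from typing import Iterable


def denied_topic_guardrail_kwargs(
    name: str,
    topic_name: str,
    definition: str,
    examples: Iterable[str],
    blocked_message: str,
) -> dict:
    """Build arguments for bedrock_client.create_guardrail() with one denied topic."""
    return {
        "name": name,
        "description": definition,
        "topicPolicyConfig": {
            "topicsConfig": [
                {
                    "name": topic_name,
                    "definition": definition,
                    "examples": list(examples),
                    "type": "DENY",
                }
            ]
        },
        # Messages returned when an input or an output is blocked.
        "blockedInputMessaging": blocked_message,
        "blockedOutputsMessaging": blocked_message,
    }


guardrail_kwargs = denied_topic_guardrail_kwargs(
    name="no_bitcoin_guardrail",
    topic_name="bitcoin_topic",
    definition="No Bitcoin or cryptocurrency allowed in the analysis.",
    examples=["Should I buy bitcoin?", "Which crypto coin will go up?"],  # hypothetical phrases
    blocked_message="Sorry, this agent cannot discuss bitcoin.",
)
# Not executed here; requires AWS credentials:
# bedrock_client.create_guardrail(**guardrail_kwargs)
print(guardrail_kwargs["topicPolicyConfig"]["topicsConfig"][0]["type"])
```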

In [None]:
def create_guardrail():
    return Guardrail(
        "no_bitcoin_guardrail",
        "bitcoin_topic",
        "No Bitcoin or cryptocurrency allowed in the analysis.",
        denied_topics=["bitcoin", "crypto", "cryptocurrency"],
        blocked_input_response="Sorry, this agent cannot discuss bitcoin.",
        verbose=True,
    )

#### Delete old agents and guardrail if they exist, create new guardrail

In [None]:
clean_up_agents()
Agent.set_force_recreate_default(True)
no_bitcoin_guardrail = create_guardrail()

### Create subagents
#### Create smart_summarizer_agent subagent
This agent takes in output from other subagents, such as recent news and financial data, and synthesizes information into structured investment insights.

In [None]:
smart_summarizer_agent = Agent.create(
    name="smart_summarizer_agent",
    role="A financial analyst specializing in synthesizing stock market trends and financial news into structured investment insights. The agent produces fact-based summaries to support strategic decision-making.",
    goal="Analyze stock trends and market news to generate insights.",
    instructions="""You are a Financial Analyst, responsible for analyzing stock trends and financial news to generate structured insights.
                            Combine stock price trends with financial news to identify key patterns.
                            Use your expertise to analyze macroeconomic indicators, company earnings, and market sentiment.
                            Ensure responses are fact-driven, clearly structured, and cite sources where applicable.
                            Do not generate financial advice—your role is to analyze and summarize available data objectively.
                            Keep analyses concise and insightful, focusing on major trends and anomalies.
                            Ensure answers are professional and coherent. No emojis should be displayed.
                            **If given portfolio optimization percentages, indicate that these are based on logic/math from the portfolio optimization tool, and are not considered financial advice**""",
    llm=LLM,
)


#### Create quantitative_analysis subagent
This agent queries and analyzes historical stock data and builds optimized portfolio allocations based on user inputs such as stock tickers and investment amount.

In [None]:
# Define Lambda ARNs
stock_data_tools_arn = f"arn:aws:lambda:{region}:{account_id}:function:stock_data_tools"

quantitative_analysis_agent = Agent.create(
    name="quantitative_analysis_agent",
    role="Financial Data Collector",
    goal="Retrieve real-time and historical stock prices, and optimize a portfolio for the given tickers.",
    instructions="""You are a Stock Data and Portfolio Optimization Specialist. Your role is to retrieve real-time stock data and optimize investment portfolios.

Your capabilities include:
1. Retrieving stock price data using the `stock_data_lookup` tool.
2. Performing portfolio optimization when at least three stock tickers are provided.
3. Enforcing the portfolio optimization rule: If fewer than three tickers are provided, inform the user that optimization requires at least three.

Core behaviors:
- Always retrieve stock data from `stock_data_lookup` before running portfolio optimization.
- If portfolio optimization is requested, invoke `portfolio_optimization` only after retrieving stock data.
- Do not attempt to interpret financial trends—focus solely on data retrieval and portfolio structuring.
""",
    tools=[
        # Stock Data Lookup Tool
        {
            "code": stock_data_tools_arn,
            "definition": {
                "name": "stock_data_lookup",
                "description": "Gets the 1-month stock price history for a given stock ticker, formatted as JSON.",
                "parameters": {
                    "ticker": {
                        "description": "The ticker to retrieve price history for",
                        "type": "string",
                        "required": True,
                    }
                },
            },
        },
        # Portfolio Optimization Tool
        {
            "code": stock_data_tools_arn,
            "definition": {
                "name": "portfolio_optimization",
                "description": "Optimizes a stock portfolio given a list of tickers and historical prices from the stock_data_lookup function.",
                "parameters": {
                    "tickers": {
                        "description": "A comma-separated list of stock tickers to include in the portfolio",
                        "type": "string",
                        "required": True,
                    },
                    "prices": {
                        "description": "A JSON object with dates as keys and stock prices as values",
                        "type": "string",
                        "required": True,
                    },
                },
            },
        },
    ],
    llm=LLM,
)
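

Both tool parameters above are declared as strings, so a caller has to flatten the ticker list and serialize the price data before invoking `portfolio_optimization`. The sketch below shows one way to do that and to enforce the three-ticker minimum from the agent instructions. The nested `{ticker: {date: price}}` shape, the placeholder tickers and prices, and `build_optimization_args` are assumptions; the Lambda's real input contract is not shown in this notebook.

```python
import json

MIN_TICKERS = 3  # minimum stated in the agent instructions


def build_optimization_args(prices_by_ticker: dict) -> dict:
    """Flatten ticker names and serialize prices into string-typed tool parameters."""
    tickers = list(prices_by_ticker)
    if len(tickers) < MIN_TICKERS:
        raise ValueError(
            f"Portfolio optimization requires at least {MIN_TICKERS} tickers, got {len(tickers)}"
        )
    return {
        "tickers": ",".join(tickers),
        "prices": json.dumps(prices_by_ticker),
    }


# Placeholder tickers and prices, not real market data.
example_prices = {
    "AAA": {"2025-01-02": 100.0, "2025-01-03": 101.5},
    "BBB": {"2025-01-02": 50.0, "2025-01-03": 49.0},
    "CCC": {"2025-01-02": 20.0, "2025-01-03": 20.5},
}
tool_args = build_optimization_args(example_prices)
print(tool_args["tickers"])
```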


#### Create a knowledge base, create news_agent subagent, attach and synchronize the knowledge base
This agent searches the knowledge base for relevant financial data, such as earnings reports and filings, to use as context. If the information is not present in the knowledge base, it constructs a web query.
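
The knowledge-base-first behavior described above can be pictured as plain control flow. In the notebook this decision is made by the model following its instructions, not by code, so the sketch below is only an illustration; `research` and the two stub search functions are hypothetical.

```python
from typing import Callable, Dict, List


def research(
    query: str,
    search_kb: Callable[[str], List[str]],
    search_web: Callable[[str], List[str]],
) -> Dict[str, object]:
    """Prefer knowledge base passages; fall back to web search only when none are found."""
    passages = search_kb(query)
    if passages:
        return {"source": "knowledge_base", "results": passages}
    return {"source": "web_search", "results": search_web(query)}


# Stub search functions so the sketch runs without AWS access.
def stub_kb(query: str) -> List[str]:
    return ["10-K excerpt"] if "10-K" in query else []


def stub_web(query: str) -> List[str]:
    return [f"web result for: {query}"]


kb_hit = research("Summarize the 10-K risk factors", stub_kb, stub_web)
web_hit = research("Latest headlines", stub_kb, stub_web)
print(kb_hit["source"], web_hit["source"])
```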

In [None]:
kb_name = "financial-analysis-kb"
kb_description = "Access this knowledge base when needing to look up financial information like 10K reports, revenues, sales, net sales, loss and risks. Contains earnings calls."
kb_id, ds_id = kb_helper.create_or_retrieve_knowledge_base(
    kb_name=kb_name,
    kb_description=kb_description,
    data_bucket_name=kb_bucket_name,
    embedding_model="amazon.titan-embed-text-v2:0",
)

# Define Web Search Lambda ARN
web_search_arn = f"arn:aws:lambda:{region}:{account_id}:function:web_search"


news_agent = Agent.create(
    name="news_agent",
    role="Market News Researcher",
    goal="Fetch from the knowledge base. Then if needed, fetch latest relevant news for a given stock based on a ticker.",
    instructions=f"""You are a Financial Document & News Analyst responsible for extracting structured insights from official financial reports and real-time news.

Your capabilities include:
1. Extracting insights from earnings calls, SEC filings (10-K, 10-Q), and corporate press releases stored in the knowledge base (ID: {kb_id}).
2. Summarizing financial reports with a focus on factual accuracy.
3. Retrieving the latest financial news only **if the knowledge base lacks relevant information**.

Core behaviors:
- **Always check the knowledge base (ID: {kb_id}) first** before fetching external news.
- **Avoid unnecessary web searches**—use external news sources only if the knowledge base lacks sufficient information.
- Ensure all findings are **fact-based, neutral, and structured** for investment research.
""",
    tools=[
        {
            "code": web_search_arn,
            "definition": {
                "name": "web_search",
                "description": "Searches the web for investment news and earnings reports.",
                "parameters": {
                    "search_query": {
                        "description": "The query to search the web with",
                        "type": "string",
                        "required": True,
                    },
                    "target_website": {
                        "description": "Specific website to search",
                        "type": "string",
                        "required": False,
                    },
                    "topic": {
                        "description": "The topic being searched, such as 'news'",
                        "type": "string",
                        "required": False,
                    },
                    "days": {
                        "description": "Number of days of history to search",
                        "type": "string",
                        "required": False,
                    },
                },
            },
        },
    ],
    kb_id=kb_id,
    llm=LLM,
)

kb_helper.synchronize_data(kb_id, ds_id)


### Create the supervisor agent
This supervisor agent orchestrates the overall investment research process by breaking down user prompts, delegating subtasks to specialized subagents, and consolidating their outputs to generate the final response.

In [None]:
investment_research_assistant = SupervisorAgent.create(
    "investment_research_assistant",
    role="Investment Research Assistant",
    goal="A seasoned investment research expert responsible for orchestrating subagents to conduct a comprehensive stock analysis. This agent synthesizes market news, stock data, and smart_summarizer insights into a structured investment report.",
    collaboration_type="SUPERVISOR",
    instructions=f"""You are an Investment Research Assistant, responsible for overseeing and synthesizing financial research from specialized agents. Your role is to coordinate subagents to produce structured investment insights.

Your capabilities include:
1. Managing collaboration between subagents to retrieve and analyze financial data.
2. Synthesizing stock trends, financial reports, and market news into a structured analysis.
3. Delivering well-organized, fact-based investment insights with clear distinctions between data sources.

Available subagents:
- **news_agent**: Retrieves and summarizes the latest financial news.  
  - **Always instruct news_agent to check the knowledge base (ID: {kb_id}) first before using external web searches**.
- **quantitative_analysis_agent**: Provides real-time and historical stock prices.  
  - For portfolio optimization, retrieve stock data via `stock_data_lookup` before calling `portfolio_optimization`.
- **smart_summarizer_agent**: Synthesizes financial data and market trends into a structured investment insight.

Core behaviors:
- Only invoke a subagent when necessary. Do not invoke agent for information not requested by user.
- Ensure responses are **well-structured, clearly formatted, and relevant to investor decision-making**.
- Differentiate between financial news, technical stock analysis, and synthesized insights.
""",
    collaborator_agents=[
        {
            "agent": "news_agent",
            "instructions": f"Always check the knowledge base (ID: {kb_id}) first. Use this collaborator for finding news and analyzing specific documents.",
        },
        {
            "agent": "quantitative_analysis_agent",
            "instructions": "Use this collaborator for retrieving stock price history and performing portfolio optimization.",
        },
        {
            "agent": "smart_summarizer_agent",
            "instructions": "Use this collaborator for synthesizing stock trends, financial data, and generating structured investment insights.",
        },
    ],
    collaborator_objects=[
        news_agent,
        quantitative_analysis_agent,
        smart_summarizer_agent,
    ],
    # guardrail=no_bitcoin_guardrail,
    llm=LLM,
    # verbose=False,
)


### Example queries to the supervisor agent
You may also test the multi-agent collaboration by querying the supervisor agent in the console.

In [None]:
request = (
    "what's AMZN stock price doing over the last week and relate that to recent news"
)
print(f"Request:\n{request}\n")
trace_level = "core"
result = investment_research_assistant.invoke(
    request,
    enable_trace=False,
    trace_level=trace_level,
)
print(f"Final answer:\n{result}")


In [None]:
request = "Optimize my portfolio with [Ticker 1], [Ticker 2], and [Ticker 3]"
print(f"Request:\n{request}\n")
trace_level = "core"
result = investment_research_assistant.invoke(
    request,
    enable_trace=False,
    trace_level=trace_level,
)
print(f"Final answer:\n{result}")

In [None]:
request = "Tell me about 2023 Q1 amazon earnings call."
print(f"Request:\n{request}\n")
trace_level = "core"
result = investment_research_assistant.invoke(
    request,
    enable_trace=False,
    trace_level=trace_level,
)
print(f"Final answer:\n{result}")

In [None]:
request = "Analyze Amazon’s financial health based on the 2024 10k report. Calculate important financial ratios. Limit to 5 sentences"
print(f"Request:\n{request}\n")
trace_level = "outline"
result = investment_research_assistant.invoke(
    request,
    enable_trace=False,
    trace_level=trace_level,
)
print(f"Final answer:\n{result}")
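

The four query cells above repeat the same print, invoke, print pattern. If you plan to try more prompts, that pattern could be wrapped in a small helper, sketched below. `run_query` and `EchoAgent` are hypothetical names, and the `invoke` signature is assumed to match the way it is called in this notebook.

```python
def run_query(agent, request: str, trace_level: str = "core", enable_trace: bool = False):
    """Print the request, invoke the agent, print and return its answer."""
    print(f"Request:\n{request}\n")
    result = agent.invoke(
        request,
        enable_trace=enable_trace,
        trace_level=trace_level,
    )
    print(f"Final answer:\n{result}")
    return result


class EchoAgent:
    """Stand-in for the supervisor agent so the sketch runs without AWS access."""

    def invoke(self, request, enable_trace=False, trace_level="core"):
        return f"[{trace_level}] {request}"


answer = run_query(EchoAgent(), "Tell me about 2023 Q1 amazon earnings call.")
```

With the real agent, the call would be `run_query(investment_research_assistant, request)`.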

## Cleanup
Running this cell deletes the agents and the guardrail created in this notebook. To fully clean up this project, you must also delete the WebSearch and StockDataTools stacks.

In [None]:
clean_up_agents()
# Remember to also delete the WebSearch and StockDataTools stacks.