# Build an AI Agent with SEC Filing Insights in Just 10 Minutes Using OpenSSA

In this tutorial, you will learn how to:

1. Build an AI Agent from Scratch Using OpenSSA
2. Customize Plans to Guide the Agent Through Complex Problem-Solving
3. Add Your Own Domain Expertise to Enhance the Agent

## Setup

Let's start by importing the necessary dependencies.

In [1]:
%load_ext autoreload
%autoreload

In [2]:
from pprint import pprint
from IPython.display import display, Markdown

In [3]:
import os
import sys

if cwd_is_root := ('examples' in os.listdir()):
    sys.path.append('examples')

Make sure you place your OpenAI API key in `examples/.env`:

```
OPENAI_API_KEY=...
```

[Where do I find my OpenAI API Key?](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)

In [4]:
from pathlib import Path
from dotenv import load_dotenv

print('Sanity check if we have the OpenAI API setup: ', load_dotenv(dotenv_path=Path('examples' if cwd_is_root else '.') / '.env'))

Sanity check if we have the OpenAI API setup:  False
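`load_dotenv` returns `False` when it did not set any environment variable, as in the output above, so it is worth confirming that the key is actually visible before building the agent. The helper below is a minimal sketch, not part of OpenSSA; the name `has_openai_key` is hypothetical.

```python
import os


def has_openai_key(env=None):
    """Return True if a non-empty OPENAI_API_KEY is present in `env`.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to try out. (Hypothetical helper, not an OpenSSA function.)
    """
    env = os.environ if env is None else env
    return bool(env.get('OPENAI_API_KEY', '').strip())


if not has_openai_key():
    print('OPENAI_API_KEY is not set; check the path to your .env file.')
```

If the message is printed, the later cells that call the OpenAI API are likely to fail with an authentication error.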


In [12]:
from openssa import Agent, HTP, AutoHTPlanner, OodaReasoner, FileResource
from openssa.utils.llms import OpenAILLM

## 1. Build an AI Agent from Scratch Using OpenSSA

### Build Agent

We'll use the FinanceBench dataset for this demonstration. The sample document is 3M's 10-K filing for fiscal year 2022:

https://github.com/patronus-ai/financebench/blob/main/pdfs/3M_2022_10K.pdf

In [5]:
DOC_PATH = 'sample_data/3M_2022_10K/'
PROBLEM = 'Is 3M a capital-intensive business based on FY2022 data?'
GROUTH_TRUTH_ANSWER ='''
    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%'''

In [13]:
llm=OpenAILLM()

In [15]:
from openai import OpenAI

# Assumes openai>=1.0; the client reads OPENAI_API_KEY from the environment
client = OpenAI()

# util function to summarize answer
def summarize_ans(ans, model="gpt-4", max_tokens=100):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please summarize the following text into 1-2 sentences: " + ans}
        ],
        max_tokens=max_tokens,
        temperature=0.7
    )
    # The first choice holds the generated summary text
    return response.choices[0].message.content

In [6]:
# util function to print
import textwrap
def print_solution(sol, present_full_answer=False):
    print('PROBLEM: ')
    print('====================')
    print(PROBLEM, '\n')
    print('GROUTH TRUTH ANSWER: ')
    print('====================')
    print(GROUTH_TRUTH_ANSWER, '\n')
    print('AGENT\'S SUMMARIZED ANSWER:')
    print('====================')
    print(textwrap.fill(summarize_ans(sol), 80))
    if present_full_answer:
        print('AGENT\'S FULL ANSWER:')
        print('====================')
        print(textwrap.fill(sol, 80))


Let's build our first agent with all default settings. 

In [9]:
# Build a base agent
agent = Agent(planner=None,
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})

base_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=False)

Parsing nodes: 100%|██████████| 252/252 [00:00<00:00, 929.28it/s]
Generating embeddings: 100%|██████████| 312/312 [00:07<00:00, 40.75it/s]


In [10]:
print_solution(base_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUTH TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

AGENT'S ANSWER:
Based on the FY2022 data provided, 3M is indeed a capital-intensive business.
The determination is supported by the reported capital expenditures of $1,831
million for the year 2022. Capital-intensive businesses are typically
characterized by high levels of investment in physical assets such as property,
plant, and equipment, which is consistent with 3M's financial statements.
Furthermore, the total assets of the company amounting to $46,455 million
underscore the significant capital employed in its operations. The investments
in information technology, laboratory facilities, and sustainability
initiatives, as well as the focus on workforce development, all point t

The default agent gets this question wrong: it concludes that 3M is capital-intensive, whereas the ground-truth answer says it is not. Let's add planning capability to improve the agent.

## 2. Customize Plans to Guide the Agent Through Complex Problem-Solving

### Auto-generated plan with OpenSSA

Let's upgrade our agent with planning. In this example, the planner decomposes the problem into a hierarchy of subtasks that is at most 2 levels deep, with at most 4 subtasks per decomposition. The plan for each subtask is auto-generated by an LLM.
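To make the shape of such a plan concrete, here is a minimal sketch of a task tree in plain Python. `PlanNode` is a hypothetical class for illustration only, not OpenSSA's `HTP`, and counting the root as level 1 is an assumption about how depth is measured. The subtasks mirror the ones in the planner output of the next cell.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One task in a hierarchical plan (illustrative, not OpenSSA's HTP class)."""
    task: str
    subs: list = field(default_factory=list)

    def levels(self):
        """Number of levels in the tree, counting this node as level 1."""
        return 1 + max((sub.levels() for sub in self.subs), default=0)

    def max_fanout(self):
        """Largest number of direct subtasks under any single node."""
        return max([len(self.subs)] + [sub.max_fanout() for sub in self.subs])


plan = PlanNode(
    task='Is 3M a capital-intensive business based on FY2022 data?',
    subs=[
        PlanNode("What is the total amount of 3M's capital expenditures for FY2022?"),
        PlanNode('What is the depreciation and amortization expense for 3M in FY2022?'),
        PlanNode("What is the ratio of 3M's capital expenditures to its total revenue for FY2022?"),
        PlanNode("What is the nature of 3M's assets and how are they utilized in the business?"),
    ],
)

print(plan.levels(), plan.max_fanout())
```

Under this counting, the tree has 2 levels and a fan-out of 4, which matches the limits passed to `AutoHTPlanner`.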

In [92]:
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                reasoner=OodaReasoner(),
                knowledge=None,
                resources={FileResource(path=DOC_PATH)})

auto_htp_statically_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=False)

2024-05-30 17:14:22.465 | INFO     | openssa.l2.agent.agent:solve:106 - 
GLOBAL TASK PLANNING

PLAN(task=Is 3M a capital-intensive business based on FY2022 data?,
     subs=[ PLAN(task="What is the total amount of 3Ms capital expenditures for FY2022?"),
            PLAN(task=What is the depreciation and amortization expense for 3M in FY2022?),
            PLAN(task="What is the ratio of 3Ms capital expenditures to its total revenue for FY2022?"),
            PLAN(task="What is the nature of 3Ms assets and how are they utilized in the business?")])


  0%|          | 0/4 [00:00<?, ?it/s]
2024-05-30 17:14:38.053 | DEBUG    | openssa.l2.planning.hierarchical.plan:execute:112 - 
TASK-LEVEL REASONING

WHAT IS THE TOTAL AMOUNT OF 3M'S CAPITAL EXPENDITURES FOR FY2022?
--------------------------
Based on the information provided by the uniquely named resource '3M_2022_10K', which is a Form 10-K document containing comprehensive financial data for 3M Company for the fiscal year ended December 31, 2022, the total amount of 3M's capital expenditures for FY2022 is reported to be $1,749 million. This figure is taken directly from the specified resource, which is expected to be a reliable and accurate source of official financial information as Form 10-K is a detailed annual report filed by public companies with the SEC, outlining the company's financial performance. Therefore, the answer is based on rigorous data provided by the company in its official annual report.


 25%|██▌       | 1/4 [00:08<00:24,  8.25s/it]
2024-05-30 17:14:45.661 | DEBUG    | openssa.l2.planning.hierarchical.plan:execute:112 - 
TASK-LEVEL REASONING

WHAT IS THE DEPRECIATION AND AMORTIZATION EXPENSE FOR 3M IN FY2022?
--------------------------
The depreciation and amortization expense for 3M in FY2022 was $498 million. This information is derived from the '3M_2022_10K' resource, which is a Form 10-K document. Form 10-Ks are comprehensive annual reports filed with the U.S. Securities and Exchange Commission (SEC) that provide a detailed overview of a company's financial performance for the fiscal year. Given that the Form 10-K is an official document that must adhere to SEC regulations and generally accepted accounting principles (GAAP), the reported figure for depreciation and amortization expense is considered reliable and accurate. The specific value of $498 million is taken directly from the resource, which is expected to be a precise reflection of 3M's financia

 50%|█████     | 2/4 [00:15<00:15,  7.87s/it]
2024-05-30 17:14:52.256 | DEBUG    | openssa.l2.planning.hierarchical.plan:execute:112 - 
TASK-LEVEL REASONING

WHAT IS THE RATIO OF 3M'S CAPITAL EXPENDITURES TO ITS TOTAL REVENUE FOR FY2022?
--------------------------
To calculate the ratio of 3M's capital expenditures to its total revenue for FY2022, we need two specific figures: the total capital expenditures and the total revenue for the year. From the additional information provided, we know that 3M's capital expenditures for FY2022 were $1,749 million. However, the total revenue figure is not directly provided in the information available. Since the ratio cannot be calculated without both values, and the total revenue figure is missing, we cannot provide a confident answer with concrete results to the posed question.


 75%|███████▌  | 3/4 [00:22<00:07,  7.29s/it]
2024-05-30 17:15:15.251 | DEBUG    | openssa.l2.planning.hierarchical.plan:execute:112 - 
TASK-LEVEL REASONING

WHAT IS THE NATURE OF 3M'S ASSETS AND HOW ARE THEY UTILIZED IN THE BUSINESS?
--------------------------
The nature of 3M's assets is diverse, encompassing current assets, long-term investments, property, plant, and equipment (PP&E), operating lease right-of-use assets, goodwill, intangible assets, and other assets. These assets are utilized across the company's various business segments to support day-to-day operations, manufacturing processes, and strategic initiatives.

Current assets, including cash, marketable securities, accounts receivable, and inventories, are used for managing liquidity, financing short-term obligations, and maintaining operations. PP&E are critical for 3M's manufacturing capabilities, allowing the company to produce a wide array of products. Operating lease right-of-use assets enable 3M to lea

100%|██████████| 4/4 [00:45<00:00, 11.36s/it]
2024-05-30 17:15:36.431 | DEBUG    | openssa.l2.planning.hierarchical.plan:execute:95 - 
TASK-LEVEL REASONING with Supporting/Other Results

PLAN(task=Is 3M a capital-intensive business based on FY2022 data?,
     subs=[ PLAN(task="What is the total amount of 3Ms capital expenditures for FY2022?"),
            PLAN(task=What is the depreciation and amortization expense for 3M in FY2022?),
            PLAN(task="What is the ratio of 3Ms capital expenditures to its total revenue for FY2022?"),
            PLAN(task="What is the nature of 3Ms assets and how are they utilized in the business?")])

IS 3M A CAPITAL-INTENSIVE BUSINESS BASED ON FY2022 DATA?
--------------------------
To assess whether 3M is a capital-intensive business based on FY2022 data, we need to consider the company's capital expenditures, the nature of its assets, and how these assets are utilized in the business.

From the 3M_2022_10K document, we know that 3M's

In [93]:
print_solution(auto_htp_statically_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

EXPECTED ANSWER: 
    
    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:

    CAPEX/Revenue Ratio: 5.1%

    Fixed assets/Total Assets: 20%

    Return on Assets= 12.4% 

AGENT'S ANSWER:
To assess whether 3M is a capital-intensive business based on FY2022 data, we
need to consider the company's capital expenditures, the nature of its assets,
and how these assets are utilized in the business.  From the 3M_2022_10K
document, we know that 3M's capital expenditures for FY2022 were $1,749 million.
This figure represents the amount 3M invested in fixed assets such as property,
plant, and equipment (PP&E) during the fiscal year. Capital expenditures of this
magnitude suggest significant investment in the physical assets required for
production and operations, which is a characteristic of capital-intensive
industries.  The depreciation and amortization 

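### Auto-generated plan with dynamic solving

This run uses the same planner settings as before, but calls `solve` with `dynamic=True` instead of `dynamic=False`.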

In [99]:
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                reasoner=OodaReasoner(),
                knowledge=None,
                resources={FileResource(path=DOC_PATH)})

auto_htp_dynamically_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=True)

In [100]:
print_solution(auto_htp_dynamically_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

EXPECTED ANSWER: 
    
    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:

    CAPEX/Revenue Ratio: 5.1%

    Fixed assets/Total Assets: 20%

    Return on Assets= 12.4% 

AGENT'S ANSWER:
Based on the FY2022 data provided in the 3M_2022_10K resource, 3M can be
considered a capital-intensive business. The resource indicates that the company
has made significant investments in property, plant, and equipment, which are
typical indicators of a capital-intensive industry. Capital-intensive businesses
are characterized by a high proportion of capital assets to labor, and the focus
on enhancing manufacturing capabilities and aligning product capability with
sales in major geographic areas suggests that 3M relies heavily on physical
assets to generate revenue. The capital spending and net property, plant, and
equipment figures, although not quantified in

### Customized plan provided by the user

In [102]:
COMPANY = '3M'
PERIOD = '2022'
EXPERT_PLAN="""
cap-intens:
  task: >-
    Assess whether {COMPANY} is capital-intensive according to {PERIOD} fiscal period data

  sub-plans:
    # 1 single Retrieval task for multiple quantities on same statement, for both efficiency & mutual consistency;
    # retrieve individual starting & ending balance values only, without taking division or simple arithmetic average
    # because RAG LMs may not be good at calculation & mathematical reasoning
    - task: |-
        What are values in dollars of:
        - `(Net) Fixed Assets, a.k.a. (Net) Property, Plant & Equipment (PP&E)`; and
        - `Total Assets`
        (or most similar-meaning reported line item to those)

        on one same `(Consolidated) Balance Sheet, a.k.a. Statement of (Consolidated) Financial Position`
        (or most similar-meaning statement) of {COMPANY}
        (and NOT Balance Sheets of its acquired and/or divested companies)

        as at following two annual fiscal period ends:
        - previous annual fiscal period end immediately preceding {PERIOD}; and
        - current {PERIOD} annual fiscal period end?

    - task: >-
        What is value in dollars of
        `Capital Expenditure(s), a.k.a. CapEx, or Capital Spending, or Property, Plant & Equipment (PP&E) Expenditure(s)/Purchase(s)`
        (or most similar-meaning reported line item)

        on `(Consolidated) Cash Flow(s) Statement(s), a.k.a. (Consolidated) Statement(s) of Cash Flows`
        (or most similar-meaning statement)

        of {COMPANY} for {PERIOD} fiscal period?

    # 1 single Retrieval task for multiple quantities on same statement, for both efficiency & mutual consistency
    - task: |-
        What are values in dollars of:
        - `(Total) (Net) (Operating) Revenue(s), a.k.a. (Total) (Net) Sales`; and
        - `Net Income, a.k.a. Net Profit, or Net Earnings (or Loss(es)) (Attributable to Shareholders)`
        (or most similar-meaning reported line items to those)

        on `(Consolidated) Income Statement, a.k.a. (Consolidated) Profit-and-Loss (P&L) Statement,
        or (Consolidated) Earnings Statement, or (Consolidated) Operations Statement`
        (or most similar-meaning statement)

        of {COMPANY} for {PERIOD} fiscal period?

    - task: |-
        Assess whether {COMPANY} is capital-intensive according to {PERIOD} fiscal period data

        Capital-intensive businesses tend to have one or several of the following characteristics:

        - high `(Net) Fixed Assets, a.k.a. (Net) Property, Plant & Equipment (PP&E)` as proportion of `Total Assets`,
          e.g., over 25%;

        - high `Total Assets` relative to `(Total) (Net) (Operating) Revenue(s), a.k.a. (Total) (Net) Sales`,
          e.g., over 2 times;

        - high `Capital Expenditure(s), a.k.a. CapEx, or Capital Spending, or Property, Plant & Equipment (PP&E) Expenditure(s)/Purchase(s)`
          relative to `(Total) (Net) (Operating) Revenue(s), a.k.a. (Total) (Net) Sales`,
          e.g., over 10%;

          and/or

        - low `Return on (Total) Assets, a.k.a. RoA or RoTA`,
          e.g., under 10%,
          according to formula:

          `Return on (Total) Assets, a.k.a. RoA or RoTA` = (
            `Net Income, a.k.a. Net Profit, or Net Earnings (or Loss(es)) (Attributable to Shareholders)` /
            `average Total Assets, typically between two consecutive fiscal year-ends`
          )
"""
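The last task in the plan lists rule-of-thumb thresholds for capital intensity. As a rough sketch of how those checks combine, the snippet below applies them to the ratios quoted in the ground-truth answer earlier in this tutorial. The function names are hypothetical and the code only illustrates the arithmetic; it is not how the agent itself reasons.

```python
def return_on_assets(net_income, total_assets_start, total_assets_end):
    """RoA = net income / average total assets across two fiscal year-ends."""
    return net_income / ((total_assets_start + total_assets_end) / 2)


def capital_intensity_flags(ppe_to_assets=None, assets_to_revenue=None,
                            capex_to_revenue=None, roa=None):
    """Map each rule of thumb from the plan to True/False; skip missing ratios."""
    checks = {
        'PP&E / Total Assets over 25%': (ppe_to_assets, lambda x: x > 0.25),
        'Total Assets / Revenue over 2x': (assets_to_revenue, lambda x: x > 2),
        'CapEx / Revenue over 10%': (capex_to_revenue, lambda x: x > 0.10),
        'Return on Assets under 10%': (roa, lambda x: x < 0.10),
    }
    return {name: rule(value) for name, (value, rule) in checks.items()
            if value is not None}


# Ratios quoted in the ground-truth answer: 20%, 5.1% and 12.4%
flags = capital_intensity_flags(ppe_to_assets=0.20, capex_to_revenue=0.051, roa=0.124)
for name, triggered in flags.items():
    print(f'{name}: {triggered}')
print('Any characteristic triggered:', any(flags.values()))
```

None of the three available ratios crosses its threshold, which is consistent with the ground-truth conclusion that 3M is not capital-intensive.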

In [106]:
import yaml  # PyYAML; assumed to be installed in this environment

# EXPERT_PLAN is YAML text with {COMPANY} and {PERIOD} placeholders:
# fill in the placeholders, parse the YAML, and take the plan stored under 'cap-intens'
htp_dict = yaml.safe_load(EXPERT_PLAN.format(COMPANY=COMPANY, PERIOD=PERIOD))['cap-intens']

# Build the plan from the dict; HTP.from_dict is assumed to read
# the 'task' and 'sub-plans' keys recursively
htp = HTP.from_dict(htp_dict)

htp

In [None]:
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                reasoner=OodaReasoner(),
                knowledge=None,
                resources={FileResource(path=DOC_PATH)})

# Pass the parsed plan object (htp), not the raw YAML string
expert_htp_statically_solution = agent.solve(problem=PROBLEM, plan=htp, dynamic=False)

## 3. Add Your Own Domain Expertise to Enhance the Agent

### Before Adding Knowledge

### After Adding Knowledge

#### Sample Knowledge 

```
Balance-Sheet Line-Item Synonyms
--------------------------------

- "Total Assets", "TA(s)"

- "(Net) Fixed Assets", "(Net) FA(s)",
  "(Net) Property, Plant & Equipment", "(Net) PP&E", "(Net) PPNE",
  "(Net) Property & Equipment", "(Net) Plant & Equipment", "(Net) Property, Equipment & Intangibles"

- "(Total) (Net) Inventory", "(Total) (Net) Inventories",
  "(Total) (Net) Merchandise Inventory", "(Total) (Net) Merchandise Inventories"

- "(Net) Accounts Receivable", "(Net) AR", "(Net) (Trade) Receivables"

- "(Net) Accounts Payable", "(Net) AP"
```
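One way to see what such a synonym list buys you is to treat it as a lookup table that maps the many labels found in filings onto a single canonical name. The sketch below does this in plain Python for a subset of the entries above. The canonical names, the `canonical_line_item` helper, and the normalization rules are all assumptions made for illustration; they do not show how OpenSSA consumes knowledge.

```python
import re

# Canonical name -> labels taken from the synonym list above (subset)
SYNONYMS = {
    'Total Assets': ['Total Assets', 'TA', 'TAs'],
    'Net Fixed Assets': ['Fixed Assets', 'FA', 'FAs',
                         'Property, Plant & Equipment', 'PP&E', 'PPNE',
                         'Property & Equipment', 'Plant & Equipment'],
    'Accounts Receivable': ['Accounts Receivable', 'AR', 'Trade Receivables'],
    'Accounts Payable': ['Accounts Payable', 'AP'],
}


def _key(label):
    """Lowercase, drop an optional 'Net' / '(Net)' qualifier, squeeze punctuation."""
    label = re.sub(r'\(?\bnet\b\)?', ' ', label.lower())
    return re.sub(r'[^a-z0-9&]+', ' ', label).strip()


LOOKUP = {_key(label): name for name, labels in SYNONYMS.items() for label in labels}


def canonical_line_item(label):
    """Return the canonical name for a reported label, or None if unknown."""
    return LOOKUP.get(_key(label))


print(canonical_line_item('(Net) PP&E'))
print(canonical_line_item('Net Property, Plant & Equipment'))
print(canonical_line_item('Goodwill'))
```

The first two calls resolve to the same canonical item, while `Goodwill` is not in the table and yields `None`.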