# Build an AI Agent with SEC Filing Insights in Just 10 Minutes Using OpenSSA

In this tutorial, you will learn how to:

1. Build an AI Agent from Scratch Using OpenSSA
2. Customize Plans to Guide the Agent Through Complex Problem-Solving
3. Add Your Own Domain Expertise to Enhance the Agent

## Setup

Let's start by importing the necessary dependencies.

```python
%load_ext autoreload
# Mode 2: automatically reload imported modules before each cell runs
%autoreload 2
```

```python
from pprint import pprint
from IPython.display import display, Markdown
```

```python
import os
import sys

# When the notebook runs from the repository root, make the `examples` folder importable
if cwd_is_root := ('examples' in os.listdir()):
    sys.path.append('examples')
```

Make sure you place your OpenAI API key in `examples/.env`:

```
OPENAI_API_KEY=...
```

[Where do I find my OpenAI API Key?](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)

```python
from pathlib import Path
from dotenv import load_dotenv

env_path = Path('examples' if cwd_is_root else '.') / '.env'
print('Sanity check if we have the OpenAI API setup: ', load_dotenv(dotenv_path=env_path))
```

```
Sanity check if we have the OpenAI API setup:  False
```
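The sanity check prints `False` here. In recent versions of python-dotenv, `load_dotenv` returns `False` when it loads no variables from the file, for example because the `.env` file does not exist; the key can still be available if it is already exported in your shell. A minimal sketch of a more direct check, assuming only that the key is read from the `OPENAI_API_KEY` environment variable. `has_openai_key` is a hypothetical helper, not part of OpenSSA or python-dotenv:

```python
import os
from typing import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value in `env`.

    Hypothetical helper: it only inspects the mapping and never calls the
    OpenAI API, so it cannot tell whether the key itself is valid.
    """
    return bool(env.get("OPENAI_API_KEY", "").strip())


print("OPENAI_API_KEY is set:", has_openai_key())
```

Passing a plain dict instead of `os.environ` makes the check easy to test in isolation.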


```python
from openssa import Agent, HTP, AutoHTPlanner, OodaReasoner, FileResource
from openssa.utils.llms import OpenAILLM
```

## 1. Build an AI Agent from Scratch Using OpenSSA

### Build Agent

We're going to use the FinanceBench dataset for this demonstration. We have loaded a sample SEC filing from it: 3M's Form 10-K for 2022.

https://github.com/patronus-ai/financebench/blob/main/pdfs/3M_2022_10K.pdf

```python
DOC_PATH = 'sample_data/3M_2022_10K/'
PROBLEM = 'Is 3M a capital-intensive business based on FY2022 data?'
GROUND_TRUTH_ANSWER = '''
    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%'''
```
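The ground-truth answer rests on three ratios. A minimal sketch of how they are computed, using hypothetical round numbers rather than 3M's actual figures. Return on assets is taken here as net income divided by total assets, which is one common definition (some analysts use average total assets instead). `capital_intensity_ratios` is a hypothetical helper, not part of OpenSSA:

```python
from dataclasses import dataclass


@dataclass
class CapitalIntensityRatios:
    capex_to_revenue: float        # capital expenditures / revenue
    fixed_to_total_assets: float   # net fixed assets (PP&E) / total assets
    return_on_assets: float        # net income / total assets


def capital_intensity_ratios(capex: float, revenue: float, net_ppe: float,
                             total_assets: float, net_income: float) -> CapitalIntensityRatios:
    """Compute the three ratios cited in GROUND_TRUTH_ANSWER (all inputs in the same units)."""
    if revenue <= 0 or total_assets <= 0:
        raise ValueError("revenue and total_assets must be positive")
    return CapitalIntensityRatios(
        capex_to_revenue=capex / revenue,
        fixed_to_total_assets=net_ppe / total_assets,
        return_on_assets=net_income / total_assets,
    )


# Illustrative numbers only (not taken from 3M's 10-K)
r = capital_intensity_ratios(capex=50, revenue=1_000, net_ppe=400,
                             total_assets=2_000, net_income=240)
print(f"{r.capex_to_revenue:.1%}, {r.fixed_to_total_assets:.1%}, {r.return_on_assets:.1%}")
# prints: 5.0%, 20.0%, 12.0%
```

These are the kinds of figures the agent has to locate in the filing before it can reason about capital intensity.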

```python
# Utility function: ask the LLM for a 1-2 sentence summary of an agent's answer
def summarize_ans(ans, max_tokens=100):
    llm = OpenAILLM()
    response = llm.call(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please summarize the following text into 1-2 sentences: " + ans}
        ],
        max_tokens=max_tokens,  # caps the summary length; longer summaries get cut off
        temperature=0.7
    )
    summary = response.choices[0].message.content
    return summary
```
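`summarize_ans` sends an OpenAI-style list of chat messages and caps the reply at `max_tokens=100`, which is likely why some of the summaries below stop mid-sentence. A minimal sketch that pulls the prompt construction into a pure function, so the exact messages can be inspected or tested without an API call. `build_summary_messages` is a hypothetical helper, not an OpenSSA API:

```python
from typing import Dict, List

SYSTEM_PROMPT = "You are a helpful assistant."
SUMMARY_INSTRUCTION = "Please summarize the following text into 1-2 sentences: "


def build_summary_messages(text: str) -> List[Dict[str, str]]:
    """Return the system and user messages that summarize_ans sends to the LLM."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": SUMMARY_INSTRUCTION + text},
    ]


for message in build_summary_messages("The agent's full answer goes here."):
    print(message["role"], "->", message["content"])
```

Inside `summarize_ans`, `messages=build_summary_messages(ans)` could then replace the inline list.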

```python
# Utility function: print the problem, the ground truth, and the agent's (summarized) answer
import textwrap


def print_solution(sol, present_full_answer=False):
    print('PROBLEM: ')
    print('====================')
    print(PROBLEM, '\n')
    print('GROUND TRUTH ANSWER: ')
    print('====================')
    print(GROUND_TRUTH_ANSWER, '\n')
    print('AGENT\'S SUMMARIZED ANSWER:')
    print('====================')
    print(textwrap.fill(summarize_ans(sol), 80))  # the summary is generated by the LLM
    if present_full_answer:
        print('AGENT\'S FULL ANSWER:')
        print('====================')
        print(textwrap.fill(sol, 80))
```
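`print_solution` both calls the LLM (through `summarize_ans`) and prints. A minimal sketch of a pure variant that takes an already-computed summary and returns roughly the same report as a string, which is easier to log, save, or test. `format_report` is a hypothetical helper:

```python
import textwrap

SEPARATOR = "=" * 20


def format_report(problem: str, ground_truth: str, summary: str,
                  full_answer: str = "", width: int = 80) -> str:
    """Build the sections that print_solution prints, as a single string."""
    parts = [
        "PROBLEM:", SEPARATOR, problem, "",
        "GROUND TRUTH ANSWER:", SEPARATOR, ground_truth, "",
        "AGENT'S SUMMARIZED ANSWER:", SEPARATOR, textwrap.fill(summary, width),
    ]
    if full_answer:  # mirrors present_full_answer=True
        parts += ["", "AGENT'S FULL ANSWER:", SEPARATOR, textwrap.fill(full_answer, width)]
    return "\n".join(parts)


print(format_report("Is 3M capital-intensive?", "No.", "The agent's one-line summary."))
```

Keeping the LLM call outside the formatter means the summary is generated once and can be reused.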


Let's build our first agent with the simplest setup: an `OodaReasoner` over the 3M 10-K, with no planner and no added domain knowledge.

```python
# Build a base agent
agent = Agent(planner=None,
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})

base_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=False)
```

```
Parsing nodes: 100%|██████████| 252/252 [00:00<00:00, 929.28it/s]
Generating embeddings: 100%|██████████| 312/312 [00:07<00:00, 40.75it/s]
```


```python
print_solution(base_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
Based on FY2022 data, 3M is confirmed as a capital-intensive business, as
evidenced by its high capital expenditures of $1,831 million and total assets of
$46,455 million. The company's significant investments in physical assets,
technology, facilities, sustainability, and workforce development indicate a
strong reliance on capital investment for growth, efficiency, and innovation.
```


The base agent gets this wrong: it concludes that 3M is capital-intensive, whereas the ground truth says it is not (CAPEX/Revenue of 5.1%, Fixed Assets/Total Assets of 20%). Let's add planning capability to improve the agent.

## 2. Customize Plans to Guide the Agent Through Complex Problem-Solving

### Auto-generated plan with OpenSSA

Let's upgrade our agent with planning. `AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4)` decomposes the problem hierarchically: at most 4 subtasks per decomposition, in a hierarchy at most 2 levels deep. The plan for each subtask is generated automatically by an LLM.
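To make the two `AutoHTPlanner` parameters concrete, here is a toy hierarchical plan in plain Python. It is not OpenSSA's `HTP` class: a stub function stands in for the LLM that proposes subtasks, and it assumes `max_depth` counts decomposition levels below the root, which may differ from OpenSSA's exact convention. `ToyPlan`, `build_plan`, `count_tasks`, and `stub_decompose` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyPlan:
    task: str
    sub_plans: List["ToyPlan"] = field(default_factory=list)


def build_plan(task: str, decompose: Callable[[str], List[str]],
               max_depth: int, max_subtasks_per_decomp: int) -> ToyPlan:
    """Decompose `task` recursively, at most `max_depth` levels below it and
    keeping at most `max_subtasks_per_decomp` subtasks per decomposition."""
    plan = ToyPlan(task)
    if max_depth > 0:
        for subtask in decompose(task)[:max_subtasks_per_decomp]:
            plan.sub_plans.append(
                build_plan(subtask, decompose, max_depth - 1, max_subtasks_per_decomp))
    return plan


def count_tasks(plan: ToyPlan) -> int:
    """Count this task plus every task beneath it."""
    return 1 + sum(count_tasks(sub) for sub in plan.sub_plans)


def stub_decompose(task: str) -> List[str]:
    """Stand-in for the LLM: always proposes five numbered subtasks."""
    return [f"{task}.{i}" for i in range(1, 6)]


plan = build_plan("Is 3M capital-intensive?", stub_decompose,
                  max_depth=2, max_subtasks_per_decomp=4)
print(len(plan.sub_plans), count_tasks(plan))  # prints: 4 21
```

Under these assumptions the stub's fifth subtask is always dropped, and the whole plan holds at most 1 + 4 + 16 = 21 tasks.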

```python
# Re-run this cell to see the full HTP logs
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})

auto_htp_statically_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=False)
```

Note: the full logs of the HTP steps have been cleared for readability. To see the step-by-step execution of the HTP, run the cell above again.

```python
print_solution(auto_htp_statically_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
Based on the FY2022 data provided in the 3M 10-K report, while we cannot
definitively confirm 3M's capital intensity due to the lack of total revenue
figures and industry comparisons, the significant proportion of assets tied up
in property, plant, and equipment (PP&E) at 24.6% suggests that 3M likely
operates as a capital-intensive business. Further analysis with additional data
is needed for a conclusive assessment.
```


### Auto-generated plan with dynamic solving

This uses the same agent configuration, but calls `agent.solve` with `dynamic=True`.

```python
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})

auto_htp_dynamically_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=True)
```

```python
print_solution(auto_htp_dynamically_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
The FY2022 data reveals that 3M is a capital-intensive business, as evidenced by
its significant capital expenditures on property, plant, and equipment amounting
to $1,831 million, substantial total assets of $46,455 million, and projected
capital spending for 2023 estimated between $1.5 billion and $1.8 billion. The
company's strategic focus on maintaining and expanding manufacturing
capabilities and managing raw material inventories further supports its capital-
intensive nature.
```


### Customized plan provided by the user

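Neither auto-generated plan led the agent to the correct answer. Instead of letting an LLM write the plan, we can supply one written by a domain expert. A sample plan template is provided in `sample_data/expert-plan-templates-sample.yml`. Let's load it, fill in the company and period, and see how an expert plan is structured.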

```python
import yaml

# Values substituted into the plan template's placeholders
variables = {
    'COMPANY': '3M',
    'PERIOD': '2022'
}

with open('sample_data/expert-plan-templates-sample.yml', 'r', encoding='utf-8') as file:
    EXPERT_PLAN_TEMPLATES_CONTENT = file.read()
# Fill the placeholders (e.g. {COMPANY}), then parse the result as YAML
EXPERT_PLAN_TEMPLATES_CONTENT = EXPERT_PLAN_TEMPLATES_CONTENT.format(**variables)
EXPERT_PLAN = yaml.safe_load(EXPERT_PLAN_TEMPLATES_CONTENT)
```
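Because the template is filled with `str.format`, every `{NAME}` in the YAML file is treated as a placeholder, and any literal brace must be written as `{{` or `}}`. A minimal sketch of a stricter filler that reports all missing placeholders at once instead of failing on the first one. `fill_plan_template` is a hypothetical helper, and the template line below is illustrative, not the contents of the sample file:

```python
import string


def fill_plan_template(template: str, **variables: str) -> str:
    """Fill {NAME} placeholders, listing every placeholder that has no value."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for plain text and for escaped braces such as {{ }}.
    fields = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = sorted(fields - set(variables))
    if missing:
        raise KeyError(f"no value for placeholders: {missing}")
    return template.format(**variables)


TEMPLATE = "task: Is {COMPANY} a capital-intensive business in fiscal period {PERIOD}?"
print(fill_plan_template(TEMPLATE, COMPANY="3M", PERIOD="2022"))
# prints: task: Is 3M a capital-intensive business in fiscal period 2022?
```

Calling it as `fill_plan_template(EXPERT_PLAN_TEMPLATES_CONTENT, **variables)` would be a drop-in alternative to the `.format(**variables)` call above.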

```python
from openssa import HTP
from openssa.l2.task import Task

# Build the hierarchical task plan (HTP) from the parsed expert plan
EXPERT_HTP = HTP(task=Task.from_dict_or_str(EXPERT_PLAN['task']),  # pylint: disable=unexpected-keyword-arg
                 sub_plans=[HTP.from_dict(d) for d in EXPERT_PLAN.get('sub-plans', [])])
```
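The cell above turns the parsed YAML into a tree: a top-level `task` plus an optional list of `sub-plans`, each converted recursively with `HTP.from_dict`. A minimal sketch of the same recursion on plain Python objects, which can help print an outline of an expert plan before running it. It assumes each node is either a string or a dict with a string `task` and an optional `sub-plans` list; OpenSSA's real `Task` and `HTP` parsing may differ. `ToyPlan`, `toy_plan_from_dict`, and `outline` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class ToyPlan:
    task: str
    sub_plans: List["ToyPlan"] = field(default_factory=list)


def toy_plan_from_dict(node: Union[str, Dict[str, Any]]) -> ToyPlan:
    """Convert a plan node (a string, or a dict with 'task' and 'sub-plans') into a ToyPlan."""
    if isinstance(node, str):
        return ToyPlan(task=node)
    return ToyPlan(task=str(node["task"]),
                   sub_plans=[toy_plan_from_dict(d) for d in node.get("sub-plans", [])])


def outline(plan: ToyPlan, level: int = 0) -> List[str]:
    """Render the plan as indented bullet lines, one per task."""
    lines = ["  " * level + "- " + plan.task]
    for sub in plan.sub_plans:
        lines.extend(outline(sub, level + 1))
    return lines


# Illustrative plan dict shaped like the parsed YAML (not the actual sample file)
sample = {
    "task": "Is 3M a capital-intensive business in 2022?",
    "sub-plans": [
        {"task": "Find Net PP&E and Total Assets"},
        {"task": "Find Capital Expenditures and Total Net Sales"},
    ],
}
print("\n".join(outline(toy_plan_from_dict(sample))))
```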

```python
# Re-run this cell to see the full HTP logs
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})

expert_htp_statically_solution = agent.solve(problem=PROBLEM, plan=EXPERT_HTP, dynamic=False)
```

```python
print_solution(expert_htp_statically_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
Based on the 2022 fiscal period data from 3M's 10-K filing, the analysis of
specific financial metrics such as the proportion of Net PP&E to Total Assets,
Total Assets to Total Net Sales ratio, Capital Expenditures to Total Net Sales
ratio, and Return on Total Assets (RoA) suggests that 3M is not classified as a
capital-intensive company for the 2022 fiscal period. The figures indicate that
the company's capital intensity levels, investment in property,
```

With the expert-provided plan, the agent answers correctly this time: it concludes that 3M is not a capital-intensive business for FY2022.

## 3. Add Your Own Domain Expertise to Enhance the Agent

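### Before Adding Knowledge

As a baseline, this is the same agent as in Section 1, with no planner and no added knowledge (`knowledge=None`). It produced the incorrect `base_solution` shown earlier.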

```python
agent = Agent(planner=None,
              reasoner=OodaReasoner(),
              knowledge=None,
              resources={FileResource(path=DOC_PATH)})
```

### After Adding Knowledge

#### Auto-generated plan with added knowledge

```python
# Load the expert's domain knowledge notes
with open('sample_data/expert-knowledge.txt', encoding='utf-8') as f:
    EXPERT_KNOWLEDGE: str = f.read()

# Treat each blank-line-separated paragraph as one piece of knowledge
EXPERT_KNOWLEDGE_SET = set(EXPERT_KNOWLEDGE.split('\n\n'))
```
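`split('\n\n')` treats each blank-line-separated paragraph as one knowledge item, and `set` removes exact duplicates but discards the original order. A trailing blank line, or a "blank" line that contains spaces, can also leave empty or untrimmed entries. A minimal sketch of a slightly more forgiving splitter; `split_knowledge` is a hypothetical helper, not part of OpenSSA:

```python
import re
from typing import Set


def split_knowledge(text: str) -> Set[str]:
    """Split text into paragraphs on blank lines, trimming each and dropping empties."""
    # A separator is a newline, any run of whitespace (including more newlines), then a newline.
    paragraphs = re.split(r"\n\s*\n", text)
    return {p.strip() for p in paragraphs if p.strip()}


notes = "Capital intensity compares fixed assets with sales.\n\n  \nROA = net income / total assets.\n"
print(sorted(split_knowledge(notes)))
```

`EXPERT_KNOWLEDGE_SET = split_knowledge(EXPERT_KNOWLEDGE)` would then produce a set in the same shape as the original.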

```python
# Auto-generated plan, now with the expert knowledge attached
agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
              reasoner=OodaReasoner(),
              knowledge=EXPERT_KNOWLEDGE_SET,
              resources={FileResource(path=DOC_PATH)})
auto_htp_statically_with_knowledge_solution = agent.solve(problem=PROBLEM, plan=None, dynamic=False)
```

```python
print_solution(auto_htp_statically_with_knowledge_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
3M is deemed a capital-intensive business for FY2022 due to its high capital
expenditures for property, plant, and equipment, substantial total assets, and
projected future capital spending. These financial metrics suggest a significant
investment in operational capacity and manufacturing capabilities, indicating a
capital-intensive industry.
```


With the auto-generated plan, even the added knowledge does not help: the agent still concludes, incorrectly, that 3M is capital-intensive.
Let's add the expert plan as well and see whether it helps the agent solve the problem.

#### Expert-provided plan with added knowledge

```python
agent = Agent(planner=None,
              reasoner=OodaReasoner(),
              knowledge=EXPERT_KNOWLEDGE_SET,
              resources={FileResource(path=DOC_PATH)})
expert_htp_statically_with_knowledge_solution = agent.solve(problem=PROBLEM, plan=EXPERT_HTP, dynamic=False)
```

```python
print_solution(expert_htp_statically_with_knowledge_solution)
```

```
PROBLEM:
Is 3M a capital-intensive business based on FY2022 data?

GROUND TRUTH ANSWER:

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%

AGENT'S SUMMARIZED ANSWER:
To determine if 3M is a capital-intensive company for the 2022 fiscal period,
one would typically analyze metrics like Net Fixed Assets to Total Assets ratio,
Total Assets compared to Total Net Operating Revenues, Capital Expenditures
relative to Net Operating Revenues, and Return on Total Assets. However, without
specific financial data for 3M in 2022, such as Net Fixed Assets, Total Assets,
Total Net Operating Revenues, Capital Expenditures, and Net Income, a
```