# Build an AI Agent with SEC Filing Insights in Just 10 Minutes Using OpenSSA
--------------

### In this tutorial, you will learn how to:

1. Build an AI Agent from scratch with Hierarchical Task Planning (HTP) using OpenSSA
2. Improve the agent's performance by:
    - Incorporating an external knowledge source
    - Providing a customized plan from an expert
    - Enabling dynamic solving capability

### By the end of this tutorial, you will understand:
- What HTP is and how it works
- How to customize OpenSSA components to solve your own complex problems

## Setup

Let's start by importing the necessary dependencies.

In [64]:
%load_ext autoreload
%autoreload

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [63]:
from pathlib import Path
from pprint import pprint
import os
import sys

from IPython.display import display, Markdown
from dotenv import load_dotenv
import yaml

from openssa import Agent, HTP, AutoHTPlanner, OodaReasoner, FileResource
from openssa.utils.llms import OpenAILLM
from openssa.l2.task import Task

Make sure you place your OpenAI API key in `examples/.env`:

```
OPENAI_API_KEY=...
```

[Where do I find my OpenAI API Key?](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)

In [65]:
# make sure we're in the right folder
if cwd_is_root := ('examples' in os.listdir()):
    sys.path.append('examples')

In [66]:
print('Sanity check if we have the OpenAI API setup: ', load_dotenv(dotenv_path=Path('examples' if cwd_is_root else '.') / '.env'))

Sanity check if we have the OpenAI API setup:  True
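
A `True` result above says that the `.env` file was loaded, not that this particular key is present. As a minimal sketch, a stricter check could look like the following; the helper name `describe_api_key` is hypothetical and not part of OpenSSA or python-dotenv.

```python
import os

def describe_api_key(env=os.environ, name='OPENAI_API_KEY'):
    """Report whether the key is set, without revealing its value."""
    value = env.get(name, '')
    if not value.strip():
        return f'{name} is missing or empty'
    # Show only the length so the secret never lands in notebook output.
    return f'{name} is set ({len(value)} characters)'

print(describe_api_key())
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.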


In [41]:
# util function to summarize answer
def summarize_ans(ans, max_tokens=100):
    llm=OpenAILLM()
    response = llm.call(
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Please summarize the following text into 1-2 sentences: " + ans}
        ],
        max_tokens=max_tokens,
        temperature=0.7
    )
    summary = response.choices[0].message.content
    return summary

In [120]:
# util function to print results
import textwrap

def namestr(obj, namespace):
    return [name for name in namespace if namespace[name] is obj]

def print_solution(sol, present_full_answer=False):
    agent_name = namestr(sol, globals())[0].upper().replace('_', ' ')
    # print(agent_name)
    print('PROBLEM: ')
    print('='*80)
    print(PROBLEM, '\n')
    if GROUND_TRUTH_ANSWER:
        print('GROUND TRUTH ANSWER: ')
        print('='*80)
        print(GROUND_TRUTH_ANSWER, '\n')
    if present_full_answer:
        print(f'{agent_name} FULL:')
        print('='*80)
        print(textwrap.fill(sol, 80))
    else:
        print(f'{agent_name} SUMMARIZED:')
        print('='*80)
        print(textwrap.fill(summarize_ans(sol), 80))

### Data preparation

We'll use the FinanceBench dataset for this demonstration. FinanceBench is a benchmark for question-answering capability in the financial domain.

We have loaded a sample SEC filing for 3M from 2022. 
https://github.com/patronus-ai/financebench/blob/main/pdfs/3M_2022_10K.pdf

- Let's look at a sample question: 

`Is 3M a capital-intensive business based on FY2022 data?`

- The expected answer for this question is:

`No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%`

In [40]:
DOC_PATH = 'sample_data/3M_2022_10K/'
PROBLEM = 'Is 3M a capital-intensive business based on FY2022 data?'
GROUND_TRUTH_ANSWER ='''
    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4%'''

Now, we'll build an agent from scratch using [OpenSSA](https://www.openssa.org/).

## Build an AI Agent from Scratch Using OpenSSA
------------

### Base Agent

Let's build our first agent with all default settings. 

<img src="./FinanceBench/diagrams/base-agent.png" height="100" />

To build an agent, the first and most basic resource we need is a document. We will learn how to enable the hierarchical task planning (HTP) capability and how to customize its components later. Let's first build a `Base Agent` with only the document we prepared in the previous block and see how well it can solve the question.

In [73]:
# Build a base agent
base_agent = Agent(planner=None,
                   reasoner=OodaReasoner(),
                   knowledge=None,
                   resources={FileResource(path=DOC_PATH)})

base_agent_answer = base_agent.solve(problem=PROBLEM,
                                       plan=None,
                                       dynamic=False)

In [121]:
print_solution(base_agent_answer)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUND TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

BASE AGENT ANSWER SUMMARIZED:
3M's financial statements for FY2022 show significant capital investments in
property, plant, and equipment (PP&E), with capital expenditures amounting to
$1,831 million and total assets reported at $46,455 million. The company's focus
on growth, productivity, and sustainability is reflected in its projected
capital spending of $1.5 billion to $1.8 billion for 2023, demonstrating a
commitment to supporting business activities and driving future growth through
capital investments and strategic resource management practices


In this example, the default answer is not good: 3M is not a capital-intensive business, but the agent lists capital investments without ever reaching that conclusion.

## How to Add External Knowledge to the Agent

Let's incorporate external knowledge into the base agent. We've prepared some sample expert knowledge in the `sample_data/expert-knowledge.txt` file. You can load your own knowledge by replacing the sample file with yours.

<img src="./FinanceBench/diagrams/agent-with-knowledge.png" height="100" />

In [96]:
with open('sample_data/expert-knowledge.txt', encoding='utf-8') as f:
    EXPERT_KNOWLEDGE: str = f.read()

EXPERT_KNOWLEDGE_SET = set(EXPERT_KNOWLEDGE.split('\n\n'))
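
To see what the split on blank lines produces, here is a small self-contained sketch. The sample text is made up for illustration, and the extra `strip()` filtering is an optional refinement that the cell above does not perform.

```python
# Made-up sample text: three entries separated by blank lines, one repeated.
sample_knowledge = (
    "Fact A spans\ntwo lines.\n\n"
    "Fact B is a single line.\n\n"
    "Fact A spans\ntwo lines."
)

# Each blank-line-separated chunk becomes one entry; the set drops the
# verbatim duplicate, and the filter skips chunks that are only whitespace.
knowledge_set = {
    chunk.strip() for chunk in sample_knowledge.split('\n\n') if chunk.strip()
}

print(len(knowledge_set))
```

Because a set is unordered, the entries carry no sequence information, so each chunk should make sense on its own.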

In the added knowledge, we've specified formulas such as:

```
Capital-Intensiveness / Return-on-Capital Metric Formulas
---------------------------------------------------------

`Capital Intensity Ratio` = `Total Assets` / `(Total) (Net) (Operating) Revenue(s), a.k.a. (Total) (Net) Sales`

`Return on (Total) Assets, a.k.a. RoA or RoTA` = (
  `Net Income, a.k.a. Net Profit, or Net Earnings (or Loss(es)) (Attributable to Shareholders)` /
  `average Total Assets, typically between two consecutive fiscal year-ends`
)
```
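
The two formulas above translate directly into code. The sketch below is only an illustration of the arithmetic: the function names are hypothetical, and the input figures are placeholders, not numbers taken from the 3M filing.

```python
def capital_intensity_ratio(total_assets, revenue):
    """Total Assets divided by (Total) (Net) Revenue."""
    return total_assets / revenue

def return_on_assets(net_income, total_assets_start, total_assets_end):
    """Net Income divided by average Total Assets across two fiscal year-ends."""
    average_assets = (total_assets_start + total_assets_end) / 2
    return net_income / average_assets

# Placeholder figures in arbitrary currency units, NOT from the 3M 10-K.
print(capital_intensity_ratio(total_assets=500.0, revenue=400.0))
print(return_on_assets(net_income=50.0,
                       total_assets_start=450.0,
                       total_assets_end=550.0))
```

With these placeholders the capital intensity ratio is 1.25 and the return on assets is 0.1, i.e. 10%.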

Let's add the knowledge set to our base agent.

In [97]:
agent_with_knowledge = Agent(planner=None,
                             reasoner=OodaReasoner(),
                             knowledge=EXPERT_KNOWLEDGE_SET,
                             resources={FileResource(path=DOC_PATH)})

agent_with_knowledge_solution = agent_with_knowledge.solve(problem=PROBLEM,
                                                           plan=None,
                                                           dynamic=False)

In [109]:
print_solution(agent_with_knowledge_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUND TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

AGENT WITH KNOWLEDGE SOLUTION SUMMARIZED:
Based on the substantial capital expenditures, large asset base, and planned
future investments in operational infrastructure and capacity enhancement, it is
reasonable to classify 3M as a capital-intensive business for FY2022.


Although the final answer is still incorrect, the reasoning behind it improves with the external knowledge: the agent now recognizes that `assets` need to be taken into account when answering questions about capital intensiveness.

## Get started with HTP by Adding Auto-Plan on top of Knowledge

We can see the agent is improved with added knowledge. Let's enhance it with OpenSSA's HTP feature: `AutoHTPlanner`.

<img src="./FinanceBench/diagrams/auto-htp-agent-with-knowledge.png" height="100" />

`HTP` is OpenSSA’s default problem-solving task plan structure.

An `HTP` instance is a tree, in which each node can be decomposed into a number of supporting sub-HTPs, each aimed at solving a supporting sub-task.

`HTP` execution involves using a specified Reasoner to work through sub-tasks from the lowest levels and roll results up to the top level.

There is also a horizontal results-sharing mechanism to enable the execution of a subsequent HTP node to benefit from results from earlier nodes at the same depth level.
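
The bottom-up roll-up and the horizontal sharing described above can be pictured with a toy tree. This is a simplified sketch, not OpenSSA's implementation: `ToyPlan` and `execute` are hypothetical names, and the "reasoner" merely records which earlier results each task could see.

```python
from dataclasses import dataclass, field

@dataclass
class ToyPlan:
    """A toy stand-in for a plan node; not OpenSSA's HTP class."""
    task: str
    sub_plans: list = field(default_factory=list)

def execute(plan, log, shared=()):
    """Solve sub-plans first, then roll their results up to the parent."""
    sub_results = []
    for sub_plan in plan.sub_plans:
        # Horizontal sharing: each sibling sees results from earlier siblings.
        sub_results.append(execute(sub_plan, log, shared=tuple(sub_results)))
    context = list(shared) + sub_results
    log.append((plan.task, context))  # record what this task could draw on
    return f"result({plan.task})"

plan = ToyPlan(
    task='Is the company capital-intensive?',
    sub_plans=[ToyPlan(task='fixed assets share'),
               ToyPlan(task='capex to revenue')],
)
log = []
execute(plan, log)
for task, context in log:
    print(task, '| context:', context)
```

The log shows the two sub-tasks finishing before the root, with the second sub-task already seeing the first one's result.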

`AutoHTPlanner` is OpenSSA’s default Planner to create and update problem-solving HTPs.

The planner uses a language model (LM) to generate new or updated HTPs, whose complexity is controlled by two key parameters: `max_depth` and `max_subtasks_per_decomp`.

<img src="./FinanceBench/diagrams/htp.png" />
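
To get a feel for how `max_depth` and `max_subtasks_per_decomp` bound the size of a plan, the sketch below counts the nodes of a fully expanded tree. It assumes that the root sits at depth 0 and that `max_depth` counts the levels of decomposition below it. That reading is an assumption made for illustration and may not match how OpenSSA counts depth; `max_plan_nodes` is a hypothetical helper.

```python
def max_plan_nodes(max_depth, max_subtasks_per_decomp):
    """Upper bound on plan nodes, ASSUMING the root is at depth 0."""
    return sum(max_subtasks_per_decomp ** depth
               for depth in range(max_depth + 1))

# 1 root + 4 sub-tasks + 16 sub-sub-tasks under the assumption above.
print(max_plan_nodes(max_depth=2, max_subtasks_per_decomp=4))
```

Each node typically costs at least one LM call, so raising either parameter can increase runtime and cost quickly.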


In [None]:
auto_htp_agent_with_knowledge = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                                      reasoner=OodaReasoner(),
                                      knowledge=EXPERT_KNOWLEDGE_SET,
                                      resources={FileResource(path=DOC_PATH)})

auto_htp_agent_with_knowledge_solution = auto_htp_agent_with_knowledge.solve(problem=PROBLEM,
                                                                             plan=None,
                                                                             dynamic=False)

You can read the full logs of all the intermediate steps in `logs/auto_htp_agent_with_knowledge_logs.txt`

In [110]:
print_solution(auto_htp_agent_with_knowledge_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUND TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

AUTO HTP AGENT WITH KNOWLEDGE SOLUTION SUMMARIZED:
Based on the available FY2022 data, 3M's net property, plant, and equipment
(PP&E) constitutes 19.75% of its total assets, indicating that it may not be
highly capital-intensive relative to some industries. However, without
additional information on capital expenditures (CapEx) to sales ratio,
depreciation and amortization expenses, and return on assets (RoA), a definitive
assessment of 3M's capital intensity cannot be made.


When the task is broken down into sub-tasks, the agent gives more concrete reasoning about what it needs to answer the question: `key financial metrics such as the proportion of net fixed assets to total assets, capital expenditure relative to total net sales, depreciation and amortization expense as a percentage of total net sales, and Return on Assets cannot be calculated without specific financial data`. However, the final answer is still incorrect: the agent fails to conclude that 3M is not a capital-intensive business.

## Let's Upgrade the Agent to Solve the Problem Dynamically

Let's enable another `HTP` component: `Dynamic` solving. When a problem is solved dynamically, any sub-task that still cannot be solved is decomposed further.
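
The idea can be pictured with a toy solver. This is a simplified sketch, not OpenSSA's actual control flow: `solve`, `can_answer`, and `decompose` are hypothetical names, and the task strings are made up.

```python
def solve(task, can_answer, decompose, dynamic, depth=0, max_depth=2):
    """Toy solver: with dynamic=True, unsolved tasks are broken down further."""
    if can_answer(task):
        return f"answered: {task}"
    if not dynamic or depth >= max_depth:
        return f"unresolved: {task}"
    sub_answers = [
        solve(sub, can_answer, decompose, dynamic, depth + 1, max_depth)
        for sub in decompose(task)
    ]
    return f"{task} <- [{'; '.join(sub_answers)}]"

# Made-up decomposition table and a stand-in for "the reasoner can answer this".
SUBTASKS = {'assess capital intensity': ['find total assets', 'find revenue']}
can_answer = lambda task: task.startswith('find')
decompose = lambda task: SUBTASKS.get(task, [])

print(solve('assess capital intensity', can_answer, decompose, dynamic=False))
print(solve('assess capital intensity', can_answer, decompose, dynamic=True))
```

Without dynamic solving the toy solver gives up on the top-level task. With it, the task is split into two sub-tasks that can each be answered.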


<img src="./FinanceBench/diagrams/dynamic-auto-htp-agent-with-knowledge.png" height="100" />

In [103]:
dynamic_auto_htp_agent_with_knowledge = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                                               reasoner=OodaReasoner(),
                                               knowledge=EXPERT_KNOWLEDGE_SET,
                                               resources={FileResource(path=DOC_PATH)})

dynamic_auto_htp_agent_with_knowledge_solution = dynamic_auto_htp_agent_with_knowledge.solve(problem=PROBLEM,
                                                                                             plan=None,
                                                                                             dynamic=True)

In [111]:
print_solution(dynamic_auto_htp_agent_with_knowledge_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUND TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

DYNAMIC AUTO HTP AGENT WITH KNOWLEDGE SOLUTION SUMMARIZED:
Based on the FY2022 data provided, 3M is identified as a capital-intensive
business due to its significant capital expenditures, large total asset base,
focus on environmental expenditures, and structured asset management practices.
These factors collectively indicate a substantial investment in physical assets
and operational capabilities characteristic of capital-intensive businesses.


Even with the added knowledge, neither static nor dynamic solving gets the agent to the correct final answer. Let's customize the most powerful component of `HTP`: the plan.

## Incorporating Expert HTP instead of Auto-HTP

With OpenSSA, you can supply your own plan instead of depending on the auto-generated one. Let's add an expert plan on top of our initial Base Agent to see how it performs.

<img src="./FinanceBench/diagrams/expert-htp-agent.png" height="100" />

We've prepared a sample expert plan, but please feel free to customize the expert plan yourself.

In [112]:
variables = {
    'COMPANY': '3M',
    'PERIOD': '2022'
}

with open('sample_data/expert-plan-templates-sample.yml', 'r') as file:
    EXPERT_PLAN_TEMPLATES_CONTENT = file.read()
EXPERT_PLAN_TEMPLATES_CONTENT = EXPERT_PLAN_TEMPLATES_CONTENT.format(**variables)
EXPERT_PLAN = yaml.safe_load(EXPERT_PLAN_TEMPLATES_CONTENT)

EXPERT_HTP = HTP(task=Task.from_dict_or_str(EXPERT_PLAN['task']),
                 sub_plans=[HTP.from_dict(d) for d in EXPERT_PLAN.get('sub-plans', [])])
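
The template step relies on Python's `str.format`, which has two practical consequences for plan files. The sketch below uses a hypothetical template; only the `task` and `sub-plans` keys and the `COMPANY` and `PERIOD` variables come from the cell above, and the real sample file may look different.

```python
# Hypothetical template text, written for illustration only.
TEMPLATE = """\
# {{placeholders}} written with doubled braces survive as literal braces
task: Assess whether {COMPANY} is capital-intensive in fiscal year {PERIOD}
sub-plans:
  - task: Find the total assets of {COMPANY} at the end of {PERIOD}
  - task: Find the net sales of {COMPANY} for {PERIOD}
"""

filled = TEMPLATE.format(COMPANY='3M', PERIOD='2022')
print(filled)

# A placeholder with no matching variable raises KeyError instead of
# silently staying in the text.
try:
    TEMPLATE.format(COMPANY='3M')
except KeyError as missing:
    print('missing variable:', missing)
```

In short, every `{NAME}` in the template needs a matching entry in `variables`, and any literal brace in the YAML has to be doubled.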

In [None]:
expert_htp_agent = Agent(planner=AutoHTPlanner(max_depth=2, max_subtasks_per_decomp=4),
                         reasoner=OodaReasoner(),
                         knowledge=None,
                         resources={FileResource(path=DOC_PATH)})

expert_htp_agent_solution = expert_htp_agent.solve(problem=PROBLEM,
                                                   plan=EXPERT_HTP,
                                                   dynamic=False)

You can read the full logs of all the intermediate steps in `logs/expert_htp_agent_logs.txt`

In [114]:
print_solution(expert_htp_agent_solution)

PROBLEM: 
Is 3M a capital-intensive business based on FY2022 data? 

GROUND TRUTH ANSWER: 

    No, the company is managing its CAPEX and Fixed Assets pretty efficiently,
    which is evident from below key metrics:
    CAPEX/Revenue Ratio: 5.1%
    Fixed assets/Total Assets: 20%
    Return on Assets= 12.4% 

EXPERT HTP AGENT SOLUTION SUMMARIZED:
Based on the 2022 fiscal period data, although 3M has a significant investment
in Net Property, Plant & Equipment and a substantial asset base relative to its
sales, its Capital Expenditures and Return on Assets metrics do not align with
typical characteristics of a capital-intensive business. Therefore, 3M does not
fully exhibit the characteristics of a capital-intensive business according to
the provided benchmarks.


Yay! By incorporating the expert's plan, we instantly get the correct answer! 

## Try It Yourself!

Now that you've learned how OpenSSA's `HTP` works, you can try different combinations of the available knobs, including:
- auto-plan vs expert-plan
- static solving vs dynamic solving
- external knowledge vs no external knowledge

Some tips and tricks:
- If you want the fastest way to get up and running with HTP with OK performance: try auto-plan with added knowledge and dynamic solving.
- If you want a sufficiently good result with the least customization and runtime: try adding an expert plan without anything else.
- If you want the best result: try adding an expert plan with knowledge and dynamic solving!
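
To try the combinations systematically, the three knobs can be enumerated up front. This is a minimal sketch that only builds the list of settings; the dictionary keys are hypothetical labels, and wiring each setting into `Agent(...)` and `solve(...)` is left to you.

```python
from itertools import product

plans = ('auto-plan', 'expert-plan')
solving_modes = ('static', 'dynamic')
knowledge_options = ('no external knowledge', 'external knowledge')

# One dictionary per combination of the three knobs: 2 x 2 x 2 = 8 settings.
experiments = [
    {
        'plan': plan,
        'dynamic': mode == 'dynamic',
        'knowledge': knowledge == 'external knowledge',
    }
    for plan, mode, knowledge in product(plans, solving_modes, knowledge_options)
]

for experiment in experiments:
    print(experiment)
```

Running every setting against the same question and ground-truth answer makes it easier to see which knob actually changes the outcome.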
