# Overseer

The `Overseer` is the `Researcher` that develops leads from the main `research_question`, then creates `HypothesisResearcher`s to explore each lead.

A lead consists of three things:
1. a research lead/direction for exploration
2. a working hypothesis to answer the `research_question`
3. observation(s) that led to the research lead and working hypothesis
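
To make the three parts concrete, here is a minimal sketch of what such a record could look like. The field names follow the `Lead(...)` repr printed later in this notebook, but the class itself is an illustrative stand-in (kruppe's actual `Lead` may be defined differently), and the example values are placeholders rather than real findings.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """Illustrative stand-in for kruppe's `Lead`; the real class may differ."""

    observation: str  # what was noticed that prompted this lead
    lead: str         # the research direction to explore
    hypothesis: str   # a working answer to the research question


# Placeholder values only, not taken from any report
example_lead = Lead(
    observation="placeholder observation",
    lead="placeholder research direction",
    hypothesis="placeholder working hypothesis",
)
print(example_lead.hypothesis)
```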

## Initialize Overseer

`Overseer` takes five main arguments (the initialization further down also passes `num_leads`):
- `research_question`: the main research question to pursue.
- `background_report`: the background report generated by `BackgroundResearcher`. **Note**: it may be worth letting `Overseer` take the `BackgroundResearcher` itself as an argument, rather than only the report it generates.
- `librarian`: the `Librarian` that retrieves resources; it needs a lot of parameters of its own.
- `hyp_researcher_config`: parameters used to create each `HypothesisResearcher` inside the `Overseer`.
- `llm`: the language model the `Overseer` calls.

Because of these dependencies, there is a lot of initializing to do.

In [1]:
from kruppe.llm import OpenAILLM, OpenAIEmbeddingModel

research_question = "What are the key developments and financial projections for Amazon's advertising business, and how is it positioning itself in the digital ad market?"

### Initialize Librarian

In [2]:
# The remaining imports are all for the librarian
from kruppe.algorithm.librarian import Librarian
from kruppe.data_source.news.nyt import NewYorkTimesData
from kruppe.functional.rag.index.vectorstore_index import VectorStoreIndex
from kruppe.functional.rag.vectorstore.chroma import ChromaVectorStore
from kruppe.functional.docstore.mongo_store import MongoDBStore

reset_db = False
collection_name = "playground_2"

llm = OpenAILLM()
embedding_model = OpenAIEmbeddingModel()

# Create doc store
unique_indices = [['title', 'datasource']] # NOTE: this is important to avoid duplicates
docstore = await MongoDBStore.acreate_db(
    db_name="kruppe_librarian",
    collection_name=collection_name,
    unique_indices=unique_indices,
    reset_db=reset_db
)

# Create vectorstore index
vectorstore = ChromaVectorStore(
    embedding_model=embedding_model,
    collection_name=collection_name,
    persist_path='/Volumes/Lexar/Daniel Liu/vectorstores/kruppe_librarian'
)
if reset_db:
    vectorstore.clear()
    
index = VectorStoreIndex(llm=llm, vectorstore=vectorstore)

# Define news data source
news_source = NewYorkTimesData(headers_path="/Users/danielliu/Workspace/fin-rag/.nyt-headers.json")

# Define librarian
librarian_llm = OpenAILLM(model="gpt-4o-mini")
librarian = Librarian(
    llm=librarian_llm,
    docstore=docstore,
    index=index,
    news_source=news_source
)
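
The `unique_indices` setting in the cell above treats the pair (`title`, `datasource`) as the identity of a document. How `MongoDBStore` enforces this is not shown in this notebook, so the following is only a database-free sketch of the same idea: two documents count as duplicates when both fields match. The function name and sample documents are hypothetical.

```python
def dedupe_documents(documents, key_fields=("title", "datasource")):
    """Keep the first document seen for each combination of key fields."""
    seen = set()
    unique = []
    for doc in documents:
        key = tuple(doc.get(field) for field in key_fields)
        if key in seen:
            continue  # same title AND same datasource: treat as a duplicate
        seen.add(key)
        unique.append(doc)
    return unique


sample_docs = [
    {"title": "A", "datasource": "nyt"},
    {"title": "A", "datasource": "nyt"},    # duplicate of the first entry
    {"title": "A", "datasource": "other"},  # same title, different source: kept
]
print(len(dedupe_documents(sample_docs)))  # 2
```

Using the pair rather than the title alone means the same headline from two different sources is stored twice, which is usually what you want when sources are compared against each other.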

### Get a Background Research Report on the `research_question`

#### Option 1: Initialize a `BackgroundResearcher` and generate a background research report

In [5]:
from kruppe.algorithm.background import BackgroundResearcher
bkg_researcher_llm = OpenAILLM(model="gpt-4o-mini")
bkg_researcher = BackgroundResearcher(
    llm=bkg_researcher_llm,
    librarian=librarian,
    research_question=research_question,
)

In [6]:
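bkg_report = await bkg_researcher.execute()

print("\n===== Report =====")
print(bkg_report.text)

# Keep the plain text around: later cells pass `bkg_report_text` to `Overseer`
bkg_report_text = bkg_report.text

# Save the report so it can be reloaded later (see Option 2)
with open("bkg_report.txt", "w") as f:
    f.write(bkg_report_text)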

Creating info requests...
Created 4 info requests


Answering info requests:   0%|          | 0/4 [00:00<?, ?it/s]
Retrieving from library for info request: To effectively answer the query regarding Amazon's...
Answering info requests: 100%|██████████| 4/4 [02:47<00:00, 41.85s/it]


Compiling report...

===== Report =====
**Background Report on Amazon's Advertising Business**

**Research Question:** What are the key developments and financial projections for Amazon's advertising business, and how is it positioning itself in the digital ad market?

---

**I. Overview of Amazon’s Advertising Financial Performance**

Understanding the financial health of Amazon's advertising segment provides critical insights into its growth trajectory and market positioning. 

1. **Revenue Figures**: 
   As of Q2 2023, Amazon reported advertising revenue of approximately $10.7 billion for the quarter, indicating a significant year-over-year increase. In 2022, total advertising revenue reached around $36.8 billion, reflecting over 25% growth from 2021. This consistent upward trend points towards robust demand for Amazon’s advertising services and confirms its crucial role within the digital advertising landscape.

2. **Profit Margins**: 
   The advertising business boasts profit marg

#### Option 2: Load in a previously generated background research report

In [3]:
with open("bkg_report.txt", "r") as f:
    bkg_report_text = f.read()
bkg_report_text

"**Background Report on Amazon's Advertising Business**\n\n**Research Question:** What are the key developments and financial projections for Amazon's advertising business, and how is it positioning itself in the digital ad market?\n\n---\n\n**I. Overview of Amazon’s Advertising Financial Performance**\n\nUnderstanding the financial health of Amazon's advertising segment provides critical insights into its growth trajectory and market positioning. \n\n1. **Revenue Figures**: \n   As of Q2 2023, Amazon reported advertising revenue of approximately $10.7 billion for the quarter, indicating a significant year-over-year increase. In 2022, total advertising revenue reached around $36.8 billion, reflecting over 25% growth from 2021. This consistent upward trend points towards robust demand for Amazon’s advertising services and confirms its crucial role within the digital advertising landscape.\n\n2. **Profit Margins**: \n   The advertising business boasts profit margins estimated to exceed 3

### Determine Config for `HypothesisResearcher`

`Overseer` creates the `HypothesisResearcher`s itself, so it needs to be told which parameters to pass to each one.

In [4]:
hyp_llm = OpenAILLM(model="gpt-4o")
hyp_config = {
    'librarian': librarian,
    'llm': hyp_llm,
    'iterations': 1,
    'iterations_used': 1,
    'num_info_requests': 3
}
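
As a rough picture of how a shared config like this can be fanned out, the sketch below builds one researcher per lead by unpacking the same dict into each constructor. `SketchHypothesisResearcher` and `build_researchers` are hypothetical stand-ins; in particular, the assumption that each researcher receives its lead through a `lead` keyword is mine and is not confirmed by this notebook.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchHypothesisResearcher:
    """Hypothetical stand-in; not kruppe's `HypothesisResearcher`."""

    lead: str  # assumed per-researcher argument
    librarian: Any
    llm: Any
    iterations: int
    iterations_used: int
    num_info_requests: int


def build_researchers(leads, config):
    """Create one researcher per lead, sharing the same config."""
    return [SketchHypothesisResearcher(lead=lead, **config) for lead in leads]


# Same keys as the config above, with placeholder values for the objects
sketch_config = {
    "librarian": "librarian-placeholder",
    "llm": "llm-placeholder",
    "iterations": 1,
    "iterations_used": 1,
    "num_info_requests": 3,
}
researchers = build_researchers(["lead A", "lead B"], sketch_config)
print(len(researchers), researchers[0].num_info_requests)
```

One consequence of this pattern is that every researcher shares the same `librarian` and `llm` objects, so only the lead differs between them.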

### Initialize `Overseer`

In [5]:
from kruppe.algorithm.overseer import Overseer
overseer_llm = OpenAILLM(model="gpt-4o")
overseer = Overseer(
    llm=overseer_llm,
    librarian=librarian,
    research_question=research_question,
    background_report=bkg_report_text,
    hyp_researcher_config=hyp_config,
    num_leads=2
)

## Individual Function Test

### `create_leads`

Given the `research_question` and observations from `background_report`, I prompt the LLM to identify notable observations and determine `Lead`s to follow up on (for the definition of a `Lead`, see above).

*Note: `create_leads` currently breaks convention by taking the question and background report as parameters instead of using the attributes already set on the `Overseer`. I'm not sure why I did that, and I may change it in the future.*

In [5]:
leads = await overseer.create_leads(query=research_question, report=bkg_report_text)
leads

[Lead(observation="Amazon's advertising revenue is projected to grow to $55 billion by 2025, yet it currently significantly trails Google and Meta's advertising revenues, estimated at $250 billion and $130 billion respectively in 2023. This wide gap persists despite Amazon's rapid growth and unique data advantages.", lead='Investigate the strategies Amazon might employ to scale its advertising revenue more rapidly to close the gap with Google and Meta. Particularly explore how Amazon can leverage its e-commerce platform to create new advertising models that competitors cannot replicate.', hypothesis="Amazon will develop innovative advertising formats integrated within its e-commerce transactions that will not only boost engagement but also drive higher conversion rates and revenue, positioning it uniquely against Google and Meta's traditional ad offerings."),
 Lead(observation="Amazon's advertising business enjoys profit margins exceeding 30%, significantly higher than Amazon's corpora
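
Regarding the note above about `create_leads` taking explicit parameters: one possible way to reconcile the two conventions is to make the parameters optional and fall back to the values stored at initialization. The sketch below shows only that fallback logic on a hypothetical `SketchOverseer`; it is not how kruppe's `Overseer` is implemented, and `resolve_inputs` is an invented helper name.

```python
from typing import Optional, Tuple


class SketchOverseer:
    """Hypothetical stand-in for illustration; not kruppe's `Overseer`."""

    def __init__(self, research_question: str, background_report: str):
        self.research_question = research_question
        self.background_report = background_report

    def resolve_inputs(
        self, query: Optional[str] = None, report: Optional[str] = None
    ) -> Tuple[str, str]:
        """Use explicit arguments when given, otherwise the stored attributes."""
        query = self.research_question if query is None else query
        report = self.background_report if report is None else report
        return query, report


sketch = SketchOverseer("stored question", "stored report")
print(sketch.resolve_inputs())                            # both stored values
print(sketch.resolve_inputs(query="override question"))  # only the question is overridden
```

This keeps existing calls that pass `query` and `report` working while letting a bare call use the stored values.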

### `execute`

In [6]:
results = await overseer.execute()
results

Creating leads...
Created 5 leads
Creating hypothesis researchers...
hypothesis researchers: 100%|██████████| 5/5 [00:00<00:00, 8198.41it/s]

Executing hypothesis researchers...
Lead investigation iteration: 100%|██████████| 1/1 [01:06<00:00, 66.60s/it]
Finished executing hypothesis researcher 1

Lead investigation iteration:   0%|          | 0/1 [00:00<?, ?it/s]
Retrieving from library for info request: 2. I want to examine case studies or reports on successful advertising campaigns conducted through Amazon's platform that highlight the effectiveness of their targeting and personalization strategies. These examples will illustrate the practical benefits achieved and provide evidence of Amazon's competitive positioning compared to rivals in terms of advertising outcomes.
Lead investigation iteration: 100%|██████████| 1/1 [01:35<00:00, 95.74s/it]
Finished executing hypothesis researcher 2

Lead investigation iteration: 100%|██████████| 1/1 [01:03<00:00, 63.18s/it]
Finished executing hypothesis researcher 3

Lead investigation iteration: 100%|██████████| 1/1 [02:32<00:00, 152.91s/it]
Finished executing hypothesis researcher 4

Lead investigation iteration: 100%|██████████| 1/1 [02:06<00:00, 126.63s/it]
Finished executing hypothesis researcher 5
Compiling reports...

[{'original_lead': Lead(observation="Amazon's advertising business reports profit margins exceeding 30%, which is higher than Amazon's overall corporate margins. This high profitability is particularly notable given the typically slim margins seen in Amazon’s core retail sector.", lead="Investigate the specific aspects of Amazon's advertising operations that contribute to its high profit margins compared to the broader corporate operations.", hypothesis="Amazon's advertising segment achieves higher profit margins through the integration of advanced targeting technologies and minimal customer acquisition costs due to its existing e-commerce platform, leading to cost efficiencies that outpace the rest of its corporate operations."),
  'report': Response(text="## Comprehensive Report on Amazon's Advertising Business\n\n### Introduction\n\nThe research question aims to explore the key developments and financial projections for Amazon's advertising business, examining how it positions itsel
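
The output above suggests that `execute` returns a list of dicts, each with an `original_lead` and a `report` whose `.text` holds the written report. The sketch below mimics that shape with hypothetical stand-ins (`SketchLead`, `SketchResponse`, `investigate`, `execute_sketch`) and shows how the results can be read back. It runs the researchers one after another because the log lines finish in order, but the actual scheduling inside `Overseer.execute` is not shown in this notebook.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SketchLead:
    observation: str
    lead: str
    hypothesis: str


@dataclass
class SketchResponse:
    text: str


async def investigate(lead: SketchLead) -> SketchResponse:
    """Placeholder for one hypothesis researcher's work."""
    await asyncio.sleep(0)  # stands in for retrieval and LLM calls
    return SketchResponse(text=f"Report for: {lead.lead}")


async def execute_sketch(leads):
    """Investigate each lead in turn and pair it with its report."""
    results = []
    for i, lead in enumerate(leads, start=1):
        report = await investigate(lead)
        print(f"Finished executing hypothesis researcher {i}")
        results.append({"original_lead": lead, "report": report})
    return results


sketch_leads = [
    SketchLead("obs 1", "direction 1", "hyp 1"),
    SketchLead("obs 2", "direction 2", "hyp 2"),
]
# In a notebook, use `await execute_sketch(sketch_leads)` instead of asyncio.run
sketch_results = asyncio.run(execute_sketch(sketch_leads))
for item in sketch_results:
    print(item["original_lead"].lead, "->", item["report"].text)
```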