# Background Research

Given a central research question, generate a report that puts the research question into the proper context. It's split into four general steps:
1. Initialize `BackgroundResearcher` and take in research query
2. Create information requests
3. Answer the information requests with the help of the `Librarian`
4. Compile background research report
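
The steps above can be sketched as a small async pipeline. This is a minimal sketch, not kruppe's implementation: `run_background_research` and the three callables it takes are hypothetical stand-ins for the researcher's methods, and the `fake_*` stubs return canned strings instead of calling an LLM or a librarian.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type aliases; none of these names come from kruppe.
CreateRequests = Callable[[str], Awaitable[list[str]]]
AnswerRequest = Callable[[str], Awaitable[str]]
CompileReport = Callable[[str, list[tuple[str, str]]], Awaitable[str]]


async def run_background_research(
    research_question: str,
    create_requests: CreateRequests,
    answer_request: AnswerRequest,
    compile_report: CompileReport,
) -> str:
    # Step 2: break the research question into information requests.
    info_requests = await create_requests(research_question)

    # Step 3: answer each request in turn, keeping (request, answer) pairs.
    history: list[tuple[str, str]] = []
    for request in info_requests:
        answer = await answer_request(request)
        history.append((request, answer))

    # Step 4: compile everything gathered so far into one report.
    return await compile_report(research_question, history)


# Canned stubs so the sketch runs without any external service.
async def fake_create(question: str) -> list[str]:
    return [f"Recent earnings related to: {question}", f"Competitors related to: {question}"]


async def fake_answer(request: str) -> str:
    return f"(answer to '{request}')"


async def fake_compile(question: str, history: list[tuple[str, str]]) -> str:
    body = "\n".join(f"- {req}: {ans}" for req, ans in history)
    return f"# Background report: {question}\n{body}"


if __name__ == "__main__":
    # In a script; a notebook cell would `await run_background_research(...)` directly.
    print(asyncio.run(run_background_research("Amazon ads", fake_create, fake_answer, fake_compile)))
```

Passing the three steps in as callables keeps the orchestration separate from the LLM and retrieval details, which is only one possible way to structure it.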

## Initialize Background Researcher

A background researcher needs a couple of things:
- `research_question`: the central research question that is being answered
- `librarian` of the type `Librarian`

That doesn't seem like a lot, but take a look at how many parameters the `librarian` needs before saying anything.

Anyway, here are some optional parameters:
- `system_message`: currently set to a default one, but you can customize it
- `history`: a list of `(str, Response)` tuples. The string is the information request that was made, and the `Response` is the answer to that request. Each `Response` has a `sources` attribute holding the contexts the librarian retrieved to answer the information request
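
The shape of `history` can be pictured with a pair of dataclasses. This is a sketch under assumptions: `Document` and `Response` below are hypothetical stand-ins that model only the attributes used in this notebook (`text` and `sources`), not kruppe's actual classes, and the entry contents are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Stand-in for one retrieved context; only `text` is modeled here.
    text: str


@dataclass
class Response:
    # Stand-in for a response: the answer text plus the contexts behind it.
    text: str
    sources: list[Document] = field(default_factory=list)


# history pairs each information request with the response that answered it.
history: list[tuple[str, Response]] = [
    (
        "What did the advertising segment report last quarter?",
        Response(
            text="(answer text)",
            sources=[Document(text="(retrieved context)")],
        ),
    )
]

for info_request, response in history:
    print(info_request, "->", response.text, f"[{len(response.sources)} source(s)]")
```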


In [1]:
from kruppe.algorithm.background import BackgroundResearcher

# the rest are all for the librarian lmao
from kruppe.algorithm.librarian import Librarian
from kruppe.data_source.news.nyt import NewYorkTimesData
from kruppe.functional.rag.index.vectorstore_index import VectorStoreIndex
from kruppe.functional.rag.vectorstore.chroma import ChromaVectorStore
from kruppe.functional.docstore.mongo_store import MongoDBStore
from kruppe.llm import OpenAILLM, OpenAIEmbeddingModel

reset_db = True
collection_name = "playground_2"

llm = OpenAILLM()
embedding_model = OpenAIEmbeddingModel()

# Create doc store
unique_indices = [['title', 'datasource']] # NOTE: this is important to avoid duplicates
docstore = await MongoDBStore.acreate_db(
    db_name="kruppe_librarian",
    collection_name=collection_name,
    unique_indices=unique_indices,
    reset_db=reset_db
)

# Create vectorstore index
vectorstore = ChromaVectorStore(
    embedding_model=embedding_model,
    collection_name=collection_name,
    persist_path='/Volumes/Lexar/Daniel Liu/vectorstores/kruppe_librarian'
)
if reset_db:
    vectorstore.clear()
    
index = VectorStoreIndex(llm=llm, vectorstore=vectorstore)

# Define news data source
news_source = NewYorkTimesData(headers_path="/Users/danielliu/Workspace/fin-rag/.nyt-headers.json")

# Define librarian
librarian_llm = OpenAILLM(model="gpt-4o-mini")
librarian = Librarian(
    llm=librarian_llm,
    docstore=docstore,
    index=index,
    news_source=news_source
)

In [2]:
research_question = "What are the key developments and financial projections for Amazon's advertising business, and how is it positioning itself in the digital ad market?"

bkg_researcher_llm = OpenAILLM(model="gpt-4o-mini")
bkg_researcher = BackgroundResearcher(
    llm=bkg_researcher_llm,
    librarian=librarian,
    research_question=research_question,
)

## Testing Individual Methods

### `create_info_requests`

Given a research question, determine what information is needed to build a comprehensive background.

**TODO** in `librarian`: turn each `info_request` into an `info_query` for RAG retrieval

In [6]:
info_requests = await bkg_researcher.create_info_requests()
info_requests

["To answer the query comprehensively, I want to gather information about Amazon's recent earnings reports, particularly focusing on the advertising segment. This information can provide insights into revenue growth, key performance indicators, and overall contributions of the advertising sector to Amazon's total revenue, helping to paint a clear picture of current financial performance.",
 "I am interested in understanding the competitive landscape of the digital advertising market, specifically Amazon's position relative to its main competitors like Google and Facebook. This includes market share, unique selling propositions (USPs), and emerging trends within the industry. This context is crucial to understand how Amazon differentiates itself and adapts its strategies in an increasingly crowded market.",
 "I want to look into Amazon's strategic initiatives related to its advertising business, such as partnerships, technology investments, and platform enhancements. This information wi

### `answer_info_request`

Given an `info_request`, query the librarian to retrieve relevant contexts. Then, answer the request using those contexts (basically RAG with extra steps lol). Finally, add the `info_request` and `Response` pair to `self.history`.

Note: this returns a `Response` object
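
The retrieve, answer, record flow can be sketched without any external services. Everything here is an assumption for illustration: `ToyResearcher`, `Document`, and `Response` are hypothetical, keyword overlap stands in for the librarian's retrieval, and a string template stands in for the LLM answer.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str


@dataclass
class Response:
    # Hypothetical stand-in mirroring the `text` and `sources` attributes.
    text: str
    sources: list[Document] = field(default_factory=list)


class ToyResearcher:
    """Illustrative only: keyword overlap replaces the librarian, a template replaces the LLM."""

    def __init__(self, library: list[str]):
        self.library = [Document(text=t) for t in library]
        self.history: list[tuple[str, Response]] = []

    async def retrieve(self, info_request: str, top_k: int = 2) -> list[Document]:
        # Rank documents by how many words they share with the request.
        words = set(info_request.lower().split())
        scored = [(len(words & set(doc.text.lower().split())), doc) for doc in self.library]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Keep the top_k documents, dropping any with no overlap at all.
        return [doc for score, doc in ranked[:top_k] if score > 0]

    async def answer_info_request(self, info_request: str) -> Response:
        # 1. Retrieve contexts for the request.
        contexts = await self.retrieve(info_request)
        # 2. "Answer" using the contexts (a real version would prompt an LLM).
        text = f"Based on {len(contexts)} context(s): " + " ".join(d.text for d in contexts)
        response = Response(text=text, sources=contexts)
        # 3. Record the (request, response) pair.
        self.history.append((info_request, response))
        return response
```

In a notebook cell this would be called with `await researcher.answer_info_request(...)`; in a script, wrap the call in `asyncio.run`.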

In [None]:
info_request = info_requests[0]
response = await bkg_researcher.answer_info_request(info_request)

print("Info Request:")
print(info_request)
print("\nResponse:")
print(response.text)
print("\nContexts:")
print("\n\n".join(doc.text for doc in response.sources))

Info Request:
To answer the query comprehensively, I want to gather information about Amazon's recent earnings reports, particularly focusing on the advertising segment. This information can provide insights into revenue growth, key performance indicators, and overall contributions of the advertising sector to Amazon's total revenue, helping to paint a clear picture of current financial performance.

Response:
To gather the necessary insights regarding Amazon's advertising segment, we can turn to the company’s recent earnings reports, which provide a window into the performance of its advertising business within the broader scope of its operations. The most recent quarterly report indicated that Amazon's advertising sales reached approximately $14.3 billion, marking a slight deceleration compared to previous periods. This performance comes at a time when the company is investing significantly in its infrastructure and technological capabilities, especially within the realm of artificia

### `compile_report`

Puts everything in `self.history` together to answer the research question, with an explicit instruction that the result should be a **background research** report rather than a final answer.

Note: this returns a `Response` object
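
One way to picture the compile step is as prompt assembly over the history. This is a sketch only: `build_report_prompt` is a hypothetical helper, the instruction wording is made up for illustration, and the history is simplified to `(info_request, answer_text)` string pairs rather than `Response` objects.

```python
def build_report_prompt(research_question: str, history: list[tuple[str, str]]) -> str:
    """Assemble a single prompt from (info_request, answer_text) pairs."""
    sections = []
    for i, (info_request, answer_text) in enumerate(history, start=1):
        sections.append(f"### Finding {i}\nRequest: {info_request}\nAnswer: {answer_text}")
    findings = "\n\n".join(sections)

    # The explicit instruction comes first so the model treats the findings
    # as background material rather than as a question to answer directly.
    return (
        "Write a background research report. Focus on context for the question, "
        "not on a final answer.\n\n"
        f"Research question: {research_question}\n\n"
        f"{findings}"
    )


prompt = build_report_prompt(
    "(research question)",
    [("(first info request)", "(first answer)"), ("(second info request)", "(second answer)")],
)
print(prompt)
```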

In [10]:
# im p sure im right so imma just skip this :)

## `execute`

Aight let's see if the whole thing works :)

In [None]:
bkg_report = await bkg_researcher.execute()

print("\n===== Report =====")
print(bkg_report.text)

Creating info requests...
Created 6 info requests


  0%|          | 0/6 [00:00<?, ?it/s]Retrieving from library for info request: To answer the query comprehens...
100%|██████████| 6/6 [03:26<00:00, 34.36s/it]


Compiling report...

===== Response =====
# Background Report: Key Developments and Financial Projections for Amazon's Advertising Business

## Introduction 
Amazon's foray into the digital advertising market has marked a significant turning point in the broader advertising landscape. As traditional forms of advertising evolve and consumer behavior shifts towards online platforms, understanding the key developments and financial projections for Amazon's advertising business provides essential insights into its competitive position and future prospects.

## Recent Market Trends in Digital Advertising
The digital advertising industry is rapidly growing, with projections indicating a global market valued at over $650 billion by 2025, representing a compound annual growth rate (CAGR) of approximately 10% from 2023 onwards. Various factors are driving this growth, including increased internet penetration, the rise of e-commerce, and the growing preference for online advertising among consum

NameError: name 'response' is not defined