# Generate Reports

This notebook generates a report for each research question in `reports.csv` using three systems:

1. Vanilla LLM
2. Naive RAG
3. Kruppe Model

In [1]:
import pandas as pd
import asyncio
from kruppe.llm import OpenAILLM
import pathlib

experiment_dir = pathlib.Path("/Users/danielliu/Workspace/fin-rag/experiments")

In [2]:
df = pd.read_csv("./reports.csv", index_col=False)
df

Unnamed: 0,category,human_report_loc,question
0,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the potential timing and impact of Con...
1,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the projected free cash flow growth tr...
2,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the updated 1Q25 earnings estimates f...
3,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and potential impact...
4,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the rationale behind J.P. Morgan's dec...
...,...,...,...
68,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and key topics of di...
69,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the key capital expenditure trends an...
70,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the expected future performance of NVI...
71,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the current state and future outlook o...


In [3]:
llm = OpenAILLM(model="gpt-4o")

async def generate_title(report: str):
    user_message = f"Given a report, create a short and concise title. Respond only with the title. Make sure every character is file name compatible. \n\nReport: {report}\n\nOutput:"
    messages = [
        {"role": "system", "content": "You are a helpful assistant that summarizes reports."},
        {"role": "user", "content": user_message},
    ]

    llm_response = await llm.async_generate(messages)
    llm_string = llm_response.text
    return llm_string
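
The prompt above asks the model to return a file-name-compatible title, but nothing enforces it before the title is used as a file name. A minimal sketch of a defensive cleanup step follows. `sanitize_title` is a hypothetical helper (not part of kruppe), and the allowed character set and the 100-character cap are assumptions.

```python
import re


def sanitize_title(title: str, max_len: int = 100) -> str:
    """Return a conservative file-name-safe version of `title` (hypothetical helper)."""
    # Replace every run of characters outside [A-Za-z0-9._-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", title.strip())
    # Trim separators from both ends, then cap the length (assumed limit).
    cleaned = cleaned.strip("._-")[:max_len]
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


safe = sanitize_title('JP Morgan: Raises "Price Target" to $127?')
```

Titles that are already safe, such as `Chevron_FCF_Growth_Trajectory_2025-2027`, pass through unchanged.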

## Vanilla LLM

In [4]:
llm = OpenAILLM(model="gpt-4o")

In [5]:
from kruppe.prompts.experiments import VANILLA_QA_SYSTEM
async def vanilla_llm_response(question: str):
    messages = [
        {"role": "system", "content": VANILLA_QA_SYSTEM},
        {"role": "user", "content": question},
    ]

    llm_response = await llm.async_generate(messages)
    llm_string = llm_response.text
    return llm_string

# example:
await vanilla_llm_response(df.question[0])

"**Title: Evaluating the Timing and Impact of ConocoPhillips' Free Cash Flow Inflection and its Commitment to Return $10 Billion to Shareholders**\n\n**Thesis:**\nConocoPhillips, a leading player in the energy sector, is navigating an increasingly volatile market while aiming to achieve a free cash flow (FCF) inflection point that could bolster its $10 billion return of capital to shareholders. Despite fluctuating oil prices, strategic operational efficiencies, disciplined capital expenditure, and robust hedging strategies are poised to support this ambitious commitment.\n\n**I. Understanding the Free Cash Flow Inflection:**\n\n1. **Defining FCF Inflection:**\n   The concept of an FCF inflection refers to the point at which a company's free cash flow transitions from a negative or neutral state to a positive trajectory. For ConocoPhillips, this transition is critical as it underpins its capacity to distribute substantial capital returns to shareholders.\n\n2. **Projected Timing:**\n   

In [6]:
vanilla_reports = await asyncio.gather(*[vanilla_llm_response(question) for question in df.question])
vanilla_reports

["**Title: Evaluating the Timing and Impact of ConocoPhillips' Free Cash Flow Inflection Amidst Oil Price Volatility**\n\n**Thesis:**  \nConocoPhillips is strategically positioned to reach a significant free cash flow (FCF) inflection point that will bolster its ability to fulfill a $10 billion return of capital commitment. Despite inherent oil price fluctuations, the company’s robust production strategy, cost management, and shareholder return policies are likely to secure its financial objectives.\n\n**Supporting Arguments:**\n\n1. **Operational Efficiency and Portfolio Diversification:**\n   ConocoPhillips' strategic approach focuses heavily on operational efficiency and asset optimization. By leveraging its diversified portfolio, which spans across key production regions such as the Permian Basin, Eagle Ford, and the North Slope of Alaska, the company can mitigate risks associated with localized economic disruptions and price volatility. The balanced production mix, including both 
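
The `asyncio.gather` call above starts one request per question at the same time. If the API rate-limits that many concurrent calls, a semaphore can cap how many are in flight. This is a minimal sketch: `gather_with_limit` and `fake_llm_call` are hypothetical names, `fake_llm_call` stands in for `vanilla_llm_response`, and the limit of 2 is arbitrary.

```python
import asyncio


async def gather_with_limit(coros, limit: int = 5):
    """Await all coroutines, at most `limit` at a time, preserving input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        # Each coroutine waits for a free slot before it starts running.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


async def fake_llm_call(question: str) -> str:
    # Stand-in for an LLM request: sleeps briefly instead of calling an API.
    await asyncio.sleep(0.01)
    return f"answer to: {question}"


# In a script; inside a notebook, use `await gather_with_limit(...)` instead.
results = asyncio.run(
    gather_with_limit([fake_llm_call(q) for q in ["a", "b", "c"]], limit=2)
)
```

Because `asyncio.gather` returns results in input order, the reports still line up with `df.question` row by row.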

In [7]:
vanilla_titles = await asyncio.gather(*[generate_title(report) for report in vanilla_reports])
vanilla_titles

['ConocoPhillips-FCF-Inflection-Amid-Oil-Volatility',
 'Chevron_FCF_Growth_Trajectory_2025-2027',
 'ExxonMobil_1Q25_Earnings_Estimates_Analysis',
 'ExxonMobil_Project_Startups_2025_Financial_Growth_Analysis',
 'JP_Morgan_Raises_ConocoPhillips_Price_Target_to_127',
 'EOG_Resources_Price_Target_Analysis_Factors_and_Implications',
 'Projected_Future_Performance_ConocoPhillips_Production_Cash_Shareholder_Returns',
 'OXY_Financial_Forecast_Strategy_2026_Post_Q4_Adjustments',
 'EOG_Resources_Q1_2025_Performance_Projections',
 'Chevron_Corp_Project_Stock_Performance_Shareholder_Returns',
 'EOG_Resources_2025_Financial_Outlook_and_Strategic_Initiatives',
 'EOG_Resources_2025_FCF_Guidance_vs_Market_Expectations',
 'ConocoPhillips_FCF_CapitalProjects_Impact_2029_vs_US_Oil_Peers',
 'JP_Morgan_Chase_Investment_Prospects_and_Financial_Performance_Report',
 'Citigroup_Value_Creation_Analysis_2024_2026',
 'Potential_Risks_and_Strategies_for_Bank_of_America',
 'Leveraging_AI_for_Competitive_Advantage_

In [8]:
vanilla_report_dir = experiment_dir / "vanilla_report"
vanilla_report_dir.mkdir(exist_ok=True)

file_paths = []
# Titles and reports are in the same order, so pair them directly
for title, report in zip(vanilla_titles, vanilla_reports):
    file_path = vanilla_report_dir / f"{title}.txt"
    file_path.write_text(report)
    file_paths.append(file_path.as_posix())
    print(f"Saved report {title}")

df["vanilla_report_loc"] = file_paths
df

Saved report ConocoPhillips-FCF-Inflection-Amid-Oil-Volatility
Saved report Chevron_FCF_Growth_Trajectory_2025-2027
Saved report ExxonMobil_1Q25_Earnings_Estimates_Analysis
Saved report ExxonMobil_Project_Startups_2025_Financial_Growth_Analysis
Saved report JP_Morgan_Raises_ConocoPhillips_Price_Target_to_127
Saved report EOG_Resources_Price_Target_Analysis_Factors_and_Implications
Saved report Projected_Future_Performance_ConocoPhillips_Production_Cash_Shareholder_Returns
Saved report OXY_Financial_Forecast_Strategy_2026_Post_Q4_Adjustments
Saved report EOG_Resources_Q1_2025_Performance_Projections
Saved report Chevron_Corp_Project_Stock_Performance_Shareholder_Returns
Saved report EOG_Resources_2025_Financial_Outlook_and_Strategic_Initiatives
Saved report EOG_Resources_2025_FCF_Guidance_vs_Market_Expectations
Saved report ConocoPhillips_FCF_CapitalProjects_Impact_2029_vs_US_Oil_Peers
Saved report JP_Morgan_Chase_Investment_Prospects_and_Financial_Performance_Report
Saved report Citigr

Unnamed: 0,category,human_report_loc,question,vanilla_report_loc
0,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the potential timing and impact of Con...,/Users/danielliu/Workspace/fin-rag/experiments...
1,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the projected free cash flow growth tr...,/Users/danielliu/Workspace/fin-rag/experiments...
2,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the updated 1Q25 earnings estimates f...,/Users/danielliu/Workspace/fin-rag/experiments...
3,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and potential impact...,/Users/danielliu/Workspace/fin-rag/experiments...
4,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the rationale behind J.P. Morgan's dec...,/Users/danielliu/Workspace/fin-rag/experiments...
...,...,...,...,...
68,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and key topics of di...,/Users/danielliu/Workspace/fin-rag/experiments...
69,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the key capital expenditure trends an...,/Users/danielliu/Workspace/fin-rag/experiments...
70,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the expected future performance of NVI...,/Users/danielliu/Workspace/fin-rag/experiments...
71,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the current state and future outlook o...,/Users/danielliu/Workspace/fin-rag/experiments...


In [9]:
df.to_csv("./reports.csv", index=False)

## Naive RAG Response

In [None]:
df = pd.read_csv("./reports.csv", index_col=False)
df

## Kruppe Response

In [2]:
df = pd.read_csv("./reports.csv", index_col=False)
df

Unnamed: 0,category,human_report_loc,question,vanilla_report_loc
0,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the potential timing and impact of Con...,/Users/danielliu/Workspace/fin-rag/experiments...
1,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the projected free cash flow growth tr...,/Users/danielliu/Workspace/fin-rag/experiments...
2,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the updated 1Q25 earnings estimates f...,/Users/danielliu/Workspace/fin-rag/experiments...
3,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and potential impact...,/Users/danielliu/Workspace/fin-rag/experiments...
4,Energy (Oil),/Users/danielliu/Workspace/fin-rag/experiments...,What is the rationale behind J.P. Morgan's dec...,/Users/danielliu/Workspace/fin-rag/experiments...
...,...,...,...,...
68,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the expectations and key topics of di...,/Users/danielliu/Workspace/fin-rag/experiments...
69,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What are the key capital expenditure trends an...,/Users/danielliu/Workspace/fin-rag/experiments...
70,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the expected future performance of NVI...,/Users/danielliu/Workspace/fin-rag/experiments...
71,NVDA,/Users/danielliu/Workspace/fin-rag/experiments...,What is the current state and future outlook o...,/Users/danielliu/Workspace/fin-rag/experiments...


In [None]:
from kruppe.llm import OpenAILLM, OpenAIEmbeddingModel
# data sources to use
from kruppe.data_source.news.newshub import NewsHub
from kruppe.data_source.news.nyt import NewYorkTimesData
from kruppe.data_source.news.ft import FinancialTimesData
from kruppe.data_source.news.newsapi import NewsAPIData

# import index and rag related stuff
from kruppe.functional.rag.vectorstore.chroma import ChromaVectorStore, get_chroma_collection_names
from kruppe.functional.rag.index.vectorstore_index import VectorStoreIndex
from kruppe.functional.rag.retriever.fusion_retriever import QueryFusionRetriever
from kruppe.functional.docstore.mongo_store import MongoDBStore

# import librarian
from kruppe.algorithm.librarian import Librarian

collection_name = "kruppe_experiment_1"

librarian_llm = OpenAILLM(model="gpt-4o-mini")
embedding_model = OpenAIEmbeddingModel(model="text-embedding-3-small")

# initializing news sources
news_sources = [
    NewYorkTimesData(headers_path="/Users/danielliu/Workspace/fin-rag/.nyt-headers.json"),
    FinancialTimesData(headers_path="/Users/danielliu/Workspace/fin-rag/.ft-headers.json"),
    NewsAPIData(),
]

# initializing vectorstore client
vectorstore_path = '/Volumes/Lexar/Daniel Liu/vectorstores/kruppe_librarian'
vectorstore = ChromaVectorStore(
    embedding_model=embedding_model,
    persist_path=vectorstore_path,
    collection_name=collection_name,
)
# vectorstore.clear()

index = VectorStoreIndex(llm=librarian_llm, vectorstore=vectorstore)
retriever = QueryFusionRetriever(
    retrievers=[index.as_retriever()],
    mode="rrf",
    llm=librarian_llm,
    num_queries=3
)

# document store
unique_indices = [['title', 'datasource']] # NOTE: necessary to avoid duplicates
docstore = await MongoDBStore.acreate_db(
    db_name='kruppe_librarian',
    collection_name=collection_name,
    unique_indices=unique_indices
)
# docstore.clear_collection()

librarian = Librarian(
    llm=librarian_llm,
    docstore=docstore,
    index=index,
    retriever=retriever,
    news_source=NewsHub(news_sources=news_sources[:]),
)
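
The retriever above is configured with `mode="rrf"` and `num_queries=3`. Assuming `rrf` stands for reciprocal rank fusion, the idea can be sketched as follows: each query yields its own ranked list, and a document's fused score is the sum of `1 / (k + rank)` over the lists it appears in. This is a generic illustration, not kruppe's implementation. The function name, the document ids, and `k = 60` (a commonly used constant) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids into one scored ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1; higher-ranked documents contribute larger terms.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, best first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One ranked list per generated query (hypothetical ids).
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
fused = reciprocal_rank_fusion(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that rank well across several query variants rise to the top even if no single query places them first.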

In [4]:
import asyncio

from tqdm.asyncio import tqdm
from kruppe.algorithm.hypothesis import HypothesisResearcher
from kruppe.algorithm.coordinator import Coordinator
from kruppe.algorithm.background import BackgroundResearcher

overall_llm = OpenAILLM(model="gpt-4o-mini")

kruppe_dir = experiment_dir / "kruppe_report"
kruppe_dir.mkdir(exist_ok=True)

kruppe_reports = []
coordinators = []

# Add the output column only once so that reruns keep previously saved paths
if "kruppe_report_loc" not in df.columns:
    df["kruppe_report_loc"] = [None] * df.shape[0]

# Use `row_idx` so the VectorStoreIndex named `index` (defined earlier) is not shadowed
async for row_idx, row in tqdm(df.iterrows(), total=df.shape[0]):

    question = row["question"]
    print("QUESTION:", question)

    bkg_researcher = BackgroundResearcher(
        llm=overall_llm,
        librarian=librarian,
        research_question=question,
    )

    coordinator = Coordinator(
        llm=overall_llm,
        librarian=librarian,
        research_question=question,
        bkg_researcher=bkg_researcher,
        hyp_researcher_config={
            "llm": overall_llm
        },
        num_leads=2
    )

    report = await coordinator.execute()
    kruppe_reports.append(report)

    coordinators.append(coordinator)

    title = await generate_title(report)
    file_path = kruppe_dir / f"{title}.txt"
    file_path.write_text(report)

    # Persist after every report so a crash does not lose finished work
    df.at[row_idx, 'kruppe_report_loc'] = file_path.as_posix()
    df.to_csv("./reports.csv", index=False)
    print(f"Saved report {title}")

    # Pause between questions without blocking the event loop
    await asyncio.sleep(20)



kruppe_reports


  0%|          | 0/73 [00:00<?, ?it/s]

QUESTION: What is the potential timing and impact of ConocoPhillips' looming free cash flow (FCF) inflection, and how will it support the company's $10 billion return of capital commitment despite current oil price fluctuations?
Executing background researcher...
Creating info requests...
Created 3 info requests



Answering info requests
Index is empty
[httpx] Failed to scrape https://www.ft.com/content/29885ff8-043a-454f-aa76-387cd912a3b5 : HTTPStatusError("Redirect response '302 Found' for url 'https://www.ft.com/content/29885ff8-043a-454f-aa76-387cd912a3b5'\nRedirect location: '/content/50a94606-8a22-4c4d-907a-7a86ed00d7b9#post-29885ff8-043a-454f-aa76-387cd912a3b5'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/302")
[httpx] Failed to scrape https://www.cnbc.com/2025/04/08/european-markets-set-to-open-higher-as-tariff-tensions-rage-on.html : HTTPStatusError("Client error '403 Forbidden' for url 'https://www.cnbc.com/2025/04/08/european-markets-set-to-open-higher-as-tariff-tensions-rage-on.html'\nFor more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403")
[httpx] Failed to scrape https://www.marketwatch.com/story/wall-streets-s-p-500-targets-are-still-predicting-a-big-rally-even-after-a-flurry-of-cuts-f3f6ad36 : HTTPStatusEr

Compiling report...


  1%|▏         | 1/73 [16:39<19:59:56, 999.95s/it]

Creating leads...





TypeError: Coordinator.create_leads() takes 1 positional argument but 3 were given
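
Because the loop rewrites `reports.csv` after each saved report, a restarted run could skip the rows that already have a path instead of regenerating them. A minimal sketch, assuming unfinished rows hold `None` or NaN in `kruppe_report_loc`. `pending_rows` is a hypothetical helper, not part of kruppe, and the demo frame is made up.

```python
import pandas as pd


def pending_rows(df: pd.DataFrame, loc_column: str) -> pd.DataFrame:
    """Return the rows that do not yet have a saved report path in `loc_column`."""
    if loc_column not in df.columns:
        # Nothing has been saved yet, so every row is still pending.
        return df
    # None and NaN (what an empty CSV cell becomes on reload) both count as missing.
    return df[df[loc_column].isna()]


# Hypothetical state after one report was saved before a crash.
demo = pd.DataFrame({
    "question": ["q0", "q1", "q2"],
    "kruppe_report_loc": ["/tmp/q0.txt", None, None],
})
todo = pending_rows(demo, "kruppe_report_loc")
todo_index = list(todo.index)
```

Iterating over `pending_rows(df, "kruppe_report_loc").iterrows()` keeps the original row labels, so `df.at[...]` assignments still target the right rows.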