# Finance "Story" Teller
An experiment in answering financial questions that ask us to "make sense of something in the world". Example: how will a change in a company's business model affect its business?

General idea: circular story construction \
Generate story <=> Identify/extract relevant information

In [None]:
from rag.llm import OpenAILLM
llm = OpenAILLM(model="gpt-4o-mini") # use the same llm to keep track of token usage

query = "Is Netflix's new policy on password sharing good or bad for its business?"

## Part I: Build Basic Story Blocks
Create a basic finance story of a firm by using **equity research reports** (or other priority sources)

- Potential extension: use the FinGPT structure with “priority” sources (e.g. equity research reports), then Google
- Possible?: store the information that is used in some kind of memory or “list of relevant information/events”
- Additional: extract from the query which topics we need to construct a story for
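
The "memory" idea above could be sketched roughly as follows. This is only an illustration: `EventMemory` and its methods are hypothetical names, not part of the `rag` package, and hashing whitespace-normalised text is just one assumed way to detect repeats.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class EventMemory:
    """Hypothetical store for snippets that have already fed into the story."""

    items: dict = field(default_factory=dict)

    @staticmethod
    def _key(text: str) -> str:
        # Hash case- and whitespace-normalised text so trivial variants collide.
        normalised = " ".join(text.split()).lower()
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def add(self, text: str, source: str) -> bool:
        """Store a snippet; return False if it was already remembered."""
        key = self._key(text)
        if key in self.items:
            return False
        self.items[key] = {"text": text, "source": source}
        return True

    def __contains__(self, text: str) -> bool:
        return self._key(text) in self.items


memory = EventMemory()
memory.add("Netflix tests paid sharing in Peru.", source="equity research")
memory.add("netflix  tests paid sharing in Peru.", source="news")  # duplicate, ignored
```

A later cycle could check `text in memory` before re-processing a chunk.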

In [6]:
# initialize data sources
from rag.tools.sources import DirectoryData, FinancialTimesData
import json

directory_data = DirectoryData("../data")

with open("../.ft-headers.json", "r") as f:
    headers = json.load(f)
ft_data = FinancialTimesData(headers=headers)

In [7]:
# fetch equity research report documents, chunk, and store into vectorstore
from rag.vector_storages import NumPyVectorStorage
from rag.embeddings import OpenAIEmbeddingModel
from rag.text_splitters import RecursiveTextSplitter

# build vectorstore
embedding_model = OpenAIEmbeddingModel()
erp_vectorstore = NumPyVectorStorage(embedding_model=embedding_model)

# fetch data
erp_documents = await directory_data.async_fetch("NFLX")
print("Fetch %d documents" % len(erp_documents))

# chunk
text_splitter = RecursiveTextSplitter()
chunked_documents = text_splitter.split_documents(erp_documents)
print("Chunked into %d documents" % len(chunked_documents))

# store into vectorstore
ids = await erp_vectorstore.async_insert_documents(chunked_documents)
print("Stored %d documents" % len(ids))

Fetch 7 documents
Chunked into 865 documents
Stored 843 documents


## Part II: "Story" Generation
Ask LLM to create a story for said firm/industry/event using the building blocks.
- Potential extension: use a methodology similar to the “Discord” question: ask the LLM to create several different versions of the story for that firm/industry/event, and check that these versions are actually different.
- Possible methods: few-shot prompting (give it an example of what a good story looks like), fine-tuning a model, training a model, etc.
- Idea: you can also input your intuitions, preferences, frameworks, etc. here. Maybe these get saved (e.g. in some memory) so they can be reused and you don't have to rebuild everything each time you have a question about some firm/event.
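
One rough way to check that several generated stories are "actually different" is a word-overlap measure. The sketch below uses Jaccard similarity as a cheap stand-in (an assumption; embedding similarity may well work better), and the function names and the 0.8 threshold are illustrative only.

```python
from itertools import combinations


def jaccard_similarity(a: str, b: str) -> float:
    """Word-level Jaccard similarity: 0 means no shared words, 1 means same vocabulary."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def too_similar_pairs(stories, threshold=0.8):
    """Return index pairs of stories whose similarity exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(stories), 2)
        if jaccard_similarity(a, b) > threshold
    ]
```

Any pair returned by `too_similar_pairs` could be regenerated with a different prompt or temperature.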

In [8]:
# Fetch relevant context
contexts = await erp_vectorstore.async_similarity_search(query, top_k=5)

for context in contexts:
    print("====== CONTEXT ======")
    print(context)

Text: households must pay more or they will be cut off from the service.
The company has been testing strategies in a few markets and have highlighted two
such strategies. In Peru, accounts can add up to 2 additional users outside of the
household for ~$2 USD/month each. In Argentina, accounts can add up to 3 additional
households for ~$2 USD/month each. Netflix has ultimately decided to deploy password
sharing policies using the latter strategy starting in 2023. If, for example, 50% of global
password sharing households could be converted to paying households at $2
USD/month, we believe Netflix could generate an incremental $1.2bn in revenue per
year. We expect this revenue will flow through with an extremely high margin because:
(1) there will be no incremental content costs, (2) Netflix already bears the cost to serve
these households (namely server costs) and (3) there will need no for additional
marketing spend to capture these households – they already know about and use Netflix.

In [9]:
# generate story
from rag.prompts import RAGPromptFormatter

rag_prompt_formatter = RAGPromptFormatter(documents=contexts)

# TODO: need a better prompt
# TODO: need a way to automatically generate this prompt
prompt = """Generate a story about Netflix's business model, focusing on the impact of its new policy on password sharing. \
Keep the generated story as a paragraph. It should adequately explain the reasoning behind why certain actions are done.
"""

messages = rag_prompt_formatter.format_messages(prompt)

story = await llm.async_generate(messages)
print(story)

Netflix is shifting its approach to password sharing by introducing a policy that allows users to add additional households for a fee of approximately $2 per month. This decision stems from the realization that over 100 million households are sharing accounts, prompting Netflix to capitalize on this trend by converting a significant portion of these users into paying customers, which could generate substantial revenue without incurring additional content or marketing costs. By enforcing this policy, Netflix aims to enhance its long-term subscriber growth while accommodating users through features like profile transfers, ultimately maintaining a positive user experience.


## Part III: Topic Generation and News Fetching
Given the story, ask LLM to generate a *very extensive list* of topics that could be connected (even if *very remotely*) to the story. Use the generated topics as "keywords" to search for relevant events

- This may require some manual engineering or training some model, because it is unclear how a model can be “creative” with this.
- Potential extension: use “Discord” question
- Idea: keep track of topics that were generated
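
Keeping track of generated topics could be as simple as a set of normalised strings. A minimal sketch, assuming that case and whitespace differences are the only duplicates worth catching (the function names are hypothetical):

```python
def normalise_topic(topic: str) -> str:
    """Lower-case and collapse whitespace so near-identical topics compare equal."""
    return " ".join(topic.lower().split())


def filter_new_topics(candidates, seen):
    """Return topics not seen before, in order, and record them in `seen`."""
    fresh = []
    for topic in candidates:
        key = normalise_topic(topic)
        if key and key not in seen:
            seen.add(key)
            fresh.append(topic.strip())
    return fresh


seen_topics = set()
round_one = filter_new_topics(["Market trends ", "market  trends", "Ad tier"], seen_topics)
round_two = filter_new_topics(["Ad Tier", "Legal risk"], seen_topics)
```

Passing the same `seen_topics` set into every cycle means only unseen topics trigger a new search.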

In [11]:
from rag.prompts import CustomPromptFormatter
# TODO: better prompt. should generate potential "events" and not just "ideas"
prompt_template = """Given the story below, generate as much potential events that could be \
relevant to the story as possible. Topics could be business operation, macro events, industry changes, \
new technology, or anything creative. Topics are valid even if they are not directly related, but \
could be related indirectly or even insignificantly. Output each topic as a few words. Output a list \
of {k} topics separated by new line symbol and no leading bullet points.

Business story:
{story}
"""

system_prompt = "You generate relevant topics given a story about a business model."

kw_prompt_formatter = CustomPromptFormatter(prompt_template=prompt_template, system_prompt=system_prompt)
messages = kw_prompt_formatter.format_messages(story=story, k=20)
messages

[{'role': 'system',
  'content': 'You generate relevant topics given a story about a business model.'},
 {'role': 'user',
  'content': 'Given the story below, generate as much potential events that could be relevant to the story as possible. Topics could be business operation, macro events, industry changes, new technology, or anything creative. Topics are valid even if they are not directly related, but could be related indirectly or even insignificantly. Output each topic as a few words. Output a list of 20 topics separated by new line symbol and no leading bullet points.\n\nBusiness story:\nNetflix is shifting its approach to password sharing by introducing a policy that allows users to add additional households for a fee of approximately $2 per month. This decision stems from the realization that over 100 million households are sharing accounts, prompting Netflix to capitalize on this trend by converting a significant portion of these users into paying customers, which could genera

In [12]:
topic_response = await llm.async_generate(messages)
# TODO: store the topic list somewhere so we don't repeat
print(topic_response)

Impact of password sharing on subscription services  
Changes in consumer behavior regarding streaming services  
Statistical analysis of account sharing demographics  
Financial implications of implementing new policies  
Effect of increased fees on customer retention  
Comparative analysis with competitor streaming services  
User response to added fees for household additions  
Profile transfer features and user data management  
Technological advancements in account security  
Market trends in the streaming industry  
Potential for increased global viewership  
Legal considerations surrounding password sharing  
Emergence of new streaming platform models  
Consumer privacy concerns with profile transfers  
Competition from emerging streaming platforms  
Impact of economic factors on entertainment spending  
Innovative marketing strategies for user retention  
Integration of user feedback into policy changes  
Analysis of long-term subscriber growth strategies  
Adaptation of busine

In [None]:
from tqdm.notebook import tqdm
topics = [topic.strip() for topic in topic_response.split("\n") if topic.strip()]

topic_document = {}
for topic in tqdm(topics):
    documents = await ft_data.async_fetch(topic)
    topic_document[topic] = documents
topic_document
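
The loop above fetches one topic at a time. If the data source tolerates parallel requests (an assumption), the fetches could run concurrently with a cap on in-flight requests. In this sketch `fetch_all` and `fake_fetch` are hypothetical; `fake_fetch` stands in for something like `ft_data.async_fetch`.

```python
import asyncio


async def fetch_all(topics, fetch, max_concurrency=4):
    """Fetch documents for every topic with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(topic):
        async with semaphore:
            return topic, await fetch(topic)

    pairs = await asyncio.gather(*(fetch_one(t) for t in topics))
    return dict(pairs)


async def fake_fetch(topic):
    # Stand-in for a real data source; returns one dummy document per topic.
    await asyncio.sleep(0)
    return [f"document about {topic}"]


# In a script; inside a notebook cell use `await fetch_all(...)` instead.
result = asyncio.run(fetch_all(["pricing", "competition"], fake_fetch))
```

The semaphore is there to avoid hammering the news source; the right limit depends on its rate limits.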


## Part IV: Event Extraction
After scraping for new sources, extract relevant events.

For now, I'm keeping it simple and just using the LLM to decide whether each news item is relevant to the "story", then adding the relevant ones to the vectorstore. Consider a better strategy in the future. This step also needs to be evaluated.

> “Event Extraction”: for all the sources that have been collected, including those collected in previous cycles (e.g. priority sources like equity research reports and news articles collected in the previous round, but maybe not chunks that are already in the memory/list of events), follow the Risk Extraction paper to extract the events in each news article that are relevant to the generated story (or stories) from step 2 (or step 2a).
> - In other words, use the generated story as a “relevance score” calculator
> - Possible?: like in step 1b, add the extracted event into some “memory” that stores all the previously necessary information/events
> - KEY TASK (FUTURE): Calculate a “relevance score” at this stage using the story
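
To make the "relevance score" idea concrete, here is a deliberately crude sketch that scores an event against the story by cosine similarity of word counts. This is not the method of the Risk Extraction paper; it is only a placeholder, and a real version would more likely use embeddings or an LLM judgment.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def relevance_score(story: str, event: str) -> float:
    """Cosine similarity between word-count vectors, in [0, 1]."""
    a, b = bag_of_words(story), bag_of_words(event)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Events could then be ranked by score, or filtered with a threshold tuned during evaluation.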


In [15]:
# TODO: improve prompt
# TODO: consider "forcing" gpt to make a connection and "forcing" it to reject a connection,
# then decide which is better.

prompt_template = """News: {text} 
Does any part of the news connect to, even in a very remote way, the business story described below? 
Story: {story}

Options: Yes, No. Justify your answer. Keep your justification short, but logical.
Your answer is (Please always begin your response with Yes or No): """

relevance_prompt_formatter = CustomPromptFormatter(
    system_prompt="You extract relevance from a story given a text.",
    prompt_template=prompt_template,
)
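
Because the prompt asks the model to begin with Yes or No, the reply can be split into a verdict and a justification. A minimal sketch (the helper `parse_verdict` is hypothetical); matching on a word boundary avoids treating a reply that starts with "Yesterday" as a Yes.

```python
import re
from typing import Optional, Tuple


def parse_verdict(response: str) -> Tuple[Optional[bool], str]:
    """Split an LLM reply into (is_relevant, justification).

    The verdict is None when the reply starts with neither Yes nor No.
    """
    match = re.match(r"\s*(yes|no)\b[\s.,:;-]*", response, flags=re.IGNORECASE)
    if match is None:
        return None, response.strip()
    verdict = match.group(1).lower() == "yes"
    return verdict, response[match.end():].strip()
```

Replies with a `None` verdict could be retried or logged for manual review rather than silently dropped.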

In [16]:
import asyncio

for topic, documents in tqdm(topic_document.items()):
    tasks = []
    for document in documents:
        messages = relevance_prompt_formatter.format_messages(text=document.text, story=story)
        tasks.append(llm.async_generate(messages))

    relevance_results = await asyncio.gather(*tasks)
    topic_document[topic] = list(zip(documents, relevance_results))




In [17]:
topic_document

{'Impact of password sharing on subscription services': [(Document(text='\nSonja Hutson, Joshua Franklin and Cristina Criddle\nThis is an audio transcript of the FT News Briefing podcast episode: ‘ AI that can control your computer’\n[MUSIC PLAYING]\nSonja Hutson Good morning from the Financial Times. Today is Thursday, October 24th and this is your FT News Briefing . There’s more bad news for Boeing. And US lenders are trying to close the door on open banking. Plus, over in AI, a new virtual agent could run your life for you.\nCristina Criddle It can schedule calendar appointments for you. These kind of tedious things that take us quite a long time to do.\nSonja Hutson I’m Sonja Hutson, and here’s the news you need to start your day.\n[MUSIC PLAYING]\nBoeing workers voted to reject a deal with the company yesterday. That means the nearly six-week strike will continue.\xa0The deal would have increased pay by 35 per cent over four years and improved retirement benefits. But workers want

In [19]:
events_vectorstore = NumPyVectorStorage(embedding_model=embedding_model)

inserted_doc_ids = []
for topic, document_relevance in tqdm(topic_document.items()):
    rel_documents = [doc for doc, relevance in document_relevance if relevance.strip().lower().startswith("yes")]
    # print(rel_documents, end="\n\n")
    chunked_rel_documents = text_splitter.split_documents(rel_documents)
    ids = await events_vectorstore.async_insert_documents(chunked_rel_documents)
    inserted_doc_ids.extend(ids)

print("Stored %d documents" % len(inserted_doc_ids))


Stored 619 documents


## Part V: Generate new story

Using the newly collected, extracted events, create a new story.

For now, this still keeps it simple by retrieving the top-k documents from the vectorstore. A better method is needed (e.g. a list of summaries of all the events, where each summary is ~1 line).

- Ask the LLM to make a choice: 1. Not change the story at all; 2. Edit the story; 3. Completely rewrite the story (or ask it to generate all three choices)
- Question: how to choose which events to use? Or use all of them? (list of simple summaries)
- Potential extension: again, like 2a, could generate several different versions
- Potential extension: add a “manual” step here and maybe human can choose which story they like more, and add in their comments
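
The "list of simple summaries" could look something like the sketch below. Taking the first sentence as the summary is only a placeholder for an LLM-written one-liner, and both function names are hypothetical.

```python
import re


def one_line_summary(text: str, max_chars: int = 120) -> str:
    """Placeholder summary: the first sentence, flattened to one line and truncated."""
    flat = " ".join(text.split())
    first_sentence = re.split(r"(?<=[.!?])\s", flat, maxsplit=1)[0]
    if len(first_sentence) > max_chars:
        return first_sentence[: max_chars - 3].rstrip() + "..."
    return first_sentence


def build_event_digest(texts) -> str:
    """Number each summary so a prompt (or a human) can refer to events by index."""
    return "\n".join(f"{i}. {one_line_summary(t)}" for i, t in enumerate(texts, start=1))
```

The numbered digest could replace the raw top-k chunks in the update prompt, and the indices give a human reviewer an easy way to say which events to keep.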

In [20]:
# NOTE: for now, I'm using the original query as the key to extract documents
# This SHOULD NOT be the case. Need to generate keywords relevant to the story.

# Fetch relevant context
k = 10
contexts = await events_vectorstore.async_similarity_search(query, top_k=k)
contexts

[Document(text='As its subscription growth slowed last year, Netflix announced plans to crack down on rampant password sharing and introduce advertising-supported streaming options. The company said on Wednesday that the effort to limit password sharing had boosted its subscriber and revenue growth over the past two quarters, adding that the “cancel reaction” had been lower than expected.\nThe subscriber increase was the strongest quarterly rise since the second quarter of 2020, when Covid-19 lockdowns led to a jump in sign-ups.\n“The ‘streamflation’ era is upon us, and consumers should expect to be hit with price hikes [and] password sharing limits, and enticed with ad supported options,” said Scott Purdy, KPMG’s media leader, on Wednesday. “Today’s results show that these levers are working, at least in the short term.”', metadata={'query': 'Impact of password sharing on subscription services', 'datasource': 'FinancialTimesData', 'url': 'https://www.ft.com/content/7844d60e-bdbb-4279-

In [None]:
rag_update_story_template = """Below is a story about a business's business model, as well as additional events \
or information that has yet to be included in the story. Generate a new story that considers the new information, \
following the instruction:
{query}

Keep the story as a paragraph. It should adequately explain the reasoning behind why certain business actions are done.

Business story:
{story}

Relevant information:
{context}

Updated story:
"""
original_story = story
new_story_prompt_formatter = RAGPromptFormatter(prompt_template=rag_update_story_template, documents=contexts)

# story 1: update existing story
messages = new_story_prompt_formatter.format_messages("Consider the new information and build upon the original story", story=original_story)
modified_story = await llm.async_generate(messages)

# story 2: create a brand new story
messages = new_story_prompt_formatter.format_messages("Create a completely new story that is different than the original one.", story=original_story)
new_story = await llm.async_generate(messages)

print("Original Story:", original_story, end="\n\n")
print("Modified Story:", modified_story, end="\n\n")
print("New Story:", new_story, end="\n\n")

## Part VI: Repeat parts III - V
With a new story, we repeat step III (generate new relevant topics), step IV (determine relevance and extract events) and step V (generate a new story).

Repeat this some number of times (how many is not yet determined). The end result is:
- A detailed “story” that has been refined over a large number of different news events
- A list of relevant events and information used to construct the story
- A very extensive list of source documents that are connected (even if very remotely) to the financial query

TO BE IMPLEMENTED.
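
Until this is implemented, the control flow might be sketched as below. Everything here is hypothetical: the four callables stand in for the LLM and data-source steps of Parts III to V, and stopping when a round finds no new events (or after `max_rounds`) is just one assumed stopping rule.

```python
from typing import Callable, List, Tuple


def refine_story(
    story: str,
    generate_topics: Callable[[str], List[str]],
    fetch_events: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    update_story: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Repeat Parts III-V until a round adds no new events or max_rounds is reached."""
    used_events: List[str] = []
    for _ in range(max_rounds):
        new_events: List[str] = []
        for topic in generate_topics(story):  # Part III
            for event in fetch_events(topic):
                already_used = event in used_events or event in new_events
                if not already_used and is_relevant(story, event):  # Part IV
                    new_events.append(event)
        if not new_events:
            break  # nothing new found; treat the story as converged
        story = update_story(story, new_events)  # Part V
        used_events.extend(new_events)
    return story, used_events
```

The function returns both the final story and the list of events used, matching two of the end results listed above.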

## Part VII: Answer the Query
Once we have arrived at a satisfactory business story, we use the final story to answer financial queries.
- If the question just asks to construct a story about the firm, return the story
- If the question asks to explain some event’s significance, use the story to answer that question
- If the question asks for the relevance of some news article, use the story to calculate a relevance score
- If the question asks for the most relevant news articles or information, use the story to retrieve relevant documents from the extensive list of sources. Can also return a relevance score.
- ...

In [35]:
final_answer_prompt = """Given the business model/story below, answer the following financial query. \
You must make sure that your answer follows the logic of the story.

Story: {story}

Query: {query}
"""
final_answer_prompt_formatter = CustomPromptFormatter(prompt_template=final_answer_prompt, system_prompt="You answer a financial query given a story about a business model.")

chosen_story = modified_story
messages = final_answer_prompt_formatter.format_messages(story=chosen_story, query=query)
final_answer = await llm.async_generate(messages)
print(final_answer)

Netflix's new policy on password sharing appears to be good for its business based on the story provided. The enforcement of fees for additional households is a strategic move to convert freeloaders—an estimated 100 million households benefiting from shared accounts—into paying customers. This policy has already shown positive results by attracting nearly 6 million new subscribers within just a quarter, indicating a successful shift in user behavior towards a more monetized model.

Moreover, the introduction of an ad-supported streaming option aligns with current consumer trends, enhancing Netflix’s ability to cater to price-sensitive users while increasing engagement metrics, which are important for investors. The reported 54 percent surge in operating income amid rising subscriber numbers further substantiates the effectiveness of this strategy.

Overall, while there may have been initial backlash, the resulting increase in subscribers and revenue forecast suggest that the new passwo