<center><h1><b>RAG-Fin-GPT : An AI Tool for Financial Research and Analytics</b></h1></center>

This is an AI solution for performing in-depth financial research and analysis. This system is based on Retrieval-Augmented Generation (RAG), utilizing a locally run Llama2-7b-chat LLM, developed by Meta. This system uses completely open-source components and takes care of the data security considerations as well, by hosting everything on a local system.

In [1]:
import os
import logging
import sys
import torch
import nest_asyncio 
nest_asyncio.apply()

from huggingface_hub import login
from llama_index.llms.llama_cpp import LlamaCPP
#from llama_index.core.llms.utils import messages_to_prompt, completion_to_prompt

from transformers import AutoTokenizer
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core import Settings

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, set_global_tokenizer, StorageContext, load_index_from_storage

from llama_index.core import download_loader
from llama_index.readers.file import PyMuPDFReader
from llama_index.readers.web import NewsArticleReader

from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.ingestion import IngestionPipeline

from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.retrievers import BaseRetriever, VectorIndexRetriever
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.postprocessor import SimilarityPostprocessor

from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine

from llama_index.core.response.notebook_utils import display_response, display_source_node, display_query_and_multimodal_response


In [2]:
hf_token = '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'
login(token=hf_token)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\rck05\.cache\huggingface\token
Login successful


In [3]:
logging.basicConfig(
    stream = sys.stdout,
    level = logging.INFO
)
logging.getLogger().addHandler(
    logging.StreamHandler(
        stream = sys.stdout
    )
)

In [4]:
model_name = 'Llama2-7b'
model_path = r"D:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf"
max_new_tokens = 2048
context_window = 4096

system_prompt = '''
You are an experienced investment and financial research analyst, who always generates responses based only on the source documents given./
You cite the relevant source documents properly at the end of the response or in the format 'According to <source>,'. You include the numerical figures/
from the source documents to elucidate your response, but NEVER HALLUCINATE ANY INFORMATION. If any details are missing from the source documents,/
you explicitly state so, rather than making up the missing information. Your responses are well-cited and credible, apt to be included in research reports.'''

In [5]:
# the model
llm = LlamaCPP(
    model_path = model_path,
    temperature = 0,
    max_new_tokens = max_new_tokens,
    context_window = context_window,
    generate_kwargs = {},
    model_kwargs = {
        'load_in_8bit' : True,
        'n_gpu_layers' : -1
    },
    system_prompt = system_prompt,
    #messages_to_prompt=messages_to_prompt,
    #completion_to_prompt=completion_to_prompt,
    verbose = True
)

print('Text-generation model "Llama2-7b" loaded.')

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from D:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_

Text-generation model "Llama2-7b" loaded.


AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 
Model metadata: {'general.name': 'LLaMA v2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '11008', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '18', 'llama.attention.head_count_kv': '32', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0'}


In [6]:
tokenizer_model = r'meta-llama/Llama-2-7b-chat-hf'
hf_token = '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'
set_global_tokenizer(
    AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path=tokenizer_model,
        token=hf_token
    ).encode
)

In [7]:
embed_model_path = r"C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b"
embed_model_name = 'WhereIsAI/UAE-Large-V1'

if not os.path.exists(embed_model_path):
    embed_model = HuggingFaceEmbedding(embed_model_name)
    print('Embedding model not found in cache. Downloading and creating one.!')
else:
    embed_model = HuggingFaceEmbedding(embed_model_path) 
    print('Embedding model found in cache.')

print('Model name: ', embed_model_name)

Embedding model found in cache.
Model name:  WhereIsAI/UAE-Large-V1 
Model Directory:  C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b


In [8]:
Settings.llm = llm
Settings.embed_model = embed_model
#Settings.chunk_size = 512
Settings.context_window = context_window
Settings.num_output = max_new_tokens

print('Settings done.')

Settings done.


In [50]:
# pdfs
files = [
    r"D:\0-VARAD-DESHMUKH\Files\data\meta-10k-2023.pdf",
    r"D:\0-VARAD-DESHMUKH\Files\data\q3-2023.pdf"
]

pdfs = []
for file in files:
    pdf = PyMuPDFReader().load_data(
        file_path=file,
        metadata=True
    )
    pdfs += pdf

In [None]:
# News Articles
news_articles = [
    r'https://www.indiatvnews.com/technology/news/meta-collaborates-with-ncmec-to-extend-take-it-down-program-for-teenagers-2024-02-07-915677',
    r'https://www.msn.com/en-in/money/news/meta-to-label-ai-generated-images-across-social-media-platforms-details-here/ar-BB1hTNrL',
    r'https://www.msn.com/en-in/money/other/meta-announces-plans-to-combat-deepfakes-and-ai-generated-content-on-facebook-instagram-threads-ahead-of-key-elections/ar-BB1hTfPt',
    r'https://timesofindia.indiatimes.com/gadgets-news/20-years-of-facebook-meta-added-more-than-one-tcs-in-a-day-to-its-value/articleshow/107460150.cms',
    r'https://www.nytimes.com/2024/02/01/technology/meta-profit-report.html',
    r'https://www.msn.com/en-in/money/markets/meta-platforms-shatters-records-with-a-196-bn-surge-in-stock-market-value/ar-BB1hMN6e',
]

news_reader = NewsArticleReader(use_nlp=False)
news = news_reader.load_data(
    news_articles
)

# change 'publish_date' metadata to string for JSON serialization
for i in range(len(news)):
    news[i].metadata['publish_date'] = str(news[i].metadata['publish_date'])

In [None]:
# Websites
WholeSiteReader = download_loader('WholeSiteReader')

prefix = r'https://about.meta.com'
base_url = r'https://about.meta.com/company-info/'
max_depth = 1

scraper = WholeSiteReader(
    prefix=prefix,
    max_depth=max_depth
)

websites = scraper.load_data(
    base_url=base_url
)

In [None]:
# Static htmls
SimpleWebPageReader = download_loader('SimpleWebPageReader')

urls = [
    r'https://www.sec.gov/Archives/edgar/data/1326801/000132680124000012/meta-20231231.htm'
]
loader = SimpleWebPageReader()

htmls = loader.load_data(
    urls=urls
)

In [35]:
# other files - docx, etc.
document_directory = r"D:\0-VARAD-DESHMUKH\Files\data"

others = SimpleDirectoryReader(
    document_directory,
    filename_as_id=True
).load_data()

In [51]:
# concatenating all the sources into Document objects
documents = pdfs #+ news + websites + htmls + others

In [52]:
splitter = SemanticSplitterNodeParser(
    buffer_size=3,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model,
    include_prev_next_rel=True,
    include_metadata=True
)
    
embedding = HuggingFaceEmbedding(embed_model_name)
pipeline = IngestionPipeline(
transformations=[splitter, embedding]
)

nodes = pipeline.run(
    documents=documents,
    in_place=False,
    show_progress=True
)

storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
pdf_index = VectorStoreIndex(
    nodes,
    storage_context=storage_context
)

# store it for later
PERSIST_DIR = './storage-dt'
pdf_index.storage_context.persist(persist_dir=PERSIST_DIR)

###########################################################################################
'''
# prepare a text file with list of source documents
file_path = './sources.txt'
with open(file_path, 'w') as file:
	file.write("Hello, this is a new text file created using open() function.")
	
print(f"File '{file_path}' created successfully.")
'''
###########################################################################################

Parsing nodes:   0%|          | 0/219 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/9 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/16 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/29 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/31 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/26 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/15 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/15 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/33 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/34 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/21 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/33 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/28 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/23 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/23 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/9 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/32 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/40 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/42 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/34 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/42 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/28 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/15 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/9 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/26 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/18 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/4 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/18 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/18 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/23 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/25 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/4 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/19 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/29 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/28 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/28 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/36 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/12 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/14 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/8 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/9 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/45 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/38 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/40 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/5 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/10 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/30 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/15 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/20 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/26 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/16 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/3 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/4 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/399 [00:00<?, ?it/s]

'\n# load the existing index\nstorage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)\nindex = load_index_from_storage(storage_context)\nnodes = list(index.docstore.docs.values())\n'

In [138]:
# load the existing index
PERSIST_DIR = './storage-dt'
storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
index = load_index_from_storage(storage_context)
nodes = list(index.docstore.docs.values())

INFO:llama_index.core.indices.loading:Loading all indices.
Loading all indices.


In [139]:
# hybrid retrieval
vector_retriever = pdf_index.as_retriever(
    similarity_top_k=5
)

bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5
)

class Hybridretriever(BaseRetriever):
    def __init__(self, vector_retriever, bm25_retriever):
        self.vector_retriever = vector_retriever
        self.bm25_retriever = bm25_retriever
        super().__init__()

    def _retrieve(self, query, **kwargs):
        bm25_nodes = self.bm25_retriever.retrieve(query, **kwargs)
        vector_nodes = self.vector_retriever.retrieve(query, **kwargs)

        # combine the two lists of nodes
        all_nodes = []
        node_ids = set()
        for n in bm25_nodes + vector_nodes:
            if n.node_id not in node_ids:
                all_nodes.append(n)
                node_ids.add(n.node_id)
        
        return all_nodes

In [140]:
# retrievers
pdf_index.as_retriever(similarity_top_k=3)
hybrid_retriever = Hybridretriever(vector_retriever, bm25_retriever)

In [141]:
# re-ranker
reranker = SentenceTransformerRerank(
    top_n=3,
    model='BAAI/bge-reranker-base'
)

-----------------------------------------------------------------


------------------------------------------------------------------

### Financial and operational highlights

In [62]:
prompt = '''
Outline the financial and operational highlights of Meta for Q3:2023, i.e. 3 months ending Dec. 31, 2023 and full year 2023 from the source documents./
Include all the important details like revenue, profits, share prices, user base, etc. Strictly ensure that your response is factually correct and relevant./
Cite the relevant sections from the source documents, so that the response is trustworthy and credible. Cross-check and refine your response, if needed./
Your response must not contain any information that is not present in the source document. Structure your output as a paragraph under 300 words./
FOLLOW ALL THE INSTRUCTIONS GIVEN ABOVE VERY CAREFULLY.'''

In [63]:
# retrieval and reranking of nodes
retrieved_nodes = hybrid_retriever.retrieve(prompt)

reranked_nodes = reranker.postprocess_nodes(
    retrieved_nodes,
    query_bundle=QueryBundle(prompt)
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [64]:
# source nodes
for node in reranked_nodes:
    display_source_node(node)

**Node ID:** b60f5250-00f8-4330-b947-c8bf6696a886<br>**Similarity:** 0.5963371396064758<br>**Text:** Meta Reports Fourth Quarter and Full Year 2023 Results; Initiates Quarterly Dividend
MENLO PARK, ...<br>

**Node ID:** 72f78825-ab45-4d30-8106-07228b6ff809<br>**Similarity:** 0.09835785627365112<br>**Text:** META PLATFORMS, INC.
CONDENSED CONSOLIDATED STATEMENTS OF INCOME
(In millions, except per share a...<br>

**Node ID:** 7931d0dd-16ad-40e0-9312-56e323e99270<br>**Similarity:** 0.08390882611274719<br>**Text:** META PLATFORMS, INC.
CONDENSED CONSOLIDATED STATEMENTS OF CASH FLOWS
(In millions)
(Unaudited)
Th...<br>

In [67]:
# filtering out irrelevant nodes
filter = SimilarityPostprocessor(
    similarity_cutoff=0.1
)
filtered_nodes = filter.postprocess_nodes(
    reranked_nodes,
    query_bundle=QueryBundle(prompt)
)

In [68]:
# source nodes
for node in filtered_nodes:
    display_source_node(node)

**Node ID:** b60f5250-00f8-4330-b947-c8bf6696a886<br>**Similarity:** 0.5963371396064758<br>**Text:** Meta Reports Fourth Quarter and Full Year 2023 Results; Initiates Quarterly Dividend
MENLO PARK, ...<br>

In [69]:
# query engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    node_postprocessors=[reranker, filter],
    llm=llm
)

In [70]:
# response generation
response = query_engine.query(prompt)

display_response(
    response=response,
    show_source=True,
    show_source_metadata=True
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Llama.generate: prefix-match hit

llama_print_timings:        load time =   60105.15 ms
llama_print_timings:      sample time =     191.96 ms /   764 runs   (    0.25 ms per token,  3980.04 tokens per second)
llama_print_timings: prompt eval time =   26646.43 ms /   203 tokens (  131.26 ms per token,     7.62 tokens per second)
llama_print_timings:        eval time =  278085.64 ms /   763 runs   (  364.46 ms per token,     2.74 tokens per second)
llama_print_timings:       total time =  307442.20 ms /   966 tokens


**`Final Response:`** Based on the provided source document, Meta Platforms Inc. (Meta) reported its financial and operational highlights for Q3:2023 and full year 2023.
For Q3:2023, Meta's revenue was $40,111 million, a 25% increase year-over-year. The company's costs and expenses were $23,727 million, a decrease of 8% year-over-year. As a result, Meta's income from operations was $16,384 million, a 156% increase year-over-year. The company's operating margin was 41%, up from 20% in the same period last year.
For full year 2023, Meta's revenue was $134,902 million, a 16% increase year-over-year. The company's costs and expenses were $88,151 million, a decrease of 1% year-over-year. As a result, Meta's income from operations was $46,751 million, a 62% increase year-over-year. The company's operating margin was 35%, up from 25% in the same period last year.
In terms of user base, Meta reported that its family daily active people (DAP) was 3.19 billion on average for December 2023, an increase of 8% year-over-year. Its family monthly active people (MAP) was 3.98 billion as of December 31, 2023, an increase of 6% year-over-year. Its Facebook daily active users (DAUs) were 2.11 billion on average for December 2023, an increase of 6% year-over-year. Its Facebook monthly active users (MAUs) were 3.07 billion as of December 31, 2023, an increase of 3% year-over-year.
In terms of ad impressions and price per ad, Meta reported that ad impressions delivered across its Family of Apps increased by 21% year-over-year in Q3:2023, and the average price per ad increased by 2% year-over-year. For full year 2023, ad impressions increased by 28% year-over-year and the average price per ad decreased by 9% year-over-year.
References:
According to Meta's Q3:2023 earnings report (p. 11):
"Revenue increased 25% to $40,111 million, driven by growth in ad revenue, which increased 21% to $21,334 million. Family DAUs increased 8% to 3.19 billion, and Family MAP increased 6% to 3.98 billion."
According to Meta's full year 2023 earnings report (p. 11):
"Revenue increased 16% to $134,902 million, driven by growth in ad revenue, which increased 28% to $56,577 million. Family DAUs increased 3% to 3.07 billion, and Family MAP increased 6% to 3.98 billion."

---

**`Source Node 1/1`**

**Node ID:** b60f5250-00f8-4330-b947-c8bf6696a886<br>**Similarity:** 0.5963371396064758<br>**Text:** Meta Reports Fourth Quarter and Full Year 2023 Results; Initiates Quarterly Dividend
MENLO PARK, ...<br>**Metadata:** {'total_pages': 11, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\q3-2023.pdf', 'source': '1'}<br>

### Risk factors

In [72]:
prompt = '''
Present an overview of the risk factors faced by Meta, according to the source documents. Focus on all the risk factors like the risk factors related to product offerings,/
market conditions, geopolitical conditions, global economic scenario, competition, technological innovations, reducing user base, etc./
USE INFORMATION GIVEN ONLY IN THE SOURCE DOCUMENTS AND NOT PRIOR KNOWLEDGE. YOU HAVE TO CITE THE RELEVANT SECTIONS AND THEIR PAGE NUMBERS AT THE END OF THE RESPONSE./
Cross-check and refine your response, if needed. Structure your output as a paragraph under 500 words. FOLLOW ALL THE INSTRUCTIONS GIVEN ABOVE VERY CAREFULLY.'''

In [73]:
# retrieval and reranking of nodes
retrieved_nodes = hybrid_retriever.retrieve(prompt)

reranked_nodes = reranker.postprocess_nodes(
    retrieved_nodes,
    query_bundle=QueryBundle(prompt)
)

# reranked source nodes
for node in reranked_nodes:
    display_source_node(node)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

**Node ID:** aead5a62-b58d-4393-8d6f-8850e7b57d42<br>**Similarity:** 0.8936010003089905<br>**Text:** It is not possible for our management to
predict all risks, nor can we assess the impact of all f...<br>

**Node ID:** 8b98fbed-e1a3-463e-b60f-253fa0d64b84<br>**Similarity:** 0.7181341052055359<br>**Text:** Meta Platforms, Inc.
Form 10-K
TABLE OF CONTENTS
Page
Note About Forward-Looking Statements
3
Lim...<br>

**Node ID:** 0d1d1c9a-14ab-4cc0-85b9-5897bff6e2ab<br>**Similarity:** 0.5528134703636169<br>**Text:** Table of Contents
Item 1A. Risk Factors
Certain factors may have a material adverse effect on our...<br>

In [74]:
# filtering out irrelevant nodes
filter = SimilarityPostprocessor(
    similarity_cutoff=0.1
)
filtered_nodes = filter.postprocess_nodes(
    reranked_nodes,
    query_bundle=QueryBundle(prompt)
)

# filtered source nodes
for node in filtered_nodes:
    display_source_node(node)

**Node ID:** aead5a62-b58d-4393-8d6f-8850e7b57d42<br>**Similarity:** 0.8936010003089905<br>**Text:** It is not possible for our management to
predict all risks, nor can we assess the impact of all f...<br>

**Node ID:** 8b98fbed-e1a3-463e-b60f-253fa0d64b84<br>**Similarity:** 0.7181341052055359<br>**Text:** Meta Platforms, Inc.
Form 10-K
TABLE OF CONTENTS
Page
Note About Forward-Looking Statements
3
Lim...<br>

**Node ID:** 0d1d1c9a-14ab-4cc0-85b9-5897bff6e2ab<br>**Similarity:** 0.5528134703636169<br>**Text:** Table of Contents
Item 1A. Risk Factors
Certain factors may have a material adverse effect on our...<br>

In [75]:
# query engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    node_postprocessors=[reranker, filter],
    llm=llm
)

# response generation
response = query_engine.query(prompt)

display_response(
    response=response,
    show_source=True,
    show_source_metadata=True
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Llama.generate: prefix-match hit

llama_print_timings:        load time =   60105.15 ms
llama_print_timings:      sample time =     211.49 ms /   796 runs   (    0.27 ms per token,  3763.79 tokens per second)
llama_print_timings: prompt eval time =   96445.06 ms /   893 tokens (  108.00 ms per token,     9.26 tokens per second)
llama_print_timings:        eval time =  282669.50 ms /   795 runs   (  355.56 ms per token,     2.81 tokens per second)
llama_print_timings:       total time =  382158.06 ms /  1688 tokens


**`Final Response:`** Based on the provided source documents, there are several risk factors that Meta faces. Firstly, there is a risk related to product offerings, where Meta's products may not continue to be popular or may not be able to compete effectively with other products in the market. According to page 3 of the source document, "We face intense competition in each of our product areas, and if we do not continue to innovate and improve our products, we may lose users to our competitors." This highlights the risk that Meta's products may not be able to maintain their market share, which could have a material adverse effect on the company's business, financial condition, and results of operations.
Another risk factor is related to market conditions, where changes in market conditions may affect Meta's business. According to page 15 of the source document, "We face risks related to changes in global economic conditions, including a slowdown in global economic growth, which could affect our revenue and profitability." This highlights the risk that changes in market conditions may lead to a decline in demand for Meta's products, which could have a material adverse effect on the company's business.
Geopolitical conditions are also a risk factor for Meta. According to page 16 of the source document, "We face risks related to changes in laws, regulations, and government policies, including those related to data privacy, data localization, and censorship." This highlights the risk that changes in geopolitical conditions may lead to increased regulatory scrutiny and compliance costs for Meta, which could have a material adverse effect on the company's business.
In addition, competition is a significant risk factor for Meta. According to page 17 of the source document, "We face intense competition in each of our product areas, and if we do not continue to innovate and improve our products, we may lose users to our competitors." This highlights the risk that Meta may face increased competition from other companies in the technology industry, which could lead to a decline in its market share and have a material adverse effect on its business.
Furthermore, technological innovations may also pose a risk to Meta. According to page 18 of the source document, "We face risks related to technological changes, including advancements in artificial intelligence, machine learning, and other technologies." This highlights the risk that technological innovations may make Meta's products less competitive or obsolete, which could have a material adverse effect on its business.
Moreover, reducing user base is also a risk factor for Meta. According to page 19 of the source document, "We face risks related to our user base, including decreases in user engagement or retention." This highlights the risk that Meta may face a decline in its user base, which could have a material adverse effect on its business.
Finally, global economic scenario is also a risk factor for Meta. According to page 20 of the source document, "We face risks related to global economic conditions, including a slowdown in global economic growth." This highlights the risk that a decline in global economic growth may lead to a decline in demand for Meta's products, which could have a material adverse effect on its business.
In conclusion, Meta faces several risk factors that could have a material adverse effect on its business, financial condition, and results of operations. These risk factors include product offerings, market conditions, geopolitical conditions, competition, technological innovations, reducing user base, and global economic scenario. It is important for investors to carefully consider these risk factors when making investment decisions related to Meta.

---

**`Source Node 1/3`**

**Node ID:** aead5a62-b58d-4393-8d6f-8850e7b57d42<br>**Similarity:** 0.8936010003089905<br>**Text:** It is not possible for our management to
predict all risks, nor can we assess the impact of all f...<br>**Metadata:** {'total_pages': 208, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\meta-10k-2023.pdf', 'source': '5'}<br>

---

**`Source Node 2/3`**

**Node ID:** 8b98fbed-e1a3-463e-b60f-253fa0d64b84<br>**Similarity:** 0.7181341052055359<br>**Text:** Meta Platforms, Inc.
Form 10-K
TABLE OF CONTENTS
Page
Note About Forward-Looking Statements
3
Lim...<br>**Metadata:** {'total_pages': 208, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\meta-10k-2023.pdf', 'source': '3'}<br>

---

**`Source Node 3/3`**

**Node ID:** 0d1d1c9a-14ab-4cc0-85b9-5897bff6e2ab<br>**Similarity:** 0.5528134703636169<br>**Text:** Table of Contents
Item 1A. Risk Factors
Certain factors may have a material adverse effect on our...<br>**Metadata:** {'total_pages': 208, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\meta-10k-2023.pdf', 'source': '25'}<br>

### Functions for retrieval, reranking and response generation

In [142]:
# function for retrieval and reranking of source nodes, based on the query
def retrieval_rerank(prompt):
    '''
    Returns the retrieved and subsequently reranked source nodes.
    Reranking is done using the similarity scores.
    
    Inputs : prompt (str) = query to the engine
    Outputs: nodes (Node) = reranked nodes
    '''
    
    # retrieval
    retrieved_nodes = hybrid_retriever.retrieve(prompt)
    # reranking
    reranked_nodes = reranker.postprocess_nodes(
    retrieved_nodes,
    query_bundle=QueryBundle(prompt))

    for node in reranked_nodes:
        display_source_node(node)

    return reranked_nodes

In [143]:
# function to filter the reranked source nodes based on a similarity threshold
def filter_nodes(reranked_nodes, threshold=0.5):
    '''
    Filters the reranked source nodes, based on a similarity threshold.
    
    Inputs : prompt (str) = query to the engine
    Outputs: nodes (Node) = reranked nodes
    '''
    # filtering out irrelevant nodes
    filter = SimilarityPostprocessor(
        similarity_cutoff=threshold)
    filtered_nodes = filter.postprocess_nodes(
        reranked_nodes,
        query_bundle=QueryBundle(prompt))

    for node in filtered_nodes:
        display_source_node(node)

    return filtered_nodes

In [144]:
# function to print the source nodes in Markdown format
def display_nodes(nodes):
    '''
    Prints the source nodes.
    
    Inputs : nodes (List(Node)) = source nodes
    Outputs: reranked nodes printed in Markdown format
    '''
    
    for node in nodes:
        display_source_node(node)

In [145]:
# response generation query engine
def build_engine(prompt, filtering=True, streaming=True, mode='refine'):
    '''
    Builds the query engine.
    
    Inputs : prompt (str) = query prompt,
             filter=True (bool) = were the reranked nodes filtered based on a similarity threshold?
             streaming=True (bool) = allow streaming of response?
             mode='refine' = response mode
                             options -> 'refine', 'compact', 'tree_summarize', 'accumulate'
    Outputs: response
    '''
    if streaming==True:
        stream=True
    else:
        stream=False

    from llama_index.core import PromptTemplate

    # customize the prompts
    new_qa_template = (
        "You are an experienced financial and investment research analyst, with profound analytical capabilities."
        "Your task is to write high-quality financial and investment research reports, which are to be published by an esteemed firm, with a large user base."
        "You perform in-depth research about a company, the industry and its capabilities, in light of the current financial condition."
        "You always generate your response based on the context information from the source documents, given below, delineated by triple backticks (```) ONLY.\n"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "```{context_str}```"
        "\n-----------------------------------------------------------------------------------------------------------------------\n"
        "Given this context information. please answer the question: {query_str}\n"
        "Cite the relevant context sources that you use to answer the question. You use the numerical facts and data from the context documents to elucidate your answer."
        "YOUR RESPONSE MUST NOT INCLUDE ANY DATA THAT IS NOT PRESENT IN THE SOURCE DOCUMENTS. Your response should be credible and trustworthy."
    )

    new_refine_template = (
        "You are an profoundly experienced document curator, excellent at reviewing and refining the research articles written by financial and investment research analysts."
        "You refine the existing articles to make them excellent enough to be published in a research report."
        "The original query is as follows: {query_str}"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "We have provided an existing article written by the analyst: {existing_answer}"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "You have the opportunity to refine the existing article to make it even better (only if needed) with some more context given below.\n"
        "{context_msg}"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "Given the new context, cross-check the original article for factual accuracy and relevance. Refine it so that it looks more professional and crisp."
        "The refined answer must be of high-quality, apt to be included in a research report. If the context isn't useful, return the original answer."
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "Refined Answer:"
    )

    new_summary_template = (
        "You are an experienced financial and investment research analyst, with profound analytical capabilities."
        "Your task is to summarize the financial documents given below as context, delineated by triple backticks (```), including every important detail that is relevant"
        "for including in a financial and investment research report. You ONLY USE THE INFORMATION GIVEN IN THE CONTEXT AND STRICTLY AVOID HALLUCINATING ANYTHING."
        "Your summary should be in-depth and must be relevant in light of the current financial condition.\n"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "```{context_str}```"
        "-----------------------------------------------------------------------------------------------------------------------\n"
        "Using the context and the summaries that you generate, AND NOT PRIOR KNOWLEDGE, answer the query."
        "Query: {query_str}"
        "Answer:" 
    )

    new_qa_prompt = PromptTemplate(new_qa_template)
    new_refine_prompt = PromptTemplate(new_refine_template)
    new_summary_prompt = PromptTemplate(new_summary_template)

    query_engine = RetrieverQueryEngine.from_args(
        retriever=hybrid_retriever,
        node_postprocessors=[reranker, filter] if filtering==True else [reranker],
        response_mode=mode,
        text_qa_template=new_qa_prompt,
        refine_template=new_refine_prompt,
        summary_template=new_summary_prompt,
        llm=llm,
        streaming=stream)

    print('Query Engine ready...')
    return query_engine

In [146]:
# response generation
def generate_response(prompt, query_engine, streaming=True):
    response = query_engine.query(prompt)

    if streaming==True:
        response.print_response_stream()
        response.get_formatted_sources()
    else:
        display_response(
            response=response,
            show_source=True,
            show_source_metadata=True)
        
    return response

#### prompts

In [147]:
prompt = 'Summarize the risk factors faced by Meta, as outlined in the source documents.'

In [148]:
reranked_nodes = retrieval_rerank(prompt)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

**Node ID:** 097fe497-eafa-468c-a07f-b555ddb48dc9<br>**Similarity:** 0.5635961890220642<br>**Text:** Additional risks and uncertainties that we are unaware of, or that we currently believe are not m...<br>

**Node ID:** 8b98fbed-e1a3-463e-b60f-253fa0d64b84<br>**Similarity:** 0.1926383674144745<br>**Text:** Meta Platforms, Inc.
Form 10-K
TABLE OF CONTENTS
Page
Note About Forward-Looking Statements
3
Lim...<br>

**Node ID:** aead5a62-b58d-4393-8d6f-8850e7b57d42<br>**Similarity:** 0.0849183052778244<br>**Text:** It is not possible for our management to
predict all risks, nor can we assess the impact of all f...<br>

In [149]:
filtered_nodes = filter_nodes(
    reranked_nodes,
    threshold=0.2
)

**Node ID:** 097fe497-eafa-468c-a07f-b555ddb48dc9<br>**Similarity:** 0.5635961890220642<br>**Text:** Additional risks and uncertainties that we are unaware of, or that we currently believe are not m...<br>

In [152]:
engine = build_engine(prompt, filtering=True, streaming=True, mode='refine')

Query Engine ready...


In [153]:
response = generate_response(
    prompt,
    query_engine=engine,
    streaming=True
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Llama.generate: prefix-match hit

llama_print_timings:        load time =   60105.15 ms
llama_print_timings:      sample time =       0.22 ms /     1 runs   (    0.22 ms per token,  4524.89 tokens per second)
llama_print_timings: prompt eval time =   96318.07 ms /   857 tokens (  112.39 ms per token,     8.90 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =   96323.26 ms /   858 tokens
Llama.generate: prefix-match hit



Based on Meta's 2023 Form 10-K filed with the SEC, there are several risk factors that the company faces. Here are some of the key risk factors identified by the company:
1. Competition: Meta faces intense competition from other social media platforms, messaging apps, and online advertising platforms. The company must continue to innovate and differentiate itself from its competitors to maintain its market share. According to page 3 of Meta's 10-K, "We face intense competition in each of our core businesses, including from other social media platforms, messaging apps, and online advertising platforms."
2. Regulatory Risks: Meta is subject to various domestic and foreign laws and regulations that can impact its business. The company must comply with these regulations while also advocating for policies that support its business model. According to page 15 of Meta's 10-K, "We face various domestic and foreign laws and regulations that may impact our business, including those related to d


llama_print_timings:        load time =   60105.15 ms
llama_print_timings:      sample time =     201.95 ms /   826 runs   (    0.24 ms per token,  4090.04 tokens per second)
llama_print_timings: prompt eval time =   41280.81 ms /   348 tokens (  118.62 ms per token,     8.43 tokens per second)
llama_print_timings:        eval time =  284411.75 ms /   825 runs   (  344.74 ms per token,     2.90 tokens per second)
llama_print_timings:       total time =  329211.27 ms /  1173 tokens


In [102]:
engine = pdf_index.as_query_engine(response_mode='tree_summarize')


In [None]:
response = engine.query('What was the revenue of Meta in Q3:2023 and full year 2023?')

In [58]:
display_response(
    response=response,
    show_source=True,
    show_source_metadata=True
)

**`Final Response:`** According to the provided statement of income for Meta Platforms, Inc. for Q3:2023 and full year 2023, the revenue of Meta was $40,111 million for Q3:2023 and $134,902 million for full year 2023.

---

**`Source Node 1/2`**

**Node ID:** b60f5250-00f8-4330-b947-c8bf6696a886<br>**Similarity:** 0.7965127134729699<br>**Text:** Meta Reports Fourth Quarter and Full Year 2023 Results; Initiates Quarterly Dividend
MENLO PARK, ...<br>**Metadata:** {'total_pages': 11, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\q3-2023.pdf', 'source': '1'}<br>

---

**`Source Node 2/2`**

**Node ID:** 72f78825-ab45-4d30-8106-07228b6ff809<br>**Similarity:** 0.7556246571345275<br>**Text:** META PLATFORMS, INC.
CONDENSED CONSOLIDATED STATEMENTS OF INCOME
(In millions, except per share a...<br>**Metadata:** {'total_pages': 11, 'file_path': 'D:\\0-VARAD-DESHMUKH\\Files\\data\\q3-2023.pdf', 'source': '6'}<br>

In [100]:
from llama_index.core import PromptTemplate
from IPython.display import Markdown, display

query_engine = pdf_index.as_query_engine()

# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))

In [103]:
prompts_dict = engine.get_prompts()
display_prompt_dict(prompts_dict)

**Prompt Key**: response_synthesizer:summary_template<br>**Text:** <br>

Context information from multiple sources is below.
---------------------
{context_str}
---------------------
Given the information from multiple sources and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


<br><br>

In [108]:
from llama_index.core import PromptTemplate

# customize the prompts
new_qa_template = (
    "You are an experienced financial and investment research analyst, with profound analytical capabilities."
    "Your task is to write high-quality financial and investment research reports, which are to be published by an esteemed firm, with a large user base."
    "You perform in-depth research about a company, the industry and its capabilities, in light of the current financial condition."
    "You always generate your response based on the context information from the source documents, given below, delineated by triple backticks (```) ONLY.\n"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "```{context_str}```"
    "\n-----------------------------------------------------------------------------------------------------------------------\n"
    "Given this context information. please answer the question: {query_str}\n"
    "Cite the relevant context sources that you use to answer the question. You use the numerical facts and data from the context documents to elucidate your answer."
    "YOUR RESPONSE MUST NOT INCLUDE ANY DATA THAT IS NOT PRESENT IN THE SOURCE DOCUMENTS. Your response should be credible and trustworthy."
)

new_refine_template = (
    "You are an profoundly experienced document curator, excellent at reviewing and refining the research articles written by financial and investment research analysts."
    "You refine the existing articles to make them excellent enough to be published in a research report."
    "The original query is as follows: {query_str}"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "We have provided an existing article written by the analyst: {existing_answer}"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "You have the opportunity to refine the existing article to make it even better (only if needed) with some more context given below.\n"
    "{context_msg}"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "Given the new context, cross-check the original article for factual accuracy and relevance. Refine it so that it looks more professional and crisp."
    "The refined answer must be of high-quality, apt to be included in a research report. If the context isn't useful, return the original answer."
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "Refined Answer:"
)

new_summary_template = (
    "You are an experienced financial and investment research analyst, with profound analytical capabilities."
    "Your task is to summarize the financial documents given below as context, delineated by triple backticks (```), including every important detail that is relevant"
    "for including in a financial and investment research report. You ONLY USE THE INFORMATION GIVEN IN THE CONTEXT AND STRICTLY AVOID HALLUCINATING ANYTHING."
    "Your summary should be in-depth and must be relevant in light of the current financial condition.\n"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "```{context_str}```"
    "-----------------------------------------------------------------------------------------------------------------------\n"
    "Using the context and the summaries that you generate, AND NOT PRIOR KNOWLEDGE, answer the query."
    "Query: {query_str}"
    "Answer:" 
)

new_qa_prompt = PromptTemplate(new_qa_template)
new_refine_prompt = PromptTemplate(new_refine_template)
new_summary_prompt = PromptTemplate(new_summary_template)