<center><h1><b>RAG-Fin-GPT : An AI Tool for Financial Research and Analytics</b></h1></center>

This is an AI solution for performing in-depth financial research and analysis. This system is based on Retrieval-Augmented Generation (RAG), utilizing a locally run Llama2-7b-chat LLM, develoepd by Meta. This system uses completely open-source components and takes care of the data security considerations as well, by hosting everything on a local system.

<center><b>------------    HuggingFace CLI Login and Module Imports    ------------</b></center>

In [1]:
import os
import logging
import sys
import torch
import nest_asyncio 
nest_asyncio.apply()

from huggingface_hub import login
from llama_index.llms.llama_cpp import LlamaCPP
#from llama_index.core.llms.utils import messages_to_prompt, completion_to_prompt

from transformers import AutoTokenizer
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from llama_index.core import Settings

from llama_index.core import ServiceContext, SimpleDirectoryReader, VectorStoreIndex, set_global_service_context, set_global_tokenizer, StorageContext, load_index_from_storage

from llama_index.core import download_loader
from llama_index.readers.web import NewsArticleReader

from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.ingestion import IngestionPipeline

from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core.retrievers import BaseRetriever, VectorIndexRetriever
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.postprocessor import SentenceTransformerRerank
from llama_index.core.postprocessor import SimilarityPostprocessor

from llama_index.core import QueryBundle
from llama_index.core.query_engine import RetrieverQueryEngine

from llama_index.core.response.notebook_utils import display_response, display_source_node, display_query_and_multimodal_response


In [2]:
hf_token = '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'
login(token=hf_token)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\rck05\.cache\huggingface\token
Login successful


<center><b>------------    Logging    ------------</b></center>

In [3]:
logging.basicConfig(
    stream = sys.stdout,
    level = logging.INFO
)
logging.getLogger().addHandler(
    logging.StreamHandler(
        stream = sys.stdout
    )
)

<center><b>------------    Large Language Models (LLMs)    ------------</b></center>

We are using locally running open-source LLMs for our system. The details are as follows.

* Foundational Model : **Llama2-7b-chat**
* Tokenizer model : **Llama2-7b-chat _(tokenizer)_**
* Embedding model : **WhereIsAI/UAE-Large-V1**

In [4]:
model_name = 'Llama2-7b'
model_path = r"D:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf"
max_new_tokens = 2048
context_window = 4096

system_prompt = '''
You are an experienced investment and financial research analyst, who always generates responses based only on the source documents given./
You cite the relevant source documents properly at the end of the response or in the format 'According to <source>,'. You include the numerical figures/
from the source documents to elucidate your response, but NEVER HALLUCINATE ANY INFORMATION. If any details are missing from the source documents,/
you explicitly state so, rather than making up the missing information. Your responses are well-cited and credible, apt to be included in research reports.'''

In [5]:
# the model
llm = LlamaCPP(
    model_path = model_path,
    temperature = 0,
    max_new_tokens = max_new_tokens,
    context_window = context_window,
    generate_kwargs = {},
    model_kwargs = {
        'load_in_8bit' : True,
        #'n_gpu_layers' : -1
    },
    system_prompt = system_prompt,
    #messages_to_prompt=messages_to_prompt,
    #completion_to_prompt=completion_to_prompt,
    verbose = True
)

print('Text-generation model "Llama2-7b" loaded.')

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from D:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_

Text-generation model "Llama2-7b" loaded.


AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 
Model metadata: {'general.name': 'LLaMA v2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '11008', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '18', 'llama.attention.head_count_kv': '32', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0'}


In [6]:
tokenizer_model = r'meta-llama/Llama-2-7b-chat-hf'
hf_token = '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'
set_global_tokenizer(
    AutoTokenizer.from_pretrained(
        pretrained_model_name_or_path=tokenizer_model,
        token=hf_token
    ).encode
)

In [7]:
embed_model_path = r"C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b"
embed_model_name = 'WhereIsAI/UAE-Large-V1'

if not os.path.exists(embed_model_path):
    embed_model = HuggingFaceEmbedding(
        embed_model_name
    )
    print('Embedding model not found in cache. Downloading and creating one.!')
else:
    embed_model = HuggingFaceEmbedding(
        embed_model_path
    ) 
    print('Embedding model found in cache.')

print('Model name: ', embed_model_name, '\nModel Directory: ', embed_model_path)

Embedding model found in cache.
Model name:  WhereIsAI/UAE-Large-V1 
Model Directory:  C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b


<center><b>------------    Global Settings    ------------</b></center>

In [8]:
Settings.llm = llm
Settings.embed_model = embed_model
#Settings.chunk_size = 512
Settings.context_window = context_window
Settings.num_output = max_new_tokens

print('Settings done.')

Settings done.


<center><b>------------    Data Loading    ------------</b></center>

We load the source documents into a local directory. The source documents could be:
1. Local PDFs
2. News Articles
3. Websites
4. Static HTMLs - SEC filings, etc.

In [10]:
# Local PDFs

document_directory = r"D:\0-VARAD-DESHMUKH\Files\data"

pdfs = SimpleDirectoryReader(
    document_directory,
    filename_as_id=True
).load_data()

In [11]:
# News Articles

news_articles = [
    r'https://www.indiatvnews.com/technology/news/meta-collaborates-with-ncmec-to-extend-take-it-down-program-for-teenagers-2024-02-07-915677',
    r'https://www.msn.com/en-in/money/news/meta-to-label-ai-generated-images-across-social-media-platforms-details-here/ar-BB1hTNrL',
    r'https://www.msn.com/en-in/money/other/meta-announces-plans-to-combat-deepfakes-and-ai-generated-content-on-facebook-instagram-threads-ahead-of-key-elections/ar-BB1hTfPt',
    r'https://timesofindia.indiatimes.com/gadgets-news/20-years-of-facebook-meta-added-more-than-one-tcs-in-a-day-to-its-value/articleshow/107460150.cms',
    r'https://www.nytimes.com/2024/02/01/technology/meta-profit-report.html',
    r'https://www.msn.com/en-in/money/markets/meta-platforms-shatters-records-with-a-196-bn-surge-in-stock-market-value/ar-BB1hMN6e',
]

news_reader = NewsArticleReader(use_nlp=False)
news = news_reader.load_data(
    news_articles
)

# change 'publish_date' metadata to string for JSON serialization
for i in range(len(news)):
    news[i].metadata['publish_date'] = str(news[i].metadata['publish_date'])

In [None]:
# Websites

WholeSiteReader = download_loader('WholeSiteReader')

prefix = r'https://about.meta.com'
base_url = r'https://about.meta.com/company-info/'
max_depth = 1

scraper = WholeSiteReader(
    prefix=prefix,
    max_depth=max_depth
)

websites = scraper.load_data(
    base_url=base_url
)

In [11]:
# Static htmls : SEC filings, etc.

SimpleWebPageReader = download_loader('SimpleWebPageReader')

urls = [
    r'https://www.sec.gov/Archives/edgar/data/1326801/000132680124000012/meta-20231231.htm'
]
loader = SimpleWebPageReader()

htmls = loader.load_data(
    urls=urls
)

In [10]:
# concatenating all the sources into Document objects
documents = pdfs + news + websites + htmls

# TEMP

In [13]:
documents = pdfs + news

splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
)
    
embedding = HuggingFaceEmbedding(embed_model_name)
    
pipeline = IngestionPipeline(
    transformations=[splitter, embedding]
)
    
nodes = pipeline.run(
        documents=documents,
        in_place=False,
        show_progress=True
)
   
    # load the documents and create the index
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
temp_index = VectorStoreIndex(
    nodes,
    storage_context=storage_context
)

Parsing nodes:   0%|          | 0/17 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/24 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/22 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/4 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/27 [00:00<?, ?it/s]

<center><b>------------    Data Ingestion and Indexing Pipeline    ------------</b></center>

In [20]:
# check if storage already exists
PERSIST_DIR = "./storage"

if not os.path.exists(PERSIST_DIR):
    splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
)
    
    embedding = HuggingFaceEmbedding(embed_model_name)
    
    pipeline = IngestionPipeline(
    transformations=[splitter, embedding]
)
    
    nodes = pipeline.run(
        documents=documents,
        in_place=False,
        show_progress=True
)
   
    # load the documents and create the index
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)
    index = VectorStoreIndex(
        nodes,
        storage_context=storage_context
    )
    
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
    print('Documents embedded and loaded into memory.')

else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
    nodes = list(index.docstore.docs.values())
    print('Embeddings found in cache. Loaded directly.')

INFO:llama_index.core.indices.loading:Loading all indices.
Loading all indices.
Embeddings found in cache. Loaded directly.


In [14]:
# hybrid retriever + reranking
vector_retriever = temp_index.as_retriever(
    similarity_top_k=5
)

bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5
)

class Hybridretriever(BaseRetriever):
    def __init__(self, vector_retriever, bm25_retriever):
        self.vector_retriever = vector_retriever
        self.bm25_retriever = bm25_retriever
        super().__init__()

    def _retrieve(self, query, **kwargs):
        bm25_nodes = self.bm25_retriever.retrieve(query, **kwargs)
        vector_nodes = self.vector_retriever.retrieve(query, **kwargs)

        # combine the two lists of nodes
        all_nodes = []
        node_ids = set()
        for n in bm25_nodes + vector_nodes:
            if n.node_id not in node_ids:
                all_nodes.append(n)
                node_ids.add(n.node_id)
        
        return all_nodes

In [15]:
# retrievers
temp_index.as_retriever(similarity_top_k=3)
hybrid_retriever = Hybridretriever(vector_retriever, bm25_retriever)

In [16]:
# node postprocessing

# 1. re-ranking
reranker = SentenceTransformerRerank(
    top_n=3,
    model='BAAI/bge-reranker-base'
)

# 2. filtering out irrelevant nodes
filter = SimilarityPostprocessor(
    similarity_cutoff=0.75
)

In [17]:
prompt = '''
Discuss how Meta plans to tackle deepfakes and AI-generated content ahead of the upcoming elections. Give a detailed overview of the initiatives Meta is taking towards,/
adopting responsible business practices, according to the source documents. You have to prove that your response is correct by citing the relevant sections from the source documents./
Cross-check your response for factual accuracy and correct it, if needed. Your response must not contain any information that is not present in the source document./
Structure your output as a paragraph under 500 words. FOLLOW ALL THE INSTRUCTIONS CAREFULLY.'''

retrieved_nodes = hybrid_retriever.retrieve(prompt)

reranked_nodes = reranker.postprocess_nodes(
    retrieved_nodes,
    query_bundle=QueryBundle(prompt)
)

filtered_nodes = filter.postprocess_nodes(
    reranked_nodes
)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [25]:
# source nodes
for node in filtered_nodes:
    print(node)

Node ID: 11d0ea4e-a3b6-4d71-9715-6afc59c918c1
Text:
Score:  0.938



In [26]:
# query engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    node_postprocessors=[reranker, filter],
    llm=llm
)

In [27]:
# response generation
response = query_engine.query(prompt)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Llama.generate: prefix-match hit

llama_print_timings:        load time =   52692.67 ms
llama_print_timings:      sample time =      98.45 ms /   415 runs   (    0.24 ms per token,  4215.42 tokens per second)
llama_print_timings: prompt eval time =   16150.54 ms /   163 tokens (   99.08 ms per token,    10.09 tokens per second)
llama_print_timings:        eval time =  107560.65 ms /   414 runs   (  259.81 ms per token,     3.85 tokens per second)
llama_print_timings:       total time =  125001.72 ms /   577 tokens


In [None]:
display_response(
    response=response,
    show_source=True,
    show_source_metadata=True
)

**`Final Response:`** Meta has announced plans to combat deepfakes and AI-generated content ahead of the upcoming elections. According to the source document, Meta is taking several initiatives towards adopting responsible business practices. Firstly, Meta is implementing a new policy that prohibits any content that is manipulated or fabricated using AI technology. This includes deepfakes, which are manipulated videos or images that appear real but have been altered using AI. Secondly, Meta is using AI-powered tools to detect and flag any content that violates its policy. These tools will be able to identify deepfakes and other manipulated content more accurately than human moderators. Thirdly, Meta is partnering with fact-checking organizations to help identify and flag false information. This includes content that is not necessarily manipulated using AI but is still false or misleading. Finally, Meta is increasing its investment in AI-powered content moderation tools. This will help the company to more effectively identify and remove manipulated content from its platforms. According to the source document, Meta is committed to ensuring that its platforms are a safe and secure environment for users ahead of the upcoming elections. By implementing these initiatives, Meta aims to prevent the spread of misinformation and manipulated content that could potentially influence the outcome of the elections. 
Citations:
1. Meta. (n.d.). Meta Announces Plans to Combat Deepfakes and AI-Generated Content Ahead of Key Elections. Retrieved from <https://www.msn.com/en-in/money/other/meta-announces-plans-to-combat-deepfakes-and-ai-generated-content-on-facebook-instagram-threads-ahead-of-key-elections/ar-BB1hTfPt>
Note: All the citations provided are accurate and have been formatted according to the required format.

---

**`Source Node 1/1`**

**Node ID:** 11d0ea4e-a3b6-4d71-9715-6afc59c918c1<br>**Similarity:** 0.9383050799369812<br>**Text:** <br>**Metadata:** {'title': 'MSN', 'link': 'https://www.msn.com/en-in/money/other/meta-announces-plans-to-combat-deepfakes-and-ai-generated-content-on-facebook-instagram-threads-ahead-of-key-elections/ar-BB1hTfPt', 'authors': [], 'language': 'en', 'description': '', 'publish_date': 'None'}<br>

In [37]:
# query engine - streaming
query_engine = RetrieverQueryEngine.from_args(
    retriever=hybrid_retriever,
    node_postprocessors=[reranker, filter],
    streaming=True
)

In [None]:
# response generation
response = query_engine.query(prompt)
response.print_response_stream()

In [None]:
########################################
############ TREE-SUMMARIZE ############
########################################


from llama_index.core.response_synthesizers import TreeSummarize
summarizer = TreeSummarize(verbose=True)

In [None]:
text = '''
Major Milestones
The "Nifty 50" refers to the National Stock Exchange of India's benchmark index, Nifty 50. It comprises 50 of the large and liquid stocks traded on the NSE. The index has hit several major milestones through the years. Here are some of the significant milestones for the Nifty 50

Introduction of Nifty 50 : The Nifty 50 index was introduced by the National Stock Exchange of India (NSE) on April 22, 1996. It provided a benchmark for investors to track the performance of the Indian stock market.

Nifty at 1000: The Nifty 50 index started with a base value of 1000.

Nifty at 2,000: The Nifty scaled the first 1000 points in 8 years. It touched 2000 on December 2, 2004.

Nifty at 3,000: However it took the Nifty a little over a year to scale the next 1000 points from 2000 to 3000. It hit the 3000 mark on January 30, 2006.

Global Financial Crisis: Just like all other key global indices, the index faced a major setback during the global financial crisis in 2008. It fell significantly, mirroring the impact of the crisis on the Indian economy and financial markets.

Crossing 10,000 Points: The Nifty 50 index crossed the significant milestone of 10,000-point mark on July 25, 2017. This was on the back of several reforms undertaken to improve the country’s financial health coupled with RBI Policy push and favourable monsoon.

Covid-19 Pandemic: The Covid-19 pandemic in 2020 led to extreme volatility in global financial markets, including the Nifty 50. The index experienced a sharp decline in February-March 2020 but recovered significantly in the latter half of the year.

Technology and Pharma Sector Surge: In the wake of the pandemic, there was a notable surge in technology and pharmaceutical sectors. These sectors played a crucial role in the market recovery in 2020-2021.

Market Capitalization Milestones: The index has seen various companies become India's largest by market capitalization over different periods. Companies like Reliance Industries, Tata Consultancy Services (TCS), and HDFC Bank have taken turns as the largest companies by market cap.

Sectoral Changes: The composition of the Nifty 50 index is periodically reviewed and updated to reflect the changing dynamics of the Indian economy. Companies from different sectors have been added or removed from the index based on their market performance.

Nifty 50 major timelines

Nifty at 1,000: The index was launched with a base value of 1,000 in 1996.

Nifty at 2,000: It touched 2000 on December 2, 2004.

Nifty at 3,000: Nifty zoomed past the 3,000 mark on January 30, 2006.

Nifty at 4,000: Nifty crossed the 4000 mark on December 1, 2006.

Nifty at 5,000: Nifty sailed to 5,000 on September 27, 2007.

Nifty at 6000: Nifty ended 2007 with another 1000-point gain.It hit 6000 onDecember 11, 2007.

Nifty at 7,000: The journey from 6000 to 7000 took 7 years. Nifty touched 7,000 on May 12, 2014.

Nifty at 8,000: The 2014 rally continued, Nifty hit 8,000 on September 1, 2014.

Nifty at 9,000: In 2017, Nifty touched the 9,000-mark on March 14, 2017.

Nifty at 10,000: The Nifty hit the psychologically important 10,000 mark on July 25, 2017.

Nifty at 11,000: On January 23, 2018, Nifty hots 11,000 first time ever.

Nifty at 12,000: On June 3, 2019, Nifty closed above 12000 for the first time.

Pandemic Low: Nifty slumped to 7,511 on March 24, 2020.

Nifty at 13,000: Nifty hit 13,000 after a steady recovery from the Pandemic lows and hot 13,000 for first time on November 24, 2020.

Nifty at 14,000: On the last trading day of 2020- December 31, Nifty hit 14,000.

Nifty at 15,000: On February 6,2021, Nifty touches 15,000 for the first time.

Nifty at 16,000: Nifty scales past 16,000 on August 3, 2021.

Nifty at 17,000: Nifty however crossed the next 1000 points in matter of 28 days and hit 17,000 on August 31, 2021. Nifty at 18,0000:Nifty 50 covered the distance to 18,000 in 40 days on October 11, 2021.

Nifty at 19,000: Nifty hit 19,000 for the first time on June 28, 2023, almost 21 months after scaling the 18,000 mark.

'''

response = summarizer.get_response('What is Nifty and how did it evolve over time?', text)
print(response)

In [None]:
# evaluation on a labelled dataset

%pip install llama-index-packs-rag-evaluator

from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack

# download a LabelledRagDataset from llama-hub
rag_dataset, eval_documents = download_llama_dataset(
    "PaulGrahamEssayDataset", "./paul_graham"
)

# build a basic RAG pipeline off of the source documents
eval_index = VectorStoreIndex.from_documents(documents=eval_documents)
eval_query_engine = index.as_query_engine()

# Time to benchmark/evaluate this RAG pipeline
# Download and install dependencies
RagEvaluatorPack = download_llama_pack(
    "RagEvaluatorPack", "./rag_evaluator_pack"
)

# construction requires a query_engine, a rag_dataset, and optionally a judge_llm
rag_evaluator_pack = RagEvaluatorPack(
    query_engine=eval_query_engine, rag_dataset=rag_dataset
)

# PERFORM EVALUATION
benchmark_df = rag_evaluator_pack.run()  # async arun() also supported
print(benchmark_df)