<center><h1><b>RCK-GPT : An AI Tool for Financial Research and Analytics</b></h1></center>

This is an AI solution for performing in-depth financial research and analysis. This system is based on Retrieval-Augmented Generation (RAG), utilizing a locally run Llama2-7b-chat LLM, develoepd by Meta. This system uses completely open-source components and takes care of the data security considerations as well, by hosting everything on a local system.

<center><b>------------    HuggingFace CLI Login and Module Imports    ------------</b></center>

In [1]:
!huggingface-cli login

In [2]:
import os
import logging
import sys
import torch
from transformers import AutoTokenizer
import nest_asyncio 
nest_asyncio.apply()

from llama_index.llms import LlamaCPP
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
    set_global_service_context,
    set_global_tokenizer
)

from llama_hub.web.news import NewsArticleReader
from llama_index import download_loader


from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.node_parser import SemanticSplitterNodeParser
from llama_index.ingestion import IngestionPipeline
from llama_index.query_engine import CitationQueryEngine
#from llama_index.prompts import PromptTemplate

<center><b>------------    Logging    ------------</b></center>

In [3]:
logging.basicConfig(
    stream = sys.stdout,
    level = logging.INFO
)
logging.getLogger().addHandler(
    logging.StreamHandler(
        stream = sys.stdout
    )
)

<center><b>------------    Large Language Models (LLMs)    ------------</b></center>

We are using locally running open-source LLMs for our system. The details are as follows.

* Foundational Model : **Llama2-7b-chat**
* Tokenizer model : **Llama2-7b-chat _(tokenizer)_**
* Embedding model : **WhereIsAI/UAE-Large-V1**

In [4]:
model_name = 'Llama2-7b-chat'
model_path = r"C:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf"
max_new_tokens = 2048
context_window = 4096

llm = LlamaCPP(
    model_path = model_path,
    temperature = 0,
    max_new_tokens = max_new_tokens,
    context_window = context_window,
    generate_kwargs = {},
    model_kwargs = {
        'load_in_8bit' : True
    }
)

tokenizer_model = r'meta-llama/Llama-2-7b-chat-hf'
token = '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'
set_global_tokenizer(
    AutoTokenizer.from_pretrained(
        tokenizer_model,
        token = token
    ).encode
)

print('Foundational model loaded.')
print('Details -\nModel name: ', model_name, '\nModel Directory: ', model_path)

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | 
Model metadata: {'general.name': 'LLaMA v2', 'general.architecture': 'llama', 'llama.context_length': '4096', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '11008', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '18', 'llama.attention.head_count_kv': '32', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0'}


Foundational model loaded.
Details -
Model name:  Llama2-7b-chat 
Model Directory:  C:\0-VARAD-DESHMUKH\models\llama-2-7b-chat.Q6_K.gguf


In [5]:
embed_model_path = r"C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b"
embed_model_name = 'WhereIsAI/UAE-Large-V1'

if not os.path.exists(embed_model_path):
    embed_model = HuggingFaceEmbedding(
        embed_model_name
    )
    print('Embedding model not found in cache. Downloading and creating one.!')
else:
    embed_model = HuggingFaceEmbedding(
        embed_model_path
    ) 
    print('Embedding model found in cache.')

print('Details -\nModel name: ', embed_model_name, '\nModel Directory: ', embed_model_path)

Embedding model found in cache.
Details -
Model name:  WhereIsAI/UAE-Large-V1 
Model Directory:  C:\Users\rck05\.cache\huggingface\hub\models--WhereIsAI--UAE-Large-V1\snapshots\82f6ace7a8954c012dd2ae05e2604fbc9007205b


<center><b>------------    Global Service Context    ------------</b></center>

In [6]:
service_context = ServiceContext.from_defaults(
    llm = llm,
    embed_model = embed_model
)

set_global_service_context(service_context)
print('Global context set.')
print('Foundational model: ', model_name)
print('Embedding model: ', embed_model_name)

Global context set.
Foundational model:  Llama2-7b-chat
Embedding model:  WhereIsAI/UAE-Large-V1


<center><b>------------    Data Loading    ------------</b></center>

We load the source documents into a local directory. The source documents could be:
1. Local PDFs
2. News Articles
3. Websites
4. Static HTMLs - SEC filings, etc.

In [7]:
##### Local PDFs #####

#document_directory = r"C:\0-VARAD-DESHMUKH\Files\data"

#pdfs = SimpleDirectoryReader(
 #   document_directory,
  #  filename_as_id=True
#).load_data()

In [7]:
##### News Articles #####

news_articles = [
    r'https://www.indiatvnews.com/technology/news/meta-collaborates-with-ncmec-to-extend-take-it-down-program-for-teenagers-2024-02-07-915677',
    r'https://www.msn.com/en-in/money/news/meta-to-label-ai-generated-images-across-social-media-platforms-details-here/ar-BB1hTNrL',
    r'https://www.msn.com/en-in/money/other/meta-announces-plans-to-combat-deepfakes-and-ai-generated-content-on-facebook-instagram-threads-ahead-of-key-elections/ar-BB1hTfPt',
    r'https://timesofindia.indiatimes.com/gadgets-news/20-years-of-facebook-meta-added-more-than-one-tcs-in-a-day-to-its-value/articleshow/107460150.cms',
    r'https://www.nytimes.com/2024/02/01/technology/meta-profit-report.html',
    r'https://www.msn.com/en-in/money/markets/meta-platforms-shatters-records-with-a-196-bn-surge-in-stock-market-value/ar-BB1hMN6e'
]

reader = NewsArticleReader(use_nlp=False)

news = reader.load_data(
    news_articles
)

# change 'publish_date' metadata to string for JSON serialization
for i in range(len(news)):
    news[i].metadata['publish_date'] = str(news[i].metadata['publish_date'])

In [22]:
##### Websites #####

WholeSiteReader = download_loader('WholeSiteReader')

prefix = r'https://about.fb.com'
base_url = r'https://about.fb.com/news'
max_depth = 1

scraper = WholeSiteReader(
    prefix=prefix,
    max_depth=max_depth
)

websites = scraper.load_data(
    base_url=base_url
)

Visiting: https://about.fb.com/news, 0 left
Found 255 new potential links
Visiting: https://about.fb.com/news/, 12 left
Found 204 new potential links
Visiting: https://about.fb.com/news/2024/02/labeling-ai-generated-images-on-facebook-instagram-and-threads/, 11 left
Found 225 new potential links
Visiting: https://about.fb.com/news/2024/02/helping-teens-avoid-sextortion-scams/, 26 left
Found 217 new potential links
Visiting: https://about.fb.com/news/2024/01/our-work-to-help-provide-young-people-with-safe-positive-experiences/, 31 left
Found 207 new potential links
Visiting: https://about.fb.com/news/2024/01/investing-in-privacy/, 33 left
Found 212 new potential links
Visiting: https://about.fb.com/news/2023/12/metas-2023-progress-in-ai-and-mixed-reality/, 36 left
Found 220 new potential links
Visiting: https://about.fb.com/news/2024/01/introducing-stricter-message-settings-for-teens-on-instagram-and-facebook/, 44 left
Found 212 new potential links
Visiting: https://about.fb.com/news/20

In [23]:
##### Static htmls : SEC filings, etc. #####

SimpleWebPageReader = download_loader('SimpleWebPageReader')

urls = [
    r'https://www.sec.gov/Archives/edgar/data/1326801/000132680124000012/meta-20231231.htm'
]
loader = SimpleWebPageReader()
htmls = loader.load_data(
    urls=urls
)

In [9]:
documents = news #+ websites + htmls

<center><b>------------    Data Ingestion and Indexing Pipeline    ------------</b></center>

In [10]:
splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
)

embedding = HuggingFaceEmbedding(embed_model_name)

pipeline = IngestionPipeline(
    transformations=[splitter, embedding]
)

In [11]:
nodes = pipeline.run(
    documents=documents,
    in_place=False,
    show_progress=True
)

Parsing nodes:   0%|          | 0/6 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/7 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/11 [00:00<?, ?it/s]

Generating embeddings: 0it [00:00, ?it/s]

Generating embeddings:   0%|          | 0/9 [00:00<?, ?it/s]

In [27]:
print('Total number of nodes derived from the source documents : ', len(nodes))

Total number of nodes derived from the source documents :  55


<center><b>------------    Storage of Vector Embeddings    ------------</b></center>

In [12]:
index = VectorStoreIndex(nodes)

<center><b>------------    Query Engine (with streaming)    ------------</b></center>

In [16]:
query_engine = index.as_query_engine(
    streaming=True
)

def generate(prompt):
    response = query_engine.query(prompt)
    response.print_response_stream()

<center><b>------------    Prompts and Responses    ------------</b></center>

In [14]:
prompt = '''
Explain in detail how Meta plans to combat deepfakes and AI-generated content ahead of the upcoming elections./
Also present an overview of how Meta plans to label AI-generated images on the internet. Use the information given in the source documents only./
You have to also cite the resources used to answer the above questions.
'''
generate(prompt)

 Based on the provided context information, Meta (formerly Facebook) has announced plans to combat deepfakes and AI-generated content ahead of upcoming elections. The company aims to prevent the spread of manipulated media, including deepfakes and AI-generated images, on its platforms such as Facebook and Instagram.
To combat deepfakes and AI-generated content, Meta has planned the following steps:
1. Detection and Removal: Meta will use artificial intelligence (AI) and machine learning (ML) to detect and remove manipulated media from its platforms. The company has developed a new detection tool that can identify deepfakes and AI-generated content with high accuracy.
2. Labeling: Meta plans to label media posts containing manipulated content, indicating that the content has been altered or fabricated. This will help users understand when they are viewing manipulated content and make informed decisions about its credibility.
3. Educating Users: Meta will educate users on how to identify

In [15]:
prompt = '''
Present a detailed overview of the current market condition of Meta. Use the numerical facts in the source documents to elucidate your point./
Also discuss about the current market capitalization value of Meta. Use the information given in the source documents only./
You have to also cite the resources used to answer the above questions.
'''
generate(prompt)

Llama.generate: prefix-match hit


 Based on the provided context, the current market condition of Meta can be summarized as follows:
1. Impressive fourth-quarter earnings: Meta reported a 25% year-over-year (YoY) revenue growth, with total revenue rising from $32.2 billion to $40.1 billion.
2. Stock surge: The company's stock price rose 20% on February 2, 2023, closing at an all-time high of $475, adding more than one TCS (Tata Consultancy Services) to its market value. This sent the company's market capitalization to a record $197 billion in a day - two years after a $251 billion wipeout.
3. Market capitalization value: As of February 2023, Meta's market capitalization value is $197 billion, with a current stock price of $475 per share. This value is more than twice the market capitalization of ICICI Bank ($86 billion), India's second-most valued company at that time.
According to the source documents:
Source 1:
* "Meta added 'more than one TCS' in a day to its value." (The Economic Times)
Source 2:
* "The company’s s

In [17]:
prompt = '''
You have to present a detailed analysis of the financial condition of Meta. Also discuss how the revenue of Meta is growing./
Use the numerical data from the source documents to make your point. Discuss how Meta sees its revenue growing in the coming years./
Your response should be technical in tone, apt to be included in a market research report. Limit to the information given in the source documents only./
You have to also cite the resources used to answer the above questions.
'''
generate(prompt)

Llama.generate: prefix-match hit



Meta's financial performance has shown significant improvement in its latest quarterly results, with revenue growing by 25% and profit more than tripling. This growth can be attributed to the company's cost-cutting measures and increased efficiency after layoffs last year. The company's ad business has been a major contributor to this growth, as Meta continues to invest heavily in data centers and other infrastructure.
Revenue Growth:
According to the source documents, Meta's quarterly revenue rose 25% to $21.7 billion, driven by increased ad sales and growth in its messaging services (Instagram and WhatsApp). This is a significant improvement from the previous year, when revenue grew by only 10%. The company's ad business has been a major contributor to this growth, as Meta continues to invest heavily in data centers and other infrastructure.
Profit Growth:
The company's net income also more than tripled to $8.5 billion, driven by the same factors that contributed to revenue growth. 

In [18]:
prompt = '''
What is the 'take-it-down' program? Explain in 100 words how Meta is planning to do this and what collaborations it is having?
Your response should be technical in tone, apt to be included in a market research report. Limit to the information given in the source documents only./
You have to also cite the resources used to answer the above questions.
'''
generate(prompt)

Llama.generate: prefix-match hit



The 'Take It Down' program is an initiative by Meta (formerly Facebook) and the National Center for Missing & Exploited Children (NCMEC) to help teenagers combat sextortion globally. The program offers support in multiple languages, including Hindi, Chinese, French, German, and others. Meta has expanded its reach by collaborating with Thorn, a nonprofit working to defend children from sexual abuse, to provide updated guidance for teens on handling sextortion. Additionally, the platform will offer advice for parents and teachers on supporting affected teens or students. (Source: India Tv News)

**Total running time of the script : 17.62 minutes (approx. 18 minutes)**