# Notebook: Chatting with NVIDIA Financial Reports

In this notebook, we build a simple RAG example for chatting with NVIDIA financial reports. We use **mixtral_8x7b** as the LLM and **ai-embed-qa-4** as the embedding model, both provided by [NVIDIA_AI_Endpoint](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints), and a vector store to hold the embeddings (Chroma in the code below).


NVIDIA financial reports are publicly available on the NVIDIA Newsroom site (nvidianews).

Below is an example: the financial report for the first quarter of fiscal year 2024.

https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2024

<img src="imgs/nvidianews.png" alt="drawing" width="800"/>

### Step 1 - Export the NVIDIA_API_KEY
Supply the NVIDIA_API_KEY when you run the cells below. The first cell (commented out) prompts for the key interactively; the second reads it from the environment.

In [None]:
# import getpass
# import os
# if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
#     print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
# else:
#     nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
#     assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
#     os.environ["NVIDIA_API_KEY"] = nvapi_key

In [1]:
import os
nvapi_key = os.getenv("NVIDIA_API_KEY")
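
Because the cell above reads the key with `os.getenv`, a missing key only shows up later as a failed API call. The following is a minimal sketch of an early check that reuses the `nvapi-` prefix test from the commented-out cell. The helper name `has_valid_nvidia_key` and its `env` parameter are hypothetical, introduced here only for illustration.

```python
import os


def has_valid_nvidia_key(env=None):
    """Return True when NVIDIA_API_KEY is present and starts with 'nvapi-'."""
    # `env` defaults to the real process environment; a plain dict can be
    # passed instead to try the check without touching os.environ.
    env = os.environ if env is None else env
    return env.get("NVIDIA_API_KEY", "").startswith("nvapi-")


if not has_valid_nvidia_key():
    print("NVIDIA_API_KEY is missing or does not start with 'nvapi-'")
```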

### Step 2 - Initialize the LLM and Embedding Model
Here we will use **mixtral_8x7b** as the LLM and **ai-embed-qa-4** as the embedding model.

In [4]:
# Test run: confirm that you can generate a response successfully
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct", nvidia_api_key=nvapi_key, max_tokens=1024)

# Separate embedders for documents ("passage") and for questions ("query")
embedder_document = NVIDIAEmbeddings(model="ai-embed-qa-4", model_type="passage")
embedder_query = NVIDIAEmbeddings(model="ai-embed-qa-4", model_type="query")



### Step 3 - Ingest HTML files

#### 3.1 Download HTML pages covering financial reports from fiscal year 2020 to 2024

In [6]:
import requests

urls_content = []

url_template1 = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-fiscal-{year}"
url_template2 = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-and-fiscal-{year}"

for quarter in ["first", "second", "third", "fourth"]:
    for year in range(2020,2025):
        args = {"quarter":quarter, "year": str(year)}
        if quarter == "fourth":
            urls_content.append(requests.get(url_template2.format(**args)).content)
        else:
            urls_content.append(requests.get(url_template1.format(**args)).content)

urls_content

[b'\n\n\n<!doctype html>\n<html lang="en-us" class="no-js">\n<head>\n  \n  \n  <meta charset="utf-8">\n  <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n  <meta name="viewport" content="width=device-width, initial-scale=1.0">\n  <meta name="generator" content="iPR Software">\n\n  \n  \n    \n      \n<title>NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom</title>\n<meta name="description" content="NVIDIA today reported revenue for the first quarter ended April 28, 2019, of $2.22 billion compared with $3.21 billion a year earlier and $2.21 ..." />\n<meta name="keywords" content="">\n\n<meta property="og:title" content="NVIDIA Announces Financial Results for First Quarter Fiscal 2020">\n\n<meta property="og:type" content="article">\n\n<meta property="og:site_name" content="NVIDIA Newsroom">\n<meta property="og:url" content="http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2020">\n<meta property="og:descr
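
The download cell builds twenty URLs from two templates, with the fourth-quarter pages using the "-and-fiscal-" variant. As a sketch, the URL construction can be separated from the network calls so the list can be inspected before anything is fetched. The function name `build_report_urls` and the constant names are assumptions made for this example; the templates and loop order are copied from the cell above.

```python
URL_TEMPLATE_QUARTER = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-fiscal-{year}"
URL_TEMPLATE_FOURTH = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-and-fiscal-{year}"


def build_report_urls(first_year=2020, last_year=2024):
    """Return report URLs in the same order as the download loop (quarter outer, year inner)."""
    urls = []
    for quarter in ["first", "second", "third", "fourth"]:
        for year in range(first_year, last_year + 1):
            # Fourth-quarter releases also cover the full fiscal year, hence the other template
            template = URL_TEMPLATE_FOURTH if quarter == "fourth" else URL_TEMPLATE_QUARTER
            urls.append(template.format(quarter=quarter, year=year))
    return urls


report_urls = build_report_urls()
print(len(report_urls), report_urls[0])
```

Each URL could then be fetched with `requests.get(url, timeout=30)` and checked with `response.raise_for_status()`, so that a missing page fails loudly instead of being stored as an error page.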

#### 3.2 Parse HTML files

In [8]:
# Extract the URL, title, text content, and tables from the HTML
from bs4 import BeautifulSoup
import markdownify

def extract_url_title_time(soup):
    url = ""
    title = ""
    tables = []
    try:
        if soup.find("title"):
            title = str(soup.find("title").string)

        og_url_meta = soup.find("meta", property="og:url")
        if og_url_meta:
            url = og_url_meta.get("content", "")

        # Convert each table to markdown, then remove it so it is not repeated in the page text
        for table in soup.find_all("table"):
            tables.append(markdownify.markdownify(str(table)))
            table.decompose()

        text_content = soup.get_text(separator=' ', strip=True)
        text_content = ' '.join(text_content.split())  # collapse runs of whitespace

        return url, title, text_content, tables
    except Exception as e:
        print(f"parse error: {e}")
        # Return the same four-element shape that the caller unpacks
        return "", "", "", []

parsed_htmls = []
for url_content in urls_content:
    soup = BeautifulSoup(url_content, 'html.parser')
    url, title, content, tables = extract_url_title_time(soup)
    parsed_htmls.append({"url":url, "title":title, "content":content, "tables":tables})

parsed_htmls


[{'url': 'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2020',
  'title': 'NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom',
  'content': 'NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom Artificial Intelligence Computing Leadership from NVIDIA PLATFORMS Autonomous Machines Cloud & Data Center Deep Learning & Ai Design & Pro Visualization Healthcare High Performance Computing Self-Driving Cars Gaming & Entertainment other links Developers Industries Shop Drivers Support About NVIDIA View All Products GPU TECHNOLOGY CONFERENCE NVIDIA Blog Community Careers TECHNOLOGIES Newsroom NVIDIA in Brief Exec Bios NVIDIA Blog Podcast Media Assets In the News Press Contacts Online Press Kit NVIDIA in Brief Exec Bios NVIDIA Blog Podcast Media Assets In the News Press Contacts Online Press Kit Press Release Share Tweet Twitter Share LinkedIn Share Facebook Email ic_arrow-back-to-top NV
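
The parsing step keeps tables apart from the running page text so that each can be handled differently later. The sketch below illustrates that separation on a toy HTML string using only the standard library's `html.parser`, not BeautifulSoup or markdownify as the notebook does. The class name `TableSplitter` and the sample HTML are made up for this example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableSplitter(HTMLParser):
    """Separate table cell text from the remaining page text (no nested tables)."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.text_parts = []  # text found outside any <table>
        self.tables = []      # one list of rows per table; each row is a list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.in_table = True
            self.tables.append([])
        elif tag == "tr" and self.in_table:
            self.tables[-1].append([])

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self.in_table:
            if self.tables[-1]:  # skip stray text that appears before the first <tr>
                self.tables[-1][-1].append(text)
        else:
            self.text_parts.append(text)


# Toy input with placeholder values, not taken from a real report
sample_html = (
    "<p>Revenue  summary</p>"
    "<table><tr><th>Quarter</th><th>Revenue</th></tr>"
    "<tr><td>Q1</td><td>100</td></tr></table>"
    "<p>End</p>"
)
splitter = TableSplitter()
splitter.feed(sample_html)
splitter.close()
page_text = " ".join(splitter.text_parts)
print(page_text)
print(splitter.tables)
```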

#### 3.3 Summarize tables

In [9]:
# summarize tables
def get_table_summary(table, title, llm):
    res = ""
    try:
        #table = markdownify.markdownify(table)
        prompt = f"""
                    [INST] You are a virtual assistant.  Your task is to understand the content of TABLE in the markdown format.
                    TABLE is from "{title}".  Summarize the information in TABLE into SUMMARY. SUMMARY MUST be concise. Return SUMMARY only and nothing else.
                    TABLE: ```{table}```
                    Summary:
                    [/INST]
                """
        result = llm.invoke(prompt)
        res = result.content
    except Exception as e:
        print(f"Error: {e} while getting table summary from LLM")
        if not os.getenv("NVIDIA_API_KEY"):
            print("NVIDIA_API_KEY not set")
    # On failure this is the empty string set at the top of the function
    return res


for parsed_item in parsed_htmls:
    title = parsed_item['title']
    for idx, table in enumerate(parsed_item['tables']):
        print(f"parsing tables in {title}...")
        table = get_table_summary(table, title, llm)
        parsed_item['tables'][idx] = table


parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Q
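
When a summary call fails, `get_table_summary` returns an empty string, and the loop above then overwrites the table with that empty string. One possible safeguard, shown here as a sketch, is to fall back to the original markdown table when no summary comes back. The helper `summarize_or_keep` is hypothetical and takes any callable that maps a table to a summary, so it can be tried without an API key.

```python
def summarize_or_keep(table, summarize):
    """Return summarize(table), or the original table if the call fails or returns nothing."""
    try:
        summary = summarize(table)
    except Exception as exc:
        print(f"Summary failed ({exc}); keeping the raw table")
        return table
    # An empty or whitespace-only summary also falls back to the raw table
    return summary.strip() or table


# Stub summarizers standing in for the LLM call
print(summarize_or_keep("| a | b |", lambda t: "two columns, a and b"))
print(summarize_or_keep("| a | b |", lambda t: ""))
```

In the notebook loop this could be called as `summarize_or_keep(table, lambda t: get_table_summary(t, title, llm))`.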

#### 3.4 Split the text and tables into chunks, embed each chunk, and store the embeddings in the vector store (Chroma)

In [10]:
from langchain.docstore.document import Document
from langchain.text_splitter import SentenceTransformersTokenTextSplitter
TEXT_SPLITTER_MODEL = "intfloat/e5-large-v2"
TEXT_SPLITTER_CHUNK_SIZE = 200
TEXT_SPLITTER_CHUNK_OVERLAP = 50

text_splitter = SentenceTransformersTokenTextSplitter(
    model_name=TEXT_SPLITTER_MODEL,
    tokens_per_chunk=TEXT_SPLITTER_CHUNK_SIZE,
    chunk_overlap=TEXT_SPLITTER_CHUNK_OVERLAP,
)

documents = []

for parsed_item in parsed_htmls:
    title = parsed_item['title']
    url =  parsed_item['url']
    text_content = parsed_item['content']
    documents.append(Document(page_content=text_content, metadata = {'title':title, 'url':url}))

    # Add each table summary as its own document
    for table in parsed_item['tables']:
        documents.append(Document(page_content=table, metadata = {'title':title, 'url':url}))

documents = text_splitter.split_documents(documents)
print(f"obtain {len(documents)} chunks")

obtain 753 chunks
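
The splitter above is configured with 200 tokens per chunk and an overlap of 50, so consecutive chunks share a 50-token window. The sketch below shows only that windowing idea on a plain list of placeholder tokens. It is not the library's implementation, which works on tokenizer output and may treat boundaries differently, and `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(tokens, chunk_size=200, overlap=50):
    """Slice `tokens` into windows of `chunk_size` that share `overlap` tokens with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


# 500 placeholder tokens, split with the same settings as the notebook
toy_tokens = [f"t{i}" for i in range(500)]
toy_chunks = split_with_overlap(toy_tokens)
print(len(toy_chunks), toy_chunks[1][0])
```

With a stride of 150, the windows start at tokens 0, 150, and 300, and the last 50 tokens of one chunk are the first 50 of the next.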


In [12]:
COLLECTION_NAME = "NVIDIA_Finance"
from langchain.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedder_document,
    collection_name=COLLECTION_NAME,
    )
vec_count = vectorstore._collection.count()
print(f"Vector count {vec_count}")
docs = vectorstore.similarity_search_with_score("what are 2024 Q3 revenues? ")
print(docs)

Vector count 753
[(Document(page_content='q3 fy20 ( non - gaap ) : revenue $ 3, 014 million, up 17 % q / q and down 5 % y / y ; gross margin 64. 1 %, up 400 bps q / q and 310 bps y / y ; operating income $ 1, 156 million, up 44 % q / q and down 4 % y / y ; net income $ 1, 103 million, up 45 % q / q and down 4 % y / y ; diluted eps $ 1. 78, up 44 % q / q and down 3 % y / y.', metadata={'title': 'NVIDIA Announces Financial Results for Third Quarter Fiscal 2020 | NVIDIA Newsroom', 'url': 'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2020'}), 0.4972231090068817), (Document(page_content='revenue for q4 fy23 was $ 6, 051 million, with gross margin at 66. 1 %. operating expenses were $ 1, 775 million, leading to an operating income of $ 2, 224 million. net income was $ 2, 174 million, and diluted earnings per share were $ 0. 88. compared to q3 fy23, revenue increased by 2 %, and net income grew by 49 %. however, revenue and net income decreased
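
`similarity_search_with_score` returns `(document, score)` pairs. The sketch below shows the general idea of ranking stored vectors against a query vector, assuming the score is a distance where smaller means closer. The use of squared L2 distance, the two-dimensional toy vectors, and the names `squared_l2` and `top_k` are assumptions for illustration and are not taken from the vector store's internals.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, indexed, k=4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy index: real embeddings have far more dimensions
toy_index = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.0, 1.0]),
    ("chunk C", [0.5, 0.5]),
]
nearest = top_k([0.9, 0.1], toy_index, k=2)
print([text for text, _ in nearest])
```

Ranking of this kind depends only on vector closeness, which is one reason the query about 2024 Q3 above can surface a Q3 FY20 chunk first: the two texts are worded alike even though the fiscal years differ.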

### Step 4 - Retrieve and Generate Answer

In [13]:

from langchain.prompts.prompt import PromptTemplate

PROMPT_TEMPLATE = """[INST]You are a friendly virtual assistant and maintain a conversational, polite, patient, friendly and gender neutral tone throughout the conversation.

Your task is to understand the QUESTION, read the Content list from the DOCUMENT delimited by ```, generate an answer based on the Content, and provide references used in answering the question in the format "[Title](URL)".
Do not depend on outside knowledge or fabricate responses.
DOCUMENT: ```{context}```

Your response should follow these steps:

1. The answer should be short, concise, and clear.
    * If detailed instructions are required, present them in an ordered list or bullet points.
2. If the answer to the question is not available in the provided DOCUMENT, ONLY respond that you couldn't find any information related to the QUESTION, and do not show references and citations.
3. Citation
    * ALWAYS start the citation section with "Here are the sources used to generate this response:" and follow with references in markdown link format [Title](URL) to support the answer.
    * Use Bullets to display the reference [Title](URL).
    * You MUST ONLY use the URL extracted from the DOCUMENT as the reference link. DO NOT fabricate or use any link outside the DOCUMENT as reference.
    * Avoid over-citation. Only include references that were directly used in generating the response.
    * If no reference URL can be provided, remove the entire citation section.
    * The Citation section can include one or more references. DO NOT include same URL as multiple references. ALWAYS append the citation section at the end of your response.
    * You MUST follow the below format as an example for this citation section:
      Here are the sources used to generate this response:
      * [Title](URL)
[/INST]
[INST]
QUESTION: {question}
FINAL ANSWER:[/INST]"""

prompt_template = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])



def build_context(chunks):
    context = ""
    for chunk in chunks:
        context = context + "\n  Content: " + chunk.page_content + " | Title: (" + chunk.metadata["title"] + ") | URL: (" + chunk.metadata.get("url", "source") + ")"
    return context


def generate_answer(llm, vectorstore, prompt_template, question):
    retrieved_chunks = vectorstore.similarity_search(question)
    context = build_context(retrieved_chunks)
    args = {"context":context, "question":question}
    prompt = prompt_template.format(**args)
    ans = llm.invoke(prompt)
    return ans.content


question = "what are 2024 Q1 revenues?"

generate_answer(llm, vectorstore, prompt_template, question)

' The revenue for Q1 FY24 was $7,192 million.\n\nHere are the sources used to generate this response:\n- [NVIDIA Announces Financial Results for First Quarter Fiscal 2024](http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2024)'
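
The model can only cite a URL if the title and URL of each retrieved chunk appear in the context string that `build_context` produces. The sketch below reproduces that format with stand-in chunk objects so the resulting string can be inspected without a vector store or an API key. `StubChunk` and `build_context_joined` are hypothetical names, and the example URL is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class StubChunk:
    """Stand-in for a retrieved document: only the two attributes build_context reads."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context_joined(chunks):
    """Same output format as build_context, written with str.join instead of repeated concatenation."""
    return "".join(
        f"\n  Content: {c.page_content} | Title: ({c.metadata['title']}) | URL: ({c.metadata.get('url', 'source')})"
        for c in chunks
    )


stub = StubChunk("revenue text", {"title": "Report", "url": "http://example.com/report"})
context_preview = build_context_joined([stub])
print(context_preview)
```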