# RAG with Unstructured, LangChain, & KDB.AI
##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).

> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.

PDFs and other complex document types are notoriously difficult to work with, yet they are among the most common formats for publishing important business information. Because these file types are so common, it is key to be able to parse and ingest them quickly and accurately while cleanly extracting embedded entities such as images, tables, and graphs. If extracted correctly, the data held in a complex document like a PDF can be ingested into a RAG workflow to generate accurate, contextual responses for users and the business.

This sample will illustrate how to use Unstructured, a complex document parsing technology, to ingest complex documentation, partition it into useful elements, perform chunking and embedding, and finally store the embeddings in KDB.AI. After this, we can complete a RAG pipeline with LangChain and query the KDB.AI vector database to retrieve the most relevant elements and pass them to an LLM to generate a response.

We will focus on enriching table elements with context and standardized formatting to improve retrieval and generation.

Agenda:
1. Dependencies, Imports & Setup
2. Use Unstructured to Process Complex PDF Documentation
3. Embed Extracted Elements with OpenAI Embedding Model
4. Define KDB.AI Session
5. Create Schema and KDB.AI Table
6. Use LangChain and KDB.AI to Perform RAG!

## 1. Dependencies, Imports & Setup

To run this sample successfully, follow the steps for the environment you are using:

- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.

- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells.

In [None]:
# System packages used by Unstructured's PDF and OCR pipeline
!apt-get -qq install poppler-utils tesseract-ocr
%pip install -q --user --upgrade pillow
%pip install -q --upgrade "unstructured[all-docs]"
%pip install pymupdf
%pip install kdbai_client
%pip install langchain-openai
%pip install langchain
# Install langchain-community from the KxSystems fork (KDBAI_v1.4 branch)
# rather than from PyPI (%pip install langchain-community)
!git clone -b KDBAI_v1.4 https://github.com/KxSystems/langchain.git
# Installing by path avoids changing the notebook's working directory
%pip install ./langchain/libs/community
%pip install --upgrade nltk

In [None]:
from unstructured.partition.pdf import partition_pdf
from unstructured.partition.auto import partition
from unstructured.embed.openai import OpenAIEmbeddingConfig, OpenAIEmbeddingEncoder
import fitz
from langchain_openai import OpenAIEmbeddings
import kdbai_client as kdbai
from langchain_community.vectorstores import KDBAI
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
import nltk
nltk.download('punkt')

Get OpenAI API key here:
- [OpenAI](https://platform.openai.com/api-keys)

In [4]:
import os
from getpass import getpass
# Set the OpenAI API key
if "OPENAI_API_KEY" in os.environ:
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
else:
    # Prompt the user to enter the API key
    OPENAI_API_KEY = getpass("OPENAI API KEY: ")
    # Save the API key as an environment variable for the current session
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY



#### Download Earnings Report

In [5]:
!wget 'https://s21.q4cdn.com/399680738/files/doc_news/Meta-Reports-Second-Quarter-2024-Results-2024.pdf' -O './doc1.pdf'

--2024-10-01 01:10:17--  https://s21.q4cdn.com/399680738/files/doc_news/Meta-Reports-Second-Quarter-2024-Results-2024.pdf
Resolving s21.q4cdn.com (s21.q4cdn.com)... 194.26.213.25, 2a09:cd46:f:426e::1
Connecting to s21.q4cdn.com (s21.q4cdn.com)|194.26.213.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 195613 (191K) [application/pdf]
Saving to: ‘./doc1.pdf’


2024-10-01 01:10:17 (5.77 MB/s) - ‘./doc1.pdf’ saved [195613/195613]



## 2. Use Unstructured to Process Complex PDF Documentation

1. Read in data
2. Partition using the 'hi_res' strategy
3. Chunk

In [7]:
elements = partition_pdf(
    './doc1.pdf',
    strategy="hi_res",
    chunking_strategy="by_title",
)
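
To give a feel for what `chunking_strategy="by_title"` does conceptually, here is a minimal sketch. It is not Unstructured's actual implementation: it assumes elements are plain `(category, text)` tuples, starts a new chunk whenever a `Title` appears, and emits each table as its own chunk. The real chunker also applies size limits that this toy version ignores. The name `chunk_by_title` and the sample data are hypothetical.

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (category, text): a hypothetical stand-in for parsed elements


def chunk_by_title(elements: List[Element]) -> List[Element]:
    """Group text under the most recent title; emit each table as its own chunk."""
    chunks: List[Element] = []
    buffer: List[str] = []

    def flush() -> None:
        # Close the section being accumulated, if any.
        if buffer:
            chunks.append(("CompositeElement", "\n\n".join(buffer)))
            buffer.clear()

    for category, text in elements:
        if category == "Title":
            flush()  # a title starts a new section
            buffer.append(text)
        elif category == "Table":
            flush()  # tables are kept separate from prose
            chunks.append(("Table", text))
        else:
            buffer.append(text)
    flush()
    return chunks


# Illustrative sample data, not taken from the parsed PDF.
sample = [
    ("Title", "Second Quarter 2024 Results"),
    ("NarrativeText", "Revenue grew year over year."),
    ("Table", "Revenue $ 39,071 $ 31,999 22 %"),
    ("Title", "CFO Outlook Commentary"),
    ("NarrativeText", "We expect third quarter revenue ..."),
]
chunks = chunk_by_title(sample)
for category, text in chunks:
    print(category, "->", text[:40])
```

This mirrors the mix of `CompositeElement` and `Table` types that the partitioned output contains.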

#### Explore the extracted elements

In [8]:
from collections import Counter
display(Counter(type(element) for element in elements))

Counter({unstructured.documents.elements.CompositeElement: 17,
         unstructured.documents.elements.Table: 10})

In [9]:
for element in elements:
  print(type(element))

<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<class 'unstructured.documents.elements.CompositeElement'>
<class 'unstructured.documents.elements.Table'>
<cla

In [10]:
for element in elements:
  if element.to_dict()['type'] == 'Table':
    print(element.text)

Three Months Ended June 30, In millions, except percentages and per share amounts 2024 2023 % Change Revenue $ 39,071 $ 31,999 22 % Costs and expenses 24,224 22,607 7 % Income from operations $ 14,847 $ 9,392 58 % Operating margin 38 % 29 % Provision for income taxes $ 1,641 $ 1,505 9 % Effective tax rate 11 % 16 % Net income $ 13,465 $ 7,788 73 % Diluted earnings per share (EPS) $ 5.16 $ 2.98 73 %
2024 2023 2024 2023 Revenue $ 39,071 Costs and expenses: Cost of revenue 7,308 5,945 13,948 Research and development 10,537 9,344 20,515 Marketing and sales 2,721 3,154 5,285 General and administrative (1) 3,658 4,164 7,114 Total costs and expenses 24,224 22,607 46,862 Income from operations 14,847 9,392 28,665 Interest and other income (expense), net 259 (99) 624 Income before provision for income taxes 15,106 9,293 29,289 Provision for income taxes 1,641 1,505 3,455 Net income $ 13,465
Basic $ 5.31 Diluted $ 5.16 Weighted-average shares used to compute earnings per share: Basic 2,534 2,568

#### What a table element looks like after extraction:

In [11]:
print(elements[-2])

2024 2023 2024 2023 $ 39,071 $ 31,999 $ 75,527 Foreign exchange effect on 2024 revenue using 2023 rates 371 265 Revenue excluding foreign exchange effect $ 39,442 $ 75,792 GAAP revenue year-over-year change % 22 % 25 % Revenue excluding foreign exchange effect year-over-year change % 23 % 25 % GAAP advertising revenue $ 38,329 $ 31,498 $ 73,965 Foreign exchange effect on 2024 advertising revenue using 2023 rates 367 261 Advertising revenue excluding foreign exchange effect $ 38,696 $ 74,226 GAAP advertising revenue year-over-year change % 22 % 24 % Advertising revenue excluding foreign exchange effect year-over-year 23 % 25 % Net cash provided by operating activities $ 19,370 $ 17,309 $ 38,616 Purchases of property and equipment, net (8,173) (6,134) (14,573) Principal payments on finance leases (299) (220) (614) $ 10,898 $ 10,955 $ 23,429


## 3. Embed Extracted Elements with OpenAI Embedding Model


In [12]:
from unstructured.embed.openai import OpenAIEmbeddingConfig, OpenAIEmbeddingEncoder

embedding_encoder = OpenAIEmbeddingEncoder(
    config=OpenAIEmbeddingConfig(
      api_key=os.getenv("OPENAI_API_KEY"),
      model_name="text-embedding-3-small",
    )
)
elements = embedding_encoder.embed_documents(
    elements=elements
)

### Store original elements in a dataframe

In [13]:
import pandas as pd
data = []

for c in elements:
  row = {}
  row['id'] = c.id
  row['text'] = c.text.encode()
  row['metadata'] = c.metadata.to_dict()
  row['embedding'] = c.embeddings
  data.append(row)

df_non_contextualized = pd.DataFrame(data)
df_non_contextualized.head()

Unnamed: 0,id,text,metadata,embedding
0,7673dd5dd3348ca922edfeb765c4f8ec,b'FACEBOOK\n\nNEWS RELEASE\n\nMeta Reports Sec...,"{'filetype': 'application/pdf', 'languages': [...","[0.057783662762384565, 0.015500455026484948, 0..."
1,0042c2ce77a154ed737cfb0d9b20b598,"b'Three Months Ended June 30, In millions, exc...","{'last_modified': '2024-07-31T21:06:06', 'file...","[0.011829043858470036, 0.013133554749833454, 0..."
2,1006ba147b4696dcfa364d82a7cc3ff9,b'Second Quarter 2024 Operational and Other Fi...,"{'filetype': 'application/pdf', 'languages': [...","[0.07881357843105635, 0.010791519166744949, 0...."
3,3ccaeebfca3cd0b37f89d9865ed86620,"b""CFO Outlook Commentary\n\nWe expect third qu...","{'filetype': 'application/pdf', 'languages': [...","[0.06501258679384729, 0.02305678941583444, -0...."
4,f83ab884e1b7f6cd22bb6fec166375de,b'About Meta\n\nMeta builds technologies that ...,"{'filetype': 'application/pdf', 'languages': [...","[0.044354494483358486, -0.008683296817018685, ..."
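
Before rows like these are written to a vector database, it can be worth checking that every row actually carries an embedding of the expected length. The sketch below assumes rows are plain dictionaries shaped like the ones built above, and that the embedding model returns 1536-dimensional vectors (the default for `text-embedding-3-small`). The helper name `find_bad_rows` is hypothetical.

```python
from typing import Dict, List

EXPECTED_DIMS = 1536  # assumed default output size of text-embedding-3-small


def find_bad_rows(rows: List[Dict], dims: int = EXPECTED_DIMS) -> List[str]:
    """Return the ids of rows whose embedding is missing or has the wrong length."""
    bad = []
    for row in rows:
        embedding = row.get("embedding")
        if embedding is None or len(embedding) != dims:
            bad.append(row["id"])
    return bad


# Illustrative rows: one valid, one with a short vector, one not embedded.
rows = [
    {"id": "a", "text": b"ok", "embedding": [0.0] * 1536},
    {"id": "b", "text": b"short vector", "embedding": [0.0] * 10},
    {"id": "c", "text": b"not embedded", "embedding": None},
]
print(find_bad_rows(rows))
```

Applied to the notebook's own `data` list, an empty result would suggest that every element was embedded with the expected dimensionality.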


### Create contextualized descriptions and markdown-formatted tables to replace the original table text

In [14]:
import os
import openai
from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def get_table_description(table_content, document_context):
    prompt = f"""
    Given the following table and its context from the original document,
    provide a detailed description of the table. Then, include the table in markdown format.

    Original Document Context:
    {document_context}

    Table Content:
    {table_content}

    Please provide:
    1. A comprehensive description of the table.
    2. The table in markdown format.
    """

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that describes tables and formats them in markdown."},
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content

def extract_text_from_pdf(pdf_path):
    text = ""
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text += page.get_text()
    return text

pdf_path = './doc1.pdf'
document_content = extract_text_from_pdf(pdf_path)

# Process each table element extracted from the document
for element in elements:
  if element.to_dict()['type'] == 'Table':
    table_content = element.to_dict()['text']

    # Get the description and markdown table from the LLM (gpt-4o)
    result = get_table_description(table_content, document_content)
    element.text = result

print("Processing complete.")


Processing complete.
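
Note that the cell above sends the full document text as context once per table. For longer documents, one option is to cap the amount of context included in each prompt. The sketch below is one possible way to do that, not part of the original notebook: the helper name `build_table_prompt` and the `max_context_chars` default are hypothetical, and simple truncation from the start of the document is only a crude policy.

```python
def build_table_prompt(table_content: str, document_context: str,
                       max_context_chars: int = 20_000) -> str:
    """Assemble the table-description prompt, capping how much context is included."""
    context = document_context[:max_context_chars]
    if len(document_context) > max_context_chars:
        # Make the truncation visible to the model.
        context += "\n[context truncated]"
    return (
        "Given the following table and its context from the original document,\n"
        "provide a detailed description of the table. "
        "Then, include the table in markdown format.\n\n"
        f"Original Document Context:\n{context}\n\n"
        f"Table Content:\n{table_content}\n"
    )


# Tiny demonstration with a deliberately small limit.
prompt = build_table_prompt("Revenue $ 39,071 $ 31,999 22 %", "x" * 50, max_context_chars=10)
print(prompt)
```

The returned string could be passed as the user message in place of the inline f-string prompt.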


## Embed Extracted Text Elements and Updated Table Elements with OpenAI Embedding Model

In [15]:
from unstructured.embed.openai import OpenAIEmbeddingConfig, OpenAIEmbeddingEncoder

embedding_encoder = OpenAIEmbeddingEncoder(
    config=OpenAIEmbeddingConfig(
      api_key=os.getenv("OPENAI_API_KEY"),
      model_name="text-embedding-3-small",
    )
)
elements = embedding_encoder.embed_documents(
    elements=elements
)

### Take a look through the new contextualized table elements:

In [16]:
for element in elements:
  if element.to_dict()['type'] == 'Table':
    print(element.text)

### Comprehensive Description of the Table

The table presents a financial overview of Meta Platforms, Inc. for the second quarter, ending June 30, 2024, in comparison to the same period in 2023. The key financial metrics outlined in the table are revenue, costs and expenses, income from operations, operating margin, provision for income taxes, effective tax rate, net income, and diluted earnings per share (EPS). 

1. **Revenue**: Meta reported revenue of $39.071 billion in Q2 2024, a 22% increase from $31.999 billion in Q2 2023.

2. **Costs and Expenses**: Costs and expenses for Q2 2024 amounted to $24.224 billion, up 7% from $22.607 billion in Q2 2023.

3. **Income from Operations**: The income from operations saw a significant rise of 58%, reaching $14.847 billion in Q2 2024, compared to $9.392 billion in Q2 2023.

4. **Operating Margin**: The operating margin improved to 38% in Q2 2024 from 29% in the same quarter of the previous year.

5. **Provision for Income Taxes**: The provis

This markdown table provides a concise presentation of the financial data, making it easy to read and comprehend in a digital format.
### Detailed Description of the Table

The table presents segment information from Meta Platforms, Inc. for both revenue and income (loss) from operations. The data is organized into two main sections:
1. **Revenue**: This section is subdivided into two categories: "Advertising" and "Other revenue". The total revenue generated from these subcategories is then summed up for two segments: "Family of Apps" and "Reality Labs". The table provides the revenue figures for three months and six months ended June 30, for the years 2024 and 2023.
2. **Income (loss) from operations**: This section shows the income or loss from operations for the "Family of Apps" and "Reality Labs" segments, again for the same time periods.

The table allows for a comparison between the two segments of Meta's business over time, illustrating the performance of each segment in terms of revenue and operational income or loss.

### The Table in Markdown Format

```markdown
### Segment Information (In millions, Unaudited)

|                            | Three Months Ended June 30, 2024 | Three Months Ended June 30, 2023 | Six Months Ended June 30, 2024 | Six Months Ended June 30, 2023 |
|----------------------------|----------------------------------|----------------------------------|------------------------------- |-------------------------------|
| **Revenue:**               |                                  |                                  |                               |                               |
| Advertising                | $38,329                          | $31,498                          | $73,965                       | $59,599                       |
| Other revenue              | $389                             | $225                             | $769                          | $430                          |
| **Family of Apps**         | $38,718                          | $31,723                          | $74,734                       | $60,029                       |
| Reality Labs               | $353                             | $276                             | $793                          | $616                          |
| **Total revenue**          | $39,071                          | $31,999                          | $75,527                       | $60,645                       |
|                            |                                  |                                  |                               |                               |
| **Income (loss) from operations:** |                                  |                                  |                               |                               |
| Family of Apps             | $19,335                          | $13,131                          | $36,999                       | $24,351                       |
| Reality Labs               | $(4,488)                         | $(3,739)                         | $(8,334)                      | $(7,732)                      |
| **Total income from operations** | $14,847                          | $9,392                           | $28,665                       | $16,619                       |
```


### Create a pandas DataFrame to store the text and updated table elements

In [17]:
import pandas as pd
data = []

for c in elements:
  row = {}
  row['id'] = c.id
  row['text'] = c.text.encode()
  row['metadata'] = c.metadata.to_dict()
  row['embedding'] = c.embeddings
  data.append(row)

df_contextualized = pd.DataFrame(data)
df_contextualized.head()

Unnamed: 0,id,text,metadata,embedding
0,7673dd5dd3348ca922edfeb765c4f8ec,b'FACEBOOK\n\nNEWS RELEASE\n\nMeta Reports Sec...,"{'filetype': 'application/pdf', 'languages': [...","[0.05788249539290755, 0.015626749069267247, 0...."
1,0042c2ce77a154ed737cfb0d9b20b598,"b""### Comprehensive Description of the Table\n...","{'last_modified': '2024-07-31T21:06:06', 'file...","[-0.0004421418779838655, 0.011881716893403, 0...."
2,1006ba147b4696dcfa364d82a7cc3ff9,b'Second Quarter 2024 Operational and Other Fi...,"{'filetype': 'application/pdf', 'languages': [...","[0.07885341798234707, 0.010770062515672503, 0...."
3,3ccaeebfca3cd0b37f89d9865ed86620,"b""CFO Outlook Commentary\n\nWe expect third qu...","{'filetype': 'application/pdf', 'languages': [...","[0.06499940353648115, 0.02307817596004495, -0...."
4,f83ab884e1b7f6cd22bb6fec166375de,b'About Meta\n\nMeta builds technologies that ...,"{'filetype': 'application/pdf', 'languages': [...","[0.044356841931628053, -0.008697156860572564, ..."


## 4. Define KDB.AI Session
KDB.AI comes in two offerings:

- ***KDB.AI Cloud:*** For experimenting with smaller generative AI projects with a vector database in our cloud.
- ***KDB.AI Server:*** For evaluating large-scale generative AI applications on-premises or on your own cloud provider.

Depending on which you use, the setup steps and connection details differ.

### Option 1. KDB.AI Cloud

To use KDB.AI Cloud, you will need two session details: a URL endpoint and an API key. To get these, sign up for a free [KDB.AI account](https://kdb.ai/get-started).

You can connect to a KDB.AI Cloud session using `kdbai.Session`, passing the session URL endpoint and API key from your KDB.AI Cloud portal.

If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, they will be used to connect. If they do not exist, you will be prompted to enter your KDB.AI Cloud portal session URL endpoint and API key.

Find KDB.AI API Key and Endpoint here: [KDB.AI](https://kdb.ai/)

In [18]:
# Check if KDBAI_ENDPOINT is in the environment variables
if "KDBAI_ENDPOINT" in os.environ:
    KDBAI_ENDPOINT = os.environ["KDBAI_ENDPOINT"]
else:
    # Prompt the user to enter the endpoint
    KDBAI_ENDPOINT = input("KDB.AI ENDPOINT: ")
    # Save the endpoint as an environment variable for the current session
    os.environ["KDBAI_ENDPOINT"] = KDBAI_ENDPOINT

# Check if KDBAI_API_KEY is in the environment variables
if "KDBAI_API_KEY" in os.environ:
    KDBAI_API_KEY = os.environ["KDBAI_API_KEY"]
else:
    # Prompt the user to enter the API key (getpass hides the typed value)
    KDBAI_API_KEY = getpass("KDB.AI KEY: ")
    # Save the API key as an environment variable for the current session
    os.environ["KDBAI_API_KEY"] = KDBAI_API_KEY
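
The same "use the environment variable, otherwise prompt" pattern appears for the OpenAI key, the KDB.AI endpoint, and the KDB.AI key. As an optional refactoring, it could be factored into one small helper. This is a sketch only; the name `get_setting` is hypothetical and not part of any library used here.

```python
import os
from getpass import getpass


def get_setting(name: str, prompt: str, secret: bool = False) -> str:
    """Read a setting from the environment, prompting only if it is missing."""
    value = os.environ.get(name)
    if not value:
        # Hide the typed value for secrets such as API keys.
        value = getpass(prompt) if secret else input(prompt)
        os.environ[name] = value  # remember it for the rest of the session
    return value


# Possible usage in this notebook (not executed here, since it may prompt):
# KDBAI_ENDPOINT = get_setting("KDBAI_ENDPOINT", "KDB.AI ENDPOINT: ")
# KDBAI_API_KEY = get_setting("KDBAI_API_KEY", "KDB.AI KEY: ", secret=True)
```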

In [None]:
#connect to KDB.AI
session = kdbai.Session(api_key=KDBAI_API_KEY, endpoint=KDBAI_ENDPOINT)

### Option 2. KDB.AI Server
To use KDB.AI Server, you will need to download and run your own container. To do this, you will first need to sign up for free.

You will receive an email with the required license file and bearer token needed to download your instance. Follow the instructions in the signup email to get your session up and running.

Once the setup steps are complete, you can connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint.

In [20]:
### start session with KDB.AI Server
#session = kdbai.Session()

## 5. Create Schema and KDB.AI Table

In [21]:
schema = [
    {'name': 'id', 'type': 'str'},
    {'name': 'text', 'type': 'bytes'},
    {'name': 'metadata', 'type': 'general'},
    {'name': 'embedding', 'type': 'float32s'}
]

indexes = [{'name': 'flat_index', 'column': 'embedding', 'type': 'flat', 'params': {'dims': 1536, 'metric': 'L2'}}]
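
As a rough mental model (not KDB.AI's implementation), a `flat` index with the `L2` metric can be thought of as exhaustive search: the query vector is compared against every stored vector by Euclidean distance and the closest `k` are returned. The sketch below illustrates that idea in plain Python with toy 2-dimensional vectors; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_search(query: Sequence[float],
                vectors: List[Sequence[float]],
                k: int = 2) -> List[Tuple[int, float]]:
    """Compare the query against every stored vector and return the k closest."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy data: three stored vectors and one query.
vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
print(flat_search([0.9, 0.1], vectors, k=2))
```

Exhaustive search is exact but scales linearly with the number of stored vectors, which is reasonable for the few dozen elements in this sample.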

### Here we create two tables, one containing the original table elements, the other containing the newly contextualized and formatted table elements

In [22]:
Contextualized_KDBAI_TABLE_NAME = "Contextualized_Table"
non_Contextualized_KDBAI_TABLE_NAME = "Non_Contextualized_Table"
database = session.database('default')

# First ensure the tables do not already exist
for table in database.tables:
    if table.name in [Contextualized_KDBAI_TABLE_NAME, non_Contextualized_KDBAI_TABLE_NAME]:
        table.drop()

#Create the tables
table_contextualized = database.create_table(Contextualized_KDBAI_TABLE_NAME, schema=schema, indexes=indexes)
table_non_contextualized = database.create_table(non_Contextualized_KDBAI_TABLE_NAME, schema=schema, indexes=indexes)

In [23]:
# Insert Elements into the KDB.AI Tables
table_contextualized.insert(df_contextualized)
table_non_contextualized.insert(df_non_contextualized)

{'rowsInserted': 27}

In [24]:
# Check to see that the elements were inserted
table_contextualized.query()

Unnamed: 0,id,text,metadata,embedding
0,7673dd5dd3348ca922edfeb765c4f8ec,b'FACEBOOK\n\nNEWS RELEASE\n\nMeta Reports Sec...,"{'filetype': 'application/pdf', 'languages': [...","[0.057882495, 0.015626749, 0.0125014, 0.016935..."
1,0042c2ce77a154ed737cfb0d9b20b598,"b""### Comprehensive Description of the Table\n...","{'last_modified': '2024-07-31T21:06:06', 'file...","[-0.00044214187, 0.011881717, 0.03507208, 0.01..."
2,1006ba147b4696dcfa364d82a7cc3ff9,b'Second Quarter 2024 Operational and Other Fi...,"{'filetype': 'application/pdf', 'languages': [...","[0.07885342, 0.010770063, 0.032093342, 0.07417..."
3,3ccaeebfca3cd0b37f89d9865ed86620,"b""CFO Outlook Commentary\n\nWe expect third qu...","{'filetype': 'application/pdf', 'languages': [...","[0.0649994, 0.023078175, -0.0025019816, 6.0829..."
4,f83ab884e1b7f6cd22bb6fec166375de,b'About Meta\n\nMeta builds technologies that ...,"{'filetype': 'application/pdf', 'languages': [...","[0.04435684, -0.008697157, -0.0011843009, 0.05..."
5,6523c2a592eaef3c9c62b13ff33414e6,b'Ryan Moore\n\npress@meta.com / about.fb.com/...,"{'filetype': 'application/pdf', 'languages': [...","[0.068496555, 0.012527236, 0.0041264608, 0.055..."
6,b288eb094301be949cb9c4d070135af5,b'For a discussion of limitations in the measu...,"{'filetype': 'application/pdf', 'languages': [...","[0.06137703, 0.026085237, 0.07956282, 0.036115..."
7,69e526bdf29fb6ab25756fefa40b5439,"b""Non-GAAP Financial Measures\n\nTo supplement...","{'filetype': 'application/pdf', 'languages': [...","[-0.00022395901, 0.037475675, 0.05577307, 0.02..."
8,75d393ef2f359efa9a296e832490f823,b'For more information on our non-GAAP nancial...,"{'filetype': 'application/pdf', 'languages': [...","[0.01794313, 0.016595712, 0.021199394, 2.51983..."
9,adf7621ac4c8f6206d85b1bba43e591f,"b""The table presented shows the condensed cons...","{'last_modified': '2024-07-31T21:06:06', 'file...","[0.007979893, -0.012464022, 0.051368486, 0.019..."


## 6. Use LangChain and KDB.AI to Perform RAG!

In [25]:
# Define OpenAI embedding model for LangChain to embed the query
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# use KDBAI as vector store
vecdb_kdbai_contextualized = KDBAI(table_contextualized, embeddings)
vecdb_kdbai_non_contextualized = KDBAI(table_non_contextualized, embeddings)

In [26]:
# Define a Question/Answer LangChain chain
qabot_contextualized = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vecdb_kdbai_contextualized.as_retriever(search_kwargs=dict(k=5, index='flat_index')),
    return_source_documents=True,
)

qabot_non_contextualized = RetrievalQA.from_chain_type(
    chain_type="stuff",
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=vecdb_kdbai_non_contextualized.as_retriever(search_kwargs=dict(k=5, index='flat_index')),
    return_source_documents=True,
)
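
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea in plain Python. The prompt wording is illustrative and is not LangChain's actual template; `stuff_prompt` is a hypothetical name.

```python
from typing import List


def stuff_prompt(question: str, documents: List[str]) -> str:
    """Concatenate all retrieved chunks into one prompt, followed by the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Two snippets taken from the extracted table text shown earlier.
retrieved = [
    "Research and development 10,537 9,344 20,515",
    "Total costs and expenses 24,224 22,607 46,862",
]
print(stuff_prompt("What were total costs and expenses?", retrieved))
```

With `k=5`, five chunks are placed into each prompt, so longer contextualized table descriptions increase the number of tokens sent per query.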

In [27]:
# Helper function to perform RAG
def RAG(query):
  print(query)
  print("-----")
  print("Contextualized")
  print("-----")
  print(qabot_contextualized.invoke(dict(query=query))["result"])
  print("-----")
  print("Non Contextualized")
  print("-----")
  print(qabot_non_contextualized.invoke(dict(query=query))["result"])


In [28]:
# Query the RAG chain!
RAG("What is the research and development costs for six months ended in June 2024")

What is the research and development costs for six months ended in June 2024
-----
Contextualized
-----
The research and development costs for the six months ended in June 2024 were $20,515 million.
-----
Non Contextualized
-----
The research and development costs for the six months ended June 30, 2024, were $20.515 billion.


In [29]:
# Query the RAG chain!
RAG("What is the research and development costs for six months ended in June 2023")

What is the research and development costs for six months ended in June 2023
-----
Contextualized
-----
The research and development costs for the six months ended June 30, 2023, were $18,725 million.
-----
Non Contextualized
-----
The research and development costs for the six months ended in June 2023 were $20,515 million.


In [30]:
# Query the RAG chain!
RAG("what is the 2024 GAAP advertising Revenue in the three months ended June 30th? What about net cash by operating activies")

what is the 2024 GAAP advertising Revenue in the three months ended June 30th? What about net cash by operating activies
-----
Contextualized
-----
For the three months ended June 30, 2024, the GAAP advertising revenue for Meta was $38.329 billion. The net cash provided by operating activities was $19.370 billion.
-----
Non Contextualized
-----
The 2024 GAAP advertising revenue for the three months ended June 30th is $38,329 million. The net cash provided by operating activities for the same period is $19,370 million.


In [31]:
# Query the RAG chain!
RAG("What segment made the most money in the six months ended June 30th?")

What segment made the most money in the six months ended June 30th?
-----
Contextualized
-----
In the six months ended June 30, the "Family of Apps" segment made the most money, generating $74.734 billion in revenue.
-----
Non Contextualized
-----
The segment that made the most money in the six months ended June 30th was the Family of Apps (FoA) segment, with a revenue of $31,307 million.


In [32]:
# Query the RAG chain!
RAG("what is the three month costs and expensis for 2023?")

what is the three month costs and expensis for 2023?
-----
Contextualized
-----
The total costs and expenses for Meta Platforms, Inc. for the three months ended June 30, 2023, were $22,607 million.
-----
Non Contextualized
-----
The three-month costs and expenses for 2023 are $22,607 million.


In [33]:
# Query the RAG chain!
RAG("At the end of 2023, what was the value of Meta's Goodwill assets?")

At the end of 2023, what was the value of Meta's Goodwill assets?
-----
Contextualized
-----
At the end of 2023, the value of Meta's Goodwill assets was $20,654 million.
-----
Non Contextualized
-----
The value of Meta's Goodwill assets at the end of 2023 was $20,654 million.


In [34]:
# Query the RAG chain!
RAG("Given a sentiment score between 1 and 10 for the outlook? Explain your reasoning")

Given a sentiment score between 1 and 10 for the outlook? Explain your reasoning
-----
Contextualized
-----
Based on the provided financial data for Meta Platforms, Inc. in the second quarter of 2024, a sentiment score of **8** out of 10 can be reasonably assigned for the outlook. Here's the reasoning behind this score:

### Positive Indicators

1. **Earnings Per Share (EPS) Growth**:
   - Basic EPS increased to $5.31, and diluted EPS increased to $5.16, reflecting strong profitability.
   - The substantial increase in diluted EPS (73% year-over-year) indicates robust earnings growth even after accounting for potential share dilution.

2. **Revenue Growth**:
   - Revenue for Q2 2024 rose by 22% compared to Q2 2023, indicating strong top-line growth.

3. **Income from Operations**:
   - Income from operations increased by 58%, suggesting significant improvement in operational efficiency.

4. **Operating Margin**:
   - The operating margin improved from 29% to 38%, indicating better cost

### Conclusion

In several cases, such as the six-month 2023 research and development costs and the six-month segment revenue, the non-contextualized response is incorrect while the contextualized response is correct. In other cases, both are correct. In general, the more complex your tables and the more tables you have, the more advantageous this method becomes.
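
Comparing answers by eye is complicated by the fact that the same figure may be reported as "$20,515 million" or "$20.515 billion". The sketch below shows one way to normalize dollar amounts to millions before checking an answer against an expected value. The helper names are hypothetical. The expected figure used here is the contextualized answer from the query above, which this notebook treats as correct; verify it against the source PDF before relying on it.

```python
import re
from typing import List

_UNIT_TO_MILLIONS = {"million": 1.0, "billion": 1000.0}
_AMOUNT = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)


def amounts_in_millions(text: str) -> List[float]:
    """Extract amounts like '$20,515 million' or '$20.515 billion', in millions."""
    amounts = []
    for number, unit in _AMOUNT.findall(text):
        value = float(number.replace(",", ""))
        amounts.append(round(value * _UNIT_TO_MILLIONS[unit.lower()], 3))
    return amounts


def mentions_amount(answer: str, expected_millions: float, tol: float = 0.5) -> bool:
    """True if any dollar amount in the answer is within tol of the expected value."""
    return any(abs(a - expected_millions) <= tol for a in amounts_in_millions(answer))


# Answers copied from the six-month 2023 research and development query.
contextualized = ("The research and development costs for the six months ended "
                  "June 30, 2023, were $18,725 million.")
non_contextualized = ("The research and development costs for the six months ended "
                      "in June 2023 were $20,515 million.")
expected = 18_725  # assumed correct value, in millions

print(mentions_amount(contextualized, expected))
print(mentions_amount(non_contextualized, expected))
```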

### Delete the KDB.AI Tables
Once finished with the tables, it is best practice to drop them.

In [35]:
table_contextualized.drop()
table_non_contextualized.drop()

#### Take Our Survey
We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.

Take the [Survey](https://delighted.com/t/U2RoT32R)