<img src="images/ragna-logo.png" width="200px" align="right"/>

# Basics of a RAG-powered chat app

<hr>

## What is Retrieval-augmented generation (RAG)?

LLMs are trained on vast but static datasets. This means they can answer many general questions well, like:

<img src="images/chatgpt-what-is-scipy.png" width=60% style="border:1px solid black;"/>

But for recent events or specialized topics, they either cannot answer or hallucinate an answer:

<img src="images/chatgpt-when-is-scipy.png" width=50% style="border:1px solid black;"/>

**Retrieval-augmented generation (RAG)** is a method to augment foundational LLMs with **contextual data (documents)**, to reduce hallucinations and work around the limited space available in an LLM prompt (around 3,000 words for ChatGPT 3.5).

As a basic example, we can provide some relevant information from the SciPy website:

<img src="images/chatgpt-with-context.png" width=50% style="border:1px solid black;"/>

<img src="images/RAG-new.png" width=70%/>
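The core idea can be sketched in a few lines: text retrieved from documents is placed in the prompt next to the question. This is a minimal illustration, not Ragna's implementation. The function name `build_augmented_prompt`, the instruction wording, and the placeholder chunks are all assumptions made for this example.

```python
def build_augmented_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved context and the user's question into one prompt."""
    # Number the chunks so an answer can be traced back to its source.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, 1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder context. In a real RAG pipeline these chunks are retrieved
# from the source storage instead of being written by hand.
chunks = [
    "<text copied from the conference website>",
    "<text copied from the registration page>",
]
prompt = build_augmented_prompt("When is the conference?", chunks)
print(prompt)
```

The rest of this notebook is about automating the retrieval step, so that the chunks most relevant to the question are selected for you.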

### Tokenization

Tokenization breaks document text into smaller units (tokens), such as words, sub-words, or characters.

<img src="images/openai-tokenization.png" width=60% style="border:1px solid black;"/>
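As a rough illustration, word-level and character-level tokenization can be done with the standard library. This sketch does not reproduce the sub-word tokenizers that production LLMs use, which are typically learned from data (for example with byte-pair encoding). The sample sentence is made up for this example.

```python
import re

text = "Retrieval-augmented generation reduces hallucinations."

# Word-level: runs of word characters, or single punctuation marks.
word_tokens = re.findall(r"\w+|[^\w\s]", text)

# Character-level: every character, including spaces, is a token.
char_tokens = list(text)

print(word_tokens)
print(f"{len(word_tokens)} word-level tokens, {len(char_tokens)} character-level tokens")
```

Smaller units give a smaller vocabulary but longer token sequences. Sub-word tokenizers sit between these two extremes.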


### Embedding

Embedding represents the tokens numerically, usually as vectors, so that related tokens lie closer to each other in the vector space.

<img src="images/embedding-projector.png" width=80% style="border:1px solid black;"/>
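"Closer to each other" is commonly measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration. Real embedding models produce vectors with hundreds of dimensions, and the words and values here are assumptions, not output from any model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: "car" and "truck" point in similar directions,
# "banana" points elsewhere.
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "truck": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

print("car vs truck: ", cosine_similarity(embeddings["car"], embeddings["truck"]))
print("car vs banana:", cosine_similarity(embeddings["car"], embeddings["banana"]))
```

A vector database performs this kind of comparison between the embedded question and the embedded document chunks, and returns the closest matches.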


## What is Ragna?

Ragna is an open source library for RAG **orchestration** with a Python API, REST API, and web UI.

It gives you convenient tools to quickly build RAG workflows and applications with any LLM or source storage you prefer.

<img src="images/ragna-architecture.png" width=80%/>

## Build a chat function with Ragna

### Step 0: Setup requirements

To use built-in LLMs like OpenAI's, you will need API keys. For this tutorial, we have included a key for you.

In [1]:
from pathlib import Path

from dotenv import load_dotenv

dotenv_path = Path.home() / Path("shared/scipy/rags-to-riches/.env")
assert load_dotenv(dotenv_path)

#### Side note: Local setup instructions 💻

On a local computer, follow these steps to get started with Ragna:

1. Install Ragna: `pip install 'ragna[all]'`
2. Run `ragna init` to create the `ragna.toml` config file with a guided CLI. 
3. [Get an OpenAI API key](https://platform.openai.com/api-keys) and set the relevant environment variable `export OPENAI_API_KEY="XXX"`
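After the steps above, a quick sanity check can confirm that the key is visible to Python before you start a chat. This is a small sketch using only the standard library. The helper name `has_api_key` is an assumption for this example and is not part of Ragna.

```python
import os
from collections.abc import Mapping


def has_api_key(env: Mapping[str, str] = os.environ, name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the variable is set to a non-empty value."""
    return bool(env.get(name, "").strip())


if not has_api_key():
    # The key must be exported in the same shell that launches Python or Jupyter.
    print("OPENAI_API_KEY is not set; OpenAI-based assistants may not work.")
```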

### Step 1: Select relevant documents

Let's use the 10-K report filings for Ford, General Motors, and Tesla.

💡 **Tip:** There are more documents in the `/files` directory that you can explore.

In [2]:
documents = [
    "files/10k-report-ford.pdf",
    "files/10k-report-gm.pdf",
    "files/10k-report-tesla.pdf",
]

### Step 2: Select assistants and source storage

🔗 [Check the available assistants in the docs](https://ragna.chat/en/stable/generated/tutorials/gallery_python_api/#step-3-select-an-assistant)

We are selecting OpenAI's GPT-3.5 and GPT-4 LLMs, and Chroma and LanceDB source storages.

In [3]:
from ragna import Rag
from ragna.assistants import Gpt4, Gpt35Turbo16k
from ragna.source_storages import Chroma, LanceDB

In [4]:
rag = Rag()

### Step 3: Start chat

Ragna is async by design, so calls such as `chat.prepare()` and `chat.answer()` need to be awaited.

Preparing the chat involves building the vector embeddings, and therefore takes longer for large document sets.

In [5]:
chat = rag.chat(
    documents=documents,
    source_storage=Chroma,
    assistant=Gpt4,
)

await chat.prepare()

/home/chance.sanger@dell.com/.cache/chroma/onnx_models/all-MiniLM-L6-v2/onnx.tar.gz: 100%|██████████| 79.3M/79.3M [00:00<00:00, 92.6MiB/s]


Message(content=How can I help you with the documents?, role=MessageRole.SYSTEM, sources=[])
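Jupyter lets you use `await` at the top level of a cell, as in the cell above. A plain Python script does not, so the same calls need to run inside a coroutine started with `asyncio.run()`. The sketch below shows that pattern with a hypothetical `StandInChat` class, which only mimics the shape of the calls and is not part of Ragna.

```python
import asyncio


class StandInChat:
    """Hypothetical stand-in for a chat object; not part of Ragna."""

    async def prepare(self) -> str:
        await asyncio.sleep(0)  # placeholder for the slow embedding step
        return "ready"

    async def answer(self, prompt: str) -> str:
        await asyncio.sleep(0)  # placeholder for the LLM call
        return f"(answer to: {prompt})"


async def main() -> str:
    chat = StandInChat()
    await chat.prepare()
    return await chat.answer("What are the primary risk factors for Ford?")


if __name__ == "__main__":
    # In a plain script there is no running event loop, so start one explicitly.
    print(asyncio.run(main()))
```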

### Step 4: Ask questions

In [6]:
answer = await chat.answer("How did Ford, General Motors, and Tesla perform financially in 2023?")
print(f"\nLLM Response: \n\n{answer.content}\n")


LLM Response: 

The text provided does not include information on the financial performance of General Motors and Tesla in 2023. However, it does provide some details on Ford's financial performance in 2023. Specifically, it mentions that the Ford Next segment had an EBIT loss of $138 million in 2023, which was a $788 million improvement from the previous year. The Ford Credit segment had total net receivables of $133 billion, an increase of $11 billion from 2022. However, its EBT decreased by $1,326 million to $1,331 million in 2023.



Let's check the sources used:

In [7]:
answer.sources

[Source(id='f4cac961-c1f5-4b87-a43f-0830bc49f9a6', document=<ragna.core.LocalDocument object at 0x7ade7818fa10>, location='52, 53', content=' Discussion and Analysis of Financial Condition and Results of Operations (Continued)\nFord Next Segment\nThe Ford Next segment (formerly Mobility) primarily includes expenses and investments for emerging business initiatives aimed at creating\nvalue for Ford in vehicle-adjacent market segments.\xa0\nIn this segment, our 2023 EBIT loss was $138 million, a $788 million improvement from a year ago. Ford Next has evolved from primarily\ninvesting in the development of autonomous vehicle capabilities to focus exclusively on incubating and launching new businesses creating\nstrategic value for Ford.\n48\nItem 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations (Continued)\nFord Credit Segment\nThe tables below provide full year 2023 key metrics and the change in full year 2023 EBT compared with full year 2022 by cau

In [8]:
for idx, source in enumerate(answer.sources, 1):
    print(f"{idx}.: {source.document.name}, page(s) {source.location}\n")
    print(source.content)
    print("#" * 80 + "\n")

1.: 10k-report-ford.pdf, page(s) 52, 53

 Discussion and Analysis of Financial Condition and Results of Operations (Continued)
Ford Next Segment
The Ford Next segment (formerly Mobility) primarily includes expenses and investments for emerging business initiatives aimed at creating
value for Ford in vehicle-adjacent market segments. 
In this segment, our 2023 EBIT loss was $138 million, a $788 million improvement from a year ago. Ford Next has evolved from primarily
investing in the development of autonomous vehicle capabilities to focus exclusively on incubating and launching new businesses creating
strategic value for Ford.
48
Item 7. Management’s Discussion and Analysis of Financial Condition and Results of Operations (Continued)
Ford Credit Segment
The tables below provide full year 2023 key metrics and the change in full year 2023 EBT compared with full year 2022 by causal factor for
the Ford Credit segment. For a description of these causal factors, see Definitions and Informatio

#### Streaming answers

Ragna allows you to stream the answer one chunk at a time. To enable this, set `stream=True`:

In [21]:
answer = await chat.answer("What are the primary risk factors for Ford?", stream=True)
# answer = await chat.answer("What are the main drivers for oil and gas companies in 2024?", stream=True)

print("\nLLM Response: \n\n")

# Print each chunk as soon as it arrives, without adding newlines between chunks.
async for chunk in answer:
    print(chunk, end="")


LLM Response: 


The primary risk factors for Ford are grouped into Operational Risks, Macroeconomic, Market, and Strategic Risks, Financial Risks, and Legal and Regulatory Risks. Some of these include:

Operational Risks:
- Ford is highly dependent on its suppliers to deliver components in accordance with Ford’s production schedule and specifications. A shortage of or inability to acquire key components or raw materials can disrupt Ford’s production of vehicles.
- Ford has entered into multi-year commitments to raw material and other suppliers that subject Ford to risks associated with lower future demand for such items as well as costs that fluctuate and are difficult to accurately forecast.

Macroeconomic, Market, and Strategic Risks:
- Ford’s results are dependent on sales of larger, more profitable vehicles, particularly in the United States.
- With a global footprint and supply chain, Ford’s results and operations could be adversely affected by economic or geopolitical developme
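If you also want the complete text after streaming finishes, you can collect the chunks while printing them. The sketch below uses a hypothetical async generator, `stand_in_stream`, in place of a real streamed answer. It assumes each chunk is a plain string, as in the cell above.

```python
import asyncio
from collections.abc import AsyncIterator


async def stand_in_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in that yields an answer in small pieces."""
    for piece in ["The primary ", "risk factors ", "are ..."]:
        await asyncio.sleep(0)  # placeholder for waiting on the LLM
        yield piece


async def collect(stream: AsyncIterator[str]) -> str:
    """Print chunks as they arrive and return the joined text."""
    parts = []
    async for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()
    return "".join(parts)


if __name__ == "__main__":
    full_text = asyncio.run(collect(stand_in_stream()))
```

In a notebook cell, `full_text = await collect(...)` would be used instead of `asyncio.run()`.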

#### Reducing hallucinations (errors)

Ragna tries to ensure that only the sources are used to answer questions, by instructing the LLM accordingly in the prompt.

🔗 [For reference, see Ragna source code on GitHub highlighting the prompt.](https://github.com/Quansight/ragna/blob/3cef0f7da1f2ed90e5d0618bcad82f824d00dc5a/ragna/assistants/_openai.py#L25-L26)

In [18]:
answer = await chat.answer("How much did Tesla earn in 2024?")
print(f"\nLLM Response: \n\n{answer.content}")


LLM Response: 

The document provided does not contain information about Tesla's earnings for the year 2024.


### Advanced configuration

`Rag().chat()` takes the following keyword arguments to help you optimize the quality of answers:

* `chunk_size` - Size of each chunk (a section of the document that provides context). Default is 500.
* `chunk_overlap` - Size of the overlap between a chunk and its previous and next chunks, which preserves context that spans chunk boundaries. Default is 250.
* `num_tokens` - Maximum number of context tokens, and in turn the number of document chunks, pulled out of the vector database. Default is 1024.

You can also set these configurations in the web app (which we will see later).
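To build intuition for how these three settings interact, here is a minimal sliding-window sketch. It is not Ragna's implementation. The helper names `chunk_tokens` and `select_within_budget` are hypothetical, and it assumes that all three sizes are counted in tokens.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def select_within_budget(ranked_chunks: list[list[str]], num_tokens: int) -> list[list[str]]:
    """Take chunks in ranked order until the token budget would be exceeded."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        if used + len(chunk) > num_tokens:
            break
        selected.append(chunk)
        used += len(chunk)
    return selected


tokens = [f"t{i}" for i in range(10)]
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)

# With a budget of 8 tokens, only two 4-token chunks fit.
print(len(select_within_budget(chunks, num_tokens=8)))
```

Under these assumptions, the defaults (`chunk_size=500`, `num_tokens=1024`) would allow two full chunks per prompt. Increasing `chunk_size` without increasing `num_tokens` would reduce the number of chunks that fit.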

### Your turn ✨ 

1. Create a new chat with larger `chunk_size` and `num_tokens` values
2. Ask about the overall carbon emissions for each company

In [13]:
# Picking a chunk size: if the chunk size is very small (e.g. 250), a chunk may match the beginning of
# the relevant paragraph while the actual answer appears later in that paragraph, outside the chunk.

chat = rag.chat(
    documents=documents,
    source_storage=Chroma,
    assistant=Gpt4,
    chunk_size=1000,
    num_tokens=2048,
)

await chat.prepare()

Message(content=How can I help you with the documents?, role=MessageRole.SYSTEM, sources=[])

In [19]:
answer = await chat.answer("What are the primary risk factors for Ford?")
print(f"\nLLM Response: \n\n{answer.content}")


LLM Response: 

The primary risk factors for Ford are grouped into Operational Risks, Macroeconomic, Market, and Strategic Risks, Financial Risks, and Legal and Regulatory Risks. Some of these include:

Operational Risks:
- Ford is highly dependent on its suppliers to deliver components in accordance with Ford’s production schedule and specifications. A shortage of or inability to acquire key components or raw materials can disrupt Ford’s production of vehicles.
- Ford has entered into multi-year commitments to raw material and other suppliers that subject Ford to risks associated with lower future demand for such items as well as costs that fluctuate and are difficult to accurately forecast.

Macroeconomic, Market, and Strategic Risks:
- Ford’s results are dependent on sales of larger, more profitable vehicles, particularly in the United States.
- With a global footprint and supply chain, Ford’s results and operations could be adversely affected by economic or geopolitical developmen

<hr>

_❗️ **Warning:** Make sure to stop the Jupyter Kernel (in the JupyterLab Menu Bar, click on "Kernel" -> "Shut down Kernel") before proceeding to prevent the "insufficient VRAM" error._

<br>

**✨ Next: [Use Local LLM with Ragna](03-RAG-local-llm.ipynb) →**

<br>

💬 _Wish to continue discussions after the tutorial? Contact the presenters: [@pavithraes](https://github.com/pavithraes), [@dharhas](https://github.com/dharhas), [@ahuang11](https://github.com/ahuang11)_

<hr>