# Newtral Simple RAG Example


In this notebook we walk through the implementation of a conversational system that answers questions over a collection of documents.

The process can be separated into the following systems:

1. _Information Retrieval_ (IR): given a question **q**, the IR system retrieves the set of documents **D** most likely to contain the answer.
2. _Question Answering_ (QA): the QA system generates the answer to question **q** using the information present in the set of documents **D**.


### Setup


In [3]:
import os
from typing import Any, Dict, Optional

import gradio as gr
import numpy as np
import openai
import pandas as pd
from sentence_transformers import SentenceTransformer, util
from tqdm import tqdm

pd.set_option("display.max_colwidth", 100)

openai.api_key = os.getenv("OPENAI_API_KEY")

# Data


In [4]:
dataset_path = "data/claims.csv"
claims = pd.read_csv(dataset_path)

In [5]:
claims.head(1)

| Unnamed: 0 | claimReviewed | url | article |
|---|---|---|---|
| 0 | InfoJobs está buscando, a través de llamadas telefónicas, 30 personas para trabajar de manera in... | https://www.newtral.es/estafa-infojobs-llamada-telefono-30-personas-empleo/20240401/ | InfoJobs no está buscando a través de llamadas telefónicas 30 personas para “trabajar de manera ... |


In [6]:
len(claims)

80

### Processing


In [7]:
raw_documents = []
for _, claim in tqdm(claims.iterrows(), total=len(claims)):
    text = claim["article"]
    metadata = {
        "claimReviewed": claim["claimReviewed"],
        "url": claim["url"],
    }
    raw_documents.append({"text": text, "metadata": metadata})

100%|██████████| 80/80 [00:00<00:00, 7712.07it/s]


The first **design decision** is how to create the documents that will serve as sources for the QA system.

In this case we have chosen to divide each article into segments of at most _chunk_size_ characters (1000), with consecutive segments overlapping by at most _chunk_overlap_ characters (200).

To keep the segments semantically coherent and syntactically well formed, the text is split recursively, trying each separator in turn: paragraph breaks, line breaks, spaces and, as a last resort, individual characters.
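The effect of _chunk_size_ and _chunk_overlap_ can be illustrated with a minimal sketch. It is not the `split_text` helper imported from `utils` (whose implementation is not shown here); the function name `sliding_window_chunks` is hypothetical, and it deliberately ignores separators, cutting purely by character position.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Simplified illustration only: consecutive windows share exactly
    chunk_overlap characters, and no separator logic is applied.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

A separator-aware splitter would instead prefer to cut at the coarsest separator that still keeps each piece within the size limit, which is why the notebook passes a list of separators ordered from paragraph breaks down to single characters.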


In [8]:
from utils import split_text

In [9]:
chunk_size = 1000
chunk_overlap = 200

separators = ["\n\n", "\n", " ", ""]

In [10]:
documents = []
for document in tqdm(raw_documents):
    text = document["text"]
    metadata = document["metadata"]

    splits = split_text(text, chunk_size, chunk_overlap, separators)
    for chunk in splits:
        new_doc = {"text": chunk, "metadata": metadata}
        documents.append(new_doc)

100%|██████████| 80/80 [00:00<00:00, 441.56it/s]


# IR


The IR system is in charge of retrieving the set of documents where the answer is found.

All documents and the query are represented as dense vectors, obtained through a _SentenceTransformer_ model. The relevance of a document $d$ to the query $q$ is given by the scalar product of their vectors, $d \cdot q$.
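As a small worked example of ranking by scalar product, the sketch below scores three made-up two-dimensional document vectors against a made-up query vector and keeps the two best. The vectors and the helper `dot` are purely illustrative assumptions; real embeddings from the model have hundreds of dimensions.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Toy "embeddings" (illustrative values, not produced by any model)
doc_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vector = [0.8, 0.6]

# One relevance score per document: bigger is better
scores = [dot(d, query_vector) for d in doc_vectors]

# Indices of the k highest-scoring documents, best first
k = 2
top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
print(top_k)
```

Here the second document points in almost the same direction as the query, so it ranks first, followed by the first document.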


### Embeddings


In [11]:
model_name = "intfloat/multilingual-e5-small"

In [12]:
model = SentenceTransformer(model_name)

In [13]:
EMBEDDINGS_PATH = "data/vectorstore.npy"
if os.path.exists(EMBEDDINGS_PATH):
    vectorstore = np.load(EMBEDDINGS_PATH)
else:
    vectorstore = model.encode(
        [doc["text"] for doc in documents], show_progress_bar=True
    )
    np.save(EMBEDDINGS_PATH, vectorstore)

### Search


In [14]:
class IR:
    def __init__(
        self,
        documents: list[Dict[str, Any]],
        vectorstore: np.ndarray,
        model: SentenceTransformer,
    ):
        self.documents = documents
        self.vectorstore = vectorstore
        self.model = model

    def search(self, query: str, k: int = 4):
        """Retrieves the top k documents from the index, by relevance to the query"""
        query_embedding = self.model.encode(query)
        scores = util.dot_score(query_embedding, self.vectorstore)
        scores = scores.squeeze()

        # Bigger is better
        topk = (-scores).argsort()[:k]

        return [{**self.documents[i], "score": scores[i].item()} for i in topk]

In [15]:
index = IR(documents, vectorstore, model)

# QA


In [16]:
from abc import ABC, abstractmethod
from typing import Any, Generator


# Inherit from ABC so that @abstractmethod is actually enforced
class LLM(ABC):
    @abstractmethod
    def completion_stream(self, *args: Any, **kwargs: Any) -> Generator:
        """Generate text stream"""
        raise NotImplementedError

    def completion(self, *args, **kwargs):
        """Generate text"""
        return "".join(self.completion_stream(*args, **kwargs))
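Because `completion` only joins whatever `completion_stream` yields, any backend that yields strings fits this interface. The sketch below shows that with a hypothetical `EchoLLM` stand-in, which is not part of the notebook and calls no external service; the base class is repeated in simplified form so the example runs on its own.

```python
from typing import Generator


class LLM:
    """Simplified copy of the base class, repeated for self-containment."""

    def completion_stream(self, messages) -> Generator[str, None, None]:
        raise NotImplementedError

    def completion(self, messages) -> str:
        # Concatenate the streamed tokens into the full answer
        return "".join(self.completion_stream(messages))


class EchoLLM(LLM):
    """Hypothetical backend that streams back the last message word by word."""

    def completion_stream(self, messages):
        for word in messages[-1]["content"].split():
            yield word + " "


llm = EchoLLM()
text = llm.completion([{"role": "user", "content": "hello streaming world"}])
print(repr(text))
```

A stand-in like this can be handy for exercising the rest of the pipeline without an API key or a local model.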

In [17]:
import openai


class ChatGPT(LLM):
    def __init__(
        self,
        model: str,
        temperature: float = 0.0,
        max_tokens: Optional[int] = None,
        top_p: float = 1.0,
        frequency_penalty: float = 0,
        presence_penalty: float = 0,
        n: int = 1,
        logit_bias: Optional[dict] = None,
        seed: Optional[int] = None,
    ):
        self.client = openai.Client()

        self.params = {
            "model": model,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "top_p": top_p,
            "frequency_penalty": frequency_penalty,
            "presence_penalty": presence_penalty,
            "n": n,
            "logit_bias": logit_bias if logit_bias else {},
        }

        if seed is not None:
            self.params["seed"] = seed

    def completion_stream(self, messages):
        stream = self.client.chat.completions.create(
            messages=messages, stream=True, **self.params
        )
        for chunk in stream:
            content = chunk.choices[0].delta.content
            token = content if content is not None else ""
            yield token

To run the Ollama code, you need to:

1. Download the app and keep it running in the background: https://ollama.com/download/mac.

2. Run the code. It may take some time, depending on your machine's capacity.


In [18]:
import ollama


class Ollama(LLM):
    def __init__(
        self,
        model: str,
    ) -> None:
        self.model = model

    def completion_stream(self, messages):
        stream = ollama.chat(
            model=self.model,
            messages=messages,
            stream=True,
        )
        for chunk in stream:
            content = chunk["message"]["content"]
            token = content if content is not None else ""
            yield token

In [19]:
llm_chatgpt = ChatGPT(model="gpt-3.5-turbo-0125")
llm_ollama = Ollama(model="llama2")

# Chat


### Prompts


In [20]:
document_separator = "\n\n"

In [21]:
question_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know.

{context}

Question: {question}
Helpful Answer:"""

history_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
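To make the role of the history template concrete, the sketch below fills it with a made-up conversation turn and a follow-up question. The template text is copied from the cell above; the conversation content is a hypothetical example, and no model is called.

```python
history_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

# Hypothetical previous turn, stored as one string per turn
history = ["Question: What is the capital of France?\nHelpful Answer: Paris."]

prompt = history_template.format(
    chat_history="\n".join(history),
    question="And how many people live there?",
)
print(prompt)
```

The model is expected to reply with a self-contained question, so that the retrieval step can search for it without needing the earlier turns.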

### Chatbot


In [22]:
class ChatQA:
    def __init__(self, ir, llm):
        self.ir = ir
        self.llm = llm

        self.history = []

    def reset(self):
        self.history = []

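    def follow_up_query(self, question):
        prompt = history_template.format(
            chat_history="\n".join(self.history), question=question
        )
        # completion() expects a list of chat messages, not a bare string
        messages = [{"role": "system", "content": prompt}]
        query = self.llm.completion(messages)
        return query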

    def __call__(self, question: str, history=False):
        if history and len(self.history):
            query = self.follow_up_query(question)
        else:
            query = question

        documents = self.ir.search(query)

        contexts = [document["text"] for document in documents]
        context = document_separator.join(contexts)
        prompt = question_template.format(context=context, question=query)

        messages = [{"role": "system", "content": prompt}]

        answer = self.llm.completion(messages)

        if history:
            self.history.append("\n".join([prompt, answer]))

        urls = [document["metadata"]["url"] for document in documents]
        urls = list(dict.fromkeys(urls))
        citation = [f"{i+1}. {url}" for i, url in enumerate(urls)]

        return "\n".join([answer, *citation])
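Several retrieved chunks often come from the same article, which is why the URLs are de-duplicated before being numbered. The sketch below isolates that step using hypothetical `example.org` URLs in place of real search results.

```python
# Hypothetical retrieval results: two chunks share the same source article
retrieved = [
    {"text": "chunk 1", "metadata": {"url": "https://example.org/article-a"}},
    {"text": "chunk 2", "metadata": {"url": "https://example.org/article-b"}},
    {"text": "chunk 3", "metadata": {"url": "https://example.org/article-a"}},
]

urls = [document["metadata"]["url"] for document in retrieved]

# dict.fromkeys drops duplicates while keeping first-seen (relevance) order
unique_urls = list(dict.fromkeys(urls))

citation = [f"{i + 1}. {url}" for i, url in enumerate(unique_urls)]
print("\n".join(citation))
```

Using `dict.fromkeys` rather than `set` matters here: a set would not preserve the ranking order returned by the search.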

In [24]:
chat = ChatQA(index, llm_chatgpt)

### Examples


In [25]:
question = "Taylor Swift se cayó de un columpio en las bahamas?"

In [26]:
print(chat(question))

No, Taylor Swift no se cayó de un columpio en Las Bahamas. El video que se ha viralizado no es actual y no muestra a Taylor Swift y Travis Kelce cayéndose juntos de un columpio, ya que en 2018 aún no se conocían.
1. https://www.newtral.es/taylor-swift-columpio/20240328/


# Demo


In [27]:
def reset_chat():
    chat.reset()
    # The output is the Chatbot component, so clear it with an empty message list
    return []


with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    chat = ChatQA(index, llm_chatgpt)
    msg = gr.Textbox(
        placeholder="Enter a question and press enter",
    )
    clear = gr.Button("Clear")

    def respond(question, chat_history):
        bot_message = chat(question)
        chat_history.append((question, bot_message))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])
    clear.click(reset_chat, None, chatbot, queue=False)

demo.launch(inline=False)

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.


