# Getting Started with RAG

While large language models (LLMs) show capabilities that power advanced use cases, they suffer from issues such as factual inconsistency and hallucination. Retrieval-augmented generation (RAG) is an approach that enriches LLM capabilities and improves their reliability. RAG combines an LLM with external knowledge by adding relevant retrieved information to the prompt context, which helps the model accomplish a task.
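
To make the idea concrete, here is a minimal, library-free sketch of the retrieve-then-prompt loop. The word-overlap scorer, the `retrieve` and `build_prompt` helpers, and the sample documents are all illustrative assumptions; the rest of this tutorial uses real embeddings and a vector store instead.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_docs):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base, for illustration only
knowledge_base = [
    "Chroma is a vector store for embeddings",
    "Mount Everest is the highest mountain on Earth",
    "Ollama runs language models locally",
]

question = "What is a vector store"
top_docs = retrieve(question, knowledge_base, k=1)
rag_prompt = build_prompt(question, top_docs)
print(rag_prompt)
```

The prompt that reaches the model now carries the most relevant document, which is the "augmentation" in RAG; swapping the overlap scorer for embedding similarity gives the setup used later in this tutorial.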

This tutorial shows how to get started with RAG using a vector store and open-source LLMs. To showcase the power of RAG, we will build a system that suggests short, easy-to-read titles for ML papers given their original titles. Paper titles can be too technical for a general audience, so using RAG to generate short titles based on previously created short titles can make research papers more accessible and easier to use in science communication such as newsletters or blogs.


Before getting started, let's first install the libraries we will use:


In [2]:
!conda install chromadb tqdm python-dotenv pandas -y
!conda install sentence-transformers ollama -y


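The code in this notebook calls its models through the `ollama` Python client, so it needs a running Ollama server with the two models used below (`phi3:3.8b-mini-instruct-4k-q4_K_M` and `mistral:7b-instruct-q4_K_S`) already pulled.

If you would rather run Mistral 7B on the Fireworks inference platform, you need a Fireworks API key. Check out this quick guide to obtain one: https://readme.fireworks.ai/docs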


In [3]:
import os
import dotenv
import chromadb
import json
from tqdm.auto import tqdm
import pandas as pd
import random
import ollama
from IPython.display import display_markdown

# you can set envs using Colab secrets
dotenv.load_dotenv()

MODEL_NAME = 'phi3:3.8b-mini-instruct-4k-q4_K_M'



## Getting Started


Let's define a function to get completions from a model served through Ollama.


In [4]:
def get_completion(prompt, model=MODEL_NAME, max_tokens=50):
    response = ollama.chat(
        options={
            'temperature': 0,
            'num_predict': max_tokens
        },
        model=model,
        messages=[
            {
                'role': 'user',
                'content': prompt,
            },
        ]
    )

    return response['message']['content']
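
Because `get_completion` needs a running model server, it can be handy to separate the request-building and response-parsing logic from the network call. The sketch below mirrors the payload used above (`temperature`, `num_predict`, a single user message) but takes the chat function as a parameter. `build_chat_request`, `complete_with`, and `fake_chat` are hypothetical names introduced for illustration, and `fake_chat` only imitates the `response['message']['content']` shape used in this tutorial.

```python
def build_chat_request(prompt, model, max_tokens=50):
    """Assemble the keyword arguments passed to the chat call."""
    return {
        "model": model,
        "options": {"temperature": 0, "num_predict": max_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }


def complete_with(chat_fn, prompt, model, max_tokens=50):
    """Send the request through chat_fn and return only the reply text."""
    response = chat_fn(**build_chat_request(prompt, model, max_tokens))
    return response["message"]["content"]


def fake_chat(model, options, messages):
    """Stand-in for a real chat client: echoes the prompt, capped by word count."""
    words = messages[-1]["content"].split()
    reply = " ".join(words[: options["num_predict"]])
    return {"message": {"role": "assistant", "content": reply}}


reply = complete_with(fake_chat, "Hello, my name is", model="demo-model", max_tokens=2)
print(reply)  # Hello, my
```

Passing `ollama.chat` as `chat_fn` should behave like `get_completion` above, since it is called with the same `model`, `options`, and `messages` keyword arguments.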

Let's first try the function with a simple prompt:


In [5]:
get_completion("Hello, my name is")

'Hello! It\'s a pleasure to meet you. May I have your full name so that I can address you properly in our conversation?\n\n(Note: Since the user has only provided "my name is" without their actual name, this'

Now let's test with Mistral-7B-Instruct:


In [6]:
mistral_llm = "mistral:7b-instruct-q4_K_S"

get_completion("Hello, my name is", model=mistral_llm)

"Hello! It's nice to meet you. What would you like to talk about today?"

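The Mistral 7B Instruct model expects instructions wrapped in the special tokens `[INST] <instruction> [/INST]` to get the right behavior. You can find more instructions on how to prompt Mistral 7B Instruct here: https://docs.mistral.ai/llm/mistral-instruct-v0.1. Chat-style APIs such as `ollama.chat` typically apply the model's prompt template for you, which would explain why the two calls below return nearly identical output with and without the explicit tokens.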
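
As a small sketch of the instruction format described above, the helpers below wrap a plain instruction in the `[INST]` tokens and avoid wrapping it twice. `format_mistral_instruct` and `is_instruct_formatted` are hypothetical helper names, not part of any library.

```python
def format_mistral_instruct(instruction):
    """Wrap an instruction in the [INST] ... [/INST] tokens."""
    return f"[INST] {instruction.strip()} [/INST]"


def is_instruct_formatted(prompt):
    """Check whether a prompt is already wrapped in instruction tokens."""
    stripped = prompt.strip()
    return stripped.startswith("[INST]") and stripped.endswith("[/INST]")


raw = "Tell me 2 jokes"
wrapped = raw if is_instruct_formatted(raw) else format_mistral_instruct(raw)
print(wrapped)  # [INST] Tell me 2 jokes [/INST]
```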


In [20]:
get_completion("Tell me 2 jokes", model=mistral_llm)

".\n1. Why don't scientists trust atoms? Because they make up everything!\n2. Did you hear about the mathematician who’s afraid of negative numbers? He will stop at nothing to avoid them."

In [16]:
get_completion("[INST]Tell me 2 jokes[/INST]", model=mistral_llm)

"\n1. Why don't scientists trust atoms? Because they make up everything!\n2. Did you hear about the mathematician who's afraid of negative numbers? He will stop at nothing to avoid them."

Now let's try with a more complex prompt that involves instructions:


In [21]:
prompt = """[INST]
Given the following wedding guest data, write a very short 3-sentences thank you letter:

{
  "name": "John Doe",
  "relationship": "Bride's cousin",
  "hometown": "New York, NY",
  "fun_fact": "Climbed Mount Everest in 2020",
  "attending_with": "Sophia Smith",
  "bride_groom_name": "Tom and Mary"
}

Use only the data provided in the JSON object above.

The senders of the letter is the bride and groom, Tom and Mary.
[/INST]"""

completion = get_completion(prompt, model=mistral_llm, max_tokens=150)
display_markdown(completion, raw=True)

Dear John Doe,

We are writing this letter to express our heartfelt gratitude for your presence at our wedding. It was a pleasure to have you as one of our guests, and we truly appreciate the effort you made to attend. Your fun fact about climbing Mount Everest was fascinating and added a special touch to the evening. We hope you had a wonderful time and that you continue to pursue your adventurous spirit. Thank you again for being a part of our special day.

Sincerely,
Tom and Mary
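
When the structured data comes from code rather than being typed by hand, the same kind of prompt can be assembled with `json.dumps`. This is a sketch under assumptions: `build_thank_you_prompt` is a hypothetical helper, and the guest record is a shortened copy of the one above.

```python
import json


def build_thank_you_prompt(guest, senders, num_sentences=3):
    """Render guest data as JSON inside an instruction prompt."""
    guest_json = json.dumps(guest, indent=2)
    return (
        "[INST]\n"
        "Given the following wedding guest data, write a very short "
        f"{num_sentences}-sentence thank you letter:\n\n"
        f"{guest_json}\n\n"
        "Use only the data provided in the JSON object above.\n\n"
        f"The senders of the letter are {senders}.\n"
        "[/INST]"
    )


guest = {
    "name": "John Doe",
    "relationship": "Bride's cousin",
    "fun_fact": "Climbed Mount Everest in 2020",
}
letter_prompt = build_thank_you_prompt(guest, senders="the bride and groom, Tom and Mary")
print(letter_prompt)
```

Serializing with `json.dumps` keeps quoting and escaping consistent, so the model sees valid JSON even when a field contains quotes or apostrophes.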

## RAG Use Case: Generating Short Paper Titles

For the RAG use case, we will be using [a dataset](https://github.com/dair-ai/ML-Papers-of-the-Week/tree/main/research) that contains a list of weekly top trending ML papers.

The user will provide an original paper title. We will then use the dataset to retrieve short and catchy paper titles as context, which helps the model generate a catchy title for the original input title.


### Step 1: Load the Dataset

Let's first load the dataset we will use:


In [7]:
# load the dataset from the assets/ folder into a pandas DataFrame
# the first row of the CSV contains the column names

ml_papers = pd.read_csv("./assets/ml-potw-10232023.csv", header=0)

# remove rows with empty titles or descriptions
ml_papers = ml_papers.dropna(subset=["Title", "Description"])

In [8]:
ml_papers.head()

| | Title | Description | PaperURL | TweetURL | Abstract |
|---|---|---|---|---|---|
| 0 | Llemma | an LLM for mathematics which is based on conti... | https://arxiv.org/abs/2310.10631 | https://x.com/zhangir_azerbay/status/171409802... | We present Llemma, a large language model for ... |
| 1 | LLMs for Software Engineering | a comprehensive survey of LLMs for software en... | https://arxiv.org/abs/2310.03533 | https://x.com/omarsar0/status/1713940983199506... | This paper provides a survey of the emerging a... |
| 2 | Self-RAG | presents a new retrieval-augmented framework t... | https://arxiv.org/abs/2310.11511 | https://x.com/AkariAsai/status/171511027707796... | Despite their remarkable capabilities, large l... |
| 3 | Retrieval-Augmentation for Long-form Question ... | explores retrieval-augmented language models o... | https://arxiv.org/abs/2310.12150 | https://x.com/omarsar0/status/1714986431859282... | We present a study of retrieval-augmented lang... |
| 4 | GenBench | presents a framework for characterizing and un... | https://www.nature.com/articles/s42256-023-007... | https://x.com/AIatMeta/status/1715041427283902... | |


In [9]:
# convert dataframe to a list of dicts, one per paper (all columns are kept)

ml_papers_dict = ml_papers.to_dict(orient="records")

In [10]:
ml_papers_dict[0]

{'Title': 'Llemma',
 'Description': 'an LLM for mathematics which is based on continued pretraining from Code Llama on the Proof-Pile-2 dataset; the dataset involves scientific paper, web data containing mathematics, and mathematical code; Llemma outperforms open base models and the unreleased Minerva on the MATH benchmark; the model is released, including dataset and code to replicate experiments.',
 'PaperURL': 'https://arxiv.org/abs/2310.10631',
 'TweetURL': 'https://x.com/zhangir_azerbay/status/1714098025956864031?s=20',
 'Abstract': 'We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further finet

We will use SentenceTransformer to generate embeddings and store them in a Chroma collection.


In [13]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

embed_fn = MyEmbeddingFunction()

# Initialize the chromadb directory, and client.
client = chromadb.PersistentClient(path="./chromadb")

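# create the collection (or load it if it already exists),
# registering the same embedding function that is used at query time
collection = client.get_or_create_collection(
    name="ml-papers-nov-2023",
    embedding_function=embed_fn
)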
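
To show what an embedding function provides without downloading a model, here is a toy sketch: a list of strings goes in, one fixed-length vector per string comes out, and similar texts end up with a higher cosine similarity. `toy_embed` is a crude hashed bag-of-words stand-in and an assumption of this example; it is not how `all-MiniLM-L6-v2` computes embeddings.

```python
import math


def toy_embed(texts, dim=16):
    """Map each text to a unit-length bag-of-words vector (hashing trick)."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for word in text.lower().split():
            # The sum of character codes is a deterministic stand-in for a hash.
            vec[sum(ord(c) for c in word) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def cosine(a, b):
    """Cosine similarity; a dot product suffices for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


vectors = toy_embed(["LLMs for Software Engineering", "Software Engineering"])
similarity = cosine(vectors[0], vectors[1])
print(len(vectors), len(vectors[0]), round(similarity, 3))
```

A vector store such as Chroma keeps these vectors alongside the documents and, at query time, returns the documents whose vectors are closest to the query's vector.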

We will now generate embeddings for batches:


In [14]:
# Generate embeddings and index titles in batches
batch_size = 50

# loop through batches, generating and storing embeddings
for i in tqdm(range(0, len(ml_papers_dict), batch_size)):

    batch = ml_papers_dict[i : i + batch_size]

    # Replace title with "No Title" if empty string
    batch_titles = [str(paper["Title"]) if str(paper["Title"]) != "" else "No Title" for paper in batch]
    # Use each paper's position in the list as its ID: unique, and stable across
    # re-runs, so upsert overwrites existing entries instead of adding duplicates
    batch_ids = [str(i + offset) for offset in range(len(batch))]
    batch_metadata = [dict(url=paper["PaperURL"],
                           abstract=paper['Abstract'])
                      for paper in batch]

    # generate embeddings
    batch_embeddings = embedding_model.encode(batch_titles)

    # upsert to chromadb
    collection.upsert(
        ids=batch_ids,
        metadatas=batch_metadata,
        documents=batch_titles,
        embeddings=batch_embeddings.tolist(),
    )

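
The batching pattern above can be isolated into two small helpers. This is a sketch under assumptions: `batched` and `stable_id` are hypothetical names, and hashing the title is just one way to obtain IDs that stay the same between runs, which lets `upsert` overwrite existing entries rather than add duplicates.

```python
import hashlib


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def stable_id(title):
    """Derive a repeatable ID from the title text."""
    return hashlib.sha1(title.encode("utf-8")).hexdigest()[:16]


titles = [f"Paper {n}" for n in range(7)]
batches = list(batched(titles, batch_size=3))
print([len(b) for b in batches])  # [3, 3, 1]

ids = [stable_id(t) for t in titles]
print(len(set(ids)) == len(titles))
```

Note that two papers with exactly the same title would receive the same ID under this scheme, so it only fits datasets where titles are unique.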


Now we can test the retriever:


In [15]:
collection = client.get_or_create_collection(
    name="ml-papers-nov-2023",
    embedding_function=embed_fn
)

retriever_results = collection.query(
    query_texts=["Software Engineering"],
    n_results=2,
)

print(retriever_results["documents"])

[['LLMs for Software Engineering', 'Communicative Agents for Software Development']]
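
The printed result is a list of lists because `query_texts` accepts several queries at once: there is one inner list of documents per query. The sketch below works on a plain dict copied from the output above, so it runs without a database; `pair_queries_with_hits` is a hypothetical helper.

```python
# Shape copied from the printed output above; other result keys are omitted.
retriever_results = {
    "documents": [
        ["LLMs for Software Engineering", "Communicative Agents for Software Development"]
    ]
}
query_texts = ["Software Engineering"]


def pair_queries_with_hits(queries, results):
    """Map each query string to its own list of retrieved documents."""
    return {query: docs for query, docs in zip(queries, results["documents"])}


hits = pair_queries_with_hits(query_texts, retriever_results)
for query, docs in hits.items():
    print(query, "->", docs)
```

This is why the final prompt later in the tutorial reads `results['documents'][0]`: index `0` selects the hits for the first (and only) query.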


Now let's put together our final prompt:


In [16]:
# user query
user_query = "S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models"

# query for user query
results = collection.query(
    query_texts=[user_query],
    n_results=10,
)

# concatenate titles into a single string
short_titles = '\n'.join(results['documents'][0])

prompt_template = f'''[INST]

Your main task is to generate 5 SUGGESTED_TITLES based for the PAPER_TITLE

You should mimic a similar style and length as SHORT_TITLES but PLEASE DO NOT include titles from SHORT_TITLES in the SUGGESTED_TITLES, only generate versions of the PAPER_TILE.

PAPER_TITLE: {user_query}

SHORT_TITLES: {short_titles}

SUGGESTED_TITLES:

[/INST]
'''

# get_completion already returns the reply as a single string
suggested_titles = get_completion(prompt_template, model=mistral_llm, max_tokens=2000)

# Print the suggestions.
print("Model Suggestions:")
print(suggested_titles)
print("\n\n\nPrompt Template:")
print(prompt_template)



Model Suggestions:
1. S3Eval: A Synthetic, Scalable, and Systematic Evaluation Suite for Large Language Models
2. S3Eval: A Comprehensive Evaluation Framework for Large Language Models
3. S3Eval: A Synthetic and Scalable Approach to Evaluating Large Language Models
4. S3Eval: A Systematic Evaluation Suite for Large Language Models in Real-World Applications
5. S3Eval: A Synthetic and Scalable Evaluation Framework for Large Language Models in Complex Tasks



Prompt Template:
[INST]

Your main task is to generate 5 SUGGESTED_TITLES based for the PAPER_TITLE

You should mimic a similar style and length as SHORT_TITLES but PLEASE DO NOT include titles from SHORT_TITLES in the SUGGESTED_TITLES, only generate versions of the PAPER_TILE.

PAPER_TITLE: S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

SHORT_TITLES: Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
ChemCrow: Augmenting large-language models with chemistry tools
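
Since the prompt asks the model not to reuse any of the SHORT_TITLES, it can help to check the reply in code rather than trust the instruction. The sketch below parses a numbered list and drops any suggestion that copies a retrieved title. `parse_numbered_titles` and `drop_copied_titles` are hypothetical helpers, and `model_output` is a made-up reply (two lines borrowed from the output above plus one copied title) used only to exercise the filter.

```python
import re


def parse_numbered_titles(text):
    """Extract the items of a '1. ...' style numbered list."""
    titles = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            titles.append(match.group(1))
    return titles


def drop_copied_titles(suggestions, excluded):
    """Remove suggestions that repeat an excluded title, ignoring case."""
    excluded_lower = {title.lower() for title in excluded}
    return [s for s in suggestions if s.lower() not in excluded_lower]


# Hypothetical model reply, for illustration only
model_output = """1. S3Eval: A Comprehensive Evaluation Framework for Large Language Models
2. ChemCrow: Augmenting large-language models with chemistry tools
3. S3Eval: A Synthetic and Scalable Approach to Evaluating Large Language Models"""
retrieved_titles = ["ChemCrow: Augmenting large-language models with chemistry tools"]

parsed = parse_numbered_titles(model_output)
kept = drop_copied_titles(parsed, retrieved_titles)
print(kept)
```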


As you can see, the titles suggested by the LLM are only somewhat okay: several are as long as or longer than the original, and the first one nearly repeats it. This use case still needs a lot more work and could potentially benefit from fine-tuning as well. For the purpose of this tutorial, we have provided a simple application of RAG using open-source models.

Try out other open-source models here: https://app.fireworks.ai/models

Read more about the Fireworks APIs here: https://readme.fireworks.ai/reference/createchatcompletion
