# Getting Started with RAG using Fireworks Fast Inference LLMs

<a href="https://colab.research.google.com/github/fw-ai/cookbook/blob/main/recipes/rag/rag-paper-titles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

While large language models (LLMs) have capabilities that enable advanced use cases, they suffer from issues such as factual inconsistency and hallucination. Retrieval-augmented generation (RAG) is an approach that enriches LLM capabilities and improves their reliability. RAG combines LLMs with external knowledge by adding relevant information to the prompt context to help accomplish a task.

This tutorial shows how to get started with RAG by leveraging a vector store and open-source LLMs. To showcase the power of RAG, this use case covers building a RAG system that suggests short, easy-to-read ML paper titles from original ML paper titles. Paper titles can be too technical for a general audience, so using RAG to generate short titles based on previously created short titles can make research paper titles more accessible and usable for science communication, such as newsletters or blogs.

Before getting started, let's first install the libraries we will use:

In [1]:
%%capture
!pip install chromadb tqdm fireworks-ai python-dotenv pandas
!pip install sentence-transformers

In [2]:
!pip install colab-env -qU



In [3]:
!pip install datasets


Let's download the dataset we will use:

In [4]:
#!wget https://raw.githubusercontent.com/dair-ai/ML-Papers-of-the-Week/main/research/ml-potw-10232023.csv
#!mkdir data
#!mv ml-potw-10232023.csv data/

Before continuing, you need to obtain a Fireworks API Key to use the Mistral 7B model.

Check out this quick guide to obtain your Fireworks API Key: https://readme.fireworks.ai/docs

In [5]:
import fireworks.client
import os
import dotenv
import chromadb
import json
from tqdm.auto import tqdm
import pandas as pd
import random
from google.colab import userdata
from colab_env import envvar_handler

Mounted at /content/gdrive


**Make sure you have a Fireworks API key**

In [6]:
import fireworks.client

# Set your Fireworks API key
fireworks.client.api_key = "XXXXXXXXXXXXXXXXXXXXXXXXXX"
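
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. One common alternative is to read it from an environment variable. The sketch below is only an illustration: the helper name `load_api_key` and the variable name `FIREWORKS_API_KEY` are assumptions made here, not part of the Fireworks client.

```python
import os

def load_api_key(env_var: str = "FIREWORKS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The helper name and the default variable name are assumptions
    made for this sketch, not part of the Fireworks client.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage, once the variable is set in your environment:
# fireworks.client.api_key = load_api_key()
```

Raising an error when the variable is missing surfaces the problem early, rather than at the first API call.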


## Getting Started

Let's define a function to get completions from the Fireworks inference platform.

In [7]:
def get_completion(prompt, model=None, max_tokens=50):

    fw_model_dir = "accounts/fireworks/models/"

    if model is None:
        model = fw_model_dir + "llama-v2-7b"
    else:
        model = fw_model_dir + model

    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return completion.choices[0].text

Let's first try the function with a simple prompt:

In [8]:
get_completion("Hello, my name is")

' Katie and I am a 20 year old student at the University of Leeds. I am currently studying a BA in English Literature and Creative Writing. I have been working as a tutor for over 3 years now and I'

Now let's test with Mistral-7B-Instruct:

In [9]:
mistral_llm = "mistral-7b-instruct-4k"

get_completion("Hello, my name is", model=mistral_llm)

' [Your Name]. I am a [Your Profession/Occupation]. I am writing to [Purpose of Writing].\n\nI am writing to [Purpose of Writing] because [Reason for Writing]. I believe that ['

The Mistral 7B Instruct model needs to be instructed using special instruction tokens `[INST] <instruction> [/INST]` to get the right behavior. You can find more instructions on how to prompt Mistral 7B Instruct here: https://docs.mistral.ai/llm/mistral-instruct-v0.1

In [10]:
mistral_llm = "mistral-7b-instruct-4k"

get_completion("Tell me 2 jokes", model=mistral_llm)

".\n1. Why don't scientists trust atoms? Because they make up everything!\n2. Did you hear about the mathematician who’s afraid of negative numbers? He will stop at nothing to avoid them."

In [11]:
mistral_llm = "mistral-7b-instruct-4k"

get_completion("[INST]Tell me 2 jokes[/INST]", model=mistral_llm)

" Sure, here are two jokes for you:\n\n1. Why don't scientists trust atoms? Because they make up everything!\n2. Why did the tomato turn red? Because it saw the salad dressing!"
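
Since every instruction sent to the model needs the same wrapping, it can be convenient to factor it into a small helper. This is a minimal sketch; the function name `format_instruction` is hypothetical and simply reproduces the `[INST]...[/INST]` format used in the cell above.

```python
def format_instruction(instruction: str) -> str:
    """Wrap an instruction in the [INST] ... [/INST] tokens used above."""
    return f"[INST]{instruction.strip()}[/INST]"

wrapped = format_instruction("Tell me 2 jokes")
print(wrapped)  # [INST]Tell me 2 jokes[/INST]
```

The wrapped string can then be passed to `get_completion` in place of the hand-written prompt.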

Now let's try with a more complex prompt that involves instructions:

In [12]:
prompt = """[INST]
Given the following wedding guest data, write a very short 3-sentences thank you letter:

{
  "name": "John Doe",
  "relationship": "Bride's cousin",
  "hometown": "New York, NY",
  "fun_fact": "Climbed Mount Everest in 2020",
  "attending_with": "Sophia Smith",
  "bride_groom_name": "Tom and Mary"
}

Use only the data provided in the JSON object above.

The senders of the letter is the bride and groom, Tom and Mary.
[/INST]"""

get_completion(prompt, model=mistral_llm, max_tokens=150)

" Dear John Doe,\n\nWe, Tom and Mary, would like to extend our heartfelt gratitude for your attendance at our wedding. It was a pleasure to have you there, and we truly appreciate the effort you made to be a part of our special day.\n\nWe were thrilled to learn about your fun fact - climbing Mount Everest is an incredible accomplishment! We hope you had a safe and memorable journey.\n\nThank you again for joining us on this special occasion. We hope to stay in touch and catch up on all the amazing things you've been up to.\n\nWith love,\n\nTom and Mary"

## RAG Use Case: Generating Short Paper Titles


The user will provide a movie title or a short plot description. We will then use that input to retrieve similar entries from the dataset, which serve as context for the search.




### Step 1: Load the Dataset

Let's first load the dataset we will use:

In [13]:
from datasets import load_dataset
import pandas as pd
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer
import random
from tqdm.auto import tqdm
import uuid

# Load movie dataset
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')

# Convert movie dataset to pandas dataframe
movie_df = pd.DataFrame(ds)

# Extracting only the Title column and Plot
movie_df = movie_df[["Title", "Plot"]]
print(len(movie_df))

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



1000


In [14]:
movie_df.head()

|   | Title | Plot |
|---|---|---|
| 0 | Kansas Saloon Smashers | A bartender is working at a saloon, serving dr... |
| 1 | Love by the Light of the Moon | The moon, painted with a smiling face hangs ov... |
| 2 | The Martyred Presidents | The film, just over a minute long, is composed... |
| 3 | Alice in Wonderland | Alice follows a large white rabbit down a "Rab... |
| 4 | The Great Train Robbery | The film opens with two bandits breaking into ... |


In [15]:
movie_df.tail()

|   | Title | Plot |
|---|---|---|
| 995 | The Man in Possession | Raymond Dabney (Montgomery) returns to a mixed... |
| 996 | Man of the World | In 1930's Paris, American Michael Trevor (Will... |
| 997 | Mata Hari | In 1917, France is embroiled in World War I. D... |
| 998 | Men of the Sky | In the years before World War I, a love affair... |
| 999 | Millie | Millie (Helen Twelvetrees) is a naive young wo... |


We will use SentenceTransformer to generate embeddings, which we will store in a Chroma collection.

In [16]:
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Define embedding function
class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

# Instantiate embedding function
embed_fn = MyEmbeddingFunction()

# Initialize the chromadb directory, and client
client = chromadb.PersistentClient(path="./chromadb")

# Create collection
collection = client.get_or_create_collection(
    name="movies-collection",
    embedding_function=embed_fn
)


We will now generate embeddings in batches and add them to the collection:

In [17]:
batch_size = 50

for i in tqdm(range(0, len(movie_df), batch_size)):
    batch = movie_df.iloc[i:i+batch_size].copy()  # Make a copy to avoid SettingWithCopyWarning

    # Replace missing values (NaN/None) with placeholders.
    # Assigning the result back avoids chained in-place modification of a column.
    batch["Title"] = batch["Title"].fillna("No Title")
    batch["Plot"] = batch["Plot"].fillna("No Plot")

    # Generate embeddings for titles and plots
    batch_embeddings = embedding_model.encode(batch["Title"].tolist() + batch["Plot"].tolist())

    # Split embeddings into title and plot embeddings
    title_embeddings = batch_embeddings[:len(batch)]
    plot_embeddings = batch_embeddings[len(batch):]

    # Generate unique IDs for titles and plots
    title_ids = [str(uuid.uuid4()) for _ in range(len(batch["Title"]))]
    plot_ids = [str(uuid.uuid4()) for _ in range(len(batch["Plot"]))]

    print(f'Batch {i//batch_size + 1}:')
    print(f'Batch size: {batch_size}, Titles: {len(batch["Title"])}, Plots: {len(batch["Plot"])}, Title IDs: {len(title_ids)}, Plot IDs: {len(plot_ids)}, Embeddings length: {len(batch_embeddings)}')

    # Upsert titles and embeddings to ChromaDB
    collection.upsert(
        ids=title_ids,
        documents=batch["Title"].tolist(),
        embeddings=title_embeddings
    )

    # Upsert plots and embeddings to ChromaDB
    collection.upsert(
        ids=plot_ids,
        documents=batch["Plot"].tolist(),
        embeddings=plot_embeddings
    )

  0%|          | 0/20 [00:00<?, ?it/s]

Batch 1:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 2:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 3:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 4:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 5:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 6:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 7:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 8:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 9:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
Batch 10:
Batch size: 50, Titles: 50, Plots: 50, Title IDs: 50, Plot IDs: 50, Embeddings length: 100
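
Note that the IDs in the loop above come from `uuid.uuid4()`, which is random, so re-running the cell adds a second copy of every document instead of updating the existing ones. One possible alternative is to derive the ID from the content, so that `upsert` overwrites the same entry on each run. The sketch below is an assumption-laden illustration: the helper name `stable_id` and the `kind:text` naming scheme are hypothetical choices, not something the original notebook uses.

```python
import uuid

def stable_id(kind: str, text: str) -> str:
    """Derive a deterministic ID from a document's kind and text.

    uuid5 hashes its input, so the same (kind, text) pair always
    yields the same ID. The "kind:text" scheme is an assumption
    made for this sketch.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{kind}:{text}"))

# Same input gives the same ID on every run
first = stable_id("title", "Alice in Wonderland")
second = stable_id("title", "Alice in Wonderland")
print(first == second)  # True
```

With this approach, two rows with identical text would share an ID, so a row index could be added to the hashed string if duplicates need to be kept apart.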

Now we can test the retriever:

In [18]:
collection = client.get_or_create_collection(
    name="movies-collection",
    embedding_function=embed_fn
)

# Example query for movie titles
query_text = ["action movie"]

# Query the collection for similar movie titles
retriever_results = collection.query(
    query_texts=query_text,
    n_results=2,
)

# Print the retrieved movie titles
print(retriever_results["documents"])


[['The Red Dance', 'Adventure']]
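
To build some intuition for what the retriever does, the sketch below ranks a few toy vectors against a query vector. It uses cosine similarity purely for illustration, which may differ from the distance metric the collection is actually configured with. The function names and the two-dimensional toy vectors are made up for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k vectors most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy 2-dimensional "embeddings" (real ones have hundreds of dimensions)
toy_docs = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
print(top_k([1.0, 0.1], toy_docs))  # ['doc_a', 'doc_b']
```

A vector store performs the same kind of ranking, but over the stored embeddings and with an index that avoids comparing the query against every document.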


Now let's put together our final prompt:

In [19]:
def search_and_generate_suggested_titles(user_query):
    # Query for user query
    results = collection.query(
        query_texts=[user_query],
        n_results=10,
    )

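    # Extract the retrieved documents for our single query
    # (query() returns one list of documents per query text)
    retrieved_titles = results["documents"][0]

    # Concatenate the retrieved documents into a single string.
    # Note: this string is not interpolated into the prompt template below.
    retrieved_titles_str = '\n'.join(retrieved_titles)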

    # Prompt template for suggesting movie titles
    prompt_template = f'''[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: {user_query}

        SUGGESTED_TITLES:

        [/INST]
        '''

    # Get model suggestions based on the prompt (get_completion returns a string)
    suggested_titles = get_completion(prompt_template, model=mistral_llm, max_tokens=2000)

    # Print the suggestions
    print("Model Suggestions:")
    print(suggested_titles)
    print("\n\n\nPrompt Template:")
    print(prompt_template)
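
The prompt template above interpolates only the user query. In a RAG setup, the retrieved documents would typically be placed in the prompt as well, so the model can condition on them. Below is a minimal sketch of one way to do that, not the notebook's own method: the helper name `build_rag_prompt` and the `RETRIEVED_TITLES` section label are assumptions made for illustration, and the retrieved documents in this collection can be titles or plots.

```python
def build_rag_prompt(user_query, retrieved_docs, n_suggestions=5):
    """Build an [INST] prompt that includes the retrieved documents.

    Hypothetical helper: the section labels mirror the template above,
    with an added RETRIEVED_TITLES block for the retrieved context.
    """
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "[INST]\n"
        f"Your main task is to generate {n_suggestions} SUGGESTED_TITLES "
        "based on the MOVIE_TITLE and PLOT.\n\n"
        "You should mimic a similar style and length as the RETRIEVED_TITLES "
        "but do not include them in the SUGGESTED_TITLES.\n\n"
        f"RETRIEVED_TITLES:\n{context}\n\n"
        f"MOVIE_TITLE and PLOT: {user_query}\n\n"
        "SUGGESTED_TITLES:\n"
        "[/INST]"
    )

example_prompt = build_rag_prompt("Western romance", ["The Red Dance", "Adventure"])
print(example_prompt)
```

The resulting string could be passed to `get_completion` in the same way as `prompt_template`. Building the prompt from joined lines also avoids the leading indentation that the triple-quoted template carries into the model input.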

In [20]:
# Example usage
search_and_generate_suggested_titles("Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions")

Model Suggestions:
1. "Arctic Survival: A Journey Through Indigenous Communities"
        2. "Life in the Frost: The Arctic's Indigenous Peoples"
        3. "Beyond the Ice: The Lives of Arctic Indigenous Communities"
        4. "Arctic Resilience: The Survival of Indigenous Peoples"
        5. "Arctic Voices: The Stories of Indigenous Peoples in the North"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

        SUGGESTED_TITLES:

        [/INST]
        


In [21]:
search_and_generate_suggested_titles("Western romance")

Model Suggestions:
1. "The Last Cowboy"
2. "Rustling Romance"
3. "The Wild West Love Story"
4. "A Gunfighter's Heart"
5. "The Lone Ranger's Lady"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Western romance

        SUGGESTED_TITLES:

        [/INST]
        


In [22]:
search_and_generate_suggested_titles("Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.")

Model Suggestions:
1. "Egyptian Dreams"
2. "Parisian Passion"
3. "Cairo's Call"
4. "Love in the Desert"
5. "The Return to Paris"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.

        SUGGESTED_TITLES:

        [/INST]
        


In [23]:
search_and_generate_suggested_titles("Comedy film, office disguises, boss's daughter, elopement.")

Model Suggestions:
1. "Office Escapade: A Comedy of Disguises"
        2. "The Boss's Daughter's Elopement: A Comedy"
        3. "The Office Elopement: A Comedy of Errors"
        4. "The Comedy of Disguises: An Office Elopement"
        5. "The Elopement of the Boss's Daughter: A Comedy"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Comedy film, office disguises, boss's daughter, elopement.

        SUGGESTED_TITLES:

        [/INST]
        


In [24]:
search_and_generate_suggested_titles("Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.")

Model Suggestions:
1. "The Cleopatra Conspiracy"
        2. "Caesar's Charm"
        3. "The Rise of Cleopatra"
        4. "The Mummy's Treasure"
        5. "Antony's Reveal"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.

        SUGGESTED_TITLES:

        [/INST]
        


In [25]:
search_and_generate_suggested_titles("Denis Gage Deane-Tanner")

Model Suggestions:
1. "The Deane-Tanner Chronicles"
2. "Deane-Tanner: A Family Legacy"
3. "Deane-Tanner: The Next Generation"
4. "Deane-Tanner: A Legacy Continues"
5. "Deane-Tanner: The Legacy Lives On"



Prompt Template:
[INST]

        Your main task is to generate 5 SUGGESTED_TITLES based on the MOVIE_TITLE and PLOT.

        You should mimic a similar style and length as the retrieved titles but PLEASE DO NOT include them in the SUGGESTED_TITLES, only generate versions of the MOVIE_TITLE.

        MOVIE_TITLE and PLOT: Denis Gage Deane-Tanner

        SUGGESTED_TITLES:

        [/INST]
        


As you can see, the titles generated by the LLM are reasonable but leave room for improvement. This use case still needs more work and could potentially benefit from fine-tuning as well. For the purpose of this tutorial, we have provided a simple application of RAG using open-source models served on the Fireworks fast inference platform.

Try out other open-source models here: https://app.fireworks.ai/models

Read more about the Fireworks APIs here: https://readme.fireworks.ai/reference/createchatcompletion
