# Getting Started with RAG using Fireworks Fast Inference LLMs

<a href="https://colab.research.google.com/github/fw-ai/cookbook/blob/main/recipes/rag/rag-paper-titles.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

While large language models (LLMs) enable advanced use cases, they suffer from issues such as factual inconsistency and hallucination. Retrieval-augmented generation (RAG) improves their reliability by combining an LLM with external knowledge: relevant information is retrieved and added to the prompt context to help the model accomplish the task.
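
To make the retrieve-then-augment idea concrete, here is a toy sketch in plain Python. It is not the pipeline built in this tutorial: the word-overlap scoring, the helper names `retrieve` and `augment`, and the three sample documents are assumptions chosen only to keep the example self-contained.

```python
def retrieve(query, documents, k=2):
    """Rank documents by how many lowercase words they share with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    # sorted() is stable, so documents with equal scores keep their input order
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def augment(query, context_docs):
    """Place the retrieved documents in the prompt ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Made-up documents used only for this illustration
toy_documents = [
    "The Frozen North is a silent comedy",
    "Song of the West is a musical western",
    "Romance is a drama about an opera star",
]
toy_prompt = augment("western musical", retrieve("western musical", toy_documents, k=1))
print(toy_prompt)
```

In a real system the word-overlap scorer is replaced by embedding similarity in a vector store, and the augmented prompt is sent to an LLM.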

This tutorial shows how to get started with RAG using a vector store and open-source LLMs. To showcase RAG, we will build a system that suggests movies from the first 1,000 entries of a Wikipedia movies dataset. A user describes what they are looking for, the retriever finds the closest matching titles in the vector store, and the LLM turns those matches into suggestions.

Before getting started, let's first install the libraries we will use:

In [1]:
%%capture
!pip install chromadb tqdm fireworks-ai python-dotenv pandas
!pip install sentence-transformers
!pip install datasets

Let's download the dataset we will use:

In [2]:
from datasets import load_dataset
ds = load_dataset("Coder-Dragon/wikipedia-movies", split='train[:1000]')


Before continuing, you need a Fireworks API key to use the Mistral 7B model.

Check out this quick guide to obtain your Fireworks API key: https://readme.fireworks.ai/docs

In [3]:
import fireworks.client
import os
import dotenv
import chromadb
import json
from tqdm.auto import tqdm
import pandas as pd
import random
from google.colab import userdata

# read the API key from Colab secrets (add FIREWORKS_API_KEY in the Colab Secrets panel)
fireworks.client.api_key = userdata.get('FIREWORKS_API_KEY')
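
The cell above only works inside Colab, because `google.colab` is not importable elsewhere. As a hedged sketch, the helper below falls back to an environment variable when Colab secrets are unavailable. The function name `resolve_api_key` is hypothetical and not part of the Fireworks client.

```python
import os


def resolve_api_key(name="FIREWORKS_API_KEY"):
    """Return the key from Colab secrets if available, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not accessible
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in Colab secrets or the environment")
    return key
```

Usage would then be `fireworks.client.api_key = resolve_api_key()`. Since `python-dotenv` is installed above, calling `dotenv.load_dotenv()` beforehand would load the variable from a local `.env` file.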

## Getting Started

In [4]:
def get_completion(prompt, model=None, max_tokens=50):
    """Return the completion text from a Fireworks-hosted model (defaults to llama-v2-7b)."""

    fw_model_dir = "accounts/fireworks/models/"

    if model is None:
        model = fw_model_dir + "llama-v2-7b"
    else:
        model = fw_model_dir + model

    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return completion.choices[0].text

In [5]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

embed_fn = MyEmbeddingFunction()

# Initialize the on-disk chromadb directory and client.
client = chromadb.PersistentClient(path="./chromadb")

# Create the collection (or load it if it already exists)
collection = client.get_or_create_collection(
    name="wikipedia-movies",
    embedding_function=embed_fn,
)
collection

Collection(name=wikipedia-movies)
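
The embedding model maps each text to a vector, and retrieval then amounts to finding the stored vectors closest to the query vector. The sketch below illustrates that ranking step with cosine similarity on made-up three-dimensional vectors. The real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and the metric Chroma actually applies depends on how the collection is configured, so treat this purely as an illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
toy_vectors = {
    "arctic documentary": [0.9, 0.1, 0.0],
    "polar expedition film": [0.8, 0.2, 0.1],
    "western romance": [0.0, 0.2, 0.9],
}
query_vector = toy_vectors["arctic documentary"]

# Rank the other entries from most to least similar to the query
ranked = sorted(
    (name for name in toy_vectors if name != "arctic documentary"),
    key=lambda name: cosine_similarity(query_vector, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```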

We will now generate embeddings for the movie titles in batches and store them in the collection:

In [6]:
from hashlib import md5

# Generate embeddings, and index titles in batches
batch_size = 50

# loop through batches, then generate and store embeddings
for i in tqdm(range(0, len(ds), batch_size)):

    i_end = min(i + batch_size, len(ds))
    batch = ds[i : i_end]

    # Index the title text; keep title and plot as metadata for later use
    batch_titles = list(batch["Title"])
    # Hash the row position together with the title so duplicate titles
    # (e.g. "Romance") get distinct IDs and re-running the cell upserts the same rows
    batch_ids = [md5(f"{i + j}:{title}".encode()).hexdigest() for j, title in enumerate(batch["Title"])]
    batch_metadata = [dict(title=title, plot=plot) for title, plot in zip(batch["Title"], batch["Plot"])]

    # generate embeddings
    batch_embeddings = embedding_model.encode(batch_titles)

    # upsert to chromadb
    collection.upsert(
        ids=batch_ids,
        metadatas=batch_metadata,
        documents=batch_titles,
        embeddings=batch_embeddings.tolist(),
    )
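
The batching pattern used in the loop can be isolated into a small, testable sketch. The helper names `iter_batches` and `stable_id` are hypothetical, and hashing the row position together with the title is one assumed way to get IDs that are unique even for duplicate titles and identical across runs.

```python
from hashlib import md5


def iter_batches(items, batch_size):
    """Yield (start_index, batch) pairs that cover items in order."""
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def stable_id(position, title):
    """Hash the row position with the title so duplicate titles get distinct IDs."""
    return md5(f"{position}:{title}".encode()).hexdigest()


# Sample titles, including a duplicate, used only for this illustration
toy_titles = ["Romance", "Romance", "The Viking", "Song of the West", "The Frozen North"]
toy_batch_sizes = [len(batch) for _, batch in iter_batches(toy_titles, batch_size=2)]
toy_ids = [
    stable_id(start + offset, title)
    for start, batch in iter_batches(toy_titles, batch_size=2)
    for offset, title in enumerate(batch)
]
print(toy_batch_sizes, len(set(toy_ids)))
```

Because the IDs depend only on position and title, upserting the same data twice overwrites existing rows instead of adding new ones.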


Now we can test the retriever:

In [7]:
collection = client.get_or_create_collection(
    name="wikipedia-movies",
    embedding_function=embed_fn
)

retriever_results = collection.query(
    query_texts=[
        "Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions",
        "Western romance",
    ],
    n_results=5,
)

print(retriever_results["documents"])

[['The Frozen North', 'From Leadville to Aspen: A Hold-Up in the Rockies', 'The Ghost of Slumber Mountain', 'The Viking', 'The Call of the Wild'], ['The Road to Romance', 'Song of the West', 'Romance', 'Romance', 'A Romance of Happy Valley']]
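
The query result is a dict of nested lists, with one inner list per query text, which is why two lists of titles are printed above. The sketch below flattens that shape into one row per hit. It runs on a hand-written stand-in rather than a live collection: the `mock_results` distances and plots are placeholders, and `flatten_results` is a hypothetical helper.

```python
def flatten_results(query_texts, results):
    """Turn per-query nested lists into one flat list of row dicts."""
    rows = []
    for q_index, query_text in enumerate(query_texts):
        hits = zip(
            results["documents"][q_index],
            results["metadatas"][q_index],
            results["distances"][q_index],
        )
        for rank, (title, metadata, distance) in enumerate(hits, start=1):
            rows.append({
                "query": query_text,
                "rank": rank,
                "title": title,
                "plot": metadata["plot"],
                "distance": distance,
            })
    return rows


# Hand-written stand-in for a query result; distances and plots are placeholders
mock_results = {
    "documents": [["The Frozen North", "The Viking"], ["Song of the West"]],
    "metadatas": [
        [{"plot": "placeholder plot A"}, {"plot": "placeholder plot B"}],
        [{"plot": "placeholder plot C"}],
    ],
    "distances": [[0.41, 0.57], [0.33]],
}
rows = flatten_results(["arctic life", "western romance"], mock_results)
for row in rows:
    print(row["query"], row["rank"], row["title"], row["distance"])
```

With the real `retriever_results`, passing the flattened rows to `pd.DataFrame(rows)` gives a table that is easier to inspect than the nested lists.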


In [12]:
mistral_llm = "mistral-7b-instruct-4k"

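def RAG(query):
  # retrieve the five titles closest to the user query
  results = collection.query(
      query_texts=[query],
      n_results=5,
  )

  # Titles only, one per line. Note that the prompt below interpolates the full
  # `results` dict (IDs, distances, plot metadata), not this string.
  titles = '\n'.join(results['documents'][0])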

  prompt_template = f'''[INST]

  Your goal is to helps users discover relevant movies and shows based on their QUERY from SUGGESTIONS

  You should give results based on QUERY and PLEASE ONLY include entries from SUGGESTIONS in the SUGGESTED_MOVIES.

  QUERY: {query}

  SUGGESTIONS: {results}

  SUGGESTED_MOVIES:

  [/INST]
  '''

  # get_completion returns the completion text as a single string
  suggested_titles = get_completion(prompt_template, model=mistral_llm, max_tokens=2000)

  # Print the suggestions.
  print("Model Suggestions:")
  print(suggested_titles)
  print("\n\n\nPrompt Template:")
  print(prompt_template)
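
The `RAG` function interpolates the entire `results` dict into the prompt, so the model also receives IDs and other fields it does not need. As a sketch of a more compact alternative, the helper below formats only titles and truncated plots. The name `build_prompt`, the `max_plot_chars` cutoff, and the exact wording are assumptions, and changing the prompt will likely change the model's output.

```python
def build_prompt(query, titles, plots, max_plot_chars=200):
    """Format retrieved titles and truncated plots into an instruct-style prompt."""
    suggestions = "\n".join(
        f"- {title}: {plot[:max_plot_chars]}" for title, plot in zip(titles, plots)
    )
    return (
        "[INST]\n"
        "Your goal is to help users discover relevant movies based on their QUERY from SUGGESTIONS.\n"
        "Only include entries from SUGGESTIONS in SUGGESTED_MOVIES.\n\n"
        f"QUERY: {query}\n\n"
        f"SUGGESTIONS:\n{suggestions}\n\n"
        "SUGGESTED_MOVIES:\n"
        "[/INST]"
    )


# Placeholder plots used only for this illustration
example_prompt = build_prompt(
    "Western romance",
    ["Song of the West", "The Road to Romance"],
    ["placeholder plot one", "placeholder plot two"],
    max_plot_chars=16,
)
print(example_prompt)
```

Inside `RAG`, the inputs would come from `results['documents'][0]` and the `plot` field of each entry in `results['metadatas'][0]`.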

In [13]:
queries = [
    """Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions""",
    """Western romance""",
    """Silent film about a Parisian star moving to Egypt, leaving her husband for a baron, and later reconciling after finding her family in poverty in Cairo.""",
    """Comedy film, office disguises, boss's daughter, elopement.""",
    """Lost film, Cleopatra charms Caesar, plots world rule, treasures from mummy, revels with Antony, tragic end with serpent in Alexandria.""",
    "Denis Gage Deane-Tanner"
]

for query in queries:
  RAG(query)
  print("\n\n")

Model Suggestions:

  Based on your query, here are some suggested movies and shows that showcase indigenous peoples' survival and daily life in Arctic regions:

  
  - "The Frozen North" (1922)
  
  - "From Leadville to Aspen: A Hold-Up in the Rockies" (1923)
  
  - "The Ghost of Slumber Mountain" (1923)
  
  - "The Viking" (1924)
  
  - "The Call of the Wild" (1923)
  
  These movies and shows are all part of the SUGGESTIONS provided, and they showcase the survival and daily life of indigenous peoples in Arctic regions.



Prompt Template:
[INST]

  Your goal is to helps users discover relevant movies and shows based on their QUERY from SUGGESTIONS

  You should give results based on QUERY and PLEASE ONLY include entries from SUGGESTIONS in the SUGGESTED_MOVIES.

  QUERY: Documentaries showcasing indigenous peoples' survival and daily life in Arctic regions

  SUGGESTIONS: {'ids': [['e86721d9ee8312808cf54263c7ccb9d5', 'd8b2051278a0f9af70c2fab392f66ea4', 'cf939d82e6ed4225322484ac2325b