# Getting started with RAG
Following this guide: https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/pe-rag.ipynb


In [1]:
import fireworks.client
import dotenv
import chromadb
import json
from tqdm.auto import tqdm
import pandas as pd
import random

config = dotenv.dotenv_values(".env")

fireworks.client.api_key = config['FIREWORKS_API_KEY']

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
def get_completion(prompt, model=None, max_tokens=50):

    fw_model_dir = "accounts/fireworks/models/"

    if model is None:
        model = fw_model_dir + "llama-v2-7b"
    else:
        model = fw_model_dir + model

    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return completion.choices[0].text

In [7]:
get_completion("Hello, my name is")

' Katie and I am a 20 year old student at the University of Leeds. I am currently studying a BA in English Literature and Creative Writing. I have been working as a tutor for over 3 years now and I'

In [8]:
mistral_llm = "mistral-7b-instruct-4k"

get_completion("Hello, my name is", model=mistral_llm)


' [Your Name]. I am a [Your Profession/Occupation]. I am writing to [Purpose of Writing].\n\nI am writing to [Purpose of Writing] because [Reason for Writing]. I believe that ['

In [9]:
mistral_llm = "mistral-7b-instruct-4k"

get_completion("Tell me 2 jokes", model=mistral_llm)

".\n1. Why don't scientists trust atoms? Because they make up everything!\n2. Did you hear about the mathematician who’s afraid of negative numbers? He will stop at nothing to avoid them."

In [11]:
prompt = """[INST]
Given the following wedding guest data, write a very short 3-sentences thank you letter:

{
  "name": "John Doe",
  "relationship": "Bride's cousin",
  "hometown": "New York, NY",
  "fun_fact": "Climbed Mount Everest in 2020",
  "attending_with": "Sophia Smith",
  "bride_groom_name": "Tom and Mary"
}

Use only the data provided in the JSON object above.

The senders of the letter is the bride and groom, Tom and Mary.
[/INST]"""

response = get_completion(prompt, model=mistral_llm, max_tokens=150)
print(response)

 Dear John Doe,

We, Tom and Mary, would like to extend our heartfelt gratitude for your attendance at our wedding. It was a pleasure to have you there, and we truly appreciate the effort you made to be a part of our special day.

We were thrilled to learn about your fun fact - climbing Mount Everest is an incredible accomplishment! We hope you had a safe and memorable journey.

Thank you again for joining us on this special occasion. We hope to stay in touch and catch up on all the amazing things you've been up to.

With love,

Tom and Mary


# RAG Use Case: Generating Short Paper Titles

In [3]:
# From https://github.com/dair-ai/ML-Papers-of-the-Week/tree/main/research
ml_papers = pd.read_csv("./data/ml-potw-10232023.csv", header=0)

# remove rows with empty titles or descriptions
ml_papers = ml_papers.dropna(subset=["Title", "Description"])

In [14]:
ml_papers.head()

Unnamed: 0,Title,Description,PaperURL,TweetURL,Abstract
0,Llemma,an LLM for mathematics which is based on conti...,https://arxiv.org/abs/2310.10631,https://x.com/zhangir_azerbay/status/171409802...,"We present Llemma, a large language model for ..."
1,LLMs for Software Engineering,a comprehensive survey of LLMs for software en...,https://arxiv.org/abs/2310.03533,https://x.com/omarsar0/status/1713940983199506...,This paper provides a survey of the emerging a...
2,Self-RAG,presents a new retrieval-augmented framework t...,https://arxiv.org/abs/2310.11511,https://x.com/AkariAsai/status/171511027707796...,"Despite their remarkable capabilities, large l..."
3,Retrieval-Augmentation for Long-form Question ...,explores retrieval-augmented language models o...,https://arxiv.org/abs/2310.12150,https://x.com/omarsar0/status/1714986431859282...,We present a study of retrieval-augmented lang...
4,GenBench,presents a framework for characterizing and un...,https://www.nature.com/articles/s42256-023-007...,https://x.com/AIatMeta/status/1715041427283902...,


In [4]:
# convert dataframe to list of dicts with Title and Description columns only

ml_papers_dict = ml_papers.to_dict(orient="records")

In [18]:
ml_papers_dict[:2]

[{'Title': 'Llemma',
  'Description': 'an LLM for mathematics which is based on continued pretraining from Code Llama on the Proof-Pile-2 dataset; the dataset involves scientific paper, web data containing mathematics, and mathematical code; Llemma outperforms open base models and the unreleased Minerva on the MATH benchmark; the model is released, including dataset and code to replicate experiments.',
  'PaperURL': 'https://arxiv.org/abs/2310.10631',
  'TweetURL': 'https://x.com/zhangir_azerbay/status/1714098025956864031?s=20',
  'Abstract': 'We present Llemma, a large language model for mathematics. We continue pretraining Code Llama on the Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding Llemma. On the MATH benchmark Llemma outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis. Moreover, Llemma is capable of tool use and formal theorem proving without any further 

In [5]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

embed_fn = MyEmbeddingFunction()

# Initialize the chromadb directory, and client.
client = chromadb.PersistentClient(path="./chromadb")

# create collection
collection = client.get_or_create_collection(
    name=f"ml-papers-nov-2023"
)

In [6]:
# Generate embeddings, and index titles in batches
batch_size = 50

# loop through batches and generate + store embeddings
for i in tqdm(range(0, len(ml_papers_dict), batch_size)):

    i_end = min(i + batch_size, len(ml_papers_dict))
    batch = ml_papers_dict[i : i + batch_size]

    # Replace title with "No Title" if empty string
    batch_titles = [str(paper["Title"]) if str(paper["Title"]) != "" else "No Title" for paper in batch]
    batch_ids = [str(sum(ord(c) + random.randint(1, 10000) for c in paper["Title"])) for paper in batch]
    batch_metadata = [
        dict(
            url=paper["PaperURL"],
            abstract=paper['Abstract']
        )
        for paper in batch
    ]

    # generate embeddings
    # batch_embeddings = embedding_model.encode(batch_titles)
    batch_embeddings = embedding_model.encode([paper['Abstract'] for paper in batch])

    # upsert to chromadb
    collection.upsert(
        ids=batch_ids,
        metadatas=batch_metadata,
        documents=batch_titles,
        embeddings=batch_embeddings.tolist(),
    )

  0%|          | 0/9 [00:00<?, ?it/s]

100%|██████████| 9/9 [00:25<00:00,  2.81s/it]


In [8]:
collection = client.get_or_create_collection(
    name=f"ml-papers-nov-2023",
    embedding_function=embed_fn
)

retriever_results = collection.query(
    query_texts=["Software Engineering"],
    n_results=5,
)

print(retriever_results["documents"])

[['LLMs for Software Engineering', 'Communicative Agents for Software Development', 'LLMs for Software Engineering', 'LLMs for Software Engineering', 'Generative AI for Programming Education']]


In [25]:
# user query
user_query = "S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models"

# query for user query
results = collection.query(
    query_texts=[user_query],
    n_results=10,
)

results['metadatas']

[[{'abstract': 'How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce \\textit{Pythia}, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend \\textit{Pythia} to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights toward LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at \\url{this https URL}.',
   'url': 'https://arxiv.org/abs/2304.01373'},
  {'abst

In [10]:
# concatenate titles into a single string
short_titles = '\n'.join(results['documents'][0])

prompt_template = f'''[INST]

Your main task is to generate 5 SUGGESTED_TITLES based for the PAPER_TITLE

You should mimic a similar style and length as SHORT_TITLES but PLEASE DO NOT include titles from SHORT_TITLES in the SUGGESTED_TITLES, only generate versions of the PAPER_TITLE.

PAPER_TITLE: {user_query}

SHORT_TITLES: {short_titles}

SUGGESTED_TITLES:

[/INST]
'''

responses = get_completion(prompt_template, model="mistral-7b-instruct-4k", max_tokens=2000)
suggested_titles = ''.join([str(r) for r in responses])

# Print the suggestions.
print("Model Suggestions:")
print(suggested_titles)
print("\n\n\nPrompt Template:")
print(prompt_template)

NameError: name 'results' is not defined

In [36]:
# user query
user_query = "Self-driving car autonomous vehicle"

# query for user query
results = collection.query(
    query_texts=[user_query],
    n_results=10,
)

results['documents']

[['A Survey on LLM-based Autonomous Agents',
  'AutoRobotics-Zero',
  'Reflexion: an autonomous agent with dynamic memory and self-reflection',
  'A Cookbook of Self-Supervised Learning',
  'Robot Parkour Learning',
  'Robot Parkour Learning',
  'Emergence of Maps in the Memories of Blind Navigation Agents',
  'Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control',
  'Self-Check',
  'AutoMix']]

In [29]:
responses

'\n1. S3Eval: A Comprehensive Evaluation Suite for Large Language Models\n2. Synthetic and Scalable Evaluation for Large Language Models\n3. Systematic Evaluation of Large Language Models with S3Eval\n4. S3Eval: A Synthetic and Scalable Approach to Language Model Evaluation\n5. S3Eval: A Synthetic and Scalable Evaluation Suite for Large Language Models'

In [30]:
user_query = {"USER_QUERY": "What is a language model that can find relevant documents given a text prompt?"}

prompt_template = f"""[INST] Please generate a single keyword-based search string in JSON format (with the key "SEARCH_STRING") based on the following user query:
{json.dumps(user_query)} [INST]"""

response = get_completion(prompt_template, model="mistral-7b-instruct-4k", max_tokens=2000)

print("\nResponse:")
print(response)


Response:
 {
  "SEARCH_STRING": "language model document retrieval"
}


In [32]:
response_json = response[response.index('{'):]
response_json = json.loads(response_json)
response_json

{'SEARCH_STRING': 'language model document retrieval'}

In [33]:
results = collection.query(
    # query_texts=["RAG LLM text retrieval search"],
    query_texts=[response_json['SEARCH_STRING']],
    n_results=5,
)

documents = []
for i in range(5):
    document = {
        "title": results['documents'][0][i],
        "abstract": results['metadatas'][0][i]['abstract']
    }
    documents.append(document)

prompt_json = {
    "USER_QUERY": user_query,
    "DOCUMENTS": documents
}

prompt_template = \
f'''[INST] Using your knowledge and the following 5 documents, please answer the USER_QUERY to the best of your ability if possible. The information is provided below in JSON format.

{json.dumps(prompt_json)}
[/INST]
'''

print("Search results:")
print(results['documents'])

print("\nPrompt:")
print(prompt_template)

Search results:
[['Rethinking with Retrieval: Faithful Large Language Model Inference', 'REPLUG: Retrieval-Augmented Black-Box Language Models', 'Long-range Language Modeling with Self-Retrieval', 'REPLUG: Retrieval-Augmented Black-Box Language Models', 'REPLUG: Retrieval-Augmented Black-Box Language Models']]

Prompt:
[INST] Using your knowledge and the following 5 documents, please answer the USER_QUERY to the best of your ability if possible. The information is provided below in JSON format.

{"USER_QUERY": {"USER_QUERY": "What is a language model that can find relevant documents given a text prompt?"}, "DOCUMENTS": [{"title": "Rethinking with Retrieval: Faithful Large Language Model Inference", "abstract": "Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the stored knowledge in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize external knowledge to assist LLMs. Unfortunate

In [34]:
response = get_completion(prompt_template, model="mistral-7b-instruct-4k", max_tokens=2000)

print("\nResponse:")
print(response)


Response:

A language model that can find relevant documents given a text prompt is a retrieval-augmented language model. Retrieval-augmented language models use external knowledge to assist language models in making predictions. The external knowledge is retrieved based on the decomposed reasoning steps obtained from the chain-of-thought (CoT) prompting. One example of a retrieval-augmented language model is REPLUG, which treats the language model as a black box and augments it with a tuneable retrieval model. REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. The LM can be used to supervise the retrieval model, which can then find documents that help the LM make better predictions.
