# Part 2: Simple RAG with Fireworks

### Overview of the Notebook

This notebook demonstrates how to build and test a Retrieval-Augmented Generation (RAG) system using **ChromaDB** as the vector store, **Sentence Transformers** to generate embeddings, and **Fireworks** models for language generation. It walks through data preparation, embedding generation, and querying, and concludes by showing how retrieved information can be used to augment model responses.

### Key Components:

1. **Data Preparation and Chunking**:
   - The notebook starts by reshaping wide text datasets (such as League of Legends lore) into long format and splitting long pieces of text into manageable chunks for embedding. Each chunk is assigned a unique ID derived from its context (name, field, chunk index, category, and URL).

2. **ChromaDB for Vector Storage**:
   - **ChromaDB** is used as a vector database to store document embeddings and metadata. Each chunk is encoded into an embedding with **Sentence Transformers** and upserted (inserted, or updated if its ID already exists) into a ChromaDB collection.
   - The vector store allows for fast similarity search: queries can retrieve relevant text chunks and their associated metadata from a large corpus, based on their vector representations.

3. **Fireworks for Model Completions**:
   - The **Fireworks** platform is used for generating language model responses in the RAG system. Once relevant chunks of text are retrieved from ChromaDB based on a query, they are passed as context to the Fireworks models.
   - The notebook iterates through several **Fireworks**-hosted models (Llama 3 8B Instruct, Gemma 2 9B, Mixtral 8x7B Instruct, and Yi-Large) to generate responses based on user queries and retrieved data.

4. **Retrieval-Augmented Generation (RAG)**:
   - The system retrieves relevant documents from ChromaDB using embeddings and metadata. This context is then used to augment the responses generated by the Fireworks models.
   - By combining retrieval (from ChromaDB) and generation (from Fireworks), the system can answer questions more effectively by referencing a specific, relevant knowledge base.
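
The retrieve-then-generate loop described in items 2 and 4 can be sketched without ChromaDB or a real embedding model. The toy example below ranks a few hand-made vectors by cosine similarity and pastes the best matches into a prompt. The vectors, the `retrieve_top_k` helper, and the prompt wording are illustrative assumptions, not part of the notebook. ChromaDB does the same ranking job at scale, using an approximate nearest-neighbor (HNSW) index and, by default, an L2 distance rather than this exhaustive cosine scan.

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose vectors are most cosine-similar to the query."""
    query = np.asarray(query_vec, dtype=float)
    matrix = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product of L2-normalized vectors.
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = matrix @ query
    best = np.argsort(scores)[::-1][:k]  # indices of the highest scores first
    return [(docs[i], float(scores[i])) for i in best]

# Toy 3-dimensional "embeddings" (hypothetical; all-MiniLM-L6-v2 vectors have 384 dimensions).
docs = [
    "Piltover is a city of progress and invention.",
    "Zaun is the polluted undercity beneath Piltover.",
    "Jinx loves explosions.",
]
doc_vecs = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.0, 0.2, 0.9]]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "history of Piltover"

hits = retrieve_top_k(query_vec, doc_vecs, docs, k=2)
context = "\n".join(doc for doc, _ in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: Tell me about the history of Piltover"
print(prompt)
```

Only the two Piltover-related sentences reach the prompt; the unrelated third document is filtered out by the similarity ranking.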

### How Fireworks Fits into the RAG System

In this Retrieval-Augmented Generation system:
- **ChromaDB** is responsible for storing and retrieving contextually relevant information.
- **Fireworks** models generate human-like text responses based on the retrieved information, providing answers that are enriched by the context found in the data.
- Fireworks lets users choose among different models, compare their outputs, and tailor the generated responses to the retrieved data, all through the same API.

This notebook illustrates how combining ChromaDB for retrieval with Fireworks models for generation grounds answers in a specific knowledge base, making them more accurate and contextually relevant.


## Steps:

1. **Environment Setup**:
   - Load environment variables, including the Fireworks API key, to authenticate API access.
   - Set up dependencies and initialize key components such as the Fireworks client and the embedding model.

2. **Data Preparation and Reshaping**:
   - Load the dataset (e.g., League of Legends lore) from JSON files.
   - Reshape the data from a wide format to a long format, where each row corresponds to a specific character, field (like Background or Abilities), and chunk of text. Long fields are split into smaller, manageable pieces.

3. **Embedding Generation**:
   - Use **Sentence Transformers** to convert each chunk of text into a vector (embedding). These embeddings will allow us to perform similarity searches later.
   - Chunked data is processed in batches, and embeddings are generated for each batch.

4. **Upserting Data into ChromaDB**:
   - Each data chunk, along with its associated metadata (like category, URL), is upserted into **ChromaDB**, which stores these embeddings and metadata for fast similarity searches.
   - Unique IDs are generated for each chunk, ensuring that each entry in the vector database is distinct.

5. **Querying the Vector Store**:
   - A query (e.g., "Tell me about the history of Piltover") is transformed into an embedding and used to search for the most relevant text chunks stored in ChromaDB.
   - The top N results (text chunks and their metadata) are retrieved based on similarity to the query embedding.

6. **Generating Model Responses with Fireworks**:
   - The retrieved text chunks from ChromaDB serve as context for the language models. This context is used to augment the model’s responses.
   - Multiple **Fireworks** models (Llama 3, Gemma 2, Mixtral, and Yi-Large) are iterated over, each generating a response from the retrieved context and the user's query.

7. **Comparison of Model Outputs**:
   - The notebook allows you to compare outputs from different models, making it easy to assess which model performs best for a given task or query.
   - Each model's output is printed in turn, separated by a divider line, so the responses can be compared one after another.

8. **Final Query Results**:
   - The generated responses and their associated metadata are printed, showing how the system retrieves and generates relevant information based on user queries.

# Step 1: Environment Setup

- Load environment variables, including the Fireworks API key, to authenticate API access.
- Set up dependencies and initialize key components such as the Fireworks client and the embedding model.

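Reminder of what these libraries do:

- Data handling: pandas, json
- File and network utilities: os, shutil, urllib.request
- Configuration: dotenv (`load_dotenv` reads the `.env` file)
- AI/ML: fireworks.client, chromadb
- Output formatting: tqdm (progress bars), prettytable (text tables)
- Utilities: random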

Reminder:
- chromadb: ChromaDB is a vector database used for storing and retrieving vector embeddings. It’s often used in retrieval-augmented generation (RAG) applications for similarity search.

- fireworks.client: The Fireworks AI Python client, used here to send prompts to Fireworks-hosted language models and receive their completions. (The Fireworks API also offers embedding models, but this notebook uses Sentence Transformers for embeddings.)

In [1]:
import urllib.request
import os
import shutil
import fireworks.client
import dotenv
import chromadb
import json
from tqdm import tqdm
import pandas as pd
import random
from dotenv import load_dotenv
from prettytable import PrettyTable


### Load Fireworks key

In [2]:
# Specify the path to the .env file in the env/ directory
dotenv_path = "../env/.env"

# Load the .env file from the specified path
load_dotenv(dotenv_path)

# Get the Fireworks API key from the environment variable
fireworks_api_key = os.getenv("FIREWORKS_API_KEY")

if not fireworks_api_key:
    raise ValueError("No API key found in the .env file. Please add your FIREWORKS_API_KEY to the .env file.")

# Set the Fireworks API key
fireworks.client.api_key = fireworks_api_key

# Step 1B: Connect to Fireworks inference APIs

Function Purpose: 
- `get_completion()` is designed to send a prompt to a specific Fireworks model and return the generated text.

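Beyond `model`, `prompt`, and `max_tokens`, you can pass additional parameters to tune generation. Note that `get_completion()` below calls the text completions API (`Completion.create`) rather than `/chat/completions`; both APIs accept parameters such as these: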

1. **`temperature`**: Controls the randomness of the response. Lower values (e.g., 0.1) make the output more focused and deterministic, while higher values (e.g., 0.9) introduce more randomness.
   
2. **`top_p`**: Implements nucleus sampling. Only the smallest set of most-likely tokens whose cumulative probability reaches `p` is considered for the next token, providing an alternative to `temperature` for controlling randomness. A value of 1 includes all tokens, while lower values restrict the options.

3. **`stop`**: A list of sequences where the API will stop generating further tokens. This is useful to end the output at specific words or phrases.

4. **`presence_penalty`**: A positive value penalizes any token that has already appeared in the text, regardless of how often, which nudges the model toward new topics and words.

5. **`frequency_penalty`**: Penalizes tokens in proportion to how many times they have already appeared, reducing verbatim repetition of words or phrases. Positive values make repeated words less likely.

6. **`n`**: Specifies the number of completions to generate for the prompt. Setting this to a value greater than 1 will return multiple completion options.

7. **`logprobs`**: When set, returns the log probabilities of each token, allowing for more detailed analysis of the model's token selection process.

8. **`user`**: A unique identifier for the end user making the request, which can help with monitoring and abuse detection. It does not change how the model responds.
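
To make items 1, 2, 4, and 5 concrete, here is a small NumPy sketch of how a sampler could apply them to one step's logits. It follows the commonly documented OpenAI-style formulation (frequency penalty scaled by the token's count, presence penalty applied once per seen token). This is an assumption about the general technique, not Fireworks' server-side code, and `apply_penalties` and `next_token_probs` are hypothetical helpers.

```python
import numpy as np

def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Lower the logits of tokens that already appeared (assumed OpenAI-style formula).

    counts[i] is how many times token i has appeared in the text so far.
    """
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # frequency_penalty grows with every repeat; presence_penalty is a flat, one-time hit.
    return logits - frequency_penalty * counts - presence_penalty * (counts > 0)

def next_token_probs(logits, temperature=1.0, top_p=1.0):
    """Convert logits to sampling probabilities using temperature and nucleus (top-p) filtering."""
    logits = np.asarray(logits, dtype=float)
    if temperature == 0:
        # temperature=0 is treated as greedy decoding: always pick the top token.
        probs = np.zeros_like(logits)
        probs[np.argmax(logits)] = 1.0
        return probs
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    # Keep the smallest set of most-likely tokens whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()

logits = [2.0, 1.0, 0.0, -1.0]
print(next_token_probs(logits, temperature=0.1))  # almost all mass on token 0
print(next_token_probs(logits, temperature=2.0))  # flatter distribution
print(next_token_probs(logits, top_p=0.8))        # only tokens 0 and 1 survive
print(apply_penalties(logits, counts=[3, 0, 0, 0], presence_penalty=0.5, frequency_penalty=0.5))
```

Lower temperatures sharpen the distribution, `top_p` discards the long tail before sampling, and the penalties push down a token that has already been used three times.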

For more information, check out the completion API docs:
- https://docs.fireworks.ai/api-reference/post-chatcompletions

In [3]:
# Define the function to get the completion from Fireworks models
def get_completion(prompt, model, max_tokens=50):
    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )

    return completion.choices[0].text

The full list of serverless models is available at https://fireworks.ai/models?show=Serverless

In [4]:
# Define a list of models to iterate through (using full model names)
models = [
    "accounts/fireworks/models/llama-v3-8b-instruct", 
    "accounts/fireworks/models/gemma2-9b-it",
    "accounts/fireworks/models/mixtral-8x7b-instruct",
    "accounts/yi-01-ai/models/yi-large"
]

### Simple Prompt

In [5]:
simple_prompt = "Tell me your best joke"

In [6]:
# Iterate through each model, run the prompt, and print the results
for model in models:
    response = get_completion(simple_prompt, model=model, max_tokens=80)
    print(f"Model: {model}")
    print(f"Response:\n{response}")
    print("\n" + "-"*80 + "\n")


Model: accounts/fireworks/models/llama-v3-8b-instruct
Response:
 about a chicken.
I've got one! Why did the chicken go to the doctor?
Because it had a fowl cough! (get it? fowl, like a chicken, but also a play on the word "foul" cough? ahh, I slay me!) What do you think? Is it egg-cellent? (okay, I'll stop with the chicken puns

--------------------------------------------------------------------------------

Model: accounts/fireworks/models/gemma2-9b-it
Response:
!

As a large language model, I don't really "get" jokes the way humans do. I can recognize patterns and understand wordplay, but I don't have the same emotional context or sense of humor.

However, I can tell you a classic joke:

Why don't scientists trust atoms?

Because they make up everything!

Let me know if you'

--------------------------------------------------------------------------------

Model: accounts/fireworks/models/mixtral-8x7b-instruct
Response:
.

I'm a simple man. I see, I write. Here's a joke for you: Why
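
Notice that several responses begin by continuing the prompt (" about a chicken.", "!", "."). `Completion.create` sends the prompt as raw text, so the model treats it as the start of a document rather than as a user turn. A chat-style call lets Fireworks wrap the prompt in each model's chat template. The sketch below assumes the `fireworks.client.ChatCompletion.create` interface of the `fireworks-ai` package (verify against the chat completions docs linked above); `get_chat_completion` and its `client` parameter are hypothetical.

```python
def get_chat_completion(prompt, model, max_tokens=50, client=None):
    """Send `prompt` as a single user message so the model's chat template is applied.

    `client` defaults to the `fireworks.client` module configured in Step 1; passing
    another object with the same `ChatCompletion.create` interface is handy for tests.
    """
    if client is None:
        import fireworks.client as client  # uses the api_key set earlier
    completion = client.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        temperature=0,
    )
    # Chat responses carry text in message.content instead of .text
    return completion.choices[0].message.content
```

In the notebook it would be used like `get_completion`, e.g. `get_chat_completion(simple_prompt, model=models[0], max_tokens=80)`.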

### Complicated Prompt

Context: You’re responding to three different publishers, each with a unique rejection reason:

- Literary House Publishing: Rejected due to the concept being too niche.
- Sunset Press: Rejected because they aren’t accepting submissions in the genre.
- Ocean Blue Books: Rejected because the manuscript doesn’t fit their readership.

Goal: The model should generate three different letters—one for each publisher—while keeping the tone polite and positive.

In [7]:
complicated_prompt = """
You are responding to three publishers who have each rejected your manuscript about a talking dog. Write a very short, polite 3-sentence letter to each publisher, acknowledging the rejection but expressing your belief in the potential of the story:

[
  {
    "manuscript_title": "The Adventures of Barkley the Talking Dog",
    "publisher_name": "Literary House Publishing",
    "rejection_reason": "The concept is too niche for our current catalog.",
    "author_name": "Jane Doe"
  },
  {
    "manuscript_title": "The Adventures of Barkley the Talking Dog",
    "publisher_name": "Sunset Press",
    "rejection_reason": "We are not currently accepting submissions in this genre.",
    "author_name": "Jane Doe"
  },
  {
    "manuscript_title": "The Adventures of Barkley the Talking Dog",
    "publisher_name": "Ocean Blue Books",
    "rejection_reason": "While well-written, we don't feel it fits with our readership.",
    "author_name": "Jane Doe"
  }
]

Write separate, polite letters for each publisher. Use only the data provided in the JSON objects above.

The author of the letters is Jane Doe.
"""


In [8]:
# Iterate through each model, run the prompt, and print the results
for model in models:
    response = get_completion(complicated_prompt, model=model, max_tokens=150)
    print(f"Model: {model}")
    print(f"Response:\n{response}")
    print("\n" + "-"*80 + "\n")

Model: accounts/fireworks/models/llama-v3-8b-instruct
Response:
The letters should be:

* Short (3 sentences)
* Polite
* Expressing a belief in the potential of the story

Here are the letters:

Dear Literary House Publishing,

Thank you for considering my manuscript, "The Adventures of Barkley the Talking Dog". I understand that the concept may be too niche for your current catalog, but I believe that the story's unique blend of humor and heart has the potential to resonate with readers. I will continue to seek out opportunities to share Barkley's adventures with the world.

Sincerely,
Jane Doe

Dear Sunset Press,

Thank you for your time and consideration of my manuscript, "The Adventures of Barkley the Talking Dog". I appreciate your honesty in letting me know that you are not currently

--------------------------------------------------------------------------------

Model: accounts/fireworks/models/gemma2-9b-it
Response:
 


## Letters to Publishers:

**Literary House Publishing:*

# Step 2: Data Preparation and Reshaping

For this notebook, we'll expand to a larger use case: a Q&A RAG application for League of Legends, with a focus on the show Arcane.

In this section we'll process three wide-format JSON files:
- Arcane character profiles: `arcane_characters_data.json`
- LoL champion profiles: `lol_champion_data.json`
- Locations information: `lol_geography_data.json`

into a long-format structure suitable for text embedding and retrieval tasks.

The steps consist of:

- Initial Setup and Helper Functions: We start by defining functions to chunk long text fields and generate unique identifiers for each chunk.

- Reshaping the Data: The datasets are reshaped from a wide format (with multiple fields like "Background" or "Personality") into a long format where each row corresponds to a single chunk of text from a particular field. Long text fields are split into manageable chunks to ensure efficient handling during embedding generation.

- Combining and Preparing the Data: The reshaped datasets are combined into a single DataFrame, which is then converted into a list of dictionaries for downstream embedding and querying tasks.


This pipeline ensures that long-form textual data is prepared in a format that facilitates efficient storage and retrieval in tasks such as embedding generation, similarity search, and question-answering models.

### Imports and Helper Functions

This block imports necessary libraries (pandas for data handling and hashlib for generating unique identifiers). Two helper functions are defined:

- `generate_unique_id`: Takes a chunk's name, field, chunk index, category, and URL, joins them into one string, and returns its MD5 hex digest. The same inputs always produce the same ID, so re-running the pipeline updates existing entries instead of duplicating them.
- `chunk_text`: Splits long text into chunks of `chunk_size` words (the last chunk may be shorter). It returns an empty list when the text is None, empty, or not a string.
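
`chunk_text` cuts on hard word boundaries, so a sentence that straddles two chunks is split in half. A common variant, not used in this notebook, repeats a few words between neighboring chunks. The hypothetical `chunk_text_with_overlap` below keeps the same None/empty handling; because `generate_unique_id` already includes the chunk index, overlapping chunks would still receive distinct IDs.

```python
def chunk_text_with_overlap(text, chunk_size=256, overlap=32):
    """Split text into chunks of up to chunk_size words, repeating `overlap` words between neighbors."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text or not isinstance(text, str):
        return []  # same None/empty handling as chunk_text
    words = text.split()
    if not words:
        return []  # whitespace-only text
    step = chunk_size - overlap
    # Stop before any start position whose chunk would fall entirely inside the previous one.
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]

print(chunk_text_with_overlap("a b c d e f g", chunk_size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

With `overlap=0` it returns exactly the same chunks as `chunk_text`.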

In [9]:
import pandas as pd
import hashlib

# Function to generate a unique ID based on the name, field, and chunk index
def generate_unique_id(name, field_name, chunk_idx, category, url):
    unique_string = f"{name}_{field_name}_{chunk_idx}_{category}_{url}"
    return hashlib.md5(unique_string.encode()).hexdigest()

# Function to chunk long text into smaller pieces
def chunk_text(text, chunk_size=256):
    """
    Splits text into chunks of approximately chunk_size words.
    Adjust chunk_size based on the number of tokens/words.
    """
    if not text or not isinstance(text, str):
        return []  # Return an empty list if the text is None or not a string
    
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

### Reshape Function
This block defines the core function `reshape_to_long_format_with_chunking`. 
It processes the input data dictionary by converting wide-format fields (e.g., `Background`, `Appearance`) into long-format rows. 
For each field that contains text, it further splits the text into chunks using the `chunk_text` function and generates a unique ID for each chunk using `generate_unique_id`. 
The processed rows are then stored in a list that will later be converted into a DataFrame.
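
For comparison, the same reshape can be expressed with pandas' built-in `melt` and `explode`. This is an alternative sketch, not the notebook's implementation: `reshape_with_melt` is a hypothetical name, rows come out grouped by field rather than by record, and it assumes each record is a flat dict as produced by `to_dict(orient="records")`. The IDs use the same recipe as `generate_unique_id`, so they match the loop version.

```python
import hashlib
import pandas as pd

def reshape_with_melt(records, fields, chunk_size=256):
    """Wide-to-long reshape plus word chunking using pandas melt/explode."""
    wide = pd.DataFrame(records)
    id_cols = ["Name", "Category", "URL"]
    for col in id_cols:
        if col not in wide.columns:
            wide[col] = ""  # mirror item.get(col, "") in the loop version
    wide[id_cols] = wide[id_cols].fillna("")
    present = [f for f in fields if f in wide.columns]

    # One row per (record, field) pair.
    long = wide.melt(id_vars=id_cols, value_vars=present,
                     var_name="Field_name", value_name="Field_value")
    # Keep only non-empty strings (drops None/NaN and blank fields).
    is_text = long["Field_value"].apply(lambda v: isinstance(v, str) and v.strip() != "")
    long = long[is_text].copy()

    # Turn each value into a list of word chunks, then give each chunk its own row.
    long["Field_value"] = long["Field_value"].apply(
        lambda t: [" ".join(t.split()[i:i + chunk_size]) for i in range(0, len(t.split()), chunk_size)]
    )
    long = long.explode("Field_value", ignore_index=True)
    long["chunk_index"] = long.groupby(id_cols + ["Field_name"]).cumcount()

    # Same ID recipe as generate_unique_id: name_field_chunk_category_url, MD5-hashed.
    long["unique_id"] = [
        hashlib.md5(f"{n}_{f}_{i}_{c}_{u}".encode()).hexdigest()
        for n, f, i, c, u in zip(long["Name"], long["Field_name"], long["chunk_index"],
                                 long["Category"], long["URL"])
    ]
    return long

demo = [{"Name": "Amara", "Category": "", "URL": "u", "Background": "a b c", "Appearance": None}]
print(reshape_with_melt(demo, ["Background", "Appearance", "Lore"], chunk_size=2))
```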

In [10]:
# Function to reshape a dataset from wide to long format and chunk long text
def reshape_to_long_format_with_chunking(data_dict, chunk_size=256):
    long_format_rows = []
    
    for item in data_dict:
        # Iterate over all fields to convert into long format
        fields_to_convert = ["Background", "Appearance", "Personality", "Abilities", "Relations", "Lore", 
                             "History", "History in Arcane", "Locations"]

        for field in fields_to_convert:
            field_value = item.get(field)
            if field_value:  # Only include fields that have a value
                # Chunk the field value if it's long
                text_chunks = chunk_text(field_value, chunk_size=chunk_size)
                
                # Create a new row for each chunk of the field value
                for idx, chunk in enumerate(text_chunks):
                    long_format_rows.append({
                        "Name": item.get("Name", ""),
                        "Category": item.get("Category", ""),  # Ensure Category is passed
                        "URL": item.get("URL", ""),  # Ensure URL is passed
                        "Field_name": field,
                        "Field_value": chunk,
                        "chunk_index": idx,  # Keep track of the chunk index
                        # Pass name, field, chunk index, category, and URL to generate_unique_id
                        "unique_id": generate_unique_id(item.get("Name", ""), field, idx, item.get("Category", ""), item.get("URL", ""))
                    })
    
    # Convert the list of rows to a DataFrame
    long_format_df = pd.DataFrame(long_format_rows)
    return long_format_df

### Loading Data

In this block, the JSON files containing data for Arcane characters, League of Legends champions, and geography are loaded with pandas and converted to lists of dictionaries (`to_dict(orient="records")`), where each dictionary is one character or location with its associated fields.

In [11]:
# Load the JSON data and convert each dataset to a list of records
arcane_data = pd.read_json("data_lol/arcane_characters_data.json").to_dict(orient="records")
lol_data = pd.read_json("data_lol/lol_champion_data.json").to_dict(orient="records")
geography_data = pd.read_json("data_lol/lol_geography_data.json").to_dict(orient="records")

### Reshaping the Data

Here, each loaded dataset (Arcane characters, LoL champions, and geography) is reshaped from a wide format (many columns) to a long format (more rows but fewer columns) using the `reshape_to_long_format_with_chunking` function. This transformation ensures that fields with longer text are split into smaller chunks. The reshaped Arcane DataFrame is previewed using `head()`.

In [12]:
# Reshape each dataset into long format with chunking
arcane_long_df = reshape_to_long_format_with_chunking(arcane_data)
lol_long_df = reshape_to_long_format_with_chunking(lol_data)
geography_long_df = reshape_to_long_format_with_chunking(geography_data)

# Preview the reshaped and chunked data
arcane_long_df.head()

|   | Name | Category | URL | Field_name | Field_value | chunk_index | unique_id |
|---|------|----------|-----|------------|-------------|-------------|-----------|
| 0 | Amara |  | https://leagueoflegends.fandom.com/wiki/Amara | Background | Not much is known about Amara's early life. At... | 0 | 277cee44a4c9972007f23064cf3b96fe |
| 1 | Amara |  | https://leagueoflegends.fandom.com/wiki/Amara | Appearance | Amara is an elderly woman with gray hair, very... | 0 | 44595ae945ca85d89ac1d76578e6213f |
| 2 | Amara |  | https://leagueoflegends.fandom.com/wiki/Amara | Personality | Amara is shrewd and cunning, and is able to ea... | 0 | cb89c80b2e84917ad379c8dd4a05d1d1 |
| 3 | Amara |  | https://leagueoflegends.fandom.com/wiki/Amara | Abilities | Amara has business contracts with several memb... | 0 | 5b0836eaa98f34518f1b857b5a2723b8 |
| 4 | Amara |  | https://leagueoflegends.fandom.com/wiki/Amara | Relations | Amara has business contracts with several memb... | 0 | 3100681e477d8826702edf7974194734 |


### Combining DataFrames

This block combines the three reshaped datasets (`arcane_long_df`, `lol_long_df`, and `geography_long_df`) into a single long-format DataFrame using `pd.concat`. This combined dataset will later be used for embedding and retrieval purposes.

In [13]:
# Combine the three long-format dataframes into one
combined_long_df = pd.concat([arcane_long_df, lol_long_df, geography_long_df], ignore_index=True)
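
Because ChromaDB uses `unique_id` as the primary key, two chunks that hash to the same ID (for example, if the same record appears twice in a source file) would collide: a later `upsert` silently overwrites the earlier chunk, and ChromaDB may reject duplicate IDs within a single call. A quick check on the combined frame catches this before upserting; `find_duplicate_ids` is a hypothetical helper, shown here on a toy frame.

```python
import pandas as pd

def find_duplicate_ids(df, id_col="unique_id"):
    """Return every row whose ID appears more than once (an empty frame if IDs are unique)."""
    dupes = df[df.duplicated(subset=id_col, keep=False)]
    return dupes.sort_values(id_col)

# Toy frame: the second and third rows collide on purpose.
demo = pd.DataFrame({
    "Name": ["Amara", "Vi", "Vi"],
    "Field_name": ["Background", "Background", "Background"],
    "unique_id": ["id-1", "id-2", "id-2"],
})
print(find_duplicate_ids(demo))
```

In the notebook, `assert find_duplicate_ids(combined_long_df).empty` would stop the run before any colliding IDs reach ChromaDB.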

### Converting Data to List of Dictionaries
In this block, the combined long-format DataFrame is converted into a list of dictionaries using `to_dict`. This structure is more suitable for embedding generation, and the first item in the list is previewed to ensure the data is in the expected format.

In [14]:
# Convert the combined dataframe to a list of dictionaries for embedding
compiled_data_dict = combined_long_df.to_dict(orient="records")

# Preview again to ensure the data is in the correct format
print(compiled_data_dict[0])


{'Name': 'Amara', 'Category': '', 'URL': 'https://leagueoflegends.fandom.com/wiki/Amara', 'Field_name': 'Background', 'Field_value': "Not much is known about Amara's early life. At some point during her younger years, she made a fortune as a merchant in Piltover and had a son named Rohan.", 'chunk_index': 0, 'unique_id': '277cee44a4c9972007f23064cf3b96fe'}


# Step 3: Setting Up Embedding Model and ChromaDB for Semantic Search

This block of code sets up the infrastructure for generating and storing embeddings in a vector database (ChromaDB), which is essential for tasks like document similarity search or question-answering.

### Embedding Model Initialization:

- The SentenceTransformer model `all-MiniLM-L6-v2` is initialized. This pre-trained model converts text into 384-dimensional embeddings (vectors) that capture the meaning of the text, so semantically similar passages end up close together.
- The model is optimized for semantic search. It truncates inputs longer than 256 word-piece tokens, so full 256-word chunks from `chunk_text` will lose some trailing words when embedded; a smaller `chunk_size` avoids this.

In [15]:
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
import hashlib

# Initialize the embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')




### Custom Embedding Function:

- The `MyEmbeddingFunction` class is defined to serve as an interface between the embedding model and ChromaDB. This custom class inherits from `EmbeddingFunction` and overrides the `__call__` method to accept a list of documents (strings), encode them into embeddings using the SentenceTransformer model, and return them as a list of vectors.

In [16]:
# Custom EmbeddingFunction class to interface with ChromaDB
class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        batch_embeddings = embedding_model.encode(input)
        return batch_embeddings.tolist()

embed_fn = MyEmbeddingFunction()

### ChromaDB Client Setup:

- The code initializes a `PersistentClient` for ChromaDB, specifying the storage location as `./chromadb`. ChromaDB is a vector database where text embeddings will be stored and queried.
- Because the client persists data to disk, the stored embeddings survive across notebook sessions and do not need to be recomputed.

In [17]:
# Initialize the chromadb directory and client
client = chromadb.PersistentClient(path="./chromadb")

### Creating or Getting a Collection:

- A collection named `"lol-RAG-workshop-example"` is created or retrieved from ChromaDB. A collection in ChromaDB is analogous to a table in traditional databases; it will store embeddings, documents, and associated metadata.
- The embedding function (`embed_fn`) tells ChromaDB how to embed any documents or query texts that are supplied without precomputed embeddings.

In [18]:
# Create or get the collection in ChromaDB
collection = client.get_or_create_collection(
    name="lol-RAG-workshop-example",
    embedding_function=embed_fn,  # embeds documents/query texts passed without embeddings
)

# Step 4: Batch Processing and Upserting Data into ChromaDB (+ Embedding Generation)

This block is responsible for processing and storing the data into ChromaDB in batches. It iterates through the compiled dataset (`compiled_data_dict`), generates embeddings, and upserts (inserts or updates) the data into ChromaDB. Here's a breakdown of what's happening:

1. **Batch Size Definition**: It sets the size of each batch of data to be processed (in this case, 50 entries per batch).
   
2. **Looping Over Batches**: Using `tqdm` for progress tracking, it loops over the entire dataset in increments of 50, fetching the corresponding entries for each batch.

3. **Prepare Batch Data**:
    - **IDs**: Extracts unique IDs (`unique_id`) for each item in the batch.
    - **Field Values**: Retrieves the text values from the `Field_value` column to be used for embedding.
    - **Metadata**: Collects each item's `Category` and `URL`, stored under the lowercase metadata keys `category` and `url`.

4. **Generate Embeddings**: It generates sentence embeddings for the `Field_value` text data using the pre-loaded `embedding_model`.

5. **Upserting into ChromaDB**: Finally, it upserts (inserts or updates) the batch into ChromaDB, storing the `ids`, `metadatas`, `documents` (text data), and the precomputed `embeddings` for later retrieval and querying.
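
The slicing arithmetic in the loop can be isolated into a small generator, shown here as a hypothetical `iter_batches` helper. With the 200 chunks in this dataset and `batch_size = 50`, it yields exactly 4 batches, which matches the `4/4` progress bar in the output. On Python 3.12+, `itertools.batched` offers similar behavior but yields tuples.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items (the last may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe in Python, so the min() in the notebook loop is optional.
        yield items[start:start + batch_size]

sizes = [len(batch) for batch in iter_batches(list(range(120)), 50)]
print(sizes)  # [50, 50, 20]
```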

In [19]:
# Set batch size for upserting into ChromaDB
batch_size = 50

# Loop through batches and generate + store embeddings
for i in tqdm(range(0, len(compiled_data_dict), batch_size)):

    # Get the batch
    i_end = min(i + batch_size, len(compiled_data_dict))
    batch = compiled_data_dict[i: i_end]

    # Prepare lists for batch processing
    batch_ids = [item["unique_id"] for item in batch]
    batch_field_values = [item["Field_value"] for item in batch]
    batch_metadata = [{"category": item.get("Category", ""), "url": item.get("URL", "")} for item in batch]

    # Generate embeddings
    batch_embeddings = embedding_model.encode(batch_field_values)

    # Upsert to ChromaDB
    collection.upsert(
        ids=batch_ids,
        metadatas=batch_metadata,
        documents=batch_field_values,
        embeddings=batch_embeddings.tolist(),
    )


100%|██████████| 4/4 [00:01<00:00,  3.95it/s]


# Step 5: Querying the Vector Store

This block is responsible for querying the ChromaDB collection to retrieve relevant documents based on a user-specified query. Here’s what each part does:

1. **Get or create the collection in ChromaDB:**
   - It checks whether a collection named `"lol-RAG-workshop-example"` already exists in the ChromaDB instance. If it doesn’t exist, it creates one.
   - It also ensures that the custom embedding function (`embed_fn`) is used for the embeddings.

2. **Querying the collection:**
   - The `collection.query()` method takes a query text (`"Zaun"` in this case) and retrieves the top 10 most relevant documents (or data chunks) from the collection based on similarity in embeddings.
   
3. **Printing the retrieved documents:**
   - The block then prints out the documents retrieved from ChromaDB, enumerating them for easy readability. This gives insight into which parts of the dataset are most relevant to the query.

This block essentially tests the retrieval system to see which documents (chunks of text) from the dataset are most similar to the query.
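
`collection.query()` returns a dict whose `ids`, `documents`, `metadatas`, and `distances` entries each hold one inner list per query text, which is why later cells index the results with `[0]` for a single query. A small hypothetical helper can flatten that shape into one record per hit. The example feeds it a hand-built dict rather than a live collection; fields left out of `include` typically come back as `None`, which the helper pads.

```python
def flatten_query_results(results, query_index=0):
    """Turn ChromaDB's per-query nested lists into one dict per retrieved chunk."""
    ids = results["ids"][query_index]

    def column(key):
        # Fields left out of `include` come back as None; pad them so zip() still works.
        values = results.get(key)
        return values[query_index] if values is not None else [None] * len(ids)

    rows = zip(ids, column("documents"), column("metadatas"), column("distances"))
    return [
        {"rank": rank, "id": id_, "document": doc, "metadata": meta, "distance": dist}
        for rank, (id_, doc, meta, dist) in enumerate(rows, start=1)
    ]

# Hand-built stand-in for a collection.query(query_texts=["Zaun"], n_results=2) result.
fake_results = {
    "ids": [["a1", "b2"]],
    "documents": [["Zaun is a city within Piltover...", "Zaun is a polluted undercity..."]],
    "metadatas": [[{"category": "LoL_locations", "url": "u1"},
                   {"category": "LoL_locations", "url": "u2"}]],
    "distances": None,
}
for hit in flatten_query_results(fake_results):
    print(hit["rank"], hit["metadata"]["url"], hit["document"][:40])
```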

### Check number of docs

In [20]:
num_documents = collection.count()
print(f"Number of documents in the collection: {num_documents}")


Number of documents in the collection: 200


In [21]:
# Get or create the collection in ChromaDB
collection = client.get_or_create_collection(
    name="lol-RAG-workshop-example",  # Your custom collection name
    embedding_function=embed_fn  # Ensure you're using the custom embedding function
)

# Test the retriever with a query related to your dataset
retriever_results = collection.query(
    query_texts=["Zaun"],  # Query text relevant to your dataset
    n_results=10,  # Number of results to retrieve
)

# Print the retrieved documents.
# retriever_results["documents"] holds one list per query text; [0] selects the list for our single query.
print("Retrieved documents:")
for i, doc in enumerate(retriever_results["documents"][0], 1):  # Start enumeration from 1
    print(f"Result {i}: {doc}\n")


Retrieved documents:
Result 1: Zaun is a city within Piltover located between Valoran and Shurima. The current well known locations (not counting Piltover) in Zaun are:

Result 2: Zaunfinalized its plans to destroy a portion of the isthmus connectingValoranand theSouthern Continent, allowing for safe sea passage between eastern and western Valoran. The plan involved using thousand of chemtech bombs to crack open an area of the land so that a cavern could be created, but the results were catastrophic. In what seemed to be an accident, the bombs triggered a series of earthquakes that completely destroyed the isthmus and sank large districts of Zaun and thousands of its citizens, while also leaking poisonous gas into the city's surviving areas.

Result 3: Zaunis a polluted undercity located beneath Piltover - once united, they are now separate, symbiotic cultures. Stifled inventors often find their unorthodox research welcomed in Zaun, but reckless industry has rendered whole swathes of the city highly

# Step 6: Generating Responses
This block retrieves relevant information from the ChromaDB collection, builds a prompt from the retrieved data, and generates responses from four different Fireworks models, so you can compare how each model interprets the same query and context.

### Boilerplate functions

In [22]:
import fireworks.client

# Define the function to get the completion from Fireworks models
def get_completion(prompt, model, max_tokens=2000):
    completion = fireworks.client.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0
    )
    return completion.choices[0].text

# Define a list of models to iterate through (using full model names)
models = [
    "accounts/fireworks/models/llama-v3-8b-instruct", 
    "accounts/fireworks/models/gemma2-9b-it",
    "accounts/fireworks/models/mixtral-8x7b-instruct",
    "accounts/yi-01-ai/models/yi-large"
]


### User Query and Retrieval:

- A user query (`"Tell me about the history of Piltover"`) is sent to the ChromaDB collection to retrieve relevant documents.
- `collection.query()` returns the 10 most relevant results, including each chunk's metadata (the `category` and `url` keys set during upserting) and its document text.

In [23]:
# User query example
user_query = "Tell me about the history of Piltover"

# Query the collection with the user query, returning both documents and their metadata
results = collection.query(
    query_texts=[user_query],
    n_results=10,  # Return the top 10 results
    include=['metadatas', 'documents']  # Include metadata and document text
)

### Processing Retrieved Results:
- The retrieved documents (text chunks) are joined with newlines into a single string, `retrieved_field_values`, which becomes the context for the Fireworks models. The associated metadata is printed so you can see which sources were retrieved.

In [24]:
# Retrieve the documents (field values) and associated metadata
retrieved_field_values = '\n'.join([doc for doc in results['documents'][0]])
retrieved_metadata = results['metadatas'][0]

# Print metadata to understand what we're working with
print("Retrieved Metadata for Chunks Queried:")
for i, metadata in enumerate(retrieved_metadata):
    print(f"Chunk {i+1}: {metadata}")

Retrieved Metadata for Chunks Queried:
Chunk 1: {'category': 'LoL_locations', 'url': 'https://leagueoflegends.fandom.com/wiki/Piltover'}
Chunk 2: {'category': 'LoL_locations', 'url': 'https://leagueoflegends.fandom.com/wiki/Piltover'}
Chunk 3: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Heimerdinger/Arcane'}
Chunk 4: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Salo'}
Chunk 5: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Bolbok'}
Chunk 6: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Heimerdinger/Arcane'}
Chunk 7: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Heimerdinger/Arcane'}
Chunk 8: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Sevika'}
Chunk 9: {'category': 'LoL_locations', 'url': 'https://leagueoflegends.fandom.com/wiki/Zaun'}
Chunk 10: {'category': '', 'url': 'https://leagueoflegends.fandom.com/wiki/Amara'}


### Prompt Template:

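- A prompt template is built from the user query and the retrieved field values. It asks the model to generate 5 new responses based on the retrieved content without repeating it verbatim. The `[INST]`/`[/INST]` markers come from the Mistral/Mixtral (and Llama 2) instruction format; the other models in the list use different chat templates, and because `get_completion()` sends raw text, they see these markers as plain text.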
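
The template pastes the chunks in as one undifferentiated block, so the model cannot tell which source a fact came from. A variant, sketched below with a hypothetical `build_rag_prompt` helper, numbers each chunk, tags it with the `url` stored in its ChromaDB metadata, and caps the context at a rough character budget (an assumption standing in for a proper token count). In the notebook it could be called as `build_rag_prompt(user_query, results['documents'][0], results['metadatas'][0])`.

```python
def build_rag_prompt(user_query, documents, metadatas, max_context_chars=6000):
    """Build a prompt that numbers each retrieved chunk and labels it with its source URL."""
    blocks, used = [], 0
    for n, (doc, meta) in enumerate(zip(documents, metadatas), start=1):
        source = (meta or {}).get("url", "unknown source")
        block = f"[{n}] ({source})\n{doc}"
        if used + len(block) > max_context_chars:
            break  # stop before the context grows past the budget
        blocks.append(block)
        used += len(block)
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the numbered context below. "
        "Cite the chunk numbers you rely on, and say so if the context is insufficient.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {user_query}\n"
        "ANSWER:"
    )

prompt = build_rag_prompt(
    "Tell me about the history of Piltover",
    ["Piltover is a city of progress.", "Zaun lies beneath Piltover."],
    [{"url": "https://leagueoflegends.fandom.com/wiki/Piltover"},
     {"url": "https://leagueoflegends.fandom.com/wiki/Zaun"}],
)
print(prompt)
```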

In [25]:
# Adjust the prompt template to generate responses related to the retrieved data and metadata
prompt_template = f'''[INST]

Your task is to generate 5 RESPONSES based on the USER_QUERY.

You should refer to the FIELD_VALUES provided as context, but do not repeat them directly. Provide new information or suggestions.

USER_QUERY: {user_query}

FIELD_VALUES: {retrieved_field_values}

RESPONSES:

[/INST]
'''

### Model Iteration:

- The script iterates through each model in the models list, uses the `get_completion()` function to generate a response from each model based on the prompt, and prints the results.
- Each model’s response is separated with a line for clarity, allowing easy comparison of how different models handle the same task.
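
Printing each full response makes it hard to compare models at a glance. A hypothetical `compare_models` helper can instead collect per-model latency (which includes network time), response length, and a one-line preview. In the notebook it could be called with `lambda p, m: get_completion(p, model=m, max_tokens=2000)`, and the rows handed to `PrettyTable` (imported in Step 1 but otherwise unused) or a pandas DataFrame. The demo uses a stand-in function so it runs without an API key.

```python
import time

def compare_models(models, prompt, complete_fn, preview_chars=80):
    """Run one prompt through several models and collect timing and length stats.

    complete_fn(prompt, model) must return the generated text, e.g. a thin
    wrapper around the notebook's get_completion.
    """
    rows = []
    for model in models:
        start = time.perf_counter()
        text = complete_fn(prompt, model)
        elapsed = time.perf_counter() - start
        rows.append({
            "model": model.split("/")[-1],  # short name, e.g. "gemma2-9b-it"
            "seconds": round(elapsed, 2),
            "chars": len(text),
            "preview": " ".join(text.split())[:preview_chars],  # collapse newlines/whitespace
        })
    return rows

# Offline demo with a stand-in completion function instead of a real API call.
fake_complete = lambda prompt, model: f"Answer from {model}:\n  Piltover was founded..."
for row in compare_models(["accounts/fireworks/models/gemma2-9b-it"], "Tell me about Piltover", fake_complete):
    print(row)
```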

In [26]:
# Iterate through each model, run the prompt, and print the results
for model in models:
    print(f"Response from model: {model}")
    response = get_completion(prompt_template, model=model, max_tokens=2000)
    print(response)
    print("\n" + "-"*80 + "\n")  # Separator between model responses


Response from model: accounts/fireworks/models/llama-v3-8b-instruct
```
Here are five potential responses based on the user query and field values:

1. Piltover's history is deeply intertwined with the city's founders, including Heimerdinger, who is credited with contributing to the city's construction and founding the Piltover University. The city's progressive nature and focus on innovation have made it a hub for inventors and craftsmen from across Runeterra.

2. The city's ruling council has played a significant role in shaping Piltover's history, with notable members like Salo and Bolbok holding influential positions. The council's decisions have often been driven by a desire to promote progress and prosperity, but have also led to tensions with the undercity of Zaun.

3. Piltover's relationship with Zaun has been marked by conflict and tension, particularly in recent years. The Piltovan Enforcers have been accused of attacking undercity streets, leading to a sense of resentment an

KeyboardInterrupt: 