# Generation

To build a chatbot app, we need a set of questions and answers. This is helpful for evaluating different prompt engineering techniques and different app design choices.

In this notebook we dive deeper into prompting the model by passing it better context:
* using available data of W&B user questions
* using the documentation files to generate better answers

In [16]:
import os
import random

import openai
import tiktoken

from pathlib import Path
from pprint import pprint
from getpass import getpass

from rich.markdown import Markdown
import pandas as pd
from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential, # for exponential backoff
)  
import wandb
from wandb.integration.openai import autolog

# Set OpenAI API key 

To get a key, visit the [OpenAI API keys page](https://platform.openai.com/account/api-keys).

In [17]:
if os.getenv("OPENAI_API_KEY") is None:
  if any(['VSCODE' in x for x in os.environ.keys()]):
    print('Please enter password in the VS Code prompt at the top of your VS Code window!')
  os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")
  openai.api_key = os.getenv("OPENAI_API_KEY", "")

assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

OpenAI API key configured


# Start W&B logging

`autolog` is a convenience function for logging OpenAI results to W&B.

In [18]:
autolog({"project":"llmapps", "job_type": "generation"})



# Generating synthetic support questions

In [19]:
# completion_with_backoff
# - the retry decorator waits with random exponential backoff and retries
#   (up to 6 attempts) when the request fails, e.g. on a rate limiting error
@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)
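
To make the retry behaviour concrete, here is a minimal standard-library sketch of random exponential backoff. It is not tenacity's implementation: `retry_with_backoff`, `flaky_request` and the injectable `sleep` argument are hypothetical names used only for illustration.

```python
import random
import time


def retry_with_backoff(func, max_attempts=6, min_wait=1.0, max_wait=60.0, sleep=time.sleep):
    """Call func(); on failure wait a random, exponentially growing delay and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # The upper bound doubles with each attempt and is capped at max_wait
            upper = min(max_wait, min_wait * 2 ** attempt)
            sleep(random.uniform(min_wait, upper))


# Hypothetical flaky call: fails twice, then succeeds
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


waits = []
# Record the waits instead of actually sleeping, so the example runs instantly
result = retry_with_backoff(flaky_request, sleep=waits.append)
print(result, calls["count"], len(waits))
```

The random component spreads retries from concurrent clients over time, so they do not all hit the API again at the same moment.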

In [20]:
MODEL_NAME = "gpt-3.5-turbo"
# MODEL_NAME = "gpt-4"

## Zero Shot prompting

Not giving any examples or context

In [None]:
# Define the behaviour, qualities of the LLM
system_prompt = "You are a helpful assistant."
# Define what we ask the LLM to do
user_prompt = "Generate a support question from a W&B user"

def generate_and_print(system_prompt, user_prompt, n=5):
    messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
    responses = completion_with_backoff(
        model=MODEL_NAME,
        messages=messages,
        n = n,
        )
    for response in responses.choices:
        generation = response.message.content
        display(Markdown(generation))
    
generate_and_print(system_prompt, user_prompt)

The 5 generated responses above are quite generic. 

## Few Shot prompting 

Give the model several examples.
* We read some user-submitted questions (from the Discord server) listed in the file `examples.txt`.
* This file contains multiline questions separated by tabs (`\t`).

In [None]:
delimiter = "\t" # tab separated queries
with open("../data/examples.txt", "r") as file:
    data = file.read()
    real_queries = data.split(delimiter)

pprint(f"We have {len(real_queries)} real queries:")  
Markdown(f"Sample one: \n\"{random.choice(real_queries)}\"")

'We have 228 real queries:'
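
Because the questions are multiline, splitting on the tab character keeps the newlines inside each question. The sketch below shows this on a made-up string (the sample text is hypothetical, not taken from `examples.txt`) and also drops empty entries, which a trailing delimiter would otherwise produce.

```python
# Hypothetical sample in the same layout: multiline questions separated by tabs
raw = (
    "How do I log images?\nI tried wandb.log but nothing shows up.\t"
    "Can I resume a run?\t\n"
)

# Split on tabs, strip surrounding whitespace, and skip empty entries
queries = [q.strip() for q in raw.split("\t") if q.strip()]

print(len(queries))
print(queries[0])
```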


For a Few Shot prompt:
* we can now add a few of those real user questions to the prompt, to guide our model to produce synthetic questions like those.

In [None]:
def generate_few_shot_prompt(queries, n=3):
    prompt = "Generate a support question from a W&B user\n" +\
        "Below you will find a few examples of real user queries:\n"
    for _ in range(n):
        prompt += random.choice(queries) + "\n"
    prompt += "Let's start!"
    return prompt

generation_prompt = generate_few_shot_prompt(real_queries)

# Print the prompt to be fed into the LLM
Markdown(generation_prompt)

OpenAI `Chat` models are very good at following instructions with a few examples. 

Below we can see how it does by using some context from the prompt.

In [None]:
generate_and_print(system_prompt, user_prompt=generation_prompt)

The questions produced by the LLM when given some example questions are a bit more diverse than with Zero Shot prompting, but we can go further to increase the range of questions it produces.

Something that we could add to the prompt to avoid words like "Sure, ..." could be: "Just give me the question, don't give me extra text"
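
A minimal sketch of a few shot prompt that includes the instruction suggested above. `build_few_shot_prompt` and the sample queries are hypothetical. It also uses `random.sample` rather than repeated `random.choice`, an assumption made here so that the same example cannot appear twice in one prompt.

```python
import random


def build_few_shot_prompt(queries, n=3, seed=None):
    """Build a few shot prompt that asks for the question only."""
    rng = random.Random(seed)
    # Sample without replacement so each example appears at most once
    examples = rng.sample(queries, min(n, len(queries)))
    lines = [
        "Generate a support question from a W&B user.",
        "Just give me the question, don't give me extra text.",
        "Below you will find a few examples of real user queries:",
        *examples,
        "Let's start!",
    ]
    return "\n".join(lines)


# Hypothetical stand-ins for the real user queries
sample_queries = [
    "How do I resume a crashed run?",
    "Can I log a confusion matrix?",
    "Why is my sweep not starting?",
]
few_shot_prompt = build_few_shot_prompt(sample_queries, n=2, seed=0)
print(few_shot_prompt)
```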

## Add Context & Response

We want to be able to respond to questions for which some documentation is available. As long as we have the documentation for a specific user question, we should be able to answer that question.

To evaluate the model, we want to make sure that whenever documentation is available, the answer is correct for that question.
* So why not use documentation to also generate synthetic questions?

To do this, the folder `../docs_sample` contains several examples of wandb docs. The dataset of questions will be limited to what is available in these docs.

In [None]:
# check if directory exists, if not, create it and download the files, e.g if running in colab
if not os.path.exists("../docs_sample/"):
  !git clone https://github.com/wandb/edu.git
  !cp -r edu/llm-apps-course/docs_sample ../

Cloning into 'edu'...
remote: Enumerating objects: 2470, done.
remote: Counting objects: 100% (1013/1013), done.
remote: Compressing objects: 100% (372/372), done.
remote: Total 2470 (delta 712), reused 843 (delta 633), pack-reused 1457
Receiving objects: 100% (2470/2470), 22.60 MiB | 7.51 MiB/s, done.
Resolving deltas: 100% (1408/1408), done.


In [None]:
def find_md_files(directory):
    "Find all markdown files in a directory and return their content and path"
    md_files = []
    for file in Path(directory).rglob("*.md"):
        with open(file, 'r', encoding='utf-8') as md_file:
            content = md_file.read()
        md_files.append((file.relative_to(directory), content))
    return md_files

documents = find_md_files('../docs_sample/')
len(documents)

11

Check that the documents are not too long for our context window (prompt), by computing the number of tokens in each document.

In [22]:
tokenizer = tiktoken.encoding_for_model(MODEL_NAME)
tokens_per_document = [len(tokenizer.encode(document)) for _, document in documents]
pprint(tokens_per_document)

[4179, 365, 1206, 2596, 2940, 537, 956, 803, 1644, 2529, 2093]
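
A quick way to see which documents exceed a given budget, using the token counts printed above. The 512-token budget is an assumption chosen for illustration.

```python
# Token counts copied from the output above
tokens_per_document = [4179, 365, 1206, 2596, 2940, 537, 956, 803, 1644, 2529, 2093]
max_tokens = 512  # assumed per-chunk budget

# Indices of the documents that are longer than the budget
too_long = [i for i, n in enumerate(tokens_per_document) if n > max_tokens]
print(f"{len(too_long)} of {len(tokens_per_document)} documents exceed {max_tokens} tokens")
```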


Some of the documents are too long (we don't need that much text in our prompt). For those documents, we'll extract a random chunk to inspire the LLM to generate some questions.

In [23]:
# extract a random chunk from a document
def extract_random_chunk(document, max_tokens=512):
    tokens = tokenizer.encode(document)
    if len(tokens) <= max_tokens:
        return document
    start = random.randint(0, len(tokens) - max_tokens)
    end = start + max_tokens
    return tokenizer.decode(tokens[start:end])
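
The windowing logic can be tried without a tokenizer. In the sketch below whitespace-separated words stand in for tokens (an assumption made so it runs without `tiktoken`), and `extract_random_window` is a hypothetical helper that mirrors the slicing above.

```python
import random


def extract_random_window(tokens, max_tokens):
    """Return a random contiguous window of at most max_tokens items."""
    if len(tokens) <= max_tokens:
        return tokens
    # random.randint includes both endpoints, so the last valid start is reachable
    start = random.randint(0, len(tokens) - max_tokens)
    return tokens[start:start + max_tokens]


# Words stand in for tokens in this sketch
words = "W&B Artifacts let you version datasets and models across runs".split()
window = extract_random_window(words, max_tokens=4)
print(" ".join(window))
```

Since the window starts at a random token, a chunk may begin or end mid-sentence.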

Now we use the extracted chunk to create a question that can be answered by the document. This way we generate questions that our current documentation is capable of answering.

In [24]:
def generate_context_prompt(chunk):
    prompt = "Generate a support question from a W&B user\n" +\
        "The question should be answerable by provided fragment of W&B documentation.\n" +\
        "Below you will find a fragment of W&B documentation:\n" +\
        chunk + "\n" +\
        "Let's start!"
    return prompt

chunk = extract_random_chunk(documents[0][1])
generation_prompt = generate_context_prompt(chunk)

In [25]:
Markdown(generation_prompt)

Let's generate 3 possible questions:

In [None]:
generate_and_print(system_prompt, generation_prompt, n=3)

Some of the output questions above seem synthetic (less related to wandb and more about a specific coding concept). Further prompt engineering steps can improve this.

## Level 5 prompt structure

This prompt structure has a complex directive that includes:
* Description of high-level goal 
* Few-shot examples
* Detailed bulleted list of sub-tasks 
* An explicit statement asking the LLM to explain its output
* Guidelines on how LLM output will be evaluated
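
The structure listed above can be sketched as a small helper that assembles the five parts into one string. `build_level5_prompt`, the section labels and all of the wording are hypothetical, not the templates used in this notebook.

```python
def build_level5_prompt(goal, examples, subtasks, evaluation):
    """Assemble the five parts of the directive into one prompt string."""
    example_lines = "\n".join(f"- {example}" for example in examples)
    subtask_lines = "\n".join(f"{i}. {task}" for i, task in enumerate(subtasks, start=1))
    sections = [
        f"Goal: {goal}",
        f"Examples:\n{example_lines}",
        f"Sub-tasks:\n{subtask_lines}",
        "Explain your reasoning before giving the final output.",
        f"Evaluation: {evaluation}",
    ]
    # Separate the sections with blank lines
    return "\n\n".join(sections)


level5_prompt = build_level5_prompt(
    goal="Generate a support question from a W&B user.",
    examples=["How do I resume a crashed run?", "Can I log a confusion matrix?"],
    subtasks=["Read the documentation fragment.", "Write one question it can answer."],
    evaluation="The question must be answerable from the fragment alone.",
)
print(level5_prompt)
```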


In [27]:
# we will use GPT-4 from here, as it gives better answers and follows instructions better
MODEL_NAME = "gpt-4"

### System and user templates

Here we attempt to create a prompt that follows these Level 5 directions. We split the prompt into:
* **System template** (system message) - instructing model to get into a specific role
* **User template** (input from the user)

In [29]:
# read system_template.txt file into a string
with open("../data/system_template.txt", "r") as file:
    system_prompt = file.read()

In [30]:
Markdown(system_prompt)

In [32]:
# read prompt_template.txt file into a string (filled in later with str.format)
with open("../data/prompt_template.txt", "r") as file:
    prompt_template = file.read()

In [33]:
Markdown(prompt_template)

In the above prompt, we:
* Tell the model that we provide examples of real user questions (this is the **few shot** part of the prompt)
* {Need to provide examples}  
* Provide a fragment of W&B docs as inspiration for synthetic questions and as the source of the answer
* {Need to provide docs}
* Provide further information to guide the model's answer

Below, we fill in the template prompt above using:
* Example questions from **[examples.txt](../data/examples.txt)**
* Example documentation from **[docs_sample](../docs_sample/)**

In [34]:
def generate_context_prompt(chunk, n_questions=3):
    questions = '\n'.join(random.sample(real_queries, n_questions))
    user_prompt = prompt_template.format(QUESTIONS=questions, CHUNK=chunk)
    return user_prompt

user_prompt = generate_context_prompt(chunk)
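
A small sketch of how `str.format` fills such a template. The template text here is hypothetical; only the `QUESTIONS` and `CHUNK` placeholder names come from the code above. Braces inside the substituted values are left untouched, while literal braces in the template itself have to be doubled.

```python
# Hypothetical template using the same placeholder names as the code above
template = (
    "Real user questions:\n{QUESTIONS}\n\n"
    "Documentation fragment:\n{CHUNK}\n\n"
    "Literal braces in a template are doubled: {{like this}}"
)

# The chunk contains braces; format() does not re-parse substituted values
doc_chunk = 'wandb.init(config={"lr": 0.01})'
filled = template.format(QUESTIONS="How do I set hyperparameters?", CHUNK=doc_chunk)
print(filled)
```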

In [35]:
Markdown(user_prompt)

Now we ask the model to generate questions and answers

In [37]:
def generate_questions(documents, n_questions=3, n_generations=5):
    questions = []
    for _, document in documents:
        # Extract a random chunk from a W&B document
        chunk = extract_random_chunk(document)
        # Fill in prompt_template with example questions and the docs chunk
        user_prompt = generate_context_prompt(chunk, n_questions)
        # Pass system_prompt and user_prompt to LLM
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ]
        response = completion_with_backoff(
            model=MODEL_NAME,
            messages=messages,
            n = n_generations,
            )
        # Collect the n generated responses for this prompt
        questions.extend([choice.message.content for choice in response.choices])
    return questions

In [38]:
# function to parse model generation and extract CONTEXT, QUESTION and ANSWER
def parse_generation(generation):
    lines = generation.split("\n")
    context = []
    question = []
    answer = []
    flag = None
    
    for line in lines:
        if "CONTEXT:" in line:
            flag = "context"
            line = line.replace("CONTEXT:", "").strip()
        elif "QUESTION:" in line:
            flag = "question"
            line = line.replace("QUESTION:", "").strip()
        elif "ANSWER:" in line:
            flag = "answer"
            line = line.replace("ANSWER:", "").strip()

        if flag == "context":
            context.append(line)
        elif flag == "question":
            question.append(line)
        elif flag == "answer":
            answer.append(line)

    context = "\n".join(context)
    question = "\n".join(question)
    answer = "\n".join(answer)
    return context, question, answer
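
To see what this kind of label-based parsing does, here is a compact, self-contained sketch of the same idea run on a made-up generation. `parse_sections` and the sample text are hypothetical and only illustrate the approach.

```python
def parse_sections(generation, labels=("CONTEXT", "QUESTION", "ANSWER")):
    """Split a generation into sections that start with 'LABEL:' markers."""
    sections = {label: [] for label in labels}
    current = None
    for line in generation.split("\n"):
        for label in labels:
            marker = f"{label}:"
            if marker in line:
                # Switch to the new section and drop the marker itself
                current = label
                line = line.replace(marker, "").strip()
                break
        if current is not None:
            sections[current].append(line)
    return tuple("\n".join(sections[label]) for label in labels)


# Hypothetical model output in the expected layout
sample_generation = (
    "CONTEXT: User is setting up sweeps.\n"
    "QUESTION: How do I define a sweep?\n"
    "It is my first time.\n"
    "ANSWER: Use a sweep configuration."
)
parsed_context, parsed_question, parsed_answer = parse_sections(sample_generation)
print(parsed_question)
```

Two things to keep in mind with this approach: any text before the first label is dropped, and a later line that happens to contain one of the markers (for example `QUESTION:` inside an answer) switches the section.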

In [None]:
# Generate questions using LLM
generations = generate_questions([documents[0]], n_questions=3, n_generations=5)
parse_generation(generations[0])

* The generated text above is split into `Context`, `Question`, `Answer`
* The question looks better than with the previous approaches

Now that we have verified that the function works, we can run it in a loop over the documents to generate questions.

Because we want a large dataset of synthetic questions for our model evaluation, below:
* we save the LLM generations into a dataframe and a CSV file
* we log the dataframe as a W&B Table and save the CSV as a W&B Artifact

In [None]:
parsed_generations = []
generations = generate_questions(documents, n_questions=3, n_generations=5)
for generation in generations:
    context, question, answer = parse_generation(generation)
    parsed_generations.append({"context": context, "question": question, "answer": answer})

# let's convert parsed_generations to a pandas dataframe and save it locally
df = pd.DataFrame(parsed_generations)
df.to_csv('generated_examples.csv', index=False)

# log df as a table to W&B for interactive exploration
wandb.log({"generated_examples": wandb.Table(dataframe=df)})

# log csv file as an artifact to W&B for later use
artifact = wandb.Artifact("generated_examples", type="dataset")
artifact.add_file("generated_examples.csv")
wandb.log_artifact(artifact)
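
If the model does not follow the expected layout, the parser returns empty strings for the missing parts. As an optional step, one could drop such rows before saving. The sketch below assumes rows shaped like the dictionaries built above; `keep_complete` and the sample rows are hypothetical.

```python
REQUIRED_KEYS = ("context", "question", "answer")


def keep_complete(rows):
    """Keep only rows where every required field has non-blank text."""
    return [
        row for row in rows
        if all(row.get(key, "").strip() for key in REQUIRED_KEYS)
    ]


# Hypothetical parsed generations: one complete, one empty, one with a blank answer
rows = [
    {"context": "User configures a sweep.", "question": "How do I define a sweep?", "answer": "Use a sweep configuration."},
    {"context": "", "question": "", "answer": ""},
    {"context": "User logs media.", "question": "How do I log images?", "answer": "   "},
]
complete_rows = keep_complete(rows)
print(f"kept {len(complete_rows)} of {len(rows)} rows")
```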

In [None]:
# Finish wandb run
wandb.finish()