## Tutorial and exercises: Few-Shot prompting and semantic embeddings
*This Colab notebook is part of the [few-shot prompting tutorial](http://colab.research.google.com/github/PerttuHamalainen/MediaAI/blob/master/Code/Jupyter/few_shot_prompting.ipynb) of the [AI for media, art, and design](https://github.com/PerttuHamalainen/MediaAI) course of Aalto University*

*To run the code and complete the exercises, you need an OpenAI account*

This notebook demonstrates text generation using few-shot prompting and provides exercises for extending the notebook. Solutions to the exercises are also provided, but the code is hidden by default.

**Learning goals:**

* Everyone: Practice producing the few-shot examples and learn to anticipate how different examples change the results
* Everyone: Practice exploring a large number of generations (or other text data such as customer feedback messages) using semantic embeddings
* Those with at least some programming skills: Advanced few-shot prompting techniques such as combinatorial prompting

**How to use:**
1. Select "Save a copy in Drive" from the File menu to make a copy that you can edit in the exercises
2. Select "Run all" from the Runtime menu. In the code cell below, enter your OpenAI API key when requested. If you don't have a key, follow [OpenAI's instructions](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)
3. Proceed through the notebook, following the exercise instructions.

**New to Colab notebooks?**

Colab notebooks are browser-based learning environments consisting of *cells* that include either text or code. The code is executed in a Google virtual machine instead of your own computer. You can run code cell-by-cell (click the "play" symbol of each code cell), and selecting "Run all" as instructed above is usually the first step to verify that everything works. For more info, see Google's [Intro video](https://www.youtube.com/watch?v=inN8seMm7UI) and [curated example notebooks](https://colab.google/notebooks/)


In [None]:
# @title Ask the user to enter an OpenAI API key
from getpass import getpass
try: OPENAI_API_KEY
except NameError: OPENAI_API_KEY = getpass('Enter OpenAI API key: ')

Enter OpenAI API key: ··········


## Key concepts and theory
Note: This material is explained in more detail in the course lectures.

**What is Few-Shot prompting?**

Few-shot prompting means that one includes *high-quality examples* of desired outputs in the LLM prompt. This can greatly improve the generated results.

Similar to explaining things to a human, using examples is often the most *efficient and precise way* to explain what one wants.

**Finding/writing the few-shot examples is a key skill to practise** in AI-assisted creative writing. Typically, this includes both writing example text yourself, and browsing and/or scraping the Internet for examples.

For example, we can prompt game ideas using the following prompt:

```
A list of experimental indie game ideas:

---

An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle.

---

A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect "floor", "is", and "lava" to make the floor deadly.

---

```

Above, the few-shot examples describe the indie games Superhot and Baba Is You in a way that makes the core mechanics clear.

Note that this prompt is for LLMs that continue a prompt (e.g., OpenAI's davinci-002 and gpt-3.5-turbo-instruct). For chat-based models like ChatGPT, you should change the first line to "Please continue this list of experimental indie game ideas:"
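
The prompt format above is simple enough to assemble programmatically, which also makes it easy to switch between the two kinds of models. The following is a minimal sketch, not part of the course code: `build_list_prompt` and its parameters are hypothetical names, the example texts are shortened, and the chat-style first line simply follows the wording suggested above.

```python
def build_list_prompt(title, examples, delimiter="---", for_chat_model=False):
    """Assemble a few-shot list prompt (hypothetical helper, for illustration only)."""
    # Chat models expect a request, whereas completion models simply continue the text
    if for_chat_model:
        first_line = f"Please continue this list of {title}:"
    else:
        first_line = f"A list of {title}:"
    parts = [first_line]
    for example in examples:
        parts.append(delimiter)
        parts.append(example.strip())
    # End with one more delimiter so that the model writes the next list item
    parts.append(delimiter)
    return "\n\n".join(parts) + "\n\n"

examples = [
    "An FPS game where time only moves when the player moves or performs actions.",
    "A Sokoban-style game where the player pushes around blocks that form the game's rules.",
]
completion_prompt = build_list_prompt("experimental indie game ideas", examples)
chat_prompt = build_list_prompt("experimental indie game ideas", examples, for_chat_model=True)
print(chat_prompt)
```

Only the first line differs between the two prompts; the examples and delimiters stay the same.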


**A few important things:**
* When prompting lists such as game ideas, LLMs cannot distinguish between the prompt and previously generated items. In effect, *previously generated items become new few-shot examples.* This means that even one bad generation can throw the LLM off the rails and quality generally decreases with each item due to small random errors accumulating.
* To prevent the above, it's better to prompt each item separately, making sure that the LLM only sees the original examples. Here, a bit of Python automation can greatly reduce the tedium, as demonstrated in this notebook.
* Generated text quality typically improves with more examples, but going beyond 10 examples tends to provide diminishing gains.
* The ability to generalize from the few-shot examples, a.k.a. *in-context learning*, only emerges in sufficiently large models and was demonstrated for the first time in OpenAI's largest GPT-3 variant (175 billion parameters).
* If you cannot get good quality results and you can produce at least a few hundred examples, an alternative to few-shot prompting is finetuning an LLM, i.e., continuing to train some base LLM with your examples.
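
The difference between generating one long list and prompting each item separately (the first two points above) can be sketched without calling any API. In the sketch below, `fake_llm` is a stand-in rather than a real model: it only reports how many list items it sees in its prompt. All names are hypothetical.

```python
DELIMITER = "\n\n---\n\n"

def fake_llm(prompt):
    """Stand-in for an LLM call: reports how many list items the prompt contains."""
    n_items = len([p for p in prompt.split(DELIMITER)[1:] if p.strip()])
    return f"(idea conditioned on {n_items} items)"

base_prompt = "A list of game ideas:" + DELIMITER + "Example A" + DELIMITER + "Example B" + DELIMITER

# Strategy 1: one growing list. Each generated item is appended to the prompt,
# so it acts as an extra few-shot example for every later item.
growing_prompt = base_prompt
sequential = []
for _ in range(3):
    item = fake_llm(growing_prompt)
    sequential.append(item)
    growing_prompt += item + DELIMITER

# Strategy 2: prompt each item separately, so the model only sees the original examples.
independent = [fake_llm(base_prompt) for _ in range(3)]

print(sequential)
print(independent)
```

In the first strategy the number of items the "model" is conditioned on grows with every generation, while in the second it stays at the two original examples.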

**What are semantic embeddings?**
The remarkable abilities of LLMs are largely due to how they create internal representations or *embeddings* of each word or token. The embeddings are vectors (lists of coordinates) that define positions in a high-dimensional embedding space where *moving in a certain direction corresponds to changing the meaning* of the word. This allows an LLM to perform complex semantic manipulations and inferences using standard math, e.g., $king-man=queen-woman$ or, equivalently, $queen=king-man+woman$.
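
The arithmetic above can be illustrated with a toy example. The two-dimensional vectors below are invented for illustration (one axis loosely standing for "royalty", the other for "gender"); real embeddings are learned from data and have hundreds or thousands of dimensions, so the relation only holds approximately there.

```python
# Toy 2D "embeddings", invented for illustration: axis 0 ~ royalty, axis 1 ~ gender
toy_embeddings = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# king - man + woman
target = add(subtract(toy_embeddings["king"], toy_embeddings["man"]), toy_embeddings["woman"])

# Find the word whose embedding is closest to the result
nearest = min(toy_embeddings, key=lambda word: squared_distance(toy_embeddings[word], target))
print(target, nearest)
```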

In this notebook and in other applications such as Retrieval Augmented Generation (RAG), one computes such embedding vectors from whole sentences instead of single words or tokens. This allows plotting the sentences visually such that similar sentences form clusters. This can help one quickly notice and explore both main themes and rare outliers in some dataset of texts.

### Do some setup and connect to OpenAI.


In [None]:
# @title
#First, install and import code packages
#! in the beginning of a Colab code line allows running Linux shell commands
!pip install openai
!pip install tiktoken
!pip install umap-learn
import tiktoken
import time
import asyncio
import pickle
import hashlib
import os
import openai
import numpy as np
import pandas as pd
import itertools
import matplotlib.pyplot as plt
import json
import plotly.express as px
import textwrap

client=openai.OpenAI(api_key=OPENAI_API_KEY)
client_async=openai.AsyncOpenAI(api_key=OPENAI_API_KEY)

Collecting openai
  Downloading openai-1.30.1-py3-none-any.whl (320 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [None]:
# To make things easier, this cell defines some code helpers.
# source: https://github.com/PerttuHamalainen/LLMCode
import json

#Colab is already running an asyncio event loop => need this hack for async OpenAI API calling
import nest_asyncio
nest_asyncio.apply()

#progress bar helper
def print_progress_bar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        length      - Optional  : character length of bar (Int)
        fill        - Optional  : bar fill character (Str)
        printEnd    - Optional  : end character (e.g. "\r", "\r\n") (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Print New Line on Complete
    if iteration == total:
        print()

#LLM response cache. For reducing API costs, if the exact same prompt has been
#used before, the helpers below offer an option to return results from cache.
cache_dir = "./llm_cache"
def cache_keys_equal(key1,key2):
    if (type(key1) is np.ndarray) and (type(key2) is np.ndarray):
        return np.array_equal(key1,key2)
    return key1==key2

def cache_hash(key):
    return hashlib.md5(key).hexdigest()

def load_cached(key):
    cached_name= cache_dir + "/" + cache_hash(key)
    if os.path.exists(cached_name):
        with open(cached_name,"rb") as f: #make sure the file is closed after reading
            cached=pickle.load(f)
        if cache_keys_equal(cached["key"],key):
            #cache_copy_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "cache_copy") #for debugging which files are actually used...
            #shutil.copy(cached_name, cache_copy_dir+"/" + cache_hash(key))
            return cached["value"]
    return None

def cache(key,value):
    if not os.path.exists(cache_dir):
        os.mkdir(cache_dir)
    cached_name= cache_dir + "/" + cache_hash(key)
    with open(cached_name,"wb") as f: #make sure the file is flushed and closed after writing
        pickle.dump({"key":key,"value":value},f)


#Some info for tokenizing text (e.g., for calculating the number of prompt tokens)
tiktoken_encodings = {
    "gpt-4-turbo": tiktoken.get_encoding("cl100k_base"),
    "gpt-4-turbo-preview": tiktoken.get_encoding("cl100k_base"),
    "gpt-4": tiktoken.get_encoding("cl100k_base"),
    "gpt-3.5-turbo": tiktoken.get_encoding("cl100k_base"),
    "gpt-3.5-turbo-instruct": tiktoken.get_encoding("cl100k_base"),
    "gpt-3.5-turbo-16k": tiktoken.get_encoding("cl100k_base"),
    "davinci-002": tiktoken.get_encoding("cl100k_base"),
    "text-davinci-003": tiktoken.get_encoding("p50k_base"),
    "text-davinci-002": tiktoken.get_encoding("p50k_base"),
    "text-davinci-001": tiktoken.get_encoding("r50k_base"),
    "text-curie-001": tiktoken.get_encoding("r50k_base"),
    "text-babbage-001": tiktoken.get_encoding("r50k_base"),
    "text-ada-001": tiktoken.get_encoding("r50k_base"),
    "davinci": tiktoken.get_encoding("r50k_base"),
    "curie": tiktoken.get_encoding("r50k_base"),
    "babbage": tiktoken.get_encoding("r50k_base"),
    "ada": tiktoken.get_encoding("r50k_base"),
}

#Maximum number of tokens supported by different models
max_llm_context_length = {
    "gpt-4-turbo": 16384*2,
    "gpt-4-turbo-preview": 16384*2,
    "gpt-3.5-turbo-16k": 16384,
    "gpt-4": 8192,
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-instruct": 4096,
    "text-davinci-003": 4096,
    "text-davinci-002": 4096,
    "text-davinci-001": 2049,
    "text-curie-001": 2049,
    "text-babbage-001": 2049,
    "text-ada-001": 2049,
    "davinci": 2049,
    "curie": 2049,
    "babbage": 2049,
    "ada": 2049
}

#Does a model only support the newer chat API and not the continuations API?
def is_chat_model(model):
    return ("gpt-4" in model) or (("gpt-3.5-turbo" in model) and ("gpt-3.5-turbo-instruct" not in model))

#Calculate the number of tokens for a string
def num_tokens_from_string(string: str, model: str) -> int:
    """Returns the number of tokens in a text string."""
    if not model in tiktoken_encodings:
        raise Exception(f"Tiktoken encoding unknown for LLM: {model}")
    encoding = tiktoken_encodings[model]
    num_tokens = len(encoding.encode(string))
    return num_tokens

# Queries an LLM for continuations of a batch of prompts given as a list
def query_LLM_batch(model, prompt_batch, max_tokens, use_cache=None, temperature=None,system_message=None,stop=None):
    if use_cache is None:
        use_cache=False
    cache_key=(model+"".join(prompt_batch)).encode('utf-8') #prefix the model name so that it is part of the key even for single-prompt batches
    if use_cache:
        cached_result=load_cached(cache_key)
        if cached_result is not None:
            return cached_result

    #choose whether to use the chat API or the older query API
    if is_chat_model(model):
        if system_message is None:
            system_message = "You are a helpful assistant."

        # each prompt in the batch becomes its own asynchronous chat completion request
        async def batch_request(prompt_batch):
            tasks=[]
            for prompt in prompt_batch:
                messages = [
                    {"role": "system", "content": system_message},
                    {"role": "user", "content": prompt},
                ]
                tasks.append(client_async.chat.completions.create(
                    model=model,
                    messages=messages,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    n=1,  # one completion per prompt
                    stop=stop,
                    frequency_penalty=0.0,
                    presence_penalty=0.0,
                ))
            return await asyncio.gather(*tasks)

        loop = asyncio.get_event_loop()
        responses = loop.run_until_complete(batch_request(prompt_batch))
        continuations = [response.choices[0].message.content.strip() for response in responses]

        # before we return the continuations, ensure that we don't violate OpenAI's rate limits
        total_tokens = 0
        for prompt in prompt_batch:
            total_tokens += num_tokens_from_string(string=system_message, model=model)
            total_tokens += num_tokens_from_string(string=prompt, model=model)
        for continuation in continuations:
            total_tokens += num_tokens_from_string(string=continuation, model=model)
        max_tokens_per_minute = 90000  # currently imposed limit for ChatGPT models
        wait_seconds = (total_tokens / max_tokens_per_minute) * 60.0
        #print(f"Waiting {wait_seconds} seconds to ensure staying within rate limit")
        time.sleep(wait_seconds)

    else:
        # The old completions API supports batched prompts out-of-the-box
        # (the client object created during setup already holds the API key)
        response = client.completions.create(
            model=model,
            prompt=prompt_batch,
            temperature=temperature,
            max_tokens=max_tokens,
            top_p=1.0,
            frequency_penalty=0.0,
            presence_penalty=0.0,
            stop=stop,
            n=1  # one completion per prompt
        )
        # extract continuations
        continuations = [choice.text for choice in response.choices]

        # before we return the continuations, ensure that we don't violate OpenAI's rate limits
        total_tokens = 0
        for prompt in prompt_batch:
            total_tokens += num_tokens_from_string(string=prompt, model=model)
        for continuation in continuations:
            total_tokens += num_tokens_from_string(string=continuation, model=model)
        max_tokens_per_minute = 90000  # currently imposed limit for ChatGPT models
        wait_seconds = (total_tokens / max_tokens_per_minute) * 60.0
        #print(f"Waiting {wait_seconds} seconds to ensure staying within rate limit")
        time.sleep(wait_seconds)

    if use_cache:
        cache(key=cache_key,value=continuations)
    return continuations


#Query LLM with a list of prompts
def query_LLM(model, prompts, max_tokens, use_cache=None, temperature=None, system_message=None,stop=None):
    #Query the LLM in batches
    continuations=[]
    batch_size = 10  # The exact max batch_size for each model is unknown. This seems to work for all, and provides a nice speed-up.
    N = len(prompts)
    for i in range(0, N, batch_size):
        prompt_batch=prompts[i:min([N, i + batch_size])]
        continuations+=query_LLM_batch(model=model,
                                 prompt_batch=prompt_batch,
                                 max_tokens=max_tokens,
                                 use_cache=use_cache,
                                 temperature=temperature,
                                 system_message=system_message,
                                 stop=stop)
        print_progress_bar(min([N, i + batch_size]), N,printEnd="")
    return continuations



def embed(texts,use_cache=None,model=None):
    if model is None:
        model = "text-embedding-3-small"
    if use_cache is None:
        use_cache = True
    cache_key=(model+("".join(texts))).encode('utf-8')
    if use_cache:
        cached_result=load_cached(cache_key)
        if cached_result is not None:
            print("Loaded embeddings from cache, hash", cache_hash(cache_key))
            return cached_result


    #query embeddings from the API
    texts=[json.dumps(s) for s in texts]  #make sure we escape quotes in a way compatible with GPT-3 API's internal use of json
    #(the client object created during setup already holds the API key)
    batch_size = 32
    N = len(texts)

    embed_matrix=[]
    for i in range(0, N, batch_size):
        print_progress_bar(i, N)
        embed_batch=texts[i:min([N, i + batch_size])]
        embeddings = client.embeddings.create(input=embed_batch, model=model)
        for j in range(len(embed_batch)):
            embed_matrix.append(embeddings.data[j].embedding)
    print("")
    embed_matrix=np.array(embed_matrix)
    #dim = len(embeddings['data'][0]['embedding'])
    #embed_matrix = np.zeros([N, dim])
    #for i in range(N):
    #    embed_matrix[i, :] = embeddings['data'][i]['embedding']

    #update cache
    if use_cache:
        cache(cache_key,embed_matrix)

    #return results
    return embed_matrix


def reduce_embedding_dimensionality(embeddings,num_dimensions,method="UMAP",use_cache=True,n_neighbors=None):
    if isinstance(embeddings,list):
        #embeddings is a list of embedding matrices => pack all to one big matrix for joint dimensionality reduction
        all_emb = np.concatenate(embeddings, axis=0)
    else:
        all_emb = embeddings
    def unpack(x,embeddings_list):
        row = 0
        result = []
        for e in embeddings_list:
            N = e.shape[0]
            result.append(x[row:row + N])
            row += N
        return result

    cache_key=(str(all_emb.tobytes())+str(num_dimensions)+method+str(n_neighbors)).encode('utf-8')
    if use_cache:
        cached_result=load_cached(cache_key)
        if cached_result is not None:
            print("Loaded dimensionality reduction results from cache, hash ", cache_hash(cache_key))
            if isinstance(embeddings, list):
                return unpack(cached_result,embeddings)
            else:
                return cached_result
    from sklearn.manifold import MDS
    from sklearn.manifold import TSNE
    import umap
    from sklearn.decomposition import PCA
    #cosine distance
    all_emb=all_emb/np.linalg.norm(all_emb,axis=1,keepdims=True)

    if method=="MDS":
        mds=MDS(n_components=num_dimensions,dissimilarity="precomputed")
        cosine_sim = np.inner(all_emb, all_emb)
        cosine_dist = 1 - cosine_sim
        x=mds.fit_transform(cosine_dist)
    elif method=="TSNE":
        tsne=TSNE(n_components=num_dimensions)
        x=tsne.fit_transform(all_emb)
    elif method=="PCA":
        pca=PCA(n_components=num_dimensions)
        x=pca.fit_transform(all_emb)
    elif method=="UMAP":
        if n_neighbors is None:
            n_neighbors=5
        reducer = umap.UMAP(n_components=num_dimensions,metric='cosine',n_neighbors=n_neighbors)
        x=reducer.fit_transform(all_emb)
    else:
        raise Exception("Invalid dimensionality reduction method!")

    if use_cache:
        cache(cache_key,x)

    if isinstance(embeddings, list):
        return unpack(x,embeddings)
    return x


#some quick test methods
def test_embeddings():
    texts=["queen","king","man","woman"]
    embeddings=embed(texts)
    embeddings=reduce_embedding_dimensionality(embeddings,method="PCA",num_dimensions=2)
    df=pd.DataFrame()
    df["texts"]=texts
    df["x"]=embeddings[:,0]
    df["y"]=embeddings[:,1]
    fig=px.scatter(df,
                #width=1300, height=1000, #The codes should be approximately 1:1 aspect ratio, but need space for the color bar
                x="x",
                y="y",
                hover_name="texts")
    fig.show() #a figure created inside a function is not displayed automatically

def test_batched_prompting():
    prompts=["what is 1+1?","what is 1+2?","what is 1+3?"]
    print(query_LLM("davinci-002",prompts,max_tokens=20))


### Basic few-shot prompting of game ideas
The code cell below simply prompts the LLM to continue the list of game ideas.

The code is hidden by default to make the interface cleaner. You can show it by clicking "Show code".

**Exercise:**
1. Run the code by clicking on the triangle. Repeat this a few times to see how the results are different every time.
2. Change the model to gpt-3.5-turbo-instruct using the drop-down menu. This model belongs to the same GPT-3.5 family as ChatGPT, but is intended for continuing text instead of chat. How do the results change?

**What you should observe:**
* Especially with davinci-002, quality decreases with each idea, as generated ideas become new few-shot examples for the next ideas, and random errors gradually accumulate and throw the model off the rails.
* gpt-3.5-turbo-instruct obeys instructions better than davinci-002, but the generated ideas are more repetitive and generic.

**Why is davinci-002 less reliable but more diverse?**

davinci-002 is a so-called "base" model that OpenAI trained on very large and diverse data. gpt-3.5-turbo-instruct has been finetuned to follow instructions, which tends to increase generation quality but reduce diversity and produce the recognizably boring voice of ChatGPT. Finetuning is typically formulated as an optimization problem for which the optimal solution is to always give the single best answer to a particular request or question. The finetuning datasets are also smaller, which inevitably limits diversity.
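
If you want a number to accompany the impression that one model is "more repetitive", one rough option is to count how many of the word pairs in a set of generations are unique. The sketch below is only an illustration under that assumption: `distinct_ratio` is a hypothetical helper, and the example texts are invented rather than actual model outputs.

```python
def distinct_ratio(texts, n=2):
    """Share of unique word n-grams among all n-grams in the texts (1.0 = no repetition)."""
    ngrams = []
    for text in texts:
        words = text.lower().split()
        ngrams += [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Invented example texts, not actual model generations
repetitive = ["A survival game set in space", "A survival game set in a forest"]
varied = ["A rhythm game about traffic lights", "Chess where pieces gossip between moves"]
print(distinct_ratio(repetitive), distinct_ratio(varied))
```

A lower ratio means that the same phrases recur across generations. Such a measure only captures surface-level wording, so it complements rather than replaces reading the results.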

In [None]:
model="davinci-002" # @param ["davinci-002", "gpt-3.5-turbo-instruct"] {allow-input: false}

#Define the prompt
prompt="""A list of novel and experimental indie game ideas:

---

An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle.

---

A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect "floor", "is", and "lava" to make the floor deadly.

---

"""

#Query the OpenAI completions API
response = client.completions.create(
  model=model,
  prompt=prompt,
  temperature=1,
  max_tokens=500,
)

#Print the response
print(response.choices[0].text)




Super-Prank:

A game where the player must annoying and frustratingly sabotage his environment all while trying to maintain a level head and patience. Levels escalate in complexity with an objective that usually takes the form of "repair [homeowner vandalism]".

---

A game where exploration is key as the player wanders through a series of apartments looking for clues about these flatmates. The longer each day goes without seeing a fellow tenant awake, the more concerned they become about their well-being. Extremely easy to add social anxiety sim elements to this one.

---

A glossy, super-polished game about fulfilling dreams, backed by a heartwrenching soundtrack that allows the player to tell their awesome story. The dream being fulfilled at any given time is fully under the player's control, which perverse the suspense of watching a friend do that one thing you've always wanted to do and they invite you along for the ride. It's a massive track list of will you/won't you prank momen

### Advanced few-shot prompting
The code cell below prompts each idea separately, extracting only the first idea from each generated result. This prevents the LLM from using previously generated ideas as few-shot examples. Thus, the quality of results does not decrease when generating multiple ideas.
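
Extracting only the first idea amounts to cutting the continuation at the first list delimiter. The code cell below delegates this to the API's `stop` parameter; the following sketch shows the same idea in plain Python, with `first_list_item` as a hypothetical helper name and an invented continuation.

```python
def first_list_item(continuation, delimiter="---"):
    """Return the text before the first delimiter, without surrounding whitespace."""
    return continuation.split(delimiter, 1)[0].strip()

# Invented continuation that contains more than one list item
raw_continuation = (
    "A rhythm game played with traffic lights.\n\n---\n\n"
    "A second idea that should be discarded.\n\n---\n\n"
)
print(first_list_item(raw_continuation))
```

Letting the API stop at the delimiter has the added benefit that no tokens are spent on text that would be thrown away.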

The code is hidden by default to make the interface cleaner. You can show it by clicking "Show code".

**Exercise:**

1. Run the code by clicking on the triangle. Repeat this a few times to see how the results are different every time.
2. Change the examples to describe your own favorite games. You can also try changing the prompt start, for example, use different adjectives such as "funny" or "vulgar" instead of "novel" and "experimental". If games are not your thing, skip this step.
3. Change the prompt_start and examples to generate something else such as story ideas, book opening sentences, or book quotes. If you have difficulty coming up with examples, try googling for the best book opening sentences or Amazon Kindle's most highlighted passages, or try generating six-word stories based on examples from [the Six Word Stories subreddit](https://www.reddit.com/r/sixwordstories/top/?t=all). Legend has it that Ernest Hemingway won a bet by writing the six-word story “For sale: baby shoes. Never worn.”



In [None]:
#parameters for the user
model="davinci-002" # @param ["davinci-002", "gpt-3.5-turbo-instruct"] {allow-input: false}
prompt_start="A list of novel experimental indie game ideas:" # @param {type:"string"}
example1="An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle." # @param {type:"string"}
example2="A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect \"floor\", \"is\", and \"lava\" to make the floor deadly." # @param {type:"string"}
example3="An FPS puzzle game with a \"portal gun\" that can create portals between two flat planes. For example, the player can create one portal on the floor and the second on the ceiling and push an object so that it falls down the floor portal and drops from the ceiling portal on top of some target." # @param {type:"string"}
n_generated=2 #@param {type:"slider", min:1, max:10, step:1}
max_tokens=400 #@param {type:"slider", min:50, max:500, step:10}
examples=[example1,example2,example3] #for convenience, convert the separate examples to a list

#define what separates each list item in the prompt
list_delimiter="---"

#Helper function: Construct a list prompting prompt based on an initial start string and examples
def construct_prompt(prompt_start,examples):
  prompt=prompt_start
  for example in examples:
    if len(example)>0 and (not str.isspace(example)):
      prompt+="\n\n"+list_delimiter+"\n\n"+example
  prompt+="\n\n"+list_delimiter+"\n\n" #add one more delimiter so that the LLM will then add the next list item
  return prompt

#Helper function: Generate multiple continuations, returned as a list of strings
def generate_multiple(model,prompt,max_tokens,n):
  response = client.completions.create(
    model=model,
    prompt=prompt,
    temperature=1,
    max_tokens=max_tokens,
    stop=list_delimiter,  #stop after we encounter the list delimiter => only generate one idea per prompt
    n=n #here, we generate multiple alternative continuations
  )
  return [choice.text.strip() for choice in response.choices] #strip trailing/leading spaces

#construct the prompt
prompt=construct_prompt(prompt_start,examples)
print("Prompt:\n")
print(prompt)

#use the prompt to generate multiple list items
items=generate_multiple(model=model,
                        prompt=prompt,
                        max_tokens=max_tokens,
                        n=n_generated)

#print results
print("Generated list items:")
for i in range(n_generated):
  print("\n\n"+list_delimiter+"\n\n")
  print(items[i])



Prompt:

A list of novel experimental indie game ideas:

---

An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle.

---

A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect "floor", "is", and "lava" to make the floor deadly.

---

An FPS puzzle game with a "portal gun" that can create portals between two flat planes. For example, the player can create one portal on the floor and the second on the ceiling and push an object so that it falls down the floor portal and drops from the ceiling portal on top of some target.

---


Generated list items:


---


A game about managing a pantheon of deities who yo-yo to inspire/enrage/milk the population (

### For further analysis, generate 50 ideas with both the base model and finetuned model.

To minimize API cost, pre-generated ideas are loaded by default. Uncheck the "load_pregenerated" checkbox to generate new ideas.

In [None]:
load_pregenerated=True #@param {type: "boolean"}
ideas_per_model=50
print(f"Generating {2*ideas_per_model} ideas using the following prompt:\n\n")
print(prompt)
if load_pregenerated:
  df=pd.read_csv("https://drive.google.com/uc?export=download&id=19QCJfmJNiExRNlEYEHzGXFlgytAAszim")
else:
  #use the prompt to generate new list items
  items=generate_multiple(model="davinci-002",
                        prompt=prompt,
                        max_tokens=max_tokens,
                        n=ideas_per_model)
  items+=generate_multiple(model="gpt-3.5-turbo-instruct",
                        prompt=prompt,
                        max_tokens=max_tokens,
                        n=ideas_per_model)

  #convert the items to a Pandas DataFrame (basically, an Excel sheet)
  df=pd.DataFrame()
  df["items"]=items
  df["model"]="davinci-002"
  df.loc[ideas_per_model:,"model"]="gpt-3.5-turbo-instruct" #use .loc, as chained assignment is unreliable in pandas
  df.to_csv("generated_items.csv",index=False) #omit the index to avoid extra "Unnamed" columns when the file is reloaded
df_all=df.copy()
df_all

Generating 100 ideas using the following prompt:


A list of novel experimental indie game ideas:

---

An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle.

---

A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect "floor", "is", and "lava" to make the floor deadly.

---

An FPS puzzle game with a "portal gun" that can create portals between two flat planes. For example, the player can create one portal on the floor and the second on the ceiling and push an object so that it falls down the floor portal and drops from the ceiling portal on top of some target.

---




| Unnamed: 0.1 | Unnamed: 0 | items | model |
|---|---|---|---|
| 0 | 0 | A fusion of Pippen Bar\n\nHere's an interestin... | davinci-002 |
| 1 | 1 | A puzzle platformer with portals that function... | davinci-002 |
| 2 | 2 | An RPG where the player can pursue a quest lin... | davinci-002 |
| 3 | 3 | An early idea. A dungeon game with a reactive ... | davinci-002 |
| 4 | 4 | A point & click adventure game where the playe... | davinci-002 |
| ... | ... | ... | ... |
| 95 | 95 | A survival game set in an open-world, procedur... | gpt-3.5-turbo-instruct |
| 96 | 96 | A narrative exploration game set in a haunted ... | gpt-3.5-turbo-instruct |
| 97 | 97 | A virtual reality escape room game where every... | gpt-3.5-turbo-instruct |
| 98 | 98 | A survival game set in a constantly shifting, ... | gpt-3.5-turbo-instruct |


### Visualizing generations using embeddings (Not implemented yet, coming in the next update of this notebook)
Here, we demonstrate how embedding vectors can be used for text data visualization.

**What are embedding vectors?**

Modern language models can map any piece of text to a real-valued vector (an array of floating-point values) that encodes the meaning of the text in an abstract "embedding space". Similar texts produce similar embedding vectors, with similarity typically measured as cosine similarity.
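To make the idea of cosine similarity concrete, here is a minimal sketch. The three-dimensional vectors are made-up toy values rather than real model outputs (real embeddings have hundreds or thousands of dimensions), and `cosine_similarity` is a hypothetical helper that is not part of this notebook's code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for illustration only
toy_embeddings = {
    "puzzle game A": [0.9, 0.1, 0.0],
    "puzzle game B": [0.8, 0.2, 0.1],
    "racing game": [0.0, 0.2, 0.9],
}

sim_puzzles = cosine_similarity(toy_embeddings["puzzle game A"],
                                toy_embeddings["puzzle game B"])
sim_mixed = cosine_similarity(toy_embeddings["puzzle game A"],
                              toy_embeddings["racing game"])

# Vectors pointing in similar directions score close to 1,
# nearly orthogonal vectors score close to 0
print(f"puzzle A vs puzzle B: {sim_puzzles:.2f}")
print(f"puzzle A vs racing:   {sim_mixed:.2f}")
```

Cosine similarity only compares the directions of the vectors, not their lengths, which is one reason it is a common choice for comparing embeddings.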

**Exercise:**

Browse the scatter plot with the mouse. What kinds of clusters do you notice? Are the ideas at the edges of the plot weirder or more interesting than the ideas in the center? Are there any differences between davinci-002 and gpt-3.5-turbo-instruct?

You can try different dimensionality reduction methods to see different clusters form. UMAP is often considered the most advanced method, but calculating it can take a long time.
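As a rough illustration of what dimensionality reduction does, the sketch below implements PCA, the simplest of the three methods, using only NumPy. It is an assumption about the general technique, not the implementation of this notebook's `reduce_embedding_dimensionality` helper. The function name `pca_2d` and the random stand-in data are hypothetical.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their two directions of largest variance."""
    X = np.asarray(vectors, dtype=float)
    # Center the data so that the projection describes variance around the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Random stand-in for real embeddings: 20 "texts" with 50 dimensions each
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(20, 50))

points = pca_2d(fake_embeddings)
print(points.shape)  # (20, 2): one x,y pair per text, ready for a scatter plot
```

PCA is a linear projection, so it is fast but can squash clusters on top of each other. MDS and UMAP are nonlinear and try to preserve distances or neighborhoods instead, which is why the same data can look quite different under each method.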

In [None]:
dimensionality_reduction="MDS" # @param ["PCA", "MDS", "UMAP"] {allow-input: false}
df=df_all.copy() #use a copy of the dataframe we generated or loaded above
items=df["items"].to_list()

#calculate embeddings using our OpenAI API helper
embeddings=embed(items)

#Reduce the dimensionality to 2d for visualization
embeddings=reduce_embedding_dimensionality(embeddings,
                                           method=dimensionality_reduction,
                                           num_dimensions=2)

#wrap long items over multiple lines; Plotly uses <br> as the line break tag
hover_texts=["<br>".join(textwrap.wrap(item,width=60)) for item in items]
df["Hover"]=hover_texts
df["x"] = embeddings[:,0]
df["y"] = embeddings[:,1]

# Plot
px.scatter(df,
             width=800, height=800, #The plot should have an approximately 1:1 aspect ratio, but it also needs space for the legend
             x="x",
             y="y",
             hover_name="Hover",
             color = "model")






### Combinatorial few-shot prompting
A common creativity method for humans is to pick random elements and recombine them. For example, the VNA game ideation method combines a random Verb, Noun, and Adjective.

This can help one think outside the box, and it can also push an LLM to generate more diverse ideas, especially when using a fine-tuned model that by default produces high-quality but less diverse generations.

Here, we extend few-shot prompting with randomly combined modifiers: We prompt game ideas by *systematically combining genres and mechanics*.
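The combination step can be sketched with `itertools.product`, which enumerates every way of picking one item from each list. The lists below are shortened, and the third list, `player_modes`, is a hypothetical extra modifier added only to show how quickly the number of combinations grows.

```python
import itertools

genres = ["FPS", "Puzzle"]
mechanics = ["Time manipulation", "Rule manipulation"]
player_modes = ["Single Player", "Multiplayer"]  # hypothetical third modifier list

# Every way of picking one genre, one mechanic, and one player mode
combinations = list(itertools.product(genres, mechanics, player_modes))

# Each combination becomes the start of a new list item that the LLM continues
prompt_suffixes = [" + ".join(combination) + ":" for combination in combinations]

print(len(prompt_suffixes))  # 8, i.e., 2 * 2 * 2
print(prompt_suffixes[0])    # FPS + Time manipulation + Single Player:
```

Because the number of combinations is the product of the list lengths, adding modifiers quickly yields more prompts than one would want to pay for, which is why sampling a random subset of the combinations is useful.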


**Exercises:**
* Run the code a few times and try different models. Try adding a third modifier, e.g., "Single Player, Multiplayer".

* Change the prompt start, modifiers, and examples to generate different kinds of games. For example, you could try combining classic games such as Space Invaders and Pac-Man with different emotions or experiences such as sadness, love, grief, or a break-up. For instance, in Space Invaders + Break-up, the enemies could be photos of the player and their ex, and when the player shoots them, they get deleted from the player's photos folder. The shields could be friends of the player and their ex who become collateral damage in the break-up. To get some seed ideas, you can try to continue prompts like "My experimental game combines Space Invaders with the emotion of Love. The player" in the OpenAI Playground.

* Modify the prompt start, examples, and modifiers to prompt spell names for a game where each spell combines different elements such as earth, wind, fire, and blood.

**Important:** Separate the modifiers using commas. When changing the examples, remember to start each example with a modifier combination.

In [None]:
# @title
model="davinci-002" # @param ["davinci-002", "gpt-3.5-turbo-instruct"] {allow-input: false}
prompt_start="A list of novel experimental indie game ideas:" # @param {type:"string"}
modifiers_1="FPS,Puzzle,Dating sim,Platformer" # @param {type:"string"}
modifiers_2="Time manipulation,Rule manipulation,Camera manipulation" # @param {type:"string"}
modifiers_3="" # @param {type:"string"}
modifiers_4="" # @param {type:"string"}
example1="FPS + Time manipulation: An FPS game where time only moves when the player moves or performs actions. This allows Matrix-style slow-motion gun ballet, and transforms real-time action into a puzzle." # @param {type:"string"}
example2="Puzzle + Rule manipulation: A Sokoban-style game where the player pushes around blocks that are variables, operators, and definitions of a programming language. Blocks that connect to each other form statements that allow the player to define and alter the game's rules. For example, the player can connect \"floor\", \"is\", and \"lava\" to make the floor deadly." # @param {type:"string"}
example3="Platformer + Camera manipulation: An FPS puzzle game with a \"portal gun\" that can create portals between two flat planes. For example, the player can create one portal on the floor and the second on the ceiling and push an object so that it falls down the floor portal and drops from the ceiling portal on top of some target." # @param {type:"string"}
n_generated=4 #@param {type:"slider", min:1, max:100, step:1}

#define what separates each list item in the prompt
list_delimiter="---"

#Helper function: Construct the prompt with examples
def construct_prompt(prompt_start,examples):
  prompt=prompt_start
  for example in examples:
    if len(example)>0 and (not str.isspace(example)):
      prompt+="\n\n"+list_delimiter+"\n\n"+example
  prompt+="\n\n"+list_delimiter+"\n\n"
  return prompt

#Construct a list of all possible combinations of the modifiers
modifiers=[modifiers_1,modifiers_2,modifiers_3,modifiers_4]
combined_items=[]
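for modifier in modifiers:
  #skip empty modifier fields and strip spaces around commas, e.g., "Single Player, Multiplayer"
  options=[option.strip() for option in modifier.split(",") if option.strip()]
  if len(options)>0:
    combined_items.append(options)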
combinations = list(itertools.product(*combined_items))

#If there's more combinations than the user requests, take a random subset
if n_generated < len(combinations):
  np.random.shuffle(combinations)
  combinations=combinations[:n_generated]

#Format the combinations as strings that can be added to the prompt
combination_strings=[" + ".join(combination)+":" for combination in combinations]

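#Construct the prompts
#Collect the examples defined above, so that this cell does not depend on an "examples" variable left over from an earlier cell
examples=[example1,example2,example3]
prompt_base=construct_prompt(prompt_start,examples)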
prompts=[prompt_base+combination_string for combination_string in combination_strings]

#Generate items. Here, instead of using the OpenAI API directly,
#we use our own helper for batches of multiple prompts
items=query_LLM(model=model,
                prompts=prompts,
                max_tokens=max_tokens,
                stop=list_delimiter)

#Add the combination strings back to the generated items for readability
for i,comb in enumerate(combination_strings):
  items[i]=comb+items[i]

#Print results
for item in items:
  print(item+"\n\n")

#Also show the results as a data table, which may be more readable
df=pd.DataFrame()
df["items"]=items
df

FPS + Camera manipulation: Being able to take a picture of an enemy's body or claw to reveal their weaknesses. Being able to take a picture of an item, and use the camera to find where that item can be found in location


Platformer + Time manipulation: A hack-based platformer where the player cannot die but his abilities will become progressively more limited until he hits a "right time right place" state where he has one last chance to complete a very short sequence of actions. The time manipulation consists of reversing his forwards progress and speeding up progression through the level, so that the player gets more control over time's movement in real-time. This lets the player walk over spike after midnight and then activate special blocks and enemies to bring down the spike while reverse-replaying twice as fast and ending with the spike underneath his feet.




FPS + T

Unnamed: 0,items
0,FPS + Camera manipulation: Being able to take ...
1,Platformer + Time manipulation: A hack-based p...
2,FPS + Time manipulation: OPPOSITE TIME (8.5/10...
3,Platformer + Rule manipulation: Items scattere...
