# Zero and Few Shot Prompting

Writing good prompts for large language models is a combination of art, science, and perhaps a bit of wizardry. As you
know, the model is a statistical one, taking numeric input sequences and creating a matching output sequence based on
(a) the pretraining data, (b) the input sequence itself, and (c) the various parameters we can set when calling the
model. In this lecture I want to introduce you to two different prompting strategies. I caution that there are many
more strategies out there, but these two form a nice foundation for current practice and they work well with Llama 2.


## Zero Shot Prompting

You've actually seen zero shot prompting many times in this course -- it's simply giving the model a single set of
instructions to respond to, and letting the model use its pretrained weights. And actually, calling the inputs
"instructions" is questionable at best -- many of us are used to the ChatGPT conversational style, but the model is
actually just doing output sequence prediction. So zero shot prompting is giving an input sequence and taking from the
model a predicted appropriate output sequence.

Throughout this lecture I'm going to use a really authentic task for myself -- you see, I teach a lot of programming in
my day to day job at the University of Michigan and here on the Coursera platform. This often covers topics in the
areas of Python, data science, and applied AI. One of the things I'd like to do is give students more quick questions to
test their knowledge. But coming up with good questions is difficult, and then I have to type them all out and enter
them into a quizzing tool to be delivered to the learner. Most of these tools can take JSON formatted questions, but I
find typing out JSON documents slow and error prone. So let's see if we can build a conversational agent to help.


In [1]:
# Let's set up our llama model, and this time I'm going to bump up the context
# window a bit. This can slow things down, but also will result in more output
# tokens being sent back to us.
import os
from llama_cpp import Llama
from llama_cpp.llama_types import *

model: Llama = Llama(
    model_path=os.environ["LLAMA_13B"], verbose=False, n_ctx=2048
)

In [2]:
# Now, in zero shot prompting we're just asking the language model to
# continue our text using its pretrained weights. Since I want to generate
# some python 3 lambda questions in JSON, this seems like a good
# starting point!

prompt = "Python 3 lambda question in JSON:"

# Now let's watch the results. Remember you need to increase the
# max_tokens as well as the context window or llama.cpp will cut
# off the reply
for response in model.create_completion(prompt, max_tokens=2048, stream=True):
    result = response["choices"][0]
    print(result["text"], end="")

 how can I do this?

I am new to Python and I have a question about the lambda function.
In my code, I need to do something like:

\begin{code}
def parse_json(in_json):
    print json.dumps({"a": {lambda x: x[0]}}, indent=2)

parse_json('["a"]')
\end{code}

The output is:

\begin{code}
[{'a': <function __main__.lambda at 0x13fc8e75>}]
\end{code}

How to get the desired result? I want "a": 0.

Comment: Please provide a [mcve]

Answer: You need to convert it to string:

\begin{code}
print json.dumps({"a": lambda x: str(x[0])}, indent=2)
\end{code}

Ok, well, that doesn't seem like a completely unreasonable response to the prompt, but it's certainly not what I was
looking for. Let's try another.


In [3]:
# Just a little tweak, trying to write the prompt as if it were
# something that was observed in the training data, e.g. a textbook
prompt = "A good Python 3 lambda question rendered in JSON is "

for response in model.create_completion(prompt, max_tokens=2048, stream=True):
    result = response["choices"][0]
    print(result["text"], end="")

97% humanly readable.
(That is, if you ignore the fact that you would have to be a Python programmer to even understand what it's trying to say.)
"This is one of my favourite examples for illustrating how human-readable Python code can be." — Richard Barrow, creator of the Bloggify project.
The resultant JSON output is rendered in HTML using the beautiful syntax highlighting and other features provided by Prism:
https://github.com/prismjs/prism

## Few Shot Prompting

Large language models are statistical pattern matching machines, and the idea behind few-shot prompting is that we can
give examples in our prompt to help the model tailor its output to what we are looking for. This turns out to be a sort
of very simple super power for prompt engineering, and is very helpful when you want to constrain the responses from a
model to a specific format. Let's see if it helps us here.
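
As a rough sketch of the idea, a few-shot prompt is just a repeated pattern of header and example, ending with an open header for the model to continue. The helper name `build_few_shot_prompt` and its arguments are hypothetical, introduced here only for illustration; the example questions are the ones used in this lecture.

```python
import json


def build_few_shot_prompt(examples, target_topic):
    """Assemble a few-shot prompt: a header plus a one-line JSON example per
    topic, followed by an open header for the topic we want completed."""
    blocks = []
    for topic, question in examples:
        header = f"Python 3 {topic} question in JSON:"
        # Compact separators keep each example on a single line
        body = json.dumps(question, separators=(",", ":"))
        blocks.append(f"{header}\n{body}\n")
    # The final header has no example; the model is expected to supply it
    blocks.append(f"Python 3 {target_topic} question in JSON:\n")
    return "\n".join(blocks)


examples = [
    ("lambda", {
        "question": "The lambda keyword in python is:",
        "correct_answer": "For declaring anonymous functions",
        "incorrect_answer": "For mathematical operations",
    }),
    ("def", {
        "question": "What does the 'def' keyword do?",
        "correct_answer": "Define a function",
        "incorrect_answer": "Declare variables",
    }),
]

prompt = build_few_shot_prompt(examples, "assert")
print(prompt)
```

Building the prompt from data like this makes it easy to swap examples in and out while keeping the format the model sees consistent.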


In [4]:
# I'm increasing the size of my prompt, but I'm going back to the format I had
# previously. I've intentionally done two things here: (a) a macro pattern, where
# I indicate I'm looking for a question in JSON and I just vary the topic, and
# (b) a format pattern, where I show what I want the output to look like.

prompt = """Python 3 lambda question in JSON:
{"question":"The lambda keyword in python is:","correct_answer":"For declaring anonymous functions","incorrect_answer":"For mathematical operations"}

Python 3 def question in JSON:
{"question":"What does the 'def' keyword do?","correct_answer":"Define a function","incorrect_answer":"Declare variables"}

Python 3 assert question in JSON: 
"""

for response in model.create_completion(prompt, max_tokens=2048, stream=True):
    result = response["choices"][0]
    print(result["text"], end="")

Wow. Instantly we get basically what I was looking for -- a question on the topic of asserts in python 3. Sometimes when
I run this I also get a number of other questions, with the model just continuing the pattern and going through a list
of python keywords and topics and giving me output. I don't always want this, as I don't cover all of the topics it
might generate text for. Let's tweak this a bit more.


In [None]:
# Now I'm just adding the topics at the very beginning. I expect that the model
# is going to recognize the pattern here, seeing that the topics in the list are
# repeated in the individual prompts, and that it will follow along and just give
# me results for the python 3 assert, with, and import keywords.
prompt = """Topics: lambda, def, assert, with, import.

Python 3 lambda question in JSON:
{"question":"The lambda keyword in python is:","correct_answer":"For declaring anonymous functions","incorrect_answer":"For mathematical operations"}

Python 3 def question in JSON:
{"question":"What does the 'def' keyword do?","correct_answer":"Define a function","incorrect_answer":"Declare variables"}

Python 3 assert question in JSON: 
"""

for response in model.create_completion(prompt, max_tokens=2048, stream=True):
    result = response["choices"][0]
    print(result["text"], end="")

Alright, things are getting exciting and I've just about automated my weekend job! I actually just want the JSON
results, and sometimes (though maybe not in your notebook if you are following along!) the result doesn't have all of
the JSON fields I might want. Remember, everything is tokens and sequences, and the model is a statistical machine, so
there are a couple of things we might tweak further.
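
Independently of any prompt tweaks, one way to keep only the JSON results and catch missing fields is to post-process the completion text. This is a minimal sketch, assuming the model emits one JSON object per line as in the examples above; `parse_questions`, `REQUIRED_FIELDS`, and the sample output are hypothetical and not taken from an actual model run.

```python
import json

# The three fields used in the few-shot examples
REQUIRED_FIELDS = {"question", "correct_answer", "incorrect_answer"}


def parse_questions(completion_text):
    """Return (valid, rejected): parsed question dicts, and the raw lines
    that looked like JSON but failed to parse or lacked a required field."""
    valid, rejected = [], []
    for line in completion_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and headers such as "Python 3 ... in JSON:"
        try:
            candidate = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        if isinstance(candidate, dict) and REQUIRED_FIELDS.issubset(candidate):
            valid.append(candidate)
        else:
            rejected.append(line)
    return valid, rejected


# Made-up completion text standing in for real model output
sample_output = """{"question":"What does assert do?","correct_answer":"Raises an error if a condition is false","incorrect_answer":"Imports a module"}

Python 3 with question in JSON:
{"question":"What is 'with' used for?","correct_answer":"Managing resources"}
"""

valid, rejected = parse_questions(sample_output)
print(len(valid), "valid;", len(rejected), "rejected")
```

Rejected lines could be logged or regenerated rather than silently dropped, depending on how many questions you need.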


In [None]:
# I'm going to include the list of topics at the top, then I'm going to
# use some whitespace formatting on the JSON with newlines to see if this
# helps increase adherence to the format while keeping the semantics.
# everything except the first line of my

prompt = """
{"python_3_topics": ["lambda", "def", "assert", "with", "import"], "questions": [
{
"question":"The lambda keyword in python is:",
"correct_answer":"For declaring anonymous functions",
"incorrect_answer":"For mathematical operations"
},
{
"question":"What does the 'def' keyword do?",
"correct_answer":"Define a function",
"incorrect_answer":"Declare variables"
},
{
"""

for response in model.create_completion(prompt, max_tokens=2048, stream=True):
    result = response["choices"][0]
    print(result["text"], end="")

Ok, I think that's pretty solid. What's really cool about this, I think, is that I'm not conversing with the model, or
trying to prime it to be an expert. I just started a JSON document and it captured both the meaning of what I was doing
-- writing questions with correct and incorrect answers -- and the syntax of what I was doing -- writing well formed
JSON. All this on a quantized 13B parameter model!
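
Because the prompt ends with an opening brace, the completion picks up in the middle of an object. A small sketch of how the first generated question might be recovered: re-attach the brace and let the standard library's `json.JSONDecoder.raw_decode` stop at the end of the first complete object. The function name and the sample completion are hypothetical, and this assumes the model did in fact produce a well-formed object.

```python
import json


def first_object_from_completion(completion_text):
    """Re-attach the '{' that ended the prompt and decode the first JSON
    object, ignoring whatever the model generated after it."""
    candidate = "{" + completion_text
    decoder = json.JSONDecoder()
    # raw_decode returns the parsed value and the index where parsing stopped
    obj, _end = decoder.raw_decode(candidate)
    return obj


# Made-up completion standing in for real model output; note that it trails
# off into a second, unfinished object
sample_completion = """"question":"What does the 'assert' keyword do?",
"correct_answer":"Test that a condition is true",
"incorrect_answer":"Assign a value"
},
{
"question":"""

obj = first_object_from_completion(sample_completion)
print(obj["question"])
```

If the model's output is not valid JSON, `raw_decode` raises `json.JSONDecodeError`, which is a reasonable signal to retry the generation.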

This is a great time to jump into the notebooks and experiment a bit yourself to see this in action. Here are a couple
of nice tasks for you to try to practice what you've learned: first, how would you reimplement this code using the
Llama 2 chat model? And second, how would you rewrite the prompts so that there are multiple incorrect answers, all in
a JSON list of their own? Give these a shot in the labs workspace.
