# Basics of Prompt Engineering

This notebook was created by Natasha Patnaik for MIT 15.S60 - Computing in Optimization and Statistics. 

**Last Updated**: January 2026.

In [1]:
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_ollama import ChatOllama

OLLAMA_MODEL = "llama3.1:8b"



## Motivation

Without modifying a general-purpose language model's parameters (that is, with no further training updates), can we adapt it to perform better on a specific downstream task purely at inference time?

_Prompt engineering_ tries to address this question by carefully structuring the input prompt to be compatible with how the model was originally pre-trained. Since generative language models are typically pre-trained for the next-token prediction task, we want to elicit better outputs by formatting our inputs as clear **task completion** problems.

More generally, _prompt engineering_ is a relatively new discipline: a collection of tips, tricks, and techniques that tend to improve performance in practice. Importantly, these prompt design principles are not guaranteed to work in all cases. They are empirical, task-dependent, and heuristic in nature, rather than a strict set of prescriptive rules that must be adhered to.

In [2]:
# Create an instance of the ChatOllama class, specifying our model.
llm = ChatOllama(model=OLLAMA_MODEL, temperature=0.2)

# A standard prompt as the baseline
# We'll try and see if we can get better output with some prompt engineering....
baseline_prompt = "How many rs are there in strawberry."
response = llm.invoke(baseline_prompt)
print(response.content)

There are no "rs" in a strawberry. A strawberry is a type of fruit, and it doesn't have any monetary value or currency associated with it.

If you meant to ask something else, please feel free to clarify!


## In-context Learning

Here's the paper from the OpenAI Team that looked at few-shot learning with GPT-3: https://arxiv.org/abs/2005.14165. 

### Few-shot, One-shot, and Zero-shot Prompting

Suppose you have a natural language description of the task you would like the language model to complete. You put this description in the prompt. If your prompt contains only the task description, this is called zero-shot prompting.

If you additionally include one correctly completed example in the prompt, this is referred to as one-shot prompting. Including multiple examples is known as few-shot prompting.

By providing examples of the task being completed correctly in the prompt itself, the language model can generate output that is conditioned on these examples in the input. This phenomenon is called **in-context learning**, and it becomes increasingly pronounced as language models are scaled to larger sizes.

This approach requires orders of magnitude fewer labeled pairs than supervised fine-tuning! Moreover, this inference-time approach doesn't update the model's parameters or architecture in any way, thus enabling it to retain general-purpose capabilities.
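Because the only difference between zero-shot, one-shot, and few-shot prompts is how many solved examples they contain, the prompt can be assembled from a list of examples. The sketch below is a minimal illustration in plain Python (no LangChain needed); the helper names `count_letter` and `build_few_shot_prompt` are hypothetical and not part of any library. Labels are computed with `str.count`, so the examples in the prompt are guaranteed to be correct.

```python
def count_letter(phrase: str, letter: str = "r") -> int:
    """Ground-truth count of `letter` in `phrase` (case-insensitive)."""
    return phrase.lower().count(letter.lower())


def build_few_shot_prompt(examples, new_phrase, letter="r"):
    """Assemble a k-shot prompt, where k = len(examples).

    An empty `examples` list gives a zero-shot prompt, one example gives
    a one-shot prompt, and several examples give a few-shot prompt.
    """
    lines = [
        f"Count the number of times the letter '{letter}' appears in the following phrases:",
        "",
    ]
    for phrase in examples:
        # Each solved example is labeled with its true count.
        lines.append(f"* {phrase} => {count_letter(phrase, letter)}")
    # The unsolved phrase goes last, in the same format, for the model to complete.
    lines.append(f"* {new_phrase} =>")
    return "\n".join(lines)


demo_prompt = build_few_shot_prompt(["red rover", "ferry", "green"], "strawberry")
print(demo_prompt)
```

The resulting string can be passed to `llm.invoke(...)` in the same way as the formatted templates in the next cell.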

In [3]:
# ZERO-SHOT EXAMPLE
zero_shot_prompt = PromptTemplate(
    template="""
Count the number of times the letter 'r' appears in the following phrase:

{phrase}
""",
    input_variables=["phrase"],
)
response = llm.invoke(zero_shot_prompt.format(phrase="strawberry"))
print("Zero-shot answer:")
print(response.content)
print("")

# ONE-SHOT EXAMPLE
one_shot_prompt = PromptTemplate(
    template="""
Count the number of times the letter 'r' appears in the following phrases:

* red rover => 3

{new_phrase} =>
""",
    input_variables=["new_phrase"],
)

response = llm.invoke(one_shot_prompt.format(new_phrase="strawberry"))
print("One-shot answer:")
print(response.content)
print("")

# FEW-SHOT EXAMPLE
few_shot_prompt = PromptTemplate(
    template="""
Count the number of times the letter 'r' appears in the following phrases:

* red rover => 3
* roaring river => 4
* green => 1
* red riding hood => 2
* ferry => 2
* raspberry cream => 4

* {new_phrase} =>
""",
    input_variables=["new_phrase"],
)

response = llm.invoke(few_shot_prompt.format(new_phrase="strawberry"))
print("Few-shot answer:")
print(response.content)

Zero-shot answer:
The letter "r" appears 3 times in the word "strawberry".

One-shot answer:
Let's count the number of times the letter "r" appears in the phrase "strawberry":

1. str- (1 "r")
2. a-w-b-e-r-y (2 more "r"s)

So, the total number of times the letter "r" appears in the phrase "strawberry" is: 3

Few-shot answer:
Let's count the number of times the letter 'r' appears in each phrase:

* strawberry: 3


## Role Indicators

Within the prompt template, you can mark certain sections as belonging to a specific "role", such as an assistant, user, or system instructions. This imposes more structure on conversations, allowing us to easily distinguish between high-level instructions and specific user inputs.
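One variation, which this notebook does not use, is to express each solved example as its own user/assistant turn instead of placing all examples in a single user message. The sketch below builds such a message list in plain Python. The helper `build_chat_messages` is hypothetical, and the `"assistant"` role name is an assumption about what your chat template accepts; the `(role, content)` tuple shape mirrors what is passed to `ChatPromptTemplate.from_messages` in the next cell.

```python
def build_chat_messages(system_text, examples, new_phrase):
    """Return a list of (role, content) tuples with one turn pair per example."""
    messages = [("system", system_text)]
    for phrase, count in examples:
        # The question is attributed to the user, the solved answer to the assistant.
        messages.append(("user", f"Count the letter 'r' in: {phrase}"))
        messages.append(("assistant", str(count)))
    # The final user turn is the unsolved query.
    messages.append(("user", f"Count the letter 'r' in: {new_phrase}"))
    return messages


chat_messages = build_chat_messages(
    "You are a precise assistant that follows patterns.",
    [("red rover", 3), ("ferry", 2)],
    "strawberry",
)
for role, content in chat_messages:
    print(f"{role}: {content}")
```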

In [4]:
prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a precise assistant that follows patterns. You look at every character in the phrase."),
     ("user",
            """
Count the number of times the letter 'r' appears in the following phrases:

red rover => 3
roaring river => 4
green => 1
red riding hood => 2
ferry => 2
raspberry cream => 4

strawberry =>
"""
        ),
    ]
)

messages = prompt.format_messages()
response = llm.invoke(messages)
print(response.content)

Let's count the number of times the letter 'r' appears in each character of the phrase "strawberry":

s-t-r-a-w-b-e-r-r-y

The letter 'r' appears 3 times.


## Chain of Thought (CoT) Prompting

Here's the paper from Google Research: https://arxiv.org/pdf/2201.11903.

When providing examples of successfully completed tasks in the prompt, include the intermediate steps used to arrive at the correct (or desired) answer. By structuring the examples to explicitly show this reasoning trace, the language model will use this context to generate answers that break down the larger task into simpler steps. This makes generating the correct next token for each intermediate step more likely, thus resulting in an improved final answer.

You don't necessarily need to include full examples in a few-shot or one-shot structure to leverage the CoT format. You can also take a zero-shot approach: provide no solved examples, and simply add an instruction line such as "_break down the task step-by-step to arrive at the final answer_" to encourage intermediate reasoning.
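As a rough sketch of the zero-shot variant, the template below appends a step-by-step instruction to the plain task description. The constant name `ZERO_SHOT_COT_TEMPLATE` and the requested `Total = <number>` closing line are illustrative assumptions, chosen to match the format of the few-shot examples in this notebook.

```python
# Zero-shot CoT: no solved examples, only an instruction to reason step by step.
ZERO_SHOT_COT_TEMPLATE = (
    "Count the number of times the letter 'r' appears in the following phrase:\n\n"
    "{phrase}\n\n"
    "Break down the task step-by-step to arrive at the final answer.\n"
    "Finish with a line of the form: Total = <number>\n"
)

zero_shot_cot_text = ZERO_SHOT_COT_TEMPLATE.format(phrase="strawberry")
print(zero_shot_cot_text)
```

The formatted string can be sent to the model with `llm.invoke(zero_shot_cot_text)`.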

The [original paper](https://arxiv.org/pdf/2201.11903) also states: "*Notably, chain-of-thought reasoning is an emergent ability of increasing model scale*". This means the benefits of CoT prompting over standard prompting become more apparent for larger language models (that is, models with more billions of parameters).

A nice side-benefit of this approach is that you can also view the reasoning trace to see the steps leading up to the final answer.
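Since the trace is plain text, it can also be checked programmatically. The sketch below uses two hypothetical helpers: `extract_total` pulls the final number that follows the word "Total", and `spelled_words` recovers any words the model spelled out letter by letter. The sample trace copies two lines from the recorded few-shot CoT output in this notebook, where the final count is right even though the spelling drops the "a". The regular expressions assume the output format used in this notebook and would need adjusting for other formats.

```python
import re


def extract_total(trace):
    """Return the last integer that follows 'Total' on the same line, or None."""
    matches = re.findall(r"Total[^\n\d]*(\d+)", trace)
    return int(matches[-1]) if matches else None


def spelled_words(trace):
    """Return words spelled as quoted, hyphen-separated letters, e.g. "f-e-r-r-y"."""
    spellings = re.findall(r'"((?:[a-z]-)+[a-z])"', trace)
    return [s.replace("-", "") for s in spellings]


sample_trace = (
    '- r appears in "s-t-r-w-b-e-r-r-y" thrice.\n'
    'Total for the phrase "strawberry": 3'
)

print(extract_total(sample_trace))
print(spelled_words(sample_trace))
# Flag traces whose letter-by-letter spelling does not match the input phrase.
print("strawberry" in spelled_words(sample_trace))
```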

In [5]:
cot_few_shot_prompt = PromptTemplate(
    template="""
Count the number of times the letter 'r' appears in the following phrases:

Phrase: red rover
Reasoning:
- r appears in "r-e-d" once.
- r appears in "r-o-v-e-r" twice.
Total = 3

Phrase: ferry
Reasoning:
- r appears in "f-e-r-r-y" twice.
Total = 2

Phrase: green
Reasoning:
- r appears in "g-r-e-e-n" once.
Total = 1

Phrase: raspberry cream
Reasoning:
- r appears in "r-a-s-p-b-e-r-r-y" thrice.
- r appears in "c-r-e-a-m" once.
Total = 4

Phrase: {new_phrase}
Reasoning:
""",
    input_variables=["new_phrase"],
)

response = llm.invoke(cot_few_shot_prompt.format(new_phrase="strawberry"))
print(response.content)


Based on the provided reasoning, I will count the number of times the letter 'r' appears in each phrase:

1. Phrase: red rover
   Total = 3 (as calculated)

2. Phrase: ferry
   Total = 2 (as calculated)

3. Phrase: green
   Total = 1 (as calculated)

4. Phrase: raspberry cream
   - r appears in "r-a-s-p-b-e-r-r-y" thrice.
   - r appears once in "c-r-e-a-m"
   Total = 4 (as calculated)

5. Phrase: strawberry
   Reasoning:
   - r appears in "s-t-r-w-b-e-r-r-y" thrice.

Total for the phrase "strawberry": 3


## Iterate! Improve your prompts over time.

It helps to experiment with your prompt to see what works best for your application. If you empirically test out different versions of your prompt for the task at hand, and track the outcomes over time, you'll get a sense of what works well in practice. 

Being as specific as possible and using a consistent template with a well-defined structure can help. Additionally, explicitly instructing the model to return a default output or pre-set option when the task is too difficult can help mitigate the risk of hallucination.
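One possible way to track outcomes is a small scoring loop that runs each prompt variant over phrases with known answers. The sketch below is illustrative only: the variant names, the `UNKNOWN` fallback wording, and the helpers `parse_answer` and `score_variants` are all hypothetical, and `fake_model` is a stand-in that always answers "2" so the example runs without a model. Taking the last integer in the reply is a crude parsing rule that may misread longer answers.

```python
import re


def count_letter(phrase, letter="r"):
    """Ground-truth count of `letter` in `phrase` (case-insensitive)."""
    return phrase.lower().count(letter.lower())


PROMPT_VARIANTS = {
    "plain": "How many times does the letter 'r' appear in: {phrase}",
    "with_fallback": (
        "Count the number of times the letter 'r' appears in: {phrase}\n"
        "Reply with a single integer. If you cannot tell, reply UNKNOWN."
    ),
}


def parse_answer(text):
    """Return the last integer in the reply, or None if there is none (e.g. UNKNOWN)."""
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None


def score_variants(ask_model, variants, phrases):
    """Return, per variant, the fraction of phrases answered correctly."""
    scores = {}
    for name, template in variants.items():
        correct = sum(
            parse_answer(ask_model(template.format(phrase=p))) == count_letter(p)
            for p in phrases
        )
        scores[name] = correct / len(phrases)
    return scores


def fake_model(prompt_text):
    """Hypothetical stand-in for an LLM call: always answers 2."""
    return "2"


scores = score_variants(fake_model, PROMPT_VARIANTS, ["ferry", "green", "strawberry"])
print(scores)
```

To score the real model, `fake_model` could be swapped for `lambda text: llm.invoke(text).content`, using the `llm` object defined earlier in this notebook.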