# Prompt Engineering with Llama 3.1

Prompt engineering is using natural language to produce a desired response from a large language model (LLM).

## Introduction

### Llama Models

In 2023, Meta introduced the [Llama language models](https://ai.meta.com/llama/) (Llama Chat, Code Llama, Llama Guard). These are general purpose, state-of-the-art LLMs.

Llama models come in varying parameter sizes. The smaller models are cheaper to deploy and run; the larger models are more capable.

#### Llama 3.1
1. `llama-3.1-8b` - base pretrained 8 billion parameter model
1. `llama-3.1-70b` - base pretrained 70 billion parameter model
1. `llama-3.1-405b` - base pretrained 405 billion parameter model
1. `llama-3.1-8b-instruct` - instruction fine-tuned 8 billion parameter model
1. `llama-3.1-70b-instruct` - instruction fine-tuned 70 billion parameter model
1. `llama-3.1-405b-instruct` - instruction fine-tuned 405 billion parameter model (flagship)


#### Llama 3
1. `llama-3-8b` - base pretrained 8 billion parameter model
1. `llama-3-70b` - base pretrained 70 billion parameter model
1. `llama-3-8b-instruct` - instruction fine-tuned 8 billion parameter model
1. `llama-3-70b-instruct` - instruction fine-tuned 70 billion parameter model (flagship)

#### Llama 2
1. `llama-2-7b` - base pretrained 7 billion parameter model
1. `llama-2-13b` - base pretrained 13 billion parameter model
1. `llama-2-70b` - base pretrained 70 billion parameter model
1. `llama-2-7b-chat` - chat fine-tuned 7 billion parameter model
1. `llama-2-13b-chat` - chat fine-tuned 13 billion parameter model
1. `llama-2-70b-chat` - chat fine-tuned 70 billion parameter model (flagship)


Code Llama is a code-focused LLM built on top of Llama 2. It is also available in various sizes and fine-tunes:

#### Code Llama
1. `codellama-7b` - code fine-tuned 7 billion parameter model
1. `codellama-13b` - code fine-tuned 13 billion parameter model
1. `codellama-34b` - code fine-tuned 34 billion parameter model
1. `codellama-70b` - code fine-tuned 70 billion parameter model
1. `codellama-7b-instruct` - code & instruct fine-tuned 7 billion parameter model
1. `codellama-13b-instruct` - code & instruct fine-tuned 13 billion parameter model
1. `codellama-34b-instruct` - code & instruct fine-tuned 34 billion parameter model
1. `codellama-70b-instruct` - code & instruct fine-tuned 70 billion parameter model
1. `codellama-7b-python` - Python fine-tuned 7 billion parameter model
1. `codellama-13b-python` - Python fine-tuned 13 billion parameter model
1. `codellama-34b-python` - Python fine-tuned 34 billion parameter model
1. `codellama-70b-python` - Python fine-tuned 70 billion parameter model

## Notebook Setup

In [1]:
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    options={'temperature': 0.8, 'top_p': 1.0}
)

response['message']['content']

"The sky appears blue to us during the day because of a phenomenon called Rayleigh scattering.\n\nHere's what happens:\n\n1. **Sunlight enters Earth's atmosphere**: When the sun shines, it sends out a vast spectrum of light across the entire electromagnetic spectrum, including all the colors of the visible spectrum (red, orange, yellow, green, blue, indigo, and violet).\n2. **Light scatters in all directions**: As sunlight travels through the atmosphere, it encounters tiny molecules of gases like nitrogen (N2) and oxygen (O2), as well as smaller particles like dust, water vapor, and pollutants.\n3. **Short wavelengths scatter more**: When light hits these particles, the shorter-wavelength colors (like blue and violet) are scattered in all directions much more efficiently than the longer-wavelength colors (like red and orange). This is because the smaller size of the particles allows them to interact with the shorter wavelengths more effectively.\n4. **Blue light reaches our eyes**: As 

In [2]:
stream = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)

The sky appears blue to us because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh who first described it in the late 19th century. Here's why:

1. **Light Composition**: Light from the sun is made up of all the colors of the visible spectrum (red, orange, yellow, green, blue, indigo, and violet). When this light travels through space to reach our eyes, it encounters tiny molecules of gases such as nitrogen (N2) and oxygen (O2) in the Earth's atmosphere.

2. **Scattering**: The shorter wavelengths (or higher frequencies) of light, like blue and violet, are scattered more than the longer wavelengths (lower frequencies), like red and orange. This is because smaller molecules scatter shorter wavelength light more efficiently. Blue light, being a shorter wavelength, gets scattered in all directions by these tiny atmospheric particles.

3. **What We See**: When we look up at the sky on a clear day, what we see is the sum total of all the light tha

In [3]:
from typing import Dict, List

DEFAULT_MODEL = "llama3.1"

def assistant(content: str):
    return { "role": "assistant", "content": content }

def user(content: str):
    return { "role": "user", "content": content }

def chat_completion(
    messages: List[Dict],
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:

    response = ollama.chat(
        model=model,
        messages=messages,
        options={'temperature': temperature, 'top_p': top_p}
    )

    return response['message']['content']
        

def completion(
    prompt: str,
    model: str = DEFAULT_MODEL,
    temperature: float = 0.6,
    top_p: float = 0.9,
) -> str:
    return chat_completion(
        [user(prompt)],
        model=model,
        temperature=temperature,
        top_p=top_p,
    )

def complete_and_print(prompt: str, model: str = DEFAULT_MODEL):
    print(f'==============\n{prompt}\n==============')
    response = completion(prompt, model)
    print(response, end='\n\n')

### Completion APIs

Let's try Llama 3.1!

In [4]:
complete_and_print("The typical color of the sky is: ")

The typical color of the sky is: 
Blue.



In [5]:
complete_and_print("which model version are you?")

which model version are you?
I'm an AI, and I don't have a specific "model version" in the classical sense. However, I can give you some information about my underlying technology.

I was trained on a large corpus of text data using a transformer-based architecture, which is a type of neural network designed for natural language processing tasks. My training data includes a massive amount of text from various sources, including books, articles, research papers, and websites.

My model is based on the BERT (Bidirectional Encoder Representations from Transformers) architecture, which was developed by Google in 2018. However, my training setup is slightly different, as I'm a cloud-based AI designed to provide conversational interfaces.

As for specific "model versions," I'm currently running on a variant of the BERT model known as RoBERTa (Robustly Optimized BERT Pretraining Approach). RoBERTa was developed by Facebook AI in 2019 and is an improved version of the original BERT architectur

### Chat Completion APIs
Chat completion models add structure to interactions with an LLM. Instead of a single piece of text, an array of structured message objects is sent to the LLM. This message list gives the LLM some "context" or "history" from which to continue.

Typically, each message contains `role` and `content`:
* Messages with the `system` role are used by developers to provide core instructions to the LLM.
* Messages with the `user` role are typically human-provided messages.
* Messages with the `assistant` role are typically generated by the LLM.
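
The helper functions defined earlier cover the `user` and `assistant` roles only. As a minimal sketch, assuming the model's chat template accepts a `system` message, the hypothetical helpers `system` and `build_messages` below (neither is part of the notebook or of the Ollama client) show where a developer instruction would sit in the message list:

```python
from typing import Dict, List, Optional


def system(content: str) -> Dict[str, str]:
    # Hypothetical helper mirroring the notebook's user() and assistant()
    return {"role": "system", "content": content}


def build_messages(
    instruction: Optional[str],
    history: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    # Put the developer instruction first so it frames the whole conversation
    messages = [system(instruction)] if instruction else []
    messages.extend(history)
    return messages


messages = build_messages(
    "Answer in one short sentence.",
    [{"role": "user", "content": "Why is the sky blue?"}],
)
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list has the same shape as the `messages` argument used throughout this notebook, so it could be passed to `chat_completion` unchanged.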

In [6]:
response = chat_completion(messages=[
    user("My favorite color is blue."),
    assistant("That's great to hear!"),
    user("What is my favorite color?"),
])
print(response)
# "Sure, I can help you with that! Your favorite color is blue."

I remember! Your favorite color is blue!


### LLM Hyperparameters

#### `temperature` & `top_p`

These APIs also take parameters which influence the creativity and determinism of your output.

At each step, LLMs generate a list of most likely tokens and their respective probabilities. The least likely tokens are "cut" from the list (based on `top_p`), and then a token is randomly selected from the remaining candidates (`temperature`).

In other words: `top_p` controls the breadth of vocabulary in a generation and `temperature` controls the randomness within that vocabulary. A temperature of ~0 produces *almost* deterministic results.

[Read more about temperature setting here](https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api-a-few-tips-and-tricks-on-controlling-the-creativity-deterministic-output-of-prompt-responses/172683).
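
To make the interaction between the two parameters concrete, here is a toy sketch of sampling one token from a made-up four-word vocabulary. It assumes one common ordering (temperature-scaled softmax first, then the `top_p` cut); real inference engines differ in ordering and details, and the names `apply_temperature`, `nucleus`, `sample_token`, and `toy_logits` are all illustrative rather than part of any library.

```python
import math
import random
from typing import Dict, List, Tuple


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    # Softmax over logits / temperature; temperature must be > 0.
    # Low temperatures sharpen the distribution, high ones flatten it.
    scaled = {token: logit / temperature for token, logit in logits.items()}
    peak = max(scaled.values())
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    norm = sum(exps.values())
    return {token: value / norm for token, value in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> List[Tuple[str, float]]:
    # Keep the most likely tokens until their cumulative probability reaches top_p
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: List[Tuple[str, float]] = []
    total = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= top_p:
            break
    return kept


def sample_token(
    logits: Dict[str, float],
    temperature: float,
    top_p: float,
    rng: random.Random,
) -> str:
    kept = nucleus(apply_temperature(logits, temperature), top_p)
    tokens = [token for token, _ in kept]
    weights = [prob for _, prob in kept]
    # random.choices renormalizes the weights of the surviving tokens
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for the next token after "The sky is"
toy_logits = {"blue": 3.0, "grey": 1.5, "orange": 1.0, "green": 0.2}
rng = random.Random(0)
print([sample_token(toy_logits, 0.01, 0.01, rng) for _ in range(5)])  # always 'blue'
print([sample_token(toy_logits, 1.0, 1.0, rng) for _ in range(5)])    # may vary
```

With both values near zero, only the single most likely token survives the cut, which is why such settings behave almost deterministically.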

Let's try it out:

In [7]:
def print_tuned_completion(temperature: float, top_p: float):
    response = completion("Write a haiku about llamas", temperature=temperature, top_p=top_p)
    print(f'[temperature: {temperature} | top_p: {top_p}]\n{response.strip()}\n')

print_tuned_completion(0.01, 0.01)
print_tuned_completion(0.01, 0.01)
# These two generations are highly likely to be the same

print_tuned_completion(1.0, 1.0)
print_tuned_completion(1.0, 1.0)
# These two generations are highly likely to be different

[temperature: 0.01 | top_p: 0.01]
Here is a haiku about llamas:

Fuzzy, gentle soul
Llama's soft eyes gaze at me
Peaceful Andean heart

[temperature: 0.01 | top_p: 0.01]
Here is a haiku about llamas:

Fuzzy, gentle soul
Llama's soft eyes gaze at me
Peaceful Andean heart

[temperature: 1.0 | top_p: 1.0]
Here is a haiku about llamas:

Softly llama stands
Fuzzy, gentle eyes gaze down
Peaceful Andean king

[temperature: 1.0 | top_p: 1.0]
Furry, spiky pack
Alpaca cousins roam freely
Mountain's gentle soul



## Prompting Techniques

### Explicit Instructions

Detailed, explicit instructions produce better results than open-ended prompts:

In [8]:
complete_and_print(prompt="Describe quantum physics in one short sentence of no more than 12 words")
# Returns a succinct explanation of quantum physics that mentions particles and states existing simultaneously.

Describe quantum physics in one short sentence of no more than 12 words
Quantum physics studies the behavior of matter and energy at atomic level.



You can think of explicit instructions as rules and restrictions on how Llama 3 responds to your prompt.

- Stylization
    - `Explain this to me like a topic on a children's educational network show teaching elementary students.`
    - `I'm a software engineer using large language models for summarization. Summarize the following text in under 250 words:`
    - `Give your answer like an old timey private investigator hunting down a case step by step.`
- Formatting
    - `Use bullet points.`
    - `Return as a JSON object.`
    - `Use less technical terms and help me apply it in my work in communications.`
- Restrictions
    - `Only use academic papers.`
    - `Never give sources older than 2020.`
    - `If you don't know the answer, say that you don't know.`

Here's an example of using explicit instructions to get more specific results by restricting the response to recent sources.

In [9]:
complete_and_print("Explain the latest advances in large language models to me.")
# More likely to cite sources from 2017

complete_and_print("Explain the latest advances in large language models to me. Always cite your sources. Never cite sources older than 2020.")
# Gives more specific advances and only cites sources from 2020

Explain the latest advances in large language models to me.
Large language models (LLMs) have made tremendous progress in recent years, and I'll summarize some of the key advancements:

**1. Transformer Architecture**: The transformer architecture, introduced by Vaswani et al. in 2017, has become the de facto standard for LLMs. This architecture is based on self-attention mechanisms, which allow models to process sequences of tokens (e.g., words or characters) simultaneously and capture long-range dependencies.

**2. Scaling Up**: As computational resources have improved, researchers have been able to scale up transformer-based models to unprecedented sizes. For example:
	* BERT (2018): 110 million parameters
	* RoBERTa (2019): 345 million parameters
	* T5 (2020): 1.3 billion parameters
	* Megatron-LM (2020): 8.4 billion parameters
	* Switch Transformer (2022): 1.6 trillion parameters

**3. Pre-training and Fine-tuning**: The effectiveness of LLMs has been greatly enhanced by pre-train

### Example Prompting using Zero- and Few-Shot Learning

A shot is an example or demonstration of what type of prompt and response you expect from a large language model. This term originates from training computer vision models on photographs, where one shot was one example or instance that the model used to classify an image ([Fei-Fei et al. (2006)](http://vision.stanford.edu/documents/Fei-FeiFergusPerona2006.pdf)).

#### Zero-Shot Prompting

Large language models like Llama 3 are unique because they are capable of following instructions and producing responses without having previously seen an example of a task. Prompting without examples is called "zero-shot prompting".

Let's try using Llama 3 as a sentiment detector. You may notice that output format varies - we can improve this with better prompting.

In [10]:
complete_and_print("Text: This was the best movie I've ever seen! \n The sentiment of the text is: ")
# Returns positive sentiment

complete_and_print("Text: The director was trying too hard. \n The sentiment of the text is: ")
# Returns negative sentiment

Text: This was the best movie I've ever seen! 
 The sentiment of the text is: 
The sentiment of the text is extremely positive, indicating a strong enthusiasm and admiration for the movie. The use of the superlative "best" emphasizes the speaker's high opinion of the film.

Text: The director was trying too hard. 
 The sentiment of the text is: 
Negative.

The phrase "trying too hard" implies that the director's efforts are coming across as insincere, over-the-top, or even annoying, which suggests a negative opinion about their work.




#### Few-Shot Prompting

Adding specific examples of your desired output generally results in more accurate, consistent output. This technique is called "few-shot prompting".

In this example, the generated response follows our desired format: a more nuanced sentiment classifier that reports a confidence percentage for positive, neutral, and negative sentiment.

See also: [Zhao et al. (2021)](https://arxiv.org/abs/2102.09690), [Liu et al. (2021)](https://arxiv.org/abs/2101.06804), [Su et al. (2022)](https://arxiv.org/abs/2209.01975), [Rubin et al. (2022)](https://arxiv.org/abs/2112.08633).



In [11]:
def sentiment(text):
    response = chat_completion(messages=[
        user("You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative."),
        user("I liked it"),
        assistant("70% positive 30% neutral 0% negative"),
        user("It could be better"),
        assistant("0% positive 50% neutral 50% negative"),
        user("It's fine"),
        assistant("25% positive 50% neutral 25% negative"),
        user(text),
    ])
    return response

def print_sentiment(text):
    print(f'INPUT: {text}')
    print(sentiment(text))

print_sentiment("I thought it was okay")
# More likely to return a balanced mix of positive, neutral, and negative
print_sentiment("I loved it!")
# More likely to return 100% positive
print_sentiment("Terrible service 0/10")
# More likely to return 100% negative

INPUT: I thought it was okay
20% positive 70% neutral 10% negative
INPUT: I loved it!
95% positive 5% neutral 0% negative
INPUT: Terrible service 0/10
0% positive 0% neutral 100% negative
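
The `sentiment` function above hard-codes its demonstrations. As a sketch of the same message layout built from data, the hypothetical helper `few_shot_messages` below (an illustrative name, not part of the notebook) turns a list of input/output pairs into alternating `user` and `assistant` messages followed by the real query:

```python
from typing import Dict, List, Tuple


def few_shot_messages(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> List[Dict[str, str]]:
    # Instruction first, then alternating user/assistant demonstrations,
    # then the input we actually want classified
    messages = [{"role": "user", "content": instruction}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages


shots = [
    ("I liked it", "70% positive 30% neutral 0% negative"),
    ("It could be better", "0% positive 50% neutral 50% negative"),
]
messages = few_shot_messages(
    "You are a sentiment classifier. For each message, give the percentage of positive/neutral/negative.",
    shots,
    "I thought it was okay",
)
print(len(messages))  # 1 instruction + 2 shots * 2 messages + 1 query = 6
```

Keeping the demonstrations in a list makes it easier to add, remove, or reorder shots and observe how the output format responds.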


### Role Prompting

Llama will often give more consistent responses when given a role ([Kong et al. (2023)](https://browse.arxiv.org/pdf/2308.07702.pdf)). Roles give context to the LLM on what type of answers are desired.

Let's use Llama 3 to create a more focused, technical response for a question around the pros and cons of using PyTorch.

In [12]:
complete_and_print("Explain the pros and cons of using PyTorch.")
# More likely to explain the pros and cons of PyTorch in general terms, covering areas like documentation and the PyTorch community, and to mention a steep learning curve

complete_and_print("Your role is a machine learning expert who gives highly technical advice to senior engineers who work with complicated datasets. Explain the pros and cons of using PyTorch.")
# Often results in more technical benefits and drawbacks that provide more technical details on how model layers

Explain the pros and cons of using PyTorch.
**Pros of Using PyTorch:**

### 1. **Dynamic Computation Graph**
PyTorch's computation graph is built dynamically at runtime, which allows for faster model prototyping and easier debugging.

```python
import torch

x = torch.tensor(2)
y = x + 3  # Dynamic computation graph created automatically
print(y)   # Output: tensor(5)
```

### 2. **Pythonic API**
PyTorch's API is designed to be intuitive and Pythonic, making it easy for developers familiar with Python to learn and use.

```python
import torch

model = torch.nn.Linear(5, 3)  # Simple linear model created in a few lines of code
```

### 3. **Automatic Differentiation**
PyTorch's automatic differentiation (autograd) system enables seamless computation of gradients for training neural networks.

```python
import torch

x = torch.tensor(2)
y = x + 3

# Automatic gradient computation enabled by autograd
y.backward()
print(x.grad)  # Output: tensor(1.0)
```

### 4. **Torchvision and Torchtext

### Chain-of-Thought

Simply adding a phrase encouraging step-by-step thinking "significantly improves the ability of large language models to perform complex reasoning" ([Wei et al. (2022)](https://arxiv.org/abs/2201.11903)). This technique is called "CoT" or "Chain-of-Thought" prompting.

Llama 3.1 now reasons step-by-step naturally without the addition of the phrase. This section remains for completeness.

In [13]:
prompt = "Who lived longer, Mozart or Elvis?"

complete_and_print(prompt)
# Llama 2 would often give the incorrect answer of "Mozart"

complete_and_print(f"{prompt} Let's think through this carefully, step by step.")
# Gives the correct answer "Elvis"

Who lived longer, Mozart or Elvis?
A classic question!

Wolfgang Amadeus Mozart (1756-1791) was a composer who passed away at the age of 35.

Elvis Presley (1935-1977), on the other hand, died at the age of 42.

So, Elvis outlived Mozart by about 7 years.

Who lived longer, Mozart or Elvis? Let's think through this carefully, step by step.
Let's analyze the timeline of both lives to determine who lived longer.

**Step 1: Birth and Death Dates**

* Wolfgang Amadeus Mozart (1723-1791)
	+ Born on January 27, 1756
	+ Died on December 5, 1791 (at age 35)
* Elvis Presley (1935-1977)
	+ Born on January 8, 1935
	+ Died on August 16, 1977 (at age 42)

**Step 2: Comparison**

Now that we have the birth and death dates for both individuals, let's compare them.

* Mozart was born in 1756 and died in 1791, which means he lived from approximately 35 years.
* Elvis was born in 1935 and died in 1977, which means he lived for approximately 42 years.

**Conclusion**

Based on the calculations, Elvis Pre

### Self-Consistency

LLMs are probabilistic, so even with Chain-of-Thought, a single generation might produce incorrect results. Self-Consistency ([Wang et al. (2022)](https://arxiv.org/abs/2203.11171)) improves accuracy by selecting the most frequent answer from multiple generations (at the cost of higher compute):

In [14]:
import re
from statistics import mode

def gen_answer():
    response = completion(
        # Trailing spaces keep the adjacent string literals from running together
        "John found that the average of 15 numbers is 40. "
        "If 10 is added to each number then the mean of the numbers is? "
        "Report the answer surrounded by backticks (example: `123`)",
    )
    match = re.search(r'`(\d+)`', response)
    if match is None:
        return None
    return match.group(1)

answers = [gen_answer() for i in range(5)]

print(f"Answers: {answers}\nFinal answer: {mode(answers)}")

# Sample runs of Llama-3-70B (all correct):
# ['60', '50', '50', '50', '50'] -> 50
# ['50', '50', '50', '60', '50'] -> 50
# ['50', '50', '60', '50', '50'] -> 50

Answers: ['50', '50', '50', None, '50']
Final answer: 50
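
The sample run above includes a `None`, meaning one generation had no parseable answer. `statistics.mode` counts `None` like any other value, so if most generations fail to parse, the "final answer" would itself be `None`. A minimal sketch of a vote that ignores unparsed generations, using a hypothetical `majority_vote` helper (not part of the notebook):

```python
from collections import Counter
from typing import List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    # Drop generations where no answer could be parsed,
    # then return the most frequent remaining answer
    parsed = [answer for answer in answers if answer is not None]
    if not parsed:
        return None
    return Counter(parsed).most_common(1)[0][0]


print(majority_vote(["50", "50", "50", None, "50"]))  # 50
print(majority_vote([None, None, "50"]))              # 50
```

Whether to discard or count failed generations is a design choice; counting them can be useful as a signal that the prompt's output format is not being followed.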


### Retrieval-Augmented Generation

You'll probably want to use factual knowledge in your application. You can extract common facts from today's large models out-of-the-box (i.e. using just the model weights):

In [15]:
complete_and_print("What is the capital of California?")
# Gives the correct answer "Sacramento"

What is the capital of California?
The capital of California is Sacramento.



However, more specific facts, or private information, cannot be reliably retrieved. The model will either declare it does not know or hallucinate an incorrect answer:

In [16]:
complete_and_print("What was the temperature in Menlo Park on December 12th, 2023?")
# "I'm just an AI, I don't have access to real-time weather data or historical weather records."

complete_and_print("What time is my dinner reservation on Saturday and what should I wear?")
# "I'm not able to access your personal information [..] I can provide some general guidance"

What was the temperature in Menlo Park on December 12th, 2023?
I'm a large language model, I don't have real-time access to current or historical weather data. However, I can suggest some ways for you to find out the temperature in Menlo Park on December 12th, 2023.

You can try checking online weather websites such as:

1. Weather.com
2. AccuWeather.com
3. National Weather Service (NWS) website

These websites usually have historical weather data available, but it might take some time for the data from December 12th, 2023 to be updated and made public.

Alternatively, you can also try checking with local news outlets or government websites in Menlo Park for more information.

What time is my dinner reservation on Saturday and what should I wear?
I'm a large language model, I don't have the ability to access personal information or specific details about your schedule. However, I can guide you through the process of finding out more.

To find out the time of your dinner reservation on 

Retrieval-Augmented Generation, or RAG, describes the practice of including information retrieved from an external database in the prompt ([Lewis et al. (2020)](https://arxiv.org/abs/2005.11401v4)). It's an effective way to incorporate facts into your LLM application and is more affordable than fine-tuning, which may be costly and may negatively impact the foundational model's capabilities.

This could be as simple as a lookup table or as sophisticated as a [vector database](https://github.com/facebookresearch/faiss) containing all of your company's knowledge:

In [17]:
MENLO_PARK_TEMPS = {
    "2023-12-11": "52 degrees Fahrenheit",
    "2023-12-12": "51 degrees Fahrenheit",
    "2023-12-13": "51 degrees Fahrenheit",
}


def prompt_with_rag(retrieved_info, question):
    complete_and_print(
        f"Given the following information: '{retrieved_info}', respond to: '{question}'"
    )


def ask_for_temperature(day):
    temp_on_day = MENLO_PARK_TEMPS.get(day) or "unknown temperature"
    prompt_with_rag(
        f"The temperature in Menlo Park was {temp_on_day} on {day}",  # Retrieved fact
        f"What is the temperature in Menlo Park on {day}?",  # User question
    )


ask_for_temperature("2023-12-12")
# "Sure! The temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit."

ask_for_temperature("2023-07-18")
# "I'm not able to provide the temperature in Menlo Park on 2023-07-18 as the information provided states that the temperature was unknown."

Given the following information: 'The temperature in Menlo Park was 51 degrees Fahrenheit on 2023-12-12', respond to: 'What is the temperature in Menlo Park on 2023-12-12?'
Based on the provided information, the temperature in Menlo Park on 2023-12-12 was 51 degrees Fahrenheit.

Given the following information: 'The temperature in Menlo Park was unknown temperature on 2023-07-18', respond to: 'What is the temperature in Menlo Park on 2023-07-18?'
I don't have access to real-time or specific weather data, but I can tell you that the information provided states that the temperature in Menlo Park was unknown on 2023-07-18. Therefore, I cannot provide a specific temperature for that date and location. If you need accurate and up-to-date weather information, I recommend checking a reliable weather forecasting service or website.
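
The lookup table above retrieves by exact key. When the right entry is not known in advance, a retrieval step has to pick it from a collection. The sketch below is a deliberately naive stand-in for that step: it ranks a few placeholder documents by word overlap with the question. The `tokenize`, `retrieve`, and `build_rag_prompt` helpers and the two non-weather documents are invented for illustration; a production system would more likely use embeddings and a vector index.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    # Lowercased alphanumeric words, as a set
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 1) -> List[str]:
    # Rank documents by how many distinct words they share with the question
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question: str, documents: List[str], k: int = 1) -> str:
    context = "\n".join(retrieve(question, documents, k))
    return f"Given the following information:\n{context}\n\nRespond to: {question}"


# The first entry comes from the lookup table above; the others are placeholders
DOCUMENTS = [
    "The temperature in Menlo Park was 51 degrees Fahrenheit on 2023-12-12.",
    "The office cafeteria is closed on public holidays.",
    "Parking permits are renewed every January.",
]

question = "What was the temperature in Menlo Park on 2023-12-12?"
print(build_rag_prompt(question, DOCUMENTS))
```

The resulting string could be passed to `complete_and_print` in the same way as the prompts built by `prompt_with_rag`.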



### Program-Aided Language Models

LLMs, by nature, aren't great at performing calculations. Let's try:

$$
((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
$$

(The correct answer is 91383.)

In [18]:
complete_and_print("""
Calculate the answer to the following math problem:

((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
""")
# Gives incorrect answers like 92448, 92648, 95463


Calculate the answer to the following math problem:

((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))

To calculate this expression, we need to follow the order of operations (PEMDAS):

1. Evaluate the expressions inside the parentheses:
	* `93 * 4 = 372`
	* `4^4 = 256`
2. Replace these values in the original expression:
	* `((-5 + 372 - 0) * (256 + (-7) + 0 * 5))`
3. Simplify inside the parentheses:
	* `-5 + 372 = 367`
	* `(256 - 7) = 249`
	* `0 * 5 = 0` (since any number multiplied by zero is zero)
4. Now the expression becomes:
	* `(367 * (249 + 0))`
5. Simplify inside the parentheses:
	* `249 + 0 = 249`
6. Multiply 367 and 249:
	* `367 * 249 = 91413`

So, the final answer is: **91413**



[Gao et al. (2022)](https://arxiv.org/abs/2211.10435) introduced the concept of "Program-aided Language Models" (PAL). While LLMs are bad at arithmetic, they're great for code generation. PAL leverages this fact by instructing the LLM to write code to solve calculation tasks.

In [19]:
complete_and_print(
    """
    # Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
    """,
)


    # Python code to calculate: ((-5 + 93 * 4 - 0) * (4^4 + -7 + 0 * 5))
    
Here is the Python code that performs the calculation:
```python
result = (((-5) + (93 * 4)) - 0) * ((4**4) + (-7) + (0 * 5))
print(result)
```
Explanation:

* The expression `(-5 + 93 * 4 - 0)` is evaluated first, following the order of operations. This equals `-5 + 372 = 367`.
* Next, `(4^4 + -7 + 0 * 5)` is evaluated. Since `^` denotes exponentiation in Python, this equals `256 + (-7) = 249`.
* Finally, the result of the first expression (`367`) is multiplied by the result of the second expression (`249`), yielding a final answer.

Running this code will output:
```
91313
```
Let me know if you have any questions or need further clarification!



In [20]:
# The following code was generated by Llama 3 70B:

result = ((-5 + 93 * 4 - 0) * (4**4 - 7 + 0 * 5))
print(result)

91383
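
The generated code still has to be executed to obtain the answer. Running model-written code directly with `exec` or `eval` is risky outside a sandbox, so here is a minimal sketch of one alternative, assuming the model can be prompted to return a single arithmetic expression. The hypothetical `safe_eval` helper (not part of the notebook) walks the expression's syntax tree and permits only numeric literals and basic arithmetic operators:

```python
import ast
import operator

# Whitelisted operators; anything else (calls, names, bitwise ops) is rejected
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def safe_eval(expression: str):
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression, mode="eval"))


print(safe_eval("((-5 + 93 * 4 - 0) * (4**4 + -7 + 0 * 5))"))  # 91383
```

Note that `^` is bitwise XOR in Python, not exponentiation, so `4^4` evaluates to 0. The sketch rejects `^` instead of silently computing the wrong value, which is why the generated code needs `**`.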


### Limiting Extraneous Tokens

A common struggle with Llama 2 is getting output without extraneous tokens (e.g. "Sure! Here's more information on..."), even when Llama 2 is explicitly instructed to be concise and to skip the preamble. Llama 3.x follows such instructions more reliably.

Check out this improvement that combines a role, rules and restrictions, explicit instructions, and an example:

In [21]:
complete_and_print(
    "Give me the zip code for Menlo Park in JSON format with the field 'zip_code'",
)
# Likely returns the JSON and also "Sure! Here's the JSON..."

complete_and_print(
    """
    You are a robot that only outputs JSON.
    You reply in JSON format with the field 'zip_code'.
    Example question: What is the zip code of the Empire State Building? Example answer: {'zip_code': 10118}
    Now here is my question: What is the zip code of Menlo Park?
    """,
)
# "{'zip_code': 94025}"

Give me the zip code for Menlo Park in JSON format with the field 'zip_code'
Here is a single JSON object containing the zip code for Menlo Park:

```
{
  "zip_code": "94025"
}
```


    You are a robot that only outputs JSON.
    You reply in JSON format with the field 'zip_code'.
    Example question: What is the zip code of the Empire State Building? Example answer: {'zip_code': 10118}
    Now here is my question: What is the zip code of Menlo Park?
    
{"zip_code": "94025"}
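
Even with a well-constrained prompt, downstream code usually benefits from validating the output before using it. A minimal sketch, assuming the response contains at most one JSON object, using a hypothetical `extract_json` helper (not part of the notebook) that tolerates a preamble around the object:

```python
import json
import re
from typing import Any, Optional


def extract_json(response: str) -> Optional[Any]:
    # Take the text from the first "{" to the last "}" and try to parse it
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None


chatty = (
    "Here is a single JSON object containing the zip code for Menlo Park:\n\n"
    '{\n  "zip_code": "94025"\n}'
)
print(extract_json(chatty))                   # {'zip_code': '94025'}
print(extract_json('{"zip_code": "94025"}'))  # {'zip_code': '94025'}
print(extract_json("{'zip_code': 94025}"))    # None
```

The last call returns `None` because single-quoted keys are not valid JSON. The example answer inside the prompt above uses single quotes, so a double-quoted demonstration may be a safer choice when the output is meant to be parsed.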



## Additional References
- [PromptingGuide.ai](https://www.promptingguide.ai/)
- [LearnPrompting.org](https://learnprompting.org/)
- [Lil'Log Prompt Engineering Guide](https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/)
