# Introduction to Large Language Model (LLM) Prompting
In this lesson, you'll learn how to create effective prompts for large language models (LLMs) like OpenAI's GPT-4. This notebook will guide you through the setup, basic usage, and advanced techniques for prompt engineering.

## Guidelines for Prompting
Effective prompting is essential to get the desired output from LLMs. In this section, you'll practice two prompting principles and their related tactics to write effective prompts for large language models.



## Setup
Before we begin, we need to load the API key and relevant Python libraries.
#### Load the API key and relevant Python libaries.

In this course, we've provided some code that loads the OpenAI API key for you.

In [None]:
# pip install openai


The library needs to be configured with your account's secret key, which is available on the [OpenAI website](https://platform.openai.com/account/api-keys). 

You can either set it as the `OPENAI_API_KEY` environment variable before using the library:


In [3]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

client = openai.OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key = os.getenv('OPENAI_API_KEY'),
)


#### Helper function
Throughout this course, we will use OpenAI's `gpt-4` model and the [chat completions endpoint](https://platform.openai.com/docs/guides/chat). 

This helper function will make it easier to use prompts and look at the generated outputs.  
**Note**: In May 2023, OpenAI updated gpt-4. The results you see in the notebook may be slightly different than those in the video. Some of the prompts have also been slightly modified to product the desired results.

<font color='#FF6347'>**Note:**</font>
### OpenAI Models

As of 2024, OpenAI has developed several models, primarily focused on natural language processing (NLP) and artificial intelligence (AI). Here is a list of notable models developed by OpenAI:

1. **GPT (Generative Pre-trained Transformer) Series:**
    - **GPT:** The original model, introduced in 2018, demonstrated the ability to generate coherent text from a prompt.
    - **GPT-2:** Released in 2019, this model improved upon the original with 1.5 billion parameters and demonstrated more advanced text generation capabilities.
    - **GPT-3:** Launched in 2020, GPT-3 significantly expanded the model size to 175 billion parameters, becoming one of the largest and most powerful language models available.
    - **GPT-3.5:** An interim update to GPT-3 with improved performance and efficiency.
    - **GPT-4:** The latest iteration, released in 2023, further enhancing the capabilities with better context understanding, reasoning, and generation.
    - **GPT-4O:** A specialized variant of GPT-4 optimized for specific use cases, offering improved performance and efficiency for targeted applications.  

2. **Codex:** A model specialized for coding tasks, capable of understanding and generating code in various programming languages. This model powers GitHub Copilot.

3. **DALL-E Series:**
    - **DALL-E:** Introduced in 2021, capable of generating images from textual descriptions.
    - **DALL-E 2:** An enhanced version with improved image quality and understanding of complex textual prompts.

4. **CLIP (Contrastive Language-Image Pre-Training):** A model that connects vision and language, capable of understanding and generating textual descriptions for images.

5. **Whisper:** A speech recognition model introduced to accurately transcribe and understand spoken language.

These models represent significant advancements in AI, each tailored to specific tasks and capabilities, pushing the boundaries of what AI can achieve in natural language understanding, image generation, and code generation.


In [4]:
def chat_gpt(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content.strip()

In [5]:
def get_completion(prompt):
    return chat_gpt(prompt)

## Prompting Techniques

### Temperature
The temperature parameter controls the randomness of the model's output. A higher temperature results in more random completions, while a lower temperature results in more focused and deterministic completions.

Try experimenting with different temperature settings:


**Note:** This and all other lab notebooks of this course use OpenAI library version `0.27.0`. 

In order to use the OpenAI library version `1.0.0`, here is the code that you would use instead for the `get_completion` function:

```python
client = openai.OpenAI()

def get_completion(prompt, model="gpt-4"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0
    )
    return response.choices[0].message.content
```

## Prompting Principles
- **Principle 1: Write clear and specific instructions**
- **Principle 2: Give the model time to “think”**

### Tactics

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input
- Delimiters can be anything like: ```, """, < >, `<tag> </tag>`, `:`

In [7]:
text = """
You should express what you want a model to do by 
providing instructions that are as clear and 
specific as you can possibly make them. 
This will guide the model towards the desired output, 
and reduce the chances of receiving irrelevant 
or incorrect responses. Don't confuse writing a 
clear prompt with writing a short prompt. 
In many cases, longer prompts provide more clarity 
and context for the model, which can lead to 
more detailed and relevant outputs.
"""

prompt = f"""
Summarize the text delimited by triple backticks 
into a single sentence.
```{text}```
"""

response = get_completion(prompt)
print(response)

To achieve desired model outputs, you should provide clear, specific instructions and not equate brevity with clarity as longer prompts can often lead to clearer and more detailed responses.


Clear and specific instructions are crucial in guiding a model towards the desired output, and often longer prompts can provide more clarity and context, leading to more detailed and relevant outputs.

To achieve desired model outputs, you should provide clear, specific instructions and not equate brevity with clarity as longer prompts can often lead to clearer and more detailed responses.

#### Tactic 2: Ask for a structured output
- JSON, HTML

In [9]:
prompt = f"""
Generate a list of three made-up book titles along \ 
with their authors and genres. 
Provide them in jupyter markdown format with the following keys: 
book_id, title, author, genre.
"""
response = get_completion(prompt)
print(response)

```markdown
| book_id | title               | author         | genre         |
|---------|---------------------|----------------|---------------|
| 1       | "The Ocean's Echo"  | John Caldwell  | Fantasy       |
| 2       | "Twisted Horizons"  | Jane Milton    | Science Fiction |
| 3       | "Sweet Vengeance"   | Sophie Bernard | Romance       |
```


#### Tactic 3: Ask the model to check whether conditions are satisfied

In [10]:
text_1 = f"""
Making a cup of tea is easy! First, you need to get some \ 
water boiling. While that's happening, \ 
grab a cup and put a tea bag in it. Once the water is \ 
hot enough, just pour it over the tea bag. \ 
Let it sit for a bit so the tea can steep. After a \ 
few minutes, take out the tea bag. If you \ 
like, you can add some sugar or milk to taste. \ 
And that's it! You've got yourself a delicious \ 
cup of tea to enjoy.
"""
prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \ 
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \ 
then simply write \"No steps provided.\"

\"\"\"{text_1}\"\"\"
"""
response = get_completion(prompt)
print("Completion for Text 1:")
print(response)

Completion for Text 1:
Step 1 - Get some water boiling.
Step 2 - While that's happening, grab a cup and put a tea bag in it.
Step 3 - Once the water is hot enough, pour it over the tea bag.
Step 4 - Let it sit for a bit so the tea can steep.
Step 5 - After a few minutes, take out the tea bag.
Step 6 - If you like, add some sugar or milk to taste.
Step 7 - Enjoy your delicious cup of tea.


In [11]:
text_2 = f"""
The sun is shining brightly today, and the birds are \
singing. It's a beautiful day to go for a \ 
walk in the park. The flowers are blooming, and the \ 
trees are swaying gently in the breeze. People \ 
are out and about, enjoying the lovely weather. \ 
Some are having picnics, while others are playing \ 
games or simply relaxing on the grass. It's a \ 
perfect day to spend time outdoors and appreciate the \ 
beauty of nature.
"""
prompt = f"""
You will be provided with text delimited by triple quotes. 
If it contains a sequence of instructions, \ 
re-write those instructions in the following format:

Step 1 - ...
Step 2 - …
…
Step N - …

If the text does not contain a sequence of instructions, \ 
then simply write \"No steps provided.\"

\"\"\"{text_2}\"\"\"
"""
response = get_completion(prompt)
print("Completion for Text 2:")
print(response)

Completion for Text 2:
No steps provided.


#### Tactic 4: "Few-shot" prompting

### What is Few-Shot Prompting?
Few-shot prompting is a technique used in natural language processing (NLP) where a language model, such as OpenAI's GPT-4, is provided with a small number of example inputs and corresponding outputs (known as "shots") to guide it in generating a response for a new, similar input. This approach leverages the model's ability to generalize from a limited set of examples to perform a task without requiring extensive fine-tuning or large datasets.

### Key Concepts:
1. **Few-Shot Learning:** The model is provided with a few examples (typically 1-10) to learn the task. This contrasts with zero-shot learning, where the model is given no examples, and many-shot learning, where the model is trained on a large dataset.
2. **Prompt Format:** The few-shot prompt includes the task description followed by several example input-output pairs, and finally the new input for which the model needs to generate an output.

### Example:
Suppose we want the model to translate English sentences to French. Here is how a few-shot prompt might look:

**Prompt:**


Translate the following English sentences to French:

English: How are you?
French: Comment ça va?

English: Good morning.
French: Bonjour.

English: Thank you.
French: Merci.

English: Where is the library?
French: Où est la bibliothèque?

In [12]:
prompt = f"""
Your task is to answer in a consistent style.

<child>: Teach me about patience.

<grandparent>: The river that carves the deepest \ 
valley flows from a modest spring; the \ 
grandest symphony originates from a single note; \ 
the most intricate tapestry begins with a solitary thread.

<child>: Teach me about resilience.
"""
response = get_completion(prompt)
print(response)

<grandparent>: The mightiest oak tree started life as a fragile acorn, enduring countless storms and winters; the unyielding rock that withstands the ocean's waves was once but a grain of sand; the iron that carries the weight of the world was once a lump of ore, beaten and shaped under the blacksmith's hammer.


### Benefits of Few-Shot Prompting:
- **Flexibility:** Allows the model to adapt to a wide range of tasks without extensive retraining.
- **Efficiency:** Requires fewer examples and less computational resources compared to training a model from scratch.
- **Ease of Use:** Users can quickly create prompts with a few examples to guide the model.

### Applications:
- **Text Translation:** Translating text from one language to another.
- **Text Summarization:** Summarizing long documents into concise summaries.
- **Question Answering:** Answering questions based on provided context.
- **Creative Writing:** Generating stories, poems, or other creative content.

Few-shot prompting is a powerful technique that demonstrates the versatility and capability of large language models in performing a variety of tasks with minimal input data.


### Principle 2: Give the model time to “think” 

#### Tactic 1: Specify the steps required to complete a task

In [22]:
text = f"""
In a charming village, siblings Jack and Jill set out on \ 
a quest to fetch water from a hilltop \ 
well. As they climbed, singing joyfully, misfortune \ 
struck—Jack tripped on a stone and tumbled \ 
down the hill, with Jill following suit. \ 
Though slightly battered, the pair returned home to \ 
comforting embraces. Despite the mishap, \ 
their adventurous spirits remained undimmed, and they \ 
continued exploring with delight.
"""
# example 1
prompt_1 = f"""
Perform the following actions: 
1 - Summarize the following text delimited by triple \
backticks with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the following \
keys: french_summary, num_names.

Separate your answers with line breaks.

Text:
```{text}```
"""
response = get_completion(prompt_1)
print("Completion for prompt 1:")
print(response)

Completion for prompt 1:
1 - In a quaint village, siblings Jack and Jill experience a mishap while fetching water from a hilltop well but continue their adventures with undimmed spirits.
2 - Dans un charmant village, les frères et sœurs Jack et Jill vivent un incident en allant chercher de l'eau à un puits sur une colline, mais continuent leurs aventures avec un esprit indompté.
3 - Jack, Jill
4 - {"french_summary": "Dans un charmant village, les frères et sœurs Jack et Jill vivent un incident en allant chercher de l'eau à un puits sur une colline, mais continuent leurs aventures avec un esprit indompté.", "num_names": 2 }


#### Ask for output in a specified format

In [23]:
prompt_2 = f"""
Your task is to perform the following actions: 
1 - Summarize the following text delimited by 
  <> with 1 sentence.
2 - Translate the summary into French.
3 - List each name in the French summary.
4 - Output a json object that contains the 
  following keys: french_summary, num_names.

Use the following format:
Text: <text to summarize>
Summary: <summary>
Translation: <summary translation>
Names: <list of names in summary>
Output JSON: <json with summary and num_names>

Text: <{text}>
"""
response = get_completion(prompt_2)
print("\nCompletion for prompt 2:")
print(response)


Completion for prompt 2:
Summary: Siblings Jack and Jill go on a quest for water, trip and fall, but remain undaunted and continue exploring.
Translation: Les frères et sœurs Jack et Jill partent en quête d'eau, trébuchent et tombent, mais restent imperturbables et continuent d'explorer.
Names: Jack, Jill
Output JSON: {"french_summary": "Les frères et sœurs Jack et Jill partent en quête d'eau, trébuchent et tombent, mais restent imperturbables et continuent d'explorer.", "num_names": 2}


#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [24]:
prompt = f"""
Determine if the student's solution is correct or not.

Question:
I'm building a solar power installation and I need \
 help working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \ 
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations 
as a function of the number of square feet.

Student's Solution:
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""
response = get_completion(prompt)
print(response)

The student's solution is correct.


#### Note that the student's solution is actually not correct.
#### We can fix this by instructing the model to work out its own solution first.

In [25]:
prompt = f"""
Your task is to determine if the student's solution \
is correct or not.
To solve the problem do the following:
- First, work out your own solution to the problem including the final total. 
- Then compare your solution to the student's solution \ 
and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until 
you have done the problem yourself.

Use the following format:
Question:
```
question here
```
Student's solution:
```
student's solution here
```
Actual solution:
```
steps to work out the solution and your solution here
```
Is the student's solution the same as actual solution \
just calculated:
```
yes or no
```
Student grade:
```
correct or incorrect
```

Question:
```
I'm building a solar power installation and I need help \
working out the financials. 
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost \
me a flat $100k per year, and an additional $10 / square \
foot
What is the total cost for the first year of operations \
as a function of the number of square feet.
``` 
Student's solution:
```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
```
Actual solution:
"""
response = get_completion(prompt)
print(response)

```
Let x be the size of the installation in square feet.
Costs:
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 10x
Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000
```
Is the student's solution the same as actual solution just calculated:
```
No
```
Student grade:
```
Incorrect


## Model Limitations: Hallucinations

### What is Hallucination in Language Models?
Hallucination in the context of language models refers to the phenomenon where the model generates text that is coherent and plausible but is factually incorrect or nonsensical. These hallucinations occur because the model generates responses based on patterns in the data it was trained on, rather than understanding or verifying the accuracy of the information.

### Why Do Hallucinations Occur?
1. **Training Data:** The model's responses are based on patterns learned from vast amounts of text data, which may include inaccuracies.
2. **Autoregressive Nature:** Language models generate text one token at a time, and small errors can propagate and amplify as the text generation progresses.
3. **Lack of Real-Time Knowledge:** Models like GPT-4 do not have access to real-time data and cannot verify facts in real-time.
4. **Ambiguous Prompts:** Vague or poorly structured prompts can lead to hallucinations as the model tries to fill in the gaps.

### How Can We Prevent Hallucinations?
1. **Clear and Specific Prompts:** Ensure that prompts are clear, specific, and unambiguous. Providing detailed context can help the model generate more accurate responses.
   - **Example:** Instead of asking, "Tell me about the latest advancements in AI," specify, "List the major AI advancements in 2023 related to natural language processing."
2. **Use of Few-Shot or Zero-Shot Examples:** Provide examples in the prompt to guide the model towards the desired output.
   - **Example:** When asking for a summary, provide a few examples of well-crafted summaries.
3. **Verification and Validation:** Cross-check the model's responses against reliable sources. Incorporate a human-in-the-loop approach where experts review and validate the content.
4. **Structured Outputs:** Request structured outputs, such as bullet points or specific formats, to minimize the chances of generating incorrect information.
   - **Example:** Instead of asking for a general explanation, request a list of key points or a step-by-step guide.
5. **Limitations Acknowledgment:** Inform users about the potential for hallucinations and encourage them to critically evaluate the generated content.



### Example of Preventing Hallucinations:

> **Prompt with Ambiguity:** Tell me about the history of AI.  
> **Potential Hallucinated Response:** AI was invented in 1950 by Alan Turing, who also created the first computer.

> **Improved Prompt:** Provide a brief timeline of significant milestones in the history of artificial intelligence, starting from the 1950s to the present day.  
> **More Accurate Response:**  
1950: Alan Turing proposes the Turing Test to evaluate machine intelligence.
1956: The term "artificial intelligence" is coined at the Dartmouth Conference.
1966: Joseph Weizenbaum develops ELIZA, one of the first chatbots.
1997: IBM's Deep Blue defeats chess champion Garry Kasparov.
2011: IBM Watson wins the quiz show Jeopardy!
2023: Significant advancements in NLP with models like GPT-4.

Example: - Boie is a real company, the product name is not real.

In [26]:
prompt = f"""
Tell me about AeroGlide UltraSlim Smart Toothbrush by Boie
"""
response = get_completion(prompt)
print(response)

The AeroGlide UltraSlim Smart Toothbrush by Boie is a revolutionary oral care device designed to bring advanced technology to your daily dental care routine. It comes with ultra-soft, silver-infused bristles that not only thoroughly clean your teeth but also keeps your toothbrush cleaner and reduces the chances of bacteria growth.

This smart toothbrush is known for its slim and flexible design which makes it easy to reach every corner of your mouth. Using sonic technology, it delivers 30,000 brush strokes per minute ensuring to eliminate plaque and other dental impurities more effectively than a manual toothbrush.

The AeroGlide UltraSlim Smart Toothbrush also includes features such as built-in smart sensors and an automatic timer which makes sure that you brush for the dentist-recommended time of two minutes. This smart toothbrush also generates minimal noise during operation and comes with long-lasting battery life.

Apart from these features, this smart toothbrush can also connect 

<font color='#FF6347'>**Note:**</font> By following these strategies, we can reduce the occurrence of hallucinations and improve the reliability of the model's outputs.

## Try experimenting on your own!

#### Notes on using the OpenAI API outside of this classroom

To install the OpenAI Python library:
```
!pip install openai
```

The library needs to be configured with your account's secret key, which is available on the [website](https://platform.openai.com/account/api-keys). 

You can either set it as the `OPENAI_API_KEY` environment variable before using the library:
 ```
 !export OPENAI_API_KEY='sk-...'
 ```

Or, set `openai.api_key` to its value:

```
import openai
openai.api_key = "sk-..."
```

#### A note about the backslash
- In the course, we are using a backslash `\` to make the text fit on the screen without inserting newline '\n' characters.
- GPT models isn't really affected whether you insert newline characters or not.  But when working with LLMs in general, you may consider whether newline characters in your prompt may affect the model's performance.