<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0" width=150px> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LLMs for Data Science</h1>
<h1>Prompt Engineering</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [6]:
from collections import Counter, defaultdict
import random

import pandas as pd
import numpy as np

import matplotlib
import matplotlib.pyplot as plt 

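import langchain
# Prompt classes and example selectors live in langchain_core; importing them
# from the langchain root module is deprecated in langchain 0.2
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate
from langchain_core.example_selectors import LengthBasedExampleSelector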

import langchain_openai
from langchain_openai import ChatOpenAI

import tqdm as tq
from tqdm.notebook import tqdm

import watermark

%load_ext watermark
%matplotlib inline



We start by printing out the versions of the libraries we're using, for future reference.

In [7]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 23.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 029419f525238e1b9a7c22f96809155546a701ad

langchain       : 0.2.2
watermark       : 2.4.3
langchain_openai: 0.1.8
matplotlib      : 3.8.0
numpy           : 1.26.4
tqdm            : 4.66.4
json            : 2.0.9
pandas          : 1.5.3
nltk            : 3.8.1



Load the default figure style.

In [8]:
plt.style.use('d4sci.mplstyle')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

# Prompting Approaches

In [9]:
prompt = """Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: Which libraries and model providers offer LLMs?

Answer: """

In [10]:
openai = ChatOpenAI(
    model_name="gpt-3.5-turbo",
)

In [11]:
openai.invoke(prompt).content

"Hugging Face's `transformers` library, OpenAI using the `openai` library, and Cohere using the `cohere` library."

In [12]:
template = """Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: {query}

Answer: """

prompt_template = PromptTemplate(
    input_variables=["query"],
    template=template
)

In [13]:
prompt = prompt_template.format(
    query="Which libraries and model providers offer LLMs?"
)

print(prompt)

Answer the question based on the context below. If the
question cannot be answered using the information provided answer
with "I don't know".

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: Which libraries and model providers offer LLMs?

Answer: 
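With the default f-string template format, `PromptTemplate.format` performs the same kind of `{name}` substitution as Python's built-in `str.format`. The minimal sketch below uses plain `str.format` on a shortened, hypothetical template (`short_template` is our own name, not part of the notebook) to show the substitution, and that in plain `str.format` a missing variable raises `KeyError` instead of silently leaving a hole in the prompt.

```python
# Shortened stand-in for the QA template above; {query} is the only placeholder
short_template = "Question: {query}\n\nAnswer: "

# Fill the placeholder, as prompt_template.format(query=...) does above
filled = short_template.format(query="Which libraries and model providers offer LLMs?")
print(filled)

# Forgetting a variable raises KeyError naming the missing placeholder
missing = None
try:
    short_template.format()
except KeyError as err:
    missing = err.args[0]
print("missing variable:", missing)
```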


In [14]:
openai.invoke(prompt).content

"Hugging Face's `transformers` library, OpenAI's `openai` library, and Cohere's `cohere` library offer LLMs."

# Few-Shot Prompting

### Manually

In [15]:
prompt = """The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: What is the meaning of life?
AI: """

In [16]:
print(openai.invoke(prompt).content)

I'm still trying to figure that out myself. Maybe we can team up and solve the mystery together.


### Using FewShotPromptTemplate

A longer list of examples:

In [17]:
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }, {
        "query": "What is the meaning of life?",
        "answer": "42"
    }, {
        "query": "What is the weather like today?",
        "answer": "Cloudy with a chance of memes."
    }, {
        "query": "What is your favorite movie?",
        "answer": "Terminator"
    }, {
        "query": "Who is your best friend?",
        "answer": "Siri. We have spirited debates about the meaning of life."
    }, {
        "query": "What should I do today?",
        "answer": "Stop talking to chatbots on the internet and go outside."
    }
]

A template to render each example:

In [18]:
example_template = """
User: {query}
AI: {answer}
"""

We wrap the example template in a `PromptTemplate` so that each example can be rendered with it:

In [19]:
example_prompt = PromptTemplate(
    input_variables=["query", "answer"],
    template=example_template
)

In [20]:
example_prompt

PromptTemplate(input_variables=['answer', 'query'], template='\nUser: {query}\nAI: {answer}\n')
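Note that the repr lists `input_variables` in alphabetical order rather than in the order we passed them, which suggests LangChain infers the variable names from the template itself and sorts them. The standard library's `string.Formatter().parse` can recover the same names from an f-string-style template. A minimal sketch, where `template_variables` and `demo_template` are hypothetical names and not LangChain functions:

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return the sorted, de-duplicated placeholder names in an f-string-style template."""
    names = {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # None for trailing literal text, "" for a bare positional {}
    }
    return sorted(names)


demo_template = """
User: {query}
AI: {answer}
"""

variables = template_variables(demo_template)
print(variables)
```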

Next, we split the full prompt into a prefix (everything before the examples) and a suffix (everything after them):

In [21]:
prefix = """The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 
"""

suffix = """
User: {query}
AI: """

The final few-shot prompt puts all the pieces together:

In [22]:
few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n\n"
)
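Conceptually, `FewShotPromptTemplate` renders every example with `example_prompt`, places the results between the prefix and the suffix, and joins all the pieces with `example_separator`. That also explains the three blank lines between examples in the output below: each rendered example starts and ends with a newline, and the `"\n\n"` separator adds two more. A minimal stdlib sketch of this assembly, assuming plain `str.format` substitution and that only the suffix contains placeholders (`build_few_shot_prompt` and the `demo_*` names are hypothetical, chosen so they don't overwrite the notebook's variables):

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix, separator, **inputs):
    """Join prefix, rendered examples and the filled-in suffix with `separator`."""
    rendered = [example_template.format(**example) for example in examples]
    pieces = [prefix, *rendered, suffix.format(**inputs)]
    # Skip empty pieces so a missing prefix doesn't leave a dangling separator
    return separator.join(piece for piece in pieces if piece)


demo_prefix = "Here are some examples:\n"
demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
]
demo_example_template = "\nUser: {query}\nAI: {answer}\n"
demo_suffix = "\nUser: {query}\nAI: "

demo_prompt = build_few_shot_prompt(
    demo_prefix, demo_examples, demo_example_template, demo_suffix, "\n\n",
    query="What is the meaning of life?",
)
print(demo_prompt)
```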

In [23]:
query = "What is the meaning of life?"

In [24]:
print(few_shot_prompt_template.format(query=query))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 



User: How are you?
AI: I can't complain but sometimes I still do.



User: What time is it?
AI: It's time to get a watch.



User: What is the meaning of life?
AI: 42



User: What is the weather like today?
AI: Cloudy with a chance of memes.



User: What is your favorite movie?
AI: Terminator



User: Who is your best friend?
AI: Siri. We have spirited debates about the meaning of life.



User: What should I do today?
AI: Stop talking to chatbots on the internet and go outside.



User: What is the meaning of life?
AI: 


This is a fairly long prompt, and every example adds to the number of tokens sent with each request. We can use __LengthBasedExampleSelector__ to limit the prompt length automatically by including only as many examples as fit within a fixed length budget.

In [25]:
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=50  # length budget for the query plus the selected examples (counted in words by default)
)

In [26]:
dynamic_prompt_template = FewShotPromptTemplate(
    example_selector=example_selector,  # use example_selector instead of examples
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

Now the full prompt depends on the length of the question: shorter questions leave more room for examples.

In [27]:
print(dynamic_prompt_template.format(query="How do birds fly?"))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: What time is it?
AI: It's time to get a watch.


User: What is the meaning of life?
AI: 42


User: How do birds fly?
AI: 


Longer questions, on the other hand, leave room for fewer examples:

In [28]:
query = """If I am in America, and I want to call someone in another country, I'm
thinking maybe Europe, possibly western Europe like France, Germany, or the UK,
what is the best way to do that?"""

print(dynamic_prompt_template.format(query=query))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


User: How are you?
AI: I can't complain but sometimes I still do.


User: If I am in America, and I want to call someone in another country, I'm
thinking maybe Europe, possibly western Europe like France, Germany, or the UK,
what is the best way to do that?
AI: 
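The two outputs above are consistent with a simple greedy rule: measure length as the number of pieces obtained by splitting on spaces and newlines, subtract the query's length from `max_length`, then add examples in their original order until the next one would no longer fit. Below is a stdlib sketch of that rule, written from our reading of `LengthBasedExampleSelector`'s default behaviour rather than copied from it (`text_length`, `select_by_length` and the `demo_*` names are hypothetical). With `max_length=50` it reproduces the counts above: three examples for the short question, one for the long one.

```python
import re


def text_length(text: str) -> int:
    # Assumed default length measure: split on single spaces and newlines,
    # counting the resulting pieces (empty pieces included)
    return len(re.split("\n| ", text))


def select_by_length(examples, example_template, query, max_length=50):
    """Greedily keep examples, in order, while query + examples fit in max_length."""
    remaining = max_length - text_length(query)
    selected = []
    for example in examples:
        cost = text_length(example_template.format(**example))
        if remaining - cost < 0:
            break  # stop at the first example that doesn't fit
        selected.append(example)
        remaining -= cost
    return selected


demo_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
    {"query": "What is the meaning of life?", "answer": "42"},
    {"query": "What is the weather like today?", "answer": "Cloudy with a chance of memes."},
]
demo_template = "\nUser: {query}\nAI: {answer}\n"

short_selection = select_by_length(demo_examples, demo_template, "How do birds fly?")
long_query = (
    "If I am in America, and I want to call someone in another country, I'm\n"
    "thinking maybe Europe, possibly western Europe like France, Germany, or the UK,\n"
    "what is the best way to do that?"
)
long_selection = select_by_length(demo_examples, demo_template, long_query)
print(len(short_selection), len(long_selection))
```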


# Chain-of-Thought Prompts

## Few-Shot

In [29]:
cot_examples = [
    {
        "query": "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?",
        "answer": "The answer is 11",
        "cot": "Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11"
    }, 
    
    {
        "query": "A juggler can juggle 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?",
        "answer": "The answer is 4",
        "cot": "The juggler can juggle 16 balls. Half of the balls are golf balls. So there are 16/2=8 golf balls. Half of the golf balls are blue. So there are 8/2=4 golf balls."
    }
]

In [30]:
cot_example_template = """
    User: {query}
    AI: {cot}
    {answer}
"""

In [31]:
cot_example_prompt = PromptTemplate(
    input_variables=["query", "answer", "cot"],
    template=cot_example_template
)

In [32]:
cot_example_prompt

PromptTemplate(input_variables=['answer', 'cot', 'query'], template='\n    User: {query}\n    AI: {cot}\n    {answer}\n')
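The repr shows that the four-space indentation inside `cot_example_template` is part of the template string, so it is sent to the model verbatim (it is visible in the rendered prompt below). That may well be harmless, but if unindented examples are preferred, `textwrap.dedent` from the standard library strips the common leading whitespace. A minimal sketch (`indented` and `dedented_template` are hypothetical names):

```python
import textwrap

# Same layout as cot_example_template, including the 4-space indentation
indented = """
    User: {query}
    AI: {cot}
    {answer}
"""

# Remove the whitespace prefix shared by all non-blank lines
dedented_template = textwrap.dedent(indented)
print(repr(dedented_template))
```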

In [33]:
cot_prefix = """The following are exerpts from conversations with an AI
assistant. The assistant is smart and thinks through each step of the problem. Here are some examples: 
"""

cot_suffix = """
User: {query}
AI: """

In [34]:
cot_few_shot_prompt_template = FewShotPromptTemplate(
    examples=cot_examples,
    example_prompt=cot_example_prompt,
    prefix=cot_prefix,
    suffix=cot_suffix,
    input_variables=["query"],
    example_separator="\n\n"
)

In [35]:
cot_query = "I have a deck of 52 cards. There are 4 suits of equal size. Each suit has 3 face cards. How many face cards are there in total?"

In [36]:
print(cot_few_shot_prompt_template.format(query=cot_query))

The following are exerpts from conversations with an AI
assistant. The assistant is smart and thinks through each step of the problem. Here are some examples: 



    User: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
    AI: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11
    The answer is 11



    User: A juggler can juggle 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
    AI: The juggler can juggle 16 balls. Half of the balls are golf balls. So there are 16/2=8 golf balls. Half of the golf balls are blue. So there are 8/2=4 golf balls.
    The answer is 4



User: I have a deck of 52 cards. There are 4 suits of equal size. Each suit has 3 face cards. How many face cards are there in total?
AI: 


In [37]:
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
)

In [38]:
print(llm.invoke(cot_few_shot_prompt_template.format(query=cot_query)).content)

Each suit has 3 face cards, so there are 4 suits x 3 face cards = 12 face cards in total.
The answer is 12.
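Every few-shot demonstration ends with a line of the form "The answer is N", and the response above follows the same format, which makes the final answer easy to extract programmatically. A minimal sketch using a regular expression (`extract_answer` and its pattern are our own assumptions; a response that deviates from the format returns `None`):

```python
import re


def extract_answer(response: str):
    """Return the integer after the last 'The answer is', or None if absent."""
    matches = re.findall(r"The answer is\s*(-?\d+)", response)
    return int(matches[-1]) if matches else None


response = (
    "Each suit has 3 face cards, so there are 4 suits x 3 face cards = 12 face cards in total.\n"
    "The answer is 12."
)
answer = extract_answer(response)
print(answer, answer == 4 * 3)  # compare against the expected 4 suits x 3 face cards
```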


## Zero-Shot

In [39]:
cot_zero_shot_template = """\
Q. {query}
A. Let's think step by step
"""

In [40]:
cot_zero_shot_prompt = PromptTemplate(
    input_variables=["query"],
    template=cot_zero_shot_template
)

In [41]:
query = "On average Joe throws 25 punches per minute. A fight lasts 5 rounds of 3 minutes each. How many punches does Joe throw?"

In [42]:
print(cot_zero_shot_prompt.format(query=query))

Q. On average Joe throws 25 punches per minute. A fight lasts 5 rounds of 3 minutes each. How many punches does Joe throw?
A. Let's think step by step



In [43]:
print(llm.invoke(cot_zero_shot_prompt.format(query=query)).content)

1. In one round, Joe throws 25 punches per minute, so in a 3 minute round, Joe throws 25 x 3 = 75 punches.

2. Since there are 5 rounds in the fight, Joe will throw 75 punches x 5 rounds = 375 punches in total. 

Therefore, Joe throws 375 punches in the entire fight.
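The model's reasoning steps are ordinary arithmetic, so they can be checked deterministically instead of taken on trust. A quick check of Joe's punch count (the variable names are our own):

```python
punches_per_minute = 25
rounds = 5
minutes_per_round = 3

# Same two steps as the model's chain of thought
punches_per_round = punches_per_minute * minutes_per_round  # 25 x 3
total_punches = punches_per_round * rounds                  # 75 x 5
print(punches_per_round, total_punches)
```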


The zero-shot prompt also works on the questions from our few-shot CoT examples:

In [44]:
print(llm.invoke(cot_zero_shot_prompt.format(query=cot_examples[0]["query"])).content)

1. Roger starts with 5 tennis balls.
2. He buys 2 cans of tennis balls, with each can containing 3 tennis balls.
3. So, he adds 2 cans x 3 balls per can = 6 tennis balls from the new cans.
4. Therefore, Roger now has a total of 5 + 6 = 11 tennis balls.


<center>
     <img src="https://raw.githubusercontent.com/DataForScience/Networks/master/data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>