# Session 2 - Demo 2.2 - Model Reliability & Enhancing LLMs

<a href="https://colab.research.google.com/github/dair-ai/pe-for-llms/blob/main/notebooks/session-2/demo-2.1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install chromadb

In [116]:
# load the libraries
import openai
import os
import IPython
from langchain.llms import OpenAI
from dotenv import load_dotenv

# load the environment variables
load_dotenv()

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

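# for LangChain: load_dotenv() has already copied the keys into os.environ,
# so just fail early with a clear message if one of them is missing
for key in ("OPENAI_API_KEY", "SERPAPI_API_KEY"):
    if not os.getenv(key):
        raise EnvironmentError(f"{key} is not set; add it to your .env file")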

We will be using `text-davinci-003` for all examples in this tutorial. You can switch to `gpt-3.5-turbo`, but it is a chat model, so expect its outputs, cost, and latency to differ from what is shown here.

In [96]:
# lm instance
llm_text_davinci = OpenAI(model_name='text-davinci-003', temperature=0)

# Part 1

## Counting token usage

We will use `tiktoken`, an open-source tokenizer by OpenAI.

https://github.com/openai/tiktoken

In [97]:
import tiktoken

In [98]:
# load encoding by name
encoding = tiktoken.get_encoding("cl100k_base")

# load the correct encoding by passing the model name
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [99]:
# tokenize text
encoding.encode("I am feeling happy today")

[40, 1097, 8430, 6380, 3432]

In [100]:
# count tokens
len(encoding.encode("I am feeling happy today"))

5

In [101]:
# using gpt-4 model
gpt4_encoding = tiktoken.encoding_for_model("gpt-4")

In [102]:
gpt4_encoding.encode("I am feeling happy today")

[40, 1097, 8430, 6380, 3432]

You can also calculate token usage with LangChain:

In [103]:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

In [104]:
llm = OpenAI(model_name="text-davinci-003")

# anything inside the context manager will be tracked
with get_openai_callback() as cb:
    result = llm("Tell me a short story about a robot")
    print(cb)

Tokens Used: 264
	Prompt Tokens: 8
	Completion Tokens: 256
Successful Requests: 1
Total Cost (USD): $0.005280000000000001
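
The total cost reported by the callback is consistent with a flat per-token rate. As a rough sketch, assuming a rate of $0.02 per 1,000 tokens (inferred from the output above as 0.00528 / 264 × 1000, not taken from a price list), the figure can be reproduced by hand. `estimate_cost` is a hypothetical helper, not part of LangChain.

```python
def estimate_cost(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Return the estimated cost in USD for a given number of tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Rate inferred from the callback output above; treat it as an assumption.
ASSUMED_USD_PER_1K = 0.02

cost = estimate_cost(264, ASSUMED_USD_PER_1K)
print(f"${cost:.5f}")
```

Rates differ per model and change over time, so check the current price before relying on an estimate like this.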


## Be clear and specific when prompting

In [105]:
global_trending_movies = ["The Suicide Squad", "No Time to Die", "Dune",  "Spider-Man: No Way Home", "The French Dispatch", "Black Widow", "Eternals", "The Matrix Resurrections", "West Side Story", "The Many Saints of Newark"]


prompt = """
Your task is to recommend movies to a customer. 

You are responsible to recommend a movie from the top global trending movies from {global_trending_movies}. 

You should refrain from asking users for their preferences and avoid asking for personal information.

If you don't have a movie to recommend, you should respond "Sorry, couldn't find a movie to recommend today.".

Customer: Please recommend a movie based on my interests.
Agent: 
"""

llm_text_davinci(prompt.format(global_trending_movies=global_trending_movies))

"Sorry, couldn't find a movie to recommend today."

An example where the customer provides information about interests:

In [106]:
global_trending_movies = ["The Suicide Squad", "No Time to Die", "Dune",  "Spider-Man: No Way Home", "The French Dispatch", "Black Widow", "Eternals", "The Matrix Resurrections", "West Side Story", "The Many Saints of Newark"]


prompt = """
Your task is to recommend movies to a customer. 

You are responsible to recommend a movie from the top global trending movies from {global_trending_movies}. 

You should refrain from asking users for their preferences and avoid asking for personal information.

If you don't have a movie to recommend, you should respond "Sorry, couldn't find a movie to recommend today.".

Customer: I love super-hero movies. Please recommend a movie based on my interests.
Agent: 
"""

llm_text_davinci(prompt.format(global_trending_movies=global_trending_movies))

"I recommend Spider-Man: No Way Home. It's a highly anticipated sequel to the Marvel Cinematic Universe and is sure to be a hit with fans of super-hero movies."

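The two prompts above differ only in the customer's message, so the shared instructions can live in one template. The sketch below is one possible way to do that; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and joining the titles into a comma-separated string is a choice made here (the cells above interpolate the Python list directly).

```python
PROMPT_TEMPLATE = """
Your task is to recommend movies to a customer.

You are responsible to recommend a movie from the top global trending movies from {movies}.

You should refrain from asking users for their preferences and avoid asking for personal information.

If you don't have a movie to recommend, you should respond "Sorry, couldn't find a movie to recommend today.".

Customer: {customer_message}
Agent:
"""


def build_prompt(movies, customer_message):
    """Fill the shared template with the movie list and the customer's message."""
    # Joining the titles avoids sending Python list syntax (brackets, quotes) to the model.
    return PROMPT_TEMPLATE.format(movies=", ".join(movies), customer_message=customer_message)


movies = ["Dune", "Spider-Man: No Way Home", "Black Widow"]
vague = build_prompt(movies, "Please recommend a movie based on my interests.")
specific = build_prompt(movies, "I love super-hero movies. Please recommend a movie based on my interests.")
print(specific)
```
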
## Using Delimiters to Distinguish Components of a Prompt

In [108]:
prompt = """
Convert the following code block in the ### <code> ### section to Python:

###
strings2 = Array[]
strings2.push("one")
strings2.push("two")
strings2.push("THREE")
### 
"""

In [109]:
IPython.display.Markdown("```python" + llm_text_davinci(prompt) + "\n```")

```python
strings2 = []
strings2.append("one")
strings2.append("two")
strings2.append("THREE")
```
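
Delimiters only help if the delimited content cannot be mistaken for the markers themselves. A minimal sketch of a helper that builds such a prompt and refuses content that already contains the delimiter; `wrap_in_delimiters` is a hypothetical name introduced for illustration.

```python
def wrap_in_delimiters(instruction: str, content: str, delimiter: str = "###") -> str:
    """Build a prompt whose payload sits between two delimiter lines."""
    if delimiter in content:
        # The model could not tell where the payload ends, so reject it up front.
        raise ValueError(f"content already contains the delimiter {delimiter!r}")
    return f"{instruction}\n\n{delimiter}\n{content.strip()}\n{delimiter}\n"


code = 'strings2 = Array[]\nstrings2.push("one")'
prompt = wrap_in_delimiters("Convert the code between the ### markers to Python:", code)
print(prompt)
```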

## Specify Output Format

In [110]:
prompt = """
Your task is: given a product description, return the requested information in the section delimited by ### ###. Format the output as a JSON object.

Product Description: Introducing the Nike Air Max 270 React: a comfortable and stylish sneaker that combines two of Nike's best technologies. With a sleek black design and a unique bubble sole, these shoes are perfect for everyday wear.

###
product_name: the name of the product
product_brand: the name of the brand (if any) 
###
"""

IPython.display.Markdown(llm_text_davinci(prompt))


{
  "product_name": "Nike Air Max 270 React",
  "product_brand": "Nike"
}
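
Asking for JSON pays off when the reply is parsed by code. A minimal sketch, using the reply shown above as sample input, that parses it and checks for the requested keys; `parse_product` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

REQUIRED_KEYS = {"product_name", "product_brand"}


def parse_product(raw_output: str) -> dict:
    """Parse the model's JSON reply and check that the expected keys are present."""
    data = json.loads(raw_output)  # raises json.JSONDecodeError on malformed output
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


sample_output = """
{
  "product_name": "Nike Air Max 270 React",
  "product_brand": "Nike"
}
"""
product = parse_product(sample_output)
print(product["product_brand"])
```

A model can still return text around the JSON or omit a key, so the parsing errors are worth handling in real use.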

## Specifying the Length of the Output

In [111]:
prompt = """
Your task is: given a customer support email, which is delimited with ###, generate a shorter 1-2 sentence response.

###
Dear [Customer],

We hope this email finds you well. We wanted to update you on the shipping issue you experienced with your recent order. After investigating the issue, we have located your package and it is currently on its way to you. We apologize for any inconvenience this may have caused and thank you for your patience and understanding while we resolved this matter.

Please note that we have taken steps to prevent similar issues from occurring in the future. We have improved our shipping tracking system and are now better equipped to ensure that packages arrive on time and in good condition. We take the quality of our service very seriously and want to ensure that all of our customers have a positive experience when shopping with us.

Once again, we apologize for any inconvenience this may have caused and hope that you will continue to shop with us in the future. If you have any further questions or concerns, please do not hesitate to contact us. We are always here to help and ensure that your shopping experience is a positive one.

Thank you for your understanding.

Best regards,

[Your Name]

Customer Support Team
###
"""

llm_text_davinci(prompt)

'We apologize for the inconvenience caused by the shipping issue with your recent order. We have located your package and it is on its way to you. We have taken steps to prevent similar issues from occurring in the future. Thank you for your understanding.'
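
The reply above runs to four sentences although the prompt asked for 1-2, so a length instruction is better treated as a request than a guarantee. A rough sketch of a check that could be applied to the output; `count_sentences` is a hypothetical helper and its splitting rule is deliberately naive (it would miscount abbreviations such as "e.g.").

```python
import re


def count_sentences(text: str) -> int:
    """Naive sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


response = (
    "We apologize for the inconvenience caused by the shipping issue with your recent order. "
    "We have located your package and it is on its way to you. "
    "We have taken steps to prevent similar issues from occurring in the future. "
    "Thank you for your understanding."
)
n = count_sentences(response)
print(n, "sentences; within the 1-2 sentence limit:", n <= 2)
```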

## Constrain the output to avoid deviation

Being specific about the output you expect helps keep the model from drifting away from the main task.

In [18]:
llm_text_davinci("Recommend a movie for Saturday:")

"\n\nThe Grand Budapest Hotel (2014). This Wes Anderson classic is a delightful comedy-drama set in a European hotel in the 1930s. It follows the adventures of the hotel's concierge, Gustave H, and his loyal lobby boy, Zero Moustafa, as they try to solve a mystery involving a priceless painting. With an all-star cast, including Ralph Fiennes, Adrien Brody, and Willem Dafoe, this is a must-see for any movie fan."

In [112]:
llm_text_davinci("Recommend a movie for Saturday. Just say the movie, no need for explanations!")

'\n\nThe Princess Bride'

## Split Task into Subtasks

In [113]:
event = """
Summer Beats Festival

The event will be held at the beautiful seaside location of Ocean Park in Miami, Florida.

The festival will take place over two days, from July 15th to July 16th.

The Summer Beats Festival will feature a fantastic lineup of popular musical artists and bands from a variety of genres. Attendees can expect to dance and sing along to live performances from headliners such as Taylor Swift, Bruno Mars, and Post Malone. In addition to the main stage, there will be several smaller stages scattered throughout the park featuring up-and-coming artists and DJs.

The festival will also offer a wide variety of food and drink options for attendees to enjoy. From classic festival fare like hot dogs and funnel cakes to more gourmet offerings like sushi and craft beer, there will be something to suit every taste.

Families with children are welcome, and there will be plenty of activities to keep the little ones entertained. The festival will offer a dedicated children's area with carnival games, face painting, and other fun activities.

For those looking for a more luxurious experience, the Summer Beats Festival will also offer a VIP area with premium viewing of the main stage, private bars, and lounges, and other exclusive perks.

Overall, the Summer Beats Festival promises to be an unforgettable event for music lovers of all ages. With a stunning location, a great lineup of artists, and plenty of activities and amenities, it's sure to be the highlight of the summer!
"""

prompt = """
Your task is to extract the date of the event and the name of the event. The event is delimited by ### ###.

###
Event: {event}
###

Output:
"""

In [114]:
llm_out = llm(prompt.format(event=event))
llm_out

'Event: Summer Beats Festival\nDate: July 15th to July 16th'
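
A reply in `Key: value` lines like the one above is easy to turn into a dictionary for the next subtask. A minimal sketch, assuming one field per line; `parse_fields` is a hypothetical helper.

```python
def parse_fields(llm_output: str) -> dict:
    """Turn 'Key: value' lines into a dict with lower-cased keys."""
    fields = {}
    for line in llm_output.splitlines():
        if ":" not in line:
            continue  # skip lines that do not look like a field
        key, value = line.split(":", 1)
        fields[key.strip().lower()] = value.strip()
    return fields


fields = parse_fields("Event: Summer Beats Festival\nDate: July 15th to July 16th")
print(fields)
```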

As you add more tasks to the prompt, make your instructions more detailed and specific:

In [115]:
prompt = """
Your task is to explain the event in 2 sentences. Extract the date of the event and the name of the event. The event is delimited by ### ###. 

Transform the dates into MM/DD format.

###
Event: {event}
###

Output format: Explanation | event name | date
"""
llm_out = llm(prompt.format(event=event))
llm_out

'The Summer Beats Festival will feature a fantastic lineup of musical artists and offer a wide variety of food and drink options, as well as activities for families with children. It will take place from 07/15-07/16.'
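
The reply above contains the explanation and the dates but not the requested `Explanation | event name | date` layout. A small sketch of a check that would catch this before the output is passed downstream; `split_pipe_output` is a hypothetical helper, and the second input is a made-up well-formed reply used only for contrast.

```python
def split_pipe_output(llm_output: str, expected_fields: int = 3):
    """Return the fields of a pipe-delimited reply, or None if the count is off."""
    parts = [part.strip() for part in llm_output.split("|")]
    return parts if len(parts) == expected_fields else None


recorded = (
    "The Summer Beats Festival will feature a fantastic lineup of musical artists "
    "and offer a wide variety of food and drink options, as well as activities for "
    "families with children. It will take place from 07/15-07/16."
)
# Hypothetical reply that follows the requested format (not produced by the model above).
well_formed = "A two-day music festival in Miami. | Summer Beats Festival | 07/15-07/16"

print(split_pipe_output(recorded))
print(split_pipe_output(well_formed))
```

When the check fails, one option is to retry with a stricter prompt or to split the work into separate calls.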

# Part 2

## Program-aided Language model

In [117]:
# lm instance
pal_llm = OpenAI(model_name='text-davinci-003', temperature=0)

In [128]:
PENGUIN_PROMPT = '''
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. 
We now add a penguin to the table:
James, 12, 90, 12
How many penguins are less than 8 years old?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Add penguin James.
penguins.append(('James', 12, 90, 12))
# Find penguins under 8 years old.
penguins_under_8_years_old = [penguin for penguin in penguins if penguin[1] < 8]
# Count number of penguins under 8.
num_penguin_under_8 = len(penguins_under_8_years_old)
answer = num_penguin_under_8
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
Which is the youngest penguin?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort the penguins by age.
penguins = sorted(penguins, key=lambda x: x[1])
# Get the youngest penguin's name.
youngest_penguin_name = penguins[0][0]
answer = youngest_penguin_name
"""
Q: Here is a table where the first line is a header and each subsequent line is a penguin:
name, age, height (cm), weight (kg) 
Louis, 7, 50, 11
Bernard, 5, 80, 13
Vincent, 9, 60, 11
Gwen, 8, 70, 15
For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.
What is the name of the second penguin sorted by alphabetic order?
"""
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort penguins by alphabetic order.
penguins_alphabetic = sorted(penguins, key=lambda x: x[0])
# Get the second penguin sorted by alphabetic order.
second_penguin_name = penguins_alphabetic[1][0]
answer = second_penguin_name
"""
{question}
"""
'''.strip() + '\n'

In [129]:
question = "Which is the oldest penguin?"

In [130]:
llm_out = pal_llm(PENGUIN_PROMPT.format(question=question))
print(llm_out)

# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort the penguins by age.
penguins = sorted(penguins, key=lambda x: x[1], reverse=True)
# Get the oldest penguin's name.
oldest_penguin_name = penguins[0][0]
answer = oldest_penguin_name


In [131]:
# caution: exec() runs model-generated code; only use it on output you have inspected
exec(llm_out)
print(answer)

Vincent
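
Calling `exec` on generated code in the notebook's global scope lets that code overwrite any variable. A hedged sketch of a slightly more contained variant: it runs the code in its own namespace with a short list of allowed builtins and reads back `answer`. `run_generated` is a hypothetical helper, the sample code is the model output shown above, and this is not a security sandbox.

```python
import builtins

generated_code = """
# Put the penguins into a list.
penguins = []
penguins.append(('Louis', 7, 50, 11))
penguins.append(('Bernard', 5, 80, 13))
penguins.append(('Vincent', 9, 60, 11))
penguins.append(('Gwen', 8, 70, 15))
# Sort the penguins by age.
penguins = sorted(penguins, key=lambda x: x[1], reverse=True)
# Get the oldest penguin's name.
oldest_penguin_name = penguins[0][0]
answer = oldest_penguin_name
"""


def run_generated(code: str, allowed_builtins=("len", "sorted", "min", "max", "sum")):
    """Execute generated code in a fresh namespace and return its `answer` variable."""
    # Only the listed builtins are visible; this limits accidents, not determined attacks.
    namespace = {"__builtins__": {name: getattr(builtins, name) for name in allowed_builtins}}
    exec(code, namespace)
    return namespace.get("answer")


print(run_generated(generated_code))
```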


## ReAct

Example of what the ReAct prompting approach looks like visually: https://www.comet.com/examples/langchain/view/Tgq750ZDhX17skjsgBCcs2w2z/panels

In [132]:
prompt = """Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?"""

response = pal_llm(prompt)

IPython.display.Markdown(response)



Olivia Wilde's boyfriend is Jason Sudeikis. His current age is 43, so raised to the 0.23 power, his age is approximately 4.9.
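
The arithmetic in the reply above can be checked directly. A quick sketch, taking the model's stated age of 43 at face value: the power comes out to roughly 2.4, not 4.9, so the number is wrong regardless of whether the facts behind it are current.

```python
claimed_age = 43       # age stated in the model's reply
claimed_result = 4.9   # result stated in the model's reply
exponent = 0.23

actual_result = claimed_age ** exponent
print(f"{claimed_age} ** {exponent} = {actual_result:.3f} (model said {claimed_result})")
```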

In [133]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent

In [134]:
llm = OpenAI(temperature=0)

tools = load_tools(["serpapi", "llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

In [135]:
# run the agent
agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")



> Entering new AgentExecutor chain...
I need to find out who Olivia Wilde's boyfriend is and then calculate his age raised to the 0.23 power.
Action: Search
Action Input: "Olivia Wilde boyfriend"
Observation: Sudeikis and Wilde's relationship ended in November 2020. Wilde was publicly served with court documents regarding child custody while she was presenting Don't Worry Darling at CinemaCon 2022. In January 2021, Wilde began dating singer Harry Styles after meeting during the filming of Don't Worry Darling.
Thought: I need to find out Harry Styles' age.
Action: Search
Action Input: "Harry Styles age"
Observation: 29 years
Thought: I need to calculate 29 raised to the 0.23 power.
Action: Calculator
Action Input: 29^0.23
Observation: Answer: 2.169459462491557
Thought: I now know the final answer.
Final Answer: Harry Styles is Olivia Wilde's boyfriend and his cur

"Harry Styles is Olivia Wilde's boyfriend and his current age raised to the 0.23 power is 2.169459462491557."

In [136]:
agent.run("Which teams made it into the 2023 NBA western conference finals?")



> Entering new AgentExecutor chain...
I need to find out which teams are in the 2023 NBA western conference finals
Action: Search
Action Input: "2023 NBA western conference finals teams"
Observation: The Denver Nuggets take on the Los Angeles Lakers as the 2023 NBA Western Conference finals begin on Tuesday.
Thought: I now know the final answer
Final Answer: The Denver Nuggets and the Los Angeles Lakers.

> Finished chain.


'The Denver Nuggets and the Los Angeles Lakers.'
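
The Thought / Action / Observation cycle visible in the agent traces above can be sketched without any API calls. The toy loop below is an illustration under stated assumptions, not LangChain's implementation: `scripted_llm` is a hypothetical stand-in that replays steps from the first trace, `search` returns a canned result, and `calculator` handles only the `base^exponent` form.

```python
import re


def scripted_llm(transcript: str) -> str:
    """Stand-in for an LLM: picks the next step from what the transcript already contains."""
    if "Observation: 29 years" not in transcript:
        return 'Thought: I need to find out Harry Styles\' age.\nAction: Search\nAction Input: "Harry Styles age"'
    if "Action: Calculator" not in transcript:
        return "Thought: I need to calculate 29 raised to the 0.23 power.\nAction: Calculator\nAction Input: 29^0.23"
    last_observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I now know the final answer.\nFinal Answer: {last_observation}"


def search(query: str) -> str:
    # Canned result taken from the trace above; no network call is made.
    return {"Harry Styles age": "29 years"}.get(query.strip('"'), "no result")


def calculator(expression: str) -> str:
    # Handles only the "base^exponent" form used in the trace.
    base, exponent = expression.split("^")
    return str(float(base) ** float(exponent))


TOOLS = {"Search": search, "Calculator": calculator}


def react(question: str, llm, tools, max_steps: int = 5):
    """Alternate model turns and tool calls until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        turn = llm(transcript)
        transcript += turn + "\n"
        final = re.search(r"Final Answer:\s*(.+)", turn)
        if final:
            return final.group(1).strip(), transcript
        action = re.search(r"Action:\s*(.+)\nAction Input:\s*(.+)", turn)
        if action is None:
            raise ValueError("model turn contained neither an action nor a final answer")
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        # Feed the tool result back so the next model turn can build on it.
        transcript += f"Observation: {tools[tool_name](tool_input)}\n"
    raise RuntimeError("no final answer within max_steps")


answer, transcript = react("What is Harry Styles' age raised to the 0.23 power?", scripted_llm, TOOLS)
print(transcript)
```

The key point is that each observation is appended to the transcript before the model is called again, which is what lets the final step use a computed value instead of a guessed one.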