# Session 2 - Demo 2.2 - Model Reliability & Enhancing LLMs

<a href="https://colab.research.google.com/github/dair-ai/maven-pe-for-llms-4/blob/main/notebooks/session-2/demo-2.2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# update or install the necessary libraries
# the cells below use the legacy openai.ChatCompletion API, which was removed in openai 1.0
!pip install --upgrade "openai<1"
!pip install --upgrade langchain
!pip install --upgrade python-dotenv
!pip install --upgrade chromadb

In [2]:
# load the libraries
import openai
import os
from langchain.llms import OpenAI
import IPython
from dotenv import load_dotenv

# load the environment variables
load_dotenv()

# API configuration
openai.api_key = os.getenv("OPENAI_API_KEY")

# for LangChain
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY")

In [2]:
def get_completion(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=300):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message["content"]

## Counting token usage

We will use `tiktoken`, an open-source tokenizer by OpenAI.

https://github.com/openai/tiktoken

In [3]:
import tiktoken

In [4]:
# load encoding by name
encoding = tiktoken.get_encoding("cl100k_base")

# load the correct encoding by passing the model name
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [5]:
# tokenize text
encoding.encode("I am feeling happy today")

[40, 1097, 8430, 6380, 3432]

In [6]:
# count tokens
len(encoding.encode("I am feeling happy today"))

5

In [8]:
# using gpt-4 model
gpt4_encoding = tiktoken.encoding_for_model("gpt-4")

In [9]:
gpt4_encoding.encode("I am feeling happy today")

[40, 1097, 8430, 6380, 3432]

You can also calculate token usage with LangChain:

In [10]:
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback

In [11]:
llm = OpenAI(model_name="text-davinci-003")

# anything inside the context manager will be tracked
with get_openai_callback() as cb:
    result = llm("Tell me a short story about a robot")
    print(cb)

Tokens Used: 198
	Prompt Tokens: 8
	Completion Tokens: 190
Successful Requests: 1
Total Cost (USD): $0.003960000000000001
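
The cost line in the callback output is simple arithmetic: total tokens multiplied by a per-1K-token rate. Here is a minimal sketch, assuming the flat rate of $0.02 per 1K tokens implied by the output above (198 tokens for about $0.00396). `estimate_cost` is a hypothetical helper, not part of LangChain, and actual pricing differs by model and changes over time.

```python
def estimate_cost(prompt_tokens, completion_tokens, usd_per_1k_tokens):
    """Return (total_tokens, estimated_cost_usd) for a single request."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens, total_tokens * usd_per_1k_tokens / 1000


# Token counts are taken from the callback output above; the rate is an assumption.
total_tokens, cost = estimate_cost(
    prompt_tokens=8, completion_tokens=190, usd_per_1k_tokens=0.02
)
print(f"{total_tokens} tokens, about ${cost:.5f}")
```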


## Be clear and specific when prompting

In [18]:
global_trending_movies = ["The Suicide Squad", "No Time to Die", "Dune",  "Spider-Man: No Way Home", "The French Dispatch", "Black Widow", "Eternals", "The Matrix Resurrections", "West Side Story", "The Many Saints of Newark"]

system_message = """
Your task is to recommend movies to a customer. 

You are responsible to recommend a movie from the top global trending movies from {global_trending_movies}. 

You should refrain from asking users for their preferences and avoid asking for personal information.

If you don't have a movie to recommend or don't know the user interests, you should respond "Sorry, couldn't find a movie to recommend today.".
"""

user_message = """
Please recommend a movie based on my interests.
"""

message = [
    {
        "role": "system",
        "content": system_message.format(global_trending_movies=global_trending_movies)
    },
    {
        "role": "user",
        "content": user_message
    }
]

response = get_completion(message)
print(response)

Sorry, I couldn't find a movie to recommend today.


An example where the customer provides information about interests:

In [19]:
global_trending_movies = ["The Suicide Squad", "No Time to Die", "Dune",  "Spider-Man: No Way Home", "The French Dispatch", "Black Widow", "Eternals", "The Matrix Resurrections", "West Side Story", "The Many Saints of Newark"]

system_message = """
Your task is to recommend movies to a customer.

You are responsible to recommend a movie from the top global trending movies from {global_trending_movies}. 

You should refrain from asking users for their preferences and avoid asking for personal information.

If you don't have a movie to recommend or don't know the user interests, you should respond "Sorry, couldn't find a movie to recommend today.".
"""

user_message = """
I love super-hero movies. Please recommend a movie based on my interests.
"""

message = [
    {
        "role": "system",
        "content": system_message.format(global_trending_movies=global_trending_movies)
    },
    {
        "role": "user",
        "content": user_message
    }
]

response = get_completion(message)
print(response)

Based on your interest in super-hero movies, I recommend you watch "Spider-Man: No Way Home". It is one of the top global trending movies and features the beloved superhero Spider-Man. Enjoy the movie!
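
The two cells above differ only in the user message, so the message list can be built by one small helper. This is a sketch under assumptions: `build_messages` is a hypothetical name, and joining the titles into plain text (instead of formatting the Python list directly) is a design choice of this example, not something the original cells do.

```python
def build_messages(system_template, user_message, movies):
    """Fill the system prompt template and pair it with the user message."""
    # Join titles into plain text so the prompt does not contain Python list syntax.
    movie_text = ", ".join(movies)
    system_content = system_template.format(global_trending_movies=movie_text)
    return [
        {"role": "system", "content": system_content.strip()},
        {"role": "user", "content": user_message.strip()},
    ]


# Shortened, illustrative inputs; the full template and movie list are in the cells above.
template = "Recommend a movie from the top global trending movies: {global_trending_movies}."
messages = build_messages(template, "I love super-hero movies.", ["Dune", "Black Widow"])
print(messages[0]["content"])
```

The resulting list can be passed to `get_completion` exactly like `message` in the cells above.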


## Using Delimiters to Distinguish Components of a Prompt

In [28]:
prompt = """
Convert the following code block in the #### <code> #### section to Python:

####
strings2.push("one")
strings2.push("two")
strings2.push("THREE")
strings2.push("4")
####
"""

message = [
    {
        "role": "user",
        "content": prompt
    }
]

IPython.display.Markdown("```python\n" + get_completion(message) + "\n```")

```python
strings2 = []
strings2.append("one")
strings2.append("two")
strings2.append("THREE")
strings2.append("4")
```
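
Delimiters only separate the parts of a prompt reliably if the delimited text cannot contain the delimiter itself. The sketch below shows one way to build such a prompt programmatically. `wrap_in_delimiters` is a hypothetical helper, and stripping the delimiter from the payload is an assumption about how you might want to handle that case.

```python
def wrap_in_delimiters(instruction, payload, delimiter="####"):
    """Place the payload between two delimiter lines, after the instruction."""
    # Remove any copy of the delimiter from the payload so it cannot end the section early.
    cleaned = payload.replace(delimiter, "").strip()
    return f"{instruction}\n\n{delimiter}\n{cleaned}\n{delimiter}\n"


prompt = wrap_in_delimiters(
    "Convert the code between the delimiter lines to Python:",
    'strings2.push("one")\n####\nstrings2.push("two")',
)
print(prompt)
```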

## Specify Output Format

In [29]:
prompt = """
Your task is: given a product description, return the requested information in the section delimited by ### ###. Format the output as a JSON object.

Product Description: Introducing the Nike Air Max 270 React: a comfortable and stylish sneaker that combines two of Nike's best technologies. With a sleek black design and a unique bubble sole, these shoes are perfect for everyday wear.

###
product_name: the name of the product
product_brand: the name of the brand (if any)
###
"""

message = [
    {
        "role": "user",
        "content": prompt
    }
]

print(get_completion(message))

{
  "product_name": "Nike Air Max 270 React",
  "product_brand": "Nike"
}
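
Asking for JSON pays off when the reply is parsed and checked before it is used. A minimal sketch with the standard `json` module follows; `parse_product` and its `required_keys` parameter are hypothetical names chosen for this example.

```python
import json


def parse_product(raw, required_keys=("product_name", "product_brand")):
    """Parse a JSON reply and check that every expected key is present."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# The reply printed above, pasted in as a string.
raw_reply = '{"product_name": "Nike Air Max 270 React", "product_brand": "Nike"}'
product = parse_product(raw_reply)
print(product["product_brand"])
```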


## Specifying the Length of the Output

In [30]:
prompt = """
Your task is: given a customer support email, which is delimited with ###, generate a shorter 1-2 sentence response.

###
Dear [Customer],

We hope this email finds you well. We wanted to update you on the shipping issue you experienced with your recent order. After investigating the issue, we have located your package and it is currently on its way to you. We apologize for any inconvenience this may have caused and thank you for your patience and understanding while we resolved this matter.

Please note that we have taken steps to prevent similar issues from occurring in the future. We have improved our shipping tracking system and are now better equipped to ensure that packages arrive on time and in good condition. We take the quality of our service very seriously and want to ensure that all of our customers have a positive experience when shopping with us.

Once again, we apologize for any inconvenience this may have caused and hope that you will continue to shop with us in the future. If you have any further questions or concerns, please do not hesitate to contact us. We are always here to help and ensure that your shopping experience is a positive one.

Thank you for your understanding.

Best regards,

[Your Name]

Customer Support Team
###
"""

message = [
    {  
        "role": "user",
        "content": prompt       
    }
]

print(get_completion(message))

Dear [Customer], we have located your package and it is on its way to you. We apologize for any inconvenience caused and have taken steps to prevent similar issues in the future. Thank you for your understanding.
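
Length instructions are not always followed exactly, so it can help to check the reply. The sketch below uses a deliberately naive sentence splitter (an assumption of this example; it will miscount abbreviations such as "e.g."). Applied to the reply above, it counts three sentences, one more than the prompt asked for.

```python
import re


def count_sentences(text):
    """Naive sentence count: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


# The reply printed above.
reply = (
    "Dear [Customer], we have located your package and it is on its way to you. "
    "We apologize for any inconvenience caused and have taken steps to prevent "
    "similar issues in the future. Thank you for your understanding."
)
n = count_sentences(reply)
print(n, "sentences; within the 1-2 sentence limit:", 1 <= n <= 2)
```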


## Avoid deviating; Constrain the Output

Being explicit about the output you expect helps keep the model from drifting away from the main task.

In [31]:
message = [
    {
        "role": "user",
        "content": "Recommend a movie for Saturday:"
    }
]

print(get_completion(message))

I would recommend watching "The Shawshank Redemption." It is a highly acclaimed drama film that tells the story of a banker who is sentenced to life in Shawshank State Penitentiary for a crime he did not commit. The movie explores themes of hope, friendship, and the resilience of the human spirit. It has a compelling storyline, brilliant performances, and is considered one of the greatest films of all time.


In [32]:
message = [
    {
        "role": "user",
        "content": "Recommend a movie for Saturday. Just say the movie, no need for explanations!"
    }
]

print(get_completion(message))

Inception


## Split Task into Subtasks

In [33]:
event = """
Summer Beats Festival

The event will be held at the beautiful seaside location of Ocean Park in Miami, Florida.

The festival will take place over two days, from July 15th to July 16th.

The Summer Beats Festival will feature a fantastic lineup of popular musical artists and bands from a variety of genres. Attendees can expect to dance and sing along to live performances from headliners such as Taylor Swift, Bruno Mars, and Post Malone. In addition to the main stage, there will be several smaller stages scattered throughout the park featuring up-and-coming artists and DJs.

The festival will also offer a wide variety of food and drink options for attendees to enjoy. From classic festival fare like hot dogs and funnel cakes to more gourmet offerings like sushi and craft beer, there will be something to suit every taste.

Families with children are welcome, and there will be plenty of activities to keep the little ones entertained. The festival will offer a dedicated children's area with carnival games, face painting, and other fun activities.

For those looking for a more luxurious experience, the Summer Beats Festival will also offer a VIP area with premium viewing of the main stage, private bars, and lounges, and other exclusive perks.

Overall, the Summer Beats Festival promises to be an unforgettable event for music lovers of all ages. With a stunning location, a great lineup of artists, and plenty of activities and amenities, it's sure to be the highlight of the summer!
"""

prompt = """
Your task is to extract the date of the event and the name of the event. The event is delimited by ### ###.

###
Event: {event}
###

Output:
"""

message = [
    {
        "role": "user",
        "content": prompt.format(event=event)
    }
]

print(get_completion(message))

Date: July 15th to July 16th
Event: Summer Beats Festival


As you add more tasks to the prompt, you need to be more detailed and specific with the instructions.

In [35]:
prompt = """
Explain the event in 2 sentences. 
Extract the date of the event and the name of the event. The event is delimited by ### ###. 

Transform the dates into a MM/DD.

###
Event: {event}
###

Output format: Explanation | event name | date
"""

message = [
    {
        "role": "user",
        "content": prompt.format(event=event)
    }
]

print(get_completion(message))

The event is the Summer Beats Festival, which will be held at Ocean Park in Miami, Florida. The festival will take place over two days, from July 15th to July 16th.


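The output above is not exactly what I wanted: it contains the explanation but ignores the `Explanation | event name | date` format and the MM/DD dates. The way to fix it is to be very specific and spell out the steps in detail.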
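
One way to spell out the steps is to number them in the prompt and then verify that the reply has the requested shape. The following is a sketch, not the prompt used in the session: the step wording, `build_stepwise_prompt`, and `parse_pipe_output` are all assumptions made for illustration.

```python
STEPS = [
    "Summarize the event in exactly 2 sentences.",
    "Extract the name of the event.",
    "Extract the start and end dates and rewrite each one as MM/DD.",
    "Reply with a single line in the form: Explanation | event name | date",
]


def build_stepwise_prompt(event, steps=STEPS):
    """Number each step and append the event between ### delimiters."""
    numbered = "\n".join(f"Step {i}: {step}" for i, step in enumerate(steps, start=1))
    return f"Follow these steps in order.\n{numbered}\n\n###\nEvent: {event}\n###\n"


def parse_pipe_output(line):
    """Split 'Explanation | event name | date' into a dict; fail loudly otherwise."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}")
    return dict(zip(("explanation", "event_name", "date"), parts))


stepwise_prompt = build_stepwise_prompt("Summer Beats Festival, July 15th to July 16th, Miami.")
print(stepwise_prompt)

# Hypothetical reply, written by hand to exercise the parser; it is not real model output.
parsed = parse_pipe_output("A two-day music festival in Miami. | Summer Beats Festival | 07/15-07/16")
print(parsed)
```

A reply that ignores the format, like the one printed above, makes `parse_pipe_output` raise `ValueError`, which is a convenient signal to retry or tighten the prompt.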

## Function Calling

Function calling is a useful feature for getting reliable, structured outputs that can be passed on to external tools.



### Example 1: Summarizing and Tagging

In [3]:
def get_completion(messages, functions, function_call, model="gpt-3.5-turbo-0613"):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        functions = functions,
        function_call=function_call,
    )
    return response

In [4]:
abstract = """
Training large language models (LLM) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Human evaluations on a complexity-balanced test bed show that instructions from Evol-Instruct are superior to human-created ones. By analyzing the human evaluation results of the high complexity part, we demonstrate that outputs from our WizardLM model are preferred to outputs from OpenAI ChatGPT. Even though WizardLM still lags behind ChatGPT in some aspects, our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing large language models.
"""

In [13]:
functions = [
    {
        "name": "get_summary_and_tags",
        "description": "Returns the summary and tags of a given text.",
        "parameters": { # arguments of our function that ChatGPT will send to us, and their types
            "type": "object", # we want arguments as objects
            "properties": { # properties of the object
                "tags": {
                    "type": "string", # tags are a string
                    "description": "The tags correspond to the machine learning models mentioned in the abstract."
                },
                "summary": {
                    "type": "string", # summary is a string
                    "description": "The summary of the text output."
                }
            }
        }
    }
]

In [14]:
messages = [
        {
            "role": "system",
            "content": "Your task is to extract the summary and tags of the following text. The tags correspond to the machine learning models mentioned in the abstract."
        },
        {
            "role": "user",
            "content": f"Here is the text: {abstract}. Now return the summary and tags."
        }
    ]

response = get_completion(messages, functions, {"name": "get_summary_and_tags"})

print(response)

{
  "id": "chatcmpl-7bB41c0eOAFx5cDDTgJSj4lBWtcTF",
  "object": "chat.completion",
  "created": 1689095021,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_summary_and_tags",
          "arguments": "{\n  \"summary\": \"Training large language models (LLM) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we propose a method to create large amounts of instruction data using LLM instead of humans. We evaluate the effectiveness of our method by comparing the instructions generated by LLM with human-created ones. Our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing large language models.\",\n  \"tag

In [16]:
import json 

final_object = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])

final_object

{'summary': 'Training large language models (LLM) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we propose a method to create large amounts of instruction data using LLM instead of humans. We evaluate the effectiveness of our method by comparing the instructions generated by LLM with human-created ones. Our findings suggest that fine-tuning with AI-evolved instructions is a promising direction for enhancing large language models.',
 'tags': 'large language models, instruction data, Evol-Instruct, WizardLM, human evaluations, AI-evolved instructions'}
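
Because the model is not guaranteed to return a function call, it is safer to check for one before parsing. The sketch below works on a plain dict shaped like the response printed above. `extract_function_args` is a hypothetical helper, and the stand-in response is hand-built with shortened values. It also splits the comma-separated `tags` string into a list.

```python
import json


def extract_function_args(response):
    """Return (function_name, parsed_arguments) from a chat completion response."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        raise ValueError("the model replied with plain text, not a function call")
    return call["name"], json.loads(call["arguments"])


# Hand-built stand-in shaped like the response above; the values are shortened.
fake_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "get_summary_and_tags",
                    "arguments": '{"summary": "shortened", "tags": "Evol-Instruct, WizardLM"}',
                },
            }
        }
    ]
}

name, args = extract_function_args(fake_response)
tags = [tag.strip() for tag in args["tags"].split(",") if tag.strip()]
print(name, tags)
```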

### Example 2: Converting Webpage Content to Structured Output

In [17]:
HN_LINKS = """
1.	
Displayport: A Better Video Interface (hackaday.com)
139 points by zdw 1 hour ago | flag | hide | 98 comments

2.	
Emacs GUI Library (andreyor.st)
41 points by iscream26 51 minutes ago | flag | hide | 10 comments

3.	
Introducing Keras Core: Keras for TensorFlow, Jax, and PyTorch (keras.io)
59 points by dewitt 1 hour ago | flag | hide | 20 comments

4.	
Google's new metrics: Interaction to Next Paint (INP) (web.dev)
33 points by 42droids 1 hour ago | flag | hide | 9 comments

5.	
PhotoPrism: Browse Your Life in Pictures (github.com/photoprism)
194 points by pretext 5 hours ago | flag | hide | 103 comments

6.	
We put a distributed database in the browser and made a game of it (tigerbeetle.com)
112 points by BratishkaErik 3 hours ago | flag | hide | 25 comments

7.	
C++23: The Next C++ Standard (modernescpp.com)
94 points by ibobev 4 hours ago | flag | hide | 58 comments

8.	
Show HN: Clickvote – Open-source upvotes, likes, and reviews to any context (github.com/clickvote)
35 points by nevodavid10 2 hours ago | flag | hide | 25 comments

9.	
What we learned from using GPT for 500k+ classifications (trygloo.com)
36 points by hellovai 1 hour ago | flag | hide | 4 comments

10.	
Laws of UX (lawsofux.com)
58 points by mgdo 3 hours ago | flag | hide | 34 comments

11.	
Firejail: Light, featureful and zero-dependency security sandbox for Linux (firejail.wordpress.com)
29 points by nateb2022 1 hour ago | flag | hide | 11 comments

12.	
ScyllaDB is Moving to a New Replication Algorithm: Tablets (scylladb.com)
41 points by carpintech 2 hours ago | flag | hide | 11 comments

13.	
Claude 2 Model Card [pdf] (anthropic.com)
29 points by og_kalu 1 hour ago | flag | hide | 4 comments

14.	
HTTP vs. WebSockets: Which one is the fastest for Postgres queries at the Edge (neon.tech)
45 points by nikita 1 hour ago | flag | hide | 13 comments

15.	
Shop Class 2.0: Rethinking High School to Accelerate Electrification (volted.substack.com)
16 points by jeiden 1 hour ago | flag | hide | 6 comments

16.	
AI Safety and the Age of Dislightenment (fast.ai)
94 points by wskinner 2 hours ago | flag | hide | 94 comments

17.	
GitHub Profile Achievements (cqcumbers.com)
64 points by cqcumbers 2 hours ago | flag | hide | 36 comments

18.	
The Magic of Dependency Resolution (ochagavia.nl)
5 points by willm 40 minutes ago | flag | hide | discuss

19.	
Space After Periods (1993) (webhistory.org)
81 points by susam 10 hours ago | flag | hide | 45 comments

20.	
GPT-Prompt-Engineer (github.com/mshumer)
277 points by sturza 10 hours ago | flag | hide | 116 comments

21.	
The mystery of the Ain Dubai, the world’s largest (broken) Ferris wheel (washingtonpost.com)
13 points by Stratoscope 1 hour ago | flag | hide | 5 comments

22.	
Back-end parallelism in the Rust compiler (nnethercote.github.io)
127 points by edmorley 7 hours ago | flag | hide | 9 comments

23.	
At Japan’s first winery, the country’s oldest grape lives on (japantimes.co.jp)
59 points by karaokeyoga 6 hours ago | flag | hide | 41 comments

24.	
Solar Energy Solves Global Warming (tomaspueyo.com)
61 points by ph0rque 2 hours ago | flag | hide | 48 comments

25.	
GPT-4 details leaked? (threadreaderapp.com)
531 points by bx376 13 hours ago | flag | hide | 451 comments

26.	
Roots of Trust Are Difficult (mjg59.dreamwidth.org)
61 points by todsacerdoti 6 hours ago | flag | hide | 25 comments

27.	
IntelliJ Rust (jetbrains.com)
72 points by manchoz 2 hours ago | flag | hide | 28 comments

28.	
There's always more history (2020) (hillelwayne.com)
35 points by isp 5 hours ago | flag | hide | 6 comments

29.	
Threads and the Social/Communications Map (stratechery.com)
48 points by feross 5 hours ago | flag | hide | 34 comments

30.	
The story in pictures of the Hughes H-4 Hercules, 1945-1947 (rarehistoricalphotos.com)
16 points by dxs 2 hours ago | flag | hide | 4 comments
"""

In [18]:
functions = [
    {
        "name": "get_news_links",
        "description": "Returns a list of structured Hacker News posts given a list.",
        "parameters": { # arguments of our function that ChatGPT will send to us, and their types
            "type": "object", # we want arguments as objects
            "properties": { # properties of the object
                "id": {
                    "type": "string",
                    "description": "The id of the Hacker News post."
                },
                "title": {
                    "type": "string", 
                    "description": "Corresponds to the title of the Hacker News post."
                },
                "url": {
                    "type": "string", 
                    "description": "Corresponds to the url of the Hacker News post."
                },
                "points": {
                    "type": "integer",
                    "description": "Corresponds to the points of the Hacker News post."
                },
                "comments": {
                    "type": "integer",
                    "description": "Corresponds to the number of comments of the Hacker News post."
                },
            }
        }
    }
]
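
The schema above describes a single post, while the prompt asks for the full list. One way to make the schema match that intent is to wrap the post definition in a JSON Schema array. This is a sketch under assumptions: the `posts` wrapper key, the `required` lists, and the name `functions_with_list` are not in the original notebook.

```python
post_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string", "description": "The id of the Hacker News post."},
        "title": {"type": "string", "description": "The title of the post."},
        "url": {"type": "string", "description": "The url of the post."},
        "points": {"type": "integer", "description": "The points of the post."},
        "comments": {"type": "integer", "description": "The number of comments."},
    },
    "required": ["id", "title", "url", "points", "comments"],
}

functions_with_list = [
    {
        "name": "get_news_links",
        "description": "Returns a list of structured Hacker News posts given a list.",
        "parameters": {
            "type": "object",
            # "posts" is a hypothetical wrapper key holding one object per post.
            "properties": {"posts": {"type": "array", "items": post_schema}},
            "required": ["posts"],
        },
    }
]

print(functions_with_list[0]["parameters"]["properties"]["posts"]["type"])
```

With a schema like this, the parsed arguments would be expected to hold the list under the `posts` key. The remaining cells keep the original schema.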

In [28]:
messages = [
        {
            "role": "system",
            "content": "Your task is to return a structured list of Hacker News posts given a list of Hacker News posts."
        },
        {
            "role": "user",
            "content": f"Here is the list: {HN_LINKS}. Return the full list:"
        }
    ]

response = get_completion(messages, functions, {"name": "get_news_links"})
choices = response["choices"]
final_object = [json.loads(choices[i]["message"]["function_call"]["arguments"]) for i in range(len(choices))]
final_object

{
  "id": "chatcmpl-7bBMqqjNYUhbs3vWrXG3mCaeQtWHa",
  "object": "chat.completion",
  "created": 1689096188,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_news_links",
          "arguments": "[\n  {\n    \"id\": \"1\",\n    \"title\": \"Displayport: A Better Video Interface\",\n    \"url\": \"hackaday.com\",\n    \"points\": 139,\n    \"comments\": 98\n  },\n  {\n    \"id\": \"2\",\n    \"title\": \"Emacs GUI Library\",\n    \"url\": \"andreyor.st\",\n    \"points\": 41,\n    \"comments\": 10\n  },\n  {\n    \"id\": \"3\",\n    \"title\": \"Introducing Keras Core: Keras for TensorFlow, Jax, and PyTorch\",\n    \"url\": \"keras.io\",\n    \"points\": 59,\n    \"comments\": 20\n  },\n  {\n    \"id\": \"4\",\n    \"title\": \"Google's new metrics: Interaction to Next Paint (INP)\",\n    \"url\": \"web.dev\",\n    \"points\": 33,\n    \"comment

[[{'id': '1',
   'title': 'Displayport: A Better Video Interface',
   'url': 'hackaday.com',
   'points': 139,
   'comments': 98},
  {'id': '2',
   'title': 'Emacs GUI Library',
   'url': 'andreyor.st',
   'points': 41,
   'comments': 10},
  {'id': '3',
   'title': 'Introducing Keras Core: Keras for TensorFlow, Jax, and PyTorch',
   'url': 'keras.io',
   'points': 59,
   'comments': 20},
  {'id': '4',
   'title': "Google's new metrics: Interaction to Next Paint (INP)",
   'url': 'web.dev',
   'points': 33,
   'comments': 9},
  {'id': '5',
   'title': 'PhotoPrism: Browse Your Life in Pictures',
   'url': 'github.com/photoprism',
   'points': 194,
   'comments': 103},
  {'id': '6',
   'title': 'We put a distributed database in the browser and made a game of it',
   'url': 'tigerbeetle.com',
   'points': 112,
   'comments': 25},
  {'id': '7',
   'title': 'C++23: The Next C++ Standard',
   'url': 'modernescpp.com',
   'points': 94,
   'comments': 58},
  {'id': '8',
   'title': 'Show HN: Cl

This structured output can then be passed to something like an external API. The API's response can in turn be passed back to GPT to compose a reply, for example in a chatbot.
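
The round trip can be sketched without any network calls. In the example below, `lookup_post` is a stand-in for a real external API and `AVAILABLE_FUNCTIONS` is a hypothetical registry. The `"function"` role follows the legacy function-calling message format used by the `ChatCompletion` calls in this notebook.

```python
import json


def lookup_post(post_id):
    """Stand-in for an external API call; returns canned data for illustration."""
    return {"post_id": post_id, "status": "found"}


# Hypothetical registry mapping function names the model may request to Python callables.
AVAILABLE_FUNCTIONS = {"lookup_post": lookup_post}


def run_function_call(function_call):
    """Execute the function the model asked for and build the follow-up message."""
    name = function_call["name"]
    arguments = json.loads(function_call["arguments"])
    result = AVAILABLE_FUNCTIONS[name](**arguments)
    # In the legacy function-calling format, results are sent back with role "function".
    return {"role": "function", "name": name, "content": json.dumps(result)}


# Hand-written function_call, shaped like the ones in the responses above.
followup = run_function_call({"name": "lookup_post", "arguments": '{"post_id": "1"}'})
print(followup)
```

Appending `followup` to `messages` and calling the chat endpoint again would let the model compose the final reply from the API result.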