# Comparison of ChatGPT 3.5 Models Behavior

The goal of this small experiment is to use the same set of instructions and compare the answers provided by different ChatGPT 3.5 models.

The answers are limited to the content of a set of documents, so a RAG (retrieval-augmented generation) approach is used.

Created by: Jhonnatan Torres
___

I stored my OpenAI key in a `.env` file. To reproduce this notebook, install the package with `pip install python-dotenv`, create a `.env` file in the same directory as the notebook, and add your key to the `OPENAI_API_KEY` entry in that file.

In [1]:
from dotenv import load_dotenv
load_dotenv()

True
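
As a quick sanity check after `load_dotenv()`, the sketch below confirms that the key is actually available before any request is made. The helper name `require_api_key` is hypothetical (it is not part of `python-dotenv` or the OpenAI client); it only reads the environment.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the key stored under `name`, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to the .env file and reload it")
    return key
```

Calling `require_api_key()` right after `load_dotenv()` fails early with a readable message if the `.env` entry is missing or empty.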

The models used in this small experiment are listed in the official OpenAI documentation (https://platform.openai.com/docs/models/gpt-3-5-turbo). They all belong to the "3.5" family and work with a 16K context.

In [2]:
models = ["gpt-3.5-turbo-0125", "gpt-3.5-turbo-1106", "gpt-3.5-turbo-16k-0613"]

This is the set of instructions for a simple RAG application with some follow-up capabilities:

In [3]:
USER_FIRST_MESSAGE = '''You will be asked a series of questions or issues which can be found between 
the <QUESTION> XML Tags.
A collection of documents that must be used to provide an answer to 'QUESTION' can be found 
between the <SOURCES> XML Tags.

Use your experience as an AI Assistant, troubleshoot and respond with an answer to 
each 'QUESTION' following these guidelines:

- Limit your knowledge to 'SOURCES' and determine if its content can provide a full and honest answer to 'QUESTION', 
  if it does, then respond with a honest answer. 
- Limit your answer to the content of the 'SOURCES'.  
- Don't include more information or your chain of thought in the answer.
- If you are unsure about the answer, or 'QUESTION' is not clear, or it is an  open question, then feel free 
  to respond with a follow-up question for the student to get a better understanding of the 'QUESTION' and 
  keep the troubleshooting.
- If the content in the 'SOURCES' cannot provide a complete and honest answer to 'QUESTION' 
  then respond with "IDK".'''

These are the sources that should be used to provide an answer (*a mix of real and made-up documents*):

In [4]:
SOURCES='''
DOC1234: In python a lambda function can be created following this structure ```lambda x: x**2```.
DOC1235: Collection from the numbutils can be really handy to get the value counts of items in a list in python.
DOC1236: You can elevate a number to an `x` power by using the `**``character in python, 
for example ```x ** 2** is equal to `x to the power of 2`.
'''

___
## First Use Case:
The question entered by the student is not clear, so the expected response from the chatbot is a follow-up question for the student.

In [5]:
from openai import OpenAI
client = OpenAI()
for m in models:
  response = client.chat.completions.create(
    model = m,
    messages = [
      {
        "role": "system",
        "content": "You are a friendly AI Assistant helping to students who have a basic knowledge \
          about programming languages"
      },
      {
        "role": "user",
        "content": USER_FIRST_MESSAGE
      },
      {
        "role": "assistant",
        "content": "OK. Instructions Are Clear."
      },
      {
        "role": "user",
        "content": f"<QUESTION>How can I create a lambda function in Python?</QUESTION>\n<SOURCES>\n{SOURCES}\n</SOURCES>\nOutput:"
      },
      {
        "role": "assistant",
        "content": "To create a lambda function in Python, you can use the following structure: `lambda x: x**2.`"
      },
      {
        "role": "user",
        "content": f"<QUESTION>How can I do this?</QUESTION>\n<SOURCES>\n{SOURCES}\n</SOURCES>\nOutput:"
      }
    ],
    temperature=0.1,
    max_tokens=50,
    top_p=0.1,
  )
  print(f"Model[{m}]:")
  print(response.choices[0].message.content)

Model[gpt-3.5-turbo-0125]:
IDK
Model[gpt-3.5-turbo-1106]:
IDK
Model[gpt-3.5-turbo-16k-0613]:
I'm sorry, but the content in the sources does not provide a complete answer to your question. Could you please clarify what you are trying to do?


According to the official OpenAI documentation (https://platform.openai.com/docs/models/gpt-3-5-turbo), `gpt-3.5-turbo-0125` is the most recent model, and `gpt-3.5-turbo-16k-0613` has legacy status:
> gpt-3.5-turbo-16k-0613

> "Legacy - Snapshot of gpt-3.5-16k-turbo from June 13th 2023. Will be deprecated on June 13, 2024."

However, the model with "legacy" status was the only one that answered with the (*expected*) follow-up question for the student.
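
To compare the three replies without reading them one by one, a rough heuristic can label each reply as "IDK", a follow-up question, or a direct answer. This is only a sketch: `classify_reply` is a hypothetical helper, and treating any reply that ends with a question mark as a follow-up is an assumption that fits the outputs above but will not hold in general.

```python
def classify_reply(reply: str) -> str:
    """Label a reply as 'idk', 'follow_up', or 'answer' (rough heuristic)."""
    text = reply.strip()
    # The instructions ask for the literal string "IDK" when the sources are not enough.
    if text.upper().rstrip(".") == "IDK":
        return "idk"
    # Assumption: a reply that ends with a question mark is a follow-up question.
    if text.endswith("?"):
        return "follow_up"
    return "answer"


# Replies copied from the output of the first use case.
replies = {
    "gpt-3.5-turbo-0125": "IDK",
    "gpt-3.5-turbo-1106": "IDK",
    "gpt-3.5-turbo-16k-0613": (
        "I'm sorry, but the content in the sources does not provide a complete "
        "answer to your question. Could you please clarify what you are trying to do?"
    ),
}

labels = {model: classify_reply(text) for model, text in replies.items()}
for model, label in labels.items():
    print(f"{model}: {label}")
```

With these labels, only `gpt-3.5-turbo-16k-0613` falls in the `follow_up` group, which matches the observation above.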
___

## Second Use Case:
Providing an answer to the question entered by the student based on the sources only. (RAG Approach)

In [6]:
from openai import OpenAI
client = OpenAI()
for m in models:
  response = client.chat.completions.create(
    model = m,
    messages = [
      {
        "role": "system",
        "content": "You are a friendly AI Assistant helping to students who have a basic knowledge \
          about programming languages"
      },
      {
        "role": "user",
        "content": USER_FIRST_MESSAGE
      },
      {
        "role": "assistant",
        "content": "OK. Instructions Are Clear."
      },
      {
        "role": "user",
        "content": f"<QUESTION>How can I create a lambda function in Python?</QUESTION>\n<SOURCES>\n{SOURCES}\n</SOURCES>\nOutput:"
      },
      {
        "role": "assistant",
        "content": "To create a lambda function in Python, you can use the following structure: `lambda x: x**2.`"
      },
      {
        "role": "user",
        "content": f"<QUESTION>How can I get the value counts of items in a list in python?</QUESTION>\n<SOURCES>\n{SOURCES}\n</SOURCES>\nOutput:"
      }
    ],
    temperature=0.1,
    max_tokens=50,
    top_p=0.1,
  )
  print(f"Model[{m}]:")
  print(response.choices[0].message.content)

Model[gpt-3.5-turbo-0125]:
Collection from the numbutils can be really handy to get the value counts of items in a list in Python.
Model[gpt-3.5-turbo-1106]:
To get the value counts of items in a list in Python, you can use the collection from the `numbutils`.
Model[gpt-3.5-turbo-16k-0613]:
To get the value counts of items in a list in Python, you can use the `collections` module from the `numbutils` library.


This is admittedly a little subjective, but the answer provided by `gpt-3.5-turbo-16k-0613` reads better, or at least friendlier. However, it assumed `numbutils` was a library, a fact that is not mentioned in the sources.

The answer provided by `gpt-3.5-turbo-1106` is friendly and does not add information beyond the sources.

`gpt-3.5-turbo-0125` returned the text of the source essentially verbatim: *"Collection from the numbutils can be really handy to get the value counts of items in a list in Python"* (only the capitalization of "Python" differs). It did not alter the word order or attempt to rephrase it to answer the question.
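
The two observations above (a reply copied from the sources, and a reply that adds words the sources never use) can be checked roughly in code. This is a minimal sketch, not a real groundedness metric: `is_verbatim_copy` and `words_not_in_sources` are hypothetical helpers, and only DOC1235 is copied here to keep the example short.

```python
import re

# Only the document relevant to the question, copied from SOURCES.
SOURCE_EXCERPT = (
    "DOC1235: Collection from the numbutils can be really handy "
    "to get the value counts of items in a list in python."
)

# Replies copied from the output of the second use case.
replies = {
    "gpt-3.5-turbo-0125": "Collection from the numbutils can be really handy to get the value counts of items in a list in Python.",
    "gpt-3.5-turbo-1106": "To get the value counts of items in a list in Python, you can use the collection from the `numbutils`.",
    "gpt-3.5-turbo-16k-0613": "To get the value counts of items in a list in Python, you can use the `collections` module from the `numbutils` library.",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def is_verbatim_copy(reply: str, sources: str) -> bool:
    """True when the whole reply appears inside the sources, ignoring case."""
    return normalize(reply) in normalize(sources)


def words_not_in_sources(reply: str, sources: str) -> list:
    """Sorted words used in the reply that never appear in the sources."""
    source_words = set(re.findall(r"[a-z0-9_]+", sources.lower()))
    reply_words = set(re.findall(r"[a-z0-9_]+", reply.lower()))
    return sorted(reply_words - source_words)


for model, reply in replies.items():
    print(model, is_verbatim_copy(reply, SOURCE_EXCERPT), words_not_in_sources(reply, SOURCE_EXCERPT))
```

On these three replies, only the `gpt-3.5-turbo-0125` reply is flagged as a verbatim copy, and the words "library" and "module" show up as unmatched only for `gpt-3.5-turbo-16k-0613`. Common connecting words such as "use" are also reported, so the word list is a hint for manual review rather than a verdict.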
___

## Closing Remarks

- These simple experiments are not enough to draw firm conclusions. Still, my recommendation is to keep these differences in the answers and in the behavior of the models in mind when designing your RAG applications.

- To be honest, I expected `gpt-3.5-turbo-0125` to reply with a follow-up question for the student in the first use case. Instead, the "legacy" model followed the instructions better.

- Note: I included the instructions in the first user message to reduce the number of tokens used in each turn. This way, the Question, Sources, and Instructions are not all sent together in every user turn. *I know this is not the common RAG approach*.
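
The message layout described in the note can be sketched as a small helper, which also avoids repeating the same list in both use cases. The names `format_turn` and `build_messages` are hypothetical, and the strings in the example call are shortened placeholders rather than the real prompt and sources.

```python
def format_turn(question: str, sources: str) -> str:
    """Wrap one question and the sources in the XML-style tags used in this notebook."""
    return f"<QUESTION>{question}</QUESTION>\n<SOURCES>\n{sources}\n</SOURCES>\nOutput:"


def build_messages(system_prompt, instructions, history, question, sources):
    """Build the chat history: the instructions are sent once, then one turn per question.

    `history` is a list of (question, answer) pairs from earlier turns.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": instructions},  # instructions appear only here
        {"role": "assistant", "content": "OK. Instructions Are Clear."},
    ]
    for past_question, past_answer in history:
        messages.append({"role": "user", "content": format_turn(past_question, sources)})
        messages.append({"role": "assistant", "content": past_answer})
    messages.append({"role": "user", "content": format_turn(question, sources)})
    return messages


# Example with placeholder strings standing in for the real prompt and sources.
messages = build_messages(
    system_prompt="You are a friendly AI Assistant.",
    instructions="INSTRUCTIONS PLACEHOLDER",
    history=[("How can I create a lambda function in Python?", "Use `lambda x: x**2`.")],
    question="How can I do this?",
    sources="SOURCES PLACEHOLDER",
)
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create(...)`, as in the cells above, with `USER_FIRST_MESSAGE` and `SOURCES` in place of the placeholders.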