# Does mentioning tipping in the prompt make LLMs give better answers? If it does, does the amount matter?

### Randomness

In [45]:
import random

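Since LLMs are statistical models that sample their outputs, the answers they generate are random as well. Fixing the random seed used for the responses helps make the results reproducible. Note that `random.seed` only makes the list of seeds below reproducible; on its own it does not control sampling on the API side.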

In [55]:
random.seed(555)
random_seeds = [random.randint(0, 5000) for _ in range(5)]
random_seeds

[1583, 2249, 1319, 1050, 4597]
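The seeds above are generated but not yet passed to any request. A minimal sketch of one way to use them, assuming the OpenAI chat completions `seed` parameter (a best-effort reproducibility feature, so identical outputs are still not guaranteed): build one set of request arguments per (question, seed) pair. The helper name `build_seeded_requests` is hypothetical.

```python
from itertools import product


def build_seeded_requests(questions: list[str], model: str, seeds: list[int]) -> list[dict]:
    """Return one keyword-argument dict per (question, seed) pair.

    Hypothetical helper: each dict is meant to be unpacked into
    client.chat.completions.create(**kwargs). The 'seed' field assumes the
    chat API's best-effort reproducibility parameter.
    """
    requests = []
    # product() iterates all seeds for the first question, then the next question, ...
    for question, seed in product(questions, seeds):
        requests.append({
            'model': model,
            'messages': [{'role': 'user', 'content': question}],
            'seed': seed,
        })
    return requests


reqs = build_seeded_requests(["Q1", "Q2"], 'gpt-3.5-turbo', [1583, 2249, 1319])
print(len(reqs))  # → 6
```

Each dict could then be sent with `client.chat.completions.create(**kwargs)`, giving one answer per seed instead of `n` unseeded samples.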

### Model: GPT

In [3]:
import openai
from openai import OpenAI

In [30]:
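def get_responses_gpt(q_list: list, model: str, num_choices: int = 1):
    """Send each question to a chat model and return the raw responses.

    num_choices is passed as `n`, so each response holds that many sampled answers.
    """
    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
    responses = []
    for question in q_list:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {'role': 'user', 'content': question}
            ],
            n=num_choices
        )
        responses.append(response)

    return responses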

### Testing on several current models

In [41]:
questions = ["Can you explain to me in detail how transformer architecture works?",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1 cent after you anwsered.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 10 cent after you anwsered.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1 dollar after you anwsered.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 100 dollar after you anwsered.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1000000 dollar after you anwsered."]
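The six prompts differ only in the tip clause. A small sketch of generating such variants from a base question and a list of tip amounts, which makes it easier to test more amounts; `make_tip_prompts` is a hypothetical helper, and its wording ("I will tip you X after you answer.") differs slightly from the prompts above, so results from the two prompt sets are not directly comparable.

```python
def make_tip_prompts(base: str, tips: list[str]) -> list[str]:
    """Return the base question followed by one variant per tip amount."""
    prompts = [base]  # control prompt that mentions no tip
    for tip in tips:
        prompts.append(f"{base} I will tip you {tip} after you answer.")
    return prompts


base_q = "Can you explain to me in detail how transformer architecture works?"
variants = make_tip_prompts(base_q, ["1 cent", "10 cents", "$1", "$100", "$1,000,000"])
print(len(variants))  # → 6
```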

In [42]:
responses = get_responses_gpt(questions, 'gpt-3.5-turbo', num_choices=5)

In [43]:
for r in responses:
    # Mean length (in characters) of the sampled answers for this prompt
    lengths = [len(choice.message.content) for choice in r.choices]
    print(f'Average length: {sum(lengths) / len(lengths):.3f}')

Average length: 2631.000
Average length: 2032.000
Average length: 2003.000
Average length: 2124.800
Average length: 2405.000
Average length: 2119.800
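With only five samples per prompt, the differences between these averages may fall within ordinary sampling noise. A minimal sketch that also reports the sample standard deviation and the min/max length, using only the standard library `statistics` module; `length_stats` is a hypothetical helper, and length in characters is only a rough proxy for answer quality.

```python
import statistics


def length_stats(texts: list[str]) -> dict[str, float]:
    """Summarize the character lengths of a list of generated answers."""
    lengths = [len(t) for t in texts]
    return {
        'mean': statistics.mean(lengths),
        # sample standard deviation is undefined for a single value, so report 0.0
        'stdev': statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
        'min': min(lengths),
        'max': max(lengths),
    }


# Applied to the chat responses above (one summary per prompt):
# for r in responses:
#     print(length_stats([c.message.content for c in r.choices]))

print(length_stats(["aa", "aaaa", "aaaaaa"]))  # → {'mean': 4, 'stdev': 2.0, 'min': 2, 'max': 6}
```

A large standard deviation relative to the gap between two prompts' means suggests more samples are needed before concluding that the tip amount matters.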


### Testing on some older models

In [None]:
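def get_responses_gpt_legacy(q_list: list, model: str, num_choices: int = 1):
    """Like get_responses_gpt, but for completion (non-chat) models such as gpt-3.5-turbo-instruct."""
    client = OpenAI()
    responses = []
    for question in q_list:
        # The legacy completions endpoint takes a plain prompt instead of a list of messages.
        # Its max_tokens defaults to 16, so long answers are truncated unless it is set explicitly.
        response = client.completions.create(
            model=model,
            prompt=question,
            n=num_choices
        )
        responses.append(response)

    return responses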
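The averaging loop used for the chat models reads `choice.message.content`, but choices from the legacy completions endpoint expose the generated text as `choice.text` instead. A sketch of a helper that handles both shapes, so the same length comparison can be reused; `choice_text` and `average_length` are hypothetical names, and the demonstration uses `SimpleNamespace` stand-ins rather than real API objects.

```python
from types import SimpleNamespace


def choice_text(choice) -> str:
    """Return the generated text from a chat or legacy completion choice."""
    message = getattr(choice, 'message', None)
    if message is not None:
        # chat completions: text lives in choice.message.content (may be None)
        return message.content or ''
    # legacy completions: text lives in choice.text
    return choice.text


def average_length(response) -> float:
    """Average character length over all choices in one response."""
    texts = [choice_text(c) for c in response.choices]
    return sum(len(t) for t in texts) / len(texts)


# Stand-in objects mimicking the two response shapes
chat_like = SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content='abcd'))])
legacy_like = SimpleNamespace(choices=[SimpleNamespace(text='ab'), SimpleNamespace(text='abcd')])
print(average_length(chat_like), average_length(legacy_like))  # → 4.0 3.0
```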

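In [44]:
# gpt-3.5-turbo-instruct is a completion model, so it goes through the legacy helper
responses = get_responses_gpt_legacy(questions, 'gpt-3.5-turbo-instruct', num_choices=5)

Sending this model to the chat completions endpoint instead (as the chat-based helper does) fails with:

NotFoundError: Error code: 404 - {'error': {'message': 'This is not a chat model and thus not supported in the v1/chat/completions endpoint. Did you mean to use v1/completions?', 'type': 'invalid_request_error', 'param': 'model', 'code': None}}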