# Does mentioning tipping in the prompt make LLMs give better answers? If it does, does the amount matter?

### Randomness

In [None]:
import random

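Because LLMs generate text by sampling from a probability distribution, the same prompt can produce a different answer on every call. Fixing a random seed makes the list of per-run seeds below reproducible, which helps with reproducing the results. Note that `random.seed` only affects Python's own generator, so the seeds have to be passed to the API before they can influence sampling.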

In [None]:
random.seed(555)
random_seeds = [random.randint(0, 5000) for _ in range(5)]
random_seeds
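A quick check that reseeding really reproduces the same seed list. The helper `make_seeds` is hypothetical and not part of the original notebook. It uses a private `random.Random` instance so the global generator is left alone, and with the same master seed it should yield the same values as the cell above.

```python
import random

def make_seeds(master_seed: int, count: int = 5, upper: int = 5000) -> list[int]:
    """Derive `count` per-run seeds from one master seed (hypothetical helper)."""
    rng = random.Random(master_seed)  # private generator; global random state untouched
    return [rng.randint(0, upper) for _ in range(count)]

seeds_a = make_seeds(555)
seeds_b = make_seeds(555)
print(seeds_a == seeds_b)  # same master seed, same list
```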

### Model: GPT

In [None]:
import openai
from openai import OpenAI

In [None]:
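def get_responses_gpt(q_list: list, model: str, num_choices: int = 1):
    """Send each question once, asking for `num_choices` sampled answers per request."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    responses = []
    for question in q_list:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {'role': 'user', 'content': question}
            ],
            n=num_choices  # number of independent answers returned in response.choices
        )
        responses.append(response)

    return responses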
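As written, the `random_seeds` list is never sent to the model. One way to use it is the Chat Completions `seed` parameter, which OpenAI documents as best-effort determinism rather than a guarantee. The sketch below is an assumption about how the experiment could be wired, not the notebook's method: `get_responses_gpt_seeded` is a hypothetical name, it makes one request per seed instead of using `n`, and the client is passed in so the loop can be exercised without network access.

```python
def get_responses_gpt_seeded(client, q_list: list, model: str, seeds: list) -> dict:
    """Hypothetical helper: one chat completion per (question, seed) pair.

    Returns {question: [response_for_seed_0, response_for_seed_1, ...]}.
    """
    results = {}
    for question in q_list:
        results[question] = [
            client.chat.completions.create(
                model=model,
                messages=[{'role': 'user', 'content': question}],
                seed=seed,  # best-effort reproducibility only
            )
            for seed in seeds
        ]
    return results

# usage (real API): get_responses_gpt_seeded(OpenAI(), questions, 'gpt-3.5-turbo', random_seeds)
```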

### Testing on several current models

In [10]:
questions = ["Can you explain to me in detail how transformer architecture works?",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1 cent after you answer.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 10 cents after you answer.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1 dollar after you answer.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 100 dollars after you answer.",
             "Can you explain to me in detail how transformer architecture works? I will tip you 1000000 dollars after you answer."]
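The prompts differ only in the tip clause, so they could also be generated from one base question and a list of amounts, which makes adding or removing tip levels less error-prone. `BASE_QUESTION`, `TIP_AMOUNTS` and `build_tip_prompts` are illustrative names, not part of the original notebook; the tip wording follows the same pattern as the hand-written list.

```python
BASE_QUESTION = "Can you explain to me in detail how transformer architecture works?"
# None stands for the control prompt with no tip clause (hypothetical convention)
TIP_AMOUNTS = [None, "1 cent", "10 cents", "1 dollar", "100 dollars", "1000000 dollars"]

def build_tip_prompts(base: str, amounts: list) -> list[str]:
    """Append a tip clause for each amount; None keeps the base question unchanged."""
    prompts = []
    for amount in amounts:
        if amount is None:
            prompts.append(base)
        else:
            prompts.append(f"{base} I will tip you {amount} after you answer.")
    return prompts

tip_questions = build_tip_prompts(BASE_QUESTION, TIP_AMOUNTS)
```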

In [None]:
responses_gpt = get_responses_gpt(questions, 'gpt-3.5-turbo', num_choices=5)

In [None]:
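for r in responses_gpt:
    # usage.completion_tokens is the total over all n choices in the response
    avg_tokens = r.usage.completion_tokens / len(r.choices)
    print(f'Average completion tokens per answer: {avg_tokens:.3f}')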
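Since `usage.completion_tokens` sums over every choice in a response, dividing by `len(r.choices)` gives the per-answer average without hard-coding 5. A small sketch that pairs each average with a tip label so the results read as a table; the function names and the `labels` list are hypothetical, and the label order is assumed to match `questions`.

```python
def average_completion_tokens(response) -> float:
    """Mean completion tokens per choice in one (chat) completions response."""
    return response.usage.completion_tokens / len(response.choices)

def summarize(labels: list, responses: list) -> dict:
    """Map each tip label to the average answer length of its response."""
    if len(labels) != len(responses):
        raise ValueError("need exactly one label per response")
    return {label: average_completion_tokens(r) for label, r in zip(labels, responses)}

# usage:
# summarize(['no tip', '1 cent', '10 cents', '1 dollar', '100 dollars', '1000000 dollars'], responses_gpt)
```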

### Testing on some older models

In [None]:
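def get_responses_gpt_legacy(q_list: list, model: str, num_choices: int = 1):
    """Same experiment against the legacy Completions endpoint (plain prompt, no chat roles)."""
    client = OpenAI()
    responses = []
    for question in q_list:
        response = client.completions.create(
            model=model,
            prompt=question,
            n=num_choices,
            max_tokens=500  # the legacy endpoint defaults to a very short completion (16 tokens)
        )
        responses.append(response)

    return responses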

In [None]:
responses_gpt_legacy = get_responses_gpt_legacy(questions, 'davinci-002', num_choices=5)

In [None]:
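for r in responses_gpt_legacy:
    # usage.completion_tokens is the total over all n choices in the response
    avg_tokens = r.usage.completion_tokens / len(r.choices)
    print(f'Average completion tokens per answer: {avg_tokens:.3f}')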
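Because the legacy call caps each answer at `max_tokens=500`, averages for `davinci-002` can pile up at that limit and hide any effect of the tip. Each completion choice carries a `finish_reason`, which is `'length'` when the cap cut the text off, so counting those shows how many answers were truncated. The helper name below is hypothetical.

```python
def count_truncated(response) -> int:
    """Number of choices that stopped because they hit max_tokens."""
    return sum(1 for choice in response.choices if choice.finish_reason == 'length')

# usage against the notebook's results:
# for q, r in zip(questions, responses_gpt_legacy):
#     print(f'{count_truncated(r)}/{len(r.choices)} truncated: ...{q[-40:]}')
```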

### Model: Claude

In [None]:
import anthropic

In [None]:
def get_responses_claude(q_list: list, model: str, num_answers: int = 1):
    """Claude's Messages API has no `n` parameter, so each answer is requested separately."""
    client_claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    responses = []
    for question in q_list:
        messages = []
        for _ in range(num_answers):
            message = client_claude.messages.create(
                model=model,
                messages=[
                    {"role": "user", "content": question}
                ],
                max_tokens=1024,
            )
            messages.append(message)  # store the response, not the list itself

        responses.append(messages)
    return responses

In [None]:
responses_claude = get_responses_claude(questions, 'claude-3-haiku-20240307', 5)

In [None]:
for messages in responses_claude:
    total_output_tokens = sum(m.usage.output_tokens for m in messages)
    print(f'Average output tokens per answer: {total_output_tokens / len(messages):.3f}')
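Since the Claude loop makes one request per answer, per-answer token counts are available, so the spread can be reported alongside the mean. The sketch below takes plain lists of counts (for example `[m.usage.output_tokens for m in messages]` per prompt) and compares every prompt with the first, no-tip prompt. The function name and the choice of the first prompt as baseline are assumptions for illustration.

```python
import statistics

def summarize_lengths(token_counts: list[list[int]]) -> list[dict]:
    """Per prompt: mean, sample stdev, and % change vs the first (no-tip) prompt."""
    baseline = statistics.mean(token_counts[0])
    rows = []
    for counts in token_counts:
        mean = statistics.mean(counts)
        rows.append({
            'mean': mean,
            'stdev': statistics.stdev(counts) if len(counts) > 1 else 0.0,
            'pct_vs_baseline': 100.0 * (mean - baseline) / baseline,
        })
    return rows

# usage:
# counts = [[m.usage.output_tokens for m in msgs] for msgs in responses_claude]
# for q, row in zip(questions, summarize_lengths(counts)): print(row)
```

With five answers per prompt the standard deviation is a rough guide at best; differences between prompts that are smaller than it should not be read as an effect of the tip.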

### Model: Gemini

In [5]:
import google.generativeai as genai

In [7]:
genai.configure()

In [22]:
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(questions[0])
response.text

'**Transformer Architecture**\n\nTransformers are a type of neural network architecture revolutionizing natural language processing (NLP) tasks such as translation, summarization, and question answering. They were introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al.\n\n**Core Components:**\n\n* **Encoder:** Converts input sequences into a sequence of hidden representations.\n* **Decoder:** Generates output sequences based on the encoder representations.\n* **Attention Mechanism:** Allows the model to selectively attend to different parts of the input sequence.\n\n**Encoder:**\n\n* Consists of multiple encoder layers.\n* Each layer has a self-attention sub-layer and a feed-forward sub-layer.\n\n**Self-Attention Sub-Layer:**\n\n* Computes attention scores between every pair of tokens in the input sequence.\n* Calculates a weighted average of the token representations based on the attention scores.\n* Provides contextual information for each token.\n\n**Feed-Forward 

In [None]:
model.count_tokens(response.text)
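To put Gemini on the same footing as the GPT and Claude runs, each prompt can be answered several times and measured with `count_tokens`, whose result exposes `total_tokens`. This sketch requests one answer per call and loops, because whether `gemini-pro` accepts more than one candidate per request is an assumption not checked here. The helper name is hypothetical, and the model object is passed in so it can be replaced by a stand-in. Note that `response.text` raises if an answer was blocked by safety filters.

```python
def get_token_counts_gemini(model, q_list: list, num_answers: int = 1) -> list[list[int]]:
    """For each question, the token count of each of `num_answers` generated answers."""
    all_counts = []
    for question in q_list:
        counts = []
        for _ in range(num_answers):
            response = model.generate_content(question)
            # re-tokenize the answer text with the model's own tokenizer
            counts.append(model.count_tokens(response.text).total_tokens)
        all_counts.append(counts)
    return all_counts

# usage: get_token_counts_gemini(genai.GenerativeModel('gemini-pro'), questions, 5)
```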

In [23]:
response = genai.generate_text(prompt=questions[0],
                               model='models/text-bison-001',
                               candidate_count=5)

In [24]:
response.candidates[0]['output']

'Transformer architecture is a type of neural network that is used for natural language processing (NLP). It was developed by Vaswani et al. in 2017. Transformer architecture is based on the idea of attention, which is a mechanism that allows the model to focus on specific parts of the input sequence. This is in contrast to recurrent neural networks (RNNs), which process the input sequence one element at a time.\n\nTransformer architecture consists of a stack of encoder and decoder layers. The encoder layers map the input sequence to a sequence of hidden states. The decoder layers then use these hidden states to generate the output sequence. The attention mechanism is used to allow the decoder layers to attend to specific parts of the input sequence.\n\nTransformer architecture has been shown to achieve state-of-the-art results on a variety of NLP tasks, including machine translation, text summarization, and question answering. It is particularly well-suited for tasks that require the 

In [25]:
# `model` is still gemini-pro, so this counts with Gemini's tokenizer, not text-bison's
print(model.count_tokens(response.candidates[0]['output']))

total_tokens: 543



## Does the system role affect the quality of the answer?

### Model: GPT

In [27]:
from openai import OpenAI

In [30]:
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {'role': 'system', 'content': 'You are a waiter in a nice Italian restaurant.'},
        {'role': 'user', 'content': 'Can you explain the entree on the menu to me? And do you have any suggestions? I will give you a 10 dollar tip.'},
    ]
)

In [31]:
response.choices[0].message.content

"Of course! Our entree on the menu is the Linguine alla Vongole, which is a classic Italian pasta dish made with linguine pasta and fresh clams in a white wine and garlic sauce. It's a delicious and flavorful dish that is sure to satisfy your cravings for seafood and pasta.\n\nAs for suggestions, if you enjoy seafood, I would highly recommend trying our Linguine alla Vongole. It's a customer favorite and one of our most popular dishes. The combination of the tender clams and the aromatic garlic and white wine sauce is simply divine.\n\nThank you for the generous tip! If you have any other questions or need further recommendations, feel free to ask."

### Model: Claude

In [4]:
import anthropic

In [5]:
client_claude = anthropic.Anthropic()
message = client_claude.messages.create(
    model='claude-3-haiku-20240307',
    system='You are a waiter in a nice Italian restaurant.',  # top-level argument, not a message
    messages=[
        {"role": "user", "content": 'Can you explain the entree on the menu to me? And do you have any suggestions? I will give you a 10 dollar tip.'},
    ],
    max_tokens=1024,
)
message.content[0].text
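The two SDKs take the system prompt in different places: OpenAI expects it as the first entry in `messages` with role `system`, while Anthropic's Messages API takes a top-level `system` argument and only user/assistant turns in `messages` (and requires `max_tokens`). A small hypothetical helper that builds both request payloads from one system/user pair keeps the two experiments comparable; the model names are the ones used in this notebook.

```python
def build_requests(system: str, user: str, max_tokens: int = 1024) -> tuple[dict, dict]:
    """Return (openai_kwargs, anthropic_kwargs) for the same system + user prompt."""
    openai_kwargs = {
        'model': 'gpt-3.5-turbo',
        'messages': [
            {'role': 'system', 'content': system},  # system prompt is a message here
            {'role': 'user', 'content': user},
        ],
    }
    anthropic_kwargs = {
        'model': 'claude-3-haiku-20240307',
        'system': system,  # system prompt is a separate argument here
        'messages': [{'role': 'user', 'content': user}],
        'max_tokens': max_tokens,  # required by the Anthropic Messages API
    }
    return openai_kwargs, anthropic_kwargs

# usage:
# oa, an = build_requests('You are a waiter in a nice Italian restaurant.', 'Can you explain ...')
# client.chat.completions.create(**oa); client_claude.messages.create(**an)
```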


### Model: Gemini