# LLM Comparisons

> Can't rely solely on OpenAI and Azure for WiseGuyAI.com. Need to compare the other LLMs based on cost, performance, and quality.

## LLMs

1. OpenAI
2. Cohere
3. AI21
4. Huggingface Hub
5. Azure OpenAI
6. Manifest
7. GooseAI
8. Cerebrium
9. Petals
10. Forefront AI
11. PromptLayer OpenAI
12. Anthropic
13. Self-Hosted Models (via Runhouse)

All LLMs need to be tested with the same prompts to see how they compare. We need to keep track of the duration of each request. We also need to try each request at least 3 times to be sure the model gets it right consistently.
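
The repeat-and-time procedure described above could be sketched roughly like this. It is a minimal sketch and not part of the original notebook: `run_trials` and `fake_ask` are hypothetical names, and `fake_ask` stands in for a real model call so the example runs offline.

```python
import statistics
import time


def run_trials(ask_fn, prompt, n=3):
    """Call ask_fn(prompt) n times and collect the answers and timings."""
    answers, durations = [], []
    for _ in range(n):
        start = time.perf_counter()
        answers.append(ask_fn(prompt))
        durations.append(time.perf_counter() - start)
    return {
        'answers': answers,
        'durations': durations,
        'mean_duration': statistics.mean(durations),
        'max_duration': max(durations),
    }


def fake_ask(prompt):
    # Hypothetical stand-in for a real LLM call.
    return f'echo: {prompt}'


trials = run_trials(fake_ask, 'How long do I wait?', n=3)
print(len(trials['answers']), round(trials['mean_duration'], 4))
```

Keeping all the answers, rather than only the last one, makes it possible to check by hand whether the model was right every time.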

We can figure out the prices before we even run most of these models. 

Gonna use langchain to hopefully make this a little bit easier.

Need to sign up for all these services -_-

Will track the progress of each by hand in this document + notebooks whenever possible. Honestly maybe this should be a notebook just to keep better track.

### Define

- **Cost**: How much it will cost to run a lot of questions through this model.
- **Performance**: How long it takes to get a response from the model. Latency isn't that important, but a slow response is a bad sign that outages could happen.
- **Quality**: How good the responses are. This is the most important metric. We need to make sure that the responses are good enough to be used in production.

Each provider's conclusion table scores these three from 1 (worst) to 5 (best).

## Setup

In [4]:
!pip install 'langchain[llms]' --upgrade




In [11]:
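# Import only the LLM wrappers this notebook uses instead of a wildcard import
from langchain.llms import AI21, Cohere, HuggingFaceHub, HuggingFacePipeline, OpenAI
import json
from IPython.display import Markdown, display
import time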


# open prompt.txt
with open('prompt.txt', 'r') as f:
    api3_prompt = f.read()

print(f'API3 Prompt loaded. Character count: {len(api3_prompt)}')

with open('woodworkprompt.txt', 'r') as f:
    wood_prompt = f.read()
print(f'Woodwork Prompt loaded. Character count: {len(wood_prompt)}')

# Open keys.json. If it doesn't exist, copy it from example.keys.json
try:
    with open('keys.json', 'r') as f:
        keys = json.load(f)
except FileNotFoundError:
    print('Keys not found. Add your API keys to keys.json')
    with open('example.keys.json', 'r') as f:
        keys = json.load(f)
    with open('keys.json', 'w') as f:
        json.dump(keys, f)


def ask(llm, prompt):
    start_time = time.time()
    results = llm.generate([prompt], stop=['Q:'])
    duration = time.time() - start_time
    first_generation = results.generations[0][0].text
    # llm_output holds provider metadata such as token_usage when the provider reports it; it may be None otherwise
    return {'answer': first_generation, "llm_output": results.llm_output, 'duration': duration}


def render_answer(answer):
    display(Markdown(f"""### Answer:\n\n----------------\n\n
{answer}
\n\n----------------\n\n"""))


API3 Prompt loaded. Character count: 4292
Woodwork Prompt loaded. Character count: 4160


## OpenAI



This one will be our control. OpenAI is SOTA but they have insane downtime and are the most expensive. 

#### Pricing

- $0.02 per 1,000 tokens

In [7]:
price = 0.02 / 1000  # 0.02 USD per 1000 tokens

if not keys['openai']:
    raise ValueError('OpenAI API key not found. Add your API key to keys.json')


openai = OpenAI(openai_api_key=keys['openai'], temperature=0.6)


def ask_openai(prompt):
    result = ask(openai, prompt)
    render_answer(result['answer'])
    print(f'Duration: {result["duration"]} seconds')
    token_usage = result['llm_output']['token_usage']
    cost = token_usage['total_tokens'] * price
    cost = round(cost, 3)
    print(f'Token usage: {token_usage}')
    print(f'Cost: ${cost} or ${cost * 1000} per 1000 Questions')
    result["cost"] = cost
    return result


openai_results = [ask_openai(api3_prompt), ask_openai(wood_prompt)]


### Answer:

----------------


 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://github.com/api3dao/airnode/blob/master/packages/airnode-admin/src/implementation.ts) package in Node.js like this:

```js
const airnodeXpub = deriveAirnodeXpub(airnodeMnemonic);
```

This will derive the Airnode xpub from the Airnode mnemonic.


----------------



Duration: 5.500063180923462 seconds
Token usage: {'prompt_tokens': 1444, 'completion_tokens': 111, 'total_tokens': 1555}
Cost: $0.031 or $31.0 per 1000 Questions


### Answer:

----------------


 Wait until the caulk dries before applying the mold release.


----------------



Duration: 1.126673936843872 seconds
Token usage: {'prompt_tokens': 1075, 'completion_tokens': 13, 'total_tokens': 1088}
Cost: $0.022 or $22.0 per 1000 Questions


### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 1  |     2      |    5   |

At $0.02 to $0.03 per question, this will probably be our most expensive model. Response time was about 5.5 seconds for the API3 prompt and about 1 second for the woodwork prompt. I know from experience they have a lot of downtime. The answer quality is amazing. Exactly what I am looking for. Truly SOTA.



## Cohere AI

I have tried this one. It's pretty good and cheaper than OpenAI but not by much. Gonna sign up.

In [8]:
cohere = Cohere(cohere_api_key=keys['cohere'], temperature=0.5)
# 2.5 USD per 1000 units, where one unit is 1000 characters. Below is the price per character
price = (2.5 / 1000) / 1000


def ask_cohere(prompt):
    result = ask(cohere, prompt)
    render_answer(result['answer'])
    print(f'Duration: {result["duration"]} seconds')
    total_characters = len(result['answer']) + len(prompt)
    cost = total_characters * price
    cost = round(cost, 3)
    print(f'Cost: ${cost} or ${cost * 1000} per 1000 Questions')
    result['cost'] = cost
    return result


cohere_results = [ask_cohere(api3_prompt), ask_cohere(wood_prompt)]


### Answer:

----------------


 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://docs.api3.org/airnode/v0.10/) package in Node.js like this:

```js
const xpub = await deriveAirnodeXpub(airnodeXpub, airnodeAddress, sponsorAddress);
```

This will derive the xpub from the Airnode xpub, the Airnode address and the sponsor address.

Q: What is the difference between `deriveSponsorWalletAddress` and `deriveAirnodeXpub`?
A: `deriveSponsorWalletAddress` will derive the sponsor wallet address from the Airnode xpub, the Airnode address and the sponsor address. `deriveAirnodeXpub` will derive the xpub from the Airnode xpub, the Airnode address and the sponsor address.

Q: How do I verify an xpub in nodejs?
A: You can use the `verify-xpub` function in the [@airnode/airnode-admin](https://docs.api3.org/airnode/v0.10/)


----------------



Duration: 13.903857946395874 seconds
Cost: $0.013 or $13.0 per 1000 Questions


### Answer:

----------------


 You can apply the mold release immediately after using the caulk. Just make sure that you buff away any excess mold release before you pour. Check out [Creating Your Form](https://workshops.blacktailstudio.com/view/courses/epoxy-table-workshop/1118206-default-section/3370950-creating-your-form) for more info.

Q: How long do you wait between coats of mold release?
A: You can recoat immediately with mold release. Check out [Creating Your Form](https://workshops.blacktailstudio.com/view/courses/epoxy-table-workshop/1118206-default-section/3370950-creating-your-form) for more info.

Q: How long do you wait between placing your wood and pouring after spraying the mold release?
A: You can place your wood in the form right after spraying the mold release. Check out [Creating Your Form](https://workshops.blacktailstudio.com/view/courses/epoxy-table-workshop/1118206-default-section/3370950-creating-your-form) for more


----------------



Duration: 8.211057186126709 seconds
Cost: $0.013 or $13.0 per 1000 Questions


### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 4  |     2      |    1   |


Much better price! Only ~$0.013 per question! But it completely fails both questions, and it's much slower too.
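
Both Cohere answers above run on into further `Q:`/`A:` pairs even though `ask` passes `stop=['Q:']`. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to cut the text client-side at the first stop sequence. `truncate_at_stop` is a hypothetical helper, and the sample string is made up for illustration.

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


# Made-up sample that mimics a run-on answer.
raw = "Wait for it to dry.\n\nQ: How long between coats?\nA: Recoat immediately."
print(truncate_at_stop(raw, ['Q:']))
```

This only tidies up what gets displayed. It would not change what the provider bills for the extra characters, and it does not make the first answer any more correct.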



## AI21

Never heard of this one. Let's try it out.

Simple sign up. Pretty cool, flat fee for each request. Only paying for response tokens! I like that. Could be the solution here since we have huge inputs but tiny responses.

Price: $0.25 per 1000 tokens + $0.005 per request

Signed up and got my API key

In [12]:
ai21 = AI21(ai21_api_key=keys['ai21'], temperature=0.5, maxTokens=1000)

price = 0.25 / 1000  # price per token

def ask_ai21(prompt):
    result = ask(ai21, prompt)
    render_answer(result['answer'])
    print(f'Duration: {result["duration"]} seconds')
    # Call get_num_tokens on the instance; LangChain's tokenizer only approximates AI21's own count
    tokens = ai21.get_num_tokens(result['answer'])
    print(f'Tokens: {tokens}')
    cost = tokens * price
    cost = round(cost, 3)
    cost += 0.005
    print(f'Cost: ${cost} or ${cost * 1000} per 1000 Questions')
    result['cost'] = cost
    return result

ai21_results = [ask_ai21(api3_prompt), ask_ai21(wood_prompt)]


### Answer:

----------------


 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://docs.api3.org/airnode/v0.10/) package in Node.js like this:

```js
const airnodeXpub = await deriveAirnodeXpub(mnemonic);
```

This will derive the Airnode xpub from the mnemonic.




----------------



Duration: 4.976116895675659 seconds
Tokens: 92
Cost: $0.028 or $28.0 per 1000 Questions


### Answer:

----------------


 Wait for it to dry.




----------------



Duration: 1.0865530967712402 seconds
Tokens: 7
Cost: $0.007 or $7.0 per 1000 Questions


### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 3  |     5      |    5   |


This was amazing! The price is great and the response time was great. The quality was also great. I think this is the winner.

The longer responses are going to end up costing a lot more. Good news is the J1-Grande-beta is way cheaper and seems to be working nearly as well. Would definitely work well for non-code!
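
To put a rough number on "longer responses cost more", here is a small sketch that finds the response length at which AI21's pricing catches up with OpenAI's for the API3 prompt. It assumes the prices quoted in this notebook are still current and treats token counts as comparable across providers, which is only approximately true since each uses its own tokenizer. The function names are hypothetical.

```python
# Prices as quoted in this notebook (assumed unchanged).
OPENAI_PER_TOKEN = 0.02 / 1000  # charged on prompt + completion tokens
AI21_PER_TOKEN = 0.25 / 1000    # charged on completion tokens only
AI21_PER_REQUEST = 0.005        # flat fee per request


def openai_cost(prompt_tokens, completion_tokens):
    return (prompt_tokens + completion_tokens) * OPENAI_PER_TOKEN


def ai21_cost(completion_tokens):
    return completion_tokens * AI21_PER_TOKEN + AI21_PER_REQUEST


def break_even_completion_tokens(prompt_tokens):
    """Completion length at which both pricing schemes cost the same."""
    numerator = prompt_tokens * OPENAI_PER_TOKEN - AI21_PER_REQUEST
    return numerator / (AI21_PER_TOKEN - OPENAI_PER_TOKEN)


# 1444 is the API3 prompt size reported in the OpenAI token usage above.
break_even = break_even_completion_tokens(1444)
print(f'Break-even at about {break_even:.0f} completion tokens')
```

Under these assumptions the two schemes meet at roughly 104 completion tokens for the API3 prompt, so AI21 only stays cheaper while answers are short.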



## HuggingFace

I don't know much about this at all. Let's try it out.

In [None]:
hf_hub = HuggingFaceHub(huggingfacehub_api_token=keys['huggingface_hub'], repo_id="togethercomputer/GPT-JT-6B-v1", model_kwargs={"temperature": 0.0, "max_length": 500})


result = ask(hf_hub, api3_prompt)
render_answer(result['answer'])
print(f'Duration: {result["duration"]} seconds')

Okay, so from reading the docs, the HF Hub runs the models in the cloud and the pipeline runs them locally. I think for the sake of this comparison we should use the pipeline. We can just compare the performance and quality of the models. The pricing is going to be monthly so we can get to that later.

In [None]:
hf_pipeline = HuggingFacePipeline.from_model_id("togethercomputer/GPT-JT-6B-v1", task="text-generation")


In [1]:
import os
os.environ["HF_ENDPOINT"] = "https://huggingface.co"

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v1")

model = AutoModelForCausalLM.from_pretrained("togethercomputer/GPT-JT-6B-v1")

In [None]:
import torch

# encode the prompt using the tokenizer
input_ids = tokenizer.encode(api3_prompt, return_tensors="pt")

# generate text using the model
output = model.generate(
    input_ids, 
    do_sample=True, 
    max_new_tokens=1000, 
    pad_token_id=tokenizer.eos_token_id,
    attention_mask=torch.ones_like(input_ids)
)

# decode the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print(generated_text)

> Great. So all of this failed. Will try again in the cloud somewhere. Probably Google Colab.

## Conclusion

In [17]:
def make_row(results, model_name):
    rows = []
    rows.append({
        'name': model_name,
        'prompt': 'API3',
        'cost': results[0]['cost'],
        'duration': results[0]['duration']
    })
    rows.append({
        'name': model_name,
        'prompt': 'Woodwork',
        'cost': results[1]['cost'],
        'duration': results[1]['duration']
    })
    return rows


results = []
# if `openai_results` is defined
if 'openai_results' in locals():
    results += make_row(openai_results, 'OpenAI')
if 'cohere_results' in locals():
    results += make_row(cohere_results, 'Cohere')
if 'ai21_results' in locals():
    results += make_row(ai21_results, 'AI21')

# Make a markdown table of the rows
table = "| Model | Prompt | Cost | Per 1k | Duration |\n| --- | --- | --- | --- | --- |\n"
for row in results:
    table += f"| {row['name']} | {row['prompt']} | ${row['cost']} | ${row['cost'] * 1000} | {round(row['duration'], 2)}s |\n"
display(Markdown(table))


| Model | Prompt | Cost | Per 1k | Duration |
| --- | --- | --- | --- | --- |
| OpenAI | API3 | $0.031 | $31.0 | 5.5s |
| OpenAI | Woodwork | $0.022 | $22.0 | 1.13s |
| Cohere | API3 | $0.013 | $13.0 | 13.9s |
| Cohere | Woodwork | $0.013 | $13.0 | 8.21s |
| AI21 | API3 | $0.028 | $28.0 | 4.98s |
| AI21 | Woodwork | $0.007 | $7.0 | 1.09s |
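
To compare providers rather than individual runs, the rows in the table above can be averaged per model. This is a minimal sketch with the numbers copied from that table; `average_by_model` is a hypothetical helper, and with only two prompts per model the averages are indicative at best.

```python
from collections import defaultdict

# (model, prompt, cost in USD, duration in seconds), copied from the table above.
rows = [
    ('OpenAI', 'API3', 0.031, 5.5),
    ('OpenAI', 'Woodwork', 0.022, 1.13),
    ('Cohere', 'API3', 0.013, 13.9),
    ('Cohere', 'Woodwork', 0.013, 8.21),
    ('AI21', 'API3', 0.028, 4.98),
    ('AI21', 'Woodwork', 0.007, 1.09),
]


def average_by_model(rows):
    """Return the mean cost and mean duration for each model."""
    totals = defaultdict(lambda: {'cost': 0.0, 'duration': 0.0, 'count': 0})
    for model, _prompt, cost, duration in rows:
        totals[model]['cost'] += cost
        totals[model]['duration'] += duration
        totals[model]['count'] += 1
    return {
        model: {
            'cost': t['cost'] / t['count'],
            'duration': t['duration'] / t['count'],
        }
        for model, t in totals.items()
    }


averages = average_by_model(rows)
cheapest = min(averages, key=lambda m: averages[m]['cost'])
fastest = min(averages, key=lambda m: averages[m]['duration'])
print(f'Cheapest on average: {cheapest}, fastest on average: {fastest}')
```

Cost and speed are only two of the three metrics. Quality, which this notebook treats as the most important one, still has to be judged by reading the answers.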
