# LLM Comparisons

> Can't rely solely on OpenAI and Azure for WiseGuyAI.com. Need to compare the other LLMs based on cost, performance, and quality.

## LLMs

1. OpenAI
2. Cohere
3. AI21
4. Huggingface Hub
5. Azure OpenAI
6. Manifest
7. GooseAI
8. Cerebrium
9. Petals
10. Forefront AI
11. PromptLayer OpenAI
12. Anthropic
13. Self-Hosted Models (via Runhouse)

All LLMs will need to be tested with the same prompt to see how they compare. We need to keep track of the duration of each request. We also need to try each request at least 3 times to be sure the model gets it right consistently.
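A minimal sketch of the "same prompt, at least 3 tries, track the duration" idea. `time_repeated` and `fake_llm` are hypothetical names made up for illustration; `fake_llm` stands in for a real model call so the snippet runs without any API keys.

```python
import statistics
import time


def time_repeated(call, prompt, runs=3):
    """Call `call(prompt)` `runs` times and record each answer and its duration."""
    answers, durations = [], []
    for _ in range(runs):
        start = time.perf_counter()
        answers.append(call(prompt))
        durations.append(time.perf_counter() - start)
    return {
        'answers': answers,
        'durations': durations,
        'mean_duration': statistics.mean(durations),
        'max_duration': max(durations),
    }


# Hypothetical stand-in for a real model call.
def fake_llm(prompt):
    return f'echo: {prompt}'


summary = time_repeated(fake_llm, 'Q: How long should caulk dry?')
print(len(summary['answers']), 'runs, mean', summary['mean_duration'], 'seconds')
```

Keeping every answer (not just the last one) makes it possible to check by hand whether the model was right on all three tries, not just fast.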

We can figure out the prices before we even run most of these models. 
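A rough sketch of estimating cost up front from published pricing. The two prices are the ones quoted in the OpenAI and Cohere sections of this document; the request sizes (`ASSUMED_TOKENS`, `ASSUMED_CHARS`) and the helper names are assumptions for illustration only.

```python
def cost_per_question(units, price_per_unit):
    """Cost of one request that consumes `units` billable units (tokens or characters)."""
    return units * price_per_unit


def cost_per_thousand(units, price_per_unit):
    """Projected cost of 1,000 questions of the same size."""
    return cost_per_question(units, price_per_unit) * 1000


# Prices quoted in the provider sections of this document.
OPENAI_PER_TOKEN = 0.02 / 1000          # $0.02 per 1,000 tokens
COHERE_PER_CHAR = (2.5 / 1000) / 1000   # $2.50 per 1,000 units of 1,000 characters

# Assumed request sizes, for illustration only.
ASSUMED_TOKENS = 1500
ASSUMED_CHARS = 4300

print(f'OpenAI: ${cost_per_thousand(ASSUMED_TOKENS, OPENAI_PER_TOKEN):.2f} per 1000 questions')
print(f'Cohere: ${cost_per_thousand(ASSUMED_CHARS, COHERE_PER_CHAR):.2f} per 1000 questions')
```

Providers bill in different units (tokens vs. characters), so the estimate only needs the right unit count per request and the matching unit price.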

Gonna use langchain to hopefully make this a little bit easier.

Need to sign up for all these services -_-

Will track the progress of each by hand in this document + notebooks whenever possible. Honestly maybe this should be a notebook, just to keep better track.

### Define

- **Cost**: How much it will cost to run a lot of questions through this model.
- **Performance**: How long it takes to get a response from the model. Latency isn't that important, but a slow response is a bad sign that outages could happen.
- **Quality**: How good the responses are. This is the most important metric. We need to make sure that the responses are good enough to be used in production.
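One possible way to keep the three scores in a structured form, sketched under assumptions: scores are taken to run from 1 to 5 with higher being better, and `Scorecard` and `to_markdown` are hypothetical helpers, not part of langchain. The example values are the ones recorded in the OpenAI and Cohere conclusions in this document.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    name: str
    cost: int
    performance: int
    quality: int

    def __post_init__(self):
        # Assumed scale: 1 (worst) to 5 (best) for every metric.
        for field in ('cost', 'performance', 'quality'):
            value = getattr(self, field)
            if not 1 <= value <= 5:
                raise ValueError(f'{field} must be between 1 and 5, got {value}')


def to_markdown(cards):
    # Quality is the most important metric, so it is the primary sort key.
    ranked = sorted(cards, key=lambda c: (c.quality, c.cost, c.performance), reverse=True)
    lines = ['| Model | Cost | Performance | Quality |',
             '|-------|------|-------------|---------|']
    for c in ranked:
        lines.append(f'| {c.name} | {c.cost} | {c.performance} | {c.quality} |')
    return '\n'.join(lines)


cards = [Scorecard('Cohere', 4, 2, 3), Scorecard('OpenAI', 1, 2, 5)]
print(to_markdown(cards))
```

Sorting on quality first mirrors the priority stated above; cost and performance only break ties.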

## Setup

In [None]:
!pip install 'langchain[llms]' --upgrade


In [29]:
from langchain.llms import *
import json
from IPython.display import Markdown, display
import time


# open prompt.txt
with open('prompt.txt', 'r') as f:
    api3_prompt = f.read()

print(f'API3 Prompt loaded. Character count: {len(api3_prompt)}')

with open('woodworkprompt.txt', 'r') as f:
    wood_prompt = f.read()
print(f'Woodwork Prompt loaded. Character count: {len(wood_prompt)}')

# Open keys.json. If it doesn't exist, create it as a copy of example.keys.json
try:
    with open('keys.json', 'r') as f:
        keys = json.load(f)
except FileNotFoundError:
    print('Keys not found. Add your API keys to keys.json')
    with open('example.keys.json', 'r') as f:
        keys = json.load(f)
    with open('keys.json', 'w') as f:
        json.dump(keys, f)


def ask(llm, price, prompt):
    start_time = time.time()
    results = llm.generate([prompt], stop=['Q: '])
    duration = time.time() - start_time
    first_generation = results.generations[0][0].text
    # llm_output can be None or lack 'token_usage' for some providers;
    # in that case leave token usage and cost undefined.
    try:
        token_usage = results.llm_output['token_usage']
        cost = token_usage['total_tokens'] * price
        cost = round(cost, 3)
    except (TypeError, KeyError):
        token_usage = None
        cost = None

    display(Markdown(f"""### Answer:\n\n----------------\n\n
{first_generation}
\n\n----------------\n\n"""))
    print(f'Duration: {duration} seconds')

    if token_usage:
        print(f'Token usage: {token_usage}')
        print(f'Cost: ${cost} or ${cost * 1000} per 1000 Questions')

    return {'answer': first_generation, 'token_usage': token_usage, 'cost': cost, 'duration': duration}


API3 Prompt loaded. Character count: 4292
Woodwork Prompt loaded. Character count: 4160


## OpenAI



This one will be our control. OpenAI is SOTA but they have insane downtime and are the most expensive. 

#### Pricing

- $0.02 per 1,000 tokens

In [28]:
price = 0.02 / 1000  # 0.02 USD per 1000 tokens

# Fail early if the OpenAI key is missing or empty
if not keys.get('openai'):
    raise ValueError('OpenAI API key not found. Add your API key to keys.json')


openai = OpenAI(model_name="text-davinci-003",
                openai_api_key=keys['openai'], temperature=0.6)

openai_results = [ask(openai, price, api3_prompt),
                  ask(openai, price, wood_prompt)]


LLMResult(generations=[[Generation(text=' You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://github.com/api3dao/airnode/blob/master/packages/airnode-admin/src/implementation.ts) package in Node.js like this:\n\n```js\nconst airnodeXpub = deriveAirnodeXpub(airnodeMnemonic);\n```\n\nThis will derive the Airnode xpub from the Airnode mnemonic.', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'total_tokens': 1555, 'completion_tokens': 111, 'prompt_tokens': 1444}})


### Answer:

----------------


 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://github.com/api3dao/airnode/blob/master/packages/airnode-admin/src/implementation.ts) package in Node.js like this:

```js
const airnodeXpub = deriveAirnodeXpub(airnodeMnemonic);
```

This will derive the Airnode xpub from the Airnode mnemonic.


----------------



Duration: 4.660804986953735 seconds
Token usage: {'total_tokens': 1555, 'completion_tokens': 111, 'prompt_tokens': 1444}
Cost: $0.031 or $31.0 per 1000 Questions
LLMResult(generations=[[Generation(text=' Wait till the caulk dries before applying the mold release.', generation_info={'finish_reason': 'stop', 'logprobs': None})]], llm_output={'token_usage': {'total_tokens': 1088, 'completion_tokens': 13, 'prompt_tokens': 1075}})


### Answer:

----------------


 Wait till the caulk dries before applying the mold release.


----------------



Duration: 1.284977912902832 seconds
Token usage: {'total_tokens': 1088, 'completion_tokens': 13, 'prompt_tokens': 1075}
Cost: $0.022 or $22.0 per 1000 Questions


#### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 1  |     2      |    5   |

At $0.02 to $0.03 per question, this will probably be our most expensive model. Response times were 4.7 and 1.3 seconds, about 3 seconds on average. I know from experience they have a lot of downtime. The answer quality is amazing. Exactly what I am looking for. Truly SOTA.



## Cohere AI

I have tried this one. It's pretty good and cheaper than OpenAI, but not by much. Gonna sign up.

In [32]:
cohere = Cohere(cohere_api_key=keys['cohere'], temperature=0.1)
# 2.5 USD per 1000 units of 1000 characters. Price per character
price = (2.5 / 1000) / 1000


def ask_cohere(prompt):
    result = ask(cohere, price, prompt)
    total_characters = len(result['answer']) + len(prompt)
    cost = total_characters * price
    cost = round(cost, 3)
    print(f'Cost: ${cost} or ${cost * 1000} per 1000 Questions')
    result['cost'] = cost
    return result


cohere_results = [ask_cohere(api3_prompt), ask_cohere(wood_prompt)]


### Answer:

----------------


 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://docs.api3.org/airnode/v0.10/) package in Node.js like this:

```js
const xpub = await deriveAirnodeXpub(airnodeXpub, airnodeAddress);
```

This will derive the Airnode xpub from the Airnode xpub, the Airnode address and the Airnode mnemonic.




----------------



Duration: 22.349201917648315 seconds
Cost: $0.012 or $12.0 per 1000 Questions


### Answer:

----------------


 You can recoat immediately. And it won't dry out over weeks or longer.




----------------



Duration: 10.559345006942749 seconds
Cost: $0.011 or $11.0 per 1000 Questions


#### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 4  |     2      |    3   |


Much better price! Only around $0.011 to $0.012 per question! Completely fails at the coding challenge but nails the context challenge. Could be a useful cheaper alternative. Much slower though: 22.3 and 10.6 seconds, versus 4.7 and 1.3 seconds for OpenAI.
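To put a number on the speed difference, here is a small sketch that averages the durations copied from the run output in the OpenAI and Cohere sections. With only two requests per provider, this is a rough indication and not a benchmark.

```python
import statistics

# Durations in seconds, copied from the run output in this document.
durations = {
    'OpenAI': [4.660804986953735, 1.284977912902832],
    'Cohere': [22.349201917648315, 10.559345006942749],
}

means = {name: statistics.mean(values) for name, values in durations.items()}
for name, mean in means.items():
    print(f'{name}: {mean:.2f} s average over {len(durations[name])} requests')

slowdown = means['Cohere'] / means['OpenAI']
print(f'Cohere was about {slowdown:.1f}x slower on these two prompts')
```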

