# LLM Comparisons

> Can't rely solely on OpenAI and Azure for WiseGuyAI.com. Need to compare the other LLMs based on cost, performance, and quality.

## LLMs

1. OpenAI
2. Cohere
3. A121
4. Huggingface Hub
5. Azure OpenAI
6. Manifest
7. Goose Al
8. Cerebrium
9. Petals
10. Forefront AI
11. PromptLayer OpenAI
12. Anthropic
13. Self-Hosted Models (via Runhouse)

All LLMs will need to be tested to with the same prompt to see how they compare. We need to keep track of the duration of each request. We also need to try each request at least 3 times to be sure that it gets it correct. 

We can figure out the prices before we even run most of these models. 

Gonna use langchain to hopefully make this a little bit easier.

Need to sign up for all these services -_-

Will track the progress of each by hand in this document + notebooks whenever possible. Honestly maybe this should be a notebook just keep better track.

### Define

- **Cost**: How much it will cost to run a lot of questions through this model.
- **Performance**: How fast it will take to get a response from the model. Latency isn't that important but a slow response is a bad sign that outages could happen.
- **Quality**: How good the responses are. This is the most important metric. We need to make sure that the responses are good enough to be used in production.

## Setup

In [None]:
!pip install 'langchain[all]'

In [57]:
from langchain.llms import *
import json
from IPython.display import Markdown
import time


# open prompt.txt
with open('prompt.txt', 'r') as f:
    prompt = f.read()

print(f'Prompt loaded. Character count: {len(prompt)}')

# Open keys.json. If it doesnt exist, copy it from example.keys.json
try:
    with open('keys.json', 'r') as f:
        keys = json.load(f)
except FileNotFoundError:
    print('Keys not found. Add your API keys to keys.json')
    with open('example.keys.json', 'r') as f:
        keys = json.load(f)
    with open('keys.json', 'w') as f:
        json.dump(keys, f)


def ask(llm, price):
    start_time = time.time()
    results = llm.generate([prompt], stop=['Q: '])
    duration = time.time() - start_time

    first_generation = results.generations[0][0].text
    token_usage = results.llm_output['token_usage']
    cost = token_usage['total_tokens'] * price
    cost = round(cost, 3)

    display(Markdown(f"""### Answer:\n\n
----- 

{first_generation}
\n\n------

#### Token usage: {token_usage}

#### Cost: ${cost} or ${cost * 1000} per 1000 Questions

#### Duration: {duration} seconds"""))

    return {'answer': first_generation, 'token_usage': token_usage, 'cost': cost, 'duration': duration}


Prompt loaded. Character count: 4292


## OpenAI



This one will be our control. OpenAI is SOTA but they have insane downtime and are the most expensive. 

#### Pricing

- $0.02 per 1,000 tokens

In [58]:
price = 0.02 / 1000  # 0.02 USD per 1000 tokens

# If keys['openai'] falsy, throw error
if not keys['openai']:
    raise ValueError('OpenAI API key not found. Add your API key to keys.json')


openai = OpenAI(model_name="text-davinci-003", openai_api_key=keys['openai'])

openai_results = ask(openai, price)


### Answer:


----- 

 You can use the `deriveAirnodeXpub` function in the [@airnode/airnode-admin](https://github.com/api3dao/airnode/blob/master/packages/airnode-admin/src/implementation.ts#L135) package like this:

```js
const airnodeXpub = deriveAirnodeXpub(airnodeMnemonic);
```


------

#### Token usage: {'total_tokens': 1536, 'completion_tokens': 92, 'prompt_tokens': 1444}

#### Cost: $0.031 or $31.0 per 1000 Questions

#### Duration: 4.555056095123291 seconds

#### Conclusion

| Cost | Performance | Quality |
|------|-------------|---------|
| 1  |     2      |    5   |

At $0.03 per question, this will probably be our most expensive model. Average response time was around 5 seconds. I know from experience they have a lot of downtime. The answer quality is amazing. Exactly what I am looking for. Truly SOTA.

