# The 2nd Way to call Frontier Models - via their APIs

In this notebook, we'll call Frontier Models from 3 providers (OpenAI, Anthropic and Google) using their APIs.

In the last experiment, we tried out a prompt that was ideally suited to LLMs. This time we will try something they're less good at - telling jokes. Let's see how they get on.

## Setting up your keys

If you haven't done so already, you'll need to create API keys from OpenAI, Anthropic and Google.

- For OpenAI, visit https://openai.com/api/
- For Anthropic, visit https://console.anthropic.com/
- For Google, visit https://ai.google.dev/gemini-api

When you get your API keys, you need to set them as environment variables.

EITHER (recommended) create a file called .env in the project root directory, and set your keys there:
```
OPENAI_API_KEY=xxxx
ANTHROPIC_API_KEY=xxxx
GOOGLE_API_KEY=xxxx
```

OR enter the keys directly in the cells below.

In [None]:
# imports

import os
from dotenv import load_dotenv
from openai import OpenAI
import google.generativeai
import anthropic

In [None]:
# Load environment variables from a file called .env

load_dotenv()
os.environ['OPENAI_API_KEY'] = os.getenv('OPENAI_API_KEY', 'your-key-if-not-using-env')
os.environ['ANTHROPIC_API_KEY'] = os.getenv('ANTHROPIC_API_KEY', 'your-key-if-not-using-env')
os.environ['GOOGLE_API_KEY'] = os.getenv('GOOGLE_API_KEY', 'your-key-if-not-using-env')
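
Once the cell above has run, it can be useful to confirm that real keys were picked up rather than the placeholder. Here is a minimal sketch, assuming the three variable names and the placeholder string used above; `missing_keys` is a hypothetical helper written for this notebook, not part of any SDK.

```python
import os

# Variable names and placeholder taken from the cells above
REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"]
PLACEHOLDER = "your-key-if-not-using-env"

def missing_keys(env=None, required=REQUIRED_KEYS, placeholder=PLACEHOLDER):
    """Return the names of keys that are unset, blank, or still the placeholder."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value == placeholder:
            missing.append(name)
    return missing

print("Missing keys:", missing_keys() or "none")
```

An empty list only means that something is set for each provider; it does not prove that the provider will accept the key.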

In [None]:
# Connect to OpenAI, Anthropic and Google
# All 3 APIs are similar

gpt = OpenAI()

claude = anthropic.Anthropic()

google.generativeai.configure()
gemini = google.generativeai.GenerativeModel('gemini-pro')

## Asking LLMs to tell a joke

It turns out that LLMs don't do a great job of telling jokes! Let's compare a few models - let me know what you think in the chat.
Later we will be putting LLMs to better use!

### What information is included in an API call

Typically we'll pass to the LLM:
- The name of the model that should be used
- A system message that gives overall context for the role the LLM is playing
- A user message that provides the actual prompt
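
The three items above can be pictured as a plain Python dictionary. This is a rough sketch, assuming the OpenAI-style chat format used in the cells below; `build_request` is a hypothetical helper for illustration only.

```python
def build_request(model, system_prompt, user_prompt):
    """Assemble keyword arguments in the OpenAI chat-completions style."""
    return {
        "model": model,
        "messages": [
            # Overall context for the role the LLM is playing
            {"role": "system", "content": system_prompt},
            # The actual prompt
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_request(
    "gpt-4o",
    "You are an assistant that is great at telling jokes",
    "Tell a light joke for a room full of data scientists",
)
print([message["role"] for message in request["messages"]])
```

With the OpenAI client, a dictionary like this could be unpacked as `gpt.chat.completions.create(**request)`. The Anthropic call further down takes the system message as a separate `system` argument, so the same dictionary would not fit there unchanged.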

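There are other parameters that can be used, including *temperature*: higher values give more random output, lower values give more focused and deterministic output. The allowed range depends on the provider: OpenAI accepts values between 0 and 2, while Anthropic accepts values between 0 and 1.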
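
To build intuition for what temperature does, here is a sketch of the textbook softmax-with-temperature calculation. The scores are made up, and this is an illustration of the general idea rather than a description of how any particular provider implements sampling.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability goes to the top-scoring token; at high temperature the distribution flattens, so less likely tokens are sampled more often.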


In [None]:
# GPT-3.5-Turbo

completion = gpt.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[
        {"role": "system", "content": "You are an assistant that is great at telling jokes"},
        {"role": "user", "content": "Tell a light joke for a room full of data scientists"}
    ],
    temperature=1.0,
)
print(completion.choices[0].message.content)

In [None]:
# GPT-4-Turbo

completion = gpt.chat.completions.create(
    model='gpt-4-turbo',
    messages=[
        {"role": "system", "content": "You are an assistant that is great at telling jokes"},
        {"role": "user", "content": "Tell a light joke for a room full of data scientists"}
    ],
    temperature=1.0
)
print(completion.choices[0].message.content)

In [None]:
# GPT-4o

completion = gpt.chat.completions.create(
    model='gpt-4o',
    messages=[
        {"role": "system", "content": "You are an assistant that is great at telling jokes"},
        {"role": "user", "content": "Tell a light joke for a room full of data scientists"}
    ],
    temperature=1.0
)
print(completion.choices[0].message.content)

In [None]:
# Claude 3.5 Sonnet

message = claude.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=200,
    temperature=1.0,
    system="You are an assistant that is great at telling jokes",
    messages=[
        {"role": "user", "content": "Tell a light joke for a room full of data scientists"},
    ],
)

print(message.content[0].text)

In [None]:
# Gemini (this call sends only the user prompt; no system message is set)

response = gemini.generate_content("Tell a light joke for a room full of data scientists")
print(response.text)

In [None]:
# To be serious! GPT-4o with the original question

response = gpt.chat.completions.create(
    model='gpt-4o',
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "How do I decide if a business problem is suitable for an LLM solution"}
    ],
    temperature=1.0,
    stream=True
)

for chunk in response:
    delta = chunk.choices[0].delta.content or ''
    print(delta, end='')
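
The loop above prints each piece as it arrives. If you also want the full reply as one string, the pieces can be collected along the way. Below is a sketch that needs no API call: `fake_stream` is a hypothetical stand-in that yields objects shaped like the chunks read above, and `collect_stream` is an illustrative helper.

```python
from types import SimpleNamespace

def fake_stream(pieces):
    """Yield objects shaped like the streamed chunks (stand-in, no API call)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_stream(stream):
    """Join the text of every chunk, treating a missing content as empty."""
    parts = []
    for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)

reply = collect_stream(fake_stream(["Start ", "small", None]))
print(reply)  # prints "Start small"
```

The `or ""` mirrors the loop above: a chunk's `content` can be `None`, and joining would fail on a `None` entry.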

## Recap

- First, we tried 6 Frontier LLMs through their chat interfaces
- Then, in this notebook, we called Cloud APIs
- Now try the 3rd way to use LLMs - direct inference - starting with llama.cpp