To run any prompt through a model, we first need a way to access generative AI models and perform inference. The landscape of generative AI models varies widely in size, access patterns, licensing, and more. A common pattern, however, is to use LLMs through a REST API, which is either:
- Provided by a third-party service (OpenAI, Anthropic, Cohere, etc.)
- Self-hosted in your own infrastructure or in an account you control with a model hosting provider (Replicate, Baseten, etc.)
- Self-hosted using a DIY model serving API (Flask, FastAPI, etc.)
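
Whichever of these options you choose, the interaction usually amounts to sending a JSON payload over HTTP with an authorization header. The following is a minimal sketch of that pattern using only the Python standard library. The endpoint URL, token, model name, and payload fields are hypothetical placeholders, not the API of any specific provider; check your provider's documentation for the real values. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical endpoint and token, for illustration only; substitute the
# values documented by whichever provider or self-hosted server you use.
API_URL = "https://llm.example.com/v1/completions"
API_TOKEN = "replace-with-your-token"


def build_completion_request(model, prompt, max_tokens=50):
    """Build (but do not send) a JSON POST request for a text completion."""
    # The payload field names are assumptions modeled on common LLM APIs.
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


request = build_completion_request("example-model", "The best joke I know is: ")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it. Client libraries such as the one used in the rest of this notebook wrap this kind of call for you.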

We will use a tool called [Prediction Guard](https://www.predictionguard.com/) to call both proprietary models (like those from OpenAI) and open access LLMs (like Llama 2, WizardCoder, MPT, etc.) via a standardized OpenAI-like API. This will allow us to explore the full range of LLMs available. It will also illustrate how companies can access a wide range of models beyond the GPT family.

If you are interested, Prediction Guard also provides significant functionality on top of this standardized API (see the [docs](https://docs.predictionguard.com/)). Specifically, it lets you:

- **Control** the structure of and easily constrain LLM output to the types, formats, and information relevant to your business;
- **Validate** and check LLM output to guard against hallucination and toxicity; and
- **Implement compliant LLM systems** (HIPAA-compliant and self-hosted) that give your legal counsel a warm, fuzzy feeling while still delighting your customers with AI features.

To run your first LLM prompt with *Prediction Guard*, you will need a Prediction Guard access token that will be provided to you by the instructor.

# Install dependencies, imports

In [None]:
! pip install predictionguard

In [None]:
import os
import json

import predictionguard as pg
from getpass import getpass

In [None]:
pg_access_token = getpass('Enter your Prediction Guard access token: ')
os.environ['PREDICTIONGUARD_TOKEN'] = pg_access_token
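
If you re-run the notebook often, you may prefer not to be prompted when the token is already in the environment. The sketch below shows one way to do that. The helper name `ensure_token` and its `prompt_fn` parameter are hypothetical conveniences introduced here, not part of the `predictionguard` package. The demonstration uses a stand-in environment variable and a stand-in prompt function so that it runs without user input.

```python
import os
from getpass import getpass


def ensure_token(env_var="PREDICTIONGUARD_TOKEN", prompt_fn=getpass):
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if not token:
        # Ask interactively (input is hidden when prompt_fn is getpass)
        # and store the value for the rest of the session.
        token = prompt_fn(f"Enter a value for {env_var}: ")
        os.environ[env_var] = token
    return token


# Demonstration with a stand-in variable and a fake prompt function.
os.environ.pop("DEMO_TOKEN", None)
first = ensure_token("DEMO_TOKEN", prompt_fn=lambda _: "demo-value")
second = ensure_token("DEMO_TOKEN", prompt_fn=lambda _: "ignored")
print(first == second)
```

In the notebook itself you would call `ensure_token()` with no arguments, which falls back to `getpass` and the `PREDICTIONGUARD_TOKEN` variable set above.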

# List available models

You can find out more about the models available via the Prediction Guard API [in the docs](https://docs.predictionguard.com/models).

In [None]:
pg.Completion.list_models()

# Generate some text from the latest open access LLMs

In [None]:
# Request a completion from an open access model through Prediction Guard
response = pg.Completion.create(
    model="Nous-Hermes-Llama2-13B",
    prompt="The best joke I know is: ")

print(json.dumps(
    response,
    sort_keys=True,
    indent=4,
    separators=(',', ': ')
))

# Generate text from a proprietary LLM (OpenAI)

In [None]:
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ['OPENAI_API_KEY'] = openai_api_key

In [None]:
# Same call as before; only the model name changes
response = pg.Completion.create(
    model="OpenAI-text-davinci-003",
    prompt="The best joke I know is: ")

print(response['choices'][0]['text'])
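
Indexing directly into `response['choices'][0]['text']` raises an unhelpful `KeyError` or `IndexError` if the response comes back without any choices. The sketch below wraps that lookup in a small helper. The function name `first_completion_text` is hypothetical, and the sample dictionary is a hand-written stand-in that only mirrors the `choices`/`text` shape used above; it is not real model output.

```python
def first_completion_text(response):
    """Return the text of the first choice in an OpenAI-style completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Strip the leading/trailing whitespace that completions often include.
    return choices[0].get("text", "").strip()


# Hand-written stand-in for a response; real responses carry more fields.
sample_response = {"choices": [{"text": "  Why did the chicken cross the road?  "}]}
print(first_completion_text(sample_response))
```

With a live response from the cells above, `first_completion_text(response)` would take the place of the direct indexing.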