# Using prompts programmatically

We'll start by sending prompts through an API, specifically with the OpenAI Python API library.
* This is what we use with `from openai import OpenAI`
  * https://github.com/openai/openai-python
* AIMLAPI is a separate developer-focused platform providing API access to models, but it is compatible with the OpenAI SDK and provides pretty docs for models' API calls
  * https://docs.aimlapi.com

At the API level, using an LLM is basically:
* Send a prompt (plus model + parameters)
* Get back a response object
* Pull out the text you care about
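The three steps above can be sketched without touching the network. This is a minimal illustration, not a real call: the model name is a placeholder, and the response is a hand-written, trimmed-down JSON body in the chat-completions shape used later in this notebook.

```python
import json

# Step 1: the request is just structured data: a model name, the prompt, and parameters.
# The model name and values here are placeholders for illustration.
request_payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32,
}

# Step 2: the server answers with a JSON document. This one is hand-written
# and trimmed down, not captured from a real call.
raw_response = json.dumps({
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}}
    ]
})

# Step 3: parse it and pull out the text you care about.
parsed = json.loads(raw_response)
reply_text = parsed["choices"][0]["message"]["content"]
print(reply_text)
```

The client library does the same work for us: it builds the request, sends it, and wraps the parsed JSON in a response object with attribute access.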

In [None]:
# import this for some better output printing
from IPython.display import display, Markdown

# API

Initialize the client object that we'll use to make API calls.
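* You should supply your own API key; the cell below reads it from a local `keys` module (a `keys.py` file that defines `OPENAI_API_KEY`)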

In [None]:
from openai import OpenAI
import keys
client = OpenAI(api_key=keys.OPENAI_API_KEY)

The primary API for interacting with OpenAI models is the Responses API.

In [None]:
response = client.responses.create(
    model="gpt-5-nano",        # pick a model
    input="Explain Python list comprehensions in 3 sentences."
)
print(response.output_text)

In [None]:
response

While `responses` is the recommended API, there is an older approach, `chat.completions`, that we will use instead.
* Rather than rack up API usage with OpenAI models, we'll use an API gateway through the platform supporting the class JupyterHub.
* This API endpoint does not support `responses` **yet**.
* This will suit us just fine, though in the near future (or if you use OpenAI for other work), you should be prepared to switch over to `responses`.

Key differences for us are:
* use `client.chat.completions.create` rather than `client.responses.create`
* different input arguments and output
  * `messages` rather than `input`
    * `messages` must strictly be an array of objects
  * `max_tokens` rather than `max_output_tokens`
  * `response.choices[0].message.content` rather than `response.output_text`
* consult the API docs for reference
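To make the argument differences listed above concrete, here is a small sketch that rewrites Responses-style keyword arguments into chat-completions-style ones. The helper `to_chat_kwargs` is hypothetical (it is not part of the OpenAI library) and only covers the two renames mentioned in the list.

```python
def to_chat_kwargs(responses_kwargs):
    """Translate Responses-style keyword arguments to chat.completions-style ones.

    Illustrative helper only; it covers just the differences listed above.
    """
    kwargs = dict(responses_kwargs)  # copy so the caller's dict is untouched

    if "input" in kwargs:
        value = kwargs.pop("input")
        # `messages` must be a list of role-tagged dicts, so wrap a bare string.
        if isinstance(value, str):
            value = [{"role": "user", "content": value}]
        kwargs["messages"] = value

    if "max_output_tokens" in kwargs:
        kwargs["max_tokens"] = kwargs.pop("max_output_tokens")

    return kwargs


chat_kwargs = to_chat_kwargs({
    "model": "example-model",  # placeholder model name
    "input": "Explain Python list comprehensions in 3 sentences.",
    "max_output_tokens": 200,
})
print(chat_kwargs)
```

The resulting dict could then be unpacked into a call such as `client.chat.completions.create(**chat_kwargs)`, with the text read from `response.choices[0].message.content`.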

In [None]:
# Equivalent approach:

response = client.chat.completions.create(
    model="gpt-5-nano",        # pick a model
    messages=[
        {'role': 'user', 
         'content': 'Explain Python list comprehensions in 3 sentences.'}
    ]
)
print(response.choices[0].message.content)

# One more example of responses vs chat completions

We can also pass a list of dicts as the value of `input` in a call to `client.responses.create`.

In [None]:
# Equivalent approach:

response = client.responses.create(
    model="gpt-5-nano",        # pick a model
    input=[
        {'role': 'user', 
         'content': 'Explain Python list comprehensions in 3 sentences.'}
    ]
)
print(response.output_text)

# Flipping over to a different API

Functionally, working with a different endpoint is very similar. We simply point the client at a different URL, run a different model, and authenticate with a different token.

This endpoint is provided courtesy of the National Research Platform.  I have loaded an API token for use directly into your environment variables.  Please be mindful and considerate with your use.
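Because `os.environ.get` quietly returns `None` when a variable is missing, a missing token tends to surface later as a confusing authentication error. One possible guard is sketched below; `require_env` and `DEMO_TOKEN` are hypothetical names used only for this illustration.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "check that your session loaded the token."
        )
    return value

# Example with a throwaway variable so the sketch runs anywhere.
os.environ["DEMO_TOKEN"] = "not-a-real-token"
print(require_env("DEMO_TOKEN"))
```

In this notebook the same check could be applied to `NRP_TOK` before creating the client.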

In [None]:
import os
NRP_TOK = os.environ.get('NRP_TOK')

In [None]:
# using different API token and different API endpoint
client = OpenAI(api_key=NRP_TOK,
                base_url="https://ellm.nrp-nautilus.io/v1")

In [None]:
nrp_hosted_model = 'gpt-oss'

# Pieces

* `model` – which model to use.
* `messages` – a list of role-tagged messages.
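* Optional knobs: `max_tokens`, `temperature`, `top_p`
  * `top_k` is not a parameter of the OpenAI chat completions API; some OpenAI-compatible servers accept it as an extra field (for example through the SDK's `extra_body` argument), so check your endpoint's docs before relying on it
  * note that if we were using `client.responses.create`, we would switch `messages` -> `input` and `max_tokens` -> `max_output_tokens`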
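For intuition about what `temperature` and `top_p` do, here is a toy sketch of the textbook versions of both: temperature rescales the logits before the softmax, and `top_p` (nucleus sampling) keeps only the most likely tokens up to a cumulative probability. The function `next_token_probs` and the logit values are made up for illustration; real servers implement sampling in their own way.

```python
import math

def next_token_probs(logits, temperature=1.0, top_p=1.0):
    """Toy temperature + nucleus (top_p) filtering over a dict of token -> logit."""
    # Temperature: divide logits before the softmax. Lower values sharpen the
    # distribution. (This toy version requires temperature > 0.)
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    biggest = max(scaled.values())
    exps = {tok: math.exp(v - biggest) for tok, v in scaled.items()}  # stable softmax
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # top_p: keep the most likely tokens until their cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize what is left so it sums to 1.
    kept_total = sum(kept.values())
    return {tok: p / kept_total for tok, p in kept.items()}


toy_logits = {"latte": 2.0, "espresso": 1.0, "mocha": 0.5, "decaf": -1.0}  # made-up numbers
for temp in (0.1, 1.0, 1.9):
    print(temp, next_token_probs(toy_logits, temperature=temp))
print("top_p=0.1:", next_token_probs(toy_logits, top_p=0.1))
```

At a low temperature almost all of the probability lands on the top token, at a high temperature the distribution flattens, and a small `top_p` discards everything but the most likely candidates.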

In [None]:
response = client.chat.completions.create(
    model=nrp_hosted_model,
    messages=[
        {
            "role": "system",
            "content": "You are a friendly Python tutor. Explain things simply."
        },
        {
            "role": "user",
            "content": "What is a generator in Python? One short example, please."
        },
    ],
    max_tokens=256,          # cap length
    temperature=0.3,         # more deterministic
    top_p=1.0,               # consider full distribution
)

text = response.choices[0].message.content
display(Markdown(text))

In [None]:
text

In [None]:
def givemecoffee(temp=1.0):
    """Ask for coffee startup ideas at the given sampling temperature and render the reply."""
    response = client.chat.completions.create(
        model=nrp_hosted_model,
        messages=[
            {
                "role": "user",
                "content": "Give me 5 weird startup ideas about coffee."
            },
        ],
        max_tokens=256,
        temperature=temp,
    )
    text = response.choices[0].message.content
    display(Markdown(text))

In [None]:
givemecoffee(1.0)

In [None]:
givemecoffee(1.0)

In [None]:
givemecoffee(1.9)

In [None]:
givemecoffee(0.1)

In [None]:
givemecoffee(0.1)

In [None]:
def givemeCOLDcoffee(top_p=1.0):
    """Same prompt at a fixed low temperature (0.1), varying only top_p."""
    response = client.chat.completions.create(
        model=nrp_hosted_model,
        messages=[
            {
                "role": "user",
                "content": "Give me 5 weird startup ideas about coffee."
            },
        ],
        max_tokens=256,
        temperature=0.1,
        top_p=top_p
    )
    text = response.choices[0].message.content
    display(Markdown(text))

In [None]:
givemeCOLDcoffee(1.0)

In [None]:
givemeCOLDcoffee(1.0)

In [None]:
givemeCOLDcoffee(0.1)

In [None]:
givemeCOLDcoffee(0.1)