## **Part 4: [Exercise]** Trying Out The AI Foundation Endpoints

To start, let's use the `requests` boilerplate provided in an LLM's API entry. `Llama-2-13B` or `Mixtral-8x7B` is a reasonable candidate, but feel free to pick your own from the pool of active options.

To query the model, paste the Python code from the API entry into the cell below. The streaming version is much easier to use in Python because the `requests` library can iterate over a response line by line as it arrives, so we have provided some hints to get you started. *Feel free to try the non-streaming way if you have the time and interest.*

**NOTE:** If you would like, you can bypass this exercise by simply clicking the `Execute` button under the code block. Doing this will show what happens when you try to invoke the provided example.
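
For the non-streaming route mentioned above, a minimal sketch might look like the following. It is not the API entry's own boilerplate: the helper names (`build_request`, `extract_message`, `query_once`) are hypothetical, and both the `application/json` accept header and the `['choices'][0]['message']['content']` response shape are assumptions modeled on the streaming format, so check them against the API entry of the model you chose.

```python
import os


def build_request(prompt: str, api_key: str):
    """Build headers and payload for a single, non-streamed completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "accept": "application/json",  # assumption: plain JSON instead of an event stream
        "content-type": "application/json",
    }
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
        "stream": False,
    }
    return headers, payload


def extract_message(body: dict) -> str:
    """Pull the reply text out of a response body.

    Assumes the body carries ['choices'][0]['message']['content'];
    returns "" when that path is missing.
    """
    choices = body.get("choices") or [{}]
    return (choices[0].get("message") or {}).get("content") or ""


def query_once(invoke_url: str, prompt: str) -> str:
    """Send one request and return the whole reply as a string."""
    import requests  # third-party; imported here so the helpers above work without it

    headers, payload = build_request(prompt, os.environ.get("NVIDIA_API_KEY", ""))
    response = requests.post(invoke_url, headers=headers, json=payload)
    response.raise_for_status()
    return extract_message(response.json())
```

Calling `query_once(invoke_url, "I am going to Paris, what should I see?")` would then block until the full answer is ready, which is the main trade-off against streaming.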

In [None]:
import requests
import json
import os  ## needed for os.environ in the headers below

####################################################################################
## HELPERS

## HINT 1: The following streaming header tosses in your API key from the environment:

headers = {
    "Authorization": f"Bearer {os.environ.get('NVIDIA_API_KEY')}",
    "accept": "text/event-stream",
    "content-type": "application/json",
}

## HINT 2: If you're streaming, you can use print(line.decode("utf-8")) for raw responses
##  For more user-friendly responses, you may want to use get_stream_token(line):
def get_stream_token(entry: bytes):
    """Utility: Coerces out ['choices'][0]['delta']['content'] from the bytestream"""
    if not entry: return ""
    entry = entry.decode('utf-8')
    ## Lines without the 'data: ' prefix carry no token
    if not entry.startswith('data: '): return ""
    try: entry = json.loads(entry[6:])
    except ValueError: return ""  ## Non-JSON data line; nothing to print
    if not isinstance(entry, dict): return ""
    ## Fall back to "" so a missing 'content' never prints as None
    return (entry.get('choices') or [{}])[0].get('delta', {}).get('content') or ""

####################################################################################
## TODO: Save the invocation URL for the endpoint here

# invoke_url = ...

invoke_url = "https://api.nvcf.nvidia.com/v2/nvcf/pexec/functions/e0bb7fb9-5333-4a27-8534-c6288f921d3f"

## TODO: Construct the payload, which will be sent over to the endpoint
payload = {
  "messages": [
    {
      "content": "I am going to Paris, what should I see?",
      "role": "user"
    }
  ],
  "temperature": 0.2,
  "top_p": 0.7,
  "max_tokens": 1024,
  "seed": 42,
  "stream": True
}

## Use requests.post to send the headers (streaming meta-info) and the payload to the endpoint.
## Make sure streaming is enabled, and expect the response to support iter_lines.
response = requests.post(invoke_url, headers=headers, json=payload, stream=True)

## If your response is an error message, this will raise an exception in Python
response.raise_for_status()

## If the post request is honored, you should be able to iterate over the streamed lines
for line in response.iter_lines():
    print(get_stream_token(line), end="")
    # if line: print(line.decode("utf-8"))
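
If you would rather keep the answer than only print it, the same line-by-line parsing can be folded into one string. The sketch below is an illustration under stated assumptions: `token_from_line` and `collect_stream` are hypothetical names, the sample lines are hand-written stand-ins for what `response.iter_lines()` yields, and the closing `data: [DONE]` marker is an assumption about how the stream ends.

```python
import json
from typing import Iterable


def token_from_line(line: bytes) -> str:
    """Return the text fragment carried by one streamed line, or "" if there is none."""
    text = line.decode("utf-8").strip()
    if not text.startswith("data:"):
        return ""
    body = text[len("data:"):].strip()
    try:
        event = json.loads(body)
    except ValueError:
        return ""  # non-JSON data line, such as an end-of-stream marker
    if not isinstance(event, dict):
        return ""
    choices = event.get("choices") or [{}]
    return (choices[0].get("delta") or {}).get("content") or ""


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join every token in the stream into the full response text."""
    return "".join(token_from_line(line) for line in lines)


# Hand-written sample lines (assumed shape), including blank separators.
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant", "content": "Bon"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "jour"}}]}',
    b"",
    b"data: [DONE]",
]

full_text = collect_stream(sample_lines)
print(full_text)  # Bonjour
```

With a live request, `collect_stream(response.iter_lines())` would take the place of the sample list.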