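## Get token usage (Stream=True)
In this notebook, you'll analyse token usage for Chat Completions with streaming enabled. Streamed responses in the API version used here do not carry a `usage` field, so the counts are estimated with tiktoken.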

### Setup
Load the API key and the relevant Python libraries.

In [1]:
# Import required packages
import openai
import os

# Define Azure OpenAI endpoint parameters
openai.api_type = "azure"
openai.api_version = "2023-07-01-preview"
openai.api_base = os.getenv("OPENAI_API_BASE") # Set AOAI endpoint value as env variable
openai.api_key = os.getenv("OPENAI_API_KEY") # Set AOAI API key as env variable
aoai_deployment = os.getenv("OPENAI_API_DEPLOY") # Set AOAI deployment name as env variable

### Helper function: Completions
This helper function sends a single-prompt chat request. Because `stream=True` is set, it returns an iterator of response chunks rather than one complete response.

In [2]:
# Helper function
def get_completion(prompt, model="gpt-3.5-turbo", engine=aoai_deployment):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        # model=model, # used by original OpenAI endpoint
        engine=engine, # used by Azure OpenAI endpoint
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
        stream=True, # this parameter can enable streaming
    )
    return response

### Helper function: Tokens Count
This helper function counts tokens using OpenAI's tiktoken library.

It's based on OpenAI's original sample code from this GitHub repo: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb.

In [3]:
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """
    Return the number of tokens used by a list of messages.
    """
    
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")

    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
        }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"""num_tokens_from_messages() is not implemented for model {model}. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens."""
        )
    
    num_tokens = 0

    if isinstance(messages, list):
        for message in messages:
            num_tokens += tokens_per_message
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":
                    num_tokens += tokens_per_name
        num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    elif isinstance(messages, str):
        num_tokens += len(encoding.encode(messages))
    return num_tokens
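
To make the accounting in `num_tokens_from_messages` concrete, here is a minimal, dependency-free sketch of the same overhead arithmetic for the `-0613` model family. The names `toy_encode` and `count_with_overhead` are hypothetical, and the whitespace-splitting encoder is only a stand-in for `encoding.encode`; a real count needs the model's actual tiktoken encoding.

```python
def toy_encode(text):
    """Hypothetical stand-in for encoding.encode(): one token per whitespace-separated word."""
    return text.split()


def count_with_overhead(messages, encode=toy_encode,
                        tokens_per_message=3, tokens_per_name=1, reply_priming=3):
    """Mirror the per-message accounting used above for the -0613 models."""
    total = 0
    for message in messages:
        total += tokens_per_message        # fixed framing cost per message
        for key, value in message.items():
            total += len(encode(value))    # tokens for role, content, name, ...
            if key == "name":
                total += tokens_per_name   # extra token when a name is present
    return total + reply_priming           # the reply is primed with 3 tokens


messages = [{"role": "user", "content": "Tell me about Solar system"}]
print(count_with_overhead(messages))  # 3 + 1 (role) + 5 (content) + 3 = 12
```

Word counts and real token counts generally differ, so treat the toy encoder purely as a way to see where the fixed overhead of 3 tokens per message and 3 tokens for reply priming enters the total.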

### Chat Completions
Stream the output of the ChatCompletion API in chunks, printing each piece as it arrives and collecting the pieces for token counting.

In [4]:
prompt = "Tell me about Solar system"

response = get_completion(prompt)
result = []

for chunk in response:
    try:
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="")
            result.append(content)
    except (IndexError, KeyError):
        # Skip chunks without a usable choice (e.g. an empty "choices" list)
        pass

The solar system is a collection of celestial bodies that orbit around the Sun. It consists of the Sun, eight planets, their moons, asteroids, comets, and other smaller objects. The Sun is at the center of the solar system and provides heat, light, and energy to all the other bodies.

The eight planets in the solar system, in order of their distance from the Sun, are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. These planets vary in size, composition, and atmospheric conditions. Mercury is the smallest planet, while Jupiter is the largest. Earth is the only known planet to support life.

Each planet has its own unique characteristics. For example, Venus has a thick atmosphere and a runaway greenhouse effect, making it the hottest planet in the solar system. Mars is known as the "Red Planet" due to its reddish appearance caused by iron oxide on its surface. Jupiter is a gas giant and has a prominent system of rings and numerous moons, including the largest moon in 
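
The chunk-handling loop in this cell can also be factored into a small function that is testable without calling the API. The sketch below feeds it simulated chunks as plain dictionaries shaped like the streamed `choices[0].delta` payloads; the name `collect_stream_content` and the sample chunks are illustrative assumptions, not captured service output.

```python
def collect_stream_content(chunks):
    """Join the text fragments carried by streamed chat completion chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices at all
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Simulated chunks (assumed shapes): empty choices, role-only delta,
# two text deltas, and a final empty delta with a finish reason.
simulated_chunks = [
    {"choices": []},
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "The solar "}}]},
    {"choices": [{"delta": {"content": "system"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

print(collect_stream_content(simulated_chunks))  # The solar system
```

Checking for an empty `choices` list explicitly avoids relying on exception handling to skip chunks that carry no text.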

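### Token usage details

Count the prompt tokens from the message list and the completion tokens from the joined streamed text.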

In [5]:
tiktoken_prompt = [{"role": "user", "content": prompt}]
tiktoken_prompt_tokens = num_tokens_from_messages(tiktoken_prompt)
tiktoken_completion_tokens = num_tokens_from_messages("".join(result))

print("Tiktoken-calculated token usage:")
print(f"- Prompt tokens: {tiktoken_prompt_tokens}")
print(f"- Completion tokens: {tiktoken_completion_tokens}")
print(f"- Total tokens: {tiktoken_prompt_tokens + tiktoken_completion_tokens}")

Tiktoken-calculated token usage:
- Prompt tokens: 12
- Completion tokens: 366
- Total tokens: 378
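
In non-streaming mode, the Chat Completions response includes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. As a sketch, the tiktoken estimates can be packaged in the same shape so that downstream code can treat both cases alike. The helper name `build_usage` is hypothetical, and the numbers are the ones printed above.

```python
def build_usage(prompt_tokens, completion_tokens):
    """Package estimated counts in the same shape as the API's `usage` object."""
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


usage = build_usage(12, 366)
print(usage["total_tokens"])  # 378
```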
