<a href="https://colab.research.google.com/github/microsoft/autogen/blob/main/notebook/oai_client_cost.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Copyright (c) Microsoft Corporation. All rights reserved. 

Licensed under the MIT License.

# Use AutoGen's OpenAIWrapper for cost estimation
The `OpenAIWrapper` from `autogen` tracks token counts and costs of your API calls. Use the `create()` method to initiate requests and `print_usage_summary()` to retrieve a detailed usage report, including total cost and token usage for both cached and actual requests.

- `mode=["actual", "total"]` (default): print the usage summary for non-cached completions and for all completions (including cached ones).
- `mode='actual'`: print only non-cached usage.
- `mode='total'`: print only the usage of all completions (including cached ones).

Reset your session's usage data with `clear_usage_summary()` when needed.

## Requirements

AutoGen requires `Python>=3.8`:
```bash
pip install "pyautogen"
```

## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.


In [1]:
import autogen

# config_list = autogen.config_list_from_json(
#     "OAI_CONFIG_LIST",
#     filter_dict={
#         "model": ["gpt-3.5-turbo", "gpt-4-1106-preview"],
#     },
# )

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-3.5-turbo", "gpt-35-turbo"],
    },
)

The function first looks for the environment variable `OAI_CONFIG_LIST`, which must contain a valid JSON string. If that variable is not set, it looks for a JSON file named `OAI_CONFIG_LIST`. It then filters the configs by model (you can filter by other keys as well).
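The lookup order and filtering described above can be sketched in plain Python. This is a simplified illustration, not AutoGen's implementation: `load_config_list`, its parameters, and the `DEMO_CONFIG_LIST` variable are hypothetical names used only here, and the real function supports more options.

```python
import json
import os


def load_config_list(env_or_file, filter_dict=None, file_location=""):
    """Hypothetical helper: read configs from an env var, else from a JSON file."""
    raw = os.environ.get(env_or_file)
    if raw is not None:
        # The environment variable must hold a valid JSON string.
        configs = json.loads(raw)
    else:
        # Fall back to a JSON file with the same name.
        with open(os.path.join(file_location, env_or_file)) as f:
            configs = json.load(f)
    if not filter_dict:
        return configs
    # Keep a config only if, for every filter key, its value is in the allowed list.
    return [
        cfg
        for cfg in configs
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]


# Demo with placeholder keys; the variable name is specific to this sketch.
os.environ["DEMO_CONFIG_LIST"] = json.dumps(
    [
        {"model": "gpt-4", "api_key": "<key-1>"},
        {"model": "gpt-3.5-turbo", "api_key": "<key-2>"},
    ]
)
from_env = load_config_list(
    "DEMO_CONFIG_LIST", {"model": ["gpt-3.5-turbo", "gpt-35-turbo"]}
)
print([cfg["model"] for cfg in from_env])
```

Listing both `gpt-3.5-turbo` and `gpt-35-turbo` in the filter lets the same code match OpenAI-style and Azure-style model names.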

The config list looks like the following:
```python
config_list = [
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key>",
    },  # OpenAI API endpoint for gpt-4
    {
        "model": "gpt-35-turbo-0613",  # 0613 or newer is needed to use functions
        "base_url": "<your Azure OpenAI API base>", 
        "api_type": "azure", 
        "api_version": "2023-08-01-preview", # 2023-07-01-preview or newer is needed to use functions
        "api_key": "<your Azure OpenAI API key>"
    }
]
```

You can set the value of `config_list` in any way you prefer. See this [notebook](https://github.com/microsoft/autogen/blob/main/notebook/oai_openai_utils.ipynb) for full code examples of the different methods.

## OpenAIWrapper with cost estimation

In [2]:
from autogen import OpenAIWrapper

client = OpenAIWrapper(config_list=config_list)
messages = [{'role': 'user', 'content': 'Can you give me 3 useful tips on learning Python? Keep it simple and short.'},]
response = client.create(messages=messages, model="gpt-35-turbo-1106", cache_seed=None)
print(response.cost)

In update_usage_summary
0.0001555


## Usage Summary

Each `OpenAIWrapper` instance records the cost of all completions made through it. Call `print_usage_summary()` to check the usage summary, and `clear_usage_summary()` to reset it.


In [14]:
from autogen import OpenAIWrapper

client = OpenAIWrapper(config_list=config_list)
messages = [{'role': 'user', 'content': 'Can you give me 3 useful tips on learning Python? Keep it simple and short.'},]
client.print_usage_summary() # print usage summary

No usage summary. Please call "create" first.


In [15]:
# The first creation
# By default, caching is enabled with cache_seed=41. To disable the cache, set cache_seed to None.
response = client.create(messages=messages, model="gpt-35-turbo-1106", cache_seed=41)
client.print_usage_summary() # default to ["actual", "total"]
client.print_usage_summary(mode='actual') # print actual usage summary
client.print_usage_summary(mode='total') # print total usage summary

In update_usage_summary
----------------------------------------------------------------------------------------------------
Usage summary excluding cached usage: 
Total cost: 0.00026
* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135

All completions are non-cached: the total cost with cached completions is the same as actual cost.
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Usage summary excluding cached usage: 
Total cost: 0.00026
* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Usage summary including cached usage: 
Total cost: 0.

In [16]:
# Access the usage summaries directly as dictionaries
print(client.actual_usage_summary)
print(client.total_usage_summary)

{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}
{'total_cost': 0.0002575, 'gpt-35-turbo': {'cost': 0.0002575, 'prompt_tokens': 25, 'completion_tokens': 110, 'total_tokens': 135}}
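As a worked check, the reported cost can be recomputed from the token counts in the summary. The per-1,000-token prices below are an assumption, chosen because they reproduce the reported figure; they are not taken from this notebook, and real pricing varies by model and over time. `estimate_cost` and `PRICE_PER_1K` are hypothetical names for this sketch.

```python
# Assumed prices in USD per 1,000 tokens (prompt, completion); not from the notebook.
PRICE_PER_1K = {"gpt-35-turbo": (0.0015, 0.002)}

# Summary dict copied from the output above.
summary = {
    "total_cost": 0.0002575,
    "gpt-35-turbo": {
        "cost": 0.0002575,
        "prompt_tokens": 25,
        "completion_tokens": 110,
        "total_tokens": 135,
    },
}


def estimate_cost(model, prompt_tokens, completion_tokens):
    """Prompt and completion tokens are priced separately, per 1,000 tokens."""
    prompt_price, completion_price = PRICE_PER_1K[model]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1000


# Every key except "total_cost" is a model name mapped to its usage record.
per_model = {name: usage for name, usage in summary.items() if name != "total_cost"}
for name, usage in per_model.items():
    estimated = estimate_cost(name, usage["prompt_tokens"], usage["completion_tokens"])
    print(f"{name}: estimated {estimated:.7f}, reported {usage['cost']:.7f}")
```

Under these assumed prices, 25 prompt tokens contribute 0.0000375 and 110 completion tokens contribute 0.00022, which sums to the 0.0002575 shown in the summary.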


In [17]:
# Since the cache is enabled, the same completion is returned from the cache and incurs no actual cost.
# So the actual cost doesn't change, but the total cost doubles.
response = client.create(messages=messages, model="gpt-35-turbo-1106", cache_seed=41)
client.print_usage_summary()

In update_usage_summary
----------------------------------------------------------------------------------------------------
Usage summary excluding cached usage: 
Total cost: 0.00026
* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135

Usage summary including cached usage: 
Total cost: 0.00052
* Model 'gpt-35-turbo': cost: 0.00052, prompt_tokens: 50, completion_tokens: 220, total_tokens: 270
----------------------------------------------------------------------------------------------------


In [18]:
# clear usage summary
client.clear_usage_summary() 
client.print_usage_summary()

No usage summary. Please call "create" first.


In [19]:
# All completions are returned from the cache, so no actual cost is incurred.
response = client.create(messages=messages, model="gpt-35-turbo-1106", cache_seed=41)
client.print_usage_summary()

In update_usage_summary
----------------------------------------------------------------------------------------------------
No actual cost incurred (all completions are using cache).

Usage summary including cached usage: 
Total cost: 0.00026
* Model 'gpt-35-turbo': cost: 0.00026, prompt_tokens: 25, completion_tokens: 110, total_tokens: 135
----------------------------------------------------------------------------------------------------
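The difference between actual and total usage in the walkthrough above can be mimicked with a small stand-alone tracker. This is a minimal sketch of the bookkeeping idea only, not how `OpenAIWrapper` is implemented: the `UsageTracker` class and its `record` method are hypothetical, and the cost and token values are copied from the outputs above.

```python
class UsageTracker:
    """Hypothetical tracker: 'actual' counts non-cached calls, 'total' counts all calls."""

    def __init__(self):
        self.clear()

    def clear(self):
        # Reset both summaries, analogous to clearing the usage summary.
        self.actual = {"cost": 0.0, "total_tokens": 0}
        self.total = {"cost": 0.0, "total_tokens": 0}

    def record(self, cost, tokens, from_cache):
        # Every completion is added to the total summary.
        self.total["cost"] += cost
        self.total["total_tokens"] += tokens
        # Only completions that actually reached the API count as actual usage.
        if not from_cache:
            self.actual["cost"] += cost
            self.actual["total_tokens"] += tokens


tracker = UsageTracker()
tracker.record(0.0002575, 135, from_cache=False)  # first call hits the API
tracker.record(0.0002575, 135, from_cache=True)   # identical call served from cache
print(tracker.actual["total_tokens"], tracker.total["total_tokens"])  # 135 270

tracker.clear()
tracker.record(0.0002575, 135, from_cache=True)   # still cached after clearing
print(tracker.actual["total_tokens"], tracker.total["total_tokens"])  # 0 135
```

Clearing the summary resets the counters but not the cache, which is why the last call adds to the total usage while the actual usage stays at zero.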
