# Cost tracking with langchain

At this point, I think everyone should be using langchain for interacting with LLMs. This library has an incredibly powerful suite of tools. One task it makes very easy is tracking your API token usage and cost. Langchain accomplishes this through "callbacks", its hooks for logging events as a model or chain runs. Most langchain callbacks are only for event logging, but there is one special callback handler called `OpenAICallbackHandler` that does cost tracking.

Note that this handler tracks cost cumulatively. For example, if you define the callback handler (`handler = OpenAICallbackHandler()`), supply it to a model (`llm = OpenAI(callbacks=[handler])`), and then call that model twice in a row, the handler will store the summed cost of both API requests rather than the cost of each individual request.

Also note the following:

- The `callbacks` argument takes a list, so you must always wrap your handler(s) in square brackets, even when you pass only one.
- The model's `callbacks` property holds a reference to the same handler object, so you can access the handler either by its own name or via that property (`llm.callbacks[0]`).
- The property is an alias, not a copy, which means that calls to a second model using the same handler will also change what you see through the `callbacks` property of the first model.
- To track two models separately, you can initialize a separate handler object for each model (`llm = OpenAI(callbacks=[OpenAICallbackHandler()])`).
- `handler.total_tokens` stores the token count of prompt + completion as an int, while `handler.total_cost` stores the cost as a float. 
- The `prompt_tokens` and `completion_tokens` properties track prompt and completion token usage separately, since some OpenAI models charge different per-token rates for prompt and completion tokens.
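
The aliasing behaviour in the list above is ordinary Python reference sharing, so it can be illustrated without making any API calls. The sketch below uses two toy classes, `ToyHandler` and `ToyModel`, which are hypothetical stand-ins invented for this illustration and are not part of langchain; the fixed 42 tokens per call is likewise an assumption.

```python
# Toy stand-ins (hypothetical names, not langchain classes) that mimic how a
# handler object is shared by reference between models.
class ToyHandler:
    """Accumulates token counts across every call that reports to it."""

    def __init__(self):
        self.total_tokens = 0
        self.successful_requests = 0

    def record(self, tokens):
        self.total_tokens += tokens
        self.successful_requests += 1


class ToyModel:
    """Pretends each call uses a fixed number of tokens."""

    def __init__(self, callbacks, tokens_per_call=42):
        self.callbacks = callbacks  # keeps the handlers themselves, not copies
        self.tokens_per_call = tokens_per_call

    def __call__(self, prompt):
        for handler in self.callbacks:
            handler.record(self.tokens_per_call)
        return "(toy completion)"


shared = ToyHandler()
model_a = ToyModel(callbacks=[shared])
model_b = ToyModel(callbacks=[shared])

model_a("Tell me a joke")
model_b("Tell me a joke")

# Both models point at the same handler object, so the totals are summed.
print(shared.total_tokens)                # 84
print(model_a.callbacks[0] is shared)     # True
print(model_a.callbacks[0].total_tokens)  # 84, includes model_b's call

# A separate handler per model keeps separate totals.
model_c = ToyModel(callbacks=[ToyHandler()])
model_c("Tell me a joke")
print(model_c.callbacks[0].total_tokens)  # 42
```

The real handler behaves the same way for the purposes of this section: whichever name you read it through, you are looking at one running total.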

In [24]:
from langchain.callbacks import OpenAICallbackHandler
from langchain.llms import OpenAI
from dotenv import load_dotenv

load_dotenv()

handler = OpenAICallbackHandler()
llm = OpenAI(model_name="text-davinci-002",
             n=2,
             best_of=2,
             temperature=0.95,
             callbacks=[handler])

# Track token usage over multiple API calls; the handler keeps a running total
result = llm("Tell me a joke")
print(result)
print(handler)
print("Cumulative cost after 1 call: " + str(handler.total_cost))
result2 = llm("Tell me a joke")
print(result2)
print(llm.callbacks)
print("Cumulative cost after 2 calls (via llm.callbacks): " + str(llm.callbacks[0].total_cost))

# A second model that shares the same handler adds to the same total
llm2 = OpenAI(model_name="text-davinci-002",
              n=2,
              best_of=2,
              temperature=0.95,
              callbacks=[handler])

result3 = llm2("Tell me a joke")
print(result3)
print("Cumulative cost after 3 calls (via handler): " + str(handler.total_cost))
print("Cumulative cost after 3 calls (via llm.callbacks): " + str(llm.callbacks[0].total_cost))




Why did the chicken cross the road?

To get to the other side!
Tokens Used: 42
	Prompt Tokens: 4
	Completion Tokens: 38
Successful Requests: 1
Total Cost (USD): $0.00084
Cumulative cost after 1 call: 0.00084


Why did the chicken cross the road?

To get to the other side!
[Tokens Used: 84
	Prompt Tokens: 8
	Completion Tokens: 76
Successful Requests: 2
Total Cost (USD): $0.00168]
Cumulative cost after 2 calls (via llm.callbacks): 0.00168


Why don't scientists trust atoms?

Because they make up everything.
Cumulative cost after 3 calls (via handler): 0.00248
Cumulative cost after 3 calls (via llm.callbacks): 0.00248


As an alternative syntax, you can use `with get_openai_callback() as cb`, which seems to be the preferred syntax in the langchain documentation. Note that this scopes your cost tracking much more locally: it effectively creates a handler that applies only to the code inside the `with` block and stops recording once that block exits. I find this approach to be of limited usefulness, but it may be appropriate for some use cases.

In [None]:
# If you're using langchain to make an API call, you can get the cost from the callback
from langchain.llms import OpenAI
from langchain.callbacks import get_openai_callback
from dotenv import load_dotenv

load_dotenv()

llm = OpenAI(model_name="text-davinci-002", n=2, best_of=2)

with get_openai_callback() as cb:
    # Track token usage over multiple API calls
    result = llm("Tell me a joke")
    result2 = llm("Tell me a joke")

    # Print the usage summary; the cost alone is available as the float cb.total_cost
    print(cb)

Note that instead of passing a callback handler to a model object, you can pass it to a chain when you construct it (`chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])`) or to a single call of a chain (`chain(inputs, callbacks=[handler])`).

# Cost tracking with OpenAI library

If you're using the base `openai` library rather than `langchain`, you can get token usage, but not cost, directly from an API response:

In [27]:
import openai
import os

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{"role": "system", "content": "this is a"}],
        max_tokens=2,
        temperature=0)

print(response)

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "test.",
        "role": "assistant"
      }
    }
  ],
  "created": 1684848788,
  "id": "chatcmpl-7JMQOncp5eXAWDijGoGrjBD8XVBXa",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 2,
    "prompt_tokens": 11,
    "total_tokens": 13
  }
}


As long as you know the per-token cost of the model you're using, cost can be calculated from token usage. Thus:

In [32]:
# Calculate token cost from token usage
prompt_cost_per_token = 0.002 / 1000
completion_cost_per_token = 0.002 / 1000
cost_of_last_api_call = (
    response["usage"]["prompt_tokens"] * prompt_cost_per_token
    + response["usage"]["completion_tokens"] * completion_cost_per_token
)
print(f"Cost of last API call: ${cost_of_last_api_call:.6f}")

Cost of last API call: $0.000026
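
If you make many calls, the same arithmetic can be wrapped in a small helper with a price table. This is a minimal sketch: `PRICES_PER_1K` and `cost_from_usage` are names invented here, the `gpt-3.5-turbo` rate is the one used in the cell above, and `example-model` is a made-up entry whose only purpose is to show separate prompt and completion rates. Check current pricing before relying on any of these numbers.

```python
# Prices in USD per 1,000 tokens. The gpt-3.5-turbo figure is the one used in
# the cell above; "example-model" is a made-up entry with split rates.
PRICES_PER_1K = {
    "gpt-3.5-turbo": {"prompt": 0.002, "completion": 0.002},
    "example-model": {"prompt": 0.03, "completion": 0.06},
}


def cost_from_usage(usage, model):
    """Return the USD cost for one response's `usage` mapping."""
    rates = PRICES_PER_1K[model]
    return (
        usage["prompt_tokens"] * rates["prompt"]
        + usage["completion_tokens"] * rates["completion"]
    ) / 1000


# The usage block from the response printed earlier.
usage = {"completion_tokens": 2, "prompt_tokens": 11, "total_tokens": 13}
print(f"${cost_from_usage(usage, 'gpt-3.5-turbo'):.6f}")  # $0.000026

# A running total over several responses is just a sum.
running_total = sum(cost_from_usage(u, "gpt-3.5-turbo") for u in [usage, usage])
print(f"${running_total:.6f}")  # $0.000052
```

One thing to watch: the response above reports its model as `gpt-3.5-turbo-0301`, not `gpt-3.5-turbo`, so looking up prices by `response["model"]` would need a key for the versioned name as well.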
