Exposing usage data from API response #103
Hi @brainlid, first of all, thanks for this project; it has been very useful to us so far.
We have run into the issue that we want to be able to track our token usage on OpenAI. The usage is returned as part of the API response, but I believe LangChain doesn't do anything with this information yet.
I am wondering if you would consider a PR to expose this data somehow.
And if you are, whether you have a preferred way to do this. We would probably be happy with simply making the raw response available somehow, as a trap door. But if you want to structure this data and translate it per API, we could also talk about that.
Thanks, Derek
Replies: 8 comments 9 replies
-
Hi @derekkraan! I'm glad you've found it helpful! Yes, I am open to a PR for this. I've been thinking about it as I've been working with more models. For ChatGPT, Anthropic (Claude), and Bumblebee, I think we get the token counts at the end of a completion. It's a form of metadata, so I can see some type of metadata being exposed on the chain at completion. I'm not sure yet what makes the most sense, though. Yes, let's talk about it. I'm converting this to a discussion item.
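For concreteness, a minimal sketch of the metadata idea, assuming a hypothetical struct and field names (none of this is the library's actual API; see the later comments for what shipped):

```elixir
defmodule Sketch.TokenUsage do
  # Hypothetical struct, for illustration only.
  defstruct input: 0, output: 0

  # Most APIs report input and output counts separately, so the total
  # has to be computed.
  def total(%__MODULE__{input: input, output: output}), do: input + output
end

# At completion, the provider's raw usage payload (e.g. OpenAI's "usage"
# map) would be translated into the struct and attached to the chain as
# metadata:
usage = %Sketch.TokenUsage{input: 152, output: 38}
metadata = %{usage: usage}

IO.inspect(Sketch.TokenUsage.total(metadata.usage))
# => 190
```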
-
That's great. Perhaps it would make sense to do both:
-
Have done some research:
-
On the main branch, token usage is now supported! 🎉 It works on:
On the model, a new callback receives the usage data. @derekkraan, the TokenUsage struct includes a function for computing the total tokens because most LLMs don't return that.
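A minimal usage sketch, assuming the rc.0 API: the `on_llm_token_usage` callback name and the `input`/`output` fields on `TokenUsage` come from this release, but the exact first argument passed to the callback and where the `callbacks` list is registered may differ between versions:

```elixir
alias LangChain.ChatModels.ChatOpenAI
alias LangChain.Chains.LLMChain
alias LangChain.Message
alias LangChain.TokenUsage

# Handler map registered on the model. The first callback argument has
# varied across versions (model vs. chain), so it is ignored here.
handler = %{
  on_llm_token_usage: fn _context, %TokenUsage{} = usage ->
    total = TokenUsage.total(usage)
    IO.puts("tokens in: #{usage.input}, out: #{usage.output}, total: #{total}")
  end
}

# Run a one-message chain; the handler fires when the provider reports
# usage at the end of the completion.
%{llm: ChatOpenAI.new!(%{model: "gpt-4o", callbacks: [handler]})}
|> LLMChain.new!()
|> LLMChain.add_message(Message.new_user!("Say hello in one word."))
|> LLMChain.run()
```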
-
This is included in the published v0.3.0-rc.0 release.
-
Thanks for your work on this.
-
I integrated it yesterday, and it seems to be working well. The callback format makes it difficult to associate a usage report with a particular message, but I think that's not an issue for us at this time. The timestamps will make it possible to correlate them should the need arise.
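If correlation does become necessary, something along these lines on the application side would probably suffice; every name here is illustrative, not part of the library:

```elixir
defmodule MyApp.UsageLog do
  # Stores each usage report with an arrival timestamp so it can later
  # be matched against the message log.
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> [] end, name: __MODULE__)

  # Call this from the on_llm_token_usage handler.
  def record(usage) do
    Agent.update(__MODULE__, fn events ->
      [%{usage: usage, at: DateTime.utc_now()} | events]
    end)
  end

  # Usage events within `window_ms` of a message's timestamp are
  # candidates for belonging to that message.
  def near(%DateTime{} = message_at, window_ms \\ 2_000) do
    Agent.get(__MODULE__, fn events ->
      Enum.filter(events, fn %{at: at} ->
        abs(DateTime.diff(at, message_at, :millisecond)) <= window_ms
      end)
    end)
  end
end
```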
-
Added support for this to ChatGoogleAI in PR #152.