Create an LLM usage debug plugin #1413

@waldekmastykarz

Description

Create a plugin that intercepts LLM requests and responses and writes LLM usage information to a file. This information helps to understand token usage over time and how it might lead to throttling.

  • triggered only for LLM requests (use the same detection method as in the OpenAITelemetryPlugin)
  • for each intercepted LLM response, gather the following information (see the record sketch after this list):
    • time (response.headers.date)
    • status (http status code)
    • retry_after (response.headers.retry-after)
    • policy (response.headers.policy-id, useful to understand why throttling occurred)
    • prompt_tokens (response.body.usage.prompt_tokens)
    • completion_tokens (response.body.usage.completion_tokens)
    • cached_tokens (response.body.usage.prompt_tokens_details.cached_tokens)
    • total_tokens (response.body.usage.total_tokens)
    • remaining_tokens (response.headers.x-ratelimit-remaining-tokens)
    • remaining_requests (response.headers.x-ratelimit-remaining-requests)
  • each time the plugin intercepts a response, it gathers the information and appends it to a file named devproxy-llm-usage.csv. On startup, the plugin checks if a file with that name already exists; if it does, it appends the current date and time to the name until it finds one that's unique. The plugin stores the resolved file name and reuses it while Dev Proxy is running (see the file-handling sketch below)
  • if the file doesn't exist, the plugin creates it, writing the CSV headers followed by the first line of information
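
A minimal sketch of what a single row could look like, assuming the fields above map onto a small C# record. `LlmUsageRecord`, its members, and the column order are hypothetical names for illustration, not part of the Dev Proxy plugin API; string fields are quoted because the `date` and `retry-after` headers can contain commas.

```csharp
// Hypothetical sketch only -- LlmUsageRecord is not an existing Dev Proxy type.
// One CSV row per intercepted LLM response, with the fields listed in this issue.
public sealed record LlmUsageRecord(
    string Time,               // response.headers.date
    int Status,                // HTTP status code
    string? RetryAfter,        // response.headers.retry-after
    string? Policy,            // response.headers.policy-id
    long PromptTokens,         // response.body.usage.prompt_tokens
    long CompletionTokens,     // response.body.usage.completion_tokens
    long CachedTokens,         // response.body.usage.prompt_tokens_details.cached_tokens
    long TotalTokens,          // response.body.usage.total_tokens
    string? RemainingTokens,   // response.headers.x-ratelimit-remaining-tokens
    string? RemainingRequests) // response.headers.x-ratelimit-remaining-requests
{
    public const string CsvHeader =
        "time,status,retry_after,policy,prompt_tokens,completion_tokens," +
        "cached_tokens,total_tokens,remaining_tokens,remaining_requests";

    public string ToCsvLine() =>
        string.Join(",",
            Quote(Time), Status.ToString(), Quote(RetryAfter), Quote(Policy),
            PromptTokens.ToString(), CompletionTokens.ToString(),
            CachedTokens.ToString(), TotalTokens.ToString(),
            Quote(RemainingTokens), Quote(RemainingRequests));

    // Quote string fields: the date and retry-after headers can contain commas.
    private static string Quote(string? value) =>
        value is null ? "" : $"\"{value.Replace("\"", "\"\"")}\"";
}
```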
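
And a sketch of the file handling under the same assumptions: resolve the file name once on startup, then append one row per intercepted response, writing the headers only when the file is first created. `UsageFileWriter` is likewise a hypothetical helper, not an existing Dev Proxy type.

```csharp
using System;
using System.IO;

// Hypothetical sketch only -- UsageFileWriter is not an existing Dev Proxy type.
public static class UsageFileWriter
{
    // Called once on startup; the resolved name is reused while Dev Proxy runs.
    public static string ResolveFileName(string baseName = "devproxy-llm-usage.csv")
    {
        if (!File.Exists(baseName))
        {
            return baseName;
        }

        // The base name is taken: append the current date and time
        // until the name is unique.
        string candidate;
        do
        {
            var stamp = DateTime.Now.ToString("yyyyMMdd-HHmmss-fff");
            candidate = $"{Path.GetFileNameWithoutExtension(baseName)}-{stamp}" +
                        Path.GetExtension(baseName);
        } while (File.Exists(candidate));
        return candidate;
    }

    // Called for every intercepted response; creates the file with
    // headers on first write, appends a single row afterwards.
    public static void Append(string fileName, LlmUsageRecord record)
    {
        if (!File.Exists(fileName))
        {
            File.WriteAllLines(fileName,
                new[] { LlmUsageRecord.CsvHeader, record.ToCsvLine() });
        }
        else
        {
            File.AppendAllLines(fileName, new[] { record.ToCsvLine() });
        }
    }
}
```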
