Labels
enhancement, good first issue, work in progress
Description
Create a plugin that writes LLM usage information to a file as it intercepts LLM requests and responses. This information is helpful for understanding token usage over time and how it might lead to throttling.
- triggered only for LLM requests (use the same detection method as in the OpenAITelemetryPlugin); see the first sketch after this list
- for an intercepted LLM response, gathers the following information (see the extraction sketch after this list):
  - time (response.headers.date)
  - status (HTTP status code)
  - retry_after (response.headers.retry-after)
  - policy (response.headers.policy-id, useful for understanding why throttling occurred)
  - prompt_tokens (response.body.usage.prompt_tokens)
  - completion_tokens (response.body.usage.completion_tokens)
  - cached_tokens (response.body.usage.prompt_tokens_details.cached_tokens)
  - total_tokens (response.body.usage.total_tokens)
  - remaining_tokens (response.headers.x-ratelimit-remaining-tokens)
  - remaining_requests (response.headers.x-ratelimit-remaining-requests)
 
- each time the plugin intercepts a response, it gathers this information and appends it to a file named devproxy-llm-usage.csv. On startup, the plugin checks whether a file with that name already exists; if it does, it appends the current date and time to the name until it finds one that's unique, and stores the resolved name to use while Dev Proxy is running (see the file-handling sketch below)
- if the file doesn't exist, the plugin creates it, including the header row and the first line of information
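
For detection, the plugin should reuse whatever the OpenAITelemetryPlugin already does. Purely as an illustration, a simplified URL-based heuristic could look like the sketch below; the endpoint paths are assumptions, not the actual detection method:

```csharp
using System;

static class LlmRequestDetector
{
    // Assumed heuristic: treat OpenAI-style completion endpoints as LLM requests.
    // The real plugin should mirror the detection logic in OpenAITelemetryPlugin.
    public static bool IsLlmRequest(Uri requestUri) =>
        requestUri.AbsolutePath.EndsWith("/chat/completions", StringComparison.OrdinalIgnoreCase) ||
        requestUri.AbsolutePath.EndsWith("/completions", StringComparison.OrdinalIgnoreCase);
}
```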
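Gathering the fields could look roughly like this, assuming an OpenAI-compatible response body. The sketch uses HttpResponseMessage and System.Text.Json for illustration; the actual plugin would read from Dev Proxy's intercepted response object, and LlmUsageRecord is a hypothetical type:

```csharp
using System;
using System.Net.Http;
using System.Text.Json;

// Shape of one CSV row; field names mirror the list above.
record LlmUsageRecord(
    string Time,
    int Status,
    string? RetryAfter,
    string? Policy,
    long PromptTokens,
    long CompletionTokens,
    long CachedTokens,
    long TotalTokens,
    string? RemainingTokens,
    string? RemainingRequests);

static class LlmUsageExtractor
{
    // Joins multi-valued headers with "; " to keep commas out of CSV cells;
    // full CSV escaping is out of scope for this sketch.
    static string? Header(HttpResponseMessage response, string name) =>
        response.Headers.TryGetValues(name, out var values) ? string.Join("; ", values) : null;

    public static LlmUsageRecord FromResponse(HttpResponseMessage response, string body)
    {
        // Assumes the body contains a usage object, as OpenAI responses do.
        using var doc = JsonDocument.Parse(body);
        var usage = doc.RootElement.GetProperty("usage");

        // prompt_tokens_details.cached_tokens is optional; default to 0.
        long cached = 0;
        if (usage.TryGetProperty("prompt_tokens_details", out var details) &&
            details.TryGetProperty("cached_tokens", out var cachedEl))
        {
            cached = cachedEl.GetInt64();
        }

        return new LlmUsageRecord(
            Time: response.Headers.Date?.ToString("o") ?? DateTimeOffset.UtcNow.ToString("o"),
            Status: (int)response.StatusCode,
            RetryAfter: response.Headers.RetryAfter?.ToString(),
            Policy: Header(response, "policy-id"),
            PromptTokens: usage.GetProperty("prompt_tokens").GetInt64(),
            CompletionTokens: usage.GetProperty("completion_tokens").GetInt64(),
            CachedTokens: cached,
            TotalTokens: usage.GetProperty("total_tokens").GetInt64(),
            RemainingTokens: Header(response, "x-ratelimit-remaining-tokens"),
            RemainingRequests: Header(response, "x-ratelimit-remaining-requests"));
    }
}
```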
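And a sketch of the file handling from the last two bullets; UsageFileWriter and its members are hypothetical names:

```csharp
using System;
using System.IO;

static class UsageFileWriter
{
    const string Header = "time,status,retry_after,policy,prompt_tokens,completion_tokens,cached_tokens,total_tokens,remaining_tokens,remaining_requests";

    // Resolve the file name once at startup: keep devproxy-llm-usage.csv if it's
    // free, otherwise append the current date and time until the name is unique.
    public static string ResolveFileName()
    {
        var name = "devproxy-llm-usage.csv";
        while (File.Exists(name))
        {
            name = $"devproxy-llm-usage-{DateTime.Now:yyyyMMddHHmmssfff}.csv";
        }
        return name;
    }

    // Create the file with the header row on first write, then append rows.
    public static void Append(string fileName, string csvRow)
    {
        if (!File.Exists(fileName))
        {
            File.WriteAllLines(fileName, new[] { Header, csvRow });
        }
        else
        {
            File.AppendAllLines(fileName, new[] { csvRow });
        }
    }
}
```

With records serialized to CSV rows, the plugin would call ResolveFileName once at startup and Append for every intercepted response.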