
Records actual token counts for OpenAI and Anthropic responses #6339

Open
wants to merge 2 commits into base: master

Conversation

@lonelycode (Member) commented Jun 11, 2024

User description

This response middleware extracts token usage from OpenAI and Anthropic API responses and provides the counts as context entries to be recorded in Analytics.

Description

  • Adds res_handler_llm_reporting.go
  • Middleware will:
    • Search for openai or anthropic tags in the Spec to activate
    • If present, decode responses from the API into generic holders
    • Extract token count information from the response for reporting
    • Create three context entries, LLMResponseReporterInputTokens, LLMResponseReporterOutputTokens, and LLMResponseReporterTotalTokens, which hold the data for further processing

Motivation and Context

Similarly to the other middleware: instead of providing data pre-proxy, this middleware provides actuals from the vendor. These could be used for quota work, while the prior middleware could be used for rate limiting or flagging usage patterns.

How This Has Been Tested

  • Manual testing against the Anthropic and OpenAI APIs

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Refactoring or add test (improvements in base code or adds test coverage to functionality)

Checklist

  • I ensured that the documentation is up to date
  • I explained why this PR updates go.mod in detail with reasoning why it's required
  • I would like a code coverage CI quality gate exception and have explained why

PR Type

Enhancement, Other


Description

  • Added new middleware LLMResponseReporter to handle and report token usage from OpenAI and Anthropic API responses.
  • Introduced constants and context functions to manage LLM response token counts.
  • Updated quickstart app configuration to include LLM response reporting.

Changes walkthrough 📝

Relevant files

Enhancement

  • ctx/ctx.go: Add constants for LLM response token reporting
    • Added constants for LLM response token reporting. (+4/-0)
  • gateway/api.go: Add functions to retrieve LLM response token counts
    • Added functions to get LLM response token counts from the request context. (+21/-0)
  • gateway/middleware.go: Integrate LLM response reporter middleware
    • Added case for llm_response_reporter in response processor switch. (+3/-0)
  • gateway/res_handler_llm_reporting.go: Implement LLMResponseReporter middleware for token usage reporting
    • Created new LLMResponseReporter middleware.
    • Implemented response handling for OpenAI and Anthropic APIs.
    • Extracted and set token usage information in the request context. (+119/-0)

Configuration changes

  • apps/quickstart.json: Update quickstart app configuration for LLM response reporting
    • Added tags for LLM and Anthropic.
    • Configured response processor for LLM response reporting. (+9/-2)


    Contributor

    API Changes

    --- prev.txt	2024-06-11 05:43:44.281717256 +0000
    +++ current.txt	2024-06-11 05:43:41.337705087 +0000
    @@ -6860,6 +6860,10 @@
     	GraphQLIsWebSocketUpgrade
     	OASOperation
     
    +	LLMResponseReporterInputTokens
    +	LLMResponseReporterOutputTokens
    +	LLMResponseReporterTotalTokens
    +
     	// CacheOptions holds cache options required for cache writer middleware.
     	CacheOptions
     	OASDefinition
    @@ -8790,6 +8794,26 @@
     
     func (l *LDAPStorageHandler) SetRollingWindow(keyName string, per int64, val string, pipeline bool) (int, []interface{})
     
    +type LLMResponseReporter struct {
    +	BaseTykResponseHandler
    +	// Has unexported fields.
    +}
    +
    +func (h *LLMResponseReporter) Base() *BaseTykResponseHandler
    +
    +func (h *LLMResponseReporter) Enabled() bool
    +
    +func (h *LLMResponseReporter) HandleError(rw http.ResponseWriter, req *http.Request)
    +
    +func (h *LLMResponseReporter) HandleResponse(rw http.ResponseWriter, res *http.Response, req *http.Request, ses *user.SessionState) error
    +
    +func (h *LLMResponseReporter) Init(c interface{}, spec *APISpec) error
    +
    +func (*LLMResponseReporter) Name() string
    +
    +type LLMResponseReporterOptions struct {
    +}
    +
     type LogMessageEventHandler struct {
     	Gw *Gateway `json:"-"`
     	// Has unexported fields.

    Contributor

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review [1-5]

    3

    🧪 Relevant tests

    No

    🔒 Security concerns

    No

    ⚡ Key issues to review

    Possible Bug:
    Typo in the middleware name "llm_reponse_reporter" should be "llm_response_reporter".

    Error Handling:
    The HandleResponse method in res_handler_llm_reporting.go does not handle the case where the response body cannot be unmarshalled into the expected structures (openAIResponse or anthropicResponse). This could lead to unhandled errors if the response format is not as expected.

    Performance Concern:
    The method HandleResponse reads the entire response body into memory which might not be efficient for large responses. Consider streaming or chunk processing if applicable.

    Contributor

    PR Code Suggestions ✨

Category | Suggestion | Score
    Best practice
    Use the comma-ok idiom for type assertion to prevent potential runtime panics

    The type assertion v.(int) can cause a panic if v is not actually of type int. It's safer
    to use the comma-ok idiom to assert the type safely.

    gateway/api.go [3109-3110]

    -if v := r.Context().Value(ctx.LLMResponseReporterInputTokens); v != nil {
    -    return v.(int)
    +if v, ok := r.Context().Value(ctx.LLMResponseReporterInputTokens).(int); ok {
    +    return v
     }
     
    Suggestion importance[1-10]: 10

    Why: Using the comma-ok idiom is a best practice in Go to prevent potential runtime panics, making the code more robust and reliable.

    10
    Possible bug
    Correct the typo in the middleware handler case label

    There appears to be a typo in the case label for the middleware handler. It should be
    "llm_response_reporter" instead of "llm_reponse_reporter" to maintain consistency and
    avoid potential bugs due to misspelling.

    gateway/middleware.go [1025-1026]

    -case "llm_reponse_reporter":
    +case "llm_response_reporter":
         return &LLMResponseReporter{BaseTykResponseHandler: baseHandler}
     
    Suggestion importance[1-10]: 9

    Why: Correcting the typo is crucial for the middleware handler to function correctly. Misspellings can lead to runtime errors and unexpected behavior, making this a high-priority fix.

    9
    Consistency
    Update the handler name in the JSON configuration to match the corrected name in the Go code

    The handler name "llm_reponse_reporter" in the JSON configuration should match the
    corrected handler name in the Go code. Assuming the Go code is corrected to
    "llm_response_reporter", update this JSON configuration accordingly.

    apps/quickstart.json [37]

    -"name": "llm_reponse_reporter"
    +"name": "llm_response_reporter"
     
    Suggestion importance[1-10]: 8

    Why: Ensuring consistency between the Go code and JSON configuration is important for the correct functioning of the system. This change aligns the configuration with the corrected handler name.

    8
    Enhancement
    Implement error handling or logging in the HandleError method

    The HandleError method is implemented but does nothing. It's a good practice to at least
    log the error or handle it in some meaningful way to aid in debugging and maintain
    robustness.

    gateway/res_handler_llm_reporting.go [73-74]

     func (h *LLMResponseReporter) HandleError(rw http.ResponseWriter, req *http.Request) {
    +    log.Println("Error handling response in LLMResponseReporter")
     }
     
    Suggestion importance[1-10]: 7

    Why: Adding error logging or handling in the HandleError method improves debugging and robustness. While not critical, it enhances maintainability and provides useful information during failures.

    7


    sonarcloud bot commented Jun 11, 2024

    Quality Gate failed

    Failed conditions
    0.0% Coverage on New Code (required ≥ 80%)
    C Reliability Rating on New Code (required ≥ A)

    See analysis details on SonarCloud

