Add OpenAIRecorder #91
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
LGTM. Only thing I can think of is maybe we also want to record User-Agent
in case the user has multiple components interacting with the model runner (e.g. in an agentic app) and wants to be able to distinguish between them.
Nice! I added a couple of comments; nothing blocking, they can be addressed in follow-up PRs if needed.
func (r *OpenAIRecorder) convertStreamingResponse(streamingBody string) string {
Maybe we could use the OpenAI Go SDK to get this conversion. It has the acc := openai.ChatCompletionAccumulator{} type, which might be useful:
https://github.com/openai/openai-go/blob/main/examples/chat-completion-accumulating/main.go
I can do that, but I would still need to split the full body into chunks and pass them separately to the accumulator, as it is designed to work with individual streaming chunks 🤔.
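The splitting step described above can be sketched without the SDK. This is a minimal illustration, assuming the standard SSE `data:` framing of OpenAI streaming responses; the `chunkDelta` type and `accumulateStream` function are simplifications for the example, not this PR's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// chunkDelta mirrors only the fields of an OpenAI streaming chunk that
// content accumulation needs; the real schema has many more fields.
type chunkDelta struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// accumulateStream splits a raw SSE body on "data:" events and concatenates
// the content deltas — the same splitting step that would be needed before
// feeding each chunk to the SDK's ChatCompletionAccumulator.
func accumulateStream(streamingBody string) string {
	var sb strings.Builder
	for _, line := range strings.Split(streamingBody, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunkDelta
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue // skip malformed chunks
		}
		for _, choice := range c.Choices {
			sb.WriteString(choice.Delta.Content)
		}
	}
	return sb.String()
}

func main() {
	body := `data: {"choices":[{"delta":{"content":"Hel"}}]}
data: {"choices":[{"delta":{"content":"lo"}}]}
data: [DONE]`
	fmt.Println(accumulateStream(body)) // prints "Hello"
}
```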
I think I used the SDK to convert chunks into a full response in the past, but I actually don't remember if I used the accumulator.
In any case, not a blocker, so we can revisit eventually.
Good point, thanks @xenoscopic!
LGTM
	UserAgent string `json:"user_agent,omitempty"`
}

type ModelData struct {
nit: OpenAIRecord
I agree it's a bit confusing, but the backend configuration is backend-specific, not OpenAI-specific.
Adds recording functionality for OpenAI inference requests and responses:
- GET /engines/requests?model=<model> endpoint

Necessary work left to be done:
- remove records on model eviction (2nd commit)
- include the configured context size for the model (3rd commit)

With the backend configuration included (the 3rd commit):

With the User-Agent captured in records if it's not empty (the 4th commit):