implement OpenAI token usage #150

Merged · 13 commits from openai/add-token-usage into main · Jun 24, 2024
Conversation

@aniketmaurya (Collaborator) commented Jun 21, 2024

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Right now we don't provide a way to track prompt tokens and completion tokens, and we send a dummy value in the response.

This PR proposes a way to track and update the usage info for completion_tokens, prompt_tokens, and total_tokens.

Proposal

Return prompt_tokens and completion_tokens along with the generated content, and LitServe will automatically update the usage info and send it in the response.

import litserve as ls

class OpenAIWithUsage(ls.LitAPI):
    def setup(self, device):
        # No real model is needed for this example.
        self.model = None

    def predict(self, x):
        # Yield the token counts alongside the generated content;
        # LitServe fills the usage info in the response from them.
        yield {"role": "assistant", "content": "This is a generated output", "prompt_tokens": 5, "completion_tokens": 10}

I've only added a test for the expected flow right now; @lantiga, if you approve this proposal, I can update the rest ASAP.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@lantiga (Collaborator) commented Jun 21, 2024

thanks for this!

It's a possible design, and it's simple to understand (simpler than requiring users to populate the context).

A few considerations:

  • since we're talking about a streaming API, should we ask the user to provide the total over the whole generation, or should we return what was generated in each step and accumulate it ourselves? I'd rather do the former because we need to make fewer assumptions; users have to do the work themselves, but it's more explicit on their side too
  • let's make sure everything works out in case of batching (it does, but let's always include batching in the design example; a batched sketch follows this list)
  • let's make returning this optional: e.g. returning usage only at the end of generation would be fine (this is what OpenAI does too)
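
To make the batching point concrete, here is a minimal sketch of how the same per-chunk format could extend to batching. It is illustrative only: it assumes LitServe's batched streaming convention of yielding one list per step, with one entry per request, and the token counts are placeholders.

import litserve as ls

class BatchedOpenAIWithUsage(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, batch):
        # With batching, each streamed step yields a list with one chunk
        # per request, in the same order as the batch.
        yield [
            {"role": "assistant", "content": "This is a generated output",
             "prompt_tokens": 5, "completion_tokens": 10}
            for _ in batch
        ]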

@aniketmaurya (Collaborator, Author) commented:

> since we're talking about a streaming API, should we ask the user to provide the total over the whole generation, or should we return what was generated in each step and accumulate it ourselves? I'd rather do the former because we need to make fewer assumptions; users have to do the work themselves, but it's more explicit on their side too

Yes, letting users do it explicitly would be cleaner and would involve fewer assumptions.

> let's make sure everything works out in case of batching (it does, but let's always include batching in the design example)

Thanks for bringing this up, I will add a test for it too.

> let's make returning this optional: e.g. returning usage only at the end of generation would be fine (this is what OpenAI does too)

+1
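
As a sketch of the optional, end-of-generation variant: whether LitServe accepts a mix of chunks with and without token counts is an assumption here, and the counts are placeholders.

import litserve as ls

class OpenAIUsageAtEnd(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, x):
        # Stream content chunks without usage info...
        for token in ["This ", "is ", "a ", "generated ", "output"]:
            yield {"role": "assistant", "content": token}
        # ...then report the cumulative usage once, at the end of
        # generation, the way OpenAI does.
        yield {"role": "assistant", "content": "", "prompt_tokens": 5, "completion_tokens": 10}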

@williamFalcon (Contributor) commented:

great idea @aniketmaurya

codecov bot commented Jun 23, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 26 lines in your changes missing coverage. Please review.

Project coverage is 80%. Comparing base (53de741) to head (a5d85aa).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #150   +/-   ##
===================================
+ Coverage    79%    80%   +1%     
===================================
  Files        13     13           
  Lines       933    985   +52     
===================================
+ Hits        737    792   +55     
+ Misses      196    193    -3     

@aniketmaurya changed the title from "[WIP] implement OpenAI token usage" to "implement OpenAI token usage" Jun 23, 2024
@aniketmaurya added the enhancement label Jun 23, 2024
@aniketmaurya self-assigned this Jun 23, 2024
@lantiga (Collaborator) left a comment

Looks good

See the comment: we need to keep the code readable, and this PR makes it less so. See if you can fix it.

Review thread on src/litserve/specs/openai.py (outdated, resolved)
@aniketmaurya enabled auto-merge (squash) Jun 24, 2024 14:11
@aniketmaurya merged commit c469a13 into main Jun 24, 2024
19 checks passed
@aniketmaurya deleted the openai/add-token-usage branch Jun 24, 2024 14:16
Labels: enhancement (New feature or request)
3 participants