implement OpenAI token usage #150

Merged · 13 commits from openai/add-token-usage into main · Jun 24, 2024
Conversation

@aniketmaurya (Collaborator) commented Jun 21, 2024

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Right now we don't provide a way to track prompt tokens and completion tokens, and we send a dummy value in the response.

This PR proposes a way to track and update the usage info for completion_tokens, prompt_tokens, and total_tokens.

Proposal

Return prompt_tokens and completion_tokens along with the generated content, and LitServe will automatically update the usage info and send it in the response.

import litserve as ls

class OpenAIWithUsage(ls.LitAPI):
    def setup(self, device):
        # No real model is needed for this example.
        self.model = None

    def predict(self, x):
        # Yield the token counts alongside the generated content;
        # LitServe fills the usage info in the response from them.
        yield {"role": "assistant", "content": "This is a generated output", "prompt_tokens": 5, "completion_tokens": 10}

I've only added a test for the expected flow right now; @lantiga, if you approve this proposal, I can update the rest ASAP.

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@lantiga (Collaborator) commented Jun 21, 2024

thanks for this!

It's a possible design, and it's simple to understand (simpler than requiring users to populate the context).

A few considerations:

  • since we're talking about a streaming API, should we ask the user to provide the total over the whole generation, or should we return what was generated in each step and accumulate it ourselves? I'd rather do the former because we need to make fewer assumptions; users have to do the work themselves, but it's more explicit on their side too
  • let's make sure everything works out in case of batching (it does, but let's always include batching in the design example; a batched sketch follows this list)
  • let's make returning this optional: e.g. returning usage only at the end of generation would be fine (this is what OpenAI does too)
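
To make the batching point concrete, here is a minimal sketch of how the same per-chunk format could extend to batching. It is illustrative only: it assumes LitServe's batched streaming convention of yielding one list per step, with one entry per request, and the token counts are placeholders.

import litserve as ls

class BatchedOpenAIWithUsage(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, batch):
        # With batching, each streamed step yields a list with one chunk
        # per request, in the same order as the batch.
        yield [
            {"role": "assistant", "content": "This is a generated output",
             "prompt_tokens": 5, "completion_tokens": 10}
            for _ in batch
        ]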

@aniketmaurya (Collaborator, Author) commented:

> since we're talking about a streaming API, should we ask the user to provide the total over the whole generation, or should we return what was generated in each step and accumulate it ourselves? I'd rather do the former because we need to make fewer assumptions; users have to do the work themselves, but it's more explicit on their side too

Yes, letting users do it explicitly would be cleaner and would involve fewer assumptions.

> let's make sure everything works out in case of batching (it does, but let's always include batching in the design example)

Thanks for bringing this up, I will add a test for it too.

> let's make returning this optional: e.g. returning usage only at the end of generation would be fine (this is what OpenAI does too)

+1
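
As a sketch of the optional, end-of-generation variant: whether LitServe accepts a mix of chunks with and without token counts is an assumption here, and the counts are placeholders.

import litserve as ls

class OpenAIUsageAtEnd(ls.LitAPI):
    def setup(self, device):
        self.model = None

    def predict(self, x):
        # Stream content chunks without usage info...
        for token in ["This ", "is ", "a ", "generated ", "output"]:
            yield {"role": "assistant", "content": token}
        # ...then report the cumulative usage once, at the end of
        # generation, the way OpenAI does.
        yield {"role": "assistant", "content": "", "prompt_tokens": 5, "completion_tokens": 10}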

@williamFalcon (Contributor) commented:

great idea @aniketmaurya

codecov bot commented Jun 23, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 26 lines in your changes missing coverage. Please review.

Project coverage is 80%. Comparing base (53de741) to head (a5d85aa).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #150   +/-   ##
===================================
+ Coverage    79%    80%   +1%     
===================================
  Files        13     13           
  Lines       933    985   +52     
===================================
+ Hits        737    792   +55     
+ Misses      196    193    -3     

@aniketmaurya changed the title from "[WIP] implement OpenAI token usage" to "implement OpenAI token usage" Jun 23, 2024
@aniketmaurya added the enhancement label Jun 23, 2024
@aniketmaurya self-assigned this Jun 23, 2024
@lantiga (Collaborator) left a comment

Looks good

See the comment: we need to keep the code readable, and this PR makes it less so. See if you can fix it.

Review thread on src/litserve/specs/openai.py (outdated, resolved)
@aniketmaurya enabled auto-merge (squash) Jun 24, 2024 14:11
@aniketmaurya merged commit c469a13 into main Jun 24, 2024
19 checks passed
@aniketmaurya deleted the openai/add-token-usage branch Jun 24, 2024 14:16
Labels: enhancement (New feature or request)
3 participants