
usage-based-routing-ttl-on-cache #3412

Merged

Conversation

@sumanth13131 (Contributor) commented May 3, 2024

The RPM/TPM counter cache uses a 1-minute window policy, so it is better to set a TTL on that counter key.
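
A minimal sketch of the idea, using redis-py directly (the key layout and helper are illustrative, not litellm's actual internals): attach an expiry when the per-minute counter key is first written, so the counter cannot outlive its 1-minute window.

import redis

r = redis.Redis(host="localhost", port=6379)

def increment_rpm_counter(deployment_id: str, ttl_seconds: int = 60) -> int:
    # Hypothetical key layout: one counter per deployment.
    key = f"rpm:{deployment_id}"
    count = r.incr(key)
    if count == 1:
        # First write in this window: set a TTL so the key expires
        # instead of lingering at TTL -1 until LRU eviction.
        r.expire(key, ttl_seconds)
    return count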

@krrishdholakia (Contributor) commented:

The TTL for usage is 1 minute, because the TPM/RPM limits apply across a 1-minute window. @sumanth13131

Can you share a repro of the problem you're facing?

@krrishdholakia self-requested a review May 10, 2024 17:28
@krrishdholakia (Contributor) commented:

bump on this? @sumanth13131

@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I've attached a screenshot below. Currently, the usage-based strategy sets every key with the default TTL (-1), meaning the keys never expire until the Redis LRU policy evicts them.

PS: The same key name can recur the next day. If the quota (RPM/TPM) recorded under that key is already exhausted, no deployment is available for that time frame.

[Screenshot, 2024-05-11: Redis keys written by the usage-based strategy, all showing TTL -1]
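
For reference, the default-TTL behaviour is easy to confirm against Redis (the key name below is hypothetical; any key written without an expiry behaves the same):

import redis

r = redis.Redis(host="localhost", port=6379)

# A key written without an expiry reports TTL -1: it exists, but
# lingers until Redis's eviction policy removes it.
r.set("hypothetical-usage-counter-key", 42)
print(r.ttl("hypothetical-usage-counter-key"))  # -> -1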

@krrishdholakia (Contributor) commented:

@sumanth13131, so if I understand the problem:

  • Keys can have the same name 24+ hours later, which means incorrect TPM/RPM values are used.

Can we fix this with a more precise key name, including the current date as part of it?

We do this for lowest-latency routing:

precise_minute = f"{current_date}-{current_hour}-{current_minute}"
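
A minimal sketch of that key-naming approach (the helper and key layout are assumptions for illustration): scoping the key to the current minute means yesterday's counter can never collide with today's.

from datetime import datetime, timezone

def usage_key(model_group: str) -> str:
    # Hypothetical helper mirroring the lowest-latency precise_minute format.
    now = datetime.now(timezone.utc)
    current_date = now.strftime("%d-%m-%Y")
    current_hour = now.strftime("%H")
    current_minute = now.strftime("%M")
    precise_minute = f"{current_date}-{current_hour}-{current_minute}"
    return f"{model_group}:{precise_minute}"

print(usage_key("gpt-3.5-turbo"))  # e.g. gpt-3.5-turbo:21-05-2024-17-28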

@sumanth13131 (Contributor, Author) commented May 11, 2024

Setting a TTL is fine, right?
Please check this code change:
https://github.com/BerriAI/litellm/pull/3412/files

@krrishdholakia (Contributor) commented:

You're right. But instead of hardcoding it, can we make it a controllable param?

Like in lowest-latency routing:

class RoutingArgs(LiteLLMBase):

Then in the test, let's set it to a very low amount (5s?) -> and check that the key is set + evicted within the expected timeframe.
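
A minimal sketch of what the controllable param could look like, modeled on the lowest-latency strategy (the field name and default here are assumptions, not necessarily the merged code):

from pydantic import BaseModel

class RoutingArgs(BaseModel):  # stand-in for LiteLLMBase, a pydantic wrapper
    # TTL in seconds for the per-minute RPM/TPM counter keys; defaults to
    # the 1-minute usage window and can be overridden per router.
    ttl: int = 1 * 60

# The strategy can then merge user-supplied overrides:
args = RoutingArgs(**{"ttl": 10})
assert args.ttl == 10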

@sumanth13131 (Contributor, Author) commented:

Sure

@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I've updated the PR as we discussed.

@@ -134,6 +134,56 @@ async def test_acompletion_caching_on_router():
traceback.print_exc()
pytest.fail(f"Error occurred: {e}")

@pytest.mark.asyncio
async def test_completion_caching_on_router():
@krrishdholakia (Contributor) commented:

Hey @sumanth13131, how does this test your cache TTL change?

redis_port=os.environ["REDIS_PORT"],
cache_responses=True,
timeout=30,
routing_strategy_args={"ttl": 10},
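
(For context, the full configuration might look roughly like this sketch; only the four quoted arguments come from the diff, and the rest, including model_list and the Redis credentials, are assumptions.)

import os
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo"},
    }
]

router = Router(
    model_list=model_list,
    redis_host=os.environ["REDIS_HOST"],
    redis_password=os.environ["REDIS_PASSWORD"],
    redis_port=os.environ["REDIS_PORT"],
    cache_responses=True,
    timeout=30,
    routing_strategy="usage-based-routing",
    routing_strategy_args={"ttl": 10},
)
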
@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I followed the same approach as latency-based routing.
The default TTL value is 1 minute (60 seconds); for this test case alone, we set it to 10 seconds.
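
A minimal sketch of the set-then-evicted check suggested above (the key name is hypothetical; the real test exercises the router and inspects the keys the strategy writes):

import asyncio
import pytest
import redis

@pytest.mark.asyncio
async def test_usage_counter_key_evicted():
    r = redis.Redis(host="localhost", port=6379)
    key = "hypothetical-usage-counter-key"
    ttl = 10  # matches routing_strategy_args={"ttl": 10}

    r.set(key, 1, ex=ttl)         # simulate the strategy writing with a TTL
    assert r.exists(key) == 1     # key is present within the window
    await asyncio.sleep(ttl + 2)  # wait past the TTL
    assert r.exists(key) == 0     # key was evicted as expected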

@krrishdholakia merged commit 2cda5a2 into BerriAI:main on May 21, 2024.
2 checks passed