
usage-based-routing-ttl-on-cache #3412

Merged

Conversation

@sumanth13131 (Contributor) commented May 3, 2024

The RPM/TPM counter cache uses a 1-minute window policy, so it is better to set a TTL on that counter key.
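
A minimal sketch of the idea, using redis-py directly (the key layout and helper are illustrative, not litellm's actual internals): attach an expiry when the per-minute counter key is first written, so the counter cannot outlive its 1-minute window.

import redis

r = redis.Redis(host="localhost", port=6379)

def increment_rpm_counter(deployment_id: str, ttl_seconds: int = 60) -> int:
    # Hypothetical key layout: one counter per deployment.
    key = f"rpm:{deployment_id}"
    count = r.incr(key)
    if count == 1:
        # First write in this window: set a TTL so the key expires
        # instead of lingering at TTL -1 until LRU eviction.
        r.expire(key, ttl_seconds)
    return count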

@krrishdholakia (Contributor) commented:

The TTL for usage is 1 minute, because the TPM/RPM limits apply across a 1-minute window. @sumanth13131

Can you share a repro of the problem you're facing?

@krrishdholakia self-requested a review May 10, 2024 17:28
@krrishdholakia (Contributor) commented:

bump on this? @sumanth13131

@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I've attached a screenshot below. Currently, the usage-based strategy sets every key with the default TTL (-1), meaning the keys never expire until the Redis LRU policy evicts them.

PS: The same key name can recur the next day. If the quota (RPM/TPM) recorded under that key is already exhausted, no deployment is available for that time frame.

[Screenshot, 2024-05-11: Redis keys written by the usage-based strategy, all showing TTL -1]
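
For reference, the default-TTL behaviour is easy to confirm against Redis (the key name below is hypothetical; any key written without an expiry behaves the same):

import redis

r = redis.Redis(host="localhost", port=6379)

# A key written without an expiry reports TTL -1: it exists, but
# lingers until Redis's eviction policy removes it.
r.set("hypothetical-usage-counter-key", 42)
print(r.ttl("hypothetical-usage-counter-key"))  # -> -1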

@krrishdholakia (Contributor) commented:

@sumanth13131, so if I understand the problem:

  • Keys can have the same name 24+ hours later, which means incorrect TPM/RPM values are used.

Can we fix this with a more precise key name, including the current date as part of it?

We do this for lowest-latency routing:

precise_minute = f"{current_date}-{current_hour}-{current_minute}"
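
A minimal sketch of that key-naming approach (the helper and key layout are assumptions for illustration): scoping the key to the current minute means yesterday's counter can never collide with today's.

from datetime import datetime, timezone

def usage_key(model_group: str) -> str:
    # Hypothetical helper mirroring the lowest-latency precise_minute format.
    now = datetime.now(timezone.utc)
    current_date = now.strftime("%d-%m-%Y")
    current_hour = now.strftime("%H")
    current_minute = now.strftime("%M")
    precise_minute = f"{current_date}-{current_hour}-{current_minute}"
    return f"{model_group}:{precise_minute}"

print(usage_key("gpt-3.5-turbo"))  # e.g. gpt-3.5-turbo:21-05-2024-17-28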

@sumanth13131 (Contributor, Author) commented May 11, 2024

Setting a TTL is fine, right?
Please check this code change:
https://github.com/BerriAI/litellm/pull/3412/files

@krrishdholakia (Contributor) commented:

You're right. But instead of hardcoding it, can we make it a controllable param?

Like in lowest-latency routing:

class RoutingArgs(LiteLLMBase):

Then in the test, let's set it to a very low amount (5s?) -> and check that the key is set + evicted within the expected timeframe.
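
A minimal sketch of what the controllable param could look like, modeled on the lowest-latency strategy (the field name and default here are assumptions, not necessarily the merged code):

from pydantic import BaseModel

class RoutingArgs(BaseModel):  # stand-in for LiteLLMBase, a pydantic wrapper
    # TTL in seconds for the per-minute RPM/TPM counter keys; defaults to
    # the 1-minute usage window and can be overridden per router.
    ttl: int = 1 * 60

# The strategy can then merge user-supplied overrides:
args = RoutingArgs(**{"ttl": 10})
assert args.ttl == 10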

@sumanth13131 (Contributor, Author) commented:

Sure

@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I've updated the PR as we discussed.

@@ -134,6 +134,56 @@ async def test_acompletion_caching_on_router():
traceback.print_exc()
pytest.fail(f"Error occurred: {e}")

@pytest.mark.asyncio
async def test_completion_caching_on_router():
@krrishdholakia (Contributor) commented:

Hey @sumanth13131, how does this test your cache TTL change?

redis_port=os.environ["REDIS_PORT"],
cache_responses=True,
timeout=30,
routing_strategy_args={"ttl": 10},
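
(For context, the full configuration might look roughly like this sketch; only the four quoted arguments come from the diff, and the rest, including model_list and the Redis credentials, are assumptions.)

import os
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo"},
    }
]

router = Router(
    model_list=model_list,
    redis_host=os.environ["REDIS_HOST"],
    redis_password=os.environ["REDIS_PASSWORD"],
    redis_port=os.environ["REDIS_PORT"],
    cache_responses=True,
    timeout=30,
    routing_strategy="usage-based-routing",
    routing_strategy_args={"ttl": 10},
)
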
@sumanth13131 (Contributor, Author) commented:

Hi @krrishdholakia,
I followed the same approach as latency-based routing.
The default TTL value is 1 minute (60 seconds); for this test case alone, we set it to 10 seconds.
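
A minimal sketch of the set-then-evicted check suggested above (the key name is hypothetical; the real test exercises the router and inspects the keys the strategy writes):

import asyncio
import pytest
import redis

@pytest.mark.asyncio
async def test_usage_counter_key_evicted():
    r = redis.Redis(host="localhost", port=6379)
    key = "hypothetical-usage-counter-key"
    ttl = 10  # matches routing_strategy_args={"ttl": 10}

    r.set(key, 1, ex=ttl)         # simulate the strategy writing with a TTL
    assert r.exists(key) == 1     # key is present within the window
    await asyncio.sleep(ttl + 2)  # wait past the TTL
    assert r.exists(key) == 0     # key was evicted as expected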

@krrishdholakia merged commit 2cda5a2 into BerriAI:main on May 21, 2024.
2 checks passed