# Different Kinds of Caching to Reduce Latency in GenAI Applications

Several caching techniques can reduce latency: [KV-Caching](https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/unleashing-ptu-token-throughput-with-kv-cache-friendly-prompt-on/ba-p/4170161), [Semantic Caching with APIM](https://learn.microsoft.com/en-us/azure/api-management/azure-openai-enable-semantic-caching), and [Semantic Caching with Redis](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-cache-for-redis/cache-tutorial-semantic-cache.md).


This notebook explores Semantic Caching with Redis.

**References:**
1. https://github.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/tree/main
2. https://github.com/Azure/The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications/blob/main/notebooks-with-techniques/semantic-caching/semantic-caching.ipynb
3. https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-cache-for-redis/cache-tutorial-semantic-cache.md
4. https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-cache-for-redis/quickstart-create-redis-enterprise.md

### Install Python Libraries

In [None]:

%pip install openai langchain langchain-openai redis tiktoken python-dotenv

In [14]:
#%pip install -U langchain-openai

### Configure LLM Models

- Import libraries
- Configure access information and paths
- Configure model parameters
- Set Redis connection
- Configure Azure Cache for Redis to be used as a semantic cache
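
The configuration steps above depend on several environment variables being present. As a minimal sketch, a helper like the one below could fail fast when one is missing. The name `require_env` and the `REQUIRED` list are illustrative assumptions, not part of the original notebook; the variable names themselves are the ones this notebook reads.

```python
import os


def require_env(names, environ=None):
    """Return {name: value} for the given variables.

    Raises RuntimeError listing every variable that is unset or empty.
    `require_env` is a hypothetical helper, not a library function.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


# Environment variables read by this notebook.
REQUIRED = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "REDIS_ENDPOINT",
    "REDIS_PASSWORD",
]
```

Calling `require_env(REQUIRED)` right after `load_dotenv()` would surface a missing setting before any client is constructed.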

In [1]:
import os
import time

import langchain
import openai
import redis
from dotenv import load_dotenv
from langchain.cache import RedisSemanticCache
from langchain.globals import set_llm_cache
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Load environment variables
load_dotenv()

AZURE_ENDPOINT=os.getenv("AZURE_OPENAI_ENDPOINT")
API_KEY=os.getenv("AZURE_OPENAI_API_KEY")
API_VERSION=os.getenv("AZURE_OPENAI_API_VERSION")

LLM_MODEL_NAME="gpt-4o"

EMBEDDINGS_MODEL_NAME='text-embedding-ada-002'

REDIS_ENDPOINT=os.getenv("REDIS_ENDPOINT")
REDIS_PASSWORD=os.getenv("REDIS_PASSWORD")



In [2]:
# AzureOpenAIEmbeddings (created below without explicit credentials) reads these environment variables.

os.environ["OPENAI_API_VERSION"] = API_VERSION
os.environ["AZURE_OPENAI_ENDPOINT"] = AZURE_ENDPOINT
os.environ["AZURE_OPENAI_API_KEY"] = API_KEY

In [3]:
llm = AzureChatOpenAI(
    deployment_name=LLM_MODEL_NAME,
    model_name="gpt-4o",
    openai_api_key=API_KEY,
    azure_endpoint=AZURE_ENDPOINT,
    openai_api_version=API_VERSION,
)

In [4]:
from langchain_openai import AzureOpenAIEmbeddings
embeddings = AzureOpenAIEmbeddings(
    model=EMBEDDINGS_MODEL_NAME,
)

In [5]:
# This example assumes TLS is enabled. If it is not, use "redis://" instead of "rediss://".
redis_url = "rediss://:" + REDIS_PASSWORD + "@" + REDIS_ENDPOINT

# Set up the semantic cache for the LLM.
set_llm_cache(RedisSemanticCache(redis_url=redis_url, embedding=embeddings, score_threshold=0.05))

# Note: score_threshold controls how sensitive the semantic cache is.
# The lower the value, the less likely it is that a cached result is used.
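
The connection URL above is built by plain string concatenation. If the Redis access key contains characters that have special meaning in a URL (for example `/`), the URL may not parse as intended. The sketch below percent-encodes the password first; `build_redis_url` is a hypothetical helper, and it assumes the client percent-decodes the password when parsing the URL.

```python
from urllib.parse import quote


def build_redis_url(password, endpoint, use_tls=True):
    """Build a Redis connection URL with a percent-encoded password.

    `endpoint` is expected in "host:port" form, as in REDIS_ENDPOINT.
    This is an illustrative helper, not part of any library.
    """
    scheme = "rediss" if use_tls else "redis"
    # safe="" makes quote() encode every reserved character, including "/".
    return f"{scheme}://:{quote(password, safe='')}@{endpoint}"
```

With this helper, the cell above could use `redis_url = build_redis_url(REDIS_PASSWORD, REDIS_ENDPOINT)` in place of the concatenation.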

### Run the LLM
Try running the cells again with different queries to see what is cached and what is not.

### First request (not cached)

In [6]:
%%time
response = llm.invoke("Write a poem about cute puppies.")
print(response)




content="In a world of joy where dreams take flight,\nPuppies prance in morning light.\nWith eyes so bright and hearts so pure,\nTheir playful spirits, a delightful cure.\n\nTiny paws that pitter-pat,\nWagging tails that go rat-tat-tat.\nSoft fur like a gentle cloud,\nTheir happy barks, both clear and loud.\n\nThey chase their shadows, tumble, and play,\nBringing sunshine to any day.\nWith every lick and nuzzle sweet,\nThey make our hearts skip a beat.\n\nIn their presence, worries fade,\nA gift of love they serenade.\nBoundless energy, endless cheer,\nInnocence that knows no fear.\n\nCurled up snug in cozy beds,\nDreams of bones and soft grass spreads.\nA bundle of joy, a furry friend,\nIn puppy love, there's no pretend.\n\nSo here's to puppies, small and grand,\nWith their magic, life feels planned.\nA touch of wonder, a dash of glee,\nIn their eyes, the world we see." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 207, 'prompt_tokens': 14

### Second request, exact match (cached)
Because the second request is served from the cache, it returns in under 1 s. The prompt is identical to the first one, character for character.
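
To see the cold-versus-cached timing pattern outside a notebook (where `%%time` is not available), a small stand-in can help. The sketch below is not how `RedisSemanticCache` works internally: it uses Python's in-process `functools.lru_cache` for exact-match caching, and `fake_llm` with its 0.2 s sleep is a hypothetical placeholder for a real model call.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=None)
def fake_llm(prompt):
    """Hypothetical stand-in for a model call; sleeps to simulate latency."""
    time.sleep(0.2)
    return f"response to: {prompt}"


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# The first call pays the simulated latency; the identical second call is a cache hit.
first, cold = timed(fake_llm, "Write a poem about cute puppies.")
second, warm = timed(fake_llm, "Write a poem about cute puppies.")
print(f"cold: {cold:.3f}s, warm: {warm:.6f}s")
```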

In [7]:
%%time
response = llm.invoke("Write a poem about cute puppies.")
print(response)

content="In a world of joy where dreams take flight,\nPuppies prance in morning light.\nWith eyes so bright and hearts so pure,\nTheir playful spirits, a delightful cure.\n\nTiny paws that pitter-pat,\nWagging tails that go rat-tat-tat.\nSoft fur like a gentle cloud,\nTheir happy barks, both clear and loud.\n\nThey chase their shadows, tumble, and play,\nBringing sunshine to any day.\nWith every lick and nuzzle sweet,\nThey make our hearts skip a beat.\n\nIn their presence, worries fade,\nA gift of love they serenade.\nBoundless energy, endless cheer,\nInnocence that knows no fear.\n\nCurled up snug in cozy beds,\nDreams of bones and soft grass spreads.\nA bundle of joy, a furry friend,\nIn puppy love, there's no pretend.\n\nSo here's to puppies, small and grand,\nWith their magic, life feels planned.\nA touch of wonder, a dash of glee,\nIn their eyes, the world we see." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 207, 'prompt_tokens': 14

### Third request, semantically similar match (cached)
A cache hit does not require an identical input string: semantically similar prompts can also match, saving both cost and time. Whether a rephrased prompt hits the cache depends on `score_threshold`.
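
The interaction between similarity and the threshold can be illustrated with a toy model. The sketch below is an assumption-laden stand-in, not the Redis implementation: `toy_embed` uses bag-of-words counts instead of a real embedding model, `ToySemanticCache` keeps entries in a Python list, and a lookup is a hit when the cosine distance to the nearest stored prompt is at most `score_threshold`.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Bag-of-words counts; a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical in-memory cache keyed by prompt similarity."""

    def __init__(self, score_threshold):
        self.score_threshold = score_threshold
        self.entries = []  # list of (embedding, response) pairs

    def update(self, prompt, response):
        self.entries.append((toy_embed(prompt), response))

    def lookup(self, prompt):
        """Return the nearest cached response, or None if it is too far away."""
        query = toy_embed(prompt)
        best_distance, best_response = None, None
        for embedding, response in self.entries:
            distance = cosine_distance(query, embedding)
            if best_distance is None or distance < best_distance:
                best_distance, best_response = distance, response
        if best_distance is not None and best_distance <= self.score_threshold:
            return best_response
        return None


strict = ToySemanticCache(score_threshold=0.05)
loose = ToySemanticCache(score_threshold=0.3)
for cache in (strict, loose):
    cache.update("Write a poem about cute puppies.", "<cached poem>")

rephrased = "Write a poem about cute little dogs."
print("strict:", strict.lookup(rephrased))
print("loose:", loose.lookup(rephrased))
```

In this toy setup the rephrased prompt misses under the strict threshold and hits under the looser one. Distances from a real embedding model will differ, so the threshold needs tuning against actual prompts.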

In [10]:
%%time
response = llm.invoke("Write a poem about cute little dogs.")
print(response)

content="In a world of bustling days and nights,\nWhere worries weave their tangled plight,\nThere exists a joy that's pure and bright—\nThe charm of dogs, so sweet, so light.\n\nWith noses twitching, tails that wag,\nThrough fields of daisies, they bound and brag.\nTiny paws that dance in play,\nChasing butterflies away.\n\nTheir eyes, two orbs of gleaming trust,\nIn every glance, a love robust.\nEars that perk at every sound,\nIn their presence, peace is found.\n\nThey greet each morning with a cheer,\nTheir happiness so crystal-clear.\nA bark, a yip, a playful tug,\nEach moment shared, a heartfelt hug.\n\nIn cozy nooks or sunny spots,\nThey curl in dreams, in sleep, they plot\nAdventures grand in lands unknown,\nYet always find their way back home.\n\nTheir fur, a palette soft and warm,\nIn shades that nature does adorn.\nFrom snowy white to midnight black,\nA spectrum on a fluffy track.\n\nOh, little dogs, with hearts so full,\nYou make the world a delightful lull.\nIn every wag, i