**# Step 1: Install and start Redis**

In [1]:
# Redis is not pre-installed on Colab, so install it with apt-get (-y answers the confirmation prompt):
!apt-get update
!apt-get install -y redis-server



In [2]:
#After installing Redis, start the server with the following command:
!redis-server --daemonize yes
#This starts the Redis server in the background (daemon mode).

In [3]:
# Verify that Redis is running by pinging it:
!redis-cli ping
# If everything is working correctly, the response is PONG:

PONG


In [4]:
#To interact with Redis in Python, install the redis Python package:
!pip install redis

Collecting redis
  Downloading redis-5.2.1-py3-none-any.whl.metadata (9.1 kB)
Downloading redis-5.2.1-py3-none-any.whl (261 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m261.5/261.5 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: redis
Successfully installed redis-5.2.1


**# Step 2: RAG cache with an open-source LLM, backed by Redis**

In [5]:
!pip install transformers



In [6]:
# Import necessary libraries
import hashlib  # For generating unique hash keys for caching
import redis  # Redis client for caching
from transformers import pipeline  # Hugging Face library for text generation
from typing import Optional  # For optional type hinting

class RAGCache:
    def __init__(self, redis_host: str = 'localhost', redis_port: int = 6379):
        """Initialize Redis connection."""
        self.redis_client = redis.StrictRedis(host=redis_host, port=redis_port, decode_responses=True)

    @staticmethod
    def generate_key(query: str) -> str:
        """Generate a unique cache key for a query using hashing."""
        return hashlib.sha256(query.encode()).hexdigest()

    def get_cached_response(self, query: str) -> Optional[str]:
        """Retrieve the cached response for a query."""
        key = self.generate_key(query)
        return self.redis_client.get(key)

    def cache_response(self, query: str, response: str, ttl: int = 3600):
        """Cache the response with a Time-To-Live (TTL)."""
        key = self.generate_key(query)
        self.redis_client.setex(key, ttl, response)

    def process_query(self, query: str, generate_response_func):
        """
        Process the query:
        - Check if the response is cached.
        - If not, generate the response, cache it, and return.
        """
        cached_response = self.get_cached_response(query)
        if cached_response is not None:  # an empty cached string still counts as a hit
            print("Cache Hit 🚀")
            return cached_response

        print("Cache Miss 🔄")
        response = generate_response_func(query)
        self.cache_response(query, response)
        return response

# Load the LLM using Hugging Face Transformers
def generate_response_with_llm(query: str) -> str:
    """
    Generate a response using an open-source LLM.
    """
    # Hugging Face text-generation pipeline; it is rebuilt on every call, i.e. on every cache miss
    model_pipeline = pipeline("text-generation", model="gpt2")  # Replace 'gpt2' with your preferred model
    generated = model_pipeline(query, max_length=100, num_return_sequences=1)
    return generated[0]['generated_text']
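
Because generate_response_with_llm builds the pipeline inside the function, the model is reloaded on every cache miss. One possible way to avoid that is to memoize the loader. The sketch below is only an illustration: `get_generator`, `generate_response` and `load_count` are hypothetical names, and a plain lambda stands in for the real `pipeline("text-generation", model="gpt2")` call so the example runs without downloading a model.

```python
from functools import lru_cache
from typing import Callable

load_count = 0  # counts how many times the (stand-in) model is built


@lru_cache(maxsize=1)
def get_generator() -> Callable[[str], str]:
    """Build the generator once; later calls return the same cached object."""
    global load_count
    load_count += 1
    # Stand-in for: pipeline("text-generation", model="gpt2")
    return lambda prompt: prompt + " ..."


def generate_response(query: str) -> str:
    """Generate text using the shared, lazily built generator."""
    return get_generator()(query)


generate_response("first query")
generate_response("second query")
print(load_count)  # 1
```

In the notebook, the same pattern would wrap the real pipeline construction, so repeated cache misses pay the model loading cost only once per session.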

In [7]:
# Execute the RAGCache functionality
# Initialize RAGCache with Redis server details
rag_cache = RAGCache(redis_host='127.0.0.1', redis_port=6379)

In [8]:
# Defining a sample query
query = "What is Retrieval-Augmented Generation (RAG)?"
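
The cache key is the SHA-256 digest of the exact query string, so two queries that differ only in case or spacing get different keys and never share a cached answer. The sketch below shows this, along with one possible normalization step. `normalized_key` is a hypothetical helper, not part of RAGCache, and whether lowercasing is appropriate depends on the application.

```python
import hashlib


def generate_key(query: str) -> str:
    """Same hashing as RAGCache.generate_key: SHA-256 hex digest of the raw query."""
    return hashlib.sha256(query.encode()).hexdigest()


def normalized_key(query: str) -> str:
    """Hypothetical variant: collapse whitespace and lowercase before hashing."""
    canonical = " ".join(query.split()).lower()
    return hashlib.sha256(canonical.encode()).hexdigest()


q1 = "What is Retrieval-Augmented Generation (RAG)?"
q2 = "what is  retrieval-augmented generation (RAG)? "

print(len(generate_key(q1)))                     # 64
print(generate_key(q1) == generate_key(q2))      # False
print(normalized_key(q1) == normalized_key(q2))  # True
```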

In [9]:
# Process the query using the caching mechanism
response = rag_cache.process_query(query, generate_response_with_llm)

Cache Miss 🔄




In [10]:
print(response)

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is the next generation digital media, including digital media that is designed or constructed to be used simultaneously with the content or other devices that the user will be consuming on their mobile device.

Why use "G" or "RTG?"

RTG is an electronic media, particularly mobile media devices, and is primarily used for video, audio, and text-only communications


In [11]:
# Process the same query again to check that caching works; this time it should be a cache hit
response1 = rag_cache.process_query(query, generate_response_with_llm)

Cache Hit 🚀
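
The hit/miss behaviour above can also be checked without a running Redis server. The following is a minimal sketch under stated assumptions: `DictStore` is a hypothetical in-memory stand-in that exposes only the two client calls RAGCache relies on (`get` and `setex`) and ignores the TTL, and `process_query` here is a standalone copy of the method's logic, not the class itself.

```python
import hashlib
from typing import Callable, Dict, List, Optional


class DictStore:
    """Hypothetical in-memory stand-in exposing get and setex, as RAGCache uses them."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def setex(self, key: str, ttl: int, value: str) -> None:
        # The TTL is accepted but ignored here; real Redis would expire the key.
        self._data[key] = value


def process_query(store: DictStore, query: str, generate: Callable[[str], str]) -> str:
    """Return the cached response if present, otherwise generate and cache it."""
    key = hashlib.sha256(query.encode()).hexdigest()
    cached = store.get(key)
    if cached is not None:
        return cached
    response = generate(query)
    store.setex(key, 3600, response)
    return response


calls: List[str] = []  # records every time the generator is actually invoked


def slow_generator(query: str) -> str:
    calls.append(query)
    return f"answer to: {query}"


store = DictStore()
first = process_query(store, "What is RAG?", slow_generator)
second = process_query(store, "What is RAG?", slow_generator)
print(first == second, len(calls))  # True 1
```

The generator runs once for the two identical queries, which is the same effect the Cache Miss and Cache Hit messages show in the notebook.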


In [12]:
print(response1)

What is Retrieval-Augmented Generation (RAG)?

Retrieval-Augmented Generation (RAG) is the next generation digital media, including digital media that is designed or constructed to be used simultaneously with the content or other devices that the user will be consuming on their mobile device.

Why use "G" or "RTG?"

RTG is an electronic media, particularly mobile media devices, and is primarily used for video, audio, and text-only communications


**# Step 3: Stop Redis**

In [13]:
# Stop the Redis server (optional, to free resources)
!redis-cli shutdown