<div align="center">
  <img src="https://media.licdn.com/dms/image/v2/D5622AQEgdGokK9B4QQ/feedshare-shrink_800/B56ZRpOiggGQAk-/0/1736932208572?e=2147483647&v=beta&t=GdR_RfRuudlFrspbx5MLaWGn9cDkuUVZE-BaS8EDem8" alt="Http cat">
</div>

Cache-Augmented Generation (CAG)
--------------------------------
CAG enhances LLM responses by storing and reusing previous query-response pairs in a cache to:

-   ✔ Reduce redundant computations
-   ✔ Improve response time
-   ✔ Enhance consistency
-   ✔ Lower API costs

🛠 How Does CAG Work?
----------------------

-   Cache Lookup: Before generating, the pipeline checks whether a key derived from the query already exists in the cache.

-   Cache Hit → Returns a stored response immediately.

-   Cache Miss → Calls the LLM, generates a response, and stores it in the cache for future use.
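
The lookup, hit, and miss steps above can be sketched in a few lines. This is a minimal illustration, not the full pipeline: `cached_generate` and `fake_llm` are hypothetical names, the stand-in function replaces a real LLM call, and SHA-256 is used for the key here as an assumption (any stable hash would do).

```python
import hashlib
from typing import Callable, Dict, List


def cached_generate(
    query: str,
    cache: Dict[str, str],
    generate: Callable[[str], str],
) -> str:
    """Return the cached answer for `query`, generating and storing it on a miss."""
    key = hashlib.sha256(query.encode("utf-8")).hexdigest()
    if key in cache:
        # Cache hit: skip generation entirely
        return cache[key]
    # Cache miss: call the generator once and remember the result
    answer = generate(query)
    cache[key] = answer
    return answer


# Hypothetical stand-in for an LLM call; it records how often it is invoked
calls: List[str] = []


def fake_llm(query: str) -> str:
    calls.append(query)
    return f"answer to: {query}"


cache: Dict[str, str] = {}
first = cached_generate("What is CAG?", cache, fake_llm)   # miss: calls fake_llm
second = cached_generate("What is CAG?", cache, fake_llm)  # hit: served from cache
```

Both calls return the same string, and the stand-in generator runs only once.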

🚀 Advanced CAG Techniques
----------------------------
-   🔹 Hybrid CAG-RAG: Use cache first, then fall back on retrieval.
-   🔹 LRU Cache: Evict the least recently used entries once the cache reaches a size limit, so memory use stays bounded.
-   🔹 Distributed Caching: Store responses across multiple nodes for scalability.
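
As an illustration of the LRU idea, here is a small sketch built on `collections.OrderedDict` from the standard library. The class name `LRUResponseCache` and the `max_entries` parameter are assumptions made for this example; they are not part of any library used in this document.

```python
from collections import OrderedDict
from typing import Optional


class LRUResponseCache:
    """Bounded response cache that evicts the least recently used entry."""

    def __init__(self, max_entries: int = 128) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        """Return the stored response, or None on a miss."""
        if key not in self._entries:
            return None
        # A read counts as a use, so move the entry to the "recent" end
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key: str, response: str) -> None:
        """Store a response, evicting the oldest entry if the cache is full."""
        self._entries[key] = response
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_entries:
            # popitem(last=False) removes the least recently used entry
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)


lru = LRUResponseCache(max_entries=2)
lru.put("q1", "a1")
lru.put("q2", "a2")
lru.get("q1")        # q1 is now the most recently used entry
lru.put("q3", "a3")  # cache is full, so q2 (least recently used) is evicted
```

A class like this could stand in for the plain dictionary cache, at the cost of losing older answers once the limit is reached.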

| Feature               | CAG (Cache-Augmented Generation)         | RAG (Retrieval-Augmented Generation)   |
|----------------------|----------------------------------|----------------------------------|
| **Core Mechanism**   | Uses caching to store and reuse past responses | Dynamically retrieves documents from an external knowledge base |
| **Response Time**    | Faster, as cached responses are instantly retrieved | Slower, due to real-time retrieval and processing |
| **Knowledge Updates** | Limited; requires cache updates for new information | Dynamically retrieves the latest information |
| **Storage**          | Requires local storage for caching responses | Requires storage for vector embeddings (FAISS, ChromaDB, etc.) |
| **Use Case**         | Best for queries with repetitive patterns | Best for queries requiring real-time knowledge retrieval |
| **Computational Cost** | Lower, as it avoids frequent LLM calls | Higher, as each query involves retrieval and generation |


REQUIRED LIBRARIES
--------------------

-   pypdf → Extracts text from PDFs

-   LangChain → Framework used to build the CAG pipeline

-   FAISS → Vector index library for similarity search

-   Ollama → Runs the open-weight LLM and embedding model locally

-   pickle → Standard-library module used to persist cached responses to disk

-   hashlib → Standard-library module used to generate hash-based cache keys


PACKAGE INSTALLATION 
-----------------------
```bash
pip install pypdf
pip install -U langchain-community
pip install faiss-cpu
pip install -U langchain-ollama
# pickle and hashlib ship with Python's standard library; no install needed
```


```python
import hashlib
import pickle

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaEmbeddings
from langchain_ollama.llms import OllamaLLM
from langchain_text_splitters import CharacterTextSplitter

CACHE_FILE = "cache.pkl"


# Load the existing cache from disk, or start with an empty one
def load_cache():
    try:
        with open(CACHE_FILE, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError):
        return {}


# Save the in-memory cache to a file
def save_cache():
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)


# Dictionary-based cache: query hash -> response text
cache = load_cache()


# Load a PDF and split it into overlapping chunks
def load_pdf(file_path):
    loader = PyPDFLoader(file_path)
    pages = loader.load()

    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=500,
        chunk_overlap=100,
        length_function=len,
    )
    return text_splitter.split_documents(pages)


# Create embeddings and a FAISS vector store
def create_vectorstore(docs):
    embeddings = OllamaEmbeddings(model="nomic-embed-text")
    return FAISS.from_documents(docs, embeddings)


# Generate a response, consulting the cache first
def generate_response(query, vectorstore):
    # MD5 serves only as a compact cache key here, not as a security measure.
    # The key depends on the query alone, so clear cache.pkl when the PDF changes.
    query_hash = hashlib.md5(query.encode("utf-8")).hexdigest()

    if query_hash in cache:
        print("✅ Cache Hit! Returning stored response.")
        return cache[query_hash]

    print("❌ Cache Miss. Retrieving relevant context and generating response...")
    retriever = vectorstore.as_retriever()
    context_docs = retriever.invoke(query)
    context = "\n".join(doc.page_content for doc in context_docs)

    llm = OllamaLLM(model="llama3.2")
    prompt = PromptTemplate(
        template=(
            "Use the following context to answer the query:\n"
            "Context: {context}\n"
            "Query: {query}\n"
            "Response:"
        ),
        input_variables=["context", "query"],
    )
    # Piping the prompt into the LLM returns the answer as a plain string
    chain = prompt | llm
    response = chain.invoke({"context": context, "query": query})

    cache[query_hash] = response
    save_cache()
    return response


# Example usage
pdf_path = r"C:\Users\santh\OneDrive\Desktop\MDTM37\python_assignment (1).pdf"  # Update with your PDF path
docs = load_pdf(pdf_path)
vectorstore = create_vectorstore(docs)
query = "What is the main topic of the document?"
print(generate_response(query, vectorstore))
```


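```python
# Same query as before: expected to be served from the cache
query = "What is the main topic of the document?"
print(generate_response(query, vectorstore))

# A differently worded query hashes to a new key, so the first call is a miss
query = "What is the main topic ?"
print(generate_response(query, vectorstore))

# Repeating that query is then a cache hit
query = "What is the main topic ?"
print(generate_response(query, vectorstore))
```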
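
Because `generate_response` hashes the exact query string, small differences in case or spacing produce different keys and therefore cache misses. One possible mitigation is to normalize the query before hashing. The sketch below is an assumption for illustration, not part of the pipeline above: `make_cache_key` is a hypothetical helper, and it uses SHA-256 instead of MD5.

```python
import hashlib
import re


def make_cache_key(query: str) -> str:
    """Build a cache key that ignores case, extra whitespace, and spaces before punctuation."""
    normalized = query.strip().lower()
    # Collapse runs of whitespace into a single space
    normalized = re.sub(r"\s+", " ", normalized)
    # Remove a space that precedes punctuation, e.g. "topic ?" -> "topic?"
    normalized = re.sub(r"\s+([?!.,;:])", r"\1", normalized)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


key_a = make_cache_key("What is the main topic ?")
key_b = make_cache_key("  what is the MAIN topic?  ")
key_c = make_cache_key("What is the main topic of the document?")
```

Here `key_a` and `key_b` are identical, while `key_c` differs because the wording itself differs. Matching paraphrases like that would need a semantic comparison (for example, on embeddings), which is beyond this sketch.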