#### Resources: [Contextual RAG](https://python.langchain.com/docs/how_to/contextual_compression/)

In [None]:
%%capture
!pip install -U langchain torch langchain_community transformers sentence-transformers PyPDF2 chromadb faiss-cpu

In [None]:
!huggingface-cli login



    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) y
Token is valid (permission: write

In [None]:
import torch
from PyPDF2 import PdfReader
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Import from the split packages (langchain_core, langchain_community,
# langchain_text_splitters); the older `langchain.*` paths are deprecated.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import HuggingFacePipeline
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor



## Data Ingestion

In [None]:
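file_path = '/content/VIGGO.pdf'
reader = PdfReader(file_path)

# Collect the text of each page. extract_text() can return an empty
# string for pages without a text layer, so skip those pages.
documents = []
for page in reader.pages:
    text = page.extract_text()
    if text and text.strip():
        documents.append({"text": text})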

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents([Document(page_content=d['text']) for d in documents])
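
To make the `chunk_size=1000` and `chunk_overlap=150` settings more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and sentences, which this hypothetical `sliding_chunks` helper ignores. The sample string and the small sizes are made up for readability.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.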

## Embeddings

In [None]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
device = "cpu"
normalize = False

embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},
    encode_kwargs={"normalize_embeddings": normalize}
)
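
The `normalize_embeddings` flag controls whether each vector is scaled to unit length. The sketch below uses made-up 2-D vectors rather than real embeddings, and shows why the flag can matter: after L2 normalization, a plain dot product equals cosine similarity, while the raw dot product also depends on vector length.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]  # toy vectors, not real embeddings
b = [4.0, 3.0]

raw_dot = dot(a, b)                                # depends on magnitude
unit_dot = dot(l2_normalize(a), l2_normalize(b))   # magnitude removed
print(raw_dot, unit_dot, cosine(a, b))
```

Whether normalization changes retrieval results depends on the distance metric the vector store uses, so treat this as background rather than a recommendation for a particular setting.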



## Indexing

In [None]:
vectorstore = Chroma.from_documents(docs, embeddings)

## Retriever



In [None]:
retriever = vectorstore.as_retriever()
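
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The following is a rough sketch of that top-k step with hand-written 2-D vectors. The `top_k` helper, the toy data, and the use of cosine similarity are all assumptions for illustration; Chroma maintains its own index and distance metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=4):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, vector) pairs standing in for embedded chunks
indexed = [
    ("chunk about slots", [1.0, 0.0]),
    ("chunk about dialogue acts", [0.7, 0.7]),
    ("chunk about evaluation", [0.0, 1.0]),
]

results = top_k([0.9, 0.1], indexed, k=2)
print(results)
```

With the real retriever, the number of returned chunks can be set through `vectorstore.as_retriever(search_kwargs={"k": 4})`.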

## Contextual Retriever

### LLM

In [None]:
model_name = "meta-llama/Meta-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./cache")

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir="./cache",
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True
)


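pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,          # Enable sampling
    # Return only the newly generated text. Without this, the prompt is
    # echoed back and ends up inside the "compressed" documents and the answer.
    return_full_text=False,
)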

llm = HuggingFacePipeline(pipeline=pipe)
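
Loading the model with `torch_dtype=torch.float16` halves the memory needed for the weights compared with 32-bit floats. Here is a rough back-of-the-envelope estimate, assuming about 8 billion parameters (an approximation taken from the model's name) and ignoring activations and the KV cache. The `weight_memory_gb` helper is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 8e9  # assumption: roughly 8 billion parameters

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16 uses 2 bytes per parameter
fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # float32 uses 4 bytes per parameter
print(f"float16: ~{fp16_gb:.0f} GB, float32: ~{fp32_gb:.0f} GB")
```

Actual usage during generation will be higher than this estimate, which is one reason `device_map="auto"` is useful: it lets the weights spill over to other devices or the CPU when a single GPU is too small.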


Device set to use cuda:0


### Compression Retriever

In [None]:
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
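
The idea behind contextual compression can be sketched without an LLM: every retrieved document is passed through an extractor together with the query, and documents for which nothing relevant is found are dropped. In the sketch below, `compress_documents` and `keyword_extract` are hypothetical stand-ins; the real `LLMChainExtractor` prompts the LLM to do the extraction and asks it to return `NO_OUTPUT` when nothing is relevant.

```python
from typing import Callable

NO_OUTPUT = "NO_OUTPUT"  # sentinel meaning "nothing relevant in this document"


def compress_documents(
    docs: list[str], query: str, extract: Callable[[str, str], str]
) -> list[str]:
    """Run the extractor on each document and keep only non-empty extracts."""
    compressed = []
    for doc in docs:
        extracted = extract(query, doc).strip()
        if extracted and extracted != NO_OUTPUT:
            compressed.append(extracted)
    return compressed


def keyword_extract(query: str, doc: str) -> str:
    """Toy stand-in for the LLM: keep sentences that share a word with the query."""
    query_words = {w.lower().strip("?.,'") for w in query.split()}
    kept = [
        sentence.strip()
        for sentence in doc.split(".")
        if query_words & {w.lower() for w in sentence.split()}
    ]
    return ". ".join(kept) if kept else NO_OUTPUT


docs = [
    "List slots can hold several values. The dataset has nine dialogue acts.",
    "Ratings range from poor to excellent.",
]
result = compress_documents(docs, "What can list slots hold?", keyword_extract)
print(result)
```

Because the extractor runs once per retrieved document, the compression retriever makes one LLM call per document before the final answer is generated.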

In [None]:
# Inspect the compressed documents for a sample query
compressed_docs = compression_retriever.invoke("How do ViGGO's 'List' slots differ from other NLG datasets like E2E and Hotel?")
compressed_docs


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[Document(metadata={}, page_content="Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: How do ViGGO's 'List' slots differ from other NLG datasets like E2E and Hotel?\n> Context:\n>>>\nDA types, on the other hand, is skewed slightly to-\nward fewer inform DA instances and a higher pro-\nportion of the less prevalent DAs in the validation\nand test sets (see Figure 2). With the exact parti-\ntion sizes indicated in the diagram, the ﬁnal ratio\nof samples is approximately 7:5 : 1 : 1:5.\n2.5 ViGGO vs. E2E\nOur new dataset was constructed under different\nconstraints than the E2E dataset. First, in ViGGO\nwe did notallow any omissions of slot mentions,\nas those are not justiﬁable for data-to-text gen-\neration with no previous context, and it makes\nthe evaluation ambiguous. Second, the 

In [None]:

template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}

Question: {input}

Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

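def format_docs(docs):
    # Join the page contents so the prompt receives plain text
    # rather than the repr of a list of Document objects.
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": compression_retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)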

## Querying

In [None]:
response = rag_chain.invoke("How do ViGGO's 'List' slots differ from other NLG datasets like E2E and Hotel?")
print(response)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Human: 
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: [Document(metadata={}, page_content='Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: How do ViGGO\'s \'List\' slots differ from other NLG datasets like E2E and Hotel?\n> Context:\n>>>\nDA types, on the other hand, is skewed slightly to-\nward fewer inform DA instances and a higher pro-\nportion of the less prevalent DAs in the validation\nand test sets (see Figure 2). With the exact parti-\ntion sizes indicated in the diagram, the ﬁnal ratio\nof samples is approximately 7:5 : 1 : 1:5.\n2.5 ViGGO vs. E2E\nOur new dataset was constructed under different\nconstraints than the E2E dataset. First, in ViGGO\nwe did n
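
The response above starts with `Human:` and repeats the whole prompt, most likely because the text-generation pipeline returns the prompt together with the completion by default. One way to recover just the answer after the fact is sketched here as a hypothetical `extract_answer` helper. It is a heuristic: it keeps the text after the last `Answer:` marker, so it can misfire if the model itself writes `Answer:` again. The sample string is made up.

```python
def extract_answer(generated: str, marker: str = "Answer:") -> str:
    """Return the text after the last occurrence of marker.

    If the marker is absent, the input is returned unchanged (stripped).
    """
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


# Made-up example of a response that echoes the prompt
echoed = (
    "Human: \n"
    "Context: some context\n\n"
    "Question: What is ViGGO?\n\n"
    "Answer:\n A video game dataset."
)
answer = extract_answer(echoed)
print(answer)
```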