## Basic RAG Pipeline

In this notebook we will set up a basic retrieval-augmented generation (RAG) system using **LangChain** and Meta's **Llama** 3.2 1B Instruct model from Hugging Face.

We will use the **FAISS** vector database to store the document index.

## Import dependencies

In [2]:
# Core dependencies
import torch
from torch import cuda
import gc
import os
import warnings
import pickle
from time import time

# ML/NLP dependencies
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain 
from langchain.prompts import PromptTemplate 
from accelerate import Accelerator



## Load LLM models

The **sentence transformer model** generates vector embeddings from the documents. The vector store uses these embeddings to index the documents and to compute the semantic (vector) similarity between a query and the stored documents.

The **Llama Instruct model** is used for text generation -- generating the response to the user query.
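
To make the idea of semantic similarity concrete, here is a minimal sketch of cosine similarity, one common way to compare embeddings. The three-dimensional vectors are made up for illustration (real sentence-transformer embeddings have hundreds of dimensions), and `cosine_similarity` is a helper written for this example, not part of the pipeline.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings", invented for this example.
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]   # points in nearly the same direction as the query
doc_far = [0.0, 1.0, 0.0]     # orthogonal to the query

# The closer document scores higher than the unrelated one.
print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))
```

A score near 1 means the two vectors point in almost the same direction, while a score near 0 means they are unrelated.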

#### Tokenizer model

In [3]:
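# Hugging Face access token, read from the environment
ACCESS_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")
# Use the current CUDA device when a GPU is available, otherwise fall back to the CPU
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'
MODEL_ID = 'meta-llama/Llama-3.2-1B-Instruct'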
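
`os.getenv` silently returns `None` when the variable is missing, and the failure then only shows up later, while downloading the model. A small guard can surface the problem earlier. This is a sketch: `require_env` is a hypothetical helper, not something the notebook or Hugging Face provides.

```python
import os

def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value
```

With this helper, the token line could read `ACCESS_TOKEN = require_env("HUGGINGFACEHUB_API_TOKEN")`.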

In [4]:
def load_tokenizer():
    # Load the tokenizer that matches the Llama model.
    # Tokenization runs on the CPU, so no device argument is needed.
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_ID,
        token=ACCESS_TOKEN)
    return tokenizer

tokenizer = load_tokenizer()

#### Model for text generation

In [5]:
def load_model():
    try:
        # Restrict accelerate to the CPU only when no GPU was detected
        accelerator = Accelerator(cpu=(device == 'cpu'))
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            token=ACCESS_TOKEN,  # Pass the access token used for the tokenizer
            device_map=device,
            torch_dtype=torch.float32  # Load the weights in full float32 precision
        )
        model = accelerator.prepare(model)  # Let accelerate place the model on the selected device
        return model
    except Exception as e:
        print(f"Error loading model: {str(e)}")
        return None
    
model = load_model()
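
Loading in float32 costs 4 bytes per parameter, which is worth keeping in mind on small machines. The sketch below estimates the memory taken by the weights alone. The parameter count is an assumed round figure for a "1B" model, not the exact size of this checkpoint, and `weight_memory_gib` is a helper written for this example.

```python
# Bytes used by one parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gib(num_params, dtype="float32"):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# Assumed round figure; the real parameter count differs somewhat.
approx_params = 1_000_000_000
for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(approx_params, dtype):.2f} GiB")
```

Activations and the generation cache come on top of this, so the figure is a lower bound.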

#### Embeddings model

In [6]:

# Embedding model used to vectorize queries; it must be the same model that was used to build the index
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda" if cuda.is_available() else "cpu"}  # Use GPU if available
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

## Load Vector DB

In [25]:
index_name = 'vector_db'

# Get the environment variable for the index folder
index_path = os.getenv("INDEX_PATH", default=os.path.join(os.getcwd(), '..', 'index', 'vector_db')) 
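
Before loading, it can help to confirm that the index files are where we expect them. The sketch below assumes the usual layout written by LangChain's `FAISS.save_local`, namely `<index_name>.faiss` and `<index_name>.pkl` in the index folder. `missing_index_files` is a hypothetical helper.

```python
from pathlib import Path

def missing_index_files(folder, index_name="index"):
    """Return the expected FAISS index files that are absent from `folder`."""
    folder = Path(folder)
    # Assumed layout: one file for the FAISS index, one pickle for the docstore.
    expected = [f"{index_name}.faiss", f"{index_name}.pkl"]
    return [name for name in expected if not (folder / name).is_file()]

# An empty list means both files were found in the current directory.
print(missing_index_files(".", "vector_db"))
```

In this notebook the call would be `missing_index_files(index_path, index_name)`.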

In [27]:
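# Load the FAISS vector store from disk.
# Newer LangChain releases refuse to unpickle the index unless
# allow_dangerous_deserialization=True is passed; enable it only for index files you trust.
vectordb = FAISS.load_local(folder_path=index_path, index_name=index_name, embeddings=embeddings)  #, allow_dangerous_deserialization=True)
# Wrap the store in a retriever that returns the 2 most similar chunks per query
retriever = vectordb.as_retriever(search_kwargs={"k": 2})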
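
Conceptually, a retriever with `k=2` embeds the query and returns the two stored vectors closest to it. The sketch below shows that idea as a brute-force search over toy two-dimensional vectors. It assumes L2 (Euclidean) distance, and `top_k` is a helper written for this example. FAISS does this far more efficiently on real embeddings.

```python
import math

def top_k(query, vectors, k=2):
    """Return the indices of the k vectors closest to `query` (L2 distance)."""
    distances = [(math.dist(query, vec), idx) for idx, vec in enumerate(vectors)]
    return [idx for _, idx in sorted(distances)[:k]]

# Toy vectors standing in for document embeddings.
stored = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.2], [5.0, 5.0]]
print(top_k([1.0, 1.0], stored, k=2))  # [1, 2]
```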

### Test the retriever

In [28]:
docs = retriever.get_relevant_documents("How to remove an item from a list?")
for doc in docs:
    print(doc.page_content)
    print(doc.metadata) 

SeeUnpacking Argument Listsfordetailsontheasteriskinthisline.
5.2 Thedel statement
Thereisawaytoremoveanitemfromalistgivenitsindexinsteadofitsvalue: the del statement. Thisdiffersfrom
the~list.pop() method which returns a value. Thedel statement can also be used to remove slices from a list
orcleartheentirelist(whichwedidearlierbyassignmentofanemptylisttotheslice). Forexample:
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
del canalsobeusedtodeleteentirevariables:
>>> del a
Referencing the namea hereafter is an error (at least until another value is assigned to it). We’ll find other uses for
del later.
5.2. Thedel statement 37
{'source': '../data\\tutorial.pdf', 'page': 42}
range.
list.clear()
Removeallitemsfromthelist. Similarto del a[:].
list.index(x[,start[,end ]])
Returnzero-basedindexinthelistofthefirstitemwhosevalueisequalto x. Raisesa ValueError ifthere
isnosuchitem.
The optio

## RAG Pipeline 

### Define prompt template

In [10]:
# Prompt template setup
prompt_template = PromptTemplate(
    input_variables=["question", "context"],
    template="""
    You are a Python documentation assistant designed to provide a summarized, complete and holistic answer to the question using the information from the given context. 
    Do not provide general information or assumptions outside the given context. 
    If you don't know the answer, just say that you don't know.
    Context: {context}
    Question: {question}
    """
) 
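
To see what the chain does with this template, here is a plain-Python sketch of the "stuff" strategy: the retrieved chunks are joined into one string and substituted for `{context}`. The shortened `TEMPLATE`, the `build_prompt` helper, and the blank-line separator are assumptions made for illustration. LangChain's own implementation may differ in detail.

```python
# Shortened stand-in for the prompt template.
TEMPLATE = "Context: {context}\nQuestion: {question}\n"

def build_prompt(question, chunks, separator="\n\n"):
    """Join the retrieved chunks and substitute them into the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)

chunks = ["del a[0] removes the first item.", "list.clear() removes all items."]
print(build_prompt("How to remove an item from a list?", chunks))
```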


In [12]:
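query_pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,      # Upper bound on the length of the generated answer
    do_sample=True,          # top_k and temperature only take effect when sampling is enabled
    top_k=5,                 # Sample from the 5 most likely next tokens
    temperature=0.2,         # Low temperature keeps the output close to the most likely tokens
    repetition_penalty=1.2,  # Discourage repeated phrases
    token=ACCESS_TOKEN,
    eos_token_id=tokenizer.eos_token_id,
    early_stopping=False,    # Only relevant for beam search; it has no effect when sampling
    )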
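
The `top_k` and `temperature` settings interact: the model first keeps only the `top_k` most likely tokens, then rescales their scores by the temperature before sampling. The sketch below mirrors that idea in plain Python on made-up logits. `sample_top_k` is a hypothetical helper, not the code that transformers runs internally.

```python
import math
import random

def sample_top_k(logits, k=5, temperature=0.2, rng=random):
    """Sample a token index from the k highest logits after temperature scaling."""
    # Keep only the k largest logits, remembering their original indices.
    top = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)[:k]
    # Divide by the temperature; values below 1 sharpen the distribution.
    scaled = [value / temperature for _, value in top]
    # Numerically stable softmax weights over the surviving candidates.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    indices = [idx for idx, _ in top]
    return rng.choices(indices, weights=weights, k=1)[0]

# Made-up logits for a 7-token vocabulary.
logits = [2.0, 1.5, 0.1, -1.0, 0.3, 1.9, -3.0]
print(sample_top_k(logits, k=5, temperature=0.2, rng=random.Random(0)))
```

With a temperature as low as 0.2, nearly all of the probability mass lands on the one or two highest-scoring tokens, so the output is close to greedy decoding.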


In [13]:
llm = HuggingFacePipeline(pipeline=query_pipeline)

In [14]:
def get_llm_response(question, chat_history):
    start_time = time()
    # Build the conversational retrieval chain: retrieve chunks, then answer with the LLM
    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True,  # Also return the chunks that were used as context
        combine_docs_chain_kwargs={"prompt": prompt_template},
        verbose=True,
        )
    # The result is a dict that includes the keys "answer" and "source_documents"
    result = qa_chain({"question": question, "chat_history": chat_history})
    stop_time = time()
    print(f'Response generation took {stop_time - start_time:.2f} seconds.')
    return result
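
The function takes `chat_history` but does not update it, so follow-up questions only see earlier turns if we record them ourselves. A minimal sketch, assuming the history is kept as a list of `(question, answer)` tuples: `ask` and `fake_respond` are hypothetical, and `fake_respond` stands in for `get_llm_response` so the example runs without a model.

```python
def ask(question, chat_history, respond):
    """Call `respond`, then record the turn as a (question, answer) tuple."""
    result = respond(question, chat_history)
    chat_history.append((question, result["answer"]))
    return result["answer"]

# Stand-in for get_llm_response; it returns a dict with an "answer" key.
def fake_respond(question, chat_history):
    return {"answer": f"(answer #{len(chat_history) + 1} to: {question})"}

history = []
ask("how to remove an item from a list?", history, fake_respond)
ask("and how do I clear it?", history, fake_respond)
print(history)
```

In the notebook, the same pattern would pass `get_llm_response` as `respond`.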


### Test the pipeline

In [15]:
question = "how to remove an item from a list?"
chat_history = []

In [16]:
chain = get_llm_response(question,chat_history) 

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
[32;1m[1;3m
    You are an python documentation assistant designed to provide a summarized, complete and holistic answer to the question using the information from the given context. 
    Do not provide general information or assumptions outside the given context. 
    If you dont know the answer, just say that you don't know.
    Context: SeeUnpacking Argument Listsfordetailsontheasteriskinthisline.
5.2 Thedel statement
Thereisawaytoremoveanitemfromalistgivenitsindexinsteadofitsvalue: the del statement. Thisdiffersfrom
the~list.pop() method which returns a value. Thedel statement can also be used to remove slices from a list
orcleartheentirelist(whichwedidearlierbyassignmentofanemptylisttotheslice). Forexample:
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
del

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)



> Finished chain.

> Finished chain.
Response generation took 67.87 seconds.


In [17]:
print(chain["answer"])

 Answer: use the `del` statement.

### Example Use Case:

```python
my_list = [1, 2, 3, 4, 5]

# Remove first element by its position
print(my_list.remove(1)) # Output: 2
```

In this example, we're removing the first element at index 0 because there's no such thing called "remove" function but instead we have `remove()` method provided by Python List class. 

Note - In some cases like when trying to delete multiple elements with same index then only one will get deleted while others remain intact. Also note that deleting an element doesn’t change any existing indices so they still point to original values before deletion.
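
The generated answer is worth checking by hand, because a small model can mix up list methods. In the answer above, `list.remove()` is described as removing by position and returning a value, which is not how Python behaves. A quick check of `remove`, `del` and `pop`:

```python
my_list = [1, 2, 3, 4, 5]

# list.remove(value) deletes the first matching value and returns None.
returned = my_list.remove(1)
print(returned, my_list)   # None [2, 3, 4, 5]

# del removes by index; the remaining items shift left.
del my_list[0]
print(my_list)             # [3, 4, 5]

# list.pop(index) removes by index and returns the removed item.
popped = my_list.pop(0)
print(popped, my_list)     # 3 [4, 5]
```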
