# Retrieval-Augmented Generation (RAG)  

## 1. What is RAG?  

<img src="https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png" alt="Langchain Pipeline" style="width:800px;">  

Retrieval-Augmented Generation (RAG) is a technique that improves language model responses by retrieving relevant information from an external knowledge source, such as a database or document collection, and adding it to the prompt. Grounding the model in this retrieved context lets it give more accurate, relevant, and up-to-date answers and reduces hallucinations.
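
To make the idea concrete, here is a toy sketch of the retrieve-then-prompt flow. It is not how this notebook's pipeline works: word overlap stands in for embedding similarity, no model is called, and `DOCS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names invented for this illustration.

```python
# Toy illustration only: word overlap stands in for embedding similarity,
# and no language model is called.
import re
from typing import List, Set

# Hypothetical mini knowledge base (not the notebook's index).
DOCS = [
    "The embed method turns a list of strings into vectors.",
    "The rerank method orders passages by relevance to a query.",
    "Client objects are created with a base URL.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: List[str], n: int = 2) -> List[str]:
    """Return the n documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        docs,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:n]


def build_prompt(question: str, context_docs: List[str]) -> str:
    """Place the retrieved text into the prompt that a model would receive."""
    context = "\n\n".join(context_docs)
    return f"Question: {question}\nContext: {context}\nAnswer:"


top_docs = retrieve("How do I use the embed method?", DOCS, n=1)
print(build_prompt("How do I use the embed method?", top_docs))
```

The model only ever sees the prompt string, so the quality of the answer depends on which documents the retrieval step selects.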



## 1.1 Your Task: Build a RAG Pipeline  

Your goal is to construct a **Retrieval-Augmented Generation (RAG) pipeline** using a provided `LLM` and `vector_store`.  

### Steps to follow:  
1. **Retrieve relevant documents** – Fetch the top `n` most relevant documents from the `vector_store` based on a user query.  
2. **Generate a response** – Use the `LLM` to process the retrieved documents and generate a well-informed answer.  

Refer to LangChain’s [RAG Documentation](https://python.langchain.com/docs/tutorials/rag/#preview) for guidance on implementing this pipeline effectively.
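
The two steps above can be sketched as two functions that each read a shared state and return an update to it. The sketch below is only a shape to aim for, not the reference solution: `Doc`, `StubVectorStore`, `StubLLM`, `TEMPLATE`, and `run_pipeline` are hypothetical stand-ins so the code runs without Ollama or LangGraph. In the notebook, the corresponding calls would presumably be `vector_store.similarity_search(question, k=n)` and `llm.invoke(...)`, whose result is a message object with a `.content` attribute rather than a plain string.

```python
from dataclasses import dataclass
from typing import List, TypedDict


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


class StubVectorStore:
    """Hypothetical stand-in for the notebook's `vector_store`."""

    def __init__(self, texts: List[str]) -> None:
        self.docs = [Doc(text) for text in texts]

    def similarity_search(self, query: str, k: int = 4) -> List[Doc]:
        # Word overlap replaces real embedding similarity in this stub.
        words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(words & set(d.page_content.lower().split())),
            reverse=True,
        )
        return ranked[:k]


class StubLLM:
    """Hypothetical stand-in for `llm`; returns a plain string."""

    def invoke(self, prompt_text: str) -> str:
        return f"(stub answer based on {len(prompt_text)} prompt characters)"


class State(TypedDict, total=False):
    question: str
    context: List[Doc]
    answer: str


stub_store = StubVectorStore([
    "embed converts text into vectors",
    "rerank sorts passages by relevance",
    "the client needs a base url",
])
stub_llm = StubLLM()
TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def retrieve(state: State, n: int = 2) -> dict:
    """Step 1: fetch the top n documents for the question."""
    return {"context": stub_store.similarity_search(state["question"], k=n)}


def generate(state: State) -> dict:
    """Step 2: put the retrieved text into the prompt and ask the model."""
    docs_text = "\n\n".join(doc.page_content for doc in state["context"])
    prompt_text = TEMPLATE.format(question=state["question"], context=docs_text)
    return {"answer": stub_llm.invoke(prompt_text)}


def run_pipeline(question: str) -> State:
    """Run both steps in order, merging each step's update into the state."""
    state: State = {"question": question}
    for step in (retrieve, generate):
        state.update(step(state))
    return state


result = run_pipeline("how do i use embed")
print(result["answer"])
```

A graph framework plays the role of `run_pipeline` here: it calls each step in sequence and merges the returned dictionaries into the state.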



In [None]:
# Predefined imports
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.prompts import PromptTemplate


OLLAMA_URL = "http://localhost:11434"

# Load our models
llm = ChatOllama(model="smollm2:360m", base_url=OLLAMA_URL)
embedding_provider = OllamaEmbeddings(model="granite-embedding:278m", base_url=OLLAMA_URL)

# Load the vector store from disk
vector_store = FAISS.load_local("tei-client-index", embedding_provider, allow_dangerous_deserialization=True)

# Define the prompt to use
prompt = PromptTemplate(
    input_variables=["question", "context"],
    template=(
        "You are an assistant for code understanding tasks. "
        "Use the following pieces of retrieved code to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Try to answer in markdown syntax.\n"
        "Question: {question}\n"
        "Context: {context}\n"
        "Answer:"
    )
)

### Add your implementation below:

In [None]:
from typing_extensions import List, TypedDict
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph

# Your code goes here

Ask the RAG pipeline a question:

In [None]:
from IPython.display import display, Markdown

response = graph.invoke({"question": "How do I use the embed method?"})
display(Markdown(response["answer"]))

## (Optional) Optimize the Implementation  

Enhance the pipeline by adjusting the number of retrieved documents (`n`) or refining the prompt to improve the accuracy of generated answers.  
You can also experiment with larger models, such as [`llama3.2`](https://ollama.com/library/llama3.2), to achieve better performance.
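
One way to reason about the effect of `n` is to look at how much text each setting puts into the prompt. The helper below is a minimal sketch under assumptions: `build_context` and its `max_chars` budget are hypothetical, not part of LangChain, and the numbered `[1]`, `[2]` labels are just one possible way to separate retrieved chunks.

```python
from typing import List, Optional


def build_context(ranked_texts: List[str], n: int,
                  max_chars: Optional[int] = None) -> str:
    """Join the top-n texts into one numbered context string.

    If max_chars is given, the joined string is cut off at that length.
    """
    parts = [f"[{i}] {text}" for i, text in enumerate(ranked_texts[:n], start=1)]
    context = "\n\n".join(parts)
    return context if max_chars is None else context[:max_chars]


# Hypothetical retrieved chunks, already ordered by relevance.
ranked = ["first chunk", "second chunk", "third chunk"]
for n in (1, 2, 3):
    print(n, len(build_context(ranked, n)))
```

A larger `n` gives the model more material but also more room for irrelevant text, which can matter for a small model.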

In [None]:
# Your code goes here