<a href="https://colab.research.google.com/github/AhlemAmmar/AI-Powered-FAQ-Bot-RAG-based-/blob/main/AI_Powered_FAQ_Bot_(RAG_based).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## **⚡ RAG (Retrieval-Augmented Generation) minimal implementation**


A RAG application retrieves relevant passages from an external data source and passes them to an LLM along with the user's question, so that the generated answer is grounded in that context.

🔹 **Workflow**

1. **Indexing**
   * **Load**: Import data with Document Loaders.
   * **Split**: Break documents into smaller chunks.
   * **Store**: Save chunks in a Vector Store using embeddings.
2. **Retrieval & Generation**
   * **Retrieve**: Fetch relevant chunks for a user query.
   * **Generate**: An LLM creates an answer using the query + retrieved data.
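
Before bringing in any libraries, the workflow above can be sketched in plain Python. This is only a toy illustration: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "vector store" is a plain list, and the generation step is reduced to building the prompt text that an LLM would receive.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing: load -> split -> store
document = (
    "Cats sleep most of the day. "
    "Vector stores hold embeddings. "
    "LLMs generate text from prompts."
)
chunks = [s.strip() for s in document.split(".") if s.strip()]
store = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(question, k=1):
    """Retrieval: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Generation input: the question plus the retrieved context."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "What do vector stores hold?"
top_chunks = retrieve(question)
prompt_text = build_prompt(question, top_chunks)
print(prompt_text)
```

The rest of the notebook replaces each toy piece with a real component: a document loader, a text splitter, an embeddings model, a vector store, and a chat model.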

In [None]:
%pip install --quiet --upgrade langchain-text-splitters langchain-community langgraph





### 🔗 LangChain

LangChain is a framework that makes it easier to build applications powered by Large Language Models (LLMs).  
It connects LLMs with external data sources and tools, enabling **context-aware** and more powerful AI apps.

---
 ✨ **Key Features**
- **Components** → Abstractions for LLMs, retrievers, parsers, etc.  
- **Chains** → Combine components into sequences or graphs for complex workflows.  
- **Agents** → Let LLMs interact with their environment and decide actions.  
- **Indexing** → Load, structure, and query external data.  
- **LangServe** → Deploy LangChain apps as APIs.  

📖 [Learn more in the docs](https://python.langchain.com/docs/introduction)


### 🛠️ LangSmith

**LangSmith** is a platform for building **production-grade LLM applications**.  
It allows you to **monitor, evaluate, and debug** your applications so you can ship faster and with confidence.  

---

✨ **Key Capabilities**
- **Monitoring** → Track application performance in real time.  
- **Evaluation** → Assess outputs for quality and reliability.  
- **Debugging** → Inspect inputs, outputs, and intermediate steps.  
- **Optimization** → Continuously improve your LLM pipelines.  

📖 [Learn more in the docs](https://docs.langchain.com/langsmith/home)


### Setup

In [None]:
from google.colab import userdata
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"]=userdata.get('LANGSMITH_API_KEY')



1. ***chat model: Google Gemini***



In [10]:
%pip install -qU "langchain[google-genai]"

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.7.0 which is incompatible.

In [16]:

if not os.environ.get("GOOGLE_API_KEY"):
  os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
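
`google.colab.userdata` is only available inside Colab. As a rough sketch of the same check-then-set pattern for other environments, the hypothetical helper `ensure_env` below prompts for a value only when the variable is missing. The helper name and the `DEMO_API_KEY` variable are illustrative assumptions, not part of the notebook.

```python
import os
from getpass import getpass


def ensure_env(name, ask=getpass):
    """Return os.environ[name], asking for a value only when it is unset or empty."""
    if not os.environ.get(name):
        os.environ[name] = ask(f"Enter {name}: ")
    return os.environ[name]


# Demonstration with a stand-in prompt function so nothing blocks on input.
os.environ.pop("DEMO_API_KEY", None)
first = ensure_env("DEMO_API_KEY", ask=lambda _msg: "demo-value")
second = ensure_env("DEMO_API_KEY", ask=lambda _msg: "ignored")  # already set, so not asked
```

In an interactive session, calling `ensure_env("GOOGLE_API_KEY")` with the default `getpass` prompt would ask for the key without echoing it.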


In [17]:
from langchain.chat_models import init_chat_model
# chat model
llm = init_chat_model("gemini-2.5-flash", model_provider="google_genai")

2. ***embeddings model: Google Gemini***

In [31]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
#embeddings model:
embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

3. ***vector store: In-memory***

In [23]:
%pip install -qU langchain-core

In [32]:
from langchain_core.vectorstores import InMemoryVectorStore
#vector store
vector_store = InMemoryVectorStore(embeddings)

### RAG chain

In [27]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

In [29]:


# Configure a loader that keeps only the title, header, and body of the blog post
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)


In [33]:
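# Fetch the page and parse it into Document objects
docs = loader.load()

# Split into chunks of up to 1000 characters, with 200 characters of overlap between neighbours
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)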

# Index chunks
_ = vector_store.add_documents(documents=all_splits)
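
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch using a fixed-size sliding window. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph and line breaks. The function `sliding_chunks` is a hypothetical name used only for illustration.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

Each chunk repeats the last two characters of the previous one. The overlap serves the same purpose in the real splitter: a sentence that straddles a chunk boundary is less likely to lose its surrounding context.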

In [34]:
# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")

In [36]:
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps

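def retrieve(state: State):
    # Look up the chunks most similar to the question (similarity_search returns the top 4 by default)
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    # Join the retrieved chunks into one context string, fill the prompt, and call the chat model
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}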



In [37]:

# Build the graph (retrieve, then generate, entered from START) and compile it
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
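
As a rough mental model of what the compiled graph does with `State`, the sketch below runs the steps in order and merges each step's partial update into a shared dictionary. It assumes the default behaviour in which a returned key simply overwrites the existing value. `run_sequence`, `toy_retrieve`, and `toy_generate` are hypothetical stand-ins, not LangGraph APIs.

```python
def run_sequence(steps, initial_state):
    """Call each step with the current state and merge the partial update it returns."""
    state = dict(initial_state)
    for step in steps:
        state.update(step(state))
    return state


def toy_retrieve(state):
    # Stand-in for a vector store lookup
    return {"context": [f"doc about {state['question']}"]}


def toy_generate(state):
    # Stand-in for prompting an LLM with the retrieved context
    return {"answer": f"Answer based on {len(state['context'])} document(s)"}


final_state = run_sequence([toy_retrieve, toy_generate], {"question": "agents"})
print(final_state["answer"])
```

This is why `retrieve` and `generate` only return the keys they produce: the caller supplies `question`, the first step adds `context`, and the second adds `answer`.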

In [39]:
response = graph.invoke({"question": "What is spaghetti?"})
print(response["answer"])

I don't know the answer. The provided context discusses types of human memory and best practices for code commenting, but it does not contain any information about spaghetti.
