## Creating a Retrieval-Augmented Generation (RAG) System Using LLMs
This project builds a Retrieval-Augmented Generation (RAG) system that can run on either OpenAI's GPT models or Llama 3.1 (served locally through Ollama). It retrieves the passages of a document that are relevant to a question and passes them to the language model as context, so the answers are grounded in that document rather than in the model's general knowledge alone.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# Use gpt-3.5-turbo only if you have a paid OpenAI API key
#MODEL = "gpt-3.5-turbo"
MODEL = "llama3.1"

### Initializing Model and Embeddings
This cell initializes the language model and embedding class that match the selected model (GPT or Llama). If MODEL starts with "gpt", ChatOpenAI and OpenAIEmbeddings are instantiated; otherwise the Ollama model and OllamaEmbeddings are used. A test prompt is then sent to verify that the model is set up correctly.

In [None]:
from langchain_openai.embeddings import OpenAIEmbeddings 
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_openai.chat_models import ChatOpenAI

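if MODEL.startswith("gpt"):
    model = ChatOpenAI(api_key=OPENAI_API_KEY, model=MODEL)
    embeddings = OpenAIEmbeddings()
else:
    model = Ollama(model=MODEL)
    # Pass the model name explicitly; otherwise OllamaEmbeddings uses its own default model
    embeddings = OllamaEmbeddings(model=MODEL)

# Sanity check: send one prompt to whichever model was selected
response = model.invoke("What is the capital city of Afghanistan?")
print(response)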
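The branching above boils down to dispatching on the model name. Here is a minimal, library-free sketch of that decision; `select_backend` and the labels it returns are hypothetical names used only for illustration, and no LangChain objects are created.

```python
def select_backend(model_name: str) -> dict:
    """Return hypothetical backend labels for a model name.

    Mirrors the notebook's rule: names starting with "gpt" go to OpenAI,
    anything else is treated as a local Ollama model.
    """
    if model_name.startswith("gpt"):
        return {"provider": "openai", "chat": "ChatOpenAI", "embeddings": "OpenAIEmbeddings"}
    return {"provider": "ollama", "chat": "Ollama", "embeddings": "OllamaEmbeddings"}


print(select_backend("gpt-3.5-turbo")["provider"])  # openai
print(select_backend("llama3.1")["provider"])  # ollama
```

Because the test prompt sits outside the if/else, `response` is defined no matter which branch runs.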

### Setting Up Output Parser and Simple Chain
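This cell defines a string output parser (StrOutputParser) and pipes the model's output into it with the | operator. ChatOpenAI returns an AIMessage object, and the parser extracts its text; the Ollama LLM already returns a plain string, which the parser passes through unchanged. The resulting chain answers prompts such as "Name 3 biggest cities of Afghanistan" and returns the answer as a string.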

In [None]:
from langchain_core.output_parsers import StrOutputParser
# Needed mainly for ChatOpenAI, whose output is an AIMessage; with Ollama the string passes through
parser = StrOutputParser()

chain = model | parser
chain.invoke("Name 3 biggest cities of Afghanistan")
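The | operator composes steps so that the output of one becomes the input of the next. The sketch below imitates that idea in plain Python with a hypothetical `Step` class and a fake chat model that returns a message-like object. It is not LangChain's implementation, only an illustration of why a parser is placed after a chat model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat model's message object."""
    content: str


class Step:
    """Wraps a function so steps can be chained with `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # Compose: run self first, then feed its output into `other`
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


# A fake "chat model" that returns a message object instead of a plain string
fake_chat_model = Step(lambda text: FakeMessage(content=f"Echo: {text}"))
# A "string parser": pull the text out of a message, pass plain strings through
string_parser = Step(lambda out: out.content if isinstance(out, FakeMessage) else out)

demo_chain = fake_chat_model | string_parser
print(demo_chain.invoke("Name 3 biggest cities of Afghanistan"))
# Echo: Name 3 biggest cities of Afghanistan
```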

### Loading and Splitting a PDF Document
This cell uses PyPDFLoader to load a PDF document for the Retrieval-Augmented Generation (RAG) system. The loader reads afghanistan.pdf page by page, and load_and_split() then splits those pages into smaller text chunks (by default with a RecursiveCharacterTextSplitter). These chunks are what the vector store will later index.

In [None]:
from langchain_community.document_loaders import PyPDFLoader
# To use your own PDF, put the file in the rag-local folder and change the file name below
loader = PyPDFLoader("afghanistan.pdf")
pages = loader.load_and_split()
pages
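Splitting long text into overlapping chunks keeps each piece small enough to embed while preserving some context across chunk boundaries. The sketch below is a deliberately simplified fixed-size character splitter, not the recursive, separator-aware algorithm that RecursiveCharacterTextSplitter uses; `split_text` is a hypothetical helper, and its `chunk_size` and `chunk_overlap` values are arbitrary.

```python
def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list[str]:
    """Split text into fixed-size character chunks that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Kabul is the capital and largest city of Afghanistan."
for chunk in split_text(sample):
    print(repr(chunk))
```

The last `chunk_overlap` characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary still appears intact in at least one chunk when it is shorter than the overlap.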

### Defining a Custom Question-Answer Prompt
This cell defines a custom PromptTemplate with placeholders for context and question. The template is designed to guide the model’s response based on the provided context. The formatted prompt is tested by filling it with sample data.

In [None]:
from langchain.prompts import PromptTemplate

template = """
Please answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is the context", question="Here is the question"))

In [None]:
chain = prompt | model | parser
chain.invoke(
    {
        "context": "The capital city of Afghanistan is Kabul",
        "question": "Where is the capital of Afghanistan?"
    }
)

In [None]:
chain.input_schema.schema()
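PromptTemplate.from_template treats the {context} and {question} placeholders as the template's input variables, which is why chain.input_schema lists them as fields. The same placeholders can be found and filled with Python's standard `string.Formatter`; this stdlib sketch only illustrates the idea and is not LangChain's code.

```python
from string import Formatter

QA_TEMPLATE = """
Please answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

# Formatter().parse yields (literal_text, field_name, format_spec, conversion);
# field_name is None for trailing literal text, so it is filtered out.
input_variables = sorted(
    {field for _, field, _, _ in Formatter().parse(QA_TEMPLATE) if field}
)
print(input_variables)  # ['context', 'question']

filled = QA_TEMPLATE.format(
    context="The capital city of Afghanistan is Kabul",
    question="Where is the capital of Afghanistan?",
)
print(filled)
```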

### Setting Up Hugging Face Embeddings and Vector Store
This cell initializes Hugging Face embeddings (sentence-transformers/all-MiniLM-L6-v2) to vectorize the document chunks. Note that this reassigns embeddings, replacing the OpenAI or Ollama embeddings created earlier. An in-memory vector store, DocArrayInMemorySearch, is then built from the loaded PDF chunks and these embeddings, so the RAG pipeline can look up the chunks most similar to a question.

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

vectorstore = DocArrayInMemorySearch.from_documents(
    pages, 
    embedding=embeddings
)
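Conceptually, an in-memory vector store keeps one embedding vector per chunk and compares vectors with a similarity measure such as cosine similarity. The sketch below replaces all-MiniLM-L6-v2 with a toy bag-of-words embedding over a hypothetical fixed vocabulary, purely so it runs without downloading a model; `toy_embed` and `cosine_similarity` are illustrative helpers, and real sentence embeddings capture meaning far better than word counts.

```python
import math

# Hypothetical tiny vocabulary used by the toy embedding
VOCAB = ["kabul", "capital", "city", "bamiyan", "buddhas", "valley", "pashto", "dari"]


def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in the text (a stand-in for a real embedding model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# "Index" a few chunks by storing each text alongside its vector
toy_chunks = [
    "Kabul is the capital city.",
    "The Bamiyan valley is known for its Buddhas.",
    "Pashto and Dari are official languages.",
]
toy_store = [(chunk, toy_embed(chunk)) for chunk in toy_chunks]

query_vec = toy_embed("Which city is the capital?")
for chunk, vec in toy_store:
    print(f"{cosine_similarity(query_vec, vec):.2f}  {chunk}")
```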

### Retrieving Documents Based on a Query
This cell creates a retriever from the vector store and queries it with "Afghanistan cities". The retriever embeds the query and returns the chunks whose embeddings are most similar to it (by default, the top four).

In [None]:
retriever = vectorstore.as_retriever()
retriever.invoke("Afghanistan cities")
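A retriever built with as_retriever() runs a similarity search and returns the k best-matching chunks. The ranking step itself can be sketched as sorting by score and keeping the top k; the scores below are made-up numbers standing in for query-to-chunk similarities, and `top_k` is a hypothetical helper.

```python
import heapq


def top_k(scored_chunks: list[tuple[str, float]], k: int = 4) -> list[str]:
    """Return the texts of the k highest-scoring chunks, best first."""
    best = heapq.nlargest(k, scored_chunks, key=lambda pair: pair[1])
    return [text for text, _ in best]


# Made-up similarity scores between a query and each chunk
scored = [
    ("Kabul is the largest city.", 0.82),
    ("Herat lies in the west.", 0.64),
    ("The Hindu Kush spans the country.", 0.21),
    ("Mazar-i-Sharif is in the north.", 0.58),
    ("Kandahar is in the south.", 0.61),
]
print(top_k(scored, k=2))  # ['Kabul is the largest city.', 'Herat lies in the west.']
```

In the notebook itself, the number of returned chunks can be changed with vectorstore.as_retriever(search_kwargs={"k": 2}).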

### Combining Document Retrieval and Question-Answering Pipeline
This cell builds a chain that combines document retrieval with question answering. The question is sent to the retriever, whose results become the context; the context and the original question then flow through the prompt template, the language model, and the output parser, so the model produces an answer grounded in the retrieved text.

In [None]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question")
    }
    | prompt
    | model
    | parser
)

chain.invoke({"question": "tell me about Bamiyan"})
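In LangChain, the dictionary at the start of this chain is turned into a parallel step: every value receives the same input ({"question": ...}) and the results are collected under the same keys. The plain-Python sketch below mimics that with a hypothetical `fake_retriever`. It also shows one common refinement, a `format_docs` helper (likewise hypothetical here) that joins the retrieved texts into a single context string instead of passing the raw list of documents to the prompt; with real Document objects it would join each document's page_content.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever: returns canned passages instead of searching a store."""
    return [
        f"Passage 1 relevant to: {question}",
        f"Passage 2 relevant to: {question}",
    ]


def format_docs(docs: list[str]) -> str:
    """Join retrieved passages into one context string, separated by blank lines."""
    return "\n\n".join(docs)


# Each value maps the same input dict to one prompt variable
mapping = {
    "context": lambda inputs: format_docs(fake_retriever(itemgetter("question")(inputs))),
    "question": itemgetter("question"),
}

chain_input = {"question": "tell me about Bamiyan"}
prompt_vars = {key: fn(chain_input) for key, fn in mapping.items()}
print(prompt_vars["question"])
print(prompt_vars["context"])
```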

In [None]:
# A list (rather than a set) keeps the questions in a fixed order
questions = [
    "What are the languages of Afghanistan?",
    "What is the population of Afghanistan?"
]
for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

### Streaming Responses from the Model
This cell streams the model's response: instead of waiting for the complete answer, each chunk is printed as soon as the model generates it. This makes long, detailed answers feel more responsive.

In [None]:
for s in chain.stream({"question": "Give me an overview of Afghanistan"}):
    print(s, end="", flush=True)
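chain.stream() returns an iterator that yields pieces of the answer as they are produced. The consuming loop can be sketched with an ordinary Python generator; `fake_stream` is a hypothetical stand-in that yields words instead of calling a model, and the loop both prints each chunk immediately and accumulates the full answer for later use.

```python
import time
from typing import Iterator


def fake_stream(answer: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(): yield the answer word by word."""
    for i, word in enumerate(answer.split(" ")):
        time.sleep(delay)  # simulate generation latency
        yield word if i == 0 else " " + word


parts = []
for chunk in fake_stream("Afghanistan is a landlocked country in Asia."):
    print(chunk, end="", flush=True)  # show each piece as soon as it arrives
    parts.append(chunk)
print()

full_answer = "".join(parts)
```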

In [None]:
chain.batch([{"question": q} for q in questions])