# RAG using Ollama and LangChain

## Objective

The objective of this project is to build a Retrieval-Augmented Generation (RAG) pipeline that answers user questions using content retrieved from a legal PDF document. The key components and goals are as follows:

1. **Integration of LangChain**:
   - Utilize LangChain, a powerful framework designed for building language model applications, to streamline the development process.
   - Leverage its document loaders, text splitters, and retrievers to efficiently manage and process documents.

2. **Embedding Generation**:
   - Generate embeddings with the `nomic-embed-text` model served by Ollama to capture the semantic meaning of the text.
   - Use LangChain's `OllamaEmbeddings` class to produce these embeddings, which drive the retrieval of relevant document chunks.

3. **Contextual Response Generation**:
   - Generate context-aware responses with LLaMA 3, run locally through Ollama.
   - Build prompts that combine the retrieved context with the user's question so that answers stay grounded in the document.

4. **Real-Time Information Retrieval**:
   - Implement an efficient retrieval mechanism that allows users to access relevant documents and information in real time.
   - Use a multi-query retriever from LangChain to generate variations of user queries, improving the likelihood of finding pertinent information.

5. **User-Friendly Interface**:
   - Develop an intuitive user interface using Gradio, allowing users to easily input their queries and receive prompt, relevant responses.
   - Ensure the interface is accessible and responsive, enhancing the overall user experience.



## Installations

To set up the RAG model, the following libraries and tools are required:

1. **Python**: Ensure Python (version 3.9 or later) is installed on your system; recent LangChain releases do not support older versions.

2. **PyTorch**: PyTorch is an open-source machine learning library used for applications such as computer vision and natural language processing. It provides a flexible framework for building deep learning models and is particularly known for its ease of use and dynamic computation graph.
   - Install the PyTorch build that matches your hardware: the CPU-only build, or the CUDA build that corresponds to your NVIDIA GPU and CUDA version.

3. **Ollama**: 
   - Ollama provides an easy way to run and manage large language models, including LLaMA 3. Ensure you follow the [installation instructions from the Ollama website](https://ollama.com/download) to get started.
   - Pull the required models (`llama3` and `nomic-embed-text`) with `ollama pull`.

4. **LangChain**: A library for building language model applications.

5. **LangChain Community**: For additional functionality and extensions.

6. **Unstructured**: A library for processing unstructured data.
   - Includes various document loaders for processing different types of documents.

7. **Chroma Vector Store**: For managing the vector embeddings.

8. **LangChain Text Splitters**: For splitting text into manageable chunks.

9. **Gradio**: For building the web interface used in the final step. Install any other libraries your own extensions need as well.


In [3]:
# %pip install langchain langchain_community langchain-text-splitters chromadb gradio "unstructured[all-docs]"

### Installation of Torch for CPU-only systems

In [4]:
# %pip install torch torchvision torchaudio

### Installation of Torch for CUDA-enabled GPU systems

In [25]:
# Replace cu124 with your CUDA version
# %pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

### Pulling required models from Ollama

In [None]:
!ollama pull llama3 
!ollama pull nomic-embed-text

In [None]:
!ollama list

## Procedure
The following steps outline the procedure to set up and run the RAG model:

1. **Load the PDF Document**:
   - Use `UnstructuredPDFLoader` to load the legal document from which the model will retrieve information.

In [1]:
from langchain_community.document_loaders import UnstructuredPDFLoader
loader = UnstructuredPDFLoader(file_path="MotorACT.pdf")
data = loader.load()



2. **Split the Text into Chunks**:
   - Use `RecursiveCharacterTextSplitter` to divide the loaded document into manageable chunks for efficient retrieval.

In [2]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
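
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal standard-library sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line breaks, which this sketch ignores. The helper name `sliding_chunks` and the toy text are hypothetical.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of at most chunk_size characters,
    where consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    result = []
    for start in range(0, len(text), step):
        result.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return result


# 25 characters of toy text: "0123456789012345678901234"
sample = "".join(str(i % 10) for i in range(25))
toy_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in toy_chunks:
    print(len(chunk), repr(chunk))
```

With the notebook's settings (`chunk_size=1000`, `chunk_overlap=100`), the same idea means neighbouring chunks can share up to 100 characters, so a sentence that straddles a boundary is less likely to be cut off from its context.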

3. **Set Up the Vector Store**:
   - Create a vector store using Chroma to hold the embeddings of the document chunks.

In [4]:
import os
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

# One embedding function, shared by the load path and the build path
embedding_function = OllamaEmbeddings(model="nomic-embed-text", show_progress=True)
current_dir = os.getcwd()
persistent_directory = os.path.join(current_dir, "db", "chroma_db_for_MotorAct")

if os.path.exists(persistent_directory):
    # Reuse the store written by an earlier run instead of re-embedding the document
    vector_db = Chroma(
        persist_directory=persistent_directory,
        embedding_function=embedding_function,
        collection_name="local-rag"
    )
    print("Loaded existing Chroma vector store.")
else:
    # First run: embed every chunk and write the store to disk
    vector_db = Chroma.from_documents(
        documents=chunks,
        embedding=embedding_function,
        collection_name="local-rag",
        persist_directory=persistent_directory
    )
    # Only needed with Chroma versions before 0.4; newer versions persist automatically
    vector_db.persist()
    print("Created new Chroma vector store.")

Loaded existing Chroma vector store.
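
Conceptually, the vector store maps each chunk to a vector and answers a query by returning the chunks whose vectors lie closest to the query's vector. The sketch below illustrates that idea with hand-written three-dimensional toy vectors and Euclidean distance. The vectors, the chunk labels, and the `nearest_chunks` helper are all made up for illustration; this is not how Chroma indexes data internally, and real embeddings have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Toy "embeddings": each chunk label maps to a hand-written vector (hypothetical values)
toy_store: Dict[str, Sequence[float]] = {
    "toy chunk about licences": (0.9, 0.1, 0.0),
    "toy chunk about registration": (0.1, 0.9, 0.1),
    "toy chunk about compensation": (0.0, 0.2, 0.9),
}


def nearest_chunks(
    query_vector: Sequence[float],
    store: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k (distance, text) pairs closest to query_vector."""
    scored = [(math.dist(query_vector, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:k]


query_vector = (0.8, 0.2, 0.1)  # pretend this is the embedded user question
top_matches = nearest_chunks(query_vector, toy_store, k=2)
for distance, text in top_matches:
    print(f"{distance:.3f}  {text}")
```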


4. **Define the Query Prompt and Model**:
   - Initialize the chat model and define the prompt that rewrites the user's question into several variations for retrieval.

In [5]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_community.chat_models import ChatOllama


llm = ChatOllama(model="llama3", show_progress=True)

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI legal assistant. Your task is to generate five
    different versions of the given legal question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)


5. **Set Up the Retriever**:
   - Use a multi-query retriever to enhance the retrieval process, generating variations of user queries.

In [6]:
from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm=llm,
    prompt=QUERY_PROMPT
)
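
The multi-query approach can be pictured as three steps: ask the LLM for several rewordings of the question, run a search for each rewording, and keep the unique union of the results. The sketch below mimics those steps with standard-library code only. It is an illustration under stated assumptions, not LangChain's implementation: `fake_llm_output`, `fake_index`, `fake_search`, and the chunk IDs are hypothetical stand-ins for the model's reply and the vector search.

```python
from typing import Callable, List


def parse_queries(llm_output: str) -> List[str]:
    """Split newline-separated LLM output into non-empty query strings."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def multi_query_retrieve(queries: List[str], search: Callable[[str], List[str]]) -> List[str]:
    """Run search for every query and return the unique results in first-seen order."""
    seen = set()
    merged = []
    for query in queries:
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Stand-in for the model's reply to QUERY_PROMPT (hypothetical text)
fake_llm_output = """What is the fine for speeding?

How much is the penalty for exceeding the speed limit?
Which section covers speeding offences?"""

# Stand-in for a vector search: each query maps to the chunk IDs it would return
fake_index = {
    "What is the fine for speeding?": ["chunk-12", "chunk-40"],
    "How much is the penalty for exceeding the speed limit?": ["chunk-40", "chunk-7"],
    "Which section covers speeding offences?": ["chunk-12", "chunk-3"],
}


def fake_search(query: str) -> List[str]:
    return fake_index.get(query, [])


queries = parse_queries(fake_llm_output)
merged_docs = multi_query_retrieve(queries, fake_search)
print(merged_docs)
```

Because differently worded queries land in slightly different regions of the embedding space, the merged set can include chunks that a single search would have missed, at the cost of one extra LLM call and several searches per question.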

6. **Define the Answer Prompt Template**:
   - Create a prompt template for the AI legal assistant to ensure it answers user questions based solely on the provided legal context, generating accurate and context-specific responses.


In [7]:
template = """You are an AI legal assistant. Answer the question based ONLY on the following legal context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
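
At run time the `{context}` and `{question}` placeholders are filled in before the text reaches the model. The sketch below shows that substitution with plain `str.format` as a stand-in for `ChatPromptTemplate`. The toy chunks, the `build_prompt` helper, and the choice to join chunks with blank lines are assumptions made for illustration; in the notebook's chain the retriever output is handed to the prompt directly.

```python
answer_template = """You are an AI legal assistant. Answer the question based ONLY on the following legal context:
{context}
Question: {question}
"""

# Hypothetical retrieved passages standing in for real document chunks
retrieved_chunks = [
    "Toy chunk A: text of a retrieved passage.",
    "Toy chunk B: text of another retrieved passage.",
]


def build_prompt(chunks, question):
    """Join the chunks into one context string and fill both placeholders."""
    context = "\n\n".join(chunks)
    return answer_template.format(context=context, question=question)


filled_prompt = build_prompt(retrieved_chunks, "What does passage A say?")
print(filled_prompt)
```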

7. **Define the Function to Retrieve Answers**:
   - Define `get_answer`, which builds the retrieval chain (retriever, prompt, model, output parser) and runs it on the user's question.


In [8]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

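def get_answer(question):
    """Answer a question using the retrieved context; return a message on bad input or failure."""
    # Reject empty and whitespace-only input before calling the model
    if not question or not question.strip():
        return "Please enter a question."

    try:
        # The retriever fills {context}; the raw question fills {question}
        chain = (
            {"context": retriever, "question": RunnablePassthrough()}
            | prompt
            | llm
            | StrOutputParser()
        )
        answer = chain.invoke(question)
        return answer
    except Exception as e:
        # Return the error as text so the Gradio UI can display it
        return f"Error occurred: {str(e)}"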
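
The `|` operator in the chain feeds the output of each step into the next, and the leading dictionary runs each of its values on the same input to build the `context` and `question` fields. The toy sketch below mimics that behaviour with a small hypothetical `Step` class. It is not LangChain's implementation, and `fake_retriever`, `fake_prompt`, and `fake_llm` are stand-ins that only manipulate strings.

```python
class Step:
    """Wraps a function so that steps can be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run this step first, then pass its result to `other`
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


def fan_out(mapping):
    """Run every step in mapping on the same input and collect the results by key."""
    return Step(lambda value: {key: step.invoke(value) for key, step in mapping.items()})


# String-only stand-ins for the real components
fake_retriever = Step(lambda q: f"[context for: {q}]")
passthrough = Step(lambda q: q)
fake_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_llm = Step(lambda p: p.upper())

toy_chain = fan_out({"context": fake_retriever, "question": passthrough}) | fake_prompt | fake_llm
result = toy_chain.invoke("what is rag?")
print(result)
```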

8. **Create the Gradio Interface**:
   - Develop a user-friendly interface using Gradio to allow users to input queries and receive responses.

In [9]:
import gradio as gr
with gr.Blocks() as iface:
    gr.Markdown("<h1 style='text-align: center;'>Legal Assistant Chatbot</h1>")
    
    with gr.Row():
        question_input = gr.Textbox(label="Enter your legal question:", lines=2, placeholder="Type your question here...")
        
    submit_button = gr.Button("Submit")
    answer_output = gr.Textbox(label="Answer", interactive=False)
    
    # Define what happens when the button is clicked
    submit_button.click(get_answer, inputs=question_input, outputs=answer_output)


In [10]:
iface.launch()

Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.




