# Introduction
For a simple demo of a RAG pipeline, let's build a simple chatbot that can answer questions about a set of text documents. By the end of this section, you will have a good overview of how to build a simple RAG chatbot using LangChain and Python. We will be using OpenAI models for this exercise, so please keep your OPENAI_API_KEY handy.




## Ingestion:

For this exercise, we will use a couple of text files for demonstration purposes. Those files are available in the "data" folder.

Let's start!

#### Install Necessary Libraries:


In [None]:
!pip install langchain langchain_community langchain_openai chromadb langchainhub umap-learn matplotlib python-dotenv tqdm

## Set up the necessary Environment Variables:

In [None]:
# import os
# os.environ['OPENAI_MODEL_NAME'] = 'gpt-4-1106-preview'
# os.environ['OPENAI_API_KEY'] = 'sk-XXXXXXXXXXX'
# os.environ['OPENAI_API_BASE'] = 'https://api.openai.com/v1'

### Better way to manage Environment Variables:

Keep these environment variables in a `.env` file for better management. You can use the `python-dotenv` library to load the file.
The `.env` file should be in the root directory of the project. The following is an example `.env` file:

```shell
OPENAI_MODEL_NAME=gpt-4-1106-preview
OPENAI_API_KEY=sk-XXXXXXXXXXX
OPENAI_API_BASE=https://api.openai.com/v1
```

Load these environment variables using the following code:



In [None]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())  # read local .env file
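
Before going further, it can help to confirm that the key was actually picked up. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper written for this illustration and is not part of `python-dotenv` or LangChain. It is run here against a stand-in mapping so that the result does not depend on your machine.

```python
import os

# Hypothetical helper (not part of python-dotenv or LangChain): report which
# of the expected variables are missing or empty in a given mapping.
def missing_env_vars(names, env=None):
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Stand-in mapping so the example is reproducible; the values are placeholders.
sample_env = {"OPENAI_API_KEY": "sk-XXXXXXXXXXX", "OPENAI_MODEL_NAME": ""}
print(missing_env_vars(["OPENAI_API_KEY", "OPENAI_MODEL_NAME"], sample_env))
```

In the notebook itself, calling `missing_env_vars(["OPENAI_API_KEY"])` right after `load_dotenv` checks the real environment; an empty list means the variable is set.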

### Load the Text Data:

The first step is to load the text data into memory. We will use `DirectoryLoader` to load every text file in the data folder, with `TextLoader` reading each individual file. The `load` method reads the files into memory as a list of documents, and the `page_content` attribute gives access to the text of each loaded document.

In [None]:
from langchain_community.document_loaders.text import TextLoader
from langchain_community.document_loaders.directory import DirectoryLoader

loader = DirectoryLoader('../data', glob="./*.txt", loader_cls=TextLoader)
documents = loader.load()
documents[0].page_content[:100]
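
To make the loading step less of a black box, here is a rough plain-Python approximation of what it amounts to: find the matching files, read each one, and keep the text together with its source path. This is only a sketch; `load_text_files` is a hypothetical helper, plain dicts stand in for LangChain `Document` objects, and the files are created in a temporary folder purely for the demonstration.

```python
import tempfile
from pathlib import Path

def load_text_files(folder, pattern="*.txt"):
    """Read every matching file into a dict with a text field and a source field."""
    return [
        {"page_content": path.read_text(encoding="utf-8"), "metadata": {"source": str(path)}}
        for path in sorted(Path(folder).glob(pattern))
    ]

with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample files; only the .txt file matches the pattern.
    Path(tmp, "menu.txt").write_text("We serve vegetarian dishes.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored", encoding="utf-8")
    loaded = load_text_files(tmp)
    print(len(loaded), loaded[0]["page_content"])
```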

### Split the Text into small chunks:

This segment breaks the text into smaller pieces, or chunks, using `RecursiveCharacterTextSplitter`. This is helpful for processing large documents in manageable parts. The `chunk_size` parameter defines the maximum size of each chunk, while `chunk_overlap` allows some overlap between consecutive chunks to preserve continuity of context. `len(docs)` shows the total number of chunks created. This is one of the most popular ways to create chunks. We will discuss more ways in subsequent articles.


In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
docs = text_splitter.split_documents(documents)

len(docs)
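
To build intuition for `chunk_size` and `chunk_overlap`, the sketch below slides a fixed-size window over a string. It is a deliberately simplified illustration and not how `RecursiveCharacterTextSplitter` is implemented: the real splitter prefers to break on separators such as paragraph breaks, newlines, and spaces. The function name `sliding_chunks` is an assumption made for this example.

```python
# Simplified fixed-window chunker: an illustration of chunk_size and
# chunk_overlap only, NOT the RecursiveCharacterTextSplitter algorithm.
def sliding_chunks(text, chunk_size, chunk_overlap):
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each printed chunk starts with the last three characters of the previous one, which is the overlap that keeps context from being cut off at chunk boundaries.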


### Create Embeddings

Next, we convert the split documents into embeddings using `OpenAIEmbeddings` and store these embeddings in a vector store (`Chroma`). Embeddings are vector representations of text, useful for various NLP tasks. This process is essential for creating a searchable database of text chunks based on their semantic content.

In [None]:
from langchain_community.vectorstores.chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)


Up to this point, everything has been a one-time process: creating the embeddings and storing them in the vector store. You can use this vector store for any downstream task, such as Q&A or a chatbot.

## Retrieval:

Now that we have the data and the database ready, we can start building the retrieval and generation parts. Whenever a user asks a question, we will do the following:
1. Convert the question into an embedding.
2. Gather the approximate nearest neighbours of the query embedding from the database.
3. Feed the gathered text chunks to the LLM along with the original query.
4. Have the LLM generate an answer to the question.
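
The steps above can be sketched in plain Python with toy data. Everything here is an assumption made for illustration: the three-dimensional vectors and sample sentences are made up, and an exact brute-force cosine-similarity ranking stands in for the approximate nearest-neighbour search that a vector database performs.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Made-up 3-dimensional "embeddings"; real ones come from the embedding model.
store = [
    ("We serve vegetarian and vegan dishes.", [0.9, 0.1, 0.0]),
    ("Home loans are available at fixed rates.", [0.0, 0.2, 0.9]),
    ("Our kitchen is open until 10 pm.", [0.7, 0.3, 0.1]),
]

query_vector = [1.0, 0.0, 0.0]             # step 1: pretend this is the embedded question
context = top_k(query_vector, store, k=2)  # step 2: nearest neighbours
prompt = (                                 # step 3: chunks plus the original query
    "Question: Do you offer vegetarian food?\nContext: " + "\n\n".join(context)
)
print(prompt)                              # step 4 would send this prompt to the LLM
```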

Let's take a look at this in action.

#### Initialize a Retriever:

To be able to fetch the relevant documents, we initialise a retriever from the previously created `vectorstore`. This retriever is responsible for fetching relevant document chunks based on a given query.

The output from the retriever is then formatted so that we can pass it to the LLM for generation.

In [None]:
retriever = vectorstore.as_retriever() #initializes a retriever

def format_docs(docs):  
    return "\n\n".join(doc.page_content for doc in docs)  

retrieval_chain = retriever | format_docs # Format docs outputted by retrieval
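
To see what `format_docs` produces without calling the retriever, it can be run on stand-in objects. In this sketch, `SimpleNamespace` instances are an assumption standing in for LangChain `Document` objects; only the `page_content` attribute matters to the function.

```python
from types import SimpleNamespace

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Stand-ins for LangChain Document objects; only page_content is needed here.
fake_docs = [
    SimpleNamespace(page_content="First chunk."),
    SimpleNamespace(page_content="Second chunk."),
]
print(repr(format_docs(fake_docs)))  # 'First chunk.\n\nSecond chunk.'
```

The chunks are joined with a blank line between them, so the LLM receives one block of context text rather than a list of objects.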

## Generation
### Initialise the Large Language Model:

In [None]:
from langchain_openai import ChatOpenAI

# llm = ChatOpenAI(temperature=0)
llm = ChatOpenAI(model="gpt-4-1106-preview")


### Define a Main chain for RAG:

The RAG chain is defined here, integrating the retriever, document formatting function, prompt template, language model, and output parser. This chain outlines the entire process of retrieving context, formatting it, prompting the LM with this context and a question, and parsing the LM's response.

In [None]:
from langchain_core.prompts import ChatPromptTemplate  
from langchain_core.output_parsers import StrOutputParser  
from langchain_core.runnables import RunnablePassthrough  

PROMPT = """  
    You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:  
"""  

  
rag_chain = (  
    {"context": retrieval_chain , "question": RunnablePassthrough()}  
    | ChatPromptTemplate.from_template(PROMPT)  
    | llm  
    | StrOutputParser()  
)
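
The first two steps of the chain can be approximated in plain Python to show how the data flows. This is a sketch only: no LangChain is involved, `fake_retrieval_chain` is a hypothetical stand-in that returns a fixed made-up string, and the template is a shortened version of the prompt above. For simple placeholders like these, filling the template behaves much like `str.format`.

```python
# Plain-Python approximation of the first two chain steps; no LangChain involved.
TEMPLATE = "Question: {question} \nContext: {context} \nAnswer:"

def fake_retrieval_chain(question):
    # Hypothetical stand-in for `retrieval_chain`: always returns the same text.
    return "We serve vegetarian and vegan dishes."

def build_prompt(question):
    # The dict step: both entries receive the same input (the question).
    inputs = {"context": fake_retrieval_chain(question), "question": question}
    # The template step: placeholders are filled from that dict.
    return TEMPLATE.format(**inputs)

print(build_prompt("Do you offer vegetarian food?"))
```

In the real chain, the resulting prompt is passed to `llm`, and `StrOutputParser` then extracts the plain text of the model's reply.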

You now have a basic RAG pipeline ready!

Now, if you invoke this chain with a question, you will get an answer.

In [None]:
rag_chain.invoke("Do you offer vegetarian food?")

In [None]:
rag_chain.invoke("What loan do you offer?")

# Visualisation of Retriever


Next, we will try to visualise:
1. The embedding (vector) space in 3 dimensions.
2. Where the query sits in that vector space.
3. How we fetch the k nearest neighbours.

 



In [None]:
import umap
import numpy as np
from tqdm import tqdm

doc_strings = [doc.page_content for doc in docs]
vectors = embeddings.embed_documents(doc_strings)
# umap_transformer = umap.UMAP(random_state=0, transform_seed=0).fit(vectors) # For 2 dimensions
umap_transformer = umap.UMAP(random_state=0, transform_seed=0, n_components=3).fit(vectors) # For 3 dimensions


def umap_embed(vectors, umap_transformer):
    umap_embeddings = np.array([umap_transformer.transform([vector])[0] for vector in tqdm(vectors)])
    return umap_embeddings

global_embeddings = umap_embed(vectors, umap_transformer)

global_embeddings

In [None]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(global_embeddings[:, 0], global_embeddings[:, 1], global_embeddings[:, 2], s=10)
ax.set_title('Embeddings')
# plt.axis('off')

In [None]:
def calc_global_embeddings(query, embeddings, retriever, umap_transformer, embed_function, global_embeddings):
    q_embedding = embeddings.embed_query(query)

    docs = retriever.invoke(query)
    page_contents = [doc.page_content for doc in docs]
    vectors_content_vectors = embeddings.embed_documents(page_contents)

    query_embeddings = embed_function([q_embedding], umap_transformer)
    retrieved_embeddings = embed_function(vectors_content_vectors, umap_transformer)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    ax.scatter(global_embeddings[:, 0], global_embeddings[:, 1], global_embeddings[:, 2], s=10, color='gray')
    ax.scatter(query_embeddings[:, 0], query_embeddings[:, 1], query_embeddings[:, 2], s=150, marker='X', color='r')
    ax.scatter(retrieved_embeddings[:, 0], retrieved_embeddings[:, 1], retrieved_embeddings[:, 2], s=50, facecolors='none', edgecolors='g')
    ax.set_title(f'{query}')
    # plt.axis('off')
    plt.show()

In [None]:
calc_global_embeddings("Do you offer vegetarian food?", embeddings, retriever, umap_transformer, umap_embed, global_embeddings)

In [None]:
calc_global_embeddings("What loan do you offer?", embeddings, retriever, umap_transformer, umap_embed,
                       global_embeddings)
