# Pipeline 1 - Embedding

### Step 1. Loading

In this step, we load the configuration needed to read data from various sources and get it ready to ingest.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
DOCUMENT = os.getenv("DOCUMENT")
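
If either variable is missing from the `.env` file, `os.getenv` quietly returns `None`, and the failure only surfaces later (for example, `DOCUMENT+"rag.txt"` raises a `TypeError`). A minimal sketch of an early check, assuming a hypothetical helper named `require_env` that is not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message.

    `require_env` is a hypothetical helper used only for illustration.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value


# Demo with a throwaway variable so the sketch runs on its own.
os.environ["DEMO_DOCUMENT_DIR"] = "./document/"
document_dir = require_env("DEMO_DOCUMENT_DIR")
print(document_dir)
```

In the notebook, the same helper could wrap the `OPENAI_API_KEY`, `DOCUMENT`, and `STORAGE` lookups.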

### Step 2. Parsing

##### Type 1. text document

In [5]:
from langchain_community.document_loaders import TextLoader
txt_path = DOCUMENT+"rag.txt"
txt_loader = TextLoader(txt_path)
text_documents = txt_loader.load()
#text_documents

##### Type 2. PDF document

We use PyMuPDFLoader in this experiment.

In [6]:
from langchain_community.document_loaders import PyMuPDFLoader
pdf_path = DOCUMENT+ "2005.11401v4.pdf"
pdf_loader = PyMuPDFLoader(pdf_path)
pdf_documents = pdf_loader.load()

### Step 3. Chunking

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_chunks = text_splitter.split_documents(text_documents)
#text_chunks[:3]

In [8]:
# Reuse the splitter configured in the previous cell for the PDF pages
pdf_chunks = text_splitter.split_documents(pdf_documents)

In [9]:
chunks = text_chunks + pdf_chunks
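
To make the two splitter parameters concrete: `chunk_size=1000` caps each chunk at 1000 characters, and `chunk_overlap=20` lets neighbouring chunks share up to 20 characters so that a sentence cut at a boundary is not lost entirely. The sketch below is a deliberately simplified fixed-width window, not the separator-based algorithm that `RecursiveCharacterTextSplitter` actually uses; `sliding_chunks` is a hypothetical name introduced for illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters.

    Simplified illustration only: the real splitter prefers to cut on
    separators such as paragraph and line breaks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


# Tiny sizes so the overlap is visible by eye.
for chunk in sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each printed chunk starts with the last character of the previous one, which is the overlap at work.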

### Step 4. Vectorizing

Option 1: Using the OpenAI embedding API

In [10]:
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import DocArrayInMemorySearch

In [11]:
embeddings = OpenAIEmbeddings()
vectorstore = DocArrayInMemorySearch.from_documents(chunks, embeddings)
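
Conceptually, a vector store keeps one embedding per chunk and answers a query by ranking chunks by similarity to the query embedding. The sketch below shows that idea with cosine similarity on hand-made two-dimensional vectors. The vectors, `cosine_similarity`, and `top_k` are assumptions for illustration; real embeddings come from `OpenAIEmbeddings`, and the store's internal implementation may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings": document 1 points almost the same way as the query.
toy_docs = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
print(top_k([1.0, 0.0], toy_docs, k=2))
```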

### Step 5. Storing

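Persist the vector database with Chroma. This cell needs `pdf_chunks` and `embeddings` from Steps 3 and 4, so running it in a fresh kernel before those cells raises the `NameError` shown below.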

In [4]:
from langchain_community.vectorstores import Chroma
persist_directory = os.getenv("STORAGE")
vectordb = Chroma.from_documents(documents=pdf_chunks,  embedding=embeddings, persist_directory=persist_directory)
vectordb.persist()

NameError: name 'pdf_chunks' is not defined

# Pipeline 2 - Retrieving

### Step 1. Query

In [5]:
#user_query = "What is retrieval augmented generation"
user_query = "Describe the RAG-Sequence Model?"

### Step 2. Search

If a persisted store exists, load the vector database from it; that is what the cell below does. The in-memory vectorstore remains available as an alternative (the commented-out first line).
Search efficiency could be improved as the knowledge base grows larger and its source types become more varied.

In [7]:
#retriever = vectorstore.as_retriever()
from langchain_openai.embeddings import OpenAIEmbeddings
#Load vectordb from persisted store
from langchain_community.vectorstores import Chroma
persist_directory = os.getenv("STORAGE")
embeddings = OpenAIEmbeddings()
newvectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
retriever = newvectordb.as_retriever()
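
The "load from the store if there is one" decision can be made explicit before the retriever is built. A minimal sketch, assuming that a usable store is simply a directory that exists and is not empty; `has_persisted_store` is a hypothetical helper, not part of Chroma or LangChain:

```python
import os
import tempfile


def has_persisted_store(persist_directory) -> bool:
    """Return True if the directory exists and contains at least one entry.

    Hypothetical check for illustration; it does not validate that the
    files inside are a readable vector database.
    """
    return (
        bool(persist_directory)
        and os.path.isdir(persist_directory)
        and len(os.listdir(persist_directory)) > 0
    )


# Demo on a temporary directory so the sketch runs on its own.
with tempfile.TemporaryDirectory() as demo_dir:
    print(has_persisted_store(demo_dir))  # empty directory
    open(os.path.join(demo_dir, "placeholder.bin"), "w").close()
    print(has_persisted_store(demo_dir))  # directory now has a file
```

In the notebook, a `False` result would be the cue to fall back to `vectorstore.as_retriever()` or to rerun the embedding pipeline.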

### Step 3. Augmented Prompt

In [8]:
from langchain.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [17]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [18]:
from langchain_openai.chat_models import ChatOpenAI
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model="gpt-3.5-turbo")
question_answer_chain = create_stuff_documents_chain(model, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
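
The "stuff" strategy named in `create_stuff_documents_chain` means that all retrieved chunks are concatenated and placed into the `{context}` slot of the prompt. The sketch below mimics that step with plain strings. The shortened template, the sample passages, the `"\n\n"` separator, and `stuff_context` are assumptions for illustration, not the library's internals.

```python
# Shortened stand-in for the notebook's system prompt.
SYSTEM_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question."
    "\n\n"
    "{context}"
)


def stuff_context(page_contents: list[str], separator: str = "\n\n") -> str:
    """Join the text of every retrieved chunk into one context string."""
    return separator.join(page_contents)


# Plain strings stand in for the page_content of retrieved Document objects.
retrieved = [
    "RAG combines retrieval with generation.",
    "The retriever returns the top passages.",
]
system_message = SYSTEM_TEMPLATE.format(context=stuff_context(retrieved))
print(system_message)
```

Because every chunk lands in one prompt, the number and size of retrieved chunks directly affect how much of the model's context window is used.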

In [22]:
user_query = "Which country is United State"
response = rag_chain.invoke({"input":user_query})
response

{'input': 'Which country is United State',
 'context': [Document(page_content='19', metadata={'author': '', 'creationDate': 'D:20210413004838Z', 'creator': 'LaTeX with hyperref', 'file_path': './document/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210413004838Z', 'page': 18, 'producer': 'pdfTeX-1.40.21', 'source': './document/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}),
  Document(page_content='19', metadata={'author': '', 'creationDate': "D:20210413004838Z00'00'", 'creator': 'LaTeX with hyperref', 'file_path': './document/2005.11401v4.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': "D:20240620010108Z00'00'", 'page': 18, 'producer': 'macOS Version 14.5 (Build 23F79) Quartz PDFContext, AppendMode 1.1', 'source': './document/2005.11401v4.pdf', 'subject': '', 'title': '', 'total_pages': 19, 'trapped': ''}),
  Document(page_content='country to host this international sports competition twice.” As Jeopardy questions are

In [23]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

In [25]:
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    model, retriever, contextualize_q_prompt
)
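
The history-aware retriever adds one decision in front of the search: with an empty chat history the user input goes to the retriever unchanged, otherwise the model first rewrites it into a standalone question. The sketch below shows only that control flow. `standalone_query` and `fake_reformulate` are hypothetical, and the rewrite is a string stub standing in for the LLM call.

```python
def standalone_query(user_input, chat_history, reformulate):
    """Return the text that would be sent to the retriever."""
    if not chat_history:
        # No history: nothing to resolve, so search with the input as is.
        return user_input
    return reformulate(user_input, chat_history)


def fake_reformulate(user_input, chat_history):
    """Stub for the LLM rewrite: attach the most recent human turn."""
    last_human = next(
        text for role, text in reversed(chat_history) if role == "human"
    )
    return f"{user_input} [refers to: {last_human}]"


history = [("human", "What is RAG?"), ("ai", "RAG is ...")]
print(standalone_query("What is RAG?", [], fake_reformulate))
print(standalone_query("How to implement it?", history, fake_reformulate))
```

This is why a follow-up such as "How to implement it?" can still retrieve RAG-related chunks even though the question itself never mentions RAG.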

In [26]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(model, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

### Step 4. Conversation Generating

Option 1.

In [32]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is RAG?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "How to implement it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})
print(ai_msg_1["answer"])
print(ai_msg_2["answer"])

Retrieval-Augmented Generation (RAG) is a method that enhances the output of large language models by incorporating external knowledge sources for generating responses. It enables large language models to access authoritative information outside their original training data to improve the relevance and accuracy of their output. RAG extends the capabilities of language models without requiring retraining, making it a cost-effective way to enhance their performance.
To implement Retrieval-Augmented Generation (RAG), developers can introduce an information retrieval component that pulls data from external sources based on user input. This retrieved information, along with the user query, is then provided to the large language model for generating responses. By connecting the model to live social media feeds, news sites, or other updated sources, developers can ensure the model provides the latest and most relevant information to users, enhancing trust and accuracy.


Option 2. Case 1 - Specific knowledge query first

In [33]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

while True:
    question = input("Enter a query: ")

    if question == "exit":
        break

    response = rag_chain.invoke({"input": question, "chat_history": chat_history})
    chat_history.extend(
        [
            HumanMessage(content=question),
            AIMessage(content=response["answer"]),
        ]
    )
    print(response["answer"])

Retrieval-Augmented Generation (RAG) optimizes large language models by referencing external authoritative knowledge bases to enhance responses without retraining. RAG extends the capabilities of large language models to specific domains or internal knowledge bases for improved accuracy and relevance. It allows for presenting accurate information with source attribution, increasing user trust in generative AI solutions.
Implementing Retrieval-Augmented Generation (RAG) involves redirecting a large language model to retrieve information from authoritative knowledge sources. Developers can connect the model to live data feeds, news sites, or other updated sources for real-time information. By providing the latest research, statistics, or news to the generative models, RAG ensures the output remains current and relevant.
Vector databases play a crucial role in storing and matching vector representations of user queries and database entries for efficient retrieval of relevant information. 

Option 2. Case 2 - Common knowledge query first

In [34]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

while True:
    question = input("Enter a query: ")

    if question == "exit":
        break

    response = rag_chain.invoke({"input": question, "chat_history": chat_history})
    chat_history.extend(
        [
            HumanMessage(content=question),
            AIMessage(content=response["answer"]),
        ]
    )
    print(response["answer"])

I don't know.
I don't know.
The United States is a country in North America.
The capital of the United States is Washington, D.C.
The capital of England is London.
Retrieval-Augmented Generation (RAG) optimizes large language models to incorporate external knowledge sources for generating responses.
RAG can be implemented by introducing an information retrieval component that pulls relevant data from external sources based on user input before generating a response.


Option 3.

In [37]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
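
The `store` dictionary keys each conversation by `session_id`, so two sessions never see each other's messages. A minimal sketch of that isolation, assuming plain lists in place of `ChatMessageHistory`; `demo_store` and `get_demo_history` are hypothetical names:

```python
demo_store = {}


def get_demo_history(session_id: str) -> list:
    """Return the message list for a session, creating it on first use."""
    if session_id not in demo_store:
        demo_store[session_id] = []
    return demo_store[session_id]


# Two sessions write to separate histories.
get_demo_history("abc123").append("What is RAG?")
get_demo_history("xyz789").append("Where is US?")
get_demo_history("abc123").append("How to implement it?")

print(get_demo_history("abc123"))
print(get_demo_history("xyz789"))
```

Because the dictionary lives in memory, the histories are lost when the kernel restarts.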

In [38]:

while True:
    question = input("Enter a query: ")

    if question == "exit":
        break

    response = conversational_rag_chain.invoke(
        {"input": question},
        config={"configurable": {"session_id": "abc123"}},
    )
    print(response["answer"])

Parent run f30edaea-eb47-418f-9a15-e3b1e9752789 not found for run 5ce879cb-37c2-4cee-a8b8-1833fdd0cc77. Treating as a root run.


I don't know.


Retrieval-Augmented Generation (RAG) optimizes large language models by referencing external knowledge bases. RAG enhances the capabilities of large language models without retraining them. It improves output relevance, accuracy, and usefulness in various contexts.


RAG implementation involves introducing an information retrieval component to pull relevant data. This data is combined with user input and given to the large language model (LLM) for generating responses. Implementing RAG enhances the LLM's output by incorporating external knowledge sources without the need for retraining the model.


I don't know.


I don't know.


I don't know.


Besides text generation, other methods in natural language processing include text classification, named entity recognition, sentiment analysis, machine translation, and question-answering systems. These methods serve various purposes such as information extraction, summarization, and language understanding tasks.


Vector databases play a crucial role in semantic search technologies by storing vector representations of documents or data. These databases enable efficient matching of user queries to relevant vectors, allowing for accurate retrieval of information. The vector representations help in calculating relevance scores and retrieving semantically related documents for improved search results.


Vector databases store numerical representations of data, enabling efficient information retrieval based on similarity metrics. They are used in various applications, such as semantic search and recommendation systems, to match queries with relevant vectors. These databases play a vital role in enhancing the performance of AI models by enabling faster and more accurate retrieval of information.


One example of a vector database product is Milvus, an open-source vector similarity search engine by Zilliz. Milvus is designed for storing and processing vectors efficiently, enabling applications like recommendation systems, image search, and natural language processing tasks. It provides scalable and high-performance vector storage and retrieval capabilities for AI applications.


The United States, commonly known as the U.S., is a country primarily located in North America.


The capital of the United States is Washington, D.C.


In [39]:
for message in store["abc123"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"

    print(f"{prefix}: {message.content}\n")

User: Where is US?

AI: I don't know.

User: What is RAG?

AI: Retrieval-Augmented Generation (RAG) optimizes large language models by referencing external knowledge bases. RAG enhances the capabilities of large language models without retraining them. It improves output relevance, accuracy, and usefulness in various contexts.

User: How to implement it?

AI: RAG implementation involves introducing an information retrieval component to pull relevant data. This data is combined with user input and given to the large language model (LLM) for generating responses. Implementing RAG enhances the LLM's output by incorporating external knowledge sources without the need for retraining the model.

User: what is the capital of US?

AI: I don't know.

User: Where is United State?

AI: I don't know.

User: Which country is United State?

AI: I don't know.

User: What are other method besides text generation?

AI: Besides text generation, other methods in natural language processing include text c

Option 4. Multiple Queries