In [1]:
!pip install -q langchain_ollama langchain_community pdfminer-six sentence_transformers faiss-cpu

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
from langchain_community.document_loaders import PDFMinerLoader
from langchain_text_splitters.character import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_ollama import ChatOllama
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

In [4]:
def load_document(file_path):
    loader = PDFMinerLoader(file_path)
    documents = loader.load()
    return documents

In [5]:
def setup_vectorstore(documents):
    embeddings = HuggingFaceEmbeddings()
    text_splitter = CharacterTextSplitter(
        separator="/n",
        chunk_size=1000,
        chunk_overlap=200
    )
    doc_chunks = text_splitter.split_documents(documents)
    vectorstore = FAISS.from_documents(doc_chunks, embeddings)
    return vectorstore

In [6]:
def create_chain(vectorstore):
    llm = ChatOllama(
        model="llama3.1",
        temperature=0
    )
    retriever = vectorstore.as_retriever()
    memory = ConversationBufferMemory(
        llm=llm,
        output_key="answer",
        memory_key="chat_history",
        return_messages=True
    )
    chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        chain_type="map_reduce",
        memory=memory,
        verbose=True
    )
    return chain

In [7]:
FILE_PATH = "Gen_AI_Engineer_Machine_Learning_Engineer_Assignment.pdf"
vectorstore = setup_vectorstore(load_document(FILE_PATH))
conversation_chain = create_chain(vectorstore)

  embeddings = HuggingFaceEmbeddings()
  embeddings = HuggingFaceEmbeddings()


In [8]:
def query_assistant(question):
    response = conversation_chain({"question": question})
    return response["answer"]

In [9]:
answer = query_assistant("What are the deliverables")

  response = conversation_chain({"question": question})
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3mSystem: Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text verbatim.
______________________
Gen AI Engineer / Machine Learning Engineer Assignment

Part 1: Retrieval-Augmented Generation (RAG) Model for QA Bot

Problem Statement:

Develop a Retrieval-Augmented Generation (RAG) model for a Question Answering (QA)
bot for a business. Use a vector database like Pinecone DB and a generative model like
Cohere API (or any other available alternative). The QA bot should be able to retrieve
relevant information from a dataset and generate coherent answers.

Task Requirements:

1.

Implement a RAG-based model that can handle questions related to a provided
document or dataset.

2. Use a vector database (such as Pinecone)

to store and retrieve document

embeddings efficiently.

3. Test the model with several queries and show how well it retrieves and generates

accurate answer

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3mSystem: Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
______________________
Here are the deliverables mentioned in the document:

**Part 1: Retrieval-Augmented Generation (RAG) Model for QA Bot**

* A Colab notebook demonstrating the entire pipeline, from data loading to question answering.
* Documentation explaining the model architecture, approach to retrieval, and how generative responses are created.
* Provide several example queries and the corresponding outputs.

**Part 2: Interactive QA Bot Interface**

* A deployed QA bot with a frontend interface where users can upload documents and interact with the bot.
* Documentation on how the user can upload files, ask questions, and view the bot's responses.
* Example interactions demonstrating the bot's capabilities.
Human: What are the deliverables[0m

[1m

In [11]:
print(answer)

The deliverables mentioned in the document are:

1. A Colab notebook
2. Documentation explaining the model architecture and approach to retrieval
3. Several example queries and their corresponding outputs
4. A deployed QA bot with a frontend interface
5. Documentation on how users can interact with the bot
6. Example interactions demonstrating the bot's capabilities


In [13]:
answer = query_assistant("How to containerize the application")

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: What are the deliverables
Assistant: The deliverables mentioned in the document are:

1. A Colab notebook
2. Documentation explaining the model architecture and approach to retrieval
3. Several example queries and their corresponding outputs
4. A deployed QA bot with a frontend interface
5. Documentation on how users can interact with the bot
6. Example interactions demonstrating the bot's capabilities
Human: How to containerize the application
Assistant: Use Docker.
Follow Up Input: How to containerize the application
Standalone question:[0m


Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")



[1m> Finished chain.[0m
Prompt after formatting:
[32;1m[1;3mSystem: Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text verbatim.
______________________
Gen AI Engineer / Machine Learning Engineer Assignment

Part 1: Retrieval-Augmented Generation (RAG) Model for QA Bot

Problem Statement:

Develop a Retrieval-Augmented Generation (RAG) model for a Question Answering (QA)
bot for a business. Use a vector database like Pinecone DB and a generative model like
Cohere API (or any other available alternative). The QA bot should be able to retrieve
relevant information from a dataset and generate coherent answers.

Task Requirements:

1.

Implement a RAG-based model that can handle questions related to a provided
document or dataset.

2. Use a vector database (such as Pinecone)

to store and retrieve document

embeddings efficiently.

3. Test the model with several queries and show how well it retrieves and

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")



[1m> Finished chain.[0m
Prompt after formatting:
[32;1m[1;3mSystem: Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
______________________
Use Docker to containerize the application for easy deployment.
Human: Here is the standalone question:

How do I containerize the application?[0m

[1m> Finished chain.[0m

[1m> Finished chain.[0m


In [14]:
print(answer)

Use Docker.


In [17]:
answer = query_assistant("What is the alternate of Streamlit")

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3mGiven the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: What are the deliverables
Assistant: The deliverables mentioned in the document are:

1. A Colab notebook
2. Documentation explaining the model architecture and approach to retrieval
3. Several example queries and their corresponding outputs
4. A deployed QA bot with a frontend interface
5. Documentation on how users can interact with the bot
6. Example interactions demonstrating the bot's capabilities
Human: How to containerize the application
Assistant: Use Docker.
Human: How to containerize the application
Assistant: Use Docker.
Human: what should be used to build a simple frontend interface
Assistant: I don't know. The text only mentions Streamlit or Gradio as options for uploading PDF documents and asking questions, but it doesn't provide information on their general use cases 

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")



[1m> Finished chain.[0m
Prompt after formatting:
[32;1m[1;3mSystem: Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text verbatim.
______________________
Gen AI Engineer / Machine Learning Engineer Assignment

Part 1: Retrieval-Augmented Generation (RAG) Model for QA Bot

Problem Statement:

Develop a Retrieval-Augmented Generation (RAG) model for a Question Answering (QA)
bot for a business. Use a vector database like Pinecone DB and a generative model like
Cohere API (or any other available alternative). The QA bot should be able to retrieve
relevant information from a dataset and generate coherent answers.

Task Requirements:

1.

Implement a RAG-based model that can handle questions related to a provided
document or dataset.

2. Use a vector database (such as Pinecone)

to store and retrieve document

embeddings efficiently.

3. Test the model with several queries and show how well it retrieves and

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")



[1m> Finished chain.[0m
Prompt after formatting:
[32;1m[1;3mSystem: Given the following extracted parts of a long document and a question, create a final answer. 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
______________________
According to the text, an alternative to Streamlit is Gradio.
Human: What is an alternative to Streamlit?[0m

[1m> Finished chain.[0m

[1m> Finished chain.[0m


In [18]:
print(answer)

An alternative to Streamlit is Gradio.
