# <b>Getting Started with Generative AI</b>
Submitted By: <b><i>Jenish Twayana</i></b>  

Submitted Date: <b><i>9th September, 2024</i></b>  

## Final Project
<b>Objective:</b>    

Build a Retrieval-Augmented Generation (RAG)-based chatbot that efficiently
extracts and provides information from the PDF of the Constitution of Nepal 2072.

## The project is hosted on Streamlit.
### Here is the link: https://askaboutnepalconstitution.streamlit.app/

### Set up the environment variables for LangChain tracing
Tracing logs every chain run to the LangSmith web interface.

In [1]:
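import os

# API keys are read from the environment; set them before starting the notebook
openai_api_key = os.getenv('OPENAI_API_KEY')
pinecone_api_key = os.getenv('PINECONE_API_KEY')

# LangSmith reads LANGCHAIN_API_KEY directly from the environment; copying
# os.getenv(...) back into os.environ raises a TypeError when it is unset
if not os.getenv('LANGCHAIN_API_KEY'):
    raise RuntimeError("Set LANGCHAIN_API_KEY to enable LangSmith tracing")

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "rag-assignment"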

### Import the necessary packages and libraries

In [2]:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pinecone import Pinecone
from langchain.chains import RetrievalQAWithSourcesChain, ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

### Initialize the chat model

In [3]:
# OpenAI chat model; temperature 0 minimizes randomness so answers stay close to the retrieved text
llm = ChatOpenAI(
    openai_api_key=openai_api_key,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

In [4]:
index_name = "rag-assignment"

### Initialize the Pinecone client

In [5]:
pc = Pinecone(api_key=pinecone_api_key)
index = pc.Index(index_name)

### Initialize the embeddings model

In [6]:
model_name = "text-embedding-ada-002"

embeddings = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=openai_api_key
)
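
The embeddings model maps each chunk and each query to a 1536-dimensional vector (the `dimension` reported by the index stats), and retrieval ranks stored chunks by their similarity to the query vector. The sketch below shows a brute-force version of that ranking in plain Python. It assumes cosine similarity as the metric (the index's actual metric is not shown in this notebook), and the names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are hypothetical; a managed vector database such as Pinecone exists precisely to avoid this linear scan over every stored vector.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 4) -> List[Tuple[int, float]]:
    """Brute-force similarity search: score every stored vector and keep the k best."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-d "embeddings" standing in for real 1536-d ones (hypothetical values)
stored = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
print(top_k([1.0, 0.1, 0.0], stored, k=2))
```

Here vector 0 ranks first and vector 2 second, because both point mostly in the same direction as the query.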

### Initialize the vector store (Pinecone)

In [7]:
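# text_key names the metadata field that stores each chunk's text;
# it must match the key used when the chunks were upserted
vectorstore = PineconeVectorStore(
    index_name=index_name,
    embedding=embeddings,
    pinecone_api_key=pinecone_api_key,
    text_key='nepal-constitution-2072'
)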

### Load the document (PDF)

In [8]:
file = 'nepal-constitution-2072.pdf'

In [9]:
# load documents
loader = PyPDFLoader(file)
documents = loader.load()

### Split the document

In [10]:
# split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
docs = text_splitter.split_documents(documents)
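
Here `chunk_size` and `chunk_overlap` are measured in characters (the splitter's default length function is `len`), so neighbouring chunks can share up to 150 characters and a sentence cut at a chunk boundary still appears whole in one of them. `RecursiveCharacterTextSplitter` first tries to break on paragraph, line, and word separators; the hypothetical `sliding_window_chunks` below ignores separators entirely and is only a simplified stand-in that shows how size and overlap interact, not the library's algorithm.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between the starts of consecutive windows
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):  # this window already reaches the end of the text
            break
        start += step
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))  # → ['abcd', 'defg', 'ghij']
```

With the notebook's settings (`chunk_size=1000`, `chunk_overlap=150`) this simplified version would start a new chunk every 850 characters.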

### Add the document chunks to the vector store (Pinecone)

In [None]:
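# Embed each chunk and upsert it into the index; use the same text_key as the
# vectorstore above, otherwise retrieval cannot read the chunk text back
vectorstore_from_docs = PineconeVectorStore.from_documents(
    docs,
    index_name=index_name,
    embedding=embeddings,
    text_key='nepal-constitution-2072'
)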

In [11]:
print(index.describe_index_stats())

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 936}},
 'total_vector_count': 936}
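
The stats show that the index dimension (1536) matches the output size of `text-embedding-ada-002`. A small check such as the hypothetical `check_index_stats` below, written against the dictionary shape printed above, can catch a dimension mismatch or a partial upload before any querying. The real `describe_index_stats()` return value may need converting to a plain dict first, and comparing against an expected count assumes the index holds exactly one upload of this PDF and nothing else.

```python
from typing import Any, Dict, List


def check_index_stats(stats: Dict[str, Any], expected_dimension: int, expected_count: int) -> List[str]:
    """Return human-readable problems; an empty list means the stats look consistent."""
    problems: List[str] = []
    if stats["dimension"] != expected_dimension:
        problems.append(f"dimension {stats['dimension']} != embedding size {expected_dimension}")
    # total_vector_count should equal the sum of the per-namespace counts
    per_namespace = sum(ns["vector_count"] for ns in stats["namespaces"].values())
    if per_namespace != stats["total_vector_count"]:
        problems.append("namespace counts do not add up to total_vector_count")
    if stats["total_vector_count"] != expected_count:
        problems.append(f"{stats['total_vector_count']} vectors stored, expected {expected_count}")
    return problems


# Values copied from the describe_index_stats() output above
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 936}},
    "total_vector_count": 936,
}
print(check_index_stats(stats, expected_dimension=1536, expected_count=936))  # → []
```

In the notebook itself, `expected_count=len(docs)` would be the natural argument.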


### Test retrieval of document chunks

In [13]:
query = "nepal constitution"
vectorstore.max_marginal_relevance_search(query, k=4)

[Document(metadata={'page': 0.0, 'source': '/tmp/tmpp7o83c8n/tmp.pdf'}, page_content='1 \n  \n \n \n \nTHE CONSTITUTION OF NEPAL'),
 Document(metadata={'page': 197.0, 'source': '/tmp/tmpp7o83c8n/tmp.pdf'}, page_content='198 \n until election to the President or Vice -President is held and he or she assumes \noffice.  \n281. Appraisal and review of special rights : The Government of Nepal shall make \nappraisal  and review of the implementation of special rights of the women and \nDalit  community and impacts thereof, on the basis of human development \nindex, concurrently with a national census to be held in every ten years.  \n282. Ambassadors and special emissaries : (1) The President may, on the basis of the \nprinciple of inclusion, appoint Nepalese ambassadors, and special emissaries \nfor any specific purposes.  \n   (2) The President shall receive letters of credentials from foreign \nambassadors and diplomatic representat ives.  \n283. Appointments to be made  in accordance wit

### QA with sources chain

In [14]:
qa_with_sources = RetrievalQAWithSourcesChain.from_chain_type(  
    llm=llm,  
    chain_type="stuff",  
    retriever=vectorstore.as_retriever()  
)  
qa_with_sources.invoke(query) 

{'question': 'nepal constitution',
 'answer': "The Constitution of Nepal was published on September 20, 2015, in the Nepal Gazette. It emphasizes the people's sovereign right, autonomy, self-rule, freedom, sovereignty, territorial integrity, national unity, independence, and dignity of Nepal. It also aims to end discrimination and oppression and promote social and cultural solidarity, tolerance, harmony, and unity in diversity.\n",
 'sources': '/tmp/tmpp7o83c8n/tmp.pdf'}
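
With `chain_type="stuff"`, all retrieved chunks are placed ("stuffed") into a single prompt next to the question, and the model is asked to answer and name its sources. Every chunk here carries the same `source` path, which is why `sources` in the output above lists a single file. The sketch below illustrates the stuffing step with a hypothetical template and hypothetical source labels; LangChain's actual prompt for `RetrievalQAWithSourcesChain` is worded differently.

```python
from typing import List, Tuple

# Hypothetical template; not the prompt LangChain actually uses
TEMPLATE = (
    "Answer the question using only the extracts below. "
    "List the sources you used after 'SOURCES:'.\n\n"
    "{context}\n\n"
    "QUESTION: {question}\n"
    "FINAL ANSWER:"
)


def build_stuff_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """Concatenate every (text, source) chunk, in order, into one prompt string."""
    context = "\n\n".join(f"Content: {text}\nSource: {source}" for text, source in chunks)
    return TEMPLATE.format(context=context, question=question)


chunks = [
    ("THE CONSTITUTION OF NEPAL", "nepal-constitution-2072.pdf (page 0)"),
    ("Do hereby pass and promulgate this Constitution, through the Constituent Assembly",
     "nepal-constitution-2072.pdf (page 6)"),
]
print(build_stuff_prompt("who made the constitution?", chunks))
```

The trade-off of "stuff" is simplicity versus size: the prompt grows with `k` and `chunk_size`, so both must keep it within the model's context window.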

### Initialize memory for the RAG chain

In [15]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key='answer'
)
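
`ConversationBufferMemory` keeps every turn verbatim and hands the whole history back to the chain under `memory_key="chat_history"`. `output_key='answer'` tells it which chain output to record as the AI reply, which matters because the chain below also returns `source_documents` and `generated_question`. The plain-Python class below mimics that behaviour as a hypothetical sketch (`BufferMemorySketch`, `save_turn`, and `load` are illustrative names, not LangChain's implementation). An unbounded buffer grows with every turn, so long conversations eventually consume more of the model's context window.

```python
from typing import Dict, List, Tuple


class BufferMemorySketch:
    """Hypothetical stand-in for a conversation buffer: stores every turn verbatim."""

    def __init__(self, memory_key: str = "chat_history", output_key: str = "answer") -> None:
        self.memory_key = memory_key
        self.output_key = output_key
        self.messages: List[Tuple[str, str]] = []  # (role, content) pairs in order

    def save_turn(self, inputs: Dict[str, str], outputs: Dict[str, object]) -> None:
        # Record the user's question and only the chosen output field as the AI reply
        self.messages.append(("human", inputs["question"]))
        self.messages.append(("ai", str(outputs[self.output_key])))

    def load(self) -> Dict[str, List[Tuple[str, str]]]:
        # Return a copy so callers cannot mutate the stored history
        return {self.memory_key: list(self.messages)}


memory_sketch = BufferMemorySketch()
memory_sketch.save_turn(
    {"question": "who made the constitution?"},
    {"answer": "The Constitution of Nepal was made by the Constituent Assembly.",
     "source_documents": []},
)
print(memory_sketch.load())
```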

### Initialize the retriever

In [16]:
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4})
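
With `search_type="mmr"` the retriever uses Maximal Marginal Relevance: it fetches a larger candidate pool, then greedily picks chunks that are relevant to the query but not redundant with chunks already picked, scoring each candidate as `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, picked)`. The pure-Python sketch below implements that greedy selection over made-up vectors. Cosine similarity and `lambda_mult=0.5` (the default in LangChain's `max_marginal_relevance_search`) are assumptions here, and `mmr_select` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def mmr_select(query: Sequence[float], candidates: List[Sequence[float]],
               k: int = 4, lambda_mult: float = 0.5) -> List[int]:
    """Greedy Maximal Marginal Relevance: return indices of the selected candidates."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy = similarity to the closest already-selected candidate
            redundancy = max((cosine_similarity(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [0.8, 0.6]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # candidate 1 duplicates candidate 0
print(mmr_select(query, candidates, k=2))  # → [0, 2]
```

Plain top-2 similarity search would return the duplicate pair `[0, 1]`; MMR skips the duplicate in favour of the less similar but new candidate, which is why it suits a long legal text with many near-identical clauses. Setting `lambda_mult=1.0` recovers plain relevance ranking.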

### Conversational QA chain for RAG

In [17]:
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory,
    chain_type="stuff", 
    return_source_documents=True,
    return_generated_question=True,
)
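
`ConversationalRetrievalChain` works in two steps: when chat history exists, it first asks the LLM to rewrite the follow-up into a standalone question, then retrieves chunks for that rewritten question and answers with the "stuff" strategy. Setting `return_generated_question=True` exposes the rewritten question in the result as `generated_question`. The sketch below builds only the rewrite prompt; the template wording, `format_chat_history`, `build_condense_prompt`, and the follow-up question are all hypothetical and not LangChain's own prompt.

```python
from typing import List, Tuple

# Hypothetical rewrite prompt; LangChain's built-in condense-question prompt is worded differently
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, rewrite the follow-up "
    "as a standalone question.\n\n"
    "Chat history:\n{chat_history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def format_chat_history(turns: List[Tuple[str, str]]) -> str:
    """Render (human, ai) turn pairs as plain-text lines."""
    return "\n".join(f"Human: {human}\nAssistant: {ai}" for human, ai in turns)


def build_condense_prompt(question: str, turns: List[Tuple[str, str]]) -> str:
    return CONDENSE_TEMPLATE.format(chat_history=format_chat_history(turns), question=question)


history = [("who made the constitution?",
            "The Constitution of Nepal was made by the Constituent Assembly.")]
print(build_condense_prompt("when was it published?", history))
```

The rewrite step is what lets a follow-up such as "when was it published?" retrieve the right chunks: the pronoun "it" carries no meaning for a vector search until the history resolves it.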

In [18]:
question = "who made the constitution?"
result = qa.invoke({"question": question})

In [19]:
for k, v in result.items():
    print(f'{k}: {v}')
    print('\n')

question: who made the constitution?


chat_history: [HumanMessage(content='who made the constitution?'), AIMessage(content='The Constitution of Nepal was made by the Constituent Assembly.')]


answer: The Constitution of Nepal was made by the Constituent Assembly.


source_documents: [Document(metadata={'page': 6.0, 'source': '/tmp/tmpp7o83c8n/tmp.pdf'}, page_content='7 \n Do hereby pass and promulgate this Constitution,  through the Constituent Assembly, \nin order to fulfil the aspirations for sustainable peace, good governance, development \nand prosperity through the federal , democratic , republic an, system of governance.'), Document(metadata={'page': 0.0, 'source': '/tmp/tmpp7o83c8n/tmp.pdf'}, page_content='1 \n  \n \n \n \nTHE CONSTITUTION OF NEPAL'), Document(metadata={'page': 214.0, 'source': '/tmp/tmpp7o83c8n/tmp.pdf'}, page_content='the time of commencement of this Constitution, in excess of the number \nspecified in this Constitution,  shall continue to hold their respect