# AI Chatbot Search PDF Files

## Install Packages

If you are using macOS, use `pip3` instead of `pip`.

The `-qU` flags stand for `--quiet` and `--upgrade`.

In [1]:
!pip install -qU \
    langchain==0.0.276 \
    openai==0.27.10 \
    tiktoken==0.4.0 \
    pinecone-client==2.2.2 \
    wikipedia==1.4.0 \
    pypdf==3.15.4

## Import Packages

In [2]:
from langchain.embeddings import OpenAIEmbeddings   
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Pinecone
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationSummaryMemory
from langchain.retrievers import WikipediaRetriever

import pinecone
import time

from config import OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX_NAME, EMBEDDING_MODEL



## Helper Functions

In [3]:
def print_source(result):
    sources = result["source_documents"]
    for i in range(min(3, len(sources))):
        print("="*60)
        print(f"Source [{i+1}] \t File: [{sources[i].metadata['source']}] \t Page: [{int(sources[i].metadata['page'])}]")
        print("="*60)
        print(sources[i].page_content)
        print("="*60)
        print()



def print_wiki_source(result):
    sources = result["source_documents"]
    for i in range(min(3, len(sources))):
        print("="*60)
        print(f"Source [{i+1}] \t Title: [{sources[i].metadata['title']}]")
        print(f"URL: [{sources[i].metadata['source']}]")
        print("="*60)
        print(sources[i].page_content)
        print("="*60)
        print()



def print_answer(result):
    print("="*30)
    print(" "*10 + "Question")
    print("="*30)
    print(result["question"])
    print("="*30)
    print()

    print("="*30)
    print(" "*10 + "Answer")
    print("="*30)
    print(result["answer"])
    print("="*30)
    print()
    


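def if_existed(query, vectorstore):
    # Probe the vector store with an MMR search. The search raises an error
    # when no matching vector is found (see the Wikipedia Retriever section),
    # so any exception is treated as "not found".
    is_existed = True
    try:
        vectorstore.max_marginal_relevance_search(
            query=query,
            k=4,
            fetch_k=20,
            lambda_mult=0.5
        )
    except Exception:
        is_existed = False

    return is_existed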



def search(query, vector_chain, vectorstore, wiki_chain):
    if if_existed(query, vectorstore):
        res = vector_chain({"question": query})
        print_answer(res)
        print_source(res)
    else:
        print("Let me grab Wikipedia to answer your question......")
        res = wiki_chain({"question": query})
        print_answer(res)
        print_wiki_source(res)


## Initialize OpenAI Chat Model

LangChain offers both LLMs and chat models. Because this notebook is conversational, a [chat model](https://python.langchain.com/docs/modules/model_io/models/chat/) is used. By default, `ChatOpenAI` uses the `gpt-3.5-turbo` [OpenAI model](https://platform.openai.com/docs/models/gpt-3-5).


In [4]:
llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY)

embedding_model = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY, 
    model=EMBEDDING_MODEL
)

print("="*30)
print("OpenAI initialization: OK")
print("="*30)
print()

OpenAI initialization: OK



## Initialize Pinecone

If the index does not exist in your Pinecone project, the cell below creates it automatically.

- `metric='cosine'`: Cosine similarity is commonly used to compare documents. Its scores are normalized to the [-1, 1] range. Other options are listed [here](https://docs.pinecone.io/docs/indexes#distance-metrics).
- `dimension=1536`: The OpenAI `text-embedding-ada-002` embedding has 1536 dimensions.
- The Pinecone free plan has limitations. See the [starter plan](https://docs.pinecone.io/docs/indexes#starter-plan) for details.
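
To make the [-1, 1] range of the cosine metric concrete, here is a minimal sketch in plain Python. The helper `cosine_similarity` and the toy 2-dimensional vectors are illustrative assumptions and are not part of the notebook; real embeddings have 1536 dimensions and Pinecone computes the score on the server side.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical): the score depends on direction, not length.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated, close to 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction, close to -1
print(same, orthogonal, opposite)
```

Because the score ignores vector length, two documents of very different sizes can still be rated as similar when their embeddings point in the same direction.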

In [5]:
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENVIRONMENT
)

if PINECONE_INDEX_NAME not in pinecone.list_indexes():
    # we create a new index if it doesn't exist
    pinecone.create_index(
        name=PINECONE_INDEX_NAME,
        metric='cosine',
        dimension=1536  # 1536 dim of text-embedding-ada-002
    )
    # wait for index to be initialized
    time.sleep(1)

pinecone_index = pinecone.Index(PINECONE_INDEX_NAME)
pinecone_stats = pinecone_index.describe_index_stats()
print("="*30)
print("Pinecone initialization: OK")
print(pinecone_stats)
print("="*30)
print()

Pinecone initialization: OK
{'dimension': 1536,
 'index_fullness': 0.01366,
 'namespaces': {'': {'vector_count': 1366}},
 'total_vector_count': 1366}



## Retriever

### Pinecone retriever

A Pinecone vector store is used as a retriever.

- `search_type="mmr"`: [Maximal Marginal Relevance (MMR)](https://medium.com/tech-that-works/maximal-marginal-relevance-to-rerank-results-in-unsupervised-keyphrase-extraction-22d95015c7c5) is used as the search method. MMR adds diversity to the search results while keeping them relevant to the query. Other search types can be found [here](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.pinecone.Pinecone.html#langchain.vectorstores.pinecone.Pinecone.as_retriever).
  - `"lambda_mult": 0.5`: Weights diversity and relevance equally in the result set. A value of 0 gives maximum diversity and a value of 1 gives minimum diversity.
- `"k": 5`: Returns the top 5 search results to give the model more context.
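
The effect of `lambda_mult` can be illustrated with a small, self-contained sketch of greedy MMR selection. This is not LangChain's implementation; the functions `cosine` and `mmr` and the toy 2-dimensional vectors are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest document already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

query = [1.0, 0.0]
docs = [
    [1.0, 0.1],    # 0: very relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.8, -0.6],   # 2: less relevant, but covers a different direction
]
balanced = mmr(query, docs, k=2, lambda_mult=0.5)
relevance_only = mmr(query, docs, k=2, lambda_mult=1.0)
print(balanced, relevance_only)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct document, while `lambda_mult=1.0` ranks purely by relevance and keeps both near-duplicates.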


### Wikipedia Retriever

A [Wikipedia Retriever](https://python.langchain.com/docs/integrations/document_loaders/wikipedia) is included to handle cases where the search entity does not exist in the Pinecone vector store.

There appears to be a bug where the [LangChain Pinecone search function](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.pinecone.Pinecone.html#langchain.vectorstores.pinecone.Pinecone.max_marginal_relevance_search) raises a validation error when it cannot find a matching vector in the Pinecone vector store. I opened a [GitHub issue](https://github.com/langchain-ai/langchain/issues/10111) with the LangChain community to report it.


In [6]:
vectorstore = Pinecone(pinecone_index, embedding_model, "text")
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 5,
        "lambda_mult": 0.5,  # equal weight on diversity and relevance
    }
)

wiki_retriever = WikipediaRetriever()

print("="*30)
print("Pinecone retriever: OK")
print("="*30)
print()


Pinecone retriever: OK



## Chat Memory

Conversation Summary Memory is used, based on [this article](https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/). For long conversations, this type of memory uses fewer tokens than Conversation Buffer Memory because it stores a running summary instead of the full chat history.
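
The difference in growth can be sketched with a toy comparison. Everything here is an assumption for illustration: `fake_summarize` is a hypothetical stand-in that simply keeps the most recent words (a real summary memory asks the LLM to write the summary), and word counts are used as a rough proxy for tokens.

```python
def fake_summarize(previous_summary, new_turn, max_words=30):
    """Hypothetical stand-in for an LLM summarizer: keeps only the last max_words words."""
    words = (previous_summary + " " + new_turn).split()
    return " ".join(words[-max_words:])

# Ten made-up conversation turns.
turns = [
    f"Human: question number {i} about hypertension. "
    f"AI: a fairly long answer number {i} with several details."
    for i in range(1, 11)
]

buffer_history = ""   # buffer-style memory: every turn is kept verbatim
summary_history = ""  # summary-style memory: bounded running summary
for turn in turns:
    buffer_history = (buffer_history + " " + turn).strip()
    summary_history = fake_summarize(summary_history, turn)

buffer_words = len(buffer_history.split())
summary_words = len(summary_history.split())
print(f"buffer: {buffer_words} words, summary: {summary_words} words")
```

The buffer grows with every turn, while the summary stays bounded. The trade-off is that a real summary memory spends an extra LLM call per turn to update the summary.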

In [7]:
memory = ConversationSummaryMemory(
    llm=llm, 
    memory_key="chat_history", 
    input_key='question', 
    output_key='answer', 
    return_messages=True
)
print("="*30)
print("Chat memory: OK")
print("="*30)
print()


Chat memory: OK



## Conversational Retrieval Chain

Create two Conversational Retrieval Chains, one for the Pinecone vector store and one for the Wikipedia retriever.

- `chain_type="stuff"`: For demo purposes, `"stuff"` is used to pass all retrieved documents to the model in a single prompt. As the number of PDFs grows, `"map_reduce"` or `"refine"` can be used instead. For more details, see [this video](https://www.youtube.com/watch?v=DXmiJKrQIvg&t=300s&ab_channel=SophiaYang).

For more information, see the [LangChain chatbot documentation](https://python.langchain.com/docs/use_cases/chatbots).
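
The difference between the "stuff" and "map_reduce" strategies can be sketched without calling a real model. The class `CountingLLM` and the functions `stuff_answer` and `map_reduce_answer` are hypothetical names used only for this illustration; LangChain's actual chains use their own prompts and classes.

```python
class CountingLLM:
    """Hypothetical stand-in for a chat model that records each prompt's length."""

    def __init__(self):
        self.prompt_lengths = []

    def __call__(self, prompt):
        self.prompt_lengths.append(len(prompt))
        return f"notes({len(prompt)} chars)"

def stuff_answer(question, docs, llm):
    # "stuff": put every document into one prompt and call the model once.
    prompt = "\n\n".join(docs) + f"\n\nQuestion: {question}"
    return llm(prompt)

def map_reduce_answer(question, docs, llm):
    # "map": call the model once per document.
    partial_answers = [llm(f"{doc}\n\nQuestion: {question}") for doc in docs]
    # "reduce": combine the partial answers in one final call.
    return llm("\n\n".join(partial_answers) + f"\n\nQuestion: {question}")

docs = ["A" * 1000, "B" * 1000, "C" * 1000]  # three made-up 1000-character chunks
question = "How to diagnose Resistant Hypertension?"

stuff_llm = CountingLLM()
stuff_answer(question, docs, stuff_llm)

map_reduce_llm = CountingLLM()
map_reduce_answer(question, docs, map_reduce_llm)

print("stuff:", stuff_llm.prompt_lengths)
print("map_reduce:", map_reduce_llm.prompt_lengths)
```

In this sketch, "stuff" makes one call with one large prompt, while "map_reduce" makes more calls but keeps each prompt small, which matters once the retrieved text no longer fits into the model's context window.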

In [8]:
conversation_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
    return_source_documents=True, 
    verbose=False
)

wiki_qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=wiki_retriever,
    memory=memory,
    return_source_documents=True,
    verbose=False
)

print("="*30)
print("Conversational Retrieval Chain: OK")
print("="*30)
print()

Conversational Retrieval Chain: OK



## Testing

### Test 1: Ask a query to the Pinecone vector store

In [9]:
query = "How to diagnose Resistant Hypertension?"
search(query, conversation_qa_chain, vectorstore, wiki_qa_chain)

          Question
How to diagnose Resistant Hypertension?

          Answer
The process for diagnosing Resistant Hypertension involves several steps. First, proper office and out-of-office blood pressure measurements should be taken. This includes ambulatory blood pressure monitoring (ABPM) or home blood pressure monitoring if ABPM is not accessible. 

Next, pharmacotherapy should be optimized, taking into consideration clinical inertia. This means ensuring that the patient is receiving the appropriate medications at the optimal dosages. It is recommended to include a diuretic as part of the medication regimen.

In addition, adherence to medication should be assessed, as poor adherence can contribute to uncontrolled blood pressure. Other factors, such as lifestyle factors and underlying health conditions, should also be considered.

Overall, the diagnosis of Resistant Hypertension should be based on a combination of proper blood pressure measurements, optimization of medication therap

### Test 2: Ask a follow-up question

The model recognizes that `it` refers to `Resistant Hypertension` from the previous question. In linguistics, this is called ["anaphora"](https://youtu.be/FFRnDRcbQQU?si=7Uoc3204dw_mzQz4&t=1792).
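
A conversational retrieval chain can resolve such a reference because the chat history is combined with the follow-up question before retrieval. The sketch below only illustrates that idea: the template wording, the function `build_condense_prompt`, and the sample history text are assumptions and are not copied from the notebook or from LangChain.

```python
# Assumed template wording, for illustration only.
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

def build_condense_prompt(chat_history, question):
    """Fill the template with the stored history and the new question."""
    return CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question)

# With history, the model has the antecedent of "it" in front of it.
prompt = build_condense_prompt(
    chat_history="The human asked how Resistant Hypertension is diagnosed.",
    question="How to treat it?",
)

# Without history, nothing in the prompt says what "it" refers to.
empty_prompt = build_condense_prompt("", "How to treat it?")

print(prompt)
```

Given the first prompt, a model can rewrite the question into a standalone form that names the condition, and that rewritten question is what gets sent to the retriever.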

In [10]:
query = "How to treat it?"
search(query, conversation_qa_chain, vectorstore, wiki_qa_chain)

          Question
How to treat it?

          Answer
The provided context does not explicitly state the specific treatment for Resistant Hypertension. However, it does mention the "ReHOT randomized study" and "add-on pharmacologic therapy" as potential approaches in managing resistant hypertension. It is recommended to consult medical guidelines, such as Hypertension Canada's 2020 Evidence Review and Guidelines for the Management of Resistant Hypertension, or speak with a healthcare professional for more information on the recommended treatment options for Resistant Hypertension.

Source [1] 	 File: [data/paper3.pdf] 	 Page: [4]
R
C
Conﬁrm diagnosis of true resistant hypertension
Figure 1. Diagnostic algorithm for a patient with suspected resistant hypertension. ABPM, ambulatory blood pressure monitoring; BP, blood
pressure; HT, hypertension. *Three or more drugs, at optimally tolerated dosages, and preferably including a diuretic.yHome BP monitoring can be
performed if ABPM is not ac

### Test 3: Clear the chat history

If the chat history is cleared, the model cannot know what "it" refers to. In other words, the model cannot identify the antecedent of the anaphor.

In [11]:
memory.clear()
query = "How to treat it?"
search(query, conversation_qa_chain, vectorstore, wiki_qa_chain)

          Question
How to treat it?

          Answer
The treatment for "it" is not clear from the provided context. Please provide more specific information or clarify what condition or disease you are referring to.

Source [1] 	 File: [data/paper1.pdf] 	 Page: [12]
ed corticoste-roids.
127Symptomatic patients with ISHLT grade 2R or
patients with 3R cellular rejection should be treated with
intravenous pulsed corticosteroids. Patients with acute cellularrejection and hemodynamic compromise should be treated
aggressively with pulse intravenous corticosteroids and thy-
moglobulin. Repeat endomyocardial biopsy is usually per-formed 7-14 days after treatment.

Source [2] 	 File: [data/paper3.pdf] 	 Page: [4]
 pharmacotherapy
The ﬁrst-line drugs recommended for management of
hypertension are renin angiotensin system blockade (eitherangiotensin converting enzyme inhibitors or angiotensin re-ceptor blockers), dihydropyridine calcium channel blockers,
and thiazide-like diuretics with longer-a

### Test 4: Test the Wikipedia Retriever

"Bill Gates" does not exist in the Pinecone vector storage. Ask this question to trigger the Wikipedia Retriever.

In [12]:
query = "Who is Bill Gates?"
search(query, conversation_qa_chain, vectorstore, wiki_qa_chain)

          Question
Who is Bill Gates?

          Answer
Bill Gates is an American business magnate, software developer, and philanthropist. He co-founded Microsoft Corporation, one of the world's largest and most successful technology companies. Gates is known for his contributions to the personal computer revolution and his philanthropic efforts through the Bill & Melinda Gates Foundation.

Source [1] 	 File: [data/test.pdf] 	 Page: [9]
9

Source [2] 	 File: [data/test.pdf] 	 Page: [12]
inton. Grammar as a foreign language. In
Advances in Neural Information Processing Systems , 2015.
[38] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang
Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine
translation system: Bridging the gap between human and machine translation. arXiv preprint

Source [3] 	 File: [data/paper1.pdf] 	 Page: [17]
, et al. New horizons on the
50th anniversary of heart transplantation in Canada: “Where the