## QnA with PubMed using ThirdAI's Playground

In this notebook, you will be able to:

1. Download ThirdAI's BOLT LLM trained on Pubmed-800K and the processed data.

2. Ask any question and get relevant references from PubMed.

3. (Optional) Use your OpenAI key to generate answers grounded in the retrieved references, which reduces the risk of hallucination.

In [None]:
!pip3 install thirdai==0.7.6
!pip3 install openai
!pip3 install paper-qa
!pip3 install langchain
# json ships with Python's standard library, so it needs no pip install
!pip3 install transformers

In [None]:
import json
import os

from thirdai import bolt, licensing

if "THIRDAI_KEY" in os.environ:
    licensing.activate(os.environ["THIRDAI_KEY"])
else:
    # Request a trial license at https://www.thirdai.com/try-bolt/
    licensing.activate("")  # Enter your ThirdAI key here

### Load Model

In [None]:
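# Model checkpoint: download it only if it is not already on disk
checkpoint = "pubmed_800k.bolt"
if not os.path.exists(checkpoint):
    # The URL is quoted so the shell does not treat "?" as a glob character
    os.system("wget -O pubmed_800k.bolt 'https://www.dropbox.com/s/kwoqt5c7bqbisbl/pubmed_800k.bolt?dl=0'")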

model = bolt.UniversalDeepTransformer.load(checkpoint)
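
The "download only if missing" step can also be written without shelling out to `wget`. The following is a minimal sketch using only the standard library; `download_if_missing` is a hypothetical helper name, not part of ThirdAI's package, and whether a given host serves the raw file to `urllib` the same way it does to `wget` is not something this sketch checks.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Fetch `url` into `path` unless `path` already exists.

    Returns True if a download happened, False if the file was already there.
    (Hypothetical helper for illustration.)
    """
    if os.path.exists(path):
        return False
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated file under the final name.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return True
```

Writing to a `.part` file and renaming it afterwards means a later `os.path.exists` check cannot mistake a partial download for a complete one.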

### Load Dataset to display references

In [None]:
# Processed data used to display references: download it only if it is missing
display_data = 'pubmed_800k_combined.json'
if not os.path.exists(display_data):
    # The URL is quoted so the shell does not treat "?" as a glob character
    os.system("wget -O pubmed_800k_combined.json 'https://www.dropbox.com/s/8phkx4fht9j2npy/pubmed_800k_combined.json?dl=0'")

data_store = {}
with open(display_data, "r") as f:
    data = json.load(f)

for json_data in data:
    data_store[json_data["label"]] = json_data
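
To illustrate what the loop above builds, here is a small sketch on made-up records. Only the `"label"` key comes from the notebook; the `"text"` field, the sample values, and the `texts_for` helper are assumptions for illustration, since the actual schema of `pubmed_800k_combined.json` is not shown here.

```python
import json

# Two made-up records standing in for the real file contents.
raw = '[{"label": 0, "text": "First abstract."}, {"label": 1, "text": "Second abstract."}]'
records = json.loads(raw)

# Same idea as the loop above: index every record by its label.
lookup = {record["label"]: record for record in records}


def texts_for(labels, lookup):
    """Return the text of each known label, in the order given; skip unknown labels."""
    return [lookup[label]["text"] for label in labels if label in lookup]


print(texts_for([1, 0, 7], lookup))
```

Keying the records by label turns each reference lookup into a single dictionary access instead of a scan over the whole list.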

### Get Answers from OpenAI using Langchain

In this section, we show how to use LangChain to query OpenAI's chat model and generate an answer from the references you retrieve. You need to supply your own OpenAI key for this part to work. You can replace this segment with any other generative model of your choice. For example, you can use an open-source model such as MPT or Dolly for answer generation, with the same prompt that you use with OpenAI.
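
Because the prompt is what carries over when you swap the generative model, it can help to see how a grounded prompt is put together. The sketch below is an assumption for illustration only: `build_grounded_prompt` and its wording are hypothetical and are not paper-qa's `qa_prompt`.

```python
def build_grounded_prompt(question, references, answer_length="about 50 words"):
    """Assemble a prompt that asks the model to answer only from the given references.

    (Hypothetical helper; the wording is illustrative, not paper-qa's template.)
    """
    # Number the references so the model (and the reader) can point back to them.
    numbered = "\n\n".join(
        f"[{i}] {ref.strip()}" for i, ref in enumerate(references, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        f"Answer in {answer_length}:"
    )


prompt = build_grounded_prompt(
    "What is being measured?",
    ["  Reference one.  ", "Reference two."],
)
print(prompt)
```

The resulting string can be passed to any text-generation model, which is what makes the retrieval step independent of the model that writes the answer.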

In [None]:
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = ""  # Enter your OpenAI key here

In [None]:
from langchain.chat_models import ChatOpenAI
from paperqa.prompts import qa_prompt
from paperqa.chains import make_chain

llm = ChatOpenAI(
    model_name='gpt-3.5-turbo',
    temperature=0.1,
)

qa_chain = make_chain(prompt=qa_prompt, llm=llm)

In [None]:
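def get_references(query, top_k=3):
    # NOTE: `db` is not defined anywhere in this notebook. It is assumed to be a
    # retrieval object whose `search` method returns results with a `.text` attribute.
    search_results = db.search(query, top_k=top_k)
    return [result.text for result in search_results]

def get_answer(query, references):
    # Ground the answer in at most the top three references.
    context = '\n\n'.join(references[:3])
    return qa_chain.run(question=query, context=context, answer_length="about 50 words")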
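
`get_references` calls `db.search(query, top_k=3)` and reads `.text` from each result, but nothing in this notebook defines `db`. The sketch below shows the interface that call expects, using a toy keyword-overlap index. `KeywordIndex` and `Result` are hypothetical stand-ins for illustration; they are not ThirdAI's API and say nothing about how the BOLT model ranks documents.

```python
from dataclasses import dataclass


@dataclass
class Result:
    text: str
    score: int


class KeywordIndex:
    """Toy stand-in for `db`: ranks texts by how many query words they share."""

    def __init__(self, texts):
        self.texts = list(texts)

    def search(self, query, top_k=3):
        query_words = set(query.lower().split())
        scored = [
            Result(text, len(query_words & set(text.lower().split())))
            for text in self.texts
        ]
        # sort() is stable, so ties keep their original order.
        scored.sort(key=lambda result: result.score, reverse=True)
        return scored[:top_k]


db = KeywordIndex([
    "depression in cancer patients",
    "diabetes treatment",
    "cancer screening",
])
top_texts = [result.text for result in db.search("cancer depression", top_k=2)]
print(top_texts)
```

Any object with this `search` signature and a `.text` attribute on its results can be dropped in as `db` without changing `get_references`.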

### Example Question 1

In [None]:
query = "what percentage of cancer patients have depression?"

references = get_references(query)
print(references)

In [None]:
answer = get_answer(query, references)

print(answer)

### Example Question 2

In [None]:
query = "How to detect depression in geriatric cancer patients?"

references = get_references(query)
answer = get_answer(query, references)
print(answer)