

# Building a healthcare chatbot with the NVIDIA Haystack integration and PubMed


*notebook by Tilde Thurium:
 [Mastodon](https://tech.lgbt/@annthurium) || [Twitter](https://twitter.com/annthurium) || [LinkedIn](https://www.linkedin.com/in/annthurium/)*


**Prerequisites:**

*   [NVIDIA API Key](https://org.ngc.nvidia.com/setup)

## Installing the NVIDIA-Haystack extension

To start, let's install `nvidia-haystack` with `pip`, along with the other libraries we're going to need:

In [None]:
%%bash

pip install nvidia-haystack==0.0.1
pip install pymed
pip install git+https://github.com/deepset-ai/haystack-core-integrations.git#subdirectory=integrations/nvidia

Next, read the NVIDIA API key from the Colab secrets and set it as an environment variable.

In [None]:
import os
from google.colab import userdata
NVIDIA_API_KEY = userdata.get("NVIDIA_API_KEY")
os.environ['NVIDIA_API_KEY'] = NVIDIA_API_KEY
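
The cell above relies on `google.colab`, which is only available inside Colab. Outside Colab, something along these lines could be used instead. This is a minimal sketch: the helper name `load_nvidia_api_key` is hypothetical and not part of the original notebook. It reads the environment variable first and only prompts with `getpass` if the variable is missing.

```python
import os
from getpass import getpass


def load_nvidia_api_key(env_var: str = "NVIDIA_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is missing.

    Hypothetical helper for running outside Colab; not part of the original notebook.
    """
    key = os.environ.get(env_var)
    if not key:
        # Prompt without echoing the key to the terminal or notebook output.
        key = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `load_nvidia_api_key()` once at the top of the notebook leaves `NVIDIA_API_KEY` set in the environment for the rest of the session.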

## PubMed Fetcher

PubMed is a large, regularly updated database of biomedical research literature. Next, we'll write a custom Haystack component that pulls scientific papers relevant to the query from PubMed.

The `pymed` library wraps the PubMed API so it's easier to query.


In [None]:
from typing import List

from haystack import component
from haystack.dataclasses import Document
from pymed import PubMed


# The tool name and email identify this client to the PubMed API.
pubmed = PubMed(tool="NvidiaHaystackDemo", email="tilde.thurium@deepset.ai")


def documentize(article):
  """Convert a pymed article into a Haystack Document."""
  return Document(
    content=article.abstract,
    meta={
      'title': article.title,
      'keywords': article.keywords,
      'publication_date': article.publication_date,
    },
  )


@component
class PubMedFetcher:

  @component.output_types(articles=List[Document])
  def run(self, queries: List[str]):
    # The keyword LLM returns a single reply with one keyword per line.
    cleaned_queries = queries[0].strip().split('\n')

    articles = []
    try:
      for query in cleaned_queries:
        # Fetch only the top result for each keyword.
        response = pubmed.query(query, max_results=1)
        documents = [documentize(article) for article in response]
        articles.extend(documents)
    except Exception as e:
      # Report the failure but still return whatever was fetched so far.
      print(e)
      print(f"Couldn't fetch articles for queries: {queries}")
    results = {'articles': articles}
    return results
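
`run` receives the keyword LLM's replies as a list of strings and treats each line of the first reply as a separate PubMed query. That splitting step can be tried on its own without calling PubMed. The sketch below uses a hypothetical helper, `split_keywords`, which is not in the original notebook and, unlike the one-liner in `run`, also drops blank lines.

```python
from typing import List


def split_keywords(replies: List[str]) -> List[str]:
    """Split the first LLM reply into one query per non-empty line."""
    if not replies:
        return []
    lines = replies[0].strip().split("\n")
    # Drop surrounding whitespace and skip blank lines.
    return [line.strip() for line in lines if line.strip()]


# Example reply in the format requested by the keyword prompt.
reply = "\nAntidepressive Agents\nDepressive Disorder, Major\n\nTreatment-Resistant depression\n"
print(split_keywords([reply]))
# → ['Antidepressive Agents', 'Depressive Disorder, Major', 'Treatment-Resistant depression']
```

Filtering blank lines avoids sending an empty query to PubMed if the model leaves a gap between keywords.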

Next, instantiate two `NvidiaGenerator`s: one converts the question into keywords, and the other answers the question based on the documents provided.

In [None]:
from haystack_integrations.components.generators.nvidia import NvidiaGenerator, NvidiaGeneratorModel

llm = NvidiaGenerator(
    model=NvidiaGeneratorModel.NV_LLAMA2_RLHF_70B,
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
        "seed": None,
        "bad": None,
        "stop": None,
    },
)
llm.warm_up()

keyword_llm = NvidiaGenerator(
    model=NvidiaGeneratorModel.NV_LLAMA2_RLHF_70B,
    model_arguments={
        "temperature": 0.2,
        "top_p": 0.7,
        "max_tokens": 1024,
        "seed": None,
        "bad": None,
        "stop": None,
    },
)
keyword_llm.warm_up()

Next, add the `PubMedFetcher` to a RAG pipeline.

In [None]:
from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder

keyword_prompt_template = """
Your task is to convert the following question into 3 keywords that can be used to find relevant medical research papers on PubMed.
Here is an example:
question: "What are the latest treatments for major depressive disorder?"
keywords:
Antidepressive Agents
Depressive Disorder, Major
Treatment-Resistant depression
---
question: {{ question }}
keywords:
"""

prompt_template = """
Answer the question truthfully based on the given documents.
If the documents don't contain an answer, say so.
Cite the documents you used by mentioning their article title in the answer.
For example, begin your answer with ‘As stated in the article titled, ...’.

q: {{ question }}
Articles:
{% for article in articles %}
  {{article.content}}
  keywords: {{article.meta['keywords']}}
  title: {{article.meta['title']}}
{% endfor %}

"""
keyword_prompt_builder = PromptBuilder(template=keyword_prompt_template)

prompt_builder = PromptBuilder(template=prompt_template)
fetcher = PubMedFetcher()

pipe = Pipeline()

pipe.add_component("keyword_prompt_builder", keyword_prompt_builder)
pipe.add_component("keyword_llm", keyword_llm)
pipe.add_component("pubmed_fetcher", fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)

pipe.connect("keyword_prompt_builder.prompt", "keyword_llm.prompt")
pipe.connect("keyword_llm.replies", "pubmed_fetcher.queries")

pipe.connect("pubmed_fetcher.articles", "prompt_builder.articles")
pipe.connect("prompt_builder.prompt", "llm.prompt")



While we're at it, let's write an `ask` function that wraps the pipeline run and pulls the LLM's replies out of the results.

In [None]:
def ask(question):
  output = pipe.run(data={"keyword_prompt_builder":{"question":question},
                          "prompt_builder":{"question": question}})
  print(output['llm']['replies'])
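
The pipeline result is a dictionary keyed by component name, and `replies` is a list of strings, so `ask` prints a list rather than plain text. Below is a small sketch of pulling out just the first reply. It uses a hand-written stand-in for the pipeline output; `first_reply` and `mock_output` are hypothetical names that only illustrate the shape of the result.

```python
from typing import Any, Dict


def first_reply(output: Dict[str, Any], component: str = "llm") -> str:
    """Return the first reply of the given component, or an empty string if there is none."""
    replies = output.get(component, {}).get("replies", [])
    return replies[0] if replies else ""


# Hand-written stand-in with the same shape that `ask` reads from.
mock_output = {"llm": {"replies": ["As stated in the article titled ..."]}}
print(first_reply(mock_output))
```

Inside `ask`, the same helper could be applied to the real `output` if a plain string is preferred over a list.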



Give it a try!

In [None]:
ask("How are mRNA vaccines being used for cancer treatment?")