<a href="https://colab.research.google.com/github/DivyaShreeK-dev/sdc/blob/main/medical_using_pubmed.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install requests transformers bs4 lxml sentence-transformers faiss-cpu



In [2]:
import requests
from bs4 import BeautifulSoup
from transformers import pipeline
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# 1. Function to search PubMed and retrieve relevant articles
def search_pubmed(query, max_results=5):
    # PubMed API endpoint
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    params = {
        'db': 'pubmed',
        'term': query,
        'retmax': max_results,
        'retmode': 'xml',
        'sort': 'relevance',
    }

    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response

    # Parse the response and extract article IDs
    soup = BeautifulSoup(response.content, 'xml')
    article_ids = [id_tag.text for id_tag in soup.find_all('Id')]
    return article_ids

# 2. Function to fetch article details from PubMed
def fetch_article_details(article_ids):
    # PubMed article details API endpoint
    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    ids = ','.join(article_ids)
    params = {
        'db': 'pubmed',
        'id': ids,
        'retmode': 'xml',
    }

    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response

    # Parse the response and extract article titles and abstracts
    soup = BeautifulSoup(response.content, 'xml')
    articles = []
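    for article in soup.find_all('PubmedArticle'):
        title_tag = article.find('ArticleTitle')
        title = title_tag.get_text(strip=True) if title_tag else 'No title'
        # Structured abstracts contain several AbstractText sections; keep all of them
        sections = [tag.get_text(" ", strip=True) for tag in article.find_all('AbstractText')]
        abstract = " ".join(sections) if sections else 'No abstract available'
        articles.append(f"Title: {title}\nAbstract: {abstract}")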
    return articles

# 3. Function to create a medical Q&A chatbot
def medical_qa_chatbot(query):
    # Search PubMed for relevant articles
    print(f"🔍 Searching PubMed for: {query}")
    article_ids = search_pubmed(query, max_results=5)

    if not article_ids:
        return "Sorry, I couldn't find any relevant articles on PubMed."

    # Fetch article details (titles and abstracts)
    articles = fetch_article_details(article_ids)

    if not articles:
        return "Sorry, I couldn't retrieve any article details from PubMed."

    # Embed each article separately so the index can rank them against the question
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(articles, convert_to_numpy=True).astype(np.float32)

    # Create FAISS index to store the article embeddings (exact L2 / Euclidean search)
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)

    # Retrieve the articles closest to the question from the FAISS index
    query_embedding = model.encode([query], convert_to_numpy=True).astype(np.float32)
    top_k = min(3, len(articles))
    _, neighbors = index.search(query_embedding, top_k)
    retrieved_text = "\n\n".join(articles[i] for i in neighbors[0])

    # Use HuggingFace model for question answering
    qa_model = pipeline("question-answering", model="deepset/roberta-base-squad2")

    # Answer the question using only the retrieved articles as context
    answer = qa_model(question=query, context=retrieved_text)

    return answer['answer']

# 4. Example usage
query = input("Enter your medical question: ")
response = medical_qa_chatbot(query)
print("\n🔑 Answer:", response)


Enter your medical question: symptoms of diabetes
🔍 Searching PubMed for: symptoms of diabetes




🔑 Answer: excretion of large amounts of hypotonic urine
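
The chatbot stores its embeddings in `faiss.IndexFlatL2`, which performs an exact brute-force search and ranks vectors by squared Euclidean distance. As a rough illustration of that ranking step, the NumPy-only sketch below finds the nearest "articles" to a "question" vector. The helper name `top_k_l2` is hypothetical, and the three-dimensional vectors are made-up toy values, not real `all-MiniLM-L6-v2` embeddings (those have 384 dimensions).

```python
import numpy as np

def top_k_l2(query_vec, doc_vecs, k=2):
    """Return (indices, squared L2 distances) of the k rows of doc_vecs nearest to query_vec.

    Hypothetical helper for illustration; it mimics an exact flat L2 search.
    """
    diffs = doc_vecs - query_vec               # broadcast to shape (n_docs, dim)
    sq_dists = np.sum(diffs * diffs, axis=1)   # squared Euclidean distance per row
    order = np.argsort(sq_dists, kind="stable")[:k]
    return order, sq_dists[order]

# Toy "article embeddings" (assumed values, one row per article).
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
], dtype=np.float32)

# Toy "question embedding" (assumed values).
query_vec = np.array([1.0, 0.2, 0.0], dtype=np.float32)

indices, distances = top_k_l2(query_vec, doc_vecs, k=2)
print(indices.tolist())  # row numbers of the two closest articles, nearest first
```

In the notebook, the returned row numbers would select which article texts are passed to the question-answering model as context, so that articles far from the question are left out.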
