<a href="https://colab.research.google.com/github/Mo7amed-Soliman/Chatbot-for-Cancer/blob/main/Oncologist_Assistant_Doctor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# **AI-Powered Oncologist ChatBot for Cancer Disease Support**
## **Overview**

Oncologist Assistant Doctor is an AI chatbot that answers medical questions about cancer. It combines a large language model (LLM) served by Groq with a custom medical knowledge base, so that answers are grounded in documents extracted from trusted medical websites.

This guide walks through the chatbot implementation, its integration with the Groq API, and how to serve it for real-time interaction through a **FastAPI** endpoint exposed with **ngrok**.
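To make the retrieval idea concrete before the full code, here is a toy sketch of the two steps the knowledge base relies on: cutting text into overlapping chunks and picking the chunks most related to a question. It is a simplification under stated assumptions: the helper names `split_with_overlap` and `top_k` are hypothetical, the splitter uses fixed character windows rather than LangChain's separator-aware splitting, and word overlap stands in for the embedding similarity search used in the notebook.

```python
def split_with_overlap(text, chunk_size=150, chunk_overlap=30):
    """Cut text into character windows; neighbours share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def top_k(question, chunks, k=2):
    """Rank chunks by the number of lowercase words they share with the question.

    Word overlap is only a stand-in for embedding similarity (an assumption
    made to keep this sketch dependency-free).
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(q_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    notes = [
        "tumor growth and staging",
        "healthy diet tips",
        "tumor staging explained",
    ]
    # The two chunks that mention both "tumor" and "staging" are selected.
    print(top_k("what is tumor staging", notes, k=2))
```

The selected chunks play the role of the context that is passed to the language model together with the question.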

# Full Code

In [2]:
!pip install ngrok
!pip install pyngrok nest_asyncio fastapi uvicorn
!pip -q install gradio langchain_groq langchain_community chromadb langdetect
!pip -q install  pyngrok flask_ngrok

Collecting ngrok
  Downloading ngrok-1.4.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Downloading ngrok-1.4.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: ngrok
Successfully installed ngrok-1.4.0
Collecting pyngrok
  Downloading pyngrok-7.2.5-py3-none-any.whl.metadata (8.9 kB)
Collecting fastapi
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting uvicorn
  Downloading uvicorn-0.34.2-py3-none-any.whl.metadata (6.5 kB)
Collecting starlette<0.47.0,>=0.40.0 (from fastapi)
  Downloading starlette-0.46.2-py3-none-any.whl.metadata (6.2 kB)
Downloading pyngrok-7.2.5-py3-none-any.whl (23 kB)
Downloading fastapi-0.115.12-py3-none-any.whl (95 kB)
Do

In [3]:
!ngrok config add-authtoken YOUR_NGROK_AUTHTOKEN
# Replace YOUR_NGROK_AUTHTOKEN with your own token from https://dashboard.ngrok.com/get-started/setup/windows

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
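
Tokens and API keys pasted directly into a notebook are easy to leak when the notebook is shared. One possible alternative, sketched below, is to read them from environment variables and fail early with a clear message when they are missing. The helper name `load_secret` and the variable names `NGROK_AUTHTOKEN` and `GROQ_API_KEY` are assumptions chosen for illustration, not something the notebook defines.

```python
import os


def load_secret(name, env=os.environ):
    """Return the secret stored under `name`, or raise if it is unset or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before running this notebook."
        )
    return value


# Hypothetical usage inside the notebook (variable names are assumptions):
#   import getpass
#   os.environ["GROQ_API_KEY"] = getpass.getpass("Groq API key: ")
#   GROQ_API_KEY = load_secret("GROQ_API_KEY")
#   ngrok.set_auth_token(load_secret("NGROK_AUTHTOKEN"))  # pyngrok
```

With this approach the secret never appears in the saved notebook, only in the running session.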


In [10]:
from langchain_groq import ChatGroq
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

# API server and tunneling
from fastapi import FastAPI, HTTPException  # HTTPException is raised in the /ask handler
from pydantic import BaseModel
import uvicorn
from pyngrok import ngrok
from fastapi.middleware.cors import CORSMiddleware
import nest_asyncio

# Replace with your own key from https://console.groq.com/keys
GROQ_API_KEY = "YOUR_GROQ_API_KEY"



class OncologistChatbot:
    """
    A chatbot designed to answer questions related to cancer using Groq's LLM
    and a custom medical knowledge base built from web content.
    """

    def __init__(self, groq_api_key):
        """
        Initializes the chatbot with necessary models and configurations.
        - Groq's LLM for language processing.
        - Hugging Face embedding model for document embeddings.
        - Recursive text splitter to divide large documents into smaller chunks.
        - ChromaDB for storing and querying processed documents.
        """
        # Initialize Groq model with the provided API key
        self.llm = ChatGroq(api_key=groq_api_key, model_name="meta-llama/llama-4-scout-17b-16e-instruct")

        # Initialize Hugging Face embedding model for document embedding
        self.embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

        # Initialize a text splitter to break large documents into chunks
        self.text_splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=30)

        # Set the directory where ChromaDB will persist its data
        self.persist_directory = "chroma_db"

        # Process web content to create a vector database for document retrieval
        self.vector_db = self.process_web_content()

    def process_web_content(self):
        """
        Processes web content from a list of medical websites to build a knowledge base.
        - Loads content from predefined websites.
        - Splits documents into smaller chunks.
        - Creates a vector database (ChromaDB) for document similarity searches.
        """
        # List of trusted medical websites for content scraping
        medical_sites = [
            "https://www.webmd.com/",
            "https://www.mayoclinic.org/diseases-conditions/",
            "https://medlineplus.gov/",
            "https://www.healthline.com/health",
            "https://www.cdc.gov/diseasesconditions/",
        ]

        all_documents = []  # List to store all loaded documents

        # Iterate through each medical site and load documents
        for site in medical_sites:
            try:
                # Initialize a web loader for the site
                loader = WebBaseLoader(site)
                # Load documents from the site
                documents = loader.load()
                # Add loaded documents to the all_documents list
                all_documents.extend(documents)
            except Exception as e:
                print(f"⚠️ Failed to load data from {site}: {e}")

        # If no documents are loaded, raise an error
        if not all_documents:
            raise ValueError("❌ No data could be loaded from the websites.")

        # Split the documents into smaller chunks for easier processing
        chunks = self.text_splitter.split_documents(all_documents)

        # Create and return a Chroma vector database for document retrieval
        return Chroma.from_documents(chunks, self.embedding_model, persist_directory=self.persist_directory)

    def generate_answer(self, question, history):
        """
        Generates an answer to the user's question based on available medical knowledge and chat history.
        - Detects the language of the question.
        - Searches for relevant documents in the vector database.
        - Uses a prompt template to generate an answer based on the context.
        """
        from langdetect import detect  # Import language detection module

        # Detect the language of the user's question
        question_language = detect(question)

        # Retrieve similar documents from the vector database
        similar_docs = self.vector_db.similarity_search(question, k=2)

        # Combine the content of the similar documents to form context
        context = "\n".join([doc.page_content for doc in similar_docs]) if similar_docs else None

        # Format the chat history for use in the prompt template
        history_text = "\n".join([f"User: {q}\nBot: {a}" for q, a in history]) if history else "No prior history."

        # Define a prompt template for question answering
        qna_template = """You are an Oncologist specializing in Cancer.
You answer questions **in the same language as the input question only (Arabic or English)** ({language}), based on your medical knowledge, the available context, and chat history.
- If no answer is available in the context, respond in the same language as the question:
**English:** "No answer is currently available."
**Arabic:** "لا يوجد إجابة متاحة حاليًا."
- Analyze the question medically before answering, relying on reliable scientific information.
- Keep your answers precise and to the point, avoiding unnecessary details.
- Provide additional advice if requested.
### Medical Context:
{context}

### Chat History:
{history}

### Question:
{question}

### Answer (in the same language only (Arabic or English) {language}):"""


        # Create the prompt template object
        qna_prompt = PromptTemplate(
            template=qna_template,
            input_variables=['context', 'question', 'language', 'history']
        )

        # Load the QA chain using the Groq model and the defined prompt
        stuff_chain = load_qa_chain(self.llm, chain_type="stuff", prompt=qna_prompt)

        # If no context is found, provide a default message based on the question's language
        if not context:
            output = "No answer is currently available." if question_language == "en" else "لا يوجد إجابة متاحة حاليًا."
        else:
            # Generate the answer using the QA chain and context
            answer_generator = stuff_chain.stream({
                "input_documents": similar_docs,
                "question": question,
                "language": question_language,
                "history": history_text
            })

            # Accumulate the output from the answer generator
            output = ""
            for chunk in answer_generator:
                output += chunk["output_text"]

        # Return the answer with the detected language code.
        # This sits outside the if/else so the fallback message is returned too.
        return {
            "answer": output,
            "language": question_language,
        }



app = FastAPI()
origins = ["*"]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Request schema for the /ask endpoint
class QuestionRequest(BaseModel):
    question: str

chatbot_instance = OncologistChatbot(groq_api_key=GROQ_API_KEY)
chat_history = []

@app.post("/ask")
async def ask_question(data: QuestionRequest):
    try:
        response = chatbot_instance.generate_answer(data.question, chat_history)
        chat_history.append((data.question, response["answer"]))
        return response

    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

ngrok_tunnel = ngrok.connect(8000)
print('Public URL:', ngrok_tunnel.public_url)
nest_asyncio.apply()
uvicorn.run(app, port=8000)

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-69' coro=<Server.serve() done, defined at /usr/local/lib/python3.11/dist-packages/uvicorn/server.py:68> exception=KeyboardInterrupt()>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/uvicorn/main.py", line 580, in run
    server.run()
  File "/usr/local/lib/python3.11/dist-packages/uvicorn/server.py", line 66, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/nest_asyncio.py", line 92, in run_until_complete
    self._run_once()
  File "/usr/local/lib/python3.11/dist-packages/nest_asyncio.py", line 133, in _run_once
    handle._run()
  File "/usr/lib/python3.11/asyncio/events.py", line 84, in _run
    s

Public URL: https://ce39-34-16-189-245.ngrok-free.app


INFO:     Started server process [758]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [758]
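
Once the server is running, the `/ask` endpoint can be called from any HTTP client. The sketch below uses only the Python standard library; the helper names `build_ask_request` and `ask` and the example URL are hypothetical, and the reply is assumed to carry the `answer` and `language` keys returned by `generate_answer` above.

```python
import json
import urllib.request


def build_ask_request(base_url, question):
    """Build a POST request for the /ask endpoint with a JSON body."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, question, timeout=60):
    """Send the question and return the decoded JSON reply as a dict."""
    request = build_ask_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage; replace the URL with the "Public URL" printed by the notebook:
#   reply = ask("https://example.ngrok-free.app", "What is a biopsy?")
#   print(reply["language"], reply["answer"])
```

The body must match the `QuestionRequest` model, so it contains a single `question` field.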
