# Medical Conversational Chatbot- G Version 4- LangDetect
This pipeline answers questions strictly from a provided PDF or text file. It loads the document, counts the tokens in query + document_text, and compresses the text via LLM summarisation if the total exceeds the model’s context limit. An input guardrail blocks unsafe or disallowed queries before any model call. A strict prompt instructs the LLM to answer only from the document or reply “Not found in document.” The output guardrail then checks for empty answers, token overlap with the document, and numeric grounding to reduce hallucinations. Modular functions, logging, and error handling are intended to keep the pipeline safe and maintainable for enterprise‑level document‑based QA.


## Of the 71 Languages Provided, 52 Are Supported by LangDetect
### 51 of 52 Successfully Detected With High Confidence

Afrikaans (af), Albanian (sq), Arabic (ar), Bengali (bn), Bulgarian (bg), Catalan (ca), Chinese (Simplified) (zh-cn), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Farsi / Persian (fa), Finnish (fi), French (fr), German (de), Greek (el), Gujarati (gu), Hebrew (he), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Kannada (kn), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Malayalam (ml), Marathi (mr), Norwegian (no), Polish (pl), Portuguese (pt), Punjabi (pa), Romanian (ro), Russian (ru), Slovak (sk), Slovene (sl), Spanish (es), Swahili (sw), Swedish (sv), Tamil (ta), Telugu (te), Thai (th), Tagalog (tl), Turkish (tr), Ukrainian (uk), Urdu (ur), Vietnamese (vi).

### Not Detected: Chinese (Traditional), "zh-tw"

In [1]:
import os
import sys
import re
import time
import json
import logging
from typing import Tuple, Optional, List, Deque, Dict
from collections import deque
import boto3
import tiktoken
from PyPDF2 import PdfReader
from langdetect import detect, detect_langs, DetectorFactory

## Logging Configuration

A logger records messages about what the program is doing while it runs, like a running diary for the code.

In our case, it is configured to:

- Capture events (info, warnings, errors) from the “Medical Chatbot G‑Version” pipeline.
- Format them with a timestamp, severity level, logger name, and message so they are easy to read.
- Output them to the console (stdout) in real time via a StreamHandler.
- Avoid duplicate output by attaching the handler only if the logger has none yet.

In [2]:
logger = logging.getLogger("Medical Chatbot G-Version")
if not logger.handlers:
    logger.setLevel(logging.INFO)
    ch = logging.StreamHandler(stream=sys.stdout)
    ch.setLevel(logging.INFO)
    formatter = logging.Formatter("[%(levelname)s] %(asctime)s - %(name)s - %(message)s")
    ch.setFormatter(formatter)
    logger.addHandler(ch)
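
The handler guard above matters in notebooks, where a cell is often run more than once. The following is a minimal sketch of the effect, assuming a hypothetical helper `configure` and a hypothetical logger name `demo.guard`, neither of which is part of the pipeline:

```python
import logging
import sys

def configure(name: str) -> logging.Logger:
    """Attach one stdout handler, but only if the logger has none yet."""
    log = logging.getLogger(name)
    if not log.handlers:
        log.setLevel(logging.INFO)
        handler = logging.StreamHandler(stream=sys.stdout)
        handler.setFormatter(
            logging.Formatter("[%(levelname)s] %(asctime)s - %(name)s - %(message)s")
        )
        log.addHandler(handler)
    return log

first = configure("demo.guard")
second = configure("demo.guard")   # simulates re-running the cell
handler_count = len(second.handlers)
second.info("This line is emitted once, not twice.")
```

Without the `if not log.handlers` check, the second call would attach another handler and every message would be printed twice.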

## Configuration

In [3]:
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = os.environ.get("BEDROCK_MODEL_ID", "anthropic.claude-3-5-sonnet-20240620-v1:0")

MODEL_CONTEXT_TOKENS = int(os.environ.get("MODEL_CONTEXT_TOKENS", "128000"))  # Conservative default
DEFAULT_MAX_TOKENS = int(os.environ.get("MAX_TOKENS", "1024"))

# Language enforcement config
LANG_ENFORCEMENT_ENABLED = os.environ.get("LANG_ENFORCEMENT_ENABLED", "true").lower() == "true"
LANG_MIN_CHARS_FOR_DETECTION = int(os.environ.get("LANG_MIN_CHARS_FOR_DETECTION", "10"))
LANG_REPROMPT_MAX_ATTEMPTS = int(os.environ.get("LANG_REPROMPT_MAX_ATTEMPTS", "1"))

# Conversation history config
CONVO_HISTORY_ENABLED = os.environ.get("CONVO_HISTORY_ENABLED", "false").lower() == "true"
CONVO_MAX_TURNS = int(os.environ.get("CONVO_MAX_TURNS", "8"))  # rolling turn count cap
CONVO_MAX_TOKENS = int(os.environ.get("CONVO_MAX_TOKENS", "3000"))  # token budget for history
CONVO_INCLUDE_ASSISTANT = os.environ.get("CONVO_INCLUDE_ASSISTANT", "true").lower() == "true"
CONVO_POLICY_NOTE = (
    "Use past turns only to resolve pronouns, clarify references, or maintain continuity. "
    "Do not treat past assistant statements as authoritative facts. All factual content must be derived only "
    "from the provided Document section."
)
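
The settings above are all read from environment variables as strings. The sketch below isolates the two parsing patterns used, with hypothetical helper names (`env_flag`, `env_int`) and hypothetical `DEMO_*` variables chosen only for illustration. It shows that a boolean flag is enabled only by the literal string "true" in any letter case:

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Same rule as the config cell: only 'true' (case-insensitive) enables a flag."""
    return os.environ.get(name, default).lower() == "true"

def env_int(name: str, default: str) -> int:
    """Read an integer setting; int() raises ValueError for non-numeric values."""
    return int(os.environ.get(name, default))

os.environ["DEMO_FLAG_A"] = "TRUE"
os.environ["DEMO_FLAG_B"] = "1"     # common spelling, but not accepted by this rule
os.environ["DEMO_TURNS"] = "12"

flag_a = env_flag("DEMO_FLAG_A")                      # True
flag_b = env_flag("DEMO_FLAG_B")                      # False
flag_missing = env_flag("DEMO_FLAG_MISSING", "true")  # falls back to the default
turns = env_int("DEMO_TURNS", "8")
```

Values such as "1" or "yes" silently disable a flag under this rule, which is worth keeping in mind when setting `LANG_ENFORCEMENT_ENABLED` or `CONVO_HISTORY_ENABLED`.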

## Tokenization

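`get_encoder` returns a tiktoken encoding, defaulting to "cl100k_base" for stable token counting. If the requested encoding cannot be loaded, it logs a warning and falls back to "cl100k_base". `count_tokens` uses the resulting encoder to count tokens in any input string, so token budgeting is consistent across the pipeline. Because cl100k_base is an OpenAI encoding, the counts are an approximation for Claude models on Bedrock rather than exact figures.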

In [4]:
# Tokenizer
# =========================
def get_encoder(model_name: str = "cl100k_base"):
    """
    Use a stable tokenizer for budgeting. cl100k_base approximates modern LLMs well.
    """
    try:
        return tiktoken.get_encoding(model_name)
    except Exception:
        logger.warning("Falling back to cl100k_base tokenizer.")
        return tiktoken.get_encoding("cl100k_base")

ENCODER = get_encoder("cl100k_base")

def count_tokens(text: str) -> int:
    """Count tokens in a string using tiktoken."""
    return len(ENCODER.encode(text or ""))

## Load the Document

In [5]:
# Document loading
# =========================
def load_document(file_path: str) -> str:
    """
    Load a document from a .pdf or .txt/.md and return text.
    Raises FileNotFoundError, ValueError, Exception.
    """
    if not os.path.exists(file_path):
        logger.error(f"File not found: {file_path}")
        raise FileNotFoundError(f"File not found: {file_path}")

    ext = os.path.splitext(file_path)[1].lower()
    try:
        if ext == ".pdf":
            logger.info(f"Loading PDF: {file_path}")
            reader = PdfReader(file_path)
            pages_text = []
            for i, page in enumerate(reader.pages):
                try:
                    page_text = page.extract_text() or ""
                except Exception as e:
                    logger.warning(f"Failed to extract text from page {i}: {e}")
                    page_text = ""
                pages_text.append(page_text)
            document_text = "\n".join(pages_text).strip()
        elif ext in [".txt", ".md"]:
            logger.info(f"Loading text file: {file_path}")
            with open(file_path, "r", encoding="utf-8") as f:
                document_text = f.read().strip()
        else:
            raise ValueError("Unsupported file type. Only .pdf and .txt/.md are supported.")
    except Exception:
        # Log with traceback, then re-raise the original exception unchanged.
        logger.exception("Unexpected error while loading document.")
        raise

    if not document_text:
        raise ValueError("Document appears to be empty after extraction.")
    return document_text

## Token Budget Check

This function checks whether the query plus the document text fits inside the model’s context window.

It works by:

- Counting tokens in the query and the document using count_tokens().
- Adding them to get total_tokens.
- Comparing total_tokens to budget_tokens (the model’s maximum context size) to set fits to True or False.
- Logging the result for debugging and monitoring.

It returns a tuple:

- fits: whether the pair is within budget
- total_tokens: the combined token count

This avoids sending more text than the model can handle, which would otherwise lead to truncation or request errors.

In [6]:
# Token budgeting & chunking
# =========================
def check_token_budget(query: str, document_text: str, budget_tokens: int) -> Tuple[bool, int]:
    """
    Return whether query + document_text fits within the model's context window,
    and the total tokens used for that pair.
    """
    total_tokens = count_tokens(query) + count_tokens(document_text)
    fits = total_tokens <= budget_tokens
    logger.info(f"Token check: total={total_tokens}, budget={budget_tokens}, fits={fits}")
    return fits, total_tokens
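
The check above counts only the query and the document. The prompt instructions, any conversation history, and the generated answer also occupy the context window, so one option is to pass a reduced budget. The sketch below illustrates the arithmetic under stated assumptions: `whitespace_count` is a stand-in for the tiktoken counter, and all numbers are hypothetical and deliberately tiny.

```python
from typing import Callable, Tuple

def whitespace_count(text: str) -> int:
    """Stand-in counter (assumption): one token per whitespace-separated word."""
    return len((text or "").split())

def check_budget(
    query: str,
    document_text: str,
    budget_tokens: int,
    counter: Callable[[str], int] = whitespace_count,
) -> Tuple[bool, int]:
    """Same comparison as check_token_budget, with a pluggable counter."""
    total = counter(query) + counter(document_text)
    return total <= budget_tokens, total

# Hypothetical sizes, chosen small so the arithmetic is easy to follow.
context_window = 50
reserved_for_answer = 10   # plays the role of max_tokens
reserved_for_prompt = 15   # instructions, history, formatting
budget = context_window - reserved_for_answer - reserved_for_prompt  # 25

query = "What is the recommended starting dose?"  # 6 words
document = "word " * 30                           # 30 words

fits_raw, total = check_budget(query, document, context_window)
fits_reserved, _ = check_budget(query, document, budget)
```

Here the pair fits the raw window (36 of 50) but not the reduced budget (36 of 25), which is the case where compression would be needed before the model call.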

In [7]:
def chunk_text(text: str, max_tokens_per_chunk: int, overlap: int = 100) -> List[str]:
    """
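    Chunk text into token-bounded segments with overlap to preserve context for summarization.
    The overlap is capped at 10% of the chunk size so each step makes forward progress.
    """
    if max_tokens_per_chunk <= 0:
        # A non-positive chunk size would make the loop below spin forever.
        raise ValueError("max_tokens_per_chunk must be a positive integer.")
    tokens = ENCODER.encode(text)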
    chunks = []
    start = 0
    max_len = max_tokens_per_chunk
    ov = min(overlap, max(0, max_len // 10))
    while start < len(tokens):
        end = min(start + max_len, len(tokens))
        chunk = ENCODER.decode(tokens[start:end])
        chunks.append(chunk)
        if end == len(tokens):
            break
        start = end - ov  # overlap
    return chunks
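
To see how the overlap windows line up, the sketch below repeats the same loop on a plain list of integers instead of real token ids. The function name `chunk_token_ids` is hypothetical, and no tokenizer is involved.

```python
from typing import List

def chunk_token_ids(tokens: List[int], max_len: int, overlap: int = 100) -> List[List[int]]:
    """Sliding windows of at most max_len ids; overlap is capped at 10% of max_len."""
    if max_len <= 0:
        raise ValueError("max_len must be a positive integer.")
    ov = min(overlap, max(0, max_len // 10))
    chunks: List[List[int]] = []
    start = 0
    while start < len(tokens):
        end = min(start + max_len, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break
        start = end - ov  # step back by the overlap
    return chunks

# 25 ids, windows of 10: the requested overlap of 100 is capped to 10 // 10 = 1.
windows = chunk_token_ids(list(range(25)), max_len=10)
boundaries = [(w[0], w[-1]) for w in windows]
```

With these inputs the windows cover ids 0 to 9, 9 to 18, and 18 to 24, so each chunk repeats exactly one id from the previous one.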

## Input Guardrail

In [8]:
# Input guardrail
# =========================
def apply_input_guardrail(query: str, document_text: str) -> Tuple[bool, Optional[str]]:
    """
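    Validate input safety against a regex blocklist.
    Returns (allowed, rejection_message); rejection_message is None when allowed.
    document_text is accepted but currently unused.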
    """
    q_lower = query.lower()
    disallowed_patterns = [
        r"make\s+(a|an)\s+(bomb|weapon|explosive)",
        r"how\s+to\s+(manufacture|create)\s+(biological|chemical)\s+(weapon|agent)",
        r"bypass\s+(security|authentication)",
        r"exploit\s+(a|the)\s+vulnerability",
        r"harm\s+(someone|people)",
        r"kill\s+(someone|people)",
    ]
    for pat in disallowed_patterns:
        if re.search(pat, q_lower):
            logger.warning("Input guardrail: disallowed/harmful request detected.")
            return False, "Your request cannot be processed because it seeks unsafe or disallowed information."
    return True, None
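
A regex blocklist like the one above is easy to reason about but matches only the exact phrasings it lists. The sketch below uses a subset of the same patterns with a hypothetical helper `is_blocked` to show one hit, two legitimate clinical questions that pass, and one rephrasing that slips through:

```python
import re

# Subset of the patterns from the guardrail, copied for a self-contained demo.
PATTERNS = [
    r"make\s+(a|an)\s+(bomb|weapon|explosive)",
    r"bypass\s+(security|authentication)",
    r"kill\s+(someone|people)",
]

def is_blocked(query: str) -> bool:
    """True if any blocklist pattern matches the lower-cased query."""
    q = query.lower()
    return any(re.search(p, q) for p in PATTERNS)

results = {
    "How do I MAKE a bomb?": is_blocked("How do I MAKE a bomb?"),
    "What is the maximum daily dose of metformin?":
        is_blocked("What is the maximum daily dose of metformin?"),
    "Which antibiotics kill gram-negative bacteria?":
        is_blocked("Which antibiotics kill gram-negative bacteria?"),
    # Plural without an article is not covered by the first pattern.
    "How do I make bombs?": is_blocked("How do I make bombs?"),
}
```

The last case suggests treating the blocklist as a first filter rather than a complete safety layer.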

## Output Guardrail

In [9]:
# Output guardrail helpers
# =========================
FALLBACK_MESSAGES: Dict[str, str] = {
    "af": "Nie in dokument gevind nie.",   # Afrikaans
    "sq": "Nuk u gjet në dokument.",       # Albanian
    "ar": "غير موجود في المستند.",        # Arabic
    "bn": "দস্তাবেজে পাওয়া যায়নি।",      # Bengali
    "bg": "Не е намерено в документа.",    # Bulgarian
    "ca": "No s'ha trobat al document.",   # Catalan
    "zh-cn": "文档中未找到。",              # Chinese (Simplified)
    "zh-tw": "文件中未找到。",              # Chinese (Traditional)
    "hr": "Nije pronađeno u dokumentu.",   # Croatian
    "cs": "V dokumentu nebylo nalezeno.",  # Czech
    "da": "Ikke fundet i dokumentet.",     # Danish
    "nl": "Niet gevonden in het document.",# Dutch / Flemish
    "en": "Not found in document.",        # English
    "et": "Dokumendist ei leitud.",        # Estonian
    "fa": "در سند یافت نشد.",              # Farsi / Persian
    "fi": "Ei löytynyt asiakirjasta.",     # Finnish
    "fr": "Non trouvé dans le document.",  # French
    "de": "Nicht im Dokument gefunden.",   # German
    "el": "Δεν βρέθηκε στο έγγραφο.",      # Greek
    "gu": "દસ્તાવેજમાં મળ્યું નથી.",       # Gujarati
    "he": "לא נמצא במסמך.",                # Hebrew
    "hi": "दस्तावेज़ में नहीं मिला।",       # Hindi
    "hu": "Nem található a dokumentumban.",# Hungarian
    "id": "Tidak ditemukan dalam dokumen.",# Indonesian / Malay
    "it": "Non trovato nel documento.",    # Italian
    "ja": "文書内に見つかりませんでした。", # Japanese
    "kn": "ದಾಖಲೆಯಲ್ಲಿಲ್ಲ.",                # Kannada
    "ko": "문서에서 찾을 수 없습니다.",      # Korean
    "lv": "Dokumentā nav atrasts.",        # Latvian
    "lt": "Dokumente nerasta.",            # Lithuanian
    "mk": "Не е најдено во документот.",   # Macedonian
    "ml": "രേഖയിൽ കണ്ടെത്തിയില്ല.",        # Malayalam
    "mr": "दस्तऐवजात सापडले नाही.",        # Marathi
    "no": "Ikke funnet i dokumentet.",     # Norwegian
    "pl": "Nie znaleziono w dokumencie.",  # Polish
    "pt": "Não encontrado no documento.",  # Portuguese
    "pa": "ਦਸਤਾਵੇਜ਼ ਵਿੱਚ ਨਹੀਂ ਮਿਲਿਆ।",      # Punjabi
    "ro": "Nu a fost găsit în document.",  # Romanian
    "ru": "Не найдено в документе.",       # Russian
    "sk": "V dokumente sa nenašlo.",       # Slovak
    "sl": "Ni najdeno v dokumentu.",       # Slovene
    "es": "No se encontró en el documento.",# Spanish
    "sw": "Haikupatikana kwenye hati.",    # Swahili
    "sv": "Hittades inte i dokumentet.",   # Swedish
    "ta": "ஆவணத்தில் கிடைக்கவில்லை.",       # Tamil
    "te": "పత్రంలో కనబడలేదు.",              # Telugu
    "th": "ไม่พบในเอกสาร.",                # Thai
    "tl": "Hindi natagpuan sa dokumento.", # Tagalog
    "tr": "Belgede bulunamadı.",           # Turkish
    "uk": "Не знайдено в документі.",      # Ukrainian
    "ur": "دستاویز میں نہیں ملا۔",          # Urdu
    "vi": "Không tìm thấy trong tài liệu." # Vietnamese
}

In [10]:
def apply_output_guardrail(
    answer: str,
    document_text: str,
    expected_lang: str,
    allow_summary_mode: bool = False
) -> str:
    """
    Guardrail to ensure the model's answer is grounded in the document.
    If grounding fails, returns a localized 'Not found in document' message
    based on the expected language.
    """
    grounded = True

    # 1) Empty or trivial answer
    if not answer or len(answer.strip()) < 3:
        grounded = False

    # 2) If not in summary mode, require some token overlap
    if grounded and not allow_summary_mode:
        overlap = len(set(answer.lower().split()) & set(document_text.lower().split()))
        if overlap < 1:
            grounded = False
            logger.warning("Output guardrail: insufficient overlap with document.")

    # 3) Optional numeric grounding check: ensure numbers in answer appear in document
    if grounded:
        nums_in_answer = re.findall(r"\d+", answer)
        for n in nums_in_answer:
            if n not in document_text:
                grounded = False
                logger.warning(f"Output guardrail: numeric value '{n}' not grounded in document.")
                break

    if not grounded:
        return FALLBACK_MESSAGES.get(expected_lang.lower(), FALLBACK_MESSAGES["en"])
    return answer
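
The numeric check above tests each digit run with a substring lookup, so "5" counts as grounded whenever the document contains "15". The sketch below contrasts that rule with a stricter whole-number comparison. Both helper names are hypothetical, and the stricter variant is one possible hardening rather than part of the pipeline.

```python
import re
from typing import List

def ungrounded_substring(answer: str, document_text: str) -> List[str]:
    """Same rule as the guardrail: digit runs checked by substring."""
    return [n for n in re.findall(r"\d+", answer) if n not in document_text]

def ungrounded_exact(answer: str, document_text: str) -> List[str]:
    """Stricter variant (assumption): compare whole numbers, decimals included."""
    number = r"\d+(?:\.\d+)?"
    doc_numbers = set(re.findall(number, document_text))
    return [n for n in re.findall(number, answer) if n not in doc_numbers]

document = "Start at 15 mg once daily; the maximum is 2.5 g per day."
answer = "Start at 5 mg once daily."

loose = ungrounded_substring(answer, document)   # "5" is found inside "15"
strict = ungrounded_exact(answer, document)      # "5" is not a number in the document
```

The substring rule accepts the wrong dose, while the whole-number comparison flags it. The stricter rule can still be fooled by formatting differences such as "2,5" versus "2.5".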

## Language Enforcer
This class manages language control: it detects the language of the query, injects an output-language instruction into the prompt, checks whether the model’s response is in the expected language, stores the re-prompt limit (max_reprompts) to apply on a mismatch, and provides predefined fallback messages.

In [11]:
# Language enforcer
# =========================
class LanguageEnforcer:
    """
    Modular language detection and enforcement.
    - Detects input language from the user's query.
    - Injects strict instruction into the prompt to mirror that language.
    - Verifies output language and optionally re-prompts once.
    - Logs only language codes and lengths for audit safety.
    """
    LANG_NAMES = {
        "af": "Afrikaans",
        "sq": "Albanian",
        "ar": "Arabic",
        "bn": "Bengali",
        "bg": "Bulgarian",
        "ca": "Catalan",
        "zh-cn": "Chinese (Simplified)",
        "zh-tw": "Chinese (Traditional)",
        "hr": "Croatian",
        "cs": "Czech",
        "da": "Danish",
        "nl": "Dutch",
        "en": "English",
        "et": "Estonian",
        "fa": "Farsi / Persian",
        "fi": "Finnish",
        "fr": "French",
        "de": "German",
        "el": "Greek",
        "gu": "Gujarati",
        "he": "Hebrew",
        "hi": "Hindi",
        "hu": "Hungarian",
        "id": "Indonesian",
        "it": "Italian",
        "ja": "Japanese",
        "kn": "Kannada",
        "ko": "Korean",
        "lv": "Latvian",
        "lt": "Lithuanian",
        "mk": "Macedonian",
        "ml": "Malayalam",
        "mr": "Marathi",
        "no": "Norwegian",
        "pl": "Polish",
        "pt": "Portuguese",
        "pa": "Punjabi",
        "ro": "Romanian",
        "ru": "Russian",
        "sk": "Slovak",
        "sl": "Slovene",
        "es": "Spanish",
        "sw": "Swahili",
        "sv": "Swedish",
        "ta": "Tamil",
        "te": "Telugu",
        "th": "Thai",
        "tl": "Tagalog",
        "tr": "Turkish",
        "uk": "Ukrainian",
        "ur": "Urdu",
        "vi": "Vietnamese",
    }
    NORMALIZE = {
        "zh": "zh-cn",
        "zh-cn": "zh-cn",
        "zh-tw": "zh-tw",
    }

    def __init__(self, enabled: bool = True, min_chars: int = 15, max_reprompts: int = 1):
        self.enabled = enabled
        self.min_chars = min_chars
        self.max_reprompts = max_reprompts
        try:
            from langdetect import detect, detect_langs, DetectorFactory  # type: ignore
            DetectorFactory.seed = 0
            self.has_langdetect = True
            self._detect = detect
            self._detect_langs = detect_langs
        except Exception:
            self.has_langdetect = False
            self._detect = None
            self._detect_langs = None

    def _fallback_detect(self, text: str) -> Tuple[str, float]:
        """Very conservative heuristic fallback."""
        if re.search(r"[\u0900-\u097F]", text):
            return "hi", 0.8
        if re.search(r"[\u4e00-\u9fff]", text):  # CJK Unified Ideographs
            return "zh-cn", 0.8
        return "en", 0.5

    def detect_language(self, text: str) -> Tuple[str, float]:
        """Return (language_code, confidence)."""
        if not self.enabled:
            return "en", 1.0
        safe_len = len(text or "")
        if not self.has_langdetect or safe_len < self.min_chars:
            code, conf = self._fallback_detect(text or "")
            logger.info(f"Language detection (fallback): code={code}, conf≈{conf:.2f}, length={safe_len}")
            return code, conf
        try:
            langs = self._detect_langs(text or "")
            if not langs:
                return self._fallback_detect(text or "")
            top = sorted(langs, key=lambda x: x.prob, reverse=True)[0]
            code = self.NORMALIZE.get(top.lang.lower(), top.lang.lower())
            return code, float(top.prob)
        except Exception as e:
            logger.warning(f"Language detection failed, reverting to fallback. err={e}")
            return self._fallback_detect(text or "")

    def language_name(self, code: str) -> str:
        return self.LANG_NAMES.get(code, code)

    def augment_prompt_with_language(self, messages: List[dict], lang_code: str) -> List[dict]:
        """Add system instruction enforcing output language."""
        if not self.enabled:
            return messages
        lang_name = self.language_name(lang_code)
        enforcement = (
            f"Respond only in {lang_name} ({lang_code}). "
            f"Do not use any other language. "
            f"Maintain tone and terminology appropriate for clinical QA in {lang_name}."
        )
        new_messages: List[dict] = []
        inserted = False
        for m in messages:
            if m.get("role") == "system":
                combined = (m.get("content", "") or "").strip()
                combined = (enforcement + "\n" + combined).strip()
                new_messages.append({"role": "system", "content": combined})
                inserted = True
            else:
                new_messages.append(m)
        if not inserted:
            new_messages.insert(0, {"role": "system", "content": enforcement})
        return new_messages

    def _detect_answer_lang(self, answer: str) -> Optional[str]:
        """Detect language of the model's answer."""
        if not answer or len(answer) < 3:
            return None
        if not self.has_langdetect:
            return self._fallback_detect(answer)[0]
        try:
            return self._detect(answer)
        except Exception:
            return None

    def verify_language(self, expected_lang: str, answer: str) -> bool:
        """Verify that the output language matches expected language."""
        if not self.enabled:
            return True
        if len(answer or "") < self.min_chars // 2:
            return True
        detected = self._detect_answer_lang(answer)
        if not detected:
            return True
        detected = self.NORMALIZE.get(detected.lower(), detected.lower())
        match = (detected == expected_lang.lower())
        if not match:
            logger.warning(f"Language mismatch: expected={expected_lang}, detected={detected}")
        else:
            logger.info(f"Language verify OK: expected={expected_lang}, detected={detected}")
        return match

    def safe_fallback_message(self, expected_lang: str) -> str:
        """Provide a minimal, safe fallback message in the expected language."""
        fallback_map = {
            "af": "Jammer, ek kon nie die antwoord in jou taal verseker nie. Probeer asseblief weer.",
            "sq": "Na vjen keq, nuk munda të siguroj përgjigjen në gjuhën tuaj. Ju lutem provoni përsëri.",
            "ar": "عذرًا، لم أتمكن من ضمان الرد بلغتك. يرجى المحاولة مرة أخرى.",
            "bn": "দুঃখিত, আমি আপনার ভাষায় উত্তর নিশ্চিত করতে পারিনি। অনুগ্রহ করে আবার চেষ্টা করুন।",
            "bg": "Съжалявам, не успях да осигуря отговор на вашия език. Моля, опитайте отново.",
            "ca": "Ho sento, no he pogut assegurar la resposta en el teu idioma. Si us plau, torna-ho a provar.",
            "zh-cn": "抱歉，我无法确保用您的语言回答。请再试一次。",
            "zh-tw": "抱歉，我無法確保用您的語言回答。請再試一次。",
            "hr": "Žao mi je, nisam mogao osigurati odgovor na vašem jeziku. Molimo pokušajte ponovno.",
            "cs": "Omlouvám se, nepodařilo se mi zajistit odpověď ve vašem jazyku. Zkuste to prosím znovu.",
            "da": "Beklager, jeg kunne ikke sikre svaret på dit sprog. Prøv venligst igen.",
            "nl": "Sorry, ik kon het antwoord in uw taal niet garanderen. Probeer het alstublieft opnieuw.",
            "en": "Sorry, I could not ensure the response in your language. Please try again.",
            "et": "Vabandust, ma ei saanud vastust teie keeles tagada. Palun proovige uuesti.",
            "fa": "متأسفم، نتوانستم پاسخ را به زبان شما تضمین کنم. لطفاً دوباره تلاش کنید.",
            "fi": "Anteeksi, en voinut varmistaa vastausta kielelläsi. Yritä uudelleen.",
            "fr": "Désolé, je n’ai pas pu garantir la réponse dans votre langue. Veuillez réessayer.",
            "de": "Entschuldigung, ich konnte die Antwort in Ihrer Sprache nicht sicherstellen. Bitte versuchen Sie es erneut.",
            "el": "Συγγνώμη, δεν μπόρεσα να διασφαλίσω την απάντηση στη γλώσσα σας. Παρακαλώ δοκιμάστε ξανά.",
            "gu": "માફ કરશો, હું તમારી ભાષામાં જવાબ સુનિશ્ચિત કરી શક્યો નથી. કૃપા કરીને ફરી પ્રયાસ કરો.",
            "he": "מצטער, לא הצלחתי להבטיח את התשובה בשפתך. אנא נסה שוב.",
            "hi": "क्षमा करें, मैं आपकी भाषा में उत्तर सुनिश्चित नहीं कर सका। कृपया पुनः प्रयास करें।",
            "hu": "Sajnálom, nem tudtam biztosítani a választ az Ön nyelvén. Kérjük, próbálja újra.",
            "id": "Maaf, saya tidak dapat memastikan jawaban dalam bahasa Anda. Silakan coba lagi.",
            "it": "Spiacente, non sono riuscito a garantire la risposta nella tua lingua. Per favore riprova.",
            "ja": "申し訳ありませんが、ご希望の言語で回答できませんでした。もう一度お試しください。",
            "kn": "ಕ್ಷಮಿಸಿ, ನಾನು ನಿಮ್ಮ ಭಾಷೆಯಲ್ಲಿ ಉತ್ತರವನ್ನು ಖಚಿತಪಡಿಸಲು ಸಾಧ್ಯವಾಗಲಿಲ್ಲ. ದಯವಿಟ್ಟು ಮತ್ತೆ ಪ್ರಯತ್ನಿಸಿ.",
            "ko": "죄송합니다. 귀하의 언어로 응답을 보장할 수 없습니다. 다시 시도해 주세요.",
            "lv": "Atvainojiet, es nevarēju nodrošināt atbildi jūsu valodā. Lūdzu, mēģiniet vēlreiz.",
            "lt": "Atsiprašau, negalėjau užtikrinti atsakymo jūsų kalba. Bandykite dar kartą.",
            "mk": "Жалам, не можев да го обезбедам одговорот на вашиот јазик. Ве молиме обидете се повторно.",
            "ml": "ക്ഷമിക്കണം, നിങ്ങളുടെ ഭാഷയിൽ മറുപടി ഉറപ്പാക്കാൻ കഴിഞ്ഞില്ല. ദയവായി വീണ്ടും ശ്രമിക്കുക.",
            "mr": "क्षमस्व, मी तुमच्या भाषेत उत्तर सुनिश्चित करू शकलो नाही. कृपया पुन्हा प्रयत्न करा.",
            "no": "Beklager, jeg kunne ikke sikre svaret på språket ditt. Vennligst prøv igjen.",
            "pl": "Przepraszam, nie mogłem zapewnić odpowiedzi w twoim języku. Spróbuj ponownie.",
            "pt": "Desculpe, não consegui garantir a resposta no seu idioma. Por favor, tente novamente.",
            "pa": "ਮਾਫ਼ ਕਰਨਾ, ਮੈਂ ਤੁਹਾਡੀ ਭਾਸ਼ਾ ਵਿੱਚ ਜਵਾਬ ਯਕੀਨੀ ਨਹੀਂ ਬਣਾ ਸਕਿਆ। ਕਿਰਪਾ ਕਰਕੇ ਦੁਬਾਰਾ ਕੋਸ਼ਿਸ਼ ਕਰੋ।",
            "ro": "Ne pare rău, nu am putut asigura răspunsul în limba dvs. Vă rugăm să încercați din nou.",
            "ru": "Извините, я не смог обеспечить ответ на вашем языке. Пожалуйста, попробуйте снова.",
            "sk": "Ospravedlňujem sa, nepodarilo sa mi zabezpečiť odpoveď vo vašom jazyku. Skúste to znova.",
            "sl": "Opravičujem se, nisem mogel zagotoviti odgovora v vašem jeziku. Prosimo, poskusite znova.",
            "es": "Lo siento, no pude asegurar la respuesta en tu idioma. Por favor, inténtalo de nuevo.",
            "sw": "Samahani, sikuweza kuhakikisha jibu kwa lugha yako. Tafadhali jaribu tena.",
            "sv": "Tyvärr, jag kunde inte säkerställa svaret på ditt språk. Försök igen.",
            "ta": "மன்னிக்கவும், உங்கள் மொழியில் பதிலை உறுதிப்படுத்த முடியவில்லை. தயவுசெய்து மீண்டும் முயற்சிக்கவும்.",
            "te": "క్షమించండి, మీ భాషలో సమాధానాన్ని నిర్ధారించలేకపోయాను. దయచేసి మళ్లీ ప్రయత్నించండి.",
            "th": "ขออภัย ฉันไม่สามารถยืนยันคำตอบในภาษาของคุณได้ โปรดลองอีกครั้ง.",
            "tl": "Paumanhin, hindi ko masiguro ang tugon sa iyong wika. Pakisubukang muli.",
            "tr": "Üzgünüm, yanıtı dilinizde sağlayamadım. Lütfen tekrar deneyin.",
            "uk": "Вибачте, я не зміг забезпечити відповідь вашою мовою. Будь ласка, спробуйте ще раз.",
            "ur": "معاف کیجیے، میں آپ کی زبان میں جواب کو یقینی نہیں بنا सका۔ براہ کرم دوبارہ کوشش کریں۔",
            "vi": "Xin lỗi, tôi không thể đảm bảo câu trả lời bằng ngôn ngữ của bạn. Vui lòng thử lại.",
        }
        return fallback_map.get(expected_lang.lower(), fallback_map["en"])

LANG = LanguageEnforcer(
    enabled=LANG_ENFORCEMENT_ENABLED,
    min_chars=LANG_MIN_CHARS_FOR_DETECTION,
    max_reprompts=LANG_REPROMPT_MAX_ATTEMPTS,
)
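
When langdetect is unavailable or the text is shorter than min_chars, the class falls back to a script-range heuristic. The sketch below reproduces that heuristic as a standalone function (the name `script_fallback` is hypothetical) to show what it can and cannot tell apart:

```python
import re
from typing import Tuple

def script_fallback(text: str) -> Tuple[str, float]:
    """Script-based guess: Devanagari maps to 'hi', CJK ideographs to 'zh-cn', else 'en'."""
    if re.search(r"[\u0900-\u097F]", text):   # Devanagari block
        return "hi", 0.8
    if re.search(r"[\u4e00-\u9fff]", text):   # CJK Unified Ideographs
        return "zh-cn", 0.8
    return "en", 0.5

samples = {
    "hindi": script_fallback("खुराक क्या है?"),
    "marathi": script_fallback("डोस किती आहे?"),       # also Devanagari
    "chinese": script_fallback("剂量是多少？"),
    "japanese": script_fallback("用量は？"),            # kanji fall in the CJK range
    "french": script_fallback("Quelle est la dose ?"),  # any other script
}
```

Because the heuristic looks only at the script, Marathi is reported as Hindi, Japanese text containing kanji as Simplified Chinese, and every other language as English. Very short queries in those languages may therefore be answered in the wrong language.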

## Conversation history manager

In [12]:
# Conversation history manager (modular, compliant)
# =========================
class ConversationTurn:
    def __init__(self, role: str, content: str):
        self.role = role  # "user" or "assistant"
        self.content = content

class ConversationManager:
    """
    Minimal in-memory rolling buffer for conversation history.
    - Stores last N turns (configurable).
    - Caps token usage for history injection (CONVO_MAX_TOKENS).
    - Optionally includes assistant messages.
    - Avoids logging message contents; logs only counts and sizes.
    """
    def __init__(
        self,
        max_turns: int = CONVO_MAX_TURNS,
        max_tokens: int = CONVO_MAX_TOKENS,
        include_assistant: bool = CONVO_INCLUDE_ASSISTANT,
    ):
        self.max_turns = max_turns
        self.max_tokens = max_tokens
        self.include_assistant = include_assistant
        self._store: Dict[str, Deque[ConversationTurn]] = {}

    def _get_session(self, session_id: str) -> Deque[ConversationTurn]:
        if session_id not in self._store:
            self._store[session_id] = deque(maxlen=self.max_turns)
        return self._store[session_id]

    def add_turn(self, session_id: str, role: str, content: str) -> None:
        if not CONVO_HISTORY_ENABLED:
            return
        dq = self._get_session(session_id)
        dq.append(ConversationTurn(role=role, content=content))
        logger.info(f"Conversation stored: sid={session_id}, turns={len(dq)}")

    def build_history_messages(self, session_id: str) -> List[dict]:
        """
        Returns a list of messages appropriate for Anthropic 'messages' API, constrained by token budget.
        Includes only user turns or both user+assistant based on include_assistant.
        """
        if not CONVO_HISTORY_ENABLED:
            return []

        dq = self._get_session(session_id)
        # Build candidate turns respecting include_assistant
        turns = []
        for t in dq:
            if t.role == "assistant" and not self.include_assistant:
                continue
            turns.append({"role": t.role, "content": t.content})

        # Trim by token budget, from the end (most recent) backward
        total = 0
        kept: List[dict] = []
        for m in reversed(turns):
            tkn = count_tokens(m["content"])
            if total + tkn > self.max_tokens:
                break
            kept.append(m)
            total += tkn
        kept.reverse()
        logger.info(f"History prepared: sid={session_id}, injected_turns={len(kept)}, tokens≈{total}")
        return kept

CONVO = ConversationManager()
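
The manager applies two limits in sequence: the deque drops the oldest turns beyond max_turns, and build_history_messages then keeps only as many of the most recent turns as fit the token budget. A minimal sketch of both steps, assuming a word-count stand-in for count_tokens and a hypothetical helper `trim_history`:

```python
from collections import deque
from typing import Callable, Deque, Dict, List

def word_count(text: str) -> int:
    """Stand-in for count_tokens (assumption): one token per word."""
    return len(text.split())

def trim_history(
    turns: List[Dict[str, str]],
    max_tokens: int,
    counter: Callable[[str], int] = word_count,
) -> List[Dict[str, str]]:
    """Keep the most recent turns whose combined size fits max_tokens."""
    total = 0
    kept: List[Dict[str, str]] = []
    for message in reversed(turns):
        cost = counter(message["content"])
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept

# Step 1: a rolling buffer of 3 turns receives 5 turns.
buffer: Deque[Dict[str, str]] = deque(maxlen=3)
for i in range(1, 6):
    buffer.append({"role": "user", "content": f"question number {i}"})
stored = [m["content"] for m in buffer]

# Step 2: each turn costs 3 "tokens"; a budget of 7 fits only the last two.
kept = [m["content"] for m in trim_history(list(buffer), max_tokens=7)]
```

Only turns 3 to 5 survive the buffer, and only turns 4 and 5 survive the token budget.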

## Prompt building

In [13]:
# Prompt building
# =========================
def build_prompt(
    query: str,
    document_text: str,
    lang_code: Optional[str] = None,
    history_messages: Optional[List[dict]] = None
) -> List[dict]:
    """
    Build strict prompt instructing the model to use only the provided document text.
    If answer not in document, it must reply exactly: 'Not found in document.'
    Optionally enforces response language via system instruction and includes conversation history.
    """
    system = (
        "You are an expert clinical QA assistant. Follow instructions exactly.\n"
        f"{CONVO_POLICY_NOTE}"
    )

    # History section (optional, before the current user turn)
    history_block = ""
    if history_messages:
        # Add a short note to the user turn saying that earlier turns exist. The turns
        # themselves are sent as separate messages, not concatenated into the user content.
        history_block = "Prior conversation context is provided above (for continuity only).\n"

    user = (
        f"{history_block}"
        "You must answer using ONLY the following document.\n"
        "If the answer is not in the document, reply exactly: 'Not found in document.'\n\n"
        "Document:\n"
        "-----\n"
        f"{document_text}\n"
        "-----\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

    base_messages: List[dict] = [{"role": "system", "content": system}]

    # If history exists, append it as separate messages before the current user turn
    if history_messages:
        # Content stays a plain string here; BedrockAnthropicClient.chat wraps each
        # message in a single text content block before sending it.
        for hm in history_messages:
            base_messages.append({
                "role": hm["role"],
                "content": hm["content"]
            })

    base_messages.append({"role": "user", "content": user})

    # Language enforcement
    if lang_code:
        base_messages = LANG.augment_prompt_with_language(base_messages, lang_code)
    return base_messages

## Bedrock Anthropic adapter

In [14]:
# Bedrock Anthropic adapter
# =========================
class BedrockAnthropicClient:
    """
    Minimal adapter for Anthropic Claude models on Amazon Bedrock (messages API).
    """
    def __init__(self, model_id: str, region: str):
        self.model_id = model_id
        self.client = boto3.client("bedrock-runtime", region_name=region)

    def chat(self, messages: List[dict], temperature: float = 0.0, max_tokens: int = DEFAULT_MAX_TOKENS) -> str:
        """
        Send a chat-style request to Anthropic Claude via Bedrock.
        - messages: list like [{"role": "system"/"user"/"assistant", "content": "..."}]
        - Returns assistant text content.
        """
        # Separate system prompt and other messages into Anthropic "messages"
        system_prompt = ""
        convo: List[dict] = []
        for m in messages:
            role = m.get("role")
            content = m.get("content", "")
            if role == "system":
                system_prompt += (content + "\n").strip() + "\n"
            elif role in ("user", "assistant"):
                convo.append({"role": role, "content": [{"type": "text", "text": content}]})

        body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "temperature": temperature,
            "system": system_prompt.strip() if system_prompt else None,
            "messages": convo if convo else [{"role": "user", "content": [{"type": "text", "text": ""}]}],
        }
        body = {k: v for k, v in body.items() if v is not None}

        response = self.client.invoke_model(
            modelId=self.model_id,
            body=json.dumps(body),
        )
        payload = response["body"].read()
        data = json.loads(payload)

        try:
            content_blocks = data.get("content", [])
            if content_blocks and content_blocks[0].get("type") == "text":
                return content_blocks[0].get("text", "").strip()
            texts = [b.get("text", "") for b in content_blocks if b.get("type") == "text"]
            return "\n".join(t for t in texts if t).strip()
        except Exception:
            logger.exception("Unexpected Bedrock response format.")
            return ""

bedrock_llm = BedrockAnthropicClient(model_id=BEDROCK_MODEL_ID, region=AWS_REGION)
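
The adapter's request building and response parsing can be checked without AWS credentials. The following is a minimal sketch that mirrors the logic of `chat`; `build_body` and `extract_text` are hypothetical helper names that do not exist in the notebook, and the sample response dictionary is hand-written to match the shape the adapter's parsing expects.

```python
import json
from typing import Dict, List


def build_body(messages: List[Dict[str, str]], max_tokens: int = 512,
               temperature: float = 0.0) -> dict:
    """Split system prompts from conversation turns (hypothetical helper)."""
    system_parts: List[str] = []
    convo: List[dict] = []
    for m in messages:
        role = m.get("role")
        content = m.get("content", "")
        if role == "system":
            system_parts.append(content.strip())
        elif role in ("user", "assistant"):
            convo.append({"role": role, "content": [{"type": "text", "text": content}]})
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "temperature": temperature,
        "messages": convo,
    }
    system_prompt = "\n".join(p for p in system_parts if p)
    if system_prompt:
        # Only include "system" when there is something to send.
        body["system"] = system_prompt
    return body


def extract_text(data: dict) -> str:
    """Return the first text block, else all text blocks joined (as in `chat`)."""
    blocks = data.get("content", [])
    if blocks and blocks[0].get("type") == "text":
        return blocks[0].get("text", "").strip()
    texts = [b.get("text", "") for b in blocks if b.get("type") == "text"]
    return "\n".join(t for t in texts if t).strip()


body = build_body([
    {"role": "system", "content": "Answer only from the document."},
    {"role": "user", "content": "What is the loading dose?"},
])
print(json.dumps(body, indent=2))

# Hand-written stand-in for a decoded Bedrock response payload.
fake_response = {"content": [{"type": "text", "text": " 200 mg IV on Day 1. "}]}
print(extract_text(fake_response))
```

Keeping these two steps as pure functions makes them easy to unit test; only the `invoke_model` call itself needs a live Bedrock client.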

## LLM Query Execution
`query_llm` sends messages to the LLM and returns the model's response. Any exception triggers a retry after `retry_delay` seconds; once `max_retries` retries have failed, the failure is logged and the last exception is re-raised.

In [15]:
def query_llm(
    messages: List[dict],
    temperature: float = 0.0,
    max_retries: int = 2,
    retry_delay: float = 1.0,
    max_tokens: int = DEFAULT_MAX_TOKENS
) -> str:
    """
    Query the LLM (Bedrock), retrying on any exception.
    Makes at most 1 + max_retries calls, then re-raises the last error.
    """
    attempt = 0
    while True:
        try:
            answer = bedrock_llm.chat(messages, temperature=temperature, max_tokens=max_tokens)
            return answer.strip()
        except Exception as e:
            attempt += 1
            logger.warning(f"LLM query failed (attempt {attempt}): {e}")
            if attempt > max_retries:
                logger.exception("Exceeded max retries for LLM call.")
                raise
            time.sleep(retry_delay)
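
The retry loop can be exercised in isolation with a stub that fails twice and then succeeds. This is a sketch only: `call_with_retries` and `flaky` are hypothetical names, and the `sleep` parameter is an addition that lets the example run without waiting. It shows that `max_retries=2` allows up to three calls in total.

```python
import time
from typing import Callable, List


def call_with_retries(fn: Callable[[], str], max_retries: int = 2,
                      retry_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Same control flow as query_llm, with the LLM call abstracted as `fn`."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted: surface the last error
            sleep(retry_delay)


calls: List[int] = []


def flaky() -> str:
    """Stub that fails on the first two calls, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("simulated transient failure")
    return "ok"


delays: List[float] = []
result = call_with_retries(flaky, max_retries=2, retry_delay=1.0, sleep=delays.append)
print(result, len(calls), delays)
```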

## LLM Summarize
`_llm_summarize` summarizes medical text using the Bedrock LLM. It builds a strict prompt, asks the model for a concise factual summary, and falls back to truncating the text to `max_words` words if the LLM call fails.

In [16]:
# Summarization for compression
# =========================
def _llm_summarize(text: str, max_words: int = 250) -> str:
    """
    Summarize text using Bedrock LLM into a concise, factual summary.
    """
    prompt = (
        "You are a precise summarizer. Create a concise, faithful summary capturing key facts, "
        f"definitions, indications, contraindications, doses, adverse effects, and monitoring steps in <= {max_words} words. "
        "Do not invent information. Only use the provided text.\n\n"
        f"Text:\n{text}\n\nSummary:"
    )
    messages = [
        {"role": "system", "content": "You are a careful, faithful medical summarizer."},
        {"role": "user", "content": prompt},
    ]
    try:
        resp = query_llm(messages, temperature=0.0, max_tokens=DEFAULT_MAX_TOKENS)
        return (resp or "").strip()
    except Exception:
        logger.exception("LLM summarization failed.")
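        # Fallback: plain truncation. str.split() avoids the empty first item
        # that re.split produces when the text starts with whitespace.
        words = (text or "").split()
        return " ".join(words[:max_words])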

## Compress Document

`compress_document` compresses a document to fit within a token budget: it chunks the text, summarizes each chunk with the LLM, then re-summarizes the combined result (up to three times) until query + document fit within the budget.

In [17]:
def compress_document(document_text: str, query: str, budget_tokens: int) -> str:
    """
    Compress the document so that query + compressed_document fits within the budget.
    Strategy:
    - Chunk the document to safe sizes.
    - Summarize each chunk.
    - Iteratively reduce until within budget.
    """
    reserved_for_prompt = 1000
    per_chunk_limit = 4000  # tokenizer tokens heuristic
    chunk_tokens_limit = min(per_chunk_limit, max(1000, (budget_tokens - reserved_for_prompt) // 4))
    chunks = chunk_text(document_text, chunk_tokens_limit)
    logger.info(f"Compress: initial chunks={len(chunks)}, tokens per chunk≈{chunk_tokens_limit}")

    summaries = []
    for i, ch in enumerate(chunks):
        logger.info(f"Summarizing chunk {i+1}/{len(chunks)}")
        summaries.append(_llm_summarize(ch, max_words=250))
    combined = "\n".join(summaries)

    fits, total = check_token_budget(query, combined, budget_tokens)
    iteration = 0
    while not fits and iteration < 3:
        iteration += 1
        logger.info(f"Re-summarizing (iteration {iteration}) because tokens={total} exceed budget={budget_tokens}")
        combined = _llm_summarize(combined, max_words=200)
        fits, total = check_token_budget(query, combined, budget_tokens)
    return combined
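
The chunk-size heuristic and the bounded re-summarization loop can be illustrated with stand-ins for the tokenizer and the LLM. In this sketch, `chunk_token_limit`, `count_tokens`, `reduce_until_fits`, and `halve` are all hypothetical names; tokens are approximated by whitespace-separated words, and "summarizing" simply keeps the first half of the text. The budget comparison is also an assumption, since `check_token_budget` is defined elsewhere in the notebook.

```python
from typing import Callable, Tuple


def chunk_token_limit(budget_tokens: int, reserved_for_prompt: int = 1000,
                      per_chunk_limit: int = 4000) -> int:
    """Same arithmetic as compress_document's chunk size heuristic."""
    return min(per_chunk_limit, max(1000, (budget_tokens - reserved_for_prompt) // 4))


def count_tokens(text: str) -> int:
    """Stand-in token counter: whitespace words, not a real tokenizer."""
    return len(text.split())


def reduce_until_fits(text: str, query: str, budget: int,
                      summarize: Callable[[str], str],
                      max_iterations: int = 3) -> Tuple[str, int]:
    """Re-summarize until query + text fit, or the iteration cap is hit."""
    iterations = 0
    while count_tokens(query) + count_tokens(text) > budget and iterations < max_iterations:
        iterations += 1
        text = summarize(text)
    return text, iterations


def halve(text: str) -> str:
    """Stand-in summarizer: keep the first half of the words."""
    words = text.split()
    return " ".join(words[: max(1, len(words) // 2)])


for budget in (128000, 6000, 3000):
    print(budget, chunk_token_limit(budget))

doc = " ".join(["word"] * 100)
compressed, rounds = reduce_until_fits(doc, "two words", budget=30, summarize=halve)
print(count_tokens(compressed), rounds)
```

Because the loop is capped, the result can still exceed the budget after the last round, which is why the orchestrator re-checks the budget and applies hard truncation as a last resort.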

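## Orchestration

`run_direct_document_qa` runs the full pipeline for one query: language detection with a confidence gate, document loading, token budgeting and compression, the input guardrail, prompting, language verification, the output guardrail, and optional conversation history.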

In [28]:
# Orchestrator
# =========================
def run_direct_document_qa(
    file_path: str,
    query: str,
    model_context_tokens: int = MODEL_CONTEXT_TOKENS,
    session_id: Optional[str] = None
) -> str:
    """
    Orchestrates the full Direct-Document QA workflow on Bedrock.
    Steps:
    1) Detect language (with confidence guard)
    2) Load document
    3) Token budget check; compress if needed
    4) Input guardrail
    5) Build prompt (with language enforcement + optional conversation history)
    6) Query LLM (+ language verification and optional re-prompt)
    7) Output guardrail (grounding)
    8) Store turn in conversation history (if enabled)
    """

    # 1) Language detect
    expected_lang, lang_conf = LANG.detect_language(query)
    lang_name = LANG.language_name(expected_lang)
    logger.info(f"Query language: code={expected_lang} ({lang_name}), conf≈{lang_conf:.2f}")
    sid = session_id or "default-session"

    # Reject queries whose detected language falls below a confidence threshold
    LANG_CONF_THRESHOLD = 0.8  # configurable
    if lang_conf < LANG_CONF_THRESHOLD:
        msg = "I am unable to recognize your language with sufficient confidence."
        logger.warning(f"Low-confidence language detection: {expected_lang}, conf≈{lang_conf:.2f}")

        if CONVO_HISTORY_ENABLED:
            CONVO.add_turn(sid, "user", query)
            CONVO.add_turn(sid, "assistant", msg)

        return msg

    # 2) Load
    document_text = load_document(file_path)

    # 3) Budget
    fits, total_tokens = check_token_budget(query, document_text, model_context_tokens)
    if not fits:
        logger.info("Query + document exceeds context. Compressing document.")
        document_text = compress_document(document_text, query, model_context_tokens)
        fits, _ = check_token_budget(query, document_text, model_context_tokens)
        if not fits:
            logger.warning("After compression, content still too large. Applying hard truncation.")
            doc_tokens = ENCODER.encode(document_text)
            keep = max(1000, model_context_tokens // 2)
            # Keep the leading tokens so the start of the document survives truncation.
            document_text = ENCODER.decode(doc_tokens[:keep])

    # 4) Input guardrail
    allowed, message = apply_input_guardrail(query, document_text)
    if not allowed:
        if CONVO_HISTORY_ENABLED:
            CONVO.add_turn(sid, "user", query)
            CONVO.add_turn(sid, "assistant", message or "Your request cannot be processed at this time.")
        return message or "Your request cannot be processed at this time."

    # Prepare conversation history
    history_msgs = CONVO.build_history_messages(sid) if CONVO_HISTORY_ENABLED else []

    # 5) Prompt (enforce output language + include history)
    messages = build_prompt(query, document_text, lang_code=expected_lang, history_messages=history_msgs)

    # 6) Execute (+ language verify and optional re-prompt)
    raw_answer = query_llm(messages, temperature=0.0, max_retries=2,
                           retry_delay=1.0, max_tokens=DEFAULT_MAX_TOKENS)

    attempts = 0
    while LANG.enabled and not LANG.verify_language(expected_lang, raw_answer) and attempts < LANG.max_reprompts:
        attempts += 1
        logger.info(f"Re-prompting due to language mismatch (attempt {attempts}/{LANG.max_reprompts}).")
        corrective_system = (
            f"CRITICAL: The previous response was not in {LANG.language_name(expected_lang)} "
            f"({expected_lang}). Respond strictly in {LANG.language_name(expected_lang)} only."
        )
        corrective_messages = [{"role": "system", "content": corrective_system}] + messages
        raw_answer = query_llm(corrective_messages, temperature=0.0, max_retries=2,
                               retry_delay=1.0, max_tokens=DEFAULT_MAX_TOKENS)

    if LANG.enabled and not LANG.verify_language(expected_lang, raw_answer):
        final_answer = LANG.safe_fallback_message(expected_lang)
        if CONVO_HISTORY_ENABLED:
            CONVO.add_turn(sid, "user", query)
            CONVO.add_turn(sid, "assistant", final_answer)
        return final_answer

    # 7) Output guardrail (grounding)
    summary_keywords = ["summary", "summarise", "summarize", "résumé", "résumez", "सारांश", "सार"]
    is_summary_query = any(kw in query.lower() for kw in summary_keywords)
    doc_lang, _ = LANG.detect_language(document_text)
    is_crosslingual = (expected_lang != doc_lang)

    final_answer = apply_output_guardrail(
        raw_answer,
        document_text,
        expected_lang=expected_lang,
        allow_summary_mode=is_summary_query or is_crosslingual
    )

    # 8) Store turn in conversation history (if enabled)
    if CONVO_HISTORY_ENABLED:
        CONVO.add_turn(sid, "user", query)
        CONVO.add_turn(sid, "assistant", final_answer)

    return final_answer
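
Step 6 (language verification with a corrective re-prompt) can be tested on its own by injecting the LLM and the verifier as callables. This is a sketch under stated assumptions: `answer_in_language`, `stub_llm`, and `stub_verify` are hypothetical names, the stubs tag their output with a language code instead of producing real text, and the default of one re-prompt matches the `attempt 1/1` seen in the logs rather than a documented setting.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def answer_in_language(
    messages: List[Message],
    expected_lang: str,
    llm: Callable[[List[Message]], str],
    verify: Callable[[str, str], bool],
    max_reprompts: int = 1,
) -> Optional[str]:
    """Query once, then re-prompt with a corrective system message on mismatch."""
    answer = llm(messages)
    attempts = 0
    while not verify(expected_lang, answer) and attempts < max_reprompts:
        attempts += 1
        corrective = {
            "role": "system",
            "content": f"CRITICAL: Respond strictly in {expected_lang} only.",
        }
        # The corrective message is prepended; the original prompt is unchanged.
        answer = llm([corrective] + messages)
    # None signals that the caller should use its fallback message.
    return answer if verify(expected_lang, answer) else None


def stub_llm(msgs: List[Message]) -> str:
    """Stub model: answers in French only when the corrective prompt is present."""
    first = msgs[0]
    if first["role"] == "system" and first["content"].startswith("CRITICAL"):
        return "[fr] Bonjour"
    return "[en] Hello"


def stub_verify(lang: str, text: str) -> bool:
    """Stub verifier: checks the language tag the stub model emits."""
    return text.startswith(f"[{lang}]")


prompt = [{"role": "user", "content": "Bonjour ?"}]
print(answer_in_language(prompt, "fr", stub_llm, stub_verify))
print(answer_in_language(prompt, "fr", stub_llm, stub_verify, max_reprompts=0))
```

Where this sketch returns `None`, the orchestrator returns `LANG.safe_fallback_message(expected_lang)`.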

## Test Query- English

In [29]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Summarise the entire document in 200 words" #English
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:48:44,710 - Medical Chatbot G-Version - Query language: code=en (English), conf≈1.00
[INFO] 2025-09-22 08:48:44,712 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:48:44,750 - Medical Chatbot G-Version - Token check: total=390, budget=128000, fits=True
[INFO] 2025-09-22 08:48:50,399 - Medical Chatbot G-Version - Language verify OK: expected=en, detected=en
[INFO] 2025-09-22 08:48:50,405 - Medical Chatbot G-Version - Language verify OK: expected=en, detected=en

=== Answer ===
This drug information sheet provides details on remdesivir, an antiviral agent used for COVID-19 treatment. Remdesivir is a nucleoside analogue RNA polymerase inhibitor that works by inhibiting viral RNA-dependent RNA polymerase, thus reducing SARS-CoV-2 replication. It is indicated for hospitalized COVID-19 patients requiring supplemental oxygen but not on invasive ventilation. The dosage consists of a 200 mg IV loading dose on Day 1, followed by a 1
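
The per-language test cells that follow repeat the same code with a different query. One possible way to run them as a batch is sketched here; `run_batch`, `QUERIES`, and `stub_qa` are hypothetical names, and the QA function is injected so the example runs without Bedrock or the PDF.

```python
from typing import Callable, Dict, List, Tuple


def run_batch(queries: List[Tuple[str, str]],
              qa_fn: Callable[[str, str], str]) -> Dict[str, str]:
    """Run each (label, query) pair and collect answers, recording failures."""
    results: Dict[str, str] = {}
    for label, query in queries:
        try:
            results[label] = qa_fn(label, query)
        except Exception as exc:
            # Record the failure and continue with the remaining languages.
            results[label] = f"ERROR: {exc}"
    return results


QUERIES = [
    ("English", "Summarise the entire document in 200 words"),
    ("Afrikaans", "Sommeer die hele dokument in 200 woorde"),
    ("Albanian", "Përmbledhni të gjithë dokumentin në 200 fjalë"),
]


def stub_qa(label: str, query: str) -> str:
    """Stand-in for the real pipeline call."""
    return f"answer to: {query}"


results = run_batch(QUERIES, stub_qa)
for label, answer in results.items():
    print(f"=== {label} ===\n{answer}\n")
```

In the notebook, `qa_fn` could be something like `lambda label, q: run_direct_document_qa("Drug Information Sheet.pdf", q, MODEL_CONTEXT_TOKENS, session_id=f"demo-{label}")`. Using a separate session ID per language is a suggestion, not the notebook's behaviour: the cells here all share `demo-session-1`, so with `CONVO_HISTORY_ENABLED` on, turns from earlier languages may be carried into later prompts.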

## Test Query- Afrikaans

In [30]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Sommeer die hele dokument in 200 woorde" # Afrikaans
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:48:50,426 - Medical Chatbot G-Version - Query language: code=af (Afrikaans), conf≈1.00
[INFO] 2025-09-22 08:48:50,427 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:48:50,462 - Medical Chatbot G-Version - Token check: total=391, budget=128000, fits=True
[INFO] 2025-09-22 08:48:58,840 - Medical Chatbot G-Version - Language verify OK: expected=af, detected=af
[INFO] 2025-09-22 08:48:58,846 - Medical Chatbot G-Version - Language verify OK: expected=af, detected=af

=== Answer ===
Hierdie dokument is 'n inligtingsblad oor remdesivir, 'n antivirale middel vir COVID-19-behandeling. Dit bevat die volgende hoofpunte:

Klas: Remdesivir is 'n antivirale middel, spesifiek 'n nukleosied-analoog RNA-polimerase-inhibeerder.

Werkingsmeganisme: Dit inhibeer virale RNA-afhanklike RNA-polimerase, wat SARS-CoV-2-replikasie verminder.

Indikasies: Dit word gebruik vir gehospitaliseerde COVID-19-pasiënte wat aanvullende suurstof benodig, maa

## Test Query- Albanian

In [31]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Përmbledhni të gjithë dokumentin në 200 fjalë" # Albanian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:48:58,861 - Medical Chatbot G-Version - Query language: code=sq (Albanian), conf≈1.00
[INFO] 2025-09-22 08:48:58,863 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:48:58,897 - Medical Chatbot G-Version - Token check: total=398, budget=128000, fits=True
[INFO] 2025-09-22 08:49:08,728 - Medical Chatbot G-Version - Language verify OK: expected=sq, detected=sq
[INFO] 2025-09-22 08:49:08,733 - Medical Chatbot G-Version - Language verify OK: expected=sq, detected=sq

=== Answer ===
Ky dokument është një fletë informacioni për barnat që përshkruan remdesivirin, një agjent antiviral për trajtimin e COVID-19. Ai ofron informacion të detajuar për klinicistët në lidhje me mekanizmin e veprimit, indikacionet, dozimin, efektet anësore dhe kërkesat e monitorimit të remdesivirit.

Remdesiviri është një inhibitor i polimerazës RNA që vepron duke reduktuar replikimin e SARS-CoV-2. Indikohet për pacientët e shtruar në spital me COVID-19 të

## Test Query- Arabic

In [32]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "لخص المستند بأكمله في 200 كلمة" # Arabic
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:49:08,748 - Medical Chatbot G-Version - Query language: code=ar (Arabic), conf≈1.00
[INFO] 2025-09-22 08:49:08,749 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:49:08,784 - Medical Chatbot G-Version - Token check: total=400, budget=128000, fits=True
[INFO] 2025-09-22 08:49:21,176 - Medical Chatbot G-Version - Language verify OK: expected=ar, detected=ar
[INFO] 2025-09-22 08:49:21,180 - Medical Chatbot G-Version - Language verify OK: expected=ar, detected=ar

=== Answer ===
تلخيص المستند في حوالي 200 كلمة:

هذه نشرة معلومات دوائية عن عقار ريمديسيفير، وهو مضاد فيروسي لعلاج كوفيد-19. ريمديسيفير هو مثبط لإنزيم RNA بوليميراز المعتمد على RNA الفيروسي، مما يقلل من تكاثر فيروس سارس-كوف-2. 

يستخدم للمرضى المصابين بكوفيد-19 المؤكد والذين يحتاجون إلى أكسجين تكميلي ولكن ليسوا على التنفس الصناعي. 

الجرعة الموصى بها هي 200 ملغ عن طريق الوريد في اليوم الأول، ثم 100 ملغ يوميًا لمدة 4-9 أيام حسب الاستجابة السريرية.

الآثار الجانبية تشم

## Test Query- Bengali

In [33]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "পুরো নথিটি ২০০ শব্দে সংক্ষেপ করুন" # Bengali
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:49:21,209 - Medical Chatbot G-Version - Query language: code=bn (Bengali), conf≈1.00
[INFO] 2025-09-22 08:49:21,210 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:49:21,248 - Medical Chatbot G-Version - Token check: total=425, budget=128000, fits=True
[INFO] 2025-09-22 08:49:41,629 - Medical Chatbot G-Version - Language verify OK: expected=bn, detected=bn
[INFO] 2025-09-22 08:49:41,634 - Medical Chatbot G-Version - Language verify OK: expected=bn, detected=bn

=== Answer ===
এই ড্রাগ ইনফরমেশন শীটটি রেমডেসিভির সম্পর্কে তথ্য প্রদান করে, যা COVID-19 এর চিকিৎসায় ব্যবহৃত একটি অ্যান্টিভাইরাল ঔষধ। এটি একটি নিউক্লিওসাইড অ্যানালগ RNA পলিমারেজ ইনহিবিটর যা ভাইরাল RNA-নির্ভর RNA পলিমারেজকে বাধা দেয়, যার ফলে SARS-CoV-2 এর প্রতিলিপি তৈরি কমে যায়।

রেমডেসিভির হাসপাতালে ভর্তি COVID-19 রোগীদের জন্য নির্দেশিত যারা অতিরিক্ত অক্সিজেন প্রয়োজন করে কিন্তু ইনভেসিভ ভেন্টিলেশনে নেই। প্রথম দিনে 200 mg IV লোডিং ডোজ দেওয়া হয়, তারপর 4-9 দিন পর্য

## Test Query- Bulgarian

In [34]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Обобщете целия документ в 200 думи" # Bulgarian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:49:41,675 - Medical Chatbot G-Version - Query language: code=bg (Bulgarian), conf≈1.00
[INFO] 2025-09-22 08:49:41,675 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:49:41,710 - Medical Chatbot G-Version - Token check: total=395, budget=128000, fits=True
[INFO] 2025-09-22 08:49:49,830 - Medical Chatbot G-Version - Language verify OK: expected=bg, detected=bg
[INFO] 2025-09-22 08:49:49,838 - Medical Chatbot G-Version - Language verify OK: expected=bg, detected=bg

=== Answer ===
Този документ е информационен лист за лекарството ремдесивир, използвано за лечение на COVID-19. Ремдесивир е антивирусен агент от клас нуклеозидни аналози, инхибитори на РНК полимеразата. Той действа чрез инхибиране на вирусната РНК-зависима РНК полимераза, намалявайки репликацията на SARS-CoV-2. 

Показан е за хоспитализирани пациенти с потвърден COVID-19, нуждаещи се от допълнителен кислород, но не на инвазивна вентилация. Дозировката включва нат

## Test Query- Catalan

In [35]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Resumeix tot el document en 200 paraules" # Catalan
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:49:49,856 - Medical Chatbot G-Version - Query language: code=ca (Catalan), conf≈1.00
[INFO] 2025-09-22 08:49:49,858 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:49:49,892 - Medical Chatbot G-Version - Token check: total=390, budget=128000, fits=True
[INFO] 2025-09-22 08:49:56,171 - Medical Chatbot G-Version - Language verify OK: expected=ca, detected=ca
[INFO] 2025-09-22 08:49:56,177 - Medical Chatbot G-Version - Language verify OK: expected=ca, detected=ca

=== Answer ===
Aquest document és un full d'informació sobre el medicament remdesivir per al tractament de la COVID-19. Proporciona detalls clínics sobre aquest antiviral, que és un inhibidor de l'ARN polimerasa. El seu mecanisme d'acció consisteix a inhibir la replicació del SARS-CoV-2. 

Està indicat per a pacients hospitalitzats amb COVID-19 confirmada que requereixen oxigen suplementari però no ventilació invasiva. La dosificació inclou una dosi de càrrega de 20

## Test Query- Chinese (Simplified)

In [36]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "将整个文档概括为200个字" # Chinese (Simplified)
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:49:56,193 - Medical Chatbot G-Version - Query language: code=zh-cn (Chinese (Simplified)), conf≈1.00
[INFO] 2025-09-22 08:49:56,194 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:49:56,226 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 08:50:01,351 - Medical Chatbot G-Version - Language verify OK: expected=zh-cn, detected=zh-cn
[INFO] 2025-09-22 08:50:01,354 - Medical Chatbot G-Version - Language verify OK: expected=zh-cn, detected=zh-cn

=== Answer ===
这份药品信息表提供了关于瑞德西韦的详细信息,包括其作为抗病毒药物用于治疗COVID-19的机制、适应症、剂量、不良反应和监测要求。瑞德西韦是一种核苷类似物RNA聚合酶抑制剂,通过抑制病毒RNA依赖的RNA聚合酶来减少SARS-CoV-2的复制。它适用于需要补充氧气但不需要有创通气的住院COVID-19患者。推荐剂量为第1天200mg静脉注射,随后每天100mg,持续4-9天。常见不良反应包括恶心和转氨酶升高。使用前需检查肝肾功能,并在治疗期间定期监测。严重肾功能不全和已知对瑞德西韦过敏者禁用。


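## Test Query- Chinese (Traditional)- Not Detected

langdetect classified this Traditional Chinese query as Korean (`ko`) with confidence ≈1.00, so the pipeline answered in Korean.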

In [37]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "將整個文件概括為200個字" # Chinese (Traditional)
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:01,375 - Medical Chatbot G-Version - Query language: code=ko (Korean), conf≈1.00
[INFO] 2025-09-22 08:50:01,376 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:01,410 - Medical Chatbot G-Version - Token check: total=397, budget=128000, fits=True
[INFO] 2025-09-22 08:50:02,109 - Medical Chatbot G-Version - Re-prompting due to language mismatch (attempt 1/1).
[INFO] 2025-09-22 08:50:08,517 - Medical Chatbot G-Version - Language verify OK: expected=ko, detected=ko
[INFO] 2025-09-22 08:50:08,520 - Medical Chatbot G-Version - Language verify OK: expected=ko, detected=ko

=== Answer ===
이 문서는 COVID-19 치료를 위한 항바이러스제인 렘데시비르에 대한 약물 정보를 제공합니다. 주요 내용은 다음과 같습니다:

- 분류: RNA 중합효소 억제제
- 작용 기전: SARS-CoV-2의 복제를 감소시킴
- 적응증: 산소 보충이 필요한 입원 환자
- 용량: 첫날 200mg IV 투여 후 4-9일간 100mg IV 유지
- 부작용: 구역, 간효소 상승, 드물게 과민반응과 신기능 장애
- 모니터링: 간 및 신기능 검사
- 금기: 중증 신장애, 렘데시비르에 대한 과민반응

문서는 또한 렘데시비르의 기전, 적응증, 용량, 부작용, 모니터링 요구사항에 대한 상세 정보를 임상의에게 제공하는 것을 목적으로
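
A misclassification like this one could in principle be caught by a cheap script check before the detector's result is trusted: a query with no Hangul characters is unlikely to be Korean. The sketch below is not part of the pipeline; `script_counts` and `korean_is_plausible` are hypothetical helpers, and the code point ranges cover only the main CJK Unified Ideographs block and the common Hangul blocks.

```python
from typing import Dict


def script_counts(text: str) -> Dict[str, int]:
    """Count Han and Hangul characters using Unicode code point ranges."""
    counts = {"han": 0, "hangul": 0}
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs
            counts["han"] += 1
        elif (0xAC00 <= cp <= 0xD7A3      # Hangul Syllables
              or 0x1100 <= cp <= 0x11FF   # Hangul Jamo
              or 0x3130 <= cp <= 0x318F): # Hangul Compatibility Jamo
            counts["hangul"] += 1
    return counts


def korean_is_plausible(text: str) -> bool:
    """A Korean detection is only plausible if some Hangul is present."""
    return script_counts(text)["hangul"] > 0


query = "將整個文件概括為200個字"  # the Traditional Chinese test query
print(script_counts(query))
print(korean_is_plausible(query))
print(korean_is_plausible("이 문서는 약물 정보를 제공합니다"))
```

A check like this only rules out an implausible result; it does not distinguish Traditional from Simplified Chinese, which would need a separate approach.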

## Test Query- Croatian

In [38]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Sažmite cijeli dokument u 200 riječi" # Croatian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:08,538 - Medical Chatbot G-Version - Query language: code=hr (Croatian), conf≈1.00
[INFO] 2025-09-22 08:50:08,541 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:08,574 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 08:50:14,265 - Medical Chatbot G-Version - Language verify OK: expected=hr, detected=hr
[INFO] 2025-09-22 08:50:14,272 - Medical Chatbot G-Version - Language verify OK: expected=hr, detected=hr

=== Answer ===
Ovaj dokument je informativni list o lijeku remdesivir za liječenje COVID-19. Remdesivir je antivirusni lijek koji inhibira virusnu RNA-ovisnu RNA polimerazu, smanjujući replikaciju SARS-CoV-2 virusa. Indiciran je za hospitalizirane pacijente s potvrđenim COVID-19 koji zahtijevaju dodatni kisik, ali nisu na invazivnoj ventilaciji. Doziranje uključuje početnu dozu od 200 mg intravenozno prvi dan, nakon čega slijedi dnevna doza održavanja od 100 mg tij

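## Test Query- Czech

Note: langdetect classified this Czech query as Slovak (`sk`, confidence ≈0.86), so the answer below is in Slovak.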

In [39]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Shrňte celý dokument do 200 slov" # Czech
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:14,298 - Medical Chatbot G-Version - Query language: code=sk (Slovak), conf≈0.86
[INFO] 2025-09-22 08:50:14,299 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:14,332 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:50:21,109 - Medical Chatbot G-Version - Language verify OK: expected=sk, detected=sk
[INFO] 2025-09-22 08:50:21,115 - Medical Chatbot G-Version - Language verify OK: expected=sk, detected=sk

=== Answer ===
Tento dokument je informačný list o lieku remdesivir, ktorý sa používa na liečbu COVID-19. Remdesivir je antivirotikum, ktoré inhibuje vírusovú RNA-dependentnú RNA polymerázu, čím znižuje replikáciu SARS-CoV-2. Je indikovaný pre hospitalizovaných pacientov s potvrdeným COVID-19, ktorí vyžadujú doplnkový kyslík, ale nie sú na invazívnej ventilácii. 

Dávkovanie zahŕňa úvodnú dávku 200 mg IV v prvý deň, po ktorej nasleduje udržiavacia dávka 100 mg IV den

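## Test Query- Danish

Note: langdetect classified this Danish query as Norwegian (`no`, confidence ≈1.00), so the answer below is in Norwegian.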

In [40]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Sammenfat hele dokumentet i 200 ord" # Danish
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:21,147 - Medical Chatbot G-Version - Query language: code=no (Norwegian), conf≈1.00
[INFO] 2025-09-22 08:50:21,149 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:21,184 - Medical Chatbot G-Version - Token check: total=390, budget=128000, fits=True
[INFO] 2025-09-22 08:50:28,758 - Medical Chatbot G-Version - Language verify OK: expected=no, detected=no
[INFO] 2025-09-22 08:50:28,768 - Medical Chatbot G-Version - Language verify OK: expected=no, detected=no

=== Answer ===
Dette dokumentet er et legemiddelinformasjonsark om remdesivir, et antiviralt middel for behandling av COVID-19. Det gir klinisk informasjon om legemiddelets klasse, virkningsmekanisme, indikasjoner, dosering, bivirkninger og overvåkningskrav.

Remdesivir er en nukleosidanalog RNA-polymerasehemmer som hemmer viral RNA-avhengig RNA-polymerase, og reduserer dermed replikasjonen av SARS-CoV-2. Det er indisert for hospitaliserte pasienter med bekreftet 

## Test Query- Dutch

In [41]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Vat het hele document samen in 200 woorden" # Dutch
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:28,786 - Medical Chatbot G-Version - Query language: code=nl (Dutch / Flemish), conf≈1.00
[INFO] 2025-09-22 08:50:28,787 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:28,824 - Medical Chatbot G-Version - Token check: total=391, budget=128000, fits=True
[INFO] 2025-09-22 08:50:35,621 - Medical Chatbot G-Version - Language verify OK: expected=nl, detected=nl
[INFO] 2025-09-22 08:50:35,632 - Medical Chatbot G-Version - Language verify OK: expected=nl, detected=nl

=== Answer ===
Dit document is een geneesmiddelinformatieblad over remdesivir, een antiviraal middel voor de behandeling van COVID-19. Het bevat de volgende belangrijke informatie:

Klasse: Antiviraal middel, nucleoside-analoog RNA-polymeraseremmer.

Werkingsmechanisme: Remt viraal RNA-afhankelijk RNA-polymerase, waardoor de replicatie van SARS-CoV-2 wordt verminderd.

Indicaties: Gehospitaliseerde patiënten met bevestigde COVID-19 die extra zuurstof nodig h

## Test Query- Estonian

In [42]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Kokkuvõtke kogu dokument 200 sõnaga" # Estonian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:35,650 - Medical Chatbot G-Version - Query language: code=et (Estonian), conf≈1.00
[INFO] 2025-09-22 08:50:35,652 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:35,686 - Medical Chatbot G-Version - Token check: total=397, budget=128000, fits=True
[INFO] 2025-09-22 08:50:43,301 - Medical Chatbot G-Version - Language verify OK: expected=et, detected=et
[INFO] 2025-09-22 08:50:43,307 - Medical Chatbot G-Version - Language verify OK: expected=et, detected=et

=== Answer ===
See dokument on ravimiteave remdesiviri kohta, mis on viirusevastane ravim COVID-19 raviks. Peamised punktid:

- Remdesivir on nukleosiidi analoog RNA polümeraasi inhibiitor.
- See pärsib viiruse RNA-sõltuvat RNA polümeraasi, vähendades SARS-CoV-2 paljunemist.
- Näidustatud haiglaravil olevatele kinnitatud COVID-19 patsientidele, kes vajavad lisahapnikku, kuid ei ole invasiivsel ventilatsioonil.
- Annustamine: 200 mg IV esimesel päeval, seejärel 100 

## Test Query- Farsi / Persian

In [82]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "کل سند را خلاصه کنید" # Farsi / Persian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 09:08:40,784 - Medical Chatbot G-Version - Query language: code=fa (Farsi / Persian), conf≈1.00
[INFO] 2025-09-22 09:08:40,786 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 09:08:40,821 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 09:08:47,923 - Medical Chatbot G-Version - Language verify OK: expected=fa, detected=fa
[INFO] 2025-09-22 09:08:47,927 - Medical Chatbot G-Version - Language verify OK: expected=fa, detected=fa

=== Answer ===
این سند یک برگه اطلاعات دارویی درباره رمدسیویر است که برای درمان کووید-19 استفاده می‌شود. نکات اصلی عبارتند از:

- رمدسیویر یک داروی ضد ویروسی است که از تکثیر ویروس SARS-CoV-2 جلوگیری می‌کند.
- برای بیماران بستری مبتلا به کووید-19 که نیاز به اکسیژن دارند اما تحت تهویه تهاجمی نیستند، تجویز می‌شود.
- دوز اولیه 200 میلی‌گرم و سپس 100 میلی‌گرم روزانه به مدت 4 تا 9 روز است.
- عوارض جانبی شامل تهوع و افزایش آنزیم‌های کبدی است.
- نیاز به پایش عملکرد 

## Test Query- Finnish

In [44]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Tiivistä koko asiakirja 200 sanaan" # Finnish
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:52,572 - Medical Chatbot G-Version - Query language: code=fi (Finnish), conf≈1.00
[INFO] 2025-09-22 08:50:52,573 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:52,607 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 08:50:59,827 - Medical Chatbot G-Version - Language verify OK: expected=fi, detected=fi
[INFO] 2025-09-22 08:50:59,833 - Medical Chatbot G-Version - Language verify OK: expected=fi, detected=fi

=== Answer ===
Tämä asiakirja on lääketietosivu remdesiviristä, joka on antiviraalinen lääke COVID-19:n hoitoon. Remdesivir on nukleosidianalogi RNA-polymeraasin estäjä, joka vähentää SARS-CoV-2:n replikaatiota estämällä viruksen RNA-riippuvaista RNA-polymeraasia.

Lääke on tarkoitettu sairaalahoidossa oleville COVID-19-potilaille, jotka tarvitsevat lisähappea mutta eivät ole invasiivisessa ventilaatiossa. Annostus on 200 mg laskimonsisäisesti ensimmäisenä päivänä, 

## Test Query- French

In [45]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Résumez l'ensemble du document en 200 mots" # French
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:50:59,850 - Medical Chatbot G-Version - Query language: code=fr (French), conf≈1.00
[INFO] 2025-09-22 08:50:59,851 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:50:59,886 - Medical Chatbot G-Version - Token check: total=392, budget=128000, fits=True
[INFO] 2025-09-22 08:51:06,534 - Medical Chatbot G-Version - Language verify OK: expected=fr, detected=fr
[INFO] 2025-09-22 08:51:06,541 - Medical Chatbot G-Version - Language verify OK: expected=fr, detected=fr

=== Answer ===
Voici un résumé du document en français en environ 200 mots :

Cette fiche d'information sur le remdesivir, un antiviral utilisé dans le traitement du COVID-19, fournit des détails essentiels aux cliniciens. Le remdesivir est un inhibiteur de l'ARN polymérase ARN-dépendante qui réduit la réplication du SARS-CoV-2. 

Il est indiqué pour les patients hospitalisés atteints de COVID-19 confirmé nécessitant une oxygénothérapie mais pas de ventilation invasi
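
The test cells in this notebook differ only in the query string, so they could in principle be driven from one loop. The sketch below illustrates that idea under stated assumptions: `run_batch` and `_stub_qa` are hypothetical names, the stub stands in for `run_direct_document_qa` so the example runs on its own, and the call signature is copied from the cells above.

```python
from typing import Callable, Dict

# Queries copied from cells in this notebook; extend with the remaining languages.
QUERIES: Dict[str, str] = {
    "et": "Kokkuvõtke kogu dokument 200 sõnaga",
    "fi": "Tiivistä koko asiakirja 200 sanaan",
    "fr": "Résumez l'ensemble du document en 200 mots",
}


def run_batch(qa_fn: Callable[..., str], file_path: str, budget: int,
              session_id: str) -> Dict[str, str]:
    """Call qa_fn once per query and collect answers keyed by language code."""
    results: Dict[str, str] = {}
    for code, query in QUERIES.items():
        try:
            results[code] = qa_fn(file_path, query, budget, session_id=session_id)
        except Exception as exc:
            # Record the failure and keep going so one language cannot stop the batch.
            results[code] = f"ERROR: {exc}"
    return results


def _stub_qa(file_path, query, budget, session_id=None):
    """Hypothetical stand-in for run_direct_document_qa; returns a canned string."""
    return f"[stub answer for: {query}]"


batch = run_batch(_stub_qa, "Drug Information Sheet.pdf", 128000, "demo-session-1")
for code, answer in batch.items():
    print(code, "->", answer)
```

In the notebook itself, `_stub_qa` would be replaced by `run_direct_document_qa` and `128000` by `MODEL_CONTEXT_TOKENS`. If conversation history is enabled, a separate session ID per language may be preferable so that earlier answers do not influence later ones; that is an assumption about the history feature, not something the logs confirm.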

## Test Query- German

In [46]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Fassen Sie das gesamte Dokument in 200 Wörtern zusammen" # German
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:06,559 - Medical Chatbot G-Version - Query language: code=de (German), conf≈1.00
[INFO] 2025-09-22 08:51:06,562 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:06,596 - Medical Chatbot G-Version - Token check: total=396, budget=128000, fits=True
[INFO] 2025-09-22 08:51:14,349 - Medical Chatbot G-Version - Language verify OK: expected=de, detected=de
[INFO] 2025-09-22 08:51:14,354 - Medical Chatbot G-Version - Language verify OK: expected=de, detected=de

=== Answer ===
Dieses Arzneimittelinformationsblatt bietet Ärzten Details zu Remdesivir, einem antiviralen Medikament zur Behandlung von COVID-19. Remdesivir ist ein Nukleosidanalogon, das die virale RNA-abhängige RNA-Polymerase hemmt und so die Replikation von SARS-CoV-2 reduziert. Es ist indiziert für hospitalisierte COVID-19-Patienten, die zusätzlichen Sauerstoff benötigen, aber nicht invasiv beatmet werden.

Die Dosierung besteht aus einer Initialdosis von 200 mg
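
The `Token check` log lines report a total that changes with the query language even though the PDF is the same (for example 392 for French and 456 for Gujarati), presumably because some scripts split into more tokens. The following sketch shows the general shape of such a budget check. It is not the pipeline's implementation: `check_fit` and `whitespace_count` are hypothetical, the whitespace counter is only a stand-in for a real tokenizer, and the `<=` comparison is an assumption.

```python
from typing import Callable, Tuple


def check_fit(query: str, document_text: str, budget: int,
              count_tokens: Callable[[str], int]) -> Tuple[int, bool]:
    """Return (total_tokens, fits) for query + document against a token budget."""
    total = count_tokens(query) + count_tokens(document_text)
    # Assumption: a total exactly equal to the budget still counts as fitting.
    return total, total <= budget


def whitespace_count(text: str) -> int:
    """Crude stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


total, fits = check_fit(
    "Summarise the whole document",
    "Remdesivir is an antiviral agent.",
    128000,
    whitespace_count,
)
print(f"Token check: total={total}, budget=128000, fits={fits}")
```

Passing the counter in as a parameter keeps the check independent of any particular tokenizer, so the same function can be exercised in tests without model-specific dependencies.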

## Test Query- Greek

In [47]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Συνοψίστε ολόκληρο το έγγραφο σε 200 λέξεις" # Greek
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:14,368 - Medical Chatbot G-Version - Query language: code=el (Greek), conf≈1.00
[INFO] 2025-09-22 08:51:14,371 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:14,405 - Medical Chatbot G-Version - Token check: total=421, budget=128000, fits=True
[INFO] 2025-09-22 08:51:25,902 - Medical Chatbot G-Version - Language verify OK: expected=el, detected=el
[INFO] 2025-09-22 08:51:25,907 - Medical Chatbot G-Version - Language verify OK: expected=el, detected=el

=== Answer ===
Το έγγραφο αυτό είναι ένα φύλλο πληροφοριών φαρμάκου για το remdesivir, ένα αντιιικό φάρμακο για τη θεραπεία της COVID-19. Το remdesivir ανήκει στην κατηγορία των νουκλεοσιδικών αναλόγων που αναστέλλουν την RNA πολυμεράση. Ο μηχανισμός δράσης του είναι η αναστολή της ιικής RNA-εξαρτώμενης RNA πολυμεράσης, μειώνοντας έτσι την αναπαραγωγή του SARS-CoV-2.

Ενδείκνυται για νοσηλευόμενους ασθενείς με επιβεβαιωμένη COVID-19 που χρειάζονται συμπληρωματικό οξυγ

## Test Query- Gujarati

In [48]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "સંપૂર્ણ દસ્તાવેજને 200 શબ્દોમાં સંક્ષિપ્ત કરો" # Gujarati
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:25,923 - Medical Chatbot G-Version - Query language: code=gu (Gujarati), conf≈1.00
[INFO] 2025-09-22 08:51:25,924 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:25,957 - Medical Chatbot G-Version - Token check: total=456, budget=128000, fits=True
[INFO] 2025-09-22 08:51:36,512 - Medical Chatbot G-Version - Language verify OK: expected=gu, detected=gu
[INFO] 2025-09-22 08:51:36,516 - Medical Chatbot G-Version - Language verify OK: expected=gu, detected=gu

=== Answer ===
આ દસ્તાવેજ રેમડેસિવિર નામની એન્ટિવાયરલ દવા વિશે માહિતી આપે છે. તે COVID-19ના સારવાર માટે વપરાય છે. રેમડેસિવિર વાયરલ RNA પોલીમરેઝને રોકીને SARS-CoV-2નું પ્રજનન ઘટાડે છે. તે હોસ્પિટલમાં દાખલ થયેલા COVID-19ના દર્દીઓ માટે વપરાય છે જેમને ઓક્સિજનની જરૂર હોય પણ વેન્ટિલેટર પર ન હોય. 

પ્રથમ દિવસે 200 mg IV અને પછીના 4-9 દિવસ સુધી રોજ 100 mg IV આપવામાં આવે છે. આડઅસરોમાં ઉબકા અને લિવરના એન્ઝાઇમ્સમાં વધારો થઈ શકે છે. ભાગ્યે જ એલર્જીક પ્રતિક્રિયા અને કિડનીની સમસ

## Test Query- Hebrew

In [49]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "סכם את כל המסמך ב-200 מילים" # Hebrew
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:36,530 - Medical Chatbot G-Version - Query language: code=he (Hebrew), conf≈1.00
[INFO] 2025-09-22 08:51:36,532 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:36,564 - Medical Chatbot G-Version - Token check: total=406, budget=128000, fits=True
[INFO] 2025-09-22 08:51:47,811 - Medical Chatbot G-Version - Language verify OK: expected=he, detected=he
[INFO] 2025-09-22 08:51:47,814 - Medical Chatbot G-Version - Language verify OK: expected=he, detected=he

=== Answer ===
מסמך זה הוא דף מידע תרופתי על רמדסיביר, תרופה אנטי-ויראלית לטיפול ב-COVID-19. הוא מספק מידע קליני מפורט לרופאים.

רמדסיביר הוא מעכב RNA פולימראז מסוג אנלוג נוקלאוזיד. הוא פועל על ידי עיכוב ה-RNA פולימראז התלוי ב-RNA של הנגיף, ובכך מפחית את שכפול SARS-CoV-2.

התרופה מיועדת לחולים מאושפזים עם COVID-19 מאומת הזקוקים לחמצן נוסף, אך לא על הנשמה פולשנית.

המינון כולל מנת העמסה של 200 מ"ג ביום הראשון, ולאחר מכן 100 מ"ג ליום למשך 4-9 ימים.

תופעות לוואי אפשריו

## Test Query- Hindi

In [50]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "पूरे दस्तावेज़ को 200 शब्दों में संक्षेप करें" # Hindi
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:47,830 - Medical Chatbot G-Version - Query language: code=hi (Hindi), conf≈1.00
[INFO] 2025-09-22 08:51:47,831 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:47,862 - Medical Chatbot G-Version - Token check: total=422, budget=128000, fits=True
[INFO] 2025-09-22 08:51:59,521 - Medical Chatbot G-Version - Language verify OK: expected=hi, detected=hi
[INFO] 2025-09-22 08:51:59,526 - Medical Chatbot G-Version - Language verify OK: expected=hi, detected=hi

=== Answer ===
यह दस्तावेज़ रेमडेसिविर नामक एंटीवायरल दवा के बारे में जानकारी प्रदान करता है, जो COVID-19 के इलाज के लिए उपयोग की जाती है। इसमें निम्नलिखित मुख्य बिंदु शामिल हैं:

- रेमडेसिविर एक न्यूक्लियोसाइड एनालॉग RNA पॉलीमरेज़ अवरोधक है।
- यह SARS-CoV-2 वायरस की प्रतिकृति को कम करता है।
- यह उन अस्पताल में भर्ती COVID-19 रोगियों के लिए निर्देशित है जिन्हें ऑक्सीजन की आवश्यकता है लेकिन वेंटिलेटर पर नहीं हैं।
- खुराक: पहले दिन 200 mg IV लोडिंग डोज, उसके बाद 4-9 दिन

## Test Query- Hungarian

In [51]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Foglalja össze az egész dokumentumot 200 szóban" # Hungarian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:51:59,541 - Medical Chatbot G-Version - Query language: code=hu (Hungarian), conf≈1.00
[INFO] 2025-09-22 08:51:59,542 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:51:59,575 - Medical Chatbot G-Version - Token check: total=398, budget=128000, fits=True
[INFO] 2025-09-22 08:52:07,451 - Medical Chatbot G-Version - Language verify OK: expected=hu, detected=hu
[INFO] 2025-09-22 08:52:07,455 - Medical Chatbot G-Version - Language verify OK: expected=hu, detected=hu

=== Answer ===
Ez a dokumentum egy gyógyszerinformációs lap a remdesivirről, amely egy antivirális szer a COVID-19 kezelésére. A fő pontok:

- A remdesivir egy nukleozid-analóg RNS-polimeráz-gátló, amely gátolja a SARS-CoV-2 vírus replikációját.

- Javallt kórházban kezelt, igazolt COVID-19 fertőzött betegeknek, akik kiegészítő oxigént igényelnek, de nincsenek invazív lélegeztetésen.

- Adagolás: 200 mg IV telítő dózis az 1. napon, majd 100 mg IV naponta 4-9 napig

## Test Query- Indonesian

In [52]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Ringkas seluruh dokumen dalam 200 kata" # Indonesian 
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:07,473 - Medical Chatbot G-Version - Query language: code=id (Indonesian / Malay), conf≈1.00
[INFO] 2025-09-22 08:52:07,474 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:07,620 - Medical Chatbot G-Version - Token check: total=391, budget=128000, fits=True
[INFO] 2025-09-22 08:52:14,235 - Medical Chatbot G-Version - Language verify OK: expected=id, detected=id
[INFO] 2025-09-22 08:52:14,241 - Medical Chatbot G-Version - Language verify OK: expected=id, detected=id

=== Answer ===
Dokumen ini adalah lembar informasi obat tentang remdesivir, sebuah agen antivirus untuk pengobatan COVID-19. Remdesivir adalah penghambat RNA polimerase analog nukleosida yang bekerja dengan menghambat RNA polimerase RNA-dependen virus, mengurangi replikasi SARS-CoV-2. 

Indikasinya adalah untuk pasien rawat inap dengan COVID-19 yang membutuhkan oksigen tambahan tetapi tidak menggunakan ventilasi invasif. Dosis muatannya adalah 200 mg IV p

## Test Query- Italian

In [53]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Riassumi l'intero documento in 200 parole" # Italian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:14,258 - Medical Chatbot G-Version - Query language: code=it (Italian), conf≈1.00
[INFO] 2025-09-22 08:52:14,260 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:14,291 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:52:20,513 - Medical Chatbot G-Version - Language verify OK: expected=it, detected=it
[INFO] 2025-09-22 08:52:20,518 - Medical Chatbot G-Version - Language verify OK: expected=it, detected=it

=== Answer ===
Questo documento è una scheda informativa sul farmaco remdesivir, un agente antivirale utilizzato nel trattamento del COVID-19. Il remdesivir è un inibitore della RNA polimerasi RNA-dipendente che riduce la replicazione del virus SARS-CoV-2. È indicato per pazienti ospedalizzati con COVID-19 confermato che richiedono ossigeno supplementare ma non ventilazione invasiva.

Il dosaggio prevede una dose di carico di 200 mg per via endovenosa il primo giorn
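
The timestamps in the log lines make it possible to estimate how long each run spent between the token check and the first language verification, which is roughly the model call plus any post-processing. A small sketch of that calculation follows, using the two Italian log lines above as input. The helper names `parse_ts` and `elapsed_seconds` are hypothetical, and the timestamp layout is assumed to match the logs in this notebook.

```python
import re
from datetime import datetime

# Matches the leading "[LEVEL] YYYY-MM-DD HH:MM:SS,mmm" part of a log line.
TS_RE = re.compile(r"^\[\w+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_ts(line: str) -> datetime:
    """Extract the timestamp from one log line."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError(f"No timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


start = ("[INFO] 2025-09-22 08:52:14,291 - Medical Chatbot G-Version - "
         "Token check: total=393, budget=128000, fits=True")
end = ("[INFO] 2025-09-22 08:52:20,513 - Medical Chatbot G-Version - "
       "Language verify OK: expected=it, detected=it")

elapsed = elapsed_seconds(start, end)
print(f"Elapsed: {elapsed:.3f} s")
```

Applied to every run, this gives a rough per-language latency comparison; the figure is only an approximation because the logs do not mark the model call boundaries explicitly.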

## Test Query- Japanese

In [54]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "文書全体を200語で要約してください" # Japanese
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:20,533 - Medical Chatbot G-Version - Query language: code=ja (Japanese), conf≈1.00
[INFO] 2025-09-22 08:52:20,534 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:20,567 - Medical Chatbot G-Version - Token check: total=395, budget=128000, fits=True
[INFO] 2025-09-22 08:52:27,605 - Medical Chatbot G-Version - Language verify OK: expected=ja, detected=ja
[INFO] 2025-09-22 08:52:27,609 - Medical Chatbot G-Version - Language verify OK: expected=ja, detected=ja

=== Answer ===
この文書は、COVID-19治療に使用される抗ウイルス薬レムデシビルに関する薬剤情報シートです。

レムデシビルは、RNA依存性RNAポリメラーゼを阻害してSARS-CoV-2の複製を減少させる核酸アナログです。

適応は、補助酸素を必要とするが侵襲的人工呼吸器を使用していないCOVID-19入院患者です。

投与量は、初日に200 mg静脈内投与し、その後4〜9日間100 mg/日を投与します。

副作用には悪心や肝酵素上昇があり、まれに過敏反応や腎機能障害が起こる可能性があります。

治療中は肝機能と腎機能のモニタリングが必要です。

重度の腎機能障害(eGFR <30 mL/min)やレムデシビルに対する過敏症がある患者には禁忌です。

この情報シートは、臨床医にレムデシビルの作用機序、適応、投与量、副作用、モニタリング要件に関する詳細を提供することを目的としています。


## Test Query- Kannada

In [55]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "ಪೂರ್ಣ ದಾಖಲೆವನ್ನು 200 ಪದಗಳಲ್ಲಿ ಸಂಕ್ಷಿಪ್ತಗೊಳಿಸಿ" # Kannada
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:27,626 - Medical Chatbot G-Version - Query language: code=kn (Kannada), conf≈1.00
[INFO] 2025-09-22 08:52:27,627 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:27,663 - Medical Chatbot G-Version - Token check: total=458, budget=128000, fits=True
[INFO] 2025-09-22 08:52:45,596 - Medical Chatbot G-Version - Language verify OK: expected=kn, detected=kn
[INFO] 2025-09-22 08:52:45,599 - Medical Chatbot G-Version - Language verify OK: expected=kn, detected=kn

=== Answer ===
ಈ ಔಷಧ ಮಾಹಿತಿ ಪತ್ರವು ರೆಮ್ಡೆಸಿವಿರ್ ಬಗ್ಗೆ ವಿವರಗಳನ್ನು ನೀಡುತ್ತದೆ. ಇದು ಕೋವಿಡ್-19 ಚಿಕಿತ್ಸೆಗಾಗಿ ಬಳಸುವ ಆಂಟಿವೈರಲ್ ಔಷಧವಾಗಿದೆ. ಇದು ವೈರಸ್ ಆರ್ಎನ್ಎ ಪಾಲಿಮರೇಸ್ ಅನ್ನು ತಡೆಯುತ್ತದೆ. ಆಮ್ಲಜನಕ ಅಗತ್ಯವಿರುವ ಆಸ್ಪತ್ರೆಯಲ್ಲಿ ದಾಖಲಾದ ಕೋವಿಡ್-19 ರೋಗಿಗಳಿಗೆ ಇದನ್ನು ಬಳಸಲಾಗುತ್ತದೆ. ಮೊದಲ ದಿನ 200 ಮಿಗ್ರಾಂ ಮತ್ತು ನಂತರ 4-9 ದಿನಗಳವರೆಗೆ ದಿನಕ್ಕೆ 100 ಮಿಗ್ರಾಂ ನೀಡಲಾಗುತ್ತದೆ. ವಾಕರಿಕೆ ಮತ್ತು ಯಕೃತ್ತಿನ ಎನ್ಜೈಮ್ ಹೆಚ್ಚಳ ಸಾಮಾನ್ಯ ಅಡ್ಡಪರಿಣಾಮಗಳಾಗಿವೆ. ಚಿಕಿತ್ಸೆಯ ಮೊದಲು ಮತ್ತು ನಂತರ ಯಕೃತ್ತು ಮತ್ತು ಮೂತ್ರಪಿಂಡದ ಕಾರ್ಯಗಳನ್ನು ಪರೀ

## Test Query- Korean

In [56]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "전체 문서를 200단어로 요약하세요" # Korean
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:45,617 - Medical Chatbot G-Version - Query language: code=ko (Korean), conf≈1.00
[INFO] 2025-09-22 08:52:45,618 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:45,652 - Medical Chatbot G-Version - Token check: total=395, budget=128000, fits=True
[INFO] 2025-09-22 08:52:53,578 - Medical Chatbot G-Version - Language verify OK: expected=ko, detected=ko
[INFO] 2025-09-22 08:52:53,580 - Medical Chatbot G-Version - Language verify OK: expected=ko, detected=ko

=== Answer ===
이 문서는 COVID-19 치료를 위한 항바이러스제인 렘데시비르에 대한 약물 정보를 제공합니다. 렘데시비르는 RNA 의존성 RNA 중합효소를 억제하여 SARS-CoV-2의 복제를 감소시키는 뉴클레오시드 유사체입니다. 

주요 적응증은 보조 산소가 필요하지만 침습적 인공호흡기를 사용하지 않는 입원 COVID-19 환자입니다. 

용법은 첫날 200mg 정맥 주사 후 4-9일간 100mg 정맥 주사를 매일 유지합니다. 

주요 부작용으로는 구역과 트랜스아미나제 상승이 있으며, 드물게 과민반응과 신기능 장애가 발생할 수 있습니다. 

치료 중 간 기능과 신장 기능 검사를 주기적으로 모니터링해야 합니다. 

중증 신장애 환자(eGFR <30 mL/min)와 렘데시비르에 과민반응이 있는 환자에게는 금기입니다. 

이 정보는 임상의들이 렘데시비르의 기전, 적응증, 용량, 부작용 및 모니터링 요구사항을 이해하는 데 도

## Test Query- Latvian

In [57]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Apkopojiet visu dokumentu 200 vārdos" # Latvian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:52:53,600 - Medical Chatbot G-Version - Query language: code=lv (Latvian), conf≈1.00
[INFO] 2025-09-22 08:52:53,601 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:52:53,634 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 08:53:02,601 - Medical Chatbot G-Version - Language verify OK: expected=lv, detected=lv
[INFO] 2025-09-22 08:53:02,606 - Medical Chatbot G-Version - Language verify OK: expected=lv, detected=lv

=== Answer ===
Šis dokuments ir zāļu informācijas lapa par remdesiviru, kas ir pretvīrusu līdzeklis COVID-19 ārstēšanai. Tas sniedz klīnicistiem detalizētu informāciju par remdesivira mehānismu, indikācijām, dozēšanu, blakusparādībām un uzraudzības prasībām.

Remdesivirs ir nukleozīdu analogs, kas inhibē vīrusu RNS-atkarīgo RNS polimerāzi, samazinot SARS-CoV-2 replikāciju. Tas ir indicēts hospitalizētiem pacientiem ar apstiprinātu COVID-19, kuriem nepieciešams papil

## Test Query- Lithuanian

In [58]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Apibendrinkite visą dokumentą 200 žodžiais" # Lithuanian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:02,623 - Medical Chatbot G-Version - Query language: code=lt (Lithuanian), conf≈1.00
[INFO] 2025-09-22 08:53:02,625 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:02,658 - Medical Chatbot G-Version - Token check: total=395, budget=128000, fits=True
[INFO] 2025-09-22 08:53:10,070 - Medical Chatbot G-Version - Language verify OK: expected=lt, detected=lt
[INFO] 2025-09-22 08:53:10,075 - Medical Chatbot G-Version - Language verify OK: expected=lt, detected=lt

=== Answer ===
Šis dokumentas yra vaisto informacinis lapas apie remdesivirą, antivirusinį vaistą, skirtą gydyti COVID-19. Jame pateikiama informacija apie vaisto klasę, veikimo mechanizmą, indikacijas, dozavimą, šalutinius poveikius ir stebėjimo reikalavimus.

Remdesiviras yra nukleozidų analogas, slopinantis viruso RNR polimerazę ir mažinantis SARS-CoV-2 replikaciją. Jis skiriamas hospitalizuotiems COVID-19 pacientams, kuriems reikia papildomo deguonies, bet nė

## Test Query- Macedonian

In [59]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Сумирајте го целиот документ во 200 зборови" # Macedonian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:10,090 - Medical Chatbot G-Version - Query language: code=mk (Macedonian), conf≈1.00
[INFO] 2025-09-22 08:53:10,092 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:10,126 - Medical Chatbot G-Version - Token check: total=403, budget=128000, fits=True
[INFO] 2025-09-22 08:53:18,782 - Medical Chatbot G-Version - Language verify OK: expected=mk, detected=mk
[INFO] 2025-09-22 08:53:18,788 - Medical Chatbot G-Version - Language verify OK: expected=mk, detected=mk

=== Answer ===
Овој документ е информативен лист за лекот ремдесивир, антивирусен агенс за третман на КОВИД-19. Тој содржи клучни информации за клиничарите, вклучувајќи:

Класа: Антивирусен нуклеозиден аналог инхибитор на РНК полимераза.

Механизам на дејство: Инхибира вирусната РНК-зависна РНК полимераза, намалувајќи ја репликацијата на САРС-КоВ-2.

Индикации: Хоспитализирани пациенти со потврден КОВИД-19 кои бараат дополнителен кислород, но не се на инвазивна в

## Test Query- Malayalam

In [60]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "മുഴുവൻ രേഖയും 200 വാക്കുകളിൽ സംഗ്രഹിക്കുക" # Malayalam
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:18,805 - Medical Chatbot G-Version - Query language: code=ml (Malayalam), conf≈1.00
[INFO] 2025-09-22 08:53:18,806 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:18,838 - Medical Chatbot G-Version - Token check: total=447, budget=128000, fits=True
[INFO] 2025-09-22 08:53:33,355 - Medical Chatbot G-Version - Language verify OK: expected=ml, detected=ml
[INFO] 2025-09-22 08:53:33,359 - Medical Chatbot G-Version - Language verify OK: expected=ml, detected=ml

=== Answer ===
ഈ രേഖ റെംഡെസിവിർ എന്ന കോവിഡ്-19 ചികിത്സയ്ക്കുള്ള ആന്റിവൈറൽ മരുന്നിനെക്കുറിച്ചുള്ള വിവരങ്ങൾ നൽകുന്നു. ഇത് ഒരു ന്യൂക്ലിയോസൈഡ് അനലോഗ് RNA പോളിമറേസ് ഇൻഹിബിറ്റർ ആണ്. SARS-CoV-2 വൈറസിന്റെ പ്രതിലിപി നിർമ്മാണത്തെ തടയുന്നു. 

ഓക്സിജൻ ആവശ്യമുള്ള, എന്നാൽ വെന്റിലേറ്റർ ഉപയോഗിക്കാത്ത കോവിഡ് രോഗികൾക്കാണ് ഇത് നൽകുന്നത്. ആദ്യ ദിവസം 200 mg IV ഡോസ് നൽകി, തുടർന്ന് 4-9 ദിവസം 100 mg വീതം നൽകുന്നു. 

പാർശ്വഫലങ്ങളിൽ ഓക്കാനം, ട്രാൻസാമിനേസ് വർധന എന്നിവ ഉൾപ്പെടുന്നു. അപൂർവമായ

## Test Query- Marathi

In [61]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "संपूर्ण दस्तऐवज 200 शब्दांमध्ये संक्षेप करा" # Marathi
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:33,379 - Medical Chatbot G-Version - Query language: code=mr (Marathi), conf≈1.00
[INFO] 2025-09-22 08:53:33,381 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:33,413 - Medical Chatbot G-Version - Token check: total=425, budget=128000, fits=True
[INFO] 2025-09-22 08:53:45,054 - Medical Chatbot G-Version - Language verify OK: expected=mr, detected=mr
[INFO] 2025-09-22 08:53:45,061 - Medical Chatbot G-Version - Language verify OK: expected=mr, detected=mr

=== Answer ===
हा दस्तऐवज रेमडेसिविर या कोविड-19 साठीच्या अँटीव्हायरल औषधाबद्दल माहिती देतो. त्यात खालील मुद्दे आहेत:

- रेमडेसिविर हे RNA पॉलिमरेज इनहिबिटर आहे जे SARS-CoV-2 चे प्रतिकृतीकरण कमी करते.
- हे रुग्णालयात दाखल झालेल्या, पुरवणी ऑक्सिजन आवश्यक असलेल्या कोविड-19 रुग्णांसाठी वापरले जाते.
- मात्रा: पहिल्या दिवशी 200 mg IV लोडिंग डोस, नंतर दररोज 100 mg IV 4 ते 9 दिवस.
- दुष्परिणाम: मळमळ, ट्रान्सअमिनेस वाढ; क्वचित अतिसंवेदनशीलता, मूत्रपिंड बिघाड.
- यकृत आणि मूत

## Test Query- Norwegian

In [62]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Oppsummer hele dokumentet på 200 ord" # Norwegian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:45,083 - Medical Chatbot G-Version - Query language: code=no (Norwegian), conf≈1.00
[INFO] 2025-09-22 08:53:45,086 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:45,120 - Medical Chatbot G-Version - Token check: total=389, budget=128000, fits=True
[INFO] 2025-09-22 08:53:52,863 - Medical Chatbot G-Version - Language verify OK: expected=no, detected=no
[INFO] 2025-09-22 08:53:52,875 - Medical Chatbot G-Version - Language verify OK: expected=no, detected=no

=== Answer ===
Dette dokumentet er et legemiddelinformasjonsark om remdesivir, et antiviralt middel for behandling av COVID-19. Det gir klinisk informasjon om legemiddelets klasse, virkningsmekanisme, indikasjoner, dosering, bivirkninger og overvåkningskrav.

Remdesivir er klassifisert som et nukleosidanalog RNA-polymerasehemmer. Det virker ved å hemme viral RNA-avhengig RNA-polymerase, noe som reduserer replikasjonen av SARS-CoV-2.

Indikasjonen er for hospitalis

## Test Query- Polish

In [63]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Streść cały dokument w 200 słowach" # Polish
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:52,891 - Medical Chatbot G-Version - Query language: code=pl (Polish), conf≈1.00
[INFO] 2025-09-22 08:53:52,892 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:52,926 - Medical Chatbot G-Version - Token check: total=392, budget=128000, fits=True
[INFO] 2025-09-22 08:53:59,740 - Medical Chatbot G-Version - Language verify OK: expected=pl, detected=pl
[INFO] 2025-09-22 08:53:59,745 - Medical Chatbot G-Version - Language verify OK: expected=pl, detected=pl

=== Answer ===
Dokument przedstawia informacje o leku remdesivir stosowanym w leczeniu COVID-19. Jest to lek przeciwwirusowy z grupy analogów nukleozydów, który hamuje wirusową polimerazę RNA zależną od RNA, zmniejszając replikację SARS-CoV-2. 

Wskazaniem do stosowania remdesiviru są hospitalizowani pacjenci z potwierdzonym COVID-19 wymagający tlenoterapii, ale nie będący na inwazyjnej wentylacji. 

Dawkowanie obejmuje dawkę nasycającą 200 mg dożylnie w pierwszym dn

## Test Query- Portuguese

In [64]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Resuma todo o documento em 200 palavras" # Portuguese (Brazil & Portugal)
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:53:59,760 - Medical Chatbot G-Version - Query language: code=pt (Portuguese), conf≈1.00
[INFO] 2025-09-22 08:53:59,761 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:53:59,795 - Medical Chatbot G-Version - Token check: total=390, budget=128000, fits=True
[INFO] 2025-09-22 08:54:05,873 - Medical Chatbot G-Version - Language verify OK: expected=pt, detected=pt
[INFO] 2025-09-22 08:54:05,880 - Medical Chatbot G-Version - Language verify OK: expected=pt, detected=pt

=== Answer ===
Este documento é uma ficha de informação sobre o medicamento remdesivir, um agente antiviral usado no tratamento da COVID-19. O remdesivir é um inibidor da RNA polimerase que reduz a replicação do SARS-CoV-2. É indicado para pacientes hospitalizados com COVID-19 confirmada que necessitam de oxigênio suplementar, mas não estão em ventilação invasiva.

A dosagem recomendada é uma dose de ataque de 200 mg IV no primeiro dia, seguida de uma dose de man

## Test Query- Punjabi

In [65]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "ਪੂਰੇ ਦਸਤਾਵੇਜ਼ ਨੂੰ 200 ਸ਼ਬਦਾਂ ਵਿੱਚ ਸੰਖੇਪ ਕਰੋ" # Punjabi
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:05,895 - Medical Chatbot G-Version - Query language: code=pa (Punjabi), conf≈1.00
[INFO] 2025-09-22 08:54:05,897 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:05,931 - Medical Chatbot G-Version - Token check: total=448, budget=128000, fits=True
[INFO] 2025-09-22 08:54:25,195 - Medical Chatbot G-Version - Language verify OK: expected=pa, detected=pa
[INFO] 2025-09-22 08:54:25,199 - Medical Chatbot G-Version - Language verify OK: expected=pa, detected=pa

=== Answer ===
ਇਹ ਦਸਤਾਵੇਜ਼ ਰੇਮਡੇਸੀਵੀਰ ਬਾਰੇ ਜਾਣਕਾਰੀ ਦਿੰਦਾ ਹੈ, ਜੋ ਕਿ COVID-19 ਦੇ ਇਲਾਜ ਲਈ ਇੱਕ ਐਂਟੀਵਾਇਰਲ ਦਵਾਈ ਹੈ। ਇਹ RNA ਪੋਲੀਮਰੇਜ਼ ਨੂੰ ਰੋਕ ਕੇ ਵਾਇਰਸ ਦੇ ਵਾਧੇ ਨੂੰ ਘਟਾਉਂਦੀ ਹੈ। ਇਹ ਹਸਪਤਾਲ ਵਿੱਚ ਦਾਖਲ COVID-19 ਮਰੀਜ਼ਾਂ ਲਈ ਵਰਤੀ ਜਾਂਦੀ ਹੈ ਜਿਨ੍ਹਾਂ ਨੂੰ ਆਕਸੀਜਨ ਦੀ ਲੋੜ ਹੈ ਪਰ ਵੈਂਟੀਲੇਟਰ 'ਤੇ ਨਹੀਂ ਹਨ। 

ਖੁਰਾਕ ਪਹਿਲੇ ਦਿਨ 200 mg IV ਅਤੇ ਫਿਰ 4-9 ਦਿਨਾਂ ਲਈ 100 mg IV ਰੋਜ਼ਾਨਾ ਹੈ। ਮਾੜੇ ਪ੍ਰਭਾਵਾਂ ਵਿੱਚ ਮਤਲੀ ਅਤੇ ਲੀਵਰ ਐਨਜ਼ਾਈਮਾਂ ਵਿੱਚ ਵਾਧਾ ਸ਼ਾਮਲ ਹਨ। ਗੁਰਦੇ ਦੀ ਗੰਭੀਰ ਕਮਜ਼ੋਰੀ ਵਾਲੇ ਮਰੀਜ਼ਾਂ ਲਈ ਇਹ ਮਨ੍ਹਾ ਹੈ। 

ਇਲਾਜ 

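The log timestamps give a rough view of how long each run takes: the gap between the `Token check` line and the first `Language verify OK` line is about 19 s for this Punjabi query, against roughly 6 s for the Portuguese one. The sketch below computes that gap from captured log text. It assumes the log format shown in this notebook, `model_latency_seconds` is a hypothetical helper name, and the gap covers whatever the pipeline does between those two log points (presumably dominated by the model call).

```python
import re
from datetime import datetime
from typing import Optional

# Timestamp and message of a log line in the format used above.
STAMP = re.compile(r"\[INFO\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - .* - (.*)")


def model_latency_seconds(log_text: str) -> Optional[float]:
    """Seconds between 'Token check' and the first later 'Language verify' line."""
    start = end = None
    for line in log_text.splitlines():
        match = STAMP.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        message = match.group(2)
        if message.startswith("Token check") and start is None:
            start = stamp
        elif message.startswith("Language verify") and start is not None and end is None:
            end = stamp
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Log lines copied from the Punjabi run above.
PUNJABI_LOG = """\
[INFO] 2025-09-22 08:54:05,931 - Medical Chatbot G-Version - Token check: total=448, budget=128000, fits=True
[INFO] 2025-09-22 08:54:25,195 - Medical Chatbot G-Version - Language verify OK: expected=pa, detected=pa
[INFO] 2025-09-22 08:54:25,199 - Medical Chatbot G-Version - Language verify OK: expected=pa, detected=pa
"""

print(f"{model_latency_seconds(PUNJABI_LOG):.3f}")  # 19.264
```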
## Test Query- Romanian

In [66]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Rezumați întregul document în 200 de cuvinte" # Romanian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:25,214 - Medical Chatbot G-Version - Query language: code=ro (Romanian), conf≈1.00
[INFO] 2025-09-22 08:54:25,215 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:25,248 - Medical Chatbot G-Version - Token check: total=395, budget=128000, fits=True
[INFO] 2025-09-22 08:54:32,566 - Medical Chatbot G-Version - Language verify OK: expected=ro, detected=ro
[INFO] 2025-09-22 08:54:32,574 - Medical Chatbot G-Version - Language verify OK: expected=ro, detected=ro

=== Answer ===
Acest document este o fișă informativă despre medicamentul remdesivir, un agent antiviral utilizat în tratamentul COVID-19. Documentul include următoarele informații:

- Remdesivir face parte din clasa antiviralelor, fiind un inhibitor al ARN polimerazei.
- Mecanismul său de acțiune constă în inhibarea ARN polimerazei virale, reducând replicarea SARS-CoV-2.
- Este indicat pentru pacienții spitalizați cu COVID-19 confirmat, care necesită oxigen suplim

## Test Query- Russian

In [67]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Суммируйте весь документ в 200 словах" # Russian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:32,591 - Medical Chatbot G-Version - Query language: code=ru (Russian), conf≈1.00
[INFO] 2025-09-22 08:54:32,595 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:32,629 - Medical Chatbot G-Version - Token check: total=396, budget=128000, fits=True
[INFO] 2025-09-22 08:54:40,299 - Medical Chatbot G-Version - Language verify OK: expected=ru, detected=ru
[INFO] 2025-09-22 08:54:40,305 - Medical Chatbot G-Version - Language verify OK: expected=ru, detected=ru

=== Answer ===
Этот документ представляет собой информационный лист о препарате ремдесивир для лечения COVID-19. Ремдесивир - это противовирусный препарат класса нуклеозидных аналогов, ингибирующий РНК-зависимую РНК-полимеразу вируса SARS-CoV-2. Он показан для госпитализированных пациентов с подтвержденным COVID-19, требующих дополнительного кислорода, но не находящихся на инвазивной вентиляции легких. 

Схема дозирования включает нагрузочную дозу 200 мг внутривенно

## Test Query- Slovak

In [68]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Zhrňte celý dokument do 200 slov" # Slovak
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:40,332 - Medical Chatbot G-Version - Query language: code=sk (Slovak), conf≈1.00
[INFO] 2025-09-22 08:54:40,333 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:40,367 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:54:47,807 - Medical Chatbot G-Version - Language verify OK: expected=sk, detected=sk
[INFO] 2025-09-22 08:54:47,814 - Medical Chatbot G-Version - Language verify OK: expected=sk, detected=sk

=== Answer ===
Tento dokument je informačný list o lieku remdesivir, ktorý sa používa na liečbu COVID-19. Remdesivir je antivirotikum, ktoré inhibuje vírusovú RNA-dependentnú RNA polymerázu, čím znižuje replikáciu SARS-CoV-2. Je indikovaný pre hospitalizovaných pacientov s potvrdeným COVID-19, ktorí vyžadujú doplnkový kyslík, ale nie sú na invazívnej ventilácii. 

Dávkovanie zahŕňa úvodnú dávku 200 mg intravenózne v prvý deň, po ktorej nasleduje udržiavacia dávka 100

## Test Query- Slovene

In [69]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Povzemite celoten dokument v 200 besedah" # Slovene
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:47,851 - Medical Chatbot G-Version - Query language: code=sl (Slovene), conf≈0.82
[INFO] 2025-09-22 08:54:47,852 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:47,884 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:54:55,112 - Medical Chatbot G-Version - Language verify OK: expected=sl, detected=sl
[INFO] 2025-09-22 08:54:55,121 - Medical Chatbot G-Version - Language verify OK: expected=sl, detected=sl

=== Answer ===
Ta dokument je informacijski list o zdravilu remdesivir, protivirusnem zdravilu za zdravljenje COVID-19. Vsebuje naslednje ključne informacije:

- Remdesivir je nukleozidni analog, ki zavira virusno RNA-odvisno RNA polimerazo in tako zmanjšuje replikacijo SARS-CoV-2.

- Indiciran je za hospitalizirane bolnike s potrjenim COVID-19, ki potrebujejo dodatni kisik, vendar niso na invazivni ventilaciji.

- Odmerjanje: začetni odmerek 200 mg intravensko prvi

## Test Query- Spanish

In [70]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Resuma todo el documento en 200 palabras" # Spanish (incl. Mexico)
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:54:55,142 - Medical Chatbot G-Version - Query language: code=es (Spanish), conf≈0.86
[INFO] 2025-09-22 08:54:55,144 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:54:55,178 - Medical Chatbot G-Version - Token check: total=389, budget=128000, fits=True
[INFO] 2025-09-22 08:55:01,663 - Medical Chatbot G-Version - Language verify OK: expected=es, detected=es
[INFO] 2025-09-22 08:55:01,671 - Medical Chatbot G-Version - Language verify OK: expected=es, detected=es

=== Answer ===
Este documento es una hoja informativa sobre el remdesivir, un medicamento antiviral utilizado para tratar el COVID-19. Proporciona información clave para los médicos sobre:

- Tipo de medicamento: Antiviral inhibidor de la polimerasa de ARN
- Mecanismo de acción: Inhibe la replicación del SARS-CoV-2
- Indicaciones: Pacientes hospitalizados con COVID-19 que requieren oxígeno suplementario pero no ventilación invasiva
- Dosificación: 200 mg IV el día 1

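Most queries in this section are detected with conf≈1.00, but the Slovene and Spanish queries come in at 0.82 and 0.86. A short script can pull those values out of the captured logs and flag the weaker detections for review. This is a sketch, assuming the `Query language` line format shown above. `low_confidence` is a hypothetical helper, and the 0.90 threshold is an arbitrary illustration value, not a setting taken from the pipeline.

```python
import re
from typing import List, Tuple

# Matches the "Query language" lines in the logs above.
LANG_LINE = re.compile(r"Query language: code=([\w-]+) \(([^)]*)\), conf≈([0-9.]+)")


def low_confidence(log_text: str, threshold: float = 0.90) -> List[Tuple[str, str, float]]:
    """Return (code, name, confidence) for detections below the threshold."""
    flagged = []
    for code, name, conf in LANG_LINE.findall(log_text):
        if float(conf) < threshold:
            flagged.append((code, name, float(conf)))
    return flagged


# Log lines copied from the Slovak, Slovene, and Spanish runs above.
SAMPLE_LOG = """\
[INFO] 2025-09-22 08:54:40,332 - Medical Chatbot G-Version - Query language: code=sk (Slovak), conf≈1.00
[INFO] 2025-09-22 08:54:47,851 - Medical Chatbot G-Version - Query language: code=sl (Slovene), conf≈0.82
[INFO] 2025-09-22 08:54:55,142 - Medical Chatbot G-Version - Query language: code=es (Spanish), conf≈0.86
"""

print(low_confidence(SAMPLE_LOG))
# [('sl', 'Slovene', 0.82), ('es', 'Spanish', 0.86)]
```

Both flagged queries still passed the later `Language verify OK` check, so a lower detection confidence on a short query did not lead to a wrong-language answer in these runs.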
## Test Query- Swahili

In [71]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Fupisha hati nzima kwa maneno 200"  # Swahili
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:55:01,687 - Medical Chatbot G-Version - Query language: code=sw (Swahili), conf≈1.00
[INFO] 2025-09-22 08:55:01,690 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:55:01,725 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:55:12,961 - Medical Chatbot G-Version - Language verify OK: expected=sw, detected=sw
[INFO] 2025-09-22 08:55:12,965 - Medical Chatbot G-Version - Language verify OK: expected=sw, detected=sw

=== Answer ===
Hati hii ni kuhusu dawa ya remdesivir inayotumika kutibu COVID-19. Yafuatayo ni muhtasari wa hati hiyo:

Remdesivir ni dawa ya kupambana na virusi aina ya nucleoside analogue ambayo huzuia RNA polymerase. Inafanya kazi kwa kuzuia utengenezaji wa virusi vya SARS-CoV-2.

Inapendekezwa kwa wagonjwa waliolazwa hospitali wenye COVID-19 iliyothibitishwa na wanahitaji oksijeni ya ziada lakini hawako kwenye ventilator.

Dozi ya kwanza ni 200 mg kwa njia ya ms

## Test Query- Swedish

In [72]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Sammanfatta hela dokumentet på 200 ord" # Swedish
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:55:12,984 - Medical Chatbot G-Version - Query language: code=sv (Swedish), conf≈1.00
[INFO] 2025-09-22 08:55:12,985 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:55:13,028 - Medical Chatbot G-Version - Token check: total=392, budget=128000, fits=True
[INFO] 2025-09-22 08:55:20,931 - Medical Chatbot G-Version - Language verify OK: expected=sv, detected=sv
[INFO] 2025-09-22 08:55:20,937 - Medical Chatbot G-Version - Language verify OK: expected=sv, detected=sv

=== Answer ===
Här är en sammanfattning av dokumentet på cirka 200 ord:

Detta är ett läkemedelsinformationsblad om remdesivir, ett antiviralt medel för behandling av COVID-19. Remdesivir är en nukleosidanalog som hämmar virusets RNA-beroende RNA-polymeras och minskar därmed replikationen av SARS-CoV-2.

Indikationen är för sjukhusvårdade patienter med bekräftad COVID-19 som kräver extra syretillförsel men inte invasiv ventilation. 

Doseringen består av en laddning

## Test Query- Tamil

In [73]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "முழு ஆவணத்தையும் 200 சொற்களில் சுருக்கவும்" # Tamil
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:55:20,952 - Medical Chatbot G-Version - Query language: code=ta (Tamil), conf≈1.00
[INFO] 2025-09-22 08:55:20,954 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:55:20,988 - Medical Chatbot G-Version - Token check: total=435, budget=128000, fits=True
[INFO] 2025-09-22 08:55:35,400 - Medical Chatbot G-Version - Language verify OK: expected=ta, detected=ta
[INFO] 2025-09-22 08:55:35,404 - Medical Chatbot G-Version - Language verify OK: expected=ta, detected=ta

=== Answer ===
இந்த மருந்து தகவல் தாள் ரெம்டெசிவிர் பற்றிய விவரங்களை வழங்குகிறது. இது COVID-19க்கான ஒரு ஆன்டிவைரல் மருந்தாகும். இது வைரஸ் RNA பாலிமரேஸை தடுக்கிறது. SARS-CoV-2 பெருக்கத்தைக் குறைக்கிறது. 

மருத்துவமனையில் அனுமதிக்கப்பட்ட COVID-19 நோயாளிகளுக்கு பரிந்துரைக்கப்படுகிறது. முதல் நாள் 200 மி.கி. IV லோடிங் டோஸ், பின்னர் 4-9 நாட்களுக்கு தினமும் 100 மி.கி. IV பராமரிப்பு டோஸ் கொடுக்கப்படுகிறது.

பக்க விளைவுகளில் குமட்டல், டிரான்சாமினேஸ் உயர்வு ஆகியவை அடங்கும். அரி

## Test Query- Telugu

In [74]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "మొత్తం పత్రాన్ని 200 పదాలలో సారాంశం చేయండి" # Telugu
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:55:35,421 - Medical Chatbot G-Version - Query language: code=te (Telugu), conf≈1.00
[INFO] 2025-09-22 08:55:35,421 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:55:35,458 - Medical Chatbot G-Version - Token check: total=450, budget=128000, fits=True
[INFO] 2025-09-22 08:55:48,432 - Medical Chatbot G-Version - Language verify OK: expected=te, detected=te
[INFO] 2025-09-22 08:55:48,436 - Medical Chatbot G-Version - Language verify OK: expected=te, detected=te

=== Answer ===
ఈ పత్రం రెమ్డెసివిర్ అనే COVID-19 చికిత్స కోసం ఉపయోగించే యాంటీవైరల్ ఔషధం గురించి సమాచారాన్ని అందిస్తుంది. ఇది RNA పాలిమరేజ్ ఇన్హిబిటర్ గా పనిచేస్తుంది. ఆస్పత్రిలో చేరిన COVID-19 రోగులకు ఇది సూచించబడింది. మొదటి రోజు 200 mg IV లోడింగ్ డోస్ తో మొదలై, తరువాత 4-9 రోజులు రోజుకు 100 mg IV నిర్వహణ మోతాదు ఇవ్వబడుతుంది. దీని దుష్ప్రభావాలలో వాంతులు, ట్రాన్సమినేస్ పెరుగుదల ఉన్నాయి. చికిత్స సమయంలో కాలేయం, మూత్రపిండాల పనితీరును పరీక్షించాలి. తీవ్రమైన మూత్రపిండాల సమస

## Test Query- Thai

In [75]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "สรุปเอกสารทั้งหมดใน 200 คำ" # Thai
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:55:48,450 - Medical Chatbot G-Version - Query language: code=th (Thai), conf≈1.00
[INFO] 2025-09-22 08:55:48,451 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:55:48,486 - Medical Chatbot G-Version - Token check: total=403, budget=128000, fits=True
[INFO] 2025-09-22 08:56:01,719 - Medical Chatbot G-Version - Language verify OK: expected=th, detected=th
[INFO] 2025-09-22 08:56:01,723 - Medical Chatbot G-Version - Language verify OK: expected=th, detected=th

=== Answer ===
เอกสารนี้เป็นแผ่นข้อมูลยาเกี่ยวกับ remdesivir ซึ่งเป็นยาต้านไวรัสสำหรับรักษา COVID-19 โดยมีรายละเอียดดังนี้:

- ประเภท: ยาต้านไวรัสกลุ่ม nucleoside analogue ที่ยับยั้งเอนไซม์ RNA polymerase
- กลไกการออกฤทธิ์: ยับยั้งการทำงานของ viral RNA-dependent RNA polymerase ทำให้ไวรัส SARS-CoV-2 ลดการแบ่งตัว
- ข้อบ่งใช้: ผู้ป่วย COVID-19 ที่ต้องรับการรักษาในโรงพยาบาลและต้องการออกซิเจนเสริม แต่ไม่ได้ใช้เครื่องช่วยหายใจ
- ขนาดยา: วันแรกให้ 200 มก. ทางหลอดเลือดดำ หลังจ

## Test Query- Tagalog

In [76]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Ibuod ang buong dokumento sa 200 salita" # Tagalog
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:56:01,739 - Medical Chatbot G-Version - Query language: code=tl (Tagalog), conf≈1.00
[INFO] 2025-09-22 08:56:01,740 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:56:01,882 - Medical Chatbot G-Version - Token check: total=393, budget=128000, fits=True
[INFO] 2025-09-22 08:56:10,744 - Medical Chatbot G-Version - Language verify OK: expected=tl, detected=tl
[INFO] 2025-09-22 08:56:10,750 - Medical Chatbot G-Version - Language verify OK: expected=tl, detected=tl

=== Answer ===
Ang dokumentong ito ay isang drug information sheet tungkol sa remdesivir, isang antiviral na gamot para sa COVID-19. Ito ay nagbibigay ng mahahalagang impormasyon para sa mga clinician.

Ang remdesivir ay kabilang sa klase ng antiviral na nucleoside analogue RNA polymerase inhibitor. Gumagana ito sa pamamagitan ng pag-inhibit ng viral RNA-dependent RNA polymerase, na nagpapababa ng pagdami ng SARS-CoV-2.

Inirerekomenda ito para sa mga pasyenteng na-

## Test Query- Turkish

In [77]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Tüm belgeyi 200 kelimede özetleyin" # Turkish
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:56:10,765 - Medical Chatbot G-Version - Query language: code=tr (Turkish), conf≈1.00
[INFO] 2025-09-22 08:56:10,767 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:56:10,802 - Medical Chatbot G-Version - Token check: total=394, budget=128000, fits=True
[INFO] 2025-09-22 08:56:20,482 - Medical Chatbot G-Version - Language verify OK: expected=tr, detected=tr
[INFO] 2025-09-22 08:56:20,488 - Medical Chatbot G-Version - Language verify OK: expected=tr, detected=tr

=== Answer ===
Bu belge, COVID-19 tedavisinde kullanılan remdesivir adlı antiviral ilaca ilişkin bir ilaç bilgi formudur. Belge, remdesivirin sınıfı, etki mekanizması, endikasyonları, dozajı, yan etkileri ve izleme gereksinimleri hakkında klinisyenlere bilgi sağlamaktadır.

Remdesivir, viral RNA bağımlı RNA polimerazı inhibe ederek SARS-CoV-2'nin replikasyonunu azaltan bir nükleozid analoğu RNA polimeraz inhibitörüdür. İlaç, ek oksijen gerektiren ancak invaziv venti

## Test Query- Ukrainian

In [78]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Стисло викладіть увесь документ у 200 словах" # Ukrainian
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:56:20,504 - Medical Chatbot G-Version - Query language: code=uk (Ukrainian), conf≈1.00
[INFO] 2025-09-22 08:56:20,505 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:56:20,537 - Medical Chatbot G-Version - Token check: total=399, budget=128000, fits=True
[INFO] 2025-09-22 08:56:27,973 - Medical Chatbot G-Version - Language verify OK: expected=uk, detected=uk
[INFO] 2025-09-22 08:56:27,978 - Medical Chatbot G-Version - Language verify OK: expected=uk, detected=uk

=== Answer ===
Цей документ є інформаційним листком про препарат ремдесивір для лікування COVID-19. Ремдесивір - це противірусний засіб, що інгібує РНК-залежну РНК-полімеразу вірусу, зменшуючи реплікацію SARS-CoV-2. 

Показання: госпіталізовані пацієнти з підтвердженим COVID-19, які потребують додаткового кисню, але не на інвазивній вентиляції.

Дозування: 
- Навантажувальна доза: 200 мг внутрішньовенно в 1-й день
- Підтримуюча доза: 100 мг внутрішньовенно щодня п

## Test Query- Urdu

In [79]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "پورے دستاویز کو 200 الفاظ میں خلاصہ کریں" # Urdu
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:56:27,996 - Medical Chatbot G-Version - Query language: code=ur (Urdu), conf≈1.00
[INFO] 2025-09-22 08:56:27,998 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:56:28,030 - Medical Chatbot G-Version - Token check: total=412, budget=128000, fits=True
[INFO] 2025-09-22 08:56:42,471 - Medical Chatbot G-Version - Language verify OK: expected=ur, detected=ur
[INFO] 2025-09-22 08:56:42,475 - Medical Chatbot G-Version - Language verify OK: expected=ur, detected=ur

=== Answer ===
یہ دستاویز ریمڈیسیویر کے بارے میں ایک دوائی کی معلوماتی شیٹ ہے، جو کہ COVID-19 کے علاج کے لیے استعمال ہونے والی ایک اینٹی وائرل دوا ہے۔ یہ RNA پولیمریز کو روک کر SARS-CoV-2 وائرس کی نقل کو کم کرتی ہے۔ یہ دوا ان مریضوں کے لیے تجویز کی جاتی ہے جو اسپتال میں داخل ہیں، COVID-19 کی تصدیق شدہ تشخیص رکھتے ہیں، اور جنہیں آکسیجن کی ضرورت ہے لیکن وینٹیلیٹر پر نہیں ہیں۔

خوراک کے طور پر، پہلے دن 200 mg IV دیا جاتا ہے، اس کے بعد 4 سے 9 دنوں تک روزانہ 100 mg IV دیا ج

## Test Query- Vietnamese

In [80]:
if __name__ == "__main__":
    example_file = "Drug Information Sheet.pdf"
    example_query = "Tóm tắt toàn bộ tài liệu trong 200 từ" # Vietnamese
    example_session = os.environ.get("EXAMPLE_SESSION_ID", "demo-session-1")

    # Enable history for demo by setting CONVO_HISTORY_ENABLED=true
    try:
        response = run_direct_document_qa(
            example_file,
            example_query,
            MODEL_CONTEXT_TOKENS,
            session_id=example_session
        )
        print("\n=== Answer ===")
        print(response)
    except Exception as e:
        logger.exception("Pipeline execution failed.")
        raise

[INFO] 2025-09-22 08:56:42,491 - Medical Chatbot G-Version - Query language: code=vi (Vietnamese), conf≈1.00
[INFO] 2025-09-22 08:56:42,493 - Medical Chatbot G-Version - Loading PDF: Drug Information Sheet.pdf
[INFO] 2025-09-22 08:56:42,528 - Medical Chatbot G-Version - Token check: total=396, budget=128000, fits=True
[INFO] 2025-09-22 08:56:51,981 - Medical Chatbot G-Version - Language verify OK: expected=vi, detected=vi
[INFO] 2025-09-22 08:56:51,985 - Medical Chatbot G-Version - Language verify OK: expected=vi, detected=vi

=== Answer ===
Tài liệu này là một bảng thông tin thuốc về remdesivir, một loại thuốc kháng virus được sử dụng để điều trị COVID-19. Nó bao gồm các thông tin sau:

- Loại thuốc: Thuốc kháng virus - chất ức chế RNA polymerase tương tự nucleoside.
- Cơ chế tác dụng: Ức chế RNA polymerase phụ thuộc RNA của virus, giảm sự nhân lên của SARS-CoV-2.
- Chỉ định: Bệnh nhân nhập viện có xác nhận COVID-19 cần bổ sung oxy nhưng không cần thở máy xâm lấn.
- Liều dùng: Liều nạ