<a href="https://www.kaggle.com/code/boxy32/medical-assistant?scriptVersionId=234701546" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>


# Multilingual Medical Transcription & Report Generator

**Purpose:** This notebook acts as an AI assistant for physicians whose patients speak different languages. It addresses language barriers in healthcare, which are particularly common in multilingual regions such as South Asia.

**How it Works:** It takes an audio recording of a patient consultation (in English, Hindi, Gujarati, Bangla, Marathi, Tamil, or Telugu), transcribes the speech to text, translates it into English if necessary, and then uses Google's Gemini model to generate a structured medical report. This lets clinicians review the patient's complaints, possible diagnoses, and other key information regardless of the language originally spoken.

**Usage:** Follow the steps below. You will need an audio file of a patient consultation and a Google AI API key (store the key in Kaggle Secrets rather than in the notebook).

# 📦 Step 0: Install Required Libraries
**Purpose:** This first code block sets up the necessary software toolkit for our AI assistant. Before we can process audio, transcribe, translate, or generate reports, we need to install the specific Python libraries that provide these functionalities.
**Details:**
**pydub & ffmpeg-python:** Used for handling common audio file formats (.mp3, .m4a, .wav). Both rely on the ffmpeg command-line tool, which does the actual decoding.
**git+https://github.com/openai/whisper.git:** Installs OpenAI's Whisper directly from its source repository. Whisper is the speech-to-text model used here for multilingual transcription. The -q flag keeps the installation output tidy.
**translate:** A Python library used to translate the transcribed text into English, giving the report generation step a common language to work from.
**google-generativeai:** The official Google library for calling generative models such as Gemini, which analyzes the transcript and creates the structured report. The -U flag upgrades the package to the latest version.
**reportlab:** A library used in the final step to write the generated report text to a PDF document.
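
After the install cell has run, it can be worth confirming that each package is importable and that the ffmpeg binary is on the PATH before loading any models. The sketch below is one way to do that with the standard library only; `missing_modules` and `REQUIRED_MODULES` are illustrative names, and the list assumes the usual import names for the packages above (for example, ffmpeg-python is imported as `ffmpeg`).

```python
import importlib.util
import shutil

# Import names (not pip names) assumed for the packages installed in Step 0.
REQUIRED_MODULES = [
    "pydub", "ffmpeg", "whisper", "translate", "google.generativeai", "reportlab",
]

def missing_modules(names):
    """Return the entries of `names` that cannot be located for import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing

print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
print("ffmpeg binary on PATH:", shutil.which("ffmpeg") is not None)
```

An empty result means every package can be located; anything listed should be reinstalled before moving on to Step 1.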


In [None]:
# Install required libraries 
!pip install -q pydub ffmpeg-python git+https://github.com/openai/whisper.git
!pip install translate
!pip install -U google-generativeai
!pip install reportlab

# 🌐 Step 1: Select the Spoken Language
**Purpose:** To maximize transcription accuracy, we need to tell the Whisper model which language is being spoken in the audio file. This step allows the user (e.g., the physician or assistant) to specify the language.
**Details:**
**language_options:** A Python dictionary maps easy-to-select numbers (1, 2, 3...) to language names (English, Hindi, Gujarati...) and their corresponding standard language codes (en, hi, gu...) that Whisper understands.
**User Interaction:** The code prints the available language options and prompts the user to enter the number corresponding to the language in the audio file.
**Input Handling:** It captures the user's choice, retrieves the language name and code, and confirms the selection. If invalid input is given, it defaults to English ('en'). Specifying the correct language significantly improves the quality of the transcription in Step 3.
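
For runs where nobody is present to answer a prompt, the same lookup can be wrapped in a plain function and fed from a constant instead of input(). This is a minimal sketch: `resolve_language` is a hypothetical helper, and accepting a language code such as "bn" in addition to the menu number is an illustrative extension rather than something the notebook's cell does.

```python
LANGUAGE_OPTIONS = {
    "1": ("English", "en"),
    "2": ("Hindi", "hi"),
    "3": ("Gujarati", "gu"),
    "4": ("Bangla", "bn"),
    "5": ("Marathi", "mr"),
    "6": ("Tamil", "ta"),
    "7": ("Telugu", "te"),
}
DEFAULT_LANGUAGE = ("English", "en")

def resolve_language(choice, options=LANGUAGE_OPTIONS, default=DEFAULT_LANGUAGE):
    """Map a menu number or a language code to (name, code); fall back to `default`."""
    key = str(choice).strip().lower()
    if key in options:                      # menu number such as "3"
        return options[key]
    for name, code in options.values():     # language code such as "bn"
        if key == code:
            return (name, code)
    return default                          # anything else: same fallback as the notebook

selected_language, lang_code = resolve_language("4")
print(f"Selected: {selected_language} ({lang_code})")
```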


# 📁 Step 2: Load Audio File
**Purpose:** This step locates and selects the specific audio recording of the patient consultation that needs to be processed.
**Details:**
**import os:** Imports the 'os' module, which provides functions for interacting with the operating system, like listing files in a directory.
**input_dir:** Specifies the dataset directory to search. Kaggle mounts attached datasets under /kaggle/input/, so upload your audio files as a dataset linked to the notebook and point input_dir at that dataset's folder (the code below uses /kaggle/input/bangla-test-2; change it to match your own dataset).
**File Discovery:** The code lists all files within the specified input_dir that have common audio extensions (.mp3, .m4a, .wav).
**Error Handling:** It checks if any audio files were found and raises an error if the directory is empty or contains no valid files.
**User Selection:** It displays the list of found audio files with numbers and prompts the user to select the specific file they want to process for this run.
**File Path Construction:** It creates the full path to the selected audio file, ready for the transcription step.
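
The discovery and selection logic can also be written as two small functions, which makes it easier to guard against edge cases such as upper-case extensions (.MP3) or a menu choice that is out of range. The sketch below assumes nothing beyond the standard library; `list_audio_files` and `select_file` are illustrative names, not functions from the notebook.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".m4a", ".wav"}

def list_audio_files(directory):
    """Return audio file names in `directory`, sorted, matching extensions case-insensitively."""
    return sorted(
        p.name for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )

def select_file(directory, files, choice):
    """Turn a 1-based menu choice into a full path, rejecting invalid input."""
    try:
        index = int(str(choice).strip())
    except ValueError:
        raise ValueError(f"Not a number: {choice!r}") from None
    if not 1 <= index <= len(files):
        raise ValueError(f"Choose a number between 1 and {len(files)}")
    return str(Path(directory) / files[index - 1])
```

Sorting the names keeps the menu numbering stable between runs, since the order in which a directory listing is returned is not guaranteed.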

# 🔊 Step 3: Transcribe Audio using Whisper
**Purpose:** This is the core speech-to-text conversion step. It takes the selected audio file and uses the powerful Whisper AI model to generate a written transcript of the spoken words in the language identified in Step 1.
**Details:**
**import whisper:** Imports the Whisper library installed in Step 0.
**model = whisper.load_model("large-v3"):** Loads the large-v3 checkpoint, the largest of the Whisper models. It is computationally intensive but generally the most accurate, which matters for varied accents, dialects, and potentially noisy clinical recordings across multiple languages.
**result = model.transcribe(...):** Performs the transcription, with these arguments:
**input_file:** The path to the audio file selected in Step 2.
**language=lang_code:** Passes the language code (e.g., 'gu' for Gujarati) selected in Step 1 to guide the model.
**task="transcribe":** Explicitly tells Whisper to perform speech-to-text in the original language, not speech-to-English translation (which it can also do).
**final_transcription = result["text"].strip():** Extracts the text from the dictionary returned by Whisper and removes any leading/trailing whitespace. The variable final_transcription now holds the raw text of the consultation in its original language.
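
The result returned by Whisper is a dictionary that holds more than the "text" field: it also carries the language and a list of timed "segments", each with "start", "end", and "text" entries. The sketch below shows one way to turn those segments into timestamped lines, which can help when checking a transcript against the recording. It runs on a hand-written stand-in dictionary rather than real model output, and `format_timestamp` and `timestamped_lines` are illustrative helper names.

```python
def format_timestamp(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def timestamped_lines(result):
    """Build one '[start - end] text' line per segment of a Whisper-style result dict."""
    return [
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]

# Hand-written stand-in for the dict returned by model.transcribe(...); not real output.
sample_result = {
    "text": " I have had a fever for three days. I also have a cough.",
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 3.2, "text": " I have had a fever for three days."},
        {"start": 3.2, "end": 65.0, "text": " I also have a cough."},
    ],
}

for line in timestamped_lines(sample_result):
    print(line)
```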


# 🌍 Step 4: Translate to English (if necessary)
**Purpose:** To enable the use of a powerful English-based generative AI (Gemini) for report creation in the next step, we need the consultation text in English. This step translates the transcript obtained in Step 3 only if the original language was not English.
**Details:**
**from translate import Translator:** Imports the necessary class from the translation library.
**translate_to_english(text, from_lang_code) function:**
Checks whether from_lang_code is already 'en'. If so, it returns the text unchanged.
Otherwise, it initializes the Translator with the source language (from_lang=from_lang_code) and the target language (to_lang="en").
**max_chars = 500:** Splits the text into chunks of at most 500 characters, because many translation APIs limit the amount of text per request. This prevents errors with long transcripts, although slicing at fixed offsets can cut a word or sentence in two at a chunk boundary.
It iterates through the chunks and translates each one. If a call fails (e.g., network issues, API limits), the error message is inserted in place of that chunk rather than stopping the run, so check the output for "[Error in chunk ...]" markers before generating a report.
Finally, it joins the translated chunks back together with spaces.
**translated_english = translate_to_english(...):** Calls the function with the final_transcription and lang_code to get the English version. If the original was English, translated_english will be identical to final_transcription.
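
Fixed-offset slicing can split a word across two requests, which tends to hurt translation quality at the boundaries. One alternative is to build chunks word by word so that breaks fall on whitespace. The sketch below is a standard-library illustration of that idea, not the notebook's implementation; `chunk_text` is a hypothetical helper, and the limit is counted in characters, whereas some services count bytes.

```python
def chunk_text(text, max_chars=500):
    """Split `text` into chunks of at most `max_chars` characters, breaking on whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is hard-split as a last resort.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate        # the word still fits in the current chunk
        else:
            chunks.append(current)     # close the chunk and start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("alpha beta gamma delta", max_chars=11))
```

Inside translate_to_english, the list comprehension that builds chunks could be swapped for `chunk_text(text, max_chars)` without changing the rest of the loop.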


In [None]:
# 📍 Step 1: Ask user to select spoken language (number-based)
language_options = {
    "1": ("English", "en"),
    "2": ("Hindi", "hi"),
    "3": ("Gujarati", "gu"),
    "4": ("Bangla", "bn"),
    "5": ("Marathi", "mr"),
    "6": ("Tamil", "ta"),
    "7": ("Telugu", "te")
}

print("\n🌐 Select the language spoken in the audio:")
for num, (lang, _) in language_options.items():
    print(f"{num}. {lang}")

selected_number = input("\n🔢 Enter the number for your language: ").strip()
selected_language, lang_code = language_options.get(selected_number, ("English", "en"))
print(f"\n✅ You selected: {selected_language} ({lang_code})")

# 📁 Step 2: Use audio files from Kaggle input directory
import os

input_dir = "/kaggle/input/bangla-test-2"
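# Match extensions case-insensitively and sort so the menu numbering is stable between runs
audio_files = sorted(f for f in os.listdir(input_dir) if f.lower().endswith(('.mp3', '.m4a', '.wav')))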

if not audio_files:
    raise FileNotFoundError("❌ No audio files found in the input directory!")

print("\n🎧 Available audio files:")
for idx, file in enumerate(audio_files, 1):
    print(f"{idx}. {file}")

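selected_idx = int(input("\n🔢 Select the file number to transcribe: ").strip()) - 1
# Reject out-of-range choices; entering 0 would otherwise wrap to index -1 and pick the last file
if not 0 <= selected_idx < len(audio_files):
    raise ValueError(f"❌ Please enter a number between 1 and {len(audio_files)}.")
input_file = os.path.join(input_dir, audio_files[selected_idx])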

print(f"\n🎯 Using file: {audio_files[selected_idx]}")

# 🔊 Step 3: Transcribe using Whisper with language code
import whisper

model = whisper.load_model("large-v3")
result = model.transcribe(input_file, language=lang_code, task="transcribe")
final_transcription = result["text"].strip()

print("\n📝 Transcribed Text:\n", final_transcription)

# 🌍 Step 4: Translate if needed
from translate import Translator

def translate_to_english(text, from_lang_code):
    if from_lang_code == "en":
        return text
    translator = Translator(from_lang=from_lang_code, to_lang="en")
    max_chars = 500
    chunks = [text[i:i+max_chars] for i in range(0, len(text), max_chars)]
    translated_chunks = []
    for idx, chunk in enumerate(chunks):
        try:
            translated = translator.translate(chunk)
            translated_chunks.append(translated)
        except Exception as e:
            translated_chunks.append(f"[Error in chunk {idx + 1}: {e}]")
    return ' '.join(translated_chunks)

translated_english = translate_to_english(final_transcription, lang_code)

print("\n🔁 Translated to English:\n", translated_english)


# 🤖 Step 5: Generate Structured Medical Report using Gemini
**Purpose:** This step takes the English transcript and uses Google's Gemini generative model to analyze it and produce a structured medical report.
**Details:**
**import google.generativeai as genai:** Imports the Google AI library.
**genai.configure(api_key="..."):** Configures the library with your Google AI API key. **IMPORTANT:** Never paste your key directly into the notebook, because anything typed into a code cell is visible once the notebook is shared. Use Kaggle Secrets (from kaggle_secrets import UserSecretsClient) to load the key at run time. A placeholder such as AIza... will not work.
**generate_medical_report(text) function:**
**model = genai.GenerativeModel('gemini-1.5-pro'):** Specifies the Gemini model to use. gemini-1.5-pro is a capable model suited to complex understanding and generation tasks like this one.
**prompt = f"""...""":** Defines the instructions given to the Gemini model. The prompt tells the model:
**Its role:** "You are a medical expert AI assistant."
**The input:** The text variable containing the English transcript.
**The desired output:** A "well-structured, detailed medical report" with five sections (Patient Complaints, Diagnosis, Medications, Suggested Tests or Follow-ups, Additional Notes or Observations) in paragraph format, without markdown symbols.
**Constraints:** "Ensure the report is comprehensive and professional."
**response = model.generate_content(prompt):** Sends the prompt, which already contains the transcript text, to the Gemini API.
**return response.text.strip():** Extracts the generated report text from the model's response.
**medical_report = generate_medical_report(translated_english):** Calls the function with the English transcript to generate the report.
The final print statement displays the generated report.
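
Because the key should come from Kaggle Secrets, a small loader function keeps that lookup in one place and lets the same notebook run outside Kaggle as well. The sketch below tries Kaggle Secrets first and falls back to an environment variable. `load_api_key` is a hypothetical helper and "GOOGLE_API_KEY" is an assumed secret label; use whatever label you chose when adding the secret.

```python
import os

def load_api_key(name="GOOGLE_API_KEY"):
    """Return the API key from Kaggle Secrets, else from an environment variable."""
    key = None
    try:
        # kaggle_secrets is only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        key = UserSecretsClient().get_secret(name)
    except Exception:
        # Not on Kaggle, or the secret is not attached to this notebook.
        key = None
    if not key:
        key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"No API key found: add a {name} secret or environment variable")
    return key

# Intended use (assumes the google-generativeai import from the notebook):
# genai.configure(api_key=load_api_key())
```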


In [None]:
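# 🔐 Configure Gemini with your API key, loaded from Kaggle Secrets (never hard-code it)
import google.generativeai as genai
from kaggle_secrets import UserSecretsClient

# "GOOGLE_API_KEY" is an assumed secret label; use the label you set under Add-ons > Secrets
genai.configure(api_key=UserSecretsClient().get_secret("GOOGLE_API_KEY"))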

# 📑 Function to generate structured medical report
def generate_medical_report(text):
    model = genai.GenerativeModel('gemini-1.5-pro')
    prompt = f"""
You are a medical expert AI assistant.

Given the following unstructured medical note in English, generate a **well-structured, detailed medical report**. 
It must include the following sections in paragraph format (avoid using any ** or markdown symbols):

1. Patient Complaints  
2. Diagnosis  
3. Medications  
4. Suggested Tests or Follow-ups  
5. Additional Notes or Observations

Medical Note:  
{text}

Ensure the report is comprehensive and professional.
"""
    response = model.generate_content(prompt)
    return response.text.strip()
  
# 🔁 Generate the report using the translated English text
medical_report = generate_medical_report(translated_english)

# 🖨️ Print the result
print("\n📝 Medical Report:\n")
print(medical_report)


# 📄 Step 6: Save the Report as a PDF
**Purpose:** To provide the generated medical report in a standard, easily shareable, and printable format, this step saves the report text into a PDF document.
**Details:**
Imports necessary components from the reportlab library (canvas for drawing, A4 for page size, inch for units) and textwrap for handling long lines of text.
**save_report_to_pdf(report_text, filename) function:**
**cleaned_text = report_text.replace("**", "").strip():** Removes any potential markdown bold formatting that Gemini might have included, ensuring clean text in the PDF.
**c = canvas.Canvas(filename, pagesize=A4):** Creates a new PDF file canvas with the specified filename and standard A4 page size.
Sets up margins (x_margin, y_margin), line height, and the starting Y position for drawing text near the top of the page.
Sets the font to Helvetica, size 12.
Iterates through each paragraph of the cleaned report text.
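**wrapped_lines = wrap(paragraph, width=95):** Uses textwrap to break each paragraph into lines of at most 95 characters. This is a character count chosen to approximate the printable width at 12 pt Helvetica; it is not measured against the page, so a line made of unusually wide characters can still overrun the right margin.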
Draws each wrapped line onto the PDF page using c.drawString(), decrementing the Y position for each line.
**Handles page breaks automatically:** If the Y position gets too close to the bottom margin, it finalizes the current page (c.showPage()) and starts a new one, resetting the Y position to the top margin.
Adds a little extra space between paragraphs for readability.
**c.save():** Finalizes and saves the PDF file.
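
The wrapping and page-break rules described above can be checked without producing a PDF by simulating the layout loop. The sketch below mirrors the same margins, line height, and wrap width, and returns the lines that would land on each page. `paginate` is an illustrative helper, and the A4 height is hard-coded as a rounded value in points rather than imported from reportlab.

```python
from textwrap import wrap

A4_HEIGHT_PT = 841.89  # A4 height (297 mm) in points, rounded; assumed here, not imported

def paginate(text, page_height=A4_HEIGHT_PT, y_margin=50, line_height=15, wrap_width=95):
    """Return a list of pages, each a list of wrapped lines, mirroring the draw loop."""
    pages, current = [], []
    y = page_height - y_margin
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            y -= line_height              # blank line: just move down
            continue
        for line in wrap(paragraph, width=wrap_width):
            if y <= y_margin:             # out of room: start a new page
                pages.append(current)
                current = []
                y = page_height - y_margin
            current.append(line)
            y -= line_height
        y -= line_height                  # extra gap after each paragraph
    if current:
        pages.append(current)
    return pages

sample = "\n".join(f"Finding {i}" for i in range(60))
print([len(page) for page in paginate(sample)])
```

Running this on a generated report before saving gives a quick estimate of the page count and shows where breaks would fall.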


In [8]:
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch
from textwrap import wrap
import os

def save_report_to_pdf(report_text, filename="medical_report.pdf"):
    cleaned_text = report_text.replace("**", "").strip()

    c = canvas.Canvas(filename, pagesize=A4)
    width, height = A4
    x_margin, y_margin = 50, 50
    max_width = width - 2 * x_margin
    line_height = 15
    y_position = height - y_margin

    c.setFont("Helvetica", 12)

    for paragraph in cleaned_text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            y_position -= line_height
            continue

        # Wrap paragraph text
        wrapped_lines = wrap(paragraph, width=95)

        for line in wrapped_lines:
            if y_position <= y_margin:
                c.showPage()
                c.setFont("Helvetica", 12)
                y_position = height - y_margin
            c.drawString(x_margin, y_position, line)
            y_position -= line_height

        y_position -= line_height  # Extra space between paragraphs

    c.save()

    # ✅ For Kaggle: inform user where the file is saved
    if os.path.exists(filename):
        print(f"✅ PDF saved: {filename}")
        print("📥 Go to the 'Output' tab on the right sidebar to download the file.")
    else:
        print("⚠️ File not found. Something went wrong.")


In [9]:
save_report_to_pdf(medical_report, "medical_report.pdf")


✅ PDF saved: medical_report.pdf
📥 Go to the 'Output' tab on the right sidebar to download the file.
