<a href="https://colab.research.google.com/github/SUKIHEALTH/SUKIHEALTH/blob/main/Audio_to_structured_medical_summary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**SECTION: 0 - Install Dependencies**

In [12]:
!pip install -q gradio faster-whisper soundfile librosa openai transformers accelerate bitsandbytes huggingface_hub
!sudo apt install -y ffmpeg


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 35 not upgraded.


In [13]:
!pip install -q safetensors

**SECTION: 1 - Imports and Constants**

In [14]:
import os
import torch
import numpy as np
import soundfile as sf
import tempfile
import gradio as gr
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
from faster_whisper import WhisperModel
from huggingface_hub import HfApi
import pandas as pd


In [15]:
from google.colab import drive
drive.mount('/content/drive')

GOOGLE_DRIVE_AUDIO_FOLDER = "/content/drive/MyDrive/Recordings2025"


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [16]:
def list_audio_files(drive_folder):
    if not os.path.exists(drive_folder):
        raise ValueError(f"📁 Folder not found: {drive_folder}")
    return sorted([
        f for f in os.listdir(drive_folder)
        if f.lower().endswith(('.mp3', '.wav', '.m4a'))
    ])
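The Gradio UI in Section 6 calls `list_audio_files` while the Blocks layout is being built, so a missing Drive folder raises and stops the whole app from launching. The following is a minimal sketch of a tolerant variant that returns an empty list instead, so the dropdown simply has no choices. `list_audio_files_safe` is a hypothetical name, and it also skips sub-folders whose names happen to end in an audio extension.

```python
import os

AUDIO_EXTENSIONS = (".mp3", ".wav", ".m4a")  # same extensions as list_audio_files

def list_audio_files_safe(drive_folder):
    """Return sorted audio filenames in drive_folder, or [] if the folder is missing."""
    if not os.path.isdir(drive_folder):
        # Hypothetical behaviour: warn instead of raising so the UI can still be built
        print(f"📁 Folder not found: {drive_folder}")
        return []
    return sorted(
        name for name in os.listdir(drive_folder)
        if name.lower().endswith(AUDIO_EXTENSIONS)
        and os.path.isfile(os.path.join(drive_folder, name))  # ignore directories
    )
```

In the UI cell, `file_choices = list_audio_files_safe(GOOGLE_DRIVE_AUDIO_FOLDER)` would replace the original call.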


**SECTION: 2 - Load Models Globally (Whisper + Falcon)**

In [17]:
# Whisper Model
WHISPER_MODEL_SIZE = "base"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
COMPUTE_TYPE = "float16" if DEVICE == "cuda" else "int8"
whisper_model = WhisperModel(model_size_or_path=WHISPER_MODEL_SIZE, device=DEVICE, compute_type=COMPUTE_TYPE)

# Falcon Model
FALCON_MODEL_NAME = "tiiuae/falcon-rw-1b"
falcon_tokenizer = AutoTokenizer.from_pretrained(FALCON_MODEL_NAME)
falcon_model = AutoModelForCausalLM.from_pretrained(FALCON_MODEL_NAME, device_map="auto")
falcon_generator = pipeline("text-generation", model=falcon_model, tokenizer=falcon_tokenizer)


Device set to use cpu
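Falcon-RW-1B has a limited context window. This sketch assumes 2048 tokens (check the model card). The prompt, the transcript, and the 512 new tokens must all fit in that window, and a long consultation can easily exceed it. The sketch keeps the longest whole-word prefix of the transcript that fits the remaining budget, using a binary search over word counts. `fit_transcript_to_budget` and the `prompt_overhead=150` estimate are assumptions, not part of the original notebook. Truncation also drops the end of the consultation, where the plan is often discussed, so chunked summarisation may be preferable.

```python
def fit_transcript_to_budget(transcript, count_tokens, max_context=2048,
                             prompt_overhead=150, max_new_tokens=512):
    """Return the longest prefix of whole words whose token count fits the budget.

    count_tokens is any callable str -> int (assumed to grow with text length).
    """
    budget = max_context - prompt_overhead - max_new_tokens
    if budget <= 0:
        raise ValueError("Prompt overhead plus max_new_tokens exceeds the context window")
    words = transcript.split()
    if count_tokens(" ".join(words)) <= budget:
        return " ".join(words)
    # Binary search: words[:lo] always fits, words[:hi + 1] never does
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_tokens(" ".join(words[:mid])) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return " ".join(words[:lo])
```

With the loaded tokenizer, a real token counter would be `lambda s: len(falcon_tokenizer.encode(s))`, applied to the transcript before it is passed to `generate_structured_summary`.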


**SECTION: 3 - Helper for Falcon Summary**

In [18]:
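def generate_structured_summary(transcript):
    prompt = f"""You are an AI assistant that writes digital consultation notes.
Summarise the transcript below using exactly the headings in the format.

Transcript:
\"\"\"{transcript}\"\"\"

Format:
Brief Summary: ...
Presenting Complaint: ...
History: ...
Assessment / Working Diagnosis: ...
Plan / Management: ...
Safety-Netting & Follow-Up: ..."""

    # return_full_text=False returns only the newly generated text. With the default (prompt + generation),
    # any heading the model does not generate would be filled with the prompt's "..." placeholder.
    response = falcon_generator(
        prompt,
        max_new_tokens=512,
        do_sample=False,
        return_full_text=False,
        pad_token_id=falcon_tokenizer.eos_token_id,  # reuse EOS for padding to avoid the pad-token warning
    )[0]["generated_text"]

    summary = {
        "Brief Summary": "",
        "Presenting Complaint": "", "History": "",
        "Assessment / Working Diagnosis": "", "Plan / Management": "",
        "Safety-Netting & Follow-Up": ""
    }

    for line in response.split("\n"):
        line = line.strip()
        for key in summary:
            # Match "Key:" exactly and keep only the text after the colon
            if line.startswith(f"{key}:"):
                summary[key] = line[len(key) + 1:].strip()

    return summary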
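The line-by-line parser above keeps only the first line of each section and misses headings that the model wraps in markdown, such as `**History:**`. This is a sketch of a more tolerant parser. It assumes the model emits one heading per line using the prompt's exact heading names, and joins any following lines into that section until the next heading. `parse_structured_summary` is a hypothetical helper, not part of the original pipeline.

```python
import re

# Headings copied from the prompt's "Format:" block
SUMMARY_KEYS = [
    "Brief Summary",
    "Presenting Complaint",
    "History",
    "Assessment / Working Diagnosis",
    "Plan / Management",
    "Safety-Netting & Follow-Up",
]

# A heading line, optionally wrapped in bullets/bold, e.g. "- History: ..." or "**History:** ..."
_HEADING_RE = re.compile(
    r"^[\s*#\-]*(" + "|".join(re.escape(k) for k in SUMMARY_KEYS) + r")\s*\**\s*:\s*\**\s*(.*)$"
)

def parse_structured_summary(generated_text):
    """Map each heading to its text, joining continuation lines until the next heading."""
    collected = {key: [] for key in SUMMARY_KEYS}
    current = None
    for raw_line in generated_text.splitlines():
        line = raw_line.strip()
        match = _HEADING_RE.match(line)
        if match:
            current = match.group(1)
            collected[current] = [match.group(2).strip()]  # a repeated heading overwrites
        elif current is not None and line:
            collected[current].append(line)  # continuation of the current section
    return {key: " ".join(part for part in parts if part) for key, parts in collected.items()}
```

Parse only the generated text (not the prompt). Otherwise the prompt's own `Format:` lines are picked up as section content.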


**SECTION: 4 - Audio Preprocessing (Downsampling + Limits)**

In [19]:
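MAX_DURATION_SEC = 4000  # about 67 minutes
TARGET_SR = 16000        # Whisper models work on 16 kHz audio

def preprocess_audio_input(audio_file):
    """Return a path to an audio file that Whisper can read.

    audio_file is either a (sample_rate, np.ndarray) tuple from gr.Audio(type="numpy")
    or a file path (e.g. a file selected from Google Drive).
    """
    if isinstance(audio_file, tuple):
        sample_rate, audio_data = audio_file
        if audio_data is None or len(audio_data) == 0:
            raise ValueError("📛 Empty Audio Data")
        duration_sec = len(audio_data) / sample_rate
        if duration_sec > MAX_DURATION_SEC:
            raise ValueError(f"🚫 Max allowed duration is {MAX_DURATION_SEC}s. Yours: {duration_sec:.1f}s")

        # Gradio recordings often arrive as int16, possibly stereo with shape (samples, channels).
        # Convert to mono float32 in [-1, 1]; unscaled int16 values cast to float would be
        # distorted when soundfile writes them as 16-bit PCM.
        audio_data = np.asarray(audio_data)
        if np.issubdtype(audio_data.dtype, np.integer):
            full_scale = np.iinfo(audio_data.dtype).max + 1  # 32768 for int16
            audio_data = audio_data.astype(np.float32) / full_scale
        else:
            audio_data = audio_data.astype(np.float32)
        if audio_data.ndim == 2:
            audio_data = audio_data.mean(axis=1)  # average channels to mono

        # Downsample to 16 kHz if needed
        if sample_rate != TARGET_SR:
            import librosa
            audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=TARGET_SR)
            sample_rate = TARGET_SR

        # Close the temporary file handle before soundfile writes to the same path
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
            tmp_path = tmp.name
        sf.write(tmp_path, audio_data, sample_rate)
        return tmp_path

    if audio_file and os.path.exists(audio_file):
        return audio_file
    raise FileNotFoundError(f"📁 Audio file not found: {audio_file}")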
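`gr.Audio(type="numpy")` hands over raw samples that are often int16, and stereo recordings arrive as `(samples, channels)`. `soundfile` writes WAV as 16-bit PCM by default and expects float input in [-1, 1]. The sketch below isolates that normalisation step so it can be unit-tested without Gradio or librosa. `to_mono_float32` is a hypothetical helper name, and it assumes signed integer PCM such as int16 or int32.

```python
import numpy as np

def to_mono_float32(audio_data):
    """Convert int or float samples, mono or (samples, channels), to mono float32 in [-1, 1]."""
    audio = np.asarray(audio_data)
    if np.issubdtype(audio.dtype, np.integer):
        # Scale signed PCM by its full-scale value, e.g. 32768 for int16
        full_scale = np.iinfo(audio.dtype).max + 1
        audio = audio.astype(np.float32) / full_scale
    else:
        audio = audio.astype(np.float32)
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # average the channel axis to get mono
    return np.clip(audio, -1.0, 1.0).astype(np.float32)
```

The result can go straight into `librosa.resample(..., orig_sr=sample_rate, target_sr=16000)`, which resamples along the last axis and so expects mono 1-D input here.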

**SECTION: 5 - Full Pipeline**

In [20]:
def process_audio(audio_file):
    try:
        audio_path = preprocess_audio_input(audio_file)
        segments, info = whisper_model.transcribe(audio_path, vad_filter=True)
        # Segment texts carry their own leading space; strip each to avoid double spaces
        transcript = " ".join(s.text.strip() for s in segments).strip()

        lang = info.language
        print(f"🗣️ Detected language: {lang} (p={info.language_probability:.2f})")

        if not transcript or len(transcript.split()) < 5:
            return "⛔ Not enough spoken data", "", "", "", "", "", ""

        print("🧠 Generating structured summary...")
        summary = generate_structured_summary(transcript)

        return transcript, \
               summary["Brief Summary"], summary["Presenting Complaint"], summary["History"], \
               summary["Assessment / Working Diagnosis"], summary["Plan / Management"], summary["Safety-Netting & Follow-Up"]

    except Exception as e:
        return f"⚠️ Error: {e}", "", "", "", "", "", ""
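faster-whisper's `transcribe` returns a lazy generator of segments, each with `start` and `end` times in seconds plus `text`. The plain transcript above discards the timing. This sketch shows one way to keep it as `[mm:ss-mm:ss] text` lines, which makes it easier to check a summary against the recording. `timestamped_transcript` and the `mm:ss` format are assumptions, and the function accepts any objects exposing those three attributes.

```python
def format_timestamp(seconds):
    """Format seconds as mm:ss (hours are folded into minutes)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"

def timestamped_transcript(segments):
    """Build one '[mm:ss-mm:ss] text' line per non-empty segment."""
    lines = []
    for seg in segments:
        text = seg.text.strip()
        if text:
            lines.append(f"[{format_timestamp(seg.start)}-{format_timestamp(seg.end)}] {text}")
    return "\n".join(lines)
```

Because the generator can only be consumed once, call `segments = list(segments)` first if you want both the plain and the timestamped transcript.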


**SECTION: 6 - Gradio UI**

In [None]:
import gradio as gr

with gr.Blocks() as app:
    gr.Markdown("## 🎓 AI Clinical Audio Assistant")
    gr.Markdown("🔊 Upload, record, or select consultation audio → Generate structured summary")

    with gr.Accordion("ℹ️ How to use", open=False):
        gr.Markdown("""
        - 🎤 Use your microphone to record a consultation, or
        - 🎧 Upload an existing .mp3/.wav/.m4a file, or
        - 📁 Choose a file from your Google Drive folder (files are listed when the app starts)

        If both a recording/upload and a Drive file are provided, the recording/upload is used.
        -----------------------------------------------------
        Output includes a full transcript and a structured summary with:
        - Brief Summary
        - Presenting Complaint
        - History
        - Assessment / Working Diagnosis
        - Plan / Management
        - Safety-Netting & Follow-Up
        """)

    with gr.Row():
        file_choices = list_audio_files(GOOGLE_DRIVE_AUDIO_FOLDER)
        drive_file = gr.Dropdown(label="📁 Select Audio from Google Drive", choices=file_choices)
        mic_or_upload = gr.Audio(label="🎧 Record or Upload Audio", type="numpy")

    generate_btn = gr.Button("🚀 Generate Clinical Summary")

    # Output fields
    transcript = gr.Textbox(label="🗣️ Transcript", lines=3, interactive=False)
    brief = gr.Textbox(label="📝 Brief Summary", lines=2)
    complaint = gr.Textbox(label="❓ Presenting Complaint", lines=2)
    history = gr.Textbox(label="📖 History", lines=3)
    assess = gr.Textbox(label="🩺 Diagnosis", lines=2)
    plan = gr.Textbox(label="🧾 Plan / Management", lines=2)
    safety = gr.Textbox(label="🛟 Safety-Netting & Follow-Up", lines=2)

    # Logic: pick either Drive or mic/upload
    def dispatch_input(uploaded_audio, drive_selection):
        if uploaded_audio is not None:
            return process_audio(uploaded_audio)
        elif drive_selection:
            full_path = os.path.join(GOOGLE_DRIVE_AUDIO_FOLDER, drive_selection)
            return process_audio(full_path)
        else:
            return "❌ Please provide or select an audio input.", "", "", "", "", "", ""

    generate_btn.click(fn=dispatch_input,
                       inputs=[mic_or_upload, drive_file],
                       outputs=[transcript, brief, complaint, history, assess, plan, safety])

app.launch(debug=True)

It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://717e0e7d95c3a2e92e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
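`dispatch_input` gives the microphone/upload input priority over the Drive dropdown. The sketch below pulls that decision out of the Gradio callback so it can be checked without loading Whisper or Falcon. `choose_audio_source` and its bare-filename guard are hypothetical additions, not part of the app above.

```python
import os

def choose_audio_source(uploaded_audio, drive_selection, drive_folder):
    """Return ("upload", data), ("drive", path) or (None, None); mic/upload wins."""
    if uploaded_audio is not None:
        return "upload", uploaded_audio
    if drive_selection:
        # Hypothetical guard: accept only a bare filename from the dropdown,
        # so a value such as "../x.wav" cannot point outside drive_folder.
        if os.path.basename(drive_selection) != drive_selection:
            raise ValueError(f"Unexpected Drive selection: {drive_selection!r}")
        return "drive", os.path.join(drive_folder, drive_selection)
    return None, None
```

Inside `dispatch_input`, the returned value would be passed to `process_audio`, and the `(None, None)` case would map to the existing "Please provide or select an audio input" message.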


**SECTION: 7 - Save Results and Export CSV**

In [None]:
# Optional: collect summaries in memory and rewrite the CSV after each call
output_data = []

def save_result_to_dataset(summary, output_file="/content/synthetic_dataset.csv"):
    output_data.append(summary)
    df = pd.DataFrame(output_data)
    df.to_csv(output_file, index=False)
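`save_result_to_dataset` rebuilds the whole CSV from the in-memory list on every call. That list is lost whenever the Colab runtime restarts, and only the summary fields are stored. This sketch instead appends one row per `process_audio` result, including the transcript, and writes the header only when the file is new or empty. `append_result_row` and the `OUTPUT_COLUMNS` names are assumptions that mirror the seven values `process_audio` returns. Error messages returned in the transcript slot would be saved too, so filter them first if needed.

```python
import csv
import os

# Column order assumed to match the 7-tuple returned by process_audio
OUTPUT_COLUMNS = [
    "Transcript", "Brief Summary", "Presenting Complaint", "History",
    "Assessment / Working Diagnosis", "Plan / Management", "Safety-Netting & Follow-Up",
]

def append_result_row(outputs, output_file="/content/synthetic_dataset.csv"):
    """Append one process_audio() result as a CSV row, writing the header only once."""
    if len(outputs) != len(OUTPUT_COLUMNS):
        raise ValueError(f"Expected {len(OUTPUT_COLUMNS)} values, got {len(outputs)}")
    write_header = not os.path.exists(output_file) or os.path.getsize(output_file) == 0
    with open(output_file, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=OUTPUT_COLUMNS)
        if write_header:
            writer.writeheader()
        writer.writerow(dict(zip(OUTPUT_COLUMNS, outputs)))
```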


**SECTION: 8 - Upload Dataset to HF Hub**

In [None]:
from huggingface_hub import create_repo, upload_file

def upload_to_huggingface(csv_path, repo_id="your-username/synthetic-consultations"):
    # Assumes you’ve logged in via `notebook_login()` (Section 9); run that cell first
    create_repo(repo_id, repo_type="dataset", exist_ok=True)
    upload_file(repo_id=repo_id, path_or_fileobj=csv_path, path_in_repo="synthetic_dataset.csv", repo_type="dataset")
    print(f"✅ Uploaded to: https://huggingface.co/datasets/{repo_id}")
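An empty or malformed CSV would still be uploaded without complaint. The following is a minimal pre-upload check using only the standard library. It assumes you know which columns the dataset should contain, for example the summary headings written by `save_result_to_dataset`. `check_dataset_csv` is a hypothetical helper, not a Hugging Face API.

```python
import csv

def check_dataset_csv(csv_path, required_columns):
    """Return the number of data rows; raise ValueError if columns are missing or there are no rows."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []  # None for an empty file
        missing = [col for col in required_columns if col not in header]
        if missing:
            raise ValueError(f"Missing columns: {missing}")
        n_rows = sum(1 for _ in reader)
    if n_rows == 0:
        raise ValueError("CSV has a header but no data rows")
    return n_rows
```

For example, call `check_dataset_csv("/content/synthetic_dataset.csv", ["Brief Summary", "History"])` before `upload_to_huggingface(...)`.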


**SECTION: 9 - Authenticate with Hugging Face (run once, before uploading)**

In [None]:
from huggingface_hub import notebook_login
notebook_login()
