# Emergency Stroke Response Pipeline Demo

This Jupyter notebook demonstrates an end‑to‑end pipeline for processing an emergency call and patient data in the context of suspected stroke.  The pipeline includes:

1. **Audio transcription (ASR)** – converting spoken words from a call recording into text using an automatic speech recognition model (e.g. OpenAI Whisper or Mozilla DeepSpeech).
2. **Clinical summarisation** – generating a concise summary of the call transcript and other clinical notes using a biomedical language model (e.g. BioBART, ClinicalT5).
3. **Stroke detection** – applying machine‑learning models to speech or vital signs to estimate the likelihood of a stroke.  For demonstration we include code stubs for dysarthria (slurred speech) detection and vitals‑based classification.
4. **Timeline generation** – combining timestamped events (call, vitals, interventions) into a chronological timeline.
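
To make the flow between the four stages concrete, here is a minimal sketch of how they could be chained. The `PipelineResult` container and the `run_pipeline` function are hypothetical names introduced only for illustration, and the stage functions are placeholders, so nothing here depends on a model being installed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PipelineResult:
    """Container for the outputs of the four stages (hypothetical structure)."""
    transcript: str = ""
    summary: str = ""
    stroke_probability: float = 0.0
    timeline: List[Tuple[str, str]] = field(default_factory=list)


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    summarise: Callable[[str], str],
    detect_stroke: Callable[[str], float],
    events: List[Tuple[str, str]],
) -> PipelineResult:
    """Run the stages in order; each stage is passed in as a plain function."""
    transcript = transcribe(audio_path)        # Stage 1: ASR
    summary = summarise(transcript)            # Stage 2: summarisation
    probability = detect_stroke(audio_path)    # Stage 3: stroke detection
    timeline = sorted(events)                  # Stage 4: ISO timestamps sort chronologically
    return PipelineResult(transcript, summary, probability, timeline)


# Placeholder stages so the sketch runs without any model installed.
demo = run_pipeline(
    "call.wav",
    transcribe=lambda path: "I think I need an ambulance.",
    summarise=lambda text: text[:40],
    detect_stroke=lambda path: 0.5,
    events=[
        ("2025-08-13T12:35:10", "Ambulance dispatched"),
        ("2025-08-13T12:30:45", "911 call received"),
    ],
)
print(demo.timeline[0])
```

Passing the stages in as functions keeps the orchestration independent of which ASR or summarisation model is eventually chosen.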

> **Note:** This notebook provides runnable code or commented-out stubs for each stage using open-source libraries.  The heavy models (ASR and LLM summarisation) require external packages and may need GPU acceleration to run efficiently.  Uncomment the installation command in the Setup section to prepare the environment when running this notebook.  Sample data is provided or generated synthetically where appropriate; you may replace it with your own recordings, clinical notes and vitals.


## 1. Setup

This section installs and imports the required libraries.  To keep the notebook light, the installation command is commented out; remove the leading `# ` to install the packages in your own environment (internet access is required).

- `openai-whisper` for speech-to-text transcription (PyTorch).
- `deepspeech` for an alternative TensorFlow-based ASR engine.
- `transformers` and `sentencepiece` for biomedical language models.
- `torchaudio`, `librosa` and `scikit-learn` for signal processing and machine learning.
- `pandas`, `numpy` and `matplotlib` for data manipulation and visualisation.
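
Before running the heavier cells, it can help to check which of these packages are actually importable. The sketch below uses only the standard library; the `missing_modules` helper is a hypothetical name. Note that import names differ from pip names for some packages (`openai-whisper` installs `whisper`, `scikit-learn` installs `sklearn`).

```python
import importlib.util

# Import names (not pip names) of the packages listed above.
REQUIRED_MODULES = [
    "whisper", "transformers", "sentencepiece", "torchaudio",
    "librosa", "sklearn", "pandas", "numpy", "matplotlib",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing or "none")
```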


In [1]:
# !pip install -q openai-whisper deepspeech transformers sentencepiece torchaudio librosa scikit-learn matplotlib pandas numpy

import os
import numpy as np
import pandas as pd

# Heavy libraries: these imports only succeed once the packages above are installed
import whisper
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
# from deepspeech import Model as DeepSpeechModel
import librosa
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier


## 2. Stage 1 – Audio Transcription

In this stage we convert an emergency call recording into text using an automatic speech recognition (ASR) model.  Two popular open‑source options are:

- **OpenAI Whisper** – a state-of-the-art ASR model trained on a large multilingual corpus.  It is easy to use via the `openai-whisper` Python package.  The `tiny`, `base`, `small`, `medium` or `large` model size can be selected depending on resource availability.
- **Mozilla DeepSpeech / Coqui STT** – a TensorFlow-based speech-to-text engine that can run offline.  A pre-trained acoustic model (`.pbmm` file) is required; an external scorer (`.scorer` language model file) is optional but generally improves accuracy.

Below is example code for both methods.  Replace `call.wav` (Whisper example) and `sample_audio.wav` (DeepSpeech example) with the path to your own recording.  Whisper automatically handles resampling and language detection; DeepSpeech expects a 16 kHz mono WAV file.


In [2]:
#--- Whisper ASR ---
audio_path = 'call.wav'  # Path to your call recording (WAV/MP3/OGG)
whisper_model = whisper.load_model('base')  # Options: tiny, base, small, medium, large
result = whisper_model.transcribe(audio_path)
transcript = result['text']
print('Whisper transcript:')
print(transcript)


Whisper transcript:
 What do I think? I think I need an ambulance. I've got a teenager who's turning purple in his face. I think he's soaked out or something.
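
Besides `text`, the dictionary returned by Whisper's `transcribe()` also contains a `segments` list whose entries carry `start` and `end` times in seconds, which is useful for the timeline stage later on. The sketch below formats such segments; `format_segments` is a hypothetical helper, and the example dictionary uses made-up timings purely to show the shape of the data.

```python
def format_segments(result):
    """Turn Whisper-style segments into 'MM:SS-MM:SS text' lines."""
    lines = []
    for seg in result.get("segments", []):
        start, end = int(seg["start"]), int(seg["end"])
        stamp = f"{start // 60:02d}:{start % 60:02d}-{end // 60:02d}:{end % 60:02d}"
        lines.append(f"{stamp} {seg['text'].strip()}")
    return lines


# Hypothetical result with illustrative timings, shaped like transcribe() output.
example_result = {
    "text": " What do I think? I think I need an ambulance.",
    "segments": [
        {"start": 0.0, "end": 2.4, "text": " What do I think?"},
        {"start": 2.4, "end": 65.0, "text": " I think I need an ambulance."},
    ],
}
for line in format_segments(example_result):
    print(line)
```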


In [3]:
# --- DeepSpeech ASR ---
# Ensure you have downloaded the .pbmm model file and .scorer language model file
# Example download commands (run in a terminal):
#   wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
#   wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer

# Uncomment to load and run DeepSpeech
# ds_model_path = 'deepspeech-0.9.3-models.pbmm'
# ds_scorer_path = 'deepspeech-0.9.3-models.scorer'
# ds = DeepSpeechModel(ds_model_path)
# ds.enableExternalScorer(ds_scorer_path)
#
# # Load a 16 kHz mono WAV file with the standard-library wave module
# import wave
# with wave.open('sample_audio.wav', 'rb') as fin:
#     sample_rate = fin.getframerate()
#     frames = fin.getnframes()
#     # stt() expects a NumPy array of 16-bit integers rather than raw bytes
#     audio = np.frombuffer(fin.readframes(frames), dtype=np.int16)
#
# assert sample_rate == 16000, 'DeepSpeech requires a 16 kHz sample rate'
# transcript = ds.stt(audio)
# print('DeepSpeech transcript:')
# print(transcript)
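
Because DeepSpeech expects a 16 kHz mono WAV file, it may be worth validating a recording before transcription. The sketch below does this with the standard-library `wave` module. The helper names are hypothetical, and the 16-bit sample width check is an added assumption beyond the sample-rate and channel requirements stated above.

```python
import os
import tempfile
import wave


def wav_properties(path):
    """Return (sample_rate, channels, sample_width_bytes) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_deepspeech_ready(path):
    """True if the file is 16 kHz, mono, 16-bit PCM (16-bit is an assumption)."""
    return wav_properties(path) == (16000, 1, 2)


# Write one second of silence as a 16 kHz mono 16-bit file and check it.
demo_path = os.path.join(tempfile.gettempdir(), "silence_16k.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

print(wav_properties(demo_path), is_deepspeech_ready(demo_path))
```

A file that fails the check can be resampled first, for example by loading it with `librosa.load(path, sr=16000)` as done in the dysarthria stub.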


## 3. Stage 2 – Clinical Summarisation

Once we have the raw transcript of the call, we summarise it into a concise clinical description using a biomedical language model.  Open-source transformer models pre-trained on biomedical or clinical text include:

- **BioBART** (`GanjinZero/biobart-large`) – a BART-based model pre-trained on biomedical literature (PubMed abstracts).  The base checkpoint is not fine-tuned for summarisation, so it may need task-specific fine-tuning to produce useful summaries.
- **ClinicalT5** – a T5 model adapted to clinical notes.  Some checkpoints require PhysioNet credentials.

The following example uses HuggingFace Transformers to load a BioBART model and create a summarisation pipeline.  You can substitute another model name from the HuggingFace Hub if desired.


In [4]:
# --- Clinical summarisation ---
model_name = 'GanjinZero/biobart-large'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
summariser = pipeline('summarization', model=model, tokenizer=tokenizer)

# Use the transcript produced by the ASR stage as input
sample_transcript = transcript
summary = summariser(sample_transcript, max_length=10, min_length=5)

print('Summary:')
print(summary[0]['summary_text'])

Device set to use cuda:0
Both `max_new_tokens` (=256) and `max_length`(=10) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Summary:
What do I think? I think I need an ambulance. I've got a teenager who's turning purple in his face. I think he's soaked out or something.
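
In the recorded output above, the printed summary is identical to the transcript, so no real summarisation took place (possibly because the checkpoint is a general pre-trained model rather than one fine-tuned for summarisation). A simple guard can flag this case automatically. The sketch below uses the standard-library `difflib`; the `looks_like_echo` helper, its 0.9 threshold and the second example summary are all hypothetical.

```python
from difflib import SequenceMatcher


def looks_like_echo(source, summary, threshold=0.9):
    """Flag summaries that are nearly identical to their source text."""
    ratio = SequenceMatcher(None, source.strip().lower(), summary.strip().lower()).ratio()
    return ratio >= threshold


transcript_text = (
    "What do I think? I think I need an ambulance. I've got a teenager "
    "who's turning purple in his face. I think he's soaked out or something."
)
# The summary recorded above simply repeats the transcript.
print(looks_like_echo(transcript_text, transcript_text))
# A hypothetical, genuinely shorter summary is not flagged.
print(looks_like_echo(transcript_text, "Caller requests ambulance for teenager turning purple."))
```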


## 4. Stage 3 – Stroke Detection

In this stage we estimate the likelihood of a stroke based on either speech characteristics or physiological vital signs.  Two example approaches are illustrated:

1. **Speech‑based dysarthria detection:** Slurred or impaired speech is a hallmark of stroke.  You can extract acoustic features (e.g. Mel‑frequency cepstral coefficients) from the audio and train a classifier (e.g. SVM, Random Forest) on a dysarthric speech dataset such as TORGO or UASpeech.
2. **Vitals‑based classification:** Using vital signs like blood pressure, heart rate and oxygen saturation, train a classifier (e.g. Random Forest) to predict whether a patient has had a stroke.  Suitable training data could come from open ICU datasets (e.g. MIMIC) where stroke diagnosis labels are available.

Below are illustrative code snippets showing how to structure these models.  You will need to provide training data and may wish to fine‑tune hyperparameters.


In [11]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("asjad99/mimiciii")

print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/asjad99/mimiciii?dataset_version_number=1...


Path to dataset files: /home/sraja/.cache/kagglehub/datasets/asjad99/mimiciii/versions/1
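
Once the download has finished, a quick way to see what the dataset contains is to list its files and sizes. This is a minimal standard-library sketch; `list_files` is a hypothetical helper, and `dataset_dir` should be set to the `path` value returned by `kagglehub.dataset_download()` (the current directory is used here only so the snippet runs on its own).

```python
import os


def list_files(root):
    """Return (relative_path, size_in_bytes) for every file under root, sorted by path."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append((os.path.relpath(full, root), os.path.getsize(full)))
    return sorted(found)


# Placeholder: replace "." with the `path` returned by kagglehub.dataset_download().
dataset_dir = "."
for rel_path, size in list_files(dataset_dir)[:20]:
    print(f"{size:>12,d}  {rel_path}")
```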


In [5]:
# --- Speech-based dysarthria detection ---
# This is a stub demonstrating how you might implement dysarthria (slurred speech) detection
# using librosa for feature extraction and scikit-learn for classification.

import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def extract_features(audio_path, n_mfcc=13):
    """Extract MFCC feature vector from an audio file."""
    y, sr = librosa.load(audio_path, sr=16000)
    # Compute MFCCs
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Take mean over time axis
    return np.mean(mfccs, axis=1)

# # Load training data (X_train: list of MFCC feature vectors, y_train: labels 1=dysarthric, 0=healthy)
# X_train = ...
# y_train = ...

# # Train classifier
# clf = SVC(kernel='rbf', probability=True)
# clf.fit(X_train, y_train)

# # Predict dysarthria probability on new audio (used here as a proxy for stroke likelihood)
# X_test = extract_features('call.wav').reshape(1, -1)
# prob = clf.predict_proba(X_test)[0, 1]
# print(f'Speech-based stroke probability: {prob:.2%}')
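
To show the training and prediction steps end to end without a dysarthric speech dataset, the sketch below substitutes synthetic 13-dimensional vectors for the mean-MFCC features. The data is random and carries no acoustic or clinical meaning. Adding a `StandardScaler` is a design choice made here because RBF SVMs are sensitive to feature scale; it is not part of the stub above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for mean-MFCC vectors (13 values per recording).
# Real features would come from extract_features() on a labelled dataset.
healthy = rng.normal(loc=0.0, scale=1.0, size=(100, 13))
dysarthric = rng.normal(loc=1.0, scale=1.0, size=(100, 13))
X = np.vstack([healthy, dysarthric])
y = np.array([0] * 100 + [1] * 100)  # 1 = dysarthric, 0 = healthy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale features, then fit an RBF SVM with probability estimates enabled.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True, random_state=0))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
prob = clf.predict_proba(X_test[:1])[0, 1]
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
print(f"Dysarthria probability for first held-out sample: {prob:.2%}")
```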


In [6]:
# --- Vitals-based stroke classification ---
# This demonstrates training a Random Forest on vital sign features to predict stroke.
# Replace this stub with your own implementation and dataset.

# import pandas as pd
# from sklearn.ensemble import RandomForestClassifier
# from sklearn.metrics import accuracy_score

# # Example training data: rows correspond to patients, columns to features such as blood pressure (bp),
# # heart rate (hr) and oxygen saturation (spo2). 'label' indicates whether the patient had a stroke.
# # Fill in the lists below with your own data before uncommenting.
# training_data = pd.DataFrame({
#     'bp': [...],
#     'hr': [...],
#     'spo2': [...],
#     'label': [...]
# })

# X_train = training_data[['bp', 'hr', 'spo2']]
# y_train = training_data['label']

# # Train classifier
# rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
# rf_clf.fit(X_train, y_train)

# # Predict stroke on new patient vitals
# new_vitals = pd.DataFrame({'bp': [170], 'hr': [95], 'spo2': [92]})
# prob = rf_clf.predict_proba(new_vitals)[0, 1]
# print(f'Vitals-based stroke probability: {prob:.2%}')
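
The stub above can be exercised with synthetic data to check that the code path works before real training data is available. In the sketch below the vitals are drawn from arbitrary normal distributions and the labels are random, so the held-out accuracy should sit near chance and the printed probability has no clinical meaning.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Synthetic vitals with random labels: purely illustrative, no clinical meaning.
training_data = pd.DataFrame({
    "bp": rng.normal(140, 20, n),    # systolic blood pressure (mmHg)
    "hr": rng.normal(85, 12, n),     # heart rate (beats per minute)
    "spo2": rng.normal(95, 2, n),    # oxygen saturation (%)
    "label": rng.integers(0, 2, n),  # 1 = stroke, 0 = no stroke
})

features = ["bp", "hr", "spo2"]
X_train, X_test, y_train, y_test = train_test_split(
    training_data[features], training_data["label"], test_size=0.25, random_state=42
)

rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train, y_train)
print(f"Held-out accuracy (labels are random): {rf_clf.score(X_test, y_test):.2f}")

# Predict on new patient vitals, using the same column names as in training.
new_vitals = pd.DataFrame({"bp": [170], "hr": [95], "spo2": [92]})
prob = rf_clf.predict_proba(new_vitals)[0, 1]
print(f"Vitals-based stroke probability: {prob:.2%}")
```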


## 5. Stage 4 – Timeline Generation

The final stage stitches together timestamped events from the call transcript, vital signs and interventions into an ordered timeline.  This helps clinicians visualise the patient’s journey from the first call to treatment.  We will use `pandas` to merge and sort events, then display a textual timeline.  You could also generate a Gantt chart using `matplotlib`.


In [7]:
# --- Timeline generation example ---
# Create sample event tables for call, vitals and interventions and merge them into an ordered timeline.

import pandas as pd

# Example call events
call_events = pd.DataFrame([
    {'time': '2025-08-13T12:30:45', 'event': '911 call received'},
    {'time': '2025-08-13T12:35:10', 'event': 'Ambulance dispatched'},
    {'time': '2025-08-13T12:50:00', 'event': 'Ambulance arrives'},
] )

# Example vital signs measurements
vital_events = pd.DataFrame([
    {'time': '2025-08-13T12:50:30', 'event': 'BP 180/95, HR 100, SPO2 93'},
    {'time': '2025-08-13T12:52:00', 'event': 'BP 185/100, HR 98, SPO2 90'},
] )

# Example interventions
intervention_events = pd.DataFrame([
    {'time': '2025-08-13T12:55:00', 'event': 'FAST exam positive'},
    {'time': '2025-08-13T13:10:00', 'event': 'Arrived at ER'},
] )

# Combine all events, parse the timestamps and sort chronologically
events = pd.concat([call_events, vital_events, intervention_events], ignore_index=True)
events['time'] = pd.to_datetime(events['time'])
events = events.sort_values('time').reset_index(drop=True)

# Display timeline
for _, row in events.iterrows():
    print(f"{row['time'].strftime('%Y-%m-%d %H:%M:%S')} - {row['event']}")


2025-08-13 12:30:45 - 911 call received
2025-08-13 12:35:10 - Ambulance dispatched
2025-08-13 12:50:00 - Ambulance arrives
2025-08-13 12:50:30 - BP 180/95, HR 100, SPO2 93
2025-08-13 12:52:00 - BP 185/100, HR 98, SPO2 90
2025-08-13 12:55:00 - FAST exam positive
2025-08-13 13:10:00 - Arrived at ER
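
Once the events are sorted, intervals such as call-to-arrival time follow directly from the timestamps. The sketch below rebuilds a three-event subset of the sample timeline so it runs on its own; the column names `since_call_min` and `gap_min` are hypothetical.

```python
import pandas as pd

events = pd.DataFrame([
    {"time": "2025-08-13T12:30:45", "event": "911 call received"},
    {"time": "2025-08-13T12:50:00", "event": "Ambulance arrives"},
    {"time": "2025-08-13T13:10:00", "event": "Arrived at ER"},
])
events["time"] = pd.to_datetime(events["time"])
events = events.sort_values("time").reset_index(drop=True)

# Minutes elapsed since the first event, and the gap to the previous event.
events["since_call_min"] = (events["time"] - events["time"].iloc[0]).dt.total_seconds() / 60
events["gap_min"] = events["time"].diff().dt.total_seconds() / 60

print(events[["event", "since_call_min", "gap_min"]])
```

With these sample timestamps, the interval from the call to arrival at the ER is 39.25 minutes.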
