Use-case stories and requirements for waveforms #7

briangow · 2024-01-31T14:35:14Z
Maintainer

Please use this discussion to submit stories of use-cases for waveform data from CHoRUS. These stories will be used to develop a comprehensive list of requirements for the CHoRUS waveform format / specification. Please see an initial set of stories with associated requirements below.

When submitting a new story, please fill out the "As a:", "I need to:", and "So that:" fields. If you also know of associated waveform requirements for your story, please provide them. If your requirement overlaps with a requirement term already posted to the table, please use the existing phrasing (e.g. use "Signal availability by time" instead of "Know signal types at a given time"). If you aren't sure about the waveform requirements for your story, feel free to leave this field blank and we will fill it in. Please add a new post below and I will update the table.

ID	Story	Waveform requirements
S1	As a: researcher designing a study, I need: to identify all patients within CHoRUS that meet the following criteria: 1) Have at least ECG, ABP, and ICP waveforms, 2) ICP measured from brain ventricles, 3) Having at least XX hours of ICP where pulsatile waveform exists, So that: I can determine if there are enough matching subjects in CHoRUS to power the study I want to conduct	Metadata storage - connections to EHR device info, Multi-channel support, Signal availability by time, Standardized naming conventions, Standardized units, Cached/precomputed statistics
S2	As a: physician researching sepsis, I need: to select a patient who went into septic shock - identify an instance of vasopressor administration - visualize waveforms within XX minutes of administration, So that: I can train residents on the relation between monitor waveforms and septic shock	Metadata storage, Random access, Efficient rendering
S3	As a: physicist developing computational algorithms on physiological waveforms, I need: to search for a cohort of patients who each have at least XX number of PVC beats, So that: I can compute heart rate turbulence	Metadata storage, Numerics support, Cached/precomputed statistics
S4	As a: Medical Engineering and Medical Physics student, I need: a plugin/toolbox which allows me to compute sample entropy on HR time series, So that: I can create a presentation on sample entropy analysis of heart rates	Numerics support
S5	As a: sleep researcher, I need: to analyze EEG and ICP waveforms, So that: I can determine if EEG + ICP waveforms may predict disease X	Variable length support (up to months), Multi-channel support
S6	As a: medical device researcher, I need: monitor waveform data at the original sample rate with the resolution specified, So that: I can compare monitors' roles in disease algorithm prediction models	Original signal preservation, Metadata support (w/ resolution)
S7	As a: the admin creating budgets for CHoRUS, I need: waveform data to take up as little storage space as possible, So that: we can afford to pay for cloud storage and provide waveform data in CHoRUS to the community	Efficient compression, Efficient gap storage
S8	As a: tool developer for CHoRUS, I need: efficient decompression of waveforms, So that: I can provide a performant waveform viewer which can quickly jump to waveform locations	Efficient decompression
S9	As a: developer for the cohort/patient data visualization CHoRUS app IntegratedViewer, I need: to be able to make RESTful API calls to get a given chunk of the continuous waveform file to build waveform panning/zooming/streaming features, So that: researchers/investigators can do physiological time series analysis and tag clinical events	Random access, File needs to be imputed+uncompressed OR chunked (short)+compressed+non-imputed. Note: per-sample time values are not needed if startTime and frequency are stored for imputed files
S10	As a: developer for the multi-user ML label-building CHoRUS app CADA, I need: to build various types of interactive waveform tagging features, So that: users can build their ML labels for projects like Signal Quality Index, Signal classification, ARDS dashboard, Chart review, NLP NER and many more	Random access, File needs to be imputed+uncompressed OR chunked (short)+compressed+non-imputed, Optional sort channels, Select channels/signals, Data can be preprocessed, Up/down sampling ability
S11	As a: tools developer, I need: to rapidly access blocks of the signals, corresponding to segment/time periods, sequentially within blocks, and in parallel across blocks, and concurrently access multiple channels, with minimal overhead for decompression if present, So that: I can minimize random IO and IO/Bandwidth while processing part or whole waveforms	Random access of data blocks based on timestamp, Rapid block location of multiple channels, Sufficient metadata in each block, Simple compression that is independent between blocks
S12	As a: casual data user, I need: to be able to parse the data with minimal documentation and tool stack, So that: I can read and process waveforms with minimal software dependencies	Self-describing and self-contained metadata schema
S13	As a: clinical/signal processing engineer, I need: to identify episodes of acute cardiac ischemia, infarction, arrhythmia, and cardiac arrest events, So that: I can associate treatments and identify degradation in patient condition during the ICU stay	Ability to locate windows of multi-modal waveform measurements (Q, ST, T-wave, ABP, Resp, O2) around date/time of treatment events, Ability to identify cardiac events (onset of acute ischemia, ST elevation events, tachycardia, etc.) then query treatment events before and after cardiac event onset/offset, Ability to identify study cohorts via OMOP, then identify cardiac events (or lack thereof) in disease and control groups, Ability to re-analyze or apply predictive analytics models on waveform data before and after event date/times
S14	As a: ML researcher, I need: an interface for implementing interpolation, So that: I can easily build pipelines from source data that is captured at site-specific frequencies while maintaining control over the interpolation mechanism	Original signal preservation, software support for interpolation (optional)
S15	As a: data engineer, I need: the ability to ingest data with multiple sample rates, So that: I can store data with different input sample rates into a single patient record	Multi-frequency support
S16	As a: Neurologist studying autonomic nervous system dysfunction, I need: to be able to update QRS annotations in the CHoRUS data because I noticed that the pre-calculated RR intervals for some of the patients in my cohort of CHoRUS patients were based on incorrect QRS locations. I have found the associated ECG waveform data and corrected the QRS locations with a different algorithm. I manually reviewed the updated QRS locations, So that: I can share the corrected QRS annotations and therefore the CHoRUS community can benefit from corrected R-R intervals	TBD
S17	As a: Researcher of sepsis, I need: 1) to be able to engage with the CHoRUS community; 2) to test and benchmark sepsis prediction models; 3) to assign a resource ID to a cohort of CHoRUS patients that I selected using the CHoRUS/OHDSI computational phenotyping in the Integrated Viewer / IVe app. I finalized the cohort by performing a chart review using the CHoRUS CADA app. The patients in this cohort include those from data recently added to CHoRUS, So that: we can build a system in CHoRUS for continuously benchmarking models on specified cohorts	TBD
S18	As a: Site engineer for CHoRUS, I need: to be able to convert from the native format of my waveforms to the CHoRUS waveform format, So that: I can provide waveforms in the desired format.	Conversion tools
S19	As a: physiologist studying changes in cardiac contractility, I need: to identify patients with a radial arterial pressure waveform recorded with a bandwidth of at least 40 Hz, So that: I can determine the systolic upslope with adequate precision	Device filter setting
S20	As a: research engineer studying the robustness of my AI/ML algorithms, I need: to identify sets of patients that used different monitoring equipment, So that: I can determine if my algorithm performance is affected by the monitoring devices that were used	Make/model of sensor and patient monitor
S21	As a: research engineer developing hemodynamic measurement algorithms from photoplethysmographic waveforms, I need: to identify waveforms recorded at different anatomical locations, So that: I can evaluate if the algorithms' performance varies at the different locations	Sensor type, Anatomical placement, Make/model of sensor and patient monitor
S22	As a: central quality engineer with a suite of heuristics that I rely on for a quick check on the quality/completeness of uploaded files, I need: to check for things such as: 1) monitoring time with at least one ECG lead on should be the longest among all available waveform channels; 2) average heart rate calculated for a given file should be within a physiologically sound range; 3) ECG, impedance, and SpO2 channels should always be available, So that: I can alert the submitter of the uploaded files of issues in a timely manner if the quality/completeness checks fail.	TBD
S23	As a: CHoRUS central data quality engineer, I need: to be able to easily identify and access any potential protected health information (PHI) in waveform files even if they have already been converted to the final CHoRUS waveform format, So that: we can identify sources of PHI prior to releasing the data, and work with local sites to further de-identify the waveforms as needed	TBD
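
For illustration only: the "Signal availability by time" and "Cached/precomputed statistics" requirements in S1 could be served by a per-patient index of intervals during which each standardized signal is present. The Python sketch below assumes such an index (a hypothetical structure, not part of any agreed CHoRUS format) with times in hours, and answers only the availability part of S1; the ventricular-placement and pulsatility criteria would need additional metadata or derived annotations.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: patient ID -> standardized signal name -> list of
# (start, end) intervals, in hours, during which that signal was recorded.
Interval = Tuple[float, float]
AvailabilityIndex = Dict[str, Dict[str, List[Interval]]]


def total_hours(intervals: List[Interval]) -> float:
    """Total covered time, merging overlapping intervals so nothing is double-counted."""
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged interval
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend it
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def matching_patients(index: AvailabilityIndex, required: List[str],
                      signal: str, min_hours: float) -> List[str]:
    """Patients that have every required signal and at least min_hours of `signal`."""
    return sorted(
        pid for pid, sigs in index.items()
        if all(s in sigs for s in required)
        and total_hours(sigs.get(signal, [])) >= min_hours
    )


# Hypothetical example data.
index: AvailabilityIndex = {
    "patient-A": {"ECG": [(0, 48)], "ABP": [(0, 48)], "ICP": [(0, 10), (5, 30)]},
    "patient-B": {"ECG": [(0, 10)], "ICP": [(0, 40)]},
}
print(matching_patients(index, ["ECG", "ABP", "ICP"], "ICP", 24.0))  # ['patient-A']
```

Precomputing `total_hours` per patient and signal when data are ingested would turn this into a simple lookup at query time.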

tcpan · 2024-02-01T20:59:30Z
Collaborator

ID	User Story	Requirements
S11	As a: tools developer, I need: to rapidly access blocks of the signals, corresponding to segment/time periods, sequentially within blocks, and in parallel across blocks, and concurrently access multiple channels, with minimal overhead for decompression if present, So that: I can minimize random IO and IO/Bandwidth while processing part or whole waveforms	Random access of data blocks based on timestamp, rapid block location of multiple channels, sufficient metadata in each block, simple compression that is independent between blocks.
S12	As a: casual data user, I need: to be able to parse the data with minimal documentation and tool stack, So that: I can read and process waveforms with minimal software dependencies	Self-describing and self-contained metadata schema.

wa6gz · 2024-02-05T18:26:33Z

wa6gz
Feb 5, 2024
Collaborator

Cases presented at Thursday's meeting

ID	User Story	Requirements
S14	As a: ML researcher, I need: an interface for implementing interpolation, So that: I can easily build pipelines from source data that is captured at site-specific frequencies while maintaining control over the interpolation mechanism	Original signal preservation, software support for interpolation (optional)
S15	As a: data engineer, I need: the ability to ingest data with multiple sample rates, So that: I can store data with different input sample rates into a single patient record	Multi-frequency support

briangow · 2024-02-08T14:01:38Z

briangow
Feb 8, 2024
Maintainer Author

@tompollard, @bemoody, and I have organized and updated the waveform requirements for each use-case story. We tried to limit the requirements to only things that may affect the waveform format / specification. @hulab-ucsf, @del42, @manlik-brownsrdr, @tcpan, @wa6gz please let me know if we missed anything or if you have any suggested edits.

ID	Story	Waveform requirements
S1	As a: researcher designing a study, I need: to identify all patients within CHoRUS that meet the following criteria: 1) Have at least ECG, ABP, and ICP waveforms, 2) ICP measured from brain ventricles, 3) Having at least XX hours of ICP where pulsatile waveform exists, So that: I can determine if there are enough matching subjects in CHoRUS to power the study I want to conduct	Metadata: detailed procedure information included/linked in format, Metadata: signal name, Metadata: signal units, Metadata: standardized signal names, Metadata: standardized signal units, Metadata: Index which provides information about availability of signals by time, Signals: Multi-channel support
S2	As a: physician researching sepsis, I need: to select a patient who went into septic shock - identify an instance of vasopressor administration - visualize waveforms within XX minutes of administration, So that: I can train residents on the relation between monitor waveforms and septic shock	Metadata: able to access detailed metadata outside of the waveform record, Signals: random access, Efficiency: fast read times, Integration: Link times from EHR to waveform times
S3	As a: physicist developing computational algorithms on physiological waveforms, I need: to search for a cohort of patients who each have at least XX number of PVC beats, So that: I can compute heart rate turbulence	Annotations: support storing/update of derived statistics (at signal or record level), Annotations: point/beat annotations, Annotations: review/correct, Annotations: method for supporting annotations
S4	As a: Medical Engineering and Medical Physics student, I need: a plugin/toolbox which allows me to compute sample entropy on HR time series, So that: I can create a presentation on sample entropy analysis of heart rates	Signals: irregular sample interval support (e.g. numerics), Signals: support for low time resolution (e.g. numerics), Documentation: guidance for using the format, Community: support and validation of the format, Compatibility: with existing toolboxes, Usability: easy to use and understand
S5	As a: sleep researcher, I need: to analyze EEG and ICP waveforms, So that: I can determine if EEG + ICP waveforms may predict disease X	Duplicate of S1 except: Signals: variable length support (up to months)
S6	As a: medical device researcher, I need: monitor waveform data at the original sample rate with the resolution specified, So that: I can compare monitors' roles in disease algorithm prediction models	Signals: original signal preservation, Metadata: ability to record resolution
S7	As a: the admin creating budgets for CHoRUS, I need: waveform data to take up as little storage space as possible, So that: we can afford to pay for cloud storage and provide waveform data in CHoRUS to the community	Efficiency: compression over extended time periods with gaps
S8	As a: tool developer for CHoRUS, I need: efficient decompression of waveforms, So that: I can provide a performant waveform viewer which can quickly jump to waveform locations	Efficiency: fast decompression, Efficiency: fast read times, Signals: random access
S9	As a: developer for the cohort/patient data visualization CHoRUS app IntegratedViewer, I need: to be able to make RESTful API calls to get a given chunk of the continuous waveform file to build waveform panning/zooming/streaming features, So that: researchers/investigators can do physiological time series analysis and tag clinical events	Signals: random access, Efficiency: file needs to be imputed+uncompressed OR chunked (short)+compressed+non-imputed. Note: per-sample time values are not needed if startTime and frequency are stored for imputed files.
S10	As a: developer for the multi-user ML label-building CHoRUS app CADA, I need: to build various types of interactive waveform tagging features, So that: users can build their ML labels for projects like Signal Quality Index, Signal classification, ARDS dashboard, Chart review, NLP NER and many more	Signals: random access, Efficiency: file needs to be imputed+uncompressed OR chunked (short)+compressed+non-imputed, Efficiency: load selected channels without loading entire waveform record, Metadata: channel information including shape, Pre-processing: method for retrieving up/down sampled data
S11	As a: tools developer, I need: to rapidly access blocks of the signals, corresponding to segment/time periods, sequentially within blocks, and in parallel across blocks, and concurrently access multiple channels, with minimal overhead for decompression if present, So that: I can minimize random IO and IO/Bandwidth while processing part or whole waveforms	Signals: random access (of data blocks based on timestamp), Efficiency: rapid block location of multiple channels, Metadata: information about compressed blocks (channels, time)
S12	As a: casual data user, I need: to be able to parse the data with minimal documentation and tool stack, So that: I can read and process waveforms with minimal software dependencies	Documentation: self-describing and self-contained metadata schema, Community: strong support for format
S13	As a: clinical/signal processing engineer, I need: to identify episodes of acute cardiac ischemia, infarction, arrhythmia, and cardiac arrest events, So that: I can associate treatments and identify degradation in patient condition during the ICU stay	Annotations: support storage of derived statistics, Integration: link times from EHR to waveform times, Signals: random access
S14	As a: ML researcher, I need: an interface for implementing interpolation, So that: I can easily build pipelines from source data that is captured at site-specific frequencies while maintaining control over the interpolation mechanism	Signals: original signal preservation
S15	As a: data engineer, I need: the ability to ingest data with multiple sample rates, So that: I can store data with different input sample rates into a single patient record	Signals: independent frequency support by channel
S16	As a: Neurologist studying autonomic nervous system dysfunction, I need: to be able to update QRS annotations in the CHoRUS data because I noticed that the pre-calculated RR intervals for some of the patients in my cohort of CHoRUS patients were based on incorrect QRS locations. I have found the associated ECG waveform data and corrected the QRS locations with a different algorithm. I manually reviewed the updated QRS locations, So that: I can share the corrected QRS annotations and therefore the CHoRUS community can benefit from corrected R-R intervals	Annotations: support storing/update of derived statistics (at signal or record level), Annotations: support storing/update of human generated annotations, Annotations: version control (by annotator, etc.) and selection of annotation sets
S17	As a: Researcher of sepsis, I need: 1) to be able to engage with the CHoRUS community; 2) to test and benchmark sepsis prediction models; 3) to assign a resource ID to a cohort of CHoRUS patients that I selected using the CHoRUS/OHDSI computational phenotyping in the Integrated Viewer / IVe app. I finalized the cohort by performing a chart review using the CHoRUS CADA app. The patients in this cohort include those from data recently added to CHoRUS, So that: we can build a system in CHoRUS for continuously benchmarking models on specified cohorts	Community: strong support for format
S18	As a: Site engineer for CHoRUS, I need: to be able to convert from the native format of my waveforms to the CHoRUS waveform format, So that: I can provide waveforms in the desired format.	Compatibility: with existing tools for conversion, Documentation: well defined specification

briangow · 2024-02-19T15:36:22Z

briangow
Feb 19, 2024
Maintainer Author

The requirements above were refined, categorized, and discussed at meetings to determine which of the requirements were considered mandatory. A summary of those decisions is posted below.

CHoRUS requirements for a waveform format

This document outlines CHoRUS requirements for a waveform format, based on the set of user stories identified previously. Note that these requirements relate to the format for distributing the data (i.e. made available to researchers via the CHoRUS platform), rather than intermediate formats used for acquisition, transfer, etc.

Signals

There are several core needs that relate directly to the storage of signals. A file format meeting the requirements below should be able to preserve all key information in a source file. The format should support:

ID	Name	User stories	Mandatory	Notes
RS1	multiple channels	S1	Yes
RS2	frequencies that differ by channel	S15	Yes
RS3	low time resolution signals (e.g. numerics)*	S4	Yes
RS4	irregular sample interval support (e.g. numerics)	S4	Yes	This is required for alarms, threshold changes, etc.
RS5	random access (including of compressed data blocks)	S2, S8, S9, S10, S11, S13	Yes
RS6	extended periods of time (up to months)	S5	Yes	up to 6 months
RS7	derived signals (e.g. upsampled, downsampled)	S10, S22	Yes
RS8	approaches for compensating for sample rate drift	S2, S13	Yes	For alignment between signal and EHR times. Store time at beginning and end?

*Time intervals > 15 minutes = low resolution (suitable for a relational database). Time intervals < 15 minutes but longer than the waveform sampling interval = medium resolution.
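
The RS8 note suggests storing the time at the beginning and end of a segment. A minimal sketch of how those two times could be used, assuming both are taken from the same reference clock as the EHR and that drift is approximately linear within a segment: the effective sample rate is estimated from the sample count, and each sample index is mapped to a time by linear interpolation. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def effective_sample_rate(start: datetime, end: datetime, n_samples: int) -> float:
    """Estimate the actual sampling rate (Hz) from the reference-clock times of
    the first and last samples, assuming drift is linear over the segment."""
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate a rate")
    duration = (end - start).total_seconds()
    if duration <= 0:
        raise ValueError("end time must be after start time")
    return (n_samples - 1) / duration


def sample_time(start: datetime, end: datetime, n_samples: int, index: int) -> datetime:
    """Map a sample index to reference-clock time by linear interpolation
    between the stored start (first sample) and end (last sample) times."""
    if n_samples < 2:
        raise ValueError("need at least two samples to interpolate")
    if not 0 <= index < n_samples:
        raise IndexError("sample index out of range")
    duration = (end - start).total_seconds()
    return start + timedelta(seconds=duration * index / (n_samples - 1))


# Hypothetical example: a device nominally sampling at 250 Hz records one hour
# by its own clock, but the reference clock shows 0.36 s more elapsed time.
start = datetime(2024, 1, 1, 0, 0, 0)
end = start + timedelta(hours=1, microseconds=360_000)
n = 250 * 3600 + 1
print(effective_sample_rate(start, end, n))  # slightly below the nominal 250 Hz
print(sample_time(start, end, n, 450_000))   # reference time of the midpoint sample
```

If drift is not linear over long recordings, the same idea can be applied piecewise by storing reference times at the boundaries of shorter segments.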

Metadata

By “metadata”, we mean any information that does not form part of the core signal data. The format should enable storage of (or a link to):

ID	Name	User stories	Mandatory	Notes
RM1	time index	S1	Yes
RM2	signal names	S1	Yes
RM3	signal units	S1	Yes
RM4	standardized signal names	S1	Yes
RM5	standardized signal units	S1	Yes
RM6	index for availability of signals by time	S1, S22	Optional	needs to be stored in a file or, preferably, in the database
RM7	signal information (e.g. length, gain, resolution, offset)	S6, S11	Yes
RM8	electronic health record data such as procedures and medications	S1, S2	No	needs to contain appropriate IDs, etc. for linking to EHR.
RM9	information about compressed signal blocks (channels, time)	S11	Yes	preferably stored in waveform record
RM10	provenance for derived signal	N/A	Yes	preferably stored in waveform record, should include algorithm info (w/ ref), parameter settings, device info.
RM11	monitor information (e.g. model info, filter info)	S19, S20, S21	Yes
RM12	sensor information (e.g. model info, anatomical placement)	S20, S21	Yes
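
As a sketch of how RM2-RM5, RM7, RM11 and RM12 might fit together for one channel, the dataclass below is hypothetical: its field names are illustrative, not a proposed schema. It assumes the WFDB-style convention in which stored integer samples are converted to physical units as (digital - baseline) / gain; a format that defines "offset" differently would change the formula.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ChannelInfo:
    """Hypothetical per-channel metadata record (field names are illustrative)."""
    name: str                 # signal name as recorded by the source device (RM2)
    units: str                # units as recorded (RM3)
    standard_name: str        # standardized signal name (RM4)
    standard_units: str       # standardized units (RM5)
    sample_rate_hz: float     # per-channel frequency (RS2)
    gain: float               # digital units per physical unit (RM7)
    baseline: int = 0         # digital value corresponding to 0 physical units (RM7)
    resolution_bits: Optional[int] = None  # ADC resolution (RM7, S6)
    monitor_model: Optional[str] = None    # patient monitor make/model (RM11)
    sensor_site: Optional[str] = None      # anatomical placement (RM12)

    def to_physical(self, digital: Sequence[int]) -> List[float]:
        """Convert stored integer samples to physical units."""
        return [(d - self.baseline) / self.gain for d in digital]


# Hypothetical arterial pressure channel.
abp = ChannelInfo("ART", "mmHg", "ABP", "mmHg", 125.0, gain=80.0, baseline=0,
                  resolution_bits=16, sensor_site="radial")
print(abp.to_physical([8000, 9600]))  # [100.0, 120.0]
```

Storing the original integers together with gain and baseline, rather than converted floating-point values, is one way to meet the "original signal preservation" need from S6.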

Annotations

Annotations are a type of metadata, but they are distinct from other metadata because they may be contributed by many independent users and may not be considered “ground truth”. The format (or a linked annotation format) should support:

ID	Name	User stories	Mandatory	Notes
RA1	point annotations	S3	Yes	(i.e. annotations that describe a single point in time, such as a beat).
RA2	segment annotations	S3	Yes	(i.e. a segment containing an event of interest)
RA3	signal-level annotations (e.g. RR interval)	S3, S16	Yes
RA4	record-level annotations (e.g. presence of arrhythmia)	S13	Yes
RA5	linkable to a signal file	S3	Yes
RA6	is able to record version number	S16	Yes
RA7	is able to record annotator	S16	Yes
RA8	records provenance of annotations (machine, human)	S16	Optional	including the algorithm version. Likely will use a tool that stores this in a separate file.
RA9	many-to-one relationship between annotation file(s) and a signal file	S3, S16	Yes
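
One way to satisfy RA5-RA9 is to treat each annotation file as a versioned set that points at a single signal record. The sketch below is hypothetical (class, field, and annotator names are illustrative); it models the S16 scenario, where corrected QRS annotations are stored as a new version alongside the original machine-generated set rather than overwriting it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AnnotationSet:
    record_id: str        # links this set to one signal record (RA5); many sets may share it (RA9)
    annotator: str        # who produced the set (RA7)
    version: int          # version number (RA6)
    source: str           # "human" or "machine" (RA8)
    algorithm: Optional[str] = None  # algorithm name/version if machine-generated (RA8)
    points: List[Tuple[int, str]] = field(default_factory=list)         # (sample, label), RA1
    segments: List[Tuple[int, int, str]] = field(default_factory=list)  # (start, end, label), RA2


def latest_set(sets: List[AnnotationSet], record_id: str,
               annotator: str) -> Optional[AnnotationSet]:
    """Return the highest-version set from one annotator for one record, if any."""
    candidates = [s for s in sets if s.record_id == record_id and s.annotator == annotator]
    return max(candidates, key=lambda s: s.version, default=None)


# Hypothetical example: machine QRS detections, then two rounds of human correction.
sets = [
    AnnotationSet("rec001", "qrs-detector", 1, "machine", algorithm="detector v1",
                  points=[(1000, "N"), (1250, "N")]),
    AnnotationSet("rec001", "reviewer-1", 1, "human", points=[(1003, "N"), (1251, "N")]),
    AnnotationSet("rec001", "reviewer-1", 2, "human", points=[(1003, "N"), (1252, "N")]),
]
print(latest_set(sets, "rec001", "reviewer-1").version)  # 2
```

Because earlier versions are kept, a user can still select the original machine-generated set by filtering on annotator and version.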

Efficiency

The format should be both efficient in terms of storage and data-loading, primarily to (a) minimize storage and transfer costs and (b) enable rendering of waveforms using browser-based tools. The format should support:

ID	Name	User stories	Mandatory	Notes
RE1	signal compression	S2, S7, S8, S9, S10, S11	Yes	including the handling of extended time periods with gaps
RE2	read segments of a compressed, chunked record without decompressing the entire record	S10	Yes
RE3	fast seek of compressed channels	S11	Yes
RE4	fast read times	S2, S8, S9, S10, S11	Yes
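
RE1-RE3 (together with S11) describe a common layout: the signal is split into blocks that are compressed independently, and a small index records where each block starts, in samples and in bytes. The Python sketch below assumes 16-bit integer samples and uses zlib purely as a stand-in compressor; a real format would likely choose a codec suited to physiological signals. A read binary-searches the index and decompresses only the blocks that overlap the requested range.

```python
import bisect
import struct
import zlib
from typing import List, NamedTuple, Tuple


class BlockEntry(NamedTuple):
    start: int   # index of the first sample in the block
    count: int   # number of samples in the block
    offset: int  # byte offset of the compressed block in the payload
    length: int  # compressed length in bytes


def write_blocks(samples: List[int], block_size: int) -> Tuple[bytes, List[BlockEntry]]:
    """Compress int16 samples in independent fixed-size blocks and build an index."""
    payload = bytearray()
    index: List[BlockEntry] = []
    for start in range(0, len(samples), block_size):
        chunk = samples[start:start + block_size]
        comp = zlib.compress(struct.pack(f"<{len(chunk)}h", *chunk))
        index.append(BlockEntry(start, len(chunk), len(payload), len(comp)))
        payload.extend(comp)
    return bytes(payload), index


def read_range(payload: bytes, index: List[BlockEntry], first: int, stop: int) -> List[int]:
    """Return samples[first:stop], decompressing only the blocks that overlap it."""
    starts = [e.start for e in index]  # sorted, so a binary search finds the first block
    i = max(bisect.bisect_right(starts, first) - 1, 0)
    out: List[int] = []
    while i < len(index) and index[i].start < stop:
        e = index[i]
        raw = zlib.decompress(payload[e.offset:e.offset + e.length])
        block = struct.unpack(f"<{e.count}h", raw)
        lo = max(first, e.start) - e.start
        hi = min(stop, e.start + e.count) - e.start
        out.extend(block[lo:hi])
        i += 1
    return out


samples = [(i * 37) % 2000 - 1000 for i in range(10_000)]  # synthetic test signal
payload, index = write_blocks(samples, block_size=1024)
print(len(index))  # 10 blocks; only 3 are decompressed for the read below
print(read_range(payload, index, 2500, 5000) == samples[2500:5000])  # True
```

Block size trades compression ratio (larger blocks) against the amount of data decompressed per seek (smaller blocks). For long records with gaps, storing a start time per block would allow gaps to be skipped rather than filled; that extension is not shown here.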

Usability

Importantly, the format should be easy for the AI research community to use and understand. It should have:

ID	Name	User stories	Mandatory	Notes
RU1	well-defined specification	S18	Yes
RU2	human readable metadata	S4, S12, S23	Yes	should be directly human readable or through a tool. Optional fields in the format should indicate if they are present or absent.
RU3	clear documentation	S4	Yes
RU4	guidance for researchers	S4	Yes	e.g. guidance on how to load and analyze data
RU5	compatible with tools for importing/exporting to common formats	S4, S18, S23	Optional
RU6	existing community for support	S4, S12, S17	Optional	check with skills for workforce development
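
RU2 asks for metadata that is human readable and that makes the presence or absence of optional fields explicit. A minimal sketch, assuming a JSON header and hypothetical field names: every optional field is always written, with null marking an absent value, so a reader inspecting the header in a text editor can see which optional fields are missing.

```python
import json
from typing import Any, Dict, List

# Hypothetical optional per-channel fields; a real schema would define its own list.
OPTIONAL_FIELDS = ("monitor_model", "sensor_model", "sensor_site", "filter_bandwidth_hz")


def header_json(record_id: str, channels: List[Dict[str, Any]]) -> str:
    """Serialize record metadata as indented JSON, writing every optional field
    explicitly (None becomes JSON null) so absent values are visible."""
    out = []
    for ch in channels:
        entry = {k: v for k, v in ch.items() if k not in OPTIONAL_FIELDS}
        for key in OPTIONAL_FIELDS:
            entry[key] = ch.get(key)  # missing key -> None -> null
        out.append(entry)
    return json.dumps({"record_id": record_id, "channels": out}, indent=2)


print(header_json("rec001", [
    {"name": "II", "units": "mV", "sample_rate_hz": 250, "sensor_site": "chest"},
]))
```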
