Generalize Display of Events #1

Open
guswelter opened this issue Jan 21, 2021 · 2 comments
Labels
enhancement (New feature or request), priority:high

Comments

@guswelter
Contributor

If you change line 291 of auviewer/file.py to use 'clinical/medications' instead of 'ehr/medications' and then run the viewer against the sample file (as outlined in the Getting Started with Development documentation), you will see a hard-coded display of this specific event series:

[Screenshot: hard-coded display of the medications event series]

The relevant places in code where this is done are:

  • getEvents() function in file.py on line 271
    • This function is called in the getInitialPayload() function in file.py, which is in turn called in the initial_file_payload() HTTP handler function in serve.py (which handles the initial_file_payload API request)
  • Within the File.prepareData() JS function in static/www/js/classes/file.js starting at line 931
  • File.renderEventGraphs() function in static/www/js/classes/file.js on line 1216

You can see the events data coming through in the initial_file_payload response in the browser console:

[Screenshot: browser console showing the events data in the initial_file_payload response]

We need to generalize this behavior to handle any event series. Actually, the JS code may already be fully or mostly generalized. The backend code, however, needs to change to identify and assemble all series that can be rendered as events.

The viewer uses the audata library to read the audata file format (audata is basically a file format layered on top of HDF5). A good place to start is familiarizing yourself with using the audata library to open the sample file. From the auviewer (outer) directory, try this Python (I have not checked it, but it should be something like this):

```python
import audata as aud
af = aud.File.open('examples/sample_patient.h5')

# print the default representation of the medications dataset
print(af['/clinical/medications'])

# pull the entire medications series as a pandas DataFrame, and then print it
df = af['/clinical/medications'].get()
print(df)
```

So, in general, the proof-of-concept code in the viewer that renders the medications event series treats each row in the medications table as one medication event.

My suggestion for now would be to start by adding a way to specify, in the template, a list of series to treat as events.
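A minimal sketch of that idea, assuming the template is parsed into a Python dict and using a hypothetical `eventSeries` key (the real template schema may look different), could read the configured paths like this. The function name `get_event_series_paths` is also hypothetical.

```python
from __future__ import annotations

import json

# Hypothetical template fragment. The "eventSeries" key is an assumption for
# illustration; it is not part of the existing auviewer template format.
EXAMPLE_TEMPLATE = """
{
  "eventSeries": ["clinical/medications", "/low_rate"]
}
"""


def get_event_series_paths(template: dict) -> list[str]:
    """Return the dataset paths that the template marks as event series.

    Paths are normalized to a leading "/" and de-duplicated in order.
    A template without the key yields no event series.
    """
    paths = template.get("eventSeries", [])
    if not isinstance(paths, list):
        raise TypeError("'eventSeries' must be a list of dataset paths")
    result: list[str] = []
    for path in paths:
        normalized = "/" + str(path).lstrip("/")
        if normalized not in result:
            result.append(normalized)
    return result


template = json.loads(EXAMPLE_TEMPLATE)
print(get_event_series_paths(template))
```

Each returned path could then be looked up in the audata file in place of the hard-coded medications and low_rate lookups.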

Later, we can do fancier things like:

  • Improve the visualization
  • Add different ways to represent/render events
  • Auto-magically try to guess which series are event series even if they haven't been specified
  • Allow the user to toggle between representation types (numeric series vs event series) in the browser
@alexwangtech

The clinical/medications vs. ehr/medications on line 291 isn't the only hard-coded thing, right? I saw low_rate being used on line 300, in the line ce = self.f['low_rate'][:]. I might still be a bit confused/unfamiliar with the application, but why does a medication have to be specified? Is there a structural thing with the HDF5 file type that makes it difficult to recognize all renderable series?

@guswelter
Contributor Author

That's true, 'low_rate' is hard-coded as well.

"Is there a structural thing with the HDF5 file type that makes it difficult to recognize all renderable series?"

Well, all series are "renderable," but the question is how to render them. Let's take the simplest single-column numeric time series:

Time Value
0    21
1    14
2    23
3    17

That could be displayed as a numeric time series (the viewer's current default: a dot plot with time on the x-axis and the numeric value on the y-axis). But it could also be displayed as an event series, where the value is the label. So the first event occurs at time 0 and its label is "21".
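As a rough sketch of the event interpretation, assuming the series comes back as a pandas DataFrame with columns named `time` and `value` (both names are assumptions), each row becomes one event whose label is the stringified value. The event dict shape and the `series_to_events` name are hypothetical, not the payload format the frontend actually expects.

```python
from __future__ import annotations

import pandas as pd


def series_to_events(df: pd.DataFrame, time_col: str = "time",
                     value_col: str = "value") -> list[dict]:
    """Treat a single-column series as events: one event per row, with the
    value converted to a string and used as the event label."""
    return [
        {"time": float(t), "label": str(v)}
        for t, v in zip(df[time_col], df[value_col])
    ]


# The example series from above.
df = pd.DataFrame({"time": [0, 1, 2, 3], "value": [21, 14, 23, 17]})
events = series_to_events(df)
print(events[0])  # {'time': 0.0, 'label': '21'}
```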

That's why I suggest starting by making it possible for users to specify, in the template file, which series to render as event series. Speaking of which, here is a sample template file. Put it in the global_templates folder in the viewer data folder and it will change how the sample file displays; you can start from there to add the ability to specify event series in the template: project_template.zip

However, I think we can and should display factor and string column types as event series by default. As a starting point, you can see here where we currently skip column types we don't recognize (when you open the sample file in the viewer, you should see lots of "Skipping unsupported factor series" messages in the console). So instead of skipping those, you can add logic to handle them as event series.
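One possible shape for that dispatch, sketched under the assumption (not verified here) that audata hands factor columns back as pandas categoricals and string columns as object/string dtype: classify each column and route factors and strings to the event path instead of skipping them. The `classify_columns` name and the "numeric"/"event"/"skip" labels are hypothetical.

```python
from __future__ import annotations

import pandas as pd
from pandas.api.types import is_numeric_dtype, is_string_dtype


def classify_columns(df: pd.DataFrame, time_col: str = "time") -> dict[str, str]:
    """Map each non-time column to "numeric", "event", or "skip".

    Factor (categorical) and string columns become event series instead of
    being skipped; any other unrecognized type is still skipped.
    """
    kinds: dict[str, str] = {}
    for col in df.columns:
        if col == time_col:
            continue
        series = df[col]
        if isinstance(series.dtype, pd.CategoricalDtype):
            kinds[col] = "event"      # factor column
        elif is_numeric_dtype(series):
            kinds[col] = "numeric"    # existing numeric rendering path
        elif is_string_dtype(series):
            kinds[col] = "event"      # free-text column
        else:
            kinds[col] = "skip"       # e.g. datetimes: still unsupported
    return kinds


df = pd.DataFrame({
    "time": [0.0, 300.0],
    "heart_rate": [80, 82],
    "medication": pd.Categorical(["Insulin", "Saline"]),
    "note": ["given", "held"],
})
print(classify_columns(df))
```

The categorical check comes first so that factors with numeric categories are still treated as events rather than falling into the numeric branch.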

Other than that, as I mentioned in the original post, I suspect the JS code is already fully "generalized". It's just in the backend that we need to add/generalize the logic that decides when to transmit series to the frontend as event series. That missing logic is the only reason the backend currently hard-codes pulling the medications and low_rate series and sending them as event data, as a temporary measure.

Oh, one other thing... The simple example I gave above of the single-column dataset is one case. But if you look at the raw data for the medications series, it's a multi-column dataset, something like:

Time   Medication   Dosage   Units
0      Insulin      30 mL    mL
300    Saline       1 L      L

So, what I have actually hard-coded for the medications (even though the code is simple, this is what it's actually doing) is to combine all of these columns into a single event series. In other words, in the sample above there are two events; e.g., the first is Insulin at 30 mL with units mL at time 0, and that's all one event.
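A minimal sketch of that "whole row is one event" behavior, assuming the table is loaded as a pandas DataFrame: every non-time column (or a chosen subset) is folded into one label per row. The `rows_to_events` function, the `"Name: value"` label format, and the event dict shape are all hypothetical choices for illustration.

```python
from __future__ import annotations

import pandas as pd


def rows_to_events(df: pd.DataFrame, time_col: str = "time",
                   label_cols: list[str] | None = None,
                   sep: str = ", ") -> list[dict]:
    """Collapse each row into one event whose label combines several columns.

    If label_cols is None, every non-time column is included.
    """
    if label_cols is None:
        label_cols = [c for c in df.columns if c != time_col]
    events = []
    for t, *values in zip(df[time_col], *(df[c] for c in label_cols)):
        fields = dict(zip(label_cols, values))
        label = sep.join(f"{name}: {value}" for name, value in fields.items())
        events.append({"time": float(t), "label": label, "fields": fields})
    return events


# The medications example from above.
meds = pd.DataFrame({
    "Time": [0, 300],
    "Medication": ["Insulin", "Saline"],
    "Dosage": ["30 mL", "1 L"],
    "Units": ["mL", "L"],
})
events = rows_to_events(meds, time_col="Time")
print(events[0]["label"])  # Medication: Insulin, Dosage: 30 mL, Units: mL
```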

By default, the viewer treats each column as an individual time series. You might ask why. For numeric series, each column really is an individual time series; they're put in a common table like that because the series are sampled at the same times, and it's most efficient on disk to encode them that way (the time values aren't repeated).

With events, however (including factors, numerics, etc.), the user needs to be able to specify in the template that a single column, multiple columns, or perhaps all columns are a single series.
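One way the template setting could be resolved, sketched with a hypothetical per-series spec: `None` keeps the current one-series-per-column default, `"*"` makes all columns one series, and a column name or list of names selects a subset. Neither the spec values nor `resolve_event_groups` exist in auviewer today; they are assumptions for illustration.

```python
from __future__ import annotations

from typing import Sequence, Union

# Hypothetical per-series template setting: None (default), "*", one column
# name, or a list of column names.
ColumnSpec = Union[None, str, Sequence[str]]


def resolve_event_groups(columns: Sequence[str], spec: ColumnSpec,
                         time_col: str = "time") -> list[list[str]]:
    """Return the column groups to render, one event series per group."""
    data_cols = [c for c in columns if c != time_col]
    if spec is None:
        # Default: each column is its own series (current viewer behavior).
        return [[c] for c in data_cols]
    if spec == "*":
        # All columns together form a single event series.
        return [data_cols]
    wanted = [spec] if isinstance(spec, str) else list(spec)
    missing = [c for c in wanted if c not in data_cols]
    if missing:
        raise KeyError(f"columns not found in dataset: {missing}")
    return [wanted]


cols = ["time", "Medication", "Dosage", "Units"]
print(resolve_event_groups(cols, None))
print(resolve_event_groups(cols, "*"))
print(resolve_event_groups(cols, ["Medication", "Dosage"]))
```

Raising on unknown column names, rather than silently ignoring them, is a deliberate choice here so that template typos surface early.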

We're down the rabbit hole now. Have fun with all that :)
