How do I retrieve the visit information efficiently in the MEDS? #23
Comments
Thanks for bringing this up. Better handling of visits is important, and this could and should be improved. I think you have the right idea with the "MEDS Extension". Basically, you want to define a second format, "MEDS Visit", that is organized around visit calculation, with transformations to that format from MEDS. I would recommend doing this transformation on the fly for models: users pass in a MEDS dataset and your code converts it to MEDS Visit as needed. Note that we already have something similar called MEDS Flat, which makes writing ETLs easier (see https://github.com/Medical-Event-Data-Standard/meds_etl?tab=readme-ov-file#meds-flat). For a first pass, I would call this format CEHRTPatient or something similar, with a name specific to your package. Once you get it working, we could add it to this schema repository under the name MEDS Visit if you think it will be generally useful to other people. For writing the transformation from MEDS to MEDS Visit, I would recommend using the visit_id tags that are retained in https://github.com/Medical-Event-Data-Standard/meds_etl/blob/main/src/meds_etl/omop.py#L277 and https://github.com/Medical-Event-Data-Standard/meds_etl/blob/main/src/meds_etl/mimic/__init__.py#L74. I'm not sure whether those ETLs have everything you need to construct visit groupings. If you run into issues, feel free to make a Pull Request for the ETLs to add the information you need. Now, to answer your questions more explicitly:
Yeah, this is invalid, as measurements within an event (per the schema) must always have the same time.
I don't think so. If anything, we will probably make the core MEDS schema simpler (removing datetime_value, the double nesting of Event, etc). The idea here is that we want a minimal schema with users providing additional structure (like MEDS Visit or MEDS Flat) as needed. |
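The on-the-fly grouping suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: events are plain dicts with a "time" and a list of "measurements", each measurement has a "metadata" dict, and the visit tag is stored under a "visit_id" key. The Visit class and group_by_visit function are hypothetical names, not part of MEDS or meds_etl. Grouping is done per measurement rather than per event, so an event whose measurements belong to different visits is still handled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class Visit:
    """Hypothetical visit container; not part of the MEDS schema."""
    visit_id: Any
    start: datetime
    end: datetime
    measurements: list = field(default_factory=list)


def group_by_visit(events, visit_key="visit_id"):
    """Group measurements into visits using a visit tag in their metadata.

    Assumed input: a list of {"time": datetime, "measurements": [...]} dicts.
    Returns (visits sorted by start time, measurements with no visit tag).
    """
    buckets = defaultdict(list)
    unassigned = []
    for event in events:
        for m in event["measurements"]:
            # Copy the event time onto each measurement so it survives regrouping.
            record = {**m, "time": event["time"]}
            vid = (m.get("metadata") or {}).get(visit_key)
            if vid is None:
                unassigned.append(record)
            else:
                buckets[vid].append(record)

    visits = []
    for vid, records in buckets.items():
        records.sort(key=lambda r: r["time"])
        # Start and end are approximated by the earliest and latest tagged time.
        visits.append(Visit(vid, records[0]["time"], records[-1]["time"], records))
    visits.sort(key=lambda v: v.start)
    return visits, unassigned
```

Note that the start and end computed here are only the earliest and latest tagged timestamps; if the ETL retains explicit visit start and end values, those would be the better source.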
@ChaoPang One idea that is also worth thinking about: It might make your life easier if we add a metadata field "is_visit_record" to mark which records are supposed to indicate the start/end of each visit. That way you would know that "MIMIC_IV_Admission/Inpatient" is a visit record and that "Visit/IP" (from OMOP) is a visit record. That type of metadata field would make it easier for your code to support both OMOP and MIMIC (and potentially other datasets). |
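To make the proposal concrete, here is a minimal sketch of how downstream code might use such a flag. The "is_visit_record" key is the proposed field, not something that exists in MEDS today, and the dict layout of a measurement is an assumption made for illustration.

```python
def find_visit_records(measurements):
    """Return measurements whose metadata marks them as visit records.

    Assumes each measurement is a dict with a "code" and an optional
    "metadata" dict; "is_visit_record" is the proposed (hypothetical) flag.
    """
    return [
        m for m in measurements
        if (m.get("metadata") or {}).get("is_visit_record") is True
    ]


# The same code path handles both source datasets, because it never
# inspects the dataset-specific code strings.
sample = [
    {"code": "Visit/IP", "metadata": {"is_visit_record": True}},
    {"code": "MIMIC_IV_Admission/Inpatient", "metadata": {"is_visit_record": True}},
    {"code": "LOINC/1234-5", "metadata": {}},  # placeholder non-visit code
]
visit_codes = [m["code"] for m in find_visit_records(sample)]
```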
@EthanSteinberg thanks a lot for the recommendation, that makes a lot of sense! I started looking at the
To identify visits, I iterated through all measurements with Below is the logic I used to convert the
|
That looks like a good first pass! Responding to several of your questions and comments inline:
Feel free to submit a pull request to add that metadata in. Adding additional metadata should generally not cause any problems.
It would mainly be useful if you want to support MIMIC as well. So you could have the same code path for both MIMIC and OMOP.
I think this is a good first pass, but I have some comments and suggestions. One problem is that the code might run into issues with unusual OMOP datasets. For instance, events[0] is not always the birth event; depending on the dataset, you sometimes have events before birth.
I would generally recommend doing an explicit prefix-only query. For example, code.startswith('Gender/') will only match gender codes, and likewise for the other code types. Your current code might be too inclusive.
I don't see how this code could possibly work if you have an event with measurements from different visits.
I would also recommend sorting the events after you are done.
|
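The prefix and ordering points above can be illustrated with a short sketch. The event layout (dicts with "time" and "measurements") and the helper names are assumptions for illustration; the birth code is passed in by the caller because the exact code depends on the dataset and ETL.

```python
def has_prefix(code, prefix):
    """Prefix-only match: True only when the code starts with the prefix."""
    return code.startswith(prefix)


def find_first_with_code(events, code):
    """Return the earliest event containing a measurement with exactly this code.

    Searches by content instead of assuming the event sits at events[0].
    Returns None if no event matches.
    """
    for event in sorted(events, key=lambda e: e["time"]):
        if any(m["code"] == code for m in event["measurements"]):
            return event
    return None


def sort_events(events):
    """Return events ordered by time; apply after any regrouping step."""
    return sorted(events, key=lambda e: e["time"])
```

A substring test such as `'Gender/' in code` would also accept a code like 'Note/Gender/unknown', which is the kind of over-inclusive match the prefix check avoids.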
@ChaoPang I know I'm coming into this late, but it would be helpful to know a bit more about why you need to know whether something is a visit, whether a measurement falls within a visit, and so on. Basically, I'm wondering if you could emulate something we do in ESGPT that would fit into MEDS very naturally: we explicitly add events for "VISIT_START" and "VISIT_END". In particular, if you have a record for a patient with an admission at timestamp
What this encoding misses is the possibility that you have overlapping visits, or that you need to know in a more direct fashion that the events between the Would any of those strategies work? |
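As a rough illustration of the marker-event idea, the sketch below inserts "VISIT_START" and "VISIT_END" records into a flat stream and then recovers whether each measurement falls inside a visit. The (time, code) tuple layout and both function names are assumptions made for this example; this is not ESGPT's actual implementation.

```python
def add_visit_markers(measurements, visits):
    """Merge visit boundary markers into a flat (time, code) stream.

    measurements: list of (time, code) tuples.
    visits: list of (start, end) tuples.
    """
    out = list(measurements)
    for start, end in visits:
        out.append((start, "VISIT_START"))
        out.append((end, "VISIT_END"))
    # Sort by time; on ties, a start marker comes first and an end marker last,
    # so measurements sharing a boundary timestamp count as inside the visit.
    order = {"VISIT_START": 0, "VISIT_END": 2}
    out.sort(key=lambda r: (r[0], order.get(r[1], 1)))
    return out


def tag_in_visit(stream):
    """Return (time, code, in_visit) for every non-marker record."""
    depth = 0  # number of currently open visits
    tagged = []
    for time, code in stream:
        if code == "VISIT_START":
            depth += 1
        elif code == "VISIT_END":
            depth -= 1
        else:
            tagged.append((time, code, depth > 0))
    return tagged
```

The depth counter reports that a measurement lies inside some visit, but with overlapping visits it cannot say which one, which matches the limitation of this encoding noted above.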
@mmcdermott thanks for sharing your thoughts on this, it's very much appreciated! This is very close to what I am doing in the CEHR-BERT/GPT conceptually, where the
I am taking the MEDS as the input and converting it to a CEHR-BERT extension so that it can be ingested by the model. The reason that I wanted to introduce the following entities is for the ease of data processing.
@EthanSteinberg However, I just realized that I don't actually need to use
|
@ChaoPang Yeah, getting rid of Event is the right call IMHO. We are extremely likely to remove Event from MEDS as well soon. |
Background
I am trying to figure out how to retrieve visit_start_datetime, visit_end_datetime, visit_concept_id, and discharge_facility_code from MEDS more efficiently. The reason is that I am currently converting CEHR-BERT to be MEDS-compatible, which requires those fields to construct all the model inputs.
First experiment
In my first experiment, I used event to store the visit information: I put visit_start_datetime as event.time, stored visit_concept_id as the first measurement of the event, and stored discharge_facility_code as the last measurement (only for inpatient visits). To figure out visit_end_datetime, I had to iterate through all measurements and take the maximum timestamp within the event. Although this approach worked, the downside is that MEDS datasets structured this way would not work with FEMR models directly (correct me if I am wrong), because an event can span a few days for inpatient visits, which may violate the meaning of an Event.
Second experiment: MEDS Extension
In the second experiment, I tried to extend the MEDS schema by introducing Visit and PatientExtension objects to store the information required for CEHR-BERT. In addition, I created a data conversion script to convert the MEDS extension back to the original MEDS format so that it would work with FEMR models too.
Do you have any plans to introduce Visit to the schema? If not, what would you recommend we do to retrieve visits efficiently in MEDS? Currently, visit_occurrence_id and visit_end_datetime are stored in the metadata field of Measurement. The information is available in MEDS, but the only way I could think of to use it is to look up the visit_occurrence_id in the metadata field of the first measurement of each event, and connect events if they have the same visit_occurrence_id. What do you think is the best way to identify the groups of events that belong to the same inpatient visit?