-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Link events to same day visits (option 2) and add parameter to creating visits (option 3) #26
Conversation
@@ -759,3 +740,64 @@ def get_standard_concept_ids(spark, concept_ids): | |||
WHERE ca.concept_id_1 IN ({concept_ids}) | |||
""".format(concept_ids=','.join([str(c) for c in concept_ids]))) | |||
return standard_concept_ids | |||
|
|||
|
|||
def link_events_with_visits(patient_event): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If two visits occur on the same day with different visit types (ED + Inpatient), the two last functions may not consistently take the information from the same visit and potentially create a discrepancy between the visit_occurrence_id
and the corresponding visit_concept_id
. An alternative way would be to join to visit_occurrence
directly to get the visit_occurrence_id
and visit_concept_id
for those orphan events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is a case I hadn't considered. If we have two visits on the same day and an orphan event, should we try to link it to a visit? We don't have enough information I think to find out which would be the correct visit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egillax This is probably a rare event, but this is more likely to happen if we specify a large gap period other than the same day. My point is that it doesn't matter to which visit we link the orphan events, however, it's important to link them consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChaoPang I agree about consistency.
I'll look into joining directly to the visit_occurrence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChaoPang I've been struggling a bit with your suggestion to join directly to the visit_occurrence
to get the ids. I'm very new to pyspark. Could you explain it a bit maybe?
# first row per person_id gives nan for date_difference, replace with value 2 | ||
patient_event = patient_event.na.fill(2, "date_difference") | ||
|
||
if link_events_visits: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if
block (when link_events_visits is True
) already covers the cases captured by else
, my understanding is that we only want to create visits
when visit_occurrence_id
is missing. However, it looks like the else
block creates visits
and ignores the existing visit_occurrence_id
, is that the behavior you expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In case of very few events with visit info I wanted to have the option to overwrite the visit info with the created one. So the grouping into "visits" is consistent. But for combination of linking and creating visits the existing info would be preserved.
But maybe it would be better to add some kind of overwrite option. So you would have:
- Link visits
- Link visits and create visits from orphans (preserving visit info that is there)
- Create visits from orphans and preserve existing visit info.
- Create visits for all events and overwrite existing visit info
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could automatically apply the following logic for each event:
- If visit_occurrence_id is present, use that to link the event to the visit.
- If visit_occurrence_id is not present, but there is a visit on the same date, link to that.
- If visit_occurrence_id is not present, and there is no visit on the same date, create a visit.
That should work for most databases, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@schuemie that is what you get with this PR with by using both options --link_events_to_visits
and --create_visits_from_dates
.
Additionally I wanted to be able to create visits for every event to impose my own structure on the sequences. By using gap
of 1 I would get sequences of events on consecutive days and still have the time aware tokens to represent the gaps between those sequences.
Maybe it's cleanest if I use a separate option for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egillax got it. To check my understanding, when the overwrite is enabled, we essentially ignore all the visits and impose our own structure, and create some visits
. Sometimes we might want to use a larger gap to define what a visit means since it's just a way to group events. I can imagine there are cases, where we want to link the events that occurred within 10 days of each other, which would create some meta
visits. Is my understanding correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChaoPang yes you are correct. You could also look at it as a way to control for how small gaps you'll insert the time aware tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@egillax thanks a lot for this nice feature! I have left my comments, please let know what you think.
Co-authored-by: Chao Pang <ChaoPang229@gmail.com>
Co-authored-by: Chao Pang <ChaoPang229@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @egillax it's been a long time, I approved the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @egillax it's been a long time, I approved the PR.
In this PR I've added
visit_occurrence_id
to same day visit if possible (fixes Dependency on visit_occurrence_id? #22)@schuemie @ChaoPang