Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New derived table: location_distance #253

Closed
pavgra opened this issue Mar 13, 2019 · 10 comments
Closed

New derived table: location_distance #253

pavgra opened this issue Mar 13, 2019 · 10 comments
Labels

Comments

@pavgra
Copy link

pavgra commented Mar 13, 2019

New derived table: location_distance

Proposal Owner: Pavel Grafkin, Gowtham Rao

Discussion: #220, OHDSI/WebAPI#649

Proposal overview:

  • Add a new derived table - location_distance - to store distances between patients and care sites

Description
We would like to introduce:

  • distance based criteria during cohort definition, e.g. find events where the care_site.location_id was within x units of distance from person.location_id;
  • Cohort Characterization analyses calculating a distribution of distances between person and care site.

So, we need to know the distances between people and care sites. Even though the distance between a person and a care site represents derived information, not all of OHDSI supported DBs have geo capabilities (and therefore we cannot compute the distances inside DB) plus the computation of the distances is pretty computationally intensive to do it on-demand, therefore there is a need for pre-calculation and physical data storage.

Performance considerations

Some of the questions raised during the discussions are related to volumes of the data and performance of queries.

Amount of relations can be described by number of Person * average number of Locations per person * average number of Care Sites a person visits during life in a Location and hardly will raise up to the sizes of the main domain tables (e.g. drug exposure, condition occurrence and so on). E.g. the median number of drug exposures per person in Optum is 44, the average is 61. I doubt that each person in DB will change 10 houses while visiting 4-6 care sites with different locations during life in each of houses.

Therefore, I would assume that the table might grow to the sizes of one of the main domains tables and so the quering performance will be equal. I cannot provide actual numbers since don't have access to any CDM v6 database which would hold location_history data. Maybe @gowthamrao can share some numbers (and I will try to get numbers w/o location history from some typical datasets)

Table format

location_distance

column type comment
person_location_id integer
care_site_location_id integer
distance float
unit_concept_id integer whether the distance is stored in miles, kilometers, meters, etc

Distance pre-calculation

WITH care_site_person_locations AS (
  SELECT DISTINCT cs_lh.location_id as care_site_location_id, p.location_id AS person_location_id
  FROM
    visit_occurrence v
    JOIN care_site cs ON cs.care_site_id = v.care_site_id
    JOIN location_history cs_lh ON  cs_lh.domain_id = /* OMOP generated. Care site */ CAST(57 AS VARCHAR) AND cs.care_site_id = cs_lh.entity_id AND cs_lh.start_date < v.visit_start_datetime AND v.visit_end_datetime < COALESCE(cs_lh.end_date, CAST('2099-12-31' AS DATE))
    JOIN location cs_l ON cs_l.location_id = cs_lh.location_id
    JOIN person p ON p.person_id = v.person_id
    JOIN location_history p_lh ON  p_lh.domain_id = /* OMOP generated. Person */ CAST(56 AS VARCHAR) AND p.person_id = p_lh.entity_id AND p_lh.start_date < v.visit_start_datetime AND v.visit_end_datetime < COALESCE(p_lh.end_date, CAST('2099-12-31' AS DATE))
    JOIN location p_l ON p_l.location_id = p_lh.location_id
)
INSERT INTO location_distance
SELECT
  person_location_id,
  care_site_location_id,
  ST_DISTANCE(
      ST_GeographyFromText('POINT(' || person_location.longitude || ' ' || person_location.latitude || ')'),
      ST_GeographyFromText('POINT(' || care_site_location.longitude || ' ' || care_site_location.latitude || ')')
  ),
  2110000000
FROM care_site_person_locations
  JOIN location person_location ON person_location.location_id = person_location_id
  JOIN location care_site_location ON care_site_location.location_id = care_site_location_id;
@cgreich
Copy link
Contributor

cgreich commented Mar 14, 2019

What kind of unit_concept_id is 2110000000?

@pavgra
Copy link
Author

pavgra commented Mar 14, 2019

This should be a unit of distance, e.g. miles (so something similar to http://athena.ohdsi.org/search-terms/terms/4121361). And I believe Geo Vocabs should introduce such concepts

@cgreich
Copy link
Contributor

cgreich commented Mar 14, 2019

Units are in domain "Unit" and vocabulary "UCUM". And the question is if we want to standardize to km and get rid of the field altogether. Otherwise it is concept_id={9536, 9546, 9363}

@pavgra
Copy link
Author

pavgra commented Mar 14, 2019

Otherwise it is concept_id={9536, 9546, 9363}

Nice to know, thanks

And the question is if we want to standardize to km and get rid of the field altogether

Since it simplifies things, sounds good for me. Anyway, Atlas / other tools will be able to convert units and therefore still allow end-user to work with any unit types

@vojtechhuser
Copy link
Collaborator

So nearest LabCorp may be 5 miles and nearest dialysis facility may be 15 miles. To calculate distance, (and assuming patient home is point A) - what are the specs for point B ?

@clairblacketer
Copy link
Contributor

This is going to be discussed at the 4/9/2019 CDM meeting

@clairblacketer
Copy link
Contributor

Here is the link to the vote for this proposal. If it passes it will go to the dev branch first and then a later discussion an vote will be held to put it in production.

@don-torok
Copy link

What is disadvantage of adding longitude and latitude as attributes of location table, other than they will be NULL for the vast majority of CDMs. Then location can be determine via your query.

@cgreich
Copy link
Contributor

cgreich commented Apr 14, 2019

@clairblacketer
Copy link
Contributor

Closing as latitude and longitude were added to the location table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants