Generate Heel error when non-sensical units are used for a given measurement/observation #4

Open
vojtechhuser opened this issue Apr 12, 2018 · 10 comments
Labels
PRIORITY MEDIUM; Under 60 Day Review (items currently out to the OHDSI community for final review)

@vojtechhuser

vojtechhuser commented Apr 12, 2018

FORUM: http://forums.ohdsi.org/t/improved-data-quality-checking-for-measurement/5421
SOLUTION:

  • For a subset of measurements, only units that are valid for the given measurement should be used in the data. For example, using % as the unit for weight is considered incorrect (Heel data quality warning; the subset is determined by the annual OHDSI DataQuality network study). A completely missing unit also triggers this warning.
  • To facilitate machine-assisted and human analysis, a single recommended unit (data-driven network consensus) is defined for a subset of measurements. The ETL process should convert data to that single target unit. For example, weight in lb is converted to kg so that all weight data are recorded in one target unit.
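
As a rough illustration of the first bullet, here is a minimal Python sketch. The `ALLOWED_UNITS` table, the `check_unit` function, the measurement names, and the message wording are all hypothetical; this is not the Heel implementation, and the entries are placeholders rather than the ThemisUnits knowledge base.

```python
from typing import Optional

# Hypothetical lookup: measurement -> set of units considered valid.
# Illustrative placeholders only, not the ThemisUnits knowledge base.
ALLOWED_UNITS = {
    "body weight": {"kg", "lb"},
    "body height": {"cm", "inch"},
}


def check_unit(measurement: str, unit: Optional[str]) -> Optional[str]:
    """Return a warning message, or None if nothing should be flagged."""
    allowed = ALLOWED_UNITS.get(measurement)
    if allowed is None:
        # Measurement is outside the curated subset: no rule applies.
        return None
    if unit is None or unit.strip() == "":
        # A completely missing unit also triggers the warning.
        return f"WARNING: missing unit for '{measurement}'"
    if unit not in allowed:
        return f"WARNING: unit '{unit}' is not valid for '{measurement}'"
    return None
```

Measurements outside the curated subset are deliberately left alone, which mirrors the "for a subset of measurements" wording above.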

NEXT STEPS: modify conventions for MEASUREMENT table


Observed units per lab test: https://github.com/OHDSI/StudyProtocolSandbox/blob/master/themis/extras/results2019/S3-units-with-tests.csv
(for example, see the data for LOINC 8302-2, height: cm (6 datasets), inches (2 datasets))

Data-driven consensus KB for the second point:
https://github.com/OHDSI/StudyProtocolSandbox/blob/master/themis/extras/results2019/S7-preferred_units-ABC.csv

This issue will document the progress on it and give people a place to comment.

ThemisUnits knowledge base will be used.

The proposal is to implement it as R logic (not in SQL); VH volunteers to do that.

If SQL has to be used, I am calling for a volunteer SQL developer willing to help.

@vojtechhuser
Author

vojtechhuser commented Nov 28, 2018

Currently, per convention 10, unit is optional. This proposal still allows the unit to be missing, but for advanced sites it provides Heel notifications and warnings.

https://github.com/OHDSI/CommonDataModel/wiki/MEASUREMENT#conventions
Convention MEASUREMENT-10: [image]

@dimshitc

Well, an empty unit is totally correct for some tests, for example: haematocrit and other ratios.

@aostropolets

I believe for those unit ‘ratio’ should be used. There is nothing wrong with leaving it null though

@vojtechhuser
Author

This overlaps with issue #28 (which was closed and subsumed into this one).

ThemisMeasurements study was extended to include coded values and was renamed ThemisConcepts.
The repo was also moved out of the sandbox and is now in the ThemisConcepts folder of https://github.com/OHDSI/OhdsiStudies

@vojtechhuser
Author

vojtechhuser commented Jan 4, 2019

For the hematocrit question, let's use the power of real-world data and see what sites are using.
And indeed, 4 sites are using % as the unit :-)

Teams at 13 sites thought hard about how to standardize their lab data. Without those 13 teams ever meeting or debating at length, we can look at an emerging group consensus (assuming each site is diligent about keeping its data neat and analysis-ready).

see here - line 39
https://github.com/OHDSI/StudyProtocolSandbox/blob/master/themis/extras/results2019/S3-units-with-tests.csv#L39

Let's not forget that we are trying to eliminate ambiguity for a machine or an analyst. And 0.47 versus 47 (as %) is a 100-fold difference.
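
To make the 100-fold point concrete, here is a small Python sketch. The `to_percent` helper and the unit labels `"%"` and `"fraction"` are assumptions made up for this example; it only shows that the same number cannot be interpreted safely without its unit.

```python
def to_percent(value: float, unit: str) -> float:
    """Express a hematocrit value in percent, given an explicit unit."""
    if unit == "%":
        return value
    if unit == "fraction":
        # A fraction of 0.47 means 47 %.
        return value * 100.0
    # Without a known unit, 0.47 and 47 cannot be told apart reliably.
    raise ValueError(f"cannot interpret hematocrit with unit {unit!r}")


# Usage: both records describe the same hematocrit once the unit is explicit.
as_percent = to_percent(47, "%")
from_fraction = to_percent(0.47, "fraction")
```

With a null or unknown unit the helper refuses to guess, which is the situation the warning in this proposal is meant to surface.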

@dimshitc

dimshitc commented Jan 8, 2019

So, basically, we need to create relationships from each Measurement to its preferred unit in the concept_relationship table, right?
Then, during ETL, people use them to convert whatever the source provides.
For example, if the source gives body weight in pounds, it will be converted to kg in the CDM.
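
The ETL step described here could look roughly like the following Python sketch. The `PREFERRED_UNIT` and `FACTORS` tables and the `convert_to_preferred` function are hypothetical; under this suggestion the preferred-unit mapping would come from the vocabulary (for example concept_relationship), not from hard-coded dictionaries. The pound-to-kilogram factor is the standard 0.45359237.

```python
# Hypothetical preferred unit per measurement (illustrative only).
PREFERRED_UNIT = {"body weight": "kg"}

# Multiplicative factors keyed by (source unit, target unit).
# 1 lb = 0.45359237 kg by definition.
FACTORS = {("lb", "kg"): 0.45359237, ("g", "kg"): 0.001}


def convert_to_preferred(measurement, value, unit):
    """Return (value, unit) in the preferred unit.

    The input is returned unchanged when no preferred unit or no
    conversion factor is known for this measurement/unit pair.
    """
    target = PREFERRED_UNIT.get(measurement)
    if target is None or unit == target:
        return value, unit
    factor = FACTORS.get((unit, target))
    if factor is None:
        # Leave the record as-is; a data quality check can flag it later.
        return value, unit
    return value * factor, target
```

Returning unconvertible records unchanged keeps the conversion step separate from the validity check, so a weight recorded in % is not silently altered.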

@mvanzandt added the "Under 60 Day Review" label Apr 16, 2019
@cgreich
Contributor

cgreich commented Apr 26, 2019

> I believe for those unit ‘ratio’ should be used. There is nothing wrong with leaving it null though

I like the idea to mandate the unit "1" or "{ratio}" or something like that.

> For example, if source gives the body weight in pounds, in the CDM it will be converted to KG

I think that is for the future. Right now, we want to create a mandatory (preferred, as you call it) list of units.

@MelaniePhilofsky
Collaborator

Current: Measurements should have a concept set of acceptable units. Once that is instantiated, DQD rules should be built.
Example: Weight measurements can only have kg, lb, or ounce units. All other units will generate a DQD error.
Who's responsible: Start with Vojtech's study (Anna's study?), put it out to the community, and find a group to review the final recommendations.

Future: We harmonize all values and units for a particular measurement to one OHDSI-blessed unit. This will be a recommendation.

@dimshitc

dimshitc commented Jul 10, 2023

Working on DQD, we have already created sets of permissible units per measurement concept.
They are relatively broad; for example, Weight measurements have all weight units allowed, from tonne to picogram.
The reasoning behind this is that

  • LOINC has table with example units
  • UCUM has crosswalks between units of the same dimension
  • Data might contain different variations of the same value/unit: 1000 mg or 1 g, for example.
    So we allowed all units of the same dimension.

It might seem too broad, but it is a way to get values for a lot of measurements.
It is also not that straightforward when it comes to
cells/ml, units/ml, erythrocytes/ml, /ml, {copies}/ml, etc., which are different groups from the LOINC perspective but carry the same value,
and to abstract units such as index, ratio, Generic unit for indivisible thing, score, etc.
So we need to extend these allowed unit groups manually.
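
The "allow every unit of the same dimension" idea can be sketched in Python as below. The `UNIT_DIMENSION` and `EXAMPLE_UNIT` tables and the `permissible_units` function are hypothetical stand-ins for the UCUM crosswalks and LOINC example units mentioned above; this is not the DQD script.

```python
# Hypothetical unit -> dimension table (stand-in for UCUM crosswalks).
UNIT_DIMENSION = {
    "t": "mass", "kg": "mass", "g": "mass",
    "mg": "mass", "pg": "mass", "lb": "mass",
    "m": "length", "cm": "length",
}

# Hypothetical example unit per measurement
# (stand-in for the LOINC table of example units).
EXAMPLE_UNIT = {"body weight": "kg", "body height": "cm"}


def permissible_units(measurement):
    """All units that share the dimension of the measurement's example unit."""
    example = EXAMPLE_UNIT.get(measurement)
    dimension = UNIT_DIMENSION.get(example)
    if dimension is None:
        # Unknown measurement or a unit without a mapped dimension.
        return set()
    return {u for u, d in UNIT_DIMENSION.items() if d == dimension}
```

Count-like and abstract units (cells/ml, ratio, score) have no entry in such a dimension table, which is why those groups would still need to be extended by hand.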

What is your suggested approach?
Do you want to work on a small number of measurements where you can pick units manually and carefully, OR improve the existing approach by extending the unit groups? If the latter, I can share the scripts for getting the pairs we currently use in DQD.

@MelaniePhilofsky
Collaborator

@dimshitc

You should discuss this issue with @vojtechhuser since he is the creator of the original issue. I think you or Vojtech should be the owner of this issue. Let me know who it will be so I can assign the issue appropriately.

We need a clear description of the problem, the current use case, and the approach to remedy the issue. We will also need guidance for the ETLer and for the end users of the data. I am not sure the latter is necessary, but the former definitely is, since the DQD will generate an error for the non-sensical units.
