Bug in conversion #7

@RosaMGini

Dear @YMao-UMCU,

In a branch called 'debugging_datasource' I have created an example that seems to me to show a bug. Let me walk through it.

We are cleaning a single type of lab value, LAB_HBA1C, with target unit mmol/mol:

concept_id;Min;Max;unit_target;condition_on_variable;variable
LAB_HBA1C;9;174;mmol/mol;;

The conversion from the other unit, %, is non-multiplicative. We set the conversion depending on the datasource, as follows:

concept_id datasource unit_origin unit_target multiplication_factor_from_origin_to_target conversion_not_multiplication conversion_rate condition_on_value assumed_unit_if_missing next_attempt
LAB_HBA1C % mmol/mol (10.93 * value) - 23.5 0
LAB_HBA1C DS_A MISSING mmol/mol (10.93 * value) - 23.5 % 0
LAB_HBA1C DS_B MISSING mmol/mol 1 1 mmol/mol 1
LAB_HBA1C DS_B MISSING mmol/mol (10.93 * value) - 23.5 % 2

Note in particular that:

Rule 1) if the origin unit is %, then 'next_attempt' is 0, meaning that if the converted value is out of threshold, no further conversion is attempted
Rule 2) if the origin unit is missing and the datasource is DS_A, then one assumes the unit is % and one conversion is attempted; if the converted value is out of threshold, no further attempt is made
Rule 3) if the origin unit is missing and the datasource is DS_B, then one assumes the unit is mmol/mol, and the conversion from % is attempted if and only if that value is out of threshold; if the converted value is also out of threshold, no further attempt is made
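
To make the three rules concrete, here is a minimal Python sketch of how I read them. It is not the package's code: the names `clean_value`, `ATTEMPTS`, `percent_to_mmol_mol` and `keep_as_is` are hypothetical, and the ordered list of attempts is only my own way of encoding 'next_attempt'. Values already in the target unit are assumed to be range-checked without conversion.

```python
from typing import Callable, Dict, List, Optional, Tuple

TARGET_UNIT = "mmol/mol"
MIN_VALUE, MAX_VALUE = 9, 174  # thresholds for LAB_HBA1C, in mmol/mol


def percent_to_mmol_mol(value: float) -> float:
    # Non-multiplicative conversion taken from the conversion table.
    return 10.93 * value - 23.5


def keep_as_is(value: float) -> float:
    # The value is assumed to be in the target unit already.
    return value


Attempt = Callable[[float], float]

# Hypothetical encoding of the rules: (datasource, unit_origin) -> ordered attempts.
# A datasource of None means "applies to every datasource".
ATTEMPTS: Dict[Tuple[Optional[str], str], List[Attempt]] = {
    (None, "%"): [percent_to_mmol_mol],                      # Rule 1
    ("DS_A", "MISSING"): [percent_to_mmol_mol],              # Rule 2
    ("DS_B", "MISSING"): [keep_as_is, percent_to_mmol_mol],  # Rule 3
}


def clean_value(value: float, unit: Optional[str], datasource: str) -> Optional[float]:
    """Return the accepted value in mmol/mol, or None if the value is discarded."""
    unit_key = unit if unit else "MISSING"
    if unit_key == TARGET_UNIT:
        attempts = [keep_as_is]
    else:
        # Datasource-specific rules take precedence over generic ones.
        attempts = ATTEMPTS.get((datasource, unit_key)) or ATTEMPTS.get((None, unit_key), [])
    for attempt in attempts:
        candidate = round(attempt(value), 2)
        if MIN_VALUE <= candidate <= MAX_VALUE:
            return candidate
    return None  # every attempt was out of threshold


rows = [("P01", 6.0, "%"), ("P02", 42, "mmol/mol"), ("P03", 70, "%"), ("P04", 70, None)]
for person_id, value, unit in rows:
    print(person_id, clean_value(value, unit, "DS_A"))
```

Under these assumptions, a value of 70 with unit % or with a missing unit is discarded when datasource = "DS_A", whereas a value of 70 with a missing unit is accepted as 70 mmol/mol when datasource = "DS_B".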

I will run the cleaning on this dataset using the argument datasource = "DS_A", so Rule 1 and Rule 2 above apply, and Rule 3 does not:

person_id;concept_id;value;unit
P01;LAB_HBA1C;6.0;%
P02;LAB_HBA1C;42;mmol/mol
P03;LAB_HBA1C;70;%
P04;LAB_HBA1C;70;

Therefore my ground truth is the following

person_id;concept_id;value_origin;unit_origin;included;value;unit_target;conversion;rule_applied
P01;LAB_HBA1C;6.0;%;1;42.08;mmol/mol;1;1
P02;LAB_HBA1C;42;mmol/mol;1;42;mmol/mol;0;0
P03;LAB_HBA1C;70;%;0;;mmol/mol;1;90
P04;LAB_HBA1C;70;;0;;mmol/mol;3;91

So, P01 and P02 seem straightforward. If 70 is a %, when converted to mmol/mol it becomes 741.6, which is out of threshold. Therefore, the value of P03 is discarded because of Rule 1) above, and the value of P04 is discarded because of Rule 2) above.
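
For reference, the arithmetic behind these numbers, using the conversion formula and the thresholds 9 and 174 given above:

```latex
\begin{aligned}
6.0\,\% &\;\to\; 10.93 \times 6.0 - 23.5 = 65.58 - 23.5 = 42.08\ \text{mmol/mol}, & 9 \le 42.08 \le 174 \\
70\,\%  &\;\to\; 10.93 \times 70 - 23.5 = 765.1 - 23.5 = 741.6\ \text{mmol/mol}, & 741.6 > 174
\end{aligned}
```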

The output of the test script is the following

person_id;concept_id;value_origin;unit_origin;included;value;unit_target;conversion;rule_applied
P01;LAB_HBA1C;6;%;1;42.08;mmol/mol;1;1
P02;LAB_HBA1C;42;mmol/mol;1;42;mmol/mol;0;0
P03;LAB_HBA1C;70;%;1;70;mmol/mol;1;1
P04;LAB_HBA1C;70;;1;70;mmol/mol;3;0

So we have the following discrepancies:

P03) based on rule_applied, the value of P03 is converted before acceptance; but the converted value (741.6) is out of threshold, so we should have included = 0. Instead, the output shows that the 'converted' value is still 70, which is not what the specifications say
P04) based on rule_applied, no conversion is applied; but since the datasource is DS_A, the first assumption when the origin unit is MISSING is that it is %, so the value should be converted and then discarded
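
The same comparison can be done mechanically. The sketch below is only an illustration: it hard-codes the two tables above as Python dictionaries (with None for empty cells) and uses a hypothetical helper `diff_rows` to list the fields that differ per person.

```python
COLUMNS = ["value_origin", "unit_origin", "included", "value",
           "unit_target", "conversion", "rule_applied"]

# Ground truth, copied from the table above (None marks an empty cell).
expected = {
    "P01": (6.0, "%", 1, 42.08, "mmol/mol", 1, 1),
    "P02": (42, "mmol/mol", 1, 42, "mmol/mol", 0, 0),
    "P03": (70, "%", 0, None, "mmol/mol", 1, 90),
    "P04": (70, None, 0, None, "mmol/mol", 3, 91),
}

# Output of the test script, copied from the table above.
observed = {
    "P01": (6, "%", 1, 42.08, "mmol/mol", 1, 1),
    "P02": (42, "mmol/mol", 1, 42, "mmol/mol", 0, 0),
    "P03": (70, "%", 1, 70, "mmol/mol", 1, 1),
    "P04": (70, None, 1, 70, "mmol/mol", 3, 0),
}


def diff_rows(expected_rows, observed_rows):
    """Map each person_id to {column: (expected, observed)} for differing cells."""
    diffs = {}
    for person_id, exp in expected_rows.items():
        obs = observed_rows[person_id]
        changed = {col: (e, o) for col, e, o in zip(COLUMNS, exp, obs) if e != o}
        if changed:
            diffs[person_id] = changed
    return diffs


for person_id, changed in diff_rows(expected, observed).items():
    print(person_id, changed)
```

Only P03 and P04 are reported, and in both cases the differing columns are included, value and rule_applied.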

Is there anything I am missing in this analysis?

It looks like there may be a spillover from the specifications of DS_B? Once we solve this, I will refine the quality checks of the input file to investigate how the field 'datasource' is populated.

For your convenience I copy here the dictionary of conversion and rule_applied

[Image: dictionary of the conversion and rule_applied codes]

In the test you can also:

  • change the argument -datasource-
  • use a different file as LAB_unit_conversion, one in which the specifications of DS_B are omitted (this may be useful for debugging)

I am available for a call!

Labels: bug (Something isn't working)