small fixes and new rules #204

Merged
merged 5 commits into from
Jun 20, 2017
9 changes: 9 additions & 0 deletions README-impala.md
@@ -80,6 +80,15 @@ remove.packages("SqlRender")

If you want to delete all the Achilles results use the following:

```bash
impala-shell -q 'drop database achilles cascade'
```


Heel implementation for Impala

Currently, only the Achilles pre-computations are fully developed for Impala. No work has been done on the Heel component of Achilles.

```bash
impala-shell -q 'drop database achilles cascade'
```
28 changes: 27 additions & 1 deletion README.md
@@ -110,11 +110,37 @@ License
=======
Achilles is licensed under Apache License 2.0


# Pre-computations

Achilles has some compatibility with the data quality initiatives of the Data Quality Collaborative (DQC; http://repository.edm-forum.org/dqc or GitHub https://github.com/orgs/DQCollaborative). For example, a harmonized set of data quality terms was published by Kahn et al. in 2016.

What Achilles calls an *analysis* (a pre-computation for a given dataset), DQC calls a *measure*.

Some Heel rules take advantage of derived measures, a feature introduced in Heel version 1.4. A *derived measure* is the result of an SQL query that takes Achilles analyses as input. It is simply a different view of the pre-computations that is advantageous to materialize. The logic for computing derived measures can be viewed in the `AchillesHeel_v5.sql` file.

An overview of derived measures is available in the file `derived_analysis_details.csv`.

To allow flexible setting of Achilles Heel rule thresholds in the future, some Heel rules are split into two phases. First, a derived measure is computed and the result is stored in a separate table, `ACHILLES_RESULTS_DERIVED`. The Heel rule logic is then reduced to a simple comparison of the derived measure against a threshold. The link between rules and the pre-computations they use is available in the file `inst\csv\achilles_rule.csv` (see column `linked_measure`).
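The two-phase approach can be sketched as follows. This is only an illustration, not the Achilles implementation: it uses Python's built-in `sqlite3` with an in-memory database, a simplified two-column stand-in for `ACHILLES_RESULTS_DERIVED`, and an invented measure value. The helper `heel_notifications` is hypothetical; the threshold of 20.5 is the default that `AchillesHeel_v5.sql` declares for rule 44, which fires when the measure falls below it.

```python
import sqlite3

# Assumption: a simplified stand-in for ACHILLES_RESULTS_DERIVED
# (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE achilles_results_derived (measure_id TEXT, statistic_value REAL)"
)

# Phase 1: a derived measure is computed and stored (the value is invented).
conn.execute(
    "INSERT INTO achilles_results_derived VALUES (?, ?)",
    ("ach_2002:Percentage", 12.5),
)


def heel_notifications(conn, measure_id, threshold):
    """Phase 2 (hypothetical helper): flag a measure that is below the threshold."""
    rows = conn.execute(
        "SELECT measure_id, statistic_value FROM achilles_results_derived "
        "WHERE measure_id = ? AND statistic_value < ?",
        (measure_id, threshold),
    ).fetchall()
    return [f"NOTIFICATION: {m} = {v} is below {threshold}" for m, v in rows]


low = heel_notifications(conn, "ach_2002:Percentage", 20.5)  # rule fires
ok = heel_notifications(conn, "ach_2002:Percentage", 10.0)   # rule stays silent
print(low)
print(ok)
```

Because the threshold is just a parameter of the second phase, it can be changed without recomputing the derived measure.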


# Heel Rules

Rules are classified into `CDM conformance` rules and `DQ` rules (see column `rule_type` in the rule CSV file).


Some Heel rules can be generalized to non-OMOP datasets. Other rules depend on OMOP concept ids, and the code would need to be translated for other CDMs (for example, the rule with `rule_id` of `29` uses an OMOP-specific concept, concept 195075).

Rules whose names carry the prefix `[GeneralPopulationOnly]` are applicable to datasets that represent a general population. Once metadata for this parameter is implemented by OHDSI, their execution can be limited to such datasets. In the meantime, users should ignore the output of general-population rules if their dataset is not of that type.

Rules are classified by severity into error, warning and notification (see column `severity`).
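Until such metadata exists, general-population rules could be filtered out by hand using the rule CSV. The sketch below is a minimal illustration under two assumptions: that the `rule_scope` column marks these rules with the value `GeneralPopulationOnly` (as it does for rule 42), and that a three-row excerpt with shortened descriptions is representative. The helper `applicable_rules` is hypothetical and not part of Achilles.

```python
import csv
import io

# Assumption: a three-row excerpt in the column layout of the rule CSV
# (descriptions shortened for the example).
RULES_CSV = """rule_id,rule_name,severity,rule_type,rule_description,threshold,rule_classification,rule_scope,linked_measure
40,Death event outside observation period,error,DQ,shortened,,completeness,,510
42,Percentage of outpatient visits is too low,notification,DQ,shortened,<0.42,completeness,GeneralPopulationOnly,201
43,99+ percent of persons have exactly one observation period,notification,DQ,shortened,>=99.0,completeness,,113
"""


def applicable_rules(csv_text, general_population):
    """Hypothetical helper: drop GeneralPopulationOnly rules for other dataset types."""
    rules = list(csv.DictReader(io.StringIO(csv_text)))
    if general_population:
        return rules
    return [r for r in rules if r["rule_scope"] != "GeneralPopulationOnly"]


kept = applicable_rules(RULES_CSV, general_population=False)
for rule in kept:
    print(rule["rule_id"], rule["severity"], rule["rule_name"])
```

The same rows can be grouped by the `severity` or `rule_type` column to separate errors from warnings and notifications.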


Development
===========
Achilles is being developed in R Studio.

###Development status
### Development status
[![Build Status](https://travis-ci.org/OHDSI/Achilles.svg?branch=master)](https://travis-ci.org/OHDSI/Achilles)
[![codecov.io](https://codecov.io/github/OHDSI/Achilles/coverage.svg?branch=master)](https://codecov.io/github/OHDSI/Achilles?branch=master)

7 changes: 7 additions & 0 deletions extras/PackageMaintenance.R
@@ -0,0 +1,7 @@

#update CSV file in package
connectionDetails$schema=resultsDatabaseSchema
conn<-connect(connectionDetails)
achilles_analysis<-querySql(conn,'select * from achilles_analysis')
#this line caused issue 151: names(achilles_analysis) <- tolower(names(achilles_analysis))
write.csv(achilles_analysis,file = 'inst/csv/analysisDetails.csv',na = '',row.names = F)
5 changes: 4 additions & 1 deletion inst/csv/achilles_rule.csv
@@ -1,5 +1,5 @@
rule_id,rule_name,severity,rule_type,rule_description,threshold,rule_classification,rule_scope,linked_measure
0,Achilles Heel version 1.3,,,this rule is not used for data analysis. It communicates the version of the ruleset.,,,,
0,Achilles Heel version 1.4.6,,,this rule is not used for data analysis. It communicates the version of the ruleset.,,,,
1,multiple checks for greater than zero,error,DQ,umbrella rule: this rule includes multiple error checks on over 35 analysis_ids,>0,complex,,
2,multiple checks where minimum value of a measure should not be negative,error,DQ,umbrella rule: this rule includes multiple error checks on over 20 analysis_ids where min value in distribution should not be negative,,complex,,
3,multiple checks related to death data where maximum value of a measure should not be positive,warning,DQ,death distributions where max should not be positive (using analyses 511;512;513;514;515),,plausibility,,
@@ -41,3 +41,6 @@ rule_id,rule_name,severity,rule_type,rule_description,threshold,rule_classificat
",>30000,completeness,GeneralPopulationOnly,Death:BornDeceasedRatio
40,Death event outside observation period,error,DQ,death event should not be outside observation period; in previous versions this rule was subsumed in the umbrella rule,,completeness,,510
41,No weight data in MEASUREMENT table,notification,DQ,implementation of similar Sentinel rule for certain vital signs; rule looks at concept_id 3025315 (LOINC code 29463-7),,completeness,,1800
42,Percentage of outpatient visits is too low,notification,DQ,"Rule is looking at percentage of outpatient visits. If this measure is too low (e.g. 5 percent), it may indicate a predominantly inpatient dataset. Threshold was decided on DQ-Study 2. General population only rule.",<0.42,completeness,GeneralPopulationOnly,201
43,99+ percent of persons have exactly one observation period,notification,DQ,Some datasets cannot provide observation period data based on health insurance start and stop dates. Rule notifies a user if 99+% of patients have exactly one observation period. ,>=99.0,completeness,,113
44,"Percentage of patients with at least 1 Measurement, 1 Dx and 1 Rx is below threshold",notification,DQ,This notification may indicate that a significant percentage of patients is missing data for either Measurement or Diagnosis or Medication. Many clinical studies may want to require at least some data in all three domains. Threshold was decided empirically in OHDSI DQ Study ,,completeness,,2002
13 changes: 7 additions & 6 deletions inst/csv/analysisDetails.csv
@@ -43,6 +43,7 @@
211,"Distribution of length of stay by visit_concept_id","visit_concept_id",,,,
212,"Number of persons with at least one visit occurrence, by calendar year by gender by age decile","calendar year","gender_concept_id","age decile",,
220,"Number of visit occurrence records by visit occurrence start month","calendar month",,,,
221,"Number of persons by visit start year","calendar year",,,,
300,"Number of providers",,,,,
301,"Number of providers by specialty concept_id","specialty_concept_id",,,,
302,"Number of providers with invalid care site id",,,,,
@@ -179,12 +180,12 @@
1700,"Number of records by cohort_concept_id","cohort_concept_id",,,,
1701,"Number of records with cohort end date < cohort start date",,,,,
1800,"Number of persons with at least one measurement occurrence, by measurement_concept_id","measurement_concept_id",,,,
1801,"Number of measurement occurrence records, by observation_concept_id","measurement_concept_id",,,,
1802,"Number of persons by measurement occurrence start month, by observation_concept_id","measurement_concept_id","calendar month",,,
1803,"Number of distinct observation occurrence concepts per person",,,,,
1804,"Number of persons with at least one observation occurrence, by observation_concept_id by calendar year by gender by age decile","measurement_concept_id","calendar year","gender_concept_id","age decile",
1805,"Number of observation occurrence records, by measurement_concept_id by measurement_type_concept_id","measurement_concept_id","measurement_type_concept_id",,,
1806,"Distribution of age by observation_concept_id","observation_concept_id","gender_concept_id",,,
1801,"Number of measurement occurrence records, by measurement_concept_id","measurement_concept_id",,,,
1802,"Number of persons by measurement occurrence start month, by measurement_concept_id","measurement_concept_id","calendar month",,,
1803,"Number of distinct measurement occurrence concepts per person",,,,,
1804,"Number of persons with at least one measurement occurrence, by measurement_concept_id by calendar year by gender by age decile","measurement_concept_id","calendar year","gender_concept_id","age decile",
1805,"Number of measurement occurrence records, by measurement_concept_id by measurement_type_concept_id","measurement_concept_id","measurement_type_concept_id",,,
1806,"Distribution of age by measurement_concept_id","measurement_concept_id","gender_concept_id",,,
1807,"Number of measurement occurrence records, by measurement_concept_id and unit_concept_id","measurement_concept_id","unit_concept_id",,,
1809,"Number of measurement records with invalid person_id",,,,,
1810,"Number of measurement records outside valid observation period",,,,,
34 changes: 25 additions & 9 deletions inst/csv/derived_analysis_details.csv
@@ -1,9 +1,25 @@
measure_id,name,statistic_value_name,stratum_1_name,description,associated_rules
UnmappedDataByDomain:SourceValueCnt,Count of source values in unmapped data,count of source values,domain,The measure analyzes how many source codes are unmapped.,34
AgeAtFirstObsByDecile:DecileCnt,Count of deciles appearing in the data (at first observation),count of deciles,,"The measure analyzes deciles of patients at their first observation. If only certain age groups are being observed, the count of deciles will be low.",33
Provider:PatientProviderRatio,Patient Provider Ratio,ratio,,"The measure looks at how many patients and how many providers are defined in the data. For example, the ratio may indicate an abnormally low number of providers.",31
Meas:NoNumValue:Percentage,Percentage of rows in MEASUREMENT table that have NULL recorded as numerical value,percentage,,The measure looks at data recorded in MEASUREMENT table. A significant percentage of such rows typically contain a numerical result.,28
UnmappedData:byDomain:Percentage,Percentage of rows that are unmapped,percentage,domain,The measure looks at relative size of unmapped data.,27
Provider:SpeciatlyCnt,Count of specialties found in the provider table,count of specialties,,"The measure looks at how many different specialties are present. For general datasets, we expect at least some minimum number of specialties.",
DrugExposure:ConceptCnt,Count of distinct drug_concept_ids (drug_exposure),count of concepts,,"Count of distinct drugs. For most datasets, a low number may indicate a data quality problem.",
DrugEra:ConceptCnt,Count of distinct drug_concept_ids (drug_era),count of concepts,,Count of distinct drugs.,
measure_id,name,statistic_value_name,stratum_1_name,stratum_2_name,description,associated_rules
UnmappedDataByDomain:SourceValueCnt,Count of source values in unmapped data,count of source values,domain,,The measure analyzes how many source codes are unmapped.,34
AgeAtFirstObsByDecile:DecileCnt,Count of deciles appearing in the data (at first observation),count of deciles,,,"The measure analyzes deciles of patients at their first observation. If only certain age groups are being observed, the count of deciles will be low.",33
Provider:PatientProviderRatio,Patient Provider Ratio,ratio,,,"The measure looks at how many patients and how many providers are defined in the data. For example, the ratio may indicate an abnormally low number of providers.",31
Meas:NoNumValue:Percentage,Percentage of rows in MEASUREMENT table that have NULL recorded as numerical value,percentage,,,The measure looks at data recorded in MEASUREMENT table. A significant percentage of such rows typically contain a numerical result.,28
UnmappedData:byDomain:Percentage,Percentage of rows that are unmapped,percentage,domain,,The measure looks at relative size of unmapped data.,27
Provider:SpeciatlyCnt,Count of specialties found in the provider table,count of specialties,,,"The measure looks at how many different specialties are present. For general datasets, we expect at least some minimum number of specialties.",38
DrugExposure:ConceptCnt,Count of distinct drug_concept_ids (drug_exposure),count of concepts,,,"Count of distinct drugs. For most datasets, a low number may indicate a data quality problem.",
DrugEra:ConceptCnt,Count of distinct drug_concept_ids (drug_era),count of concepts,,,Count of distinct drugs.,
ach_2000:Percentage,Percentage of patients with at least 1 Dx and 1 Rx,percentage,,,Indicates patient with some minimum events in their record,
ach_2001:Percentage,Percentage of patients with at least 1 Dx and 1 Proc,percentage,,,Indicates patient with some minimum events in their record,
ach_2002:Percentage,"Percentage of patients with at least 1 Meas, 1 Dx and 1 Rx",percentage,,,Indicates patient with some minimum events in their record,
ach_2003:Percentage,Percentage of patients with at least 1 visit,percentage,,,Indicates patient with some minimum events in their record,32
Achilles:byAnalysis:RowCnt,,count of rows,,,"Metadata about which measures were included when Achilles was last executed. Also allows count of types for certain domains (e.g., visit type). This is least sensitive data about a dataset. Pure metadata.",
Visit:Type:PersonWithAtLeastOne:byDecile:Percentage,Percentage of patients that have at least one visit by visit type,percentage,visit_concept_id,decile,The measure indicates which visit types are present in the dataset by decile using non-sensitive percentage view of count of persons.,
Device:ConceptCnt,Count of distinct concepts (Device),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Measurement:ConceptCnt,Count of distinct concepts (Measurement),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Observation:ConceptCnt,Count of distinct concepts (Observation),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Procedure:ConceptCnt,Count of distinct concepts (Procedure),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Note:ConceptCnt,Count of distinct concepts (Note),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Death:DeathCause:ConceptCnt,Count of distinct concepts (Note),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Death:DeathType:ConceptCnt,Count of distinct concepts (Note),count of concepts,,,"Count of distinct concepts. For most datasets, a low number may indicate a data quality problem.",
Death:byYear:SafePatientCnt,Count of patients by year of death,count,calendar year,,Safe patient count indicates that low counts will not be included in the measure,
Death:byDecade:SafePatientCnt,Count of patients by decade,count,calendar decade,,"Count of deaths by calendar decade (e.g., 1990s, 2000s,2100s). Large aggregation by decade is a less sensitive measure to report. ",
Death:BornDeceasedRatio,Ratio of born persons to deceased persons by year,ratio,calendar year,,,39
59 changes: 59 additions & 0 deletions inst/sql/sql_server/AchillesHeel_v5.sql
@@ -41,6 +41,8 @@ SQL for ACHILLES results (for either OMOP CDM v4 or OMOP CDM v5)
{DEFAULT @createTable = TRUE}
{DEFAULT @derivedDataSmPtCount = 11}
{DEFAULT @ThresholdAgeWarning = 125}
{DEFAULT @ThresholdOutpatientVisitPerc = 0.43}
{DEFAULT @ThresholdMinimalPtMeasDxRx = 20.5}


--@results_database_schema.ACHILLES_Heel part:
@@ -79,6 +81,8 @@ create table @results_database_schema.ACHILLES_results_derived
);





--general derived measures
--non-CDM sources may generate derived measures directly
@@ -1338,3 +1342,58 @@ from
where analysis_id = 1800 and stratum_1 = '3025315'
) a
where a.row_present = 0;



--ruleid 42 DQ rule
--Percentage of outpatient visits (concept_id 9202) is too low (for general population).
--This may indicate a dataset with mostly inpatient data (that may be biased and missing some EHR events)
--Threshold was decided as 10th percentile in empiric comparison of 12 real world datasets in the DQ-Study2



INSERT INTO @results_database_schema.ACHILLES_HEEL_results (ACHILLES_HEEL_warning,rule_id)
select 'NOTIFICATION: [GeneralPopulationOnly] Percentage of outpatient visits is below threshold'
as achilles_heel_warning,
42 as rule_id
from
(
select
1.0*count_value/(select sum(count_value) from @results_database_schema.achilles_results where analysis_id = 201) as outp_perc
from @results_database_schema.achilles_results where analysis_id = 201 and stratum_1='9202'
) d
where d.outp_perc < @ThresholdOutpatientVisitPerc;

--ruleid 43 DQ rule
--looks at observation period data; if 99+ percent of patients have exactly one observation period, the rule alerts the user
--This rule is based on the majority of real-life datasets.
--For some datasets (e.g., UK national data with a single payor), one observation period is perfectly valid


INSERT INTO @results_database_schema.ACHILLES_HEEL_results (ACHILLES_HEEL_warning,rule_id)
select 'NOTIFICATION: 99+ percent of persons have exactly one observation period'
as achilles_heel_warning,
43 as rule_id
from
(select 100.0*count_value/(select count_value as total_pts from @results_database_schema.achilles_results r where analysis_id =1) as one_obs_per_perc
from @results_database_schema.achilles_results where analysis_id = 113 and stratum_1 = '1'
) d
where d.one_obs_per_perc >= 99.0;



--ruleid 44 DQ rule
--uses iris measure: patients with at least 1 Meas, 1 Dx and 1 Rx


INSERT INTO @results_database_schema.ACHILLES_HEEL_results (ACHILLES_HEEL_warning,rule_id)
SELECT
'NOTIFICATION: Percentage of patients with at least 1 Measurement, 1 Dx and 1 Rx is below threshold' as ACHILLES_HEEL_warning,
44 as rule_id
FROM @results_database_schema.ACHILLES_results_derived d
where d.measure_id = 'ach_2002:Percentage'
and d.statistic_value < @ThresholdMinimalPtMeasDxRx --threshold identified in the DataQuality study
;
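
The arithmetic behind rule 42 can be checked with a small worked example. This is a sketch under stated assumptions, not part of the commit: it runs the same ratio query against an in-memory SQLite table using Python's built-in `sqlite3`, the counts are invented, and analysis 201 is assumed to hold counts stratified by visit_concept_id, as the `stratum_1='9202'` filter suggests. Stratum `9201` is included only as a second, non-outpatient visit type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE achilles_results "
    "(analysis_id INTEGER, stratum_1 TEXT, count_value INTEGER)"
)
# Invented counts for analysis 201; 9202 is the outpatient concept
# named in the rule 42 comments.
conn.executemany(
    "INSERT INTO achilles_results VALUES (201, ?, ?)",
    [("9202", 300), ("9201", 700)],
)

# Default value declared at the top of the script.
THRESHOLD_OUTPATIENT_VISIT_PERC = 0.43

outp_perc = conn.execute(
    """
    SELECT 1.0 * count_value /
           (SELECT SUM(count_value) FROM achilles_results WHERE analysis_id = 201)
    FROM achilles_results
    WHERE analysis_id = 201 AND stratum_1 = '9202'
    """
).fetchone()[0]

# The rule inserts its notification only when the share is below the threshold.
fires = outp_perc < THRESHOLD_OUTPATIENT_VISIT_PERC
print(outp_perc, fires)
```

With these counts the outpatient share is 300 / 1000 = 0.3, which is below 0.43, so the notification would be inserted. Despite its name, `outp_perc` is a fraction between 0 and 1, which is why the threshold is 0.43 and not 43.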